1. Overview

The SRCH2 engine can be customized using a configuration file in XML. This documentation describes how to write this configuration. The package includes a sample configuration file. Each parameter in the file has one of two labels: "Required" means the tag must be present in the configuration file, and "Optional" means the tag may be omitted. Tag names and tag attribute names (but not their values) are case-insensitive.

2. Home Directory (Required)


This parameter specifies the installation folder of the SRCH2 engine, which includes the binary, a folder "example-demo" with a movie example, and other files needed by the engine. Other file paths used in the configuration should be relative to this home path. For example, if we give the engine a data file at "/usr/joe/srch2/example-data/movie-data.json", and the SRCH2 home directory is set to "/usr/joe/srch2", then the configuration file should reference the data file's location as "example-data/movie-data.json".
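As a minimal sketch of this rule, Python's standard `posixpath.relpath` turns an absolute path into the form expected in the configuration file. The helper name `to_config_path` is hypothetical and not part of SRCH2.

```python
import posixpath

def to_config_path(absolute_path: str, srch2_home: str) -> str:
    """Return absolute_path expressed relative to the SRCH2 home directory (POSIX paths)."""
    return posixpath.relpath(absolute_path, srch2_home)

print(to_config_path("/usr/joe/srch2/example-data/movie-data.json", "/usr/joe/srch2"))
# prints example-data/movie-data.json
```

A file outside the home directory yields a path that starts with "..", as in the Chinese analyzer example later in this document.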

3. Hostname/Ports (Required)


This parameter is the network identification (i.e., IP address or host name) of the server on which the SRCH2 engine is listening. NOTE: If you want the server to accept all incoming packets, please use

The following parameters allow certain operations to be served on different ports, so that network-based security can be implemented with standard firewall functionality. Typically, the "search" and "info" operations remain on a port accessible from any IP address, while operations such as "suggest", "docs", "update", "save", "export", and "resetLogger" are placed on a port that a firewall restricts to internal IP addresses. An example of relocating ports is provided in the advanced example configuration file in example-demo/srch2-config-advanced.xml.

3.1 SearchPort (Optional)

This parameter overrides "listeningPort" for "search" requests. "Search" requests must be sent to this port instead.

3.2 SuggestPort (Optional)

This parameter overrides "listeningPort" for "suggest" requests. "Suggest" requests must be sent to this port instead.

3.3 InfoPort (Optional)

This parameter overrides "listeningPort" for "info" requests. "Info" requests must be sent to this port instead.

3.4 DocsPort (Optional)

This parameter overrides "listeningPort" for "docs" requests. "Docs" requests must be sent to this port instead.

3.5 UpdatePort (Optional)

This parameter overrides "listeningPort" for "update" requests. "Update" requests must be sent to this port instead.

3.6 SavePort (Optional)

This parameter overrides "listeningPort" for "save" requests. "Save" requests must be sent to this port instead.

3.7 ExportPort (Optional)

This parameter overrides "listeningPort" for "export" requests. "Export" requests must be sent to this port instead.

3.8 ResetLoggerPort (Optional)

This parameter overrides "listeningPort" for "resetlogger" requests. "ResetLogger" requests must be sent to this port instead.
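The fallback rule shared by sections 3.1 to 3.8 can be sketched in Python as follows. This is an illustration, not the engine's code: the dictionary keys are assumed spellings derived from the headings above (tag names are case-insensitive, so their capitalization does not matter to the engine), and the port numbers are made up.

```python
# Map each operation name to the configuration key that overrides its port.
# Key spellings follow the headings in sections 3.1-3.8 (assumption); check the
# actual tags in example-demo/srch2-config-advanced.xml.
PORT_OVERRIDES = {
    "search": "searchPort",
    "suggest": "suggestPort",
    "info": "infoPort",
    "docs": "docsPort",
    "update": "updatePort",
    "save": "savePort",
    "export": "exportPort",
    "resetlogger": "resetLoggerPort",
}

def port_for(operation: str, config: dict) -> int:
    """Return the port serving an operation: its override if present, else listeningPort."""
    override_key = PORT_OVERRIDES[operation.lower()]
    return config.get(override_key, config["listeningPort"])

config = {"listeningPort": 8081, "updatePort": 8082}  # hypothetical values
print(port_for("search", config), port_for("update", config))  # prints 8081 8082
```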

4. Data Directory (Required)


This value specifies the path of a folder where the engine stores serialized index files.

5. Thread Number (Optional)


This setting specifies the number of threads that serve search requests. Its default value is 1.

6. Data

6.1. Data Source (Optional)


This value specifies the type of data source indexed by the engine. Notice that if there are previous indexes already created and serialized on disk, the engine will always load them and ignore the flag. The following flags are used only when there are no previous indexes on disk.

6.2. Data File (Required if "dataSourceType" is 1)


This value specifies a JSON data file to be loaded by the engine. Notice that it is a path relative to the home directory.

6.3. Database (Required if "dataSourceType" is 2)

For MongoDB:

            <dbKeyValue key="host" value="" />
            <dbKeyValue key="port" value="27017" />
            <dbKeyValue key="db" value="demo" />
            <dbKeyValue key="collection" value="movies" />
            <dbKeyValue key="listenerWaitTime" value="3" />

For SQLite:

            <dbKeyValue key="db" value="demo.db" />
            <dbKeyValue key="dbPath" value="." />
            <dbKeyValue key="tableName" value="COMPANY" />
            <dbKeyValue key="listenerWaitTime" value="3" />

This section specifies database instance information. Note: "listenerWaitTime" specifies how frequently (in seconds) the SRCH2 engine pulls changes from the database instance (e.g., every 3 seconds).

7. Index Config

This tag has parameters to customize the search engine's indexes.


7.1. Field Boosts (Optional)


This parameter is used to increase or decrease the weight of fields when calculating the ranking of search results. It specifies boost values for the searchable attributes and has a range of [1,100]. If the value is outside the range, it will be set to 1. For example, the boost values for two searchable attributes can be specified as:

 <fieldBoost>title^2 director^1</fieldBoost>

This parameter, together with the following parameters, is used in the ranking function of the engine. They are grouped under the "indexConfig" tag because they are used to compute static scores during the construction of the indexes, which happens only once when the search engine is started. There are also parameters under the "query" tag that are used to compute query-specific scores.

In the example above, the field "title" has a boost value of 2 and the field "director" has a boost value of 1. Note that the fields are separated by a single whitespace character. If a value is not provided for a field, the engine uses a default value of 1.
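The parsing and range rules above can be sketched in Python. The function name and the treatment of boosts as floats are assumptions for illustration; the engine's own parser is not shown in this document.

```python
def parse_field_boosts(spec: str, searchable_fields: list[str]) -> dict[str, float]:
    """Parse a fieldBoost value such as "title^2 director^1" into {field: boost}."""
    # Searchable fields that are not mentioned keep the default boost of 1.
    boosts = {field: 1.0 for field in searchable_fields}
    for item in spec.split():
        name, _, value = item.partition("^")
        boost = float(value) if value else 1.0  # no "^value" means the default of 1
        # Values outside [1, 100] are reset to 1, as described above.
        if not 1.0 <= boost <= 100.0:
            boost = 1.0
        boosts[name] = boost
    return boosts

print(parse_field_boosts("title^2 director^1", ["title", "director", "genre"]))
# prints {'title': 2.0, 'director': 1.0, 'genre': 1.0}
```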

7.2. Record Boost (Optional)


This setting specifies an attribute whose value is used as the score (boost value) of a record. This attribute should be refining and of type float.

7.3. Default Term Boost (Optional)


The value of this field specifies the default ranking value for terms that do not have the field specified in "recordBoostField". If the parameter is not given, the engine uses 1 as the default value.

7.4. Enable Positional Index (Optional)

A positional index supports queries with conditions related to keyword positions in a record, such as proximity search. The "enablePositionIndex" tag is used to enable or disable positional indexing. For instance, the following example shows how to enable this indexing:


Note that the positional index takes more memory resources. By default it is disabled.

7.5. Enable Character Offset Index (Optional)

A character offset index supports fast keyword highlighting during snippet generation. The "enableCharOffsetIndex" tag can be used to enable or disable character offset indexing. For instance, the following example shows how to enable this indexing:


Note that the character offset index takes more memory resources. By default it is disabled.

8. Query Parameters

The following tag includes parameters related to queries.


8.1. Ranking

The following tags and parameters are related to ranking.


This expression is used to compute the static score of a record with respect to a term. It is computed during index construction and stored in the indexes statically. It can use the following parameters:

The expression can use numeric constants and operators such as +, -, *, /, and parentheses. Check the ranking page to see how this static score is used in ranking. An example of the power and complexity of this expression is provided in example-demo/srch2-config-advanced.xml.

8.2. Fuzzy Match Penalty (Optional)


"FuzzyMatchPenalty" determines how much penalty to give to records that match a search term fuzzily compared to records with an exact match. See the ranking page for how this parameter is used. The default value is 1.

8.3. Term Similarity Threshold (Optional)


This value is used to compute the maximum edit distance allowed in fuzzy search for a search term. Edit distance is the minimum number of single-character changes required to transform a search term into a term in a record. Conceptually, edit distance represents the difference between the search term and a word in a record.

The value of queryTermSimilarityThreshold must lie in [0,1], with 1 representing exact match, and 0 being the largest possible error tolerance. The default value is 0.5.

This threshold is normalized by the length of the query term. The engine uses the following formula to compute the edit-distance threshold for this term:

floor((1-queryTermSimilarityThreshold) * length(query term)).

For instance, suppose the query term is "schwarazenneggar", and queryTermSimilarityThreshold is 0.8. Then the threshold of edit distance for this term is: floor((1-0.8)*16) = 3.
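The formula can be checked with a few lines of Python. The sketch counts length in characters and adds a small epsilon to guard against floating-point rounding; both are choices of this sketch, not documented engine behavior, and the function name is hypothetical.

```python
import math

def edit_distance_threshold(query_term: str, similarity_threshold: float = 0.5) -> int:
    """floor((1 - queryTermSimilarityThreshold) * length(query term))."""
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("queryTermSimilarityThreshold must lie in [0, 1]")
    # A tiny epsilon stops binary floating point from turning e.g. (1 - 0.9) * 10
    # into 0.999..., which would otherwise floor to 0 instead of 1.
    return math.floor((1 - similarity_threshold) * len(query_term) + 1e-9)

print(edit_distance_threshold("schwarazenneggar", 0.8))  # prints 3
```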

8.4. Support Swap as an Edit Distance Operation (Optional)


This setting allows character swap operations (e.g., "iphone" versus "ipohne") in the edit distance function, in addition to insertion, deletion, and substitution operations. When true, swapping two adjacent characters is counted as a single operation when calculating the edit distance. By default "supportSwapInEditDistance" is true.
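The effect of this setting can be illustrated with the optimal string alignment variant of Damerau-Levenshtein distance, in which swapping two adjacent characters costs 1. This textbook algorithm is swapped in for illustration; the document does not specify how the engine computes edit distance internally.

```python
def edit_distance(a: str, b: str, support_swap: bool = True) -> int:
    """Edit distance with insertion, deletion, substitution and, optionally, adjacent swaps."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i characters of a and the first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (support_swap and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]

print(edit_distance("iphone", "ipohne"))                      # prints 1
print(edit_distance("iphone", "ipohne", support_swap=False))  # prints 2
```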

8.5. Prefix Match Penalty (Optional)


This setting specifies how much penalty we want to give to a record where the search term matches only the beginning of a term in the record (e.g., "can" versus "candy"). Its value should be in the range of 0 to 1, with a default value of 0.95. See the ranking page for more information.

8.6. Cache Size (Optional)


This parameter specifies the number of bytes used in caching. The number should be in the range [52428800,524288000] (i.e., [50MB, 500MB]). Its default value is 52428800 (50MB).

8.7. Default value for number of returned results (Optional)


"Rows" specifies the number of results retrieved and returned in a single query response. If there are more results than "Rows", another query with a different "start" value must be sent to retrieve the next "page" of results. The default value of "Rows" is 10.

8.8. Field-Based Search (Optional)


This is a flag to toggle attribute-based search. An attribute-based search applies conditions that determine in which attributes a keyword should appear.

0: Disabled
1: Enabled
When either the positional index or the character offset index is enabled, field-based search is also enabled.

8.9. Search Type (Optional)


This parameter sets the default search type of a query to the engine.

0: find the top relevant results (with a number specified by the "rows" parameter);
1: find all matching results.

The default value is 0, i.e., the engine returns top results.

8.10. Query Term Fuzzy Type (Optional)


This parameter specifies how to treat a term by default with respect to match fuzziness.

0: exact match;
1: fuzzy match.

Its default value is 0.

8.11. Query Term Prefix Type (Optional)


This value specifies how to treat a term by default with respect to match completeness.

0: prefix match (the query term matches the beginning of a word);
1: complete match (the entire word must match the search term).

Its default value is 0.

8.12. Query Term Length Boost (Optional)


This value boosts records based on the length of the query term. Valid values lie in the range [0,1], inclusive. When it is not specified, a default value of 0.5 is used.

8.13. Highlighter (Optional)

The engine can generate a brief snippet of matching text in a searchable field. Each matching keyword is highlighted in the snippet. A snippet with highlighted keywords allows users to see more meaningful information from the field's text. The following is an example setting in the configuration file:

    <fuzzyTagPre value = '<span class="fuzzy">'></fuzzyTagPre>
    <fuzzyTagPost value = '</span>'></fuzzyTagPost>
    <exactTagPre value = '<span class="exact">' ></exactTagPre>
    <exactTagPost value = '</span>'></exactTagPost>

This composite parameter allows a user to specify snippet size and tags to enclose the matched keywords.

In order for an attribute to be highlighted, its highlight attribute in the schema configuration section must be enabled. There is a highlighting example in example-demo/srch2-config-advanced.xml. To reduce query time when highlighting is enabled, consider enabling the character offset index.

9. Response Writer

This tag has parameters related to configuring the response of a query.


9.1. Response Content (Optional)

ResponseContent specifies the search engine and record fields to be returned for each JSON record in the response. We should set the "type" attribute of "ResponseContent" to one of the following options:

The default value is 0. If "type" is 2, ResponseContent should also contain a comma-separated list of fields to be returned for each record in the query response. The following are two examples:

 <responseContent type="0"></responseContent>


 <responseContent type="2">lat,lng</responseContent>

Note that the fields mentioned in responseContent for type 2 should have either Searchable, Refining, or Indexed set to true, otherwise they will not be returned by the engine in query results.
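The type-2 behavior, including the rule above, can be sketched in Python. The record, the schema dictionary, and the function name are hypothetical; the sketch only mirrors the filtering described here and is not the engine's response writer.

```python
def project_record(record: dict, response_content: str, schema: dict) -> dict:
    """Keep the fields named in a type-2 responseContent value (e.g. "lat,lng")."""
    requested = [name.strip() for name in response_content.split(",") if name.strip()]
    # Only fields flagged searchable, refining, or indexed in the schema can be returned.
    returnable = {name for name, flags in schema.items()
                  if flags.get("searchable") or flags.get("refining") or flags.get("indexed")}
    return {name: record[name] for name in requested
            if name in returnable and name in record}

schema = {"title": {"searchable": True}, "lat": {"refining": True},
          "lng": {"refining": True}, "note": {}}
record = {"title": "Up", "lat": 37.77, "lng": -122.42, "note": "internal"}
print(project_record(record, "lat,lng,note", schema))
# prints {'lat': 37.77, 'lng': -122.42}
```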

9.2 JSON Response Format (Optional)

ResponseFormat specifies if the response should be JSON or JSONP formatted. Possible values are:

The default value is 0.

10. Update Handler

This tag includes tags related to configuring the engine to handle update requests.


10.1. Maximum Number of Documents (Optional)


This is the maximum number of records that can be indexed by the engine. Inserting more records than specified here will cause the insertions to fail. By default the engine uses 15000000 as the maximum number of records.

10.2. Maximum Memory (Optional)


This is the maximum memory the engine will use. By default the engine uses 1GB as the maximum memory.

10.3. Merge Policy (Optional)


If mergeEveryNSeconds and mergeEveryMWrites are missing from the config file, the engine uses default values of 10 and 100, respectively.
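Judging by their names, the two settings appear to trigger a merge of pending updates after N seconds or after M writes. The Python sketch below assumes a merge happens when either limit is reached and only when there are pending writes; both points are assumptions, and `MergeScheduler` is a hypothetical class, not part of SRCH2.

```python
import time
from typing import Callable

class MergeScheduler:
    """Decide when buffered updates should be merged into the index (illustrative only)."""

    def __init__(self, merge_every_n_seconds: float = 10, merge_every_m_writes: int = 100,
                 clock: Callable[[], float] = time.monotonic):
        self.n_seconds = merge_every_n_seconds
        self.m_writes = merge_every_m_writes
        self.clock = clock
        self.pending_writes = 0
        self.last_merge = clock()

    def record_write(self) -> None:
        self.pending_writes += 1

    def should_merge(self) -> bool:
        if self.pending_writes == 0:
            return False  # assumption: nothing to merge, so no merge
        elapsed = self.clock() - self.last_merge
        return elapsed >= self.n_seconds or self.pending_writes >= self.m_writes

    def mark_merged(self) -> None:
        self.pending_writes = 0
        self.last_merge = self.clock()
```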

11. Data Schema

This section of the configuration concerns the data source to be searched. The following sections explain each sub-tag under the schema component.

<schema name="SRCH2" version="3">

11.1. Primary Key (Required)


This value specifies the attribute used as the primary key to reference each record in the data, and thus the value of the specified field must be unique across all records. If a new record has a primary key conflicting with an existing record, the new record will be rejected by the engine. The attribute must be a "text" type.

11.2. Fields (Required)

"Fields" contains the specification of each "field" in the records.


Here is an example:

<field name="category" type="text" searchable="true" acl="true" />
<field name="name" type="text" indexed="true" highlight="true" />
<field name="model" default="benz" required="true" type="text" refining="true"/>
<field name="price" default="200" required="true" type="float" refining="true" acl="true"/>
<field name="likes" default="0" required="true" type="integer" refining="true"/>
<field name="seller_id" default="12324,546436,576864" required="false" type="integer" refining="true" multivalued="true" />

These parameters give information about each field, including its name, type, and certain properties.

The engine supports the following formats of time and date:

The engine supports time durations of the format "{integer}{date/time}", in which date and time can be SECOND, SECONDS, MINUTE, MINUTES, HOUR, HOURS, DAY, DAYS, WEEK, WEEKS, MONTH, MONTHS, YEAR, and YEARS. Examples are: "DAY" (meaning 1 day) or "34WEEKS" (meaning 34 weeks).
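A parser for this duration format fits in a few lines of Python. It assumes upper-case unit names exactly as listed above and returns the count with the singular unit; `parse_duration` is a hypothetical helper, not an engine API.

```python
import re

# Units listed above; an optional trailing "S" accepts the plural spelling.
_DURATION = re.compile(r"(\d*)(SECOND|MINUTE|HOUR|DAY|WEEK|MONTH|YEAR)S?")

def parse_duration(text: str) -> tuple[int, str]:
    """Split a duration such as "34WEEKS" or "DAY" into (count, singular unit)."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    count = int(match.group(1)) if match.group(1) else 1  # a bare unit means 1
    return count, match.group(2)

print(parse_duration("DAY"))      # prints (1, 'DAY')
print(parse_duration("34WEEKS"))  # prints (34, 'WEEK')
```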

Each field has certain properties:


11.3. Attribute Access Control (Optional)

This field specifies an access control file for the attributes having acl flag set to true in the schema. The engine bulk loads access control mapping from the file during the index-creation phase. The file is not used when the indexes are loaded from the disk.


The attribute access control file should be in a JSON format having one JSON object per line as shown below:

{ attributes : { "name", "seller_id", "price" }, roleId : ["user1"] }
{ attributes : { "seller_id" }, roleId : ["user2"] }

Attribute access control mapping can also be defined at runtime using the RESTful APIs. More details about how to do attribute-based access control can be found here.

11.4. Enable Facet (Optional)

This setting toggles the faceted search feature.


Its value is either false (facet search disabled) or true (enabled). Its default value is false.

11.5. Facet Fields

This tag contains facet-related tags, and it is required if facet is enabled.


Here is an example.

  <facetField name="genre" facetType="categorical"/>
  <facetField name="year"  facetType="range" facetStart="1990" facetEnd="2050" facetGap="10"/>

Note that facet ranges are aligned to the "facetStart" value, not the "facetEnd" value. Therefore, the last interval may not span the entire facetGap. All facet fields must also be declared in a field tag (with refining="true").
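The alignment rule can be sketched in Python. The sketch assumes half-open intervals [low, high) and ignores values outside facetStart..facetEnd; the document does not say how the engine handles boundaries or out-of-range values, so treat those details as assumptions.

```python
def facet_ranges(facet_start: float, facet_end: float,
                 facet_gap: float) -> list[tuple[float, float]]:
    """Intervals [low, high) aligned to facetStart; the last one is cut off at facetEnd."""
    if facet_gap <= 0:
        raise ValueError("facetGap must be positive")
    ranges = []
    low = facet_start
    while low < facet_end:
        ranges.append((low, min(low + facet_gap, facet_end)))
        low += facet_gap
    return ranges

print(facet_ranges(1990, 2050, 10))
# prints [(1990, 2000), (2000, 2010), (2010, 2020), (2020, 2030), (2030, 2040), (2040, 2050)]
print(facet_ranges(1990, 2045, 10)[-1])  # prints (2040, 2045)
```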

11.6. Analyzer

The parameters contained in this tag configure the analyzer to be used by the engine. Currently we support English-like languages and Chinese. We can specify which language to use by setting fieldType name="text_standard" or fieldType name="text_chinese" correspondingly. For example, the following configuration specifies the standard analyzer:

  <fieldType name="text_standard">

And the following configuration specifies the Chinese analyzer:

  <fieldType name="text_chinese" dictionary="relative/path/to/srch2home/srch2_dictionary_zh_cn.bin">

11.6.1 Stemmer (Optional)

The engine supports Porter Stemming. Inflected or derived words are reduced to their base or root form for matching against search terms. The engine uses a dictionary file with keywords that should be ignored by the stemmer. Keywords not in the dictionary go through a sequence of stemming rules. If no file is provided, the stemmer is disabled.

<filter name="PorterStemFilter" dictionary="srch2_stemmer_dictionary.txt" />

11.6.2 Stop Words (Optional)

This tag specifies a file of stop words that will be ignored by the engine. Stop words are not indexed and do not match any search term, which reduces resource consumption. In English, function words such as "the" and "of" carry little meaning by themselves and are good candidates for stop words. Providing an empty string ("") for the "words" parameter disables this filter.

<filter name="StopFilter" words="srch2_stop_words.txt" />

The advanced example provides a working example of stop words.

11.6.3 Protected Keywords Filter (Optional)

This filter specifies a file of special protected keywords that should be preserved by the analyzer, such as "C++", "C#", ".NET", and "AT&T". Providing an empty string ("") for the "words" parameter disables this filter.

<filter name="protectedKeyWordsFilter" words="srch2_protected_words.txt" />

The advanced example provides a working example of protected keywords.

11.6.4 Synonym Filter (Optional)

The synonym filter applies synonym rules while analyzing data records for indexing. The configuration setting is shown below.

<filter name="synonymFilter" synonyms="relative/path/to/synonyms/file" expand="true"/>

The synonyms attribute specifies the path to a synonym file relative to srch2Home. The boolean attribute expand indicates whether all the synonyms should be kept in the index or replaced by a single synonym.

The format of the synonym file is shown below. Comment lines start with '#' at the beginning of the line.

#1. explicit synonym rule : Tokens on LHS will be replaced with tokens on RHS (the "expand" flag is ignored)
nyc=>new york city, new york
jvm=> java virtual machine
centOS, fedora => linux
#2. equivalent synonym rule 
ipad, i-pad, tablet
#   if expand == true, then the rule is treated as "ipad, i-pad, tablet" => "ipad, i-pad, tablet"
#   if expand == false, then the rule is treated as "i-pad, tablet" => "ipad"


Consider a record "The CentOS Project is a community-driven free software effort"

If the synonym rule is "centOS=>linux", then the query keyword "centOS" will not return any result because the keyword "CentOS" has been replaced by "linux". The query "linux" will return the record as a search result.

If the synonym rule is "centOS, linux" (equivalent synonyms) and the expand attribute is set to true, then both keyword queries "centOS" and "linux" will return the record as a search result.
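The two rule types and the expand flag can be sketched in Python. This is a simplified model written for illustration (`parse_synonyms` and `apply_synonyms` are hypothetical): it handles single-keyword left-hand sides, keeps each multi-word right-hand side as one string, and lowercases everything; the engine's real tokenization of synonym rules is not described here.

```python
def parse_synonyms(lines, expand=True):
    """Build {term: terms indexed in its place} from synonym-file lines."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit rule: each LHS term is replaced by the RHS terms; expand is ignored.
            lhs, rhs = line.split("=>", 1)
            targets = [t.strip().lower() for t in rhs.split(",") if t.strip()]
            for term in lhs.split(","):
                if term.strip():
                    rules[term.strip().lower()] = targets
        else:
            # Equivalent rule: keep every synonym, or collapse all onto the first one.
            terms = [t.strip().lower() for t in line.split(",") if t.strip()]
            for term in terms:
                rules[term] = terms if expand else terms[:1]
    return rules

def apply_synonyms(tokens, rules):
    """Replace each token by the terms its rule maps it to (unchanged if no rule applies)."""
    result = []
    for token in tokens:
        result.extend(rules.get(token.lower(), [token.lower()]))
    return result

record_tokens = ["The", "CentOS", "Project"]
print(apply_synonyms(record_tokens, parse_synonyms(["centOS=>linux"])))
# prints ['the', 'linux', 'project']
print(apply_synonyms(record_tokens, parse_synonyms(["centOS, linux"], expand=True)))
# prints ['the', 'centos', 'linux', 'project']
```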

11.6.5 Allowed Special Characters (Optional)

The engine has a number of rules on which characters can be used to decompose words, and which characters are punctuation or delimiters separating words. For additional flexibility, the engine can be configured with extra characters that should not be treated as delimiters between words. Character sequences that include those characters are treated as a single term and are not broken into separate words. Here is an example for this setting to allow apostrophes, dashes, underscores, and at signs to occur within single words:

<filter name="allowedRecordSpecialCharacters">#@</filter>

This example will allow a search on "L'Absinthe" to match and return a record with the phrase "Edgar Degas' painting L'Absinthe hangs in the Musée d'Orsay in Paris" in an indexed attribute. Note that whitespace characters will also be considered word delimiters and are not permitted in this field.

The advanced example provides an example of allowing Twitter-hashtag-like terms to be queried.
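The effect of this filter can be approximated with a regular-expression tokenizer in Python. This is a hypothetical sketch: treating letters, digits, and underscores (Python's `\w`) as word characters and lowercasing terms are assumptions about the engine's default rules, and the allowed characters in the calls are examples chosen for illustration.

```python
import re

def tokenize(text: str, allowed_special: str = "") -> list[str]:
    """Split text into lowercased terms; word characters and allowed specials stay in a term."""
    # Whitespace always separates words, so it is dropped from the allowed set.
    allowed = "".join(ch for ch in allowed_special if not ch.isspace())
    pattern = r"[\w" + re.escape(allowed) + r"]+"
    return [term.lower() for term in re.findall(pattern, text)]

text = "Edgar Degas' painting L'Absinthe hangs in the Musée d'Orsay"
print(tokenize(text))                       # "L'Absinthe" is split into "l" and "absinthe"
print(tokenize(text, allowed_special="'"))  # "l'absinthe" stays one term
print(tokenize("follow @srch2 for #search tips", allowed_special="#@"))
# prints ['follow', '@srch2', 'for', '#search', 'tips']
```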

11.6.6 Chinese Analyzer

If we want to use the Chinese analyzer by setting fieldType name="text_chinese", then we should also provide the path of a Chinese dictionary (relative to the srch2Home folder) for the engine to use to tokenize records and queries. The following is an example:

 <fieldType name="text_chinese" dictionary="relative/path/to/srch2home/srch2_dictionary_zh_cn.bin">

For example, the engine can tokenize the string "海边城市" (meaning "seaside cities" in English) into "海边" ("seaside") and "城市" ("cities"). If the user doesn't provide the dictionary file, the engine treats each Chinese character as a separate keyword. For instance, the string "海边城市" will be tokenized as "海", "边", "城", and "市". The SRCH2 Chinese dictionary is a binary file named "srch2_dictionary_zh_cn.bin" inside the "srch2_data" folder.
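The difference between the two behaviors can be illustrated with forward maximum matching, a common dictionary-based segmentation technique. It is swapped in here for illustration only; the document does not say which segmentation algorithm the engine uses, and the tiny in-memory dictionary stands in for srch2_dictionary_zh_cn.bin.

```python
def segment(text: str, dictionary: set[str], max_word_len: int = 4) -> list[str]:
    """Forward maximum matching: take the longest dictionary word at each position, else one character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

dictionary = {"海边", "城市"}  # tiny stand-in for the real dictionary file
print(segment("海边城市", dictionary))  # prints ['海边', '城市']
print(segment("海边城市", set()))       # prints ['海', '边', '城', '市']
```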

Here we explain how to use SRCH2 to do text search in Chinese. The folder example-demo in the package includes a sample configuration file srch2-config-chinese.xml, which uses a JSON file chinese-data.json containing about two hundred Chinese records.

Configure Engine

Find the following line inside the config element to specify the path of the sample data file.


Find the following lines inside the config element to specify information about the Chinese dictionary, stop words, and protected words:

        <fieldType name="text_chinese" dictionary="../srch2_data/srch2_dictionary_zh_cn.bin"> 
                <filter name="StopFilter" words="../srch2_data/srch2_stop_words_zh_cn.txt" />
                <filter name="protectedKeyWordsFilter" words="../srch2_data/srch2_protected_words.txt" />
    </types>

Start Engine

Go to the install folder of SRCH2 and run the following command to start the SRCH2 engine:

/home/joe/srch2/example-demo> ../bin/srch2-engine --config-file=./srch2-config-chinese.xml

The engine will load the sample data to build the indexes.

Note that each RESTful request must be URL-encoded so that its characters can be transmitted properly. In addition, make sure to enclose each search query in quotes ("%22").
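For example, Python's `urllib.parse.quote` performs this encoding. The helper below wraps a keyword in double quotes and percent-encodes it as UTF-8. The function name is hypothetical, and the encoded value still has to be placed in the search URL of your request, which this sketch does not build.

```python
from urllib.parse import quote

def encode_query_keyword(keyword: str) -> str:
    """Quote a keyword ("...") and percent-encode it as UTF-8 for use in a request URL."""
    return quote(f'"{keyword}"', safe="")

print(encode_query_keyword("喜欢"))  # prints %22%E5%96%9C%E6%AC%A2%22
```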

In a shell terminal, run the following command:

shell> curl ""

If running in the browser, you should see a record with the keyword "喜欢".

Insert a Record

The following request adds a record to the indexes.

shell> curl "" -i -X PUT -d '{"id" : "300", "content" : "饺子,又名(水饺)起源于东汉中原宛城,是古老的汉族传统面食,深受中国广大人民喜爱的食品。"}'

Now search with the keyword "面食"

shell> curl ""

The engine should return the newly inserted record.

Update a Record

The following request will update the content of the newly inserted record.

shell>  curl "" -i -X PUT -d '{"id" : "300", "content" : "饺子是一种有馅的半圆形或半月形、角形的面食。"}'

Wait about 4 seconds for the server to apply the change, then run the above search query again. The engine should return the updated record.

Delete a Record

Run the following command to delete the updated record:

shell> curl "" -i -X DELETE

Do the above search query again. The SRCH2 engine should not return the record that we just deleted.

For more information about SRCH2 RESTful API, please see RESTful Search and RESTful Update.

12. Logging

The following tag includes parameters related to logging by the SRCH2 engine.


12.1. Log Level (Optional)


The parameter controls the amount of feedback information the engine will report in its log.

Its default value is 3.

12.2. Log File (Optional)


This value specifies the file to which the engine writes logging information. Third-party tools such as "logrotate" can be used to manage this log file; check this page for more information.

If it is not specified, the log file defaults to "srch2Home/logs/srch2-log.txt".

Note that when no core is specified in the configuration file, log messages use "__DEFAULTCORE__" as the core name. For a multi-core configuration, each log message uses the corresponding core name.

13. Cores (Optional)

The "Cores" tag can be used to create multiple data sources (called "cores") within the same SRCH2 server. All cores run within the same server process, and each query is processed by a single core.

When instantiated with multiple cores, queries specify a core by name to process the query. For instance, the following query:

requests that the core called "example-core" process a search request for "termina". In the case when no core is specified in a query to a multi-core server, the server will use the core specified by the "defaultCoreName" tag.

A sample multi-core configuration is provided in example-demo/srch2-config-advanced.xml.

13.1 Core (Optional)

Each "Core" tag under "Cores" specifies a server core to be created. Each core requires a name attribute. This name is used in query URI paths, as well as in the aforementioned "defaultCoreName" attribute of "Cores". Nearly all server configuration settings documented above can also appear under a "Core" tag. The exceptions are "listeningHostName", "listeningPort", and "maxSearchThreads", which are global for all the cores. Each core must define its own data source and behavioral rules; for example, "dataDir" and "schema" are typically defined under each core.

14. Authorization Key (Optional)


This parameter specifies an authorization key that is required in each HTTP request to the engine. If this key is specified, each valid HTTP request needs to provide the following key-value pair in order to be authorized.


Here's an example search query:

 curl -i ""

Notice that we use "foobar" as an example to indicate that the key in the HTTP request must match the key in the config file. A valid authorization key should include alphanumeric characters only, and the key is case-sensitive. If no authorization key is provided in the config file, then an HTTP request doesn't need to provide a key.

15. User Feedback (Optional)

The SRCH2 engine has a unique, powerful feature to dynamically boost the ranking of records based on user feedback. Please refer to feedback ranking and feedback API for details.

We can use the following setting in the configuration file to enable feedback ranking:

<userFeedback maxFeedbackQueries = "30"  maxFeedbackRecordsPerQuery = "20" />

The maxFeedbackRecordsPerQuery value specifies the maximum number of unique records that the engine stores as feedback per query. If the number of feedback records exceeds this value, the engine discards the oldest feedback record.

The maxFeedbackQueries value specifies the maximum number of queries that the engine stores at any given time. If the number of feedback queries exceeds this value, the engine discards the oldest query.
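The two eviction rules can be sketched in Python with ordered dictionaries. The sketch assumes that repeated feedback for a query or record makes it the newest entry again; the document only says that the oldest entries are discarded, and `FeedbackStore` is a hypothetical class, not part of the SRCH2 API.

```python
from collections import OrderedDict

class FeedbackStore:
    """Bounded store of feedback records per query, evicting the oldest entries first."""

    def __init__(self, max_feedback_queries: int = 30, max_feedback_records_per_query: int = 20):
        self.max_queries = max_feedback_queries
        self.max_records = max_feedback_records_per_query
        self._queries = OrderedDict()  # query -> OrderedDict of record ids, oldest first

    def add(self, query: str, record_id: str) -> None:
        """Record that record_id was given as feedback for query."""
        records = self._queries.pop(query, None)
        if records is None:
            records = OrderedDict()
        self._queries[query] = records      # this query is now the newest
        records.pop(record_id, None)
        records[record_id] = None           # this record is now the newest for the query
        if len(records) > self.max_records:
            records.popitem(last=False)     # discard the oldest feedback record
        if len(self._queries) > self.max_queries:
            self._queries.popitem(last=False)  # discard the oldest query

    def records_for(self, query: str) -> list[str]:
        return list(self._queries.get(query, {}))

store = FeedbackStore(max_feedback_queries=2, max_feedback_records_per_query=2)
for record_id in ["1", "2", "3"]:
    store.add("matrix", record_id)
print(store.records_for("matrix"))  # prints ['2', '3']
```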