Overview

Fast Instant Search Engine

SRCH2 is a search engine designed specifically for fast instant search, also known as search-as-you-type. It also supports on-the-fly error correction (fuzzy search). Built from scratch in C++, it delivers high performance and rich features. Drawing on more than a decade of research, SRCH2 uses many algorithmic and design innovations to bring “Google Instant” experiences to your applications.

The engine runs on servers as well as mobile devices.

There are existing solutions in the search space, including open source engines such as Elasticsearch and Sphinx. SRCH2 was developed in the belief that the search space is large and that no single solution works for all applications. SRCH2 adopts novel techniques and design principles to deliver superior performance and rich features, meeting the requirements of many emerging applications. Here we highlight several unique strengths of the engine.

Powerful Type-Ahead Search

Type-ahead search (also known as instant search) has proven very useful in helping users find information quickly and easily, especially in mobile environments. The solutions used by Web search engines such as Google and Bing are not applicable to enterprises, because they rely on large volumes of query log records that are not available for specific domains. SRCH2 is specifically optimized to support instant search for enterprises, even without a query log. Compared to existing solutions, SRCH2 not only answers queries with prefix conditions faster, but also allows customizable ranking for such queries. Note that there is a big difference between “instant search” (finding answers as the user types) and “auto suggestions” (suggesting queries as the user types).
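
To make the idea of a prefix condition with customizable ranking concrete, here is a minimal Python sketch. It is not SRCH2's API or implementation: the toy records, the function names, and the default "rank by rating" rule are all assumptions made for illustration. The last query term is treated as a prefix, earlier terms as complete keywords, and the caller can pass in a scoring function.

```python
from bisect import bisect_left

# Hypothetical toy data; not SRCH2's data model.
RECORDS = {
    1: {"title": "star wars", "rating": 8.6},
    2: {"title": "star trek", "rating": 7.9},
    3: {"title": "stargate", "rating": 7.0},
    4: {"title": "the stand", "rating": 7.2},
}

def build_index(records):
    """Return (keyword -> set of record ids, sorted list of keywords)."""
    index = {}
    for rid, rec in records.items():
        for word in rec["title"].split():
            index.setdefault(word, set()).add(rid)
    return index, sorted(index)

def prefix_matches(vocab, prefix):
    """Return every keyword in the sorted vocabulary that starts with prefix."""
    matches = []
    for word in vocab[bisect_left(vocab, prefix):]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches.append(word)
    return matches

def instant_search(query, records, index, vocab,
                   score=lambda rec: rec["rating"]):
    """Answer the query as typed so far, returning record ids, best first."""
    terms = query.lower().split()
    if not terms:
        return []
    result = None
    for i, term in enumerate(terms):
        if i == len(terms) - 1:
            # The term still being typed is a prefix condition.
            ids = set()
            for word in prefix_matches(vocab, term):
                ids |= index[word]
        else:
            ids = index.get(term, set())
        result = ids if result is None else result & ids
    return sorted(result, key=lambda rid: score(records[rid]), reverse=True)

INDEX, VOCAB = build_index(RECORDS)
```

Calling `instant_search("star t", RECORDS, INDEX, VOCAB)` returns records (answers), not suggested query strings, which is the distinction drawn above. No query log is involved; ranking comes from the `score` argument alone.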

Search on Mobile Devices

Mobile devices such as smartphones are becoming more powerful in terms of CPU cores, cycles, memory size, and storage. As they hold more local data, on-device search is increasingly in demand. Because of their limited interface and the notorious “fat fingers” problem, information access on these devices especially needs instant search and error correction. Built in C++ from scratch, SRCH2 has an advantage over Java-based solutions on mobile devices, since its binary does not pay the overhead of a Java virtual machine. An experimental study showed that SRCH2 can be 100 times faster than Lucene on Android.
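
As an illustration of what error correction means for a query that is still being typed, the sketch below computes the smallest edit distance between the typed text and any prefix of a keyword. This is a textbook dynamic-programming formulation chosen for clarity, not SRCH2's algorithm; the function names, the sample vocabulary, and the one-edit threshold are assumptions. A real engine would avoid scanning every keyword.

```python
def prefix_edit_distance(query, keyword):
    """Smallest edit distance between query and any prefix of keyword."""
    # prev[j] holds the edit distance between query[:i] and keyword[:j].
    prev = list(range(len(keyword) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, kc in enumerate(keyword, start=1):
            cost = 0 if qc == kc else 1
            cur.append(min(prev[j] + 1,          # drop a typed character
                           cur[j - 1] + 1,       # add a missing character
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    # The best prefix of keyword is the cheapest entry in the last row.
    return min(prev)

def fuzzy_matches(query, vocab, max_edits=1):
    """Keywords that some prefix of which is within max_edits of query."""
    return [w for w in vocab if prefix_edit_distance(query, w) <= max_edits]

# Hypothetical vocabulary for demonstration.
VOCAB = ["phone", "photo", "pizza", "search"]
```

With this vocabulary, a fat-finger slip such as `phome` (hitting `m` instead of `n`) still matches `phone`, because one substitution turns the typed text into the keyword.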

In-Memory Technology

Computer memory is becoming larger and cheaper. SRCH2 uses in-memory indexing techniques and algorithms to support high performance and powerful features. This approach has many advantages over solutions that rely on disk-based structures and a buffer manager. SRCH2 employs a novel technique called “3-way indexing” that seamlessly integrates an inverted index, a trie, and a forward index to improve search speed. A recent study showed that SRCH2 was 3 to 4 times faster than Solr on a Linux server.
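
The text does not describe how the three structures are combined, so the following Python sketch shows only one plausible division of labor, under stated assumptions: the trie expands a prefix into complete keywords, the inverted index turns those keywords into candidate records, and the forward index checks the remaining keywords per candidate. The class and method names are hypothetical and do not reflect SRCH2's internals.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_keyword = False

class ThreeWayIndex:
    """Toy in-memory index; an illustrative guess, not SRCH2's design."""

    def __init__(self):
        self.trie = TrieNode()
        self.inverted = {}  # keyword -> set of record ids
        self.forward = {}   # record id -> set of keywords

    def add(self, record_id, text):
        words = set(text.lower().split())
        self.forward[record_id] = words
        for word in words:
            self.inverted.setdefault(word, set()).add(record_id)
            node = self.trie
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_keyword = True

    def completions(self, prefix):
        """All indexed keywords that start with prefix (via the trie)."""
        node = self.trie
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(node, prefix)]
        while stack:
            node, word = stack.pop()
            if node.is_keyword:
                found.append(word)
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return found

    def search(self, query):
        """Last term is a prefix; earlier terms are complete keywords."""
        terms = query.lower().split()
        if not terms:
            return set()
        *complete, prefix = terms
        # Trie + inverted index: candidate records for the prefix term.
        candidates = set()
        for word in self.completions(prefix):
            candidates |= self.inverted[word]
        # Forward index: verify the other keywords inside each candidate.
        return {rid for rid in candidates
                if all(term in self.forward[rid] for term in complete)}
```

For example, after adding "canon digital camera" as record 1, "canon printer" as record 2, and "digital piano" as record 3, the query `canon di` yields only record 1: the prefix `di` produces candidates 1 and 3, and the forward-index check for `canon` discards record 3.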

Cost-Based Optimization

Because keyword conditions can carry different semantics, such as prefix matching or error correction, there are many different ways to compute the answers to a query. SRCH2 has a built-in query optimizer, largely influenced by techniques developed over decades by the database research community, that performs cost-based analysis to choose an efficient plan and deliver high performance.
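
A small sketch can show what choosing a plan by estimated cost looks like. The two plans, the cost formulas, and the `probe_cost` constant below are invented for illustration and are not taken from SRCH2's optimizer. The intuition is that a short prefix such as `s` expands to many keywords and therefore a large result list, while a complete keyword is usually selective, so the cheapest strategy depends on the query.

```python
def estimate_plans(list_sizes, probe_cost=2.0):
    """Return {plan name: estimated cost} for a query that ANDs its terms.

    list_sizes: estimated number of matching records for each term.
    probe_cost: assumed cost of verifying one term in one record, relative
                to reading one list entry (a made-up constant).
    """
    if not list_sizes:
        raise ValueError("need at least one term")
    smallest = min(list_sizes)
    other_terms = len(list_sizes) - 1
    return {
        # Read every list in full and intersect them.
        "merge_all_lists": float(sum(list_sizes)),
        # Read only the shortest list, then verify the other terms
        # record by record.
        "scan_smallest_then_verify":
            smallest + smallest * probe_cost * other_terms,
    }

def choose_plan(list_sizes, probe_cost=2.0):
    """Pick the plan with the lowest estimated cost."""
    plans = estimate_plans(list_sizes, probe_cost)
    return min(plans, key=plans.get)
```

Under these assumptions, a query pairing a selective keyword (10 records) with a broad prefix (100,000 records) favors `scan_smallest_then_verify`, while two terms of similar size (1,000 and 1,200 records) favor `merge_all_lists`.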

Access Control

Enterprises need to make sure that the right information is accessed only by the right people, and thus require data access control (ACL). SRCH2 supports both record-based ACL and attribute-based ACL. With record-based ACL, we can specify which users can access which records. With attribute-based ACL, we can specify which users can access which attributes. Together, these features keep data access secure.
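
The two kinds of ACL can be pictured as two filters applied to search results. The sketch below is a hypothetical model, not SRCH2's configuration format or API: the dictionaries, the `apply_acl` function, and the deny-by-default rule for missing entries are assumptions.

```python
def apply_acl(user, records, record_acl, attribute_acl):
    """Return the records, and the attributes of each, that user may see.

    record_acl:    record id -> set of users allowed to see the record.
    attribute_acl: attribute name -> set of users allowed to see it.
    Anything without an ACL entry is hidden (an assumed default).
    """
    visible = []
    for rid, rec in records.items():
        # Record-based ACL: skip records the user may not access.
        if user not in record_acl.get(rid, set()):
            continue
        # Attribute-based ACL: keep only the attributes the user may access.
        allowed = {attr: value for attr, value in rec.items()
                   if user in attribute_acl.get(attr, set())}
        visible.append({"id": rid, **allowed})
    return visible

# Hypothetical data for demonstration.
RECORDS = {
    1: {"name": "Alice", "salary": 100},
    2: {"name": "Bob", "salary": 90},
}
RECORD_ACL = {1: {"hr", "alice"}, 2: {"hr"}}
ATTRIBUTE_ACL = {"name": {"hr", "alice"}, "salary": {"hr"}}
```

Here `apply_acl("alice", RECORDS, RECORD_ACL, ATTRIBUTE_ACL)` returns only record 1 with its `name` attribute, while the `hr` user sees both records with all attributes.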

Smart Ranking Learning

Ranking is a key component of any search solution, but it is also hard. SRCH2 uses novel techniques that let the ranking function tune itself automatically based on user query feedback. In particular, if a user submits a query, clicks a result, and that information is passed to the engine, the engine can use this feedback to improve the ranking function. The next time the same query is submitted, possibly by a different user, the engine can boost the relevance of the result clicked by the first user. In this way, the engine gradually gets “smarter” at finding relevant answers for users.
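
The feedback loop described above can be sketched in a few lines. The source does not say how SRCH2 adjusts its ranking function, so the additive per-click boost used here, along with the class name, method names, and the `boost` value, is purely an assumption chosen to keep the example simple.

```python
from collections import defaultdict

class FeedbackRanker:
    """Toy click-feedback ranker; not SRCH2's learning method."""

    def __init__(self, boost=0.5):
        self.boost = boost               # assumed score gain per click
        self.clicks = defaultdict(int)   # (query, record id) -> click count

    def record_click(self, query, record_id):
        """Store feedback: a user clicked record_id after issuing query."""
        self.clicks[(query.strip().lower(), record_id)] += 1

    def rank(self, query, scored_results):
        """Order (record id, base score) pairs, best first, using clicks."""
        key = query.strip().lower()

        def final_score(item):
            record_id, base_score = item
            return base_score + self.boost * self.clicks.get((key, record_id), 0)

        ordered = sorted(scored_results, key=final_score, reverse=True)
        return [record_id for record_id, _ in ordered]
```

Suppose the base scores for the query `pizza` are 1.0, 0.8, and 0.6 for records 1, 2, and 3. After one call to `record_click("pizza", 2)`, the next `rank("pizza", ...)` places record 2 first, for any user, while other queries are unaffected.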
