I had to build a system that needed full-text search. Two choices immediately came to mind: Postgres and Elasticsearch.
Postgres
I usually work with Postgres on a day-to-day basis but have not explored full-text search.
To do the full-text search in Postgres, you need to do the following:
- create a column with the tsvector type
- populate that column from the search target column
- create a GIN index on that column
The code looks something like this:

ALTER TABLE articles ADD COLUMN document_tokens tsvector;

UPDATE articles a
SET document_tokens = to_tsvector('english', a.title);

CREATE INDEX clever_stuff ON articles USING gin(document_tokens);
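One caveat with the snippet above: the UPDATE is a one-off backfill, so document_tokens will go stale as rows change. On Postgres 12 and later you can let the database maintain the column itself with a generated column instead (a sketch, assuming the search target is the title column as above):

```sql
-- Postgres 12+: document_tokens is recomputed automatically on INSERT/UPDATE,
-- so no backfill or trigger is needed to keep it in sync with title
ALTER TABLE articles
  ADD COLUMN document_tokens tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title)) STORED;

CREATE INDEX clever_stuff ON articles USING gin(document_tokens);
```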
The search query looks something like this:

SELECT id,
       ts_rank(document_tokens, to_tsquery('english', 'synergize & integrated')) AS rank
FROM articles
WHERE document_tokens @@ to_tsquery('english', 'synergize & integrated')
ORDER BY rank DESC
LIMIT 10;
The results were pretty satisfactory: on a database of 350k records, a search takes around 30 ms.
Advantage:
It returns relevant results even for past and future tenses of a word, since the tsvector stores each word's root form.
Disadvantage:
It doesn't understand the context of the search: if the words are jumbled, or only a similar form of the sentence is present, it fails to match.
Elasticsearch
To index documents in Elasticsearch, well, you need to create an index.
While indexing, Elasticsearch analyzes the text through the following stages:
- character filters
- tokenizers
- token filters
There's an in-depth article about these concepts here
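To see those stages in action, Elasticsearch exposes an _analyze API. For example, you can run the built-in english analyzer over a sample sentence (the sample text here is just an illustration):

```json
POST /_analyze
{
  "analyzer": "english",
  "text": "Synergized integrations"
}
```

The response lists each token the analyzer would index after filtering and stemming, which makes it easy to debug why a document does or doesn't match.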
For my use case of indexing documents containing special characters, the ability to customize all of the above made Elasticsearch the winner.
With character filters, I was able to replace the special characters with custom words; otherwise, these special characters would never make it into the index.
With token filters, you can specify whether synonyms should be added to individual terms at index time, whether words should be stored in lowercase, and so on.
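Putting those pieces together, the index settings looked roughly like this. The specific character mappings and synonym list below are illustrative, not my actual configuration:

```json
PUT /articles
{
  "settings": {
    "analysis": {
      "char_filter": {
        "special_chars": {
          "type": "mapping",
          "mappings": ["& => and", "% => percent"]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["integrated, unified"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["special_chars"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "my_analyzer" }
    }
  }
}
```

The custom analyzer chains the three stages in order: character filters run first, then the tokenizer, then the token filters.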
The ability to customize helped me produce a better system with much better relevance.
The results are also snappier than Postgres, though that's a micro-optimization: all searches complete in under 10 ms.
But the real reason we are going ahead with Elasticsearch is that it understands context better and returns higher-quality results.
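For comparison with the Postgres query earlier, the equivalent Elasticsearch search is a simple match query; the index and field names follow the examples above:

```json
GET /articles/_search
{
  "size": 10,
  "query": {
    "match": { "title": "synergize integrated" }
  }
}
```

A match query runs the query text through the same analyzer as the field, so lowercasing, stemming, and synonyms are applied to the search terms too, and results come back ranked by relevance score.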