Search Implementation¶
Tachi's search implementation uses MongoDB's $text index. This breaks a query into words and compares each of them to the provided text fields.
__textScore¶
For our code, we mutate the documents we want to return with a special field: __textScore
.
This field declares how 'close' the provided query was to the $text fields in this document.
This is sometimes exposed in the API for sorting reasons.
Bug
MongoDB's $text matching algorithm isn't great for fuzzy matches - It doesn't like song titles like 'A', as it thinks 'a' is an article, and doesn't match it properly as a word.
Info
Why not regex for fuzzy matches?
Regex has performance issues on larger datasets and we want to avoid it. Most regexes cannot use indexes, and therefore invoke a COLLSCAN, which we want to avoid.
User Searching¶
Searching users, on the other hand, has to use regex-based searching.
The $text
method attempts to break things up based on their
words, but that doesn't help with usernames, as they are all
too frequently XxX_One_Long_Str1ng_xXx
.
Instead, we use a case insensitive regex - similar to SQL's
LIKE
.
This means we do not have a __textScore
property for this
search to sort on. Instead, we just constrict returns to
around 15, and have the user whittle their search down
better.