Exact vs. approximate search
We want to give you guidelines on when to use exact search and when to switch to approximate search.
What is exact search?
Flow uses trained AI models to analyze an image, extract its main content, and represent that content as a compact list of numbers (a vector or descriptor). These descriptors are then indexed. During a visual similarity search, the query descriptor is compared against every candidate descriptor to calculate a similarity score for each. Finally, the top-n results with the highest scores are returned.
This means that search time grows linearly with the number of documents in the collection: searching 10 million images takes about ten times as long as searching 1 million.
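To illustrate exact (brute-force) search, here is a minimal Python sketch. It assumes cosine similarity as the score and an in-memory dict of descriptors; Flow's actual metric and index layout are not described on this page, and `exact_search` and `cosine_similarity` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity metric; the documentation does not name one.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    # Score every indexed descriptor: one comparison per document, so O(N).
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items())
    # Keep only the top_n highest scores.
    best = heapq.nlargest(top_n, scored)
    return [(doc_id, score) for score, doc_id in best]


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(exact_search([1.0, 0.1], index, top_n=2))  # 'a' ranks first, then 'c'
```

Doubling the number of entries in `index` doubles the number of `cosine_similarity` calls, which is the linear cost described above.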
What is approximate search?
To reduce search times in large collections, we use approximate search. This search mode combines several techniques to search faster than exact search.
Approximate search works in three steps:
- Use the Smartfilter to reduce the set of images that have to be scored
- Rank the remaining images using fast approximate scoring
- Re-rank the top-50 images using exact scoring
Because fewer descriptors are compared and most of them are scored with a faster but less accurate technique, approximate search is much faster than exact search. On a collection of 10 million images, it is up to 10x faster. This speed comes at the expense of search accuracy.
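The three steps can be sketched as follows. Flow's Smartfilter and its fast scoring technique are not documented on this page, so this sketch swaps in two common stand-ins: cluster-based prefiltering (keep only documents in the `n_probe` clusters nearest the query, IVF-style) and scalar quantization of descriptors for the approximate score. Only the top-50 exact re-ranking mirrors the text; `ApproximateIndex`, `n_probe` and `rerank_k` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Exact score (cosine is an assumption; the docs don't name the metric).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def quantize(v: Sequence[float], scale: int = 127) -> List[int]:
    # Lossy scalar quantization: normalize, then round each component to an int in [-scale, scale].
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [round(scale * x / norm) for x in v]


class ApproximateIndex:
    """Hypothetical three-stage index: prefilter, approximate scoring, exact re-ranking."""

    def __init__(self, descriptors: Dict[str, Sequence[float]], centroids: List[Sequence[float]]):
        self.descriptors = descriptors
        self.centroids = centroids
        # Index time: assign each document to its most similar centroid...
        self.cluster = {
            doc: max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
            for doc, vec in descriptors.items()
        }
        # ...and store a quantized copy of every descriptor for cheap scoring.
        self.quantized = {doc: quantize(vec) for doc, vec in descriptors.items()}

    def search(self, query: Sequence[float], n_probe: int = 1, rerank_k: int = 50, top_n: int = 10) -> List[str]:
        # Step 1 (stand-in for the Smartfilter): keep only documents in the n_probe
        # clusters closest to the query. A smaller n_probe filters more aggressively.
        probe = set(heapq.nlargest(n_probe, range(len(self.centroids)),
                                   key=lambda c: cosine(query, self.centroids[c])))
        candidates = [doc for doc, c in self.cluster.items() if c in probe]
        # Step 2: rank the survivors by an approximate score on quantized descriptors.
        q = quantize(query)
        shortlist = heapq.nlargest(rerank_k, candidates,
                                   key=lambda doc: sum(x * y for x, y in zip(q, self.quantized[doc])))
        # Step 3: re-rank the shortlist (top-50 by default) with the exact score.
        return heapq.nlargest(top_n, shortlist, key=lambda doc: cosine(query, self.descriptors[doc]))


idx = ApproximateIndex(
    {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]},
    centroids=[[1.0, 0.0], [0.0, 1.0]],
)
print(idx.search([1.0, 0.05], n_probe=1, top_n=4))  # ['a', 'b']: c and d were filtered out
```

Lowering `n_probe` discards more candidates before any scoring happens, which is the speed-versus-accuracy trade-off in miniature: with `n_probe=1`, documents `c` and `d` are never scored at all.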
When to use approximate search
Depending on the Smartfilter level, approximate search has higher query preparation costs (qpc) than exact search. Approximate search therefore only pays off above a certain number of images.
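A simple linear cost model shows why such a break-even point exists. It assumes that exact search costs a fixed time per document, while approximate search adds a fixed query preparation cost (qpc) but costs less per document; approximate search then wins once N > qpc / (exact cost per doc - approximate cost per doc). The function names are hypothetical, and the numbers in the example are made up for illustration, not measured Flow figures.

```python
def exact_time_ns(num_docs: int, exact_ns_per_doc: float) -> float:
    # Exact search scores every descriptor, so time grows linearly with num_docs.
    return num_docs * exact_ns_per_doc


def approx_time_ns(num_docs: int, qpc_ns: float, approx_ns_per_doc: float) -> float:
    # Approximate search pays a fixed query preparation cost, then a lower per-document cost.
    return qpc_ns + num_docs * approx_ns_per_doc


def break_even_docs(qpc_ns: float, exact_ns_per_doc: float, approx_ns_per_doc: float) -> float:
    # Solve exact_time == approx_time for num_docs: N* = qpc / (exact cost - approx cost).
    if approx_ns_per_doc >= exact_ns_per_doc:
        raise ValueError("approximate search must be cheaper per document to ever break even")
    return qpc_ns / (exact_ns_per_doc - approx_ns_per_doc)


# Made-up illustrative costs: 3 ms qpc, 30 ns/doc exact, 10 ns/doc approximate.
print(break_even_docs(3_000_000, 30, 10))  # 150000.0
```

Because both the qpc and the per-document cost depend on the Smartfilter level, each level has its own break-even point in this model.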
The diagram below compares search times by number of images and search mode.
You might be tempted to always use Smartfilter extreme for the best search performance. But the more aggressive the filtering, the lower the result accuracy.
It is a trade-off between speed and accuracy, and depending on your use case, different options may be appropriate.
Our search recommendation
- Use exact search for up to 200,000 docs (100% accuracy)
- Use approximate search with Smartfilter high for more than 200,000 docs (93% accuracy)
The table below shows the break-even point for each Smartfilter level, i.e. the collection size above which approximate search is faster than exact search. Search accuracy is the probability that the top result of an exact search is also the top result of the approximate search.
| Approximate search with Smartfilter | Faster than exact search | Search accuracy¹ |
|---|---|---|
| LOW | >500,000 docs | 98% |
| MEDIUM | >300,000 docs | 97% |
| HIGH | >200,000 docs | 93% |
| ULTRA | >100,000 docs | 86% |
| EXTREME | >50,000 docs | 72% |
¹ The accuracy was measured with our internal test dataset. Actual accuracy may vary with different image sets.
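The recommendation and table can be turned into a small selection helper, together with a way to compute the accuracy metric on your own data, since the published figures come from an internal dataset. This is a sketch: `choose_search_mode` and `top1_agreement` are hypothetical helpers, not part of Flow's API, and the selection logic assumes that a more aggressive Smartfilter level is faster, as described above.

```python
from typing import List, Sequence, Tuple

# (Smartfilter level, approximate search is faster above this many docs, top-1 accuracy),
# copied from the table above; accuracy was measured on an internal dataset.
SMARTFILTER_LEVELS: List[Tuple[str, int, float]] = [
    ("LOW", 500_000, 0.98),
    ("MEDIUM", 300_000, 0.97),
    ("HIGH", 200_000, 0.93),
    ("ULTRA", 100_000, 0.86),
    ("EXTREME", 50_000, 0.72),
]


def choose_search_mode(num_docs: int, min_accuracy: float = 0.93) -> str:
    """Return "exact" or the most aggressive level that beats exact search and meets min_accuracy."""
    eligible = [
        level
        for level, faster_above, accuracy in SMARTFILTER_LEVELS
        if num_docs > faster_above and accuracy >= min_accuracy
    ]
    # The list runs from least to most aggressive, so the last eligible entry filters hardest.
    return eligible[-1] if eligible else "exact"


def top1_agreement(exact_results: Sequence[Sequence[str]], approx_results: Sequence[Sequence[str]]) -> float:
    """Share of queries whose first exact result is also the first approximate result."""
    if not exact_results or len(exact_results) != len(approx_results):
        raise ValueError("need one approximate result list per exact result list")
    hits = sum(
        1
        for exact, approx in zip(exact_results, approx_results)
        if exact and approx and exact[0] == approx[0]
    )
    return hits / len(exact_results)


print(choose_search_mode(150_000))    # exact
print(choose_search_mode(1_000_000))  # HIGH
print(top1_agreement([["a", "b"], ["c"]], [["a"], ["x"]]))  # 0.5
```

With the default `min_accuracy=0.93`, the helper reproduces the recommendation: exact search up to 200,000 docs, Smartfilter HIGH above. Running a sample of your own queries through both search modes and passing the result lists to `top1_agreement` gives the accuracy for your image set.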