Embedding text into vector spaces
Word embedding is an NLP technique that maps words or text to vectors
This allows vector space operations, such as summing vectors or computing the distances between them
Once each word has its own vector, combine the word vectors into a text vector (aka document or sentence vector)
- The simple approach is to sum or average the word vectors
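
A minimal sketch of the averaging approach, using only the standard library. The toy embedding table, its values, and the helper names `average_vectors` and `embed_text` are made up for illustration; real embeddings would come from a trained model and have hundreds of dimensions.

```python
def average_vectors(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one word vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all word vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_text(text, embeddings):
    """Average the vectors of the known words in `text` (unknown words are skipped)."""
    words = [w for w in text.lower().split() if w in embeddings]
    return average_vectors([embeddings[w] for w in words])


# Hypothetical 3-dimensional embeddings, not taken from any real model.
toy_embeddings = {
    "fast": [1.0, 0.0, 2.0],
    "search": [3.0, 2.0, 0.0],
}

text_vector = embed_text("fast search", toy_embeddings)
print(text_vector)
```

Summing instead of averaging only changes the length of the result, not its direction, so both give the same ranking under an angle-based distance.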

The similarity of two snippets of text is found by mapping both into the vector space and measuring the distance between their vectors
- Typically uses the angular distance
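
One common definition of angular distance is the angle between two vectors divided by pi, which gives a value between 0 (same direction) and 1 (opposite direction). The sketch below assumes that definition; the notes do not say which exact variant is used, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angular distance is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def angular_distance(a, b):
    """Angle between a and b, normalized to the range [0, 1]."""
    # Clamp to guard against rounding that pushes the cosine just outside [-1, 1].
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos) / math.pi


same_direction = angular_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = angular_distance([1.0, 0.0], [0.0, 1.0])
opposite = angular_distance([1.0, 0.0], [-1.0, 0.0])
```

Because only the angle matters, the length of a text vector (for example, from summing many word vectors) does not affect the distance.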
Nearest neighbor search can be used to find the closest vectors
- Exact nearest neighbor methods typically break down on high-dimensional vectors such as word embeddings
- Approximate nearest neighbor (ANN) search must be used instead
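
For reference, exact nearest neighbor search can be written as an exhaustive scan. This is a baseline sketch, not an ANN method: it compares the query against every stored vector, so its cost grows linearly with the size of the collection, which is what ANN indexes try to avoid at the price of occasionally missing the true nearest neighbor. Function names and data are illustrative.

```python
import math


def angular_distance(a, b):
    """Angle between two non-zero vectors, normalized to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("angular distance is undefined for zero vectors")
    return math.acos(max(-1.0, min(1.0, dot / norms))) / math.pi


def nearest_neighbor(query, candidates):
    """Return (index, distance) of the closest candidate by scanning all of them."""
    if not candidates:
        raise ValueError("need at least one candidate vector")
    best_index, best_distance = -1, float("inf")
    for i, candidate in enumerate(candidates):
        distance = angular_distance(query, candidate)
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance


# Toy 2-dimensional vectors, made up for illustration.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index, distance = nearest_neighbor([0.9, 0.1], stored)
```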
Assuming 4 billion 200-dimensional query vectors
- Storing each dimension as a 4-byte float takes 4 billion × 200 × 4 bytes
- ~3 TB
- Quantization/discretization was tried
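
The storage estimate above is plain arithmetic: vectors × dimensions × bytes per value. A small sketch, with `index_size_bytes` as an illustrative helper name:

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value):
    """Raw storage for the vectors alone, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_value


# 4 billion vectors, 200 dimensions, 4-byte floats (figures from the notes).
float32_bytes = index_size_bytes(4_000_000_000, 200, 4)
float32_terabytes = float32_bytes / 10**12  # decimal terabytes

print(f"{float32_bytes:,} bytes = {float32_terabytes} TB")
```

The `bytes_per_value` parameter shows why quantization is attractive: the total scales linearly with the number of bytes spent per dimension.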
Cliqz uses graph-based approximate nearest neighbor search (granne)