Embedding text into vector spaces

Word embedding is an NLP technique that maps words or text to vectors

This allows vector space operations, such as summing vectors or computing distances between them

Once individual word vectors are generated, they can be combined into text vectors (aka document or sentence vectors)
 A simple approach sums or averages the word vectors
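A minimal sketch of the averaging approach, using NumPy and hypothetical toy word vectors (the words and 3-dimensional values are made up for illustration):

```python
import numpy as np

# Hypothetical toy word vectors (3-dimensional for illustration;
# real embeddings are typically 100-300 dimensional).
word_vectors = {
    "good": np.array([0.8, 0.1, 0.3]),
    "movie": np.array([0.2, 0.9, 0.4]),
}

def text_vector(words, vectors):
    """Combine word vectors into one text vector by averaging."""
    return np.mean([vectors[w] for w in words], axis=0)

doc = text_vector(["good", "movie"], word_vectors)
# doc is the element-wise mean: [0.5, 0.5, 0.35]
```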

The similarity of two snippets of text is found by mapping both into the vector space and computing the distance between the resulting vectors
 Typically the angular (cosine) distance is used
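A sketch of angular distance between two vectors, derived from cosine similarity (the clipping guards against floating-point values just outside [-1, 1]):

```python
import numpy as np

def angular_distance(a, b):
    """Angle between two vectors in radians; smaller means more similar."""
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against floating-point drift outside [-1, 1].
    return np.arccos(np.clip(cos_sim, -1.0, 1.0))

d_same = angular_distance(np.array([1.0, 2.0]), np.array([2.0, 4.0]))  # parallel -> 0
d_ortho = angular_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # orthogonal -> pi/2
```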

Exact nearest neighbor search can be used to find the most similar text
 but exact methods typically break down on the high-dimensional vectors of word embeddings
 so approximate nearest neighbor (ANN) search must be used
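For contrast, here is the exact brute-force search that ANN methods avoid: it scans every vector, which is O(n·d) per query and infeasible at billions of vectors. This is a generic sketch, not the method the notes describe later:

```python
import numpy as np

def nearest_neighbor(query, corpus):
    """Exact (brute-force) nearest neighbor by cosine similarity.

    Scans every row of `corpus` -- O(n * d) per query, which is why
    approximate methods are needed at billion-vector scale.
    """
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = corpus_norm @ q          # cosine similarity to every vector
    return int(np.argmax(sims))     # index of the most similar vector

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx = nearest_neighbor(np.array([0.9, 0.1]), corpus)  # closest is row 0
```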

Assuming 4 billion 200-dimensional query vectors
 storing each dimension as a 4-byte float
 gives ~3.2 TB
 quantization/discretization was tried to reduce this
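The storage arithmetic above as a back-of-the-envelope calculation (the 1-byte quantization figure is an illustrative assumption, not a number from the notes):

```python
# Back-of-the-envelope storage for 4 billion 200-dimensional vectors.
n_vectors = 4_000_000_000
dims = 200
bytes_per_float = 4  # 32-bit float per dimension

total_bytes = n_vectors * dims * bytes_per_float
total_tb = total_bytes / 1e12  # ~3.2 TB

# Assumed example: quantizing each dimension to 1 byte cuts storage 4x.
quantized_tb = n_vectors * dims * 1 / 1e12  # ~0.8 TB
```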

Cliqz uses graph-based approximate nearest neighbor search (granne)