Quantization methods
There are several quantization methods that can be employed to reduce the memory footprint of vectors:

Binary quantization
Binary quantization is the simplest form of quantization, where each component in a vector is converted from a float32 to a single bit. This is done by rounding each component in a given vector to 1 if it is greater than 0, or to 0 if it is less than or equal to 0. This converts a float32 component to a single bit (0 or 1), which greatly reduces the memory consumption of the vector. Binary quantization can be defined as:

$$
f(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x \leq 0
\end{cases}
$$

Example
Here we will use binary quantization on a 5-dimensional vector, as shown in the sketch below.
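A minimal sketch in Python with NumPy; the 5-dimensional vector values are illustrative assumptions, since any float32 vector works the same way:

```python
import numpy as np

def binary_quantize(vector: np.ndarray) -> np.ndarray:
    """Map each component to 1 if it is greater than 0, else 0."""
    return (vector > 0).astype(np.uint8)

# Illustrative 5-dimensional float32 vector (values are assumptions).
v = np.array([0.34, -0.12, 0.87, -0.45, 0.23], dtype=np.float32)
print(binary_quantize(v))  # [1 0 1 0 1]
```

Conceptually each component now needs 1 bit instead of 32; note that `np.uint8` still spends a full byte per component, so a real implementation would pack 8 components into each byte.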
Scalar quantization

Scalar quantization is a method that reduces the precision of each component in a vector by mapping it to a smaller set of values. This is done by dividing the range of values into intervals and assigning each interval a representative value. The number of bits allocated to each component determines the number of intervals.

Example
Let’s say we have a vector where each component is a continuous float32 value within a known range. We can apply scalar quantization by mapping each component to a smaller set of values, such as integers or fixed-point numbers. For this example, we will map each float32 component to an 8-bit integer.

- We first define the number of bits to allocate to each component, which in this case is 8.
- We then calculate the number of intervals as $2^8 = 256$.
- Next, we map each component of our original float32 vector to the nearest interval in the discrete range $[0, 255]$. To achieve this, we can use min-max normalization to scale our values linearly, mapping each component $x$ to $\frac{x - x_{\min}}{x_{\max} - x_{\min}} \times 255$.
- After passing each component through the min-max normalization, we can round each value to the nearest integer in the range $[0, 255]$.
- Finally, we can represent each component as an 8-bit integer, which reduces the memory consumption of the vector. Each component of our initial vector consumed 32 bits, but now consumes only 8 bits, which is a 4X reduction in memory consumption! The sketch below walks through these steps in code.
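Here is a minimal sketch of these steps in Python with NumPy; the vector values and its length are illustrative assumptions:

```python
import numpy as np

def scalar_quantize(vector: np.ndarray, bits: int = 8) -> np.ndarray:
    """Min-max normalize each component, then round it to the
    nearest integer in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1                        # 255 for 8 bits
    v_min, v_max = vector.min(), vector.max()
    normalized = (vector - v_min) / (v_max - v_min)
    return np.round(normalized * levels).astype(np.uint8)

# Illustrative float32 vector (values are assumptions).
v = np.array([-0.8, -0.1, 0.3, 0.9], dtype=np.float32)
q = scalar_quantize(v)
print(q)                         # [  0 105 165 255]
print(v.nbytes, "->", q.nbytes)  # 16 bytes -> 4 bytes: the 4X reduction
```

One design note: this sketch derives the min and max from the vector itself; a real system would typically fix the range from the whole dataset so that every vector shares the same quantization grid.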
Product quantization
Product quantization is a more advanced quantization method which uses a technique called sub-vector quantization to compress high-dimensional vectors using a set of centroids. This method is particularly useful for large datasets and can achieve significant memory savings while maintaining a reasonable level of accuracy.

Example
- We have a 6-dimensional index (d = 6) where we would like to apply product quantization to a given vector.
- We then choose to split the vector into 3 sub-vectors, each with 2 dimensions (6 / 3 = 2).
- We then generate a codebook for each sub-vector. A codebook is a set of centroids in a 2-dimensional space, which are used to approximate each sub-vector. For this example, let’s assume we have 3 such codebooks, one per sub-vector.
- Given our sub-vectors above, we can then approximate each sub-vector by finding the closest centroid in its codebook using simple Euclidean distance. After computing the nearest centroid for each sub-vector, we store only the centroid indices rather than the original floats, as shown in the sketch after this list.
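Below is a minimal end-to-end sketch of this example in Python. The vector values, the codebook contents, and the choice of 4 centroids per codebook are all illustrative assumptions:

```python
import numpy as np

# Illustrative 6-dimensional vector, split into 3 sub-vectors of 2 dims each.
v = np.array([1.2, 0.8, -0.3, 0.5, 0.9, -1.1], dtype=np.float32)
sub_vectors = v.reshape(3, 2)

# One codebook per sub-vector: here, 4 random centroids in 2-dimensional
# space (a real codebook would be learned from data, as described below).
rng = np.random.default_rng(42)
codebooks = rng.normal(size=(3, 4, 2)).astype(np.float32)

# Encode: for each sub-vector, store only the index of its nearest
# centroid (by Euclidean distance) instead of the sub-vector itself.
codes = np.array([
    np.argmin(np.linalg.norm(codebooks[i] - sub_vectors[i], axis=1))
    for i in range(3)
])
print(codes)   # three small integers now stand in for six float32 values

# Decode: look up each centroid by its code to get a lossy approximation.
approx = np.concatenate([codebooks[i][codes[i]] for i in range(3)])
print(approx)  # close to v only insofar as the centroids fit the data
```

With 4 centroids per codebook, each code fits in 2 bits, so the six 32-bit floats (192 bits) compress to just 6 bits of codes plus the shared codebooks.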
Generating the codebooks
One of the challenges with product quantization is generating the codebooks. Approximating a sub-vector with a centroid from a codebook introduces loss, so the centroids we decide to use directly affect how lossy the quantization is, and therefore the overall accuracy of results in the vector database. A common approach for generating codebooks is to use a clustering algorithm, like k-means clustering. K-means is a machine-learning algorithm that partitions a set of vectors into k clusters, where each cluster is represented by its centroid. As with most machine-learning algorithms, k-means needs to learn the shape of the data it’s given to partition it accurately.
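As a sketch of this process, the snippet below uses scikit-learn's KMeans to learn one codebook per sub-vector slice. The training set is synthetic, and the slice layout (3 sub-vectors of 2 dimensions) and centroid count (256, so each code fits in one byte) are assumptions carried over from the example above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training set: 10,000 six-dimensional vectors.
rng = np.random.default_rng(0)
training_vectors = rng.normal(size=(10_000, 6)).astype(np.float32)

codebooks = []
for i in range(3):
    # Slice the i-th 2-dimensional sub-vector out of every training vector.
    sub = training_vectors[:, i * 2:(i + 1) * 2]
    # k-means partitions the sub-vectors into 256 clusters; the cluster
    # centers become the codebook's centroids.
    kmeans = KMeans(n_clusters=256, n_init=10, random_state=0).fit(sub)
    codebooks.append(kmeans.cluster_centers_)  # shape (256, 2) each
```

The quality of these centroids depends on how representative the training vectors are of the data the index will actually hold.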
Comparison of quantization methods
| Method | Memory Reduction | Lossiness | Complexity |
| --- | --- | --- | --- |
| Binary Quantization | High | High | Low |
| Scalar Quantization | Moderate | Moderate | Moderate |
| Product Quantization | Very High | Low | High |