Commit 2e5ac8a

feat: add explaination
1 parent b0241ce commit 2e5ac8a

1 file changed

Lines changed: 10 additions & 7 deletions

README.md

@@ -11,14 +11,13 @@ pip install bitcodes-pytorch
 
 ## Usage
 
-### Quantize
 ```python
 from bitcodes_pytorch import Bitcodes
 
 bitcodes = Bitcodes(
-    features=8,
-    num_bits=4,
-    temperature=10,
+    features=8, # Number of features per vector
+    num_bits=4, # Number of bits per vector
+    temperature=10, # Gumbel softmax training temperature
 )
 
 # Set to eval during inference to make deterministic
@@ -42,14 +41,14 @@ bits = tensor([[
 """
 ```
 
-### Recover Output from Bits
+### Dequantize
 ```python
 y_decoded = bitcodes.from_bits(bits)
 
 assert torch.allclose(y, y_decoded) # Assert passes in eval mode!
 ```
 
-### Decimal-Binary Conversion
+### Utils: Decimal-Binary Conversion
 ```python
 from bitcodes_pytorch import to_decimal, to_binary
 
@@ -72,4 +71,8 @@ bits = tensor([[
 
 ## Explanation
 
-TODO
+Current vector quantization methods (e.g. [VQ-VAE](https://arxiv.org/abs/1711.00937#), [RQ-VAE](https://arxiv.org/abs/2203.01941)) use either a single large codebook or several smaller codebooks applied as residuals. Residuals give an exponential increase in the number of possible combinations while keeping the total number of codebook items reasonably small, because many codebook elements are overlapped. If $C$ is the codebook size and $R$ the number of residuals, the theoretical maximum is $C^R$ combinations, assuming all residuals share the same codebook size. The total number of codebook elements, which is proportional to the parameter count, is only $C\cdot R$. It therefore makes sense to keep $C$ as small as possible, which keeps the parameter count low, and to increase $R$ to exploit the exponential growth in combinations.
+
+Here we use $C=2$, making the code binary, while $R=$`num_bits` can be chosen freely. The residuals are overlapped to produce the output instead of each one quantizing the difference left by the previous one. This removes the sequential residual loop and allows quantization in parallel, even with large $R$.
+
+Another nice property of bitcodes is that, after training, the bit matrix can be converted to integers in different ways (e.g. to decimal one or two rows at a time).
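
As a rough illustration of the counting argument and of the row-wise integer conversion described in the added explanation, here is a minimal plain-Python sketch. It does not use the `bitcodes_pytorch` API: the helpers `combinations`, `codebook_elements`, and `rows_to_decimal` are hypothetical names, and the most-significant-bit-first ordering is an assumption that may differ from the library's own `to_decimal`.

```python
# Illustrative only: hypothetical helpers, not part of bitcodes_pytorch.

def combinations(codebook_size: int, num_residuals: int) -> int:
    """Theoretical maximum number of distinct codes, C ** R."""
    return codebook_size ** num_residuals


def codebook_elements(codebook_size: int, num_residuals: int) -> int:
    """Total number of stored codebook elements, C * R."""
    return codebook_size * num_residuals


def rows_to_decimal(bit_matrix, rows_per_group=1):
    """Pack a bit matrix into integers, reading `rows_per_group` rows at a time.

    Bits are read most-significant first (an assumption made for this sketch).
    """
    if len(bit_matrix) % rows_per_group != 0:
        raise ValueError("number of rows must be divisible by rows_per_group")
    values = []
    for start in range(0, len(bit_matrix), rows_per_group):
        # Flatten the selected rows into one bit string.
        group = bit_matrix[start:start + rows_per_group]
        bits = [bit for row in group for bit in row]
        value = 0
        for bit in bits:
            value = (value << 1) | bit
        values.append(value)
    return values


# Same number of combinations, fewer stored elements with a binary code.
print(combinations(2, 4), codebook_elements(2, 4))    # 16 8
print(combinations(16, 1), codebook_elements(16, 1))  # 16 16

bit_matrix = [[1, 0, 1, 1], [0, 1, 0, 0]]
print(rows_to_decimal(bit_matrix, rows_per_group=1))  # [11, 4]
print(rows_to_decimal(bit_matrix, rows_per_group=2))  # [180]
```

With $C=2$ and $R=4$, the sketch reaches the same 16 combinations as a single codebook of 16 entries while storing only 8 elements. Grouping one or two rows at a time trades more, smaller integers for fewer, larger ones.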
