Models
RVQ autoencoder model architecture using heliaEDGE components.
compressionkit.models.rvq_autoencoder.build_rvq_autoencoder(frame_size, *, embedding_dim=16, latent_width=256, in_ch=1, out_ch=1, base_filters=32, multiplier=1.25, num_levels=2, beta=0.25, num_stages=4, encoder_block_norm='batch', encoder_head_norm='none', decoder_block_norm='none', decoder_head_norm='layer')
Build encoder, RVQ bottleneck, decoder, and composite VQAutoencoder.
Returns:
| Type | Description |
|---|---|
| `Model` | |
| `ResidualVectorQuantizer` | |
Source code in compressionkit/models/rvq_autoencoder.py
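For readers unfamiliar with the bottleneck, the following is a minimal pure-Python sketch of residual vector quantization in general, not the `ResidualVectorQuantizer` implementation itself. The helper names (`nearest_index`, `rvq_encode`, `rvq_decode`) and the toy codebooks are hypothetical; the number of codebooks plays the role of `num_levels`, and each codebook's length plays the role of `latent_width`.

```python
def nearest_index(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(vector, codebooks):
    """Quantize `vector` level by level; each level encodes the previous residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        indices.append(idx)
        # Subtract the chosen code so the next level only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct a vector by summing the selected code from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy example: two levels (coarse, then fine), three 2-D codes per level.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
indices = rvq_encode([1.2, 1.0], codebooks)
print(indices, rvq_decode(indices, codebooks))  # [1, 1] [1.25, 1.0]
```

Only the per-level indices need to be stored or transmitted, which is why the number of levels and the codebook size determine the bitrate.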
compressionkit.models.rvq_autoencoder.build_encoder_2d(input_len=2048, in_ch=1, base=32, embedding_dim=16, multiplier=1.25, num_stages=4, block_norm='batch', head_norm='none')
Build a configurable encoder with 2**num_stages downsampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_len` | `int` | Number of input time samples. | `2048` |
| `in_ch` | `int` | Number of input channels. | `1` |
| `base` | `int` | Base filter count for the first stage. | `32` |
| `embedding_dim` | `int` | Latent channel dimension after projection. | `16` |
| `multiplier` | `float` | Filter count multiplier per stage. | `1.25` |
| `num_stages` | `int` | Number of stride-2 downsampling stages. | `4` |
| `block_norm` | `str` | Normalization mode for conv blocks ( | `'batch'` |
| `head_norm` | `str` | Normalization mode for the final projection. | `'none'` |
Source code in compressionkit/models/rvq_autoencoder.py
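The `2**num_stages` downsampling can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each stride-2 stage halves the temporal length exactly (that is, "same"-style padding and an `input_len` divisible by `2**num_stages`); `encoder_stage_lengths` is a hypothetical helper, not part of `compressionkit`.

```python
def encoder_stage_lengths(input_len: int = 2048, num_stages: int = 4) -> list[int]:
    """Temporal length after each stride-2 stage (hypothetical helper)."""
    factor = 2 ** num_stages
    if input_len % factor != 0:
        raise ValueError(f"input_len={input_len} is not divisible by {factor}")
    lengths = []
    length = input_len
    for _ in range(num_stages):
        length //= 2  # assumption: every stage halves the length exactly
        lengths.append(length)
    return lengths


print(encoder_stage_lengths(2048, 4))  # [1024, 512, 256, 128]
```

With the defaults, a 2048-sample input therefore reaches the bottleneck as 128 latent time steps, which matches a total downsampling factor of 16.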
compressionkit.models.rvq_autoencoder.build_decoder_2d(output_len=2048, out_ch=1, base=32, embedding_dim=16, multiplier=1.25, num_stages=4, decoder_block_norm='none', head_norm='layer')
Build a configurable decoder that mirrors the encoder stages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_len` | `int` | Number of output time samples. | `2048` |
| `out_ch` | `int` | Number of output channels. | `1` |
| `base` | `int` | Base filter count (mirroring the encoder). | `32` |
| `embedding_dim` | `int` | Latent channel dimension. | `16` |
| `multiplier` | `float` | Filter count multiplier per stage. | `1.25` |
| `num_stages` | `int` | Number of upsample stages (must match encoder). | `4` |
| `decoder_block_norm` | `str` | Normalization mode for decoder blocks. | `'none'` |
| `head_norm` | `str` | Normalization mode for the output head. | `'layer'` |
Source code in compressionkit/models/rvq_autoencoder.py
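To illustrate what "mirrors the encoder stages" could mean in practice, here is a rough sketch of a mirrored stage plan. It rests on two assumptions that the page does not confirm: encoder stage `i` uses `int(base * multiplier**i)` filters, and each decoder stage doubles the temporal length. `decoder_plan` is a hypothetical helper, and the actual rounding rule in `compressionkit` may differ.

```python
def decoder_plan(output_len=2048, base=32, multiplier=1.25, num_stages=4):
    """Return (length, filters) per upsample stage under the stated assumptions."""
    factor = 2 ** num_stages
    if output_len % factor != 0:
        raise ValueError(f"output_len={output_len} is not divisible by {factor}")
    latent_len = output_len // factor

    # Assumption: geometric filter growth in the encoder, truncated to an integer.
    encoder_filters = [int(base * multiplier ** i) for i in range(num_stages)]

    plan = []
    length = latent_len
    # Walk the encoder widths in reverse so the decoder mirrors the encoder.
    for filters in reversed(encoder_filters):
        length *= 2  # assumption: each stage upsamples by 2
        plan.append((length, filters))
    return plan


print(decoder_plan())  # [(256, 62), (512, 50), (1024, 40), (2048, 32)]
```

The last stage lands back on `output_len`, which is only possible when the decoder uses the same `num_stages` as the encoder.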
compressionkit.models.rvq_autoencoder.compute_compression_stats(frame_size, *, bit_depth, latent_width, num_levels, downsample_factor=16)
Compute compression ratio and related statistics for an RVQ configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `frame_size` | `int` | Number of input time samples per frame. | required |
| `bit_depth` | `int` | Bits per raw input sample. | required |
| `latent_width` | `int` | Number of codebook entries (determines bits per index). | required |
| `num_levels` | `int` | Number of RVQ codebook levels. | required |
| `downsample_factor` | `int` | Total temporal downsampling factor. | `16` |
Returns:
| Type | Description |
|---|---|
| `dict[str, float]` | Dictionary with compression statistics. |
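The keys of the returned dictionary are not listed here, so the following is only a sketch of how such statistics might be derived from the documented parameters. It assumes one code index per RVQ level per latent time step and `ceil(log2(latent_width))` bits per index; the function name `estimate_compression_stats` and all dictionary keys are hypothetical.

```python
import math


def estimate_compression_stats(frame_size, *, bit_depth, latent_width,
                               num_levels, downsample_factor=16):
    """Estimate raw vs. compressed bits per frame (hypothetical helper)."""
    raw_bits = frame_size * bit_depth
    latent_steps = frame_size // downsample_factor
    # Assumption: each index is stored with the minimum whole number of bits.
    bits_per_index = math.ceil(math.log2(latent_width))
    # Assumption: one index per level at every latent time step.
    compressed_bits = latent_steps * num_levels * bits_per_index
    return {
        "raw_bits": float(raw_bits),
        "compressed_bits": float(compressed_bits),
        "bits_per_index": float(bits_per_index),
        "compression_ratio": raw_bits / compressed_bits,
    }


stats = estimate_compression_stats(
    2048, bit_depth=16, latent_width=256, num_levels=2
)
print(stats["compression_ratio"])  # 16.0
```

Under these assumptions, a 2048-sample frame of 16-bit samples (32768 bits) becomes 128 latent steps with 2 indices of 8 bits each (2048 bits).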