
rsqrt

Classes

RsqrtOperator

RsqrtOperator(op: AirOperator, model: AirModel, platform: SocPlatform, prefix: str = 'aot', attributes: dict[str, str] | None = None)

RSQRT operator with configurable INT16 LUT mode.

Attributes

lut_mode property
lut_mode: str

Return the configured RSQRT LUT mode.

Functions

resolve_rsqrt_lut_mode

resolve_rsqrt_lut_mode(attributes: dict[str, Any] | None = None) -> str

Resolve the RSQRT LUT mode from operator attributes.

warn_on_rsqrt_lut_mode

warn_on_rsqrt_lut_mode(model: AirModel, operators_config: list | None = None) -> None

Log a guidance warning when multiple INT16 RSQRT ops all use per_op mode.

Only considers INT16 RSQRT operators; INT8 RSQRT ignores lut_mode entirely.

Parameters:

  • model

    (AirModel) –

    The AIR model to inspect.

  • operators_config

    (list | None, default: None) –

    The operators attribute ruleset from ConvertArgs.

make_rsqrt_lut_s8

make_rsqrt_lut_s8(input_scale: float, input_zero_point: int, output_scale: float, output_zero_point: int) -> np.ndarray

Build a 256-entry int8 RSQRT LUT using fixed-point emulation.
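
As a rough illustration of what such a table contains, the sketch below builds a 256-entry int8 table with plain floating-point math. It is not the fixed-point emulation used by make_rsqrt_lut_s8; the function name rsqrt_lut_s8_float_reference is hypothetical, and saturating non-positive inputs to 127 is an assumption made only for this example.

```python
import numpy as np


def rsqrt_lut_s8_float_reference(input_scale, input_zero_point,
                                 output_scale, output_zero_point):
    """Float reference for an int8 RSQRT table (hypothetical helper)."""
    # One entry per possible int8 input value, in order -128..127.
    q = np.arange(-128, 128, dtype=np.int64)
    x = input_scale * (q - input_zero_point).astype(np.float64)

    # Assumption: inputs with x <= 0 saturate to the largest int8 value.
    y = np.full(q.shape, np.inf)
    positive = x > 0
    y[positive] = 1.0 / np.sqrt(x[positive])

    # Requantize into the output domain and clamp to the int8 range.
    y_q = np.round(y / output_scale) + output_zero_point
    return np.clip(y_q, -128, 127).astype(np.int8)


lut = rsqrt_lut_s8_float_reference(1.0, -128, 1.0 / 128.0, -128)
print(lut.shape, lut[1], lut[4])
```

Entry i corresponds to the quantized input i - 128, so lut[1] is the result for q = -127 (x = 1.0) and lut[4] is the result for q = -124 (x = 4.0).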

make_universal_rsqrt_lut_s16

make_universal_rsqrt_lut_s16() -> np.ndarray

Build a universal RSQRT base LUT shared by all INT16 operators.

The table stores 1 / sqrt(q) in Q30 for positive quantized values q sampled every 64 steps. Per-operator output scaling is applied later during code generation/runtime.
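
The layout described above can be sketched as follows. The sampling range (q = 64, 128, ..., 32768), the resulting entry count, the rounding mode, and the function name universal_rsqrt_base_lut_sketch are all assumptions made for illustration; only the Q30 format and the 64-step spacing come from the description.

```python
import numpy as np

Q30 = 1 << 30   # fixed-point one in Q30
STEP = 64       # spacing between sampled quantized values


def universal_rsqrt_base_lut_sketch():
    """Q30 samples of 1 / sqrt(q) on a 64-step grid (hypothetical helper)."""
    # Assumed grid: q = 64, 128, ..., 32768 (q = 0 is skipped, rsqrt is undefined there).
    q = np.arange(STEP, 32768 + STEP, STEP, dtype=np.float64)
    # The largest entry is 2^30 / sqrt(64) = 2^27, which fits in int32.
    return np.round(Q30 / np.sqrt(q)).astype(np.int32)


base = universal_rsqrt_base_lut_sketch()
print(len(base), base[0], base[3])
```

Because the table depends only on q, not on any scale, the same array can serve every INT16 operator; the per-operator scales enter later through a single multiplier.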

extract_per_op_rsqrt_lut_s16

extract_per_op_rsqrt_lut_s16(input_scale: float, output_scale: float) -> np.ndarray

Extract a 513-entry INT16 RSQRT LUT by probing a throwaway TFLite model.

Builds a minimal single-op TFLite model with the given quantization parameters and feeds it grid-aligned inputs (interpolation fraction frac = 0), so that TFLite's interpolated lookup returns the LUT entries unchanged. This guarantees bit-exact agreement with TFLite's LUTPopulate<int16_t> (float32 arithmetic) without reimplementing its rounding in Python.

The positive domain (LUT indices 257-511) is probed directly. The negative domain (indices 0-256, never accessed by valid RSQRT inputs) and the last endpoint (index 512) are computed in float32 to match TFLite's Prepare().

Parameters:

  • input_scale

    (float) –

    Quantization scale for the INT16 input tensor.

  • output_scale

    (float) –

    Quantization scale for the INT16 output tensor.

Returns:

  • ndarray

    A 513-entry INT16 LUT matching TFLite's internal table.
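
The notion of a grid-aligned input can be sketched as below. The index/fraction split (index = 256 + (q >> 7), frac = q & 0x7F) reflects the usual layout of a 513-entry INT16 table with 128-step spacing and is an assumption here, as are the helper names; the sketch does not build or run a TFLite model.

```python
import numpy as np


def lut_index_and_frac(q):
    """Split int16 values into a table index and a 7-bit interpolation fraction."""
    q = np.asarray(q, dtype=np.int32)
    return 256 + (q >> 7), q & 0x7F


def grid_aligned_inputs(first_index=257, last_index=511):
    """Inputs that land exactly on table entries first_index..last_index."""
    idx = np.arange(first_index, last_index + 1, dtype=np.int32)
    # Inverting the index formula with frac = 0 gives q = (index - 256) * 128.
    return ((idx - 256) << 7).astype(np.int16)


probe = grid_aligned_inputs()
index, frac = lut_index_and_frac(probe)
print(probe[0], probe[-1], index[0], index[-1], int(frac.max()))
```

With frac equal to zero for every probe value, linear interpolation between neighboring entries contributes nothing, so each model output equals one stored table entry.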

compute_rsqrt_scale_s16

compute_rsqrt_scale_s16(input_scale: float, output_scale: float) -> AirFixedPointScale

Compute the fixed-point scale for the universal RSQRT base LUT.

The shared LUT stores base samples in Q30 for 1 / sqrt(q) where x_real = input_scale * q.

RSQRT in real space is:

y_real = 1 / sqrt(x_real) = 1 / sqrt(input_scale * q) = (1 / sqrt(input_scale)) * (1 / sqrt(q))

Output quantization is:

y_real = output_scale * y_q

y_q = y_real / output_scale

Combining both gives:

y_q = (1 / (output_scale * sqrt(input_scale))) * (1 / sqrt(q))

The LUT stores 2^30 * (1 / sqrt(q)), so converting that LUT value into the quantized output domain requires one extra division by 2^30:

real_multiplier = 1 / (output_scale * sqrt(input_scale) * 2^30)

This real multiplier is then converted into the CMSIS-NN fixed-point pair (multiplier, shift) via AirFixedPointScale.from_real_multiplier, and the generated C passes that pair to arm_rsqrt_s16_universal(...).
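
The derivation above can be sketched numerically as follows. The decomposition into (multiplier, shift) shown here follows the common Q31 mantissa-plus-exponent convention; whether AirFixedPointScale.from_real_multiplier uses exactly this convention is an assumption, and both helper names are hypothetical.

```python
import math


def rsqrt_real_multiplier(input_scale, output_scale):
    """real_multiplier = 1 / (output_scale * sqrt(input_scale) * 2^30)."""
    return 1.0 / (output_scale * math.sqrt(input_scale) * float(1 << 30))


def quantize_multiplier(real_multiplier):
    """Approximate a positive real as multiplier * 2^(shift - 31) (assumed convention)."""
    if real_multiplier == 0.0:
        return 0, 0
    # frexp returns a mantissa in [0.5, 1) and the matching power-of-two exponent.
    mantissa, shift = math.frexp(real_multiplier)
    multiplier = round(mantissa * (1 << 31))
    # Rounding can push the mantissa up to exactly 1.0; renormalize in that case.
    if multiplier == (1 << 31):
        multiplier //= 2
        shift += 1
    return multiplier, shift


real = rsqrt_real_multiplier(input_scale=4.0, output_scale=2.0 ** -30)
print(real, quantize_multiplier(real))
```

In this example sqrt(input_scale) is 2 and output_scale cancels the 2^30 factor, so the real multiplier is 0.5, which decomposes into a Q31 mantissa of 2^30 with a shift of 0.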

Parameters:

  • input_scale

    (float) –

    Quantization scale for the INT16 input tensor.

  • output_scale

    (float) –

    Quantization scale for the INT16 output tensor.

Returns: