Function arm_rsqrt_s16_universal

Function Documentation

arm_cmsis_nn_status arm_rsqrt_s16_universal(const int16_t *input,
                                            const int32_t input_offset,
                                            int16_t *output,
                                            const int32_t out_offset,
                                            const int32_t out_mult,
                                            const int32_t out_shift,
                                            const bool needs_rescale,
                                            const int32_t out_activation_min,
                                            const int32_t out_activation_max,
                                            const int32_t block_size,
                                            const int32_t *lut)

INT16 reciprocal square root using a shared universal LUT.

In universal mode, all RSQRT operators share a single LUT that captures the base 1/sqrt(x) curve; operator-specific quantization is applied afterward through out_mult and out_shift. Because this two-step process adds rounding stages, the output may differ from that of the per-op variant (arm_rsqrt_s16_per_op) by up to ±3 LSB per element. This deviation is expected and acceptable for deployment.
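To make the second stage concrete, the sketch below requantizes one Q30 LUT value with out_mult and out_shift, adds out_offset, and clamps to the activation range. It assumes the rounding doubling high multiply and rounding right shift used in gemmlowp-style requantization, with a positive out_shift meaning a left shift; this convention is assumed to match CMSIS-NN's other requantization code. The helper names (doubling_high_mult, rounding_divide_by_pot, requantize_q30, finish_element) are hypothetical, and this is not the kernel's actual source. Each helper rounds once, which illustrates where the extra rounding stages come from.

```c
#include <stdint.h>

/* Hypothetical helper: round(a * b / 2^31), gemmlowp-style.
   The only overflowing input pair (INT32_MIN, INT32_MIN) saturates. */
static int32_t doubling_high_mult(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN) {
        return INT32_MAX;
    }
    const int64_t prod = (int64_t)a * (int64_t)b;
    const int64_t nudge = prod >= 0 ? (1LL << 30) : (1 - (1LL << 30));
    /* C division truncates toward zero; the nudge turns that into rounding. */
    return (int32_t)((prod + nudge) / (1LL << 31));
}

/* Hypothetical helper: x / 2^exponent, rounding to nearest, ties away from zero.
   Assumes 0 <= exponent <= 31. */
static int32_t rounding_divide_by_pot(int32_t x, int32_t exponent)
{
    const int32_t mask = (int32_t)((1LL << exponent) - 1);
    const int32_t remainder = x & mask;
    const int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
    int32_t result = x >> exponent; /* assumes arithmetic shift for negatives */
    if (remainder > threshold) {
        result++;
    }
    return result;
}

/* Hypothetical requantization: positive shift = left shift, negative shift =
   rounding right shift. Assumes shift in [-31, 31]. */
static int32_t requantize_q30(int32_t val, int32_t mult, int32_t shift)
{
    const int32_t left = shift > 0 ? shift : 0;
    const int32_t right = shift > 0 ? 0 : -shift;
    int64_t scaled = (int64_t)val * ((int64_t)1 << left);
    if (scaled > INT32_MAX) scaled = INT32_MAX; /* saturate instead of wrapping */
    if (scaled < INT32_MIN) scaled = INT32_MIN;
    return rounding_divide_by_pot(doubling_high_mult((int32_t)scaled, mult), right);
}

/* Second stage for one element: requantize, add the output zero offset, clamp. */
static int16_t finish_element(int32_t lut_q30, int32_t out_mult, int32_t out_shift,
                              int32_t out_offset, int32_t act_min, int32_t act_max)
{
    int64_t v = (int64_t)requantize_q30(lut_q30, out_mult, out_shift) + out_offset;
    if (v < act_min) v = act_min;
    if (v > act_max) v = act_max;
    return (int16_t)v;
}
```

For example, a LUT value of 1 << 30 (1.0 in Q30) with out_mult = 1 << 30 (0.5 in Q31) and out_shift = -15 requantizes to 16384 before out_offset is added.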

Parameters:
  • input[in] Pointer to the input buffer.

  • input_offset[in] Input tensor zero offset. Before the LUT lookup, the kernel computes input - input_offset for each element.

  • output[out] Pointer to the output buffer.

  • out_offset[in] Output tensor zero offset.

  • out_mult[in] Output requantization multiplier.

  • out_shift[in] Output requantization shift.

  • needs_rescale[in] Whether the output requantization (out_mult / out_shift) must be applied.

  • out_activation_min[in] Minimum output clamp.

  • out_activation_max[in] Maximum output clamp.

  • block_size[in] Number of elements to process.

  • lut[in] Pointer to a 513-entry INT32 shared LUT in the Q30 domain.
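The first stage (offset, then LUT lookup) can be sketched as follows. The table layout is an assumption: the sketch treats the 513 entries as 512 equal segments plus a closing endpoint over an index range of 0 to 65535, selects a segment with the top 9 bits, and linearly interpolates with the low 7 bits. Values of input - input_offset outside that range are simply clamped. The function names lut_interp_q30 and rsqrt_lookup_q30 are hypothetical, and the actual mapping from input to LUT index inside arm_rsqrt_s16_universal may differ.

```c
#include <stdint.h>

/* Hypothetical lookup: linear interpolation in a 513-entry Q30 LUT.
   Assumes x in [0, 65535] and non-negative LUT entries (1/sqrt(x) > 0). */
static int32_t lut_interp_q30(const int32_t *lut, int32_t x)
{
    const int32_t idx = x >> 7;      /* 0..511: segment index */
    const int32_t frac = x & 0x7F;   /* 0..127: position inside the segment */
    const int64_t y0 = lut[idx];
    const int64_t y1 = lut[idx + 1]; /* idx + 1 <= 512, the last entry */
    /* Weighted average of both segment endpoints, rounded to nearest. */
    return (int32_t)((y0 * (128 - frac) + y1 * frac + 64) >> 7);
}

/* Hypothetical first stage for one element: apply input_offset as documented
   (input - input_offset), clamp to the assumed index range, then look up. */
static int32_t rsqrt_lookup_q30(const int32_t *lut, int16_t input, int32_t input_offset)
{
    int64_t x = (int64_t)input - input_offset;
    if (x < 0) x = 0;         /* assumption: out-of-range values clamp to the ends */
    if (x > 65535) x = 65535;
    return lut_interp_q30(lut, (int32_t)x);
}
```

Under this assumed layout, the extra 513th entry gives the last segment an upper endpoint to interpolate toward, so no index ever reads past the table.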

Returns:

The function returns ARM_CMSIS_NN_SUCCESS or ARM_CMSIS_NN_ARG_ERROR.