Function arm_rsqrt_s16_universal
Defined in File arm_nnfunctions.h
Function Documentation
arm_cmsis_nn_status arm_rsqrt_s16_universal(const int16_t *input, const int32_t input_offset, int16_t *output, const int32_t out_offset, const int32_t out_mult, const int32_t out_shift, const bool needs_rescale, const int32_t out_activation_min, const int32_t out_activation_max, const int32_t block_size, const int32_t *lut)
INT16 reciprocal square root using a shared universal LUT.
In universal mode all RSQRT operators share a single LUT that captures the base 1/sqrt(x) shape, and operator-specific quantization is applied afterward via out_mult/out_shift. Because this two-step process introduces extra rounding stages, the output may differ from the per-op variant (arm_rsqrt_s16_per_op) by up to ±3 LSB per element. This is expected and acceptable for deployment.
- Parameters:
input – [in] Pointer to the input buffer.
input_offset – [in] Input tensor zero offset. The kernel evaluates each element as input - input_offset before the LUT lookup.
output – [out] Pointer to the output buffer.
out_offset – [in] Output tensor zero offset.
out_mult – [in] Output requantization multiplier.
out_shift – [in] Output requantization shift.
needs_rescale – [in] Whether requantization is required.
out_activation_min – [in] Minimum output clamp.
out_activation_max – [in] Maximum output clamp.
block_size – [in] Number of elements.
lut – [in] Pointer to a 513-entry INT32 shared LUT in Q30 domain.
- Returns:
The function returns ARM_CMSIS_NN_SUCCESS or ARM_CMSIS_NN_ARG_ERROR.
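The two-step process described above (shared LUT lookup, then operator-specific rescaling, offset, and clamp) can be illustrated with a minimal, self-contained sketch. It is a reference model for intuition only and does not reproduce the kernel's arithmetic. The helper names lut513_interp and rescale_offset_clamp, the 9-bit/7-bit index split, the Q31 multiplier with a plain right shift, and the assumption that table entries are non-negative are all hypothetical; in particular, right_shift here is not claimed to follow the sign convention of out_shift.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_ENTRIES 513 /* 512 intervals need 513 end points */

/* Stage 1 (hypothetical indexing): the top 9 bits of v select one of 512
 * intervals and the low 7 bits are the interpolation fraction.
 * Assumes non-negative table entries, as 1/sqrt(x) is positive. */
static int32_t lut513_interp(const int32_t *lut, uint16_t v)
{
    const uint32_t index = (uint32_t)v >> 7; /* 0..511 */
    const int64_t frac = v & 0x7F;           /* 0..127 */
    const int64_t lo = lut[index];
    const int64_t hi = lut[index + 1];       /* index + 1 <= 512 */

    /* Weighted average of the two neighbours, rounded to nearest. */
    return (int32_t)((lo * (128 - frac) + hi * frac + 64) / 128);
}

/* Stage 2 (hypothetical convention): multiply a non-negative Q30 value by a
 * non-negative Q31 multiplier, apply a rounding right shift, add the output
 * zero offset, then clamp to the activation range. */
static int16_t rescale_offset_clamp(int32_t q30,
                                    int32_t mult_q31,
                                    int32_t right_shift,
                                    int32_t out_offset,
                                    int32_t act_min,
                                    int32_t act_max)
{
    assert(q30 >= 0 && mult_q31 >= 0);
    assert(right_shift >= 0 && right_shift < 32);
    assert(act_min >= INT16_MIN && act_max <= INT16_MAX);

    const int32_t total_shift = 31 + right_shift;
    int64_t acc = (int64_t)q30 * mult_q31;
    acc = (acc + ((int64_t)1 << (total_shift - 1))) >> total_shift;
    acc += out_offset;

    if (acc < act_min) {
        acc = act_min;
    }
    if (acc > act_max) {
        acc = act_max;
    }
    return (int16_t)acc;
}
```

Each stage rounds once (the interpolation and the rounding shift), which is the kind of accumulated rounding that lets a shared-LUT result drift by a few LSB from a table built for one specific operator.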