Functional Testing Plan for Helia AOT Conversion
This document outlines a comprehensive strategy for validating the functional correctness of the six conversion stages implemented in `helia_aot/converter.py`:
- Load (backend → AIR)
- Transform (graph-level optimizations)
- Resolve (handler initialization)
- Plan (memory & handler planning)
- Emit (code generation)
- Export (module export)
For each stage we describe the observable behaviors, required fixtures, and the concrete assertions that functional tests should perform. The goal is to cover both happy-path scenarios and critical failure modes so that the conversion pipeline is reliable end to end.
1. Load Stage — `convert_backend_model`
Key behavior: Convert a TFLite/LiteRT flatbuffer to an `AirModel` instance that exposes stable operator topology, tensor metadata, and typed options for downstream stages.【F:helia_aot/converter.py†L43-L71】【F:helia_aot/converters/__init__.py†L17-L64】
Fixtures
- On-the-fly LiteRT builders: Instead of checking in serialized graphs, we synthesize minimal single-op and composite TFLite models during tests (see `tests/unit/utils/operator_models.py` and `tests/unit/utils/composite_models.py`). Each generator encodes deterministic tensor shapes, buffers, builtin options, quantization metadata, and sparsity so the loader exercises every registered parser.
- Temporary file helpers: Tests materialize models under `tmp_path` and delete them after execution; we do not require a dedicated CLI `ConvertArgs` harness for this stage.
Tests
- Successful load path
  - Invoke `convert_backend_model` on the generated `.tflite` files and verify the returned `AirModel` contains the expected operator ordering and tensor connectivity (sequential and branching graphs).
  - Assert unsupported suffixes raise `ValueError`, missing files raise `FileNotFoundError`, and corrupted flatbuffers surface exceptions so validation guards stay in place.
- Operator coverage & tensor metadata
  - Parameterized tests span every registered LiteRT parser, checking that each `AirOperator` reports the correct `AirOpType`, named tensors (e.g., `weights`, `bias`), constant tensor payloads, typed options, and that tensor quantization/sparsity metadata is preserved (including blockwise quantization and dimension metadata for models that expose it).
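To make the failure-mode assertions concrete, here is a minimal, self-contained sketch. The `load_model` stand-in and its error messages are hypothetical stand-ins for `convert_backend_model`'s validation guards; real tests would call the actual function and use `pytest.raises` instead of the helper shown here:

```python
from pathlib import Path


def load_model(path):
    # Hypothetical stand-in for convert_backend_model's input validation.
    p = Path(path)
    if p.suffix != ".tflite":
        raise ValueError(f"unsupported model suffix: {p.suffix!r}")
    if not p.is_file():
        raise FileNotFoundError(str(p))
    return p.read_bytes()


def expect(exc_type, fn, *args):
    # Poor man's pytest.raises: True when fn(*args) raises exc_type.
    try:
        fn(*args)
    except exc_type:
        return True
    return False


# Unsupported suffix must be rejected before any file I/O happens.
assert expect(ValueError, load_model, "model.onnx")
# A well-suffixed but missing file must raise FileNotFoundError.
assert expect(FileNotFoundError, load_model, "missing.tflite")
```

The same pattern extends to the corrupted-flatbuffer case by writing garbage bytes to a `.tflite` file under `tmp_path` and asserting that the loader surfaces an exception.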
2. Transform Stage — `TransformPipeline.apply`
Key behavior: Build a pipeline from specs (wildcard default handling + per-transform toggles) and mutate the model according to registered transforms.【F:helia_aot/converter.py†L73-L80】【F:helia_aot/transforms/transform_pipeline.py†L1-L61】
Fixtures
- Synthetic AIR inputs: The tests reuse the Stage 1 LiteRT builders to emit minimal models containing identity ops, depthwise convolutions, and transpose convolutions. Each model is loaded once via `convert_backend_model` before the transform pipeline is exercised.
- Transform specs: Tests assemble `TransformSpec` instances inline to cover wildcard defaults, per-transform overrides, and invalid names.
Tests
- Pipeline construction
  - `apply_wildcard_and_validate` is verified to fan out the wildcard entry across the registered transforms, preserving explicit overrides. `TransformPipeline.from_config` is then checked to ensure each transform instance reflects the requested `enabled` flag and options payload, and that unknown transform names raise `ValueError`.
- PruneIdentityOps
  - Using a single-`RESHAPE` graph, the transform is expected to mark the output tensor as an alias of the input while leaving non-identity operators untouched. A follow-up call confirms the transform is idempotent.
- DepthwiseToConv
  - Positive-path coverage asserts that a convertible depthwise convolution is rewritten as `CONV_2D`, produces `AirConv2DOptions`, and rewrites the weights tensor with the anticipated permutation. A separate test increases the efficiency threshold to prove the transform opts out.
- TransposeReverseConv
  - Exercises both the reversible conversion (weights flipped + transposed and `AirConv2DOptions` emitted) and the negative path, where a large threshold keeps the original operator intact.
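The wildcard fan-out behavior can be sketched with a toy model. The registry contents, the `TransformSpec` shape, and the `apply_wildcard` helper below are all illustrative assumptions, not the real `apply_wildcard_and_validate` implementation; they exist only to show the shape of the assertions:

```python
from dataclasses import dataclass

# Hypothetical transform registry (names assumed for illustration).
REGISTERED_TRANSFORMS = ["prune_identity_ops", "depthwise_to_conv", "transpose_reverse_conv"]


@dataclass
class TransformSpec:
    name: str
    enabled: bool = True


def apply_wildcard(specs):
    # Fan the "*" entry out across registered transforms, keeping explicit overrides.
    explicit = {s.name: s for s in specs if s.name != "*"}
    wildcard = next((s for s in specs if s.name == "*"), None)
    for s in specs:
        if s.name != "*" and s.name not in REGISTERED_TRANSFORMS:
            raise ValueError(f"unknown transform: {s.name}")
    result = []
    for name in REGISTERED_TRANSFORMS:
        if name in explicit:
            result.append(explicit[name])
        elif wildcard is not None:
            result.append(TransformSpec(name, wildcard.enabled))
    return result


# Wildcard default enables everything; the explicit override wins for its entry.
expanded = apply_wildcard([TransformSpec("*", enabled=True),
                           TransformSpec("depthwise_to_conv", enabled=False)])
assert [(s.name, s.enabled) for s in expanded] == [
    ("prune_identity_ops", True),
    ("depthwise_to_conv", False),
    ("transpose_reverse_conv", True),
]
```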
3. Resolve Stage — Handler resolution
Key behavior: Instantiate each handler and invoke `resolve()` so they populate the shared `CodeGenContext` (operators, interpreter, directory scaffolding).【F:helia_aot/converter.py†L93-L122】
Fixtures
- Context builder: Tests materialize LiteRT models on disk, load them via `convert_backend_model`, and spin up a `CodeGenContext` pointed at a temporary `work_path`. No CLI harness or Stage 1 test code is reused beyond the shared model builders.
- Monkeypatch hooks: Lightweight stubs replace side-effectful pieces (e.g., `create_interpreter`) so we can observe calls without invoking real interpreters or touching the filesystem outside of the temporary work directory.
Tests
- OperatorHandler resolution
  - Running `OperatorHandler.resolve()` produces one `AotOperator` per AIR operator and marks each as resolved (the `_has_resolved` flag is set).【F:helia_aot/aot/handlers/operator_handler.py†L12-L55】
- ModuleHandler cleanup
  - Verify it wipes any stale contents under `work_path` and recreates the `src/` and `includes-api/` directories used by later stages.【F:helia_aot/aot/handlers/module_handler.py†L52-L76】
- TestHandler interpreter wiring (optional)
  - With tests enabled, `resolve()` must call `create_interpreter` with the configured model path and stash the returned interpreter for emission.【F:helia_aot/aot/handlers/test_handler.py†L12-L45】
- Failure propagation
  - Monkeypatching a handler's `resolve()` to raise should surface the exception through `AotConverter.convert()`, demonstrating that StepContext aborts the pipeline instead of swallowing the error.
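The failure-propagation expectation can be illustrated with stub handlers. The classes and the `run_resolve` driver below are hypothetical stand-ins (the real pipeline is `AotConverter.convert()` with StepContext); the sketch only shows that an exception from one handler must abort the loop rather than be swallowed:

```python
class OkHandler:
    # Stand-in handler that records whether resolve() ran.
    def __init__(self):
        self.resolved = False

    def resolve(self):
        self.resolved = True


class FailingHandler:
    # Stand-in for a monkeypatched handler whose resolve() raises.
    def resolve(self):
        raise RuntimeError("resolve failed")


def run_resolve(handlers):
    # Mirrors the expectation: the first handler error aborts the stage.
    for h in handlers:
        h.resolve()


before, after = OkHandler(), OkHandler()
try:
    run_resolve([before, FailingHandler(), after])
    aborted = False
except RuntimeError:
    aborted = True

assert aborted
assert before.resolved       # handlers before the failure did run
assert not after.resolved    # handlers after the failure were never reached
```

In the real tests, `monkeypatch.setattr` would replace one handler's `resolve` and the assertion would use `pytest.raises` around `AotConverter.convert()`.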
4. Plan Stage — Memory planner & handler hooks
Key behavior: Run the configured memory planner and invoke `plan()` on every handler.【F:helia_aot/converter.py†L124-L139】
Fixtures
- A temporary `CodeGenContext` is constructed for each registered platform using on-the-fly LiteRT models so the greedy planner runs under a variety of memory maps.
- The context is seeded with constants, persistent tensors, and scratch tensors (including multi-layer graphs) to exercise allocation reuse and arena growth.
Tests
- Planner integration
  - Instantiate the greedy planner for every platform and confirm the resulting `MemoryPlan` includes allocations for constants/persistent tensors and that arenas reflect the platform's memory sizes. Scratch tensors must land in writable arenas with non-zero peak usage.【F:helia_aot/memory/greedy_planner.py†L1-L143】
- Constraint validation
  - Check that invalid constraints on type/size raise `ValueError` and that tight limits cause insufficient-memory failures. Custom constraints also verify arena-resizing behavior.
- Tensor constraints
  - Apply attribute overrides (e.g., force PSRAM) to specific tensor IDs and assert allocations respect those directives.
- Gap reuse & preferred order
  - Use multi-layer graphs to ensure scratch buffers reuse freed offsets and fall back through each platform's preferred memory order when earlier arenas are constrained.
- Handler plan hooks
  - Monkeypatch handler `plan()` methods to verify they are invoked after planning so downstream emit stages receive the prepared context.【F:helia_aot/aot/handlers/operator_handler.py†L57-L60】
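The gap-reuse assertion is the subtlest of these, so here is a toy first-fit planner that models what the tests check. This is not the real planner in `helia_aot/memory/greedy_planner.py`; the lifetime-tuple interface and single-arena simplification are assumptions made purely to illustrate the expected allocation behavior:

```python
def greedy_plan(requests):
    """Toy first-fit planner over one arena.

    requests: list of (tensor_id, size, start_step, end_step) lifetimes.
    Returns ({tensor_id: offset}, peak_usage). Illustrative only.
    """
    live = []      # (offset, size, end_step) of currently live allocations
    offsets = {}
    peak = 0
    for tid, size, start, end in sorted(requests, key=lambda r: r[2]):
        live = [a for a in live if a[2] > start]   # free expired buffers
        cursor = 0
        for off, sz, _ in sorted(live):            # first-fit gap scan
            if cursor + size <= off:
                break
            cursor = max(cursor, off + sz)
        offsets[tid] = cursor
        live.append((cursor, size, end))
        peak = max(peak, cursor + size)
    return offsets, peak


# Disjoint lifetimes: the second buffer must reuse the freed offset,
# so the arena's peak stays at a single buffer's size.
offsets, peak = greedy_plan([("a", 64, 0, 1), ("b", 64, 2, 3)])
assert offsets["a"] == offsets["b"] == 0
assert peak == 64

# Overlapping lifetimes force distinct offsets and arena growth.
offsets, peak = greedy_plan([("a", 64, 0, 3), ("b", 64, 1, 2)])
assert offsets["b"] == 64 and peak == 128
```

Real tests make the same assertions against the `MemoryPlan` produced from multi-layer graphs, per platform and per memory arena.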
5. Emit Stage — Artifact generation
Key behavior: Call `emit(save_path)` on all handlers to create headers, sources, docs, and optional tests within the staging directory.【F:helia_aot/converter.py†L141-L147】
Fixtures
- Use a temporary workspace and configure module type variations (`neuralspot`, `zephyr`, `cmake`) to cover all code paths.
- Provide deterministic `CodeGenContext` values (operators list, memory plan) so templates render reproducibly.
Tests
- ModuleHandler outputs
  - Assert the expected file set exists per module type and check key template substitutions (e.g., prefix, CMSIS version) in generated files.【F:helia_aot/aot/handlers/module_handler.py†L14-L115】
- Operator/Tensor/Model handlers
  - Confirm the expected headers/sources exist with the configured prefix (e.g., `{prefix}_model.c`, `{prefix}_tensors.h`).【F:helia_aot/aot/handlers/tensor_handler.py†L1-L66】【F:helia_aot/aot/handlers/model_handler.py†L1-L59】
- DocHandler/TestHandler
  - With documentation HTML disabled, ensure license/README files are generated and MkDocs artifacts are absent; when tests are enabled, assert test-case sources are emitted (mocking the interpreter as needed).【F:helia_aot/aot/handlers/doc_handler.py†L1-L88】【F:helia_aot/aot/handlers/test_handler.py†L47-L82】
- Idempotent emit
  - Re-run `emit` on the same context and ensure files are overwritten deterministically without residual artifacts.
- Complex graphs
  - Emit from a multi-layer model to guarantee operator-specific artifacts (e.g., generated conv kernels) materialize under `src/` and operator manifests update accordingly.
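The idempotent-emit check reduces to "same inputs, same file set, no leftovers". The `emit` stand-in below (and its file names) is a hypothetical sketch, not the real handler; it only demonstrates the double-run assertion:

```python
import tempfile
from pathlib import Path


def emit(save_path, prefix="net"):
    # Stand-in emitter: writes a deterministic file set (names hypothetical).
    save_path.mkdir(parents=True, exist_ok=True)
    files = {
        f"{prefix}_model.c": "// model definition",
        f"{prefix}_tensors.h": "// tensor declarations",
    }
    for name, body in files.items():
        (save_path / name).write_text(body)
    return sorted(files)


with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "src"
    first = emit(out)
    second = emit(out)  # re-run emit on the same context
    # Same file set both times, and nothing extra left on disk.
    assert first == second
    assert sorted(p.name for p in out.iterdir()) == first
```

Real tests run the handlers' `emit(save_path)` twice against the same `CodeGenContext` and diff the resulting directory listing.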
6. Export Stage — Packaging the workspace
Key behavior: Move or archive the staged module to the configured output path, handling zip archives and directory exports with overwrite protection.【F:helia_aot/converter.py†L149-L188】
Fixtures
- Temporary output directories with and without pre-existing content.
- Toggle `config.force` to exercise both the overwrite-rejection and forced-clobbering paths.
Tests
- Zip export
  - Configure `module.path` with a `.zip` suffix, run conversion, and assert the resulting archive contains the staged structure (inspect via `zipfile`). Re-run without `force` to ensure the expected `FileExistsError` is raised when the archive already exists.
- Directory export & force
  - Point `module.path` to a directory, verify the workspace is copied into `<path>/<module.name>`, and confirm that re-running with `force=True` removes stale contents before re-exporting.
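Both export paths can be modeled with a small stand-in. The `export` function below is a hypothetical sketch of the behavior described above (zip vs. directory, overwrite guard, `force` clobbering), not the implementation in `helia_aot/converter.py`:

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def export(workspace: Path, target: Path, force: bool = False):
    # Stand-in export step: archive when target ends in .zip, else copy the
    # directory tree; refuse to overwrite unless force is set.
    if target.exists() and not force:
        raise FileExistsError(str(target))
    if target.suffix == ".zip":
        if target.exists():
            target.unlink()
        with zipfile.ZipFile(target, "w") as zf:
            for f in workspace.rglob("*"):
                zf.write(f, f.relative_to(workspace))
    else:
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(workspace, target)


with tempfile.TemporaryDirectory() as d:
    ws = Path(d) / "module"
    (ws / "src").mkdir(parents=True)
    (ws / "src" / "net_model.c").write_text("// model")

    archive = Path(d) / "out.zip"
    export(ws, archive)
    with zipfile.ZipFile(archive) as zf:
        assert "src/net_model.c" in zf.namelist()  # staged structure preserved

    try:
        export(ws, archive)  # second run without force
        raised = False
    except FileExistsError:
        raised = True
    assert raised  # overwrite protection fired
```

The directory-export test follows the same shape: export once, pre-populate the target with a stale file, then re-export with `force=True` and assert the stale file is gone.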
Running the tests locally
All functional tests live under `tests/unit`. From the project root you can run:

```shell
uv sync --group ci
uv run pytest tests/unit
```

Or target specific stages:

```shell
uv run pytest tests/unit/converters   # Stage 1 load
uv run pytest tests/unit/transforms   # Stage 2 transform
uv run pytest tests/unit/aot/test_resolve_stage.py
uv run pytest tests/unit/aot/test_plan_stage.py
uv run pytest tests/unit/aot/test_emit_stage.py
uv run pytest tests/unit/aot/test_export_stage.py
uv run pytest tests/unit/platforms/test_platform_resolution.py
```

All commands assume pytest is invoked through uv (i.e., `uv run pytest …`).