2026-04-28 - Phi-4-mini Next Public Optimization Direction Intent
Intent: After establishing the public Phi-4-mini baseline topology 20+4+6+2 and rejecting 20+5+5+2 as slower, start the next book-shaped ANE optimization direction. The two likely probes are Iverson/APL-style fatter token shapes (T>1 layer-shard inputs, treating more token work as one array operation) and Stepanov-style hierarchical LM-head reduction (using associative reduction structure to reduce projection/result handling depth).
Setup: Planning note only. Existing public CoreML Phi-4-mini topology comparison is the starting point; proposed probes must use CoreML .mlpackage artifacts targeting ANE for compute-heavy work. Host work remains limited to permitted bookkeeping/sampling/string/file tasks; no CPU/GPU matmul, projection, norm, attention, FFN, or LM-head compute shortcut is acceptable.
Result: Intent recorded before implementation. No new artifacts, placement numbers, latency, energy, cosine, perplexity, or topology result yet.
Surprise / hurdle: The public topology search is in a diminishing-returns region where nearby shard shapes can become slower, so the next optimization should change the problem shape rather than only nudge layer group sizes.
Lesson: When fused-layer topology gains plateau, move to array-shape and reduction-structure probes, but keep every heavy compute path ANE-resident and gated before scale-out.
Next: Design the smallest representative gate for either T>1 layer-shard inputs or hierarchical LM-head reduction; run strict ANE residency and golden quality before any broader build, runtime migration, performance claim, energy benchmark, cleanup, or deletion.
Refs: research/ANE_CHAIN_SCHEMA.md