2026-04-28 - Phi-4-mini Twenty-Four-Layer Front Shard Intent

Intent: Before the next Phi-4-mini run, test whether a very large fused front shard [0,24) reduces model-call overhead enough to beat the current best topology. This applies the validation-first notes on Iverson/APL whole-array fusion and Dragon Book call-hoisting, so validation stays ahead of any performance claim.

Setup: Current timing context: 16+8+6+2 repeated at 17.174 tok/s, slightly ahead of 12+12+6+2 at 17.159 tok/s (a gap of 0.015 tok/s). Planned non-destructive probe directory: local artifacts. Consider topology 24+6+2 only if build, compile, strict MLComputePlan residency, and golden validation all pass.
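
A minimal sketch of the arithmetic behind this setup, assuming the shard sizes in a topology map to consecutive half-open layer ranges over 32 layers (the total implied by the sums above). The helper name shard_ranges is hypothetical and not part of any existing tooling.

```python
from itertools import accumulate

def shard_ranges(sizes):
    """Turn shard sizes such as [16, 8, 6, 2] into half-open layer ranges."""
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

# Assumption: 32 layers in total, as implied by 16+8+6+2 and 12+12+6+2.
TOTAL_LAYERS = 32

current_best = shard_ranges([16, 8, 6, 2])
candidate = shard_ranges([24, 6, 2])
assert current_best[-1][1] == TOTAL_LAYERS
assert candidate[-1][1] == TOTAL_LAYERS

# Throughput figures copied from the Setup entry above.
best_tok_s = 17.174
runner_up_tok_s = 17.159
gap = best_tok_s - runner_up_tok_s
rel_gap_pct = 100 * gap / runner_up_tok_s

print(candidate)  # half-open ranges for the 24+6+2 candidate
print(f"gap = {gap:.3f} tok/s ({rel_gap_pct:.2f}% of the runner-up)")
```

The candidate keeps the tail split as [24,30) and [30,32), so it never places [24,32) in one shard.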

Result: Intent recorded before execution; no [0,24) artifact, compiled size, placement, golden quality, latency, energy, perplexity, or topology result yet.

Surprise / hurdle: This is high-risk because the artifact may be too large and larger fused ranges may be numerically unstable. The [24,32) range remains forbidden as a single 8-layer shard because prior golden validation produced NaNs even though the shard passed residency.
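
The two guards described here could be sketched as below. This is an illustration under assumptions: the names FORBIDDEN_RANGES, uses_forbidden_range, and has_non_finite are hypothetical, and the golden output is modeled as a flat list of floats rather than whatever the real validation harness produces.

```python
import math

# Assumption: forbidden shards are tracked as half-open (start, end) layer ranges.
FORBIDDEN_RANGES = {(24, 32)}

def uses_forbidden_range(ranges):
    """True if any shard in the topology is exactly a forbidden range."""
    return any(r in FORBIDDEN_RANGES for r in ranges)

def has_non_finite(values):
    """True if any output value is NaN or infinite."""
    return any(not math.isfinite(v) for v in values)

# 24+6+2 splits the tail into [24,30) and [30,32), so it is allowed.
print(uses_forbidden_range([(0, 24), (24, 30), (30, 32)]))
# A hypothetical 16+8+8 topology would reuse [24,32) and must be rejected.
print(uses_forbidden_range([(0, 16), (16, 24), (24, 32)]))
```

Checking for non-finite values separately from residency reflects the hurdle above: a shard can be resident and still produce NaNs.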

Lesson: Push fusion only where the exact larger shard re-proves compile success, strict ANE residency, and golden quality; the topology differences measured so far (17.174 vs 17.159 tok/s) are too small to justify bypassing gates.

Next: Build/compile [0,24), run strict residency, then run golden; test 24+6+2 only if all gates pass. Do not use [24,32) as one shard, do not accept NaN/non-ANE results, and do not clean up/delete artifacts for this intent note.
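
The gate ordering in this plan can be sketched as a short-circuiting runner. Everything here is hypothetical: run_gates and the stub checks stand in for the real build, residency, and golden steps, and the stub return values are placeholders, not results.

```python
from typing import Callable, List, Optional, Tuple

Gate = Tuple[str, Callable[[], bool]]

def run_gates(gates: List[Gate]) -> Tuple[bool, List[str], Optional[str]]:
    """Run gates in order and stop at the first failure.

    Returns (all_passed, names_of_passed_gates, name_of_failed_gate).
    Later gates are never invoked once an earlier gate fails.
    """
    passed: List[str] = []
    for name, check in gates:
        if not check():
            return False, passed, name
        passed.append(name)
    return True, passed, None

# Placeholder stubs: replace with the real steps for shard [0,24).
gates: List[Gate] = [
    ("build_compile", lambda: True),
    ("strict_residency", lambda: True),
    ("golden", lambda: False),  # placeholder value, not a measured outcome
]

ok, passed, failed = run_gates(gates)
if ok:
    print("all gates passed: 24+6+2 may be timed")
else:
    print(f"stopped at {failed}; passed so far: {passed}; do not time 24+6+2")
```

Nothing in the runner deletes or cleans up artifacts, matching the non-destructive constraint in this note.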

Refs: research/ANE_CHAIN_SCHEMA.md