2026-04-27 - Phi-4-mini LM Head Shard Builder Intent

Intent: After all 32 Phi-4-mini layer shards completed and the user said “keep going from the top,” proceed to the next ANE-only compute-heavy component, the final RMSNorm plus LM head projection, following the validation-before-scale discipline and the project's ANE-only mandate.

Setup: The planned implementation is a Phi-4-mini LM head shard builder that reads token_embd.weight (used as the tied LM head) and output_norm.weight from GGUF, splits vocab=200064 into 4 INT8 CoreML shards (50016 vocab rows each if split evenly), and emits one RMSNorm+Conv2d shard per vocab slice for Xcode Python/CoreML compilation.
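The vocab split and the per-shard RMSNorm plus projection can be sketched in plain Python as a reference for later checks. This is a minimal sketch, not the builder itself: the even split, the function names, and the eps value are assumptions (the real eps should come from the GGUF metadata), and it ignores INT8 quantization. The projection is written as a matrix-vector product, which is what a 1x1 Conv2d over the hidden dimension computes.

```python
import math

VOCAB = 200064    # vocab size stated in this entry
NUM_SHARDS = 4    # shard count stated in this entry


def shard_ranges(vocab, num_shards):
    """Return [start, end) row ranges covering the vocab.

    An even split is an assumption; any remainder goes to the first shards.
    """
    base, extra = divmod(vocab, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm over one hidden vector; eps here is a placeholder assumption."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * g for v, g in zip(x, gamma)]


def project(h, rows):
    """Dot each weight row with h (equivalent to a 1x1 conv over hidden dim)."""
    return [sum(w * v for w, v in zip(row, h)) for row in rows]


def shard_logits(x, gamma, shard_rows):
    """One shard: RMSNorm followed by projection onto its vocab slice."""
    return project(rms_norm(x, gamma), shard_rows)


def sharded_logits(x, gamma, embed, num_shards):
    """Concatenate per-shard logits in vocab order."""
    logits = []
    for start, end in shard_ranges(len(embed), num_shards):
        logits.extend(shard_logits(x, gamma, embed[start:end]))
    return logits
```

Because each shard repeats the same RMSNorm and only slices the weight rows, concatenating the shard outputs in order should reproduce the unsharded logits, which gives a simple host-side reference for a later cosine check.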

Result: Intent recorded before implementation; no LM-head artifacts, placement numbers, latency, energy, cosine, or perplexity results yet.

Surprise / hurdle: The LM head still runs host-side and remains compute-heavy; it must be moved to the ANE rather than optimized as a CPU/GPU shortcut. Shard 0 must prove ANE residency before scaling to the other vocab shards.

Lesson: Once transformer layers are ANE-resident, the final projection becomes the next mandatory ANE shard rather than an optional runtime optimization.

Next: Implement the builder, compile and validate shard 0 residency first, then build and validate shards 1–3 only if shard 0 passes; do not run perf/energy benchmarking and do not clean up or delete artifacts.
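The shard-0-first gate can be expressed as a small driver. This is a sketch under assumptions: `build_and_validate` is a hypothetical callable that builds, compiles, and checks ANE residency for one shard and returns True on success; nothing here benchmarks or deletes artifacts.

```python
def run_shard_plan(num_shards, build_and_validate):
    """Validate shard 0 first; touch the remaining shards only if it passes.

    build_and_validate(i) -> bool is a hypothetical hook supplied by the caller.
    Returns a dict mapping each attempted shard index to its pass/fail result.
    """
    results = {0: bool(build_and_validate(0))}
    if not results[0]:
        # Shard 0 failed residency validation: do not scale to other shards.
        return results
    for i in range(1, num_shards):
        results[i] = bool(build_and_validate(i))
    return results
```

With 4 shards, a failing shard 0 yields a result containing only index 0, so shards 1–3 are never built.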

Refs: research/ANE_CHAIN_SCHEMA.md