Chapter 9 — Experiment Index

Date: 2026-04-14
Purpose: Mine 9 classic CS/math books for new AutoEML kernel optimization ideas.
Context: After 15 experiments (7 kept, 8 reverted), the kernel is at 3,917 μs / 803,712 transcendentals.
We need fundamentally new strategies, not incremental tuning.


Books Surveyed

| # | Book | Author(s) | Key Chapters Studied |
|---|------|-----------|----------------------|
| a | Concrete Mathematics | Graham, Knuth, Patashnik | Ch. 2 (Summation), Ch. 7 (Generating Functions), Ch. 9 (Asymptotics) |
| b | The Art of Computer Programming | Knuth | Vol. 2 Ch. 4 (Arithmetic), Vol. 4A–4B (Combinatorial), Fascicles 5–7 (Backtracking, SAT, Constraint Satisfaction) |
| c | Elements of Programming | Stepanov, McJones | Foundations, Associative Operations, Semigroups, Orbits |
| d | A Programming Language | Iverson | Array operators, Inner/Outer product, Reduction operators |
| e | Thinking Forth | Brodie | Factoring, stack discipline, composition-as-optimization |
| f | Compilers: Principles, Techniques, and Tools (Dragon Book) | Aho, Lam, Sethi, Ullman | Code optimization, peephole optimization, data flow analysis, register allocation |
| g | Elements of Automata Theory | Sakarovitch | Weighted automata, transducers, semirings |
| h | Types and Programming Languages (TAPL) | Pierce | Type inference, System F, subtyping, polymorphism |
| i | Constraint Processing | Dechter | Arc consistency (AC-3), constraint propagation, backtracking, CSP formulation |

Experiments

| # | Experiment | Theme |
|---|------------|-------|
| 16 | Log-Sum-Exp Peephole Rewrite | AutoEML kernel idea |
| 17 | Fused APL-Style Inner Product | AutoEML kernel idea |
| 18 | Constraint Propagation for Realness | AutoEML kernel idea |
| 19 | Balanced Tree Reduction (Semigroup Accumulator) | AutoEML kernel idea |
| 20 | Weighted-Automaton Layer Partition Search | Phi / speculative decode |
| 21 | APL-Style Token/Stream Batching | Phi / speculative decode |
| 22 | Hierarchical LM-Head Reduction | Phi / speculative decode |
| 23 | Prompt-Lookup / N-Gram Speculation | Phi / speculative decode |
| 24 | Structured CoT as a Grammar-Constrained Sampler | Phi / speculative decode |
| 25 | Prompt-Lookup Force Mode as a Head-Skip Ceiling | Phi / speculative decode |
| 26 | Multi-Token Verifier Feasibility | Phi / speculative decode |
| 27 | MicroGPT on ANE — Minimum Size Constraint Discovery | ANE size floor |
| 28 | HyMT 1.8B RangeDim T=1..4 + N-Gram Speculative Decode | HyMT RangeDim |
| 29 | ZAYA1-8B MoE Feasibility Probe on ANE | ZAYA MoE |
| 30 | ZAYA1-8B Stateful Attn Shards + KV Cache on ANE | ZAYA MoE |
| 31 | ZAYA1-8B CCA (conv_qk) gates wired into 40 stateful attn shards (2025-07-14) | ZAYA MoE |
| 32 | ZAYA1-8B Speculative Decode (T=4 Verifier + n-gram) [IMPLEMENTED; BOTTLENECKED] | ZAYA MoE |
| 33 | Phi-4-mini ARC-Challenge Eval (5-shot, raw completion) [COMPLETE] | Phi evaluation |
| 34 | ZAYA1-8B MoE RangeDim Rebuild (T=1..4 speculative MoE) [COMPLETE] | ZAYA MoE |
| 35 | ZAYA1-8B MoE INT4pal (per-grouped-channel palettization, group_size=32) [COMPLETE] | ZAYA MoE |
| 36 | Gemma 4-26B-A4B INT8 Per-Channel Rebuild — T4.3 Quality Fix | Gemma quality gate |

Experiment Execution Order

| Order | Exp | Rationale |
|-------|-----|-----------|
| 1 | 17 (Fused APL dot) | Highest potential: K→1 ln reduction. Subsumes Exp 16. |
| 2 | 16 (LSE rewrite) | If Exp 17 isn’t viable as full fusion, LSE is the fallback. |
| 3 | 18 (Constraint propagation) | Generalize real-bypass. Independent of 16/17. |
| 4 | 19 (Tree reduction) | ILP improvement. Can stack on top of 16/17. |
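
To illustrate two of the ideas named in the rationale column, here is a minimal Python sketch. It is not the AutoEML kernel: the function names `log_sum_exp` and `tree_reduce` are hypothetical, and reading "K→1 ln reduction" as "one `ln` call for K terms" is an assumption. The first function shows the standard log-sum-exp form, which needs a single logarithm regardless of K. The second reduces a sequence with an associative operator in pairwise order, which shortens the dependency chain from K−1 steps to about log2(K) levels and is the usual way such a reduction exposes instruction-level parallelism.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def log_sum_exp(xs: Sequence[float]) -> float:
    """Compute ln(sum(exp(x) for x in xs)) with exactly one ln call.

    Subtracting the maximum first keeps every exp() argument <= 0,
    so the intermediate sum cannot overflow.
    """
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def tree_reduce(op: Callable[[T, T], T], xs: Sequence[T]) -> T:
    """Reduce xs with an associative op in balanced-tree (pairwise) order.

    Operand order is preserved, so op only needs to be associative,
    not commutative (a semigroup operation).
    """
    if not xs:
        raise ValueError("tree_reduce needs at least one element")
    level = list(xs)
    while len(level) > 1:
        # Combine neighbours; pairs within one level are independent.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry the unpaired last element up
        level = nxt
    return level[0]
```

Because the two helpers are independent, they can be combined, which mirrors the note that tree reduction can stack on top of Exp 16/17: for example, `tree_reduce(lambda a, b: a + b, terms)` could replace the inner `sum(...)` in `log_sum_exp`.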

References

These are the sources behind the first experiment set: