Chapter 9 — Experiment Index

Date: 2026-04-14
Purpose: Mine 9 classic CS/math books for new AutoEML kernel optimization ideas.
Context: After 15 experiments (7 kept, 8 reverted), the kernel stands at 3,917 μs and 803,712 transcendental evaluations.
We need fundamentally new strategies, not incremental tuning.

Books Surveyed

#	Book	Author(s)	Key Chapters Studied
a	Concrete Mathematics	Graham, Knuth, Patashnik	Ch. 2 (Summation), Ch. 7 (Generating Functions), Ch. 9 (Asymptotics)
b	The Art of Computer Programming	Knuth	Vol. 2 Ch. 4 (Arithmetic), Vol. 4A–4B (Combinatorial), Fascicles 5–7 (Backtracking, SAT, Constraint Satisfaction)
c	Elements of Programming	Stepanov, McJones	Foundations, Associative Operations, Semigroups, Orbits
d	A Programming Language	Iverson	Array operators, Inner/Outer product, Reduction operators
e	Thinking Forth	Brodie	Factoring, stack discipline, composition-as-optimization
f	Compilers: Principles, Techniques, and Tools (Dragon Book)	Aho, Lam, Sethi, Ullman	Code optimization, peephole optimization, data flow analysis, register allocation
g	Elements of Automata Theory	Sakarovitch	Weighted automata, transducers, semirings
h	Types and Programming Languages (TAPL)	Pierce	Type inference, System F, subtyping, polymorphism
i	Constraint Processing	Dechter	Arc consistency (AC-3), constraint propagation, backtracking, CSP formulation

Experiments

#	Experiment	Theme
16	Log-Sum-Exp Peephole Rewrite	AutoEML kernel idea
17	Fused APL-Style Inner Product	AutoEML kernel idea
18	Constraint Propagation for Realness	AutoEML kernel idea
19	Balanced Tree Reduction (Semigroup Accumulator)	AutoEML kernel idea
20	Weighted-Automaton Layer Partition Search	Phi / speculative decode
21	APL-Style Token/Stream Batching	Phi / speculative decode
22	Hierarchical LM-Head Reduction	Phi / speculative decode
23	Prompt-Lookup / N-Gram Speculation	Phi / speculative decode
24	Structured CoT as a Grammar-Constrained Sampler	Phi / speculative decode
25	Prompt-Lookup Force Mode as a Head-Skip Ceiling	Phi / speculative decode
26	Multi-Token Verifier Feasibility	Phi / speculative decode
27	MicroGPT on ANE — Minimum Size Constraint Discovery	ANE size floor
28	HyMT 1.8B RangeDim T=1..4 + N-Gram Speculative Decode	HyMT RangeDim
29	ZAYA1-8B MoE Feasibility Probe on ANE	ZAYA MoE
30	ZAYA1-8B Stateful Attn Shards + KV Cache on ANE	ZAYA MoE
31	ZAYA1-8B CCA (conv_qk) gates wired into 40 stateful attn shards (2025-07-14)	ZAYA MoE
32	ZAYA1-8B Speculative Decode (T=4 Verifier + n-gram) [IMPLEMENTED; BOTTLENECKED]	ZAYA MoE
33	Phi-4-mini ARC-Challenge Eval (5-shot, raw completion) [COMPLETE]	Phi evaluation
34	ZAYA1-8B MoE RangeDim Rebuild (T=1..4 speculative MoE) [COMPLETE]	ZAYA MoE
35	ZAYA1-8B MoE INT4pal (per-grouped-channel palettization, group_size=32) [COMPLETE]	ZAYA MoE
36	Gemma 4-26B-A4B INT8 Per-Channel Rebuild — T4.3 Quality Fix	Gemma quality gate

Experiment Execution Order

Order	Exp	Rationale
1	17 (Fused APL dot)	Highest potential: cuts ln evaluations from K to 1. Subsumes Exp 16.
2	16 (LSE rewrite)	If Exp 17 isn’t viable as full fusion, LSE is the fallback.
3	18 (Constraint propagation)	Generalizes the real-bypass. Independent of 16/17.
4	19 (Tree reduction)	Improves instruction-level parallelism (ILP). Can stack on top of 16/17.
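
As a rough illustration of why Exp 19 can help, the sketch below reduces a sequence with an associative operation in balanced pairwise rounds rather than one long left-to-right chain. It is a generic Python sketch, not the AutoEML kernel code; the names tree_reduce and tree_depth are hypothetical and chosen for this example only. The point is the shape of the dependency graph: n items need about log2(n) dependent rounds instead of n − 1 dependent steps, and the combinations inside one round do not depend on each other.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tree_reduce(op: Callable[[T, T], T], items: Sequence[T]) -> T:
    """Reduce a non-empty sequence with an associative op in pairwise rounds.

    Hypothetical helper for illustration; assumes op is associative.
    Element order is preserved, so op does not need to be commutative.
    """
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    level = list(items)
    while len(level) > 1:
        # Combine adjacent pairs; pairs within one round are independent.
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An odd trailing element is carried into the next round unchanged.
            paired.append(level[-1])
        level = paired
    return level[0]


def tree_depth(n: int) -> int:
    """Number of pairwise rounds tree_reduce performs for n items."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2
        depth += 1
    return depth


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    print(tree_reduce(lambda a, b: a + b, values))  # same sum as a serial loop
    print(tree_depth(len(values)))                   # rounds on the critical path
```

One caveat: floating-point addition is only approximately associative, so a tree-ordered sum can differ from a serial sum in the last bits. Any kernel-level version would need its own tolerance check.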

References

These are the sources behind the first experiment set:

Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D., Compilers: Principles, Techniques, and Tools (2nd ed., 2006), section 8.7, for the peephole-optimization framing of the log-sum-exp rewrite.
Iverson, K. E., A Programming Language (1962), for the inner product operator +.× as a first-class fused array operation.
Dechter, R., Constraint Processing (2003), chapter 3, and Mackworth, A. K., “Consistency in Networks of Relations” (1977), for arc consistency and AC-3.
Stepanov, A. and McJones, P., Elements of Programming (2009), chapters 4 and 5, and Blelloch, G., “Prefix Sums and Their Applications” (1990), for associative reductions and balanced tree evaluation.
Odrzywołek, A., “All elementary functions from a single binary operator” (2026), arXiv:2603.21852, for the EML operator itself.