Chapter 9 — Experiment Index
Date: 2026-04-14
Purpose: Mine 9 classic CS/math books for new AutoEML kernel optimization ideas.
Context: After 15 experiments (7 kept, 8 reverted), the kernel is at 3,917 μs / 803,712 transcendentals.
We need fundamentally new strategies, not incremental tuning.
Books Surveyed
| # | Book | Author(s) | Key Chapters Studied |
|---|------|-----------|----------------------|
| a | Concrete Mathematics | Graham, Knuth, Patashnik | Ch. 2 (Summation), Ch. 7 (Generating Functions), Ch. 9 (Asymptotics) |
| b | The Art of Computer Programming | Knuth | Vol. 2 Ch. 4 (Arithmetic), Vol. 4A–4B (Combinatorial), Fascicles 5–7 (Backtracking, SAT, Constraint Satisfaction) |
| c | Elements of Programming | Stepanov, McJones | Foundations, Associative Operations, Semigroups, Orbits |
| d | A Programming Language | Iverson | Array operators, Inner/Outer product, Reduction operators |
| e | Thinking Forth | Brodie | Factoring, stack discipline, composition-as-optimization |
| f | Compilers: Principles, Techniques, and Tools (Dragon Book) | Aho, Lam, Sethi, Ullman | Code optimization, peephole optimization, data flow analysis, register allocation |
| g | Elements of Automata Theory | Sakarovitch | Weighted automata, transducers, semirings |
| h | Types and Programming Languages (TAPL) | Pierce | Type inference, System F, subtyping, polymorphism |
| i | Constraint Processing | Dechter | Arc consistency (AC-3), constraint propagation, backtracking, CSP formulation |

Experiments
| # | Experiment | Theme |
|---|------------|-------|
| 16 | Log-Sum-Exp Peephole Rewrite | AutoEML kernel idea |
| 17 | Fused APL-Style Inner Product | AutoEML kernel idea |
| 18 | Constraint Propagation for Realness | AutoEML kernel idea |
| 19 | Balanced Tree Reduction (Semigroup Accumulator) | AutoEML kernel idea |
| 20 | Weighted-Automaton Layer Partition Search | Phi / speculative decode |
| 21 | APL-Style Token/Stream Batching | Phi / speculative decode |
| 22 | Hierarchical LM-Head Reduction | Phi / speculative decode |
| 23 | Prompt-Lookup / N-Gram Speculation | Phi / speculative decode |
| 24 | Structured CoT as a Grammar-Constrained Sampler | Phi / speculative decode |
| 25 | Prompt-Lookup Force Mode as a Head-Skip Ceiling | Phi / speculative decode |
| 26 | Multi-Token Verifier Feasibility | Phi / speculative decode |
| 27 | MicroGPT on ANE — Minimum Size Constraint Discovery | ANE size floor |
| 28 | HyMT 1.8B RangeDim T=1..4 + N-Gram Speculative Decode | HyMT RangeDim |
| 29 | ZAYA1-8B MoE Feasibility Probe on ANE | ZAYA MoE |
| 30 | ZAYA1-8B Stateful Attn Shards + KV Cache on ANE | ZAYA MoE |
| 31 | ZAYA1-8B CCA (conv_qk) gates wired into 40 stateful attn shards (2025-07-14) | ZAYA MoE |
| 32 | ZAYA1-8B Speculative Decode (T=4 Verifier + n-gram) [IMPLEMENTED; BOTTLENECKED] | ZAYA MoE |
| 33 | Phi-4-mini ARC-Challenge Eval (5-shot, raw completion) [COMPLETE] | Phi evaluation |
| 34 | ZAYA1-8B MoE RangeDim Rebuild (T=1..4 speculative MoE) [COMPLETE] | ZAYA MoE |
| 35 | ZAYA1-8B MoE INT4pal (per-grouped-channel palettization, group_size=32) [COMPLETE] | ZAYA MoE |
| 36 | Gemma 4-26B-A4B INT8 Per-Channel Rebuild — T4.3 Quality Fix | Gemma quality gate |

Experiment Execution Order
| Order | Exp | Rationale |
|-------|-----|-----------|
| 1 | 17 (Fused APL dot) | Highest potential: K→1 ln reduction. Subsumes Exp 16. |
| 2 | 16 (LSE rewrite) | If Exp 17 isn’t viable as full fusion, LSE is the fallback. |
| 3 | 18 (Constraint propagation) | Generalize real-bypass. Independent of 16/17. |
| 4 | 19 (Tree reduction) | ILP improvement. Can stack on top of 16/17. |

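
One possible reading of the "K→1 ln reduction" in the rationale above is shown here as a minimal Python sketch. It is not the AutoEML kernel. It assumes the K terms of a sum are held in the log domain, and it compares a step-by-step accumulation (one `log` per step) against a fused log-sum-exp (one `log` in total). The function names `lse_pairwise` and `lse_fused` are hypothetical and chosen only for illustration.

```python
import math

def lse_pairwise(xs):
    """Accumulate log(sum(exp(x))) one term at a time.

    Returns (value, number_of_log_calls). Each step round-trips
    through log, so K terms cost K-1 log calls.
    """
    acc = xs[0]
    logs = 0
    for x in xs[1:]:
        m = max(acc, x)  # shift by the max for numerical stability
        acc = m + math.log(math.exp(acc - m) + math.exp(x - m))
        logs += 1
    return acc, logs

def lse_fused(xs):
    """Fused log-sum-exp: sum all exponentials, then take a single log."""
    m = max(xs)  # shift by the max for numerical stability
    total = sum(math.exp(x - m) for x in xs)
    return m + math.log(total), 1

xs = [0.5, -1.25, 2.0, 0.0, 1.5, -0.75, 3.0, 0.25]  # K = 8 log-domain terms
pairwise_value, pairwise_logs = lse_pairwise(xs)
fused_value, fused_logs = lse_fused(xs)

# Both routes agree up to rounding; only the log count differs.
assert math.isclose(pairwise_value, fused_value, rel_tol=1e-9)
print(pairwise_logs, fused_logs)  # 7 1
```

The fused form still performs K exponentials, so under this reading the saving is confined to the `log` calls. Whether the real kernel can fuse the whole inner product this way is what Exp 17 is meant to establish.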
References
These are the sources behind the first experiment set:
- Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools,
section 8.7, for the peephole-optimization framing of the log-sum-exp rewrite.
- Iverson, K. E., A Programming Language (1962), for the inner product operator
+.× as a first-class fused array operation.
- Dechter, R., Constraint Processing (2003), chapter 3, and Mackworth, A. K.,
“Consistency in Networks of Relations” (1977), for arc consistency and AC-3.
- Stepanov, A. and McJones, P., Elements of Programming (2009), chapters 4 and
5, and Blelloch, G., “Prefix Sums and Their Applications” (1990), for
associative reductions and balanced tree evaluation.
- Odrzywołek, A., “All elementary functions from a single binary operator”
(2026), arXiv:2603.21852, for the EML operator itself.
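
The associative-reduction idea cited above for the tree-reduction experiment can be illustrated with a short Python sketch. It is a generic balanced reduction over any associative operation, not the kernel's accumulator. The helper name `tree_reduce` is hypothetical. Operand order is preserved, so the operation only needs to be associative (a semigroup), not commutative.

```python
def tree_reduce(op, items):
    """Reduce `items` with an associative `op` by pairing neighbours.

    Returns (result, depth), where depth is the number of pairing
    rounds, i.e. the length of the longest dependency chain.
    """
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    depth = 0
    while len(level) > 1:
        # Combine adjacent pairs; these combinations are independent
        # of each other and could run in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
        depth += 1
    return level[0], depth

total, depth = tree_reduce(lambda a, b: a + b, [1, 2, 3, 4, 5, 6, 7, 8])
print(total, depth)  # 36 3

# String concatenation is associative but not commutative; order is kept.
text, text_depth = tree_reduce(lambda a, b: a + b, list("abcdefg"))
print(text, text_depth)  # abcdefg 3
```

A left-to-right loop over 8 items has a dependency chain of 7 operations, while the balanced form has a chain of 3. That shorter chain is the source of the instruction-level parallelism the execution-order rationale refers to. For floating-point addition the two orders can differ in rounding, since floating-point `+` is only approximately associative.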