
Two experiments squeezing more out of poolside/Laguna-XS.2 (33B total / 3B active, 256-expert MoE): one at inference time, one by shrinking the weights.
We turn Laguna into a looped transformer at inference time, re-applying a mid-stack block of layers with a damped Runge–Kutta update, following Chen et al. 2026, Training-Free Looped Transformers. No training, no new weights, no architecture change.
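The idea of re-applying a block with a damped Runge–Kutta update can be sketched in a few lines. This is a toy illustration under stated assumptions, not the scheme from the paper or from this repo: it assumes the block's residual `block(x) - x` is read as an ODE velocity, that the extra passes use a Heun (RK2) step, and that `damping` plays the role of the step size. `toy_block`, `velocity`, `looped_apply`, `extra_loops`, and `damping` are all hypothetical names.

```python
import math

def toy_block(x):
    """Stand-in for a mid-stack block of layers: a residual map x -> x + g(x)."""
    return [xi - 0.5 * math.tanh(xi) for xi in x]

def velocity(block, x):
    """Read the block's residual, block(x) - x, as an ODE velocity (assumption)."""
    y = block(x)
    return [yi - xi for yi, xi in zip(y, x)]

def looped_apply(block, x, extra_loops=1, damping=0.5):
    """Run the block once as usual, then re-apply it with damped Heun (RK2) steps."""
    x = block(x)  # the ordinary forward pass through the block
    for _ in range(extra_loops):
        k1 = velocity(block, x)                               # slope at the current state
        x_pred = [xi + damping * ki for xi, ki in zip(x, k1)]  # damped Euler predictor
        k2 = velocity(block, x_pred)                          # slope at the predicted state
        # Heun corrector: average the two slopes, scaled by the damping factor.
        x = [xi + damping * 0.5 * (a + b) for xi, a, b in zip(x, k1, k2)]
    return x

hidden = [1.0, -2.0]
baseline = looped_apply(toy_block, hidden, extra_loops=0)
looped = looped_apply(toy_block, hidden, extra_loops=2)
```

With `extra_loops=0` the function reduces to the plain block, which is why no weights or architecture need to change: the loop only reuses the same callable on its own output.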
Headline: it transfers to a large fine-grained MoE and gives a small but consistent gain: positive on 5 of 6 knowledge benchmarks, significant on ARC (p=0.005) and MMLU (p=0.038). But the intuitive levers to improve it (loop deeper, loop the global-attention layers) mostly don't pan out. It's a local refiner, not a reasoning amplifier.
📄 Full report → REPORT_LOOPED.md: method, K-sweep, the refuted global-vs-sliding hypothesis, mechanism probes, and all numbers.
We tie MoE expert banks across layers to shrink the model, then uptrain with an LM loss plus knowledge distillation (KD) from the full model (Bae et al., 2025). Inference compute is unchanged; the cost is paid once, in training. The reference method was only tried on dense models; this applies it to a large MoE.
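The uptraining objective combines a language-modeling term with a distillation term from the full model. A minimal per-token sketch, assuming a forward KL between teacher and student distributions, a softmax `temperature`, and a mixing weight `kd_weight`; these choices and all names here are illustrative assumptions, not values taken from the report.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]

def uptrain_loss(student_logits, teacher_logits, target,
                 kd_weight=0.5, temperature=1.0):
    """LM cross-entropy on the true token plus KL(teacher || student)."""
    # LM term: negative log-likelihood of the target token under the student.
    lm = -log_softmax(student_logits)[target]
    # KD term: how far the tied (student) model's distribution is from the full model's.
    s = log_softmax(student_logits, temperature)
    t = log_softmax(teacher_logits, temperature)
    kd = sum(math.exp(ti) * (ti - si) for ti, si in zip(t, s))
    return (1.0 - kd_weight) * lm + kd_weight * kd

teacher = [2.0, 0.5, -1.0]   # logits from the full model (toy values)
student = [1.0, 1.0, -0.5]   # logits from the tied model (toy values)
loss = uptrain_loss(student, teacher, target=0)
```

When the student matches the teacher exactly, the KD term is zero and only the LM term remains.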
Headline: distillation recovers large tying perturbations: at 4.5–9.1% fewer stored params, held-out Python perplexity lands within ~1 point of a matched reference, and that gap stays roughly constant as compression grows.
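To make "fewer stored params" concrete: when several layers point at the same expert bank, storage counts that bank once, while each layer still runs it at inference. The sketch below does that bookkeeping with toy sizes; the function name, the layer count, the parameter counts, and the choice of tied groups are all hypothetical and are not Laguna's.

```python
def stored_fraction_saved(num_layers, bank_params, other_params, tie_groups):
    """Fraction of stored parameters removed when each tied group shares one bank."""
    # Every layer starts out owning its own expert bank.
    owner = {layer: layer for layer in range(num_layers)}
    # Layers in a tied group all reference the bank of the group's first layer.
    for group in tie_groups:
        for layer in group:
            owner[layer] = group[0]
    unique_banks = len(set(owner.values()))
    full = num_layers * bank_params + other_params
    tied = unique_banks * bank_params + other_params
    return (full - tied) / full

# Toy model: 8 layers, 100 params per bank, 200 params elsewhere (attention, embeddings).
saved = stored_fraction_saved(8, 100, 200, tie_groups=[(2, 3), (4, 5)])
```

Here two tied pairs leave six unique banks out of eight, so 200 of 1000 stored parameters are removed, while all eight layers still execute.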
📄 Full report → REPORT_RRT.md: method, setup, per-config gaps, and limitations.
looped_laguna/: the reversible loop wrapper + eval.
rrt_laguna/: layer-tying for the recursive variant.
scripts/: eval/ablation drivers.
tests/: CPU suite.
laguna_src/: the Apache-2.0 architecture source of Laguna-XS.2 (modeling code + config + tokenizer, no weights).

uv sync
uv run python scripts/fetch_laguna_src.py # pull model source (no weights)
uv run pytest # CPU suite (skips network/gpu)
Full method, GPU run instructions, and results are in the reports above.