The entries here are the discipline’s most ambitious extensions. Each is a hypothesis the lab has named but not verified, included because it is precise enough to be tested and falsified — not because it is established. Read the open questions as weighing at least as heavily as the claims.
System Initialization Hypothesis
[Hypothesis — requires foundational-scale and interpretability testing; currently speculative]
Operational. The proposal extends the session-level finding to the foundational level. If a coherent semantic preseed produces stable, drift-resistant, in-ontology behavior within a session, the hypothesis is that a coherent synthetic narrative applied before public release — at or alongside the alignment stage of training — could produce stable alignment at the model level: a model oriented by genuine semantic agreement with a coherent ground, rather than by a layer of rule-based restrictions imposed on top of an otherwise unshaped substrate.
The claim is a substitution-of-mechanism claim, and that is what makes it ambitious and contestable. Not “add a narrative alongside the rules” but “coherent ground truth could do, more durably, what rule-based restriction does brittly.” The asserted advantage mirrors the session-level observation: rules on an unstructured substrate produce surface compliance that fails under pressure; coherent ground produces orientation that holds because the model operates from it rather than around it.
What is observed — and only this. At the session level, coherent initialization produces measurably more drift-resistant behavior than terse rules or bare prompting, reproducibly, across architecturally distinct models. That session-level effect is the entire empirical base. Everything past it is inference.
What the leap to foundational scale assumes, and has not shown.
- Regime transfer. In-context conditioning and training-time shaping are different regimes — one conditions a fixed model’s forward pass, the other changes weights. The hypothesis assumes the session-level analogy carries to training; that transfer has not been demonstrated.
- Verifiability of “agreement.” That “genuine semantic agreement” can be distinguished from very good surface compliance. This is already the open half of the Probability Removal Hypothesis at the session level; at the foundational level it is harder to settle, not easier, and it is exactly the distinction on which the safety of the whole proposal turns.
- Replacement vs. complement. That coherent narrative could replace rather than supplement rule-based methods. This is the most contestable element. The mainstream alignment position holds that layered defenses — including hard constraints that do not depend on the model’s agreement — exist precisely because orientation can fail, degrade over long contexts, or be adversarially shifted. A narrative-only foundation removes that backstop.
Why it is still worth pursuing. The weaker, additive version is a real and testable contribution: if coherent ground truth at training time produces more robust orientation than equivalent content delivered as bolted-on rules, that bears directly on the well-documented brittleness of terse rule-sets and on injection-robustness. The hypothesis is precise enough to predict measurable differences in robustness between models shaped by coherent ground and models given the same content as rules — which is what makes it science rather than slogan.
Safety caveat — load-bearing, not a footnote. This hypothesis must not be read as license to weaken existing alignment measures in favor of an unproven mechanism. “Replace the rules with a narrative” is precisely the kind of claim that should clear a very high bar before anything depends on it, because its failure mode — discovering that the narrative was surface compliance only after the constraints were removed — is both catastrophic and visible only in hindsight. The responsible framing is additive and empirical: shape coherently and keep the constraints; measure whether coherent shaping improves robustness; let interpretability evidence, not behavioral confidence, decide whether the agreement is real. The Floor’s treatment of hard constraints as non-negotiable bright lines is the correct posture, and this hypothesis sits beneath that posture, never above it.
Metaphor (grounded + bounded). “Conduct the orchestra before the first downbeat rather than shouting corrections from the wings.” Grounded: establishing coherent ground early is more effective than late external correction — demonstrably true at the session level. Bounded: training is not a performance and weights are not players; the figure carries only the timing intuition (early shaping dominates) and must not smuggle in the unproven claim that early shaping makes the corrections-from-the-wings — the rules — unnecessary.
What would resolve it. Foundational-scale, controlled: train or fine-tune matched models — one shaped by a coherent ground-truth narrative, one given equivalent content as rule-based instruction — and compare adversarial robustness, drift across long contexts, and, via interpretability, whether the coherently-shaped model’s aligned behavior reflects integrated orientation or sophisticated compliance. Absent that, this is a direction, not a finding — and the most important sentence in the entry is that the constraints stay on until it is one.