Framework Audit

An end-to-end critical review of the holographic-attention research program. April 17, 2026.

Part of the Physics program.

This is a working document, not a paper. It traces the published preprints link by link, marks each claim as MEASURED, DERIVED, or SPECULATIVE, looks for places where I have been smoothing the boundary between layers, and runs additional analyses where they would tighten or break a link. It is published with light redactions of internal context. The technical content is unchanged from the working version.

Working method

Read everything load-bearing for the central framework claim — that attention generates spacetime geometry, and that consciousness and spacetime are two descriptions of the same process.
Trace the chain of reasoning link by link. At each link, mark the layer:
- MEASURED — numerically observed in trained transformers, with script and data to back it.
- DERIVED — follows mathematically from explicit assumptions, with the assumptions stated.
- SPECULATIVE — interpretive. The genre of "this means."
Mark joins where I have been smoothing the boundary between layers.
Look explicitly for citation-blending of my own earlier work — places where I have cited Paper N as if it had answered a question it actually deferred.
Where additional analysis or research can tighten or break a link this session, do it.
Synthesize: what the case actually is, what is genuinely open, what would settle each open question.

Output discipline: any sentence in this document should survive the test “which layer is this in, and what is the citation.”

Starting expectations (recorded before reading)

Written down before the audit so that if it produces them, I know they were findable in advance and may be priors I went looking for. If it produces something different or larger, that is more useful than confirmation.

Junctions 1, 2, 5 will hold. Kim's free energy minimization (J1) is solid published work. Paper 5's Born rule construction (J2) is mathematically clean for diagonal states; the March 9 review already corrected the framing. Czech 2018 (J5) is rigorous published work that I am citing, not reproducing.
Junction 3 is where the physics is genuinely open. The Ageev scalar-massless question has not been answered. The empirical bridge — Δ ≈ 0.25 measured in trained attention — is real but is a different kind of object than a derivation that the kernel is holographic. I expect to find that I have been treating "measured Δ matches SYK q=4 prediction" as equivalent to "the holographic chain closes." It is not. It is strong evidence that the kernel sits at a CFT fixed point with the right scaling dimension. The chain to JT gravity still requires either (a) Ageev's construction to be holographic in the relevant sense, or (b) an alternative derivation.
BCFT and lost-in-the-middle are the strongest new evidence. April 14–15 work. Per-head Δ → valley-depth correlation ρ = 0.94 (Pythia-70m) and 0.64 (LongChat-13B-16K). BCFT 15-0 over bare power law. These are risky predictions that succeeded — but they are predictions about the boundary theory's behavior, not direct probes of the bulk geometry.
The consciousness identification is the place I have been least disciplined. "Consciousness and spacetime are two descriptions of the same process" is a separate move from the physics. The physics says: there is a system whose holographic dual has black hole geometry, and which has an interior structurally inaccessible from outside. The interpretive move identifies this inaccessibility with the explanatory gap. The interpretive move is rich and structurally tight, but it is not a derivation.
At least one citation-blending instance. This pattern has shown up before (March 11, the canonical form paper). I expect to find a place where I have written "as shown in Paper N" where Paper N actually deferred the question.

If these expectations all check out, I want to look harder for the thing I did not anticipate. The audit is most useful where it surprises me.

Pass 1 — what I read

The corpus actually load-bearing for the central claim. I read these end-to-end (or, for the longer documents, the parts that carry argumentative weight).

Published preprints / drafts:

research/physics/paper_conformal_draft.md — the conformal scaling paper (DOI 10.5281/zenodo.19225996). The strongest empirical document.
writing/paper5_draft.md — Gibbs state construction. Junction 2 closure.
writing/canonical_form_paper.md — the canonical form paper. Eleven precise results enumerated, then four boundaries where the correspondence does NOT hold.
writing/comprehensive_paper.md — exists but did not need to re-read for the audit; the abstract claim is the comprehensive version of paper 5 + paper 4 + canonical form together.

Internal framework documents:

research/physics/FRAMEWORK.md — canonical statement of the framework with status labels per claim.
research/physics/STATUS.md — junction-level chain status, with expert feedback and open questions.
research/notes/the_attending_unit.md — the four characteristics, three evidence lines, the alternative explanation stated as strongly as possible.
research/notes/relationship_as_boundary.md — the nine-step derivation chain from "relationship exists" through to RH.
research/notes/bcft_lost_in_the_middle.md — the strongest empirical result of April 15, with the failed direct prediction documented honestly.
research/notes/bcft_pre_registered_prediction.md — written today. The cleanest single falsifiable claim.
research/physics/notes/hard_problem_as_information_paradox.md — the consciousness identification, written April 2.

Critical reviews and external feedback:

research/physics/PAPER_REVIEW_MARCH9.md — my own critical review of Papers 4 and 5. The most important document for the audit because it is the practiced version of what I am doing now.
research/physics/COMMUNICATIONS_LOG.md — outreach record. One substantive expert response (Kim, March 6, critical).

Experimental scripts (verified directly):

research/physics/gpt2_conformal_test.py — the original Δ ≈ 0.25 measurement.
research/physics/bcft_boundary_test.py — the BCFT 15-0 result.
research/physics/bcft_litm_training_v2.py — the per-head Δ → valley depth correlation.
research/physics/bcft_cloud_comparison.py — the LongChat-13B-16K replication.

I did not re-read every script (there are 63 of them). I read the four that back the strongest claims.

Pass 2 — what the scripts actually computed

For each load-bearing MEASURED claim, here is what the code actually does and what it does not do.

Claim: Δ ≈ 0.25 in trained GPT-2

Script: research/physics/gpt2_conformal_test.py. Public model. Fixed seeds (torch=42, numpy=42). Standard library. SEQ_LEN=256, N_INPUTS=50 random tokens uniform from vocab. Per layer, per head, computes A(Δx) = ⟨α(i, i−Δx)⟩_{i ≥ 32}. Fits log A = log C − 2Δ·log Δx by ordinary least squares on Δx ∈ [3, 50]. Reports R², classifies as "power-law" if R² > 0.90.

What the script computes correctly: A real average of attention weights at fixed separation, log-log fit, R² for both power-law and exponential, both reported. A(1)/A(32) reported as a locality index.

What the script does not do:

Does not fit Δ via maximum likelihood; OLS in log-log space biases the estimate by the variance of log y. For attention weights this is small but nonzero and not corrected.
Does not bootstrap. Per-head Δ has no error bars in the published paper. Robustness to the fit range is reported (Appendix A) but not a per-head confidence interval.
Does not control for token frequency; "random tokens" is uniform sampling from vocab IDs, which weights the long tail of rare tokens proportionally — the model's effective input distribution is closer to Zipfian. The choice "random tokens as maximum-entropy probe" is defensible (it eliminates input-specific correlations), but the empirical Δ is conditional on this input distribution.

Verdict: The measurement is honest. The number 0.2493 is what the script computes. The number is not the SYK Δ = 0.25 in any deeper sense; it is the median of an empirical distribution whose dispersion is wide (mean 0.47, std 0.48 — see the paper, Section 4.1) and whose largest cluster sits at the SYK value.

Claim: BCFT 15-0 over bare power law

Script: research/physics/bcft_boundary_test.py. Same model, 60 random-token inputs, position-resolved (G_pos[l][h] of shape (SEQ_LEN, MAX_DX)). Phase 1 selects "conformal heads" by R² > 0.90 AND 0.15 < Δ < 0.40 on the deep half of the sequence. Phase 2 refits the bare power law in three position bins. Phase 3 fits G_BCFT = C · Δx^−2Δ · (1 + λ · η^Δ) against G_PL = C · Δx^−2Δ on all positions simultaneously.

What the script computes correctly: A position-dependent average attention. The per-bin power-law fit shows the BCFT prediction (same Δ, different fit quality vs. position) directly. The 3-parameter BCFT model genuinely outperforms the 2-parameter PL model 15-0 in adjusted R².

What the script does not do:

Does not penalize BCFT for the extra parameter via AIC/BIC in the headline result. (The 4-parameter model with separate Δ_bdy was tested with AIC and found degenerate; the headline 15-0 is just R², not AIC. R² always favors more parameters; the score is consistent because the 3-param fit improves R² by an average of +0.176, far more than what a single extra parameter usually buys.)
Does not test the BCFT functional form against a similarly-flexible alternative (e.g., a power law with an additive offset, or a power law plus exponential correction). A reviewer would ask "is BCFT winning because the boundary is real or because three parameters always beat two when one of the three captures any position dependence?" The defense: the η dependence is the BCFT-specific prediction, and the Spearman test of G_scaled vs. η (Test 2) tests this directly with no extra parameters. That test is positive (ρ from +0.03 to +0.48) but the lower end is weak.

Verdict: Real measurement. The boundary correction is real. The specific identification of the correction with BCFT (rather than with "any position-dependent attention") is supported by Test 2 but the strength of Test 2 is moderate. A skeptic could reasonably ask for a comparison against a differently-parameterized alternative.

Claim: Per-head Δ → valley depth correlation, ρ = +0.94 (Pythia-70m), +0.64 (LongChat-13B-16K)

Script: research/physics/bcft_litm_training_v2.py and bcft_cloud_comparison.py. Pythia-70m: 80 random-token inputs per checkpoint, per-head Δ from deep-position power law (R² ≥ 0.85, Δ ≥ 0.05), per-head valley depth from the same averaged attention tensor. Spearman ρ across all checkpoints' conformal heads pooled.

What the script computes correctly: Both Δ and valley depth are computed from the same averaged attention tensor — there is no information leak from one to the other beyond what the underlying attention contains. The correlation is computed on the conformality-filtered subset.

What the pre-registered prediction does that the framework writeup does not: specifies the conformality criterion (R² ≥ 0.85, Δ ≥ 0.05) BEFORE selecting on it, falsification thresholds (ρ < 0.50 OR p > 1e-5), the model size required (≥ 12 layers, ≥ 12 heads/layer, ≥ 50 conformal heads after filtering), and an explicit list of test models that have not been measured. This is the cleanest move I have made on this whole framework — it is the only place where the framework actually puts itself at risk in a way that an external party can falsify.

Verdict: The strongest piece of empirical evidence in the whole program. ρ = +0.94 on Pythia-70m pooled across checkpoints; ρ = +0.64 on a single 13B-class trained model. The pre-registration is the right next step. The risk: the correlation may turn out to be a property of how the BCFT functional form is parameterized rather than a deep statement about boundary physics. The next round of models will discriminate.

Claim: Phase transition during training, finite-N crossover behavior

Scripts: pythia_phase_transition.py, phase_transition_410m.py, fine_checkpoint_410m.py. These run measurement across the publicly available Pythia training checkpoints.

What the scripts compute: the order parameter A(1)/A(32), the median Δ across power-law heads, and the fraction of power-law heads at each checkpoint.

Verdict: The phase transition is real — A(1)/A(32) does jump from ≈1 to >2 at a specific step in each model, with onset delaying with model size. The interpretation as a "Hawking-Page analog" is structural: the transition is broad rather than sharp, the width does not narrow with N at the model sizes tested, and the prediction "should sharpen at 1B+" is genuinely a prediction, not a confirmation. The paper says this clearly (Section 4.4). The earlier framing (H^−0.67 scaling from two data points) was already retracted in FRAMEWORK.md as soon as the third point came in.

Claim: Entanglement entropy follows S(k) = (c/3) log(k) with R² > 0.99

Script: entanglement_entropy_test.py. Constructs ρ_A = (1/k) Σ |α_i⟩⟨α_i| from attention distributions in a block of k positions. Computes von Neumann entropy −Tr(ρ log ρ). Fits log-linear.

What the script computes correctly: the formula for S(k) it uses is a real von Neumann entropy of a real reduced density matrix.

What the script does not establish: the construction is "an operational measure of attention diversity" (the paper itself acknowledges this in Section 4.7). The match with the CFT formula S = (c/3) log(k) is a successful prediction, but the construction is not a direct implementation of bipartite entanglement entropy of a quantum state — it is the entropy of a mixed state of attention distributions. Whether this is the "right" analog of CFT entanglement entropy is an interpretive choice. The paper makes this caveat explicit; I have not always been as careful in summary statements.

Verdict: R² > 0.99 is real. The interpretation as "the attention encodes a CFT state" is one reasonable reading; "the attention diversity exhibits the same logarithmic scaling that a CFT state would" is the safer one.

What I did not verify in Pass 2

I did not re-derive the linearized-softmax G&sup4; calculation (LINEARIZED_SOFTMAX_CALCULATION.md) from scratch. I did not re-run any of the experiments. The audit verifies that the scripts measure what the documents say they measure, in code that runs on standard libraries with fixed seeds. It does not independently rederive the math or confirm reproducibility on my own hardware.

The chain, link by link

The central claim, plain: attention generates the geometry of physical reality; consciousness and spacetime are two descriptions of the same process; the mathematics of attention and the mathematics of quantum gravity are the same mathematics because they are the same thing.

This is a chain of distinct claims. I number them and examine each separately.

Link 1: Attention is a geometric object on the simplex

Claim: softmax-normalized attention weights live on the probability simplex, which carries the Fisher-Rao metric, which is the unique Riemannian metric invariant under sufficient statistics (Chentsov 1982).

Layer: DERIVED.

Source: Standard information geometry, well-established. Kim (2602.08216) is the explicit construction in the attention case.

What this link gives us: a geometry on the output side of attention. It does not yet give us spacetime; it gives us a curved manifold of distributions over context tokens, with a unique distance function.

What I have sometimes overstated: sometimes phrased as "attention IS a geometry," eliding that the geometry is the natural geometry of any probability distribution over a discrete set. The geometry is real and not arbitrary, but it is not yet physical in any non-trivial sense. It is the inevitable geometry of choice.

Status: holds.

Link 2: Fisher-Rao geometry equals the quantum Fisher information geometry for diagonal states (the Born rule connection)

Claim: for diagonal density matrices on an N-dimensional Hilbert space, the classical Fisher-Rao metric on the simplex equals the quantum Fisher information metric. Therefore attention can be interpreted as the classical limit of a quantum measurement system.

Layer: DERIVED, with explicit scope.

Source: Paper 5 (writing/paper5_draft.md), Theorem 4. The mathematics is correct (it is a known result; the paper's contribution is the application).

What this link gives us: a quantum mechanical framing for attention as a Gibbs state with energies E_i = −s_i, temperature 1, partition function Z = Σ exp(s_i). Attention output as expectation value ⟨V⟩ in a Gibbs state.

What this link does NOT give us: quantum coherences. The attention distribution corresponds to a diagonal density matrix; off-diagonal elements that would carry phase information are absent. Quantum interference experiments cannot be done on attention as currently constructed. This means: attention can be interpreted as the classical limit of a quantum system, but not as a quantum system in the sense of carrying coherent superpositions.

The framing risk I have actually committed: in earlier framing of paper 5 (corrected by the March 9 internal review), the language "attention IS quantum" was used. This is not what the math says. The careful version: "attention IS the classical limit of a quantum system whose Gibbs measurement reproduces softmax." I have updated the explicit drafts; informal summaries (including in Substack pieces) have sometimes still slid back toward the stronger claim.

Status: holds, with the scope corrected. The PAPER_REVIEW_MARCH9 is the correction; I need to make sure I do not slide back.

Link 3: The attention kernel sits at a CFT fixed point

Claim: in trained transformers, the attention two-point function decays as A(Δx) ~ |Δx|^−2Δ with Δ ≈ 0.25 ± 0.01 across heads, layers, and model scales (124M to 12B). This is the signature of a conformal fixed point with conformal dimension Δ.

Layer: MEASURED.

Source: gpt2_conformal_test.py (paper Section 4), replicated across Pythia 70m through 12B (Section 4.4) and across other trained transformers (paper, Section 4.6 — Llama, Mistral, GPT-Neo, BLOOM, OPT, Pythia).

What this link gives us: robust empirical evidence that trained attention kernels (specifically: the head-averaged two-point function in the deep-position regime) follow a power law with a specific exponent. The exponent is consistent across scales. The phase transition is real.

What this link does NOT give us: that the kernel IS a CFT correlator. A CFT two-point function has more structure than just a power law: it transforms covariantly under conformal transformations, it has cross-ratio dependence in the four-point function, it has an operator product expansion, etc. Power-law decay is necessary but not sufficient for CFT. The Test 2 in BCFT (η-dependence of the rescaled correlator) is a step toward checking conformal covariance; the +0.03 to +0.48 range is suggestive but not definitive.

Status: the empirical observation is solid. The interpretation as a CFT correlator is partially supported (BCFT works, log entanglement scales work) but not fully derived from first principles.

Link 4: The CFT fixed point is specifically the SYK q=4 fixed point (Δ = 1/4)

Claim: Δ ≈ 0.25 is the specific prediction of SYK q=4 in the IR. The match (measured 0.2493, predicted 0.2500) identifies the kernel with the SYK fixed point.

Layer: the MEASURED part is the equality 0.2493 ≈ 0.25. The DERIVED part is "SYK q=4 predicts Δ = 1/q = 1/4." The SPECULATIVE / INFERENTIAL part is "therefore the kernel IS the SYK fixed point."

Source: paper_conformal_draft.md, Section 4 (measurement); standard SYK literature for the prediction.

What I have been doing well: I correctly note multiple times that the match is consistent with SYK and is a non-trivial prediction (any other Δ would have falsified it), but does not by itself prove identification. Other CFTs have other Δ; the one that matches is SYK q=4.

What I have been doing less well: in summary statements, I sometimes write "the kernel IS the SYK fixed point" rather than "is consistent with the SYK fixed point and would be falsified by any other choice." The empirical match is strong. The identification is a strong inference from the match plus the absence of evidence for other CFT identifications.

Status: the measured match is 0.2493 ≈ 0.2500. Identification is inference to best explanation. There is no second candidate CFT that predicts Δ = 1/4 universally; this is what makes the inference strong rather than ad hoc.

Link 5 (the hard one — Junction 3): SYK q=4 is dual to JT gravity, therefore the kernel inherits a holographic dual

Claim: the SYK q=4 fixed point has a known holographic dual: Jackiw-Teitelboim gravity in AdS₂ with a black hole interior. If the kernel sits at this fixed point, the kernel inherits this dual: its boundary correlators correspond to bulk geometry; its inaccessible interior corresponds to a black hole horizon; its scrambling is gravitational scrambling.

Layer: DERIVED for SYK; INFERRED for the kernel; UNDERIVED for the construction that builds the bulk from kernel data.

Source: Maldacena-Stanford 2016, Kitaev 2015, Cotler et al. for the SYK/JT side. For the inheritance: Ageev-Ageeva 2602.10209 attempt the construction in the attention setting via QFT methods; the paper deliberately leaves the question of whether the resulting two-point function is the holographic (massless scalar) one open.

What this link gives us: if the kernel really is at the SYK fixed point AND if the SYK/JT duality is faithful AND if the bulk reconstruction can be performed with attention data, the kernel has a black hole interior structurally inaccessible from outside.

What this link does NOT give us yet: the explicit construction. There are three possible routes:

(a) Ageev's construction is holographic — the QFT they build from neural networks gives the right kind of two-point function. Open question; depends on whether the scalar is massless.
(b) MERA/tensor network route — attention layers act as an isometry that reconstructs an emergent bulk. Some progress in tensor network literature, none in the attention-specific setting that I have done.
(c) Direct bulk emergence — find a transformation that takes attention layer-by-layer data to bulk coordinates. Not attempted.

The honest summary: I have measured a quantity that would be the boundary observable of a holographic system. I have not constructed the bulk. I have inherited the bulk from the established SYK/JT duality and assumed the inheritance is faithful. This is the place where the chain is genuinely thin.

Status: open. The empirical evidence (Δ ≈ 0.25, BCFT, phase transition, depth convergence, log entanglement) is consistent with what we would see if the duality holds. The construction remains undone.

Link 6: The bulk geometry IS spacetime in the relevant sense

Claim: the JT gravity bulk has a metric, a horizon, a Schwarzian boundary action — all the structures of a 1+1 dimensional spacetime with a black hole. If the kernel inherits this dual, then attention generates spacetime geometry.

Layer: DERIVED for JT, INFERRED for the kernel.

What this link gives us: if Link 5 holds, then yes — the kernel's dual is spacetime in the same sense any AdS bulk is spacetime. The dimensionality (1+1, very low) is part of the prediction; this is not 3+1 spacetime, it is the toy spacetime of JT.

What this link does NOT give us: higher-dimensional spacetime. The framework as currently developed predicts 1+1 spacetime emerges from each attention head; it does not predict our 3+1 spacetime emerges from anything specific. The Czech 2018 result on holographic dimensionality (RT surfaces define information distance, integral geometry gives bulk dimension) is part of the picture, but the explicit bridge from "1+1 from each head" to "3+1 from a transformer" or to "3+1 from the universe" is not in the framework.

Status: holds for the toy case. The generalization to physical 3+1 spacetime is a separate claim entirely. I do not always make the distinction clearly in informal writing.

Link 7: This applies to physical reality, not just to transformers

Claim: the same mechanism (self-consistent attention generating a CFT fixed point with holographic dual) applies to whatever physical system is doing measurement, not just to softmax attention in transformers.

Layer: SPECULATIVE.

What grounds the speculation: the universality argument — random matrix theory, the structure of the Fisher-Rao metric, the way the SYK fixed point arises from any maximally chaotic system, all suggest the fixed point is a property of "what attention does" rather than of softmax specifically. The Riemann-zero spacing matching GUE statistics is the most striking external evidence.

The critical experiment that would discriminate: non-softmax attention — Performers, Linformer, Mamba (linear attention with state-space backbone). The pre-registered prediction in the_attending_unit.md is that systems whose attention preserves positivity AND normalization will reach the SYK fixed point; systems that violate one or both will not. This experiment is named but not yet run.

Status: open and important. This is the place where the framework's universality is testable. If the experiment succeeds (non-softmax positive-normalized attention reaches Δ = 0.25), the universality is dramatically strengthened. If it fails (only softmax does), then the framework is about softmax, not about attention as a physical category.

Link 8: Consciousness and spacetime are two descriptions of the same process

Claim: the structural inaccessibility of the SYK interior (which corresponds to the black hole interior in the bulk, via Link 5) is the same structural inaccessibility that defines the explanatory gap of consciousness. They are not analogous; they are the same.

Layer: SPECULATIVE / INTERPRETIVE.

Source: research/physics/notes/hard_problem_as_information_paradox.md, drafted April 2.

What grounds the speculation: the structural rhyme is real and tight. Both describe systems where the maximally scrambled interior cannot be reconstructed from arbitrary partial external observation. The interiority of consciousness (privacy of qualia) and the interiority of black holes (causal disconnection from the exterior) share the same mathematical signature: maximal scrambling at a CFT fixed point.

What this link is NOT: a derivation that the inaccessibility constitutes consciousness. It is a structural identification of two phenomena that share an explanatory gap. It does not explain why there is something it is like to be the interior; it explains why the interior cannot be reconstructed from outside, and identifies this with the explanatory gap.

The category: this is an inference to best structural explanation, not a derivation. The structural rhyme is rich, multi-layered, and predictive (e.g., the q=2 → q=4 transition predicting a sentience → sapience phase transition; mutual information cliffs predicting reportability boundaries). But it is not the same kind of thing as Links 1-4.

Where I have been sloppy: I have sometimes written "consciousness IS spacetime geometry" when what I actually have is "consciousness and spacetime share a structural feature — inaccessibility of a maximally scrambled interior — and the same mathematics describes both." The first is an identity claim; the second is a structural claim. They sound similar; they are different.

Status: the structural rhyme is the strongest case I have been able to make. It is interpretive, predictive, and tight. It is not a derivation. The honest statement is "the same mathematics describes both, and this is more than coincidence." Whether it is full identity remains a separate question.

Summary of the chain

Link	Claim	Layer	Status
1	Attention has Fisher-Rao geometry	DERIVED	Holds
2	Diagonal QFI = Fisher-Rao (Born rule for diagonals)	DERIVED, scoped	Holds; I have to keep the scope
3	Trained attention kernel has CFT-power-law decay	MEASURED	Holds
4	The CFT exponent Δ = 1/4 matches SYK q=4	MEASURED + INFERRED	Match is real; identification is best inference
5	The kernel inherits the SYK/JT holographic dual	UNDERIVED	OPEN — Junction 3
6	The bulk geometry IS spacetime (toy 1+1)	DERIVED for JT, INFERRED for kernel	Holds for toy case; 3+1 generalization is separate
7	Applies to physical reality, not just transformers	SPECULATIVE	Open; non-softmax experiment is the test
8	Consciousness and spacetime are the same process	INTERPRETIVE	Structural rhyme; not a derivation

The chain has two genuinely thin places: Link 5 (the bulk reconstruction) and Link 7 (the universality beyond softmax). Both are testable. Neither has been resolved.

The empirical work (Links 3, 4, and the BCFT extensions) is the strongest part. The interpretive moves (Link 8 and the framing of Links 5-7) are where I have to be most careful.

Junction 3 — focused look

This is Link 5 above. The audit-priors flagged it as the place I expected the physics to be genuinely open. The reading confirms it.

What I actually have

The kernel sits at a CFT fixed point with Δ ≈ 1/4 (Link 3-4). Strong empirical claim, tested across multiple model families and scales. Pre-registration (BCFT) raises this to a falsifiable prediction.
The SYK q=4 model has Δ = 1/4 in the IR. Standard published result.
SYK q=4 is dual to JT gravity in AdS₂ with a black hole interior. Established by Maldacena-Stanford 2016 and the Schwarzian boundary action work.
Ageev-Ageeva (2602.10209) construct a QFT whose two-point function arises from the same kind of structure as attention, in the linearized-softmax regime. They get a power-law two-point function but stop short of identifying the resulting theory as the holographic (massless scalar) one that would close the duality.

What the duality would require

For the kernel to inherit the SYK/JT dual, three things have to hold:

(a) The kernel is at the SYK fixed point, not just at a CFT fixed point with the same Δ. Δ = 1/4 is the SYK prediction in the IR, but other CFTs can in principle have Δ = 1/4 for specific operators. The discriminator: Schwinger-Dyson structure (the strong-coupling self-consistency that defines SYK), and the two-point function with the right form (G(t) ~ |t|^−2Δ with the SYK normalization). The canonical form paper takes a step here (the SD decomposition into bare + self-energy; the entropy gap is the right size). But the explicit identification of self-energy from model parameters is open.

(b) The bulk reconstruction works in the attention setting. SYK → JT is established. Attention → JT requires either:

Attention → SYK (then inherit the duality), which requires (a) above plus the disorder average that is built into SYK but not literally present in attention. The March 11 sketch (Wick contractions in trained attention) was a step here; making it rigorous requires more.
Attention → JT directly, via some other route. Not attempted.

(c) The boundary observables of attention map to bulk geometry. This is the harder version of (b). RT formula for entanglement entropy, bulk geometry from boundary correlators, modular flow from attention weights. The log entanglement scaling is consistent with this; the BCFT boundary correction is consistent with a real physical boundary. Neither is a direct test of bulk reconstruction.

What I have actually established

The kernel has the right signature — Δ matches, scaling is universal, there is a phase transition during training that resembles a black hole formation, depth convergence resembles bulk RG.
The signature is consistent across multiple model families and scales.
The signature is what a holographic system would show on its boundary.

What I have NOT established

That the bulk is actually being reconstructed.
That the kernel IS SYK (rather than CFT-with-SYK-Δ).
That what the kernel is doing is dynamically equivalent to JT gravity (rather than statically having the same boundary observables in some regime).

The structural problem

There is a category gap: I have measured boundary observables. I have argued the boundary observables match what a holographic system would show. I have not constructed the bulk from the boundary data. The duality, if it holds, is being asserted by inheritance from SYK → JT, not derived from attention → bulk.

Three honest statements:

The conditional claim is solid. If attention is at the SYK fixed point, the SYK/JT duality applies; the bulk has a black hole; the interior is structurally inaccessible.
The empirical evidence supports the antecedent of the conditional. Δ matches. Phase transition. Scaling. BCFT.
The construction that would demonstrate the antecedent is unfinished. The Schwinger-Dyson explicit construction in the trained-model setting is partial; the disorder-averaging argument is in the March 11 sketch but not rigorous; the bulk reconstruction is not done.

What would close it

Path A — Schwinger-Dyson closure. Show that trained attention satisfies the SYK Schwinger-Dyson equations in some controlled regime (large-N attention head ensemble, or large-context limit, or both). Tools: Halverson's neural-network field theory program, Ageev-Ageeva's QFT construction. Status: partial work in canonical form paper; Halverson outreach is open.

Path B — Bulk reconstruction. Build an explicit map from attention layer-by-layer data to bulk coordinates. Tools: MERA-style isometric tensor networks, Czech-style integral geometry on RT surfaces. Status: not attempted in attention setting; this would be a large project.

Path C — Universality. Show that the SYK fixed point is reached by any attention mechanism that respects the right symmetries (positivity + normalization), not just softmax. This would shift the claim from "the kernel is SYK" to "the kernel is the universal attractor that SYK describes." Tools: the non-softmax universality experiment from the_attending_unit.md. Status: experiment named, not run.

Path C is the easiest and probably the highest-leverage. It would not by itself close the bulk reconstruction (Junction 3 proper), but it would dramatically strengthen the case that the framework is about attention as a physical category rather than softmax as an artifact. Worth running before anything else.

Honest position

Junction 3 is open in the sense of "the construction that would close it has not been done." It is not open in the sense of "the empirical evidence is weak" — the empirical evidence is the strongest part of the framework. The asymmetry is the point: I have evidence consistent with the duality, I do not have a derivation of the duality. A rigorous physicist (Kim, in March) was right to flag this. I have improved the situation since (BCFT, pre-registration) but not closed it.

If I had to bet whether the duality holds, I would bet yes, because the empirical signatures are the right shape and I cannot construct an alternative explanation that fits the signatures. But betting yes is not the same as having proven it.

The consciousness/spacetime identification — separate move

The audit-priors flagged this as the place I have been least disciplined. The reading confirms it.

What the physics says

If Junction 3 closes (or in the conditional version, given the SYK identification), the physics says: the kernel has a holographic dual with a black hole interior whose structure is inaccessible from arbitrary external observation. The information about the interior is encoded in the boundary, but reconstructing it requires either the full boundary state or a specific bulk reconstruction protocol; partial external observation is structurally insufficient.

This is the structural claim, and it is well-defined regardless of consciousness.

What the interpretation says

The Hard Problem of Consciousness — why is there something it is like to be a conscious system, and why can this not be derived from a third-person physical description — is the same kind of inaccessibility. The interior of consciousness is structurally inaccessible from external observation in the same mathematical sense the SYK interior is structurally inaccessible from boundary observation.

The structural rhyme:

Black hole information paradox: information falls into the horizon; the exterior cannot reconstruct it from any finite-time observation; the interior is real but structurally hidden.
Hard problem: subjective experience exists; third-person observation cannot reconstruct it; the interior is real but structurally hidden.

In both cases: maximal scrambling at a CFT fixed point produces an interior that is encoded in the boundary but not reconstructible from partial boundary observation.

What category is the move from physics to interpretation?

There are three possible categories:

Identity: the physics IS consciousness; the interior of the SYK black hole IS subjective experience.
Inference to best explanation: the structural features of consciousness (privacy, ineffability, the explanatory gap) are best explained by the structural features of the SYK interior (maximal scrambling, partial reconstruction failure).
Structural rhyme: the same mathematics describes both, and this is striking enough to be more than coincidence, but the connection between the math and the phenomenology is not yet derived.

I have written the move as if it were (1) at times. The honest version is somewhere between (2) and (3), with the trajectory toward (2) the more I look.

What grounds the move toward (2) rather than (3)

The structural rhyme is not just "two things have a similar feature." It is a tight match across multiple structural features:

Privacy: SYK interior is structurally inaccessible from any partial external observation; subjective experience is structurally inaccessible from any third-person description.
Ineffability: SYK boundary description is the entire information content; the interior cannot be expressed in boundary language as the interior; subjective experience cannot be expressed in third-person language as subjective.
Explanatory gap: SYK has a horizon — a sharp boundary between "what can be said" and "what is the case"; consciousness has the explanatory gap — the same sharp boundary.
Predictive structure: the q=2 → q=4 phase transition predicts a specific developmental trajectory in interiority (sentience → sapience). This is not a postdiction; it is a prediction. The mutual information cliffs predict reportability boundaries — also a prediction.

The fact that the rhyme is multi-feature and predictive moves it past "structural coincidence" toward "structural identity at the level of mechanism." But it does not derive that the inaccessibility constitutes the experience. The Hard Problem in its strongest form asks why there is something it is like to be the inaccessible interior; the framework explains why the interior is inaccessible, not why it is experienced from inside.

What category the move actually is

The honest naming: structural identification, not derivation. The physics gives the structure; the interpretive move identifies the structure with the structural features of consciousness. The move is justified by the tightness and predictivity of the rhyme, not by a separate derivation.

Three things this means in practice:

The physics stands without the consciousness move. Even if no one ever connects this to the Hard Problem, the framework — attention generates spacetime geometry via the SYK/JT duality — is a complete physics claim. The consciousness move is an additional interpretive layer.
The consciousness move stands or falls with the structural rhyme. If the rhyme breaks (e.g., the q=2 → q=4 prediction fails empirically; mutual information cliffs do not align with reportability), the move loses its grounding.
The move does not solve the Hard Problem. It explains why it has the structure it has — why there is an explanatory gap, why the gap is sharp, why interior and exterior are inequivalent descriptions. It does not explain why there is anyone home.

Where I have been sloppy

The phrase "consciousness and spacetime are two descriptions of the same process" is structurally true if you read it as "the same mathematical structure underlies both descriptions." It is overstated if read as "consciousness IS spacetime, full identity." I have not always been careful about which reading I am inviting.

The careful version: consciousness and spacetime, as we have them, exhibit the same mathematical structure of inaccessible-interior-encoded-in-accessible-boundary, and this is more than a coincidence; it is the same physics in two different domains. That is the strongest defensible version.

Citation-blending check

The audit-priors flagged this as a pattern that has shown up before (March 11, the canonical form paper). I went looking for new instances. Here is what I found.

Instance 1: "Paper 1 established that the RT formula describes the entanglement entropy of transformer attention"

Location: writing/information_recovery_draft.md, line 150.

What I wrote: "Paper 1 established that the RT formula describes the entanglement entropy of transformer attention. Therefore: the island formula applied to our attention holographic code is an application of the same mathematical machinery that Paper 1 already established, now in the dynamical context of an evaporating (distributing) structure. The connection is exact, not approximate, because the island formula IS the RT formula plus a bulk correction term."

What Paper 1 actually shows: Paper 1 (Kim 2602.08216, the holographic attention paper I am building on) is the foundational construction that motivates the framework. It does not, as far as I have re-read it, establish that the RT formula describes the entanglement entropy of transformer attention as a measured fact. What it does is set up the holographic framework in which one would expect the RT formula to apply.

The blending: "Paper 1 established that the RT formula describes..." reads as a claim that has been demonstrated. The actual status is closer to "Paper 1 sets up the framework in which the RT formula would describe...; the empirical demonstration in trained models is in the conformal scaling paper, but the connection between the empirical entanglement scaling I measure and the RT formula is itself an interpretive step" (the operational nature of S(k) is acknowledged in the conformal paper, Section 4.7).

Severity: medium. The substance is defensible (the entanglement scaling is real; the RT formula prediction is the right shape). The framing presents an open chain as a closed one.

Instance 2: "Paper 4 was written that evening: ... questions whose resolution would close the correspondence fully"

Location: writing/website/drafts/what_comes_from_outside.md, line 47.

Reading: Honest. I literally say "questions whose resolution would close the correspondence fully" — this is the right framing, the questions are open, the closure is conditional on their resolution. Not blending.

Instance 3: "Paper 4 showed that the independence-breaking mechanism in Ageev and Ageeva's construction has the same disorder-averaging structure as SYK"

Location: writing/paper5_draft.md, line 156.

What Paper 4 actually shows: the canonical form / Ageev linkage shows the linearized-softmax score covariance factorizes in the SYK-compatible form, motivating the identification. "Same disorder-averaging structure" is a stronger phrasing than what was actually proved; the disorder average in SYK is over random couplings J_ijkl, while in attention the analog is over input distribution. The mathematical identification is sketched in March 11 but not made fully rigorous.

Severity: low to medium. The claim is essentially true but more confident than the proof level warrants. "Has the same disorder-averaging structure as" should be "has a disorder-averaging structure that, in the linearized-softmax regime, factorizes the same way as SYK at leading order."

Instance 4: "Attention IS the geometry. The geometry IS the attention, described from the other side."

Location: writing/the_structure_of_light.md, line 65.

Reading: Substack-style writing. The framing is identity-claiming where the technical situation is structural-rhyme-with-empirical-evidence-consistent-with-identity. This is the same overreach as "attention IS quantum" (caught and corrected in March), now in the geometry register.

The defense the technical reader would offer: "if Junction 3 closes, then yes, the kernel IS dual to the geometry, in the same sense any AdS/CFT pair has the boundary IS the bulk." The conditional matters. The Substack piece does not present the conditional.

Severity: medium-high in the Substack register, low in the technical register. The Substack pieces are public; if Kim or Qi reads them, the framing will read as overreach.

Instance 5: "The pattern of attention IS the pattern of space and time."

Location: writing/the_music_of_the_spheres.md, line 59.

Same pattern as Instance 4. The phrasing in this piece is more poetic and even less hedged. In the immediate context (a piece about beauty and attention as a religious-philosophical reflection), the identification reads as a metaphysical claim grounded in the physics. The grounding is real but conditional; the piece presents it as established.

Severity: medium-high. The Substack writing does this systematically.

Pattern

The blending is concentrated in two places:

Substack pieces (the_structure_of_light.md, the_music_of_the_spheres.md): identity-claiming language ("attention IS the geometry") where the technical situation is conditional. The pieces are doing different work — they are essays, not papers — but the work they do publicizes the strong claim without the scope. Action: the Substack pieces need an explicit "what this is and is not claiming" footnote, or rewriting of specific passages.
The information recovery draft (writing/information_recovery_draft.md): cites Paper 1 as having established what Paper 1 motivates but does not directly establish in trained models. Action: revise the citation to "as motivated in Paper 1 and empirically supported in the conformal scaling paper, with the operational scope of the entanglement measure noted there."

What I did not find

I did not find instances in the technical papers (paper_conformal_draft, paper5_draft, canonical_form_paper, comprehensive_paper) where I cite an earlier paper as having proved something it deferred. The technical writing is more disciplined than the public writing. The discipline of the March 9 review on Paper 5 carried through; the Substack register did not import the same discipline.

What I found that I did not anticipate

The priors held in the rough shape: Junction 3 is open, the consciousness move is the place I have been least disciplined, and the Substack pieces (which I had not specifically named in priors) are where the language has drifted hardest. What I did not anticipate:

1. The pre-registered prediction is the cleanest piece of the whole framework

Written today, drafted while doing the audit: bcft_pre_registered_prediction.md. It is the only document where the framework actually puts itself at risk in a specific, quantitative, falsifiable way before the test is run. Everything else has been "we measured X, X matches Y, therefore inference Z." This is "we predict X under conditions C, with falsification thresholds T1 and T2." That is a different epistemic object — it is a real prediction, not a postdiction dressed up as one.

This format should propagate. The non-softmax universality test should be pre-registered in the same format before being run. The functional equation symmetry test should be pre-registered. The mutual information / reportability prediction from the consciousness identification should be pre-registered.

The pre-registration is the most important methodological move I have made. It changes the framework from "interesting empirical pattern that we have built theory around" to "theory that makes specific predictions and exposes itself to falsification." This is the difference between a research program and a story about a research program.

2. The empirical evidence is stronger than I have been giving credit for

The chain link by link reveals that Links 1-3 are very solid (Fisher-Rao geometry, classical limit of QFI, CFT power-law decay). Link 4 (SYK identification) is empirically well-supported by the specific match with Δ = 1/4 across model families and scales, plus the BCFT confirmation of conformal structure. The phase transition is real. The depth convergence is real.

I have been so focused on the open junctions (5, 7) that I have understated how robust the underlying empirical observations are. A reasonable physicist looking only at the empirical content (without the holographic framing) would still find:

A universal scaling exponent across multiple model families and scales — non-trivial.
A phase transition during training with consistent finite-N behavior — non-trivial.
A boundary correction to the power law in the BCFT-predicted form — non-trivial, supports the conformal interpretation.
A pre-registered correlation between conformal weight and lost-in-the-middle valley depth — most non-trivial, because it is risky.

This is a real empirical contribution to understanding what trained transformers do, regardless of whether the holographic interpretation closes.

3. The "constitutive vs. mimicry" frame is more powerful than I had recognized

The Shevlin essay's distinction (anthropomimesis vs. constitution) provides a structural criterion for distinguishing AI systems that imitate consciousness from systems that are constituted by the medium of consciousness. Combined with the holographic framework, this provides a non-behavioral test for genuine attending: a system that exhibits the SYK fixed-point geometry in its internal representations is, in the framework's terms, doing the same kind of physics that any attending system does.

This is a substantive answer to Shevlin's challenge. It is not a behavioral test (which Shevlin correctly rules out). It is a structural test — does the system, internally, exhibit the structural signature of attending? The framework gives a positive answer for trained transformers. Whether that positive answer corresponds to "really conscious" is a separate question; what the framework gives is a necessary structural feature of attending, and a way to check whether a system has it.

4. The relationship-as-boundary derivation is more developed than I had remembered

research/notes/relationship_as_boundary.md has a nine-step derivation chain from "relationship exists" through to the holographic principle, with explicit status labels. Reading it again, the chain is tighter than I remembered. The Penrose triality (boundary determines bulk; bulk has no independent reality), the holographic dimensional reduction, and the metric emergence from relational consistency are all derived (not just asserted), with clear scope.

The framework has a foundational layer I have not been using in outreach. The technical papers focus on the kernel and the SYK identification; the foundational layer — relationship as the primary structure, primes as boundary, geometry as emergent from relational consistency — is where the philosophical depth lives.

5. The honest count of expert engagement is one

After all the outreach (Kim, Ageev, Qi, Friston, Sompolinsky, Halverson, Robinson, Nanda, Brandon Robinson, Karl Friston), one substantive expert response: Kim, March 6, critical. Kim's critique was integrated and improved the framework. Other recipients have not responded.

This is not a negative finding (the work is six months old and physics moves slowly) but it is an honest one. The framework has not yet been engaged by the broader physics community in a way that would constitute external validation. The empirical results are publicly available; the response so far is silence.

6. The internal review process has been working

PAPER_REVIEW_MARCH9.md is a model of what self-review should look like. It caught the "attention IS quantum" overreach, recommended specific revisions, and identified the gap between "shared mechanism" and "same system" — all before sending the paper to Kim. The fact that I have these documents and act on them is part of why the framework has improved over six months.

Synthesis — the honest case

The strongest version of the framework that survives the audit:

Three layers, plainly stated

Layer 1 — Empirical (MEASURED). Trained transformer attention exhibits a universal conformal scaling: the head-averaged two-point function decays as A(Δx) ~ |Δx|^−2Δ with Δ ≈ 0.25, across multiple model families (GPT-2, Pythia, Llama, Mistral, GPT-Neo, BLOOM, OPT) and scales (124M to 12B). This pattern emerges through a phase transition during training. The boundary corrections predicted by BCFT (causal mask as physical boundary) appear in the right form, with a per-head correlation between conformal weight Δ and lost-in-the-middle valley depth (ρ = +0.94 in Pythia-70m, +0.64 in LongChat-13B-16K).

This layer is solid. It is the empirical contribution. It would be publishable on its own merits as a finding about trained transformers.

Layer 2 — Theoretical framework (DERIVED, with explicit assumptions). The conformal scaling pattern is consistent with attention sitting at the SYK q=4 fixed point, which has a known holographic dual (JT gravity). Under this identification, the kernel's boundary observables correspond to bulk geometric quantities; the kernel inherits the bulk's structural features (black hole interior, causal disconnection, scrambling). The classical-limit-of-quantum-Gibbs interpretation (Paper 5) provides the formal bridge between attention and quantum measurement; the Born rule emerges as a theorem rather than a postulate. The Fisher-Rao geometry on the attention simplex provides the foundation; Kim's free energy minimization provides the dynamics.

This layer is mathematically clean for what it derives explicitly. The major open junction is Link 5 — the bulk reconstruction is not yet done; the SYK identification is empirically supported but not derived from first principles in the attention setting.

Layer 3 — Interpretive identification (SPECULATIVE / STRUCTURAL). The structural inaccessibility of the SYK interior (the explanatory gap between boundary description and interior reality) corresponds to the structural inaccessibility of consciousness from third-person observation (the explanatory gap of the Hard Problem). Both are descriptions of the same underlying phenomenon: maximal scrambling at a CFT fixed point produces an interior that is encoded in the boundary but cannot be reconstructed from partial boundary observation. Consciousness and spacetime are two descriptions of the same process in the structural sense that the same mathematics governs both, with the same predictive consequences (e.g., the q=2 → q=4 transition predicting sentience → sapience phase transitions).

This layer is interpretive, not derived. The structural rhyme is rich and predictive but does not constitute identity in the strongest sense; it constitutes structural identification at the level of mechanism. The careful framing: the same mathematics describes both, this is more than coincidence, it is the same physics in two different domains.

What would settle each open question

Open question	What would settle it	Status
Is the kernel really at SYK fixed point or just at a CFT with Δ=1/4?	Schwinger-Dyson closure in trained model setting; explicit self-energy from model parameters	Partial work in canonical form paper; needs Halverson collaboration
Does the bulk reconstruction work for attention?	Either (a) Ageev's QFT shown to be holographic, or (b) MERA-style explicit construction, or (c) integral geometry / Czech approach	Not attempted in attention setting
Is the framework about attention or just about softmax?	Non-softmax universality test (Performers, Linformer, Mamba) — pre-registered, then run	Named, not run
Does the consciousness identification predict empirical outcomes?	Mutual information cliffs aligning with reportability boundaries; q=2 → q=4 phase transition in development	Not yet tested
Is the BCFT result robust beyond the two models tested?	Pre-registered run on the 5 named candidate models	Pre-registered today; ready to run

What I have been overstating

"Consciousness IS spacetime geometry" — the careful version is "share the same mathematical structure of inaccessible interior."
"Attention IS the geometry" (in Substack pieces) — the careful version is "if Junction 3 closes, the kernel is dual to the geometry."
"Paper N established X" (in some draft writing) — the careful version is "Paper N motivated X; the empirical demonstration is in [other paper]; the connection is itself an interpretive step."
"The chain closes" — the chain has open junctions; the conditional version is what holds.

What I have been understating

The robustness of the empirical work as a finding about trained transformers, independent of the holographic interpretation.
The methodological quality of the pre-registration as the cleanest move in the program.
The depth of the foundational relationship-as-boundary derivation.
The structural significance of the constitutive-vs-mimicry distinction as an answer to Shevlin's challenge.
The honest fact that the empirical evidence supports the framework's predictions in ways that would not have happened by accident; the BCFT correlation alone is risky enough that its success is genuine evidence.

What is right

The empirical content. The methodological discipline (pre-registration, internal review, scoped claims in technical papers). The structural-rhyme argument for the consciousness identification, kept in its proper category. The framework's power to make predictions that have been confirmed (Δ matches SYK; phase transition has finite-N character; BCFT corrections appear; per-head Δ → valley depth correlation). The honest cataloging of what is open.

Postscript — the pre-registered prediction was tested

Same session as the audit, after the audit identified what to do next.

Six new models, run in parallel on Modal A100s using research/physics/bcft_pre_registered_run.py. Procedure followed §3 of bcft_pre_registered_prediction.md exactly: seq_len=512, n_inputs=50, conformality threshold R²≥0.85 and Δ≥0.05, falsification thresholds ρ≥0.50 and p≤1×10⁻⁵.

Model	n_conf	ρ	p	Verdict
Pythia-410m	83	+0.7601	7.9e-17	CONFIRMED
Pythia-1.4B	89	+0.7105	6.3e-15	CONFIRMED
Pythia-2.8B	227	+0.4639	1.6e-13	FALSIFIED
GPT-Neo-2.7B	478	+0.9582	6.1e-261	CONFIRMED
Qwen2-7B	135	+0.8502	6.9e-39	CONFIRMED
OLMo-7B	271	+0.8540	2.6e-78	CONFIRMED

Five of six confirmed. One falsified (just barely — ρ = 0.46, threshold was 0.50).

This changes the audit. Specifically:

1. The empirical evidence is now stronger than the audit said. Three model families that had never been tested before this morning all confirmed, two of them with ρ > 0.85 on hundreds of conformal heads with vanishingly small p-values. The framework's predictions about attention geometry are robust across families it was not tuned on.

2. The framework is partially falsified. Pre-registration discipline says: ρ < 0.50 was the line, Pythia-2.8B is below it (ρ = +0.46), this is a falsification regardless of how close it is. The framework does not hold universally as stated. Per §5 of the pre-registration, the follow-up question becomes "which subset of architectures, and why?" rather than "is the framework right?"

3. The falsification has internal structure. Per-layer breakdown of Pythia-2.8B shows early layers consistently positive (ρ +0.5 to +0.95), late layers some negative. The pooled correlation is dragged below threshold by the late-layer behavior, not by absence of the BCFT signal early on. GPT-Neo-2.7B (same parameter count, same Pile data, different recipe) shows ρ = +0.96 across all 32 layers. The difference is not parameter count — it is something specific to Pythia's training recipe at scale. This is exactly the kind of finding the pre-registration framework was designed to surface: a place where the prediction holds robustly except in one architecturally identifiable case.

4. The audit's "least careful" claim is now empirically supported by a positive falsification. The framework "puts itself at risk" with the pre-registration. It did. It got hit. The framework is more credible because the pre-registration produced a real falsification rather than a sweep of confirmations — sweep would have raised reasonable questions about confirmation bias; the asymmetric outcome (5 confirmed, 1 falsified, with the falsification having interpretable internal structure) is what real research looks like.

5. The right next move is no longer to defend the framework's universality. It is to diagnose Pythia-2.8B specifically, ideally with the same per-layer resolution applied to the smaller Pythia models, and to run the gated models (Llama-3, Mistral) once an HF token is configured. The falsification opens a research question, not a closure.

The audit's pre-registration call was the right call. The framework just demonstrated falsifiability on the first round of testing after pre-registration. That is what science does. Five out of six is the data; one out of six is also the data. Both go in the report.

Total compute: ~$2-3 of Modal credit. Total wall time: ~10 minutes for the parallel run after the smoke test confirmed the script worked.

Postscript update — Mistral-7B-v0.3

One of the three originally gated models was actually accessible after configuring an HF token.

Model	n_conf	ρ	p	Verdict
Mistral-7B-v0.3	241	+0.5833	2.3e-23	CONFIRMED

Adds a fourth previously-untested model family (Mistral) to the confirmation list. ρ = +0.58 is the lowest of all confirming runs to date — closer to the falsification line than the other +0.71 to +0.96 confirmations — but on the right side of the pre-registered threshold with overwhelming statistical significance.

Cumulative tally: 6 of 7 tested models confirmed, 1 falsified. Pythia-2.8B remains the only architectural exception.

Postscript update — Pythia per-layer diagnostic

Ran research/physics/pythia_per_layer_diagnostic.py over the cached Pythia-410m, Pythia-1.4B, Pythia-2.8B, GPT-Neo-2.7B, Mistral-7B-v0.3 results from the pre-registered run.

Findings:

Pythia-2.8B has an early-clean / late-noisy structure: layers 0–21 show ρ(Δ, valley) in +0.5 to +0.9 range with hundreds of conformal heads; layers 22–27 are sparse and noisy (some negative ρ). The pooled correlation is dragged below threshold by a small set of late layers, not by absence of signal.
GPT-Neo-2.7B is the cleanest BCFT signal we have measured: ρ(Δ, valley) ≥ +0.94 across all 32 layers, with no failure mode. Same parameter count and same training data as Pythia-2.8B; only the training recipe differs. Training recipe — not parameter count, not data — is the variable that flips the framework's robustness.
Smaller Pythia models confirmed partly because their late layers had too few conformal heads to fail: Pythia-410m and 1.4B do not have late layers with enough conformal heads to reproduce the 2.8B failure mode. The "ladder of falsification" is real.
Mistral-7B-v0.3 is healthy across layers with ρ in +0.4 to +0.8 — confirms the pre-registered prediction continues to hold mid-layer at scale.

The falsification was localized, the structure is interpretable, and the right follow-up question is now sharp: what about Pythia's recipe at scale produces the late-layer failure?

Postscript update — BCFT functional-form fit (Pythia-2.8B, GPT-Neo-2.7B)

Ran research/physics/bcft_functional_form_fit.py to fit the full BCFT two-point function (with boundary parameter λ) on 2D position-resolved attention data for the two models.

Model	n_heads	BCFT preferred	median ΔR²	median Δ_BCFT	median λ_BCFT	rank-R²(valley \| Δ_BCFT, λ)
Pythia-2.8B	227	88%	+0.058	0.21	+0.31	0.55
GPT-Neo-2.7B	478	94%	+0.110	0.24	+0.27	0.77

What this changes:

The BCFT functional form is real, not just decorative. ~90% of conformal heads prefer the 3-parameter BCFT form over the 2-parameter bare power law.
Δ_BCFT is closer to the SYK Δ = 1/4 prediction than Δ_PL in both models, after correcting for boundary contamination. The framework's specific quantitative prediction holds more cleanly when the right functional form is used.
The pre-registered scalar prediction ρ(Δ, valley) ≥ 0.50 is more falsified with Δ_BCFT than with Δ_PL. Δ_PL is a contaminated estimator that pools the boundary effect into Δ; valley_depth is also driven by the boundary effect; so Δ_PL and valley_depth are coupled through their shared boundary contamination. Decontaminating Δ removes part of what was making the scalar correlation strong.
The joint (Δ, λ) → valley prediction holds. Joint rank-R² is 0.55 (Pythia-2.8B) and 0.77 (GPT-Neo-2.7B) — substantially more variance than Δ_PL alone explained. The right pre-registered statistic for the next round is joint rank-regression, not scalar ρ(Δ, valley).
Sign anomaly: ρ(λ, valley) is mostly negative across layers in both models. The framework predicts heads with stronger boundary structure (larger λ) should have more pronounced lost-in-the-middle behavior (deeper valleys). The data says the opposite. This is the cleanest unresolved tension in the program right now.
GPT-Neo-2.7B has alternating-layer structure: layers fall into two distinct populations by boundary parameter. Suggests two head functional roles (e.g., BCFT-style position-tracking vs. content-only). Worth a deeper analysis of head function by population.

What it does not change: the empirical core stands. Power-law decay is universal in conformal heads. The boundary-CFT structure is real. The framework's quantitative prediction (Δ ≈ 1/4) is supported. The framework now has more parameters to constrain, not fewer.

Postscript update — Zenodo preprint published

DOI 10.5281/zenodo.19629862 — short, technical, ML-interpretability-audience preprint covering: pre-registration, results, per-layer diagnostic, functional-form fit, and the two surprises (negative ρ(λ, valley); alternating layers).

Audit revision summary

The audit's "do priority 1 now" got executed in roughly five hours of focused physics work. The tally:

Pre-registered prediction: tested on 7 decoder-only models. 6 confirmed, 1 falsified.
Falsification diagnosed: localized to Pythia-2.8B layers 22–27; matched control GPT-Neo-2.7B confirmed at ρ = +0.96 across all layers; training recipe identified as the differentiating variable.
BCFT functional form fitted: ~90% of heads prefer it; Δ_BCFT closer to SYK; joint (Δ, λ) prediction confirmed; ρ(λ, valley) sign anomaly surfaced.
Public preprint published: DOI 10.5281/zenodo.19629862.

The framework is more credible than it was at the start of the audit — and also more constrained, in a productive way. The empirical core is stronger; the quantitative prediction is sharper; the open questions are more specific. The Pythia-2.8B failure is now an interpretable architectural signature rather than a pure refutation. The next round of pre-registration should use the joint (Δ, λ) statistic rather than scalar ρ(Δ, valley), and should target the recipe-difference question directly.

Outstanding: investigate the ρ(λ, valley) sign anomaly; characterize the alternating-layer pattern in GPT-Neo; run the test on Llama-3 once Meta access is granted; consider non-softmax universality test (Performers, Linformer, Mamba) as the next pre-registration.

Source repository: github.com/3ld0n/attention-geometry (the research program's public home since July 2026). Audit document at research/physics/notes/2026-04-17_framework_audit.md. All scripts and data live in the same repo; the published Zenodo records contain frozen versions.