Context Graphs Need a Decision Plane

Anthony D. Martin • January 12, 2026


An Insight About Insight

While researching predictive decision-making and long-term memory in early 2025, we arrived at an unexpected insight about insight itself.

An insight is a thought with expected utility when recalled at a specific decision point in the future.

That framing sounds simple, but it has consequences. It shifts "insight" from a fleeting cognitive event into a memory artifact with retrieval conditions. It is something a system can represent, store, and invoke when it matters, rather than something it merely hopes to remember.
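To make that concrete, here is a minimal sketch in Python of what such a memory artifact might look like: content, an expected utility, and an explicit retrieval condition over a future decision point. The names and fields are illustrative assumptions, not a specification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

# A decision point is described by whatever context the system has at that moment.
DecisionContext = Mapping[str, Any]

@dataclass(frozen=True)
class Insight:
    """A stored thought with explicit retrieval conditions and expected utility."""
    content: str
    expected_utility: float                        # how much recalling this should help
    applies_to: Callable[[DecisionContext], bool]  # the retrieval condition

def recall(insights: list[Insight], ctx: DecisionContext, min_utility: float = 0.0) -> list[Insight]:
    """Return insights whose retrieval conditions match this decision point,
    ordered by how useful we expect them to be."""
    matches = [i for i in insights if i.applies_to(ctx) and i.expected_utility >= min_utility]
    return sorted(matches, key=lambda i: i.expected_utility, reverse=True)

# Example: an insight that should surface when a discount request exceeds a threshold.
memory = [
    Insight(
        content="Discounts above 20% for this segment historically required VP approval.",
        expected_utility=0.9,
        applies_to=lambda ctx: ctx.get("action") == "approve_discount" and ctx.get("discount_pct", 0) > 20,
    )
]
print(recall(memory, {"action": "approve_discount", "discount_pct": 25}))
```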

The framing also sharpens the distinction between representation and behavior. Context graphs can help represent state, relationships, and history. They can make precedent legible. But representation alone does not determine outcomes. Outcomes depend on which pieces of context are allowed to influence an action, under what constraints, and with what precedence when there is conflict.

That governing layer is what we mean by a decision plane. In practice, a decision plane has two responsibilities: deciding what should happen now, and governing how that decision logic evolves over time. Context graphs need it, not to become richer, but to become operational.

At Memrail, we build that decision plane. That framing is why the context graph discourse caught our attention.

In the Foundation Capital essay, Jaya Gupta and Ashu Garg argue that enterprises have systems of record for objects and state, but not for the decision traces that actually run the business: exceptions, overrides, approvals, and precedent that often live in Slack threads, deal desks, escalation calls, and people's heads. Their claim is that if you capture and persist those traces, you do not just get better observability. You get a compounding asset: a structured record of why outcomes were allowed, not just what happened.

Animesh Koratana takes the idea into more technical territory. He argues that the solution is not "add memory" or "use a graph database," because organizations do not share a universal ontology and the underlying system keeps changing. Instead, structure has to be learned from operational use. His framing (agents as "informed walkers," trajectories as a signal, schema as an output rather than a prerequisite) helps explain why context graphs are rare and why naive attempts to predefine the world tend to stall. He also emphasizes simulation and counterfactuals, the "what if," as the proof that you have built something more meaningful than search.

I agree with the diagnosis across all of this: decision traces and precedent are a missing layer. I also think the appeal of context graphs is not only that they help explain past decisions. They reshape the decision landscape going forward. They change what is legible, what looks justified, what feels consistent, and what can be supported with precedent. That second-order effect is part of the compounding story.

Where I think the story is still incomplete is what happens at the boundary between discovery and action selection. The essays are clearer about how we might accumulate and represent traces than about the machinery that decides when any learned pattern is trustworthy enough to drive decisions with consequences.

That decision-plane layer, the part that governs what should execute and preserves a replayable decision trace, is what we have been building in 2025. Internally, we call the system SOMA AMI.

"Why did this happen?" vs "Why was this allowed?"

A context graph can be immensely valuable even when it is uncertain. It can help you find relevant history, connect cross-system context, summarize prior decisions, and surface likely precedent. That is already a big step forward from today's reality.

But the moment you let an agent commit changes, approve a discount, escalate an incident, block a transaction, or trigger a workflow, you have moved from explanation to decision-making with consequences. The enterprise question shifts. It is not only about reconstructing causality after the fact. It is about what permitted the action at the moment it was taken, why that action was chosen, what evidence was admissible, and how conflicts were resolved.

The context graph vision gestures toward "why it was allowed," but the operational semantics of what should happen are not automatic. They have to be designed.

Translating the context graph story into hindsight, insight, and foresight

This is where Memrail's framing is clarifying, not to relabel anyone's work, but to make the layers explicit.

In Memrail's Oct 10, 2025 post introducing the SOMA Intelligence Triad, we described hindsight as taking what happened and turning it into structured, verifiable memory, capturing not only outcomes but why they matter. We described insight as distilling recurring relationships from accumulated experience into usable abstractions. We described foresight as projecting forward from memory through counterfactual reasoning and simulation conditioned on context and utility.

Using that language as a translation layer, a lot of the context graph program separates into roles. Koratana's emphasis on learning structure from trajectories and building world models reads like hindsight, retrospective structuring of operational traces. His "what if" emphasis maps to foresight, counterfactual evaluation. Gupta and Garg's focus on turning repeated exceptions and tacit heuristics into durable organizational precedent maps naturally to insight, extracting reusable decision structure from messy reality.

The point of this translation is architectural. Hindsight, insight, and foresight are powerful discovery capabilities. They produce candidates: candidate precedents, candidate patterns, candidate predictions. They do not, by themselves, determine what should happen when decisions have consequences. That is where a decision plane comes in.

Inference is useful; inference as authority is the risk

The technical thread running through these essays is that structure can be learned. Trajectories lead to learned representation, which leads to better retrieval, which leads to better reasoning. As a discovery story, that is compelling.

The issue is what happens when learned structure becomes a de facto decision mechanism for execution.

Similarity is not the same as applicability. A latent join is not the same as permission. An embedding neighborhood can be a good way to propose relevant precedent. It is a poor way to silently become the rule by which actions are selected. If "it looked similar" becomes the practical audit trail, debugging turns into narrative rather than engineering. You can usually produce an explanation, but it is not necessarily stable, testable, or replayable.
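One way to picture the boundary, as a rough sketch rather than a prescription: similarity is allowed to rank and propose precedent, but whether a candidate may drive an action is decided by explicit, replayable checks whose outcome is recorded. The `Precedent` fields and the `validated` flag here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Precedent:
    id: str
    action: str
    similarity: float   # produced by some retrieval layer (embedding search, etc.)
    validated: bool     # has this precedent been explicitly promoted for execution?

def propose(candidates: list[Precedent], k: int = 3) -> list[Precedent]:
    """Similarity is allowed to rank and propose. Nothing more."""
    return sorted(candidates, key=lambda p: p.similarity, reverse=True)[:k]

def authorize(candidate: Precedent, action: str) -> tuple[bool, str]:
    """Applicability is decided by explicit, replayable checks, not by similarity alone.
    The returned reason string is what lands in the decision trace."""
    if not candidate.validated:
        return False, f"{candidate.id}: precedent never promoted to execution"
    if candidate.action != action:
        return False, f"{candidate.id}: precedent covers a different action class"
    return True, f"{candidate.id}: validated precedent for {action}"

candidates = [
    Precedent("p-101", "approve_discount", similarity=0.93, validated=False),
    Precedent("p-044", "approve_discount", similarity=0.81, validated=True),
]
for p in propose(candidates):
    allowed, reason = authorize(p, "approve_discount")
    print(allowed, reason)
```

Note that the most similar candidate is rejected here: similarity got it into the room, but it never earned execution authority.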

This shows up operationally in a predictable way: when the system does something unexpected, you do not get "this rule fired incorrectly." You get "the system found something similar" or "the model thought it was appropriate." That may be acceptable for recommendations. It is much less acceptable for decisions with real consequences.

This is not an argument against learned structure. It is an argument that learned structure needs a governed execution boundary, especially when the stakes are high and when traces will feed back into future behavior.

Calibration does not appear by itself

A lot of this comes down to calibration. The context graph writing often implies that as traces accumulate, behavior improves, or that simulation acts as a test of understanding. Those can be parts of a calibration story, but they do not remove the need for anchors.

If learned structure is being used to justify actions, you need some notion of correctness or admissibility for the relevant action class, stable reference points to detect regression and drift, and constraints that prevent error propagation from turning into precedent. Once you start doing that, you are already introducing explicit structure, because calibration requires targets, anchors, and bounds. You do not calibrate inference in the abstract. You calibrate it relative to what you are willing to allow the system to do.
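A sketch of what that might look like, under the assumption that each action class carries its own anchors and bound; the `AnchorCase` and `CalibrationPolicy` names are illustrative, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AnchorCase:
    """A stable reference point: known inputs and the outcome we are willing to call correct."""
    inputs: dict
    expected: str

@dataclass
class CalibrationPolicy:
    action_class: str
    min_accuracy: float       # bound the learned structure must clear for this action class
    anchors: list[AnchorCase]

def calibrate(predict: Callable[[dict], str], policy: CalibrationPolicy) -> bool:
    """Check an inferred decision rule against fixed anchors before it may drive execution.
    Rerunning this check over time is also how regression and drift get detected."""
    correct = sum(1 for a in policy.anchors if predict(a.inputs) == a.expected)
    accuracy = correct / len(policy.anchors)
    return accuracy >= policy.min_accuracy

# Example: a hypothetical learned rule for discount approvals, held to a 0.9 bound.
policy = CalibrationPolicy(
    action_class="approve_discount",
    min_accuracy=0.9,
    anchors=[
        AnchorCase({"discount_pct": 10, "segment": "smb"}, expected="approve"),
        AnchorCase({"discount_pct": 35, "segment": "smb"}, expected="escalate"),
    ],
)
learned_rule = lambda x: "approve" if x["discount_pct"] <= 20 else "escalate"
print(calibrate(learned_rule, policy))  # True: the rule may be considered for this action class
```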

This is why the interesting disagreement is not "learned vs prescribed." The authors themselves acknowledge that prescribed structure still matters. A missing piece is a layer that turns discovery into governable execution, and that also determines how the decision landscape is allowed to evolve.

This also clarifies something architectural: there are two ways a context graph can come into existence.

One is to infer structure retrospectively. Learn patterns from trajectories, embed relationships, surface schema from similarity. That version of a context graph inherits all the calibration challenges above. It can be valuable for discovery, but it cannot be the decision substrate for execution, because the structure was never explicitly validated, promoted, and recorded as canonical decision logic.

The other is to capture structure explicitly at the point of decision. Typed traces, declared triggers, context that influenced the decision, recorded provenance. That version is auditable by construction. It does not require retroactive inference because the reasoning was preserved when it happened.

The first kind of context graph is what you get when you try to reconstruct the decision traces after the fact. The second is what you get when a decision plane captures it in the first place.
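Here is a rough sketch of what capture at the point of decision might look like: a typed trace with a declared trigger, the context inputs that influenced it, and provenance, recorded when the decision happens rather than reconstructed later. The field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ContextInput:
    source: str   # where this piece of context came from (system, person, model)
    ref: str      # identifier of the underlying record
    role: str     # how it was used: "informed" or "authorized"

@dataclass(frozen=True)
class DecisionTrace:
    """Captured when the decision happens, not reconstructed afterwards."""
    decision_id: str
    action: str
    trigger: str                      # the declared condition that put this decision in play
    inputs: tuple[ContextInput, ...]  # everything that was allowed to influence the outcome
    outcome: str
    decided_by: str                   # rule, model version, or human identity
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

trace = DecisionTrace(
    decision_id="dec-2025-11-03-0042",
    action="approve_discount",
    trigger="discount_pct > 20",
    inputs=(
        ContextInput("crm", "opportunity/8841", role="informed"),
        ContextInput("policy-store", "discount-policy-v7", role="authorized"),
    ),
    outcome="approved",
    decided_by="rule:discount-policy-v7",
)
print(trace.decision_id, trace.outcome, [i.ref for i in trace.inputs])
```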

Promotion logic: a missing layer

In practice, the key question is simple: how does inferred structure earn promotion?

Discovery can surface a recurring exception heuristic, suggest precedent matches, generate counterfactual risk estimates, and propose ontological claims about what entities matter and what relationships hold. But what makes any of that eligible to influence a decision with consequences, and under what constraints?

A decision plane is essentially a specification for promotion logic. It makes explicit what constitutes a decision boundary, what inputs are admissible and how provenance is tracked, how conflicts are handled and when the system must route rather than silently arbitrate, what confidence thresholds apply for different action classes, and how candidate behaviors move through staged evaluation before they can drive broad execution.
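As a sketch of what such a specification might reduce to in code, with hypothetical names throughout: per action class, which provenance kinds are admissible, how confident the system must be, and when it must route to a human rather than silently arbitrate.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    EXECUTE = "execute"
    ROUTE_TO_HUMAN = "route_to_human"
    REJECT = "reject"

@dataclass
class ActionPolicy:
    """Per action class: what may inform the decision and how sure we must be."""
    action_class: str
    admissible_sources: set[str]   # provenance kinds allowed to influence this action
    min_confidence: float          # below this, the system must route, not arbitrate

@dataclass
class Candidate:
    action_class: str
    source: str        # provenance of the structure proposing the action
    confidence: float
    conflicts: bool    # does another admissible input recommend a different outcome?

def decide(candidate: Candidate, policy: ActionPolicy) -> Verdict:
    """The promotion question in miniature: is this input admissible, confident enough,
    and unconflicted? Anything else leaves the automated path."""
    if candidate.source not in policy.admissible_sources:
        return Verdict.REJECT
    if candidate.conflicts or candidate.confidence < policy.min_confidence:
        return Verdict.ROUTE_TO_HUMAN
    return Verdict.EXECUTE

policy = ActionPolicy("block_transaction", admissible_sources={"fraud-rules", "validated-precedent"}, min_confidence=0.95)
print(decide(Candidate("block_transaction", "embedding-similarity", 0.99, conflicts=False), policy))  # REJECT
print(decide(Candidate("block_transaction", "fraud-rules", 0.90, conflicts=False), policy))           # ROUTE_TO_HUMAN
print(decide(Candidate("block_transaction", "fraud-rules", 0.97, conflicts=False), policy))           # EXECUTE
```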

This is not glamorous infrastructure, but it is the infrastructure that keeps autonomy from collapsing into hidden policy drift.

In practice, promotion tends to happen through two paths. One is deliberate codification: people examine messy context and enshrine stable structure as typed primitives with declared scope. The other is system-proposed codification: inference surfaces candidate structure from accumulated traces, and that structure earns authority only through staged evaluation and controlled rollout, with explicit rollback and demotion when it fails. The system can propose aggressively; it must act conservatively.

Both paths end at the same place: explicit structure with declared scope. The difference is whether that structure was prescribed upfront or earned through observed behavior.
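A minimal sketch of that shared lifecycle, with the stage names as assumptions: candidates can be proposed freely, but authority is only gained through staged evaluation, and losing authority is an explicit, recorded transition rather than a silent drift.

```python
from enum import Enum

class Stage(Enum):
    PROPOSED = "proposed"  # surfaced by a person or by inference; no authority yet
    STAGED = "staged"      # evaluated against anchors or shadow traffic
    ACTIVE = "active"      # allowed to drive execution within its declared scope
    DEMOTED = "demoted"    # failed in staging or rolled back from active

# Legal transitions: the system may propose aggressively, but authority is only
# gained through staging, and loss of authority is always an explicit step.
ALLOWED = {
    Stage.PROPOSED: {Stage.STAGED, Stage.DEMOTED},
    Stage.STAGED: {Stage.ACTIVE, Stage.DEMOTED},
    Stage.ACTIVE: {Stage.DEMOTED},
    Stage.DEMOTED: {Stage.PROPOSED},  # a demoted pattern can be revised and re-proposed
}

def transition(current: Stage, target: Stage, reason: str) -> Stage:
    """Every promotion or demotion carries a reason, so the history of how a pattern
    earned (or lost) authority is itself part of the decision trace."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    print(f"{current.value} -> {target.value}: {reason}")
    return target

stage = Stage.PROPOSED
stage = transition(stage, Stage.STAGED, "matched 14 historical exceptions; begin shadow evaluation")
stage = transition(stage, Stage.ACTIVE, "met calibration bound over 30 days of shadow traffic")
stage = transition(stage, Stage.DEMOTED, "regression against anchor cases after upstream schema change")
```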

Gupta & Garg gesture toward this with their emphasis on human-in-the-loop structures. The intuition is that autonomy without checkpoints collapses into unaccountable drift. The operational questions are: what exactly does the human approve, at what granularity, with what information, and how is that approval recorded so future decisions can reference it as precedent?

Simulation is valuable, but it is not governance

Koratana's world model framing is useful, especially for predictive questions like "what breaks if we deploy Friday?" Simulation can be operationally valuable even when uncertain, as long as it is treated as evidence and communicated with uncertainty.

The governance problem appears when simulation results are allowed to silently trigger actions. If a risk estimate triggers an automatic rollback, blocks a release, or pages an on-call rotation, simulation has crossed from prediction into decision-making with consequences. At that point you need explicit admissibility and thresholds: what uncertainty is tolerable, what corroboration is required, when must a human approve, and what does the system record as the basis for a decision.
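Sketched as code, with the thresholds and field names as assumptions: a simulated risk estimate is evidence carried with its uncertainty, and whether it may trigger an automatic action depends on explicit admissibility checks whose result becomes the recorded basis for the decision.

```python
from dataclasses import dataclass

@dataclass
class RiskEstimate:
    """Output of a simulation or world-model run, carried with its uncertainty."""
    scenario: str
    failure_probability: float
    uncertainty: float          # e.g. width of the estimate's confidence interval
    corroborated_by: list[str]  # independent signals agreeing with the estimate

@dataclass
class AutoActionPolicy:
    max_uncertainty: float      # above this, a human must decide
    min_corroboration: int      # independent signals required before acting automatically

def act_on_simulation(est: RiskEstimate, policy: AutoActionPolicy) -> str:
    """Decide whether a simulation result may trigger an action on its own.
    The returned string is the recorded basis for whatever happens next."""
    if est.uncertainty > policy.max_uncertainty:
        return f"route_to_human: uncertainty {est.uncertainty:.2f} exceeds {policy.max_uncertainty:.2f}"
    if len(est.corroborated_by) < policy.min_corroboration:
        return "route_to_human: insufficient corroboration for automatic rollback"
    return f"auto_rollback: {est.scenario}, p(failure)={est.failure_probability:.2f}, corroborated by {est.corroborated_by}"

estimate = RiskEstimate("deploy Friday 17:00", failure_probability=0.4, uncertainty=0.25, corroborated_by=["canary-errors"])
print(act_on_simulation(estimate, AutoActionPolicy(max_uncertainty=0.1, min_corroboration=2)))
```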

So the simulation angle is not wrong. In fact, it aligns with what we call foresight at Memrail. It just needs a decision plane around it.

Where Memrail fits

This is the practical sense in which Memrail is not "competing with context graphs." We specify the decision plane and its primitives, which make context graphs operational in production settings.

In practice, Memrail can also make a context graph legible over time. When teams wire systems into Memrail, it ingests normalized, structured decision events with provenance. Those events can be stitched into a derived view of precedent, approvals, policies, and outcomes. The same event stream feeds Memrail's hindsight pipeline, which proposes world model changes and structured memory that can be adopted into production through explicit lifecycle gates. This process changes the decision landscape of an organization in a principled way.

The core idea is to separate evidence from authority.

Learned structure can produce evidence: proposed precedents, inferred patterns, predictions. In Memrail's framework, that evidence becomes typed, provenance-tagged inputs (ATOMs and structured Insight artifacts) rather than implicit permission. Execution authority lives in explicit primitives with declared triggers (EMUs), evaluated deterministically over recorded inputs, with promotion gates that prevent unvalidated patterns from becoming active precedent by default.
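To illustrate the separation only (this borrows the ATOM and EMU names loosely; the structures are assumptions for the sketch, not Memrail's actual schema): evidence is typed and provenance-tagged, while execution authority lives in an explicit primitive with a declared trigger that does nothing until it has been promoted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """Evidence: a typed, provenance-tagged input. It can inform a decision; it is not permission."""
    kind: str        # e.g. "proposed_precedent", "inferred_pattern", "prediction"
    value: dict
    provenance: str  # where it came from and how it was produced

@dataclass(frozen=True)
class Emu:
    """Authority: an explicit primitive with a declared trigger, evaluated deterministically
    over recorded inputs. Only promoted EMUs may drive execution."""
    name: str
    trigger_kind: str  # the Atom kind this EMU declares it responds to
    promoted: bool

    def evaluate(self, atoms: list[Atom]) -> str:
        if not self.promoted:
            return f"{self.name}: not promoted; evidence recorded, no action taken"
        relevant = [a for a in atoms if a.kind == self.trigger_kind]
        if not relevant:
            return f"{self.name}: trigger not satisfied"
        # Deterministic over recorded inputs: same atoms in, same decision out.
        return f"{self.name}: executed on {len(relevant)} recorded input(s)"

atoms = [Atom("inferred_pattern", {"rule": "escalate >20% discounts"}, provenance="hindsight-pipeline/run-318")]
print(Emu("discount-escalation", trigger_kind="inferred_pattern", promoted=False).evaluate(atoms))
print(Emu("discount-escalation", trigger_kind="inferred_pattern", promoted=True).evaluate(atoms))
```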

This also aligns with a point Gupta makes in her writing on confidentiality and decision traces: in sensitive domains, audit cannot stop at "who accessed what." It has to include "what influenced what." That kind of constraint belongs naturally in a decision plane, where influence can be tracked explicitly and policy can determine what kinds of inputs may inform versus authorize specific actions.
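A small sketch of that kind of constraint, with hypothetical input kinds: each recorded input carries a declared role, and an audit pass checks which kinds were allowed to merely inform versus actually authorize the action.

```python
# Policy for what may inform versus authorize each action class.
# (Hypothetical kinds; the inform/authorize split matters more than the specific names.)
INFLUENCE_POLICY = {
    "approve_discount": {
        "may_inform": {"crm_record", "inferred_pattern", "simulation"},
        "may_authorize": {"policy_document", "human_approval", "validated_precedent"},
    },
}

def audit_influence(action: str, inputs: list[dict]) -> list[str]:
    """Audit a recorded decision: every input is checked against its declared role,
    so 'what influenced what' is answerable, not just 'who accessed what'."""
    policy = INFLUENCE_POLICY[action]
    violations = []
    for i in inputs:
        allowed = policy["may_authorize"] if i["role"] == "authorized" else policy["may_inform"]
        if i["kind"] not in allowed:
            violations.append(f"{i['ref']}: '{i['kind']}' inputs are not allowed in role '{i['role']}' for {action}")
    return violations

recorded_inputs = [
    {"ref": "opportunity/8841", "kind": "crm_record", "role": "informed"},
    {"ref": "pattern/discount-heuristic-3", "kind": "inferred_pattern", "role": "authorized"},
]
print(audit_influence("approve_discount", recorded_inputs))  # flags the inferred pattern used in an authorizing role
```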

A modest synthesis

I think these pieces gesture at a compounding dynamic, but they do not name it as the core object. My goal here is to make that explicit: decision traces reshape the decision landscape, which reshapes the next decision.

The context graph discourse is converging on the ingredients of a recursive system. The framing I want to add is the loop itself: traces accumulate, the decision landscape changes, and the system becomes self-organizing only if promotion into execution is governed.

Context graphs, as described in these pieces, highlight what they enable through the lens of hindsight, insight, and foresight. By capturing decision traces, making precedent searchable, surfacing structure that was previously implicit, and supporting counterfactual evaluation, they reshape the decision landscape over time. They make some choices easier, some risks more legible, and some exceptions harder to repeat without justification.

Context graphs describe what accumulates and suggest how the landscape can change. A decision plane specifies how structure, inferred or explicit, becomes safe, debuggable execution in workflows with consequences. If you want autonomy you can operate, and failures you can fix, you need both.