May 30, 2026 · Memory Infrastructure · James Ross

RAG Without Context Engineering Is a Fancy Search Bar

Retrieval-augmented generation ships demos fast. Without context engineering — provenance, freshness, permissions — it ships confident wrong answers at scale.

The pattern is everywhere: chunk documents, embed, retrieve top-k, prompt, ship. Stakeholders see an assistant that "knows our data." Engineering sees a pipeline that knows fragments with no opinion about which fragment is true, current, or allowed.
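
To make that default concrete, here is a minimal Python sketch of top-k retrieval by cosine similarity. The vectors are hand-written toys rather than real embeddings, and `naive_top_k` is a hypothetical helper. The superseded handbook text wins only because it scores higher.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_top_k(query_vec: list[float], chunks: list[dict], k: int = 2) -> list[str]:
    """The RAG-only default: rank every chunk by similarity and keep the top k.
    Nothing here asks whether a chunk is true, current, or allowed."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]


# Toy 3-dimensional vectors written by hand; real embeddings come from a model.
chunks = [
    {"text": "Refund window is 30 days (handbook v1)", "vec": [0.9, 0.1, 0.0]},
    {"text": "Refund window is 14 days (handbook v3)", "vec": [0.8, 0.2, 0.1]},
    {"text": "Office hours are 9 to 5", "vec": [0.0, 0.1, 0.9]},
]
print(naive_top_k([1.0, 0.0, 0.0], chunks, k=1))
# ['Refund window is 30 days (handbook v1)']: the superseded text wins on similarity
```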

That is not a failure of embeddings. It is a failure to treat context as infrastructure — the same thesis as memory infrastructure, stated from the engineering bench.

What RAG actually solves

RAG is good at:

Grounding a model on text you provide instead of parametric guesswork
Updating answers when you re-index — faster than retraining
Scoping retrieval to a corpus you control — in theory

For prototypes and internal tools with forgiving users, that is enough to learn from. It is rarely enough for production without the layer below.

Context engineering — the missing discipline

Context engineering is everything that turns retrieval into trustworthy context:

| Concern | RAG-only default | Context engineering |
|---------|------------------|---------------------|
| Authority | Highest similarity wins | Provenance, version, owner |
| Freshness | Re-index when someone remembers | TTL, staleness signals, rollback |
| Permissions | Often index-wide | Record-level boundaries carried into retrieval |
| Enrichment | Chunks in a vector store | Metadata on a system of record |
| Evaluation | Vibes | Tests on known questions + failure budgets |
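
The right-hand column implies that each chunk carries metadata, not just text. A minimal sketch, assuming a hypothetical `ChunkRecord` schema: provenance fields cover authority, `indexed_at` plus a TTL give a staleness signal, and `allowed_groups` holds the record-level boundary. The field names and the 30-day TTL are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChunkRecord:
    """A chunk that carries its context, not just its text.
    Field names are illustrative assumptions, not a specific product's schema."""
    text: str
    source_id: str                # system of record the chunk came from (authority)
    version: int                  # revision of that source document (authority)
    owner: str                    # team accountable for the content (authority)
    indexed_at: datetime          # when this chunk was last (re)indexed (freshness)
    ttl: timedelta                # how long it may be trusted unchecked (freshness)
    allowed_groups: frozenset[str] = frozenset()  # record-level boundary (permissions)

    def is_stale(self, now: datetime) -> bool:
        """True once the chunk has outlived its TTL and should be re-validated."""
        return now - self.indexed_at > self.ttl


now = datetime(2026, 5, 30, tzinfo=timezone.utc)
policy = ChunkRecord(
    text="Refund window is 14 days",
    source_id="handbook/refunds",
    version=3,
    owner="support-ops",
    indexed_at=now - timedelta(days=45),
    ttl=timedelta(days=30),
    allowed_groups=frozenset({"support"}),
)
print(policy.is_stale(now))  # True: indexed 45 days ago against a 30-day TTL
```

A stale flag does not have to mean deletion; it can lower the chunk's rank, trigger re-validation with the owner, or both.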

Skip the right-hand column and you have built search with an LLM glued on top: useful, sometimes dangerous.

Failure modes we see in production

Stale policy. The model cites the handbook from two revisions ago because nobody invalidated the old chunks.
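
One way to close this gap is to key chunks by source document and revision, so a re-ingest replaces the old chunks instead of sitting beside them. This uses a hypothetical in-memory `VersionedIndex`; a real vector store would do the same through its own delete-by-metadata call.

```python
class VersionedIndex:
    """Toy in-memory index that keeps only the newest revision of each source.
    Hypothetical stand-in for a vector store, which would expose its own delete API."""

    def __init__(self) -> None:
        self._chunks: dict[str, list[str]] = {}
        self._versions: dict[str, int] = {}

    def ingest(self, source_id: str, version: int, chunks: list[str]) -> bool:
        """Swap in chunks for a newer revision; refuse older or repeated ones."""
        current = self._versions.get(source_id)
        if current is not None and version <= current:
            return False  # an out-of-order job must not resurrect old policy
        # Replacing the whole list invalidates every chunk of the prior revision.
        self._chunks[source_id] = list(chunks)
        self._versions[source_id] = version
        return True

    def all_chunks(self) -> list[str]:
        return [text for texts in self._chunks.values() for text in texts]


idx = VersionedIndex()
idx.ingest("handbook/refunds", 1, ["Refund window is 30 days"])
idx.ingest("handbook/refunds", 3, ["Refund window is 14 days"])
idx.ingest("handbook/refunds", 2, ["Refund window is 21 days"])  # late retry, rejected
print(idx.all_chunks())  # ['Refund window is 14 days']
```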

Duplicate truth. Three summaries of the same decision live in the index with slightly different wording. Retrieval picks the confident wrong one.
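
Part of this is a retrieval-time problem: similarity ranks relevance, not authority. A minimal sketch, assuming each hit carries a hypothetical `record_id` linking it to the decision in the system of record and a `canonical` flag marking the original entry. Hits are collapsed per record, and the canonical copy wins even when a paraphrase scores higher.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    text: str
    record_id: str   # hypothetical id of the decision in the system of record
    canonical: bool  # True if this text is the system-of-record entry itself
    score: float     # similarity score from retrieval


def collapse_duplicates(hits: list[Hit]) -> list[Hit]:
    """Keep one hit per record_id: the canonical entry if one was retrieved,
    otherwise the best-scoring paraphrase. Output is ordered by score."""
    best: dict[str, Hit] = {}
    for hit in hits:
        kept = best.get(hit.record_id)
        # Compare (canonical, score) tuples: canonical beats any paraphrase,
        # and score only breaks ties between copies of equal authority.
        if kept is None or (hit.canonical, hit.score) > (kept.canonical, kept.score):
            best[hit.record_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


hits = [
    Hit("Vendor A approved for EU and US", "dec-42", False, 0.93),
    Hit("Vendor A approved for EU only", "dec-42", False, 0.91),
    Hit("Vendor A approved for EU; US pending legal review", "dec-42", True, 0.88),
]
print([h.text for h in collapse_duplicates(hits)])
# ['Vendor A approved for EU; US pending legal review']
```

This only works if enrichment attached `record_id` at capture time; without that link, nothing at query time can tell the three summaries belong together.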

Leakage. Sensitive chunks surface because indexing was coarse and filtering happened too late.
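
The remedy is mostly about ordering: apply permissions before ranking, so restricted chunks never enter the candidate set or the prompt. A minimal sketch with hand-written toy vectors and a hypothetical `allowed_groups` tag on each chunk. Untagged chunks are denied by default, which is the safe failure for a coarse legacy index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def permitted_top_k(query_vec: list[float], chunks: list[dict],
                    user_groups: set[str], k: int = 3) -> list[str]:
    """Apply record-level permissions before ranking, so a restricted chunk can
    never reach the prompt. Untagged chunks match no group: deny by default."""
    visible = [c for c in chunks if c["allowed_groups"] & user_groups]
    visible.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in visible[:k]]


chunks = [
    {"text": "Draft reorg plan", "vec": [1.0, 0.0], "allowed_groups": {"exec"}},
    {"text": "Published org chart", "vec": [0.8, 0.6], "allowed_groups": {"all-staff"}},
    {"text": "Untagged legacy export", "vec": [0.9, 0.1], "allowed_groups": set()},
]
print(permitted_top_k([1.0, 0.0], chunks, {"all-staff", "support"}))
# ['Published org chart']
```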

No rollback. A bad ingest poisons answers until someone notices in support tickets — the same class of outage as a bad deploy without a revert path.
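
Treating ingest like a deploy suggests the deploy remedy: publish each ingest as an immutable snapshot and point retrieval at it through an alias, so reverting is a pointer move. The `IndexAlias` class is a hypothetical in-memory sketch; some search engines provide a comparable alias mechanism under their own APIs.

```python
class IndexAlias:
    """Serve retrieval from immutable snapshots behind an alias, like releases
    behind a deploy. Hypothetical in-memory sketch."""

    def __init__(self) -> None:
        self._snapshots: dict[str, list[str]] = {}
        self._history: list[str] = []  # promoted snapshot ids, oldest first

    def publish(self, snapshot_id: str, chunks: list[str]) -> None:
        """Build a new snapshot and point the alias at it; older ones are kept."""
        if snapshot_id in self._snapshots:
            raise ValueError(f"snapshot {snapshot_id!r} already exists")
        self._snapshots[snapshot_id] = list(chunks)
        self._history.append(snapshot_id)

    @property
    def active(self) -> str:
        return self._history[-1]

    def search_space(self) -> list[str]:
        """Chunks that retrieval is allowed to see right now."""
        return self._snapshots[self.active]

    def rollback(self) -> str:
        """Repoint the alias at the previous snapshot and return its id."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier snapshot to roll back to")
        self._history.pop()
        return self.active


alias = IndexAlias()
alias.publish("ingest-0528", ["Refund window is 14 days"])
alias.publish("ingest-0530", ["Refnd wndow 1 4 dys"])  # bad parser run
print(alias.rollback())      # ingest-0528
print(alias.search_space())  # ['Refund window is 14 days']
```

Keeping old snapshots costs storage, but it turns "until someone notices in support tickets" into a single revert once someone does.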

These failures are why we argue that Notion plus ChatGPT is not a memory layer, and why "capture once, use anywhere" is an architectural promise, not marketing.

RAG is a component, not a category

Teams selling "RAG platforms" often sell indexing and prompting. Memory infrastructure adds the contract around it:

Where capture happens (not only bulk upload)
How enrichment attaches to records
How assistants with different roles retrieve different slices
Who operates freshness when the org changes faster than the index
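
Role-scoped retrieval can start as plain configuration. A minimal sketch, assuming hypothetical role and collection names: each assistant role maps to a slice of collections, and every request is intersected with that slice before any search runs. Unknown roles get nothing.

```python
# Hypothetical slice definitions: which collections each assistant role may search.
ROLE_SLICES: dict[str, frozenset[str]] = {
    "support_assistant": frozenset({"help_center", "refund_policy"}),
    "eng_copilot": frozenset({"repos", "tickets", "runbooks"}),
}


def scoped_collections(role: str, requested: set[str]) -> set[str]:
    """Intersect what a caller asks for with the role's slice.
    Unknown roles get an empty slice (deny by default)."""
    return set(requested) & ROLE_SLICES.get(role, frozenset())


print(scoped_collections("support_assistant", {"help_center", "repos"}))  # {'help_center'}
print(scoped_collections("sales_bot", {"help_center"}))                   # set()
```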

That is the difference between a weekend pipeline and rails you can hand to the team that runs production — what we deploy on Services and validate through CapturedIt.

A practical evaluation checklist

Before calling RAG "done":

1. Can you name the system of record for each retrieved chunk?
2. What happens when the source doc is edited: invalidation, versioning, or silent drift?
3. Do permissions follow the record into every assistant surface?
4. Do you have evals for the questions auditors or customers actually ask?
5. What is the rollback when retrieval quality drops?

If you cannot answer question four, you have a demo. If your answers to questions one and three are vague, you have search.
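
Question four, and by extension question five, can be made mechanical with a small eval gate. A minimal sketch, assuming a hypothetical `run_evals` helper, canned answers standing in for a real assistant, and naive substring grading: a run that exceeds its failure budget is the signal to block promotion or roll back the last ingest.

```python
from typing import Callable


def run_evals(answer: Callable[[str], str], cases: list[tuple[str, str]],
              failure_budget: float) -> tuple[bool, list[str]]:
    """Ask known questions; a case passes if the expected fact appears in the answer.
    Returns (within_budget, failed_questions). Substring matching is a crude,
    assumed grader chosen for brevity."""
    failed = [q for q, expected in cases if expected.lower() not in answer(q).lower()]
    failure_rate = len(failed) / len(cases) if cases else 0.0
    return failure_rate <= failure_budget, failed


# Canned answers stand in for a real assistant in this sketch.
canned = {
    "What is the refund window?": "Refunds are accepted within 14 days.",
    "Who approves vendor contracts?": "Procurement approves them.",
    "How long are audit logs retained?": "Logs are kept for 90 days.",
}
cases = [
    ("What is the refund window?", "14 days"),
    ("Who approves vendor contracts?", "legal"),
    ("How long are audit logs retained?", "30 days"),
]
ok, failed = run_evals(lambda q: canned.get(q, ""), cases, failure_budget=0.1)
print(ok, failed)  # two of three cases fail, well over a 10% budget
```

Substring checks are brittle; they are used here only to keep the gate readable. The design point is the budget, which makes "quality dropped" a yes-or-no decision rather than a feeling.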

Humans in the loop when stakes are high

Even good context engineering does not remove judgment. High-stakes outputs — customer commitments, safety, money movement — deserve human review, budgets, and fallbacks. We treat that as baseline production discipline, the same bar we describe for regulated operational intelligence.
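
Human review works best when routing is explicit rather than left to whoever happens to read the output. A minimal sketch, assuming topics come from an upstream classifier and that the lowest retrieval score behind an answer is a usable grounding signal; the topic labels and the 0.75 floor are hypothetical. Weak grounding is checked first, so a poorly sourced answer falls back regardless of topic, and only well-grounded high-stakes answers reach a reviewer.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical labels for the high-stakes categories named in this section.
HIGH_STAKES = {"customer_commitment", "safety", "money_movement"}


class Route(Enum):
    AUTO_SEND = "auto_send"
    HUMAN_REVIEW = "human_review"
    FALLBACK = "fallback"


@dataclass
class Draft:
    text: str
    topics: set[str]         # assumed output of an upstream classifier
    min_source_score: float  # weakest retrieval score the answer relied on


def route(draft: Draft, score_floor: float = 0.75) -> Route:
    """Decide who gets the last word on a generated answer."""
    if draft.min_source_score < score_floor:
        return Route.FALLBACK      # weak grounding: decline and hand off, do not guess
    if draft.topics & HIGH_STAKES:
        return Route.HUMAN_REVIEW  # well grounded but consequential: needs sign-off
    return Route.AUTO_SEND


print(route(Draft("We will refund $5,000 today", {"money_movement"}, 0.91)))  # Route.HUMAN_REVIEW
print(route(Draft("The office opens at 9", {"facilities"}, 0.92)))           # Route.AUTO_SEND
```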

Where to go next

If you are building or buying retrieval, start with the category definition on /memory-infrastructure, then read why everything becomes a context problem.

Scoping platform work? Reach out on Contact with your stack, or explore engineering teams if the pain is context loss across repos, tickets, and copilots.

#rag #context-engineering #memory-infrastructure #production #ai-native
