RAG Without Context Engineering Is a Fancy Search Bar
Retrieval-augmented generation ships demos fast. Without context engineering — provenance, freshness, permissions — it ships confident wrong answers at scale.
The pattern is everywhere: chunk documents, embed, retrieve top-k, prompt, ship. Stakeholders see an assistant that "knows our data." Engineering sees a pipeline that knows fragments with no opinion about which fragment is true, current, or allowed.
That is not a failure of embeddings. It is a failure to treat context as infrastructure — the same thesis as memory infrastructure, stated from the engineering bench.
What RAG actually solves
RAG is good at:
- Grounding a model on text you provide instead of parametric guesswork
- Updating answers when you re-index — faster than retraining
- Scoping retrieval to a corpus you control — in theory
For prototypes and internal tools with forgiving users, that is enough to learn from. It is rarely enough for production without the layer below.
Context engineering — the missing discipline
Context engineering is everything that turns retrieval into trustworthy context:
| Concern | RAG-only default | Context engineering |
|--------|-------------------|---------------------|
| Authority | Highest similarity wins | Provenance, version, owner |
| Freshness | Re-index when someone remembers | TTL, staleness signals, rollback |
| Permissions | Often index-wide | Record-level boundaries into retrieval |
| Enrichment | Chunks in a vector store | Metadata on a system of record |
| Evaluation | Vibes | Tests on known questions + failure budgets |
Skip that table and you have built search with an LLM glued on top: useful, and sometimes dangerous.
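Here is a minimal sketch of what the right-hand column can mean at query time. Everything in it is hypothetical: the `Chunk` fields, the `retrieve` function, the 90-day TTL, and the toy two-dimensional embeddings stand in for a real vector store and metadata service. What matters is the ordering. Permission, version, and freshness filters run before similarity ranking, so a highly similar chunk that is superseded or off-limits never reaches the prompt.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Chunk:
    text: str
    source_id: str                 # system-of-record identifier (provenance)
    version: int                   # revision of the source document this chunk came from
    owner: str                     # accountable owner (carried for audit, not used in ranking)
    allowed_roles: frozenset[str]  # record-level permission boundary
    indexed_at: datetime           # when this chunk was last (re)indexed
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, role, current_versions, now,
             ttl=timedelta(days=90), k=3):
    """Filter on permissions, version, and freshness first; rank by similarity last."""
    eligible = [
        c for c in chunks
        if role in c.allowed_roles                          # permissions before ranking, not after
        and c.version == current_versions.get(c.source_id)  # drop superseded revisions
        and now - c.indexed_at <= ttl                       # drop chunks past their TTL
    ]
    eligible.sort(key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    return eligible[:k]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
chunks = [
    Chunk("Refunds within 30 days.", "handbook", 3, "ops", frozenset({"support"}),
          now - timedelta(days=5), (1.0, 0.0)),
    Chunk("Refunds within 14 days.", "handbook", 2, "ops", frozenset({"support"}),
          now - timedelta(days=200), (1.0, 0.1)),
    Chunk("Salary bands by level.", "hr-comp", 1, "hr", frozenset({"hr"}),
          now - timedelta(days=1), (0.9, 0.1)),
]
hits = retrieve((1.0, 0.0), chunks, role="support",
                current_versions={"handbook": 3, "hr-comp": 1}, now=now)
print([h.text for h in hits])  # ['Refunds within 30 days.']
```

A similarity-only retriever would happily return the two-revisions-old refund policy and the HR chunk alongside the current one, because all three score close to the query. In this sketch, the stale revision and the out-of-role chunk are removed before scoring ever happens.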
Failure modes we see in production
Stale policy. The model cites the handbook from two revisions ago because nobody invalidated the old chunks.
Duplicate truth. Three summaries of the same decision live in the index with slightly different wording. Retrieval picks the confident wrong one.
Leakage. Sensitive chunks surface because indexing was coarse and filtering happened too late.
No rollback. A bad ingest poisons answers until someone notices in support tickets — the same class of outage as a bad deploy without a revert path.
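Stale policy and missing rollback share a fix: treat each ingest as a versioned batch you can revert. The sketch below is a hypothetical in-memory stand-in (`VersionedIndex`, `ingest`, and `rollback` are illustrative names, not a real library). It keeps one live revision per source record, so a new revision invalidates every chunk of the old one at once, and it records an undo log per batch so the most recent ingest can be reverted like a bad deploy. It only prevents duplicates within a single source record. Three summaries living in three different sources would still need deduplication upstream.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

Entry = tuple[int, list[str]]  # (source version, chunk texts) for one source record


@dataclass
class VersionedIndex:
    """Hypothetical index: one live revision per source record, plus an undo log per batch."""
    live: dict[str, Entry] = field(default_factory=dict)
    # Stack of (batch_id, [(source_id, entry it replaced or None)]) so the latest batch can be reverted.
    batches: list[tuple[str, list[tuple[str, Optional[Entry]]]]] = field(default_factory=list)

    def ingest(self, batch_id: str, docs: dict[str, Entry]) -> None:
        undo: list[tuple[str, Optional[Entry]]] = []
        for source_id, (version, chunks) in docs.items():
            previous = self.live.get(source_id)
            if previous is not None and version <= previous[0]:
                continue  # ignore duplicate or out-of-order revisions
            undo.append((source_id, previous))
            # Replacing the whole entry invalidates every chunk of the old revision at once.
            self.live[source_id] = (version, list(chunks))
        self.batches.append((batch_id, undo))

    def rollback(self) -> str:
        """Revert the most recent batch, like reverting the last deploy."""
        batch_id, undo = self.batches.pop()
        for source_id, previous in reversed(undo):
            if previous is None:
                del self.live[source_id]
            else:
                self.live[source_id] = previous
        return batch_id

    def searchable_chunks(self) -> list[str]:
        return [text for _, chunks in self.live.values() for text in chunks]


idx = VersionedIndex()
idx.ingest("b1", {"handbook": (1, ["Refunds within 14 days."])})
idx.ingest("b2", {"handbook": (2, ["Refunds within 30 days."])})
print(idx.searchable_chunks())  # ['Refunds within 30 days.']
idx.ingest("b3", {"handbook": (3, ["Refunds are never allowed."])})  # a bad ingest
print(idx.rollback(), idx.searchable_chunks())  # b3 ['Refunds within 30 days.']
```

Restricting rollback to the most recent batch keeps the undo log consistent. Reverting an older batch while newer ones that touched the same records are still live would need a merge policy, which this sketch deliberately avoids.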
These failures are why we argue Notion plus ChatGPT is not a memory layer, and why capture once, use anywhere is an architectural promise, not marketing.
RAG is a component, not a category
Teams selling "RAG platforms" often sell indexing and prompting. Memory infrastructure adds the contract around it:
- Where capture happens (not only bulk upload)
- How enrichment attaches to records
- How assistants with different roles retrieve different slices
- Who operates freshness when the org changes faster than the index
That contract is the difference between a weekend pipeline and rails you can hand to the team that runs production. It is what we deploy on Services and validate through CapturedIt.
A practical evaluation checklist
Before calling RAG "done":
1. Can you name the system of record for each retrieved chunk?
2. What happens when the source doc edits: invalidation, versioning, or silent drift?
3. Do permissions follow the record into every assistant surface?
4. Do you have evals for questions auditors or customers actually ask?
5. What is the rollback when retrieval quality drops?
If question four is empty, you have a demo. If questions one and three are vague, you have search.
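Question four does not require heavy tooling to start. A minimal sketch, assuming each eval case pairs a real question with the system-of-record id a correct answer must be grounded in: `EvalCase`, `run_evals`, the canned retriever, and the 10% failure budget are all hypothetical stand-ins for your own retrieval stack and thresholds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str          # a question auditors or customers actually ask
    expected_source: str   # system-of-record id a correct answer must be grounded in


def run_evals(cases: list[EvalCase],
              retrieve_sources: Callable[[str], list[str]],
              failure_budget: float) -> tuple[bool, float, list[str]]:
    """Return (within_budget, failure_rate, failed_questions)."""
    if not cases:
        raise ValueError("an empty eval set proves nothing")
    failed = [c.question for c in cases
              if c.expected_source not in retrieve_sources(c.question)]
    rate = len(failed) / len(cases)
    return rate <= failure_budget, rate, failed


# Stand-in retriever: canned source ids per question instead of a real index.
canned = {
    "What is the refund window?": ["handbook"],
    "Who approves discounts over 20%?": ["pricing-policy", "sales-deck"],
    "Is customer data encrypted at rest?": ["old-blog-post"],
    "How long do we retain call recordings?": ["retention-policy"],
}
cases = [
    EvalCase("What is the refund window?", "handbook"),
    EvalCase("Who approves discounts over 20%?", "pricing-policy"),
    EvalCase("Is customer data encrypted at rest?", "security-policy"),
    EvalCase("How long do we retain call recordings?", "retention-policy"),
]
ok, rate, failed = run_evals(cases, lambda q: canned.get(q, []), failure_budget=0.1)
print(ok, rate, failed)  # False 0.25 ['Is customer data encrypted at rest?']
```

Checking retrieval against the expected source, rather than grading generated prose, also answers question one along the way: a case cannot be written without naming its system of record. A run that exceeds the budget is a natural trigger for the rollback in question five.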
Humans in the loop when stakes are high
Even good context engineering does not remove judgment. High-stakes outputs — customer commitments, safety, money movement — deserve human review, budgets, and fallbacks. We treat that as baseline production discipline, the same bar we describe for regulated operational intelligence.
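The review-and-fallback part of that discipline can be stated as a small routing rule. This is a hypothetical sketch, not our production logic: the category labels mirror the examples above, and it assumes some upstream step (a classifier or rules) has already labeled each draft and recorded which source records it cites. Budgets are left out to keep it short.

```python
from dataclasses import dataclass

# Categories named in the text; how a draft gets its category is assumed, not shown.
HIGH_STAKES = frozenset({"customer_commitment", "safety", "money_movement"})


@dataclass(frozen=True)
class DraftAnswer:
    text: str
    category: str                   # hypothetical label attached upstream
    cited_sources: tuple[str, ...]  # system-of-record ids the draft is grounded in


def route(draft: DraftAnswer) -> str:
    """Decide whether a draft ships, goes to a human, or falls back."""
    if draft.category in HIGH_STAKES:
        return "human_review"  # high stakes: a person signs off regardless of retrieval quality
    if not draft.cited_sources:
        return "fallback"      # nothing grounded it: hand off rather than answer confidently
    return "auto"


print(route(DraftAnswer("Refund approved: $4,200.", "money_movement", ("billing-ledger",))))  # human_review
print(route(DraftAnswer("Our office opens at 9.", "general", ())))  # fallback
print(route(DraftAnswer("Refunds within 30 days.", "policy_lookup", ("handbook",))))  # auto
```

Checking stakes before grounding is intentional: a well-cited answer about moving money still goes to a person.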
Where to go next
If you are building or buying retrieval, start with the category definition on /memory-infrastructure, then read why everything becomes a context problem.
Scoping platform work? Reach us through Contact with your stack, or explore engineering teams if the pain is context loss across repos, tickets, and copilots.
Related: Why Copilot doesn't remember your project, ChatGPT Projects vs. institutional memory.