Sarthak Garg

Institutional memory in the model era

Build memory available to both people and agents.


For most of my career, institutional memory meant insurance against people leaving: capture what someone knows before the door closes, and hope someone reads it later. The model era shifts where that memory bears weight. Agents are now doing real engineering work, and the quality of each output is bounded by the context they can reach at that moment. The ceiling is now the input.

What counts as memory has also changed. A language model digests meeting transcripts and ticket histories without anyone having to structure them up front. Before language models, getting unstructured artifacts into something that could reason about them was the whole problem. Now it is the easy part. A team's memory is no longer the wiki built five years ago and abandoned by year three; it is whatever a model can reach when an agent (or a person on its other end) needs to make the next decision.

The first thing that convinced me to act on this was small. We took the component library every engineer was already using and made it directly accessible inside their coding agent. One source, one consumer. The quality of agent-generated UI lifted measurably and quickly. Before, the agent invented components that did not exist or rewrote ones we had already styled. After, it reached for the right building blocks because they were sitting where it could see them. The substrate was not new; the access was. That is the moment I stopped treating institutional memory as a documentation problem and started treating it as an input layer.

The same shape repeats wherever an agent does meaningful work on a real ticket. With the right context, the agent ships what I have started calling a stable feature: code that follows the architecture and references the existing test cases. Without it, the agent ships plausible code that compiles, looks reasonable, and a senior engineer still has to wire into reality. The model is the same in both runs; the substrate around it is not.

So this year I am leading an organization-level program to build a single memory layer that the team's agents read from, covering:

  • Feature artifacts (specs and design files).
  • Code and architecture.
  • Test cases.
  • Realtime signals (team chat, meeting transcripts).
  • The project tracker.

Two targets:

  • Coverage above ninety percent of active surfaces. Below that, the agent hits gaps and falls back to plausible code.
  • Retrieval under five seconds. Slower than that, the developer reaches around the system instead of through it.
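Both targets are easy to state and easy to drift from, so it can help to check them mechanically. The sketch below is a minimal illustration, not our tooling: it assumes you can list the active surfaces and the surfaces actually indexed, and it assumes "under five seconds" is read as a 95th-percentile latency. The names (`check_targets`, `TargetReport`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Thresholds from the two targets; reading latency as p95 is an assumption.
COVERAGE_TARGET = 0.90   # fraction of active surfaces that are indexed
LATENCY_TARGET_S = 5.0   # seconds per retrieval


@dataclass
class TargetReport:
    coverage: float
    p95_latency_s: float
    coverage_ok: bool
    latency_ok: bool


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; assumes a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_targets(active_surfaces: set[str],
                  indexed_surfaces: set[str],
                  latencies_s: list[float]) -> TargetReport:
    # Only count indexed surfaces that are still active.
    covered = active_surfaces & indexed_surfaces
    coverage = len(covered) / len(active_surfaces) if active_surfaces else 1.0
    p95 = percentile(latencies_s, 95)
    # "Above ninety percent" and "under five seconds" are both strict.
    return TargetReport(coverage, p95,
                        coverage > COVERAGE_TARGET,
                        p95 < LATENCY_TARGET_S)
```

With ten active surfaces and nine indexed, coverage sits at exactly 0.9 and still fails the "above ninety percent" bar, which is the gap where the agent falls back to plausible code.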

This is still work in progress, and it is harder than it sounds.

Build order falls out of staleness, the rate at which each source goes wrong on its own. Cheap sources stay fresh as a byproduct of work; expensive sources decay the moment you stop maintaining them by hand.

  • V1 (auto-maintained pair): the code repository and the project tracker. Every commit and every ticket update refreshes the substrate without anyone having to remember to do it. Start here, because the maintenance bill has already been paid by engineers doing their jobs.
  • V2 (maintained-but-not-automatic pair): the documented surfaces and the realtime conversation layer. This is where staleness becomes the central design problem rather than a free win.

Every spec decays the moment the real decision moves. The spec is written once, the actual decision lands in a chat thread, and the spec is never updated. Downstream work, human or agent, goes sideways. Realtime ingestion is non-negotiable for that reason; without it, V2 is a museum of stale claims.
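Once realtime signals are ingested, one concrete use is flagging specs that a later conversation has overtaken. The sketch below assumes the hard part is already done: decisions have been extracted from chat threads or transcripts and linked to the spec IDs they touch. `Spec`, `ChatDecision`, and `stale_specs` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Spec:
    spec_id: str
    last_edited: datetime


@dataclass
class ChatDecision:
    # Hypothetical: a decision extracted from a chat thread or meeting
    # transcript, already linked to the spec IDs it affects.
    thread_id: str
    decided_at: datetime
    spec_ids: list[str]


def stale_specs(specs: list[Spec],
                decisions: list[ChatDecision]) -> dict[str, list[str]]:
    """Map each stale spec ID to the threads that moved a decision after
    the spec was last edited."""
    by_id = {s.spec_id: s for s in specs}
    stale: dict[str, list[str]] = {}
    for d in decisions:
        for sid in d.spec_ids:
            spec = by_id.get(sid)
            # A decision newer than the spec means the spec no longer
            # reflects where the decision actually landed.
            if spec is not None and d.decided_at > spec.last_edited:
                stale.setdefault(sid, []).append(d.thread_id)
    return stale
```

The output can feed either a nudge to the spec owner or a retrieval rule that serves the thread alongside the spec, so downstream work sees the newer decision.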

Tool-side gaps shape the sequence further. Agents can sit in scheduled video meetings today, but not yet in the ad-hoc voice calls that happen inside team chat. So the lightest, fastest conversations, often the ones carrying the decision, still outrun your capture, and you compensate with team discipline (a one-line summary posted back into chat after the call) until the tooling closes the gap. At our scale, the hard problems are how quickly this volume goes stale and how expensive it is to reconnect sources once they have.

Three things stay true regardless of how cheap tokens get or how good the off-the-shelf connectors become:

  • Selectivity. Do not ingest everything. Noise poisons retrieval, and the cost of context per query stays a real constraint because what you store grows at least as fast as per-token cost falls. The aggregator's job is to extract signal and decide which slice goes in for a given question; the hard work is what to leave out.
  • Org-specific stitching. Your architecture and your coding guidelines encode judgment a general-purpose connector will not figure out by reading your tools. Whether your database is read-optimized or write-optimized for the user base you serve is one example. Vendors will commoditize the plumbing in the next year or two. The stitching stays the engineering manager's job, because no outsider can know how to weight a design artifact against a runbook for your team's particular question.
  • The model is the shared interface. Engineers will increasingly read this substrate through a language model the same way agents do, because that is the cheapest way to read anything. Design for the one consumption layer, and you have designed for everyone who will read it.
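Selectivity and stitching meet at the point where the aggregator decides which slice goes into a given question. The sketch below uses a plain greedy budgeted-selection heuristic (a knapsack approximation, swapped in for illustration, not our method): drop chunks below a relevance floor, then take the best relevance-per-token chunks that fit. The `source_weights` parameter stands in for the org-specific judgment about how much a spec, a ticket, or a chat thread should count; its values, the `Chunk` fields, and the floor of 0.3 are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chunk:
    source: str       # e.g. "code", "tracker", "spec", "chat"
    text: str
    relevance: float  # score from whatever retriever you use, assumed in [0, 1]
    tokens: int


def select_context(chunks: List[Chunk],
                   token_budget: int,
                   min_score: float = 0.3,
                   source_weights: Optional[Dict[str, float]] = None) -> List[Chunk]:
    """Greedy budgeted selection: filter noise, then fill the budget with
    the highest weighted-relevance-per-token chunks."""
    weights = source_weights or {}

    def score(c: Chunk) -> float:
        # Org-specific stitching lives here: per-source weights a
        # general-purpose connector cannot infer.
        return c.relevance * weights.get(c.source, 1.0)

    # Leaving things out is the point: low scores never reach the prompt.
    candidates = [c for c in chunks if c.tokens > 0 and score(c) >= min_score]
    candidates.sort(key=lambda c: score(c) / c.tokens, reverse=True)

    picked: List[Chunk] = []
    used = 0
    for c in candidates:
        if used + c.tokens <= token_budget:
            picked.append(c)
            used += c.tokens
    return picked
```

Greedy selection is not optimal, but it is cheap enough to run on every query, and the interesting tuning happens in the floor and the weights rather than in the packing.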

You do not need an organization-level program to start. Pick one source that stays fresh on its own, and one downstream consumer (the IDE agent, onboarding lookups, runbook search, code review) that will use it tomorrow. Wire that one path this quarter. The cheapest version is a small bridge from one source to one agent, and even that returns measurable lift, the same way the component-library bridge did for us. By the time the off-the-shelf tools land, you will have spent months learning what context your team actually needs them to deliver.
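As a rough picture of how small that first bridge can be, here is a sketch in the shape of our component-library case: one source (the component list, rebuilt from the repository on each commit) and one lookup function an agent can call as a tool. Everything here is hypothetical: the `Component` fields, the import paths, and the simple substring matching are assumptions, and how you register the function with your agent depends on the agent's own tool-calling mechanism.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    import_path: str   # hypothetical, e.g. "ui/DatePicker"
    description: str


def build_index(components: List[Component]) -> Dict[str, Component]:
    # Rebuild from the repository on every commit so the index stays as
    # fresh as the code itself; no one has to remember to update it.
    return {c.name.lower(): c for c in components}


def lookup_components(index: Dict[str, Component], query: str, limit: int = 5) -> str:
    """The one tool the agent calls: return matching components as JSON."""
    terms = [t for t in query.lower().split() if t]
    hits = [
        c for c in index.values()
        if any(t in c.name.lower() or t in c.description.lower() for t in terms)
    ]
    hits.sort(key=lambda c: c.name)
    return json.dumps([asdict(c) for c in hits[:limit]])
```

The matching is deliberately naive; the lift comes from the agent seeing real components at all, and you can swap in better retrieval once you know which questions it actually asks.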