Business Context as Agent Infrastructure

I wrote a few months ago that the most dangerous people in a data org are the technical ones who understand the business. That post was about humans. The same rule applies to agents, only harder.
An agent without business context is a fast idiot. It can execute. It can't reason about your business.
Almost every company racing to deploy agentic AI right now is making the same mistake. They're spending on the agent and skipping the knowledge layer underneath. That's backwards. The model is fungible. The knowledge isn't.
Here's the thesis. The most important investment your company will make in agentic AI for the next two years has nothing to do with which model you pick. It's whether you build a centralized, structured, machine-readable representation of your business that agents can actually load.
If you don't build that, every agent you deploy will reproduce the same generic CRM advice, the same hallucinated assumptions, the same answers that are technically correct and organizationally wrong. If you do build it, the same off-the-shelf agent goes from useless to indispensable.
This post is about what to build, what goes in it, and how to do it.
The mistake almost everyone is making
Pick any org racing to deploy agents. Look at what they've actually built.
You'll find model selection. Tool integrations. Vector databases. Prompt templates. A few narrow agents in production.
You won't find a centralized representation of how the company actually works. What its products are called now versus what they used to be called. How its tiers are structured. How its sales motion works. What its competitors are doing and how to position against them. What "Qualified Inquiry" or "Tier 3" or "Momentum package" actually means inside the four walls.
The reason isn't lack of source material. Most companies have plenty. Slide decks. PDFs. Sales scripts. Battle cards. Confluence. Notion. SharePoint.
The reason is that almost none of it is in a form an LLM can use. It's locked in slide layouts agents can't parse. It's scattered across five tools. It contradicts itself across documents. It's stale in ways nobody's audited. Pointing an agent at that pile is like pointing a junior employee at the corporate wiki and saying "go figure out the business." It doesn't work for humans. It really doesn't work for agents.
So what happens? Teams paper over it with longer system prompts. They cram in a few definitions, a few rules, hope for the best. The agent works on demos. It fails the moment it touches anything real.
The system prompt isn't the problem. The missing infrastructure is.
What you need to build
A centralized LLM knowledge base. One source of truth. Structured markdown. Machine-readable. Owned by the data team. Loaded as context into every agentic AI use case the company runs.
Treat it like a database for your business. Not your customers, not your transactions. Your business itself. How it's organized. What its terms mean. What its current state is. What tribal knowledge has been sitting in five people's heads for the last decade.
Three properties matter.
It's centralized. One repo, not five. Don't fragment.
It's structured. Markdown with consistent headings, tables for reference data, predictable formatting, so any agent can load any guide and know what to expect.
It's selectively loadable. An index lets an agent pick the right two or three guides for the task instead of choking on all of them.
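As a sketch, the index might look something like this. All file names, topics, and summaries here are invented for illustration; the real shape follows your business.

```json
{
  "guides": [
    {
      "file": "guides/pricing-logic.md",
      "topic": "pricing",
      "summary": "Tier thresholds, discount rules, upsell paths"
    },
    {
      "file": "guides/product-taxonomy.md",
      "topic": "product",
      "summary": "Current product hierarchy, tier structure, legacy names"
    },
    {
      "file": "guides/sales-motion.md",
      "topic": "sales",
      "summary": "Discovery call flow, renewal reframing, competitive tells"
    }
  ]
}
```

An agent, or the orchestration layer in front of it, reads this file, matches topics against the task at hand, and loads only the guides it needs.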
This isn't a documentation project. Documentation is for humans. The LLM KB is for agents. The structure, the conventions, the update cadence are different. Most companies that already have "good documentation" still need to build this from scratch.
What goes in it
It varies by company. The KB shape follows the business shape. A B2B SaaS leans on sales-motion content. A healthcare provider leans on clinical pathways. A manufacturer leans on BOM hierarchy and supplier taxonomies. There's no universal template.
Here's the general idea. The KB holds buckets that capture how the business operates, in a form an agent can load. Pick the buckets that fit. Skip the ones that don't.
Possible inclusions:
Internal vocabulary. Definitions of terms that mean something specific inside the company. Where finance, product, and CS use the same word three different ways, write it down once. Every business has dozens of these. Most have never been written down.
Naming history. What products, segments, and processes used to be called. Renames after a rebrand. Old acronyms still in old contracts. Acquired-company products that never got harmonized. The agent has to translate between versions of the truth.
Structural taxonomy. How the offer is organized. Product hierarchy, tier structure, what unlocks where, included versus add-on. Not reference data. The system of how the business is shaped.
Operational logic. The rules that segment, route, score, or price. Thresholds, formulas, if-this-then-that policies. Not the values, the logic. The agent needs to reason with the rule, not memorize the table.
Process knowledge. The playbooks. How a discovery call should flow. How a renewal conversation gets reframed. How an incident gets triaged. Decision trees encoded in prose. Most companies have these in someone's head or in a deck nobody reads. The KB makes them executable.
Competitive and external context. The tells that identify which competitor a prospect is weighing you against, or which regulatory regime is in play, paired with the move that follows. Procedural knowledge that lives in your top operators' heads. Agents need it written down.
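To make the buckets concrete, here is a sketch of what a single guide might look like, with vocabulary, structural taxonomy, naming history, and competitive tells in one place. Every term, tier, and competitor below is invented for illustration.

```markdown
# Pricing and Packaging Logic

## Core principle
Discounting follows tier, not deal size. Tier is set by seat count.

## Tier thresholds
| Tier       | Seats  | Discount ceiling |
|------------|--------|------------------|
| Starter    | 1-25   | 0%               |
| Growth     | 26-100 | 10%              |
| Enterprise | 101+   | negotiated       |

## Naming history
"Growth" was called "Momentum" before the 2023 rebrand. Old contracts
may still reference the "Momentum package"; treat it as Growth.

## Signals
A prospect asking about volume discounts below 26 seats is usually
benchmarking you against Competitor A's per-seat model.
```

The consistent shape is the point: core principle, then the framework or table, then the signals. An agent that has seen one guide knows how to read all of them.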
What doesn't go in: system prompts (that's agent configuration), code or schemas (those live in repos and the data layer), customer-specific data (look it up at runtime), real-time signals (that's the warehouse's job).
The test is simple. If something changes per-customer or per-task, it's not in the KB. If it's true about the company in general, it is.
How to build it
The build cost has collapsed. A coding agent can build most of it in a weekend.
The pattern is four moves.
First, ingest. Throw the existing source material into a sources folder. Decks, PDFs, scripts, contracts, internal wikis. Whatever's already written. Don't try to author from scratch. The goal is to extract what already exists into a useful form.
Second, distill. Use an LLM to read the sources and produce structured markdown guides. One guide per topic. Each guide has the same shape: a core principle, the framework or table, the signals to watch for. The LLM does the work of pulling content out of decks and into prose. It also handles deduplication when the same fact shows up in five places with five slightly different wordings.
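The LLM handles the fuzzy merging, but a cheap normalization pass can collapse exact and near-exact repeats before anything hits the model. A minimal sketch, assuming facts have already been extracted as one string each:

```python
import re

def normalize(fact: str) -> str:
    """Collapse case, extra whitespace, and trailing punctuation so the
    same fact phrased slightly differently maps to one key."""
    return re.sub(r"\s+", " ", fact.strip().lower()).rstrip(".")

def dedupe(facts: list[str]) -> list[str]:
    """Keep the first wording seen for each normalized fact."""
    seen, unique = set(), []
    for fact in facts:
        key = normalize(fact)
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return unique

# The same pricing rule, extracted from three different decks:
facts = [
    "Enterprise tier starts at 101 seats.",
    "enterprise tier starts at 101 seats",
    "Enterprise tier  starts at 101 seats.",
]
print(dedupe(facts))  # one canonical wording survives
```

Anything this pass can't catch, like genuinely different phrasings of the same rule, is exactly what the distillation prompt should ask the LLM to reconcile.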
Third, index. Generate a programmatic index alongside the guides. A JSON file mapping each guide to its topic and contents. This is the part that lets agents selectively load. A sales copilot pulls the pitch and product guides. A support bot pulls product and renewal. A risk-scoring agent pulls all of them. Without an index you have a pile. With one you have infrastructure.
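The selective-loading logic itself is small. A sketch, with a hypothetical inline index (in practice you'd read it from the index.json the pipeline generates):

```python
from pathlib import Path

# Hypothetical index mapping guides to topics; names are illustrative.
INDEX = {
    "guides": [
        {"file": "guides/pricing-logic.md", "topics": ["pricing", "renewal"]},
        {"file": "guides/product-taxonomy.md", "topics": ["product"]},
        {"file": "guides/sales-motion.md", "topics": ["sales", "competitive"]},
    ]
}

def select_guides(index: dict, task_topics: set[str]) -> list[str]:
    """Return the guide files whose topics overlap the task's topics."""
    return [
        g["file"]
        for g in index["guides"]
        if task_topics & set(g["topics"])
    ]

def load_context(index: dict, task_topics: set[str]) -> str:
    """Concatenate the selected guides into one context block."""
    return "\n\n".join(
        Path(f).read_text() for f in select_guides(index, task_topics)
    )

# A renewal-risk agent pulls pricing and product guides, not the pitch deck.
print(select_guides(INDEX, {"pricing", "product"}))
```

That's the whole trick: the agent never sees the pile, only the two or three guides the index says are relevant.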
Fourth, update additively. Source documents change all the time. Pricing updates. New packages. Renamed products. The pipeline should revise existing guides, not rewrite them. The KB grows with the business instead of getting recreated every quarter.
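One way to keep updates additive is a patch step that rewrites a single section of a guide and leaves everything else untouched. A minimal sketch, assuming guides use the consistent "## heading" structure described above:

```python
def update_section(guide: str, heading: str, new_body: str) -> str:
    """Replace one '## heading' section of a markdown guide, or append
    it if absent, leaving every other section untouched."""
    lines = guide.splitlines()
    out, i, replaced = [], 0, False
    while i < len(lines):
        if lines[i].strip() == f"## {heading}":
            out.append(lines[i])
            out.append(new_body)
            i += 1
            # Skip the old body up to the next section heading.
            while i < len(lines) and not lines[i].startswith("## "):
                i += 1
            replaced = True
        else:
            out.append(lines[i])
            i += 1
    if not replaced:
        out += [f"## {heading}", new_body]
    return "\n".join(out)

# A pricing change touches one section; the rest of the guide survives.
guide = "# Pricing\n## Tiers\nStarter, Growth.\n## Signals\nWatch seats."
print(update_section(guide, "Tiers", "Starter, Growth, Enterprise."))
```

The LLM proposes the new section body from the changed source document; the patch keeps the diff surgical and reviewable.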
The leverage point is the fan-out. Build the layer once. Every agent the company runs from that point forward picks up the same context for free.
I built one of these in my last few weeks at Niche. A side project, evenings and weekends. The pipeline came together fast because most of the heavy lifting is something a coding agent does well. I plugged the output into a sales intelligence app I'd been building, which used the KB to power partner health analysis and renewal risk scoring across thousands of accounts. Same model, same prompts, same data inputs. The KB was the difference between "this partner has expressed concerns about pricing" (useless) and "this partner is on the X pricing tier, the concerns are about Z, the upsell move is to do X to unlock Y" (actionable).
Same agent. Different infrastructure underneath. Different product.
Why this belongs to the data team
The LLM KB is a data product. Treat it that way.
It needs an owner. It needs a build pipeline. It needs review and update cadences. It needs the same governance discipline as a metrics catalog or a master data definition. Skip that and the KB rots in three quarters.
It's also the natural extension of what your team already does. Your team manages the canonical vocabulary, the metrics catalog, the master data definitions. The KB is the next layer up. It's what those definitions mean in human and agent-readable form. Pricing logic, product taxonomy, sales narratives, competitive context. These are downstream of the data layer your team already owns. Putting them in marketing or sales enablement is a category error. They belong with you.
If you're a CDO or VP of Data, three things follow. First, the KB is your asset: add it to the data team's portfolio explicitly, and don't let it live in a marketing operations subfolder. Second, you sponsor the build: the data team writes the pipeline, the business teams contribute source material and review output. Classic split. Third, you set the standard: format, structure, naming conventions, update cadence. The KB needs to be predictable enough that any downstream agent can rely on it. That's a data discipline call.
This is a job nobody else in the org will do. If you don't claim it, it won't get built, and your agentic AI roadmap stalls at the demo stage.
Bottom line
If you're building agentic AI capability and you don't have a centralized knowledge layer, you're building on sand.
Start there. The agents can wait.