Field Guide

Is your AI knowledge base safe?

If your team asks an AI tool questions and it answers from a store of your sales documents, that store quietly decides what your people and your buyers hear. This guide explains how that store can be made wrong, what the current security research and standards say about it, and the three controls you can put in place this week. You can apply all three without hiring anyone.

Author: Tim Doelger
Reading time: 9 minutes
Published: June 5, 2026

The problem

A knowledge base your AI reads from is an attack surface

An AI knowledge base is the store of documents an AI tool pulls from to answer a question. Load your pricing, your objection handling, your closed deals, and your call notes into a Project in Claude or ChatGPT, and your reps can ask it questions and get answers grounded in your own material. That is the value. It is also the exposure.

Anyone who can add a document to that store can shape the answers your team and your buyers receive. A planted document does not have to look suspicious. It can read like an ordinary paragraph about a topic your team asks about, with a single line inside it that changes the answer.

Picture a rep asking the AI about your cancellation terms before a renewal call. The answer comes back fast and confident, and it is wrong, because someone dropped one file into the base last month and nobody reviewed it. The rep repeats it to the buyer. The buyer believes it. That is the shape of the risk, and it does not require a hacker. A careless upload does the same damage as a malicious one.

What the research says

This is now a recognized risk, not a hypothetical

Security researchers put a number on it. In work accepted at USENIX Security 2025, a method called PoisonedRAG planted a handful of crafted documents into a knowledge base of millions and measured how often the AI returned the attacker's chosen answer.

~90% attack success rate

Five planted documents per targeted question, in a base of millions, drove the AI to wrong, attacker-chosen answers about 90% of the time. That figure is from the black-box test, where the attacker knows the question but not the system's internals, so treat it as a floor. The defenses the researchers tried did not stop it.

Source: PoisonedRAG, accepted at USENIX Security 2025.

Read the scale carefully, because it cuts the other way from how it first sounds. That test ran against a base of millions of documents. Your sales knowledge base probably holds a few hundred. The point is not that your small base behaves like a giant one. The point is that even at enterprise scale, with millions of documents to hide among, five planted files were enough to win. At your scale, with fewer documents and connectors pulling in material from systems you do not fully police, the same logic means write access and source verification matter more, not less.

This also stopped being one paper you have to take on faith. OWASP, the body that publishes the widely used Top 10 security lists, covers this whole class of risk in the 2025 edition of its Top 10 for Large Language Model Applications. Data and Model Poisoning sits at LLM04, and Vector and Embedding Weaknesses, the risks specific to the way AI retrieves from a knowledge base, sit at LLM08. The mitigations OWASP names line up closely with what follows below: accept data only from trusted and verified sources, and regularly audit the integrity of the base.

References: PoisonedRAG (USENIX Security 2025); OWASP Top 10 for LLM Applications 2025, entries LLM04 and LLM08.

The fix

Three controls, and a person who stays above the system

The defense is governance, and most of the weight is carried by three controls a small team can run.

1. Restrict who can write

Not just anyone gets to add to the base. A short list of people can write to it; everyone else reads from it. This alone removes most of the casual-upload risk.

2. Record every source

Every document carries where it came from. An unverified file does not get to quietly become the truth, because a reviewer can see its origin and judge it.

3. Approve every update

A person approves each change before it counts. Closed-won and closed-lost signals, objections, and call notes become reviewed updates on a fixed weekly cadence, not raw uploads.
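
For teams that track the base in a script or spreadsheet export, the three controls can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any AI workspace: the names WRITERS, APPROVERS, Document, and KnowledgeBase are hypothetical, and in a real workspace the same rules would be enforced through its own permission settings and a review habit.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical allowlists; in practice these map to workspace permissions.
WRITERS = {"dana"}
APPROVERS = {"lee"}


@dataclass
class Document:
    title: str
    text: str
    source: str                      # control 2: where the document came from
    submitted_by: str
    approved_by: Optional[str] = None
    approved_on: Optional[date] = None


@dataclass
class KnowledgeBase:
    pending: list = field(default_factory=list)   # submitted, not yet trusted
    live: list = field(default_factory=list)      # approved, may shape answers

    def submit(self, doc: Document) -> None:
        # Control 1: only named writers may add to the base.
        if doc.submitted_by not in WRITERS:
            raise PermissionError(f"{doc.submitted_by} cannot write to the base")
        # Control 2: refuse documents with no recorded origin.
        if not doc.source.strip():
            raise ValueError("every document needs a recorded source")
        self.pending.append(doc)

    def approve(self, title: str, approver: str, on: date) -> Document:
        # Control 3: a named person approves before the document goes live.
        if approver not in APPROVERS:
            raise PermissionError(f"{approver} cannot approve updates")
        for doc in self.pending:
            if doc.title == title:
                self.pending.remove(doc)
                doc.approved_by, doc.approved_on = approver, on
                self.live.append(doc)
                return doc
        raise KeyError(title)
```

The design choice worth noting is the separate pending list: an upload never lands directly in live, so nothing can shape an answer until a named approver has moved it there.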

There is a second reason a person stays above the system, and it has nothing to do with attackers. An AI workspace does not read your whole base to answer a question. It retrieves the handful of passages it judges most relevant, and full retrieval is never guaranteed in a single answer. A clean, well-structured base improves what surfaces, and a person catches what retrieval misses. That difference is what separates a base that sounds confident from one you can stand behind in front of a buyer.
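
To make the retrieval point concrete, here is a toy sketch of top-k retrieval. It assumes a simple word-overlap score, which is a stand-in chosen for readability; real AI workspaces rank passages with their own, more sophisticated methods. The passages, the question, and the function names are all invented for illustration.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    # Lowercase word counts; punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, passage: str) -> int:
    # Number of words the question and the passage share (multiset overlap).
    return sum((tokens(question) & tokens(passage)).values())


def retrieve(question: str, passages: list, k: int = 2) -> list:
    # Rank every passage, then keep only the top k. Everything else is
    # never shown to the model for this question.
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    return ranked[:k]


# Invented example data. The third passage echoes the question's wording
# and carries a wrong answer.
PASSAGES = [
    "Cancellation terms: customers may cancel with 30 days written notice.",
    "Our onboarding takes two weeks and includes admin training.",
    "What are the cancellation terms? The cancellation terms are: no notice is required.",
    "Annual plans are billed upfront; monthly plans are billed in arrears.",
]
QUESTION = "What are the cancellation terms?"

top = retrieve(QUESTION, PASSAGES, k=1)
```

With k set to 1, the only passage that surfaces is the one that echoes the question, and the accurate passage is left out. That is the gap a human reviewer closes: the ranking rewards relevance to the question, not correctness.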

You can apply these three controls to any base you already run, today, without hiring anyone. Restrict who can write to it. Record where each document came from. Put one person in charge of approving updates on a fixed weekly schedule. None of that requires new software.

Two questions people ask

Do you need a vector database? Is this just RAG?

Two technical questions come up whenever this gets serious, so here are plain answers.

You probably do not need a vector database

For most B2B teams of 5 to 50 revenue-facing staff, the base can live inside an AI workspace you likely already pay for, where you load your sales knowledge and connect to source systems through approved connectors. A custom vector database and retrieval pipeline is infrastructure you would have to run and secure yourself, and at this size it is rarely worth it. The skill is choosing the right tools, not building plumbing you then have to maintain.

This is the governance around RAG, not RAG itself

Retrieval, often called RAG, is the method an AI uses to pull relevant passages from your base. That part is the easy part. The work that matters is the governance around it: what goes in, who is allowed to change it, how it stays current, and who is accountable for the answer that reaches a buyer. The governance is what makes the answers safe to trust and the base an asset you own.

Do this this week

A six-point check you can run on the base you already have

You do not need a project to find out where you stand. Walk these six questions with whoever owns your AI tools. Any "no" is a place a wrong answer can reach a buyer.

The knowledge base safety check

  • Write access: Can you name the exact people who can add documents to the base? If the answer is "most of the team," that is the first fix.
  • Source of record: For any document in the base, can you say where it came from and who put it there?
  • Approval step: Does a person review and approve a document before it can shape an answer, or does it go live on upload?
  • Review cadence: Is there a fixed weekly time when one named owner updates the base from real sales signals?
  • Stale content: When did you last remove or correct an out-of-date document? A stale file does not sit quietly; it can surface as the answer.
  • Buyer-facing accountability: If the AI gave a buyer a wrong answer tomorrow, do you know whose job it was to catch it?

If most of those came back clean, your base is in better shape than most. If three or more came back "no," the wrong answer reaching a buyer is a matter of when, not if, and it is worth fixing before your next renewal cycle.
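
If you want to record the result rather than eyeball it, the check above can be scored with a short Python sketch. The function name assess and the verdict wording are illustrative assumptions; the only rule taken from this guide is that three or more "no" answers call for a fix before the next renewal cycle, and that any single "no" is a gap worth closing.

```python
CHECKS = [
    "Write access",
    "Source of record",
    "Approval step",
    "Review cadence",
    "Stale content",
    "Buyer-facing accountability",
]


def assess(answers: dict) -> tuple:
    # answers maps each check name to True ("yes") or False ("no").
    missing = [c for c in CHECKS if c not in answers]
    if missing:
        raise ValueError(f"unanswered checks: {missing}")

    gaps = [c for c in CHECKS if not answers[c]]
    if len(gaps) >= 3:
        verdict = "fix before the next renewal cycle"
    elif gaps:
        verdict = "close the listed gaps"   # assumed wording for 1-2 gaps
    else:
        verdict = "clean"
    return gaps, verdict
```

For example, answering "no" to Write access, Approval step, and Stale content returns those three names along with the "fix before the next renewal cycle" verdict.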

Questions

Common questions

Do I need a vector database for an AI sales knowledge base?
For most B2B teams of 5 to 50 revenue-facing staff, no. The base can live inside an AI workspace you likely already pay for, such as a Project in Claude or ChatGPT, where you load your sales knowledge and connect to source systems through approved connectors. A custom vector database and retrieval pipeline is infrastructure you would have to run and secure yourself, and it is rarely worth it at this size. The right move is to choose the right tools, not build plumbing you then have to maintain.

What stops the AI from giving wrong or poisoned answers?
Three governance controls and a person. Research accepted at USENIX Security 2025 showed that as few as five planted documents per targeted question, in a base of millions, can push an AI to wrong, attacker-chosen answers about 90% of the time in a black-box test, and the common defenses the researchers tried did not stop it. So you restrict who can write to the base, record the source of every document, and require human approval before any update counts. Because an AI workspace also retrieves only the passages it judges most relevant rather than your whole base, the weekly human review is where a person catches what retrieval misses before a buyer ever sees it.

Is this just RAG?
Retrieval is the method an AI uses to pull relevant passages from your base. The work that matters is the governance around that method: what goes in, who is allowed to change it, how it stays current, and who is accountable for the answer that reaches a buyer. The retrieval is the easy part. The governance is what makes the answers safe to trust and the base an asset you own.

How does the knowledge base stay current?
A weekly human-run review turns real sales signals (closed-won and closed-lost deals, objections, and call notes) into approved updates. This matters because retrieval favors the most relevant passage it can find, so a stale document can surface as the answer. Keeping the base current is a governance task with a named owner and a fixed schedule, not a one-time upload.

Want the base built and governed for you?

The Revenue Knowledge Base Build is a flat $2,500 done-with-you engagement. We set up a governed base inside an AI workspace you already own, with the three controls in place and a weekly review your team can run after we leave. You own the base. If you are still deciding whether AI belongs in your sales motion at all, start with a free AI Setup Call instead.

Or email Support@GeterDone.ai or call (732) 299-2543.