How Content Chunking Actually Works in AI Search Despite What Everyone Says

Two pieces of advice are fighting over your content right now, and both of them are wrong.

The first comes from Google. On its own Search Off the Record podcast, Danny Sullivan told people to stop turning content into bite-sized chunks, because you should not be crafting anything for the ranking system instead of for humans. The second comes from half the SEO industry, which heard the word "chunking," decided it meant shorter paragraphs and more headings, and started selling it as a formatting tip. I have watched clients bounce between those two stories for a year. One month they are told chunking is a gimmick. The next they are told it is the whole game. Neither version tells you what content chunking is actually doing inside an AI search engine, which is the only thing worth knowing.

What content chunking actually is in AI search

Content chunking is how AI search splits your content into smaller segments, then retrieves the segments that best answer a query. It is a retrieval step, not a formatting choice, so it decides which piece of your content competes for the answer.

It is not writing for robots either, and it is not new. Strip the jargon and content chunking means breaking a page of information into smaller, self-contained segments a machine can fetch one at a time, so a focused chunk on one topic gets retrieved while a buried one never surfaces. That definition matters because the word carries three unrelated meanings, and people argue past each other constantly. Google's own AI Overview for the term splits it into three buckets: chunking for AI and retrieval-augmented generation, chunking for human readability and design, and chunking for classroom instruction. The instructional and readability versions are about helping a person scan and remember. The retrieval version is about a machine breaking a document into segments it can index and fetch. They share a name and almost nothing else. When Sullivan says do not chunk, he is talking about the formatting version. When an AI engineer says chunking is everything, they mean the retrieval version. This article is about the second one, because that is the version quietly rewriting how your content gets found.

Passage retrieval and the content chunking nobody explains

Here is the mechanic underneath all of it. Modern search does not just rank whole pages anymore. It breaks pages into passages, stores each smaller segment, and matches individual passages to the queries people type. Google announced this directly when it shipped passage indexing, and it is the same idea running inside every retrieval-augmented generation pipeline behind AI answers. Your page gets cut into chunks, each chunk gets turned into a vector, and when someone asks a question that query becomes a vector too. The system runs an approximate nearest neighbor search to find the chunks sitting closest to the query in that vector space, and those are the candidates it reads from. Everything downstream depends on how your information got cut into segments in the first place.
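To make that pipeline concrete, here is a minimal sketch of chunk, embed, and match. Everything in it is an assumption for illustration: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the search is exact rather than approximate, and the sample chunks are made up. No production engine works this simply, but the shape of the steps is the same.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k closest.

    This is an exact scan; real systems use an approximate index instead.
    """
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical chunks cut from one page.
chunks = [
    "Passage indexing lets a search engine match one passage of a page to a query.",
    "Our office is open Monday through Friday and closed on public holidays.",
    "A vector index stores one embedding per chunk and returns the closest ones.",
]

query = "how does a search engine match a passage to a query"
for score, chunk in retrieve(query, chunks):
    print(f"{score:.3f}  {chunk}")
```

The page never appears in that code as a whole. Only the chunks do, which is the point.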

If that sounds like the embedding math I have written about before, that is because it is the same machinery. The chunk is just the unit that gets embedded. How those embeddings predict which content gets surfaced is the layer below this one, and chunking is the step that decides what each embedding represents in the first place. Get the chunk boundaries wrong and you are embedding the wrong unit of meaning. A chunk that spans two topics produces a muddy semantic vector that sits near nothing in particular. A chunk that nails one topic produces a sharp vector that sits right next to the questions it answers. The retrieval system never sees your beautiful page. It sees a pile of chunks and picks the best one.

Why single-topic chunking beats multi-topic chunking

This is where the formatting crowd accidentally stumbles onto something real without understanding why. A shorter passage about one subject scores higher relevance than a longer passage covering several subjects, and you can measure it. Mike King at iPullRank ran the test in public: he took one paragraph covering both machine learning and data privacy, split it into two single-topic paragraphs, and the machine learning passage jumped 19.24% in cosine similarity to its target query. Adding a heading above the data privacy passage lifted it another 17.54%. Same words. Better boundaries.
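A simplified bit of vector math shows why the split helps. This is an idealized sketch, not a reproduction of King's measurements: assume the mixed paragraph's vector is the sum of an on-topic part a and an off-topic part b, and assume b is orthogonal to both the query q and to a. Real embedding models are not this tidy, so treat the symbols and the orthogonality as assumptions.

```latex
\[
\cos(q,\, a+b)
  = \frac{q \cdot (a+b)}{\lVert q \rVert \, \lVert a+b \rVert}
  = \frac{q \cdot a}{\lVert q \rVert \sqrt{\lVert a \rVert^{2} + \lVert b \rVert^{2}}}
  = \cos(q,\, a) \cdot \frac{\lVert a \rVert}{\sqrt{\lVert a \rVert^{2} + \lVert b \rVert^{2}}}
\]
```

The last factor is always below 1 when b is nonzero. If the two topics carry equal weight, it is 1/√2, or about 0.71, so under these assumptions the off-topic half costs the passage roughly 29% of its similarity without changing a word of the on-topic half.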

The reason is not aesthetic. Google holds a patent, Text Ranking with Pairwise Ranking Prompting, that describes comparing passages two at a time to decide which one is more relevant to a query before feeding the winner to the model. When your answer is welded to three other ideas in one block, it loses those pairwise comparisons to a competitor's cleaner passage. I pulled a client's service page that was ranking nowhere in AI answers, found the actual answer to their highest-value question buried in the back half of a 400-word paragraph, and split it into its own short passage under a question-shaped heading. The cosine similarity to the target query rose enough that the passage started getting cited within weeks. I changed the boundaries. I did not change the argument.

Fixed, semantic, and the chunking methods behind AI search

You do not get to choose which chunking method an engine runs on your page, but you should know the options, because your content is the raw material every one of them works from. Pinecone's breakdown of chunking strategies lays out the main approaches, and they sit on a spectrum from blunt to context-aware. Walk through them in order, because each one treats your structure a little more seriously than the last.

Fixed-size chunking

Fixed-size chunking slices text into equal token-sized blocks and ignores meaning entirely. You pick a chunk length, usually tied to the embedding model's context window, and the splitter counts tokens until it hits that number, then cuts wherever it lands. Pinecone calls this the right place to start for most applications because it is fast and it scales. It is also the method most likely to guillotine your best sentence in half, because it does not care that the sentence was in the middle of an idea. If a fixed-size chunker is reading your page, a buried answer is a coin flip.
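A minimal sketch of the idea, assuming whitespace-separated words as a rough stand-in for model tokens (real splitters count tokens with the embedding model's own tokenizer). The function name and the sample text are hypothetical.

```python
def fixed_size_chunks(text: str, size: int = 50) -> list[str]:
    """Cut text into blocks of `size` tokens, ignoring sentence boundaries.

    Tokens here are whitespace-separated words, which is an assumption
    made to keep the sketch dependency-free.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


text = (
    "Chunking is a retrieval step. "
    "The answer to the pricing question is forty dollars per seat. "
    "Billing is monthly."
)

for chunk in fixed_size_chunks(text, size=8):
    print(repr(chunk))
```

With a size of 8, the first cut lands after "The answer to", so the question and its answer end up in different chunks. That is the coin flip.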

Recursive chunking

Recursive chunking is the better-mannered version of fixed-size. Instead of cutting at a hard token count, it tries a list of separators in order (paragraph breaks first, then line or sentence breaks, then the spaces between words) and only falls back to a blunt cut when nothing cleaner fits inside the size limit. LangChain's RecursiveCharacterTextSplitter is the common implementation. The practical upshot is simple: when you write in real paragraphs and complete sentences, a recursive chunker has clean seams to cut on, and your ideas and the information around them survive the split intact.
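Here is a rough sketch of that fallback logic in plain Python. It is not LangChain's implementation: the separator list, the character-based size limit, and the function name are assumptions chosen to keep the example short.

```python
import re

# (regex to split on, string used to rejoin pieces), cleanest seam first.
SEPARATORS = (
    (r"\n\s*\n", "\n\n"),      # paragraph breaks
    (r"(?<=[.!?])\s+", " "),   # sentence ends
    (r"\s+", " "),             # spaces between words
)


def recursive_chunks(text: str, max_chars: int = 200, separators=SEPARATORS) -> list[str]:
    """Split on the cleanest separator that keeps chunks under max_chars."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # Nothing cleaner is left, so fall back to a blunt cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    (pattern, joiner), rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in re.split(pattern, text):
        candidate = f"{current}{joiner}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece is too big on its own: retry with a finer separator.
            current = ""
            chunks.extend(recursive_chunks(piece, max_chars, rest))
    if current:
        chunks.append(current)
    return chunks


text = (
    "First idea stays together. It has two sentences.\n\n"
    "Second idea is separate."
)
for chunk in recursive_chunks(text, max_chars=60):
    print(repr(chunk))
```

Because the sample text has a real paragraph break, the splitter never needs to cut inside a sentence. Each paragraph comes out as its own chunk.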

Layout-aware chunking

Layout-aware chunking, what Pinecone files under document-structure-based chunking, reads the actual scaffolding of the file: HTML tags, Markdown headings, the section hierarchy. It treats your H2s and H3s as boundaries and turns your sections into individual chunks. This is the method where your formatting stops being cosmetic and starts acting like instructions. A descriptive heading sitting above a self-contained section tells a layout-aware chunker exactly where one idea ends and the next begins, which is the whole game.
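A minimal sketch of a layout-aware splitter, assuming Markdown input where headings start with `#` characters. The function name and the sample document are hypothetical, and a real parser would also handle HTML tags and code fences.

```python
import re

# Matches Markdown headings such as "## Title" and captures level and text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_on_headings(markdown: str) -> list[dict]:
    """Turn each heading and the text under it into one chunk."""
    sections = []
    heading, level, body = None, 0, []

    def flush() -> None:
        # Save the section collected so far, skipping empty preambles.
        text = "\n".join(body).strip()
        if heading is not None or text:
            sections.append({"heading": heading, "level": level, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, level, body = match.group(2).strip(), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return sections


doc = """## What is chunking?
It is a retrieval step.

### Why it matters
Boundaries decide the embedding.
"""

for section in split_on_headings(doc):
    print(section)
```

Notice that the splitter does no thinking about meaning at all. It trusts the headings, so the headings have to be right.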

Semantic chunking

Semantic chunking, an approach popularized by Greg Kamradt, throws out fixed lengths and lets the embeddings decide. It walks the document sentence by sentence, measures the semantic distance between each sentence and the group before it, and cuts into smaller chunks where the meaning shifts. The payoff is chunks that hold a single topic by construction. The catch is that it costs more compute than the blunt methods, so you cannot assume any given engine is paying for it on your page.
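The loop described above can be sketched in a few lines. This is a simplified illustration, not Kamradt's code: the toy bag-of-words `embed` function stands in for a real embedding model, and the fixed similarity threshold of 0.2 is an assumption picked for the sample text.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk whenever a sentence drifts away from the current group."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, group = [], []
    for sentence in sentences:
        if group and cosine(embed(" ".join(group)), embed(sentence)) < threshold:
            chunks.append(" ".join(group))  # meaning shifted: close the chunk
            group = []
        group.append(sentence)
    if group:
        chunks.append(" ".join(group))
    return chunks


text = (
    "Machine learning models learn patterns from training data. "
    "Training larger models needs more data. "
    "Privacy law limits how companies store personal records. "
    "Companies must delete personal records on request."
)
for chunk in semantic_chunks(text):
    print(repr(chunk))
```

The extra cost is visible even in the toy version: every sentence triggers an embedding and a comparison, where a fixed-size splitter only counts.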

Contextual chunking

Contextual chunking is the newest wrinkle, and it is telling. Anthropic's contextual retrieval prepends a short, model-generated note to each chunk before it gets embedded, one that situates the chunk within the whole document, so a passage that would have been ambiguous on its own carries its context with it. A frontier lab is spending inference to bolt context back onto isolated chunks. That is a direct admission that the chunk, not the page, is the unit that has to stand on its own.
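The mechanics are simple enough to sketch. This is an illustration of the prepend step only, not Anthropic's code: `situate` is a hypothetical callable standing in for the model call that writes the context, and `fake_situate` is a placeholder that just reuses the document's first line.

```python
from typing import Callable


def contextualize(
    document: str,
    chunks: list[str],
    situate: Callable[[str, str], str],
) -> list[str]:
    """Prepend generated context to each chunk before it is embedded.

    `situate(document, chunk)` is assumed to return a short description of
    where the chunk sits in the document. In a real pipeline it would be a
    language model call.
    """
    return [f"{situate(document, chunk)}\n\n{chunk}" for chunk in chunks]


def fake_situate(document: str, chunk: str) -> str:
    """Placeholder for the model call: uses the document's first line as a title."""
    title = document.strip().splitlines()[0]
    return f"From the document titled '{title}'."


document = "Pricing guide\nSeats cost forty dollars.\nIt renews monthly."
chunks = ["Seats cost forty dollars.", "It renews monthly."]

for chunk in contextualize(document, chunks, fake_situate):
    print(repr(chunk))
```

The second chunk, "It renews monthly.", is the one to watch. On its own, "it" points at nothing. With the context attached, the chunk says what it is about.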

What Google is probably running

Here is where it lands for search. You cannot see which method Google uses, but the available evidence points to something at least partly layout-aware. Metehan Yesilyurt's reverse-engineering of Google's AI Mode suggests a blend of fixed-length and layout-aware chunking that respects heading hierarchy, with a cascading-heading option that carries your H2 and H3 context down into each chunk it produces. If that is right, your headings are not decoration. They are the boundary hints the chunker reads when it decides where one chunk ends and the next begins. So write content whose natural boundaries already match its topic boundaries, and it matters far less whether the engine runs fixed-size, recursive, or semantic chunking on the day it crawls you. Each of those methods is far more likely to produce a clean single-topic unit, because you already did the cutting.
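To picture what a cascading-heading option might do, here is a hypothetical sketch. It is not Google's implementation and makes no claim about how AI Mode works internally. It only shows the general idea of carrying parent headings down into each chunk, assuming the page has already been parsed into (level, heading, body) tuples.

```python
def cascade_headings(sections: list[tuple[int, str, str]]) -> list[str]:
    """Prefix each section's text with the chain of headings above it.

    `sections` holds (heading level, heading text, body text) in document
    order, for example level 2 for an H2 and level 3 for an H3.
    """
    path: list[tuple[int, str]] = []
    chunks = []
    for level, heading, body in sections:
        # Headings at the same or a deeper level no longer apply.
        while path and path[-1][0] >= level:
            path.pop()
        path.append((level, heading))
        breadcrumb = " > ".join(text for _, text in path)
        chunks.append(f"{breadcrumb}\n{body}")
    return chunks


# Hypothetical page outline.
sections = [
    (2, "Chunking methods", "Overview."),
    (3, "Fixed-size", "Cuts by count."),
    (3, "Semantic", "Cuts by meaning."),
    (2, "What to do", "Write one idea per section."),
]

for chunk in cascade_headings(sections):
    print(repr(chunk))
```

If an engine does something like this, a vague H2 such as "Overview" gets stamped onto every chunk beneath it, and so does a precise one.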

Despite what everyone says, content structure is not a trick

Sullivan's warning deserves a real answer, because the people quoting it have it backwards. His argument is that structure is a temporary hack and the systems will eventually reward "good writing" on its own. The research points the other way. As King documents, the frontier work on long context, from Infini-attention to memory-based architectures, all keeps passages as the atomic unit the model reasons over. Bigger context windows do not remove chunking. They make a clean, self-contained passage more valuable, because the model still has to decide which pieces to attend to and which to compress. Structure becomes more load-bearing as the systems get smarter, not less.

The real tell is that chunking and writing for humans were never opposites. A passage that states its point in the first sentence and stays on one idea is easier for a person to scan and easier for a machine to retrieve. The same edit serves both. When you make your strongest claims into clean, quotable passages, you are also feeding the off-page machinery that decides whether AI engines name you, and you are reducing the odds that a model stitches your words to the wrong context and says something about your brand you never said. None of that is crafting two versions of your content. It is writing one version that holds up when it gets torn into pieces.

How to chunk content so AI search can actually use it

Practically, this comes down to a few habits, and none of them require you to write worse. Group the information a buyer needs into sections that each answer one query. Put the answer in the first sentence under a heading, then support it, instead of building up to it. Keep one idea per paragraph so each chunk embeds as one clean topic. Phrase headings the way people phrase queries, because the heading travels with the chunk and sharpens its match. Make every passage self-contained, so it still makes sense when a retrieval system lifts it out of your content with no surrounding context. NVIDIA's testing on accurate AI responses found that page-level chunking delivered the highest average accuracy across a large document set, a sign that cuts made on a document's own boundaries can beat arbitrary ones. Most AI systems work in passages of roughly 100 to 300 words, so that is the scale to aim a section at. Aim each section at one question and you have done most of the content chunking work before an engine ever touches the page.
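If you want a quick way to check a draft against that 100 to 300 word range, a few lines of Python will do it. This is a rough sketch: the function name, the verdict labels, and the sample sections are all hypothetical, and word count is only a proxy for whether a section holds one idea.

```python
def audit_sections(
    sections: dict[str, str], low: int = 100, high: int = 300
) -> list[tuple[str, int, str]]:
    """Report each section's word count and whether it falls in the target range."""
    report = []
    for heading, body in sections.items():
        words = len(body.split())
        if words < low:
            verdict = "short"
        elif words > high:
            verdict = "long"
        else:
            verdict = "ok"
        report.append((heading, words, verdict))
    return report


# Hypothetical sections, with filler text standing in for real copy.
sections = {
    "How much does a seat cost?": "word " * 150,
    "Do you offer refunds?": "word " * 20,
    "Everything about our company": "word " * 400,
}

for heading, words, verdict in audit_sections(sections):
    print(f"{verdict:>5}  {words:>4}  {heading}")
```

A "long" verdict is the useful one. It usually marks the wall of text where an answer is buried.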

This is the work I do inside an enterprise SEO and AI search engagement, and it is unglamorous: read the page the way a chunker reads it, find the answers that are trapped in walls of text, and give each one a boundary and a heading. The full off-page side of this, the authority and corroboration that decides whether your clean chunks get trusted once they are retrieved, lives in our AI search survival manual. Chunking gets you into the candidate pool. Authority decides whether you win the pool.

The part that outlasts the next algorithm update

Every few months a new model arrives with a bigger context window and someone declares chunking dead. It never is, because the unit of retrieval has not changed. Whether the engine ranks pages, indexes passages, or feeds an agent, it still operates on chunks, not whole documents. The systems will keep changing. The fact that a focused passage on one topic beats a buried one will not. Write your content so the answer to every question your buyer asks stands on its own, and you have done the thing that survives the next update and the one after that. That is not optimizing for a machine. That is refusing to hide your best answer.

By Michael McDougald