How AI Overviews Decide Which Sources to Cite

A client forwarded me a screenshot last spring with one line of text: "Why are they citing this guy and not us?" The "this guy" was a competitor with a thinner page, fewer backlinks, and a domain a third the age of his. Yet there he was, linked on the right side of an AI Overview for a query my client had ranked first for in classic search for years. My client assumed it was a glitch or some kind of favoritism. It was neither. It was retrieval working exactly as designed, and his page was losing a contest he did not know he was entered in.

The mistake almost everyone makes is treating AI Overviews like a better search result, where the best page wins. That is not what is happening. Once you see the machine underneath the summary, the citation decisions stop looking random and start looking almost predictable.

Michael McDougald

How AI Overviews decide which sources to cite

AI Overviews decide which sources to cite by retrieval, not by which page is best. AI Overviews break the query into sub-queries and cite the sources whose passages match each sub-query. AI Overviews cite sources Google already ranks and already trusts. AI Overviews cite passages, not pages.

That last line is the whole game, so let me take it apart. The page does not get cited. A passage from the page gets cited, and the difference decides whether you show up.

The retrieval system underneath the summary

AI Overviews run on retrieval-augmented generation, which is a long name for a simple idea. The Gemini model behind AI Overviews does not answer from memory. It searches Google's index, grabs supporting passages, and generates a summary grounded in what it just pulled. Google calls that step grounding, and its own documentation confirms the system uses a query fan-out technique, issuing multiple related searches across subtopics before it writes a word.

The fan-out is bigger than most people assume. Google's retrieval research, filed as a query fan-out patent, describes using a large language model to generate synthetic queries and train a retrieval model to match many phrasings to the right document. In practice the system decomposes a single search into a fan of related sub-queries and retrieves passages for each one, and in AI Mode it may pull several chunks of context around each relevant passage. I wrote about that decomposition in depth in the piece on query fan-out, because it is the single mechanism that explains the most confusing citation behavior. Your page is not being scored against one query. It is being scored against eight, and you can win three and lose five.

This is why the unit of competition is the passage. iPullRank founder Mike King has argued for a while that chunking and passage-level relevance decide what gets retrieved, and his free Qforia tool is modeled directly on Google's fan-out patent. When I audit a page for AI visibility now, I no longer ask whether the page is good. I ask whether any single paragraph on it answers a specific sub-query better than the paragraph on the competing page. Those are different questions with different answers.
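To make that audit concrete, here is a minimal Python sketch of a passage-level scorecard. Everything in it is an assumption for illustration: the function names are hypothetical, and `token_overlap` is a crude word-matching stand-in for relevance, not how Google scores passages (the real system matches meaning, not strings). The point is only the shape of the comparison: one verdict per sub-query, decided by the best single passage on each side.

```python
from typing import Callable, Dict, List


def token_overlap(query: str, passage: str) -> float:
    """Toy relevance score: share of query words that appear in the passage.
    A stand-in only; swap in any scorer with the same signature."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def best_passage_score(
    query: str,
    passages: List[str],
    score: Callable[[str, str], float] = token_overlap,
) -> float:
    """Score of the single best passage for this sub-query (0.0 if none)."""
    return max((score(query, p) for p in passages), default=0.0)


def fan_out_scorecard(
    sub_queries: List[str],
    my_passages: List[str],
    rival_passages: List[str],
    score: Callable[[str, str], float] = token_overlap,
) -> Dict[str, str]:
    """Label each sub-query 'win', 'lose', or 'tie' at the passage level."""
    card = {}
    for q in sub_queries:
        mine = best_passage_score(q, my_passages, score)
        theirs = best_passage_score(q, rival_passages, score)
        card[q] = "win" if mine > theirs else "lose" if mine < theirs else "tie"
    return card


# Hypothetical example data.
sub_queries = ["what is query fan-out", "how long should a passage be"]
my_passages = ["Query fan-out is what happens when one search becomes many."]
rival_passages = ["A passage should be short. How long? Around one paragraph."]
card = fan_out_scorecard(sub_queries, my_passages, rival_passages)
```

A page can look strong overall and still lose most rows of a scorecard like this, which is the "win three and lose five" situation.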

Why ranking in the top 10 still decides most citations

Here is the part that surprises people who think AI search threw out the old rules. It mostly did not. Ahrefs analyzed 1.9 million AI Overview citations and found that 76 percent of cited URLs also rank in the classic top 10, with the median position of the top cited URL sitting at position 2. Surfer's study of over 400,000 AI Overview searches put the figure at 52 percent of sources coming from the top 10. The numbers drift depending on the dataset and the month, but the direction never changes. If Google does not already surface your content in classic search, the retrieval layer rarely finds it either, because grounding pulls from the same index.
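If you want to check that overlap on your own keyword set, the arithmetic is simple. The sketch below is an assumption about how such a figure can be computed, not the method Ahrefs or Surfer used, and the function name and sample URLs are hypothetical. It counts every cited URL and asks whether it also appears in the classic top 10 for the same query.

```python
from typing import Dict, List


def top10_overlap_rate(
    citations: Dict[str, List[str]],
    top10: Dict[str, List[str]],
) -> float:
    """Share of cited URLs that also rank in the classic top 10 for the same query."""
    cited = 0
    overlapping = 0
    for query, urls in citations.items():
        ranked = set(top10.get(query, []))
        for url in urls:
            cited += 1
            if url in ranked:
                overlapping += 1
    return overlapping / cited if cited else 0.0


# Hypothetical sample: two queries, four citations in total.
citations = {
    "q1": ["a.example/x", "b.example/y"],
    "q2": ["c.example/z", "a.example/x"],
}
top10 = {
    "q1": ["a.example/x", "d.example/w"],
    "q2": ["c.example/z"],
}
rate = top10_overlap_rate(citations, top10)
```

Different datasets and different months will give different rates, which is one reason the published figures do not agree with each other.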

It helps to know where AI Overviews show up at all. Ahrefs found that AI Overviews trigger on 21 percent of keywords across 146 million SERPs, and that 99.9 percent of those keywords are informational, with question queries pulling an Overview 57.9 percent of the time. If your money pages are transactional, you will see fewer Overviews to compete in. If your topic is informational, you will see more competition and more chances to be cited.

AI Overview Trigger Statistics

Keywords with AI Overviews: 21 percent
Informational keywords: 99.9 percent
Question queries with Overview: 57.9 percent

Source: Ahrefs

So the floor is real. You have to be in the index and ranking for the relevant queries to even enter the selection pool. The competitor was not getting cited in spite of his rankings. He was getting cited because of them: he ranked for the fan-out sub-queries my client had never targeted, which put his passages in the pool eight times while my client's page showed up once.

Why brand mentions outweigh your backlinks here

Once you are in the pool, the tiebreakers shift away from classic ranking signals toward something closer to reputation. Ahrefs' correlation study on AI Overview visibility found branded web mentions to be the single strongest correlating factor, ahead of branded anchors and most link metrics. A separate Ahrefs analysis found a 0.70 correlation between being mentioned on highly linked pages and AI Overview visibility, and their brand-factor work flagged YouTube mentions as the strongest individual correlate of all.

I do not read those correlations as proof that mentions cause citations. I read them as evidence that Google is using corroboration as a trust filter. When a brand shows up across many independent properties, the retrieval system has more reason to believe a passage from that brand is reliable enough to put its name next to Google's answer. Backlinks still matter for getting you into the index and the top 10. But at the citation layer, being talked about across the web does work that a clean backlink profile alone does not. That is a content and PR problem more than a technical one, which is why I push clients toward a content strategy that earns mentions instead of just chasing links.

The density-beats-length problem

The other thing the retrieval model rewards is focus, and this is where a lot of well-meaning content quietly disqualifies itself. DejanSEO's Dan Petrovic analyzed over 7,000 queries on Google's grounding behavior and found that the volume of content Google actually grounds on plateaus around 540 words. Pages over 2,000 words saw diminishing returns. His conclusion was blunt: adding more content dilutes your coverage percentage without increasing what gets selected. Density beats length.
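The dilution is easy to see with rough arithmetic. The sketch below treats the roughly 540-word plateau as a hard budget, which is a simplifying assumption of mine and not Petrovic's model; the constant and function name are hypothetical. Under that assumption, the share of a page that can be grounded on shrinks as the page grows.

```python
GROUNDING_BUDGET_WORDS = 540  # approximate plateau from the study, treated here as a hard cap


def coverage_share(page_words: int, budget: int = GROUNDING_BUDGET_WORDS) -> float:
    """Fraction of the page that fits inside the grounding budget."""
    if page_words <= 0:
        return 0.0
    return min(page_words, budget) / page_words


# Doubling the page length halves the share that can be selected.
shares = {words: coverage_share(words) for words in (540, 1080, 2160)}
```

At 540 words the whole page fits, at 1,080 words half of it does, and at 2,160 words only a quarter does. The amount selected stays flat while the page around it grows.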

This matches what Surfer found when it measured how often AI Overviews quote the exact keyword phrase from the query, which was only 5.4 percent of the time. The model is matching meaning, not strings. So stuffing a target phrase ten times does nothing, and padding a page with tangential sections actively hurts, because it lowers the share of the page that is on-topic for any single sub-query. I have watched this happen in real time on client pages. We add three "comprehensive" sections to a ranking article, traffic from AI surfaces dips, we cut them back, and the citations recover. The page got longer and less retrievable at the same time.

What actually gets you cited in AI Overviews

Put the pieces together and the citation logic is not mysterious. To get cited in AI Overviews you need to be indexed and ranking in the classic top 10 for the query and its fan-out variants, you need a recognizable brand that shows up across independent sources, and you need at least one tight, self-contained passage that answers a specific sub-query better than anyone else's passage. This is also where E-E-A-T stops being a slogan and starts being a filter. The brand recognition and authority signals Google leans on at the citation layer are the same expertise and trust signals its quality systems already score, which is why a recognized, authoritative source keeps getting cited while an anonymous one with a better page does not. Structured data plays a supporting role too. Google's John Mueller has confirmed structured data stays useful in the AI era because it helps the search systems that feed grounding understand your content, even if the language model itself reads mostly raw text.
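For the structured data piece, a minimal sketch of what that markup can look like is below, generated in Python so it stays consistent with the page it describes. It uses the schema.org Article type with only a headline and an author; the helper name is hypothetical, and which properties Google's systems actually rely on is not something this example claims to know.

```python
import json


def article_jsonld(headline: str, author_name: str) -> str:
    """Build a minimal schema.org Article description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
    }
    return json.dumps(data, indent=2)


markup = article_jsonld(
    "How AI Overviews Decide Which Sources to Cite",
    "Michael McDougald",
)
```

The resulting string would go inside a script tag of type application/ld+json in the page's HTML.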

None of this rewards the longest page or the cleverest writing. It rewards the most retrievable answer attached to a source Google already trusts. That is a humbler goal than "rank number one," and a more useful one, because it tells you what to actually do: cover the real sub-questions behind your topic, answer each one in a focused passage near a heading that names it, and earn the brand mentions that let Google trust you enough to say your name. The zero-click reality I described in the piece on how nobody scrolls anymore is not going to reverse. Getting cited is the new visibility, and for the full playbook on surviving that shift, the AI search survival manual walks through it end to end.

My client's fix was not a better page. It was eight focused passages targeting the sub-queries his competitor had quietly owned, on a domain Google already ranked. Six weeks later he was the one getting cited, and the competitor was the screenshot.

By Michael McDougald