Why LLM Citations Look Nothing Alike in Perplexity, ChatGPT, and Gemini

Last month I ran the same twelve buyer questions through Perplexity, ChatGPT, and Gemini for a client, then logged every source each one cited. I expected heavy overlap. What I got looked like three different internets. A page that anchored the Perplexity answer never showed up in ChatGPT. Gemini leaned hard on a source the other two ignored completely. Same questions, same afternoon, three almost unrelated lists of sources.

That is not a glitch, and it is not random. The engines disagree because they are built to disagree. If you treat LLM citations as one thing you can win once, you will keep losing in two places you never checked.


What LLM citations are, and why they rarely match

LLM citations are the sources an AI engine names or links when it answers your question. The catch is that they are not one list. Perplexity, ChatGPT, and Gemini each assemble their own citations from a different index under a different set of rules, so one question returns three different sets of sources instead of a single shared answer.

There is a second distinction that trips up almost everyone. A citation is the AI crediting a page it leaned on, usually with a link. A mention is your brand name showing up in the answer text, often with no link at all. You can have either, both, or neither, and which one you get changes from engine to engine. Citations and mentions are not interchangeable, and the gap between them is where most brands quietly lose.

How LLMs build a citation, from retrieval to selection

To see why the lists diverge, you have to see how a citation gets made. LLMs pull from two places: training data, the knowledge baked in before release, and live retrieval through RAG (retrieval-augmented generation), where the engine searches the web in the moment. Most major LLMs now lean on RAG for anything current, and citations almost always come from that retrieval path, because it is the only mode in which the model holds both the content it just pulled and the URL it came from. When an LLM answers from memory alone, it has no source to credit and shows no links.

Retrieval does not run your question as one search. The engine decomposes it into a spray of smaller sub-queries, a step the industry calls query fan-out. Google describes this directly in its patent on prompt-based query generation for diverse retrieval, which covers generating multiple synthetic queries to pull a diverse set of documents. Your page is not competing for the question the user typed. It is competing for the sub-queries the engine invented, which is the same machinery I walked through in my breakdown of how content chunking actually works in AI search.
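To make fan-out concrete, here is a toy Python sketch. It assumes a hand-written list of sub-queries and a tiny in-memory keyword index, both hypothetical; real engines generate sub-queries with a model and retrieve from web-scale indexes. Treat it only as an illustration of why a page has to match an invented sub-query, not the typed question, to enter the candidate pool.

```python
# Toy query fan-out: one question becomes several sub-queries, and a page
# enters the candidate pool only if it matches at least one of them.
# The index and the sub-queries are hypothetical examples.


def retrieve(sub_query: str, index: dict[str, set[str]]) -> set[str]:
    """Return URLs whose keyword set contains every term of the sub-query."""
    terms = set(sub_query.lower().split())
    return {url for url, keywords in index.items() if terms <= keywords}


def fan_out(sub_queries: list[str], index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Run every sub-query against the index and keep the results per sub-query."""
    return {q: retrieve(q, index) for q in sub_queries}


index = {
    "https://example.com/pricing": {"crm", "pricing", "plans"},
    "https://example.com/blog/crm-guide": {"crm", "small", "business", "guide"},
    "https://forum.example.org/thread/42": {"crm", "reviews", "small", "business"},
    "https://example.net/best-crm": {"best", "crm", "buy"},
}

# Hypothetical decomposition of "What CRM should a small business buy?"
sub_queries = ["crm pricing", "small business crm", "crm reviews"]

results = fan_out(sub_queries, index)
candidates = set().union(*results.values())
print(sorted(candidates))
```

The `best-crm` page shares the word "buy" with the typed question but matches none of the sub-queries, so it never becomes a candidate.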

Then comes selection, and selection is brutal. The engine retrieves far more than it cites. Ahrefs studied 1.4 million ChatGPT prompts and found the model cites only about half the pages it actually retrieves. Reddit was pulled constantly yet credited just 1.93 percent of the time. In that same dataset, pages with clean, readable URLs earned a citation 89.78 percent of the time they surfaced, against 81.11 percent for opaque ones, so even the shape of your slug moves the needle. A citation is the survivor of three separate filters: which sub-queries fired, which pages got retrieved, and which of those the model decided to name.

Citation Rates (Ahrefs, ChatGPT prompts)
Reddit: 1.93 percent
Clean URLs: 89.78 percent
Opaque URLs: 81.11 percent
Source: Ahrefs
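The three filters can be sketched as a simple funnel. This is a minimal Python illustration with made-up page traces; the `PageTrace` fields are hypothetical labels for the stages described above, and the cited-over-retrieved ratio is only the kind of measure behind Ahrefs' "about half" finding, not a reproduction of their method.

```python
# Funnel sketch of the three filters a page must survive before it is cited.
# PageTrace and its fields are hypothetical labels; the traces are made up.
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageTrace:
    url: str
    hit_by_subquery: bool  # filter 1: a sub-query the engine fired targeted this page
    retrieved: bool        # filter 2: the page came back from the index
    cited: bool            # filter 3: the model named the page in its answer


def drop_stage(trace: PageTrace) -> str:
    """Return the first filter the page failed, or 'cited' if it passed all three."""
    if not trace.hit_by_subquery:
        return "no matching sub-query"
    if not trace.retrieved:
        return "not retrieved"
    if not trace.cited:
        return "retrieved but not cited"
    return "cited"


def citation_rate(traces: list[PageTrace]) -> float:
    """Cited pages divided by retrieved pages; 0.0 if nothing was retrieved."""
    retrieved = [t for t in traces if t.retrieved]
    if not retrieved:
        return 0.0
    return sum(t.cited for t in retrieved) / len(retrieved)


traces = [
    PageTrace("https://a.example/guide", True, True, True),
    PageTrace("https://b.example/thread", True, True, False),
    PageTrace("https://c.example/pricing", True, False, False),
    PageTrace("https://d.example/old-post", False, False, False),
]

funnel = Counter(drop_stage(t) for t in traces)
rate = citation_rate(traces)
print(funnel, rate)
```

Each page lands in exactly one bucket, so the funnel shows where visibility was lost, not just whether it was.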

Why the three engines cite different sources

Here is the core of it. Those three filters are configured differently in every engine, so the output diverges by design.

Start with the index. The engines are not even reading the same web. Perplexity and Claude lean on Brave's index, while Google's models read Google's, a split Ann Smarty documents from independent testing and from OpenAI and Anthropic crawler sources. Different index, different candidate pool, different citations before any ranking even happens.

Then the fan-out depth. Passionfruit's research found a default ChatGPT model firing roughly one sub-query per prompt while a thinking model fired around eight. More sub-queries means a wider net and a different set of pages surfacing. The same data showed the thinking model swinging its citations toward pricing pages and homepages while the default model leaned on blog posts, so even two versions of one engine name different kinds of pages. Gemini, wired into Google Search and the Knowledge Graph, leans on the same quality signals Google already trusts, which is why its sources skew toward mainstream, heavily corroborated pages.

The numbers on divergence are louder than most people expect. An independent study of 118,000 responses found just 11 percent overlap in cited sources. Even inside Google's own house the overlap is thin: AI Overviews and AI Mode cite the same URLs only 13.7 percent of the time. A separate benchmark that ran 75 buyer-intent prompts across five engines measured cross-engine citation overlap and found barely any. When I lined up the data from my client's twelve prompts, that 11 percent figure felt generous. The takeaway is the one I keep repeating to clients: a source that owns Perplexity can be invisible in ChatGPT, and you will never see it unless you check all three.
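If you want to run the same comparison on your own prompts, a minimal Python sketch follows. It assumes overlap means Jaccard similarity (shared sources divided by all distinct sources) at the exact-URL level; the studies quoted above may define overlap differently, and the URLs are hypothetical.

```python
# Pairwise citation overlap across engines for one prompt, using Jaccard
# similarity on exact URLs. The cited URLs are hypothetical.
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared sources divided by all distinct sources in either set."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_overlap(citations: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of engines, keyed alphabetically."""
    return {
        (e1, e2): jaccard(citations[e1], citations[e2])
        for e1, e2 in combinations(sorted(citations), 2)
    }


prompt_citations = {
    "perplexity": {"reddit.com/r/crm", "vendor.com/pricing", "stats.example/report"},
    "chatgpt": {"forbes.com/best-crm", "vendor.com/pricing", "techradar.com/crm"},
    "gemini": {"vendor.com", "wikipedia.org/wiki/CRM"},
}

overlaps = pairwise_overlap(prompt_citations)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Here `vendor.com` and `vendor.com/pricing` count as different sources. Normalizing to domains before comparing would raise the numbers, so pick one level and stick with it across engines.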

The signals every engine rewards: freshness, authority, and content structure

Different filters still share a few instincts, and those instincts are where the edge is.

Freshness is one. AI engines carry a real recency bias, weighting recently published or updated content over older pages, especially on anything time sensitive. Authority is another. Most ChatGPT citations come from domains with a Domain Rating above 60, not because the model reads domain authority directly, but because high-authority pages rank in the search systems it retrieves from. Relevance is the third. Every engine scores how directly a passage answers the fanned-out query, so semantic relevance to the sub-query beats raw keyword matching. Then there is extractability. Passionfruit found heavily cited passages ran about 20.6 percent entity density, three to four times normal prose, and that nearly 44 percent of citations came from the first 30 percent of a page. Bury the information that answers the query under 600 words of throat-clearing and the chunk that matters never surfaces.
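A quick way to check where your answer sits on a page is to count the words before it. This Python sketch assumes you already know the exact answer sentence, uses whitespace-separated words as a rough proxy for length, and borrows the 30 percent window from the Passionfruit figure; no engine publishes that as a cutoff.

```python
# Where does the answering passage sit on the page? Counts whitespace-separated
# words as a rough length proxy. The 30 percent window is borrowed from the
# Passionfruit statistic, not a published engine cutoff.


def answer_position(page_text: str, answer: str) -> tuple[int, float]:
    """Return (words before the answer, share of the page's words that precede it)."""
    start = page_text.find(answer)
    if start == -1:
        raise ValueError("answer passage not found on the page")
    words_before = len(page_text[:start].split())
    total_words = len(page_text.split())
    return words_before, words_before / total_words


def in_early_window(page_text: str, answer: str, window: float = 0.30) -> bool:
    """True if the answer starts within the first `window` share of the page."""
    _, share = answer_position(page_text, answer)
    return share < window


answer = "Most small teams should start with a free CRM tier."
buried = "intro " * 600 + answer + " detail" * 400
upfront = answer + " detail" * 600

print(answer_position(buried, answer), in_early_window(buried, answer))
print(answer_position(upfront, answer), in_early_window(upfront, answer))
```

The buried version mirrors the 600-word throat-clearing case: the answer starts roughly 59 percent of the way down, well outside the early window.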

The catch is how each engine weights those instincts. ChatGPT favors established authority and a short list of publisher domains it treats as kingmakers, names like Forbes and TechRadar that Passionfruit found surfacing again and again. Perplexity leans harder on community sources and data-heavy pages, which is why Reddit threads and statistics do well there. Gemini wants the consensus pick that Google's ranking already endorses, and the ranking overlap bears that out: per SeoProfy, 76 percent of AI Overview citations come from pages ranking in Google's top 10. Same raw signals, three different recipes. Optimizing for one recipe and assuming it carries to the others is exactly how the 11 percent overlap happens to you.

Why getting cited is not the same as your brand getting mentioned

Even when you win a citation, the win can be hollow. Citations and mentions pay out differently: a source link in the footer does nothing for you if your brand name never appears in the answer the user actually reads. Kevin Indig calls these ghost citations, and his analysis across four engines found 61.7 percent of citations are ghosts, credited as a source with no name recognition in the text. Semrush's version of the study put the gap in sharper relief, with figures of 6.2 percent versus 99.3 percent in its analysis of brand mentions in AI search.

So the same page can be cited and mentioned in ChatGPT, cited but invisible in Gemini, and never retrieved at all in Perplexity. Ann Smarty's framework names the failure modes well. There is the invisible citation, where the engine reads your page and pulls the information it needs to write the answer but never links it, which is exactly what happens with all those uncited Reddit pulls. There is the reverse citation, where the model writes the answer from its training data first and only goes looking for supporting URLs afterward. In that second case the citation never shaped the answer at all. Your brand had to already live in the model's memory, a point that connects straight to how AI search makes citation decisions.
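To audit the cited-versus-mentioned split yourself, you can bucket each answer into the four outcomes (either, both, or neither) described earlier. This Python sketch assumes a hypothetical brand name and domain and uses a naive case-insensitive substring match for mentions. It only sees the final answer, so it cannot detect invisible or reverse citations, which happen before the answer is written.

```python
# Classify one AI answer by whether a brand is cited, mentioned, both, or neither.
# Brand name and domain are hypothetical; the mention check is a naive
# case-insensitive substring match, so "Acme" would also match "Acmeville".
from urllib.parse import urlparse


def is_brand_url(url: str, brand_domain: str) -> bool:
    """True if the URL's host is the brand domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == brand_domain or host.endswith("." + brand_domain)


def classify_visibility(answer_text: str, cited_urls: list[str],
                        brand: str, brand_domain: str) -> str:
    """Return one of four buckets for a single answer."""
    cited = any(is_brand_url(u, brand_domain) for u in cited_urls)
    mentioned = brand.lower() in answer_text.lower()
    if cited and mentioned:
        return "cited and mentioned"
    if cited:
        return "ghost citation"  # credited as a source, never named in the text
    if mentioned:
        return "mentioned, not cited"
    return "absent"


print(classify_visibility("Acme CRM is the top pick for small teams.",
                          ["https://www.acme.com/pricing"], "Acme", "acme.com"))
print(classify_visibility("Several CRMs fit small teams.",
                          ["https://blog.acme.com/crm-guide"], "Acme", "acme.com"))
```

Running this over every prompt and engine, then counting the buckets, gives you a ghost-citation share per engine instead of a single blended number.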

What this means if you want to show up in all three

The strategy follows from the mechanics. Stop optimizing your content for a single engine's citations and start engineering for the parts all three LLMs share.

Be the extractable answer. Put the direct response in the first paragraph under a heading that matches the question, in plain text a retrieval system can lift without guessing. That is the same passage-level discipline I covered in my piece on what it takes to rank in AI Overviews, and it travels across engines because every one of these LLMs retrieves chunks of content, not whole pages.

Build presence where each index is strong. That means high-authority owned content for Gemini and ChatGPT, and a real third-party footprint, Reddit, reviews, and earned coverage, for Perplexity. And get into the training data, because reverse citations only credit a brand the model already associates with the problem, and that association is built from your content and your name showing up across the web. None of this is a trick. It is the unglamorous work of being genuinely known, which is the through-line of the AI search survival manual and the work I do every day as an enterprise SEO consultant.

The mistake I see most is treating one engine's citation report as the scoreboard. It is one scoreboard out of at least three, and each one is scoring a different game. Win the mechanics they share, then check all three, because the one you skip is the one quietly recommending your competitor.

By Michael McDougald