How to Get Cited by AI Search and the Citation Graph Behind AI Citations

Last quarter I spent three weeks tracking one client's brand inside ChatGPT, Perplexity, and Google AI Overviews. Same questions their buyers actually type, run over and over, logged by hand. The brand barely showed up. What did show up was the same small set of outside sources feeding all three AI search engines. A review site here, an industry roundup there, one Reddit thread that would not die. The engines were not reading my client's site harder. They were reading the same handful of pages that already vouched for everyone else in the category.

That was the moment it clicked. Getting cited by AI is not a content problem you solve on your own pages. It is a graph problem. The sources these systems trust point at each other, and the ones inside that web of references get named while everyone outside it stays invisible. If you want to understand how to get cited by AI search, you have to look at the citation graph underneath it.

What the AI citation graph is and how to get your content cited by AI search

Getting cited by AI means becoming a source the citation graph already trusts. AI search engines like ChatGPT, Perplexity, and Google AI Overviews retrieve candidate passages, then decide which sources to name and which to cite by how many other trusted sources corroborate them. To get cited by AI, you earn citations and brand mentions from the sources these AI search engines already trust.

The citation graph is the network of sources that reference and corroborate each other across the open web. Wikipedia cites a news outlet, the news outlet cites a study, a review site cites both, and a forum thread argues about all three. AI search engines sit on top of that web and read it as a map of who is trustworthy on what. The mechanism that pulls those sources into an answer is retrieval-augmented generation, the 2020 method where a model grounds its answer in documents fetched from an external index instead of relying only on what it memorized in training. Retrieval decides what is eligible to be cited. The citation graph decides which of those eligible sources actually gets named.
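To make the two-stage split concrete, here is a minimal sketch in Python. It is a toy, not how any of these engines is actually built: the word-overlap retrieval, the `corroborations` count, and every domain in the index are assumptions I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str          # domain the passage came from (hypothetical)
    text: str            # the passage itself
    corroborations: int  # assumed count of trusted sources referencing this source


def retrieve(query, index, k=3):
    """Stage 1: keep the k passages that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for passage in index:
        overlap = len(query_words & set(passage.text.lower().split()))
        if overlap > 0:  # no shared words means the passage is not eligible at all
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


def choose_citations(candidates, n=2):
    """Stage 2: among eligible passages, name the most corroborated sources."""
    ranked = sorted(candidates, key=lambda p: p.corroborations, reverse=True)
    return [p.source for p in ranked[:n]]


index = [
    Passage("review-site.example", "best crm software for small teams compared", 40),
    Passage("brand.example", "our crm software is the best crm for small teams", 2),
    Passage("trade-pub.example", "crm software roundup for small teams", 25),
    Passage("cooking.example", "weeknight pasta recipes", 90),
]

candidates = retrieve("best crm software for small teams", index)
cited = choose_citations(candidates)
print(cited)
```

In this toy run the brand's own page is retrieved, so it is eligible, but the two named slots go to the review site and the trade publication because they carry more corroboration. The heavily corroborated cooking site never gets considered, because retrieval filtered it out first.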

The AI citation graph is Google's PageRank reborn

None of this is new. It is the oldest idea in search wearing a new coat. The original PageRank patent, filed in 1998 by Larry Page and assigned to Stanford University, described the web as a linked database where the rank of a document is calculated from the ranks of the documents citing it. A page trusted by trusted pages inherited authority. Strip out the hyperlink and you have exactly what AI search runs on today. A source corroborated by corroborated sources inherits the right to earn citations. The same authority that once flowed through backlinks now flows through references, and the brands with the most authoritative links pointing into the graph inherit the most citations.
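The inheritance idea is easier to see in code. Below is a minimal power-iteration sketch of the textbook PageRank calculation in Python. The graph is hypothetical and the 0.85 damping factor is the conventional default, not something taken from any AI engine. Nothing here claims AI search runs this exact algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of source -> cited sources."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node splits its current rank evenly among the sources it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that cites nobody spreads its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: who cites whom.
links = {
    "wikipedia": ["news"],
    "news": ["study"],
    "review": ["news", "study"],
    "forum": ["wikipedia", "news", "study"],
    "brand": ["review"],
}

ranks = pagerank(links)
for source, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{source:10s} {score:.3f}")
```

The brand in this graph cites a review site but nothing cites the brand, so it ends up at the baseline. The study that everyone references, directly or through the news outlet, ends up on top.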

The difference is the unit. PageRank counted links between pages. The AI citation graph counts mentions and references between sources, linked or not. The structure is identical, and so is the dynamic it produces. Authority concentrates. The top 20% of domains get 80% of AI citations. That is not a coincidence. It is what every citation graph does. Nodes with more references attract more references, and the gap between the cited and the uncited widens on its own. Mathematicians call it preferential attachment. SEOs have watched it happen to backlinks for twenty years.
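You can watch preferential attachment concentrate authority in a few lines of Python. This is a rough simulation under assumptions of my own: each new source cites exactly one existing source, chosen with probability proportional to the citations it already has plus one. It is not fitted to real citation data and will not reproduce the 80/20 split exactly.

```python
import random


def simulate_citations(num_sources, seed=0):
    """Grow a citation graph where already-cited sources are more likely to be cited."""
    rng = random.Random(seed)
    citations = [0]  # citations received by each source, indexed by arrival order
    urn = [0]        # each source appears once, plus once per citation it has received
    for new_source in range(1, num_sources):
        target = rng.choice(urn)  # chance is proportional to (citations + 1)
        citations[target] += 1
        citations.append(0)
        urn.append(target)        # the cited source gets easier to pick next time
        urn.append(new_source)    # the newcomer enters with a single ticket
    return citations


def top_share(counts, fraction=0.2):
    """Share of all citations held by the top `fraction` of sources."""
    ranked = sorted(counts, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


counts = simulate_citations(2000, seed=42)
print(f"Top 20% of sources hold {top_share(counts):.0%} of citations")
```

Nobody in the simulation is better than anybody else. The early, already-cited sources simply keep getting picked, and most of the citations end up with a minority of the sources.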

When I pulled my client's citation data and mapped which sources fed each engine, the overlap was the whole story. The same review platforms and trade publications showed up across ChatGPT, Perplexity, and Google AI Overviews, plus two or three Reddit threads I came to recognize by their URLs. The engines looked independent. The citation graph feeding them was shared.
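If you log citations by hand the way I did, the overlap check is a few lines of Python. This is a sketch under assumptions: the engine names, URLs, and domains below are placeholders, and your own log will need whatever cleanup your export format demands.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url):
    """Reduce a cited URL to its host, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def shared_sources(citations_by_engine):
    """Domains cited by every engine in the log."""
    domain_sets = [{domain(u) for u in urls} for urls in citations_by_engine.values()]
    return set.intersection(*domain_sets) if domain_sets else set()


def engine_coverage(citations_by_engine):
    """For each domain, how many engines cited it at least once."""
    coverage = Counter()
    for urls in citations_by_engine.values():
        coverage.update({domain(u) for u in urls})
    return coverage


# Hypothetical hand-logged citations for one buyer question.
log = {
    "chatgpt": [
        "https://www.reviews.example/crm",
        "https://encyclopedia.example/wiki/CRM",
        "https://forum.example/r/sales/thread-1",
    ],
    "perplexity": [
        "https://reviews.example/crm/top-10",
        "https://forum.example/r/sales/thread-1",
        "https://tradepub.example/roundup",
    ],
    "ai_overviews": [
        "https://reviews.example/crm",
        "https://forum.example/r/sales/thread-1",
        "https://brand.example/pricing",
    ],
}

shared = shared_sources(log)
coverage = engine_coverage(log)
print(sorted(shared))
print(coverage.most_common(3))
```

The domains in `shared` are the ones worth chasing first, since a mention there sits in front of every engine at once.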

How AI search engines decide what content to cite

Once retrieval pulls a set of candidate passages, the engine has to choose which ones to ground its answer in. It is selective by design, not generous. Frase's analysis of AI citation behavior found that large language models cite only two to seven domains per response, far fewer than the ten blue links Google used to hand out. Getting into that short list is the entire game.

Three things decide it. The passage has to be extractable on its own, which means it answers the question in plain language without needing the rest of the page for context. The passage has to be dense with verifiable facts, because a model trained to avoid making things up reaches for claims it can check. And the source has to carry authority in the graph, meaning other trusted sources already point at it. Frase found that 44.2% of citations come from the first 30% of text, so burying your answer in paragraph eight is the same as not writing it. On the Google side, Search Engine Land reported that 82.5% of citations go to interior pages. The engine wants the specific page that answers the specific question, not your front door.

AI Citation Trends

Citations from first 30% of page: 44.2 percent

AI Overview citations to deep pages: 82.5 percent

Source: searchengineland.com
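A quick way to audit your own pages against that first-30% figure is to check where the answer sentence starts. The sketch below measures position by character offset in the page's plain text. That is my assumption; I do not know how Frase measured it. The page text and the "Acme CRM" answer are made up.

```python
from typing import Optional


def answer_position(page_text: str, answer: str) -> Optional[float]:
    """Where the answer starts, as a fraction of the page's characters (0.0 = top)."""
    if not page_text:
        return None
    index = page_text.lower().find(answer.lower())
    if index == -1:
        return None  # the answer sentence is not on the page at all
    return index / len(page_text)


def in_opening_window(page_text: str, answer: str, window: float = 0.30) -> bool:
    """True if the answer starts inside the opening `window` of the page."""
    position = answer_position(page_text, answer)
    return position is not None and position < window


# Hypothetical page copy.
answer = "Acme CRM starts at 20 dollars per user."
filler = "Background on the category and its history. " * 40

answer_first_page = answer + " " + filler
buried_page = filler + answer

print(in_opening_window(answer_first_page, answer))
print(in_opening_window(buried_page, answer))
```

Run it over the plain text of each page you want cited, using the one sentence you would want an engine to quote. Pages that fail the check are the ones to restructure first.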

Structure decides whether your content survives that step. Headings that mirror the exact queries people ask, short answer-first paragraphs, and structured data like FAQ and Article schema all make a passage easier to pull and quote. This is plain on-page work. The same content structure that earned a page its ranking in traditional search now points at retrieval instead of the ten blue links.
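For the schema piece, here is a minimal Python sketch that builds FAQ markup as JSON-LD using the schema.org `FAQPage`, `Question`, and `Answer` types. The question and answer text are placeholders, and the helper name `faq_jsonld` is my own invention.

```python
import json


def faq_jsonld(qa_pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer_text},
            }
            for question, answer_text in qa_pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder content: use the exact questions your buyers ask.
markup = faq_jsonld([
    ("How much does Acme CRM cost?", "Acme CRM starts at 20 dollars per user."),
    ("Does Acme CRM integrate with email?", "Yes, it connects to common email providers."),
])
print(markup)
```

The output goes inside a `<script type="application/ld+json">` tag on the page. Keep the marked-up questions and answers identical to the visible text, so the schema describes what a reader actually sees.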

I have watched a single deep service page get cited for a query while the polished homepage that ranked above it in traditional search got skipped entirely. Ranking and being cited are now two different jobs.

Who gets named and who gets cited in AI search answers

There is a distinction most people miss, and it matters for how you measure this. Conductor draws the line cleanly in its explainer on mentions vs. citations: a mention is the answer naming your brand, and a citation is the answer pointing to your page as a source. You can be named without being cited, and cited without being prominently named. The citation graph governs both. A brand the graph corroborates gets named because the model has seen enough consistent description to say the name with confidence. A page the graph trusts gets cited because the model needs a source it can stand behind.
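When you log answers, it helps to record the two outcomes separately. Here is a small Python sketch of that bookkeeping. The brand name, domain, and sample answers are hypothetical, and the substring match on the brand name is deliberately naive.

```python
from urllib.parse import urlparse


def cites_domain(url, brand_domain):
    """True if the URL's host is the brand's domain or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return host == brand_domain or host.endswith("." + brand_domain)


def classify(answer_text, cited_urls, brand, brand_domain):
    """Label one logged answer by whether the brand was named, cited, both, or neither."""
    named = brand.lower() in answer_text.lower()
    cited = any(cites_domain(url, brand_domain) for url in cited_urls)
    if named and cited:
        return "named and cited"
    if named:
        return "named only"
    if cited:
        return "cited only"
    return "absent"


# Hypothetical logged answers for a brand called "Acme CRM" at acme.example.
named_only = classify(
    "Popular options include Acme CRM and two others.",
    ["https://reviews.example/crm"],
    "Acme CRM", "acme.example",
)
cited_only = classify(
    "Pricing usually starts per seat.",
    ["https://www.acme.example/pricing"],
    "Acme CRM", "acme.example",
)
print(named_only, "|", cited_only)
```

Tally the four labels per engine over time. A brand that is often "named only" has a reputation the graph recognizes but pages it does not trust as sources yet. "Cited only" is the reverse.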

How many names you are competing against depends on the engine. Search Engine Land's study of nearly 8,000 citations across the major engines found ChatGPT and Google AI Overviews name only three to four brands per answer, while Perplexity will list thirteen. The fewer names an engine hands out, the more brutally the citation graph sorts who belongs. On ChatGPT, you are fighting for one of four slots, and those slots go to whoever the graph already treats as a default. For broad consumer queries those slots go to the obvious market-leader brands, but Perplexity will surface niche brands a ChatGPT answer never names, so the specific engine your buyers use changes which brands get cited.

How ChatGPT, Perplexity, and Google AI Overviews choose their citations

The engines do not read the same part of the graph, which is why the same brand can be named everywhere on one and nowhere on another. ChatGPT leans on the encyclopedic core. Search Engine Land's data put Wikipedia at 27 percent of ChatGPT's citations, and Profound's larger sample shows Wikipedia as the most-cited domain across engines. Perplexity leans on freshness and community, pulling 46.5 percent of its citations from Reddit. Google AI Overviews lean on the same organic strength and community sources that already feed Google search.

On the surface that looks like three problems. Underneath it is one graph with three readers. Wikipedia, major news, large forums, and the same authoritative publishers feed all of them. An authoritative mention placed in a source the graph trusts does not lift you in one engine. It seeds the pool every engine drinks from. That is why I stopped optimizing for ChatGPT or Perplexity as separate channels and started asking a single question: is my client's brand a trusted node in the graph these systems share.

How to structure content and authority signals to get cited by AI search

Stop thinking about your page and start thinking about your position in the graph. The work that actually moves citations happens off your own domain. Frase found a 67% citation improvement for brands that appear across 5+ domains. That is corroboration doing exactly what PageRank predicted it would. Earn clean, consistent descriptions of your brand in the sources the graph already cites in your category, whether that is a review platform, a trade publication, or a well-moderated forum. This is the same off-site authority work behind the trust cascade for AI visibility, and it is why I treat digital PR as the engine of AI citation rather than a side project. I run this as enterprise SEO consulting because it lives in your off-site reputation, not your content management system.

This is also where traditional SEO still pays off. The authority signals that once earned backlinks and rankings (brand mentions, links from sites that matter, consistent entity data) are the same signals that tell AI search your brand is a node worth citing. The brands that already rank well tend to win the citation graph too, because the corroboration is already there. Original research is the strongest move of all, because every AI citation it earns is one more edge other sources can corroborate. Publish data nobody else has and you give other sources a reason to cite you, which is how a new node earns its first edges in the graph.

Then make your own pages worth pulling. The team behind the Generative Engine Optimization study tested this directly and found that adding citations, statistics, and direct quotations lifted visibility in AI answers by up to 40 percent. Put the answer in the first lines under a heading phrased the way people ask the question. Keep each section on one idea so retrieval can lift it cleanly, which is the same passage-level matching behind how embeddings predict which content gets surfaced. Use structured formatting and schema so the passage is machine-readable, give a direct answer to the specific queries your buyers ask, and cite authoritative sources inside your own content the way an authoritative page would. Add the statistic, name the source, make the claim checkable. Those structure signals are nothing new. They are the ranking signals search engines have always read, now deciding which citations an AI answer carries across the web. Keep your entity description consistent everywhere the graph can see it, because inconsistency tells the model it cannot say who you are with confidence. The grounding step that decides which trusted source actually gets used is the same one I broke down in how AI Overviews decide which sources to cite, and the full off-site playbook lives in our AI search survival manual.

Why the citation graph window is closing on new AI citations

A citation graph hardens with time, and that is the part that should make you move now. The longer an AI search engine treats a small set of sources as the default answer for a topic, the harder it gets for a newcomer to break in, because every new citation those incumbents earn raises the bar for everyone else. The brand that becomes a trusted node this year compounds an advantage the brand that waits will spend years clawing back. AI search does not reward the loudest publisher or the cleanest page. It rewards the most corroborated source. This is traditional SEO and digital PR doing the same job under a new name. The brand that invests in original research, authority, and a direct answer to every buyer question becomes the source AI search reaches for first. If you want to get cited by AI, stop polishing the page and its content in isolation and start earning your place in the graph that decides who gets named.

By Michael McDougald