The Google Algorithm Leak Changed Everything About How I Build SEO Strategy
    Algorithm Science and Technical SEO

    Michael McDougald
    April 1, 2025

    I remember exactly where I was when Erfan Azimi's leak of Google's Content Warehouse API documentation hit the SEO world. I was disappointed. Not because the leak wasn't valuable, but because I'd already been reading Google's patents for the past decade, and this "shocking revelation" about how Google's algorithm works just confirmed what I'd been telling clients all along. The real shock wasn't what was in the leak. It was that 99% of the SEO industry was shocked at all.

    This is what infuriates me about most of the SEO agencies out there. They don't have a strategy. They don't read patents. They don't study how search engines actually work. They just blog because someone told them blogging works. Maybe they throw some AI writer at a stack of People Also Ask questions and call it a day. When the leak dropped, I watched other consultants scramble to understand things I'd been explaining to clients for years. Things that were hiding in plain sight in USPTO filings that anyone with a search bar could have found.

    The Google algorithm leak changed how I communicate with clients, but it didn't change my strategy. It just proved that the approach I'd been using was right when everyone else was guessing.

    What Actually Leaked and Why It Matters

    In March 2024, Google's internal Content Warehouse API documentation was accidentally pushed to a public GitHub repository. An SEO practitioner named Erfan Azimi found it and shared it with Rand Fishkin at SparkToro, who published the initial findings in May 2024. Mike King at iPullRank followed with a deeply technical breakdown of what the roughly 2,500 pages of documentation actually contained. Google eventually confirmed that the documents were authentic, though it cautioned against making inaccurate assumptions about Search based on "out-of-context, outdated, or incomplete information."

    Here's what matters: this wasn't a leak of algorithm weights or ranking formulas. It was the ingredient list. Google's engineers don't tweak individual weights on thousands of signals and push them live. They've built a system with thousands of measurable attributes, and then they apply machine learning to figure out which ones matter for which types of queries in which regions. The leaked documents showed us what those attributes are. Over 14,000 of them.

    The leak included references to distinct signals that Google uses to evaluate content, sites, links, and user behavior. Not all 14,000 matter equally. Some may be deprecated or experimental. But seeing that many signals in Google's system tells you something critical: Google's algorithm is far more sophisticated and site-specific than any one-size-fits-all ranking factor list would suggest. It's a personalized, context-aware system that considers your entire site's history, your topical expertise, your content quality, your link profile, and how real users actually interact with your content.

    Most agencies, when they read about this leak, probably thought "okay, so there are more ranking factors than I thought." Then they went back to writing blog posts based on whatever SEO blog they were reading that week. They didn't understand what it actually meant for how you build authority in your niche or why one site's content ranks over another's.

    NavBoost Runs the Algorithm and Google Lied About It for Years

    The single most important revelation in the leak wasn't about links or keywords or meta tags. It was NavBoost. NavBoost is Google's click-based ranking system, and it's been the engine running Google's core ranking algorithm far longer than Google ever publicly admitted.

    The leaked documents revealed that NavBoost tracks multiple types of clicks. There are goodClicks, which are clicks that lead to longer engagement. There are badClicks, which are clicks Google interprets as the user finding the result irrelevant or unsatisfying. And there are lastLongestClicks, which appear to track the last result a user clicked and stayed on for a significant time, suggesting the search ended there. The documents also reference Chrome browser data, so at least part of this behavioral picture appears to come from Chrome. Google has a massive dataset of real user behavior, and that behavior is feeding directly into ranking signals.
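    To make those three click types concrete, here is a minimal Python sketch of how a click log could be tallied into goodClicks, badClicks, and lastLongestClicks counters. The leak names these attributes but not their formulas, so everything here is an assumption: the Click record, the tally_session function, and the 60-second and 10-second dwell thresholds are hypothetical, and a later click in the same session is treated as a sign that the user went back to the results.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds: the leak names these click types but not how they are computed.
LONG_CLICK_SECONDS = 60.0
SHORT_CLICK_SECONDS = 10.0


@dataclass
class Click:
    url: str
    dwell_seconds: float  # time on the page before the user came back or the session ended


def tally_session(session: List[Click], counts: Dict[str, Dict[str, int]]) -> None:
    """Update per-URL click counters from one search session (clicks in the order they happened)."""
    for i, click in enumerate(session):
        row = counts.setdefault(
            click.url, {"goodClicks": 0, "badClicks": 0, "lastLongestClicks": 0}
        )
        came_back = i < len(session) - 1  # assumption: a later click means the user returned to the results
        if came_back and click.dwell_seconds < SHORT_CLICK_SECONDS:
            row["badClicks"] += 1  # quick bounce back to the results page
        elif click.dwell_seconds >= LONG_CLICK_SECONDS:
            row["goodClicks"] += 1
    # Hypothetical rule: credit the last click in the session that cleared the long-click threshold.
    long_clicks = [c for c in session if c.dwell_seconds >= LONG_CLICK_SECONDS]
    if long_clicks:
        counts[long_clicks[-1].url]["lastLongestClicks"] += 1


counts: Dict[str, Dict[str, int]] = {}
tally_session([Click("a.example", 4.0), Click("b.example", 95.0)], counts)
tally_session([Click("b.example", 120.0)], counts)
print(counts)
```

    In this toy run, a.example picks up one badClick from the four-second bounce, while b.example collects two goodClicks and two lastLongestClicks because both sessions ended on it.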

    For years, Google representatives publicly stated that they don't use direct click data from browsers as ranking signals. Gary Illyes said "using clicks directly in rankings would be a mistake." John Mueller said "we don't use Chrome browsing data for ranking purposes." At best, they were leaning on a distinction so technical that it barely matters in practice.

    The leak showed that NavBoost is one of the most referenced systems in Google's entire ranking infrastructure. That lined up with sworn testimony from months earlier: during the DOJ antitrust trial, Google VP Pandu Nayak confirmed under oath that NavBoost has been using click data since approximately 2005 and that it's "one of Google's strongest ranking signals." For years before that testimony, Google's public-facing team told SEOs that clicks didn't matter.

    NavBoost is also geo-fenced. The documentation indicates that click data is segmented by country and by state or province, as well as by mobile versus desktop, so click behavior in Nashville might be weighted differently than click behavior in San Francisco. It's localized. It's personal. It's designed to show the most useful result to the specific person in the specific place searching right now.
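    Geo-fencing is easiest to picture as keeping separate tallies per location. This hypothetical sketch computes a good-click rate for each (url, region, device) slice; the record layout, the state-level region codes, and the function name are assumptions for illustration, not Google's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Slice = Tuple[str, str, str]  # hypothetical key: (url, region, device)


def good_click_rate_by_slice(records: Iterable[Tuple[str, str, str, bool]]) -> Dict[Slice, float]:
    """Share of clicks judged 'good', computed separately for each (url, region, device) slice."""
    good: Dict[Slice, int] = defaultdict(int)
    total: Dict[Slice, int] = defaultdict(int)
    for url, region, device, was_good in records:
        key = (url, region, device)
        total[key] += 1
        good[key] += int(was_good)
    return {key: good[key] / total[key] for key in total}


# Hypothetical click log: same page, two states, same device type.
logs = [
    ("example.com", "US-TN", "mobile", True),
    ("example.com", "US-TN", "mobile", True),
    ("example.com", "US-CA", "mobile", False),
    ("example.com", "US-CA", "mobile", True),
]
rates = good_click_rate_by_slice(logs)
print(rates)
```

    The same page ends up with a perfect rate in the Tennessee slice and a 50% rate in the California slice, which is the kind of split that would let one result rank differently in the two places.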

    What this means for your strategy is straightforward: your site needs to earn real engagement. Not manipulated engagement. Not artificial clicks. Real users clicking on your result and staying on your page because you actually answered their question. This is why brand building matters. This is why being mentioned in local media matters. This is why word-of-mouth matters. This is why direct navigation matters. Every single one of those behaviors trains NavBoost to rank you higher.

    Most agencies still don't understand this. They optimize for the algorithm as if it's a machine that rewards keyword density or backlink quantity. It's not. It's a system that measures whether real humans actually prefer your content over the alternatives.

    Google Has Been Scoring Your Entire Domain This Whole Time

    The leak revealed something else that should have been obvious from reading Google patents, but apparently wasn't to most of the industry: Google has a siteAuthority metric. Google has been scoring your entire domain as a unit, not just as a collection of individual pages.

    The documents reference NSR, which appears to stand for Normalized Site Rank. They reference chromeInTotal, which appears to be a measure of how much traffic a site gets across the entire Chrome ecosystem. They reference site-level quality scores that average across all your pages to create a domain-wide quality measurement. There's even a mechanism where, if Google hasn't computed a score for a specific chunk of your site, it applies the average of your other chunks' scores to that unknown section.
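    The fallback described above is simple arithmetic, and a short Python sketch shows why weak sections matter. It assumes chunk-level quality scores between 0 and 1, with None for chunks Google hasn't scored; the function names and the plain averaging used for the domain-wide number are hypothetical, since the leak does not spell out the actual formula.

```python
from statistics import mean
from typing import Dict, Optional


def fill_missing_chunk_scores(chunk_scores: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Give every unscored chunk (None) the average of the chunks that do have scores."""
    known = [s for s in chunk_scores.values() if s is not None]
    if not known:
        raise ValueError("no scored chunks to average")
    fallback = mean(known)
    return {chunk: (s if s is not None else fallback) for chunk, s in chunk_scores.items()}


def site_quality(chunk_scores: Dict[str, Optional[float]]) -> float:
    """Hypothetical domain-wide score: the plain average over every chunk after gaps are filled."""
    return mean(fill_missing_chunk_scores(chunk_scores).values())


scores = {"/services/": 0.75, "/tag-pages/": 0.25, "/new-section/": None}
print(fill_missing_chunk_scores(scores))  # the new section inherits 0.5
print(site_quality(scores))
```

    The unscored section inherits 0.5 instead of 0.75, so a block of thin tag pages doesn't just sit there: it lowers the score a brand-new section starts with.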

    For years, Google denied having a "sandbox" or a site authority score. They said "we treat every page on its merits." They said "domain authority isn't a Google metric." The leak proved that's not how it actually works. Google absolutely has site-level metrics, and those metrics affect how every page on your site gets evaluated.

    This connects directly to a patent on site-level quality scoring that measures whether your entire domain demonstrates coherent expertise. The patent describes a system that evaluates not just individual pages but the overall quality signal of your entire website. The leaked siteAuthority metric appears to be the implementation of exactly this concept.

    This is why your site architecture is a confession. Every page you publish, every topic you cover, every pivot you make toward different industries or different audiences, gets averaged into your domain's overall topical focus and authority score. A poor page on your site doesn't just waste server space. It drags down your entire domain's ability to rank for everything else.

    This validates what I've been telling clients for years: you can't just publish content randomly and hope it ranks. Your site structure, your editorial focus, and your strategic commitment to specific topics actually matter more than most SEO tactics. Google is measuring whether your entire domain demonstrates coherent expertise, not just whether individual pages have the right keywords.

    Topical Authority Went From Theory to Documented Fact

    Before the leak, topical authority was mostly theoretical. We knew Google rewarded deep, authoritative coverage of topics. We could see it in search results. But we couldn't prove it from Google's documentation because Google wasn't sharing that information publicly. The idea of topical authority became a marketing gimmick in the hands of agencies that oversimplified it, turning it into "write about one topic and you'll rank" without understanding the actual mechanisms underneath.

    The leak revealed siteFocusScore, siteRadius, siteEmbeddings, pageEmbeddings, and site2vec. These aren't marketing terms. These are actual technical systems within Google's algorithm. siteFocusScore measures how tightly focused your site is on a particular topic or set of topics. siteRadius appears to measure how far your site strays from its core topics. And site2vec is a system that creates a mathematical vector representing your entire site's topical identity, similar to how word2vec creates vectors for individual words.

    Every page on your site gets measured against that site-wide vector. If a page aligns with your site's topical identity, it reinforces your authority. If it strays too far from what Google has learned your site is about, it weakens your overall signal. This isn't because Google is trying to punish you. It's because Google has learned that sites with strong topical focus tend to create better, more authoritative, more useful content about their area of expertise than sites that jump around to whatever trending topic they think will get traffic.
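    One way to picture a site-wide vector and a page's distance from it is the toy Python sketch below. It is not site2vec: it simply averages hand-made two-dimensional page embeddings into a site vector, then uses cosine distance from that vector as a stand-in for siteRadius and one minus the mean distance as a stand-in for siteFocusScore. The embeddings, URLs, and both formulas are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def site_vector(pages: Dict[str, Vector]) -> Vector:
    """Hypothetical site embedding: the element-wise mean of all page embeddings."""
    dims = len(next(iter(pages.values())))
    return [sum(v[i] for v in pages.values()) / len(pages) for i in range(dims)]


def page_distances(pages: Dict[str, Vector]) -> Dict[str, float]:
    """Cosine distance (1 - similarity) of each page from the site vector."""
    centre = site_vector(pages)
    return {url: 1.0 - cosine(v, centre) for url, v in pages.items()}


def focus_and_radius(pages: Dict[str, Vector]) -> Tuple[float, float]:
    """Toy stand-ins: focus = 1 - mean distance, radius = distance of the farthest page."""
    d = page_distances(pages)
    return 1.0 - sum(d.values()) / len(d), max(d.values())


# Hand-made 2-D "embeddings": axis 0 ~ HVAC topics, axis 1 ~ entertainment topics.
pages = {
    "/hvac-repair/": [1.0, 0.0],
    "/furnace-installation/": [0.9, 0.1],
    "/celebrity-gossip/": [0.0, 1.0],
}
print(page_distances(pages))
print(focus_and_radius(pages))
without_gossip = {u: v for u, v in pages.items() if u != "/celebrity-gossip/"}
print(focus_and_radius(without_gossip))
```

    The off-topic gossip page sits farthest from the center and sets the radius. Removing it tightens the radius and raises the focus number, which is the silo argument in miniature.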

    This validates everything I've been building into client strategies for years: silo architecture actually matters. A site organized around coherent topic clusters ranks better than a site that's just a random collection of content. A site that stays focused on its area of expertise outranks a generalist competitor. A site that demonstrates that it knows its subject matter from multiple angles beats a site that just optimizes individual pages for keywords.

    What the Leak Reveals About Content That Actually Ranks

    The leak included signals around content quality that have direct implications for how you write. OriginalContentScore measures whether your content adds something new to the existing corpus of knowledge on the internet. This isn't about keyword density or readability metrics. It's about information gain, whether your content teaches readers something they couldn't have learned from the top 10 results already ranking for that query.

    Google also measures effortScore, which appears to be an estimate of how much effort and expertise went into creating content. This is where the algorithm moves beyond simple text analysis and into something closer to quality assessment. It's measuring signals that correlate with human expertise and research effort. A 500-word AI blog post that strings together information from five other blog posts scores low on effort. A deep dive where you synthesized original research, cited primary sources, and demonstrated firsthand experience scores high.

    The documents also reference freshness signals: bylineDate, syntacticDate, and semanticDate. These appear to capture the date stated on the page, a date pulled from the URL or title, and a date inferred from the content itself. In other words, Google isn't just taking your published date at face value; it's checking whether the content actually reflects when it was written or updated. This means that updates to old content matter. This means that keeping your expertise current and visible matters. This means that an SEO content strategy that survives algorithm updates requires continuous attention to content quality and freshness, not just a one-time publishing burst.

    The relationship between OriginalContentScore and information gain connects to something deeper. Google has a patent describing how search engines measure whether content adds something new to the existing corpus of indexed pages for a given topic. The leaked signals show that this patent isn't theoretical. It's implemented. It's running right now on your site and every competitor's site, measuring whether you're just rehashing what everyone else is saying or whether you're actually advancing the conversation in your field.
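    Information gain can be approximated crudely by asking what share of a page's phrasing appears nowhere in the pages already ranking. The Python sketch below uses word bigrams for that. It is a surface-level proxy chosen for illustration, not the method in Google's patent, and a real system would more plausibly compare meaning than raw word pairs. The novelty function and its bigram setting are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple


def shingles(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Lowercased word n-grams ("shingles") of a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def novelty(candidate: str, ranking_pages: Iterable[str], n: int = 2) -> float:
    """Hypothetical proxy: share of the candidate's n-grams found in none of the ranking pages."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    seen: Set[Tuple[str, ...]] = set()
    for page in ranking_pages:
        seen |= shingles(page, n)
    return len(cand - seen) / len(cand)


existing = ["Furnace repair costs vary by region and by the age of the unit."]
rehash = "Furnace repair costs vary by region."
original = "Our 2024 invoices show furnace repair costs doubled after the first freeze."
print(novelty(rehash, existing), round(novelty(original, existing), 3))
```

    A page that restates the top result scores 0.0, while a page built on the business's own invoice data scores above 0.8 because most of its word pairs are new to the result set.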

    Most agencies have absolutely no framework for creating content that scores high on these metrics. They write blog posts because they read somewhere that "SEO companies should write blog posts." They use AI because it's cheap. They don't think about originality or effort or information gain because they don't understand what those concepts mean in the context of Google's ranking algorithm.

    Links Still Matter But Not How Most Agencies Think

    The leak included documentation of Google's link systems, and it shows that links are far more sophisticated than "more backlinks equals higher rankings." Google maintains what appears to be a three-tier link index system. Different types of links are indexed in different quality tiers, and they're weighted differently based on signals Google collects about those links, including whether the linking page actually receives real traffic.

    The weight of a link isn't just about PageRank flowing through it. It's about the anchor text. It's about when the link was created. It's about whether the linking page receives real user engagement. It's about whether the link is contextually relevant to both the source and destination. It's about the spam history of the linking domain. Google runs link quality assessments that are far more granular than "do more outreach and get more backlinks."

    Fresh links from newly published pages are weighted more heavily than old links sitting on static pages. This is why links from recent news coverage matter more than links you purchased from a ten-year-old directory. This is why earning coverage when you do something newsworthy actually works. You're not just getting a link. You're getting a link from a fresh, high-engagement page, which signals to Google that something real is happening with your brand.
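    To show how these factors could compound, here is a toy Python scoring function. The tier multipliers, the one-year freshness half-life, the traffic penalty, and the multiplicative form are all invented for illustration; the leak documents that these kinds of signals exist, not how Google combines or weights them. Anchor text is folded into the relevance value here for simplicity.

```python
from dataclasses import dataclass

# Hypothetical multipliers for the three index tiers; the real values are not in the leak.
TIER_WEIGHT = {"high": 1.0, "mid": 0.5, "low": 0.1}
FRESHNESS_HALF_LIFE_DAYS = 365.0  # assumption: a link's freshness credit halves each year


@dataclass
class Link:
    tier: str                 # index tier of the linking page: "high", "mid", or "low"
    relevance: float          # 0..1 topical match of source, target, and anchor text
    source_has_traffic: bool  # does the linking page receive real visits?
    age_days: float           # days since the link appeared
    spam_score: float         # 0..1 spam history of the linking domain


def link_weight(link: Link) -> float:
    """Toy multiplicative score: every factor scales the others."""
    freshness = 0.5 ** (link.age_days / FRESHNESS_HALF_LIFE_DAYS)
    traffic = 1.0 if link.source_has_traffic else 0.2  # hypothetical penalty for unvisited pages
    return TIER_WEIGHT[link.tier] * link.relevance * traffic * freshness * (1.0 - link.spam_score)


news = Link(tier="high", relevance=0.9, source_has_traffic=True, age_days=30, spam_score=0.0)
directory = Link(tier="low", relevance=0.3, source_has_traffic=False, age_days=3650, spam_score=0.4)
print(f"{link_weight(news):.3f} {link_weight(directory):.2e}")
```

    The fresh, trafficked news link outscores the decade-old directory link by orders of magnitude, mostly because every factor multiplies. A real system could combine them very differently.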

    Link building strategies that survive algorithm updates work because they're focused on building real authority and real relationships, not on gaming link metrics. The leak confirmed that Google can tell the difference between a link you bought and a link you earned, between links that reflect genuine editorial endorsement and links that exist because someone paid for them.

    What I Actually Changed After Reading the Leak

    Here's what matters: the leak didn't change my core strategy. I was already doing everything the leak documented. I was already reading the patents. I was already building silo architecture because the math of topical vectors made sense to me years ago. I was already focusing on engagement because I understood that click signals matter. I was already auditing thin content because I knew that site-level quality metrics existed, even though Google wouldn't admit it.

    But the leak did give me clarity and confidence, and it changed how I talk to clients. I stopped saying "I think Google probably measures this." I started saying "Google definitely measures this, and here are the leaked documents that prove it."

    I doubled down on silo architecture. If anything, the leak made me more aggressive about organizing client sites into tight, focused topic clusters. siteFocusScore and site2vec are real, and they matter even more than I estimated. A site that strays from its core topics will always lose to a site that stays focused.

    I prioritized engagement metrics even more aggressively. NavBoost runs the algorithm. Direct navigation matters. Brand matters. Word-of-mouth matters. So I started pushing clients to think about brand building, not just search visibility. I started asking "how can we get more people to search for your brand name" instead of just "how can we rank for these commercial keywords."

    I started auditing thin content with ruthless honesty. If a page doesn't serve a strategic purpose in your site's topical architecture, and it doesn't demonstrate originality or genuine expertise, it's dragging your domain down. The March 2025 core update hit a lot of Nashville's local businesses hard, because too many of them had thin content that made it look like they didn't care about their own expertise. I stopped letting clients publish mediocre content just because they thought they needed a certain number of pages.

    Most importantly, I stopped caring what other agencies were doing. Because the leak made it clear that most agencies don't actually understand any of this. They're still blogging randomly. They're still buying backlinks from whoever emails them. They're still copying competitor content with different words. They're still treating the algorithm like a puzzle to game instead of a system that measures whether you're actually authoritative, whether you actually have expertise, and whether users actually prefer your content.

    The Google algorithm leak should have been a wake-up call for the entire industry. It should have forced every agency and every in-house SEO team to reconsider their strategies from the ground up. But it didn't. Most of the industry read about the leak, maybe wrote their own blog post about it, and went back to doing exactly what they were doing before. They have no framework for understanding what these signals mean. They have no strategy for improving site-level metrics. They have no process for building real topical authority.

    If you're working with an agency that doesn't understand silo architecture, site-level quality metrics, topical authority, and engagement signals, you're paying for a strategy that ignores what the algorithm actually measures. If you're a Nashville business watching competitors outrank you despite having worse content, worse expertise, and a worse reputation, the answer is probably that their site is structurally better than yours in ways that most agencies don't know how to fix.

    The Google algorithm leak didn't teach me anything fundamentally new. It just proved that the approach I'd been building from patent research and testing was correct all along. The question is whether the rest of the industry will ever catch up.

    Michael McDougald is the founder of Right Thing SEO, a Nashville-based SEO agency focused on algorithmic systems, site architecture, and topical authority. He's been reading Google patents since 2015 and has built silo architecture strategies for clients in healthcare, B2B SaaS, local services, and e-commerce.

    Michael McDougald

    Founder of Right Thing SEO, a math-driven SEO agency based in Nashville and Sarasota. Michael has spent 15+ years helping businesses achieve sustainable organic growth through data-driven strategies.