The Vector Math Behind Rankings and Why It Matters for Your Website

    Michael McDougald
    June 15, 2025

    How Google Turned Words Into Numbers

    Most SEO advice treats Google like a keyword counter. But Google stopped counting keywords years ago. It started measuring meaning, and the math it uses to measure meaning is vector math. Cosine similarity, dot products, embedding spaces. These are not academic abstractions. They are the scoring functions that decide whether your page ranks or disappears.

    I've spent the last decade watching the SEO industry lag behind the actual technology Google uses to rank sites. We keep talking about keywords, metadata, and links as if those are the primary ranking signals. They're not anymore. Those are filters. What actually determines your ranking position is how closely your page's semantic vector aligns with the user's query vector in Google's embedding space.

    The shift started with RankBrain in 2015, when Google introduced its first neural ranking function. RankBrain was a shallow model that converted unfamiliar queries into vectors and matched them against known query patterns. It was built for the roughly 15% of daily queries that Google had never seen before. But it was just the beginning.

    BERT arrived in 2019 and changed the entire game. Instead of processing words left to right, BERT reads in both directions simultaneously. It understands that "bank" means completely different things in "river bank" and "bank account" because it processes the full context window. MUM expanded that further into multimodal understanding, processing text, images, and video in the same embedding space.

    And now, with RankEmbed's use as a dual-encoder system confirmed in Pandu Nayak's DOJ testimony, Google is ranking pages based on their position in a high-dimensional vector space. RankEmbed uses two separate neural networks. One encodes the query. The other encodes the document. Both produce vectors in the same space, and the similarity between those vectors becomes the relevance score.
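    The dual-encoder idea is easier to see in code. What follows is a minimal sketch, not RankEmbed itself: the two "towers" are hypothetical stand-ins that hash words into buckets instead of running neural networks, and the names encode_query, encode_document, and relevance are my own. The shape is what matters: two encoders, one shared space, one similarity score.

```python
import math
import zlib

DIM = 16  # toy size (assumption); real embedding models use hundreds of dimensions


def _hashed_bag_of_words(text, dim=DIM):
    """Toy stand-in for a neural encoder: count tokens into hashed buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def _unit(vec):
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def encode_query(query):
    """Hypothetical query tower."""
    return _unit(_hashed_bag_of_words(query))


def encode_document(document):
    """Hypothetical document tower; a real system trains a separate network here."""
    return _unit(_hashed_bag_of_words(document))


def relevance(query, document):
    """Dot product of the two tower outputs, used as the relevance score."""
    q, d = encode_query(query), encode_document(document)
    return sum(a * b for a, b in zip(q, d))


# Identical text lands on the same point in the space, so the score is 1.0.
print(round(relevance("website speed", "website speed"), 3))
```

    A hashed bag of words only rewards shared tokens, so this toy cannot match synonyms the way a trained model does. It only illustrates the plumbing.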

    I wrote previously about how embeddings actually work under the hood, and the core insight still holds. These are not magical black boxes. They are mathematical transformations that convert language into geometry. Understanding that geometry gives you an unfair advantage over every competitor who still thinks SEO is about keyword density.

    Cosine Similarity Is the Score That Decides Relevance

    Here's the core insight that most SEOs miss: Google doesn't measure relevance by counting how many times a keyword appears on your page. It measures relevance by calculating the angle between two vectors in space.

    Think of it this way. Every piece of text, from a user's search query to your entire webpage, gets converted into a vector. A vector is just a list of numbers. Not two or three numbers. Typically 768 or 1536 numbers, depending on the embedding model. Each number represents some hidden semantic dimension that the AI learned during training.

    When you search for "how to optimize website speed," Google converts that query into a vector. Your page about page speed optimization becomes another vector. Google then measures the angle between these two vectors. The smaller the angle, the more aligned they are, and the higher your relevance score.

    This angle-based measurement is called cosine similarity. It's elegant because it measures direction, not magnitude. Your page could be long or short, could discuss speed in a hundred different ways, but if it points in the same semantic direction as the query, it scores high.
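    To make "direction, not magnitude" concrete, here is a small sketch with made-up three-dimensional vectors. The numbers are illustrative assumptions, not real embeddings. Two pages that point the same way as the query score the same regardless of how long their vectors are, and a page pointing at a right angle scores zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative vectors only (assumption): real embeddings have hundreds of dimensions.
query = [1.0, 2.0, 0.0]
short_page = [2.0, 4.0, 0.0]    # same direction as the query
long_page = [20.0, 40.0, 0.0]   # same direction, ten times the magnitude
off_topic = [0.0, 0.0, 5.0]     # perpendicular to the query

print(cosine_similarity(query, short_page))
print(cosine_similarity(query, long_page))
print(cosine_similarity(query, off_topic))
```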

    I found confirmation of this in the Chromium source code. Dan Petrovic's analysis surfaced the Embedding::ScoreWith method, which computes the dot product between two normalized vectors. Mathematically, the dot product of two unit vectors is identical to their cosine similarity. This is not some hypothetical scoring mechanism. This is code Google actually ships to score relevance in Chrome.

    The vectors themselves are normalized to unit length, which means Google explicitly threw away magnitude information and kept only direction. This is the smoking gun that proves vectors and semantic alignment drive ranking decisions.
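    The identity between a dot product on unit vectors and cosine similarity is easy to check yourself. This is a sketch with arbitrary example vectors and my own helper names. It is not the Chromium implementation.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale a vector to unit length, discarding its magnitude."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine computed the long way, dividing by both lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Arbitrary example vectors (assumption), with lengths 5 and 7.
query = [3.0, 4.0, 0.0]
page = [6.0, 2.0, 3.0]

unit_query = normalize(query)
unit_page = normalize(page)

# After normalization, a bare dot product gives the same number as cosine similarity.
print(dot(unit_query, unit_page))
print(cosine_similarity(query, page))
```

    Normalizing once at indexing time means every later comparison is a single dot product, which is why systems that store unit vectors can skip the division entirely.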

    What fascinates me about this discovery is that Google's open source patterns in Chromium mirror how they handle similar problems in production systems. When you see the same algorithmic approach show up in Gemini, in Gemma, in their publicly available embedding models, you're looking at the same engineering DNA that powers their ranking algorithm. The Chromium history_embeddings component implements a complete semantic search pipeline: extract text passages, convert them to vectors, store them in a vector database, and retrieve results by dot product similarity. That is the same architecture Google Search uses at scale.
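    The four pipeline steps described above (extract passages, embed them, store them, retrieve by dot product) fit in a few dozen lines. This is a rough sketch under heavy assumptions: the embed function is a toy hashed bag of words rather than a trained model, the "vector database" is a Python list, and PassageIndex and its methods are names I made up for illustration.

```python
import math
import zlib

DIM = 32  # toy dimensionality, chosen arbitrarily for this sketch


def embed(text):
    """Toy embedding (assumption): hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def split_into_passages(text, max_words=8):
    """Step 1: cut page text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


class PassageIndex:
    """Steps 2 to 4: embed passages, keep them in memory, search by dot product."""

    def __init__(self):
        self.entries = []  # list of (url, passage, unit vector)

    def add_page(self, url, text):
        for passage in split_into_passages(text):
            self.entries.append((url, passage, embed(passage)))

    def search(self, query, top_k=3):
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), url, passage)
            for url, passage, vec in self.entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


index = PassageIndex()
index.add_page("/speed", "compress images to speed up pages")
index.add_page("/bread", "bake sourdough bread at home with a starter")

results = index.search("compress images to speed up pages", top_k=1)
print(results[0][1])
```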

    The Leaked API Confirmed Vectors Are Production Ranking Factors

    In May 2024, researchers uncovered the Content Warehouse API leak. Most people focused on the document metadata. I focused on one field: siteEmbeddings.

    Google stores vector embeddings for every site and every page in their index. But here's what caught my attention: Google also tracks siteRadius, which measures how far a site's page embeddings deviate from the site's overall embedding vector.

    This matters. Google has built a vector identity for your entire domain. They calculated the centroid of all your pages' embeddings, and now they measure whether individual pages stay aligned with your site's topical center or drift away from it.
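    Here is one way to picture a centroid and a radius. The leak names the field but does not publish the formula, so everything below is an assumption: I use the mean of the unit page vectors as the site centroid and the average cosine distance from that centroid as the radius. The vectors are invented for the example.

```python
import math


def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def site_radius(page_vectors):
    """Assumed definition: mean cosine distance of pages from the site centroid."""
    pages = [unit(v) for v in page_vectors]
    center = unit(centroid(pages))
    distances = [1.0 - sum(a * b for a, b in zip(page, center)) for page in pages]
    return sum(distances) / len(distances)


# Invented vectors: a focused site clusters tightly, a scattered site does not.
focused_site = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.1, 0.05]]
scattered_site = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(site_radius(focused_site))
print(site_radius(scattered_site))
```

    Under this definition a radius near zero means every page points the same way as the site as a whole, and larger values mean the pages are spread across unrelated directions.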

    Pages that drift too far from your site's vector identity get penalized. Not because of some manual action or bad links. But because you violated the mathematical integrity of your topical consistency.

    This connects directly to how Google measures trust and expertise. An expert site has coherence. Its pages reinforce each other. They occupy a tightly clustered region of the embedding space. A weak, scattered site has pages all over the place, no topical nucleus, no gravitational center.

    Think about what this means practically. A law firm that publishes blog posts about personal injury, family law, immigration, and tax law is scattering its site embedding across multiple unrelated regions of semantic space. Its siteRadius is enormous. Contrast that with a firm that publishes exclusively about personal injury law, building deep content clusters around car accidents, slip and fall cases, medical malpractice, and wrongful death. That firm's site embedding is tight, focused, and authoritative in one region of the vector space.

    Now every time you add a new page, you're either reinforcing your site's vector identity or diluting it. This is why topical authority actually works. It works because it aligns your site's mathematical signature. The math doesn't care about your brand reputation. It measures semantic coherence.

    Keywords Still Matter Because Google Uses Both Systems

    Here's where most people get it wrong. They see vector math and think keywords are dead. They're not dead. They're just playing a supporting role.

    Google uses what's called hybrid retrieval. Two fundamentally different systems work together in sequence. First, BM25, a sparse keyword-matching algorithm built on term frequency and inverse document frequency, filters the entire index down to a candidate set. BM25 handles recall. It scans billions of documents quickly because it only looks at exact token matches, weighted by how rare each token is across the corpus.
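    For readers who have never seen BM25 written out, here is a compact sketch of the standard Okapi formula. The k1 and b values are common textbook defaults, not anything Google has published, and the IDF uses the non-negative variant with 1 added inside the logarithm. Tokenization is a bare whitespace split, which is an oversimplification.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (textbook defaults)."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # only exact token matches contribute
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Term frequency saturates, and long documents are penalized via b.
            length_norm = 1.0 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["fast page speed tips", "chocolate cake recipe", "page layout ideas"]
scores = bm25_scores("page speed", docs)
print(scores)
```

    The document with no matching tokens scores exactly zero. That is the recall gate: a page the sparse stage scores at zero never reaches the later stages.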

    Then, dense embeddings take over. Vector similarity re-ranks those candidates by actual semantic relevance. This is where the cosine similarity scoring happens. This is the precision layer. A query about "improving page load times" matches against pages about "website performance optimization" even if those exact words never appear, because the vectors point in the same direction.

    DeepRank, Google's internal name for BERT when deployed for ranking, only runs on the top 20 or 30 results. The computational cost of running a full transformer model on every candidate is too high, so Google reserves it for the final positions. Keywords get you into the running. Vectors determine where you finish.
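    The retrieve-then-rerank sequence can be sketched as two short stages. This is an illustration under stated assumptions, not Google's pipeline: token overlap stands in for BM25, the vectors are hand-written two-dimensional examples, and the function name and limits are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_then_rerank(query_tokens, query_vec, docs, candidate_limit=100, rerank_limit=3):
    """Stage 1 filters by keyword overlap; stage 2 reorders survivors by vector similarity."""
    # Stage 1 (recall): a crude stand-in for BM25. No shared token, no candidacy.
    candidates = [d for d in docs if query_tokens & d["tokens"]]
    candidates.sort(key=lambda d: len(query_tokens & d["tokens"]), reverse=True)
    candidates = candidates[:candidate_limit]

    # Stage 2 (precision): rank the candidate set by cosine similarity to the query.
    reranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return [d["id"] for d in reranked[:rerank_limit]]


# Hand-written example data (assumption); vectors are not real embeddings.
docs = [
    {"id": "speed-guide", "tokens": {"page", "speed", "optimization"}, "vector": [0.9, 0.1]},
    {"id": "stuffed", "tokens": {"page", "speed"}, "vector": [0.2, 0.9]},
    {"id": "no-keywords", "tokens": {"faster", "websites"}, "vector": [1.0, 0.0]},
]

ranking = retrieve_then_rerank({"page", "speed"}, [1.0, 0.0], docs)
print(ranking)
```

    In this toy data the "no-keywords" page has the best vector of the three and still never appears, because it was filtered out before the vector stage ran. The keyword-stuffed page gets in but finishes last.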

    Marc Najork's research at Google confirmed this hybrid fusion approach. Neither sparse retrieval alone nor dense retrieval alone produces optimal results. The combination leverages the speed and recall of keyword matching with the precision and understanding of neural embeddings. This is not a theoretical framework. This is how Google Search actually operates.
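    Fusion can also mean blending the two scores instead of running them strictly in sequence. The sketch below shows one common textbook approach, a weighted sum of min-max normalized scores. I am not claiming this is the formula in Najork's work or in Google Search. The alpha weight and the sample scores are assumptions chosen for illustration.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range so sparse and dense values are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(sparse_scores, dense_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight on the sparse side."""
    if len(sparse_scores) != len(dense_scores):
        raise ValueError("score lists must describe the same documents")
    sparse_norm = min_max(sparse_scores)
    dense_norm = min_max(dense_scores)
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(sparse_norm, dense_norm)]


# Made-up scores for three documents (assumption).
sparse = [12.0, 3.0, 0.0]   # doc 0 is keyword-heavy, doc 2 has no keyword match
dense = [0.2, 0.9, 0.95]    # doc 2 is semantically closest, doc 0 is furthest

fused = fuse(sparse, dense)
best = fused.index(max(fused))
print(best, fused)
```

    With equal weights, the document that is decent on both signals beats the one that is excellent on only one of them.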

    This means you need both. You need keyword presence so the sparse retrieval system notices you exist. You need semantic alignment so the dense ranking system places you high. The Nashville SEO Playbook approach accounts for this by treating keyword optimization and semantic clustering as interdependent, not competing strategies.

    The implication is that purely keyword-stuffed content still gets indexed, still might rank for some things, but it gets crushed in the dense ranking phase. Meanwhile, content with great semantic alignment but zero keyword presence never gets into the candidate set to begin with.

    What This Means for Your Website Right Now

    Understanding vector math is not academic. It changes what you should actually do with your site.

    First, topical consistency matters more than ever. siteRadius is real. Every page you publish either reinforces your site's topical center or pushes it off. You can't be a general interest site and rank for specialized topics. Google's vectors won't let you.

    Second, content structure should create clear semantic clusters. Don't scatter related concepts across ten different pages. Create content that shares vocabulary, references, and semantic space. Internal linking matters because it tells Google which pages belong in the same topical cluster.

    Third, schema markup becomes more important, not less. It helps machines understand entity relationships and topical connections. When you mark up authors, dates, organizations, and related concepts with structured data, you're giving Google explicit signals about semantic relationships that might be ambiguous in plain text. Research shows that properly tagged schema receives significantly more citations in AI-generated search results, which makes structured data a vector optimization play, not just a rich snippet tactic.

    Fourth, word count matters less than semantic completeness. You don't need 3,000 words if 1,500 words cover all the semantic territory you need. But you do need to cover the full semantic landscape of your topic. Surface-level treatment gets scored as shallow by the embedding models.

    Fifth, Google's embedding models keep improving. Google's Gemini Embedding 2 leads the MTEB benchmark for text representation. Every quarter, these models get better at measuring meaning. This means gaming the system with keyword density or manipulative linking gets harder every quarter. The models are getting better at understanding what you're actually saying.

    And here's what should concern you if you're relying on black-hat tactics: as these models improve, they'll catch more nuance. They'll understand context better. They'll measure actual expertise versus content that just looks good. If you're building real topical authority, this trend helps you. If you're trying to fake it, this trend destroys you.

    If you're serious about ranking in a vector-driven algorithm, you need someone who understands the math and the SEO together. Technical SEO expertise at this level means understanding embedding spaces, not just crawlability.

    The Vector Algorithm Is Not Going Away

    This is not a phase. This is the future of search, and it's already here. Google is not going back to keyword matching. They're going deeper into semantic understanding.

    The math is not going away. It's getting more precise. Models are improving every month. The embedding spaces are getting higher dimensional and more accurate. The ability to measure meaning in ways that rival human judgment is accelerating.

    Understanding vector math and vector similarity is not optional anymore for anyone serious about ranking. You can ignore it and hope. Or you can understand it and build pages designed for semantic relevance from the ground up.

    I built Right Thing SEO around this understanding because I watched too many smart marketers get left behind by the algorithm. They were brilliant at traditional SEO. They understood links, they understood user intent, they understood keyword strategy. But they didn't understand embeddings, they didn't understand vector spaces, they didn't understand why topical authority worked at a mathematical level.

    The competitive advantage now belongs to teams that can read the math. That can look at an embedding space and understand why their page ranks third instead of first. That can diagnose topical drift using vector analysis instead of guessing. That can build content strategies designed for semantic alignment from day one.
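    Diagnosing topical drift with vector analysis can start very simply. This sketch flags pages whose similarity to the site centroid falls below a cutoff. The 0.8 threshold, the URLs, and the three-dimensional vectors are all hypothetical. In practice you would feed in embeddings from whatever model you have access to and tune the cutoff against your own site.

```python
import math


def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def drifting_pages(page_vectors, threshold=0.8):
    """Return (url, similarity) for pages below the threshold, least aligned first."""
    if not page_vectors:
        return []
    units = {url: unit(vec) for url, vec in page_vectors.items()}
    # Site centroid: mean of the unit page vectors, rescaled to unit length.
    center = unit([sum(column) / len(units) for column in zip(*units.values())])
    report = [
        (url, sum(a * b for a, b in zip(vec, center)))
        for url, vec in units.items()
    ]
    report.sort(key=lambda row: row[1])
    return [(url, sim) for url, sim in report if sim < threshold]


# Hypothetical page embeddings for a personal injury firm with one stray post.
pages = {
    "/car-accidents": [0.9, 0.1, 0.0],
    "/slip-and-fall": [0.8, 0.2, 0.0],
    "/tax-law": [0.1, 0.1, 0.9],
}

flagged = drifting_pages(pages)
print([url for url, _ in flagged])
```

    Note that a single outlier also drags the centroid toward itself, so on small sites it is worth recomputing the centroid without the suspect page before drawing conclusions.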

    The pages that will rank in 2026 and beyond are the ones that understand this. Not as theory. As practice. As the actual mechanism driving rankings.

    Michael McDougald is the founder of Right Thing SEO. He specializes in technical SEO and vector-driven ranking systems. You can reach him at [email protected].

    Michael McDougald

    Founder of Right Thing SEO, a math-driven SEO agency based in Nashville and Sarasota. Michael has spent 15+ years helping businesses achieve sustainable organic growth through data-driven strategies.
