RGM-203 · SEO Mastery · Module 1 of 8

Technical SEO Foundations

The floor under which no amount of content or links can save you. A page that can’t be crawled can’t be indexed; a page that can’t be indexed can’t rank — and in 2026 the same floor decides whether AI assistants can quote you at all. Five gates, sixteen years of history, the field data, the tools, and the diagnostics we run on day one.

What you will learn (15 sections)

01 Why technical SEO is the floor, not the ceiling
02 A short history: how the technical floor was poured
03 Crawlability: robots.txt, sitemaps, internal linking
04 Indexability: canonicals, noindex, hreflang, parameters
05 Rendering: server-side, client-side, and the JS problem
06 The AI crawler era: a second audience for your floor
07 Core Web Vitals and page experience
08 HTTPS, mixed content, and security headers
09 Mobile-first indexing
10 Structured data and the rich result economy
11 Site architecture and URL design
12 Advanced playbook
13 The working toolbox: nine tools, nine jobs
14 Common mistakes
15 Operating checklist — score yourself

Why technical SEO is the floor, not the ceiling

Technical SEO doesn't rank pages by itself. Content, intent matching, authority, and user satisfaction do that. But broken technical SEO blocks every other lever: a page that can't be crawled can't be indexed, a page that can't be indexed can't rank, and a page that loads in 9 seconds can't compete in 2026 SERPs against pages that load in 1.5. Technical work is the floor under which no amount of content or links can save you.

The good news is that most technical issues are binary — either fixed or not. Unlike content quality or link earning, which are gradient and competitive, technical SEO has a clear endpoint per issue. The bad news: there are many more technical issues than most teams recognize, and they compound. A site with five medium technical problems can lose 30–60% of its potential organic traffic.

By the numbers The state of the technical floor, 2025-2026

Half the web still fails the basics — which is the opportunity

48%

of mobile origins pass all three Core Web Vitals. Desktop: 56%. Half the web fails.

62%

of mobile pages clear the LCP threshold — the choke-point metric of the three.

9×

longer for Google to crawl JS-dependent content than plain HTML in Onely’s research.

58%

click rate on rich results vs 41% for plain listings across 4.5M queries.

Sources: 2025 Web Almanac, Performance (CrUX) · Onely rendering-queue research · Milestone SERP study.

What breaks at each gate

The five-gate pipeline, failure by failure

Gate 1 · Can Google find the URL?

Google discovers URLs through links and XML sitemaps. What breaks here: orphan pages with zero internal links, sections buried five-plus clicks deep, and sitemaps full of redirects. A URL that is never discovered never enters the pipeline at all.

Gate 2 · May Google fetch it?

robots.txt is the bouncer. What breaks here: blocked CSS and JS (Google renders a broken page), staging rules shipped to production, and parameter traps that burn crawl budget on infinite filter combinations instead of your money pages.

Gate 3 · Does the content survive rendering?

JS-dependent content waits for a second rendering pass. What breaks here: headlines and links that only exist after JavaScript runs, rendering errors Google never reports loudly, and content gated behind user interaction — which Googlebot never performs.

Gate 4 · Does Google keep it?

Indexing is a decision, not a default. What breaks here: canonicals pointing at the wrong URL, accidental noindex, duplicates folded into a version you did not choose, and thin pages Google crawls but quietly declines to index.

Gate 5 · Now the competition starts

Only a discovered, crawled, rendered, indexed page gets to compete. Here content quality, intent match, links, and page experience decide the order. Technical SEO bought the ticket; it does not win the race.

A short history: how the technical floor was poured

Every rule in this module was forged by a specific moment in search history. Knowing the timeline matters for a practical reason: when you inherit a site, its technical debt is usually frozen at whichever era its last SEO worked in. Eight moments built the modern floor.

Timeline: eight moments that made technical SEO

1998’s crawler grew up in public

June 2010 · Caffeine · indexing goes continuous

Google rebuilt its index from batch updates to continuous incremental indexing — 50% fresher results. The era of waiting for the “Google dance” ended; crawl efficiency started mattering every single day.

August 2014 · HTTPS becomes a ranking signal

Google announced HTTPS as a lightweight ranking signal — under 1% of queries at launch — and the web listened anyway. Today 98.8% of mobile requests travel over HTTPS. A whisper from Google moved an entire industry.

April 2015 · Mobilegeddon

The mobile-friendly update boosted mobile-ready pages in mobile results. The name out-dramatized the impact, but the direction was set: the phone, not the desktop, would define how Google sees your site.

July 2018 · the Speed Update

Page speed became a mobile ranking factor for the slowest pages, and mobile-first indexing began rolling out the same year. Performance moved from a developer nicety to an acquisition-channel input.

May 2019 · evergreen Googlebot

Googlebot jumped from Chrome 41 to always-current Chromium. Modern JS features stopped breaking the crawler — but rendering still costs a second pass, a lesson teams keep relearning.

June 2021 · Page Experience + Core Web Vitals

LCP, FID, and CLS became ranking signals with public thresholds and public field data. For the first time, you could read your competitor’s real-user performance straight out of CrUX.

March 12, 2024 · INP replaces FID

First Input Delay measured only the first tap; Interaction to Next Paint grades every interaction on the page. Whole categories of “passing” sites woke up failing — JS-heavy commerce hardest of all.

2026 · the AI crawler era

ChatGPT-User now makes 3.6× more requests than Googlebot in observed datasets, and no major AI crawler renders JavaScript. The technical floor suddenly has a second audience — covered later in this module.

Sources: Google — HTTPS as a ranking signal · Google — the Speed Update · Search Engine Land — INP replaces FID · SEJ — ChatGPT crawl data.

Crawlability

Crawlability is whether search engine bots can find and access your pages. Three foundational artifacts:

robots.txt

The first file every crawler requests. Located at the domain root (example.com/robots.txt). It controls which crawlers can access which paths.

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /*?sort=
Allow: /

Sitemap: https://example.com/sitemap.xml

Important nuances: robots.txt blocks crawling, not indexing. If a blocked page is linked externally, Google may still index the bare URL without its content. To keep a page out of the index, use a noindex meta tag instead and leave the page crawlable: Google has to fetch the page to see the tag.

Common errors: blocking /wp-content/ (which hides CSS and JS from Googlebot, so it renders a broken page and ranks the site poorly), blocking the root of a section by accident, and leaving test or staging environments open to crawling.

XML sitemaps

Sitemaps tell crawlers which URLs to prioritize. Best practices:

Only include canonical, indexable URLs — not redirects, not noindex, not 4xx.
Keep individual sitemaps under 50,000 URLs and 50MB uncompressed.
Use sitemap index files for large sites with multiple sub-sitemaps.
Include <lastmod> dates that are accurate — Google increasingly uses them.
Submit through Google Search Console and Bing Webmaster Tools.
Separate sitemaps by content type (products, articles, categories, images, video) for granular monitoring of indexation rates.
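
The size limit and the lastmod advice above can be sketched in a few lines of Python. This is a minimal illustration, not a production generator: the helper names `chunk` and `build_sitemap`, the example.com URLs, and the sample date are all hypothetical.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit quoted in the list above


def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemap(entries):
    """Return sitemap XML (as a str) for a list of (url, lastmod_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical inventory: 120,000 canonical, indexable product URLs.
entries = [(f"https://example.com/p/{i}", date(2026, 1, 15)) for i in range(120_000)]
files = [build_sitemap(part) for part in chunk(entries, MAX_URLS_PER_FILE)]
```

Here `files` holds three sitemap documents, which a sitemap index file would then list. The sketch does not check the 50MB size limit or filter out redirects and noindex URLs; those checks belong upstream of it.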

Internal linking and crawl paths

The most important crawlability factor on large sites isn't robots.txt or the sitemap. It's internal linking depth. A page 5+ clicks from the homepage may get crawled infrequently or not at all on sites with limited crawl budget.

Build clear navigation paths to important pages.
Use breadcrumb links and contextual in-content links.
Audit orphan pages (no internal links pointing in) — they often go un-indexed.
Use hub-and-spoke architecture: pillar pages linking to subtopics linking back to pillars.
Avoid endless URL parameter combinations creating crawl traps.
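
Click depth and orphan pages are both properties of the internal link graph, so a breadth-first search finds them. The sketch below assumes you already have a page-to-links mapping (for example from a crawler export); the `links` data and the function name `click_depths` are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: page -> pages it links to.
links = {
    "/": ["/shoes/", "/blog/"],
    "/shoes/": ["/shoes/running/"],
    "/shoes/running/": ["/shoes/running/trail/"],
    "/blog/": [],
    "/old-landing/": ["/"],  # links out, but nothing links in
}

depths = click_depths(links, "/")
all_pages = set(links) | {t for targets in links.values() for t in targets}
orphans = sorted(all_pages - set(depths))            # never reached from the homepage
too_deep = sorted(p for p, d in depths.items() if d >= 5)
```

In this toy graph `/old-landing/` is an orphan and nothing sits five or more clicks deep. On a real site the same two lists are the starting point for new internal links.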

Calculator Your full-recrawl cycle — from two numbers you already have

How long does Googlebot take to see your whole site once?

Indexable URLs

Googlebot HTML fetches / day

% of fetches wasted (params, redirects, 404s)


Fetches/day comes from Search Console › Settings › Crawl stats, or your server logs. Thresholds are RGM analysis from client log audits. Google’s own guidance: most sites under a few thousand URLs never need to think about crawl budget (Gary Illyes, Google). Want this with freshness-coverage math and a shareable URL? Use the standalone Crawl Budget Calculator.
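
The arithmetic behind a full-recrawl estimate fits in one function. The formula below is an assumption, since the calculator's exact math is not shown here: indexable URLs divided by the fetches per day that are not wasted. The function name and the sample numbers are hypothetical.

```python
import math


def full_recrawl_days(indexable_urls, fetches_per_day, wasted_pct):
    """Estimate days for Googlebot to fetch every indexable URL once.

    Assumed formula: URLs / (daily fetches that are not wasted).
    """
    useful_per_day = fetches_per_day * (100 - wasted_pct) / 100
    if useful_per_day <= 0:
        raise ValueError("no useful fetches per day")
    return math.ceil(indexable_urls / useful_per_day)


# Hypothetical site: 40,000 URLs, 2,000 HTML fetches/day, 20% of them wasted.
days = full_recrawl_days(indexable_urls=40_000, fetches_per_day=2_000, wasted_pct=20)
```

With these inputs the estimate is 25 days. Real crawling is not uniform, so treat the result as a rough cycle length rather than a promise that every URL is seen in that window.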

RGM EXPERT TRICK

Diff the crawler against the logs — the gap is your dead weight

We never trust a crawl simulation on its own. Screaming Frog tells you what a crawler could reach; thirty days of server logs tell you what Googlebot actually fetched. The two lists never match.

Pages in the crawl but absent from the logs are invisible in practice — usually buried five clicks deep or orphaned. Pages in the logs but not the crawl are worse: parameter traps quietly eating your crawl budget.

Our rule on the accounts we inherit: any template Googlebot has not touched in thirty days gets new internal links or gets consolidated. We do not wait for rankings to confirm what the logs already said.

WHY IT’S RARE · Log access means asking DevOps, so most SEO teams audit the simulation and never once see the real crawl.
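
The crawl-versus-logs diff is two set subtractions once both lists exist. This sketch assumes access logs in the common combined format and a crawler export reduced to URL paths; the regex, the sample log lines, and the names `googlebot_paths`, `crawled`, and `fetched` are illustrative. It matches on the user-agent string only and does not verify that a hit really came from Google.

```python
import re

# Request path and the final quoted field (the user agent) of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} .*"(?P<agent>[^"]*)"$'
)


def googlebot_paths(log_lines):
    """Return the set of paths requested by a user agent containing 'Googlebot'."""
    paths = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
log_lines = [
    f'66.249.66.1 - - [10/Jan/2026:06:12:01 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2026:06:12:09 +0000] "GET /shoes/?sort=price&color=red HTTP/1.1" 200 5090 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Jan/2026:06:13:44 +0000] "GET /blog/ HTTP/1.1" 200 8812 "-" "Mozilla/5.0 (Macintosh)"',
]

crawled = {"/shoes/", "/blog/", "/about/"}  # what the crawl simulation reached
fetched = googlebot_paths(log_lines)        # what Googlebot actually requested

invisible_in_practice = crawled - fetched   # reachable, but Googlebot never came
crawl_budget_leaks = fetched - crawled      # fetched, but not in the site's link graph
```

Here `/blog/` and `/about/` land in the first bucket and the sorted, filtered parameter URL lands in the second. Thirty days of real logs replace the three sample lines.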

Indexability

Once crawled, can the page be indexed? The signals:

Canonical tags

Canonical tags (<link rel="canonical" href="...">) tell Google which URL is the preferred version when content is similar across multiple URLs. Critical for:

HTTPS migration (canonical to HTTPS version).
Faceted navigation (product list pages with filter parameters).
Print and mobile-only versions.
Syndicated content (canonical points to original).
Pagination (debated; current best practice is self-referencing canonicals on each paginated page).

Common errors: canonicalizing every page to the homepage (deindexes everything but the homepage), canonicalizing across language versions (use hreflang instead), canonical pointing to a 404 or noindex page.

Meta robots and X-Robots-Tag

Page-level indexing signals. Use noindex for thin or duplicate content you don't want indexed. Use nofollow sparingly — Google now treats it as a hint, not a directive.

X-Robots-Tag is the HTTP-header equivalent; use it for non-HTML resources (PDFs, images) you want noindexed.

hreflang

For multilingual or multi-regional sites. Tells Google which language/region version to serve to which user. Implementation options: HTML link tags, HTTP headers, or sitemap annotations. The most common implementation errors are missing return links between language versions and incorrect language codes.
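
Missing return links are easy to test for once the annotations are collected. A minimal sketch, assuming the hreflang data has been extracted into a dictionary of page URL to {language code: alternate URL}; the function name and the example.com URLs are hypothetical.

```python
def missing_return_links(hreflang):
    """Return (source, target) pairs where target does not link back to source."""
    missing = []
    for source, alternates in hreflang.items():
        for target in alternates.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            if source not in hreflang.get(target, {}).values():
                missing.append((source, target))
    return missing


EN = "https://example.com/en/"
DE = "https://example.com/de/"
FR = "https://example.com/fr/"

hreflang = {
    EN: {"en": EN, "de": DE, "fr": FR},
    DE: {"de": DE, "en": EN, "fr": FR},
    FR: {"fr": FR},  # French page lists only itself
}

problems = missing_return_links(hreflang)
```

In this example the French page is named by both other versions but names neither of them back, so `problems` contains two pairs. The sketch does not validate the language codes themselves.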

URL parameters and faceted navigation

The hardest indexability problem for ecommerce sites. Product listing pages with filter parameters (color, size, price range, sort order) generate near-infinite URL combinations. Approach:

Canonicalize filtered and sorted pages back to the unfiltered category URL.
Block parameter combinations from crawling via robots.txt where appropriate.
Use noindex on low-value filter combinations.
Signal your preferred URLs through canonicalization and internal linking; the URL Parameters tool in Google Search Console was retired in 2022 and is no longer an option.

Decision engine: will this page get indexed?

Three questions decide the verdict:

Does robots.txt block the URL from crawling?
Does the page carry a noindex (meta robots or X-Robots-Tag)?
Does the canonical point at a different URL?

If all three answers are no: fully indexable. No blockers. From here, indexing is Google’s quality call: thin or duplicate pages still get crawled and quietly declined.
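
The three checks can be written as a small decision function. This is a sketch of the logic as this module describes it, not Google's algorithm: the order of the checks reflects the point made earlier that a noindex tag is only seen if the page can be crawled, and the verdict strings are made up for illustration.

```python
def index_verdict(robots_blocked, has_noindex, canonical_elsewhere):
    """Map the three signals to a plain-language verdict."""
    if robots_blocked:
        # Google cannot fetch the page, so it never sees a noindex or canonical on it.
        return "not crawlable: may still be indexed as a bare URL if linked externally"
    if has_noindex:
        return "excluded: crawled, then dropped from the index"
    if canonical_elsewhere:
        return "likely folded into the canonical target"
    return "fully indexable: now it is Google's quality call"


verdicts = {
    "blocked + noindex": index_verdict(True, True, False),
    "noindex only": index_verdict(False, True, False),
    "canonical elsewhere": index_verdict(False, False, True),
    "clean": index_verdict(False, False, False),
}
```

The first case is the one teams get wrong most often: adding a robots.txt block on top of a noindex hides the noindex from the crawler.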

Rendering: the JS problem

The modern web increasingly relies on JavaScript to render content. Google can render JavaScript, but it does so in a second pass after the initial crawl, sometimes days or weeks later. JS-rendered content that depends on user interaction, or that errors in the rendering pipeline, may never make it into the index.

Rendering options

Server-side rendering (SSR). Server returns fully-rendered HTML. Best for SEO. Frameworks: Next.js (with getServerSideProps), Nuxt, Remix, Rails, Django, Laravel.
Static site generation (SSG). Pre-rendered HTML at build time. Excellent for SEO; trivial to serve. Frameworks: Next.js SSG, Hugo, Gatsby, Astro.
Client-side rendering (CSR). Empty HTML shell; JS renders content in browser. Worst for SEO — relies on Google's second-pass rendering.
Hybrid (ISR, partial SSR). Mix of strategies per page type. Modern frameworks support this well.
Dynamic rendering. Serve SSR to bots, CSR to users. Google has deprecated this recommendation but it's still in use; treat as legacy.

Debugging rendering issues

Use Google's URL Inspection tool in Search Console; it shows the rendered HTML.
Use the Rich Results Test for a second view of the same renderer. The Mobile-Friendly Test, long used for this, was retired in December 2023.
Compare raw HTML (view source) with rendered HTML (DevTools Elements panel); on big sites, do this once per template.
Check for JS errors in Google's rendering; URL Inspection in Search Console lists them.
Check that critical content (headlines, body copy, links) appears in the raw HTML, or at minimum in the rendered HTML without user interaction.
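
The last check in the list above can be automated against the raw HTML. A minimal sketch using only the standard library: it parses the unrendered markup, ignores script and style contents, and reports which required phrases and links are absent. The class and function names, the sample markup, and the required strings are hypothetical.

```python
from html.parser import HTMLParser


class RawContent(HTMLParser):
    """Collect visible text and link targets from unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = set()
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def missing_from_raw(raw_html, required_text, required_links):
    """Return the required phrases and hrefs that the raw HTML does not contain."""
    parser = RawContent()
    parser.feed(raw_html)
    body = " ".join(parser.text)
    return {
        "text": [t for t in required_text if t not in body],
        "links": [link for link in required_links if link not in parser.links],
    }


# Hypothetical client-side-rendered shell: the headline exists only inside a script.
csr_shell = '<html><body><div id="root"></div><script>render("Trail Running Shoes")</script></body></html>'
report = missing_from_raw(csr_shell, ["Trail Running Shoes"], ["/shoes/trail/"])
```

For this shell both the headline and the link are reported missing, which is what a non-rendering crawler would see. Fetching the page is left out on purpose; feed the function whatever your HTTP client returns before any JavaScript runs.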

The SEO verdict on each rendering strategy

Five ways to put HTML in front of Googlebot

Static site generation · verdict: gold standard

HTML is pre-built at deploy time, so every crawler gets complete content instantly with no render queue and no server cost per request. The fit: content sites, docs, marketing pages — anything that does not change per user. (Hugo, Astro, Next.js SSG.)

Server-side rendering · verdict: excellent

The server renders full HTML per request. Crawlers and users see the same complete page; you pay in server compute and TTFB discipline. The fit: personalized or frequently-changing pages that still need to rank. (Next.js, Nuxt, Rails, Django.)

Hybrid / incremental · verdict: the pragmatic default

Static where you can, server-rendered where you must, regenerated on a schedule (ISR). Per-template decisions are exactly how we run audits — the blog has no business being client-rendered just because the app is.

Dynamic rendering · verdict: legacy — migrate off

Serving rendered HTML to bots and JS to users. Google has deprecated the recommendation; it doubles your infrastructure and rots silently when the bot pipeline breaks. Treat any dynamic-rendering setup you inherit as technical debt with a deadline.

Client-side rendering · verdict: worst case for SEO

An empty shell plus JavaScript. Your content waits in the render queue — and in Onely’s research, JS-dependent content took up to 9× longer to crawl than HTML. Acceptable for logged-in apps; a self-inflicted wound for anything that needs organic traffic.

A lot of people are still looking at view source. That is not what we use for indexing. We use the rendered HTML.

Martin Splitt, Google Search Relations — JavaScript SEO Q&A, Search Engine Journal

RGM EXPERT TRICK

Run a render canary before you trust Google with your JavaScript

Before we let a client ship a JS-rendered template, we plant two canaries on one live page: a unique nonsense token in the static HTML, and a second token injected only by JavaScript.

Then we search both tokens in quotes every day. The gap between the HTML token getting indexed and the JS token getting indexed is your site’s real render lag — not the median Google quotes on a podcast.

If the JS canary takes more than a few days, that template ships server-side rendered. No debate, no framework loyalty.

WHY IT’S RARE · It measures your own render queue instead of arguing from Google’s averages — and almost nobody thinks to instrument it.
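
The canary bookkeeping is small enough to script. A sketch under stated assumptions: the token prefixes, the function names, the sample dates, and the three-day threshold are all illustrative (the text above says only "a few days"), and checking whether a token is indexed still means searching for it by hand.

```python
import secrets
from datetime import date

MAX_ACCEPTABLE_LAG_DAYS = 3  # assumed stand-in for "a few days"


def new_canary(prefix):
    """Return a unique nonsense token that will not collide with real words."""
    return f"{prefix}{secrets.token_hex(6)}"


def render_lag_days(html_indexed_on, js_indexed_on):
    """Days between the static token and the JS token appearing in search."""
    return (js_indexed_on - html_indexed_on).days


html_token = new_canary("rgmhtml")  # goes into the static HTML
js_token = new_canary("rgmjs")      # injected only by JavaScript

# Hypothetical observation dates from the daily quoted searches.
lag = render_lag_days(date(2026, 2, 3), date(2026, 2, 12))
ship_server_rendered = lag > MAX_ACCEPTABLE_LAG_DAYS
```

With these dates the lag is nine days, so the template would ship server-side rendered.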

The AI crawler era: a second audience for your floor

Everything you just learned about rendering now applies twice. When someone asks ChatGPT, Claude, or Perplexity a question, a retrieval crawler fetches your page in real time — and what it can read decides whether your brand appears in the answer. The volumes have already flipped: in a 24-million-request study across 69 sites in early 2026, OpenAI’s ChatGPT-User made 3.6× more requests than Googlebot.

Here is the part most teams miss: no major AI crawler renders JavaScript. Vercel’s analysis of over a billion requests found GPTBot and ClaudeBot fetch JS files but never execute them; only Google’s infrastructure (and so Gemini) renders. A client-side-rendered page is a blank page to the systems your buyers increasingly ask first. Server-rendered HTML, clean structured data, and a deliberate robots.txt policy for AI agents are now revenue decisions, not hygiene.

Field data Monthly crawler volume across Vercel’s network

The newcomers are already a fifth of Googlebot — and climbing

Googlebot

4.5B

GPTBot

569M

ClaudeBot

370M

3.6×

ChatGPT-User requests vs Googlebot across 24M requests, Jan–Mar 2026.

0

major AI crawlers that execute JavaScript (Google’s Gemini is the exception, via Googlebot infra).

57.7%

of ChatGPT’s fetches target HTML — it wants your markup, not your bundle.

OpenAI crawlers to manage: GPTBot (training) and ChatGPT-User (live retrieval). Different jobs, different robots.txt calls.

Sources: Vercel — the rise of the AI crawler · SEJ — ChatGPT now crawls 3.6× more than Googlebot.

Crawlability shapes everything that follows in AI search.

Aleyda Solís, founder of Orainti — Humans of Martech podcast, January 2026

The practical checklist for AI visibility: server-render anything you want quoted in answers; keep titles, prices, and key facts in raw HTML; decide per-bot robots.txt policy (training crawlers vs retrieval crawlers are separate decisions); and keep structured data complete — machine readers lean on it harder than Google does. RGM’s full treatment lives in the AI Search, AEO & GEO series.
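
A per-bot policy is worth testing before it ships. The sketch below uses Python's standard `urllib.robotparser` to confirm what a draft robots.txt allows each agent to fetch. The policy itself (block the training crawler, allow the retrieval crawler) is only an example of the two decisions being separate, not a recommendation, and the URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example draft policy: training crawler blocked, live-retrieval crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/pricing/"
policy = {
    bot: parser.can_fetch(bot, url)
    for bot in ("GPTBot", "ChatGPT-User", "Googlebot")
}
```

Here `policy` shows GPTBot blocked and the other two allowed. One caveat: this parser does simple prefix matching and does not understand wildcard rules such as `/*?sort=`, so use it for path-prefix policies and check wildcard rules with a tool that implements them.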

Core Web Vitals and page experience

Crawled, rendered, indexed — now your page has to feel good to use. Google's Core Web Vitals are three metrics measuring real-user page experience:

Metric	What it measures	Good	Needs work
LCP (Largest Contentful Paint)	How fast the main content loads	< 2.5s	2.5–4s
INP (Interaction to Next Paint)	How fast the page responds to interaction (replaced FID in March 2024)	< 200ms	200–500ms
CLS (Cumulative Layout Shift)	How much the layout jumps as it loads	< 0.1	0.1–0.25

Page Experience signals also include HTTPS, no intrusive interstitials, and mobile-friendliness. CWV is a real but modest ranking factor: it breaks ties more often than it determines top-3 placement. But it materially affects user experience, which feeds back to rankings through engagement signals.
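
The thresholds in the table translate directly into a grading function. A minimal sketch: the names `THRESHOLDS`, `grade`, and `passes_cwv` are hypothetical, and values exactly on a boundary are treated as the better band, which is an assumption about edge handling rather than something the table states.

```python
# (good upper bound, needs-work upper bound) per metric, from the table above.
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless
}


def grade(metric, p75):
    """Classify a 75th-percentile value as good, needs work, or poor."""
    good, needs_work = THRESHOLDS[metric]
    if p75 <= good:
        return "good"
    if p75 <= needs_work:
        return "needs work"
    return "poor"


def passes_cwv(p75_values):
    """A page passes only if all three metrics are good."""
    return all(grade(metric, value) == "good" for metric, value in p75_values.items())


template = {"LCP": 2.4, "INP": 350, "CLS": 0.05}
grades = {metric: grade(metric, value) for metric, value in template.items()}
```

This hypothetical template is good on LCP and CLS but only "needs work" on INP, so `passes_cwv(template)` is false.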

Improving LCP

Optimize images: WebP/AVIF formats, proper sizing, lazy-loading below the fold but eagerly loading the LCP element.
Preload the LCP image with <link rel="preload">.
Reduce render-blocking JS and CSS.
Use a CDN for static assets.
Server response time under 600ms; use caching, optimized DB queries, edge rendering.

Improving INP

Break up long-running JS tasks; use requestIdleCallback or scheduler.postTask.
Reduce JS bundle size; code-split.
Avoid synchronous third-party scripts.
Profile interactions with Chrome DevTools Performance panel; identify slow event handlers.

Improving CLS

Set explicit width/height on images and embeds.
Reserve space for ads, iframes, and dynamic content.
Avoid injecting content above existing content after page load.
Use font-display: swap with care to avoid flash of unstyled text shifts.

Field data Who actually passes — CrUX, July 2025

Pass rates by metric: LCP is the choke point

All 3 · desktop

56%

All 3 · mobile

48%

LCP · mobile

62%

INP · mobile

75%+

CLS · mobile

75%+

Source: 2025 Web Almanac, Performance chapter (CrUX field data). INP and CLS clear 75% globally; LCP is what most failing sites fail.

Benchmarks CWV pass rate by platform — your stack sets your starting line

Managed platforms win on defaults; WordPress wins on freedom (and pays for it)

Duda

84%

Shopify

75%

Wix

71%

WordPress

43%

Source: SEJ — 2025 Core Web Vitals challenge (CWV Technology Report). The takeaway is not “switch CMS” — it is that an unmanaged stack makes performance YOUR full-time job: theme, plugins, hosting, and images each need an owner.

Where does your LCP land?

Read your 75th-percentile LCP against the bands:

GOOD <2.5s

NEEDS WORK 2.5–4s

POOR >4s

Case study · Vodafone Italy · web.dev

-31% LCP load time · +8% sales · +15% lead-to-visit rate · +11% cart-to-visit rate

Vodafone Italy ran an A/B test where the only variable was performance: an optimized landing page against the original. Render-blocking JS was deferred, the hero image was preloaded and resized server-side, and critical CSS was inlined. The 31% LCP improvement produced 8% more sales — measured in revenue, not a lab score. The often-cited lesson: they tested performance like a feature, with a control group, so finance believed the number. (web.dev case study)

Two more receipts · web.dev case studies

0.25→0.09 The Economic Times CLS · -43% bounce rate · +7% redBus sales, via INP work

The Economic Times rebuilt ad slots and image containers with reserved space, cut CLS from 0.25 to 0.09, and watched bounce rate fall 43%. redBus attacked INP — long tasks, event handlers, input delay — and tied the responsiveness work to a 7% sales lift. The pattern across every published CWV case: the wins are measured in revenue and retention, not in scores. (web.dev — business impact of Core Web Vitals)

RGM EXPERT TRICK

Grade yourself the way Google grades you: p75, on a cheap phone

Core Web Vitals pass or fail at the 75th percentile of real Chrome users — not the average, and never your office wifi. A page that flies on a MacBook can fail its assessment on the mid-range Androids half your audience carries.

So we test on a throttled mid-tier device profile, and we watch CrUX field data per template, not per URL. A regression in one template hides inside a site-wide average for months.

And when LCP has to come down, we look for things to delete before things to optimize. The fastest hero video is the one you removed.

WHY IT’S RARE · Most teams chase a Lighthouse lab score on developer hardware — a number Google never once looks at.
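
A short calculation shows why the average hides a failing page. The sketch uses a nearest-rank percentile, which is a simplification (CrUX computes its p75 from its own aggregated field data), and the LCP samples are invented.

```python
import math
import statistics


def p75(samples):
    """Nearest-rank 75th percentile of a list of measurements."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


# Hypothetical LCP samples in seconds: fast laptops plus a tail of slow phones.
lcp_samples = [1.0, 1.1, 1.2, 1.3, 1.5, 2.9, 3.2, 3.6]

average = statistics.mean(lcp_samples)
percentile_75 = p75(lcp_samples)
```

The average of these samples is under 2 seconds, which looks like a pass, while the 75th percentile is 2.9 seconds, which is not. Run the same calculation per template rather than site-wide to catch regressions early.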

HTTPS, mixed content, and security headers

HTTPS: Non-negotiable since 2018. Chrome marks non-HTTPS as "Not Secure." Use HSTS to enforce HTTPS at the browser level.
Mixed content: HTTPS page loading HTTP resources is blocked or flagged. Audit and fix references.
Security headers: CSP, X-Content-Type-Options, X-Frame-Options, Referrer-Policy. Not direct ranking factors but contribute to user trust signals.
HTTP/2 or HTTP/3: Material speed improvements. Most CDNs support automatically.

How settled is this? 98.8% of mobile requests now travel over HTTPS (2025 Web Almanac, Security). HTTPS stopped being a differentiator years ago — what remains differentiating is getting HSTS, mixed-content cleanup, and security headers right on day one of every migration.
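
Both the header list and the mixed-content check can be run against a response you have already fetched. A minimal sketch, assuming the response headers are available as a dictionary and the HTML as a string; the function names and sample data are hypothetical, and the mixed-content check only looks at `src` attributes, so it will miss resources loaded from CSS or `link` tags.

```python
import re

EXPECTED_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "Content-Security-Policy",    # CSP
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return expected headers absent from the response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def insecure_sources(html):
    """Return http:// URLs referenced by src attributes on an HTTPS page."""
    return re.findall(r'\bsrc=["\'](http://[^"\']+)', html)


sample_headers = {
    "strict-transport-security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
}
sample_html = (
    '<img src="http://example.com/hero.jpg">'
    '<script src="https://example.com/app.js"></script>'
)

missing = missing_security_headers(sample_headers)
mixed = insecure_sources(sample_html)
```

For the sample response three headers are missing and one image is still loaded over HTTP.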

Mobile-first indexing

Google began moving sites to mobile-first indexing in 2018 and declared the rollout complete in October 2023 (Google Search Central blog). There is no “desktop index” anymore: the mobile version of your site IS your site. Implications:

Mobile site must have all content the desktop site has — same headlines, same body copy, same structured data, same internal links.
Responsive design (one HTML, CSS adapts) is preferred over separate mobile site (m.example.com).
Mobile UX matters — tap targets, font sizes, viewport settings.
Mobile rendering should be tested specifically; tools sometimes diverge between mobile and desktop rendering.

Structured data and the rich result economy

Schema.org structured data lets you mark up your content semantically, enabling rich results in SERPs — star ratings, FAQs, product prices, recipe images, events, job postings, breadcrumbs.

JSON-LD is the preferred implementation (vs Microdata or RDFa) per Google.
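Common useful schemas: Product, FAQPage, HowTo, Article, BreadcrumbList, LocalBusiness, Event, Recipe, Review, Organization. Note that in 2023 Google limited FAQ rich results to a small set of authoritative sites and retired HowTo rich results; the markup is still valid Schema.org, but it no longer earns those SERP features for most sites.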
Validate with Google's Rich Results Test and Schema.org validator.
Don't over-mark: Schema must match visible content. Schema spam can trigger manual action.
AI search prep: Structured data feeds LLM-powered search experiences (Google AI Overviews, Perplexity, Bing Copilot). Increasingly important for AEO/GEO.
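
Generating JSON-LD from the same data that renders the page is the simplest way to keep markup and visible content in agreement. A minimal sketch for a Product: the function name and the sample product are hypothetical, and a real implementation would include whatever properties the rich result you are targeting requires.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def product_jsonld(name, price, currency, url):
    """Return a JSON-LD script tag describing one product offer."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return SCRIPT_OPEN + payload + SCRIPT_CLOSE


tag = product_jsonld("Trail Runner", 129.0, "USD", "https://example.com/shoes/trail-runner/")
```

The resulting tag belongs in the server-rendered HTML, and the name and price it carries should be the same values shown on the page.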

By the numbers The rich-result economy, measured

Half the web is annotated — the half that gets the clicks

51%

of pages now carry structured data of some kind (2024 Web Almanac).

41%

of pages use JSON-LD — up from 34% in 2022; Google’s preferred format won.

87%

CTR on the best-performing rich result types in Milestone’s 4.5M-query study.

+35%

higher CTR for results with review stars vs plain links (SEJ, 2023 study).

Sources: 2024 Web Almanac — structured data · Milestone SERP study · Search Engine Journal.

Site architecture and URL design

URL structure

Short, descriptive, keyword-relevant, lowercase, hyphenated.
Avoid auto-generated parameters when possible; rewrite to clean URLs.
Avoid changing URL structures unless necessary; if changing, use 301 redirects mapped 1:1.
Consistent trailing slash policy (with or without — pick one).
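
The URL rules above can be enforced with a normalizer that runs wherever URLs are generated. A sketch under stated assumptions: the function name is hypothetical, the rule that paths with a dot in the last segment are files is a heuristic, and lowercasing paths is only safe if the server treats the two cases as the same resource or redirects one to the other.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize(url, trailing_slash=True):
    """Lowercase, hyphenate, and apply one trailing-slash policy to a URL."""
    parts = urlsplit(url)
    path = parts.path.lower()
    path = re.sub(r"[_\s]+", "-", path)   # underscores and spaces become hyphens
    path = re.sub(r"/{2,}", "/", path)    # collapse accidental double slashes
    last_segment = path.rsplit("/", 1)[-1]
    if trailing_slash:
        if not path.endswith("/") and "." not in last_segment:
            path += "/"                    # leave file-like paths such as /sitemap.xml alone
    elif len(path) > 1:
        path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment))


# A 1:1 redirect map falls out of normalizing the old URLs.
old_urls = ["https://Example.com/Mens_Shoes/Trail", "https://example.com/sitemap.xml"]
redirects = {old: normalize(old) for old in old_urls if normalize(old) != old}
```

Only the first URL changes, so `redirects` has a single entry mapping it to its clean form. Each entry would become one 301 rule.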

Information architecture

Group content by topic, not by date or publication batch.
Limit directory depth to 3–4 levels for most content.
Build hub pages (pillar pages, category pages) that link to subtopic content.
Internal anchor text should be descriptive, not generic ("click here").

Architecture math Crawl attention decays with every click of depth

Relative Googlebot visit frequency by click depth — RGM analysis from client log audits

Homepage (depth 0)

daily

1–2 clicks deep

most weeks

3–4 clicks deep

monthly-ish

5+ clicks deep

rarely / never

Labeled RGM analysis: directional pattern we see consistently in client server logs, not a published universal constant. The lever: every internal link from a high-crawl page is a transfusion of crawl attention to a starved one.

Advanced playbook

Log file analysis. Server logs show exactly which pages Googlebot crawls and how often. Tools: Botify, OnCrawl, Screaming Frog Log File Analyser, JetOctopus. The single best diagnostic for crawl budget issues.
Crawl budget management. For sites with 100k+ URLs, crawl budget matters. Reduce crawl traps (parameter combinations), consolidate thin content, fix slow server response times.
JavaScript rendering audit. Compare initial HTML, rendered HTML, and what Googlebot indexes. Identify content gaps.
Critical rendering path optimization. Inline above-the-fold CSS, defer non-critical JS, preload key assets, and use 103 Early Hints or HTTP/3 priorities. HTTP/2 server push is no longer an option: Chrome removed support for it in 2022.
Edge SEO. Use edge workers (Cloudflare Workers, Vercel Edge Functions) to implement redirects, header rewrites, dynamic structured data without backend deploys.
Search Console API automation. Pull performance data, indexing status, Core Web Vitals, and structured data errors into your data warehouse for monitoring at scale.
PageSpeed Insights API monitoring. Track CWV trends per template, not just per URL. Identify template-level regressions before they affect rankings.
International SEO architecture. Subdirectory (example.com/de/) vs subdomain (de.example.com) vs ccTLD (example.de). Each has trade-offs; subdirectory wins for most.
Faceted navigation policy. Document which facets are indexable, which canonical to category, which are blocked. This is a multi-month project for large ecommerce sites.
Sitemap monitoring as KPI. Track indexation rate (URLs actually indexed / URLs submitted in the sitemap) over time. Sub-80% indicates content quality, internal linking, or crawl budget issues.

Step by step The RGM 90-minute technical triage

What we run on day one of every engagement — before any tool subscription

Fetch /robots.txt by hand. Look for blocked CSS/JS paths, staging rules that shipped to production, and whether the sitemap line exists. Five minutes; finds something embarrassing on most sites we inherit.
Run a site: search on the domain. Compare the indexed count against the pages you actually have. Spot staging subdomains, parameter junk, and duplicate paths competing with money pages.
Open the XML sitemap and click ten URLs. Every one should return 200, canonical to itself, and be indexable. A sitemap full of redirects and noindex pages tells Google your quality signals are noise.
URL-Inspect your top five money pages. In Search Console: indexed? Google-selected canonical = your declared canonical? Does the rendered HTML contain the actual content?
Run PageSpeed Insights on one URL per template. Read the CrUX field section first and ignore the lab score. Field p75 is what Google grades; note which template fails which metric.
Diff raw HTML against rendered HTML on one JS template. If headlines, body copy, or links exist only after JavaScript runs, flag that template for server-side rendering and plant a render canary.
Read the indexation rate in the Pages report. Indexed divided by submitted. Below ~80% is not a technical bug; it is Google declining your content. Route it to the content roadmap.
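
The indexation-rate step is a single division, but it is more useful computed per sitemap than site-wide. A minimal sketch: the counts are invented, the function name is hypothetical, and the 0.80 threshold is the rough figure quoted in the step above.

```python
THRESHOLD = 0.80  # the "~80%" line from the triage step


def indexation_rate(indexed, submitted):
    """Indexed URLs divided by submitted URLs."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


# Hypothetical counts per sitemap: (indexed, submitted).
sitemaps = {
    "products": (8_200, 12_000),
    "articles": (940, 1_000),
}

rates = {name: indexation_rate(*counts) for name, counts in sitemaps.items()}
needs_review = sorted(name for name, rate in rates.items() if rate < THRESHOLD)
```

In this example articles index at 94% and products at about 68%, so only the product sitemap is flagged for the content roadmap.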

The working toolbox: nine tools, nine jobs

Every tool below earns its slot by answering a question no other tool answers as well. Learn the question each one exists for, and you will never be the auditor who runs everything and concludes nothing.

Curated The RGM technical stack — by the question it answers

Nine questions, nine answers

What CAN be crawled?

Screaming Frog

The desktop crawler standard. Simulates a crawl, surfaces broken links, redirect chains, canonical conflicts, and orphan candidates. First tool open in every audit.

What does Google SAY it did?

Google Search Console

The only source of truth for index status, Google-selected canonicals, CWV field assessments, and crawl stats. Free, primary, non-negotiable.

What do real users FEEL?

PageSpeed Insights + CrUX

Field data at p75 — the exact numbers Google grades. Read the field section first; treat the lab score as a debugging hint, never a KPI.

What did Googlebot ACTUALLY fetch?

Botify / OnCrawl / JetOctopus

Log analyzers turn raw server logs into crawl-budget truth at enterprise scale. The only way to see waste, frequency, and the pages Google ignores.

WHY is it slow?

Chrome DevTools + Lighthouse

Performance traces, long-task hunting, LCP element identification, INP interaction profiling. Where diagnosis happens after field data says “sick”.

Will it earn rich results?

Rich Results Test + Schema validator

Validates eligibility (Google’s test) and correctness (Schema.org validator). Run both — they catch different failures.

Is it visual + auditable?

Sitebulb

Crawl visualizations and prioritized hints that make architecture problems legible to non-SEO stakeholders. The audit deck writes itself.

Is it regressing over time?

DebugBear / WebPageTest

Synthetic monitoring per template with alerting. Catches the slow creep — the third-party tag that added 400ms — before CrUX makes it official.

What changed in the HTML?

URL Inspection + view-rendered-source

Compare raw HTML to rendered DOM to indexed version. The three-way diff that settles every “is JavaScript hiding our content?” argument.

Common mistakes

Blocking CSS/JS in robots.txt — Googlebot can't render the page properly and rankings suffer.
Canonicalizing every page to the homepage — mass deindexation.
Treating noindex and nofollow as equivalent — they aren't.
Pure client-side rendering for content-heavy sites — relies on Google's second-pass renderer, which can lag by days or weeks.
HTTPS migration without comprehensive 301 mapping — massive ranking loss.
Sitemap including 404s, redirects, and noindex URLs — signals poor quality to Google.
Pagination with rel=prev/next (deprecated) and no other canonical strategy.
Generic anchor text everywhere ("click here," "learn more") — missed internal linking opportunity.
Hreflang implementation without return links — ineffective.
Structured data on every page mismatched to visible content — manual action risk.
Mobile site missing content the desktop site has — mobile-first indexing punishes you.
Ignoring Core Web Vitals because "they're only a small ranking factor" — they affect engagement which feeds rankings.

Operating checklist — score yourself

Sources and further reading:

Primary data and case studies:
Vercel — the rise of the AI crawler (1B+ request analysis)
Search Engine Journal — ChatGPT crawls 3.6× more than Googlebot (2026)
Search Engine Journal — 2025 CWV pass rates by CMS
2025 Web Almanac — Security (HTTPS adoption)
2024 Web Almanac — structured data adoption
Humans of Martech — Aleyda Solís on AI search crawlability (quote source)
Google — HTTPS as a ranking signal (2014)
Google — the Speed Update (2018)
Search Engine Land — INP replaces FID (March 2024)
2025 Web Almanac — Performance chapter (CrUX pass rates)
web.dev — Vodafone: a 31% LCP improvement increased sales 8%
web.dev — the business impact of Core Web Vitals
Onely — Google needs 9× more time to crawl JS than HTML
Search Engine Roundtable — Google: median render time ~5 seconds
Milestone Research — rich results CTR study (4.5M queries)
Search Engine Journal — Mueller & Splitt on JavaScript SEO (quote source)

Official documentation:
Google Search Central — Search Essentials, crawling, indexing, structured data
web.dev — Core Web Vitals definitions and thresholds
Bing Webmaster Guidelines
Schema.org documentation

Practitioner references:
Screaming Frog, Sitebulb, OnCrawl, Botify, JetOctopus — crawler and log-analyzer documentation
Bartosz Goralewicz (Onely) — JavaScript SEO research · Cindy Krum (Mobile Moxie) — mobile-first indexing · Aleyda Solís — international SEO

RGM glossary entries used in this module:
Technical SEO · Core Web Vitals · Crawl budget · Canonical tag · XML sitemap · Schema markup

Series: All modules in SEO Mastery.

CASE-method test

Prove it. Earn your passcode.

Ten questions, CASE method (Context · Analysis · Strategy · Execution). Pass at 90% to unlock this module’s completion passcode — retake as many times as you like.