SEO Mastery
RGM° · Training
Technical SEO Foundations
The floor under which no amount of content or links can save you. Crawl, index, render, Core Web Vitals, structured data, mobile-first — the technical SEO discipline in depth.
What you will learn
- Why technical SEO is the floor, not the ceiling
- Crawlability: robots.txt, sitemaps, internal linking
- Indexability: canonicals, noindex, hreflang, parameters
- Rendering: server-side, client-side, hybrid, and the JS problem
- Core Web Vitals and page experience
- HTTPS, mixed content, and security headers
- Mobile-first indexing and responsive vs adaptive
- Structured data and the rich result economy
- Site architecture and URL design
- Advanced playbook
- Common mistakes
- Operating checklist
Why technical SEO is the floor, not the ceiling
Technical SEO doesn't rank pages by itself. Content, intent matching, authority, and user satisfaction do that. But broken technical SEO blocks every other lever: a page that can't be crawled can't be indexed, a page that can't be indexed can't rank, a page that loads in 9 seconds can't compete in 2024 SERPs against pages that load in 1.5. Technical work is the floor under which no amount of content or links can save you.
The good news is that most technical issues are binary — either fixed or not. Unlike content quality or link earning, which are gradient and competitive, technical SEO has a clear endpoint per issue. The bad news: there are many more technical issues than most teams recognize, and they compound. A site with five medium technical problems can lose 30–60% of its potential organic traffic.
Crawlability
Crawlability is whether search engine bots can find and access your pages. Three foundational artifacts:
robots.txt
The first file every crawler requests. Located at the domain root (example.com/robots.txt). It controls which crawlers can access which paths.
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /*?sort=
Allow: /
Sitemap: https://example.com/sitemap.xml
Important nuance: robots.txt blocks crawling, not indexing. If a blocked page is linked externally, Google may still index the URL without its content. To keep a page out of the index, use a noindex meta tag (which requires crawl access so Google can read it), not robots.txt.
Common errors: blocking /wp-content/ (which hides CSS and JS from Googlebot, so pages render incorrectly and rank poorly), accidentally blocking the root of a section, and leaving test or staging environments open to crawling.
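To see how rules like the ones in the sample file resolve for a given path, a small matcher can help. This is a minimal sketch, assuming Google-style matching (the longest matching pattern wins, ties go to Allow, `*` is a wildcard, a trailing `$` anchors the end of the URL). The names `is_allowed` and `RULES` are hypothetical and not part of any library.

```python
import re

def _pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules: list of (directive, pattern), directive is 'allow' or 'disallow'."""
    best_len, allowed = -1, True  # no matching rule means the path is crawlable
    for directive, pattern in rules:
        if not pattern:
            continue
        if _pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            # Longest pattern wins; on equal length, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The rules from the sample robots.txt in this section.
RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/cart/"),
    ("disallow", "/*?sort="),
    ("allow", "/"),
]

for path in ["/admin/users", "/products/shoes", "/products?sort=price",
             "/products?color=red&sort=price"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Note the last path: `/*?sort=` only matches when `sort` is the first query parameter, so a URL with `&sort=` is still crawlable under these rules. A pattern such as `/*sort=` would be needed to catch both.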
XML sitemaps
Sitemaps tell crawlers which URLs to prioritize. Best practices:
- Only include canonical, indexable URLs — not redirects, not noindex, not 4xx.
- Keep individual sitemaps under 50,000 URLs and 50MB uncompressed.
- Use sitemap index files for large sites with multiple sub-sitemaps.
- Include <lastmod> dates that are accurate; Google increasingly uses them.
- Submit through Google Search Console and Bing Webmaster Tools.
- Separate sitemaps by content type (products, articles, categories, images, video) for granular monitoring of indexation rates.
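The first two practices in the list can be sketched in code. This example assumes a hypothetical list of page records from your own crawl or CMS (the field names `loc`, `lastmod`, `status`, `noindex`, and `canonical` are illustrative). It filters to canonical, indexable URLs and splits the output into files of at most 50,000 URLs. It does not check the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def is_eligible(page):
    # Only 200-status, indexable, self-canonical URLs belong in a sitemap.
    return (
        page["status"] == 200
        and not page["noindex"]
        and page["canonical"] == page["loc"]
    )

def build_sitemaps(pages, max_urls=MAX_URLS_PER_SITEMAP):
    """Return a list of <urlset> XML strings, each with at most max_urls URLs."""
    eligible = [p for p in pages if is_eligible(p)]
    documents = []
    for start in range(0, len(eligible), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for page in eligible[start:start + max_urls]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = page["loc"]
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents

def _page(path, status=200, noindex=False, canonical=None):
    loc = "https://example.com" + path
    return {"loc": loc, "lastmod": "2024-05-01", "status": status,
            "noindex": noindex, "canonical": canonical or loc}

# Hypothetical crawl data: three eligible pages, one redirect, one noindex page.
PAGES = [
    _page("/a/"), _page("/b/"), _page("/c/"),
    _page("/old/", status=301),
    _page("/thin/", noindex=True),
]

# A tiny max_urls makes the splitting visible.
SITEMAPS = build_sitemaps(PAGES, max_urls=2)
print(len(SITEMAPS), "sitemap files")
```

Each returned string would be written to its own file and listed in a sitemap index.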
Internal linking and crawl paths
The most important crawlability factor on large sites isn't robots.txt or sitemap — it's internal linking depth. A page 5+ clicks from the homepage may get crawled infrequently or not at all on sites with limited crawl budget.
- Build clear navigation paths to important pages.
- Use breadcrumb links and contextual in-content links.
- Audit orphan pages (no internal links pointing in) — they often go un-indexed.
- Use hub-and-spoke architecture: pillar pages linking to subtopics linking back to pillars.
- Avoid endless URL parameter combinations creating crawl traps.
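Click depth and orphan pages can both be computed from a crawl export of internal links. The following is a minimal sketch, assuming a hypothetical `LINKS` mapping of each page to the pages it links to. Depth is found with a breadth-first search from the homepage, and orphans are pages with no inbound internal links.

```python
from collections import deque

def click_depths(links, home):
    """Minimum number of clicks from home to every reachable page (BFS)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def orphan_pages(links, all_pages, home):
    """Pages that no other page links to (the homepage is exempt)."""
    linked_to = {t for source, targets in links.items() for t in targets if t != source}
    return sorted(p for p in all_pages if p not in linked_to and p != home)

# Hypothetical site: page -> internal links found on that page.
LINKS = {
    "/": ["/shoes/", "/blog/"],
    "/shoes/": ["/shoes/running/", "/"],
    "/shoes/running/": ["/shoes/running/model-x/"],
    "/blog/": ["/"],
}
ALL_PAGES = ["/", "/shoes/", "/shoes/running/", "/shoes/running/model-x/",
             "/blog/", "/old-landing-page/"]

DEPTHS = click_depths(LINKS, "/")
ORPHANS = orphan_pages(LINKS, ALL_PAGES, "/")
print(DEPTHS)
print("orphans:", ORPHANS)
```

In practice `ALL_PAGES` would come from the sitemap or CMS, since a crawler that follows links cannot discover orphans on its own.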
Indexability
Once crawled, can the page be indexed? The signals:
Canonical tags
Canonical tags (<link rel="canonical" href="...">) tell Google which URL is the preferred version when content is similar across multiple URLs. Critical for:
- HTTPS migration (canonical to HTTPS version).
- Faceted navigation (product list pages with filter parameters).
- Print and mobile-only versions.
- Syndicated content (canonical points to original).
- Pagination (debated; current best practice is self-referencing canonicals on each paginated page).
Common errors: canonicalizing every page to the homepage (deindexes everything but the homepage), canonicalizing across language versions (use hreflang instead), canonical pointing to a 404 or noindex page.
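The errors listed above are straightforward to detect once canonicals have been extracted by a crawler. This is a sketch only, assuming a hypothetical `PAGES` mapping of URL to its declared canonical, HTTP status, and noindex flag. It flags canonicals that point to the homepage, to a non-200 URL, or to a noindex page.

```python
def audit_canonicals(pages, homepage):
    """pages maps URL -> {"canonical": str, "status": int, "noindex": bool}."""
    issues = []
    for url, page in pages.items():
        if page["status"] != 200:
            continue  # only audit pages that actually resolve
        target_url = page["canonical"]
        target = pages.get(target_url)
        if target_url == homepage and url != homepage:
            issues.append((url, "canonical points to homepage"))
        elif target is None:
            issues.append((url, "canonical target not in crawl"))
        elif target["status"] != 200:
            issues.append((url, f"canonical target returns {target['status']}"))
        elif target["noindex"]:
            issues.append((url, "canonical target is noindex"))
    return issues

HOME = "https://example.com/"
# Hypothetical crawl data for illustration.
PAGES = {
    HOME: {"canonical": HOME, "status": 200, "noindex": False},
    HOME + "shoes/": {"canonical": HOME, "status": 200, "noindex": False},
    HOME + "hats/": {"canonical": HOME + "hats-old/", "status": 200, "noindex": False},
    HOME + "hats-old/": {"canonical": HOME + "hats-old/", "status": 404, "noindex": False},
}

ISSUES = audit_canonicals(PAGES, HOME)
for url, problem in ISSUES:
    print(url, "->", problem)
```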
Meta robots and X-Robots-Tag
Page-level indexing signals. Use noindex for thin or duplicate content you don't want indexed. Use nofollow sparingly — Google now treats it as a hint, not a directive.
X-Robots-Tag is the HTTP-header equivalent; use it for non-HTML resources (PDFs, images) that you want kept out of the index.
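Because the directive can arrive through either channel, an audit needs to read both. The sketch below assumes you already have a response's headers as a dict and, for HTML, its body as a string. `robots_directives` is a hypothetical helper name. It ignores crawler-specific forms such as `googlebot: noindex`, which a fuller implementation would need to handle.

```python
from html.parser import HTMLParser

def _split(value):
    return {d.strip().lower() for d in value.split(",") if d.strip()}

class _MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives |= _split(attr.get("content") or "")

def robots_directives(headers, html=""):
    """Union of directives from the X-Robots-Tag header and meta robots tags."""
    directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives |= _split(value)
    if html:
        parser = _MetaRobotsParser()
        parser.feed(html)
        directives |= parser.directives
    return directives

# A PDF can only carry the directive in a header.
PDF_DIRECTIVES = robots_directives({"Content-Type": "application/pdf",
                                    "X-Robots-Tag": "noindex, nofollow"})
# An HTML page can carry it in the markup.
HTML_DIRECTIVES = robots_directives(
    {}, '<html><head><meta name="robots" content="noindex"></head></html>')
print(PDF_DIRECTIVES, HTML_DIRECTIVES)
```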
hreflang
For multilingual or multi-regional sites. Tells Google which language or region version to serve to which user. Implementation options: HTML link tags, HTTP headers, or sitemap annotations. The most common implementation errors are missing return links between language versions and incorrect language codes.
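Return links are easy to validate mechanically: if page A lists page B as an alternate, B must list A. A minimal sketch, assuming a hypothetical `HREFLANG` mapping built from a crawl (page URL to the alternates declared on that page):

```python
def missing_return_links(hreflang):
    """hreflang maps page URL -> {language code: alternate URL} declared on that page.

    Returns (page, alternate) pairs where the alternate does not link back.
    """
    missing = []
    for page, alternates in hreflang.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue  # self-reference, nothing to verify
            declared_on_alt = hreflang.get(alt_url, {})
            if page not in declared_on_alt.values():
                missing.append((page, alt_url))
    return missing

EN, DE, FR = ("https://example.com/en/", "https://example.com/de/",
              "https://example.com/fr/")
# Hypothetical annotations: the French page forgets to reference the English one.
HREFLANG = {
    EN: {"en": EN, "de": DE, "fr": FR},
    DE: {"en": EN, "de": DE, "fr": FR},
    FR: {"fr": FR, "de": DE},
}

MISSING = missing_return_links(HREFLANG)
print(MISSING)
```

Here the English page points to the French page, but the French page does not point back, so that pair is reported.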
URL parameters and faceted navigation
The hardest indexability problem for ecommerce sites. Product listing pages with filter parameters (color, size, price range, sort order) generate near-infinite URL combinations. Approach:
- Canonical filtered/sorted pages back to the unfiltered category URL.
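- Block parameter combinations from crawling via robots.txt where appropriate. A blocked URL can't have its canonical or noindex read, so choose one mechanism per URL pattern.
- Use noindex on low-value filter combinations.
- Signal preference via canonicalization and internal linking; the URL Parameters tool in Google Search Console was retired in 2022 and is no longer an option.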
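The first item in the list, canonicalizing filtered and sorted pages back to the category URL, can be sketched as a function that strips facet parameters. The parameter names in `FILTER_PARAMS` are hypothetical; a real site would take them from its documented facet policy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameters that should not create indexable URLs.
FILTER_PARAMS = {"color", "size", "price", "sort"}

def canonical_for(url):
    """Drop facet parameters (and any fragment) to get the canonical URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in FILTER_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_for("https://example.com/shoes/?color=red&sort=price"))
print(canonical_for("https://example.com/shoes/?page=2&size=42"))
```

Parameters that are not facets, such as `page` in the second call, are preserved, which fits the self-referencing approach to pagination described under canonical tags.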
Rendering: the JS problem
Modern web increasingly relies on JavaScript to render content. Google CAN render JavaScript, but it does so in a second pass after initial crawl, sometimes days or weeks later. JS-rendered content that depends on user interaction or that errors in the rendering pipeline may never make it into the index.
Rendering options
- Server-side rendering (SSR). Server returns fully-rendered HTML. Best for SEO. Frameworks: Next.js (with getServerSideProps), Nuxt, Remix, Rails, Django, Laravel.
- Static site generation (SSG). Pre-rendered HTML at build time. Excellent for SEO; trivial to serve. Frameworks: Next.js SSG, Hugo, Gatsby, Astro.
- Client-side rendering (CSR). Empty HTML shell; JS renders content in browser. Worst for SEO — relies on Google's second-pass rendering.
- Hybrid (ISR, partial SSR). Mix of strategies per page type. Modern frameworks support this well.
- Dynamic rendering. Serve SSR to bots, CSR to users. Google has deprecated this recommendation but it's still in use; treat as legacy.
Debugging rendering issues
- Use Google's URL Inspection tool in Search Console — shows rendered HTML.
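- Use the Rich Results Test for a quick check outside Search Console; it uses the same renderer and shows the rendered HTML. (The standalone Mobile-Friendly Test was retired in December 2023.)
- Compare raw HTML (view source) with rendered HTML (DevTools Elements panel); on big sites, sample at least one URL per template.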
- Check for JS errors in Google's rendering — Search Console URL Inspection shows errors.
- Check that critical content (headlines, body copy, links) appears in raw HTML or at minimum in the rendered HTML without user interaction.
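The raw-versus-rendered comparison can be partly automated. This sketch assumes you have already saved both versions of a page as strings (for example from view source and from a headless browser) and have a list of snippets that must be present. `content_gaps` is a hypothetical helper name.

```python
def content_gaps(raw_html, rendered_html, critical_snippets):
    """Classify each critical snippet by where it first becomes available."""
    report = {}
    for snippet in critical_snippets:
        if snippet in raw_html:
            report[snippet] = "in raw HTML"
        elif snippet in rendered_html:
            report[snippet] = "rendered only (depends on JS)"
        else:
            report[snippet] = "missing"
    return report

# Hypothetical page: the headline is server-rendered, the body copy is injected by JS,
# and the related-products link never appears without user interaction.
RAW = "<html><body><h1>Trail Shoes</h1><div id='app'></div></body></html>"
RENDERED = ("<html><body><h1>Trail Shoes</h1>"
            "<div id='app'><p>Built for wet terrain.</p></div></body></html>")

REPORT = content_gaps(RAW, RENDERED, [
    "<h1>Trail Shoes</h1>",
    "Built for wet terrain.",
    'href="/shoes/related/"',
])
for snippet, status in REPORT.items():
    print(status, ":", snippet)
```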
Core Web Vitals and page experience
Google's Core Web Vitals are three metrics measuring real-user page experience:
| Metric | What it measures | Good | Needs work |
| LCP (Largest Contentful Paint) | How fast the main content loads | < 2.5s | 2.5–4s |
| INP (Interaction to Next Paint) | How fast the page responds to interaction (replaced FID in March 2024) | < 200ms | 200–500ms |
| CLS (Cumulative Layout Shift) | How much the layout jumps as it loads | < 0.1 | 0.1–0.25 |
Page Experience signals also include HTTPS, no intrusive interstitials, and mobile-friendly. CWV is a real but modest ranking factor — it tiebreaks more often than it determines top-3 placement. But it affects user experience materially, which feeds back to rankings through engagement signals.
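The thresholds in the table translate directly into a rating function. This is a sketch under two assumptions: values exactly on a boundary count toward the better band, and anything above the "needs work" range is rated poor. Google assesses these metrics at the 75th percentile of page loads, so a simple nearest-rank `p75` helper (a hypothetical name) is included.

```python
import math

# (upper bound for "good", upper bound for "needs work")
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}

def rate(metric, value):
    good, needs_work = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_work:
        return "needs work"
    return "poor"

def p75(samples):
    """75th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]

# Hypothetical LCP field samples in seconds for one template.
LCP_SAMPLES = [1.2, 3.8, 1.9, 2.1]
print("LCP p75:", p75(LCP_SAMPLES), rate("LCP", p75(LCP_SAMPLES)))
print("INP 350ms:", rate("INP", 350))
print("CLS 0.3:", rate("CLS", 0.3))
```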
Improving LCP
- Optimize images: WebP/AVIF formats, proper sizing, lazy-loading below the fold but eagerly loading the LCP element.
- Preload the LCP image with <link rel="preload">.
- Reduce render-blocking JS and CSS.
- Use a CDN for static assets.
- Server response time under 600ms; use caching, optimized DB queries, edge rendering.
Improving INP
- Break up long-running JS tasks; use requestIdleCallback or scheduler.postTask.
- Reduce JS bundle size; code-split.
- Avoid synchronous third-party scripts.
- Profile interactions with Chrome DevTools Performance panel; identify slow event handlers.
Improving CLS
- Set explicit width/height on images and embeds.
- Reserve space for ads, iframes, and dynamic content.
- Avoid injecting content above existing content after page load.
- Use font-display: swap with care to avoid flash of unstyled text shifts.
HTTPS, mixed content, and security headers
- HTTPS: Non-negotiable since 2018. Chrome marks non-HTTPS as "Not Secure." Use HSTS to enforce HTTPS at the browser level.
- Mixed content: HTTPS page loading HTTP resources is blocked or flagged. Audit and fix references.
- Security headers: CSP, X-Content-Type-Options, X-Frame-Options, Referrer-Policy. Not direct ranking factors but contribute to user trust signals.
- HTTP/2 or HTTP/3: Material speed improvements. Most CDNs support automatically.
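Two of the checks above are simple to script once you have a page's response headers and HTML. The sketch below is illustrative only. The header list mirrors the ones named in this section plus HSTS, and the mixed-content check looks only at `src` attributes, so it would miss resources referenced from CSS or `srcset`.

```python
import re

RECOMMENDED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
]

def missing_security_headers(headers):
    """Header names are case-insensitive, so compare in lowercase."""
    present = {name.lower() for name in headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]

def mixed_content_urls(html):
    """HTTP resources loaded via src attributes on a page served over HTTPS."""
    return re.findall(r'\bsrc=["\'](http://[^"\']+)', html)

# Hypothetical response for illustration.
HEADERS = {
    "strict-transport-security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
HTML = ('<img src="http://example.com/a.png">'
        '<script src="https://example.com/app.js"></script>')

print("missing:", missing_security_headers(HEADERS))
print("mixed content:", mixed_content_urls(HTML))
```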
Mobile-first indexing
Google indexes the mobile version of your site by default: mobile-first indexing has applied to new sites since 2019, and Google announced the rollout complete in 2023. Implications:
- Mobile site must have all content the desktop site has — same headlines, same body copy, same structured data, same internal links.
- Responsive design (one HTML, CSS adapts) is preferred over separate mobile site (m.example.com).
- Mobile UX matters — tap targets, font sizes, viewport settings.
- Mobile rendering should be tested specifically; tools sometimes diverge between mobile and desktop rendering.
Structured data and the rich result economy
Schema.org structured data lets you mark up your content semantically, enabling rich results in SERPs — star ratings, FAQs, product prices, recipe images, events, job postings, breadcrumbs.
- JSON-LD is the preferred implementation (vs Microdata or RDFa) per Google.
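- Common useful schemas: Product, FAQPage, HowTo, Article, BreadcrumbList, LocalBusiness, Event, Recipe, Review, Organization. Note that Google restricted FAQ rich results and retired HowTo rich results in 2023, so those two types rarely earn a visible enhancement in Google now.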
- Validate with Google's Rich Results Test and Schema.org validator.
- Don't over-mark: Schema must match visible content. Schema spam can trigger manual action.
- AI search prep: Structured data feeds LLM-powered search experiences (Google AI Overviews, Perplexity, Bing Copilot). Increasingly important for AEO/GEO.
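As an illustration of the JSON-LD approach, the sketch below builds a Product snippet from values that would also be shown on the page. The function name, product name, price, and URL are all hypothetical, and a real Product markup would normally carry more properties (images, ratings, and so on).

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"

def product_jsonld(name, price, currency, url):
    """Return a JSON-LD <script> tag for a Product with a single Offer."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }
    # Escape "</" so a value can never close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return SCRIPT_OPEN + payload + SCRIPT_CLOSE

SNIPPET = product_jsonld("Trail Shoe", 89.5, "EUR",
                         "https://example.com/shoes/trail-shoe/")
print(SNIPPET)
```

Generating the markup from the same data source as the visible page is one way to keep schema and content in sync, which is the point of the "don't over-mark" rule above.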
Site architecture and URL design
URL structure
- Short, descriptive, keyword-relevant, lowercase, hyphenated.
- Avoid auto-generated parameters when possible; rewrite to clean URLs.
- Avoid changing URL structures unless necessary; if changing, use 301 redirects mapped 1:1.
- Consistent trailing slash policy (with or without — pick one).
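These URL rules can be enforced in code at the point where URLs are generated. The following is a sketch with hypothetical helper names. `slugify` produces lowercase, hyphenated, ASCII-only slugs, and `normalize_path` applies one trailing-slash policy. It assumes paths for pages, not files; a path like `/file.pdf` would need an exception.

```python
import re
import unicodedata

def slugify(title):
    """Lowercase, hyphen-separated, ASCII-only slug."""
    ascii_text = (unicodedata.normalize("NFKD", title)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def normalize_path(path, trailing_slash=True):
    """Lowercase the path, collapse repeated slashes, apply one slash policy."""
    path = re.sub(r"/{2,}", "/", path.lower())
    if trailing_slash and not path.endswith("/"):
        return path + "/"
    if not trailing_slash and len(path) > 1:
        return path.rstrip("/")
    return path

print(slugify("Men's Running Shoes (2024)"))
print(slugify("Crème Brûlée Recipe"))
print(normalize_path("/Shoes//Running"))
print(normalize_path("/shoes/", trailing_slash=False))
```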
Information architecture
- Group content by topic, not by date or publication batch.
- Limit directory depth to 3–4 levels for most content.
- Build hub pages (pillar pages, category pages) that link to subtopic content.
- Internal anchor text should be descriptive, not generic ("click here").
Advanced playbook
- Log file analysis. Server logs show exactly which pages Googlebot crawls and how often. Tools: Botify, OnCrawl, Screaming Frog Log File Analyser, JetOctopus. The single best diagnostic for crawl budget issues.
- Crawl budget management. For sites with 100k+ URLs, crawl budget matters. Reduce crawl traps (parameter combinations), consolidate thin content, fix slow server response times.
- JavaScript rendering audit. Compare initial HTML, rendered HTML, and what Googlebot indexes. Identify content gaps.
- Critical rendering path optimization. Inline above-the-fold CSS, defer non-critical JS, preload key assets, and use 103 Early Hints or HTTP/3 priorities (HTTP/2 server push is no longer supported in Chrome).
- Edge SEO. Use edge workers (Cloudflare Workers, Vercel Edge Functions) to implement redirects, header rewrites, dynamic structured data without backend deploys.
- Search Console API automation. Pull performance data, indexing status, Core Web Vitals, and structured data errors into your data warehouse for monitoring at scale.
- Pagespeed Insights API monitoring. Track CWV trends per template, not just per URL. Identify template-level regressions before they affect rankings.
- International SEO architecture. Subdirectory (example.com/de/) vs subdomain (de.example.com) vs ccTLD (example.de). Each has trade-offs; subdirectory wins for most.
- Faceted navigation policy. Document which facets are indexable, which canonical to category, which are blocked. This is a multi-month project for large ecommerce sites.
- Sitemap monitoring as KPI. Track indexation rate (URLs actually indexed / URLs submitted in sitemap) over time. Sub-80% indicates content quality, internal linking, or crawl budget issues.
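For the log file analysis item at the top of this list, a first pass does not need a dedicated tool. The sketch below assumes access logs in the common "combined" format and counts Googlebot requests per path and status code. The sample lines and IP addresses are made up. Matching on the user-agent string alone can be spoofed, so a production version should also verify the requesting IP (for example via reverse DNS).

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests whose user agent mentions Googlebot, keyed by (path, status)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Hypothetical log lines for illustration.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/May/2024:06:25:13 +0000] "GET /shoes/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/May/2024:06:25:14 +0000] "GET /shoes/?sort=price HTTP/1.1" 200 5090 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/May/2024:06:26:02 +0000] "GET /shoes/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.7 - - [10/May/2024:06:27:40 +0000] "GET /shoes/ HTTP/1.1" 200 5123 "-" "{BROWSER_UA}"',
    f'66.249.66.1 - - [10/May/2024:06:28:00 +0000] "GET /old-page/ HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"',
]

HITS = googlebot_hits(SAMPLE_LOG)
for (path, status), count in HITS.most_common():
    print(count, status, path)
```

Crawl hits on parameter URLs and 404s, as in the sample, are the kind of wasted crawl budget this analysis is meant to surface.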
Common mistakes
- Blocking CSS/JS in robots.txt — Googlebot can't render the page properly and rankings suffer.
- Canonicalizing every page to the homepage, which causes mass deindexation.
- Treating noindex and nofollow as equivalent — they aren't.
- Pure client-side rendering for content-heavy sites, which relies on Google's second-pass renderer and can lag by days or weeks.
- HTTPS migration without comprehensive 301 mapping — massive ranking loss.
- Sitemap including 404s, redirects, and noindex URLs — signals poor quality to Google.
- Pagination with rel=prev/next (deprecated) and no other canonical strategy.
- Generic anchor text everywhere ("click here," "learn more") — missed internal linking opportunity.
- Hreflang implementation without return links — ineffective.
- Structured data on every page mismatched to visible content — manual action risk.
- Mobile site missing content the desktop site has — mobile-first indexing punishes you.
- Ignoring Core Web Vitals because "they're only a small ranking factor" — they affect engagement which feeds rankings.
Operating checklist
- robots.txt audited and reviewed quarterly; does not block CSS/JS
- XML sitemap(s) submitted, with only canonical/indexable URLs, refreshed automatically
- Canonical strategy documented per template; verified by automated audit
- Indexation rate (sitemap submitted vs indexed) tracked monthly; sub-80% triggers investigation
- Server-side or static rendering for critical content; client-side rendering audited
- Core Web Vitals dashboards per template; regressions trigger alerts
- HTTPS sitewide, HSTS enforced, security headers configured
- Mobile-first rendering verified; mobile content matches desktop
- Structured data audited monthly via Search Console + Rich Results Test
- Log file analysis quarterly for crawl budget anomalies
- Internal linking depth audit: no important page deeper than 4 clicks from homepage
- URL parameter handling documented and consistent
- Hreflang implementation tested with return links validated
Sources and further reading
- Google Search Central documentation — Search Essentials, Crawling and Indexing, Mobile, Structured Data, Core Web Vitals
- Bing Webmaster Guidelines
- web.dev — Core Web Vitals, performance optimization, modern web architecture
- Ahrefs SEO blog — technical SEO deep dives
- Search Engine Journal Technical SEO category
- Search Engine Land — technical SEO columns (Detlef Johnson, Barry Schwartz)
- Moz — technical SEO learning hub
- Sitebulb, Screaming Frog, OnCrawl, Botify, JetOctopus — crawler and log analyzer documentation
- Cindy Krum (Mobile Moxie) — mobile-first indexing research
- Bartosz Goralewicz, Onely — JavaScript SEO research
- Aleyda Solis — international SEO and hreflang playbooks
- Schema.org documentation and Google's rich result gallery
Part of the SEO Mastery series.