Sitemap Inspector
What it does
The Sitemap Inspector fetches a sitemap URL, parses it, and follows <sitemapindex> children one level deep so it can give you the full URL inventory in a single pass. The output: every URL with its declared lastmod, changefreq, and priority, a summary of how many URLs are in each child sitemap, and a freshness distribution showing how many entries have been updated in the last 7, 30, and 90 days.
Common situations
You have just submitted a sitemap to Search Console and want to verify it parses correctly before waiting for the indexing report to update. The inspector fetches it instantly, validates the XML structure, and shows the URL set you submitted.
A site’s sitemap is reporting “couldn’t fetch” in Search Console. Test it directly: the cause is usually a server response or content-type issue, but sometimes the sitemap genuinely returns malformed XML. The inspector parses with a strict XML parser and surfaces the failure point.
You suspect a sitemap is including URLs that shouldn’t be there — noindex pages, redirected URLs, or pages canonicalised to other URLs. Pull the URL list from the inspector and cross-check against your noindex and canonical configuration.
A competitor’s content publishing rate is faster than yours and you want to understand their cadence. Their sitemap (typically at /sitemap.xml or /sitemap_index.xml) shows their full URL set with lastmod dates. The freshness distribution gives a rough picture of how often they’re updating, provided their lastmod values are honest.
You are auditing a sitemap-index structure to verify the per-section sub-sitemaps are correctly organised. The inspector follows the index, fetches each child, and reports the URL count per child — useful for confirming your sitemap structure matches the site’s content organisation.
What you need to know
A sitemap is a URL inventory designed to help search engines discover content. The standard format is XML wrapped in <urlset> containing <url> entries; a sitemap index uses <sitemapindex> containing <sitemap> references to other sitemaps.
The inspector distinguishes three shapes and parses the first two:
Single sitemap: a <urlset> with <url> entries. The inspector parses every URL, lastmod, changefreq, and priority. Most small sites use a single sitemap.
Sitemap index: a <sitemapindex> listing other sitemap URLs. The inspector fetches each child (capped at 10 to keep things light) and aggregates the URLs. Most large sites use this pattern.
Compressed sitemaps (.gz): the inspector does not handle gzipped sitemaps directly. Most CMSes serve sitemaps uncompressed at the standard /sitemap.xml, with .gz versions provided as a transfer optimisation only, so point the inspector at the uncompressed URL.
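The three shapes can be told apart from the first bytes and the root element. The following is a minimal sketch of that idea using Python's standard library, not the inspector's actual implementation; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def classify_sitemap(payload: bytes) -> str:
    """Return 'gzip', 'urlset', 'sitemapindex', or 'unknown' for a fetched body."""
    if payload.startswith(GZIP_MAGIC):
        return "gzip"  # compressed: would need decompressing before parsing
    root = ET.fromstring(payload)  # raises ET.ParseError on malformed XML
    # ElementTree writes namespaced tags as "{namespace}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name if local_name in ("urlset", "sitemapindex") else "unknown"


def parse_urlset(payload: bytes) -> list:
    """Extract loc, lastmod, changefreq and priority from each <url> entry."""
    ns = {"sm": SITEMAP_NS}
    root = ET.fromstring(payload)
    fields = ("loc", "lastmod", "changefreq", "priority")
    return [
        {f: url.findtext(f"sm:{f}", default=None, namespaces=ns) for f in fields}
        for url in root.findall("sm:url", ns)
    ]


# Hypothetical two-entry sitemap used only to exercise the functions.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(classify_sitemap(SAMPLE))
print(parse_urlset(SAMPLE))
```

Optional fields that are absent come back as None, which is how an entry with no lastmod would be distinguished from one with a stale lastmod.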
What the inspector reports:
- URL count per sitemap and aggregated across the index.
- Lastmod distribution: how many URLs have lastmod within 7 days, 30 days, 90 days, older, or no lastmod at all. This is the freshness signal — sitemaps where most URLs have stale lastmod are not telling search engines anything useful.
- changefreq and priority per URL. Google has been clear it ignores these; Bing reads them lightly. The inspector shows them for completeness.
- Structural validation: malformed XML, missing required elements, or invalid URL formats are surfaced.
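The lastmod distribution in the list above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the inspector's code: the buckets are treated as exclusive (an entry 10 days old counts under 30 days only), date-only values are read as midnight UTC, and values that fail to parse are counted together with missing ones.

```python
from datetime import datetime, timezone

BUCKETS = ("7d", "30d", "90d", "older", "none")


def parse_lastmod(value):
    """Parse a lastmod string (date-only or full timestamp) into an aware datetime."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Assumption: values without an offset are interpreted as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def freshness_distribution(lastmods, now):
    """Count lastmod values per age bucket; `now` must be timezone-aware."""
    counts = dict.fromkeys(BUCKETS, 0)
    for value in lastmods:
        try:
            age_days = (now - parse_lastmod(value)).days
        except (AttributeError, ValueError):
            # None (no lastmod) or a string that is not a parseable date.
            counts["none"] += 1
            continue
        if age_days <= 7:
            counts["7d"] += 1
        elif age_days <= 30:
            counts["30d"] += 1
        elif age_days <= 90:
            counts["90d"] += 1
        else:
            counts["older"] += 1
    return counts


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
values = ["2024-05-30", "2024-05-10T12:00:00Z", "2024-03-15", "2023-01-01", None]
print(freshness_distribution(values, now))
```

A distribution where nearly everything lands in the first bucket on every run is usually a sign that lastmod is being regenerated rather than tracked.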
What the inspector does NOT do: verify each URL in the sitemap actually returns 200 (use the Broken Link Checker for that), check noindex status of each URL, or compare the sitemap against what’s actually in the site’s content management system.
The 50,000-URL hard limit per sitemap and the 50MB-uncompressed limit are protocol rules. Sites with more URLs need a sitemap index pointing at multiple per-section sub-sitemaps. The inspector handles indexes correctly up to 10 children per index.
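As an illustration of the two limits, the sketch below checks a sitemap against them and splits an oversized URL list into chunks that would each fit in one child sitemap. The function names are hypothetical, and the byte limit assumes 50MB means 50 × 1024 × 1024 bytes.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # assumption: 50MB = 52,428,800 bytes


def check_limits(url_count, uncompressed_bytes):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if url_count > MAX_URLS_PER_SITEMAP:
        problems.append(f"{url_count} URLs exceeds {MAX_URLS_PER_SITEMAP}")
    if uncompressed_bytes > MAX_UNCOMPRESSED_BYTES:
        problems.append(f"{uncompressed_bytes} bytes exceeds {MAX_UNCOMPRESSED_BYTES}")
    return problems


def split_into_sitemaps(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most chunk_size entries."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])
```

Splitting by count alone is the simplest approach; splitting by site section usually gives more useful per-child reporting.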
Frequently asked questions
Where is a sitemap usually located?
By convention at /sitemap.xml at the site root. CMSes often place index files at /sitemap_index.xml or /sitemap-index.xml. Robots.txt should declare the canonical sitemap URL with a Sitemap: directive.
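To see which sitemap URLs a robots.txt file declares, Python's standard urllib.robotparser module can read the Sitemap: lines (its site_maps() method is available from Python 3.8). The sketch below parses robots.txt text that has already been fetched; the helper name and the sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared via Sitemap: lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


ROBOTS = """User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
"""

print(declared_sitemaps(ROBOTS))
```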
What’s a sitemap index?
A sitemap of sitemaps. When a site has more than 50,000 URLs (the per-sitemap limit), or when you want to organise URLs by section, you split into multiple sitemaps and reference them from a <sitemapindex>. The index is what gets submitted to Search Console.
What’s the maximum number of URLs per sitemap?
50,000. Above that, split into multiple sitemaps and reference them from an index. The 50MB uncompressed limit is the other constraint: even under 50,000 URLs, long URLs or extension markup (image, video, or hreflang entries) can push a file past 50MB.
Does Google use changefreq and priority?
Google has explicitly stated it does not use changefreq or priority. Bing reads them lightly. Most sites should leave changefreq at honest values for Bing’s benefit and not bother tuning priority — it doesn’t move rankings.
What does lastmod do?
Tells search engines when the URL’s content was last updated. Google uses lastmod to prioritise recrawl, so pages with a recent lastmod tend to get crawled sooner. The catch: lastmod must be honest. Sites that reset lastmod on every regeneration, whether or not the content changed, get the field ignored.
Can a sitemap include URLs from other domains?
Only with verified ownership in Search Console. By default, sitemaps are scoped to a single host. Cross-domain sitemaps require explicit Search Console configuration.
Should noindex pages be in the sitemap?
No. Sitemaps should contain URLs you want indexed. Listing noindex URLs sends conflicting signals (noindex says don’t, sitemap says do), and Search Console flags such URLs as submitted but marked noindex.
How fast does Google process new sitemap submissions?
Discovery is fast (hours to days); a full re-crawl of all URLs is slower (days to weeks for substantial sets). Direct sitemap submission in Search Console and the Sitemap: declaration in robots.txt are the two ways to ensure fast discovery.
Common problems
Problem: Sitemap loads in browser but Search Console can’t fetch it.
Most often a Content-Type issue. The server is returning text/html instead of application/xml, or the response is HTML wrapped around the XML (a CMS error page that happens to contain the XML). Set the response to Content-Type: application/xml explicitly.
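The checks described above can be expressed as a small triage function over a response you have already fetched. This is a hedged sketch: the function name is hypothetical, and treating text/xml as acceptable alongside application/xml is an assumption.

```python
XML_TYPES = {"application/xml", "text/xml"}  # assumption: both are acceptable


def diagnose_response(status, content_type, body):
    """Classify a sitemap HTTP response from its status, Content-Type and body bytes."""
    if status != 200:
        return f"unexpected status {status}"
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    head = body.lstrip()[:200].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "body is HTML, not sitemap XML"
    if media_type not in XML_TYPES:
        return f"content-type is {media_type or 'missing'}, expected application/xml"
    return "ok"


print(diagnose_response(200, "text/html; charset=UTF-8", b"<?xml version='1.0'?><urlset/>"))
```

Checking the body before the header catches the case where a CMS error page is served with a correct-looking Content-Type.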
Problem: Sitemap submitted but URLs are not getting indexed.
Sitemap submission is a discovery hint, not an indexing guarantee. Pages that don’t get indexed despite being in the sitemap usually have other issues — thin content, near-duplicate content, low-quality signals. Look at the page itself, not the sitemap.
Problem: Lastmod values show as “ignored” in Search Console.
Google has detected that lastmod is unreliable, typically because it is reset on every regeneration regardless of whether content changed. The fix is to update lastmod only when content has genuinely changed, not on every sitemap rebuild.
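One way to keep lastmod stable across rebuilds is to store a content hash next to it and only move the date when the hash changes. The sketch below illustrates that approach; the record shape and function name are hypothetical, and a real generator would hash whatever it considers the page's meaningful content.

```python
import hashlib
from typing import Optional


def next_lastmod(previous: Optional[dict], content: str, today: str) -> dict:
    """Return the record to store: keep the old lastmod unless the content hash changed."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if previous and previous["hash"] == digest:
        return previous  # unchanged content: lastmod stays where it was
    return {"hash": digest, "lastmod": today}


first = next_lastmod(None, "hello", "2024-05-01")
rebuilt = next_lastmod(first, "hello", "2024-06-01")         # rebuild, same content
edited = next_lastmod(rebuilt, "hello world", "2024-06-02")  # content changed
print(first["lastmod"], rebuilt["lastmod"], edited["lastmod"])
```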
Problem: Index file references children that 404.
Sitemap-index children must each return 200 with valid sitemap XML. If a child returns 404 or non-XML, Google reports the error and treats the whole index as partially broken. Test each child individually.
Problem: Inspector says “too many children” on a sitemap-index.
The inspector caps at 10 children to keep response time reasonable. For larger indexes, follow individual children manually or use a dedicated sitemap-management tool. The hard protocol limit is 50,000 sitemaps per index, but in practice anything over 100 is unusual.
Tips
- Submit both the sitemap-index and the children to Search Console where possible. Per-child indexing reports let you diagnose problems section by section.
- Keep lastmod accurate. A sitemap that says everything was updated yesterday gets less attention than one with truthful per-page dates.
- Split by content type (sitemap-posts.xml, sitemap-products.xml) rather than arbitrary chunks. Per-section reporting in Search Console is much more useful.
- Verify the sitemap is referenced in robots.txt with the canonical URL. Discovery is fastest when both Search Console submission and robots.txt declaration are in place.
- Audit sitemap content periodically. Pages that are noindex, redirected, or have moved to new URLs sometimes leak into sitemaps from CMS bugs.
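A periodic audit of the kind described in the last tip can be as simple as a cross-check between the sitemap's URL list and the lists of noindex and redirected URLs exported from your CMS or crawler. The sketch below assumes you already have those lists; the function name and input shapes are hypothetical.

```python
def find_leaks(sitemap_urls, noindex_urls, redirects):
    """Map each sitemap URL that should not be listed to the reason it was flagged.

    noindex_urls: set of URLs known to carry noindex.
    redirects: dict mapping a redirecting URL to its target.
    """
    leaks = {}
    for url in sitemap_urls:
        if url in noindex_urls:
            leaks[url] = "noindex"
        elif url in redirects:
            leaks[url] = f"redirects to {redirects[url]}"
    return leaks


sitemap_urls = [
    "https://example.com/",
    "https://example.com/thanks",
    "https://example.com/old-page",
]
noindex_urls = {"https://example.com/thanks"}
redirects = {"https://example.com/old-page": "https://example.com/new-page"}
print(find_leaks(sitemap_urls, noindex_urls, redirects))
```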
Related tools in this suite
The Sitemap.xml Generator is the build-side counterpart — when the inspector reveals issues with an existing sitemap, the generator helps construct a corrected one. The Page SEO Audit is useful for checking individual URLs from the sitemap (especially the ones that aren’t getting indexed).
What this looks like at scale
For a single site, manual sitemap inspection is fine. For organisations with multiple subdomains, multiple sites, or multilingual content sets, sitemap audits should be part of the deploy pipeline — generate, validate, submit. The WP Beacon Plugin tracks sitemap presence and freshness as part of the broader site monitoring.
Take it further
If a sitemap is consistently the source of indexing problems — Google ignoring lastmod, sub-sitemaps not being read, indexing lag on new content — the underlying issue is often architectural. Talk through the setup and we can scope what fixing it looks like.