Sitemap.xml Generator

What it does

The Sitemap.xml Generator turns a list of URLs into a conforming sitemap.xml file. Paste your URLs (one per line), set the defaults you want applied to every entry (lastmod, changefreq, priority), and the generator emits valid XML that Search Console will accept. The output is wrapped in the standard <urlset> envelope with the correct namespace declaration; if you have more than 50,000 URLs, it warns you to split the file and reference the resulting files from a <sitemapindex>.

Common situations

You have a small static site or a hand-built page set and your CMS does not generate a sitemap automatically. Paste the URLs, take the defaults, drop the file at /sitemap.xml, and submit it in Search Console. That is the fastest path to having Google know every URL exists, which matters most on sites where internal linking is sparse.

You are migrating a site and need a sitemap of the new URLs to submit alongside the redirect rules. The new sitemap accelerates Google’s discovery of the new structure; the redirects handle requests that still arrive at the old URLs. Generate the new sitemap from your migration spreadsheet and submit it the day the redirects go live.

A specific section of your site (a knowledge base, a glossary, a documentation tree) is poorly linked from the rest of the site and you want a dedicated sitemap to highlight it for Google. Generate a per-section sitemap, reference it from a <sitemapindex>, and submit the index. Sub-sitemaps let you isolate indexing problems to specific groups of pages.

You are auditing why a competitor’s pages are showing up faster than yours. Compare their sitemap structure (most are accessible at /sitemap.xml or /sitemap_index.xml) with yours — sitemap completeness is one of the easier asymmetries to spot.

You have a custom-built site that emits a sitemap programmatically but the format is wrong (missing namespace, wrong root element, malformed XML). Generate a known-good sitemap and use it as the reference to check your generator’s output against.

What you need to know

A sitemap is a list of URLs you want search engines to know about, with optional metadata about each one. The protocol is defined at sitemaps.org and supported by Google, Bing, and the other major search engines. The format is straightforward (one <url><loc>...</loc></url> entry per URL inside a <urlset> envelope), but there are a few rules that catch people out.
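The envelope described above can be sketched in a few lines of Python. This is a minimal illustration using the standard library, not the generator's own code; the function name build_urlset is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_urlset(urls):
    """Return a sitemap document (bytes) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes & and < in text when it serialises.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_urlset([
    "https://example.com/",
    "https://example.com/page/?a=1&b=2",
])
print(xml_bytes.decode("utf-8"))
```

Building the tree with a library rather than string concatenation means the escaping and the root element are handled in one place.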

URLs must be absolute and use the protocol the site is actually served on. https://example.com/page/ is correct; /page/ and //example.com/page/ are not. URLs must be properly XML-escaped — ampersands in query strings need to be &amp;, not &. The generator handles this for you.
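If you are producing entries yourself, the two rules above (absolute URLs, XML escaping) can be checked with a short sketch like the following. The helper names are hypothetical, and the check accepts both http and https; confirming that the scheme matches the one your site is actually served on is left to you.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

def is_absolute_http_url(url):
    """True only when the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def loc_element(url):
    """Return an escaped <loc> element, rejecting relative URLs."""
    if not is_absolute_http_url(url):
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # escape() converts &, < and > to their XML entities.
    return f"<loc>{escape(url)}</loc>"

print(loc_element("https://example.com/search?q=a&page=2"))
```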

The optional fields are: lastmod (the last time the content meaningfully changed, in W3C Datetime format, an ISO 8601 profile: either a date such as 2024-03-05 or a full datetime with a timezone offset), changefreq (always, hourly, daily, weekly, monthly, yearly, or never; a hint about how often the page changes), and priority (a number from 0.0 to 1.0 indicating importance relative to other URLs on the same site). Google has been clear for years that it ignores changefreq and priority; those fields are advisory at most, and gaming them is a waste of effort. Google still uses lastmod when it is consistently accurate; sites that auto-update lastmod on every fetch or regeneration get the field disregarded entirely. Bing still uses changefreq and priority lightly. The conservative pattern is: include lastmod with real dates, set changefreq honestly (Bing reads it), and set priority only if you genuinely want to express relative importance.
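As a rough sketch of how the optional fields might be validated before they are written out, the following formats lastmod in the W3C Datetime format that sitemaps.org specifies and checks changefreq and priority against the allowed values. The function names are hypothetical, and converting datetimes to UTC is a choice made for this example, not a protocol requirement.

```python
from datetime import date, datetime, timezone

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def format_lastmod(value):
    """Format a date or timezone-aware datetime as a W3C Datetime string."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        if value.tzinfo is None:
            raise ValueError("datetime values must be timezone-aware")
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
    return value.isoformat()  # plain date becomes YYYY-MM-DD

def check_optional_fields(changefreq=None, priority=None):
    """Raise ValueError if a field falls outside the protocol's allowed values."""
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {priority!r}")

check_optional_fields(changefreq="weekly", priority=0.5)
print(format_lastmod(date(2024, 3, 5)))
```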

The limits of 50,000 URLs and 50MB (uncompressed) per file are hard. Above either, split into multiple sitemaps and reference them from a <sitemapindex>, a top-level file that just lists other sitemaps. Most large sites use this pattern: a sitemap index at the root, with per-section sub-sitemaps for posts, products, categories, etc. The index file follows the same protocol, with a <sitemapindex> root and <sitemap> entries in place of <urlset> and <url>.
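The split-and-index pattern can be sketched as follows. This example only enforces the URL-count limit, not the 50MB size limit, and the sub-sitemap file names (sitemap-1.xml and so on) are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document (bytes) listing each sub-sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

# 120,000 illustrative URLs split into three files.
all_urls = [f"https://example.com/p/{i}/" for i in range(120_000)]
chunks = chunk_urls(all_urls)
index_bytes = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks), "sub-sitemaps")
```

In practice, splitting by content type rather than by position in the list tends to be easier to debug; the count-based split here is only the fallback that keeps each file under the limit.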

The generator does not gzip the output. Google accepts sitemaps either gzipped or uncompressed, so if your site has thousands of URLs and the file runs past a megabyte, gzipping it (server-side or as sitemap.xml.gz) reduces transfer cost. The XML is what’s parsed, not the wrapper, so the size limit applies to the uncompressed file.
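Since the generator leaves compression to you, a sitemap.xml.gz can be produced with a few lines of Python. This is a sketch using the standard gzip module; the sample document is generated inline purely to show the size difference.

```python
import gzip

def gzip_sitemap(xml_bytes):
    """Return the gzip-compressed form of an uncompressed sitemap."""
    return gzip.compress(xml_bytes)

# Build a 1,000-entry sample document to compress.
entries = "".join(
    f"<url><loc>https://example.com/page-{i}/</loc></url>" for i in range(1000)
)
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + entries
    + "</urlset>"
).encode("utf-8")

compressed = gzip_sitemap(xml_bytes)
print(len(xml_bytes), "bytes uncompressed,", len(compressed), "bytes gzipped")
```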

Frequently asked questions

Where should the sitemap live?

By convention, at /sitemap.xml at the site root. Search Console will let you submit it from any path, but /sitemap.xml is the location crawlers and tools most commonly try and what robots.txt typically references. For sub-sitemaps, the convention is /sitemap-posts.xml, /sitemap-products.xml, etc., all referenced from a top-level /sitemap_index.xml or /sitemap.xml.

Do I need to declare the sitemap in robots.txt?

It is not mandatory, but it is worth doing: adding Sitemap: https://example.com/sitemap.xml to robots.txt is the protocol’s standard discovery mechanism. Search Console submission is also useful, but the robots.txt declaration reaches crawlers that have no Search Console connection, including Bing’s.
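To confirm what a crawler would find, you can pull the Sitemap: lines out of a robots.txt file. The sketch below is a simplified parser written for illustration (the function name and sample file are hypothetical); it matches the field name case-insensitively and does not handle trailing comments.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared on Sitemap: lines, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

ROBOTS_SAMPLE = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-posts.xml
"""

print(sitemap_urls_from_robots(ROBOTS_SAMPLE))
```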

How often should I regenerate the sitemap?

Every time the URL set or content changes. For a CMS, this is automatic: the sitemap regenerates on publish. For static sites or hand-built sitemaps, regenerate when you ship new pages, restructure URLs, or significantly update existing content. The lastmod field should reflect the actual date of the last meaningful change, not the regeneration timestamp.

What’s the difference between a sitemap and a sitemap index?

A sitemap lists URLs; a sitemap index lists sitemaps. Use a sitemap index when you have more than one sitemap (because of the 50k cap, or because you want per-section organization). Search Console accepts either.
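The two file types differ only in their root and entry elements, so telling them apart programmatically is a matter of reading the root tag. A minimal sketch, with hypothetical function names:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def sitemap_kind(xml_text):
    """Return 'sitemap', 'index', or 'unknown' based on the root element."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the {namespace} prefix
    if local_name == "urlset":
        return "sitemap"
    if local_name == "sitemapindex":
        return "index"
    return "unknown"

def list_locs(xml_text):
    """Return every <loc> value: page URLs for a sitemap, sitemap URLs for an index."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]

INDEX_SAMPLE = (
    f'<sitemapindex xmlns="{SITEMAP_NS}">'
    "<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/sitemap-products.xml</loc></sitemap>"
    "</sitemapindex>"
)
print(sitemap_kind(INDEX_SAMPLE), list_locs(INDEX_SAMPLE))
```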

Should I include URLs that are noindex or canonicalised away?

No — sitemaps should contain URLs you want indexed. Listing a noindex URL in the sitemap sends conflicting signals (noindex says don’t, sitemap inclusion says do). Listing a canonicalised duplicate is the same problem. Audit your sitemap against your robots and canonical configuration.
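One way to run that audit is to filter the URL list before it reaches the generator. The sketch below assumes you already have, for each page, its noindex status and canonical URL; the Page record and its field names are hypothetical and would come from your own crawl or CMS export.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    url: str
    noindex: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found

def indexable_urls(pages):
    """Keep pages that are not noindex and are their own canonical (or have none)."""
    return [
        page.url
        for page in pages
        if not page.noindex and page.canonical in (None, page.url)
    ]

pages = [
    Page("https://example.com/"),
    Page("https://example.com/draft/", noindex=True),
    Page("https://example.com/shoes/?sort=price", canonical="https://example.com/shoes/"),
    Page("https://example.com/shoes/", canonical="https://example.com/shoes/"),
]
print(indexable_urls(pages))
```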

Can I include URLs from other domains?

Only with verified ownership in Search Console. By default, sitemaps are scoped to a single host. Cross-domain sitemaps are a niche pattern used for migrations or multi-site organisations and require explicit Search Console configuration.

How do I submit it to Google?

Two ways. Either declare it in robots.txt (Sitemap: https://example.com/sitemap.xml) and let Google find it, or submit it directly in Search Console under Sitemaps. Direct submission gives you a per-sitemap report on indexed vs submitted URLs, which is the quickest way to spot indexing issues.

Does priority actually do anything?

Google has been clear it does not use priority at all. Bing uses it as a weak signal. The honest answer is: leave it at the default (0.5), or set 1.0 for the homepage and 0.8 for top-level sections — but do not spend time tuning it.

Common problems

Problem: Search Console reports “couldn’t fetch sitemap”.

Almost always a server response issue, not a sitemap content issue. Either the URL is wrong (404), the server is rejecting bot user-agents (403), the response is HTML instead of XML (Content-Type header), or the file is being served with an aggressive cache that has stale content. Hit the URL in a browser first to check the response.
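The checks above can be written down as a small triage function that takes the status code and Content-Type you observed (for example from your browser's network panel). This is an illustrative sketch with hypothetical names and return labels; it does not detect stale caches, and the list of acceptable media types is an assumption that includes the gzip types a sitemap.xml.gz is commonly served with.

```python
# Media types assumed acceptable for this sketch.
XML_TYPES = {"application/xml", "text/xml"}
GZIP_TYPES = {"application/gzip", "application/x-gzip"}

def diagnose_fetch(status, content_type):
    """Map a response status and Content-Type header to a likely cause."""
    if status == 404:
        return "wrong-url"
    if status == 403:
        return "bots-blocked"
    if status != 200:
        return "server-error"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type not in XML_TYPES | GZIP_TYPES:
        return "not-xml"
    return "ok"

print(diagnose_fetch(200, "text/html; charset=utf-8"))
```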

Problem: Sitemap submitted but URLs are not getting indexed.

Sitemap submission is a discovery hint, not an indexing guarantee. Google indexes URLs based on quality and uniqueness, not on sitemap inclusion. If a URL is in the sitemap but not indexed, the question is “why does Google think this URL doesn’t deserve to be indexed?” — usually thin content, near-duplicate content, or a quality signal problem.

Problem: Lastmod dates show in Search Console as “ignored”.

Google has detected that the lastmod values are not reliable — typically because they auto-increment on every regeneration, or because they don’t match real content changes. The fix is to set lastmod only when the content actually changed, not when the file was regenerated. Auto-incrementing every fetch teaches Google to ignore the field.
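One way to keep lastmod tied to content rather than to regeneration is to store a content hash next to the date and only move the date when the hash changes. The sketch below assumes a simple per-URL record with hash and lastmod keys (both hypothetical). Note that a hash treats any byte-level difference as a change, so in practice you would hash only the main content, not the full rendered page.

```python
import hashlib
from datetime import date

def update_lastmod(record, content, today):
    """Return the record unchanged if the content hash matches, else a new one dated today."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if record.get("hash") == digest:
        return record  # content unchanged: keep the old lastmod
    return {"hash": digest, "lastmod": today.isoformat()}

first = update_lastmod({}, "Original article text", date(2024, 1, 10))
same = update_lastmod(first, "Original article text", date(2024, 2, 1))
edited = update_lastmod(same, "Revised article text", date(2024, 3, 5))
print(first["lastmod"], same["lastmod"], edited["lastmod"])
```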

Problem: Validator complains about XML syntax errors.

The most common cause is unescaped special characters in URLs — &, <, > need to be &amp;, &lt;, &gt;. The generator handles this, but if you are generating sitemaps elsewhere, this is the first thing to check.
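If you do not have a validator to hand, a standard XML parser reports the same class of error. The sketch below uses Python's built-in parser to return the first well-formedness error, or None when the document parses; the function name is hypothetical, and this checks XML syntax only, not conformance to the sitemap schema.

```python
import xml.etree.ElementTree as ET

def xml_syntax_error(xml_text):
    """Return the parser's error message, or None if the document is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BROKEN = f'<urlset xmlns="{NS}"><url><loc>https://example.com/?a=1&b=2</loc></url></urlset>'
FIXED = f'<urlset xmlns="{NS}"><url><loc>https://example.com/?a=1&amp;b=2</loc></url></urlset>'

print("broken:", xml_syntax_error(BROKEN))
print("fixed:", xml_syntax_error(FIXED))
```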

Problem: A sitemapindex references sub-sitemaps but Search Console only reports the index, not the children.

Each sub-sitemap is its own report in Search Console — they do not roll up. To see indexing data per sub-sitemap, view each one separately under Sitemaps. The index is just a discovery shortcut for bots; it is not a reporting hierarchy.

Tips

  • Keep lastmod honest. A sitemap with truthful lastmod dates gets crawled efficiently; a sitemap that lies gets its lastmod ignored entirely.
  • Split large sitemaps by content type rather than by URL count alone. A sitemap-posts.xml plus sitemap-products.xml is easier to debug than two arbitrary halves of one long URL list.
  • Submit both the index and the children if you can — Search Console treats them separately, and reporting on the children gives you per-section indexing visibility.
  • When migrating, generate the new sitemap before the redirects go live, but hold off submitting it until the day they do: Google uses sitemap freshness to time its recrawl, and a sitemap full of new URLs that still return 404 will be ignored.
  • For high-churn sections (a forum, a news site), a daily-regenerated sitemap with accurate lastmod is meaningful. For static or low-change sites, weekly or even monthly is fine.

Related tools in this suite

The natural follow-up is the Sitemap Inspector — paste the generated sitemap’s URL and verify the structure parses, the URL count matches, and any lastmod dates are valid. The Robots.txt Generator is the next stop for adding the Sitemap: declaration that lets bots discover the sitemap automatically.

What this looks like at scale

A static or hand-rolled sitemap is fine for a stable URL set. For a content site that publishes regularly, the sitemap should regenerate from the database on every publish — that is what most CMSes do automatically (WordPress with Yoast or Rank Math, for example). For a custom-built site without that, the sitemap regeneration logic belongs in the deploy pipeline. The WordPress development service is the right entry point if a sitemap regeneration problem has surfaced as part of a wider technical issue.

Take it further

If your sitemap is consistently the source of indexing problems — Google ignoring lastmod, sub-sitemaps not being read, indexing lag on new content — that often signals a deeper architectural issue with how the site is generating its URL set. Start a conversation if it has reached that point.