XML Sitemaps Feature Plugin Open for Testing and Feedback

Thierry Muller, a Developer Relations Program Manager at Google, and several contributors posted an update on the XML sitemaps feature that may land in WordPress this year. After seven months of development, the team has made the XML Sitemaps feature plugin available on GitHub. It is currently open for testing and feedback. The plugin should also be available in the WordPress plugin directory by next week.

Update (January 31, 2020): The Core Sitemaps feature plugin is now available in the WordPress plugin repository.

The project aims to ship a basic version of an XML sitemaps feature to all WordPress installations. It will also offer an API that plugin developers can extend, so sitemap plugins would not automatically disappear. Instead, plugins would offer users various options for how their sitemaps work.

A team created by Google, Yoast, and other contributors originally proposed XML sitemaps as a core WordPress feature in June 2019. Traditionally, WordPress has left this feature to plugins to implement, and many have filled this role over the years. However, several other major content management systems ship with sitemaps as part of their core codebase.

Many praised the initiative, such as WordPress project lead Matt Mullenweg. “This makes a lot of sense, looking forward to seeing the v1 of this in core and for it to evolve in future releases and cement WordPress’ well-deserved reputation of being the best CMS for SEO,” he said.

However, several people questioned whether WordPress should ship with XML sitemaps. Some were worried about performance and others felt like the feature should remain in plugins.

“At a high level, expanding the number of WordPress sites with Sitemaps ultimately speeds up content discoverability by search engines and re-crawl fresher content flagged by the lastmod date faster than a scheduled bot would,” Muller said of the primary reasons the feature belongs in core.

WordPress users may see this feature arrive in a major update this year. “Ambitiously [version] 5.4,” said Muller of the release goal. “Realistically 5.5.”

The feature plugin currently indexes the following URLs for a site:

  • Homepage
  • Blog posts page (if not the homepage)
  • Posts and pages
  • Categories and tags
  • Custom post types
  • Custom taxonomies
  • Users/Authors

Custom post types and taxonomies are registered only if they are public. There is also a filter hook available to change which post types, taxonomies, and users are indexed. Ideally, WordPress would provide a registration flag for post types and taxonomies.

Solving the Performance Issues

One of the primary concerns with the initial proposal was how well a core sitemaps feature would perform and scale, particularly on larger sites. The lack of a full caching solution in core presented some hurdles for the team.

“Solving the performance issue is not trivial, and we have looked into various solutions,” said Muller. “We believe that we landed on a solution that doesn’t need full caching and will still be scalable.”

For performance, there are two primary challenges:

  • The number of URLs per page.
  • The lastmod date in the index.xml file.

“Addressing the number of URLs per page is fairly trivial,” said Muller. “While sitemaps can have up to 50,000 URLs per sitemap, we found that capping it at 2,000 is acceptable from a performance perspective and totally acceptable from a search engine perspective.” The team decided to stick with a default of 2,000 URLs per sitemap and to provide a filter hook for plugins to alter if necessary.
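To illustrate the paging idea, here is a minimal Python sketch (not the plugin's actual PHP code). It splits a list of URLs into sitemap pages of at most 2,000 entries and rejects any page size above the 50,000 limit mentioned in the quote. The function name `paginate_urls` and the example domain are hypothetical.

```python
# Sketch only: the names here are illustrative, not part of the feature plugin.
DEFAULT_URLS_PER_SITEMAP = 2000   # default cited in the article (filterable by plugins)
MAX_URLS_PER_SITEMAP = 50000      # upper limit per sitemap mentioned in the article


def paginate_urls(urls, per_page=DEFAULT_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive sitemap pages of at most per_page entries."""
    if not 1 <= per_page <= MAX_URLS_PER_SITEMAP:
        raise ValueError("per_page must be between 1 and 50,000")
    return [urls[i:i + per_page] for i in range(0, len(urls), per_page)]


if __name__ == "__main__":
    # Hypothetical site with 4,500 posts.
    urls = [f"https://example.com/post-{n}/" for n in range(4500)]
    pages = paginate_urls(urls)
    print([len(page) for page in pages])
```

With 4,500 URLs and the default cap, this produces three pages holding 2,000, 2,000, and 500 URLs.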

Finding a solution for the lastmod date was not as easy. “We believe we found a good balance, which will be scalable and doesn’t open the can of worms that full caching exposes us to,” said Muller.

The solution the team implemented involved scheduling a cron task that runs twice daily (the frequency can be filtered by plugins). The cron job fetches the lastmod dates of each sitemap and stores them in the options table, which essentially works as a light caching solution.
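The general pattern can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's implementation: a plain dictionary stands in for the WordPress options table, the option key `sitemap_lastmod` and both function names are hypothetical, and the scheduled task is represented by a function that would be called twice a day.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def refresh_lastmod_cache(sitemaps, options):
    """Scheduled task: store the newest modification date of each sitemap.

    `sitemaps` maps a sitemap URL to the modification datetimes of its entries.
    `options` is a dict standing in for the options table (an assumption).
    """
    options["sitemap_lastmod"] = {
        url: max(dates).isoformat() for url, dates in sitemaps.items() if dates
    }


def render_index(options):
    """Front-end request: build the sitemap index from cached values only."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url, lastmod in sorted(options.get("sitemap_lastmod", {}).items()):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    options = {}
    sitemaps = {
        "https://example.com/sitemap-posts-1.xml": [
            datetime(2020, 1, 28, 9, 0, tzinfo=timezone.utc),
            datetime(2020, 1, 30, 12, 0, tzinfo=timezone.utc),
        ],
    }
    refresh_lastmod_cache(sitemaps, options)  # would run on the cron schedule
    print(render_index(options))
```

The point of the design is that the expensive step (finding the latest modification date per sitemap) happens only in the scheduled task, while rendering the index reads a small cached list.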

“Relying on cron should be stable enough for small to medium websites,” said Muller. “Enterprise websites usually have server cron set up to more regularly ping WP Cron instead of relying on website visitors to trigger it. In fact, most managed hosting providers have that for all plans.”

In case the initial implementation proves insufficient, the team has been researching an alternative that uses custom post types to store and update sitemap data. Two open GitHub tickets explore performance further, and developers may want to check them out: Issue #1 and Issue #39.

What Happens to Sites With Existing Sitemaps?

One question that remains unanswered is what happens when a user updates to WordPress 5.4/5.5 and already has a sitemap. There are likely millions of WordPress sites that are running a plugin or have some sort of sitemap solution in place.

“This is a question which we haven’t quite solved,” said Muller. “It is important to work with plugin authors, and in an ideal world, all plugins providing advanced sitemaps solutions would extend the core API. We would love to get feedback from the community on that one.”

WordPress must take care to avoid any major conflicts or indexing errors, or at least alleviate issues for the users who may be unaware of this upcoming feature.


26 responses to “XML Sitemaps Feature Plugin Open for Testing and Feedback”

  1. Having an XML sitemap in core is the right thing to do, in my opinion. This should have been a priority years ago, as well as a full caching solution and backup/migration tools.

    If we are serious about making WP the internet’s OS, these 3 things are necessary…

  2. Indexing users/authors by default contradicts all kind of privacy and security features of various plugins and tutorials, needs to be opt-in.

    Also every post type and taxonomy needs to be opt-in for sitemaps, no matter of existing “public” settings, especially categories and tags.

    Adding this to core sounds like another not so well thought through idea, very similar to the Gutenberg push into core at what was, at best, an early beta stage.

    • The purpose of sitemaps is to help speed up content discovery, not to indicate what should or shouldn’t be indexed. Whether or not a website has a sitemap, every public URL may be indexed by search engines (unless excluded via robots.txt or a noindex tag).

  3. I’m excited to see this make its way into core but also am wary of some of the performance pitfalls that are somewhat inescapable without system level tools e.g. cron and CLI.

    At 10up, we’ve created a lightweight sitemap plugin that only works on cron over CLI. The solution is very simple and doesn’t support any sort of “on the fly” updates but is extremely performant and meets most SEO teams needs.

    https://github.com/10up/10up-sitemaps

    • Thanks for sharing Taylor.

      I took a quick look at the code and it seems that 10up sitemaps cache all urls in the options table correct? I am curious to hear how it performs with X million posts.

      In the current version of the MVP featured plugin, we only rely on cron to get the lastmod date for index sitemap urls which links to all object sitemaps (should remain a small list even for very large sites).

      I would love for you guys to contribute to the project and continue these discussions in the upcoming dedicated WordPress Slack channel, which should kick off in about two weeks.

      • @Thierry Muller When unsure about technical details about performance and caching, what about looking at real world solutions in other plugins like yoast or similar which have millions of active installs and seem to work fine.

        And trying to provide new core features ready for sites with “X million posts” seems to be a nice thing, but the rest of core, especially Gutenberg, won’t be able to handle this anyway. Just try Gutenberg with a few hundred categories or tags, which an X million posts site will probably have…

      • @Thierry – Yes, we cache the generated sitemap data in options. We store about 200 sitemap items in each option. On a front end request, the only database queries are retrieving the relevant options.

        We’d love to offer any help we can. Excited for the upcoming channel :) Thanks for moving this forward in core!

  4. This is a great core feature that will save us from installing a lot of plugins, and I believe it will keep getting better once it has been released.

    The issue in your last three paragraphs is my only fear. We would not want the sitemap feature to conflict with our existing sitemap plugin, so having an option to turn it on or off would be best.

    Thanks a lot for this.

  5. This is one of the dumbest ideas ever. There are 10s of very good solutions for this, and if someone can’t do the research on this and install one of these plugins, then they should not be using WordPress.

    The absolute tragedy is that there are 5-10 features/api that really need attention in core that could be worked on yet these people are spending many man months making wordpress worse.

    WordPress needs to be slimmed down, not bloated up with things that belong in plugins. A massive fail and shame.

    • I would politely disagree,

      I’d rather have more features built in – with an option to turn them off

      Plugins are useful but NOT always created up to good coding standards, and often neglected or abandoned, and quite often premium versions give you most useful functionality.

      sitemaps /caching / lazy image loading / upload image optimisation / basic contact form – should be IMHO built in

      • sitemaps /caching / lazy image loading / upload image optimisation / basic contact form – should be IMHO built in

        Shhhhh, all those features built in would hurt marketshare of very important plugins like Jetpack https://jetpack.com/support/features/ and their upselling plans and also the wordpress.com account leads from Jetpack would drop like a rock. Also it would be more difficult for Jetpack and wordpress.com to charge extra for SEO etc.

  6. What about sites with a manually-created sitemap.xml? Not a plugin, but just a simple sitemap.xml file. Will it be overwritten? Appended?

    “This is a question which we haven’t quite solved,”

    I hope that when the feature plugin goes live in the WordPress plugin directory, that its description would mention what it does in this case (the case of manually-created existing sitemap.xml files without a plugin).
