
Bug: Sitemaps on custom domains are effectively nonfunctional
Closed, Resolved, Public

Description

When a crawler requests https://example.com/sitemap.xml, it receives a sitemap index. Every entry in that index points to a sitemap on the subdomain, such as https://example.miraheze.org/sitemaps/example/sitemaps/sitemap-NS_0-0.xml.gz, and those sitemaps in turn list page URLs on example.miraheze.org, *not* the custom domain. In practice, custom domains have no working sitemap at all, so no sitemap-assisted indexing of pages on the custom domain actually happens.
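One quick way to confirm the symptom is to check which host each `<loc>` in the served XML points at. Below is a minimal Python sketch, assuming the sitemap XML has already been downloaded (and gunzipped, for the `.xml.gz` files). The function names are hypothetical and not part of MirahezeMagic, and the embedded samples are trimmed copies of the curl output posted later in this task. As far as the sitemaps.org protocol goes, sitemaps listed in an index are generally expected to live on the same site as the index unless cross-submission is set up through robots.txt, so off-domain entries in the index itself may matter as well as off-domain page URLs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace shared by <sitemapindex> and <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> list:
    """Return every <loc> URL in a sitemap index or a urlset document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def off_domain_locs(xml_text: str, expected_host: str) -> list:
    """Return the <loc> URLs whose host is not expected_host (e.g. the custom domain)."""
    wanted = expected_host.lower()
    return [url for url in sitemap_locs(xml_text) if urlsplit(url).hostname != wanted]


# Trimmed samples copied from the curl output in this task.
INDEX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz</loc>
    <lastmod>2024-08-24T00:31:25Z</lastmod>
  </sitemap>
</sitemapindex>"""

URLSET_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://chinafake.wiki/wiki/!</loc>
    <lastmod>2024-08-30T04:19:40Z</lastmod>
  </url>
</urlset>"""

if __name__ == "__main__":
    # The index still lists a subdomain sitemap; the page URLs use the custom domain.
    print(off_domain_locs(INDEX_SAMPLE, "chinafake.wiki"))
    print(off_domain_locs(URLSET_SAMPLE, "chinafake.wiki"))
```

An empty list for a given file means every `<loc>` in it already uses the expected host.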

Event Timeline

Hi,

Thank you for creating a task on Phorge. We will endeavor to resolve it as soon as possible.

NOTE: Everyone at Miraheze is a volunteer, so some tasks may take a while to resolve. This is especially true for bug reports that require additional investigation.

If you notice that your task has not received a response or follow-up in a reasonable amount of time, please comment on it.

Thanks,
Miraheze Technology Team

This wiki also uses the default Miraheze favicon, but only on the subdomain.

Edit: this is now fixed

Weird...

> curl --no-progress-meter https://chinafake.wiki/sitemap.xml | head
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <sitemap>
                <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz</loc>
                <lastmod>2024-08-24T00:31:25Z</lastmod>
        </sitemap>
        <sitemap>
                <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_1-0.xml.gz</loc>
                <lastmod>2024-08-24T00:31:25Z</lastmod>
        </sitemap>
> curl --no-progress-meter -L https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz | gunzip | head
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
                <loc>https://chinafake.miraheze.org/wiki/!</loc>
                <lastmod>2024-08-21T02:37:58Z</lastmod>
                <priority>1.0</priority>
        </url>
        <url>
                <loc>https://chinafake.miraheze.org/wiki/%22HAPPY%22_Animated_Jumprope_Characters</loc>
                <lastmod>2024-07-19T05:56:49Z</lastmod>

Wait, no, I just remembered what the bug is.

It appears that sitemaps are automatically regenerated every Saturday. It's Saturday right now, but they don't seem to have been updated yet. Maybe try waiting a few hours (or perhaps even a day)?

It's now Sunday and the sitemap hasn't changed. Is it possible for someone to regenerate it manually?

Yep, someone with server access will need to run generateMirahezeSitemap.php (from extensions/MirahezeMagic).

I have regenerated the sitemap, but it has not updated on /sitemap.xml. Interesting.

It seems to appear there only some of the time; my best guess is Cloudflare's caching.
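Appending a random query string (the `?$RANDOM` trick used in the next curl command) is a cache-busting technique: a URL with a never-seen query string usually misses the CDN's edge cache, since Cloudflare's default cache key includes the query string, so the response comes from the origin. Below is a minimal Python sketch of the same check, assuming the site is behind Cloudflare and that Cloudflare adds its `CF-Cache-Status` response header. The `nocache` parameter name and both helper functions are hypothetical.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def cache_busting_url(url: str, token: Optional[str] = None) -> str:
    """Append a throwaway query parameter (hypothetical name "nocache") to url."""
    parts = urlsplit(url)
    # Keep any existing query parameters and add the random one at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", token or secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def cache_status(url: str, bust: bool = True) -> Optional[str]:
    """Fetch url (cache-busted unless bust=False) and return CF-Cache-Status, if present."""
    target = cache_busting_url(url) if bust else url
    with urlopen(Request(target), timeout=10) as resp:
        return resp.headers.get("CF-Cache-Status")
```

For example, if `cache_status("https://chinafake.wiki/sitemap.xml", bust=False)` returns `HIT` while a cache-busted fetch shows the new index, that would point at the edge cache rather than the sitemap generator.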

Nope, it's gone again…
Could it be prioritizing the subdomain over the chinafake.wiki domain when generating the sitemaps?

> curl https://chinafake.wiki/sitemap.xml?$RANDOM | head
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <sitemap>
                <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz</loc>
                <lastmod>2024-08-24T00:31:25Z</lastmod>
        </sitemap>
        <sitemap>
                <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_1-0.xml.gz</loc>
                <lastmod>2024-08-24T00:31:25Z</lastmod>
        </sitemap>
> curl https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz -L | gunzip | head
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
                <loc>https://chinafake.wiki/wiki/!</loc>
                <lastmod>2024-08-30T04:19:40Z</lastmod>
                <priority>1.0</priority>
        </url>
        <url>
                <loc>https://chinafake.wiki/wiki/%22HAPPY%22_Animated_Jumprope_Characters</loc>
                <lastmod>2024-07-19T05:56:49Z</lastmod>

I swear...

The sitemap is being generated properly; the issue is with replacing the old sitemap with the newly generated version. I confirmed that the script generates the correct version of the sitemap, but the copy being served then doesn't get updated.

Would it be possible to use the sitemap WikiSEO generates instead of the one Miraheze (MH) generates?

It seems the sitemap has updated properly: https://chinafake.wiki/sitemap.xml now says <lastmod>2024-09-03T21:01:16Z</lastmod>.
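The index `<lastmod>` values are what showed the fix landing here (2024-08-24T00:31:25Z before, 2024-09-03T21:01:16Z after), so comparing them with the time a regeneration was run gives a simple staleness check. Below is a minimal Python sketch, assuming the index `<lastmod>` reflects when each sitemap file was written (consistent with the values in this task, but still an assumption) and that timestamps carry a time and the `Z` UTC suffix shown above. The helper names and the cut-off time in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_w3c_datetime(value: str) -> datetime:
    """Parse a timestamp such as 2024-09-03T21:01:16Z into a timezone-aware datetime."""
    value = value.strip()
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # fromisoformat() before Python 3.11 rejects "Z"
    return datetime.fromisoformat(value)


def newest_lastmod(xml_text: str) -> Optional[datetime]:
    """Return the most recent <lastmod> in a sitemap index, or None if there is none."""
    root = ET.fromstring(xml_text)
    stamps = [
        parse_w3c_datetime(el.text)
        for el in root.iterfind(".//sm:lastmod", SITEMAP_NS)
        if el.text
    ]
    return max(stamps, default=None)


def is_stale(xml_text: str, regenerated_at: datetime) -> bool:
    """True if nothing in the served index is newer than the last regeneration."""
    newest = newest_lastmod(xml_text)
    return newest is None or newest < regenerated_at


# Trimmed copy of the index served before the fix.
OLD_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://chinafake.miraheze.org/sitemaps/chinafakewiki/sitemaps/sitemap-chinafakewiki-NS_0-0.xml.gz</loc>
    <lastmod>2024-08-24T00:31:25Z</lastmod>
  </sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    # Hypothetical time at which a manual regeneration was run.
    regenerated = datetime(2024, 9, 1, tzinfo=timezone.utc)
    print(is_stale(OLD_INDEX, regenerated))
```

A stale result after a confirmed regeneration would suggest the served copy, not the generator, is the problem.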

Reception123 assigned this task to OrangeStar.
Reception123 subscribed.

Per the above. If you continue to experience issues, please feel free to reopen.

Got it, thanks! It also looks to be showing up fine on the global sitemap at https://static.miraheze.org/sitemap.xml, so I'd say this is fixed!