DataDump (static) download timeout at exactly 5 minutes
Open, Low, Public

Description

I'm trying to download a dump of the huge allthetropeswiki, and I have to resume the download every 5 minutes exactly. (Thankfully Firefox can resume downloads.) It seems like the nginx(?) timeout for static.miraheze.org/*/dumps is too short, or perhaps the timeout for static as a whole.
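For reference, resuming over HTTP Range requests (essentially what Firefox is doing) does work around the timeout. Below is a minimal sketch of that approach in Python, assuming the static server honours Range requests; the URL and filename are placeholders, not the real dump location.

```python
# Sketch: resume a download that keeps dropping at the ~5 minute timeout by
# re-requesting the remaining byte range. URL and filename are placeholders.
import os
import requests

URL = "https://static.miraheze.org/examplewiki/dumps/example_dump.zip"  # placeholder
OUT = "example_dump.zip"

def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Keep requesting the remaining byte range until the stream completes."""
    while True:
        start = os.path.getsize(path) if os.path.exists(path) else 0
        headers = {"Range": f"bytes={start}-"} if start else {}
        try:
            with requests.get(url, headers=headers, stream=True, timeout=60) as r:
                if r.status_code == 416:       # requested range is past end of file: done
                    return
                r.raise_for_status()
                # 206 means the server honoured the Range header, so append;
                # otherwise start the file over from scratch.
                mode = "ab" if start and r.status_code == 206 else "wb"
                with open(path, mode) as f:
                    for chunk in r.iter_content(chunk_size):
                        f.write(chunk)
            return                             # stream ended normally
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError):
            continue                           # connection dropped (e.g. at the timeout); retry

resume_download(URL, OUT)
```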

As a tangentially related note, this is for backup purposes only. We do not have any plans to migrate to another service at the present time.

Event Timeline

I'm just going to give my two cents here, but increasing the timeout for NGINX is probably not the best idea. Miraheze's resources are already stretched thin, and increasing the time that connections are left open would only add to the strain on the servers, for something few wikis will probably actually need to utilise. It would probably be more worthwhile getting a member of SRE to provide the dump rather than accepting all of the potential side effects of raising the timeout limit.

Unknown Object (User) added a comment. Jun 14 2023, 04:28

The dump is already there so a Steward really can't do much. How big is the dump for allthetropeswiki?

In T10961#220938, @Agent_Isai wrote:

The dump is already there so a Steward really can't do much.

They can retrieve it from the server and pass it on.

In T10961#220938, @Agent_Isai wrote:

The dump is already there so a Steward really can't do much.

They can retrieve it from the server and pass it on.

That would be SRE and not Stewards.

I'm just going to give my two cents here, but increasing the timeout for NGINX is probably not the best idea. Miraheze's resources are already stretched thin, and increasing the time that connections are left open would only add to the strain on the servers, for something few wikis will probably actually need to utilise. It would probably be more worthwhile getting a steward to provide the dump rather than accepting all of the potential side effects of raising the timeout limit.

The static server just serves files, right? How does leaving connections open longer increase strain on the server? Please explain.

Unfortunately, I deleted that dump because it was old, and all dumps I have tried to make since then have a status of "failed".

And then I looked inside the first dump I downloaded, and there was no wiki code or XML, just a sampling of image files. Because of the way it downloaded, with all the stopping and starting, something got corrupted (or was perhaps already hit by a cosmic ray on the Miraheze servers -- who knows now). The DataDump page should really include an MD5 of the output file. And, you know, actually provide working downloads.
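For illustration, a checksum published alongside each dump would let a downloader catch this kind of corruption before unpacking. A minimal sketch of the client-side check, with a placeholder filename and digest:

```python
# Sketch: hash the downloaded file and compare it to a digest that the
# DataDump page would publish. Filename and expected value are placeholders.
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "0123456789abcdef0123456789abcdef"  # placeholder digest
actual = md5_of("example_dump.zip")
print("OK" if actual == expected else f"corrupt: got {actual}")
```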

I'm just going to give my two cents here, but increasing the timeout for NGINX is probably not the best idea. Miraheze's resources are already stretched thin, and increasing the time that connections are left open would only add to the strain on the servers, for something few wikis will probably actually need to utilise. It would probably be more worthwhile getting a steward to provide the dump rather than accepting all of the potential side effects of raising the timeout limit.

The static server just serves files, right? How does leaving connections open longer increase strain on the server? Please explain.

Whilst the impact of increasing the timeout for static files is less than it would be for dynamic files, you're still opening and reserving that connection, which is bound to affect performance. How much it affects performance depends on the resources available, so it's hard to tell whether this would have a huge impact; I was simply putting forth my two cents on the possibility of it being a bad idea.

I am currently investigating why that new dump failed.

I am currently investigating why that new dump failed.

My guess is that the dump is too big for Swift. I'm currently generating the dump manually to see how big it is.

Sure enough, the dump is 8.6G in size, which is too big for Swift.
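If the blocker is Swift's per-object size limit (commonly 5 GiB by default, though Miraheze's configured limit may differ), one possible workaround is splitting the archive into segments below that limit before upload, along the lines of Swift's large-object support. A rough sketch, with the limit and filename as assumptions:

```python
# Sketch: split a large dump into numbered segment files that each fit under
# an assumed per-object limit. 5 GiB is Swift's common default; Miraheze's
# actual limit may differ, and the filename is a placeholder.
SEGMENT_SIZE = 5 * 1024**3             # assumed per-object limit
SRC = "allthetropeswiki_dump.zip"      # placeholder filename

def split_into_segments(src: str, segment_size: int = SEGMENT_SIZE,
                        buf_size: int = 64 * 1024**2) -> list[str]:
    """Write src out as segment files, each at most segment_size bytes."""
    parts = []
    index = 0
    with open(src, "rb") as f:
        chunk = f.read(min(buf_size, segment_size))
        while chunk:
            part = f"{src}.{index:08d}"
            with open(part, "wb") as out:
                written = 0
                while chunk:
                    out.write(chunk)
                    written += len(chunk)
                    if written >= segment_size:
                        break
                    chunk = f.read(min(buf_size, segment_size - written))
            parts.append(part)
            index += 1
            chunk = f.read(min(buf_size, segment_size))
    return parts

print(split_into_segments(SRC))
```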

labster lowered the priority of this task from Normal to Low. Jun 19 2023, 00:36

Thanks for doing that, @MacFan4000. With everything that happened this weekend, this is the first time I noticed.

I still think the original issue is valid, though it's not so important now. Unless someone else has an issue like mine, I'm going to triage this as low priority.

I think DataDump has never really worked for extremely large wikis like ATT.