Infrastructure (SRE)
Group · Active · Public

Members (4)

Watchers

  • This project does not have any watchers.

Details

Description

This is the project for the Infrastructure team based in the Site Reliability Engineering department.

This project is used to organise and manage all work that falls under the remit of the Infrastructure team. Any queries about the progress or allocation of resources should be directed to the Engineering Manager for the Infrastructure team.

Engineering Manager: None at the moment

Recent Activity

Thu, Mar 28

Universal_Omega closed T11987: SSL script broke, not committing public keys as Resolved.

I have switched the SSL bot to the WikiTideSSLBot account. Please let me know if there are any issues with certs now.

Thu, Mar 28, 18:03 · Infrastructure (SRE), SSL
MacFan4000 added a comment to T11987: SSL script broke, not committing public keys.

It would be great if this could be looked into as SSL requests are starting to pile up.

Thu, Mar 28, 15:47 · Infrastructure (SRE), SSL

Tue, Mar 26

OrangeStar closed T11851: check_reverse_dns should contact authoritative nameservers for the TLD directly when checking if we're the authoritative nameservers of a domain as Declined.

Using RDAP (preferably) or WHOIS is a better solution for these kinds of issues.

Tue, Mar 26, 17:49 · SRE Automation, Monitoring, SSL, Infrastructure (SRE)
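
As a rough illustration of the RDAP approach mentioned in the closing comment on T11851, the sketch below reads the nameserver list out of an already-fetched RDAP domain response and compares it with a set of expected nameservers. It assumes the standard RDAP JSON layout, where a domain object carries a `nameservers` array whose entries have an `ldhName` field. The function names and the `example.org` nameservers are hypothetical placeholders, not anything taken from the actual check_reverse_dns script.

```python
from __future__ import annotations


def rdap_nameservers(rdap_domain: dict) -> set[str]:
    """Return the normalised nameserver host names listed in an RDAP domain object."""
    names: set[str] = set()
    for ns in rdap_domain.get("nameservers", []):
        name = ns.get("ldhName")
        if name:
            # Normalise: drop a trailing dot and compare case-insensitively.
            names.add(name.rstrip(".").lower())
    return names


def delegated_to_us(rdap_domain: dict, our_nameservers: list[str]) -> bool:
    """True if the domain lists at least one nameserver and all of them are ours."""
    ours = {n.rstrip(".").lower() for n in our_nameservers}
    found = rdap_nameservers(rdap_domain)
    return bool(found) and found <= ours


# Hypothetical RDAP response fragment, for illustration only.
sample = {
    "objectClassName": "domain",
    "ldhName": "example.com",
    "nameservers": [
        {"objectClassName": "nameserver", "ldhName": "NS1.EXAMPLE.ORG."},
        {"objectClassName": "nameserver", "ldhName": "ns2.example.org"},
    ],
}
print(delegated_to_us(sample, ["ns1.example.org", "ns2.example.org"]))
```

Fetching the RDAP document itself is left out on purpose, since the right RDAP server depends on the TLD.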

Mon, Mar 25

Universal_Omega added a comment to T11987: SSL script broke, not committing public keys.

At this point I think our only option may be to switch to a new account if we can't regain access to MirahezeSSLBot. But we'll see if we can find a recovery code or something first. I believe that account is still controlled by John, and we should regain access somehow.

Mon, Mar 25, 03:28 · Infrastructure (SRE), SSL
MacFan4000 triaged T11987: SSL script broke, not committing public keys as High priority.
Mon, Mar 25, 00:34 · Infrastructure (SRE), SSL

Sun, Mar 24

MacFan4000 created T11987: SSL script broke, not committing public keys.
Sun, Mar 24, 22:41 · Infrastructure (SRE), SSL
Universal_Omega lowered the priority of T8845: Allow Icinga to generate Phorge tasks for Critical alerts from Normal to Low.
Sun, Mar 24, 06:26 · Phorge, Monitoring, Infrastructure (SRE)
Universal_Omega lowered the priority of T8847: Icinga docs entries for all Infrastructure monitoring from Normal to Low.
Sun, Mar 24, 06:26 · Documentation, Monitoring, Infrastructure (SRE)
Universal_Omega changed the status of T8847: Icinga docs entries for all Infrastructure monitoring from Open to In progress.
Sun, Mar 24, 06:24 · Documentation, Monitoring, Infrastructure (SRE)
Universal_Omega lowered the priority of T11680: Create Miraheze/python-functions github repo & python package from Normal to Low.
Sun, Mar 24, 06:20 · Infrastructure (SRE), SRE Automation

Sat, Mar 23

Universal_Omega added a comment to T11275: API Requests to Wikibase Repositories are blocked.

This has been open for a while. I thought someone was going to come up with some idea to do this automatically, but I guess not.

Let's keep it simple then. I propose we just keep an array of wikis that use Wikibase client, together with the wikis they have to contact, and send the appropriate CORS headers. Anyone who wants to do the same thing as @Redmin here will have to open a phab task for their wikis to be added to this array. This would be done in https://github.com/miraheze/puppet/blob/master/modules/varnish/templates/default.vcl#L231. Sound good to the SRE team?

Sat, Mar 23, 06:47 · Infrastructure (SRE)
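
The allowlist proposed in the T11275 comment above could look roughly like the following. This is only a sketch of the lookup logic in Python; the real change would live in the Varnish VCL template linked in the comment, and the mapping name, function name, and `example.org` hosts are all hypothetical.

```python
from __future__ import annotations

# Hypothetical allowlist: origin of a Wikibase client wiki -> repository hosts
# that wiki is allowed to call cross-origin.
WIKIBASE_CORS_ALLOWLIST: dict[str, set[str]] = {
    "https://client.example.org": {"repo.example.org"},
}


def cors_headers(origin: str | None, request_host: str) -> dict[str, str]:
    """Return the CORS response headers for a request, or {} if none apply."""
    if origin is None:
        # Not a cross-origin browser request; nothing to add.
        return {}
    allowed_hosts = WIKIBASE_CORS_ALLOWLIST.get(origin, set())
    if request_host.lower() not in allowed_hosts:
        return {}
    return {
        # Echo the specific origin rather than "*", and tell caches that the
        # response varies by Origin.
        "Access-Control-Allow-Origin": origin,
        "Vary": "Origin",
    }


print(cors_headers("https://client.example.org", "repo.example.org"))
```

Keying the mapping by client origin keeps each new request a one-line addition, which matches the "open a task to be added to the array" workflow described in the comment.
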
Reception123 closed T11909: Mention Special:ManageWiki/extensions on the Feature Request Maniphest form as Resolved.

Will make the change now, though I think what is really needed is a wider reorganization of https://meta.miraheze.org/wiki/Request_features and the current forms we have on Phorge. Especially since eventually there will be very few tasks like imports (even images) that will still be done on Phorge. But it's probably worth waiting until more stuff is automated (at least images I'd say).

Sat, Mar 23, 06:19 · Infrastructure (SRE), Documentation, Phorge
Universal_Omega added a project to T11909: Mention Special:ManageWiki/extensions on the Feature Request Maniphest form: Infrastructure (SRE).
Sat, Mar 23, 06:17 · Infrastructure (SRE), Documentation, Phorge
Universal_Omega lowered the priority of T11744: Create db162 (or db172?) and migrate core databases there from Normal to Low.
Sat, Mar 23, 06:06 · Infrastructure (SRE), Database
Universal_Omega lowered the priority of T11730: Rebalance database servers from Normal to Low.
Sat, Mar 23, 06:06 · Infrastructure (SRE), Database
Universal_Omega claimed T8847: Icinga docs entries for all Infrastructure monitoring.
Sat, Mar 23, 06:03 · Documentation, Monitoring, Infrastructure (SRE)
Universal_Omega claimed T8845: Allow Icinga to generate Phorge tasks for Critical alerts.
Sat, Mar 23, 06:02 · Phorge, Monitoring, Infrastructure (SRE)
Universal_Omega renamed T8845: Allow Icinga to generate Phorge tasks for Critical alerts from Allow Icinga to generate Phabricator tasks for Critical alerts to Allow Icinga to generate Phorge tasks for Critical alerts.
Sat, Mar 23, 06:02 · Phorge, Monitoring, Infrastructure (SRE)

Mar 14 2024

Universal_Omega added a project to T11925: OrangeStar's LDAP account & Graylog access: Infrastructure (SRE).
Mar 14 2024, 19:32 · Infrastructure (SRE), Security

Mar 9 2024

Reception123 lowered the priority of T11934: Request for a TSPortal test server from Normal to Low.
Mar 9 2024, 08:00 · Infrastructure (SRE)
Universal_Omega added a comment to T11934: Request for a TSPortal test server.

I will consider this request after talking to a few others in both SRE and T&S to find out what the future of TSPortal is anyway. As a T&S member, I think it helps primarily for DPA; the other aspects of it are sometimes pretty buggy and may not really be worth maintaining if we choose not to utilize it, in which case this request may be unnecessary...

Mar 9 2024, 06:52 · Infrastructure (SRE)
Collei triaged T11934: Request for a TSPortal test server as Normal priority.
Mar 9 2024, 06:49 · Infrastructure (SRE)
Collei triaged T11940: Emails for password reset/account creation/email confirmation not sending as High priority.
Mar 9 2024, 06:48 · MediaWiki (SRE), MediaWiki

Mar 8 2024

MacFan4000 updated subscribers of T11940: Emails for password reset/account creation/email confirmation not sending.
Mar 8 2024, 22:32 · MediaWiki (SRE), MediaWiki
MacFan4000 added a project to T11940: Emails for password reset/account creation/email confirmation not sending: Infrastructure (SRE).
Mar 8 2024, 22:31 · MediaWiki (SRE), MediaWiki

Mar 6 2024

OrangeStar created T11934: Request for a TSPortal test server.
Mar 6 2024, 12:54 · Infrastructure (SRE)

Feb 28 2024

Collei updated the task description for T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.
Feb 28 2024, 07:05 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei updated the task description for T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.
Feb 28 2024, 07:04 · Infrastructure (SRE), Varnish, MediaWiki, Production Error

Feb 26 2024

Xena added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

A user on Discord has reported it happening again; it's possible the issue wasn't fully resolved.

Feb 26 2024, 17:00 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

Sounds good

Feb 26 2024, 05:13 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Dicto added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

Hmm, that's weird, but now I don't get Error 500 either when importing pages on gameshows or when editing with the code editor on chernowiki. Looks like the problem is actually resolved.

Feb 26 2024, 04:13 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

Visual editor being broken is already tracked in T11903. As for the other issues, can you reproduce this on any wikis other than gameshowswiki?

Feb 26 2024, 01:15 · Infrastructure (SRE), Varnish, MediaWiki, Production Error

Feb 25 2024

Dicto added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

I still get Error 500 when I try to import pages on gameshows.miraheze.org. Small XML files go through fine, while large ones (around 750 KB) fail.

Feb 25 2024, 23:52 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Universal_Omega triaged T11902: Implement auto renewals for some wildcard domains in LetsEncrypt as Normal priority.
Feb 25 2024, 18:34 · SRE Automation, Infrastructure (SRE), SSL, Puppet, DNS
Agent_Isai closed T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions as Resolved.

Once again purged 13-16G of Varnish logs.

Feb 25 2024, 13:54 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei renamed T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions from 500 Internal Server Error - uploading images and editing pages to 500 Internal Server Error - uploading images, editing pages, and taking other actions.
Feb 25 2024, 01:54 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei merged T11900: XML ImportDump feature gives a "500 Internal Server" error into T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.
Feb 25 2024, 01:53 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei added a comment to T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions.

Several Discord users have reported this occurring recently and more frequently

Feb 25 2024, 01:51 · Infrastructure (SRE), Varnish, MediaWiki, Production Error

Feb 24 2024

RhinosF1 edited projects for T11891: 500 Internal Server Error - uploading images, editing pages, and taking other actions, added: Infrastructure (SRE); removed MediaWiki (SRE).
Feb 24 2024, 22:47 · Infrastructure (SRE), Varnish, MediaWiki, Production Error
Collei merged T11897: Configure CORS between wikis into T11275: API Requests to Wikibase Repositories are blocked.
Feb 24 2024, 21:28 · Infrastructure (SRE)

Feb 20 2024

MacFan4000 removed a member for Infrastructure (SRE): Paladox.
Feb 20 2024, 04:43
MacFan4000 removed a member for Infrastructure (SRE): Owen.
Feb 20 2024, 04:43

Feb 16 2024

OrangeStar placed T11857: Don't serve the MediaWiki-oriented CSP on Phorge up for grabs.
Feb 16 2024, 17:42 · Infrastructure (SRE)
OrangeStar claimed T11857: Don't serve the MediaWiki-oriented CSP on Phorge.
Feb 16 2024, 17:35 · Infrastructure (SRE)

Feb 15 2024

OrangeStar triaged T11857: Don't serve the MediaWiki-oriented CSP on Phorge as Low priority.
Feb 15 2024, 19:22 · Infrastructure (SRE)
Reception123 lowered the priority of T11033: Wiki deletion script run from Normal to Low.

Not a huge priority anymore

Feb 15 2024, 16:54 · MediaWiki, Infrastructure (SRE), Database
Reception123 added a comment to T11730: Rebalance database servers.

Oh, my bad. I assumed that this was done during the migration and this task just wasn't closed.

Feb 15 2024, 16:41 · Infrastructure (SRE), Database
Agent_Isai updated the task description for T11730: Rebalance database servers.
Feb 15 2024, 16:37 · Infrastructure (SRE), Database
Agent_Isai reopened T11730: Rebalance database servers, a subtask of T11729: Migrate databases to new cloud servers, as Open.
Feb 15 2024, 16:37 · Infrastructure (SRE), Database
Agent_Isai reopened T11730: Rebalance database servers as "Open".

We should still rebalance the successors.

Feb 15 2024, 16:37 · Infrastructure (SRE), Database