Tue, Mar 26

OrangeStar closed T11851: check_reverse_dns should contact authoritative nameservers for the TLD directly when checking if we're the authoritative nameservers of a domain as Declined.

Using RDAP (preferably) or WHOIS is a better solution for these kinds of issues.

Tue, Mar 26, 17:49 · SRE Automation, Monitoring, SSL, Infrastructure (SRE)
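
As a rough illustration of the RDAP route suggested in the closing comment on T11851, the sketch below reads the nameserver list from an already-fetched RDAP domain response and compares it with the set of nameservers we expect. The `nameservers` array and its `ldhName` members come from the RDAP JSON response format; the function names and the `example.net` host names are hypothetical, and this is not the actual check_reverse_dns code. Fetching the response over HTTP is deliberately left out.

```python
def rdap_nameservers(rdap_domain):
    """Return the normalised nameserver host names listed in an RDAP domain object."""
    names = set()
    for ns in rdap_domain.get("nameservers", []):
        ldh = ns.get("ldhName")
        if ldh:
            # Normalise case and drop any trailing root dot before comparing.
            names.add(ldh.rstrip(".").lower())
    return names


def delegated_to_us(rdap_domain, expected):
    """True if the domain lists at least one nameserver and all of them are ours."""
    ours = {name.rstrip(".").lower() for name in expected}
    found = rdap_nameservers(rdap_domain)
    return bool(found) and found <= ours


# Hypothetical response fragment and expected nameservers, for illustration only.
response = {
    "objectClassName": "domain",
    "nameservers": [
        {"objectClassName": "nameserver", "ldhName": "NS1.EXAMPLE.NET."},
        {"objectClassName": "nameserver", "ldhName": "ns2.example.net"},
    ],
}
print(delegated_to_us(response, ["ns1.example.net", "ns2.example.net"]))
```

Treating an empty nameserver list as "not ours" is a design choice in this sketch: a response without delegation data should not be read as a passing check.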

Sun, Mar 24

Universal_Omega lowered the priority of T8845: Allow Icinga to generate Phorge tasks for Critical alerts from Normal to Low.
Sun, Mar 24, 06:26 · Phorge, Monitoring, Infrastructure (SRE)
Universal_Omega lowered the priority of T8847: Icinga docs entries for all Infrastructure monitoring from Normal to Low.
Sun, Mar 24, 06:26 · Documentation, Monitoring, Infrastructure (SRE)
Universal_Omega changed the status of T8847: Icinga docs entries for all Infrastructure monitoring from Open to In progress.
Sun, Mar 24, 06:24 · Documentation, Monitoring, Infrastructure (SRE)

Sat, Mar 23

Universal_Omega claimed T8847: Icinga docs entries for all Infrastructure monitoring.
Sat, Mar 23, 06:03 · Documentation, Monitoring, Infrastructure (SRE)
Universal_Omega claimed T8845: Allow Icinga to generate Phorge tasks for Critical alerts.
Sat, Mar 23, 06:02 · Phorge, Monitoring, Infrastructure (SRE)
Universal_Omega renamed T8845: Allow Icinga to generate Phorge tasks for Critical alerts from Allow Icinga to generate Phabricator tasks for Critical alerts to Allow Icinga to generate Phorge tasks for Critical alerts.
Sat, Mar 23, 06:02 · Phorge, Monitoring, Infrastructure (SRE)

Mar 20 2024

Universal_Omega added a hashtag to Monitoring: #graphite.
Mar 20 2024, 06:04

Mar 13 2024

Universal_Omega changed hashtags for Monitoring, added #graylog, #analytics, #logging; removed #piwik, #ganglia.
Mar 13 2024, 23:37

Feb 13 2024

OrangeStar renamed T11851: check_reverse_dns should contact authoritative nameservers for the TLD directly when checking if we're the authoritative nameservers of a domain from check_reverse_dns should contact authoritative nameservers for the TLD directly on DNS checks to check_reverse_dns should contact authoritative nameservers for the TLD directly when checking if we're the authoritative nameservers of a domain.
Feb 13 2024, 20:37 · SRE Automation, Monitoring, SSL, Infrastructure (SRE)
RhinosF1 added projects to T11851: check_reverse_dns should contact authoritative nameservers for the TLD directly when checking if we're the authoritative nameservers of a domain: Monitoring, SRE Automation.
Feb 13 2024, 20:32 · SRE Automation, Monitoring, SSL, Infrastructure (SRE)
Universal_Omega lowered the priority of T11846: Alert on CirrusSearchElasticaWrite Job count from High to Normal.
Feb 13 2024, 06:16 · OpenSearch, Infrastructure (SRE), Monitoring

Feb 12 2024

RhinosF1 triaged T11846: Alert on CirrusSearchElasticaWrite Job count as High priority.
Feb 12 2024, 20:28 · OpenSearch, Infrastructure (SRE), Monitoring

Feb 3 2024

Universal_Omega added a comment to T10642: Self-host the CVT feed bot.

Just as a quick update even though this was already resolved, did https://github.com/miraheze/puppet/pull/3731 to fully puppetize this, including build, so it should be installed 100% automatically on any new servers now.

Feb 3 2024, 04:26 · Monitoring, Infrastructure (SRE)
Universal_Omega closed T10642: Self-host the CVT feed bot as Resolved.
Feb 3 2024, 02:03 · Monitoring, Infrastructure (SRE)
Universal_Omega added a comment to T10642: Self-host the CVT feed bot.

https://github.com/Universal-Omega/CVTBot/commit/a2b07eb14ef9ff34c4428b42d80c2b3a2c9db91e removed the mono dependency to make this work. Then https://github.com/miraheze/puppet/pull/3727 for making it work on Miraheze. That patch is currently running on mon181 which seems to work!

Feb 3 2024, 00:41 · Monitoring, Infrastructure (SRE)

Feb 1 2024

Universal_Omega closed T11754: Some servers missing from Grafana as Resolved.

Per above. Please do reopen if you notice others missing though.

Feb 1 2024, 12:09 · Infrastructure (SRE), Monitoring

Jan 31 2024

Universal_Omega changed the status of T10642: Self-host the CVT feed bot from Open to In progress.
Jan 31 2024, 01:03 · Monitoring, Infrastructure (SRE)
Universal_Omega moved T10642: Self-host the CVT feed bot from Incoming to Short Term on the Infrastructure (SRE) board.
Jan 31 2024, 01:03 · Monitoring, Infrastructure (SRE)
Universal_Omega edited projects for T10642: Self-host the CVT feed bot, added: Infrastructure (SRE), Monitoring; removed MediaWiki, MediaWiki (SRE).
Jan 31 2024, 01:03 · Monitoring, Infrastructure (SRE)

Jan 30 2024

Universal_Omega added a comment to T11754: Some servers missing from Grafana.

It looks like cp51 shows up in Grafana now.

Jan 30 2024, 17:03 · Infrastructure (SRE), Monitoring
RhinosF1 triaged T11754: Some servers missing from Grafana as High priority.
Jan 30 2024, 09:27 · Infrastructure (SRE), Monitoring

May 19 2023

MacFan4000 removed a member for Monitoring: MacFan4000.
May 19 2023, 19:56
MacFan4000 removed a watcher for Monitoring: John.
May 19 2023, 19:56
MacFan4000 removed a member for Monitoring: Southparkfan.
May 19 2023, 19:56
MacFan4000 removed a member for Monitoring: John.
May 19 2023, 19:56
MacFan4000 added a member for Monitoring: MacFan4000.
May 19 2023, 19:56

May 8 2023

Agent_Isai closed T10552: Add a cron to regularly optimise Matomo archive tables as Resolved.
May 8 2023, 15:31 · Monitoring, Infrastructure (SRE)

Apr 4 2023

Void claimed T10552: Add a cron to regularly optimise Matomo archive tables.

Drafted https://github.com/miraheze/puppet/pull/3178, which should be ready for review; I'll merge it in a few days if there are no objections.

Apr 4 2023, 03:26 · Monitoring, Infrastructure (SRE)

Mar 17 2023

MacFan4000 placed T8845: Allow Icinga to generate Phorge tasks for Critical alerts up for grabs.
Mar 17 2023, 21:59 · Phorge, Monitoring, Infrastructure (SRE)

Feb 28 2023

Unknown Object (User) added a comment to T10552: Add a cron to regularly optimise Matomo archive tables.

So should the cron include all of these?

Feb 28 2023, 00:39 · Monitoring, Infrastructure (SRE)

Feb 27 2023

Reception123 added a comment to T10552: Add a cron to regularly optimise Matomo archive tables.

So should the cron include all of these?

Feb 27 2023, 20:30 · Monitoring, Infrastructure (SRE)
Unknown Object (User) added a comment to T10552: Add a cron to regularly optimise Matomo archive tables.

Just to note, I have now also run:

Feb 27 2023, 06:45 · Monitoring, Infrastructure (SRE)
Unknown Object (User) moved T10552: Add a cron to regularly optimise Matomo archive tables from Incoming to Short Term on the Infrastructure (SRE) board.
Feb 27 2023, 04:32 · Monitoring, Infrastructure (SRE)
Unknown Object (User) moved T10552: Add a cron to regularly optimise Matomo archive tables from Backlog to Matomo on the Monitoring board.
Feb 27 2023, 04:32 · Monitoring, Infrastructure (SRE)
Unknown Object (User) triaged T10552: Add a cron to regularly optimise Matomo archive tables as Normal priority.
Feb 27 2023, 04:29 · Monitoring, Infrastructure (SRE)

Feb 24 2023

John closed T10536: db112 is running out of disk space as Resolved.

https://github.com/miraheze/puppet/commit/bedbbf259236895187b13d9dde21e980787117bd temporary solution until we have more disk space to expand.

Feb 24 2023, 22:16 · Infrastructure (SRE), Monitoring
Reception123 added a comment to T10536: db112 is running out of disk space.

As mentioned above, I'd say the ideal is to get rid of minor data that we realistically won't look back at, but keep core data like visits and maybe country.

Feb 24 2023, 15:52 · Infrastructure (SRE), Monitoring
BrandonWM merged T10357: Backups remain stored locally on db112; causing disk space full into T10536: db112 is running out of disk space.
Feb 24 2023, 15:26 · Infrastructure (SRE), Monitoring
John added a comment to T10536: db112 is running out of disk space.

Will take a look over this later tonight

Feb 24 2023, 15:24 · Infrastructure (SRE), Monitoring

Feb 23 2023

Paladox assigned T10536: db112 is running out of disk space to John.

Assigning to John to decide what should happen.

Feb 23 2023, 22:45 · Infrastructure (SRE), Monitoring

Feb 22 2023

Reception123 added projects to T10536: db112 is running out of disk space: Monitoring, Infrastructure (SRE).
Feb 22 2023, 18:44 · Infrastructure (SRE), Monitoring

Jan 29 2023

Reception123 closed T10393: Puppet is failing on matomo131 as Resolved.

fixed by Paladox

Jan 29 2023, 16:27 · Monitoring, Infrastructure (SRE)

Jan 22 2023

John updated the task description for T8847: Icinga docs entries for all Infrastructure monitoring.
Jan 22 2023, 22:28 · Documentation, Monitoring, Infrastructure (SRE)

Dec 27 2022

Unknown Object (User) closed T9478: Add monitoring for high MariaDB connections as Resolved.
Dec 27 2022, 07:14 · Universal Omega, Monitoring, Infrastructure (SRE), Database
Unknown Object (User) moved T9478: Add monitoring for high MariaDB connections from Unsorted to Goals on the Universal Omega board.
Dec 27 2022, 07:13 · Universal Omega, Monitoring, Infrastructure (SRE), Database
Unknown Object (User) claimed T9478: Add monitoring for high MariaDB connections.

https://github.com/miraheze/puppet/pull/3089 has been tested and it works.

Dec 27 2022, 04:20 · Universal Omega, Monitoring, Infrastructure (SRE), Database

Dec 26 2022

Reception123 added a comment to T9478: Add monitoring for high MariaDB connections.

In that case I guess we could do 80% as warning and 90% as critical

Dec 26 2022, 19:55 · Universal Omega, Monitoring, Infrastructure (SRE), Database
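
A minimal sketch of the thresholds proposed in the comment above (80% of the connection limit as warning, 90% as critical). The function name and parameters are hypothetical and this is not the check added in the linked pull request; the return values follow the usual Icinga/Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). How the current and maximum connection counts are read from MariaDB is left out.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def connection_state(current, maximum, warn_pct=80, crit_pct=90):
    """Map connection usage to a plugin exit code using percentage thresholds."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    # Compare with integer arithmetic to avoid floating-point edge cases
    # exactly at the threshold.
    if current * 100 >= crit_pct * maximum:
        return CRITICAL
    if current * 100 >= warn_pct * maximum:
        return WARNING
    return OK


# Example: 121 of 151 allowed connections is just over 80%.
print(connection_state(121, 151))
```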
John added a comment to T9478: Add monitoring for high MariaDB connections.

This task doesn't indicate what we'd want to consider as 'high connection' (for warning and critical).

Dec 26 2022, 19:19 · Universal Omega, Monitoring, Infrastructure (SRE), Database
Reception123 added a comment to T9478: Add monitoring for high MariaDB connections.

This task doesn't indicate what we'd want to consider as 'high connection' (for warning and critical).

Dec 26 2022, 18:29 · Universal Omega, Monitoring, Infrastructure (SRE), Database

Nov 30 2022

Paladox closed T9864: HTTPS check broken on swiftproxy111 and 131 as Resolved.

This is fixed.

Nov 30 2022, 22:39 · Monitoring, Infrastructure (SRE)

Oct 27 2022

John assigned T9864: HTTPS check broken on swiftproxy111 and 131 to Paladox.
Oct 27 2022, 19:03 · Monitoring, Infrastructure (SRE)

Oct 22 2022

John closed T9840: Add additional disk checks to monitoring as Declined.

Disks aren't exposed to the OS which makes monitoring them difficult within Icinga. I've tried a PVE monitoring check and it doesn't seem to work and I can't find any other replacements.

Oct 22 2022, 18:43 · Monitoring, Infrastructure (SRE)
John moved T9840: Add additional disk checks to monitoring from Incoming to Short Term on the Infrastructure (SRE) board.
Oct 22 2022, 17:33 · Monitoring, Infrastructure (SRE)

Oct 18 2022

Void added a project to T9840: Add additional disk checks to monitoring: Monitoring.
Oct 18 2022, 01:03 · Monitoring, Infrastructure (SRE)

Oct 5 2022

Unknown Object (User) closed T9777: HTTPS check broken on swiftproxy111 as Resolved.

This seems to be resolved now.

Oct 5 2022, 21:41 · Monitoring, Infrastructure (SRE)

Sep 28 2022

John triaged T9777: HTTPS check broken on swiftproxy111 as Normal priority.
Sep 28 2022, 20:10 · Monitoring, Infrastructure (SRE)

Sep 12 2022

Unknown Object (User) closed T8834: Monitor LoginNotify & failed logins as Declined.

I've had a conversation with Owen, and this is not something needed for T&S. Such metrics provide no benefit, so I'm closing the task as declined.

Sep 12 2022, 05:43 · Universal Omega, Monitoring, MediaWiki (SRE), Trust & Safety

Jul 28 2022

John moved T9478: Add monitoring for high MariaDB connections from Incoming to Short Term on the Infrastructure (SRE) board.
Jul 28 2022, 17:48 · Universal Omega, Monitoring, Infrastructure (SRE), Database

Jul 2 2022

Unknown Object (User) lowered the priority of T9478: Add monitoring for high MariaDB connections from High to Normal.
Jul 2 2022, 01:28 · Universal Omega, Monitoring, Infrastructure (SRE), Database

Jul 1 2022

RhinosF1 triaged T9478: Add monitoring for high MariaDB connections as High priority.
Jul 1 2022, 07:28 · Universal Omega, Monitoring, Infrastructure (SRE), Database

Jun 25 2022

Paladox closed T5044: Setup centralised logging for services as Resolved.

Resolved

Jun 25 2022, 15:54 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox updated the task description for T5044: Setup centralised logging for services.
Jun 25 2022, 15:54 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T5044: Setup centralised logging for services.

@Paladox less than a week until end of goal period - do we have an update on this?

Jun 25 2022, 13:02 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Jun 20 2022

RhinosF1 closed T9421: MirahezeRC has disconnected as Resolved.

Thanks for the report. Reception123 restarted it and I forgot to close this, but it's back now.

Jun 20 2022, 06:20 · Monitoring, Infrastructure (SRE)
Dmehus moved T9421: MirahezeRC has disconnected from Incoming to Short Term on the Infrastructure (SRE) board.
Jun 20 2022, 04:52 · Monitoring, Infrastructure (SRE)
Dmehus triaged T9421: MirahezeRC has disconnected as Normal priority.
Jun 20 2022, 04:52 · Monitoring, Infrastructure (SRE)

Jun 3 2022

Unknown Object (User) closed T9326: https/stunnel warnings for puppet111 as Resolved.

This should now be resolved.

Jun 3 2022, 17:57 · MediaWiki (SRE), Monitoring
Reception123 assigned T9326: https/stunnel warnings for puppet111 to Unknown Object (User).
Jun 3 2022, 06:05 · MediaWiki (SRE), Monitoring
Unknown Object (User) added a comment to T9326: https/stunnel warnings for puppet111.

I can handle this for MW-SRE if you want, since I think it is our task, not Infrastructure's (I just realized this task has been retagged to MW-SRE anyway). It just might be a couple of days until I have time.

Jun 3 2022, 06:00 · MediaWiki (SRE), Monitoring
Unknown Object (User) added a comment to T9326: https/stunnel warnings for puppet111.

@Paladox would you have any idea what the cause is?

Jun 3 2022, 05:57 · MediaWiki (SRE), Monitoring
Reception123 updated subscribers of T9326: https/stunnel warnings for puppet111.

@Paladox would you have any idea what the cause is?

Jun 3 2022, 05:56 · MediaWiki (SRE), Monitoring

Jun 2 2022

John edited projects for T9326: https/stunnel warnings for puppet111, added: Monitoring, MediaWiki (SRE); removed Infrastructure (SRE), SSL.

Technically because this is the result of a change in how LE/SSL is set up, this is actually owned by MediaWiki-SRE and not Infra

Jun 2 2022, 23:14 · MediaWiki (SRE), Monitoring

Jun 1 2022

Unknown Object (User) moved T9302: Have a way to view job queue data from Backlog to Grafana on the Monitoring board.
Jun 1 2022, 22:47 · Monitoring, MediaWiki (SRE)
Unknown Object (User) moved T9302: Have a way to view job queue data from Backlog to Short Term on the MediaWiki (SRE) board.
Jun 1 2022, 22:47 · Monitoring, MediaWiki (SRE)
Unknown Object (User) edited projects for T9302: Have a way to view job queue data, added: Monitoring; removed MediaWiki.
Jun 1 2022, 22:47 · Monitoring, MediaWiki (SRE)

May 24 2022

Paladox closed T9264: Matomo is down as Resolved.

Before I updated, I read the changelog (https://matomo.org/changelog/matomo-4-10-0/), which mentioned no schema updates. I also had a look at the changes made, and there weren't many.

May 24 2022, 11:38 · Monitoring, Infrastructure (SRE)
RhinosF1 added a comment to T9264: Matomo is down.

If we're updating something without even checking it loads, we have a serious issue.

May 24 2022, 06:33 · Monitoring, Infrastructure (SRE)
Unknown Object (User) added a comment to T9264: Matomo is down.

It's because Paladox updated it, I think. That's when it started, anyway.

May 24 2022, 06:29 · Monitoring, Infrastructure (SRE)
RhinosF1 triaged T9264: Matomo is down as Unbreak Now! priority.
May 24 2022, 06:19 · Monitoring, Infrastructure (SRE)

May 9 2022

Unknown Object (User) moved T5044: Setup centralised logging for services from Backlog to Central Logging on the Monitoring board.
May 9 2022, 19:26 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) added a project to T5044: Setup centralised logging for services: Monitoring.
May 9 2022, 19:26 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) moved T8834: Monitor LoginNotify & failed logins from Backlog to External on the Trust & Safety board.
May 9 2022, 19:23 · Universal Omega, Monitoring, MediaWiki (SRE), Trust & Safety
Unknown Object (User) moved T8834: Monitor LoginNotify & failed logins from Backlog to Long Term on the MediaWiki (SRE) board.
May 9 2022, 19:16 · Universal Omega, Monitoring, MediaWiki (SRE), Trust & Safety
Unknown Object (User) closed T8848: Icinga docs entries for all MediaWiki monitoring as Resolved.
May 9 2022, 05:16 · Documentation, Monitoring, MediaWiki (SRE)

May 5 2022

Unknown Object (User) reopened T8848: Icinga docs entries for all MediaWiki monitoring as "Open".

On second thought, I will reopen this until the PR is merged. The doc entries on Icinga are technically the main part of this task, and they aren't done until that PR is merged.

May 5 2022, 00:46 · Documentation, Monitoring, MediaWiki (SRE)

May 4 2022

Unknown Object (User) added a comment to T8848: Icinga docs entries for all MediaWiki monitoring.

Sorry about that...

May 4 2022, 17:47 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) closed T8848: Icinga docs entries for all MediaWiki monitoring as Resolved.
May 4 2022, 17:47 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) reopened T8848: Icinga docs entries for all MediaWiki monitoring as "Open".

Just my PR is left here now.

May 4 2022, 17:47 · Documentation, Monitoring, MediaWiki (SRE)
Reception123 closed T8848: Icinga docs entries for all MediaWiki monitoring as Resolved.

https://meta.miraheze.org/wiki/Tech:Icinga/MediaWiki_Monitoring created and reviewed. Thanks to @Universal_Omega for helping out with some of the sections!

May 4 2022, 17:47 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) moved T8848: Icinga docs entries for all MediaWiki monitoring from Backlog to Short Term on the MediaWiki (SRE) board.
May 4 2022, 17:43 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 4 2022, 17:43 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 4 2022, 17:36 · Documentation, Monitoring, MediaWiki (SRE)
Unknown Object (User) updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 4 2022, 17:09 · Documentation, Monitoring, MediaWiki (SRE)

May 3 2022

Reception123 updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 3 2022, 06:58 · Documentation, Monitoring, MediaWiki (SRE)
Reception123 updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 3 2022, 06:51 · Documentation, Monitoring, MediaWiki (SRE)
Reception123 updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 3 2022, 06:49 · Documentation, Monitoring, MediaWiki (SRE)
Reception123 updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 3 2022, 06:39 · Documentation, Monitoring, MediaWiki (SRE)
Reception123 updated the task description for T8848: Icinga docs entries for all MediaWiki monitoring.
May 3 2022, 06:37 · Documentation, Monitoring, MediaWiki (SRE)

Apr 17 2022

John added a comment to T8848: Icinga docs entries for all MediaWiki monitoring.

What needs to be documented for each check is:

  • Why the check exists/what does it monitor?
  • Is an alert a bad thing?
  • If it's warning/critical, how do we fix it? Does it need fixing? Does it need further investigation?
Apr 17 2022, 19:23 · Documentation, Monitoring, MediaWiki (SRE)