Operations in 2018: The Story So Far!

The world doesn't stand still, and neither do we. Just as in the past few years, Miraheze's Operations team has been working on major projects. In this blog post we want to explain what we have been doing in 2018!

Short Summary

Work we have done in 2018:

  • Buying various new servers.
  • Upgrading all servers from Debian Jessie to Debian Stretch.
  • Upgrading Puppet from 3 to 4.
  • Migrating to PuppetDB.
  • Upgrading Varnish from 4 to 5.
  • Migrating from Ganglia to Grafana.
  • Upgrading from Icinga 1 to Icinga 2.
  • Deploying RESTBase.
  • Upgrading MediaWiki from 1.30 to 1.31.
  • Migrating from NFS to LizardFS.

New Servers

Every day Miraheze users create new wikis and existing wikis grow, so the demand for server resources grows significantly. In 2018 we upgraded and bought various servers to meet the higher demand for capacity.

These changes include:

  • Several new miscellaneous servers and upgrades;
  • A new file storage server to replace our old NFS infrastructure;
  • A new database server to be able to cope with our needs;
  • An upgraded Puppet server to be able to cope with our new stack;
  • Upgraded cache proxy servers to be able to cope with our traffic needs.

Debian Upgrade

To benefit from several features we wanted, such as PHP 7 and Varnish 5, we needed to upgrade from Debian Jessie to Debian Stretch.

Puppet and PuppetDB

As part of upgrading Debian versions, a new version of Puppet (used for server configuration management) was necessary. This new version of Puppet removed support for a feature we were heavily dependent on (ActiveRecords). ActiveRecords was used to export and centralize certain variables and information from across our infrastructure, and similar behaviour was a necessity. In came PuppetDB. PuppetDB is a newer and more native replacement for the old ActiveRecords system; however, it required a new deployment stack to support the software. To support the new stack we needed to install PostgreSQL. We also had to upgrade the hardware, as the new stack required more resources than the old stack.

Migrating from Ganglia to Grafana

Since the very beginning we had used Ganglia for monitoring the resource usage (disk, networking, RAM and CPU) of our servers. However, the Ganglia web interface is very outdated and Ganglia overall lacks functionality. We have replaced Ganglia with Grafana, which also offers a nice, mobile-friendly web interface. You can view our Grafana metrics here.

Upgrading Icinga from v1 to v2

Miraheze uses Icinga for monitoring all our servers and their services. A good example is checking whether all our so-called 'MediaWiki servers' are working as expected. If a service or server fails to operate properly, Icinga notifies Miraheze's Operations team via email and our IRC channels. There is also a web interface offering detailed information regarding the health of Miraheze's servers and services.

Since the beginning Miraheze had depended on Icinga v1. However, with Icinga v1 going End Of Life, we needed to migrate to Icinga v2. Icinga v2 is a complete rewrite of Icinga v1 and brought a completely different configuration syntax. We had to migrate to other Puppet modules to be able to deploy this new version.
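
To illustrate the kind of check described above, here is a minimal sketch of a monitoring plugin in Python. It is not our actual check: the function names, the two-second slowness threshold and the mapping of HTTP status codes to states are all assumptions made for this example. The only part taken from the real world is the standard plugin convention that Icinga follows, where the state is reported as 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import time
import urllib.error
import urllib.request

# Standard monitoring plugin states, reported to Icinga as exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status_code, elapsed_seconds, warn_after=2.0):
    """Map an HTTP result to a (state, message) pair.

    Hypothetical policy for this sketch: no response or a 5xx answer is
    CRITICAL, a 4xx answer or a slow answer is WARNING, the rest is OK.
    """
    if status_code is None:
        return CRITICAL, "CRITICAL - no response"
    if status_code >= 500:
        return CRITICAL, f"CRITICAL - HTTP {status_code}"
    if status_code >= 400:
        return WARNING, f"WARNING - HTTP {status_code}"
    if elapsed_seconds > warn_after:
        return WARNING, f"WARNING - slow response ({elapsed_seconds:.2f}s)"
    return OK, f"OK - HTTP {status_code} in {elapsed_seconds:.2f}s"


def check_url(url, timeout=10.0):
    """Fetch a URL once and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        status = None
    return classify(status, time.monotonic() - start)
```

A wrapper script would call check_url with the address of the server to test, print the returned message and exit with the returned state, which is what lets the monitoring system decide whether to send a notification.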

Deploying RESTBase

To deploy Mathoid and Electron, we needed to deploy RESTBase. Electron allows you to generate a PDF of a page. Mathoid generates math images. We eventually deployed Mathoid locally on each server, so it did not need to go through RESTBase.

Replacing NFS with LizardFS

With the increasing demand on our file storage system, we needed to migrate to a new infrastructure. This allowed us to increase our storage space and use our disk space more efficiently. Our first decision was to migrate to Swift, since the Wikimedia Foundation uses it as well (and it therefore has excellent support within MediaWiki). Unfortunately, this migration didn't go well. File uploads went missing, and various bugs and issues caused frequent downtime for our wikis. We then researched another file system solution, which was LizardFS. We read good reviews of it and so decided to try it out. LizardFS performs quite well and is of great assistance to our servers, especially due to its better stability.

Upgrading MediaWiki from 1.30 to 1.31

We guarantee that we will stay up to date with MediaWiki releases. This time was no exception: we began testing before 1.31 was released.

What are we planning next?

In line with our goals (Goal-2018-Jul-Dec), we are planning to do the following:

  • Migrating our scripts to Python 3 (T3647).
  • Switching from cp5 to cp3 (T2362 and T3405).
  • Migrating to Debian Buster once a stable release is available.
  • Migrating to Puppet 5 and PuppetDB 5 next year, at the same time as the Buster upgrade.
  • Upgrading to Varnish 6 next year.
  • and much more :)

Written by Paladox on Oct 1 2018, 21:24.
