
db2 lacks disk space (and db3 also does)
Closed, Resolved (Public)

Description

The amount of free disk space on db2 has been low for a few months now, causing two outages, and deleting binlogs won't help anymore. Since we only have 4.8GB (3%) left on db2, and only 23GB on db3, we will need a solution very soon.

RamNode asks $28/mo (doubling db2 costs) for 80GB extra (which is only 66% more), so that is not an option. We could also buy a new 6GB SVZS (db4) for $21/mo, which gives us 100GB extra (minus overhead for the OS). That is cheaper, and it means fewer databases per server, which is also good.

The other option would be trying TransIP VPSs: https://www.transip.eu/vps/ - they offer 100GB SSD(!) disk space add-ons for only €7.5/mo (slightly less than $9/mo), but it's a provider we've never used before, and we don't know the performance impact (SSD performance slower/faster than RamNode? What about the latency between MediaWiki and MariaDB?). This one needs some discussion.

Technology-Team please give your opinion regarding any of the above options.
@labster since you do the finances your input is important as well.

Event Timeline

I forgot to mention we have about 20GB left on db3, so we can use that server to free some space on db2 if we need to. However, ~25GB is still not enough.

I think we do need a db4 in the future, to avoid any other problems related to disk space.

I'm not sure about trying TransIP, but it sounds interesting. The concern is that if performance or latency turns out to be a problem, it will be difficult to switch servers and transfer the data again.

OK, so in general, I feel like RamNode doesn't have any plans that meet our requirements for a database server. Just by our design -- make all the tables in advance, have lots of small wikis -- this implies we're going to have a lot of data at rest. Some of our services, like the wright*wikis, sit inactive most of the time by design. So we don't really need high-CPU, high-memory DB servers. It's probably worthwhile to look at other providers.

Comparing to TransIP, it looks like RamNode lives up to the "RAM" part of its name. The disk space is cheap at TransIP, but what do we need in terms of everything else? I'm guessing we'd need at least two cores. And because we use DB as cache, the memory there is important too. Let's say we need 3GB. And because we're comparing to RamNode with 200GB, let's do a server with 350GB at TransIP. Now we're up to $60/month, compared to RamNode's $56. In exchange for 50GB storage, we lose 13GB RAM and $4.

Now, if we're looking to consolidate servers, the math looks a little better. 4GB/350GB/2 is at $77.5/month, compared to (21+21+28 =) $70 for net 320GB, with relatively cheap ways of upgrading storage from there. We would also be able to have only one db server again; dividing databases between servers wastes staff time.

Another alternative here is a CVZ from RamNode. I feel like we've tried spinning rust before, but I don't remember the outcome. Was that for static content? It's something to consider, anyway. Also consider creating Redis nodes at RamNode in front of a TransIP DB server, which would probably lessen the memory requirements. It'd be nice if mw servers had enough memory to host local instances of Redis, like we do at $dayjob, but for some reason Mediawiki tends to eat RAM.

Because I don't think I reached any definitive conclusions here, I'm going to remind the ops team that it is always OK to upgrade a server on a temporary basis to ensure stability. If we need to spend $28 for one month while we're looking for a better solution, go ahead and do it.

> OK, so in general, I feel like RamNode doesn't have any plans that meet our requirements for a database server. Just by our design -- make all the tables in advance, have lots of small wikis -- this implies we're going to have a lot of data at rest. Some of our services, like the wright*wikis, sit inactive most of the time by design. So we don't really need high-CPU, high-memory DB servers. It's probably worthwhile to look at other providers.

We have more than enough CPU power, so that isn't the problem. We have enough RAM as well (8GB/6GB), though we do want at least 4GB per server.

> Comparing to TransIP, it looks like RamNode lives up to the "RAM" part of its name. The disk space is cheap at TransIP, but what do we need in terms of everything else? I'm guessing we'd need at least two cores. And because we use DB as cache, the memory there is important too. Let's say we need 3GB. And because we're comparing to RamNode with 200GB, let's do a server with 350GB at TransIP. Now we're up to $60/month, compared to RamNode's $56. In exchange for 50GB storage, we lose 13GB RAM and $4.

Were you looking at the VPS X1 plan? A VPS X4 with 4GB RAM, dual-core CPU and 250GB SSD costs only €27.50/month (about $31.8/month).

> Now, if we're looking to consolidate servers, the math looks a little better. 4GB/350GB/2 is at $77.5/month, compared to (21+21+28 =) $70 for net 320GB, with relatively cheap ways of upgrading storage from there. We would also be able to have only one db server again; dividing databases between servers wastes staff time.

> Another alternative here is a CVZ from RamNode. I feel like we've tried spinning rust before, but I don't remember the outcome. Was that for static content? It's something to consider, anyway. Also consider creating Redis nodes at RamNode in front of a TransIP DB server, which would probably lessen the memory requirements. It'd be nice if mw servers had enough memory to host local instances of Redis, like we do at $dayjob, but for some reason Mediawiki tends to eat RAM.

The CVZ servers have really slow HDDs, and are overpriced in my opinion - RamNode removed SSD caching from those servers (making them even slower), but did not increase the amount of space for each plan.

> Because I don't think I reached any definitive conclusions here, I'm going to remind the ops team that it is always OK to upgrade a server on a temporary basis to ensure stability. If we need to spend $28 for one month while we're looking for a better solution, go ahead and do it.

Noted.

Two VPS X4 4GB/250GB/dual-core servers would cost about $64/month (depends on the exchange rate). Our current stack costs $28 (db2) + $18.90 (db3) = $46.90/month. For this $17.10/month increase, we gain 280GB SSD space and 2.5TB bandwidth, and we lose 6GB RAM (4GB for db2, 2GB for db3). All those servers are hosted in NL (with about 2ms latency between TransIP and RamNode servers, which is no problem), and we get €20 discount for the first month.

However, VPSs are servers that share resources (disk/cpu/ram) with other non-Miraheze VPSs. Dedicated servers don't, and if we are looking to spend $64/month on two VPSs, we may want to look at dedicated servers as well. I quickly spoke with LeaseWeb via live chat, and if we want we can mail sales for a quote (I was looking at this box).

I have yet to look into dedicated servers, but I was talking to @Southparkfan about three €20 servers from TransIP that would give us way more storage (and more total RAM, but less per server) for €60 per month. This brings up our costs a bit but seems more cost-effective, and if one server (as a test) works out (latency etc.) then we could migrate wikis and decommission db2 and db3.

3 servers is an issue without real rotational ability in CreateWiki/ManageWiki

@labster I would like to create a TransIP account and purchase one TransIP PureSSD X4 with 100GB extra disk space (total price €17.50) for testing purposes. You will have to make the actual payment since we don't have funds there.

Please approve/comment.

OK, let's go ahead and do it.

I have created the account with your gmail.

  1. TransIP requires phone verification which can only be done by you.
  2. They are charging 21% VAT (so €17.50 + €3.68 VAT). That's a pity, but we can always cancel the server if we think the expenses are too high.

Please buy the server (X4 with 100GB SSD add-on), change the account password to something I can store safely (just like the SolusVM password) and assign back to me.

I just created an account on TransIP, but I'm not sure what to do next.

I sent an email to @Southparkfan and staff@ regarding leaseweb.

NDKilla changed the status of subtask T2541: LeaseWeb trial from Stalled to Open. Jan 2 2018, 13:59
Reception123 raised the priority of this task from High to Unbreak Now!. Jan 28 2018, 19:04

db3 has now also gone critical with 8 GB left (db2 has 5.5 GB). This is now Unbreak Now!, and if something isn't done we will run out of space soon.

Reception123 renamed this task from db2 lacks disk space to db2 lacks disk space (and db3 also does). Jan 28 2018, 19:04

Hey guys tell me what to order and I'll do it.

Southparkfan lowered the priority of this task from Unbreak Now! to High. Feb 3 2018, 23:54

First wikis have been moved to a new server (T2541).

@Southparkfan The thing most likely taking up the most space is https://github.com/miraheze/mw-config/blob/master/LocalSettings.php#L482.

Would it not be possible to somehow make something that automatically adds the SQL files for the extensions, but only when the extensions are actually enabled? Some kind of script?

For example, if "AJAXPoll" is enabled, run SQL queries on one wiki only.

This way, for starters, "empty/new" wikis would not be 25 MB anymore, and there are many of these wikis.
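A minimal sketch of that idea, written in Python rather than as a MediaWiki maintenance script. Everything except the AJAXPoll name is hypothetical: the second extension, the SQL file paths and the function name are placeholders, and the real list of enabled extensions would have to come from wherever CreateWiki/ManageWiki stores it.

```python
# Hypothetical mapping from extension name to the SQL files that create its tables.
# The paths are placeholders, not the real files shipped by the extensions.
EXTENSION_SQL = {
    "AJAXPoll": ["ajaxpoll/tables.sql"],
    "ExampleExtension": ["example/tables.sql", "example/indexes.sql"],
}


def sql_files_to_run(enabled, already_installed):
    """Return the SQL files needed for newly enabled extensions on one wiki."""
    files = []
    for name in sorted(enabled):
        if name in already_installed:
            continue  # tables already exist on this wiki
        # Extensions without their own tables contribute nothing.
        files.extend(EXTENSION_SQL.get(name, []))
    return files


if __name__ == "__main__":
    # A new wiki with no extensions enabled needs no extension tables at all.
    print(sql_files_to_run(enabled=set(), already_installed=set()))
    # Enabling AJAXPoll on one wiki schedules only that extension's SQL.
    print(sql_files_to_run(enabled={"AJAXPoll"}, already_installed=set()))
```

The point of the design is that table creation is driven per wiki by the enabled set, instead of creating every extension's tables when the wiki is created.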

> Would it not be possible to somehow make something that automatically adds the SQL files for the extensions, but only when the extensions are actually enabled? Some kind of script?

Yes, with ManageWiki potentially.

This would definitely save a very large amount of disk space, as I just ran an SQL query for some wikis (new extension) and it took about 2 GB.

So, this current db4 setup is pretty expensive, and it doesn't seem we can get a quote below LeaseWeb's site prices, thus:

  • 1x db4 (16GB/quad-core CPU with HT/4x 240GB SSD) €79.99
  • 3x quad-core VPSs for MediaWiki (4GB/80GB)[1] €53.91
  • 1x single-core/40GB/1GB Varnish €4.95

makes €138.85/mo. For comparison, we currently spend €64.13/mo[2] at RamNode, so we would be more than doubling our expenses.

Although, how did we determine SSDs are an absolute requirement? The only HDD-backed database server we have ever had was a CVZ VPS, that was on a node shared with lots of other customers. Any dedicated server with at least four disks in a RAID5/RAID10[3] should offer better performance than those CVZ servers.

Also, SSDs are very useful for write speeds (for example, in write-heavy workloads), but may not be worth the money in read-heavy workloads, since RAM can be used to cache data (in the OS cache and the InnoDB buffer pool), and RAM is much cheaper. I've had a look at some performance metrics, and the read/write ratio on db2 and db3 is about 95/5, which indicates we have a read-heavy workload. So, given our dataset is quite big at the moment, lots of RAM should do the trick and increase our read hit efficiency, right?

[db2] [OK] InnoDB Read buffer efficiency: 99.48% (6871966818 hits/ 6908031077 total)
[db3] [OK] InnoDB Read buffer efficiency: 99.92% (160530512738 hits/ 160652851321 total)
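For reference, the percentages in that output can be reproduced from the hit and total counts. This is only a sketch; it assumes "total" corresponds to logical read requests (`Innodb_buffer_pool_read_requests`) and that the misses are the reads that had to go to disk (`Innodb_buffer_pool_reads`), which is not stated in the output itself.

```python
def read_buffer_efficiency(hits, total):
    """Percentage of InnoDB read requests served from the buffer pool."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# (hits, total) pairs copied from the output above.
samples = {
    "db2": (6871966818, 6908031077),
    "db3": (160530512738, 160652851321),
}

for host, (hits, total) in samples.items():
    misses = total - hits  # requests that were not served from the buffer pool
    print(f"{host}: {read_buffer_efficiency(hits, total):.2f}% ({misses} misses)")
```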

Let me remind you our InnoDB buffer pool is only 768M, so a read efficiency of about 99.5% or better is great. If we get a decent amount of RAM for a server (say, 16GB) and increase the buffer pool to 10GB, we should already be fine. Unlike traditional workloads, we have a lot of content that is not frequently used (very many closed wikis and wikis that are abandoned soon after being created, probably), so the rest of the RAM is more useful for increasing max. concurrent connections and having extra RAM for the OS cache.

Let's say we swap the SSDs in db4 for 4x 2TB HDDs in RAID10, giving us about 4TB usable. Now we're down to €58.99/mo instead of €79.99/mo, while having way more space (only compromising on not having the fast SSDs). Even if you throw in 32GB RAM, it's still €71.99/mo - somewhat less than db4. The 16GB RAM server and rest of the stack will cost us €117.85/mo then, still quite a lot, but at least less than the initial quote.
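The LeaseWeb totals can be checked with a few lines of arithmetic. This is just a sketch using the prices quoted in this comment; the labels and variable names are mine.

```python
from decimal import Decimal

# Prices in EUR per month, as quoted above.
MEDIAWIKI = Decimal("53.91")        # 3x quad-core VPSs (4GB/80GB)
VARNISH = Decimal("4.95")           # 1x single-core/40GB/1GB
CURRENT_RAMNODE = Decimal("64.13")  # what we spend at RamNode now

db_options = {
    "4x 240GB SSD, 16GB RAM": Decimal("79.99"),
    "4x 2TB HDD RAID10, 16GB RAM": Decimal("58.99"),
    "4x 2TB HDD RAID10, 32GB RAM": Decimal("71.99"),
}


def stack_total(db_price):
    """Monthly cost of one database server plus the MediaWiki and Varnish VPSs."""
    return db_price + MEDIAWIKI + VARNISH


for label, price in db_options.items():
    total = stack_total(price)
    ratio = total / CURRENT_RAMNODE
    print(f"{label}: EUR {total}/mo ({ratio:.2f}x current spend)")
```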

Should we go with TransIP (which offers way more flexible servers without the hassle of stupid KVM/IP and outdated iLO), we could get the following:

  • 1x quad-core/8GB/500GB SSD MariaDB master €65/mo
  • 3x quad-core/1GB/50GB SSD MediaWiki €60/mo
  • 1x single-core/1GB/50GB SSD Varnish €10/mo

Total: €135/mo. The last time I tried to buy from them, there was no way to avoid the additional 21% VAT (we're US-based, yet it's a Dutch company, so they charge VAT nevertheless..?).

I talked with @NDKilla about this, and he thinks we should buy Scaleway's dedicated servers/VPSs. I was not in favour of that idea since the ping between RamNode and Scaleway is 25ms (they are both in NL!!), which means to me Scaleway has some pretty bad networking. However, they are ridiculously cheap compared to the competition:

  • 1x octa-core/32GB/250GB direct disk + 50GB SSD[4] MariaDB master €34/mo
  • 3x quad-core/8GB/50GB SSD[4] dedicated server MediaWiki €36/mo
  • 3x quad-core/4GB/100GB SSD VPS MediaWiki €18/mo
  • 1x quad-core/8GB/50GB SSD[4] dedicated server Varnish €12/mo
  • 1x dual-core/2GB/50GB SSD VPS Varnish €3/mo

Database on bare metal, MediaWiki + Varnish on VPS? €55/mo
Database and MediaWiki on bare metal, Varnish on VPS? €73/mo.
All on bare metal? €82/mo. That's cheap.
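The three Scaleway totals follow directly from the price list above. A small sketch, with the dictionary layout and function name being my own:

```python
# Scaleway prices in EUR per month, taken from the list above.
PRICES = {
    "db": {"bare_metal": 34},
    "mediawiki": {"bare_metal": 36, "vps": 18},  # three servers each
    "varnish": {"bare_metal": 12, "vps": 3},
}


def monthly_total(mediawiki, varnish):
    """Total for a bare-metal database plus the chosen MediaWiki/Varnish hosting."""
    return (
        PRICES["db"]["bare_metal"]
        + PRICES["mediawiki"][mediawiki]
        + PRICES["varnish"][varnish]
    )


for mediawiki, varnish in [("vps", "vps"), ("bare_metal", "vps"), ("bare_metal", "bare_metal")]:
    print(f"MediaWiki on {mediawiki}, Varnish on {varnish}: EUR {monthly_total(mediawiki, varnish)}/mo")
```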

Personal opinion: I am skeptical of Scaleway due to their RTT and absurdly low prices. LeaseWeb seems okay but is just very expensive if we want to go all-SSD, and we are not 100% sure HDDs will be fine for our databases. TransIP is just out of our price range. But RamNode recently introduced virtual dedicated servers. For $80/mo (that is €64.47/mo) we get four dedicated CPUs, 16GB RAM and 400GB SSD (disk I/O is not dedicated to us, though), and it's in the same DC as our other servers (thus no need to move our current VPSs).

I would like anyone's input on this.

[1] I know that's 4x more RAM than we have now, but LeaseWeb cannot give us custom-sized virtual servers.
[2] cp4 is paid quarterly. For the monthly expenses I've calculated the yearly costs and converted them from USD to EUR, so the amount is subject to different exchange rates.
[3] As far as I know RAID5 is better for reads and gives us 50% more usable disk space (only one disk is needed for parity, instead of needing 50% of the disks like RAID10) on a server with four disks. However, RAID10 is way faster in writes (no need for parity calculations), more fault-tolerant (at least one drive failure, but up to 50% of the drives may fail as long as they are not in the same subset), better rebuild performance and less prone to disk failure during rebuilds.
[4] On bare metal, SSD volumes can be expanded: "Increase your storage capacity with flexible SSD volumes. Available per 50GB increments up to 150GB, with a maximum of 15 volumes per server. Volumes are billed €1/month per additional 50GB of SSD."
I am a bit confused there. Does it mean you could, technically, have no bare metal MediaWiki servers bigger than 200GB, because such software does not support disk space distributed among several partitions? (Varnish definitely does though, MariaDB I'm not sure but I think not.) That would be a pity, because otherwise €1/mo for 50GB SSD extra would rather be a great deal.
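The capacity claim in footnote [3] can be checked with a quick calculation. A sketch only: it covers plain RAID5 (one disk's worth of parity) and RAID10 (mirrored pairs), and ignores filesystem overhead and TB/TiB differences.

```python
def usable_capacity_tb(level, disks, disk_tb):
    """Usable capacity of an array built from identical disks."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least three disks")
        return (disks - 1) * disk_tb  # one disk's worth of space goes to parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least four")
        return (disks // 2) * disk_tb  # every disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


raid5 = usable_capacity_tb("raid5", 4, 2)
raid10 = usable_capacity_tb("raid10", 4, 2)
gain = 100 * (raid5 - raid10) / raid10
print(f"4x 2TB: RAID5 {raid5}TB, RAID10 {raid10}TB, RAID5 gives {gain:.0f}% more")
```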

@Southparkfan Regarding the very final note of your comment, is software RAID a possibility?

Even though the cheaper option would be Scaleway, I personally think staying in the same DC is a better idea, at least for now. Moving servers/changing server configuration would require a lot of work, and I'm not sure any of us currently has time to take care of it (especially in the time available before disk space runs out).

For TransIP and Leaseweb I find it a bit much that we have to double our current costs, for not too much more space.

Therefore, I think we need the quickest option to end the disk problems (currently, if we end the temporary LeaseWeb "db4" subscription, we no longer have enough space to store the dbs it contains).

> @Southparkfan Regarding the very final note of your comment, is software RAID a possibility?

As far as I can see, it is not possible to create one big volume. (copying my comment from IRC here for visibility)

> For TransIP and Leaseweb I find it a bit much that we have to double our current costs, for not too much more space.

A server with HDDs will also (almost?) double our costs, but will also give us about 17x (not truly 4TB, rather something like 3.6TB) more db space, lots of RAM and 8 cores (latter is nice to have but overkill for our needs), all dedicated. Do you think that is a good idea?

Since I think that the 16GB VDS ($80/mo) is our best option at the moment (same DC & host, easy setup, good performance, reasonable price for 400GB SSD), I would like to purchase that server as soon as possible.

@NDKilla, @Reception123, @revi (also tagging you since this server is critical for MediaWiki): should any of you have objections to this idea, please notify me before Monday, since the LeaseWeb server expires after February 28 and my time is limited after Monday.

If I don't receive any objections from your side, I will ask labster to buy the server immediately.

As I said before, no objections from me.

@Southparkfan So, after the purchase, what would be the procedure for the transfer of dbs?

22:57:42 <+SPF|Cloud> 468 wikis deleted, >14GB database space reclaimed, 1.5GB nfs space reclaimed, over 86k database tables gone, ~2800 > 2400 wikis now, amount of closed wikis (854 ATM) / total wikis went from 46.2% to 35.5%

Lots of wikis that should be deleted regardless, and we still have an issue (empty tables taking up too much space), but we'll survive this week.

@labster if you approve, please increase the funds as necessary. Purchasing and such will be done by me.

(db4 has been decommissioned and shut down, and I have cancelled the service in the LeaseWeb interface)

Yeah, approved. 16GB VDS looks like a good deal for us without changing too much on the billing side either. There's enough money in the RamNode balance to make the purchase, so go ahead whenever you're ready.

VDS is out of stock at the moment. I've opened ticket #112243 regarding that.

db4.miraheze.org has been installed and is ready for use.