Friday, December 18, 2009

MySQL Replicant: a library for controlling replication deployments

Keeping a MySQL installation up and running can be quite tricky at times, especially when having many servers to manage and monitor. In the replication tutorials at the annual MySQL Users' Conference, we demonstrate how to set up replication appropriately and also how to handle various issues that can arise. Many of these procedures are routine: bring down the server, edit the configuration file, bring the server up again, start a mysql client and add a user, etc.

It has always annoyed me that these procedures are perfect candidates for automation, but that we do not have the necessary interfaces to manipulate an entire installation of MySQL servers.

If there were an interface with a relatively small set of primitives—re-directing servers, bringing servers down, add a line to the configuration file, etc.—it would be possible to create pre-canned procedures that can just be executed.

To that end, I started writing on a library that would provide an interface like this. Although more familiar with Perl, Python was picked for this project, since it seems to be widely used by many database administrators (it's just a feeling I have, I have no figures to support it) and just to have a cool name on the library, we call it MySQL Replicant and it is (of course) available at Launchpad.

So what do we want to achieve with having a library like this? Well... the goal is to to provide an generic interface to complete installations and thereby make administration of large installations easy.

By providing such an interface, it will allow description of procedures in an executable format, namely as Python scripts.

In addition to making it easy to implement common tasks for experienced database administrators, it also promotes sharing by providing a way to write complete scripts for solving common problems. Having a pool of such scripts makes it easier for newcomers to get up and running.

The basic idea is that you create a model of the installation on a computer and then manipulate the model. When doing these manipulations, the appropriate commands—either as SQL commands to a running server or shell commands to the host where the server is running—will then be sent to the servers in the installation to configure them correctly.

So, to take small example, how does the code for re-directing a bunch of servers to a master look?

import mysqlrep, my_servers
for slave in my_server.slaves:
   mysqlrep.change_master(slave, my_servers.master)
In this case, the installation is defined in a separate file and is imported as a Python module. Right now, the interface for specifying a topology is quite rough, but this is going to change.
from mysqlrep import Server, User, Linux

servers = [Server(server_id=1, host="",
                  sql_user=User("mysql_replicant", "xyzzy"),
           Server(server_id=2, host="",
                  sql_user=User("mysql_replicant", "xyzzy"),
           Server(server_id=3, host="",
                  sql_user=User("mysql_replicant", "xyzzy"),
           Server(server_id=4, host="",
                  sql_user=User("mysql_replicant", "xyzzy"),
master = servers[0]
slaves = servers[1:]
Here, the Server class represents a server and to be able to do it's job, it is necessary to have one MySQL account on the server and one shell account on the host machine. Right now, it is also necessary to specify the server ID, but the plan is to just require the host, port, socket, SQL account name, and SSH account information. The remaining information can then be fetched from the configuration file of the server. Each server have a small set of primitives on top of which everything else is built:
Server.sql(SQL command)
Execute the SQL command and return a result set.
Server.ssh(command list)
Execute the command given by the command list return an iterator to the result output.
Start the server
Stop the server.
There is a small set of commands defined on top of these primitives that can be used. Here is a list of just a few of them, but there are some more in the library at Launchpad.
change_master(slave, master, position=None)
Change the master of slave to be master and start replicating from position.
Fetch the master position of server, which is the position where the last executed statement ends in the binary log.
Fetch the slave position of server, which is the position where the last executed event ends.
Flush all tables on server and lock the database for read.
Unlock a previously locked database.
Using these primitives, it is easy to clone a master by executing the code below. For this example, I use the quite naive method of backing up a database by creating an archive of the database files and copying them to the new slave.
from mysqlrep import flush_and_lock_database, fetch_master_position
from subprocess import call

position = fetch_master_position(master)
master.ssh("tar Pzcf " + backup_name + " /usr/var/mysql")
call(["scp", + ":" + backup_name, + ":."])
slave.ssh("tar Pzxf " + backup_name + " /usr/var/mysql")
What do you think? Would this be a valuable project to pursue? Here are some links related to this post:


Domas Mituzas said...

Mats, looks similar:

(and yes, one needs such building blocks).

burtonator said...

Spinn3r already has an some code to do this which we can open source.

I think it's here:

the code is in production.

I'll ask one of our engineers if the latest is pushed.


Mats Kindahl said...

Hi Domas!

Right, I noticed the post you made a while ago, and it struck me that it was very similar to what I had started on. I forgot to include it in the post (sorry)): I updated the post to include, among others, a link to your post.