Wednesday, February 03, 2010

MySQL Replicant: Architecture

[Figure: MySQL Replicant Library class design (UML diagram)]
In the previous post I described the first steps of a Python library for controlling the replication of large installations. The library is intended to provide a uniform interface to such installations, so that procedures for handling various situations can be written in a uniform language.

For the library to be useful, it has to support installations that use different operating systems on the machines, as well as different versions of the server. Specifically, the following aspects of the system need to be allowed to vary:

  • Depending on the operating system, or even just how the server is installed on the machine, the procedures for bringing the server down and up will differ.

  • Configurations are managed in different ways depending on the deployment, and there are various other tools for managing the configurations of large systems.

    As part of the management of the topology, it is necessary to change the configuration files, but this should play well with other tools.

    In any case, the library should neither require nor enforce any specific method of configuration handling.

  • The previous article demonstrated a technique for cloning a server, using the naive method of copying the database files. In general, however, a proper backup method will be used, and which one depends on the requirements of the deployment. In other words, it is necessary to parameterize the backup method as well.

  • Each server in the system has a specific role to fulfill. Some servers are final slaves whose only purpose is to answer queries, at least one server is a master, and some servers are relay servers.

To allow the system to be parameterized on these aspects, a set of abstract classes is introduced. In the figure you can see a UML diagram describing the high-level architecture of the Replicant library.

In the figure, there are four abstract classes:
Machine
The responsibility of this class is to handle all issues that are specific to the remote operating system, for example, fetching files or issuing commands to start and stop the server.
Config
The responsibility of this class is to maintain the configuration of a server. To do this, it may need to parse configuration files to extract the section containing the server's options.
BackupMethod
The responsibility of this class is to provide the primitives to create a backup and restore a backup. The class supports taking a backup, possibly placing the backup image on a different machine, and restoring it from there.
Role
The responsibility of this class is to provide all the information necessary to configure a server for a role. Since a role entails not only pure configuration information but can also involve keeping certain tables and other database objects available, it is modeled as a separate class.
The central Server class relies on a Machine instance and a Config instance to implement the interface to the machine and to the configuration, respectively.
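To make the relationships in the diagram concrete, here is a minimal Python sketch of the four abstract classes and the Server class that uses them, written with the standard abc module. It is an illustration under assumptions, not the library's actual code: the method names on Machine (start_server, stop_server) and Role (imbue) are hypothetical, while the Config and BackupMethod methods follow the interfaces described later in this post.

```python
from abc import ABC, abstractmethod


class Machine(ABC):
    """Hides operating-system specifics such as starting and stopping the server."""

    @abstractmethod
    def start_server(self, server): ...

    @abstractmethod
    def stop_server(self, server): ...


class Config(ABC):
    """Holds the configuration options of one server."""

    @abstractmethod
    def get(self, option): ...

    @abstractmethod
    def set(self, option, value=None): ...

    @abstractmethod
    def remove(self, option): ...


class BackupMethod(ABC):
    """Creates backup images and restores them, possibly on another machine."""

    @abstractmethod
    def backup_to(self, server, url): ...

    @abstractmethod
    def restore_from(self, server, url): ...


class Role(ABC):
    """Everything needed to configure a server for a particular role."""

    @abstractmethod
    def imbue(self, server): ...


class Server:
    """Concrete class: delegates OS work to a Machine and options to a Config."""

    def __init__(self, name, machine, config):
        self.name = name
        self.machine = machine
        self.config = config

    def start(self):
        self.machine.start_server(self)

    def stop(self):
        self.machine.stop_server(self)
```

Because the abstract classes cannot be instantiated, each deployment has to supply concrete subclasses, which is exactly where the variation points listed above are plugged in.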

Configuration Management

Configuration handling is made part of the Replicant library since changing the role of a server usually requires manipulating its configuration.

Some deployments use configuration managers such as cfengine or Puppet to administer the configuration of all servers, while others hand-edit the configuration files (which is only practical for small deployments, since administering larger deployments this way would be a pain).

Long-term, there should be support for some safety measures when working with server configurations, so it seems like a good idea to implement an interface for handling server configurations in a safe, transaction-like manner (or maybe this should be called an RCU-style manner). To support that, the following methods for fetching and replacing configurations are introduced.

Server.fetch_config()
Returns a Config instance of the configuration for the server.
Server.replace_config(config)
Replaces the configuration of the server with the modified configuration instance config.

This allows an implementation to keep version numbers around to detect conflicting changes, although the interface does not require it.
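As an illustration of how version numbers could be used, the following sketch assumes a hypothetical in-memory Server that bumps a version counter each time its configuration is replaced; replace_config refuses a Config that was fetched before the most recent replacement. The ConflictError class and the version attribute are assumptions for the example, not part of the interface described above.

```python
import copy


class ConflictError(Exception):
    """Raised when the configuration changed after the copy was fetched."""


class Config:
    """Minimal stand-in for a Config: an option dictionary plus a version tag."""

    def __init__(self, options, version):
        self.options = options
        self.version = version  # server config version this copy was read from

    def get(self, option):
        return self.options[option]

    def set(self, option, value=None):
        self.options[option] = value

    def remove(self, option):
        del self.options[option]


class Server:
    """Hypothetical server that versions its configuration."""

    def __init__(self, options=None):
        self._options = dict(options or {})
        self._version = 0

    def fetch_config(self):
        # Hand out a private copy: edits do not affect the live configuration
        # until replace_config() installs them.
        return Config(copy.deepcopy(self._options), self._version)

    def replace_config(self, config):
        # Reject the update if someone else replaced the configuration
        # after this copy was fetched.
        if config.version != self._version:
            raise ConflictError(
                f"config is at version {self._version}, "
                f"copy was fetched at version {config.version}")
        self._options = copy.deepcopy(config.options)
        self._version += 1
```

With this scheme, two procedures that fetch the configuration at the same time cannot silently overwrite each other: the second replace_config call raises ConflictError, and that procedure can fetch the configuration again and retry.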

Each Config instance can then be manipulated by using the following methods:

Config.get(option)
Get the value of option as a string.
Config.set(option[, value])
Set the value of option to value. If no value is supplied, None is used, which denotes that the option is set but not given a specific string value.
Config.remove(option)
Remove the option from the configuration instance entirely.
So, for example, the log-bin option can be set in the following manner:

    config = server.fetch_config()
    config.set('log-bin', 'master-bin')
    server.replace_config(config)
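One possible Config implementation keeps the options of a single section of a my.cnf-style option file. The sketch below is a hypothetical IniConfig built on Python's configparser module, where allow_no_value lets an option be present without a value, matching the None convention of Config.set. It assumes a simple option file: configparser does not follow MySQL's !include and !includedir directives, rejects options repeated within a section, and drops comments when writing the file back, so a production implementation would likely need its own parser.

```python
import configparser
import io


class IniConfig:
    """Hypothetical Config backed by one section of a my.cnf-style file."""

    def __init__(self, text, section="mysqld"):
        # allow_no_value accepts bare options such as "skip-networking";
        # interpolation=None keeps '%' characters in values untouched.
        self._parser = configparser.ConfigParser(
            allow_no_value=True, interpolation=None)
        self._parser.read_string(text)
        self._section = section
        if not self._parser.has_section(section):
            self._parser.add_section(section)

    def get(self, option):
        # Returns the value as a string, or None for a bare option.
        return self._parser.get(self._section, option)

    def set(self, option, value=None):
        # None means "option present, but without a value".
        self._parser.set(self._section, option, value)

    def remove(self, option):
        self._parser.remove_option(self._section, option)

    def to_text(self):
        # Render the whole file again, e.g. for Server.replace_config().
        buf = io.StringIO()
        self._parser.write(buf)
        return buf.getvalue()
```

For example, IniConfig("[mysqld]\nserver-id = 1\n") followed by set('log-bin', 'master-bin') renders a [mysqld] section containing both options.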

Machines

A MySQL server can run on many different machines and in many setups. A server can run on Linux, Solaris, or Windows, and on any of these there can be multiple servers on a single machine.

For a Linux machine with a single server, one usually uses the script /etc/init.d/mysql to start and stop the server (at least on my Ubuntu machine), but if multiple servers run on a single machine, mysqld_multi should be used instead.

On Windows and Solaris, the procedures for starting and stopping servers are entirely different. Windows starts and stops the servers using net start MySQL and net stop MySQL, while Solaris uses the svcadm(1M) command.

To parameterize the system over the various ways it can be installed, the concept of a Machine is introduced (I actually had trouble coming up with a name for this; this one was suggested to me and seems good enough).

The responsibility of the Machine class is to provide an interface to access the installed server together with installation information such as the location of configuration files.
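A sketch of how Machine subclasses could capture the start and stop procedures described above. The class names and the config_file attribute are assumptions (the path shown is the usual one on Ubuntu); the commands are the ones mentioned in the text. The methods return the command as an argument list instead of running it, so the caller can decide whether to execute it locally or on a remote machine. Solaris is left out because the svcadm service name depends on how the server was installed.

```python
from abc import ABC, abstractmethod


class Machine(ABC):
    """Hypothetical Machine interface: OS-specific commands and file locations."""

    @abstractmethod
    def start_command(self, server_number=None):
        """Return the command (argv list) that starts the server."""

    @abstractmethod
    def stop_command(self, server_number=None):
        """Return the command (argv list) that stops the server."""


class LinuxInitScript(Machine):
    """One server per machine, controlled by the init script (e.g. Ubuntu)."""

    config_file = "/etc/mysql/my.cnf"  # assumed location

    def start_command(self, server_number=None):
        return ["/etc/init.d/mysql", "start"]

    def stop_command(self, server_number=None):
        return ["/etc/init.d/mysql", "stop"]


class LinuxMulti(Machine):
    """Several servers per machine, addressed by mysqld_multi group number."""

    config_file = "/etc/mysql/my.cnf"  # assumed location

    def start_command(self, server_number=None):
        return ["mysqld_multi", "start", str(server_number)]

    def stop_command(self, server_number=None):
        return ["mysqld_multi", "stop", str(server_number)]


class WindowsService(Machine):
    """Server installed as a Windows service named MySQL."""

    def start_command(self, server_number=None):
        return ["net", "start", "MySQL"]

    def stop_command(self, server_number=None):
        return ["net", "stop", "MySQL"]
```

A procedure written against Machine can then stop a server without knowing whether it is talking to an init script, mysqld_multi, or a Windows service.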

BackupMethod

One of the more important techniques when managing a set of servers is the ability to clone a slave or a master to create new slaves. Cloning involves taking a backup of a server and then restoring the backup image on the new slave. Since the techniques for taking backups vary a lot and different techniques will be used in different situations, it makes sense to parameterize over the various backup methods.

BackupMethod.backup_to(server, url)
This method will take a backup of server and store it at the location indicated by url.
BackupMethod.restore_from(server, url)
This method will restore the backup image indicated by url into server.
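A minimal sketch of the naive file-copy technique from the previous article, wrapped as a BackupMethod. It assumes that url is a file:// URL reachable from the machine running the code, that the server object has a hypothetical datadir attribute, and that the server is stopped while its files are copied. shutil.copytree with dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class BackupMethod(ABC):
    @abstractmethod
    def backup_to(self, server, url): ...

    @abstractmethod
    def restore_from(self, server, url): ...


class FileCopyBackup(BackupMethod):
    """Naive backup: copy the data directory to and from a file:// URL.

    Assumes the server is stopped and has a (hypothetical) `datadir`
    attribute holding the path of its data directory.
    """

    @staticmethod
    def _path(url):
        parsed = urlparse(url)
        if parsed.scheme != "file":
            raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
        return parsed.path

    def backup_to(self, server, url):
        # Copy the whole data directory into the backup location.
        shutil.copytree(server.datadir, self._path(url), dirs_exist_ok=True)

    def restore_from(self, server, url):
        # Copy the backup image into the (new) server's data directory.
        shutil.copytree(self._path(url), server.datadir, dirs_exist_ok=True)
```

Cloning a slave then amounts to calling backup_to on the source server and restore_from on the new one; a deployment using a different backup tool would supply another subclass with the same two methods.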

Role

In a deployment, each server is configured to play a specific role: it can act as a master, a slave, or a relay. To represent a role, a separate Role class is introduced. Once a role is created, a server can be imbued with it.

  • Not every server has an assigned role.
  • Each server can have at most one role.
  • Each role can be assigned to multiple servers.

Since a role may encompass much more than just setting some configuration parameters, this more flexible approach was chosen. When imbuing a server with a role, a piece of Python code is executed to configure the server correctly.
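Imbuing a server with a role could look like the following sketch. Server is a hypothetical in-memory stand-in that keeps its options in a dictionary, and the Master, Relay, and FinalSlave roles are illustrative: they only set options (log-bin, log-slave-updates, and read-only are real mysqld options), whereas a real role could also create tables or other database objects. The sketch also mirrors the constraints listed above: role is None until one is assigned, a server holds a single role, and one Role instance can be shared by many servers.

```python
class Role:
    """Base class: imbue() runs arbitrary Python code to configure a server."""

    def imbue(self, server):
        raise NotImplementedError


class Master(Role):
    def __init__(self, log_name="master-bin"):
        self.log_name = log_name

    def imbue(self, server):
        server.options["log-bin"] = self.log_name


class Relay(Role):
    def imbue(self, server):
        # A relay writes the changes it replicates to its own binary log.
        server.options["log-bin"] = f"{server.name}-bin"
        server.options["log-slave-updates"] = None  # set without a value


class FinalSlave(Role):
    def imbue(self, server):
        # Final slaves only answer queries.
        server.options["read-only"] = None


class Server:
    """Hypothetical stand-in: options in a dict and at most one role."""

    def __init__(self, name, server_id):
        self.name = name
        self.options = {"server-id": str(server_id)}
        self.role = None  # not every server has a role

    def imbue(self, role):
        # The role's code configures the server; the same Role instance
        # can be used for any number of servers.
        role.imbue(self)
        self.role = role
```

Since a role is plain Python code, anything that can be scripted against a server can be part of imbuing it, not just setting configuration options.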

The use of roles in this case is just one of many possible choices, and even with this approach, there are two different ways that roles can be used. I am slightly undecided between the two and would like to hear comments on which one to use.

  1. Roles are only applied at the initial deployment and play no part after the system has been deployed. A role is imbued into a server initially, and after that the configuration of the server can be changed by procedures that manipulate the deployment.
  2. Roles exist throughout the life of the deployment, and when a server changes role, its Role instance changes as well. Every server is assigned a role in the system, represented by a subclass of the Role class.

The first is by far the easiest to implement, which is why I chose it for now. Since roles are just containers for configuration options and other items that need to be added, they are easy to write. Since this is what the library currently uses, it is also what you see in the class design above.

The second approach seems better, but it has a number of consequences:

  • Every server has to have a role class associated with it, even the "initial" role is required.
  • If the role changes, another role class will be associated with the server. This forces the role class to be able not only to imbue a server with a role, but also to unimbue the server from that role.
  • It must not be possible to change the configuration of a server directly; every change has to take the form of defining a role and then changing the server to that role. Unimbuing the server from a role becomes very hard if the configuration of the server is changed outside the control of the role.
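To make the consequences of the second approach concrete, here is a sketch in which every Role can both imbue and unimbue a server, and changing role means unimbuing the old role before imbuing the new one. OptionsRole and change_role are hypothetical names. The sketch assumes a role consists only of options, and unimbue simply removes the options the role added, which is exactly why changes made to those options outside the role would break it.

```python
class Role:
    """Second approach: a role must be able to undo what it did."""

    def imbue(self, server):
        raise NotImplementedError

    def unimbue(self, server):
        raise NotImplementedError


class OptionsRole(Role):
    """Hypothetical role consisting only of configuration options."""

    def __init__(self, options):
        self.options = dict(options)

    def imbue(self, server):
        server.options.update(self.options)

    def unimbue(self, server):
        # Remove exactly the options this role added. If code outside the
        # role has changed these options, this discards those changes too.
        for option in self.options:
            server.options.pop(option, None)


class Server:
    def __init__(self, initial_role):
        self.options = {}
        # Every server must have a role, even initially.
        self.role = initial_role
        initial_role.imbue(self)

    def change_role(self, new_role):
        # Leave the old role before taking on the new one.
        self.role.unimbue(self)
        new_role.imbue(self)
        self.role = new_role
```

Promoting a slave to master is then a single change_role call, but only as long as every configuration change goes through roles.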