A common request is to have replication crash-safe in the sense that the replication progress information always is in sync with what has actually been applied to the database, even in the event of a crash. Although transactions are not lost if the server crashes, it could require some tweaking to bring the slaves up again.
In the latest MySQL 5.6 milestone development release, the replication team has implemented crash-safety for the slave by adding the ability of committing the replication information together with the transaction (see Figure 1). This means that replication information will always be consistent with has been applied to the database, even in the event of a server crash. Also, some fixes were done on the master to ensure that it recovers correctly.
If you're familiar with replication, you know that the replication information is stored in two files: master.info
and relay-log.info
.
The update of these files are arranged so that they are updated after the transaction had been applied. This means that if you have a crash between the transaction commit and the update of the files, the replication progress information would be wrong.
In other words, a transaction cannot be lost this way, but there is a risk that a transaction could be applied yet another time.
The usual way to avoid this is to have a primary key on all your tables. In that case, a repeated update of the table would cause the slave to stop, and you would have to use SQL_SLAVE_SKIP_COUNTER
to skip the transaction and get the slave up and running again. This is better than losing a transaction, but it is nevertheless a nuisance.
Removing the primary key to prevent the slave from stopping will only solve the problem partially: it means that the transaction would be applied twice, which would both place a burden on the application to handle dual entries and also require that the tables to be cleaned regularly.
Both of these approches require either manual intervention or scripting support to handle. This does not affect reliability, but it is so much easier to handle if the replication information is committed in the same transaction as the data being updated.
Crash-safe masters
Two problems related to crash-safe replication has been fixed in the master, both of which could cause some annoyance when the master recovered.
- If the master crashed when a binary log was rotated, it was possible that some orphan binlog files ended up in the binary log index file. This was fixed in 5.1 but is also a piece in the pussle of having crash-safe replication.
- Writing to the binary log is not an atomic operation, and if a crash occured while writing to the binary log, there were a possibility of a partial event at the end of the binary log.
Now, the master recovers from this by truncating the binary log to the last known good position, removing the partially written transaction and rolling back the outstanding transactions in the storage engines.
Crash-safe slaves
Several different solutions for implementing crash-safety—or
transactional replication, as it is sometimes known as—have been proposed, with Google's
TransactionalReplication patch being the most known. This solution stores the replication positions in the InnoDB transaction log, but the MySQL replication team decided to instead implement crash-safety by moving the replication progress information into system tables. This is a more flexible solution and has several advantages compared to storing the positions in the InnoDB transaction log:
- If the replication information and data is stored in the same storage engine, it will allow both the data and the replication position to be updated as a single transaction, which means that it is crash-safe.
- If the replication information and data is stored in different storage engines, but both support XA, they can still be committed as a single transaction.
- The replication information is flushed to disk together with the transaction data. Hence writing the replication information directly to the InnoDB redo log does not offer a speed advantage, but does not prevent the user from reading the replication progress information easily.
- The tables can be read from a normal session using SQL commands, which also means that it can be incorporated into such things as stored procedures and stored functions.
In addition to giving us crash-safe slaves the last of these advantages should not be taken lightly. Being able to handle replication from pure SQL put some of the key features in the hands of application developers.
As previously mentioned, the replication information is stored in two files:
master.info
- This file contain information about the connection to the master—such as hostname, user, and password—but also information about how much of the binary log that has been transferred to the slave.
relay-log.info
- This file contain information about the current state of replication, that is, how much of the relay log that has been applied.
Options to select replication information repository
In order to make the solution flexible, we introduced a general API for adding replication information repositories. This means that we can support multiple types of repositories for replication information, but currently, only the old system using files
master.info
and
relay-log.info
and the system using tables
slave_master_info
and
slave_relay_log_info
is supported. In order to select what type of repository to use, two new options were added. These options are also available as server variables.
- master_info_repository
- The type of repository to use for the master info data seen in Table 1.
- relay_log_info_repository
- The type of repository to use for the relay log info seen in Table 2.
Both of the variables can be set to either
FILE
or
TABLE
. If the variable is set to
TABLE
the new table-based system will be used and if it is set to
FILE
, the old file-based system will be used. The default is
FILE
, so make sure to set the value if you want to use the table-based system.
If you look in Table 1 and Table 2 you can see the column names used for the tables as well as the line number in the corresponding file and the column name in the output of
SHOW SLAVE STATUS
. Since we are using tables, the column names are used for storing the data in the table, but when using a file, the column names are only used to identify the correct row to update and the value is inserted at the line number given in the table.
The format of the tables have been extended with an additional field that is not present in the files but which is present in the table: the Master_id
field. The reason we added this is to make it possible to extend the server to track multiple masters. Note that we currently have no definite plans to add multi-source support, but as good engineers we do not want these tables to be a hindrance to adding multi-source.
Selecting replication repository engine
In contrast with most of the system tables in the server, the replication repositories can be configured to use any storage engine you prefer. The advantage of this is that you can select the same engine for the replication repositories as the data you're managing. If you do that, both the data and the replication information will be committed as a single transaction.
The new tables are created at installation using the mysql_install_db
script, as usual, and the default engine for these tables are are the same as for all system tables: MyISAM. As you know MyISAM is not very transactional, so it is necessary to set this to use InnoDB instead if you really want crash-safety. To change the engine for these tables you can just use a normal ALTER TABLE.
slave> ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
slave> ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;
Note that this works for these tables because they were designed to allow any storage engine to be used for them, but it does not mean that you can change the storage engine for other system tables and expect it to work.
Event processing
This implementation of crash-safe slaves work naturally with both statement-based and row-based replication and there is nothing special that needs to be done in the normal cases. However, these tables interleave with the normal processing in a little different ways.
To understand how transactions are processed by the SQL thread, let us consider the following example transaction:
START TRANSACTION;
INSERT INTO articles(user, title, body)
VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
COMMIT;
This transaction will be written to the binary log and then sent over to the slave and written to the relay log in the usual way. Once it is read from the relay log for execution, it will be executed as if an update statement where added to the end of the transaction, before the commit:
START TRANSACTION;
INSERT INTO articles(user, title, body)
VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
UPDATE mysql.slave_relay_log_info
SET Master_log_pos = @@Exec_Master_Log_Pos,
Master_log_name = @@Relay_Master_Log_File,
Relay_log_name = @@Relay_Log_File,
Relay_log_pos = @@Relay_Log_Pos
COMMIT;
In this example, there is a number of pseudo-server variables (that is, they don't exist for real) that have the same name as the corresponding field in the result set from
SHOW SLAVE STATUS
. As you can see, the update of the position information is now inside the transcation and will be committed with the transaction, so if both
articles
and
mysql.slave_relay_log_info
are in
the same transactional engine, they will be committed as a unit.
This works well for the SQL thread, but what about the I/O thread? There are no transactions executed here at all, so when is the information in this table committed?
Since a commit to the table is expensive—in the same way as syncing a file to disk is expensive when using files as replication information repository—the updates of the slave_master_info
table is not updated with each processed event. Depending on the value of sync_master_info there are a few alternatives.
- If sync_master_info = 0
- In this case, the
slave_master_info
table is just updated when the slave starts or stops (for any reason, including errors), if the relay log is rotated, or if you execute a CHANGE MASTER
command.
- If sync_master_info > 0
- Then the
slave_master_info
table will be updated every sync_master_info event.
This means that while the slave is running, you cannot really see how much data has been read to the slave without stopping it. If it is important to see how the slave progress in reading events from the master, then you have to set
sync_master_info to some non-zero value, but you should be aware that there is a cost associated with doing this.
This does not usually pose a problem since the times you need to read the master replication information on a running replication is far and few between. It is much more common to read it when the slave has stopped for some reason: to figure out where the error is or to perform a master fail-over.
Closing remarks
We would be very interested in hearing any comments you have on this feature and how it is implemented. If you want to try this out for yourselves then you can download the MySQL 5.6 Milestone Development Release where all this is implemented from the
MySQL Developer Zone (dev.mysql.com
).
If you want to find out more details, the section
Slave Status Logs in the MySQL 5.6 reference manual will provide you with all the information.
This is one of the features that presented by Lars Thalmann April 11, 2011 (yesterday) at 2:30pm, at the "MySQL Replication" talk at Collaborate 11 and April 12, 2011 (today) 10:50am "MySQL Replication Update" at the O'Reilly MySQL Conference & Expo.