A little more than a year ago, I was hired to implement row-based replication to the MySQL database server. Since the principles are easy enough, I thought this would be a straightforward task to be done in a few months, tops. As always, I quickly got punished for my hybris: getting a basic row-based replication up and running was relatively straightforward but as the saying goes,
the devil is in the details.
Row-based replication is now safely tucked away in MySQL 5.1 for anybody who wishes to use it, but the obvious question is then
what does it give me and why should I use it?
When using statement based replication, the replication is accomplished by replicating the actual SQL statements to the slave server directly. That works fine for most statements, but in some situations, this does not work as expected. This time, we will only look into one such situation. Consider the following SQL statements:
UPDATE account
SET balance = balance + 100
WHERE name = "Sakila";
INSERT INTO transactions
VALUES ('Sakila', 'deposit', 100, UUID(), NULL);
The purpose is to update the balance and add a line to the transaction log that to keep track of all the transactions done to the accounts, including the user-id clerk that performed the transaction. This works fine when executed on the master, but when the slave thread executes the statements, it will be the
UUID()
of the SQL thread on the slave, which is really not what we want. It is, of course, possible to handle this for the built-in SQL functions, but for user defined functions (UDFs), there is no chance at all that we can make it work.
Instead of logging the entire statement, we can log just the row change made to the table, and this is what row-based replication is about. That way, we will insert
exactly the same row at the slave as we did on the master.
To control the logging format, we introduced a new server variable
BINLOG_FORMAT
that can take the values
STATEMENT
,
MIXED
, and
ROW
. The formats
STATEMENT
and
ROW
do what you expect, use statement-based replication or row-based replication respectively, while the
MIXED
mode will temporarily switch to row-based replication for the statement if it uses a function that will give a different results when executed on the master and by the SQL thread.
So, assume that I've got the following two tables:
master> describe transactions;
+--------+------------------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------------------------+------+-----+-------------------+-------+
| name | char(20) | YES | | | |
| kind | enum('deposit','withdrawal') | YES | | | |
| amount | decimal(10,2) | YES | | | |
| clerk | int(11) | YES | | | |
| time | timestamp | YES | | CURRENT_TIMESTAMP | |
+--------+------------------------------+------+-----+-------------------+-------+
5 rows in set (0.00 sec)
master> describe account;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| name | char(20) | YES | | | |
| balance | decimal(10,2) | YES | | | |
+---------+---------------+------+-----+---------+-------+
2 rows in set (0.01 sec)
Here's the how you do to set replication to use
MIXED
binlog format and execute the statements above:
master> SET BINLOG_FORMAT=MIXED;
Query OK, 0 rows affected (0.00 sec)
master> UPDATE account
-> SET balance = balance + 100
-> WHERE name = 'Sakila';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
master> INSERT INTO transactions
-> VALUES ('Sakila', 'deposit', 100, UUID(), NULL);
Query OK, 1 row affected, 1 warning (0.00 sec)
master> SHOW BINLOG EVENTS FROM 975;
+-------------------+------+------------+-----------+-------------+------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+-------------------+------+------------+-----------+-------------+------------------------------------------------------------------------------+
| master-bin.000001 | 975 | Query | 1 | 1102 | use `test`; UPDATE account SET balance = balance + 100 WHERE name = 'Sakila' |
| master-bin.000001 | 1102 | Table_map | 1 | 1155 | table_id: 17 (test.transactions) |
| master-bin.000001 | 1155 | Write_rows | 1 | 1206 | table_id: 17 flags: STMT_END_F |
+-------------------+------+------------+-----------+-------------+------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
The
UPDATE
statement is logged as a statement using a query event, while the
INSERT
statement is logged using a table map event and an event containing the rows inserted (one, in this case).
I'll explain more about the table map events and the various forms of row-containing events, but for now you have to trust me when I say that there are rows in the
Write_rows
event. :)