Wednesday, April 26, 2006

Row-based replication for the future

I just read Eric Bergen's blog on row-based replication and application development from the presentation of Row-based replication at the User's Conference. Eric is giving all kinds of new ways to use the replication, for example that for a statement you can now configure the replication to only replicate changes to one of the tables in a multi-table statement. He is, however, missing the most important aspect: everything that we can do now and will be able to do in the future that we couldn't do with just statement-based replication. Here are some things that you can do with row-based replication that was not possible with statement-based replication.

Cluster Replication Cluster replication is already in 5.1, but it's worth to mention since it could not be handled with statement-based replication. Inside the cluster, everything is rows: rows are passed back and forth between the nodes and rows are collected to form result of queries. If you are inserting data into the tables using the MySQL NDB Cluster handler ha_ndbcluster, it will be replicated as usual and both statement-based and row-based replication will work. However, if you are using the NDB API to insert rows into the cluster, the rows are "lost" since the server never see those rows. To solve this problem, we invented an "injector" whose sole purpose is to inject rows into the binary log. The injector is created inside ha_ndbcluster and the rows that are inserted into the cluster are injected into the binary log.

Replication of individual partitions to different servers This is not in any version of the server and there are (not yet) any plans to add it, but it's something that is feasible when having row-based replication. Eric is mentioning ways to separate the rows going to different tables, and only replicate one of the tables. This is not limited to replicating different tables to different servers, you could even replicate different parts of the same table to different servers. For example, assume that you want to partition your data depending on how frequently it is accessed. Now, imagine that you could set up partitions for your table of blogs like this:

CREATE TABLE blogs (id INT ..., freq INT, ...)
      PARTITION blog5 VALUES LESS THAN (1000000)
Further imagine that you could set up replication to replicate the different partitions to different servers. We could, for example assume that we wanted to place the blog5 partition on high-end servers dedicated for handling high loads, while the less frequently accessed blogs would be placed on low-end servers. Each access would then also update the statistics, causing the row to move to other servers as it becomes less frequently accessed. To prevent the row from moving back and forth between partitions when it is bordering one of the ranges, we could do the partitioning on the result of calling a UDF, which implements hysteresis by taking trends into account.

Friday, April 21, 2006


Just got Bug#16993 closed, which provides a good lesson into the complications of replication. The original title was RBR: ALTER TABLE ZEROFILL AUTO_INCREMENT is not replicated correctly, but the problem is not related to row-based replication (RBR) nor to ZEROFILL. The culprit is adding an AUTO_INCREMENT column to a table, and it is not actually a bug (to be frank, it depends on your point of view, but "fixing" this bug causes more headaces than it solves as you'll see in a moment). What the example code in the bug description does is creating a table on the slave and the master, but inserting (identical) rows into the tables in different order on the slave and the master; for example in this way:
master> CREATE TABLE ages(name CHAR(30), age INT);
master> INSERT INTO ages SET name='Mats', age=37;
master> INSERT INTO ages SET name='Lill', age=25;
master> INSERT INTO ages SET name='Jon',  age=4;
 slave> INSERT INTO ages SET name='Mats', age=37;
 slave> INSERT INTO ages SET name='Jon',  age=4;
 slave> INSERT INTO ages SET name='Lill', age=25;
Now, if you look at the tables, you will see that the same rows are present in both tables (I'm just showing the result on one of the servers since they are identical):
mysql> SELECT * FROM ages ORDER BY name, age;
| name | age  |
| Mats |   37 | 
| Jon  |    4 | 
| Lill |   20 | 
3 rows in set (0.00 sec)
Why the ORDER BY? Well, potentially the rows could be listed in different order on the master and slave because of one of the following reasons:
  • If this were a real database, we would have been running the master and the slave separately for a while before deciding that we should replicate them and, even though the tables contain the same rows now, the rows would potentially be listed in different order.
  • We are using NDB Cluster as storage engine, and there we have no guarantee on the order of the rows, regardless of the order they were inserted.
Now we (or management) decides that we need to assign a unique id to each person in the table. That is easy, just add a column with an AUTO_INCREMENT option. Since we have replication running, the same change will be made to the table on the slave.
master> ALTER TABLE ages
To our surprise, we do not get the same id:s assigned to the people on the master as on the slave. Look here:
master> SELECT * FROM ages ORDER BY name, age;
| name | age  | id |
| Jon  |    4 |  3 | 
| Lill |   20 |  2 | 
| Mats |   37 |  1 | 

slave> SELECT * FROM ages ORDER BY name, age;
| name | age  | id |
| Jon  |    4 |  2 | 
| Lill |   20 |  3 | 
| Mats |   37 |  1 | 
So, what is happening under the hood? When executing an ALTER TABLE:
  1. a new table is created with the extra column
  2. the rows are copied over one by one from the old table to the new table in a storage engine-specific order
  3. the old table is dropped
  4. the new table is renamed to the name of the old table
Observe that the ALTER TABLE statement is replicated by statement, so it gets executed on both the slave and the master. So, if we're using MyISAM as storage engine, the rows are copied in the order they were inserted. For other storage engines, you get other row orders. So much for the problem, what about the solution and why is this not considered a bug? I will start with the solution, since this explains why we decided not to produce a bug fix for it, but rather adding a caveat to the documentation. The solution is simple: sort the rows before when adding them to the table. To do this, you have to repeat the steps above yourself, but add an ORDER BY clause when inserting the rows. So, the steps to add a column are:
  1. Create a new table with an extra column. Since you might want to add the new column at an arbitrary place, I give a generic solution.
    CREATE TABLE new_ages LIKE ages;
    ALTER TABLE new_ages
  2. Copy the rows into the new table.
  3. INSERT INTO new_ages(name,age) 
      SELECT name,age FROM ages ORDER BY name,age;
  4. Drop the old table.
    DROP TABLE ages;
  5. Rename the old table to the new table.
    ALTER TABLE new_ages RENAME ages;
Now, why do we not write a patch to fix this "bug"? Simply put, the ORDER BY involves a complete sort of the table so implementing a patch to do it this way would force sorting the table for every execution of an ALTER TABLE. In addition, it might not be necessary to sort the rows on every column in the table. Forcing this solution on every user that needs to do an ALTER TABLE would be a bigger evil than leaving the implementation as it is. And of course, if don't care where the new column is, you can replace the first two steps with:
  SELECT name,age FROM ages ORDER BY name,age;