- Adding a timestamp column on the slave to see when the row was last updated.
- Eliminating some columns on the slave because you don't need them and they take up space that you can use for better purposes.
- Temporarily handling an on-line upgrade of a dual-master or circular replication setup.
Master | Slave |
---|---|
CREATE TABLE employee ( id SMALLINT AUTO_INCREMENT, name VARCHAR(64), email VARCHAR(64), PRIMARY KEY (id)) |
CREATE TABLE employee ( id SMALLINT AUTO_INCREMENT, name VARCHAR(64), email VARCHAR(64), ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP PRIMARY KEY (id)) |
master> INSERT INTO employee(name, email) VALUES ('Mats', 'mats@example.com'); master> DELETE FROM employee WHERE email = 'mats@example.com'; master> UPDATE employee SET name = 'Matz' WHERE email = 'mats@example.com';In all these cases, the statements execute perfectly well with both table definition since the "missing" column has a default value and each statement gives exactly the names of the columns to update. The
DELETE
and UPDATE
statements naturally
refer only to the column on the master, but for INSERT
it
is necessary to add the column names even if the tuple matches the
definition on the master since it could be different on the slave.Having to give the column names all the time is fragile and if the user—or the application—makes a mistake and types the following statement, replication on the slave will stop with an error:
master> INSERT INTO employee VALUES (DEFAULT, 'Mats', 'mats@example.com');In contrast to statement-based replication, row-based replication will do the right thing and throw away extra columns sent by the master or add default values to extra columns on the slave—if the column has a default value—provided that the columns are added or removed last in the table.
This works fine for the example above since the extra timestamp column is last in the table. The effect is to keep track of when the row was last updated on the slave, which could be used to see if the row is current.
Master | Slave |
---|---|
CREATE TABLE employee ( id SMALLINT AUTO_INCREMENT, name CHAR(64), email CHAR(64), PRIMARY KEY (id)) |
CREATE TABLE employee ( id SMALLINT AUTO_INCREMENT, name VARCHAR(64), email VARCHAR(64), PRIMARY KEY (id)) |
VARCHAR
field instead of a
CHAR
field—recall that VARCHAR
fields
are variable length strings while CHAR
fields occupy a
fixed space in the row. (We don't care too much about the reasons for
using CHAR
on the master, we just use this example to
illustrate the problem.)
When using statement-based replication, this works well since the actual
statement is replicated. However, when using row-based replication we
have the additional requirement (in 5.1) that the column types
have to have identical base types. Unfortunately,
CHAR
and VARCHAR
does not have the same base
type, so replication will stop with an error when you try to execute
the INSERT
, which is not very helpful.
Fortunately, the replication team have extended row-based replication with a new feature in MySQL 5.5: that of converting between types when replicating from a master and to a slave with a different table definition. With this feature, a stricter type checking is also implemented and better error messages.
The conversion checks the declared types on the master and slave and decides before executing the transaction if the conversion is allowed. This means that it does not investigate the actual values replicated: only the types of the column on the master and the slave. In addition to better performance when not checking each value this check is done so that you can be sure that any value replicated between the tables will work, not just the values that you happened to have in your test suite.
When dealing with conversions, we are only considering conversions within the groups below.
- Integer types
TINYINT
,SMALLINT
,MEDIUMINT
,INT
,BIGINT
- Decimal types
DECIMAL
,FLOAT
,DOUBLE
,NUMERIC
- String types
CHAR(N)
,VARCHAR(N)
,TEXT
even for different values of N on master and slave.- Binary types
BINARY(N)
,VARBINARY(N)
,BLOB
even for different values for N on master and slave.- Bit types
- Conversion between
BIT(N)
for different values of N on master and slave.
Within each group, we also have two types of conversions:
non-lossy conversions and lossy conversions. With a
non-lossy conversion you are guaranteed that no information is lost,
but with lossy conversions it is possible that you lose some
information. A typical example of a non-lossy conversion is converting
from a CHAR(32)
field to a CHAR(64)
field—since the target field is wider than the source field,
there is no risk that any part of the string is lost. Converting in
the other direction, however, is a lossy conversion since a string
with more than 32 characters cannot fit into a CHAR(32)
field. A more odd example is conversion between FLOAT
and
DECIMAL(N,M)
, which are always considered lossy,
regardless of the direction the conversion is done. Since it
cannot be guaranteed that all floating-point numbers can be converted
to decimal numbers without losing precision, and vice versa.
Controlling what conversions are allowed is controlled with a new
server variable SLAVE_TYPE_CONVERSIONS
, which is of the
type SET('ALL_LOSSY','ALL_NON_LOSSY')
, that is, it is a
set of allowed conversions. The default for this variable is
the empty set, meaning that no conversions are allowed at all.
If the ALL_NON_LOSSY
constant is in the set, all
conversions (within each group) that do not lose any information are
allowed. For example, replicating from CHAR(32)
to
TINYTEXT
is allowed since the conversion goes to a wider
field (even if it is a different type).
If the ALL_LOSSY
constant is in the set, all conversions
(again, within the same group) that could potentially lose information
is allowed. For example, conversion to a narrower field on the slave,
such as CHAR(32)
to CHAR(16)
is
allowed. Note that non-lossy conversions are not automatically
allowed when ALL_LOSSY
is set.
ALL
is used since we were considering the possibility of allowing conversions within certain groups only, for example, to add the feature of only allowing lossy conversions for strings and non-lossy conversions for integers, we could set SLAVE_TYPE_CONVERSIONS
to 'STRING_LOSSY,INTEGER_NON_LOSSY'
. This is, however, pure speculations at this time.
3 comments:
Hi Mats! Nice write-up. I didn't know about the pending work on charsets for replication--that's a good feature.
On another topic we have run into problems with lossy replication with float and double values in row replication. In the case of double we run into particular trouble because MySQL double has more precision than Java, which uses IEEE formats. Is double replication loss-free for you across all platforms?
Hi Robert!
The internal representation of double in MySQL is indeed a C double. The size of this depends to a large extent on the compiler used, but it is common to use IEEE double precision floats for this (which is covers the required range and precision imposed by the C standard).
For row-based replication, the value is replicated in binary form and is therefore non-lossy, as long as the value is not converted to another type.
For statement-based replication, the value is converted to base-10 (a string) and converted back to a floating-point again, which can give interesting results.
So it seems then that there must be conversion weirdness on some float or double values as you send RR updates across platforms? It therefore sounds somewhat like the RR charset issue, namely that things work as long as master and slave match exactly. At least with float/double you don't have the problem of the representation being changed administratively as you do with charsets.
Post a Comment