As
Gary already pointed out, the replication behavior when mixing
non-transactional and transactional tables have changed between 5.1.30
and 5.1.31. Why did it change? Well, for starters, it actually never
worked and the existing behavior was fooling people that it actually
worked.
There are several bugs reported on mixing transactional and
non-transactional tables in statements and in transactions. For the two latest
examples, see BUG#28976 and BUG#40116.
To explain the situation, I will first start with some background...
The binary log
For this discussion, we call a statement that manipulates
transactional tables only a
transactional statement and call
it a
non-transactional statement otherwise.
The binary log is a serial history of the transactions
executed on the master, that is, each transaction is written to the
binary log at commit time. To handle this, the binary log has a
thread-specific transaction cache as well as the actual binary
log. Whenever a transactional statement arrives to the binary log, it
is cached in the transaction cache, and once the transaction commits,
the transaction cache is written to the binary log. Non-transactional
statement, on the other hand, are written directly to the binary log
since they take effect immediately. Note that this is an idealized
picture of how it works, and it lacks a lot of critical details, which
we will cover below.
Non-transactional statements in transactions
When using a non-transactional table in a transaction, the effects are
actually written directly to the table and not managed by the
transaction at all. Thanks to the
locking
scheduler, this can guarantee a serialization order of the
statements, but there is no guarantees of atomicity. Note, however,
that the locks are released at the end of the
statement, not
the transaction. This is a result of the MyISAM legacy, which does not
have a notion of transactions. In other words, this is what you get in
a sample execution:
T1> create table myisam_tbl (a int) engine=myisam;
Query OK, 0 rows affected (0.04 sec)
T1> create table innodb_tbl (a int) engine=innodb;
Query OK, 0 rows affected (0.00 sec)
T1> begin;
Query OK, 0 rows affected (0.00 sec)
T1> insert into myisam_tbl values (1),(2);
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
T1> insert into innodb_tbl values (1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
T2> select * from myisam_tbl;
+------+
| a |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.00 sec)
T2> select * from innodb_tbl;
Empty set (0.00 sec)
Replicating non-transactional statements in transactions is easy
So, how should one handle this case in replication? Note that we right
now only consider statement-based replication (but we will consider
row-based replication later).
So, let us say that we have the transaction above, that is:
BEGIN;
INSERT INTO myisam_tbl VALUES (1),(2);
INSERT INTO innodb_tbl VALUES (1),(2);
COMMIT;
As far as possible, we want to mimic the actual behavior, meaning
that the non-transactional change should be replicated to the slave as
soon as possible so that even if the transaction does not commit for a
long time, the MyISAM change should be visible on the slave
immediately.
OK, this is simple, if a non-transactional statement is inside a
transaction, we just write it to the binary log. The effects have
already taken place, so of course we are write it directly to the
binary log (right?).
Fine, so what about this case then:
BEGIN;
INSERT INTO innodb_tbl VALUES (1),(2);
INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;
COMMIT;
Ew, um, well... no, that does not work. If the
myisam_tbl
insert is written before the
innodb_tbl
insert, it will
not have the 1 and 2.
Bummer...
Hum... OK, so let's say that we only write the non-transactional
statement that is first in the transaction directly to the binary log.
Then we will have it replicated directly to the slave, but for the other case, when the non-transactional statement is not first in the transaction, we will have the
non-transactional statement stored in the transaction cache so that it
can read from the transactional table.
This is how it worked in MySQL at least since 4.1 (I didn't look
earlier), and it serves well...
... unless you consider all the caveats
That was a very rosy picture, but the reality is not that easy to
handle.
Rolling back a transaction.
If the transaction is rolled back, and a non-transactional statement
was in the transaction cache, it is still necessary to write the
transaction to the binary log (with a
ROLLBACK
last).
This is wasting resources (disk space) since if the transaction
contained only transactional statements, it could just be
discarded.
Also, it will be incorrect in the case that the
non-transactional engine is replaced with a transactional engine on
the slave, or if there is a crash after the non-transactional
statement. It only works if the statement is first in the
transaction, because then it will be committed as a separate
transaction ahead of the transaction that is rolled back.
For this reason...
Non-transactional statement need to be at the beginning of a
transaction.
Since only the statements that are at the beginning is "written
ahead", it means that the user have to remember to put the
non-transactional statements first in the transaction. This lulls an
inexperienced user, or one that just didn't consider replication when
writing the transactions, into the idea that replication can handle
the transaction even when the statement is not first in the
transaction.
Incidentally, since we're talking about non-transactional
statements...
What statements are non-transactional?
So, this is the scenario mentioned in
BUG#40116. Let us
introduce two tables, one MyISAM log table and two InnoDB tables
holding the business data. We then add a trigger to log any changes
to one of the InnoDB tables like this:
CREATE TABLE tbl (f INT) ENGINE=INNODB;
CREATE TABLE extra (f INT) ENGINE=INNODB;
CREATE TABLE log (r INT) ENGINE=MYISAM;
CREATE TRIGGER tbl_tr AFTER INSERT ON tbl FOR EACH ROW
INSERT INTO log VALUES ( NEW.f );
Now, let's have a look at this transaction (deliberately without a
COMMIT
or
ROLLBACK
):
INSERT INTO tbl VALUES (1);
INSERT INTO extra VALUES (2);
What about the first statement in the transaction? Is it
non-transactional or transactional? What do we do once we have seen
only that statement?
If we treat the statement as non-transactional statement and write it
ahead of the transaction, we have to make a decision there and then on
whether to commit it or to roll it back, which is just not possible
(it will be wrong whatever we decide).
Another alternative is to look at the "top level" table and treat it
as transactional (because tbl
is transactional). This
appears what 5.0 is doing. This means that the statement would be put
in the transaction cache, but if it is later decided that the
statement should roll back, we have to write the transaction to the
binary log with a ROLLBACK
last. This would work in an
acceptable manner, but what happens if the engines are switched so
that the non-transactional table is the "top level" table? Does this
mean that the statement suddenly becomes non-transactional and is
written ahead of the transaction? If we do that, we will have all the
problems described the previous paragraph about being forced to make a
good decision there and then.
Ooooookey... so we have to cache the statement even if it contains
non-transactional changes. Fine. The only case that we can actually
write ahead is when we have non-transactional statement not containing
any transactional changes and that statement is first in a
transaction... and there is a lot of logic to check that case.
So, we decided to remove that logic and always write statements inside
a transaction to the transaction cache. The only remaining piece of
the logic is that a transaction containing a non-transactional change
is written to the binary log with a ROLLBACK
last. If
there are only transactional changes, the transaction cache is just
tossed and nothing written to the binary log.
The only thing remaining is to print a warning when a statement
containing non-transactional changes is put in the transaction
cache. This is not the case right now: the server prints a warning
when a transaction holding a non-transactional change is rolled
back, which in my view is a tad to late, since the problem
actually occurs when the statement is written to the binary
log.
What about row-based replication?
Until now, we have discussed statement-based replication only, but for
row-based replication we can actually do better. Any changes that are
non-transactional can be written ahead of the transaction since there
are no dependencies on statements inside the transaction. There are
only two problems:
- For performance reasons, rows are cached and written to the
binary log when the statement commits. This means that if there is a
crash before finishing the statement, the rows written for the
statement as far is it got is lost. For most transactional engines,
this is not a problem, since the changes will be rolled back, but
for MyISAM, the changes can stay, leading to inconsistencies between
the tables on the master and the slave. Since it is not reasonable
to handle this case, we assume that each statement for a
non-transactional engines is atomic.
- There is only one transaction cache, and for the
non-transactional changes to "overtake" the transactional changes,
we need an additional cache that is flushed for each statement. This
will be implemented as part of WL#2687.