Tuesday, May 16, 2006

Replication of DELETE FROM versus TRUNCATE TABLE

A bug titled DELETE FROM inconsistency for NDB (Bug#19066) dropped into my lap, and while fixing it, we had to make some hard decisions on what should be considered the "correct" way to solve this.

The bug is related to the difference between TRUNCATE TABLE and DELETE FROM with no WHERE clause. On the surface, they seem to be equivalent, but when digging deeper, we will see that there is big difference between the statement when replication comes into play.

Before delving into the problem and the solution, I'll start by recapitulate some selected parts of the manual.

  • The TRUNCATE TABLE and DELETE FROM with no condition are "logically equivalent": TRUNCATE TABLE empties a table completely. Logically, this is equivalent to a DELETE statement that deletes all rows, but there are practical differences under some circumstances. (MySQL 5.1 Reference Manual, Section 13.2.9)
  • The TRUNCATE TABLE is implemented as a DROP + CREATE: Truncate operations drop and re-create the table, which is much faster than deleting rows one by one. (MySQL 5.1 Reference Manual, Section 13.2.9)
  • The DELETE FROM without a condition will delete each row in the table: ...a DELETE statement with no WHERE clause deletes all rows.(MySQL 5.1 Reference Manual, Section 13.2.1)
In other words, both statements are supposed to empty the table. When dealing with replication, however, the concept of logical equivalence is pushed to the limit. Normally, the operational behaviour and the post-condition is what we replicants use to guide us into what a certain statement should do. So what are the operational behaviour and the post-condition of the two statements?

The operational behaviour of TRUNCATE TABLE (let's just call it TRUNCATE henceforth) is to drop the table and re-create it and the post-condition is an empty table. In contrast, the operational behaviour of a DELETE FROM with no WHERE clause (let's call it DELETE-ALL henceforth) is to remove each row from the table, and the post-condition is that every row that were in the table is removed.

Wait a minute... there is no difference between these two. In both cases, the end result is an empty table! Isn't it?

Yes, that is correct... if you are running an isolated statement and executing the operations on the original table. Isolation on statement level typically achived using locks (either table locks or row locks), so there will usually not be any other process interfering with the operation.

Now, let us see what it is that makes these two statements behave differently even though they superficially appear to be equivalent.

NDB Cluster swings, but misses. The original bug report mention NDB, which is special for this particular bug in that the storage engine does not have (global) table locks [sic]. When NDB deletes the rows of a table using DELETE-ALL, it does it row-by-row. If another client starts to insert rows into that table, it might be that the rows remains in the table after the DELETE-ALL has completed. So the table will not be empty.

Interestingly enough, the TRUNCATE for NDB is implemented as a DELETE-ALL, so neither of the statements will empty the table if another client starts inserting rows.

Circular replication grapples and throws. Suppose that you have a setup like the multi-master setup described by Giuseppe Maxia.

In this case, the replication can progress in a circle. Suppose that you're adding rows to one of the servers (A) in the circle, while at the same time trying to empty the same table on another of the servers (B). Does it make sense to remove the inserted row together with the all the rows that were present in B? Not really... that would make B delete things that it has no idea about. It could well be that the administrator checked the table visually, decided that there's nothing there that needed, and issued a DELETE-ALL. It doesn't help if the operator lock the table to make sure that nobody changes the table. In this sense, DELETE-ALL should delete what the server knows about, and nothing else.

Multi-source replication is standing by to finish the job. We don't have multi-source replication yet, but we will have it in the near future. When we have and use it, there is a huge difference between emptying the table and deleting all rows in the table. Reading data from several different masters and inserting it into one table is a convenient way to aggregate data from, e.g., several different branches of a company (where the table contain a branch id as well, to avoid conflict). If one branch decides to drop all the data it has, only the rows related to that branch should be deleted, not data about all branches.

The moral of the story

  • If you want an empty table, you should use TRUNCATE TABLE, which is guaranteed to empty the table regardless of how the replication is set up.
  • If you want to delete the rows that are in the table (on the master), use DELETE FROM.