Monday, June 18, 2007

BLOB locators + BLOB streaming + Replication = Yeah!

At the MySQL Conference & Expo 2007, I had the chance to meet up with Paul (the author of PBXT) and Mikael. We briefly touched on the BLOB Streaming Protocol that Paul is working on, which I find really neat. On the way back home, I traveled with Anders Karlsson (one of MySQL's Sales Engineers), who is responsible for the BLOB Locator worklog, and he described the concepts from his viewpoint.

Since I work with replication, these things got me thinking about what the impact is for replication and how it affects usability, efficiency, and scale-out. Being a RESTful guy, I started thinking about URIs both when Paul described the BLOB Streaming Protocol and when Anders started describing the BLOB Locators. Apparently, I wasn't the only one.

Combining BLOB Locators with the BLOB Streaming Protocol has a significant impact on the scalability and performance of replication, and I'm going to show how by looking at a typical use of replication: scaling out reads from an installation by replicating from a single master to several slaves.

Now, when a client connects to get a blob from the database, the server delivers a result set containing one or more blob locators. Since we are using URIs and the HTTP protocol, the blobs can be served by a normal web server, and the client can fetch the blob data using HTTP and build the real result set. The existence of blob locators is completely transparent to the client, which sees no difference from the previous implementation.
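
To make the client side concrete, here is a minimal Python sketch of how a client library might turn locators back into blob data. It is only an illustration under my own assumptions: the function names, the column layout, and the blobs.example.com URI are hypothetical, and the in-memory dictionary stands in for the web server so the example runs without a network.

```python
from typing import Callable, Iterable, Sequence
from urllib.request import urlopen


def http_fetch(uri: str) -> bytes:
    """Fetch a blob body with a plain HTTP GET."""
    with urlopen(uri) as response:
        return response.read()


def resolve_result_set(
    rows: Iterable[Sequence[object]],
    blob_columns: set[int],
    fetch: Callable[[str], bytes] = http_fetch,
) -> list[tuple[object, ...]]:
    """Replace each locator in the given columns with the blob it points to."""
    resolved = []
    for row in rows:
        resolved.append(tuple(
            fetch(value) if index in blob_columns and value is not None else value
            for index, value in enumerate(row)
        ))
    return resolved


# Hypothetical data: column 1 holds blob locators, NULL blobs stay None.
fake_blob_server = {"http://blobs.example.com/b/42": b"picture bytes"}
rows = [(1, "http://blobs.example.com/b/42"), (2, None)]
result = resolve_result_set(rows, {1}, fetch=fake_blob_server.__getitem__)
```

Passing the fetch function as a parameter keeps the sketch testable; with the default `http_fetch`, the same code would issue real GET requests against whatever server the locators name.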

Now, what does this give us that makes this setup so scalable?

  • Instead of storing the actual blob data, we store a reference to the data (in the form of a URI). When working with the blob and copying it to another table, we will actually just copy the reference, which is a very quick operation compared to copying the blob itself, given the size of most blobs. The use of the BLOB locator is entirely transparent to any operations on the blob: reading is not affected, and changing the blob can be accomplished using copy-on-write semantics (which of course makes the operation slower).

  • Since we have a unique reference to a blob, it is possible to implement caching mechanisms to cache the results of, e.g., full-text searches in the blobs.

  • The use of a URI makes the blob locator server-agnostic, which means that we can reliably replicate the URI instead of the blob and still expect any client that connects to the slave server to be able to fetch the blob using HTTP. There is no translation necessary when doing the replication, and the URI can be treated as just a string. This means that a scale-out strategy is trivial to implement. This is just a generalization of the recommended practice of storing the blobs as files on a server and saving the file name in the tables instead: we just make it transparent to the user and simplify the deployment.

  • By using a URI as the reference, we can put the blob data on a separate server, which can be dedicated to delivering blob data to requesters. Since everything is going via this server, it is very likely that "hot" data is available immediately, and since we are using a URI, delivery over the Internet can rely on web caches to avoid re-sending data that is already cached somewhere.

    We do not lose the ability to count the number of deliveries of the data, since we can always count the number of blob locators that have been delivered instead of the number of BLOBs that have been delivered.

  • The HTTP protocol supports both GET and PUT, so the same protocol can be used to read data from and write data to the server.

  • We offload a significant amount of "dumb" work from the server, that of assembling result sets consisting of blob data and other data, and therefore allow the server to perform more of the "intelligent" work of doing database searches.

  • The design is incredibly flexible since it is possible, for example, to place the blob server(s) anywhere, even in different towns, and still keep the main operating site in one location.
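
The points about copying references, copy-on-write, and replicating locators as plain strings can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how the BLOB Streaming engine or MySQL replication is implemented: `BlobServer`, `update_blob`, `replicate`, and the blobs.example.com base URI are all hypothetical names, a dictionary stands in for the HTTP blob server, and tables are plain dictionaries mapping a row key to a locator.

```python
import uuid


class BlobServer:
    """In-memory stand-in for the HTTP blob server; keys are blob URIs."""

    def __init__(self, base_uri: str) -> None:
        self.base_uri = base_uri.rstrip("/")
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under a fresh URI (an HTTP PUT in the real design)."""
        uri = f"{self.base_uri}/{uuid.uuid4().hex}"
        self.blobs[uri] = data
        return uri

    def get(self, uri: str) -> bytes:
        """Return the blob for a URI (an HTTP GET in the real design)."""
        return self.blobs[uri]


def update_blob(server: BlobServer, table: dict[str, str], key: str,
                new_data: bytes) -> None:
    """Copy-on-write: store a new version and repoint only this row."""
    table[key] = server.put(new_data)


def replicate(master_table: dict[str, str]) -> dict[str, str]:
    """Ship rows to a slave; only the locator strings travel."""
    return dict(master_table)


server = BlobServer("http://blobs.example.com/b")
master = {"a": server.put(b"original")}
master["b"] = master["a"]            # copying the blob copies only the locator
update_blob(server, master, "b", b"changed")
slave = replicate(master)            # no blob data is moved
```

After this runs, row "a" still resolves to the original data, row "b" resolves to the changed data, and a client reading from the slave gets exactly the same locators as one reading from the master.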


Paul McCullagh said...

Hi Mats,

You've got it! This is basically a reference model of the scalable part of what we aim to build at

As you say, the key is using a URI as a "BLOB Locator". This gives us incredible flexibility.

I'll be posting the first version of the BLOB Streaming engine shortly. The first version allows you to download BLOBs using HTTP, as I explained in my blog.

I would certainly appreciate your help when it comes to replicating the BLOBs as we get further down the road :)

Best regards,


Mats Kindahl said...

Hi Paul!

No problems, I'll be happy to try it out. As I said, I think this is a really neat idea.

I think it is important to make sure that the URI is not tied to the table name or column name in any manner, since there might be several versions of the BLOB "in the air" at the same time, and in addition, a change of table name or column name should not affect the URI.
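
A rough Python sketch of what I mean, with the caveat that `mint_locator`, the catalog layout, and the blobs.example.com base URI are purely hypothetical: the locator is an opaque identifier, so two versions get two different URIs and renaming a table leaves the stored locators untouched.

```python
import uuid


def mint_locator(base_uri: str) -> str:
    """Return a fresh opaque locator; no table or column name is encoded in it."""
    return f"{base_uri.rstrip('/')}/{uuid.uuid4().hex}"


# Two versions of the same logical BLOB can coexist under different URIs.
v1 = mint_locator("http://blobs.example.com/b")
v2 = mint_locator("http://blobs.example.com/b")

# Hypothetical catalog keyed by (table, column, row id).
catalog = {("photos", "image", 1): v2}

# Renaming the table changes the keys only; the locator string is untouched.
renamed = {("pictures", column, row_id): uri
           for (_table, column, row_id), uri in catalog.items()}
```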

I also think it is important to stick to the standard HTTP 1.1 protocol, but it seems that you're on that path already.