Monday, September 26, 2011

Python Interface to MySQL

There has been a lot of discussions lately about various non-SQL languages that provide access to databases without having to resort to using SQL. I wondered how difficult it would be to implement such an interface, so as an experiment, I implemented a simple interface in Python that similar to the document-oriented interfaces available elsewhere. The interface generate SQL queries to query the database, but does not require any knowlegdge of SQL to use. The syntax is inspired by JQuery, but since JQuery works with documents, the semantics is slightly different.

A simple example would look like this:

from native_db import *
server = Server(host='127.0.0.1')
server.test.t1.insert({'more': 3, 'magic': 'just a test', 'count': 0})
server.test.t1.insert({'more': 3, 'magic': 'just another test', 'count': 0})
server.test.t1.insert({'more': 4, 'magic': 'quadrant', 'count': 0})
server.test.t1.insert({'more': 5, 'magic': 'even more magic', 'count': 0})
for row in server.test.t1.find({'more': 3}):
  print "The magic is:", row['magic']
server.test.t1.update({'more': 3}, {'count': 'count+1'})
for row in server.test.t1.find({'more': 3}, ['magic', 'count']):
  print "The magic is:", row['magic'], "and the count is", row['count']
server.test.t1.delete({'more': 5})
The first line define a server to communicate with, which is simply done by creating a Server object with the necessary parameters. The constructor accepts the normal parameters for Connector/Python (which is what I'm using internally), but the user defaults to whatever getpass.getuser() returns, and the host default to 127.0.0.1, even though I've provided it here.

After that, the necessary methods are overridden so that server.database.table will refer to the table with name table in database with name database on the given server. One possibility would be to just skip the database and go directly on the table (using some default database name), but since this is just an experiment, I did this instead. After that, there are various methods defined to support searching, inserting, deleting, and updating.

Since this is intended to be a simple interface, autocommit is on. Each of the functions generate a single SQL statement, so they will be executed atomically if you're using InnoDB.

table.insert(row)
This function will insert the contents of the dictionary into the table. using the keys of the dictionary as column names. If the table does not exist, it will be created with a "best effort" guess of what types to use for the columns.
table.delete(condition)
This function will remove all rows in the table that matches the supplied dictionary. Currently, only equality mapping is supported, but see below for how it could be extended.
table.find(condition, fields="*")
This will search the table and return an iterable to the rows that match condition. If fields is supplied (as a list of field names), only those fields are returned.
table.update(condition, update)
This will search for rows matching condition and update each matching row according to the update dictionary. The values of the dictionary is used on the right side of the assignments of the UPDATE statement, so expressions can be given here as strings.

That's all folks!

The code is available at http://mats.kindahl.net/python/native_db.py if you're interested in trying it out. The code is very basic, and there's potential for a lot of extensions. If there's interest, I could probably create a repository somewhere.

Note that this is not a replacement for an ORM library. The intention is not to allow storing arbitrary objects in the database: the intention is to be able to query the database using a Python interface without resorting to using SQL.

I'm just playing around and testing some things out, and I'm not really sure if there is any interest in anything like this, so what do you think? Personally, I have no problems with using SQL, but since I'm working with MySQL on a daily basis, I'm strongly biased on the subject. For simple jobs, this is probably easier to work with than a "real" SQL interface, but it cannot handle as complex queries as SQL can (at least not without extensions).

There is a number of open issues for the implementation (this is just a small list of obvious ones):

Only equality searching supported
Searching can only be done with equality matches, but it is trivial to extend to support more complex comparisons. To allow more complex conditions, the condition supplied to find, delete, and update can actually be a string, in which case it is used "raw".

Conditions could be extended to support something like {'more': '>3'}, or a more object-oriented approach would be to support something similar to {'more': operator.gt(3)}.

No support for indexes
There's no support for indexes yet, but that can easily be added. The complication is what kind of indexes should be generated.

For example, right now rows are identified by their content, but if we want unique rows to be handled as a set? Imagine the following (not supported) query where we insert :

server.test.t1.insert(content with some more=3).find({'more': eq(3)})
In this case, we have to fetch the row identifiers for the inserted rows to be able to manipulate exactly those rows and none other. Not sure how to do this right now, but auto-inventing a row-identifier would mean that tables lacking it cannot be handled naturally.

Creating and dropping tables
The support for creation of tables is to create tables automatically if they do not exist. A simple heuristic is used to figure out the table definition, but this has obvious flaws if later inserts have more fields than the first one.

To support extending the table, one would have to generate an ALTER TABLE statement to "fix" the table.

There is no support for dropping tables... or databases.