High-Availability Groups
One of the central concepts used to construct a farm is the high-availability group (or just group when there is no risk of confusion) and is introduced by the high-availability extension. As mentioned in the previous post, the group concept does not really represent anything new but is rather a formalization of how we think and work with the structure of the farm. The key to supporting high-availability is to have redundancy in the system: if one component fail, another one should be ready to pick up the job of the failing component. Hardening the systems (by using hardware less prone to fail or hardware with built-in redundancy) can help reduce the chance of a component failing, but not completely eliminate it. Even a hardened system is susceptible to failure in a power outage or an earthquake. With this in mind, we introduced the group concept for managing pieces of data in our farm:each group consists of several machines that are responsible for managing the same piece of data.The concept of a group is an abstraction to model the basic concept that we're after, but does not really say anything about how it is implemented. This is intentional: it should be concrete enough to support all the operations we need, but abstract enough to not restrict how it is implemented. This is important because connectors (or any other "outside" observer) that work with groups should not have to be updated whenever new implementations are added. For example, it should not make a difference to a connector if the group is implemented using a traditional Master-Slave setup, a MySQL Cluster, or using replicated storage such as DRBD.
Server properties in groups
There are a few key properties that we assume for groups:- A server belong to (at most) one group.
- At any time, each server in the group have a designated a status.
- At any time, each server has a mode indicating if it accepts reads, writes, both, or neither.
- Each server also has a weight, which is the relative power of the server and is used to balance the load.
In Figure 1 you can see an example of how servers could change status, but note that at this time, we do not track all states. For example, we are considering how to handle the provisioning of new servers in flexible and extensible way, but more about that in a separate post.
Server Mode. The mode of the server gives information on whether it can be read or written and provide information for the connector on how it should send queries. For now, we only have three modes: Offline, Read-only, and Read-Write. Offline servers cannot be read from or written to, and usually does not accept connections. Read-only servers can only be read from and write transactions should not be sent to these. Read-Write servers are usually primaries of the group. They can accept writes and will propagate them correctly to other servers in the group.
Server Weight. The weight of a server is used to balance the load between servers. The weight represent the relative power of the server. When balancing the load between servers, the connector will figure out what servers are eligible for accepting a transaction and then pick one of the servers in such a way that the distribution over time will be proportional to the weight of the server.
Transaction properties
As mentioned before, one of the goals is to support sharding in the presence of transactions and to make that work correctly, it is necessary to declare up-front what the transaction will contain. Not everything, but the key elements of the transaction: what tables it will access, what sharding key is used, and if it is a read-only or read-write transaction. The first two properties are only necessary if you are working with a sharded system, so we skip those for now; the last one, however, is important for handling load-balancing in the connector.When executing transactions using a Fabric-aware connector, you provide the information about the transaction using transaction properties. There are several properties available, but we will focus on the ones related to group handling:group
and type
. The group
property is used to provide the name of the group you want to connect to (you can have several), and the type
property is used to tell if this is a read-only or read-write transaction. In the future, we might add more properties such as priority to indicate that this is an urgent transaction and a prompt reply is needed. For example, the following code is using a Fabric-aware connector to promote an employee.
from mysql.connector.fabric import ( TYPE_READWRITE, ) def promote_employee(conn, emp_no): stmts = [ ("SELECT salary INTO @salary FROM salaries" " WHERE emp_no = %s AND to_date = DATE('9999-01-01')"), ("UPDATE titles SET to_date = CURRENT_DATE()" " WHERE emp_no = %s and to_date = DATE('9999-01-01')"), ("UPDATE salaries SET to_date = CURRENT_DATE()" " WHERE emp_no = %s and to_date = DATE('9999-01-01')"), ("INSERT INTO titles VALUES" " (%s, 'Master of the Universe', CURRENT_DATE(), DATE('9999-01-01'))"), ("INSERT INTO salaries VALUES" " (%s, 10 * @salary, CURRENT_DATE(), DATE('9999-01-01'))"), ] # Use the group for the ACME company conn.set_property('group', 'ACME') conn.set_property('type', TYPE_READWRITE) conn.start_transaction() cur = conn.cursor() for stmt in stmts: print "Executing:", stmt % (emp_no,) cur.execute(stmt, (emp_no,)) conn.commit()On line 20 and 21 you see how the properties of the transaction is set. In this case, we declare the group that we will access (for example, a fictional company "ACME") and also the type of the transaction. After that, a transaction is started as normal and executed. The Fabric-aware connector will pick the right server to send the transaction to and you will get the result back in the normal fashion.
type
is not yet implemented in Connector/Python, some work remains to make it support load-balancing fully.Picking a server
But are these server and transaction properties sufficient for a connector to make a decision on what to do with a transaction? Let's take a look and see how the server can be selected.A server can be chosen by first selecting a set of candidates and then picking one of the candidates based on the weight of the server. Picking the candidates are done by matching the transaction properties and the server properties to find all server that are eligible for accepting the transaction. When a list of candidates are available you can, for example, pick one at random based on the weight of the servers. You can see an example Python code below that illustrates how this could be done. The first function find_candidates computes the set of candidates from the set of all servers SERVERS, while the second function pick_server pick one of the servers at random based on the weight of the server.
def find_candidates(props): candidates = [] for srv in SERVERS: if props.group == srv.group and (props.mode & srv.mode): candidates.append(srv) return candidates def pick_server(servers): random_weight = random() * sum(srv.weight for srv in servers) sum_weight = 0.0 for idx, srv in enumerate(servers): sum_weight += srv.weight if sum_weight > random_weight: return servers[idx] return servers[-1] # Last server in list # Example code for picking a server based on transaction properties pick_server(find_candidates(trans.props))
Implementation of groups
The reason to why we introduced the group concept in this manner is to be able to vary the implementation of a group, so the question is then, does it work? To see if it works, it is good to consider some sample implementations of high-availability groups and see if they can be described in this manner, so let's do that. Note that the only version that is currently implemented is the primary-secondary approach: the other ones are just food for thought (at this point).
In this approach, the primary would be in read-write mode, and the secondaries could either be offline or in read mode. Secondaries cannot accept writes since that might cause a split-brain situation, but they can either be in read-only mode or offline. Not loading the servers with read-only transactions can make it easier for the secondaries to be up to date with the primary, but this depends on the general load on the system. Scale-out slaves added would then, of course, be pure read-only servers, and they cannot be promoted to be masters because they do not have the binary log enabled. However, if the primary master fails, they still need to fail-over to the new primary.
If (when?) the primary master fails, MySQL Fabric will detect the failure and start executing a procedure to promote one of the secondary master to be primary instead of the one that failed. MySQL Fabric have to do this because the servers do not know how to handle the fail-over themselves, and in addition it is necessary to inform the connectors about the new topology. In this procedure, the scale-out servers have to be moved over to the new primary as well.
In this setup, the primary would then be a read-write server, while the standby server would be in offline mode. Scale-out servers would be in read-only mode, in this case attached to the primary.
The two solutions above employ different fail-over procedures that are executed by the Fabric node when it notices the failure. In contrast with the solutions above, MySQL Cluster is self-governing and does not require any fail-over handling implemented in the Fabric node.
Summary and considerations for the future
For the examples above, the properties we have outlined is definitely sufficient, but there might be other cases where more information is needed.One property that is missing in the current implementation is a way to select a server based on the proximity to the connector. For example, it could be possible to put the primaries and secondaries in a group in different data centers to ensure that it can handle a catastrophic failure. This, however, opens two issues:
- There will be a set of read servers in each data center that should be connected to the primary or secondary in the same data center.
- When the connector picks one of the candidates, it should prefer to use those in the same data center.
Related Links
- MySQL Forum Fabric, Sharding, HA, Utilities
- Presentation from MySQL Connect: MySQL Sharding: Tools and Best Practices for Horizontal Scaling
- Tips to Build a Fault-Tolerant Database Application
- Writing a Fault-tolerant Application using MySQL Fabric
- Fabric Sharding &emdash; Horizontal Scaling of MySQL
- MySQL Fabric Sharding - Migrating From an Unsharded to a Sharded Setup