Some of you may be familiar with the bestseller in business literature The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, by Harvard professor Clayton Christensen. In his book, Prof. Christensen compared disk drives to fruit flies. The comparison relates to the rapid changes that disrupted the disk drive industry for decades. That disruption is compared to the rapid changes that take place to fruit flies that live only for few hours, and for this reason researchers can study and analyse their behaviour.
In the software business, you can replace disk drives with databases. Obviously, databases live longer than fruit flies, but it is an industry that sometimes shows schizophrenic changes. Whilst it is true that relational databases have being dominated the scene for decades, readers with grey hair and reading glasses have certainly experienced tens, if not hundreds of innovative changes in databases since the 70′s. Hierarchical, relational, object oriented, object relational, framework dependent, multi-dimensional and many other types of databases have been introduced to satisfy the need for faster, wider, more sophisticated data management. The 21st century, with the advent of web and mobile applications, has been so far the playground for even more innovative products, most of them under the umbrella of two overused terms, Big Data and NoSQL. Make no mistake, there is even more to come: the Internet of Things will sparkle a new wave of innovation in the industry.
How does this long premise apply to MySQL and MariaDB? The truth is, MySQL has been stagnating for almost 6 years. In 6 years, we have been witnesses to the explosion of a huge amount of data that has to be stored, retrieved and managed. We have also seen a significant change in the type of data: accounting, retail and financial information were predominant in the 80’s and 90’s whilst today it is all about multimedia content – videos, photos, audio clips, but also documents, books, executable programs, text messages, logs and tracing information – today’s “typical” data.
When I say that MySQL has been stagnating, I do not mean that MySQL has not been improved. Versions 5.5 and 5.6 are substantial milestones in MySQL’s history, but Oracle engineers have been focusing on fixing old-time issues, mostly performance related. In a way, some of the companies in the MySQL ecosystem have been led to follow the same path. These companies have worked on improving and stabilising their products, mainly due to the limits in the investments in disruptive technologies that they can afford. A good example is Tokutek: Tim Callaghan and Co. have done a tremendous job and their storage engine has reached a good and strong maturity, but the prize for the most innovative product in the Tokutek offices today goes to TokuMX, which is related to MongoDB, not MySQL.
5.5 and 5.6 are what professor Christensen calls sustaining technologies. Sustaining technologies, generally speaking, improve the performance of a product – “performance” not only in database terms, but in a more generic term, applicable to any product. What we are seeking here is disruptive technologies, and they are difficult to find. Disruptive technologies look at a problem with a new perspective, with a new way to fix issues, they introduce new and more innovative features. At the beginning, they may provide worse performances, but in the long term they simply provide “a better solution.”
The proof to this point – i.e. that 5.5 and 5.6 are sustaining technologies – is the recent announcement from Facebook, Google, Twitter and LinkedIn about WebScaleSQL. WebScaleSQL is today an even better MySQL, with more tests, better code, and better optimisation. In WebScaleSQL there are some interesting features that are the seed for more disruptive technology, such as the read-ahead mechanism and more.
Disruptive technologies in the MySQL ecosystem
Are there disruptive technologies in the MySQL ecosystem? Indeed there are. MySQL 5.6 was announced as a GA on 5th of February 2013. That is a sustaining technology. The following day, the MariaDB Foundation announced the second alpha version of MariaDB 10. Version 10.0.1 was a very incomplete and disruptive technology. It took almost 14 months to create something innovative and to incorporate components that other companies have been trying to develop, sometimes for many years.
Why do I call MariaDB 10 a disruptive technology? Here are the most important points, in case you have missed some:
- For the first time, we have removed (or at least significantly reduced) the slave replication lag. Parallel replication is one of the most important and long awaited features which opens the doors to a more extensive use of master/slave replication for read scalability. See Kristian Nielsen’s blog on the subject.
- We have introduced multi-source replication and more sophisticated replication topologies. We can now consolidate data coming from multiple masters into a slave node. It means, for example, that we can manage to connect multiple departments and data centers together. This is a significant contribution provided by Lixun Peng and Taobao.
- We are sensibly reducing the downtime for online databases. The combination of MariaDB’s Global Transaction ID (which is fundamentally different from MySQL 5.6 GTID) and Galera Replication from Codership takes the availability of MySQL databases into a brand new territory, at a lower cost and with a significant reduction in the complexity of the infrastructure. Please note that MariaDB 10 Galera Cluster is not GA yet.
- We have a significantly improved built-in sharding solution for MariaDB. 6 years ago, a young man in a Spiderman suit presented for the first time a new storage engine, called Spider. A few days ago, that same man, Kentoku Shiba, released Spider 3.2. The engine is now included in MariaDB 10 and is an extremely promising technology that has been substantially improved, and now includes also high availability, in-shard scalability and distributed transactions. Spider 3.2 is not GA yet, but previous versions are already used in production in medium size sites around the globe and we expect a significant boost in its usage in the future.
- We have reorganised the way we integrate MariaDB/MySQL with other data sources. On one side, we have improved the way storage engines can interact with the core of MariaDB: for example, tables can be created or altered using engine-specific attributes that simplify the use of new features in the engine. One example of new engines developed by MariaDB is Cassandra, which allows the use of MariaDB and the MySQL clients as an interface to read and write data from/to a Cassandra cluster. We have also introduced a brand new storage engine called Connect. The Connect engine, as the name says, connects external sources, such as external DB tables (also from other DBMS), files, folders, logs, etc., in reading and writing. Users can now use applications built on MySQL and MariaDB to directly access heterogenous environments, without moving data back and forth among databases. They can also create joins and view results that are a combination of data coming from multiple sources. This is the very first version of Connect, so you should use it carefully.
From the innovator’s to the DBA’s dilemma – innovation vs. compatibility?
- MySQL 5.6 has Unicode support in the mysql command line utility
- At the moment, the memcached protocol is not available in MariaDB 10
- In MariaDB 10 the default storage engine for temporary tables cannot be specified
- In MySQL 5.6 it is possible to exchange partitions between tables
- Binlog variables and settings are different
- The block encryption mode for block-mode AES algorithms are handled by a parameter in Mysql 5.6
- MySQL 5.6 can specify where to write the core file if the server crashes
- MariaDB 10 provides a full set of parameters for the Aria engine
- MariaDB 10 provides more parameters to handle the connection pool
- MariaDB 10 provides more parameters to change the behaviour of the optimizer
- MariaDB 10 offers XtraDB parameters that are not available in MySQL 5.6
- MariaDB 10 identifies the transactional state of a session
- MariaDB 10 provides dynamic and virtual columns functions
Time for conclusions