## Friday, December 21, 2012

### Galera features beyond just HA

Galera from codership has been getting a lot of attention recently. Galera provides a nice High Availability solution for MySQL where Galera provides synchronous replication with conflict detection using the classic InnoDB Storage Engine. No more playing about with special storage engines of DRBD failover, just continue to use InnoDB and add Galera as the secret sauce for High Availability.

Some of the neat features of Galera are, but are not limited to, multi-master replication, a lightweight implementation of replication and zero failover times due to the multi.master ability. This is not a complete HA solution though, just a component of it, we still need to add some monitoring and failover mechanisms, but as Galera is multi-master this is greatly simplified and can in many cases be handled by the driver or the application with little overhead.

Now, the replication in Galera is synchronous, so that should slow things down a bit, right? Well, yes, but on the other hand Galera can use multiple threads to apply data on the slave, so that should compensate for that somewhat. And how does it compare to MySQL Semi-synchronous replication, which on paper should be that much different?

So I was curious about the multi-threaded apply on the slave that Galera supports? Could this be the multi-thread apply that MySQL has been waiting for all this time? (No, the schema parallel implementation in MySQL 5.6 doesn't count in my mind). So I set out to try this, and this is my thinking:
• The parallel nature of this should be best exposed when you have many small transaction, so each INSERT is a single row, autocommit transaction.
• For the sake of the test, remove as much InnoDB overhead as possible and run on Ramdisk (tmpfs)
• The schema should be simple
• Simple INSERTs are to be tested, nothing else
• Multi-master operation, but no conflicts.
This is admittedly a simple testcase, but it should tell us something. The schema looks like this:
CREATE TABLE tab1 (
c1 int(11) NOT NULL,
c2 char(100) DEFAULT NULL,
PRIMARY KEY (c1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And the data to be inserted is
• column c1 - Unique sequential integer.
• column c2 - A random string of characters of 5 to 100 characters length.
• 1.000.000 rows are inserted using 400 threads (200 on each MySQL server).
The hardware I am using for this test is my usual homebrew 8 core AMD box with 16G RAM, nothing exciting but useful.
InnoDB was standard configured here, nothing special, and Galera was using 16 apply threads on the slave, which is probably excessive for this use case. Both MySQL and Galera was using two MySQL servers on the same box.
• MySQL with semi-synchronous replication ached some 4.830 INSERTs per second.
• Galera achieved some 12.987 INSERTs per second, nearly 3 times the performance!
This test wasn't something scientific, but then most applications aren't terribly scientific either. To me, it seems like Galera is the Replication system MySQL should have had from the beginning! This rocks! And this also proves to me what I was thinking from the start, that Galera has more to give than a plain HA solution!

/Karlsson

## Monday, November 26, 2012

### In my mind: Why the ORDBMS idea failed

Some 15 years ago, the idea of an ORDBMS (Object-Relational Database Management System) was red hot, and I was very close to the flaming hot center of that. I worked for Informix at the time, and Informix bought Illustra which was the hottest and coolest of the databases if it's time, hey it was an ORDBMS.

This was not a bad idea per se, and I got entangled with it and was really enthusiastic about the idea and I spent a lot of time evangelizing this technology. For Informix, this was as much market positioning and a technical change, Informix went from being the cheap redneck cousin to become the Gordon Gekko of databases. Before this, Oracle was the Big Market Leader, Sybase was the technology leader and Informix was the price leader (no, I'm not talking technical realities, there was a lot of good technical stuff to all of these, this is about how the world at large perceived these guys). But Illustra and another Informix project, XPS (aimed at the data warehouse market) was going to take Informix to places it had never been before. Oh, the Billboard wars, the day when Informix went past Sybase, those were fun days.

From a financial POV, Informix lost it, we already know that (read "The Real Story of Informix Software and Phil White" by "Steve W. Martin" ISBN: 978-0-09721822-2-5), but that's not, in my mind, the whole story, and I think that even though I think there are many good prspects for an ORDBMS system, it's not really as generic as I figured it back then (OK, I was wrong, I admit it, it does happen).

From a technical standpoint what went wrong was (this is my take on it, by the way) that the cool ORDBMS features shoehorned into an aging Informix RDBMS design ended up being largely the worst of both worlds. That has been fixed, to an extent, in more recent Informix releases, but not it's too late :-(

And from a conceptual view, this is also what I think is wrong with the whole ORDBMS thinking. I know and love the traditional RDBMS model, with a fixed number of columns and a variable number of rows, even if this is a simple model, it works real well for data. It makes plain data easy to visualize and understand, and this is also a well researched and understood model for data. As for OO, then this has been thoroughly researched, but the implementations and functionalities differ a lot. Also, OO has a developer focues way of looking at data, for an application, and Objects is a natural way of looking at things and makes things easy, from an application POV. But representing data as an Object is a different thing. Not a bad or good way, but different. The Relational model also lends itself to building control structures for data as it, assuming that the RDBMS is used in some kind of normalized form, is representing data at a very low level, lower than what most applications or end-users view data. And Objects are a way of combining all this data into something that is more application centric.

So the ORDBMS systems turned non-OO enough to not attract the OO people, and at the same time the OO features were non-Relational enough to make the SQL-experts ignore them. (Like: "Why would I want a result set with a variable number of columns?"),

And before I close this: Yes, I know there are many ORDBMS applications out there, that works well and where the application utilize all the cool ORDBMS features. Also, in Oracle and in Particular Postgres and others, there are ORDBMS features that are developed. And inside Postgres, the ORDBMS features is a building block for more than one generic RDBMS feature. But for database people in general, ORDBMS is something we don't see much of.

/Karlsson

## Thursday, November 22, 2012

### Character sets, Collations, UTF-8 and all that

Yesterday at the first Swedish MySQL User Group real meeting here in Stockholm, I presented a talk on character sets, collations and stuff like that. If you read this blog, you know that I have written about this before, but the presentation I did yesterday was a fair bit more detailed. You can view the full presentation on slideshare:
One thing I talked a lot on was collations and how they affect matters, and this has more of an impact than you think, in particular when using UTF-8. You would think that using UTF-8 most character set problems are solved (at least when using 4-byte UTF-8), but no. Collations are still added to this, and there are many of them and the effect of choosing the wrong one can be real bad.

Let me take an example. You would think think that using a UNIQUE or PRIMARY KEY on a text-based column (using something like a VARCHAR or CHAR type) in a table would ensure that any two strings are unique, but that two strings values that are different may coexist in two different rows. Think again.

A collation defines how characters in a character set are sorted and compared. And most localized collations have some weird attributes to them. There are things that linguistics think are reasonable for a particular language, and that are hence present in the UNICODE standard, but it might not be widely accepted by the community at large. So back to my original example. Let's say we are in Sweden, then 4 (yes, four) different collations may be applicable:
• utf8 binary - This is a plain binary collation, comparisons are done on the binary value of the characters.
• utf8_unicode - This is a pretty reasonable collations based on some generic compromise in UNICODE on how things are sorted, and are not sorted across the globe. Sort of.
• utf8_general - This is a simplified, faster general variation compared to utf8_Unicode
• utf8_swedish - This is a collations that is specific to Sweden with some interesting Swedish specifics.
So lets's see how this work in practice. Lets try a table that looks like this:
CREATE TABLE utf8_table (
swedishname char(10) CHARACTER SET utf8

COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (swedishname)
) ENGINE=InnoDB;

What happens with this data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
In Sweden, these two are unique (the second A has an umlaut). In the rest of the world, these two are the same, so the above will not work, a PRIMARY KEY error will happen on the second row, despite the characters being different! So we try this instead:
CREATE TABLE utf8_table (
swedishname char(10) CHARACTER SET utf8

COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (swedishname)
) ENGINE=InnoDB;

And with the same data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
And this works as it should work, both rows are inserted!

I will write another blog post on this soon, with some more examples eventually, but for now:
Cheers
/Karlsson
PS. I apologize if you have problems reading the above, it probably comes from the fact that there are embedded Swedish characters in the text :-(

## Tuesday, November 13, 2012

### This sucks! Well, maybe it does and maybe it doesn't...

Imagine that Microsoft and Apple got into a big fight for the market some 15 years ago and that Apple lost. Big time. Apple went down completely and there was nothing left. And as an IT expert, you were called in to look at what remained, what could be salvaged and what was just a waste of everyones time and money.

If you had seen the iPhone back then what would you have said? (I'm not so sure myself, chances are I would have been terribly negative). Note that there would have been no AppStore, no HTML5 sites, none of that neat stuff.

Or to make a different analogy: Was VHS better than BetaMax? Well, that depends on who you ask: The end consumer wanting to rent a movie or the techie looking at the specifications of the technology in question.

Just after the second world war, in a Germany in shambles, the allies went in and had that look at Germany, and with them they brought some smart dudes, to look at what was useful, what was not and what was rubbish. Reginald Rootes, who together with his brother Billy ran the Rootes Group, one of the big 5 producers of cars i Britain at the time, came along to, among other places, Wolfsburg to have a look at the VW plant. Despite being adviced that the VW was a viable product and seeing it himself, Reggie wasn't interested. Now, some 60+ years later, all the remaining Rootes brands and factories are long gone (the last one, producing Peugeots, closed in 2007). And VW is fighting with Toyota for the title of the world largest car maker.

All in all, stuff that might have serious issues, might be just because of development issues, and you need to look further down the road to see the potential. And don't make the mistake of thinking that the good or bad implementation of an idea says much about the real potential of that idea.

Take Virtualization. Running a database in a virtualized environment was a big no-no just a few years ago. Now things have developed, performance is much better and many of us can use a virtualized environment for many, if not most, of our database needs. Be it Oracle, MySQL or Postgres or whatever.

Go back 20 years and ask yourself how you would have reacted if someone told you that in 20 years, many large enterprises would have large parts of their infrastructure run on a operating system developed by a Finnish student in his spare time in an outsourced environment run by an Internet bookstore company? Nah, don't think so.

So what is the next big thing then? I try to spend some time on it, and when I get to test or try something, I really try to separate the implementation of the idea, be it a new operating system, a new type of access method or whatever, from the actual implementation. The latter says less about the former than you think.

Also, technology isn't everything. Far from it. The best technology doesn't always win. And as for the new technologies you look at, the usefulness and applicability of those isn't always what you think. Did the web turn out to do what we were expecting? What did you expect to be able to do with a cellphone some 15 years ago, besides making phone calls and sending text messages? I believe there is a synergy between the potential of a technology and the applications for it that is the drive forward. And don't be so fast to click that "This sucks" button.

/Karlsson
Sorry for this post not being that MySQL focused, but I think it is still applicable. Even if the implementation sucks.

## Tuesday, October 23, 2012

### MyQuery 3.5.0 available

I just made MyQuery 3.5.0 available for download from sourceforge. If you do not know about this program, this is my Windows based MySQL Query editor and database monitor program. And yes, it is Windows only.

It was a long time since I updated MyQuery, but I have been been at it for a long time, but slowly as I haven't had the bandwidth, and I have collected a few things I wanted in this release, but of all those things, many has to wait for now, as decided to add a major feature which did require a fair bit of coding and testing: autocomplete. When editing SQL statements, autocomplete for names of tables, columns etc is really useful, but it is not as easy as it sounds. What makes this a bit complicated are things like the default database (which I need to figure out for anything that is typed), table aliases and separate name spaces (which is a good thing, but makes like in this instance difficult).

Table aliases are particularly nasty as MyQuery really doesn't parse the SQL but just uses the current names of all objects. So when I type table1. you might expect a lit of all columns in the table table1 to be listed, but it isn't as easy as that as table1 might be an alias for another table. The way I manage this right now is to list all columns in all tables when I see something which looks like a table name (table1 in this case), but there is no such table (and no such database for that matter), I list all columns in all tables in the current database. This might not be what you want, but it's the best I can do right now.

To be honest, I really think autocomplete is a terribly useful feature, in particular when dealing with databases with many tables or tables with long names (such as when working with the information_schema database).

Another thing I have added in this release is a more flexible way of dealing with database status variables, which can now be organized into groups, which are easy to manage and which are persisted.

What I want to add is stuff like MySQL 5.6 information_schema additions, MariaDB specific functions and engines, JSON load and unload and some other MySQL 5.6 stuff. In addition, I'd like to do some integration with my sqlstats plugin. But now I have at least finished MyQuery 3.5.0, and I'm quite happy with it. If you have used earlier versions of MyQuery, this version also fixes a bunch of minor memory leaks and at least one crashing bug (which happened in some instances in the connect dialog).

Happy SQLing!
/Karlsson

## Tuesday, October 9, 2012

### Too many or too big open MyISAM tables??

MySQL tries to be smart and to cram as much performance out of available hardware, and one thing that most MySQLers knows is that opening a table (ie. opening the frm file, and depending on the storage engine, any additional files related to the table) is expensive from a performance point of view. When we see the opened_tables status variable increasing, we tend to increase the table_open_cache setting and keep as many tables as possible open and avoid reopening them.

When it comes to MyISAM though, this has a nasty side-effect. When the server crashes and there are MyISAM tables open, these might well need to be REPAIRed. And if you have big MyISAM tables, this is an issue. Let's say that your application use several MyISAM tables, with the same content, and that you create new tables after a certain time, to keep the size of each individual table down? There are other reasons why you have this effect also, say a large MyISAM table that this rarely updated or read from, but suddenly it is, and then it is kept alone again? When a crash occurs, you might be stuck with several MyISAM tables that are open, but have not been accessed in a long time so might not need to be open, but still there is a risk that these will require a long and boring REPAIR.

There is a Feature request for this; 67142, but I was thinking that instead of having low level server code do this, this would be a good thing to implement using EVENTs. So let's give it a shot.

To being with, I need to figure out when a table was last touch. MySQL doesn't record when a table was last read from, but if we are OK with flushing tables that haven't been written to in a specific time, then the update_time column in the information_schema table does the trick. To figure out what tables to flush, I need to select table and database names from this table for all tables that haven't been modified within a certain time. Also, I must filter so I only get MyISAM tables, and make sure that I don't hit the MyISAM tables in the mysql database. An appropriate SELECT may look like this:
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM' AND table_schema != 'mysql'
AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) > <seconds>

That's not enough of course, all this gets me are the table and database names, we need to do something more than this. But let's start from the beginning. An EVENT will run an SQL statement, and in my world, there is mostly one statement it should run, which is a CALL to a stored procedure. And in this case, this procedure should use the SQL above to figure out what tables to flush, and then do the actual flushing. As the FLUSH command, when using inside a MySQL Routine, will not take any parameters, we have to run this as a PREPARED statement. All in all, when we end up with is something like this:
DELIMITER //
DROP PROCEDURE IF EXISTS myisamflush;
CREATE PROCEDURE myisamflush(p_sec INTEGER)
BEGIN
DECLARE v_no_data BOOL DEFAULT FALSE;
DECLARE v_database VARCHAR(64);
DECLARE v_table VARCHAR(64);
DECLARE v_dummy INTEGER;
DECLARE cur1 CURSOR FOR SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM' AND table_schema != 'mysql'
AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) >

p_sec;

OPEN cur1;

-- Loop for all MyISAM-tables that haven't been updated

-- for p_sec seconds.
FETCH cur1 INTO v_database, v_table;
WHILE NOT v_no_data DO
-- Create the FLUSH statement.
SET @tmpstmt = CONCAT('FLUSH TABLES ', v_database, '.', v_table);

-- Prepare and execute the statement.
PREPARE stmt1 FROM @tmpstmt;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

-- Get the next table / database.
FETCH cur1 INTO v_database, v_table;
END WHILE;
CLOSE cur1;

-- This is to overcome a bug that causes a warning,

-- even though the warning was caught.
SELECT COUNT(*) INTO v_dummy FROM mysql.user;
END;
//
DELIMITER ;

Note that noop SELECT at the end? In some versions of MySQL, when a handler is run, like in this case the CONTINUE HANDLER FOR NOT FOUND warnings will still be around here, which means the execution of the procedure will finish with a warning. To get rid of that, I issue an SQL statement that always runs.I'm not sure why this is happeniing, and for it might be that I have a mistake somewhere, but I do not think so.

So far so good, now all we need is an event, in this case I will run every 15 minutes and flush tables that hasn't been used in the last 15 minutes, but you can set this to anything:
DROP EVENT IF EXISTS myisamflush;
CREATE EVENT myisamflush
ON SCHEDULE EVERY 15 MINUTE
DO CALL myisamflush(900);

Was this enough? Maybe, but to get this working check if the event scheduler is running:
mysql> SHOW VARIABLES LIKE 'event%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| event_scheduler | OFF   |
+-----------------+-------+

In this case, it wasn't running so we should start it (and we should also modify the config file so that the event scheduler is running when the server is restarted, but that is a different story):
mysql> set global event_scheduler=1;

Before we finish up, note that everything above assumes that you are running with root privileges or similar.

Cheers
/Karlsson

## Monday, October 1, 2012

### Moving on and flying high...

As of today, Oct 1, yours truly is again working closer to the MySQL community. I have had 2 great years at Recorded Future, no doubt, and I have had a chance to work with some cool technologies and learn quite a few things I didn't know before. But not it's tim to high the sky, and I am from today working as Sales Engineer at SkySQL.

This is going to be fun, I will keep up my blogging, I hope to keep up my MongoDB investigations and above all I'll be involved with the SkySQL Cloud offerings!

That's all for now, but if you know me, that's not the end of it, far from it!

Cheers
/Karlsson

## Tuesday, September 11, 2012

### MySQL Configuration Wizard - Don't always trust it...

So, the MySQL Configuration Wizard for making MySQL Easier to use on Windows will just plainly aid in setting up a safe and sound MySQL Configuration. Right? Just run it and you are safe. OK? But thing do break, specifically harddrives and stuff, so to keep your database in shape you have a second set of disks in your Windows machine, for all that data that you do not want to loose in RAID configuration. Or you might even connect, say, your D:-drive to an expensive but safe SAN! Then you run the MySQL Configuration Wizard and tell it to place your InnoDB data files in that D: drive. Phew! Now you are OK! That RAID-set was expensive, but better safe than sorry! Now you can place all that important data into MySQL, and what's left on the C:-drive is just some programs, which can be replaced if necessary, but the data is safe!

NO! That's NOT how it works, not at all. There are two things wrong here:
• Just because you move the InnoDB files, which might well be the storage engine you use, doesn't mean that the other MySQL datafiles are there. Fact is, you CANNOT change where MySQL places the other datafiles, they get into C:\ProgramData\MySQL Server 5.5/data whatever you do! Why this is so I do not know, but if you want to change this location, you have 2 options:
• Edit the my.ini file generated by the Config Wizard. And change the datadir setting and move the files and directories from the C:\ProgramData\MySQL Server 5.5/data directory to the directory where you want the files on your D:drive.
•  Instruct Windows beforehand where you want your datafiles, which is real easy, just change the setting in the registry that determines where ProgramData is: HKLM\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\ProgramData. Real easy and safe, right :-)
• If you are OK with the MySQL other datafiles, and just want the InnoDB files to be safe, you still have to fix one thing: The innodb_log_group_home_dir setting. This follows the datadir setting, which means that if you move the InnoDB datafiles to another drive using the Config Wizard, the transaction log files will not move with it! They remain in C:\ProgramData\MySQL Server 5.5/data! Ouch! See this bug:66769
/Karlsson

## Thursday, September 6, 2012

### MySQL Server installer on Windows

I don't what makes this so hard for MySQL Developers at Oracle. Look, I know you guys are trying hard, that's not it, and I also know that chasing SQL Server is a priority, and that is fine. But for that to work, the MySQL MSI Installer has to get better! It just has to. Like some basic issues like these:

• You can not use the MSI installer to install two MySQL servers on the same box. If the installer seems an old installation, it insists on upgrading it if you are installing a newer version, and if you are installing an older version, well, that will be refused.
• You cannot, using the MSI installer, install both 32 and 64-bit versions on the same box. If you try that, even if the versions are the same, the second install insists that what you have already installed is newer than the one you are installing, although the have the same version number, but one is 32 bit and one is 64 bit.
• In the Instance configurator when it comes to the data directory location, if you select "Installation path" you might think that the data directory will be placed where you are installing MySQL right now. Think again, this is not so. What this means is that if you install MySQL on the box, and MySQL hasn't been installed there before, then Installation path means just that, So let's assume that you choose to install MySQL in C:\foo, then the data directory will be C:\foo\data. Which is fine. If you then uninstall this version of MySQL and then install some other version of it, this time in, say, C:\bar, then you would image that the data directory would be C:\bar\data, but it will still be C:\foo\data. I'm not sure where this is stored, but it is somewhere and it's really annoying.
So, if a developer wants to run with the 32-bit version of the MySQL library (say because he / she is using a 32-bit driver for MySQL), then you cannot use the MSI. If someone would have given this an extra round of thinking, then the MySQL 64-bit server installation, would have included both the 32-bit and the 64-bit versions of the client. Right? I'll see if I have the bandwidth and energy to report these issues as bugs.

Will the MySQL Installer help then? That is a 201 Mb package that contains, according to the download page "All MySQL Products. For all Windows Platforms. In one package." Right, that should do the trick then? Nope, once you have made the choice to install MySQL Server, be it 32 or 64 bit, you can only install one of them.

Look, I'm trying hard not to be a grumpy old man here, but it is difficult (partly because I'm getting old I guess). I am aware that I can always download and unpack the ZIP archive (which is what I will do now), but please folks, this is on Windows and I'd rather stay away from that on that platform. And Oracle / MySQL took the effort to create not just one, but 2 installers, but none of them allow you to install 2 MySQL servers on the same machine, which is too bad.

And by the way, no just because I am installing MySQL and then uninstalling it, and then installing it again, doesn't mean I want to use the same data. It just doesn't.

And I am afraid that if you think MariaDB is any better, think again, that is NOT the case. Too bad.

/Karlsson

## Wednesday, August 29, 2012

### Revisiting libmysqld, the client / server overhead and all that. And an apology

I wrote about the performance gains with libmysqld a few days ago but I had just too many things in my head to do a proper comparison with the MySQL Cluster / Server protocol. Yes, libmysqld is faster, but not as much faster as I thought, and blogged about. What happened was that I had another thing to try which I had forgotten about, which was to test using the Client / Server protocol without the dreaded CLIENT_COMPRESS flag (see more on this here).

Without CLIENT_COMPRESS, I could see NDB performance improve by some 25 - 30 %. But with InnoDB, which I just tested, I achieved some 98 k row reads per second! Yikes, I should have tested that one before comparing with libmysqld (in which case I got 115 k rows read per second, which is still faster).

The good thing with all this a multitude of things:
• We know for sure that you should NOT use the CLIENT_COMPRESS flag. Just don't. At least not when you have many small operations going and the database is largely in RAM. I'll test this in some more detail later, to see if I can find some good cases where CLIENT_COMPRESS is a good fit, but in this case, it's not.
• When data is in memory, and you aren't using sharding, MongoDB really isn't that much faster, maybe some 10% compared to MySQL using InnoDB. But then you get transactions, joins and all sorts of goodies with MySQL.
• The MySQL Client / Server protocol is FAR from as sluggish as I suspected!
• The MySQL Parser and Optimizer is not that much of an overhead as I was lead to believe.
• Using MySQL with InnoDB in a simple table might be such a bad Key Value Store as you, but as always it depends on the milage.
Frankly, the result is pretty much a surprise to me, MongoDB isn't that much faster than MySQL AT ALL, at least not in the case when data is in RAM. And if you ask how the MEMORY engine performance, well, about the same as InnoDB, slightly faster, but no as much as to say anything conclusively.

What remains to test then? Well, I have Tarantool and HANDLER SOCKET to test. And possibly a few more things. Also, I want to test what happens when there are some biggers sized documents in store that will not fit in memory in either MongoDB or MySQL? BLOBs anyone?

/Karlsson
Apologizing for comparing apples to oranges. Heck I already KNEW that I had used the CLIENT_COMPRESS flag, so why did I reference those test before that was removed? I just forgot it I guess.

Justin Swanhart wrote a blog on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels and I thought I'd give my view on this issue.

To begin with, from a technical standpoint, Justin is correct in his statements, that is not why I am writing this (and this is the reason I'm writing a separate blogpost instead of just commenting his), but there is more to it than that.

What Justin really writes about is how these isolation levels are implemented in a particular database, in this case InnoDB. For example the sentence "There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking." makes this very clear, these isolation levels, as defined by ANSI SQL as well as how they are implemented by other databases, has nothing to do with locking. What I am trying to say that when you issue the statement SET TRANSACTION ISOLATION LEVEL READ COMMITTED what you are saying is not that "I want these kinds of locks places here and there when I issue any upcoming SQL statements", what you are saying is that "I want to use the READ COMMITTED isolation level, which means that the data returned has certain attributes as related to consistency". As a side-effect of those attributes with regard to the returned dataset and when you use the InnoDB Storage Engine, you get just those locks, but that is another thing. (Like when you turn the steering wheel in your car, what you are saying is NOT that some gears and bolts should twist and turn in this or the other way, what you say is that "please dear car turn right or I am going to run into that upcoming brisk wall", which of course in turns means that those gears and bolts and stuff gets into action, but that is not what you asked for, you asked for the car to turn, and which way it can. And again, I'm not saying that those gears and stuff ain't interesting, they are!). And none of this to imply that Justins post was incorrect or not useful, I just think, as I wrote above, that there is a not more to it.

So, having determined that, what ARE you saying to the database when you issue those commands. And the answer is this (I'm excluding READ UNCOMMITTED here):
• READ COMMITTED - Please give me only rows back that are committed!
• REPEATABLE READ - Please give me rows back that are committed, and if I read one of those again, in the same transaction, make sure those rows are still there and haven't been modified.
• SERIALIZABLE - Please give me rows back that are committed, and make sure that if the same SQL statement is executed again, I get the exact same rows, none being modified, deleted or added.
Now, read the above once again, and for READ COMMITTED, we only make sure that the rows are committed, we WHEN they should have been committed, only that they are when the row is read. So what does this mean then? Well, it means that there is a level of flexibility in the implementation here. Some implementer might say that the point in time to look at is at the start of the statement, another implementer might consider it the point when the row is read. And this is a BIG difference:
• In the first implementation, it means that when using READ COMMITTED, the read is a "consistent read", which means that the whole set of records looks like it once existed. Or in other words, if I for example sum the value of some column, that sum will be or will have been the sum of that column in the database at some point in time.
• In the second interpretation of READ COMMITTED, one may, for example, have rows added to the result set "after" the point when where the current row is read. Which means that this is an inconsistent read.
What is annoying is that there are actually different implementations out there. Let's see what ANSI SQL-99 has to say about what is allowed to happen when using READ COMMITTED:

P2 (‘‘Non-repeatable read’’): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies
or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may
receive the modified value or discover that the row has been deleted.

What makes things a bit easier is that most implementations, although not all by far, use the second interpretation (I have to admit I have only looked at SQL-99 here, and not in that much detail, so things might have changed before or after SQL-99).

There is one thing though that should be pointed out in Justins blog, which is what he writes is not MySQL specific, it's InnoDB specific! So different storage engines may behave differently and may place different locks.

Committed by yours truly
/Karlsson

## Monday, August 27, 2012

### Fast and furious!

A few days I wrote a bit on my first results of comparing MySQL with MongoDB as a Key-Value Store, something that has been going on for way to long, but I am not finished yet. Last time I used MySQL Embedded Library to bypass the MySQL Client Server protocol to see what the overhead was, and the result was that it is big (and again, note that the same networking was used with MongoDB and I was also using Unix Domain Sockets, as well as plain TCP/IP, so don't ask me to fix any network issues I might have). Using Embedded Server with InnoDB was actually faster than using MongoDB, some 3 times faster compared to using the client / server protocol.

That one out of the way, I now wanted to see what I could get if I used the storage engine that was fastest in Client / Server mode, MEMORY. That took a while to fix, as to have an Embedded Server application, like my test application here, use the MEMORY engine, I have to load the data into the MEMORY table somehow each time I run the application. No big deal but a slight update to my benchmarking application was needed, as well as some debugging as embedded server is pretty picky with you doing things the right way and in exactly the right order, and is much less forgiving than the MySQL Client library. Anyway, I now have it fixed, and the result. Fast. Real fast and furious: 172 k rows read per second! Compared to 110k rows read per second with MongoDB (but that is MongoDB in Client Server mode of course). Using the MySQL Client, the MEMORY engine achieved 43 k row reads per second, which means that libmysqld is 400% faster! How is that for a performance improvement.

Which is not to say that we all should start building libmysqld applications right now. But what I want to say is that if you want to improve the performance of MySQL, looking into the Client / Server protocol would be a good starting point, there is a lot of performance to get there. The results noted could be interpreted as at least 75% of the time that MySQL processes a query, excluding disk I/O (this is the MEMORY engine after all), is spent in the Client / Server protocol. And looking at it differently: A key value store such as MongoDB might not be as fast as we think, but MongoDB sure does have a more efficient C/S protocol!

/Karlsson

## Sunday, August 26, 2012

### So, what is the client/server protocol overhead then? Or: How MySQL can finally kick some MongoDB ****

If you have followed, or have tried to follow, my different attempts at getting Key-Value Store, in this case represented by MongoDB, from MySQL on a single machine with all data in RAM. I have not been very successful so far, but many smart people, way smarter than yours truly, has been giving me suggestions for things to try to get MySQL closer to the performance of MongoDB, where MongoDB did some 110 k row reads per second, whereas MySQL was at best reading some 43 k rows per second (using the HEAP / MEMORY storage engine) and 46 k row reads per second (using NDB and without CLIENT_COMPRESS). Note that not all combinations has been tested, so it would be reasonably safe to assume that using the HEAP / MEMORY storage engine and excluding the memory storage engine, would be even faster than the 43 k rows when using CLIENT_COMPRESS.

As I could see that the CPU load on mysqld was very high, and as everything is in memory and hence there is no disk I/O, then we were probably CPU bound here. So bypassing mysqld when using MySQL Cluster, directly accessing NDBAPI did get me much better performance, some 90 k row reads, but that is still behind MongoDB.

Among the suggestions of what the bottleneck in MySQL was, was Parsing and optimization. Also, the question of how big the client / server overhead was. I have already figured out that using the compress client protocol actually limited performance quite a bit here, which might not be that much of a surprise, but that it affected performance by as much as 25 - 30 % as I found was surprising.

Anyway, the easiest way to see how much the client / server overhead was, was to run with libmysqld, the MySQL Embedded Library. Using libmysqld required some changes in my benchmark program, as I wanted to make sure that data was in memory before starting the test. Having done that I then commenced to test with using InnoDB in libmysqld. And hey, now we see some very different results! 115 k row reads per second! Yes, faster than MongoDB!

The documentation for libmysqld claims that the embedded InnoDB engine is not reentrant, but I run multiple connections here (in multiple threads), and it seems to work just fine. But this might be a mistake.

If this is truly so, I'm not 100% sure about these results yet, I'd really like to see someone else testing something similar with "normal" mysqld vs. libmysqld, then we know a few things:
• The overhead of MySQL parsing and optimization is not even close to as high as much as one might think (yes, I was also thinking that all this CPU overhead might be due to SQL parsing etc).
• There is something in the MySQL Client / Server protocol that one should look at (note that I was testing with Unix Domain sockets, and also with the normal TCP/IP network, and sockets was faster but not much. So this is not due to some network issues on my machine, and also the same network was using for testing with MongoDB).
What I suspect is that the client / server protocol sends a bit too many messages and that this in turn causes a lot of context switches (I have not looked at this yet, but you bet I will, i.e. how many context switches do I get with MongoDB compared to MySQL).

Another conclusion is that there would be a lot to potentially gain from improving the client / server protocol here. And some of you out there might say that my testcase is a bit extreme, having all data in RAM and all that, but I am not so sure that is the case, if you are running in a cloud, getting a lot of RAM is not that hard, but getting real fast disk-I/O is, at least right now and at reasonable price, i.e. adding RAM is less expensive than adding more powerful I/O.

The next cool thing to try then: Using libmysqld with NDB, an option has has been around for a while now, but I have not seen it documented much, which is strange as libmysqld really should server NDB much better than many other storage engines (as the mysqld / libmysqld component, when using MySQL Cluster, is largely stateless).

/Karlsson
Once again playing with libmysqld, and still thinking this is a "forgotten" technology with some interesting aspects to it.

## Thursday, August 23, 2012

### MySQL Cluster performance up again, or CLIENT_COMPRESS considered harmful

I'm back again (previous post in ths series is here)., with some interesting finds related to some more testing of MySQL Cluster testing (yes, I have promissed to test more things than this, but I got so entangled with NDB that I just had to give it one more shot). Looking at my benchmarking code, I realized I used the CLIENT_COMPRESS flag when I connected to MySQL. This flag was there in the code where I connected to MySQL (using mysql_real-connect(), this is a C program after all) and it was probably just pasted in from some other code somewhere. Not that this flag isnät known to me or anything, but I had not tested the compressed or non-compress MySQL client protocols much. I guess I at one time had assumed that CLIENT_COMPRESS at best helps when sending large packets between the client and the MySQL server, and above all, that for many small packets, it wasn't terribly helpful, but also not terribly harmful. Turns out I was wrong (yepp, that DOES happen).

Googling for CLIENT_COMPRESS, I didn't find much more than this either, to be honest, if you have many small packets, it's not going to be very helpful, but not very harmful either.

In this case though, it was the MySQL daemon maxing out the CPU that was the issue, so maybe I should to to run without CLIENT_COMPRESS. As stated above, Googling for this did not, at least not initially, provide much help, but as the CPU was maxed out, and compression consumes CPU power a lot, maybe we should avoid compression.

The result? 25 - 30 % more performance, just like that! MySQL Cluster with NDB is now managing some 46 k requests per second, as compared to the previous 35 k! Not a bad improvement. All in all, using MySQL Cluster using the MySQL API, as opposed to NDB, you probably want to avoid using CLIENT_COMPRESS and you are likely to make many small SQL statements with limited sizes of the result sets, and all data in memory (well, not all if you use STORAGE DISK, but that has issues of it's own), chances are that your performance bottleneck of the database side of things, will be the CPU.

But don't get too excited, as I am now going to revisit this with InnoDB also! (Yes, that is mean)

/Karlsson

### MySQL Cluster performance revisited - tcmalloc kicks some ***

My now long-running series of posts on getting max performance from a very simple MySQL Cluster setup (see details here) is continuing here. As a short intro to what I am trying out here, is to see if I can validate the claim that MySQL Cluster / NDB would be a good replacement for a Key Value Store (KVS) such as MongoDB. I test this in a rather simple single-server environment, but this is for a reason, just not a coincidence: The reason is that RAM is getting inexpensive and servers that can take a lot of RAM are also getting less expensive, which in turns means that the saying that many small servers are more cost-effective that few big ones, might not be as valid as it used to be. Also, I wanted to test what MySQL Cluster can do for me, from a KVS centric view. In short, I run on one server (16 Gb RAM, 8 cores) with all data in memory, what do I get.

I can tell you I have been trying real hard to get MySQL Cluster to work better here. Yes, SQL parsing in MySQL is to blame for a lot of why MySQL Cluster is so much slower than MongoDB. But why is MySQL Cluster slower than a fully cached (hot) InnoDB then? Admittedly, I was running the InnoDB tests with MySQL 5.6, so there you have one more thing for me to try (i.e. running InnoDB tests with the mysqld packaged with MySQL Cluster).

A lot of different MySQL Configurations was tried, as was a few other things. Lookiing at what was going on in the box revealed that most of the CPU was consumed by mysqld, so I decided to try to speed that up a bit, as the CPUs were maxed out by mysqld using up some 85 % or so. Using tcmalloc seemed like one obvious fix here, and I got some 10% better performance using that, MySQL Cluster using the SQL interface now performance about 35 k requests per second, which still is below InnoDB (with MySQL 5.6) and not even close to MongoDB, but still an improvement.

But as I said, there is still more stuff to test and I am not giving up yet. But I have given up on distributing some mysqld load to some other server. I got that working, but as the load on the test is distributed evenly across the mysqld servers, the slowest one will hold things up (that is how thing work right now, but I can and will revisit that one). Tarantool, some more NDBAPI tests and then some remains to be tested!

Take care
/Karlsson

### Free but not Gratis: A call for Open Source for everyone!

The term Open Source is not as old as you may think, and the concept actually predates the name. Initially the keyword was Free not Open, but Free is here in the sense of Freedom not in the sense "without cost", and this conflict in the English term "Free" was one of the big reasons that Free really wasn't a good word here. Which all in all doesn't mean that Free isn't still used to describe the Open Source movement, like in FSF (Free Software Foundation).

And Free as in Freedom, not Free as in "without cost", is an important distinction. What the deal was, in my interpretation at least but there are many different views here, was that the software should be available for use by anyone and for any purpose as long as they followed the rules. And the rules was there for a number of purposes, two important ones being:
• To ensure that the software in question remained free and open.
• To encourage the development of more free and open software.
Now, these are points I fully agree with, for a number of reasons. But I also think that that, and this is where I am not so politically correct anymore (if that is something I ever was), that there are markets for:
• Making a living on Open Source software. And no one said this shouldn't be possible, but here I am thinking beyond master programmers like Linus, RMS and Ken Thompson, I'm thinking about your everyday coder, building applications for a customer, providing support or packaging software. Yes, there are many good developers out there without the ability to be master programmers like Linus or Alan Cox, but which are still needed and has competences beyond those guys, like Domain Knowledge.
• Proprietary software. Yes, I am aware that this is something you probably didn't want to hear, but I do think that proprietary software will still be around and is still needed in many places. Let me tell you why I think so,and the main reason is that there are areas in Software that just isn't viable as Open Source. Like small specialized areas where it's not possible for someone to create good Open Source Software. This is not to say that specialized software cannot be Open Source in meaning that the source is open, but not Open Source in the meaning that there is a community development effort, with people around the globe finding and fixing issues adding features.
Yes, I do believe that a developer who has significant domain knowledge of something, say car sales, can bring significant contributions to an Open Source project, even though that developer might not be more than an average programmer. And if the project in question is an Open Source system for, say, car rental, then his input might well be more important than any input from, say, Linus Torvalds.

Another thing to realize is that the higher up the ladder of software you go, from operating systems and device drivers, up through databases and middleware, past generic office applications and to domain specific applications adopted for a very specific purpose, the more non-IT specific knowledge is needed. To create an Operating System, like Linux, you need to be really good at, well this is a surprise I guess, but I do believe that being an Ace on Operating Systems is a good thing. And Operating Systems is something you can learn at University and the theories and operations of an OS are generic IT stuff. To a large extent, the same goes for databases. But for a system for, say, car rental, that is something you do not learn at University. And here are some issues with the Open Source movement, if you ask me:
• The domain of Open Source is IT, i.e. computers, software etc, and the users of it are also presumed to be IT folks (which doesn't mean that Open Source isn't used by everywhere, it's just that Open Source it isn't a conscious choice by your average user at Joe Bloggs Car and Trunck Rental).
• Open Source doesn't in and of itself bring a lot of advantages to those guys at Joe Bloggs Car and Truck Rental. Which is not to say that Open Source doesn't push prices down or isn't the most cost effective solution, but the Open nature of, say, Linux, means little beyond "lower price" to them,
That Linux is not as strong on the Desktop as on the Server side I think is largely based on this. It's not that the end users don't appreciate Linux and Openness, it's just that it's not easy to see that advantages in the case of an Operating System or even in the case of, say, Open Office, beyond that fact that the cost is less. But really, did want Freedom just to be cheap? There has to be more to it?

I think there is a market for much more domain specific Open Source software. Really, I do! But I think we (us IT folks) must start to be much more open towards things like end user needs and application specifics. In many cases, the way something domain specific works has been determined by some ancient IT-system built ages ago, so that the effective business logic is defined by how this old IT system works, not the other way around (flight booking anyone). I also firmly beleive that there is a big market for something like that out there, this is an area where the non-as-smart-as-RMS developers (like myself) can contribute, where we have some domain specific knowledge that can be turned into a useful and powerful application. But the Open Source movement needs to find a way to accommodate that, to understand that just because someone is not a good developer for writing SCSI device drivers, might not mean that he or she doesn't posses valuable knowledge, and to support those domain specific effort. Yes, to write an OS or a Database System, you can surely do that based on knowledge you picked up at University, but that is not the case for many, if not most, end user applications. Hey, maybe we can learn something interesting domain specific while contributing something IT specific to a project.

/Karlsson
Feeling Free and Open

## Tuesday, August 14, 2012

### MySQL Cluster / NDB as a key value store - Better results this time

Following up on my series of "how much performance do I get when I access RAM only", the most recent post being this, and as you can see there, MySQL didn't really perform that well. And MySQL Cluster in particular wasn't very fast, fact is, it was slower than InnoDB and almost any other storage engine, with the exception of MyISAM then.

Well, I finally had some bandwidth to finish testing my benchmark with NDBAPI. This took some time, as i also had some other MySQL Cluster tests to do (multiple mysqld, cramming everything into one ndbmtd etc), but finally I had it done. And this time, things worked better. Using MySQL NDBAPI I managed to get about 90.000 single row reads per second using 100 threads in a simple 105.000.00 table with 3 columns and a BIGINT PRIMARY KEY, compared to about 32k single row reads when using MySQL Cluster using the SQL interface. MySQL with InnoDB got some 39k rows per second.

Still, MySQL Cluster using NDBAPI still doesn't beat MongoDB (which got about 110k single rows reads per second), but still close enough to be useful. I still have some tuning to do here, so I might get better results in a few days. From this, one might jump to the conclusion that it's mysqld that is the bottleneck here, and that is probably correct, but then you would expect two mysqlds to help, but no. On the other hand, the explanation to this might be that I am maxing out the CPU (all my cores are at 100%). To test this, I'll put a mysqld daemon on another box. Before I go on, the reason I do this is to see if I can find the bottleneck, using a second machine would be unfair to, in particular, MongoDB.

My first idea was to run MySQL Cluster on my desktop Windows box, as this is one of the more powerful boxes I have, and I imagined that the MySQL Cluster for Windows had improved since I last tried it and complained, but that turned out not to be the case. The msi installer for MySQL Cluster doesn't work at all it seems (Bug #66386 and Bug #66387). To be honest, it seems like this installer really has never worked, if I was Oracle (which I am not), I'd just take it away or fix it, as it stands, it is just an annoyance and a waste of time. (And don't tell me "No one use MySQL Cluster on Windows". Yes, that might well be true, but that doesn't mean it shouldn't be fixed if Windows is not supposed to be a supported platform. If you drive that reasoning further, then you might also say that there is no market for MySQL Cluster 7.2.8 as no one uses it).

This means that I have two options:
- Persist in getting MySQL Cluster 7.2.7 to Work on Windows.
- Start up my old Linux desktop box.

I'd like to get Cluster running on Windows again, and write a blog post on how to get it to work (and possibly even create an installer myself). On the other hand, although the Linux box isn't that how, it should be warm enough (it's a 4-Core ancient AMD box, loosely based on scrapped Mobo from a box at the old MySQL Uppsala office). So which one to do? We'll see, but possibly both, my 4-core Linux box should be running anyway (I was shut down when I moved my office a few weeks back) and MySQL Cluster really should work on my Windows box, if for no other reason so to say that "I run MySQL Cluster on Windows" so you cannot use that as a reason not to fix the obvious MySQL Cluster Installer issues.

And then I have Handlersocket, Tarantool and some other things to try. If you haven't realized it by now, I am enjoying this. And I haven't really seem many tests like this: "I have one box with lots of memory so I can fit my whole database in RAM. Now, which one is fastest?".

/Karlsson

## Thursday, August 9, 2012

### MySQL as a Key Value Store (KVS) - Why, some background and some more tests

You might have seen my attempts at testing MySQL as a Key Value Store (KVS) (here and here), i.e. used like MongoDB, and I have had some comments to those. Above all, many wanted to know the details of my test environment. To be honest, this started off as a very simple thing, but it turned out a bigger than I thought, so I guess its reasonable to give some background. I apologize for not giving you this background before, but as I said, my initial intention was to just run bog standard MongoDB versus MySQL and see what happened.

## Background

Here at Recorded Future, when I joined about 2 years ago, we used a lot of MySQL, and it was causing us serious problems. We had a pretty standard school-book style relational schema, and performance was really hurting us bad. Because of the amount and type of data we manage, because we run on Amazon and because we are still developing and changing, this schema didn't fit our needs at all.
Read - scalability was not a good option either, as we are massaging so much data that we had big issues with just having the Slaves keep up with the master. Schema changes, as we developed our application, was also difficult.

What we ended up doing was a redesign of a lot of things, ending up with something that could use a KVS, as these are easier to scale, using sharding. This had some negative implication too, like it was no longer possible to go into MySQL and issue a query for data just like that. And no, using the mongo shell for this, as not only were we now using MongoDB, we had data spread over several MongoDB stores. On the good side though, things was not much more performant and easier to scale, write scalability with MongoDB is good and easy to use and configure. Which is not to say that MongoDB solves all problems in the world.

So, having implemented MongoDB and having gotten used to it and even starting to like it a fair bit, I also started seeing some other things. Like, Why can't MySQL be used for this? I mean MySQL is a bit more mature than MongoDB and should be better at handling, say disk-IO. On the other hand, MongoDB is newer. One thing I already knew was that MongoDB really needs to be in-memory to a large extent, when it hits the disk, performance goes downhill fast, so here MySQL is better. But when we DO have enough RAM (and as we use several MongoDB datastores for different purposes, and we are on Amazon, where you can get respectable amounts of RAM, this is usually the case) what happens? If I let MySQL use RAM it can get so I don't get any disk-IO for reads at all, will it be as fast as MongoDB? Or maybe even faster.
And among the MySQL Storage Engines, the one mostly targeted as a Key Value Store is NDB? How's that going to work.

## Test environment

So, to test all this, but to make sure I wasn't affected by Amazons ups-and-downs, I pulled one of our MongoDB datastores, some 105.000.000 rows of real world data. Initially, my plan was to use the full MongoDB datastore, but I had to exclude some fields as these were VARCHAR and putting them on Disk with NDB was causing just too much disk-I/O, as NDB stores disk data as fixed length (so a UTF-8 VARCHAR(256) field occupies some 768 bytes). I ended up with a table schema like this:
CREATE TABLE keyvalue (
id bigint(20) NOT NULL,
value1 int(11) DEFAULT NULL,
value2 double DEFAULT NULL,
PRIMARY KEY (id)
)
In the case of MongoDB, the id column was used as _id, which is MongoDBs way of saying "PRIMARY KEY".

Having done all that, I need some tools for my test bench:
• Homebrew server with an AMD FX-8120 8 core CPU and 16 Gb RAM on an ASUS M5A88-V Mobo. This is a neat server MoBo really, supporting SATA 6, USB 3.0 and other niceties but I use a Lite-On LNE100TX network card instead of the built-in Realtek crap that many MoBos, inparticular AMD ones, seems to be equipped with.
• A bunch of disks, the disks where database resides is a LVM Stripe of two Seagate Barracuda 7200.11 500 Gb disks using xfs. But this is pretty insignificant as there should be no disk-I/O at all, or at least extremely limited, when testing.
• Ubuntu 10.10 (I have some reasons for this a bit ancient version. And yes, I have patched that 16 Gb memory issue in Linux with some AMD MoBos).
• MySQL 5.6.5 64-bit (for all MySQL tests except NDB)
• MySQL Cluster 7.2.6 64-bit (for MySQL NDB tests)
• MongoDB 2.0.5 64-bit
Having all that ready I now had a slightly smaller table taking up some 4.2 Gb when exported as JSON. So this data has been imported into MongoDB and MySQL with a number of different storage engines, and I was ready to do some testing. I use a test program that kicks of SQL queries with over a specific T of threads, each SQL (with Mongo this is not SQL of course, but the test program is the same) containing a variable part, which is the ID for the row to look up, The program starts by reading a file with a number of random IDs, the IDs are spread over the threads and then each thread looks up the rows N times.

In this case I picked some numbers for my test, so the number of threads I use is 100, not an unreasonable number, but big enough to make any concurrency issues visible and the number of rounds is 10. So to be clear, the 1.000.000 IDs are distributed over the threads, so each thread processes 10.000 IDs each, running then 10 times, which means I do some 10.000.000 random reads in parallel. Not the most exciting benchmark on the planet, I know, but it was all the same interesting and also to some extent a real-world test.

Before each test run, I did a few initial runs to ensure that the cached were warmed up. Also, I did look at vmstat and stuff to ensure no disk-I/O was taking place.

## Current state of tests

I have so far tested MongoDB, MySQL Cluster and MySQL InnoDB and since last I have also tested the MySQL MEMORY engine and MySQL with MyISAM. So far the results are like this:
• MongoDB - 110.000 rows read per second
• MySQL Cluster - 32.000 rows read per second
• MySQL with InnoDB - 39.000 rows read per second
• MySQL with MEMORY / HEAP - 43.000 rows read per second
• MySQL with MyISAM - 28.000 rows read per second
The last two ones were very disappointing to me, and trust me, I want them to be good. I really wanted MySQL to beat MongoDB, and I really wanted to see how the much lower overhead of MyISAM would make it run circles around InnoDB, but no.

As MyISAM only caches the keys itself, not the actual data, this was an issue, but still, the MyISAM data file should be well cached by Linux for me, and rightly so, there was no disk-I/O. Could context switching when going to Linux for data be an issue? I decided to try something fun. I created a new unique index on the table, covering all columns, which meant that if I used this covering index, all data would be fulfilled by the index (but the index in itself would be bigger of course). This didn't help at all, quite the opposite. Playing with named keycaches and stuff didn't help at all, nor did using mmap. Which is not to say that these guys do not help to improve performance, all this means is that when all data is already im memory, they don't seem to help. I did figure out one thing that did work though, which was using tcmalloc. Enabling tcmalloc when starting the mysql server, which is easy these days as MySQL comes with the tcmalloc library included and mysqld_safe has an command-line option for it. Real easy, this really should be the default.

With this in place, I managed to get:
• MySQL with MyISAM - 37.000 rows read per second
I'll try using tcmalloc with the other engines also, and see what happens.

## In conclusion so far

Don't take me wrong here,  I'm not out to prove that MySQL or MongoDB sucks. Nor am I out to do some exact benchmarking. What I am out to do is to see if the claim that an RDBMS can perform as well as a standard Key Value Store in an environment optimized for a Key Value Store (i.e. all data in memory). And such an environment isn't that specialized actually. databases around a few 10's of gigabytes are really common and can easily fit on memory. And although I do prefer the more strict datamodel of an RDBMS to the pretty schema-free design of a NoSQL alternative such as MongoDB (MongoDB really has some reliance on schema, but in essence, it's schema-free), I have to admit that sometimes you need more performance, and distributing the load over multiple servers is a lot easier when you don't have to worry about global transactions or global consistency, and frankly, in many cases you can live without that, to an extent,

What remains to be tried is MySQL with NDB using multiple mysqld servers, InnoDB using tcmalloc and tarantool and the MySQL HANDLER interface (actually, I have already tested the latter, but I am not sure I got the tests right).

/Karlsson
Who apologizes for the omission of the test setup before publishing the first results

## Tuesday, August 7, 2012

### Sales Wolfs vs Techies? Is that really an issue?

Are you to be aware of the Sales Wolf really? Is this something to be scared of? Sean Hull writes a bit about this in a recent post but I think he got a few things wrong, although I also agree with a lot of stuff in there. Above all, I think Sean oversimplifies what Sales are up to.

Before I go on though, let me tell you that I have worked at Oracle, not only as part of the MySQL -> Sun -> Oracle transition but before that, actually my first job outside Telco and my first job for a US company was for Oracle. And I learnt so much stuff there, it was an incredible time. And not only was Oracle good at taking care of me and letting me go to classes and conferences and stuff, they actually helped me find those things and pointed me there. I went from a C programmer with some interest in databases to a decent support engineer and then to performance consultant and eventually a Pre-Sales Engineer, in a few hectic years. But I was younger then.

So, back to the real issue, is Sales just about getting the customer locked in and getting as much money as fast as possible from the unsuspecting management person, in the process bypassing the far smarter DBAs? Nope, that's not how I looked at my career as a Sales Engineer, I do not think I ever ended up selling something, together my my sales guys then, that I cannot stand up for today. Sales is so much more, just think about it, as there is competition out there.

Does your Oracle Sales Team really want to get unfriendly with the techies at the customer site? Yes, you often sell to someone in management, but just as often to a techie or to someone in management who is a techie. I there are very few Sales Guys out there who are both successful and is selling stuff the customer doesn't want or need.

Is the techie Ops and DBA guys really the experts? They know what you have alright, that is mostly correct, but they do not always know what the options are and what the advantages and disadvantages are. And again, this is natural: If you are a seasoned Oracle DBA, having taken all the classes in Oracle and knowing your TNSNAMES.ORA and INIT.ORA and other bits of Oracle inside and out, why would you want MySQL? MySQL is something you don't know? Why would you want something like that? I'm not saying that Oracle DBSs are bigots, quite the contrary, all I am saying is that we ALL have reasons beyond pure facts to react to things. It's not like a DBA or a sysadmin is necessarily more unbiased than someone in management, just because they know the existing technology better.

For me as a Sales Engineer, I think selling is about trust and about positioning. Can you trust me when I say Oracle might be a better choice for your database needs? Or that the answer might be MySQL? Or PostgreSQL? I'm really a curious person, and I really love to know what people are doing with technology, I am mostly not that interested in technology for the sake of technology itself. I'm not an Open Source taliban either, despite having worked at MYSQL for 6+ years (before that for close to 20 years at commercial closed source database companies, mostly). I like Open Source stuff mostly because it's a brilliant means of developing software. And yes, I have said no numerous times, and I have my sales guys with me (mostly)!

Look at a car salesguy. Does he want to sell you the most expensive car he has, then he is probably not a good salesguy in the long run, as you, as a customer, will eventually figure out that this car really didn't fit you that well, and next time around, you want to buy a car from someone else.

Having said all this, I agree with many other points in the post in question. Yes, make a list of things, if you can, but remember that the company trying to sell to you might have something you can benefit from, but which requires some effort or change on your side to change. Do not just just look at what you know and have experience in, but on what you can learn. If the technology isn't used in your industry, is that a good thing or a bad thing? If it's not used by your competitors, are there advantages to the technology that your can use to your advantage versus your competitors?

/Karlsson

### Some corrections and additions to my simple KVS tests.

This is the first follow-up to my post on a simple test of KVS alternatives. To recap, I tested a simple single table schema in MySQL using the NDB and InnoDB storage engines. To have a Key-Value store to compare with, I did the same test in MongoDB. All tests were done of the same system, an 8-core AMD Linux box with 16 Gb RAM. The tests consisted of reading 1.000.000 rows, out of the total 105.000.000 in the table, distributed over 100 threads 10 times, a total of 10.000.000 rows read then. The test program I use makes sure that the same random ID's of the table are reused each time and the same are used for all servers.

Now, firstly, after some checking I realized that I had not fully cached the InnoDB engine, so it was doing a certain, small, amount of disk I/O still. I fixed this and the number now looks like this:
• MongoDB - 110.000 rows read per second.
• InnoDB - 39.000 rows read per second.
• NDB - 32.000 rows read per second.
So now InnoDB is faster than NDB. But NDB has some tricks up it's sleeve, like running with multiple mysqld servers, and I have today finished my test-program to support just this. Also, I have had some useful hints of NDB engine tuning, so I'll try that one too, but testing NDB takes more time as restarting NDB is much, much slower than MongoDB or MySQL with InnoDB.

But I have another test result today! I realized that although I no big fan of the query cache, I should be able to use that here too. And I don't need that a big a cache, as I am only reading some 1.000.000 rows. So off I went, turned on and tuned in, without dropping out, the query cache and ran my test again. I soon realized one thing: Warming up the query cache was going to be real slow. But warming up MongoDB is just as slow, MongoDB really is targeted for as much as possible in RAM, the disk I/O the do is hardly optimized (they use mmap, so what can you expect). Once the query cache was nice and warm, I ran my benchmark (this was using the mysqld with InnoDB, which matters less as all reads are now done in the query cache). And what I got was about 34.000 rows read per second. This is not a 100% fair comparison of course, as the query cache doesn't need to cache that much (only 1.000.000 queries), but really, it should have been faster than caching in InnoDB, I was a bit disappointed with this and I'll see if I can find the bottleneck somewhere in the code.

But I'm not finished yet. The MEMORY engine and NDB with a few more mysqld servers remains to be tested, as well as Tarantool, the MySQL HANDLER interface and NDBAPI eventually. Not necessarily in that order.

And before closing, if you are wondering, the test program is written in plain C, no C++ or Java or anything or fancy stuff like that. Also, the test program uses proper multi-threading, I do not have multiple processes running around here,

/Karlsson

## Monday, August 6, 2012

### Finally, the simple comparison of KVSs and some rants about performance

Faster and faster, that is what we want from our databases. And the biggest roadblock for the MySQL Dragster is the speed of the hard disk, right? No, I'm not going to debate that, that is just the case. And how do you fix that then? Well, if what is limiting your dragster is a roadblock, then either you drive round the roadblock of you make it disappear faster, or in computer terms:
• Avoid using the disk by instead putting as much data as you can in RAM
• Use faster disks (like SSDs)
Now, to be honest, this analogy isn't that good, as the performance limiting factor of the disk so huge, and contrary to popular belief, it's not getting any better! But we have SSDs you say? Yes, that makes the hard drive faster, but the CPU and RAM is getting faster even faster! But let's assume that we have enough memory so we do not need the disk? Will just about everything go at the speed of light? Nope, what happens here is that stuff that wasn't even noticeable in terms of limiting performance when the disk was there, as disk-I/O is such a huge bottleneck, suddenly show it's dirty face!

Like this:  As the CPU cores are getting faster, but not that much faster, anymore, due to physical limitations, we have more and more of these CPU cores instead. And suddenly, any limitation in getting those CPUs to work well together suddenly turns into a major headache! Like a mutex shared by all threads. Like the Query Cache mutex in MySQL for example!

With this in mind I am now ready to commence with the benchmarks that I wrote about in may. Yes, it took some time to get the data loaded into MySQL and in the process I managed to create a new Open Source project for exporting and importing JSON data from and to MySQL With that in place I now have something that is sort-of real world data. I had to remove a few columns (or Attributes of you are a JSON taliban) to make MySQL Cluster work with this data, this because MySQL Cluster stores VARCHAR data as fixed length data on disk which means a few things:
• There is a lot more stuff to write to disk.
• UTF-8 means that there is 3 times more data to write!
All this means that MySQL Cluster may well work in well as an alternative to some key-valuestore setups, but not all, and it depends what "value" means here. If "value" means "document" or "object", then we need to use a VARCHAR or something like that for the value, which will be real limiting in the case of MySQL Cluster. And I'll try to be real nice to MySQL Cluster here, so I end up with a really simple schema:
CREATE TABLE keyvalue (
id bigint(20) NOT NULL,
value1 int(11) DEFAULT NULL,
value2 double DEFAULT NULL,
PRIMARY KEY (id)
)
And in this table I load some 105.000.000 rows. Should be simple with MySQL Cluster, right? Except that MySQL Cluster will only accommodate 512 Mb of hash data per partition (this is a really, really silly limitation! Is 512 Mb the MySQL Cluster version of "640 K should be enough for anyone?"). But this fixable, and with 4 partitions, this works just fine.

As a side note, without disk data, MySQL Cluster feels MUCH more stable. The occasional data loss and other weirdness that I experienced what I tried to load the table with VARCHAR data are now completely gone. So not only does disk data limit you on in terms of data types (VARCHAR), it also seems to need some more development. And no, I wasn't in the mood of reproducing the errors I got.

Anyway, on my server here at home, having an 8-Core AMD CPU and 16 Gb RAM, waiting to run this benchmark. I'm testing MySQL with InnoDB, MySQL Cluster and MongoDB. The test program is the same in all cases, I read 1.000.000 rows 10 times distributed over 100 threads. To be fair to all, I made sure that what data I had would fit in memory and that it WAS in memory, so I did a couple of warmup runs first. In the case of NDB I used the MySQL API, not NDBAPI (I will try this eventually). The results I got was this:
• MongoDB - 110.000 reads per second
• MySQL with InnoDB - 30.000 reads per second
• MySQL with NDB - 32.000 reads per second
In the case of NDB, I had these setting, beyond the standard stuff:
ndb_use_exact_count=0
ndb_force_send=0
And that makes one helluva difference, I can tell you that! Why this isn't by default I don not really know, I assume there is a good reason for it, but someone has to tell me what it is. Now, I did load data also, and the results there were similar, but as I was loading JSON and that is pretty native to MongoDB, that was expected, MongoDB was about 2.5 times faster than NDB / InnoDB, which were on par with each other. I will not give any exact numbers here as loading data depends on so much more in terms of tuning.

This is not the end of this story though, if we assume, as MySQL was way behind MongoDB in terms of performance, but InnoDB and NDB were on par with eachother, one might at least want to try the theory that it is the MySQL part that is slowing things down, and this can be tested by running MySQL / NDB with more than one mysqld, and that is the next thing to try. Then we have the HANDLER interface and proper NDBAPI also, the latter should be a lot faster, really. And yes, I really should see what MyISAM can do for me. And MariaDB.

And before I end, please MySQL Cluster developers, remove that silly 512 Mb HASH index size per partition limitation (see the manual here). As RAM get's less and less expensive, and if we, for example, wanted to avoid Network and Disk I/O as much as possible, and instead use RAM and CPU (like in an Amazon environment), then this turns into a rather sreious limitation. My tests above was on hard iron though.

/Karlsson
Who will do some more benchmarking tomorrow.

## Thursday, July 26, 2012

### MySQL JSON import tool version 1.4 is out

This is just a quick notice that there is a new version of the JSON import tool available now. The main thing in it is that it fixes an iddues with threadinig that caused a threaded load (and this is the default) to hang of there was an error or the import was interrupted. Also, there are some minor fixes and also a runtime status printing, available by sending a SIGUSR1 signal to the process, feature is available.

Cheers
/Karlsson

## Tuesday, July 24, 2012

### What is a MySQL GA? RC? Beta?

I still do not know how this works for MySQL, but whatever the thiinking is, I do not like it. There is a flow of new features coming into the GA releases which is my mind usually isn't a good thing (i.e. a new feature means a new source of bugs). This is particularily bad in the case of MySQL due to the risk of regression bugs and lack of modularization.

And I know, what you are going to say, if we didn't get those new features then that old GA version would not be useful, if for no other reason so due to Moore's Law, the software has to keep up with what is going on in the world of Hardware. But MySQL has never been good at that, for example, can someone explain why the sample MySQL config file for big MySQl server, my-huge.cnf, starts with this comment:
# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.
When was a "very large" system one with "1G-2G" of memory? Was it before or after the civil war? What usefulness does something like this have? (And this is from MySQL 5.6, which isnät even GA yet).

MySQL has never been very good at keeping up with itself either, the same my-huge.cnf contains a long comment section on how to set up replication by mocking with the config file, something that has been adviced against for ages and which isn't even supported in MySQL 5.6 (or MySQL 5.5 for that matter).

What MySQL needs, in my mind, is more version, less emphasis on adding necessary features for new hardware or for fixing bugs, and more GA versions. 5.6 has been real slow in appearing, which is annoying. There is some good stuff in there, but it's very late.

If you ask me, GA means "This versioon will only get bugfixes from now on". And if this means that important features that should be in production has a long time getting there, then the issue is not to add new features to GA releases released during the Carter administration, but to get some more GA releases out there.

And besides that: Can we modernixe MySQL a bit? Remove old stuff from the distribution that doesn't work or that noone really should use? Add stuff that is up-to-date, like sample config files for Amazon EC2? Right? And possibly tools integrated with EC2? Or iOS? Or Ubuntu?

/Karlsson
And yes, I am aware that I myself is a firm beliver in awk, vi and autotools, which are awfully old technologies. So maybe my views shouldn't be taken too seriously, and I just want to stir up a debate?

### New MySQL JSON import and export tools

Yo!

My idea for a JSON based SQL schema tool that I wrote about some time ago has been on the backburner as I wanted to finish two other projects:
• A better JSON import tool to substitute the old mysqljsonload.
• Fix up mysqljsonexport to align it with mysqljsonimport.

These two projects are now done and the program are ready for download from Sourceforge, mysqljsonimport from here and mysqljsonexport from here. There is some serious changes to these program here, for example, the multi-threading in mysqljsonimport is now "dynamic", so that as many threads as needed (up to a user specified limit) are created to support faster loads and in case of mysqlexport, mainly this now has it's options aligned with mysqlimport. This means that now you can export and import a whole database in JSON with these tools with a single command.

As usual with my tools, there are A LOT of options and a lot of flexibility, you can specify fixed values for certain MySQL columns to import or export, and both tools also support an automatic internal increment feature. Also, MySQL columns imported may now contain MySQL functions or constants, basically anything that is allowed as a value in a MySQL INSERT can be used.

Hope you enjoy these tools, there is still a few things missing, like better documentation and more tests and stuff, and also better parallel export for mysqljsonexport, but that is for another version of these tools.

As you might remember, these tools started off as a tool to do some InnoDB, NDB and MongoDB benchmarking. I never got as far with that as I wanted to, but now I have the tools to complete those benchmarks. Also I will soon start working on a tool for that database-schema-in-JSON tool, so stay tuned to this channel folks, and don't touch that dial!

Cheers
/Karlsson

## Tuesday, July 17, 2012

### Database SCHEMA as JSON - A second OSS opportunity!

I have already written about how documentation is an opportunity for Open Source Software to show off, and area where we have the potential to be better than the Open Source folks. Another such area is interoperability, and for myself, interoperability between databases is what springs to mind.

### Database schema interoperability

We all know what a MySQL schema looks like: I bunch of CREATE DATABASE, CREATE TABLE CREATE this-and-that, all together in a file. Usually, when someone asks for a schema, this is what they get, and this is independent on if the user is using PostgreSQL, MySQL, Oracle or whatever database.

In some way, most SQL based databases follow some variation of the SQL standard, but as they all, without exception, also have extensions (which they are allowed to, even if the follow the SQL standard) and they all also interpret things differently. At least in the case of MySQL, there is also the schema "translation" effect to take into account. A column defined as BOOL doesn't actually appear as BOOL when you look at the create schema, neither in the actual type used, nor in the actual semantics (more values that TRUE (1) and FALSE (0) may be represented), and this is not the only example.

All in all, although databases use SQL to define the schema, the schema isn't portable, at all. At least in the case of MySQL, we have to add the different features of the different versions, meaning that an exported schema is full of those /*!40110 WEIRDO_SYNTAX */ comments.

In conclusion, the schema isn't really portable at all, and to an extent not even between different versions of the same database server software. The big issue is that the features supported are part of the actual syntax of the statements that create the schema. But there should be a better way of doing it, and us Open Source folks could well be the ones to do it, i.e. create a way to represent a schema that makes the schema portable across versions and across different databases and in the case of MySQL also across storage engines, where application. And what better format to do this than JSON?

### The details

Thinking a bit about how to represent a schema in JSON, I came of with a few ideas:
• A bunch of top level objects, like GRANTS and DATABASES and also a top level object that defines the source of the schema.
• Some other common objects, like TABLEs, VIEWs and  ROUTINEs.
• A means to provide versioned schema information.
• A means to provide both proprietary (like STORAGE ENGINE) and standardized (NOT NULL) attributes, and also some kind of suggested mapping (like in the case of MySQL, a column type might be both the proprietary MySQL type, like MEDIUMINT, and the best standardized fit for this type, INTEGER).
The beauty of this is that JSON is "tagged", so that any attribute that is meaningless when importing the schema, like STORAGE ENGINE when importing a schema from MySQL into PostgreSQL, can be silently ignored, or the import utility could be smart enough to know how to map this proprietary attribute to PostgreSQL.

For this to work, we need to decide on some basics, like what to contain in the schema and how to name it. Basically I think that the best common denominator would be SQL-92, so that anything that is SQL-92 and is not ambiguous, should look the some in the exported JSON schema, independent of what database we are exporting from.

What is more needed is some tool to export and import schemas to these different database system, but in most cases, I don't think that would be terribly complex. The neat way of this way of doing stuff is that we have a common vehicle, or JSON schema, that allows to to easily migrate a schema, once we have the schema export / import tool ready. On top of this, we should be able to migrate data, but again I think JSON would be brilliant for this (yes, I am a big fan of JSON, but you probably know that by now).

Some coding also is necessary for this, but that is what we Open Source dudes and dudettes are good at, right? If someone else than myself thinks this is a good idea, let me know, and we could do some work. I would really like some PostgreSQL, Oracle etc. folks involved, and not only MySQLers! I truly believe that interoperability is the next big thing, the thing that will make Cloud computing real valuable!

/Karlsson