On October 9th, 2007 Dennis Watson (not verified) says:
In defense of MySQL (I work with 20+ MySQL instances everyday) you might be over simplifying a bit here in your comparison.
* Storage Engines. MyISAM stinks for concurrent operations. I have had observable increases in throughput simply by converting everything from MyISAM to InnoDB. Sorry, no real numbers here, but server load levels dropped significantly since they were not waiting around for the DB.
That being said, MyISAM is great for WORM applications like data warehousing. If concurrency is not an issue then MyIASM can give you a big win in terms of manageability (MyISAM tables are just files on disk) and raw query speed. Even better is the ARCHIVE storage engine which stores data in a compressed, read only format which is great for historical data which is not going to change.
* Tuning, tuning, tuning. PostgreSQL used to ship with a really poorly configured configuration file. MySQL still ships with a really small configuration (eg: 16M caches) for that is only for getting running and personal development boxes. Take a look at the my-huge.cnf included in all MySQL distros for better tuning parameters.
However database tuning is a black art. You really have to tune the database to the application. This is complicated in MySQL case by the many choices in storage engines each which has its own tuning parameters.
Tuning really begins with the application design. Your choice of normalization or denormalization should reflect the applications position in the OLTP to DW spectrum. Query tuning also plays a major role in total application performance. It pains me to see Ruby on Rails ActiveRecord and Migrations pay little attention on foreign keys and creating appropriate indexes.
* Replication. MySQL is designed to scale out with replication. It has the best replication of any database I have used (Oracle, PostgreSQL, Sybase, Informix...). Perhaps this is why MySQL engineers have not paid too much attention making a single MySQL instance scale up beyond a certain point. You are simply not meant to do things that way with MySQL.
It would be nice if you could go farther with a single MySQL instance though. Slaving works MOST of the time, but it stinks when your slaves stop because some transaction timed out when you were trying to take a backup.
PostreSQL has Sloney and PGCluster, but I haven't heard too much about them in production. There is the commercial offering from MammothDB too, but from what I have heard MyQSLs replication is still better.
* Vacuuming. Every database needs some sort of maintenance. Most databases "tombstone" records when they delete. It is a basic space verses time trade off. Problem is you end up with fragmentation. You hear DBAs talking about avoiding table and index fragmentation a lot. So after a while you have to clean this up and defragment (much like a harddrive) your storage.
PostreSQL has vacuum and MySQL has "optimize table". They essentially do the same thing (although the multitude of storage engines complicates things again for MySQL) which is rewrite the tables and indexes in a space efficient manner. You should do this at least weekly if your DB does any sort of OLTP work with updates and deletes.
Also you need to "analyze" your tables. Analyze will collect statistics about the data in your tables and help the query planner pick the correct indexes, etc when it is deciding how to fetch your data for you. You have to do this with Oracle, MySQL and PostreSQL as far as I know. Run analyze on all your tables once and you might be surprised how much quicker things run.
* MySQL warts. I did tell you I work with many MySQL instances everyday right? I run into A LOT of bugs in MySQL. Many of them are known gotchas like these http://sql-info.de/mysql/gotchas.html Others are the result of storage engines doing things differently. And still others are just plain old ugly bugs. I really wish I didn't have to battle with these issues everyday, but often I do. I have become proficient at teasing out the gotchas from the storage engine quirks, but I run into bugs all too often and MySQL corp seems more interested in adding goo gaw features to MySQL-Cluster than fixing problems with my disks filling up because "purge log until 'date'" does not seem to work properly.
* MySQL Ubiquity. MySQL is everywhere now. You can find books, forums, support, and MySQL help all over these days.
PostgreSQL is not so ubiquitous, but it is gaining momentum. More people are hearing about "the other open source database" and giving it a try. More companies like Mammoth, Greenplum, and Sun are shipping products based on PostgreSQL.
* Licensing. If you are developing a product where you might want to ship an embedded database, then do not use MySQL. Hands down. MySQL Corp will molest you over licensing costs and you will be too locked in at that time to switch to a DB with better licensing (and your ORM will not help you but that's another rant). Another DB to look into in the embedded space is FireBird/InterBase.
So although I agree that a couple tests have shown that PostreSQL does more transactions per second than MySQL, I have reservations. I wonder if MySQL was meant to be used this way, what the storage engine and settings of the MySQL instance were, etc. I do believe that PostgreSQL wins hands down in a lot of areas including scaling on a single instance, but in the end it might come down to horses for courses. I might preffer to use PostgreSQL for a large OLTP instance and MySQL with archive or MyISAM engine for a data warehouse.
Comparing apples and avocados
In defense of MySQL (I work with 20+ MySQL instances everyday) you might be over simplifying a bit here in your comparison.
* Storage Engines. MyISAM stinks for concurrent operations. I have had observable increases in throughput simply by converting everything from MyISAM to InnoDB. Sorry, no real numbers here, but server load levels dropped significantly since they were not waiting around for the DB.
That being said, MyISAM is great for WORM applications like data warehousing. If concurrency is not an issue then MyIASM can give you a big win in terms of manageability (MyISAM tables are just files on disk) and raw query speed. Even better is the ARCHIVE storage engine which stores data in a compressed, read only format which is great for historical data which is not going to change.
* Tuning, tuning, tuning. PostgreSQL used to ship with a really poorly configured configuration file. MySQL still ships with a really small configuration (eg: 16M caches) for that is only for getting running and personal development boxes. Take a look at the my-huge.cnf included in all MySQL distros for better tuning parameters.
However database tuning is a black art. You really have to tune the database to the application. This is complicated in MySQL case by the many choices in storage engines each which has its own tuning parameters.
Tuning really begins with the application design. Your choice of normalization or denormalization should reflect the applications position in the OLTP to DW spectrum. Query tuning also plays a major role in total application performance. It pains me to see Ruby on Rails ActiveRecord and Migrations pay little attention on foreign keys and creating appropriate indexes.
* Replication. MySQL is designed to scale out with replication. It has the best replication of any database I have used (Oracle, PostgreSQL, Sybase, Informix...). Perhaps this is why MySQL engineers have not paid too much attention making a single MySQL instance scale up beyond a certain point. You are simply not meant to do things that way with MySQL.
It would be nice if you could go farther with a single MySQL instance though. Slaving works MOST of the time, but it stinks when your slaves stop because some transaction timed out when you were trying to take a backup.
PostreSQL has Sloney and PGCluster, but I haven't heard too much about them in production. There is the commercial offering from MammothDB too, but from what I have heard MyQSLs replication is still better.
* Vacuuming. Every database needs some sort of maintenance. Most databases "tombstone" records when they delete. It is a basic space verses time trade off. Problem is you end up with fragmentation. You hear DBAs talking about avoiding table and index fragmentation a lot. So after a while you have to clean this up and defragment (much like a harddrive) your storage.
PostreSQL has vacuum and MySQL has "optimize table". They essentially do the same thing (although the multitude of storage engines complicates things again for MySQL) which is rewrite the tables and indexes in a space efficient manner. You should do this at least weekly if your DB does any sort of OLTP work with updates and deletes.
Also you need to "analyze" your tables. Analyze will collect statistics about the data in your tables and help the query planner pick the correct indexes, etc when it is deciding how to fetch your data for you. You have to do this with Oracle, MySQL and PostreSQL as far as I know. Run analyze on all your tables once and you might be surprised how much quicker things run.
* MySQL warts. I did tell you I work with many MySQL instances everyday right? I run into A LOT of bugs in MySQL. Many of them are known gotchas like these http://sql-info.de/mysql/gotchas.html Others are the result of storage engines doing things differently. And still others are just plain old ugly bugs. I really wish I didn't have to battle with these issues everyday, but often I do. I have become proficient at teasing out the gotchas from the storage engine quirks, but I run into bugs all too often and MySQL corp seems more interested in adding goo gaw features to MySQL-Cluster than fixing problems with my disks filling up because "purge log until 'date'" does not seem to work properly.
* MySQL Ubiquity. MySQL is everywhere now. You can find books, forums, support, and MySQL help all over these days.
PostgreSQL is not so ubiquitous, but it is gaining momentum. More people are hearing about "the other open source database" and giving it a try. More companies like Mammoth, Greenplum, and Sun are shipping products based on PostgreSQL.
* Licensing. If you are developing a product where you might want to ship an embedded database, then do not use MySQL. Hands down. MySQL Corp will molest you over licensing costs and you will be too locked in at that time to switch to a DB with better licensing (and your ORM will not help you but that's another rant). Another DB to look into in the embedded space is FireBird/InterBase.
So although I agree that a couple tests have shown that PostreSQL does more transactions per second than MySQL, I have reservations. I wonder if MySQL was meant to be used this way, what the storage engine and settings of the MySQL instance were, etc. I do believe that PostgreSQL wins hands down in a lot of areas including scaling on a single instance, but in the end it might come down to horses for courses. I might preffer to use PostgreSQL for a large OLTP instance and MySQL with archive or MyISAM engine for a data warehouse.