Converted the site from MySQL to PostgreSQL

I had a little time over the weekend and got around to converting the site's backend database from MySQL to PostgreSQL, something I've been meaning to do for a while.

Now the odd thing I found was that using PostgreSQL made the site consistently faster. Odd, considering MySQL is generally held to be the faster of the two.

Take a look:

Now despite the obvious (the server itself needs optimizing...), despite running ab multiple times for each concurrency level, PostgreSQL is faster across the board.

Thinking maybe it was just a matter of caching for short bursts of activity, I tried running ab for 30 seconds to see if time made a difference:

Again PostgreSQL is faster (with my limited concurrency anyways). With the timed benchmark I also noticed that the longest time taken for a page with MySQL was 8266ms, compared to PostgreSQL's 564ms.

I don't claim to have carried out these tests in any rigorously controlled manner, having done them for my own benefit. Indeed, the server is a live server hosting more than just this site, running antivirus scans and spamassassin with postfix for a number of users, etc. I'd like to hammer away on the benchmarks side, but that wouldn't be fair to the other users on the system--reboots, apache and mysql/postgresql restarts etc. to clear caches and the like.

I did keep an eye on top while performing each test to make sure nothing else was interfering with cpu time and memory. I also used 2 subdomains to access each test instance/copy of the site, one for postgresql and one for mysql, so that shouldn't be an issue. I also ran the benchmarks remotely, from a server in Dallas, ~15 hops away from this server in Toronto.

But even with the aformentioned lack of controls in mind, PostgreSQL is clearly faster for this particular site, using any ab test I've tried. I'll post a short list of the steps I took to convert the site in a day or two, when I have some more time.

Edit: tests were done using Mysql 5.0.38 and Postgresql 8.2.5 on a 64bit Athlon 3800+ with 1gb ram.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Maybe a myth?

I've been informed by a member of my local LUG that it is a myth (now) that PostgreSQL is slower than MySQL. I'll have to do some more research to verify that and post an update later.

sounds like good news i

sounds like good news

i guess there is only one advantage mysql has against postgresql (i mean for newcomers) - tutorials and site talking about it!

Vacuum

Pg is faster until you do some deletes ... then the vacuum job kicks in and perf goes to hell. If its a busy server with a lot of inserts and deletes the vacuum job might not even be able to finish. If that happens you have to take the DB offline for maint.

Pg is great but overall isn't faster than mysql ... yet!

so turn off autovacuum

don't see that as a big deal.

It's very easy to control

It's very easy to control the impact of vacuum via the vacuum_cost settings.

And "never being able to finish" is nothing but FUD. Other databases have a similar amount of work to do, they just do it during the transaction itself. At least with PostgreSQL you have the ability to put that work off until later.

Finally, in almost any *real* benchmark, PostgreSQL is more scalable than MySQL. So unless you're doing something that's essentially single-user, PostgreSQL will be faster.

PostgreSQL has vastly increased in that area

Hi,

PostgreSQL used to suffer from that up until the 7.X aera, vacuuming was just frightening.

But today, that's not true any more, the newer releases have the possibility to configure Vacuum in a much more polite way, so it doesn't interfere with normal operation any more.

And the new 8.3 release even avoids explicit vacuuming in some cases (a neat side-effect of the HOT patch), and allows parallel vacuum on different tables etc.

PG deletes faster than MySQL very much a myth

The first thing I look for when tuning MySQL is delete queries that can be avoided. MySQL hates deletes. PG will let you do them, but you'll need to vacuum periodically to recover the space. And that vacuuming can easily be done in off hours and is VERY MUCH less strenuous to the DB than the actual delete in MySQL.

Much faster in my tests

On 8 core systems postgresql blows mysql out of the water (on FreeBSD). Compare

http://people.freebsd.org/~kris/scaling/pgsql-ncpu.png

http://people.freebsd.org/~kris/scaling/netbsd.png (ignore the netbsd comparisons if you like)

Two things to notice:

1) pgsql has 45% higher transaction rates

2) mysql has an immediate drop-off above 8 clients. This is actually due to pthread mutex contention within mysql and not a FreeBSD issue, i.e. a mysql scalability bottleneck.

2a) FreeBSD is still outperforming Linux on this workload but I don't have updated graphs yet.

Good stuff Kris

Those charts are impressive. I'm looking forward to the release of FreeBSD 7.x

Other graphs for PostgreSQL

Something to try

Hmm, having easy access to a newly built dual quad core xeon box with 8gb of ram I'd really like to have a go at comparing the Opterons against the Xeons. But alas, it has Debian Etch installed, can't run BSD on it. Likely no comparison anyways (in favour of the Opeteron's I/O advantage), even with fixes to the context switching issue in 8.1 and later versions.

Having only recently become enamoured of Postgresql, BSD vs. Linux can wait for a while until I really have a handle on this fabulous DBMS.

Vacuum

Man, isn't that total crap? The VACUUM process does not start when deleting. This has to be done manually (or scheduled).

I would say that PostgreSQL is not slower on writes than MySQL - probably the other way round. Some benchmarks concerning write performance would be nice.

RE: Vacuum

Isn't *what* total crap? PostgreSQL has had auto-vacuum for a loooong time now. So, for a typical read-heavy database load, you don't even have to think about vacuuming. Nothing has to be done manually and there is nothing to schedule. The default settings "just work".

If you have a write-heavy database load, then you'll want to learn more about auto-vacuum, analyze, etc. and tweak settings as needed. Of course, that's also where the asynch vacuum approach really pays off. You have the freedom to tune vacuuming and related maintenance tasks to minimize their impact on real time performance.

Maybe you should take at least a cursory glance at the PostgreSQL docs before complaining about non-existent problems.

tabletype

Did you migrate from MyISAM ?

Yes

That's the default engine used with Drupal, so yes. What with Oracle buying up engines left and right, soon that'll be all that's left to use with mysql :(

Also note that the site runs with an unoptimized default APC module and (had) a query cache size of 16mb when using mysql.

Misconfiguration?

Hi,

I've found this article via the PostgreSQL's site. I am more familiar with MySQL than PostgreSQL and I'm not trying to say that either one is faster or more performant than the other. However, I'm very surprised by the results of your benchmark and I suspect that your MySQL wasn't optimally configured.

I read that you used AB, that means you retrieved the same page repeatly, therefore if the query cache was working, only the first page would have actually executed the queries and all subsequent pages' queries' results would have been pulled from the cache directly. That's an ideal situation for MySQL's query cache and should have tremendously biased the benchmark towards MySQL, that's why I'm very surprised to see PostgreSQL executing twice as many queries (pages) as MySQL in your second benchmark. If you decide to run a new batch of tests, try comparing the stats (SHOW STATUS) before and after running ab, and see if the cache was used.

Regards,

HR

Something to try

Hmm, I'll have a go at that sometime, though both MySQL and PostgreSQL were (as far as I can tell) both running with default configurations. I guess that's something to consider though, that many people (like me) with their personal sites don't really go past the application to optimise anything more than that. So while default setups for any dbms are most definitely not optimised for a particular application, for many users and in many use cases, they never are.

I don't know what I think either way, but I've got to learn PostgreSQL for course work next semester so for now I'm running with that.

MySQL defaults, not a recommended initial config file?

Did you at least use one of the default configuration files from the list at dev.mysql.com ? MySQL's defaults with no configuration file have historically been for systems with minimal RAM, so no configuration file at all cripples it if you're after anything more than serving a few pages on a desktop PC.

key_buffer_size is the cache size for the MyISAM index pages, MyISAM data rows come from OS cache. 20% of the RAM available for the database is a fair initial setting.

The Windows installer does do some basic tuning but *nix people are expected to look over the instructions and do some basic setup themselves.

PostgreSQL people would probably be unhappy that you were using MyISAM instead of InnoDB but the bare defaults for MyISAM are so inappropriate for high use that it's not really important.

That's the same for PostgreSQL

That's exactly the same for PostgreSQL, the default configuration is tweaked to minimize impact on small systems for people who "accidentally" install it.

To get reasonable performance, config file tweaking is necessary.

TOAST benefits, maybe?

When messaging boards shift over to PostgreSQL, the "TOAST" capability seems likely to be a component that would have a highly material effect.

That is, columns that are large, such as text messages that may contain several kilobytes of data, will tend to get compressed and stored in a side table. There is initially some added cost to the compression, though on modern hardware, Dueling Cores, and such, there may be so much CPU power lying around that the cost won't be noticeable.

The benefits will be more noticeable - the large bits of data will get compressed, reducing the size of the database, and any queries that just hit the "metadata" will find that the metadata packs into pages way more compactly. (Rather than 2-3 tuples per page, it might increase to 20-30 tuples/page.)

Queries that hit the data

Queries that hit the data can be faster too. Compressed data = smaller footprint on disk which can make for less disk IO to service a request. A good thing because most DB servers end up IO bound before they are CPU bound.

Yeah, now I just wish we

Yeah, now I just wish we could control when stuff got rolled over to toast tables... I bet there's a heck of a lot of things that are way below 2K, but still large enough that toasting would help...

All TOAST, all the time

Decibel, take a look here:
http://www.postgresql.org/docs/8.3/interactive/storage-toast.html
You can force compression always, even when the data are small enough to not require being stored out of line.

Comparing apples and avocados

In defense of MySQL (I work with 20+ MySQL instances everyday) you might be over simplifying a bit here in your comparison.

* Storage Engines. MyISAM stinks for concurrent operations. I have had observable increases in throughput simply by converting everything from MyISAM to InnoDB. Sorry, no real numbers here, but server load levels dropped significantly since they were not waiting around for the DB.

That being said, MyISAM is great for WORM applications like data warehousing. If concurrency is not an issue then MyIASM can give you a big win in terms of manageability (MyISAM tables are just files on disk) and raw query speed. Even better is the ARCHIVE storage engine which stores data in a compressed, read only format which is great for historical data which is not going to change.

* Tuning, tuning, tuning. PostgreSQL used to ship with a really poorly configured configuration file. MySQL still ships with a really small configuration (eg: 16M caches) for that is only for getting running and personal development boxes. Take a look at the my-huge.cnf included in all MySQL distros for better tuning parameters.

However database tuning is a black art. You really have to tune the database to the application. This is complicated in MySQL case by the many choices in storage engines each which has its own tuning parameters.

Tuning really begins with the application design. Your choice of normalization or denormalization should reflect the applications position in the OLTP to DW spectrum. Query tuning also plays a major role in total application performance. It pains me to see Ruby on Rails ActiveRecord and Migrations pay little attention on foreign keys and creating appropriate indexes.

* Replication. MySQL is designed to scale out with replication. It has the best replication of any database I have used (Oracle, PostgreSQL, Sybase, Informix...). Perhaps this is why MySQL engineers have not paid too much attention making a single MySQL instance scale up beyond a certain point. You are simply not meant to do things that way with MySQL.

It would be nice if you could go farther with a single MySQL instance though. Slaving works MOST of the time, but it stinks when your slaves stop because some transaction timed out when you were trying to take a backup.

PostreSQL has Sloney and PGCluster, but I haven't heard too much about them in production. There is the commercial offering from MammothDB too, but from what I have heard MyQSLs replication is still better.

* Vacuuming. Every database needs some sort of maintenance. Most databases "tombstone" records when they delete. It is a basic space verses time trade off. Problem is you end up with fragmentation. You hear DBAs talking about avoiding table and index fragmentation a lot. So after a while you have to clean this up and defragment (much like a harddrive) your storage.

PostreSQL has vacuum and MySQL has "optimize table". They essentially do the same thing (although the multitude of storage engines complicates things again for MySQL) which is rewrite the tables and indexes in a space efficient manner. You should do this at least weekly if your DB does any sort of OLTP work with updates and deletes.

Also you need to "analyze" your tables. Analyze will collect statistics about the data in your tables and help the query planner pick the correct indexes, etc when it is deciding how to fetch your data for you. You have to do this with Oracle, MySQL and PostreSQL as far as I know. Run analyze on all your tables once and you might be surprised how much quicker things run.

* MySQL warts. I did tell you I work with many MySQL instances everyday right? I run into A LOT of bugs in MySQL. Many of them are known gotchas like these http://sql-info.de/mysql/gotchas.html Others are the result of storage engines doing things differently. And still others are just plain old ugly bugs. I really wish I didn't have to battle with these issues everyday, but often I do. I have become proficient at teasing out the gotchas from the storage engine quirks, but I run into bugs all too often and MySQL corp seems more interested in adding goo gaw features to MySQL-Cluster than fixing problems with my disks filling up because "purge log until 'date'" does not seem to work properly.

* MySQL Ubiquity. MySQL is everywhere now. You can find books, forums, support, and MySQL help all over these days.

PostgreSQL is not so ubiquitous, but it is gaining momentum. More people are hearing about "the other open source database" and giving it a try. More companies like Mammoth, Greenplum, and Sun are shipping products based on PostgreSQL.

* Licensing. If you are developing a product where you might want to ship an embedded database, then do not use MySQL. Hands down. MySQL Corp will molest you over licensing costs and you will be too locked in at that time to switch to a DB with better licensing (and your ORM will not help you but that's another rant). Another DB to look into in the embedded space is FireBird/InterBase.

So although I agree that a couple tests have shown that PostreSQL does more transactions per second than MySQL, I have reservations. I wonder if MySQL was meant to be used this way, what the storage engine and settings of the MySQL instance were, etc. I do believe that PostgreSQL wins hands down in a lot of areas including scaling on a single instance, but in the end it might come down to horses for courses. I might preffer to use PostgreSQL for a large OLTP instance and MySQL with archive or MyISAM engine for a data warehouse.

A few comments

Overall an informative post. I know more about PostgreSQL than MySQL, and I have a few comments:

* Slony was built for production deployment from the beginning. It serves the .org and .info TLDs, which is about as "production" as it gets. It's stable, performs well (at least for it's primary use cases), is more granular than most replication systems, and updates can cascade from one slave to another. It is controlled on a per-table level, which is great for the extra control you want in an intensive application, but makes administration more difficult.

* I think there are some misconceptions about VACUUM. "optimize table" is very similar to "VACUUM FULL" in postgresql, but few people use VACUUM FULL at all. VACUUM by itself is what people usually run (or combined with ANALYZE), which doesn't interfere with reads, and it will mark the dead space in the table and indexes for reuse. With each release, VACUUM is becoming less of a burden in postgresql.

* I agree 100% that database performance evaluation is a huge challenge. The application developers' style is likely to favor one database over another, which is a big part of the problem in taking objective measurements.

* Before implementing a replicated system, it's a good idea to do a lot of thinking first. If it's simple async master/slave, you can probably do that without problems, but anything more complicated than that requires a lot of consideration.

* I don't think you can take for granted that MySQL is faster in the general case (I'm not saying you are personally). Back in the days of 7.0 postgresql, that was probably true, but postgresql has come a long way since that time (as has MySQL), and a lot of the improvements have been to performance. Take a look into the 8.3 beta of postgresql, and see how many of the new features are performance-related. I think PostgreSQL is a first-rate competitor against MySQL for performance, and is improving fast.

RE: Comparing apples and avocados

I also work on large mysql instances (100+ million row tables and such) and just have to say innodb is not always the answer one would hope. While I do like it in favor of myisam, you can still run into pretty serious limitations... like deadlocks in insert only scenarios (actually they are pretty bad in general, but that perticular one is just magically bad) At my $dayjob, we actually had a client recently convert back to myisam because of performance issues with innodb. I'd like to recommend innodb in all cases, but due to the quirks of how mysql interacts with its table engines, you just can't.

You mus be joking

1. "MySQL is designed to scale out with replication. It has the best replication of any database I have used"

We did believe in it until we tried to create a project for a large network of music shops in Poland. A project where DB/2 or Oracle should have been used. MySQL replication is a toy not ready for any production environment. There are far too many serious bugs in it. The setup doesn't survive master or client's server's crash and requires bringing the whole system down to repair. This is ridiculous. Actually it causes more problems than it solves. The system that works "most of the time" is not what we mean "PRODUCTION READY". The same applies to anu other open source database system.

2. "That being said, MyISAM is great for WORM applications like data warehousing". You must be joking? Great? Maybe slightly better than InnoDB, but still far from what DB/2 or Oracle can do. Data warehousing WITHOUT materialized views and range indices? ROTFL. PostgreSQL and MySQL are no good for DW.

Really?

Read the comment 2 comments above yours. I'd say Postgresql, with Slony replication must be good for something if "It serves the .org and .info TLDs". Sure maybe it isn't designed to run data-warehousing applications, but then neither is Oracle designed to run personal blogs or MySQL designed with transactions in mind. But hey if MySQL is good enough for Yahoo!, it must be good for something in some particular role no?

If that isn't enough proof that Free Software is capable of performing in a production environment then I don't know what is. I guess the world really would be a better place if we all just ran SQL Server or Oracle or DB/2...

You must be incapable of speeling

Or understanding that commercial databases are the only way to go.

1. PostgreSQL can do materialized views, even though they don't call it that. Google postgresql materialized views

2. MySQL can run a fantastic DW. I know because I ran a 5T MySQL DW for 7 years, even before it was catching on(and had lots of warts). I had 30k users for the BI I was providing and it ran like a top. It all comes down to the skill(and dedication) of your DBA. The raw speed and small format of MyISAM makes it ideal for this role, but usage has to be properly managed. MyISAM table locking makes this a fun challenge, but quite doable.

3. Query speed is roughly equivalent between PG 8.2 and MySQL 5.0. Some things are better in each due to the optimizers, but it essentially ends up a wash. However, with concurrent updates and the negative impact this has on the MySQL query cache on write heavy DBs PG is often better. PG 8.3 will be far superior to MySQL in _overall_ write/read performance in an OLTP env. PG 8.2 in the past was not as good here due to the transaction overhead of being an _actual_ 100% ACID MVCC+ database. On small to moderate hardware MySQL and PostgreSQL will smoke any commercial DB except maybe SQL Server since it doesn't do MVCC.

It's all a trade off. To say one is better than another means you're not factoring in the application and needs of the user. It pains me to say so, but sometimes I have to tell the customer . . . "Your best bet is SQL Server". :) However, if I'm staying more than a few months I'd much rather see them use PostgreSQL in most situations, and even MySQL in a few.

Comparison article

Comparing the two databases fairly is a pretty hard job. I recently wrote an outline of the major issues involved you can find at http://www.postgresql.org/docs/techdocs.83 that may shed some light on some of the things people have brought up here.

Information overload

From the comments so far, I have now got a week's worth of material to read through. Thanks!

Confs, table types, db size, db version... it all matters

I have experience of running big (millions of lines, tens GBs of data) and small databases with both MySQL and PostgreSQL.

PostgreSQL performs better with out-of-the-box confs than MySQL, but when you tune the configurations for the server it's not so straight forward (depends on how many concurrent users, how much RAM/CPUs available and so on). Also MySQL older versions perform much worse than newer > 5 versions. MyISAM as table type sucks badly as well, unless you have small db and don't want to use transactions. Potgre seems to perform better on Linux than Windows also.

There are about 20-30 confs that I usually tweak to make MySQL work ok. If you do benchmarks without stating the confs you use or table types or types of queries, they don't give that much.

Once by just optimizing the confs I made one of my MySQL dbs perform roughly twice as fast as it was before. If that was done to the benchmark above, who knows, it might be a tie.

Wow, this is the first time

Wow, this is the first time I've ever heard of PostgreSQL being faster than MySQL out of the box! Did you use the appropriate mysql.conf for your hardware? What version of Postgres was this?

RE: Wow, this is the first time

In this context, speed is relative to multi-user concurrency. I would wager that most other comparisons between MySQL and PostgreSQL were based on comparing the number of single tuple inserts/updates/deletes that can be made by a single user in a given period of time.

hmm

it sounds interesting but i think the main issue is in preconfiguration of MySQL, i am pretty sure postgre should be a bit slower

What is PRODUCTION?

What is PRODUCTION? (something tells me it's a dumb question:D)

PostgreSQL is in case of

PostgreSQL is in case of more complex queries or views much faster than MySql.

We were having performance problems with project using MySql (fine tuned, but using views. project has been using it over 2 years).

After test rewriting of some problematic queries to PostgreSQL and query testing, I was quite shocked.

query time with mysql was 1.7 second.... and in postgresql 65ms!!!
after that, we rewrote entire project to postgres and it is indeed significiantly faster at all

Comment - such situation can occur in specific conditions, which were, though, quite common in our project. There were:
- need to use views (usually complex queries)
- tables with many records (not so many, actually thousands), while selecting only top x

I tried to find out how is this possible and the result was clear, based on both DBMS docs - very poor implementation of views in mysql. Most of complex queries fall into 'temptable' type, which means that temporary table is created every time query is executed.

Holey Jeebus!

PavelJ:

"I tried to find out how is this possible and the result was clear, based on both DBMS docs - very poor implementation of views in mysql. Most of complex queries fall into 'temptable' type, which means that temporary table is created every time query is executed."

How, in $DEITY's name, can anyone work with such an sad excuse of a DBMS?

Benchmarking PostgreSQL vs MySQL performance on Drupal

I did a more controlled benchmark, and using default settings, PostgreSQL is slower than MySQL.

You can see all the details in benchmarking PostgreSQL vs MySQL performance on Drupal.

Did you use database migration software?

My initial Drupal instance was PostgreSQL because I do most of my development on PG. I moved to MySQL because of the shortage of PG-supporting Drupal modules, as well as some plain badly written queries by developers who apparently had little experience with PG.

After about a year of MySQL I'm ready to move back to PG. I don't need stats to tell me that my Drupal instance was snappier with PG. It was.

I initially hand-migrated everything from PG -> MySQL. I don't want to go through that bloody experience again. Did you use a database migration tool and, if so, what was it?

A small guide

I used pearcec's guide from groups.drupal.org as a reference after reading up a little about my2pg (which he used as well). There's really no way to do it easily, apart from just going a table at a time and being careful to check for appropriate single and double quotes (or lack thereof). So mostly manual I'm afraid to say.

I did the conversion on my desktop machine, tested it to make sure everything worked, and then uploaded everything and threw the switch on the server. Zero downtime :)

Not A Myth

While in FreeBSD 6.x, and PostGreSQL 8.x, the benchmarks weren't near as impressive. Both FreeBSD and PostGreSQL have made a huge stride in making the two work flawless. I've completely moved over to PostGreSQL on FreeBSD 7, and everything is just faster. I'll post some benchmarks later against the two.

I have found this site

I have found this site provide tools to convert mysql to postgresql .
http://postgresql-to-mysql.qarchive.org/

I researched tools, but there

I researched tools, but there isn't a FOSS tool that I found, and I certainly wouldn't submit a single row or field for conversion to an online service.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly.