A few words (and gripes) on Wordpress Indexes
Let me just start off with this note. Wordpress is a fantastically documented framework. The Codex is one of the most beautifully styled and (mostly) cleanly written details I've seen. I love it. Also, Wordpress is great for what it's designed to do. Blogging (I personally prefer using Harp but that's just me).
And before I really get into this, let me just re-iterate that the thoughts on this blogpost and any of my others are mine and not the company I work for, etc. etc.
A default install will, with a good caching plugin like w3tc or super cache, scale out fairly well for a blog. Because the blogs content, though generated by PHP and the database underneath, can be cached since it doesn't really change much. For logged in users you get some changes neccesary, but hey, it's do-able.
Unfortunately, a lot of people use wordpress for things it's not made for, like e-commerce. I've had the unfortunate pleasure of dealing with a plugin that causes your site to change into a Group-On clone. A nice thought, except for the fact that this particular plugin decided to do things "the wordpress way". What does this mean?
It mean's that everything. Products, Customers, Purchases, coupons, you
name it, is stuffed into the wp_posts
table and the extra information
is placed into wp_postmeta
. This fits into Wordpress's EAV model. Long
and skinny and easily extendable. Sounds great for something where people
want to put whatever information they want about their products onto a
page with minimal effort.
Except for this simple fact.
The wp_postmeta
table is designed to hold anything, so the meta_value
column is a LONGTEXT
field. What's the problem with that you might ask?
When someone does a query like this:
SELECT * from wp_posts wp INNER JOIN wp_postmeta wpm ON wp.ID = wpm.post_id WHERE meta_value = 'X' GROUP BY wp.ID ORDER BY wp.post_date_gmt
This seems like a perfectly harmless query. Doing an explain reveals this:
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: wp type: index possible_keys: PRIMARY,post_name,type_status_date,post_parent,post_author key: PRIMARY key_len: 8 ref: NULL rows: 31 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: wpm type: ref possible_keys: post_id key: post_id key_len: 8 ref: shortcodes.wp.ID rows: 2 Extra: Using where
Also doesn't seem that bad right? Yes, it's using a temporary and a filesort but
the JOIN
itself is fast right? Yes the join is fast, PRIMARY
key is great. The
number of rows in this small test database result in a query that takes 0.09s on
my machine. But if I switch to a much larger database?
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: wp type: index possible_keys: PRIMARY,post_name,post_parent,post_author key: PRIMARY key_len: 8 ref: NULL rows: 168203 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: wpm type: ref possible_keys: post_id key: post_id key_len: 8 ref: testdb.wp.ID rows: 2 Extra: Using where 2 rows in set (0.00 sec) mysql> SELECT * from wp_posts wp INNER JOIN wp_postmeta wpm ON wp.ID = wpm.post_id WHERE meta_value = 'X' GROUP BY wp.ID ORDER BY wp. post_date_gmt\G Empty set (34.33 sec)
That's right kids, 34.33 seconds. On a database with 168,203 rows in it's posts table. That seems like a lot, but when you have plugins that stuff everything into that table, it's really not and is more common than you think.
So why is it slow? The problem lies in the LONGTEXT
, because in order to do our
query we're making a temporary table of our join, and we're sorting that. And it's
the ORDER BY
and GROUP BY
that are taking a toll on us. If I run the same query
without those two, it takes 2.78s to execute. Still an awful time for a database
to spend on a single query, but much much better than 34.33 seconds.
The temporary table is what really kills us, if you were to do an explain you'd
see that the filesort
and temporary
extra's are gone. Because we have to pull
down all the LONGTEXT
fields associated with ALL of posts that have meta information
we get clogged up on our disk I/O. Since the LONGTEXT
can store up to 4GB (last time
I checked the mySQL documentation anyway), we're potentially pulling down a lot of
information per row, and that's not going to fit into RAM, especially not with that
many rows being acted on by our query.
Really, the solution here is to rewrite your query in some clever way and also to
apply partial indexes to your meta_value
column so that we can get some speed.
Also, if your queries are doing other JOIN
s and WHERE
s that you'll want to
try to make an index that fully covers what you're querying against. I found a
great slideshow detailing some methods on making indexes work correctly and I
suggest it to anyone trying to really understand what they can do to make their
database a little more optimized.
Finally, my other biggest gripe is something that flabbergasted me yesterday when I realized it.
Wordpress does not have an index on the autoload
column of it's wp_options
table.
wp_options
is the table designed to hold your options (well named right?), so
plugins store setup information, wordpress stores how you want your dates to be
shown, and there's tons more. Unfortunately, plugin developers often put a lot
of things into the options table because it's so easy to use. The autoload
column decides whether or not the option should be pulled down from the database
when Wordpress boots up. So I see a query like this on start up a lot:
SELECT option_name, option_value FROM wp_options WHERE autoload='yes'
On a clients webpage, this query took 40+ seconds. While this was due primarily to being clogged up on queries like the one mentioned above, it showed up in the slow query log and I investigated it. After fooling around, I ended up adding an index to the autoload column to help it. Why? Because without the index the full wp_options table has to be scanned, for this client that's 15681 rows (see why I don't like Wordpress being used as a platform to do things it shouldn't?), when I add an index, the number of rows scanned is reduced to 229. That's a huge boost.
Really, I was confused why autoload
isn't indexed by default. A quick perusal of
wp-includes/options.php
in a wordpress install will show that the the function
wp_load_alloptions
calls into the database with this query, and searching for
the use of that function will turn up that it's used on every page load since, as
I said previously, it's called on every page request. This seems like an oversight
to me, granted that I doubt the core wordpress developers felt that people would
abuse the options table like it has been in this particular case, but it's still
something I would expect from people as smart as them. I wouldn't consider indexing
that column to be a pre-optimization either (an argument I imagine they might make).
Wordpress is a great blogging platform. It would be great if, when people tried to use it for other areas, that they don't blindly follow the "wordpress way", and instead focus on good normalized schema design. Just my 2 cents.