Integrity constraint violations on the Drupal 7 block table

One of our Drupal 7 sites was serving a lot of white pages of death (500 errors). Looking through the Drupal logs, we found a lot of entries like this one causing the white pages:

SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'spock-system-user-menu' for key 'tmd': INSERT INTO {block}
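
To get a feel for how often this was happening, Drush's watchdog-show command comes in handy (the exact flags may vary with your Drush version):

[user@server01 www]$ drush watchdog-show --type=php --count=20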

It’s obviously a problem with the block table, so let’s have a look at it:

[user@server01 www]$ drush sql-query 'describe block'
bid        int(11)      NO PRI NULL auto_increment
module     varchar(64)  NO
delta      varchar(32)  NO 0
theme      varchar(64)  NO MUL
status     tinyint(4)   NO 0
weight     int(11)      NO 0
region     varchar(64)  NO
custom     tinyint(4)   NO 0
visibility tinyint(4)   NO 0
pages      text         NO NULL
title      varchar(255) NO
cache      tinyint(4)   NO 1
css_class  varchar(255) NO

There is no field called ‘tmd’, so it must be an index on the table:

[user@server01 www]$ drush sql-query 'show index from block'
block 0 PRIMARY 1 bid     A 36 NULL NULL BTREE
block 0 tmd     1 theme   A 12 NULL NULL BTREE
block 0 tmd     2 module  A 36 NULL NULL BTREE
block 0 tmd     3 delta   A 36 NULL NULL BTREE
block 1 list    1 theme   A 12 NULL NULL BTREE
block 1 list    2 status  A 18 NULL NULL BTREE
block 1 list    3 region  A 36 NULL NULL BTREE
block 1 list    4 weight  A 36 NULL NULL BTREE
block 1 list    5 module  A 36 NULL NULL BTREE

Yep, there is an index called ‘tmd’ and it’s a unique one (the second column is MySQL’s Non_unique flag, so a 0 there means the index is unique). We can see it consists of the 3 columns theme, module and delta (hence the name tmd).
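
This matches the schema definition in Drupal core's block.install, where block_schema() declares the unique key (abridged):

'unique keys' => array(
  'tmd' => array('theme', 'module', 'delta'),
),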

The problem

The most common cause here, and this is well known in the Drupal community, is that long delta values make the block query fail. Take the 2 block names below, which end up as the value for the delta field in the query (hypothetical examples; any pair that shares its first 32 characters will do):
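
frontpage_promoted_articles_list_1
frontpage_promoted_articles_list_2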

They are both longer than 32 characters. Because the delta column is only a varchar(32), the value gets trimmed down to 32 chars on insert, making them both the same string:
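
frontpage_promoted_articles_list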

With the module and theme being the same value for both, the second insert now violates the unique ‘tmd’ index, which gives us exactly the duplicate key error from our watchdog log.
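
If you suspect this is happening on your own site, deltas sitting exactly at the 32-character cap are the likely victims of truncation:

[user@server01 www]$ drush sql-query 'SELECT module, delta FROM block WHERE CHAR_LENGTH(delta) = 32'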

The fix?

The easiest fix for that problem is to simply increase the length of the module, delta or theme column, depending on which one is giving problems. In our case that would be the delta column:

ALTER TABLE block
CHANGE delta delta VARCHAR(128)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '0'
COMMENT 'Unique ID for block within a module.';

Here we raised the length of the column to 128 chars, which should be a lot better than the default 32. We can’t raise it much beyond that size though: with the utf8 character set each character can take up to 3 bytes, so a varchar(128) already contributes 384 bytes to the key, and MySQL limits the length of index keys on InnoDB tables (see the CREATE INDEX documentation for the full explanation).
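
If you alter a core column like this, it’s also worth keeping Drupal’s schema definition in sync, so drupal_write_record() and schema comparisons keep seeing the real column length. A minimal sketch using hook_schema_alter, assuming a custom module called mymodule (the hook is real, the module name is a placeholder):

/**
 * Implements hook_schema_alter().
 */
function mymodule_schema_alter(&$schema) {
  // Mirror the widened delta column from the ALTER TABLE above.
  if (isset($schema['block'])) {
    $schema['block']['fields']['delta']['length'] = 128;
  }
}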

Sadly this did not fix our problem, as we kept seeing the error in the logs after raising the column size. You can also see that in the error quoted at the start of this post, none of the key values (‘spock’, ‘system’, ‘user-menu’) exceeds its column’s length, so that was already an indication that something else was causing the problem.

The real problem gets revealed

If we dive into the Drupal code (simply searching for pieces of the query string from the error), we find that the query throwing the exception is called inside the function _block_rehash.

At the top of the Drupal 7 API page for this function we can see a list of all functions that call this function, and one of them is block_flush_caches:

function block_flush_caches() {
  // Rehash blocks for active themes. We don't use list_themes() here,
  // because if MAINTENANCE_MODE is defined it skips reading the database,
  // and we can't tell which themes are active.
  $themes = db_query("SELECT name FROM {system} WHERE type = 'theme' AND status = 1");
  foreach ($themes as $theme) {
    _block_rehash($theme->name);
  }

  return array('cache_block');
}

This function is an implementation of hook_flush_caches, a hook that gets invoked at every full cache flush.

So in short: this _block_rehash function is called at every cache flush. And that’s exactly where our problem lies with this site.

The site we are reviewing is a very busy Drupal 7 site with a lot of concurrent visitors. Drupal cron is scheduled to run every minute as this is needed for scheduled content publishing and sending out push notifications to mobile devices (but that’s not really important for this issue).

A cron run does a partial cache flush (called in system_cron), and after that the first visitor of the site triggers a cache rebuild. But because the site is so busy, the _block_rehash function can get called more than once at the same time, by different visitors.

As the _block_rehash function does not start a transaction, or do any kind of locking on the block table, we can get a situation where the same row gets inserted more than once. Luckily our database fails on that because of the unique index, saving us from corrupt data. But the user gets a 500 error returned and thus sees a white page instead of the content they expected.

The real fix

For now we simply added a try/catch block around the query in the _block_rehash function, so that when it fails it won’t give a white screen of death anymore:

try {
  drupal_write_record('block', $block, $primary_keys);
}
catch (Exception $e) {
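  // A concurrent request already inserted this block; log it and move on.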
  watchdog('block_rehash', 'Trying to insert an existing block with message: @message', array('@message' => $e->getMessage()), WATCHDOG_WARNING);
}

The proper solution would be for Drupal to implement a transaction or locking mechanism that prevents these kinds of problems from happening again. A shoutout to Jasper Knops for helping me figure out this issue (and for proofreading this post :)
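
For illustration, here is one way the rehash could be serialized with Drupal 7’s lock API (a sketch of the idea, with a hypothetical wrapper function, not what core actually does):

function _block_rehash_locked($theme) {
  // Only one request at a time may rehash the blocks for a theme.
  if (lock_acquire('block_rehash:' . $theme, 30)) {
    $blocks = _block_rehash($theme);
    lock_release('block_rehash:' . $theme);
    return $blocks;
  }
  // Another request is already rehashing this theme; wait for it
  // to finish instead of racing it with duplicate INSERTs.
  lock_wait('block_rehash:' . $theme);
  return array();
}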

About Drupal cron

I mentioned above that this site runs a Drupal cron every minute, which is not a good idea on any busy site. If you have jobs that need to run every minute, you should invoke those individual tasks instead of running a full Drupal cron, as sketched below.
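
As a sketch, assuming the time-critical work lives in a hypothetical mymodule_cron(), the crontab could call just that hook every minute and leave the full cron at a saner interval:

# Run only the time-critical hook every minute (hypothetical module name).
* * * * * drush --root=/var/www/html php-eval 'mymodule_cron();' > /dev/null 2>&1
# Run the full Drupal cron once an hour.
0 * * * * drush --root=/var/www/html core-cron > /dev/null 2>&1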
