Nginx + CDN + Googlebot, or how to avoid many useless Googlebot hits

If you're like me and you've set up a CDN distribution for your website's content (while waiting for SPDY to be widely adopted and available in mainstream distributions), you might have noticed that Googlebot frequently crawls your CDN subdomains, and this may have left your website somewhat overloaded. After all, CDNs can serve several purposes, but in my case the only one is to distribute content elegantly across subdomains so that the browser loads the page resources faster. Otherwise the browser is held back by the HTTP/1.1 limit on simultaneous connections per host (RFC 2616 recommends two), or by whatever higher limit it sets itself. Hell, …
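Spreading resources across subdomains only pays off if each resource is always served from the same subdomain. A minimal bash sketch of that mapping, assuming hypothetical hosts cdn1.thesite.com to cdn3.thesite.com (the post does not name its CDN hosts) and using the POSIX `cksum` checksum as a cheap, stable hash. The `cdn_host` and `cdn_url` functions are illustrative, not the author's setup:

```bash
#!/usr/bin/env bash
# Deterministic domain sharding: the same resource path always maps to the
# same CDN subdomain. Hosts cdnN.thesite.com are hypothetical placeholders.

# cdn_host PATH [SHARDS]: print the subdomain that should serve PATH.
cdn_host() {
  local path="$1"          # e.g. /sites/default/files/logo.png
  local shards="${2:-3}"   # number of CDN subdomains (default 3)
  local sum
  # cksum prints "<CRC> <byte count>"; keep only the CRC as the hash.
  sum=$(printf '%s' "$path" | cksum | cut -d' ' -f1)
  printf 'cdn%d.thesite.com\n' $(( sum % shards + 1 ))
}

# cdn_url PATH [SHARDS]: print the full CDN URL for PATH.
cdn_url() {
  printf 'http://%s%s\n' "$(cdn_host "$1" "${2:-3}")" "$1"
}
```

Hashing the path, rather than picking a subdomain at random on each page view, keeps every resource on a single URL. Browsers can then cache it once, and crawlers are not handed the same file under several host names.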

Drupal 7 + HTTPS + Nginx + Varnish + Apache + Boost + APC + Securepages + Drupal

If you happen to develop large sites in Drupal, you might run into a case like this one, where different servers (at least one reverse proxy and one web server) interact, causing a chain reaction every time you change something. Trying to speed up a coordinated system like this can be frustrating at times, and you may end up frustrating your users because part of it doesn't work, while the rest (the part that *does* work) is super-fast.

Spider a website with wget

This command can be useful if you want to auto-generate the Boost module cache files on a Drupal site:

wget -r -l4 --spider -D thesite.com http://www.thesite.com

Let's analyse the options...

-r makes the retrieval recursive, so wget follows the links it finds and visits more than one page

-l sets the maximum number of levels to recurse. If you are on the first page and follow a link, you are at level 1. If you then follow a link on that page, you are at level 2, and so on. So -l4 stops four links away from the start page (without -l, wget's default maximum depth is 5).

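--spider tells wget not to save anything (we just want to go through the pages, that's all)

-D thesite.com restricts the domains wget may follow to thesite.com. Note that it only takes effect together with -H (--span-hosts): without -H, wget never leaves the starting host (www.thesite.com here) anyway.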
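The way -r and -l count levels amounts to a breadth-first walk with a depth cut-off. Here is a small bash (4+) sketch of that idea over a made-up, in-memory link graph instead of real HTML pages. The paths and the `spider` function are hypothetical. This only illustrates how levels are counted; it is not wget's own code:

```bash
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays).
# Hypothetical link graph: page -> space-separated links found on that page.
declare -A LINKS=(
  [/]="/about /news"
  [/news]="/news/1 /news/2"
  [/news/1]="/news/1/comments"
  [/news/1/comments]="/user/1"
)

# spider START MAX_DEPTH: print every page reachable from START within
# MAX_DEPTH links (START itself is level 0), visiting each page only once.
spider() {
  local start="$1" max_depth="$2" depth=0 page link
  local -A seen=(["$start"]=1)
  local -a frontier=("$start") next
  printf '%s\n' "$start"
  while (( depth < max_depth && ${#frontier[@]} > 0 )); do
    next=()
    for page in "${frontier[@]}"; do
      # Unquoted on purpose: split the link list on spaces.
      for link in ${LINKS[$page]-}; do
        if [[ -z ${seen[$link]-} ]]; then
          seen[$link]=1
          next+=("$link")
          printf '%s\n' "$link"   # this page sits at level depth+1
        fi
      done
    done
    frontier=("${next[@]}")
    depth=$(( depth + 1 ))
  done
}
```

For example, `spider / 2` prints `/`, `/about`, `/news`, `/news/1` and `/news/2`. `/news/1/comments` is three links away, so it is only reached with a depth of 3 or more, just as a page three clicks deep is only crawled with -l3 or higher.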

The Drupal 7 bootstrap easy debug

Just as a reminder to myself, and because I don't much fancy digging into Drupal core when debugging, here is a short explanation of how the Drupal 7 bootstrap mechanism works. First of all, a bootstrap mechanism is one by which a system works its way, step by step, to a fully loaded state: it starts by loading simple elements, which then allow more complex elements to be loaded. Linux also has a bootstrap mechanism (as do most operating systems).
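For reference, Drupal 7 defines its bootstrap phases as the DRUPAL_BOOTSTRAP_* constants in includes/bootstrap.inc, and drupal_bootstrap($phase) runs every phase up to the requested one, in order, skipping phases that have already run. The bash sketch below only mimics that ordering. The `bootstrap_to` function and the echo standing in for the real work are hypothetical, not Drupal code, but they show which phases a given call would trigger:

```bash
#!/usr/bin/env bash
# Illustration only (not Drupal code): the Drupal 7 bootstrap phases, in the
# order of the DRUPAL_BOOTSTRAP_* constants from includes/bootstrap.inc.
PHASES=(CONFIGURATION PAGE_CACHE DATABASE VARIABLES SESSION PAGE_HEADER LANGUAGE FULL)
COMPLETED=-1   # index of the last phase already run (-1 = none yet)

# bootstrap_to PHASE: run every phase up to and including PHASE, each at
# most once. Asking for a phase that has already run does nothing.
bootstrap_to() {
  local target=-1 i
  for i in "${!PHASES[@]}"; do
    if [[ ${PHASES[$i]} == "$1" ]]; then target=$i; fi
  done
  if (( target < 0 )); then
    echo "unknown phase: $1" >&2
    return 1
  fi
  for (( i = COMPLETED + 1; i <= target; i++ )); do
    echo "bootstrap: DRUPAL_BOOTSTRAP_${PHASES[$i]}"   # stand-in for the real work
    COMPLETED=$i
  done
  return 0
}
```

Called in the same shell, `bootstrap_to DATABASE` prints the CONFIGURATION, PAGE_CACHE and DATABASE lines. A later `bootstrap_to SESSION` then only prints VARIABLES and SESSION, because the earlier phases are already done.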

Drupal 7 problem: Content edit tab doesn't appear for editor users

In Drupal 7, there is a *very tricky* problem you might run into at some point. We did, and because it's not *really* a Drupal bug (although it could be considered a usability bug), it's worth writing down. The problem is simple: one admin (Joe) and two editors (Sam and Max) edit a website. Each of them creates their own content of the type "artist". They all have the permissions "Create artist", "Edit own artist", "Edit all artist" and "Delete artist". Also, the "artist" content type has a text body which can use the "Plain text", "Filtered HTML" and "Full HTML" formats.