
In the long arc of web design, sites change. Over the years, they are redesigned, reimplemented, ripped apart, and reassembled. In the process, content is often lost to the ether. We all know that missing content is simply “not a good thing” and should be avoided.
Inbound links to a site generally come from two main places: search engines, and articles or other sites that link to you.
The search engines crawl regularly and keep themselves up to date. If they can find your content, they will. If you tell them it’s moved permanently, they’ll update their records. Articles and other sites, however, usually won’t. I haven’t checked old outbound links on this site and I doubt many other people do either. To keep users happy, whether they arrive by searching Google or by clicking links on external sites, we need some URL redirection so that the old links keep pointing to the content they were written for.
We want to use 301 redirects.
A 301 redirect refers to the HTTP status code delivered by the web server. A 301 is a sibling of the most famous status code, 404. A 404 says “page not found.” A 301 says “this resource has been permanently moved and here is the new address.” Simple.
Here, redirects are achieved using mod_rewrite, an Apache module. The syntax of rewrite directives can be quite mind-boggling to the novice user, as evidenced by the documentation for mod_rewrite.
When you redeploy a project with a new framework, a lot of old URLs that have built up Google-fu stop working. People clicking on those links will get a 404 error and end up confused, believing the content has been destroyed. Sad web surfer. So, as a good web designer, you want to ensure that all of your old URLs point to their new counterparts.
“But what about meta header redirects?” you say.
Sure, that would work on a small scale, but they have been so abused by spammers in recent years that they will decimate your search engine ranking. Also, when you have over 1900 redirects (as I did when porting this site to Mephisto) you want to make sure that you can set something up that is
And that’s what we’re going to set up here.
In setting up Capistrano based on Coda Hale’s instructions, you are actually telling Apache to send everything to the mongrel_cluster before .htaccess is ever consulted. For some reason, certain requests still get handled (e.g. those for /feed worked, but /feed has a route associated with it), while others are ignored.
# Redirect all non-static requests to cluster
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]
That basically says “when the request does not match an existing file under the document root, send the requested URL to the mongrel_cluster load balancer and let it deal with it.”
First off, we need to turn on mod_rewrite for this site. It should already be on in your .conf file, since the proxy rule above depends on it, so look for the line that says
RewriteEngine On
If it is not present, your site probably doesn’t work at all and you have far larger problems than I can address here.
Second, we know we have a lot of these redirects, so we don’t want to muddy up our very nice Apache config file with them inline. So let’s use the Include directive to tell Apache to pull the file in wholesale and parse it at that point. Place this line immediately after the RewriteEngine On directive.
Include etc/apache22/Includes/boboroshi_rewrite.conf
I placed mine in the same directory as my main domain conf file. On Apache versions before 2, you will not see an Includes directory in /usr/local/etc/apache* so you would need to do this differently.
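To make the ordering concrete, here is a rough sketch of how the pieces might sit together in the virtual host. The ServerName and DocumentRoot values are placeholders I made up for illustration; the Include path and the proxy rule are the ones shown above.

```apache
# Sketch only: ServerName and DocumentRoot are hypothetical placeholders,
# not values from the original setup.
<VirtualHost *:80>
  ServerName www.example.com
  DocumentRoot /path/to/app/current/public

  RewriteEngine On

  # Pull in the redirect rules first so they are evaluated
  # before the catch-all proxy rule below.
  Include etc/apache22/Includes/boboroshi_rewrite.conf

  # Redirect all non-static requests to cluster
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]
</VirtualHost>
```

Because rewrite rules are processed in order, anything in the included file gets a chance to match before the request is handed off to Mongrel.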
In this file, I placed a series of redirects that I typed up in an external text editor and uploaded to the server. Did you think I would type 1900 lines in nano or vi? Crazy! I digress…
The file looks something like this:
RewriteRule ^/log/index\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/index\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/index\.rdf$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/atom\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/rsd\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/bloggertrue\.mov$ http://versionsix.boboroshi.com/media/video/bloggertrue.mov [R=301,L]
RewriteRule ^/travel/vh1awards/$ http://versionsix.boboroshi.com/viewpoint/travelogue/vh1awards/ [R=301,L]
RewriteRule ^/travel/calinevada62001/$ http://versionsix.boboroshi.com/viewpoint/travelogue/calinevada62001/ [R=301,L]
RewriteRule ^/backlog/2006/11/just_like_starting_over\.php$ http://www.boboroshi.com/2006/11/28/just-like-starting-over [R=301,L]
RewriteRule ^/backlog/2006/10/myspace_data_modeling\.php$ http://www.boboroshi.com/2006/10/31/myspace-data-modeling [R=301,L]
RewriteRule ^/backlog/2006/10/the_killers_sams_town_springsteen_queen_and_deadwood\.php$ http://www.boboroshi.com/2006/10/8/the-killers-sams-town-springsteen-queen-and-deadwood [R=301,L]
RewriteRule ^/backlog/2006/10/photos_soft_complex_monopoli_cedars_at_the_black_cat\.php$ http://www.boboroshi.com/2006/10/7/photos-soft-complex-monopoli-cedars-at-the-black-cat [R=301,L]
RewriteRule ^/backlog/2006/10/the_periodic_spiral\.php$ http://www.boboroshi.com/2006/10/4/the-periodic-spiral [R=301,L]
[......]
What does that mean? Well, going by the mod_rewrite documentation, we’re looking at some basic regular expressions. The ^ anchors the start of the string and the $ anchors the end. Since a period is used to represent any character, you want to escape it by placing a backslash character (\) before any period that should appear literally in the URL.
In [R=301,L], the R says “force a redirect,” and the =301 sets the status code to 301. The L says “stop running through the rules now.” This is good when we have 1900 entries: once Apache finds the matching entry, it stops and sends the redirect to the client.
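A quick side-by-side, using one of the feed rules from the file, shows why the backslash matters. The unescaped version is only there for contrast and is not something you would want to deploy.

```apache
# Unescaped: "." matches any single character, so this pattern would
# also match a request for /indexZxml.
RewriteRule ^/index.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]

# Escaped: "\." matches only a literal period, so only /index.xml matches.
RewriteRule ^/index\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
```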
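When a whole group of old URLs moves in the same way, a single rule with a captured group can stand in for many one-off lines. The sketch below assumes that every page under /travel/ moved to the same name under /viewpoint/travelogue/; that is my assumption based on the two travel entries shown, not something confirmed for the full list.

```apache
# Assumption: every old /travel/<name>/ page now lives at
# /viewpoint/travelogue/<name>/ on the same host as the examples above.
# ([^/]+) captures one path segment; $1 inserts it into the target.
RewriteRule ^/travel/([^/]+)/$ http://versionsix.boboroshi.com/viewpoint/travelogue/$1/ [R=301,L]
```

This would not help with the /backlog/ entries, where the new URLs gain a day number and swap underscores for hyphens, so those still need one rule each.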
So, once you’ve got it written up, toss that file in the directory, restart Apache, run cap restart from your deployment workstation, and you should be in business. And your old users will be very happy that they don’t have to look at a cached page in Google.
December 27th, 2006 at 05:11 PM
If you’re using Apache already, mod_alias is probably the way to go for redirects.
December 28th, 2006 at 05:16 PM
Josh -
I initially had those in my .htaccess but they were also choking. I’ll give it a whirl updating the main file. Thanks for the tip!
January 10th, 2007 at 06:52 AM
And you’ll probably want to use the ‘RedirectPermanent’ alias instead of just ‘Redirect’ (which does a 302 by default).