8
Mar

Generating Sitemaps With Rails

posted by gchatz | rails

Sitemaps are a handy way of describing your site’s structure to search engines.
They are especially useful when your site’s links aren’t easily discoverable (links that are only reachable through a search form, for example).

There are some examples for generating sitemaps on the fly using an .rxml template, but if your site contains a large number of links you’ll need more than that.

Identifying links to include

First, you’ll have to identify which of your application’s links you want to include in the sitemap. In most cases, all links that have search engine value should be included. This is more of an SEO issue and it has to do with the general layout and content of your application.

Understanding Sitemaps

The Sitemaps protocol is described at sitemaps.org.

In short, there are two types of sitemap files:

  • Sitemap files, which contain a list of urls in your site
  • Sitemap Index files, which contain a list of sitemap files

Sitemap file

The xml format of the sitemap file is like this:

<?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   ...
   ...
</urlset>

Where

  • loc is the actual url
  • lastmod is the last modified date
  • changefreq defines how often this url is updated
  • priority is the priority of this url relative to other urls in your site

Note that changefreq is only a hint: it does not control how often Google and other search engines actually re-crawl the url.
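The protocol at sitemaps.org restricts what these fields may contain: changefreq must be one of seven fixed values, priority must lie between 0.0 and 1.0, and lastmod must be a W3C date such as YYYY-MM-DD. Below is a minimal sketch of checking values before they go into a sitemap. The helper names are hypothetical and not part of the code in this post.

```ruby
require 'date'

# Values the sitemaps.org protocol allows for <changefreq>.
CHANGEFREQ_VALUES = %w[always hourly daily weekly monthly yearly never].freeze

# Hypothetical helper: true when value is an allowed changefreq.
def valid_changefreq?(value)
  CHANGEFREQ_VALUES.include?(value.to_s)
end

# Hypothetical helper: true when value lies in the protocol's 0.0..1.0 range.
def valid_priority?(value)
  (0.0..1.0).cover?(Float(value))
rescue ArgumentError, TypeError
  false
end

# Hypothetical helper: formats a Date or Time as YYYY-MM-DD for <lastmod>.
def format_lastmod(date_or_time)
  date_or_time.strftime('%Y-%m-%d')
end

puts valid_changefreq?('monthly')          # true
puts valid_priority?(0.8)                  # true
puts valid_priority?(1.5)                  # false
puts format_lastmod(Date.new(2005, 1, 1))  # 2005-01-01
```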

Sitemap Index file

The index file contains a list of the sitemap files you want to include.
The xml format of that file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
   .....
   </sitemapindex>

Where loc defines the location of the sitemap file, and lastmod the last change date.
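Index files matter most for large sites, because the sitemaps.org protocol caps a single sitemap file at 50,000 urls. The following is only a sketch of how a long url list could be split into files that each stay under that limit. The helper name, the base url and the "sitemapN.xml.gz" naming are assumptions made for illustration.

```ruby
# Per-file url limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical helper: splits urls into chunks that each fit in one sitemap
# file and pairs every chunk with the location the index would list for it.
# base_url and the "sitemapN.xml.gz" naming are assumptions for illustration.
def plan_sitemaps(urls, base_url, limit = MAX_URLS_PER_SITEMAP)
  urls.each_slice(limit).each_with_index.map do |chunk, i|
    { loc: "#{base_url}/sitemap#{i + 1}.xml.gz", urls: chunk }
  end
end

urls = (1..120_000).map { |id| "http://www.example.com/products/#{id}" }
plan = plan_sitemaps(urls, 'http://www.example.com')

# Each entry becomes one sitemap file plus one <sitemap> element in the index.
plan.each { |entry| puts "#{entry[:loc]}: #{entry[:urls].size} urls" }
```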

Helper Classes

We’ll need some helper classes to easily generate sitemaps and sitemap indexes.
They are just a set of simple classes derived from REXML::Document and REXML::Element.

require 'rexml/document'

# A single <url> entry of a sitemap file.
class SitemapUrl < REXML::Element

  def initialize(loc, lastmod = nil, changefreq = nil, priority = nil)
    @loc = loc
    @lastmod = lastmod
    @changefreq = changefreq
    @priority = priority

    super("url")
    create_elements
  end

  # Adds <loc> plus whichever optional child elements were given.
  def create_elements
    add_element("loc").text = @loc
    add_element("lastmod").text = @lastmod.to_s if @lastmod
    add_element("changefreq").text = @changefreq.to_s if @changefreq
    add_element("priority").text = @priority.to_s if @priority
  end

end

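# A sitemap file: an XML document whose root <urlset> holds the url entries.
class Sitemap < REXML::Document

  attr_accessor :loc, :lastmod, :urls

  def initialize(loc = nil, lastmod = nil)
    # Call super() with empty parentheses: a bare super would forward loc
    # to REXML::Document, which would try to parse it as XML source.
    super()
    @loc = loc
    @lastmod = lastmod

    self << REXML::XMLDecl.new("1.0", "UTF-8")

    urlset = add_element("urlset")
    urlset.add_attribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

    @urls = self.root
  end

  def to_xml
    to_s
  end

  # Appends a <url> entry to the <urlset> root.
  def add_url(loc, lastmod = nil, changefreq = nil, priority = nil)
    @urls << SitemapUrl.new(loc, lastmod, changefreq, priority)
  end

end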

class SitemapIndex < REXML::Document
  attr_accessor :sitemaps

  def initialize
    super

    self << REXML::XMLDecl.new("1.0", "UTF-8")

    sitemapindex = add_element("sitemapindex")
    sitemapindex.add_attribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

  end

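  # Adds a <sitemap> entry; lastmod is written only when the sitemap has one.
  def add_sitemap(sitemap)
    el = self.root.add_element("sitemap")
    el.add_element("loc").text = sitemap.loc
    el.add_element("lastmod").text = sitemap.lastmod.to_s if sitemap.lastmod
  end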

  def to_xml
    to_s
  end
end

(you can download the file here)
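One practical benefit of building the documents with REXML instead of concatenating strings is that text content is escaped on output. Urls with query strings often contain a raw ampersand, which is not legal inside XML text. The short sketch below uses plain REXML calls, with a made-up search url, to show the ampersand being written as &amp;amp; in the serialized element.

```ruby
require 'rexml/document'

# Build a <loc> element the same way the helper classes do.
url = REXML::Element.new('url')
loc = url.add_element('loc')

# A made-up search url containing a raw ampersand.
loc.text = 'http://www.example.com/search?q=rails&page=2'

# Serializing the element escapes the ampersand.
xml = url.to_s
puts xml
```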

Using Rake to generate your sitemap

We want our sitemaps to be generated periodically. We can use a rake task, run from a cron job, that will create the xml files and compress them.
Rake tasks are extremely helpful in this case, and we can access all of our models just by declaring the environment task as a prerequisite.

namespace :myapp do 
    namespace :sitemap do

      desc "Create Products Sitemap"
       task(:products => :environment) do
          sitemap = Sitemap.new
          sitemap.add_url("http://www.myapp.com/products/1")
          sitemap.add_url("http://www.myapp.com/products/2")
          #add urls depending on your application logic

          #delete the previous (the file name must match the one listed in the index)
          FileUtils.rm(File.join(RAILS_ROOT, "public/products_sitemap.xml.gz"), :force => true)

          #create the new file
          f =File.new(File.join(RAILS_ROOT, "public/products_sitemap.xml"), 'w')

          #output contents
          f.write sitemap.to_xml
          f.close

          #compress
          system("gzip #{File.join(RAILS_ROOT, 'public/products_sitemap.xml')}")

       end

       desc "Create Reviews Sitemap"
       task(:reviews => :environment) do
          sitemap = Sitemap.new
          sitemap.add_url("http://www.myapp.com/reviews/1")
          sitemap.add_url("http://www.myapp.com/reviews/2")
          #add files depending on your application logic

          #delete the previous
          FileUtils.rm(File.join(RAILS_ROOT, "public/reviews_sitemap.xml.gz"), :force => true)

          #create the new file
          f =File.new(File.join(RAILS_ROOT, "public/reviews_sitemap.xml"), 'w')

          #output contents
          f.write sitemap.to_xml
          f.close

          #compress
          system("gzip #{File.join(RAILS_ROOT, 'public/reviews_sitemap.xml')}")

       end

       desc "Create Index"
       task(:index => :environment) do
          #add each sitemap file
          products = Sitemap.new("http://www.myapp.com/products_sitemap.xml.gz")
          reviews = Sitemap.new("http://www.myapp.com/reviews_sitemap.xml.gz")

          #the index object must exist before sitemaps are added to it
          index = SitemapIndex.new
          index.add_sitemap(products)
          index.add_sitemap(reviews)

          #remove the previous file
          FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_index.xml.gz"), :force => true)

          #create the index file
          f =File.new(File.join(RAILS_ROOT, "public/sitemap_index.xml"), 'w')

          #write contents
          f.write index.to_xml
          f.close

          #compress
          system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_index.xml')}")

       end

       desc "Create all sitemaps"
       task(:generate=> :environment) do
          #create all sitemap files
          Rake::Task["myapp:sitemap:products"].invoke
          Rake::Task["myapp:sitemap:reviews"].invoke

          #create the sitemap index file 
          Rake::Task["myapp:sitemap:index"].invoke
       end

   end
end
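The hard-coded add_url calls in the tasks stand in for your application logic. In a real task they would typically be replaced by a loop over records. The sketch below shows the idea, using a Struct as a stand-in for a model so that it runs on its own. The Product struct, its updated_at field and the sitemap_entries helper are all assumptions made for illustration.

```ruby
# Stand-in for an ActiveRecord model; in a real task this would be a
# model class made available by the :environment prerequisite.
Product = Struct.new(:id, :updated_at)

# Hypothetical helper: maps records to [loc, lastmod] pairs that could be
# passed to Sitemap#add_url. base_url is an assumption for illustration.
def sitemap_entries(records, base_url)
  records.map do |record|
    ["#{base_url}/products/#{record.id}", record.updated_at.strftime('%Y-%m-%d')]
  end
end

products = [
  Product.new(1, Time.utc(2005, 1, 1)),
  Product.new(2, Time.utc(2005, 1, 2))
]

entries = sitemap_entries(products, 'http://www.myapp.com')
entries.each do |loc, lastmod|
  puts "#{loc} #{lastmod}"  # in the task: sitemap.add_url(loc, lastmod)
end
```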

Create a file sitemap.rake and place it in your lib/tasks folder. (you can download it here)
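The tasks above write a plain xml file and then shell out to gzip. As an alternative (not what the tasks in this post do), the compressed file could be written directly with Ruby's standard Zlib library. That removes the need for the separate delete and compress steps, since the file is created or overwritten in one go. This is a minimal sketch; write_gzipped is a hypothetical helper name.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: writes xml to path as a gzip file in one step,
# replacing any existing file at that path.
def write_gzipped(path, xml)
  Zlib::GzipWriter.open(path) { |gz| gz.write(xml) }
end

# Demonstrate in a temporary directory instead of the Rails public folder.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'products_sitemap.xml.gz')
  write_gzipped(path, '<urlset/>')

  # Read the file back to confirm that it round-trips.
  puts Zlib::GzipReader.open(path) { |gz| gz.read }
end
```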

Scheduling sitemap generation

Now that we have our rake task ready, we can set up a cron job that will execute it periodically.
On a Linux machine that requires a line in the system crontab file (/etc/crontab, whose entries include the user to run as):

#generate sitemaps daily
0 0 * * * root cd /path/to/your/application && /usr/bin/rake myapp:sitemap:generate

Don’t forget to submit your sitemap urls to search engines. There are some services that will help you submit your sitemap to more than one search engine at a time.