Generating Sitemaps With Rails
posted by gchatz
Sitemaps are a convenient way of describing your site’s structure to search engines.
They are especially useful when your site’s links aren’t easily discoverable (for example, pages that are only reachable through a search form).
There are some examples of generating sitemaps on the fly using an .rxml template, but if your site contains a large number of links you’ll need more than that: rebuilding a huge sitemap on every request can get expensive, and a single sitemap file may list at most 50,000 urls.
Identifying links to include
First, you’ll have to identify which of your application’s links you want to include in the sitemap. In most cases, all links that have search engine value should be included. This is more of an SEO issue, and it depends on the general layout and content of your application.
Understanding Sitemaps
Briefly, there are two types of sitemap files:
- Sitemap files, which contain a list of urls on your site
- Sitemap Index files, which contain a list of sitemap files.
Sitemap file
The XML format of a sitemap file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
...
...
</urlset>
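Where
- loc is the full url of the page (required)
- lastmod is the date the page was last modified, in W3C Datetime format such as 2005-01-01 (optional)
- changefreq is how often the page is likely to change: always, hourly, daily, weekly, monthly, yearly or never (optional)
- priority is the priority of this url relative to other urls on your site, from 0.0 to 1.0 (optional, default 0.5)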
changefreq is only a hint: it does not control how often Google and other search engines actually crawl the url.
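To make the allowed values concrete, here is a small sketch of a hypothetical helper (sitemap_fields is not part of the helper classes in this article) that checks changefreq and priority against the values the protocol accepts and formats lastmod as a W3C date before they go into a sitemap:

```ruby
require 'date'
require 'time'

# Hypothetical helper: validates and formats the optional <url> fields.
# The allowed changefreq values come from the sitemap protocol.
CHANGEFREQS = %w[always hourly daily weekly monthly yearly never].freeze

def sitemap_fields(lastmod: nil, changefreq: nil, priority: nil)
  fields = {}
  # Date#iso8601 gives YYYY-MM-DD; Time#iso8601 gives a full W3C datetime
  fields[:lastmod] = lastmod.iso8601 if lastmod
  if changefreq
    unless CHANGEFREQS.include?(changefreq.to_s)
      raise ArgumentError, "invalid changefreq: #{changefreq}"
    end
    fields[:changefreq] = changefreq.to_s
  end
  if priority
    value = Float(priority)
    unless (0.0..1.0).cover?(value)
      raise ArgumentError, "priority must be between 0.0 and 1.0"
    end
    fields[:priority] = value.to_s
  end
  fields
end

p sitemap_fields(lastmod: Date.new(2005, 1, 1), changefreq: :monthly, priority: 0.8)
```

Rejecting bad values early is a design choice: search engines may simply ignore invalid entries, so a typo in changefreq would otherwise go unnoticed.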
Sitemap Index file
The index file contains a list of the sitemap files you want to include.
Its XML format is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
.....
</sitemapindex>
Where loc is the location of the sitemap file and lastmod (optional) is the date that file was last modified.
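Since one sitemap file may hold at most 50,000 urls, a large site has to spread its urls over several files and list those files in the index. A minimal sketch of that split, assuming a hypothetical partition_sitemaps helper and the sitemap1.xml.gz, sitemap2.xml.gz, ... naming used in the example above:

```ruby
# Limit from the sitemap protocol: at most 50,000 urls per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical helper: splits urls into chunks that respect the limit and
# assigns each chunk the url of the sitemap file it will be written to.
def partition_sitemaps(urls, base_url, limit = MAX_URLS_PER_SITEMAP)
  urls.each_slice(limit).with_index(1).map do |chunk, i|
    { loc: "#{base_url}/sitemap#{i}.xml.gz", urls: chunk }
  end
end

urls = (1..120_000).map { |id| "http://www.example.com/products/#{id}" }
parts = partition_sitemaps(urls, "http://www.example.com")
parts.each { |part| puts "#{part[:loc]}: #{part[:urls].size} urls" }
```

With 120,000 urls this yields three files holding 50,000, 50,000 and 20,000 urls; each loc then becomes one sitemap entry in the index file.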
Helper Classes
We’ll need some helper classes to generate sitemaps and sitemap indexes easily.
They are just a few simple classes derived from REXML::Document and REXML::Element.
require 'rexml/document'
class SitemapUrl < REXML::Element
  def initialize(loc, lastmod = nil, changefreq = nil, priority = nil)
    @loc = loc
    @lastmod = lastmod
    @changefreq = changefreq
    @priority = priority
    super("url")
    create_elements
  end
  def create_elements
    #add location (required)
    el = add_element("loc")
    el.text = @loc.to_s
    #add the optional elements only when a value was given
    if @lastmod
      el = add_element("lastmod")
      el.text = @lastmod.to_s
    end
    if @changefreq
      el = add_element("changefreq")
      el.text = @changefreq.to_s
    end
    if @priority
      el = add_element("priority")
      el.text = @priority.to_s
    end
  end
end
class Sitemap < REXML::Document
  attr_accessor :loc, :lastmod, :urls
  def initialize(loc = nil, lastmod = nil)
    #call REXML::Document#initialize without arguments; a bare "super"
    #would hand loc to it as XML source to parse
    super()
    @loc = loc
    @lastmod = lastmod
    self << REXML::XMLDecl.new("1.0", "UTF-8")
    urlset = add_element("urlset")
    urlset.add_attribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    @urls = self.root
  end
  def to_xml
    to_s
  end
  def add_url(loc, lastmod = nil, changefreq = nil, priority = nil)
    @urls << SitemapUrl.new(loc, lastmod, changefreq, priority)
  end
end
class SitemapIndex < REXML::Document
  attr_accessor :sitemaps
  def initialize
    super
    self << REXML::XMLDecl.new("1.0", "UTF-8")
    sitemapindex = add_element("sitemapindex")
    sitemapindex.add_attribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
  end
  def add_sitemap(sitemap)
    el = self.root.add_element("sitemap")
    loc = el.add_element("loc")
    loc.text = sitemap.loc
    #lastmod is optional in the index as well
    if sitemap.lastmod
      lastmod = el.add_element("lastmod")
      lastmod.text = sitemap.lastmod.to_s
    end
  end
  def to_xml
    to_s
  end
end
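Under the hood the classes only use a handful of REXML calls. The sketch below makes the same calls directly, without the subclasses, to build a one-url sitemap (the search url is a made-up example). REXML escapes the & in the url as &amp; when it writes the document, which the sitemap protocol requires, so urls passed to these classes don’t need to be escaped by hand:

```ruby
require 'rexml/document'

# XML declaration plus a <urlset> root carrying the sitemap namespace
doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0", "UTF-8")
urlset = doc.add_element("urlset", "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9")

# One <url> entry with a required <loc> and an optional <lastmod>
url = urlset.add_element("url")
url.add_element("loc").text = "http://www.example.com/search?q=rails&page=2"
url.add_element("lastmod").text = "2005-01-01"

# Serializing escapes the ampersand in the text as &amp;
xml = doc.to_s
puts xml
```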
Using Rake to generate your sitemap
We want our sitemaps to be regenerated periodically. For that we can use a rake task, run by a cron job, that creates the XML files and compresses them.
Rake tasks are very helpful here: by declaring :environment as a prerequisite, the task gets access to all of our models.
namespace :myapp do
  namespace :sitemap do
    desc "Create Products Sitemap"
    task :products => :environment do
      sitemap = Sitemap.new
      sitemap.add_url("http://www.myapp.com/products/1")
      sitemap.add_url("http://www.myapp.com/products/2")
      #add urls depending on your application logic
      #Rails.root is the application's root directory (RAILS_ROOT in older Rails versions)
      xml_path = File.join(Rails.root, "public", "products_sitemap.xml")
      #delete the previous compressed file, otherwise gzip will not overwrite it
      FileUtils.rm("#{xml_path}.gz", :force => true)
      #create the new file and write its contents
      File.open(xml_path, "w") { |f| f.write(sitemap.to_xml) }
      #compress: gzip replaces products_sitemap.xml with products_sitemap.xml.gz
      system("gzip", xml_path)
    end
    desc "Create Reviews Sitemap"
    task :reviews => :environment do
      sitemap = Sitemap.new
      sitemap.add_url("http://www.myapp.com/reviews/1")
      sitemap.add_url("http://www.myapp.com/reviews/2")
      #add urls depending on your application logic
      xml_path = File.join(Rails.root, "public", "reviews_sitemap.xml")
      #delete the previous compressed file
      FileUtils.rm("#{xml_path}.gz", :force => true)
      #create the new file and write its contents
      File.open(xml_path, "w") { |f| f.write(sitemap.to_xml) }
      #compress
      system("gzip", xml_path)
    end
    desc "Create Index"
    task :index => :environment do
      index = SitemapIndex.new
      #add each sitemap file; the urls must match the files created above
      products = Sitemap.new
      products.loc = "http://www.myapp.com/products_sitemap.xml.gz"
      reviews = Sitemap.new
      reviews.loc = "http://www.myapp.com/reviews_sitemap.xml.gz"
      index.add_sitemap(products)
      index.add_sitemap(reviews)
      xml_path = File.join(Rails.root, "public", "sitemap_index.xml")
      #remove the previous file
      FileUtils.rm("#{xml_path}.gz", :force => true)
      #create the index file and write its contents
      File.open(xml_path, "w") { |f| f.write(index.to_xml) }
      #compress
      system("gzip", xml_path)
    end
    desc "Create all sitemaps"
    task :generate => :environment do
      #create all sitemap files
      Rake::Task["myapp:sitemap:products"].invoke
      Rake::Task["myapp:sitemap:reviews"].invoke
      #create the sitemap index file
      Rake::Task["myapp:sitemap:index"].invoke
    end
  end
end
Create a file sitemap.rake and place it in your lib/tasks folder.
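If you’d rather not shell out to gzip, the compression can be done in Ruby with the standard Zlib library. A sketch under that assumption; write_gzipped_sitemap is a hypothetical helper, not something the tasks above define. It writes to a temporary file first and then renames it, which on the same filesystem replaces the old sitemap in one step:

```ruby
require 'zlib'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes xml straight into a .gz file, replacing any
# previous version, instead of writing .xml and running gzip on it.
def write_gzipped_sitemap(path, xml)
  tmp = "#{path}.tmp"
  Zlib::GzipWriter.open(tmp) { |gz| gz.write(xml) }
  FileUtils.mv(tmp, path) # overwrites the previous sitemap, if any
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "products_sitemap.xml.gz")
  write_gzipped_sitemap(path, "<urlset/>")
  puts Zlib::GzipReader.open(path) { |gz| gz.read }
end
```

Inside a task it could take the place of the rm/File.open/gzip steps, for example write_gzipped_sitemap(File.join(Rails.root, "public", "products_sitemap.xml.gz"), sitemap.to_xml).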
Scheduling sitemap generation
Now that our rake task is ready, we can set up a cron job that executes it periodically.
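On a Linux machine, that takes one entry in the system crontab (/etc/crontab, whose sixth field names the user the command runs as). Setting RAILS_ENV=production makes rake load your production environment rather than the default development one:
#generate sitemaps daily at midnight
0 0 * * * root cd /path/to/your/application && RAILS_ENV=production /usr/bin/rake myapp:sitemap:generate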
Don’t forget to submit your sitemap urls to search engines. There are some services that will help you submit your sitemap to more than one search engine at a time.
