Merb routing in Rails - Today!

We’ve been hearing great things about Merb routing, so a few weeks ago we set out to see whether we could get it working in a Rails application as a proof of concept. The goal was to use the Merb routing engine along with router.rb without touching any of our existing Rails application code (i.e. anything in app/*). Pre-existing URL recognition needed to keep working, and all named routes and url_for logic needed to generate the same URLs as before.

With the announcement that Merb and Rails are merging, we figured that it’d be a great time to share a little bit of what we learned.

Background

  • We’re running Rails 2.2.2.
  • When we started messing with routes, we had 2500 generated routes (i.e. “rake routes | wc -l”).  After Aaron’s formatted routes patch, we had 1250.  We currently have over 1500.
  • Routing accounted for about 4 of the 6 seconds it took ./script/console and mongrel_rails to start up.
  • Each mongrel_rails process used at least 250MB of memory in production.

Step 1: compiling

Compiling is the first part of routing and involves loading an application’s routes.rb or router.rb and storing it in some way that allows for generation and recognition.

The merb_routing plugin contains a subset of merb-core… just the classes related to routes. Unsurprisingly, these classes assume they are running in a Merb environment, not a Rails one. To load them without drastically modifying the source, we wrote two small compatibility layers (mini-merb and mini-extlib) that provide some of the basic functionality Merb and ExtLib offer (e.g. Merb.logger and Merb.root). This let us load the Merb::Router classes in our environment without any errors.

We then rewrote our routes.rb by hand using router.rb syntax. We had to modify Merb routing slightly because we were getting “memory exhausted” compile errors. It turned out that Merb routing recognizes routes with a single if/elsif chain, and because of our large number of routes we hit a Ruby quirk: a single if statement can’t have more than 2498 elsif branches.

irb(main):001:0> eval("if 1; #{"elsif 1;" * 2498} end")
=> nil
irb(main):002:0> eval("if 1; #{"elsif 1;" * 2499} end")
SyntaxError: (eval):1:in `irb_binding': compile error
(eval):1: memory exhausted

The quick fix was to use many standalone if statements, each with a return inside, rather than one giant if/elsif chain. After a few hooks into ActionController::Routing, our environment was using Merb/router.rb instead of Rails/routes.rb.
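To make the workaround concrete, here is a minimal sketch. The Recognizer class is hypothetical, and routes here match on exact path strings (Merb’s real generated code matches regexps and extracts params). It compiles one method out of thousands of independent `return ... if ...` statements, so there is no single if/elsif chain to run into the limit shown in the irb session above.

```ruby
# Hypothetical sketch: compile a recognizer out of many independent
# one-line `return ... if ...` statements instead of one if/elsif chain.
class Recognizer
  def self.compile(routes)
    # One standalone statement per route; none of them nest.
    lines = routes.map do |path, name|
      "  return #{name.inspect} if path == #{path.inspect}"
    end
    source = "def recognize(path)\n#{lines.join("\n")}\n  nil # no route matched\nend"
    class_eval(source)
  end
end

routes = (1..3000).map { |i| ["/page#{i}", "page_#{i}"] }
Recognizer.compile(routes)
Recognizer.new.recognize("/page2999") # => "page_2999"
Recognizer.new.recognize("/nope")     # => nil
```

Each statement returns as soon as it matches, so the behavior is the same as the elsif chain: routes are still tried in order and the first match wins.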

We had to make a few other minor changes to Merb routing to be compatible with Rails:

  • added support for { :method => :any } in routing conditions
  • added support for BLAH_index as a named route for singular resources (Rails creates a named route called “blog_index” instead of just “blog” for singular resources)
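For the first change, a request-method condition only needs to treat :any like “no condition at all”. The helper below is a hypothetical sketch of that check, not Merb’s actual code.

```ruby
# Hypothetical helper (not Merb's actual code): does a route's :method
# condition accept this request? nil and :any both mean "any verb".
def method_condition_matches?(condition, request_method)
  return true if condition.nil? || condition == :any
  Array(condition).any? { |verb| verb.to_s.downcase == request_method.to_s.downcase }
end

method_condition_matches?(:any, "DELETE")        # => true
method_condition_matches?(:get, "GET")           # => true
method_condition_matches?([:get, :post], "POST") # => true
method_condition_matches?(:post, "GET")          # => false
```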

Lastly, we wrote a rake task that overrides the default “rake routes” to pull from Merb routes instead of Rails routes.  At this point, we had the routes loaded and could see the objects in script/console.

Step 2: recognizing

Recognition is the second part of routing and happens at the beginning of every request.  It translates a URL into various parameters and figures out which controller/action to invoke for that request.

Once we had router.rb working correctly, route recognition and parameter loading were surprisingly easy to wire in.
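As a rough illustration of what recognition does, the sketch below turns path patterns into regexps and maps the captures back onto params. TinyRecognizer and its add/recognize methods are hypothetical; this is a simplification for explanation, not how Merb or Rails implement it.

```ruby
# Hypothetical minimal recognizer: each route is a path pattern plus
# default params; segments starting with ":" become regexp captures.
class TinyRecognizer
  Route = Struct.new(:regexp, :keys, :defaults)

  def initialize
    @routes = []
  end

  def add(pattern, defaults = {})
    keys = []
    source = pattern.split("/").map do |segment|
      if segment.start_with?(":")
        keys << segment[1..-1].to_sym
        "([^/]+)" # one dynamic path segment
      else
        Regexp.escape(segment)
      end
    end.join("/")
    @routes << Route.new(Regexp.new("\\A#{source}\\z"), keys, defaults)
    self
  end

  # First matching route wins; returns a params hash, or nil.
  def recognize(path)
    @routes.each do |route|
      match = route.regexp.match(path)
      next unless match
      params = route.defaults.dup
      route.keys.each_with_index { |key, i| params[key] = match[i + 1] }
      return params
    end
    nil
  end
end

recognizer = TinyRecognizer.new
recognizer.add("/people/:id", :controller => "people", :action => "show")
recognizer.add("/:controller/:action/:id")
recognizer.recognize("/people/3") # params: controller "people", action "show", id "3"
```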

Step 3: generating

Generation is the third part of routing and translates structured options into a URL string.  This happens every time you use url_for, a named route helper or redirect_to.

Generation was the trickiest part of this project and had the most edge cases. Philosophically speaking, Merb and Rails route generation are very different. Rails gives you named routes as helpers (person_path, person_url, etc.), as well as a url_for() method that searches through all of the routes and finds the best route for the options you provide. Merb, on the other hand, provides a single method, url(), that all generation goes through. If you don’t explicitly provide a named route, url() uses the default route (i.e. :controller/:action/:id) rather than looking for the “best route”.

Getting the named routes to work was pretty easy and only required a single method_missing catch-all for ActionController::Base and ActionView::Base.  Using a simple regexp, if the requested method ends in _url or _path, it uses Merb::Router#url and passes in the named route.

There was a bit of trickery involved in getting the path vs. URL distinction and the :only_path option working correctly, but overall it wasn’t too hard.
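The method_missing idea can be sketched in isolation. TinyNamedRoutes, its host string and its route table are all hypothetical stand-ins (the real plugin delegates to Merb::Router instead of filling in patterns itself), but the regexp dispatch and the path vs. URL / :only_path handling follow the description above.

```ruby
# Hypothetical sketch: answer *_path / *_url helpers from a table of named
# route patterns. The class, host and route table are illustrative only.
class TinyNamedRoutes
  HELPER_NAME = /\A(\w+)_(path|url)\z/

  def initialize(host, routes)
    @host = host     # e.g. "http://example.com"
    @routes = routes # e.g. { :person => "/people/:id" }
  end

  def method_missing(name, *args, &block)
    match = HELPER_NAME.match(name.to_s)
    pattern = match && @routes[match[1].to_sym]
    return super unless pattern

    options = args.last.is_a?(Hash) ? args.pop : {}
    # Fill each :segment in order from the positional arguments.
    path = pattern.gsub(/:\w+/) { args.shift.to_s }
    # *_path, or *_url with :only_path => true, returns just the path.
    return path if match[2] == "path" || options[:only_path]
    @host + path
  end

  def respond_to_missing?(name, include_private = false)
    match = HELPER_NAME.match(name.to_s)
    (match && @routes.key?(match[1].to_sym)) || super
  end
end

helpers = TinyNamedRoutes.new("http://example.com", :person => "/people/:id")
helpers.person_path(3)                    # => "/people/3"
helpers.person_url(3)                     # => "http://example.com/people/3"
helpers.person_url(3, :only_path => true) # => "/people/3"
```

Anything that doesn’t match a known named route falls through to `super`, so typos still raise NoMethodError instead of silently generating a URL.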

The last tricky piece was returning the best route instead of the default route when no named route is provided. We do this by looping through all of the available routes and determining which one satisfies the most of the options passed in. With 1500 routes this turned out to be a bit slow, so we borrowed some optimization ideas from Rails and maintain a cache of the candidate routes for each controller/action pair.
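Here is a rough sketch of that search. BestRouteFinder, its Route struct, and the scoring rule (count how many of the caller’s options a route accounts for, with earlier routes winning ties) are assumptions made for illustration; the plugin’s actual scoring may differ. The cache mirrors the controller/action idea: the candidate list for each pair is computed once and reused.

```ruby
# Hypothetical sketch of url_for-style "best route" selection with a
# per-controller/action candidate cache. Names and scoring are illustrative.
class BestRouteFinder
  Route = Struct.new(:name, :pattern, :requirements) do
    # Dynamic segments such as :id in "/people/:id".
    def segment_keys
      pattern.scan(/:(\w+)/).flatten.map { |key| key.to_sym }
    end

    # Usable only if every fixed requirement matches and every segment has a value.
    def satisfies?(options)
      requirements.all? { |key, value| options[key] == value } &&
        segment_keys.all? { |key| options.key?(key) }
    end

    # How many of the caller's options this route accounts for.
    def score(options)
      (requirements.keys | segment_keys).count { |key| options.key?(key) }
    end
  end

  def initialize(routes)
    @routes = routes
    @candidates = {} # [controller, action] => routes that could apply
  end

  def best_route(options)
    best = nil
    best_score = -1
    candidates_for(options[:controller], options[:action]).each do |route|
      next unless route.satisfies?(options)
      score = route.score(options)
      if score > best_score # strictly greater: earlier routes win ties
        best = route
        best_score = score
      end
    end
    best
  end

  private

  # Computed once per controller/action pair, then served from the cache.
  def candidates_for(controller, action)
    @candidates[[controller, action]] ||= @routes.select do |route|
      req = route.requirements
      (!req.key?(:controller) || req[:controller] == controller) &&
        (!req.key?(:action) || req[:action] == action)
    end
  end
end

finder = BestRouteFinder.new([
  BestRouteFinder::Route.new(:people, "/people", { :controller => "people", :action => "index" }),
  BestRouteFinder::Route.new(:person, "/people/:id", { :controller => "people", :action => "show" }),
  BestRouteFinder::Route.new(:default, "/:controller/:action/:id", {})
])
finder.best_route(:controller => "people", :action => "show", :id => 3).name # => :person
finder.best_route(:controller => "posts", :action => "edit", :id => 7).name  # => :default
```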

Conclusion

We’ve been using this plugin successfully in production for the last month. Both our environment startup time and our memory overhead dropped drastically as soon as we put it into production. We started this work when Rails 2.1 was the latest stable release, and benchmarks against Rails 2.1 put Merb routing way ahead in just about every metric we tested.

Routing in Rails 2.2 was sped up substantially and is now comparable to Merb (Rails wins in some benchmarks, Merb in others). BUT… Merb still blows Rails away in startup time, by 2-3x. We thought we could take out our merb-routing hacks and reduce code complexity, but after watching production restarts, we decided to put them back.

This plugin can be found on github: merb_routing

UPDATE: after talking to Carl Lerche, it sounds like the new router refactoring he is working on will support both syntaxes on the same codebase. That’ll be very cool.

Sorry, Mephisto… Moving to WordPress

Dear Mephisto, you’ve been great. We’ve been dating for a while, and I didn’t have any complaints, but I’m sorta sick of writing blog posts in your web UI. I heard there are plugins for 0.8 that add MetaWeblog API support, but after putzing around with Rails 2.0 vs 2.2, etc., I decided to break up with you. It’s really because I wanted to play with blogo. No hard feelings. Let’s be friends.

Hi WordPress. How are you?

We just ported our blog to WordPress. To import all our old content, I found m2wp.pl, which, after a few MySQL patches, does a great job of generating a WordPress export file that imports perfectly from the WP admin UI. It was missing a few features, like correct author attribution, so I made some tweaks and put it on GitHub if anyone is interested.

BTW, what’s blogo?

So far I like it.

One feature request for us techies: an easy TextMate “insert code here” feature.

UPDATE: don’t forget to cp your assets.


Modular routing in rails and merb (acts_as_routing)

Here’s a proof-of-concept plugin that monkeypatches Rails or Merb routing so you can define “acts as” route blocks anywhere in your application (e.g. in a plugin) and then use them in your routes file.
Imagine the plugin acts_as_commentable defines the following in its init.rb:

ActionController::Routing.routes_for_acts_as(:commentable) do |map|
  map.resources :comments
  map.best_comment '/best-comment', :controller => 'comments', :action => 'best'
end

If you then add these :acts_as options to your config/routes.rb:

ActionController::Routing::Routes.draw do |map|
  map.resources :people, :acts_as => [:commentable]
  map.resources :posts, :acts_as => [:commentable]
end

You could then use these routes throughout your application:

  <%=person_comments_path(Person.first)%>
  <%=post_comments_path(Post.first)%>
  <%=person_best_comment_path(Person.first)%>

This is equivalent to doing:

ActionController::Routing::Routes.draw do |map|
  map.resources :people do |people_map|
    people_map.resources :comments
    people_map.best_comment '/best-comment', :controller => 'comments', :action => 'best'
  end
  map.resources :posts do |posts_map|
    posts_map.resources :comments
    posts_map.best_comment '/best-comment', :controller => 'comments', :action => 'best'
  end
end
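One way to picture the expansion is a registry of named blocks that each `resources` call replays inside its nested scope. The sketch below is hypothetical and not the plugin’s implementation: ActsAsRoutes and RecordingMapper are made-up stand-ins, the recorder calls named_route directly rather than relying on method_missing, and its inflection table only knows “people”.

```ruby
# Hypothetical sketch, not the plugin's implementation: named route blocks
# are stored in a registry and replayed inside each resource's nested scope.
module ActsAsRoutes
  @blocks = {}

  def self.define(name, &block)
    @blocks[name] = block
  end

  # Run each registered block against the nested mapper.
  def self.apply(names, nested_map)
    Array(names).each { |name| @blocks.fetch(name).call(nested_map) }
  end
end

# Stand-in mapper that only records the route names it would create.
class RecordingMapper
  SINGULARS = { "people" => "person" } # tiny inflection table for the sketch

  attr_reader :route_names

  def initialize(prefix = nil)
    @prefix = prefix
    @route_names = []
  end

  def resources(name, options = {})
    @route_names << prefixed(name)
    member = SINGULARS.fetch(name.to_s) { name.to_s.chomp("s") }
    nested = RecordingMapper.new(member)
    ActsAsRoutes.apply(options[:acts_as], nested) if options[:acts_as]
    yield nested if block_given?
    @route_names.concat(nested.route_names)
  end

  def named_route(name, path, options = {})
    @route_names << prefixed(name)
  end

  private

  def prefixed(name)
    @prefix ? "#{@prefix}_#{name}" : name.to_s
  end
end

ActsAsRoutes.define(:commentable) do |map|
  map.resources :comments
  map.named_route :best_comment, "/best-comment", :controller => "comments", :action => "best"
end

map = RecordingMapper.new
map.resources :people, :acts_as => [:commentable]
map.resources :posts, :acts_as => [:commentable]
map.route_names
# => ["people", "person_comments", "person_best_comment",
#     "posts", "post_comments", "post_best_comment"]
```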

The plugin is available on github: http://github.com/hungrymachine/acts_as_routing/tree/master

P.S. At some point, I’ll submit patches to Rails and Merb so this functionality is native rather than provided via a monkeypatching plugin.

JRuby and why it might be nice to be back on the JVM - Part 2

In the previous post, I did some JRuby testing and noticed perf improvements over time.

Mark Imbriaco, of 37Signals, asked how it compared to MRI. I was curious too.

I can’t promise a “clean” comparison, and your mileage may vary, but… here’s MRI:

Completed in 552ms (View: 104, DB: 14) | 200 OK [http://someurl.com/people/3]
Completed in 345ms (View: 60, DB: 8) | 200 OK [http://someurl.com/people/3]
Completed in 346ms (View: 60, DB: 4) | 200 OK [http://someurl.com/people/3]
Completed in 347ms (View: 56, DB: 4) | 200 OK [http://someurl.com/people/3]
Completed in 349ms (View: 58, DB: 5) | 200 OK [http://someurl.com/people/3]

Some stats on my runtime:

$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)
$ ruby -v
ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin9.5.0]

Both ran on a 2.53GHz MacBook. Perhaps performance is better on Java 6?

I installed Java 6 and re-ran. Similar, but better, results:

Completed in 8815ms (View: 1632, DB: 429) | 200 OK [http://someurl.com/people/3]
Completed in 1017ms (View: 656, DB: 25) | 200 OK [http://someurl.com/people/3]
Completed in 998ms (View: 618, DB: 18) | 200 OK [http://someurl.com/people/3]
Completed in 1300ms (View: 437, DB: 35) | 200 OK [http://someurl.com/people/3]
Completed in 315ms (View: 119, DB: 21) | 200 OK [http://someurl.com/people/3]
Completed in 361ms (View: 140, DB: 20) | 200 OK [http://someurl.com/people/3]

Wow! That’s damn close to MRI… eventually. JRuby warmup time sucks, though. Maybe we could warm it up during deploy, before putting it back into the load balancer? I’m not exactly sure how to warm it up correctly, though.

If you just hit one URL on your app, you’ll only warm up THAT execution path… and you can’t hit every action, so… thoughts?
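For what it’s worth, the simplest version of the “warm it up during deploy” idea is a script that requests a list of representative URLs a few times and reports the timings. This is a hypothetical helper (warm_up, its arguments and the localhost port are all assumptions), and it shares the limitation raised above: only the listed paths get exercised.

```ruby
require "net/http"
require "uri"

# Hypothetical warmup helper (not something from the post): request each
# path several times before the instance rejoins the load balancer, so
# those code paths have already been exercised. Paths not listed stay cold.
def warm_up(base_url, paths, rounds = 6,
            fetch = lambda { |uri| Net::HTTP.get_response(uri) })
  timings = Hash.new { |hash, path| hash[path] = [] }
  rounds.times do
    paths.each do |path|
      started = Time.now
      fetch.call(URI.join(base_url, path))
      timings[path] << ((Time.now - started) * 1000).round # milliseconds
    end
  end
  timings
end

# Example (assumes the app is listening locally on port 8080):
#   warm_up("http://localhost:8080", ["/people/3"]).each do |path, ms|
#     puts "#{path}: #{ms.join(", ")} ms"
#   end
```

The fetch lambda is injectable mainly so the helper can be exercised without a running server.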

JRuby and why it might be nice to be back on the JVM

Yesterday we ported our Rails application to JRuby in order to do some GC comparisons. Not a ton of changes were required, mostly replacing gems that have native dependencies. YAML is a bit stricter than its MRI counterpart… but all told, that’s pretty amazing! Nice job, JRuby crew!

That said… here’s one interesting, entirely unscientific test case. After requesting the same URL 6 times, the VM got a lot faster: from 3.4s down to 485ms.

Completed in 3380ms (View: 2739, DB: 65) | 200 OK [http://someurl.com/people/3]
Completed in 1299ms (View: 1211, DB: 18) | 200 OK [http://someurl.com/people/3]
Completed in 1020ms (View: 949, DB: 20) | 200 OK [http://someurl.com/people/3]
Completed in 852ms (View: 732, DB: 18) | 200 OK [http://someurl.com/people/3]
Completed in 743ms (View: 628, DB: 31) | 200 OK [http://someurl.com/people/3]
Completed in 485ms (View: 374, DB: 14) | 200 OK [http://someurl.com/people/3]

It doesn’t appear to be the DB caching the data; the DB time stayed consistent.

I didn’t use any crazy runtime flags, just Jetty, a WAR file, and -server.

If it were something we were memcaching, the times would have been consistent after the 2nd request.

The amount of time spent in the view improved dramatically.

I didn’t think JRuby leveraged JIT’ing yet. Anyone care to guess (or explain) why each request got significantly faster?

Note: with further testing, it did level off in the 300ms range…

Are your mongrels growing by 20MB/request on Rails 2.2? Blame AssetTag!

After porting our production application to Rails 2.2, we noticed a major memory leak.

Beforehand, monit would restart instances a handful of times a day. After Rails 2.2, monit restarted instances THOUSANDS of times a day.

This is a graph of one of our haproxy instances a couple days ago.

We looked at everything, and even spent time rewriting routes, thinking that was the culprit.

This morning, we all sat around and fought the issue old-school style, with binary debugging… and found it: AssetTagHelper. See the patch here.

The new thread-safe asset tag code keeps a static cache, AssetTag::Cache = {}, of every asset tag created (CSS, JS, and all images).

Internally, each AssetTag object keeps a reference to the controller and template objects and, through them, to all the instance variables you created during the request.

What does that mean? Say you have a PeopleController that loads a person and their stuff, and you show images of their stuff via image_tag().

 class PeopleController < ApplicationController
   def show
     @person = Person.find(params[:id])
     # Up to 30 of this person's Stuff records (Rails 2.x finder syntax)
     @stuff = @person.stuff.find(:all, :limit => 30)
   end
 end

When image_tag() is called, it does some Rails magic to append file extensions, asset ids, and the like. To be smart, it caches those objects so it doesn’t hit the file system to figure all that out on every request. The problem is that it puts them in a static cache, AssetTag::Cache.

So each PeopleController instance holds a reference to 1 Person and 30 Stuff objects. After 1,000 people look at their pages (or, better yet, Google crawls your site), you’ve had 1,000 controllers holding a total of 1,000 Person objects and 1,000*30 = 30,000 Stuff objects. Normally this would be fine: the objects leave scope and get GC’ed. But if you generate an image tag for a unique asset, AssetTag puts it into a cache, AssetTag::Cache, along with a reference to that request’s @controller. So every Person and all their Stuff are kept around forever, unable to be GC’ed… every time a unique image is rendered via AssetTag. Eventually monit has to kill the process.

The patch we just submitted does 2 things.

1) It now caches only the modified path strings, which still saves the file access work. If you have tons of local images, reference them by fully qualified host. That’s better for lots of reasons. Cookie-less asset hosts with multiple subdomains FTW!

2) It stops caching absolute URL paths. You can’t do anything on the filesystem to verify them, and a cache of them would also grow unbounded. We have millions of items in our system, each with a reference to an image.
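To see why the second approach stops the growth, compare the two strategies side by side. This is a hypothetical sketch, not the actual Rails AssetTagHelper code or the submitted patch: LeakyAssetTag, AssetPathCache and the “.png unless there is a dot” rule are illustrative stand-ins for the real file system work.

```ruby
# Hypothetical sketch contrasting the two caching strategies; this is not
# the actual Rails AssetTagHelper code.

# Before: the cached object keeps the request's controller (and everything
# the controller references) reachable for the life of the process.
class LeakyAssetTag
  CACHE = {}

  attr_reader :controller, :source

  def self.fetch(controller, source)
    CACHE[source] ||= new(controller, source)
  end

  def initialize(controller, source)
    @controller = controller
    @source = source
  end
end

# After: cache only the computed path string, and never cache absolute
# URLs, since there is nothing on disk to check and the set is unbounded.
module AssetPathCache
  CACHE = {}

  def self.path_for(source)
    return source if source =~ %r{\A[a-z][a-z0-9+.-]*://}i
    CACHE[source] ||= compute_path(source)
  end

  # Stand-in for the file system work (extensions, asset ids) being cached.
  def self.compute_path(source)
    source.include?(".") ? "/images/#{source}" : "/images/#{source}.png"
  end
end

AssetPathCache.path_for("rails")                            # => "/images/rails.png"
AssetPathCache.path_for("http://assets0.example.com/a.jpg") # returned as-is, not cached
```

With the string cache, each entry costs one short string and no request objects; with the object cache, each unique source pins a whole controller graph.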

Here is a graph of that haproxy instance today… Sleeping…

To do some testing of your own, here’s a simplistic after_filter you can add to application.rb (or is it now application_controller.rb?). Make sure you run it in production mode or with cache_classes = true. As you click around your site, you should see that the cache retains references to controller instances, among other objects. After you apply the patch, you’ll see the cache holds just strings.

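  # Dump the contents of the AssetTag cache after every request.
  after_filter :assettag_cache

  private

  def assettag_cache
    cache = ActionView::Helpers::AssetTagHelper::AssetTag::Cache
    puts "-" * 80
    puts "[AssetTag::Cache] Now #{cache.size} items"
    cache.values.each do |asset_tag|
      if asset_tag.is_a?(ActionView::Helpers::AssetTagHelper::AssetTag)
        # Unpatched: each entry is an AssetTag holding a controller reference
        source     = asset_tag.instance_variable_get("@source")
        controller = asset_tag.instance_variable_get("@controller")
        puts "   [Asset] #{source}  #{controller.class.name}"
      else
        # Patched: entries are plain path strings
        puts "   [Asset] #{asset_tag}"
      end
    end
  end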

Oh… and we haven’t given up on routes… Warren is working on some very interesting enhancements to Rails routing. We’re looking forward to blogging about that soon.