How to save 100m of RAM per mongrel (Part 2)

Posted by aaron
on Monday, November 10

UPDATE: I've added a patch to Rails Edge for this fix, which is much different from the patch below. See here

In a previous article, I called out the massive memory usage of the default rails resource behavior, and it seems others have as well. In an attempt to decrease the number of routes, I commented out the "formatted_*" routes, and manually entered them back by hand.

But after some internal discussion and testing with Warren, we realized that was sloppy and error-prone. Instead, I hacked Routing segments to allow for an optional format segment, so that formatted routes and normal routes are shared. The one downside, from what I can tell so far, is that you lose the "formatted_*" named routes and URL helpers for those routes, but passing format to url_for still works:

   person_path(:id => 1, :format => "json") =>  /people/1.json

In the gist below, you'll see an OptionalFormatSegment, which sneakily gets around the '.' regex separator and removes the formatted_* named routes added by default. It should give the same result as the solution in the previous post, but without the need to manually put all the routes back in.
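To illustrate the idea only (this is not the patch itself), one pattern with an optional trailing format can recognize both forms of a path, which is why a second "formatted" route becomes unnecessary. The PERSON_ROUTE regex and the recognize helper are hypothetical names made up for this sketch, and it needs Ruby 1.9 or later for named groups.

```ruby
# Hypothetical illustration: one pattern with an optional ".format" suffix
# matches both /people/1 and /people/1.json.
PERSON_ROUTE = %r{\A/people/(?<id>[^/.]+)(?:\.(?<format>[^/.]+))?\z}

# Returns the recognized parameters, or nil when the path does not match.
def recognize(path)
  match = PERSON_ROUTE.match(path)
  return nil unless match
  { :id => match[:id], :format => match[:format] }
end

recognize("/people/1")      # :id is "1", :format is nil
recognize("/people/1.json") # :id is "1", :format is "json"
```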

I'm still testing this approach, but am interested in what other folks think.

Note: This monkey-patch only works on Rails 2.2

Hiring Experienced Web Developers at LivingSocial!

Posted by aaron
on Friday, October 24

We are excited to hire up to 2 experienced web developers in the coming month.

Who are you?

  • You've built and launched web applications that you're proud of and people love.
  • You have at least 5 years of serious web development experience.
  • You're pragmatic. You simply get things done.
  • Ruby experience is preferred, but not required.

Who are we?

  • We are LivingSocial (http://livingsocial.com), a social cataloging and discovery platform.
  • We rapidly build software for over 7 million people, serving tens of millions of page views/month.
  • Our tech team is small. We believe you can do more with less of the right people.
  • Our book vertical receives more reviews per day than Amazon.
  • Our users have cataloged/rated/reviewed almost 100M entities.
  • We are located in Washington, DC. (Chinatown)

What we use:

  • Ruby, Rails, Merb, MySQL, CSS, Javascript
  • ObjectiveC, iPhone Apps
  • Facebook/OpenSocial APIs
  • Whatever technology is best for the job

Perks:

  • Competitive salary, awesome Health/Dental Insurance, matching 401k, stock options.
  • MacBook Pro, 30" monitor, or your choice.
  • Open Source software rules. We contribute, so should you.
  • We all attend and speak at conferences.
  • Flexible work hours.
  • Relocation available for the right candidate.

Interested?

jobs at livingsocial dot com





MRI Ruby + MySQL + Threads == Stop the world... JRuby doesn't

Posted by aaron
on Wednesday, August 27

As we have been internally discussing how to scale our databases from tens of millions of rows to hundreds of millions, database sharding came up.

Depending on your data model and your application, sharding data into tables by some natural key is great if any given request uses only one shard. FiveRuns' DataFabric seems to help with that. It's obviously best to shard the data along the most common access path, but occasionally you'll need to write broadcast queries across shards. It'd be even better if those broadcast queries were executed concurrently. Well, apparently, it's not that simple in MRI Ruby.

The mysql (2.7) gem stops the world while executing a query. A sloppy but reproducible test is here. This is not a mutex lock on the re-use of a single connection object. Eddie also reached out to the Sequel maintainer, who agrees it is likely due "to the fact that the C drivers don't release the interpreter lock while they wait for a response from the server." JRuby, or more accurately JDBC, acts as expected. We even tested DataMapper's DataObjects::MySQL, as it appears they've re-implemented the mysql gem. Unfortunately it suffers from the same stop-the-world issue.

$ ruby mysql_locking_test.rb 
Loading MRI MySQL
Serial: 4.00969815254211s
Multi-threaded: 4.00785183906555s

$ data_mapper=true ruby mysql_locking_test.rb 
Loading DataMapper
Serial: 4.01330804824829s
Multi-threaded: 4.01132893562317s

$ jruby mysql_locking_test.rb 
Loading JDBC-MySQL
Serial: 4.2802369594573975s
Multi-threaded: 1.0499329566955566s

Only in JRuby was the multi-threaded != sequential time, as expected.

Potentially unrelated, no one has touched the mysql gem in 3 years?!?!

$ gem spec -v 2.7 mysql | grep date
date: 2005-10-09 00:00:00 +00:00

So even Ruby ORM frameworks (Sequel, DataMapper) that say they're thread-safe, are not concurrent on MRI... at least for mysql. For folks not using Rails, which already has a mutex lock higher in the stack, this must be a performance issue. For example, Merb + DataMapper + MySQL. If there is a 2s SQL query, all threads in that process stop for 2s.

Can others verify? "select sleep(2) from dual;" is a great way to test for this.
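For anyone who wants to try this, here is a minimal sketch of a timing harness. It is not the linked test script. The helper names time_serial and time_threaded are made up, and fake_query is a stand-in that only sleeps. To test a real driver, replace it with a query such as "select sleep(1) from dual" issued on a fresh connection per thread.

```ruby
require 'benchmark'

# Runs the block `count` times one after another and returns elapsed seconds.
def time_serial(count)
  Benchmark.realtime { count.times { |i| yield i } }
end

# Runs the block in `count` threads at once and returns elapsed seconds.
def time_threaded(count, &work)
  Benchmark.realtime do
    threads = (0...count).map { |i| Thread.new(i) { |n| work.call(n) } }
    threads.each(&:join)
  end
end

# Stand-in for a blocking call (an assumption for this sketch, not a real query).
fake_query = lambda { |_i| sleep 0.2 }

serial   = time_serial(4, &fake_query)
threaded = time_threaded(4, &fake_query)
puts "Serial: #{serial}s"
puts "Multi-threaded: #{threaded}s"
```

Kernel#sleep lets other Ruby threads run, so with the stand-in the threaded time should come out well below the serial time. A driver that blocks the interpreter would instead show the two numbers roughly equal, as in the MRI runs above.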

UPDATE: Several people have asked, so to clarify: the sample code here creates a new connection PER thread. The mysql docs state: "Two threads can't send a query to the MySQL server at the same time on the same connection", but the test creates a new connection object per thread, so that should not apply.

UPDATE 2: A couple of folks have mentioned async drivers, one for mysql and one for postgres, but I believe they're based on NeverBlock, which is Ruby 1.9 only. That sounds like awesome progress. What's the realistic ETA for folks running 1.9 in a production environment? At least until Christmas for an official 1.9 release?

UPDATE 3: Looks like lots is happening here. There now is a mysql driver that supports async/threaded operations on 1.8! See the NeverBlock MySQL project. Looking forward to testing this in a production environment.

Cleaning up old releases

Posted by val
on Wednesday, June 04

Instead of relying on capistrano to clean up old releases, we have a cron job that keeps only the releases from the last two days (but always at least the three latest).

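#!/usr/bin/env ruby

require 'fileutils'

KEEP_RELEASES = 3           # always keep at least this many latest releases
KEEP_DAYS = 2               # keep every release newer than this many days
EXCLUDE_APPS = %w(uploadr)  # apps this job never touches

# Release directories are named by timestamp (YYYYMMDDHHMMSS),
# so the names can be compared as integers.
cut_time = (Time.now.utc - KEEP_DAYS*24*60*60).strftime("%Y%m%d%H%M%S").to_i

Dir['/u/apps/*'].each do |app|
  next if EXCLUDE_APPS.include?(File.basename(app))
  dirs = Dir["#{app}/releases/*"]
  fresh = dirs.select { |dir| File.basename(dir).to_i > cut_time }
  latest = dirs.sort.last(KEEP_RELEASES)

  # Remove everything that is neither recent nor among the latest releases
  (dirs - fresh - latest).each do |dir|
    FileUtils.rm_rf dir
  end
end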
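The selection rule can be checked without touching the file system. The following is a sketch, assuming the same timestamp-named directories. The stale_releases helper and the "demo" application path are hypothetical and exist only for this example.

```ruby
# Hypothetical helper: returns the release directories that are neither
# newer than cut_time nor among the `keep` most recent.
def stale_releases(dirs, cut_time, keep)
  fresh  = dirs.select { |dir| File.basename(dir).to_i > cut_time }
  latest = dirs.sort.last(keep)
  dirs - fresh - latest
end

dirs = %w(
  /u/apps/demo/releases/20080601120000
  /u/apps/demo/releases/20080602120000
  /u/apps/demo/releases/20080603120000
  /u/apps/demo/releases/20080604120000
)

# With a cutoff of midnight on June 3 and three releases kept,
# only the June 1 release is selected for removal.
stale_releases(dirs, 20080603000000, 3)
```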

The End of Slideshows: Animoto

Posted by aaron
on Tuesday, May 13

UPDATE: Animoto just raised a round of investment from Amazon! Congrats Guys!

Animoto is a great idea. They take your photos and create a production-quality video set to the music of your choice. It's the end of those boring slide shows, for good.


(From a recent Techcrunch article here)

We had the pleasure to work with the Animoto guys to launch their Facebook application, "Animoto Videos", which leveraged all of the existing photos on Facebook. The growth was amazing.


(From a recent AllFacebook article here)

Scaling an application from a few hundred users to over a million in just a few days isn't easy, but we had a great team. Their backend rendering farm lived in Amazon's Cloud, and the growth was so impressive that Jeff Bezos even spoke about them at Y Combinator’s Startup School just a few weeks ago. From 50 EC2 instances to over 4k in only a few days. See the video below.


It was a pleasure working with the entire team from Animoto, RightScale, and Amazon. See their blog posts about the application here, here, and here.

I'm sure I'll cross paths with many of you at RailsConf. First round of beers is on me.

Curb your Net::HTTP

Posted by aaron
on Tuesday, April 08
Curb is a Ruby binding for libcurl. We've had sporadic issues with Net::HTTP, which this might alleviate via native DNS, native timeouts, performance improvements, etc. It wouldn't be hard to re-implement ActiveResource, rfacebook, myspace-ruby, etc. to use it instead. Anyone using this already?
sudo gem install curb
require 'rubygems'
require 'curb'
require "net/http"
require 'benchmark'

iterations = 40
Benchmark.bm do |x|
  x.report("curb") do
    iterations.times do
      c = Curl::Easy.perform("http://www.google.com")
      #puts c.body_str
    end
  end
  x.report("net/http")  do
    iterations.times do
      http = Net::HTTP.start("www.google.com")
      req = Net::HTTP::Get.new("/")
      res = http.request(req)
      #puts res.body
    end
  end
end
             user     system      total        real
curb      0.010000   0.030000   0.040000 (  4.019197)
net/http  0.140000   0.110000   0.250000 (  4.155106)

Aggressive Timeouts On External API Calls

Posted by val
on Sunday, March 30

One of the challenges with writing a Facebook or Bebo application is staying within the time limit the platform gives you to respond with data before it shows the Application Did Not Respond page to the user. A content-rich application that calls external APIs, like Amazon or YouTube, with response times beyond your control has to keep such calls short to leave extra time for processing. We usually wrap them in aggressive timeouts with a retry. As an example, here is code from the Ruby Amazon E-Commerce REST Service API gem, rewritten to limit a single call attempt to two seconds with one more retry.

Original Code
module Amazon  
  class Ecs

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = Net::HTTP.get_response(URI::parse(request_url))
      unless res.kind_of? Net::HTTPSuccess
        raise Amazon::RequestError, "HTTP Response: #{res.code} #{res.message}"
      end
      Response.new(res.body)
    end

  end
end
Modified Code
module Amazon
  class Ecs

    # Stand-in returned when Amazon does not answer successfully in time
    class EmptyResponse
      def items; []; end
      def total_pages; 0; end
    end

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = timed_try(request_url, 2) do |url|

        uri = URI::parse(url)
        req = Net::HTTP.new(uri.host, uri.port)

        # Aggressive timeouts
        req.open_timeout = 1
        req.read_timeout = 2

        req.start { |http| http.request_get(url) }

      end

      # res is nil when every attempt timed out
      res.kind_of?(Net::HTTPSuccess) ? Response.new(res.body) : EmptyResponse.new

    end

    # Defined on the class because it is called from self.send_request
    def self.timed_try(url, attempts, &block)

      attempt = 1
      begin
        block.call(url)
      rescue Timeout::Error
        if attempt >= attempts
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] gave up after attempt ##{ attempt } to get data from #{ url }"
          nil
        else
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] attempt ##{ attempt } timed out on getting data from #{ url }"
          attempt += 1
          retry
        end
      end

    end
    private_class_method :timed_try

  end
end
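The retry logic is not specific to Amazon. Below is a stripped-down sketch of the same pattern that runs without Rails or a network connection. The with_timeout_retries name is hypothetical, and the block simulates a slow API by raising Timeout::Error on the first attempt.

```ruby
require 'timeout'

# Hypothetical generic helper: calls the block up to `attempts` times,
# retrying only on Timeout::Error. Returns nil if every attempt times out.
def with_timeout_retries(attempts)
  attempt = 1
  begin
    yield attempt
  rescue Timeout::Error
    return nil if attempt >= attempts
    attempt += 1
    retry
  end
end

calls = 0
result = with_timeout_retries(2) do |n|
  calls += 1
  raise Timeout::Error, "simulated slow API" if n == 1
  "ok"
end
```

Returning nil instead of raising lets the caller substitute an empty result, as EmptyResponse does above, so a slow third party degrades the page rather than breaking it.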

Reviewing Application Health with HAProxy Stats

Posted by val
on Thursday, March 27
One of the methods we use for checking the health of our applications is stats collected from HAProxy. We utilize it to see how many requests are scheduled for execution on mongrel instances. The graph is one indication of how our applications perform. When we launched the new version of the site three weeks ago, the graph for a single vertical (ReadingSocial) on a typical Tuesday looked like this:
So, between porting all verticals to Myspace, Orkut, and Bebo and enhancing the functionality, we spent some time on optimization. In addition to analyzing slow-query logs with mysqlsla, Aaron wrapped all external API calls (and we make a lot of them: to Amazon, Facebook, Myspace, etc.) in slow-call monitoring so we could see where the latest external bottleneck was and fix them one by one. Three weeks later the graph became much more peaceful:

Reconfiguring the whole rails stack via a central YAML file

Posted by val
on Sunday, August 19

The challenge with hosting multiple Rails-based Facebook applications is that the number of users grows quickly. To address this problem we are using EC2 nodes that we can expand or shrink as demand changes. The price/performance ratio isn’t quite what we first expected, so we are moving toward having a few dedicated boxes instead. Another problem is that we add at least a couple of applications a week. On each box that hosts them, we need to reconfigure monit, haproxy, nginx, logrotate and nagios.

To mitigate both issues on dedicated boxes, we resolved to keep a central configuration definition in svn, with individual box configurations keyed on the local host name. When run on a box, a Ruby script regenerates all of the aforementioned configuration files from ERB-processed templates and bounces the services. A sample config looks like:
dedicated-1:

    description: "The dedicated box #1"
    ip: 64.233.167.99
    failover: dedicated-2

    apps:

        bookshelf:
            port: 5000
            instances: 20
            response: Book

        ljconnect:
            port: 6000
            instances: 7
            virtual: ljconnect.hungrymachine.com
            response: Journal  
                      

That definition would generate a monit config with 20 instances of the bookshelf application and 7 instances of the ljconnect application, plus all other configurations (including nagios health checks expecting the response value). It is all possible because we adopt a fixed application deployment file structure and port numbering conventions (via offsets) for all servers.
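As a rough sketch of the generation step, the snippet below expands a definition of the same shape into one line per instance using ERB. It is not our actual script. The output format is made up rather than real monit syntax, the monit_lines helper is hypothetical, and "base port plus instance index" is an assumed reading of the offset convention.

```ruby
require 'yaml'
require 'erb'

# Small definition in the same shape as the sample config (2 instances for brevity).
CONFIG = YAML.load(<<-YML)
dedicated-1:
  apps:
    bookshelf:
      port: 5000
      instances: 2
YML

# Hypothetical one-line template; real monit stanzas carry much more detail.
LINE = ERB.new("check process <%= name %>_<%= port %> port <%= port %>")

# Expands every app on a box into one rendered line per instance.
def monit_lines(box_config)
  box_config["apps"].flat_map do |name, app|
    (0...app["instances"]).map do |offset|
      port = app["port"] + offset # assumed convention: base port plus offset
      LINE.result(binding)
    end
  end
end

puts monit_lines(CONFIG["dedicated-1"])
```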

Using mocks at the early stage of FB app development

Posted by val
on Wednesday, August 15

Developing applications for Facebook is a pain. The tunnel approach helps a lot to ease that pain but even then I prefer to start a FB app as a regular application, polish the logic, and then convert it to the Facebook one by adding FBML and such. At the early stages of the development I have the mocked parameter in config/facebook.yml set to true and keep this code in config/initializers/facebook.rb:

PERSON_PROFILE_URL = "http://www.facebook.com/profile.php"

FACEBOOK_CONFIG = YAML.load_file("#{RAILS_ROOT}/config/facebook.yml")[RAILS_ENV] || {}

if FACEBOOK_CONFIG['mocked']

  class Facebook::FBMLController

    require 'ostruct'
    FB_SESSION = OpenStruct.new(:session_user_id => 1, :session_key => "12345", :is_valid? => true)

    def fbsession; FB_SESSION; end

    def require_facebook_install; true; end

    def redirect_to(url); super; end

    def url_for(*params); super; end

  end

  module Facebook::Acts::FbUser
    module InstanceMethods
      def friends
        (self.class.find(:all) - [ self ]).collect(&:uid)
      end
    end
  end 

end

It mocks out just enough of the Facebook on Rails functionality to use FBMLController and acts_as_fb_user from the beginning without a Facebook backend.

Routing to the initial action after facebook application install

Posted by val
on Tuesday, August 14

Some Facebook applications might have multiple entry points. For example, a user might be adding an application (action – new) or replying to an invitation (action – reply, param – id). Since the UI for Facebook application configuration only allows a static Post-Add URL, it might seem like there is no way to route users back to the action they originally tried to reach before the application was installed for them. Luckily, we have full control over the destination via the next parameter of the post-install URL. All we need is to build a URL from the incoming call parameters, excluding the Facebook-specific ones.

This is an example for Facebook on Rails based code that might go to the application controller:

class ApplicationController < Facebook::FBMLController

protected

  before_filter :require_facebook_install 

  def require_facebook_install    
    if in_canvas? && !fbsession.is_valid?
      redirect_to fbsession.get_install_url(:next => url_for(post_install_params))
      false
    end
  end

  def post_install_params
    params.merge(:init => true).delete_if { |k, v| k.starts_with?('fb_sig') }
  end

end

Notice that the code sets the init parameter so it can be used to identify a post-install call.
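The filtering step can be tried outside a controller. This plain-Ruby sketch takes the parameters as an ordinary Hash argument instead of the controller's params, and the incoming values are invented for the example.

```ruby
# Plain-Ruby sketch of the filtering step: add the init marker and drop
# every Facebook-specific key (those starting with "fb_sig").
def post_install_params(params)
  params.merge("init" => true).reject { |key, _| key.to_s.start_with?("fb_sig") }
end

# Invented sample of incoming call parameters
incoming = {
  "action"      => "reply",
  "id"          => "42",
  "fb_sig"      => "abc",
  "fb_sig_user" => "1"
}

post_install_params(incoming)
```

Only action, id, and the added init marker survive, so the URL built from them sends the user back to the reply action once the install completes.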

Identifying users who just installed your facebook apps

Posted by val
on Tuesday, August 14

Sometimes it is useful to act on a Facebook user right after the user has installed your application. For example, you might want to push some default FBML to the user’s profile in case they do not complete the action you expect after installation. The Facebook application configuration lets you provide a Post-Add URL that users are routed to after the install. It could be a dedicated post_add action. Alternatively, when the default action already has controller code you want to run, and because Facebook limits the number of redirects you can use, it could be a parameter on the URL, like &init=true, that identifies the call as a post-install action so the controller can act on it.