MRI Ruby + MySQL + Threads == Stop the world... JRuby doesn't

Posted by aaron
on Wednesday, August 27

While we were discussing internally how to scale our databases from tens of millions of rows to hundreds of millions, database sharding came up.

Depending on your data model and your application, sharding data into tables by some natural key is great if any given request uses only one shard. FiveRuns' DataFabric seems to help with that. It's obviously best to shard the data along the most common access path, but occasionally you'll need to write broadcast queries across shards. It'd be even better if those broadcast queries were executed concurrently. Well, apparently, it's not that simple in MRI Ruby.
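The broadcast we have in mind is a scatter-gather along these lines. This is only a sketch: the shards are hypothetical in-memory arrays rather than DataFabric connections, and the `broadcast` helper is a name made up for illustration. Whether the threads really overlap depends on the driver, which is what the rest of this post is about.

```ruby
# Hypothetical shards: plain arrays standing in for per-shard tables.
SHARDS = {
  "shard_a" => [{ id: 1, name: "ann" }, { id: 2, name: "bob" }],
  "shard_b" => [{ id: 3, name: "cat" }],
  "shard_c" => [{ id: 4, name: "ann" }]
}

# Run the block once per shard, each in its own thread, and merge the rows.
def broadcast(shards)
  threads = shards.map do |name, rows|
    Thread.new { yield(name, rows) }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.flat_map(&:value)
end

matches = broadcast(SHARDS) do |_name, rows|
  rows.select { |row| row[:name] == "ann" }
end
puts matches.map { |row| row[:id] }.inspect
```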

The mysql (2.7) gem stops the world while executing a query. A sloppy but reproducible test is here. This is not a mutex lock on the re-use of a single connection object. Eddie also reached out to the Sequel maintainer, who agrees it is likely due "to the fact that the C drivers don't release the interpreter lock while they wait for a response from the server." JRuby, or more accurately JDBC, acts as expected. We even tested DataMapper's DataObjects::MySQL, as it appears they've re-implemented the mysql gem. Unfortunately, it suffers from the same stop-the-world issue.

$ ruby mysql_locking_test.rb 
Loading MRI MySQL
Serial: 4.00969815254211s
Multi-threaded: 4.00785183906555s

$ data_mapper=true ruby mysql_locking_test.rb 
Loading DataMapper
Serial: 4.01330804824829s
Multi-threaded: 4.01132893562317s

$ jruby mysql_locking_test.rb 
Loading JDBC-MySQL
Serial: 4.2802369594573975s
Multi-threaded: 1.0499329566955566s

Only under JRuby was the multi-threaded run faster than the serial one, which is what we expected everywhere.

Potentially unrelated, no one has touched the mysql gem in 3 years?!?!

$ gem spec -v 2.7 mysql | grep date
date: 2005-10-09 00:00:00 +00:00

So even Ruby ORM frameworks (Sequel, DataMapper) that say they're thread-safe are not concurrent on MRI... at least for mysql. For folks not using Rails, which already has a mutex lock higher in the stack, this must be a performance issue. Take Merb + DataMapper + MySQL, for example: if there is a 2s SQL query, all threads in that process stop for 2s.

Can others verify? "select sleep(2) from dual;" is a great way to test for this.
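If you want to try it yourself, a timing harness can be as small as the sketch below. This is not the original test script; `compare_serial_and_threaded` is a made-up helper, and `sleep` stands in for "open a new connection and run the sleep query". Kernel#sleep does release the interpreter lock, so as written it shows the behavior we expected from the driver; swap the block body for a real query to test your own stack.

```ruby
require 'benchmark'

# Time `count` runs of the block serially, then one run per thread.
def compare_serial_and_threaded(count)
  serial = Benchmark.realtime { count.times { |i| yield(i) } }
  threaded = Benchmark.realtime do
    count.times.map { |i| Thread.new { yield(i) } }.each(&:join)
  end
  [serial, threaded]
end

# Assumption: sleep is a placeholder for a per-thread connection + slow query.
serial, threaded = compare_serial_and_threaded(4) { |_i| sleep 0.2 }
puts "Serial: #{serial}s"
puts "Multi-threaded: #{threaded}s"
```

If the two numbers come out roughly equal once a real query is in the block, the driver is holding the interpreter lock.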

UPDATE: Several people have asked, so to clarify: the sample code here creates a new connection PER thread. The mysql docs state: "Two threads can't send a query to the MySQL server at the same time on the same connection", but the test creates a new connection object per thread, so that should not apply.

UPDATE 2: A couple of folks have mentioned async drivers, one for mysql and one for postgres, but I believe they're based on NeverBlock, which is Ruby 1.9 only. That sounds like awesome progress. What's the realistic ETA for folks running 1.9 in a production environment? At least until Christmas for a 1.9 official release?

UPDATE 3: Looks like lots is happening here. There is now a mysql driver that supports async/threaded operations on 1.8! See the NeverBlock MySQL project. Looking forward to testing this in a production environment.

Using helpers in a controller: with_helpers

Posted by warren
on Thursday, August 07

Although views and controllers are supposed to be cleanly separated, helper functions blur the line a little, and there are situations where it'd be nice to simply use a few helpers from inside an action.

ActionController::Base.class_eval do
  def with_helpers(&block)
    template = ActionView::Base.new([],{},self)
    template.extend self.class.master_helper_module
    add_variables_to_assigns
    template.assigns = @assigns
    template.send(:assign_variables_from_controller)
    forget_variables_added_to_assigns
    template.instance_eval(&block)
  end
end

Here is what it looks like in a controller. Although this is a nonsensical example, it shows how you can use a helper (e.g. link_to) in an action. It also shows that the instance variables (e.g. @person) set in the action are available in the with_helpers block... the same way that instance variables are available in views.

class MyController < ApplicationController
  def my_action
    @person = Person.find(params[:id])
    render :text => with_helpers { link_to(@person.full_name, person_path(@person)) }
  end
end
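Stripped of the Rails internals, the trick is just "extend a throwaway object with the helper module, copy the instance variables over, and instance_eval the block". Here is a plain-Ruby sketch of that idea; `LinkHelpers` and `FakeController` are invented for illustration and are not Rails classes, and the toy link_to is much simpler than the real one.

```ruby
# Hypothetical helper module standing in for the controller's helpers.
module LinkHelpers
  def link_to(text, href)
    %(<a href="#{href}">#{text}</a>)
  end
end

# Hypothetical controller: no Rails involved.
class FakeController
  def initialize
    @person_name = "Ada"
  end

  def with_helpers(&block)
    context = Object.new
    context.extend(LinkHelpers)
    # Copy our instance variables so the block sees them, as a view would.
    instance_variables.each do |ivar|
      context.instance_variable_set(ivar, instance_variable_get(ivar))
    end
    context.instance_eval(&block)
  end

  def show
    with_helpers { link_to(@person_name, "/people/1") }
  end
end

puts FakeController.new.show
```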

Breaking the Rails static asset timestamp cache in development mode

Posted by warren
on Saturday, June 21

Rails automatically adds the File.mtime to static assets when using stylesheet_link_tag and javascript_include_tag. The file's mtime is cached to prevent excessive file system access... even in development mode. This is problematic in a Facebook canvas during development mode because often you won't immediately see the changes you make to your stylesheets and javascripts.

Here is a monkey patch you can throw into a config/initializers/break_asset_cache_in_dev_mode.rb to fix this:

if RAILS_ENV == 'development'
  require 'action_controller/dispatcher'
  ActionController::Dispatcher.before_dispatch do
    ActionView::Base.computed_public_paths.clear
  end
end
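To see why a cached mtime goes stale, and why clearing the cache before each request fixes it, here is a small stand-alone sketch. `ASSET_IDS` and `asset_id` are hypothetical stand-ins for the cache Rails keeps, not the Rails implementation itself.

```ruby
require 'tempfile'

# Hypothetical per-path cache, standing in for what Rails keeps internally.
ASSET_IDS = {}

def asset_id(path)
  ASSET_IDS[path] ||= File.mtime(path).to_i.to_s
end

file = Tempfile.new(['style', '.css'])
first = asset_id(file.path)

# Simulate editing the stylesheet an hour later.
later = Time.now + 3600
File.utime(later, later, file.path)

stale = asset_id(file.path)   # still the old timestamp: served from the cache
ASSET_IDS.clear               # roughly what the before_dispatch hook does
fresh = asset_id(file.path)   # re-read from disk

puts "stale: #{stale}, fresh: #{fresh}"
```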

Cleaning up old releases

Posted by val
on Wednesday, June 04

Instead of relying on Capistrano to clean up old releases, we have a cron job that keeps only the releases from the last two days (but always at least the three latest).

#!/usr/bin/env ruby

require 'fileutils'

KEEP_RELEASES = 3
KEEP_DAYS = 2
EXCLUDE_APPS = %W(uploadr)

cut_time = (Time.now.utc - KEEP_DAYS*24*60*60).strftime("%Y%m%d%H%M%S").to_i

Dir['/u/apps/*'].each do |app|
  next if EXCLUDE_APPS.include?(File.basename(app))
  dirs = Dir["#{ app }/releases/*"]
  fresh = dirs.select { |dir| (dir.split('/').last).to_i > cut_time }
  latest = dirs.sort.last(KEEP_RELEASES )

  (dirs - fresh - latest).each do |dir|
    FileUtils.rm_rf dir
  end
end
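The selection logic is easier to follow on sample data. The sketch below pulls it into a function (`stale_releases` is a name invented here) and runs it on made-up release paths without touching the file system. Only one release is newer than the cut-off, yet the three latest survive.

```ruby
# Same set arithmetic as the cron script, without the rm_rf.
def stale_releases(dirs, cut_time, keep = 3)
  fresh  = dirs.select { |dir| File.basename(dir).to_i > cut_time }
  latest = dirs.sort.last(keep)
  dirs - fresh - latest
end

# Hypothetical release directories, named by timestamp as Capistrano does.
dirs = %w(
  /u/apps/demo/releases/20080601120000
  /u/apps/demo/releases/20080602120000
  /u/apps/demo/releases/20080603120000
  /u/apps/demo/releases/20080604090000
  /u/apps/demo/releases/20080604120000
)
cut_time = 20080604100000

puts stale_releases(dirs, cut_time).inspect
```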

Rake task for syntax checking a Ruby on Rails project

Posted by warren
on Wednesday, June 04

Here's a quick rake file that will crawl through your Rails project and syntax check Ruby, ERB, and YAML files. You should always run this before doing a "cap deploy", and it doesn't even hurt to run it before an "svn commit".

require 'erb'
require 'open3'
require 'yaml'

task :check_syntax => [:check_ruby, :check_erb, :check_yaml]

task :check_erb do
  (Dir["**/*.erb"] + Dir["**/*.rhtml"]).each do |file|
    next if file.match("vendor/rails")
    Open3.popen3('ruby -c') do |stdin, stdout, stderr|
      stdin.puts(ERB.new(File.read(file), nil, '-').src)
      stdin.close
      if error = ((stderr.readline rescue false))
        puts file + error[1..-1]
      end
      stdout.close rescue false
      stderr.close rescue false
    end
  end
end

task :check_ruby do
  Dir['**/*.rb'].each do |file|
    next if file.match("vendor/rails")
    next if file.match("vendor/plugins/.*/generators/.*/templates")
    Open3.popen3("ruby -c #{file}") do |stdin, stdout, stderr|
      if error = ((stderr.readline rescue false))
        puts error
      end
      stdin.close rescue false
      stdout.close rescue false
      stderr.close rescue false
    end
  end
end

task :check_yaml do
  Dir['**/*.yml'].each do |file|
    next if file.match("vendor/rails")
    begin
      YAML.load_file(file)
    rescue => e
      puts "#{file}:#{(e.message.match(/on line (\d+)/)[1] + ':') rescue nil} #{e.message}"
    end
  end
end

And here's what the output looks like:

warren:tmp_project$ rake check_syntax
(in /Users/warren/tmp_project)
app/controllers/application.rb:37: syntax error, unexpected '='
app/views/people/home.html.erb:60: syntax error, unexpected '<', expecting $end
vendor/plugins/will_paginate/test/fixtures/users.yml:13: syntax error on line 13, col 0: `dev_<%= digit %>:'

This could probably be expanded to support lots of other types of files as well (CSS, JavaScript, etc).

Updates:

  • Exclude all files in vendor/rails (thanks szeryf)
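As a variation on the tasks above, the same checks can be done in-process instead of spawning "ruby -c" per file. This is a sketch of a different technique, not what the rake file does: it assumes MRI (RubyVM is not available on JRuby) and a Ruby recent enough for ERB's trim_mode keyword, and both method names are made up.

```ruby
require 'erb'

# Returns nil when the source parses, or the parser's message when it doesn't.
def ruby_syntax_error(source, label = "(source)")
  RubyVM::InstructionSequence.compile(source, label)
  nil
rescue SyntaxError => e
  e.message
end

# ERB templates are compiled to Ruby source first, then checked the same way.
def erb_syntax_error(template, label = "(template)")
  ruby_syntax_error(ERB.new(template, trim_mode: '-').src, label)
end

puts ruby_syntax_error("x = 1 + 2").inspect
puts erb_syntax_error("<% if true %>missing end") ? "template has a syntax error" : "template ok"
```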

ESI & Mongrel-ESI.. Request for Feedback

Posted by aaron
on Wednesday, June 04


The Railsconf08 talk on ESI & Rails has sparked some interest in the community, and Todd, the core mongrel-esi maintainer, is asking for feedback on the mongrel-esi mailing list.

The latest rumor is he is working on a nginx port of mongrel-esi, which I have to admit sounds very interesting...

How are people planning to use ESI? Can anyone provide Todd with some feedback?

  I'm wondering what is the main set of features preventing folks
today from using mongrel-esi in production?  Is it :

* performance
* documentation
* stability

If it's performance, can someone do some load testing maybe provide
some numbers and even some target numbers?  I am working on an
improved concurrency model, that should help improve page performance
for pages with lots of esi:include tags

if it's documentation, what's the missing bits, and can any of you
help out by filling in the gaps?

if it's stability, can you provide some samples that fail?

-Todd

RailsConf 2008 Presentation - Assembling Pages Last

Posted by aaron
on Saturday, May 31


I presented this morning at RailsConf about Edge Caching and ESI. You can download my presentation below.

Assembling Pages Last: Edge Caching, ESI, and Rails

People asked some awesome questions and caught the edge cases I didn't want to cover because they were complicated, like authorization/security, multi-level ESI includes, per-page include limits, etc. The people at RailsConf are awesome. That's totally why I enjoy these events.

The End of Slideshows: Animoto

Posted by aaron
on Tuesday, May 13

UPDATE: Animoto just raised a round of investment from Amazon! Congrats Guys!

Animoto is a great idea. They take your photos and create a production-quality video set to the music of your choice. It's the end of those boring slide shows, for good.


(From a recent Techcrunch article here)

We had the pleasure to work with the Animoto guys to launch their Facebook application, "Animoto Videos", which leveraged all of the existing photos on Facebook. The growth was amazing.


(From a recent AllFacebook article here)

Scaling an application from a few hundred users to over a million in just a few days isn't easy, but we had a great team. Their backend rendering farm lived in Amazon's Cloud, and the growth was so impressive that Jeff Bezos even spoke about them at Y Combinator’s Startup School just a few weeks ago. From 50 EC2 instances to over 4k in only a few days. See the video below.


It was a pleasure working with the entire team from Animoto, RightScale, and Amazon. See their blog posts about the application here, here, and here.

I'm sure I'll cross paths with many of you at RailsConf. First round of beers is on me.

Advanced Rails Recipes

Posted by aaron
on Tuesday, May 13

Advanced Rails Recipes: 84 New Ways to Build Stunning Rails Apps

Congratulations to Mike Clark and the Pragmatic Programmers team for shipping Advanced Rails Recipes. I highly recommend this book of 70+ recipes on topics ranging from deployment to UI to security and performance, just to list a few.

Val, Warren and I are honored to have our recipes included.

Recipe 34: Play Nice with Facebook

Recipe 62: Profile in the Browser

Recipe 67: Encrypt Sensitive Data

It was a pleasure to work with Mike Clark on this project. See his screencast below.

Advanced Rails Recipes Screencast (33Mb, Quicktime)

ActsAsInsertOrUpdate

Posted by aaron
on Tuesday, April 22

Problem

With high volume Rails applications, entities with unique constraints are expensive and error prone to create/update. ActsAsInsertOrUpdate helps solve that problem (if you're using MySQL), by leveraging the "INSERT ... ON DUPLICATE KEY UPDATE" functionality.

Scenario

Let's say you have a Person, an Entity, and a Rating. Each person can rate each entity only once, and if they re-rate the entity, it should update the value.

class Entity < ActiveRecord::Base
  has_many :ratings
end

class Person < ActiveRecord::Base
  has_many :ratings
end

class Rating < ActiveRecord::Base
  # Association names are lowercase symbols
  belongs_to :person
  belongs_to :entity
end

Here is the table that backs Rating. Notice the unique key constraint on (entity_id, person_id).

CREATE TABLE `ratings` (
  `id` int(11) NOT NULL auto_increment,
  `rating` tinyint(4) default '0',
  `person_id` int(11) default NULL,
  `entity_id` int(11) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `index_ratings_on_entity_id_and_person_id` (`entity_id`,`person_id`)
)

Previously, the logic would be something like:

  • 1) Check if a rating exists for the Person + Entity
  • 2) If so, update
  • 3) If not, insert
  • 4) Rescue the insert in case there is a unique constraint error
  • 5) Retrieve the record (and/or update with the new rating)

If the table is MyISAM, steps 1-5 aren't transactionally safe. If you're using InnoDB and experience heavy volumes of traffic, you're prone to deadlocks. This is even more of a concern if the unique entity is shared across multiple users, as seen with a recent client of ours.

Solution:

class Rating < ActiveRecord::Base
  belongs_to :person
  belongs_to :entity
  acts_as_insert_or_update :field_to_update => "rating"
end

Now steps 1-5 above become just one: Rating.create(..)

In the background, ActsAsInsertOrUpdate overwrites the implementation of ActiveRecord::Base#create to leverage an often unused feature of MySQL called INSERT ... ON DUPLICATE KEY UPDATE. As configured above, if a duplicate record is found for the unique constraint, the rating field will be updated with the new value.
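To make that concrete, here is roughly the kind of statement that ends up being sent. This is a sketch, not the plugin's code: `insert_or_update_sql` is a hypothetical helper, it only handles integer values, and real code should go through the connection's quoting.

```ruby
# Builds an INSERT ... ON DUPLICATE KEY UPDATE statement for integer columns.
def insert_or_update_sql(table, attributes, field_to_update)
  columns = attributes.keys.map { |name| "`#{name}`" }.join(", ")
  # Integer() raises on anything that isn't a number, a crude guard for the sketch.
  values  = attributes.values.map { |value| Integer(value) }.join(", ")
  "INSERT INTO `#{table}` (#{columns}) VALUES (#{values}) " \
    "ON DUPLICATE KEY UPDATE `#{field_to_update}` = VALUES(`#{field_to_update}`)"
end

puts insert_or_update_sql("ratings", { person_id: 1, entity_id: 2, rating: 5 }, "rating")
```

Running the same statement twice with a different rating leaves one row per (entity_id, person_id) pair, holding the latest rating.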

Caution

This is a brute force hack on ActiveRecord::Base#create. Use at your own risk.

Code

Waiting for a rubyforge account. Will post more info soon.

MySQL Stored Function: parsing a JSON encoded string

Posted by warren
on Monday, April 14

For analytics purposes, we ended up storing JSON-encoded data as a column in a MySQL table. Although we don't often need to query it directly, from time to time it makes things a bit easier/faster. Below is a MySQL stored function that takes two parameters (a JSON-encoded string and the name of a key) and returns the value associated with that key.

CREATE FUNCTION JSON(`json` TEXT, `search_key` VARCHAR(255)) RETURNS TEXT DETERMINISTIC BEGIN

  DECLARE i INT DEFAULT 1;
  DECLARE json_length INT DEFAULT LENGTH(json);
  DECLARE state ENUM('reading_key','done_reading_key','reading_string', 'reading_array');
  DECLARE tmp_key TEXT;
  DECLARE tmp_value TEXT;
  DECLARE current_char VARCHAR(1);

  WHILE i <= json_length DO
    SET current_char = SUBSTRING(json,i,1);

    IF state = 'reading_key' THEN
      IF current_char = '"' THEN
        SET state = 'done_reading_key';
      ELSE
        SET tmp_key = CONCAT(tmp_key, current_char);
      END IF;
    ELSEIF state = 'done_reading_key' THEN
      IF current_char = '"' THEN
        SET state = 'reading_string';
      ELSEIF current_char = '[' THEN
        SET state = 'reading_array';
      END IF;
    ELSEIF state = 'reading_string' OR state = 'reading_array' THEN
      IF current_char = '\\' THEN
        SET i = i + 1;
        SET tmp_value = CONCAT(tmp_value, SUBSTRING(json,i,1));
      ELSEIF (state = 'reading_string' AND current_char = '"') OR (state = 'reading_array' AND current_char = ']') THEN
        IF search_key = tmp_key THEN
          RETURN tmp_value;
        ELSE
          SET state = NULL;
        END IF;
      ELSE
        SET tmp_value = CONCAT(tmp_value, current_char);
      END IF;
    ELSE 
      IF current_char='"' THEN
        SET state = 'reading_key';
        SET tmp_key = '';
        SET tmp_value = '';
      END IF;
    END IF;

    SET i = i + 1;
  END WHILE;

  RETURN NULL;
END

Examples

Here are a few examples of how it can be used:

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key1');
# returns 'val"ue1'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key2');
# returns 'value2'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key3');
# returns '"array1","array2"'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key4');
# returns 'value4'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key5');
# returns NULL

Notes:

If you're trying to run this in the MySQL console, you'll need to set the DELIMITER to be something other than a semi-colon. Before executing the above, run "DELIMITER $$" and after executing it, run "$$" and then "DELIMITER ;" to set your delimiter back to semi-colon.

If you're trying to run this in a Rails migration, don't forget to escape the back-slashes (i.e. '\\' should become '\\\\')

Although this handles several simple use cases of extracting JSON-encoded data, it is by no means comprehensive. There are many JSON-encoded structures that this will not work on. This will not work correctly with nested arrays, or with named hashes.

The performance of this is pretty slow. A better approach would be to create a UDF that plugs into MySQL. Here's a UDF to encode JSON data, but not decode: http://www.mysqludf.org/lib_mysqludf_json/index.php
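On the migration note above: the doubling is needed because an ordinary Ruby string literal consumes one level of backslashes before MySQL ever sees the text. A heredoc with a single-quoted terminator does no escape processing, so the function body could be pasted in unchanged. A small sketch of the difference, using only one line of the function:

```ruby
# An ordinary string literal eats one level of backslashes, so the SQL
# literal '\\' has to be written with four.
escaped = "IF current_char = '\\\\' THEN"

# A heredoc with a single-quoted terminator passes backslashes through as-is.
raw = <<'SQL'.chomp
IF current_char = '\\' THEN
SQL

puts escaped
puts raw
puts escaped == raw
```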

Faster Implementation:

Here's a faster version, but it's not quite as robust:

CREATE FUNCTION JSON_FAST(`json` TEXT, `search_key` VARCHAR(255)) RETURNS TEXT DETERMINISTIC BEGIN
  IF INSTR(json, CONCAT('"', search_key, '":"')) THEN
    RETURN SUBSTRING_INDEX(SUBSTRING(json, INSTR(json, CONCAT('"', search_key, '":"')) +
           LENGTH(search_key) + 4), '"', 1);
  ELSEIF INSTR(json, CONCAT('"', search_key, '": "')) THEN
    RETURN SUBSTRING_INDEX(SUBSTRING(json, INSTR(json, CONCAT('"', search_key, '": "')) +
           LENGTH(search_key) + 5), '"', 1);
  ELSE
    RETURN NULL;
  END IF;
END

Here are some key differences:

SELECT JSON('{"key":"value \\"plus quotes\\""}', 'key');
# returns 'value "plus quotes"'
SELECT JSON_FAST('{"key":"value \\"plus quotes\\""}', 'key');
# returns 'value \'

SELECT JSON('{"key":["value1","value2"]}', 'key');
# returns '"value1","value2"'
SELECT JSON_FAST('{"key":["value1","value2"]}', 'key');
# returns NULL

Curb your Net::HTTP

Posted by aaron
on Tuesday, April 08

Curb is a Ruby binding for libcurl. We've had sporadic issues with Net::HTTP, which this might alleviate via native DNS, native timeouts, performance improvements, etc. It wouldn't be hard to re-implement ActiveResource, rfacebook, myspace-ruby, etc. to use it instead. Anyone using this already?

sudo gem install curb

require 'rubygems'
require 'curb'
require "net/http"
require 'benchmark'

iterations = 40
Benchmark.bm do |x|
  x.report("curb") do
    iterations.times do
      c = Curl::Easy.perform("http://www.google.com")
      #puts c.body_str
    end
  end
  x.report("net/http")  do
    iterations.times do
      http = Net::HTTP.start("www.google.com")
      req = Net::HTTP::Get.new("/")
      res = http.request(req)
      #puts res.body
    end
  end
end

             user     system      total        real
curb      0.010000   0.030000   0.040000 (  4.019197)
net/http  0.140000   0.110000   0.250000 (  4.155106)

Aggressive Timeouts On External API Calls

Posted by val
on Sunday, March 30

One of the challenges with writing a Facebook or Bebo application is staying within the time limit the platform gives you to respond with data before it shows the Application Did Not Respond page to a user. A content-rich application that calls external APIs, like Amazon or YouTube, with response times beyond your control has to keep such calls short to leave extra time for processing. We usually wrap them in aggressive timeouts with a retry. As an example, here is code from the Ruby Amazon E-Commerce REST Service API gem, rewritten to limit a single call attempt to two seconds, with one more retry.

Original Code
module Amazon  
  class Ecs

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = Net::HTTP.get_response(URI::parse(request_url))
      unless res.kind_of? Net::HTTPSuccess
        raise Amazon::RequestError, "HTTP Response: #{res.code} #{res.message}"
      end
      Response.new(res.body)
    end

  end
end
Modified Code
module Amazon
  class Ecs

    # Returned when every attempt times out or the response is not a success
    class EmptyResponse
      def items; []; end
      def total_pages; 0; end
    end

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = timed_try(request_url, 2) do |url|
        uri = URI::parse(url)
        req = Net::HTTP.new(uri.host, uri.port)

        # Aggressive timeouts
        req.open_timeout = 1
        req.read_timeout = 2

        req.start { |http| http.request_get(uri.request_uri) }
      end

      res.kind_of?(Net::HTTPSuccess) ? Response.new(res.body) : EmptyResponse.new
    end

    # Calls the block up to `attempts` times, retrying on timeout.
    # Returns nil if every attempt times out.
    def self.timed_try(url, attempts, &block)
      attempt = 1
      begin
        block.call(url)
      rescue Timeout::Error
        if attempt >= attempts
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] gave up after attempt ##{ attempt } to get data from #{ url }"
          nil
        else
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] attempt ##{ attempt } timed out on getting data from #{ url }"
          attempt += 1
          retry
        end
      end
    end
    # send_request is a class method, so the helper must be one too
    private_class_method :timed_try

  end
end
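To see the retry flow in isolation, without Rails, the gem, or the network, here is a stripped-down sketch. The `timed_try` below is a simplified stand-in (no logging, no url argument), and the timeout is simulated by raising Timeout::Error by hand on the first attempt.

```ruby
require 'timeout'

# Yields up to `attempts` times; returns nil if every attempt times out.
def timed_try(attempts)
  attempt = 1
  begin
    yield(attempt)
  rescue Timeout::Error
    return nil if attempt >= attempts
    attempt += 1
    retry
  end
end

calls = []
result = timed_try(2) do |attempt|
  calls << attempt
  # Simulated slow API: only the first attempt "times out".
  raise Timeout::Error, "simulated slow API" if attempt == 1
  "response body"
end

puts "attempts: #{calls.inspect}, result: #{result.inspect}"
```

When every attempt times out, the helper returns nil, which is why the caller above falls back to EmptyResponse.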

Reviewing Application Health with HAProxy Stats

Posted by val
on Thursday, March 27
One of the methods we use for checking the health of our applications is stats collected from HAProxy. We use them to see how many requests are queued for execution on mongrel instances. The graph is one indication of how our applications perform. When we launched the new version of the site three weeks ago, the graph for a single vertical (ReadingSocial) on a typical Tuesday looked like this:

So, between porting all verticals to Myspace, Orkut, and Bebo and enhancing the functionality, we spent some time on optimization. In addition to analyzing slow-query logs with mysqlsla, Aaron wrapped all external API calls (and we make a lot of them, to Amazon, Facebook, Myspace, etc.) in monitoring for slow responses, so we could find the latest external bottleneck and fix them one by one. Three weeks later the graph became much more peaceful:
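For anyone who wants to graph the same thing, the queue depth can be pulled out of HAProxy's CSV stats export, where it is the qcur column. The sketch below is an assumption about how one might do it, not our monitoring code: the sample rows and server names are made up, only the first few columns are shown, and `queued_requests` is a hypothetical helper.

```ruby
# Hypothetical excerpt of HAProxy's CSV stats (real output has many more columns).
SAMPLE_STATS = <<~CSV
  # pxname,svname,qcur,qmax,scur,smax
  readingsocial,FRONTEND,,,12,40
  readingsocial,mongrel_8000,3,9,1,1
  readingsocial,mongrel_8001,0,7,1,1
  readingsocial,BACKEND,5,21,2,2
CSV

# Returns { "proxy/server" => currently queued requests }, skipping frontends.
def queued_requests(csv_text)
  lines = csv_text.lines.map(&:chomp).reject(&:empty?)
  header = lines.shift.sub(/\A#\s*/, "").split(",")
  lines.each_with_object({}) do |line, queued|
    row = header.zip(line.split(",", -1)).to_h
    next if row["svname"] == "FRONTEND"   # frontends have no queue
    queued["#{row['pxname']}/#{row['svname']}"] = row["qcur"].to_i
  end
end

puts queued_requests(SAMPLE_STATS).inspect
```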

Viva LivingSocial!

Posted by val
on Saturday, March 15

You haven't heard from us for a while because we were working on a new project, LivingSocial: a website that spreads across all major social networks (Facebook, Myspace, Bebo, Orkut), allowing people to talk to each other about their interests without barriers. We are going to speak more about it in a dedicated blog while keeping this one for posts on Rails and other technologies.