MySQL Stored Function: parsing a JSON encoded string

Posted by warren
on Monday, April 14
For analytics purposes, we ended up storing JSON-encoded data as a column in a mysql table. Although we don't often need to query it directly, from time to time, it makes things a bit easier/faster. Below is a MySQL stored function that takes two parameters (a JSON encoded string, and the name of a key) and returns the value associated with that key.
CREATE FUNCTION JSON(`json` TEXT, `search_key` VARCHAR(255)) RETURNS TEXT DETERMINISTIC BEGIN

  DECLARE i INT DEFAULT 1;
  DECLARE json_length INT DEFAULT LENGTH(json);
  DECLARE state ENUM('reading_key','done_reading_key','reading_string', 'reading_array');
  DECLARE tmp_key TEXT;
  DECLARE tmp_value TEXT;
  DECLARE current_char VARCHAR(1);

  WHILE i <= json_length DO
    SET current_char = SUBSTRING(json,i,1);

    IF state = 'reading_key' THEN
      IF current_char = '"' THEN
        SET state = 'done_reading_key';
      ELSE
        SET tmp_key = CONCAT(tmp_key, current_char);
      END IF;
    ELSEIF state = 'done_reading_key' THEN
      IF current_char = '"' THEN
        SET state = 'reading_string';
      ELSEIF current_char = '[' THEN
        SET state = 'reading_array';
      END IF;
    ELSEIF state = 'reading_string' OR state = 'reading_array' THEN
      IF current_char = '\\' THEN
        SET i = i + 1;
        SET tmp_value = CONCAT(tmp_value, SUBSTRING(json,i,1));
      ELSEIF (state = 'reading_string' AND current_char = '"') OR (state = 'reading_array' AND current_char = ']') THEN
        IF search_key = tmp_key THEN
          RETURN tmp_value;
        ELSE
          SET state = NULL;
        END IF;
      ELSE
        SET tmp_value = CONCAT(tmp_value, current_char);
      END IF;
    ELSE 
      IF current_char='"' THEN
        SET state = 'reading_key';
        SET tmp_key = '';
        SET tmp_value = '';
      END IF;
    END IF;

    SET i = i + 1;
  END WHILE;

  RETURN NULL;
END

Examples

Here's a few examples of how it can be used:

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key1');
# returns 'val"ue1'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key2');
# returns 'value2'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key3');
# returns '"array1","array2"'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key4');
# returns 'value4'

SELECT JSON('{"key1":"val\\"ue1","key2":"value2","key3":["array1","array2"],"key4":"value4"}', 'key5');
# returns NULL

Notes:

If you're trying to run this in the MySQL console, you'll need to set the DELIMITER to be something other than a semi-colon. Before executing the above, run "DELIMITER $$" and after executing it, run "$$" and then "DELIMITER ;" to set your delimiter back to semi-colon.

If you're trying to run this in a Rails migration, don't forget to escape the back-slashes (i.e. '\\' should become '\\\\')

Although this handles several simple use cases of extracting JSON-encoded data, it is by no means comprehensive. There are many JSON-encoded structures that this will not work on. This will not work correctly with nested arrays, or with named hashes.

The performance of this is pretty slow. A better approach would be to create a UDF that plugs into MySQL. Here's a UDF to encode JSON data, but not decode: http://www.mysqludf.org/lib_mysqludf_json/index.php

Faster Implementation:

Here's a faster version, but it's not quite as robust:
CREATE FUNCTION JSON_FAST(`json` TEXT, `search_key` VARCHAR(255)) RETURNS TEXT DETERMINISTIC BEGIN
  IF INSTR(json, CONCAT('"', search_key, '":"')) THEN
    RETURN SUBSTRING_INDEX(SUBSTRING(json, INSTR(json, CONCAT('"', search_key, '":"')) +
           LENGTH(search_key) + 4), '"', 1);
  ELSEIF INSTR(json, CONCAT('"', search_key, '": "')) THEN
    RETURN SUBSTRING_INDEX(SUBSTRING(json, INSTR(json, CONCAT('"', search_key, '": "')) +
           LENGTH(search_key) + 5), '"', 1);
  ELSE
    RETURN NULL;
  END IF;
END
Here's some key differences:
SELECT JSON('{"key":"value \"plus quotes\""}', 'key');
# returns 'value "plus quotes"'
SELECT JSON_FAST('{"key":"value \"plus quotes\""}', 'key');
# returns 'value \'

SELECT JSON('{"key":["value1","value2"]}', 'key');
# returns '"value1","value2"'
SELECT JSON_FAST('{"key":["value1","value2"]}', 'key');
# returns NULL
Comments

Leave a response

  1. Joe GrossbergApril 14, 2008 @ 05:50 PM

    Interesting proof-of-concept.

    I'm curious about the motivations for this, since you said, "Although we don't often need to query it directly, from time to time, it makes things a bit easier/faster". But then you say performance was "pretty slow", so why did you do it in SQL instead of Ruby?

    Also, by limiting it to 255 chars, I'm guessing that you know the JSON keys' names. Why not just split that JSON string and insert into multiple columns? (You say that it has no nested hashes nor arrays.)

    And how do you avoid truncating that JSON when it's too long (thus corrupting it)?

  2. WarrenApril 14, 2008 @ 08:26 PM
    Long story short, we're storing large amounts of data for analytics purposes. We pull out useful info that we know we'll be using frequently into distinct columns so we can query it faster, but we throw the rest of the data (which has no particular structure) into a TEXT field as JSON. In terms of Ruby vs. MySQL, MySQL can process the JSON we need in seconds and discard the rows we don't need where as just the transfer time of getting all that data into Ruby and then processing it is minutes to hours.
  3. Brad BaumannApril 24, 2008 @ 04:09 PM
    I think this will be very useful once it's tweaked a bit. I'm having some problems with the function detecting value types other than strings. For example: SELECT JSON( '{"fieldname":12345,"fieldname2":"test string"}', 'fieldname' ) Returns: "fieldname2" If you do this for "fieldname2", you'll get the right return value of "test string". I'll try to code a quick work around... unless theres something I'm missing?