Wednesday, November 18, 2009

Recipe 13.1. Serializing Data with YAML















Problem


You want to serialize a data structure and use it later. You may want to send the data structure to a file, then load it into a program written in a different programming language.




Solution


The simplest way is to use the built-in yaml library. When you require yaml, all Ruby objects sprout to_yaml methods that convert them to the YAML serialization format. A YAML string is human-readable, and it intuitively corresponds to the object from which it was derived:



require 'yaml'

10.to_yaml # => "--- 10\n"
'ten'.to_yaml # => "--- ten\n"
'10'.to_yaml # => "--- \"10\"\n"



Arrays are represented as bulleted lists:



puts %w{Brush up your Shakespeare}.to_yaml
# ---
# - Brush
# - up
# - your
# - Shakespeare



Hashes are represented as colon-separated key-value pairs:



puts ({ 'star' => 'hydrogen', 'gold bar' => 'gold' }).to_yaml
# ---
# star: hydrogen
# gold bar: gold



More complex Ruby objects are represented in terms of their classes and member variables:



require 'set'
puts Set.new([1, 2, 3]).to_yaml
# --- !ruby/object:Set
# hash:
#   1: true
#   2: true
#   3: true



You can dump a data structure to a file with YAML.dump, and load it back with YAML.load:



users = [{:name => 'Bob', :permissions => ['Read']},
         {:name => 'Alice', :permissions => ['Read', 'Write']}]

# Serialize
open('users', 'w') { |f| YAML.dump(users, f) }

# And deserialize
users2 = open("users") { |f| YAML.load(f) }
# => [{:permissions=>["Read"], :name=>"Bob"},
# {:permissions=>["Read", "Write"], :name=>"Alice"}]




YAML implementations are available for Perl, Python, Java, PHP, JavaScript, and OCaml, so if you stick to the "standard" data types (strings, arrays, and so on), the serialized file will be portable across programming languages.
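
As a rough illustration of that portable subset, the sketch below sticks to strings, numbers, booleans, arrays, and string-keyed hashes, so the output carries no Ruby-specific type tags. The inventory data and its keys are invented for this example.

```ruby
require 'yaml'

# Hypothetical data using only types that YAML parsers in other
# languages are expected to understand.
inventory = {
  'name'     => 'widget',
  'count'    => 3,
  'in_stock' => true,
  'tags'     => ['small', 'blue']
}

yaml_text = YAML.dump(inventory)
puts yaml_text

# A Ruby-specific object would show up as a "!ruby/..." tag in the output.
portable = !yaml_text.include?('!ruby')

# Loading the text back yields a structure equal to the original.
restored = YAML.load(yaml_text)
```

Note that the users example above keys its hashes with symbols, which are a Ruby-specific type; a parser in another language will probably not see them as plain string keys.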




Discussion


If you've ever used Python's pickle module or serialized a Java object, you know how convenient it is to be able to dump an object to disk and load it back later. You don't have to define a custom data format or write an XML generator: you just shove the object into a file or a database, and read it back later. The only downside is that the serialized file is usually a binary mess that can only be understood by the serialization library.



YAML is a human-readable and somewhat cross-language serialization standard. Its format describes the simple data structures common to all modern programming languages. YAML can serialize and deserialize any combination of strings, booleans, numbers, dates and times, arrays (possibly nested arrays), and hashes (again, possibly nested ones).


You can also use YAML to serialize Ruby-specific objects: symbols, ranges, and regular expressions. Indeed, you can use YAML to serialize instances of custom classes: YAML serializes the class of the object and the values of its instance variables. There's no guarantee, though, that other programming languages will understand what you mean.[3]

[3] Ruby can also read YAML descriptions of Perl's regular expressions.
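
To see what "the class of the object and the values of its instance variables" looks like in practice, here is a small sketch. CookbookPoint is a hypothetical class defined only for this illustration.

```ruby
require 'yaml'

# Hypothetical class, not part of any library.
class CookbookPoint
  def initialize(x, y)
    @x = x
    @y = y
  end
end

# The dump names the class in a !ruby/object tag and lists each
# instance variable (without its leading @) as a key-value pair.
point_yaml = CookbookPoint.new(1, 2).to_yaml
puts point_yaml
```

Loading such a document back requires the class to be defined in the loading program. Newer versions of Ruby's YAML library also restrict which classes YAML.load will instantiate by default, so the loading side is version-dependent.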


Not only is YAML human-readable, it's human-writable. You can write YAML files in a text editor and load them into Ruby as objects. If you're having trouble with the YAML representation of a particular data structure, your best bet is to define a simple version of that data structure in an irb session, dump it to YAML, and work from there.



quiz_question = ['What color is Raedon?', ['Blue', 'Albino', '*Yellow']]
puts quiz_question.to_yaml
# ---
# - What color is Raedon?
# - - Blue
#   - Albino
#   - "*Yellow"
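
Going the other direction, a document written by hand in the same shape loads back into nested arrays. This is a minimal sketch; the handwritten variable simply stands in for the contents of a file you might type into a text editor.

```ruby
require 'yaml'

# A hand-written document: a question followed by a nested list of answers.
# The quotes around "*Yellow" keep the parser from treating the leading
# asterisk as a YAML alias.
handwritten = <<EOF
- What color is Raedon?
- - Blue
  - Albino
  - "*Yellow"
EOF

# The outer list has two elements: the question and the array of answers.
question, answers = YAML.load(handwritten)
```

The indentation of the nested list items matters: the second and third answers must line up under the first one, or the parser will read a different structure.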



Before you get drunk with power, you should know that YAML shares the limitations of other serialization schemes. Most obviously, you can only deserialize objects in an environment like the one in which you serialized them. Suppose you convert a Set object to YAML in one Ruby session:



require 'yaml'
require 'set'
set = Set.new([1, 2, 3])
open("set", "w") { |f| YAML.dump(set, f) }



In another Ruby session, you might try to convert the YAML back into a Set, without first requiring the set library:



# Bad code -- don't try this!
require 'yaml'
set = open("set") { |f| YAML.load(f) }
# => #<YAML::Object:0xb7bd8620 @ivars={"hash"=>{1=>true, 2=>true, 3=>true}},
# @class="Set">



Instead of a Set, you've got an unresolved object of class YAML::Object. The set has been loaded from the file and deserialized, but Ruby can't resolve its class name.


YAML can only serialize data; it can't serialize Ruby code or system resources (such as filehandles or open sockets). This means some objects can't be fully converted to YAML. The following code successfully serializes and deserializes a File object, but the deserialized File isn't open and doesn't point to anything in particular:



handle = open('a_file', 'w')
handle.path
# => "a_file"

handle2 = YAML.load(YAML.dump(handle))
# => #<File:0xb7bd9a58>
handle2.path
# IOError: uninitialized stream



The essence of the File object (its handle to a file on disk, granted by the operating system) has been lost.


Objects that contain Ruby code will lose their code when dumped to YAML. This means that Proc and Binding objects will turn up empty. Objects with singleton methods will be dumped without them. Classes can't be dumped to YAML at all.


But these are all edge cases. Most data structures, even complex ones, can be serialized to YAML and stay readable to boot.




See Also


  • Ruby standard library documentation for the yaml library

  • The YAML web page (http://www.yaml.org/)

  • Recipe 12.12, "Reading and Writing Configuration Files"

  • An episode of the Ruby Quiz focused on creating a serializable Proc object (http://www.rubyquiz.com/quiz38.html)












