Friday, October 23, 2009

Chapter 1. Strings


Ruby is a programmer-friendly language. If you are already familiar with object-oriented programming, Ruby should quickly become second nature. If you've struggled with learning object-oriented programming or are not familiar with it, Ruby should make more sense to you than other object-oriented languages because Ruby's methods are consistently named, concise, and generally act the way you expect.


Throughout this book, we demonstrate concepts through interactive Ruby sessions. Strings are a good place to start: not only are they a useful data type, they're also easy to create and use. They provide a simple introduction to Ruby, a point of comparison between Ruby and other languages you might know, and an approachable way to introduce important Ruby concepts like duck typing (see Recipe 1.12), open classes (demonstrated in Recipe 1.10), symbols (Recipe 1.7), and even Ruby gems (Recipe 1.20).


If you use Mac OS X or a Unix environment with Ruby installed, go to your command line right now and type irb. If you're using Windows, you can download and install the One-Click Installer from http://rubyforge.org/projects/rubyinstaller/, and do the same from a command prompt (you can also run the fxri program, if that's more comfortable for you). You've now entered an interactive Ruby shell, and you can follow along with the code samples in most of this book's recipes.


Strings in Ruby are much like strings in other dynamic languages like Perl, Python, and PHP, and they're not very different from strings in Java and C. Ruby strings are dynamic, mutable, and flexible. Get started with strings by typing this line into your interactive Ruby session:



string = "My first string"



You should see some output that looks like this:



=> "My first string"



You typed a Ruby expression that created the string "My first string" and assigned it to the variable string. The value of that expression is just the new value of string, which is what your interactive Ruby session printed out on the right side of the arrow. Throughout this book, we'll represent this kind of interaction in the following form:[1]

[1] Yes, this was covered in the Preface, but not everyone reads the Preface.



string = "My first string" # => "My first string"



In Ruby, everything that can be assigned to a variable is an object. Here, the variable string points to an object of class String. That class defines over a hundred built-in methods: named pieces of code that examine and manipulate the string. We'll explore some of these throughout the chapter, and indeed the entire book. Let's try one out now: String#length, which in Ruby 1.8 returns the number of bytes in a string (from Ruby 1.9 on, it counts characters instead). Here's a Ruby method call:



string.length # => 15
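You can check the "everything is an object" claim for yourself by asking any value for its class or whether it responds to a method. A small sketch in the same `# =>` style, assuming Ruby 2.4 or later (where whole numbers belong to Integer rather than Fixnum):

```ruby
string = "My first string"

# Every value you can store in a variable is an object, so every value has a class.
string.class                         # => String
string.length.class                  # => Integer (Fixnum on Ruby versions before 2.4)
string.is_a?(Object)                 # => true

# respond_to? reports whether an object has a method with the given name.
string.respond_to?(:length)          # => true
string.respond_to?(:upcase!)         # => true
string.respond_to?(:no_such_method)  # => false
```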



Many programming languages make you put parentheses after a method call:



string.length() # => 15



In Ruby, parentheses are almost always optional. They're especially easy to leave out in this case, since we're not passing any arguments into String#length. If you're passing arguments into a method, it's often more readable to enclose the argument list in parentheses:



string.count 'i' # => 2 # "i" occurs twice.
string.count('i') # => 2
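String#count takes more than a single character: its argument is treated as a set of characters, which can include ranges like a-z and a leading ^ to negate the set. Parentheses also start to matter once a call becomes part of a larger expression. A short sketch; the particular character sets are only illustrations:

```ruby
string = "My first string"

# The argument is a set of characters, not a substring.
string.count('i')      # => 2
string.count('st')     # => 4 (two s's plus two t's)
string.count('a-z')    # => 12 (lowercase letters only)
string.count('^a-z')   # => 3 ("M" and the two spaces)

# Without parentheses, string.count 'i' + 1 would be parsed as
# string.count('i' + 1), which raises a TypeError.
string.count('i') + 1  # => 3
```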



The return value of a method call is itself an object. In the case of String#length, the return value is the number 15, an instance of the Fixnum class (the Integer class in Ruby 2.4 and later). We can call a method on this object as well:



string.length.next # => 16
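Because every return value is an object, you can keep chaining calls, each one operating on the result of the call before it. A brief sketch:

```ruby
string = "My first string"

# Each call returns a new object, so calls chain from left to right.
string.length               # => 15
string.length.next          # => 16
string.length.next.to_s     # => "16"
string.length.to_s.length   # => 2 ("15" has two characters)
string.upcase.reverse       # => "GNIRTS TSRIF YM"
```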



Let's take a more complicated case: a string that contains non-ASCII characters. This string contains the French phrase "il était une fois," encoded as UTF-8:[2]

[2] "\xc3\xa9" is a Ruby string representation of the UTF-8 encoding of the Unicode character é.



french_string = "il \xc3\xa9tait une fois" # => "il \303\251tait une fois"



Many programming languages (notably Java) treat a string as a series of characters. Ruby 1.8 treats a string as a series of bytes. The French string contains 14 letters and 3 spaces, so you might think Ruby would say the length of the string is 17. But one of the letters (the e with acute accent) is represented as two bytes, and that's what Ruby counts:



french_string.length # => 18



For more on handling different encodings, see Recipe 1.14 and Recipe 11.12. For more on this specific problem, see Recipe 1.8.
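Ruby 1.9 and later behave differently: each string carries an encoding, String#length counts characters, and String#bytesize counts bytes. A minimal sketch, assuming Ruby 2.0 or later. It writes é with the Unicode escape \u00e9 so the string is UTF-8 no matter how the source file is encoded:

```ruby
# \u00e9 is the Unicode escape for "é"; it always produces a UTF-8 string.
french_string = "il \u00e9tait une fois"

french_string.encoding             # => #<Encoding:UTF-8>
french_string.length               # => 17 (characters)
french_string.bytesize             # => 18 (bytes)

# The two bytes behind "é" are the ones written as \xc3\xa9 in the French example.
french_string.bytes[3, 2]          # => [195, 169], i.e. 0xc3 and 0xa9

# unpack('U*') decodes UTF-8 into code points; it is also available on Ruby 1.8.
french_string.unpack('U*').length  # => 17
```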


You can represent special characters in strings (like the binary data in the French string) with string escaping. Ruby does different types of string escaping depending on how you create the string. When you enclose a string in double quotes, you can encode binary data into the string (as in the French example above), and you can encode newlines with the code "\n", as in other programming languages:



puts "This string\ncontains a newline"
# This string
# contains a newline
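Newlines are only one of the escapes that double-quoted strings understand. A short sketch of a few others (the variable names are just for illustration):

```ruby
# "\t" is a single tab character.
tabbed = "one\ttwo"
tabbed.length          # => 7

# Hex and octal escapes describe a byte by its value: 0x41 (octal 101) is "A".
letter_a = "\x41"      # => "A"
octal_a = "\101"       # => "A"

# To get a literal backslash inside double quotes, escape it.
path = "C:\\temp"
path.length            # => 7
```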



When you enclose a string in single quotes, the only special codes you can use are "\'" to get a literal single quote, and "\\" to get a literal backslash:



puts 'it may look like this string contains a newline\nbut it doesn\'t'
# it may look like this string contains a newline\nbut it doesn't

puts 'Here is a backslash: \\'
# Here is a backslash: \
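A quick way to see the difference between the two quoting styles is to compare lengths. A sketch:

```ruby
# The same backslash sequence means different things in the two quoting styles.
'\n'.length          # => 2: a backslash followed by the letter n
"\n".length          # => 1: a single newline character
'\t'.chars           # => ["\\", "t"]

# Inside single quotes, only \' and \\ are special; any other backslash is kept.
contraction = 'don\'t'   # => "don't"
'C:\temp'.length         # => 7
'C:\\temp'.length        # => 7: \\ becomes one backslash, giving the same string
```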



This is covered in more detail in Recipe 1.5. Also see Recipes 1.2 and 1.3 for more examples of the more spectacular substitutions double-quoted strings can do.


Another useful way to initialize strings is with the "here documents" style:



long_string = <<EOF
Here is a long string
With many paragraphs
EOF
# => "Here is a long string\nWith many paragraphs\n"

puts long_string
# Here is a long string
# With many paragraphs
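Here documents come in a few variants. A sketch of the common ones; the <<~ form needs Ruby 2.3 or later, and the names greeting, template, indented_heredoc, and squiggly_heredoc are only illustrations:

```ruby
name = "Ruby"

# <<EOF acts like a double-quoted string: escapes and #{} interpolation work.
greeting = <<EOF
Hello, #{name}!
EOF
greeting            # => "Hello, Ruby!\n"

# <<'EOF' acts like a single-quoted string: the text is taken literally.
template = <<'EOF'
Hello, #{name}!
EOF
template            # => "Hello, \#{name}!\n"

# <<-EOF allows an indented terminator; the body keeps its leading spaces.
def indented_heredoc
  <<-EOF
    indented text
  EOF
end
indented_heredoc    # => "    indented text\n"

# <<~EOF (Ruby 2.3+) also removes the common leading indentation.
def squiggly_heredoc
  <<~EOF
    indented text
  EOF
end
squiggly_heredoc    # => "indented text\n"
```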



Like most of Ruby's built-in classes, Ruby's strings define the same functionality in several different ways, so that you can use the idiom you prefer. Say you want to get a substring of a larger string (as in Recipe 1.13). If you're an object-oriented programming purist, you can use the String#slice method:



string # => "My first string"
string.slice(3, 5) # => "first"



But if you're coming from C, and you think of a string as an array of bytes, Ruby can accommodate you. In Ruby 1.8, selecting a single byte from a string returns that byte as a number; calling chr on the number turns it back into a one-character string:



string[3].chr + string[4].chr + string[5].chr + string[6].chr + string[7].chr
# => "first"
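Ruby 1.9 changed this: indexing with a single integer now returns a one-character string, and byte values come from methods such as getbyte, bytes, and ord. A sketch assuming Ruby 2.0 or later (where String#bytes returns an array):

```ruby
string = "My first string"

# A single index now returns a one-character String...
string[3]                   # => "f"
# ...while getbyte and ord return the byte value as an Integer.
string.getbyte(3)           # => 102
string[3].ord               # => 102

# Integer#chr converts byte values back into characters.
bytes = string.bytes[3, 5]  # => [102, 105, 114, 115, 116]
bytes.map(&:chr).join       # => "first"
```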



And if you come from Python and like that language's square-bracket slicing, you can chop up the string with brackets just as easily:



string[3, 5] # => "first"
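Square brackets accept several other kinds of arguments as well. A short sketch of the common forms:

```ruby
string = "My first string"

# String#slice and String#[] accept the same arguments.
string.slice(3, 5)   # => "first"
string[3, 5]         # => "first"

# A range gives start and end positions instead of a start and a length.
string[3..7]         # => "first"

# Negative indices count back from the end of the string.
string[-6, 6]        # => "string"
string[-6..-1]       # => "string"

# A string argument returns that substring if it is present, otherwise nil.
string["first"]      # => "first"
string["second"]     # => nil
```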



Unlike strings in many programming languages (Java and Python, for instance), Ruby strings are mutable: you can change them after they are created. Below we see the difference between the methods String#upcase and String#upcase!:



string.upcase # => "MY FIRST STRING"
string # => "My first string"
string.upcase! # => "MY FIRST STRING"
string # => "MY FIRST STRING"
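Mutability has a consequence worth seeing directly: every variable that refers to the same string object sees an in-place change. A sketch showing that, plus dup and freeze for protecting a string (same_object and copy are illustrative names):

```ruby
string = "My first string"
same_object = string        # a second variable pointing at the same object

string.upcase!              # modifies the object in place
same_object                 # => "MY FIRST STRING": both names see the change

# Bang methods return nil when there is nothing to change.
string.upcase!              # => nil (already upper case)

# dup makes an independent copy that can be changed safely.
copy = string.dup
copy.downcase!
string                      # => "MY FIRST STRING"
copy                        # => "my first string"

# freeze forbids further changes; calling downcase! on string now would
# raise FrozenError (RuntimeError before Ruby 2.5).
string.freeze
string.frozen?              # => true
```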



This is one of Ruby's syntactical conventions. "Dangerous" methods (generally those that modify their object in place) usually have an exclamation mark at the end of their name. Another syntactical convention is that predicates, methods that return a true/false value, have a question mark at the end of their name (as in some varieties of Lisp):



string.empty? # => false
string.include? 'MY' # => true
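A few more predicates from the String class, as a brief sketch (start_with? and end_with? are available in modern Ruby versions):

```ruby
string = "MY FIRST STRING"

# Predicates end in a question mark and return true or false.
"".empty?                             # => true
string.start_with?("MY")              # => true
string.end_with?("string")            # => false: comparisons are case-sensitive
string.downcase.end_with?("string")   # => true
string.include?("FIRST")              # => true
```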



This use of English punctuation to provide the programmer with information is an example of Matz's design philosophy: that Ruby is a language primarily for humans to read and write, and secondarily for computers to interpret.


An interactive Ruby session is an indispensable tool for learning and experimenting with these methods. Again, we encourage you to type the sample code shown in these recipes into an irb or fxri session, and try to build upon the examples as your knowledge of Ruby grows.


Here are some extra resources for using strings in Ruby:


  • You can get information about any built-in Ruby method with the ri command; for instance, to see more about the String#upcase! method, issue the command ri "String#upcase!" from the command line.

  • "why the lucky stiff" has written an excellent introduction to installing Ruby, and using irb and ri: http://poignantguide.net/ruby/expansion-pak-1.html

  • For more information about the design philosophy behind Ruby, read an interview with Yukihiro "Matz" Matsumoto, creator of Ruby: http://www.artima.com/intv/ruby.html
