3.8. Objects
Ruby is a very pure object-oriented language: all values are
objects, and there is no distinction between primitive types and object types as there is in many other
languages. In Ruby, all objects inherit from a class named
Object and share the methods defined by that class.
This section explains the common features of all objects in Ruby. It is
dense in parts, but it's required reading; the information here is
fundamental.
3.8.1. Object References
When we work with objects in Ruby, we are really working with
object references. It is not the
object itself we manipulate but a reference to it. When we assign a value to a variable, we are not copying an object "into" that variable; we are
merely storing a reference to an object into that variable. Some code
makes this clear:
s = "Ruby" # Create a String object. Store a reference to it in s.
t = s # Copy the reference to t. s and t both refer to the same object.
t[-1] = "" # Modify the object through the reference in t.
print s # Access the modified object through s. Prints "Rub".
t = "Java" # t now refers to a different object.
print s,t # Prints "RubJava".
When you pass an object to a method in Ruby, it is an object
reference that is passed to the method. It is not the object itself, and
it is not a reference to the reference to the object. Another way to say
this is that method arguments are passed by value
rather than by reference, but that the values
passed are object references.
Because object references are passed to methods, methods can use
those references to modify the underlying object. These modifications
are then visible when the method returns.
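The distinction between modifying an object through a reference and rebinding the reference itself can be sketched as follows. The method names append_bang and reassign are invented for this illustration.

```ruby
# Mutates the object that the caller's variable refers to.
def append_bang(str)
  str << "!"          # modifies the underlying String in place
end

# Rebinds only the method's own copy of the reference.
def reassign(str)
  str = "replaced"    # the caller's variable still refers to the original
  str
end

s = String.new("Ruby")   # String.new guarantees a mutable string
append_bang(s)
reassign(s)
puts s                   # prints "Ruby!"
```

The change made by append_bang is visible to the caller, but the assignment inside reassign is not, because only the reference was copied into the method.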
3.8.1.1. Immediate values
We've said that all values in Ruby are objects and all objects
are manipulated by reference. In the reference implementation,
however, Fixnum and Symbol
objects are actually "immediate values" rather than references.
Neither of these classes has mutator methods, so
Fixnum and Symbol objects are
immutable, which means that there is really no way to tell that they
are manipulated by value rather than by reference.
The existence of immediate values should be considered an
implementation detail. The only practical difference between immediate
values and reference values is that immediate values cannot have
singleton methods defined on them. (Singleton methods
are explained in Section 6.1.4.)
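A minimal sketch of this restriction follows. The helper name accepts_singleton_method? is hypothetical, and the sketch assumes a current Ruby, where integers belong to the class Integer and the failed definition raises TypeError.

```ruby
# Returns true if a singleton method can be defined on obj, false otherwise.
def accepts_singleton_method?(obj)
  obj.define_singleton_method(:shout) { to_s.upcase }
  true
rescue TypeError      # assumption: immediate values raise TypeError here
  false
end

puts accepts_singleton_method?(Object.new)   # true
puts accepts_singleton_method?(42)           # false
puts accepts_singleton_method?(:sym)         # false
```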
3.8.2. Object Lifetime
The built-in Ruby classes described in this chapter have literal
syntaxes, and instances of these classes are created simply by including
their values literally in your code. Objects of other classes need to be
explicitly created, and this is most often done with a method named new:
myObject = myClass.new
new is a method of the Class
class. It allocates memory to hold the new object, then it initializes
the state of that newly allocated "empty" object by invoking its
initialize method. The arguments to new are passed
directly on to initialize. Most classes define an
initialize method to perform whatever initialization
is necessary for instances.
The new and initialize
methods provide the default technique for creating new objects, but
classes may also define other methods, known as "factory methods," that
return instances. We'll learn more about new,
initialize, and factory methods in Section 7.4.
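As a brief preview, here is a sketch of a class that defines initialize and one factory method. The Point class and its origin method are illustrative only.

```ruby
class Point
  attr_reader :x, :y

  # Invoked by Point.new with the same arguments that new received.
  def initialize(x, y)
    @x, @y = x, y
  end

  # A factory method: a class method that returns an instance.
  def self.origin
    new(0, 0)
  end
end

p1 = Point.new(3, 4)   # allocates a Point, then calls initialize(3, 4)
p0 = Point.origin      # builds an instance without calling new directly
puts p1.x              # prints 3
puts p0.y              # prints 0
```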
Ruby objects never need to be explicitly deallocated, as they do
in languages like C and C++. Ruby uses a technique called
garbage collection to automatically destroy objects that are no longer needed.
An object becomes a candidate for garbage collection when it is
unreachable—when there are no remaining references to the object
except from other unreachable objects.
The fact that Ruby uses garbage collection means that Ruby
programs are less susceptible to memory leaks than programs written in
languages that require objects and memory to be explicitly deallocated
and freed. But garbage collection does not mean that memory leaks are
impossible: any code that creates long-lived references to objects that
would otherwise be short-lived can be a source of memory leaks. Consider
a hash used as a cache. If the cache is not pruned using some kind of
least-recently-used algorithm,
then cached objects will remain reachable as long as the hash itself is
reachable. If the hash is referenced through a global variable, then it
will be reachable as long as the Ruby interpreter is running.
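One way to keep such a cache from growing without bound is sketched below. The BoundedCache class is a hypothetical example, not a standard library class, and it assumes that hashes preserve insertion order, as they do in Ruby 1.9 and later.

```ruby
# A small cache that evicts the least recently used entry once it holds
# more than `limit` entries.
class BoundedCache
  def initialize(limit)
    @limit = limit
    @store = {}
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key)   # remove so re-inserting moves it to the end
    else
      value = yield(key)
    end
    @store[key] = value                      # most recently used entry is last
    @store.shift if @store.size > @limit     # drop the oldest entry
    value
  end

  def keys
    @store.keys
  end
end

cache = BoundedCache.new(2)
cache.fetch(:a) { 1 }
cache.fetch(:b) { 2 }
cache.fetch(:a) { 1 }     # :a becomes the most recently used key
cache.fetch(:c) { 3 }     # evicts :b, the least recently used key
p cache.keys              # [:a, :c]
```

Once an entry is evicted, the cache no longer holds a reference to it, so the cached object becomes eligible for garbage collection if nothing else refers to it.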
3.8.3. Object Identity
Every object has an object identifier, a Fixnum, that you can obtain with the object_id
method. The value returned by this method is constant and unique for the
lifetime of the object. While the object is accessible, it will always
have the same ID, and no other object will share that ID.
The method id is a deprecated synonym for object_id. Ruby 1.8
issues a warning if you use it, and it has been removed in Ruby
1.9.
__id__ is a valid synonym for
object_id. It exists as a fallback, so you can access
an object's ID even if the object_id method has been
undefined or overridden.
The Object class implements the
hash method to simply return an object's ID.
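The following sketch compares object IDs to tell whether two references point to the same object. The helper name same_object? is invented for the example.

```ruby
# True if a and b are the very same object, judged by their IDs.
def same_object?(a, b)
  a.__id__ == b.__id__
end

a = "Ruby"
b = a          # a second reference to the same object
c = a.dup      # a distinct object with equal content

puts same_object?(a, b)          # true
puts same_object?(a, c)          # false
puts a.object_id == a.__id__     # true: the two methods are synonyms
```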
3.8.4. Object Class and Object Type
There are several ways to determine the class of an object in Ruby. The simplest is
to ask for it:
o = "test" # This is a value
o.class # Returns an object representing the String class
If you are interested in the class hierarchy of an object, you can
ask any class what its superclass is:
o.class # String: o is a String object
o.class.superclass # Object: superclass of String is Object
o.class.superclass.superclass # nil: Object has no superclass
In Ruby 1.9, Object is no longer the true root
of the class hierarchy:
# Ruby 1.9 only
Object.superclass # BasicObject: Object has a superclass in 1.9
BasicObject.superclass # nil: BasicObject has no superclass
See Section 7.3 for more on
BasicObject.
So a particularly straightforward way to check the class of an
object is by direct comparison:
o.class == String # true if o is a String
The instance_of? method does the same thing and is a little more elegant:
o.instance_of? String # true if o is a String
Usually when we test the class of an object, we would also like to
know if the object is an instance of any subclass of that class. To test
this, use the is_a? method, or its synonym kind_of?:
x = 1 # This is the value we're working with
x.instance_of? Fixnum # true: is an instance of Fixnum
x.instance_of? Numeric # false: instance_of? doesn't check inheritance
x.is_a? Fixnum # true: x is a Fixnum
x.is_a? Integer # true: x is an Integer
x.is_a? Numeric # true: x is a Numeric
x.is_a? Comparable # true: works with mixin modules, too
x.is_a? Object # true for any value of x
The Class class defines the === operator in such a way that it
can be used in place of is_a?:
Numeric === x # true: x is_a Numeric
This idiom is unique to Ruby and is probably less readable than
using the more traditional
is_a? method.
Every object has a well-defined class in Ruby, and that class
never changes during the lifetime of the object. An object's
type, on the other hand, is more fluid. The type of
an object is related to its class, but the class is only part of an
object's type. When we talk about the type of an object, we mean the set
of behaviors that characterize the object. Another way to put it is that
the type of an object is the set of methods it can respond to. (This
definition becomes recursive because it is not just the names of the
methods that matter, but also the types of arguments that those methods
can accept.)
In Ruby programming, we often don't care about the class of an
object; we just want to know whether we can invoke some method on it.
Consider, for example, the << operator. Arrays,
strings, files, and other I/O-related classes define this as an append
operator. If we are writing a method that produces textual output, we
might write it generically to use this operator. Then our method can be
invoked with any argument that implements <<.
We don't care about the class of the argument, just that we can append
to it. We can test for this with the respond_to?
method:
o.respond_to? :"<<" # true if o has an << operator
The shortcoming of this approach is that it only checks the name
of a method, not the arguments for that method. For example,
Fixnum and Bignum implement
<< as a left-shift operator and expect the
argument to be a number instead of a string. Integer objects appear to
be "appendable" when we use a respond_to? test,
but they produce an error when our code appends a string.
There is no general solution to this problem, but an ad-hoc remedy, in
this case, is to explicitly rule out Numeric objects
with the is_a? method:
o.respond_to? :"<<" and not o.is_a? Numeric
Another example of the type-versus-class distinction is
the StringIO class (from Ruby's standard
library). StringIO enables reading from and writing
to string objects as if they were IO objects.
StringIO mimics the IO
API—StringIO objects define the same methods that
IO objects do. But StringIO is not
a subclass of IO. If you write a method that expects
a stream argument, and test the class of the argument with
is_a? IO, then your method won't work with
StringIO arguments.
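A method written against the type rather than the class might look like the following sketch. The method name write_greeting is hypothetical; the test it performs is the respond_to? check described above, combined with the ad-hoc exclusion of Numeric.

```ruby
require 'stringio'

# Appends a greeting to any "appendable" destination and returns it.
def write_greeting(out)
  unless out.respond_to?(:<<) && !out.is_a?(Numeric)
    raise ArgumentError, "cannot append text to #{out.class}"
  end
  out << "hello"
  out
end

puts write_greeting(String.new)             # hello
p    write_greeting([])                     # ["hello"]
puts write_greeting(StringIO.new).string    # hello
```

Because the method never asks whether its argument is an IO, it works with StringIO objects as well as with strings and arrays.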
Focusing on types rather than classes leads to a programming style
known in Ruby as "duck typing." We'll see duck typing examples in Chapter 7.
3.8.5. Object Equality
Ruby has a surprising number of ways to compare objects for
equality, and it is important to
understand how they work, so you know when to use each method.
3.8.5.1. The equal? method
The equal? method is defined by Object to test whether two values
refer to exactly the same object. For any two distinct objects, this
method always returns false:
a = "Ruby" # One reference to one String object
b = c = "Ruby" # Two references to another String object
a.equal?(b) # false: a and b are different objects
b.equal?(c) # true: b and c refer to the same object
By convention, subclasses never override the
equal? method.
Another way to determine if two objects are, in fact, the same
object is to check their object_id:
a.object_id == b.object_id # Works like a.equal?(b)
3.8.5.2. The == operator
The == operator is the most common way to test for equality. In the
Object class, it is simply a synonym for
equal?, and it tests whether two object references
are identical. Most classes redefine this operator to allow distinct
instances to be tested for equality:
a = "Ruby" # One String object
b = "Ruby" # A different String object with the same content
a.equal?(b) # false: a and b do not refer to the same object
a == b # true: but these two distinct objects have equal values
Note that the single equals sign in this code is the assignment
operator. It takes two equals signs to test for equality in Ruby (this
is a convention that Ruby shares with many other programming
languages).
Most standard Ruby classes define the ==
operator to implement a reasonable definition of equality. This
includes the Array and Hash
classes. Two arrays are equal according to == if
they have the same number of elements, and if their corresponding
elements are all equal according to ==. Two hashes
are == if they contain the same number of key/value
pairs, and if the keys and values are themselves equal. (Values are
compared with the == operator, but hash keys are
compared with the eql? method, described later in
this chapter.)
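A few comparisons illustrate these rules. The last line shows the effect of comparing hash keys with eql? rather than ==.

```ruby
p [1, 2] == [1.0, 2.0]            # true: elements are compared with ==
p [1, 2] == [1, 2, 3]             # false: different numbers of elements
p({"k" => 1} == {"k" => 1.0})     # true: values are compared with ==
p({1 => "v"} == {1.0 => "v"})     # false: keys are compared with eql?
```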
If you are a Java programmer, you are used to using the == operator to test if two objects are the same object, and you are used to using the equals method to test whether two distinct objects have the same value. Ruby's convention is just about the opposite of Java's.
The Numeric classes perform simple type conversions in their
== operators, so that (for example) the
Fixnum 1 and the
Float 1.0 compare as equal. The
== operators of classes such as
String and Array normally
require both operands to be of the same class. If the righthand
operand defines a to_str or
to_ary conversion function (see Section 3.8.7), then these operators invoke the
== operator defined by the righthand operand, and
let that object decide whether it is equal to the lefthand string or
array. Thus, it is possible (though not common) to define classes with
string-like or array-like comparison behavior.
!= ("not-equal") is used in Ruby to test for inequality. When Ruby sees
!=, it simply uses the ==
operator and then inverts the result. This means that a class only
needs to define the == operator to define its own
notion of equality. Ruby gives you the != operator
for free. In Ruby 1.9, however, classes can explicitly define their
own != operators.
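The sketch below defines only == for a hypothetical Temperature class and relies on Ruby to supply !=.

```ruby
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # The class defines its own notion of equality...
  def ==(other)
    other.is_a?(Temperature) && degrees == other.degrees
  end
end

# ...and != follows from it without further code.
puts Temperature.new(20) != Temperature.new(25)   # true
puts Temperature.new(20) != Temperature.new(20)   # false
```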
3.8.5.3. The eql? method
The eql? method is defined by Object as a synonym for
equal?. Classes that override it typically use it
as a strict version of == that does no type
conversion. For example:
1 == 1.0 # true: Fixnum and Float objects can be ==
1.eql?(1.0) # false: but they are never eql!
The Hash class uses eql? to check whether two hash
keys are equal. If two objects are eql?, their
hash methods must also return the same value.
Typically, if you create a class and define the ==
operator, you can simply write a hash method and
define eql? to use ==.
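Here is a minimal sketch of that recipe. The Coord class is illustrative, and the sketch assumes that coordinates are always integers, so that == and eql? never disagree.

```ruby
class Coord
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(other)
    other.is_a?(Coord) && x == other.x && y == other.y
  end

  alias eql? ==        # eql? uses the same test as ==

  # Objects that are eql? must return the same hash value.
  def hash
    [x, y].hash
  end
end

visited = { Coord.new(1, 2) => true }
p visited[Coord.new(1, 2)]    # true: a distinct but eql? key finds the entry
p visited[Coord.new(2, 1)]    # nil
```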
3.8.5.4. The === operator
The === operator is commonly called the "case equality" operator and is
used to test whether the target value of a case
statement matches any of the when clauses of that
statement. (The case statement is a multiway branch
and is explained in Chapter 5.)
Object defines a default
=== operator so that it invokes the
== operator. For many classes, therefore, case
equality is the same as == equality. But certain
key classes define === differently, and in these
cases it is more of a membership or matching operator.
Range defines === to test
whether a value falls within the range. Regexp
defines === to test whether a string matches the
regular expression. And Class defines
=== to test whether an object is an instance of
that class. In Ruby 1.9, Symbol defines
=== to return true if the
righthand operand is the same symbol as the left or if it is a string
holding the same text. Examples:
(1..10) === 5 # true: 5 is in the range 1..10
/\d+/ === "123" # true: the string matches the regular expression
String === "s" # true: "s" is an instance of the class String
:s === "s" # true in Ruby 1.9
It is uncommon to see the === operator used
explicitly like this. More commonly, its use is simply implicit in a
case statement.
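The implicit use looks like the following sketch, in which each when clause is tested against the target value with ===. The method name describe is invented for the example.

```ruby
# Classifies a value using the === tests that case performs implicitly.
def describe(value)
  case value
  when Integer   then "an integer"          # Integer === value
  when 0.0..1.0  then "a fraction"          # (0.0..1.0) === value
  when /\A\d+\z/ then "a digit string"      # /\A\d+\z/ === value
  when String    then "some other string"   # String === value
  else "something else"
  end
end

puts describe(5)        # an integer
puts describe(0.5)      # a fraction
puts describe("123")    # a digit string
puts describe("abc")    # some other string
puts describe(nil)      # something else
```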
3.8.5.5. The =~ operator
The =~ operator is defined by String and
Regexp (and Symbol in Ruby 1.9)
to perform pattern matching, and it isn't really an
equality operator at all. But it does have an equals sign in it, so it
is listed here for completeness. Object defines a
no-op version of =~ that always returns
false. You can define this operator in your own
class, if that class defines some kind of pattern-matching operation
or has a notion of approximate equality, for example.
!~ is defined as the inverse of
=~. It is definable in Ruby 1.9 but not in Ruby
1.8.
3.8.6. Object Order
Practically every class can define a useful
== method for testing its instances for equality.
Some classes can also define an ordering. That is: for any two instances
of such a class, the two instances must be equal, or one instance must
be "less than" the other. Numbers are the most obvious classes for which
such an ordering is defined. Strings are also ordered, according to the
numeric ordering of the character codes that comprise the strings. (With
ASCII text, this is a rough kind of case-sensitive alphabetical
order.) If a class defines an ordering, then instances of the class can
be compared and sorted.
In Ruby, classes define an ordering by implementing the <=> operator. This operator
should return -1 if its left operand is less than its
right operand, 0 if the two operands are equal, and
1 if the left operand is greater than the right
operand. If the two operands cannot be meaningfully compared (if the
right operand is of a different class, for example), then the operator should return
nil:
1 <=> 5 # -1
5 <=> 5 # 0
9 <=> 5 # 1
"1" <=> 5 # nil: integers and strings are not comparable
The <=> operator is all that is needed to
compare values. But it isn't particularly intuitive. So classes that
define this operator typically also include the
Comparable module as a mixin. (Modules and mixins are covered in Section 7.5.2.) The Comparable mixin defines
the following operators in terms of <=>:
< | Less than |
<= | Less than or equal |
== | Equal |
>= | Greater than or equal |
> | Greater than |
Comparable does not define the
!= operator; Ruby automatically defines that operator as the negation
of the == operator. In addition to these comparison
operators, Comparable also
defines a useful comparison method named
between?:
1.between?(0,10) # true: 0 <= 1 <= 10
If the <=> operator returns
nil, all the comparison operators derived from it return
false. The special Float value
NaN is an example:
nan = 0.0/0.0 # zero divided by zero is not-a-number
nan < 0 # false: it is not less than zero
nan > 0 # false: it is not greater than zero
nan == 0 # false: it is not equal to zero
nan == nan # false: it is not even equal to itself!
nan.equal?(nan) # this is true, of course
Note that defining <=> and including the
Comparable module defines a ==
operator for your class. Some classes define their own
== operator, typically when they can implement this
more efficiently than an equality test based on
<=>. It is possible to define classes that
implement different notions of equality in their ==
and <=> operators. A class might do
case-sensitive string comparisons for the ==
operator, for example, but then do case-insensitive comparisons for
<=>, so that instances of the class would sort
more naturally. In general, though, it is best if
<=> returns 0 if and only if
== returns true.
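The usual pattern is sketched below with a hypothetical Version class that defines <=> and includes Comparable.

```ruby
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major, @minor = major, minor
  end

  # Order by major number, then by minor number.
  # Returns nil when the other operand is not a Version.
  def <=>(other)
    return nil unless other.is_a?(Version)
    [major, minor] <=> [other.major, other.minor]
  end
end

a = Version.new(1, 9)
b = Version.new(1, 8)
puts a > b                              # true
puts a == Version.new(1, 9)             # true: == comes from Comparable
puts a.between?(b, Version.new(2, 0))   # true
p [a, b].sort.map { |v| v.minor }       # [8, 9]
```

Here == and <=> agree: two Version objects are == exactly when <=> returns 0.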
3.8.7. Object Conversion
Many Ruby classes define methods that return a representation
of the object as a value of a different class. The
to_s method, for obtaining a String representation
of an object, is probably the most commonly implemented and best known
of these methods. The subsections that follow describe various
categories of conversions.
3.8.7.1. Explicit conversions
Classes define explicit conversion methods for use by application code
that needs to convert a value to another representation. The most
common methods in this category are to_s,
to_i, to_f, and
to_a to convert to String,
Integer, Float, and
Array, respectively.
Built-in methods do not typically invoke these methods for you.
If you invoke a method that expects a String and
pass an object of some other kind, that method is not expected to
convert the argument with to_s. (Values
interpolated into double-quoted strings, however, are automatically
converted with to_s.)
to_s is easily the most important of the
conversion methods because string representations of objects are so
commonly used in user interfaces. An important alternative to
to_s is the inspect method.
to_s is generally intended to return a
human-readable representation of the object, suitable for end users.
inspect, on the other hand, is intended for
debugging use, and should return a representation that is helpful to
Ruby developers. The default inspect method,
inherited from Object, simply calls
to_s.
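A class that wants different representations for end users and for developers can define both methods, as in this sketch. The Money class is hypothetical.

```ruby
class Money
  def initialize(cents)
    @cents = cents
  end

  # Human-readable form, suitable for end users.
  def to_s
    format("$%.2f", @cents / 100.0)
  end

  # Developer-oriented form, suitable for debugging.
  def inspect
    "#<Money cents=#{@cents}>"
  end
end

m = Money.new(1250)
puts "Total: #{m}"    # Total: $12.50 (interpolation calls to_s)
p m                   # #<Money cents=1250> (p calls inspect)
```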
3.8.7.2. Implicit conversions
Sometimes a class has strong characteristics of some other class.
The Ruby Exception class represents an error or
unexpected condition in a program and encapsulates an error message.
In Ruby 1.8, Exception objects are not
merely convertible to strings; they are string-like objects and can be
treated as if they were strings in many contexts. For example:
# Ruby 1.8 only
e = Exception.new("not really an exception")
msg = "Error: " + e # String concatenation with an Exception
Because Exception objects are string-like, they can be used with the string
concatenation operator. This does not work with most other Ruby
classes. The reason that Exception objects can behave like
String objects is that, in Ruby 1.8,
Exception implements the implicit conversion method
to_str, and the + operator
defined by String invokes this method on its
righthand operand.
Other implicit conversion methods are to_int
for objects that want to be integer-like, to_ary for objects that want to be
array-like, and to_hash for objects that want to be
hash-like. Unfortunately, the circumstances under which these implicit
conversion methods are called are not well documented. Among the
built-in classes, these implicit conversion methods are not commonly
implemented, either.
We noted earlier in passing that the ==
operator can perform a weak kind of type conversion when testing for
equality. The == operators defined by
String, Array, and
Hash check to see if the righthand operand is of
the same class as the lefthand operand. If so, they compare them. If
not, they check to see if the righthand operand has a
to_str, to_ary, or
to_hash method. They don't invoke this method, but
if it exists, they invoke the == method of the
righthand operand and allow it to decide whether it is equal to the
lefthand operand.
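Both behaviors can be sketched with a hypothetical string-like class, CaselessName, that defines to_str and its own == operator. The sketch assumes that String's + and == operators treat any object with a to_str method as described above.

```ruby
class CaselessName
  def initialize(text)
    @text = text
  end

  # Implicit conversion: marks instances as string-like.
  def to_str
    @text
  end

  # Case-insensitive comparison with any string-like object.
  def ==(other)
    other.respond_to?(:to_str) && @text.downcase == other.to_str.downcase
  end
end

name = CaselessName.new("RUBY")
puts "Hello, " + name    # Hello, RUBY: String's + calls to_str
puts "ruby" == name      # true: String's == defers to CaselessName's ==
puts "java" == name      # false
```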
In Ruby 1.9, the built-in classes String,
Array, Hash,
Regexp, and IO all define a
class method named try_convert. These methods
convert their argument if it defines an appropriate implicit
conversion method, or they return nil otherwise.
Array.try_convert(o) returns
o.to_ary if o defines that
method; otherwise, it returns nil. These try_convert methods are convenient if you want to
write methods that allow implicit conversions on their
arguments.
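For example, a method that accepts any array-like argument might be sketched as follows. The Pair class and the sum_all method are invented for the illustration.

```ruby
# A class that is array-like because it defines to_ary.
class Pair
  def initialize(a, b)
    @a, @b = a, b
  end

  def to_ary
    [@a, @b]
  end
end

# Sums the elements of any array-like argument.
def sum_all(values)
  array = Array.try_convert(values)
  raise TypeError, "expected an array-like argument" if array.nil?
  array.inject(0) { |total, v| total + v }
end

puts sum_all([1, 2, 3])            # 6
puts sum_all(Pair.new(4, 5))       # 9
p Array.try_convert("text")        # nil: String does not define to_ary
```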
3.8.7.3. Conversion functions
The Kernel module defines four conversion methods that behave as global
conversion functions. These functions—Array,
Float, Integer, and
String—have the same names as the classes that they
convert to, and they are unusual in that they begin with a capital
letter.
The Array function attempts to convert its argument to an array by calling
to_ary. If that method is not defined or returns
nil, it tries the to_a method.
If to_a is not defined or returns
nil, the Array function simply
returns a new array containing the argument as its single
element.
The Float function converts Numeric arguments to
Float objects directly. For any
non-Numeric value, it calls the
to_f method.
The Integer function converts its argument to a Fixnum or
Bignum. If the argument is a
Numeric value, it is converted directly.
Floating-point values are truncated rather than rounded. If the
argument is a string, it looks for a radix indicator (a leading
0 for octal, 0x for hexadecimal,
or 0b for binary) and converts the string
accordingly. Unlike the to_i method of String, it does not allow
nonnumeric trailing characters. For any other kind of argument, the
Integer function first attempts conversion with
to_int and then with
to_i.
Finally, the String function converts its
argument to a string simply by calling its to_s
method.
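The following calls illustrate the four functions. The results shown in the comments are what a current Ruby interpreter produces.

```ruby
p Array(nil)          # []
p Array([1, 2])       # [1, 2]
p Array(1..3)         # [1, 2, 3]
p Array("a")          # ["a"]: wrapped in a new array

p Float("1.5")        # 1.5
p Integer(3.99)       # 3: truncated, not rounded
p Integer("0x1A")     # 26: hexadecimal
p Integer("0b101")    # 5: binary
p Integer("017")      # 15: octal
p String(42)          # "42"

p "12abc".to_i        # 12: to_i ignores trailing characters
begin
  Integer("12abc")    # the Integer function does not
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```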
3.8.7.4. Arithmetic operator type coercions
Numeric types define a conversion method named coerce. The intent of this
method is to convert the argument to the same type as the numeric
object on which the method is invoked, or to convert both objects to
some more general compatible type. The coerce
method always returns an array that holds two numeric values of the
same type. The first element of the array is the converted value of
the argument to coerce. The second element of the
returned array is the value (converted, if necessary) on which
coerce was invoked:
1.1.coerce(1) # [1.0, 1.1]: coerce Fixnum to Float
require "rational" # Use Rational numbers
r = Rational(1,3) # One third as a Rational number
r.coerce(2) # [Rational(2,1), Rational(1,3)]: Fixnum to Rational
The coerce method is used by the arithmetic
operators. The + operator defined by Fixnum doesn't know about
Rational numbers, for example, and if its righthand
operand is a Rational value, it doesn't know how to
add it. coerce provides the solution. Numeric
operators are written so that if they don't know the type of the
righthand operand, they invoke the coerce method of
the righthand operand, passing the lefthand operand as an argument.
Returning to our example of adding a Fixnum and a
Rational, the coerce method of
Rational returns an array of two
Rational values. Now the +
operator defined by Fixnum can simply invoke
+ on the values in the array.
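The same protocol lets user-defined classes take part in arithmetic with the built-in numeric types. In this sketch, the hypothetical Meters class defines coerce so that an integer or a float can appear on the left of the + operator.

```ruby
class Meters
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Adds another Meters object or a plain number.
  def +(other)
    other_value = other.is_a?(Meters) ? other.value : other
    Meters.new(@value + other_value)
  end

  # Called by the numeric operators when a Meters object is the
  # righthand operand. Returns two values of the same type:
  # the converted argument first, then the receiver.
  def coerce(numeric)
    [Meters.new(numeric), self]
  end
end

sum = 2 + Meters.new(3)   # the integer's + calls Meters#coerce(2), then
                          # adds the two Meters objects in the result
puts sum.value            # 5
puts (1.5 + Meters.new(3)).value   # 4.5
```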
3.8.7.5. Boolean type conversions
Boolean values deserve a special mention in the context of type conversion. Ruby is very strict with its
Boolean values: true and false
have to_s methods, which return "true" and "false", but they define no other
conversion methods. And there is no to_b method to
convert other values to Booleans.
In some languages, false is the same thing as
0, or can be converted to and from
0. In Ruby, the values true and
false are their own distinct objects, and there are
no implicit conversions that convert other values to
true or false. This is only half
the story, however. Ruby's Boolean operators and its conditional and
looping constructs that use Boolean expressions can work with values
other than true and false. The
rule is simple: in Boolean expressions, any value other than
false or nil behaves like (but
is not converted to)
true. nil, on the other hand,
behaves like false.
Suppose you want to test whether the variable
x is nil or not. In some
languages, you must explicitly write a comparison expression that
evaluates to true or
false:
if x != nil # Expression "x != nil" returns true or false to the if
  puts x # Print x if it is not nil
end
This code works in Ruby, but it is more common simply to take
advantage of the fact that all values other than
nil and false behave like
true:
if x # If x is non-nil
  puts x # Then print it
end
It is important to remember that values like
0, 0.0, and the empty string
"" behave like true in Ruby,
which is surprising if you are used to languages like C or
JavaScript.
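A short sketch makes the rule concrete. The helper name truthy? is invented here; as noted above, Ruby itself defines no such conversion method.

```ruby
# Reports how a value behaves in a Boolean context.
def truthy?(value)
  value ? true : false
end

[0, 0.0, "", [], nil, false].each do |v|
  puts "#{v.inspect} behaves like #{truthy?(v)}"
end
# Only nil and false behave like false; 0, 0.0, "", and [] behave like true.
```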
3.8.8. Copying Objects
The Object class defines two closely
related methods for copying objects. Both clone and
dup return a shallow copy of the object on which they
are invoked. If the copied object includes internal state that
refers to other objects, only the object references are copied, not the
referenced objects themselves.
If the object being copied defines an
initialize_copy method, then clone and dup
simply allocate a new, empty instance of the class and invoke the
initialize_copy method on this empty instance. The
object to be copied is passed as an argument, and this "copy
constructor" can initialize the copy however it desires. For example,
the initialize_copy method could
recursively copy the internal data of an object so that the resulting
object is not a simple shallow copy of the original.
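A minimal sketch of such a copy constructor follows. The Playlist class is hypothetical; its initialize_copy method duplicates each string it holds, so that the copy does not share them with the original.

```ruby
class Playlist
  attr_reader :songs

  def initialize(songs)
    @songs = songs
  end

  # Invoked on the new object by both dup and clone;
  # the object being copied is passed as the argument.
  def initialize_copy(original)
    @songs = original.songs.map { |song| song.dup }
  end
end

list = Playlist.new([String.new("Intro")])
copy = list.dup
copy.songs[0] << " (live)"    # modify a string owned by the copy
puts list.songs[0]            # Intro: the original is unaffected
puts copy.songs[0]            # Intro (live)
```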
Classes can also override the clone and
dup methods directly to produce any kind of copy they
desire.
There are two important differences between the
clone and dup methods defined by
Object. First, clone copies both
the frozen and tainted state (defined shortly) of an object, whereas
dup only copies the tainted state; calling
dup on a frozen object returns an unfrozen copy.
Second, clone copies any singleton methods of the
object, whereas dup does not.
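Two of these differences, the frozen state and singleton methods, can be observed directly, as in this sketch:

```ruby
s = String.new("ice")
def s.shout            # a singleton method defined on this one object
  upcase + "!"
end
s.freeze

c = s.clone
d = s.dup

puts c.frozen?               # true: clone preserves the frozen state
puts d.frozen?               # false: dup returns an unfrozen copy
puts c.respond_to?(:shout)   # true: clone copies singleton methods
puts d.respond_to?(:shout)   # false: dup does not
```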
3.8.9. Marshaling Objects
You can save the state of an object by passing it to the class method
Marshal.dump. If you pass an I/O stream object as the second argument,
Marshal.dump writes the state of the object (and,
recursively, any objects it references) to that stream. Otherwise, it
simply returns the encoded state as a binary string.
To restore a marshaled object, pass a string or an I/O stream
containing the object to Marshal.load.
Marshaling an object is a very simple way to save its state for
later use, and these methods can be used to provide an automatic file
format for Ruby programs. Note, however, that the binary format used by
Marshal.dump and Marshal.load is
version-dependent, and newer versions of Ruby are not guaranteed to be
able to read marshaled objects written by older versions of Ruby.
Another use for Marshal.dump and
Marshal.load is to create deep copies of objects:
def deepcopy(o)
  Marshal.load(Marshal.dump(o))
end
Note that files and I/O streams, as well as
Method and Binding objects, are
too dynamic to be marshaled; there would be no reliable way to restore
their state.
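The sketch below contrasts a shallow copy made with dup against a deep copy made by marshaling, and shows what happens when an object cannot be marshaled. It assumes that Marshal.dump raises TypeError for a Method object, as it does in current Ruby versions.

```ruby
def deepcopy(o)
  Marshal.load(Marshal.dump(o))
end

original = { "langs" => [String.new("Ruby")] }
shallow  = original.dup
deep     = deepcopy(original)

original["langs"][0] << "!"   # modify a nested object in place
puts shallow["langs"][0]      # Ruby!: the shallow copy shares nested objects
puts deep["langs"][0]         # Ruby: the deep copy has its own

begin
  Marshal.dump(1.method(:+))  # Method objects cannot be marshaled
rescue TypeError => e
  puts "cannot marshal: #{e.message}"
end
```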
YAML ("YAML Ain't Markup Language") is a commonly used alternative to the Marshal module that dumps objects to (and
loads objects from) a human-readable text format. It is in the standard
library, and you must require 'yaml' to use
it.
3.8.10. Freezing Objects
Any object may be frozen by calling its freeze method. A frozen object
becomes immutable—none of its
internal state may be changed, and an attempt to call any of its mutator
methods fails:
s = "ice" # Strings are mutable objects
s.freeze # Make this string immutable
s.frozen? # true: it has been frozen
s.upcase! # TypeError: can't modify frozen string
s[0] = "ni" # TypeError: can't modify frozen string
Freezing a class object prevents the addition of any methods to
that class.
You can check whether an object is frozen with the
frozen? method. Once frozen, there is no way
to "thaw" an object. If you copy a frozen object with
clone, the copy will also be frozen. If you copy a
frozen object with dup, however, the copy will not be
frozen.
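The following sketch exercises these rules. The helper rejects_modification? is invented for the example, and it rescues StandardError because the class of the exception raised has varied between Ruby versions.

```ruby
# Returns true if the block raises an error, false if it completes.
def rejects_modification?
  yield
  false
rescue StandardError   # the specific exception class depends on the Ruby version
  true
end

s = String.new("ice").freeze
puts s.frozen?                               # true
puts rejects_modification? { s.upcase! }     # true
puts rejects_modification? { s[0] = "ni" }   # true
puts s                                       # ice: the string is unchanged

t = s.dup         # dup returns an unfrozen copy
t[0] = "ni"
puts t            # nice
```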
3.8.11. Tainting Objects
Web applications must often keep track of data derived from untrusted user
input to avoid SQL injection attacks and similar security risks. Ruby
provides a simple solution to this problem: any object may be marked as
tainted by calling its taint method. Once an object
is tainted, any objects derived from it will also be tainted. The taint
of an object can be tested with the tainted? method:
s = "untrusted" # Objects are normally untainted
s.taint # Mark this untrusted object as tainted
s.tainted? # true: it is tainted
s.upcase.tainted? # true: derived objects are tainted
s[3,4].tainted? # true: substrings are tainted
User input, such as command-line arguments, environment variables,
and strings read with gets, is automatically
tainted.
Copies of tainted objects made with clone and
dup remain tainted. A tainted object may be untainted
with the untaint method. You should only do this, of
course, if you have examined the object and are convinced that it
presents no security risks.
The object tainting mechanism of Ruby is most powerful when used
with the global variable $SAFE. When this variable is
set to a value greater than zero, Ruby restricts various built-in
methods so that they will not work with tainted data. See Chapter 10 for further details on the $SAFE
variable.