Programming Ruby

The Pragmatic Programmer's Guide


Ruby.new



When we originally wrote this book, we had a grand plan (we were younger then). We wanted to document the language from the top down, starting with classes and objects, and ending with the nitty-gritty syntax details. It seemed like a good idea at the time. After all, most everything in Ruby is an object, so it made sense to talk about objects first.

Or so we thought.

Unfortunately, it turns out to be difficult to describe a language that way. If you haven't covered strings, if statements, assignments, and other details, it's difficult to write examples of classes. Throughout our top-down description, we kept coming across low-level details we needed to cover so that the example code would make sense.

So, we came up with another grand plan (they don't call us pragmatic for nothing). We'd still describe Ruby starting at the top. But before we did that, we'd add a short chapter that described all the common language features used in the examples along with the special vocabulary used in Ruby, a kind of minitutorial to bootstrap us into the rest of the book.

Ruby Is an Object-Oriented Language

Let's say it again. Ruby is a genuine object-oriented language. Everything you manipulate is an object, and the results of those manipulations are themselves objects. However, many languages make the same claim, and they often have a different interpretation of what object-oriented means and a different terminology for the concepts they employ.

So, before we get too far into the details, let's briefly look at the terms and notation that we'll be using.

When you write object-oriented code, you're normally looking to model concepts from the real world in your code. Typically during this modeling process you'll discover categories of things that need to be represented in code. In a jukebox, the concept of a ``song'' might be such a category. In Ruby, you'd define a class to represent each of these entities. A class is a combination of state (for example, the name of the song) and methods that use that state (perhaps a method to play the song).

Once you have these classes, you'll typically want to create a number of instances of each. For the jukebox system containing a class called Song, you'd have separate instances for popular hits such as ``Ruby Tuesday,'' ``Enveloped in Python,'' ``String of Pearls,'' ``Small talk,'' and so on. The word object is used interchangeably with class instance (and being lazy typists, we'll probably be using the word ``object'' more frequently).

In Ruby, these objects are created by calling a constructor, a special method associated with a class. The standard constructor is called new.

song1 = Song.new("Ruby Tuesday")
song2 = Song.new("Enveloped in Python")
# and so on

These instances are both derived from the same class, but they have unique characteristics. First, every object has a unique object identifier (abbreviated as object id). Second, you can define instance variables, variables with values that are unique to each instance. These instance variables hold an object's state. Each of our songs, for example, will probably have an instance variable that holds the song title.

Within each class, you can define instance methods. Each method is a chunk of functionality which may be called from within the class and (depending on accessibility constraints) from outside. These instance methods in turn have access to the object's instance variables, and hence to the object's state.
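To make this vocabulary concrete, here is a minimal sketch of what a Song class might look like. The class body is an assumption made for illustration (so far the text has only shown calls to Song.new), and the instance variable @title and the method title are hypothetical names.

```ruby
# A hypothetical Song class; only Song.new appears in the text above.
class Song
  def initialize(title)   # run by Song.new to set up the new object
    @title = title        # instance variable: state unique to this object
  end

  def title               # instance method: reads the object's state
    @title
  end
end

song1 = Song.new("Ruby Tuesday")
song2 = Song.new("Enveloped in Python")

puts song1.title                          # prints Ruby Tuesday
puts song2.title                          # prints Enveloped in Python
puts song1.object_id == song2.object_id   # prints false
```

Both objects come from the same class and share its methods, but each carries its own @title and its own object id.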

Methods are invoked by sending a message to an object. The message contains the method's name, along with any parameters the method may need.[This idea of expressing method calls in the form of messages comes from Smalltalk.] When an object receives a message, it looks into its own class for a corresponding method. If found, that method is executed. If the method isn't found, ... well, we'll get to that later.

This business of methods and messages may sound complicated, but in practice it is very natural. Let's look at some method calls. (Remember that the arrows in the code examples show the values returned by the corresponding expressions.)

"gin joint".length » 9
"Rick".index("c") » 2
-1942.abs » 1942
sam.play(aSong) » "duh dum, da dum de dum ..."

Here, the thing before the period is called the receiver, and the name after the period is the method to be invoked. The first example asks a string for its length, and the second asks a different string to find the index of the letter ``c.'' The third line has a number calculate its absolute value. Finally, we ask Sam to play us a song.
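If the message metaphor seems abstract, Ruby's built-in send method lets you spell the message out explicitly. The sketch below reuses the literals from the examples above; it is meant to show that a method call and a message send are the same thing, not to suggest that send is the usual way to call a method.

```ruby
# Ordinary method-call syntax: receiver, period, method name.
puts "gin joint".length            # prints 9
puts "Rick".index("c")             # prints 2

# The same calls written as explicit messages using send.
puts "gin joint".send(:length)     # prints 9
puts "Rick".send(:index, "c")      # prints 2
puts(-1942.send(:abs))             # prints 1942
```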

It's worth noting here a major difference between Ruby and most other languages. In (say) Java, you'd find the absolute value of some number by calling a separate function and passing in that number. You might write

number = Math.abs(number)     // Java code

In Ruby, the ability to determine an absolute value is built into numbers---they take care of the details internally. You simply send the message abs to a number object and let it do the work.

number = number.abs

The same applies to all Ruby objects: in C you'd write strlen(name), while in Ruby it's name.length, and so on. This is part of what we mean when we say that Ruby is a genuine OO language.

Some Basic Ruby

Not many people like to read heaps of boring syntax rules when they're picking up a new language. So we're going to cheat. In this section we'll hit some of the highlights, the stuff you'll just have to know if you're going to write Ruby programs. Later, in Chapter 18, which begins on page 199, we'll go into all the gory details.

Let's start off with a simple Ruby program. We'll write a method that returns a string, adding to that string a person's name. We'll then invoke that method a couple of times.

def sayGoodnight(name)
  result = "Goodnight, " + name
  return result
end

# Time for bed...
puts sayGoodnight("John-Boy")
puts sayGoodnight("Mary-Ellen")

First, some general observations. Ruby syntax is clean. You don't need semicolons at the ends of statements as long as you put each statement on a separate line. Ruby comments start with a # character and run to the end of the line. Code layout is pretty much up to you; indentation is not significant.

Methods are defined with the keyword def, followed by the method name (in this case, ``sayGoodnight'') and the method's parameters between parentheses. Ruby doesn't use braces to delimit the bodies of compound statements and definitions. Instead, you simply finish the body with the keyword end. Our method's body is pretty simple. The first line concatenates the literal string ``Goodnight, '' (note the trailing space) to the parameter name and assigns the result to the local variable result. The next line returns that result to the caller. Note that we didn't have to declare the variable result; it sprang into existence when we assigned to it.

Having defined the method, we call it twice. In both cases we pass the result to the method puts, which simply outputs its argument followed by a newline.

Goodnight, John-Boy
Goodnight, Mary-Ellen

The line ``puts sayGoodnight("John-Boy")'' contains two method calls, one to sayGoodnight and the other to puts. Why does one call have its arguments in parentheses while the other doesn't? In this case it's purely a matter of taste. The following lines are all equivalent.

puts sayGoodnight "John-Boy"
puts sayGoodnight("John-Boy")
puts(sayGoodnight "John-Boy")
puts(sayGoodnight("John-Boy"))

However, life isn't always that simple, and precedence rules can make it difficult to know which argument goes with which method invocation, so we recommend using parentheses in all but the simplest cases.
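As a small illustration of how omitting parentheses can change the meaning of a call, consider the sketch below. The method double is a hypothetical helper introduced only for this purpose.

```ruby
# A hypothetical one-argument method used only for this illustration.
def double(n)
  n * 2
end

without_parens = double 3 + 1     # parsed as double(3 + 1)
with_parens    = double(3) + 1    # the argument list ends at the parenthesis

puts without_parens   # prints 8
puts with_parens      # prints 7
```

The sayGoodnight calls shown earlier avoid this trap because each one passes a single string literal.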

This example also shows some Ruby string objects. There are many ways to create a string object, but probably the most common is to use string literals: sequences of characters between single or double quotation marks. The difference between the two forms is the amount of processing Ruby does on the string while constructing the literal. In the single-quoted case, Ruby does very little. With a few exceptions, what you type into the string literal becomes the string's value.

In the double-quoted case, Ruby does more work. First, it looks for substitutions---sequences that start with a backslash character---and replaces them with some binary value. The most common of these is ``\n'', which is replaced with a newline character. When a string containing a newline is output, the ``\n'' forces a line break.

puts "And Goodnight,\nGrandma"
produces:
And Goodnight,
Grandma
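To see the difference in processing between the two kinds of literal, you can compare the same characters typed between single and double quotes. This is a minimal sketch; the variable names are ours.

```ruby
single_quoted = 'And Goodnight,\nGrandma'   # backslash and n stay as two characters
double_quoted = "And Goodnight,\nGrandma"   # \n is replaced by one newline character

puts single_quoted.length   # prints 23
puts double_quoted.length   # prints 22
puts single_quoted          # prints And Goodnight,\nGrandma on a single line
```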

The second thing that Ruby does with double-quoted strings is expression interpolation. Within the string, the sequence #{ expression } is replaced by the value of expression. We could use this to rewrite our previous method.

def sayGoodnight(name)
  result = "Goodnight, #{name}"
  return result
end

When Ruby constructs this string object, it looks at the current value of name and substitutes it into the string. Arbitrarily complex expressions are allowed in the #{...} construct. As a shortcut, you don't need to supply the braces when the expression is simply a global, instance, or class variable. For more information on strings, as well as on the other Ruby standard types, see Chapter 5, which begins on page 47.
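The following sketch exercises the interpolation rules just described: a full expression inside #{...}, the brace-free shortcut for a global variable, and a single-quoted string for contrast. The global $greeting is a hypothetical name chosen for the illustration.

```ruby
$greeting = "Goodnight"     # hypothetical global, used to show the shortcut
name = "John-Boy"

puts "#{$greeting}, #{name}"                   # prints Goodnight, John-Boy
puts "#$greeting, #{name.upcase}!"             # prints Goodnight, JOHN-BOY!
puts "#{name} has #{name.length} characters"   # prints John-Boy has 8 characters
puts '#{name} is not interpolated'             # prints #{name} is not interpolated
```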

Finally, we could simplify this method some more. The value returned by a Ruby method is the value of the last expression evaluated, so we can get rid of the return statement altogether.

def sayGoodnight(name)
  "Goodnight, #{name}"
end

We promised that this section would be brief. We've got just one more topic to cover: Ruby names. For brevity, we'll be using some terms (such as class variable) that we aren't going to define here. However, by talking about the rules now, you'll be ahead of the game when we actually come to discuss instance variables and the like later.

Ruby uses a convention to help it distinguish the usage of a name: the first characters of a name indicate how the name is used. Local variables, method parameters, and method names should all start with a lowercase letter or with an underscore. Global variables are prefixed with a dollar sign ($), while instance variables begin with an ``at'' sign (@). Class variables start with two ``at'' signs (@@). Finally, class names, module names, and constants should start with an uppercase letter. Samples of different names are given in Table 2.1 on page 10.

Following this initial character, a name can be any combination of letters, digits, and underscores (with the proviso that the character following an @ sign may not be a digit).

Example variable and class names

Variables                                            Constants and
Local          Global      Instance    Class         Class Names
name           $debug      @name       @@total       PI
fishAndChips   $CUSTOMER   @point_1    @@symtab      FeetPerMile
x_axis         $_          @X          @@N           String
thx1138        $plan9      @_          @@x_pos       MyClass
_26            $Global     @plan9      @@SINGLE      Jazz_Song
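Here is a short sketch that puts each kind of name to work in one place. The Jukebox class and everything inside it are hypothetical; the point is only to show where each prefix appears.

```ruby
$debug = false                 # global variable: leading $

class Jukebox                  # class name: leading uppercase letter
  MAX_SONGS = 100              # constant: leading uppercase letter
  @@total = 0                  # class variable: leading @@

  def initialize(name)         # method and parameter names: lowercase
    @name = name               # instance variable: leading @
    @@total += 1
  end

  def describe
    label = "#{@name} holds up to #{MAX_SONGS} songs"   # local variable
    puts "describe called" if $debug
    label
  end

  def self.total
    @@total
  end
end

box = Jukebox.new("Main bar")
puts box.describe     # prints Main bar holds up to 100 songs
puts Jukebox.total    # prints 1
```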

Arrays and Hashes

Ruby's arrays and hashes are indexed collections. Both store collections of objects, accessible using a key. With arrays, the key is an integer, whereas hashes support any object as a key. Both arrays and hashes grow as needed to hold new elements. It's more efficient to access array elements, but hashes provide more flexibility. Any particular array or hash can hold objects of differing types; you can have an array containing an integer, a string, and a floating point number, as we'll see in a minute.

You can create and initialize a new array using an array literal---a set of elements between square brackets. Given an array object, you can access individual elements by supplying an index between square brackets, as the next example shows.

a = [ 1, 'cat', 3.14 ]   # array with three elements
# access the first element
a[0] » 1
# set the third element
a[2] = nil
# dump out the array
a » [1, "cat", nil]

You can create empty arrays either by using an array literal with no elements or by using the array object's constructor, Array.new.

empty1 = []
empty2 = Array.new

Sometimes creating arrays of words can be a pain, what with all the quotes and commas. Fortunately, there's a shortcut: %w does just what we want.

a = %w{ ant bee cat dog elk }
a[0] » "ant"
a[3] » "dog"
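We said earlier that arrays grow as needed. A minimal sketch of what that looks like in practice, continuing with the same list of words:

```ruby
a = %w{ ant bee cat dog elk }

a << "fox"          # append: the array grows by one element
a[7] = "hen"        # assign past the end: the gap is filled with nil

puts a.length       # prints 8
p a[5]              # prints "fox"
p a[6]              # prints nil
p a[7]              # prints "hen"
```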

Ruby hashes are similar to arrays. A hash literal uses braces rather than square brackets. The literal must supply two objects for every entry: one for the key, the other for the value.

For example, you might want to map musical instruments to their orchestral sections. You could do this with a hash.

instSection = {
  'cello'     => 'string',
  'clarinet'  => 'woodwind',
  'drum'      => 'percussion',
  'oboe'      => 'woodwind',
  'trumpet'   => 'brass',
  'violin'    => 'string'
}

Hashes are indexed using the same square bracket notation as arrays.

instSection['oboe'] » "woodwind"
instSection['cello'] » "string"
instSection['bassoon'] » nil

As the last example shows, a hash by default returns nil when indexed by a key it doesn't contain. Normally this is convenient, as nil means false when used in conditional expressions. Sometimes you'll want to change this default. For example, if you're using a hash to count the number of times each key occurs, it's convenient to have the default value be zero. This is easily done by specifying a default value when you create a new, empty hash.

histogram = Hash.new(0)
histogram['key1'] » 0
histogram['key1'] = histogram['key1'] + 1
histogram['key1'] » 1
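The counting idea can be sketched as a small method. The name word_counts and the sample text are our own inventions; the block syntax used with each is described later in this chapter.

```ruby
# Hypothetical helper: counts how often each word occurs in a string.
def word_counts(text)
  histogram = Hash.new(0)          # missing keys read as 0, not nil
  text.split.each do |word|
    histogram[word] = histogram[word] + 1
  end
  histogram
end

counts = word_counts("ant bee ant cat ant bee")
puts counts['ant']    # prints 3
puts counts['bee']    # prints 2
puts counts['dog']    # prints 0 (never seen, so the default applies)
```

Without the default of zero, the first increment for each word would fail, because nil has no + method.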

Array and hash objects have lots of useful methods: see the discussion starting on page 33, and the reference sections starting on pages 278 and 317, for details.

Control Structures

Ruby has all the usual control structures, such as if statements and while loops. Java, C, and Perl programmers may well get caught by the lack of braces around the bodies of these statements. Instead, Ruby uses the keyword end to signify the end of a body.

if count > 10
  puts "Try again"
elsif tries == 3
  puts "You lose"
else
  puts "Enter a number"
end

Similarly, while statements are terminated with end.

while weight < 100 and numPallets <= 30
  pallet = nextPallet()
  weight += pallet.weight
  numPallets += 1
end

Ruby statement modifiers are a useful shortcut if the body of an if or while statement is just a single expression. Simply write the expression, followed by if or while and the condition. For example, here's a simple if statement.

if radiation > 3000
  puts "Danger, Will Robinson"
end

Here it is again, rewritten using a statement modifier.

puts "Danger, Will Robinson" if radiation > 3000

Similarly, a while loop such as

while square < 1000
  square = square*square
end

becomes the more concise

square = square*square  while square < 1000

These statement modifiers should seem familiar to Perl programmers.
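The fragments above assume that radiation and square already have values. Here is a self-contained sketch with sample starting values (chosen by us) so that you can run both modifier forms.

```ruby
radiation = 3500                 # sample values, chosen for illustration
square    = 2

puts "Danger, Will Robinson" if radiation > 3000   # condition is true, so this prints

# Squares repeatedly: 2, 4, 16, 256, 65536. The loop stops once the
# value is no longer less than 1000.
square = square*square while square < 1000
puts square                      # prints 65536
```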

Regular Expressions

Most of Ruby's built-in types will be familiar to all programmers. A majority of languages have strings, integers, floats, arrays, and so on. However, until Ruby came along, regular expression support was generally built into only the so-called scripting languages, such as Perl, Python, and awk. This is a shame: regular expressions, although cryptic, are a powerful tool for working with text.

Entire books have been written about regular expressions (for example, Mastering Regular Expressions), so we won't try to cover everything in just a short section. Instead, we'll look at just a few examples of regular expressions in action. You'll find full coverage of regular expressions starting on page 56.

A regular expression is simply a way of specifying a pattern of characters to be matched in a string. In Ruby, you typically create a regular expression by writing a pattern between slash characters (/pattern/). And, Ruby being Ruby, regular expressions are of course objects and can be manipulated as such.

For example, you could write a pattern that matches a string containing the text ``Perl'' or the text ``Python'' using the following regular expression.

/Perl|Python/

The forward slashes delimit the pattern, which consists of the two things we're matching, separated by a pipe character (``|''). You can use parentheses within patterns, just as you can in arithmetic expressions, so you could also have written this pattern as

/P(erl|ython)/

You can also specify repetition within patterns. /ab+c/ matches a string containing an ``a'' followed by one or more ``b''s, followed by a ``c''. Change the plus to an asterisk, and /ab*c/ creates a regular expression that matches an ``a'', zero or more ``b''s, and a ``c''.

You can also match one of a group of characters within a pattern. Some common examples are character classes such as ``\s'', which matches a whitespace character (space, tab, newline, and so on), ``\d'', which matches any digit, and ``\w'', which matches any character that may appear in a typical word. The single character ``.'' (a period) matches any character.

We can put all this together to produce some useful regular expressions.

/\d\d:\d\d:\d\d/     # a time such as 12:34:56
/Perl.*Python/       # Perl, zero or more other chars, then Python
/Perl\s+Python/      # Perl, one or more spaces, then Python
/Ruby (Perl|Python)/ # Ruby, a space, and either Perl or Python

Once you have created a pattern, it seems a shame not to use it. The match operator ``=~'' can be used to match a string against a regular expression. If the pattern is found in the string, =~ returns its starting position; otherwise, it returns nil. Because nil counts as false in a condition and any position (even zero) counts as true, you can use regular expressions as the condition in if and while statements. For example, the following code fragment writes a message if a string contains the text 'Perl' or 'Python'.

if line =~ /Perl|Python/
  puts "Scripting language mentioned: #{line}"
end
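To see the values that =~ actually returns, you can try a few of the patterns from this section against sample strings. The strings below are made up for the illustration.

```ruby
p("It is 12:34:56 now" =~ /\d\d:\d\d:\d\d/)    # prints 6
p("Perl or Python" =~ /Python/)                # prints 8
p("I like Ruby" =~ /Perl|Python/)              # prints nil
p("Perl   Python" =~ /Perl\s+Python/)          # prints 0

p("ac" =~ /ab*c/)    # prints 0   (zero "b"s is allowed)
p("ac" =~ /ab+c/)    # prints nil (at least one "b" is required)
```

A result of 0 is a successful match at the very start of the string, and Ruby treats it as true in a condition.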

The part of a string matched by a regular expression can also be replaced with different text using one of Ruby's substitution methods.

line.sub(/Perl/, 'Ruby')    # replace first 'Perl' with 'Ruby'
line.gsub(/Python/, 'Ruby') # replace every 'Python' with 'Ruby'
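Note that sub and gsub return a new string and leave the original alone. A minimal sketch with a made-up line of text:

```ruby
line = "Perl and Python, then Perl again"

first_only = line.sub(/Perl/, 'Ruby')     # replaces the first match only
every_one  = line.gsub(/Perl/, 'Ruby')    # replaces every match

puts first_only   # prints Ruby and Python, then Perl again
puts every_one    # prints Ruby and Python, then Ruby again
puts line         # prints Perl and Python, then Perl again
```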

We'll have a lot more to say about regular expressions as we go through the book.

Blocks and Iterators

This section briefly describes one of Ruby's particular strengths. We're about to look at code blocks: chunks of code that you can associate with method invocations, almost as if they were parameters. This is an incredibly powerful feature. You can use code blocks to implement callbacks (but they're simpler than Java's anonymous inner classes), to pass around chunks of code (but they're more flexible than C's function pointers), and to implement iterators.

Code blocks are just chunks of code between braces or do...end.

{ puts "Hello" }       # this is a block

do                     #
  club.enroll(person)  # and so is this
  person.socialize     #
end                    #

Once you've created a block, you can associate it with a call to a method. That method can then invoke the block one or more times using the Ruby yield statement. The following example shows this in action. We define a method that calls yield twice. We then call it, putting a block on the same line, after the call (and after any arguments to the method).[Some people like to think of the association of a block with a method as a kind of parameter passing. This works on one level, but it isn't really the whole story. You might be better off thinking of the block and the method as coroutines, which transfer control back and forth between themselves.]

def callBlock
  yield
  yield
end

callBlock { puts "In the block" }
produces:
In the block
In the block

See how the code in the block (puts "In the block") is executed twice, once for each call to yield.

You can provide parameters to the call to yield: these will be passed to the block. Within the block, you list the names of the arguments to receive these parameters between vertical bars (``|'').

def callBlock
  yield "hello", 99    # pass two sample values to the block
end

callBlock { |str, num| puts "#{str} #{num}" }
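As a slightly larger sketch of passing values to a block, here is a method that yields successive Fibonacci numbers up to a limit. The method name fib_up_to and the limit of 20 are our own choices for the illustration.

```ruby
# Hypothetical method: yields each Fibonacci number that is <= max.
def fib_up_to(max)
  a, b = 1, 1
  while a <= max
    yield a              # hand the current value to the block
    a, b = b, a + b      # advance to the next pair
  end
end

seen = []
fib_up_to(20) { |n| seen << n }   # the block receives each value as n
p seen                            # prints [1, 1, 2, 3, 5, 8, 13]
```

The method decides which values to produce; the block decides what to do with each one.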

Code blocks are used throughout the Ruby library to implement iterators: methods that return successive elements from some kind of collection, such as an array.

a = %w( ant bee cat dog elk )    # create an array
a.each { |animal| puts animal }  # iterate over the contents
produces:
ant
bee
cat
dog
elk

Let's look at how we might implement the Array class's each iterator that we used in the previous example. The each iterator loops through every element in the array, calling yield for each one. In pseudo code, this might look like:

# within class Array...
def each
  for each element
    yield(element)
  end
end
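The pseudo code can be turned into something runnable without touching the real Array class. The sketch below uses a hypothetical Playlist class of our own that stores its titles in an instance variable and walks through them with a while loop; it is one possible shape for an each method, not a description of how Array itself is implemented.

```ruby
# A hypothetical container class with its own each iterator.
class Playlist
  def initialize(*titles)
    @titles = titles
  end

  def each
    index = 0
    while index < @titles.length
      yield @titles[index]    # pass the current element to the block
      index += 1
    end
  end
end

list = Playlist.new("Ruby Tuesday", "String of Pearls")
list.each { |title| puts title }
# prints:
#   Ruby Tuesday
#   String of Pearls
```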

You could then iterate over an array's elements by calling its each method and supplying a block. This block would be called for each element in turn.

[ 'cat', 'dog', 'horse' ].each do |animal|
  print animal, " -- "
end
produces:
cat -- dog -- horse --

Similarly, many looping constructs that are built into languages such as C and Java are simply method calls in Ruby, with the methods invoking the associated block zero or more times.

5.times {  print "*" }
3.upto(6) {|i|  print i }
('a'..'e').each {|char| print char }
produces:
*****3456abcde

Here we ask the number 5 to call a block five times, then ask the number 3 to call a block, passing in successive values until it reaches 6. Finally, the range of characters from ``a'' to ``e'' invokes a block using the method each.

Reading and 'Riting

Ruby comes with a comprehensive I/O library. However, in most of the examples in this book we'll stick to a few simple methods. We've already come across two methods that do output. puts writes each of its arguments, adding a newline after each. print also writes its arguments, but with no newline. Both can be used to write to any I/O object, but by default they write to the console.

Another output method we use a lot is printf, which prints its arguments under the control of a format string (just like printf in C or Perl).

printf "Number: %5.2f, String: %s", 1.23, "hello"
produces:
Number:  1.23, String: hello

In this example, the format string "Number: %5.2f, String: %s" tells printf to substitute in a floating point number (allowing five characters in total, with two after the decimal point) and a string.
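If you want the formatted text as a string rather than as output, Ruby's format method takes the same format string and arguments and returns the result. A minimal sketch, which also makes the padding visible:

```ruby
text = format("Number: %5.2f, String: %s", 1.23, "hello")
puts text                          # prints Number:  1.23, String: hello

padded = format("%5.2f", 1.23)
p padded                           # prints " 1.23" (padded to five characters)
puts padded.length                 # prints 5
```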

There are many ways to read input into your program. Probably the most traditional is to use the routine gets, which returns the next line from your program's standard input stream.

line = gets
print line

The gets routine has a side effect: as well as returning the line just read, it also stores it into the global variable $_. This variable is special, in that it is used as the default argument in many circumstances. If you call print with no argument, it prints the contents of $_. If you write an if or while statement with just a regular expression as the condition, that expression is matched against $_. While viewed by some purists as a rebarbative barbarism, these abbreviations can help you write some concise programs. For example, the following program prints all lines in the input stream that contain the word ``Ruby.''

while gets           # assigns line to $_
  if /Ruby/          # matches against $_
    print            # prints $_
  end
end

The ``Ruby way'' to write this would be to use an iterator.

ARGF.each { |line|  print line  if line =~ /Ruby/ }

This uses the predefined object ARGF, which represents the input stream that can be read by a program.
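To try the iterator style without typing at the console, you can substitute any I/O-like object for the input stream. The sketch below uses StringIO from Ruby's standard library as a stand-in; the helper name lines_matching and the sample text are hypothetical.

```ruby
require 'stringio'

# Hypothetical helper: collects the lines of an I/O object that match a pattern.
def lines_matching(io, pattern)
  found = []
  io.each_line { |line| found << line if line =~ pattern }
  found
end

# Stand-in for standard input; the text is made up for this illustration.
input = StringIO.new("Perl is terse\nRuby is fun\nI like Ruby too\n")

print lines_matching(input, /Ruby/).join
# prints:
#   Ruby is fun
#   I like Ruby too
```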

Onward and Upward

That's it. We've finished our lightning-fast tour of some of the basic features of Ruby. We've had a brief look at objects, methods, strings, containers, and regular expressions, seen some simple control structures, and looked at some rather nifty iterators. Hopefully, this chapter has given you enough ammunition to be able to attack the rest of this book.

Time to move on, and up---up to a higher level. Next, we'll be looking at classes and objects, things that are at the same time both the highest-level constructs in Ruby and the essential underpinnings of the entire language.



Extracted from the book "Programming Ruby - The Pragmatic Programmer's Guide"
Copyright © 2001 by Addison Wesley Longman, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.

Distribution of the work or derivative of the work in any standard (paper) book form is prohibited unless prior permission is obtained from the copyright holder.