Recipe 8.9. Processing Variable-Length Text Fields (Perl Cookbook)


8.9. Processing Variable-Length Text Fields

Problem

You want to extract variable-length fields from your input.

Solution

Use split with a pattern matching the field separators.

# given $RECORD with fields separated by PATTERN,
# extract @FIELDS.
@FIELDS = split(/PATTERN/, $RECORD);

Discussion

The split function takes up to three arguments: PATTERN, EXPRESSION, and LIMIT. The LIMIT parameter is the maximum number of fields to split into. (If the input contains more fields, they are returned unsplit in the final list element.) If LIMIT is omitted, all fields (except any final empty ones) are returned. EXPRESSION gives the string value to split. If EXPRESSION is omitted, $_ is split. PATTERN is a pattern matching the field separator. If PATTERN is omitted, contiguous stretches of whitespace are used as the field separator and leading empty fields are silently discarded.
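As a rough illustration of how the three arguments interact, here is a minimal sketch. The sample strings and variable names are made up for this example. The negative LIMIT case is not covered in the text above; it is the documented way to keep trailing empty fields.

```perl
use strict;
use warnings;

my $record = "a:b:c:d::";    # hypothetical sample record

# No LIMIT: trailing empty fields are dropped.
my @all = split(/:/, $record);         # ("a", "b", "c", "d")

# LIMIT of 2: the remainder stays unsplit in the last element.
my @two = split(/:/, $record, 2);      # ("a", "b:c:d::")

# Negative LIMIT: no cap on the field count, trailing empty fields kept.
my @kept = split(/:/, $record, -1);    # ("a", "b", "c", "d", "", "")

# PATTERN and EXPRESSION both omitted: $_ is split on whitespace,
# and leading empty fields are discarded.
my @words;
for ("  alpha beta  gamma ") {
    @words = split;                    # ("alpha", "beta", "gamma")
}
```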

If your input field separator isn't a fixed string, you might want split to return the separators as well as the data. To do so, put parentheses in PATTERN to capture them. For instance:

split(/([+-])/, "3+5-2");

returns the values:

(3, '+', 5, '-', 2)
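One possible use of the captured separators, sketched under the assumption that the input is a well-formed chain of numbers joined by + or -, is to evaluate the expression from left to right. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my @tokens = split(/([+-])/, "3+5-2");    # (3, '+', 5, '-', 2)

# Operands sit at even indexes and captured separators at odd indexes,
# so take the first operand, then consume (operator, operand) pairs.
my $total = shift @tokens;
while (@tokens) {
    my ($op, $num) = splice(@tokens, 0, 2);
    $total = $op eq '+' ? $total + $num : $total - $num;
}

print "$total\n";    # prints 6
```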

To split colon-separated records in the style of the /etc/passwd file, use:

@fields = split(/:/, $RECORD);
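Because split returns a list, the fields can also be assigned straight to named variables. The record below is a made-up sample in the passwd layout, and the variable names are this sketch's own choice.

```perl
use strict;
use warnings;

# Hypothetical sample line in /etc/passwd format.
my $RECORD = "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin";

# Assign the seven colon-separated fields to named variables.
my ($user, $passwd, $uid, $gid, $gecos, $home, $shell)
    = split(/:/, $RECORD);

print "$user has uid $uid and shell $shell\n";
```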

The classic application of split is whitespace-separated records:

@fields = split(/\s+/, $RECORD);

If $RECORD starts with whitespace, this last use of split puts an empty string into the first element of @fields, because split considers the record to have an initial empty field. If you don't want this, use this special form of split:

@fields = split(" ", $RECORD);

This behaves like split with a pattern of /\s+/, but ignores leading whitespace.
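The difference between the two forms is easiest to see side by side. This is a small sketch with a made-up record that begins with two spaces.

```perl
use strict;
use warnings;

my $RECORD = "  one two\tthree";    # hypothetical record with leading whitespace

# The /\s+/ pattern treats the leading whitespace as a separator,
# so the list starts with an empty field.
my @with_pattern = split(/\s+/, $RECORD);    # ("", "one", "two", "three")

# The special " " form skips leading whitespace before splitting.
my @with_space = split(" ", $RECORD);        # ("one", "two", "three")

printf "%d fields vs. %d fields\n",
    scalar(@with_pattern), scalar(@with_space);    # prints "4 fields vs. 3 fields"
```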

When the field separator can appear inside a field, you have a problem. The usual solution is to escape occurrences of the separator within fields by prefixing them with a backslash. See Recipe 1.13.
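Reading such backslash-escaped data back could look something like the following. This is only a sketch, not the method of Recipe 1.13: it uses a negative lookbehind so that split ignores colons preceded by a backslash, and it assumes backslashes themselves are never escaped (a field ending in a literal backslash would be misread). The sample record is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record: the second field contains an escaped colon.
my $RECORD = 'name:10\:30:done';

# Split only on colons that are not preceded by a backslash.
my @fields = split(/(?<!\\):/, $RECORD);

# Strip the escaping backslash from each field.
s/\\:/:/g for @fields;

print join(" | ", @fields), "\n";    # prints "name | 10:30 | done"
```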

See Also

The split function in perlfunc(1) and in Chapter 3 of Programming Perl



Copyright © 2001 O'Reilly & Associates. All rights reserved.