
22.14 The Class java.io.StreamTokenizer

A StreamTokenizer takes an input stream and parses it into "tokens," allowing the tokens to be read one at a time. The parsing process is controlled by a table and a number of flags that can be set to various states, allowing recognition of identifiers, numbers, quoted strings, and comments in a standard style.

public class StreamTokenizer {
	public static final int TT_EOF = -1;
	public static final int TT_EOL = '\n';
	public static final int TT_NUMBER = -2;
	public static final int TT_WORD = -3;
	public int ttype;
	public String sval;
	public double nval;
	public StreamTokenizer(InputStream in);
	public void resetSyntax();
	public void wordChars(int low, int hi);
	public void whitespaceChars(int low, int hi);
	public void ordinaryChars(int low, int hi);
	public void ordinaryChar(int ch);
	public void commentChar(int ch);
	public void quoteChar(int ch);
	public void parseNumbers();
	public void eolIsSignificant(boolean flag);
	public void slashStarComments(boolean flag);
	public void slashSlashComments(boolean flag);
	public void lowerCaseMode(boolean flag);
	public int nextToken() throws IOException;
	public void pushBack();
	public int lineno();
	public String toString();
}
Each byte read from the input stream is regarded as a character in the range '\u0000' through '\u00FF'. The character value is used to look up five possible attributes of the character: whitespace, alphabetic, numeric, string quote, and comment character (a character may have more than one of these attributes, or none at all). In addition, there are three flags controlling whether line terminators are to be recognized as tokens, whether Java-style end-of-line comments that start with // should be recognized and skipped, and whether Java-style "traditional" comments delimited by /* and */ should be recognized and skipped. One more flag controls whether all the characters of identifiers are converted to lowercase.

Here is a simple example of the use of a StreamTokenizer. The following code merely reads all the tokens in the standard input stream and prints an identification of each one. Changes in the line number are also noted.

import java.io.StreamTokenizer;
import java.io.IOException;

class Tok {
	public static void main(String[] args) {
		StreamTokenizer st = new StreamTokenizer(System.in);
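		// '/' is a comment character by default; make it an ordinary
		// character so that it is reported as a token.
		st.ordinaryChar('/');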
		int lineNum = -1;
		try {
			for (int tokenType = st.nextToken();
					tokenType != StreamTokenizer.TT_EOF;
					tokenType = st.nextToken()) {
				int newLineNum = st.lineno();
				if (newLineNum != lineNum) {
					System.out.println("[line " + newLineNum
											+ "]");
					lineNum = newLineNum;
				}
				switch(tokenType) {
				case StreamTokenizer.TT_NUMBER:
					System.out.println("the number " + st.nval);
					break;
				case StreamTokenizer.TT_WORD:
					System.out.println("identifier " + st.sval);
					break;
				default:
					System.out.println("  operator "
											+ (char)tokenType);
				}
			}
		} catch (IOException e) {
			System.out.println("I/O failure");
		}
	}
}
If the input stream contains this data:


10 LET A = 4.5
20 LET B = A*A
30 PRINT A, B
then the resulting output is:


[line 1]
the number 10.0
identifier LET
identifier A
  operator =
the number 4.5
[line 2]
the number 20.0
identifier LET
identifier B
  operator =
identifier A
  operator *
identifier A
[line 3]
the number 30.0
identifier PRINT
identifier A
  operator ,
identifier B

22.14.1 public static final int TT_EOF = -1;

A constant that indicates end of file was reached.

22.14.2 public static final int TT_EOL = '\n';

A constant that indicates that a line terminator was recognized.

22.14.3 public static final int TT_NUMBER = -2;

A constant that indicates that a number was recognized.

22.14.4 public static final int TT_WORD = -3;

A constant that indicates that a word (identifier) was recognized.

22.14.5 public int ttype;

The type of the token that was last recognized by this StreamTokenizer. This will be TT_EOF, TT_EOL, TT_NUMBER, TT_WORD, or a nonnegative byte value that was the first byte of the token (for example, if the token is a string token, then ttype has the quote character that started the string).

22.14.6 public String sval;

If the value of ttype is TT_WORD or a string quote character, then the value of sval is a String that contains the characters of the identifier or of the string (without the delimiting string quotes). For all other types of tokens recognized, the value of sval is null.

22.14.7 public double nval;

If the value of ttype is TT_NUMBER, then the value of nval is the numerical value of the number.
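
As an illustration of how the three fields relate, the following sketch classifies each token by inspecting ttype and then reading sval or nval accordingly. The class name FieldsDemo and the method describe are hypothetical names chosen for this example. The tokenizer is left in its default state, and the input is supplied from a byte array on the assumption that each byte represents one character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FieldsDemo {
	// Hypothetical helper: labels each token using ttype, sval, and nval.
	@SuppressWarnings("deprecation") // the InputStream constructor is deprecated in later releases
	static List<String> describe(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				switch (st.ttype) {
				case StreamTokenizer.TT_NUMBER:
					out.add("number:" + st.nval);
					break;
				case StreamTokenizer.TT_WORD:
					out.add("word:" + st.sval);
					break;
				case '"':
				case '\'':
					// For a string token, ttype is the quote character itself.
					out.add("string:" + st.sval);
					break;
				default:
					// An ordinary character is its own token type.
					out.add("char:" + (char) st.ttype);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(describe("x = \"hi\" 42 ;"));
	}
}
```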

22.14.8 public StreamTokenizer(InputStream in)

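This constructor initializes a newly created StreamTokenizer by saving its argument, the input stream in, for later use. The StreamTokenizer is also initialized to the following default state: all byte values 'A' through 'Z', 'a' through 'z', and 0xA0 through 0xFF are alphabetic; all byte values 0x00 through 0x20 are whitespace; '/' is a comment character; single quote ' and double quote " are string quote characters; numbers are parsed; line terminators are not recognized as tokens; and neither // comments nor /* comments are recognized.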

22.14.9 public void resetSyntax()

The syntax table for this StreamTokenizer is reset so that every byte value is "ordinary"; thus, no character is recognized as being a whitespace, alphabetic, numeric, string quote, or comment character. Calling this method is therefore equivalent to:

ordinaryChars(0x00, 0xff)
The three flags controlling recognition of line terminators, // comments, and /* comments are unaffected.
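
A minimal sketch of the effect, assuming the hypothetical class ResetDemo and method tokens: once resetSyntax has been called, letters, digits, and even spaces come back as single-character tokens.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ResetDemo {
	// Hypothetical helper: returns every token as a one-character string.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.resetSyntax(); // every byte value is now ordinary
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("a1 b"));
	}
}
```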

22.14.10 public void wordChars(int low, int hi)

The syntax table for this StreamTokenizer is modified so that every character in the range low through hi has the "alphabetic" attribute.
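
For example, the underscore is an ordinary character in the default state, so an identifier such as max_value is split into three tokens. The sketch below, with the hypothetical names WordCharsDemo and tokens, shows how wordChars might be used to keep it together. The helper assumes that the input contains only words and ordinary characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WordCharsDemo {
	// Hypothetical helper: lists words and ordinary characters in order.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input, boolean underscoreIsWord) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		if (underscoreIsWord) {
			st.wordChars('_', '_'); // a range containing the single character '_'
		}
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(st.ttype == StreamTokenizer.TT_WORD
						? st.sval : String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("max_value", false));
		System.out.println(tokens("max_value", true));
	}
}
```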

22.14.11 public void whitespaceChars(int low, int hi)

The syntax table for this StreamTokenizer is modified so that every character in the range low through hi has the "whitespace" attribute.
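
One possible use, sketched here under the hypothetical names WhitespaceDemo and words, is to treat a separator such as the comma as whitespace so that only the words between separators are reported.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WhitespaceDemo {
	// Hypothetical helper: collects the words of a comma-separated line.
	@SuppressWarnings("deprecation")
	static List<String> words(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.whitespaceChars(',', ','); // commas are now skipped like spaces
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.ttype == StreamTokenizer.TT_WORD) {
					out.add(st.sval);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(words("a,b,,c"));
	}
}
```

Note that consecutive separators are skipped together, so an empty field between two commas produces no token in this sketch.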

22.14.12 public void ordinaryChars(int low, int hi)

The syntax table for this StreamTokenizer is modified so that every character in the range low through hi has no attributes.

22.14.13 public void ordinaryChar(int ch)

The syntax table for this StreamTokenizer is modified so that the character ch has no attributes.
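
As a sketch of why this matters, consider the character '-', which is numeric in the default state and is therefore absorbed into a following number. Making it ordinary causes it to be reported as a separate token. The names OrdinaryCharDemo and tokens are hypothetical, and the helper assumes that the input contains only numbers and ordinary characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OrdinaryCharDemo {
	// Hypothetical helper: lists numbers and ordinary characters in order.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input, boolean minusIsOrdinary) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		if (minusIsOrdinary) {
			st.ordinaryChar('-'); // '-' loses its "numeric" attribute
		}
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(st.ttype == StreamTokenizer.TT_NUMBER
						? String.valueOf(st.nval) : String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("3-1", false));
		System.out.println(tokens("3-1", true));
	}
}
```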

22.14.14 public void commentChar(int ch)

The syntax table for this StreamTokenizer is modified so that the character ch has the "comment character" attribute.
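
The following sketch, using the hypothetical names CommentCharDemo and words, marks '#' as a comment character so that everything from '#' to the end of the line is skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CommentCharDemo {
	// Hypothetical helper: collects words, ignoring '#' comments.
	@SuppressWarnings("deprecation")
	static List<String> words(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.commentChar('#');
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.ttype == StreamTokenizer.TT_WORD) {
					out.add(st.sval);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(words("a # note\nb"));
	}
}
```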

22.14.15 public void quoteChar(int ch)

The syntax table for this StreamTokenizer is modified so that the character ch has the "string quote" attribute.

22.14.16 public void parseNumbers()

The syntax table for this StreamTokenizer is modified so that each of the twelve characters

0 1 2 3 4 5 6 7 8 9 . -
has the "numeric" attribute.
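
Because resetSyntax removes the numeric attribute along with all others, parseNumbers is typically called again when a syntax table is rebuilt from scratch. The sketch below assumes the hypothetical names ParseNumbersDemo and tokens. It restores only whitespace and number parsing, so letters remain ordinary characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ParseNumbersDemo {
	// Hypothetical helper: lists numbers and ordinary characters in order.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.resetSyntax();               // start from an all-ordinary table
		st.whitespaceChars(0x00, 0x20); // control characters and space
		st.parseNumbers();              // digits, '.', and '-' become numeric
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(st.ttype == StreamTokenizer.TT_NUMBER
						? String.valueOf(st.nval) : String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("12 ab -3.5"));
	}
}
```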

22.14.17 public void eolIsSignificant(boolean flag)

This StreamTokenizer henceforth recognizes line terminators as tokens if and only if the flag argument is true.
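
A short sketch of the difference, under the hypothetical names EolDemo and tokens: with the flag set, a TT_EOL token appears between the words of two adjacent lines, and with the flag clear the line terminator is skipped as whitespace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EolDemo {
	// Hypothetical helper: lists words, writing "EOL" for line terminator tokens.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input, boolean significant) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.eolIsSignificant(significant);
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.ttype == StreamTokenizer.TT_EOL) {
					out.add("EOL");
				} else if (st.ttype == StreamTokenizer.TT_WORD) {
					out.add(st.sval);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("a\nb", true));
		System.out.println(tokens("a\nb", false));
	}
}
```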

22.14.18 public void slashStarComments(boolean flag)

This StreamTokenizer henceforth recognizes and skips Java-style "traditional" comments, which are delimited by /* and */ and do not nest, if and only if the flag argument is true.

22.14.19 public void slashSlashComments(boolean flag)

This StreamTokenizer henceforth recognizes and skips Java-style end-of-line comments that start with // if and only if the flag argument is true.
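
The two comment flags can be combined with ordinaryChar('/') so that a lone slash is still reported as a token while both comment styles are skipped. The sketch below assumes the hypothetical names SlashCommentsDemo and tokens, and input that contains only words, ordinary characters, and comments.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SlashCommentsDemo {
	// Hypothetical helper: lists words and ordinary characters, skipping comments.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.ordinaryChar('/');        // a single '/' no longer starts a comment
		st.slashSlashComments(true); // skip // comments
		st.slashStarComments(true);  // skip /* ... */ comments
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(st.ttype == StreamTokenizer.TT_WORD
						? st.sval : String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("a / b /* x */ c // d"));
	}
}
```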

22.14.20 public void lowerCaseMode(boolean flag)

This StreamTokenizer henceforth converts all the characters in identifiers to lowercase if and only if the flag argument is true.
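
In the sketch below (hypothetical names LowerCaseDemo and values), identifiers are folded to lowercase while the contents of a quoted string are left unchanged, since the conversion applies to identifiers only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LowerCaseDemo {
	// Hypothetical helper: collects sval for every word or string token.
	@SuppressWarnings("deprecation")
	static List<String> values(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.lowerCaseMode(true);
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.sval != null) {
					out.add(st.sval);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(values("Hello WORLD 'Quoted'"));
	}
}
```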

22.14.21 public int nextToken() throws IOException

If the previous token was pushed back (§22.14.22), then the value of ttype is returned, effectively causing that same token to be reread.

Otherwise, this method parses the next token in the contained input stream. The type of the token is returned; this same value is also made available in the ttype field, and related data may be made available in the sval and nval fields.

First, whitespace characters are skipped, except that if a line terminator is encountered and this StreamTokenizer is currently recognizing line terminators, then the type of the token is TT_EOL.

If a numeric character is encountered, then an attempt is made to recognize a number. If the first character is '-' and the next character is not numeric, then the '-' is considered to be an ordinary character and is recognized as a token in its own right. Otherwise, a number is parsed, stopping before the next occurrence of '-', the second occurrence of '.', the first nonnumeric character encountered, or end of file, whichever comes first. The type of the token is TT_NUMBER and its value is made available in the field nval.
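
The stopping rules for numbers can be observed with a few edge cases. In the sketch below (hypothetical names NumberRulesDemo and tokens, default syntax table), the input 1-2 yields two numbers because parsing stops before the '-', the input 1.2.3 yields two numbers because parsing stops before the second '.', and a '-' followed by a space is reported as an ordinary character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NumberRulesDemo {
	// Hypothetical helper: lists numbers and ordinary characters in order.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				out.add(st.ttype == StreamTokenizer.TT_NUMBER
						? String.valueOf(st.nval) : String.valueOf((char) st.ttype));
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("1-2 1.2.3 - 7"));
	}
}
```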

If an alphabetic character is encountered, then an identifier is recognized, consisting of that character and all following characters up to, but not including, the first character that is neither alphabetic nor numeric, or up to end of file, whichever comes first. The characters of the identifier may be converted to lowercase if this StreamTokenizer is in lowercase mode.
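
One consequence of this rule is that numeric characters, including '.' and '-', continue an identifier once it has begun, whereas a leading digit starts a number. The sketch below uses the hypothetical names IdentifierDemo and tokens with the default syntax table, and assumes input containing only words and numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class IdentifierDemo {
	// Hypothetical helper: lists words and numbers in order.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.ttype == StreamTokenizer.TT_WORD) {
					out.add(st.sval);
				} else if (st.ttype == StreamTokenizer.TT_NUMBER) {
					out.add(String.valueOf(st.nval));
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		// "x-1" is a single identifier; "9lives" is a number followed by a word.
		System.out.println(tokens("abc123 x-1 9lives"));
	}
}
```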

If a comment character is encountered, then all subsequent characters are skipped and ignored, up to but not including the next line terminator or end of file. Then another attempt is made to recognize a token. If this StreamTokenizer is currently recognizing line terminators, then a line terminator that ends a comment will be recognized as a token in the same manner as any other line terminator in the contained input stream.

If a string quote character is encountered, then a string is recognized, consisting of all characters after (but not including) the string quote character, up to (but not including) the next occurrence of that same string quote character, or a line terminator, or end of file. The usual escape sequences (§3.10.6) such as \n and \t are recognized and converted to single characters as the string is parsed.
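
The sketch below (hypothetical names QuoteDemo and values, default syntax table) illustrates two points from this rule: an escape sequence such as \t is converted to a single character, and a string that lacks its closing quote ends at the line terminator.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class QuoteDemo {
	// Hypothetical helper: collects sval for every word or string token.
	@SuppressWarnings("deprecation")
	static List<String> values(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				if (st.sval != null) {
					out.add(st.sval);
				}
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		// The input text is: "tab\there" 'unterminated<newline>next
		List<String> v = values("\"tab\\there\" 'unterminated\nnext");
		System.out.println(v.get(0).indexOf('\t')); // index of the converted tab
		System.out.println(v.get(1));
		System.out.println(v.get(2));
	}
}
```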

If // is encountered and this StreamTokenizer is currently recognizing // comments, then all subsequent characters are skipped and ignored, up to but not including the next line terminator or end of file. Then another attempt is made to recognize a token. (If this StreamTokenizer is currently recognizing line terminators, then a line terminator that ends a comment will be recognized as a token in the same manner as any other line terminator in the contained input stream.)

If /* is encountered and this StreamTokenizer is currently recognizing /* comments, then all subsequent characters are skipped and ignored, up to and including the next occurrence of */ or end of file. Then another attempt is made to recognize a token.

If none of the cases listed above applies, then the only other possibility is that the first non-whitespace character encountered is an ordinary character. That character is considered to be a token and is stored in the ttype field and returned.

22.14.22 public void pushBack()

Calling this method "pushes back" the current token; that is, it causes the next call to nextToken to return the same token that it just provided. Note that this method does not restore the line number to its previous value, so if the method lineno is called after a call to pushBack but before the next call to nextToken, an incorrect line number may be returned.
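
A common use of pushBack is to look at the next token without consuming it. The following sketch assumes a hypothetical helper named peek, along with the hypothetical class PushBackDemo and method demo, and shows that the pushed-back token is delivered again by the next call to nextToken.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PushBackDemo {
	// Hypothetical helper: returns the type of the next token without consuming it.
	static int peek(StreamTokenizer st) throws IOException {
		int type = st.nextToken();
		st.pushBack();
		return type;
	}

	@SuppressWarnings("deprecation")
	static List<String> demo(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		List<String> out = new ArrayList<>();
		try {
			peek(st);         // reads the first token, then pushes it back
			out.add(st.sval); // the fields still describe the pushed-back token
			st.nextToken();   // delivers the same token again
			out.add(st.sval);
			st.nextToken();   // advances to the second token
			out.add(st.sval);
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(demo("a b"));
	}
}
```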

22.14.23 public int lineno()

The number of the line on which the current token appeared is returned. The first token in the input stream, if not a line terminator, is considered to appear on line 1. A line terminator token is considered to appear on the line that it precedes, not on the line it terminates; thus, the first line terminator in the input stream is considered to be on line 2.
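
The line numbering of terminator tokens can be seen in the sketch below, which uses the hypothetical names LinenoDemo and tokens and assumes input containing only words and line terminators. Each token is paired with the value of lineno at the time it is returned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LinenoDemo {
	// Hypothetical helper: pairs each token with the line number reported for it.
	@SuppressWarnings("deprecation")
	static List<String> tokens(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		st.eolIsSignificant(true); // report line terminators as tokens
		List<String> out = new ArrayList<>();
		try {
			while (st.nextToken() != StreamTokenizer.TT_EOF) {
				String name = st.ttype == StreamTokenizer.TT_EOL ? "EOL" : st.sval;
				out.add(name + "@" + st.lineno());
			}
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return out;
	}

	public static void main(String[] args) {
		System.out.println(tokens("a\nb"));
	}
}
```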

22.14.24 public String toString()

The current token and the current line number are converted to a string of the form:

"Token[x], line m"
where m is the current line number in decimal form and x depends on the type of the current token:

Overrides the toString method of Object (§20.1.2).
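
A small sketch, with the hypothetical names ToStringDemo and first, that prints the string form of the first token of an input. It relies only on the overall "Token[x], line m" shape described above and makes no assumption about the text that appears for x.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ToStringDemo {
	// Hypothetical helper: reads one token and returns the tokenizer's string form.
	@SuppressWarnings("deprecation")
	static String first(String input) {
		StreamTokenizer st = new StreamTokenizer(
				new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)));
		try {
			st.nextToken();
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return st.toString();
	}

	public static void main(String[] args) {
		System.out.println(first("hello"));
	}
}
```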



Java Language Specification (HTML generated by Suzette Pelouch on February 24, 1998)
Copyright © 1996 Sun Microsystems, Inc. All rights reserved