Tuesday, December 15, 2009

Section 7.10. From the Java Library: java.util.StringTokenizer













java.sun.com/docs



One of the most common string-processing tasks is breaking a string into its components, or tokens. For example, when processing a sentence, you may need to break it into its constituent words, which are the sentence's tokens. When processing a name-password string, such as "boyd:14irXp", you may need to break it into a name and a password. Tokens are separated from each other by one or more characters known as delimiters. For a sentence, white space (blank spaces, tabs, and line feeds) serves as the delimiter. For the password example, the colon character serves as the delimiter.
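To make the idea of tokens and delimiters concrete, here is a minimal hand-written sketch of the scanning involved. The class name ManualTokenizer and the method tokenize are hypothetical names chosen for this illustration; the code shows the general idea only and is not a description of how any library class is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class ManualTokenizer {
    // Splits text into tokens, treating every character in delims as a delimiter.
    // (Hypothetical helper for illustration only.)
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (delims.indexOf(ch) >= 0) {
                // A delimiter ends the current token, if one has been started.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        // The last token has no delimiter after it, so add it here.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("boyd:14irXp", ":"));           // [boyd, 14irXp]
        System.out.println(tokenize("one  two\tthree", " \t\n"));   // [one, two, three]
    }
}
```

A run of several delimiters in a row, such as the two blanks in the second call, separates tokens exactly as a single delimiter would.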




Java's java.util.StringTokenizer class is specially designed for breaking strings into their tokens (Fig. 7.17). When instantiated with a single String parameter, a StringTokenizer breaks the string into tokens, using white space as the delimiter. For example, if we instantiated a StringTokenizer as in the code




Figure 7.17. The java.util.StringTokenizer class.







StringTokenizer sTokenizer
    = new StringTokenizer("This is an English sentence.");


it would break the string into the following tokens, which would be stored internally in the StringTokenizer in the order shown:


This
is
an
English
sentence.


Note that the period is part of the last token ("sentence."). This is because punctuation marks are not considered delimiters by default.


If you wanted to include punctuation symbols as delimiters, you could use the second StringTokenizer() constructor, which takes a second String parameter (Fig. 7.17). The second parameter specifies a string of characters that should be used as delimiters. For example, in the instantiation,



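StringTokenizer sTokenizer
    = new StringTokenizer("This is an English sentence.", " \t\n,;.!");


punctuation symbols (periods, commas, and so on) are included among the delimiters. Note that the delimiter string begins with a blank space, followed by escape sequences (\t\n) that specify tabs and new lines. The blank must be listed explicitly because the second parameter replaces the default white-space delimiters rather than adding to them; the escape sequence \b would not do here, since in Java it denotes a backspace, not a blank.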
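The difference between the two constructors can be seen side by side in the following sketch. The class name DelimiterComparison and the helper drain are hypothetical names introduced for this example; the helper uses the hasMoreTokens() and nextToken() methods of StringTokenizer to collect the tokens into a list. The delimiter string here lists a blank space explicitly, on the assumption that words as well as punctuation should be separated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterComparison {
    // Drains a tokenizer into a list so the tokens can be inspected.
    // (Hypothetical helper for illustration only.)
    static List<String> drain(StringTokenizer st) {
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "This is an English sentence.";

        // One-argument constructor: white space only, so the period stays attached.
        System.out.println(drain(new StringTokenizer(sentence)));
        // [This, is, an, English, sentence.]

        // Two-argument constructor: the listed characters replace the defaults,
        // so the blank space is listed along with the punctuation.
        System.out.println(drain(new StringTokenizer(sentence, " \t\n,;.!")));
        // [This, is, an, English, sentence]
    }
}
```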


The hasMoreTokens() and nextToken() methods can be used to process a delimited string one token at a time. The first method returns true as long as more tokens remain; the second gets the next token in the list. For example, here is a code segment that will break a standard URL string into its constituent parts:


String url = "http://java.trincoll.edu/~jjj/index.html";
StringTokenizer sTokenizer = new StringTokenizer(url, ":/");
while (sTokenizer.hasMoreTokens()) {
    System.out.println(sTokenizer.nextToken());
}


This code segment will produce the following output:


http
java.trincoll.edu
~jjj
index.html


The only delimiters used in this case were the ":" and "/" symbols. Note also that nextToken() never returns an empty string: the three adjacent delimiters in "://" are skipped as a group, so no empty tokens appear between them.
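The handling of adjacent delimiters can be checked with the countTokens() method of StringTokenizer, which reports how many tokens remain. The following is a small sketch; the class name EmptyTokenDemo and the helper tokenCount are hypothetical names chosen for this example, and the string "boyd::14irXp" is an invented variation on the name-password example.

```java
import java.util.StringTokenizer;

class EmptyTokenDemo {
    // Counts the tokens in text for the given delimiter characters.
    // (Hypothetical helper for illustration only.)
    static int tokenCount(String text, String delims) {
        return new StringTokenizer(text, delims).countTokens();
    }

    public static void main(String[] args) {
        // "://" is three delimiters in a row, yet no empty tokens are produced.
        System.out.println(tokenCount("http://java.trincoll.edu/~jjj/index.html", ":/")); // 4

        // Doubling the colon does not change the count for the name-password string.
        System.out.println(tokenCount("boyd:14irXp", ":"));   // 2
        System.out.println(tokenCount("boyd::14irXp", ":"));  // 2
    }
}
```

One consequence is that an empty field cannot be detected this way: a string such as "boyd:" yields a single token, with nothing to indicate that the password is missing.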











