6.7. Special Features of Strings

6.7.1. Special or Control Characters

As in most other high-level or scripting languages, a backslash paired with another single character indicates a "special" character, usually a nonprintable one, and the pair is replaced by that special character. These are the special characters we discussed above that are not interpreted when the raw string operator precedes a string containing them.

In addition to well-known characters such as NEWLINE ( \n ) and (horizontal) tab ( \t ), specific characters may be given by their ASCII values: \OOO or \xXX, where OOO and XX are the respective octal and hexadecimal ASCII values. Here are the base 10, 8, and 16 representations of 0, 65, and 255:

Decimal   Octal   Hexadecimal
0         000     0x00
65        0101    0x41
255       0377    0xFF
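The escapes and number bases described above can be checked directly in code. The following is a minimal sketch that assumes Python 3, where a str holds Unicode code points; the helper name `representations` is hypothetical and exists only for this illustration.

```python
# Assumption: Python 3. The helper below is illustrative, not a standard function.

# "\101" (octal) and "\x41" (hexadecimal) both denote the character with value 65.
octal_a = "\101"
hex_a = "\x41"

# A raw string leaves the backslash pair uninterpreted: two characters, not one.
cooked_length = len("\n")    # one NEWLINE character
raw_length = len(r"\n")      # a backslash followed by the letter n


def representations(value):
    """Return the decimal, octal, and hexadecimal digits of a character value."""
    return (format(value, "d"), format(value, "o"), format(value, "X"))


table = [representations(v) for v in (0, 65, 255)]

for decimal, octal, hexadecimal in table:
    print(decimal, octal, hexadecimal)
```

Note that `format()` prints the bare digits, so the leading 0 and 0x prefixes used in the prose do not appear in the output.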
Special characters, including the backslash-escaped ones, can be stored in Python strings just like regular characters. Another way that strings in Python differ from those in C is that Python strings are not terminated by the NUL (\000) character (ASCII value 0). A NUL is no more special than any other backslash-escaped control character: a string may contain any number of NULs, and they may occur anywhere within it. Table 6.7 summarizes the escape characters supported by most versions of Python.
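To make the point about NUL concrete, here is a small sketch (assuming Python 3; the variable names are arbitrary) showing that NUL characters are counted, indexed, and searched like any other character and do not end the string.

```python
# Assumption: Python 3. "\x00" and "\000" are two spellings of the same NUL character.
text = "ab\x00cd\x00\x00"

length = len(text)                 # NULs count toward the length
nul_count = text.count("\x00")     # any number of NULs may appear
first_nul = text.index("\x00")     # and they may appear anywhere
after_nul = text[first_nul + 1:]   # data after a NUL is still part of the string

same_character = ("\000" == "\x00" == chr(0))
```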
As mentioned before, explicit ASCII octal or hexadecimal values can be given, and a NEWLINE can be escaped to continue a statement on the next line. Valid character values for these escapes lie between 0 and 255 (octal 0377, hexadecimal 0xFF); ASCII proper covers only 0 to 127 (octal 0177). The \OOO escape gives the character with octal value OOO (range 000 to 0377).

One use of control characters in strings is to serve as delimiters. In database or Internet/Web processing, most printable characters are likely to be allowed as data items, so they do not make good delimiters. It becomes difficult to tell whether a character is a delimiter or a data item, and using a printable character such as a colon ( : ) as a delimiter limits the characters allowed in your data, which may not be desirable. One popular solution is to employ seldom-used, nonprintable ASCII values as delimiters, freeing up the colon and the other printable characters for more important uses.

6.7.2. Triple Quotes

Although strings can be delimited by single or double quotes, it is often difficult to manipulate strings containing special or nonprintable characters, especially the NEWLINE character. Python's triple quotes come to the rescue by allowing strings to span multiple lines, including verbatim NEWLINEs, tabs, and any other special characters. The syntax for triple quotes consists of three consecutive single or double quotes (used in pairs, naturally):

>>> hi = '''hi

Triple quotes let the developer avoid playing quote and escape character games, all the while bringing at least a small chunk of text closer to WYSIWYG (what you see is what you get) format. The most powerful use cases are when you have a large block of HTML or SQL that would be completely inconvenient to build by concatenation or to wrap with backslash escapes:

errHTML = '''

6.7.3. String Immutability

In Section 4.7.2, we discussed how strings are immutable data types, meaning that their values cannot be changed or modified. If you do want to update a string, whether by taking a substring, concatenating another string on the end, or concatenating the string in question to the end of another string, a new string object must be created for it.

This sounds more complicated than it really is. Since Python manages memory for you, you won't really notice when this occurs. Any time you perform an operation that would appear to modify a string, Python allocates a new string for you. In the following example, Python allocates space for the strings 'abc' and 'def'. But when performing the addition operation to create the string 'abcdef', new space is allocated automatically for the new string.

>>> 'abc' + 'def'
'abcdef'

Assigning values to variables is no different:

>>> s = 'abc'
>>> s = s + 'def'
>>> s
'abcdef'

In the above example, it looks like we assigned the string 'abc' to s, then appended the string 'def' to s. To the naked eye, strings look mutable. What you cannot see, however, is that a new string was created when the operation "s + 'def'" was performed, and that the new object was then assigned back to s. The old string 'abc' was left for Python to deallocate once nothing else referred to it.

Once again, we can use the id() built-in function to help show us exactly what happened. If you recall, id() returns the "identity" of an object. This value is as close to a "memory address" as we can get in Python.

>>> s = 'abc'

Note how the identities are different for the string before and after the update. Another test of mutability is to try to modify individual characters or substrings of a string. We will now show how any update of a single character or a slice is not allowed:

>>> s

Both operations result in an error.
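The identity change and the two failing updates can be reproduced with a short script. This is a sketch assuming Python 3; the index 2 and the slice 3:6 are illustrative choices, and the extra name `original` is introduced only so that the old object stays alive while the two identities are compared.

```python
# Assumption: Python 3. Names and indices here are illustrative.
s = "abc"
original = s              # second reference keeps the old object alive
s = s + "def"             # creates a new string object and rebinds s

identities_differ = id(s) != id(original)

errors = []

# Attempt to update a single character.
try:
    s[2] = "C"
except TypeError as exc:
    errors.append(type(exc).__name__)

# Attempt to update a slice.
try:
    s[3:6] = "DEF"
except TypeError as exc:
    errors.append(type(exc).__name__)
```

Both attempts raise TypeError, and `s` is left unchanged as 'abcdef'.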
In order to perform the actions that we want, we have to create new strings from substrings of the existing string, then assign those new strings back to s:

>>> s

So for immutable objects like strings, we observe that the only valid expression on the left-hand side of an assignment (to the left of the equals sign [ = ]) is the variable representing an entire object such as a string, not a single character or substring. There is no such restriction on the expression on the right-hand side.
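As a sketch of that workaround (assuming Python 3 and the illustrative starting value 'abcdef'), each "update" builds a new string from slices of the old one and rebinds the whole variable:

```python
# Assumption: Python 3. The starting value and the replacements are illustrative.
s = "abcdef"

# "Replace" the character at index 2 by joining the pieces around it.
s = s[:2] + "C" + s[3:]
after_char_update = s

# "Replace" the tail from index 3 onward by joining a prefix with new text.
s = s[:3] + "DEF"
after_slice_update = s
```

Slices appear only on the right-hand side here; the left-hand side is always the whole variable `s`.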
Tuesday, October 27, 2009