Monday, January 11, 2010

Recipe 21.5. Avoiding Regular Expressions













21.5.1. Problem


You want to improve script performance by optimizing string-matching operations.




21.5.2. Solution


Replace unnecessary regular expression calls with faster string and character type function alternatives.




21.5.3. Discussion


A common source of unnecessary computation is the use of regular expression functions when they are not needed. For example, suppose you're validating a form submission and want to make sure that the username contains only alphanumeric characters.


A common approach to this problem is a regular expression:


<?php
if (!preg_match('/^[a-z0-9]*$/i', $username)) {
    echo 'please enter a valid username.';
}
?>



The same test can be performed much faster with the ctype_alnum() function.


Using code-timing techniques covered in Recipe 21.1, let's compare the above test with ctype_alnum():


<?php
$username = 'foo411';

$start = microtime(true);

// The closing $ anchor is required; without it the pattern matches any string.
if (!preg_match('/^[a-z0-9]*$/i', $username)) {
    echo 'please enter a valid username';
}

$regextime = microtime(true) - $start;

$start = microtime(true);

if (!ctype_alnum($username)) {
    echo 'please enter a valid username';
}

$ctypetime = microtime(true) - $start;

echo "preg_match took: $regextime seconds\n";
echo "ctype_alnum took: $ctypetime seconds\n";
?>



This will output results similar to:


preg_match took:  0.000163078308105 seconds
ctype_alnum took: 9.05990600586E-06 seconds



ctype_alnum() is considerably faster: 9.05990600586E-06 is scientific notation for roughly 0.00000906 seconds, which makes it about 18 times faster than the preg_match() call in this run, with the same result for this input.
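The two checks agree on 'foo411', but they are not identical on every input, which matters when swapping one for the other. The sketch below runs both over a few sample strings chosen for illustration (they are not from the recipe). It also adds a stricter pattern, /\A[a-z0-9]+\z/i, which is an assumption about what the validation is meant to accept rather than part of the original test. It assumes the default "C" locale, since the ctype functions follow the current locale.

```php
<?php
// Hypothetical sample inputs chosen to probe edge cases.
$samples = ['foo411', 'foo 411', '', "foo411\n"];

$results = [];
foreach ($samples as $sample) {
    $results[] = [
        'input'  => $sample,
        // The pattern used in the recipe: * accepts the empty string,
        // and $ also matches just before a trailing newline.
        'regex'  => preg_match('/^[a-z0-9]*$/i', $sample) === 1,
        // Stricter variant: at least one character, anchored to the true end.
        'strict' => preg_match('/\A[a-z0-9]+\z/i', $sample) === 1,
        'ctype'  => ctype_alnum($sample),
    ];
}

foreach ($results as $row) {
    printf(
        "%-12s regex: %-5s strict: %-5s ctype: %s\n",
        json_encode($row['input']),
        var_export($row['regex'], true),
        var_export($row['strict'], true),
        var_export($row['ctype'], true)
    );
}
```

With these inputs, the recipe's pattern accepts the empty string and the string with a trailing newline, while ctype_alnum() rejects both. The stricter pattern agrees with ctype_alnum() on all four.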


When applied to a complex application, replacing unnecessary regular expressions with equivalent alternatives can add up to a significant performance gain.
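A single timed call is noisy, so to estimate how the saving adds up over many calls it can help to time each check in a loop. The following is a minimal sketch under stated assumptions: time_check() is a hypothetical helper, not a built-in function, and the iteration count is arbitrary. Both checks are wrapped in closures so that each pays the same call overhead.

```php
<?php
/**
 * Call $check on $input $iterations times and return the elapsed seconds.
 * time_check() is a hypothetical helper written for this illustration.
 */
function time_check(callable $check, $input, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $check($input);
    }
    return microtime(true) - $start;
}

$username   = 'foo411';
$iterations = 100000; // arbitrary; large enough to smooth out timer noise

$regexTime = time_check(function ($s) {
    return preg_match('/^[a-z0-9]*$/i', $s) === 1;
}, $username, $iterations);

$ctypeTime = time_check(function ($s) {
    return ctype_alnum($s);
}, $username, $iterations);

printf("preg_match:  %.6f seconds for %d calls\n", $regexTime, $iterations);
printf("ctype_alnum: %.6f seconds for %d calls\n", $ctypeTime, $iterations);
```

The ratio seen in a loop may differ from a single-call measurement, partly because PHP caches compiled patterns after their first use, so it is worth measuring on the target system rather than relying on one figure.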


A good litmus test for deciding whether you need a regular expression is whether the match you're performing can be explained in a brief sentence. Granted, some matches, such as "string is a valid email address," cannot be adequately verified without a complex regular expression. However, "check if string A contains string B" can be tested with several different approaches and is ultimately a very simple test that does not require regular expressions:


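<?php
$haystack = 'The quick brown fox jumps over the lazy dog';
$needle = 'lazy dog';

// slowest (ereg() was deprecated in PHP 5.3 and removed in PHP 7.0,
// so it is shown commented out)
// if (ereg($needle, $haystack)) echo 'match!';

// slow (preg_quote() escapes any regex metacharacters in $needle)
if (preg_match('/' . preg_quote($needle, '/') . '/', $haystack)) echo 'match!';

// fast
if (strstr($haystack, $needle)) echo 'match!';

// fastest
if (strpos($haystack, $needle) !== false) echo 'match!';
?>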
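The strict !== false comparison in the strpos() test is doing real work. strpos() returns the position of the match, and that position is 0 when the needle sits at the very start of the haystack. The short sketch below uses a sample haystack chosen for illustration to show why a loose truthiness check would misreport that case.

```php
<?php
$haystack = 'lazy dog sleeps';
$needle   = 'lazy dog';

// strpos() returns 0 here because the match starts at the first character.
$position = strpos($haystack, $needle);

$looseCheck  = (bool) $position;     // false: 0 is treated as "no match"
$strictCheck = $position !== false;  // true: only a real false means "no match"

var_dump($position, $looseCheck, $strictCheck);
```

On PHP 8.0 and later, str_contains($haystack, $needle) expresses the same test directly and returns a boolean.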



It pays to check the ctype and string functions before committing to a regular expression, particularly if you're working on a section of code that will loop repeatedly.
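As an illustration of that check, here are a few matches that can each be described in a brief sentence, paired with a built-in function that covers them. The inputs and the regular expressions in the comments are examples chosen for this sketch, not taken from the recipe, and the ctype results assume the default "C" locale.

```php
<?php
$input = '2010';

// "string contains only digits"; regex equivalent: /\A[0-9]+\z/
$onlyDigits = ctype_digit($input);

// "string contains only letters"; regex equivalent: /\A[a-z]+\z/i
$onlyLetters = ctype_alpha($input);

// "string starts with a given prefix"; regex equivalent: /\A20/
$prefix     = '20';
$startsWith = strncmp($input, $prefix, strlen($prefix)) === 0;

// "string contains a substring, ignoring case"; regex equivalent: /cookbook/i
$containsIgnoreCase = stripos('PHP Cookbook', 'cookbook') !== false;

var_dump($onlyDigits, $onlyLetters, $startsWith, $containsIgnoreCase);
```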




21.5.4. See Also


Documentation on ctype functions at http://www.php.net/manual/en/ref.ctype.php; on string functions at http://www.php.net/manual/en/ref.strings.php; on regular expression functions at http://www.php.net/manual/en/ref.pcre.php.












