PHP - tutorial - 11 - regular expressions (RegEx)


A regular expression is a sequence of characters that forms a search pattern.

When you search for data in a text, you can use this search pattern to describe what you are searching for.

A regular expression can be a single character, or a more complicated pattern. Regular expressions can be used to perform all types of text search and text replace operations.

In PHP, regular expressions are strings composed of delimiters, a pattern and optional modifiers.

Syntax: $exp = "/xxxxxxxxxxxxx/i";

/ is the delimiter,
xxxxxxxxxxxx is the pattern that is being searched for,
i is a modifier that makes the search case-insensitive.

The delimiter can be any character that is not a letter, number, backslash or space. The most common delimiter is the forward slash (/), but when your pattern contains forward slashes it is convenient to choose other delimiters such as # or ~.
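As a small illustration, the two patterns in this sketch search for the same thing. The file path is made up. The second pattern uses # as the delimiter, so the slashes inside it no longer need to be escaped. Both use the i modifier, so ".txt" also matches ".TXT".

```php
<?php
$path = "/home/user/docs/report.TXT";

// With "/" as the delimiter, each "/" inside the pattern must be escaped as "\/".
$slashPattern = '/\/docs\/.+\.txt$/i';

// With "#" as the delimiter, the slashes can be written as they are.
$hashPattern = '#/docs/.+\.txt$#i';

// The "i" modifier makes ".txt" match ".TXT" as well.
$a = preg_match($slashPattern, $path);
$b = preg_match($hashPattern, $path);

echo $a, " ", $b; // prints "1 1"
```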

PHP provides a variety of functions that allow you to use regular expressions.

The most commonly used ones are:

preg_match() - returns 1 if the pattern was found in the string, 0 if it was not, and false if an error occurred.

example:

1
code:
                <?php
                    $str = "Visit my interesting and valued website";
                    $pattern = "/website/i";
                    echo preg_match($pattern, $str); 
                ?>
            

preg_match_all() - returns the number of times the pattern was found in the string (which may be 0), or false if an error occurred.

example:

4
code:
                <?php
                    $str = "The rain in SPAIN falls mainly on the plains.";
                    $pattern = "/ain/i";
                    echo preg_match_all($pattern, $str);
                ?>
            

preg_replace() - returns a new string where matched patterns have been replaced with another string.

example:

Visit my Website!
code:
                <?php
                    $str = "Visit Microsoft!";
                    $pattern = "/microsoft/i";
                    echo preg_replace($pattern, "my Website", $str);
                ?>
            

Search patterns can be adjusted in several ways.

Modifiers can change how a search is performed:

i - performs a case-insensitive search.

m - performs a multiline search (patterns that search for the beginning or end of a string will match the beginning or end of each line).

u - treats the pattern and the input string as UTF-8, so a multibyte character such as é is matched as one character.
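The sketch below compares searches with and without the m and u modifiers. The sample strings are invented for illustration. "caf\u{00E9}" uses PHP's \u{...} string escape (PHP 7.0 or later) to produce the UTF-8 bytes of "café", where "é" takes two bytes.

```php
<?php
$text = "first line\nsecond line";

// Without "m", ^ only matches at the very start of the string: just "first".
$withoutM = preg_match_all('/^\w+/', $text);
// With "m", ^ also matches right after each newline: "first" and "second".
$withM = preg_match_all('/^\w+/m', $text);

echo $withoutM, " ", $withM, "\n"; // prints "1 2"

// Without "u", "." matches single bytes; with "u", it matches whole characters.
$word = "caf\u{00E9}";
$bytes = preg_match_all('/./', $word);
$chars = preg_match_all('/./u', $word);

echo $bytes, " ", $chars; // prints "5 4"
```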

Brackets are used to find a range of characters:

[abc] - find one character from the options between the brackets

[^abc] - find any character NOT between the brackets

[0-9] - find one character from the range 0 to 9.
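A short sketch applying the three kinds of bracket expressions to a made-up product code:

```php
<?php
$code = "A7-b9";

// [abc]: one character out of a, b or c (case-insensitive here, so "A" matches).
$hasAbc = preg_match('/[abc]/i', $code);

// [^...]: one character that is NOT a letter or a digit (the "-" matches).
$hasSymbol = preg_match('/[^a-z0-9]/i', $code);

// [0-9]: a single digit; preg_match_all() finds every one of them.
$digitCount = preg_match_all('/[0-9]/', $code, $matches);

echo $hasAbc, " ", $hasSymbol, " ", $digitCount, "\n"; // prints "1 1 2"
print_r($matches[0]); // the digits "7" and "9"
```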

Metacharacters are characters with a special meaning:

| - find a match for any one of the patterns separated by |, as in: cat|dog|fish.

. - find any single character except a newline.

^ - find a match at the beginning of a string, as in: ^Hello.

$ - find a match at the end of a string, as in: World$.

\d - find a digit.

\s - find a whitespace character.

\b - find a match at the beginning of a word like this: \bWORD, or at the end of a word like this: WORD\b.

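\x{xxxx} - find the Unicode character specified by the hexadecimal code point xxxx (use it together with the u modifier; unlike JavaScript, PHP does not support the \uxxxx form).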
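The following sketch tries several of these metacharacters on invented sample sentences. The patterns are written in single quotes so that PHP does not try to interpret the backslashes or $ signs itself.

```php
<?php
$sentence = "Order 66 shipped to cat and dog owners";

// | : any one of the alternatives ("cat" and "dog" are both found).
$animals = preg_match_all('/cat|dog|fish/', $sentence);

// \d followed by \s : a digit directly followed by whitespace ("6 ").
$digitThenSpace = preg_match('/\d\s/', $sentence);

// ^ and $ : the match must start at the beginning and end at the end of the string.
$anchored = preg_match('/^Order.*owners$/', $sentence);

// \b on both sides: "cat" as a whole word only, so "category" and "bobcat" do not count.
$wholeWord = preg_match_all('/\bcat\b/', "cat category bobcat");

// \x{00E9} with the u modifier: the character U+00E9 ("é").
$accent = preg_match('/\x{00E9}/u', "caf\u{00E9}");

echo "$animals $digitThenSpace $anchored $wholeWord $accent"; // prints "2 1 1 1 1"
```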

Quantifiers define quantities:

n+ - matches any string that contains at least one n.

n* - matches any string that contains zero or more occurrences of n.

n? - matches any string that contains zero or one occurrences of n.

n{x} - matches any string that contains a sequence of exactly x n's.

n{x,y} - matches any string that contains a sequence of x to y n's.

n{x,} - matches any string that contains a sequence of at least x n's.
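A minimal sketch of the quantifiers on invented words and numbers. The ^ and $ anchors are added so that each pattern has to describe the whole word.

```php
<?php
$samples = ["color", "colour", "colouur"];
$optional = [];
$range = [];

foreach ($samples as $word) {
    // u? : zero or one "u", so "color" and "colour" match but "colouur" does not.
    $optional[$word] = preg_match('/^colou?r$/', $word);
    // u{1,2} : one or two "u"s, so "colour" and "colouur" match.
    $range[$word] = preg_match('/^colou{1,2}r$/', $word);
}
print_r($optional);
print_r($range);

// \d+ : one or more digits; \d{4} between word boundaries: exactly four digits.
$text = "7 apples, 42 pears, 2024";
preg_match_all('/\d+/', $text, $numbers);
$years = preg_match_all('/\b\d{4}\b/', $text);

print_r($numbers[0]); // 7, 42 and 2024
echo $years;          // prints 1
```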

You can use parentheses ( ) to apply quantifiers to entire patterns. They also capture the part of the string matched by the enclosed subpattern, so that part can be retrieved separately from the whole match.

example:

1
code:
               <?php
                    $str = "Apples and bananas.";
                    $pattern = "/ba(na){2}/i";
                    echo preg_match($pattern, $str);
                ?>      
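To see what the parentheses capture, preg_match() can be given a third argument. In this sketch (the date string is invented), each group of the date pattern becomes its own element of $matches. For a repeated group such as (na){2}, the element holds only the last repetition.

```php
<?php
$date = "Released on 2024-03-15.";

// Three groups capture the year, month and day separately.
if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $date, $matches)) {
    // $matches[0] is the whole match; $matches[1] to $matches[3] are the groups.
    echo "Whole match: {$matches[0]}\n";
    echo "Year: {$matches[1]}, month: {$matches[2]}, day: {$matches[3]}\n";
}

// (na){2} repeats the pair "na"; the group keeps its last repetition.
preg_match('/ba(na){2}/', "bananas", $m);
print_r($m); // "banana" and "na"
```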
            

PHP - regular expression functions

preg_filter() - returns a string or an array with pattern matches replaced, but only if matches were found.

Syntax: preg_filter(pattern, replacement, input, limit, count)

Parameter values:

pattern - required; contains a regular expression indicating what to search for.

replacement - required; a string which will replace the matched patterns. It may contain backreferences, such as $0 for the whole match or $1 for the first group.

input - required; a string or array of strings in which the replacements are being performed.

limit - optional; defaults to -1, meaning unlimited. Sets a limit to how many replacements can be done in each string.

count - optional; after the function has executed, this variable will contain a number indicating how many replacements were performed.

example:

Array ( [0] => It is (5) o'clock [1] => (40) days [3] => In the year (2000) )
code:
               <?php
                    $input = [
                        "It is 5 o'clock",
                        "40 days",
                        "No numbers here",
                        "In the year 2000"
                    ];

                    $result = preg_filter('/[0-9]+/', '($0)', $input);
                    print_r($result);
                ?>      
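The difference between preg_filter() and preg_replace() is easiest to see with a single string that contains no match. This sketch uses invented strings. It also passes the optional limit (-1, unlimited) and count arguments.

```php
<?php
$text = "No numbers here";

// preg_replace() returns the input unchanged when nothing matches...
$replaced = preg_replace('/[0-9]+/', '($0)', $text);
// ...while preg_filter() returns null for a string input with no match.
$filtered = preg_filter('/[0-9]+/', '($0)', $text);

var_dump($replaced); // string(15) "No numbers here"
var_dump($filtered); // NULL

// The count argument reports how many replacements were made.
$result = preg_filter('/[0-9]+/', '($0)', "5 to 9", -1, $count);
echo $result, " ", $count; // prints "(5) to (9) 2"
```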
            

preg_grep() - returns an array consisting only of elements from the input array which matched the pattern.

Syntax: preg_grep(pattern, input, flags)

Parameter values:

pattern - required; contains a regular expression indicating what to search for.

input - required; an array of strings.

flags - optional; there is only one flag for this function. Passing the constant PREG_GREP_INVERT will make the function return only items that do not match the pattern.

example:

Array ( [1] => Pink [4] => Purple )
code:
               <?php
                    $input = [
                        "Red",
                        "Pink",
                        "Green",
                        "Blue",
                        "Purple"
                    ];

                    $result = preg_grep("/^p/i", $input);
                    print_r($result);
                ?>      
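A sketch of the PREG_GREP_INVERT flag on the same list of colors. The original keys are preserved in the result. array_values() is one way to renumber them when sequential keys are needed.

```php
<?php
$colors = ["Red", "Pink", "Green", "Blue", "Purple"];

// PREG_GREP_INVERT keeps only the elements that do NOT match.
$notP = preg_grep('/^p/i', $colors, PREG_GREP_INVERT);
print_r($notP); // keys 0, 2 and 3 are preserved

// array_values() renumbers the keys starting from 0.
$renumbered = array_values($notP);
print_r($renumbered);
```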
            

preg_last_error() - returns an error code indicating the reason that the most recent regular expression call failed.

The returned value will match one of the following constants:

PREG_NO_ERROR - no error occurred.

PREG_INTERNAL_ERROR - there was an error evaluating the expression.

PREG_BACKTRACK_LIMIT_ERROR - the number of backtracks needed to evaluate the expression exceeded the limit set by the pcre.backtrack_limit configuration option.

PREG_RECURSION_LIMIT_ERROR - the recursion depth needed to evaluate the expression exceeded the limit set by the pcre.recursion_limit configuration option.

PREG_BAD_UTF8_ERROR - the input string contained invalid UTF-8 data.

PREG_BAD_UTF8_OFFSET_ERROR - during evaluation, a string offset did not point to the first character of a multibyte UTF-8 symbol.

PREG_JIT_STACKLIMIT_ERROR - the JIT compiler ran out of stack memory when trying to evaluate the expression.

Syntax: preg_last_error()

example:

Invalid regular expression.
code:
               <?php
                    $str = 'The regular expression is invalid.';
                    // The extra "/" after the closing delimiter is read as an unknown modifier
                    $pattern = '/invalid//';
                    // @ suppresses the warning PHP emits for the broken pattern
                    $match = @preg_match($pattern, $str, $matches);

                    if ($match === false) {
                        // An error occurred
                        $err = preg_last_error();
                        if ($err == PREG_INTERNAL_ERROR) {
                            echo 'Invalid regular expression.';
                        }
                    } else if ($match) {
                        // A match was found
                        echo $matches[0];
                    } else {
                        // No matches were found
                        echo 'No matches found';
                    }
                ?>      
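Errors are not limited to broken patterns. This sketch (with an invented input string) passes a byte that can never occur in UTF-8 to a pattern that uses the u modifier. preg_match() then returns false, and preg_last_error() reports PREG_BAD_UTF8_ERROR. The next successful call resets the code to PREG_NO_ERROR.

```php
<?php
// The byte "\xFF" never appears in valid UTF-8, so the "u" modifier rejects the input.
$badInput = "abc\xFF";
$result = preg_match('/abc/u', $badInput);
$errorCode = preg_last_error();

if ($result === false && $errorCode === PREG_BAD_UTF8_ERROR) {
    echo "The input is not valid UTF-8.";
}

// A successful call resets the error code.
preg_match('/abc/u', "abc");
$afterSuccess = preg_last_error();
```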
            

preg_match() - finds the first match of a pattern in a string.

Syntax: preg_match(pattern, input, matches, flags, offset)

Parameter values:

pattern - required; contains a regular expression indicating what to search for.

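input - required; the string in which the search will be performed.

matches - optional; the variable used in this parameter will be populated with an array describing the first match: element 0 contains the text that matched the whole pattern, and the following elements contain the text matched by each parenthesized group.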

flags - optional; a set of options that change how the matches array is structured:
- PREG_OFFSET_CAPTURE - each match, instead of being a string, will be an array where the first element is the matched substring and the second element is the position (in bytes) of its first character in the input;
- PREG_UNMATCHED_AS_NULL - unmatched subpatterns will be returned as NULL instead of as an empty string.

offset - optional; defaults to 0. Indicates how far into the string to begin searching. The preg_match() function will not find matches that occur before the position given in this parameter.

example:

1
code:
               <?php
                    $str = "Visit W3Schools";
                    $pattern = "/w3schools/i";
                    echo preg_match($pattern, $str);
                ?>      
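A sketch combining the flags and offset parameters on an invented sentence. PREG_OFFSET_CAPTURE reports where each match starts, and an offset of 1 makes the search skip the "Visit" at position 0.

```php
<?php
$str = "Visit W3Schools, then visit it again";

// PREG_OFFSET_CAPTURE pairs the match with its starting position.
preg_match('/visit/i', $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches); // [0] => Array ( [0] => Visit [1] => 0 )

// Starting at offset 1 skips the first "Visit", so the lowercase one is found.
preg_match('/visit/i', $str, $later, PREG_OFFSET_CAPTURE, 1);
echo $later[0][1]; // prints 22
```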
            

preg_match_all() - finds all matches of a pattern in a string.

Syntax: preg_match_all(pattern, input, matches, flags, offset)

Parameter values:

pattern - required; contains a regular expression indicating what to search for.

input - required; the string in which the search will be performed.

matches - optional; the variable used in this parameter will be populated with an array containing all of the matches that were found

flags - optional; a set of options that change how the matches array is structured. One of the following orderings may be selected:
- PREG_PATTERN_ORDER - default; each element in the matches array is an array of matches from the same group in the regular expression, with index 0 holding the matches of the whole expression and the remaining indices holding the subpattern matches;
- PREG_SET_ORDER - each element in the matches array contains the matches of all groups for one of the matches found in the string.
Any number of the following options may also be applied:
- PREG_OFFSET_CAPTURE - each match, instead of being a string, will be an array where the first element is the matched substring and the second element is the position (in bytes) of its first character in the input;
- PREG_UNMATCHED_AS_NULL - unmatched subpatterns will be returned as NULL instead of as an empty string.

offset - optional; defaults to 0. Indicates how far into the string to begin searching. The preg_match_all() function will not find matches that occur before the position given in this parameter.

example:

Array ( [0] => Array ( [0] => ain [1] => AIN [2] => ain [3] => ain ) )
code:
               <?php
                    $str = "The rain in SPAIN falls mainly on the plains.";
                    $pattern = "/ain/i";
                    if (preg_match_all($pattern, $str, $matches)) {
                        print_r($matches);
                    }
                ?>      
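The two orderings differ only in how the matches array is grouped. This sketch uses an invented string and a pattern with two groups (a word and a digit) to show both.

```php
<?php
$str = "rain:1 SPAIN:2 plains:3";

// Two groups: the word before the colon and the digit after it.
$pattern = '/(\w+):(\d)/';

// PREG_PATTERN_ORDER (default): $byPattern[1] holds every word, $byPattern[2] every digit.
preg_match_all($pattern, $str, $byPattern);

// PREG_SET_ORDER: one array per match, each holding [whole match, word, digit].
preg_match_all($pattern, $str, $bySet, PREG_SET_ORDER);

print_r($byPattern[1]); // rain, SPAIN, plains
print_r($bySet[0]);     // rain:1, rain, 1
```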
            

preg_replace() - returns a string where matches of a pattern (or an array of patterns) are replaced with a substring (or an array of substrings) in a given string.

Syntax: preg_replace(patterns, replacements, input, limit, count)

Parameter values:

patterns - required; a regular expression (or an array of regular expressions) indicating what to search for.

replacements - required; a string (or an array of strings) which will replace the matched patterns. It may contain backreferences such as $1 for the first group.

input - required; a string or array of strings in which the replacements are being performed.

limit - optional; defaults to -1, meaning unlimited. Sets a limit to how many replacements can be done in each string.

count - optional; after the function has executed, this variable will contain a number indicating how many replacements were performed.

example:

Visit W3Schools!
code:
               <?php
                    $str = 'Visit Microsoft!';
                    $pattern = '/microsoft/i';
                    echo preg_replace($pattern, 'W3Schools', $str);
                ?>      
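A sketch of the remaining preg_replace() features on invented strings: backreferences in the replacement, arrays of patterns and replacements applied pair by pair, and the limit and count parameters.

```php
<?php
// Backreferences: $1, $2 and $3 refer to the groups of the pattern.
$date = "2024-03-15";
$european = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $date);
echo $european, "\n"; // prints 15/03/2024

// Arrays: each pattern is replaced by the replacement at the same position.
$text = "I like cats and dogs";
$swapped = preg_replace(['/cats/', '/dogs/'], ['birds', 'fish'], $text);
echo $swapped, "\n"; // prints "I like birds and fish"

// limit = 1 replaces only the first match; count reports how many were made.
$once = preg_replace('/o/', '0', "foo boo", 1, $count);
echo $once, " ", $count; // prints "f0o boo 1"
```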
            

preg_replace_callback() - given an expression and a callback, returns a string where all matches of the expression are replaced with the substring returned by the callback.

Syntax: preg_replace_callback(pattern, callback, input, limit, count)

Parameter values:

pattern - required; contains a regular expression indicating what to search for.

callback - required; a callback function which returns the replacement. The callback function has one parameter containing an array of matches. The first element in the array contains the match for the whole expression while the remaining elements contain the matches for each of the groups in the expression.

input - required; a string or array of strings in which the replacements are being performed.

limit - optional; defaults to -1, meaning unlimited. Sets a limit to how many replacements can be done in each string.

count - optional; after the function has executed, this variable will contain a number indicating how many replacements were performed.

example:

Welcome(7) to(2) W3Schools.com(13)!
code:
               <?php
                    function countLetters($matches) {
                        return $matches[0] . '(' . strlen($matches[0]) . ')';
                    }

                    $input = "Welcome to W3Schools.com!";
                    $pattern = '/[a-z0-9\.]+/i';
                    $result = preg_replace_callback($pattern, 'countLetters', $input);
                    echo $result;
                ?>      
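The callback can also be an anonymous function, and it can compute replacements that a fixed replacement string could not express. This sketch uses an invented price list and doubles every number it finds.

```php
<?php
$prices = "apple 3, pear 5";

// The closure receives the matches array and returns the replacement text.
$result = preg_replace_callback(
    '/\d+/',
    function (array $matches): string {
        return (string) ((int) $matches[0] * 2);
    },
    $prices,
    -1,
    $count
);

echo $result, " (", $count, " replacements)"; // prints "apple 6, pear 10 (2 replacements)"
```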
            

preg_replace_callback_array() - given an array associating expressions with callbacks, returns a string where all matches of each expression are replaced with the substring returned by the callback.

Syntax: preg_replace_callback_array(patterns, input, limit, count)

Parameter values:

patterns - required; an associative array which associates regular expression patterns with callback functions. Each callback function has one parameter, an array of matches. The first element in the array contains the match for the whole expression while the remaining elements contain the matches for each of the groups in the expression.

input - required; a string or array of strings in which the replacements are being performed.

limit - optional; defaults to -1, meaning unlimited. Sets a limit to how many replacements can be done in each string.

count - optional; after the function has executed, this variable will contain a number indicating how many replacements were performed.

example:

There[5letter] are[3letter] 365[3digit] days[4letter] in[2letter] a[1letter] year[4letter].
code:
               <?php
                    function countLettersA($matches) {
                        return $matches[0] . '[' . strlen($matches[0]) . 'letter]';
                    }

                    function countDigits($matches) {
                        return $matches[0] . '[' . strlen($matches[0]) . 'digit]';
                    }

                    $input = "There are 365 days in a year.";
                    $patterns = [
                        '/\b[a-z]+\b/i' => 'countLettersA',
                        '/\b[0-9]+\b/' => 'countDigits'
                    ];
                    $result = preg_replace_callback_array($patterns, $input);
                    echo $result;
                ?>      
            

preg_split() - breaks a string into an array using matches of a regular expression as separators.

Syntax: preg_split(pattern, string, limit, flags)

Parameter values:

pattern - required; a regular expression determining what to use as a separator.

string - required; the string that is being split.

limit - optional; defaults to -1, meaning unlimited. Limits the number of elements that the returned array can have. If the limit is reached before all of the separators have been found, the rest of the string will be put into the last element of the array.

flags - optional; these flags change the returned array:
- PREG_SPLIT_NO_EMPTY - empty strings will be removed from the returned array;
- PREG_SPLIT_DELIM_CAPTURE - if the regular expression contains a group wrapped in parentheses, matches of this group will be included in the returned array;
- PREG_SPLIT_OFFSET_CAPTURE - each element in the returned array will be an array with two elements, where the first element is the substring and the second element is the position of its first character in the input string.

example:

Array ( [0] => 1970 [1] => 01 [2] => 01 [3] => 00 [4] => 00 [5] => 00 )
code:
               <?php
                    $date = "1970-01-01 00:00:00";
                    $pattern = "/[-\s:]/";
                    $components = preg_split($pattern, $date);
                    print_r($components);
                ?>      
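A sketch of the limit and flags parameters on invented inputs. Note how a doubled comma yields an empty element unless PREG_SPLIT_NO_EMPTY is given.

```php
<?php
$list = "apples, pears,,plums";

// Without flags, the doubled comma produces an empty element.
$plain = preg_split('/\s*,\s*/', $list);

// PREG_SPLIT_NO_EMPTY drops empty strings from the result.
$noEmpty = preg_split('/\s*,\s*/', $list, -1, PREG_SPLIT_NO_EMPTY);

// PREG_SPLIT_DELIM_CAPTURE keeps the text matched by the parenthesized group.
$withOps = preg_split('/([+-])/', "10+20-5", -1, PREG_SPLIT_DELIM_CAPTURE);

// limit = 2: only the first separator is used; the rest stays in the last element.
$limited = preg_split('/\s+/', "one two three", 2);

print_r($plain);   // apples, pears, "", plums
print_r($noEmpty); // apples, pears, plums
print_r($withOps); // 10, +, 20, -, 5
print_r($limited); // one, "two three"
```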
            

preg_quote() - escapes characters that have a special meaning in regular expressions by putting a backslash in front of them.

Syntax: preg_quote(input, delimiter)

Parameter values:

input - required; the string to be escaped.

delimiter - optional; defaults to null. This parameter expects a single character indicating which delimiter the regular expression will use. When provided, instances of this character in the input string will also be escaped with a backslash.

example:

The input is a URL.
code:
               <?php
                    $search = preg_quote("://", "/");
                    $input = 'https://www.w3schools.com/';
                    $pattern = "/$search/";
                    if (preg_match($pattern, $input)) {
                        echo "The input is a URL.";
                    } else {
                        echo "The input is not a URL.";
                    }
                ?>
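preg_quote() matters most when part of a pattern comes from outside the program, for example user input. In this sketch the search text is hypothetical. Without escaping, "." and "?" act as metacharacters, so "105" is matched as well. With escaping, only the literal text "1.5?" is found.

```php
<?php
$userInput = "1.5?";
$text = "Is the ratio 1.5? Or 105?";

// Unescaped: "." means any character and "?" makes the "5" optional.
$unsafe = preg_match_all('/' . $userInput . '/', $text);

// Escaped: preg_quote() turns the input into "1\.5\?", a literal search.
$safe = preg_match_all('/' . preg_quote($userInput, '/') . '/', $text);

echo $unsafe, " ", $safe; // prints "2 1"
```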