preg_match
(PHP 3 >= 3.0.9, PHP 4, PHP 5)
preg_match -- Perform a regular expression match
Description
mixed preg_match ( string pattern, string subject [, array &matches [, int flags [, int offset]]] )
Searches subject for a match to the regular
expression given in pattern.
If matches is provided, then it is filled with the
results of the search. $matches[0] will contain the text
that matched the full pattern, $matches[1] will have
the text that matched the first captured parenthesized subpattern, and so
on.
flags can be the following flag:
- PREG_OFFSET_CAPTURE
If this flag is passed, the string offset of every match is returned
along with the matched text. Note that this changes the value of
matches into an array where every element is an array consisting of
the matched string at index 0 and its string offset into
subject at index 1. This
flag is available since PHP 4.3.0.
The flags parameter is available since
PHP 4.3.0.
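As a small sketch of what the flag changes (the subject string and variable names here are only illustrative), each entry of $matches becomes a pair of matched text and offset:

```php
<?php
// Illustrative subject; any string works.
$subject = "PHP is the web scripting language of choice.";

// With PREG_OFFSET_CAPTURE every entry of $matches is array(text, offset).
$found = preg_match('/(web) (scripting)/', $subject, $matches, PREG_OFFSET_CAPTURE);

if ($found === 1) {
    list($text, $offset) = $matches[0];
    echo "Full match '$text' starts at offset $offset\n";
    echo "Second subpattern '{$matches[2][0]}' starts at offset {$matches[2][1]}\n";
}
```

Code that previously read $matches[1] as a string has to read $matches[1][0] once the flag is passed.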
Normally, the search starts from the beginning of the subject string. The
optional parameter offset can be used to specify
an alternate position (in bytes) from which to start the search.
The offset parameter is available since
PHP 4.3.3.
Note:
Using offset is not equivalent to
passing substr($subject, $offset) to
preg_match() in place of the subject string, because
pattern can contain assertions such as
^, $ or
(?<=x). Compare:
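The difference can be sketched with a short string and an anchored pattern; the values "abcdef" and '/^def/' are chosen only for illustration:

```php
<?php
$subject = "abcdef";
$pattern = '/^def/';

// Start searching at byte 3 of the original string. "^" still refers to
// the real start of $subject, so the pattern cannot match.
$withOffset = preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);   // empty array

// Search a copy that really begins at "def". Now "^" matches.
$withSubstr = preg_match($pattern, substr($subject, 3), $matches2, PREG_OFFSET_CAPTURE);
print_r($matches2);  // one entry: "def" at offset 0
```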
preg_match() returns the number of times
pattern matches. That will be either 0 times
(no match) or 1 time because preg_match() will stop
searching after the first match. preg_match_all()
on the contrary will continue until it reaches the end of
subject.
preg_match() returns FALSE if an error occurred.
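Because 0 and FALSE both evaluate to false in a plain if test, telling "no match" apart from "error" needs a strict comparison. A minimal sketch, where describe_match() is a hypothetical helper and not part of PHP:

```php
<?php
// Hypothetical helper: reports which of the three outcomes occurred.
function describe_match($pattern, $subject)
{
    // "@" hides the warning PHP raises for a malformed pattern, so the
    // return value alone tells us what happened.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return "error";
    }
    return $result === 1 ? "match" : "no match";
}

echo describe_match('/php/i', 'PHP is great'), "\n";      // match
echo describe_match('/perl/', 'PHP is great'), "\n";      // no match
echo describe_match('/unclosed(/', 'PHP is great'), "\n"; // error
```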
Tip:
Do not use preg_match() if you only want to check if
one string is contained in another string. Use
strpos() or strstr() instead as
they will be faster.
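A short sketch of the plain-string alternative, using an illustrative sentence as the haystack. Note the strict comparison: a match at position 0 is a valid result.

```php
<?php
$haystack = "PHP is the web scripting language of choice.";

// strpos() returns the position of the first occurrence, or FALSE.
// Position 0 is a valid hit, so compare with !== false.
$containsPhp = strpos($haystack, "PHP") !== false;

// stristr() is the case-insensitive counterpart of strstr().
$containsWeb = stristr($haystack, "WEB") !== false;

echo $containsPhp ? "Found 'PHP'\n" : "No 'PHP'\n";
echo $containsWeb ? "Found 'web'\n" : "No 'web'\n";
```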
Example 1. Find the string of text "php"
<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>
Example 2. Find the word "web"
<?php
// The \b in the pattern indicates a word boundary, so only the distinct
// word "web" is matched, and not part of a longer word such as "website"
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>
Example 3. Getting the domain name out of a URL
<?php
// Get the host name from the URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
    "http://www.php.net/index.html", $matches);
$host = $matches[2];
// Get the last two segments of the host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
This example will produce:
domain name is: php.net
See also preg_match_all(),
preg_replace(), and
preg_split().
User Contributed Notes
preg_match
Chortos-2
14-May-2005 03:30
max wrote a fix for satch666's function, but it too has a little bug... If you write IP 09.111.111.1, it will return TRUE.
<?php
$num = "(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])";
// Note: $ip_addr, not $$ip_addr (which would be a variable variable)
if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr)) echo "Wrong IP Address\n";
?>
P.S. Why did you write [0-9] and not \d?
Gaspard
10-May-2005 04:47
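In case someone needs it: this validates a birth date in DDMMYYYY format (two-digit day, two-digit month, four-digit year).
<?php
// Day 01-31, month 01-12, year 1900-2005. The pattern is kept on one line
// because a line break inside it would be matched literally.
if (preg_match("/^(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[0-2])(19\d{2}|200[0-5])$/", $date))
    echo "Ok";
?>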
berndt at www dot michael - berndt dot de
01-May-2005 08:02
DKing
29-Apr-2005 05:08
This works well for checking whether someone has kept your copyright notice in their files. You could write the check and then do something like:
<?php
// The subject has to be the file's contents, not its name
if (!preg_match("<copyright stuff here>", file_get_contents("file.php")))
{
    die("You have not retained the full copyright! Please restore it as seen below!<BR /><BR /><copyright stuff here>");
}
?>
And then it would block them from doing anything until:
(1: They put it back in
-or-
(2: They remove the above lines of code! :P
21-Apr-2005 06:37
If you are using an older version of PHP, you will find that preg_match(",", "foo,bar") works as one might like. However, for newer versions, this needs to be preg_match("/,/", "foo,bar"). If this is the problem, you'll get an odd warning about a missing delimiter.
MikeS
08-Apr-2005 03:34
For anyone looking for information about preg_match crashes on long strings, I may have a solution for you. After wasting 2 hours I finally found out that it is a bug in PCRE and not a problem with my input data or regex. In my case I was able to turn on ungreedy matching (the U modifier) and it worked fine! Before, my regex would crash on strings of around 1800 chars. With no modification to the regex aside from the ungreedy modifier, I ran it on strings up to 500,000 chars long! (Not that it crashed at 500K, I just stopped trying to find a limit after that.)
Of course this "fix" depends on the nature of regex and what you're trying to do.
Hope this helps someone!
max at clnet dot cz
07-Apr-2005 09:40
satch666 wrote a fix for the function valid_ipv4(), but it does not work well. I think this code really does the job.
<?php
$num="([0-9]|[0-9]{2}|1\d\d|2[0-4]\d|25[0-5])";
// Note: $ip_addr, not $$ip_addr (which would be a variable variable)
if (!preg_match("/^$num\.$num\.$num\.$num$/", $ip_addr)) echo "Wrong IP Address\n";
?>
carsten at senseofview dot de
14-Mar-2005 05:57
The ExtractString function does not have a real error, but it misbehaves in some cases. What if it is called like this:
ExtractString($row, 'action="', '"');
It would find 'action="' correctly, but not necessarily the first " after the $start string. If $row consists of
<form method="post" action="script.php">
strpos($str_lower, $end) would return the first " in the method attribute. So I made some modifications and it seems to work fine.
function ExtractString($str, $start, $end)
{
    $str_low = strtolower($str);
    $pos_start = strpos($str_low, $start);
    $pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));
    if ( ($pos_start !== false) && ($pos_end !== false) )
    {
        $pos1 = $pos_start + strlen($start);
        $pos2 = $pos_end - $pos1;
        return substr($str, $pos1, $pos2);
    }
}
erm(at)the[dash]erm/dot/com
11-Mar-2005 03:15
This is a modified version of the valid_ipv4 function that will test for a valid ip address with wild cards.
ie 192.168.0.*
or even 192.168.*.1
function valid_ipv4($ip_addr)
{
$num="(\*|[0-9]{1,3}|^1?\d\d$|2[0-4]\d|25[0-5])";
if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))
{
print_r ($matches);
return $matches[0];
} else {
return false;
}
}
info at reiner-keller dot de
12-Feb-2005 12:03
Pointing to the post of "internet at sourcelibre dot com": Instead of using PerlRegExp for e.g. german "Umlaute" like
<?php
$bolMatch = preg_match("/^[a-zA-ZäöüÄÖÜ]+$/", $strData);
?>
use the setlocale() function and a POSIX character class like
<?php
setlocale (LC_ALL, 'de_DE');
$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);
?>
This works for any country-specific special character set.
Remember that since domains containing umlauts became available, it is almost mandatory to update your regular expressions so that your forms accept e-mail and internet addresses that use such domains.
Life can be so easy when you read the manual ;-)
mikeblake a.t. akunno d.o.t net
24-Jan-2005 09:38
The author of ExtractString below has made an error (email at albert-martin dot com).
if (strpos($str_low, $start) !== false && strpos($str_lower, $end) !== false)
Should have been
if (strpos($str_low, $start) !== false && strpos($str_low, $end) !== false)
Note the slight variable name mistake at the second strpos
kalaxy at nospam dot gmail dot com
18-Jan-2005 09:20
This is another way of implementing array_preg_match. It also shows the use of the array_walk() and create_function() functions.
<?php
function array_preg_match($pattern, $subject, $retainkey = false){
$matches = ''; array_walk($subject,
create_function('$val, $key, $array',
'if (preg_match("' . $pattern . '", "$val")) $array['. ($retainkey ? '$key':'') .'] = $val;'),
&$matches);
return $matches;
}
?>
kalon mills
hfuecks at phppatterns dot com
13-Jan-2005 07:11
Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
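A small sketch of the effect described above, and of one way to turn the byte offset into a character offset. It assumes the mbstring extension is available; the subject string is illustrative:

```php
<?php
// "\xC3\xA4" is the UTF-8 encoding of "ä": two bytes, one character.
$subject = "\xC3\xA4\xC3\xA4b";

preg_match('/b/u', $subject, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][1];

// Count the characters that precede the match to get a character offset.
// mb_strlen() comes from the mbstring extension (an assumption here).
$charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```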
29-Dec-2004 02:44
This constant helps validate a US phone number without requiring one particular format.
The phone number can be in many variations of the following:
(Xxx) Xxx-Xxxx
(Xxx) Xxx Xxxx
Xxx Xxx Xxxx
Xxx-Xxx-Xxxx
XxxXxxXxxx
Xxx.Xxx.Xxxx
define( "REGEXP_PHONE", "/^(\(|){1}[2-9][0-9]{2}(\)|){1}([-. ]|)[2-9][0-9]{2}([-. ]|)[0-9]{4}$/" );
carboffin at msn dot com
23-Dec-2004 09:54
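Here's some quick code intended for validating URL variables or input strings.
<?php
// "+$" makes the whole string alphanumeric; without it only the first character is checked
if(preg_match("/^[a-z0-9]+$/i", $file)){
    // $file contains only letters and digits
}
?>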
satch666 at dot nospam dot hotmail dot com
17-Dec-2004 10:53
what a lapsus! where i said 'subpattern' at my post below, replace such by 'type of number' or by 'case';
satch666 at dot nospam dot hotmail dot com
17-Dec-2004 10:44
A fix for the function valid_ipv4() proposed by selt:
If you try, for example, the invalid IP 257.255.34.6, it is accepted as a valid IP and the result is 57.255.34.6.
The first number subpattern matches inside '257', because '57' is a valid string for the '1?\d\d' pattern; this happens because no logic for the string limits was added there.
I tried using '^1?\d\d$' and it works. In plain English: if the string has 3 characters, it starts with the digit '1' followed by 2 more digits, and the string ends there; if it has 2 characters, both are any digit; anything other than these 2 cases does not match the pattern. In other words, it defines the subrange of numbers from '10' to '199'.
So the function would look like this (after modifying the pattern and removing a variable called $range, which the function did not use):
<?
function valid_ipv4($ip_addr)
{
$num="([0-9]|^1?\d\d$|2[0-4]\d|25[0-5])";
if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))
{
return $matches[0];
} else {
return false;
}
}
?>
internet at sourcelibre dot com
03-Dec-2004 10:34
This helped me to make a mask for all French characters. Just modify $str in order to build your own mask.
<pre>
<?php
$str = "àÀâÂéÉèÈëËêÊîÎïÏôÔçÇ";
$strlen = strlen($str);
$array = array();
$mask = "/^[a-zA-Z";
for ($i = 0; $i < $strlen; $i++) {
$char = $str[$i];
$hexa = dechex(ord($char));
echo htmlentities($char)." = ". $hexa . "\n";
$array[$i] = $hexa;
$mask .= '\\x' . $hexa;
}
$mask .= " ]+$/";
echo $mask;
?>
</pre>
zubfatal, root at it dot dk
25-Nov-2004 07:56
<?php
function array_preg_match($strRegEx = "", $arrHaystack = NULL, $boolNewArray = 0, $boolMatchesOnly = 0) {
    if (strlen($strRegEx) < 1) {
        return "ERR: \$strRegEx argument is missing.";
    }
    elseif ((!is_array($arrHaystack)) || (count($arrHaystack) < 1)) {
        return "ERR: \$arrHaystack is empty, or not an array.";
    }
    else {
        // Start from an empty array so "no matches" returns array()
        // instead of an undefined variable
        $arrTmp = array();
        foreach ($arrHaystack as $key => $value) {
            if ($boolMatchesOnly) {
                // Collect the matched substrings only
                if (preg_match_all($strRegEx, $value, $tmpRes)) {
                    $arrTmp[] = $tmpRes;
                }
            }
            else {
                // Collect the matching elements, re-indexed or with their original keys
                if (preg_match($strRegEx, $value, $tmpRes)) {
                    if ($boolNewArray) { $arrTmp[] = $value; }
                    else { $arrTmp[$key] = $value; }
                }
            }
        }
        return $arrTmp;
    }
}
?>
// zubfatal
email at albert-martin dot com
23-Oct-2004 04:39
Here is a faster way of extracting a special phrase from a HTML page:
Instead of using preg_match, e.g. like this:
preg_match("/<title>(.*)<\/title>/i", $html_content, $match);
use the following:
<?php
function ExtractString($str, $start, $end) {
$str_low = strtolower($str);
if (strpos($str_low, $start) !== false && strpos($str_lower, $end) !== false) {
$pos1 = strpos($str_low, $start) + strlen($start);
$pos2 = strpos($str_low, $end) - $pos1;
return substr($str, $pos1, $pos2);
}
}
$match = ExtractString($html_content, "<title>", "</title>");
?>
j dot gizmo at aon dot at
09-Oct-2004 08:00
in reply to rchoudhury --} pinkgreetings {-- com....
the code pasted below (with the switch statement) CANNOT work.
the construct works like this
<?php
switch ($key)
{
case <expr>:
echo "1";
break;
}
switch (true)
{
case preg_match("/pattern/",$key):
blablablabla();
break;
}
?>
however, it makes no sense to compare $key to the return value of preg_match(), and calling preg_match without a second parameter is utterly senseless as well (PHP can't smell what you want to compare pattern to)
the syntax error in your regular expression is the double slash in the beginning.
(RTFM)
rchoudhury --} pinkgreetings {-- com
17-Aug-2004 11:57
I was looking for an easy way to match multiple conditions inside a switch, and preg_match() seemed like a straightforward solution:
<?php
foreach (func_get_arg(0) as $key => $value) {
switch ($key) {
case preg_match("//^(meta_keywords | meta_desc | doctype | xmlns | lang | dir | charset)$/"):
$this->g_page_vars[$key] = $value;
break 1;
case preg_match("//^(site_title|site_desc|site_css)$/"):
$this->g_page_vars[$key] = $g_site_vars[$key];
break 1;
}
}
?>
However, while it seemed to work on one server using php 4.3.8, where it accepted only one argument (pattern) and assumed the second one (subject) to be $key, another server running 4.3.8 breaks and returns an obvious warning of "Warning: preg_match() expects at least 2 parameters, 1 given".
You probably think "why not just give preg_match a second argument then?" -- well, if we were to do that it'd be $key in this context, but that returns this error: "Warning: Unknown modifier '^'". So now the regex is bad?
One possible solution may lie in php.ini settings, though since I don't have access to that file on either server I can't check and find out.
http://www.phpbuilder.com/lists/php-developer-list/2003101/0201.php has some comments and other suggestions for the same concept, namely in using:
<?php
switch(true) {
case preg_match("/regex/",$data):
}
?>
...but this doesn't address the current single argument problem.
Either way, it's a useful way of working a switch, but it might not work.
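A working form of the switch(true) idea, as a sketch only: classify_key() is a hypothetical helper, and the key names are taken from the snippet above. Each case passes both the pattern and the subject, and the patterns use a single leading delimiter with no spaces around the alternatives.

```php
<?php
// Hypothetical helper: sorts a key into one of two groups.
function classify_key($key)
{
    switch (true) {
        // Each case is an expression; the first one that is true wins.
        case preg_match('/^(meta_keywords|meta_desc|doctype|xmlns|lang|dir|charset)$/', $key) === 1:
            return 'page';
        case preg_match('/^(site_title|site_desc|site_css)$/', $key) === 1:
            return 'site';
        default:
            return 'unknown';
    }
}

echo classify_key('charset'), "\n";
echo classify_key('site_css'), "\n";
echo classify_key('other'), "\n";
```

Comparing each result with === 1 keeps a FALSE returned on error from being confused with a match.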
ebiven
06-Jul-2004 03:53
To regex a North American phone number you can assume NxxNxxXXXX, where N = 2 through 9 and x = 0 through 9. North American numbers cannot start with a 0 or a 1 in either the Area Code or the Office Code. So, adapted from the other phone number regex here, you would get:
/^[2-9][0-9]{2}[-][2-9][0-9]{2}[-][0-9]{4}$/
05-May-2004 09:23
A very simple Phone number validation function.
Returns the Phone number if the number is in the xxx-xxx-xxxx format. x being 0-9.
Returns false if missing digits or improper characters are included.
<?php
function VALIDATE_USPHONE($phonenumber)
{
    // The pattern stays on one line: a line break inside it would be matched literally
    if (preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $phonenumber)) {
        return $phonenumber;
    } else {
        return false;
    }
}
?>
selt
10-Feb-2004 05:11
Concerning a list of notes started on November 11; ie
<?
$num="([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
?>
It is interesting to note that the pattern matching is done using precedence from left to right, therefore; an address such as 127.0.0.127 sent to preg_match with a hash for the matched patterns would return 127.0.0.1.
so, to obtain a proper mechanism for stripping valid IPs from a string (any string that is) one would have to use:
<?
function valid_ipv4($ip_addr)
{
$num="(1?\d\d|2[0-4]\d|25[0-5]|[0-9])";
$range="([1-9]|1\d|2\d|3[0-2])";
if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))
{
return $matches[0];
} else {
return false;
}
}
?>
thanks for all the postings ! They're the best way to learn.
mark at portinc dot net
02-Feb-2004 08:30
<?php $iptables = file ('/proc/net/ip_conntrack');
$services = file ('/etc/services');
$GREP = '!([a-z]+) ' .'\\s*([^ ]+) ' .'([^ ]+) ' .'?([A-Z_]|[^ ]+)?'.' src=(.*?) ' .'dst=(.*?) ' .'sport=(\\d{1,5}) '.'dport=(\\d{1,5}) '.'src=(.*?) ' .'dst=(.*?) ' .'sport=(\\d{1,5}) '.'dport=(\\d{1,5}) '.'\\[([^]]+)\\] ' .'use=([0-9]+)!'; $ports = array();
foreach ($services as $s) {
    if (preg_match("/^([a-zA-Z-]+)\\s*([0-9]{1,5})\\//", $s, $x)) {
        $ports[$x[2]] = $x[1];
    }
}
// Use < rather than <= so the loop does not read past the last line
for ($i = 0; $i < count($iptables); $i++) {
    if (preg_match($GREP, $iptables[$i], $x)) {
        // Replace port numbers with service names where known
        $x[7] = (array_key_exists($x[7], $ports)) ? $ports[$x[7]] : $x[7];
        $x[8] = (array_key_exists($x[8], $ports)) ? $ports[$x[8]] : $x[8];
        print_r($x);
    }
}
?>
nico at kamensek dot de
17-Jan-2004 01:31
As I did not find any working IPv6 regexp, I just created one. Here it is:
$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';
$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';
$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';
$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';
$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';
$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';
$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';
Patterns 1 to 7 represent different cases. $full is the complete pattern, which should work for all correct IPv6 addresses. It is built by concatenation so that no line break ends up inside the pattern.
$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$"
      . "|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";
brion at pobox dot com
30-Nov-2003 09:35
Some patterns may cause the PCRE functions to crash PHP, particularly when dealing with relatively large amounts of input data.
See the 'LIMITATIONS' section of http://www.pcre.org/pcre.txt about this and other limitations.
thivierr at telus dot net
23-Nov-2003 03:23
A web server log record can be parsed as follows:
$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';
if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',
$line_in,
$elements))
{
print_r($elements);
}
Array
(
[0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
[1] => 209.6.145.47
[2] => -
[3] => -
[4] => 22/Nov/2003:19:02:30 -0500
[5] => GET
[6] => /dir/doc.htm
[7] => HTTP
[8] => 1.0
[9] => 200
[10] => 6776
[11] => "http://search.yahoo.com/search?p=key+words=UTF-8"
[12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
)
Notes:
1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because sometimes double quotes do appear in a referer URL. Double quotes can appear as %22 or \". Both have to be handled correctly. So, I strip off the double quotes in a second step.
2) The URLs should be further parsed using parse_url, which is quicker and more reliable than preg_match.
3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.
4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format. If the record is truncated, the agent field will not be delimited properly with a quote at the end. So, both cases must be handled.
5) A hyphen (- or "-") means a field has no value. It is necessary to convert these to an appropriate value (such as an empty string, null, or 0).
6) Finally, there should be appropriate code to handle malformed web log entries, which are common due to junk data. I never assume I've seen all cases.
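The second-step cleanup mentioned in notes 1, 2 and 5 could look roughly like this. It is only a sketch: clean_log_field() is a hypothetical helper, and mapping a hyphen to an empty string is just one of the conversions suggested above.

```php
<?php
// Hypothetical helper: cleans one field captured by the log pattern.
function clean_log_field($field)
{
    // Strip one pair of surrounding double quotes, if both are present.
    // A truncated record without the closing quote is left untouched.
    if (strlen($field) >= 2 && $field[0] === '"' && substr($field, -1) === '"') {
        $field = substr($field, 1, -1);
    }
    // A lone hyphen means the field has no value.
    return $field === '-' ? '' : $field;
}

$referer = clean_log_field('"http://search.yahoo.com/search?p=key+words=UTF-8"');

// parse_url() splits the cleaned URL into its components.
$parts = parse_url($referer);
echo $parts['host'], "\n";
echo clean_log_field('"-"') === '' ? "no referer\n" : "has referer\n";
```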
nospam at 1111-internet dot com
11-Nov-2003 02:29
Backreferences (ala preg_replace) work within the search string if you use the backslash syntax. Consider:
<?php
if (preg_match("/([0-9])(.*?)(\\1)/", "01231234", $match))
{
print_r($match);
}
?>
Result: Array ( [0] => 1231 [1] => 1 [2] => 23 [3] => 1 )
This is alluded to in the description of preg_match_all, but worth reiterating here.
bjorn at kulturkonsult dot no
31-Mar-2003 07:56
If you want to match all Scandinavian characters (æÆøØåÅöÖäÄ) in addition to those matched by \w, you might want to use this regexp:
/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/
Remember that \w respects the current locale used in PCRE's character tables.
kevin dot bro at hostedstuff dot com
05-Mar-2003 08:26
As Carlos points out (2 comments above) you don't want people to mess around with your regexp. To avoid this I use this preg_addslashes function:
<?
function preg_addslashes ($foo)
{
return preg_replace("/([^A-z0-9_-]|[\\\[\]])/", "\\\\\\1", $foo);
}
$foo = "([/com|et\])";
$true = preg_match ("/^".preg_addslashes($foo)."$/", $foo);
?>
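PHP also ships a built-in function for this purpose, preg_quote(), which escapes regular expression metacharacters and, through its second argument, the pattern delimiter. A minimal sketch using the same test string as above:

```php
<?php
// Same test string as above: ([/com|et\])
$foo = "([/com|et\\])";

// preg_quote() escapes regex metacharacters; the second argument also
// escapes the delimiter that surrounds the pattern.
$pattern = "/^" . preg_quote($foo, "/") . "$/";

$result = preg_match($pattern, $foo);
var_dump($result);
```

Because the quoted string is matched literally, user input passed through preg_quote() cannot change the meaning of the surrounding pattern.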