
Thread: Syntax for preg_match()




Syntax for preg_match()
United States
2007-02-12 16:48:57
I'm trying to validate top-level domains with the following pattern
in PHP's preg_match():

$pattern = '/^(com|net|org)$/';

if (preg_match($pattern, $domain))

This returns true even if the TLD is "com)"

Apparently it's matching 'com' and finding a line break, but I want it
to match only something that is exactly 'com' (or net or org).

How can I accomplish this?


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "Regex" group.
To post to this group, send email to regex@googlegroups.com
To unsubscribe from this group, send email to
regex-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/regex
-~----------~----~----~----~------~----~------~--~---


Re: Syntax for preg_match()
United States
2007-02-13 00:37:28
On Feb 12, 5:48 pm, "deko...hotmail.com"
<deko...hotmail.com> wrote:
> I'm trying to validate top-level domains with the following pattern
> in PHP's preg_match():
>
> $pattern = '/^(com|net|org)$/';
>
> if (preg_match($pattern, $domain))
>
> This returns true even if the TLD is "com)"

No, your pattern does not match the input $domain 'com)'
(tested under PHP 5.1.2).

> Apparently it's matching 'com' and finding a line break, but I want it

That is not what is happening. '$' is a zero-width anchor marking the
end of the string (or the position just before a newline at the very
end); it is not a newline itself and does not require one. For your
given pattern, "com" is a match, but "com)" is not. One caveat: because
'$' may match just before a final newline, "com\n" with a real trailing
newline does match; use the D modifier or \z instead of '$' if that
must be rejected.

> to match only something that is exactly 'com' (or net or org).
> How to accomplish this?

Your pattern should have done this for you, except that you don't need
the capturing parentheses:

  $pattern = '/^(?:com|net|org)$/';
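A small test script makes the behaviour of '$' concrete. This is a sketch: the helper name tldMatches() is hypothetical and the sample inputs are made up. It shows that "com)" never matches, and that a real trailing newline slips past '$' unless the D modifier or \z is used.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
// preg_match() returns 1 on a match, 0 on no match, false on error.
function tldMatches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$dollar  = '/^(?:com|net|org)$/';   // '$' may also match before a final "\n"
$dollarD = '/^(?:com|net|org)$/D';  // D modifier: '$' matches only at the very end
$absEnd  = '/^(?:com|net|org)\z/';  // \z: absolute end of the subject

foreach (["com", "com)", "com\n"] as $input) {
    printf(
        "%-8s  \$: %d  \$ with D: %d  \\z: %d\n",
        json_encode($input),
        tldMatches($dollar, $input),
        tldMatches($dollarD, $input),
        tldMatches($absEnd, $input)
    );
}
```

For a strict exact-match check, the \z form (or the /D modifier) is the safer choice.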

Regards,
Xicheng



Re: Syntax for preg_match()
United States
2007-02-13 03:58:18
>   $pattern = '/^(?:com|net|org)$/';

Yes, I was able to get it working.  But what does the '?:' do in your
expression?  How is that different from what I had before?

Here's another preg_match() pattern that has me baffled:

I'm trying to check a string for anything non-alphanumeric... not sure
of the syntax:

if (preg_match("/^[~`!#$%^&*()_+={}[]|:;'<,>.?]/", $urlHost))

This is not working.  Any suggestions?  Do I need to escape any of
those characters?

Thanks for your help.



Re: Syntax for preg_match()
United States
2007-02-13 10:24:58

> Yes, I was able to get it working.  But what does the '?:' do in your
> expression?  How is that different from what I had before?

It just tells the engine not to store the match in a backreference.
The regex engine stores anything matched inside parentheses in case you
need it for something else, like a search and replace.  "?:" tells the
engine it doesn't need to store that particular match.


> if (preg_match("/^[~`!#$%^&*()_+={}[]|:;'<,>.?]/", $urlHost))

You have a closing bracket in the middle of your character class.  If
you want the closing bracket (]) to be part of the class, you need to
escape it or put it immediately after the opening bracket.
"(?:[^\w\d]|_)" should also work.
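The two fixes described above can be sketched as follows. The punctuation set is shortened and the sample hosts are made up. Two assumptions about intent: the ^ in front of the original class anchored it to the first character only, so it is left out here; and single quotes are used so that PHP does not interpret $ or backslashes inside the pattern.

```php
<?php
// ']' escaped with a backslash inside the class ('[' escaped too, for symmetry).
$escaped = '/[~!#$%^&*()_+={}\[\]|:;<>,.?]/';

// ']' placed immediately after the opening '[' is taken literally.
$leading = '/[]~!#$%^&*()_+={}[|:;<>,.?]/';

// The generic alternative: anything that is not \w (or \d), or an underscore.
$generic = '/(?:[^\w\d]|_)/';

foreach (['example', 'exa]mple', 'under_score'] as $host) {
    printf(
        "%-12s escaped=%d leading=%d generic=%d\n",
        $host,
        preg_match($escaped, $host),
        preg_match($leading, $host),
        preg_match($generic, $host)
    );
}
```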



Re: Syntax for preg_match()
United States
2007-02-13 10:51:23
On Feb 13, 4:58 am, "deko...hotmail.com"
<deko...hotmail.com> wrote:
> Yes, I was able to get it working.  But what does the '?:' do in your
> expression?  How is that different from what I had before?

As munged92 mentioned, (?:...) is a non-capturing group.  If you don't
want to use the contents matched inside the parentheses later, using
(?:...) can usually make your matching faster by saving some
unnecessary work.

> Here's another preg_match() pattern that has me baffled:
>
> I'm trying to check a string for anything non-alphanumeric... not sure
> of the syntax:
>
> if (preg_match("/^[~`!#$%^&*()_+={}[]|:;'<,>.?]/", $urlHost))

For non-alphanumeric characters, you can just use [\W_]:

   if (preg_match('/[\W_]/', $urlHost)) { ... }
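Comparing \W alone with [\W_] shows why the underscore is listed separately. A sketch; hasNonAlnum() and the sample strings are hypothetical, and it assumes PCRE's default (non-Unicode) mode, where \w means letters, digits and underscore.

```php
<?php
// Hypothetical helper: true if $s contains any character that is not a
// letter or digit. \W alone would miss '_', because '_' is a word
// character; adding it to the class catches it too.
function hasNonAlnum(string $s): bool
{
    return preg_match('/[\W_]/', $s) === 1;
}

var_dump(preg_match('/\W/', 'my_host')); // int(0): '_' is a word character
var_dump(hasNonAlnum('my_host'));        // bool(true)
var_dump(hasNonAlnum('example'));        // bool(false)
var_dump(hasNonAlnum('exa mple'));       // bool(true): a space is \W
```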

Regards,
Xicheng



Re: Syntax for preg_match()
United States
2007-02-13 11:32:00
Hi and thanks for the reply.

> for non-alphanumeric characters, you can just use [\W_]
>
>    if (preg_match('/[\W_]/', $urlHost)) { ... }

Wow.  That's a lot nicer than the mess I was trying to use.

As an aside, I was looking at a Perl regex reference sheet: it appears
that \W includes the underscore.  Is this correct?  Do I need [\W_] in
PHP?

If I'm checking for invalid characters in an Internet domain name, I'd
need to add the dash:

if (preg_match('/[\W_]|^-/', $domain)) { $invalid = true; }

"match anything that's not a word character or digit, nor a dash"

I'm still a bit confused about grouping, however.  What are delimiters
used for?  Are they always necessary?  Why?  And what is the difference
between parentheses and brackets (or a group and a character class)?

Thanks again for your help.



Re: Syntax for preg_match()
United States
2007-02-13 13:19:15
On Feb 13, 12:32 pm, "deko...hotmail.com"
<deko...hotmail.com> wrote:
> Hi and thanks for the reply.
>
> > for non-alphanumeric characters, you can just use [\W_]
>
> >    if (preg_match('/[\W_]/', $urlHost)) { ... }
> Wow.  That's a lot nicer than the mess I was trying to use.
>
> As an aside, I was looking at a Perl regex reference sheet: it appears
> that \W includes the underscore - is this correct?  Do I need [\W_] in
> PHP?

\w, which is [0-9a-zA-Z_], does include the underscore.  \W is the
negation of \w, so it does NOT include the underscore.  This is the
same in both Perl and PHP.

> If checking for invalid characters in an Internet domain name, I'd
> need to add the dash:
>
> if (preg_match('/[\W_]|^-/', $domain)) { $invalid = true; }
>
> "match anything that's not a word character or digit, nor a dash"

Using a single character class might be clearer in your case, for
example:

  if (preg_match('/[^0-9a-zA-Z-]/', $domain)) { ..... }

Here you group the alphanumerics and the dash in one character class
and then negate the whole class.

Be careful: '^' is a negation metacharacter only if it's the first
character in a character class.  So [^-] is the negation of dash, but
^- (a dash at the start of the string) and [-^] (a dash or a caret)
are not.
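The single negated class and the caret rules can be checked side by side. A sketch with made-up samples; hasInvalidDomainChar() is a hypothetical name, and it only checks the characters of one label (so a '.' would also be flagged), not full hostname rules such as label length or leading and trailing dashes.

```php
<?php
// Hypothetical helper: true if $label contains anything other than
// ASCII letters, digits, or '-'. A '-' placed last in the class is literal.
function hasInvalidDomainChar(string $label): bool
{
    return preg_match('/[^0-9a-zA-Z-]/', $label) === 1;
}

var_dump(hasInvalidDomainChar('my-site')); // bool(false)
var_dump(hasInvalidDomainChar('my_site')); // bool(true)

// '^' negates only when it is the first character inside [...]:
var_dump(preg_match('/[^-]/', '---')); // int(0): every character is a dash
var_dump(preg_match('/[-^]/', 'a^b')); // int(1): matches the literal '^'
var_dump(preg_match('/^-/', 'a-b'));   // int(0): here '^' anchors at the start
```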

> I'm still a bit confused about grouping, however.  What are delimiters
> used for?  Are they always necessary?  Why?  And what is the difference
> between parentheses and brackets (or a group and a character class)?

In PHP regex functions, delimiters separate the regex pattern from the
modifiers (for example, the 'i' modifier in '/pattern/i' makes the
pattern case-insensitive), and they are required by the preg_
functions.  If you are using PHP's ereg_ functions, then you don't need
delimiters around the pattern (note that the ereg_ functions were
deprecated in PHP 5.3.0 and removed in PHP 7.0.0), i.e.

  ereg('pattern', $string);
  preg_match('/pattern/', $string);
  preg_match('#pattern#', $string);
  preg_match('<pattern>', $string);

There are many choices for the delimiter in preg_match(...); the
forward slash '/' is the most common one, but there is no default, so
some delimiter is always required.
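A few delimiter choices side by side, as a sketch with a made-up subject string. When the pattern itself contains the delimiter, either escape it or pick another delimiter; preg_quote() takes the delimiter as an optional second argument and escapes it along with the regex metacharacters.

```php
<?php
$path = '/usr/local/bin';

// With '/' as the delimiter, every literal '/' in the pattern must be escaped.
var_dump(preg_match('/^\/usr\/local/', $path));   // int(1)

// With '#' as the delimiter, the slashes need no escaping.
var_dump(preg_match('#^/usr/local#', $path));     // int(1)

// Modifiers go after the closing delimiter: 'i' = case-insensitive.
var_dump(preg_match('#^/USR/LOCAL#i', $path));    // int(1)

// Bracket-style delimiters are also accepted.
var_dump(preg_match('{^/usr/local}', $path));     // int(1)

// preg_quote() escapes regex metacharacters plus the given delimiter.
$literal = preg_quote('/usr/local', '/');         // '\/usr\/local'
var_dump(preg_match('/^' . $literal . '/', $path)); // int(1)
```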

You can use parentheses (...) to group multiple WORDS, like:

   (yes|no|up|down)

Which word finally becomes the successful match usually depends on the
regex engine (DFA, NFA, POSIX NFA).  You can check the book "Mastering
Regular Expressions" (by Jeffrey Friedl) for details, or "Regular
Expression Pocket Reference" (by Tony Stubblebine) for a fast
introduction.
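PHP's preg_ functions use PCRE, a backtracking (NFA-style) engine: at a given starting position it tries the alternatives from left to right and stops at the first one that succeeds, even if a later alternative would match more text. A POSIX engine would instead report the longest match. A small sketch with a made-up subject:

```php
<?php
// Alternatives are tried in order; the first one that matches wins.
preg_match('/(in|int|integer)/', 'integer', $m);
echo $m[1], "\n"; // in

// Putting the longest alternative first changes the result.
preg_match('/(integer|int|in)/', 'integer', $m);
echo $m[1], "\n"; // integer
```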

Brackets [...] are used to group CHARACTERS, and each time they match
only a SINGLE character, so:

  [google]    is the same as    [gole]

It matches 'g', 'o', 'l', or 'e', but never 'goo' or 'google' as a
whole.
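A quick way to confirm that a character class consumes exactly one character per match. A sketch with made-up subjects:

```php
<?php
// [google] and [gole] describe the same set: g, o, l, e.
var_dump(preg_match('/^[google]$/', 'o'));     // int(1): one character from the set
var_dump(preg_match('/^[google]$/', 'goo'));   // int(0): the class matches only one char
var_dump(preg_match('/^[gole]+$/', 'google')); // int(1): '+' repeats the class

// preg_match_all() counts each single-character match separately.
echo preg_match_all('/[google]/', 'google'), "\n"; // 6
```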

> Thanks again for your help.

You are welcome. 

Regards,
Xicheng



Re: Syntax for preg_match()
United States
2007-02-16 15:41:17
One more question:

I want to validate a URL path (limiting my definition of "valid" to
paths that begin with a forward slash followed by a word character).

$url_a = parse_url($myUrl);
$urlPath = $url_a['path'];

To make sure I have something that *could* be a URL path (not an
exhaustive validation):

if (!preg_match('/^//w+/i', $urlPath))
{
   $urlPath = '/';
}
else
{
   while (preg_match('/(?:\/\W+$)/', $urlPath))
   {
      $urlPath = substr($urlPath, 0, strlen($urlPath) - 1);
   }
}

The while statement's pattern seems to work as desired: it trims any
garbage off the end of the path.  But should the if statement's pattern
be modified?

The desired meaning of the if statement is:

"If we can't find a pattern that starts with a forward slash and is
followed by a word character, then we have a bogus path, so ignore it."

Should I use:

'/(?:^/w)+/i'

instead of:

'/^//w+/i'

or perhaps, without negating the preg function:

'|(?://W).*/|'

which (I think) will ensure subdirectories meet my criteria?

Thanks in advance.

By the way, I looked at Jeffrey Friedl's "Mastering Regular
Expressions"; I found it far too loquacious.  I think an annotated
reference would be more helpful.



Re: Syntax for preg_match()
United States
2007-02-16 16:47:00
On Feb 16, 4:41 pm, "deko...hotmail.com"
<deko...hotmail.com> wrote:
> One more question:
>
> I want to validate a URL path (limiting my definition of "valid" to
> paths that begin with a forward slash followed by a word character).
>
> $url_a = parse_url($myUrl);
> $urlPath = $url_a['path'];
>
> To make sure I have something that *could* be a URL path (not an
> exhaustive validation):
>
> if (!preg_match('/^//w+/i', $urlPath))

/w+ means a forward slash followed by at least one literal letter 'w'.

I think you mean:  if (!preg_match('#^/\w#', $urlPath))

Using a special character other than the forward slash as the pattern
delimiter can make your expression clearer.  Also, the 'i' modifier is
not necessary here, since \w includes both lowercase and uppercase
letters.
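A sketch of that check in context; looksLikePath() is a hypothetical name and the URL is made up. parse_url() with PHP_URL_PATH returns null when there is no path, hence the string cast.

```php
<?php
// Hypothetical helper: true if $path starts with '/' followed by a
// word character ([0-9A-Za-z_] in PCRE's default mode).
function looksLikePath(string $path): bool
{
    return preg_match('#^/\w#', $path) === 1;
}

$url = 'http://www.example.com/Docs/index.html';
$urlPath = parse_url($url, PHP_URL_PATH); // '/Docs/index.html'

if (!looksLikePath((string) $urlPath)) {
    $urlPath = '/';
}
echo $urlPath, "\n";
```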

> {
>    $urlPath = '/';}

So if $urlPath does not start with a forward slash (followed by at
least one word character), you set $urlPath to '/' (the root path?).
Is this what you wanted?

> else
> {
>    while (preg_match('/(?:\/\W+$)/', $urlPath))
>    {
>       $urlPath = substr($urlPath, 0, strlen($urlPath) - 1);

This seems to remove all chars from $urlPath?

>    }
>
> }
>
> The while statement pattern seems to work as desired - trims any
> garbage off the end of the path. But should the if statement pattern
> be modified?
>
> The desired meaning of the if statement is:
>
> "If we can't find a pattern that starts with forward slash and is
> followed by a word character, then we have a bogus path, so ignore
> it."
>
> Should I use:
>
> '/(?:^/w)+/i'

   '#^/\w#'

is enough, I think...

> instead of:
>
> '/^//w+/i'
>
> or perhaps, without negating the preg function:
>
> '|(?://W).*/|'

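Try this (with '#' as the delimiter, because a '|' delimiter would
clash with the '|' used for alternation inside the pattern):

  '#^(?:[^/]|/\W)#'

which means:
1) does not begin with '/'
2) or begins with '/', but is followed by a non-word character (\W)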
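Running both patterns over a few made-up paths is a cheap way to see where they agree. A sketch; note that they are not exact complements. For '/' and for the empty string neither pattern matches, so the positive test (negated with !) rejects those paths while the negative test lets them through.

```php
<?php
$positive = '#^/\w#';          // "looks valid"
$negative = '#^(?:[^/]|/\W)#'; // "looks bogus"

foreach (['/docs', 'docs', '/!x', '/', ''] as $p) {
    printf(
        "%-8s positive=%d negative=%d\n",
        json_encode($p),
        preg_match($positive, $p),
        preg_match($negative, $p)
    );
}
```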

> which (I think) will ensure subdirectories meet my criteria?
>
> Thanks in advance.

You are welcome.

> By the way, I looked at Jeffrey Friedl's "Mastering Regular
> Expressions" - I found it far too loquacious. I think an annotated
> reference would be more helpful.

It may take some time and patience to get used to the writing, but it's
definitely a great book, and you will appreciate it more when you read
it two or three times... hehe... Good luck anyway.

have a good weekend,

Regards,
Xicheng



Re: Syntax for preg_match()
United States
2007-02-16 17:44:23
Here's the latest:

$url_a = parse_url($myUrl);
$urlPath = $url_a['path'];
$urlPath_a = explode('/', $urlPath);

foreach ($urlPath_a as $subdir)
{
   if (preg_match('/^\W/', $subdir))
   {
      //if any subdir begins with a non-word character,
      //the entire path is ignored
      $urlPath = '/';
      break;
   }
}
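Wrapping the loop in a function makes it easy to try on sample paths. A sketch; pathHasBadSegment() is a hypothetical name. explode() yields empty strings for the leading slash and any trailing slash, and an empty string never matches /^\W/, so those segments are harmless; a final ')' segment, however, is flagged, which is the '/)' problem raised in this post.

```php
<?php
// Hypothetical helper: true if any '/'-separated segment of $path
// starts with a non-word character.
function pathHasBadSegment(string $path): bool
{
    foreach (explode('/', $path) as $segment) {
        // Empty segments ('' from leading/trailing slashes) never match.
        if (preg_match('/^\W/', $segment)) {
            return true;
        }
    }
    return false;
}

var_dump(pathHasBadSegment('/abc/def/'));   // bool(false)
var_dump(pathHasBadSegment('/abc/!!!/x/')); // bool(true)
var_dump(pathHasBadSegment('/subdir/)'));   // bool(true): ')' is its own segment
```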

This would replace:

http://www.domain.com/abc/???/!!!/bad_path/

with:

http://www.domain.com/

>    while (preg_match('/(?:\/\W+$)/', $urlPath))
>    {
>       $urlPath = substr($urlPath, 0, strlen($urlPath) - 1);

> This seems to remove all chars from $urlPath?

The goal is to accommodate things like this:

(http://www.domain.com/subdir/)

The problem is that, in my latest code, '/)' would be considered a
subdirectory.  So I need a different way to validate the last
subdirectory in the path... perhaps I could array_pop($urlPath_a)
before the foreach loop...
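As an alternative to both the character-by-character while loop and array_pop(), the trailing garbage could be stripped with a single preg_replace() call before the path is split. A sketch, assuming "garbage" means a final '/' followed only by non-word characters (roughly what the while loop's pattern looks for); trimTrailingGarbage() is a hypothetical name, and it anchors with \z (absolute end of string) rather than $.

```php
<?php
// Hypothetical helper: replace a trailing "/" + run of non-word
// characters with a single "/". \z anchors at the absolute end.
function trimTrailingGarbage(string $path): string
{
    return preg_replace('#/\W+\z#', '/', $path);
}

echo trimTrailingGarbage('/subdir/)'), "\n";    // /subdir/
echo trimTrailingGarbage('/a/b/)!?'), "\n";     // /a/b/
echo trimTrailingGarbage('/subdir/page'), "\n"; // /subdir/page (unchanged)
```

After trimming, '/subdir/)' becomes '/subdir/', so the segment check in the foreach loop no longer sees a ')' segment.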

