michael wrote:
> Thanks very much! However it doesn't seem to work in PHP. In fact it
> chokes the web server.
>
> I'm using this:
>
> preg_match_all("/(?:<p>(?:(?!<\/?p>).)*?<\/p>(?:(?!<\/?p>).)*?){2}(?=<\/body>)/si",
> $file, $match);
>
This looks to work fine on your test string:
php -r '
$str = "<BODY><p>This is paragraph one.
The quick brown fox jumped
over the burning fence</p><p>This is the
second<p><p>This is the third
paragraph. Talk about repetition</p><p>This is
the second last
paragraph</p> <p>This is the last
paragraph</p> </BODY>";
preg_match_all("/(?:<p>(?:(?!<\/?p>).)*?<\/p>(?:(?!<\/?p>).)*?){2}(?=<\/body>)/si",
$str, $match);
print_r($match);
'
Array
(
    [0] => Array
        (
            [0] => <p>This is the second last paragraph</p> <p>This is the last paragraph</p>
        )

)
But I guess it's just too slow for a large input string, because the
negative lookahead has to be tested at every single character.
> I guess if I could get all unique occurrences of each opening and
> closing paragraph (<p>) into an array I could just look at the last two
> results.
Yes, I think that would be a better way. If your HTML is well-formed
and has no nested <p> elements, and you only want to fetch the contents
of these elements, then the way you described should be much easier
and faster.
$n = preg_match_all("/<p>.*?<\/p>/si", $str, $match);
then print out $match[0][$n-1] and $match[0][$n-2].
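
In case it helps, here is a minimal sketch of that idea wrapped in a
function. The name last_paragraphs() and the short sample string are
made up for illustration, and I use "#" as the pattern delimiter only so
the "/" in "</p>" needs no escaping. It assumes well-formed, non-nested
<p> elements, as said above.

```php
<?php
// Hypothetical helper (not from the original thread): returns the last
// $count "<p>...</p>" blocks found in $html, in document order.
function last_paragraphs(string $html, int $count = 2): array
{
    // Lazy match from each <p> to the nearest </p>; "s" lets "." span
    // newlines and "i" makes the tag names case-insensitive.
    $n = preg_match_all('#<p>.*?</p>#si', $html, $match);
    if ($n === false || $n === 0) {
        return []; // regex error or no paragraphs at all
    }
    // A negative offset counts from the end of the array.
    return array_slice($match[0], -$count);
}

$str = "<BODY><p>one</p><p>two</p> <p>three</p> </BODY>";
print_r(last_paragraphs($str));
```

Note that an unclosed <p>, like the one in your test string, simply gets
swallowed into the match that runs up to the next </p>, so this is only
as reliable as the markup is.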
Also, I guess, some modules might do your job more robustly.
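
For example, a rough sketch using PHP's bundled DOM extension instead of
a regex. This assumes the dom extension is enabled, and the function
name last_paragraph_texts() is just something I made up. It returns the
text inside the last <p> elements rather than the raw markup.

```php
<?php
// Hypothetical helper: text content of the last $count <p> elements,
// as seen by the HTML parser.
function last_paragraph_texts(string $html, int $count = 2): array
{
    // Collect parser warnings quietly; real-world HTML is often sloppy.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return array_slice($texts, -$count);
}

$str = "<html><body><p>one</p><p>two</p> <p>three</p> </body></html>";
print_r(last_paragraph_texts($str));
```

How the parser repairs unclosed or nested <p> tags is up to libxml, so
check the result on your real files before relying on it.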
Regards,
Xicheng