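Hi Darsin,

If we try it this way:

<([^>]+)>([^<]+)</\1>

it will only work for tags without parameters, like <ul>something</ul>, and will not work for tags with parameters, like <a href=something>anything</a>, because the closing tag would then have to repeat the parameters as well.
The idea is to repeat, as a backreference to $1, what was matched as the opening tag. (Inside the pattern itself most regex engines write that backreference as \1; $1 is the form used in replacement strings.)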
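To make the limitation concrete, here is a minimal sketch in Python (my choice for illustration; the original question is about Visual Basic). It assumes Python's re module, where the backreference inside a pattern is written \1. The name `simple` is just a label for this example.

```python
import re

# Group 1 captures everything between "<" and ">", and \1 demands that
# the closing tag repeat that captured text exactly.
simple = re.compile(r"<([^>]+)>([^<]+)</\1>")

plain = simple.search("<ul>something</ul>")
with_params = simple.search("<a href=something>anything</a>")

print(plain.group(0) if plain else None)   # the whole <ul>...</ul> element
print(with_params)                         # no match: "</a>" is not "</a href=something>"
```

The second search fails because group 1 holds "a href=something", so the pattern would only accept a closing tag spelled </a href=something>.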
So let's enhance it so that it also matches tags with parameters:

<(\w+)(\s+[^>]+)*>([^<]+)</\1>
 \___/\________/  \_____/  \/
   |      |          |     |
   $1     $2         $3    `-- first match
   base   params     text      ($1) repeated
   part   of the     between
   of the opening    the
   tag    tag        tags
I have not tested it live, so there might be some flaws or typos.
But I hope you get the idea.
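A quick way to check the enhanced pattern is to run it against the sample from the question. The sketch below uses Python's re module rather than Visual Basic, so treat it as an approximation of how the VB engine would behave. The variable names are mine, and the last pattern (`lazy`) is my own variation, not part of the suggestion above: it swaps [^<]+ for a non-greedy .*? so that tags are allowed between the opening and closing tag.

```python
import re

# The enhanced pattern: tag name in group 1, optional parameters in group 2,
# text in group 3, and \1 repeating the tag name in the closing tag.
tag = re.compile(r"<(\w+)(\s+[^>]+)*>([^<]+)</\1>", re.IGNORECASE)

flat = ("<P>this is a test html</P><P class=left>saumitra</P>"
        "<P class=center>chaturvedi</P><P class=center>darpan sinha</P>")
flat_matches = [m.group(0) for m in tag.finditer(flat)]
print(flat_matches)      # the four <P>...</P> elements, one per match

# Nested case from the question: [^<]+ cannot cross the inner <li> tags,
# so only the two <li> elements are found, not the enclosing <ul>.
nested = "<ul> <li>saumitra</li><li>chaturvedi</li></ul>"
nested_matches = [m.group(0) for m in tag.finditer(nested)]
print(nested_matches)

# Assumed variation: a lazy .*? (with DOTALL so "." also matches newlines)
# lets the body contain other tags, so the whole <ul>...</ul> is one match.
lazy = re.compile(r"<(\w+)(\s+[^>]+)*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
lazy_matches = [m.group(0) for m in lazy.finditer(nested)]
print(lazy_matches)
```

Even the lazy variation stops at the first closing tag with the same name, so an element nested inside another element of the same type (a <ul> inside a <ul>, say) would be cut short. For arbitrary nesting an HTML parser is the safer tool.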
On 12/23/05, Darsin <darsin gmail.com> wrote:
>
> Hi all
> I am trying to build a regex which can get an html tag from given html
> text, for eg if given:
>
> <P>this is a test html</P><P class=left>saumitra</P><P class=center>chaturvedi</P><P class=center>darpan sinha</P>
>
> then i can get above 4 tags as four separate matches as shown below:
> <P>this is a test html</P>
> <P class=left>saumitra</P>
> <P class=center>chaturvedi</P>
> <P class=center>darpan sinha</P>
>
> I have built the below string to be used in Visual Basic:
> <[a-z][a-z]*\d?\s*(class=[a-z]*)*>[^<]+</[a-z][a-z]*\d?>
> Here \d is used to test one or more occurrences of a digit (for eg in
> case an H1, H2, etc is used)
> Pattern works fine if the first closing tag after an opening tag is
> same ie. <p>some text</p>. But fails if the tag is something like:
> <ul> <li>saumitra</li><li>chaturvedi</li></ul>
> The result i get are two matches <li>saumitra</li> and
> <li>chaturvedi</li> whereas i want one match.
> i would like to match the starting tag with the ending tag and pick up
> all the text (html or non-html) in between, including the tags as well.
> Thus i should get all the text between <ul> and </ul> as one match. In
> case i get all its children as the second and third match then i won't
> mind at all.
> Any help in this regard would be appreciated.
>
-- best regards, Eugeny