|
|
| Need one Regular Expression - Urgent |

|
2005-12-30 16:00:21 |
> I just tested your expression and found that it fetches the rest of the
> string after the "href" instead of only the "href" value. I've tested
> your expression on this string:
>
> <a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>
I'm not seeing that in my test...
I did notice it wasn't picking up one item because there was a newline
character in the description area of the link. I needed to change each
".*?" string to "(.|\n)*?" to account for that, because the period
matches all characters EXCEPT the newline character. But, using this:
<a (.|\n)*?href="?((.|\n)*?)("(.|\n)*?| (.|\n)*?|)>((.|\n)*?)</a>
...on your above string gives me this result:
# of Matches: 1
Match #1:<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>
# of SubMatches: 8
SubMatch #1:
SubMatch #2:/imghp?hl=en&tab=wi
SubMatch #3:i
SubMatch #4:" onClick="return qs(this);"
SubMatch #5:"
SubMatch #6:
SubMatch #7:Images
SubMatch #8:s
So, under this pattern, submatch #2 is your HREF and #7 is your
description.
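For anyone following along outside the VBA tester, here is a minimal sketch that checks the same pattern against the same string. It assumes Python's `re` module (nobody in the thread uses Python), and the variable names are illustrative only.

```python
import re

# The pattern from above. The literal space before the second "(.|\n)*?"
# inside the alternation is significant.
PATTERN = re.compile(
    r'<a (.|\n)*?href="?((.|\n)*?)("(.|\n)*?| (.|\n)*?|)>((.|\n)*?)</a>'
)

html = '<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>'

match = PATTERN.search(html)
href = match.group(2)         # submatch #2: the HREF value
description = match.group(7)  # submatch #7: the link text
print(href, description)
```

Other engines may number or report empty submatches slightly differently, so treat this as a cross-check rather than a reference result.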
Testing it on your other test string with the IMG tag gives me:
# of Matches: 1
Match #1:<a class=xs...</a>
# of SubMatches: 8
SubMatch #1:
SubMatch #2:http://av.rds.yahoo.com/_ylt=...
SubMatch #3:.
SubMatch #4:" onMouseOver...language"
SubMatch #5:"
SubMatch #6:
SubMatch #7:AltaVista USA<img...><br>
SubMatch #8:>
Another question you had was whether the IMG tag can be removed from
SubMatch #7. Note the "<br>" in there as well. Plus, the IMG tag
could come before OR after the text. So, the problem should be to
remove all HTML coding from SubMatch #7. I'd go in another direction
-- use a RegExp replace to remove all of the HTML coding from SubMatch
#7 AFTER it's been extracted from the main string. Then, all you need
to do is a global replacement of this pattern with an empty string:
<(.|\n)*?>
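A minimal sketch of that global replacement, again assuming Python's `re` module. The IMG attributes in the sample string are hypothetical, since the real ones are elided above.

```python
import re

TAG = re.compile(r'<(.|\n)*?>')  # the tag pattern from above

# Hypothetical stand-in for SubMatch #7; the real IMG attributes
# are not shown in the thread.
submatch_7 = 'AltaVista USA<img src="logo.gif"\nalt="logo"><br>'

# sub() replaces every match, which is the "global" replacement.
label = TAG.sub('', submatch_7)
print(label)
```

Because the replacement runs on the extracted text only, it does not matter whether the IMG tag comes before or after the label.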
|
|
|
2005-12-31 06:04:54 |
I'm afraid it's still not working well with anchors. Why don't you test
it on the "http://google.com" HTML code?
Right now I'm testing it on Google's website HTML code only.
I found that there are 18 different anchors. It will give you almost
all possible combinations, though there are a few more. But you can
at least start with this.
|
|
|
2005-12-31 07:33:31 |
> i found that there are 18 different anchors...on http://google.com
There is an HTML coding error on the Google page:
<a id=7a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local<sup><a style="text-decoration:none"><font color=red>New!</font></a></sup></a>
A nested anchor, which isn't even a true anchor. The last RegExp
pattern I gave you only finds 16 anchor matches, while manually I find
17 anchors coded. However, since the one is nested and incorrectly
coded, I'd have to agree with the RegExp results...at least in this
case.
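The difference between the two counts can be reproduced on that one snippet. This is a small sketch assuming Python's `re` module; the `<a\s` counter is only a stand-in for counting anchors by hand.

```python
import re

PATTERN = re.compile(
    r'<a (.|\n)*?href="?((.|\n)*?)("(.|\n)*?| (.|\n)*?|)>((.|\n)*?)</a>'
)

html = (
    '<a id=7a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">'
    'Local<sup><a style="text-decoration:none"><font color=red>New!</font></a></sup></a>'
)

# Stand-in for the manual count: every opening <a ...> tag.
opening_tags = len(re.findall(r'<a\s', html))

# The pattern stops at the first </a>, which belongs to the nested anchor,
# so the outer and inner anchors are consumed as a single match.
pattern_matches = [m.group(0) for m in PATTERN.finditer(html)]
print(opening_tags, len(pattern_matches))
```

The inner anchor has no HREF and is swallowed by the outer match, which is why the pattern reports one anchor fewer than a manual count.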
|
|
|
2006-01-03 10:43:53 |
Hi Randy,
I was working on the RegEx and I've made one. Here it is:
<a.*?href=(.*?)(?((?:\s.*?)>.*?</a>)(?:(?:\s.*?)>(.*?)</a>)|(?:>(.*?)</a>))
I've tested it on the "http://www.google.co.in" HTML code.
I have a question regarding this expression. I've used a conditional
statement here to execute another block if the first one is NULL. This
RegEx is supposed to return matches in a 2D array, but it is returning
a 3D array. The first value is OK, but the 2nd and 3rd alternate
depending on which branch of the condition executes. I want to know
whether there is any way to solve this, or whether the RegEx only
returns the result that way.
Please do reply, as I'm now very close to it.
thanks,
Lucky
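The alternating 2nd and 3rd values can be illustrated with a simplified sketch. It assumes Python's `re` module, which only supports conditionals on a group number or name, so plain alternation stands in for the .NET conditional here. Both sample anchors are hypothetical. Each capturing group keeps its own number even when it sits in a branch that did not take part in the match, so the caller picks whichever of the two is filled.

```python
import re

# Simplified stand-in for the expression above: alternation instead of
# the .NET conditional. Groups 2 and 3 live in different branches.
PATTERN = re.compile(r'<a.*?href=(.*?)(?:\s.*?>(.*?)</a>|>(.*?)</a>)')

samples = [
    '<a href=/about>About</a>',      # hypothetical: nothing after the href
    '<a href=/ads class=q>Ads</a>',  # hypothetical: an attribute after the href
]

results = []
for html in samples:
    m = PATTERN.search(html)
    # Only one of groups 2 and 3 is filled per match; take that one.
    label = m.group(2) if m.group(2) is not None else m.group(3)
    results.append((m.group(1), label))
print(results)
```

Collapsing the two branch groups in code after the match is one way to get back to two values per anchor. Whether the .NET engine offers a neater option is not settled in this thread.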
|
|
|
2006-01-04 12:29:17 |
Hi Randy,
I was working with your expression and I found that I had ignored
white spaces, and that's why I was not able to get the same results as
you described. Now it is giving me good results. I've also made some
changes to it, and now I'm able to get only two submatches, which is
all that I wanted.
Here is the expression:
<a (?:.|\n)*?href=((?:.|\n)*?)(?:\s(?:.|\n)*?|(?:.|\n)*?|)>((?:.|\n)*?)</a>
|
|
|
2006-01-05 06:23:01 |
> here is the expression (to get only two submatches):
>
> <a (?:.|\n)*?href=((?:.|\n)*?)(?:\s(?:.|\n)*?|(?:.|\n)*?|)>((?:.|\n)*?)</a>
In my tester, that does give back only two submatches for each match,
but the first submatch is always blank...
I'm wondering if "best practice" might be to:
1. Use a simple expression to just get out the anchor matches
2. On each of those, use one (or two) simple expression(s) to get the
submatches.
After all, you have to run through a loop anyway...
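The two steps above could be sketched roughly like this, assuming Python's `re` module. The three small patterns, the `extract_links` helper and the sample input are all illustrative assumptions, not expressions taken from this thread.

```python
import re

# Step 1: a simple expression that only isolates each anchor.
ANCHOR = re.compile(r'<a\s(?:.|\n)*?</a>')
# Step 2: small expressions applied to one anchor at a time.
HREF = re.compile(r'href="?([^"\s>]*)')
TAG = re.compile(r'<(?:.|\n)*?>')


def extract_links(html):
    """Return a list of (href, label) pairs, one per anchor found."""
    links = []
    for anchor in ANCHOR.findall(html):
        href_match = HREF.search(anchor)
        href = href_match.group(1) if href_match else ''
        links.append((href, TAG.sub('', anchor)))
    return links


# Hypothetical input for illustration.
sample = '<a href="/a">One</a> <a class=q\nhref=/b>Two <b>bold</b></a>'
links = extract_links(sample)
print(links)
```

Each expression stays short enough to debug on its own, at the cost of a second pass over every anchor.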
|
|
|
2006-01-05 08:02:10 |
How about this? Using:
<a (.|\n)*?href="?((.|\n)*?)( |"|>)(.|\n)*?</a>
Should end up with a match for each anchor, with the second submatch
being the HREF string. Then, to get the label for the link, just do a
RegExp replace on the match, changing all:
<(.|\n)*?>
(basically all HTML coding of the anchor) to "". That should leave
you with only the text used for the link (which could be empty if the
anchor was using an image).
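As a rough check of this approach, here is a sketch assuming Python's `re` module. The second anchor in the sample, with its unquoted HREF, is hypothetical. Results on a full page, or in a different engine, may well differ.

```python
import re

ANCHOR = re.compile(r'<a (.|\n)*?href="?((.|\n)*?)( |"|>)(.|\n)*?</a>')
TAG = re.compile(r'<(.|\n)*?>')

html = (
    '<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>'
    '<a href=/intl/en/about.html>About</a>'  # hypothetical anchor, unquoted HREF
)

# Submatch #2 is the HREF; the label is the whole match with tags stripped.
pairs = [(m.group(2), TAG.sub('', m.group(0))) for m in ANCHOR.finditer(html)]
print(pairs)
```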
|
|
|
2006-01-05 09:20:16 |
I've tried your expression but I didn't get correct results. There must
be something missing. What I'm getting is the first match as null and
the second with the href value, but the rest contain something else.
A few even have submatches.
Have you tried it on the Google HTML code? The last one I posted, with
my modification, is working fine. This one gives some wrong results.
|
|
|
2006-01-05 23:51:21 |
I wonder if we are testing with different implementations for Regular
Expressions? I'm assuming you're using JavaScript? I am using the VBA
implementation.
|
|
|
2006-01-06 11:08:00 |
I'm using the "Expresso" tool, which you can find at
http://www.ultrapico.com/
This tool is free, so you can download it and test it. It was
developed using .NET Framework 1.1, so you also need that on your PC.
I'm a .NET developer working on a desktop application. I'm using
Regex in my application, which downloads HTML files from the net to show
in my browser.
In short, I'm using this tool to test the RegEx and then implement it in
the application.
|
|