Thread: having problems capturing all text possibilities.




having problems capturing all text possibilities.
user name
2006-03-08 06:12:14
Hi, I'm trying to fix an expression so that it grabs every set of two words with a space between them, i.e.:

hello word (valid)
C# sample (valid)
hello. world (invalid)
hello world. (invalid) <-- need to stop the capture before the "."

I cannot use \b, since \b also breaks at "#".

This is the expression I am trying to work with. So far it takes each word fine separately, but it does not take "hello. world" into account, and it captured "hello world." as well.


(([\x23\x41-\xff]*\x2e?[\x23\x41-\xff]+)(?:[\x21-\x2f\x41-\xff])?\s([\x23\x41-\xff]*\x2e?[\x23\x41-\xff]+)(?:[\x21-\x2f\x41-\xff])?)

The other problem is that it only takes pairs sequentially.

I would need it to do the following:

hello world this is time

(hello world) (world this) (this is) (is time)

Can anyone help me out?
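
One possible way to get overlapping pairs like the ones above is to wrap the whole pair in a zero-width lookahead, so the engine consumes nothing and can start the next match at the following word. This is only a minimal sketch in Python (the thread does not say which language or regex engine is in use). The word class [\w#]+ and the name overlapping_pairs are assumptions made for illustration, not the expression posted above.

```python
import re

# Assumption: a "word" is a run of letters, digits, underscores or '#'.
# (?<![\w#]) makes sure we only start at the beginning of a word.
# The lookahead (?=...) captures the pair without consuming any text,
# so the second word of one pair can be the first word of the next.
PAIR = re.compile(r"(?<![\w#])(?=([\w#]+) ([\w#]+))")

def overlapping_pairs(text):
    """Return every adjacent word pair, including overlapping ones."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]

print(overlapping_pairs("hello world this is time"))
# [('hello', 'world'), ('world', 'this'), ('this', 'is'), ('is', 'time')]
```

This sketch does not yet reject words followed by a "."; that rule would have to be added separately.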

Best Regards,
Alexandre Brisebois
http://www.pointnetsolutions.com


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "Regex" group.
To post to this group, send email to regexgooglegroups.com
To unsubscribe from this group, send email to
regex-unsubscribegooglegroups.com
For more options, visit this group at http://groups.google.com/group/regex
-~----------~----~----~----~------~----~------~--~---

having problems capturing all text possibilities.
user name
2006-03-09 20:44:21
I think you are trying to do too much in one step. Why not just match all pairs of words as step 1 and then (as step 2) decide which are valid and which are not?
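
A rough sketch of that two-step idea, in Python (an assumption, since the thread names no language): step 1 builds every adjacent pair of whitespace-separated tokens, and step 2 keeps only the pairs whose tokens both look like valid words. The word rule [\w#]+ and the name valid_pairs are hypothetical and only stand in for whatever validity rule is actually needed.

```python
import re

# Assumption: a valid word contains only letters, digits, '_' or '#',
# so a token such as "hello." (with a trailing period) is rejected.
VALID_WORD = re.compile(r"[\w#]+")

def valid_pairs(text):
    tokens = text.split()
    # Step 1: every adjacent pair of tokens, overlapping included.
    candidates = zip(tokens, tokens[1:])
    # Step 2: keep a pair only if both tokens are valid words.
    return [
        (a, b)
        for a, b in candidates
        if VALID_WORD.fullmatch(a) and VALID_WORD.fullmatch(b)
    ]

print(valid_pairs("hello word C# sample hello. world hello world."))
# [('hello', 'word'), ('word', 'C#'), ('C#', 'sample'), ('world', 'hello')]
```

Keeping the two steps apart means the validity rule can change without touching the pairing logic.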

The way you are trying to do it now does not work. Look at your input:

hello word C# sample hello. world hello world.

Say you successfully matched the first two "valid pairs", i.e. 'hello word' and 'C# sample', by using the pattern

(\b[\w#]+)(?!\.\x20+)(\x20+\b[\w#]+\b)(?!\.)

But the next word, i.e. 'hello.', cannot be part of a "valid pair", so the engine will skip it and match the next pair: 'world hello'. Is that OK with you? Or would you rather skip over the two "illegal" pairs as a whole, so that there would be no more matches (because both "hello. world" and "hello world." are illegal)?
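
The behaviour described here can be checked by running the pattern over the sample input. The snippet below uses Python's re module, which is an assumption; other engines may differ in details such as how \w is defined.

```python
import re

# The pattern from this message, unchanged.
PATTERN = re.compile(r"(\b[\w#]+)(?!\.\x20+)(\x20+\b[\w#]+\b)(?!\.)")
TEXT = "hello word C# sample hello. world hello world."

# Collect the full text of each non-overlapping match, left to right.
matches = [m.group(0) for m in PATTERN.finditer(TEXT)]
print(matches)
# ['hello word', 'C# sample', 'world hello']
```

After the two valid pairs, the engine skips 'hello.' and pairs 'world' with the following 'hello', while the final 'hello world.' is never matched.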



having problems capturing all text possibilities.
user name
2006-03-10 06:52:05
I have solved the problem with a queue-type ADT: I just feed words into it one by one and simply call an overridden ToString().

It works pretty well, actually, so I am now working by grabbing each word one by one.
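
A minimal sketch of what such a queue could look like, written in Python rather than whatever language was actually used. The class PairQueue, its methods and the helper pairs_from_words are hypothetical names; __str__ plays the role of the overridden ToString().

```python
from collections import deque

class PairQueue:
    """Hypothetical fixed-size queue holding the last two words seen."""

    def __init__(self):
        # maxlen=2 drops the oldest word automatically on each append.
        self._words = deque(maxlen=2)

    def push(self, word):
        self._words.append(word)

    def is_full(self):
        return len(self._words) == 2

    def __str__(self):
        # Stand-in for the overridden ToString() mentioned above.
        return " ".join(self._words)

def pairs_from_words(words):
    """Feed words one by one and emit a pair whenever the queue is full."""
    queue = PairQueue()
    pairs = []
    for word in words:
        queue.push(word)
        if queue.is_full():
            pairs.append(str(queue))
    return pairs

print(pairs_from_words("hello world this is time".split()))
# ['hello world', 'world this', 'this is', 'is time']
```

Because the queue only ever keeps the last two words, overlapping pairs come out naturally and each word only has to be matched once.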
I still have some problems, though, so I will look into refining the way I clean HTML and a list of specific characters out of files.

Then I match single words...

It still needs optimization, though.

I am truly grateful for your responses.
Best Regards, 
Alexandre Brisebois


