Well, some of it depends upon how consistent your markers are:
$temp= "XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB"
> I need to write a regex for filtering out the string between:
AAA
BBB
CCC
> so in the above case I should have the output as:
AAAZZZZBBB
BBBSSSSCCC
CCCGGGGBBB
BBBVVVVVBBB
> meaning all combinations of start and end for AAA, BBB, CCC.
So you want the markers and what's between them. Will there always be a
begin/end set of markers, just with different content?
> I have the regex for one of them, but how do I do it simultaneously for
> all 3 of them?
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~ /(AAA)(.*?)(BBB)/g);
foreach (@t)
{
    print $_;
}
So, use alternation (the "|") to create marker sets. Note, you need to add "\n"
to the end of your print statements, or it'll all run together, which makes it
seem like it's working ... sort of:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~ /(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);
foreach (@t) {
    print "Got: ", $_, "\n";
}
It sort of works; it gets:
Got: AAA
Got: ZZZZ
Got: BBB
Got: CCC
Got: GGGG
Got: BBB
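Why the results come out in pieces: in list context a /g match returns every capture group from every match, flattened into one list, so each match contributes three elements to @t. A minimal sketch in plain core Perl (the @got array exists only for illustration) that reassembles each match by peeling the list off three elements at a time:

```perl
use strict;
use warnings;

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

# List-context m//g returns (open, body, close, open, body, close, ...)
my @t = ($temp =~ /(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);

my @got;    # hypothetical: collects the reassembled matches
# splice removes one match's worth (three captures) per pass;
# the loop ends when splice returns an empty list
while (my ($open, $body, $close) = splice @t, 0, 3) {
    push @got, "$open$body$close";
    print "Got: $open$body$close\n";
}
```

This only glues the pieces back together; it still finds just two of the four pairs, because each match consumes its closing marker.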
You want to capture the whole shebang, so we use both the capturing parens
and, because we're using the alternation pipe "|", the non-capturing parens
("(?:...)") to group our alternatives:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~ /((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g);
foreach (@t) {
    print "Got: ", $_, "\n";
}
Got: AAAZZZZBBB
Got: CCCGGGGBBB
But this isn't quite right, as it's not 'reusing' the closing marker of one
match as the opening marker of the next. This gets trickier: you want to
restart the match at the closing marker of the previous match, not just after
it. First, let's switch to the cool
while ( /.../g ) {
loop (note the change to '$1' in the print):
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while ( $temp =~ /((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g ) {
    print "Got: ", $1, "\n";
}
Got: AAAZZZZBBB
Got: CCCGGGGBBB
Er, I have to go here, but I think the proper bump-along/reset code might be
in this article:
http://www.samag.com/documents/s=10118/sam0703i/0703i.htm
Nope. Dang. I'll have to find it. The \G assertion marks the point where the
last match ended when you're doing global "/g" matching. The "pos()" function
returns the location of the current \G, and you can reset it by assigning to it.
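To see what \G itself does, here is a small sketch on a made-up string (the $s, @anchored and @free names are just for illustration). Anchored with \G, each /g attempt must start exactly at pos(), so the loop stops at the first gap; without \G the engine simply scans ahead to the next marker:

```perl
use strict;
use warnings;

my $s = 'AAABBBXXXCCC';    # hypothetical input with a gap before CCC

# \G: each attempt must begin exactly where the previous /g match ended
my @anchored;
while ($s =~ /\G(AAA|BBB|CCC)/g) {
    push @anchored, $1 . '@' . pos($s);
}

# No \G: the engine is free to skip over XXX to find CCC
my @free;
while ($s =~ /(AAA|BBB|CCC)/g) {
    push @free, $1 . '@' . pos($s);
}

print "with \\G:    @anchored\n";    # marker@pos after each match
print "without \\G: @free\n";
```

A failed /g match (without /c) resets pos() to undef, which is why the second loop starts over from the beginning of the string. The approach below doesn't anchor with \G at all; it moves pos() back by hand.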
Something like:
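my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while ( $temp =~ /((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g ) {
    my $pos = pos $temp;
    print "Got ($pos):", $1, "\n";
    # back up over the 3-char closing marker so the next
    # match starts on it (assumes all markers are 3 chars)
    pos($temp) -= 3;
}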
Got (14):AAAZZZZBBB
Got (21):BBBSSSSCCC
Got (28):CCCGGGGBBB
Got (36):BBBVVVVVBBB
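The "-= 3" assumes every marker is exactly three characters long, which is where the consistency of your markers comes in. A different technique (not the pos() reset above) that avoids the arithmetic is a zero-width lookahead: (?=...) peeks at the closing marker and captures it without consuming it, so the next /g iteration starts right on it. A sketch, assuming the same input and markers (the @got array is just for illustration):

```perl
use strict;
use warnings;

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

my @got;    # hypothetical: collects each opening-marker..closing-marker span
# $1 = opening marker plus the shortest run of text before the next marker
# $2 = that next marker, captured inside the lookahead but NOT consumed,
#      so pos() stops just before it and the next iteration reuses it
while ($temp =~ /((?:AAA|BBB|CCC).*?)(?=(AAA|BBB|CCC))/g) {
    push @got, "$1$2";
    print "Got: $1$2\n";
}
```

Since nothing is backed up by a fixed count, this should also cope with markers of differing lengths, as long as they are all listed in the alternation.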
Andy Bach
Systems Mangler
Internet: andy_bach@wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932
"Procrastination is like putting lots and lots of commas in the sentence
of your life."
Ze Frank
http://lifehacker.com/software/procrastination/ze-frank-on-procrastination-235859.php
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs