Thread: Count lines with a specific pattern in a flat file




Count lines with a specific pattern in a flat file
user name
2006-09-04 09:51:45
I have a CSV file like so:

"HDR",20060629133932,"9845","9083","0010"
1,"3","000000000690","000007","rsM4hJXR5Ik0O8RWghjtDBlUVAOZq7tO","BAR","0010","","",20.00
2,"3","000000000691","000007","65Xbp5dMcDFflPJnxWCrsJtV1jzcUjgd","BAR","0010","","",20.00
3,"3","000000000692","000007","SEjcf3eDA7hWmwGrNsLWoCWt1Geyh4GN","BAR","0010","","",20.00
4,"3","000000000693","000007","MJMkrp/kRMMGimeZo1uFOJzeDTVeOkFU","BAR","0010","","",20.00
5,"3","000000000694","000007","fDIBFgockQHhN+eVQxEBqqrJfZ78roja","BAR","0010","","",20.00
.....and so on...

Each file has about a million records or more. Instead of iterating
through each line, counting line breaks, and skipping the header and
footer records so that only data records are counted, I thought of
writing a regex pattern to do the same. Here's what I've written to
count only data records, i.e. rows that start with a number followed
by a comma, then any other text, and end with a line break.

numRecords = System.Text.RegularExpressions.Regex.Matches(ret,
    "(?m)^[0-9]{1, 6}*$",
    System.Text.RegularExpressions.RegexOptions.Multiline).Count;

I get a zero match collection count.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "Regex" group.
To post to this group, send email to regexgooglegroups.com
To unsubscribe from this group, send email to
regex-unsubscribegooglegroups.com
For more options, visit this group at
http://groups.google.com/group/regex
-~----------~----~----~----~------~----~------~--~---

Count lines with a specific pattern in a flat file
user name
2006-09-04 19:18:26
Sathyaish wrote:
> numRecords = System.Text.RegularExpressions.Regex.Matches(ret,
>     "(?m)^[0-9]{1, 6}*$",
>     System.Text.RegularExpressions.RegexOptions.Multiline).Count;
>
> I get a zero match collection count.

The pattern you provided is not doing what you think it should. The
asterisk '*' right behind the {1,6} quantifier is applied on top of
the subpattern [0-9]{1,6}, and nothing in the pattern accounts for the
comma or the rest of the line. So your pattern probably matches only
lines that consist of digits and nothing else, like

     ^1235555555555555$

From your requirements, I guess you want:

    (?m)^[0-9]{1,6},.*$

which means 1-6 digits followed by a comma and then any
non-newline characters. But generally, the '.*$' can be left out
because it adds no more specific information to your pattern. So I'd
suggest using a pattern like:

    (?m)^[0-9]{1,6},

(assuming you read the CSV file in one shot. If you process it
line by line instead, the (?m) modifier is not necessary)
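
To make the two modes concrete, here is a minimal C# sketch of both, not
a definitive implementation. The class name RecordCounter and the methods
CountInText and CountInFile are made up for illustration, and the
line-by-line variant assumes a runtime that has File.ReadLines (.NET 4.0
or later). The sample text reuses the header and the first two data rows
from the original post.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class RecordCounter
{
    // One-shot mode: (?m) lets ^ match at the start of every line in the string.
    static readonly Regex MultilineRecord = new Regex(@"(?m)^[0-9]{1,6},");

    // Line mode: each input is a single line, so (?m) is not needed.
    static readonly Regex SingleLineRecord = new Regex(@"^[0-9]{1,6},");

    // Counts data records in text that was read in one shot.
    public static int CountInText(string text)
    {
        return MultilineRecord.Matches(text).Count;
    }

    // Counts data records one line at a time, without holding the whole file in memory.
    public static int CountInFile(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path))
        {
            if (SingleLineRecord.IsMatch(line))
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        // Header row plus two data rows; only the data rows start with digits and a comma.
        string sample =
            "\"HDR\",20060629133932,\"9845\",\"9083\",\"0010\"\n" +
            "1,\"3\",\"000000000690\",\"000007\",\"rsM4hJXR5Ik0O8RWghjtDBlUVAOZq7tO\",\"BAR\",\"0010\",\"\",\"\",20.00\n" +
            "2,\"3\",\"000000000691\",\"000007\",\"65Xbp5dMcDFflPJnxWCrsJtV1jzcUjgd\",\"BAR\",\"0010\",\"\",\"\",20.00\n";

        Console.WriteLine(CountInText(sample)); // prints 2
    }
}
```

With a million or more records per file, the line-by-line variant is
probably the safer choice for memory, since the one-shot variant needs the
whole file in a single string plus a match object for every record.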

Good luck,
Xicheng


