Thread: Specifying multiple separators via FS or the -F command line flag




Specifying multiple separators via FS or the -F command line flag
user name
2007-12-02 13:09:23
I wrote a bunch of one-liners that display system-monitoring info in my
GNU screen hardstatus line.

Here are some screenshots and my "backticks":

http://www.geocities.com/fcky1000/fcky

This one monitors network interface eth0:

backtick 4 0 0 awk -F "[: ]" 'BEGIN {file="/proc/net/dev";while(a==a){while(getline<file!=0){if ($3=="eth0"){d=$4;u=$12;printf " %4.1f k/s ",((d-dp)/1024);printf "%4.1f k/s\n",((u-up)/1024)}};close(file);dp=d;up=u;system("sleep 1")}}'

NB. I do realize that when one-liners get to be this long it's probably
    time to move them to a file.

The following two blocks are the left and right halves of the input file:

$ cat /proc/net/dev

Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo:649004371  349860    0    0    0     0          0
  eth0:673423996 22297294  789    0 1100   789          0   <-

|  Transmit
|bytes    packets errs drop fifo colls carrier compressed
0 649004371  349860    0    0    0     0       0          0
 0 50756739 3811203    0    0    0     0 3465469          0

There is one annoying problem:

When I bring up the interface, the byte counter is separated from the
field "eth0:" by a number of spaces.

After a while, as the value becomes larger, the spaces eventually all
disappear to make room for the increased number of bytes.

I thought I would be able to get away with the following tactic:

Specifying space OR colon as the separator so that it doesn't matter if
the eth0 line contains:

<space><space><eth0><:><space><space> .. <byte-counter>

.. or

<space><space><eth0><:><byte-counter>

What I hoped was that since "eth0" is followed by ":" and zero, one,
two, three, etc. <spaces>, this would work with $4 always mapping to the
byte-counter field, thus saving me some involved parsing that somewhat
defeats the purpose of using awk in the first place.

Well, unfortunately, it works when there are no spaces between the
colon and the byte counter, but it doesn't when there are intervening
spaces.

I don't know whether this is because there's something wrong with my -F
flag syntax (maybe it does not mean "colon or space") or because awk
tests for a single colon or space, not any succession of colons and
spaces.

IOW, if there are three intervening spaces between ":" and the byte
counter, the three spaces count as $4, $5, and $6 .. and my byte
counter becomes $7.

Naturally, since there is room for 9 digits and therefore up to 8 spaces,
this second explanation would make an initially trivial problem rather
more complicated and possibly not a good candidate for awk.

I thought I'd use the above to illustrate my problem rather than a test
file, as Bob Proulx suggested in his reply to my initial post in the
bug-gnu-utils list, because I have no knowledge of awk apart from reading
through an online tutorial a couple of months ago, and I might have made
some assumptions that would have confused the issue further.

Thank you.






Re: Specifying multiple separators via FS or the -F command line flag - addendum
user name
2007-12-02 14:59:59
Here's a sample of how the multiple separators feature behaves:

[15:52:17][gavronturki:~]$ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
1
2 one
3
4 two
5 three
6
7 four
8 five

Doesn't seem very logical to me.  

When awk successfully tests for space or colon, the following characters
are assumed NOT to be separators even if they have been defined as such
via the -F flag -- e.g. the <space> that follows "one:" is mapped to the
$3 variable.

Is this the way it's supposed to work?






Re: Specifying multiple separators via FS or the -F command line flag - addendum
user name
2007-12-03 22:32:02
cga2000 wrote:
> Here's a sample of how the multiple separators feature behaves:
> 
> [15:52:17][gavronturki:~]$ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'

Thanks for the small example.  (I just read your last posting and will
probably respond to it but this one was much easier.)

> 1
> 2 one
> 3
> 4 two
> 5 three
> 6
> 7 four
> 8 five
> 
> Doesn't seem very logical to me.  

Each field separator is splitting a field.  So for example -F_ on
"___" would delimit four fields.  But before we go down this path, I
know what you want and we are going to do it differently to get there.
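
To make the "___" case concrete, here is a minimal sketch, assuming any
POSIX awk (gawk, mawk and the like). The sample strings are only for
illustration and are not from the original post:

```shell
# Three "_" separators delimit four fields, all of them empty.
printf '___\n' | awk -F_ '{ print NF }'

# Same rule with real data at both ends: the fields are "a", "", "", "b".
printf 'a___b\n' | awk -F_ '{ print NF; for (i = 1; i <= NF; i++) print i ": [" $i "]" }'
```

Both commands report NF as 4, because a single-character FS other than
space delimits a field at every single occurrence.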

> When awk successfully tests for space or colon, the following characters
> are assumed NOT to be separators even if they have been defined as such
> via the -F flag -- eg. the <space> that follows "one:" is mapped to the
> $3 variable.
> 
> Is this the way it's supposed to work?

The way it is supposed to work is defined here:

  http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html

Search for the section "Regular Expressions" where the FS ERE is
discussed.

An extended regular expression can be used to separate fields by using
the -F ERE option or by assigning a string containing the expression
to the built-in variable FS. The default value of the FS variable
shall be a single <space>. The following describes FS behavior:

   1. If FS is a null string, the behavior is unspecified.
   2. If FS is a single character:
         a. If FS is <space>, skip leading and trailing <blank>s;
            fields shall be delimited by sets of one or more <blank>s.
         b. Otherwise, if FS is any other character c, fields shall be
            delimited by each single occurrence of c.
   3. Otherwise, the string value of FS shall be considered to be an
      extended regular expression. Each occurrence of a sequence
      matching the extended regular expression shall delimit fields.

As you can see, the default splitting behavior on a single space is
done as a one-off special.  The space is different than any other
field separator.
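
A small sketch of that difference, assuming a POSIX awk. The input
string is made up for illustration: two leading spaces, three spaces
in the middle, one trailing space.

```shell
# Default FS (a single space): leading and trailing blanks are skipped
# and each run of blanks counts as one separator.
printf '  a   b \n' | awk '{ print NF }'

# "[ ]" is longer than one character, so it is treated as an ERE and
# every single space delimits a field, including empty ones.
printf '  a   b \n' | awk -F '[ ]' '{ print NF }'
```

The first command reports 2 fields and the second 7, since the six
spaces each act as a separator under rule 3.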

What you probably want is option 3 above where the field separator is
an extended regular expression.  Try this:

  echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
  1 
  2 one
  3 two
  4 three
  5 four
  6 five
  7 
  8 

The -F"[: ]+" has a "+" now and will match one or more occurrences of
either character.  But there is still a difference because leading
field separators are not trimmed.  There are a couple of ways of
dealing with that but neither is particularly elegant.

  echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0);print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
  1 one
  2 two
  3 three
  4 four
  5 five
  6 
  7 
  8 

This does a substitution across the line for the FS variable.  That is
the same as sub(/[: ]+/,"",$0); here but using FS ties it to -F
nicely.  The $0 can be omitted in this but I like to be explicit.
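
One caveat worth sketching: sub() replaces the first match wherever it
occurs, so on a line that does not start with a separator it would
remove the first separator in the middle instead. Two hedged
variations, assuming a POSIX awk (the sample lines are made up):

```shell
# Variation 1: anchor the pattern so only leading separators are removed.
# Works whether or not the line starts with a separator.
printf ' one: two\none: two\n' |
  awk -F '[: ]+' '{ sub(/^[: ]+/, ""); print $1 "|" $2 }'

# Variation 2: turn colons into spaces and rely on the default FS, which
# already skips leading blanks. Changing $0 makes awk re-split the fields.
printf ' one: two\none: two\n' |
  awk '{ gsub(/:/, " "); print $1 "|" $2 }'
```

Each command prints "one|two" for both input lines.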

Hope this helps,
Bob



Re: Specifying multiple separators via FS or the -F command line flag
user name
2007-12-03 23:03:55
cga2000 wrote:
> backtick 4 0 0 awk -F "[: ]" 'BEGIN {file="/proc/net/dev";while(a==a){while(getline<file!=0){if ($3=="eth0"){d=$4;u=$12;printf " %4.1f k/s ",((d-dp)/1024);printf "%4.1f k/s\n",((u-up)/1024)}};close(file);dp=d;up=u;system("sleep 1")}}'

Oh, gosh, that is hard to read!  But of course you already knew that.

> NB. I do realize that when one-liners get to be this long it's probably
>     time to move them to a file

Yes.

>     lo:649004371  349860    0    0    0     0          0
>   eth0:673423996 22297294  789    0 1100   789          0     <-
> When I bring up the interface, the byte counter is separated from the
> field "eth0:" by a number of spaces.
> After a while, as the value becomes larger, the spaces eventually all
> disappear to make room for the increased number of bytes.

I tend to pre-process these cases to guarantee space separation.

  line=' eth0:673423996 22297294  789    0 1100   789          0'
  echo "$line" | awk '{sub(/(eth[[:digit:]]|lo):/,"& ");print $2}'
  673423996

  line='  lo:649004371  349860    0    0    0     0          0'
  echo "$line" | awk '{sub(/(eth[[:digit:]]|lo):/,"& ");print $2}'
  673423996

I would try that technique.  Here I have used an ERE to match either
of the cases presented.  It is still not very general though.  I would
tend to get the list of devices dynamically.

I don't know if this is a great way to do this but so far I have
noticed that every type of system I have tried reports header lines in
capital letters.  I can use this to detect which lines are device
lines.

  netstat -ni | awk '/^[[:lower:]]/{print$1}' | sort -u
  eth1
  eth2
  lo

I would probably use that to build the match regular expression.  But
I will leave that as an exercise for the reader.
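
One possible way to do that exercise, as a rough sketch only. The
device list is hard-coded here for illustration (in practice it could
come from the netstat pipeline above), the variable names "devices" and
"re" are made up, and names containing regex metacharacters such as "."
would need escaping:

```shell
# Hypothetical device list, one name per line.
devices='eth1
eth2
lo'

# Join the names with "|" to form an ERE alternation: eth1|eth2|lo
re=$(printf '%s\n' "$devices" |
  awk '{ printf "%s%s", sep, $1; sep = "|" } END { print "" }')

# Use it in place of the hard-coded (eth[[:digit:]]|lo) pattern.
line='  eth1:673423996 22297294  789    0 1100   789          0'
echo "$line" | awk -v re="($re):" '{ sub(re, "& "); print $2 }'
```

The last command prints the byte counter, 673423996, because the space
inserted after "eth1:" lets the default FS split the line cleanly.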

> I thought I'd use the above to illustrate my problem rather than a test
> file as Bob Proulx suggested in his reply to my initial post in the
> bug-gnu-utils list because I have no knowledge of awk apart from reading
> through an online tutorial a couple of months ago and I might have made
> some assumptions that would have confused the issue further.

But the full case is often very intimidating.  That big block of awk
one-liner is too dense.  It scares people off.

I would tend not to use "only" awk here.  My own preference is to use
shell scripts and use awk in pieces.  Also, instead of parsing /proc I
would use netstat because it is more portable.  I am not going to
finish this off but you can see where I am going with this:

  netstat -ni | awk '/^[[:lower:]]/{print$1,$4}'

Hope that helps,
Bob



Re: Specifying multiple separators via FS or the -F command line flag
user name
2007-12-03 23:04:59
Bob Proulx wrote:
>   line='  lo:649004371  349860    0    0    0     0          0'
>   echo "$line" | awk '{sub(/(eth[[:digit:]]|lo):/,"& ");print $2}'
>   673423996

Sorry for the cut-n-paste error.  But regardless of the error I am
sure you get the idea.

Bob



Re: Specifying multiple separators via FS or the -F command line flag - addendum
user name
2007-12-04 19:04:29
On Mon, Dec 03, 2007 at 11:32:02PM EST, Bob Proulx wrote:
> cga2000 wrote:
> > Here's a sample of how the multiple separators feature behaves:
> > 
> > [15:52:17][gavronturki:~]$ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
> 
> Thanks for the small example.  (I just read your last posting and will
> probably respond to it but this one was much easier.)
> 
> > 1
> > 2 one
> > 3
> > 4 two
> > 5 three
> > 6
> > 7 four
> > 8 five
> > 
> > Doesn't seem very logical to me.  

Maybe I meant "intuitive" .. except that this overloaded term has become
such a private joke where I'm concerned that I tend to instinctively
avoid it ..

Intuition means that in very common situations where you're parsing
text--and since the default FS is <space> .. it would seem rather
"natural" to default to a behavior where two or three or even four
spaces .. e.g. ..  only count as one separator.

??

> Each field separator is splitting a field.  So for example -F_ on
> "___" would delimit four fields.  But before we go down this path, I
> know what you want and we are going to do it differently to get there.
> 
> > When awk successfully tests for space or colon, the following characters
> > are assumed NOT to be separators even if they have been defined as such
> > via the -F flag -- eg. the <space> that follows "one:" is mapped to the
> > $3 variable.
> > 
> > Is this the way it's supposed to work?
> 
> The way it is supposed to work is defined here:
> 
>   http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
> 
> Search for the section "Regular Expressions" where the FS ERE is
> discussed.
> 
> An extended regular expression can be used to separate fields by using
> the -F ERE option or by assigning a string containing the expression
> to the built-in variable FS. The default value of the FS variable
> shall be a single <space>. The following describes FS behavior:
> 
>    1. If FS is a null string, the behavior is unspecified.
>    2. If FS is a single character:
>          a. If FS is <space>, skip leading and trailing <blank>s;
>             fields shall be delimited by sets of one or more <blank>s.
>          b. Otherwise, if FS is any other character c, fields shall be
>             delimited by each single occurrence of c.
>    3. Otherwise, the string value of FS shall be considered to be an
>       extended regular expression. Each occurrence of a sequence
>       matching the extended regular expression shall delimit fields.
> 
> As you can see, the default splitting behavior on a single space is
> done as a one-off special.  The space is different than any other
> field separator.

Quite "logical".

I am not a programmer and have very little time to dedicate to the *nix
playground.  So when I have to, I grab the first online tutorial that
makes sense and try to make the language work for me.

Otherwise with maybe 6-8 hours a week devoted to computing in general, I
would get nowhere.

> What you probably want is option 3 above where the field separator is
> an extended regular expression.  Try this:
> 
>   echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
>   1 
>   2 one
>   3 two
>   4 three
>   5 four
>   6 five
>   7 
>   8 

> The -F"[: ]+" has a "+" now and will match one or more occurrences of
> either character.

I like that.

> But there is still a difference because leading field separators are
> not trimmed.

But this doesn't make sense ..

I mean .. "-F [: ]+" tells awk that "    " eg. is a separator .. so
something like "  : :   " should be one big separator & should become
part of the implicit "beginning of line" separator, no ..??

As a result something like:

  :  ::   f1 f2 f3

.. should have strings "f1" "f2" "f3" map to $1 $2 $3.

??
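
For reference, a quick check of what awk actually does with such a
line, as a sketch assuming a POSIX awk. Under rule 3 of the
specification quoted above there is no implicit "beginning of line"
separator for a regular-expression FS, so a match at the very start of
the record still delimits an empty first field:

```shell
# The leading run ":  ::   " is one separator match, but it still splits
# the record into an empty $1 followed by f1, f2, f3.
printf ':  ::   f1 f2 f3\n' |
  awk -F '[: ]+' '{ print NF; print "[" $1 "] [" $2 "] [" $3 "] [" $4 "]" }'
```

This reports 4 fields, with "f1" in $2 rather than $1. Only the default
single-space FS gets the special trimming of leading separators.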

> There are a couple of ways of
> dealing with that but neither is particularly elegant.
> 
>   echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0);print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
>   1 one
>   2 two
>   3 three
>   4 four
>   5 five
>   6 
>   7 
>   8 
> 
> This does a substitution across the line for the FS variable.  That is
> the same as sub(/[: ]+/,"",$0); here but using FS ties it to -F
> nicely.  The $0 can be omitted in this but I like to be explicit.

> Hope this helps,

So little time .. too much stuff .. 



Re: Specifying multiple separators via FS or the -F command line flag
user name
2007-12-04 19:17:35
On Tue, Dec 04, 2007 at 12:03:55AM EST, Bob Proulx wrote:
> cga2000 wrote:
> > backtick 4 0 0 awk -F "[: ]" 'BEGIN {file="/proc/net/dev";while(a==a){while(getline<file!=0){if ($3=="eth0"){d=$4;u=$12;printf " %4.1f k/s ",((d-dp)/1024);printf "%4.1f k/s\n",((u-up)/1024)}};close(file);dp=d;up=u;system("sleep 1")}}'
> 
> Oh, gosh, that is hard to read!  But of course you already knew that.

That's why I had a link to that geocities site of "mine" .. with all
the screen backticks .. on one line they look a bit better.

> > NB. I do realize that when one-liners get to be this long it's probably
> >     time to move them to a file
> 
> Yes.
> 
> >     lo:649004371  349860    0    0    0     0          0
> >   eth0:673423996 22297294  789    0 1100   789          0     <-
> > When I bring up the interface, the byte counter is separated from the
> > field "eth0:" by a number of spaces.
> > After a while, as the value becomes larger, the spaces eventually all
> > disappear to make room for the increased number of bytes.
> 
> I tend to pre-process these cases to guarantee space separation.
> 
>   line=' eth0:673423996 22297294  789    0 1100   789          0'
>   echo "$line" | awk '{sub(/(eth[[:digit:]]|lo):/,"& ");print $2}'
>   673423996
> 
>   line='  lo:649004371  349860    0    0    0     0          0'
>   echo "$line" | awk '{sub(/(eth[[:digit:]]|lo):/,"& ");print $2}'
>   673423996
> 
> I would try that technique.  Here I have used an ERE to match either
> of the cases presented.  It is still not very general though.  I would
> tend to get the list of devices dynamically.
> 
> I don't know if this is a great way to do this but so far I have
> noticed that every type of system I have tried reports header lines in
> capital letters.  I can use this to detect which lines are device
> lines.
> 
>   netstat -ni | awk '/^[[:lower:]]/{print$1}' | sort -u
>   eth1
>   eth2
>   lo
> 
> I would probably use that to build the match regular expression.  But
> I will leave that as an exercise for the reader.
> 
> > I thought I'd use the above to illustrate my problem rather than a test
> > file as Bob Proulx suggested in his reply to my initial post in the
> > bug-gnu-utils list because I have no knowledge of awk apart from reading
> > through an online tutorial a couple of months ago and I might have made
> > some assumptions that would have confused the issue further.
> 
> But the full case is often very intimidating.  That big block of awk
> one-liner is too dense.  It scares people off.

 

> I would tend not to use "only" awk here.  My own preference is to use
> shell scripts and use awk in pieces.

I wonder if I could write this as a bash function.

> Also, instead of parsing /proc I would use netstat because it is more
> portable.

Looks like using netstat actually solves all the annoying parsing issues
as well .. since the output is (hopefully) always "neat" with the byte
counters in the right places?

> I am not going to finish this off but you can see where I am going
> with this:

>   netstat -ni | awk '/^[[:lower:]]/{print$1,$4}'

I'll rewrite it next weekend with netstat providing the data in a more
manageable format.

Thank you very much for all your help.



Re: Specifying multiple separators via FS or the -F command line flag
user name
2007-12-11 19:24:32
On Tue, Dec 04, 2007 at 12:03:55AM EST, Bob Proulx wrote:

[..]

> I would tend not to use "only" awk here.  My own preference is to use
> shell scripts and use awk in pieces.  Also, instead of parsing /proc I
> would use netstat because it is more portable.  I am not going to
> finish this off but you can see where I am going with this:
> 
>   netstat -ni | awk '/^[[:lower:]]/{print$1,$4}'

Sorry for the delay.

I briefly looked into the netstat approach but pretty much ran into a
wall.

The problem is that I need to display the delta between two successive
counts .. which means that I need to store the i-1 values of my byte
counts somewhere they can be accessed next time around.

Can't seem to figure out how this could be done .. either in awk or with a
mixture of bash and awk.

Closest I've come to a solution requires temp files .. real ugly.
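
One way to avoid temp files, as a sketch rather than a finished
backtick: let a shell loop produce the samples and pipe them all into a
single long-running awk process, which can keep the previous values in
its own arrays. The function name "rates" and the three-column sample
format (interface, received bytes, transmitted bytes) are assumptions
made for this illustration:

```shell
# rates: reads samples "iface rx_bytes tx_bytes", one per line, and prints
# the change since the previous sample seen for the same interface.
rates() {
  awk '
    ($1 in rx) {
      printf "%s %.1f k/s %.1f k/s\n", $1, ($2 - rx[$1]) / 1024, ($3 - tx[$1]) / 1024
      fflush()   # push each line out immediately when writing to a pipe
    }
    { rx[$1] = $2; tx[$1] = $3 }   # remember this sample for next time
  '
}

# Two fake samples, standing in for readings taken one second apart:
printf 'eth0 1024 2048\neth0 3072 4096\n' | rates

# A live feed might look like this (untested assumption; on Linux the
# received bytes are $2 and the transmitted bytes $10 once ":" is a space):
# while :; do
#   awk '{ gsub(/:/, " ") } $1 == "eth0" { print $1, $2, $10 }' /proc/net/dev
#   sleep 1
# done | rates
```

The fake samples print "eth0 2.0 k/s 2.0 k/s". The first sample prints
nothing because there is no earlier value to compare against. Some awk
implementations buffer input read from a pipe, so a live feed may need
an implementation-specific option to update promptly.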




