Thread: sort - specifying sort fields/keys.

sort - specifying sort fields/keys.
2008-04-07 23:45:59
I use the following on debian etch:
$ sort --version
sort (GNU coreutils) 5.97
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
I wanted to sort the output of the ls command by filename and with
directories first and regular files last.
What I came up with does not quite behave as I expected:
$ cd ~/.vim
$ ls -al --color=never | sort -k1.1,1.1r -k8f
.. and this is what I get:
total 392
drwxr-xr-x 13 gavron gavron 4096 2008-04-08 00:11 .
drwxr-xr-x 46 gavron gavron 4096 2008-04-08 00:02 ..
drwxr-xr-x 2 gavron gavron 12288 2007-07-08 12:42 colors
drwxrwxr-x 2 gavron gavron 4096 2008-02-17 14:39 doc
drwxr-xr-x 2 gavron gavron 4096 2006-05-30 00:08 ftdetect
drwxr-xr-x 2 gavron gavron 4096 2007-03-09 18:32 ftplugin
drwxr-xr-x 2 gavron gavron 4096 2007-05-28 12:46 keymap
drwxrwxr-x 3 gavron gavron 4096 2007-04-28 12:43 macros
drwxr-xr-x 2 gavron gavron 4096 2008-02-10 16:58 plugin
drwxr-xr-x 2 gavron gavron 4096 2007-10-05 20:50 spell
drwxrwxr-x 2 gavron gavron 4096 2007-07-08 10:28 syntax
drwxr-xr-x 2 gavron gavron 4096 2007-07-08 10:28 tmp
drwxr-xr-x 5 gavron gavron 4096 2005-02-21 22:20 vim2ansi
-rw-r--r-- 1 gavron gavron 2879 2007-03-25 14:59 .clewn_keys
-rw-r--r-- 1 gavron gavron 1416 2007-03-25 14:59 clewn.vim
-rw-r--r-- 1 gavron gavron 166873 2006-04-09 03:16 ColorSamplerPack.zip
-rw-r--r-- 1 gavron gavron 0 2008-04-08 00:16 efile
-rw-r--r-- 1 gavron gavron 0 2008-04-08 00:16 Efile
-rw-r--r-- 1 gavron gavron 0 2008-04-08 00:11 .efile
-rw-r--r-- 1 gavron gavron 11392 2006-04-12 21:35 manpageview.tar.gz
-rw-r--r-- 1 gavron gavron 13645 2006-06-04 18:02 prtdialog.zip
-rw-r--r-- 1 gavron gavron 123628 2006-06-11 03:45 ttcoach.zip
All fine and dandy except for one little detail:
Why does .efile appear together with my other two [Ee]file's rather than
just below .clewn_keys?
Looks like sort is ignoring the dots .. ??
Side-effect of the "f" flag specified with my second sort key ..?
Something to do with my locale (en_US)?
Thanks!

Re: sort - specifying sort fields/keys.
2008-04-08 10:22:27
cga2000 wrote:
> I wanted to sort the output of the ls command by filename and with
> directories first and regular files last.
>
> What I came up with does not quite behave as I expected:
>
> $ ls -al --color=never | sort -k1.1,1.1r -k8f
You shouldn't ever need --color=never there unless you have aliased ls
with ls --color=always. You don't want ls --color=always (really you
don't) and therefore you won't need to undo it with --color=never.
> Why does .efile appear together with my other two [Ee]file's rather than
> just below .clewn_keys?
>
> Looks like sort is ignoring the dots .. ??
You probably have set a locale setting that ignores punctuation.
> Side-effect of the "f" flag specified with my second sort key ..?
No.
> Something to do with my locale (en_US)?
Yes. If a locale is set then the collation sequence for the locale is
used. This is controlled outside of sort (and also ls and bash and
anything else that produces sorted output) and must be respected if
set.
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
Apparently the people who defined the collating sequence for the en_*
locales confused working with data on a computer with working with
text on a computer. The locale collating sequences for en_* ignore
punctuation and fold case by default!
I have set the following in my environment to restore a standard sort
ordering:
export LANG=en_US.UTF-8
export LC_COLLATE=C
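For a one-off check, the same byte-order collation can be scoped to a single command rather than exported. This is only a minimal sketch, assuming a POSIX-style sort; the file names come from the listing earlier in the thread and the variable name `sorted` is purely illustrative:

```shell
# Apply the C (byte-order) collation to this one sort invocation only.
# In byte order '.' (0x2E) sorts before 'E' (0x45), which sorts before 'c' and 'e'.
sorted=$(printf '%s\n' efile Efile .efile clewn.vim .clewn_keys | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Here the dot-files come first and "Efile" precedes the lowercase names, because no punctuation is skipped and no case is folded.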
Bob

Re: sort - specifying sort fields/keys.
2008-04-08 16:52:23
On Tue, Apr 08, 2008 at 11:22:27AM EDT, Bob Proulx wrote:
> cga2000 wrote:
> > I wanted to sort the output of the ls command by filename and with
> > directories first and regular files last.
> >
> > What I came up with does not quite behave as I expected:
> >
> > $ ls -al --color=never | sort -k1.1,1.1r -k8f
>
> You shouldn't ever need --color=never there unless you have aliased ls
> with ls --color=always. You don't want ls --color=always (really you
> don't) and therefore you won't need to undo it with --color=never.
You're absolutely right.
In fact I didn't want to have to worry about the effects of color escape
codes on sort field counting at that point and that's why I initially
specified --color=never.
On my system "ls" is aliased to "ls --color=auto".
Regrettably the man page provided on my system lists the three options
(never, always, auto) but does not give any explanation as to what they
actually do.
I did a
$ ls -al | sort -k1.1,1.1r -k8f
and the output is identical (properly sorted with dots ignored)
OTOH, I have the "l" shortcut aliased to "ls -alh --full-time
--color=always" and I verified that
$ l | sort -k1.1,1.1r -k8f
screws up big time .. the output is "sort of" sorted ..
Does --color=auto mean add colors when writing to a terminal and don't
add them when writing to a file (pipe etc.)..?
> > Why does .efile appear together with my other two [Ee]file's rather than
> > just below .clewn_keys?
> >
> > Looks like sort is ignoring the dots .. ??
>
> You probably have set a locale setting that ignores punctuation.
OK. This is beginning to make "sense".
> > Side-effect of the "f" flag specified with my second sort key ..?
>
> No.
>
> > Something to do with my locale (en_US)?
>
> Yes. If a locale is set then the collation sequence for the locale is
> used. This is controlled outside of sort (and also ls and bash and
> anything else that produces sorted output) and must be respected if
> set.
>
> http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
Thanks. Good doc.
> Apparently the people who defined the collating sequence for the en_*
> locales confused working with data on a computer with working with
> text on a computer. The locale collating sequences for en_* ignore
> punctuation and fold case by default!
Given the symptoms and the nature of sort I would probably never have
figured that out myself. That there may be circumstances where this
comes in handy I do not doubt .. But as to making it the default for one
of (if not the) most widely-used locales?
> I have set the following in my environment to restore a standard sort
> ordering:
>
> export LANG=en_US.UTF-8
> export LC_COLLATE=C
I gave up on UTF-8 because I use mostly ELinks for browsing and afaik
it's not UTF-8 ready.
I tested with LC_ALL=POSIX (as recommended in your document) and the
"." was still being ignored.
So I issued the above export commands and (magically) data was sorted
as data ..
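The effect on the original two-key sort can be reproduced on a few sample rows. The following is a rough sketch, not the poster's exact setup: `sort_listing` is a hypothetical helper name, the rows are made-up data in the same layout as the listing in this thread (user "u", group "g"), and field 8 is assumed to hold the file name, which only holds for that date layout:

```shell
# Directories first (reverse sort on the first character), then names
# (field 8, case-folded), compared byte by byte so leading dots count.
sort_listing() {
    LC_ALL=C sort -k1.1,1.1r -k8f
}

# Hypothetical sample rows in the same column layout as the thread's listing.
listing=$(printf '%s\n' \
    '-rw-r--r-- 1 u g 0 2008-04-08 00:16 efile' \
    '-rw-r--r-- 1 u g 0 2008-04-08 00:11 .efile' \
    '-rw-r--r-- 1 u g 1416 2007-03-25 14:59 clewn.vim' \
    'drwxr-xr-x 2 u g 4096 2008-02-17 14:39 doc' \
    '-rw-r--r-- 1 u g 2879 2007-03-25 14:59 .clewn_keys' | sort_listing)
printf '%s\n' "$listing"
```

With real data the equivalent would be `ls -al | sort_listing`. Setting LC_ALL only for sort, and not for ls, leaves the date columns printed by ls unchanged.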
Thank you very much for your clarification.

Re: sort - specifying sort fields/keys.
2008-04-08 18:42:21
cga2000 wrote:
> Bob Proulx wrote:
> > You shouldn't ever need --color=never there unless you have aliased ls
> > with ls --color=always. You don't want ls --color=always (really you
> > don't) and therefore you won't need to undo it with --color=never.
>
> You're absolutely right.
>
> In fact I didn't want to have to worry about the effects of color escape
> codes on sort field counting at that point and that's why I initially
> specified --color=never.
It was pointed out to me in an offlist comment that sometimes people
*do want* ls --color=always. Such as when forcing it and piping it
into less with 'ls --color=always | less -R'. (Oh, I guess, yes.)
> On my system "ls" is aliased to "ls --color=auto".
>
> Regrettably the man page provided on my system lists the three options
> (never, always, auto) but does not give any explanation as to what they
> actually do.
The man pages are great for quick reference of major features. But
the primary documentation for most GNU software is in the info pages.

info coreutils 'ls invocation'

`--color [=WHEN]'
     Specify whether to use color for distinguishing file types. WHEN
     may be omitted, or one of:
        * none - Do not use color at all. This is the default.
        * auto - Only use color if standard output is a terminal.
        * always - Always use color.
     Specifying `--color' and no WHEN is equivalent to `--color=always'.
     Piping a colorized listing through a pager like `more' or `less'
     usually produces unreadable results. However, using `more -f'
     does seem to work.
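The difference between "auto" and "always" can be observed directly when the output goes to a pipe. This is a small sketch that assumes GNU ls; the temporary directory, the `subdir` name and the LS_COLORS value are illustrative choices, not anything from the thread:

```shell
# Compare what ls writes when its standard output is not a terminal.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/subdir"

# LS_COLORS is set explicitly so the result does not depend on TERM.
auto_out=$(LS_COLORS='di=01;34' ls --color=auto "$demo_dir")
always_out=$(LS_COLORS='di=01;34' ls --color=always "$demo_dir")

printf '%s\n' "$auto_out" | od -c      # output captured through a pipe: plain name
printf '%s\n' "$always_out" | od -c    # color forced: escape bytes (033 [ ...) may surround the name

rm -r "$demo_dir"
```

Those extra escape bytes are what upset the field and character positions when a forced-color listing is piped into sort.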
> I did a
>
> $ ls -al | sort -k1.1,1.1r -k8f
>
> and the output is identical (properly sorted with dots ignored)
Based upon locale setting, right?
> > http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>
> Thanks. Good doc.
The descriptions usually talk about LC_ALL because it gets
complicated. Really the variable one is intended to set is LANG. But
LANG is overridden by LC_COLLATE and so setting LANG may have no
effect. And LC_COLLATE is in turn overridden by LC_ALL. Saying all of
that in the quick docs gets complicated and still doesn't really
describe things like how it interacts with LC_CTYPE. I have no idea
what (possibly bad) effects setting an incompatible combination of
LANG, LC_CTYPE and LC_COLLATE will have on some languages. So it is
simpler just to describe LC_ALL=C as the biggest possible lever. But
normally one would only set the lower priority locale vars such as
LANG and possibly LC_COLLATE such as I have done.
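The precedence described above (LC_ALL over LC_COLLATE over LANG) can be written down as a small shell function. This is a sketch only: `effective_collate` is a hypothetical helper, it treats empty variables as unset, and it falls back to "POSIX" when nothing is set, which is the standard default rather than something stated in this thread:

```shell
# Print the locale name that governs collation, following the usual
# precedence: LC_ALL first, then LC_COLLATE, then LANG, else POSIX.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' POSIX
    fi
}

# Show which setting wins in the current environment.
effective_collate
```

With LANG=en_US.UTF-8 and LC_COLLATE=C exported and LC_ALL unset, the function prints "C", which matches the reported behavior of sort after those two exports.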
> > Apparently the people who defined the collating sequence for the en_*
> > locales confused working with data on a computer with working with
> > text on a computer. The locale collating sequences for en_* ignore
> > punctuation and fold case by default!
>
> Given the symptoms and the nature of sort I would probably never have
> figured that out myself. That there may be circumstances where this
> comes in handy I do not doubt .. But as to making it the default for one
> of (if not the) most widely-used locales?
It certainly annoys me. But they didn't consult me when the collating
sequence was chosen.
> I gave up on UTF-8 because I use mostly ELinks for browsing and afaik
> it's not UTF-8 ready.
How does ELinks compare to Links, Lynx, or w3m for UTF-8 support? I
only use them for basic plain text us-ascii pages and so can't judge.
> I tested with LC_ALL=POSIX (as recommended in your document) and the
> "." was still being ignored.
Hmm... Works for me. Please double check everything.
$ touch .baz .foo bar baz foa foo foz
$ LC_ALL=en_US.UTF-8 ls -A1
bar
baz
.baz
foa
foo
.foo
foz
$ LC_ALL=C ls -A1
.baz
.foo
bar
baz
foa
foo
foz
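A similar check can be made with sort itself, comparing LC_ALL=POSIX against LC_ALL=C. This is a minimal sketch; `list_names` is a hypothetical helper that feeds sort a few of the test names used above:

```shell
# Feed the same names to sort under both locale spellings.
list_names() { printf '%s\n' foo .foo bar .baz baz; }

posix_sorted=$(list_names | LC_ALL=POSIX sort)
c_sorted=$(list_names | LC_ALL=C sort)

if [ "$posix_sorted" = "$c_sorted" ]; then
    echo "POSIX and C give the same order:"
    printf '%s\n' "$c_sorted"
fi

# LC_ALL has to reach the sorting process itself; list what is currently set.
env | grep -E '^(LANG|LC_)' || echo "no locale variables set"
```

If the two results differ, or dots are still skipped, a likely place to look is whether the variable was set for a different command in the pipeline than the one doing the sorting.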
> So I issued the above export commands and (magically) data was sorted
> as data ..
Oh good.
> Thank you very much for your clarification.
Glad to help,
Bob

Re: sort - specifying sort fields/keys.
2008-04-09 12:39:09
On Tue, Apr 08, 2008 at 07:42:21PM EDT, Bob Proulx wrote:
> cga2000 wrote:
> > Bob Proulx wrote:
[..]
> The man pages are great for quick reference of major features. But
> the primary documentation for most GNU software is in the info pages.
Sometimes I wish the humongous gnu/screen man page were available in a
more clearly structured format like texinfo..
But as to the man pages being seen as a quick reference .. I would tend
to disagree .. that's more the role of the --help option.
But in any case, for hobbyists like myself who pretty much live on
borrowed time .. it can be really frustrating to have to spend more time
and energy hunting for the adhoc doc than actually reading it (surely
you've heard of the debian/gnu wars re: doc licensing, and its impact on
the availability of recent texinfo docs in .deb format right?)
> info coreutils 'ls invocation'
>
> `--color [=WHEN]'
>      Specify whether to use color for distinguishing file types. WHEN
>      may be omitted, or one of:
>         * none - Do not use color at all. This is the default.
>         * auto - Only use color if standard output is a terminal.
>         * always - Always use color.
>      Specifying `--color' and no WHEN is equivalent to `--color=always'.
>      Piping a colorized listing through a pager like `more' or `less'
>      usually produces unreadable results. However, using `more -f'
>      does seem to work.
So I guessed right .. --color=auto, I mean .. Seem to be getting the
hang of it!
> > I did a
> >
> > $ ls -al | sort -k1.1,1.1r -k8f
> >
> > and the output is identical (properly sorted with dots ignored)
>
> Based upon locale setting, right?
Yes. Still getting dot-files intermixed with non-dot files.
>
> > > http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
> >
> > Thanks. Good doc.
>
> The descriptions usually talk about LC_ALL because it gets
> complicated. Really the variable one is intended to set is LANG. But
> LANG is overridden by LC_COLLATE and so setting LANG may have no
> effect. And LC_COLLATE is in turn overridden by LC_ALL. Saying all of
> that in the quick docs gets complicated and still doesn't really
> describe things like how it interacts with LC_CTYPE. I have no idea
> what (possibly bad) effects setting an incompatible combination of
> LANG, LC_CTYPE and LC_COLLATE will have on some languages. So it is
> simpler just to describe LC_ALL=C as the biggest possible lever. But
> normally one would only set the lower priority locale vars such as
> LANG and possibly LC_COLLATE such as I have done.
My understanding of the problem is just too limited at this point for me
to seriously consider looking into the solutions/implementation.
:-(
> > > Apparently the people who defined the collating sequence for the en_*
> > > locales confused working with data on a computer with working with
> > > text on a computer. The locale collating sequences for en_* ignore
> > > punctuation and fold case by default!
> >
> > Given the symptoms and the nature of sort I would probably never have
> > figured that out myself. That there may be circumstances where this
> > comes in handy I do not doubt .. But as to making it the default for one
> > of (if not the) most widely-used locales?
>
> It certainly annoys me. But they didn't consult me when the collating
> sequence was chosen.
>
> > I gave up on UTF-8 because I use mostly ELinks for browsing and afaik
> > it's not UTF-8 ready.
>
> How does ELinks compare to Links, Lynx, or w3m for UTF-8 support? I
> only use them for basic plain text us-ascii pages and so can't judge.
I'm no expert obviously but I believe that ELinks supports ISO8859-1 ..
and that's it. IOW, I have not had any issues with most pages that hail
from Western Europe.. That's about my level of expertise in this area.
I am under the impression that w3m (patched?) supports UTF-8 but I have
not investigated this.
As to the other two I have never used them.
As to ELinks, since I am pretty much allergic to the GUI model and have
chosen to limit my browsing to text it pretty much meets my needs. My
main concern is that it doesn't look like there's a lot going on
development-wise .. so if you're not already using it I wouldn't
recommend switching ..
> > I tested with LC_ALL=POSIX (as recommended in your document) and the
> > "." was still being ignored.
>
> Hmm... Works for me. Please double check everything.
>
> $ touch .baz .foo bar baz foa foo foz
>
> $ LC_ALL=en_US.UTF-8 ls -A1
> bar
> baz
> .baz
> foa
> foo
> .foo
> foz
>
> $ LC_ALL=C ls -A1
> .baz
> .foo
> bar
> baz
> foa
> foo
> foz
OK. I'll test again with LC_ALL=POSIX.
>
> > So I issued the above export commands and (magically) data was sorted
> > as data ..
>
> Oh good.
>
> > Thank you very much for your clarification.
>
> Glad to help,
Thanks, Bob.
BTW .. those awk one-liners that you helped me write a while ago really
do a great job!
Thanks for that as well.

Re: sort - specifying sort fields/keys.
2008-04-09 15:37:46
cga2000 wrote:
> Bob Proulx wrote:
> > The man pages are great for quick reference of major features. But
> > the primary documentation for most GNU software is in the info pages.
>
> Sometimes I wish the humongous gnu/screen man page were available in a
> more clearly structured format like texinfo..
That one is a little long!
On the positive side, I think the GNU 'make' texinfo manual is a good
example of a well written user manual. It really is much more effective
than a man page.
> But as to the man pages being seen as a quick reference .. I would tend
> to disagree .. that's more the role of the --help option.
Many projects generate the man pages from the --help output using
help2man. So in those cases there won't be any difference between the
two. The advantage is that man pages will be available when otherwise
nothing would be available.
> But in any case, for hobbyists like myself who pretty much live on
> borrowed time .. it can be really frustrating to have to spend more time
> and energy hunting for the adhoc doc than actually reading it (surely
> you've heard of the debian/gnu wars re: doc licensing, and its impact on
> the availability of recent texinfo docs in .deb format right?)
It is a tragedy that the two largest and most active free software
communities are on opposing sides of this issue.
> So I guessed right .. --color=auto, I mean .. Seem to be getting the
> hang of it!
Yeah!
> BTW .. those awk one-liners that you helped me write a while ago really
> do a great job!
Glad to be of help.
Bob

Re: sort - specifying sort fields/keys.
2008-04-12 23:46:52
cga2000 wrote:
> $ info coreutils sort
>
> .. rather than,
>
> $ info sort
Yes. But try that with 'pr'. The problem is that it matches a
substring of "printing" too. Instead try 'pr invocation' to get to
the pr page.
$ info coreutils "pr invocation"
> .. as "man sort" curiously recommends .. and there you have it, the
> compleat gnu sort manual!
Newer versions now recommend the full "sort invocation".
> No idea if this was part of my base install, or I fished it out of
> some non-debian repository and forgot all about it.
I haven't researched it to root cause myself but many bug reports
indicate that it is an install-info problem in building the info
directory. Note that the Debian install-info isn't the same as the
GNU install-info. This is getting addressed.
Bob

Re: sort - specifying sort fields/keys.
2008-04-12 14:03:24
On Wed, Apr 09, 2008 at 04:37:46PM EDT, Bob Proulx wrote:
> cga2000 wrote:
> > Bob Proulx wrote:
[..]
> It is a tragedy that the two largest and most active free software
> communities are on opposing sides of this issue.
I researched this further and found that:
1. There is a nicely structured version of the gnu screen manual in
   texinfo format on my debian system. I mean since it's "gnu screen"
   there had to be an info version of the manual.
2. The texinfo coreutils package is also available on my debian "etch"
   system .. and that includes proper documentation of the sort utility.
   So you only need to type:
   $ info coreutils sort
   .. rather than,
   $ info sort
   .. as "man sort" curiously recommends .. and there you have it, the
   compleat gnu sort manual!
   No idea if this was part of my base install, or I fished it out of
   some non-debian repository and forgot all about it.
3. There is a rather nice info browser called "pinfo" that uses color
   (only 8/16 of them unfortunately -- yes, I like "pretty" and text
   displayed in cyan, magenta .. etc. is not really my favorite) .. but
   more importantly pinfo has default keyboard keys that match my vimmer
   habits i.e. you navigate the links to the different nodes via "hjkl"
   and <Enter> so I can focus on reading the doc rather than navigation
   aspects. Also, hitting <Enter> on a www/mail link launches my
   browser or mailer automatically .. neat!
A bit OT, that.. but I thought worth the few extra lines.