
Thread: sort ignores underscores (_)???




sort ignores underscores (_)???
2008-02-19 09:58:57

sort behaves erratically with underscores:

  % ( echo _c; echo __; echo _a ) | sort 
  __
  _a
  _c

Here, __ < _a, which implies that _ < a, but

  % ( echo _cc; echo __b; echo _ac ) | sort 
  _ac
  __b
  _cc

Now _ac < __b < _cc, which implies that a < _.

How can I get sort to treat _ consistently?  (I don't have a strong
preference for either _ < a or a < _ as long as it is consistent.)

TIA!

kj

-- 
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.

Re: sort ignores underscores (_)???
2008-02-19 15:02:47
kj wrote:
> sort behaves erratically with underscores:
> 
>   % ( echo _c; echo __; echo _a ) | sort 
>   __
>   _a
>   _c

Sort uses your current locale setting (e.g. LANG) to determine the
character collation sequence.  You probably have LANG set to a
dictionary sort order.  In dictionary sort order case is folded and
punctuation is ignored.
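
To see why "punctuation is ignored" reproduces both of the puzzling
results, here is a rough sketch that sorts each line on a key with the
underscores stripped out.  The helper name ignore_underscores_sort is
made up for this illustration, and this is only an approximation of
the idea: a real locale's collation tables are more involved than
deleting one character.

```shell
# Hypothetical helper: sort stdin as if '_' were not there.
# Builds "key<TAB>line", sorts on the key by byte value, then drops the key.
ignore_underscores_sort() {
  awk '{ key = $0; gsub(/_/, "", key); print key "\t" $0 }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    cut -f2
}

# Keys are "c", "" and "a", so the order is __, _a, _c.
first=$( ( echo _c; echo __; echo _a ) | ignore_underscores_sort )
# Keys are "cc", "b" and "ac", so the order is _ac, __b, _cc.
second=$( ( echo _cc; echo __b; echo _ac ) | ignore_underscores_sort )

printf '%s\n' "$first"
printf '%s\n' "$second"
```

Both outputs match what kj saw, so the two results are consistent with
each other once the underscores are left out of the comparison.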

See this FAQ entry for more information:

  http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> How can I get sort to treat _ consistently?  (I don't have a strong
> preference for either _ < a or a < _ as long as it is consistent.)

You can solve this by setting a standard sort order instead of a
non-standard dictionary sort ordering locale.  "C" (or the "POSIX"
alias) is the normal standard one.

  LANG=C sort
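
As a quick check, here is a sketch that runs both of the original
examples under the C locale.  It uses LC_ALL rather than LANG, on the
assumption that LC_ALL or LC_COLLATE might already be set in the
environment; both take precedence over LANG, in which case LANG=C
alone would have no effect on the sort order.

```shell
# C locale compares by byte value: '_' is 0x5F and 'a' is 0x61, so _ < a.
# LC_ALL overrides both LC_COLLATE and LANG for this one command.
first=$( ( echo _c; echo __; echo _a ) | LC_ALL=C sort )
second=$( ( echo _cc; echo __b; echo _ac ) | LC_ALL=C sort )

printf '%s\n' "$first"    # __ then _a then _c
printf '%s\n' "$second"   # __b then _ac then _cc
```

In both cases _ now sorts before a, which is the consistency that was
asked for.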

Personally I set the following in my own environment to get UTF-8 but
force a standard sort ordering regardless.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C
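
A small sketch of how those two settings interact, assuming LC_ALL is
not set (it is unset explicitly here, because it would override both).
For collation, LC_COLLATE takes precedence over LANG, so sort uses
byte order while the other locale categories still follow LANG.
Whether en_US.UTF-8 is actually installed on a given system is an
assumption; the sort order below does not depend on it.

```shell
# Run in a subshell so the caller's environment is left alone.
result=$(
  unset LC_ALL                 # LC_ALL would win over everything below
  export LANG=en_US.UTF-8      # default for all locale categories
  export LC_COLLATE=C          # collation only: plain byte order
  ( echo _cc; echo __b; echo _ac ) | sort
)

printf '%s\n' "$result"   # __b then _ac then _cc
```

Running the locale command with no arguments prints the effective
value of each category, which is a handy way to confirm what sort
will see.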

Bob



Re: sort ignores underscores (_)???
2008-02-20 06:20:50
In <mailman.7640.1203454971.18990.help-gnu-utilsgnu.org> bobproulx.com (Bob Proulx) writes:

>Personally I set the following in my own environment to get UTF-8 but
>force a standard sort ordering regardless.

>  export LANG=en_US.UTF-8
>  export LC_COLLATE=C

That did the trick.  Thanks!

kynn

-- 
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.
