List Info

Thread: Group ordering and comparison




Group ordering and comparison
user name
2006-06-14 23:42:08
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I need a little help proving a conjecture I've come up with
in relation
to ordering and sorting.. not sure where else to go with
this but since
I'm using it in kernel code I might as well ask.

I am currently working on an in-memory compression patch for
Linux 2.6,
and need a way to determine which compressed data to throw
into the swap
file.  I decided to do it by the least compressible pages
first, LCI
(lowest compression index, where CI == PAGE_SIZE -
COMPRESSED_PAGE_SIZE).

To this end I have the problem of ordering pages, grouping
them for
efficient compression, and destroying and regrouping groups
to increase
efficiency.  I have come up with an algorithm, but it relies
on a
logical conjecture I came up with in about 5 minutes and
spent the past
hour attempting to disprove.  As it stands I believe it is
true but I
lack proper numerical theory to prove it.

The conjecture is as follows:

 - For X a multiple of N
 - For a set of X unique elements
 - For groups of N elements
 - For "sorted" meaning sorted ascending or
descending, but being
consistent through the whole of this conjecture
 - For groups in which the elements contained are sorted
 - For list L containing the set of all elements sorted
 - For list G containing the set of all groups
 - For list K containing the expansion of all groups in list
G into
their elements

 Starting at the beginning of lists L and K, only the index
representing
the first and last element of each group must be compared to
determine
the location of the first group in which list K is no longer
following
the sorted order of list L.


That is to say:

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

Mix these up in random order; group them in groups of 3;
order the
groups ascending; and rewrite into a list:

Group:
02 01 03 | 05 06 04 | 07 09 10 | 14 13 15 | 11 08 12

Sorted groups:
01 02 03 | 04 05 06 | 07 09 10 | 13 14 15 | 08 11 12

Lists compared:
01 02 03   04 05 06   07 08 09   10 11 12   13 14 15
01 02 03 | 04 05 06 | 07 09 10 | 13 14 15 | 08 11 12
 ^     ^    ^     ^    ^     ^-- Fail
_|_____|____|_____|____|_____|

As illustrated above, we know the failure to sort is in
group 3.  This
means, in our case, that one of elements 7, 8, or 9 reside
in groups 4
or 5.  We know they're not in groups 1, 2, or 3 by
conjecture.

I believe this conjecture is true for all lists of unique
elements, for
all group sizes >0, and for sparse lists.

This conjecture becomes useful in a modified form with
greater restrictions:

 - For groups being "valued" as the mean value
of their elements
 - For list G containing the set of all groups sorted

In this form, the groups above are valued as below:

Values:
01 02 03 | 04 05 06 | 07 09 10 | 08 11 12 | 13 14 15
6/3=2      15/3=5     26/3=8.7   31/3=10.3  42/3=14

This allows for fast comparison of two lists to determine
where the
lists are out of order.  This is extremely useful for
sorting lists
where the elements are grouped in a physical sense and
action must be
taken to ungroup the elements, sort them, and regroup them.

In the case of my compression system, I group pages into
sets of 32KiB,
which becomes 8 pages (ignore huge pages for this
discussion, I know
already).  It is detrimental for pages to be ill-grouped; so
I keep the
groups sorted by compression index (average of the CI of the
pages) and
attempt to find the least compressible group of pages (LCI)
that is out
of order from the LCI of pages.  (think about the above
conjecture,
you'll get it).

This is important because the changes made compress
"swapped" pages in
memory.  When memory pressure is too high, I throw the LCI
groups out
into the swap file for the pure purpose of maximizing the
overall
compression ratio of in-memory compressed pages.  By getting
rid of the
absolutely hardest to compress pages, this goal is reached;
after all,
if I compress 32KiB to 1KiB and then ditch that only to keep
a chunk of
32KiB stored as 31KiB in memory, I just wasted space for
potentially 31
times more memory to be compressed and not sent to swap!

At any rate, I can cut the search space (and thus the CPU
intensity) of
the algorithm down to O(X/(2N)) in comparison to brute
forcing, which
isn't really much strictly speaking (it's still a linear
algorithm, as N
is fixed and X is the workload), but in my case it will be
25% as
intensive.  This assumes my conjecture is true... any
takers?  

(Note that you can theoretically execute this kind of
comparison in a
sort of tree using groups of groups of ... groups of
elements, giving
you a polynomial or exponential time complexity; but I
don't feel like
thinking about it right now)
- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

    Creative brains are a valuable, limited resource. They
shouldn't be
    wasted on re-inventing the wheel when there are so many
fascinating
    new problems waiting out there.
                                                 -- Eric
Steven Raymond

    We will enslave their women, eat their children and rape
their
    cattle!
                  -- Bosc, Evil alien overlord from the
fifth dimension
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


iQIVAwUBRJCezgs1xW0HCTEFAQIkdA//f58BRhEhreZLnrW+Zp4lkYlOofuu
VfKp
0QyA82iiam13Onvt4qunHHJQXsK8ESzxaSaYOw3ZwBMHcBNfYb9OTdBlZ0CU
OFo4
rYAbJZXQjOlCuJXgoaHeJ0/FNetcLIDd7RB3L0BAN32PHBoOjfEImB9lWFvq
2WFn
CuziMnBtv1kldBuBu0Va8CI+F8ZiOGE0ELrqGPgk11WSOBjlN9cCJXCfpnsX
jXx5
pLo4RHnu0uFg2smX5uwPzfaEsBMvsGwMaIWCV/aMlDvys1XwNrlAJtd7pzWz
Aym4
XtksAKqmpwzkksLgqYxdZA94F6megh3INK1PzJ8Jj/eTkuZ5vRIbDnSp56RS
s7o1
BjmyMCPMZTYOiVdTKs7r1+t1LnH+Qxth37KAk3zkndhSrY08llqhIn1bbp0r
tb2E
UwqVWMXy1D74eqHPhW0P55MrwgAlQR1+K+hWIM1Y1x6aL4BkJDC35vfCDbzi
wYnN
t0PbRdTKrCEsgcjDCME+mBAdrNQb4+ZAnhM9+zPrMmp/+NdYJ/KjoeO9HBdb
1p9C
PLtbPXVY46Tw5C0HqL0AlHXm6VMp4Te1/g6sGBQ9TIMUivxLEebZtHBnA2vU
lyYm
1p96RsPnqjgcNpMPheiVZVKDv/MWFEOuLK2llO9Z1Ph4xQUGOiHnEGhG1AkJ
hZGM
cZ9+vGdfpmY=
=u+er
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line
"unsubscribe linux-c-programming" in
the body of a message to majordomovger.kernel.org
More majordomo info at  http://vge
r.kernel.org/majordomo-info.html
Group ordering and comparison
user name
2006-06-15 04:55:21
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



John Richard Moser wrote:
> I need a little help proving a conjecture I've come up
with in relation
> to ordering and sorting.. not sure where else to go
with this but since
> I'm using it in kernel code I might as well ask.
> 
[...]

I asked #math, apparently it works.  I still can't figure
out how to
deal with the swap code.. I guess I'll try to write other
useful code
and coerce someone else into writing that part for me later
so I can get
it hooked up.

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

    Creative brains are a valuable, limited resource. They
shouldn't be
    wasted on re-inventing the wheel when there are so many
fascinating
    new problems waiting out there.
                                                 -- Eric
Steven Raymond

    We will enslave their women, eat their children and rape
their
    cattle!
                  -- Bosc, Evil alien overlord from the
fifth dimension
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


iQIVAwUBRJDoNws1xW0HCTEFAQKPPw/9Fxa/m8HdRIGzjVDz2MvH3Y0aVorr
0r4Q
N8JorcvdNWE2GhrCsbF0usSui3+00G5HzYXqgIVTv6jNveiPzDajh1ttKXZH
qhRu
zDEtxIn+xEMcIoWIp5187CfSULodnsSL4yqdVo7u1jXvfy8G7tq1Q9o3df6v
hD//
U2gzRDdny8mOLK3W2qH/TnyvLY5/8ktx/0Xa+wGWa6HLsvz5EO8yVPpzC32B
+WT8
B2GivC/n1sDspHL7bz/KsHhXtMfV3mxm3gWjBjahPyy67TwUhYMvtvXvE0i8
9aBk
lvuwHFvc8V3/eNxj6oHtiwn7vWpHftWsItvtbu811fe3OTRbF3qct3nQSLjF
7nPw
RHkdryYrQeDOkeMVmPG6lx+DYTaqfvFEnXqliPWzVUYTTPLwM7eXrWLa0Uzb
6FhR
NmMC2olOlxmoNheLYnjARbskU0yeuQTKYepgM3PcgSm2kZYrXwNfyW52y8SF
9A4s
p85l6B6QgSfZJQ6VGWZPuljaARR3koW4HtE9oltsnwJmEJbU/mvy7BmFTBX2
MIB9
Q9/k9ZOwSZfTnjvVXKGgeHTUea0a1yU2xNk9lTMFzUFDzK93RnPXPltoIRe4
y+LR
Y+yLFQK25e1QloEjD1FZgsXKRwGCt6OOjWKctyAlRxnPgArTu9Wsw5POJTUn
tEoi
ZCozaLijf5w=
=ejo3
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line
"unsubscribe linux-c-programming" in
the body of a message to majordomovger.kernel.org
More majordomo info at  http://vge
r.kernel.org/majordomo-info.html
[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )