List Info

Thread: Calculating percentiles in MySQL




Calculating percentiles in MySQL
user name
2006-07-27 23:25:42
This is more of a mySQL/SQL question. Hope it applies
here...

I have a semi-large table (2000-3000 records and growing) in
which I
have stored a decimal for each row. This number is an
*absolute* rating
score for the record. I'd like to derive from this a
*relative* rating
for each record from .5-10, in .5 increments.

So I need to calculate which PERCENTILE each record is in,
and assign
the relative score accordingly.

For instance, if a record has a score of 1234.5, and I find
that it's
in the 65th percentile (65% of records have absolute scores
<= 1234.5),
then I would assign it a relative score of 6.5.

My question is, how do I figure the percentile score for
each record?

I anticipate that this will be run as a batch process on a
nightly
basis, but I'm open to suggestions.

THANKS in advance!
- Ryan


--~--~---------~--~----~------------~-------~--~----~
This group is managed and maintained by the development
staff at 360 PSG. An enterprise application development
company utilizing open-source technologies for todays
small-to-medium size businesses.

For information or project assistance please visit :
http://www.360psg.com

You received this message because you are subscribed to the
Google Groups "Professional PHP Developers"
group.
To post to this group, send email to Professional-PHPgooglegroups.com
To unsubscribe from this group, send email to
Professional-PHP-unsubscribegooglegroups.com
For more options, visit this group at http:
//groups.google.com/group/Professional-PHP
-~----------~----~----~----~------~----~------~--~---

Calculating percentiles in MySQL
user name
2006-07-28 13:17:43
If you use two queries :

SELECT MAX(record_tbl.score) as max_score [WHERE X=Y
ETC...];

// Process result and set $max_score from first query

$sql = "SELECT record_tbl.id, record_tbl.name,
ROUND((record_tbl.score/$max_score)*10,1) as relative_score
ORDER BY
relative_score DESC LIMIT 50";

This gets the top 50 'relative' scores.

That sounds right... havent tried it... let me know.

Joel


--~--~---------~--~----~------------~-------~--~----~
This group is managed and maintained by the development
staff at 360 PSG. An enterprise application development
company utilizing open-source technologies for todays
small-to-medium size businesses.

For information or project assistance please visit :
http://www.360psg.com

You received this message because you are subscribed to the
Google Groups "Professional PHP Developers"
group.
To post to this group, send email to Professional-PHPgooglegroups.com
To unsubscribe from this group, send email to
Professional-PHP-unsubscribegooglegroups.com
For more options, visit this group at http:
//groups.google.com/group/Professional-PHP
-~----------~----~----~----~------~----~------~--~---

Calculating percentiles in MySQL
user name
2006-07-28 13:25:33
I didnt read the whole post.
Here is how you batch process and have the relative score
saved :

Same thing for max :

SELECT MAX(record_tbl.score) as max_score [WHERE X=Y
ETC...];
// Process result and set $max_score from first query

but then :

$sql = "UPDATE record_tbl SET
record_tbl.relative_score =
ROUND((record_tbl.score/$max_score)*10,1)";

That will update a column called 'relative_score' for each
row.
Now instead of doing every row... you can also just update
the record
when it's score changes... this gives you real-time
"relative"

Just go :

$record_id = Record id that had it's 'score' just change

SELECT MAX(record_tbl.score) as max_score [WHERE X=Y
ETC...];
// Process result and set $max_score from first query
$sql = "UPDATE record_tbl SET
record_tbl.relative_score =
ROUND((record_tbl.score/$max_score)*10,1) WHERE
record_tbl.id=$record_id";

Not a real heavy query set and depending on how often the
score changes
(like when someone saves or something). If this score is
based on
voting (dont call this query every time) have it batch the
whole set at
night.

To save you from batching everything at night, add a where
clause and a
column for 'score_last_modified' then just batch records
that have been
modified within the last 24 hours or something. (this would
just
require you update that datetime field each score change)

Hope any of this helps....

Joel


--~--~---------~--~----~------------~-------~--~----~
This group is managed and maintained by the development
staff at 360 PSG. An enterprise application development
company utilizing open-source technologies for todays
small-to-medium size businesses.

For information or project assistance please visit :
http://www.360psg.com

You received this message because you are subscribed to the
Google Groups "Professional PHP Developers"
group.
To post to this group, send email to Professional-PHPgooglegroups.com
To unsubscribe from this group, send email to
Professional-PHP-unsubscribegooglegroups.com
For more options, visit this group at http:
//groups.google.com/group/Professional-PHP
-~----------~----~----~----~------~----~------~--~---

Calculating percentiles in MySQL
user name
2006-07-28 20:45:21
Maybe I don't understand what you're doing completely, but
why not
calculate the score at run-time (e.g, for some application).
Presumably, since you want to do this at all, someone will
be looking
at the scores in some interface you've built -- couldn't
whatever
application this is calculate the relative score at runtime,
and
display that?

It seems like if you start storing relative score instead of
absolute
score, you're going to lose information (it might not be
possible to
retrieve the absolute score from the relative score, for
instance)


--~--~---------~--~----~------------~-------~--~----~
This group is managed and maintained by the development
staff at 360 PSG. An enterprise application development
company utilizing open-source technologies for todays
small-to-medium size businesses.

For information or project assistance please visit :
http://www.360psg.com

You received this message because you are subscribed to the
Google Groups "Professional PHP Developers"
group.
To post to this group, send email to Professional-PHPgooglegroups.com
To unsubscribe from this group, send email to
Professional-PHP-unsubscribegooglegroups.com
For more options, visit this group at http:
//groups.google.com/group/Professional-PHP
-~----------~----~----~----~------~----~------~--~---

Calculating percentiles in MySQL
user name
2006-07-31 20:18:26
The absolute score is computed at run time based on a few
quantitative
factors that relate to the record. That won't change unless
the record
itself changes.

Right now, there are about 2500 records in this database. In
order to
make meaningful comparisons, I want to create the relative
score that
will rank the records on a 1-10 point scale, in .5
increments.

I think i need to do the batch processing for two reasons:
(1) I anticipate this table getting much larger
(2) As new records are added, the scale will change. So for
instance,
if a route with an Abs. score of 1234.5 is a 6.5, if 100 new
records
with scores over 1234.5 are added, it might be bumped down
to a 5.5
(for example). So I see this being a very dynamic figure and
I'm
concerned with performance and resources.

I think the easiest way to get the info I want is to get a
sorted list
of the records by absolute score, and the relative score
would be the
proportion of records that have absolute scores below that
of the
current record. This seems like a simple enough query (or
subquery), so
I'll play around with the dynamic calculation -- but I
still have
reservations.


--~--~---------~--~----~------------~-------~--~----~
This group is managed and maintained by the development
staff at 360 PSG. An enterprise application development
company utilizing open-source technologies for todays
small-to-medium size businesses.

For information or project assistance please visit :
http://www.360psg.com

You received this message because you are subscribed to the
Google Groups "Professional PHP Developers"
group.
To post to this group, send email to Professional-PHPgooglegroups.com
To unsubscribe from this group, send email to
Professional-PHP-unsubscribegooglegroups.com
For more options, visit this group at http:
//groups.google.com/group/Professional-PHP
-~----------~----~----~----~------~----~------~--~---

[1-5]

about | contact  Other archives ( Real Estate discussion Medical topics )