List Info

Thread: why primary key is using hex instead of integer




why primary key is using hex instead of integer
user name
2006-03-17 01:50:08
Following is a minor addendum to my email Tuesday.

In my work today I was reviewing the following from the
MySQL 5.1 Reference Manual:  
7.4.2. Make Your Data as Small as Possible

Particularly relevant excerpts follow below.  I believe
Oracle, Microsoft, Sybase and IBM provide similar
guidelines.


EXCERPTS

One of the most basic optimizations is to design your tables
to take as little space on the disk as possible. This can
result in huge improvements because disk reads are faster,
and smaller tables normally require less main memory while
their contents are being actively processed during query
execution. Indexing also is a lesser resource burden if done
on smaller columns.

...

You can get better performance for a table and minimize
storage space by using the techniques listed here:
  a.. Use the most efficient (smallest) data types possible.
MySQL has many specialized types that save disk space and
memory. For example, use the smaller integer types if
possible to get smaller tables. MEDIUMINT is often a better
choice than INT because a MEDIUMINT column uses 25% less
space. 
  b.. ... 
  c.. The primary index of a table should be as short as
possible. This makes identification of each row easy and
efficient. 
  d.. ... 
  e.. ... Shorter indexes are faster, not only because they
require less disk space, but because they give also you more
hits in the index cache, and thus fewer disk seeks.



  ----- Original Message ----- 
  From: Greg Hamer 
  To: roller-devincubator.apache.org 
  Sent: Tuesday, March 14, 2006 2:52 PM
  Subject: Re: why primary key is using hex instead of
integer


  Based on my experience as a DBA, I suspect there are
penalties, but you are only likely to see them under load.

  All data returned from databases passes through RAM.  All
databases enhance performance via caching data in RAM.  The
longer a row (i.e the more bytes in a row), the less rows
that can be cached in a given amount of RAM.  Making primary
keys that require 32 bytes of storage per row versus, say
integers which require 4 bytes, reduces the amount of data
that can be served from cache.

  Now multiply the 32 vs 4 bytes by:  a) every foreign key
referencing back to primary key; b) every non-clustered
index on the primary key; and c) every index on the foreign
keys.  

  Add it all up and there likely is detrimental impact on: 
a) performance; b) scalability; and c) cost (for both disks
and RAM).

  I think this is way 9 out of 10 DBAs prefer the shortest,
most compact datatype possible for primary keys 

  Yes, there are cases where uuids are nice (e.g. Elias and
Matt's examples of "merging content from different
servers" and "migrate a blog").  But
having the larger datatype introduces a penalty that you pay
100% of the time, vs these less frequent cases when having
uuids help out.

  Just my 2¢. 

  g

  P.S.  a quick google turned up the following link of a
MSSQL DBA's thoughts on GUIDs as primary keys:
  http://www.sqljunkies.com/WebLog/odd
s_and_ends/archive/2005/08/31/16595.aspx


  ----- Original Message ----- 
  From: "Elias Torres" <eliastorrez.us>
  To: <roller-devincubator.apache.org>
  Sent: Tuesday, March 14, 2006 8:46 AM
  Subject: Re: why primary key is using hex instead of
integer


  -----BEGIN PGP SIGNED MESSAGE-----
  Hash: SHA1

  I don't think it imposes a performance hit and even if it
did it's
  neglible. Once the database has indexed the strings or the
ints the
  lookup cost should be the same. I think it might help if
we used fixed
  char(32) as opposed to varchar(40), but that's mostly a
space issue
  since we are not using the last 8 bytes.

  - -Elias

  Allen Gilliland wrote:
  > I don't know about why this decision was made
historically, but i would
  > agree that technically this imposes a slight
performance hit.  AFAIK,
  > databases can do lookups on purely numeric keys a
little bit faster than
  > alphanumerics.
  > 
  > I have no idea if the difference is significant or
not.
  > 
  > -- Allen
  > 
  > 
  > On Tue, 2006-03-14 at 07:25, Lance Lavandowska wrote:
  > 
  >>IIRC it is largely historical: at one point Roller
used Castor, which
  >>has as it's "default" a GUID
generator that creates a 30-char primary
  >>key.
  >>
  >>Lance
  >>
  >>On 3/14/06, David M Johnson <Davidm.Johnsonsun.com> wrote:
  >>
  >>>On Mar 11, 2006, at 4:45 PM, BigLiu wrote:
  >>>
  >>>>I want to create some customer tables to
extend roller. But I have
  >>>>question
  >>>>regarding why the primary key is defined
in hex instead of integer?
  >>>>Will
  >>>>this cause any performance problem?
  >>>
  >>>I can't really remember why I chose the hex
key. I don't think it has
  >>>a significant impact on performance.
  >>>
  >>>- Dave
[1]

about | contact  Other archives ( Real Estate discussion Medical topics )