128-bit numbers

One can dream of one yottabyte of memory, which in binary terms is 1,208,925,819,614,629,174,706,176 bytes, or 2^80. I am not sure what the good guys at Mountain View are juggling with, but it must be around 2 or 3 PB.

I once again came up with a strange name: I call them HyperLongs. They operate like a long (64 bits) but hold twice as many bits (well, duh!). At the moment I can do the four basic arithmetic operations: addition, subtraction, multiplication and division.
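To make the idea concrete, here is a minimal sketch of how a 128-bit value could be built from two 64-bit halves. The class name `HyperLong` is borrowed from the text, but the layout (`hi`, `lo`), the unsigned wrap-around behaviour and every method below are assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # all ones in the low 64 bits


@dataclass(frozen=True)
class HyperLong:
    """Hypothetical unsigned 128-bit integer stored as two 64-bit halves."""
    hi: int  # upper 64 bits
    lo: int  # lower 64 bits

    @staticmethod
    def from_int(value: int) -> "HyperLong":
        value &= (1 << 128) - 1  # wrap around like a fixed-width integer
        return HyperLong(value >> 64, value & MASK64)

    def to_int(self) -> int:
        return (self.hi << 64) | self.lo

    def __add__(self, other: "HyperLong") -> "HyperLong":
        lo = self.lo + other.lo
        carry = lo >> 64  # 1 if the low halves overflowed
        return HyperLong((self.hi + other.hi + carry) & MASK64, lo & MASK64)

    def __sub__(self, other: "HyperLong") -> "HyperLong":
        lo = self.lo - other.lo
        borrow = 1 if lo < 0 else 0  # borrow from the high half
        return HyperLong((self.hi - other.hi - borrow) & MASK64, lo & MASK64)

    def __mul__(self, other: "HyperLong") -> "HyperLong":
        # The product of the low halves fills all 128 bits; the cross
        # terms only affect the high half, and hi * hi overflows entirely.
        low_product = self.lo * other.lo
        cross = (self.lo * other.hi + self.hi * other.lo) & MASK64
        hi = ((low_product >> 64) + cross) & MASK64
        return HyperLong(hi, low_product & MASK64)

    def __floordiv__(self, other: "HyperLong") -> "HyperLong":
        divisor = other.to_int()
        if divisor == 0:
            raise ZeroDivisionError("HyperLong division by zero")
        # Division leans on Python's big integers to keep the sketch short.
        return HyperLong.from_int(self.to_int() // divisor)
```

Python integers are already arbitrary precision, so the split into halves is only there to show the carry and borrow handling a fixed-width language would need.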

The reason for this move was relatively simple: I considered it necessary to represent a given website or page, and sometimes even a text phrase, as a “natural” number (64 bits was too short) and to quickly determine various bits and pieces based on it. This works extremely well, since the semantic statistics show that almost 30% of the repository consists of near-duplicate documents or, worse, exact duplicates.
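One way to turn a page into a single 128-bit number is to hash its normalized text. The sketch below assumes MD5 (which happens to produce exactly 128 bits) and a simple whitespace and case normalization; both choices, and the function name `fingerprint128`, are illustrative assumptions, since the text does not say how the number is derived.

```python
import hashlib
import re


def fingerprint128(text: str) -> int:
    """Map a document to an integer in [0, 2**128) (hypothetical helper)."""
    # Collapse whitespace and lowercase so trivial layout changes
    # do not produce a different number.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest, "big")


page_a = "Hello   World"
page_b = " hello world\n"
print(fingerprint128(page_a) == fingerprint128(page_b))  # exact duplicate after normalization
```

MD5 is used here only as a compact fingerprint, not for anything security related. Two pages with the same fingerprint can be treated as exact duplicates with very high probability.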

The system can run the algorithms over the entire document collection relatively quickly and locate what I consider a “waste of core processing time and storage capacity”. It’s a walk on a knife-edge, I know that, and the approach takes O(d log d) time in the worst case, where all documents would be equal (d being the number of documents). At processing time it’s O(d), which is okay in the long run considering core processing time.
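A rough sketch of an I-Match-style pass over a collection might look like the following. It assumes whitespace tokenization, idf computed as log(n / df) over the collection itself, MD5 as the 128-bit hash, and a caller-supplied idf range; the function name `imatch_groups` and all parameters are hypothetical and not taken from the system described here.

```python
import hashlib
import math
from collections import Counter, defaultdict


def imatch_groups(docs, idf_low, idf_high):
    """Return lists of document ids whose idf-filtered term sets collide."""
    tokenized = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for terms in tokenized.values() for term in terms)
    n = len(docs)
    idf = {term: math.log(n / count) for term, count in df.items()}

    groups = defaultdict(list)
    for doc_id, terms in tokenized.items():
        # Keep only terms inside the chosen idf range, in a canonical order.
        kept = sorted(t for t in terms if idf_low <= idf[t] <= idf_high)
        if not kept:
            continue  # nothing left to fingerprint; skip instead of colliding
        digest = hashlib.md5(" ".join(kept).encode("utf-8")).digest()
        groups[int.from_bytes(digest, "big")].append(doc_id)

    # One dictionary insert per document; only colliding ids are reported.
    return [ids for ids in groups.values() if len(ids) > 1]


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "The quick brown fox jumps over a lazy dog",
    "c": "an entirely different page about 128 bit numbers",
    "d": "the weather report for tomorrow",
}
print(imatch_groups(docs, idf_low=0.5, idf_high=1.0))  # [['a', 'b']]
```

Grouping through a dictionary costs one lookup per document, which matches the O(d) figure for the common case. Sorting the fingerprints instead would give the O(d log d) bound.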

For more information about the I-Match algorithm and idf ranges, I can recommend the books found here

May 8, 2008 01:10 by Claus