One can dream of one yottabyte of memory: 2^80 bytes, or 1,208,925,819,614,629,174,706,176 bytes (strictly speaking that binary figure is a yobibyte; the SI yottabyte is 10^24 bytes). I am not sure what the good guys in Mountain View are juggling with, but it must be around 2 or 3 PB.
Once again I came up with a strange name: I call them HyperLongs. They work like a long (64-bit) but can hold twice as many bits (128). Well, duh! At the moment they support the four basic arithmetic operations: addition, subtraction, multiplication and division.
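The post does not show how HyperLongs are built, so here is only a minimal sketch of one common way to get a 128-bit integer out of 64-bit words: keep two 64-bit limbs and propagate carries and borrows by hand. The class name follows the post, but the unsigned, wrap-around-modulo-2^128 semantics, the 32-bit split used for multiplication and the shift-and-subtract division are my assumptions, not the author's implementation. Every limb is masked to 64 bits to imitate machine words.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def _mul64(a, b):
    """Full 64x64 -> 128-bit product built from 32-bit halves,
    the way one would do it in a language with only 64-bit words."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    p0, p1 = a_lo * b_lo, a_lo * b_hi
    p2, p3 = a_hi * b_lo, a_hi * b_hi
    mid = (p0 >> 32) + (p1 & MASK32) + (p2 & MASK32)
    lo = ((mid & MASK32) << 32) | (p0 & MASK32)
    hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32)
    return hi, lo


class HyperLong:
    """Hypothetical unsigned 128-bit integer kept as two 64-bit limbs.
    All arithmetic wraps modulo 2**128, like an unsigned machine word."""

    __slots__ = ("hi", "lo")

    def __init__(self, hi=0, lo=0):
        self.hi = hi & MASK64  # masking emulates 64-bit wrap-around
        self.lo = lo & MASK64

    @classmethod
    def from_int(cls, n):
        return cls(n >> 64, n)

    def to_int(self):
        return (self.hi << 64) | self.lo

    def __eq__(self, other):
        return (self.hi, self.lo) == (other.hi, other.lo)

    def __lt__(self, other):
        return (self.hi, self.lo) < (other.hi, other.lo)

    def __repr__(self):
        return f"HyperLong({self.to_int()})"

    def __add__(self, other):
        lo = self.lo + other.lo
        return HyperLong(self.hi + other.hi + (lo >> 64), lo)  # carry into hi

    def __sub__(self, other):
        lo = self.lo - other.lo
        borrow = 1 if lo < 0 else 0
        return HyperLong(self.hi - other.hi - borrow, lo)

    def __mul__(self, other):
        hi, lo = _mul64(self.lo, other.lo)
        # Cross terms land entirely in the high limb; hi*hi overflows past 2**128.
        hi += self.hi * other.lo + self.lo * other.hi
        return HyperLong(hi, lo)

    def _bit(self, i):
        return (self.hi >> (i - 64)) & 1 if i >= 64 else (self.lo >> i) & 1

    def __divmod__(self, other):
        """Binary long division (shift and subtract), one bit per step."""
        if other.hi == 0 and other.lo == 0:
            raise ZeroDivisionError("HyperLong division by zero")
        q, r = HyperLong(), HyperLong()
        for i in range(127, -1, -1):
            carry = r.hi >> 63  # bit that falls off when r is doubled
            r = HyperLong((r.hi << 1) | (r.lo >> 63), (r.lo << 1) | self._bit(i))
            q = HyperLong((q.hi << 1) | (q.lo >> 63), q.lo << 1)
            if carry or not r < other:
                r = r - other  # wraps correctly even when carry was set
                q.lo |= 1
        return q, r

    def __floordiv__(self, other):
        return divmod(self, other)[0]

    def __mod__(self, other):
        return divmod(self, other)[1]
```

For example, HyperLong.from_int(2**80) holds the 2^80 figure from the opening paragraph, which does not fit in a 64-bit long.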
The reason for this move was relatively simple: I needed to represent a given website, page, or sometimes even a text phrase as a “natural” number (64 bits was too short) and then quickly determine various bits and pieces from it. This works extremely well, since the semantic statistics show that almost 30% of the repository consists of near-duplicate documents or, worse, exact duplicates.
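The post does not say how a page becomes a number. One plausible sketch, assuming a 128-bit cryptographic digest is acceptable: normalize the text, hash it with MD5 (whose output is exactly 128 bits, the width of a HyperLong), and read the digest as an unsigned integer. The normalization rules and the helper names `page_number` and `exact_duplicates` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def page_number(text: str) -> int:
    """Map a page (or phrase) to a 128-bit natural number.
    Hypothetical: lowercasing, whitespace collapsing and MD5 are illustrative choices."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    digest = hashlib.md5(normalized.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return int.from_bytes(digest, "big")


def exact_duplicates(pages: dict) -> list:
    """Group page ids whose numbers collide; groups of size > 1 are duplicates."""
    groups = defaultdict(list)
    for page_id, text in pages.items():
        groups[page_number(text)].append(page_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Equal numbers flag exact duplicates (after normalization) in a single pass. Near duplicates, which differ by a few words, hash to different numbers and need a term-filtering scheme such as I-Match.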
The system can run the algorithms over the entire document collection relatively quickly and locate what I consider a “waste of core processing time and storage capacity”.
It’s a walk on a knife-edge, I know, and the approach takes O(d log d) time in the worst case, where all documents are identical (d being the number of documents). In normal processing it is O(d), which is acceptable in the long run considering core processing time.
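The post only names I-Match, so this is a simplified sketch of the published idea as I understand it, not the author's code: compute the idf of every term, keep only terms whose idf falls inside a band (dropping very common and very rare words), hash the sorted set of remaining terms with SHA-1, and treat documents that share a hash as near duplicates. The tokenizer, the function names and the bounds `idf_low`/`idf_high` are hypothetical. Grouping here uses a hash map, so this sketch does not reproduce the worst-case bound discussed above.

```python
import hashlib
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: unique lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def imatch_groups(docs, idf_low, idf_high):
    """Simplified I-Match-style near-duplicate grouping.
    docs: dict of doc_id -> text; idf_low/idf_high: hypothetical idf band."""
    token_sets = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(t for tokens in token_sets.values() for t in tokens)  # document frequency
    idf = {t: math.log(n / c) for t, c in df.items()}

    groups = defaultdict(list)
    for doc_id, tokens in token_sets.items():
        # Keep only terms inside the idf band, then hash the sorted unique terms.
        kept = sorted(t for t in tokens if idf_low <= idf[t] <= idf_high)
        if not kept:
            continue  # too little signal left to fingerprint this document
        key = hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()
        groups[key].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

With four short documents where "the" appears everywhere (idf 0) and a word such as "again" appears only once (idf ln 4), a band of 0.5 to 1.0 drops both, so two texts that differ only by "again" collapse to the same fingerprint.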
For more information about the I-Match algorithm and idf ranges, I can recommend the books found here.