The essence of the project is the search for meaning in this enormous cloud we call the Internet. I started the research back in 2005 and began to design, plan and build small POCs to clarify some of the critical parts, while slowly building up infrastructure that would be able to run this. Actually, I am still building the infrastructure by adding more machines to the system.
This process continued for more than a year, since this is a project that only evolves a couple of hours a day (mostly late evenings), and in mid 2006 the overall design was finally there. By that point I had read pretty much everything that was written about semantics and syntax, along with other books. After this the actual building began.
How I Divide by Zero
The system is a hybrid of grid computing and distributed computing. Some will most likely say that the two concepts are the same, but really they are not. I knew from the start that one central system wouldn’t be able to cope with the growth of the textual information floating in the cloud; therefore it has been designed so that it can operate from different geographical locations and still act like one system if needed.
The picture shows the 3 inner cores of the system. I have chosen this overall design because these cores provide flexibility and have the ability to morph in relation to other parts of the system.
The definition of flux
The name “Centiverse” is actually a small play on words (at the time it seemed like a brilliant idea) that originated when I saw my design actually working in practice. The two words were center and universe. Imagine the 3 circles as small universes where words, pointers and other structures are floating around. Where you have intersections, structures will bind to words and pointers. At this point the results you get back from the cores are good. When you have full intersections, structures will bind to structures and so on, and semantic networks are formed. All this can only happen in the center, and that’s how it was named. A side note to this: it’s only the working title. In theory the semantic core and the thesaurus core ought to bind faster, since the semantic core is basically a heavy extension to the thesaurus core in the way it adds properties like “is-a” that in turn can be used to bind to other structures.
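To make the relationship between the two cores a little more concrete, here is a minimal Python sketch of a thesaurus that is extended with “is-a” properties. This is not the Centiverse code; the class names `ThesaurusCore` and `SemanticCore`, the method names, and the idea of storing everything in in-memory dictionaries are all assumptions made for illustration only.

```python
from collections import defaultdict


class ThesaurusCore:
    """Hypothetical thesaurus: groups words that share a meaning."""

    def __init__(self):
        self._synonyms = defaultdict(set)

    def add_synonyms(self, *words):
        # Every word in the group becomes a synonym of every other word.
        for word in words:
            self._synonyms[word].update(w for w in words if w != word)

    def synonyms(self, word):
        return set(self._synonyms.get(word, set()))


class SemanticCore(ThesaurusCore):
    """Hypothetical extension that adds "is-a" properties on top of synonyms."""

    def __init__(self):
        super().__init__()
        self._is_a = defaultdict(set)

    def add_is_a(self, child, parent):
        self._is_a[child].add(parent)

    def ancestors(self, word):
        # Follow "is-a" edges transitively; `seen` guards against cycles.
        seen = set()
        stack = [word]
        while stack:
            current = stack.pop()
            for parent in self._is_a.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def bindings(self, word):
        # A word binds to its own ancestors and to those of its synonyms.
        related = {word} | self.synonyms(word)
        bound = set()
        for term in related:
            bound |= self.ancestors(term)
        return bound - related
```

In this sketch, a word such as “car” with the synonym “automobile” would pick up every structure that “automobile” is bound to through “is-a”, which is one way to read the idea that the semantic core reuses what the thesaurus core already knows.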
Power of three
The Boolean Keyword Core can act like a typical search engine, depending on the query string: you provide it with one or more keywords and in return it gives you a ranked list of pointers to results. I believe this is something that will stay for a long time, because this is what the good people at Mountain View have taught the world.
The Thesaurus Core will, when activated, act like a query expander. Again you provide it with one or more keywords, and if words are found that share the same meaning, they might be added to the query. Most of this core should gradually morph into the Semantic Core.
The Semantic Core deals with concepts and is the birthplace of the structures. A given query can be split into words, and concept schemes are formed around them. In theory this would work fine, but not everything fits into a semantic network: there are things that should be avoided, and there are “meanings” that can’t be schematized.
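As a rough illustration of how the first two cores could work together, here is a small Python sketch of a keyword search with an optional thesaurus expansion step. Everything in it is an assumption made for the example: the function names `build_index`, `expand_query` and `keyword_search`, the plain dictionary used as a thesaurus, and the ranking by simple match count. The real cores may rank and expand in a completely different way.

```python
from collections import Counter, defaultdict


def build_index(documents):
    """Map each word to a Counter of {pointer: occurrences}."""
    index = defaultdict(Counter)
    for pointer, text in documents.items():
        for word in text.lower().split():
            index[word][pointer] += 1
    return index


def expand_query(keywords, thesaurus):
    """Thesaurus step: group each keyword with the words that share its meaning."""
    return [{word} | set(thesaurus.get(word, ())) for word in keywords]


def keyword_search(index, groups):
    """Boolean step: AND across groups, OR within a group, ranked by match count."""
    scores = Counter()
    matching = None
    for group in groups:
        hits = Counter()
        for word in group:
            hits.update(index.get(word, Counter()))
        pointers = set(hits)
        matching = pointers if matching is None else matching & pointers
        scores.update(hits)
    if not matching:
        return []
    # Highest score first; ties broken by pointer name for stable output.
    return sorted(matching, key=lambda p: (-scores[p], p))
```

Calling `keyword_search(index, expand_query(["car"], {}))` behaves like the plain keyword core, while passing a thesaurus such as `{"car": ["automobile"]}` lets documents that only mention “automobile” show up in the ranked pointers as well.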