Alviso (Crankk.io)
4 min read · Nov 21, 2021

The next Google could be built on Kadena

Blockchain-based applications have several unique features, the most important of which is being distributed. Imagine if the next Google were not a company but a community. You might not believe it's possible, but I consider it very much within the bounds of possibility. Let's examine, step by step, what it would mean, and what it would take, to build a community-based new Google.

Building a well-rounded web search engine takes a lot of processing power and knowledge, but it looks to me like a challenge perfectly suited to a blockchain-based effort and community. Incentivizing the community is perhaps the easiest part, given that rewarding every effort that produces a desired outcome is intrinsic to blockchains.

First, you need to crawl the entire web. That looks easy enough: discoverers add new URLs to the blockchain, and crawlers retrieve and store the pages behind them. But here comes a challenge. Is there a blockchain that can store at least several hundred million URLs at zero or near-zero cost? There is, and it is called Kadena. At this point you could say: hey, wait a minute, storing and retrieving hundreds of millions of records is not a trivial task even for an RDBMS-backed blockchain, and you would be right. My understanding is that Kadena is transparent about the exact type of database used, so if storing, selecting, and retrieving records in a certain fashion is not practical for one type of database, the problem could be solved by swapping the current database for one that is capable of it. One idea would be to designate a new chain, beyond the existing ones, for this special need. Node operators could decide whether or not to support it, and everything else could go on happily.
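To make the discoverer/crawler flow concrete, here is a minimal Python sketch of the URL ledger described above, modeled as an in-memory table. The record fields, statuses, and class names are my own assumptions for illustration; on Kadena this would live in a Pact module's table rather than a Python dict.

```python
from dataclasses import dataclass


@dataclass
class UrlRecord:
    url: str
    status: str = "pending"  # pending -> claimed (-> crawled)
    crawler: str = ""        # node that claimed the URL


class UrlLedger:
    """In-memory stand-in for the hypothetical on-chain URL table."""

    def __init__(self):
        self.records: dict[str, UrlRecord] = {}

    def discover(self, url: str) -> bool:
        # Discoverers may only add URLs not already on the ledger.
        if url in self.records:
            return False
        self.records[url] = UrlRecord(url)
        return True

    def claim(self, url: str, crawler: str) -> bool:
        # A crawler claims a pending URL; double-claims are rejected.
        rec = self.records.get(url)
        if rec is None or rec.status != "pending":
            return False
        rec.status = "claimed"
        rec.crawler = crawler
        return True
```

The key property the sketch captures is that both discovery and claiming are first-come, first-served, which is what makes per-action incentives straightforward.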

The next challenge is to verify that a downloaded webpage is the true one. Multiple operators would have to re-download the page and then come to a consensus that yes, it is the correct one. You can imagine that a page downloaded from different parts of the world at slightly different times won't necessarily be identical. This can be solved by sanitizing the page and storing its "content" only. Not trivial, but solvable: implementing a diff/nearness algorithm in Pact does not seem impossible. It also means that storing the (sanitized) webpage content on the blockchain would be necessary, which brings us back to the previous point of choosing the right database. Arbitrary queries on this database would not be needed, so it's more about storage capacity than anything else.
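The sanitize-then-compare idea can be sketched in a few lines of Python using only the standard library. This is one possible interpretation, not the author's exact algorithm: strip tags and scripts, collapse whitespace, then score similarity with a sequence matcher, with witnesses accepting a page above some agreed threshold.

```python
import difflib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def sanitize(html: str) -> str:
    """Reduce a page to its text content, normalizing whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def nearness(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0]; a witness accepts above a chosen threshold."""
    return difflib.SequenceMatcher(None, a, b).ratio()
```

Two crawls of the same article then agree even if surrounding markup, ads, or timestamps differ slightly, because only the sanitized text is compared.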

Participants in crawling and retrieving could be incentivized such that whoever retrieves the page first gets the most, and witnesses (those who retrieve the page after the initial retrieval and find it to be the right one) also get some amount for their effort.

Then comes the part about generating an index. That can be solved by putting a full-text search engine behind the solution, and the most obvious choice is Elasticsearch. I would propose having Elasticsearch behind the blockchain because it can do all the "normal" things needed plus full-text search. It can also be configured not to store the actual content, only the index. The index is still huge, although this approach can reduce storage needs by a significant percentage.
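For the "index only, no content" configuration, Elasticsearch does offer a real switch: disabling the `_source` field in the index mappings, which drops the stored copy of each document while keeping the inverted index searchable. A minimal mapping along those lines (field names here are illustrative, not from the article) might look like:

```python
# Illustrative Elasticsearch index mapping: keep the inverted index,
# discard the raw document body (_source) to save storage.
page_index = {
    "mappings": {
        "_source": {"enabled": False},  # do not store original page content
        "properties": {
            "url": {"type": "keyword"},   # exact-match field for the page URL
            "title": {"type": "text"},    # analyzed for full-text search
            "body": {"type": "text"},     # indexed but not retrievable
        },
    }
}
```

The trade-off is that documents indexed this way cannot be re-read or re-indexed from Elasticsearch itself, which fits this design, since the sanitized content lives on the blockchain anyway.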

One might ask at this point why I am proposing to store all of this on the blockchain, and why I seem to be maximizing what the blockchain itself needs to store and retrieve. Well, it's the kind of thinking a truly distributed network requires; it needs to come from principle. Follow the old KGB thinking of "trust, but verify," and in that regard, follow Satoshi. Here, the blockchain plays the role of the KGB archive: there are pacts as to who can do what, and how each action is validated and verified. Verification needs to come from Pacts (smart contracts) resolving, not from validators collectively agreeing.

We finally come to the step of actually serving search results. One thing Elasticsearch cannot do by itself is provide search results equivalent in quality to Google's, so there need to be search result providers, who again need to be validated somehow. Here we can follow a two-pronged approach: search result providers can have their own proprietary ranking algorithms, and each result can be verified as a true match against the index. Popularity and quality will drive users to the best providers, but at the same time, their results can be checked against the index and only rewarded if they are a real match.
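The "verify a result is a real match" check can be sketched with a toy inverted index: a provider's result for a query is only rewarded if every query term actually occurs in the claimed document. This is my simplified reading of the verification step; real term matching would go through the search engine's analyzers, not naive whitespace splitting.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict:
    """Toy inverted index mapping each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def is_true_match(index: dict, query: str, doc_id: str) -> bool:
    """A provider's claimed result passes only if all query terms hit the doc."""
    return all(doc_id in index.get(term, set()) for term in query.lower().split())
```

Ranking quality stays the providers' competitive edge; the network only needs this cheap, deterministic check to decide whether a result deserves its incentive.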

I’ve validated some of my assumptions by creating Fringehits.com. It covers only a minimal portion of the internet, with about 110M unique URLs and about 30M pages indexed. There is an open-source project that has billions of URLs, and that was my starting point. I built a crawler and a sanitizer and fed everything into an Elasticsearch index.

It almost goes without saying, but the source of the incentives, i.e. who pays for it all, can be advertisers. It's also easy to create your own coin on Kadena, so minting, buying, and burning a coin is one of the easiest parts. A "search" coin could serve the ecosystem.

I know this proposal is no small project, but hopefully something like this will emerge. My intention here was to encourage the creation of such an effort.