FirstBatch | Company
July 28, 2023
A Game Changer for Privacy-Preserving & Decentralized LLMCache, ID Management, and Session Storage

The Need

Many modern applications must handle large amounts of unstructured or semi-structured data that doesn't fit neatly into rows and columns. Key-value databases shine for these use cases because they can manage flexible data models and scale massively while maintaining high performance. Their simple architecture removes the need for complex SQL queries, schema migrations, and table joins. The intrinsic speed and flexibility of key-value databases make them suitable for use in a diverse range of applications such as shopping carts, LLMCaches, and identity management applications.

In traditional settings, corporations retain ownership of the data. Even though key-value databases are categorized as "non-relational", these businesses still link user data to the corresponding keys and values. Meanwhile, the past decade has seen a significant shift towards trustless systems and infrastructures.

We are currently in an era where artificial intelligence increasingly permeates our daily lives. Most of these AI products use LLM caching to deliver near real-time responses: they store the results of costly computations, such as predictions or inferences, for reuse in subsequent requests. As more businesses and users rely on LLMs for various purposes, including personal assistants, the models process a growing amount of sensitive data and Personally Identifiable Information (PII). When these outputs are cached in a centralized system, they become potential targets for malicious activity, which raises legitimate concerns about data privacy and security. We therefore need a system in which trust rests on cryptography and mathematical soundness rather than on liabilities and reprimands. The advent of blockchain has shown that the necessary tools to achieve this goal exist, although they have yet to be fully tested.
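
To make the caching idea concrete, here is a minimal sketch of an LLM cache built on a plain key-value store. It is only an illustration: `LLMCache`, `slow_model`, and the prompt normalization are hypothetical names and choices, not part of any product described here.

```python
import hashlib
from typing import Callable


class LLMCache:
    """Toy key-value cache for model outputs, keyed by a hash of the prompt."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for the expensive call once, then store the result.
        self.misses += 1
        answer = self._model(prompt)
        self._store[key] = answer
        return answer


def slow_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive LLM call."""
    return f"answer to: {prompt}"


cache = LLMCache(slow_model)
first = cache.complete("What is a key-value database?")
second = cache.complete("what is a   key-value database?")  # served from the cache
```

Note that the values held in `_store` are exactly the model outputs that may contain sensitive data, which is why it matters where such a cache lives and who can read it.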

Zero-Knowledge Cryptography

Blockchains (and distributed systems in general) allow multiple machines to agree on a state, even when some of the participants, up to a certain threshold, are malicious. But this is not enough to ensure user-level privacy.

Zero-knowledge cryptography is one of the major components in achieving privacy. In particular, a zk-SNARK (zero-knowledge succinct non-interactive argument of knowledge) allows a user (referred to as the prover) to prove that a statement is true without revealing any information beyond the fact that the statement is true. The proof is checked by another party (referred to as the verifier). It is here that “succinct” and “non-interactive” take the stage: the proof is small and cheap to verify, and verification requires no back-and-forth between the prover and the verifier. This makes it practical to put the verifier on systems with limited resources, like blockchain smart contracts. Anyone can call the smart contract to verify proofs using the existing blockchain network, which enables private transactions on public blockchains.

image

Privacy-Preservation

Zero-knowledge (ZK) cryptography paved the way for a new generation of technical solutions. As the theory progressed and its practical problems were solved, the task for developers was reduced to simply implementing the “circuit” as shown in the diagram above. Many products soon followed:

  • Tornado Cash is a notable example, where users can hide their transactions even though all transactions are public
  • Dark Forest is a space-warfare game, where player locations in the universe are kept secret
  • Using really simple circuits, a user can prove that they are above a certain age without revealing their actual age
  • A chat group could be built where each user simply proves that they know the secret that only the group members can know, without revealing the secret itself

Let us now step back a bit and return to database solutions. Where does one even place “privacy” in a centralized database? A relational database by definition requires data to be related to other data in the same storage, so the business provider must learn at least some information to make that relation: a privacy leakage, so to speak. A non-relational database thankfully relaxes the constraints a bit, but still, how does one provide provable privacy to the users? How can a database, tasked with storing user data, not know who a user is and let them control their own data?

HollowDB: Fast, Permanent & Efficient

Here, HollowDB came into being as a solution to this problem: a method of privacy preservation within a key-value database. HollowDB is designed from the ground up to give individuals true data ownership, moving beyond centralized models. It allows storing complex objects as key-value pairs on the blockchain while preserving privacy: only data owners control access. This approach makes it possible to build efficient applications with familiar web2-like experiences, with put operations taking up to 8 ms and get operations up to 12 ms, which, to our knowledge, also makes it the fastest database of its kind.

It accomplishes this through a combination of zero-knowledge proofs and on-chain smart contract verification. Users compute the key on the client side from a secret, which prevents any external party from inferring or accessing the data. To update the value written at some key, a user must provide a zero-knowledge proof that they know the secret behind that key, which shows that they were the one who wrote the value there in the first place. No one, not even our servers or the smart contract, learns who that user is. Smart contracts then validate these proofs on the blockchain for complete reliability and immutability. With validation happening on-chain, HollowDB provides the highest level of trust and integrity, even for highly sensitive data. This makes it well suited for decentralized AI/ML applications requiring strong privacy assurances. In addition, the architecture scales to handle heavy loads while delivering fast performance.

By bringing together lightning-fast speeds, intuitive development, user privacy, permanent on-chain storage, and trustless integrity, HollowDB provides the missing backbone for the decentralized web of the future.

Permanent Storage at the Lowest Cost

We have mentioned smart contracts, which imply the use of blockchains, and these systems are notorious for their high transaction fees, especially when large amounts of data must be stored. How does HollowDB act like a smart-contract key-value database without suffering from these drawbacks?

The answer is Arweave, the first truly permanent information storage network, backed by a sustainable endowment. Arweave enables on-chain and permanent data storage at any scale for a one-time fee. Since it is decentralized, it is not at risk of being compromised, hacked, or accidentally deleted.

Using the permaweb as our storage layer gives us lasting data persistence: as long as the blockchain continues to operate, your data will be there. Such a claim would require huge costs and assumptions in a Web2 service. Arweave, however, charges a one-time fee for storage, which is the first reason behind the low cost of storage with HollowDB.

Size Agnostic

For large values, we have a special trick: Bundlr Network. Bundlr is a data network optimized for performance and scalability, allowing users to sign and pay with any token. It works as a layer on top of Arweave and provides a gateway that speeds up write and read operations. Data uploaded through Bundlr is accessible immediately, and uploading through it is faster than uploading directly to Arweave. So, instead of storing the entire large value in the database, we upload it to Arweave using Bundlr Network and store the corresponding transaction id as the value in the database itself. This reduces costs even further and removes any bottleneck that may occur due to data size.

In addition to providing low-cost permanent storage efficiently, HollowDB delivers high availability, scalability, and low latency by storing and retrieving data across multiple nodes in a cluster. Data replication and fault tolerance further guarantee the integrity and availability of data. These capabilities make HollowDB ideally suited for building high-performance caching applications like LLMCaches that require speed, reliability, and scale. The distributed architecture ensures smooth operation even under heavy load, while built-in redundancy protects against potential points of failure. Furthermore, HollowDB secures user privacy through zero-knowledge technology, ensuring each user can only access their own LLM cache. This privacy protection, combined with the performance benefits of a distributed cache, makes HollowDB the ideal back-end for personalized real-time AI services. HollowDB not only keeps data persistently over the long term but makes that data rapidly accessible when needed to power real-time services, all while preserving user privacy.

Essentially, HollowDB packs a smart contract that allows key-value database functionality along with zero-knowledge proof verifications. It comes with an NPM package to allow developers to easily use the database too. The entire project is open-source, and can be found at the following link: https://github.com/firstbatchxyz/hollowdb

HollowDB-as-a-Service

We love open source, but we also want to provide a seamless experience for developers who just want to “use” the code without worrying about setting up the correct environment or fighting with bundling and packaging. Our goal is to make HollowDB as easy to use as possible. We therefore handle most of the infrastructure so you can focus on your data. Our service acts as a performant bridge between clients and the blockchain, optimized for scalability.

So, besides everything that HollowDB offers, we also enable you to build your own decentralized and privacy-preserving database in just a few clicks via the HollowDB Developer Dashboard.

We simulate transactions ahead of submission to achieve web2-like speeds while leveraging web3 permanence and trust. You maintain full control: you can opt out at any time and retain sole ownership of your data.

More Clients Are On The Way

Furthermore, we are going to provide clients in many languages, not just JavaScript but also Go, Rust, and Python, so that developers can integrate HollowDB into their stack with ease. For the client, the only complexity in using HollowDB will be managing an API key. This is especially important considering zero-knowledge proof integrations: these niche cryptographic operations are not yet common among stacks and require specialized care for each technology. By doing the heavy lifting of zero-knowledge proof generation ourselves and providing it to the client behind an abstraction, we achieve seamless usage of HollowDB from the perspective of any client.

Sit Back & Stay Up to Date

There is one more advantage of using our service: staying up to date. Developers are welcome to set up their own environment and use our open-source code. However, the blockchain space and the Arweave ecosystem keep growing at an ever-increasing rate, and this growth often comes with technical updates that require breaking changes in the internals of our smart contracts. Note that these “breaking changes” do not break the data, but rather change the way it is accessed and updated. For our services, updates to the open-source code will ship together with the necessary client-side patches, taking that maintenance burden away from our users in such a fast-moving space.

What to Build?

So far, we have explained how HollowDB's innovative architecture makes it an ideal backbone for a wide variety of cutting-edge applications. To give a better idea of the kinds of applications where HollowDB can provide significant improvements, here are a few examples:

  • For decentralized AI and machine learning systems, HollowDB provides the critical security and reliability needed for proper model functioning. By requiring on-chain validation for any data changes, HollowDB ensures the integrity of the data that AI models depend on. This prevents data corruption or tampering by malicious actors - protecting model accuracy. Additionally, HollowDB grants developers seamless access to clean, structured data for training and evaluating models as well as collaborating efficiently.
  • As a foundation for low-latency caching solutions such as LLMCache, HollowDB harnesses a distributed cluster to deliver blazing-fast data storage and retrieval. By replicating data across nodes, HollowDB ensures high availability even in cases of hardware failure. The system scales massively to handle heavy loads without slowing down. And built-in fault tolerance mechanisms guarantee the consistency of cached data. Together, these capabilities enable HollowDB to power ultra-responsive caching that users expect from today's real-time AI applications.
  • For decentralized identity management, HollowDB safeguards personal user data through zero-knowledge proofs. Users remain completely anonymous while still being able to prove they own their identities. This prevents unauthorized access, theft, and abuse. HollowDB also facilitates rapid and simple data storage and retrieval to give users full control over their profiles.
  • To store session data and states, HollowDB offers quick and reliable data access across devices. Users can pick up where they left off as they switch devices, without losing context. HollowDB retrieves their session instantly, rather than forcing cumbersome re-logins. This creates seamless uninterrupted experiences.
  • And finally, for on-chain metadata storage, HollowDB enables platforms to immutably store index data such as usernames, posts, and comments without bloating the blockchain or requiring extensive computation. Zero-knowledge proofs allow efficient verification of correctness, ensuring the security and integrity of metadata.

This range of disruptive use cases highlights HollowDB's versatility and performance. By blending blockchain, distributed systems, and cryptography, it opens the door to the next generation of decentralized applications.

Custom Circuits

HollowDB makes use of a particular circuit as described above, where a user provides a zero-knowledge proof that they own a key. However, there is no reason to limit HollowDB to using just a single circuit. Verifying a zero-knowledge proof requires only one piece of circuit-specific data: a tiny object called the verification key. It is certainly possible to store multiple verification keys in a single contract, allowing verification of different types of proofs, and even to update a verification key later to keep up with newer circuits.

HollowDB is designed to allow such extensions over its base functionality. Consider an example: a Sudoku scoreboard. Instead of a key ownership proof as mentioned before, users can provide a Sudoku solution proof (one that proves the user has a solution but does not reveal what it is), and they can add themselves to the scoreboard once the proof is verified. This way, without relying on any central service, they can put their names on the scoreboard by proving that they have solved the puzzle!
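
To see what such a circuit would have to enforce, here is the statement written as an ordinary function. This is only a sketch of the relation being proven, not a circuit and not a zero-knowledge proof: in a real setup the puzzle would be the public input, the solution would stay private with the user, and the function name `is_valid_solution` is hypothetical.

```python
DIGITS = set("123456789")


def is_valid_solution(puzzle: list[str], solution: list[str]) -> bool:
    """Check that `solution` is a complete, valid grid that matches the puzzle's clues."""
    if len(puzzle) != 9 or len(solution) != 9:
        return False
    if any(len(row) != 9 for row in puzzle + solution):
        return False

    # Every given clue ("." marks an empty cell) must be kept in the solution.
    for puzzle_row, solution_row in zip(puzzle, solution):
        for clue, cell in zip(puzzle_row, solution_row):
            if clue != "." and clue != cell:
                return False

    rows = [set(row) for row in solution]
    cols = [{solution[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {solution[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each row, column, and 3x3 box must contain every digit exactly once.
    return all(unit == DIGITS for unit in rows + cols + boxes)


puzzle = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
solution = [
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
]
solved = is_valid_solution(puzzle, solution)
```

A proof for this statement convinces the contract that the function would return true for some solution the user holds, without the solution itself ever being submitted.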

Multiple circuits can operate together within HollowDB, with or without interacting with each other. We believe this will be an integral part of HollowDB's future use cases, especially as zero-knowledge circuits become more widely available. For example, a zero-knowledge proof that you own the private key of some wallet address could be added to the authorization logic, as an additional circuit on top of the key-preimage knowledge circuit. In another case, different hash functions could be used for different preimage knowledge proofs, each requiring its own circuit.

Whitelisting

HollowDB allows operations to be whitelisted so that only certain addresses are allowed to use them. Both whitelisting and zero-knowledge proof authorization mechanisms can be enabled together or separately. For example, a user may want to use whitelists only, and not bother with zero-knowledge proof generation and verification.

A typical example of whitelisting is as follows: a backend server is used as middleware between a client and the HollowDB contracts. An Arweave wallet (or Ethereum wallet) is kept in the backend, and only this wallet is whitelisted to do put and update operations. This way, we take the burden of wallet management away from the user and also pay the transaction fees for them (which are often negligible).

The control over whitelisting is fine-grained: for example, one can disable the whitelist requirement for put operations while keeping it required for update operations. HollowDB is amenable to the addition of new whitelists for new operations too. A user could define their own custom contract operations and introduce specific whitelists for them, just like using the new circuits mentioned above!
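
A minimal sketch of such per-operation whitelists might look like the following. It is a toy model rather than the HollowDB contract: `ToyContract`, its field names, and the rule that put only writes to unused keys are assumptions made for illustration, and zero-knowledge proof checks are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContract:
    """Toy contract state with one whitelist and one on/off switch per operation."""

    whitelists: dict[str, set[str]] = field(
        default_factory=lambda: {"put": set(), "update": set()}
    )
    whitelist_required: dict[str, bool] = field(
        default_factory=lambda: {"put": False, "update": True}
    )
    data: dict[str, str] = field(default_factory=dict)

    def _authorize(self, operation: str, caller: str) -> None:
        # Only enforce the whitelist for operations that have it switched on.
        if not self.whitelist_required.get(operation, False):
            return
        if caller not in self.whitelists.get(operation, set()):
            raise PermissionError(f"{caller} is not whitelisted for {operation}")

    def put(self, caller: str, key: str, value: str) -> None:
        self._authorize("put", caller)
        if key in self.data:
            raise KeyError("key already exists")
        self.data[key] = value

    def update(self, caller: str, key: str, value: str) -> None:
        self._authorize("update", caller)
        if key not in self.data:
            raise KeyError("key does not exist")
        self.data[key] = value


contract = ToyContract()
contract.whitelists["update"].add("backend-wallet")

contract.put("anyone", "profile", "v1")             # put is open to everyone
contract.update("backend-wallet", "profile", "v2")  # update needs the whitelist
```

Adding a whitelist for a custom operation would then amount to adding one more entry to each of the two dictionaries and calling `_authorize` from the new operation.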

Get Started

You can find the open-source HollowDB library at https://github.com/firstbatchxyz/hollowdb, or start building with ease via HollowDB-as-a-Service by requesting early access to become one of the first builders: www.hollowdb.xyz

© 2023 FIRSTBATCH. ALL RIGHTS RESERVED.