FirstBatch | Company
July 27, 2023

User Embeddings: The missing piece of the LLM stack

Advance Your AI Stack

I. Introduction

Large Language Models (LLMs) have surfaced as a new, potent toolset for constructing software.

As with most emerging technologies, these new models present distinct advantages and challenges.

The core of the LLM application stack comprises a set of systems, tools, and design patterns that form an effective architecture for deploying LLMs. While the exact stack may vary depending on an application's specific needs, some elements have proved effective across different use cases.

You can explore articles from a16z and Sequoia on the evolving LLM application stack:

Emerging Architectures for LLM Applications


The New Language Model Stack



The architecture of the web is undergoing significant changes.

Currently, the internet operates on an inverted client/server model where each user represents a node on the network, and to accomplish any objective, the user must interact with the server.

However, we envision a future in which users function as personal servers, with human beings becoming API endpoints. Users will gain complete control over their own data and information.

The road ahead leads to customized autonomous agents for every activity we undertake.


To fully realize the transition to autonomous agents and offer users 'plug-and-play' experiences where they feel at ease, we must enhance the LLM stack with robust privacy measures, personalized features, and autonomous personal authority.

In this article, we will discuss the areas in the stack needing cryptographic privacy, interoperable personalization, and self-sovereignty.

Additionally, we will provide solutions to the challenges highlighted by a16z and Sequoia.

II. We are adopting a Privacy-Preserving and Democratic approach for the following reasons

82% of the world’s data is unstructured, and LLMs transform unstructured data into vectors. In the new age of AI, the race will be to build intimacy on top of the structured data generated by LLMs.

At FirstBatch, our work is centered around the development of intimate AI products, fostering deeper connections with users through highly personalized interactions with AI.

We offer user embeddings, compatible with all vector databases, enabling instantly personalized experiences while ensuring personal and privacy-preserving memory for AI.

Interacting with AI models can raise privacy concerns regarding user data due to data collection, storage, and processing.

It is essential that third parties provide cryptographic techniques, such as zero-knowledge proofs, as a means of verifying user data without accessing it.

FirstBatch addresses privacy and security concerns through permanent, decentralized storage and Zero Knowledge key-value access for identity management. Decentralized, permanent storage of vector embeddings allows for a democratic distribution and access mechanism for the data that AI needs to learn. This ensures that data is never stored in a single location, making it more secure and tamper-proof. Additionally, data owners retain full control of their data, preventing unauthorized access or misuse.

Data remains accessible even as it is updated and evolves over time. Compared with centralized storage, decentralized storage offers the advantages of scalability and availability. Data is distributed across multiple nodes, which makes it more resilient and fault tolerant. This ensures continuity of data access even if some nodes go offline.

Additionally, data security is strengthened by storing it on the blockchain using Zero-Knowledge proofs (ZKP), making it more difficult for malevolent parties to access or manipulate it.

III. Understanding Vector Embeddings

Embeddings allow LLMs to have memory.

In practically every machine learning task, the first step is to embed, or convert, various forms of data into numerical representations.

Vector embeddings are a way of representing any object (words, images, people, documents, products, etc.) as a list of numbers, in other words, multidimensional vectors. The main idea is for similar objects to have a shorter distance between their vector representations compared to less similar ones. For example, the vector for “dolphin” would be much closer to “shark” than to “theatre”.
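To make the idea of "closer" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are hand-made assumptions chosen for illustration, not the output of any real embedding model, and the function name `cosine_similarity` is our own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption). Axes loosely mean:
# [lives in water, is an animal, is a building]
embeddings = {
    "dolphin": [0.9, 0.8, 0.0],
    "shark":   [0.95, 0.7, 0.0],
    "theatre": [0.0, 0.1, 0.9],
}

dolphin_shark = cosine_similarity(embeddings["dolphin"], embeddings["shark"])
dolphin_theatre = cosine_similarity(embeddings["dolphin"], embeddings["theatre"])

# With these toy values, "dolphin" is far more similar to "shark" than to "theatre".
print(dolphin_shark > dolphin_theatre)
```

Real embeddings typically have hundreds or thousands of dimensions, but the comparison works the same way.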

One of the useful properties of vector embeddings is that they can capture semantic relationships and allow us to find words that are used in similar contexts or meanings. For example, if we subtract the vector for “woman” from the vector for “actress” and add the “man” vector to it, the resulting vector will be almost the same as the “actor” vector. This makes vectors comparable and allows the creation of efficient models for search and recommendation algorithms.
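The analogy arithmetic can be sketched in a few lines. The two-dimensional vectors below are assumptions constructed so that one axis encodes gender and the other encodes the acting profession; real word embeddings only approximate this structure. The helper `analogy` is a hypothetical name.

```python
import math

# Toy vectors (assumption): [gender axis, acting-profession axis]
vectors = {
    "man":     [1.0, 0.0],
    "woman":   [-1.0, 0.0],
    "actor":   [1.0, 1.0],
    "actress": [-1.0, 1.0],
    "theatre": [0.0, 0.6],
}

def analogy(a, b, c, vocab):
    """Return the word whose vector is nearest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates.
    """
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [word for word in vocab if word not in (a, b, c)]
    return min(candidates, key=lambda word: math.dist(target, vocab[word]))

# "actress" - "woman" + "man"
result = analogy("actress", "woman", "man", vectors)
print(result)
```

In this toy space the result lands exactly on "actor"; with learned embeddings the match is approximate, which is why the nearest neighbor is used rather than an exact comparison.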

Since vector embeddings are used in many different areas, the methodology for each type of vectorization is different from others:

  • Word embeddings encode the meanings of words into vector representations. There are advanced methods for generating word embeddings such as Word2vec, GloVe, and BERT; each of them has a different approach, but we won’t be getting into the details. Word embeddings are useful for many NLP tasks like machine translation, text classification, and sentiment analysis.


  • Image embeddings represent images as vectors in a space that captures visual semantics. The models that produce them are usually trained on large datasets of images labeled with keywords or categories. Image embedding models like ResNet and VGGNet use deep convolutional neural networks to analyze the visual elements of images and encode them into vector representations.

Image embeddings enable applications like image search, image captioning, and visual question answering.


  • Similarly, vector embeddings for other object types such as documents or products all have different requirements and methods that need to be accounted for before using embeddings, but ultimately these embeddings all share the key properties outlined above.

These vector representations (a.k.a. embeddings) can be used as inputs for various machine learning algorithms and natural language processing tasks like clustering, classification, semantic search, and recommendation systems. Vector embeddings have thus become fundamental building blocks for many applications in AI.

IV. Vector Databases

Once the vectors are obtained, they need an efficient storage solution, like any other form of data. Vector databases, often termed "long-term AI memory," are built specifically to store and handle vector embeddings. These databases are tuned to support similarity search and nearest neighbor queries, enabling rapid and precise retrieval of vectors based on their similarity to a query vector.
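As a rough mental model, a nearest neighbor query can be sketched with an in-memory store that compares the query against every stored vector. `TinyVectorStore` is a hypothetical stand-in written for this article, not the API of any real vector database; production systems generally rely on specialized indexes (often approximate ones) instead of this exhaustive scan.

```python
import math
from typing import Dict, List, Tuple

class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database (exhaustive search)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: List[float]) -> None:
        """Insert a vector, or overwrite the one stored under the same id."""
        self._vectors[item_id] = list(vector)

    def query(self, vector: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k stored ids with the smallest Euclidean distance."""
        scored = [
            (item_id, math.dist(vector, stored))
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1])  # nearest first
        return scored[:top_k]

# Toy 2-D vectors (assumption): [ocean-related, stage-related]
store = TinyVectorStore()
store.upsert("doc-ocean", [0.9, 0.1])
store.upsert("doc-stage", [0.1, 0.9])
store.upsert("doc-reef", [0.8, 0.3])

nearest = store.query([1.0, 0.0], top_k=2)
nearest_ids = [item_id for item_id, _ in nearest]
print(nearest_ids)
```

The exhaustive scan costs time proportional to the number of stored vectors on every query, which is the cost that dedicated indexing structures are designed to avoid.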

Vector databases play a crucial role in machine learning by providing scalable and high-performance storage solutions for vector data, and facilitating tasks such as recommendation systems, content-based search, clustering, and anomaly detection.

Utilizing vector databases for machine learning tasks offers several advantages. They provide efficient storage and retrieval of vector embeddings, enabling faster query response times and real-time similarity searches. Vector databases are designed to handle high-dimensional vector data and offer specialized indexing structures and search algorithms, optimizing performance for similarity-based operations. They can scale horizontally to handle large volumes of vector data and can be seamlessly integrated with machine learning frameworks for streamlined workflows.

Several vector database technologies have emerged to address the specific requirements of vector storage and querying. Some notable examples include:


V. The missing element in building outstanding AI products

Problem 1, stated by Sequoia:


Three common elements can be found in the most intimate AI products, such as TikTok's feed, Google Search, and Amazon's recommendation system:

  • Structured Data
  • Vector Search
  • High-Level User Data Utilization

To create the kind of intimacy found in the most successful AI products, we must integrate vector search and user data utilization into LLMs.

Considering specific requirements such as data privacy, scalability, and governance, it is important to acknowledge that it is still early for 99% of businesses to adopt vector databases. For many companies without dedicated AI developers or data scientists, running AI models and algorithms can be a significant challenge. To truly democratize AI-driven services, we must provide a composable stack that allows businesses to effortlessly implement personalization and recommendation systems.

Tailoring AI services and agents to meet user needs marks a pivotal stage in the AI adoption journey for countless businesses. Tech giants like Google, Amazon, and TikTok have mastered the art of structuring data, performing vector searches, and harnessing user data impeccably.

At FirstBatch, our mission is to empower businesses with the capability to perfect their use of user data, enabling them to construct deeply personalized AI products.

We've introduced a novel solution to vector databases called User Embeddings to address these challenges and enable businesses to leverage AI capabilities without requiring extensive expertise or resources. This ensures that the benefits of AI-driven services are accessible to a wider range of businesses, fostering a more inclusive and diverse ecosystem.

Problem 2, stated by a16z:


As previously noted, we're witnessing significant changes in the web's architecture.

As we transition to a model where humans essentially become the API endpoint, the interplay between user data and personalized autonomous agents will become critical to everything we do online.

AI agents are positioned to become a key component of the Large Language Model (LLM) app architecture. However, for these AI agents to truly serve users, we must improve the personalization pipelines. Unfortunately, the current user architecture of the web falls short of supporting the advancement of LLMs and AI agents.

The limitations are evident:

  • Users lose an excessive amount of time on straightforward tasks like self-introduction.
  • Services are siloed, inhibiting the sharing of comprehensive information.
  • The fear of data sharing looms due to privacy concerns.


In response to these challenges, user embeddings can empower Personal AI agents with Personal Long-Term Memory, bridging these gaps and making the AI web a more user-centric environment.

VI. Our Novel Solution: User Embeddings

User embeddings are vector representations of users that store their behaviors, interests, preferences, etc. They are important for enabling personalized and customized experiences across websites & applications, leveraging the unique properties of vector embeddings that allow efficient search and similarity calculation methods.

This is different from segmentation-driven recommendations, which use user information to divide users into groups. With user embeddings, each user is treated as an individual, and experiences are tailored to their individual preferences. While traditional user data such as demographics and ratings provide limited insights, user embeddings generated from user interactions and behaviors can provide a much better understanding of each person. Recommendations driven by user embeddings will allow businesses to offer more sophisticated services to their customers.

One of the most important things about user embeddings is that they can be integrated into any vector database thanks to their modular architecture. Once the product/content data is vectorized, it can be stored on a vector database together with the user embeddings, and the app can use these vectors to make recommendations when the user interacts with it.

Each interaction can be used to update the user embeddings, to better represent the users’ changing interests over time. As represented in the image below, user embeddings can be easily integrated into any platform that uses a vector database:


Personalization with FirstBatch is simple from the platform’s point of view: you just need to add the personalization plugin to any place on your website or app. No backend development, and no login credential management.

From the first interaction, without any prior in-session data from the users, FirstBatch empowers platforms with robust instant personalization that can be adapted to subsequent in-session behavior or preferences.
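One simple way to picture an embedding adapting to in-session behavior is an exponential moving average that nudges the user vector toward each item the user interacts with. This is a sketch under our own assumptions (the update rule, the `weight` parameter, and the neutral starting vector are all illustrative), not a description of how FirstBatch computes user embeddings.

```python
def update_user_embedding(user_vector, item_vector, weight=0.2):
    """Move the user vector a fraction `weight` of the way toward the item vector."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [
        (1.0 - weight) * u + weight * i
        for u, i in zip(user_vector, item_vector)
    ]

# Toy 2-D space (assumption): [sports, cooking]
user = [0.0, 0.0]          # new user with no history, assumed neutral start
cooking_item = [0.0, 1.0]  # an item the user keeps interacting with

for _ in range(3):
    user = update_user_embedding(user, cooking_item)

# After three cooking interactions the cooking component has grown,
# while the sports component is unchanged.
print(user)
```

A larger `weight` makes the embedding react faster to recent behavior; a smaller one keeps it more stable over time.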


VII. Privacy for User Embeddings & LLM Caching

Problem 3, stated by Sequoia:


Cryptographic authorization and privacy hold paramount importance in the future landscape of user data utilization, storing embeddings for AI agents, and AI services.

Without the trust instilled by robust cryptographic authorization, businesses and users may hesitate to share sensitive data, significantly hindering AI advancement. Therefore, adopting robust cryptographic measures is not a mere option; it's an absolute necessity for fostering a secure environment where data sharing is safe, and innovation can thrive unimpeded.

FirstBatch is creating a secure environment where both businesses and users can confidently store their embeddings on a public ledger.

This confidence is fostered through the implementation of Zero-Knowledge authentication, a cryptographic method that allows users to prove their identities without revealing sensitive information. With Zero-Knowledge proofs, FirstBatch guarantees clients' data privacy by allowing data access authentication only to whitelisted parties.

This method of privacy preservation is extended to LLM caching operations, providing another layer of protection.

The importance of privacy in cache operations for LLMs lies in safeguarding user queries and cached contents, ensuring data confidentiality, and mitigating the risks of unauthorized access and potential privacy breaches.

LLM caching is of great importance in scaling LLM APIs and hosting operations. When optimized, it reduces computational overhead, improves response times, and enhances the overall performance of LLMs. By implementing intelligent and distributed caching solutions like FirstBatch’s HollowDB, LLM APIs can efficiently handle increased loads, deliver faster responses, and support a larger user base without sacrificing quality, performance, or privacy.
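The basic mechanism behind LLM caching can be sketched with an in-memory dictionary keyed by a hash of the normalized prompt. Everything here is an assumption for illustration: `PromptCache` and `fake_llm` are hypothetical names, the cache lives in process memory, and this is not HollowDB's API.

```python
import hashlib

class PromptCache:
    """Hypothetical exact-match cache placed in front of an LLM call."""

    def __init__(self, llm_call):
        self._llm_call = llm_call
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace, then hash so the key has a fixed size.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._llm_call(prompt)  # the expensive call happens only on a miss
        self._store[key] = answer
        return answer

calls = []

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return f"echo: {prompt}"

cache = PromptCache(fake_llm)
first = cache.complete("What is a vector database?")
second = cache.complete("what is a   vector database?")  # same prompt after normalization

print(len(calls), cache.hits, cache.misses)
```

The second request is served from the cache, so the model is invoked only once. Note that the cached answers here are stored in plain form; protecting them is the job of the access controls discussed in this section.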

HollowDB is unique in that it supports optional caching with LMDB or Redis. Because HollowDB takes a smart contract instance as input, builders can apply cache overrides to that instance outside HollowDB and then pass it in, which lets them use the caching layer of their choice.

In addition to flexibility, there are a few key benefits to using a decentralized key-value database like HollowDB instead of its cloud competitors for caching operations.

  • Firstly, permanent and tamper-resistant storage ensures the longevity and integrity of persistent logs that are important for transparency and security.
  • Secondly, there is no reliance on centralized cloud providers, resulting in increased censorship resistance and reduced dependency on a single entity for data availability.
  • Lastly, the pay-once storage model of Arweave (a blockchain specialized in permanent storage) eliminates the recurring costs associated with cloud-based services, making it an economically efficient solution for long-term caching needs.

HollowDB ensures data privacy for clients through the implementation of Zero-Knowledge proofs and whitelisting for data access authentication. Implementing Zero-Knowledge (ZK) access controls for caching operations offers privacy benefits by ensuring that sensitive data, such as user queries and cached contents, remains confidential and inaccessible to unauthorized parties. ZK access controls are crucial for enhancing user trust and complying with expanding privacy regulations.

VIII. Conclusion

As ML advances, including progress toward AGI, ethical considerations become paramount. The responsible and ethical use of machine learning requires careful attention to issues such as privacy, fairness, and transparency.

At FirstBatch, we are committed to advancing the connection between AI's long-term memory and user data, while prioritizing a privacy-preserving approach.

We recognize the intrinsic value of user data and the potential it holds for improving AI performance and accuracy, but we also understand the paramount importance of maintaining user trust through robust privacy measures.