Batuhan Aktaş | Product Manager
November 7, 2023

Harnessing LLMs for the Next Wave of Consumer Experiences

Leverage RAG & User Embeddings to Build Your Own LLM Apps

As generative AI services continue to grow and improve, enterprises have started building their own agents, both to solve internal problems for their teams and to ship new features to their users. Early adopters such as Quora, Typeform, Stripe, Duolingo, Klarna, and Robin AI have already launched new services on top of the foundation models of Anthropic and OpenAI. Looking at these names, it is clear that Large Language Models can be useful across industries, from e-commerce to fintech.

Most of these companies have been working on conversational experiences for both their internal teams and their customers. Quora built Poe, an AI chat app, and Robin AI shipped a copilot for contracts, both powered by Anthropic’s Claude; Typeform released a conversational form interface that rethinks the form experience, and Stripe improved fraud detection and customer support with LLM capabilities, both powered by OpenAI’s GPT. Internally, these companies have been working on democratizing data and knowledge for their teams: prompt-based report generation, KPI tracking, knowledge interfaces for support teams, creative assistants for marketing teams, and many more projects are under development. So, how do these enterprises put these foundation models to work?

Before diving into implementations, here is a brief explanation of LLMs:

Large Language Models (LLMs) like GPT-4 are advanced AI systems trained on vast datasets to understand and generate text. They are capable of performing a wide range of language tasks, from translating languages to generating creative writing. At their core, LLMs predict the likelihood of a sequence of words, allowing them to construct coherent and contextually relevant responses to prompts.
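That "predict the likelihood of a sequence of words" idea can be illustrated in a few lines: a model assigns a score to each candidate next token, and a softmax turns those scores into probabilities. The tokens and scores below are made up for illustration:

```python
import math

# Made-up scores a model might assign to candidate next tokens
# after the prompt "The cat sat on the"
logits = {"mat": 2.0, "roof": 1.0, "moon": 0.1}

def softmax(scores):
    """Turn raw scores into a probability distribution over tokens."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

probs = softmax(logits)
print(max(probs, key=probs.get))  # the most likely continuation: "mat"
```

An LLM repeats this step token by token, which is why it can produce fluent text that is not necessarily grounded in facts.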

One challenge with LLMs is the phenomenon known as "hallucination," where the model generates information that is plausible but factually incorrect or nonsensical. This happens because LLMs prioritize text coherence over factual accuracy, and they have no ability to access or verify real-world information post-training. In consumer-facing applications, hallucinations can spread false information and damage trust and credibility. Therefore, enterprises should implement additional layers of validation when using LLMs in scenarios where accuracy matters.
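One simple (and admittedly crude) validation layer is a lexical grounding check: flag an answer when too few of its words appear in the source documents it is supposed to be based on. The `is_grounded` helper below is an illustrative sketch, not a production-grade fact checker:

```python
import re

def words(text):
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_grounded(answer, sources, threshold=0.5):
    """Crude check: the fraction of answer words that occur in the sources."""
    answer_words = words(answer)
    if not answer_words:
        return False
    source_words = words(" ".join(sources))
    return len(answer_words & source_words) / len(answer_words) >= threshold

policy = ["Our refund policy allows returns within 30 days of purchase."]
print(is_grounded("Returns are allowed within 30 days", policy))        # True
print(is_grounded("We offer free shipping worldwide forever", policy))  # False
```

Real systems use stronger checks (entailment models, citation verification), but even a heuristic like this can catch confidently invented answers.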

In such cases, enterprises need to teach LLMs about their data, their products, and their problems. However, since building their own LLM, or training an existing one on their own data, is too costly, RAG entered the scene, and it is now one of the hottest topics for both developers and companies.

Let’s recap what RAG is. Retrieval Augmented Generation (RAG) is a method that blends the power of large-scale retrieval with sequence-to-sequence models. The idea is to bring in external knowledge about a specific domain, such as your company's database, and use it to generate responses via existing generative AI solutions. This is particularly useful when the model needs to pull in specific or detailed facts, references, or other information that might not be encoded in its pre-trained weights, for example when you want to build a support agent that answers questions about your product.

RAG basically consists of three steps:

  1. Question Encoding: The input (a prompt) is encoded into a query vector.
  2. Document Retrieval: Using this query vector, a retrieval system searches a large corpus or database to fetch relevant documents or passages.
  3. Generation: The retrieved documents or passages, along with the original input, are then passed to the model (like a transformer). This model generates a response using both the input and the information from the retrieved documents.
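The three steps above can be sketched end to end with a toy corpus, using bag-of-words vectors as a stand-in for a real encoder and vector database (the documents and helper names below are illustrative):

```python
import math
import re
from collections import Counter

# Toy knowledge base standing in for a company database
corpus = [
    "FirstBatch provides user embeddings for instant personalization.",
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
]

def embed(text):
    """Step 1 - encode text as a bag-of-words vector (stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """Step 2 - fetch the k passages most similar to the query vector."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query):
    """Step 3 - pass the retrieved context, along with the question, to the generator."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the API rate limit?"))
```

In production, the generator would be an LLM call; here `build_prompt` just shows what the model would receive.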

Since foundation models are trained only on general knowledge, supplementing them with external knowledge reduces hallucination and lets them serve specific domains. There are different ways to use foundation models with RAG: enterprises can build a RAG pipeline with libraries like LangChain, or use RAG-specific services such as Coral from Cohere or Retrieval from OpenAI.

Coral: Chat with RAG

Since late September, Cohere has been serving a chat playground where you can upload your documents or provide a URL to chat with. More importantly, they also released a Chat API to let you bring that utility directly into your application. The Chat API can be used in three different ways.

  • Document mode: To retrieve information from specific documents and chat over them.
  • Query-generation mode: To receive search query recommendations from the model instead of actual replies.
  • Connector mode: To let the model look at a specific location for information, such as a web URL.

Let’s have a look at how it works. Currently, the API supports only web search, so Coral will fetch the required information from the web to reduce hallucination.

We will start by importing the required package and then chatting without RAG, asking “What is FirstBatch?”.

import cohere
co = cohere.Client('<<API KEY>>')

# Chat without RAG: the model relies only on its training data
response = co.chat(
    message="What is FirstBatch?"
)

Naturally, the model does not know anything about FirstBatch, so it hallucinates an answer.

cohere.Chat {
    id: '**********'
    response_id: '******'
    generation_id: '******'
    message: What is FirstBatch?
    text: FirstBatch is a comprehensive and approachable digital solution designed exclusively for early-stage startups. It empowers founders to launch and scale their ideas faster by offering a wide range of tools and resources, including ideation support, market research, fundraising assistance, and more. Whether you're a beginner or an experienced entrepreneur, FirstBatch's user-friendly interface and extensive database of resources make it the perfect companion for turning your vision into a reality. So why wait? Join FirstBatch today and bring your startup dreams to life!
    conversation_id: None
    prompt: None
    chat_history: None
    preamble: None
    client: <cohere.client.Client object at '********'>
    token_count: {'prompt_tokens': 67, 'response_tokens': 102, 'total_tokens': 169, 'billed_tokens': 158}
    meta: {'api_version': {'version': '1'}}
    is_search_required: None
    citations: None
    documents: None
    search_results: None
    search_queries: None

So, we will let it search the web.

response = co.chat(
    message="What is FirstBatch?",
    connectors=[{"id": "web-search"}]  # perform a web search before answering the question
)

It seems the model found some information, but got confused among several companies with the same name.

cohere.Chat {
    id: '**********'
    response_id: '**********'
    generation_id: '**********'
    message: What is FirstBatch?
    text: I found several results for FirstBatch:

FirstBatch, the name of a four-month accelerator that connects the manufacturing community with entrepreneurs to bring ideas to life by locally manufacturing products.

FirstBatch, the name of a company that provides personalization as a service and decentralized vector database for any app with a few lines of code.

First Batch Hospitality, a hospitality group that presents working wineries.

Can you please clarify which FirstBatch you're asking about?
    conversation_id: None
    prompt: None
    chat_history: None
    preamble: None
    client: <cohere.client.Client object at '**********'>
    token_count: {'prompt_tokens': 1464, 'response_tokens': 96, 'total_tokens': 1560, 'billed_tokens': 101}
    meta: {'api_version': {'version': '1'}}
    is_search_required: None
    citations: [{'start': 67, 'end': 89, 'text': 'four-month accelerator', 'document_ids': ['web-search_4:0', 'web-search_1:0']}, {'start': 95, 'end': 150, 'text': 'connects the manufacturing community with entrepreneurs', 'document_ids': ['web-search_4:0', 'web-search_1:0']}, {'start': 177, 'end': 208, 'text': 'locally manufacturing products.', 'document_ids': ['web-search_1:0']}, {'start': 258, 'end': 286, 'text': 'personalization as a service', 'document_ids': ['web-search_6:0']}, {'start': 291, 'end': 320, 'text': 'decentralized vector database', 'document_ids': ['web-search_6:0']}, {'start': 340, 'end': 358, 'text': 'few lines of code.', 'document_ids': ['web-search_6:0']}, {'start': 387, 'end': 404, 'text': 'hospitality group', 'document_ids': ['web-search_9']}, {'start': 419, 'end': 436, 'text': 'working wineries.', 'document_ids': ['web-search_9']}]
    documents: [{'id': 'web-search_4:0', 'snippet': 'First Batch is a four month accelerator that connects the manufacturing community with entrepreneurs.\n\nCincinnati, Ohio, United States\n\nRecent News & Activity\n\niNews Wire — Iconiq Lab Closes a $1,000,000 Private Presale of Its ICNQ Token and Launches the First Batch of the Accelerator Program on February 18\n\nSDX Central — CNCF Serverless Working Group Unloads First Batch of Data\n\nYourStory News — GSF Accelerator Opens up applications for the second batch; Will select 20 promising startups\n\nEdit Details Section\n\nHeadquarters Regions \n\nGreat Lakes, Midwestern US\n\nSub-Organization of \n\nAccelerator has Demo Days \n\nFirst Batch is a four month accelerator that connects the manufacturing community with entrepreneurs to take an idea you hold in your head to a product you hold in your hand- and sell.', 'title': 'First Batch - Crunchbase Investor Profile & Investments', 'url': ''}, {'id': 'web-search_1:0', 'snippet': 'Skip to main content\n\nCarl H. Lindner College of Business Business\n\nLindner College of Business » Centers & Partnerships » Entrepreneurship » Community Partners » First Batch\n\nFirst Batch is the nation’s only business accelerator dedicated to startup companies with physical products and local manufacturing. First Batch helps you turn a physical product prototype into finished, locally manufactured products and then turn those products into a sustainable business.\n\nThe program offers four months of rent for office space that would be needed for meetings, workshop and prototyping tools, mentoring. 
It would also connect companies with a set of mentors related to their product and business type and advisors that would help them gain in-depth knowledge for launching new products.', 'title': 'First Batch | University of Cincinnati', 'url': ''}, {'id': 'web-search_6:0', 'snippet': "FirstBatch provides personalization as a service and decentralized vector database for any app with a few lines of code.\n\nAustin, Texas, United States\n\nRecent News & Activity\n\nFirstBatch raised an undisclosed amount / Pre Seed\n\nDiscover more funding rounds\n\nEdit Details Section\n\nArtificial Intelligence\n\nInformation Technology\n\nHeadquarters Regions \n\nPersonal, Self-Sovereign and Privacy-Preserving Long Term Memory of AI.\n\nLists Featuring This Company\n\nEdit Lists Featuring This Company Section\n\nTexas Startups Founded in 2022\n\n114 Number of Organizations • \n\n$331.9M Total Funding Amount • \n\n195 Number of Investors\n\nUnited States Startups Founded in 2022\n\n1,958 Number of Organizations • \n\n$8.7B Total Funding Amount • \n\n5,954 Number of Investors\n\nInformation Technology Seed Stage Companies With More Than 10 Employees\n\n5,847 Number of Organizations • \n\n$12.1B Total Funding Amount • \n\n13,615 Number of Investors\n\nBlockchain Companies With Fewer Than 100 Employees (Top 10K)\n\n9,784 Number of Organizations • \n\n$39.8B Total Funding Amount • \n\n20,604 Number of Investors\n\nFrequently Asked Questions\n\nEdit Frequently Asked Questions Section\n\nFirstBatch's headquarters?", 'title': 'FirstBatch - Crunchbase Company Profile & Funding', 'url': ''}, {'id': 'web-search_9', 'snippet': 'First Batch Hospitality\n\nFirst Batch Hospitality champions a new, accessible approach to wine and winemaking. 
The hospitality group presents working wineries set against metropolitan backdrops, complete with private event spaces, restaurants and tasting rooms.\n\nOur flagship boutique working winery, now in a brand new space just off McCarren Park in Williamsburg, Brooklyn.\n\nDC’s first commercial winery, situated in the bustling Navy Yard with breathtaking views of the Anacostia River.\n\nComing soon to River North, Chicago Winery will be the city’s newest winery, restaurant and private event venue.\n\nFirst Batch Hospitality About Us Our Team Careers Contact Us\n\nOur Locations Brooklyn Winery District Winery Chicago Winery', 'title': 'First Batch Hospitality', 'url': ''}]
    search_results: [{'search_query': {'text': 'What is FirstBatch?', 'generation_id': 'f6f128cb-0b86-4cc5-bae6-5de729a8f412'}, 'document_ids': ['web-search_1:0', 'web-search_4:0', 'web-search_6:0', 'web-search_9'], 'connector': {'id': 'web-search'}}]
    search_queries: [{'text': 'What is FirstBatch?', 'generation_id': 'f6f128cb-0b86-4cc5-bae6-5de729a8f412'}]

Let’s give it a precise location where it can find the exact information we need.

response = co.chat(
    message="What is FirstBatch?",
    connectors=[{"id": "web-search", "options": {"site": ""}}]  # restrict the search to a specific site
)

Finally, we have a better answer. It still needs some fine-tuning, but now it seems useful.

cohere.Chat {
    id: '**********'
    response_id: '**********'
    generation_id: '**********'
    message: What is FirstBatch?
    text: FirstBatch is a user embedding service that provides instant, modular personalization. Using Zero-Knowledge proofs, it helps bring all your interests together and control what you see. It also provides a proof-of-interest (POI) protocol. Firstly users have to mint a Persona NFT, a unique token that binds a user's persona to a wallet. It then instantaneously generates 100s of zero-knowledge attestations based on the user's social profile.
    conversation_id: None
    prompt: None
    chat_history: None
    preamble: None
    client: <cohere.client.Client object at '**********'>
    token_count: {'prompt_tokens': 2475, 'response_tokens': 92, 'total_tokens': 2567, 'billed_tokens': 97}
    meta: {'api_version': {'version': '1'}}
    is_search_required: None
    citations: [{'start': 16, 'end': 30, 'text': 'user embedding', 'document_ids': ['web-search_0']}, {'start': 31, 'end': 38, 'text': 'service', 'document_ids': ['web-search_0']}, {'start': 53, 'end': 60, 'text': 'instant', 'document_ids': ['web-search_0']}, {'start': 62, 'end': 86, 'text': 'modular personalization.', 'document_ids': ['web-search_0']}, {'start': 93, 'end': 114, 'text': 'Zero-Knowledge proofs', 'document_ids': ['web-search_1', 'web-search_2']}, {'start': 125, 'end': 158, 'text': 'bring all your interests together', 'document_ids': ['web-search_2']}, {'start': 163, 'end': 184, 'text': 'control what you see.', 'document_ids': ['web-search_2']}, {'start': 193, 'end': 237, 'text': 'provides a proof-of-interest (POI) protocol.', 'document_ids': ['web-search_3']}, {'start': 246, 'end': 278, 'text': 'users have to mint a Persona NFT', 'document_ids': ['web-search_3']}, {'start': 282, 'end': 294, 'text': 'unique token', 'document_ids': ['web-search_3']}, {'start': 300, 'end': 335, 'text': "binds a user's persona to a wallet.", 'document_ids': ['web-search_3']}, {'start': 344, 'end': 405, 'text': 'instantaneously generates 100s of zero-knowledge attestations', 'document_ids': ['web-search_3']}, {'start': 419, 'end': 441, 'text': "user's social profile.", 'document_ids': ['web-search_3']}]
    documents: [{'id': 'web-search_0', 'snippet': "Instant, Modular Personalization\n\nUser embeddings, compatible with all vector databases, enable instantly personalized experiences while ensuring personal and privacy-preserving memory for AI.\n\nPersonal, long-term memory for AI.\n\nUpgrade your AI stack with user embeddings\n\nVectoral representations of your users’ interests that allow you to build instant & hyper-personalized experiences powered by AI.Learn More\n\nCompatible With Your AI Stack\n\nTurn user sign-ups into\n\nInstant, Personalized Experiences\n\nUser Signs-up via Gmail\n\nUser credentials generated\n\nUser Embeddings Generated via Public Social Profile Data\n\nUser Experience will be personalized based on embeddings\n\nNo-login, Instantly Personalized Feeds\n\nWithout login, user navigates in website and interacts with different contents/products etc.\n\nWith the user actions, user embeddings will be generated.\n\nBased on these embeddings, user experience will be personalized instantly by real-time personalized recommendations.\n\nFirstBatch's modular user embedding service seamlessly integrates into your AI stack, enhancing its capabilities.\n\nto generate users embeddings\n\nStart generating user embeddings from off-platform user interest data simply by adding an SDK into your frontend.\n\n// example codes will be released soon CopyCopied!\n\nQuery For User Embeddings\n\nto get personalized recommendations\n\nWith real time query mechanism, you can receive user embeddings to upgrade your vector search based recommendation engines built on your own AI stack. FirstBatch’s flexible query structure allows you to receive user embeddings that fill industry specific needs.\n\n// example codes will be released soon CopyCopied!\n\nLatest from our Blog\n\nThe Past, Present & Future of E-Commerce\n\nGet Ready for The New Gen Experiences\n\nE-commerce personalization has progressed from basic search to recommendations. 
However, true hyper-personalization enabled by AI is still ahead. User embeddings that respect privacy will overcome limitations and enable tailored, conversational commerce.\n\nWhy Vector Based Personalization is Better Than Its Alternatives\n\nVector Representations Revolutionize Personalization\n\nVector-based personalization uses embeddings to model user interests, overcoming limitations of rules, filtering, and segmentation. Vectors enable hyper-personalized recommendations from first interaction, capturing nuanced preferences beyond demographics.\n\nRetrieval Augmented Generation: Elevating Large Language Models in AI Development\n\nThe Promise and Potential of Retrieval Augmented Generation\n\nLarge language models like GPT stunned AI by generating coherent text, but face limitations from biased training data. Retrieval augmented generation enhances models by allowing them to retrieve relevant knowledge, improving consistency and reasoning. Open source access accelerates innovations like RAG.\n\nA Game Changer for Privacy-Preserving & Decentralized LLMCache, ID Management, and Session Storage\n\nHollowDB is a lightning-fast, permanent, and efficient privacy-preserving key-value database. It combines blockchain and zero-knowledge proofs for decentralized apps. With speeds up to 8ms put and get, HollowDB enables building high-performance solutions for AI, caching, identity management, and more with just a few clicks.\n\nHow to Preserve Privacy?\n\nUser embeddings are stored via Zero-Knowledge(ZK) cryptography on HollowDB which means FirstBatch does not store any personal data of your users. 
ZK authentication is a method that allows users to prove their ownership on their data, without revealing any other information.\n\nLearn MoreMore About ZK Auth\n\nLATEST FROM OUT BLOG\n\nThe Past, Present & Future of E-CommerceWhy Vector Based Personalization is Better Than Its AlternativesRetrieval Augmented Generation: Elevating Large Language Models in AI DevelopmentA Game Changer for Privacy-Preserving & Decentralized LLMCache, ID Management, and Session Storage\n\nAll rights reserved.", 'title': 'FirstBatch - User Embeddings Instant, Modular Personalization', 'url': ''}, {'id': 'web-search_1', 'snippet': 'Testnet is currently only available at Desktop', 'title': 'FirstBatch', 'url': ''}, {'id': 'web-search_2', 'snippet': 'Instantly personalize your web experience\n\nBecome the center of your digital universe. Explore the best of what you love, and enjoy hyper-personalization across the web.\n\nFirstBatch uses Zero-Knowledge proofs to protect your privacy.\n\nOwn Your Digital Preferences\n\nBring all your interests together and control what you see', 'title': 'FirstBatch', 'url': ''}, {'id': 'web-search_3', 'snippet': "Proof-of-Interest: The Decentralized Protocol for Digital Persona\n\nVersion 0.4 12 August 2022\n\nIdentity is essential for digital presence. Web3 unlocks the critical abilities listed below:\n\nRevealing parts of our identity on demand\n\nValue realization of information regarding characteristics, preferences\n\nProve things by sharing zero or insignificant information\n\nWe developed FirstBatch around these abilities as a proof-of-interest (POI) protocol. Our vision is to become the leading identity generator to provide data-driven attestations instead of the claim based and bring ZK Identity scalability to identity protocols, middlewares, dapps, and Metaverse.\n\nHow FirstBatch Works\n\nFirstBatch A.I. 
instantly generates 100s of zero-knowledge attestations based on the user's social profile.\n\nUsers mint a soulbound NFT.\n\nUsers prove their persona traits & interests to others with zero knowledge.\n\nSharing a part of your identity is realized as value and rewarded with FirstBatch. DApps, DAOs, Socials, and protocols use FirstBatch to interact with the interest data of users, create segmentations, and measure social alignments while protecting privacy.\n\nFirstBatch adding a new layer to Web3 data\n\nAt its core, FirstBatch is a protocol for creating and proving claims of interest through blockchain. First, users have to mint a Persona NFT, a unique token that binds a user's persona to a wallet. A persona is the set of ideas a user's social profile is interested in and its synergy with them. Currently, FirstBatch supports currently Twitter\n\nfor unique persona creation.\n\nPersona NFT owners can prove their interests to Web3 platforms. Users can prove fragments of their unique personas to DApps without revealing.\n\nFirstBatchs’ proprietary toolkit makes user segmentation and targeted wallet interaction possible for Dapps, DAOs & Brands.\n\nThe starting point in FirstBatch is the analysis of web2 data. A newcomer Persona initially consists of web2 data mapped to interests covering an extensive spectrum of web2 and web3 topics. FirstBatch users’ on-chain data becomes meaningful as FirstBatch relates interests to tokens and transactions. This unlocks a potential that was not available before as we can say phrases like “Azuki owners are t.3 interested in X, Y, Z”.\n\nBelong-to-Earn Infrastructure\n\nDAOs, dapps, Brands and Metaverse performs multiple actions to interact with groups of segmented wallets. 
The platform provides applications an interface to reach and interact with segmented wallets.\n\nProof of Personhood:\n\nUsers can prove personhood and use social relations to recover wallets, assets.\n\nDApps can instantly perform airdrops to segmented communities while preserving privacy.\n\nInterest-gated Web3.0:\n\nDApps can create events with utilizing proof of interest, making sure people with aligned interests join.\n\nEncoding of social relations to create better DAO governance:\n\nDAOs can calculate an alignment coefficient for groups of wallets to weight votings, manage roles based on alignment.\n\nFirstBatch can match users with DApps, or users with users based on their social fingerprint, adding a new layer for social crypto.\n\nThe First Batch platform will be powered by the $BATCH Token. The $BATCH token aims to be the currency for mass segmented wallet interactions.\n\nThe $BATCH token creates a well-balanced belong-to-earn economics model that could offer a pleasant monetary reward system to Persona Token Holders and enables dAPPs, DAOs, and brands to take advantage of interest gated interactions with the blockchain while presenting a fair and robust voting and DAO mechanism.\n\nWe believe the future is multi-chain and so FirstBatch. Due to its nature, FirstBatch is a project capable of going multi-chain. In fact, every new blockchain integration will increase the value and utility of FirstBatch for both users and applications. The size of the FirstBatch ecosystem would significantly increase with each new blockchain in the stack. User incentives provided by FirstBatch are also directly proportional to the size of the FirstBatch ecosystem.\n\nMore chains -> more apps -> more users\n\nMultiple wallets, one identity\n\nWeb3 users own multiple wallets for different blockchains with virtually no link between them. That link can be created by owning Persona NFTs. 
A single point of identity that can link all wallets with an individual.\n\nUsers will be able to proof-of-interest to applications across blockchains, earning even more $BATCH. While a single point of identity yields higher income for users, it creates a creative, flexible environment for applications to build better applications.\n\nNext - FirstBatch Provides\n\nLast modified 4mo ago\n\nHow FirstBatch Works\n\nBelong-to-Earn Infrastructure", 'title': 'FirstBatch - FirstBatch', 'url': ''}]
    search_results: [{'search_query': {'text': 'What is FirstBatch?', 'generation_id': 'a99fed4e-4c6b-4e27-8ef2-83bd5f94fc8e'}, 'document_ids': ['web-search_0', 'web-search_1', 'web-search_2', 'web-search_3'], 'connector': {'id': 'web-search'}}]
    search_queries: [{'text': 'What is FirstBatch?', 'generation_id': 'a99fed4e-4c6b-4e27-8ef2-83bd5f94fc8e'}]

As we've seen, Cohere's Chat API offers a robust foundation for integrating real-time, domain-specific knowledge into conversational models using the RAG approach. The possibilities expand further with OpenAI's Assistants API, which makes it easy to create even more sophisticated AI agents by providing instructions for each assistant.

OpenAI Assistants

We will again start by installing the required package, setting it up, and creating the client.

pip install openai

from openai import OpenAI
client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

Then we will create our own AI assistant. OpenAI’s Assistants API allows you to create a custom agent by giving it instructions and providing external knowledge to use with different tools, such as the code interpreter and retrieval. Here, we will instruct the agent to act as a sales chatbot for FirstBatch and provide knowledge by uploading a PDF file. Then we will ask it to retrieve that knowledge to help customers with their questions.

# Since I already uploaded the file via the playground, I just use the existing file_id.
assistant = client.beta.assistants.create(
    instructions="You are a sales chatbot. Use your knowledge base to best respond to customer queries and convince them to build personalization with FirstBatch.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=["<<FILE ID>>"]
)

So, we’ve created our assistant. Now, let’s start a thread and ask why and how an e-commerce platform can leverage personalization.

thread = client.beta.threads.create()  # conversation thread that will hold the messages
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user",
    content="I have an e-commerce platform. Why should I serve personalization and how can I do it?"
)

We will then run the assistant and display the answer it gives to our question.

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
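One detail worth noting: a run executes asynchronously, so in practice you would poll its status before listing the messages. The `wait_for_run` helper below is an illustrative sketch (it is not part of the OpenAI SDK), demonstrated offline with a stub:

```python
import time

def wait_for_run(get_status, poll_interval=0.5, max_polls=120):
    """Poll get_status() until the run leaves the queued/in_progress states."""
    for _ in range(max_polls):
        status = get_status()
        if status not in ("queued", "in_progress"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not finish in time")

# With the real client this would look something like:
#   wait_for_run(lambda: client.beta.threads.runs.retrieve(
#       run_id=run.id, thread_id=thread.id).status)

# Offline demonstration with a stub that completes on the third poll:
statuses = iter(["queued", "in_progress", "completed"])
print(wait_for_run(lambda: next(statuses), poll_interval=0.01))  # completed
```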

messages = client.beta.threads.messages.list(thread_id=thread.id)

Here is the answer from our Sales Assistant:

Serving personalization on your e-commerce platform is essential to remain competitive and to cater to the evolving expectations of shoppers. Here’s why you should implement it, and how FirstBatch can help you to do so:

**Why Serve Personalization on Your E-Commerce Platform:**

1. **Enhanced Customer Experience:** Personalization can provide a seamless shopping experience that feels unique to each customer. This means showcasing products, deals, and content that align with individual preferences, leading to a more intuitive and enjoyable browsing experience.

2. **Increased Sales and Conversions:** When customers see items that resonate with their personal taste and past shopping behavior, they are more likely to make a purchase, thereby boosting your sales and conversion rates.

3. **Improved Customer Loyalty and Retention:** Personalized experiences make customers feel valued and understood, which can strengthen their loyalty to your brand and increase the likelihood of repeat purchases.

4. **Competitive Differentiation:** Offering a personalized experience can set your platform apart from competitors who may not be as advanced in tailoring their user experience.

5. **Efficient Inventory Management:** By understanding customer preferences, you can better predict which products will be in demand, helping to optimize your inventory levels and reduce excess stock.

**How to Implement Personalization With FirstBatch:**

1. **Navigable Experiences:** Implement features that create a personalized navigational journey for your users. Each interaction on your platform contributes to shaping a unique experience, which can lead to more engagement and satisfaction【9†source】.

2. **Personalized Discovery Without Login:** You can offer personalized content without requiring users to log in, which can make your platform more accessible and reduce friction for new users discovering your products【9†source】.

3. **Real-Time Personalization:** Utilize FirstBatch's real-time personalization capabilities to present content that adapts instantly to user interactions, ensuring an experience that reflects their current interests【9†source】.

4. **Hyper-Personalized Semantic Search:** Improve your search functionality by integrating user interactions, allowing the search results to become tailored specifically to each user beyond their immediate search queries【9†source】.

5. **User Embeddings:** Apply FirstBatch's User Embeddings to track user preferences and behavior, using this information to guide personalized content and recommendations on your e-commerce platform【9†source】.

**Implementation Steps:**

- **Implement the Open Source SDK:** Start by embedding the FirstBatch open-source SDK into your platform's codebase to begin the journey of personalization【9†source】.
- **Send Signals:** Collect and send user interaction signals (such as likes, purchases, etc.) to navigate the User Embeddings and deliver personalized content and recommendations【9†source】.
- **Request Batches:** When users are active on your platform, request batches of items to present them with personalized options based on their interactions and preferences【9†source】.

By following these steps and integrating FirstBatch's personalization tools, you can create a more engaging and personalized shopping experience on your e-commerce platform, ultimately driving higher customer satisfaction and increased revenue.

It seems we have a very satisfying answer on why an e-commerce platform should feature personalization. More importantly, the assistant relies on external knowledge about FirstBatch and acts as a FirstBatch salesperson, describing how one can adopt it. When we ask the same question to the same assistant, but without the knowledge base and retrieval tool, we receive the answer below: again a nice summary of the importance of personalization, but with nothing related to FirstBatch.

Personalization is a powerful way to differentiate your e-commerce platform from competitors, improve customer engagement, and drive sales. Here's why you should consider implementing personalization and some strategies on how to do it:

**Why Personalization Matters:**

1. **Enhanced User Experience**: By personalizing your platform, you ensure that customers see content, products, and offers that are relevant to their interests, which can make their shopping experience more enjoyable and efficient.

2. **Increased Conversion Rates**: Personalization can lead to more effective calls-to-action and product displays, which can, in turn, increase the likelihood of customers making a purchase.

3. **Improved Customer Retention**: Personalized interactions make customers feel valued, increasing their loyalty to your brand and likelihood of repeat purchases.

4. **Higher Average Order Value**: Personalized recommendations often lead customers to discover products they wouldn't have found otherwise, potentially increasing the size of their orders.

5. **Better Customer Insights**: By tracking customer behaviors and preferences, you can gain valuable insights into trends and demands, which can inform your business strategy.

**How to Implement Personalization:**

1. **Collect and Analyze Data**: Use cookie tracking, account sign-ins, and purchase histories to gather data on customer preferences and behaviors.

2. **Segment Your Customers**: Divide your customer base into segments based on similar characteristics, such as demographics, purchase history, or browsing behavior.

3. **Personalized Recommendations**: Employ algorithms to suggest products that are relevant based on past searches, views, and purchases.

4. **Customized Email Marketing**: Send out targeted email campaigns based on user behavior, such as abandoned cart reminders or personalized deals.

5. **Dynamic Content**: Display content on your site that adjusts to the visitor's profile, such as banners, pop-ups, and featured products.

6. **Retargeting Campaigns**: Use ads that target users who have visited your site or specific product pages but haven't made a purchase.

7. **Customer Feedback**: Incorporate systems to gather customer feedback and use this to refine personalization strategies further.

8. **Machine Learning and AI**: Invest in advanced technology that can automate and enhance the personalization process over time by learning from customer interactions.

By focusing on these areas, you can start to build a more personalized experience for your customers. Remember that data privacy is crucial; always be transparent with your customers about how their data is used and give them control over their privacy settings. Personalization should enhance the customer experience without compromising trust.

We have seen that it is fairly easy to build a chat agent that helps your users or customers with the RAG-enabled services of Cohere and OpenAI. Even for enterprises, however, these services can become costly at scale. It can also be safer to build an internal pipeline for a more stable chat experience, especially if you need to update your information base frequently. So let's look at how you can build your own pipeline with LangChain, walking through James Briggs' example. (See the full recipe.)

RAG with LangChain

Again, we will first install the required packages and then define required information like API key and model.

!pip install -qU \
    langchain==0.0.292 \
    openai==0.28.0 \
    datasets==2.10.1 \
    pinecone-client==2.2.4


import os
from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage

os.environ["OPENAI_API_KEY"] = "<<YOUR API KEY>>"

# initialize the chat model
chat = ChatOpenAI(model="gpt-3.5-turbo")

# start the conversation history with a system message
messages = [SystemMessage(content="You are a helpful assistant.")]

Now, let’s start with the same question.

prompt = HumanMessage(
    content="What is FirstBatch?"
)

# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)
print(res.content)


Of course, we received an apology.

Unfortunately, I don't have any information about a specific service or platform called "FirstBatch." It's possible that it may be a specialized service or a company that is not widely known. Without more context or information, it's difficult for me to provide specific details. If you can provide more information or clarify your question, I'll do my best to assist you.

Now, I'll connect the model with an external knowledge base where I've put the required information. Since I only need to provide a small paragraph of information, I fed it into the model with a prompt. If you need an external knowledge base over huge datasets, however, you should build a separate vector database containing the vectors of those datasets, then connect that vector database to your LLM so the model can retrieve related information.
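To picture what that retrieval step does under the hood, here is a minimal, self-contained sketch using toy vectors and cosine similarity. In production the vectors would come from an embedding model and live in a vector database such as Pinecone; the document texts and vector values below are purely illustrative assumptions.

```python
import math

# toy 3-dimensional "embeddings"; real ones come from an embedding model
docs = {
    "FirstBatch is a user embedding service.": [0.9, 0.1, 0.0],
    "Paris is the capital of France.":         [0.0, 0.2, 0.9],
    "User embeddings enable personalization.": [0.8, 0.3, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, k=2):
    # rank all documents by similarity to the query and keep the top k
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "What is FirstBatch?"
context = "\n".join(retrieve(query_vec))
print(context)  # the two FirstBatch-related snippets, not the Paris one
```

The retrieved `context` is then what gets pasted into the augmented prompt, exactly as the hand-written `source_knowledge` string is used below.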

llmchain_information = [
    "FirstBatch is a user embedding service that provides instant, modular personalization. FirstBatch generates user embeddings from user interactions. This enables you to build hyper-personalized experiences powered by LLMs. It's compatible with any AI stack, so you can upgrade your AI stack with user embeddings to transform your vector database into the world's most powerful personalization engine.",
    "FirstBatch enables personalized, interactive and flexible vector databases to build next-generation AI experiences.",
    "By using User Embeddings generated from user interactions, companies can hyper-personalize their vector search."
]

source_knowledge = "\n".join(llmchain_information)

I will then ask the LLM to answer my question using this information by augmenting the question prompt.

query = "What is FirstBatch?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

# send the augmented prompt to the chat model
prompt = HumanMessage(content=augmented_prompt)
messages.append(prompt)
res = chat(messages)
print(res.content)

And here we receive the answer to our question - "What is FirstBatch?"

FirstBatch is a user embedding service that provides instant, modular personalization. It generates user embeddings from user interactions, allowing for hyper-personalized experiences powered by LLMs (Language Model Models). It is compatible with all AI stacks, allowing you to upgrade your AI stack with user embeddings to transform your vector database into a powerful personalization engine. With FirstBatch, companies can create personalized, interactive, and flexible vector databases, enabling them to build next-generation AI experiences. By leveraging user embeddings generated from user interactions, businesses can achieve hyper-personalization in their vector search capabilities.

What we have seen from these examples is that Retrieval-Augmented Generation (RAG) is an essential piece in our puzzle of creating intelligent, responsive AI systems. However, to fully realize a vision of personalized consumer interaction, we must introduce another critical component: user data.

When it comes to consumers, being impressive means making AI intimate, and that can be achieved with real-time data. LLMs need to understand each user individually through data about them. Currently, users can introduce themselves to AI agents via prompts and then receive personalized responses. However, it is impossible to force every user to introduce themselves and describe their preferences, just as it has always been impossible to force them to fill in all their preferences. Users often don't even want to create accounts before using applications. So, how can we let LLMs understand each user if we can't feed personal data to them the way we did with our companies' data? The answer is User Embeddings.

User Embeddings

User Embeddings allow companies to generate a unique set of embeddings from each user interaction. These embeddings serve as a dynamic map of user preferences and behaviors, enabling LLMs to navigate the complexities of individual consumer needs with unprecedented accuracy. When we layer User Embeddings on top of RAG, we're not just responding to queries; we're anticipating and shaping the user journey in real time. This transformation from a one-size-fits-all approach to a tailored interaction model marks a significant leap forward in AI-driven consumer engagement. No latency, no training period, just hyper-personalization.

This is very similar to the widely adopted RAG examples we examined earlier in this article, but this time, rather than serving external knowledge of company data, we serve user preferences. When feeding LLMs with company data, you can simply write a prompt containing the information you need; this prompt is then embedded, stored in a vector database, and those vectors are used by the LLM while generating an answer.

On the other hand, User Embeddings SDK utilizes any vector database to generate a unique set of embeddings for each user regarding their interactions. These interactions might be a prompt, a search query, a like, a purchase, or anything you can imagine. Then these embeddings will be used by LLMs to generate an answer by considering users’ preferences. Also, you can simply use these embeddings to make personalized recommendations or return personalized search results for each user.
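The mechanics behind such an SDK can be pictured as maintaining a running, decayed blend of the vectors of items a user interacts with. The sketch below is a hypothetical illustration of that idea, not the actual FirstBatch API; the signal weights and decay factor are assumptions made for the example.

```python
# hypothetical sketch: keep a user embedding as a decayed, weighted
# blend of the vectors of items the user interacted with
SIGNAL_WEIGHTS = {"view": 0.2, "like": 0.6, "purchase": 1.0}  # assumed weights
DECAY = 0.9  # how strongly past interactions are discounted (assumed)

def update_user_embedding(user_vec, item_vec, signal):
    w = SIGNAL_WEIGHTS[signal]
    # shrink the old embedding, then pull it toward the item vector
    return [DECAY * u + w * i for u, i in zip(user_vec, item_vec)]

user = [0.0, 0.0, 0.0]  # fresh user, no interactions yet
user = update_user_embedding(user, [1.0, 0.0, 0.0], "view")
user = update_user_embedding(user, [0.0, 1.0, 0.0], "purchase")
print(user)  # leans toward the purchased item, with a faint trace of the view
```

The resulting vector can be queried against the same vector database that holds your items, which is what makes personalized recommendations and personalized search fall out of the same machinery.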

Whether it's the real-time assistance provided by a chatbot powered by Cohere's API or the nuanced understanding of a user's needs through OpenAI's assistant, these instances reflect the practical ways in which enterprises can implement AI to serve their customers better. By integrating User Embeddings into these interactions, you not only tailor the conversation but also anticipate needs and preferences, thus providing a truly personalized service.

That is why there is a great opportunity for companies that want to make a difference in user experience. The Internet democratized information by letting people easily reach any piece of it; LLMs will democratize artificial intelligence further by enabling people to build with natural language; and User Embeddings are here to democratize hyper-personalized experiences for all without compromising privacy. Furthermore, the privacy-first approach of User Embeddings ensures that this new era of personalization is built on a foundation of trust and respect for user data. As we leverage these technologies, we're not only enhancing our capabilities but also upholding the values that define responsible innovation in this AI era. As LLMs reshape the user experience, it is time for companies to reshape how they leverage personalization.

Here are some use cases of User Embeddings:

  • Bridging Platform & Chat Experience: Bridge your interface with your LLM-powered chat. Even before users open a chat, you can feed the agent with the user’s interactions in a form that the agent can understand. Therefore, as soon as users initialize the chat they will receive personalized answers regarding their needs and preferences. Experience an example from our demo where you can interact with a news feed and receive a hyper-personalized chat experience based on your interactions with the feed.

  • LLM-Powered Personalization Algorithms: Build LLM-powered personalization algorithms. From the very first interaction of a user, even without a login, you can understand the user’s preferences and start shaping the whole experience by navigating the user within your vector database. Enjoy our mini TikTok example built with Pinecone, Streamlit, and FirstBatch. Find out how you can build your own hyper-personalized application with a few lines of code by utilizing LLM-powered Personalization Algorithms in our cookbook.

  • Personalized RAG Funnels: Make your RAG applications smarter. You can decide which information to retrieve from your external knowledge base without requiring a prompt from the user. Just by utilizing user interactions, you can understand which information the user has been seeking and retrieve it from your knowledge base. (Stay tuned for the demo!)

  • Fraud Detection Without Setting Rules: Detect fraudsters without building sophisticated models. By measuring the similarity between previous fraudsters’ embeddings and a new user’s embedding, you can assess whether a user is a potential fraudster from their initial interactions. (Stay tuned for the demo!)

and many more revolutionary use cases can be built in production with ease.
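The fraud-detection idea above boils down to comparing a new user's embedding against the embeddings of known fraudsters. Here is a minimal sketch of that comparison, assuming toy vectors and an illustrative similarity threshold; in practice the threshold would be tuned on real data.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# embeddings of users previously flagged as fraudsters (toy values)
known_fraudsters = [[0.9, 0.1, 0.4], [0.8, 0.2, 0.5]]
THRESHOLD = 0.95  # illustrative; would be tuned on real data

def looks_fraudulent(user_vec):
    # flag the user if they closely resemble any known fraudster
    return any(cosine(user_vec, f) >= THRESHOLD for f in known_fraudsters)

print(looks_fraudulent([0.85, 0.15, 0.45]))  # similar to known fraudsters
print(looks_fraudulent([0.0, 1.0, 0.1]))     # dissimilar
```

Because the check runs on embeddings that are updated from the very first interactions, no rule set or dedicated fraud model has to be maintained.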

Explore all our demos, or discover the capabilities of User Embeddings: userembeddings