
Your personal AI LLM based agent

February 28th, 2024

Trevor Noah made an interesting comment in one of his recent podcasts:

You know, the new workforce becomes one where your GPT is almost your resume. Your GPT is almost more valuable than you are in a strange way. Do you know what I mean? It’s like a combination of everything you think and everything you’ve thought and the way you synthesize ideas combined with your own personal GPT becomes. And I mean, this is me just like thinking of a crazy future where you go, you literally get to a job and they go, what’s your GPT? And you say, well, here’s mine.

I found this comment intriguing, so I figured it was time to go a bit deeper into running AI models locally for privacy reasons; and the best way to learn is to build it yourself. So I thought: can I grab an AI model, feed it my CV (so it gets to know me), and then ask it a few questions about myself?

Preamble: the following blog post was created using exactly the same methodology as described within it (only instead of a CV, it was given a rough draft of this post as the RAG source). See the notes at the bottom of the post for more information.


User: Can you create a blog post for me with the title: “Building your personal AI LLM based agent.”

Sure, I can help you draft a blog post titled “Building your personal AI LLM based agent”. Here’s a suggested outline for the blog post:

  1. Introduction: Discuss the potential future of personalized AI agents and the idea of a “personal GPT” as mentioned in the Trevor Noah and Sam Altman podcast.
  2. Background: Briefly explain what an LLM (Large Language Model) is and its relevance to building a personalized AI agent.
  3. Methodology: Introduce the concept of using RAG (Retrieval-Augmented Generation) and the llama_index package for building a local and relatively private personalized AI assistant.
  4. Building the chat flow: Walk through the process of setting up the LLM, embedding model, and indexer, and creating a simple for loop chat function.
  5. Personalizing the assistant: Demonstrate how to add custom prompts and chat history to make the assistant more effective and personalized to the user.
  6. Conclusion: Recap the benefits and potential future applications of a personalized AI assistant and encourage readers to explore the […]

User: Can you write the introduction for me

In the “What Now?” podcast, Trevor Noah and Sam Altman discussed the potential future of personalized AI agents, where these models would get to know individuals, access their personal data, and work effectively in their context. This concept, as Trevor suggested, could lead to a future workforce where an individual’s GPT (Generative Pre-trained Transformer) becomes their resume or avatar, shaping how others interact with them. To build such a personalized AI agent, we can explore using a local, reasonably private approach, as opposed to relying on big companies for fine-tuning or accessing vast resources. One way to start is by using llama_index, a Python package, to build a simple chat flow for our local AI assistant. This assistant would be able to answer questions related to our expertise and knowledge, providing a more personalized and helpful interaction. The benefits of having a personalized AI agent include privacy, customization, and the ability to work effectively within our unique context. As we continue to explore this project, we’ll dive deeper into the methodology and code required to build and run our very own “personal AI LLM based agent.”

User: Can you write the background section for me

An LLM, or Large Language Model, is a type of artificial intelligence model that understands and generates human language. In the context of building a personalized AI agent, an LLM can serve as the foundation for understanding and responding to user queries in a more effective and personalized manner. Rather than fine-tuning the model itself, we can augment it with an individual’s data or knowledge base, using local models like LlamaCPP together with a vector index over the personal documents (built with libraries such as Faiss or Annoy, or llama_index’s default store). By doing so, the AI agent becomes an intelligent assistant tailored to the user’s specific needs and preferences. This local approach offers several advantages over relying on large cloud-based models, including privacy, customization, and improved performance within the user’s unique context. For instance, you can create a local AI agent using the Mistral 7B model with LlamaCPP and the UAE-Large-V1 embedding model. First, download the Mistral 7B GGUF model for LlamaCPP manually from the Hugging Face model hub and provide it to the helper routine. The UAE-Large-V1 embedding model is downloaded automatically if not present/cached locally. Now that we have a model, we can start personalizing the AI agent.
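
As a small sketch of that first step: the GGUF file can be fetched with the huggingface_hub package (the exact repository and file name here are my assumptions, based on TheBloke’s quantized builds):

from huggingface_hub import hf_hub_download

# Download a quantized Mistral 7B Instruct build (repo/file names are assumptions);
# returns the local path that is later handed to the LlamaCPP helper.
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)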

User: Can you write the methodology section for me

Install the required packages for llama_index and Hugging Face Transformers using pip:

pip install llama-index transformers
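
Note that LlamaCPP wraps llama-cpp-python, and newer llama_index releases (≥ 0.10) split the integrations into separate packages; depending on your version you may therefore also need:

pip install llama-cpp-python llama-index-llms-llama-cpp llama-index-embeddings-huggingface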

Obtain a language model (LLM) and an embedding model by downloading a pre-trained model from the Hugging Face Model Hub or from your local storage. For example, you can use the Mistral-7B model for this purpose. Save the model files in a specific directory.

Create a helper function, get_models(), to load the LLM and the embedding model (the imports shown assume the modular llama_index ≥ 0.10 package layout):

import transformers
from llama_index import core
from llama_index.core import utils
from llama_index.embeddings import huggingface
from llama_index.llms.llama_cpp import base


def get_models(path_to_gguf):
    # Local LLM served through llama.cpp; the GGUF file was downloaded manually.
    llm = base.LlamaCPP(
        model_path=path_to_gguf,
        context_window=2048,
        max_new_tokens=256,
        verbose=False
    )

    # Local embedding model; downloaded and cached on first use.
    embedding_model = huggingface.HuggingFaceEmbedding(
        model_name="WhereIsAI/UAE-Large-V1",
        cache_folder="hugging_cache")

    # Register both as the global defaults for llama_index.
    core.Settings.llm = llm
    core.Settings.embed_model = embedding_model

    # Token counting via a matching Hugging Face tokenizer
    # (Mixtral shares Mistral's tokenizer).
    utils.set_global_tokenizer(
        transformers.AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1").encode
    )

    return llm, embedding_model
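
Assigning the models to core.Settings makes them the process-wide defaults, so later calls like index.as_query_engine() pick them up without any explicit wiring.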

Initialize the index using llama_index by providing a storage directory (for persistence) and a data directory (holding e.g. the CV); a query engine will be created from it later:

import os

from llama_index import core
from llama_index.core import indices, readers
from llama_index.core.indices import vector_store


def get_index(path_to_storage, path_to_data):
    if not os.path.exists(path_to_storage):
        # first run: load the documents and create the index
        documents = readers.SimpleDirectoryReader(path_to_data).load_data()
        index = vector_store.VectorStoreIndex.from_documents(documents, show_progress=True)
        # store it for later
        index.storage_context.persist(persist_dir=path_to_storage)
    else:
        # subsequent runs: load the existing index from disk
        storage_context = core.StorageContext.from_defaults(persist_dir=path_to_storage)
        index = indices.load_index_from_storage(storage_context)
    return index
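
Once the index exists you can already query it directly, without any chat engine (a minimal sketch, reusing the helpers and directories from above):

llm, embed_model = get_models("<path to model>/mistral-7b-instruct-v0.2.Q4_K_M.gguf")
index = get_index("./storage", "./data")
# One-shot question against the indexed documents.
print(index.as_query_engine().query("Does Thijs know what an SLO is?"))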

Initialize the chat engine using the query engine, the LLM, a custom prompt, and a chat history (custom_prompt and custom_chat_history are defined in the next section). Define a maximum number of chat iterations (MAX_CHAT). Then create a simple chat loop called chat():

from llama_index.core import chat_engine

MAX_CHAT = 20  # cap on question/answer rounds (the value is an assumption)


def chat():
    llm, embed_model = get_models("<path to model>/mistral-7b-instruct-v0.2.Q4_K_M.gguf")
    index = get_index("./storage", "./data")

    query_engine = index.as_query_engine()
    ce = chat_engine.CondenseQuestionChatEngine.from_defaults(
        llm=llm,
        query_engine=query_engine,
        condense_question_prompt=custom_prompt,
        chat_history=custom_chat_history,
    )
    for _ in range(MAX_CHAT):
        q = input('\nUser: ')
        if q == 'exit':
            break
        # Wrap the question in Mistral's instruction format and stream the reply.
        streaming_response = ce.stream_chat(f"<s>[INST]{q}[/INST]")
        for token in streaming_response.response_gen:
            print(token, end="")
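
To try it out, a minimal entry point suffices:

if __name__ == "__main__":
    chat()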

User: Can you write the section on “Personalizing the assistant” for me

Define a custom prompt template and seed the chat history, then pass both to the chat engine (as done in chat() above). This will help personalize your local AI agent using LlamaIndex:

custom_prompt = core.PromptTemplate(
    """\
<s><INST>
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.
<Chat History>
{chat_history}
<Follow Up Message>
{question}
<Standalone question>
</INST></s>
"""
)
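
The CondenseQuestionChatEngine uses this template to rewrite each follow-up message into a standalone question, based on the chat history, before handing it to the query engine – that way the conversation stays coherent while every lookup still runs against the indexed documents.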

from llama_index.core.base.llms import types

# Seed history that frames the assistant's role and tone.
custom_chat_history = [
    types.ChatMessage(
        role=types.MessageRole.USER,
        content="Hello assistant, we are having an insightful discussion about Thijs Metsch today. Answer questions in a"
                " positive, helpful and empathetic way.",
    ),
    types.ChatMessage(role=types.MessageRole.ASSISTANT, content="Okay, sounds good."),
]
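
Seeding the chat history like this is a cheap but effective trick: the model enters the conversation already “knowing” whom the discussion is about and which tone to strike, without any fine-tuning.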

User: Finally, write the conclusions.

In conclusion, the future of the workforce might involve a significant shift towards the integration of personalized AI language-model-based agents, as suggested by Trevor Noah and Sam Altman during their podcast conversation. These agents, which could be considered an extension or avatar of an individual, have the potential to revolutionize the way we interact and work by providing context-specific responses and solutions. The process of building such an agent can be initiated using tools like llama_index, which allows for the creation of a simple yet effective chatbot. By combining the user’s expertise and the capabilities of the AI model, a more efficient and personalized assistant can be developed, offering a unique blend of human intelligence and machine learning. As the field of AI continues to evolve, it’s essential to consider the potential benefits and challenges, and to approach the technology with a thoughtful and informed perspective, as outlined in various viewpoints. The code for this blog post, which demonstrates the implementation of a local and relatively private AI language model-based agent, can be found here.


So did it work? It did! Once I had indexed my CV, I could ask the model – which previously knew nothing about me – something like:

User: Does Thijs know what an SLO is?
Thijs Metsch has expertise in defining APIs for SLO management […]

Postamble: AI is here to stay, I think; in what form & detail we’ll see. LLMs are not the solution to everything in the end. I would encourage you to take different viewpoints into consideration, like this one on “What kind of bubble is AI?“. There is more stuff to come; personally, I would love to use Rust instead of Python for this kind of work. I’m sure that will come soon.

Some notes on the blog post – in particular, on what I had to tweak:

BTW, it has been years, but I finally found a reason to add more RAM to my system. Up to now the amount I had was plenty – oh, how times change…