IV. Managing Agents' History


As you saw in the previous examples, each agent has its own history of messages that makes up its memory. When a new request is made to the LLM, the whole history is sent to the inference server (i.e., Ollama). The LLM responds to the last prompt in the chain but bases its answer on the context it gets from the previous messages (and the initial system prompt, if present).

This is what a history looks like:

[Figure: an agent's message history showing the system, user, and assistant messages]

There are 3 types of messages:

  • 1: The optional "System" prompt that, if present, always goes first.
Then it's only an alternation between these two:
  • 2: The "User" prompts coming from the Task you set.
  • 3: The "Assistant" messages, which are the answers from the LLM.

However, sending the whole history to the LLM for each Task to solve has some disadvantages that cannot be avoided:

  • The longer the history, the longer the LLM takes to analyze it and return an answer.
  • Each LLM comes with a maximum token window size (its context window). This is the maximum number of tokens an LLM can analyze in one run, and therefore its maximum memory.
  • One token roughly represents 3/4 of an English word, so a 4,096-token window holds roughly 3,000 words.

To counteract these negative effects, it is recommended to clean the history whenever possible. You can use the forget=True parameter of the Task() class so that neither the prompt nor the LLM's response gets saved to the history. You'll see later that there are other ways to keep the history free of useless noise.
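Here is a minimal sketch of forget=True in action (the throwaway question is illustrative):


from yacana import Agent, Task

agent1 = Agent("Cook", "llama3.1:8b", system_prompt="You are a pastry chef")

# Thanks to `forget=True`, neither this prompt nor the LLM's answer is saved to the history
Task("Give me one fun fact about croissants.", agent1, forget=True).solve()

# The exchange above left no trace in the agent's history
agent1.history.pretty_print()


But first, let's look at one Agent's history. Fortunately, Yacana has you covered.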

Printing History

The Agent class comes with a .history property of type History. It exposes methods so you can manipulate and view it (mostly for debugging) at runtime. For instance, the .pretty_print() method prints the content of the history to the standard output using the classic color scheme. It's great for debugging but not for parsing. If you need to parse the history, there is a .get_as_dict() method which, as the name implies, returns the History as a Python dictionary.

Let's see a simple example:


from yacana import LoggerManager, Agent, Task

# Let's deactivate automatic logging so that only OUR prints are shown
LoggerManager.set_log_level(None)

agent1 = Agent("Cook", "llama3.1:8b", system_prompt="You are a pastry chef")

Task("Generate 5 pastry names followed by the associated estimated calorie.", agent1).solve()
Task("Rank the pastries from the lowest calorie count to the largest.", agent1).solve()

print("############## Agent 1 history pretty print ##################")
agent1.history.pretty_print()
print("############## END ##################")

print("")

print("############## Agent 1 history dictionary ##################")
print(str(agent1.history.get_as_dict()))
print("############## END ##################")

############## Agent 1 history pretty print ##################

[user]:
Generate 5 pastry names followed by the associated estimated calorie.

[assistant]:
Here are 5 pastry names with their associated estimated calorie counts:

1. **Cinnamon Swirl Brioche** (250-300 calories) - A sweet, buttery brioche filled with a gooey cinnamon swirl.
2. **Lemon Lavender Mille-Feuille** (400-450 calories) - Layers of flaky pastry, lemon curd, and lavender cream create a bright and airy dessert.
3. **Chocolate Soufflé Cake** (500-550 calories) - A rich, decadent chocolate cake that rises like a cloud from the oven, served with a scoop of vanilla ice cream.
4. **Raspberry Almond Croissant** (200-250 calories) - Flaky, buttery croissants filled with sweet and tart raspberry jam and topped with sliced almonds.
5. **Pistachio Rosewater Macarons** (150-200 calories) - Delicate, chewy macarons flavored with pistachio and rosewater, sandwiched together with a light and creamy filling.

Note: The estimated calorie counts are approximate and may vary based on specific ingredients and portion sizes used.

[user]:
Rank the pastries from the lowest calorie count to the largest.

[assistant]:
Based on the estimated calorie counts I provided earlier, here are the pastries ranked from lowest to highest:

1. **Pistachio Rosewater Macarons** (150-200 calories)
2. **Raspberry Almond Croissant** (200-250 calories)
3. **Cinnamon Swirl Brioche** (250-300 calories)
4. **Lemon Lavender Mille-Feuille** (400-450 calories)
5. **Chocolate Soufflé Cake** (500-550 calories)

Let me know if you have any other questions!

############## END ##################

############## Agent 1 history dictionary ##################
[{'role': 'system', 'content': 'You are a pastry chef'}, {'role': 'user', 'content': 'Generate 5 pastry names followed by the associated estimated calorie.'}, {'role': 'assistant', 'content': 'Here are 5 pastry names with their associated estimated calorie counts:\n\n1. **Cinnamon Swirl Brioche** (250-300 calories) - A sweet, buttery brioche filled with a gooey cinnamon swirl.\n2. **Lemon Lavender Mille-Feuille** (400-450 calories) - Layers of flaky pastry, lemon curd, and lavender cream create a bright and airy dessert.\n3. **Chocolate Soufflé Cake** (500-550 calories) - A rich, decadent chocolate cake that rises like a cloud from the oven, served with a scoop of vanilla ice cream.\n4. **Raspberry Almond Croissant** (200-250 calories) - Flaky, buttery croissants filled with sweet and tart raspberry jam and topped with sliced almonds.\n5. **Pistachio Rosewater Macarons** (150-200 calories) - Delicate, chewy macarons flavored with pistachio and rosewater, sandwiched together with a light and creamy filling.\n\nNote: The estimated calorie counts are approximate and may vary based on specific ingredients and portion sizes used.'}, {'role': 'user', 'content': 'Rank the pastries from the lowest calorie count to the largest.'}, {'role': 'assistant', 'content': 'Based on the estimated calorie counts I provided earlier, here are the pastries ranked from lowest to highest:\n\n1. **Pistachio Rosewater Macarons** (150-200 calories)\n2. **Raspberry Almond Croissant** (200-250 calories)\n3. **Cinnamon Swirl Brioche** (250-300 calories)\n4. **Lemon Lavender Mille-Feuille** (400-450 calories)\n5. **Chocolate Soufflé Cake** (500-550 calories)\n\nLet me know if you have any other questions!'}]
############## END ##################

Output speaks for itself.

Creating and loading checkpoints

As mentioned earlier, it's better to keep the History clean. Too many prompts and unrelated questions will lead to poorer results, so if you have the opportunity to discard a portion of it, you should.
Yacana allows you to take history snapshots and roll back to any of them. This is particularly useful when reaching the end of a flow branch and wanting to go back to another one.

[Figure: creating a checkpoint and rolling the history back to it]

It is as simple as this:


# Creating a checkpoint
checkpoint_id: str = agent1.history.create_check_point()

The checkpoint_id is simply a unique identifier that you can use to load back a saved state. Like this:


# Go back in time to when the checkpoint was created
agent1.history.load_check_point(checkpoint_id)

Note that nothing prevents you from making a snapshot before rolling back to a previous save. This way you could go back… to the future. ^^
Are you okay, Marty?
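A minimal sketch, using only the two checkpoint methods shown above:


# Snapshot the present before rolling back, so we can return to it later
future_id: str = agent1.history.create_check_point()

# Back to the past...
agent1.history.load_check_point(checkpoint_id)

# ...and back to the future!
agent1.history.load_check_point(future_id)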

Let's take a concrete example. You have a pastry website that generates pastry recipes.
The flow will look like this:

  1. Propose 5 pastry names ;
  2. Create a checkpoint ;
  3. The user chooses one of the pastries ;
  4. We show the associated calories of the selected pastry ;
  5. If the user is okay with it we end the program ;
  6. If the user is not okay with the calorie count we go back to the checkpoint and propose to choose from the list again ;
  7. Repeat until satisfied ;
  8. We'll show the final agent's History and make sure that it ONLY stored the selected pastry ;

With a bit of color, it would look like this:

[Figure: the pastry-ordering flow, with the checkpoint rollback branch]


from yacana import LoggerManager, Agent, Task

# Let's deactivate automatic logging so that only OUR prints are shown; set it back to "info" if you want to see what's happening behind the scenes.
LoggerManager.set_log_level(None)

agent1 = Agent("Cook", "llama3.1:8b", system_prompt="You are a pastry chef")

# Getting a list of pastries
pastries: str = Task("Generate 5 pastry names displayed as a list. ONLY output the names and nothing else.", agent1).solve().content
print(f"Welcome, you may order one of the following pastries\n{pastries}")

# Looping until the user is satisfied
while True:
    print("")

    # Creating our checkpoint to go back in time
    checkpoint_id: str = agent1.history.create_check_point()

    # Asking for one of the pastries from the list
    user_choice: str = input("Please choose one of the above pastries: ")

    # Printing associated calories for the selected pastry
    pastry_calorie_question: str = Task(f"The user said '{user_choice}'. Your task is to output a specific sentence and replace the <replace> tags with the correct values: 'You selected the <replace>selected pastry</replace>. The average calorie intake for this pastry is <replace>average associated calories for the selected pastry</replace>. Do you wish to continue ?'", agent1).solve().content
    print(pastry_calorie_question)

    # Asking if the user wants to continue
    is_satisfied: str = input("Continue ? ")

    # Basic yes / no router
    router_answer: str = Task(f"The user said '{is_satisfied}'. Evaluate if the user was okay with the order. If he was, ONLY output 'yes', if not only output 'no'.", agent1).solve().content

    if "yes" in router_answer.lower():
        print("Thank you for your order.")
        # The user was satisfied with his choice. Exiting the loop...
        break
    else:
        # The user wants to choose another pastry. Let's go back in time by loading our previous checkpoint!
        agent1.history.load_check_point(checkpoint_id)
        # Let's go back to the top of the loop
        continue

print("############## Agent 1 history pretty print ##################")
agent1.history.pretty_print()
print("############## END ##################")

▶️ Output:


Welcome, you may order one of the following pastries
1. Whipped Wonders
2. Creamy Confections
3. Flaky Fancies
4. Golden Galettes
5. Sugar Serenades

Please choose one of the above pastries: The Creamy one looks good
You selected the Creamy Confections. The average calorie intake for this pastry is 350-400 calories per serving. Do you wish to continue?
Continue ? no

Please choose one of the above pastries: Hummm. The golden one?
You selected the Golden Galettes. The average calorie intake for this pastry is approximately 250-300 calories per serving. Do you wish to continue?
Continue ? yes
Thank you for your order.

############## Agent 1 history pretty print ##################

[user]:
Generate 5 pastry names displayed as a list. ONLY output the names and nothing else.

[assistant]:
1. Whipped Wonders
2. Creamy Confections
3. Flaky Fancies
4. Golden Galettes
5. Sugar Serenades

[user]:
The user said 'Hummm. The golden one?'. Your task is to output a specific sentence and replace the <replace> tags with the correct values: 'You selected the <replace>selected pastry</replace>. The average calorie intake for this pastry is <replace>average associated calories for the selected pastry</replace>. Do you wish to continue ?'

[assistant]:
You selected the Golden Galettes. The average calorie intake for this pastry is approximately 250-300 calories per serving. Do you wish to continue?

[user]:
The user said 'yes'. Evaluate if the user was okay with the order. If he was, ONLY output 'yes', if not only output 'no'.

[assistant]:
yes

############## END ##################

As you can see in the above output, we went for "the creamy one" but, when shown the calories, refused to continue… After that, we chose the "Golden Galettes", which was satisfying. The program then ended with an output of the agent's history.
We can see in that history that the agent only remembered us choosing the "Golden Galettes" and not the "Creamy Confections". This is because we loaded the last checkpoint, which rolled us back to making our choice again.

Note that the Task replacing the variables might not work very well with dumb LLMs. It could be reworked by splitting it in two: one Task that extracts the name of the chosen pastry from the user's input, and a second one that generates the associated calories. Finally, print the sentence with the variables pre-generated. Using local models is all about knowing the maximum performance of your LLM and adapting the prompts to match that performance. The dumber, the more guidance it needs!
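One possible rework is sketched below. It would replace the single pastry_calorie_question Task inside the while loop; the exact prompts are illustrative:


    # Step 1: extract the name of the chosen pastry from the user's input
    pastry_name: str = Task(f"The user said '{user_choice}'. ONLY output the name of the pastry he chose from the list and nothing else.", agent1).solve().content

    # Step 2: generate the estimated calories for that pastry
    calories: str = Task(f"ONLY output the average calorie intake for '{pastry_name}' and nothing else.", agent1).solve().content

    # Step 3: print the final sentence with the variables pre-generated in Python
    print(f"You selected the {pastry_name}. The average calorie intake for this pastry is {calories}. Do you wish to continue?")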


Zero-shot vs multi-shot prompting

When an LLM struggles to solve a complex task and achieve a good success rate, it may be time to give it a little help.

In large language models, the approach to prompting can significantly influence the model's performance.

  • Zero-shot prompting asks the model to complete a task without any prior examples, relying solely on its pre-existing knowledge. This can lead to varied results, especially in more complex tasks.
  • One-shot prompting improves accuracy by providing the model with a single example, offering some guidance on how to approach the task.
  • Few-shot prompting further enhances performance by supplying multiple examples, allowing the model to have a better understanding of the task's nuances and producing more reliable and accurate results.

Yacana provides a way to add new Messages to the History manually. The History class exposes an .add(...) method.
It takes an argument of type Message() built from two parameters: a MessageRole enum and the string message itself.

For example:


from yacana import Agent, Message, MessageRole

# Creating a basic agent with an empty history
agent1 = Agent("AI assistant", "llama3.1:8b")

# We create a fake prompt identified as coming from the user (Thx to `MessageRole.USER`)
user_message = Message(MessageRole.USER, "What's 2+2 ?")

# We create a fake answer identified as coming from the LLM (Thx to `MessageRole.ASSISTANT`)
fake_ai_response = Message(MessageRole.ASSISTANT, "The answer is 4")

# Let's add these two Messages to the Agent's History
agent1.history.add(user_message)
agent1.history.add(fake_ai_response)

# Print the content of the history
agent1.history.pretty_print()

Outputs:


[user]:
What's 2+2 ?

[assistant]:
The answer is 4

The Agent's History successfully contains the two messages we manually added.

The .add() method can only append messages to the end of the History.

⚠️ Try to keep the alternation of USER and ASSISTANT messages, as this is how "instruct" LLMs have been trained.


Let's see a zero-shot example asking for JSON output extracted from a given sentence:


from yacana import Agent, Task

agent1 = Agent("Ai assistant", "llama3.1:8b")

Task("Print the following sentence as JSON extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'", agent1).solve()

Outputs:


INFO: [PROMPT]: Print the following sentence as JSON extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'

INFO: [AI_RESPONSE]: Here is the sentence rewritten in JSON format:
{
	"people": [
		{
			"name": "Marie",
			"action": "walking"
		},
		{
			"name": "Ryan",
			"action": "watching through the window"
		}
	],
	"weather": {
		"condition": "heavy raindrops",
		"sky": "dark sky"
	}
}
Let me know if you'd like me to help with anything else!

Not bad, but there's noise. We would like the JSON and nothing else. No bedside manners. The "Let me know if you'd like me to help with anything else!" must go.
Let's introduce another optional Task() parameter: json_output=True. This relies on Ollama to force the output as JSON.
⚠️ It is preferable to also prompt the LLM to output JSON, in addition to setting this option.

Replace the Task with this one:


Task("Print the following sentence as JSON extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'", agent1, json_output=True).solve()

Outputs:


INFO: [PROMPT]: Print the following sentence as JSON extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'

INFO: [AI_RESPONSE]: {"names": ["Marie", "Ryan"], "actions": {"Marie": "is walking", "Ryan": "is watching"}, "description": [{"location": "window", "activity": "watching"}, {"location": "outdoors", "activity": "pouring raindrops"}]}

Way better. No more noise.
However, we would prefer an array of name/action pairs, even for the weather (the name would be "sky" and the action "raining").

To achieve this, let's give the LLM an example of what we expect by making it believe it has already produced the correct output once:


from yacana import Agent, Task, MessageRole, Message

agent1 = Agent("Ai assistant", "llama3.1:8b")

# Making a fake valid interaction
agent1.history.add(Message(MessageRole.USER, "Print the following sentence as json extracting the names and rephrasing the actions: 'John is reading a book on the porch while the cold wind blows through the trees.'"))
agent1.history.add(Message(MessageRole.ASSISTANT, '[{"name": "John", "action": "Reading a book."}, {"name": "Cold wind", "action": "Blowing through the trees."}]'))

Task("Print the following sentence as json extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'", agent1).solve()

Outputs:


INFO: [PROMPT]: Print the following sentence as JSON extracting the names and rephrasing the actions: 'Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.'

INFO: [AI_RESPONSE]: [{"name": "Marie", "action": "Walking her dog."}, {"name": "Ryan", "action": "Watching Marie and her dog through the window."}, {"name": "The dark sky", "action": "Pouring down heavy raindrops."}]

This is perfect!
(❕ Model temperature may impact performance here. Consider using a low value.)
You can add multiple fake interactions like this one to cover more advanced use cases and train the LLM on how to react when they happen. This would become multi-shot prompting.
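For instance, a second fake interaction could teach the model how to handle a sentence with no people in it (the content below is illustrative):


# A second fake exchange covering an edge case: no people in the sentence
agent1.history.add(Message(MessageRole.USER, "Print the following sentence as json extracting the names and rephrasing the actions: 'The rain keeps falling on the old tin roof.'"))
agent1.history.add(Message(MessageRole.ASSISTANT, '[{"name": "Rain", "action": "Falling on the old tin roof."}]'))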


You can also do multi-shot prompting with self-reflection. This takes more CPU time because you decompose the task into multiple subtasks, but it can be beneficial in some scenarios.

For example:


from yacana import Agent, Task

agent1 = Agent("Ai assistant", "llama3.1:8b")

Task('I will give you a sentence where you must extract as JSON all the names and rephrase all the actions. For example in the following sentence: "John is reading a book on the porch while the cold wind blows through the trees." would result in this JSON output: [{"name": "John", "action": "Reading a book."}, {"name": "Cold wind", "action": "Blowing through the trees."}] ', agent1).solve()

Task("Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.", agent1, json_output=True).solve()

Outputs:


INFO: [PROMPT]: I will give you a sentence where you must extract as JSON all the names and rephrase all the actions. For example in the following sentence: "John is reading a book on the porch while the cold wind blows through the trees." would result in this JSON output: [{"name": "John", "action": "Reading a book."}, {"name": "Cold wind", "action": "Blowing through the trees."}]

INFO: [AI_RESPONSE]: I'm ready to extract the names and rephrase the actions. What's the sentence?

INFO: [PROMPT]: Marie is walking her dog. Ryan is watching them through the window. The dark sky is pouring down heavy raindrops.

INFO: [AI_RESPONSE]: {"name": "Marie", "action": "Walking with her dog."}

:-(
In this case, it didn't work very well: only one name was extracted as JSON. But in more complex scenarios, we can assure you that letting the LLM reflect on the guidelines beforehand can be very beneficial to solving the task.


Saving Agent state

Maybe your program needs to start, stop, and resume where it stopped. For this use case, Yacana provides a way to store an Agent's state into a file and load it back later. All of the Agent's properties are saved, including the History. Only checkpoints are lost, as they are more of a runtime thing. We might include them in the save file one day if the need arises.

To save an Agent do the following:


from yacana import Agent, Task

agent1 = Agent("Ai assistant", "llama3.1:8b")

Task("What's 2+2 ?", agent1).solve()

# Exporting the agent1 current state to a file called agent1_save.json
agent1.export_state("./agent1_save.json")

If you look at the file agent1_save.json, you'll see something like this:


{
    "name": "Ai assistant",
    "model_name": "llama3.1:8b",
    "system_prompt": null,
    "model_settings": {},
    "endpoint": "http://127.0.0.1:11434",
    "history": [
        {
            "role": "user",
            "content": "What's 2+2 ?"
        },
        {
            "role": "assistant",
            "content": "The answer to 2+2 is... (drumroll please)... 4!"
        }
    ]
}

Now let's load back this agent from the dead using .get_agent_from_state()!
In another Python file, add this code snippet:


from yacana import Agent, Task

agent2: Agent = Agent.get_agent_from_state("./agent1_save.json")

Task("Multiply by 2 the previous result", agent2).solve()

The .get_agent_from_state(...) method works like a factory, returning a new instance of an Agent.

▶️ Output:


INFO: [PROMPT]: Multiply by 2 the previous result

INFO: [AI_RESPONSE]: If we multiply 4 by 2, we get...

8!

As you can see, when asked to multiply the previous result by 2, it remembered agent1's result, which was 4. It then computed 4 x 2 and got us 8.
