V. Agents' features

Structured output

Simple JSON Mode

The simplest way to get JSON output is to use the json_output=True parameter on a task:


message = Task("Tell me 1 fact about Canada using the format {'countryName': '', 'fact': ''}", agent, json_output=True).solve()
                

However, this approach is "best effort". This means the agent will do its best to generate valid JSON, but there's no guarantee of syntactic validity since no grammar is enforced.
Also, always ask for JSON output in the prompt itself, or the LLM will have trouble generating anything usable.
Optionally, you can include a structure in the prompt for the LLM to follow, which makes the output easier to parse.
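
Since no grammar is enforced, you typically have to parse the returned string yourself. Below is a minimal sketch using Python's json module (the prompt and keys match the example above; the error handling is only illustrative):


import json

from yacana import Task, OllamaAgent

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

# The prompt itself describes the expected keys, since json_output is only "best effort"
message = Task("Tell me 1 fact about Canada using the format {'countryName': '', 'fact': ''}", agent, json_output=True).solve()

# Parsing may still fail if the model produced invalid JSON
try:
    data = json.loads(message.content)
    print(data["countryName"], "->", data["fact"])
except (json.JSONDecodeError, KeyError) as error:
    print("Could not parse the model output:", error)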


Structured Output with Pydantic

To get more reliable and typed JSON outputs, Yacana offers structured_output. This feature uses Pydantic to define a strict schema that the response must follow.
Let's write an example using a Pydantic class:


from pydantic import BaseModel

class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]
                

The above snippet defines the Facts class. This class has a countryFacts member, which is a list of CountryFact. Each CountryFact in turn has two members: a name (string) and an associated fact (string).
In JSON it could be represented like so:

[
    {
        "name": "France",
        "fact": "Has the eiffel tower"
    },
    {
        "name": "USA",
        "fact": "Has the manhattan bridge"
    }
]
                    

The benefit of using a class-based approach instead of raw JSON is that parsing is much cleaner.
When parsing JSON, your IDE can't help you access the correct members because it doesn't know the JSON structure, which can lead to many programming mistakes.
The Pydantic class approach, on the other hand, ensures you only access existing members, loop over items that can actually be iterated, and so on.

Now, let's ask an LLM to fill this Pydantic model:

from pydantic import BaseModel

from yacana import Task, OllamaAgent

class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

message = Task("Tell me 3 facts about Canada.", agent, structured_output=Facts).solve()
# Prints the response as a pure JSON string
print(message.content)

# Typed access to data through the structured_output object
print("Name = ", message.structured_output.countryFacts[0].name)
print("Fact = ", message.structured_output.countryFacts[0].fact)
                

The benefits of this approach are numerous:

  • Automatic schema validation of the response
  • Typed access to data through Python classes
  • Better quality of generated JSON responses
  • IDE autocompletion support

The structured_output is particularly useful when you need to process responses programmatically and want to guarantee the data structure.
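
For instance, building on the snippet above, you can iterate over the typed list directly (a short sketch; the printing format is only illustrative):

# Each item is a CountryFact instance, so the IDE can autocomplete .name and .fact
for country_fact in message.structured_output.countryFacts:
    print(f"{country_fact.name}: {country_fact.fact}")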


Streaming

Streaming allows you to get the output of an LLM token by token instead of waiting for the whole response to come back.
It's particularly useful when you want to display the response to the user in real-time or need to process the response incrementally.
To enable streaming, you can define a streaming callback that will receive the tokens as they are generated:


from yacana import Task, OllamaAgent, GenericMessage

def streaming(chunk: str):
    print(f"chunk = |{chunk}|")

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

message: GenericMessage = Task("Tell me 1 fact about France.", agent, streaming_callback=streaming).solve()
print("Full response = ", message.content)
                

Output:

INFO: [PROMPT][To: AI assistant]: Tell me 1 fact about France.
chunk = |Here|
chunk = |'s|
chunk = | one|
chunk = | fact|
chunk = |:

|
chunk = |The|
chunk = | E|
chunk = |iff|
chunk = |el|
chunk = | Tower|
chunk = | in|
chunk = | Paris|
chunk = |,|
...
Full response =  Here's one fact:
The Eiffel Tower in Paris, France was originally intended to be a temporary structure, but it has become an iconic symbol of the country and a popular tourist destination, standing at over 324 meters (1,063 feet) tall!
                

Using medias

You can give media files to Agents and make them interact with images, audio and more.
You can even mix tools and media in the same task!
To use media with Ollama you'll need to install a multi-modal model like llama3.2-vision or LLaVA.


ollama pull llama3.2-vision:11b
                    

You can also use the OpenAiAgent with 'gpt-4o-mini', as it is multi-modal by default and supports images and sound. However, every media file is transformed into tokens and will count towards your rate limit! The media is encoded to base64 before being sent.


To run the following snippets, cd into the root of the GitHub repo, create the file there and run the code.

from yacana import Task, OllamaAgent, GenericMessage

vision_agent = OllamaAgent("AI assistant", "llama3.2-vision:11b", system_prompt="You are a helpful AI assistant")

Task("Describe this image", vision_agent, medias=["./tests/assets/burger.jpg"]).solve()
                    

Outputs:

INFO: [PROMPT][To: AI assistant]: Describe this image

INFO: [AI_RESPONSE][From: AI assistant]: This black and white photo showcases a close-up view of a hamburger. The burger is centered on the image, with its bun covered in sesame seeds and two patties visible beneath. A slice of cheese is positioned between the buns, while lettuce peeks out from underneath. A small amount of ketchup or mustard is visible at the bottom of the patty.
                        
 The background is blurred, suggesting that the burger was photographed on a table or countertop. The overall mood and atmosphere of this photo are casual and informal, as if it was taken by someone enjoying their meal in a relaxed setting.
                    

This model doesn't support multiple media files in the same request, but you can use Yacana with ChatGPT to do so.
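
For instance, a sketch with the OpenAiAgent could look like the following (this assumes its constructor mirrors OllamaAgent's; how you supply your OpenAI API key and the second image path are assumptions, so adapt them to your setup):

from yacana import Task, OpenAiAgent

# Assumption: constructor arguments mirror OllamaAgent's. Check the OpenAiAgent reference
# for how to provide your OpenAI API key.
vision_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant")

# Hypothetical second image path; gpt-4o-mini can take several media files in one request
Task("Compare these two images", vision_agent, medias=["./tests/assets/burger.jpg", "./tests/assets/burger2.jpg"]).solve()
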
Now let's use tools on media! The following snippet will analyze an image and send the dominant color to a tool. The tool will return the associated hexadecimal code for the given color.

from yacana import Task, OllamaAgent, GenericMessage, Tool

# Defining a simple function to be used as a tool. It translates a common color name to its hexadecimal version
def color_name_to_hexa(color_name: str) -> str:
    color_map = {
        "red": "#FF0000",
        "green": "#00FF00",
        "blue": "#0000FF",
        "yellow": "#FFFF00",
        "black": "#000000",
        "white": "#FFFFFF"
    }
    return color_map.get(color_name.lower(), "#000000")

# Defining the tool itself, referencing the function
color_name_to_hexa_tool = Tool("ColorNameToHexa", "Converts a color name to its hexadecimal representation. For example red -> #FF0000", color_name_to_hexa)

vision_agent = OllamaAgent("AI assistant", "llama3.2-vision:11b", system_prompt="You are a helpful AI assistant")

Task("What color is the object ?", vision_agent, medias=["https://avatars.githubusercontent.com/u/12821004"], tools=[color_name_to_hexa_tool]).solve()
                    


Yacana supports HTTPS URLs in addition to local file system paths. In this example we are providing an image by its URL. Note that even when using a URL, the media will still be converted to tokens before being sent.

Output:

INFO: [PROMPT][To: AI assistant]: I give you the following tool definition that you must use to fulfill a future task: color_name_to_hexa(color_name: str) -> str - Converts a color name to its hexadecimal representation. For example red -> #FF0000. Please acknowledge the given tool.

INFO: [AI_RESPONSE][From: AI assistant]: I have taken note of the provided tool:

`color_name_to_hexa(color_name: str) -> str`

This function takes a string representing a color name as input and returns a string representing the corresponding hexadecimal color code.

I'm ready to use this tool for any future task that requires converting color names to their hexadecimal representations. What's the next step?

INFO: [PROMPT][To: AI assistant]: To use the tool you MUST extract each parameter and use it as a JSON key like this: {"arg1": "", "arg2": ""}. You must respect arguments type. For instance, the tool `getWeather(city: str, lat: int, long: int)` would be structured like this {"city": "new-york", "lat": 10, "lon": 20}. In our case, the tool call you must use must look like that: {'color_name': 'arg 0'}

INFO: [AI_RESPONSE][From: AI assistant]: I understand the requirement now. To use the `color_name_to_hexa(color_name: str) -> str` tool, I will extract each parameter and create a JSON object with the extracted parameters as keys.

In this case, since there is only one parameter, `color_name`, which is of type `str`, I will structure the call like this:

{'color_name': 'red'}

Please let me know when to proceed!

INFO: [PROMPT][To: AI assistant]: You have a task to solve. Use the tool at your disposition to solve the task by outputting as JSON the correct arguments. In return you will get an answer from the tool. The task is:
What color is the object?

INFO: [AI_RESPONSE][From: AI assistant]: { "color_name": "blue" }

INFO: [TOOL_RESPONSE][ColorNameToHexa]: #0000FF
                

The answer to the question was indeed blue, and the tool returned the hexadecimal code for blue!

Pagination