V. Agents' features

Structured output

Simple JSON Mode

The simplest way to get JSON output is to use the json_output=True parameter on a task:


message = Task("Tell me 1 fact about Canada using the format {'countryName': '', 'fact': ''}", agent, json_output=True).solve()
                

However, this approach is "best effort". This means the agent will do its best to generate valid JSON, but there is no guarantee on the syntactic quality of the generated JSON as no grammar is enforced.
Also, always ask for JSON output in the prompt, or the LLM will have trouble generating anything.
Optionally, you can describe a structure in the prompt for the LLM to follow, making the output easier to parse.
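
Since no grammar is enforced, you may want to guard the parsing step yourself. Below is a minimal sketch reusing the task above and the standard library's json module (key access assumes the LLM followed the requested format):

import json

from yacana import Task, OllamaAgent

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

message = Task("Tell me 1 fact about Canada using the format {'countryName': '', 'fact': ''}", agent, json_output=True).solve()

try:
    data = json.loads(message.content)
    # Key access assumes the LLM respected the requested format
    print(data["countryName"], "->", data["fact"])
except json.JSONDecodeError:
    # Best effort only: the output may not always be valid JSON
    print("Could not parse the response as JSON:", message.content)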


Structured Output with Pydantic

To get more reliable and typed JSON outputs, Yacana offers structured_output. This feature uses Pydantic to define a strict schema that the response must follow.
Let's write an example using a pydantic class:


from pydantic import BaseModel

class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]
                

The above snippet defines the Facts class. This class has a member countryFacts, which is a list of CountryFact. That class in turn has two members: a name (string) and an associated fact (string).
In JSON it could be represented like so:

[
    {
        "name": "France",
        "fact": "Has the eiffel tower"
    },
    {
        "name": "USA",
        "fact": "Has the manhattan bridge"
    }
]
                    

The benefit of using a class-based approach instead of raw JSON is that parsing is much cleaner.
When parsing JSON, your IDE can't help you access the correct members because it doesn't know the JSON structure, which can lead to many programming mistakes.
The Pydantic class approach, on the other hand, ensures you access members that exist, loop over items that can actually be iterated, and so on.

Now, let's ask an LLM to fill this Pydantic model:

from pydantic import BaseModel

from yacana import Task, OllamaAgent

class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

message = Task("Tell me 3 facts about Canada.", agent, structured_output=Facts).solve()
# Prints the response as a pure JSON string
print(message.content)

# Typed access to data through the structured_output object
print("Name = ", message.structured_output.countryFacts[0].name)
print("Fact = ", message.structured_output.countryFacts[0].fact)
                

The benefits of this approach are numerous:

  • Automatic schema validation of the response
  • Typed access to data through Python classes
  • Better quality of generated JSON responses
  • IDE autocompletion support

The structured_output is particularly useful when you need to process responses programmatically and want to guarantee the data structure.
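
For instance, because structured_output is a real Facts instance, you can iterate over it with full typing and autocompletion (this fragment reuses the message from the snippet above):

# 'message' comes from the previous snippet; structured_output is a Facts instance
for country_fact in message.structured_output.countryFacts:
    print(f"{country_fact.name}: {country_fact.fact}")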


Streaming

Streaming allows you to get the output of an LLM token by token instead of waiting for the whole response to come back.
It's particularly useful when you want to display the response to the user in real-time or need to process the response incrementally.
To enable streaming, you can define a streaming callback that will receive the tokens as they are generated:


from yacana import Task, OllamaAgent, GenericMessage

def streaming(chunk: str):
    print(f"chunk = |{chunk}|")

agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")

message: GenericMessage = Task("Tell me 1 fact about France.", agent, streaming_callback=streaming).solve()
print("Full response = ", message.content)
                

Output:

INFO: [PROMPT][To: AI assistant]: Tell me 1 fact about France.
chunk = |Here|
chunk = |'s|
chunk = | one|
chunk = | fact|
chunk = |:

|
chunk = |The|
chunk = | E|
chunk = |iff|
chunk = |el|
chunk = | Tower|
chunk = | in|
chunk = | Paris|
chunk = |,|
...
Full response =  Here's one fact:
The Eiffel Tower in Paris, France was originally intended to be a temporary structure, but it has become an iconic symbol of the country and a popular tourist destination, standing at over 324 meters (1,063 feet) tall!
                

Using medias

You can give medias to Agents and make them interact with images, audio and more.
You can even mix tools and medias in the same task!
To use medias with Ollama you'll need to install a multi-modal model like llama3.2-vision or Llava.


ollama pull llama3.2-vision:11b
                    

You can also use the OpenAiAgent with 'gpt-4o-mini', as it is multimodal by default and supports images and sound. However, every media is transformed into tokens and will count toward your rate limit! The media is encoded to base64 before being sent.


To run the following snippets, cd into the root of the GitHub repo, create the file there and run the code.

from yacana import Task, OllamaAgent, GenericMessage

vision_agent = OllamaAgent("AI assistant", "llama3.2-vision:11b", system_prompt="You are a helpful AI assistant")

Task("Describe this image", vision_agent, medias=["./tests/assets/burger.jpg"]).solve()
                    

Outputs:

INFO: [PROMPT][To: AI assistant]: Describe this image

INFO: [AI_RESPONSE][From: AI assistant]: This black and white photo showcases a close-up view of a hamburger. The burger is centered on the image, with its bun covered in sesame seeds and two patties visible beneath. A slice of cheese is positioned between the buns, while lettuce peeks out from underneath. A small amount of ketchup or mustard is visible at the bottom of the patty.
                        
 The background is blurred, suggesting that the burger was photographed on a table or countertop. The overall mood and atmosphere of this photo are casual and informal, as if it was taken by someone enjoying their meal in a relaxed setting.
                    

This model doesn't support multiple medias in the same request, but you can use Yacana with ChatGPT (through the OpenAiAgent) to do so, as sketched below.
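
A rough sketch, assuming the OpenAiAgent constructor mirrors OllamaAgent's and that your OpenAI credentials are already configured; the second image path is hypothetical:

from yacana import Task, OpenAiAgent

# Assumption: constructor arguments mirror OllamaAgent's; see the OpenAiAgent reference for how to provide your API key
vision_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant")

# Both images are sent in the same request (the second path is a placeholder)
Task("Compare these two images", vision_agent, medias=["./tests/assets/burger.jpg", "./tests/assets/street.jpg"]).solve()
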
Now let's use tools on medias! The following snippet analyzes an image and sends the dominant color to a tool. The tool returns the associated hex code for the given color.

from yacana import Task, OllamaAgent, GenericMessage, Tool

# Defining a simple function to be used as tool. It translates a common color name to its hexa version
def color_name_to_hexa(color_name: str) -> str:
    color_map = {
        "red": "#FF0000",
        "green": "#00FF00",
        "blue": "#0000FF",
        "yellow": "#FFFF00",
        "black": "#000000",
        "white": "#FFFFFF"
    }
    return color_map.get(color_name.lower(), "#000000")

# Defining the tool itself, referencing the function
color_name_to_hexa_tool = Tool("ColorNameToHexa", "Converts a color name to its hexadecimal representation. For example red -> #FF0000", color_name_to_hexa)

vision_agent = OllamaAgent("AI assistant", "llama3.2-vision:11b", system_prompt="You are a helpful AI assistant")

Task("What color is the object ?", vision_agent, medias=["https://avatars.githubusercontent.com/u/12821004"], tools=[color_name_to_hexa_tool]).solve()
                    


Yacana supports HTTPS URLs in addition to local file system paths. In this example we are providing an image by its URL. Note that even when using a URL, the media will still be converted to tokens before being sent.

Output:

INFO: [PROMPT][To: AI assistant]: I give you the following tool definition that you must use to fulfill a future task: color_name_to_hexa(color_name: str) -> str - Converts a color name to its hexadecimal representation. For example red -> #FF0000. Please acknowledge the given tool.

INFO: [AI_RESPONSE][From: AI assistant]: I have taken note of the provided tool:

`color_name_to_hexa(color_name: str) -> str`

This function takes a string representing a color name as input and returns a string representing the corresponding hexadecimal color code.

I'm ready to use this tool for any future task that requires converting color names to their hexadecimal representations. What's the next step?

INFO: [PROMPT][To: AI assistant]: To use the tool you MUST extract each parameter and use it as a JSON key like this: {"arg1": "<value1>", "arg2": "<value2>"}. You must respect arguments type. For instance, the tool `getWeather(city: str, lat: int, long: int)` would be structured like this {"city": "new-york", "lat": 10, "lon": 20}. In our case, the tool call you must use must look like that: {'color_name': 'arg 0'}

INFO: [AI_RESPONSE][From: AI assistant]: I understand the requirement now. To use the `color_name_to_hexa(color_name: str) -> str` tool, I will extract each parameter and create a JSON object with the extracted parameters as keys.

In this case, since there is only one parameter, `color_name`, which is of type `str`, I will structure the call like this:

{'color_name': 'red'}

Please let me know when to proceed!

INFO: [PROMPT][To: AI assistant]: You have a task to solve. Use the tool at your disposition to solve the task by outputting as JSON the correct arguments. In return you will get an answer from the tool. The task is:
What color is the object ?

INFO: [AI_RESPONSE][From: AI assistant]: { "color_name": "blue" }

INFO: [TOOL_RESPONSE][ColorNameToHexa]: #0000FF
                

The answer to the question was indeed blue. And the tool returned the hexadecimal code for blue!

Thinking LLMs (e.g. Deepseek)

Thinking LLMs are a new breed of LLMs that can reason and think step by step to solve complex problems.
They work in a similar way to Yacana's tool calling feature, as the LLM runs its own reasoning loop before giving the final answer.
The most famous open-source thinking LLM is Deepseek.

However, the result of a Task(...) with a thinking LLM contains the complete reasoning process, not just the final answer.
This means that message.content will contain the tokens <think></think> followed by the final response.
The content between these tokens can disrupt Yacana, so you should give the framework's Agent class the correct delimiters to use.


from yacana import OllamaAgent, Task, Tool

def get_weather(city: str) -> str:
    # Faking the weather API response
    return "Foggy"

def send_weather(city: str, weather: str) -> None:
    print(f"Sending weather for {city}: {weather}")

# Creating a Deepseek agent and specifying the thinking tokens for this LLM
agent = OllamaAgent("Ai assistant", "deepseek-r1:latest", thinking_tokens=("<think>", "</think>"))

# Defining 2 tools
get_weather_tool = Tool("get_weather", "Returns the weather for a given city.", get_weather)
send_weather_tool = Tool("send_weather", "Sends the weather for a given city.", send_weather)

Task(f"Send the current weather in L.A to the weather service. Use the tools in the correct order.", agent, tools=[get_weather_tool, send_weather_tool]).solve()

print("\n--history--\n")
agent.history.pretty_print()
                

In the above snippet we used thinking_tokens=("start_token", "end_token") to tell Yacana which part of the output is Deepseek's self-reasoning and which part is the actual answer.
This way it no longer disrupts Yacana's features and you can use this LLM like any other LLM.
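
If you ever need to separate the reasoning from the final answer yourself, for example when inspecting a raw response that still contains the thinking block, plain Python is enough. This sketch assumes the <think></think> delimiters shown above and a hypothetical 'message' variable holding such a raw response:

raw = message.content  # hypothetical raw response that still contains the reasoning block
if "</think>" in raw:
    reasoning, final_answer = raw.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    print("Final answer:", final_answer.strip())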



MCP tool support

Yacana offers MCP tool discovery and calling. It can connect to HTTP streamable endpoints, meaning it can connect to remote MCP servers like https://mcp.deepwiki.com/mcp.
We do not and will not support local MCP servers using STDIO transport. STDIO servers are a bad practice and will slowly be deprecated in the future. If you have an STDIO server you want to use, you can make it HTTP streamable with an MCP reverse proxy like mcp-proxy.

For now Yacana only supports tool discovery and not the other types of resources an MCP server can expose, like "templates", etc.

Connecting to an MCP server


from yacana import Mcp

deepwiki = Mcp("https://mcp.deepwiki.com/mcp")
deepwiki.connect()
                

Output:

INFO: [MCP] Connecting to MCP server (https://mcp.deepwiki.com/mcp)...

INFO: [MCP] Connected to MCP server: DeepWiki v0.0.1

INFO: [MCP] Available tool: read_wiki_structure - Get a list of documentation topics for a GitHub repository

INFO: [MCP] Available tool: read_wiki_contents - View documentation about a GitHub repository

INFO: [MCP] Available tool: ask_question - Ask any question about a GitHub repository
                

As you can see, Yacana discovered 3 tools from the MCP server. You can now use them as any other tool in your Tasks.

Deepwiki is a website offering an AI agent and an MCP server for asking questions about public GitHub repositories. It was made by the team behind Devin. And even though their original product was highly controversial, the Deepwiki project is quite nice.

If you need to remove a tool from the list you can do it like this: deepwiki.forget_tool("read_wiki_structure"). This way, when you get the tools from the Mcp object, this tool will no longer be returned.
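
For instance, a short sketch based on the call shown above:

from yacana import Mcp, ToolType

deepwiki = Mcp("https://mcp.deepwiki.com/mcp")
deepwiki.connect()

# Remove a discovered tool so it is no longer exposed to your Tasks
deepwiki.forget_tool("read_wiki_structure")

# Only the remaining tools are returned
remaining_tools = deepwiki.get_tools_as(ToolType.YACANA)
print(len(remaining_tools), "tools remaining")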

Calling MCP tools

▶️ Like any other tool, you can use MCP tools in a Task and choose the type of execution you want: either the Yacana style (default) or the OpenAi style.
Calling the get_tools_as(<tool_type>) method on the Mcp object returns a list of all the remote tools using the requested execution type.
For example:

from yacana import Task, Tool, OllamaAgent, Mcp, ToolType

deepwiki = Mcp("https://mcp.deepwiki.com/mcp")
deepwiki.connect()

ollama_agent = OllamaAgent("Ai assistant", "llama3.1:8b")

Task("Asking question about repo: In the repo 'rememberSoftwares/yacana' how do you instanciate an ollama agent ?", ollama_agent, tools=deepwiki.get_tools_as(ToolType.YACANA)).solve()
                

A few important notes about the MCP tools:
  1. MCP tools are optional by default (and by design).
  2. You cannot mix tools with different execution types in the same Task.
  3. Ollama does not support defining tools as required when using the OpenAi execution mode. In this mode, Ollama tools are always considered optional even if you set optional=False in the Tool constructor.
▶️ To call the MCP tools using the OpenAi style, use the get_tools_as(...) method with the value ToolType.OPENAI.
For example: Task("Asking question about repo: XXXXX ?", ollama_agent, tools=deepwiki.get_tools_as(ToolType.OPENAI)).solve().

Mixing MCP and local tools

You can mix MCP tools with local tools in the same Task. This way you can use the best of both worlds.
For example, let's add an update_server tool to our previous Task and ask the LLM to use it:


from yacana import Task, Tool, OllamaAgent, Mcp, ToolType


def update_server() -> None:
    """Updates the server."""
    return None

update_server_tool = Tool("update_server", "Triggers a server update.", update_server)

deepwiki = Mcp("https://mcp.deepwiki.com/mcp")
deepwiki.connect()

ollama_agent = OllamaAgent("Ai assistant", "llama3.1:8b")

Task("Please update the server.", ollama_agent, tools=deepwiki.get_tools_as(ToolType.YACANA) + [update_server_tool]).solve()
                

The important part here is tools=deepwiki.get_tools_as(ToolType.YACANA) + [update_server_tool]. The tools= parameter takes a list of tools. Since get_tools_as(...) returns a list and Python can concatenate lists, we can simply add our local tool wrapped in [...].

Authentication with MCP servers

Because MCP servers can be protected by an authentication system, Yacana supports passing headers to the MCP server when connecting to it. Every call made to the server will send these headers.
Use this to pass a bearer token or any other authentication header required by the server.

                    deepwiki = Mcp("https://mcp.deepwiki.com/mcp", {"lookatthis": "header"})
                

MCP support is available as an Alpha feature. Please open an issue if you encounter any problem while using it.

Pagination