
The YouTube video for this section is still under creation. Please be patient ^^
Yacana was initially designed to work only with Ollama. However, many projects require mixing private LLM providers with local open-source models to build a production-grade product. Private LLM providers like OpenAI or Anthropic are great for production because of their quality, but they cost a lot of money. On the other hand, local open-source models are much cheaper to run, but their quality is not always there. Hence, having the ability to mix both is a great asset.
The strength of Yacana is that it provides you with the same programming API whether you use Ollama or an OpenAI-compatible endpoint.
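As a quick sketch of that shared API (the local model name and the API token below are placeholders; this assumes you have already pulled a model such as llama3.1:8b in Ollama), the exact same Task can be solved by either backend, the only change being the agent class:
from yacana import OllamaAgent, OpenAiAgent, Task

# Same system prompt and same Task; only the agent class differs.
# "llama3.1:8b" and the api_token are placeholders: use a model you have pulled locally
# and your own OpenAI key.
local_agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")
cloud_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXX")

for agent in (local_agent, cloud_agent):
    print(Task("What is the capital of France?", agent).solve().content)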
To be fair, there is one important difference between the OllamaAgent and the OpenAiAgent: the way tools are called.
For Ollama, tools are called using an "enhanced tool calling" system where Yacana iterates over the tools and calls the appropriate ones with its own internal method. This system was made specifically for local LLMs to achieve higher call success rates.
For OpenAI, tools are called following the OpenAI standard. So, when using ChatGPT you won't have any trouble calling tools, as ChatGPT is tailored for this. However, when using other inference servers like VLLM you will have a lower success rate at calling tools. This is a bit unfortunate and will be addressed in another update. Our aim is to make both Agents capable of using both tool calling systems.
Stay tuned for future updates!
Using ChatGPT requires the OpenAiAgent. It has the same constructor and functionalities as the OllamaAgent. It just has one more parameter: the api_token, required to authenticate to OpenAI's servers.
To connect Yacana to ChatGPT, simply use the OpenAiAgent like this:
from yacana import OpenAiAgent, Task, GenericMessage
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXX")
# Use the agent to solve a task
message: GenericMessage = Task("What is the capital of France?", openai_agent).solve()
print(message.content)
from yacana import OpenAiAgent, Task, ToolError, Tool

# Defining a fake temperature tool
def get_temperature(city: str):
    if type(city) is not str:
        raise ToolError("City name must be a string.")
    return f"The temperature in {city} is 18 degrees Celsius."

get_temperature_tool = Tool("get_temperature", "Takes a city name and returns the temperature in this city.", get_temperature)
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXXXX")
Task("What's the temperature in Paris?", openai_agent, tools=[get_temperature_tool]).solve()
INFO: [PROMPT][To: AI assistant]: What's the temperature in Paris?
INFO: [AI_RESPONSE][From: AI assistant]: [{"id": "call_aJvyCX0wamCaORoNfQcyE5O2", "type": "function", "function": {"name": "get_temperature", "arguments": "{\"city\": \"Paris\"}"}}]
INFO: [TOOL_RESPONSE][get_temperature]: The temperature in Paris is 18 degrees Celsius.
INFO: [PROMPT][To: AI assistant]: Retrying with original task and tools answer: 'What's the temperature in Paris?'
INFO: [AI_RESPONSE][From: AI assistant]: The temperature in Paris is currently 18 degrees Celsius.
Let's use the OpenAiModelSettings class to configure the LLM to show the logprobs.
Logprobs are the log probabilities of the candidate next tokens. They range from 0 to minus infinity: the closer to 0, the more probable the next token. Using the model settings, we'll ask for the best 3 candidates for each token.
Then we'll use the HistorySlot's .raw_llm_json member to access the raw JSON output of the LLM. In there we'll find all the information we need.
import json
from yacana import OpenAiAgent, Task, ToolError, Tool, OpenAiModelSettings, HistorySlot
# Defining parameters for our agent
model_settings = OpenAiModelSettings(temperature=0, logprobs=True, top_logprobs=3)
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", model_settings=model_settings, api_token="sk-proj-XXXXXXXXXXXXXXX")
Task("Tell me 1 facts about Canada.", openai_agent).solve()
# Getting the last slot added to the History
slot: HistorySlot = openai_agent.history.get_last_slot()
# Showing the output of the LLM to get the logprobs
print("Raw JSON output from LLM :")
print(json.dumps(json.loads(slot.raw_llm_json), indent=2))
INFO: [PROMPT][To: AI assistant]: Tell me 1 fact about Canada.
INFO: [AI_RESPONSE][From: AI assistant]: Canada is the second-largest country in the world by total area, covering approximately 9.98 million square kilometers (3.85 million square miles).
Raw JSON output from LLM:
{
  "id": "chatcmpl-BTrS4UC2GyJSNTf9AqOYDrklPd3Sx",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "content": [
          {
            "token": "Canada",
            "bytes": [
              67,
              97,
              110,
              97,
              100,
              97
            ],
            "logprob": -0.011068690568208694,
            "top_logprobs": [
              {
                "token": "Canada",
                "bytes": [
                  67,
                  97,
                  110,
                  97,
                  100,
                  97
                ],
                "logprob": -0.011068690568208694
              },
              {
                "token": "One",
                "bytes": [
                  79,
                  110,
                  101
                ],
                "logprob": -4.511068820953369
              },
              {
                "token": " Canada",
                "bytes": [
                  32,
                  67,
                  97,
                  110,
                  97,
                  100,
                  97
                ],
                "logprob": -11.386068344116211
              }
            ]
          },
          {
            "token": " is",
            "bytes": [
              32,
              105,
              115
            ],
            "logprob": -0.08894743025302887,
            "top_logprobs": [
              {
                "token": " is",
                "bytes": [
                  32,
                  105,
                  115
                ],
                "logprob": -0.08894743025302887
              },
              {
                "token": " has",
                "bytes": [
                  32,
                  104,
                  97,
                  115
                ],
                "logprob": -2.4639475345611572
              },
              {
                "token": " possesses",
                "bytes": [
                  32,
                  112,
                  111,
                  115,
                  115,
                  101,
                  115,
                  115,
                  101,
                  115
                ],
                "logprob": -12.588947296142578
              }
            ]
          },
          ...
Note that the logprobs data is not available inside the Message object itself. So, we must use the .raw_llm_json member from the surrounding slot to access the raw JSON LLM output and take a look at the logprobs.
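As a quick illustration, here is a minimal sketch (reusing the slot variable from the example above) that parses this JSON and turns each logprob back into a regular probability with exp(); for instance exp(-0.011) ≈ 0.99, meaning the model was almost certain about the token "Canada":
import json
import math

# Minimal sketch: walk the OpenAI-style logprobs structure shown above and convert each
# logprob back into a probability (exp(logprob), so 0 -> ~100% and very negative -> ~0%).
data = json.loads(slot.raw_llm_json)
for token_info in data["choices"][0]["logprobs"]["content"]:
    print(f"{token_info['token']!r}: p = {math.exp(token_info['logprob']):.3f}")
    for candidate in token_info["top_logprobs"]:
        print(f"    candidate {candidate['token']!r}: p = {math.exp(candidate['logprob']):.3f}")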
Using media with ChatGPT is similar to the OllamaAgent, with the key difference being that ChatGPT supports multiple media files in a single request.
from yacana import Task, OpenAiAgent
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXXXX")
Task("Describe this image", openai_agent, medias=["./tests/assets/burger.jpg", "./tests/assets/flower.png"]).solve()
ChatGPT has a feature allowing it to return multiple versions of the same message. It's used to provide you with alternative responses. For instance,
some could be more formal, some could be more creative, etc.
Yacana offers a way to get these alternative responses.
Let's say we want 3 alternative responses to the prompt "What is the main invention of Nicolas Tesla?" then select
the third one as the main message of the slot instead of the first one (default).
from typing import List
from yacana import Task, OpenAiAgent, GenericMessage, OpenAiModelSettings, HistorySlot
# Requesting 3 alternative responses using "n" parameter
model_settings = OpenAiModelSettings(n=3, temperature=1.0)
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", model_settings=model_settings, api_token="sk-proj-XXXXXXXXXXXXXXX")
Task("What is the main invention of Nicolas Tesla (short response) ?", openai_agent).solve()
message: GenericMessage = openai_agent.history.get_last_message()
print(f"\nCurrent main message is: {message.content}\n")
# Getting the last slot from the history
slot: HistorySlot = openai_agent.history.get_last_slot()
# Getting the messages from the slot
messages: List[GenericMessage] = slot.messages
# Printing the messages with a counter for readability
for i, message in enumerate(messages, start=1):
    print(f"\n{i}): {message.content}")
# Setting the main message index to 2 (3rd message)
slot.set_main_message_index(2)
# Getting the main message again
message: GenericMessage = openai_agent.history.get_last_message()
print(f"\nCurrent main message is: {message.content}\n")
INFO: [PROMPT][To: AI assistant]: What is the main invention of Nicolas Tesla (short response) ?
INFO: [AI_RESPONSE][From: AI assistant]: Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
Current main message is: Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
1): Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
2): Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which is the basis for modern electrical power distribution. Additionally, he made significant contributions to numerous innovations, including the Tesla coil, radio technology, and wireless transmission of energy.
3): One of Nikola Tesla's main inventions is the alternating current (AC) electrical system, which includes the AC motor and transformer. This system revolutionized the way electricity is generated and transmitted, enabling long-distance power distribution and laying the foundation for the modern electrical grid.
Current main message is: One of Nikola Tesla's main inventions is the alternating current (AC) electrical system, which includes the AC motor and transformer. This system revolutionized the way electricity is generated and transmitted, enabling long-distance power distribution and laying the foundation for the modern electrical grid.
The HistorySlot has been discussed above. To put it simply, it wraps each message in the history.
This means that the history is not a list of Messages but a list of slots (makes sense?).
Each slot holds one or more messages. In our case, we have 3. The first one is the main message and is the one presented to the LLM during inference; the other 2 are alternate messages.
The slot lets you switch which message should be considered the main message using .set_main_message_index(n).
In the output, the first 'Current main message' shows that the first message is selected (it ends with "engineering"). After setting the index to 2 (counting from 0), the second 'Current main message' shows that the third message is now selected (it ends with "grid").
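Here is a small complementary sketch (reusing the slot and openai_agent variables from the example above) showing the difference between simply reading an alternative through slot.messages and actually promoting it with .set_main_message_index():
# Peek at the 3rd alternative without changing anything: the history still serves the first message.
print(slot.messages[2].content)

# Promote it: from now on the history (and thus the LLM on the next inference) sees this message.
slot.set_main_message_index(2)
print(openai_agent.history.get_last_message().content)  # same content as the line above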
To help you go through the installation process on WSL you can follow this tutorial: Installing VLLM on WSL.
First, let's install VLLM.
You can find the detailed installation steps in the VLLM documentation.
We recommend using conda to install VLLM. Note that conda cannot be used in an enterprise environment without paying for a license; you can use 'uv' instead.
Once conda is installed you can simply pip install vllm.
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm
Now, let's start the inference server with a model. If it's not already present, it will be downloaded.
We'll use the Llama-3.1-8B-Instruct model.
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192 --guided-decoding-backend outlines --enable-auto-tool-choice --tool-call-parser llama3_json
For the inference server to start, you will need a Hugging Face account to accept Meta's license agreement for the Llama models.
Read the VLLM tutorial if you need help with that step.
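If you are not authenticated yet, one option is to do it from Python with the huggingface_hub library (it should already be installed alongside vllm; the token value below is a placeholder you can create in your Hugging Face account settings). Running huggingface-cli login from the shell achieves the same thing.
# Minimal sketch: store your Hugging Face token locally so that `vllm serve` can
# download the gated Llama model. The token below is a placeholder.
from huggingface_hub import login

login(token="hf_XXXXXXXXXXXXXXX")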
About the vllm command line parameters: --max-model-len limits the context window size, --guided-decoding-backend outlines selects the Outlines library as the guided (structured) decoding backend, and --enable-auto-tool-choice together with --tool-call-parser llama3_json enables OpenAI-style tool calling for Llama 3 models.
Once the inference server is running, you can use the following code to create an OpenAI-compatible agent:
from yacana import OpenAiAgent, GenericMessage, Task
# Note the endpoint parameter is set to the VLLM server address
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
# Use the agent to solve a task
message: GenericMessage = Task("What is the capital of France?", vllm_agent).solve()
print(message.content)
Doing simple tool calling:
from yacana import OpenAiAgent, Tool, Task
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
# Defining a fake weather tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny with a high of 25°C."
# Defining the tool
get_weather_tool = Tool("Get_weather", "Calls a weather API and returns the current weather in the given city.", get_weather)
# Adding runtime configuration to the underlying OpenAi library so it works with VLLM
extra_body = {
    'guided_decoding_backend': 'outlines',
    'tool_choice': 'auto',
    'enable_auto_tool_choice': True,
    'tool_call_parser': 'auto'
}
Task("What's the weather in paris ?", vllm_agent, tools=[get_weather_tool], runtime_config={"extra_body": extra_body}).solve()
Note how we used the runtime_config
parameter to specify the guided decoding backend. You can use this parameter to specify other parameters as well.
This is a direct access to the underlying library.
For OpenAI we use the OpenAI python client. You can set any parameter supported by this library.
These settings can either be set at the Agent level or at the Task level. For more information
please refer to the Accessing the underlying client library section.
Using structured outputs is the same as with the OllamaAgent. This is the power of Yacana: it provides you with the same API for structured outputs on local LLMs as on OpenAI. However, you still need to provide the outlines guided-decoding backend (through runtime_config). In this example we set it at the Agent level because it will be useful for every future task requiring grammar enforcement.
from pydantic import BaseModel
from yacana import OpenAiAgent, GenericMessage, Task
class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
message: GenericMessage = Task("Tell me 3 facts about Canada.", vllm_agent, structured_output=Facts).solve()
# Print the content of the message as a JSON string
print(message.content)
# Print the structured output as a real class instance
print("Name = ", message.structured_output.countryFacts[0].name)
print("Fact = ", message.structured_output.countryFacts[0].fact)
All other features, like medias, streaming, etc. are also available with the OpenAiAgent and can be used in the exact same way. Please refer to the main documentation for more information.