
YouTube video for this section is still under creation. Please be patient ^^
Yacana was initially designed to work only with Ollama. However, many projects require mixing private LLM providers and local open source models to achieve a production-grade product. Private LLM providers like OpenAI or Anthropic offer great quality, making them well suited for production, but they cost a lot of money. On the other hand, local open source models are much cheaper to run but their quality is not always on par. Hence, having the ability to mix both is a great asset.
The strength of Yacana is that it provides you with the same programming API whether you use Ollama or an OpenAI-compatible endpoint.
To be fair, there is one important difference between the OllamaAgent and the OpenAiAgent: the way tools are called.
For Ollama, tools are called using an "enhanced tool calling" system where Yacana iterates over the tools and calls the appropriate ones with its own internal method. This system was made specifically for local LLMs to achieve higher call success rates.
For OpenAI, tools are called following the OpenAI standard. So, when using ChatGPT you won't have any trouble calling tools, as ChatGPT is tailored for this.
However, when using other inference servers like VLLM you will get a lower success rate when calling tools. This is a bit unfortunate and will be addressed in another update: our aim is to make both Agents capable of using both tool calling systems.
Stay tuned for future updates!
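To make this concrete, here is a minimal sketch of what "same programming API" looks like in practice. It is illustrative only: the model names and the get_current_time helper are assumptions, not part of the original examples, and only the agent type changes between the local and the hosted setup.
from yacana import OllamaAgent, OpenAiAgent, Task, Tool

# Hypothetical tool used only for this demonstration
def get_current_time() -> str:
    return "2025-01-01 12:00:00"

time_tool = Tool("Get_current_time", "Returns the current date and time.", get_current_time)

# Local open source model served by Ollama (model name is an assumption)
local_agent = OllamaAgent("AI assistant", "llama3.1:8b", system_prompt="You are a helpful AI assistant")
# Hosted model served by OpenAI
cloud_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXX")

# The exact same Task and Tool objects work with both agents
for agent in (local_agent, cloud_agent):
    Task("What time is it?", agent, tools=[time_tool]).solve()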
Using ChatGPT requires the OpenAiAgent. It has the same constructor and functionality as the OllamaAgent; it just takes one more parameter: the api_token required to authenticate against the OpenAI servers.
To connect Yacana to ChatGPT, simply use the OpenAiAgent like this:
from yacana import OpenAiAgent, Task, GenericMessage
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXX")
# Use the agent to solve a task
message: GenericMessage = Task("What is the capital of France?", openai_agent).solve()
print(message.content)
We'll use the OpenAiModelSettings class to configure the LLM so that it returns the logprobs.
Logprobs are the log probabilities of the generated tokens. They range from minus infinity to 0: the closer to 0, the more probable the token. Using the model settings, we'll also ask for the 3 best candidates for each token.
We'll then use the .raw_llm_json member of the corresponding HistorySlot to access the raw JSON output of the LLM. In there we'll find all the information we need.
import json
from yacana import OpenAiAgent, Task, ToolError, Tool, OpenAiModelSettings, HistorySlot
# Defining parameters for our agent
model_settings = OpenAiModelSettings(temperature=0, logprobs=True, top_logprobs=3)
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", model_settings=model_settings, api_token="sk-proj-XXXXXXXXXXXXXXX")
Task("Tell me 1 facts about Canada.", openai_agent).solve()
# Getting the last slot added to the History
slot: HistorySlot = openai_agent.history.get_last_slot()
# Showing the output of the LLM to get the logprobs
print("Raw JSON output from LLM :")
print(json.dumps(json.loads(slot.raw_llm_json), indent=2))
INFO: [PROMPT][To: AI assistant]: Tell me 1 fact about Canada.
INFO: [AI_RESPONSE][From: AI assistant]: Canada is the second-largest country in the world by total area, covering approximately 9.98 million square kilometers (3.85 million square miles).
Raw JSON output from LLM :
{
"id": "chatcmpl-BTrS4UC2GyJSNTf9AqOYDrklPd3Sx",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": {
"content": [
{
"token": "Canada",
"bytes": [
67,
97,
110,
97,
100,
97
],
"logprob": -0.011068690568208694,
"top_logprobs": [
{
"token": "Canada",
"bytes": [
67,
97,
110,
97,
100,
97
],
"logprob": -0.011068690568208694
},
{
"token": "One",
"bytes": [
79,
110,
101
],
"logprob": -4.511068820953369
},
{
"token": " Canada",
"bytes": [
32,
67,
97,
110,
97,
100,
97
],
"logprob": -11.386068344116211
}
]
},
{
"token": " is",
"bytes": [
32,
105,
115
],
"logprob": -0.08894743025302887,
"top_logprobs": [
{
"token": " is",
"bytes": [
32,
105,
115
],
"logprob": -0.08894743025302887
},
{
"token": " has",
"bytes": [
32,
104,
97,
115
],
"logprob": -2.4639475345611572
},
{
"token": " possesses",
"bytes": [
32,
112,
111,
115,
115,
101,
115,
115,
101,
115
],
"logprob": -12.588947296142578
}
]
},
...
The logprobs data is not available inside the Message object itself. So, we must use the .raw_llm_json member of the surrounding slot to access the raw JSON LLM output and take a look at the logprobs.
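For convenience, here is a small sketch, reusing the openai_agent and the slot from the logprobs example above, that walks the raw JSON and converts each logprob back into a plain probability with math.exp:
import json
import math

# Getting the slot that wraps the LLM answer from the logprobs example
slot: HistorySlot = openai_agent.history.get_last_slot()
raw = json.loads(slot.raw_llm_json)

# Following the structure shown above: choices -> logprobs -> content -> one entry per generated token
for token_info in raw["choices"][0]["logprobs"]["content"]:
    probability = math.exp(token_info["logprob"])  # e.g. exp(-0.0110...) ≈ 0.989
    print(f"{token_info['token']!r}: {probability:.2%}")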
Sending images works the same way as with the OllamaAgent: pass their paths to the Task through the medias parameter.
from yacana import Task, OpenAiAgent
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", api_token="sk-proj-XXXXXXXXXXXXXXXXX")
Task("Describe this image", openai_agent, medias=["./tests/assets/burger.jpg", "./tests/assets/flower.png"]).solve()
ChatGPT has a feature allowing it to return multiple versions of the same message. It's used to provide you with alternative responses: for instance, some could be more formal, some could be more creative, etc. Yacana offers a way to get these alternative responses.
Let's say we want 3 alternative responses to the prompt "What is the main invention of Nicolas Tesla?" and then select the third one as the main message of the slot instead of the first one (the default).
from typing import List
from yacana import Task, OpenAiAgent, GenericMessage, OpenAiModelSettings, HistorySlot
# Requesting 3 alternative responses using "n" parameter
model_settings = OpenAiModelSettings(n=3, temperature=1.0)
openai_agent = OpenAiAgent("AI assistant", "gpt-4o-mini", system_prompt="You are a helpful AI assistant", model_settings=model_settings, api_token="sk-proj-XXXXXXXXXXXXXXX")
Task("What is the main invention of Nicolas Tesla (short response) ?", openai_agent).solve()
message: GenericMessage = openai_agent.history.get_last_message()
print(f"\nCurrent main message is: {message.content}\n")
# Getting the last slot from the history
slot: HistorySlot = openai_agent.history.get_last_slot()
# Getting the messages from the slot
messages: List[GenericMessage] = slot.messages
# Printing the messages with a counter for readability
for i, message in enumerate(messages, start=1):
    print(f"\n{i}): {message.content}")
# Setting the main message index to 2 (3rd message)
slot.set_main_message_index(2)
# Getting the main message again
message: GenericMessage = openai_agent.history.get_last_message()
print(f"\nCurrent main message is: {message.content}\n")
INFO: [PROMPT][To: AI assistant]: What is the main invention of Nicolas Tesla (short response) ?
INFO: [AI_RESPONSE][From: AI assistant]: Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
Current main message is: Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
1): Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which became the standard for electrical power distribution. He also made significant contributions to wireless communication, induction motors, and numerous other innovations in electrical engineering.
2): Nicolas Tesla is best known for his development of the alternating current (AC) electrical system, which is the basis for modern electrical power distribution. Additionally, he made significant contributions to numerous innovations, including the Tesla coil, radio technology, and wireless transmission of energy.
3): One of Nikola Tesla's main inventions is the alternating current (AC) electrical system, which includes the AC motor and transformer. This system revolutionized the way electricity is generated and transmitted, enabling long-distance power distribution and laying the foundation for the modern electrical grid.
Current main message is: One of Nikola Tesla's main inventions is the alternating current (AC) electrical system, which includes the AC motor and transformer. This system revolutionized the way electricity is generated and transmitted, enabling long-distance power distribution and laying the foundation for the modern electrical grid.
The HistorySlot has been discussed above. But to put it simply, it wraps each message in the history. This means that the history is not a list of Messages but a list of slots (makes sense?). Each slot holds one or more messages. In our case, we have 3: the first one is the main message and is the one presented to the LLM during inference; the other 2 are alternate messages.
The slot lets you switch which message should be considered the main message using .set_main_message_index(n).
In the output, the first look at "Current main message" shows that the first message is selected (ends with "engineering"). After setting the index to 2 (indexes start at 0), the second "Current main message" shows that the third message is selected (ends with "grid").
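To recap the structure using only the members shown above: the history holds slots, each slot holds messages, and get_last_message() resolves to whichever message of the last slot is currently marked as main.
# Minimal recap reusing the agent from the example above
slot: HistorySlot = openai_agent.history.get_last_slot()
print(len(slot.messages))        # 3: one main message plus 2 alternates
slot.set_main_message_index(2)   # promote the third alternative to main message
print(openai_agent.history.get_last_message().content)  # now returns that third message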
Yacana can also connect to a local OpenAI-compatible inference server like VLLM. To help you go through the installation process on WSL you can follow this tutorial: Installing VLLM on WSL. First, install VLLM in a dedicated environment:
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm
Now, let's start the inference server with a model. If it's not already present it will be downloaded.
We'll use the meta-llama/Llama-3.1-8B-Instruct model.
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192 --guided-decoding-backend outlines --enable-auto-tool-choice --tool-call-parser llama3_json
For the inference server to start, you will need a Hugging Face account to accept Meta's license agreement. Read the VLLM tutorial if you need help with that step.
Once the inference server is running, you can use the following code to create an OpenAI-compatible agent:
from yacana import OpenAiAgent, GenericMessage, Task
# Note the endpoint parameter is set to the VLLM server address
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
# Use the agent to solve a task
message: GenericMessage = Task("What is the capital of France?", vllm_agent).solve()
print(message.content)
Doing simple tool calling:
from yacana import OpenAiAgent, Tool, Task
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
# Defining a fake weather tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny with a high of 25°C."
# Defining the tool
get_weather_tool = Tool("Get_weather", "Calls a weather API and returns the current weather in the given city.", get_weather)
# Adding runtime configuration to the underlying OpenAi library so it works with VLLM
extra_body = {
    'guided_decoding_backend': 'outlines',
    'tool_choice': 'auto',
    'enable_auto_tool_choice': True,
    'tool_call_parser': 'auto'
}
Task("What's the weather in paris ?", vllm_agent, tools=[get_weather_tool], runtime_config={"extra_body": extra_body}).solve()
Note how we used the runtime_config parameter to specify the guided decoding backend. You can use this parameter to set other options as well: it is direct access to the underlying library. For OpenAI, we use the official OpenAI Python client, so you can set any parameter supported by that library. These settings can be set either at the Agent level or at the Task level. For more information please refer to the "Accessing the underlying client library" section.
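As a quick side-by-side illustration of the two levels, here is a sketch that simply reuses the VLLM settings, tool and agent from the examples above; nothing new is introduced.
# Agent level: the runtime_config is applied to every task solved by this agent
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
# Task level: the runtime_config only applies to this specific task
Task("What's the weather in paris ?", vllm_agent, tools=[get_weather_tool], runtime_config={"extra_body": extra_body}).solve()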
Using structured outputs is the same as with the OllamaAgent. This is the power of Yacana: it provides you with the same structured output API on local LLMs as on OpenAI. However, you still need to provide the outlines guided decoding backend through the runtime_config parameter. In this example we set it at the Agent level because it will be useful for every future task requiring grammar enforcement.
from pydantic import BaseModel
from yacana import OpenAiAgent, GenericMessage, Task
class CountryFact(BaseModel):
    name: str
    fact: str

class Facts(BaseModel):
    countryFacts: list[CountryFact]
vllm_agent = OpenAiAgent("AI assistant", "meta-llama/Llama-3.1-8B-Instruct", system_prompt="You are a helpful AI assistant", endpoint="http://127.0.0.1:8000/v1", api_token="leave blank", runtime_config={"extra_body": {'guided_decoding_backend': 'outlines'}})
message: GenericMessage = Task("Tell me 3 facts about Canada.", vllm_agent, structured_output=Facts).solve()
# Print the content of the message as a JSON string
print(message.content)
# Print the structured output as a real class instance
print("Name = ", message.structured_output.countryFacts[0].name)
print("Fact = ", message.structured_output.countryFacts[0].fact)
All other features, like medias, streaming, etc. are also available with the OpenAiAgent and can be used in the exact same way. Please refer to the main documentation for more information.