If you have tried to use the OpenAI o-series reasoning models, such as o1 or o3, with PromptFlow recently, you have certainly run into a nasty surprise. While PromptFlow supports a wide range of models and providers, the o-series models are not among them. This is of course quite a shame, especially if you’d like to benchmark or evaluate your flows against those models.
In this short post, we will look at a workaround.
The error
When you try to use the o-series models in PromptFlow - regardless of whether you try this locally or from Azure AI Foundry - you will get an error message similar to this:
pf.flow.node failed with UserErrorException: Exception: OpenAI API hits BadRequestError: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}} [Error reference: https://platform.openai.com/docs/guides/error-codes/api-errors]
This is because the o-series models do not support the max_tokens parameter, which PromptFlow uses to limit the number of tokens in the response; they expect max_completion_tokens instead, which PromptFlow does not support.
There are also extra parameters specific to the o-series models, such as reasoning_effort, which PromptFlow does not support either. Another difference from the classic GPT-series API is that the o-series models do not accept the traditional system role, and use the developer role instead. This is important to note, as it affects the way you structure your prompts.
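To make these differences concrete, here is a rough sketch of what a raw o-series chat completion request needs to look like (the client object and the concrete values are just placeholders for illustration):
# assumes an already-constructed client, e.g. openai.AzureOpenAI or openai.OpenAI
response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        # "developer" replaces the classic "system" role
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    # max_completion_tokens replaces max_tokens
    max_completion_tokens=2000,
    # o-series only; accepts "low", "medium" or "high"
    reasoning_effort="medium",
)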
At the moment there are open GitHub issues tracking this, such as this one and this one, but so far they have not been addressed.
The workaround
The simplest workaround is to ditch the built-in LLM tool and replace it with a custom Python one. You are then free to interact with the LLM API directly, and can use the o-series models without any issues. The trade-off is that you need to do a little more orchestration by hand.
The example is shown below. Consider the following simple 1-step flow:
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  chat_history:
    type: list
    is_chat_history: true
    default: []
  question:
    type: string
    is_chat_input: true
    default: Two trains, 20 miles apart, approach each other. A fly starts at the front of one train and flies towards the other. Upon reaching it, it turns around and flies toward the other train. It continues like that until the trains collide. The fly travels at 15 kmph, and each train travels at 10 kmph. How far does the fly travel before being squashed?
outputs:
  response:
    type: string
    reference: ${Fetch_LLM_Response.output}
    is_chat_output: true
nodes:
- name: Fetch_LLM_Response
  type: python
  source:
    type: code
    path: fetch_llm_response.py
  inputs:
    deployment_name: o3-mini
    connection: open_ai_connection
    question: ${inputs.question}
The flow allows the user to provide a question (with an interesting default, suitable for a reasoning model, already present for testing purposes). The question then flows to the Fetch_LLM_Response node, which is a Python node that will call the Azure OpenAI or OpenAI API directly, using the fetch_llm_response function.
This flow assumes that you have an open_ai_connection connection set up in your PromptFlow environment (a sketch of creating one is shown further below), and that a fetch_llm_response.py file sits in the same directory as the flow file. The fetch_llm_response.py file would be as follows:
from typing import Union

from promptflow.core import tool
from promptflow.connections import CustomConnection, AzureOpenAIConnection, OpenAIConnection


def get_client(connection: Union[CustomConnection, AzureOpenAIConnection, OpenAIConnection]):
    connection_dict = dict(connection)
    api_key = connection_dict.get("api_key")
    conn = dict(
        api_key=api_key,
    )
    if api_key.startswith("sk-"):
        # an "sk-" key means we are talking to the OpenAI API directly
        from openai import OpenAI as Client
    else:
        # otherwise assume Azure OpenAI, which also needs the endpoint and API version
        from openai import AzureOpenAI as Client
        conn.update(
            azure_endpoint=connection_dict.get("api_base"),
            api_version="2025-01-01-preview",
        )
    return Client(**conn)


@tool
def fetch_llm_response(question: str, deployment_name: str, connection: Union[CustomConnection, AzureOpenAIConnection, OpenAIConnection] = None) -> str:
    client = get_client(connection)
    prompt = "You are a math expert. Answer the user's question in detail."
    response = client.chat.completions.create(
        messages=[
            # o-series models use the "developer" role instead of "system"
            {"role": "developer", "content": prompt},
            {"role": "user", "content": question}
        ],
        # o-series specific parameters: max_completion_tokens replaces max_tokens
        max_completion_tokens=5000,
        model=deployment_name,
        reasoning_effort="low"
    )
    txt = response.choices[0].message.content
    return txt
In this case we need to be using a recent version of the Python openai package (the modern 1.x+ client), as that is the one that supports the o-series models. We use some trickery to switch easily between the Azure OpenAI and OpenAI APIs, depending on the connection type. This gives us the flexibility of testing and running this flow in both environments (in fact, we also accept CustomConnection to allow for other reasoning model providers, as long as their API surface is compatible with OpenAI).
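For reference, the requirements.txt referenced in the flow's environment section only really needs the packages used above; a minimal sketch (exact version pins omitted, just make sure the openai release is recent enough to understand max_completion_tokens and reasoning_effort):
# requirements.txt - minimal sketch
promptflow
openai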
With this approach, we are of course in control of setting all the parameters ourselves, including max_completion_tokens and reasoning_effort, and of correctly applying the developer role to the prompt.
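As a side note, if the open_ai_connection mentioned earlier does not exist yet in your environment, it can be registered with the pf CLI; a minimal sketch, assuming an Azure OpenAI resource (the endpoint and key are placeholders):
# connection.yaml - placeholder values, adjust to your resource
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/AzureOpenAIConnection.schema.json
name: open_ai_connection
type: azure_open_ai
api_key: "<your-api-key>"
api_base: "https://<your-resource>.openai.azure.com/"
api_type: azure
The connection can then be created and the flow tested locally with:
pf connection create --file connection.yaml
pf flow test --flow .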
This is a simple example, but you can of course extend it to more complex flows, for example by using the chat_history input to provide conversational context for the model. Finally, you could also support streaming responses if you wanted to.
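For instance, here is a rough sketch of folding the chat_history input into the messages list inside the Python node, assuming the standard PromptFlow chat history shape where each entry carries the inputs and outputs of a previous turn (the build_messages helper is just an illustration, not part of the demo code):
def build_messages(question: str, chat_history: list, prompt: str) -> list:
    # the "developer" role replaces "system" for o-series models
    messages = [{"role": "developer", "content": prompt}]
    for turn in chat_history or []:
        # assumes entries shaped like {"inputs": {"question": ...}, "outputs": {"response": ...}}
        messages.append({"role": "user", "content": turn["inputs"]["question"]})
        messages.append({"role": "assistant", "content": turn["outputs"]["response"]})
    messages.append({"role": "user", "content": question})
    return messages
The fetch_llm_response tool would then also take a chat_history: list parameter (wired to ${inputs.chat_history} in the flow file) and pass the result of build_messages to client.chat.completions.create.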
Overall, it’s an acceptable workaround, as it unlocks the o-series models for use in PromptFlow and lets you take advantage of their unique features. Hopefully, official first-class support for the reasoning models will come soon!
The demo code can be found on GitHub.