I’ve recently been working on a set of Agent Framework samples showcasing cooperation between cloud agents (backed by LLMs in the cloud) and local agents (running on your own machine). Since I primarily work on a Mac, the natural choice for the local model runner was MLX, which required a fair amount of bootstrapping and felt quite tedious. The natural next step was to create a library that makes it easy to integrate MLX models into Agent Framework applications, since one wasn’t available yet.
Today, I’m excited to announce the release of the MLX Integration Library for Agent Framework! This library simplifies integrating MLX models into your Agent Framework applications, allowing you to run local AI models on your Mac seamlessly alongside cloud-based agents.
Getting started
The library is available on PyPI, so you can install it using pip:
pip install agent-framework-mlx
The features are:
- run local models using mlx-lm - any model in the MLX format (local or from Hugging Face) can now be easily integrated into Agent Framework applications
- streaming support - full support for both buffered and streamed responses
- configurable generation - full control over generation parameters like temperature, top-p, repetition penalty, and more
- message preprocessing - hook into the pipeline to modify messages before they are converted to prompts (see the sketch after this list)
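To give a flavour of that last point, a preprocessor is conceptually just a callable that takes the list of chat messages and returns a (possibly modified) list before they are turned into a prompt. The strip_empty_messages helper below is only an illustrative sketch of such a hook, not a snippet lifted from the library:

from agent_framework import ChatMessage

def strip_empty_messages(messages: list[ChatMessage]) -> list[ChatMessage]:
    # Illustrative preprocessing hook: drop messages with no text so they
    # never make it into the prompt sent to the local model.
    return [m for m in messages if (m.text or "").strip()]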
The most basic usage is to simply create an MLXChatClient instance and use it like any other chat client. You may also want to configure generation parameters using the MLXGenerationConfig class. Here’s a quick example:
from agent_framework import ChatMessage, ChatOptions, Role
from agent_framework_mlx import MLXChatClient, MLXGenerationConfig

config = MLXGenerationConfig(
    temp=0.7,
    max_tokens=200,
    verbose=False
)

client = MLXChatClient(
    model_path="mlx-community/Phi-4-mini-instruct-4bit",
    generation_config=config
)

messages = [
    ChatMessage(role=Role.SYSTEM, text="You are a helpful assistant."),
    ChatMessage(role=Role.USER, text="Explain quantum computing to a 5 year old in one sentence.")
]

# get_response is async, so this needs to run inside an async function
response = await client.get_response(messages=messages, chat_options=ChatOptions())
print(f"🤖 Assistant: {response.text}")
You can also take advantage of streaming generation:
async for update in client.get_streaming_response(messages=messages, chat_options=ChatOptions()):
    print(update.text, end="", flush=True)
print("\n")
Integration with Agent Framework
Of course, just using the MLXChatClient directly is not very exciting. The real power comes when you integrate it into an Agent Framework workflow. Since MLXChatClient is a subclass of BaseChatClient, you can use it wherever a chat client is expected, such as when creating an agent.
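For instance, here is a minimal sketch of wrapping the MLX client in an agent, reusing the client instance from the earlier snippet and assuming the standard ChatAgent.run API from agent_framework:

from agent_framework import ChatAgent

# `client` is the MLXChatClient created in the earlier example.
agent = ChatAgent(
    name="Local_SLM",
    instructions="You are a helpful assistant.",
    chat_client=client
)

# run() executes a single turn; the response object exposes the generated text.
result = await agent.run("Summarize what MLX is in one sentence.")
print(result.text)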
Below is an example of a workflow where a local MLX-powered agent is the first point of contact. If the local agent is unsure about the answer, it falls back to a cloud-based agent using Azure AI. This is a concept that I discussed in a previous post - except now with the MLX integration library, it’s much easier to set up.
from agent_framework import ChatAgent, WorkflowBuilder
from agent_framework.azure import AzureAIAgentClient
from agent_framework_mlx import MLXChatClient
from azure.identity.aio import AzureCliCredential

local_client = MLXChatClient(
    model_path="mlx-community/Phi-4-mini-instruct-4bit",
    generation_config=config  # the MLXGenerationConfig from the earlier snippet
)

async with (
    AzureCliCredential() as credential,
    AzureAIAgentClient(async_credential=credential).create_agent(
        name="Cloud_LLM",
        instructions="You are a fallback expert. The previous assistant was unsure. Provide a complete answer.",
    ) as cloud_agent,
):
    local_agent = ChatAgent(
        name="Local_SLM",
        instructions="You are a helpful assistant.",
        chat_client=local_client
    )

    builder = WorkflowBuilder()
    builder.set_start_executor(local_agent)
    builder.add_edge(
        source=local_agent,
        target=cloud_agent,
        condition=should_fallback_to_cloud
    )

    workflow = builder.build()
    # interact with the workflow...
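The should_fallback_to_cloud condition is user-defined - the exact message type it receives depends on how the workflow is wired up, so the version below is just a duck-typed sketch that escalates to the cloud agent whenever the local model's answer sounds unsure:

UNCERTAINTY_MARKERS = ("i'm not sure", "i am not sure", "i don't know", "cannot answer")

def should_fallback_to_cloud(message) -> bool:
    # Pull the text out of whatever object the workflow passes along the edge
    # (duck-typed on purpose, since the exact type depends on the executors used)
    # and fall back to the cloud agent when the local answer sounds uncertain.
    text = getattr(message, "text", str(message)).lower()
    return any(marker in text for marker in UNCERTAINTY_MARKERS)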
Final thoughts
The code is, of course, available on GitHub and, as always, licensed under MIT. I hope this library makes it easier for developers to harness the power of local MLX models in their Agent Framework applications!
Currently the library only supports Python, but if there is demand for a .NET variant, we can definitely make that happen as well.