Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine


I recently announced Strathweb Phi Engine, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with AutoGen.

AutoGen Background

AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. Using AutoGen, developers can also flexibly define agent interactions and behaviors, and construct sophisticated workflows around them.

The original paper introducing AutoGen is a great starting point for understanding the framework and the benefits of using agentic workflows.

Initially, AutoGen was written in (surprise!) Python, but has since been officially ported to C# as well. This is what we will be using in this post, since, of course, Strathweb Phi Engine provides a C# API as well.

Phi-3 in AutoGen

AutoGen ships with built-in integrations for various model providers, such as OpenAI, Gemini, Anthropic or Mistral, which makes it easy to connect agents to those model deployments. Additionally, it integrates with general-purpose model orchestrators such as Ollama or LM Studio, which can run models locally and expose them to the AutoGen workflow via a local HTTP server.

This is also a quick and easy way to integrate a Phi-3 model into AutoGen - download it, for example, into LM Studio, and expose it over a local HTTP endpoint. At the same time, it's not a convenient approach to building an application that you would want to distribute - after all, you would not want to have this additional dependency in production, or even worse, require your users to install and run a third-party orchestrator just to be able to use your application.
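To make the "local HTTP endpoint" approach concrete, here is a minimal sketch of what talking to such an orchestrator involves. It is not taken from AutoGen or LM Studio code. The URL (an OpenAI-compatible chat completions route on port 1234), the "phi-3" model name and the LocalEndpointSketch class are all assumptions made for illustration, so adjust them to whatever your local server actually exposes.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class LocalEndpointSketch
{
    // Assumption: the orchestrator serves an OpenAI-compatible route at this address.
    public const string Endpoint = "http://localhost:1234/v1/chat/completions";

    // Builds the JSON body for a single-turn chat request.
    public static string BuildRequestJson(string systemPrompt, string userMessage)
    {
        var payload = new
        {
            model = "phi-3", // placeholder name, depends on the model loaded locally
            messages = new[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = userMessage }
            }
        };
        return JsonSerializer.Serialize(payload);
    }

    // Posts the request and returns the raw JSON response.
    // This only works while the separate orchestrator process is running.
    public static async Task<string> SendAsync(HttpClient client, string json)
    {
        using var content = new StringContent(json, Encoding.UTF8, "application/json");
        using var response = await client.PostAsync(Endpoint, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The call in SendAsync fails unless that second process is up and listening, which is exactly the extra runtime dependency described above.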

This is where Strathweb Phi Engine comes in. It allows you to run Phi-3 models locally, without any additional dependencies, with the model running in-process. And specifically for AutoGen, there is now a new helper NuGet package, Strathweb.Phi.Engine.AutoGen, that provides the AutoGen IAgent and IStreamingAgent implementations, allowing you to easily integrate a local Phi-3 model into an AutoGen workflow.

Using Strathweb Phi Engine with AutoGen

The project repo contains a set of examples, ported from the official AutoGen examples, that demonstrate how to use a truly local, in-process Phi-3 with AutoGen via Strathweb Phi Engine.

The most basic example would look as follows. First, bootstrap the local Phi-3 model:

var cacheDir = Path.Combine(Directory.GetCurrentDirectory(), ".cache");
var modelBuilder = new PhiEngineBuilder();
var model = modelBuilder.Build(cacheDir);

Then create the LocalPhiAgent and use it in the AutoGen workflow:

var assistantAgent = new LocalPhiAgent("assistant", model, "You convert what user said to all uppercase.")
    .RegisterPrintMessage();

var reply = await assistantAgent.SendAsync("hello world");

// TextMessage from assistant
// --------------------
// HELLO WORLD
// --------------------

LocalPhiAgent is non-streaming, so the model's output is buffered and only returned once the model has finished processing the input. If you need streaming, you can use LocalStreamingPhiAgent instead. Note that you need to set the streaming handler on the PhiEngineBuilder first, and then pass the same handler to the LocalStreamingPhiAgent.

The below example is the same as before, except now the output is streamed as it is being processed by the model.

// 'handler' is the streaming event handler instance; its creation is not shown in this post
var cacheDir = Path.Combine(Directory.GetCurrentDirectory(), ".cache");
var modelBuilder = new PhiEngineBuilder();
modelBuilder.WithEventHandler(new BoxedPhiEventHandler(handler));
var model = modelBuilder.Build(cacheDir);

var assistantAgent = new LocalStreamingPhiAgent("assistant", model, "You convert what user said to all uppercase.", handler)
    .RegisterPrintMessage();

var reply = await assistantAgent.SendAsync("hello world");

// TextMessage from assistant
// --------------------
// HELLO WORLD
// --------------------
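The streaming example above hands the same handler to both the builder and the agent, but the post does not show how that handler is implemented. As a mental model only, the sketch below pictures such a handler as a bridge. The inference side pushes tokens in, and the agent side reads them out as they arrive. ChannelTokenHandler and its members are hypothetical names and do not come from Strathweb Phi Engine, so the library's real handler may work differently.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

// Hypothetical type for illustration only, not part of Strathweb Phi Engine.
public sealed class ChannelTokenHandler
{
    private readonly Channel<string> _tokens = Channel.CreateUnbounded<string>();

    // Producer side: called once for every token the model generates.
    public void OnToken(string token) => _tokens.Writer.TryWrite(token);

    // Producer side: called when generation has finished.
    public void OnCompleted() => _tokens.Writer.TryComplete();

    // Consumer side: yields tokens as they arrive and ends after OnCompleted.
    public IAsyncEnumerable<string> ReadTokensAsync() => _tokens.Reader.ReadAllAsync();
}
```

A consumer would iterate with await foreach over ReadTokensAsync() and print each token immediately. That is the observable difference from the buffered LocalPhiAgent.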

Of course, the whole point of using AutoGen is to build more complex workflows, so the examples in the project repo also demonstrate how to build more sophisticated interactions between agents and how to orchestrate them. One such example (ported from the official AutoGen examples) is the interaction between a student and a teacher, where the teacher asks math questions and the student answers them.

var cacheDir = Path.Combine(Directory.GetCurrentDirectory(), ".cache");
var modelBuilder = new PhiEngineBuilder();
modelBuilder.WithEventHandler(new BoxedPhiEventHandler(handler));
var model = modelBuilder.Build(cacheDir);

// teacher agent - asks questions and checks answers
var teacher = new LocalStreamingPhiAgent("teacher", model,
        @"You are a teacher that asks a pre-school math question for student. Ask a question but do not provide the answer. As soon as student provides the answer, check the answer.
If the answer is correct, praise the student and stop the conversation by saying [COMPLETE].
If the answer is wrong, you ask student to fix it. Do not ask a new question.", handler)
    .RegisterMiddleware(async (msgs, option, agent, _) =>
    {
        var reply = await agent.GenerateReplyAsync(msgs, option);
        if (reply.GetContent()?.ToLower().Contains("complete") is true)
        {
            return new TextMessage(Role.Assistant, GroupChatExtension.TERMINATE, from: reply.From);
        }
        return reply;
    })
    .RegisterPrintMessage();

// student agent - answers the math questions
var student = new LocalStreamingPhiAgent("student", model,
        "You are a student that answers questions from teacher", handler)
    .RegisterPrintMessage();

// start the conversation
var conversation = await student.InitiateChatAsync(
            receiver: teacher,
            message: "Hey teacher, please create a math question for me.",
            maxRound: 10);
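One detail of the teacher's middleware is worth spelling out. It terminates the chat whenever the reply contains "complete" in any casing, which is a looser condition than the "[COMPLETE]" marker requested in the prompt. The sketch below isolates that check as plain C# without any AutoGen types, and puts a stricter variant next to it. TerminationCheck and both method names are hypothetical and only serve to illustrate the difference.

```csharp
using System;

public static class TerminationCheck
{
    // Mirrors the middleware's condition: any occurrence of "complete" counts,
    // regardless of casing, so a reply containing "incomplete" matches too.
    public static bool ContainsCompleteLoosely(string? content) =>
        content?.ToLower().Contains("complete") is true;

    // Hypothetical stricter variant: only the bracketed marker from the prompt counts.
    public static bool ContainsCompleteMarker(string? content) =>
        content is not null &&
        content.Contains("[COMPLETE]", StringComparison.OrdinalIgnoreCase);
}
```

The loose check is more forgiving when a small model drops the brackets. The strict one avoids ending the conversation early if the teacher happens to tell the student that an answer is "incomplete".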

Notice that the PhiEngine model instance we use is stateless, so the same instance can serve both agents, which helps to keep the memory footprint lower. It is of course also possible to run one agent as a local Phi-3 agent and another one as a remote one, for example using the OpenAI agent integration.

Conclusion

Strathweb Phi Engine provides a convenient, low ceremony, pretty much plug-and-play way to run Phi-3 models locally, without any 3rd party software or additional dependencies, and integrate them into AutoGen workflows. The surfaced APIs are fully managed, meaning no unsafe code needs to be written.

The new Strathweb.Phi.Engine.AutoGen package provides the necessary IAgent and IStreamingAgent implementations, allowing you to easily integrate a local Phi-3 model into an AutoGen workflow.

The current limitations for AutoGen are the same as those of Strathweb Phi Engine itself, namely that at the moment only GGUF models are supported (safetensors are coming soon). Also, Phi-3 models have not been officially trained with function/tool calling support, so this is not supported yet. There are some approaches to achieving it, though, so it is on the roadmap.

The Strathweb.Phi.Engine.AutoGen package is not available on NuGet yet, but it can be easily built from the source using the instructions in the repo, or it can be downloaded from the build artifacts.

About


Hi! I'm Filip W., a software architect from Zürich 🇨🇭. I like Toronto Maple Leafs 🇨🇦, Rancid and quantum computing. Oh, and I love the Lowlands 🏴󠁧󠁢󠁳󠁣󠁴󠁿.

You can find me on Github, on Mastodon and on Bluesky.
