Local AI inference has become increasingly important for developers seeking to build robust, privacy-preserving applications. In this deep dive, I’ll show you how to leverage the Strathweb Phi Engine multi-platform library to run Microsoft’s Phi-family models directly in your .NET applications, exploring both basic integration patterns and advanced features that make Phi inference more accessible than ever.
Getting Started with Phi Inference in .NET π
The major value propositions of Strathweb Phi Engine are:
- its plug-and-play approach to running Phi models locally - it can silently download them for you from HuggingFace, cache them for later reuse, and load them on demand
- its ability to run the models in-process, without any additional installations, dependencies, daemons or third-party orchestrators
- its strongly typed API that allows you to interact with the models in a type-safe manner, without having to deal with raw JSON payloads, raw tokens or complicated streaming interfaces
At its core, Strathweb Phi Engine provides a straightforward way to run Phi-3 models in your applications. Let’s look at a basic example:
using Strathweb.Phi.Engine;

bool isNonQuantizedMode = args.Contains("--non-quantized");

// Configure the inference parameters
var inferenceOptionsBuilder = new InferenceOptionsBuilder();
inferenceOptionsBuilder.WithTemperature(0.9);
inferenceOptionsBuilder.WithTokenCount(100);
var inferenceOptions = inferenceOptionsBuilder.Build();

// Models downloaded from HuggingFace are cached here for later reuse
var cacheDir = Path.Combine(Directory.GetCurrentDirectory(), ".cache");

var modelBuilder = new PhiEngineBuilder();

// Pick either the full safetensors variant or the 4-bit quantized GGUF variant
PhiModelProvider modelProvider = isNonQuantizedMode ?
    new PhiModelProvider.HuggingFace("microsoft/Phi-3-mini-4k-instruct", "main") :
    new PhiModelProvider.HuggingFaceGguf("microsoft/Phi-3-mini-4k-instruct-gguf", "Phi-3-mini-4k-instruct-q4.gguf", "main");
modelBuilder.WithEventHandler(new BoxedPhiEventHandler(new ModelEventsHandler()));
modelBuilder.WithModelProvider(modelProvider);

// Stateful model: the engine tracks the conversation history for us
var model = modelBuilder.BuildStateful(cacheDir, "You are a hockey poet");
var result = model.RunInference("Write a haiku about ice hockey", inferenceOptions);

Console.WriteLine($"{Environment.NewLine}Tokens Generated: {result.tokenCount}{Environment.NewLine}Tokens per second: {result.tokensPerSecond}{Environment.NewLine}Duration: {result.duration}s");
class ModelEventsHandler : PhiEventHandler
{
    public void OnInferenceEnded() {}
    public void OnInferenceStarted() {}
    public void OnInferenceToken(string token) => Console.Write(token);
    public void OnModelLoaded() => Console.WriteLine("Model loaded!");
}
This sample allows you to run inference locally, with full control over model parameters and execution flow. The engine handles model downloading, caching, and efficient inference out of the box.
Additionally, both GGUF and safetensors formats are supported, and you can switch between them by simply changing the model provider. You can also load a pre-downloaded model from a local directory by using the PhiModelProvider.FileSystemGguf and PhiModelProvider.FileSystem providers.
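For instance, loading a local GGUF file could look roughly like this - a minimal sketch in which the constructor arguments of FileSystemGguf are my assumption, so check the project’s samples for the exact signature:

// Sketch: point the engine at a GGUF file already on disk, skipping the
// HuggingFace download (the exact parameters of FileSystemGguf are assumed here)
PhiModelProvider localProvider = new PhiModelProvider.FileSystemGguf(
    Path.Combine(Directory.GetCurrentDirectory(), "models", "Phi-3-mini-4k-instruct-q4.gguf"));
modelBuilder.WithModelProvider(localProvider);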
Strathweb Phi Engine supports both stateful and stateless models. In the stateful case (BuildStateful()), the engine takes care of managing the conversation history, allowing you to send only individual messages to the model. This is akin to the Assistants API of OpenAI. In the stateless case (Build()), you have to send the entire conversation history on each request (which is similar to the standard Chat Completions API of many LLMs).
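For example, with the stateful model built in the sample above, a follow-up prompt can build on the previous turn without resending any history:

// The stateful model keeps track of the conversation internally,
// so the second call only needs to carry the new user message
var first = model.RunInference("Write a haiku about ice hockey", inferenceOptions);
var second = model.RunInference("Now write one about the goalie", inferenceOptions);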
Embracing .NET’s AI Evolution π
Building on the recent integration of Strathweb Phi Engine with AutoGen that I discussed previously, I’m excited to share another cool development in making local Phi models more accessible within the .NET ecosystem.
Microsoft recently unveiled Microsoft.Extensions.AI, introducing a set of unified abstractions for AI services in .NET. Much like how Microsoft.Extensions.Logging revolutionized logging standardization, these abstractions aim to create a common language for interacting with AI services across the ecosystem.
Today, I’m pleased to announce that Strathweb Phi Engine now fully supports the IChatClient interface from Microsoft.Extensions.AI.Abstractions. This integration is available through a new NuGet package, Strathweb.Phi.Engine.Microsoft.Extensions.AI. While the package isn’t yet published to NuGet.org, you can access it through the project’s build artifacts. This compatibility layer enables interchangeability between local Phi models orchestrated by Strathweb Phi Engine and the broader spectrum of AI services in the .NET ecosystem, marking another step forward in our journey to make local AI inference more accessible and standardized.
Here’s how you can use Strathweb Phi Engine with the new IChatClient interface:
using Strathweb.Phi.Engine;
using Strathweb.Phi.Engine.Microsoft.Extensions.AI;
using ChatMessage = Microsoft.Extensions.AI.ChatMessage;
using ChatRole = Microsoft.Extensions.AI.ChatRole;
var cacheDir = Path.Combine(Directory.GetCurrentDirectory(), ".cache");
var handler = new StreamingEventHandler();
var modelBuilder = new PhiEngineBuilder();
modelBuilder.WithEventHandler(new BoxedPhiEventHandler(handler));

// Stateless model: the IChatClient caller supplies the conversation on each call
var model = modelBuilder.Build(cacheDir);
// Create an IChatClient instance
var chatClient = model.AsChatClient("Local Phi-3 Demo", handler,
    systemInstruction: "You convert what user said to all uppercase.");
// Send a message using the standard interface
var message = new ChatMessage(ChatRole.User, "hello world");
var response = await chatClient.CompleteAsync([message]);
Console.WriteLine(response);
This prints HELLO WORLD to the console.
The beauty of this approach lies in its flexibility. As highlighted in Microsoft’s announcement, the IChatClient interface allows you to write code that works consistently across different AI providers. This means you can easily switch between local Phi models and cloud services like Azure OpenAI or Ollama without changing your application logic:
IChatClient client =
    environment.IsDevelopment() ?
        model.AsChatClient("Local Phi-3 Demo", handler,
            systemInstruction: "You convert what user said to all uppercase.") :
        new AzureAIInferenceChatClient(...);
One of the key features of IChatClient is streaming support, which allows you to receive tokens in real time as they’re generated. This is of course supported by Strathweb Phi Engine, and you can leverage it accordingly:
await foreach (var update in chatClient.CompleteStreamingAsync([
    new ChatMessage(ChatRole.System, "you are an ice hockey poet"),
    new ChatMessage(ChatRole.User, "write a haiku")
]))
{
    Console.Write(update);
}
This prints the generated tokens as they’re produced by the model.
The integration with Microsoft.Extensions.AI brings several technical advantages:
- Middleware support: leverage the growing ecosystem of Microsoft.Extensions.AI middleware for capabilities like caching, logging, and telemetry (see the sketch after this list)
- Standardized testing: write tests against the IChatClient interface rather than specific implementations
- Hybrid deployments: easily switch between local and cloud-based inference based on your requirements
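As a quick illustration of the middleware point, here is a minimal sketch of wrapping the local Phi IChatClient in caching and logging middleware. Keep in mind the ChatClientBuilder pipeline API has shifted between the Microsoft.Extensions.AI preview releases, so the exact shape may differ in your package version:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

// Sketch: decorate the local Phi IChatClient with standard middleware
IDistributedCache cache = new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions()));
using ILoggerFactory loggerFactory = LoggerFactory.Create(b => b.AddConsole());

IChatClient pipeline = new ChatClientBuilder(chatClient)
    .UseDistributedCache(cache)   // identical requests are served from the cache
    .UseLogging(loggerFactory)    // requests and responses are logged
    .Build();

// The decorated client is used exactly like the raw one
var cachedResponse = await pipeline.CompleteAsync([new ChatMessage(ChatRole.User, "hello world")]);
Console.WriteLine(cachedResponse);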
Summary π
Strathweb Phi Engine is a powerful tool for running Phi models in .NET applications. With the recent integration with Microsoft.Extensions.AI, you can now seamlessly plug local Phi models into the broader .NET AI ecosystem. This opens up new possibilities for building robust applications that leverage the power of AI without compromising on user data privacy.
The library is not yet available on NuGet.org, but building from source is straightforward. You can also download the NuGet packages from the build artifacts on the project’s GitHub page (the packages are built after every merge to the main branch). I encourage you to try out the library and reach out to me with any feedback or questions (or just open a GitHub issue).