Running Phi Inference in .NET Applications with Strathweb Phi Engine

Local AI inference has become increasingly important for developers seeking to build robust, privacy-preserving applications. In this deep dive, I’ll show you how to use the Strathweb Phi Engine multi-platform library to run Microsoft’s Phi-family models directly in your .NET applications, exploring both basic integration patterns and advanced features that make Phi inference more accessible than ever.
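To give a feel for what this looks like in practice, here is a minimal sketch of running a prompt through the engine from C#. The namespace, type and member names below are assumptions for illustration only - the actual API surface is documented in the Strathweb Phi Engine repository.

```csharp
// Hedged sketch - the names below are illustrative assumptions, not the library's exact API.
using Strathweb.Phi.Engine; // assumed namespace of the C# bindings

// build an engine instance, letting the library download and cache the model weights
var engineBuilder = new PhiEngineBuilder();
var model = engineBuilder.Build("phi-cache"); // assumed: local folder used to cache model files

// run a prompt against the local model and print the completion
var options = new InferenceOptionsBuilder().Build(); // assumed options builder with default settings
var result = model.RunInference("Write a haiku about local AI inference.", options);
Console.WriteLine(result);
```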

Decorating a Quantum Christmas Tree with Q# and Qiskit

For a few years in a row now, around this time of the year, I have been writing a festive Q# quantum computing post. This year I would like to keep the tradition going and explore another fun topic.

Ever wondered what would happen if we let quantum mechanics decorate a 🎄 Christmas tree? Let’s explore a quantum program - in both my favorite quantum programming language, Q#, as well as in Qiskit - that makes quantum effects visible through festive decorations.

Generating OpenQASM from Q# code

In the summer of 2024, I announced the Q# Bridge library, which allows you to run Q# simulations from many popular high-level languages - C#, Swift, Python and Kotlin. Today, I would like to write about a brand new feature in the library: the ability to generate OpenQASM 2.0 code from Q# source.

This is a feature that the Q# toolchain does not natively support, and it adds to the value proposition of Q# Bridge - acting as a literal bridge between Q# and other ecosystems (traditional languages or, in this case, other quantum toolchains).
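As an illustration of the kind of round-trip this enables, the sketch below feeds a small Q# program into a conversion call and prints the resulting OpenQASM 2.0. The QsharpBridge.ToQasm2 helper name is an assumption for illustration - the actual binding may be named differently - while the commented output is simply standard OpenQASM 2.0 for a Bell pair.

```csharp
// Hedged sketch: the conversion entry point name is an assumption, not the library's exact API.
var qsharpSource = """
    namespace Demo {
        open Microsoft.Quantum.Intrinsic;

        operation Bell() : (Result, Result) {
            use (q1, q2) = (Qubit(), Qubit());
            H(q1);
            CNOT(q1, q2);
            let results = (M(q1), M(q2));
            ResetAll([q1, q2]);
            return results;
        }
    }
    """;

// assumed helper name for illustration only
var qasm = QsharpBridge.ToQasm2(qsharpSource);
Console.WriteLine(qasm);

// A Bell pair like the one above corresponds to OpenQASM 2.0 along these lines:
// OPENQASM 2.0;
// include "qelib1.inc";
// qreg q[2];
// creg c[2];
// h q[0];
// cx q[0], q[1];
// measure q[0] -> c[0];
// measure q[1] -> c[1];
```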

Simplifying the AI workflow: Access different types of model deployments with Azure AI Inference

In this post, we will explore the flexibility behind Azure AI Inference, a new library from Azure, which allows us to run inference against a wide range of AI model deployments - both in Azure and, as we will see in this notebook, in other places as well.

It is available for Python and for .NET - in this post, we will focus on the Python version.
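The post works through the Python package, but the overall shape of the API is broadly similar across languages. Here is a hedged sketch using the .NET package (Azure.AI.Inference); exact type and member names may differ between preview versions, so treat it as an approximation rather than a reference.

```csharp
using Azure;
using Azure.AI.Inference;

// point the client at a model deployment - this can be an Azure AI endpoint,
// or any other endpoint that speaks the same inference API
var endpoint = new Uri(Environment.GetEnvironmentVariable("AI_ENDPOINT")!);
var credential = new AzureKeyCredential(Environment.GetEnvironmentVariable("AI_KEY")!);
var client = new ChatCompletionsClient(endpoint, credential);

var options = new ChatCompletionsOptions
{
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("In one sentence, what is Azure AI Inference?")
    }
};

// send the chat completion request and print the model's reply
Response<ChatCompletions> response = await client.CompleteAsync(options);
Console.WriteLine(response.Value.Content);
```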

Strathweb Phi Engine - now with Safe Tensors support

This summer, I announced the Strathweb Phi Engine - a cross-platform library for running Phi inference anywhere. Up until now, the library only supported models in the quantized GGUF format. Today, I’m excited to share that the library now also supports the Safe Tensors model format.

This enhancement significantly expands the scope of use cases and interoperability for the Strathweb Phi Engine. With Safe Tensor support, you can now load and execute models in a format that is not only performant but also prioritizes security and memory safety. Notably, all the Phi models published by Microsoft use the Safe Tensor format by default.

How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o

GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don’t need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some aspects of which are rather contradictory!

Speech-based retrieval augmented generation (RAG) with GPT-4o Realtime API

On October 1st, OpenAI and Microsoft (Azure OpenAI) announced the availability of the GPT-4o Realtime API for speech and audio. It is a new, innovative way of interacting with the GPT-4o model family, which provides a “speech in, speech out” conversational interface. In contrast to traditional text-based APIs, the Realtime API allows sending the audio input directly to the model and receiving the audio output back. This is a significant improvement over existing approaches to voice-enabled assistants, which required converting the audio to text first, and then converting the text back to audio. The Realtime API is currently in preview, and the SDKs for various languages offer mixed levels of support for it, but it is already possible to build exciting new applications with it.

The low-latency speech-based interface also poses some challenges to established AI architectural patterns, such as Retrieval-Augmented Generation (RAG) - and today we will tackle just that, and have a look at a small sample realtime-voice RAG app in .NET.

Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine

I recently announced Strathweb Phi Engine, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with AutoGen.
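There is more than one way to wire this up, so the sketch below shows just one possible shape, assuming the AutoGen .NET package (AutoGen.Core): a custom agent that forwards the conversation to a local Phi-3 host. The ILocalPhiInference abstraction is a hypothetical stand-in for a wrapper around Strathweb Phi Engine, and the AutoGen interface details may differ from what the post actually uses.

```csharp
using AutoGen.Core;

// hypothetical stand-in for a thin wrapper around Strathweb Phi Engine
public interface ILocalPhiInference
{
    Task<string> GenerateAsync(string prompt, CancellationToken cancellationToken = default);
}

// an AutoGen agent that delegates text generation to the locally hosted Phi-3 model
public class LocalPhiAgent : IAgent
{
    private readonly ILocalPhiInference _phi;

    public LocalPhiAgent(string name, ILocalPhiInference phi)
    {
        Name = name;
        _phi = phi;
    }

    public string Name { get; }

    public async Task<IMessage> GenerateReplyAsync(
        IEnumerable<IMessage> messages,
        GenerateReplyOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // flatten the conversation history into a single prompt for the local model
        var prompt = string.Join("\n", messages.Select(m => m.GetContent()));
        var completion = await _phi.GenerateAsync(prompt, cancellationToken);
        return new TextMessage(Role.Assistant, completion, from: Name);
    }
}
```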

Building a chat app with Blazor WASM, SignalR and post-quantum end-to-end encryption

I have blogged about post-quantum cryptography a few times in the past. Among other things, I released a set of helper libraries for working with Dilithium in .NET and Duende Identity Server, and shared some general samples of post-quantum cryptography in .NET.

Earlier this month, in a big milestone, NIST released the first 3 finalized Post-Quantum encryption standards. I thought it might be nice to celebrate this by building a simple chat application with Blazor WASM and SignalR that uses post-quantum cryptography for end-to-end encryption.
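The building block that makes the end-to-end part work is a post-quantum key exchange, with the resulting shared secret then driving ordinary symmetric encryption of the chat messages sent over SignalR. Below is a minimal, hedged sketch of that key exchange using BouncyCastle's Kyber (ML-KEM) types - an assumption on my part, since the post may use a different library or parameter set.

```csharp
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Pqc.Crypto.Crystals.Kyber;
using Org.BouncyCastle.Security;

var random = new SecureRandom();

// each chat participant generates an ML-KEM (Kyber) key pair and publishes the public key
var generator = new KyberKeyPairGenerator();
generator.Init(new KyberKeyGenerationParameters(random, KyberParameters.kyber768));
AsymmetricCipherKeyPair recipientKeyPair = generator.GenerateKeyPair();

// the sender encapsulates a shared secret against the recipient's public key;
// only the encapsulation (ciphertext) travels over SignalR
var kemGenerator = new KyberKemGenerator(random);
ISecretWithEncapsulation encapsulated =
    kemGenerator.GenerateEncapsulated((KyberPublicKeyParameters)recipientKeyPair.Public);
byte[] senderSharedSecret = encapsulated.GetSecret();
byte[] ciphertext = encapsulated.GetEncapsulation();

// the recipient recovers the same shared secret with their private key
var extractor = new KyberKemExtractor((KyberPrivateKeyParameters)recipientKeyPair.Private);
byte[] recipientSharedSecret = extractor.ExtractSecret(ciphertext);

// both sides now hold the same secret, which can key e.g. AES-GCM for the actual chat messages
```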

Strathweb.Dilithium for Duende Identity Server now supports automatic key management

Earlier this week, I released version 0.2.0 of my post-quantum cryptography helper library for .NET, Strathweb.Dilithium, which introduces a new feature: automatic key management support in Duende Identity Server. This plugs into the automatic key management capabilities of Duende Identity Server and allows you to automatically generate and manage Dilithium keys for token signing purposes, without having to manually handle key generation and rotation.
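In practice the registration is a small amount of startup code. The sketch below is hedged: the key management options are the real Duende Identity Server API, but the AddDilithiumSupport() extension and the "CRYDI3" algorithm identifier are assumptions on my part - the library's README has the exact registration call.

```csharp
using Duende.IdentityServer.Configuration;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddIdentityServer(options =>
    {
        // opt into Duende's automatic key management, asking for a Dilithium signing algorithm
        options.KeyManagement.Enabled = true;
        options.KeyManagement.SigningAlgorithms = new[]
        {
            new SigningAlgorithmOptions("CRYDI3") // illustrative Dilithium-3 identifier
        };
    })
    .AddDilithiumSupport(); // assumed extension method from Strathweb.Dilithium

var app = builder.Build();
app.UseIdentityServer();
app.Run();
```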

About


Hi! I'm Filip W., a software architect from Zürich 🇨🇭. I like Toronto Maple Leafs 🇨🇦, Rancid and quantum computing. Oh, and I love the Lowlands 🏴󠁧󠁢󠁳󠁣󠁴󠁿.

You can find me on GitHub, on Mastodon and on Bluesky.

My Introduction to Quantum Computing with Q# and QDK book
Microsoft MVP