Fine tuning Phi models with MLX
Recently, I dedicated quite a lot of room on this blog to the topic of running Phi locally with the Strathweb Phi Engine. This time, I want to focus on a different aspect of adopting small language models like Phi - fine-tuning them. We are going to do this with Apple’s MLX library, which offers excellent performance for ML-related tasks on Apple Silicon.
We are going to do LoRA fine tuning of a Phi model, and then invoke it using Strathweb Phi Engine.
Running Phi Inference in .NET Applications with Strathweb Phi Engine
Local AI inference has become increasingly important for developers seeking to build robust, privacy-preserving applications. In this deep dive, I’ll show you how to leverage the Strathweb Phi Engine multi-platform library to run Microsoft’s Phi-family models directly in your .NET applications, exploring both basic integration patterns and advanced features that make Phi inference more accessible than ever.
Decorating a Quantum Christmas Tree with Q# and Qiskit
For a few years in a row now, around this time of the year, I have been writing a festive Q# quantum computing post. This year I would like to keep the tradition going and explore another fun topic.
Ever wondered what would happen if we let quantum mechanics decorate a Christmas tree? Let’s explore a quantum program - in both my favorite quantum programming language, Q#, as well as in Qiskit - that makes quantum effects visible through festive decorations.
Generating OpenQASM from Q# code
In the summer of 2024, I announced the Q# Bridge library, which allows you to run Q# simulations from many popular high-level languages - C#, Swift, Python and Kotlin. Today, I would like to write about a brand new feature in the library: the ability to generate OpenQASM 2.0 code from Q# source.
This is a feature that the Q# toolchain does not natively support, and it adds to the value proposition of Q# Bridge - acting as a literal bridge between Q# and other ecosystems (traditional languages or, in this case, quantum).
Simplifying the AI workflow: Access different types of model deployments with Azure AI Inference
In this post, we will explore the flexibility behind Azure AI Inference, a new library from Azure, which allows us to run inference against a wide range of AI model deployments - both in Azure and, as we will see in this notebook, in other places as well.
It is available for Python and for .NET - in this post, we will focus on the Python version.
Strathweb Phi Engine - now with Safe Tensors support
This summer, I announced the Strathweb Phi Engine - a cross-platform library for running Phi inference anywhere. Up until now, the library only supported models in the quantized GGUF format. Today, I’m excited to share that the library now also supports the Safe Tensor model format.
This enhancement significantly expands the scope of use cases and interoperability for the Strathweb Phi Engine. With Safe Tensor support, you can now load and execute models in a format that is not only performant but also prioritizes security and memory safety. Notably, all the Phi models published by Microsoft use the Safe Tensor format by default.
How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o
GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don’t need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which is rather contradictory!
Speech-based retrieval augmented generation (RAG) with GPT-4o Realtime API
On October 1st, OpenAI and Microsoft (Azure OpenAI) announced the availability of the GPT-4o Realtime API for speech and audio. It is a new, innovative way of interacting with the GPT-4o model family that provides a “speech in, speech out” conversational interface. Contrary to traditional text-based APIs, the Realtime API allows sending the audio input directly to the model, and receiving the audio output back. This is a significant improvement over the existing approaches to voice-enabled assistants, which required converting the audio to text first, and then converting the text back to audio. The Realtime API is currently in preview, and the SDKs for various languages offer mixed levels of support for it, but it is already possible to build exciting new applications with it.
The low-latency speech-based interface also poses some challenges to established AI architectural patterns, such as Retrieval-Augmented Generation (RAG) - and today we will tackle just that, and have a look at a small sample realtime-voice RAG app in .NET.
Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine
I recently announced Strathweb Phi Engine, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with AutoGen.
Building a chat app with Blazor WASM, SignalR and post-quantum end-to-end encryption
I have written about post-quantum cryptography on this blog a few times. Among other things, I released a set of helper libraries for working with Dilithium in .NET and Duende Identity Server, as well as shared some general samples on post-quantum cryptography in .NET.
Earlier this month, in a big milestone, NIST released the first 3 finalized Post-Quantum encryption standards. I thought it might be nice to celebrate this by building a simple chat application with Blazor WASM and SignalR, that uses post-quantum cryptography for end-to-end encryption.
About
Hi! I'm Filip W., a software architect from Zürich 🇨🇭. I like Toronto Maple Leafs 🇨🇦, Rancid and quantum computing. Oh, and I love the Lowlands 🏴󠁧󠁢󠁳󠁣󠁴󠁿.
Recent Posts
- 2025/01/17, Fine tuning Phi models with MLX
- 2024/12/20, Running Phi Inference in .NET Applications with Strathweb Phi Engine
- 2024/12/16, Decorating a Quantum Christmas Tree with Q# and Qiskit
- 2024/12/12, Generating OpenQASM from Q# code
- 2024/11/22, Simplifying the AI workflow: Access different types of model deployments with Azure AI Inference