How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o

GPT-4o-mini is the small, cost-effective version of GPT-4o. It is a great default choice for developers who want a capable, fast model but don’t need the full power of GPT-4o. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which are rather contradictory!

Speech-based retrieval augmented generation (RAG) with GPT-4o Realtime API

On October 1st, OpenAI and Microsoft (Azure OpenAI) announced the availability of the GPT-4o Realtime API for speech and audio. It is a new, innovative way of interacting with the GPT-4o model family, which provides a “speech in, speech out” conversational interface. In contrast to traditional text-based APIs, the Realtime API allows sending audio input directly to the model and receiving audio output back. This is a significant improvement over existing approaches to voice-enabled assistants, which required converting the audio to text first, and then converting the text response back to audio. The Realtime API is currently in preview, and the SDKs for various languages offer mixed levels of support for it, but it is already possible to build exciting new applications with it.

The low-latency speech-based interface also poses some challenges to established AI architectural patterns, such as Retrieval-Augmented Generation (RAG) - and today we will tackle just that, with a look at a small realtime-voice RAG sample app in .NET.

Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine

I recently announced Strathweb Phi Engine, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with AutoGen.

Building a chat app with Blazor WASM, SignalR and post-quantum end-to-end encryption

I have blogged about post-quantum cryptography a few times before. Among other things, I released a set of helper libraries for working with Dilithium in .NET and Duende Identity Server, and shared some general samples on post-quantum cryptography in .NET.

Earlier this month, in a big milestone, NIST released the first three finalized post-quantum cryptography standards. I thought it might be nice to celebrate this by building a simple chat application with Blazor WASM and SignalR that uses post-quantum cryptography for end-to-end encryption.

Strathweb.Dilithium for Duende Identity Server now supports automatic key management

Earlier this week, I released version 0.2.0 of my post-quantum cryptography helper library for .NET, Strathweb.Dilithium, which introduces a new feature: support for automatic key management in Duende Identity Server. It plugs into the automatic key management capabilities of Duende Identity Server and allows you to automatically generate and manage Dilithium keys for token signing purposes, without having to handle key generation and rotation manually.

Announcing Strathweb Phi Engine - a cross-platform library for running Phi-3 anywhere

I recently wrote a blog post about using Rust to run the Phi-3 model on iOS. The post received an overwhelmingly positive response, and I got a lot of questions about running Phi-3 with a similar approach on other platforms, such as Android, Windows, macOS or Linux. Today, I’m excited to announce the project I have been working on recently - Strathweb Phi Engine, a cross-platform library for running Phi-3 (almost) anywhere.

Built-in support for Server Sent Events in .NET 9

Many years ago I wrote a book about ASP.NET Web API. One of the chapters in that book was dedicated to supporting push communication between the server and the client, and one of the covered techniques was a niche technology called Server-Sent Events (SSE). At the time, SSE was not widely supported by browsers; however, it was a super simple and effective way to push data to the client, and a great fit for scenarios where you needed one-way, server-to-client push without much ceremony.

Over the years, SSE has not really gained much traction, and WebSockets have become the de-facto standard for push communication. However, in recent years a certain OpenAI came out with their API that uses SSE for streaming responses from their Large Language Model, and, pretty much overnight, SSE became cool again.
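Part of SSE's charm is that its wire format is just lines of UTF-8 text: `data:` lines carry the payload, an optional `event:` line names the event type, and a blank line terminates each event. As a rough, language-agnostic illustration (this is not the .NET API, just a sketch of the protocol itself):

```python
def parse_sse(lines):
    """Parse lines of Server-Sent Events text, yielding (event, data) tuples.
    A blank line terminates an event; the default event type is "message"."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("data:"):
            data.append(line[5:].lstrip(" "))
        elif line.startswith("event:"):
            event = line[6:].lstrip(" ")
        # lines starting with ':' are comments and are ignored

stream = [
    "event: update\n",
    "data: hello\n",
    "\n",
    "data: world\n",
    "\n",
]
events = list(parse_sse(stream))
# → [("update", "hello"), ("message", "world")]
```

This is exactly the shape of the streamed chunks you see when calling an LLM completion endpoint with streaming enabled - each token batch arrives as a `data:` line.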

In .NET 9, SSE is finally getting first-class client-side support, and the first bits shipped this week in a .NET 9 preview release.

Announcing Q# Bridge - a library bringing the Q# simulator and tools to C#, Swift and Kotlin

Over the past year, Q# and the QDK have undergone a massive transformation, with the entire toolchain moving to Rust - which resulted in a significant performance improvement, better portability of the toolchain and the ability to run Q# on a wide range of platforms. This was especially striking compared to the 0.x versions of the QDK, which were coupled to the .NET SDK.

Today I would like to announce a new Q# ecosystem project called Q# Bridge, built under the Q# Community organization. It is a lightweight wrapper around the Q# simulator and tooling (such as resource estimation, QIR generation or circuit descriptions), providing a simple API to use them from C#, Swift and Kotlin, without the need to write any Rust code or deal with marshaling or FFI directly.

Running Microsoft's Phi-3 Model in an iOS app with Rust

Last month, Microsoft released the exciting new minimal AI model, Phi-3 mini. It’s a 3.8B model that can outperform many larger models, while still being small enough to run on a phone. In this post, we’ll explore how to run the Phi-3 model inside a SwiftUI iOS application using candle, the minimalist ML framework for Rust built by the nice folks at Hugging Face.

Tool Calling with Azure OpenAI - Part 2: Using the tools directly via the SDK

Last time around, we discussed how Large Language Models can select the appropriate tool and its required parameters out of free-flowing conversation text. We also introduced the formal concept of those tools, which are structurally described using a JSON schema.
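As a quick refresher, such a tool declaration describes a function by name and specifies its parameters as a JSON schema. A minimal, hypothetical example (the `get_weather` name and its parameter are purely illustrative, not from the sample apps):

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Gets the current weather for a given city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The name of the city"
        }
      },
      "required": ["city"]
    }
  }
}
```

The model never executes anything itself - it merely responds with the tool name and a JSON arguments payload matching this schema, and the calling application is responsible for actually invoking the function.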

In this part 2 of the series, we are going to build two different .NET command line assistant applications, both taking advantage of the tool calling integration. We will orchestrate everything by hand - that is, we will only use the Azure OpenAI Service API directly (or rather, the .NET SDK for Azure OpenAI) - without any additional AI frameworks.

About

Hi! I'm Filip W., a cloud architect from Zürich 🇨🇭. I like the Toronto Maple Leafs 🇨🇦, Rancid and quantum computing. Oh, and I love the Lowlands 🏴󠁧󠁢󠁳󠁣󠁴󠁿.

You can find me on GitHub and on Mastodon.

My Introduction to Quantum Computing with Q# and QDK book
Microsoft MVP