Last summer, I launched Strathweb Phi Engine, a cross-platform library for running Phi model inference via a simple, high-level API from a number of languages: C#, Swift, Kotlin and Python.
Today I am happy to announce support for Phi-4, the latest model in the Phi family, which Microsoft AI released in December 2024.
Phi-4 Support
Phi-4 has 14B parameters and is architecturally almost the same as Phi-3, so only minimal changes were needed to support it in the Strathweb Phi Engine (in both GGUF and SafeTensors formats). At the same time, it achieves improved performance over its predecessor and other similarly sized models, primarily due to the more strategic use of synthetic data, improved training methodologies and innovations in the post-training process (more on that can be found in the technical report).
From the inference standpoint, Phi-4 uses the ChatML format for the prompt, a significant difference compared to Phi-3, which relied on the Llama format. For low-level frameworks such a change is inconsequential, as the user has to feed the raw prompt to the model anyway, so the formatting responsibility rests with the caller. However, being a high-level library, Strathweb Phi Engine exposes a strongly typed API that operates on the concept of simple typed messages (similar to, for example, the OpenAI REST API), so some changes were needed to accommodate the new format.
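To make the difference concrete, here is a rough sketch, written as a Swift string purely for illustration, of what typed messages look like once rendered into a ChatML-style raw prompt. The exact special tokens and separators used by Phi-4 are defined by its tokenizer configuration, so treat this as an approximation of the layout rather than the engine’s actual internal output:

// Illustration only: the classic ChatML layout for a system + user exchange.
// Phi-4's tokenizer configuration defines the exact special tokens, which may
// differ slightly from this sketch.
let chatMlStylePrompt = """
<|im_start|>system
You are a hockey poet. Be brief and polite.<|im_end|>
<|im_start|>user
Write a haiku about ice hockey<|im_end|>
<|im_start|>assistant
"""

With the Strathweb Phi Engine you never build this string yourself; once the chat format is set to ChatML, the engine renders the typed conversation into the raw prompt for you.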
Example
The Swift sample in the repo has been updated to demonstrate how to load and use the Phi-4 model. Here’s a shortened version of the sample:
// Point the engine at the Phi-4 GGUF weights on HuggingFace
let modelProvider = PhiModelProvider.huggingFaceGguf(
    modelRepo: "microsoft/phi-4-gguf",
    modelFileName: "phi-4-q4.gguf",
    modelRevision: "main"
)

// Override the default (Phi-3 specific) tokenizer with the Phi-4 one
let tokenizerProvider = TokenizerProvider.huggingFace(
    tokenizerRepo: "microsoft/phi-4",
    tokenizerFileName: "tokenizer.json"
)

// Phi-4 expects ChatML-formatted prompts instead of the default Llama format
let inferenceOptionsBuilder = InferenceOptionsBuilder()
try inferenceOptionsBuilder.withChatFormat(chatFormat: ChatFormat.chatMl)
let inferenceOptions = try inferenceOptionsBuilder.build()

// Event handler that streams tokens and lifecycle notifications to the console
class ModelEventsHandler: PhiEventHandler {
    func onInferenceStarted() {
        print(" ℹ️ Inference started...")
    }

    func onInferenceEnded() {
        print("\n ℹ️ Inference ended.")
    }

    func onInferenceToken(token: String) {
        print(token, terminator: "")
    }

    func onModelLoaded() {
        print("""
         🧠 Model loaded!
        ****************************************
        """)
    }
}

// Wire everything together and build a stateful engine instance
let modelBuilder = PhiEngineBuilder()
try modelBuilder.withEventHandler(eventHandler: BoxedPhiEventHandler(handler: ModelEventsHandler()))
try modelBuilder.withModelProvider(modelProvider: modelProvider)
try modelBuilder.withTokenizerProvider(tokenizerProvider: tokenizerProvider)

// cacheDir is a local directory for downloaded files, defined in the full sample
let model = try modelBuilder.buildStateful(
    cacheDir: cacheDir,
    systemInstruction: "You are a hockey poet. Be brief and polite."
)

_ = try model.runInference(
    promptText: "Write a haiku about ice hockey",
    inferenceOptions: inferenceOptions
)
In short, in order to work with Phi-4, you need to:
- point the library at the Phi-4 model (either local or on HuggingFace, and either in GGUF or SafeTensors format; a rough sketch of the local setup follows after this list)
- point the library at the Phi-4 tokenizer (local or from HuggingFace), as the default tokenizer is Phi-3 specific
- set the chat format to ChatML (the default remains the Llama format)
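For the local variant mentioned in the first two points, the setup looks along these lines. This is a minimal sketch only: the provider case names and parameter labels below are assumptions made for illustration, and the repository documents the exact enum cases for file-based loading.

// Sketch only: the case names `fileSystem`/`file` and their parameter labels
// are illustrative assumptions; see the repository for the exact API surface.
let localModelProvider = PhiModelProvider.fileSystem(
    modelPath: "/path/to/phi-4-q4.gguf"
)
let localTokenizerProvider = TokenizerProvider.file(
    tokenizerPath: "/path/to/tokenizer.json"
)

The rest of the setup (inference options, event handler and engine builder) stays exactly the same as in the HuggingFace-based sample above.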
And that’s it! For the full sample, please refer to the GitHub repository.