This summer, I announced the Strathweb Phi Engine, a cross-platform library for running Phi inference anywhere. Up until now, the library only supported models in the quantized GGUF format. Today, I’m excited to share that the library now also supports the Safe Tensor model format.
This enhancement significantly expands the scope of use cases and interoperability for the Strathweb Phi Engine. With Safe Tensor support, you can now load and execute models in a format that is not only performant but also prioritizes security and memory safety. Notably, all the Phi models published by Microsoft use the Safe Tensor format by default.
Using the Safe Tensor format
In Strathweb Phi Engine, models are loaded by supplying the model location via PhiModelProvider. (Note: The casing used here and in the examples below is specific to the Swift implementation. Other supported languages may use slightly different naming conventions to align with their respective language standards.)
There are now four possibilities for loading models:
- PhiModelProvider.huggingFaceGguf: Loads a model in the GGUF format directly from Hugging Face. This requires specifying the repository name, model filename, and the model version (branch).
- PhiModelProvider.huggingFace: Loads a model in the Safe Tensor format directly from Hugging Face. This requires specifying the repository name and model version (branch). The library automatically detects the necessary model files by looking for model.safetensors.index.json and config.json in the repository.
- PhiModelProvider.fileSystemGguf: Loads a model in the GGUF format from the local filesystem. You must provide the absolute path to the GGUF file.
- PhiModelProvider.fileSystem: Loads a model in the Safe Tensor format from the local filesystem. You must provide the absolute paths to both the model.safetensors.index.json and config.json files.
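For local models, the two filesystem providers follow the same pattern. Here is a minimal sketch: the paths are illustrative, and the argument labels are assumptions inferred from the Hugging Face variants rather than confirmed API, so check the repository samples for the exact signatures.

// Sketch only - paths are illustrative and argument labels are assumed, not confirmed
let localGgufProvider = PhiModelProvider.fileSystemGguf(
    modelPath: "/models/Phi-3-mini-4k-instruct-q4.gguf")

let localSafeTensorProvider = PhiModelProvider.fileSystem(
    indexPath: "/models/phi-3/model.safetensors.index.json",
    configPath: "/models/phi-3/config.json")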
Here’s a complete example of how to load and use the Phi-3-mini-4k model in both GGUF and Safe Tensor formats using Swift:
import Foundation
// Note: also import the generated Strathweb Phi Engine Swift bindings here;
// the exact module name depends on how the package is integrated into your project.

let isQuantizedMode = CommandLine.arguments.contains("--quantized")

if isQuantizedMode {
    print("Quantized mode is enabled.")
} else {
    print("Safe tensors mode is enabled.")
}
let modelProvider = isQuantizedMode ?
    PhiModelProvider.huggingFaceGguf(modelRepo: "microsoft/Phi-3-mini-4k-instruct-gguf", modelFileName: "Phi-3-mini-4k-instruct-q4.gguf", modelRevision: "main") :
    PhiModelProvider.huggingFace(modelRepo: "microsoft/Phi-3-mini-4k-instruct", modelRevision: "main")
// Configure inference options
let inferenceOptionsBuilder = InferenceOptionsBuilder()
try! inferenceOptionsBuilder.withTemperature(temperature: 0.9)
let inferenceOptions = try! inferenceOptionsBuilder.build()

// Model files will be cached in a local .cache directory
let cacheDir = FileManager.default.currentDirectoryPath.appending("/.cache")
// Event handler that streams generated tokens to the console as they are produced
class ModelEventsHandler: PhiEventHandler {
    func onInferenceStarted() {}

    func onInferenceEnded() {}

    func onInferenceToken(token: String) {
        print(token, terminator: "")
    }

    func onModelLoaded() {
        print("""
        Model loaded!
        ****************************************
        """)
    }
}
let modelBuilder = PhiEngineBuilder()
try! modelBuilder.withEventHandler(eventHandler: BoxedPhiEventHandler(handler: ModelEventsHandler()))

// Attempt to enable GPU acceleration; returns whether it could be enabled
let gpuEnabled = try! modelBuilder.tryUseGpu()
try! modelBuilder.withModelProvider(modelProvider: modelProvider)

// Build a stateful engine (keeps conversation context) with a system instruction
let model = try! modelBuilder.buildStateful(cacheDir: cacheDir, systemInstruction: "You are a hockey poet. Be brief and polite.")
// Run inference
let result = try! model.runInference(promptText: "Write a haiku about ice hockey", inferenceOptions: inferenceOptions)
print("""
****************************************
π Tokens Generated: \(result.tokenCount)
π₯οΈ Tokens per second: \(result.tokensPerSecond)
β±οΈ Duration: \(result.duration)s
ποΈ GPU enabled: \(gpuEnabled)
""")
And that’s it! Whether you’re working with quantized models or exploring Safe Tensor use cases, this update opens up new possibilities. You can find more info and samples in the Strathweb Phi Engine repository. Have fun experimenting with Phi!