Using your own data with GPT models in Azure OpenAI - Part 3: Calling Azure OpenAI Service via .NET SDK


In the last post of this series we set up a demo .NET client application that was able to call and utilize a GPT model hosted in Azure OpenAI Service, which in turn was integrated with our own custom data via Azure AI Search. We did this using the bare-bones REST API - and in part three, it's time to shift gears and explore how to accomplish the same task using the .NET SDK, which offers a more streamlined and less ceremonious approach than calling the HTTP endpoints directly.

Adding the .NET SDK πŸ”—

We can add the .NET SDK to our project by referencing the Azure.AI.OpenAI NuGet package, which, at the time of writing, is available in version 1.0.0-beta.12:

<PackageReference Include="Azure.AI.OpenAI" Version="1.0.0-beta.12" />

This SDK can be a powerful ally, enabling us to interact with Azure OpenAI Service in a more .NET-centric way. In particular, it will be able to handle streaming of responses from the cloud, which will offer a significantly better user experience, as the model’s answer chunks will appear progressively on the screen.

In our original implementation we conveniently ignored streaming and used the REST API to fetch the complete response instead, which meant longer waiting times in the UI. The main reason for that was that OpenAI uses server-sent events for streaming, which are fine to work with in JavaScript or, more generally, browser-based scenarios, but are not natively supported by the .NET HttpClient and would require writing some extra plumbing code. The SDK, on the other hand, supports them out of the box and exposes the response stream as IAsyncEnumerable, which is very convenient to work with in .NET applications.
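To put that in perspective, a manual server-sent events reader over HttpClient would look roughly like this - a simplified sketch with a placeholder endpoint and request setup, not code we will actually need in this post:

// Illustrative only: rough shape of the SSE plumbing the SDK spares us from.
// The endpoint, headers and request body are placeholders.
using var httpClient = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Post, "https://<your-resource>.openai.azure.com/...");
// the api-key header and a JSON body with "stream": true would need to be set here

using var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

while (await reader.ReadLineAsync() is { } line)
{
    if (!line.StartsWith("data: ")) continue;

    var payload = line["data: ".Length..];
    if (payload == "[DONE]") break; // marker terminating the OpenAI event stream

    // each payload is a JSON chunk that still needs to be deserialized and rendered
    Console.Write(payload);
}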

Integrating the SDK πŸ”—

Because in the last post we abstracted all of the configuration data away into a reusable AzureOpenAiContext, integrating the SDK into our application requires minimal changes. In fact, the only thing we need to do is write a method that takes an AzureOpenAiContext as input and performs the necessary calls to the Azure OpenAI backend.
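As a quick refresher, the context object carries all the settings referenced below - roughly along these lines (a sketch; the property names are the ones used in this post, while the exact definition lives in the previous part):

// Sketch of the configuration context introduced in the previous post.
// Property names mirror how they are used later in this article;
// the numeric type of SearchDocumentCount is an assumption (it gets cast to int below).
public record AzureOpenAiContext
{
    public string AzureOpenAiServiceEndpoint { get; init; }
    public string AzureOpenAiServiceKey { get; init; }
    public string AzureOpenAiDeploymentName { get; init; }
    public string SystemInstructions { get; init; }
    public bool RestrictToSearchResults { get; init; }
    public string AzureSearchService { get; init; }
    public string AzureSearchKey { get; init; }
    public string AzureSearchIndex { get; init; }
    public uint SearchDocumentCount { get; init; }
    public string AzureSearchQueryType { get; init; }
    public string AzureSearchSemanticSearchConfig { get; init; }
}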

Let’s start by creating a placeholder static method, just like we did last time, and follow it up with the implementation.

static partial class Demo
{
    public static async Task RunWithSdk(AzureOpenAiContext context)
    {
        // all the code goes here
    }
}

In the implementation, we start by creating a new OpenAIClient instance - we need to pass it our service endpoint URL and the key. What follows is the infinite loop which prompts the user for input, reads it, and makes it available for further processing. This is structurally identical to what we did in the last post when working with the REST API, except that back then the client was an HttpClient.

var openAiClient = new OpenAIClient(new Uri(context.AzureOpenAiServiceEndpoint),
            new AzureKeyCredential(context.AzureOpenAiServiceKey));
while (true)
{
    var prompt = Console.ReadLine();
    // todo
}

Next, we have to prepare the request - and instead of manually constructing HTTP requests, we can use the strongly-typed objects and methods provided by the SDK - in our specific case, ChatCompletionsOptions. The fundamental things that are required are the model deployment's name, as well as the messages - the system instructions, passed in as ChatRequestSystemMessage, and the user's prompt, passed in as ChatRequestUserMessage. The usual suspects - properties such as Temperature, MaxTokens or NucleusSamplingFactor (which maps to TopP in the REST API) - are also there, and allow us to control the behavior and creativity of the GPT model.

Additionally, the key property to set in our specific use case is the extensions options. This is similar to what we did with the REST API, and it is where we configure everything related to our Azure AI Search integration. Note that the extension type is still called AzureCognitiveSearchChatExtensionConfiguration, even though the product itself was recently renamed from "Azure Cognitive Search" to "Azure AI Search". I expect this and related classes to be renamed accordingly in the near future, especially as the SDK is still in beta and almost every update at this stage brings breaking changes.

var request = new ChatCompletionsOptions
{
    DeploymentName = context.AzureOpenAiDeploymentName,
    Messages = 
    {
        new ChatRequestSystemMessage(context.SystemInstructions),
        new ChatRequestUserMessage(prompt)
    },
    Temperature = 1,
    MaxTokens = 400,
    NucleusSamplingFactor = 1f,
    AzureExtensionsOptions = new AzureChatExtensionsOptions
    {
        Extensions = 
        { 
            new AzureCognitiveSearchChatExtensionConfiguration
            {
                ShouldRestrictResultScope = context.RestrictToSearchResults,
                SearchEndpoint = new Uri($"https://{context.AzureSearchService}.search.windows.net"),
                Key = context.AzureSearchKey,
                IndexName = context.AzureSearchIndex,
                DocumentCount = (int)context.SearchDocumentCount,
                QueryType = new AzureCognitiveSearchQueryType(context.AzureSearchQueryType),
                SemanticConfiguration = context.AzureSearchQueryType is "semantic" or "vectorSemanticHybrid"
                    ? context.AzureSearchSemanticSearchConfig
                    : "",
                FieldMappingOptions = new AzureCognitiveSearchIndexFieldMappingOptions
                {
                    ContentFieldNames = { "content" },
                    UrlFieldName = "blog_url",
                    TitleFieldName = "metadata_storage_name",
                    FilepathFieldName = "metadata_storage_path"
                },
                RoleInformation = context.SystemInstructions
            } 
        }
    }
};

With the .NET SDK, initiating a streaming request is straightforward. We simply call the GetChatCompletionsStreamingAsync method on our client object, similar to sending a standard request, but with streaming enabled. This approach abstracts away the complexities of HTTP communication, offering a more fluid and responsive interaction model.

var completionResponse = await openAiClient.GetChatCompletionsStreamingAsync(request);

Processing the response πŸ”—

GetChatCompletionsStreamingAsync yields a StreamingResponse, which is an IAsyncEnumerable. This design allows us to iterate over the response, receiving each chunk as it becomes available for processing. Typically, citations (the so-called "tool" response) are received first, followed by the actual message, delivered in multiple segments.

However, it’s worth noting that, as of now, the citations do not have strongly-typed support in the .NET SDK. They are returned as JSON strings encapsulated within a content property. This situation is slightly inconvenient, and it forces us to perform manual post-processing. Despite this, it’s a feature that will likely be integrated into the SDK in the future.

The way we can handle that is by looking at the AzureExtensionsContext property of the response, and attempting to deserialize OpenAICitationResponse out of it. We already have the necessary models, since we set them up last time around when working with the HTTP API. In situations when AzureExtensionsContext is not available, we simply print out the ContentUpdate which constitutes the part of the model’s response chunk that is being streamed down to us at a given moment.
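Before looking at the handling code, here is roughly what those models look like - a sketch inferred from how they are consumed below; the exact definitions come from the previous post:

// Sketch of the citation models reused from the previous post.
// Only the members consumed in this article are shown.
public class OpenAICitationResponse
{
    public Citation[] Citations { get; set; }
}

public class Citation
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string FilePath { get; set; }
}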

var options = new JsonSerializerOptions
{
    PropertyNameCaseInsensitive = true,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};
AnsiConsole.Markup(":robot: ");

OpenAICitationResponse citationResponse = null;
await foreach (var message in completionResponse)
{
    if (message.AzureExtensionsContext != null)
    {
        var extensionMessage = message.AzureExtensionsContext.Messages.FirstOrDefault();
        if (extensionMessage != null && !string.IsNullOrWhiteSpace(extensionMessage.Content))
        {
            citationResponse =
                JsonSerializer.Deserialize<OpenAICitationResponse>(extensionMessage.Content, options);
        }
    }
    else
    {
        Console.Write(message.ContentUpdate);
    }
}

The final piece is the visualization of the citations - which we buffered into the local citationResponse variable so that we can render them after the actual response. This part no longer has anything to do with the .NET SDK - the code is the same as the one we used in the last post.

if (citationResponse != null && citationResponse.Citations.Any())
{
    Console.WriteLine();
    var referencesContent = new StringBuilder();
    referencesContent.AppendLine();

    for (var i = 1; i <= citationResponse.Citations.Length; i++)
    {
        var citation = citationResponse.Citations[i - 1];
        referencesContent.AppendLine($"  :page_facing_up: [[doc{i}]] {citation.Title}");
        referencesContent.AppendLine($"  :link: {citation.Url ?? citation.FilePath}");
    }

    var panel = new Panel(referencesContent.ToString())
    {
        Header = new PanelHeader("References")
    };
    AnsiConsole.Write(panel);
}
else
{
    Console.WriteLine();
}

And that’s it! We have reconstructed the logic we previously wrote against the REST API, this time using the .NET SDK. The response from the model is now nicely streamed back to us, and answer chunks appear progressively - which ensures that our application remains responsive and engaging, even during complex or lengthy data retrieval operations.

Putting it all together πŸ”—

We can close off by adding a toggle selection to our application, which will allow us to switch between the two implementations - the REST API one, which we created last time around, and the new SDK-based one. Since we already use console UI components from Spectre.Console, we can take advantage of its prompt control.

var demo = AnsiConsole.Prompt(
    new SelectionPrompt<string>()
        .Title("Choose the [green]example[/] to run?")
        .AddChoices("REST API", "SDK"));

AnsiConsole.MarkupLine($":robot: I'm a Strathweb AI assistant! Ask me anything about the content from strathweb.com blog!");

switch (demo)
{
    case "REST API":
        await RunWithRestApi(context);
        break;
    case "SDK":
        await RunWithSdk(context);
        break;
    default:
        Console.WriteLine("Nothing selected!");
        break;
}

With everything set up, we can run the application and select the SDK variant. This will enable the response to be streamed to us in real time. Functionally, we maintain the same enriching experience, with the model being effectively grounded in our custom dataset. Below is a sample interaction:

πŸ€– I’m a Strathweb AI assistant! Ask me anything about the content from strathweb.com blog!

what has Filip written about Werner Heisenberg?

πŸ€– Filip, in his blog posts, mentioned Werner Heisenberg while discussing the historical background of quantum mechanics. He singled out Heisenberg as a pivotal figure in the history of quantum mechanics, particularly highlighting his 1925 paper as the foundation of modern quantum mechanics [doc1]. He commended Heisenberg’s genius for revolutionizing the approach to understanding the behavior of subatomic particles. Heisenberg abandoned the deterministic realism inherent to classical physics in favor of a non-commutative understanding of observable properties like momentum and position at the quantum level [doc1]. Heisenberg is also indirectly referenced in relation to the “Copenhagen interpretation”, a mainstream interpretation of quantum mechanics that he championed in collaboration with Niels Bohr and others [doc2]. This interpretation posits that we cannot reason about the reality of a quantum object until it’s actually measured [doc2].

References:

πŸ“„ [doc1] 2020-03-20-intro-to-quantum-computing-with-q-part-1-the-background-and-the-qubit.md

πŸ”— https://www.strathweb.com/2020/03/intro-to-quantum-computing-with-q-part-1-the-background-and-the-qubit/

πŸ“„ [doc2] 2020-04-08-intro-to-quantum-computing-with-q-part-2-superposition.md

πŸ”— https://www.strathweb.com/2020/04/intro-to-quantum-computing-with-q-part-2-superposition/

You can find the source code for this blog post on GitHub. In our next installment, we’ll take our application to the next level by integrating vector search capabilities.
