How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o


GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don't need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which is rather contradictory!

How GPT-4o-mini can be super cheap

GPT-4o-mini is a great model for many use cases. It is fast, accurate, offers a good balance between performance and cost, and is multimodal, just like GPT-4o - both can process image input. GPT-4o-mini costs $0.15 per 1 million input tokens, while GPT-4o (the 2024-08-06 version) costs $2.50 per 1 million input tokens. This means the full model is more expensive by a factor of ~17. The earlier version of GPT-4o (2024-05-13) is even more expensive, at $5.00 per 1 million input tokens, and the older GPT-4 variants are worse still - for example, gpt-4-32k costs a staggering $60.00 per 1 million input tokens, meaning it is 400x more expensive than GPT-4o-mini (!).
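The ratios quoted above follow directly from the per-token prices. Here is a minimal sketch of that arithmetic, using the prices listed in this post (which may well have changed since); the dictionary and variable names are purely illustrative:

```python
# Input prices in USD per 1M tokens, as quoted in this post (subject to change).
PRICE_PER_1M_INPUT = {
    "gpt-4o-mini": 0.15,
    "gpt-4o-2024-08-06": 2.50,
    "gpt-4o-2024-05-13": 5.00,
    "gpt-4-32k": 60.00,
}

mini = PRICE_PER_1M_INPUT["gpt-4o-mini"]

# How many times more expensive each model is than gpt-4o-mini.
ratios = {model: price / mini for model, price in PRICE_PER_1M_INPUT.items()}

for model, ratio in ratios.items():
    print(f"{model}: {ratio:.1f}x the price of gpt-4o-mini")
```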

These are prices at the time of writing this post, and they are subject to change, so always make sure to check the latest pricing on the OpenAI website or Azure OpenAI website (the pricing discussion that follows is applicable to both platforms).

This obviously highlights two things - first, always make sure to upgrade to the latest version of the model, as the price is likely to be lower. Second, when in doubt, start by using mini as the default choice, and only upgrade to the full model if you really need the extra performance.

How GPT-4o-mini can be super expensive

The mini model also has vision capabilities - similar to its bigger brother (or sister?), GPT-4o. Vision input is still charged in tokens, and the image tokens are indeed priced the same as the text ones. However, the way the processing costs are calculated is not as straightforward as with textual input.

The image input can be handled in two modes - low resolution or high resolution, which is specified by the developer making the request. Low resolution mode is designed for use cases where high precision is not necessary, allowing fewer tokens to be used, thus keeping the costs down. The cost for low resolution mode is fixed - regardless of the resolution of the image. For GPT-4o, that fixed cost is 85 tokens.

| GPT-4o | Value |
| --- | --- |
| Price per 1M tokens | $2.50 |
| Total Tokens | 85 |
| Total Price | $0.000213 |

So processing 10 000 such images would cost $2.13.

Given that we just established that GPT-4o-mini is ~17 times cheaper than GPT-4o, we can expect the same for the vision processing, right? This would imply that 10 000 images in low resolution mode should cost us about $0.13. However, the fixed cost for GPT-4o-mini is a surprising 2833 tokens (!). So even with a cheaper price per token, the costs climb quickly.

| GPT-4o-mini | Value |
| --- | --- |
| Price per 1M tokens | $0.15 |
| Total Tokens | 2833 |
| Total Price | $0.000425 |

So processing 10 000 low resolution mode images would cost $4.25 here, making the mini model twice as expensive as GPT-4o (!).
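The low resolution comparison is easy to reproduce. The sketch below only uses the token counts and prices quoted in this post; the function and constant names are hypothetical helpers, not part of any SDK:

```python
# Figures quoted in this post; check current pricing before relying on them.
LOW_RES_TOKENS = {"gpt-4o": 85, "gpt-4o-mini": 2833}   # fixed tokens per image
PRICE_PER_1M = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}   # USD per 1M input tokens


def low_res_cost(model: str, images: int = 1) -> float:
    """Cost in USD of sending `images` low resolution images to `model`."""
    return images * LOW_RES_TOKENS[model] * PRICE_PER_1M[model] / 1_000_000


for model in LOW_RES_TOKENS:
    print(f"{model}: ${low_res_cost(model, 10_000):.4f} per 10 000 images")

# Ratio of mini to full model cost for the same batch of images.
print(low_res_cost("gpt-4o-mini", 10_000) / low_res_cost("gpt-4o", 10_000))
```

Even though the per-token price is ~17 times lower, the ~33 times higher token count more than cancels it out, which is where the factor of roughly two comes from.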

A similar pricing discrepancy applies to the high resolution mode, where the pricing calculation is even more complicated:

  • the image is first scaled to fit to a 2048px x 2048px square, without changing the aspect ratio
  • it is then scaled down so that the shortest side becomes 768px
  • the image is divided into 512px squares
  • each square has an associated fixed token cost, and total cost is calculated as the sum of all the squares
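The resizing and tiling steps above can be sketched as a small function. This is my reading of the steps, not an official implementation: it assumes images are only ever scaled down (never enlarged) and that a partially covered square counts as a full one. The name `high_res_tiles` is made up for illustration:

```python
import math


def high_res_tiles(width: int, height: int) -> tuple[int, int, int]:
    """Return (resized_width, resized_height, tile_count) for a high resolution image.

    Assumptions: images are never upscaled, and partial tiles count as full tiles.
    """
    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down so that the shortest side becomes 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512px squares needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return w, h, tiles


print(high_res_tiles(1024, 1024))  # (768, 768, 4)
```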

For GPT-4o, the fixed cost is 170 tokens per square, plus a base cost of 85 tokens per image. Now, assuming a 1024px x 1024px image, the total cost would be:

| GPT-4o | Value |
| --- | --- |
| Price per 1M tokens | $2.50 |
| Resized Width | 768 |
| Resized Height | 768 |
| 512 x 512 Tiles | 2 × 2 |
| Total Tiles | 4 |
| Base Tokens | 85 |
| Tile Tokens | 170 × 4 = 680 |
| Total Tokens | 765 |
| Total Price | $0.001913 |

So processing 10 000 such images would cost $19.13.

For GPT-4o-mini, the fixed cost is 5667 tokens per square, plus a base cost of 2833 tokens per image. Again, assuming a 1024px x 1024px image, the cost breakdown is:

| GPT-4o-mini | Value |
| --- | --- |
| Price per 1M tokens | $0.15 |
| Resized Width | 768 |
| Resized Height | 768 |
| 512 x 512 Tiles | 2 × 2 |
| Total Tiles | 4 |
| Base Tokens | 2833 |
| Tile Tokens | 5667 × 4 = 22668 |
| Total Tokens | 25501 |
| Total Price | $0.003825 |

Remarkably, yet again the mini model is twice as expensive as GPT-4o. Here, processing 10 000 such images would cost $38.25.
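To double check both high resolution breakdowns, here is a short sketch that recomputes them from the tile count. All token and price figures are the ones quoted in this post, and the `MODELS` table and helper functions are illustrative names of my own:

```python
# Token and price figures quoted in this post; they may have changed since.
MODELS = {
    "gpt-4o":      {"price_per_1m": 2.50, "base_tokens": 85,   "tile_tokens": 170},
    "gpt-4o-mini": {"price_per_1m": 0.15, "base_tokens": 2833, "tile_tokens": 5667},
}


def high_res_tokens(model: str, tiles: int) -> int:
    """Total tokens for one image: base cost plus a fixed cost per 512px tile."""
    m = MODELS[model]
    return m["base_tokens"] + m["tile_tokens"] * tiles


def high_res_cost(model: str, tiles: int, images: int = 1) -> float:
    """Cost in USD of sending `images` images of `tiles` tiles each to `model`."""
    return images * high_res_tokens(model, tiles) * MODELS[model]["price_per_1m"] / 1_000_000


TILES = 4  # a 1024px x 1024px image, resized to 768px x 768px

for model in MODELS:
    tokens = high_res_tokens(model, TILES)
    cost = high_res_cost(model, TILES, 10_000)
    print(f"{model}: {tokens} tokens per image, ${cost:.4f} per 10 000 images")
```

Because both the base cost and the per-tile cost scale by roughly the same factor between the two models, the ratio stays close to two regardless of how many tiles the image needs.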

Conclusion

GPT-4o-mini should be the default model of choice if you are using OpenAI models. Especially during development, the cost savings can be significant. However, always be aware of the pricing quirks, especially when it comes to vision processing. The fixed token cost for image input is so much higher than for the full model that the mini model ends up being twice as expensive.

It is not clear what the reason behind this pricing approach is. Some theories have even suggested that there is no vision capability in GPT-4o-mini at all, and the vision requests to it are simply re-routed to the full GPT-4o model, which would explain the pricing 😅

We might never know, but for now, for vision tasks it's best to engage with the GPT-4o model directly.

About


Hi! I'm Filip W., a software architect from Zürich 🇨🇭. I like Toronto Maple Leafs 🇨🇦, Rancid and quantum computing. Oh, and I love the Lowlands 🏴󠁧󠁢󠁳󠁣󠁴󠁿.
