GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don’t need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which is rather counterintuitive!
How GPT-4o-mini can be super cheap
GPT-4o-mini is a great model for many use cases. It is fast, accurate, strikes a good balance between performance and cost, and is multimodal, just like GPT-4o - both can process image input. GPT-4o-mini costs $0.15 per 1 million input tokens, while GPT-4o (the 2024-08-06 version) costs $2.50 per 1 million input tokens. This means the full model is more expensive by a factor of ~17. The earlier version of GPT-4o (2024-05-13) is even more expensive, at $5.00 per 1 million input tokens, while the older variants of GPT-4 are even worse - for example, gpt-4-32k costs a staggering $60.00 per 1 million input tokens, meaning it is 400x more expensive than GPT-4o-mini (!).
These are the prices at the time of writing, and they are subject to change, so always make sure to check the latest pricing on the OpenAI website or Azure OpenAI website (the pricing discussion that follows is applicable to both platforms).
This obviously highlights two things - first, always make sure to upgrade to the latest version of the model, as the price is likely to be lower. Second, when in doubt, start by using mini as the default choice, and only upgrade to the full model if you really need the extra performance.
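The price gap can be sanity-checked with a few lines of Python (the prices are the ones quoted above; the dictionary keys are just labels I chose for this sketch):

```python
# Input-token prices per 1 million tokens, as quoted in this post.
prices = {
    "gpt-4o-mini": 0.15,
    "gpt-4o-2024-08-06": 2.50,
    "gpt-4o-2024-05-13": 5.00,
    "gpt-4-32k": 60.00,
}

# How much more expensive each model is, relative to the mini model.
for model, price in prices.items():
    ratio = price / prices["gpt-4o-mini"]
    print(f"{model}: {ratio:.0f}x")
```

This prints 17x for the 2024-08-06 version of GPT-4o and 400x for gpt-4-32k, matching the factors above.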
How GPT-4o-mini can be super expensive
The mini model also has vision capabilities - similar to its bigger brother (or sister?), GPT-4o. Vision input is still charged in tokens, and the image tokens are indeed priced the same as the text ones. However, the way the processing costs are calculated is not as straightforward as with textual input.
The image input can be handled in two modes - low resolution or high resolution, which is specified by the developer making the request. Low resolution mode is designed for use cases where high precision is not necessary, allowing fewer tokens to be used, thus keeping the costs down. The cost for low resolution mode is fixed - regardless of the resolution of the image. For GPT-4o, that fixed cost is 85 tokens.
GPT-4o | Value |
---|---|
Price per 1M tokens | $2.50 |
Total Tokens | 85 |
Total Price | $0.000213 |
So processing 10 000 such images would cost $2.13.
Given that we just established that GPT-4o-mini is ~17 times cheaper than GPT-4o, we can expect the same for vision processing, right? This would imply that 10 000 images in low resolution mode should cost us about $0.125. However, the fixed cost for GPT-4o-mini is a surprising 2833 tokens (!). So even with a cheaper price per token, the costs climb quickly.
GPT-4o-mini | Value |
---|---|
Price per 1M tokens | $0.15 |
Total Tokens | 2833 |
Total Price | $0.000425 |
So processing 10 000 low resolution mode images would cost $4.25 here, making the mini model twice as expensive as GPT-4o (!).
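The low resolution comparison boils down to one multiplication, which can be sketched as follows (the helper name is my own; the token counts and prices come from the tables above):

```python
def low_res_cost(n_images: int, tokens_per_image: int, price_per_1m: float) -> float:
    """Cost of processing n_images in low resolution mode, where each
    image costs a fixed number of tokens regardless of its size."""
    return n_images * tokens_per_image * price_per_1m / 1_000_000

# 10 000 images, using the fixed per-image token costs quoted above.
print(low_res_cost(10_000, 85, 2.50))    # GPT-4o
print(low_res_cost(10_000, 2833, 0.15))  # GPT-4o-mini
```

Despite the ~17x cheaper token price, the mini model comes out roughly twice as expensive per image.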
A similar pricing discrepancy applies to the high resolution mode, where the pricing calculation is even more complicated:
- the image is first scaled to fit within a 2048px x 2048px square, without changing the aspect ratio
- the image is then scaled down so that its shortest side becomes 768px
- the image is divided into 512px squares
- each square has an associated fixed token cost, and the total cost is calculated as the sum over all the squares
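The steps above can be sketched in Python (the function name and the integer rounding during resizing are my own assumptions; the token constants used in the example calls are the ones quoted in this post):

```python
import math

def image_tokens(width: int, height: int, base: int, per_tile: int) -> int:
    """Estimate the token cost of one high resolution mode image."""
    # Step 1: scale to fit inside a 2048px x 2048px square, keeping aspect ratio.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 2: scale down so the shortest side becomes 768px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 3: count the 512px squares the resized image covers.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    # Step 4: base tokens plus a fixed cost per square.
    return base + per_tile * tiles

# A 1024px x 1024px image resizes to 768 x 768, i.e. 2 x 2 = 4 squares.
print(image_tokens(1024, 1024, base=85, per_tile=170))     # GPT-4o: 765
print(image_tokens(1024, 1024, base=2833, per_tile=5667))  # GPT-4o-mini: 25501
```

These are exactly the token totals in the worked examples below.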
For GPT-4o, each square costs 170 tokens, plus a base cost of 85 tokens per image. Now, assuming a 1024px x 1024px image, the total cost would be:
GPT-4o | Value |
---|---|
Price per 1M tokens | $2.50 |
Resized Width | 768 |
Resized Height | 768 |
512 x 512 Tiles | 2 × 2 |
Total Tiles | 4 |
Base Tokens | 85 |
Tile Tokens | 170 × 4 = 680 |
Total Tokens | 765 |
Total Price | $0.001913 |
So processing 10 000 such images would cost $19.13.
For GPT-4o-mini, each square costs 5667 tokens, plus a base cost of 2833 tokens per image. Again, assuming a 1024px x 1024px image, the cost breakdown is:
GPT-4o-mini | Value |
---|---|
Price per 1M tokens | $0.15 |
Resized Width | 768 |
Resized Height | 768 |
512 x 512 Tiles | 2 × 2 |
Total Tiles | 4 |
Base Tokens | 2833 |
Tile Tokens | 5667 × 4 = 22668 |
Total Tokens | 25501 |
Total Price | $0.003825 |
Remarkably, yet again the mini model is twice as expensive as GPT-4o. Here, processing 10 000 such images would cost $38.25.
Conclusion
GPT-4o-mini should be the default model of choice if you are using OpenAI models. Especially during development, the cost savings can be significant. However, always be aware of the pricing quirks, especially when it comes to vision processing. The fixed token costs for vision processing are significantly higher than for the full model, making image input roughly twice as expensive.
It is not clear what the reason behind such a pricing approach is. Some theories have even suggested that there is no vision capability in GPT-4o-mini at all, and that vision requests to it are simply re-routed to the full GPT-4o model, which would explain the pricing.
We might never know, but for now, for vision tasks it’s best to use the GPT-4o model directly.