Dear ZenMux User,
My name is Yu Haize, and I'm the founder of ZenMux. I'm reaching out today to be fully transparent about a billing error we've identified, and to let you know how we're making it right.
This issue was first spotted and reported by a community member, FEIYU, whose sharp eye and persistence brought it to our attention. After receiving his report, we immediately launched a thorough investigation, which ultimately confirmed the root cause and the full scope of impact. I'd like to take this opportunity to sincerely thank FEIYU — a healthy product depends on users who are willing to speak up.
What Happened
ZenMux supports four API protocols: OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and Google Gemini. When integrating models from different providers, we map each provider's usage response data to the corresponding standard billing fields. The bug was introduced in this mapping layer.
Different model providers report input token usage differently. Specifically, the definition of prompt_tokens varies in a fundamental way:
Type A providers (e.g., OpenAI): prompt_tokens excludes cache read tokens. Cache reads are reported separately and billed independently:
Cost = prompt_tokens × prompt_price
     + cache_read_tokens × cache_read_price
Type B providers (e.g., Google Gemini): prompt_tokens already includes cache read tokens. The correct billing formula is therefore:
Cost = (prompt_tokens - cache_read_tokens) × prompt_price
     + cache_read_tokens × cache_read_price
When configuring Type B provider models, we mistakenly applied the same logic used for Type A, calculating costs as:
Cost = prompt_tokens × prompt_price
     + cache_read_tokens × cache_read_price
This caused cache_read_tokens to be billed twice (once embedded within prompt_tokens, and once again as a separate line item), resulting in overcharges.
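To make the size of the error concrete, here is a minimal sketch of the two formulas above. The function names, prices, and token counts are hypothetical and chosen only for illustration; they are not ZenMux's actual code or rates.

```python
from decimal import Decimal


def cost_type_a(prompt_tokens, cache_read_tokens, prompt_price, cache_read_price):
    """Type A: prompt_tokens excludes cache read tokens."""
    return prompt_tokens * prompt_price + cache_read_tokens * cache_read_price


def cost_type_b(prompt_tokens, cache_read_tokens, prompt_price, cache_read_price):
    """Type B: prompt_tokens already includes cache read tokens."""
    uncached = prompt_tokens - cache_read_tokens
    return uncached * prompt_price + cache_read_tokens * cache_read_price


# Hypothetical per-token prices and usage for a Type B model.
prompt_price = Decimal("0.000002")
cache_read_price = Decimal("0.0000005")
prompt_tokens = 10_000      # includes the 8,000 cache read tokens
cache_read_tokens = 8_000

correct = cost_type_b(prompt_tokens, cache_read_tokens, prompt_price, cache_read_price)
buggy = cost_type_a(prompt_tokens, cache_read_tokens, prompt_price, cache_read_price)
overcharge = buggy - correct

print(f"correct={correct} buggy={buggy} overcharge={overcharge}")
```

Under these assumed numbers the correct charge is 0.008 and the erroneous charge is 0.024. In general, subtracting the two formulas shows that the overcharge on a request equals cache_read_tokens × prompt_price, so requests with heavy cache use were affected the most.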
Affected Models
The following models were subject to this billing error:
google/gemini-2.5-pro
google/gemini-2.5-flash
google/gemini-2.5-flash-lite
google/gemini-3-flash-preview
google/gemini-3-pro-preview
google/gemini-3.1-flash-lite-preview
google/gemini-3.1-pro-preview
Our investigation of the usage data shows that your account called one or more of these affected models and was overcharged as a result.
Compensation
We have recalculated the exact overcharge based on your original API usage. The difference will be refunded as credits to your PAYG account — no action is required on your part. The credits will be applied automatically within 24 hours.
Regardless of whether you are a PAYG user or a subscriber, all refunds are calculated against raw API usage and credited uniformly to your PAYG Credits balance. You can review the full details at:
https://zenmux.ai/platform/pay-as-you-go
What We've Fixed
All affected model configurations have been corrected and billing is now accurate. To prevent this from happening again, we have added an automated validation check in the billing mapping layer, and have made the prompt_tokens field type (whether or not it includes cache read tokens) a required configuration parameter in our model onboarding process.
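As an illustration of what a required prompt_tokens type parameter can look like, here is a rough sketch. The class, enum, and function names are hypothetical and do not describe ZenMux's actual implementation; the idea is only that a model configuration cannot be created without stating how the provider defines prompt_tokens.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class PromptTokenSemantics(Enum):
    EXCLUDES_CACHE_READ = "excludes_cache_read"   # Type A
    INCLUDES_CACHE_READ = "includes_cache_read"   # Type B


@dataclass(frozen=True)
class ModelBillingConfig:
    model: str
    prompt_price: Decimal
    cache_read_price: Decimal
    # No default value: omitting this field raises TypeError at construction.
    prompt_token_semantics: PromptTokenSemantics

    def __post_init__(self):
        if not isinstance(self.prompt_token_semantics, PromptTokenSemantics):
            raise TypeError("prompt_token_semantics must be a PromptTokenSemantics")


def input_cost(config, prompt_tokens, cache_read_tokens):
    """Compute input cost according to the configured prompt_tokens semantics."""
    if prompt_tokens < 0 or cache_read_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if config.prompt_token_semantics is PromptTokenSemantics.INCLUDES_CACHE_READ:
        # Validation: cache reads cannot exceed the total they are part of.
        if cache_read_tokens > prompt_tokens:
            raise ValueError("cache_read_tokens exceeds prompt_tokens")
        uncached = prompt_tokens - cache_read_tokens
    else:
        uncached = prompt_tokens
    return uncached * config.prompt_price + cache_read_tokens * config.cache_read_price


# Hypothetical prices, for illustration only.
gemini = ModelBillingConfig(
    model="google/gemini-2.5-pro",
    prompt_price=Decimal("0.000002"),
    cache_read_price=Decimal("0.0000005"),
    prompt_token_semantics=PromptTokenSemantics.INCLUDES_CACHE_READ,
)
print(input_cost(gemini, prompt_tokens=10_000, cache_read_tokens=8_000))
```

Making the field mandatory means the question "does prompt_tokens include cache reads?" has to be answered explicitly for every new model, rather than being inherited silently from another provider's configuration.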
Accurate billing is foundational to the trust you place in us. This was our mistake, and there are no excuses for it. Rather than stay silent, we chose to investigate the moment a user raised the flag, and to communicate openly — because at ZenMux, we believe a trustworthy product cannot be built on covering up problems, and a strong community depends on users who are willing to call them out.
If you have any questions about this refund or your billing history, please reach out to us at [support email] and we'll walk through it with you directly.
We're truly sorry for the inconvenience this has caused. Thank you for your understanding and continued support.
Yu Haize, Founder of ZenMux
March 10, 2026
