FAQ
An explanation of our scheduling engine, pricing, and how we keep costs down.
Why don't you let me choose a model?
Because picking a model by hand is a chore, and it is hard to pick accurately. Our self-developed scheduling engine has five algorithm modules (intent recognition, L1-L5 difficulty classification, intelligent routing, quality assessment, circuit-breaker degradation). It scores the difficulty and classifies the intent of each request, then selects the cheapest model from the qualified pool: L1 simple tasks go to Qwen-Turbo ($0.20/M), L4 deep reasoning goes to GPT-4o-mini or DeepSeek-Reasoner, and L5 complex tasks go to GPT-4o / Claude Sonnet. It is fully automatic, with a clear rationale for every decision.
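To make the routing step concrete, here is a minimal sketch of "cheapest qualified model" selection. The tier-to-pool mapping, model names, and prices are illustrative assumptions drawn from the examples above; the real engine's five modules are not shown.

```python
# A minimal sketch of "cheapest model in the qualified pool" routing.
# Pools and prices are illustrative, based on the examples above.

MODEL_POOL = {
    "L1": [("qwen-turbo", 0.20)],                                # $/1M input tokens
    "L4": [("gpt-4o-mini", 0.15), ("deepseek-reasoner", 0.55)],  # assumed prices
    "L5": [("gpt-4o", 2.50), ("claude-sonnet", 3.00)],           # assumed prices
}

def route(difficulty: str) -> str:
    """Return the cheapest model qualified for the given difficulty tier."""
    return min(MODEL_POOL[difficulty], key=lambda m: m[1])[0]

print(route("L5"))  # -> "gpt-4o", the cheapest qualified flagship
```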
How is this different from OpenRouter or a general-purpose aggregation gateway?
Most aggregation gateways are thin proxies: you specify the model name → they forward it. We are not. Our self-developed scheduling engine grades the difficulty and analyzes the intent of each request, then routes to the cheapest model that clears the quality threshold. The capability dimension is a composite score of overseas benchmarks (HumanEval/MMLU-Pro/MATH/MT-Bench) and Chinese benchmarks (OpenCompass/SuperCLUE/CMMLU), weighted 60/40; it is measured by us, not self-reported by the vendor.
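As a rough illustration of the 60/40 weighting, the snippet below averages each benchmark family and blends the two averages. All benchmark values are invented for demonstration, and the real scoring pipeline may normalize differently.

```python
# Illustrative 60/40 blend of overseas and Chinese benchmark scores.
# All numbers are invented; MT-Bench is rescaled to 0-100 for comparability.

def capability_score(overseas: dict, chinese: dict) -> float:
    """Blend the two benchmark families at a 60/40 weighting."""
    overseas_avg = sum(overseas.values()) / len(overseas)
    chinese_avg = sum(chinese.values()) / len(chinese)
    return 0.6 * overseas_avg + 0.4 * chinese_avg

score = capability_score(
    {"HumanEval": 88.0, "MMLU-Pro": 74.0, "MATH": 76.0, "MT-Bench": 90.0},
    {"OpenCompass": 80.0, "SuperCLUE": 78.0, "CMMLU": 82.0},
)
print(f"{score:.1f}")  # -> 81.2  (0.6 * 82.0 + 0.4 * 80.0)
```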
How much can you save?
Through our blended cost model: roughly 80% of traffic is everyday chat served by efficient models (about $0.40/1M input tokens), and 20% is hard traffic served by flagship models (about $8/1M). Your flat price is $3/1M input and $12/1M output. Compared with always buying the flagship model, you save 40~56%, depending on your mix of questions.
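Here is a back-of-envelope check of the blended input cost, using only the 80/20 split and the per-model rates quoted above. The flagship list price used in the savings comparison is an assumption, which is why actual savings land in a range rather than at one number.

```python
# Back-of-envelope check of the blended input cost from the 80/20 mix above.
efficient_rate, flagship_rate = 0.40, 8.00       # $/1M input tokens, from above
blended = 0.8 * efficient_rate + 0.2 * flagship_rate
print(blended)                                   # -> 1.92 $/1M blended serving cost

# Savings vs. always buying a flagship model, at an ASSUMED $5/1M list price:
flat_price, assumed_flagship = 3.00, 5.00
print(1 - flat_price / assumed_flagship)         # -> 0.4, i.e. 40% saved on input
```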
Do I have to pay for retries and hedging paths?
No. We bear the cost of internal retries, hedging, and cache warming. You pay only for the input you actually send plus the final output you receive. A reconciliation tool in the management console shows the complete breakdown.
What is the cache discount?
Cache hits (exact or semantic) are billed at 25% of the regular price, so repeating a question costs 75% less the second time. We also use upstream prompt caching (OpenAI / Anthropic / DeepSeek) internally, and those savings are already reflected in the flat price.
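For illustration, here is the cache-discount arithmetic applied to a small request. The token counts are made up, and applying the 25% factor to output tokens as well as input is an assumption here.

```python
# Cache-discount arithmetic: hits bill at 25% of the regular flat price.
# Token counts are made up; applying the factor to output tokens is assumed.

INPUT_RATE, OUTPUT_RATE = 3.00, 12.00   # flat price, $/1M tokens
CACHE_FACTOR = 0.25                     # cache hits charged at 25%

def request_cost(input_toks: int, output_toks: int, cache_hit: bool) -> float:
    base = input_toks / 1e6 * INPUT_RATE + output_toks / 1e6 * OUTPUT_RATE
    return base * CACHE_FACTOR if cache_hit else base

print(request_cost(2_000, 500, cache_hit=False))  # -> 0.012  (first ask)
print(request_cost(2_000, 500, cache_hit=True))   # -> 0.003  (repeat, 75% less)
```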
Can I use the OpenAI SDK?
Yes. Our API is fully OpenAI-compatible: point the SDK's base_url at our gateway, set model=nexevo/balanced, and you are ready to go. Function calling, streaming, and vision input all work with zero code changes.
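For example, with the official OpenAI Python SDK (the base URL below is a placeholder; substitute the gateway address from your dashboard):

```python
# Drop-in usage with the official OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder; use your dashboard URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="nexevo/balanced",  # the scheduling engine picks the concrete model
    messages=[{"role": "user", "content": "Explain prompt caching in one sentence."}],
)
print(response.choices[0].message.content)
```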
Will my data be used for training?
No. We forward requests under each upstream vendor's "not for training" terms. Your prompts and responses are not retained for model improvement.
Which models do you route to?
OpenAI, Anthropic, Google, DeepSeek, Mistral, xAI, Tongyi, Moonshot, Zhipu, Cohere, Together, Fireworks, Groq, Cerebras, Perplexity, SiliconFlow: 60+ models in total. We keep integrating new models on the backend, with zero changes to your code.