FAQ
An explanation of our scheduling engine, pricing, and how we keep costs down.
Why don't you let me choose a model?
Because picking a model by hand is a chore, and it is hard to pick accurately. Our self-developed scheduling engine has five algorithm modules (intent recognition, L1-L5 difficulty classification, intelligent routing, quality assessment, circuit-breaker degradation). It scores the difficulty and classifies the intent of each request, then selects the cheapest model from the qualified pool: L1 simple tasks go to Qwen-Turbo ($0.20/M), L4 deep reasoning goes to GPT-4o-mini or DeepSeek-Reasoner, and L5 complex tasks go to GPT-4o / Claude Sonnet. It is fully automatic, with a clear rationale for every decision.
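To make the routing step concrete, here is a minimal sketch of "cheapest qualified model" selection. The tier-to-pool mapping, model names, and prices are illustrative assumptions drawn from the examples above; the real engine's five modules are not shown.

```python
# A minimal sketch of "cheapest model in the qualified pool" routing.
# Pools and prices are illustrative, based on the examples above.

MODEL_POOL = {
    "L1": [("qwen-turbo", 0.20)],                                # $/1M input tokens
    "L4": [("gpt-4o-mini", 0.15), ("deepseek-reasoner", 0.55)],  # assumed prices
    "L5": [("gpt-4o", 2.50), ("claude-sonnet", 3.00)],           # assumed prices
}

def route(difficulty: str) -> str:
    """Return the cheapest model qualified for the given difficulty tier."""
    return min(MODEL_POOL[difficulty], key=lambda m: m[1])[0]

print(route("L5"))  # -> "gpt-4o", the cheapest qualified flagship
```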
How is this different from OpenRouter or a general-purpose aggregation gateway?
Most aggregation gateways are thin proxies: you specify the model name → they forward it. We are not. Our self-developed scheduling engine grades the difficulty and analyzes the intent of each request, then routes to the cheapest model that clears the quality threshold. The capability dimension is a composite score of overseas benchmarks (HumanEval/MMLU-Pro/MATH/MT-Bench) and Chinese benchmarks (OpenCompass/SuperCLUE/CMMLU), weighted 60/40; it is measured by us, not self-reported by the vendor.
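As a rough illustration of the 60/40 weighting, the snippet below averages each benchmark family and blends the two averages. All benchmark values are invented for demonstration, and the real scoring pipeline may normalize differently.

```python
# Illustrative 60/40 blend of overseas and Chinese benchmark scores.
# All numbers are invented; MT-Bench is rescaled to 0-100 for comparability.

def capability_score(overseas: dict, chinese: dict) -> float:
    """Blend the two benchmark families at a 60/40 weighting."""
    overseas_avg = sum(overseas.values()) / len(overseas)
    chinese_avg = sum(chinese.values()) / len(chinese)
    return 0.6 * overseas_avg + 0.4 * chinese_avg

score = capability_score(
    {"HumanEval": 88.0, "MMLU-Pro": 74.0, "MATH": 76.0, "MT-Bench": 90.0},
    {"OpenCompass": 80.0, "SuperCLUE": 78.0, "CMMLU": 82.0},
)
print(f"{score:.1f}")  # -> 81.2  (0.6 * 82.0 + 0.4 * 80.0)
```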
How much can you save?
Through our blended cost model: roughly 80% of traffic is everyday chat served by efficient models (about $0.40/1M input tokens), and 20% is hard traffic served by flagship models (about $8/1M). Your flat price is $3/1M input and $12/1M output. Compared with always buying the flagship model, you save 40~56%, depending on your mix of questions.
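Here is a back-of-envelope check of the blended input cost, using only the 80/20 split and the per-model rates quoted above. The flagship list price used in the savings comparison is an assumption, which is why actual savings land in a range rather than at one number.

```python
# Back-of-envelope check of the blended input cost from the 80/20 mix above.
efficient_rate, flagship_rate = 0.40, 8.00       # $/1M input tokens, from above
blended = 0.8 * efficient_rate + 0.2 * flagship_rate
print(blended)                                   # -> 1.92 $/1M blended serving cost

# Savings vs. always buying a flagship model, at an ASSUMED $5/1M list price:
flat_price, assumed_flagship = 3.00, 5.00
print(1 - flat_price / assumed_flagship)         # -> 0.4, i.e. 40% saved on input
```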
Do I have to pay for retries and hedging paths?
No. We bear the cost of internal retries, hedging, and cache warming. You pay only for the input you actually send plus the final output you receive. A reconciliation tool in the management console shows the complete breakdown.
What is the cache discount?
Cache hits (exact or semantic) are billed at 25% of the regular price, so repeating a question costs 75% less the second time. We also use upstream prompt caching (OpenAI / Anthropic / DeepSeek) internally, and those savings are already reflected in the flat price.
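For illustration, here is the cache-discount arithmetic applied to a small request. The token counts are made up, and applying the 25% factor to output tokens as well as input is an assumption here.

```python
# Cache-discount arithmetic: hits bill at 25% of the regular flat price.
# Token counts are made up; applying the factor to output tokens is assumed.

INPUT_RATE, OUTPUT_RATE = 3.00, 12.00   # flat price, $/1M tokens
CACHE_FACTOR = 0.25                     # cache hits charged at 25%

def request_cost(input_toks: int, output_toks: int, cache_hit: bool) -> float:
    base = input_toks / 1e6 * INPUT_RATE + output_toks / 1e6 * OUTPUT_RATE
    return base * CACHE_FACTOR if cache_hit else base

print(request_cost(2_000, 500, cache_hit=False))  # -> 0.012  (first ask)
print(request_cost(2_000, 500, cache_hit=True))   # -> 0.003  (repeat, 75% less)
```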
Can I use the OpenAI SDK?
Yes. Our API is fully OpenAI-compatible: point the SDK's base_url at our gateway, set model=nexevo/balanced, and you are ready to go. Function calling, streaming, and vision input all work with zero code changes.
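For example, with the official OpenAI Python SDK (the base URL below is a placeholder; substitute the gateway address from your dashboard):

```python
# Drop-in usage with the official OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder; use your dashboard URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="nexevo/balanced",  # the scheduling engine picks the concrete model
    messages=[{"role": "user", "content": "Explain prompt caching in one sentence."}],
)
print(response.choices[0].message.content)
```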
Will my data be used for training?
No. We forward requests under each upstream vendor's "not for training" terms. Your prompts and responses are not retained for model improvement.
Which models do you route to?
OpenAI, Anthropic, Google, DeepSeek, Mistral, xAI, Tongyi, Moonshot, Zhipu, Cohere, Together, Fireworks, Groq, Cerebras, Perplexity, SiliconFlow: 60+ models in total. We keep integrating new models on the backend, with zero changes to your code.