AI Runtime—not just another router
One call = Routing + Caching + Cross-model Memory + Anti-hallucination + On-demand Agent
We don’t sell you models — freeing you from vendor lock-in. At its core: MCP and desktop applications. One-line config integration with Claude Desktop or Cursor. A neutral runtime layer that model vendors, by structure, cannot provide.
OpenAI-Compatible · Main Entry Point: POST /v1/conductor/chat
Infrastructure of Each Era
Both are neutral layers situated between applications and underlying services — such a layer has yet to emerge in the AI era.
One API, Six Capabilities
Each request automatically runs through the Conductor pipeline. Customer perception = one call = complete result. Internally = 9 visible decision steps; model vendors structurally lack 3 critical capabilities.
Local semantic cache
Save 30–50% tokens
pgvector single-vector coarse filtering + Voyage Rerank-2.5 fine ranking; match only if sim ≥ 0.95. Cross-provider universal, with dual isolation per user and per options. Pre-warm cron ensures new users hit on their first visit.
Cross-model memory auto-injection
Switch models without losing context
Automatically prepends Recall + differential injection (token -60%) when switching model_id. Family-specific format adaptation: Claude XML, GPT Markdown, Gemini fenced YAML, Llama plaintext — each model reads its most comfortable syntax.
Anti-hallucination verification
Intent Whitelist Trigger
Five high-risk intent categories—legal, medical, financial, security, and code_critical—are automatically cross-validated using a low-cost judge, conserving budget on simple chats while ensuring reliability for high-sensitivity scenarios. Fabricated responses can also trigger auto-retry with a stronger model to regenerate the answer.
Agent-on-demand
On-demand multi-step, not a standalone product
When `agent=auto-if-multi-step`, the system automatically determines: simple Q&A is answered directly; complex tasks (involving planning, tool invocation, or multi-step reasoning) are automatically routed to an agent sandbox for iterative execution. `max_cost_usd` enforces a hard cutoff to prevent runaway loops.
Per-call X-Ray trace
Transparent per call
X-Ray Badge displays: pipeline decisions / cache hits / memory injections / cost estimates / latency. Turning black-box into white-box—every request can be audited down to the model, decisions, and cost details.
Sticky session + break-even
Cost Intelligence Optimization
Lock model within the same session to avoid random jitter; calculate break-even before switching models—if attaching memory costs 1.3× more than staying on the current model, automatically recommend “sticky” mode. Decisions are visible, logs are auditable, and enforcement is optional in Q2.
Conductor includes Quorum + Recall
Conductor’s `verify=auto` uses Quorum’s dual-AI cross-verification mechanism to combat hallucination. It can also operate independently as an MCP widget, ideal for human in-depth review of critical decisions.
The memory automatically injected by Conductor when switching models is encapsulated in Recall. Recall can also operate independently as an MCP widget, enabling search and retrospective analysis for any AI conversation.
Wherever you use AI, Conductor is there.
Not just another web playground. MCP provides a single tool (`conductor_ask`) that enables native integration of Claude Desktop, Cursor, or any MCP client—seamlessly embedding into their chat or editor experience.
Claude Desktop
1 tool: conductor_ask
- Configuring the Nexevo MCP server → Claude automatically gains the conductor_ask tool
- mode parameter: chat / save_memory / search_memory — internal automatic routing
- Claude invocation works identically to its native tools—seamless switching.
Cursor / VSCode
Automatic Code Context Injection
- MCP server listens for IDE selection / current file
- conductor_ask mode=chat automatically includes code context
- Automated code review / refactoring suggestions / bug detection integrated with Conductor
Nexevo Desktop App
Local AI Workspace
- One-click switch between Claude / GPT-4o / Gemini; memory automatically carries over
- X-Ray real-time display: cache hits + tokens saved + dollars saved
- Recall Capsule: Desktop-level management + cross-conversation reuse
We ensure you’re not locked in to any single provider.
Model vendors have a natural incentive to lock you into their own ecosystem—APIs, cache, memory, and agents are all under their control. The reality at the application layer is: Claude leads today, GPT-5 launches tomorrow, and Gemini surges ahead the day after. Nexevo acts on your behalf—cross-vendor, swappable, memory preserved—so model selection returns to “judging by performance,” not “weighing lock-in costs.”
Compare with Similar Products
Routing + Caching + Memory + Anti-Hallucination + Agent + X-Ray = One Product.
| Capability | Conductor | OpenRouter | Portkey | Letta |
|---|---|---|---|---|
| OpenAI-Compatible API | ✓ | ✓ | ✓ | — |
| Multi-model routing | ✓ | ✓ | ✓ | — |
| Local semantic cache (cross-provider) | ✓ | — | — | — |
| Cross-model memory + format adaptation | ✓ Exclusive | — | — | Partial |
| Pre-warm cluster first-visit hit | ✓ Exclusive | — | — | — |
| Anti-hallucination verify (high-risk intent) | ✓ | — | guardrails | — |
| Per-call X-Ray trace UI | ✓ Exclusive | — | Partial | — |
| MCP Native + Desktop Application | ✓ | — | — | Partial |
| Neutral / Not vendor-locked | ✓ | ✓ | ✓ | — |
Compared against publicly available information from May 2026. Specific capabilities may evolve; refer to each vendor’s documentation for details.
Get Started with Conductor
Get started in 5 minutes. OpenAI SDK compatible = change just one line: base_url. Or connect MCP to Claude Desktop / Cursor.
Free quota for new users · OpenAI-compatible SDK · Documentation in Chinese and English