Nexevo.aiNexevo.ai
Nexevo Conductor / Smart Hub

AI Runtime—not just another router

One call = Routing + Caching + Cross-model Memory + Anti-hallucination + On-demand Agent

We don’t sell you models — freeing you from vendor lock-in. At its core: MCP and desktop applications. One-line config integration with Claude Desktop or Cursor. A neutral runtime layer that model vendors, by structure, cannot provide.

OpenAI-Compatible · Main Entry Point: POST /v1/conductor/chat

Tokens saved
511,776,365
USD value saved
$1,275.92
Hallucinations caught
11,358

Infrastructure of Each Era

Both are neutral layers situated between applications and underlying services — such a layer has yet to emerge in the AI era.

Web
Cloudflare
Observability
Datadog
Frontend
Vercel
AI Runtime
Nexevo

One API, Six Capabilities

Each request automatically runs through the Conductor pipeline. Customer perception = one call = complete result. Internally = 9 visible decision steps; model vendors structurally lack 3 critical capabilities.

Local semantic cache

Save 30–50% tokens

pgvector single-vector coarse filtering + Voyage Rerank-2.5 fine ranking; match only if sim ≥ 0.95. Cross-provider universal, with dual isolation per user and per options. Pre-warm cron ensures new users hit on their first visit.

Cross-model memory auto-injection

Switch models without losing context

Automatically prepends Recall + differential injection (token -60%) when switching model_id. Family-specific format adaptation: Claude XML, GPT Markdown, Gemini fenced YAML, Llama plaintext — each model reads its most comfortable syntax.

Anti-hallucination verification

Intent Whitelist Trigger

Five high-risk intent categories—legal, medical, financial, security, and code_critical—are automatically cross-validated using a low-cost judge, conserving budget on simple chats while ensuring reliability for high-sensitivity scenarios. Fabricated responses can also trigger auto-retry with a stronger model to regenerate the answer.

Agent-on-demand

On-demand multi-step, not a standalone product

When `agent=auto-if-multi-step`, the system automatically determines: simple Q&A is answered directly; complex tasks (involving planning, tool invocation, or multi-step reasoning) are automatically routed to an agent sandbox for iterative execution. `max_cost_usd` enforces a hard cutoff to prevent runaway loops.

Per-call X-Ray trace

Transparent per call

X-Ray Badge displays: pipeline decisions / cache hits / memory injections / cost estimates / latency. Turning black-box into white-box—every request can be audited down to the model, decisions, and cost details.

Sticky session + break-even

Cost Intelligence Optimization

Lock model within the same session to avoid random jitter; calculate break-even before switching models—if attaching memory costs 1.3× more than staying on the current model, automatically recommend “sticky” mode. Decisions are visible, logs are auditable, and enforcement is optional in Q2.

Built-in Components

Conductor includes Quorum + Recall

Conductor’s `verify=auto` uses Quorum’s dual-AI cross-verification mechanism to combat hallucination. It can also operate independently as an MCP widget, ideal for human in-depth review of critical decisions.

The memory automatically injected by Conductor when switching models is encapsulated in Recall. Recall can also operate independently as an MCP widget, enabling search and retrospective analysis for any AI conversation.

MCP / Desktop Application as the Core

Wherever you use AI, Conductor is there.

Not just another web playground. MCP provides a single tool (`conductor_ask`) that enables native integration of Claude Desktop, Cursor, or any MCP client—seamlessly embedding into their chat or editor experience.

Claude Desktop

1 tool: conductor_ask

  • Configuring the Nexevo MCP server → Claude automatically gains the conductor_ask tool
  • mode parameter: chat / save_memory / search_memory — internal automatic routing
  • Claude invocation works identically to its native tools—seamless switching.

Cursor / VSCode

Automatic Code Context Injection

  • MCP server listens for IDE selection / current file
  • conductor_ask mode=chat automatically includes code context
  • Automated code review / refactoring suggestions / bug detection integrated with Conductor

Nexevo Desktop App

Local AI Workspace

  • One-click switch between Claude / GPT-4o / Gemini; memory automatically carries over
  • X-Ray real-time display: cache hits + tokens saved + dollars saved
  • Recall Capsule: Desktop-level management + cross-conversation reuse
We don’t sell you models.

We ensure you’re not locked in to any single provider.

Model vendors have a natural incentive to lock you into their own ecosystem—APIs, cache, memory, and agents are all under their control. The reality at the application layer is: Claude leads today, GPT-5 launches tomorrow, and Gemini surges ahead the day after. Nexevo acts on your behalf—cross-vendor, swappable, memory preserved—so model selection returns to “judging by performance,” not “weighing lock-in costs.”

Compare with Similar Products

Routing + Caching + Memory + Anti-Hallucination + Agent + X-Ray = One Product.

CapabilityConductorOpenRouterPortkeyLetta
OpenAI-Compatible API
Multi-model routing
Local semantic cache (cross-provider)
Cross-model memory + format adaptation✓ ExclusivePartial
Pre-warm cluster first-visit hit✓ Exclusive
Anti-hallucination verify (high-risk intent)guardrails
Per-call X-Ray trace UI✓ ExclusivePartial
MCP Native + Desktop ApplicationPartial
Neutral / Not vendor-locked

Compared against publicly available information from May 2026. Specific capabilities may evolve; refer to each vendor’s documentation for details.

Get Started with Conductor

Get started in 5 minutes. OpenAI SDK compatible = change just one line: base_url. Or connect MCP to Claude Desktop / Cursor.

Free quota for new users · OpenAI-compatible SDK · Documentation in Chinese and English

Conductor · AI Runtime — One runtime for every LLM, agent on tap | Nexevo.ai