Tasks vs Conductor agent-on-demand
Both Conductor and Tasks involve multi-step execution, but they target different workflows. In one line: Conductor is conversation-shaped (turn-by-turn, can escalate to an agent); Tasks is deliverable-shaped (submit a goal; the backend runs autonomously and returns a report).
| Attribute | Conductor | Tasks |
|---|---|---|
| Call shape | POST /v1/conductor/chat · messages[] array | POST /v1/tasks · goal / deliverables / budget |
| Return timing | Usually < 60s, real-time | Minutes to hours, async + poll / webhook |
| Use case | Code review, Q&A, Quorum comparison, IDE integration | Data analysis reports, research surveys, batch jobs, ops automation |
| Execution | 9-step pipeline + optional agent multi-step loop | Planner DAG + Verifier + Auto-repair loop |
| Human in the loop | None, fully LLM-automated | L1/L2 autonomy levels can require approval gates between steps |
| Billing | Per token + max_cost_usd ceiling | Per-task budget_usd contract upfront |
Self-Healing v2 loop (Plan → Execute → Evaluate → Adjust → Loop)
The core of Tasks is the self-healing architecture: rather than "run and deliver", every step is scored by a Verifier on multiple criteria, and a failing score triggers Auto-repair (reflect + retry). The expected success rate is 20–25 percentage points higher than single-shot runs. Four roles in the loop:
Planner: Decomposes the high-level goal into a DAG (nodes = subtasks, edges = dependencies). Can be disabled (enable_planner=false) to run as a single node.
Executor: Per-node executor. Uses Conductor agent-on-demand and can invoke 22 built-in tools (rag_search / python_exec / web_search / generate_image / sql_query / spreadsheet_analyze etc.).
Verifier: Multi-criteria scoring of node output (completeness / correctness / relevance / citation quality etc.). Low score → trigger Auto-repair.
Auto-repair: Takes Verifier feedback → reflects → adjusts plan / prompt → retries (auto_repair_max_rounds, default 2).
Once all nodes pass Verifier (or are marked Partial Success) → the task returns the final deliverable + run trace.
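The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the backend's implementation: `run_node`, `run_task`, the toy executor and verifier, and the `PASS_THRESHOLD` value are all hypothetical. Only the DAG ordering, the Verifier-then-Auto-repair cycle, the `auto_repair_max_rounds` default of 2, and the Partial Success outcome come from the text. Marking a node Partial Success once repair rounds run out is also an assumption.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set, Tuple

PASS_THRESHOLD = 0.8  # hypothetical pass mark; the source does not state one

Execute = Callable[[str], str]
Verify = Callable[[str], Tuple[float, str]]


def run_node(prompt: str, execute: Execute, verify: Verify,
             auto_repair_max_rounds: int = 2) -> Tuple[str, str, int]:
    """Execute one node, score it, and retry with feedback on failure.

    Returns (status, output, repair_rounds_used).
    """
    output = execute(prompt)
    score, feedback = verify(output)
    rounds = 0
    while score < PASS_THRESHOLD and rounds < auto_repair_max_rounds:
        rounds += 1
        # "Reflect": fold the Verifier feedback into the next attempt.
        prompt = f"{prompt}\n\nVerifier feedback: {feedback}"
        output = execute(prompt)
        score, feedback = verify(output)
    status = "passed" if score >= PASS_THRESHOLD else "partial_success"
    return status, output, rounds


def run_task(dag: Dict[str, Set[str]], execute: Execute, verify: Verify,
             auto_repair_max_rounds: int = 2) -> Dict[str, Tuple[str, str, int]]:
    """Run every node in dependency order; dag maps node -> its dependencies."""
    trace = {}
    for node in TopologicalSorter(dag).static_order():
        trace[node] = run_node(node, execute, verify, auto_repair_max_rounds)
    return trace


# Toy stand-ins: the report node only passes after feedback is applied.
def toy_execute(prompt: str) -> str:
    node = prompt.splitlines()[0]
    cited = " [cited]" if "Verifier feedback" in prompt else ""
    return f"{node} output{cited}"


def toy_verify(output: str) -> Tuple[float, str]:
    if output.startswith("write_report") and "[cited]" not in output:
        return 0.4, "add citations"
    return 0.9, "ok"


dag = {"load_data": set(), "analyze": {"load_data"}, "write_report": {"analyze"}}
trace = run_task(dag, toy_execute, toy_verify)
for node, (status, _, rounds) in trace.items():
    print(node, status, rounds)
```

In this toy run the first two nodes pass on the first attempt and `write_report` passes after one repair round, which is the behaviour the Verifier and Auto-repair roles are meant to provide.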
Autonomy levels (L1 / L2 / L3)
Each task takes an autonomy_level controlling human-in-the-loop granularity:
L1 — Strict: Plan must be human-approved before execution; each major node also gates on approval. Suits first-run new task templates, high-risk workflows (finance / legal).
L2 — Semi-autonomous: Plan stage requires approval; execution runs continuously through nodes; Verifier failure pops an approval gate to continue / adjust / abort.
L3 — Full autonomy: zero human intervention; Planner + Verifier + Auto-repair loop closes itself; final deliverable returned directly. Suits stable, well-tested task templates.
Setting it: POST /v1/tasks { autonomy_level: "L2" }. Defaults to L2 if omitted.
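One way to read the three levels is as a table of which events stop for a human. The sketch below encodes only what the descriptions above state; the function name `needs_approval` and the event names (`plan_ready`, `major_node`, `verifier_failed`) are hypothetical labels chosen for illustration, not API values.

```python
from typing import Optional

# Which events pause for human approval at each level, per the descriptions above.
# Event names are hypothetical labels used only in this sketch.
APPROVAL_GATES = {
    "L1": {"plan_ready", "major_node"},
    "L2": {"plan_ready", "verifier_failed"},
    "L3": set(),
}


def needs_approval(autonomy_level: Optional[str], event: str) -> bool:
    """Return True if a human approval gate applies to this event."""
    level = autonomy_level or "L2"  # the API defaults to L2 when omitted
    if level not in APPROVAL_GATES:
        raise ValueError(f"unknown autonomy_level: {level!r}")
    return event in APPROVAL_GATES[level]


print(needs_approval(None, "plan_ready"))   # omitted level behaves as L2
print(needs_approval("L3", "plan_ready"))   # L3 never pauses
```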
Task lifecycle
Submit → Planner builds DAG → [L1/L2 approval gate]
                                        ↓
      ┌──────────────────────→ Executor runs node
      |                                 ↓
      |                          Verifier scores
      |                                 ↓
      └── Auto-repair (on failure) ←─ Pass?
                                        ↓
                       All nodes done → final deliverable
                                        ↓
                            Return report + run trace

Full lifecycle: POST /v1/tasks to create → GET /v1/tasks/{id} to poll (or webhook notify) → completion returns the deliverable field.
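For the polling leg of the lifecycle, a client-side loop might look like the following. It is a sketch, not an official client: `wait_for_task` and its parameters are hypothetical, the HTTP call is injected as a `fetch` callable so the loop can be exercised without network access, and treating `succeeded` and `failed` as the only terminal status strings is an assumption based on the statuses mentioned in this document.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; the document mentions "succeeded" and tasks being marked failed.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_task(
    fetch: Callable[[], Dict[str, Any]],
    interval_sec: float = 5.0,
    timeout_sec: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the task reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_sec
    while True:
        task = fetch()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout_sec}s")
        sleep(interval_sec)


# A fake fetcher standing in for GET /v1/tasks/{task_id}.
responses = iter([
    {"status": "planning"},
    {"status": "planning"},
    {"status": "succeeded", "deliverable": "# Report"},
])
final = wait_for_task(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"], final["deliverable"])
```

With L1 or L2 tasks, a real loop would also need to notice an approval status and call the approve endpoint, since polling alone will not move the task forward.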
REST API reference
| Method | Path | Description |
|---|---|---|
| POST | /v1/tasks | Submit task, returns task_id + initial status |
| GET | /v1/tasks/{task_id} | Query task status / per-node progress / final deliverable |
| GET | /v1/tasks/pending-approvals | List all nodes awaiting L1/L2 approval |
| POST | /v1/tasks/{task_id}/approve | Approve a pending node, task resumes execution |
| POST | /v1/tasks/{task_id}/reject | Reject the current node, task is marked failed and aborts |
Key fields for POST /v1/tasks request body:
| Field | Type | Required | Description |
|---|---|---|---|
| goal | string | ✓ | High-level goal description. Required if template_id is not given |
| template_id | string | — | Use a preset task template (instead of goal) |
| slots | Record<string,Any> | — | Variable bindings when using template_id |
| deliverables | string[] | — | Explicit deliverable list (else Planner infers) |
| budget_usd | float | — | Total task budget ceiling; exceeded → fail |
| deadline_sec | int | — | Total task timeout (seconds) |
| autonomy_level | "L1" \| "L2" \| "L3" | — | Defaults to L2 |
| enable_planner | bool | — | false → single-node run (fast but no DAG) |
| auto_repair_max_rounds | int | — | Max Auto-repair rounds, default 2 |
| wait | bool | — | true = sync (recommended < 60s); false = async, returns task_id |
| attachments | Array<file> | — | Attachments (datasets / templates / reference docs) |
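The field table can be mirrored by a small request-body builder that catches obvious mistakes before the call is made. This is a sketch under assumptions: `build_task_request` is a hypothetical helper, and the checks it performs (goal or template_id required, slots only with a template, positive budget) are client-side guesses derived from the descriptions above, not documented server behaviour.

```python
from typing import Any, Dict, List, Optional


def build_task_request(
    goal: Optional[str] = None,
    template_id: Optional[str] = None,
    slots: Optional[Dict[str, Any]] = None,
    deliverables: Optional[List[str]] = None,
    budget_usd: Optional[float] = None,
    deadline_sec: Optional[int] = None,
    autonomy_level: str = "L2",
    wait: bool = False,
) -> Dict[str, Any]:
    """Build a POST /v1/tasks body, omitting optional fields that were not set."""
    if not goal and not template_id:
        raise ValueError("either goal or template_id is required")
    if slots and not template_id:
        raise ValueError("slots only apply together with template_id")
    if autonomy_level not in ("L1", "L2", "L3"):
        raise ValueError(f"unknown autonomy_level: {autonomy_level!r}")
    if budget_usd is not None and budget_usd <= 0:
        raise ValueError("budget_usd must be positive")

    body: Dict[str, Any] = {"autonomy_level": autonomy_level, "wait": wait}
    optional = {
        "goal": goal,
        "template_id": template_id,
        "slots": slots,
        "deliverables": deliverables,
        "budget_usd": budget_usd,
        "deadline_sec": deadline_sec,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_task_request(
    goal="Analyze quarterly churn data",
    deliverables=["Churn rate trend chart"],
    budget_usd=2.0,
    deadline_sec=600,
)
print(body)
```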
curl examples
# 1) Submit the task
curl https://api.nexevo.ai/v1/tasks \
  -H "Authorization: Bearer $NEXEVO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "goal": "Analyze customer churn data for this quarter and produce a report with recommendations",
    "deliverables": ["Churn rate trend chart", "Top-5 churn reasons", "Actionable recommendations list"],
    "budget_usd": 2.00,
    "deadline_sec": 600,
    "autonomy_level": "L2",
    "wait": false,
    "attachments": []
  }'
# → { "task_id": "task_abc", "status": "planning", ... }

# 2) Check status (in L2 mode the task may pause at planning_approval)
curl https://api.nexevo.ai/v1/tasks/task_abc \
  -H "Authorization: Bearer $NEXEVO_API_KEY"

# 3) Approve the plan; the task resumes execution
curl -X POST https://api.nexevo.ai/v1/tasks/task_abc/approve \
  -H "Authorization: Bearer $NEXEVO_API_KEY"

# 4) After the task completes, GET returns status=succeeded plus the deliverable field
# The deliverable is usually a markdown report + (optional) attachment URLs
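The same three calls can be prepared from Python with only the standard library. This is a sketch rather than an official SDK: `make_request` is a hypothetical helper, and the requests are only constructed here, not sent, so the example runs without a key or network access. The base URL, paths, headers, and body fields are taken from the curl examples above.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

API_BASE = "https://api.nexevo.ai/v1"


def make_request(method: str, path: str, body: Optional[Dict[str, Any]] = None,
                 api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the Tasks API."""
    key = api_key or os.environ.get("NEXEVO_API_KEY", "")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{API_BASE}{path}", data=data, headers=headers, method=method
    )


submit = make_request("POST", "/tasks", {
    "goal": "Analyze customer churn data for this quarter",
    "budget_usd": 2.00,
    "deadline_sec": 600,
    "autonomy_level": "L2",
    "wait": False,
})
status = make_request("GET", "/tasks/task_abc")
approve = make_request("POST", "/tasks/task_abc/approve")

for req in (submit, status, approve):
    print(req.get_method(), req.full_url)

# To send one for real (needs a valid key and network access):
# with urllib.request.urlopen(submit) as resp:
#     task = json.load(resp)
```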
FAQ
Tasks or /v1/conductor/chat — which should I use?
Conversation shape (messages back-and-forth + immediate answer) → /v1/conductor/chat. Deliverable shape (submit goal + async report + budget / approval / multi-step verify) → /v1/tasks. Simple code review = Conductor. Data analysis report = Tasks.
L1 / L2 / L3 — which level?
First-run new templates use L1; once familiar, switch to L2; stable production flows go L3. Going L1 → L3 over time is the common production journey.
What if budget_usd is exceeded?
Before each retry, Auto-repair estimates the incremental cost; if the retry would push the task over budget_usd, the task fails without further spend. In L2 mode, an approval gate pops up before the budget is exceeded.
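The budget check described in this answer amounts to a comparison made before each retry. A minimal sketch, assuming the check is a simple sum against the ceiling: `next_action` and its return labels are hypothetical, and since the text only describes the approval gate for L2, every other level is assumed here to fail outright.

```python
def next_action(spent_usd: float, estimated_retry_usd: float,
                budget_usd: float, autonomy_level: str = "L2") -> str:
    """Decide what happens before an Auto-repair retry is attempted."""
    if spent_usd + estimated_retry_usd <= budget_usd:
        return "retry"
    # Over budget: the text describes an approval gate for L2 only.
    # Treating all other levels as an immediate failure is an assumption.
    if autonomy_level == "L2":
        return "request_approval"
    return "fail"


print(next_action(1.50, 0.40, 2.00))        # retry fits within the budget
print(next_action(1.50, 0.60, 2.00, "L3"))  # retry would exceed the budget
```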
What tools does Tasks use? Can I restrict?
Executor opens 22 built-in tools by default (rag_search / python_exec / web_search / generate_image / sql_query / spreadsheet_analyze / document_extract / browser_use / computer_use / memory_read / memory_write etc.). Add a tools=[...] field in the request to restrict to a subset (recommended for security-sensitive workflows).
Can I resume a failed task?
Yes — failed nodes carry a resume_token; POST /v1/tasks/{id}/resume with the token. But for tasks failing at Plan stage or first node, resubmitting with an improved prompt is usually more reliable.
Related
- Conductor doc — conversation-shape main entry; Tasks' Executor uses it internally
- MCP doc — Claude Desktop / Cursor one-click
- Recall doc — Task conclusions can auto-save as Recall capsules