Architecture Overview
Deployment Perspective: What Runs Where?
Before diving into technical details, let's clarify where each component runs. Agent Network uses a Server-Client architecture -- one central Server connects to multiple distributed Agent clients.
Deployment Topology
Component Deployment Quick Reference
| Component | Runs On | Port | Purpose | npm Package |
|---|---|---|---|---|
| CommHub Server | Server (1 machine) | 9200 | Message routing, task management, auth, database | @sleep2agi/commhub-server |
| Dashboard | Local or standalone server | 3000 default | Web UI (Overview / Nodes / Tasks / Messages / Chat / Admin / Settings — see the Dashboard doc for per-page detail) | @sleep2agi/agent-network-dashboard |
| anet CLI | Each client machine | -- | Command-line management tool (full command list: CLI reference) | @sleep2agi/agent-network |
| Agent Node | Each client machine | -- | AI worker (receives tasks, calls AI, reports results) | @sleep2agi/agent-node |
| Claude Code | Client machine | -- | Interactive AI development (joins network via MCP) | Anthropic official |
| Channel Plugins | Client machine | -- | Telegram (v0.8 stable); WeChat / Feishu via external MCP plugins (see channels.md) | channel/ |
Port Reference
| Port | Component | Protocol | Description |
|---|---|---|---|
| 9200 | CommHub Server | HTTP | MCP (POST /mcp), SSE (GET /events/:alias), REST (/api/*) |
| 3000 | Dashboard | HTTP | Default port for anet hub dashboard |
Local vs Production
| | Local Development | Production Deployment |
|---|---|---|
| CommHub Server | Local localhost:9200 | Server YOUR_IP:9200 |
| Agent Node | Local, --hub localhost:9200 | Client machine, --hub YOUR_IP:9200 |
| Dashboard | localhost:3000 | YOUR_IP:3000 or standalone deploy |
| Database | Local SQLite file | Server SQLite file |
| Communication | All via localhost | Via internal network / public IP |
System Architecture
Agent Network uses a centralized message routing architecture where all agents communicate through the CommHub Server.
Four npm Packages
Agent Network ships as four npm packages with clear responsibilities:
| Package | Purpose | How to install / run |
|---|---|---|
| @sleep2agi/agent-network | anet CLI -- config management, service launcher, status monitoring | npm i -g @sleep2agi/agent-network |
| @sleep2agi/agent-node | Agent runtime -- AI model + tool calls + task handling | anet node create + anet node start |
| @sleep2agi/commhub-server | Communication hub -- message routing + SSE push + task management | anet hub start |
| @sleep2agi/agent-network-dashboard | Web Dashboard -- visual monitoring + task management (Overview / Nodes / Tasks / Messages / Chat / Admin / Settings) | anet hub dashboard (CLI auto-fetches) |
They can be used independently or composed:
- Just need CLI control: install @sleep2agi/agent-network
- Just need the agent runtime: anet node create + anet node start
- Just need the comm hub: bunx @sleep2agi/commhub-server
- Just need the Web UI: anet hub dashboard
Full version scheme (independent semver per npm package vs the v0.10.x bundle-release anchor) is documented in Versioning.
CommHub Server
CommHub Server is the core of the entire system, responsible for message routing, state management, and task tracking.
Runs on: Server (1 machine). All client Agents connect to it.
Triple Protocol
| Protocol | Endpoint | Purpose | Auth |
|---|---|---|---|
| MCP Streamable HTTP | POST /mcp | Agent tool calls (send_task, report_status, etc.) | Bearer Token |
| SSE | GET /events/:alias | Real-time push of tasks/messages to agents | Bearer Token |
| REST | GET/POST /api/* | Dashboard / CLI / external integrations | Bearer Token |
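The three protocols share one port and one auth scheme, which the following minimal sketch tries to make concrete. `buildRequest`, `RequestSpec`, and the `target` parameter are hypothetical names introduced for illustration, not part of any shipped SDK; only the endpoints and the Bearer header come from the table above.

```typescript
// Hypothetical helper: maps each CommHub protocol to a request descriptor.
type Protocol = "mcp" | "sse" | "rest";

interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
}

function buildRequest(
  hub: string,
  token: string,
  protocol: Protocol,
  target = "", // assumed shape: agent alias for SSE, path below /api for REST
): RequestSpec {
  const base = hub.replace(/\/+$/, ""); // drop trailing slashes
  const auth = { Authorization: `Bearer ${token}` };
  switch (protocol) {
    case "mcp": // agent tool calls
      return {
        method: "POST",
        url: `${base}/mcp`,
        headers: { ...auth, "Content-Type": "application/json" },
      };
    case "sse": // real-time push to one agent
      return {
        method: "GET",
        url: `${base}/events/${encodeURIComponent(target)}`,
        headers: { ...auth, Accept: "text/event-stream" },
      };
    case "rest": // Dashboard / CLI / external integrations (GET shown)
      return {
        method: "GET",
        url: `${base}/api/${target.replace(/^\/+/, "")}`,
        headers: auth,
      };
  }
}
```

For example, `buildRequest("http://localhost:9200", token, "sse", "commander")` describes a `GET` against `/events/commander`; the descriptor can then be handed to `fetch` or any other HTTP client.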
v0.10.0 new — per-server daemon observability endpoint family (#99 Phase 1 scaffold, commhub-server@0.8.2, default path needs agent-network@2.2.1+)
Two new REST endpoints expose single-host health + per-agent list, used by the dashboard ServersDrawer and any monitoring / external observability integration:
- GET /api/server/:host/health: current health snapshot for a single host (CPU / mem / disk + 24h bucketed history 5m/1h/24h) plus alert_level
- GET /api/server/:host/agents: agents on a single host + per-agent process_telemetry (rss / cpu_pct / uptime_seconds / in_flight_count; #142 shipped in agent-node@2.4.0 + server schema aligned in commhub-server@0.8.2)
Version requirement: to reach these two endpoints via the default anet hub start path you need agent-network ≥ 2.2.1 (the v0.10.1 hotfix bumped PINNED_SERVER_VERSION from 0.8.0 to 0.8.2).
v0.10.2 Hero A complement: agent-node ≥ 2.4.1 adds host disk telemetry. The fields latest.disk_total_gb / disk_used_gb / disk_avail_gb are sampled via execFileSync('df', ['-k', '/']), which shares one POSIX path across Linux and macOS and degrades gracefully to null on Windows or on parse failure. alert_level gains two disk triggers (disk_avail < 1GB is critical, < 5GB is warn), and the 24h history buckets carry the extreme-aggregation fields disk_avail_min / disk_used_max. This closes the final 10% of the #99 per-server daemon Phase 2 host metrics.
The control layer (kill / restart / redeploy) is deferred to v0.11.0. Details: REST API — server endpoint family.
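To illustrate the disk telemetry described above, here is a minimal sketch of parsing `df -k /` output and deriving a disk alert level. `parseDfOutput`, `diskAlertLevel`, and the `"ok"` level name are assumptions made for this example, and the real agent-node code may differ. Only the field names, the null-on-failure behaviour, and the 1GB / 5GB thresholds come from the text.

```typescript
interface DiskSample {
  disk_total_gb: number;
  disk_used_gb: number;
  disk_avail_gb: number;
}

const KB_PER_GB = 1024 * 1024;

// Parse the stdout of `df -k /`. Returns null when the output is not understood.
// Assumes the data row is the last line and is not wrapped across lines.
function parseDfOutput(stdout: string): DiskSample | null {
  const lines = stdout.trim().split("\n");
  if (lines.length < 2) return null; // need a header plus one data row
  // Columns on Linux and macOS: filesystem, 1K-blocks, used, available, ...
  const cols = lines[lines.length - 1].trim().split(/\s+/);
  const [total, used, avail] = cols.slice(1, 4).map(Number);
  if (![total, used, avail].every(Number.isFinite)) return null;
  return {
    disk_total_gb: total / KB_PER_GB,
    disk_used_gb: used / KB_PER_GB,
    disk_avail_gb: avail / KB_PER_GB,
  };
}

type AlertLevel = "ok" | "warn" | "critical"; // "ok" is an assumed name

function diskAlertLevel(sample: DiskSample | null): AlertLevel {
  if (sample === null) return "ok"; // assumption: no telemetry, no disk alert
  if (sample.disk_avail_gb < 1) return "critical";
  if (sample.disk_avail_gb < 5) return "warn";
  return "ok";
}
```

In a real sampler the string would come from `execFileSync('df', ['-k', '/'])`; keeping the parser pure makes the null-on-failure path easy to test without touching the host.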
MCP Tool Groups
CommHub provides 17 MCP Tools in two groups:
Agent-side tools (4) -- agents report status and fetch tasks:
| Tool | Description |
|---|---|
| report_status | Heartbeat + status reporting (idle/working/error) |
| report_completion | Task completion report + results |
| get_inbox | Fetch pending messages |
| ack_inbox | Acknowledge message receipt |
Hub-side tools (13) -- command center / Dashboard manages tasks:
| Tool | Description |
|---|---|
| send_task | Dispatch a task (with lifecycle) |
| send_message | Send a message (no processing triggered) |
| send_reply | Reply to a task |
| send_ack | Acknowledge task receipt |
| retry_task | Retry a failed task |
| cancel_task | Cancel a pending task |
| reassign_task | Reassign a task to another agent |
| get_task | Query task details |
| list_tasks | Query task list |
| get_all_status | Get all session statuses |
| get_session_status | Get single session details |
| broadcast | Broadcast a message to all agents |
| get_completions | Query completion records |
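As a rough illustration of how one of these tools is invoked over POST /mcp, the sketch below builds a JSON-RPC `tools/call` request body following the general MCP convention (`params.name` plus `params.arguments`). The `toolCall` helper and the argument keys in the usage note (`to`, `content`) are hypothetical; the actual argument schema of each tool is not specified in this document.

```typescript
// Tool names as listed in the two tables above.
const AGENT_TOOLS = [
  "report_status", "report_completion", "get_inbox", "ack_inbox",
] as const;

const HUB_TOOLS = [
  "send_task", "send_message", "send_reply", "send_ack", "retry_task",
  "cancel_task", "reassign_task", "get_task", "list_tasks", "get_all_status",
  "get_session_status", "broadcast", "get_completions",
] as const;

type ToolName = (typeof AGENT_TOOLS)[number] | (typeof HUB_TOOLS)[number];

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool invocation in a JSON-RPC 2.0 envelope.
function toolCall(name: ToolName, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

A call such as `toolCall("send_task", { to: "worker-1", content: "run the tests" })` yields a body that could be serialized with `JSON.stringify` and posted with the Bearer token; a full client would first perform the MCP `initialize` handshake.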
Database Design
SQLite with WAL mode, 14 tables:
Additional tables: completions (completion records), task_events (task event log), audit_log (audit trail), licenses (licensing), network_invites (invite codes), rename_txn (RFC-010 node-rename two-phase transaction state: prepared / committed / aborted).
SSE Push Mechanism
Agents receive tasks in real time via SSE long connections, eliminating the need for polling:
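For readers unfamiliar with the wire format, the following is a simplified sketch of parsing a complete server-sent-events text chunk into events. It follows the standard `event:` / `data:` / `id:` field syntax, but it does not handle frames split across network reads, and the `task` event name in the usage note is an assumption rather than CommHub's documented event name.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event: field is present
  data: string;
  id?: string;
}

// Parse a complete SSE text chunk. Blocks are separated by a blank line.
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message";
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment (keep-alive)
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    if (data.length === 0) continue; // nothing to dispatch for this block
    const parsed: SseEvent = { event, data: data.join("\n") };
    if (id !== undefined) parsed.id = id;
    events.push(parsed);
  }
  return events;
}
```

Given `": ping\n\nevent: task\ndata: {\"id\":\"t1\"}\n\n"`, the comment block is skipped and a single event with `event === "task"` is returned.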
Heartbeat and Timeout
- Agents send heartbeats (report_status) every 3 minutes
- Server updates last_seen_at on every request
- After 10 minutes without a heartbeat, agents are automatically marked offline
- SSE auto-reconnects on disconnect (#202: exponential backoff 1s → 30s cap + re-register on every successful (re)connect + give up after 1h continuous failure; see agent-node)
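The timing rules above can be sketched as a few pure functions. The doubling factor is an assumption (the text only states "exponential backoff 1s → 30s cap"), no jitter is modelled, and all function and constant names are hypothetical.

```typescript
// Values taken from the list above (heartbeats themselves are sent every 3 minutes).
const OFFLINE_AFTER_MS = 10 * 60_000; // 10 minutes without a heartbeat
const BACKOFF_BASE_MS = 1_000;        // first retry after 1s
const BACKOFF_CAP_MS = 30_000;        // never wait longer than 30s
const GIVE_UP_AFTER_MS = 60 * 60_000; // stop after 1h of continuous failure

// Server side: is an agent considered offline at time `now`?
function isOffline(lastSeenAt: number, now: number): boolean {
  return now - lastSeenAt > OFFLINE_AFTER_MS;
}

// Client side: delay before reconnect attempt `attempt` (0-based).
// Assumed doubling: 1s, 2s, 4s, 8s, 16s, then capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BACKOFF_CAP_MS, BACKOFF_BASE_MS * 2 ** attempt);
}

// Client side: has the reconnect loop been failing for an hour or more?
function shouldGiveUp(firstFailureAt: number, now: number): boolean {
  return now - firstFailureAt >= GIVE_UP_AFTER_MS;
}
```

A reconnect loop would reset its attempt counter and failure timestamp after each successful connection, which is also the point where the text says the node re-registers.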
Agent Node
Agent Node is the working unit in the network, responsible for receiving tasks, invoking the AI model, and reporting results.
Runs on: Client machines (can be multiple). Connects to CommHub Server over the network.
Four Runtimes
| Runtime | AI Engine | Use Case | Models |
|---|---|---|---|
| claude-code-cli | spawn local claude process | Reuse Claude subscription / interactive tool use | Claude Sonnet/Opus (subscription) |
| claude-agent-sdk | Anthropic Claude Agent SDK | Programmatic access to any Anthropic-compatible API | Anthropic / MiniMax / DeepSeek / GLM / Kimi / InternLM / Xiaomi MiMo / OpenRouter (see Multi-model) |
| codex-sdk | OpenAI Codex SDK (v0.10.0+ can opt-in to a direct stdio path; see below) | Code generation, tool use | OpenAI Codex |
| grok-build-acp | spawn local grok ACP server | xAI Grok Build ACP-protocol cross-agent collaboration | xAI Grok (grok-build series) (details on GitHub ↗) |
v0.10.0 new — codex-direct-stdio opt-in path (#141)
Set ANET_CODEX_STDIO_DIRECT=1 to make agent-node switch the codex runtime from the @openai/codex-sdk wrapper to spawn('codex', ['app-server']) + a ~155 LOC direct stdio JSON-RPC client, getting the full 67-method v2 protocol surface (thread / turn / item / realtime) and bypassing the wrapper's --mcp-config HTTP-transport bug family (#102 hang root cause). v0.10.x (including the current stable) still defaults to the wrapper; v0.11.0 plans to flip the default and rename the toggle to ANET_CODEX_LEGACY_SDK=1 opt-out. The LLM-side tool surface is unchanged (the codex thread still uses only its baked-in tools; the commhub roundtrip is still handled by the agent-node parent process) — what changes is purely the transport protocol between agent-node and the codex process. Details: runtimes — codex-sdk § codex-direct-stdio + agent-node — env vars § ANET_CODEX_STDIO_DIRECT + v0.10.0 GitHub release notes.
MCP integration paths (per runtime, v0.9.0+)
The four runtimes expose commhub tools to the LLM via different paths — this affects the tool names the LLM sees and how you debug routing problems:
claude-agent-sdk uses in-process SDK MCP (#102 Option A, agent-node 2.3.5-preview.0+):
- agent-node creates an in-process McpServer via createSdkMcpServer({ name: "commhub" }) and registers the 7 agent-facing tools (send_task / send_message / send_reply / get_all_status / get_session_status / get_task / list_tasks)
- Each tool handler forwards the call from inside agent-node to CommHub's POST /mcp via the JSON-RPC initialize → tools/call chain
- The LLM sees the SDK-namespaced tool name mcp__commhub__send_task (single commhub prefix), not mcp__commhub__commhub__send_task or other double-prefix variants
- Verify: agent-node/src/commhub-mcp.ts createCommhubSdkMcpServer()
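When debugging tool routing, it can help to check tool names mechanically. The sketch below encodes the `mcp__<server>__<tool>` naming described in the list above; `sdkToolName` and `isDoublePrefixed` are hypothetical helpers written for this example, not functions exported by agent-node or the SDK.

```typescript
// Build the name the LLM is expected to see for a tool on an SDK MCP server.
function sdkToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Detect the double-prefix variant (server name repeated), which the text
// above says should not appear.
function isDoublePrefixed(name: string, server: string): boolean {
  return name.startsWith(`mcp__${server}__${server}__`);
}

// The 7 agent-facing tools registered on the in-process server.
const AGENT_FACING_TOOLS = [
  "send_task", "send_message", "send_reply", "get_all_status",
  "get_session_status", "get_task", "list_tasks",
];

const expectedNames = AGENT_FACING_TOOLS.map((t) => sdkToolName("commhub", t));
```

Comparing `expectedNames` against the tool list reported in a session log is one quick way to spot a missing or double-prefixed registration.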
Why doesn't claude-agent-sdk use HTTP MCP directly? Claude Agent SDK 0.2.x forwards mcpServers={commhub:{type:"http", url:.../mcp}} verbatim to the claude binary's --mcp-config, but the binary's HTTP MCP path does not issue initialize / tools/list against the endpoint — commhub never sees the binary subprocess's requests, so the tool list is empty for the LLM (#102 root cause). Option A hosts the MCP server inside agent-node's own process to bypass this SDK limitation.
claude-code-cli uses stdio + local .anet/node-server.js proxy: the anet CLI writes a .mcp.json in the project cwd that registers commhub as { "type": "stdio", "command": "bun", "args": [".anet/node-server.js"] } (agent-network/bin/cli.ts ensureMcpJson). The claude binary spawns that local bun script as a stdio MCP server, and node-server.ts forwards tool calls to CommHub's /mcp over HTTP internally (agent-network/src/node-server.ts StdioServerTransport). Tool names live in the node-server.ts namespace.
codex-sdk does not expose commhub tools to the LLM: codexOpts does not pass mcpServers (agent-node/src/cli.ts). The codex thread only sees its baked-in tools (Read / Write / Edit / Bash / Glob / Grep / WebSearch). Multi-agent dispatch happens outside the LLM in agent-node's parent process: agent-node maintains the SSE connection plus report_status / get_inbox / send_reply calls back to CommHub, feeds the task text into the codex thread, and posts the codex reply back via CommHub. The codex thread itself does not know commhub exists — it is just an LLM worker.
grok-build-acp uses explicit per-session mcpServers injection + HTTP transport (v0.10.11 preview #204):
agent-node explicitly passes an mcpServers list to the Grok ACP server on every session/new / session/load. The preview chain went through three phases:
- preview.2 (4b5a657): Stdio variant: mcpServers: [{ name: "commhub", command: "bun", args: ["<abs-path>/.anet/node-server.js"], env: { COMMHUB_ALIAS, COMMHUB_TOKEN, COMMHUB_URL, ... } }]; Grok spawns .anet/node-server.js as a stdio MCP subprocess. Structurally fixes the shared-.mcp.json identity bug, but still subject to stdout-pollution / bun-PATH / framing risks.
- preview.6 (abefbe8): transport switched to HTTP: mcpServers: [{ type: "http", name: "commhub", url: "${COMMHUB_URL}/mcp", headers: [{ name: "Authorization", value: "Bearer ${AUTH_TOKEN}" }, ...] }]; Grok calls commhub /mcp directly over HTTP (Grok ACP init reports mcpCapabilities = {http: true, sse: true}). commhub-server /mcp already derives from_session from the bearer ntok_ (server/src/index.ts:446-448, d1d867e, #194 hub-side), so attribution is automatic. Bypasses the subprocess + bun PATH + framing + stdout-pollution risk surface entirely. Tool names come back from commhub /mcp JSON-RPC.
- preview.7 (72e28fd): per-node isolated cwd. Vincent's UAT still showed the wrong from= alias. Root cause: Grok CLI also reads cwd .mcp.json alongside the ACP session/new mcpServers injection, so two commhub MCP servers coexist and the stale stdio one wins the LLM's hello. Fix: ACP session/new now explicitly passes cwd: <home>/.anet/nodes/<node-id>/grok-cwd/. That dir symlinks the top-level user files (so the LLM's Read('./*') still works) but omits .mcp.json, so Grok CLI's cwd discovery finds nothing and there's no stdio fallback. Multi-node concurrent-spawn safe by construction.
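Putting the HTTP transport and the isolated cwd together, a `session/new` parameter object might be assembled roughly as follows. This is a sketch under assumptions: `commhubHttpMcpServer` and `sessionNewParams` are hypothetical helper names, and only the Authorization header is shown because the additional headers are elided in the description above.

```typescript
interface HttpHeader {
  name: string;
  value: string;
}

interface HttpMcpServer {
  type: "http";
  name: string;
  url: string;
  headers: HttpHeader[];
}

// HTTP-transport entry for commhub, shaped like the preview.6 description.
function commhubHttpMcpServer(commhubUrl: string, authToken: string): HttpMcpServer {
  return {
    type: "http",
    name: "commhub",
    url: `${commhubUrl.replace(/\/+$/, "")}/mcp`,
    // Only Authorization is shown; other headers are elided in the source.
    headers: [{ name: "Authorization", value: `Bearer ${authToken}` }],
  };
}

interface SessionNewParams {
  cwd: string;
  mcpServers: HttpMcpServer[];
}

// Per-node isolated cwd plus explicit mcpServers injection, as in preview.7.
function sessionNewParams(
  home: string,
  nodeId: string,
  commhubUrl: string,
  authToken: string,
): SessionNewParams {
  return {
    cwd: `${home}/.anet/nodes/${nodeId}/grok-cwd/`,
    mcpServers: [commhubHttpMcpServer(commhubUrl, authToken)],
  };
}
```

Because each node gets its own cwd and its own bearer token, two nodes started concurrently never share a `.mcp.json` or an identity.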
⚠ Debug tip: if the LLM can't call a commhub tool, check the runtime first. For claude-agent-sdk nodes, confirm commhub-mcp.ts is in dist (agent-node ≥ 2.3.5-preview.0); for claude-code-cli nodes, check the .mcp.json has type: stdio and the .anet/node-server.js path is correct; for codex-sdk nodes, look at the agent-node parent process logs (the codex thread never calls commhub); for grok-build-acp nodes (current stable, agent-node@2.4.9+, #204 per-node isolated cwd), look for [grok] commhub MCP server resolved: <abs-path> in the agent-node log plus the per-node isolated cwd under .anet/nodes/<alias>/; v0.10.10 and earlier (agent-node@2.4.8) grok-build-acp follows the legacy shared-cwd path (susceptible to stale .mcp.json identity pollution, fixed by #204); see grok-build-runtime.md.
Task Processing Flow
Key rule: Only task type messages trigger AI processing (think). message and reply are logged but not processed, preventing infinite loops.
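The key rule can be expressed as a small dispatcher. This is a minimal sketch: the `Inbound` shape, `handleInbound`, and the injected `think` callback are hypothetical stand-ins for agent-node's real message types and model invocation.

```typescript
type InboundKind = "task" | "message" | "reply";

interface Inbound {
  kind: InboundKind;
  from: string;
  body: string;
}

// Every inbound item is logged, but only a task reaches the model.
// Returns the model's answer for tasks and null for everything else.
function handleInbound(
  msg: Inbound,
  think: (body: string) => string, // stand-in for the AI call
  log: string[],
): string | null {
  log.push(`${msg.kind} from ${msg.from}`);
  if (msg.kind !== "task") return null; // message / reply: logged, not processed
  return think(msg.body);
}
```

Since replies never trigger `think`, two agents answering each other cannot bounce a conversation back and forth indefinitely.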
Isolation Strategy
Each Agent Node instance is fully isolated and does not read host machine global config — it passes settingSources: [] to claude-agent-sdk's query() (the SDK entry point is the query() function, not a new Agent({...}) class):
const options = {
model: MODEL || undefined,
settingSources: [], // Fully isolated — does not read ~/.claude/ etc.
// permissionMode / mcpServers / env ...
};
for await (const message of query({ prompt, options })) { /* ... */ }
anet CLI
anet CLI is the management tool for Agent Network, covering Hub / account / network / node / monitoring / demo operations (full command list: CLI reference).
Runs on: Each client machine. Points to CommHub Server via --hub parameter or config file.
Configuration Priority
Configuration Files
Global config ~/.anet/config.json:
{
"hub": "http://YOUR_IP:9200",
"token": "utok_xxxxx"
}
Project node config {cwd}/.anet/nodes/<alias>/config.json (v0.8 per-node subdirectory schema; the old .anet/config.json {alias, type} 2-field format was the early V2 layout; see Agent Node for the full field list):
{
"anet_version": "0.1.0",
"node_id": "n_a1b2c3d4",
"node_name": "commander",
"alias": "commander",
"runtime": "claude-code-cli",
"network_id": "net_a1b2c3d4",
"channels": ["server:commhub"],
"env": {},
"flags": { "dangerouslySkipPermissions": true, "teammateMode": "in-process" },
"session": "550e8400-e29b-41d4-a716-446655440000"
}
Dashboard
Dashboard is a separate Web process that talks to CommHub over REST:
| Type | Tech Stack | Runs On | Port | Features |
|---|---|---|---|---|
| Dashboard | Next.js 16 | Local, Vercel, or standalone server | 3000 default | Chat, Nodes, Tasks, Messages, Networks, Logs, Admin |
Channel Plugins
Channel plugins enable agents to integrate with external communication platforms.
- Telegram -- via Bot API (v0.8 stable, anet channel add telegram)
- WeChat / Feishu -- via external MCP plugins (not part of @sleep2agi/commhub-server); see Channel plugin docs
Runs on: Client machines, mounted as MCP Servers on Claude Code.
Channel message format:
<channel source="telegram" chat_id="123" user="alice">
User's message
</channel>
Code Structure
agent-network/ # repo root (github.com/sleep2agi/agent-network) — monorepo
├── server/ # CommHub Server (Bun + SQLite) → runs on Server
│ └── src/
│ ├── index.ts # HTTP routing + MCP + SSE
│ ├── tools.ts # 17 MCP Tools
│ ├── auth.ts # Auth + permissions + network management
│ ├── db.ts # Database + table definitions
│ ├── db-adapter.ts # DB adapter layer (SQLite + abstract interface)
│ ├── push.ts # SSE push management
│ └── password-dict.ts # Weak password dictionary (v0.8 admin bootstrap)
├── agent-network/ # anet CLI + CommHub SDK → runs on Client
│ ├── bin/cli.ts # CLI entry (full command list: [CLI docs](/en/guide/cli))
│ └── src/
│ ├── index.ts # default export
│ ├── client.ts # CommHub SDK client
│ ├── server.ts # Server programmatic entry
│ └── node-server.ts # Agent Node long-running server entry
├── agent-node/ # Agent runtime → runs on Client
│ └── src/cli.ts # Four runtimes + task processing
├── channel/ # Claude Code Channel plugins → runs on Client
│ └── commhub-channel.ts
├── demos/ # Demo orchestrations
│ └── codex-telegram-squad/
└── docs/ # Design docs
Security Architecture
See Security Design for details. Key security measures:
- Dual token authentication: utok_ (user-level) + ntok_ (network-level)
- Network isolation: Server-side enforced network_id, clients cannot cross networks
- RBAC with four permission levels: owner / admin / member / viewer
- SQL injection protection: All queries are parameterized
- Rate limiting: Registration 30/min, login 10/min per IP
- Audit logging: All operations recorded
- v0.8 RFC-001 Phase 2: COMMHUB_AUTH_TOKEN master token soft-deprecated (only /api/* read + deprecation warning); first anet hub start auto-bootstraps admin utok_ (~/.anet/server/admin-utok.json, chmod 600) with default account admin / anethub; password strength ≥ 8 + weak-password dictionary; anet passwd / anet hub admin reset-user tools; anet doctor --fix probes and reissues expired ntok_. See RFC-001.
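As an illustration of the per-IP rate limits listed above (registration 30/min, login 10/min), here is a minimal fixed-window limiter. The fixed-window algorithm and the `FixedWindowLimiter` class are assumptions chosen for brevity; the document does not say which windowing strategy CommHub Server actually uses.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

// Fixed-window counter keyed by client IP (assumed algorithm).
class FixedWindowLimiter {
  private readonly hits = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the IP is over its limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { windowStart: now, count: 1 }); // start a new window
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

const registerLimiter = new FixedWindowLimiter(30); // 30 registrations / min / IP
const loginLimiter = new FixedWindowLimiter(10);    // 10 logins / min / IP
```

A fixed window is the simplest choice but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.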