System overview

Architecture diagram

                            ┌──────────────────┐
                            │    Developer      │
                            │ (create / attach) │
                            └──┬─────┬──────┬──┘
                               │     │      │
             ┌─────────────────┘     │      └─────────────────┐
             ▼                       ▼                        ▼
┌────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
│  Lark / Slack Bot   │  │   Web App (Browser)    │  │  Developer CLI (ov)   │
│ @bot develop "xxx"  │  │  live terminal xterm.js │  │  ov task / ov attach  │
└────────┬───────────┘  └───────────┬───────────┘  └───────────┬───────────┘
         │                          │                           │
         ▼                          ▼                           ▼
┌─────────────────┐  ┌──────────────────────────────────┐  ┌──────────────┐
│ Message Adapter  │─▶│       Overlord Server (:9000)     │◀─│ REST + WS    │
│ webhook → Command│  │  NestJS + SQLite + Redis          │  └──────────────┘
└─────────────────┘  │  JWT auth + RBAC + BullMQ          │
                     │                                    │
                     │  ┌──────────────────────────────┐  │
                     │  │  Task Dispatcher + Notifier   │  │
                     │  └──────────────┬───────────────┘  │
                     └────────────────┼───────────────────┘
                                      │ WebSocket control channel
                   ┌──────────────────┼──────────────────┐
             ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
             │  Worker 1  │     │  Worker 2  │     │  Worker N  │
             │ PTY+Agent  │     │ PTY+Agent  │     │ PTY+Agent  │
             │ (Claude /  │     │ (Cursor /  │     │ (Custom)   │
             │  Codex)    │     │  Claude)   │     │            │
             └─────┬─────┘     └───────────┘     └───────────┘
                   │                 ▲
                   │                 │ Bidirectional PTY
                   ▼
             ┌──────────────┐     ┌─────────────────────────────┐
             │ Git Push MR  │────▶│ Notifier → Bot / Web notify │
             └──────────────┘     └─────────────────────────────┘

Core components

| Component | Description | Runs on |
| --- | --- | --- |
| Message adapter | Receives Lark / Slack webhook events, parses user commands, outputs unified `Command` objects | Server |
| Overlord web | Browser-based dashboard: task management, live PTY terminal, machine monitoring, admin panel | Server |
| Task dispatcher | Scheduling engine: manages the task queue (BullMQ), selects workers, tracks task lifecycle, persists to SQLite | Server |
| Worker agent | Execution engine: manages Git worktrees, spawns PTY terminals, runs AI agents, manages Cursor tunnels | Each worker machine |
| PTY manager | Worker sub-component: creates pseudo-terminals via node-pty, streams I/O bidirectionally | Worker |
| Pipeline runner | Worker sub-component: monitors PTY output, detects stage completion, injects the next skill command | Worker |
| Notifier | Sends notifications via the source platform (Lark cards, Slack blocks) and in-app notifications | Server |
| Developer CLI (ov) | Command-line tool: task creation, PTY attach, project/machine queries, notifications | Developer machine |
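
To make the message adapter's role concrete, here is a minimal sketch of turning a chat mention into a unified `Command`. Only the `@bot develop "xxx"` shape comes from the architecture diagram; the `Command` fields, the `parseMention` function, and the exact mention grammar are assumptions made for illustration and may not match the real types in `packages/protocol`.

```typescript
// Hypothetical shape of the unified Command object; the real fields may differ.
type Platform = "lark" | "slack";

interface Command {
  platform: Platform; // where the message came from, so replies can go back there
  userId: string;     // platform-specific sender id
  action: string;     // e.g. "develop"
  prompt: string;     // quoted task description
}

// Parse text such as: @bot develop "add login page"
// Returns null when the text does not match the assumed grammar.
function parseMention(platform: Platform, userId: string, text: string): Command | null {
  const match = /^@\S+\s+(\w+)\s+"([^"]*)"\s*$/.exec(text.trim());
  if (match === null) return null;
  return { platform, userId, action: match[1], prompt: match[2] };
}
```

Keeping the platform on the command is one way to let the notifier reply through the channel the request came from.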

Monorepo packages

| Package | Description |
| --- | --- |
| `packages/protocol` | Shared types, enums, WebSocket frames, constants |
| `apps/server` | NestJS backend: auth, dispatcher, scheduler, WebSocket gateway |
| `apps/web` | React frontend: dashboard, task management, live terminal |
| `apps/worker` | Worker process: agent execution, PTY management, Git operations |
| `apps/cli` | Operations CLI (`overlord install/start/stop/doctor/upgrade`) |
| `apps/developer-cli` | Developer CLI (`ov setup/task/attach/status/upgrade`) |
| `apps/e2e` | End-to-end integration tests |

Data flow

1. Task creation: the developer creates a task via web, CLI, or bot.
2. Dispatching: the dispatcher selects the best available worker based on capacity, capabilities, and load.
3. Workspace setup: the worker creates an isolated Git worktree for the task.
4. Execution: the pipeline runner drives the AI agent through the configured stages.
5. Monitoring: PTY output streams in real time to the web dashboard and CLI.
6. Completion: the agent commits code, pushes the branch, and creates an MR/PR.
7. Notification: the notifier informs the developer through their original channel.
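
The execution step can be pictured as a small scanner over PTY output that injects the next stage command when the current stage finishes. The sketch below assumes the agent prints a sentinel line at the end of each stage. The sentinel `[[STAGE_DONE]]`, the `Stage` shape, and the `PipelineRunner` class are all hypothetical; the source does not describe how stage completion is actually detected.

```typescript
// Hypothetical stage definition: the command injected into the PTY to start a stage.
interface Stage {
  name: string;
  command: string;
}

// Assumed sentinel printed by the agent when a stage finishes (not from the Overlord source).
const STAGE_DONE = "[[STAGE_DONE]]";

class PipelineRunner {
  private index = 0;
  private buffer = "";
  private readonly stages: Stage[];
  private readonly write: (input: string) => void; // writes to the PTY's input

  constructor(stages: Stage[], write: (input: string) => void) {
    this.stages = stages;
    this.write = write;
  }

  // Inject the first stage command.
  start(): void {
    if (this.stages.length > 0) this.write(this.stages[0].command + "\n");
  }

  // Feed a chunk of PTY output. Chunks may split a line, so buffer until a newline arrives.
  onOutput(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      if (line.includes(STAGE_DONE)) this.advance();
      newline = this.buffer.indexOf("\n");
    }
  }

  get finished(): boolean {
    return this.index >= this.stages.length;
  }

  // Move to the next stage and inject its command, if any stage remains.
  private advance(): void {
    if (this.finished) return;
    this.index += 1;
    if (!this.finished) this.write(this.stages[this.index].command + "\n");
  }
}
```

Buffering by line matters because a terminal stream gives no guarantee that a marker arrives in a single chunk.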

Task state machine

QUEUED → ASSIGNED → RUNNING ──→ COMPLETED
                       │  ↘
                       │   CANCELLED
                       ↓       ↑
                   SUSPENDED ──┤
                       │    ──→ RUNNING (reconnect)
                       │    ──→ COMPLETED
                       │    ──→ FAILED (timeout)
                       │
              FAILED ←─┘
                ↓
            QUEUED (retry)

| Status | Description |
| --- | --- |
| `QUEUED` | Task created, waiting for an available worker |
| `ASSIGNED` | Worker selected, preparing workspace |
| `RUNNING` | Pipeline executing (current stage tracked) |
| `SUSPENDED` | Pipeline awaiting human confirmation at a stage gate, or worker disconnected and awaiting reconnection |
| `COMPLETED` | All stages finished, code committed |
| `FAILED` | Execution error; can be retried |
| `CANCELLED` | Manually cancelled by the user |
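
The state machine can be encoded as a transition table so that illegal status changes are rejected in one place. This is a sketch based on one reading of the diagram above, not the server's actual implementation. The names `TRANSITIONS`, `canTransition`, and `transition` are hypothetical, and edges the diagram does not clearly show (for example, cancelling a task that is still `QUEUED`) are left out.

```typescript
type TaskStatus =
  | "QUEUED"
  | "ASSIGNED"
  | "RUNNING"
  | "SUSPENDED"
  | "COMPLETED"
  | "FAILED"
  | "CANCELLED";

// Allowed transitions as read from the diagram. Terminal states have no outgoing edges.
const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  QUEUED: ["ASSIGNED"],
  ASSIGNED: ["RUNNING"],
  RUNNING: ["COMPLETED", "CANCELLED", "SUSPENDED", "FAILED"],
  SUSPENDED: ["RUNNING", "COMPLETED", "FAILED", "CANCELLED"], // reconnect, finish, timeout, cancel
  COMPLETED: [],
  FAILED: ["QUEUED"], // retry
  CANCELLED: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws when the move is not in the table.
function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

For example, `transition("FAILED", "QUEUED")` models a retry, while `transition("COMPLETED", "RUNNING")` throws.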

Machine selection

The dispatcher selects target machines using these criteria (in priority order):

1. User-specified machine via the `--on` parameter
2. Exclude offline and draining machines
3. Filter by required capabilities (e.g., claude, cursor)
4. Exclude machines above the load threshold (default 85% CPU/memory)
5. Exclude machines with all slots full
6. Prefer machines that already have the project's base repository
7. Select by lowest composite load score
8. Tie-break by raw hardware capacity
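
The criteria above amount to a filter-then-rank pipeline, sketched here under several assumptions. The `Machine` and `SelectionRequest` fields are hypothetical, the composite load score is taken to be the mean of CPU and memory load (the real formula is not given), core count stands in for "raw hardware capacity", and `--on` is assumed to narrow the candidate pool while the availability filters still apply.

```typescript
// Hypothetical machine record; field names are illustrative, not the real schema.
interface Machine {
  id: string;
  status: "online" | "offline" | "draining";
  capabilities: string[]; // e.g. ["claude", "cursor"]
  cpuLoad: number;        // 0..1
  memoryLoad: number;     // 0..1
  usedSlots: number;
  totalSlots: number;
  hasBaseRepo: boolean;   // project's base repository is already present
  cpuCores: number;       // stand-in for raw hardware capacity
}

interface SelectionRequest {
  requiredCapabilities: string[];
  on?: string;            // machine id passed via --on
  loadThreshold?: number; // defaults to 0.85
}

// Assumed composite score: mean of CPU and memory load.
const loadScore = (m: Machine): number => (m.cpuLoad + m.memoryLoad) / 2;

function selectMachine(machines: Machine[], req: SelectionRequest): Machine | undefined {
  const threshold = req.loadThreshold ?? 0.85;
  const candidates = machines
    .filter((m) => req.on === undefined || m.id === req.on) // user-specified machine
    .filter((m) => m.status === "online") // drop offline and draining
    .filter((m) => req.requiredCapabilities.every((c) => m.capabilities.includes(c)))
    .filter((m) => m.cpuLoad <= threshold && m.memoryLoad <= threshold)
    .filter((m) => m.usedSlots < m.totalSlots);

  // Rank: repository locality first, then lowest load, then largest capacity.
  candidates.sort(
    (a, b) =>
      Number(b.hasBaseRepo) - Number(a.hasBaseRepo) ||
      loadScore(a) - loadScore(b) ||
      b.cpuCores - a.cpuCores,
  );
  return candidates[0];
}
```

Returning `undefined` when no machine qualifies leaves the caller free to keep the task in `QUEUED` until capacity appears.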

Security model

- Authentication: JWT tokens (access + refresh) with TOTP 2FA
- Authorization: role-based access control (developer, lead, admin)
- API tokens: scoped personal access tokens for CLI and API usage
- Worker auth: one-time enrollment tokens + JWT for ongoing communication
- Audit trail: all administrative actions are logged to the `audit_logs` table
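
As an illustration of how a role-based check might look, the sketch below ranks the three roles and gates actions by a minimum role. Only the role names come from the list above. The strict ordering (developer < lead < admin), the action names, and the `authorize` helper are assumptions; the actual permission model may be finer-grained.

```typescript
// Roles named in the security model. The ranking is an assumption:
// the source lists the roles but does not say they form a strict hierarchy.
type Role = "developer" | "lead" | "admin";

const ROLE_RANK: Record<Role, number> = { developer: 0, lead: 1, admin: 2 };

// True when `actual` is at least as privileged as `required`.
function hasRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Hypothetical actions and the minimum role each one needs, for illustration only.
type Action = "task:create" | "task:cancel-any" | "machine:drain";

const REQUIRED_ROLE: Record<Action, Role> = {
  "task:create": "developer",
  "task:cancel-any": "lead",
  "machine:drain": "admin",
};

function authorize(role: Role, action: Action): boolean {
  return hasRole(role, REQUIRED_ROLE[action]);
}
```

In a NestJS backend a check like this would typically live in a guard, with denied administrative attempts being natural candidates for the audit trail.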
