System Overview

Architecture Diagram

                            ┌──────────────────┐
                            │    Developer      │
                            │ (create / attach) │
                            └──┬─────┬──────┬──┘
                               │     │      │
             ┌─────────────────┘     │      └─────────────────┐
             ▼                       ▼                        ▼
┌────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
│  Lark / Slack Bot   │  │   Web App (Browser)    │  │  Developer CLI (ov)   │
│ @bot develop "xxx"  │  │  live terminal xterm.js │  │  ov task / ov attach  │
└────────┬───────────┘  └───────────┬───────────┘  └───────────┬───────────┘
         │                          │                           │
         ▼                          ▼                           ▼
┌─────────────────┐  ┌──────────────────────────────────┐  ┌──────────────┐
│ Message Adapter  │─▶│       Overlord Server (:9000)     │◀─│ REST + WS    │
│ webhook → Command│  │  NestJS + SQLite + Redis          │  └──────────────┘
└─────────────────┘  │  JWT auth + RBAC + BullMQ          │
                     │                                    │
                     │  ┌──────────────────────────────┐  │
                     │  │  Task Dispatcher + Notifier   │  │
                     │  └──────────────┬───────────────┘  │
                     └────────────────┼───────────────────┘
                                      │ WebSocket control channel
                   ┌──────────────────┼──────────────────┐
             ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
             │  Worker 1  │     │  Worker 2  │     │  Worker N  │
             │ PTY+Agent  │     │ PTY+Agent  │     │ PTY+Agent  │
             │ (Claude /  │     │ (Cursor /  │     │ (Custom)   │
             │  Codex)    │     │  Claude)   │     │            │
             └─────┬─────┘     └───────────┘     └───────────┘
                   │                 ▲
                   │                 │ Bidirectional PTY
                   ▼
             ┌──────────────┐     ┌─────────────────────────────┐
             │ Git Push MR  │────▶│ Notifier → Bot / Web notify │
             └──────────────┘     └─────────────────────────────┘

Core Components

| Component | Description | Runs on |
|---|---|---|
| Message Adapter | Receives Lark / Slack webhook events, parses user commands, and outputs unified Command objects | Server |
| Overlord Web | Browser-based dashboard: task management, live PTY terminal, machine monitoring, admin panel | Server |
| Task Dispatcher | Scheduling engine: manages the task queue (BullMQ), selects workers, tracks the task lifecycle, persists to SQLite | Server |
| Worker Agent | Execution engine: manages Git worktrees, spawns PTY terminals, runs AI agents, manages Cursor tunnels | Each worker machine |
| PTY Manager | Worker sub-component: creates pseudo-terminals via node-pty and streams I/O bidirectionally | Worker |
| Pipeline Runner | Worker sub-component: monitors PTY output, detects stage completion, injects the next skill command | Worker |
| Notifier | Sends notifications via the source platform (Lark cards, Slack blocks) and in-app notifications | Server |
| Developer CLI (ov) | Command-line tool: task creation, PTY attach, project/machine queries, notifications | Developer machine |
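
To make the Message Adapter row concrete, here is a minimal sketch of turning a chat message into a unified Command object. The field names, the `parseCommand` helper, and the exact message syntax (modeled on the `@bot develop "xxx"` example in the diagram) are assumptions for illustration, not Overlord's actual types.

```typescript
// Hypothetical unified command shape; field names are illustrative only.
type Platform = "lark" | "slack";

interface Command {
  platform: Platform;
  userId: string;
  action: string; // e.g. "develop"
  prompt: string; // free-text task description
}

// Parses message text such as: @bot develop "add login page"
// Returns null when the text does not match the assumed syntax.
function parseCommand(platform: Platform, userId: string, text: string): Command | null {
  const match = /^@\S+\s+(\w+)\s+"([^"]*)"\s*$/.exec(text.trim());
  if (match === null) return null;
  return { platform, userId, action: match[1], prompt: match[2] };
}
```

Whatever the real shape is, normalizing early means the dispatcher never needs to know which chat platform a task came from.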

Monorepo Packages

| Package | Description |
|---|---|
| packages/protocol | Shared types, enums, WebSocket frames, constants |
| apps/server | NestJS backend: auth, dispatcher, scheduler, WebSocket gateway |
| apps/web | React frontend: dashboard, task management, live terminal |
| apps/worker | Worker process: agent execution, PTY management, Git operations |
| apps/cli | Operations CLI (overlord install/start/stop/doctor/upgrade) |
| apps/developer-cli | Developer CLI (ov setup/task/attach/status/upgrade) |
| apps/e2e | End-to-end integration tests |
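
As an illustration of what a shared protocol package can provide, the sketch below models WebSocket frames as a discriminated union with an encoder and a validating decoder. The frame names and fields are hypothetical; the real definitions in packages/protocol may look quite different.

```typescript
// Hypothetical frame types for illustration only.
type Frame =
  | { type: "pty.output"; taskId: string; data: string }
  | { type: "pty.input"; taskId: string; data: string }
  | { type: "task.status"; taskId: string; status: string };

function encodeFrame(frame: Frame): string {
  return JSON.stringify(frame);
}

// Returns null for malformed JSON, missing fields, or unknown frame types.
function decodeFrame(raw: string): Frame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { type, taskId, data, status } = value as Record<string, unknown>;
  if (typeof taskId !== "string") return null;
  switch (type) {
    case "pty.output":
    case "pty.input":
      return typeof data === "string" ? { type, taskId, data } : null;
    case "task.status":
      return typeof status === "string" ? { type, taskId, status } : null;
    default:
      return null;
  }
}
```

Keeping these types in one package lets the server, worker, web app, and CLI agree on the wire format at compile time.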

Data Flow

  1. Task creation: a developer creates a task via the web app, CLI, or bot
  2. Dispatching: the dispatcher selects the best available worker based on capacity, capabilities, and load
  3. Workspace setup: the worker creates an isolated Git worktree for the task
  4. Execution: the pipeline runner drives the AI agent through the configured stages
  5. Monitoring: PTY output streams in real time to the web dashboard and CLI
  6. Completion: the agent commits code, pushes the branch, and creates an MR/PR
  7. Notification: the notifier informs the developer through their original channel
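
The execution step can be sketched as a small runner that watches PTY output for a completion marker and then injects the next stage's command. This is a minimal sketch under stated assumptions: the `Stage` shape, the marker strings, and the `write` callback are hypothetical, and the real pipeline runner may detect stage completion in another way.

```typescript
// Hypothetical stage definition: marker strings and commands are illustrative only.
interface Stage {
  name: string;
  command: string;    // text written to the PTY to start this stage
  doneMarker: string; // substring of PTY output that signals completion
}

class PipelineRunner {
  private readonly stages: Stage[];
  private readonly write: (input: string) => void;
  private index = 0;
  private buffer = "";

  constructor(stages: Stage[], write: (input: string) => void) {
    this.stages = stages;
    this.write = write;
  }

  get finished(): boolean {
    return this.index >= this.stages.length;
  }

  // Starts the first stage by injecting its command.
  start(): void {
    if (!this.finished) this.write(this.stages[0].command + "\n");
  }

  // Feed every PTY output chunk here. A short tail of earlier output is kept
  // so that a marker split across two chunks is still detected.
  onOutput(chunk: string): void {
    if (this.finished) return;
    this.buffer += chunk;
    const marker = this.stages[this.index].doneMarker;
    const at = this.buffer.indexOf(marker);
    if (at === -1) {
      const keep = Math.max(marker.length - 1, 0);
      this.buffer = keep === 0 ? "" : this.buffer.slice(-keep);
      return;
    }
    this.buffer = this.buffer.slice(at + marker.length);
    this.index += 1;
    if (!this.finished) this.write(this.stages[this.index].command + "\n");
  }
}
```

In a worker, `write` would typically forward to the pseudo-terminal's input and `onOutput` would be called from its data callback.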

Task State Machine

QUEUED → ASSIGNED → RUNNING ──→ COMPLETED
                       │  ↘
                       │   CANCELLED
                       ↓       ↑
                   SUSPENDED ──┤
                       │    ──→ RUNNING (reconnect)
                       │    ──→ COMPLETED
                       │    ──→ FAILED (timeout)
                       │
              FAILED ←─┘
                ↓
            QUEUED (retry)

| Status | Description |
|---|---|
| QUEUED | Task created, waiting for an available worker |
| ASSIGNED | Worker selected, preparing workspace |
| RUNNING | Pipeline executing (current stage tracked) |
| SUSPENDED | Pipeline is awaiting human confirmation at a stage gate, or the worker disconnected and the task is awaiting reconnection |
| COMPLETED | All stages finished, code committed |
| FAILED | Execution error; can be retried |
| CANCELLED | Manually cancelled by the user |
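
The state machine can be encoded as a transition table plus a guard function. The sketch below is one reading of the diagram: it assumes RUNNING can move directly to FAILED (consistent with "execution error" in the table) and that only the arrows shown are legal. The names `TRANSITIONS`, `canTransition`, and `transition` are illustrative.

```typescript
type TaskStatus =
  | "QUEUED" | "ASSIGNED" | "RUNNING" | "SUSPENDED"
  | "COMPLETED" | "FAILED" | "CANCELLED";

// Allowed next states for each status, as read from the diagram.
const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  QUEUED: ["ASSIGNED"],
  ASSIGNED: ["RUNNING"],
  RUNNING: ["COMPLETED", "CANCELLED", "SUSPENDED", "FAILED"],
  SUSPENDED: ["RUNNING", "COMPLETED", "FAILED", "CANCELLED"],
  FAILED: ["QUEUED"], // retry
  COMPLETED: [],
  CANCELLED: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws if the move is not in the table.
function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Centralizing the table means every status update, whether triggered by a worker, a timeout, or a user, passes through the same check.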

Machine Selection

The dispatcher selects a target machine using these criteria (in priority order):

  1. Use the user-specified machine if one is given via the --on parameter
  2. Exclude offline and draining machines
  3. Filter by required capabilities (e.g., claude, cursor)
  4. Exclude machines above the load threshold (default 85% CPU/memory)
  5. Exclude machines with all slots full
  6. Prefer machines that already have the project's base repository
  7. Select by lowest composite load score
  8. Tie-break by raw hardware capacity
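
The criteria above can be sketched as a filter-then-rank function. Several details are assumptions the text does not specify: the `Machine` and `TaskRequest` field names, treating --on as a restriction of the candidate set (the hard filters still apply), a composite score computed as the mean of CPU, memory, and slot utilization, and a tie-break on core count and then memory.

```typescript
// All field names are assumptions for illustration, not Overlord's actual schema.
interface Machine {
  id: string;
  status: "online" | "offline" | "draining";
  capabilities: string[]; // e.g. ["claude", "cursor"]
  cpuLoad: number;        // 0..1
  memLoad: number;        // 0..1
  usedSlots: number;
  totalSlots: number;
  cachedRepos: string[];  // base repositories already on the machine
  cpuCores: number;
  memoryGb: number;
}

interface TaskRequest {
  repo: string;
  capabilities: string[];
  on?: string; // machine id passed via --on
}

// Assumed composite score: mean of CPU, memory, and slot utilization.
function loadScore(m: Machine): number {
  return (m.cpuLoad + m.memLoad + m.usedSlots / m.totalSlots) / 3;
}

function selectMachine(
  machines: Machine[],
  req: TaskRequest,
  loadThreshold = 0.85,
): Machine | null {
  // 1. A pinned machine narrows the candidates to that one machine.
  const candidates =
    req.on === undefined ? machines : machines.filter((m) => m.id === req.on);

  // 2-5. Hard filters: availability, capabilities, load, free slots.
  const eligible = candidates
    .filter((m) => m.status === "online")
    .filter((m) => req.capabilities.every((c) => m.capabilities.includes(c)))
    .filter((m) => m.cpuLoad <= loadThreshold && m.memLoad <= loadThreshold)
    .filter((m) => m.usedSlots < m.totalSlots);

  // 6. Prefer machines that already hold the base repository, if any do.
  const withRepo = eligible.filter((m) => m.cachedRepos.includes(req.repo));
  const pool = withRepo.length > 0 ? withRepo : eligible;

  // 7-8. Lowest load first, then larger hardware.
  const ranked = [...pool].sort(
    (a, b) =>
      loadScore(a) - loadScore(b) ||
      b.cpuCores - a.cpuCores ||
      b.memoryGb - a.memoryGb,
  );
  return ranked[0] ?? null;
}
```

Returning null leaves it to the caller to decide what happens when no machine qualifies, for example keeping the task in QUEUED.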

Security Model

  • Authentication: JWT tokens (access + refresh) with TOTP 2FA
  • Authorization: role-based access control (developer, lead, admin)
  • API tokens: scoped personal access tokens for CLI and API usage
  • Worker auth: one-time enrollment tokens + JWT for ongoing communication
  • Audit trail: all administrative actions are logged to the audit_logs table
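
To show how role checks and token scopes might combine, here is a minimal authorization sketch. It assumes the three roles form a hierarchy (developer < lead < admin), which the list above does not state, and it uses hypothetical scope names such as "tasks:write".

```typescript
type Role = "developer" | "lead" | "admin";

// Assumption: roles are hierarchical, so a higher rank implies the lower ones.
const ROLE_RANK: Record<Role, number> = { developer: 0, lead: 1, admin: 2 };

interface Principal {
  role: Role;
  scopes?: string[]; // set only for API tokens; scope names are hypothetical
}

function authorize(p: Principal, requiredRole: Role, requiredScope: string): boolean {
  if (ROLE_RANK[p.role] < ROLE_RANK[requiredRole]) return false;
  // Interactive sessions (no scopes field) are limited by role alone.
  return p.scopes === undefined || p.scopes.includes(requiredScope);
}
```

Under this model, a scoped token can never do more than the user who issued it, because the role check always runs first.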