The AI agent that respects your resources.
Single binary. Minimal hardware. Maximum context efficiency.
Zeph is a Rust AI agent built around one principle: every token in the context window must earn its place. Skills are retrieved semantically, tool output is filtered before injection, and the context compacts automatically under pressure — keeping costs low and responses fast on hardware you already own.
```sh
curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
zeph init                                        # interactive setup wizard
zeph migrate-config --config config.toml --diff  # check for new options after upgrade
zeph                                             # start the agent
```

> **Tip:** `cargo install zeph` also works. Pre-built binaries and Docker images are on the releases page.
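The upgrade flow is non-destructive: `zeph migrate-config` appends sections your file is missing as commented-out, documented blocks and never touches existing values. A rough sketch of the result (the section and key names below are illustrative, not Zeph's actual schema):

```toml
# --- your existing config is left exactly as it was ---
[llm]
provider = "ollama"

# --- appended by `zeph migrate-config` for a section you haven't configured ---
# (illustrative names only)
# [scheduler]
# # Enable cron-based background tasks.
# enabled = false
```

Because untouched values are preserved and appended blocks are commented out, the command is idempotent — running it twice produces the same file, and `--diff` previews the change before `--in-place` applies it.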
| Feature | Description |
|---|---|
| Hybrid inference | Ollama, Claude, OpenAI, any OpenAI-compatible API, or fully local via Candle (GGUF). Multi-model orchestrator with fallback chains, EMA latency routing, and adaptive Thompson Sampling for exploration/exploitation-balanced model selection. → Providers |
| Skills-first architecture | YAML+Markdown skill files with BM25+cosine hybrid retrieval. Bayesian re-ranking, 4-tier trust model, and self-learning evolution — skills improve from real usage. Agent-as-a-Judge feedback detection with adaptive regex/LLM hybrid analysis. The load_skill tool lets the LLM fetch the full body of any skill outside the active TOP-N set on demand. → Skills · → Self-learning |
| Context engineering | Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. --debug-dump [PATH] writes every LLM request, response, and raw tool output to numbered files for context debugging. → Context · → Debug Dump |
| Semantic memory | SQLite + Qdrant with MMR re-ranking, temporal decay, query-aware memory routing (keyword/semantic/hybrid), cross-session recall, implicit correction detection, and credential scrubbing. Optional graph memory adds entity-relationship tracking with FTS5-accelerated entity search, BFS traversal for multi-hop reasoning, temporal fact tracking, and embedding-based entity resolution for semantic deduplication. Background LLM extraction runs fire-and-forget on each turn; graph facts are injected into the context window alongside semantic recall. → Memory · → Graph Memory |
| IDE integration (ACP) | Stdio, HTTP+SSE, or WebSocket transport. Multi-session isolation with per-session conversation history and SQLite persistence. Session modes, live tool streaming, LSP diagnostics injection, file following, usage reporting. Works in Zed, Helix, VS Code. → ACP |
| Multi-channel I/O | CLI, Telegram, TUI dashboard — all with streaming. Voice and vision input supported. → Channels |
| MCP & A2A | MCP client with full tool exposure to the model. Configure mcpls as an MCP server for compiler-level code intelligence: hover, definition, references, diagnostics, call hierarchy, and safe rename via rust-analyzer, pyright, gopls, and 30+ other LSP servers. A2A agent-to-agent protocol for multi-agent orchestration. → MCP · → LSP · → A2A |
| LSP context injection | Automatically injects LSP-derived context into the agent after tool calls — no explicit tool invocation needed. Three hooks: diagnostics after write_file (compiler errors surfaced as the next turn's context), hover info after read_file (pre-fetched for key symbols via tree-sitter multi-language pre-filter: Rust, Python, JS, TS, Go), and reference listing before rename_symbol (shows all call sites). Operates through the existing mcpls MCP server with graceful degradation when mcpls is unavailable. Token budget enforced per turn. Enabled by the lsp-context feature flag (included in full). → LSP Context Injection |
| Sub-agents | Spawn isolated agents with scoped tools, skills, and zero-trust secret delegation — defined as Markdown files. 4-level resolution priority (CLI > project > user > config), permission_mode (default/accept_edits/dont_ask/bypass_permissions/plan), fine-grained tools.except denylists, background fire-and-forget execution, max_turns limits, persistent memory scopes (user/project/local) with MEMORY.md injection, persistent JSONL transcript storage with /agent resume for continuing completed sessions, and lifecycle hooks (SubagentStart/SubagentStop at config level, PreToolUse/PostToolUse per agent with pipe-separated matchers). Manage definitions with `zeph agents list`. → Sub-agents |
| Instruction files | Drop zeph.md (or CLAUDE.md / AGENTS.md) in your project root. Zeph auto-detects and injects them into every system prompt — project rules, conventions, and domain knowledge applied automatically. Changes are picked up live via filesystem watching (500 ms debounce) — no restart required. → Instruction Files |
| Defense-in-depth | Shell sandbox, SSRF protection, skill trust quarantine, secret zeroization, audit logging, unsafe_code = "deny" workspace-wide. Untrusted content isolation: all tool results, web scrape output, MCP responses, A2A messages, and memory retrieval pass through a ContentSanitizer pipeline that truncates, strips control characters, detects 17 injection patterns, and wraps content in spotlighting XML delimiters before it enters the LLM context. Optional quarantined summarizer (Dual LLM pattern) routes high-risk sources through an isolated, tool-less LLM extraction call for defense-in-depth against indirect prompt injection. Exfiltration guards block markdown image pixel-tracking, validate tool call URLs against flagged untrusted sources, and suppress memory writes for injection-flagged content. TUI security panel with real-time event feed, SEC status bar indicator, and security:events command palette entry. → Security · → Untrusted Content Isolation |
| Document RAG | zeph ingest <path> indexes .txt, .md, .pdf into Qdrant. Relevant chunks surface automatically on each turn. → Document loaders |
| Code indexing & repo map | Always-on AST-based code indexing via tree-sitter ts-query. Extracts structured symbols (name, kind, visibility, line) from Rust, Python, JavaScript, TypeScript, and Go. Generates concise repo maps injected into every LLM request regardless of provider or Qdrant availability. Semantic code search over indexed symbols and chunks. Incremental re-indexing via filesystem watcher. |
| Task orchestration | DAG-based task graphs with dependency tracking, parallel execution, configurable failure strategies (abort/retry/skip/ask), timeout enforcement, and SQLite persistence. LLM-powered goal decomposition via Planner trait with structured output. Tick-based DagScheduler execution engine with command pattern, AgentRouter trait for task-to-agent routing, cross-task context injection with ContentSanitizer integration, and stale event guards. LLM-backed result aggregation (Aggregator trait, LlmAggregator) synthesizes completed task outputs into a single coherent response with per-task token budgeting and raw-concatenation fallback. /plan CLI commands: /plan <goal> (decompose + confirm + execute), /plan status, /plan list, /plan cancel, /plan confirm, /plan resume [id] (resume a paused Ask-strategy graph), /plan retry [id] (re-run failed tasks). TUI Plan View side panel (press p) shows live task status with per-row spinners and status colors; five plan:* command palette entries available. OrchestrationMetrics tracked in MetricsSnapshot. → Orchestration |
| Benchmark & evaluation | TOML benchmark datasets with LLM-as-Judge scoring. Run any subject model against a BenchmarkSet, score responses in parallel via a separate judge model with structured JSON output (JudgeOutput schema), and collect aggregate EvalReport metrics (mean score, p50/p95 latency, per-case token tracking). Budget enforcement via atomic token counter, semaphore-based concurrency, and XML boundary tags for subject response isolation. Enabled by the experiments feature flag (included in full). |
| Daemon & scheduler | HTTP webhook gateway with bearer auth. Cron-based periodic tasks and one-shot deferred tasks with SQLite persistence — add, update, or cancel tasks at runtime via natural language using the built-in scheduler skill. Experiment sessions can run on a cron schedule via TaskKind::Experiment, combining scheduler and experiments feature flags. Background mode. → Daemon |
| Self-experimentation | Autonomous LLM config experimentation engine (inspired by autoresearch). Parameter variation engine with pluggable strategies (grid sweep, random sampling, neighborhood search) explores temperature, top-p, top-k, frequency/presence penalty one parameter at a time, evaluates each variant via LLM-as-judge scoring, and keeps improvements that pass a configurable threshold. SearchSpace defines per-parameter ranges and step sizes; ConfigSnapshot captures the full LLM config for reproducible rollback. Scheduled runs via cron with ExperimentSchedule config. CLI flags: --experiment-run (run a single experiment session and exit), --experiment-report (print results summary and exit). TUI /experiment commands: start, stop, status, report, best. Interactive setup via zeph init wizard. Enabled by the experiments feature flag (opt-in, included in full). |
| Config migration | zeph migrate-config [--config PATH] [--in-place] [--diff] — upgrades an existing config file after a version bump. Missing sections are appended as commented-out blocks with documentation; existing values are never touched. Output is idempotent and can be previewed with --diff before applying. → Migrate Config |
| Single binary | ~15 MB, no runtime dependencies, ~50 ms startup, ~20 MB idle memory. |
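The "BM25+cosine hybrid retrieval" in the skills row can be pictured as a weighted blend of a lexical score and a semantic score, each min-max-normalized onto [0, 1] first so the two scales are comparable. The sketch below is a generic illustration of that technique, not Zeph's actual scoring code; the weight `alpha` and the normalization choice are assumptions.

```rust
/// Min-max normalize a score list onto [0, 1].
fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let (min, max) = scores
        .iter()
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    let range = (max - min).max(f64::EPSILON); // guard against a constant list
    scores.iter().map(|&s| (s - min) / range).collect()
}

/// Blend normalized lexical (BM25) and semantic (cosine) scores.
/// alpha = 1.0 is pure keyword match, alpha = 0.0 is pure embedding similarity.
fn hybrid_scores(bm25: &[f64], cosine: &[f64], alpha: f64) -> Vec<f64> {
    let b = min_max_normalize(bm25);
    let c = min_max_normalize(cosine);
    b.iter().zip(&c).map(|(&b, &c)| alpha * b + (1.0 - alpha) * c).collect()
}

fn main() {
    // Three candidate skills: raw BM25 scores and cosine similarities.
    // Skill 0 wins on keywords, skill 1 on semantics; the blend ranks skill 0 first.
    let scores = hybrid_scores(&[12.0, 4.0, 8.0], &[0.30, 0.90, 0.60], 0.7);
    assert!(scores[0] > scores[2] && scores[2] > scores[1]);
    println!("hybrid scores: {scores:?}");
}
```

A keyword-leaning `alpha` keeps exact-name matches (tool names, error codes) ranked high while still letting embeddings surface paraphrased queries — the usual motivation for hybrid retrieval over either signal alone.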
```text
┌─ Skills (3/12) ────────────────────┐┌─ MCP Tools ─────────────────────────┐
│ web-search [████████░░] 82% (117)││ - filesystem/read_file │
│ git-commit [███████░░░] 73% (42)││ - filesystem/write_file │
│ code-review [████░░░░░░] 41% (8)││ - github/create_pr │
└────────────────────────────────────┘└─────────────────────────────────────┘
```
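Sub-agents are plain Markdown files, so a definition might look like the sketch below. The field names (permission_mode, tools.except, max_turns) appear in the feature table above, but the exact frontmatter layout here is an assumption — check the Sub-agents docs for the real schema.

```markdown
---
# Illustrative sub-agent definition; exact frontmatter layout may differ.
name: reviewer
permission_mode: plan        # default | accept_edits | dont_ask | bypass_permissions | plan
max_turns: 10
tools:
  except: [write_file]       # fine-grained denylist: everything except file writes
---

You are a code reviewer. Inspect diffs for correctness and style,
but never modify files yourself.
```

Dropping such a file at project level overrides a user-level definition of the same name, per the 4-level resolution priority (CLI > project > user > config).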
Full documentation — installation, configuration, guides, and architecture reference — at bug-ops.github.io/zeph.
Contributions welcome — see CONTRIBUTING.md. Found a vulnerability? Report it privately via GitHub Security Advisories.
