Offline-first RAG system. Your documents, your models, your machine.
LocalRAG ingests your local documents, stores embeddings in a local ChromaDB database, and answers questions using Ollama models. No cloud services required.
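Under the hood this is a standard retrieve-then-answer loop: embed each document chunk, store the vectors, embed the question, and hand the top-scoring chunks to the LLM. A dependency-free sketch of the retrieval half (the toy character-count embedder below only stands in for `nomic-embed-text`, and the helper names are illustrative, not LocalRAG's actual API):

```python
# Toy sketch of retrieve-then-answer. Real LocalRAG uses Ollama embeddings
# and ChromaDB; this bag-of-characters embedder is an illustration only.
import math

def embed(text):
    # Count letter frequencies as a stand-in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, chunks, k=2):
    # Rank stored chunks by similarity to the question; keep the best k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "ollama runs models locally",
    "chromadb stores embeddings",
    "bananas are yellow",
]
context = top_k("local model embeddings", chunks)
# The retrieved `context` would then be inserted into the LLM prompt.
```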
## Prerequisites

- Install the Ollama app/CLI on your system; it is required for local (non-Docker) use. See docs/ollama.md (official guide: ollama.com, downloads: ollama.com/download).
## Quick start

- Install dependencies:

  ```shell
  uv sync
  ```

- Start Ollama:

  ```shell
  ollama serve
  ```

- Pull models (or let `localrag setup` do it):

  ```shell
  ollama pull nomic-embed-text
  ollama pull llama3.2
  ```

- Ingest docs and ask a question:

  ```shell
  uv run localrag ingest ./docs
  uv run localrag query "What are the key topics in these documents?"
  ```

## API

Run the API:

```shell
uv run uvicorn localrag.api.main:app --reload
```

Then open http://127.0.0.1:8000/docs for interactive docs.

Endpoints:

- `GET /health`
- `POST /ingest`
- `POST /ingest/directory`
- `POST /query` (SSE streaming)
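Since `POST /query` streams Server-Sent Events, a client should read `data:` lines incrementally rather than wait for the full body. A minimal stdlib sketch; the request field name (`question`) and the shape of each `data:` payload are assumptions, so check the interactive docs for the real schema:

```python
# Minimal SSE client for POST /query, standard library only.
# The body field "question" and plain-text data payloads are assumptions;
# inspect http://127.0.0.1:8000/docs for the actual request/response schema.
import json
import urllib.request

def iter_sse_data(lines):
    """Yield the payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

def query(question, base="http://127.0.0.1:8000"):
    req = urllib.request.Request(
        f"{base}/query",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response object is file-like and iterates line by line.
        for payload in iter_sse_data(resp):
            print(payload, end="", flush=True)

# query("What are the key topics in these documents?")  # needs the API running
```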
## Configuration

Copy `.env.example` to `.env` and tweak values:

```shell
cp .env.example .env
```

Main keys:

- `OLLAMA_BASE_URL`
- `OLLAMA_EMBED_MODEL`
- `OLLAMA_LLM_MODEL`
- `CHROMA_PERSIST_PATH`
- `CHROMA_COLLECTION_NAME`
- `CHUNK_CHARS`
- `CHUNK_OVERLAP_CHARS`
- `INGEST_RECURSIVE`
- `RAG_TOP_K`
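For orientation, a filled-in `.env` might look like the sketch below. The model names come from the quick start and 11434 is Ollama's default port; every other value is an assumption, so defer to `.env.example` for the project's actual defaults:

```dotenv
# Illustrative values only; see .env.example for the real defaults.
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2
CHROMA_PERSIST_PATH=./chroma
CHROMA_COLLECTION_NAME=localrag
CHUNK_CHARS=1000
CHUNK_OVERLAP_CHARS=200
INGEST_RECURSIVE=true
RAG_TOP_K=5
```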
## Docker

With Compose, Ollama runs in a container, so you can skip a host Ollama install for that workflow. For background on Ollama itself, still see docs/ollama.md.

```shell
docker compose up --build
```

After startup, pull models in the Ollama container:

```shell
docker exec -it <ollama_container_name> ollama pull nomic-embed-text
docker exec -it <ollama_container_name> ollama pull llama3.2
```
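For orientation, a Compose file for this kind of two-service setup typically resembles the sketch below. The service names, ports, and volume paths here are assumptions; the repository's own docker-compose.yml is authoritative:

```yaml
# Hypothetical sketch only; defer to the repository's docker-compose.yml.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
  localrag:
    build: .
    environment:
      # Point the app at the ollama service instead of a host install.
      OLLAMA_BASE_URL: http://ollama:11434
    ports:
      - "8000:8000"
    depends_on:
      - ollama
volumes:
  ollama_data:
```

Pointing `OLLAMA_BASE_URL` at the `ollama` service hostname is what makes the host-level Ollama install unnecessary in this workflow.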