A production-ready Retrieval-Augmented Generation (RAG) system built using Java 17, Spring Boot, PostgreSQL (pgvector), and Spring AI.
This project demonstrates how to build an end-to-end AI-powered semantic search and question-answering system using modern enterprise Java technologies.
Built with Spring AI, OpenAI embeddings, and pgvector for scalable semantic retrieval.
- Ingests large documents
- Splits them into semantic chunks
- Generates vector embeddings
- Stores them in PostgreSQL using pgvector
- Performs similarity search
- Uses an LLM to generate contextual answers
- Returns structured responses with source attribution
                  ┌────────────────────┐
                  │    Client / UI     │
                  └─────────┬──────────┘
                            │
                            ▼
                  ┌────────────────────┐
                  │  Query Controller  │
                  └─────────┬──────────┘
                            │
                            ▼
                  ┌────────────────────┐
                  │   Query Service    │
                  └─────────┬──────────┘
                            │
          ┌─────────────────┼─────────────────┐
          ▼                                   ▼
┌────────────────────┐              ┌────────────────────┐
│ Embedding Service  │              │   Search Service   │
│ (OpenAI / SpringAI)│              │   (pgvector SQL)   │
└─────────┬──────────┘              └─────────┬──────────┘
          │                                   │
          ▼                                   ▼
  ┌───────────────┐                  ┌──────────────────┐
  │ Vector Query  │                  │ Knowledge Chunks │
  │  (Question)   │                  │   (PostgreSQL)   │
  └───────────────┘                  └──────────────────┘
                            │
                            ▼
                  ┌────────────────────┐
                  │     ChatClient     │
                  │    (Spring AI)     │
                  └─────────┬──────────┘
                            ▼
                  ┌────────────────────┐
                  │   Final Response   │
                  │  Answer + Sources  │
                  └────────────────────┘
Document → Chunking → Embeddings → PostgreSQL (pgvector)
- Accept raw text/document
- Split into chunks (with overlap)
- Generate embeddings
- Store:
  - `documentId`
  - `chunkIndex`
  - `content`
  - `embedding`
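
The "split into chunks (with overlap)" step can be sketched in plain Java. This is a minimal illustration, not the project's actual chunker: the class name `TextChunker`, the `Chunk` record, and the fixed-size character window are assumptions made for the example. A real chunker would more likely split on sentence or paragraph boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fixed-size character windows with overlap.
final class TextChunker {

    // Mirrors the stored fields listed above, minus the embedding.
    record Chunk(String documentId, int chunkIndex, String content) {}

    static List<Chunk> split(String documentId, String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<Chunk> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        int index = 0;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(new Chunk(documentId, index++, text.substring(start, end)));
            if (end == text.length()) {
                break; // last window reached the end; avoid a trailing fragment
            }
        }
        return chunks;
    }
}
```

For example, `TextChunker.split("doc-1", "abcdefghij", 4, 1)` yields three chunks with contents `abcd`, `defg`, and `ghij`, so each chunk repeats the last character of the previous one.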
User Question → Embedding → Vector Search → Top K Chunks
- Convert question → embedding
- Perform similarity search using pgvector
- Retrieve top relevant chunks
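
To make the retrieval step concrete, here is a hedged sketch of what a top-K similarity search computes. The SQL string is an assumed query shape against the `knowledge_chunks` table (pgvector's `<=>` operator is cosine distance); the in-memory `topK` method reproduces the same ranking in plain Java for illustration only. All class and record names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: ranks chunks by cosine distance, as pgvector's <=> operator would.
final class SimilaritySearchSketch {

    // Assumed query shape; the project's actual SQL may differ.
    static final String TOP_K_SQL = """
            SELECT document_id, chunk_index, content
            FROM knowledge_chunks
            ORDER BY embedding <=> ?::vector
            LIMIT ?
            """;

    record StoredChunk(String documentId, int chunkIndex, float[] embedding) {}
    record ScoredChunk(String documentId, int chunkIndex, double distance) {}

    // Cosine distance = 1 - cosine similarity. Undefined (NaN) for zero vectors.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Smallest distance first, truncated to k results.
    static List<ScoredChunk> topK(float[] query, List<StoredChunk> chunks, int k) {
        return chunks.stream()
                .map(c -> new ScoredChunk(c.documentId(), c.chunkIndex(),
                        cosineDistance(query, c.embedding())))
                .sorted(Comparator.comparingDouble(ScoredChunk::distance))
                .limit(k)
                .toList();
    }
}
```

In the real pipeline the database does this ranking; the in-memory version is only useful for unit tests or for reasoning about which chunks should come back.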
Chunks → Context → LLM → Answer
- Combine retrieved chunks
- Send structured prompt to LLM
- Generate final answer
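
One way to combine retrieved chunks into a structured prompt is sketched below. The prompt wording, the numbering scheme, and the name `PromptBuilderSketch` are assumptions for illustration; the project's actual prompt template is not shown in this README.

```java
import java.util.List;

// Hypothetical sketch: builds a grounded prompt from retrieved chunk texts.
final class PromptBuilderSketch {

    static String buildPrompt(String question, List<String> chunks) {
        // Number each chunk so the context stays readable and ordered.
        StringBuilder context = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            context.append('[').append(i + 1).append("] ")
                   .append(chunks.get(i).strip())
                   .append('\n');
        }
        return """
                Answer the question using only the context below.
                If the context is insufficient, say you do not know.

                Context:
                %s
                Question: %s
                """.formatted(context, question);
    }
}
```

The resulting string would then be passed to the LLM as the user message. Instructing the model to stay within the supplied context is a common way to reduce unsupported answers, though it does not guarantee it.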
{
  "answer": "...",
  "sources": [
    {
      "docId": "...",
      "index": 0
    }
  ]
}
- LLM generates only the answer
- Sources come from retrieval layer (no hallucination)
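
The split between "LLM generates the answer" and "retrieval provides the sources" can be sketched as follows. The record names are hypothetical; the field names `answer`, `sources`, `docId`, and `index` follow the JSON shape shown above.

```java
import java.util.List;

// Hypothetical sketch: sources are derived from retrieved rows, never from LLM output.
final class RagResponseSketch {

    record RetrievedChunk(String documentId, int chunkIndex, String content) {}
    record Source(String docId, int index) {}
    record RagResponse(String answer, List<Source> sources) {}

    static RagResponse assemble(String llmAnswer, List<RetrievedChunk> retrieved) {
        // Map each retrieved chunk to its identifying metadata only.
        List<Source> sources = retrieved.stream()
                .map(c -> new Source(c.documentId(), c.chunkIndex()))
                .toList();
        return new RagResponse(llmAnswer, sources);
    }
}
```

Because `sources` is built from database rows, every entry refers to a chunk that actually exists, regardless of what the model wrote in `answer`.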
- Splits large documents into chunks
- Maintains:
  - `documentId`
  - `chunkIndex`
- Supports:
  - Manual OpenAI REST
  - Spring AI Embeddings
- Easily swappable
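
A swappable embedding strategy usually comes down to coding against a small interface. The sketch below is an assumption about how that could look, not the project's actual types: `EmbeddingProvider`, `FakeEmbeddingProvider`, and `Embedder` are hypothetical names, and the fake provider produces vectors with no semantic meaning.

```java
import java.util.List;

// Hypothetical sketch of a pluggable embedding strategy.
final class EmbeddingStrategySketch {

    // Implementations could wrap a manual OpenAI REST call or Spring AI.
    interface EmbeddingProvider {
        float[] embed(String text);
    }

    // Test stand-in: deterministic, no network, NOT semantically meaningful.
    static final class FakeEmbeddingProvider implements EmbeddingProvider {
        private final int dimensions;

        FakeEmbeddingProvider(int dimensions) {
            this.dimensions = dimensions;
        }

        @Override
        public float[] embed(String text) {
            float[] vector = new float[dimensions];
            for (int i = 0; i < text.length(); i++) {
                vector[i % dimensions] += text.charAt(i); // spread char codes across slots
            }
            return vector;
        }
    }

    // Depends only on the interface, so providers can be swapped freely.
    static final class Embedder {
        private final EmbeddingProvider provider;

        Embedder(EmbeddingProvider provider) {
            this.provider = provider;
        }

        List<float[]> embedAll(List<String> chunks) {
            return chunks.stream().map(provider::embed).toList();
        }
    }
}
```

With this shape, switching between the two supported strategies is a matter of which implementation gets injected, and tests can run without an API key.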
- Uses JdbcTemplate
- Executes pgvector similarity queries
- Returns ranked chunks
- Core RAG orchestration
- Builds prompt
- Calls LLM via `ChatClient`
- Assembles response
- Converts `float[]` → pgvector format
- Stateless and thread-safe
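
The `float[]` → pgvector conversion can be sketched as a pure static function, which is what makes it stateless and thread-safe. pgvector's text representation is a bracketed, comma-separated list such as `[1,2,3]`; the class and method names here are hypothetical.

```java
// Hypothetical sketch: formats an embedding as a pgvector text literal.
final class PgVectorFormat {

    private PgVectorFormat() {
        // utility class, no instances and no shared state
    }

    static String toLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder(embedding.length * 8 + 2).append('[');
        for (int i = 0; i < embedding.length; i++) {
            if (!Float.isFinite(embedding[i])) {
                // pgvector rejects NaN and infinite components
                throw new IllegalArgumentException("non-finite value at index " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }
}
```

For example, `PgVectorFormat.toLiteral(new float[]{1f, 2.5f, -3f})` returns `[1.0,2.5,-3.0]`, which can be bound as a string parameter and cast to `vector` in SQL.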
- Java 17
- Spring Boot 3.x
- PostgreSQL + pgvector
- Spring AI (LLM integration)
- OpenAI Embeddings (`text-embedding-3-small`)
- JdbcTemplate (for vector queries)
- Lombok
git clone <your-repo-url>
cd semantic-knowledge-engine
Flyway migration scripts for this setup are included and run on startup.
CREATE EXTENSION vector;
CREATE TABLE knowledge_chunks (
  id UUID PRIMARY KEY,
  document_id TEXT,
  chunk_index INT,
  content TEXT,
  embedding VECTOR(1536)
);
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/yourdb
    username: postgres
    password: password
  ai:
    openai:
      api-key: YOUR_API_KEY
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.2
POST /rag/ingest
Body:
Raw text document
POST /rag/search
Body:
"What is Spring Boot?"
POST /rag/query
Body:
"What is Spring Boot?"
Use sample:
- Architecture docs
- Kafka / Microservices notes
- Any long technical content
What is Spring Boot?
How does Kafka scale?
What is event-driven architecture?
Check:
- ✅ Answer is coherent
- ✅ Sources are returned
- ✅ Chunks are relevant
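
To send one of the sample questions to `POST /rag/query` from Java, a request could be built with the JDK's built-in HTTP client as sketched below. The base URL (Spring Boot's default port 8080 on localhost) and the `text/plain` content type are assumptions; adjust both to match the actual controller.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: builds (but does not send) a query request.
final class QueryRequestSketch {

    // Assumption: the app runs locally on Spring Boot's default port.
    static final String BASE_URL = "http://localhost:8080";

    static HttpRequest queryRequest(String question) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/rag/query"))
                .header("Content-Type", "text/plain") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(question))
                .build();
    }
}
```

With the application running, the request can be sent via `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the response body checked against the list above.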
- ✅ End-to-end RAG pipeline
- ✅ Vector search using pgvector
- ✅ Source attribution (no hallucinated metadata)
- ✅ Pluggable embedding strategy
- ✅ JDBC-based high-performance retrieval
- ✅ Clean separation of concerns
- Neighbor chunk stitching (context continuity)
- Hybrid search (keyword + vector)
- Streaming LLM responses
- UI with source highlighting
- Multi-document filtering
- Re-ranking models
This project demonstrates:
- Real-world AI system design
- Integration of LLMs with enterprise Java
- Understanding of vector databases
- Production-grade RAG architecture
Built as a hands-on exploration of modern AI systems using Java and the Spring ecosystem, bridging traditional backend engineering with AI-driven applications.
Star ⭐ the repo and share!
