A RAG-powered study assistant that answers questions exclusively from your uploaded materials. No hallucinations, no internet knowledge - just your textbooks, notes, and documents.
- Document-grounded answers - Responses come only from your uploaded materials
- Source citations - Every answer includes `[Source: filename]` references
- Context filtering - Select/deselect sources to focus your search
- Semantic search - Uses BGE embeddings for intelligent retrieval
- Supported formats: PDF, DOCX, TXT, Markdown, CSV
- URL scraping - Ingest content from any webpage (uses trafilatura + BeautifulSoup)
- Large file support - Up to 100MB per file
- Intelligent chunking - 1000-char chunks with 200-char overlap
- AI-generated flashcards from your study materials
- Cloze deletion support - `{{c1::term}}` format
- Source tagging - Auto-tagged with source filename
- Anki-ready export - Tab-separated format for direct import
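The chunking scheme above (1000-character chunks with 200-character overlap) can be sketched in a few lines. The function name is illustrative, not the project's actual API:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap (illustrative sketch)."""
    step = size - overlap  # each chunk starts 800 chars after the previous one
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means the tail of each chunk reappears at the head of the next, so sentences straddling a boundary stay retrievable.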
Use your own API keys for different LLM providers:
| Provider | Models |
|---|---|
| Groq (default) | Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B |
| OpenAI | GPT-4o, GPT-4o Mini, GPT-4 Turbo |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku |
API keys are stored locally in your browser and never logged server-side.
- Local accounts - Username/email/password registration
- OAuth - Google and Apple Sign-In
- Email verification - Optional verification flow
- JWT tokens - 7-day expiry, secure sessions
- Guest mode - Try without signing up
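The 7-day JWT sessions above amount to an HMAC-signed, base64-encoded claim set. A stdlib-only sketch of the idea (the backend uses PyJWT; names here are illustrative):

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(username: str, secret: str, days: int = 7) -> str:
    """Build an HS256 JWT by hand: header.payload.signature (sketch)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": username,
                                 "exp": int(time.time()) + days * 86400}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"
```

Verification reverses the process: recompute the HMAC over `header.payload` and reject the token if the signature differs or `exp` has passed.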
- Study AI (RAG) - Answers strictly from uploaded documents
- General AI - Unrestricted mode using full LLM knowledge
- History-only mode - Continue conversations after documents are removed
- LaTeX math rendering - Full KaTeX support for equations
- Streaming responses - Real-time token streaming via SSE
- Session management - Multiple study sessions per user
- Source management - Add/remove individual sources
- Chat export - Copy entire conversation to clipboard
- Memory indicator - Shows context window usage
- FastAPI - Async Python web framework
- fastembed - BAAI/bge-small-en-v1.5 embeddings (130MB)
- Groq/OpenAI/Anthropic - Multi-provider LLM support
- SQLite - User accounts and session metadata
- PyPDF2 / python-docx - Document parsing
- trafilatura / BeautifulSoup - Web scraping
- slowapi - Rate limiting
- bcrypt / PyJWT - Authentication
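Retrieval over the BGE embeddings boils down to cosine similarity between the question vector and each chunk vector. A stdlib sketch of that ranking step (the real app gets its vectors from fastembed; these functions are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most similar to the query (sketch)."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The top-k chunk texts are then stuffed into the LLM prompt as context, which is what keeps answers grounded in the uploaded documents.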
- React 18 - UI framework
- Vite - Build tooling
- TailwindCSS - Styling
- KaTeX - Math rendering via rehype-katex
- react-markdown - Markdown rendering
- lucide-react - Icons
```bash
# Clone the repository
git clone https://github.com/dreamlessx/Generic_RAG.git
cd Generic_RAG

# Create environment file
cat > backend/.env << EOF
GROQ_API_KEY=your_groq_api_key
JWT_SECRET=$(openssl rand -hex 32)
EOF

# Start services
docker compose up --build
```

Access the app at http://localhost:3000
For production deployments (Railway, Oracle Cloud, etc.):
```bash
# Build unified image
docker build -t medstudy-ai .

# Run
docker run -p 8000:8000 \
  -e GROQ_API_KEY=your_key \
  -e JWT_SECRET=$(openssl rand -hex 32) \
  -v medstudy_data:/app/data \
  medstudy-ai
```

Backend:
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Create .env file
echo "GROQ_API_KEY=your_key" > .env
echo "JWT_SECRET=dev-secret" >> .env

# Run
uvicorn app.main:app --reload --port 8000
```

Frontend:
```bash
cd frontend
npm install
npm run dev
```

The frontend runs at http://localhost:5173 and is proxied to the backend at :8000.
| Variable | Required | Default | Description |
|---|---|---|---|
| GROQ_API_KEY | Yes | - | Default LLM provider API key |
| JWT_SECRET | Yes* | dev-secret... | JWT signing secret (set in production!) |
| PORT | No | 8000 | Server port |
| GOOGLE_CLIENT_ID | No | - | For Google OAuth |
| APPLE_CLIENT_ID | No | com.medstudy.app | For Apple OAuth |
```
GET /api/
```

Returns `{"status": "ok", "service": "MedStudy AI API"}`
```
POST /api/auth/register
Content-Type: application/json

{
  "username": "student",
  "email": "student@example.com",
  "password": "SecurePass123"
}
```

```
POST /api/auth/login
Content-Type: application/json

{
  "username": "student",
  "password": "SecurePass123"
}
```

```
POST /api/auth/oauth
Content-Type: application/json

{
  "provider": "google",
  "id_token": "..."
}
```

```
POST /api/sessions
Authorization: Bearer <token>
```

Creates a new study session.
```
GET /api/sessions
Authorization: Bearer <token>
```

Lists the user's sessions.
```
POST /api/upload
Content-Type: multipart/form-data

file: <binary>
session_id: <string>
```

```
POST /api/ingest-url
Content-Type: application/json

{
  "session_id": "abc123",
  "url": "https://example.com/article"
}
```

```
POST /api/chat
Content-Type: application/json

{
  "session_id": "abc123",
  "question": "What is the mechanism of metformin?",
  "mode": "rag",
  "history": [...],
  "source_filter": ["lecture1.pdf"],
  "model": "openai-gpt4o",
  "api_key": "sk-...",
  "provider": "openai"
}
```

Returns an SSE stream:

```
data: {"token": "Metformin"}
data: {"token": " works by..."}
data: {"done": true, "sources": ["lecture1.pdf"]}
```
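Consuming the stream client-side means reading `data:` lines and decoding each JSON payload until the `done` event arrives. A minimal parser sketch (transport omitted; the event shape matches the sample above):

```python
import json

def parse_sse(lines):
    """Yield decoded events from SSE 'data:' lines, stopping at done (sketch)."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("done"):
            break
```

Concatenating the `token` fields reconstructs the answer; the final event also carries the cited sources.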
```
POST /api/sessions/{session_id}/anki
Content-Type: application/json

{
  "num_cards": 20
}
```

Returns a tab-separated file for Anki import.
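The export is plain tab-separated lines, one card per row, which Anki's text importer reads directly. A sketch of writing cloze cards in that shape (the two-field layout here is an assumption, not the project's exact schema):

```python
def to_anki_tsv(cards):
    """cards: list of (cloze_text, tag) tuples -> TSV string (sketch)."""
    return "\n".join(f"{text}\t{tag}" for text, tag in cards)

# One cloze card, auto-tagged with its source filename
rows = to_anki_tsv([
    ("Metformin activates {{c1::AMPK}}", "lecture1.pdf"),
])
```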
```
DELETE /api/sessions/{session_id}/sources/{source_name}
```

| Endpoint | Limit |
|---|---|
| /api/auth/register | 5/minute |
| /api/auth/login | 10/minute |
| /api/upload | 20/minute |
| /api/chat | 30/minute |
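The limits above are enforced with slowapi; conceptually, each is a fixed per-minute window counted per client. A stdlib sketch of that counting logic (illustrative, not slowapi's implementation):

```python
import time
from collections import defaultdict

class WindowLimiter:
    """Allow at most `limit` calls per key per 60-second window (sketch)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)  # (key, window) -> call count

    def allow(self, key, now=None):
        window = int((now if now is not None else time.time()) // 60)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```

With `limit=5`, a sixth call from the same client inside one minute is rejected, matching the `/api/auth/register` row above.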
- 5 failed attempts trigger a 5-minute lockout
- Per-username tracking
- Username: 3-30 chars, alphanumeric + underscore
- Password: 8+ chars, uppercase, lowercase, number required
- Question: Max 10,000 chars, HTML-escaped
- URL: Must start with http:// or https://, max 2048 chars
- File size: Max 100MB
- Minimum 8 characters
- At least one uppercase letter
- At least one lowercase letter
- At least one number
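The password rules above translate to a length check plus three character-class checks. A sketch (function name is illustrative):

```python
import re

def valid_password(pw: str) -> bool:
    """Enforce: 8+ chars, one uppercase, one lowercase, one digit (sketch)."""
    return (len(pw) >= 8
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"[a-z]", pw) is not None
            and re.search(r"\d", pw) is not None)
```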
- Connect your GitHub repository
- Set environment variables: `GROQ_API_KEY`, `JWT_SECRET`
- Deploy using the root `Dockerfile`
- Add a persistent volume mounted at `/app/data`
```bash
# On your Oracle VM
docker run -d --restart=always \
  -p 80:8000 \
  -e GROQ_API_KEY=your_key \
  -e JWT_SECRET=your_secret \
  -v /home/ubuntu/medstudy:/app/data \
  medstudy-ai
```

```yaml
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - ./backend/.env
    volumes:
      - uploads:/app/uploads
      - vectorstores:/app/vectorstores
  frontend:
    build: ./frontend
    ports:
      - "3000:80"
    depends_on:
      - backend
```

```
RAG_med/
├── backend/
│   └── app/
│       ├── main.py                # FastAPI routes
│       ├── llm.py                 # Multi-provider LLM streaming
│       ├── document_processor.py  # Text extraction, chunking, embeddings
│       ├── anki.py                # Flashcard generation
│       ├── auth.py                # JWT authentication
│       ├── database.py            # SQLite user/session storage
│       └── config.py              # Environment configuration
├── frontend/
│   └── src/
│       ├── App.jsx                # Main React app
│       └── api.js                 # Backend API client
├── Dockerfile                     # Unified production image
└── docker-compose.yml             # Development setup
```
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests (if applicable)
- Submit a pull request
MIT License - see LICENSE for details.