
MedStudy AI


A RAG-powered study assistant that answers questions exclusively from your uploaded materials. No hallucinations, no internet knowledge - just your textbooks, notes, and documents.

Features

RAG-Based Study Assistance

  • Document-grounded answers - Responses come only from your uploaded materials
  • Source citations - Every answer includes [Source: filename] references
  • Context filtering - Select/deselect sources to focus your search
  • Semantic search - Uses BGE embeddings for intelligent retrieval
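
The app uses fastembed's BGE model for the embeddings themselves, but the ranking step behind semantic search boils down to cosine similarity over chunk vectors. A minimal sketch of that idea (not the project's actual retrieval code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=3):
    """Rank (chunk_text, embedding) pairs by similarity to the query
    and return the k best chunk texts."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]
```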

File Upload & Processing

  • Supported formats: PDF, DOCX, TXT, Markdown, CSV
  • URL scraping - Ingest content from any webpage (uses trafilatura + BeautifulSoup)
  • Large file support - Up to 100MB per file
  • Intelligent chunking - 1000-char chunks with 200-char overlap
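
The chunking numbers above amount to a sliding window: each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears intact somewhere. A sketch of that scheme (illustrative, not the project's `document_processor.py`):

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks whose tails overlap by `overlap` chars."""
    if not text:
        return []
    step = size - overlap  # advance 800 chars per chunk with the defaults
    return [text[i:i + size] for i in range(0, len(text), step)]
```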

Anki Flashcard Generation

  • AI-generated flashcards from your study materials
  • Cloze deletion support - {{c1::term}} format
  • Source tagging - Auto-tagged with source filename
  • Anki-ready export - Tab-separated format for direct import
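
Since Anki's text import splits fields on tabs, stray tabs or newlines inside a card would corrupt the file. A sketch of how one exported row could be built (the field layout and example card are illustrative, not the project's exact output):

```python
def anki_row(front: str, back: str, source: str) -> str:
    """One tab-separated line: front, back, source tag.
    Collapse internal whitespace so tabs/newlines can't break the format."""
    clean = lambda s: " ".join(s.split())
    tag = clean(source).replace(" ", "_")  # Anki tags cannot contain spaces
    return "\t".join([clean(front), clean(back), tag])

row = anki_row("{{c1::Metformin}} activates AMPK",
               "First-line therapy for type 2 diabetes",
               "lecture1.pdf")
```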

BYOAPI (Bring Your Own API)

Use your own API keys for different LLM providers:

| Provider | Models |
| --- | --- |
| Groq (default) | Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B |
| OpenAI | GPT-4o, GPT-4o Mini, GPT-4 Turbo |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku |

API keys are stored locally in your browser and never logged server-side.

User Authentication

  • Local accounts - Username/email/password registration
  • OAuth - Google and Apple Sign-In
  • Email verification - Optional verification flow
  • JWT tokens - 7-day expiry, secure sessions
  • Guest mode - Try without signing up

AI Modes

  • Study AI (RAG) - Answers strictly from uploaded documents
  • General AI - Unrestricted mode using full LLM knowledge
  • History-only mode - Continue conversations after documents are removed

Additional Features

  • LaTeX math rendering - Full KaTeX support for equations
  • Streaming responses - Real-time token streaming via SSE
  • Session management - Multiple study sessions per user
  • Source management - Add/remove individual sources
  • Chat export - Copy entire conversation to clipboard
  • Memory indicator - Shows context window usage

Tech Stack

Backend

  • FastAPI - Async Python web framework
  • fastembed - BAAI/bge-small-en-v1.5 embeddings (130MB)
  • Groq/OpenAI/Anthropic - Multi-provider LLM support
  • SQLite - User accounts and session metadata
  • PyPDF2 / python-docx - Document parsing
  • trafilatura / BeautifulSoup - Web scraping
  • slowapi - Rate limiting
  • bcrypt / PyJWT - Authentication

Frontend

  • React 18 - UI framework
  • Vite - Build tooling
  • TailwindCSS - Styling
  • KaTeX - Math rendering via rehype-katex
  • react-markdown - Markdown rendering
  • lucide-react - Icons

Quick Start

Docker (Recommended)

# Clone the repository
git clone https://github.com/dreamlessx/Generic_RAG.git
cd Generic_RAG

# Create environment file
cat > backend/.env << EOF
GROQ_API_KEY=your_groq_api_key
JWT_SECRET=$(openssl rand -hex 32)
EOF

# Start services
docker compose up --build

Access the app at http://localhost:3000

Single-Container Deployment

For production deployments (Railway, Oracle Cloud, etc.):

# Build unified image
docker build -t medstudy-ai .

# Run
docker run -p 8000:8000 \
  -e GROQ_API_KEY=your_key \
  -e JWT_SECRET=$(openssl rand -hex 32) \
  -v medstudy_data:/app/data \
  medstudy-ai

Local Development

Backend:

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Create .env file
echo "GROQ_API_KEY=your_key" > .env
echo "JWT_SECRET=dev-secret" >> .env

# Run
uvicorn app.main:app --reload --port 8000

Frontend:

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173, with API requests proxied to the backend at :8000

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GROQ_API_KEY | Yes | - | Default LLM provider API key |
| JWT_SECRET | Yes* | dev-secret... | JWT signing secret (set in production!) |
| PORT | No | 8000 | Server port |
| GOOGLE_CLIENT_ID | No | - | For Google OAuth |
| APPLE_CLIENT_ID | No | com.medstudy.app | For Apple OAuth |

API Documentation

Health Check

GET /api/

Returns {"status": "ok", "service": "MedStudy AI API"}

Authentication

POST /api/auth/register
Content-Type: application/json
{
  "username": "student",
  "email": "student@example.com",
  "password": "SecurePass123"
}
POST /api/auth/login
Content-Type: application/json
{
  "username": "student",
  "password": "SecurePass123"
}
POST /api/auth/oauth
Content-Type: application/json
{
  "provider": "google",
  "id_token": "..."
}

Sessions

POST /api/sessions
Authorization: Bearer <token>

Creates a new study session.

GET /api/sessions
Authorization: Bearer <token>

Lists user's sessions.

Document Upload

POST /api/upload
Content-Type: multipart/form-data

file: <binary>
session_id: <string>
POST /api/ingest-url
Content-Type: application/json
{
  "session_id": "abc123",
  "url": "https://example.com/article"
}

Chat

POST /api/chat
Content-Type: application/json
{
  "session_id": "abc123",
  "question": "What is the mechanism of metformin?",
  "mode": "rag",
  "history": [...],
  "source_filter": ["lecture1.pdf"],
  "model": "openai-gpt4o",
  "api_key": "sk-...",
  "provider": "openai"
}

Returns SSE stream:

data: {"token": "Metformin"}
data: {"token": " works by..."}
data: {"done": true, "sources": ["lecture1.pdf"]}
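
A client consumes this stream by parsing each `data:` line as JSON, concatenating `token` fields until the terminal `done` event. A sketch using the field names shown above:

```python
import json

def read_sse(lines):
    """Accumulate streamed tokens until the terminal {"done": true} event.
    Returns (answer_text, source_list)."""
    answer, sources = [], []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alive lines, etc.
        event = json.loads(line[len("data: "):])
        if event.get("done"):
            sources = event.get("sources", [])
            break
        answer.append(event.get("token", ""))
    return "".join(answer), sources
```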

Anki Generation

POST /api/sessions/{session_id}/anki
Content-Type: application/json
{
  "num_cards": 20
}

Returns a tab-separated (TSV) file ready for direct Anki import.

Source Management

DELETE /api/sessions/{session_id}/sources/{source_name}

Security Features

Rate Limiting

| Endpoint | Limit |
| --- | --- |
| /api/auth/register | 5/minute |
| /api/auth/login | 10/minute |
| /api/upload | 20/minute |
| /api/chat | 30/minute |

Login Protection

  • Five failed attempts trigger a 5-minute lockout
  • Per-username tracking

Input Validation

  • Username: 3-30 chars, alphanumeric + underscore
  • Password: 8+ chars, uppercase, lowercase, number required
  • Question: Max 10,000 chars, HTML-escaped
  • URL: Must start with http:// or https://, max 2048 chars
  • File size: Max 100MB
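
The rules above can be expressed as a handful of checks. A sketch mirroring the stated constraints (function names are illustrative, not the project's API):

```python
import re

def valid_username(u: str) -> bool:
    """3-30 chars, alphanumeric plus underscore."""
    return re.fullmatch(r"[A-Za-z0-9_]{3,30}", u) is not None

def valid_password(p: str) -> bool:
    """8+ chars with at least one uppercase, one lowercase, one digit."""
    return (len(p) >= 8
            and re.search(r"[A-Z]", p) is not None
            and re.search(r"[a-z]", p) is not None
            and re.search(r"\d", p) is not None)

def valid_url(url: str) -> bool:
    """Must start with http:// or https://, max 2048 chars."""
    return url.startswith(("http://", "https://")) and len(url) <= 2048
```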

Password Requirements

  • Minimum 8 characters
  • At least one uppercase letter
  • At least one lowercase letter
  • At least one number

Deployment

Railway

  1. Connect your GitHub repository
  2. Set environment variables:
    • GROQ_API_KEY
    • JWT_SECRET
  3. Deploy using the root Dockerfile
  4. Add persistent volume mounted at /app/data

Oracle Cloud (Free Tier)

# On your Oracle VM
docker run -d --restart=always \
  -p 80:8000 \
  -e GROQ_API_KEY=your_key \
  -e JWT_SECRET=your_secret \
  -v /home/ubuntu/medstudy:/app/data \
  medstudy-ai

Docker Compose (Development)

services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - ./backend/.env
    volumes:
      - uploads:/app/uploads
      - vectorstores:/app/vectorstores

  frontend:
    build: ./frontend
    ports:
      - "3000:80"
    depends_on:
      - backend

Architecture

RAG_med/
├── backend/
│   └── app/
│       ├── main.py           # FastAPI routes
│       ├── llm.py            # Multi-provider LLM streaming
│       ├── document_processor.py  # Text extraction, chunking, embeddings
│       ├── anki.py           # Flashcard generation
│       ├── auth.py           # JWT authentication
│       ├── database.py       # SQLite user/session storage
│       └── config.py         # Environment configuration
├── frontend/
│   └── src/
│       ├── App.jsx           # Main React app
│       └── api.js            # Backend API client
├── Dockerfile                # Unified production image
└── docker-compose.yml        # Development setup

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests (if applicable)
  5. Submit a pull request

License

MIT License - see LICENSE for details.
