Brain rot, but educational.
The scroll that actually teaches you something.
Overview | Voice Agent | Live Demo
Raw demo videos are also available in the `demo/` directory.
Say a topic, or type one. An ElevenLabs voice agent picks it up, triggers Firecrawl to research the web in real time, generates citation-backed scripts, narrates them across 31 languages, and renders vertical short-form videos into a scrollable feed. One voice command in, scrollable feed out. Runs locally with Docker.
```
Voice Command or Text Input
        │
        ├── Voice Agent    ElevenLabs Conversational AI receives the topic
        │
        ├── Research       Firecrawl searches, scrapes, and extracts from the web
        │
        ├── Script         GPT-4o synthesizes sources into segmented scripts with citations
        │
        ├── Voice & Audio  ElevenLabs narrates in any language, AI music mixed in
        │
        ├── Render         AI backgrounds, captions, transitions → composed into vertical video
        │
        ▼
  Scrollable Feed
```
Voice → Research → Generate → Scroll.
The entry point. Click the mic on the dashboard and speak a topic. The ElevenLabs voice agent listens, confirms your intent, then calls Firecrawl Search as a real-time tool to research the web. Once research completes, it automatically generates scripts, narrates them, and kicks off video rendering, all from a single voice command. The agent speaks back to you throughout, confirming what it found and what it's building. No forms, no clicks, no configuration needed.
Four research strategies:
| Mode | Process |
|---|---|
| Simple | Search 5 results, extract markdown, synthesize into script |
| Deep | Search → GPT-4o ranks URLs → Scrape → Structured extraction → Synthesize |
| Agent | Firecrawl's AI agent researches the topic autonomously |
| Manual | Write or paste your own script |
Sources are stored with URL, title, and content. Citations link back to segments.
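The source/citation linkage above can be sketched with plain data structures. This is illustrative only – `Source`, `Segment`, and `cite` are hypothetical names, not the app's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    url: str
    title: str
    content: str

@dataclass
class Segment:
    text: str
    citation_ids: list[int] = field(default_factory=list)  # indices into sources

def cite(segments: list[Segment], sources: list[Source]) -> list[str]:
    """Render each segment with bracketed markers linking back to its sources."""
    rendered = []
    for seg in segments:
        # Guard against citations that point at a source that doesn't exist.
        assert all(i < len(sources) for i in seg.citation_ids)
        marks = "".join(f"[{i + 1}]" for i in seg.citation_ids)
        rendered.append(seg.text + marks)
    return rendered

sources = [Source("https://example.com/a", "A", "..."),
           Source("https://example.com/b", "B", "...")]
segments = [Segment("Fact one.", [0]), Segment("Fact two.", [0, 1])]
print(cite(segments, sources))  # ['Fact one.[1]', 'Fact two.[1][2]']
```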
50+ voices across 31 languages. Five presets:
| Preset | Stability | Speed | Style |
|---|---|---|---|
| Natural | 0.5 | 1.0x | Neutral |
| Dramatic | 0.3 | 0.9x | High |
| Energetic | 0.4 | 1.15x | Medium |
| Calm | 0.8 | 0.9x | Subtle |
| Storyteller | 0.6 | 0.95x | Warm |
Each parameter is adjustable per segment. Non-English scripts are translated before narration. Background music is AI-generated or selected from a built-in library, mixed at 15% volume with fade-in/out.
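The preset table plus per-segment overrides could look like this in code. A sketch only – the field names and override mechanism are assumptions, not FireScroll's actual API:

```python
# Preset values mirror the table above; keys are assumed names.
PRESETS = {
    "natural":     {"stability": 0.5, "speed": 1.0,  "style": "neutral"},
    "dramatic":    {"stability": 0.3, "speed": 0.9,  "style": "high"},
    "energetic":   {"stability": 0.4, "speed": 1.15, "style": "medium"},
    "calm":        {"stability": 0.8, "speed": 0.9,  "style": "subtle"},
    "storyteller": {"stability": 0.6, "speed": 0.95, "style": "warm"},
}

def voice_settings(preset: str, **overrides) -> dict:
    """Start from a preset, then apply any per-segment overrides."""
    settings = dict(PRESETS[preset])  # copy so the preset table stays pristine
    settings.update(overrides)
    return settings

print(voice_settings("dramatic", speed=1.0))
# {'stability': 0.3, 'speed': 1.0, 'style': 'high'}
```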
Three visual modes:
- Full – 4 AI-generated backgrounds (DALL-E 3, 1024×1536) with Ken Burns zoom and crossfade transitions
- Video – uploaded video background with text overlay
- Split – video top half, AI background bottom half
Two caption styles:
- Default – text overlay with fade-in
- Karaoke – word-by-word highlighting synced via Whisper word-level timestamps
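A minimal sketch of how karaoke captions can be driven by Whisper-style word timestamps, assuming one `(word, start_sec, end_sec)` tuple per word. The function and frame logic are illustrative, not Remotion's or the app's actual implementation:

```python
def karaoke_frames(words, fps=30):
    """For each video frame, return the index of the highlighted word
    (or None if no word is active at that instant)."""
    if not words:
        return []
    total_frames = int(words[-1][2] * fps) + 1
    frames = []
    for f in range(total_frames):
        t = f / fps
        active = next((i for i, (_, s, e) in enumerate(words) if s <= t < e), None)
        frames.append(active)
    return frames

words = [("hello", 0.0, 0.4), ("world", 0.4, 0.8)]
frames = karaoke_frames(words, fps=10)
```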
Videos appear in a vertical scroll feed with auto-play, progress tracking, and auto-advance. Infinite scroll loads more as you go.
Upload MP4, MOV, AVI, or WebM as backgrounds. Audio stripped on upload. Available across all projects.
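An upload step along these lines might validate the extension and build an ffmpeg command that copies the video stream while dropping audio via `-an`. A sketch under assumptions – the output location and exact command layout are hypothetical:

```python
import os

ALLOWED = {".mp4", ".mov", ".avi", ".webm"}  # formats listed above

def strip_audio_cmd(path: str, out_dir: str = "backgrounds") -> list[str]:
    """Reject unsupported formats, then build an ffmpeg invocation that
    copies the video stream unchanged and drops the audio track (-an)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported format: {ext}")
    out = os.path.join(out_dir, os.path.basename(path))
    return ["ffmpeg", "-i", path, "-c:v", "copy", "-an", out]

print(strip_audio_cmd("clip.MOV"))
```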
- Docker and Docker Compose
- All three API keys are required:
| Key | Purpose |
|---|---|
| OpenAI | Script generation, image backgrounds, transcription |
| ElevenLabs | Voice agent, narration, multilingual TTS, music generation |
| Firecrawl | Web research and source extraction |
```bash
git clone https://github.com/arun477/firescroll.git
cd firescroll
docker compose up --build
```

Open localhost:3500 and add your API keys in Settings first.
The voice agent uses ElevenLabs Conversational AI. To connect it:
- Create an agent on the ElevenLabs dashboard or via API
- Add a server tool (webhook) pointing to your backend's `/api/elevenlabs/create-topic` endpoint – this lets the agent create topics and trigger Firecrawl research by voice
- Optionally add a second tool pointing to `/api/elevenlabs/check-status` so the agent can report live research progress back to the user
- Set the agent ID in `frontend/src/components/VoiceAgent.jsx`
The backend must be reachable from ElevenLabs servers for the webhook to work. In production, expose the backend behind a reverse proxy with HTTPS. The agent ID is safe to include in frontend code – it's a public identifier, not a secret.
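The core of the create-topic webhook can be sketched framework-free. The payload shape and return format here are assumptions – the real schema is whatever you define in the ElevenLabs tool configuration:

```python
def handle_create_topic(payload: dict) -> dict:
    """Hypothetical handler body for the create-topic webhook: validate the
    agent's payload, then (in the real app) enqueue research + generation."""
    topic = (payload.get("topic") or "").strip()
    if not topic:
        return {"status": "error", "message": "missing topic"}
    # Real implementation would create the topic record and kick off
    # Firecrawl research here; we just acknowledge it.
    return {"status": "accepted", "topic": topic}
```

Returning a structured status lets the agent speak the outcome back to the user rather than failing silently.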
- Speak or type a topic – click the mic button on the dashboard and say it, or create one manually.
- Research runs automatically – the voice agent triggers Firecrawl to scrape the web. Or pick a research mode manually (simple, deep, agent).
- Configure – voice, visual mode, captions, music, all per segment. Or let defaults handle it.
- Generate – batch generation with progress tracking across 7 phases. Voice-triggered topics auto-generate.
- Scroll – swipe through your feed.
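Progress tracking across the generation phases might be reported like this. The phase names are illustrative – only the count of 7 phases is stated above:

```python
# Hypothetical phase names; the pipeline order follows the diagram earlier.
PHASES = ["research", "script", "translation", "narration", "music",
          "backgrounds", "render"]

def progress(completed: int) -> dict:
    """Report the current phase and overall percent for a generation job."""
    completed = max(0, min(completed, len(PHASES)))  # clamp to valid range
    return {
        "phase": PHASES[completed] if completed < len(PHASES) else "done",
        "percent": round(100 * completed / len(PHASES)),
    }

print(progress(3))  # {'phase': 'narration', 'percent': 43}
```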
English, Spanish, French, German, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Korean, Chinese, Hindi, Arabic, Turkish, Swedish, Danish, Finnish, Indonesian, Thai, Vietnamese, Ukrainian, Czech, Romanian, Hungarian, Greek, Hebrew, Bengali, Tamil, Filipino.
Narration and script translation handled by ElevenLabs.
- PostgreSQL – replace SQLite for concurrent writes and connection pooling. SQLite works for single-user but breaks under multi-tenant load.
- Pre-built Docker images – publish to Docker Hub. Skip the cold build, run with `docker compose pull && docker compose up`.
- GPU-accelerated rendering – Remotion is CPU-bound. GPU support would cut render times significantly.
- Job queue visibility – real-time dashboard for Celery workers with retry controls.
- Social export – publish directly to YouTube Shorts, TikTok, and Reels with platform-specific formatting and scheduling.
- Shareable feeds – public URLs so viewers can scroll without running the app. Embeddable player for external sites.
- Scheduled generation – set a topic and cadence. FireScroll researches and generates new content on autopilot.
- Voice cloning – consistent narrator from a 30-second sample via ElevenLabs across an entire series.
- Interactive transcripts – click any word to jump to that frame. Full-text search across all content.
- Multi-tenant auth – user accounts, API key isolation, team workspaces.
- PWA – installable app with offline feed caching and push notifications for completed generations.
- Analytics – per-video watch time, completion rate, drop-off points, and engagement heatmaps.
- Content graph – link topics into learning paths. Auto-suggested via embeddings across generated content.
Built for the Firecrawl x ElevenLabs Hackathon.
- Firecrawl – Web research and data extraction. Search API provides real-time web knowledge in a single call. Six modes from search to structured extraction.
- ElevenLabs – Conversational AI voice agent as the primary input interface. Also powers all narration – 50+ voices across 31 languages with tunable presets, plus AI music generation.
- Remotion – Programmatic video rendering in React.
- OpenAI – Script generation, image backgrounds, transcription.
Contributions welcome. Open an issue to discuss changes first.
MIT – see LICENSE.
