Brain rot, but educational.
The scroll that actually teaches you something.
Overview | Voice Agent | Live Demo
Raw demo videos are also available in the `demo/` directory.
Say a topic, or type one. An ElevenLabs voice agent picks it up, triggers Firecrawl to research the web in real time, generates citation-backed scripts, narrates them across 31 languages, and renders vertical short-form videos into a scrollable feed. One voice command in, scrollable feed out. Runs locally with Docker.
```
Voice Command or Text Input
        │
        ├── Voice Agent    ElevenLabs Conversational AI receives the topic
        │
        ├── Research       Firecrawl searches, scrapes, and extracts from the web
        │
        ├── Script         GPT-4o synthesizes sources into segmented scripts with citations
        │
        ├── Voice & Audio  ElevenLabs narrates in any language, AI music mixed in
        │
        ├── Render         AI backgrounds, captions, transitions → composed into vertical video
        │
        ▼
  Scrollable Feed
```
Voice → Research → Generate → Scroll.
The entry point. Click the mic on the dashboard and speak a topic. The ElevenLabs voice agent listens, confirms your intent, then calls Firecrawl Search as a real-time tool to research the web. Once research completes, it automatically generates scripts, narrates them, and kicks off video rendering, all from a single voice command. The agent speaks back to you throughout, confirming what it found and what it's building. No forms, no clicks, no configuration needed.
Four research strategies:
| Mode | Process |
|---|---|
| Simple | Search 5 results, extract markdown, synthesize into script |
| Deep | Search → GPT-4o ranks URLs → Scrape → Structured extraction → Synthesize |
| Agent | Firecrawl's AI agent researches the topic autonomously |
| Manual | Write or paste your own script |
Sources are stored with URL, title, and content. Citations link back to segments.
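The source/citation linkage above can be sketched with plain data structures. This is illustrative only – `Source`, `Segment`, and `cite` are hypothetical names, not the app's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    url: str
    title: str
    content: str

@dataclass
class Segment:
    text: str
    citation_ids: list[int] = field(default_factory=list)  # indices into sources

def cite(segments: list[Segment], sources: list[Source]) -> list[str]:
    """Render each segment with bracketed markers linking back to its sources."""
    rendered = []
    for seg in segments:
        # Guard against citations that point at a source that doesn't exist.
        assert all(i < len(sources) for i in seg.citation_ids)
        marks = "".join(f"[{i + 1}]" for i in seg.citation_ids)
        rendered.append(seg.text + marks)
    return rendered

sources = [Source("https://example.com/a", "A", "..."),
           Source("https://example.com/b", "B", "...")]
segments = [Segment("Fact one.", [0]), Segment("Fact two.", [0, 1])]
print(cite(segments, sources))  # ['Fact one.[1]', 'Fact two.[1][2]']
```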
50+ voices across 31 languages. Five presets:
| Preset | Stability | Speed | Style |
|---|---|---|---|
| Natural | 0.5 | 1.0x | Neutral |
| Dramatic | 0.3 | 0.9x | High |
| Energetic | 0.4 | 1.15x | Medium |
| Calm | 0.8 | 0.9x | Subtle |
| Storyteller | 0.6 | 0.95x | Warm |
Each parameter is adjustable per segment. Non-English scripts are translated before narration. Background music is AI-generated or selected from a built-in library, mixed at 15% volume with fade-in/out.
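The preset table plus per-segment overrides could look like this in code. A sketch only – the field names and override mechanism are assumptions, not FireScroll's actual API:

```python
# Preset values mirror the table above; keys are assumed names.
PRESETS = {
    "natural":     {"stability": 0.5, "speed": 1.0,  "style": "neutral"},
    "dramatic":    {"stability": 0.3, "speed": 0.9,  "style": "high"},
    "energetic":   {"stability": 0.4, "speed": 1.15, "style": "medium"},
    "calm":        {"stability": 0.8, "speed": 0.9,  "style": "subtle"},
    "storyteller": {"stability": 0.6, "speed": 0.95, "style": "warm"},
}

def voice_settings(preset: str, **overrides) -> dict:
    """Start from a preset, then apply any per-segment overrides."""
    settings = dict(PRESETS[preset])  # copy so the preset table stays pristine
    settings.update(overrides)
    return settings

print(voice_settings("dramatic", speed=1.0))
# {'stability': 0.3, 'speed': 1.0, 'style': 'high'}
```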
Three visual modes:
- Full – 4 AI-generated backgrounds (DALL-E 3, 1024×1536) with Ken Burns zoom and crossfade transitions
- Video – uploaded video background with text overlay
- Split – video top half, AI background bottom half
Two caption styles:
- Default – text overlay with fade-in
- Karaoke – word-by-word highlighting synced via Whisper word-level timestamps
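A minimal sketch of how karaoke captions can be driven by Whisper-style word timestamps, assuming one `(word, start_sec, end_sec)` tuple per word. The function and frame logic are illustrative, not Remotion's or the app's actual implementation:

```python
def karaoke_frames(words, fps=30):
    """For each video frame, return the index of the highlighted word
    (or None if no word is active at that instant)."""
    if not words:
        return []
    total_frames = int(words[-1][2] * fps) + 1
    frames = []
    for f in range(total_frames):
        t = f / fps
        active = next((i for i, (_, s, e) in enumerate(words) if s <= t < e), None)
        frames.append(active)
    return frames

words = [("hello", 0.0, 0.4), ("world", 0.4, 0.8)]
frames = karaoke_frames(words, fps=10)
```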
Videos appear in a vertical scroll feed with auto-play, progress tracking, and auto-advance. Infinite scroll loads more as you go.
Upload MP4, MOV, AVI, or WebM as backgrounds. Audio stripped on upload. Available across all projects.
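An upload step along these lines might validate the extension and build an ffmpeg command that copies the video stream while dropping audio via `-an`. A sketch under assumptions – the output location and exact command layout are hypothetical:

```python
import os

ALLOWED = {".mp4", ".mov", ".avi", ".webm"}  # formats listed above

def strip_audio_cmd(path: str, out_dir: str = "backgrounds") -> list[str]:
    """Reject unsupported formats, then build an ffmpeg invocation that
    copies the video stream unchanged and drops the audio track (-an)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported format: {ext}")
    out = os.path.join(out_dir, os.path.basename(path))
    return ["ffmpeg", "-i", path, "-c:v", "copy", "-an", out]

print(strip_audio_cmd("clip.MOV"))
```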
- Docker and Docker Compose
- All three API keys are required:
| Key | Purpose |
|---|---|
| OpenAI | Script generation, image backgrounds, transcription |
| ElevenLabs | Voice agent, narration, multilingual TTS, music generation |
| Firecrawl | Web research and source extraction |
```bash
git clone https://github.com/arun477/firescroll.git
cd firescroll
docker compose up --build
```

Open localhost:3500 and add your API keys in Settings first.
The voice agent uses ElevenLabs Conversational AI. To connect it:
- Create an agent on the ElevenLabs dashboard or via API
- Add a server tool (webhook) pointing to your backend's `/api/elevenlabs/create-topic` endpoint – this lets the agent create topics and trigger Firecrawl research by voice
- Optionally add a second tool pointing to `/api/elevenlabs/check-status` so the agent can report live research progress back to the user
- Set the agent ID in `frontend/src/components/VoiceAgent.jsx`
The backend must be reachable from ElevenLabs servers for the webhook to work. In production, expose the backend behind a reverse proxy with HTTPS. The agent ID is safe to include in frontend code – it's a public identifier, not a secret.
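The core of the create-topic webhook can be sketched framework-free. The payload shape and return format here are assumptions – the real schema is whatever you define in the ElevenLabs tool configuration:

```python
def handle_create_topic(payload: dict) -> dict:
    """Hypothetical handler body for the create-topic webhook: validate the
    agent's payload, then (in the real app) enqueue research + generation."""
    topic = (payload.get("topic") or "").strip()
    if not topic:
        return {"status": "error", "message": "missing topic"}
    # Real implementation would create the topic record and kick off
    # Firecrawl research here; we just acknowledge it.
    return {"status": "accepted", "topic": topic}
```

Returning a structured status lets the agent speak the outcome back to the user rather than failing silently.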
- Speak or type a topic – click the mic button on the dashboard and say it, or create one manually.
- Research runs automatically – the voice agent triggers Firecrawl to scrape the web. Or pick a research mode manually (simple, deep, agent).
- Configure – voice, visual mode, captions, music, all per segment. Or let defaults handle it.
- Generate – batch generation with progress tracking across 7 phases. Voice-triggered topics auto-generate.
- Scroll – swipe through your feed.
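Progress tracking across the generation phases might be reported like this. The phase names are illustrative – only the count of 7 phases is stated above:

```python
# Hypothetical phase names; the pipeline order follows the diagram earlier.
PHASES = ["research", "script", "translation", "narration", "music",
          "backgrounds", "render"]

def progress(completed: int) -> dict:
    """Report the current phase and overall percent for a generation job."""
    completed = max(0, min(completed, len(PHASES)))  # clamp to valid range
    return {
        "phase": PHASES[completed] if completed < len(PHASES) else "done",
        "percent": round(100 * completed / len(PHASES)),
    }

print(progress(3))  # {'phase': 'narration', 'percent': 43}
```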
English, Spanish, French, German, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Korean, Chinese, Hindi, Arabic, Turkish, Swedish, Danish, Finnish, Indonesian, Thai, Vietnamese, Ukrainian, Czech, Romanian, Hungarian, Greek, Hebrew, Bengali, Tamil, Filipino.
Narration and script translation handled by ElevenLabs.
- PostgreSQL – replace SQLite for concurrent writes and connection pooling. SQLite works for single-user but breaks under multi-tenant load.
- Pre-built Docker images – publish to Docker Hub. Skip the cold build, run with `docker compose pull && docker compose up`.
- GPU-accelerated rendering – Remotion is CPU-bound. GPU support would cut render times significantly.
- Job queue visibility – real-time dashboard for Celery workers with retry controls.
- Social export – publish directly to YouTube Shorts, TikTok, and Reels with platform-specific formatting and scheduling.
- Shareable feeds – public URLs so viewers can scroll without running the app. Embeddable player for external sites.
- Scheduled generation – set a topic and cadence. FireScroll researches and generates new content on autopilot.
- Voice cloning – consistent narrator from a 30-second sample via ElevenLabs across an entire series.
- Interactive transcripts – click any word to jump to that frame. Full-text search across all content.
- Multi-tenant auth – user accounts, API key isolation, team workspaces.
- PWA – installable app with offline feed caching and push notifications for completed generations.
- Analytics – per-video watch time, completion rate, drop-off points, and engagement heatmaps.
- Content graph – link topics into learning paths. Auto-suggested via embeddings across generated content.
Built for the Firecrawl x ElevenLabs Hackathon.
- Firecrawl – Web research and data extraction. Search API provides real-time web knowledge in a single call. Six modes from search to structured extraction.
- ElevenLabs – Conversational AI voice agent as the primary input interface. Also powers all narration – 50+ voices across 31 languages with tunable presets, plus AI music generation.
- Remotion – Programmatic video rendering in React.
- OpenAI – Script generation, image backgrounds, transcription.
Contributions welcome. Open an issue to discuss changes first.
MIT – see LICENSE.
