FireScroll

Brain rot, but educational.
The scroll that actually teaches you something.

YouTube Playlist MIT License



Demo

  • Overview
  • Voice Agent
  • Live Demo

Raw demo videos are also available in the demo/ directory.

What It Does

Say a topic, or type one. An ElevenLabs voice agent picks it up, triggers Firecrawl to research the web in real time, generates citation-backed scripts, narrates them across 31 languages, and renders vertical short-form videos into a scrollable feed. One voice command in, scrollable feed out. Runs locally with Docker.

How It Works

Voice Command or Text Input
  │
  ├── Voice Agent       ElevenLabs Conversational AI receives the topic
  │
  ├── Research          Firecrawl searches, scrapes, and extracts from the web
  │
  ├── Script            GPT-4o synthesizes sources into segmented scripts with citations
  │
  ├── Voice & Audio     ElevenLabs narrates in any language, AI music mixed in
  │
  ├── Render            AI backgrounds, captions, transitions – composed into vertical video
  │
  ▼
Scrollable Feed
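The stages above can be sketched as a simple sequential pipeline. This is a minimal illustration only; the function names and data shapes are hypothetical, not the project's actual API, and the real app wires these stages through Celery tasks and external services.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    text: str
    citations: list = field(default_factory=list)  # source URLs backing this segment

def research(topic: str) -> list[dict]:
    """Stand-in for Firecrawl: returns sources with url, title, content."""
    return [{"url": f"https://example.com/{topic}",
             "title": topic,
             "content": f"Facts about {topic}."}]

def write_script(sources: list[dict]) -> list[Segment]:
    """Stand-in for GPT-4o synthesis: one segment per source, with citations."""
    return [Segment(text=s["content"], citations=[s["url"]]) for s in sources]

def narrate(segments: list[Segment]) -> list[bytes]:
    """Stand-in for ElevenLabs TTS: one audio clip per segment."""
    return [seg.text.encode() for seg in segments]

def render(segments: list[Segment], audio: list[bytes]) -> dict:
    """Stand-in for Remotion: compose narration and visuals into one video."""
    return {"video": "out.mp4", "segments": len(segments)}

def run_pipeline(topic: str) -> dict:
    sources = research(topic)
    segments = write_script(sources)
    audio = narrate(segments)
    return render(segments, audio)

print(run_pipeline("volcanoes"))  # → {'video': 'out.mp4', 'segments': 1}
```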

Features

Voice → Research → Generate → Scroll.

Voice Agent (ElevenLabs Conversational AI)

The entry point. Click the mic on the dashboard and speak a topic. The ElevenLabs voice agent listens, confirms your intent, then calls Firecrawl Search as a real-time tool to research the web. Once research completes, it automatically generates scripts, narrates them, and kicks off video rendering – all from a single voice command. The agent speaks back to you throughout, confirming what it found and what it's building. No forms, no clicks, no configuration needed.

Research

Four research strategies:

Mode     Process
Simple   Search 5 results, extract markdown, synthesize into script
Deep     Search → GPT-4o ranks URLs → scrape → structured extraction → synthesize
Agent    Firecrawl's AI agent researches the topic autonomously
Manual   Write or paste your own script

Sources are stored with URL, title, and content. Citations link back to segments.
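The stored shape can be illustrated with a tiny data model. This is purely illustrative (the field names and index-based citations are assumptions, not the project's actual schema):

```python
# Sources stored with url, title, and content; segments carry
# citation indices pointing back into the source list.
sources = [
    {"url": "https://example.com/a", "title": "Source A", "content": "..."},
    {"url": "https://example.com/b", "title": "Source B", "content": "..."},
]

segments = [
    {"text": "First claim.", "citations": [0]},      # backed by Source A
    {"text": "Second claim.", "citations": [0, 1]},  # backed by both sources
]

def cited_urls(segment: dict, sources: list[dict]) -> list[str]:
    """Resolve a segment's citation indices to the source URLs they link to."""
    return [sources[i]["url"] for i in segment["citations"]]

print(cited_urls(segments[1], sources))
# → ['https://example.com/a', 'https://example.com/b']
```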

Voice & Audio

50+ voices across 31 languages. Five presets:

Preset       Stability  Speed   Style
Natural      0.5        1.0x    Neutral
Dramatic     0.3        0.9x    High
Energetic    0.4        1.15x   Medium
Calm         0.8        0.9x    Subtle
Storyteller  0.6        0.95x   Warm

Each parameter is adjustable per segment. Non-English scripts are translated before narration. Background music is AI-generated or selected from a built-in library, mixed at 15% volume with fade-in/out.
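The presets and the 15% music mix can be sketched as plain data plus a gain envelope. The values come from the table above; the dictionary keys, function names, and linear fade shape are illustrative assumptions, not the project's actual implementation:

```python
# Voice presets from the table above (keys are illustrative).
PRESETS = {
    "natural":     {"stability": 0.5, "speed": 1.0,  "style": "neutral"},
    "dramatic":    {"stability": 0.3, "speed": 0.9,  "style": "high"},
    "energetic":   {"stability": 0.4, "speed": 1.15, "style": "medium"},
    "calm":        {"stability": 0.8, "speed": 0.9,  "style": "subtle"},
    "storyteller": {"stability": 0.6, "speed": 0.95, "style": "warm"},
}

MUSIC_GAIN = 0.15  # background music mixed at 15% volume

def music_gain_at(t: float, duration: float, fade: float = 1.0) -> float:
    """Gain envelope: linear fade-in, flat middle at 15%, linear fade-out."""
    if t < fade:
        return MUSIC_GAIN * (t / fade)
    if t > duration - fade:
        return MUSIC_GAIN * max(0.0, (duration - t) / fade)
    return MUSIC_GAIN

# Per-segment override: start from a preset, then adjust one parameter.
settings = dict(PRESETS["dramatic"], speed=0.85)
print(settings["speed"], music_gain_at(5.0, 30.0))  # → 0.85 0.15
```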

Video Rendering

Three visual modes:

  • Full – 4 AI-generated backgrounds (DALL-E 3, 1024×1536) with Ken Burns zoom and crossfade transitions
  • Video – uploaded video background with text overlay
  • Split – video top half, AI background bottom half

Two caption styles:

  • Default – text overlay with fade-in
  • Karaoke – word-by-word highlighting synced via Whisper word-level timestamps
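The karaoke sync can be sketched from word-level timestamps of the kind Whisper produces. A minimal illustration only: the real renderer does this inside Remotion, and the timestamp values and function names here are assumed for the example:

```python
# Word-level timestamps in the shape a Whisper-style transcription returns.
words = [
    {"word": "The",     "start": 0.00, "end": 0.18},
    {"word": "scroll",  "start": 0.18, "end": 0.55},
    {"word": "that",    "start": 0.55, "end": 0.70},
    {"word": "teaches", "start": 0.70, "end": 1.20},
]

def highlighted_index(t: float, words: list[dict]) -> int:
    """Index of the word to highlight at playback time t (-1 if none active)."""
    for i, w in enumerate(words):
        if w["start"] <= t < w["end"]:
            return i
    return -1

def render_caption(t: float, words: list[dict]) -> str:
    """Uppercase the currently spoken word, standing in for visual highlighting."""
    i = highlighted_index(t, words)
    return " ".join(w["word"].upper() if j == i else w["word"]
                    for j, w in enumerate(words))

print(render_caption(0.6, words))  # → "The scroll THAT teaches"
```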

Scrollable Feed

Videos appear in a vertical scroll feed with auto-play, progress tracking, and auto-advance. Infinite scroll loads more as you go.

Media Library

Upload MP4, MOV, AVI, or WebM as backgrounds. Audio stripped on upload. Available across all projects.

Getting Started

Prerequisites

  • Docker and Docker Compose
  • All three API keys are required:
Key         Purpose
OpenAI      Script generation, image backgrounds, transcription
ElevenLabs  Voice agent, narration, multilingual TTS, music generation
Firecrawl   Web research and source extraction

Run

git clone https://github.com/arun477/firescroll.git
cd firescroll
docker compose up --build

Open http://localhost:3500 and add your API keys in Settings before generating anything.

Voice Agent Setup

The voice agent uses ElevenLabs Conversational AI. To connect it:

  1. Create an agent on the ElevenLabs dashboard or via API
  2. Add a server tool (webhook) pointing to your backend's /api/elevenlabs/create-topic endpoint – this lets the agent create topics and trigger Firecrawl research by voice
  3. Optionally add a second tool pointing to /api/elevenlabs/check-status so the agent can report live research progress back to the user
  4. Set the agent ID in frontend/src/components/VoiceAgent.jsx

The backend must be reachable from ElevenLabs servers for the webhook to work. In production, expose the backend behind a reverse proxy with HTTPS. The agent ID is safe to include in frontend code – it's a public identifier, not a secret.
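The create-topic webhook boils down to validating the agent's payload and queueing research. Here is a stdlib-only sketch under stated assumptions: the payload field names and return shape are hypothetical, and the real backend enqueues a Celery job rather than returning a dict:

```python
import json

def handle_create_topic(body: bytes) -> tuple[int, dict]:
    """Minimal webhook handler: validate the voice agent's JSON payload.

    Field names here are illustrative; check the backend for the real schema.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "invalid JSON"}
    topic = (payload.get("topic") or "").strip()
    if not topic:
        return 400, {"error": "missing topic"}
    # The real app would enqueue a Celery research job here.
    return 200, {"status": "queued", "topic": topic}

print(handle_create_topic(b'{"topic": "black holes"}'))
# → (200, {'status': 'queued', 'topic': 'black holes'})
```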

Usage

  1. Speak or type a topic – click the mic button on the dashboard and say it, or create one manually.
  2. Research runs automatically – the voice agent triggers Firecrawl to scrape the web. Or pick a research mode manually (simple, deep, agent).
  3. Configure – voice, visual mode, captions, music, all per segment. Or let defaults handle it.
  4. Generate – batch generation with progress tracking across 7 phases. Voice-triggered topics auto-generate.
  5. Scroll – swipe through your feed.

Supported Languages

English, Spanish, French, German, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Korean, Chinese, Hindi, Arabic, Turkish, Swedish, Danish, Finnish, Indonesian, Thai, Vietnamese, Ukrainian, Czech, Romanian, Hungarian, Greek, Hebrew, Bengali, Tamil, Filipino.

Narration and script translation are handled by ElevenLabs.

Roadmap

Infrastructure

  • PostgreSQL – replace SQLite for concurrent writes and connection pooling. SQLite works for single-user but breaks under multi-tenant load.
  • Pre-built Docker images – publish to Docker Hub. Skip the cold build, run with docker compose pull && docker compose up.
  • GPU-accelerated rendering – Remotion is CPU-bound. GPU support would cut render times significantly.
  • Job queue visibility – real-time dashboard for Celery workers with retry controls.

Product

  • Social export – publish directly to YouTube Shorts, TikTok, and Reels with platform-specific formatting and scheduling.
  • Shareable feeds – public URLs so viewers can scroll without running the app. Embeddable player for external sites.
  • Scheduled generation – set a topic and cadence. FireScroll researches and generates new content on autopilot.
  • Voice cloning – consistent narrator from a 30-second sample via ElevenLabs across an entire series.
  • Interactive transcripts – click any word to jump to that frame. Full-text search across all content.

Scale

  • Multi-tenant auth – user accounts, API key isolation, team workspaces.
  • PWA – installable app with offline feed caching and push notifications for completed generations.
  • Analytics – per-video watch time, completion rate, drop-off points, and engagement heatmaps.
  • Content graph – link topics into learning paths. Auto-suggested via embeddings across generated content.

Built With

Built for the Firecrawl x ElevenLabs Hackathon.

  • Firecrawl – web research and data extraction. Search API provides real-time web knowledge in a single call. Six modes from search to structured extraction.
  • ElevenLabs – Conversational AI voice agent as the primary input interface. Also powers all narration: 50+ voices across 31 languages with tunable presets, plus AI music generation.
  • Remotion – programmatic video rendering in React.
  • OpenAI – script generation, image backgrounds, transcription.

Contributing

Contributions welcome. Open an issue to discuss changes first.

License

MIT β€” see LICENSE.
