html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.
This gem is the core of the html2rss-web application.
Detailed usage guides, reference docs, and the feed directory live on the project website:
- Ruby gem documentation
- Web application
- Feed directory
- Contributing guide
- GitHub Discussions
- Sponsor on GitHub
You can develop html2rss directly in your browser using GitHub Codespaces.
The Codespace comes pre-configured with Ruby 3.4 (compatible with Ruby 4.0), all dependencies, and VS Code extensions ready to go!
Please see the contributing guide for details on how to contribute.
- Config - Loads and validates configuration (YAML/hash)
- RequestService - Fetches pages using Faraday or Browserless
- Selectors - Extracts content via CSS selectors with extractors/post-processors
- AutoSource - Auto-detects content using Schema.org, JSON state blobs, semantic HTML, and structural patterns
- RssBuilder - Assembles Article objects and renders RSS 2.0
Config -> Request -> Extraction -> Processing -> Building -> Output
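The stages above can be pictured with a toy, offline walk-through. This is only a sketch of the flow, not html2rss code: every method name is hypothetical, the request is stubbed with a fixed HTML string, and a regular expression stands in for the CSS selectors the gem actually uses.

```ruby
require 'uri'

# Toy illustration of the pipeline stages; all names are hypothetical.
Article = Struct.new(:title, :url)

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape_xml(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Config: reject a config without a channel URL.
def load_config(hash)
  raise ArgumentError, 'channel url is required' unless hash.dig(:channel, :url)

  hash
end

# Request: stubbed so the sketch runs without network access.
def fetch(_url)
  <<~HTML
    <ul>
      <li><a href="/posts/1">First post</a></li>
      <li><a href="/posts/2">  Second post </a></li>
    </ul>
  HTML
end

# Extraction: a regular expression stands in for CSS selectors.
def extract(html)
  html.scan(%r{<a href="([^"]+)">(.*?)</a>}m).map { |href, text| { url: href, title: text } }
end

# Processing: trim titles and turn relative links into absolute URLs.
def process(raw_items, base_url)
  raw_items.map { |item| Article.new(item[:title].strip, URI.join(base_url, item[:url]).to_s) }
end

# Building: assemble a minimal RSS 2.0 document by hand.
def build_rss(config, articles)
  items = articles.map do |a|
    "    <item><title>#{escape_xml(a.title)}</title><link>#{escape_xml(a.url)}</link></item>"
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>#{escape_xml(config.dig(:channel, :title))}</title>
        <link>#{escape_xml(config.dig(:channel, :url))}</link>
        <description>#{escape_xml(config.dig(:channel, :description))}</description>
    #{items.join("\n")}
      </channel>
    </rss>
  XML
end

config = load_config(
  { channel: { url: 'https://example.com/blog', title: 'Example blog', description: 'Toy feed' } }
)
base_url = config.dig(:channel, :url)
articles = process(extract(fetch(base_url)), base_url)
rss = build_rss(config, articles)

# Output: print the rendered feed.
puts rss
```

In the gem itself these responsibilities belong to the components listed above (Config, RequestService, Selectors or AutoSource, RssBuilder); the sketch only mirrors the order in which data moves between them.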
The config schema is generated from the runtime dry-validation contracts and exported for client-side tooling.
- Ruby API: `Html2rss::Config.json_schema`
- CLI: `html2rss schema`
- CLI options: `html2rss schema --write tmp/html2rss-config.schema.json`, `html2rss schema --no-pretty`
- Runtime validation API: `Html2rss::Config.validate(config_hash)`
- Runtime validation CLI: `html2rss validate config.yml`
- Packaged JSON file: `schema/html2rss-config.schema.json`
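Once the schema is exported, client-side tooling can read it like any other JSON document. The sketch below assumes the export is a JSON Schema object with top-level `properties` and `required` keys; the helper `summarize_schema` and the inline sample are hypothetical, and the sample is a trimmed stand-in rather than the real schema.

```ruby
require 'json'

# Summarize a JSON Schema document: top-level property names and required keys.
# Missing keys fall back to empty collections, since the real layout is assumed here.
def summarize_schema(schema_json)
  schema = JSON.parse(schema_json)
  {
    properties: schema.fetch('properties', {}).keys.sort,
    required: schema.fetch('required', []).sort
  }
end

# Hypothetical, heavily trimmed stand-in for the exported schema.
sample = JSON.generate(
  {
    'type' => 'object',
    'required' => ['channel'],
    'properties' => {
      'channel' => { 'type' => 'object' },
      'selectors' => { 'type' => 'object' }
    }
  }
)

summary = summarize_schema(sample)
puts summary.inspect

# With a real export, for example after
# `html2rss schema --write tmp/html2rss-config.schema.json`:
# summarize_schema(File.read('tmp/html2rss-config.schema.json'))
```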
If you are an editor integration, automation script, or AI tool, prefer these stable discovery points:
- call `html2rss schema` to read the current exported schema
- read `schema/html2rss-config.schema.json` when working from the repository or installed gem
- use `Html2rss::Config.schema_path` if you already have Ruby loaded
- use `Html2rss::Config.validate` or `html2rss validate config.yml` when you need authoritative runtime validation of selector references
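A tool might combine these discovery points into a single lookup. The following is a minimal sketch, not part of the gem: `locate_schema` and its `root:` parameter are hypothetical, and it is assumed that `Html2rss::Config.schema_path` returns a path-like value when the gem is loaded.

```ruby
require 'tmpdir'
require 'fileutils'

SCHEMA_RELATIVE_PATH = File.join('schema', 'html2rss-config.schema.json')

# Prefer the gem's own answer when html2rss is already loaded;
# otherwise look for the checked-in file under the given root.
def locate_schema(root: Dir.pwd)
  if defined?(Html2rss::Config) && Html2rss::Config.respond_to?(:schema_path)
    return Html2rss::Config.schema_path.to_s
  end

  candidate = File.join(root, SCHEMA_RELATIVE_PATH)
  File.file?(candidate) ? candidate : nil
end

# Demonstration against a throwaway directory that mimics a repository checkout.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'schema'))
  File.write(File.join(dir, SCHEMA_RELATIVE_PATH), '{}')
  puts locate_schema(root: dir)
end
```

When neither source is available, the function returns `nil`, at which point a caller could fall back to running `html2rss schema`.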
Run `bundle exec rake config:schema` before committing to regenerate `schema/html2rss-config.schema.json` and keep the checked-in JSON Schema in sync with the validators. The exported schema covers client-side validation, while runtime validation remains authoritative for dynamic cross-field checks such as selector-key references.
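To see why a cross-field check needs runtime validation, consider a toy version of a selector-key reference check. This is an illustration only and not the gem's validator: the helper name is hypothetical, and the config shape (a `guid` or `categories` entry listing the names of other selectors) is an assumption made for the example. Whether a referenced name exists depends on the other keys in the same config, which a static per-field schema does not readily express.

```ruby
# Toy cross-field check; the config shape and helper name are assumptions.
# Returns one message per reference that points at an undefined selector.
def dangling_selector_references(selectors, reference_keys: %i[guid categories])
  defined_keys = selectors.keys - reference_keys

  reference_keys.flat_map do |ref_key|
    Array(selectors[ref_key])
      .map(&:to_sym)
      .reject { |name| defined_keys.include?(name) }
      .map { |name| "#{ref_key} references unknown selector #{name}" }
  end
end

selectors = {
  items: { selector: 'article' },
  title: { selector: 'h2' },
  url: { selector: 'a', extractor: 'href' },
  guid: %i[url slug] # :slug is not defined above, so it should be reported
}

errors = dangling_selector_references(selectors)
puts errors
```

For real configs, `Html2rss::Config.validate` or `html2rss validate config.yml` remains the authoritative check.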
This project is licensed under the MIT License - see the LICENSE file for details.
If you find html2rss useful, please consider sponsoring the project.

