html2rss

html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.

This gem is the core of the html2rss-web application.

Documentation

Detailed usage guides, reference docs, and the feed directory live on the project website:

💻 Try in Browser

You can develop html2rss directly in your browser using GitHub Codespaces:

Open in GitHub Codespaces

The Codespace comes pre-configured with Ruby 3.4 (compatible with Ruby 4.0), all dependencies, and VS Code extensions ready to go!

🤝 Contributing

Please see the contributing guide for details on how to contribute.

πŸ—οΈ Architecture

Core Components

  1. Config - Loads and validates configuration (YAML/hash)
  2. RequestService - Fetches pages using Faraday or Browserless
  3. Selectors - Extracts content via CSS selectors with extractors/post-processors
  4. AutoSource - Auto-detects content using Schema.org, JSON state blobs, semantic HTML, and structural patterns
  5. RssBuilder - Assembles Article objects and renders RSS 2.0
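The Selectors component pairs each CSS selector with an extractor and an optional chain of post-processors. The plain-Ruby sketch below illustrates that idea without the gem: the `Element` struct stands in for a matched HTML node, and the extractor and post-processor names (`text`, `href`, `attribute`, `strip`, `gsub`, `truncate`) and the `apply_selector` helper are hypothetical labels chosen for this example, not the gem's actual registry or configuration keys.

```ruby
# Hypothetical stand-in for a matched HTML element: its text and its attributes.
Element = Struct.new(:text, :attributes, keyword_init: true)

# Extractors turn a matched element into a raw value (illustrative names only).
EXTRACTORS = {
  'text' => ->(element, _opts) { element.text },
  'href' => ->(element, _opts) { element.attributes.fetch('href', nil) },
  'attribute' => ->(element, opts) { element.attributes.fetch(opts.fetch(:attribute), nil) }
}.freeze

# Post-processors transform an extracted value (illustrative names only).
POST_PROCESSORS = {
  'strip' => ->(value, _opts) { value.to_s.strip },
  'gsub' => ->(value, opts) { value.to_s.gsub(opts.fetch(:pattern), opts.fetch(:replacement)) },
  'truncate' => ->(value, opts) { value.to_s[0, opts.fetch(:length)] }
}.freeze

# Apply one selector definition: run its extractor (default: text),
# then feed the result through each post-processor step in order.
def apply_selector(element, definition)
  extractor = EXTRACTORS.fetch(definition.fetch(:extractor, 'text'))
  value = extractor.call(element, definition)
  definition.fetch(:post_process, []).reduce(value) do |acc, step|
    POST_PROCESSORS.fetch(step.fetch(:name)).call(acc, step)
  end
end

if __FILE__ == $PROGRAM_NAME
  element = Element.new(text: "  Hello,   world  ", attributes: { 'href' => '/posts/1' })
  title_definition = {
    extractor: 'text',
    post_process: [{ name: 'strip' }, { name: 'gsub', pattern: /\s+/, replacement: ' ' }]
  }
  puts apply_selector(element, title_definition)
  puts apply_selector(element, { extractor: 'href' })
end
```

Because post-processors run as an ordered `reduce`, the order in which steps are listed matters: here whitespace is trimmed before internal runs of spaces are collapsed.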

Data Flow

Config -> Request -> Extraction -> Processing -> Building -> Output
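The data flow can be pictured as a chain of plain functions, each consuming the previous stage's output. The following is a self-contained sketch, not html2rss's implementation: the Request stage is stubbed with a canned JSON body instead of Faraday or Browserless, the config keys (`url`, `title`, `items_key`) and all method names are hypothetical, and the output is a minimal hand-built RSS 2.0 document.

```ruby
require 'json'
require 'uri'
require 'erb'

# Hypothetical article record assembled by the Building stage.
Article = Struct.new(:title, :link, keyword_init: true)

# Request: delegate to an injected fetcher (stubbed in the usage below).
def request(config, fetcher)
  fetcher.call(config.fetch(:url))
end

# Extraction: pull the raw item hashes out of a JSON body.
def extract(body, config)
  JSON.parse(body).fetch(config.fetch(:items_key))
end

# Processing: normalize whitespace and resolve relative paths against the page URL.
def process(raw_items, config)
  raw_items.map do |item|
    {
      title: item.fetch('title').strip.squeeze(' '),
      link: URI.join(config.fetch(:url), item.fetch('path')).to_s
    }
  end
end

# Building: turn processed hashes into Article objects.
def build(items)
  items.map { |attrs| Article.new(**attrs) }
end

# Output: render a minimal RSS 2.0 document, escaping text for XML.
def render(articles, config)
  esc = ->(text) { ERB::Util.html_escape(text) }
  items = articles.map do |article|
    "    <item><title>#{esc.(article.title)}</title><link>#{esc.(article.link)}</link></item>"
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>#{esc.(config.fetch(:title))}</title>
        <link>#{esc.(config.fetch(:url))}</link>
        <description>Feed generated from #{esc.(config.fetch(:url))}</description>
    #{items.join("\n")}
      </channel>
    </rss>
  XML
end

# Config -> Request -> Extraction -> Processing -> Building -> Output
def run_pipeline(config, fetcher)
  body = request(config, fetcher)
  render(build(process(extract(body, config), config)), config)
end

if __FILE__ == $PROGRAM_NAME
  stub_fetcher = ->(_url) { '{"posts":[{"title":"  First post ","path":"/p/1"},{"title":"Tom & Jerry","path":"/p/2"}]}' }
  puts run_pipeline({ url: 'https://example.com/blog', title: 'Example', items_key: 'posts' }, stub_fetcher)
end
```

Keeping each stage a separate function makes it straightforward to swap the stubbed fetcher for a real HTTP client, or JSON extraction for CSS-selector extraction, without touching the later stages.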

Config schema workflow

The config schema is generated from the runtime dry-validation contracts and exported for client-side tooling.

  • Ruby API: Html2rss::Config.json_schema
  • CLI: html2rss schema
  • CLI options:
    • html2rss schema --write tmp/html2rss-config.schema.json
    • html2rss schema --no-pretty
  • Runtime validation API: Html2rss::Config.validate(config_hash)
  • Runtime validation CLI: html2rss validate config.yml
  • Packaged JSON file: schema/html2rss-config.schema.json
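Client-side tooling can pre-check a config against the exported schema before relying on runtime validation. The sketch below assumes a tiny hand-written schema fragment in place of the real schema/html2rss-config.schema.json (whose actual keys and structure may differ) and checks only the top-level `required` and `additionalProperties` keywords; the `top_level_errors` helper is hypothetical, and a real integration would pass the full schema to a JSON Schema validator library.

```ruby
require 'json'

# Hypothetical fragment standing in for schema/html2rss-config.schema.json;
# the real exported schema is larger and its exact keys may differ.
SCHEMA_JSON = <<~JSON
  {
    "type": "object",
    "required": ["channel"],
    "properties": {
      "channel": { "type": "object" },
      "selectors": { "type": "object" },
      "auto_source": { "type": "object" }
    },
    "additionalProperties": false
  }
JSON

# Check only top-level "required" and "additionalProperties".
# Config keys are strings, as they would be after loading YAML.
def top_level_errors(schema, config)
  errors = []
  schema.fetch('required', []).each do |key|
    errors << "missing required key: #{key}" unless config.key?(key)
  end
  if schema['additionalProperties'] == false
    allowed = schema.fetch('properties', {}).keys
    (config.keys - allowed).each { |key| errors << "unknown key: #{key}" }
  end
  errors
end

if __FILE__ == $PROGRAM_NAME
  schema = JSON.parse(SCHEMA_JSON)
  p top_level_errors(schema, { 'channel' => {}, 'selectors' => {} })
  p top_level_errors(schema, { 'selector' => {} })
end
```

A check like this catches typos such as `selector` for `selectors` early in an editor, but it cannot replace html2rss validate, which the README names as authoritative.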

Editor integrations, automation scripts, and AI tools should prefer these stable discovery points:

  • call html2rss schema to read the current exported schema
  • read schema/html2rss-config.schema.json when working from the repository or installed gem
  • use Html2rss::Config.schema_path if you already have Ruby loaded
  • use Html2rss::Config.validate or html2rss validate config.yml when you need authoritative runtime validation of selector references

Run bundle exec rake config:schema before committing to regenerate schema/html2rss-config.schema.json and keep the checked-in JSON Schema in sync with the validators. The exported schema covers client-side validation, while runtime validation remains authoritative for dynamic cross-field checks such as selector-key references.
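A small sketch shows why selector-key references need runtime validation. It assumes, for illustration only, that a selectors hash may contain lists such as `guid` or `categories` whose entries name other selectors; those key names and the `dangling_selector_references` helper are hypothetical here, not a description of the gem's contract. Standard JSON Schema has no portable way to require that a string equal one of its sibling keys, so a check like this belongs in code.

```ruby
# Keys whose values are lists of other selector names (assumed for this sketch).
REFERENCE_KEYS = %i[guid categories].freeze

# Return one message per reference that points at an undefined selector.
def dangling_selector_references(selectors)
  defined = selectors.keys - REFERENCE_KEYS
  REFERENCE_KEYS.flat_map do |ref_key|
    # Accept symbol or string names; a missing reference list counts as empty.
    unknown = Array(selectors[ref_key]).reject { |name| defined.include?(name.to_sym) }
    unknown.map { |name| "#{ref_key} references unknown selector '#{name}'" }
  end
end

if __FILE__ == $PROGRAM_NAME
  selectors = {
    items: { selector: 'li' },
    title: { selector: 'a' },
    link: { selector: 'a' },
    guid: [:link],
    categories: %i[title tag]
  }
  puts dangling_selector_references(selectors)
end
```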

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💖 Sponsoring

If you find html2rss useful, please consider sponsoring the project.