Data-driven Clusters v4.1 page (11 clusters from Google Sheet)#720

Merged
richarddushime merged 13 commits into master from data-driven-clusters-v4 on Mar 25, 2026

Conversation

@LukasWallrich
Contributor

@LukasWallrich LukasWallrich commented Mar 22, 2026

Summary

Replaces the 7 hardcoded cluster pages (v3) with a fully data-driven approach powered by the FORRT Clusters v4.1 Google Doc and a structured Google Sheet.

What changed

  • 11 clusters (was 7), 93 sub-clusters, ~1300 publications with DOI-resolved APA references
  • New parsing script (scripts/parse_clusters_to_sheet.py) that:
    • Fetches the Google Doc as plain text and parses the hierarchical structure
    • Resolves ~1050 DOIs via doi.org content negotiation for clean APA references + BibTeX
    • Writes structured data to a Google Sheet (3 tabs: Clusters, Sub-Clusters, Publications with data validation)
    • Exports data/clusters_v4.json for Hugo to consume at build time
  • New Hugo shortcode (layouts/shortcodes/clusters_display.html) that renders all clusters from the JSON data with:
    • Sidebar navigation with collapsible cluster tree and colored arrows
    • Tabbed sub-clusters (matching the previous UI pattern) with wrapping support
    • Sub-cluster headings, italic descriptions, and bulleted reference lists
    • Full-text search across clusters, sub-clusters, and all references (with match highlighting and click-to-scroll)
    • DOI links rendered as clickable URLs; HTML formatting (e.g. <i> for italics) preserved from doi.org
    • Responsive layout (sidebar collapses on mobile with toggle button)
  • Updated intro text to reflect 11 clusters (was 9)
  • Deactivated old cluster1.md–cluster7.md (set active = false)
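
The DOI resolution step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/parse_clusters_to_sheet.py: the names `build_doi_request` and `resolve_doi` and the injectable `opener` parameter are assumptions made for the example. The two `Accept` media types are the ones doi.org content negotiation uses for formatted citations and BibTeX.

```python
import urllib.parse
import urllib.request

# Media types used with doi.org content negotiation.
ACCEPT_APA = "text/x-bibliography; style=apa"
ACCEPT_BIBTEX = "application/x-bibtex"


def build_doi_request(doi: str, accept: str) -> urllib.request.Request:
    """Build a GET request asking doi.org for one representation of a DOI."""
    url = "https://doi.org/" + urllib.parse.quote(doi.strip(), safe="/")
    return urllib.request.Request(url, headers={"Accept": accept})


def resolve_doi(doi: str, opener=urllib.request.urlopen, timeout: float = 20.0) -> dict:
    """Return {'apa': ..., 'bibtex': ...} for a DOI (hypothetical helper).

    `opener` is injectable so the function can be exercised without network access.
    """
    result = {}
    for key, accept in (("apa", ACCEPT_APA), ("bibtex", ACCEPT_BIBTEX)):
        with opener(build_doi_request(doi, accept), timeout=timeout) as response:
            result[key] = response.read().decode("utf-8").strip()
    return result
```

Keeping the request construction separate from the network call makes it possible to check the headers in isolation and to substitute a fake opener in tests.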

Data pipeline

Google Doc (v4.1)
    ↓  parse_clusters_to_sheet.py
Google Sheet (3 tabs with data validation)
    ↓  --export-json flag
data/clusters_v4.json (committed to repo)
    ↓  Hugo build
clusters_display.html shortcode renders the page

The script supports --dry-run, --skip-doi, --json-only, and --export-json flags. DOI lookups are cached in scripts/doi_cache.json (gitignored) for fast reruns.
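
As a rough sketch of how the flags and the DOI cache could fit together (the actual script may be organised differently): the flag names and the cache path come from the description above, while `build_parser`, `load_cache`, `cached_lookup`, `save_cache` and the exact effect of `--skip-doi` shown here are assumptions for illustration.

```python
import argparse
import json
from pathlib import Path

# Cache location as stated in the PR description.
CACHE_PATH = Path("scripts/doi_cache.json")


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags named in the PR; their semantics here are assumed."""
    parser = argparse.ArgumentParser(description="Clusters parsing script (sketch)")
    for flag in ("--dry-run", "--skip-doi", "--json-only", "--export-json"):
        parser.add_argument(flag, action="store_true")
    return parser


def load_cache(path: Path) -> dict:
    """Read the DOI cache, or start with an empty one if the file is missing."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def cached_lookup(doi: str, cache: dict, fetch, skip_doi: bool = False):
    """Return cached metadata for a DOI, fetching it only on a cache miss."""
    key = doi.lower()  # DOIs are case-insensitive, so normalise the cache key
    if key in cache:
        return cache[key]
    if skip_doi:
        return None  # assumed behaviour of --skip-doi: never hit the network
    cache[key] = fetch(doi)
    return cache[key]


def save_cache(path: Path, cache: dict) -> None:
    """Write the cache back so that later runs can skip resolved DOIs."""
    path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")
```

A run would then call `load_cache(CACHE_PATH)` once at startup, pass the dict to every `cached_lookup`, and call `save_cache` at the end.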

Screenshots

The page preserves the established tab-based UI for sub-clusters while adding sidebar navigation and full-text search. Each cluster section has an alternating pastel background color.

Test plan

  • Run python3 scripts/parse_clusters_to_sheet.py --dry-run to verify parsing (expect 11 clusters, ~93 sub-clusters, ~1297 publications)
  • Run hugo server and verify /clusters/ renders correctly
  • Test tab switching within clusters
  • Test sidebar navigation (expand clusters, click sub-clusters)
  • Test full-text search (e.g. search for an author name, click result to scroll)
  • Test on mobile viewport (sidebar toggle, content layout)
  • Verify print view shows all tab content

🤖 Generated with Claude Code

Replace the 7 hardcoded cluster markdown files with a data-driven approach
that reads from a generated JSON file (clusters_v4.json). The data originates
from the FORRT Clusters v4.1 Google Doc and is parsed into a Google Sheet,
then exported as JSON for Hugo to consume at build time.

Key changes:
- New script (parse_clusters_to_sheet.py) that parses the GDoc, resolves
  DOIs via doi.org for clean APA references + BibTeX, writes to Google Sheet,
  and exports JSON for Hugo
- New Hugo shortcode (clusters_display.html) renders all clusters with
  sidebar navigation, tabbed sub-clusters, and full-text search
- Updated intro text to reflect 11 clusters (was 9)
- Deactivated old cluster1-7.md files (replaced by data-driven rendering)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LukasWallrich LukasWallrich requested a review from a team as a code owner March 22, 2026 22:24
@github-actions
Contributor

👍 All image files/references (if any) are in webp format, in line with our policy.

@LukasWallrich
Contributor Author

LukasWallrich commented Mar 22, 2026

Staging Deployment Status

This PR has been successfully deployed to staging as part of an aggregated deployment.

Deployed at: 2026-03-25 22:33:11 UTC
Staging URL: https://staging.forrt.org

The staging site shows the combined state of all compatible open PRs.

@forrtproject forrtproject deleted a comment from github-actions bot Mar 22, 2026
The clusters page now has its own full-text search that covers
clusters, sub-clusters, and all references. The site-wide Academic
search is redundant and has been disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Mar 22, 2026

📝 Spell Check Results

Found 6 potential spelling issue(s) when checking 30 changed file(s):

📄 static/js/clusters-page.js

Line Issue
80 tabEl ==> table
81 tabEl ==> table
83 tabEl ==> table
85 tabEl ==> table
436 tabEl ==> table
437 tabEl ==> table

ℹ️ How to address these issues:

  1. Fix the typo: If it's a genuine typo, please correct it.
  2. Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
  3. False positive: If this is a false positive, please report it in the PR comments.

🤖 This check was performed by codespell

@richarddushime
Contributor

We now have two search boxes.
I'm proposing that we remove the custom search on the left and keep the search at the top of the clusters page.

Meanwhile I will continue enhancing it; it would be good if you could check it ASAP.
@LukasWallrich @flavioazevedo

@LukasWallrich
Contributor Author

LukasWallrich commented Mar 23, 2026

Thanks @richarddushime! I agree that we need to get rid of one of the searches.

There is also now too much going on in this area - too many boxes. Maybe the syllabus does not need to be in a box?
image

Can we also remove the outdated figure and really condense the text? I think the following is all we need above the clusters - unless @flavioazevedo disagrees (but Richard, please make the change so that he can look at a complete new draft)

Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

@richarddushime
Contributor

I have just finished making some other adjustments.
I removed the left search and enhanced the remaining search (I limited it so that it does not go through references, because it was returning a lot of results from references and making users lose the relevant cluster text).

I would also like clarification about the paragraph below:

Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

Do you mean that all the content before the FORRT syllabus, and the figure, should be removed and replaced by this paragraph?

About the figure, I think it's good to keep it while we wait for the updated one (maybe Flavio can push for its design quickly?)

@richarddushime
Contributor

Additionally, here is something I am proposing.

The latest commit introduces dedicated, indexable URLs for each FORRT cluster (/clusters/cluster-N/) alongside the existing taxonomy hub (/clusters/), so each cluster is a first-class page for search and sharing.

The reason I added this is that the clusters are currently covered by only one URL in the sitemap (the main clusters page); with this change, each cluster becomes indexable.

This provides:
  • Canonical URLs per topic: one clear URL per cluster (and its sub-clusters in-page), instead of relying on a single long hub page or hash-only navigation for discovery.
  • Unique metadata per URL: each cluster page can carry its own <title>, meta description, and Open Graph / Twitter fields from front matter, improving relevance for queries and snippet quality.
  • Structured data: per-page JSON-LD (cluster_seo_jsonld) ties each URL to explicit taxonomy/entity signals for that cluster.
  • Topic-cluster information architecture: the hub remains the overview and entry point; cluster pages act as satellites with internal links between hub and subpages, supporting crawl paths and topical grouping.
  • Stable deep links: shareable URLs (including hash targets for sub-clusters where used) support accurate social previews, backlinks, and citations to the right slice of the taxonomy.
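
To make the structured-data point concrete, here is a hedged sketch of the kind of JSON-LD a cluster page could emit. It is written in Python for illustration only; the real output comes from the cluster_seo_jsonld Hugo partial, whose contents are not shown in this thread. The schema.org types chosen (`CollectionPage`, `DefinedTerm`), the `base_url` default and both function names are assumptions.

```python
import json


def cluster_jsonld(number: int, name: str, description: str,
                   base_url: str = "https://forrt.org") -> dict:
    """Build a schema.org description for one cluster page (illustrative only)."""
    hub_url = f"{base_url}/clusters/"
    page_url = f"{hub_url}cluster-{number}/"
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "name": name,
        "description": description,
        "url": page_url,
        # Link the satellite page back to the taxonomy hub.
        "isPartOf": {"@type": "CollectionPage", "url": hub_url},
        "about": {
            "@type": "DefinedTerm",
            "name": name,
            "inDefinedTermSet": hub_url,
        },
    }


def render_jsonld_script(data: dict) -> str:
    """Serialize the dict into the <script> tag that goes in the page head."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a description can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Escaping `</` keeps the payload valid JSON (`\/` is a legal JSON escape) while preventing text from the data file from terminating the script element.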

You can check the preview at https://staging.forrt.org/clusters/cluster-N/, where N is the cluster number, e.g. https://staging.forrt.org/clusters/cluster-2/

@LukasWallrich
Contributor Author

Thanks Richard! The individual pages are great! Yes, please remove all text and the figure. Let's focus on having an accurate website. I don't see why we need this rather complex figure if we have the same information right below (in the sidebar), as readers generally want to get to the point ... so I would personally always hide it behind a details tag, or an about page if we want to talk more about the process - but that can be discussed once we have an updated figure. Showing inconsistent data is unnecessary, unprofessional and confusing.

@LukasWallrich
Contributor Author

And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?

@richarddushime
Contributor

And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?

I saw that issue but left it pending because I was waiting for your validation of the new design first.
The search can work the same way as the current hub search (search within all clusters).

@LukasWallrich
Contributor Author

And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?

I saw that issue but left it pending because I was waiting for your validation of the new design first. The search can work the same way as the current hub search (search within all clusters).

That sounds a bit difficult to me - doesn't that then require anchors on each paragraph that you can link to from another page? But great if you can implement it!
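
One possible way to handle the anchor question, sketched under assumptions: give every sub-cluster a stable slug, and let the search index store a full page URL plus fragment, so a hit on one page can link into another. The JSON field names (`name`, `sub_clusters`) and all function names below are hypothetical, and the site's actual search is implemented in JavaScript rather than Python.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn a heading into a URL-safe anchor id, e.g. 'Open Data' -> 'open-data'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_search_index(clusters: list) -> list:
    """Flatten clusters and sub-clusters into entries with cross-page URLs."""
    index = []
    for number, cluster in enumerate(clusters, start=1):
        page = f"/clusters/cluster-{number}/"
        index.append({"title": cluster["name"], "url": page})
        for sub in cluster.get("sub_clusters", []):
            # The fragment must match the id rendered on the sub-cluster heading.
            index.append({"title": sub["name"], "url": f"{page}#{slugify(sub['name'])}"})
    return index


def search(index: list, query: str) -> list:
    """Case-insensitive substring match over entry titles."""
    needle = query.casefold()
    return [entry for entry in index if needle in entry["title"].casefold()]
```

With an index like this, anchors are only needed per sub-cluster heading rather than per paragraph, which matches the decision earlier in the thread to keep references out of the search.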

LukasWallrich and others added 4 commits March 25, 2026 16:34
Replace verbose multi-paragraph intro with a compact two-column layout
(text left, clickable thumbnail right) and remove syllabus section.
Adds lightbox overlay with magnifying glass hint for discoverability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sub-cluster tabs in fixed 3-column grid with fixed height and centered text
- Remove reference counts from tabs
- Full-width layout for clusters display section
- Darker background on inactive tabs
- "Update pending" badge on clusters diagram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…av behavior

- Remove ~145 lines of dead .clusters-cluster-subpage CSS and ~65 lines
  of dead JS (element never exists in DOM)
- Remove unused list.html template and .clusters-title-toolbar CSS
- Extract duplicated color arrays to shared partial (colors.html)
- Remove duplicate intro text from _index.md (was inconsistent with intro.html)
- Fix sub-cluster nav on individual pages to use in-page anchors for the
  active cluster, aligning scroll behavior with the hub page
- Fix redundant .toLowerCase() in search
- Add min-height on .cluster-tab-content to keep footer below fold
  when viewing short sub-clusters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LukasWallrich
Contributor Author

@richarddushime thanks for the work on this today (and before). I did the final cleanups ... should be good to go.

@richarddushime
Contributor

I think what's remaining is documenting the process, from how the data is generated up to rendering the pages on the website,
and also updating the data processing with a manual workflow dispatch in case the sheets have been updated? But for now this can be shipped 🥇

Contributor

@richarddushime richarddushime left a comment

LGTM 🥇

@richarddushime
Contributor

I think what's remaining is documenting the process, from how the data is generated up to rendering the pages on the website, and also updating the data processing with a manual workflow dispatch in case the sheets have been updated? But for now this can be shipped 🥇

TBF as Enhancement

@richarddushime richarddushime merged commit 05001bb into master Mar 25, 2026
5 checks passed
@richarddushime richarddushime deleted the data-driven-clusters-v4 branch March 25, 2026 22:47

2 participants