Implement time-based cache invalidation for dataset description cache #1714

@emphor11

### Problem
The function _get_dataset_description caches the dataset description (description.xml) to disk, but the cache is never invalidated.

Dataset descriptions contain metadata such as dataset status (active, deactivated, etc.), which may change over time on the OpenML server. Because the cached description file is reused indefinitely, users may receive stale metadata if the local cache is old.

There is also a TODO comment in the code suggesting that this cache should invalidate itself after some time.

### Expected Behavior

Cached dataset descriptions should be refreshed periodically to ensure that metadata reflects the current state on the server.
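One possible shape for the periodic refresh is a staleness check on the cached file's modification time. The following is only a sketch: `is_cache_stale`, `DEFAULT_MAX_AGE_SECONDS`, and the one-day default are hypothetical names and values chosen for illustration, not part of the existing code.

```python
import time
from pathlib import Path

# Hypothetical default: treat cached descriptions older than one day as stale.
DEFAULT_MAX_AGE_SECONDS = 24 * 60 * 60


def is_cache_stale(path, max_age_seconds=DEFAULT_MAX_AGE_SECONDS, now=None):
    """Return True if `path` is missing or older than `max_age_seconds`.

    Age is measured from the file's last modification time. `now` can be
    passed explicitly (seconds since the epoch) to make the check testable.
    """
    path = Path(path)
    if not path.exists():
        # Nothing cached yet, so the caller has to download.
        return True
    current = time.time() if now is None else now
    return (current - path.stat().st_mtime) > max_age_seconds
```

A caller could then re-download description.xml whenever this check returns True or force_refresh_cache=True is passed, and reuse the cached file otherwise. Using the file's modification time avoids storing a separate timestamp, at the cost of depending on filesystem metadata being preserved.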

### Current Behavior

Once description.xml is downloaded and cached, it is reused indefinitely unless the cache directory is manually deleted or force_refresh_cache=True is used.
