Skip to content

feat: Sage API parity — frames, converters, config#2

Merged
timosachsenberg merged 34 commits intomasterfrom
wrap-more
Mar 12, 2026
Merged

feat: Sage API parity — frames, converters, config#2
timosachsenberg merged 34 commits intomasterfrom
wrap-more

Conversation

@timosachsenberg
Copy link
Contributor

@timosachsenberg timosachsenberg commented Mar 11, 2026

This pull request introduces several major enhancements and improvements to the C FFI bridge for timsrust, focusing on expanded C API coverage, improved test infrastructure, and developer usability. The most significant changes are the addition of frame-level access APIs, spectrum structure extensions for Sage compatibility, a new opaque configuration system for custom dataset opening, and improved CI and documentation for testing.

Key changes:

C API Extensions

  • Added frame-level access APIs, including the tims_frame struct and functions for retrieving individual frames or all frames by MS level, as well as proper memory management for returned frame arrays. [1] [2]
  • Extended the tims_spectrum struct with additional fields (index, isolation window, charge, precursor intensity, frame index) to match Sage's requirements and support richer downstream analysis.

Rust FFI and Internal Structs

  • Implemented a new TimsFfiConfig struct and related FFI functions to allow users to customize spectrum reading parameters (e.g., smoothing, centroiding, calibration) when opening datasets via tims_open_with_config. [1] [2]
  • Refactored the internal TimsDataset struct to add frame-level buffers, frame reader, and cached index converters, supporting the new C API features and improving memory management. [1] [2]

Testing and CI Improvements

  • Added a comprehensive GitHub Actions workflow (.github/workflows/test.yml) to run both stub (no dataset) and integration (real data) tests, including caching and automatic dataset download for CI efficiency.
  • Updated the README.md with clear instructions for running stub and integration tests, including environment variable setup and dataset download, as well as an explanation of the CI strategy.
  • Declared libc as a dev-dependency to support test infrastructure.

Usability and Developer Experience

  • Added new demos to examples/cpp_client.cpp showcasing frame-level access, index conversion, and extended spectrum fields for easier onboarding and validation.
  • Minor: Updated crate type in Cargo.toml to include "lib" for broader linking support.
  • Fixed header guard and file naming in include/timsrust_cpp_bridge.h.

These changes collectively make the FFI bridge more powerful, easier to test, and better documented for both end-users and developers.

timosachsenberg and others added 14 commits March 11, 2026 21:48
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- precursor_intensity: float -> double (preserve f64 precision)
- frame_index sentinel: 0 -> UINT32_MAX (0 is valid index)
- Add isolation_mz field (used by Sage for DIA window matching)
- Add calibration_tolerance and calibrate config setters
- Clarify buffer invalidation independence (frame vs spectrum)
- Note FrameWindowSplitting converter chicken-and-egg constraint
- Specify behavior for invalid ms_level (empty array, Ok status)
- Note tims_get_spectra_by_rt must populate new fields
- Config uses Box allocation (consistent with dataset handle)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Clarify index is Spectrum.index (not Precursor.index)
- frame_index sentinel only for MS1 (no precursor), not missing data
- MSLevel::Unknown maps to 0
- num_scans derived from scan_offsets.len() - 1
- scan_offsets copied from Vec<usize> to Vec<u64>; 32-bit unsupported
- Use get_all_ms1/ms2 instead of parallel_filter
- tims_free_frame_array frees inner arrays then outer array
- Converter single-value functions return NaN on NULL handle
- calibrate setter: 0=disabled, non-zero=enabled
- Builder handles FrameWindowSplitting converter resolution internally

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10 tasks across 4 chunks:
- Chunk 1: Extend TimsFfiSpectrum with 6 new fields
- Chunk 2: Frame-level access (TimsFfiFrame, FrameReader, batch API)
- Chunk 3: Converters (TOF->m/z, scan->IM) and config builder
- Chunk 4: C header and example updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… errors

- Distribute C header updates into chunks 1-3 (was all in chunk 4)
  so Rust FFI and C header stay in sync at every commit
- Fix get_frame: use ptr::null() for empty buffers instead of
  dangling Vec::as_ptr(), matching existing spectrum pattern
- Fix tims_get_frames_by_level: propagate frame read errors as
  TimsFfiStatus::Internal instead of silently dropping them
- Add cargo check --features with_timsrust verification steps
- Reduce to 9 tasks (header tasks merged into existing chunks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_intensity, frame_index

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rray FFI exports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change tims_get_frames_by_level ms_level parameter from c_uint/unsigned int
  to u8/uint8_t to match plan specification
- Add doc comments to all TimsDataset struct fields
- Reformat get_frame stub branch to one statement per line

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set last_error with descriptive messages on IndexOutOfBounds and
  frame read failures for better C/C++ consumer debugging
- Break long expressions (num_scans, ms_level match) across multiple lines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add single-value and batch converter functions for TOF->m/z and
scan->ion mobility conversions. These expose the MetadataReader
converters cached on TimsDataset at open time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TimsFfiConfig wrapper around SpectrumReaderConfig with setters for
smoothing_window, centroiding_window, calibration_tolerance, and
calibrate. Expose opaque tims_config type with create/free/setter FFI
functions and tims_open_with_config that spawns a panic-safe thread.

Refactor TimsDataset::open() to extract shared post-reader setup into
finish_open() helper, eliminating code duplication between open() and
the new open_with_config(). Update C header with converter and config
declarations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… demos

Add demonstration sections for:
- Frame-level access (single frame + batch MS1 fetch with timing)
- TOF->m/z and scan->IM converter usage
- Extended spectrum fields (charge, isolation, frame_index, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Correct stale filename comment (timsffi.h -> timsrust_cpp_bridge.h)
- Clarify tims_get_frame buffer validity: only invalidated by the next
  tims_get_frame call, not by spectrum operations (buffers are independent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a design and implementation plan plus code and tests to expose frame-level access, converters (TOF→m/z, scan→IM), an opaque SpectrumReader config and config-based open, extended spectrum/frame FFI types, C ABI bindings, Rust wiring, C++ examples, and CI/test harness.

Changes

Cohort / File(s) Summary
Design & Plans
docs/superpowers/plans/2026-03-11-sage-api-parity.md, docs/superpowers/specs/2026-03-11-sage-api-parity-design.md, docs/superpowers/plans/2026-03-12-testing-strategy.md, docs/superpowers/specs/2026-03-12-testing-strategy-design.md
New planning and design documents detailing Sage API parity, FFI contracts, feature chunks (extended spectrum, frames, converters, config), and an extensive testing/CI strategy.
C Header / ABI
include/timsrust_cpp_bridge.h
Adds tims_frame and extends tims_spectrum; declares frame APIs (tims_get_frame, tims_get_frames_by_level, tims_free_frame_array), conversion helpers (scalar/array), opaque tims_config and setters, and tims_open_with_config.
Rust FFI Entrypoints & Lib Surface
src/lib.rs
Exports config module and new C-ABI functions: config lifecycle/setters, tims_open_with_config, frame retrieval/batch/free, conversion helpers (scalar & array), extended spectrum/frame handling, and memory-management helpers. Feature-gated behavior preserved for no-timsrust builds.
Types
src/types.rs
Adds TimsFfiFrame and augments TimsFfiSpectrum with index, isolation_width, isolation_mz, charge, precursor_intensity, and frame_index.
Config Module
src/config.rs
Introduces TimsFfiConfig wrapper with constructor and setters (set_smoothing_window, set_centroiding_window, set_calibration_tolerance, set_calibrate); setters no-op when with_timsrust is disabled.
Dataset & Frame Logic
src/dataset.rs
Extends TimsDataset with per-frame buffers, frame_reader and converters, adds finish_open flow, open_with_config, get_frame, and populates extended spectrum fields; provides stub fallbacks for non-feature builds.
Examples
examples/cpp_client.cpp
Adds demo blocks for frame-level access (single and batch by MS level), converter usage, and printing extended spectrum fields.
Tests — Rust & C++
tests/*, tests/common/mod.rs, tests_cpp/*, .github/workflows/test.yml
Adds extensive Rust FFI tests (lifecycle, error handling, spectrum, frame, query, converters, real-data), C++ Catch2 ABI & smoke tests, test helpers, Makefile, Catch2 fetch script, and CI workflows for stub and integration runs.
Build / Manifest & Ignore
Cargo.toml, .gitignore
Adds lib crate-type for rlib artifact, adds a dev-dependency for tests, and .gitignore entries for build/test artifacts.

Sequence Diagram(s)

sequenceDiagram
  participant Cpp as C++ Client
  participant FFI as C ABI (timsffi)
  participant Rust as timsrust Library
  participant Reader as SpectrumReader/FrameReader

  Cpp->>FFI: tims_config_create()
  FFI->>Rust: allocate TimsFfiConfig
  Cpp->>FFI: tims_config_set_*()
  FFI->>Rust: update config fields
  Cpp->>FFI: tims_open_with_config(path, cfg)
  FFI->>Rust: open dataset with config
  Rust->>Reader: construct / finish_open (build converters, frame reader)
  Reader-->>Rust: ready
  FFI-->>Cpp: return dataset handle

  Cpp->>FFI: tims_get_frame(handle, index, out_frame)
  FFI->>Rust: read frame, fill buffers & TimsFfiFrame
  Rust->>Reader: fetch scans / TOF / intensities
  Reader-->>Rust: buffers
  Rust-->>FFI: success + frame pointers
  FFI-->>Cpp: out_frame populated

  Cpp->>FFI: tims_convert_tof_to_mz_array(handle, tof_indices, n, out_mz)
  FFI->>Rust: invoke converter
  Rust-->>FFI: out_mz filled

  Cpp->>FFI: tims_free_frame_array(handle, frames, count)
  FFI->>Rust: free per-frame allocations
Loading

Poem

🐇 I hop through frames and whisper m/z,
I braid configs and readers with finesse,
I nibble indexes, guard each pointer tight,
I test and hum until the bridge is right,
Hooray — small paws, big API delight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: Sage API parity — frames, converters, config' directly summarizes the main changes: exposing frame-level access, converters, and configuration APIs to achieve Sage API parity. It is clear, specific, and reflects the primary purpose of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch wrap-more

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/superpowers/plans/2026-03-11-sage-api-parity.md`:
- Around line 543-548: The planned prototype for tims_get_frames_by_level is
using the wrong parameter order; update the declaration to match the actual
exposed signature as (handle, ms_level, out_count, out_frames) instead of
(handle, ms_level, out_frames, out_count), and update any sample call sites and
documentation that construct calls to tims_get_frames_by_level to pass out_count
before out_frames so the FFI call and headers align with the implementation.

In `@docs/superpowers/specs/2026-03-11-sage-api-parity-design.md`:
- Around line 73-89: The spec's C signatures must match the shipped header:
change return type tims_status to timsffi_status for tims_get_frame and
tims_get_frames_by_level, and reorder tims_get_frames_by_level parameters so
uint32_t *out_count comes before tims_frame **out_frames; also ensure
tims_free_frame_array uses the same parameter order and types as the header
(timsffi_status where applicable and tims_dataset*, tims_frame*, uint32_t count)
so generated bindings/sample code match the real API.

In `@src/config.rs`:
- Around line 30-60: The setter set_calibration_tolerance currently accepts any
f64; update it to validate the input at the FFI boundary by checking
tolerance.is_finite() and tolerance >= 0.0 before mutating
self.inner.spectrum_processing_params.calibration_tolerance; if the check fails,
do not assign (or return/propagate an error if you prefer an explicit rejection
policy) so NaN/±Inf/negative values cannot be written into
SpectrumProcessingParams.calibration_tolerance.

In `@src/dataset.rs`:
- Around line 340-344: The map_err on self.frame_reader.get(...) currently sets
last_error and returns TimsFfiStatus::IndexOutOfBounds even though indices were
already bounds-checked earlier; change the error mapping to return
TimsFfiStatus::Internal to reflect a read/decode failure (keep setting
self.last_error with the formatted error message), i.e. update the closure in
the call to self.frame_reader.get(...) so it maps any returned Err(e) to set
self.last_error = Some(format!(...)) and then return TimsFfiStatus::Internal
instead of IndexOutOfBounds.
- Around line 329-382: The get_frame method leaves self.last_error set after a
prior failure; clear it on every successful call by setting self.last_error =
None before returning Ok(()). Update both branches in get_frame (the
#[cfg(feature = "with_timsrust")] success path before its Ok(()) and the
#[cfg(not(feature = "with_timsrust"))] path before its Ok(())) so successful
reads reset last_error.

In `@src/lib.rs`:
- Around line 461-576: The code sets ds.last_error on read failure but never
clears it on successful/empty returns; update the success paths to reset
ds.last_error = None: clear it in the invalid-ms_level branch before returning
TimsFfiStatus::Ok, clear it when n == 0 before returning Ok, and clear it on the
final successful path just before writing *out_count/*out_frames and returning
TimsFfiStatus::Ok (i.e., touch ds.last_error in the function that contains the
frames handling logic and the symbols arr_ptr, n, frames, and
out_count/out_frames).
- Around line 486-552: The malloc size computations can overflow (n, num_peaks,
scan_len); update the allocations (the initial arr_ptr allocation and the
per-frame allocations that produce tof_ptr, int_ptr, and scan_ptr) to use
checked_mul when computing count * mem::size_of::<...>() and early-return
TimsFfiStatus::Internal on None; if checked_mul yields Some(len) call
malloc(len) as before, otherwise skip allocation and perform the same
cleanup/return logic used for NULL mallocs to avoid under-allocation and
out-of-bounds writes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e8d95e5-0042-406e-a01d-2ef524d142b1

📥 Commits

Reviewing files that changed from the base of the PR and between 1d139f4 and ef2dcb2.

📒 Files selected for processing (8)
  • docs/superpowers/plans/2026-03-11-sage-api-parity.md
  • docs/superpowers/specs/2026-03-11-sage-api-parity-design.md
  • examples/cpp_client.cpp
  • include/timsrust_cpp_bridge.h
  • src/config.rs
  • src/dataset.rs
  • src/lib.rs
  • src/types.rs

- get_frame: return Internal (not IndexOutOfBounds) on read/decode failure
  after bounds check passes
- get_frame: clear last_error on successful returns
- tims_get_frames_by_level: clear ds.last_error on all success paths
- tims_get_frames_by_level: use checked_mul for all malloc size calculations
  to prevent integer overflow on pathological inputs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/lib.rs (1)

500-605: Extract the repeated partial-free path.

The three allocation branches duplicate the same “free previously written frames + free current temporaries” logic. A small helper for that cleanup would make this unsafe path much easier to audit and less error-prone on future changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` around lines 500 - 605, The allocation branches for tof_ptr,
int_ptr, and scan_ptr duplicate the same cleanup logic; extract that into a
single helper (e.g., free_frame_allocs or cleanup_on_alloc_failure) which
accepts arr_ptr, idx, and any current temporaries (tof_ptr, int_ptr, scan_ptr)
and performs the per-frame frees and frees arr_ptr before returning
TimsFfiStatus::Internal; replace the duplicated blocks in the checked_mul None
and p.is_null() paths inside the tof_ptr, int_ptr, and scan_ptr branches to call
this helper instead, preserving the same null checks and free(...) calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dataset.rs`:
- Around line 255-280: The get_spectrum path currently returns Ok(()) without
clearing any previously recorded error, so tims_get_last_error(handle, ...) will
still return stale messages; update the successful-return branches in the
get_spectrum implementation (the block that sets out.num_peaks, out.mz, etc.) to
clear the stored error (e.g., set self.last_error = None or clear the last_error
buffer) before returning Ok(()), and apply the same change to the analogous
successful branch around the code referenced at lines 293-299 (the other
get_frame / tims_get_frames_by_level success path) so successful calls reset
last_error.
- Around line 127-130: finish_open() currently extracts mz_converter and
im_converter from a plain MetadataReader, causing potential divergence from the
config-aware SpectrumReader created in open_with_config(); update the code so
the converters come from the same configured reader: either pass the same config
into MetadataReader creation or, better, obtain mz_converter and im_converter
directly from the configured SpectrumReader instance (the one constructed in
open_with_config()) so that the converters exposed via the FFI functions
tims_convert_tof_to_mz and tims_convert_scan_to_im match the actual spectrum
processing (symbols: open_with_config, finish_open, SpectrumReader,
MetadataReader, mz_converter, im_converter, tims_convert_tof_to_mz,
tims_convert_scan_to_im).

In `@src/lib.rs`:
- Around line 472-478: The match arm that currently discards the frame-batch
read error should capture it (match Err(e)) and set ds.last_error to include
e.to_string() (e.g., format!("failed to read one or more frames: {}", e)) before
zeroing out out_count and nulling out_frames and returning
TimsFfiStatus::Internal; update the match in the code that builds frames from
frames_result (the variable frames_result => frames) to preserve and log the
underlying error so tims_get_last_error() can return the actual FrameReader
failure.

---

Nitpick comments:
In `@src/lib.rs`:
- Around line 500-605: The allocation branches for tof_ptr, int_ptr, and
scan_ptr duplicate the same cleanup logic; extract that into a single helper
(e.g., free_frame_allocs or cleanup_on_alloc_failure) which accepts arr_ptr,
idx, and any current temporaries (tof_ptr, int_ptr, scan_ptr) and performs the
per-frame frees and frees arr_ptr before returning TimsFfiStatus::Internal;
replace the duplicated blocks in the checked_mul None and p.is_null() paths
inside the tof_ptr, int_ptr, and scan_ptr branches to call this helper instead,
preserving the same null checks and free(...) calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab681b8b-d86a-451a-ab42-0e701606f99f

📥 Commits

Reviewing files that changed from the base of the PR and between ef2dcb2 and a4a0af4.

📒 Files selected for processing (2)
  • src/dataset.rs
  • src/lib.rs

timosachsenberg and others added 11 commits March 12, 2026 08:05
Contract-centric approach: Rust integration tests for FFI safety
and memory ownership (~80%), C++ Catch2 suite for ABI layout and
config effect validation (~20%). Two-tier CI with stub and real-data jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move handle null-check in tims_get_spectra_by_rt before the feature
  gate so it applies in both stub and real builds
- Move finalize() calls in tims_file_info outside the feature gate so
  stub mode returns zeros instead of infinity sentinels
- Update spec with additional test cases from review feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 tasks across 5 chunks: test infrastructure, lifecycle, error handling,
spectrum/frame/query/converter stub tests, real-data integration tests,
C++ Catch2 ABI and smoke tests, and CI workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pers)

- Add "lib" crate-type so integration tests can link against the crate
- Add libc to dev-dependencies for FFI test helpers
- Make config, dataset, and types modules public for test access
- Create tests/common/mod.rs with FFI declarations, struct mirrors,
  status constants, and helper functions (open_stub, open_real, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n_with_config

16 tests covering:
- tims_open / tims_close null-safety and error paths
- tims_open_with_config null-safety and error paths
- tims_open_with_config happy path with temp directory
- Config builder create/free/setters null-safety and full lifecycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover tims_get_last_error (global vs per-handle, null/zero buffer,
truncation, content validation), tims_get_spectrum / tims_get_spectra_by_rt
(null guards, index-OOB on stub, batch edge cases, free safety), and
tims_get_frame / tims_get_frames_by_level (same patterns for frame API).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ffi_query.rs (10 tests) covering tims_num_spectra, tims_num_frames,
tims_get_swath_windows, tims_free_swath_windows, and tims_file_info with
null-handle guards and stub return-value assertions.

Add ffi_converters.rs (14 tests) covering scalar and array index
converters (tof_to_mz, scan_to_im) with null-handle NaN checks,
null-input/output guards, zero-count edge cases, stub identity gates,
and array-vs-scalar consistency checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 21 integration tests gated behind TIMSRUST_TEST_DATA_DDA and
TIMSRUST_TEST_DATA_DIA environment variables. Tests cover spectrum
retrieval, frame access, scan offset monotonicity, RT-based queries
with IM filtering, index converters (scalar and array), file info,
swath windows, and cross-cutting scenarios (reopen, simultaneous
handles).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests_cpp/ directory with:
- test_abi.cpp: enum value checks, struct sizeof/offset ABI verification
- test_smoke.cpp: error path, config builder, DDA/DIA smoke tests, and
  config effect tests (centroiding, smoothing, calibration)
- Makefile: build and run targets linking against libtimsrust_cpp_bridge
- fetch_catch2.sh: downloads Catch2 v3.5.2 amalgamated files
- .gitignore: excludes Catch2 vendored files and build artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a test.yml workflow that runs on push to master, PRs, and manual
dispatch. The stub-tests job builds without timsrust and runs both Rust
and C++ ABI tests. The integration-tests job (master/dispatch only)
builds with timsrust and runs against real datasets (download TBD).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add #[cfg(not(feature = "with_timsrust"))] to all tests that call
  open_stub(), preventing panics when run with --features with_timsrust
  (the real reader cannot open a fake temp directory)
- Add static Mutex in ffi_error_handling.rs to serialize tests that
  read/write the global LAST_ERROR state, preventing flaky parallel races
- Document i32 vs TimsFfiStatus choice in common/mod.rs (C ABI uses int)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/lib.rs (2)

292-292: ⚠️ Potential issue | 🟡 Minor

Missing overflow check on spectra array allocation.

The allocation at line 292 uses direct multiplication take_n * mem::size_of::<TimsFfiSpectrum>() without overflow protection, similar to the swath windows issue.

🛡️ Add overflow check
-        let arr_ptr = (unsafe { malloc(take_n * mem::size_of::<TimsFfiSpectrum>()) }) as *mut TimsFfiSpectrum;
+        let arr_size = match take_n.checked_mul(mem::size_of::<TimsFfiSpectrum>()) {
+            Some(s) => s,
+            None => return TimsFfiStatus::Internal,
+        };
+        let arr_ptr = (unsafe { malloc(arr_size) }) as *mut TimsFfiSpectrum;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` at line 292, The allocation for arr_ptr uses unchecked
multiplication of take_n * mem::size_of::<TimsFfiSpectrum>() which can overflow;
update the allocation to first compute the byte size using checked_mul (e.g.,
take_n.checked_mul(mem::size_of::<TimsFfiSpectrum>())) and handle the None case
(return an error/null or abort gracefully) before calling malloc, and then call
malloc with the verified size; ensure references to take_n,
mem::size_of::<TimsFfiSpectrum>(), and the arr_ptr/malloc allocation site are
updated accordingly.

151-151: ⚠️ Potential issue | 🟡 Minor

Missing overflow check on swath windows allocation.

Unlike the frame allocation code which uses checked_mul, the swath windows allocation at line 151 uses direct multiplication n * mem::size_of::<TimsFfiSwathWindow>() which could overflow for pathological n values.

🛡️ Add overflow check for consistency
-            let size = n * mem::size_of::<crate::types::TimsFfiSwathWindow>();
+            let size = match n.checked_mul(mem::size_of::<crate::types::TimsFfiSwathWindow>()) {
+                Some(s) => s,
+                None => return TimsFfiStatus::Internal,
+            };
             let ptr = unsafe { malloc(size) } as *mut crate::types::TimsFfiSwathWindow;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` at line 151, Replace the direct multiplication for swath window
allocation with a checked multiplication using
n.checked_mul(mem::size_of::<crate::types::TimsFfiSwathWindow>()) and handle the
None case the same way the frame allocation does (e.g., return an error or abort
allocation) so overflow cannot occur; update the code that assigns to size and
the subsequent allocation path to use the checked result (referencing the
variables size and n and the type
mem::size_of::<crate::types::TimsFfiSwathWindow>() to locate the change).
🧹 Nitpick comments (4)
tests/ffi_lifecycle.rs (1)

112-126: Consider cleanup of test temp directories.

The test creates timsrust_ffi_test_config_happy in the system temp directory but doesn't clean it up afterward. While not critical, accumulated test artifacts can clutter the filesystem over many runs.

🧹 Optional cleanup approach
 #[cfg(not(feature = "with_timsrust"))]
 #[test]
 fn open_with_config_happy_path() {
     let dir = std::env::temp_dir().join("timsrust_ffi_test_config_happy");
     std::fs::create_dir_all(&dir).expect("create temp dir");
     let c_path = CString::new(dir.to_str().unwrap()).unwrap();

     let cfg = unsafe { tims_config_create() };
     assert!(!cfg.is_null());

     let mut handle: *mut libc::c_void = ptr::null_mut();
     let status = unsafe { tims_open_with_config(c_path.as_ptr(), cfg, &mut handle) };
     assert_status(status, TIMSFFI_OK);
     assert!(!handle.is_null());

     unsafe { tims_close(handle) };
     unsafe { tims_config_free(cfg) };
+    let _ = std::fs::remove_dir_all(&dir);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ffi_lifecycle.rs` around lines 112 - 126, The test creates a temporary
directory `dir` for "timsrust_ffi_test_config_happy" but never removes it; after
the existing teardown calls (unsafe { tims_close(handle) } and unsafe {
tims_config_free(cfg) }) add cleanup to remove the temp directory (e.g., call
std::fs::remove_dir_all(&dir) or switch to a scoped TempDir) so the filesystem
artifact is deleted; locate this in tests/ffi_lifecycle.rs around the block that
uses tims_config_create, tims_open_with_config, tims_close, and
tims_config_free.
tests/ffi_real_data.rs (1)

68-71: Assertion may be overly strict for edge-case datasets.

The assertion num_frames <= num_spectra assumes every frame contributes at least one spectrum. While this is typical for DDA datasets, edge cases (empty frames, filtered data) could violate this invariant. Consider either relaxing the assertion or documenting that this test requires a well-formed DDA dataset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ffi_real_data.rs` around lines 68 - 71, The assertion that num_frames
<= num_spectra is too strict for edge-case datasets; update the test in
tests/ffi_real_data.rs to either relax or guard the check: replace the hard
assert on num_frames and num_spectra with a conditional check that only asserts
this invariant for well-formed DDA inputs (e.g., check a dataset-type flag or
skip when frames may be empty), or change it to a less strict validation
(log/warn or use assert!(num_frames <= num_spectra || num_frames == 0) /
similar) so tests don’t fail for filtered/empty-frame datasets; target the
assertion using the num_frames and num_spectra symbols and add a short comment
explaining the dataset assumption if you keep the check.
tests_cpp/test_smoke.cpp (1)

125-150: Test name suggests verification that doesn't occur.

The test "Calibration changes m/z values" only verifies that both calibrate=off and calibrate=on produce valid spectra with num_peaks > 0, but doesn't actually assert that any m/z values differ. If calibration should always produce different values, add an assertion similar to the smoothing test. If calibration may not change values (e.g., already calibrated data), consider renaming the test to clarify its intent.

💡 Suggested assertion to verify calibration effect
     tims_config_free(cfg_on);
     tims_close(h2);
     // At minimum both should return valid data
     REQUIRE(s1.num_peaks > 0);
     REQUIRE(s2.num_peaks > 0);
+    // Optionally verify calibration has some effect (if expected)
+    // bool any_mz_differ = false;
+    // auto n = std::min(mz1.size(), mz2.size());
+    // for (size_t i = 0; i < n; ++i) {
+    //     if (std::abs(mz1[i] - mz2[i]) > 1e-6f) { any_mz_differ = true; break; }
+    // }
+    // REQUIRE(any_mz_differ);  // Uncomment if calibration should always change values
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests_cpp/test_smoke.cpp` around lines 125 - 150, The test "Calibration
changes m/z values" (TEST_CASE) currently only checks s1.num_peaks and
s2.num_peaks but doesn't compare the mz vectors; update the test to actually
verify calibration effect by comparing the m/z arrays produced with calibrate
off vs on (mz1 and mz2 from tims_get_spectrum) — for example, assert that at
least one corresponding m/z value differs (or use a tolerance and check any
fabs(mz1[i] - mz2[i]) > eps) for matching indices and that num_peaks match
before comparison; alternatively, if calibration may legitimately produce
identical m/z, rename the TEST_CASE to reflect it only verifies spectra are
non-empty and do not assert a change. Ensure you reference the existing symbols:
TEST_CASE name, tims_config_set_calibrate, tims_get_spectrum, s1/s2, and mz1/mz2
when making the change.
Cargo.toml (1)

29-30: Remove the duplicate libc entry from [dev-dependencies].

libc is already declared in [dependencies] (line 16), so repeating it under [dev-dependencies] (line 30) with the same version creates maintenance burden. Integration tests and examples can use crates from [dependencies] directly without duplication. Keep a single declaration unless dev code requires a different version or feature set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` around lines 29 - 30, Remove the duplicate libc declaration from
the [dev-dependencies] section in Cargo.toml: keep the existing libc = "0.2"
entry under [dependencies] and delete the libc = "0.2" line from
[dev-dependencies] so the crate is declared only once (unless you intentionally
need a different version/features for dev code).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test.yml:
- Around line 43-69: The workflow currently seeds test steps with
TIMSRUST_TEST_DATA_DDA/TIMSRUST_TEST_DATA_DIA pointing at testdata/dda.d and
testdata/dia.d but the "Download test datasets" step only creates the directory
and prints a warning; replace the TODO by either (A) downloading and extracting
the real test artifacts (e.g. use gh release download or curl to pull the
correct release asset(s) and tar -xzf into testdata) so the files testdata/dda.d
and testdata/dia.d actually exist, or (B) make the downstream steps conditional:
check for the existence of testdata/dda.d and testdata/dia.d and skip or mark
the job as neutral/fail-fast with a clear message if they are missing; update
the step named "Download test datasets" and ensure the Rust step that reads
TIMSRUST_TEST_DATA_DDA/TIMSRUST_TEST_DATA_DIA and the C++ invocation passing
DDA/DIA reference those files only after the download/existence check.

In `@docs/superpowers/specs/2026-03-12-testing-strategy-design.md`:
- Line 21: The fenced code block that contains the file-tree is missing a
language hint and is being flagged by markdownlint; update the opening fence
(```) for the file-tree block to include a language such as text or plaintext
(change ``` to ```text or ```plaintext) so the linter accepts it—locate the
file-tree fenced block by its triple-backtick markers and modify the opener
accordingly.

In `@src/lib.rs`:
- Line 19: Remove the unused import TIMSFFI_MAX_ERROR_LEN from lib.rs: locate
the use statement "use crate::types::TIMSFFI_MAX_ERROR_LEN;" and delete it (or
remove TIMSFFI_MAX_ERROR_LEN from the use list if that line imports multiple
items) so the compiler warning about an unused import is eliminated.

In `@tests_cpp/test_abi.cpp`:
- Around line 16-42: The tests currently use weak checks (sizeof(...) > 0 and
partial offsetof comparisons) that won't catch layout drift; update the ABI
tests for tims_spectrum, tims_frame, tims_swath_window (and other exposed
structs like tims_level_stats, tims_file_info_t) to assert exact sizes and exact
offsets for every exposed field using Rust-generated C constants (e.g.
TIMS_SPECTRUM_SIZE and per-field offset constants such as
TIMS_SPECTRUM_OFFSET_RT_SECONDS) instead of relational checks, and replace the
partial offsetof assertions with REQUIRE(offsetof(...) == <Rust constant>) for
each field so the test fails on any reorder/padding regression. Ensure the new
constants are referenced in the test and used for all fields in tims_spectrum,
tims_frame, tims_swath_window (and add similar full-field checks for
tims_level_stats and tims_file_info_t).

---

Outside diff comments:
In `@src/lib.rs`:
- Line 292: The allocation for arr_ptr uses unchecked multiplication of take_n *
mem::size_of::<TimsFfiSpectrum>() which can overflow; update the allocation to
first compute the byte size using checked_mul (e.g.,
take_n.checked_mul(mem::size_of::<TimsFfiSpectrum>())) and handle the None case
(return an error/null or abort gracefully) before calling malloc, and then call
malloc with the verified size; ensure references to take_n,
mem::size_of::<TimsFfiSpectrum>(), and the arr_ptr/malloc allocation site are
updated accordingly.
- Line 151: Replace the direct multiplication for swath window allocation with a
checked multiplication using
n.checked_mul(mem::size_of::<crate::types::TimsFfiSwathWindow>()) and handle the
None case the same way the frame allocation does (e.g., return an error or abort
allocation) so overflow cannot occur; update the code that assigns to size and
the subsequent allocation path to use the checked result (referencing the
variables size and n and the type
mem::size_of::<crate::types::TimsFfiSwathWindow>() to locate the change).

---

Nitpick comments:
In `@Cargo.toml`:
- Around line 29-30: Remove the duplicate libc declaration from the
[dev-dependencies] section in Cargo.toml: keep the existing libc = "0.2" entry
under [dependencies] and delete the libc = "0.2" line from [dev-dependencies] so
the crate is declared only once (unless you intentionally need a different
version/features for dev code).

In `@tests_cpp/test_smoke.cpp`:
- Around line 125-150: The test "Calibration changes m/z values" (TEST_CASE)
currently only checks s1.num_peaks and s2.num_peaks but doesn't compare the mz
vectors; update the test to actually verify calibration effect by comparing the
m/z arrays produced with calibrate off vs on (mz1 and mz2 from
tims_get_spectrum) — for example, assert that at least one corresponding m/z
value differs (or use a tolerance and check any fabs(mz1[i] - mz2[i]) > eps) for
matching indices and that num_peaks match before comparison; alternatively, if
calibration may legitimately produce identical m/z, rename the TEST_CASE to
reflect it only verifies spectra are non-empty and do not assert a change.
Ensure you reference the existing symbols: TEST_CASE name,
tims_config_set_calibrate, tims_get_spectrum, s1/s2, and mz1/mz2 when making the
change.

In `@tests/ffi_lifecycle.rs`:
- Around line 112-126: The test creates a temporary directory `dir` for
"timsrust_ffi_test_config_happy" but never removes it; after the existing
teardown calls (unsafe { tims_close(handle) } and unsafe { tims_config_free(cfg)
}) add cleanup to remove the temp directory (e.g., call
std::fs::remove_dir_all(&dir) or switch to a scoped TempDir) so the filesystem
artifact is deleted; locate this in tests/ffi_lifecycle.rs around the block that
uses tims_config_create, tims_open_with_config, tims_close, and
tims_config_free.

In `@tests/ffi_real_data.rs`:
- Around line 68-71: The assertion that num_frames <= num_spectra is too strict
for edge-case datasets; update the test in tests/ffi_real_data.rs to either
relax or guard the check: replace the hard assert on num_frames and num_spectra
with a conditional check that only asserts this invariant for well-formed DDA
inputs (e.g., check a dataset-type flag or skip when frames may be empty), or
change it to a less strict validation (log/warn or use assert!(num_frames <=
num_spectra || num_frames == 0) / similar) so tests don’t fail for
filtered/empty-frame datasets; target the assertion using the num_frames and
num_spectra symbols and add a short comment explaining the dataset assumption if
you keep the check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5fd22b2c-48f4-48b5-910c-0d3eb983fa2e

📥 Commits

Reviewing files that changed from the base of the PR and between a4a0af4 and 36928c2.

📒 Files selected for processing (18)
  • .github/workflows/test.yml
  • .gitignore
  • Cargo.toml
  • docs/superpowers/plans/2026-03-12-testing-strategy.md
  • docs/superpowers/specs/2026-03-12-testing-strategy-design.md
  • src/lib.rs
  • tests/common/mod.rs
  • tests/ffi_converters.rs
  • tests/ffi_error_handling.rs
  • tests/ffi_frame.rs
  • tests/ffi_lifecycle.rs
  • tests/ffi_query.rs
  • tests/ffi_real_data.rs
  • tests/ffi_spectrum.rs
  • tests_cpp/Makefile
  • tests_cpp/fetch_catch2.sh
  • tests_cpp/test_abi.cpp
  • tests_cpp/test_smoke.cpp
✅ Files skipped from review due to trivial changes (1)
  • .gitignore

timosachsenberg and others added 3 commits March 12, 2026 10:56
Download DDA/DIA HeLa datasets (PXD027359) from the test-data-v1
GitHub release and set TIMSRUST_TEST_DATA_DDA/DIA env vars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove the master/workflow_dispatch gate so integration tests run on
  every push and PR alongside stub tests
- Cache the extracted test datasets using actions/cache keyed on the
  release tag — the 674 MB download only happens once
- Delete zip files after extraction to keep the cache lean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document stub tests, integration tests with real data, dataset download
instructions, env vars, and CI caching strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.github/workflows/test.yml (1)

42-58: Consider adding a verification step after cache restore.

When the cache is restored (cache hit), the workflow proceeds directly to build without verifying the expected directories exist. If the cache is corrupted or incomplete, tests will fail with unclear errors.

💡 Optional: Add a post-cache verification step
       - name: Cache test datasets
         id: cache-testdata
         uses: actions/cache@v4
         with:
           path: testdata
           key: test-data-v1

+      - name: Verify test datasets
+        run: |
+          test -d testdata/20210510_TIMS03_EVO03_PaSk_MA_HeLa_50ng_5_6min_DDA_S1-B1_1_25185.d || \
+            { echo "DDA dataset missing"; exit 1; }
+          test -d testdata/20210510_TIMS03_EVO03_PaSk_SA_HeLa_50ng_5_6min_DIA_high_speed_S1-B2_1_25186.d || \
+            { echo "DIA dataset missing"; exit 1; }
+
       - name: Download test datasets
         if: steps.cache-testdata.outputs.cache-hit != 'true'

Alternatively, move the conditional check into a combined verify-or-download step:

      - name: Ensure test datasets
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          DDA="testdata/20210510_TIMS03_EVO03_PaSk_MA_HeLa_50ng_5_6min_DDA_S1-B1_1_25185.d"
          DIA="testdata/20210510_TIMS03_EVO03_PaSk_SA_HeLa_50ng_5_6min_DIA_high_speed_S1-B2_1_25186.d"
          if [[ -d "$DDA" && -d "$DIA" ]]; then
            echo "Test datasets found in cache"
            exit 0
          fi
          mkdir -p testdata
          gh release download test-data-v1 -D testdata
          unzip testdata/DDA_HeLa_50ng_5_6min.d.zip -d testdata
          unzip testdata/DIA_HeLa_50ng_5_6min.d.zip -d testdata
          rm testdata/*.d.zip
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test.yml around lines 42 - 58, The cache step ("Cache test
datasets" with id cache-testdata) should be followed by a verification that
required dataset directories exist; modify or replace the "Download test
datasets" step so it always runs a short verify-or-download script (step name
e.g. "Ensure test datasets") that checks for the two expected directories (the
DDA and DIA folder paths) and only downloads/unzips the release (using GH_TOKEN
and gh release download) when those directories are missing or incomplete;
ensure the step exits success if verification passes so downstream jobs use the
restored cache safely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/test.yml:
- Around line 42-58: The cache step ("Cache test datasets" with id
cache-testdata) should be followed by a verification that required dataset
directories exist; modify or replace the "Download test datasets" step so it
always runs a short verify-or-download script (step name e.g. "Ensure test
datasets") that checks for the two expected directories (the DDA and DIA folder
paths) and only downloads/unzips the release (using GH_TOKEN and gh release
download) when those directories are missing or incomplete; ensure the step
exits success if verification passes so downstream jobs use the restored cache
safely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4541dc30-66d9-46a3-91d1-522a062fb74c

📥 Commits

Reviewing files that changed from the base of the PR and between e37726c and 5cfb010.

📒 Files selected for processing (1)
  • .github/workflows/test.yml

timosachsenberg and others added 4 commits March 12, 2026 11:13
- Make Tof2MzConverter::convert and Scan2ImConverter::convert pub in
  stub mode — they're called from lib.rs (different module)
- Add extern crate timsrust_cpp_bridge to tests/common/mod.rs so the
  linker includes the rlib's #[no_mangle] extern "C" symbols, fixing
  undefined symbol errors in integration test binaries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…C++ tests

- Make stub SpectrumReader, FrameReader, Tof2MzConverter, Scan2ImConverter
  pub(crate) so they're visible through pub(crate) fields on TimsDataset
- Handle tims_open_with_config failures gracefully in calibration and
  combined config C++ tests — timsrust 0.4.2 panics on calibrate=on
  with certain datasets (caught by our panic safety wrapper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On ARM Linux c_char is u8 (unsigned), not i8. Using 0i8 literals and
passing to CStr::from_ptr caused type mismatches on aarch64.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@timosachsenberg timosachsenberg requested a review from singjc March 12, 2026 10:46
Comprehensive guide for integrating timsrust_cpp_bridge into OpenMS,
covering CMake setup, FileTypes registration, BrukerTdfFile handler,
DIA-PASEF metadata, config support, and converter utilities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@timosachsenberg timosachsenberg merged commit 4f31019 into master Mar 12, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant