Refactor artifacts detection + add sturation detection by samuelgarcia · Pull Request #4297 · SpikeInterface/spikeinterface

samuelgarcia · 2026-01-06T16:00:18Z

@JoeZiminski @oliche

This refactor the artifact detection and blanking

with a main multi method function : detect_artifact_periods()
return periods with the new base_period_dtype instead list of list
SilencedPeriodsRecording also use this new dtype
DetectArtifactAndSilentPeriodsRecording (replace SilencedArtifactsRecording) do detect and silence at the same
time to be used in preprocess dict (@chrishalcrow this is for you)
refactor the method of Pierre based on envleope
incorporate the method of @oliche and @JoeZiminski to detect saturation

Could also be done in this PR:

add a margin_ms option in SilentPeriodsRecording to ewxtend the zeroing with a mragin : do we need this ?

To be done in 2 futures PR:

handle margin and taper on border of SilentPeriod (more tricky that it appear when period is at border...)
rewrite RemoveArtifactsRecording based on SilencedPeriodsRecording and the new dtype

yger · 2026-01-12T14:38:15Z

@samuelgarcia do not forget, in this PR, to change DetectThresholdCrossing(recording, ...) to DetectThresholdCrossing(envelope, ...) in the detec_period_artefacts_by_envelope

samuelgarcia · 2026-01-21T10:03:45Z

Hi @oliche @JoeZiminski
I copy paste and clean a bit your PR #4301 into this one.
Have a look and tell if this is OK.

I think we do not need to scaled traces at each chunk we could only could the unscaled threhold in the init and then we could avoid the costly convertion to float32 or float64.

Also I put the unit in uV to be more SI friendly.

… compatibility

samuelgarcia · 2026-01-21T13:22:02Z

@alejoe91 @chrishalcrow ready to review on my side

src/spikeinterface/preprocessing/detect_artifacts.py

samuelgarcia · 2026-01-22T10:49:51Z

src/spikeinterface/preprocessing/silence_periods.py

+            self, recording, periods=artifacts, mode=mode, noise_levels=noise_levels, seed=seed, **noise_levels_kwargs
+        )
+        # note self._kwargs["periods"] is done by SilencedPeriodsRecording and so the computaion is done once
+


add detect_artifact_method detect_artifact_kwargs

src/spikeinterface/preprocessing/detect_artifacts.py

chrishalcrow · 2026-01-22T16:44:12Z

Hello, trying this out on real data with saturation. I get this output

artifact_periods=array([(0, 637721, 638769), (0, 638955, 639206), (0, 923219, 923602),
       (0, 923605, 923606), (0, 923607, 923835)],

Are these periods inclusive? And if they are, should the last pair be merged?

src/spikeinterface/preprocessing/detect_artifacts.py

JoeZiminski · 2026-01-22T22:51:27Z

Hey @samuelgarcia thanks for this! will check this out tomorrow. I accidentally did not push two tidy-up commits to #4301, do you mind if I push to your branch to add some changes? I think it is mostly superficial stuff like docstrings etc.

samuelgarcia · 2026-01-23T16:59:41Z

Hello, trying this out on real data with saturation. I get this output
artifact_periods=array([(0, 637721, 638769), (0, 638955, 639206), (0, 923219, 923602),
       (0, 923605, 923606), (0, 923607, 923835)],
Are these periods inclusive? And if they are, should the last pair be merged?

Yes, the last pair should be merged.
Let us check. the _collapse_events() is done for this purpose.
@JoeZiminski can you look at this ?

samuelgarcia · 2026-01-23T17:00:10Z

Hey @samuelgarcia thanks for this! will check this out tomorrow. I accidentally did not push two tidy-up commits to #4301, do you mind if I push to your branch to add some changes? I think it is mostly superficial stuff like docstrings etc.

yes of course. push anything you want

for more information, see https://pre-commit.ci

JoeZiminski · 2026-01-30T13:47:39Z

Hey @samuelgarcia thanks for this, I added the content of those commits I missed. I ran into some issues with the (now renamed) uV_per_second_threshold . This is a threshold for the first derivative in units uV/s so it can be a very small number (e.g. IBL default in volts, converted to uV is 1e-2; for example, with the test data the threshold is around 0.0004 which goes negative after the round trip to int and back. Please correct me if wrong, the dynamic range of NP probes pm 5mV at int16 the bit resolution is 0.153 μV. I think this might be why @chrishalcrow is seeing those missed samples in his data.

If possible I'd prefer to cast and scale the data to avoid such issues, then we always know we are solid whatever the user inputs. I get 0.047298s to cast, offset and scale 10 million samples so I don't think this will be a bottleneck. WDYT?

samuelgarcia · 2026-02-02T12:36:31Z

src/spikeinterface/preprocessing/tests/test_detect_artifacts.py

    max_ = np.max(np.r_[data_seg_1.flatten(), data_seg_2.flatten()])
    gain =  max_ / 2**15
-    offset = 0
+    offset = 50


@JoeZiminski : why did you put 50 as offset here ?

Apologies forgot to say, just to ensure it is tested (if it was zero and was accidentally not applied in the function, the test would not flag it)

(that was just a quick test, see update below)

samuelgarcia · 2026-02-02T12:39:05Z

Please correct me if wrong, the dynamic range of NP probes pm 5mV at int16 the bit resolution is 0.153 μV. I think this might be why @chrishalcrow is seeing those missed samples in his data.

NP1 is 12bits, so I think that the lsb is 5000 / 2**11 = 2.4uV
Am I wrong ?

JoeZiminski · 2026-02-03T00:15:33Z

Hey @samuelgarcia I didn't realize the bit depth of the ADC is not 16-bit! I see here ADC resolution is given as 10 bits. In any case, the resolution from the gains/offsets used to scale the data won't be able to represent the values used for the uV_per_second_threshold?

oliche · 2026-02-03T15:42:25Z

@JoeZiminski the sample to volts conversion is a probe dependent variable. To get a feel of the complexity of the matter, here are the Neuropixel possible configs for all probes released to date: https://docs.google.com/spreadsheets/d/1UZycmxwwaJGCJ4AknyXrHkDXWOUO24TtAnBbe1mQ9u4/edit?gid=0#gid=0

As you can see there are potentially multiple conversion factors depending on the probe vintage.

It seems the SpikeInterface canon is to leave this up to the user. Our design in https://github.com/int-brain-lab/ibl-neuropixel has been to read it from the metadata, making sure we are compatible with the oldest probe versions we observed at IBL.

JoeZiminski · 2026-02-03T15:53:23Z

Thanks @oliche, in this case @samuelgarcia I think we should definitely scale the data rather than the parameters, as we cannot be sure the gains / offsets used to scale the data will not destroy information when converting float-to-int e.g. uV_per_second_threshold. We could play around with applying the division by fs at another point but it seems a little brittle and not worth the very minor speed-up.

src/spikeinterface/preprocessing/detect_artifacts.py

JoeZiminski · 2026-02-06T15:58:48Z

Hey @samuelgarcia @alejoe91 thanks for the feedback, I'm running into the below issue calculating the derivative on int16 data. I don't think this is an artefact of the test data but would be good to check, @oliche too.

See below for the test data in float and int. When taking the derivative, information is lost due to overflow.

The above is the data converted to int, each timeseries is the data from one channel. In many instances, the data after the saturation is negative in int space, resulting in overflow

e.g. by definition the saturation point is at the max of the int16 range, 32767. Often, when the saturation goes back to baseline, the datapoint is negative in int space e.g. -10000. When you take the diff e.g. -10000 - 32767 you get overflow and the information is lost. It seems like this would happen in real data as well?

for more information, see https://pre-commit.ci

oliche · 2026-02-07T13:05:04Z

@JoeZiminski I think the test data is not realistic. Usually the maximum int recorded on neuropixel is 512 on NP1 and 8192 on NP2 so you wouldn't have the issues above as the max diff would be 8192 * 2 (2 ** 14).

However in general I think it is a poor practice to compute in int16 for this kind of issues and patchy support of the numpy suite. Users may unwittingly create weird situations, exactly like this one: #4175.

Anyways for this specific saturation detection problem, you just have to fix your test data by staying in the maxint ranges above.

JoeZiminski · 2026-02-07T13:39:47Z

Cheers @oliche! That's good to know for NP probes. For other providers like Nexus, run their ADC at 16 bit-depth (e.g. here), in this case it might be a problem? That being said I don't know how often these probes saturate and whether people are likely to run this function for non-NP1 probes. But yes I think the easiest solution to avoid these worries is to perform computation in float.

oliche · 2026-02-08T09:03:12Z

If you think you may exceed half the the range of int16 (ie higher than 2 ** 14 or lower than -2 ** 14), then you can check for the max(abs()) values. If they are above or below, you may throw a warning and only perform the absolute saturation detection and skip the one based on the derivative ?

That would prevent too many false positives.

JoeZiminski · 2026-02-09T13:27:25Z

Thanks @oliche, yes in this case I would definitely suggest @samuelgarcia we perform operations in float here to avoid having to worry about the unpredictable bit-depth of the data.

alejoe91 · 2026-03-16T14:20:01Z

Hi guys, I've debugged this a bit and I think it's much simpler to specify the diff threshold in diff_threshold_uV. This would be the threshold in the uV per sample, for example, if diff_threshold_uV=200 it means that if the next sample has a diff larger than 200uV, the current sample is flagged as an artifact. I think this makes it much easier to grasp, with respect to using uv_per_seconds ot uv_per_ms.

Thoughts?

JoeZiminski · 2026-03-17T15:22:55Z

Hey @alejoe91 thanks I completely agree that is much more intuitive and easy to use. Thanks for finalising this! My only comments are:

The traces should be cast to float before the derivative threshold is applied (to avoid overflow issues as above). As far as I can tell the traces are currently passed as int16 by default to compute
I am not 100% convinced that the saturation detection and envelope detection should be merged into a single function. What is the use case of the envelope detection / when should it be applied? Is it similar to the saturation detection in that it should be computed before any preprocessing has been performed and applied after? If the use case are exactly the same and should be applied in exactly the same way, and these are just different methods to achieve the same thing, then I can see the benefit.

But otherwise, I cannot see the benefit, and it makes things confusing as a user. I am a 'user' interested in saturation detection and now I have spent 20 minutes researching envelope detection to see if I should be using that instead (which I have just done, and I'm still not sure). Passing a dictionary of kwargs that changes based on the value of another argument is also quite confusing, and you loose IDE autocomplete and it makes documentation difficult. I have not seen this pattern before in common frameworks, are there any other examples of it? If all three functions are made publically available and documented its not so bad, but in that case I cannot see the marginal benefit of the detect_artifact_periods function.

alejoe91 · 2026-03-17T15:39:38Z

Hey @alejoe91 thanks I completely agree that is much more intuitive and easy to use. Thanks for finalising this! My only comments are:

The traces should be cast to float before the derivative threshold is applied (to avoid overflow issues as above). As far as I can tell the traces are currently passed as int16 by default to compute

I am not 100% convinced that the saturation detection and envelope detection should be merged into a single function. What is the use case of the envelope detection / when should it be applied? Is it similar to the saturation detection in that it should be computed before any preprocessing has been performed and applied after? If the use case are exactly the same and should be applied in exactly the same way, and these are just different methods to achieve the same thing, then I can see the benefit.
But otherwise, I cannot see the benefit, and it makes things confusing as a user. I am a 'user' interested in saturation detection and now I have spent 20 minutes researching envelope detection to see if I should be using that instead (which I have just done, and I'm still not sure). Passing a dictionary of kwargs that changes based on the value of another argument is also quite confusing, and you loose IDE autocomplete and it makes documentation difficult. I have not seen this pattern before in common frameworks, are there any other examples of it? If all three functions are made publically available and documented its not so bad, but in that case I cannot see the marginal benefit of the detect_artifact_periods function.

Hey @alejoe91 thanks I completely agree that is much more intuitive and easy to use. Thanks for finalising this! My only comments are:

The traces should be cast to float before the derivative threshold is applied (to avoid overflow issues as above). As far as I can tell the traces are currently passed as int16 by default to compute

Done! Just casting to float at the top of the compute!

I am not 100% convinced that the saturation detection and envelope detection should be merged into a single function. What is the use case of the envelope detection / when should it be applied? Is it similar to the saturation detection in that it should be computed before any preprocessing has been performed and applied after? If the use case are exactly the same and should be applied in exactly the same way, and these are just different methods to achieve the same thing, then I can see the benefit.
But otherwise, I cannot see the benefit, and it makes things confusing as a user. I am a 'user' interested in saturation detection and now I have spent 20 minutes researching envelope detection to see if I should be using that instead (which I have just done, and I'm still not sure). Passing a dictionary of kwargs that changes based on the value of another argument is also quite confusing, and you loose IDE autocomplete and it makes documentation difficult. I have not seen this pattern before in common frameworks, are there any other examples of it? If all three functions are made publically available and documented its not so bad, but in that case I cannot see the marginal benefit of the detect_artifact_periods function.

I think I agree with this. We can keep it simple, have the 2 functions and users can directly choose what they want to do. @samuelgarcia ok to remove detect_artifact_periods?

alejoe91 · 2026-03-17T16:42:35Z

Another note: the last commits also add the option to annotate a recording with saturation_threshold_uV and use that automatically when set. In a follow up, we cna port the logic from ibl-neuropixel to set this automatically based on metadata. We'll have to see how to do it for Open Ephys as well

src/spikeinterface/preprocessing/detect_artifacts.py

alejoe91 · 2026-03-19T15:35:11Z

src/spikeinterface/core/node_pipeline.py

+    # Do not remove this, this is to remenber that in previous version the first node needed to be
+    # a detectot but not anymore
+    # node0 = nodes[0]
+    # if not isinstance(node0, PeakSource):
+    #     raise ValueError(
+    #         "Peak pipeline graph must have as first element a PeakSource (PeakDetector or PeakRetriever or SpikeRetriever"
+    #     )


I suggest making this optional and have a check_for_peak_source flag

alejoe91 · 2026-03-19T16:31:10Z

src/spikeinterface/preprocessing/detect_artifacts.py

+        sample-to-sample voltage change exceeds this value in the required
+        fraction of channels are flagged as saturation.  Pass ``None`` to
+        disable derivative-based detection and rely solely on
+        ``saturation_threshold_uV``.  IBL use **300 μV/sample** for NP1 probes.


@JoeZiminski can you check this 300uV/sample value?

Refactor aftifacts detection.

5610518

alejoe91 added the preprocessing Related to preprocessing module label Jan 6, 2026

alejoe91 changed the title ~~Refactor aftifacts detection.~~ Refactor artifacts detection Jan 7, 2026

JoeZiminski mentioned this pull request Jan 15, 2026

Compute saturation intervals ala IBL #4301

Draft

Integrate the saturation detector

ef3e820

samuelgarcia added 2 commits January 21, 2026 14:11

Progagate new periods dtype to SilencedPeriodsRecording with backward…

a71dade

… compatibility

oups

0d709e6

samuelgarcia changed the title ~~Refactor artifacts detection~~ Refactor artifacts detection + add sturation detection Jan 21, 2026

samuelgarcia marked this pull request as ready for review January 21, 2026 13:21

yger reviewed Jan 21, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Show resolved Hide resolved

oups

7ab75c0

samuelgarcia commented Jan 22, 2026

View reviewed changes

chrishalcrow reviewed Jan 22, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

chrishalcrow reviewed Jan 22, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

chrishalcrow reviewed Jan 22, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

chrishalcrow reviewed Jan 22, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

JoeZiminski and others added 3 commits January 29, 2026 15:40

Applying extras from other PR, adding voltage_per_sec_threshold.

7c7446e

Rename uV_per_sec_threshold.

86e6924

[pre-commit.ci] auto fixes from pre-commit.com hooks

8b17890

for more information, see https://pre-commit.ci

samuelgarcia commented Feb 2, 2026

View reviewed changes

samuelgarcia added this to the 0.104.0 milestone Feb 3, 2026

alejoe91 reviewed Feb 5, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

alejoe91 reviewed Feb 5, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

JoeZiminski mentioned this pull request Feb 6, 2026

Suggestion to always cast the data to float and scale on load #4371

Open

JoeZiminski and others added 2 commits February 6, 2026 16:01

Updates testing converting to int.

4d507ad

[pre-commit.ci] auto fixes from pre-commit.com hooks

6546166

for more information, see https://pre-commit.ci

Merge branch 'main' into artifact_detector_refactor

5fd5ddd

alejoe91 mentioned this pull request Mar 9, 2026

blank_staturation with a window around saturation signal #1541

Closed

solve conflicts

dc5a030

Use diff_threshold_uV, clean up, extend tests

4df3ff0

alejoe91 added 3 commits March 17, 2026 16:35

fix: get rid of forkserver mp_context

fb9867b

cast traces to float32 to aboid overflow issues

a8c3ef5

Merge branch 'main' into artifact_detector_refactor

3f0b371

alejoe91 reviewed Mar 18, 2026

View reviewed changes

src/spikeinterface/preprocessing/detect_artifacts.py Outdated Show resolved Hide resolved

Fix envelope and remove mute_window_ms

b6544db

alejoe91 reviewed Mar 19, 2026

View reviewed changes

Add check_for_peak_source in run_node_pipeline and update api.rst

6032924

alejoe91 reviewed Mar 19, 2026

View reviewed changes

alejoe91 approved these changes Mar 19, 2026

View reviewed changes

Conversation

samuelgarcia commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yger commented Jan 12, 2026

Uh oh!

samuelgarcia commented Jan 21, 2026

Uh oh!

samuelgarcia commented Jan 21, 2026

Uh oh!

Uh oh!

samuelgarcia Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chrishalcrow commented Jan 22, 2026

Uh oh!

Uh oh!

JoeZiminski commented Jan 22, 2026

Uh oh!

samuelgarcia commented Jan 23, 2026

Uh oh!

samuelgarcia commented Jan 23, 2026

Uh oh!

JoeZiminski commented Jan 30, 2026

Uh oh!

samuelgarcia Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

JoeZiminski Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

JoeZiminski Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

samuelgarcia commented Feb 2, 2026

Uh oh!

JoeZiminski commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oliche commented Feb 3, 2026

Uh oh!

JoeZiminski commented Feb 3, 2026

Uh oh!

Uh oh!

Uh oh!

JoeZiminski commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oliche commented Feb 7, 2026

Uh oh!

JoeZiminski commented Feb 7, 2026

Uh oh!

oliche commented Feb 8, 2026

Uh oh!

JoeZiminski commented Feb 9, 2026

Uh oh!

alejoe91 commented Mar 16, 2026

Uh oh!

JoeZiminski commented Mar 17, 2026

Uh oh!

alejoe91 commented Mar 17, 2026

Uh oh!

alejoe91 commented Mar 17, 2026

Uh oh!

Uh oh!

alejoe91 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

alejoe91 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

samuelgarcia commented Jan 6, 2026 •

edited

Loading

JoeZiminski commented Feb 3, 2026 •

edited

Loading

JoeZiminski commented Feb 6, 2026 •

edited

Loading