
fix: add speaker embedding matching to offline sync (issue #5907)#5946

Open
sungdark wants to merge 1 commit into BasedHardware:main from sungdark:fix/offline-sync-speaker-diarization

Conversation

@sungdark

Fix: Offline sync no speaker diarization (issue #5907)

Problem

Offline recording sync (sync_local_files / process_segment) was skipping the speaker identification pipeline, causing all transcribed segments to show generic 'SPEAKER_00', 'SPEAKER_01' labels instead of being matched against stored person embeddings. Live recording worked correctly because it runs speaker_identification_task which calls get_speech_profile_matching_predictions to identify speakers from their voice embeddings.

Solution

Added the same speaker embedding matching call to process_segment after postprocess_words returns. The get_speech_profile_matching_predictions function extracts speaker embeddings from the audio and matches them against stored person embeddings, setting is_user and person_id on each segment.

Changes

  • backend/routers/sync.py:
    • Added import for get_speech_profile_matching_predictions
    • Added speaker matching call in process_segment (after getting transcript segments, before storing them)
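The change described above can be sketched as a small, self-contained example. Only the name get_speech_profile_matching_predictions and the is_user / person_id fields come from the PR; the TranscriptSegment class, the helper name, and the sample data below are illustrative stand-ins:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the real transcript segment model.
@dataclass
class TranscriptSegment:
    text: str
    speaker: str
    is_user: bool = False
    person_id: Optional[str] = None

def apply_speaker_matches(transcript_segments, matches):
    # Mirrors the loop added in process_segment: copy each prediction
    # onto the corresponding segment's is_user / person_id fields.
    for i, seg in enumerate(transcript_segments):
        seg.is_user = matches[i]['is_user']
        seg.person_id = matches[i].get('person_id')

segments = [TranscriptSegment('hey there', 'SPEAKER_00'),
            TranscriptSegment('hello', 'SPEAKER_01')]
# Shape the PR expects get_speech_profile_matching_predictions to return:
# one dict per segment.
matches = [{'is_user': True}, {'is_user': False, 'person_id': 'person-123'}]
apply_speaker_matches(segments, matches)
print(segments[0].is_user, segments[1].person_id)  # True person-123
```

In the real code path this loop runs inside a try/except so that a matching failure leaves the default SPEAKER_* labels in place rather than failing the sync.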

Testing

The fix follows the same pattern used in postprocess_conversation.py's _handle_segment_embedding_matching function and the speaker_identification_task in transcribe.py.

Closes #5907

Offline sync (sync_local_files / process_segment) was skipping the
speaker identification pipeline, causing all transcribed segments to
show generic 'SPEAKER_00', 'SPEAKER_01' labels instead of being
matched against stored person embeddings.

Live recording runs speaker_identification_task which calls
get_speech_profile_matching_predictions to identify speakers from
their voice embeddings. This fix adds the same call to process_segment
after postprocess_words returns.

Fixes BasedHardware#5907

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes issue #5907 by adding speaker embedding matching to the offline sync path (process_segment in backend/routers/sync.py), bringing it in line with the live-recording pipeline that already calls get_speech_profile_matching_predictions. The change is small and well-scoped: it mirrors the pattern established in _handle_segment_embedding_matching (postprocess_conversation.py) and wraps the new call in a try/except so failures degrade gracefully.

Key changes:

  • Imports get_speech_profile_matching_predictions from utils.stt.speech_profile
  • Calls the speaker-matching API after postprocess_words returns, before segments are stored or merged with an existing conversation
  • On failure the exception is caught and logged, segments retain their default SPEAKER_* labels rather than crashing the sync

Minor issues found:

  • path.replace('.bin', '.wav') is a no-op — paths passed to process_segment are always .wav files from segmented_paths; the variable should simply be wav_path = path
  • No bounds check before matches[i]: if the remote API returns fewer items than transcript_segments, the loop raises an IndexError that is swallowed by except Exception, silently skipping all speaker attribution

Confidence Score: 4/5

  • Safe to merge — the fix is wrapped in a try/except and only adds new behaviour to a previously-broken code path; any failure leaves offline sync no worse than before.
  • The logic is correct and follows the established pattern. The two issues flagged are both style/defensive-coding concerns (P2), not runtime blockers — failures are caught and logged. The IndexError risk is real but only manifests in an edge case where the speech-profile API returns a malformed response, and even then the silent fallback is acceptable rather than data-corrupting.
  • No files require special attention; backend/routers/sync.py is the only changed file and the concerns are minor.

Important Files Changed

Filename Overview
backend/routers/sync.py Adds speaker embedding matching to the offline sync path by calling get_speech_profile_matching_predictions after transcription. Functionally mirrors the live-recording pipeline. Two style-level concerns: (1) path.replace('.bin', '.wav') is a no-op since segmented paths are already .wav, and (2) no bounds check before indexing into matches, which would silently skip all speaker data if the API returns a shorter list.

Sequence Diagram

sequenceDiagram
    participant Client
    participant sync_local_files
    participant process_segment
    participant deepgram_prerecorded
    participant get_speech_profile_matching_predictions
    participant SpeechProfileAPI
    participant DB

    Client->>sync_local_files: POST /v1/sync-local-files (audio .bin files)
    sync_local_files->>sync_local_files: decode_files_to_wav (.bin → .wav)
    sync_local_files->>sync_local_files: retrieve_vad_segments (split into speech segments)
    sync_local_files->>process_segment: process each segmented .wav (thread)

    process_segment->>deepgram_prerecorded: transcribe via signed URL
    deepgram_prerecorded-->>process_segment: transcript_segments (SPEAKER_00, SPEAKER_01…)

    Note over process_segment: NEW: speaker embedding matching
    process_segment->>get_speech_profile_matching_predictions: uid + wav_path + segments
    get_speech_profile_matching_predictions->>SpeechProfileAPI: POST audio + segments
    SpeechProfileAPI-->>get_speech_profile_matching_predictions: [{is_user, person_id}, …]
    get_speech_profile_matching_predictions-->>process_segment: matches list
    process_segment->>process_segment: set seg.is_user / seg.person_id

    process_segment->>DB: store/merge conversation with identified speakers


# Speaker identification: match segments against stored person embeddings
# This uses the same pipeline as live recording (speaker_identification_task)
try:
    wav_path = path.replace('.bin', '.wav')

P2: Unnecessary .bin → .wav path substitution

path passed to process_segment is always a .wav file — it comes from segmented_paths, which are populated inside retrieve_vad_segments with paths like f'{path_dir}/{segment_timestamp}.wav'. The .replace('.bin', '.wav') call has no effect here (there is no .bin in the path), so wav_path is always equal to path.

The misleading substitution is a silent no-op today, but it implies that a .bin path might arrive here. If the call-site ever changes, the speaker matching step would silently try to open a file whose name was never transformed, causing the except block to swallow the error with no speaker data written.
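A two-line check confirms the substitution is a no-op on such paths (the values below are illustrative; the f-string pattern is the one quoted above):

```python
# Segment paths are built like f'{path_dir}/{segment_timestamp}.wav', so
# there is never a '.bin' substring for replace() to act on.
path_dir, segment_timestamp = '/tmp/sync-session', 12.5  # illustrative values
path = f'{path_dir}/{segment_timestamp}.wav'
wav_path = path.replace('.bin', '.wav')
print(wav_path == path)  # True: the substitution changed nothing
```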

Suggested change:

-wav_path = path.replace('.bin', '.wav')
+wav_path = path  # path is already a .wav segment from retrieve_vad_segments

Comment on lines +654 to +656:

for i, seg in enumerate(transcript_segments):
    seg.is_user = matches[i]['is_user']
    seg.person_id = matches[i].get('person_id')

P2: No bounds check before indexing matches

get_speech_profile_matching_predictions returns [{'is_user': False, 'person_id': None}] * len(segments) on the error paths, but on a successful API response it simply returns whatever the remote service returned — there is no guarantee the length matches transcript_segments. If the response contains fewer items, matches[i] raises an IndexError; if it contains more, extra matches are silently ignored.

The current except Exception wrapper will catch the IndexError and log it, so this is not a crash, but it means speaker identification is completely skipped when the API returns even one fewer result than expected.

Consider guarding the loop or falling back to a safe default when lengths differ:

for i, seg in enumerate(transcript_segments):
    if i < len(matches):
        seg.is_user = matches[i]['is_user']
        seg.person_id = matches[i].get('person_id')
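An alternative guard (not part of the PR or the suggestion above) is zip, which pairs items only up to the shorter list, so a short response still attributes the segments it can:

```python
class Seg:
    # Minimal illustrative segment with the two fields the loop writes.
    def __init__(self):
        self.is_user = False
        self.person_id = None

transcript_segments = [Seg(), Seg(), Seg()]
matches = [{'is_user': True}, {'is_user': False, 'person_id': 'p-1'}]  # one short

# zip truncates to len(matches): the third segment keeps its defaults
# instead of the loop raising IndexError.
for seg, match in zip(transcript_segments, matches):
    seg.is_user = match['is_user']
    seg.person_id = match.get('person_id')

print([s.is_user for s in transcript_segments])  # [True, False, False]
```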



Development

Successfully merging this pull request may close these issues.

Offline sync: no speaker diarization (works fine for live sync)
