fix: read quality ASCII has @ #1

Open
nttg8100 wants to merge 3 commits into ALSER-Lab:main from nttg8100:fix-read-quality-conversion

nttg8100 commented Mar 17, 2026

The Bug

Original Issue: FASTR silently dropped 7-8 reads during conversion due to incorrect chunk boundary detection.

Root Cause: toFASTR_chunk_processor.py:135 used buffer.rfind(b"\n@") to find FASTQ record boundaries. This fails because a quality string can itself begin with "@" (a valid Phred+33 quality character), so the search can match a quality line instead of a header line and split a chunk in the middle of a record.
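
To make the failure concrete, here is a minimal sketch with two made-up records (the read names and quality strings are invented for illustration and are not taken from the test data). The second record's quality line starts with "@", so the naive search lands on it rather than on a header:

```python
# Two hypothetical 4-line FASTQ records; the second quality string begins with "@".
record_1 = b"@read1\nACGT\n+\nIIII\n"
record_2 = b"@read2\nTTGA\n+\n@F#F\n"
buffer = record_1 + record_2

# Naive boundary search, as described in the root cause above.
naive_split = buffer.rfind(b"\n@") + 1

# The last real record starts where record_1 ends.
true_boundary = len(record_1)

print(naive_split, true_boundary)
print(buffer[naive_split:])  # the tail starts with the quality string, not a header
```

Everything before naive_split would be processed as complete records even though read2 has no quality line yet, and the leftover quality string would be treated as the start of the next record.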

Impact:

  • Input: 272,479 reads
  • After buggy conversion: 272,472 reads (7 lost!)
  • Paired files desynchronized
  • BWA-MEM2 produced 0 alignments

The Fix

Implemented proper FASTQ record boundary detection in find_last_fastq_record_boundary(); see the file changes for details.
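
One way such a check could work is sketched below. This is not the code from the PR: the function body, its return convention (-1 when no boundary is found) and the validation rule are assumptions made for illustration. It assumes strict 4-line records with non-empty sequences, and accepts a "\n@" match as a header only if the line two lines later starts with "+":

```python
def find_last_fastq_record_boundary(buffer: bytes) -> int:
    """Return the offset of the last verifiable record header, or -1.

    Hypothetical sketch: assumes 4-line FASTQ records. A line starting
    with "@" counts as a header only if the separator line ("+") follows
    its sequence line. A quality line starting with "@" fails this check
    because the line two below it is a sequence line, never "+".
    """
    pos = len(buffer)
    while True:
        # Search backwards for the next candidate strictly before pos.
        pos = buffer.rfind(b"\n@", 0, pos)
        if pos == -1:
            return -1
        header_start = pos + 1
        end_header = buffer.find(b"\n", header_start)
        if end_header == -1:
            continue  # header line is incomplete, try an earlier candidate
        end_seq = buffer.find(b"\n", end_header + 1)
        if end_seq == -1:
            continue  # sequence line is incomplete, cannot verify
        if buffer[end_seq + 1:end_seq + 2] == b"+":
            return header_start


# The chunk ends right after a quality line that begins with "@".
chunk = b"@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\n@III\n"
boundary = find_last_fastq_record_boundary(chunk)
print(boundary, chunk.rfind(b"\n@") + 1)
```

Under this sketch, a caller would process chunk[:boundary] and carry chunk[boundary:] over to the next read, so the last record is never cut in half even when it happens to be complete.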

App

The key idea is to use fastr as a plug-and-play tool for bwa-mem2, one of the most common aligners. Here, bwa-mem2 reads the reads streamed by the fastr-to-fastq command line. The same idea can be applied to similar tools that can read from stdin.

Performance

I wanted to test whether the plug-and-play setup needs more compute resources. For a small dataset, using fastr is fine, with only a slight change in running time.

CI/CD

The issue is that fastr lacks tests for its functions. I added small tests that at least verify the reads going in and the reads coming out.
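
As an illustration of what an input/output read check can look like, the sketch below counts records in a FASTQ stream and flags a truncated file. The helper name count_fastq_reads is hypothetical and is not one of the tests added in this PR; it assumes plain 4-line records:

```python
import io


def count_fastq_reads(stream) -> int:
    """Count records in a binary FASTQ stream, assuming 4 lines per record."""
    line_count = sum(1 for _ in stream)
    if line_count % 4 != 0:
        raise ValueError(f"{line_count} lines is not a multiple of 4")
    return line_count // 4


# Made-up input and a made-up output that lost its last record.
original = b"@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\n@III\n"
converted_back = b"@r1\nACGT\n+\nIIII\n"

reads_in = count_fastq_reads(io.BytesIO(original))
reads_out = count_fastq_reads(io.BytesIO(converted_back))
print(reads_in, reads_out)
```

A round-trip test would then assert that the two counts match, which is the kind of check that would have caught the 272,479 versus 272,472 discrepancy reported above.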

…Hub Actions

- Move performance test scripts from separate location to app/bwa-mem2/
- Update GitHub Actions workflow to reference app/bwa-mem2/test_data
- Simplify documentation to reflect new directory structure
- Verify workflow syntax is valid and ready for CI/CD
nttg8100 changed the title from "fix: read qualith ASCII has @" to "fix: read quality ASCII has @" on Mar 18, 2026