Open
Conversation
…Hub Actions - Move performance test scripts from separate location to app/bwa-mem2/ - Update GitHub Actions workflow to reference app/bwa-mem2/test_data - Simplify documentation to reflect new directory structure - Verify workflow syntax is valid and ready for CI/CD
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Bug
Original Issue: FASTR silently dropped 7-8 reads during conversion due to incorrect chunk boundary detection.
Root Cause:
toFASTR_chunk_processor.py:135usedbuffer.rfind(b"\n@")to find FASTQ record boundaries. This fails because quality scores can contain "@" characters, causing the code to split chunks in the middle of records.Impact:
The Fix
Implemented proper FASTQ record boundary detection in
find_last_fastq_record_boundary(), check detail in file changeApp
The key idea is that using fastr as the plug-and-play tool for one of the most common aligner bwa-mem2. Here, bwa-mem2 will read the read that is streaming by the fastr to fastq command line. This idea can be used for similar tool that can read from stdin.
Performance
I want to test whether the plug-n-play takes more resource to compute. For small dataset, it is fine to use fastr with slight change on running time
CI/CD
The issue is that the fastr lacks of the testing for its functions, I added the small tests for at least verifying the input and output of reads.