BUGGED NOT FIT FOR USE

FastFilter2

FastFilter2 is a high-performance Python tool for filtering paired-end FASTQ files. Designed for bioinformatics pipelines, it filters sequencing data with built-in support for multi-threading, pigz compression, and per-file summary reports.

This tool is ideal for pre-processing RNA-seq, DNA-seq, or other high-throughput sequencing datasets prior to alignment, assembly, or downstream analysis.

Key Features

- Biological quality filters:
  - Minimum read length
  - Maximum allowed ambiguous bases (Ns)
  - Homopolymer detection
  - Minimum average Phred quality score
- Paired-end safe: Ensures that reads are filtered in pairs, maintaining synchronization between R1 and R2.
- High-performance I/O: Writes uncompressed FASTQ first for speed, then compresses output automatically with pigz using multiple threads.
- Batch processing: Efficient batch writing to reduce I/O overhead.
- Progress tracking: Real-time progress bars via tqdm for monitoring large datasets.
- Summary output: Generates CSV reports with total reads, passing reads, and pass rates.
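
To illustrate what "paired-end safe" means in practice, here is a minimal sketch. It is not FastFilter2's actual code: the names `filter_pairs` and `passes` are hypothetical, and reads are represented as plain (header, sequence, quality) tuples rather than Biopython records. The idea is simply that a pair is kept only when both mates pass, so R1 and R2 never drift out of step.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical read representation: (header, sequence, quality string).
Read = Tuple[str, str, str]


def filter_pairs(
    r1_reads: Iterable[Read],
    r2_reads: Iterable[Read],
    passes: Callable[[Read], bool],
) -> Iterator[Tuple[Read, Read]]:
    """Yield only the pairs in which both mates pass the filter."""
    # zip walks both files in lockstep, so mates stay aligned by position.
    for r1, r2 in zip(r1_reads, r2_reads):
        if passes(r1) and passes(r2):
            yield r1, r2


# Toy example: keep pairs whose mates are both at least 4 bases long.
r1 = [("@a/1", "ACGTAC", "IIIIII"), ("@b/1", "AC", "II")]
r2 = [("@a/2", "TTGCAA", "IIIIII"), ("@b/2", "GGCCTT", "IIIIII")]
kept = list(filter_pairs(r1, r2, lambda read: len(read[1]) >= 4))
```

In the toy data, pair `b` is dropped as a whole because its R1 mate is too short, even though its R2 mate would pass on its own.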

Installation

Clone the repository and install dependencies:

git clone https://github.com/GamaPintoLab/fastfilter2.git
cd fastfilter2
pip install -r requirements.txt

Dependencies:

Python 3.9 or higher
Biopython
tqdm
pigz (for multi-threaded compression)

Usage

Run the tool from the command line:

fastfilter2 -i /path/to/input_fastq_dir -o /path/to/output_dir -j 4

Command-line Options

-i, --seq-dir : Input directory containing paired-end FASTQ files (required)
-o, --output-dir : Directory to write filtered outputs (defaults to <input_dir>/fastfilter)
-j, --cpus : Number of threads for parallel processing and compression (default: 1)
-l, --minlen : Minimum sequence length (default: 25)
-p, --homopolymerlen : Maximum allowed homopolymer length (default: 25)
-s, --min-score : Minimum average Phred quality score (default: 30)
--dryrun : Run without writing outputs (for testing)
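
For readers who want to wrap or script the tool, the options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the option list, not the parser in fastfilter2.py; `build_parser` and `resolve_output_dir` are hypothetical names, and treating `--min-score` as a float is an assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="fastfilter2")
    parser.add_argument("-i", "--seq-dir", required=True,
                        help="input directory with paired-end FASTQ files")
    parser.add_argument("-o", "--output-dir", default=None,
                        help="output directory (default: <input_dir>/fastfilter)")
    parser.add_argument("-j", "--cpus", type=int, default=1)
    parser.add_argument("-l", "--minlen", type=int, default=25)
    parser.add_argument("-p", "--homopolymerlen", type=int, default=25)
    # Assumption: the score threshold may be fractional.
    parser.add_argument("-s", "--min-score", type=float, default=30)
    parser.add_argument("--dryrun", action="store_true")
    return parser


def resolve_output_dir(args: argparse.Namespace) -> Path:
    """Fall back to <input_dir>/fastfilter when -o is not given."""
    if args.output_dir is not None:
        return Path(args.output_dir)
    return Path(args.seq_dir) / "fastfilter"


args = build_parser().parse_args(
    ["-i", "samples/fastq", "-j", "8", "-l", "50", "-s", "20", "--dryrun"]
)
output_dir = resolve_output_dir(args)
```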

Example

fastfilter2 -i samples/fastq -o results/filtered -j 8 -l 50 -s 20 --dryrun

This example processes paired-end FASTQ files in samples/fastq using 8 CPU threads and discards reads shorter than 50 bases or with an average Phred quality below 20. Because --dryrun is set, no output files are written.

How It Works

- Input parsing: Reads paired-end FASTQ files and validates file pairs.
- Filtering: Applies multiple biological filters to each read:
  - Removes reads with ambiguous bases (N or .)
  - Filters out reads with homopolymers above a given threshold
  - Filters based on minimum length and average Phred score
- Batch writing: Writes passing reads in batches to reduce I/O overhead.
- Compression: Automatically compresses output FASTQ files with pigz for speed and storage efficiency.
- Reporting: Produces a summary CSV with per-file statistics including total reads, passing reads, and pass rates.
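
The filtering step can be sketched as a single per-read predicate. This is an illustration under stated assumptions, not the tool's implementation: it assumes Phred+33 quality encoding, averages the Phred scores arithmetically, rejects a read only when its longest homopolymer is strictly longer than the threshold, and introduces a hypothetical `max_ambiguous` parameter (the README does not document a command-line option for it).

```python
from itertools import groupby


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores (assumes Phred+33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def passes_filters(
    seq: str,
    quality: str,
    minlen: int = 25,
    homopolymerlen: int = 25,
    min_score: float = 30.0,
    max_ambiguous: int = 0,  # hypothetical parameter, not a documented option
) -> bool:
    """Return True if the read survives all four filters."""
    if len(seq) < minlen:
        return False
    # Count both 'N' and '.' as ambiguous base calls.
    ambiguous = sum(seq.upper().count(char) for char in "N.")
    if ambiguous > max_ambiguous:
        return False
    if longest_homopolymer(seq) > homopolymerlen:
        return False
    return mean_phred(quality) >= min_score
```

For example, `passes_filters("ACGT" * 10, "I" * 40)` accepts a 40-base read of uniform Phred 40, while the same sequence with quality string `"5" * 40` (Phred 20) is rejected at the default threshold of 30.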

Output

Filtered paired-end files are named:

<sample_name>_R1_FILTERED.fastq.gz
<sample_name>_R2_FILTERED.fastq.gz

Summary CSV:

fastfilter_summary.csv containing:

file: sample name
total_reads: number of reads in input
good_reads: reads passing filters
pass_rate_pct: percentage of reads passing filters
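
As a rough sketch of how such a report can be produced with the standard `csv` module (the column names come from the list above; the helper names `summary_row` and `write_summary`, and rounding the pass rate to two decimals, are assumptions):

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Union

FIELDS = ["file", "total_reads", "good_reads", "pass_rate_pct"]
Row = Dict[str, Union[str, int, float]]


def summary_row(sample: str, total_reads: int, good_reads: int) -> Row:
    """Build one CSV row; an empty input yields a 0.0 pass rate."""
    pct = 100.0 * good_reads / total_reads if total_reads else 0.0
    return {
        "file": sample,
        "total_reads": total_reads,
        "good_reads": good_reads,
        "pass_rate_pct": round(pct, 2),  # two decimals is an assumption
    }


def write_summary(rows: Iterable[Row], handle: TextIO) -> None:
    """Write the header and all rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_summary([summary_row("sampleA", 1000, 875)], buffer)
```

With 875 of 1000 reads passing, the `pass_rate_pct` column for `sampleA` is 87.5.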

Performance

Multi-threaded filtering and compression using multiprocessing and pigz.
Efficient memory usage via batch processing of reads.
Suitable for very large FASTQ datasets (tens to hundreds of millions of reads).
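
The write-then-compress approach can be sketched as a thin wrapper around the pigz command line. This is an assumption about how the call might look, not the tool's own code; `pigz_command` and `compress` are hypothetical names. The `-p` flag is pigz's option for the number of compression threads, and pigz replaces the input file with a `.gz` file by default.

```python
import shutil
import subprocess
from typing import List


def pigz_command(path: str, threads: int = 1) -> List[str]:
    """Build the pigz invocation for compressing one file in place."""
    return ["pigz", "-p", str(threads), path]


def compress(path: str, threads: int = 1) -> None:
    """Compress `path` with pigz, failing loudly if pigz is missing."""
    if shutil.which("pigz") is None:
        raise RuntimeError("pigz not found on PATH")
    # check=True raises CalledProcessError if pigz exits with an error.
    subprocess.run(pigz_command(path, threads), check=True)
```

Calling `compress("sample_R1_FILTERED.fastq", threads=8)` would leave `sample_R1_FILTERED.fastq.gz` in place of the uncompressed file.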

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Built on top of the original FastFilter concept
Biopython community for sequence handling tools
tqdm for progress visualization
pigz for high-speed parallel compression

Author: Lucas Monteiro
PI: Margarida Gama-Carvalho
Lab: RNA Systems Biology Lab, BioISI, University of Lisbon
