RNASEQC: RNAseq Metrics for Quality Control and Process Optimization

Retrieve raw sequencing data from storage

Decompress the raw data if compressed

Validate the integrity of raw data files

Convert raw data into FASTQ format

Perform quality control on FASTQ files

Trim low-quality bases and remove adapter sequences

Align the reads to the reference genome

Sort aligned reads by coordinate

Mark duplicates in the aligned reads

Index the sorted BAM files

Read alignment quality check

Recalibrate base quality scores

Perform variant calling

Annotate identified variants

Perform quality control on the final results

Approval: Lab Manager

Generate reports for RNASEQC metrics

Store the RNASEQC metrics report for future reference

Clean up temporary files created during the process

Retrieve raw sequencing data from storage

This task involves retrieving the raw sequencing data files from the designated storage location. The raw data contains the information obtained from the sequencing process and serves as the starting point for the analysis. Make sure to locate the correct directory or folder and retrieve the files without modifying or moving them. Please confirm the retrieval of the raw data files by selecting the appropriate option below.

Confirmation of Raw Data Retrieval

1. Yes, data retrieved and intact
2. No, data not found
3. Data retrieval in progress
4. Data retrieval delayed

Decompress the raw data if compressed

If the raw sequencing data is compressed, decompress it before further analysis. Compression is commonly used to save storage space and speed up data transfer. Use the decompression tool that matches the file format (for example gzip for .gz files) to extract the raw data, and keep the compressed original until the integrity of the extracted files has been checked. If the data is already uncompressed, skip this task. Please confirm the decompression process by selecting the appropriate option below.

Confirmation of Data Decompression

1. Decompressed successfully
2. Data already in uncompressed form
3. Decompression in progress
4. Decompression failed
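
The decompression step can be sketched as follows. This is a minimal sketch that assumes the raw files are gzip-compressed and named with a .gz suffix; the function name decompress_gzip and the output directory are hypothetical and not part of this workflow. Other formats (for example .bz2 or .zip) would need a different module.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src: Path, dest_dir: Path) -> Path:
    """Decompress a .gz file into dest_dir and return the new path.

    Assumption: compression is detected by the file suffix only.
    The compressed original is left in place, unmodified.
    """
    if src.suffix != ".gz":
        # Not gzip-compressed by name: nothing to do in this sketch.
        return src
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.stem  # "sample.fastq.gz" -> "sample.fastq"
    # Stream in chunks so large sequencing files never sit fully in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

A file that is not named *.gz is returned unchanged, which corresponds to the "Data already in uncompressed form" option.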

Validate the integrity of raw data files

Verifying the integrity of the raw data files is crucial for reliable analysis results. Compute a checksum or hash value (for example MD5 or SHA-256) for each raw sequencing file and compare it against the known value supplied with the data. A mismatch indicates data corruption or a transmission error, and the affected file should be retrieved again before continuing. Please confirm the validation of the raw data file integrity by selecting the appropriate option below.

Confirmation of Data Integrity Validation

1. Data integrity validated
2. Data integrity check failed
3. Validation in progress
4. Validation delayed
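
The checksum comparison described in this task can be sketched as follows. This is a minimal sketch that assumes the data provider supplies a SHA-256 hex digest for each file; the function names sha256_of and verify_checksum are hypothetical. If the provider supplies MD5 values instead, hashlib.md5 could be substituted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """True if the file's digest matches the known value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A return value of False corresponds to the "Data integrity check failed" option.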

Convert raw data into FASTQ format

The raw data obtained from the sequencing process may arrive in a format other than FASTQ, such as unaligned BAM or SAM, while the QC, trimming, and alignment steps that follow expect FASTQ input. This task involves converting the raw data into the FASTQ format, which is commonly used for RNA sequencing data. Use a suitable conversion tool or script to perform the conversion. Please confirm the successful conversion of the raw data files to FASTQ format by selecting the appropriate option below.

Confirmation of Data Conversion to FASTQ

1. Data converted to FASTQ successfully
2. Data already in FASTQ format
3. Conversion in progress
4. Conversion failed
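
To illustrate what such a conversion does, here is a minimal sketch that turns one text-format SAM record into a FASTQ record. The function name sam_record_to_fastq is hypothetical. The sketch deliberately ignores paired-end mate labels and secondary or supplementary alignments, which a production conversion tool would handle; it is not a replacement for one.

```python
# Translation table for complementing bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(sam_line: str) -> str:
    """Convert one tab-separated SAM alignment line to a 4-line FASTQ record.

    SAM columns used: 1 = read name, 2 = flag, 10 = sequence, 11 = qualities.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:
        # Reverse-strand reads are stored reverse-complemented in SAM,
        # so restore the original read orientation.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Restoring the original orientation matters because quality scores are tied to sequencing cycles, so the reads should be handed to QC and trimming as they came off the instrument.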

Perform quality control on FASTQ files

Performing quality control (QC) on the FASTQ files is essential to assess the overall quality of the sequencing data before further analysis. Use a QC tool or software to generate quality metrics, such as read length distribution, base composition, and per-base quality scores. Review the QC metrics to identify potential issues and confirm if the data meets the quality standards. Please provide your QC report and observations in the field below.

Quality Control Report and Observations
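
The metrics named in this task (read length distribution, base composition, and per-base quality scores) can be computed with a short script. The following is a minimal sketch, assuming 4-line FASTQ records and Phred+33 quality encoding; the function names parse_fastq and fastq_qc are hypothetical, and a dedicated QC tool reports many more metrics than this.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_qc(lines: Iterable[str], phred_offset: int = 33) -> Dict[str, object]:
    """Summarise read lengths, base composition, and mean quality per position."""
    lengths: Counter = Counter()
    bases: Counter = Counter()
    qual_sum: List[int] = []    # summed quality at each read position
    qual_count: List[int] = []  # number of reads covering each position
    for _, seq, qual in parse_fastq(lines):
        lengths[len(seq)] += 1
        bases.update(seq.upper())
        for i, ch in enumerate(qual):
            if i == len(qual_sum):
                qual_sum.append(0)
                qual_count.append(0)
            qual_sum[i] += ord(ch) - phred_offset
            qual_count[i] += 1
    return {
        "read_lengths": dict(lengths),
        "base_composition": dict(bases),
        "mean_quality_per_position": [s / n for s, n in zip(qual_sum, qual_count)],
    }
```

Passing an open file handle, for example fastq_qc(open(path)), works because the parser only iterates over lines.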

Trim low-quality bases and remove adapter sequences

To improve the quality of the sequencing data, low-quality bases and adapter sequences need to be trimmed. Low-quality bases can introduce noise into the analysis, while adapter sequences are remnants of the sequencing library preparation process. Use a trimming tool or software to remove these sequences while ensuring the remaining reads are of high quality. Please confirm the successful trimming of low-quality bases and removal of adapter sequences by selecting the appropriate option below.

Confirmation of Trimming and Adapter Removal

1. Trimming and adapter removal successful
2. No trimming or removal needed
3. Trimming in progress
4. Trimming failed
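
The two operations in this task can be sketched for a single read as follows. This is a minimal sketch under simplifying assumptions: the adapter is matched exactly (real trimming tools tolerate mismatches), quality is trimmed only from the 3' end, and qualities use Phred+33 encoding. The function name trim_read and the default thresholds are hypothetical.

```python
from typing import Tuple


def trim_read(
    seq: str,
    qual: str,
    adapter: str,
    min_quality: int = 20,
    phred_offset: int = 33,
    min_overlap: int = 3,
) -> Tuple[str, str]:
    """Remove a 3' adapter, then trim trailing low-quality bases."""
    # 1. Adapter removal: cut at the first full adapter occurrence ...
    cut = seq.find(adapter)
    if cut == -1:
        # ... or at a partial adapter (its prefix) hanging off the 3' end.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    if cut != -1:
        seq, qual = seq[:cut], qual[:cut]
    # 2. Quality trimming: drop trailing bases below the threshold.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]
```

Reads that become very short after trimming are usually discarded as well; that length filter is left out here to keep the sketch small.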

Align the reads to the reference genome

Aligning the reads to a reference genome is a fundamental step in the RNA sequencing analysis pipeline. Because RNA-seq reads can span exon-exon junctions, use a splice-aware alignment tool to align the reads to the reference genome assembly. The aligned reads provide information on their position in the genome and enable downstream analysis, such as variant calling. Please confirm the successful alignment of the reads to the reference genome by selecting the appropriate option below.

Confirmation of Read Alignment

1. Reads aligned successfully
2. No read alignment needed
3. Alignment in progress
4. Alignment failed

Sort aligned reads by coordinate

Sorting the aligned reads by coordinate is important to ensure efficient downstream analysis. Coordinate sorting arranges the aligned reads based on their genomic positions, facilitating variant calling and other analyses that rely on positional information. Use a suitable sorting tool or software to sort the aligned reads by coordinate. Please confirm the successful sorting of the aligned reads by coordinate by selecting the appropriate option below.

Confirmation of Read Sorting

1. Reads sorted by coordinate successfully
2. No read sorting needed
3. Sorting in progress
4. Sorting failed
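
What "sorted by coordinate" means can be shown on text-format SAM records. This is a minimal in-memory sketch, assuming reference sequences are ordered as they appear in the @SQ header lines and that unmapped reads go last; the function name sort_sam_by_coordinate is hypothetical. Real sorting tools work on BAM and spill to disk for large files.

```python
from typing import Dict, Iterable, List, Tuple


def sort_sam_by_coordinate(
    header_lines: Iterable[str], alignment_lines: Iterable[str]
) -> List[str]:
    """Return alignment lines ordered by (reference order, leftmost position)."""
    # Reference order comes from the SN: tags of the @SQ header lines.
    ref_order: Dict[str, int] = {}
    for line in header_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    ref_order[tag[3:]] = len(ref_order)

    def sort_key(line: str) -> Tuple[int, int]:
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Unknown or unmapped references ("*") sort after all known ones.
        return ref_order.get(rname, len(ref_order)), pos

    return sorted(alignment_lines, key=sort_key)
```

Python's sort is stable, so reads that share a reference and position keep their original relative order.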

Mark duplicates in the aligned reads

Duplicate reads can arise during the sequencing process (optical duplicates) or during library preparation (PCR duplicates). Marking duplicates is essential to avoid overrepresentation of certain reads in downstream analysis, such as variant calling. Use a duplicate-marking tool to identify and flag the duplicate reads. Please confirm the successful marking of duplicates in the aligned reads by selecting the appropriate option below.

Confirmation of Duplicate Marking

1. Duplicates marked successfully
2. No duplicate marking needed
3. Marking in progress
4. Marking failed
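
The idea behind duplicate marking can be sketched on text-format SAM records. This is a deliberately simplified sketch: it treats reads with the same reference, leftmost position, and strand as duplicates, keeps the one with the highest summed base quality, and sets the SAM duplicate flag (0x400) on the rest. Real tools also account for soft clipping and mate positions. The function name mark_duplicates is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

UNMAPPED_FLAG = 0x4
REVERSE_FLAG = 0x10
DUPLICATE_FLAG = 0x400


def mark_duplicates(alignment_lines: Iterable[str], phred_offset: int = 33) -> List[str]:
    """Return SAM lines with the duplicate flag set on lower-quality copies."""
    records = [line.rstrip("\n").split("\t") for line in alignment_lines]
    best: Dict[Tuple[str, int, bool], Tuple[int, int]] = {}  # key -> (score, index)
    for i, rec in enumerate(records):
        flag = int(rec[1])
        if flag & UNMAPPED_FLAG:
            continue  # unmapped reads have no position to compare
        key = (rec[2], int(rec[3]), bool(flag & REVERSE_FLAG))
        score = sum(ord(c) - phred_offset for c in rec[10])
        if key not in best or score > best[key][0]:
            best[key] = (score, i)
    keep = {index for _, index in best.values()}
    marked = []
    for i, rec in enumerate(records):
        flag = int(rec[1])
        if not flag & UNMAPPED_FLAG and i not in keep:
            rec[1] = str(flag | DUPLICATE_FLAG)
        marked.append("\t".join(rec))
    return marked
```

Duplicates are flagged rather than deleted, so downstream tools can choose whether to ignore them.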

Index the sorted BAM files

Indexing the sorted BAM files allows for rapid retrieval of specific genomic regions. The index file contains positional information that enables efficient random access to the aligned reads. Use a suitable indexing tool or software to create the index file for the sorted BAM files. Please confirm the successful indexing of the sorted BAM files by selecting the appropriate option below.

Confirmation of BAM File Indexing

1. BAM files indexed successfully
2. No indexing needed
3. Indexing in progress
4. Indexing failed

Read alignment quality check

Performing a quality check on the read alignment is necessary to ensure the alignment results are accurate and reliable. Use a suitable read alignment quality check tool or software to assess the alignment quality based on various metrics, such as mapping quality scores and alignment statistics. Review the quality check results and confirm if the read alignment meets the desired quality standards. Please provide your quality check report and observations in the field below.

Read Alignment Quality Check Report and Observations
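
Two of the metrics mentioned above, mapping quality scores and basic alignment statistics, can be summarised from text-format SAM records. This is a minimal sketch; the function name alignment_summary and the MAPQ threshold of 30 are hypothetical choices, and secondary and supplementary alignments are skipped so that each read is counted once.

```python
from typing import Dict, Iterable

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def alignment_summary(sam_lines: Iterable[str], min_mapq: int = 30) -> Dict[str, float]:
    """Compute mapping rate, mean MAPQ, and the fraction of high-MAPQ reads."""
    total = mapped = high_quality = mapq_sum = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read only via its primary record
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        mapq_sum += mapq
        if mapq >= min_mapq:
            high_quality += 1
    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "mapping_rate": mapped / total if total else 0.0,
        "mean_mapq": mapq_sum / mapped if mapped else 0.0,
        "high_mapq_fraction": high_quality / mapped if mapped else 0.0,
    }
```

What counts as an acceptable mapping rate depends on the organism, library type, and reference, so the thresholds used for the report should come from the lab's own quality standards.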

Recalibrate base quality scores

Recalibrating the base quality scores is important to improve the accuracy of variant calling and downstream analyses that rely on base quality information. The base quality scores represent the confidence level of each base call and can be affected by various factors during the sequencing process. Use a suitable base quality score recalibration tool or software to recalibrate the scores based on known variants and quality reference data. Please confirm the successful recalibration of the base quality scores by selecting the appropriate option below.

Confirmation of Base Quality Score Recalibration

1. Base quality scores recalibrated successfully
2. No recalibration needed
3. Recalibration in progress
4. Recalibration failed

Perform variant calling

Variant calling is the process of identifying genetic variants, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), compared to the reference genome. Use a suitable variant calling tool or software to perform the variant calling analysis based on the aligned reads. The identified variants provide valuable information for further analysis and interpretation. Please confirm the successful identification of variants by selecting the appropriate option below.

Confirmation of Variant Calling

1. Variants called successfully
2. No variant calling needed
3. Variant calling in progress
4. Variant calling failed

Annotate identified variants

Annotating the identified variants with relevant information, such as gene names, functional impact, and population frequency, enhances the interpretation of the results. Use a suitable variant annotation tool or software to annotate the identified variants based on available reference databases or resources. The annotated variants facilitate the identification of biologically relevant variants. Please confirm the successful annotation of the identified variants by selecting the appropriate option below.

Confirmation of Variant Annotation

1. Variants annotated successfully
2. No variant annotation needed
3. Annotation in progress
4. Annotation failed

Perform quality control on the final results

Performing quality control (QC) on the final results ensures the reliability and accuracy of the analysis. Use suitable QC tools or software to assess various metrics, such as variant call rate, genotype concordance, and allele frequency distribution. Review the QC metrics to identify any potential issues and confirm if the final results meet the quality standards. Please provide your QC report and observations in the field below.

Final Results Quality Control Report and Observations

Approval: Lab Manager

Task submitted for approval: Perform quality control on the final results

Generate reports for RNASEQC metrics

Generating reports for RNASEQC metrics provides a comprehensive summary of the analysis results and facilitates interpretation and sharing with stakeholders. Use a suitable reporting tool or software to generate the RNASEQC metrics report based on the analysis outputs and QC metrics. The report should include relevant information, such as sequencing statistics, alignment metrics, variant calling results, and quality control assessments. Please upload the generated RNASEQC metrics report in the field below.

Upload RNASEQC Metrics Report

Store the RNASEQC metrics report for future reference

Storing the RNASEQC metrics report supports future reference and reproducibility of the analysis. Save the generated RNASEQC metrics report in a designated storage location or database. Make sure the report is easy to retrieve and is documented with enough context (for example sample identifiers and analysis date) for future use. Please confirm the storage of the RNASEQC metrics report by selecting the appropriate option below.

Confirmation of Report Storage

1. Report stored successfully
2. No report storage needed
3. Storage in progress
4. Storage failed

Clean up temporary files created during the process

Cleaning up temporary files created during the analysis process helps free up storage space and maintain a clean and organized workflow. Identify and remove any unnecessary or temporary files generated during the analysis, such as intermediate alignment files or temporary index files. Please confirm the successful cleanup of temporary files by selecting the appropriate option below.

Confirmation of Temporary File Cleanup
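
The cleanup can be scripted so that it is reviewed before anything is deleted. This is a minimal sketch; the function name clean_temp_files and the file patterns are hypothetical examples and must be adapted to the names the pipeline actually uses for its intermediate files. It defaults to a dry run that only lists what would be removed.

```python
from pathlib import Path
from typing import Iterable, List


def clean_temp_files(
    work_dir: Path,
    patterns: Iterable[str] = ("*.tmp", "*.unsorted.bam"),  # hypothetical patterns
    dry_run: bool = True,
) -> List[Path]:
    """List (and, unless dry_run, delete) files under work_dir matching patterns."""
    matched: List[Path] = []
    for pattern in patterns:
        # rglob searches work_dir and all of its subdirectories.
        for path in sorted(work_dir.rglob(pattern)):
            if path.is_file():
                if not dry_run:
                    path.unlink()
                matched.append(path)
    return matched
```

Running once with dry_run=True and checking the returned list before running again with dry_run=False reduces the risk of deleting final results or the stored report.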