P03: Structural variant calling with linked read sequencing data
Open positions: 1 PhD student
Principal investigator: Dr. Birte Kehr
Genomic structural variants (SVs) can cause a multitude of human phenotypes, including genetic diseases. SVs are, however, notoriously difficult to call from short-read sequencing data because of their length and their frequent location within repetitive regions of the genome, both of which create ambiguities in short-read alignments. Additional long-range information is therefore needed to obtain a more complete picture of structural variation. Linked reads are a new and cost-effective type of sequencing that provides long-range sequence information through barcodes, which label short reads originating from the same long (~50,000 bp) DNA molecule. Although a small number of promising linked read data analysis tools are available, these tools rely on a given read alignment and do not take advantage of reads that align ambiguously or not at all. We hypothesize that utilizing all the linked read data will result in an SV call set that is more comprehensive than those generated with current short-read or linked read analysis tools. Building on our experience in assembling non-reference sequence, we will use the long-range information from linked reads to develop and implement a genome-wide local assembly approach for identifying SVs in linked read data. This approach will allow us to prepare a comprehensive SV call set for the patient-derived linked read data generated by our RU. Our linked read analysis tool will provide additional information about SVs and long-range haplotypes in whole-genome sequencing datasets from patients whose conditions have defied molecular diagnosis by whole-exome sequencing, thereby improving the molecular diagnostic rate for patients in the RU’s rare disease cohorts. This work is an integral part of the RU’s overarching goal to significantly reduce the length of the diagnostic odyssey for patients with rare diseases.