Coronavirus disease 2019 (COVID-19) has spread rapidly across the world since it was first identified. This unprecedented pandemic has led to a tragic loss of human life and has presented a colossal challenge to public health research. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), whose genome is of great importance for elucidating viral dynamics and for the search for effective treatments.
Assembling short sequencing reads into the correct order is a fundamental task in bioinformatics. Many methods have been developed over the past few decades, among which Euler path based techniques have proved efficient and reliable. In this project, we construct De Bruijn graphs from SARS-CoV-2 whole-genome sequencing reads and implement an algorithm that finds an Euler path in each graph, in order to explore factors, such as the k-mer length, that might affect genome assembly. In addition, we explore methods to visualize the graph and validate our results against a reference genome. We report the results of our tentative approaches to comparing different k-mer lengths, along with the results of De Bruijn graph construction and visualization.
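As a rough illustration of the approach described above, the following Python sketch builds a De Bruijn graph from a few toy reads and walks an Euler path through it with Hierholzer's algorithm. It is not the project's actual implementation: the function names (`build_de_bruijn`, `eulerian_path`, `spell`), the toy reads, and the choice to count each distinct k-mer only once are all assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Assumption for this sketch: each distinct k-mer contributes one edge,
    so repeat multiplicity and sequencing errors are ignored.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix
    return graph


def eulerian_path(graph):
    """Return a list of nodes visiting every edge once (Hierholzer).

    Assumes such a path exists; no connectivity or degree checks are done.
    """
    if not graph:
        return []
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in graph if len(graph[n]) - in_deg[n] == 1),
        next(iter(graph)),
    )
    remaining = {n: list(succ) for n, succ in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Reconstruct the sequence spelled by a path of (k-1)-mers."""
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy data (hypothetical): two overlapping reads from one short sequence.
reference = "ATGGCGTGCA"
reads = ["ATGGCGT", "GCGTGCA"]

graph = build_de_bruijn(reads, k=4)
assembled = spell(eulerian_path(graph))
print(assembled)               # ATGGCGTGCA
print(assembled == reference)  # True
```

With k = 4 every (k-1)-mer in this toy sequence is unique, so the Euler path is unambiguous. Choosing a smaller k makes some nodes repeat (for example, `TG` and `GC` each occur twice as 2-mers here), which can yield several valid Euler paths and therefore several candidate assemblies; this is one way the k-mer length can affect the result.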
To read more about our results and findings, please visit our GitHub repository.