Automated calculation of genomewide CADD scores as accumulants of functional annotations
Project summary
Genomic prediction (GP) has revolutionized animal breeding, but the lack of functional genomic information currently hampers its further development. Although successful, GP treats the genome as a black box: it predicts an animal's performance from a set of genomic markers distributed across the genome. However, recent developments in animal genomics have produced frameworks that score genomic variants on their likely functionality, which can facilitate the rapid discovery of novel functional variants and improve genomic prediction. The Combined Annotation Dependent Depletion (CADD) framework predicts the impact of mutations by integrating multiple annotations into a single metric. Accurate prediction of mutation impact is extremely valuable for understanding the genotype-phenotype link, one of the major research topics in the life sciences. CADD scores are built on several layers of annotation, including sequence context, conservation scores, gene expression data, non-synonymous mutation scores and epigenomic data. CADD was originally developed for humans; in 2020 we produced and published CADD scores for two livestock species, pig and chicken. However, as more and more functional data are generated, regular updates are needed to integrate this new information into the CADD scores. We recently generated a greatly improved reference genome for turkey, together with functional genomic data (RNA-seq) for many tissues and developmental stages. In addition, a wealth of sequence, genotypic and phenotypic information for several elite turkey populations is available at Hendrix Genetics. This new information now makes it possible to develop and deploy a CADD approach for turkey as well.
With the concurrent development of a versatile bioinformatics pipeline to calculate and compare CADD scores, we plan to generate a resource that will drive the future development of functional genomics resources in turkey and other livestock species.
Aim of the project
The overall goal of the project is to use functional genomic information to help identify causal variants for health and robustness traits in turkey, and to improve genomic prediction for these traits.
Motivation
The proposed activities fit within KIA MMIP S2 Biotechnology and Breeding. In this project we will develop a bioinformatics tool (based on our established knowledge base, genomics data and AI approaches) for precision breeding in poultry, with specific emphasis on turkey.
Planned results
1.	A database with CADD C-scores for every possible single nucleotide variant in the turkey genome.
2.	A standardized bioinformatics pipeline for regular updates of CADD C-scores in livestock, with turkey and chicken as use cases.
3.	Implementation of poultry breeding strategies that use CADD scores to identify novel causal genetic markers.
4.	A peer-reviewed scientific publication describing the development of turkey CADD scores and their use in identifying causal variants.
Output
Demonstrators: 2
Peer-reviewed publications (expected): 2
Impact
The most important impact is that CADD scores for every possible single nucleotide variant in the chicken and turkey genomes are now available to the breeding companies Hendrix Genetics (layer chicken and turkey) and Cobb-Vantress (broiler chicken). In both species, CADD scores can be used to prioritize variants identified in genome-wide association studies (GWAS) and so pinpoint likely causative variants.
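As an illustration of how CADD scores could support GWAS follow-up, the Python sketch below keeps variants that pass an association threshold and ranks them by CADD score, highest first. It is a minimal sketch under stated assumptions: the `Variant` record, the `prioritize` function, the default threshold of 5e-8 (a genome-wide significance convention borrowed from human GWAS) and all example values are hypothetical. In practice the scores would be looked up in the chicken or turkey CADD database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical record combining a GWAS result with a CADD score."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gwas_p: float      # association p-value from the GWAS
    cadd_score: float  # C-score assumed to be looked up from a CADD database


def prioritize(variants: List[Variant],
               p_threshold: float = 5e-8,
               top_n: Optional[int] = None) -> List[Variant]:
    """Keep variants with p below the (assumed) threshold, ranked by CADD score."""
    significant = [v for v in variants if v.gwas_p < p_threshold]
    ranked = sorted(significant, key=lambda v: v.cadd_score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Invented example records for illustration only.
    hits = [
        Variant("1", 1_000_100, "A", "G", 3e-9, 4.2),
        Variant("1", 1_000_450, "C", "T", 1e-10, 18.7),
        Variant("1", 1_002_300, "G", "A", 2e-8, 11.3),
        Variant("1", 1_500_000, "T", "C", 0.3, 25.0),  # not significant, dropped
    ]
    for v in prioritize(hits):
        print(v.chrom, v.pos, v.ref, v.alt, v.cadd_score)
```

Ranking within the significant set, rather than combining p-value and score into one number, keeps the statistical evidence and the functional prediction separate, which makes the shortlist easier to interpret.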
Another important impact of the project is the identification of several structural variants in turkey that might affect relevant phenotypes in some of the Hendrix Genetics breeding lines. We found evidence for several deletions and for two further variants, an insertion and an inversion, that likely affect a specific trait in turkey lines of Hendrix Genetics.
The project has also provided the data (haplotype genome assemblies) needed to generate a pan-genome reference based on the six elite turkey breeding lines of Hendrix Genetics. This pan-genome reference will aid future genotyping of structural variants (SVs) and the identification of causative variants in these breeding lines.