Mutation Load in Black Grouse
1 Introduction
This webpage/document contains a summary of the workflow used in the manuscript titled “Genetic architecture of male reproductive success in a lekking bird: insights from predicted deleterious mutations”, Chen et al. 2024 (in review). Please see the github repository for all scripts and more detailed descriptions of the data and analyses.
2 Main goals
In the current study, we used a long-term dataset to (i) quantify the fitness effects of total, homozygous and heterozygous individual genomic mutation loads; (ii) evaluate the relative contributions of deleterious mutations across different genomic regions and biological processes to fitness variation (iii) unravel the behavioural and / or ornamental pathways through which deleterious mutations impact lifetime reproductive success. We used whole genome resequencing, phenotypic and fitness data of 190 male black grouse sampled annually across five study sites in Central
Mutation load can be defined as a statistic that summarizes the selection and dominance coefficients of deleterious mutations as a function of their frequencies in a population (Bertorelle et al. 2022). As we do not have selection and dominance coefficients of mutations in wild populations, we use a proxy for mutation load calculated as the number of deleterious mutations for a given individual.
There are different types of load, e.g. the realized load (expressed load) which reduces fitness in the current generation, and the potential/masked load (inbreeding load) which quantifies the potential fitness loss due to (partially) recessive deleterious mutations that may become expressed in future generations depending on the population’s demography. The genetic load is made up of realized plus masked load.
3 Calculating genetic load
There are generally two most commonly used computational approaches to identify putative deleterious variants from whole genome re-sequencing data. In general, these tools attempt to predict the effect of a mutation on the function or evolutionary fitness of a protein. The two are distinct but can be related; for instance, a loss of function mutation will be strongly selected against if the gene is essential but will tend to be less evolutionary deleterious if the gene is non-essential or if the variant only slightly alters protein function. We used two common approaches:
Genomic Evolutionary Rate Profiling (GERP): This approach uses multi-species genome alignments to identify genomic sites that are strongly conserved over millions of years of evolution, as non-synonymous mutations at these sites have a high likelihood of being deleterious. (Davydov et al. 2010)
SNP effect (SnpEff): This approach predicts the consequences of genomic variants on protein sequences and identifies loss of function and missense variants. (Cingolani et al. 2012).
4 This webpage / document
This webpage can also be found in PDF format on github. Note that in the PDF format, code is not folded which will end up in a lengthy document, and the html looks aesthetically better ;). In both documents, you will find some of the scripts for the analysis performed in this study. Note that not all bioinformatic steps are put on here (only from inferring mutations onwards). You can find the complete set of analyses with their explanations in the github repo.