Aurélien Cottin defended his thesis on 19 June 2020. Carried out within the GenomeHarvest flagship project and funded by Agropolis Fondation, the thesis was entitled "Characterization of mosaic genomes of crop plants: evaluation of methodologies and application to banana".
Aurélien was supervised by Nabila Yahiaoui at Montpellier SupAgro, within the Biodiversity, Agriculture, Food, Environment, Land, Water doctoral school, in partnership with AGAP (Amélioration Génétique et Adaptation des Plantes).
Abstract
Many cultivated plants have undergone intersubspecific hybridization events associated with their domestication and diversification. Cultivated bananas are one example: they are diploid or triploid hybrids, propagated vegetatively, that derive from several hybridization events between species and subspecies of the genus Musa distributed across different regions and islands of Southeast Asia. The genomes resulting from these hybridizations have a mosaic structure made of sequences of different origins. This mosaic can be characterized with local ancestry inference (LAI) methods, which assign each region of an admixed genome to one of its ancestral populations. Most of these methods were developed for human genetic studies, under implicit assumptions that may not hold for plant models. The objective of this thesis was to evaluate LAI methods and apply them to elucidate the mosaic genome structure of crop plants, with a specific focus on banana. A program that simulates genotyping data and compares local ancestry inference results was developed to evaluate how characteristics found in non-model crop plant datasets affect the performance of LAI methods. Three published LAI methods were compared through simulations. The results showed that high differentiation between ancestral populations and a small number of generations since hybridization both lead to more accurate inference. Inference accuracy was moderately affected by a relatively small number of representatives of the ancestral populations, by variation in the number of ancestral populations, by selfing in one ancestral population, and by vegetative propagation of admixed individuals. When one ancestral population was not represented in the dataset, the methods assigned the genome regions contributed by this missing population inconsistently to one or another of the represented ancestral populations.
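To make the simulation-based evaluation more concrete, the Python sketch below shows the two ingredients such a program needs: simulating a mosaic chromosome with known local ancestry (plus genotypes drawn from it), and scoring an inference against that known truth. It is not the thesis program. The function names (simulate_mosaic, simulate_haplotype, inference_accuracy) are hypothetical, and the model is a deliberate simplification: ancestry switch points are assumed to fall as a Poisson process with rate n_generations per Morgan, each tract's ancestry is drawn independently from the ancestral proportions, and SNPs are evenly spaced.

```python
import random
from bisect import bisect_right


def simulate_mosaic(n_sites, n_generations, ancestry_props, length_morgans=1.0, rng=None):
    """Return one ancestry label (0, 1, ...) per SNP site on a haploid chromosome.

    Hypothetical simplified model: switch points follow a Poisson process with
    rate n_generations per Morgan; each tract's ancestry is drawn independently.
    """
    rng = rng or random.Random()
    breakpoints = []
    if n_generations > 0:
        # Exponential gaps between successive ancestry switch points.
        pos = rng.expovariate(n_generations)
        while pos < length_morgans:
            breakpoints.append(pos)
            pos += rng.expovariate(n_generations)
    labels = list(range(len(ancestry_props)))
    tract_ancestry = [rng.choices(labels, weights=ancestry_props)[0]
                      for _ in range(len(breakpoints) + 1)]
    # Evenly spaced SNPs; bisect_right counts the breakpoints left of each SNP,
    # which is the index of the tract the SNP falls in.
    site_pos = [(i + 0.5) * length_morgans / n_sites for i in range(n_sites)]
    return [tract_ancestry[bisect_right(breakpoints, p)] for p in site_pos]


def simulate_haplotype(ancestry, allele_freqs, rng=None):
    """Draw a 0/1 allele per site from the allele frequency of its source population.

    allele_freqs[pop][site] is the frequency of allele 1 in ancestral population pop.
    """
    rng = rng or random.Random()
    return [int(rng.random() < allele_freqs[pop][site])
            for site, pop in enumerate(ancestry)]


def inference_accuracy(true_ancestry, inferred_ancestry):
    """Fraction of sites whose inferred ancestry matches the simulated truth."""
    if len(true_ancestry) != len(inferred_ancestry):
        raise ValueError("ancestry vectors must have the same length")
    matches = sum(t == i for t, i in zip(true_ancestry, inferred_ancestry))
    return matches / len(true_ancestry)


if __name__ == "__main__":
    rng = random.Random(42)
    truth = simulate_mosaic(1000, n_generations=5, ancestry_props=[0.5, 0.5], rng=rng)
    freqs = [[0.9] * 1000, [0.1] * 1000]  # two strongly differentiated populations
    hap = simulate_haplotype(truth, freqs, rng=rng)
    # Naive stand-in for an LAI method: call population 0 wherever allele 1 is seen.
    naive = [0 if allele == 1 else 1 for allele in hap]
    print(inference_accuracy(truth, naive))
```

In a harness of this kind, the naive caller would be replaced by the output of each real LAI method; varying n_generations, the differentiation encoded in allele_freqs, or leaving one population out of the reference panel is one way to explore scenarios like those described above.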
The evaluated LAI methods can therefore be used for local ancestry inference in cultivated plants, but only if all ancestral populations are sampled. In the second part, SNP data obtained by resequencing 115 diploid banana accessions were analyzed. These accessions included wild individuals from diverse Musa species and subspecies, as well as diploid cultivars. An approach based on ratios of allele sequence coverage was used to select representatives of known banana genetic groups with little or no introgression. This approach made it possible to visualize local ancestry from the ancestral groups represented in the dataset, and to detect banana genome regions that could not be assigned to a known origin and may derive from unknown ancestors. This supports recently published work reporting one or two ancestral groups that contributed to banana cultivars but for which no wild representatives have yet been identified. A dataset of 14 cultivated banana accessions with no ancestry of unknown origin was then analyzed with the three evaluated LAI methods. Two of these methods produced highly correlated inferences and, for the 14 accessions, mosaic profiles very similar to those obtained with the allele ratio approach. This suggests that, among the LAI methods tested, those based on hidden Markov models (HMMs) can be applied to non-model plant datasets provided the ancestral groups are characterized, even when few representatives are available. Most of the 14 accessions studied originate from New Guinea, the native area of M. acuminata ssp. banksii and M. schizocarpa. The inferred mosaics reveal a more widespread contribution of M. schizocarpa to cultivated banana genomes than previously reported.
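The allele coverage ratio idea can be illustrated with a small Python sketch. The abstract does not give the exact procedure, so this is one plausible reading and not the author's implementation. It assumes that a diagnostic allele has already been identified for each known genetic group at each SNP (from the selected representatives) and that per-allele read counts are available for the accession. In a diploid, the fraction of reads carrying a group's diagnostic allele should then be close to 0, 0.5 or 1 for 0, 1 or 2 copies from that group, and any share of a window not explained by a known group flags possible unknown ancestry. The function names and the group labels G1 and G2 are hypothetical.

```python
from statistics import mean


def window_group_ratios(sites, groups):
    """Mean coverage ratio of each group's diagnostic alleles over one window.

    sites: list of (allele_counts, diagnostic) pairs, where allele_counts maps
    allele -> read count in the accession, and diagnostic maps group -> the
    allele assumed (hypothetically) to be specific to that group at this SNP.
    Returns {group: mean ratio}, or None for a group with no usable SNP.
    """
    ratios = {g: [] for g in groups}
    for allele_counts, diagnostic in sites:
        total = sum(allele_counts.values())
        if total == 0:
            continue  # no read coverage at this SNP
        for group in groups:
            allele = diagnostic.get(group)
            if allele is not None:
                ratios[group].append(allele_counts.get(allele, 0) / total)
    return {g: (mean(r) if r else None) for g, r in ratios.items()}


def diploid_dosage(ratio):
    """Nearest copy number (0, 1 or 2) of a group's haplotype in a diploid."""
    return round(ratio * 2)


def unassigned_fraction(group_ratios):
    """Share of the window not explained by any known group (possible unknown ancestry)."""
    known = sum(r for r in group_ratios.values() if r is not None)
    return max(0.0, 1.0 - known)


# Toy window: reads split roughly evenly between G1- and G2-diagnostic alleles.
window = [
    ({"A": 10, "G": 10}, {"G1": "A", "G2": "G"}),
    ({"C": 8, "T": 12}, {"G1": "C", "G2": "T"}),
]
ratios = window_group_ratios(window, ["G1", "G2"])
print(ratios, {g: diploid_dosage(r) for g, r in ratios.items()}, unassigned_fraction(ratios))
```

Here G1 and G2 each get a ratio of about 0.45 to 0.55 (one copy each) and almost nothing is left unassigned; a window where no known group's diagnostic allele accounts for half of the reads would instead leave a large unassigned share.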