Africa has the world’s greatest genetic diversity, yet it’s missing from research: we’re filling the gap

By Mary Alexander on 8 June 2026

Africa has the richest diversity and deepest roots in the human genetic tree, yet data from its more than 2,000 ethnolinguistic groups remains largely uncollected. The AGenDA project is starting to fill that gap.

An artist’s impression of DNA. Lack of genomic data from Africa has profoundly shaped modern medicine, from disease prediction to ancestry testing. (Image: Wikimedia Commons)

Michele Ramsay and Ananyo Choudhury • 12 May 2026

Since work in the field began, most of the world’s genomic research has relied on DNA data from people of European ancestry.

A genome is the full DNA code of about three billion (three thousand million) bases, including all the chromosomes. Each person has two genomes: one from their mother and the other from their father.

Well-resourced research able to generate hundreds of thousands of whole human genomes and associated health data tends to favour European genetics. Yet modern humans, our species, evolved on the African continent. African populations therefore contain the deepest branches of human genetic history and the greatest genetic diversity on the planet. But the continent remains strikingly underrepresented in global genomic databases.

Africa has more than 2,000 ethnolinguistic groups, yet genetic data has been gathered from fewer than 100 of them. This is like a GPS map of a city with only 5% of the streets marked.

This bias has profoundly shaped modern medicine, from disease prediction to ancestry testing. And it’s why researchers increasingly recognise that studying African genomes has the potential to reveal insights and health-related biological pathways never observed before.

Our multidisciplinary team of researchers were involved in identifying under-represented groups for whole-genome sequencing in nine African countries. As part of the Assessing Genetic Diversity in Africa project (AGenDA), we have worked out ethical ways to obtain, record and share genetic material and add it to global databases.

The AGenDA dataset alone is expected to uncover millions of previously unknown genetic variants, with analyses underway. These discoveries will inform research into diseases that affect populations in Africa and worldwide. They include diabetes, heart disease, cancer and neurological or mental health conditions.

This is only a first step. Capturing the full scope of African genomic diversity will require hundreds of thousands of genomes. The project aims to bridge some of the most obvious gaps rather than fully map the continent’s diversity.

Expanding African genomic data is not only important for Africa. It will strengthen global biomedical science.

Pedestrians cross the road in front of Government House in Port Louis, Mauritius. The AGenDA project sequenced more than 1,000 whole genomes from previously neglected communities, including those from Africa’s Indian Ocean islands. (Image: Jean-Yan Norbert/UNDP, CC BY-NC 2.0)

Missing populations

Modern genomic science relies on large databases of DNA sequences to understand disease risk, ancestry and human evolution. These databases underpin a wide range of scientific and medical tools. They are used in medical research, disease prediction, drug development, ancestry testing and increasingly in artificial intelligence models that analyse health data.

When a population is absent from a reference database – a library of whole genome sequences – science simply cannot detect it. The algorithms used in genetic analysis work by comparing individuals to reference populations. When a specific reference population is missing, they assign the closest available match, even if that match is a poor one.
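To illustrate the "closest available match" problem, here is a minimal sketch assuming a toy nearest-match classifier. It is not how any particular ancestry service works. Each hypothetical reference population is summarised by invented allele frequencies at four variant sites, and an individual is assigned to whichever population is nearest. Real tools use far larger reference panels and more sophisticated statistical models.

```python
import math

# Hypothetical reference panel: population name -> allele frequency at each of
# four illustrative variant sites (all values invented for this sketch).
REFERENCE_PANEL = {
    "Population A": [0.10, 0.80, 0.30, 0.60],
    "Population B": [0.70, 0.20, 0.50, 0.10],
    "Population C": [0.40, 0.50, 0.90, 0.30],
}


def distance(genotype, freqs):
    """Euclidean distance between an individual's allele counts (0, 1 or 2
    copies, halved to a 0-1 scale) and a population's allele frequencies."""
    return math.sqrt(sum((g / 2 - f) ** 2 for g, f in zip(genotype, freqs)))


def assign_ancestry(genotype, panel):
    """Return the closest reference population and its distance.

    This always returns *some* population: if the individual's true
    population is absent from the panel, the nearest remaining one is chosen."""
    best = min(panel, key=lambda name: distance(genotype, panel[name]))
    return best, distance(genotype, panel[best])


person = [1, 1, 2, 1]  # copies of the alternative allele at each site
print(assign_ancestry(person, REFERENCE_PANEL))  # Population C, a close match

# Drop Population C to mimic a population missing from the database.
panel_without_c = {k: v for k, v in REFERENCE_PANEL.items() if k != "Population C"}
print(assign_ancestry(person, panel_without_c))  # Population B, a much poorer fit
```

The second call still returns a population rather than "unknown". The only sign of trouble is the larger distance, and that is easy to overlook when results are reported as a simple ancestry label.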

This problem becomes acute in ancestry testing, a form of genetic testing used to learn more about biological heritage. Because African reference data remains incomplete, people with African ancestry may get vague or misleading results about their origins.

Without more African genomic data, the assignment of specific ancestry may be incorrect. Disease risk predictions may also be misleading. For example, it has been shown that standard doses of medications like warfarin (a blood thinner) or efavirenz (an HIV medication) could be ineffective or toxic for people with specific variants that are more common in African populations.

Prior knowledge of the distribution of such variants in a population could be key to deciding the suitability of a drug for patients from that population.

Filling some of the gaps

The AGenDA project was designed to begin addressing some of the gaps in genome data and African representation. This project involved large multi-country scientific collaborations across the continent. It also required coordinating research across multiple ethics committees, regulatory frameworks and institutions. Scientists collaborated with research partners in Angola, the Democratic Republic of Congo, Kenya, Libya, Mauritius, Rwanda, Tunisia and Zimbabwe.

The aim was not simply to increase the number of African genomes in global databases. Instead, the team carefully selected populations to address major geographic and ethnolinguistic gaps in genomic data.

But generating large genomic databases requires careful community engagement and consent from participants to share their data. Biological samples for DNA extraction must be collected and the sequencing performed one base at a time.

We therefore built community engagement and culturally appropriate consent processes into the project from the beginning.

More than 1,000 whole genomes were sequenced from communities rarely examined in previous genetic studies. These included:

hunter-gatherer populations
Nilo-Saharan-speaking communities
Afro-Asiatic speakers
understudied Bantu-speaking populations
communities from north Africa and the Indian Ocean islands

Selecting samples required careful consideration of what African diversity actually represents.

Genetic diversity does not map neatly onto modern national borders. Instead, researchers considered a range of additional factors. These included:

poorly represented geographic regions in genomic databases
major ancestral population histories
languages spoken and self-identified ethnic groups
recent patterns of migration

In some cases, neighbouring communities may appear close due to geographic proximity but have distinct genetic histories that reflect population separations thousands of years ago.

Studying African genomes benefits science everywhere

African genomes contain more genetic variation than populations on any other continent. This diversity provides a powerful resource for scientific discovery. When researchers study more diverse populations, they are better placed to do three things.

First, they can identify new genetic variants.

Second, they can investigate evolutionary forces, like natural selection, that have shaped the genomes of people in different parts of the world.

And third, they can pinpoint variants that influence health and disease.

More inclusive genomic datasets are also essential as genomics becomes integrated with artificial intelligence systems that analyse medical data and predict health outcomes. Future medical technologies could be biased to work best for whoever is represented in the data.

Ultimately, expanding African genomic representation will help ensure that the benefits of genomic medicine are shared more equitably. At the same time, it will improve the accuracy and depth of understanding in global genetic science.

Michele Ramsay is the director of the Sydney Brenner Institute for Molecular Bioscience and professor in the Division of Human Genetics at the University of the Witwatersrand.

Ananyo Choudhury is a reader at the Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand.

This article was originally published by The Conversation on 12 May 2026 under a Creative Commons licence.
