Did you know that it’s now possible to sequence all of your DNA for about the cost of a smartphone? This will reveal your unique genetic makeup, and can be used to work out the similarities and differences between yourself and other people around the world at a genetic level.
But how can you make sense of this information, and what does your genetic variation tell you?
In our research group at Oxford University’s Big Data Institute, we think the key to understanding this is held in our ancestry, and in particular in the genetic genealogy that relates us all. This describes how you and everyone else have inherited different parts of your genome from different ancestors. If we could learn this genealogy and decipher where and when those ancestors lived, we could uncover all of the history written in our genes: how our ancestors moved around the world and the evolutionary processes that created us all.
This sounds like a Herculean task. Without the genomes of everyone who ever lived, what could we possibly know about people who lived thousands or hundreds of thousands of years ago?
We’ve approached this task by devising a series of elegant computer algorithms that take the genetic similarities and differences in a dataset of many individuals and accurately reconstruct the relationships among them.
Unifying modern and ancient genomes
Building on this approach, in our new research we describe the story of recent evolution among 215 diverse human populations from varying times and geographic locations.
The genealogy, which traces lines of descent from our common ancestors, includes the genomes of 3,601 people from three separate datasets, as well as eight high-quality ancient genomes. These came from three Neanderthals (an extinct human subspecies who lived in Eurasia until around 40,000 years ago), a Denisovan (another human subspecies more recently discovered from a shard of bone found in a Siberian cave), and a family of four humans from the Afanasievo culture who lived 4,500 years ago in south Siberia.
The unified genealogy, or “family tree”, explains the genetic relationships of these thousands of genomes to one another.
Strictly, though, it is not a single tree, but a series of linked trees along the genome. We call this a “tree sequence”, and the tree sequence we created contains a lot of trees – 13 million in fact. There are also 27 million common ancestors, and for each of these we estimated times and geographical locations.
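To make the idea of linked trees concrete, here is a minimal sketch in Python. It is a toy illustration rather than our actual software: the `LocalTree` class, the node numbers and the genome coordinates are all invented for this example. Each tree covers one stretch of the genome, and the same two people can have a different most recent common ancestor in each stretch.

```python
from dataclasses import dataclass


@dataclass
class LocalTree:
    """One tree in a toy 'tree sequence' (hypothetical structure)."""
    left: int     # start of the genome interval (inclusive)
    right: int    # end of the genome interval (exclusive)
    parent: dict  # maps each child node to its parent node


def path_to_root(tree, node):
    """Return the list of nodes from `node` up to the root of this tree."""
    path = [node]
    while path[-1] in tree.parent:
        path.append(tree.parent[path[-1]])
    return path


def mrca(tree, a, b):
    """Most recent common ancestor of nodes a and b in one local tree."""
    seen = set(path_to_root(tree, a))
    for node in path_to_root(tree, b):
        if node in seen:
            return node
    return None  # a and b share no ancestor in this tree


def mrca_along_genome(trees, a, b):
    """List (left, right, ancestor) for every interval of the genome."""
    return [(t.left, t.right, mrca(t, a, b)) for t in trees]


# Invented data: samples are nodes 0, 1, 2; ancestors are nodes 3, 4, 5.
trees = [
    # Positions 0-599: samples 0 and 1 are each other's closest relatives.
    LocalTree(0, 600, {0: 3, 1: 3, 3: 4, 2: 4}),
    # Positions 600-999: samples 1 and 2 are each other's closest relatives.
    LocalTree(600, 1000, {1: 5, 2: 5, 5: 4, 0: 4}),
]

print(mrca_along_genome(trees, 0, 1))
# [(0, 600, 3), (600, 1000, 4)]
```

In this toy data, samples 0 and 1 share the recent ancestor 3 in the first stretch of the genome but only the older ancestor 4 in the second, which is why a single family tree cannot describe the whole genome.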
We intend this to provide a basic framework for understanding how we are all related to one another.
For example, we have created an interactive plot showing the estimated ages of the common ancestors of different populations. Comparing African with non-African populations reveals the effect of the so-called “Out of Africa” event, when a group of humans migrated from Africa to Eurasia. Comparisons involving non-African populations show that they have many common ancestors originating around 3,000 generations ago.
Likewise, comparing Denisovans with various populations shows they interbred with the ancestors of Papuans and Aboriginal Australians, as other studies have also found. There is a tremendous amount of information in this resource and it contains many more patterns ripe for future investigation.
The video below shows the estimated location of common ancestors, moving backwards in time. It reveals likely movements of people around the world, tracing back to humanity’s African origin hundreds of thousands of years ago.
Features such as the peopling of the Americas, although only roughly geographically accurate, are also immediately obvious. These results hint at an earlier arrival of humans in the Americas and Oceania than the current archaeological evidence suggests. The genealogy is an ideal framework for future work to investigate these sorts of signals.
One of the many benefits of our approach is that it makes very few assumptions. We don’t assume that there was one, or only a few, migrations out of Africa, for example. And we don’t require any migration to have happened in a certain way at a certain time. Through the genealogy we aim to let the data speak for itself.
The ancestry of everyone
So what does all of this genetic variation tell us? The similarities and differences between your genome and that of everyone else who ever lived are essentially human history written in our genes. The genetic genealogy is a way to read that history and understand where we came from. It is also the context for any analysis of human genomes, such as tracing the origin and spread of disease-causing genetic mutations.
In the near future, given your genetic information, you should be able to find out in a matter of minutes where you fit into our “unified genealogy”. But you wouldn’t fit in just one place. Different parts of your DNA will have come from very different ancestors, who would have lived in very different locations around the globe. And as we incorporate more genomes into our genealogy we can make it even more comprehensive – we have the tools to create genealogies of not thousands but millions of genomes.
It’s not just useful for humans. Lots of biological research requires knowing how populations of individuals change through time and space. That can be done by creating these genetic genealogies, whether it be for domesticated animals, endangered species or vectors of human disease such as mosquitoes. Genealogies underlie the genetics of every species of life. We now have the tools to pierce the veil and glimpse the secrets within.