Genetic research has exploded in the two decades since the Human Genome Project, which produced the first draft of an entire human genome, was completed.
We can now predict an embryo’s genetic disease risk, and we’ve sequenced ancient human DNA found in cave dirt. And while the Human Genome Project cost three billion dollars, companies today claim they can sequence an entire genome for just one hundred bucks.
If you thought Bacon’s Law was cool (the idea that any actor can be linked to Kevin Bacon through only six degrees of separation), consider this: researchers have taken a significant step toward mapping the genetic ties among everyone who ever lived. They’ve created the “first draft” of a family tree that tracks our shared history.
Building a family tree: Over the past thirty years, more than 120 genetic biobanks have sprung up around the world. They range from small university biobanks to large government-supported repositories, and they hold genetic samples that people donated for research. We’ve sequenced hundreds of thousands of individuals, including many long-deceased people.
Because DNA sequencing has recently become much easier, these collections hold vast potential for building a detailed picture of how human populations evolved. The research possibilities are huge.
But there is a major challenge: combining the data into a single, usable format is difficult. The datasets differ in how and where the samples were collected and in how the information was processed and analyzed.
A team from the University of Oxford’s Big Data Institute set out to overcome that hurdle. They designed an algorithm that can handle today’s massive datasets, which can include millions of genome sequences, and that can scale up as we amass more.
“We have created the largest human family tree ever, which describes the origin and spread of human genetic variation. While the tree is comprehensive, it cannot be truly ‘complete’ unless we had the genome of everyone alive today and all of their ancestors, as well as knowledge of where and when they lived. We thus think of what we have created as a ‘first draft’ of the family tree of all of humanity,” Anthony Wilder Wohns, who was at the Big Data Institute and is now at the Broad Institute of MIT and Harvard, told Technology Networks.
The team built an algorithm based on the idea that all humans who ever lived can be described by a single genealogy: an enormously complex tree sequence that links us all.
Then they mapped the genetic relationships of all humans, calling the result the “first draft” of humanity’s family tree. They say it is only a first draft because it is limited by the mere hundreds of thousands of genomes currently available. Still, the single tree can trace the ancestry of everyone.
“We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today,” Yan Wong, an evolutionary geneticist at the Big Data Institute, said in a statement.
“This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.”
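What they did: DNA varies from person to person at many positions in the genome. At each of these variable sites, the sequence comes in different versions, called alleles, and each of us inherits one copy from each parent. Every allele first arose as a mutation in some ancestor, so each one is a reference point for a moment of genetic divergence in history.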
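If each derived (newer) allele marks one split in the family tree, everyone carrying it should sit on a single branch. The classic four-gamete test checks whether two variable sites can share one tree: if all four combinations of alleles appear, then, assuming each site mutated only once, recombination must have placed them on different trees. That is why a whole sequence of trees along the genome, rather than one tree, is needed. The haplotypes below are invented toy data, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each row is one haplotype (one inherited copy of a
# genome region); each column is a variable site.
# 0 = ancestral allele, 1 = derived (newer) allele. Values are invented.
HAPLOTYPES = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
]


def carriers(haplotypes, site):
    """Return the indices of haplotypes carrying the derived allele at `site`."""
    return {i for i, row in enumerate(haplotypes) if row[site] == 1}


def four_gamete_conflict(haplotypes, site_a, site_b):
    """True if all four allele combinations (00, 01, 10, 11) appear.

    Under the infinite-sites assumption (each site mutated only once), this
    means the two sites cannot lie on the same tree without recombination.
    """
    combos = {(row[site_a], row[site_b]) for row in haplotypes}
    return len(combos) == 4


def conflicting_pairs(haplotypes):
    """List every pair of sites that fails the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [
        (a, b)
        for a, b in combinations(range(n_sites), 2)
        if four_gamete_conflict(haplotypes, a, b)
    ]


if __name__ == "__main__":
    for site in range(len(HAPLOTYPES[0])):
        print(site, sorted(carriers(HAPLOTYPES, site)))
    print(conflicting_pairs(HAPLOTYPES))  # prints [(0, 1), (1, 2)]
```

Here sites 0 and 2 fit on one tree, but site 1 conflicts with both neighbors, so a tree for the left part of this region and a tree for the right part would have to differ.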
The team mapped these alleles out in a “tree sequence,” a way of representing what geneticists call an “ancestral recombination graph.” For simplicity’s sake, we can think of it as a family tree of genetic diversity: a representation of how our genetic diversity came to be.
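To make the idea concrete, here is a toy sketch of how a tree sequence can be stored. It is loosely modeled on the edge tables of tskit, the open-source library commonly used for tree sequences: each edge records that one node is the parent of another over a stretch of genome, so neighboring trees can share most of their edges. The data, node numbers, and function names below are hypothetical and are not tskit’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One parent-child link that holds over the genome interval [left, right)."""
    left: float
    right: float
    parent: int
    child: int


# Hypothetical toy genealogy for a 100-unit stretch of genome.
# Nodes 0, 1, 2 are sampled genomes; nodes 3, 4, 5 are inferred ancestors.
# A recombination at position 60 changes who shares the closest ancestor.
EDGES = [
    Edge(0, 60, 3, 0), Edge(0, 60, 3, 1),      # left part: 0 and 1 are closest kin
    Edge(60, 100, 4, 1), Edge(60, 100, 4, 2),  # right part: 1 and 2 are closest kin
    Edge(0, 60, 5, 3), Edge(0, 60, 5, 2),
    Edge(60, 100, 5, 4), Edge(60, 100, 5, 0),
]


def tree_at(edges, position):
    """Return the {child: parent} map of the local tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def ancestors(parent_of, node):
    """List `node` followed by all of its ancestors, up to the root."""
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def mrca(edges, position, a, b):
    """Most recent common ancestor of nodes a and b at one genome position."""
    parent_of = tree_at(edges, position)
    above_a = set(ancestors(parent_of, a))
    for node in ancestors(parent_of, b):
        if node in above_a:
            return node
    return None  # a and b are not connected in this local tree
```

With this toy data, `mrca(EDGES, 10, 0, 1)` returns node 3, while `mrca(EDGES, 80, 0, 1)` returns node 5: the same two genomes have a different closest common ancestor on either side of the recombination point.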
The team combined eight databases of modern and ancient genomes, totaling 3,609 individual genomes from 215 populations around the world. The samples ranged from more than 100,000 years ago to the last millennium.
The algorithm, which the team described in the journal Science, can predict where in the family tree a common ancestor likely existed to explain a specific genetic variation. It can also estimate where those ancestors lived. The first draft contains about 27 million “ancestors,” or points of genetic divergence, and has room to grow.
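One simple way to estimate where an ancestor lived, assumed here for illustration and not necessarily the paper’s exact method, is to place each ancestor at the geographic midpoint of its children, working upward from sampled genomes whose locations are known. The sketch averages on a sphere rather than averaging raw latitude and longitude, which would distort results near the poles and the date line. The tree format and function names are hypothetical.

```python
import math


def to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def to_latlon(x, y, z):
    """Convert a 3D vector (any length) back to latitude/longitude in degrees."""
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


def estimate_locations(parent_of, sample_locations):
    """Place every ancestor at the geographic midpoint of its children.

    parent_of: {child: parent} for one local tree (hypothetical format).
    sample_locations: {sample_node: (lat, lon)}; assumes every leaf is a
    sample with a known location. The midpoint is undefined if two children
    sit at exactly opposite points on the globe.
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    locations = dict(sample_locations)

    def locate(node):
        if node not in locations:
            # Sum the children's unit vectors; the sum points at their midpoint.
            vectors = [to_xyz(*locate(c)) for c in children[node]]
            locations[node] = to_latlon(*(sum(axis) for axis in zip(*vectors)))
        return locations[node]

    for node in children:
        locate(node)
    return locations


if __name__ == "__main__":
    tree = {0: 3, 1: 3, 3: 5, 2: 5}
    samples = {0: (0.0, 0.0), 1: (0.0, 90.0), 2: (0.0, 180.0)}
    for node, (lat, lon) in sorted(estimate_locations(tree, samples).items()):
        print(node, round(lat, 1), round(lon, 1))
```

In this toy example, two samples on the equator at longitudes 0° and 90° place their shared ancestor at longitude 45°, and that ancestor plus a sample at 180° place the root at 112.5°.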
To verify their work, the team created a computer simulation that linked these tree sequences back through time to where certain genetic variations first appeared. Their simulation recreated key events in human evolution, such as the migration out of Africa, which other evidence had already established.
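A common way to check a genealogy-inference method is to simulate genomes whose true family tree is known and then compare the method’s output against it. Such simulations usually build on the coalescent, a standard model that traces sampled lineages backward in time until they merge in common ancestors. The bare-bones sketch below is an assumption for illustration; it is not meant to reproduce the team’s simulations, which would also need recombination, mutation, and realistic population history. The function name is hypothetical.

```python
import random


def simulate_coalescent(n, seed=None):
    """Simulate one genealogy for n sampled genomes under the standard coalescent.

    Time runs backward from the present (time 0), in coalescent units.
    With k lineages left, the wait until the next merger is exponential
    with rate k * (k - 1) / 2, and the merging pair is chosen uniformly.
    Returns (parent_of, node_time): a {child: parent} map and each node's time.
    """
    rng = random.Random(seed)
    lineages = list(range(n))              # sampled genomes are nodes 0..n-1
    node_time = {i: 0.0 for i in range(n)}
    parent_of = {}
    next_node = n                          # ancestors get fresh node ids
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)     # pick two lineages to merge
        for child in (a, b):
            parent_of[child] = next_node
            lineages.remove(child)
        lineages.append(next_node)         # the ancestor continues backward
        node_time[next_node] = t
        next_node += 1
    return parent_of, node_time


if __name__ == "__main__":
    parent_of, node_time = simulate_coalescent(5, seed=42)
    root = max(node_time, key=node_time.get)
    print("common ancestor of all samples:", root, "at time", round(node_time[root], 3))
```

Because the true parent and time of every node are recorded, an inferred genealogy can be scored directly against `parent_of` and `node_time`.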
What this means: In the post-genomic era, technological breakthroughs have made DNA sequencing faster and cheaper. This new family tree of genetic diversity could advance medical research, including identifying genetic diseases and their risk factors.
“As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today,” Anthony Wilder Wohns, the lead author on the study, said.
But amassing all this genetic data, and potentially finding the genetic links between every person who ever existed, does raise concerns about genetic privacy.
Wohns told Technology Networks that the team used only publicly available genomes, so there were no privacy issues. However, even direct-to-consumer genetic tests have revealed information that has harmed individuals, from finding out they had a long-lost relative to learning of a genetic disease risk. Sometimes the people most affected weren’t even the ones who chose to take the genetic test in the first place. If scientists truly flesh out a family tree of all humanity, what else could they find out, and who will have access to an individual’s private information?
We’d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at [email protected].