
Category Archives: Human Genetics

Analysis of Multi-Ancestry Cohort Uncovers Dozens of Genes Linked to Blood Lipid Levels – GenomeWeb

Posted: December 24, 2021 at 1:46 am

NEW YORK – Researchers have identified in a multi-ancestry cohort almost three dozen genes associated with blood lipid levels that are risk factors for atherosclerotic cardiovascular diseases.

While previous genome-wide association studies have linked more than 400 genetic loci to blood lipid levels, these loci explain between 9 percent and 12 percent of the phenotypic variance found among lipid traits.

In a new study, an international team of researchers has conducted gene-based association testing of blood lipid levels with rare and likely damaging gene variants using a dataset of more than 170,000 individuals of multiple ancestries. As they reported in the American Journal of Human Genetics on Monday, the researchers identified 35 genes linked to circulating lipid levels, among them genes not previously associated with lipid levels and genes found among individuals of differing ancestries.

"I would expect that genes that are associated across multiple ancestries to be more robust findings compared to ones we only see in one ancestry," senior author Gina Peloso from the Boston University School of Public Health said in an email. "We might not see the same variants in a gene associated in multiple ancestries, but finding genetic variants associated in different ancestries helps us cross validate the associations."

These genes were further enriched for the targets of cholesterol-lowering drugs and indicated that, contrary to other studies, the gene located closest to the GWAS index SNP may often be the functional gene.

For their analysis, the researchers combined data from four sources that amassed either exome or genome sequencing data alongside blood lipid level information and, in all, their dataset included more than 170,000 individuals including 97,493 Europeans, 30,025 South Asians, 16,507 Africans, 16,440 Hispanic individuals or Latinos, 10,420 East Asians, and 1,182 Samoans.

At the same time, the researchers focused on six lipid phenotypes for their analysis: total cholesterol, LDL-C, HDL-C, non-HDL-C, triglycerides, and TG:HDL.

In a single-variant association analysis, the researchers uncovered hundreds of rare coding variants associated with those different lipid traits. But by then conducting a gene-based analysis of transcript-altering variants, they homed in on 35 genes that reached exome-wide significance. Most of these genes, the researchers noted, were associated with more than one lipid trait. Ten of them had not previously been associated with blood lipid phenotypes.

Most of these genes, 27 of the 35, are located within 200 kilobases of GWAS-indexed SNPs for blood lipid traits, the researchers found. They further investigated whether these genes were linked to the corresponding lipid measurement, finding that they were, suggesting that the closest gene to a noncoding GWAS signal is most likely the causal one and should be prioritized for follow-up. They noted, though, that some previous studies have instead found the closest genes to a GWAS signal do not show an association with the phenotype under study.

"This could be due to the type of variation we tested, rare protein-altering variation, compared to looking at variation that might influence gene regulatory mechanisms," Peloso noted.

The genes the researchers identified through their gene-based analysis were broadly consistent across ancestry groups. For instance, three of the 17 genes associated with HDL-C showed that association in at least two ancestry groups at exome-wide significance, while five of the 14 genes linked to total cholesterol did, and four of the 10 genes linked to non-HDL-C did.

They further reported that these genes were enriched for LDL-C drug targets. "While the genes that we identified might represent drug targets, further work will be necessary to determine whether those genes are druggable and influence clinical events," Peloso added.


Honing in on Shared Network of Cancer Genes – URMC

Posted: December 24, 2021 at 1:46 am

Wilmot Cancer Institute researchers are a step closer to understanding the complex gene interactions that cause a cell to become malignant. In a new Cell Reports study published today, the group used network modeling to hone in on a set of such interactions that are critical to malignancy, and likely to be fertile ground for broad cancer therapies.

Discrete genetic mutations that can be targeted by drugs have only been identified for a small fraction of cancer types. But those mutations rely on a downstream network of non-mutated genes in order to cause cancer. Those downstream genes and their intricate interactions may be common across many cancers and could offer a giant leap forward in cancer therapy.

One of the lead authors of the study, Hartmut "Hucky" Land, Ph.D., the deputy director of the Wilmot Cancer Institute and the Robert and Dorothy Markin Professor of Biomedical Genetics at the University of Rochester Medical Center, has worked to identify common core features of cancers for over 10 years. His goal is to find cancers' shared vulnerabilities and exploit them.

"Targeting non-mutated proteins that are essential to making cells cancerous is a broader approach that could be used in multiple cancers," said Land, "but it's hard to find these non-mutated, essential genes."

That is why Land turned to Matthew McCall, Ph.D., MHS, a Wilmot Cancer Institute investigator who is an associate professor of Biostatistics and Computational Biology at URMC, for collaboration. McCall, who is the other lead author of the study, developed a new network modeling method, called TopNet, that the group paired with genetic experiments in cells and mice to pinpoint functionally relevant gene networks.

Land's group previously identified a very diverse set of non-mutated genes that are crucial to cancer. In this study, the group wanted to see how those genes interact, starting with a subset of 20 genes. Increasing or decreasing the expression of one gene in cultured cells would have numerous effects on the expression levels of the other genes in the set.

"There were so many interactions, you could waste a lot of time, energy and money testing interactions that might not be useful," McCall said. "To hone in on the interactions that are more likely to be useful, we used network modeling, and compared our model networks back to the lab findings."

For context, the number of possible gene network models considered by TopNet was many times greater than the estimated number of atoms in the universe. After weeding out models that didn't closely fit the observed data and further focusing in on gene interactions that appeared in at least 80 percent of the models, the team was left with a manageable set of 24 high-confidence gene interactions. Subsequent experiments demonstrated that these interactions often play an important role in malignancy.
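The 80 percent filter can be illustrated with a toy example. The Python sketch below is not TopNet itself, whose internals the article does not describe; it only assumes that each fitted model can be represented as a set of directed (regulator, target) pairs. The gene names A, B and C and the function name `consensus_edges` are hypothetical.

```python
from collections import Counter


def consensus_edges(models, threshold=0.8):
    """Return {edge: fraction} for edges found in at least `threshold` of the models.

    Each model is assumed to be a set of (regulator, target) tuples.
    """
    if not models:
        return {}
    n_models = len(models)
    # Count in how many models each edge appears (set() guards against duplicates).
    counts = Counter(edge for model in models for edge in set(model))
    return {
        edge: count / n_models
        for edge, count in counts.items()
        if count / n_models >= threshold
    }


# Five hypothetical models that survived the fit-to-data step.
toy_models = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "A")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C"), ("C", "A")},
]

high_confidence = consensus_edges(toy_models)
for edge, fraction in sorted(high_confidence.items()):
    print(edge, fraction)
```

In this toy set the A-to-B interaction appears in all five models and B-to-C in four of five, so both pass the cutoff, while C-to-A appears in only two and is dropped.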

"Dr. McCall's elegant and mind-boggling methodology is essentially helping us disentangle a hairball of genetic networks," said Land. "These networks are usually very messy and it's nearly impossible to extract useful information from them. But Dr. McCall has found a way to cut through this Gordian knot."

The group has already tested a sampling of the genetic interactions revealed by TopNet, and confirmed via experiments in cells and mice that the interactions are functionally linked. Next, the group intends to test the limits of TopNet, with the intent to use this method to find potential cancer therapies that are broadly effective.

This work was completed as part of a $6.3M National Cancer Institute Outstanding Investigator Award granted to Land in 2015 and a K99/R00 grant from the National Human Genome Research Institute to McCall. Helene McMurray, Ph.D., assistant professor of Biomedical Genetics and Pathology and Laboratory Medicine at URMC, was the first author of the study.


O redwood tree, o redwood tree, can tree genetics save thee? – Bulletin of the Atomic Scientists

Posted: December 24, 2021 at 1:46 am

The devastating wildfires that ripped through California this year and last consumed nearly a fifth of the world's giant sequoias, the largest trees on Earth by volume. According to official estimates, between 13 and 19 percent of the 75,000 sequoias over 4 feet in diameter were lost in just two years. While sequoias evolved with wildfire and need it to open their seed cones and to clear the forest floor so the seeds can germinate, the fires over the last two years, exacerbated by climate change-driven drought, were simply too hot.

Joanna Nelson, the director of science and conservation planning for the organization Save the Redwoods League, says this tree loss year after year is not sustainable for these ancient trees, which can live to be 3,000 years old or older.

While they haven't yet been hit as hard by high-intensity wildfires, more than half of coast redwood forests are experiencing drought conditions that the US Drought Monitor labels "extreme" or "exceptional." These trees rely on fog for up to 40 percent of their water each year, but summertime fog hours have declined by a third over a century.

The situation is even more dire for giant sequoias; more than 93 percent of giant sequoia in the Sierra Nevada mountain range are in exceptional drought conditions.

Giant sequoia and coast redwood trees, which together share the designation of California's state tree, have endured for millennia, and conservationists like Nelson hope they will stick around for millennia to come, even with man-made global warming and climate change.

To that end, the Save the Redwoods League funded a multiyear project to sequence the genomes of both the giant sequoia (Sequoiadendron giganteum) and the coast redwood (Sequoia sempervirens). Researchers at the University of California, Davis, Johns Hopkins University, the University of Connecticut, and Northern Arizona University published the coast redwood genome this month, after completing and publishing the giant sequoia genome last year.

"The first step in understanding how everything works, whether it's your refrigerator, or your car, or a genome, is having a parts list, is it not?" said David Neale, plant sciences professor emeritus at UC Davis and lead author on the new coast redwood genome research. "There was no parts list for these trees. So that's what we've accomplished, in the very earliest phases of this research, is to sequence the genomes and get a list of all the parts. Now the work begins, the good stuff begins, is learning how those parts work together to make a redwood tree a redwood tree, and a giant sequoia tree a giant sequoia tree, and why there are differences among individuals within species."

By comparing the coast redwood genome to other species of conifers, the researchers found a number of stress response genes, which could contribute to the trees' longevity, including those involved in fungal disease resistance, detoxification, and physical injury/structural remodeling.

Conservationists can use this information in several ways. Inbreeding among trees is not unlike inbreeding within human populations, and it can have serious consequences, but if conservationists don't know what trees are present in a grove, they can't prune or plant with diversity in mind. This work will help give them the tools to do that.

It will also help to identify the genetic traits that might do best in warmer, drier climates, because that's the direction California is headed.

"What are the genes or combination of genes that can help trees respond to drought, respond to pests and pathogens, respond to rising temperatures?" said Nelson, listing out some of the questions she hopes this research can answer. "There's both a focus on genetic diversity in retaining the widest suite of possible responses, and identifying what is it that helps trees respond to climate threats."

Neale said part of his research on redwoods has involved collecting 100 samples of individual trees of each species and growing clones of those trees from cuttings in a common garden to study how different trees grow under the same conditions. Researchers can then compare the growing conditions in the greenhouses to the average temperatures and rainfall where the original trees were located and see how a tree found in a generally cool, wet spot grows in comparison to a tree from a higher, drier location. Once the environment has been controlled, any differences in plant growth and expression can be attributed to genetic differences.

"We are looking at what's the most locally adapted, and then what would be the most locally adapted," said Nelson.

Nelson said that as they are selecting seeds to raise in a nursery and plant in the wild later, they are looking for species that are both adapted to the local environment now, and those that are likely to be adapted for the environment 1,000 years in the future, 2,000 years in the future. That is, after all, how long these trees could live. For example, plants in the northern hemisphere are generally migrating north and up in elevation in response to climate change. So researchers are looking to plant trees that would previously have been found 500 feet lower in elevation, where temperatures would have been warmer.

What researchers learn about the coast redwood and giant sequoia may also inform other genetic research. "Coast redwood [and giant sequoia are] not a classic model like Arabidopsis for genetic research, however, discoveries made in redwood might well inform the basic genetic system that underlies all plant adaptation to the environment," said Neale.

While it might seem like this work is all about charismatic megaflora, saving redwood trees has significant implications for climate change mitigation. In 2016, researchers found that coast redwood forests sequestered more than twice the amount of carbon as other forests. And both species of trees play important roles in their respective ecosystems, particularly coast redwoods, which help prevent erosion and protect the land-ocean boundary, which has trickle-down benefits for salmon and other species.

"A lot of the newspaper articles right now about giant sequoia are about existential threat," said Nelson. "We're talking about extinction threat; that's what we're looking at and need to respond to now."


Aging in Mice Linked to Misexpression of Class of Genes – The Scientist

Posted: December 24, 2021 at 1:46 am

Aging is inevitable, and goes along with many changes in cells, tissues, and organs, including DNA damage, mitochondrial dysfunction, and telomere loss. But why we age in the first place and what drives these changes is still unknown. A study published December 15 in Science Advances suggests a possible answer, linking the increased activity of genes lacking long stretches of C and G bases with degeneration and aging.

As cells age, the architecture of chromatin, which packages DNA, unravels. Samuel Beck, a computational biologist at MDI Biological Laboratory, says he and his colleagues set out to explore whether these structural changes contribute to the degenerative changes also associated with aging. Specifically, the researchers focused on stretches of C and G bases called CpG islands (CGI). CGI are present in the promoters of around 60 percent of mammalian genes, termed CGI+ genes, but absent in the remaining 40 percent, called CGI- genes.

These CGI- genes typically lie silent in a densely-packed form of chromatin known as heterochromatin. Heterochromatin attaches to the nuclear lamina, which lines the inner nuclear membrane. As cells age, the nuclear lamina weakens and frees the heterochromatin, which loosens, allowing previously silenced genes to be expressed.

Previously, Beck and his colleagues showed that heterochromatin formation only regulates the expression of CGI- genes, while CGI+ genes are silenced through another mechanism called repressive Polycomb bodies. In the new work, "our hypothesis was that aging, and its associated chromatin architecture disorganization, results in dysregulation of genes lacking CpG islands," says Beck. Looking at gene expression in the kidneys and hearts of a mouse population generated by breeding eight inbred strains with one another, which mimics the complexity of genetics in the human population, the team found that CGI- genes tend to be upregulated in aged tissues. In some mice, CGI- genes in the kidneys were upregulated, while other mice of the same chronological age didn't show this misexpression. When the researchers took a closer look, they found that mice with upregulated CGI- genes had a higher incidence of renal dysfunction. Misexpression of CGI- genes, meaning that they're expressed when they shouldn't be, "is associated with the physiological deterioration of aging," Beck says.

Looking further into the link between chromatin architecture and CGI- gene misexpression, the researchers turned to the Lamin B receptor, which tethers heterochromatin to the nuclear envelope. In mice with a nonfunctional Lamin B receptor, they observed looser heterochromatin and CGI- gene misexpression; "in other words, nuclear architecture disruption and heterochromatin decondensation lead to CGI- upregulation," says Beck. The team is investigating whether it also works the other way around, with upregulation of CGI- genes causing or facilitating chromatin decondensation. If so, CGI- genes could be targeted in an attempt to reverse aging, says Beck, who, in further work, has a patent pending for an inhibitor of CGI- gene misexpression.

"This is a strong descriptive study showing an association between heterochromatin disruption and the activation of genes devoid in CGI promoters (CGI-)," writes University of Edinburgh geneticist Tamir Chandra, who was not involved in the study, in an email to The Scientist.


In further experiments, Beck and his team analyzed why some CGI- genes are misexpressed during aging while others are not. They homed in on the genomic landscape, which, within a cell, can be subdivided into euchromatic and heterochromatic domains. Euchromatic domains tend to harbor more CGI+ genes and heterochromatic domains more CGI- genes. "When inactive CGI- genes are within broad heterochromatic domains, they are densely and broadly condensed. When inactive CGI- genes are within broad euchromatic domains, they are somewhat less densely and locally condensed," writes Beck in an email to The Scientist.

Unexpectedly, in mouse cells, CGI- genes located within heterochromatic domains were rarely misexpressed during aging, while CGI- genes forming local heterochromatin within largely euchromatic domains were overexpressed, Beck adds. "We initially thought CGI- genes within both domains would be activated, however, it was not the case. CGI- genes within euchromatic domains, which are generally inactivated by local (and weak) heterochromatin formation, are frequently activated upon heterochromatin decondensation during aging. However, CGI- genes within heterochromatic domains that are densely condensed are rarely activated."

In their previous study, the authors found that CGI- genes are directly regulated by local binding of transcription factors, while CGI+ genes are not. Accordingly, when they looked for sites where transcription factors might bind in euchromatic and heterochromatic domains, they found that CGI- genes within euchromatic domains have more transcription factor binding sites compared to CGI- genes within heterochromatic domains. Additionally, CGI- genes within euchromatic domains that are upregulated during aging contain more binding sites than genes that are not upregulated. Beck interprets these observations as showing that heterochromatin decondensation during aging allows easy access of transcription factors to DNA. So CGI- genes that are more susceptible for transcription factor binding (i.e., with many motifs) are more frequently activated when heterochromatin disappears. However, this is still speculation, as the authors didn't test this explanation further. "Further investigation as to why it is not CGI- genes located in [heterochromatin] that are affected by the disruption would be interesting," writes Chandra.

The researchers also investigated whether CGI- genes are connected to what's known as cellular identity. Cells making up the heart, muscles, kidneys, or other organs usually express different genes to carry out their functions. As cells age, they also lose this cellular identity, and the researchers wondered if misexpression of CGI- genes could help explain why. Analyzing aged mouse kidneys, Beck and his team saw that genes typically expressed in the spleen, intestine, eye, and liver start to be expressed in aged kidneys, and the majority of these genes were CGI- genes. "That is one way how aged cells lose their identity," suggests Beck.

Aged cells also secrete signals in an uncontrolled way, but what triggers this secretion is not yet known. Analysis of the products of CGI- genes in mouse kidneys and hearts indicated that many encode secreted proteins, including cytokines, chemokines, growth factors, and proteases. Proinflammatory secretory CGI- genes were misexpressed in cells in which the nuclear and chromatin architecture was disrupted. According to the authors, this indicates that the secretory phenotype of aged cells is linked to disruption of the nuclear architecture and resulting upregulation of CGI- genes.

"This study pinpoints and defines the specific set of genes that are aberrantly activated during aging and the consequences," geneticist Weiwei Dang from Baylor College of Medicine, who was not involved in the study, writes in an email to The Scientist. However, he sees several limitations to the study, including that most of the data presented (with some exceptions) are association data between aging and transcription, without further digging into the underlying causes of these changes during aging, and that key regulators that distinguish between CGI+ and CGI- genes remain to be identified or investigated.

Dang also notes the lack of a potential aging intervention strategy based on these findings. However, Beck suggests that if overexpression of CGI- genes does turn out to drive chromatin decondensation, then inhibiting CGI- gene expression could become such a strategy.


European wine grapes have their genetic roots in western Asia – New Scientist

Posted: December 24, 2021 at 1:46 am

We used to think that European wine grapes were cultivated locally, independently of grape domestication in western Asia, but grape genetics suggests otherwise.

By Carissa Wong


Grapes used to make common European wines may have originated from grapevines that were first domesticated in the South Caucasus region of western Asia. As these domesticated grapes dispersed westwards during the Greek and Roman times, they interbred with local European wild populations, which helped the wine grapes adapt to different European climates.

The origins of grapes (Vitis vinifera) that are used in Europe and elsewhere to produce wines such as Merlot, Chardonnay and Pinot Noir have long been debated.

It has been proposed that European wine grapes arose from the cultivation of wild European populations (V. vinifera subspecies sylvestris), independently of the original domestication of grapes in western Asia around 7000 years ago.

But a genetic analysis carried out by Gabriele Di Gaspero at the Institute of Applied Genomics in Udine, Italy, and his colleagues suggests that European wine grapes actually originated from domesticated grapes (V. vinifera subspecies sativa) that were initially grown for consumption as fresh fruit in western Asia.

The team sequenced the genomes of 204 wild and cultivated grape varieties to cover the range of genetic diversity in cultivated grapes and compared how similar their genetic sequences were to one another.

This revealed that as western Asian table grapes spread westwards across the Mediterranean and further inland into Europe, they interbred with wild European grape populations that grew nearby.

"The wild plants grew close to vineyards and interbred; this was unintentional. But the results of the breeding created adaptive traits that were likely selected by humans intentionally," says Di Gaspero. "By bringing together this genetic evidence and existing historical evidence, the introductions in southern Europe and inland likely occurred in Greek and Roman times, although we don't know more specific dates."

By modelling how the ancestry of the grapes in different regions of Europe related to aspects of the local climate such as temperature and precipitation, the team discovered that European wild grapes probably contributed traits that enabled the ancestral grape vines to adapt to different regions as they moved westwards from Asia.

The team also found evidence of the effect that domestication had on grape genetics.

In wild grape varieties, a larger seed makes a larger berry because grape seeds produce a growth hormone called ethylene. But for human consumption, a larger berry-to-seed ratio is desirable. The team found that an enzyme not found in the berries of wild varieties was present in the berries of domesticated varieties. In other plants, the enzyme is known to help berries grow in response to ethylene, which suggests it does the same in grapes.

"Understanding which genes encode favourable traits in grapes can allow us to grow better grape crops," says Di Gaspero.

Journal reference: Nature Communications, DOI: 10.1038/s41467-021-27487-y


The future of omicron variant: Scientists predict what's next – Deseret News

Posted: December 24, 2021 at 1:46 am

Multiple scientists and experts are weighing in on what Americans should expect from the omicron variant of the coronavirus over the next few weeks.

Dr. Stephen Goldstein, professor at the Eccles Institute of Human Genetics at the University of Utah, told Salon that cases will rise in the next few weeks to peak levels.

Dr. Monica Gandhi, infectious disease doctor and professor of medicine at the University of California-San Francisco, told Salon that omicron is more transmissible and will cause a wave of new infections.

It's clear from these comments that the omicron variant is spreading and will continue to do so as we move through winter. It's unclear if the strain is less virulent, meaning it causes less severe symptoms on its own, or if people are more immune to the coronavirus by now, creating less severe symptoms.

Either way, the Centers for Disease Control and Prevention recently predicted a new surge of omicron cases will impact the U.S. by January 2022, according to The Washington Post.


Out of Africa: The Path of Homo sapiens – By Which Routes Did Modern Man Arrive in Europe? – SciTechDaily

Posted: December 24, 2021 at 1:46 am

By which routes did modern man arrive in Europe? A book reports on the latest findings.

What routes did Homo sapiens take on his way from Africa to Europe and Asia in the previous millennia? The climatic conditions changed, and with them the living conditions. The advance was hampered in some places by deserts, in others by dense forests. Over the past twelve years, a team of researchers within the framework of the Collaborative Research Center 806 "Our Way to Europe" unraveled the complex interplay of cultural innovations and environment that shaped migrations. After completion of the interdisciplinary joint project, the researchers now present a book with the most important findings under the leadership of the Universities of Bonn and Cologne.

The cradle of man is in Africa; this has been known for half a century. A decade ago, scholarly discussion was still dominated by the idea that a small group of Homo sapiens migrated from Africa to Europe about 70,000 years ago. Through anatomical and intellectual superiority, this group is said to have displaced archaic local populations as it advanced, leaving Homo sapiens as the only genetic branch of humanity to survive.

Varves in a drill core from Lake Van, Turkey. These are lighter and darker layers in lake sediments that are deposited over the course of a year. Credit: Thomas Litt/University of Bonn

"This notion has changed fundamentally since it became clear that Neanderthals contributed at least a small part to the genome of Homo sapiens," says paleobotanist Prof. Dr. Thomas Litt of the University of Bonn, principal editor of the book and deputy spokesman for the Collaborative Research Center. "Genetics doesn't quite tell the same story, or a different part of the story, as paleontology and archeology." The team therefore endeavored to better understand this controversial picture by analyzing information on the nature and environment, as well as the role of culture, of this prehistoric population dynamic. The researchers focused on different time periods: from the emergence of modern humans, their dispersal, the repopulation of Ice Age Europe, Neolithic settlement, and the migration of settled societies.

The new findings show that not just one migration wave but several African Homo sapiens populations followed a journey of up to 5,000 kilometers to Europe and Asia. Improved radiometric dating of Homo sapiens fossils further suggests that the area of origin of modern humans includes not only East Africa, but also South and Northwest Africa. The time scale of Homo sapiens now extends back to 300,000 years. Prof. Litt's team investigated when and where migration corridors or barriers existed from a paleoecological and paleoclimatological perspective.

Until now, science assumed that there were two possible main routes modern man could have taken to Europe: The western via the Strait of Gibraltar and the eastern via the Levant. Despite the short distance across the Strait of Gibraltar, in the past twelve years researchers were unable to find any evidence of direct cultural contact between Morocco and the Iberian Peninsula or evidence of crossing the strait during the Paleolithic. This is one of the big question marks in the history of human settlement in the western Mediterranean, Litt says of this surprising finding. Evidently, the Strait of Gibraltar had been more of a barrier at the time due to strong ocean currents.

"This leaves the Levant, the only permanent land bridge between Africa and Eurasia, as the key region as a migration route for modern humans," says Litt. His research group conducted intensive research on drill cores, for example from the Dead Sea or the Sea of Galilee, in which plant pollen is preserved. This allows changes in vegetation cover to be identified and environmental and climatic conditions to be reconstructed. Litt: "These data illustrate that the Levant could only have served as a corridor when, under more favorable conditions, for example, neither deserts nor dense forests impeded the advance."

For a total of twelve years, the interdisciplinary research team from archeology, geosciences, soil science, ethnology and geography in the Collaborative Research Center 806 "Our Way to Europe" deciphered the migrations of Homo sapiens. Around one hundred researchers were involved and many hundreds of scientific papers were published. In addition to the Universities of Cologne and Bonn, RWTH Aachen University and numerous cooperation partners from the USA, Africa, the Middle East, and Europe were also involved. The main results are now summarized in the 372-page book jointly edited by the paleobotanist Prof. Dr. Thomas Litt (Bonn), the prehistorian Prof. Dr. Jürgen Richter and the geography didactician Prof. Dr. Frank Schäbitz (both University of Cologne). "The book should be attractive and relevant to all readers interested in understanding the prehistory of our own species, its migratory routes, and motivations for migration triggered by complex interactions of its culture and environment," says Litt.

Publication: Thomas Litt, Jürgen Richter, Frank Schäbitz (eds.): The Journey of Modern Humans from Africa to Europe: Culture-Environmental Interaction and Mobility, Schweizerbart Science Publishers, 372 p., EUR 39.90.

Read the original here:
Out of Africa: The Path of Homo sapiens By Which Routes Did Modern Man Arrive in Europe? - SciTechDaily

SomaLogic's SomaScan Assay used in largest proteomic study to date, bridging the gap between genomics and disease – Yahoo Finance

Posted: December 10, 2021 at 2:32 am

BOULDER, Colo., Dec. 09, 2021 (GLOBE NEWSWIRE) -- In a new study published in Nature Genetics, scientists at deCODE genetics, a subsidiary of Amgen, used SomaLogic's (NASDAQ: SLGC) SomaScan Assay to measure blood proteins in 35,559 Icelanders and mapped them to 27 million genetic sequence variants. Using this vast amount of proteomic data, these researchers hope to demonstrate that combining protein measurements at population scale with genetic data on disease will dramatically impact understanding of human diseases and potential drug targets. With 170 million protein measurements, this new study is the largest proteomic study published to date.

Less than 10% of human disease is driven by genetics. Plasma proteomics, the study of blood proteins, can help bridge the gap between genomics and disease discovery. This paper found that linking genes to proteins, and then to diseases, can reveal patterns that separate the factors that cause a disease from the factors that are a consequence of it. This process may give a roadmap of how diseases develop and offer potential drug targets.

In this study, the plasma levels of 4,719 blood proteins were tested for genetic associations with 373 diseases and traits, producing 257,490 of these associations. SomaLogic's SomaScan Assay was used to find genetic variant-protein target associations, called protein quantitative trait loci or pQTLs. In the study, 94% of the proteins measured using the SomaScan Assay showed an associated pQTL, resulting in more than 18,000 pQTLs. Ninety-three percent of these pQTLs are considered novel. The study also identified 938 genes encoding potential protein drug targets for various diseases.

"Our SomaScan Assay offers the ability to measure and identify the largest percentage of the human proteome at commercial scale on the market today and it proved to be exquisitely specific in this study," said SomaLogic Chief Executive Officer Roy Smythe, M.D. "We hope that this study, and more like it, will help to provide the vital information that can be added to genetic data to create a more comprehensive understanding of human biology, and increasingly power more effective treatments for human disease."

About SomaLogic

SomaLogic (Nasdaq: SLGC) seeks to deliver precise, meaningful, and actionable health-management information that empowers individuals worldwide to continuously optimize their personal health and wellness throughout their lives. This essential information, to be provided through a global network of partners and users, is derived from SomaLogic's personalized measurement of important changes in an individual's proteins over time. For more information, visit http://www.somalogic.com and follow @somalogic on Twitter.

Forward Looking Statements Disclaimer

This press release contains certain forward-looking statements within the meaning of the federal securities laws with respect to the proposed business combination between SomaLogic and CM Life Sciences II and otherwise, including statements regarding the anticipated benefits of the business combination, the anticipated timing of the business combination, expansion plans, projected future results and market opportunities of SomaLogic. These forward-looking statements generally are identified by the words "believe," "project," "expect," "anticipate," "estimate," "intend," "strategy," "future," "opportunity," "plan," "may," "should," "will," "would," "will be," "will continue," "will likely result," and similar expressions. Forward-looking statements are predictions, projections and other statements about future events that are based on current expectations and assumptions and, as a result, are subject to risks and uncertainties. Forward-looking statements do not guarantee future performance and involve known and unknown risks, uncertainties and other factors. Many factors could cause actual future events to differ materially from the forward-looking statements in this press release, including factors which are beyond SomaLogic's or CM Life Sciences II's control. You should carefully consider the risks and uncertainties described in the "Risk Factors" section of CM Life Sciences II's registration statement on Form S-4 (File No. 333-256127) (the "Registration Statement") and the definitive proxy statement/prospectus included therein. These filings identify and address important risks and uncertainties that could cause actual events and results to differ materially from those contained in the forward-looking statements. Forward-looking statements speak only as of the date they are made. Readers are cautioned not to put undue reliance on forward-looking statements, and SomaLogic and CM Life Sciences II assume no obligation and do not intend to update or revise these forward-looking statements, whether as a result of new information, future events, or otherwise. Neither SomaLogic nor CM Life Sciences II gives any assurance that either SomaLogic or CM Life Sciences II or the combined company will achieve its expectations.

SomaLogic Contact: Emilia Costales, 720-798-5054, ecostales@somalogic.com

Investor Contact: Lynn Lewis or Marissa Bych, Gilmartin Group LLC, investors@somalogic.com

Read the original:
SomaLogic's SomaScan Assay used in largest proteomic study to date, bridging the gap between genomics and disease - Yahoo Finance

Geneticists Have Reduced Use of the Term ‘Race’ in Papers – Medscape

Posted: December 10, 2021 at 2:32 am

A decline in the use of the word "race" in papers on human genetics reflects a growing understanding of race as a social construct. But other trends may point to ongoing uncertainty about how to discuss different populations.

What to know:

Human geneticists have moved away from using the word "race" to describe populations, a study recently published in The American Journal of Human Genetics (AJHG) shows.

Researchers examined the text of all 11,635 articles published between 1949 and 2018 by the AJHG. While the word "race" appeared in 22% of papers in the first 10 years of the journal's publication, it was used in just 5% of papers in the last 10 years.

This decline points to the current understanding in science of race as a social construct and a desire to move away from past research that erroneously conflated genetics with racial categories, according to lead author Vence Bonham, JD, the acting deputy director of the National Human Genome Research Institute.

The study also found that use of the alternative and sometimes more ambiguous terms "ethnicity" and "ancestry" has increased over time, which may suggest that geneticists are still struggling to find terms to accurately describe populations.

The National Academies of Sciences, Engineering, and Medicine recently formed a committee to produce a consensus report on the use of the word "race" and other terms descriptive of populations in health disparities research.

This is a summary of the article "Human geneticists curb use of the term 'race' in their papers" published by Science on December 2. The full article can be found on science.org.

Excerpt from:
Geneticists Have Reduced Use of the Term 'Race' in Papers - Medscape

Toward a genome sequence for every animal: Where are we now? – pnas.org

Posted: December 10, 2021 at 2:32 am

Abstract

In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth's eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline's future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world's most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.

The first animal genome sequence was published 23 y ago (1). The 97 million base pair (bp), or 97-Mb, Caenorhabditis elegans genome assembly ushered in a new era of animal genome biology where genetic patterns and processes could be investigated at genome scales. As genome assemblies have accumulated for an increasingly diverse set of species, so too has our knowledge of how genomes vary and shape Earth's biodiversity (e.g., refs. 2 and 3). Major shifts in genome availability and quality have been driven by two key events. First, the invention of high-throughput, short-read sequencing provided an economical means to generate millions of reads for any species from which sufficient DNA could be obtained. These 100-bp short reads could be assembled into useful, albeit fragmented, genome assemblies. Later, the rise of long-read sequencing allowed for similarly economical generation of reads that are commonly orders of magnitude longer than short reads, resulting in vastly more contiguous genome assemblies (4).

We have now entered an era of genomic natural history. Building on 250 y of natural history efforts to describe and classify the morphological diversity of life on Earth, we are gaining a complementary genomic perspective of Earth's biodiversity. However, a baseline accounting of our progress toward a complete perspective of Earth's genomic natural history, where every species has a corresponding, reference-quality genome assembly available, has not been presented. This knowledge gap is particularly important given the momentum toward sequencing all animal genomes, which is being driven by a host of sequencing consortia. For instance, the Vertebrate Genomes Project seeks to generate high-quality assemblies for all vertebrates (5), the Bird10K project seeks to generate assemblies for all extant birds (6), the i5K project plans to produce 5,000 arthropod genome assemblies (7), the Earth BioGenome Project aims to sequence all eukaryote genomes (8), and the Darwin Tree of Life project plans to sequence genomes for all eukaryotes in Britain and Ireland (https://www.darwintreeoflife.org/).

In this Perspective, we curated, quantified, and summarized genomic progress for a major component of Earth's biodiversity: kingdom Animalia (Metazoa) and its roughly 1.66 million described species (9). We show that as of June 2021, 3,278 unique animals have had their nuclear genome sequenced and the assembly made publicly available in the National Center for Biotechnology Information (NCBI) GenBank database (10). This translates to 0.2% of all animal species. When viewed through the lens of major clades, massive disparities exist. For instance, relative to the number of described species in each group, roughly 32 times more assemblies are available for chordates than for arthropods (Fig. 1).
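The headline proportions can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the totals quoted here together with the chordate and arthropod counts and species shares reported later in the Results, and it derives each group's species count from the rounded percentage, so the outputs are approximations rather than the authors' exact values.

```python
# Figures quoted in the text. Per-group species counts are derived from the
# rounded percentages reported in the Results, so they are approximate.
TOTAL_DESCRIBED = 1_660_000   # roughly 1.66 million described animal species
TOTAL_ASSEMBLED = 3_278       # species with an assembly in GenBank (June 2021)

groups = {
    "Chordata":   {"assemblies": 1_770, "share_of_species": 0.039},
    "Arthropoda": {"assemblies": 1_115, "share_of_species": 0.785},
}


def coverage(assemblies: int, described: float) -> float:
    """Fraction of described species that have a genome assembly."""
    return assemblies / described


overall = coverage(TOTAL_ASSEMBLED, TOTAL_DESCRIBED)
per_group = {
    name: coverage(g["assemblies"], g["share_of_species"] * TOTAL_DESCRIBED)
    for name, g in groups.items()
}
# How much better covered chordates are than arthropods, per described species.
fold = per_group["Chordata"] / per_group["Arthropoda"]

print(f"overall coverage: {overall:.2%}")
print(f"chordate vs arthropod coverage: {fold:.0f}-fold")
```

Normalizing by species richness is what turns a modest difference in raw counts (1,770 versus 1,115 assemblies) into the much larger per-species disparity.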

Fig. 1. Variation in taxonomic richness and genome availability, quality, and assembly size across kingdom Animalia in GenBank (as of 28 June 2021). Taxonomic groups are clustered by phylogeny following ref. 11. Only groups with 30 or more available assemblies as of January 2021 are shown with the exception of Hominidae (n = 5 assemblies). In the tree, bold group names represent phyla and naming conventions follow those of the NCBI database. Of 34 recognized animal phyla, 10 do not have a representative genome sequence. (A) The total number of described species for each group following Zhang (9) and the references therein. (B) Genomic representation among animal groups for 3,278 species with available genome assemblies. Bars represent the magnitude of the observed minus the expected number of genomes given the proportion that each group comprises of described animal diversity. Significance was assessed with Fisher's exact tests and significantly under- or overrepresented groups (P < 0.05) are denoted with asterisks. Gray numbers indicate the total number of species with available genome assemblies for each group. The number of available assemblies is not mutually exclusive with taxonomy; that is, a carnivore genome assembly would be counted in three categories (order Carnivora, class Mammalia, phylum Chordata). (C) The percentage of described species within a group with an available genome sequence (bars) and the percentage of those assemblies that have corresponding annotations (red circles). For many groups (e.g., arthropods), only a fraction of a percent of all species have an available genome assembly, making their percentage appear near zero. (D) Assembly size for all animal genome assemblies, grouped by taxonomy. (E) Contig N50 by taxonomic group. The sequencing technology used for each assembly is denoted by circle fill color: short-read (blue), long-read (yellow), or not provided (gray). In D and E, each circle represents one genome assembly and a few notable or outlier taxa are indicated with gray text.

To construct a database of the best available genome assembly for all animals, we downloaded metadata from GenBank for all kingdom Animalia taxa using the summary genome function in v.10.9.0 of the NCBI Datasets command-line tool on 4 February 2021. Next, we used the TaxonKit (12) lineage function to retrieve taxonomic information for each taxid included in the genome metadata. To gather additional data for each assembly (e.g., sequencing technology), we used a custom web scraper script. Both this web scraper script and the scripts used to download and organize the metadata are available in this study's GitHub repository (https://github.com/pbfrandsen/metazoa_assemblies). We later supplemented this initial dataset with a second round of metadata acquisition on 28 June 2021. For the full dataset, we hand-refined the NCBI taxonomy classifications to subdivide our dataset into three categories: species, subspecies, or hybrids (Dataset S1). If replicate assemblies for a taxon were present, we defined the best available assembly as the one with the highest contig N50 (the midpoint of the contig distribution where 50% of the genome is assembled into contigs of a given length or longer).
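To make the contig N50 definition and the "best available assembly" rule concrete, here is a minimal sketch. It is not the authors' code (their scripts are in the repository linked above); the record layout and the accession strings are hypothetical placeholders.

```python
def contig_n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together contain at least half of the assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def best_assembly(assemblies):
    """Among replicate assemblies for one taxon, keep the one with the
    highest contig N50 (hypothetical 'contig_n50' field, in bp)."""
    return max(assemblies, key=lambda a: a["contig_n50"])


# Toy example with made-up contig lengths and placeholder accessions.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
replicates = [
    {"accession": "ASSEMBLY_A", "contig_n50": 45_000},
    {"accession": "ASSEMBLY_B", "contig_n50": 2_300_000},
]
print(contig_n50(toy_contigs))
print(best_assembly(replicates)["accession"])
```

In the toy example the contigs sum to 54, and the three longest (10, 9, 8) already cover 27 of those, so the N50 is 8.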

We filtered our data in several ways: We removed subspecies (unless they were the only representative for a species), hybrids, and assemblies that were shorter than 15.3 Mb [the smallest confirmed assembly size for a metazoan to date (13)] or had a contig N50 less than 1 kilobase (Kb). We also culled assemblies that were unusually short (i.e., 1 to 2.5 Mb) with information in their descriptions that indicated they were not true nuclear genome assemblies (e.g., exon capture). In total, we culled 407 assemblies based on the above criteria. The remaining assemblies were classified as "short-read" if only short reads (e.g., Illumina) were used, "long-read" if any long-read sequences (e.g., PacBio) were used, or "not provided" if no information was available. We defined a species as having gene annotations available if any assembly for that taxon also had annotations in GenBank. When the best available assembly did not have annotations included or when multiple assemblies had annotations, we retained the annotations for the assembly with the highest contig N50. Finally, we used the submitting institution for each assembly as a surrogate for the institution that led the genome assembly effort. Using these data, we classified assemblies to a country, region (Africa, Asia, Europe, Middle East, North America, Oceania, South America, Southeast Asia), and the Global North (e.g., Australia, Canada, Europe, United States) or Global South (e.g., Africa, Asia including China, Mexico, Middle East, South America).
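The filtering and classification rules above could be expressed roughly as follows. This is a sketch under stated assumptions, not the study's pipeline: the record fields (`rank`, `only_representative`, `length_bp`, `contig_n50_bp`) are hypothetical, the keyword list used to detect long-read platforms is an assumption, and the manual check of assembly descriptions (e.g., exon capture) is not modeled.

```python
from typing import Optional

MIN_ASSEMBLY_BP = 15_300_000   # smallest confirmed metazoan assembly (15.3 Mb)
MIN_CONTIG_N50_BP = 1_000      # 1 Kb

# Assumption: substrings that mark a long-read platform in free-text metadata.
LONG_READ_KEYWORDS = ("pacbio", "nanopore")


def passes_filters(record: dict) -> bool:
    """Apply the size, contiguity, and taxonomic-rank filters to one record."""
    if record["rank"] == "hybrid":
        return False
    # Subspecies are kept only when they are the sole representative.
    if record["rank"] == "subspecies" and not record["only_representative"]:
        return False
    if record["length_bp"] < MIN_ASSEMBLY_BP:
        return False
    return record["contig_n50_bp"] >= MIN_CONTIG_N50_BP


def classify_technology(description: Optional[str]) -> str:
    """Label an assembly 'long-read' if any long reads were used, 'short-read'
    if only other technologies are listed, else 'not provided'."""
    if not description or not description.strip():
        return "not provided"
    text = description.lower()
    if any(keyword in text for keyword in LONG_READ_KEYWORDS):
        return "long-read"
    # Assumption: any other non-empty description is treated as short-read only.
    return "short-read"


example = {"rank": "species", "only_representative": True,
           "length_bp": 32_500_000, "contig_n50_bp": 250_000}
print(passes_filters(example), classify_technology("Illumina HiSeq; PacBio Sequel"))
```

Checking for long reads first mirrors the rule that an assembly counts as long-read if any long-read data were used, even alongside short reads.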

To test if clades were under- or overrepresented in terms of genome availability relative to their species richness, we compared the observed number of species with assemblies with the expected total for the group. We obtained totals for the number of described species overall and for each group from previous studies, primarily from Zhang (9) and the references therein. We assessed significance between observed and expected representation with Fisher's exact tests (alpha = 0.05). We tested for differences in distributions of contig N50 or assembly size between short- and long-read genomes with Welch's t tests. For both display (i.e., Fig. 1) and analysis, we subdivided the dataset into the lowest taxonomic level that still contained 30 or more assemblies as of January 2021 (with the exception of hominids, which were given their own category due to their exceptionally high genomic resource quality).
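As an illustration of the two comparisons described above, the sketch below computes the expected number of assemblies for a group from its share of described species, and the Welch's t statistic with its Welch-Satterthwaite degrees of freedom. It uses only the standard library and stops short of P values. The sample values passed to the t statistic are made-up toy numbers, and how the authors built the contingency tables for Fisher's exact tests is not specified here, so that step is not reproduced.

```python
import math
import statistics


def expected_assemblies(total_assemblies: int, share_of_species: float) -> float:
    """Assemblies a group would have if sequencing tracked species richness."""
    return total_assemblies * share_of_species


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-sample variance of the mean (sample variance divided by n).
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Observed minus expected, using counts and species shares from the Results.
TOTAL = 3_278
chordate_excess = 1_770 - expected_assemblies(TOTAL, 0.039)
arthropod_deficit = 1_115 - expected_assemblies(TOTAL, 0.785)

# Toy contig N50 values (Mb) for short- and long-read assemblies; not real data.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(chordate_excess), round(arthropod_deficit), round(t_stat, 3), round(dof, 2))
```

The sign of the observed-minus-expected value corresponds to the direction of the bars described for Fig. 1B: positive for overrepresented groups, negative for underrepresented ones.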

Genome assemblies were available for 3,278 species representing 24 phyla, 64 classes, and 258 orders (Fig. 2A and Dataset S1). The dataset was exceptionally enriched for the phylum Chordata (which includes all vertebrates) with 1,770 assemblies for the group (54% of all assemblies) despite chordates comprising just 3.9% of animal species (P, Fisher's < 1e-5; Fig. 1). Conversely, arthropods were underrepresented with 1,115 assemblies (34% of the dataset) for a group that comprises 78.5% of animal species (P, Fisher's < 1e-5; Fig. 1). However, not all arthropods were underrepresented; five insect clades were overrepresented (Apidae [bees], Culicidae [mosquitoes], Drosophila [fruit flies], Formicidae [ants], and Lepidoptera [butterflies and moths]; all P, Fisher's < 1e-3; Fig. 1). Collectively, of the 59 animal taxonomic groups included in our dataset, 14 groups were underrepresented, 17 were represented as expected, and 28 were overrepresented (primarily chordates; Fig. 1). Ten phyla had no publicly available genome sequence (Fig. 1). Over the 17-y GenBank genome assembly record, animal assemblies have been deposited at a rate of 0.52 species assemblies per day. Over the most recent year, however, this rate increased eightfold to 4.07 assemblies per day. If the most recent rate were maintained, all currently described animals would have a genome assembly available by 3136. To achieve this goal by 2031 instead, an average of 165,614 novel animal genomes would need to be sequenced and assembled each year (112 times faster than the rate for the most recent year).
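The projection at the end of this paragraph follows from simple rate arithmetic. The sketch below reproduces it approximately; it assumes the rounded total of 1.66 million described species, a 365-day year, and a 2021 starting point, so the per-year requirement lands close to, but not exactly on, the reported 165,614.

```python
TOTAL_DESCRIBED = 1_660_000    # rounded species total quoted in the text
ASSEMBLED = 3_278              # species with assemblies as of June 2021
RECENT_RATE_PER_DAY = 4.07     # assemblies per day over the most recent year
START_YEAR = 2021              # assumption: projection starts in 2021
TARGET_YEAR = 2031
DAYS_PER_YEAR = 365            # assumption: leap days ignored

remaining = TOTAL_DESCRIBED - ASSEMBLED
recent_rate_per_year = RECENT_RATE_PER_DAY * DAYS_PER_YEAR

# Year in which every described species would have an assembly at the recent rate.
completion_year = START_YEAR + remaining / recent_rate_per_year

# Yearly output needed to finish by the target year, and the implied speed-up.
required_per_year = remaining / (TARGET_YEAR - START_YEAR)
speedup = required_per_year / recent_rate_per_year

print(int(completion_year), round(required_per_year), round(speedup))
```

Under these assumptions the completion year falls in 3136 and the required speed-up rounds to 112, in line with the figures quoted above.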

Fig. 2. Genome availability for kingdom Animalia versus taxonomic descriptions and over time. (A) The proportion of described taxonomic groups versus the number with sequenced genome assemblies from phyla to species. The gray plot (Right) is a zoomed-in perspective of the higher taxonomy-level categories in the full plot (Left). For genus through phylum, the number of described categories is based on the NCBI taxonomy. For species, the total number described is from Zhang (9). (B) The timeline of genome contiguity versus availability for animals according to the GenBank publication date (x axis shared with C). A rise in assembly contiguity has been precipitated by long-read sequencing. Particularly contiguous assemblies for a given time period are labeled. (C) The number of animal genome assemblies deposited in GenBank each month since February 2004. Several notable events are labeled. When specific dates are indicated, those (and the assemblies referred to) are included within that month's total. For B and C, it is important to note that when a genome assembly is updated to a newer version, its associated date is also updated. Thus, the date associated with many early animal assemblies [e.g., C. elegans (1)] has shifted to be more recent with updates.

The average animal genome assembly was 1.02 gigabases (Gb) in length (SD 1.21 Gb) with a contig N50 of 2.26 Mb (SD 25.16 Mb; Fig. 1 D and E). Two animal genome assemblies were roughly 25 Gb longer than all other assemblies: the axolotl [32.4 Gb (14)] and Australian lungfish [34.6 Gb (15)] (Fig. 1D). The smallest genome assembly in the dataset, the mite Aculops lycopersici, was over 1,000 times smaller, spanning just 32.5 Mb (16). Still smaller is the 15.3 Mb assembly of the marine parasite Intoshia variabili, which has the smallest animal genome currently known (13). However, since the I. variabili assembly was not available in GenBank as of June 2021, it was not included in our dataset.

Contiguity varied dramatically across groups. For instance, hominid assemblies (family Hominidae, n = 5) were the most contiguous with an average contig N50 of 24.2 Mb. Bird assemblies (class Aves, n = 515) were also highly contiguous (mean contig N50 = 1.4 Mb) despite being so numerous (and accumulating over a long period of time). On the other end of the spectrum, jellyfish and related species (phylum Cnidaria) exhibited some of the least contiguous genome assemblies with a mean contig N50 of 0.18 Mb (n = 65; Fig. 1E). Roughly 34% of animals with genome assemblies had corresponding annotations in GenBank but annotation rates differed substantially among groups (Fig. 1C). For example, the rate of arthropod annotations (22.3%) lags behind that for chordates (41.3%); however, much of this disparity appeared to be driven by the low and high annotation rates of butterflies and moths (order Lepidoptera) and birds (class Aves), respectively. Just 6.5% of the 445 lepidopteran assemblies in GenBank have corresponding annotations versus 72.8% of bird assemblies (n = 519 assemblies; Fig. 1C). Notably, since most gene models are based on sequence similarity to known functional genes and not functional data, the true rate of annotation is likely even lower than reported here.

Animal genome assemblies have been contributed by researchers at institutions on every continent with permanent inhabitants, including 52 countries. From a regional perspective, institutions in North America (n = 1,331), Europe (n = 972), and Asia (n = 828) collectively accounted for 95.5% of all assemblies (Fig. 3A). Nearly 70% of all animal genome assemblies have been submitted by researchers in just three countries: United States (n = 1,275), China (n = 676), and Switzerland (n = 317) (Fig. 3A). When countries were grouped by their inclusion in the Global North or South, similarly stark patterns emerged. Researchers affiliated with institutions in the Global North contributed roughly 75% of animal genome assemblies (Fig. 3B). From a taxonomic perspective, researchers at North American institutions have contributed the most insect and mammal assemblies, European researchers have contributed the most fish assemblies, and Asian researchers have contributed the most bird assemblies (Fig. 3A). The first assembly in GenBank from the Global North was deposited in 2004 and the first assembly from the Global South was deposited in 2011 (Fig. 3C). Since then, the number of assemblies deposited each year has steadily risen, with the proportions from the Global North and South staying relatively constant (Fig. 3C).

Fig. 3. Where animal genome assemblies have been produced around the world according to the submitting institutions in GenBank. (A) For each geographic region, total numbers of genome assemblies are shown by dark circles with white lettering. This total is further broken down by country and taxon. For regions where more than four countries have contributed assemblies (e.g., Europe), an "Other" category represents all other countries. The same applies to all assemblies that are not insects, birds, fish, or mammals in the taxon plots. Countries are color-coded by assignment to the Global North or South. (B) The total number of genome assemblies contributed by countries in the Global North (e.g., United States, Europe, Australia) versus the Global South (e.g., Africa, South America, China, Mexico, Middle East). (C) The rate of genome assembly deposition by major sources in the Global North (Europe, United States) and Global South (China, Southeast [SE] Asia) as well as all other countries collectively in each ("Other").

Use of long reads in genome assemblies and availability of key metadata also differ with geography. For assemblies deposited since 2018, researchers from the Global South have used long reads slightly more frequently than those from the Global North (25.7% versus 20.2%; Fig. 4A). However, researchers from the Global North were far less likely to report the types of sequence data used (no sequencing technology was reported for 19.9% of assemblies from the Global North versus 1.4% of assemblies from the Global South; Fig. 4A). Much of this difference appears to be driven by genome assemblies deposited by researchers at European institutions (Fig. 4B). This gap in metadata may reflect an issue with data mirroring between the European Nucleotide Archive (ENA) and GenBank. For instance, many new genome assemblies being generated by the United Kingdom are part of the Wellcome Sanger Institute's Darwin Tree of Life project, which is generating exceptionally high quality assemblies using long-read sequencing and depositing them into the ENA (Fig. 5). One region (Oceania) and three countries (Australia, Finland, India) reported long reads being used in more than 50% of deposited assemblies (Fig. 4 B and C).

Fig. 4. Sequencing technologies used around the world (A) between the Global North versus Global South, (B) among regions, and (C) among countries. To limit bias due to the limited availability of long-read sequencing technologies before 2017 (Fig. 2B), only assemblies deposited on or after 1 January 2018 were included in the analysis and in C only countries that deposited five or more assemblies during the focal period (January 2018 to June 2021) are shown.

Fig. 5. Examples of major contributors of genome assemblies for (A) butterflies (order Lepidoptera), (B) birds (class Aves), and (C) fish (primarily class Actinopterygii). Major contributors were defined as any consortium, organization, or project that has deposited more than 5% of all assemblies for butterflies and birds or 2.5% of all assemblies for fish.

Animal genome sequencing has dramatically progressed in the last 25 y. In that span, the field has moved from sequencing the first nuclear genome for any animal (1), a landmark achievement, to targeting the generation of genome assemblies for all of Earth's eukaryotic biodiversity (8). Here, we provided a contemporary perspective on progress toward this goal for the 1.6 million species in the animal kingdom (9). We showed that while tremendous progress has been made, major gaps and biases remain both in terms of taxonomic and geographic representation, at least within the most commonly used database of genomic resources, GenBank. For instance, a major bias exists in favor of vertebrates, which are vastly overrepresented relative to their total species diversity (Fig. 1 A–C). From the perspectives of biomedicine and human evolution, this bias is reasonable since humans are vertebrates. However, from a basic research perspective, particularly as it relates to genomic natural history and an overarching goal to sequence all animal genomes, there is a need to taxonomically diversify sequencing efforts.

At the highest taxonomic levels, 10 animal phyla still have no genomic representation. To illustrate the scale of this disparity versus other groups and the unique biology that is being overlooked, genome assemblies are available for 685 ray-finned fishes (class Actinopterygii) but none exists for phylum Nematomorpha, a roughly 2,000-species clade of parasitic worms whose presence can dramatically alter energy budgets of entire stream ecosystems (17). Another phylum without genomic representation, Loricifera, was first described in 1983 (18). This group of small, sediment-dwelling animals includes the only examples of multicellular species that spend their entire life cycles under permanently anoxic conditions (19). Loriciferans accomplish this feat by foregoing the energy-producing mitochondria found in virtually all animals in favor of hydrogenosome-like organelles akin to those found in prokaryotes inhabiting anaerobic habitats (19). Clearly, there is much to discover in terms of genomic diversity and functional biology in clades yet to be sampled.

A few select countries (primarily the United States, several European nations, and China) have led the sequencing of the vast majority of animal genome assemblies (Fig. 3A). Aside from China, all of these countries are within the Global North. This pattern of geographic bias raises two potential issues for representation in animal genome science. First, the researcher population of animal genome sequencing likely does not reflect the global population. Second, sampling biases may exist toward the regions where most of the genome sequencing is occurring. Some of this bias is intentional and reflects funding goals for a given region. For instance, the Darwin Tree of Life project seeks to sequence the genomes of all 70,000 eukaryotic species living in Britain and Ireland. Still, just as sampling biases can yield a skewed understanding of the natural world in other disciplines (e.g., ref. 20), so too could bias toward specific ecoregions, habitats, or other classifications skew genomic insight.

Inherently linked with questions of representation in animal genome science is the specter of parachute science (or helicopter research): the practice where international scientists, typically from wealthy nations, conduct studies in other, often poorer, countries without meaningful communication or collaboration with local people (21). Parachute science has a long history in ecological research, and signatures of these practices have been observed for genome sciences. For instance, Marks et al. (22) found that the majority of plant genome assemblies for species that are native to South America and Africa were sequenced off-continent by researchers at European, North American, or Asian institutions. Given the sheer number of animal genome assemblies that have been submitted by a small number of countries and institutions, a similar pattern likely exists for animal genomes. However, to properly assess this issue, authorship would need to be parsed, at a minimum, to quantify collaboration, and this approach would still overlook key aspects of representation that need to be considered (e.g., if a researcher from the Global South is working at an institution in the Global North).

For the purpose of biological discovery, not all genome assemblies are created equal. As long-read sequencing technologies have matured, so too has the quality of assemblies being generated (4). In the last year alone, the largest ever animal genome assembly was deposited [Australian lungfish (15)] as well as the most complete human genome to date, a telomere-to-telomere assembly (23). Still, many species in GenBank only have low-quality assemblies available (i.e., contig N50 < 100 Kb with no corresponding gene annotations; Fig. 1). Since fragmentation and/or poor or missing gene annotations reduce the research value of an assembly, genome quality is important, particularly when the end goal is resource development for a broader community. As of April 2021, the Earth BioGenome Project sought assembly quality of 6.C.Q40 (https://www.earthbiogenome.org/assembly-standards) for reference genomes, where 6 refers to a contig N50 of 1 × 10^6 bp (i.e., 1 Mb). In our dataset, 568 assemblies (17.3%) reach this contiguity standard. That number drops to 271 assemblies (8.3%) when a contig N50 of at least 1 Mb and deposited gene annotations are both required. For reference, the C above refers to chromosomal-scale scaffolding and Q40 to an error rate of less than 1/10,000. Neither of these metrics was assessed in this study.
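As a rough illustration of the two numeric parts of the 6.C.Q40 notation, the sketch below computes a contig N50 from a list of contig lengths, checks it against the 1 Mb threshold, and converts a Phred-style quality value to an error rate. It is not the method used in this study; the function names and the toy contig lengths are assumptions made for the example, and the common definition of N50 (the contig length at which the running sum of contigs, sorted longest first, reaches half the total assembly length) is assumed.

```python
def contig_n50(lengths):
    """Return the contig N50 in bp.

    Contigs are sorted longest first; N50 is the length of the contig at
    which the running total first reaches half of the assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig list is empty")


def meets_contiguity_standard(lengths, exponent=6):
    """True if contig N50 is at least 10**exponent bp (exponent 6 = 1 Mb)."""
    return contig_n50(lengths) >= 10 ** exponent


def phred_to_error_rate(q):
    """Convert a Phred-scaled quality value Q to an error probability."""
    return 10 ** (-q / 10)


# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [3_000_000, 1_500_000, 1_200_000, 800_000, 500_000]

n50 = contig_n50(toy_contigs)
passes = meets_contiguity_standard(toy_contigs)
q40_error = phred_to_error_rate(40)
```

For the toy contigs, the total length is 7 Mb, so the running sum passes the 3.5 Mb halfway point at the second contig and the N50 is 1.5 Mb, which clears the 1 Mb threshold. A Q value of 40 corresponds to one error per 10,000 bases.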

Independent research laboratories, institutions, and consortia have contributed genome assemblies on both ends of the quality spectrum (Fig. 5). For example, among butterflies (order Lepidoptera), a bimodal quality distribution is being primarily driven by contributions made in 2021 by two submitting institutions, the Florida Museum of Natural History (e.g., ref. 24) and the Wellcome Sanger Institute (Fig. 5A). When viewing genome assembly contributions holistically across the animal Tree of Life, it is clear that two consortia, the Vertebrate Genomes Project (5) and the Darwin Tree of Life (part of the Wellcome Sanger Institute), warrant specific recognition for contributing exceptional genomic resources relative to closely related species (Fig. 5).

While animal genome science has dramatically matured in recent years, the field is still on the cusp of massive change. Thousands of genome assemblies are now available for a wide range of taxa, a resource that can empower unprecedented scales of genomic comparison. Simultaneously, multiple consortia are building momentum toward their goals and generating some of the highest-quality genome assemblies ever produced. The field is also diversifying, with researchers around the world, particularly from the Global South, leading a rising number of efforts. These ongoing advances will yield higher-quality, more globally representative genome data for animals. As we collectively build toward this new genomic future, we offer recommendations to improve assembly quality and accessibility while also continuing to increase representation within the discipline.

The quality of a genome assembly is likely the most important factor dictating its long-term value. Genome assembly quality, however, is difficult to define. Here, we propose a holistic view on genome assembly quality that generally echoes the guidelines proposed by the Earth BioGenome Project and other consortia. Briefly, assemblies should reach minimum levels of contiguity (e.g., contig N50 > 1 Mb) and accuracy in order to be considered a reference that will likely not need to be updated for most applications. At a minimum, assemblies should also include high-quality gene annotations that perhaps take advantage of standardized pipelines [e.g., NCBI Eukaryotic Genome Annotation Pipeline (25)] to maximize compatibility across taxa. We recommend the field further improve the quality of genome assembly resources in two ways. First, refining and expanding the coordinated deposition of genome assemblies will improve the usability of the resources and reproducibility of analyses. It will also reduce duplications of effort (that is, when a group sequences a genome that has already been produced), an issue that is likely to become increasingly common.

To refine and expand coordinated resource deposition, we recommend the continued use of GenBank (10) or one of the other archives that are members of the International Nucleotide Sequence Database Collaboration (the ENA and the DNA Database of Japan) as the central repositories for genome assemblies and their metadata given their tripartite data-sharing agreement. Next, we call on genetic archive administrators, consortia, and independent researchers to collectively improve the metadata submitted with each assembly and the mirroring of data across repositories. Too many assemblies lack basic information about the sequence data and methods used (e.g., Fig. 4) and, with the difficulty of linking assemblies to published studies (if available), it can be challenging or impossible to find this information. Further, an expansion of the metadata associated with each assembly, ideally to make more of the categories required and to expand demographic data, would make efforts to quantify geographic representation, for instance, far more straightforward. Alternatively, the metadata associated with genome assembly accessions could be integrated with existing efforts like the Genomic Observatories Metadatabase [GeOMe (26)]. Furthermore, a set of minimum quality characteristics for a genome assembly may need to be defined. A number of exceptionally low-quality genome assemblies (e.g., with contig N50 values shorter than 1 Kb) that often cover only a small fraction of the expected total genome sequence length for a given group are present in GenBank. The presence of these assemblies raises the question: Where is the inflection point between resource quality and value to other researchers versus diluting the resources of a shared repository?

For our second recommendation, we amplify and expand the message of Buckner et al. (27) and Thompson et al. (28): Genome science needs specimen vouchers. Vouchers serve as a key physical link between taxonomy and molecular insight. Rarely, however, are vouchers referenced in publications of genome assemblies; only 11% of vertebrate assemblies included such a reference as of January 2020 (27). While vouchers represent a physical reference for assessing taxonomic classification or morphological variation, a properly stored voucher could also provide a long-term source of material for future resource improvement. If a physical specimen cannot be deposited, photographs and/or genomic DNA should be deposited in its place (e.g., ref. 29). Tied to the metadata discussion above, additional fields should be added to GenBank genome assembly accessions to directly link the assembly to a specimen, photo, or genomic DNA that has been deposited elsewhere.

Though geographic representation in animal genome science has improved in recent years, the discipline appears far from properly reflecting the global researcher pool. This issue is almost certainly multifaceted, likely stemming from a lack of infrastructure (e.g., fewer high-throughput sequencing platforms in developing countries), fewer resources for expensive molecular research, and a corresponding lack of training in genome data analysis. To bridge this gap and to empower a more diverse discipline, the nations and institutions that are devoting large amounts of resources to animal genome sequencing (e.g., China, United Kingdom, United States), and the researchers within those countries, should continue to develop meaningful collaborations with researchers within countries where their focal species reside (30). These meaningful collaborations, where all parties are valued for their expertise and involved in decision making, improve the science through transfer of local knowledge, provide a means for local researchers to expand their skillset and network while raising their scholarly profile, and, most importantly, can effectively end the practice of parachute science (30). Within-continent (or -country) initiatives also have transformative potential for people and genome research. For instance, the African-led effort to sequence 3 million African genomes over the next 10 y (the 3MAG project) will yield massive investment in African genomics, an incredible resource for understanding the full scope of human genetic diversity, and a new generation of African genome scientists (31). While focused on human genetics, the infrastructure and expertise that arise from the 3MAG project will no doubt translate to other taxa in the coming years.

A practical justification also exists for increasing representation in genome science, particularly as we seek to generate genome assemblies for every animal on Earth. The Global South is home to the bulk of the world's biodiversity (32) and, as such, researchers in these regions have greater access to key habitats and specimens. Thus, it behooves everyone, including researchers in the Global North, to deepen collaborations with peers in the Global South while also helping to build indigenous capacity for collection, storage, and sequencing of new specimens.

Animal genome science continues to grow and expand at an exceptional rate. The coming years will surely see thousands, and perhaps tens of thousands, of new genome assemblies from across the Tree of Life, technological and analytical improvements, and some of the largest-scale and most in-depth studies of animal genome biology conducted to date. However, if we are to realize the ambitious goals of efforts like the Earth BioGenome Project, a self-described biological moonshot, the rate and mean quality of animal genome assembly production will have to increase by roughly two orders of magnitude. Regardless of rates and timelines, however, perhaps the most important goal for the future of animal genome science is that we empower a more diverse, representative researcher community in parallel with the generation of new resources.

All study data are included in the article and/or supporting information.

S.H. and J.L.K. were supported by NSF Award OPP-1906015. We thank Guangfeng Song, Eric Cox, and Anne Ketter from the Datasets development team at the NCBI for their responsiveness and receptiveness to improving this valuable tool for data science.

Author contributions: S.H., J.L.K., and P.B.F. designed research; S.H. and P.B.F. performed research; S.H. and P.B.F. analyzed data; and S.H., J.L.K., and P.B.F. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2109019118/-/DCSupplemental.

See the article here:
Toward a genome sequence for every animal: Where are we now? - pnas.org
