Category Archives: Human Genetics

The dubious consent question at the heart of the Human Genome Project : Short Wave – NPR

Posted: July 21, 2024 at 2:35 am

Launched in 1990, the Human Genome Project had as a major goal sequencing the human genome as fully as possible. In 2003, project scientists unveiled a genome sequence that accounted for over 90% of the human genome, as complete as possible for the technology of the time. (Image credit: Darryl Leja, NHGRI/Flickr)

The Human Genome Project was a massive undertaking that took more than a decade and billions of dollars to complete. For it, scientists collected DNA samples from anonymous volunteers who were told the final project would be a mosaic of DNA. Instead, over two-thirds of the DNA comes from one person: RP11. No one ever told him. Science journalist Ashley Smart talks to host Emily Kwong about his recent investigation into the decision to make RP11 the major donor and why unearthing this history matters to genetics today.

Read Ashley's full article in Undark Magazine here.

Curious about other biology stories? Email us at shortwave@npr.org.

Listen to Short Wave on Spotify and Apple Podcasts.

Listen to every episode of Short Wave sponsor-free and support our work at NPR by signing up for Short Wave+ at plus.npr.org/shortwave.

Today's episode was produced by Berly McCoy and edited by Rebecca Ramirez. They both checked the facts. Kwesi Lee was the audio engineer.

Originally posted here:
The dubious consent question at the heart of the Human Genome Project : Short Wave - NPR

Genomic variants associated with age at diagnosis of childhood-onset type 1 diabetes – Nature.com

Posted: July 11, 2024 at 2:43 am

Read the rest here:
Genomic variants associated with age at diagnosis of childhood-onset type 1 diabetes - Nature.com

The untold story of the Human Genome Project: How one man's DNA became a pillar of genetics – STAT

Posted: July 11, 2024 at 2:43 am

STAT is co-publishing this investigation by Undark.

They numbered 20 in all, 10 men and 10 women, who came to a sprawling medical campus in downtown Buffalo, N.Y., to volunteer for what a news report had billed as "the world's biggest science project."

It was the spring of 1997, and the Human Genome Project, an ambitious attempt to read and map a human genetic code in its entirety, was building momentum. The project's scientists had refined techniques to read out the chemical sequences (the series of As, Cs, Ts, and Gs) that encode the building blocks of life. Now, the researchers just needed suitable human DNA to work with. More exactly, they needed DNA from ordinary people willing to have their genetic information published for the world to see. The volunteers who showed up at Buffalo's Roswell Park Cancer Institute had come to answer the call.

To take part in the study was to assume risks that were hard to calculate or predict. If the volunteers were publicly outed, project scientists told them, they might be contacted by the media or by critics of genetic research, of whom there were many. If the published sequences revealed a worrisome genetic condition that could be tied back to the volunteers, they might face discrimination from potential employers or insurers. And it was impossible to know how future scientists might use or abuse genetic information. No one's genome had ever been sequenced before.

But the volunteers were also informed that measures had been put in place to protect them: They would remain anonymous, and to minimize the chances that any one of them could be identified based on their unique genetic sequence, the published genome would be a patchwork, derived not from one person but stitched together from the DNA of a large number of volunteers. "If we use the blood you donate to prepare DNA samples," the consent form read, "we expect that no more than 10% of the eventual DNA sequence will have been obtained from your DNA."

Soon, however, those assurances began to wither. When a much-celebrated working draft of the human genome was published in 2001, the vast majority of it, nearly 75 percent, came from just one Roswell Park volunteer, an anonymous male donor known as RP11.

To this day, the story of how and why RP11 came to be the centerpiece of one of biology's crowning achievements has largely escaped public scrutiny. Even the scientists who helped orchestrate it disagree about the particulars.

To piece the story together, Undark reviewed more than 100 emails, letters, and other digital documents housed within the History of Genomics Archive at the National Human Genome Research Institute. The documents, provided to Undark through an institutional research collaboration agreement, reveal that the project's sourcing of human genetic material was more ethically fraught than official publications portrayed it to be, and included DNA harvested from a cadaver, and from one of the project's own scientists. The records, along with interviews with many of the project's central figures and with experts in law and bioethics, paint a picture in which high-ranking project officials, constrained by their own experimental protocols and accelerated timelines, veered from their guiding principles and pushed the boundaries of informed consent.

"We were panicking," recalled Aristides Patrinos, who led the Department of Energy's efforts in the Human Genome Project and, along with National Human Genome Research Institute director Francis Collins, helped steer the project to completion. "So a lot of these issues were not front and center. That's no excuse, but it was a reason. We were under a lot of pressure to make sure we finished by the time we finished."

The revelations potentially cast a stain on a project that had been extolled for its high ethical standards. "It's a big deal when researchers act deceptively, which is to say they do things that they said they weren't going to do, or don't do things that they said they were," said Paul Appelbaum, a Columbia University professor who specializes in legal and ethical issues in medicine, psychiatry, and genetics. "It has the potential to negatively impact the research enterprise in general, and the benefits that can potentially come from it."

To the extent that an injustice was done, it has propagated far and wide. The genetic sequence that emerged from the Human Genome Project continues to serve as a cornerstone resource of modern biology as a so-called reference genome, used ubiquitously by clinicians and researchers to identify genetic variants, sequence new genomes, and aid tests that determine patients' genetic risks. Although the reference genome has undergone several refinements and incorporated new genetic material over the years, RP11 remains at the center of it all, with his DNA still constituting more than 70 percent of the most recent versions.

RP11 is likely unaware that his DNA played, and continues to play, such a pivotal role in the march of genetic science. Project leaders, hamstrung, they say, by a decades-old ethics panel decision, have never attempted to inform him.

"Well, I think at this point, it probably would be a good idea to come out in the open and tell everybody what happened," said Patrinos. "And give as many specifics as possible."

The Human Genome Project is often compared to the achievement of putting humans on the moon. Launched in 1990 by the Department of Energy and the National Institutes of Health, the project took 13 years and, at the time, around $3 billion to complete. By 2000, scientists had sequenced around 85 percent of the genome, and the milestone was marked with a White House ceremony. President Bill Clinton described it as "more than just an epoch-making triumph of science and reason." U.K. Prime Minister Tony Blair, who joined by satellite, called it "the kind of breakthrough that takes humankind across a frontier and into a new era."

But in 1996, the project was at a crossroads. Francis Collins, then the director of NIH's National Center for Human Genome Research (later renamed the National Human Genome Research Institute, or NHGRI), was leading the international consortium of laboratories tasked with completing the sequence. Still in his mid-40s, the physician's star was rising. He had succeeded Nobel laureate James Watson years earlier as the center's director, and Barack Obama would later appoint him to the helm of NIH, the world's largest public funder of biomedical and behavioral research. People who worked with him described him as a brilliant mind and a great communicator, a passionate leader with legendary powers of persuasion.

Collins needed all of those qualities to manage the first sequencing of a human genome. It was a staggeringly complex operation. First, the entirety of a person's DNA (a molecular sequence of more than 3 billion pairs of nucleotide bases, typically represented as As, Cs, Ts, and Gs) had to be broken into fragments roughly 100,000 to 200,000 base pairs long. The fragments were then isolated and cloned, typically by specially preparing each one and inserting it into a bacterium, which copied the fragment as it reproduced. In this way, the team's scientists could make a physical copy of a person's full, albeit fragmented, genome, known as a clone library.

Identical clone libraries could then be shipped to different laboratories around the world, allowing many research groups to read the fragments, and piece the sequences back together, in parallel. In a way, it was like distributing sets of the same, extraordinarily difficult jigsaw puzzle to a lineup of the world's best puzzle solvers: They could work on different sections of the puzzle simultaneously and, if need be, check each other's work.

By 1996, clone libraries were already being distributed to a variety of labs. But that spring, project members learned that several of the libraries had been constructed without any informed consent process and with no oversight from institutional review boards, or IRBs, bodies that, according to federal policy, should have ethical purview over research with human subjects. Rumors swirled that some of the DNA had come from scientists involved with the project, a scenario that project members speculated could raise ethical questions about consent and invite charges of elitism. Internal project correspondence and tissue bank donation records reviewed by Undark suggest that another DNA source was the cadaver of a 19-year-old who had died by suicide; the family had donated the body to science but had not specifically consented to its use in the Human Genome Project.

It bothered Collins that at least one donor's identity was known to project scientists, and that the donor was aware his DNA was being used to create a library. "It sounds as if the donor knows who he is," he wrote in an email that March, after being briefed on a clone library that had been constructed at the California Institute of Technology. "That's not the way it should have been done."

In the wake of the revelation, Collins and Patrinos consulted an array of advisers and came up with a new plan, outlined in a joint guidance. They would find new donors and make new clone libraries, under new protocols. Unlike the old libraries, the new ones would be obtained through a double-blind procedure: Scientists involved with the project would not know the identities of the donors, and donors wouldn't know for certain whether their DNA was being used in the project. According to internal correspondence and interviews, project leadership was concerned not only about the genetic privacy of the donors, but also about the possibility that a donor might trumpet their role to the media and create a spectacle.

"It seemed like it would create a major distraction from what we wanted to generate," recalled Robert Waterston, who headed one of the five centers that did the majority of the sequencing for the project.

"We wanted the human genome," he added, meaning a reference that everyone could relate to. "It's not Joe Blow's genome. It's your genome. It's my genome. It's representative of everybody's genome."

To further protect the two-way confidentiality, the completed representation of the human genome would be a mosaic, assembled from the DNA of not one but multiple donors. The thinking, among the project's inner circle, was that a mosaic would not only complicate attempts to identify donors based on the genetic sequence but also reduce the incentive for wanting to know the donors' identities to begin with. If a donor's identity did come to light, limiting their contributions might minimize their exposure to potential harms and deter them from attempting to claim property or ownership rights over the published sequence.

In a June 1996 email that appears to have been written by Melvin Simon, who led a cloning operation at Caltech, the scientist told Human Genome Project leadership, including Patrinos, that, as he understood it, "no matter what waiver a volunteer is willing to sign, he or she would not lose ownership or property rights." "Thus only by a true patchwork or anonymizing approach can it be made extremely difficult to claim such rights," the email read. (Simon confirmed the sentiment behind the email in an interview with Undark.)

Simon's Caltech team and a laboratory at the Roswell Park Cancer Institute were each commissioned to create new clone libraries under the new protocols. Soon, however, the plans for a mosaic genome would veer off course, and the Human Genome Project would find itself in a consent conundrum, with one person, RP11, caught in the middle.

Pieter de Jong, who led the cloning project at the Roswell Park Cancer Institute, had been behind some of the problematic libraries that had sparked Collins' consternation in the spring of 1996. But he had a long history with the project, and he was a foremost expert at DNA cloning. So when the Human Genome Project enacted its new plan, they commissioned him to build at least five new libraries, de Jong recalled to Undark.

This time, de Jong used a lottery-like process to select donors. On March 23, 1997, he ran an advertisement in the Buffalo News seeking 20 volunteers. The edition also featured a front-page story about the project, which de Jong says he helped arrange. In the weeks that followed, the volunteers each came in, met with a genetic counselor, signed a consent form, and donated a few tablespoons of blood. The genetic counselor labeled each blood sample with a number, but created no records linking the samples to their donors.

The 20 samples were then transferred to de Jong, who chose two at random (one male and one female) to use for clone libraries. The only personal information the facility retained were the names and signatures on the consent forms, which were sealed in envelopes and stored in a locked file cabinet. As a result, it would be virtually impossible for anyone at Roswell Park to determine who the two donors were.

A postdoctoral researcher, Kazutoyo Osoegawa, did most of the work building the first library. Osoegawa was skillful, de Jong recalled, with a knack for coaxing large fragments of DNA from a sample for cloning: The larger the fragments, the more easily scientists could map them for sequencing, and the fewer fragments overall they would have to sequence to finish the job.

By August of 1997, de Jong, Osoegawa and their colleagues had begun distributing the first of the new Roswell Park clone libraries, RP11, and it was a good one, with enough fragments for scientists to be fairly certain that they spanned essentially the entire genome, with few gaps. A second library was in the works, with more to follow. But, before those libraries could materialize, the Human Genome Project's plans took a turn.

On the evening of Sept. 20, 1998, Francis Collins emailed NHGRI brass, including Jane Peterson, a program director involved with the sequencing effort, and Mark Guyer, the institute's assistant director for scientific coordination, about an unhappy circumstance. "I have been feeling uneasy about the RPC11 library ever since Jane uncovered the language that Pieter de Jong used for the consent form," he wrote. (The RP11 library was often referred to as RPC11 or RPCI-11 in correspondence.)

The specific language that unsettled Collins was the passage conveying that no more than 10 percent of the genetic sequence was expected to come from their DNA. And it was resurfacing at an inopportune moment.

The Human Genome Project was in the midst of what Maynard Olson, who led one of the project's sequencing labs, described in an email that September as a "de facto drift away from the concept of a genome sequence that is a mosaic of contributions from many individuals." When de Jong crafted the consent language, he was under the impression that 10 new clone libraries would be built and integrated into the completed genome. But now project leaders were lurching toward a strategy that would draw most of the final sequence, between 60 and 90 percent, from a single clone library. And RP11 was their library of choice.

In his email to his NHGRI colleagues, Collins wrote that the document of general principles he and Patrinos had shared suggested an intent to include several donors but wasn't specific about it, nor does it put a ceiling on the amount of sequence that could come from a single person.

The 10 percent language in the consent form worried him, however. Attempting to reconsent RP11 under new terms would be complicated: RP11 could have been any of the 10 male donors, and all the researchers had to go on were the names on the consent forms. The only way he could think to do it, he wrote, would require asking every volunteer if they objected to the raising of the 10 percent restriction and then "holding our breath that none of them do."

Technically, the word "expect" didn't forbid using RP11 for more than 10 percent of the sequence, Collins wrote, "but how far can we push this?"

The next month, Collins joined a conference call with de Jong, Roswell Park IRB chair Harold Douglass, and other Roswell Park and NHGRI staff. According to handwritten notes, Collins told them that limiting use of the clone library to 10 percent would devastate the momentum of the project and that there were concerns about recontacting all 10 male donors. The notes indicate that Douglass mentioned the IRB would ask about the benefit of fast-tracking the project, and Collins said there was a medical reason: to find as many genes ASAP to understand disease. (Speaking to Undark, Collins confirmed his participation in the call. He said the notes, taken by a different participant, used phrasing he wouldn't have used, but seemed correct.)

Days later, the Roswell Park IRB met and, according to a written summary that was shared with Guyer, voted unanimously against any attempts to try to find and reconsent the ten donors. Among the IRB's stated justifications were that the expectation expressed to the donors was not a guarantee, and that attempting to reconsent the 10 male volunteers would be difficult and could jeopardize RP11's anonymity. To delay the project by not expanding the use of RP11's library, the panel added, would itself be unethical, given the number of people who stood to derive health benefits from the timely completion of the human genome. (Douglass declined to comment for this story.)

Recently, Collins spoke to Undark about RP11 and the Human Genome Project's donor sourcing strategies. He was joined by Eric Green, who was also involved with the project and currently leads the National Human Genome Research Institute.

According to Collins and Green, project leaders did initially aim to construct 10 new clone libraries for use in the completed genome. But they soon realized it would be inefficient and chaotic to work with 10 libraries at once. "There would be lots of complexities that would come out by having too much blending going on," Green said.

Collins explained that structural differences between individual genomes, such as large-scale insertions or deletions of genes, can make it difficult to stitch together an accurate sequence from two different human sources. "If you go from one person to 10," he said, "and then you try to fit the whole thing together, it's going to be potentially much more error-prone."

It was primarily those technical challenges, Collins and Green said recently, that prompted the decision to derive most of the genome from a single donor. And RP11, with its well-sized fragments and comprehensive coverage of the genome, stood out from the other libraries as the ideal one to work with, they said. Also, Green added, RP11 at the time was further along than any of the other new libraries in the process of being characterized and prepared for sequencing.

But Collins' and Green's recollections diverge in key ways from those of other scientists involved in the Human Genome Project. Robert Waterston, for instance, who was among the small circle of researchers who guided project strategy, recalls that the complexities of blending clone libraries were only a minor consideration. Yes, structural differences in DNA could complicate the task of meshing one person's genetic sequence with another's, he said, but only in certain regions of the genome, such as those marked by repeat sequences that differ in number and complexity from one person to the next.

The bigger factor, said Waterston, was time. And the Human Genome Project was pressed for time, he said, thanks to a man named J. Craig Venter.

In May 1998, the scientist Venter, whose nonprofit Institute for Genomic Research had done pilot work for the Human Genome Project, launched a venture built to rival the publicly funded initiative. That June, Venter and his colleagues pledged in a Science article that they would sequence a human genome by 2001, years ahead of the Human Genome Project's 2005 target deadline, and at a fraction of the cost. The enterprise, known as Celera Genomics Group, set up shop in Rockville, Maryland, just miles from NHGRI's Bethesda headquarters.

Correspondence from that time suggests the news lit a fire under the Human Genome Project. "Obviously there would be significant political advantages to getting something out a year earlier than Venter is proposing, provided we can defend its utility," wrote Phil Green, an investigator at the University of Washington's sequencing center, in an email that was shared with Collins shortly after word of Venter's plans began to spread.

Project members worried about the implications of a commercial enterprise owning, and possibly monetizing, the first human genome. For some of them, competition itself, and the specter of a stinging defeat, seemed to be motivation enough. In an email that September, NHGRI's Peterson described Eric Lander (who led the Whitehead/MIT Center for Genome Research, one of the five large centers that sequenced the majority of the genome) as having called her "in a very depressed mood." Lander believed Venter would have a draft of the human genome done "before next summer and will take continual pot shots at us," Peterson wrote. (Lee McGuire, chief communication officer at the Broad Institute, where Eric Lander is a member and founding director, told Undark that Lander was unavailable to be interviewed for this story.)

In a move that was widely reported in the media as being prompted by the Celera announcement, Collins announced that September that the Human Genome Project would aim to finish its genome two years earlier than planned, by 2003, and release a working draft by 2001.

"We came into this crush with Celera, and everything just had to get done as quickly as possible," recalled Waterston. The complement of libraries they'd envisioned wasn't ready yet, and it would've taken time to make and distribute them, he said. They had to work with what they had, and what they had was RP11.

"There just wasn't an alternative," Waterston recalled. "We didn't have a second library to go to."

Marco Marra and John McPherson, who along with Waterston did much of the preliminary characterization of clone libraries at Washington University, similarly remember that it was the dearth of available libraries, more than the challenge of blending them together, that led the project to focus on a single donor.

That aligns with de Jong's recollection. RP11 was a good library, he told Undark, but so were subsequent libraries he built. The problem was that there was no time to wait. (De Jong shared records with Undark indicating that his lab had not yet completed the second of its planned new libraries by September 1998, when the issues around RP11's consent language arose; it is unclear whether the Caltech laboratory had completed and distributed the first of its planned new libraries to sequencing centers by that time, but Waterston recalls they hadn't.)

Although de Jong said he was not heavily involved in discussions of sequencing strategy, he thinks it began to dawn on the scientists how much additional work, and money, would be required to prepare and sequence 10 libraries, rather than one or two. "They couldn't potentially keep up the same speed as Venter with his commercial effort if they would have stayed with the original plan," said de Jong. "So I think it was mostly because they didn't want to lose the race."

Other members of the Human Genome Project who spoke with Undark expressed similar sentiments, including one of its highest-ranking figures. "We got pretty panicky that we were going to lose this," Patrinos said of the competition with Celera. "So at that time, we had to follow paths that would get us to the conclusion as fast as possible."

Asked if he felt Celera contributed to a sense of urgency at that time, Collins told Undark he didn't recall that being a factor, and that the rush, instead, was to get the job done to provide benefits for understanding health and disease. In a follow-up call, Collins clarified: "I think Celera's intentions to produce a for-profit human genome sequence was an issue that everybody was fully aware of, so that was in the air, if you will." But he said it was not the driving factor at all in the decision to move as quickly as possible to obtain a complete public sequence.

In any case, on Oct. 27, 1998 (five months after Venter launched his rival to the Human Genome Project, a month and a half after the project gave itself a new, ambitious deadline, weeks after Collins' concerned email about RP11's consent language, and days after Collins' conference call with the chair of the Roswell Park IRB), the ethics panel gave Collins and his team carte blanche to dramatically expand the use of RP11's DNA, without telling any of the Roswell Park donors about the change.

That same month, Simon and collaborator Hiroaki Shizuya, having finished their first Caltech library under the new donor protection protocols, told the DOE's Marvin Frazier that although the group had genetic material in hand to begin a second library, they had been informed that there was no longer a great deal of interest in new libraries, and they were instead moving on to new research pursuits.

Archival correspondence suggests the turn of events didn't sit well with all of the lead scientists involved in the project. "I was deeply distressed to have the director of a major genome center already start building the case that the informed-consent form for DNA used to build RPC-11 did not really mean what it said," wrote Olson in a November 1998 email to Collins and his University of Washington colleague Phil Green. The ethical, legal, and social issues related to the library sourcing will not go away, he predicted.

Speaking to Undark, Olson said he does not recall which consent language, or which director, he was referring to in his email. But he remembers there being tension between the ethicists and technical experts involved with the project. Some of the ethicists resented the idea that technical considerations should factor into discussions, he said, and a lot of the more technically well-informed participants in the project just actually weren't terribly interested in the ethics issues.

Undark invited several biomedical ethicists and legal experts to review the Roswell Park consent form and the IRB's ruling on RP11. Their responses called into question many of the justifications the ethics panel gave for its decision.

"The big deal is that the 10% is not just a minor aspect of the consent form," wrote Hank Greely, a Stanford University professor who works on ethical, legal, and social issues in the biosciences, in an email to Undark. Rather, he noted, it is a substantial part of the argument about confidentiality. Greely said that he didn't find any of the panel's justifications convincing. He doesn't think the IRB acted nefariously, but he said that he would not have so hastily dismissed the possibility of attempting to reconsent the volunteers, and that doing so wouldn't necessarily have heightened the risks to the donor. "We've got these 10 names. Let's see if they're in the phone book," he said, later adding, "let's see how locatable they are."

Jonathan Moreno, a professor of medical ethics and health policy at the University of Pennsylvania who declined the offer to review documents but was briefed by Undark on the IRB decision, agreed that the volunteers should have been reconsented.

Appelbaum, the Columbia University legal and ethics specialist, was one of several experts who took issue with the panel's interpretation of the 10 percent expectation. "I think a reasonable person would take away from that that the intent of the research team was to use no more than 10 percent of his or her genome in the project," he said. "And so playing with words in that way, I think, is really not appropriate in this context."

Appelbaum also thought it was odd for Collins, representing a sponsoring agency, to meet directly with an IRB chair on an ethical issue related to work the agency was sponsoring. There is a risk, he said, of exerting undue influence on the oversight process. Bruce Gordon, the assistant vice chancellor for regulatory affairs at the University of Nebraska Medical Center, told Undark that, generally speaking, the best practice would be that funders shouldn't be interacting with the IRB under any circumstance, though he described it as an unspoken rule, and not a strict standard.

Collins said he agreed the conference call was an unusual step, but that the significance of the situation justified it. "I counted on the IRB to do what they always do," he said, "which is to step back and take up a purely objective view of an ethical question and render their best opinion. I do not believe I put pressure on them at all."

Although ethicists and legal experts who spoke to Undark raised questions about the rationale of the IRB's ruling, many said it was unlikely that RP11 had suffered concrete harms as a result, a point also expressed by Collins and other key figures from the Human Genome Project. Protections enacted in the U.S. since the completion of the Human Genome Project make it illegal for employers or health insurers to discriminate based on a person's genetic information. And experts say that without a matching DNA sample, it remains difficult to identify a person based solely on a genetic sequence. With a matching sample, however, it would be straightforward to identify the donor, whether their contribution was 70% or 7%.

"I think it's fair to say RP11 was probably misled about what was going to happen," said R. Alta Charo, a professor emerita of law and bioethics at the University of Wisconsin-Madison. (Like Moreno, Charo declined the offer to review documents, but was briefed by Undark on the IRB decision.) The real question, however, said Charo, is whether the decision made him more identifiable, whether it exposed him to more risk. "I don't know how to answer that question."

Appelbaum said it may be true that RP11's risks weren't substantially heightened by the decision to expand the use of his genetic sequence. "But it seems to me that that's different from saying that the action wasn't consequential," he said, "in the sense that it can be highly consequential, I think, for the research enterprise in this country to make promises to people in signed consent forms, and then violate those promises."

Appelbaum described the episode as illustrative of a long history of deceptions that have contributed to a lack of trust in the research enterprise, especially in minoritized communities. "One of the big issues in human subjects research, which has assumed even greater salience in genomic research, has been the issue of trust," he said. "If I agree to be in your project, are you leveling with me about what's going to happen to me? And if I agree to donate blood, or some other tissue sample, are you telling me the truth about how it's going to be used?"

The June 2000 White House ceremony that marked the Human Genome Project's sequencing milestone was a joint ceremony: At the presidential lectern that day, President Clinton was flanked on one side by Francis Collins and on the other by Craig Venter, whose Celera team was also nearing the finish line.

The following winter, the two teams each published landmark genome papers, with the Human Genome Project's report on its draft genome sequence officially appearing in the Feb. 15 issue of the prestigious journal Nature, and Celera's sequencing results appearing in the rival journal Science one day later.

Celera reported that its genome had been assembled from five unnamed donors, one of whom, the majority donor, Venter later revealed was himself.

Meanwhile, the Human Genome Project was circumspect about the donors behind its published sequence. A table in the Nature paper listed eight clone libraries that were described as having contributed the bulk of the sequence. Among them was RP11, which the table noted accounted for just over 74 percent of the draft genome. The other seven each contributed between 1.6 and 4.3 percent of the total. Additional libraries, neither named nor tallied in the paper, collectively accounted for the remaining 8.4 percent of the sequence.

The paper described the libraries as originating from anonymous DNA donors, according to a lottery-like process like the one used at Roswell Park. What was left unsaid, but what consent documents, internal memos, and other records reviewed by Undark reveal, is that six of the eight named libraries were the same ones that had raised ethics concerns early in the project: the library sourced from the 19-year-old cadaver; the libraries suspected to have been built with the DNA of project scientists; the libraries whose donors were known to project researchers. Collins and Patrinos had agreed in 1996 to let scientists use those libraries, provided the donors were properly consented, protocols were cleared by IRBs, and the libraries contributed minimally to the final sequence. (Caltech's Simon told Undark that it was a lab technician's husband, and not a postdoc as had been rumored, who produced the sperm from which one of his early libraries was built.)

Also left unsaid was that four of the eight libraries had all been derived from the same donor.

Collins and NHGRI director Green could not confirm to Undark how many, if any, of the libraries outside of the top eight had been approved by IRBs. Collins also said he did not know if the family of the 19-year-old tissue donor had been reconsented in accordance with the 1996 guidelines.

Asked if he feels the project should have been more forthright in the 2001 paper about the sourcing of DNA donors, Collins said "it's always good in hindsight to be transparent and forthright in every way. To be honest though, I don't think in my view, that this was such a major substantial issue that it would have required a deep debate about exactly how to put that forward." He added, "I don't believe that individuals were significantly put at risk by the way in which this was laid out. And I hope that doesn't get lost."

To Appelbaum, however, the idea that the Human Genome Project's landmark paper may have misrepresented donor procedures is gravely concerning: the kind of transgression that can erode public trust in science more broadly. Perhaps an argument could be made to defend the project's DNA sourcing, Appelbaum said, "but I'm not sure there's any argument on the other side about covering up what you did when you publish your results. I think you've got to be open about that."

"If you made certain decisions along the way," he said, "you describe the decisions you made and the justification for them."

The culmination of the Human Genome Project was, in a way, the beginning of a long scientific afterlife for RP11's genetic sequence. A 2010 study, published in the journal Science, analyzed the reference genome and concluded that RP11 was of mixed African and European genetic ancestry, and likely identified as Black or African American.

Perhaps most consequential, however, is that the sequence that emerged from the Human Genome Project has evolved into a foundational resource of modern genetics. It has been revised and improved through the years, each new edition, or reference assembly, augmented with new annotations and fixes.

Deanna Church, who led an international collaboration that managed the reference assemblies in the years following the Human Genome Project's completion, likens them to maps that give scientists a shared coordinate system for describing, comparing, and understanding genetic sequences. Researchers use them to interpret and identify fragments of DNA; clinicians and genetic testing companies use them as benchmarks to determine which genetic variants a person carries. The reference assembly that emerged from the Human Genome Project has become the foundation for all genomic data and databases, wrote the authors of a 2019 opinion piece in the journal Genome Biology.

And to this day, the most widely used reference assemblies continue to derive more than 70 percent of their sequence from a person who did not clearly consent to that level of use.

In recent years, Church and other experts have argued that it is time for a new reference model: The assemblies from the Human Genome Project do not adequately reflect the breadth of human genetic variation, they say. And although those reference assemblies are of exceptional quality by genome standards, a newer sequence, sourced from new DNA and known as the telomere-to-telomere assembly, is both more accurate and more comprehensive.

But a reference assembly's usefulness stems in large part from the information, annotations, and standards that are built on top of it, and it will take time for scientists to duplicate that infrastructure for a new reference genome.

Leslie Biesecker, chief of the Center for Precision Health Research at the NHGRI, estimates it will be three to five years before the community transitions to a new reference. "There are so many pieces of machinery that need to be moved forward at the same time in order for that whole system to work."

Stanford's Greely, a lawyer by training, said it's conceivable that were RP11 to learn of the outsized role his DNA played in genetic science, he might seek financial compensation. "Without wanting to get into the merits of the claims, it could play out kind of the way the Henrietta Lacks story has," said Greely, referring to a Black woman who died of cervical cancer in 1951, and whose cells were harvested for science without her consent. (Lacks family members were recently awarded an undisclosed settlement from Thermo Fisher Scientific, over allegations the company unjustly profited from her cells.) "If I were NIH, I would worry: hey, if this guy knows, he might sue us or make trouble for us," Greely said.

Documents suggest the architects of the Human Genome Project worried about just such a scenario: a clause in the original consent form used at Roswell Park asserted that, by signing, a donor waived their rights to claim any part of conceivable profits resulting from research performed on the blood and products derived from the blood you donated. But emails sent to NHGRI leadership in July 1997 indicate that when Department of Health and Human Services officials learned of the clause, they argued it ran afoul of a federal regulation that bars consent language that could be construed as a waiver of legal rights. Although RP11 had likely already signed the original version, the waiver was removed from the consent form by that August.

These days, the trim beard Pieter de Jong wore during the days of the Human Genome Project has turned to gray. He now lives near Seattle, where he still runs a small clone library supply operation. This year, to free up space, he finally destroyed three of the five clone libraries he built for the Human Genome Project, two of which he says the project never used, and a third that was incorporated into the reference sequence only in the genome's later revisions.

De Jong no longer knows the whereabouts of the 20 consent forms that were collected from the Roswell Park volunteers, the only known records that identify the participants by name. Although study protocols stipulated that Roswell Park staff would maintain a chain of custody for the forms, Annie Deck-Miller, director of public relations at the center, now known as the Roswell Park Comprehensive Cancer Center, told Undark in an email that the facility no longer possesses any forms related to de Jong's study. In a subsequent emailed statement, representatives of Roswell Park indicated that documents related to the Human Genome Project were stored onsite for a number of years, as required by federal regulations. They declined to comment further, however, citing a lack of capacity to engage in a review of decisions purported to have taken place in a confidential meeting conducted 26 years ago. Collins and Green say they have never attempted to notify Roswell Park donors about the change to the sequencing plan, and that the IRB decision does not permit them to.

There is, however, one Human Genome Project donor whose whereabouts de Jong knows precisely: the person behind the four clone libraries that accounted for more than 9 percent of the draft sequence.

De Jong recalls that he and a visiting collaborator created those libraries in the summer of 1993. They did it quickly (he was in a hurry to apply for grants and get something going), and he said there were few ethical guardrails to guide them. De Jong felt it would be inappropriate to solicit DNA from one of his lab workers, so "my collaborator, my visitor, and me, we exchanged, we both tossed up and we gave blood samples for the project."

One of those samples yielded clone libraries that helped spark the 1996 panic over donors: libraries whose origins project leaders worried might leak to the press, de Jong said, but that nonetheless found their way into the world's first human genome sequence.

"It ended up being me," de Jong said, matter-of-factly. "The reference genome is maybe 80 percent or 75 percent RP11, and maybe 10 percent me."

If you or someone you know donated to or otherwise participated in the Human Genome Project and you would like to share your story, Undark and STAT would like to hear from you. Contact us at [email protected].

Undark is a nonprofit, editorially independent digital magazine exploring the intersection of science and society.

See original here:
The untold story of the Human Genome Project: How one man's DNA became a pillar of genetics - STAT

Posted in Human Genetics | Comments Off on The untold story of the Human Genome Project: How one man's DNA became a pillar of genetics – STAT

Metabolic gene function discovery platform GeneMAP identifies SLC25A48 as necessary for mitochondrial choline import – Nature.com

Posted: July 11, 2024 at 2:43 am

Read more from the original source:
Metabolic gene function discovery platform GeneMAP identifies SLC25A48 as necessary for mitochondrial choline import - Nature.com

Posted in Human Genetics | Comments Off on Metabolic gene function discovery platform GeneMAP identifies SLC25A48 as necessary for mitochondrial choline import – Nature.com

Limitations in next-generation sequencing-based genotyping of breast cancer polygenic risk score loci | European … – Nature.com

Posted: June 24, 2024 at 2:41 am

Missing loci & convertibility to hg38

For four BC PRS loci, no variants were listed at the specified genomic position in gnomAD v2.1.1, namely rs572022984, rs113778879, rs73754909, and rs79461387. For three of these four loci, gnomAD v3.1.2 also reported no variants at the corresponding hg38 positions as defined by dbSNP [23] (Supplementary Table 2). Locus rs572022984 was listed but with an overall allele count of zero in NFE samples (Table 2).

For two loci, conversion to hg38 resulted in a change in alleles, namely for rs143384623 (hg19: 1-145604302-C-CT; hg38: 1-145830798-C-CA) and rs550057 (hg19: 9-136146597-C-T; hg38: 9-133271182-T-C). For rs143384623, the change of the alternative allele from CT to CA did not result in a noticeable shift in AFs observed in gnomAD NFE samples (5142/13304 (0.39) in v2.1.1 versus 24316/64610 (0.38) in v3.1.2, two-sided Fisher's exact test p = 0.14). For rs550057, the observed AFs appeared exactly opposite, i.e., 3786/14828 (0.26) for allele T in gnomAD v2.1.1 and 49878/67552 (0.74) for allele C in gnomAD v3.1.2. Therefore, 1 - 49878/67552 was assumed as the gnomAD v3.1.2 effect AF at this bi-allelic site.
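The allele flip at rs550057 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not the authors' pipeline: the function names are hypothetical, and only the allele counts are taken from the text above. It assumes a bi-allelic site, so that the two allele frequencies sum to 1.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Observed allele frequency: allele count divided by total alleles called."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def flipped_effect_af(other_allele_count: int, allele_number: int) -> float:
    """Effect-allele AF when the counts are reported for the other allele.

    Only valid at a bi-allelic site, where the two frequencies sum to 1.
    """
    return 1.0 - allele_frequency(other_allele_count, allele_number)


# Counts quoted in the text for rs550057 (gnomAD NFE samples).
af_t_v2 = allele_frequency(3786, 14828)    # allele T, gnomAD v2.1.1 (hg19)
af_c_v3 = allele_frequency(49878, 67552)   # allele C, gnomAD v3.1.2 (hg38)
af_t_v3 = flipped_effect_af(49878, 67552)  # inferred effect AF in v3.1.2

print(f"{af_t_v2:.2f} {af_c_v3:.2f} {af_t_v3:.2f}")  # prints: 0.26 0.74 0.26
```

After taking the complement, the two releases agree to two decimal places, which is consistent with the reference and alternative alleles having swapped between builds rather than the frequency itself having changed.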

For 39 of the 320 PRS loci listed with AF > 0 in gnomAD v3.1.2, at least one observation of technical artifacts was reported: 38 loci were flagged as being located in low-complexity regions, 3 as being localized at a low-quality site, and 1 failed the allele-specific VQSR filter (Supplementary Table 2).

Based on the absolute difference threshold of 0.016 (Supplementary Fig. 1), 24 loci were determined as showing deviating AFs compared to CanRisk (Fig. 1, Table 2). Absolute differences ranged from 0.03 to 0.71, and for 21 out of these 24 loci (87.5%), technical artifacts were reported in gnomAD v3.1.2.
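The thresholding step can be sketched as a small helper that compares expected and observed AFs per locus. The function name, the locus labels, and all AF values below are hypothetical placeholders chosen for illustration; only the threshold of 0.016 comes from the text.

```python
from typing import Dict, List, Optional, Tuple


def flag_deviating(
    expected: Dict[str, float],
    observed: Dict[str, Optional[float]],
    threshold: float,
) -> Tuple[Dict[str, float], List[str]]:
    """Split loci into those whose |observed - expected| exceeds the
    threshold and those for which no AF was observed at all."""
    flagged: Dict[str, float] = {}
    missing: List[str] = []
    for locus, exp_af in expected.items():
        obs_af = observed.get(locus)
        if obs_af is None:
            missing.append(locus)
            continue
        diff = abs(obs_af - exp_af)
        if diff > threshold:
            flagged[locus] = diff
    return flagged, missing


# Hypothetical loci and AFs, for illustration only (not taken from the study).
expected_af = {"locus_A": 0.10, "locus_B": 0.40, "locus_C": 0.25}
observed_af = {"locus_A": 0.105, "locus_B": 0.71, "locus_C": None}

flagged, missing = flag_deviating(expected_af, observed_af, threshold=0.016)
print(flagged, missing)
```

Keeping missing loci separate from deviating ones mirrors the distinction the text draws between loci absent from gnomAD and loci with aberrant AFs.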

Extremely deviating AFs with an absolute difference > 0.016 are indicated by red markers.

All 49 PRS loci for which a noticeably deviating AF was observed in at least one of the data sets provided by the five participating GC-HBOC centers are listed in Table 3.

For the IMGAG DRAGEN data, 0.052 was calculated as the threshold to determine noticeably deviating AFs (Supplementary Fig. 2), resulting in 18 affected loci (Table 3, Fig. 2). Of these, 16 were previously also identified as missing or showing noticeably deviating AFs in gnomAD v3.1.2. The exceptions were rs62485509 and rs9931038. For IMGAG freebayes data, 0.036 was calculated as the threshold (Supplementary Fig. 2), resulting in 16 loci from the BCAC 313 BC PRS determined as showing a noticeably deviating AF. Of these, 11 loci were also identified as showing deviating AF in IMGAG DRAGEN data, and all but rs12406858 and rs11268668 were previously identified as missing or showing deviating AFs in gnomAD v3.1.2.

Data were provided by the Institute of Medical Genetics and Applied Genomics (IMGAG) at University Hospital Tübingen, Institute for Clinical Genetics (ICG) at University Hospital Carl Gustav Carus Dresden, by the Department of Medical Genetics (DMG) at University Hospital Münster, by the Center for Familial Breast and Ovarian Cancer (CFBOC) at University Hospital Cologne, and by the Institute of Human Genetics (IHG) at the University of Regensburg.

Considering genotyping data provided by the ICG based on 585 samples, 23 of the overall 324 PRS loci did not meet the minimum quality criteria (read depth ≥ 20) in more than 25% of samples and were discarded (Supplementary Table 3). Additionally, GATK reported read depth < 20 for > 25% of samples for rs56097627 and rs143384623. For 260 of the remaining 299 PRS loci (86.96%), forced genotyping with GATK and freebayes resulted in the observation of identical AFs. For both ICG GATK and freebayes data, 0.063 was calculated as the threshold to determine noticeably deviating AFs (Supplementary Fig. 3). Using this threshold, 11 loci showed noticeably deviating AFs in the GATK data set (including two loci exclusive to the BCAC 313 BC PRS) and 14 loci in the freebayes data set (including three loci exclusive to the BCAC 313 BC PRS), with an overlap of 7 (Table 3, Fig. 2).

The DMG provided GATK- and DRAGEN-based BRIDGES 306 BC PRS genotyping data of 545 samples. Locus rs138179519 did not meet the quality criteria, and rs774021038 additionally failed them when using DRAGEN. Of the remaining 304 loci, 252 (82.89%) showed identical AFs (Supplementary Table 3). Applying a threshold of 0.052 (Supplementary Fig. 4) resulted in 20 loci showing deviating AFs in GATK data and 14 loci in DRAGEN data, with an overlap of 9 loci.

For the CFBOC data based on 412 samples, a threshold of 0.047 was calculated (Supplementary Fig. 5). The loci of the BRIDGES 306 BC PRS were considered, 243 (79.41%) of which showed identical AFs for both callers applied (Supplementary Table 3). Overall, 25 loci (all of which are also included in the BCAC 313 BC PRS) showed deviating AFs: 16 loci in GATK and 19 loci in freebayes data, with an overlap of 10 loci.

The IHG provided GATK- and CLC-based BRIDGES 306 BC PRS genotyping data of 251 samples (Supplementary Methods). Four loci did not meet the quality criteria in both settings, and an additional four in the CLC setting. Of the remaining 298 loci, 228 (76.51%) showed identical AFs (Supplementary Table 3). Applying a threshold of 0.063 (Supplementary Fig. 6) resulted in 23 loci showing noticeably deviating AFs in GATK data and 19 loci in CLC data, with an overlap of 10 loci.

In summary, for four loci, deviating AFs were reported in all GC-HBOC real-world settings examined, namely for rs56097627, rs113778879, rs57589542, and rs3988353. A further four loci, namely rs574103382, rs73754909, rs3057314, and rs57920543, were reported with deviating AFs in all settings except for one (Table 3).

However, there were also 16 loci that were conspicuous in a single setting exclusively, namely five in IHG GATK data (rs1511243, rs4880038, rs1027113, rs12709163, rs1111207), three each in ICG freebayes data (rs34207738, rs147399132, rs199504893) and in IHG CLC data (rs10975870, rs11049431, rs144767203), two in DMG GATK data (rs10644978, rs66987842), and one each in IMGAG DRAGEN (rs9931038), IMGAG freebayes data (rs12406858), and CFBOC freebayes data (rs140702307). Another three loci (rs10074269, rs55941023, rs35054928) showed AF deviations in only one center, but these were concordant.

Considering the loci non-existent in gnomAD v3.1.2, rs113778879 was not observed with expected AF in any GC-HBOC center, and rs73754909 only with forced DRAGEN calling in DMG data. For rs79461387, expected AFs were reported consistently when using freebayes, but not by unforced DRAGEN calling and in two settings using forced GATK. Of note, rs572022984, with zero allele count in gnomAD v3.1.2 NFEs and an expected AF of 0.0364 in CanRisk, was consistently not observed at all or with a maximum AF of 0.0037 (Supplementary Table 3).

Five loci showing aberrant AFs in gnomAD v3.1.2 NFEs (Table 2) were not reported with deviating AF by any of the participating GC-HBOC centers, namely rs78425380, rs62331150, rs60954078, rs10862899, and rs112855987.

Without further information and assuming a standardized PRS at the 50th percentile, the estimated 10-year risks of developing primary BC for cancer-unaffected women of 20, 40, and 60 years of age were 0.1%, 1.5%, and 3.4% according to CanRisk (Supplementary Table 4). Percentiles of PRSs from artificial VCF files with aberrant dosages (see Materials and Methods) ranged from 47.5% (IHG CLC, BRIDGES 306) up to 55.7% (ICG freebayes, BCAC 313). The risk of 0.1% for a 20-year-old woman was concordantly unchanged in all scenarios including artificial PRSs. For a 40-year-old woman, estimated 10-year risks were increased by 0.1% in seven scenarios, and for a 60-year-old woman by up to 0.2% in eight scenarios.

Estimated remaining lifetime risks of developing primary BC, assuming an average PRS (50th percentile), for cancer-unaffected women aged 20, 40, and 60 years are 11.3%, 10.9%, and 7.1% according to CanRisk (Supplementary Table 4). When using PRSs from artificial VCF files with aberrant dosages, estimated lifetime risks ranged from 11.1% up to 11.9% for a 20-year-old woman, from 10.6% up to 11.4% for a 40-year-old woman, and from 7.0% up to 7.4% for a 60-year-old woman. The lowest estimates were obtained with the BRIDGES 306 BC PRS based on IHG CLC data with 19 artificial dosages imputed, and the highest with the BCAC 313 BC PRS based on ICG freebayes data with 14 artificial dosages imputed.

For 20 PRS loci showing noticeably deviating AFs in at least one real-world NGS data set, alternative alleles or overlapping variants with minimum AF 0.01 in NFEs were reported in gnomAD v3.1.2 (Supplementary Table 5). For rs73754909 and rs79461387, both SNVs and non-existent in gnomAD v3.1.2, deletions were reported with comparable AFs to the ones expected by CanRisk. For both deletions, the adjacent downstream nucleotide of the reference sequence was identical to the substituted nucleotide of the expected effect allele (Fig. 3). For rs113778879, which is also an SNV not contained in gnomAD v3.1.2, a similar observation could be made (Supplementary Fig. 7), but the reported AF exceeds the expected one by more than 0.1 (0.5762 versus 0.6818).

Both alternative alleles are deletions with the adjacent downstream nucleotide identical to the expected substituted one.

For 28 out of the 49 loci showing noticeably deviating AFs in at least one real-world data set, proxies in 1000G GRCh37 microarray data, 1000G GRCh38 High Coverage WGS data, or TOPMED European data could be identified (Supplementary Table 6). For rs113778879, rs73754909, and rs79461387, LDpair based on GRCh38 reported the same alternative alleles as gnomAD v3.1.2 (Supplementary Table 5), where the original PRS loci are non-existent.

Proxies and alternative alleles showing AFs in gnomAD v3.1.2 comparable to expected CanRisk AFs, i.e., an absolute deviation < 0.016, were considered as possible workarounds for improved PRS genotyping, and further evaluated with respect to observed AFs in IMGAG freebayes data (Table 4). For 19 of these 21 PRS loci, absolute differences between expected and observed AFs in IMGAG freebayes data remained below the previously defined IMGAG freebayes-specific threshold of 0.036. The exceptions were the substitutions of rs12406858 and rs79461387. The latter is noteworthy because the original PRS locus, which is an SNV, was correctly called by freebayes in forced and unforced mode (Table 3), whereas GATK HaplotypeCaller seemed to call an overlapping deletion of sequence GAG in DMG and CFBOC data. Also noteworthy are the potential replacements of rs73754909 and rs111833376, as both variants were called with noticeably deviating AFs in most real-world data sets.

Here is the original post:
Limitations in next-generation sequencing-based genotyping of breast cancer polygenic risk score loci | European ... - Nature.com

Posted in Human Genetics | Comments Off on Limitations in next-generation sequencing-based genotyping of breast cancer polygenic risk score loci | European … – Nature.com

Y chromosome is evolving faster than the X, primate study reveals – Livescience.com

Posted: June 24, 2024 at 2:41 am

The Y chromosome in primates, including humans, is evolving much more rapidly than the X chromosome, new research on six primate species suggests.

For instance, humans and chimpanzees share upwards of 98% of their DNA across the whole of the genome, but just 14% to 27% of the DNA sequences on the human Y chromosome are shared with our closest living relatives.

The finding surprised scientists, given that humans and chimpanzees diverged just 7 million years ago, a blip in evolutionary terms.

"I expect my genome to be very different to that of bacteria or insects because a lot of time has elapsed, evolutionarily speaking," study co-author Brandon Pickett, a postdoctoral fellow at the National Human Genome Research Institute (NHGRI) at the National Institutes of Health, told Live Science. "But from other primates, I expect it to be pretty similar."


It's not clear exactly why the Y chromosome is evolving so rapidly. For starters, there is only a single copy of the Y chromosome per cell. In primates, females carry two copies of the X chromosome, while males carry an X and a Y chromosome, and the Y chromosome plays a critical role in sperm production and fertility. Having only a single copy of the Y chromosome presents a vulnerability: if changes happen to occur, there is no second chromosome to act as a backup.

And changes are likely to occur due to something called mutation bias. The Y chromosome may be so prone to change because it generates many sperm. This requires lots of DNA replication. And every time DNA is copied, there's a chance for mistakes to creep in.


Scientists have previously sequenced the primate genome for all 16 representative families.

In the new study, published May 29 in the journal Nature, scientists compared the sex chromosomes of five great ape species: chimpanzees (Pan troglodytes), bonobos (Pan paniscus), western lowland gorillas (Gorilla gorilla gorilla), and Bornean and Sumatran orangutans (Pongo pygmaeus and Pongo abelii), plus one species more distantly related to humans, the siamang gibbon (Symphalangus syndactylus).

The team studied the chromosomes using telomere-to-telomere (T2T) sequencing. T2T can accurately sequence repetitive elements, including the protective telomere "caps" of chromosomes that have proven difficult to read in the past, Pickett said. The researchers used computing software to make comparisons between the sequencing results, by creating alignments to reveal which parts of the chromosome had changed and which parts had stayed the same.

The chromosomal X and Y sequences of each of the six species were also compared to the human X and Y chromosomes, already sequenced in an earlier study with the T2T method.

The findings revealed that across all the studied species, the Y chromosome evolved rapidly. Even species in the same genus have very different Y chromosomes to one another. For instance, chimpanzees and bonobos diverged just 1 million to 2 million years ago, yet there is a dramatic difference in their Y chromosome lengths, said Christian Roos, a senior scientist at the Primate Genetics Laboratory, German Primate Center, who was not involved in the study.

In some cases, the difference in length, caused by chromosome losses or duplications that occur when DNA is copied, amounted to up to about half of the observed differences. For example, the Y chromosome from the Sumatran orangutan is twice as long as the gibbon's Y chromosome.

In contrast, the study found that the X chromosome was highly conserved across the primate species, as might be expected for a structure with a critical role in reproduction.

One reason the Y seems to have thrived despite such a high rate of mutation is that across all the studied species, it contains stretches of highly repetitive genetic material, such as palindromic repeats, where the sequence reads the same forward and backward. Nestled within these stretches of repeating DNA are genes. So the repeated DNA may safeguard important genes from replication mistakes and thereby preserve essential biological material, the researchers wrote in their paper.

The study did have limitations though; it looked at only a single representative for each primate species, and it couldn't say how much the Y chromosome would vary within animals of the same species, Pickett said.

Original post:
Y chromosome is evolving faster than the X, primate study reveals - Livescience.com

Posted in Human Genetics | Comments Off on Y chromosome is evolving faster than the X, primate study reveals – Livescience.com

Genetic association mapping leveraging Gaussian processes | Journal of Human Genetics – Nature.com

Posted: June 4, 2024 at 2:49 am

Gaussian Process (GP)

A Gaussian process (GP) is a type of stochastic process whose application in machine learning enables us to infer a nonlinear function \(f(x)\) over a continuous domain \(x\) (e.g., time or space). Precisely, \(f(x)\) is a draw from a GP if \(\{f(x_1),\ldots,f(x_N)\}\) follows an N-dimensional multivariate normal distribution for the N input data points \(\{x_i\}_{i=1}^{N}\). Denoting \(X=(x_1,\ldots,x_N)^{\top}\) and \(f=(f(x_1),\ldots,f(x_N))^{\top}\), a GP is formally written as

$$f \sim \mathcal{N}(m(X),k(X,X)),$$

where \(m(\cdot)\) denotes the mean function and \(k(\cdot,\cdot)\) denotes the kernel function [11]. The simplest kernel function would be the linear kernel, such that \(k(X,X)=\sigma^{2}XX^{\top}\), while the automatic relevance determination squared exponential (ARD-SE) kernel is defined as

$$k(x_j,x_k)=\sigma^{2}\exp\left[-\sum_{q=1}^{Q}\frac{(x_{jq}-x_{kq})^{2}}{2\rho_q}\right]$$

for the (j, k) element of \(k(X,X)\), where \(x_j,x_k\in\mathbb{R}^{Q}\) are Q-dimensional input vectors. Here \(\sigma^{2}\) is the kernel variance parameter and \(\rho=(\rho_1,\ldots,\rho_Q)^{\top}\) is the vector of characteristic length scales, whose inverse determines the relevance of each element of the input vector. Typically, the mean function is defined as \(m(X)=0\).
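As a concrete illustration of the ARD-SE kernel, the following NumPy sketch evaluates it for a small set of random inputs. The function name `ard_se_kernel`, the random inputs, and the parameter values are assumptions made for this example only.

```python
import numpy as np


def ard_se_kernel(X1, X2, sigma2, rho):
    """ARD squared-exponential kernel between rows of X1 (N1 x Q) and X2 (N2 x Q):
    k(x_j, x_k) = sigma2 * exp(-sum_q (x_jq - x_kq)^2 / (2 * rho_q))."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    rho = np.asarray(rho, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]              # shape (N1, N2, Q)
    scaled = np.sum(diff ** 2 / (2.0 * rho), axis=-1)   # shape (N1, N2)
    return sigma2 * np.exp(-scaled)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))      # N = 5 inputs with Q = 2 dimensions (simulated)
sigma2 = 2.0                     # kernel variance (arbitrary)
rho = np.array([1.0, 4.0])       # one length-scale parameter per input dimension

K = ard_se_kernel(X, X, sigma2, rho)
print(np.round(K, 3))
```

A larger \(\rho_q\) makes the kernel less sensitive to differences in dimension q, which is the sense in which the inverse of \(\rho_q\) measures relevance.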

Because the GP yielding \(f(x)\) has various useful properties inherited from the normal distribution, a GP can be used to estimate a nonlinear function \(f(X)\) from output data \(y=(y_1,\ldots,y_N)^{\top}\) along a continuous factor \(X\). The extended linear model \(y=f(X)+\varepsilon\) is referred to as GP regression and is widely used in machine learning [12]. This model can be used to map dynamic genetic associations for normalized gene expression or other common complex quantitative traits (e.g., human height) along the continuous factor \(x\) (e.g., cellular states or donor's age). Denoting the genotype vector by \(g=(g_1,\ldots,g_N)^{\top}\) and the kinship matrix among N individuals by \(R\), the mapping model, as proposed by us and others [8, 10], can be expressed as follows:

$$y=\alpha +\beta \odot g+\gamma +\varepsilon , \tag{1}$$

where

$$\alpha \sim \mathcal{N}(0,K),\quad \beta \sim \mathcal{N}(0,\delta_g K),\quad \gamma \sim \mathcal{N}(0,\delta_d K\odot R)$$

are all GPs with similar covariance matrices, where \(\odot\) denotes the element-wise product between two vectors or matrices with the same dimensions, \(K=k(X,X)\) denotes the covariance matrix with a kernel function, and \(\varepsilon\) denotes the residuals. Intuitively, \(\alpha\) models the average baseline change of y in relation to x, while \(\beta\) represents the dynamic genetic effect along x. The effect size \(\beta\) is multiplied by the genotype vector g, indicating that the output \(y_i\) varies between different genotype groups (\(g_i\in\{0,1,2\}\)). In fact, the effect size \(\beta(x_i)\) is additive to the baseline \(\alpha(x_i)\) at each \(x_i\), which is the same as in standard association mapping. Here statistical hypothesis testing is performed under the null hypothesis of \(\delta_g=0\), as the strength of genetic association is determined by \(\delta_g\).
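To make model (1) concrete, the sketch below simulates one data set from it with NumPy. Everything numeric here is an assumption for illustration: the one-dimensional squared-exponential kernel, the parameter values, the donor layout, the residual variance `noise_var`, and the small jitter added before the Cholesky factorization.

```python
import numpy as np


def se_kernel(x, sigma2=1.0, rho=0.1):
    """One-dimensional squared-exponential kernel matrix k(X, X)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return sigma2 * np.exp(-d2 / (2.0 * rho))


def draw_gp(cov, rng, jitter=1e-6):
    """Draw one sample from N(0, cov); jitter keeps the Cholesky factor stable."""
    chol = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return chol @ rng.standard_normal(cov.shape[0])


rng = np.random.default_rng(1)
n_donors, n_per_donor = 10, 4
N = n_donors * n_per_donor

x = rng.uniform(0.0, 1.0, size=N)                   # continuous factor (e.g., pseudotime)
donor = np.repeat(np.arange(n_donors), n_per_donor)
g = rng.integers(0, 3, size=n_donors)[donor].astype(float)  # genotype 0/1/2 per donor

Z = np.eye(n_donors)[donor]                         # N x Nd donor design matrix
R = Z @ Z.T                                         # kinship for unrelated donors
K = se_kernel(x)

delta_g, delta_d, noise_var = 0.5, 0.3, 0.1         # arbitrary illustrative values
alpha = draw_gp(K, rng)                             # baseline along x
beta = draw_gp(delta_g * K, rng)                    # dynamic genetic effect
gamma = draw_gp(delta_d * K * R, rng)               # donor-specific variation (K ⊙ R)
eps = rng.normal(0.0, np.sqrt(noise_var), size=N)   # residuals

y = alpha + beta * g + gamma + eps                  # model (1)
print(y[:5])
```

Setting `delta_g` to zero removes the genotype-dependent term, which corresponds to the null hypothesis being tested.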

It is important to note that the model (1) includes a correction term \(\gamma\) that accounts for the between-donor variation of dynamic changes along x, particularly when multiple data points are measured from the same donor or samples are taken from related donors. This term is essential for statistical calibration of the genetic effect \(\beta\), because other genetic associations scattered over the genome (trans effects) can confound the target genotype effect. Therefore, to adjust for the confounding effect, we need to include the extra GP \(\gamma\), which is drawn from a normal distribution with the covariance matrix of K multiplied by the kinship matrix R.

Here, the kinship matrix is estimated by \(\hat{R}=\sum_{l=1}^{L}\tilde{g}_l\tilde{g}_l^{\top}/L\) using genome-wide variants \(g_l\ (l=1,\ldots,L)\), where \(\tilde{g}_l\) is a standardized genotype vector (centered and scaled) based on the allele frequency at genetic variant l, while L denotes the total number of all variants across the genome [6]. The matrix is initially an \(N\times N\) dense matrix, but it can be simplified if donors are (sufficiently) unrelated. Introducing a design matrix of donor configuration, \(Z\in\mathbb{R}^{N\times N_d}\), for the \(N_d\) donors (i.e., \(z_{ij}=1\) if the sample i is taken from the donor j; otherwise \(z_{ij}=0\)), the kinship matrix can then be approximated as \(R=ZZ^{\top}\). Thus, \(\gamma\) can be expressed as a linear combination of \(N_d\) independent GPs \(\{\gamma_j\sim\mathcal{N}(0,\delta_d K);\ j=1,\ldots,N_d\}\), such that \(\gamma=\sum_{j=1}^{N_d}\gamma_j\odot z_j\), where \(z_j\) denotes the jth column vector of Z. This approximation is particularly useful for parameter estimation with large \(N_d\) (as discussed in section 2.4).
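The kinship estimate and its block approximation can be compared numerically. The sketch below simulates genotypes for three unrelated donors with two samples each; the allele frequencies, sample layout, and the use of the true simulated frequencies for standardization are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_donors, n_var = 3, 2000
donor = np.array([0, 0, 1, 1, 2, 2])          # sample-to-donor assignment (N = 6)

# Simulated allele frequencies and donor genotypes (0/1/2) at L = n_var variants.
p = rng.uniform(0.2, 0.8, size=n_var)
G_donor = rng.binomial(2, p, size=(n_donors, n_var)).astype(float)
G = G_donor[donor]                            # samples from one donor share genotypes

# Standardize each variant by its allele frequency: mean 2p, variance 2p(1 - p).
G_std = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

# Kinship estimate: average of the outer products over variants.
R_hat = G_std @ G_std.T / n_var

# Donor design matrix Z (N x Nd) and the approximation R = Z Z^T.
Z = np.eye(n_donors)[donor]
R_approx = Z @ Z.T

print(np.round(R_hat, 2))
print(R_approx)
```

For unrelated donors the estimate is close to 1 within a donor and close to 0 between donors, which is the structure that \(ZZ^{\top}\) encodes exactly.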

When the sample size N is large, an ordinary GP faces a severe scalability issue because the dense matrix K has dimension \(N\times N\), resulting in a total computational cost of \(\mathcal{O}(N^{3})\). As a result, the application of GP in the GWAS field is hindered, as sample sizes often reach a million these days. However, there are several alternatives to approximate the full GP model, including the Nyström approximation (low-rank approximation), the Projected Process approximation [13], the Sparse Pseudo-inputs GP [14], the Fully Independent Training Conditional approximation and the Variational Free Energy approximation [15]. In this section, we introduce a sparse GP approximation proposed by [16].

The sparse GP is a scalable model using the technique of inducing points [14]. Since the computational cost of the sparse GP is \(\mathcal{O}(NM^{2})\) with M inducing points, we can greatly reduce the computational cost, which is essentially linear in N under the assumption of \(M\ll N\). Denoting the M inducing points by \(T=(t_1,\ldots,t_M)^{\top}\) and the corresponding GP values by \(u=(u(t_1),\ldots,u(t_M))^{\top}\), the joint distribution of f and u becomes a multivariate normal distribution. Therefore, a lower bound of the conditional distribution \(p(y\mid u)\) can be written as

$$\begin{aligned}\log p(y\mid u) &= \log \int p(y\mid f)\,p(f\mid u)\,df \ge \int \left[\log p(y\mid f)\right]p(f\mid u)\,df\\ &= \log \mathcal{N}(y\mid \bar{f},\sigma^{2}I)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}\equiv \mathcal{L}_{1},\end{aligned}$$

where

$$\bar{f}=K_{NM}K_{MM}^{-1}u,\quad \tilde{K}_{NN}=K_{NN}-K_{NM}K_{MM}^{-1}K_{MN},$$

and

$$K_{NN}=k(X,X),\quad K_{NM}=k(X,T),\quad K_{MM}=k(T,T).$$

Therefore, the marginal distribution of the output y is approximated by

$$\begin{aligned}p(y) &= \int p(y\mid u)\,p(u)\,du \ge \int \exp\{\mathcal{L}_{1}\}\,p(u)\,du\\ &= \exp\left\{\log \mathcal{N}(y\mid 0,V)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}\right\}\equiv \exp\{\mathcal{L}_{2}\},\end{aligned}$$

where \(V=\sigma^{2}I+K_{NM}K_{MM}^{-1}K_{MN}\). The lower bound \(\mathcal{L}_{2}\) is referred to as the Titsias bound and can be used for parameter estimation as well as statistical hypothesis testing.
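The Titsias bound can be evaluated directly from its definition and compared with the exact log marginal likelihood of the full GP. The sketch below does this on simulated one-dimensional data. The kernel, the data-generating function, the evenly spaced inducing points, and the jitter are assumptions of the example, and the same \(\sigma^{2}\) is used for the kernel variance and the residual variance because that is how V is written above.

```python
import numpy as np


def se_kernel(a, b, sigma2, rho):
    """Squared-exponential kernel between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma2 * np.exp(-d2 / (2.0 * rho))


def log_mvn_zero_mean(y, cov):
    """log N(y | 0, cov), computed through a Cholesky factorization."""
    chol = np.linalg.cholesky(cov)
    a = np.linalg.solve(chol, y)
    return (-0.5 * (a @ a) - np.log(np.diag(chol)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))


def titsias_bound(y, x, t, sigma2, rho, jitter=1e-8):
    """L2 = log N(y | 0, V) - tr(K~_NN) / (2 sigma2), with V = sigma2 I + K_NM K_MM^-1 K_MN."""
    K_nm = se_kernel(x, t, sigma2, rho)
    K_mm = se_kernel(t, t, sigma2, rho) + jitter * np.eye(t.size)
    Q = K_nm @ np.linalg.solve(K_mm, K_nm.T)
    V = sigma2 * np.eye(y.size) + Q
    trace_k_tilde = sigma2 * y.size - np.trace(Q)   # tr(K_NN) = N * sigma2 for this kernel
    return log_mvn_zero_mean(y, V) - trace_k_tilde / (2.0 * sigma2)


rng = np.random.default_rng(3)
N, M, sigma2, rho = 50, 8, 0.5, 0.05
x = np.sort(rng.uniform(0.0, 1.0, size=N))
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, np.sqrt(sigma2), size=N)
t = np.linspace(0.0, 1.0, M)                        # inducing points (evenly spaced)

bound = titsias_bound(y, x, t, sigma2, rho)
exact = log_mvn_zero_mean(y, sigma2 * np.eye(N) + se_kernel(x, x, sigma2, rho))
print(f"Titsias bound = {bound:.3f}, exact log marginal = {exact:.3f}")
```

The bound never exceeds the exact value; the gap reflects how well the M inducing points summarize the full covariance.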

Selecting the optimal number of inducing points M and their coordinates is crucial for accurately approximating a GP. Although a larger value of M provides a better approximation of GP, it is not feasible to increase M when N reaches hundreds of thousands in large-scale genetic association studies. Additionally, the accuracy of the GP is influenced by the complexity of nonlinearity of y and the dimension Q of input points x. There are a few approaches for inferring an optimal value of M from data [17], but the size of the example used in the study is too small (48 genes × 437 samples) to be applied to real-world data. However, it is worth noting that the optimal coordinates of inducing points with a fixed M can be easily learned from data, as described in the next section.

Genetic association mapping involves performing tens of millions of hypothesis tests. Therefore, it is almost impossible to estimate the parameters of GPs for each pair of trait and variant across the genome, even with the sparse approximation mentioned in the last subsection. Furthermore, both the baseline \(\alpha\) and the correction term \(\gamma\) share the characteristic length parameter \(\rho=(\rho_1,\ldots,\rho_Q)^{\top}\) and the inducing points T. This can lead to unstable optimization and prolonged parameter estimation times. To address this issue, we have previously proposed a three-step parameter estimation strategy for performing the statistical hypothesis testing [10]. In particular, optimizing \(\rho\) with respect to \(\alpha\) alone using a quasi-Newton approach (such as the BFGS method) is sufficient in the first step, because the variance explained by \(\gamma\) is typically much smaller than that explained by \(\alpha\). The three steps are:

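1. \(y=\alpha+\varepsilon\) (baseline model: \(H_0\)) to estimate \(\rho\) and T.

2. \(y=\alpha+\gamma+\varepsilon\) (baseline model: \(H_1\)) to estimate variance parameters \(\delta_d\) and \(\sigma^{2}\). Here \(\hat{\rho}\) and \(\hat{T}\) estimated in \(H_0\) are plugged into \(H_1\).

3. \(y=\alpha+\beta\odot g+\gamma+\varepsilon\) (full model: \(H_2\)) to test whether \(\delta_g=0\). Here \(\{\hat{\rho},\hat{T},\hat{\delta}_d,\hat{\sigma}^{2}\}\) estimated in \(H_0\) and \(H_1\) are used.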

Here the Titsias bounds for these models are given by

$$\mathcal{L}_{2}^{h}=\left\{\begin{array}{ll}\log \mathcal{N}(y\mid 0,V)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}, & h=H_{0},\\ \log \mathcal{N}(y\mid 0,V_{d})-\frac{1}{2\sigma^{2}}\mathrm{tr}\{(1+\delta_{d})\tilde{K}_{NN}\}, & h=H_{1},\\ \log \mathcal{N}(y\mid 0,V_{g})-\frac{1}{2\sigma^{2}}\mathrm{tr}\{(1+\delta_{d})\tilde{K}_{NN}+\delta_{g}G\tilde{K}_{NN}G\}, & h=H_{2},\end{array}\right.$$

where

$$V_{d}=V+\delta_{d}(K_{NM}K_{MM}^{-1}K_{MN})\odot R,\quad V_{g}=V_{d}+\delta_{g}GK_{NM}K_{MM}^{-1}K_{MN}G,$$

and \(G=\mathrm{diag}(g)\) denotes the diagonal matrix whose diagonal elements are given by the elements of g. The estimators \(\hat{\rho}\) and \(\hat{T}\) are obtained by maximizing \(\mathcal{L}_{2}^{H_{0}}\) with respect to \(\rho\) and T, and \(\hat{\delta}_{d}\) and \(\hat{\sigma}^{2}\) are obtained by maximizing \(\mathcal{L}_{2}^{H_{1}}\) with respect to \(\delta_d\) and \(\sigma^{2}\) given \(\hat{\rho}\) and \(\hat{T}\).

It is worth noting that, when the kinship matrix R can be expressed as \(R=ZZ^{\top}\) with a lower-rank matrix \(Z=(z_1,\ldots,z_{N_d})\) with \(N_d<N\), the covariance matrix \(V_d\) can be written as

$$V_{d}=V+\delta_{d}\left(K_{NM}K_{MM}^{-1}K_{MN}\right)\odot (ZZ^{\top})=\sigma^{2}I+AB^{-1}A^{\top},$$

where

$$\begin{aligned}A &= (C,\ \mathrm{diag}(z_{1})C,\ldots ,\mathrm{diag}(z_{N_d})C),\\ B &= \mathrm{diag}(K_{MM},\delta_{d}K_{MM},\ldots ,\delta_{d}K_{MM}),\end{aligned}$$

and \(C=K_{NM}K_{MM}^{-1}\), and B becomes an \(M(N_d+1)\times M(N_d+1)\) block diagonal matrix. Since the computational complexity of \(H_1\) or \(H_2\) is \(\mathcal{O}(N_{d}^{2}M^{2}N)\), for large \(N_d\) such that \(MN_d>N\), the total complexity is over \(\mathcal{O}(N^{3})\) and we again face the scalability issue.

However, if the donors in the data are unrelated, we can significantly reduce the memory usage and the computational burden to \(\mathcal{O}(N_{d}M^{2}N)\). This is because the matrix A becomes a sparse matrix, with \(z_{i}^{\top}z_{i'}=0\) for \(i\ne i'\), resulting in \(NM(N_d-1)\) elements out of \(NMN_d\) being 0. Additionally, non-zero elements of A are repeated and identical to the elements of C, and the block diagonal element of B is essentially \(K_{MM}^{-1}\).
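The block structure and the sparsity count for unrelated donors can be checked numerically on a toy problem. In the sketch below the kernel, sizes, and parameter values are assumptions. One caveat is stated openly: with C and B taken as \(C=K_{NM}K_{MM}^{-1}\) and \(B=\mathrm{diag}(K_{MM},\delta_dK_{MM},\ldots)\), the product that reproduces the low-rank part of \(V_d\) numerically is \(ABA^{\top}\). Whether the inverse in the text refers to a parameterization of B in terms of inverse blocks is not something this sketch can settle.

```python
import numpy as np


def se_kernel(a, b, sigma2=1.0, rho=0.1):
    """Squared-exponential kernel between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma2 * np.exp(-d2 / (2.0 * rho))


rng = np.random.default_rng(4)
n_donors, n_per_donor, M = 3, 2, 4
N = n_donors * n_per_donor
donor = np.repeat(np.arange(n_donors), n_per_donor)
Z = np.eye(n_donors)[donor]                      # N x Nd, unrelated donors
delta_d = 0.3                                    # arbitrary illustrative value

x = rng.uniform(0.0, 1.0, size=N)
t = np.linspace(0.0, 1.0, M)
K_nm = se_kernel(x, t)
K_mm = se_kernel(t, t) + 1e-8 * np.eye(M)

C = np.linalg.solve(K_mm, K_nm.T).T              # C = K_NM K_MM^{-1}
Q = C @ K_nm.T                                   # K_NM K_MM^{-1} K_MN

# Direct form of the low-rank part of V_d: Q + delta_d * Q ⊙ (Z Z^T).
direct = Q + delta_d * Q * (Z @ Z.T)

# Structured form: A = (C, diag(z_1) C, ..., diag(z_Nd) C); diag(z_j) C scales rows of C.
A = np.hstack([C] + [Z[:, [j]] * C for j in range(n_donors)])
# Block-diagonal B with blocks K_MM, delta_d K_MM, ..., delta_d K_MM.
B = np.kron(np.diag([1.0] + [delta_d] * n_donors), K_mm)
structured = A @ B @ A.T

zeros_in_donor_blocks = int(np.count_nonzero(A[:, M:] == 0))
print(np.allclose(direct, structured), zeros_in_donor_blocks)
```

The count of exact zeros in the donor blocks of A matches \(NM(N_d-1)\), which is what makes a sparse representation worthwhile when donors are unrelated.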

To perform GWAS with GP, it is crucial to reduce the computational time required to map a genetic association for each variant. The Score statistic to test \(\delta_g=0\) can be computed from the first derivative of \(\mathcal{L}_{2}^{H_{2}}\) with respect to \(\delta_g\), and the variance parameters \(\{\hat{\sigma}^{2},\hat{\delta}_{d}\}\) of \(V_d\) are estimated from \(\mathcal{L}_{2}^{H_{1}}\) only once and reused for every variant to be tested. This makes the approach well suited to testing tens of millions of variants independently. Using the fact that the first derivative of \(V_{g}^{-1}\) at \(\delta_g=0\) depends only on \(V_d\), such that

$$\left.\frac{\partial V_{g}^{-1}}{\partial \delta_{g}}\right\vert_{\delta_{g}=0}=-V_{d}^{-1}GK_{NM}K_{MM}^{-1}K_{MN}GV_{d}^{-1},$$

the Score statistic can be explicitly written as

$$S=y^{\top}\hat{V}_{d}^{-1}GK_{NM}K_{MM}^{-1}K_{MN}G\hat{V}_{d}^{-1}y, \tag{2}$$

whose distribution is the generalized \(\chi^{2}\) distribution, that is, the distribution of the weighted sum of M independent \(\chi^{2}\) statistics, such as \(\sum_{m=1}^{M}\lambda_{m}\chi_{m}^{2}\) [8, 10]. It is known that the weights \(\lambda_m\ (m=1,\ldots,M)\) are given by the non-negative eigenvalues of

$$K_{MM}^{-1/2}K_{MN}G\hat{V}_{d}^{-1}GK_{NM}K_{MM}^{-\top /2},$$

where \(K_{MM}^{-1/2}\) can be computed using the Cholesky decomposition of \(K_{MM}=K_{MM}^{\top /2}K_{MM}^{1/2}\).

To compute the p-value from S, we can use the Davies exact method, implemented in the CompQuadForm package for R. Note that, if we use a linear kernel, S can be simplified as described in [8]. Although the Score-based approach is an easy and quick solution for genome-wide mapping, we should use a QQ-plot to check the asymptotic behavior and the statistical calibration of the Score statistics, verifying that the p-values obtained from multiple variants follow a uniform distribution under the null hypothesis.
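The Score statistic (2) and its weights can be computed with a few linear solves. The sketch below uses simulated data and, for brevity, drops the donor term so that \(V_d=\sigma^{2}I+K_{NM}K_{MM}^{-1}K_{MN}\); both choices are assumptions of the example. NumPy's lower-triangular Cholesky factor is used for the whitening step, and the p-value is a plain Monte Carlo stand-in for illustration, not the Davies method.

```python
import numpy as np


def se_kernel(a, b, sigma2=1.0, rho=0.1):
    """Squared-exponential kernel between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma2 * np.exp(-d2 / (2.0 * rho))


rng = np.random.default_rng(5)
N, M, sigma2 = 30, 5, 1.0
x = rng.uniform(0.0, 1.0, size=N)
t = np.linspace(0.0, 1.0, M)
g = rng.integers(0, 3, size=N).astype(float)     # genotypes 0/1/2 (simulated)
G = np.diag(g)

K_nm = se_kernel(x, t, sigma2)
K_mm = se_kernel(t, t, sigma2) + 1e-8 * np.eye(M)
Q = K_nm @ np.linalg.solve(K_mm, K_nm.T)
V_d = sigma2 * np.eye(N) + Q                     # donor term omitted (assumption)

# Simulate y under the null hypothesis delta_g = 0.
y = np.linalg.cholesky(V_d) @ rng.standard_normal(N)

# Score statistic S = y' V^-1 G K_NM K_MM^-1 K_MN G V^-1 y.
u = K_nm.T @ (g * np.linalg.solve(V_d, y))       # K_MN G V^-1 y
S = u @ np.linalg.solve(K_mm, u)

# Weights: eigenvalues of L^-1 (K_MN G V^-1 G K_NM) L^-T with K_MM = L L'.
P = K_nm.T @ G @ np.linalg.solve(V_d, G @ K_nm)
L = np.linalg.cholesky(K_mm)
W = np.linalg.solve(L, np.linalg.solve(L, P).T)
lam = np.linalg.eigvalsh((W + W.T) / 2.0)

# Monte Carlo approximation of P(sum_m lam_m * chi2_1 >= S).
draws = rng.chisquare(1, size=(100_000, M)) @ lam
p_value = float(np.mean(draws >= S))
print(f"S = {S:.3f}, weights = {np.round(lam, 3)}, Monte Carlo p = {p_value:.3f}")
```

Only `g` changes from variant to variant, so the factorization of \(V_d\) can be computed once and reused across the genome.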

If the colocalisation analysis [18] or a Bayesian hierarchical model [19] is considered as a downstream analysis using the test statistics, a Bayes factor can also be computed using the Titsias bounds, such as

$$\log (BF)=\mathcal{L}_{2}^{H_{2}}-\mathcal{L}_{2}^{H_{1}}.$$

Here we would use some empirical values \(\delta_g=\{0.01, 0.1, 0.5\}\) to average the Bayes factor, instead of integrating out \(\delta_g\) from \(\mathcal{L}_{2}^{H_{2}}\) [20].
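A minimal sketch of the averaged Bayes factor follows, using the Titsias bounds for \(H_1\) and \(H_2\) on simulated data. The kernel, the donor layout, and the values of \(\sigma^{2}\) and \(\delta_d\) (treated here as if already estimated under \(H_1\)) are assumptions. Averaging is done on the Bayes factor scale with equal weights for the three grid values, which is one plausible reading of the text.

```python
import numpy as np


def se_kernel(a, b, sigma2=1.0, rho=0.1):
    """Squared-exponential kernel between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma2 * np.exp(-d2 / (2.0 * rho))


def log_mvn_zero_mean(y, cov):
    """log N(y | 0, cov), computed through a Cholesky factorization."""
    chol = np.linalg.cholesky(cov)
    a = np.linalg.solve(chol, y)
    return (-0.5 * (a @ a) - np.log(np.diag(chol)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))


rng = np.random.default_rng(6)
n_donors, n_per_donor, M = 15, 2, 5
N = n_donors * n_per_donor
donor = np.repeat(np.arange(n_donors), n_per_donor)
Z = np.eye(n_donors)[donor]
R = Z @ Z.T
x = rng.uniform(0.0, 1.0, size=N)
t = np.linspace(0.0, 1.0, M)
g = rng.integers(0, 3, size=n_donors)[donor].astype(float)
G = np.diag(g)

sigma2, delta_d = 1.0, 0.3                       # placeholders for H1 estimates
K_nm = se_kernel(x, t, sigma2)
K_mm = se_kernel(t, t, sigma2) + 1e-8 * np.eye(M)
Q = K_nm @ np.linalg.solve(K_mm, K_nm.T)
K_tilde = se_kernel(x, x, sigma2) - Q
V_d = sigma2 * np.eye(N) + Q + delta_d * Q * R
y = np.linalg.cholesky(V_d) @ rng.standard_normal(N)   # simulated under H1


def bound_h1():
    """Titsias bound for H1."""
    return log_mvn_zero_mean(y, V_d) - (1.0 + delta_d) * np.trace(K_tilde) / (2.0 * sigma2)


def bound_h2(delta_g):
    """Titsias bound for H2 at a fixed delta_g."""
    V_g = V_d + delta_g * G @ Q @ G
    tr = (1.0 + delta_d) * np.trace(K_tilde) + delta_g * np.trace(G @ K_tilde @ G)
    return log_mvn_zero_mean(y, V_g) - tr / (2.0 * sigma2)


grid = np.array([0.01, 0.1, 0.5])
log_bf = np.array([bound_h2(d) - bound_h1() for d in grid])

# Average the Bayes factors (not their logs) in a numerically stable way.
m = log_bf.max()
log_bf_avg = m + np.log(np.mean(np.exp(log_bf - m)))
print(np.round(log_bf, 3), round(float(log_bf_avg), 3))
```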

In real genetic association mapping, most genetic associations are indeed static and ubiquitous over the factor x. To capture such a static association, we can come up with the following model

$$y=\alpha_{0}1_{N}+\alpha +\beta_{0}g+\beta \odot g+\gamma_{0}+\gamma +\varepsilon ,$$

where \(\alpha_0\) denotes the intercept, \(1_N\) denotes the N-dimensional vector of all 1s, \(\beta_0\) denotes the effect size of the static genetic association, and \(\gamma_{0}\sim \mathcal{N}(0,\sigma^{2}\delta_{d0}R)\) denotes the donor variation which confounds \(\beta_0\). For instance, in [8], the static genetic association \(\beta_0\) is modeled as a fixed effect, and the dynamic effect \(\beta\) is tested using the Score statistic. On the other hand, in [10], the authors modeled both the static and dynamic associations as a random effect to test via a Bayes factor. In this case, the covariance matrix K can be rewritten as

$$K^{*}=\sigma^{2}e^{-\rho_{0}}1_{N}1_{N}^{\top}+K$$

to estimate the model parameters in (1), and then the null hypothesis \(\delta_g=0\) for the variance of \(\beta\) is tested.
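The combined kernel \(K^{*}\) simply adds a constant offset to every entry of K, as the short NumPy sketch below shows. The function name and the toy inputs are assumptions for illustration.

```python
import numpy as np


def static_plus_dynamic_kernel(K, sigma2, rho0):
    """K* = sigma2 * exp(-rho0) * 1_N 1_N^T + K."""
    ones = np.ones((K.shape[0], 1))
    return sigma2 * np.exp(-rho0) * (ones @ ones.T) + K


# Toy dynamic kernel on four points (squared exponential, arbitrary parameters).
x = np.array([0.0, 0.3, 0.6, 1.0])
sigma2, rho, rho0 = 1.0, 0.1, 0.5
K = sigma2 * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * rho))

K_star = static_plus_dynamic_kernel(K, sigma2, rho0)
print(np.round(K_star - K, 4))
```

A large \(\rho_0\) shrinks the constant term towards zero, so the static component vanishes and \(K^{*}\) reduces to K.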

Note here that the kernel parameter \(\rho_0\) is not necessarily common and shared across \(\alpha\), \(\beta\) and \(\gamma\). Indeed, in [10], the authors estimated \(\hat{\rho}_{0}^{\alpha}\) and \(\hat{\rho}_{0}^{\gamma}\) independently in \(\mathcal{L}_{2}^{H_{1}}\). To compute the Score statistic, the authors assumed that \(\hat{\rho}_{0}^{\beta}=\hat{\rho}_{0}^{\gamma}\) for \(\beta\) and \(\gamma\), because the ratio of the static effect to the dynamic effect can be the same for cis and trans genetic effects.

In longitudinal studies, the factor x is typically observed explicitly (e.g., donor's age or physical locations where samples were taken). This makes it straightforward to perform genetic association mapping along x using the Score statistics or Bayes factors, as described above. However, this is often not the case for molecular studies, and therefore we need to estimate the underlying biological state from the data.

In single-cell biology, the hidden cellular state x is often referred to as "pseudotime", and principal component analysis is normally used to estimate it as part of dimension reduction [21]. The Gaussian process latent variable model (GPLVM) is a strong alternative to extract the pseudotime when the molecular phenotype gradually changes along pseudotime x in a nonlinear fashion [22, 23].

We have also proposed a GPLVM that uses the baseline model \(H_0\) to estimate the latent variable X from the single-cell RNA-seq data (see Section 3 for more details). Let \(Y=(y_1,\ldots,y_J)\) be the gene expression matrix of J genes, whose jth column is the vector of gene expression for the gene j. The Titsias lower bound of the GPLVM based on the baseline model \(H_0\) can be written as

$$\log p(Y\mid X)\ge \log \mathcal{MN}(Y\mid 0,\Sigma ,I+K_{NM}K_{MM}^{-1}K_{MN})-\frac{J}{2}\mathrm{tr}\{\tilde{K}_{NN}\}=\mathcal{L}_{2}.$$

To obtain the optimal cellular state \(\hat{X}\), this lower bound can be maximized with respect to \(\{\Sigma, X, T, \rho\}\) [10, 24]. Here \(\Sigma =\mathrm{diag}(\sigma_{j}^{2};\ j=1,\ldots ,J)\) denotes the residual variance parameters of J genes, and \(\mathcal{MN}(\cdot)\) denotes the matrix normal distribution. To ensure the uniqueness of the model parameters, the variance parameter in the kernel function is set to \(\sigma^{2}=1\). In addition, to maintain the uniqueness of the latent variable estimation, a prior probability on X is required. It is quite common to assume independent standard normal distributions for each of the elements of \(X\sim \mathcal{MN}(0,I,I)\) [24], although there are multiple alternatives to consider depending on the nature of the modeled data [10, 23].

In the parameter estimation, the limited-memory BFGS method can be used to implement GPLVM for large N. In addition, the stochastic variational Bayes approach can be used to fit GPLVM to larger data sets, while reducing the fitting time [25,26,27].

For a non-Gaussian output y, the Titsias bound \(\mathcal{L}_{2}\) is not analytically available. However, for the Poisson case, a lower bound of the conditional probability p(y | u) can be computed as follows:

$$\mathcal{L}_{1}=\sum_{i}\left[-\log (y_{i}!)+y_{i}\bar{f}_{i}-\exp \left(\bar{f}_{i}+\frac{\tilde{k}_{ii}}{2}\right)\right],$$

where \(\tilde{k}_{ii}\) denotes the ith diagonal element of \(\tilde{K}_{NN}\). Let \(\nu_i\) and \(w_i\) be the working response and the iterative weight of the GLM for the ith sample, such that

$$\nu_{i}=\bar{f}_{i}+(y_{i}-w_{i})/w_{i}\quad \mathrm{and}\quad w_{i}=\exp \left(\bar{f}_{i}+\frac{\tilde{k}_{ii}}{2}\right)$$

for \(i=1,\ldots ,N\), the optimal \(\hat{u}\) which maximizes \(\exp \{\mathcal{L}_{1}\}p(u)\) satisfies

$$\left(K_{MM}^{-1}+K_{MM}^{-1}K_{MN}WK_{NM}K_{MM}^{-1}\right)u=W\nu , \tag{3}$$

where \(W=\mathrm{diag}(w_i; i=1,\ldots ,N)\), which suggests

$$\nu \mid u \sim \mathcal{N}(\bar{f},W^{-1})$$

as described elsewhere [28]. Therefore, we can maximize

$$\mathcal{L}_{2}=\mathcal{N}(\nu \mid 0,W^{-1}+K_{NM}K_{MM}^{-1}K_{MN})$$

with respect to {2, } where \(u=\hat{u}\) is iteratively updated as in (3). Thus, to obtain the Score statistic for non-Gaussian y, we replace \(y=\nu\) and \(\hat{V}_{d}=W^{-1}+AB^{-1}A\) in (2).
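The Poisson quantities defined above are simple to evaluate once the per-sample values are available. The following minimal sketch computes the bound, the working responses, and the iterative weights for a handful of samples; the function name and the toy inputs are hypothetical, and the values of f-bar and k-tilde are taken as given rather than derived from a fitted model.

```python
import math

def poisson_bound_terms(y, f_bar, k_tilde_diag):
    """Evaluate the Poisson bound L1 with the GLM working responses and weights.

    y            : observed counts y_i
    f_bar        : per-sample values of f-bar_i
    k_tilde_diag : per-sample diagonal elements k-tilde_ii
    """
    L1 = 0.0
    nu, w = [], []
    for y_i, f_i, k_ii in zip(y, f_bar, k_tilde_diag):
        w_i = math.exp(f_i + k_ii / 2.0)               # iterative weight
        nu_i = f_i + (y_i - w_i) / w_i                 # working response
        L1 += -math.lgamma(y_i + 1) + y_i * f_i - w_i  # log(y!) = lgamma(y + 1)
        w.append(w_i)
        nu.append(nu_i)
    return L1, nu, w

L1, nu, w = poisson_bound_terms(
    y=[0, 2, 5], f_bar=[0.0, 0.5, 1.5], k_tilde_diag=[0.0, 0.1, 0.2]
)
```

The diagonal of the inverse weight matrix is then `[1.0 / w_i for w_i in w]`, which is the extra variance term added to the low-rank covariance when the working response is treated as Gaussian.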

A binary output y is more complicated than the Poisson case, because the \(\mathcal{L}_{1}\) bound cannot be computed analytically with either a logit or a probit link function. For the logit link function, several useful alternatives to the \(\mathcal{L}_{1}\) bound have been proposed [29]. For the probit link function, [30] proposed an approximation of \(\mathcal{L}_{1}\) using Gauss-Hermite quadrature. However, in both cases, the computational cost is much higher than in the Poisson case, and it is rather impractical to conduct a large genome-wide association mapping at this moment.
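To give a sense of what a Gauss-Hermite approximation involves, the sketch below approximates the expectation of log Φ(f) when f is Gaussian, which is the kind of term that has no closed form under a probit link. This is one generic way to use the quadrature and is not necessarily the construction in [30]; the function name and the number of nodes are assumptions.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expected_log_probit(mean, var, n_nodes=20):
    """Approximate E[log Phi(f)] for f ~ N(mean, var) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)            # nodes/weights for weight exp(-x^2)
    f = mean + math.sqrt(2.0 * var) * nodes        # change of variables to N(mean, var)
    # Phi(z) = 0.5 * erfc(-z / sqrt(2)) is the standard normal CDF
    log_phi = np.array([math.log(0.5 * math.erfc(-z / math.sqrt(2.0))) for z in f])
    return float(np.sum(weights * log_phi) / math.sqrt(math.pi))

value = expected_log_probit(mean=0.0, var=1.0)
```

Each sample needs its own sum over the quadrature nodes at every iteration, which gives some intuition for why the binary case costs more than the Poisson case, where the expectation is available in closed form.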

See original here:
Genetic association mapping leveraging Gaussian processes | Journal of Human Genetics - Nature.com

Posted in Human Genetics | Comments Off on Genetic association mapping leveraging Gaussian processes | Journal of Human Genetics – Nature.com

‘Fossil viruses’ embedded in the human genome linked to psychiatric disorders – Livescience.com

Posted: June 4, 2024 at 2:49 am

Ancient viral DNA embedded in the human genome may boost people's susceptibility to neuropsychiatric disorders, such as depression, bipolar disorder and schizophrenia.

A study published in May in the journal Nature Communications zoomed in on human endogenous retroviruses (HERVs), snippets of DNA that form approximately 8% of the modern human genome.

Psychiatric disorders tend to run in families, and studies of twins have also hinted that genetics plays a role in whether people develop them. Estimates suggest that schizophrenia and bipolar disorder may have a heritability as high as 80%, meaning most of the variability seen in these disorders comes down to differences in people's genetics.

Specific versions of genes, or gene variants, have been tied to these disorders, but not much is known about the influence of HERVs.

"We were fascinated by the concept that [HERVs] existed in the human genome and so much was not known about them," study co-author Timothy Powell, a neuroscientist and molecular geneticist at King's College London, told Live Science.

HERVs are bits of viruses that have been woven into the human genome over evolutionary time, with the oldest examples introduced to our ancestors over 1.2 million years ago. Some HERVs are known to be switched on in cancer cells, and they may contribute to the disease; others are active in healthy tissues or play important roles in early development, so they're not necessarily all bad. Some HERVs are even active in the brain, but it's not yet clear what they're up to.

Previously, scientists have studied the role of HERVs in psychiatric disorders by comparing the genetic material of individuals without such disorders with that of people affected by a given disorder. A drawback of this method, however, is that it doesn't account for the influence of environmental factors or other conditions a person may have. This makes it difficult to say with certainty that a given stretch of DNA, in isolation, is strongly associated with the disorder.

The new study used a different approach to weigh the effects of thousands of HERVs. The researchers accessed genetic data from previous studies that involved tens of thousands of people, as well as from postmortem brain tissue samples collected from nearly 800 patients with and without psychiatric disorders. They then studied which gene variants different individuals carried, noting whether they seemed to affect nearby HERVs.

They found that specific gene variants were associated with a higher risk of three psychiatric disorders: schizophrenia, depression and bipolar disorder. These variants also affected whether HERVs in the brain were "switched on" and to what degree.

"This [association] gives us much more certainty that the genetic differences we're seeing between cases and controls are more likely to be a true reflection of the biology of the disorder," Rodrigo Duarte, a research fellow at King's College London, told Live Science.

The team is the first to identify five new HERVs strongly tied to psychiatric disorders. Two were associated with schizophrenia, one was common to schizophrenia and bipolar disorder, and one was specific to major depressive disorder. These five HERVs are distinct from any previously linked with each of the conditions.

"It is a major advancement," said Dr. Avindra Nath, clinical director at the National Institute of Neurological Disorders and Stroke who was not involved in the study. "The way that we've been studying all these other neurological diseases, we need to look at them again using their technique," Nath told Live Science.

The study suggests that these HERVs enhance the chances of developing the disorders, but at this point, not much can be said for how much these genetic snippets boost an individual person's risk. Carrying one of the HERVs doesn't necessarily guarantee a person will be affected by the linked disorder.

Going forward, the group plans to manipulate HERV activity in brain cells in lab dishes to see whether they affect the way the neurons grow and form connections.

"From a genetic standpoint, it's an advancement of the field," Nath said. "But from a pathogenesis standpoint, much remains to be answered" about how the HERVs actually contribute to disease.

Here is the original post:
'Fossil viruses' embedded in the human genome linked to psychiatric disorders - Livescience.com

Posted in Human Genetics | Comments Off on ‘Fossil viruses’ embedded in the human genome linked to psychiatric disorders – Livescience.com

Improved functional mapping of complex trait heritability with GSA-MiXeR implicates biologically specific gene sets – Nature.com

Posted: June 4, 2024 at 2:49 am

Read the original here:
Improved functional mapping of complex trait heritability with GSA-MiXeR implicates biologically specific gene sets - Nature.com

Posted in Human Genetics | Comments Off on Improved functional mapping of complex trait heritability with GSA-MiXeR implicates biologically specific gene sets – Nature.com

1st draft of a human ‘pangenome’ published, adding millions of …

Posted: May 17, 2023 at 12:13 am

Scientists have published the first human "pangenome," a full genetic sequence that incorporates genomes from not just one individual, but 47.

These 47 individuals hail from around the globe and thus vastly increase the diversity of the genomes represented in the sequence, compared to the previous full human genome sequence that scientists use as their reference for study. The first human genome sequence was released with some gaps in 2003 and only made "gapless" in 2022. If that first human genome is a simple linear string of genetic code, the new pangenome is a series of branching paths.

The ultimate goal of the Human Pangenome Reference Consortium, which published the first draft of the pangenome on Wednesday (May 10) in the journal Nature, is to sequence at least 350 individuals from different populations around the world. Although 99.9% of the genome is the same from person to person, there is a lot of diversity found in that final 0.1%.

"Rather than using a single genome sequence as our coordinate system, we should instead have a representation that is based on the genomes of many different people so we can better capture genetic diversity in humans," Melissa Gymrek, a genetics researcher at the University of California, San Diego, who was not involved in the project, told Live Science.

The first full human genome sequence was completed in 2003 by the Human Genome Project and was based on one person's DNA. Later, bits and pieces from about 20 other individuals were added, but 70% of the sequence scientists use to benchmark genetic variation still comes from a single person.

Geneticists use the reference genome as a guide when sequencing pieces of people's genetic codes, Arya Massarat, a doctoral student in Gymrek's lab who co-authored an editorial about the new research with her in the journal Nature, told Live Science. They match the newly decoded DNA snippets to the reference to figure out how they fit within the genome as a whole. They also use the reference genome as a standard to pinpoint genetic variations, meaning different versions of genes that diverge from the reference, that might be linked with health conditions.

But with a single reference mostly from one person, scientists have only a limited window of genetic diversity to study.

The first pangenome draft now doubles the number of large genome variants, known as structural variants, that scientists can detect, bringing them up to 18,000. These are places in the genome where large chunks have been deleted, inserted or rearranged. The new draft also adds 119 million new base pairs, meaning the paired "letters" that make up the DNA sequence, and 1,115 new gene duplication mutations to the previous version of the human genome.

"It really is understanding and cataloging these differences between genomes that allow us to understand how cells operate and their biology and how they function, as well as understanding genetic differences and how they contribute to understanding human disease," study co-author Karen Miga, a geneticist at the University of California, Santa Cruz, said at a press conference held May 9.

The pangenome could help scientists get a better grasp of complex conditions in which genes play an influential role, such as autism, schizophrenia, immune disorders and coronary heart disease, researchers involved with the study said at the press conference.

For example, the Lipoprotein A gene is known to be one of the biggest risk factors for coronary heart disease in African Americans, but the specific genetic changes involved are complex and poorly understood, study co-author Evan Eichler, a genomics researcher at the University of Washington in Seattle, told reporters. With the pangenome, researchers can now more thoroughly compare the variation in people with heart disease and without, and this could help clarify individuals' risk of heart disease based on what variants of the gene they carry.

The current pangenome draft used data from participants in the 1000 Genomes Project, which was the first attempt to sequence genomes from a large number of people from around the world. The included participants had agreed for their genetic sequences to be anonymized and included in publicly available databases.

The new study also used advanced sequencing technology called "long-read sequencing," as opposed to the short-read sequencing that came before. Short-read sequencing is what happens when you send your DNA to a company like 23andMe, Eichler said. Researchers read out small segments of DNA and then stitch them together into a whole. This kind of sequencing can capture a decent amount of genetic variation, but there can be poor overlap between each DNA fragment. Long-read sequencing, on the other hand, captures big segments of DNA all at once.

While it's possible to sequence a genome with short-read sequencing for about $500, long-read sequencing is still expensive, costing about $10,000 a genome, Eichler said. The price is coming down, however, and the pangenome team hopes to sequence their next batches of genomes at half that cost or less.

The researchers are working to recruit new participants to continue to fill in diversity gaps in the pangenome, study co-author Eimear Kenny, a professor of medicine and genetics at the Institute for Genomic Health at Icahn School of Medicine at Mount Sinai in New York City, told reporters. Because genetic information is sensitive and because different rules govern data-sharing and privacy in different countries, this is delicate work. Issues include privacy, informed consent, and the possibility of discrimination based on genetic information, Kenny said.

Already, researchers are uncovering new genetic processes with the draft pangenome. In two papers published in Nature alongside the work, researchers looked at highly repetitive segments of the genome. These segments have traditionally been difficult to study, biochemist Brian McStay of the National University of Ireland Galway told Live Science, because sequencing them via short-read technology makes it hard to understand how they fit together. Long-read technology allows long chunks of these repetitive sequences to be read at once.

The studies found that in one type of repetitive sequence, known as segmental duplications, there is a larger than expected amount of variation, potentially a mechanism for the long-term evolution of new functions for genes. In another type of repetitive sequence that is responsible for building the cellular machines that create new proteins, though, the genome stays remarkably stable. The pangenome allowed researchers to discover a potential mechanism for how these key segments of DNA stay consistent over time.

"This is just the start," McStay said. "There will be a whole lot of new biology that will come out of this."

More:
1st draft of a human 'pangenome' published, adding millions of ...

Posted in Human Genetics | Comments Off on 1st draft of a human ‘pangenome’ published, adding millions of …
