
Category Archives: Genetics

Social Determinants and Genetics Work in Tandem to Drive Disparities in Breast Cancer Care – OncLive

Posted: September 8, 2022 at 2:37 am

Adana A.M. Llanos, PhD, MPH, discusses key research on the social and biological factors that influence disparities in breast cancer, how these factors work in tandem to affect patient outcomes, and how this knowledge can be deployed in the real world.

Although genetic and biologic factors play a key role in outcomes for patients with breast cancer, social and structural determinants also have a place in the equation. Addressing these genetic factors and social determinants, and thereby improving prevention and detection, can help reduce disparities in metastatic breast cancer, in which Black women have historically presented more frequently at advanced stages and had worse outcomes, according to Adana A.M. Llanos, PhD, MPH.

In a presentation during the 2022 ASCO Annual Meeting, Llanos highlighted how understanding the genetic and etiological aspects of breast cancer can help inform prevention and screening techniques. The next challenge for physicians will be implementing these ideas in community practice to better serve underrepresented patient populations.1

"Focusing on all of the different time points throughout [the cancer care] continuum will be a critical aspect of achieving more health equity and more equitable outcomes," Llanos said. "This is a major goal throughout my long-term research interests."

In an interview with OncLive, Llanos discussed other key research on the social and biological factors that influence disparities in breast cancer, how these factors work in tandem to affect patient outcomes, and how this knowledge can be deployed in the real world. She is an associate professor of Epidemiology at the Mailman School of Public Health at Columbia University and an adjunct associate professor at Rutgers School of Public Health.

Llanos: My talk was part of a discussion involving 2 selected abstracts. The session included abstracts that focused on some of the racial, ethnic, and regional disparities in metastatic breast cancer.

The lead author for the first abstract is Sachi Singhal, MD, [who explored racial and regional disparities in metastatic breast cancer].2 The second abstract focused on the variation of genetic mutations, specifically pathogenic variants in breast cancer predisposition genes, and how they're related to triple-negative breast cancer [TNBC].3 That abstract was presented by Michael J. Hall, MD, MS.

In my talk, I provided an overview of some of the existing racial and ethnic disparities that we see in breast cancer in the United States, highlighting stage distribution and focusing on the fact that the distribution of tumor stage varies by race and ethnicity. We see high rates [of advanced stages of disease] among Black women. Moreover, I talked about some of the factors related to this disparity in advanced stage diagnosis by race.

I also presented the distribution of tumor subtypes, focusing on TNBC, which tends to be among the most aggressive forms of breast cancer. We see that the incidence of TNBC is substantially higher in Black women. [Our goal is to gain a greater] understanding of how some of the racial, ethnic, and ancestral [factors of breast cancer intersect and interact with] genetics and social determinants of health to contribute to these disparities and how they impact poor outcomes among some patient groups.

This session highlighted some of the facts that clinicians already know. The abstracts [by Drs Singhal and Hall] got at the biology [of breast cancer] and the social/structural determinants that could be related and working together to impact outcomes.

I'm an epidemiologist, and I'm interested in studying social and biological factors that contribute to disparities. One of the takeaways in my talk was how we can address breast cancer disparities at multiple phases in the cancer control continuum. This includes looking at etiology and biology, which is more focused on genetics and ancestry. These are things that we cannot change, but understanding them better will give us a good sense of ways that we can address disparities.

Looking at prevention, we talk a lot about precision medicine and treatment. However, maybe we should be talking more about precision prevention, [which includes] detection and diagnosis. There are disparities [in these spaces], and [it is important to consider] how we can address those disparities.

Guideline-concordant treatment was not a major focus of my talk, but understanding the genetics behind some of these disparities will contribute to improving the treatment options and guideline-concordant treatment for patients with breast cancer. Lastly, public health is important. [We need to learn] how to deploy all this knowledge to have a broader impact on patients and communities.

[Considering] the emerging data and studies around the genetics of TNBC, one of the limitations of a lot of the existing and past research is the historical underrepresentation of racial and ethnic minority groups, especially those of African ancestry. As new studies and larger studies are initiated, it is critical to have broad representation of diverse ancestral backgrounds to [allow investigators] to get a sense of some of the genetic and etiological differences, why they exist, and how [these factors] can impact treatment.

As important as biology and genetics are, we need to consider the social determinants and structural determinants across the entire cancer care continuum, not just for breast cancer, but for all cancers.

Read more from the original source:
Social Determinants and Genetics Work in Tandem to Drive Disparities in Breast Cancer Care - OncLive

Posted in Genetics | Comments Off on Social Determinants and Genetics Work in Tandem to Drive Disparities in Breast Cancer Care – OncLive

Ticking away in the back of my mind: what does it mean to know the risk embedded in your DNA? – The Guardian

Posted: September 8, 2022 at 2:37 am

Mortality has always been on Perry Jones's mind, much more so than your average 20-something. She's dealt with a number of challenging health conditions since her teens, so when her mother urged her to be screened for the BRCA1 and BRCA2 gene variants a couple of years ago (both of which indicate a high risk of breast and ovarian cancer), she didn't exactly jump at the chance.

Jones, who has type 1 diabetes, coeliac disease and spinal development issues, speaks about her dealings with the health system in the world-weary way of someone who's been in and out of waiting rooms her whole life.

"I've got the whole wazoo. So a part of me was like, 'What's the likelihood that I'm going to have another thing? It'll be fine. There's no point.'"

But Jones's mother insisted. After all, she'd been diagnosed with breast cancer at the age of 40. "Mum said it's better to know than not to know. And if we know, then we can warn others in our family and we can look into better treatment methods for ourselves in future."

Eventually, Jones agreed to take the saliva test. "And then I forgot about it. So when I did get that phone call, to tell me I had the (BRCA1) gene, I was like, 'Oh, you've got to be kidding me.'"

Jones's results have the potential to save her life, but they have also irrevocably informed the way she views and plans for her future, regardless of whether she ever receives an eventual diagnosis. As technological advancements and decreasing costs make testing accessible to broader swathes of the population, what does it mean to know the risk embedded within our DNA?

Last month, Monash University launched DNA Screen, offering 10,000 people aged 18 to 40 secure, free DNA testing to identify risk of cancer and heart disease that can be prevented or treated early.

The study is a chance to gauge the public appetite for preventive genetic testing (as opposed to the current status quo of clinical criteria-based testing) and could help Australia become the first country to offer preventive DNA screening through a public healthcare system.

The appetite from people in this age bracket was overwhelming. The DNA Screen team had initially planned to reach young people through social media to spread the word. Instead, without any social media promotion, the website reached its target of registering 10,000 people for the at-home saliva tests within 24 hours.

"The interest is enormous," says Jane Tiller, co-lead of the project and ethical, legal and social adviser for Public Health Genomics at Monash.

DNA Screen, which is partly funded by the federal government, is attempting to pilot and demonstrate the value of population-level screening in an effort to provide greater access to genomics for everyone, similar to the mass bowel and breast cancer screening the government already funds for older Australians. Historically, the costs of genetic testing have been prohibitive, which meant it was only available to people with a family or personal history of disease, but up to 90% of people at high risk are not identified by current family history-based testing.

Although there are many genes that could be studied, the researchers picked 10 gene variants because the conditions they can lead to are medically actionable and there are already preventive measures for them: hereditary ovarian and breast cancer, Lynch syndrome and familial hypercholesterolaemia (which increases the likelihood of having coronary heart disease at a younger age).

Those found to be at high risk after DNA testing (expected to be about one in 75) will have their situation explained by experts and be offered genetic counselling and prevention measures, such as regular scans and check-ups. Given the stats, roughly 130 people from the study are likely to be found to be at high risk. But what does it mean to scale up genetic screening and introduce mass preventive testing into any health system?
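The "roughly 130" figure follows from simple arithmetic on the numbers reported for DNA Screen. The short Python sketch below is only an illustrative check, assuming the one-in-75 rate applies uniformly to all 10,000 registrants; the function name is hypothetical and not part of the study.

```python
# Illustrative check only: assumes the one-in-75 high-risk rate quoted
# for DNA Screen applies uniformly to all 10,000 registrants.
PARTICIPANTS = 10_000
HIGH_RISK_RATE = 1 / 75


def expected_high_risk(participants: int, rate: float) -> float:
    """Return the expected number of participants flagged as high risk."""
    return participants * rate


if __name__ == "__main__":
    expected = expected_high_risk(PARTICIPANTS, HIGH_RISK_RATE)
    print(f"Expected high-risk participants: about {expected:.0f}")
```

This gives about 133, consistent with the "roughly 130" estimate; the number actually identified in the study could differ.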

"Bringing genetic screening into public health has huge promise if we use it wisely," says Prof Ainsley Newson, professor of bioethics at the University of Sydney. "But there are questions to consider. For health problems where there isn't a good way to find and diagnose people, can genetics help? If a gene test exists, is it reliable in diverse populations? Does it only detect what we want to know, and nothing else? Is the health system ready to support those who are identified as at higher risk? Is there something people can do with the information it generates, and is there evidence that they will take that action?"

Tiller and her co-leads have considered those same questions. "If we were to test the whole of Australia tomorrow, that would likely identify a number of people that may start to create a strain on a service that may not be resourced to deal with that many people," she says.

"But we can't pretend that just not screening is the answer to protect the resources of the health system, because people who are at risk and develop cancer and need care will eventually need that system. And it's far better to front-load your preventive care and keep people healthy and well."

The response for the DNA Screen study indicates there is widespread demand for this information beyond people such as Jones with family histories. It is powerful and heavy knowledge. Who seeks this information out?

"It's a mix of people who are very big on preventive health, who see that connection between finding out information now and being able to do something about it, and then people who are just curious," says Tiller. "We've seen a huge increase in ancestry testing in recent years and people being interested to see what's in their genes."

"There'll always be people who say, 'I'm not interested in that. I would be too worried. I wouldn't want to know.' And that's completely a personal choice."


Communicating what the results could mean is a vital first step. Tiller says they want to ensure people understand that finding a gene is not a diagnosis of a condition and that not finding a gene doesn't mean they won't ever get cancer or heart disease.

"This isn't about fear-mongering; we really want to say to people, 'If you would like to know about this, this can empower you to take preventive steps for your own health.'"

So what does it mean for a young person to take on that information, to shape their hypothetical future with knowledge that wasn't available to any of us just a few years ago?

"For the one in 75 people who are found to be high risk, it can of course be distressing," says Tiller. "There's a lot of support that's required in the initial stages of giving people that information, giving them space to perhaps feel some distress, to grieve over what that might mean for them and to support them through the next steps of decision-making."

Every person reacts differently to what their results could mean for them and their family. For Jones, her results have meant a cascading series of future choices and consequences, all of which are hypothetical at this stage.

Protective surgery such as a double mastectomy was initially suggested, which Jones has thus far resisted. She was also told that she should consider having her ovaries removed as soon as possible. "So that changed my view of my timeline for starting a family."

Jones is also acutely aware she could pass the gene on to future children. She's single and is studying a bachelor of design that she loves. She'd like to travel after graduation, maybe land an internship, meet someone nice.

But concurrently, at the age of 28, she has already weighed up scenarios such as freezing her eggs (she's opted not to thus far); considered what she'd do if an embryo tested positive for the variant (she would abort); considered the financial implications of IVF (she'd rather conceive naturally, especially given she needs to save a deposit for a house); weighed up how she'll tell a future partner about her genetic risk (she'd be upfront); and worried about menopause and what it means for removing her ovaries ("I'm actually more worried about that than the cancer at the moment, to be honest"). "Those possibilities are a lot to deal with," she says. She's banking on her future self, "future and more mature Perry", to be able to handle them.

The knowledge she carries with her "doesn't keep me up every night, but it's definitely something ticking away in the back of my mind."

But despite all these considerations, Jones is grateful for the opportunity to be tested.

"Having the test gave me a sense of control, even if I can't control whether or not I develop cancer. I'm in control of knowing about it. I know the risks and I know what steps I can take to capture it as soon as possible if it develops."

Two years on from receiving her results, Jones is philosophical about living with what she knows. She's much more vigilant and she's made peace with having to endure extra tests.

She also reminds herself that there's a chance that she may never be diagnosed. "I guess I just accept that it's part and parcel of the body that allows me to live. So whatever it comes with, I'm just going to have to deal with. And as much as I don't like carrying these genes, it's better to be alive and have them than not at all. So I'm still thankful for this meat cage that contains my consciousness."

Excerpt from:
Ticking away in the back of my mind: what does it mean to know the risk embedded in your DNA? - The Guardian

Posted in Genetics | Comments Off on Ticking away in the back of my mind: what does it mean to know the risk embedded in your DNA? – The Guardian

How are genetic technologies being applied to combat infectious diseases in aquaculture? – The Fish Site

Posted: September 8, 2022 at 2:37 am

The CrispResist, NoLice and GenoLice research team

Results from the research could help improve the ability of whole stocks of farmed fish and shellfish to avoid and fight off infectious diseases and parasites (Image: Nofima)

Disease and parasitism cause major welfare, environmental and economic concerns for global aquaculture. A broad team of scientists has been assembled to examine the status and potential of technologies that exploit genetic variation in host resistance to tackle this problem.

In environments that contain high densities of animals or plants there is a high risk of contracting, propagating and spreading infectious disease. Diseases affecting fish and shellfish can lead to 100 percent mortality, necessitate complete destocking and/or severely affect fish welfare. Disease prevention and treatment are necessary, but current options are often costly, ineffective and can negatively impact animal welfare, local ecosystems and product quality. For example, biosecurity is particularly challenging when animals are farmed in an open water system, and logistical difficulties in handling make it challenging to vaccinate and treat individual animals.

But is there a way that we could improve the ability of whole stocks of farmed fish and shellfish to avoid and fight off infectious diseases and parasites? To answer this question we need to study the improvement of host disease resistance, which can be defined as the host's ability to reduce pathogen invasion (ie limiting pathogen entry into target tissues and pathogen replication).

The team is reviewing genetic technologies that can be used to determine the mechanisms underlying host resistance to pathogens and parasites (Image: Nicholas Robinson, Nofima)

In a new paper published in the latest issue of Reviews in Aquaculture we argue that there is an urgent need to improve understanding of the genetic mechanisms involved, leading to the development of tools that can be applied to boost host resistance and reduce the disease burden. Together with other experts on fish and shellfish breeding, genetics, genomics, proteomics, disease biology, immunology, feed technology, epidemiology, biochemistry, welfare, vaccine discovery, behaviour and gene editing, we draw on two pressing global disease problems as case studies: sea lice infestations in salmonids and white spot syndrome in shrimp, caused by white spot syndrome virus (WSSV).

We review how the latest genetic technologies can be capitalised upon to determine the mechanisms underlying host resistance to pathogens and parasites, and how the derived knowledge could be applied in ethical and efficient ways to boost disease resistance using selective breeding, gene editing, and/or with targeted feed treatments and vaccines.

Substantial research programmes are underway that aim to produce new knowledge that could be applied for boosting host resistance to eliminate or severely reduce infections by, for instance, sea lice in salmon and WSSV in shrimp.

These projects are utilising a suite of technologies enabled by ultra-high-throughput sequencing, such as single-nucleus and spatial transcriptomics and single nucleotide polymorphism (SNP) genome-wide association studies. Newly developed methodologies like in-vivo or in-vitro gene editing and functional testing hold great promise for helping to find and test genetic mechanisms affecting host resistance. These projects are also exploring the possibility of using genomic selection and gene editing with CRISPR-Cas9 to create host populations that will resist these diseases.

The implementation of these technologies needs to be carefully considered. Practical methods that will allow easy adoption, implementation and dissemination by aquaculture sectors are needed. Population genetic variability needs to be maintained, inbreeding limited and possibilities for the genetic improvement of other important traits must be ensured. Ethical concerns, particularly about the use of gene editing, need to be openly discussed and debated in public arenas, and thorough testing and safeguards (eg sterilisation) are needed to ensure that there are no negative consequences for the wild populations of these species or for the broader ecosystem.

The application of new genomic technologies and methodologies is expected to generate knowledge about genes that trigger a more effective immune response in some species or lines; the effect that could be realised by editing these genes in more susceptible species or lines; potential lice attractants, repellents and assays; and the extent of additive genetic variation affecting the production and release of important immune factors and semiochemicals.

Results from the research could help develop feed additives, gene edits and new vaccines, and enhance genomic breeding value estimation that promotes host resistance (Image: Nofima)

Such knowledge could lead to the development of feed additives, gene edits, new vaccines and the enhancement of genomic breeding value estimation to promote host resistance. The epidemiological implications of these applications for the infectivity and virulence of aquatic diseases need to be explored, and routines need to be devised to enhance the suppression of disease in the general aquaculture environment.

Such projects are ambitious in that it is hypothesised that specific semiochemical or immune pathways play major roles in differentiating disease-resistant from disease-susceptible hosts and that these differences are measurable, have a strong genetic basis, have implications for the epidemiology of infection and that genomic selection and/or gene editing approaches can be effectively and sustainably applied to reduce or eliminate the effect of disease on the host without counter-evolutionary responses by the infectious agent taking effect.

The long-term suppression of disease will only be realised through a collaborative and coordinated multi-disciplinary effort involving scientists working closely with the aquaculture industry and governments.


Such efforts are likely to significantly advance our understanding of host-parasite and host-disease interactions and mechanisms affecting resistance to disease and should result in significant economic impacts for aquaculture sectors, benefit the welfare of production animals and create ecosystem benefits for natural populations of these species.

Application of genetic technologies and approaches has potential to improve fundamental knowledge of mechanisms affecting genetic resistance and provide effective pathways for implementation that could lead to more resistant aquaculture stocks. Large collaborative research efforts provide the best chance of achieving such goals. Success applying these genetic technologies to combat infectious disease has the potential to transform global aquaculture by greatly improving animal welfare and the sustainability of production.

Dr Robinson is also South-East Asia-Pacific contact at Nofima and Melbourne Enterprise Fellow in Aquaculture with the University of Melbourne. He has worked for 18 years with Nofima, focusing on the application of genomic technologies for the genetic improvement of aquatic species.

Read more:
How are genetic technologies being applied to combat infectious diseases in aquaculture? - The Fish Site

Posted in Genetics | Comments Off on How are genetic technologies being applied to combat infectious diseases in aquaculture? – The Fish Site

CCMB zeroes in on major genetic causes of male infertility – The Hans India

Posted: September 8, 2022 at 2:37 am

Hyderabad: CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad has been researching the genetic causes of male infertility for the last two decades. According to its research, 38 per cent of infertile men have deletions of specific chromosomal regions, other chromosomal abnormalities, or mutations in their mitochondrial and autosomal genes.

CCMB's new multi-institutional study focuses on the cause of infertility in the rest of the cases, which constitutes the majority of infertility-affected men. The researchers have identified eight novel genes that were defective in these men in India. The study has been recently published online in the journal Human Molecular Genetics.

Dr Sudhakar Digumarthi, the lead author of the study, who was a PhD student of CCMB and presently a scientist at ICMR-National Institute for Research in Reproductive and Child Health in Mumbai, said, "We first sequenced all the essential regions of all genes (around 30,000 of them) using next generation sequencing in 47 well-characterised infertile men. We then validated the identified genetic changes in about 1,500 infertile men from different parts of India."

Dr Thangaraj, lead investigator of this study and presently Director of the DBT-Centre for DNA Fingerprinting and Diagnostics, Hyderabad said, "We identified a total of eight genes (BRDT, CETN1, CATSPERD, GMCL1, SPATA6, TSSK4, TSKS and ZNF318), that were not known earlier for their role in human male fertility".

He further said that they have identified variations (mutations) in these genes that cause impaired sperm production leading to male infertility. The researchers have characterised a mutation in one of the eight genes, Centrin 1 (CETN1), to understand how the mutation affects sperm production. They demonstrated the impact of CETN1 mutation in cellular models and found that the mutation arrests cell division, causing insufficient sperm production.

"This study should be a reminder to the society that half of infertility cases are due to problems in men. And many of them are due to genes that come from the parents, often mothers, of these men. It is wrong to assume a couple cannot bear children because of only the woman's fertility," remarked Dr Thangaraj.

Dr Vinay Kumar Nandicoori, Director, CCMB said, "The genetic causes established in this study can be used as potential diagnostic markers for male infertility and development of improved management strategies for male infertility".

Go here to see the original:
CCMB zeroes in on major genetic causes of male infertility - The Hans India

Posted in Genetics | Comments Off on CCMB zeroes in on major genetic causes of male infertility – The Hans India

In Brief This Week: Illumina, Interpace, Genetic Signatures, Guardant Health, More – GenomeWeb

Posted: September 8, 2022 at 2:37 am

NEW YORK – Illumina said this week that it has opened its first manufacturing site in China. The Shanghai-based facility will initially produce 16 clinical sequencing reagents. In a statement, the firm said it plans to "achieve complete localized production for its gene sequencing instruments and consumables within the next five years."

Interpace Biosciences this week announced the closing of the sale of its Pharma Services business to Flagship Biosciences for an undisclosed amount. Parsippany, New Jersey-based Interpace will use the proceeds of the transaction for working capital requirements and investments to help drive the growth of its molecular diagnostics business. The company said the disposition of its pharma services business is expected to improve operating cash flow by nearly $5 million annually.

The American Society of Human Genetics said this week that the Illumina Corporate Foundation has awarded it a one-year, $175,000 grant to support the ASHG learning center.

The web portal offers scientists access to professional education videos, webinars, workshops, and other content. ASHG said it would use the grant to implement closed captioning across its live and on-demand content. Other details were not disclosed.

Genetic Signatures this week reported fiscal year 2022 revenues of A$35.4 million, a 25 percent increase from A$28.3 million in FY 2021. The growth was driven by demand for the firm's EasyScreen SARS-CoV-2 Detection Kit, although the company said in a statement that demand for other non-COVID-19 tests has increased. The company's net income for the full year was A$3.3 million, or A$2.11 per share, compared to A$1.8 million, or A$1.23 per share, in the previous year. The Australian firm had A$36.9 million in cash and cash equivalents at the end of the fiscal year.
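The reported growth rate can be checked directly from the two revenue figures. A minimal Python sketch; the helper name `pct_change` is hypothetical and used only for illustration:

```python
def pct_change(new: float, old: float) -> float:
    """Return the percentage change from `old` to `new`."""
    return (new - old) / old * 100


# Genetic Signatures revenue figures as reported, in A$ million.
REVENUE_FY2022 = 35.4
REVENUE_FY2021 = 28.3

if __name__ == "__main__":
    growth = pct_change(REVENUE_FY2022, REVENUE_FY2021)
    print(f"Revenue growth: {growth:.1f}%")
```

This works out to roughly 25.1 percent, in line with the rounded 25 percent increase in the report.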

OpGen has been granted a 180-day extension from the Listing Qualifications Department of Nasdaq to regain compliance with the exchange's minimum bid price requirement. If at any time until Feb. 27, 2023, the bid price for OpGen's common stock closes at or above $1.00 per share for a minimum of 10 consecutive trading days, the firm will regain compliance with the rule. The firm's share price hasn't closed at $1 or higher since mid-January.
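The compliance condition described for OpGen (a closing bid of at least $1.00 for a minimum of 10 consecutive trading days) amounts to looking for a long enough run in a daily price series. The Python sketch below is a simplified illustration under that assumption only; the function name and inputs are hypothetical, and it does not model any other conditions or discretion the exchange may apply.

```python
from typing import Sequence


def regains_compliance(
    closes: Sequence[float],
    threshold: float = 1.00,
    required_days: int = 10,
) -> bool:
    """Return True if `closes` (daily closing bid prices, oldest first)
    contains at least `required_days` consecutive values >= `threshold`.

    Simplified illustration of the rule as described in the brief; not an
    implementation of Nasdaq's full listing rules.
    """
    run = 0  # length of the current streak of closes at or above threshold
    for price in closes:
        run = run + 1 if price >= threshold else 0
        if run >= required_days:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical price series: 9 qualifying days, a dip, then 10 more.
    series = [1.01] * 9 + [0.98] + [1.02] * 10
    print(regains_compliance(series))        # True
    print(regains_compliance(series[:15]))   # False: longest streak is 9 days
```

Resetting the counter on any close below the threshold keeps the check to a single pass over the data.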

Ochsner Health this week became the first healthcare system to incorporate Epic Systems' Orders and Results Anywhere integration with its genomic module. Physicians at New Orleans-based Ochsner, through the system's Precision Medicine Program, will now be able to order Tempus Health genomic tests for patients within the electronic health record system. Through the Epic EHRs, physicians can order genomic tests to identify actionable variants, in turn informing therapeutic decisions and clinical trial eligibility. In addition to Tempus, Epic has also partnered with Caris Life Sciences, Guardant Health, and Myriad Genetics to integrate biomarker testing into EHRs.

Guardant Health said this week that it has expanded its collaboration with Merck KGaA to further leverage the GuardantINFORM real-world evidence platform to help accelerate development efforts for the pharma firm's precision oncology pipeline. The expanded strategic collaboration will focus on therapy development for core cancer indications with significant unmet need.

Caris Life Sciences said this week that the Medical College of Wisconsin Cancer Center has joined its Precision Oncology Alliance, a growing network of leading cancer centers that collaborate to advance precision oncology and biomarker-driven research. MCW is the largest private research institution in Wisconsin, and its cancer center serves a distinct region that includes large, underserved populations of patients who experience significant disparities in cancer incidence and outcomes.

POA members gain access to a growing portfolio of biomarker-directed trials as well as Caris' CODEai, an industry-leading dataset with cancer treatment information and clinical outcomes data for over 275,000 patients.

The Malaysian Genomics Resource Centre Berhad said this week that it has signed a memorandum of understanding to explore opportunities for the distribution of biopharmaceutical and genomics products and services with Ajlan & Bros Medical Company. Under the MoU, the parties will explore the feasibility of Riyadh, Saudi Arabia-based Ajlan becoming a marketing and distribution representative for Malaysian Genomics for genetic screening tests, mesenchymal stem cell products, and exosome products. Ajlan will also identify commercial R&D opportunities for genome sequencing and analysis in the Middle East and North Africa region for areas such as agriculture, aquaculture, plantations, healthcare, and industrial biotechnology. In turn, Malaysian Genomics will analyze samples for genetic screening tests as well as provide Ajlan with genomic and bioinformatics expertise to bid for projects.

BioEcho Life Sciences, a Cologne, Germany-based biotech company specializing in nucleic acid extraction technology, has opened a US subsidiary in Boston. In a statement, BioEcho General Manager Lydia Willing noted that the company will provide an extensive portfolio of its products in the US and have the ability to work on specific customer needs around nucleic acid research.

In Brief This Week is a selection of news items that may be of interest to our readers but had not previously appeared on GenomeWeb.

Read more here:
In Brief This Week: Illumina, Interpace, Genetic Signatures, Guardant Health, More - GenomeWeb

Posted in Genetics | Comments Off on In Brief This Week: Illumina, Interpace, Genetic Signatures, Guardant Health, More – GenomeWeb

Atossa Genetics (ATOS) Atossa Therapeutics, Inc. to Attend the 24th Annual H.C. Wainwright Global Inves – Benzinga

Posted: September 8, 2022 at 2:37 am

SEATTLE, Sept. 07, 2022 (GLOBE NEWSWIRE) -- Atossa Therapeutics, Inc. (ATOS), a clinical-stage biopharmaceutical company seeking to develop innovative medicines in areas of significant unmet medical need in oncology and infectious disease with a current focus on breast cancer and COVID-19, announced today that Kyle Guse, General Counsel & Chief Financial Officer, will attend the 24th Annual H.C. Wainwright Global Investment Conference being held on September 12-14, 2022 at the Lotte New York Palace.

Mr. Guse will be available for one-on-one meetings. To request a meeting and to register for the conference, click below:

Annual Global Investor Conference

About Atossa Therapeutics

Atossa Therapeutics, Inc. is a clinical-stage biopharmaceutical company seeking to develop innovative medicines in areas of significant unmet medical need in oncology and infectious diseases with a current focus on breast cancer and COVID-19.

For more information, please visit www.atossatherapeutics.com

Contact:

Atossa Therapeutics, Inc.
Kyle Guse, General Counsel and Chief Financial Officer
kyle.guse@atossainc.com

See the original post here:
Atossa Genetics (ATOS) Atossa Therapeutics, Inc. to Attend the 24th Annual H.C. Wainwright Global Inves - Benzinga

Posted in Genetics | Comments Off on Atossa Genetics (ATOS) Atossa Therapeutics, Inc. to Attend the 24th Annual H.C. Wainwright Global Inves – Benzinga

Genetics – National Institute of General Medical Sciences (NIGMS)

Posted: August 30, 2022 at 3:01 am

Why do scientists study the genes of other organisms?

All living things evolved from a common ancestor. Therefore, humans, animals, and other organisms share many of the same genes, and the molecules made from them function in similar ways.

Scientists have found many genes that have been preserved through millions of years of evolution and are present in a range of organisms living today. They can study these preserved genes and compare the genomes of different species to uncover similarities and differences that improve their understanding of how human genes function and are controlled. This knowledge helps researchers develop new strategies to treat and prevent human disease. Scientists also study the genes of bacteria, viruses, and fungi for solutions to prevent or treat infection. Increasingly, these studies are offering insight into how microbes on and in the body affect our health, sometimes in beneficial ways.

Increasingly sophisticated tools and techniques are allowing NIGMS-funded scientists to ask more precise questions about the genetic basis of biology. For example, they're studying the factors that control when genes are active, the mechanisms DNA uses to repair broken or damaged segments, and the complex ways traits are passed to future generations. Another focus of exploration involves tracing genetic variation over time to detail human evolutionary history and to pinpoint the emergence of disease-related attributes. These areas of basic research will continue to build a strong foundation for more disease-targeted studies.

Read more from the original source:
Genetics - National Institute of General Medical Sciences (NIGMS)

Posted in Genetics | Comments Off on Genetics – National Institute of General Medical Sciences (NIGMS)

The genetics behind why some people get sicker with COVID-19 than others – ABC News

Posted: August 30, 2022 at 3:01 am

Norman Swan: One of the common questions that Tegan and I get about Covid is why there's so much variation in how people respond to the infection. One answer is in your genes, and there is a massive ongoing study into comparing people's genomes with how COVID-19 has affected them. Dr Gita Pathak is a team leader in what's called the COVID-19 Host Genetics Initiative. Gita is based at Yale University's School of Medicine in the United States.

Gita Pathak: Thank you for inviting me, I really appreciate it.

Norman Swan: So you're not mapping the virus here, you're mapping the people who were infected with the virus to see what happens to them and whether there are specific genes involved in their experience of the virus.

Gita Pathak: That is correct. The goal of the study is to understand the human genetic response to the viral infection which we know as COVID-19. We wanted to look at three different outcomes of COVID-19, specifically people who were critically ill from Covid, then people who were hospitalised due to Covid, and people who tested positive for Covid, so the least severe of the three definitions, and which genes might be associated with these three outcomes.

Norman Swan: And how many genomes have you managed to test?

Gita Pathak: 60 studies from 25 countries, and that resulted in close to 3 million individuals' genetic profiles, and we found a total of 23 genes that show an association with COVID-19.

Norman Swan: So, let's take severity, and this is in a European population, by and large, a Caucasian population. Have you found any consistency in genes for severe disease?

Gita Pathak: Yes, so genetic ancestry is different from what someone may identify themselves as, like ethnically or geographically. Mostly we do have genetic ancestry of European descent, but we also had people who are genetically of South Asian, East Asian, or African ancestry, and that is separate from where they are geographically or what they identify as.

Norman Swan: So this is a bit like 23andMe or Ancestry.com where you send off your genes and you find out that you are 50% Greek and you didn't think you were 50% Greek.

Gita Pathak: Correct. When we are looking at genetic profiles, it's really important to adjust for genetic ancestry and not specifically for what somebody identifies as. Some genetic variation is more common in one ancestry over others, and if we include people from these diverse ancestries, we can pick up these signals much more quickly.

Norman Swan: So, for example, it was said in the early part of the pandemic that people of South Asian origin had more severe disease and a higher risk of death. Did that pan out in your study?

Gita Pathak: We did find one of the genetic variants that was more common in South Asian populations relative to other populations, but that is just one variant. Genes tend to perform in a similar way across ancestries. They may vary based on their frequency in different ancestries, and that information helps us capture why one ancestry might be exhibiting a higher response or a softer response, but by and large all the genes we saw, they tend to have a similar effect across all ancestries.

Norman Swan: And what are these genes doing to increase your vulnerability to severe disease?

Gita Pathak: Some of the genes that we found were related to different lung functions. So, for example, we found something called SFTPD, which encodes a lung surfactant protein, and it has already been known to be associated with different pulmonary functions, and there are other studies which have shown that this specific gene is associated with respiratory distress syndrome in different populations.

Norman Swan: And just to explain, surfactant is the fluid, if you like, that lines the tubes of your lungs and keeps them open, and it's what is deficient in premature babies, causing the respiratory disease of the premature baby. So, in other words, a deficiency of this in adults may predispose you, unsurprisingly, to severe disease. The question of course on everybody's lips now is why do some people not seem to catch COVID-19? There's a group of people who appear anecdotally to be resistant. Did you find COVID-19 resistance genes?

Gita Pathak: Not in our work. Depending on how we look at the variant, the variants we find are associated with the COVID-19 outcome, but if there are people who may be on the opposite spectrum of these, so let's say who are not carriers of this, they might be generally resistant to Covid, but that specific study we haven't performed. That's a good question for later.

Norman Swan: And just finally, any therapeutic insights that might direct people towards more effective medications to treat people who've got Covid, or prevent it getting worse?

Gita Pathak: One good thing that we understand from this work is that we now have a good number of genes to specifically focus our efforts into, and now this can lead to efforts of drug repurposing or drug development. Did we find a specific drug? No, but we definitely found several targets that now could be investigated for different drugs.

Norman Swan: Gita, thank you very much for joining us.

Gita Pathak: Thank you so much for having me, I really appreciate it.

Norman Swan: Dr Gita Pathak is a team leader in the COVID-19 Host Genetics Initiative at Yale University's School of Medicine.

More here:
The genetics behind why some people get sicker with COVID-19 than others - ABC News

Posted in Genetics | Comments Off on The genetics behind why some people get sicker with COVID-19 than others – ABC News

Ambry Genetics Publishes 43,000 Patient Study Showing Combined RNA and DNA Analysis Identifies Patients Who Are High-Risk for Cancer but Would Have…

Posted: August 30, 2022 at 3:01 am

The largest RNA study ever conducted in hereditary cancer analyzed more than 43,000 patients who received Ambry's +RNAinsight testing and found that 1 in 950 had an elusive clinically actionable result that would have been missed by DNA-only testing.

Combined DNA and RNA testing identified cancer risk in an additional 1 out of 79 patients compared to DNA-only testing.

ALISO VIEJO, Calif., August 29, 2022--(BUSINESS WIRE)--Ambry Genetics, a leader in clinical diagnostic testing and a subsidiary of REALM IDx, Inc., announced today the findings of a study that showed paired RNA and DNA genetic testing, conducted at the same time, detected elusive pathogenic variants in 1 of every 950 patients that were missed by DNA testing alone. The findings, published in npj Genomic Medicine, highlight the importance of combining RNA and DNA analysis in hereditary cancer testing to give clinicians and their patients the most accurate and comprehensive genetic data needed to inform patient care and achieve the best outcomes.

According to the National Library of Medicine, as of August 2017, there were approximately 75,000 genetic tests on the market, representing 10,000 unique test types. Unfortunately, many of these DNA-only tests exclude large portions of DNA such as introns, sequences that are spliced out of the RNA transcript before it is translated into a protein. In addition to omitting large portions of introns, DNA-only testing lacks the functional context to determine whether a variant increases cancer risk, which can lead to inconclusive results. These limitations may prevent patients and their families from getting accurate results to inform their preventative or therapeutic care.

Concurrent RNA and DNA testing helps identify more patients at risk by determining if an uncertain result from DNA testing is normal or disease-causing, and expands the range of genetic testing to identify mutations that DNA-only testing misses.

"With our +RNAinsight test we were the first company to offer upfront paired DNA and RNA sequencing to give clinicians and their patients the most accurate and comprehensive information about their cancer risk," said Tom Schoenherr, CEO, Ambry Genetics. "This study confirms that conducting RNA and DNA testing together is critical to help identify high-risk individuals who would have been missed by DNA-only testing."


Previously, published evidence of the value of RNA sequencing has been limited by studies with small sample sizes and enriched cohorts. This study by Ambry is the largest to examine the impact of paired DNA and RNA analysis in hereditary cancer testing. In the study, tests from 43,524 patients who underwent paired DNA-RNA genetic testing using Ambry's +RNAinsight from March 2019 through April 2020 were examined to determine if the paired sequencing detected more pathogenic variants than DNA testing alone. The analysis identified patients who had disease-causing alterations that DNA testing alone would have misinterpreted. Examining the RNA data resolved variant findings in 549 patients (1 in 79 patients) by providing the required functional data for more accurate interpretation of splicing variants. In addition, the analysis showed that 1 of every 950 patients had a pathogenic deep intronic variant that would not have appeared in DNA testing alone.

The results from the study may underestimate the total clinical impact because some of the patients' families who are now eligible for genetic testing were not tested. In addition, the ripple effect created by these updated results extends to past and future patients. These downstream benefits were not quantified in the current study.

"This is the largest study of its kind to show the importance of RNA testing in predicting cancer risk," said Carrie Horton, senior clinical research specialist for oncology and first author of the study. "It's clear that RNA analysis has the potential to become a standard practice for genetic testing to improve hereditary cancer care."

A webinar, open to the media, genetic counselors, clinicians and other interested parties, will be conducted on Thursday, September 15 at 10 a.m. PT to review the study findings. Registration information is here.

Ambry's +RNAinsight was the first test to provide comprehensive gene coverage for RNA analysis to help classify and detect DNA variants associated with a variety of cancers including breast, ovarian, prostate, colon, pancreatic and uterine. +RNAinsight enables more accurate identification of patients with increased genetic risks for cancer, finds actionable results that may otherwise be missed and decreases the frequency of inconclusive results.

About Ambry Genetics

Ambry Genetics, a subsidiary of REALM IDx, Inc., translates scientific research into clinically actionable test results based upon a deep understanding of the human genome and the biology behind genetic disease. It is a leader in genetic testing that aims to improve health by understanding the relationship between genetics and disease. Its unparalleled track record of discoveries over 20 years, and growing database that continues to expand in collaboration with academic, corporate and pharmaceutical partners, means Ambry Genetics is first to market with innovative products and comprehensive analysis that enable clinicians to confidently inform patient health decisions.

View source version on businesswire.com: https://www.businesswire.com/news/home/20220829005605/en/

Contacts

Media Contact

Brad Lotterman
Communications Director
REALM IDx
949-401-0465
blotterman@realmidx.com

Continued here:
Ambry Genetics Publishes 43,000 Patient Study Showing Combined RNA and DNA Analysis Identifies Patients Who Are High-Risk for Cancer but Would Have...

Posted in Genetics | Comments Off on Ambry Genetics Publishes 43,000 Patient Study Showing Combined RNA and DNA Analysis Identifies Patients Who Are High-Risk for Cancer but Would Have…

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated | Scientific Reports -…

Posted: August 30, 2022 at 3:01 am

The near-perfect case of dimensionality reduction

Applying principal component analysis (PCA) to a dataset of four evenly sampled populations, the three primary colors (Red, Green, and Blue) and Black, illustrates a near-ideal example of dimensionality reduction. PCA condensed the dataset of these four samples from a 3D Euclidean space (Fig. 1B) into three principal components (PCs), the first two of which explained 88% of the variation and can be visualized in a 2D scatterplot (Fig. 1C). Here, and in all other color-based analyses, the colors represent the true 3D structure, whereas their positions on the 2D plots are the outcome of PCA. Although PCA correctly positioned the primary colors at even distances from each other and from Black, it distorted the distances between the primary colors and Black (from 1 in 3D space to 0.82 in 2D space). Thus, even in this limited and near-perfect demonstration of data reduction, the observed distances do not reflect the actual distances between the samples (which are impossible to recreate in a 2D dataset). In other words, distances between samples in a reduced-dimensionality plot do not and cannot be expected to represent actual genetic distances. Evenly increasing all the sample sizes yields identical results irrespective of the sample size (Fig. 1D,E).
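The color example above is small enough to reproduce directly. The sketch below is our own minimal reconstruction, not the authors' code: it assumes each color sample is simply its RGB coordinate (Red [1,0,0], Green [0,1,0], Blue [0,0,1], Black [0,0,0]) and computes PCA from NumPy's SVD of the mean-centered data; the helper names are hypothetical. Under these assumptions the eigenvalues are in the ratio 1 : 1 : 0.25, so PC1 and PC2 explain 2/2.25 ≈ 0.889 of the variance (reported above as 88%), the Red-Black distance shrinks from 1 to √(2/3) ≈ 0.82, the Red-Green distance (√2) is preserved, and replicating every color evenly changes nothing.

```python
import numpy as np

# Hypothetical encoding: each color sample is its RGB coordinate in 3D.
COLORS = {
    "Red": (1.0, 0.0, 0.0),
    "Green": (0.0, 1.0, 0.0),
    "Blue": (0.0, 0.0, 1.0),
    "Black": (0.0, 0.0, 0.0),
}


def pca(X, k):
    """Return coordinates on the first k PCs and the explained-variance ratio of every PC."""
    Xc = X - X.mean(axis=0)                  # mean-center each dimension
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)      # eigenvalue share of each PC
    return Xc @ vt[:k].T, explained


def color_dataset(n_per_color):
    """Even sampling: n_per_color identical samples of each color."""
    labels = [name for name in COLORS for _ in range(n_per_color)]
    return np.array([COLORS[name] for name in labels]), labels


def distance_2d(coords, labels, a, b):
    """Distance between the 2D positions of populations a and b."""
    return float(np.linalg.norm(coords[labels.index(a)] - coords[labels.index(b)]))


for n in (1, 10):
    X, labels = color_dataset(n)
    coords, explained = pca(X, k=2)
    print(
        f"n={n}: PC1+PC2 explain {explained[:2].sum():.3f}, "
        f"Red-Black {distance_2d(coords, labels, 'Red', 'Black'):.3f} (1.0 in 3D), "
        f"Red-Green {distance_2d(coords, labels, 'Red', 'Green'):.3f}"
    )
# Both lines report 0.889, 0.816 and 1.414.
```

Because PC1 and PC2 share the same eigenvalue here, their orientation within the plane is arbitrary, but the 2D distances between samples are not.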

When analyzing human populations, which harbor most of the genomic variation between continental populations (12%) with only 1% of the genetic variation distributed within continental populations39, PCA tends to position Africans, Europeans, and East Asians at the corners of an imaginary triangle, which closely resembles our color-population model and illustration. Analyzing continental populations, we obtained similar results for two even-sized sample datasets (Fig. 2A,C) and their quadrupled counterparts (Fig. 2B,D). As before, the distances between the populations remain similar (Fig. 2A-D), demonstrating that for same-sized populations, sample size does not contribute to the distortion of the results if the increase in size is proportional.

Testing the effect of even-sample sizes using two population sets. The top plots show nine populations with n=50 (A) and n=188 (B). The bottom plots show a different set of nine populations with n=50 (C) and n=192 (D). In both cases, increasing the sample size did not alter the PCs (the y-axis flip between (C) and (D) is a known phenomenon).

The extent to which different-sized populations produce results with conflicting interpretations is illustrated through a typical study case in Box 1.

Note that unlike in Figs. 1C and 3A, where Black is in the middle, in other figures the overrepresentation of certain alleles (e.g., Fig. 4B) shifts Black away from (0,0). Intuitively, this can be thought of as the most common allele (Green in Fig. 4B) repelling Black, which has three null or alternative alleles.

PCA is commonly reported as yielding a stable differentiation of continental populations (e.g., Africans vs. non-Africans, Europeans vs. Asians, and Asians vs. Native Americans or Oceanians, on the primary PCs40,41,42,43). This prompted prehistorical inferences of migrations and admixture, viewing the PCA results that position Africans, East Asians, and Europeans in three corners of an imaginary triangle as representing the post-Out of Africa event followed by multiple migrations, differentiation, and admixture events. Inferences for Amerindians or Aboriginals typically follow this reconstruction. For instance, Silva-Zolezzi et al.42 argued that the Zapotecos did not experience a recent admixture due to their location on the Amerindian PCA cluster at the Asian end of the European-Asian cline.

Here we show that the appearance of continental populations at the corners of a triangle is an artifact of the sampling scheme, since variable sample sizes can easily create alternative results as well as alternative clines. We first replicated the triangular depiction of continental populations (Fig. 3A,B) before altering it (Fig. 3C-F). Now, East Asians appear as a three-way admixed group of Africans, Europeans, and Melanesians (Fig. 3C), whereas Europeans appear on an African-East Asian cline (Fig. 3D). Europeans can also be made to appear in the middle of the plot as an admixed group of African-Asian-Oceanian origin (Fig. 3E), and Oceanians can cluster with (Fig. 3F) or without East Asians (Fig. 3E). The latter depiction maximizes the proportion of explained variance, which common wisdom would consider the correct explanation. According to some of these results, only Europeans and Oceanians (Fig. 3C) or East Asians and Oceanians (Fig. 3D) experienced the Out of Africa event. By contrast, East Asians (Fig. 3C) and Europeans (Fig. 3D) may have remained in Africa. Contrary to Silva-Zolezzi et al.'s42 claim, the same Mexican-American cohort can appear closer to Europeans (Fig. 3A) or as a European-Asian admixed group (Fig. 3B). It is easy to see that none of these scenarios stands out as more or less correct than the others.

PCA of uneven-sized African (Af), European (Eu), Asian (As), and Mexican-American (Ma) or Oceanian (Oc) populations. Fixing the sample size of Mexican-Americans and altering the sample sizes of other populations: (A) nAf=198; nEu=20; nAs=483; nMa=64 and (B) nAf=20; nEu=343; nAs=20; nMa=64 changes the results. An even more dramatic change can be seen when repeating this analysis on Oceanians: (C) nAf=5; nEu=25; nAs=10; nOc=20 and (D) nAf=5; nEu=10; nAs=15; nOc=20, and when altering their sample sizes: (E) nAf=98; nEu=25; nAs=150; nOc=24 and (F) nAf=98; nEu=83; nAs=30; nOc=15.

Reich et al.44 presented further PCA-based evidence for the Out of Africa scenario. Applying PCA to Africans and non-Africans, they reported that non-Africans cluster together at the center of African populations when PC1 was plotted against PC4 and that this "rough cluster[ing] of non-Africans is about what would be expected if all non-African populations were founded by a single dispersal out of Africa." However, examining PC1 and PC4 in Supplementary Fig. S3, we found no rough cluster of non-Africans at the center of Africans, contrary to Reich et al.'s44 claim. Remarkably, we found a rough cluster of Africans at the center of non-Africans (Supplementary Fig. S3C), suggesting that Africans were founded by a single dispersal into Africa by non-Africans. We could also infer, based on PCA, either that Europeans never left Africa (Supplementary Fig. S3D), that Europeans left Africa through Oceania (Supplementary Fig. S3B), that Asians and Oceanians never left Europe (or the other way around) (Supplementary Fig. S3F), or, since all are valid PCA results, all of the above. Unlike Reich et al.44, we do not believe that their example highlights how PCA methods can provide evidence of important migration events. Instead, our examples (Fig. 3, Supplementary Fig. S3) show how PCA can be used to generate conflicting and absurd scenarios, all mathematically correct but obviously biologically incorrect, and to cherry-pick the most favorable solution. This is an example of how vital a priori knowledge is to PCA. It is thereby misleading to present one or a handful of PC plots without acknowledging the existence of many other solutions, let alone while not disclosing the proportion of explained variance.

Three research groups sought to study the origin of Black. A previous study that employed even-sized color populations suggested that Black is a mixture of all colors (Fig. 1B-D). A follow-up study with a larger sample size (nRed=nGreen=nBlue=10) and enriched in Black samples (nBlack=200) (Fig. 4A) reached the same conclusion. However, the Black-is-Blue group suspected that the Blue population was mixed. After QC procedures, the Blue sample size was reduced, which decreased the distance between Black and Blue and supported their speculation that Black has a Blue origin (Fig. 4B). The Black-is-Red group hypothesized that the underrepresentation of Green, compared to its actual population size, masks the Red origin of Black. They comprehensively sampled the Green population and showed that Black is very close to Red (Fig. 4C). Another Black-is-Red group contributed to the debate by genotyping more Red samples. To reduce the bias from other color populations, they kept the Blue and Green sample sizes even. Their results replicated the previous finding that Black is closer to Red and thereby shares a common origin with it (Fig. 4D). A new Black-is-Green group challenged those results, arguing that the small sample size and omission of Green samples biased the results. They increased the sample sizes of the populations of the previous study and demonstrated that Black is closer to Green (Fig. 4E). The Black-is-Blue group challenged these findings on the grounds of the relatively small sample sizes that may have skewed the results and dramatically increased all the sample sizes. However, believing that they are of Purple descent, Blue refused to participate in further studies. Their relatively small cohort was explained by their isolation and small effective population size. The results of the new sampling scheme confirmed that Black is closer to Blue (Fig. 4F), and the group was praised for the large sample sizes that, no doubt, captured the actual variation in nature better than the former studies.

PCA of uneven-sized samples of four color populations. (A) nRed=nGreen=nBlue=10; nBlack=200, (B) nRed=nGreen=10; nBlue=5; nBlack=200, (C) nRed=10; nGreen=200; nBlue=50; nBlack=200, (D) nRed=25; nGreen=nBlue=50; nBlack=200, (E) nRed=300; nGreen=200; nBlue=nBlack=300, and (F) nRed=1000; nGreen=2000; nBlue=300; nBlack=2000. Scatter plots show the top two PCs. The numbers on the grey bars reflect the Euclidean distances between the color populations over all PCs. Colors include Red [1,0,0], Green [0,1,0], Blue [0,0,1], and Black [0,0,0].
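The effect of uneven sampling described above can be sketched with the sample sizes of panels (A) and (B). This is an illustrative reconstruction under our own assumptions (the same hypothetical RGB encoding, plain SVD-based PCA on the raw coordinates with no scaling or filtering), not the authors' pipeline, and the helper names are hypothetical. No sample moves in the original 3D space between the two runs; only the counts change.

```python
import numpy as np

# Hypothetical RGB encoding of the color populations listed in the caption.
COLORS = {
    "Red": (1.0, 0.0, 0.0),
    "Green": (0.0, 1.0, 0.0),
    "Blue": (0.0, 0.0, 1.0),
    "Black": (0.0, 0.0, 0.0),
}


def pca_2d(X):
    """Project mean-centered data onto its first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def distances_from_black(sizes):
    """2D PCA distance from Black to each primary color for the given sample sizes."""
    labels = [name for name, n in sizes.items() for _ in range(n)]
    coords = pca_2d(np.array([COLORS[name] for name in labels]))
    # All copies of a color share one 2D position, so the first copy suffices.
    black = coords[labels.index("Black")]
    return {
        name: float(np.linalg.norm(coords[labels.index(name)] - black))
        for name in ("Red", "Green", "Blue")
    }


scheme_a = {"Red": 10, "Green": 10, "Blue": 10, "Black": 200}  # Fig. 4A sizes
scheme_b = {"Red": 10, "Green": 10, "Blue": 5, "Black": 200}   # Fig. 4B sizes
for name, sizes in (("A", scheme_a), ("B", scheme_b)):
    dists = distances_from_black(sizes)
    print(name, {color: round(d, 3) for color, d in dists.items()})
```

With even color counts (A), Black sits about 0.82 from every primary color, as in the even-sized case. Halving Blue to 5 samples (B) pushes Blue's axis out of the top two PCs, so the 2D Black-Blue distance collapses below 0.1 while the Red and Green distances approach 1, reproducing the "Black is closer to Blue" pattern qualitatively.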

The question of who the ancestors of admixed populations are and the extent of their contribution to other groups is at the heart of population genetics. It may not be surprising that authors hold conflicting views on interpreting these admixtures from PCA. Here, we explore how an admixed group appears in PCA, whether its ancestral groups are identifiable, and how its presence affects the findings for unmixed groups through a typical study case (Box 2).

To understand the impact of parameter choices on the interpretation of PCA, we revisited the first large-scale study of Indian population history carried out by Reich et al.45. The authors applied PCA to a cohort of Indians, Europeans, Asians, and Africans using various sample sizes that ranged from 2 (Srivastava) (out of 132 Indians) to 203 (Yoruban) samples. After applying PCA to Indians and the three continental populations to exclude outliers that supposedly had more African or Asian ancestries than other samples, PCA was applied again in various settings.

At this point, the authors engaged in circular logic as, on the one hand, they removed samples that appeared via PCA to have experienced gene flow from Africa (their Note 2, iii) and, on the other hand, employed an a priori claim (unsupported by historical documents) that African history has little to do with Indian history (which must stand in sharp contrast to the rich history of gene flow from Utah (US) residents to Indians, which was equally unsupported). Reich et al. provided no justification for the exact protocol used or any discussion about the impact of using different parameter values on the resulting clusters. They then generated a plethora of conflicting PCA figures, never disclosing the proportion of explained variance along with the first four PCs examined. They then inferred based on PCA that Gujarati Americans exhibit no unusual relatedness to West Africans (YRI) or East Asians (CHB or JPT) (Supplementary Fig. S4)45. Their concluding analysis of Indians, Asians, and Europeans (Fig. 4)45 showed Indians at the apex of a triangle with Europeans and Asians at the opposite corners. This plot was interpreted as evidence of an ancestry that is unique to India and an Indian cline. Indian groups were explained to have inherited different proportions of ancestry from Ancestral North Indians (ANI), related to western Eurasians, and Ancestral South Indians (ASI), who split from Onge. The authors then followed up with additional analyses using Africans as an outgroup, supposedly confirming the results of their selected PCA plot. Indians have since been described using the terms ANI and ASI.

In evaluating the claims of Reich et al.45 that rest on PCA, we first replicated the finding of the alleged Indian cline (Fig. 5A). We next garnered support for an alternative cline using Indians, Africans, and Europeans (Fig. 5B). We then demonstrated that PCA results support Indians being European (Fig. 5C), East Asian (Fig. 5D), and African (Fig. 5E), as well as a genuinely European-Asian admixed population (Fig. 5F). Whereas the first two PCs of Reich et al.'s primary figure explain less than 8% of the variation (according to our Fig. 5A; Reich et al.'s Fig. 4 does not report this information), four out of five of our alternative depictions explain 8-14% of the variation. Our results also expose the arbitrariness of the scheme used by Reich et al. and show how radically different clustering can be obtained merely by manipulating the non-Indian populations used in the analyses. Our results also question the authors' choice of an analysis that explained such a small proportion of the variation (let alone not reporting it) and yielded no support for a unique ancestry to India, and they cast doubt on the reliability and usefulness of the ANI-ASI model for describing Indians, given the authors' exclusive reliance on a priori knowledge in interpreting the PCA patterns. Although supported by downstream analyses, the plurality of PCA results could not be used to support the authors' findings because, using PCA, it is impossible to answer a priori whether Africa is in India or the other way around (Fig. 5E). We speculate that the motivation for Reich et al.'s strategy was to declare Africans an outgroup, an essential component of D-statistics. Clearly, PCA-based a posteriori inferences can lead to errors of Colombian magnitude.

Studying the origin of Indians using PCA. (A) Replicating Reich et al.'s45 results using nEu=99; nAs=146; nInd=321. Generating alternative PCA scenarios using: (B) nAf=178; nEu=99; nInd=321, (C) nAf=400; nEu=40; nAs=100; nInd=321, (D) nAf=477; nEu=253; nAs=23; nInd=321, (E) nAf=25; nEu=220; nAs=490; nInd=320, and (F) nAf=30; nEu=200; nAs=50; nInd=320.

To evaluate the extent of deviation of PCA results from genetic distances, we adopted a simple genetic distance scheme where we measured the Euclidean distance between allelic counts (0,1,2) in the same data used for PCA calculations. We are aware of the diversity of existing genetic distance measures. However, to the best of our knowledge, no study has ever shown that PCA outcomes numerically correlate with any genetic distance measure, except in very simple scenarios and with ADMIXTURE-like tools, which, like PCA, exhibit high design flexibility. Plotting the genetic distances against those obtained from the top two PCs shows the deviation between these two measures for each dataset. We found that all the PC projections (Fig. 6) distorted the genetic distances in unexpected ways that differ between the datasets. PCA correctly represented the genetic distances for a minority of the populations, and, just like the most poorly represented populations, none of them were distinguishable from other populations. Moreover, populations that clustered under PCA exhibited mixed results, questioning the accuracy of PCA clusters. Although it remains unclear which sampling scheme to adopt, none of the schemes is genetically accurate. These results further question the genetic validity of the ANI-ASI model.

Comparing the genetic distances with PCA-based distances for the corresponding datasets of Fig. 5. Genetic and PCA (PC1+PC2) distances between population pairs (symbol pairs) and 2000 random individual pairs (grey dots) were calculated using Euclidean distances and normalized to range from 0 to 1. Population and individual pairs whose PC distances reflect their genetic distances are shown along the x=y dotted line. Note that the position of heterogeneous populations on the plot may deviate from that of their samples and that some populations are very small.
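The comparison described above can be sketched for individual pairs as follows. Several details are our assumptions rather than the authors' protocol: the genotypes are simulated (two hypothetical populations with random allele frequencies, counts drawn from Binomial(2, p)), "normalized to range from 0 to 1" is taken to mean min-max scaling over all distinct pairs, and every pair is used instead of a random subset. The function names are hypothetical. Projecting onto all PCs reproduces the genetic distances exactly, while keeping only PC1 and PC2 can only shrink them (before scaling), so departures from the x=y line reflect variation carried by the discarded PCs.

```python
import numpy as np


def pairwise_euclidean(X):
    """All-pairs Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pc_coords(G, k):
    """Coordinates of each individual on the first k PCs of the centered genotypes."""
    Gc = G - G.mean(axis=0)
    _, _, vt = np.linalg.svd(Gc, full_matrices=False)
    return Gc @ vt[:k].T


def minmax_upper(D):
    """Min-max scale the distinct pairs (upper triangle) of a distance matrix to [0, 1]."""
    vals = D[np.triu_indices_from(D, k=1)]
    return (vals - vals.min()) / (vals.max() - vals.min())


rng = np.random.default_rng(0)
n_per_pop, n_snps = 20, 500
# Hypothetical data: two populations with their own allele frequencies;
# genotypes are allele counts 0/1/2.
freqs = rng.uniform(0.05, 0.95, size=(2, n_snps))
G = np.vstack([rng.binomial(2, p, size=(n_per_pop, n_snps)) for p in freqs]).astype(float)

genetic = pairwise_euclidean(G)                 # distance on allele counts
pca_2d = pairwise_euclidean(pc_coords(G, 2))    # distance on PC1+PC2 only

gen_norm, pca_norm = minmax_upper(genetic), minmax_upper(pca_2d)
print("max |genetic - PC1+PC2| after scaling:", np.abs(gen_norm - pca_norm).max())
print("correlation:", np.corrcoef(gen_norm, pca_norm)[0, 1])
```

Plotting `gen_norm` against `pca_norm` gives the kind of scatter shown in Fig. 6; pairs far from the diagonal are those whose PC1+PC2 distance misrepresents their genetic distance.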

We are aware that PCA disciples may reject our reductio ad absurdum argument and attempt to read into these results, as ridiculous as they may be, a valid description of Indian ancestry. For those readers, demonstrating the ability of the experimenter to generate near-endless contradictory historical scenarios using PCA may be more convincing, or at least exhausting. For brevity, we present six more such scenarios that show PCA support for Indians as a heterogeneous group with European admixture and Mexican-Americans as an Indian-European mixed population (Supplementary Fig. S4A), Mexican-Americans as an admixed African-European group with Indians as a heterogeneous group with European admixture (Supplementary Fig. S4B), Indians and Mexican-Americans as European-Japanese admixed groups with common origins and high genetic relatedness (Supplementary Fig. S4C), Indians and Mexican-Americans as European-Japanese admixed groups with no common origins or genetic relatedness (Supplementary Fig. S4D), Europeans as an Indian and Mexican-American admixed group, with Japanese fully clustering with the latter (Supplementary Fig. S4E), and Japanese and Europeans clustering as admixed Indian and Mexican-American groups (Supplementary Fig. S4F). Readers are encouraged to use our code to produce novel alternative histories. We suspect that almost any topology could be obtained by finding the right set of input parameters. In this sense, any PCA output can reasonably be considered meaningless.

Contrary to Reich et al.'s claims, a more common interpretation of PCA is that the populations at the corners of the triangle are ancestral or are related to the mixed groups within the triangle, which are the outcome of admixture events, typically referred to as gradients or clines45. However, some authors held different opinions. Studying the African component of Ethiopian genomes, Pagani et al.46 produced a PC plot showing Europeans (CEU), Yoruba (western Africans), and Ethiopians (eastern Africans) at the corners of a triangle (Supplementary Fig. S4)46. Rather than suggesting that the populations within the triangle (e.g., Egyptians, Spaniards, Saudis) are mixtures of these supposedly ancestral populations, the authors argued that Ethiopians have western and eastern African origins, unlike the central populations with different patterns of admixture. Obviously, neither interpretation is correct. Reich et al.'s interpretation does not explain why CEUs are not an Indian-African admixture nor why Africans are not a European-Indian admixture, and it is analogous to arguing that Red has Green and Blue origins (Fig. 1). Pagani et al.'s interpretation is a tautology, ignores the contribution of non-Africans, and is analogous to arguing that Red has Red and Green origins. We carried out forward simulations of populations with various numbers of ancestral populations and found that admixture cannot be inferred from the positions of samples in a PCA plot (Supplementary Text 1).

In a separate effort to study the origins of AJs, Need et al.47 applied PCA to 55 Ashkenazic Jews (AJs) and 507 non-Jewish Caucasians. Their PCA plot showed that AJs (marked as Jews) formed a cluster distinct from Europeans (marked as non-Jews). Based on these results, the authors suggested that PCA can be used to detect linkage to Jewishness. A follow-up PCA that included Middle Eastern (Bedouin, Palestinian, and Druze) and Caucasus (Adygei) populations showed that AJs formed a distinct cluster nested between the Adygei (and the European cluster) and the Druze (and the Middle Eastern cluster). The authors then concluded that AJs might have mixed Middle Eastern and European ancestries. The proximity to the Adygei cluster was noted as interesting but dismissed based on the small sample size of the Adygei (n=17). The authors concluded that "AJ genomes carry an unambiguous signature of their Jewish heritage, and this seems more likely to be due to their specific Middle Eastern ancestry than to inbreeding." A similar strategy was employed by Bray et al.48 to claim that PCA confirmed that the AJ individuals cluster distinctly from Europeans, aligning closest to Southern European populations along the first principal component, suggesting a more southern origin, and aligning with Central Europeans along the second, consistent with migration to this region. Other authors49,50 made similar claims.

It is easy to show why PCA cannot be used to reach such conclusions. We first replicated Need et al.'s47 primary results (Fig. 7A), showing that AJs cluster separately from Europeans. However, such an outcome is typical when comparing Europeans with non-European populations like Turks (Fig. 7B). It is not unique to AJs, nor does it prove that they are genetically detectable. A slightly modified design shows that most AJs overlap with Turks, in support of a Turkic (or Near Eastern) origin of AJs (Fig. 7C). We can easily refute this conclusion by including continental populations and showing that most AJs cluster with Iberians rather than Turks (Fig. 7D). This last design explains more of the variance than all the previous analyses together, although, as should be evident by now, explained variance is not indicative of accuracy. This analysis calls into question the use of PCA as a tool to discriminate between populations and to infer genetic ancestry.

Studying the origin of 55 AJs using PCA. (A) Replicating Need et al.'s results using nEu=507. Generating alternative PCA scenarios using: (B) nEu=223; nTurks=56; (C) nEu=400; nTurks+Caucasus=56; and (D) nAf=100, nAs=100 (Africans and Asians are not shown), nEu=100, and nTurks=50. Need et al.'s faulty terminology was adopted in A and B.

There are several more oddities in the report of Need et al.47. First, they did not report the variance explained by their sampling scheme (it is likely ~1%, as in Fig. 7A). Second, they misrepresented the actual populations analyzed: AJs are not the only Jews, and Europeans are not the only non-Jews (Figs. 1, 7A)47. Finally, their dual interpretation of AJs as a mixed population of Middle Eastern origin rests solely on a priori belief: first, because most of the populations in their PCA are nested between and within other populations, yet the authors did not suggest that they are all admixed; and second, because AJs nested between the Adygei and Druze51,52, both of which formed in the Near East. The conclusions of Need et al.47 were thereby obtained from particular PCA schemes and what may be preconceived ideas of AJs' origins that are no more real than the Iberian origin of AJs (Fig. 7D). This is yet another demonstration (discussed in Elhaik36) of how PCA can be misused to promote ethnocentric claims due to its design flexibility.

Following criticism of the sampling scheme used to study the origin of Black (Box 1), the redoubtable Black-is-Red group genotyped Cyan. Using even sample sizes, they demonstrated that Black is closer to Red (D(Black-Red)=0.46) (Fig. 8A), where D is the Euclidean distance between the samples over all three PCs (short distances indicate high similarity). The Black-is-Green school criticized their findings on the grounds that their Cyan samples were biased and their results do not apply to the broad Black cohort. They also reckoned that the even sampling scheme favored Red because Blue is related to Cyan through shared language and customs. The Black-is-Red group responded by enriching their cohort in Cyan and Black (nCyan, nBlack=1000) and provided even more robust evidence that Black is Red (D(Black-Red)=0.12) (Fig. 8B). However, the Black-is-Green camp dismissed these findings. Conscious of the effects of admixture, they retained only the most homogeneous Green and Cyan (nGreen, nCyan=33), genotyped new Blue and Black (nBlue, nBlack=400), and analyzed them with the published Red cohort (nRed=100). The Black-is-Green results supported their hypothesis that Black is Green (D(Black-Green)=0.27) and that Cyan shared a common origin with Blue (D(Blue-Green)=0.27) (Fig. 8C) and should thereby be considered an admixed Blue population. Unsurprisingly, the Black-is-Red group claimed that these results were due to the under-representation of Black, since when they oversampled Black, PCA supported their findings (Fig. 8A). In response, the Black-is-Green school maintained even sample sizes for Cyan, Blue, and Green (nBlue, nGreen, nCyan=33) and enriched Black and Red (nRed, nBlack=100). Not only did their results (D(Black-Green)=0.63

PCA with the primary and mixed color populations. (A) nall=100; nBlack=200, (B) nRed=nGreen=nBlue=100; nBlack=nCyan=500, (C) nRed=100; nGreen=nCyan=33; nBlue=nBlack=400; and (D) nRed=nBlack=100; nGreen=nBlue=nCyan=33; Scatter plots show the top two PCs. The numbers on the grey bars reflect the Euclidean distances between the color populations over all PCs. Colors include Red [1,0,0], Green [0,1,0], Blue [0,0,1], Cyan [0,1,1], and Black [0,0,0].
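The color-population design can be sketched in a few lines of Python/NumPy. Assumptions: each individual is its population's RGB vector plus Gaussian noise (sd=0.05), distances are measured between population centroids on the top two PCs, and the helper names are hypothetical. This illustrates the sample-size effect only; it is not the code behind Fig. 8 and does not reproduce the D values quoted in the text.

```python
import numpy as np

COLORS = {"Red": [1, 0, 0], "Green": [0, 1, 0], "Blue": [0, 0, 1],
          "Cyan": [0, 1, 1], "Black": [0, 0, 0]}

def sample_colors(sizes, sd=0.05, seed=0):
    """Draw sizes[name] noisy individuals around each color's RGB vector (assumed noise model)."""
    rng = np.random.default_rng(seed)
    X, labels = [], []
    for name, n in sizes.items():
        X.append(np.asarray(COLORS[name], float) + rng.normal(0, sd, (n, 3)))
        labels += [name] * n
    return np.vstack(X), np.array(labels)

def centroid_distance(X, labels, a, b, k=2):
    """Euclidean distance between the centroids of groups a and b on the top-k PCs."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T
    return float(np.linalg.norm(scores[labels == a].mean(0) - scores[labels == b].mean(0)))

even = {c: 100 for c in COLORS}
skewed = {"Red": 100, "Green": 33, "Blue": 400, "Cyan": 33, "Black": 400}
for sizes in (even, skewed):
    X, y = sample_colors(sizes)
    print(centroid_distance(X, y, "Black", "Red"), centroid_distance(X, y, "Black", "Green"))
```

The second size configuration mirrors the one listed for panel (C); rerunning with other sizes shows how the centroid distances on the top two PCs move with the sampling scheme alone.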

The question of how analyzing admixed groups with multiple ancestral populations affects the findings for unmixed groups is illustrated through a typical study case in Box 3.

To understand how PCA can be misused to study multiple mixed populations, we will investigate other PCA applications to study AJs. Such analyses share a thematic interpretation: the clustering of AJ samples is taken as evidence of a shared Levantine origin (e.g., Refs.12,13), short distances between AJs and Levantines are taken to indicate close genetic relationships in support of a shared Levantine past (e.g., Ref.12), whereas short distances between AJs and Europeans are taken as evidence of admixture13. Finally, as a rule, the much shorter distances between AJs and the Caucasus or Turkish populations, observed by all recent studies, were ignored12,13,47,48. Bray et al.48 concluded not only that AJs have a more southern origin but also that their alignment with Central Europeans is consistent with migration to this region. In these studies, "short" and "between" received a multitude of interpretations. For example, Gladstein and Hammer's53 PCA plot, which showed AJs at the extreme edge of the plot with Bedouins and French at the other edges, was interpreted as AJs "clustering tightly between European and Middle Eastern populations." The authors interpreted the lack of outliers among AJs (a term they never defined) as evidence of common AJ ancestry.

Following the rationale of these studies, it is easy to show how PCA can be orchestrated to yield a multitude of origins for AJs. We replicated the observation that AJs are a population isolate, i.e., AJs form a distinct group, separated from all other populations (Fig. 9A), and are thereby genetically distinguishable47. We also replicated the most common yet often-ignored observation that AJs cluster tightly with Caucasus populations (Fig. 9B). We next produced novel results where AJs cluster tightly with Amerindians due to the north Eurasian or Amerindian origins of both groups (Fig. 9C). We can also show that AJs cluster much closer to South Europeans than to Levantines (Fig. 9D) and overlap Finns entirely, as solid evidence of AJs' ancient Finnish origin (Fig. 9E). Last, we wish to refute our previous finding and show that only half of the AJs are of Finnish origin. The remaining analysis supports the lucrative Levantine origin (Fig. 9F), a discovery touted by all the previous reports though never actually shown. Excitingly enough, the primary PCs of this last depiction of a Eurasian Finnish-Levantine mixed origin explained the highest amount of variance. An intuitive interpretation of those results is a recent migration of the Finnish AJs to the Levant, where they experienced high admixture with the local Levantine populations that altered their genetic background. These examples demonstrate that PCA plots generate nonsensical results for the same populations and provide no a posteriori knowledge.

An in-depth study of the origin of AJs using PCA in relation to Africans (Af), Europeans (Eu), East Asians (Ea), Amerindians (Am), Levantines (Le), and South Asians (Sa). (A) nEu=159; nAJ=60; nLe=82, (B) nAf=30; nEu=159; nEa=50; nAJ=60; nLe=60, (C) nAf=30; nEa=583; nAJ=60; nAm=255, (D) nAf=200; nEu=115; nEa=200; nAJ=60; nLe=235; nSa=88, (E) nAf=200; nEu=30; nAJ=400; nLe=80, (F) nAf=200; nEu=30; nAJ=50; nLe=160. Large squares indicate insets.

The value of using mixed color populations to study origins prompted new analyses using even (Fig. 10A) and variable sample sizes (Figs. 10B–D). Using this novel sampling scheme, the Black-is-Green school reaffirmed in a series of analyses that Black is closest to Green (Figs. 10A, 10C, and 10D), but a different cohort yielded the novel finding that Black is closest to Pink (Fig. 10B).

PCA with the primary and multiple mixed color populations. (A) nall=50, (B) nall=50 or 10, (C,D) nAll=[50, 5, 100, or 25]. Scatter plots show the top two PCs. Color codes are shown. (E) The difference between the true distances, calculated in 3D space, between every color population pair (shown side by side) from (D) and their Euclidean distances calculated from the top two PCs. Pairs whose PC distances reflect their true 3D distances are shown along the x=y dotted line. One of the largest PCA distortions is the distance between the Red and Green populations (inset). The true Red-Green distance is 1.41 (x-axis), but the PCA distance is 0.5 (y-axis).

The extent to which PCA distances obtained from the top two PCs reflect the true distances among color population pairs is shown in Fig. 10E. PCA distorted the distances between most color populations, but the distortion was uneven among the pairs: while a minority of the pairs are correctly projected by PCA, most are not. Identifying which pairs are correctly projected is impossible without a priori information. For example, some shades of blue and purple were less distorted than similar shades. We thereby show that PCA-inferred distances are biased in an unpredictable manner and are therefore uninformative for clustering.
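A comparison like the one in Fig. 10E can be sketched as follows (Python/NumPy), assuming Gaussian noise around RGB centroids, an arbitrary choice of sample sizes, and hypothetical helper names. For each pair of color populations, it reports the true RGB distance next to the distance between the population centroids on PC1 and PC2.

```python
import itertools
import numpy as np

COLORS = {"Red": [1, 0, 0], "Green": [0, 1, 0], "Blue": [0, 0, 1], "Cyan": [0, 1, 1],
          "Purple": [1, 0, 1], "Yellow": [1, 1, 0], "Black": [0, 0, 0]}

def distortion_table(sizes, sd=0.05, seed=1):
    """For each color pair, return (true RGB distance, centroid distance on PC1+PC2)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([np.asarray(COLORS[c], float) + rng.normal(0, sd, (n, 3))
                   for c, n in sizes.items()])
    labels = np.repeat(list(sizes), list(sizes.values()))
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                     # keep only the top two PCs
    out = {}
    for a, b in itertools.combinations(sizes, 2):
        true = float(np.linalg.norm(np.subtract(COLORS[a], COLORS[b])))
        pc = float(np.linalg.norm(scores[labels == a].mean(0) - scores[labels == b].mean(0)))
        out[(a, b)] = (true, pc)
    return out

table = distortion_table({"Red": 50, "Green": 5, "Blue": 100, "Cyan": 25,
                          "Purple": 50, "Yellow": 5, "Black": 100})
# Print pairs from least to most shortened by the two-PC view.
for pair, (true, pc) in sorted(table.items(), key=lambda kv: kv[1][0] - kv[1][1]):
    print(pair, round(true, 2), round(pc, 2))
```

Because the top two PCs are an orthogonal projection of the centred data, centroid distances on them can only shrink relative to the distances between the sample centroids; which pairs shrink most depends on the mix of colors and sample sizes.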

Unlike stochastic models that possess inherent randomness, PCA is a deterministic process, a property that contributes to its perceived robustness. To explore the behavior of PCA, we tested whether the same computer code can produce similar or different results when the only variable that changes is the standard randomization technique used throughout the paper to generate the individual samples of the color populations (to avoid clutter).

We evaluated two color sets. In the first set, Black was closest to Yellow (Fig. 11A), Purple (Fig. 11C), or Cyan (Figs. 11D,E) in different runs. When White was added in the second set, Black behaved as an outgroup, and the distances between the secondary colors deviated considerably from expectation, producing false results (Figs. 11D–F). These results illustrate the sensitivity of PCA to tiny changes in the dataset that are unrelated to the populations or the sample sizes.

Studying the effects of minor sample variation on PCA results using color populations (nall=50). (A–C) Analyzing secondary colors and Black. (D–E) Analyzing secondary colors, White, and Black. Scatter plots show the top two PCs. Colors include Cyan [0,1,1], Purple [1,0,1], Yellow [1,1,0], White [1,1,1], and Black [0,0,0].
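The behavior described above can be checked with a small sketch (Python/NumPy; the noise level, sample size, seeds, and function name are assumptions). Cyan, Purple, Yellow, and Black sit at the vertices of a regular tetrahedron in RGB space (every pair is √2 apart), so with equal sample sizes no direction carries more between-population variance than any other, and the plane spanned by PC1 and PC2 is left to sampling noise. The function redraws the samples with a given seed, runs the same PCA, and returns the color whose centroid lies closest to Black on the top two PCs.

```python
import numpy as np

COLORS = {"Cyan": [0, 1, 1], "Purple": [1, 0, 1], "Yellow": [1, 1, 0], "Black": [0, 0, 0]}

def nearest_to(target, seed, n=50, sd=0.1, k=2):
    """PCA on freshly drawn color samples; return the color whose top-k PC centroid is closest to target."""
    rng = np.random.default_rng(seed)
    X = np.vstack([np.asarray(v, float) + rng.normal(0, sd, (n, 3)) for v in COLORS.values()])
    labels = np.repeat(list(COLORS), n)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # deterministic for a given X
    scores = Xc @ Vt[:k].T
    centroids = {c: scores[labels == c].mean(0) for c in COLORS}
    others = [c for c in COLORS if c != target]
    return min(others, key=lambda c: np.linalg.norm(centroids[c] - centroids[target]))

# Only the random draw differs between seeds; the PCA code is identical.
print([nearest_to("Black", seed) for seed in range(10)])
```

Calling it twice with the same seed always returns the same answer, whereas changing only the seed can change which color appears closest to Black.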

To explore this effect on human populations, we curated a cohort of 16 populations. We carried out PCA on ten random individuals from each of 15 populations, excluding one randomly chosen population in each analysis. We show that these analyses yield spurious and conflicting results (Fig. 12). Puerto Ricans, for instance, clustered close to Europeans (A), between Africans and Europeans (B), close to the Adygei (C), and close to Europeans and the Adygei (D). Indians clustered with Mexicans (A, B, and D) or apart from them (C). Mexicans themselves clustered with (A and D) or without (B and C) Africans. Papuans and Russians clustered close to (B) or far from (C) East Asian populations. More robust clustering was observed for East Asians, Caucasians, and Europeans, as well as Africans. However, these clusters were not only indistinguishable from the less robust ones but also failed to replicate over multiple runs (results not shown). These examples show that PCA results are unpredictable and irreproducible even when 94% of the populations are the same. Note that the proportion of explained variance was similar in all the analyses, demonstrating that it is not an indication of accuracy or robustness.

Studying the effect of sampling on PCA results. A cohort of 16 worldwide populations (see legend) was selected. In each analysis, a random population was excluded. Populations were represented by random samples (n=10). The clusters highlight the most notable differences.

We found that although PCA is a deterministic process, it behaves unexpectedly, and minor variations can lead to an ensemble of different outputs that appear stochastic. This effect is more substantial when continental populations are excluded from the analysis.

Samples of unknown or self-reported ancestry are typically identified by applying PCA to a cohort of test samples combined with reference populations of known ancestry (e.g., 1000 Genomes), e.g., Refs.22,54,55,56. To test whether using PCA to identify the ancestry of an unknown cohort with known samples is feasible, we simulated a large and heterogeneous Cyan population (Fig. 13A, circles) of self-reported Blue ancestry. Following a typical GWAS scheme, we carried out PCA for these individuals and seven known and distinct color populations. PCA grouped the Cyan individuals with Blue and Black individuals (Fig. 13B), although none of the Cyan individuals were Blue or Black (Fig. 13A), as a different PCA scheme confirmed (Fig. 13C). A case–control assignment of this cohort to Blue or Black based on the PCA result (Fig. 13B) produced poor matches that reduced the power of the analysis. When repeating the analysis with different reference populations (Fig. 13D), the simulated individuals exhibited minimal overlap with Blue, no overlap with Black, and overlapped mostly with the Cyan reference population, which was present this time. We thereby showed that the clustering with Blue and Black is an artifact of the choice of reference populations. In other words, introducing reference populations whose ancestries do not match those of the unknown samples biases the ancestry inference of the latter.

Evaluating the accuracy of PCA clustering for a heterogeneous test population in a simulation of a GWAS setting. (A) The true distribution of the test Cyan population (n=1000). (B) PCA of the test population with eight even-sized (n=250) samples from reference populations. (C) PCA of the test population with Blue from the previous analysis shows a minimal overlap between the cohorts. (D) PCA of the test population with five even-sized (n=250) samples from reference populations, including Cyan (marked by an arrow). Colors (B), from top to bottom and left to right, include: Yellow [1,1,0], light Red [1,0,0.5], Purple [1,0,1], Dark Purple [0.5,0,0.5], Black [0,0,0], dark Green [0,0.5,0], Green [0,1,0], and Blue [0,0,1].
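The reference-panel effect can be sketched with a nearest-centroid assignment on the top two PCs of a joint PCA (Python/NumPy). The noise levels, the clipping of the test cohort to [0, 1], the nearest-centroid rule, the function name, and the composition of the second panel are assumptions; the first panel follows the eight colors named for panel (B), with Blue taken as [0,0,1].

```python
import numpy as np

def assign_by_nearest_centroid(test, refs, n_ref=250, sd_ref=0.05, k=2, seed=0):
    """Joint PCA of test + reference samples; label each test sample by the nearest
    reference-population centroid on the top-k PCs. Returns {label: fraction of test samples}."""
    rng = np.random.default_rng(seed)
    names = list(refs)
    R = np.vstack([np.asarray(refs[c], float) + rng.normal(0, sd_ref, (n_ref, 3)) for c in names])
    X = np.vstack([test, R])
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T
    test_scores, ref_scores = scores[:len(test)], scores[len(test):]
    centroids = np.array([ref_scores[i * n_ref:(i + 1) * n_ref].mean(0) for i in range(len(names))])
    d = np.linalg.norm(test_scores[:, None, :] - centroids[None, :, :], axis=-1)
    picks = np.array(names)[d.argmin(axis=1)]
    return {c: float((picks == c).mean()) for c in names}

rng = np.random.default_rng(42)
test = np.clip(np.array([0, 1, 1.0]) + rng.normal(0, 0.15, (1000, 3)), 0, 1)  # heterogeneous "Cyan"
panel_b = {"Yellow": [1, 1, 0], "light Red": [1, 0, 0.5], "Purple": [1, 0, 1],
           "Dark Purple": [0.5, 0, 0.5], "Black": [0, 0, 0], "dark Green": [0, 0.5, 0],
           "Green": [0, 1, 0], "Blue": [0, 0, 1]}
with_cyan = {"Red": [1, 0, 0], "Green": [0, 1, 0], "Blue": [0, 0, 1],
             "Cyan": [0, 1, 1], "Black": [0, 0, 0]}
print(assign_by_nearest_centroid(test, panel_b))
print(assign_by_nearest_centroid(test, with_cyan))
```

With Cyan absent from the panel, every test individual is necessarily labelled as some other color; adding a Cyan reference changes the labels even though the test cohort itself is unchanged.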

We next asked whether PCA can group Europeans into homogeneous clusters. Analyzing four European populations yielded 43% homogeneous clusters (Fig. 14A). Adding Africans and Asians, and then South Asian populations, decreased the European cluster homogeneity to 14% and 10%, respectively (Figs. 14B,C). Including the 1000 Genomes populations, as customarily done, yielded 14% homogeneous clusters (Fig. 14D). Although the Europeans remained the same, the addition of other continental populations resulted in a three- to four-fold decrease in the homogeneity of their clusters.

Evaluating the cluster homogeneity of European samples. PCA was applied to the four European populations (Tuscan Italians [TSI], Northern and Western Europeans from Utah [CEU], British [GBR], and Spanish [IBS]) alone (A), together with an African and an Asian population (B), additionally with a South Asian population (C), and finally with all the 1000 Genomes populations (D). (E) Evaluating the usefulness of PCA-based clustering. The bottom two plots show the sizes of non-homogeneous and homogeneous clusters, and the top three plots show the proportion of individuals in homogeneous clusters. Each plot shows the results for 10 or 20 random African, European, or Asian populations for the same PCs (x-axis).
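The homogeneity measure is not spelled out in this section, so the sketch below adopts one plausible definition as an explicit assumption: individuals are clustered with k-means on their top PCs, and a cluster counts as homogeneous when all its members share one population label. The helper names, the k-means choice, and the toy data are hypothetical (Python/NumPy).

```python
import numpy as np

def kmeans(P, k, iters=100, seed=0):
    """Plain Lloyd's k-means on the rows of P; returns one cluster id per row."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        ids = np.linalg.norm(P[:, None] - centers[None], axis=-1).argmin(axis=1)
        new = np.array([P[ids == j].mean(0) if np.any(ids == j) else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return ids

def homogeneous_fraction(cluster_ids, labels):
    """Share of individuals in clusters whose members all carry the same population label."""
    cluster_ids, labels = np.asarray(cluster_ids), np.asarray(labels)
    inside = 0
    for c in np.unique(cluster_ids):
        members = labels[cluster_ids == c]
        if len(np.unique(members)) == 1:
            inside += len(members)
    return inside / len(labels)

def pca_cluster_homogeneity(X, labels, n_pcs=2, k=None, seed=0):
    """Cluster individuals on their top PCs and report the homogeneous share (illustrative only)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:n_pcs].T
    k = k or len(np.unique(labels))
    return homogeneous_fraction(kmeans(P, k, seed=seed), labels)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (40, 30)) for m in (0.0, 0.3, 0.6, 0.9)])  # 4 toy populations
labels = np.repeat(["A", "B", "C", "D"], 40)
print(f"{100 * pca_cluster_homogeneity(X, labels):.0f}% of individuals in homogeneous clusters")
```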

The number of PCs analyzed in the literature ranges from 2 to at least 280 (Ref.35), which raises the question of whether using more PCs increases cluster homogeneity or is another cherry-picking strategy. We calculated the cluster homogeneity for different PCs for either 10 or 20 African (n10=337, n20=912), Asian (n10=331, n20=785), and European (n10=440, n20=935) populations of similar sample sizes (Fig. 14E). Even in this favorable setting, which included only continental populations, the homogeneous clusters identified using PCA were on average significantly smaller than the non-homogeneous clusters (Homogeneous=12.5 samples; Non-homogeneous=42.6 samples; Kruskal–Wallis test [nHomogeneous=nNon-homogeneous=238 samples, p=1.95×10^-75, Chi-square=338]) and included a minority of the individuals when 20 populations were analyzed. Analyzing higher PCs decreased the size of the homogeneous clusters and increased the size of the non-homogeneous ones. The maximum number of individuals in the homogeneous clusters fluctuated for different populations and sample sizes. Mixing other continental populations with each cohort decreased the homogeneity of the clusters and their sizes (results not shown). Overall, these examples show that PCA is a poor clustering tool, particularly as sample size increases, in agreement with Elhaik and Ryan57, who reported that PCA clusters are neither genetically nor geographically homogeneous and that PCA does not handle admixed individuals well. Note that the cluster homogeneity in this limited setting should not be confused with the amount of variance explained by additional PCs.
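The Kruskal–Wallis comparison of cluster sizes can be run with SciPy's scipy.stats.kruskal. For a dependency-light illustration, the two-group case is sketched below in Python with NumPy: ranks are averaged over ties, H is tie-corrected, and the p-value uses the chi-square distribution with one degree of freedom, whose survival function equals erfc(√(H/2)). The function names are hypothetical, and the toy inputs are not the cluster sizes analyzed here.

```python
import math
import numpy as np

def average_ranks(x):
    """1-based ranks of x, with tied values sharing the mean of their positions."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and xs[j + 1] == xs[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean of 1-based positions i..j
        i = j + 1
    return ranks

def kruskal_wallis_2(a, b):
    """Tie-corrected Kruskal-Wallis H for two samples and its chi-square (1 df) p-value."""
    data = np.concatenate([np.asarray(a, float), np.asarray(b, float)])
    n, na = len(data), len(a)
    r = average_ranks(data)
    ra, rb = r[:na].sum(), r[na:].sum()
    h = 12.0 / (n * (n + 1)) * (ra ** 2 / na + rb ** 2 / (n - na)) - 3 * (n + 1)
    _, counts = np.unique(data, return_counts=True)
    correction = 1 - (counts ** 3 - counts).sum() / (n ** 3 - n)
    if correction == 0:                 # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)        # guard against tiny negative rounding errors
    return h, math.erfc(math.sqrt(h / 2))   # P(chi2 with 1 df > h)

h, p = kruskal_wallis_2([1, 2, 3], [4, 5, 6])
print(round(h, 3), round(p, 4))   # 3.857 0.0495
```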

To further assess whether PCA clustering represents shared ancestry or biogeography, two of the most common applications of PCA (e.g., Ref.22), we applied PCA to 20 Puerto Ricans and 300 Europeans (Fig. 15). The Puerto Ricans clustered indistinguishably with Europeans (in contrast to Fig. 12) using the first two and higher PCs (Fig. 15). The Puerto Ricans represented over 6% of the cohort, sufficient to generate a stratification bias in an association study. We tested this by randomly assigning case–control labels to the European samples, with all the Puerto Ricans as controls. We then generated causal alleles for the evenly sized cohorts and computed the association before and after PCA adjustment. We repeated the analysis with randomly assigned labels for all the samples. In all 12 of our case–control analyses, the outcomes of PCA adjustment with 2 and 10 PCs were worse than the unadjusted results, i.e., PCA-adjusted results had more false positives, fewer true positives, and weaker p-values than the unadjusted results (Supplementary Text 3).

PCA of 20 Puerto Ricans and 300 random Europeans from the 1000 Genomes. The results are shown for various PCs.
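The PCA adjustment referred to above is commonly implemented by adding the top PC scores as covariates in a per-SNP regression. The sketch below shows that generic approach with ordinary least squares in NumPy; it is an assumed standard setup, not the pipeline described in Supplementary Text 3, and the simulated genotypes, allele frequency, effect size, and function names are hypothetical.

```python
import numpy as np

def top_pcs(G, k):
    """Top-k principal-component scores of a genotype matrix G (individuals x SNPs)."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]

def snp_effect(y, g, covariates=None):
    """OLS effect of SNP dosages g on phenotype y (with intercept), optionally adjusting
    for covariates such as top PCs. Returns (beta, t-statistic)."""
    cols = [np.ones(len(y)), np.asarray(g, float)]
    if covariates is not None:
        cols.append(np.asarray(covariates, float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1]), float(beta[1] / se)

rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(320, 500)).astype(float)   # hypothetical 20 + 300 individuals
y = 0.5 * G[:, 0] + rng.normal(size=320)                  # SNP 0 is the simulated causal allele
pcs = top_pcs(G, 10)
print(snp_effect(y, G[:, 0]))          # unadjusted
print(snp_effect(y, G[:, 0], pcs))     # adjusted for 10 PCs
```

Comparing the two printed pairs shows how adding PC covariates changes the estimated effect and its t-statistic for this one simulated SNP.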

We next assessed whether the distance between individuals and populations is a meaningful biological or demographic quantity by studying the relationships between Chinese and Japanese, a question of major interest in the literature58,59. We had already applied PCA to Chinese and Japanese, using Europeans as an outgroup (Supplementary Fig. S2.4). The only element that varied in the following analyses was the number of Mexicans as the second outgroup (5, 25, and 50). We found that the proportion of homogeneous Japanese and Chinese clusters dropped from 100% (Fig. 16A) to 93.33% (Fig. 16B) and 40% (Fig. 16C), demonstrating that the genetic distances between Chinese and Japanese depend entirely on the number of Mexicans in the cohort rather than, as one might expect, on the actual genetic relationships between these populations.

The effect of varying the number of Mexican-Americans on the inference of genetic distances between Chinese and Japanese using various PCs. We analyzed a fixed number of 135 Han Chinese (CHB), 133 Japanese (JPT), and 115 Italians (TSI), and a variable number of Mexicans (MXL), including 5 (left column), 25 (middle column), and 50 (right column) individuals, over the top four PCs. We found that the overlap between Chinese and Japanese in PC scatterplots, typically used to infer genomic distances, was unexpectedly conditional on the number of Mexicans in the cohort. We noted the meaning of the axes of variation whenever apparent (red). The right column had the same axes of variation as the middle one.
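The dependence described above can be explored in a simplified simulation (Python/NumPy). Assumptions: genotypes are drawn under Hardy–Weinberg proportions from made-up allele frequencies; groups A and B stand in for two closely related populations, Out1 for the fixed outgroup, and Out2 for the variable outgroup, with default sizes copied from the caption; the function name and drift parameters are hypothetical. Only the size of Out2 changes between calls.

```python
import numpy as np

def close_pair_pc_distance(n_out2, n_a=135, n_b=133, n_out1=115, n_snps=300, seed=7):
    """PC1+PC2 centroid distance between two closely related simulated groups (A, B) when a
    fixed outgroup (Out1) and a variable number of Out2 individuals are analyzed jointly."""
    freq_rng = np.random.default_rng(seed)            # same allele frequencies on every call
    base = freq_rng.uniform(0.1, 0.9, n_snps)

    def drifted(sd):
        return np.clip(base + freq_rng.normal(0, sd, n_snps), 0.01, 0.99)

    freqs = {"A": base, "B": drifted(0.02), "Out1": drifted(0.2), "Out2": drifted(0.2)}
    sizes = {"A": n_a, "B": n_b, "Out1": n_out1, "Out2": n_out2}
    geno_rng = np.random.default_rng(seed + 1)        # Out2 is drawn last, so A, B, Out1 never change
    G = np.vstack([geno_rng.binomial(2, freqs[p], size=(n, n_snps)) for p, n in sizes.items()])
    labels = np.repeat(list(sizes), list(sizes.values()))
    Gc = G - G.mean(axis=0)
    _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
    scores = Gc @ Vt[:2].T
    return float(np.linalg.norm(scores[labels == "A"].mean(0) - scores[labels == "B"].mean(0)))

for n_out2 in (5, 25, 50):      # the outgroup sizes used in the analyses above
    print(n_out2, round(close_pair_pc_distance(n_out2), 3))
```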

Some authors consider higher PCs informative and advise considering them alongside the first two. In our case, however, these PCs were not only susceptible to bias due to the addition of Mexicans but also exhibited the exact opposite pattern to that of the primary PCs (e.g., Figs. 16G–I). It has also been suggested that in datasets with ancestry differences between samples, axes of variation often have a geographic interpretation10. Accordingly, the addition of Mexicans altered the order of the axes of variation between the cases, which would make the analysis of additional PCs valuable. We demonstrate that this is not always the case. Except for PC1, over 60% of the axes had no geographical interpretation or an incorrect one. A priori knowledge of the current distribution of the populations was essential to differentiate these cases. The addition of the first 20 Mexicans replaced the second axis of variation (initially undefined) with a third axis (Eurasia-America) in the middle and right columns and resulted in a minor decline of ~5% in homogeneous clusters. Adding 25 Mexicans to the second cohort did not affect the axes, but the proportion of homogeneous clusters declined by 66%. The axis changes were unexpected and altered the interpretation of PCA results. Such changes were not detectable without a priori knowledge.

These results demonstrate (1) that the observable distances (and thereby clusters) between populations inferred from PCA plots (Figs. 14, 15, 16) are artifacts of the cohort and do not provide meaningful biological or historical information, (2) that distances between samples can be easily manipulated by the experimenter in a way that produces unpredictable results, (3) that considering higher PCs produces conflicting patterns, which are difficult to reconcile and interpret, and (4) that our extensive exploration of PCA solutions to Chinese and Japanese relationships using 18 scatterplots and four PCs produced no insight. It is easy to see that the multitude of conflicting results allows the experimenter to select the favorable solution that reflects their a priori knowledge.

Incorporating precalculated PCA is done by projecting the PCA results calculated for the first dataset onto the second one, e.g., Ref.17. Here, we tested the accuracy of this approach by projecting one or more color populations onto precalculated color populations that may or may not match the projected ones. The accuracy of the results depended on the identity of the populations in the two cohorts. When the same populations were analyzed, they overlapped (Fig. 17A), but when populations unique to one of the datasets were present, PCA created misleading matches (Figs. 17B–D). In the latter case, and when the sample sizes were uneven (Fig. 17C), the projected samples formed clusters with the wrong populations, and their positions in the plot were incorrect. Overall, we found that PCA projections are unreliable and misleading, with correct outcomes indistinguishable from incorrect ones.

Examining the accuracy of PCA projections. The PCA results of one dataset (circles) were projected onto another (squares). In (A), we tested the case of varying sample sizes between the first (nRed=200, nGreen=10, nBlue=200, nPurple=10) and second (nRed=200, nGreen=200, nBlue=10, nPurple=10) datasets, where colors in the second dataset varied slightly (e.g., [1,0,0] → [1,0.1,0.1]). In (B–D), the sample size varied (10≤n≤300) for both datasets. Colors include Red [1,0,0], Green [0,1,0], light Green [1,0.2,1], Cyan [0,1,1], Blue [0,0,1], Purple [1,0,1], Yellow [1,1,0], Grey [0.5,0.5,0.5], White [1,1,1], and Black [0,0,0].
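Projection onto precalculated PCs amounts to centring the new samples with the mean of the base dataset and multiplying by the base principal axes; scikit-learn's PCA.transform (without whitening) performs the same computation. The minimal NumPy sketch below uses hypothetical helper names and toy color data, and projects a Cyan population onto a base dataset that contains no Cyan.

```python
import numpy as np

def fit_pca(X, k=2):
    """Return the base-dataset mean and its top-k principal axes (rows of Vt)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X_new, mean, axes):
    """Place new samples on precomputed PCs: centre with the base mean, then rotate."""
    return (X_new - mean) @ axes.T

def draw(colors, n, sd, rng):
    """n noisy samples around each RGB vector in colors."""
    return np.vstack([np.asarray(c, float) + rng.normal(0, sd, (n, 3)) for c in colors])

rng = np.random.default_rng(3)
base_colors = {"Red": [1, 0, 0], "Green": [0, 1, 0], "Blue": [0, 0, 1], "Purple": [1, 0, 1]}
base = draw(base_colors.values(), 50, 0.05, rng)
mean, axes = fit_pca(base)
base_scores = project(base, mean, axes)
centroids = {c: base_scores[i * 50:(i + 1) * 50].mean(0) for i, c in enumerate(base_colors)}

# Cyan has no counterpart in the base dataset, yet it still lands nearest to some base population.
cyan_scores = project(draw([[0, 1, 1]], 50, 0.05, rng), mean, axes)
nearest = min(centroids, key=lambda c: np.linalg.norm(cyan_scores.mean(0) - centroids[c]))
print("projected Cyan sits nearest to", nearest)
```

The projection always returns coordinates, and nothing in them signals that the projected population has no counterpart in the base data.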

To evaluate the reliability of projections for human populations, we tested whether the projected populations cluster with their closest groups and to what extent these results can be manipulated. We found that populations can be shown to correctly align with continental populations when the base (or test) populations and the projected populations are very similar (Fig. 18A), which gives us confidence in the accuracy of PCA projections. However, even in the simplest scenario of using three continental populations, it is unclear how to interpret the overlap between the base and projected populations, since the Spanish would not be considered genetically closer to Finns than to Italians, as suggested by PCA. In another simple scenario, where Europeans are projected onto other Europeans, distinct populations like AJs, Iberians, French, CEU, and British overlap entirely (Fig. 18B), whereas Finns and Italians were separate. Not only do the results bear no apparent resemblance to the geographical distribution, but they also produce conflicting information as to the genetic distances between these populations, two properties that PCA enthusiasts claim it represents. Adding more populations, even if only to the projected populations, contributes to further distortions, with previously distinct populations (Fig. 18B) now clustering together (Fig. 18C). In a different dataset, projecting Japanese onto a base dataset of Africans and Europeans places them as an admixed African-European population. The projected Finns cluster with other Europeans (Fig. 18D), at odds with the previous results (Fig. 18B) that singled them out.

PCA projections of populations (italic font, black star inside the shape) onto base populations with even-sized samples (n=50, unless noted otherwise) (regular font). In (A) nprojected=100, (B) nprojected=50, (C) nprojected=20, (D) nprojected=100, (E) nprojected=80 and nprojected=100, and (F) 80≤nprojected≤100 and 12≤nprojected≤478.

To test the behavior of PCA when projecting populations different from the base populations, we projected Chinese, Finns, Indians, and AJs onto Levantine and two European populations (Fig. 18E). The results imply that the Chinese and AJs are of Indian origin, originating from a European-Levantine mix. Replacing Levantines with Africans does not stabilize the projected results (Fig. 18F). Now the projected Chinese and Japanese overlap, and AJs cluster with Iranians.

Overall, our results show that it is infeasible to rely on PCA projections, particularly in studies involving different populations, as is commonly done. Even when the projected populations are identical to the base ones, the base and projected populations may or may not overlap.

PCA is the primary tool in paleogenomics, where ancient samples are initially identified based on their clustering with modern or other ancient samples. Here, a wide variety of strategies is employed. In some studies, ancient and modern samples are combined60. In other studies, PCA is performed separately for each ancient individual and particular reference samples, and the PC loadings are combined61. Some authors projected present-day human populations onto the top two principal components defined by ancient hominins (and non-humans)62. The most common strategy is to project ancient DNA onto the top two principal components defined by modern-day populations14. Here, we will investigate the accuracy of this strategy.

Since ancient populations show more genetic diversity than modern ones14, we defined ancient colors (a) as brighter colors whose allele frequency is 0.95 with an SD of 0.05 and modern colors (m) as darker colors whose allele frequency is 0.6 with an SD of 0.02. Two approaches were used in analyzing the two datasets: calculating PCA separately for the two datasets and presenting the results jointly (Fig.19A,B), and projecting the PCA results of the ancient populations onto the modern ones (Fig.19C,D). In both cases, meaningful results would show the ancient colors clustering close to their modern counterparts in distances corresponding to their true distances.

Merging PCA of ancient (circles) and modern (squares) color populations using two approaches. First, PCA is calculated separately on the two datasets, and the results are plotted together (A,B). Second, PCA results of ancient populations are projected onto the PCs of the modern ones (C,D). In (A), even-sized samples from ancient (n=25) and modern (n=75) color populations are used. In (B), different-sized samples from ancient (10≤n≤25) and modern (10≤n≤75) populations are used. In (C) and (D), different-sized samples from ancient populations (10≤n≤75) are used alongside even-sized samples from modern populations: (C) n=15 and (D) n=25. Colors include Red [1,0,0], dark Red [0.6,0,0], Green [0,1,0], dark Green [0,0.6,0], Blue [0,0,1], dark Blue [0,0,0.6], light Cyan [0,0.6,0.6], light Yellow [0.6,0.6,0], light Purple [0.6,0,0.6], and Black [0,0,0].
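The ancient/modern color setup can be sketched as follows (Python/NumPy). The sketch reads "allele frequency 0.95 with an SD of 0.05" as each non-zero RGB channel being drawn from a normal distribution with that mean and SD and clipped to [0, 1], with zero channels left at 0; that reading, the three-color palette, the sample sizes, and the helper names are assumptions. Both approaches described above are shown.

```python
import numpy as np

PALETTE = {"Red": [1, 0, 0], "Green": [0, 1, 0], "Blue": [0, 0, 1]}

def draw_era(n, mean, sd, rng):
    """n individuals per color; 'on' channels ~ N(mean, sd) clipped to [0, 1], 'off' channels stay 0."""
    out = []
    for rgb in PALETTE.values():
        on = np.asarray(rgb, bool)
        block = np.zeros((n, 3))
        block[:, on] = np.clip(rng.normal(mean, sd, (n, on.sum())), 0, 1)
        out.append(block)
    return np.vstack(out)

def pca_fit(X, k=2):
    """Mean and top-k principal axes of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

rng = np.random.default_rng(5)
ancient = draw_era(25, 0.95, 0.05, rng)   # brighter, more variable
modern = draw_era(75, 0.60, 0.02, rng)    # darker, less variable

# Approach 1: separate PCAs, plotted together afterwards.
m_a, ax_a = pca_fit(ancient)
m_m, ax_m = pca_fit(modern)
sep_ancient, sep_modern = (ancient - m_a) @ ax_a.T, (modern - m_m) @ ax_m.T

# Approach 2: project the ancient samples onto the modern PCs.
proj_ancient = (ancient - m_m) @ ax_m.T
print(sep_ancient.shape, sep_modern.shape, proj_ancient.shape)
```

In the first approach, the two sets of coordinates come from different decompositions (and each axis's sign is arbitrary), so overlaying them does not place ancient and modern samples in a shared space; the second approach does share axes, but those axes are defined by the modern samples alone.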

These are indeed the results of PCA when even-sized modern and ancient samples from color populations are analyzed and the color palette is balanced (Fig. 19A). In the more realistic scenario where the color palette is imbalanced and sample sizes differ, PCA produced incorrect results in which ancient Green (aGreen) clustered with modern Yellow (mYellow), away from its closest counterpart, mGreen, which clustered close to aRed. mPurple appeared as a four-way mixture of aRed, aBlue, mCyan, and mDark Blue. Instead of being at the center (Fig. 19A), Black became an outgroup, and its distances to the other colors were distorted (Fig. 19B). Projecting ancient colors onto modern ones also highly misrepresented the relationships among the ancient samples, as aRed overlapped with aBlue or aGreen, mYellow appeared closer to mCyan or aRed, and the outgroups continuously changed (Figs. 19C,D). Note that the first two PCs of the last results explained the most variance (89%) of all analyses.

Recently, Lazaridis et al. [14] projected ancient Eurasians onto modern-day Eurasians and reported that ancient samples from Israel clustered at one end of the Near Eastern cline and ancient Iranians at the other, close to modern-day Jews. Insights from the positions of the ancient populations were then used in their admixture modeling, which supposedly confirmed the PCA results. To test whether the authors' inferences were correct and to what extent those PCA results are unique, we used similar modern and ancient populations to replicate the results of Lazaridis et al. [14] (Fig. 20A). By adding the modern-day populations that Lazaridis et al. [14] omitted, we found that the ancient Levantines cluster with Turks (Fig. 20B), Caucasians (Fig. 20C), Iranians (Fig. 20D), Russians (Fig. 20E), and Pakistani (Fig. 20F) populations. The overlap between the ancient Levantines and other populations also varied widely, as they clustered variously with ancient Iranians and Anatolians, with Caucasians, or alone as a population isolate. Moreover, the remaining ancient populations exhibited conflicting results inconsistent with our understanding of their origins. Mesolithic and Neolithic Swedes, for instance, clustered with modern Eastern Europeans (Fig. 20A–C) or remotely from them (Fig. 20D–F). These examples show the wide variety of results and interpretations that can be generated by projecting ancient populations onto modern ones. The results of Lazaridis et al. [14] are not the only possible ones, nor do they explain the most variation. It is difficult to justify their preference for the first outcome (Fig. 20A), where the first two components explained only 1.35% of the variation in our replication analysis (Lazaridis et al. did not report the proportion of explained variation), over all the alternative outcomes, which explained a much larger portion of the variation (1.92–6.06%).
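The dependence of a projection on the chosen reference panel can be illustrated with a small simulation. The sketch below is hypothetical throughout: populations PopA to PopE, their drift levels, and the "ancient" group derived from PopD are invented. PCA is fitted on the modern panel only, the ancient group is projected, and the helper reports the nearest panel population (by centroid distance in PC1-PC2 space) together with the PC1+PC2 explained variance, before and after two populations are added to the panel.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SNPS = 2000
ancestral = rng.uniform(0.1, 0.9, N_SNPS)

def drifted(base, sd):
    """Hypothetical allele frequencies drifted from `base` by Gaussian noise."""
    return np.clip(base + rng.normal(0.0, sd, N_SNPS), 0.01, 0.99)

def genotypes(freqs, n):
    """n diploid genotypes (0/1/2) drawn from per-SNP allele frequencies."""
    return rng.binomial(2, freqs, size=(n, N_SNPS)).astype(float)

def project_onto_panel(panel, query, k=2):
    """PCA on the modern panel only; project `query`; return the nearest panel
    population (PC-space centroid distance) and the PC1+PC2 explained variance."""
    X = np.vstack(list(panel.values()))
    mu = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mu, full_matrices=False)
    axes = vt[:k]
    q = ((query - mu) @ axes.T).mean(axis=0)
    dist = {name: float(np.linalg.norm(((G - mu) @ axes.T).mean(axis=0) - q))
            for name, G in panel.items()}
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return min(dist, key=dist.get), explained

freqs = {name: drifted(ancestral, 0.08) for name in ["PopA", "PopB", "PopC", "PopD", "PopE"]}
ancient = genotypes(drifted(freqs["PopD"], 0.03), 12)  # hypothetical ancient group

panel = {name: genotypes(freqs[name], 40) for name in ["PopA", "PopB", "PopC"]}
print(project_onto_panel(panel, ancient))

panel.update({name: genotypes(freqs[name], 40) for name in ["PopD", "PopE"]})
print(project_onto_panel(panel, ancient))
```

Whatever the panel, the ancient group is always assigned to some panel population: the first call cannot return PopD because it is not in the panel, and the second need not either, since only two PCs are retained.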

PCA of 65 ancient Palaeolithic, Mesolithic, Chalcolithic, and Neolithic individuals from Iran (12), Israel (16), the Caucasus (7), Romania (10), Scandinavia (15), and Central Europe (5) (colorful shapes) projected onto modern-day populations of various sample sizes (grey dots, black labels). The full population labels are shown in Supplementary Fig. S8. In addition to the modern-day populations used in (A), the following subfigures also include (B) Han Chinese, (C) Pakistani (Punjabi), (D) additional Russians, (E) Pakistani (Punjabi) and additional Russians, and (F) Pakistani (Punjabi), additional Russians, Han Chinese, and Mexicans. The ancient samples remained the same in all the analyses. In each plot (A–F), the ancient Levantines cluster with different modern-day populations.

We note that in high-dimensional data where markers are in high LD, projected samples tend to shrink, i.e., move toward the center of the plot. Corrections for this phenomenon have been proposed in the literature, e.g., Ref. 63. This phenomenon does not affect our datasets, which are either very small (Fig. 19) or LD-pruned (Fig. 20).
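The shrinkage of projected samples can be reproduced in a simplified setting. The sketch below assumes two simulated populations with independent SNPs (no LD) and far more markers than individuals (p = 5,000, n = 100), so it illustrates only the high-dimensionality side of the effect; all parameters are our assumptions. PCA is fitted on a training set, fresh individuals from the same two populations are projected, and the spread of their PC1 scores is compared with that of the training individuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 100, 100, 5000  # p >> n: the high-dimensional regime
base = rng.uniform(0.1, 0.9, p)
# Two hypothetical populations drifted from a shared ancestral frequency.
f = [np.clip(base + rng.normal(0.0, 0.1, p), 0.01, 0.99) for _ in range(2)]

def draw(n):
    """n individuals, half from each simulated population, as 0/1/2 genotypes."""
    return np.vstack([rng.binomial(2, fk, size=(n // 2, p)) for fk in f]).astype(float)

train, test = draw(n_train), draw(n_test)

# PCA fitted on the training individuals only.
mu = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mu, full_matrices=False)
pc1 = vt[0]

train_scores = (train - mu) @ pc1   # in-sample PC1 coordinates
test_scores = (test - mu) @ pc1     # projected (out-of-sample) PC1 coordinates
ratio = np.std(test_scores) / np.std(train_scores)
print(f"spread of projected / training PC1 scores: {ratio:.2f}")
```

A ratio below 1 means the projected individuals sit closer to the center of the plot than training individuals drawn from the very same populations.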

The effect of marker choice on PCA results has received little attention in the literature. Although PCA is routinely applied to different SNP sets, the resulting PCs are typically deemed comparable. This is a major problem in forensic applications, which typically employ 100–300 markers. Our color model cannot be used to evaluate how different marker types affect PCA outcomes, but it can be used to study the effects of missing data and noise, which are common in genomic datasets and reflect how well different marker types capture the population structure. Remarkably, the original population structure could still be recovered after introducing 50% (Fig. 21A) and even 90% missingness (Fig. 21B). The structure decayed when random noise was added to the latter dataset (Fig. 21C). To further explore the effect of noise, we added random markers to the dataset. Adding 10% noisy markers increased the dataset's disparity, but it still retained the original structure (Fig. 21D). Interestingly, even after adding 100% noisy markers, the key features of the original structure could still be identified (Fig. 21E). Only when 1000% noisy markers were added did the original structure disappear (Fig. 21F). Note that the introduction of noise also reduced the percentage of variation explained by the PCs. These results highlight the importance of using ancestry informative markers (AIMs) to uncover the true structure of the dataset and of accounting for disruptive markers.
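The missingness and noise experiments can be approximated as follows. This is a sketch under stated assumptions, not the authors' procedure: six color populations of n=50 as in the caption, 100 hypothetical markers per RGB channel (300 in total, so that 30, 300, and 3000 added markers correspond to the 10%, 100%, and 1000% figures), missing entries replaced by column means, and noise markers drawn uniformly from [0, 1]. The helper names are ours.

```python
import numpy as np

PALETTE = {"Red": (1, 0, 0), "Green": (0, 1, 0), "Blue": (0, 0, 1),
           "Cyan": (0, 1, 1), "Yellow": (1, 1, 0), "Black": (0, 0, 0)}

def colour_samples(n=50, per_channel=100, sd=0.05, seed=0):
    """n individuals per color; each RGB channel spread over per_channel markers."""
    rng = np.random.default_rng(seed)
    blocks = []
    for rgb in PALETTE.values():
        mean = np.repeat(np.array(rgb, dtype=float), per_channel)
        blocks.append(np.clip(rng.normal(mean, sd, size=(n, mean.size)), 0.0, 1.0))
    return np.vstack(blocks)

def add_missingness(X, fraction, seed=1):
    """Blank out a random fraction of entries, then mean-impute each column."""
    rng = np.random.default_rng(seed)
    Y = X.copy()
    Y[rng.random(X.shape) < fraction] = np.nan
    return np.where(np.isnan(Y), np.nanmean(Y, axis=0), Y)

def add_noise_markers(X, k, seed=2):
    """Append k markers of uniform random noise to every individual."""
    rng = np.random.default_rng(seed)
    return np.hstack([X, rng.random((X.shape[0], k))])

def top2_explained(X):
    """Share of the total variance captured by PC1 and PC2."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return float((s[:2] ** 2).sum() / (s ** 2).sum())

X = colour_samples()
variants = {
    "original": X,
    "50% missing": add_missingness(X, 0.5),
    "90% missing": add_missingness(X, 0.9),
    "+30 noisy markers": add_noise_markers(X, 30),
    "+300 noisy markers": add_noise_markers(X, 300),
    "+3000 noisy markers": add_noise_markers(X, 3000),
}
for label, data in variants.items():
    print(f"{label:>20}: PC1+PC2 explain {top2_explained(data):.1%}")
```

Mean imputation is only one of several ways to handle missing genotypes; it was chosen here for brevity, and other imputation schemes could change the picture.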

Testing the effects of missingness and noise in a PCA of six fixed-size (n=50) samples from color populations. The top plots show the effect of missingness alone or combined with noise: (A) 50% missingness, (B) 90% missingness, and (C) 90% missingness and low-level random noise in all the markers. The bottom plots test the effect of noise when added to the original markers in the above plots using: (D) 30 random markers, (E) 300 random markers, and (F) 3000 random markers. Colors include Red [1,0,0], Green [0,1,0], Blue [0,0,1], Cyan [0,1,1], Yellow [1,1,0], and Black [0,0,0].

To evaluate the extent to which marker types represent the population structure, we studied the relationships between UK British and other Europeans (Italians and Iberians) using different sets of 30,000 SNPs, each of a single marker type, a number of similar magnitude to the number of SNPs analyzed by some groups [64,65]. With the full SNP set, the British do not overlap with the other Europeans (Fig. 22A). However, coding SNPs show considerable overlap (Fig. 22B) compared with intronic SNPs (Fig. 22C). Protein-coding SNPs, RNA molecules, and upstream or downstream SNPs (Fig. 22D–F, respectively) also show small overlap. The identification of outliers, already a subjective measure, may also differ depending on the proportion of each marker type. These results illustrate not only how profoundly the choice of markers and populations affects PCA results but also how difficult it is to recover the population structure in exome datasets. Overall, different marker types represent the population structure differently.
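An overlap measure based on a convex hull, as used for the European cluster, can be sketched with SciPy's Delaunay triangulation, whose find_simplex method returns -1 for points outside the hull. Everything below is simulated: GBR, TSI, and IBS are stand-ins with the caption's sample sizes, and the marker-type labels are assigned at random (a hypothetical annotation), so differences between subsets here reflect only which SNPs happen to be sampled, not the biological differences between real annotation classes.

```python
import numpy as np
from scipy.spatial import Delaunay

def fraction_inside_hull(hull_points, query_points):
    """Fraction of 2-D query points lying inside the convex hull of hull_points."""
    return float(np.mean(Delaunay(hull_points).find_simplex(query_points) >= 0))

def pc_scores(X, k=2):
    """PC coordinates of the rows of X after column centring."""
    u, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return u[:, :k] * s[:k]

rng = np.random.default_rng(7)
n_snps = 3000
sizes = {"GBR": 105, "TSI": 115, "IBS": 150}  # simulated stand-ins; sizes from the caption
base = rng.uniform(0.05, 0.95, n_snps)
genos = {name: rng.binomial(2, np.clip(base + rng.normal(0, 0.03, n_snps), 0.01, 0.99),
                            size=(n, n_snps)).astype(float)
         for name, n in sizes.items()}
marker_type = rng.choice(["coding", "intronic", "upstream"], size=n_snps)  # random labels

X = np.vstack([genos[name] for name in sizes])  # GBR rows come first
is_gbr = np.arange(len(X)) < sizes["GBR"]
for label in ["all", "coding", "intronic", "upstream"]:
    cols = np.ones(n_snps, dtype=bool) if label == "all" else marker_type == label
    scores = pc_scores(X[:, cols])
    overlap = fraction_inside_hull(scores[~is_gbr], scores[is_gbr])
    print(f"{label:>9}: {overlap:.0%} of GBR inside the TSI+IBS hull")
```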

PCA of Tuscany Italians (n=115), British (n=105), and Iberians (n=150) across all markers (p≈129,000) (A) and different marker types (p≈30,000): (B) coding SNPs, (C) intronic SNPs, (D) protein-coding SNPs, (E) RNA molecules, and (F) upstream and downstream SNPs. A convex hull was used to outline the European cluster.

PCA is used to infer the ancestry of individuals for various purposes; however, with a minimal sample size of one, such inferences may be even more subject to bias than population studies. We found that such biases can occur: individuals with Green (Fig. 23A) and Yellow (Fig. 23B) ancestries clustered near admixed Cyan individuals and Orange, rather than with Greens or by themselves, respectively. A Grey individual clustered with Cyan (Fig. 23C) when Cyan was the only available population, much as a Blue sample clustered with Green samples (Fig. 23D).

Inferring single-individual ancestries using reference individuals. (A) Even-sized samples from reference populations (n=37): Red [1,0,0], Green [0,1,0], bright Cyan [0,0.9,0.8], dark Cyan [0,0.9,0.6], and a heterogeneous darker Cyan [0,0.9,0.4] with a high standard deviation (0.25), with a light Green test individual [0,0.5,0]. (B) The same reference populations as in (A) with uneven sizes: Red (n=15), Green (n=15), bright Cyan (n=100), dark Cyan (n=15), and heterogeneous darker Cyan (n=100), with a Yellow test individual [1,1,0]. (C) A heterogeneous Cyan population [0,1,1] (n=300) with a high standard deviation (0.25) and a Grey test individual [0.5,0.5,0.5]. (D) Red [1,0,0] (n=10), Green [0,1,0] (n=10), a heterogeneous population [1,1,0.5] (n=200), and a Blue test individual [0,0,1].
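Single-individual placement can be sketched by fitting PCA on a reference panel, projecting one test individual, and reporting the nearest reference centroid. The colors follow panels (C) and (D) of the caption, but the 50 markers per channel, the SD of 0.05 for the non-heterogeneous populations, the nearest-centroid rule, and the helper names (sample, place_individual) are our assumptions. The sketch makes one property explicit: the procedure always names some reference population, even when the test individual belongs to none of them.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(rgb, n, sd, per_channel=50):
    """n individuals around an RGB color, each channel spread over per_channel markers."""
    mean = np.repeat(np.asarray(rgb, dtype=float), per_channel)
    return np.clip(rng.normal(mean, sd, size=(n, mean.size)), 0.0, 1.0)

def place_individual(refs, test, k=2):
    """Fit PCA on the references only, project one test individual, and return
    the reference population whose centroid is nearest in PC space."""
    X = np.vstack(list(refs.values()))
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    axes = vt[:k]
    t = (test - mu) @ axes.T
    dist = {name: float(np.linalg.norm((R - mu).mean(axis=0) @ axes.T - t))
            for name, R in refs.items()}
    return min(dist, key=dist.get), dist

# Panel (C): a single heterogeneous Cyan population and a Grey test individual.
grey = sample((0.5, 0.5, 0.5), 1, 0.0)[0]
panel_c = {"Cyan": sample((0, 1, 1), 300, 0.25)}
print(place_individual(panel_c, grey)[0])  # Cyan: the only option offered

# Panel (D): small Red and Green samples, a large heterogeneous population, a Blue test individual.
blue = sample((0, 0, 1), 1, 0.0)[0]
panel_d = {"Red": sample((1, 0, 0), 10, 0.05),
           "Green": sample((0, 1, 0), 10, 0.05),
           "Mixed": sample((1, 1, 0.5), 200, 0.25)}
print(place_individual(panel_d, blue))
```

Nothing in the procedure flags a test individual that fits none of the references; the answer is bounded by whatever panel the experimenter supplies.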

Arguably, one of the most famous cases of personal ancestral inference occurred during the 2020 US presidential primaries, when a candidate published the outcome of a genetic test of their Native American ancestry undertaken by Carlos Bustamante (https://elizabethwarren.com/wp-content/uploads/2018/10/Bustamante_Report_2018.pdf). Analyzing 764,958 SNPs, Bustamante sought to test for the existence of Native American ancestry using populations from the 1000 Genomes Project and Amerindians. RFMix [66] was used to identify Native American ancestry segments, and PCA, elevated to the status of a machine learning technique, was used to verify that ancestry independently of RFMix. The longest of five genetic segments judged to be of Native American origin was analyzed using PCA and reported to be clearly distinct from segments of European ancestry and strongly associated with Native American ancestry, as it clustered with Native Americans distinctly from Europeans and Africans (Fig. 1 in their report) and among Native American samples (Fig. 2 in their report). Bustamante concluded that "While the vast majority of the individual's ancestry is European, the results strongly support the existence of an unadmixed Native American ancestor in the individual's pedigree, likely in the range of 6–10 generations ago."

We have already shown that AJs (Fig. 9C) and Pakistanis (Fig. 14D) can cluster with Native Americans. With the candidate's DNA unavailable (and their specific European ancestry undisclosed), we tested whether the two PCA patterns observed by Bustamante can be reproduced for modern-day Eurasians without any reported Native American ancestry (Pakistani, Iranian, Even Russian, and Moscow Russian) (Fig. 24A–D, respectively).

Evaluation of Native American ancestry for four Eurasians. (A) Using even sample sizes (n=37) for Africans, Mexican-Americans, British, Puerto Ricans, Colombians, and a Pakistani. (B) Using uneven sample sizes for Africans (n=100), Mexican-Americans (n=20), British (n=50), Puerto Ricans (n=89), Colombians (n=89), and an Iranian. (C) Analyzing a whole-Amerindian cohort of Colombians (n=93), Mexican-Americans (n=117), Peruvians (n=75), Puerto Ricans (n=102), and an Even Russian. (D) Using uneven sample sizes for Africans (n=100), Mexican-Americans (n=53), British (n=20), Puerto Ricans (n=30), Colombians (n=89), and a Moscow Russian. All the samples were randomly selected.

These analyses show that the experimenter can easily generate desired patterns to support personal ancestral claims, making PCA an unreliable and misleading tool for inferring personal ancestry. We further question the accuracy of Bustamante's report, given the biased reference population panel used by RFMix to infer the DNA segments of alleged Amerindian origin, which excluded East European and North Eurasian populations. We draw no conclusions about the candidate's ancestry.

Continue reading here:
Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated | Scientific Reports -...

Posted in Genetics | Comments Off on Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated | Scientific Reports -…
