Quantifying the relative importance of genetics and environment on the comorbidity between mental and … – Nature.com

Posted: June 14, 2024 at 2:41 am

Integrative psychiatric research consortium 2012 (iPSYCH2012) cohort

The iPSYCH2012 cohort is a well-documented, extensively cited cohort24. In short, individuals born between 1981 and 2005 (n = 1,472,762) were considered for ascertainment, representing the entire population of Denmark born in that timeframe. Of these, 30,000 were randomly sampled, regardless of (psychiatric) disorder status, to create an unbiased, population-representative control group. Using information ascertained from the Danish Civil31,32, National Patient33 and/or Psychiatric Central Research Registers34, 57,764 design cases were selected with indications of clinical diagnoses of mental health disorders. In total, 87,764 individuals were selected to form the cohort. Indications are based on International Classification of Disease (ICD) codes representing the clinical diagnosis associated with an instance of care provided at one of many psychiatric facilities throughout Denmark. Six case groups were defined as having at least one indication with the corresponding ICD-10 (ref. 35) or equivalent ICD-8 (ref. 36) codes: attention deficit hyperactivity disorder (ADHD: F90.0), anorexia nervosa (AN: F50.0, F50.1), autism spectrum disorder (ASD: F84.0, F84.1, F84.5, F84.8, F84.9), affective disorder (AFF: F30-F39), bipolar disorder (BD: F30-F31), and schizophrenia (SCZ: F20). For all selected individuals, a dried neonatal heel-prick blood spot was obtained from the Danish Neonatal Screening Biobank37. Individuals were removed when no blood spots could be obtained. The use of these data follows the guidelines provided by the Danish Scientific Ethics Committee, the Danish Health Data Authority, the Danish Data Protection Agency, and the Danish Neonatal Screening Biobank Steering Committee. For each dried blood spot, the DNA was extracted and amplified, followed by genotyping using the Infinium PsychChip v1.0 array24. For 9,714 blood spots, DNA could not be successfully genotyped, and these individuals were therefore excluded from the study.
A subset of good-quality SNPs was phased into haplotypes using SHAPEIT3 (ref. 38) and imputed using Impute2 (ref. 39) with reference haplotypes from the 1000 Genomes Project phase 3 (ref. 40). Genotypes were checked for imputation quality (INFO > 0.2), Hardy-Weinberg equilibrium (HWE; p < 1 × 10⁻⁶), association with genotyping wave (p < 5 × 10⁻⁸), association with imputation batch (p < 5 × 10⁻⁸), differing imputation quality between subjects with and without psychiatric diagnoses (p < 1 × 10⁻⁶), and minor allele frequency (MAF > 0.01). Finally, we extracted unrelated individuals of European ancestry, leaving 77,082 individuals.
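The variant checks listed above can be sketched as a simple per-variant filter. This is only an illustration, not the pipeline used in the study: the `Variant` record and its field names are hypothetical, and the sketch assumes that INFO and MAF are retention criteria while the listed p-value thresholds are exclusion criteria, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-SNP quality metrics (field names are illustrative)."""
    info: float         # imputation quality score
    hwe_p: float        # Hardy-Weinberg equilibrium test p-value
    wave_p: float       # association with genotyping wave
    batch_p: float      # association with imputation batch
    diff_info_p: float  # differing imputation quality, cases vs. non-cases
    maf: float          # minor allele frequency


def passes_qc(v: Variant) -> bool:
    """Return True if the variant survives all checks.

    Assumption: p-values below the stated thresholds flag a variant for
    removal, whereas INFO and MAF must exceed their thresholds to be kept.
    """
    return (
        v.info > 0.2
        and v.hwe_p >= 1e-6
        and v.wave_p >= 5e-8
        and v.batch_p >= 5e-8
        and v.diff_info_p >= 1e-6
        and v.maf > 0.01
    )


good = Variant(info=0.95, hwe_p=0.4, wave_p=0.2, batch_p=0.7, diff_info_p=0.5, maf=0.12)
rare = Variant(info=0.95, hwe_p=0.4, wave_p=0.2, batch_p=0.7, diff_info_p=0.5, maf=0.002)
kept = [v for v in (good, rare) if passes_qc(v)]
```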

The Danish Civil Registration System was established in 1986 and contains detailed information pertaining to sex, date of birth, parental links, and continuously updated information on vital status (e.g., migration or death) for all individuals alive and living in Denmark for the past seventy years31,32. The Danish National Patient Register includes the full medical records of all individuals treated at Danish hospitals (inpatient departments) since 1 January 1977, as well as in outpatient clinics since 1 January 1994 (or occasionally since 1995)33. The register was updated in 2002 to also include individuals treated in hospitals outside of Denmark and treatments at private healthcare facilities not covered under the Danish health insurance agreement. Finally, the Danish Psychiatric Central Research Register contains data on admissions to psychiatric inpatient facilities up to and including 1994. After 1994, the register was extended to include outpatient contacts in psychiatric departments34. As of April 2017, the Civil Register contained 9,851,330 individuals, the National Patient Register 8,065,597 individuals, and the Psychiatric Register 1,005,068 individuals. All individuals were born between 1 January 1858 and 21 April 2017. All registers contain the unique personal identification number given to all individuals living in Denmark, which allows accurate linkage across the different registers. By Danish law, informed consent is not required for register-based studies, and no compensation was provided. This work is based on Danish register data that are not publicly available due to privacy protection, including the General Data Protection Regulation (GDPR). Only Danish research environments are granted authorisation. Foreign researchers can, however, get access to data under Danish research environment authorisation.
Further information on data access can be found at https://www.dst.dk/en/TilSalg/Forskningsservice or by contacting the senior corresponding authors.

The Swedish Total Population Register (TPR), started in 1968 and continuously updated, holds information on all individuals who are residents of Sweden. It contains information on birth, death, name change, marital status, family relationships and migration within Sweden as well as to and from other countries41. The Multi-Generation Register (MGR)42 is part of the TPR and contains information on all residents in Sweden who were born in 1932 or later and alive in 1961 (index persons), together with their parents. Familial linkage (i.e., parental information) is available for more than 95% of individuals who died before 1968, about 60% of those who died between 1968 and 1990, and more than 90% of those alive in 1991. The Swedish Inpatient Register was launched in 1964 (psychiatric diagnoses from 1973), but complete coverage was reached in 1987. It includes discharge diagnoses and dates of hospital admission and discharge, and has a coverage of at least 71% of all residents for somatic care discharge in 1982 and 86% of all psychiatric care in 1973. Since 2001, this register also covers outpatient care43. The individually unique National Registration Number was used to link data from all the registers. All Swedish-born residents were followed for any cardiometabolic and mental disorders from birth until emigration or death, from 1973 to 2016. By Swedish law, informed consent is not required for register-based studies, and no compensation was provided. The use of Swedish data was approved by the regional ethics review board in Stockholm, Sweden, with DNR 2012/1814-31/4. Data from Swedish registers are not available for sharing due to policies and regulations in Sweden. Swedish register data are available to all researchers through applications at Statistics Sweden (SCB, https://www.scb.se/en/) and the National Board of Health and Welfare (Socialstyrelsen, https://www.socialstyrelsen.se/).

By Danish and Swedish law, consent to use register data for register-based studies is not required.

We defined the six MDs, namely attention-deficit/hyperactivity disorder (ADHD), anorexia nervosa (AN), autism spectrum disorders (ASD), affective disorders (AFF), bipolar disorder (BD), and schizophrenia (SCZ), and the cardiometabolic disorders (CMDs) using information from the Danish and Swedish Patient Registers. The mental disorders were previously defined and used for GWAS analysis of iPSYCH 2012 data by Schork et al. 2019. These disorders represent the most well-documented, well-known and most common mental disorders occurring in the population. The cardiometabolic disorders were selected to include both (a) disorders that are common in the population (high prevalence) and (b) less common disorders (low prevalence), provided that (c) GWAS summary statistics for the disorder were available in a public repository. Note that AFF includes two main diagnoses, BD and major depressive disorder. Individuals with at least one hospital visit concerning these disorders (primary or secondary diagnosis) were considered cases with MD or CMD. Individuals diagnosed with SCZ, BD, or AFF before age 10 were removed from the analysis, as such a diagnosis is considered clinically unreliable. In Denmark, ICD-8 codes were used until 1993 and ICD-10 codes from 1994; in Sweden, ICD-8 codes were used until 1986, ICD-9 codes during 1987-1996, and ICD-10 codes from 1997 (Supplementary Data 9). To minimise the effect of left censoring, we removed individuals born outside of Denmark and Sweden, as these individuals may have been diagnosed in another country. By doing so, we excluded both Danish/Swedish citizens born abroad and individuals who migrated to Denmark or Sweden. No information is recorded regarding terms such as race, ancestry, or ethnicity.
However, both Denmark and Sweden are predominantly of white-European ancestry, and large-scale migration from non-European countries is relatively recent. We therefore assume that we extracted mostly individuals of white-European ancestry and indirectly removed individuals of non-European ancestry when filtering on country of birth.

A total of 15 cardiometabolic GWAS summary statistics, including stroke (subtypes)44, CAD45, aneurysms46 and HF47, were obtained through multiple public repositories. GWAS summary statistics containing participants of the VA Million Veterans Programme (e.g., T2D48, venous thromboembolism49, and peripheral artery disease50) were provided after approval was granted by the National Institutes of Health (project #26508). GWAS summary statistics for ADHD51, AN52, ASD53, BD54, and MDD55 excluding iPSYCH participants (except SCZ56, which does not contain iPSYCH samples) were kindly provided through their respective PGC consortium. iPSYCH-only GWAS summary statistics for MDs25 were downloaded from internal iPSYCH servers and are available on request. The full list of all cardiometabolic and mental disorder GWAS summary statistics used is shown in Supplementary Data 6.

All GWAS summary statistics were uniformly cleaned using internal software57. First, for each GWAS summary statistic, we inferred the genome build by mapping SNPs to dbSNP build 151 using GRCh38, GRCh35, GRCh36, and GRCh37 genomic coordinates. The version with the highest number of mapped SNPs was inferred as the build of the original GWAS. Next, a second mapping step used the inferred build to simultaneously map and lift over the position and chromosome coordinates to the GRCh37 version of dbSNP, which adds information about reference and alternative alleles. RSids were used when chromosome and base pair information were not available. The reference allele of dbSNP corresponds to the reference allele of the reference genome. The allele directions were flipped, making the effect allele the reference allele. Effect scores (e.g., beta coefficients, odds ratios, and z-scores) were adjusted accordingly. Finally, multi-allelic, allele-mismatched, and strand-ambiguous SNPs, alongside SNPs with duplicated positions, missing test statistics, and indels, were removed57.
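Three of the cleaning steps described above (choosing the build with the most mapped SNPs, flipping effects so that the effect allele is the reference allele, and flagging strand-ambiguous SNPs) can be illustrated with a minimal sketch. It is not the internal software cited in the text; the function names, argument names and example counts are hypothetical.

```python
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def infer_build(mapped_counts):
    """Pick the genome build under which the most SNPs mapped to dbSNP."""
    return max(mapped_counts, key=mapped_counts.get)


def is_strand_ambiguous(allele1, allele2):
    """A/T and C/G SNPs look identical on both strands."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS


def align_to_reference(effect_allele, other_allele, ref_allele,
                       beta=None, odds_ratio=None, z=None):
    """Express effects relative to the reference allele.

    Returns None for an allele mismatch (neither allele equals the reference).
    When the alleles are swapped, beta and z change sign and the odds ratio
    is inverted.
    """
    if effect_allele == ref_allele:
        return {"beta": beta, "odds_ratio": odds_ratio, "z": z}
    if other_allele != ref_allele:
        return None
    return {
        "beta": None if beta is None else -beta,
        "odds_ratio": None if odds_ratio is None else 1.0 / odds_ratio,
        "z": None if z is None else -z,
    }


# Illustrative counts only, not taken from the study.
build = infer_build({"GRCh36": 10, "GRCh37": 950, "GRCh38": 40})
flipped = align_to_reference("G", "A", ref_allele="A", beta=0.2, odds_ratio=1.25)
```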

SNP-based heritability (\(h_{\mathrm{SNP}}^{2}\)) and genetic correlations (\(r_{g\,\mathrm{SNP}}\)) between all cleaned MD and CMD GWAS summary statistics were estimated using linkage-disequilibrium score regression (LDSC)58,59 version 1.0.1, following the authors' protocols.

We estimated the cumulative incidence of all MDs and CMDs, which can be interpreted as the proportion of individuals who become cases before a specific age. The cumulative incidences were estimated for the general population, individuals with one or more full siblings diagnosed with the same disorder, and individuals with one or more parents diagnosed with the cross-disorder (e.g., the cumulative incidence of ADHD for individuals with at least one parent diagnosed with type-2 diabetes). We expected the distribution of individuals into these three categories to be associated with birth year. Thus, to control for substantial changes over time in the underlying incidence, diagnoses (e.g., shifting of ICD systems), data availability, and registration (e.g., use of inpatient diagnoses up to 1995/2000 and in- and out-patient diagnoses subsequently), all cumulative incidences were estimated stratifying on the year of birth using the Nelson-Aalen estimator, which can utilise censored, competing risks, and incomplete data19. Next, we estimated the additive heritability (h2) and genetic correlation (rg) under the liability threshold model based on the cumulative incidence as a function of pedigree relatedness, following procedures described by Wray and Gottesman21,60,61. In short, the liability threshold model assumes that the disease liability underlying the disease status is normally distributed, Z~N(0,1), and individuals with the disorder must therefore have surpassed a liability threshold62,63. Given normal distribution theory, the liability threshold of a given disorder can be estimated from the proportion of the population that is affected in their lifetime (lifetime risk). All analyses were done in R v4.2.1 using the cmprsk v2.2 package.
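The last step, going from a lifetime risk to a liability threshold, follows directly from the standard normal assumption: the threshold is the point above which a fraction equal to the lifetime risk of the N(0,1) distribution lies. A minimal sketch in Python (the study itself used R; the function name and the example risks are illustrative):

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def liability_threshold(lifetime_risk):
    """Threshold T on the N(0,1) liability scale such that P(Z > T) = lifetime_risk."""
    if not 0.0 < lifetime_risk < 1.0:
        raise ValueError("lifetime_risk must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(1.0 - lifetime_risk)


# A rarer disorder has a higher threshold (illustrative risks, not study estimates).
t_rare = liability_threshold(0.01)
t_common = liability_threshold(0.10)
```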

Using the full available register data (no restriction of birth year), the heritability of liability of disorders was calculated by deriving the general population (e.g., risk of ADHD in the population) and full-sibling familial risk (e.g., risk of ADHD when having a full sibling with ADHD) cumulative incidences for individuals born in the same calendar year (e.g., 1965, 1966, and so on up to 2016). Here, we use the cumulative incidence (general population and full-sibling risk) at the last observed time point as estimates of the proportion of the population born in the same calendar year that is affected in their lifetime, resulting in estimates of heritability (Eqs. 1 and 2) for individuals born in the same calendar year (\(h_{\text{year of birth}}^{2}\)).

$$\mathrm{Heritability}\,(h^{2})=\frac{T-T_{R}\sqrt{1-\left(1-T/i\right)\left(T^{2}-T_{R}^{2}\right)}}{a_{R}\left(i+\left(i-T\right)T_{R}^{2}\right)}$$

(1)

$$\mathrm{s.e.}\left(h^{2}\right)=\frac{1}{a_{R}}\sqrt{\left[\frac{K^{2}}{y^{2}}\left(\frac{1}{i}-a_{R}h^{2}\left(i-T\right)\right)^{2}+\frac{K_{R}^{2}}{i^{2}y_{R}^{2}}\right]}$$

(2)

Where \(T\) = liability threshold of the disease in the general population, \(T_R\) = liability threshold of the disease based on affected family members, \(i\) = mean liability of disease in the population, calculated as \(i = y/K\), where \(K\) is the lifetime probability of disease in the population and \(y\) the height of the normal curve at threshold \(T\), \(a_R\) = additive genetic relationship between relatives, and \(K_R\) = the lifetime probability of disease in individuals with affected family members (with \(y_R\) the height of the normal curve at threshold \(T_R\)). Note that all estimates are derived for individuals born in the same calendar year.
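As a worked illustration of Eq. 1, the sketch below converts a population lifetime risk K and a familial lifetime risk K_R into thresholds and evaluates the heritability formula. It assumes a_R = 0.5 for full siblings; the function name and the example risks are hypothetical and are not estimates from the study.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def heritability_from_risks(K, K_R, a_R=0.5):
    """Evaluate Eq. 1 from lifetime risks.

    K   : lifetime probability of disease in the general population
    K_R : lifetime probability in individuals with affected relatives
    a_R : additive genetic relationship (0.5 assumed here for full siblings)
    """
    T = STANDARD_NORMAL.inv_cdf(1.0 - K)      # population threshold
    T_R = STANDARD_NORMAL.inv_cdf(1.0 - K_R)  # threshold among relatives
    y = STANDARD_NORMAL.pdf(T)                # height of the normal curve at T
    i = y / K                                 # i = y / K as defined in the text
    numerator = T - T_R * sqrt(1.0 - (1.0 - T / i) * (T**2 - T_R**2))
    denominator = a_R * (i + (i - T) * T_R**2)
    return numerator / denominator


# Illustrative inputs: 1% population risk, 9% risk among full siblings of cases.
h2_example = heritability_from_risks(K=0.01, K_R=0.09)
```

When the familial risk equals the population risk, the numerator vanishes and the estimate is zero, which is a quick sanity check on the implementation.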

In contrast to the h2 estimation, for the genetic correlation we restricted the birth window to individuals born between 1981 and 2005, using medical records up to 2012. The rg between disorders was calculated by deriving the general population risk for both disorders (e.g., ADHD and T2D) and the parent-offspring cross-disorder familial risk (e.g., risk of ADHD when having a parent with T2D) cumulative incidences for individuals born in the same calendar year (e.g., 1981, 1982, and so on up to 2005). In line with the h2 estimation, we used the cumulative incidence at the last observed time point for each birth year for all three cumulative incidence functions (general population risk for each of the two disorders and cross-disorder familial risk). Using the h2 of both disorders previously obtained, we derived estimates of genetic correlations (Eqs. 3 and 4) per year of birth (\(r_{g,\text{year of birth}}\)).

$$\mathrm{Genetic\ correlation}\,(r_{g})=\frac{\left(\dfrac{T_{c}-T_{R_{c}}\sqrt{1-\left(1-T_{f}/i_{f}\right)\left(T_{f}^{2}-T_{R_{c}}^{2}\right)}}{a_{R}\left(i_{f}+\left(i_{f}-T_{f}\right)T_{R_{c}}^{2}\right)}\right)}{\sqrt{h_{c}^{2}h_{f}^{2}}}$$

(3)

$$\mathrm{s.e.}\left(r_{g}\right)=\frac{\dfrac{1}{a_{R}}\sqrt{\left[\dfrac{K_{f}^{2}}{y_{f}^{2}}\left(\dfrac{i}{i_{f}}-a_{R}r_{cf}h_{c}h_{f}\left(i_{f}-T_{f}\right)\right)^{2}+\dfrac{1}{i_{f}^{2}}\left(\dfrac{K_{R_{c}}^{2}}{y_{R_{c}}^{2}}+\dfrac{K_{c}^{2}}{y_{c}^{2}}\right)\right]}}{\sqrt{h_{c}^{2}h_{f}^{2}}}$$

(4)

Where \(T_c\) and \(T_f\) = liability thresholds of diseases c and f in the general population, \(T_{R_c}\) = liability threshold of disease c in individuals with relatives with disease f, \(i_f\) = mean liability of disease f in the population, \(a_R\) = additive genetic relationship between relatives, \(h_c^2\) and \(h_f^2\) = heritabilities of diseases c and f, and \(K_f\) = the lifetime probability of disease f in the general population. Note that all estimates are derived for individuals born in the same calendar year.

We obtain overall h2 and rg estimates by weighting the individual \(h_{\text{year of birth}}^{2}\) and \(r_{g,\text{year of birth}}\) by the inverse of their sampling variance (Eqs. 5 and 6) using a random-effects model.

$$\mathrm{IVW}_{\mathrm{random}}=\frac{\sum_{k=1}^{K}\hat{\theta}_{k}\,w_{k}^{*}}{\sum_{k=1}^{K}w_{k}^{*}};\quad w_{k}^{*}=\frac{1}{s_{k}^{2}+r^{2}}$$

(5)

$$\mathrm{s.e.}\left(\mathrm{IVW}_{\mathrm{random}}\right)=\sqrt{\frac{1}{\sum_{k=1}^{K}w_{k}^{*}}}$$

(6)

Where \(K\) = number of estimates, \(s_{k}^{2}\) = variance of estimate k, \(r^{2}\) = the variance of the distribution of true effect sizes, \(\hat{\theta}_{k}\) = point estimate k, and \(w_{k}^{*}\) = random-effects weight.
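Equations 5 and 6 can be sketched in a few lines. The text does not say how the variance of the true effect sizes was estimated, so the sketch takes it as an input (`tau2`) and, purely as an assumption, also shows the commonly used DerSimonian-Laird moment estimator as one way to obtain it. All names and example values are illustrative.

```python
import math


def ivw_random(estimates, variances, tau2=0.0):
    """Random-effects inverse-variance weighted mean and its standard error.

    tau2 is the variance of the distribution of true effect sizes
    (written r^2 in the text); tau2 = 0 reduces to a fixed-effect mean.
    """
    weights = [1.0 / (s2 + tau2) for s2 in variances]
    total = sum(weights)
    pooled = sum(w * theta for w, theta in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(estimates, variances):
    """Assumed estimator for tau2; the source does not name the one it used."""
    w = [1.0 / s2 for s2 in variances]
    sum_w = sum(w)
    fixed = sum(wi * th for wi, th in zip(w, estimates)) / sum_w
    q = sum(wi * (th - fixed) ** 2 for wi, th in zip(w, estimates))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    if c <= 0:
        return 0.0
    return max(0.0, (q - (len(estimates) - 1)) / c)


# Illustrative per-birth-year estimates and sampling variances.
yearly_h2 = [0.60, 0.70, 0.65]
yearly_var = [0.010, 0.020, 0.015]
tau2 = dersimonian_laird_tau2(yearly_h2, yearly_var)
pooled_h2, pooled_se = ivw_random(yearly_h2, yearly_var, tau2)
```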

Under a bivariate liability threshold model, the phenotypic correlation (rP) between two traits can be broken down into its (additive) genetic and non-genetic components. This allows us to quantify and understand the contribution of the estimated genetic correlation and heritability to the level of comorbidity between MDs and CMDs (i.e., the hazard ratios) reported by Momen et al.1, which uses the same Danish register data.

$$\mathrm{Relative\ risk}\,(\mathrm{RR})=\frac{1-e^{\left(\mathrm{HR}\times\log\left(1-r\right)\right)}}{r}$$

(7)

Where HR = hazard ratio reported by Momen et al. 2020, and r = rate of the disorder in the reference group, derived by weighting the individual estimates (1981-2005, using medical records up to 2012) by the inverse of their sampling variances.

$$\mathrm{Odds\ ratio}\,(\mathrm{OR})=\frac{\left(1-p\right)\times\mathrm{RR}}{1-\left(\mathrm{RR}\times p\right)}$$

(8)

Where p = incidence of the disorder in the nonexposed group (here p=r) and RR = the calculated relative risk.

$$\mathrm{Phenotypic\ correlation}\,(r_{p})=\frac{\mathrm{OR}^{\frac{\pi}{4}}-1}{\mathrm{OR}^{\frac{\pi}{4}}+1}$$

(9)

Where OR = odds ratio estimated as a function of relative risk.
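The chain of conversions in Eqs. 7 to 9 (hazard ratio to relative risk, relative risk to odds ratio, odds ratio to phenotypic correlation) can be written out as a short sketch. The function names and the example inputs are illustrative; as in the text, the incidence in the non-exposed group p is set equal to the reference rate r.

```python
import math


def hr_to_rr(hr, r):
    """Eq. 7: relative risk from a hazard ratio and the reference-group rate r."""
    return (1.0 - math.exp(hr * math.log(1.0 - r))) / r


def rr_to_or(rr, p):
    """Eq. 8: odds ratio from a relative risk and non-exposed incidence p."""
    return ((1.0 - p) * rr) / (1.0 - rr * p)


def or_to_rp(odds_ratio):
    """Eq. 9: approximate tetrachoric (phenotypic) correlation from an odds ratio."""
    x = odds_ratio ** (math.pi / 4.0)
    return (x - 1.0) / (x + 1.0)


def hr_to_rp(hr, r):
    """Apply Eqs. 7-9 in sequence with p = r."""
    rr = hr_to_rr(hr, r)
    return or_to_rp(rr_to_or(rr, r))


# Illustrative inputs only: hazard ratio of 2 and a 5% reference rate.
rp_example = hr_to_rp(hr=2.0, r=0.05)
```

A hazard ratio of 1 gives RR = 1, OR = 1 and therefore a phenotypic correlation of 0, which is a convenient check on the chain.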

The phenotypic correlation can be expressed as a function of its genetic and non-genetic components:

$$\mathrm{Phenotypic\ correlation}\,\left(r_{p}\right)=r_{g}\sqrt{h_{c}^{2}h_{f}^{2}}+r_{e}\sqrt{\left(1-h_{c}^{2}\right)\left(1-h_{f}^{2}\right)}$$

$$\mathrm{Genetic\ component}\,(G)=r_{g}\times\sqrt{h_{c}^{2}\times h_{f}^{2}}$$

(10)

$$\text{Non-genetic component}\,(E)=r_{p}-G$$

(11)

Where \(r_{p}\) = tetrachoric correlation derived from the HR, \(r_g\) = the genetic correlation estimate and \(r_e\) = the environmental correlation estimate between disorders c and f, and \(h^2\) = the heritability estimates for c and f. The 95% CIs for G and E were derived using both the upper and lower 95% CIs of \(r_g\) and \(r_p\). Note that, to estimate \(G_{\mathrm{SNP}}\) and \(E_{\mathrm{SNP}}\), the \(r_g\) and \(h^2\) estimates are replaced by the SNP-based estimates \(r_{g\,\mathrm{SNP}}\) and \(h_{\mathrm{SNP}}^{2}\).
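The decomposition in Eqs. 10 and 11 amounts to simple arithmetic, sketched below. The function names and input values are illustrative, not estimates from the study; the helper for the implied environmental correlation is just the phenotypic correlation equation rearranged for \(r_e\).

```python
import math


def decompose_phenotypic_correlation(r_p, r_g, h2_c, h2_f):
    """Split r_p into a genetic part G (Eq. 10) and a non-genetic part E (Eq. 11)."""
    genetic = r_g * math.sqrt(h2_c * h2_f)
    non_genetic = r_p - genetic
    return genetic, non_genetic


def implied_environmental_correlation(non_genetic, h2_c, h2_f):
    """Solve E = r_e * sqrt((1 - h2_c) * (1 - h2_f)) for r_e."""
    return non_genetic / math.sqrt((1.0 - h2_c) * (1.0 - h2_f))


# Illustrative values only.
G, E = decompose_phenotypic_correlation(r_p=0.30, r_g=0.50, h2_c=0.64, h2_f=0.25)
r_e = implied_environmental_correlation(E, h2_c=0.64, h2_f=0.25)
```

For the SNP-based version, the same functions would be called with the SNP-based genetic correlation and heritabilities in place of the family-based ones.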

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
