Missing loci & convertibility to hg38
For four BC PRS loci, no variants were listed at the specified genomic position in gnomAD v2.1.1, namely rs572022984, rs113778879, rs73754909, and rs79461387. For three of these four loci, gnomAD v3.1.2 also reported no variants at the corresponding hg38 positions as defined by dbSNP [23] (Supplementary Table 2). Locus rs572022984 was listed, but with an overall allele count of zero in NFE samples (Table 2).
For two loci, conversion to hg38 resulted in a change in alleles, namely for rs143384623 (hg19: 1-145604302-C-CT; hg38: 1-145830798-C-CA) and rs550057 (hg19: 9-136146597-C-T; hg38: 9-133271182-T-C). For rs143384623, the change of the alternative allele from CT to CA did not result in a noticeable shift in AFs observed in gnomAD NFE samples (5142/13304 (0.39) in v2.1.1 versus 24316/64610 (0.38) in v3.1.2, two-sided Fisher's exact test p=0.14). For rs550057, the observed AFs were complementary, i.e., 3786/14828 (0.26) for allele T in gnomAD v2.1.1 and 49878/67552 (0.74) for allele C in gnomAD v3.1.2. Therefore, 1 − 49878/67552 (0.26) was assumed as the gnomAD v3.1.2 effect AF at this bi-allelic site.
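The allele swap at rs550057 can be checked with a few lines of arithmetic. The sketch below recomputes the NFE allele frequencies from the counts quoted above and takes the complement of the v3.1.2 frequency. The function names are illustrative assumptions and are not part of any gnomAD tooling.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Return the allele count divided by the number of called alleles."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def complement_frequency(af: float) -> float:
    """At a bi-allelic site, the other allele has frequency 1 - af."""
    return 1.0 - af


# Counts quoted in the text for rs550057 (gnomAD NFE samples).
af_t_v2 = allele_frequency(3786, 14828)    # allele T, hg19, v2.1.1
af_c_v3 = allele_frequency(49878, 67552)   # allele C, hg38, v3.1.2
af_t_v3 = complement_frequency(af_c_v3)    # implied frequency of allele T

print(f"v2.1.1 T: {af_t_v2:.2f}, v3.1.2 C: {af_c_v3:.2f}, v3.1.2 T: {af_t_v3:.2f}")
```

Rounded to two decimals, the implied v3.1.2 frequency of allele T matches the v2.1.1 value, which is consistent with a swap of reference and alternative alleles between genome builds.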
For 39 of the 320 PRS loci listed with AF>0 in gnomAD v3.1.2, at least one observation of technical artifacts was reported: 38 loci were flagged as being located in low-complexity regions, 3 as being localized at a low-quality site, and 1 failed the allele-specific VQSR filter (Supplementary Table 2).
Based on an absolute difference threshold of 0.016 (Supplementary Fig. 1), 24 loci were determined as showing deviating AFs compared to CanRisk (Fig. 1, Table 2). Absolute differences ranged from 0.03 to 0.71, and for 21 of these 24 loci (87.5%), technical artifacts were reported in gnomAD v3.1.2.
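The flagging rule described here amounts to a comparison of absolute AF differences against a data-set-specific threshold. A minimal sketch follows, assuming expected and observed AFs are available as dictionaries keyed by locus; the function name and the example values are hypothetical placeholders, not values from the study tables, and the strict inequality is an assumption.

```python
def flag_deviating(expected: dict[str, float],
                   observed: dict[str, float],
                   threshold: float) -> dict[str, float]:
    """Return loci whose |observed - expected| AF exceeds the threshold."""
    flagged = {}
    for locus, exp_af in expected.items():
        if locus not in observed:
            continue  # missing loci would be handled separately
        diff = abs(observed[locus] - exp_af)
        if diff > threshold:
            flagged[locus] = diff
    return flagged


# Placeholder values for illustration only; not taken from the study.
expected_af = {"locus_a": 0.120, "locus_b": 0.450, "locus_c": 0.300}
observed_af = {"locus_a": 0.125, "locus_b": 0.710, "locus_c": 0.330}

flagged = flag_deviating(expected_af, observed_af, threshold=0.016)
print(sorted(flagged))
```

Keeping the threshold as a parameter lets the same check be reused for other data sets with their own thresholds.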
Extremely deviating AFs with an absolute difference >0.016 are indicated by red markers.
All 49 PRS loci for which a noticeably deviating AF was observed in at least one of the data sets provided by the five participating GC-HBOC centers are listed in Table 3.
For the IMGAG DRAGEN data, 0.052 was calculated as the threshold to determine noticeably deviating AFs (Supplementary Fig. 2), resulting in 18 affected loci (Table 3, Fig. 2). Of these, 16 were previously also identified as missing or showing noticeably deviating AFs in gnomAD v3.1.2. The exceptions were rs62485509 and rs9931038. For IMGAG freebayes data, 0.036 was calculated as the threshold (Supplementary Fig. 2), resulting in 16 loci from the BCAC 313 BC PRS determined as showing a noticeably deviating AF. Of these, 11 loci were also identified as showing deviating AFs in IMGAG DRAGEN data, and all but rs12406858 and rs11268668 were previously identified as missing or showing deviating AFs in gnomAD v3.1.2.
Data were provided by the Institute of Medical Genetics and Applied Genomics (IMGAG) at University Hospital Tübingen, by the Institute for Clinical Genetics (ICG) at University Hospital Carl Gustav Carus Dresden, by the Department of Medical Genetics (DMG) at University Hospital Münster, by the Center for Familial Breast and Ovarian Cancer (CFBOC) at University Hospital Cologne, and by the Institute of Human Genetics (IHG) at the University of Regensburg.
Considering genotyping data provided by the ICG based on 585 samples, 23 of the overall 324 PRS loci did not meet the minimum quality criteria (read depth ≥20) in more than 25% of samples and were discarded (Supplementary Table 3). Additionally, GATK reported read depth <20 for >25% of samples for rs56097627 and rs143384623. For 260 of the remaining 299 PRS loci (86.96%), forced genotyping with GATK and freebayes resulted in the observation of identical AFs. For both ICG GATK and freebayes data, 0.063 was calculated as the threshold to determine noticeably deviating AFs (Supplementary Fig. 3). Using this threshold, 11 loci showed noticeably deviating AFs in the GATK data set (including two loci exclusive to the BCAC 313 BC PRS) and 14 loci in the freebayes data set (including three loci exclusive to the BCAC 313 BC PRS), with an overlap of 7 (Table 3, Fig. 2).
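The per-locus quality criterion stated above (discard a locus when more than 25% of samples have a read depth below 20) can be sketched as follows. The function name, its defaults, and the example depths are illustrative assumptions; the centers' actual pipelines are not described at this level of detail here.

```python
def fails_depth_filter(depths: list[int],
                       min_depth: int = 20,
                       max_fail_fraction: float = 0.25) -> bool:
    """True if more than max_fail_fraction of samples have depth < min_depth."""
    if not depths:
        raise ValueError("no samples")
    n_low = sum(1 for d in depths if d < min_depth)
    return n_low / len(depths) > max_fail_fraction


# Hypothetical per-sample read depths at two loci (illustrative values only).
depth_by_locus = {
    "locus_ok": [35, 42, 18, 50, 27, 31, 44, 29],   # 1 of 8 samples below 20
    "locus_low": [12, 19, 25, 8, 30, 15, 22, 40],   # 4 of 8 samples below 20
}

kept = [locus for locus, d in depth_by_locus.items() if not fails_depth_filter(d)]
print(kept)
```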
The DMG provided GATK- and DRAGEN-based BRIDGES 306 BC PRS genotyping data of 545 samples. Locus rs138179519 did not meet the quality criteria with either caller, and rs774021038 additionally failed them when using DRAGEN. Of the remaining 304 loci, 252 (82.89%) showed identical AFs (Supplementary Table 3). A threshold of 0.052 (Supplementary Fig. 4) resulted in 20 loci showing deviating AFs in GATK data and 14 loci in DRAGEN data, with an overlap of 9 loci.
For the CFBOC data based on 412 samples, a threshold of 0.047 was calculated (Supplementary Fig. 5). The loci of the BRIDGES 306 BC PRS were considered, 243 (79.41%) of which showed identical AFs for both callers applied (Supplementary Table 3). Overall, 25 loci (all of which are also included in the BCAC 313 BC PRS) showed deviating AFs: 16 loci in GATK and 19 loci in freebayes data, with an overlap of 10 loci.
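The overall count of 25 loci follows from the two caller-specific counts and their overlap by inclusion-exclusion. A minimal sketch, with a hypothetical helper name:

```python
def union_size(n_a: int, n_b: int, n_overlap: int) -> int:
    """Number of distinct loci flagged by caller A or caller B."""
    if n_overlap > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either set")
    return n_a + n_b - n_overlap


# CFBOC counts quoted in the text: 16 (GATK), 19 (freebayes), 10 shared.
cfboc_total = union_size(16, 19, 10)
print(cfboc_total)  # 25
```

The same arithmetic applies to the other centers wherever two caller-specific counts and an overlap are reported.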
The IHG provided GATK- and CLC-based BRIDGES 306 BC PRS genotyping data of 251 samples (Supplementary Methods). Four loci did not meet the quality criteria in both settings, and an additional four in the CLC setting. Of the remaining 298 loci, 228 (76.51%) showed identical AFs (Supplementary Table 3). A threshold of 0.063 (Supplementary Fig. 6) resulted in 23 loci showing noticeably deviating AFs in GATK data and 19 loci in CLC data, with an overlap of 10 loci.
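The between-caller concordance percentages reported for the four centers with paired caller data can be recomputed directly from the quoted counts. The sketch below does only that; the dictionary layout and helper name are illustrative.

```python
# (loci with identical AFs, loci compared) per center, as quoted in the text.
concordance = {
    "ICG": (260, 299),
    "DMG": (252, 304),
    "CFBOC": (243, 306),
    "IHG": (228, 298),
}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


rates = {center: percent(*counts) for center, counts in concordance.items()}
for center, rate in rates.items():
    print(f"{center}: {rate:.2f}%")
```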
In summary, for four loci, deviating AFs were reported in all GC-HBOC real-world settings examined, namely for rs56097627, rs113778879, rs57589542, and rs3988353. A further four loci, namely rs574103382, rs73754909, rs3057314, and rs57920543, were reported with deviating AFs in all settings but one (Table 3).
However, there were also 16 loci that were conspicuous in a single setting exclusively, namely five in IHG GATK data (rs1511243, rs4880038, rs1027113, rs12709163, rs1111207), three each in ICG freebayes data (rs34207738, rs147399132, rs199504893) and in IHG CLC data (rs10975870, rs11049431, rs144767203), two in DMG GATK data (rs10644978, rs66987842), and one each in IMGAG DRAGEN (rs9931038), IMGAG freebayes data (rs12406858), and CFBOC freebayes data (rs140702307). Another three loci (rs10074269, rs55941023, rs35054928) showed AF deviations in only one center, but these were concordant.
Considering the loci non-existent in gnomAD v3.1.2, rs113778879 was not observed with the expected AF in any GC-HBOC center, and rs73754909 only with forced DRAGEN calling in DMG data. For rs79461387, expected AFs were reported consistently when using freebayes, but not by unforced DRAGEN calling and in two settings using forced GATK. Of note, rs572022984, with zero allele count in gnomAD v3.1.2 NFEs and an expected AF of 0.0364 in CanRisk, was consistently either not observed at all or observed with a maximum AF of 0.0037 (Supplementary Table 3).
Five loci showing aberrant AFs in gnomAD v3.1.2 NFEs (Table 2) were not reported with deviating AFs by any of the participating GC-HBOC centers, namely rs78425380, rs62331150, rs60954078, rs10862899, and rs112855987.
Without further information and assuming a standardized PRS at the 50th percentile, the estimated 10-year risks of developing primary BC of cancer-unaffected women of 20, 40, and 60 years of age were 0.1%, 1.5%, and 3.4% according to CanRisk (Supplementary Table 4). Percentiles of PRSs from artificial VCF files with aberrant dosages (see Materials and Methods) ranged from 47.5% (IHG CLC, BRIDGES 306) up to 55.7% (ICG freebayes, BCAC 313). The risk of 0.1% for a 20-year-old woman was unchanged in all scenarios including artificial PRSs. For a 40-year-old woman, estimated 10-year risks were increased by 0.1% in seven scenarios, and for a 60-year-old woman by up to 0.2% in eight scenarios.
Estimated remaining lifetime risks of developing primary BC assuming an average PRS (50th percentile) of cancer-unaffected women aged 20, 40, and 60 years are 11.3%, 10.9%, and 7.1% according to CanRisk (Supplementary Table 4). When using PRSs from artificial VCF files with aberrant dosages, estimated lifetime risks ranged from 11.1% up to 11.9% for a 20-year-old woman, from 10.6% up to 11.4% for a 40-year-old woman, and from 7.0% up to 7.4% for a 60-year-old woman. The lowest estimates were obtained with the BRIDGES 306 BC PRS based on IHG CLC data with 19 artificial dosages imputed, and the highest with the BCAC 313 BC PRS based on ICG freebayes data with 14 artificial dosages imputed.
For 20 PRS loci showing noticeably deviating AFs in at least one real-world NGS data set, alternative alleles or overlapping variants with minimum AF 0.01 in NFEs were reported in gnomAD v3.1.2 (Supplementary Table 5). For rs73754909 and rs79461387, both of which are SNVs and non-existent in gnomAD v3.1.2, deletions were reported with AFs comparable to those expected by CanRisk. For both deletions, the adjacent downstream nucleotide of the reference sequence was identical to the substituted nucleotide of the expected effect allele (Fig. 3). For rs113778879, which is also an SNV not contained in gnomAD v3.1.2, a similar observation could be made (Supplementary Fig. 7), but the reported AF exceeds the expected one by more than 0.1 (0.6818 reported versus 0.5762 expected).
Both alternative alleles are deletions with the adjacent downstream nucleotide identical to the expected substituted one.
For 28 of the 49 loci showing noticeably deviating AFs in at least one real-world data set, proxies in 1000G GRCh37 microarray data, 1000G GRCh38 High Coverage WGS data, or TOPMED European data could be identified (Supplementary Table 6). For rs113778879, rs73754909, and rs79461387, LDpair based on GRCh38 reported the same alternative alleles as gnomAD v3.1.2 (Supplementary Table 5), where the original PRS loci are non-existent.
Proxies and alternative alleles showing AFs in gnomAD v3.1.2 comparable to expected CanRisk AFs, i.e., an absolute deviation <0.016, were considered as possible workarounds for improved PRS genotyping, and further evaluated with respect to observed AFs in IMGAG freebayes data (Table 4). For 19 of these 21 PRS loci, absolute differences between expected and observed AFs in IMGAG freebayes data remained below the previously defined IMGAG freebayes-specific threshold of 0.036. The exceptions were the substitutions of rs12406858 and rs79461387. The latter is noteworthy because the original PRS locus, which is an SNV, was correctly called by freebayes in forced and unforced mode (Table 3), whereas GATK HaplotypeCaller seemed to call an overlapping deletion of sequence GAG in DMG and CFBOC data. Also noteworthy are the potential replacements of rs73754909 and rs111833376, as both variants were called with noticeably deviating AFs in most real-world data sets.
Limitations in next-generation sequencing-based genotyping of breast cancer polygenic risk score loci | European ... - Nature.com