
Event

Improved AI-assisted detection of deletions and duplications from SNP array data

Description

Copy number variants (CNVs) are an important source of genetic variation in the human genome and have been implicated in evolution and disease susceptibility. Larger CNVs can be inferred from SNP array data by examining probe intensity (log R ratio, LRR) and allelic ratios (B-allele frequency, BAF). Existing CNV-calling methods such as PennCNV detect true CNVs well but suffer from a high false-positive (FP) call rate and inaccurate estimation of CNV boundaries. Because CNV calls must then be validated through sequencing and/or visual inspection of LRR and BAF patterns, this limits their use for genome-wide analyses in large datasets. We visually inspected 60,000 CNV calls from 22,500 samples genotyped on different SNP arrays and found the majority to be false positives or unclear. Using a subset of this dataset, we trained a convolutional neural network to automate the validation of CNVs through machine vision. The model's out-of-sample accuracy exceeded 90%, approaching that of a human analyst. Orthogonal validation with genome sequencing data showed our visual validation to be highly accurate: only 1.7% of the calls supported by the sequencing dataset had been deemed false by the human analyst.
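To make the LRR/BAF signals concrete, here is a crude rule-of-thumb sketch (not PennCNV's method and not the authors' neural network). It uses the standard definition LRR = log2(R_observed / R_expected). A deletion lowers LRR and removes heterozygous BAF values near 0.5, leaving only 0 and 1. A duplication raises LRR and shifts heterozygous BAF to about 1/3 and 2/3. The function names and all thresholds (`del_lrr`, `dup_lrr`, `neutral_lrr`, `tol`, `min_frac`) are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """LRR: log2 of observed total intensity over the intensity expected at copy number 2."""
    return math.log2(r_observed / r_expected)


def _frac_near(values, centers, tol):
    """Fraction of values lying within tol of any of the given centers."""
    hits = sum(any(abs(v - c) <= tol for c in centers) for v in values)
    return hits / len(values)


def classify_segment(lrr, baf, del_lrr=-0.3, dup_lrr=0.15, neutral_lrr=0.1,
                     tol=0.08, min_frac=0.1):
    """Label a run of consecutive probes using hypothetical thresholds."""
    m = mean(lrr)
    # AB genotypes: BAF near 0.5, expected when two copies are present.
    het_half = _frac_near(baf, [0.5], tol)
    # AAB/ABB genotypes: BAF near 1/3 or 2/3, expected when three copies are present.
    het_third = _frac_near(baf, [1 / 3, 2 / 3], tol)
    if m <= del_lrr and het_half < min_frac:
        return "deletion"
    if m >= dup_lrr and het_third >= min_frac and het_half < min_frac:
        return "duplication"
    if abs(m) < neutral_lrr and het_half >= min_frac:
        return "copy-neutral"
    return "unclear"


if __name__ == "__main__":
    print(classify_segment([-0.5, -0.6, -0.45, -0.55], [0.0, 1.0, 0.02, 0.98]))
    print(classify_segment([0.3, 0.25, 0.35, 0.28], [0.0, 0.33, 0.67, 1.0]))
```

Real array data are much noisier than these toy inputs, and fixed cutoffs like these are one reason simple callers produce many false positives. This is the kind of pattern that the abstract's visual inspection, and the trained model, evaluate instead.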