Kishwar Shafin

Research Scientist · Google Research

I work at the intersection of deep learning and genomics, developing methods that push the boundaries of variant calling, genome assembly, and clinical genomics. My research focuses on making genomic analysis more accurate, accessible, and impactful for precision medicine.

Research Highlights

Invited Talks

Selected invited talks and keynotes at leading conferences and institutions.

Keynote: 18th International Conference on Bioinformatics and Computational Biology

Kishwar Shafin

BICOB 2026 · Honolulu, Hawaii · March 2026

Keynote speaker at BICOB-2026, the premier international conference on bioinformatics and computational biology. Presenting recent advances in deep learning for genomic analysis.

Creating the next generation of genome analysis tools with deep learning

Kishwar Shafin

MIT CSAIL Bioinformatics Seminar · Cambridge, MA · October 2025

Presenting DeepVariant, DeepSomatic, DeepPolisher, and pangenome-aware variant calling at the Theory of Computation group's Bioinformatics Seminar series.

Deep learning innovations for accurate genomic analysis

Kishwar Shafin

ICLR 2025 · AI for Nucleic Acids Workshop · Singapore · April 2025

Invited talk at the International Conference on Learning Representations (ICLR), one of the top machine learning conferences, discussing deep learning methods for genomics including DeepVariant, DeepSomatic, and DeepConsensus.

Deep learning applications in genomics variant calling and consensus calling

Kishwar Shafin

Genome Institute of Singapore (A*STAR GIS) · April 2025

Seminar covering DeepVariant, DeepSomatic, DeepConsensus, DeepPolisher, and the opportunities presented by pangenome references to improve variant calling.

Quantitative and Computational Biology Seminar

Kishwar Shafin

USC Dornsife · Los Angeles, CA · February 2025

Departmental seminar at the USC Dornsife Quantitative and Computational Biology program, presenting recent advances in deep learning for genomic analysis.

Selected Publications

Selected publications in genomics, deep learning, and computational biology.

Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic

J. Park*, D.E. Cook*, ... K. Shafin

Nature Biotechnology, 2025

DeepSomatic is a deep-learning method for detecting somatic single-nucleotide variants and insertions and deletions (indels) from both short-read and long-read data. It includes modes for whole-genome and whole-exome sequencing and supports tumor–normal, tumor-only, and formalin-fixed paraffin-embedded (FFPE) samples.

Highly accurate assembly polishing with DeepPolisher

M. Mastoras, M. Asri, L. Brambrink, ... K. Shafin, HPRC

Genome Research, 2025

An encoder-only transformer model for assembly polishing that predicts corrections to draft genome sequences using PacBio HiFi read alignments. Applied to 180 HPRC assemblies, achieving an average QV improvement of 3.4 (54% error reduction).
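
The two figures quoted above are consistent with each other if QV is read on the standard Phred scale (QV = -10 · log10 of the per-base error rate), which is an assumption here rather than something stated in this summary. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def error_reduction_from_qv_gain(qv_gain: float) -> float:
    """Fraction of errors removed for a given gain in Phred-scaled QV.

    Assumes QV = -10 * log10(error_rate), so a gain of `qv_gain`
    divides the error rate by 10 ** (qv_gain / 10).
    """
    return 1.0 - 10.0 ** (-qv_gain / 10.0)


# Average QV improvement quoted for the HPRC assemblies.
reduction = error_reduction_from_qv_gain(3.4)
print(f"{reduction:.0%}")
```

Under that assumption, a gain of 3.4 QV corresponds to removing a little over half of the remaining errors, in line with the 54% figure.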

Pangenome-aware DeepVariant

M. Kolmogorov, ... K. Shafin, et al.

bioRxiv, 2025

A variant caller that integrates a pangenome reference alongside sample-specific read alignments within DeepVariant. Generates pileup images of both reads and pangenome haplotypes, reducing errors by up to 25.5% versus linear-reference-based DeepVariant.

Local read haplotagging enables accurate long-read small variant calling

A. Kolesnikov, D. Cook, M. Nattestad, ... K. Shafin

Nature Communications, 2024

An approximate haplotagging method built directly within DeepVariant that locally haplotags long reads without requiring external phasing tools. Enables state-of-the-art variant calling across PacBio Revio, ONT R10.4 simplex, and duplex data.

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads

K. Shafin*, T. Pesout*, R. Huang*, et al.

Nature Methods, 2021

A haplotype-aware variant calling pipeline combining PEPPER, Margin, and DeepVariant that achieves high accuracy on nanopore long-read data across challenging genomic regions, advancing variant detection for both germline and clinical applications.