
Research & Software

The Yang Lab has extensive experience with genomics, bioinformatics and statistical methods in the identification of mechanisms underlying disease. Learn more about our key findings and access our software programs.

Key Findings

 Genomic Alterations Detection

We have developed multiple algorithms that analyze high-throughput sequencing data to detect DNA alterations, including insertions and deletions (indels) and large-scale structural variations (SVs). We developed the ScanIndel algorithm (see Programs & Methods below) to address the challenge of reliably detecting medium- and large-sized indels from whole-exome or whole-genome sequencing data. Our contributions to SV detection are highlighted by two algorithms developed by our group: SVfinder and ScanITD (also see below). Applying these algorithms to cancer genomic data, we detected diverse genomic rearrangement events involving the androgen receptor (AR) in prostate cancer, as well as a t(11;17) translocation event causing the SPI-ZNF287 gene fusion in multiple myeloma.


  • Wang TY, Yang R. ScanITD: Detecting internal tandem duplication with robust variant allele frequency estimation. GigaScience, Volume 9, Issue 8, August 2020, giaa089.
  • Yang R, Nelson AC, Henzler C, Thyagarajan B, Silverstein KA. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly. Genome Medicine. 2015 Dec;7(1):1-2.

 Transcriptomic Mis-Splicing Events Discovery

We were the first to discover a type of non-canonical splicing, named “exitron,” that results in internally deleted protein sequences from annotated coding exons in prostate cancer. Through integrated pan-cancer analysis, we observed that exitron splicing contributes to oncogenic phenotypes and represents a source of a new set of neoantigens that are potentially targetable with immunotherapy. Moreover, we identified various types of RNA mis-splicing and alterations in the androgen receptor (AR), such as cryptic exons, alternative polyadenylation and non-linear splicing, that contributed to constitutive activity of the broad AR transcriptional program. These activities resulted from AR splice variants that are completely insensitive to all current prostate cancer targeted therapies, including the second-generation AR antagonist enzalutamide.


  • Yang R, Van Etten JL, Dehm SM. Indel detection from DNA and RNA sequencing data with transIndel. BMC Genomics volume 19, Article number: 270 (2018).
  • Wang TY, Liu Q, Ren Y, Alam SK, Wang L, Zhu Z, Hoeppner LH, Dehm SM, Cao Q, Yang R. A pan-cancer transcriptome analysis of exitron splicing identifies novel cancer driver genes and neoepitopes. Molecular Cell. Volume 81, Issue 10, 20 May 2021.
  • Van Etten JL, Nyquist M, Li Y, Yang R, Ho Y, Johnson R, Ondigi O, Voytas DF, Henzler C, Dehm SM. Targeting a Single Alternative Polyadenylation Site Coordinately Blocks Expression of Androgen Receptor mRNA Splice Variants in Prostate Cancer. Cancer Research. 2017 Oct 1;77(19):5228-5235.
  • Li YM, Yang R, et al. Diverse AR Gene Rearrangements Mediate Resistance to Androgen Receptor Inhibitors in Metastatic Prostate Cancer. Clinical Cancer Research. 2020 Apr 15;26(8):1965-1976.

 Multi-Omics Integrative Analysis

A major goal of our research is to develop integrative computational tools that leverage multi-omics biological and clinical datasets to infer disease-associated genes, regulatory interactions and patient-specific therapeutic targets. Our contributions in this area include:

  1. Developing the EgoNet algorithm (see Programs & Methods below), which integrates gene expression data and protein-protein interaction networks to identify gene markers associated with clinical phenotypes.
  2. Constructing transcriptional regulatory networks for the major transcription factors AR, BRD4 and STAT5, which drive prostate cancer and leukemia, from ChIP-seq and gene expression data.
  3. Integrating transcriptomic and proteomic data to predict patient-specific tumor neoantigens for immunotherapy.


  • Wang TY, Wang L, Alam SK, Hoeppner LH, Yang R. ScanNeo: identifying indel derived neoantigens using RNA-Seq data. Bioinformatics. 2019 Mar 18.
  • Yang R, Bai Y, Qin Z, Yu T. EgoNet: identification of human disease ego-network modules. BMC Genomics. 2014 Apr 28;15:314.

Programs & Methods

Select an option below to find detailed technical guides and lab documents.


LncGSEA is a convenient tool for predicting lncRNA-associated pathways through Gene Set Enrichment Analysis (GSEA) of gene expression profiles from large-scale cancer patient samples.


A computational workflow for exitron splicing identification in long-read RNA-seq data.


An Efficient and Ergonomic Python Binding Library for BLAT.


ScanNeo2 is a Snakemake workflow for predicting neoantigens from multiple sources. In its current state, these include canonical splicing, exitron splicing, gene fusions, indels and SNVs.


ScanExitron is a computational workflow for exitron splicing identification from RNA-seq.


ScanITD performs a stepwise seed-and-realignment procedure for internal tandem duplication (ITD) detection with accurate variant allele fraction prediction.


ScanNeo is a pipeline for identifying neoantigens introduced by insertions and deletions (indels) from RNA sequencing data.


transIndel detects indels (insertions and deletions) from DNA-seq or RNA-seq data by parsing chimeric alignments from BWA-MEM.
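As a rough illustration of what parsing a chimeric alignment can involve, the sketch below reads the standard SAM `SA:Z` tag (which lists the other segments of a split read) and reports the reference gap between a primary segment and a downstream segment on the same chromosome and strand. It is a minimal sketch under stated assumptions, not transIndel's actual implementation: the function names `reference_span`, `parse_sa_tag` and `candidate_deletion` are hypothetical, and only the simplest deletion-like layout is handled.

```python
import re

# CIGAR operations as defined in the SAM format specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def parse_sa_tag(sa_value):
    """Parse an SA:Z value: 'rname,pos,strand,CIGAR,mapQ,NM;' repeated."""
    segments = []
    for entry in sa_value.strip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        segments.append({
            "rname": rname, "pos": int(pos), "strand": strand,
            "cigar": cigar, "mapq": int(mapq), "nm": int(nm),
        })
    return segments


def candidate_deletion(chrom, pos, strand, cigar, sa_value):
    """Hypothetical helper: return (chrom, start, end) of the reference gap
    between the primary segment and a same-strand segment further downstream,
    or None if no such segment exists. Coordinates are 1-based, inclusive."""
    primary_end = pos + reference_span(cigar) - 1
    for seg in parse_sa_tag(sa_value):
        same_place = seg["rname"] == chrom and seg["strand"] == strand
        if same_place and seg["pos"] > primary_end + 1:
            return (chrom, primary_end + 1, seg["pos"] - 1)
    return None


# A 100 bp read whose first half maps at chr1:100 and second half at chr1:200.
print(candidate_deletion("chr1", 100, "+", "50M50S", "chr1,200,+,50S50M,60,0;"))
```

A real caller would also need to handle insertions, segments that overlap on the read, and mapping-quality filters, none of which are attempted here.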


ScanIndel is a Python program that detects indels (insertions and deletions) from next-generation sequencing data by realigning and de novo assembling soft-clipped reads.
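To make "soft-clipped reads" concrete, the sketch below shows one way to pull the clipped portions of a read out of its CIGAR string so they could be passed on to realignment or assembly. It is illustrative only and does not reproduce ScanIndel's code: the function names and the `min_len=20` threshold are assumptions chosen for the example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string.
    Hard clips (H) are skipped because those bases are absent from SEQ."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def clipped_sequences(seq, cigar, min_len=20):
    """Return the soft-clipped pieces of a read that are at least min_len
    bases long (min_len is an illustrative threshold, not a ScanIndel default)."""
    left, right = soft_clips(cigar)
    pieces = []
    if left >= min_len:
        pieces.append(seq[:left])
    if right >= min_len:
        pieces.append(seq[len(seq) - right:])
    return pieces


read = "A" * 10 + "C" * 80 + "G" * 25
print(soft_clips("10S80M25S"))
print(clipped_sequences(read, "10S80M25S"))
```

Only the longer right-hand clip passes the length filter in this example; short clips are often adapter remnants or low-quality tails rather than evidence of an indel.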


EgoNet, implemented in Python, is designed to detect disease-related subnetworks from a large biological network (PPI, metabolic network) combined with gene expression data.


ARSER is a Python package for identifying periodic expression profiles in circadian microarray data. It has been released under the GPL.
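For readers unfamiliar with what a "periodic expression profile" looks like numerically, the sketch below fits a single cosine at a candidate period to a time series and reports its mean level, amplitude and peak time. This is a generic textbook fit, not a description of ARSER's algorithm or API: the function `fit_cosine` is hypothetical, and the closed-form projection it uses assumes evenly spaced samples that span a whole number of periods.

```python
import math


def fit_cosine(times, values, period=24.0):
    """Least-squares fit of mean + A*cos(2*pi*(t - peak)/period).
    Assumes evenly spaced samples covering a whole number of periods,
    so the cosine and sine terms are orthogonal and can be projected
    out independently."""
    n = len(values)
    mean = sum(values) / n
    w = 2 * math.pi / period
    a = 2 / n * sum((v - mean) * math.cos(w * t) for t, v in zip(times, values))
    b = 2 / n * sum((v - mean) * math.sin(w * t) for t, v in zip(times, values))
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / w) % period
    return mean, amplitude, peak_time


# Synthetic profile: sampled every 4 h for 48 h, peaking 6 h into each cycle.
times = list(range(0, 48, 4))
values = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
mean, amplitude, peak_time = fit_cosine(times, values)
print(round(mean, 3), round(amplitude, 3), round(peak_time, 3))
```

With real microarray data the period is usually unknown and the series is noisy, so a fit like this would be one step in a larger procedure that also estimates the period and assesses significance.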
