

Introduction of disease models
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a disabling and debilitating disease characterized by unrelenting fatigue, post-exertional malaise, cognitive impairment, sleep problems, and pain. Most ME/CFS patients are unable to work, study, or participate in social activities. One-fourth are house- or bed-bound and fully dependent on the care of family members. ME/CFS, affecting as much as 1% of the population worldwide, disables at least 1–2.5 million Americans and costs $17–24 billion annually. The quality of life of ME/CFS patients is notably lower than that of patients with any other medical condition to which it has been compared. Despite its high prevalence, impact, and economic costs, ME/CFS has long been neglected in medicine and is under-studied. The CDC labels ME/CFS “America’s hidden health crisis”. Ninety percent of ME/CFS patients remain undiagnosed. There are currently no diagnostic biomarkers or FDA-approved treatments for ME/CFS. The prognosis is poor, and many patients remain ill for decades. The cause(s) of ME/CFS are unknown.

ME/CFS research was once a challenge, but it is now an opportunity. We are developing a large ME/CFS multi-omics program and use a multidisciplinary approach to investigate the molecular basis of the disease. We also develop novel bioinformatics tools. Our goal is to use these tools to identify disease risks that inform the diagnosis, treatment, and prevention of ME/CFS. We are also interested in substance use disorder (SUD) and virus-caused cancer. Our research projects can be grouped into three main directions: bioinformatics development, disease risk discovery, and translational genomics. Below are examples for each category.


Bioinformatics development
  • Develop ERVcaller2 for individual transposable element (TE) expression quantification: We are finishing a novel toolkit (ERVcaller2) that uses RNA-Seq data to accurately quantify the expression of distinct ERVs and other TEs transcriptome-wide. (The first version, ERVcaller, was published in Bioinformatics.)
  • Develop ERVcaller3 to use single-cell RNA-Seq to quantify TE expression: Because it remains largely unknown how TE expression varies among cell types, we will develop an innovative, unbiased single-cell RNA-Seq based tool, ERVcaller3. Its workflow will consist of importing the genome annotation and sequencing reads, performing barcode demultiplexing and quality filtering, mapping clean reads to individual TEs, and generating a matrix of read counts for each TE. This platform will allow us to gain unique, high-resolution insight into the cell-specific TE transcriptome.
  • Develop ERVcaller4 to use long-read sequencing to genotype TE variants and quantify TE expression: We will develop a new algorithm, ERVcaller4, to genotype distinct individual TEs and quantify their expression using PacBio and Nanopore long reads. ERVcaller4 will directly resolve the full-length sequence and genomic structure of each TE and identify TEs that are not detectable with short reads.
  • Develop VIcaller2 to detect mosaic viral integrations: We will develop a new platform, VIcaller2, that uses single-cell sequencing or ultra-deep sequencing to detect virome-wide mosaic viral integrations and that can be applied to any tissue type. (VIcaller was published in Genome Research.) VIcaller2 will also be applicable to long-read sequencing data and can be extended to the human bacteriome.
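
The single-cell workflow described above (barcode demultiplexing, quality filtering, assigning reads to individual TEs, and building a count matrix) can be illustrated with a minimal sketch. This is not ERVcaller3 code: the annotation, barcodes, reads, function names, and the mapping-quality cutoff below are all hypothetical toy values, and reads are assumed to be already aligned.

```python
from collections import defaultdict

# Hypothetical toy inputs; every name and value here is illustrative only.
# TE annotation: (chromosome, start, end, TE identifier), 0-based half-open.
TE_ANNOTATION = [
    ("chr1", 100, 200, "ERV_A"),
    ("chr1", 500, 650, "ERV_B"),
]

# Cell barcodes accepted by an assumed whitelist.
BARCODES = {"AAAC", "GGGT"}

# Reads as if already aligned: (cell barcode, chromosome, position, mapping quality).
READS = [
    ("AAAC", "chr1", 120, 60),
    ("AAAC", "chr1", 150, 60),
    ("AAAC", "chr1", 510, 5),   # low mapping quality, filtered out
    ("GGGT", "chr1", 600, 60),
    ("TTTT", "chr1", 130, 60),  # barcode not in the whitelist, dropped
    ("GGGT", "chr1", 300, 60),  # falls outside every annotated TE
]


def find_te(chrom, pos, annotation):
    """Return the ID of the TE interval containing pos, or None."""
    for te_chrom, start, end, te_id in annotation:
        if chrom == te_chrom and start <= pos < end:
            return te_id
    return None


def count_te_reads(reads, annotation, barcodes, min_mapq=30):
    """Build nested counts[te_id][barcode] from filtered reads."""
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, chrom, pos, mapq in reads:
        if barcode not in barcodes:   # barcode demultiplexing
            continue
        if mapq < min_mapq:           # quality filtering (assumed cutoff)
            continue
        te_id = find_te(chrom, pos, annotation)  # assign the read to a TE
        if te_id is not None:
            counts[te_id][barcode] += 1
    return counts


def to_matrix(counts, annotation, barcodes):
    """Convert nested counts to (TE labels, cell labels, rows of counts)."""
    te_ids = [te_id for _, _, _, te_id in annotation]
    cells = sorted(barcodes)
    rows = [[counts[te][cell] for cell in cells] for te in te_ids]
    return te_ids, cells, rows


te_ids, cells, matrix = to_matrix(
    count_te_reads(READS, TE_ANNOTATION, BARCODES), TE_ANNOTATION, BARCODES
)
for te_id, row in zip(te_ids, matrix):
    print(te_id, dict(zip(cells, row)))
```

Each row of the resulting matrix is one TE and each column is one cell. A real pipeline would work from FASTQ/BAM files, handle multi-mapped reads and UMIs, and use an indexed interval lookup rather than the linear scan used here for brevity.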

Disease risk discovery
  • Identify ME/CFS risks using multi-omics: Our goals are to develop innovative bioinformatics approaches and to find causal factors for ME/CFS. We are building a large multi-omics data collection for ME/CFS (genome, transcriptome, epigenome, phenome, etc.) and are using comprehensive bioinformatics approaches to identify genetic and epigenetic markers. In addition to the standard omics approaches, we have created a variety of innovative approaches. For example, elevated inflammatory immune responses are central features of ME/CFS. Endogenous retroviruses (ERVs), which result from the fixation of ancient viral infections and integrations into the human genome and are typically silenced, can be reactivated by various environmental triggers (e.g., infectious agents and stress). Transcriptional activation of ERVs enhances immune responses and inflammation. These observations suggest possible links between ERVs and ME/CFS. We developed a two-pronged approach for genotyping individual ERVs and quantifying their expression genome-wide. Our specific aims are to identify distinct ERVs whose expression is associated with ME/CFS using RNA-Seq (including single-cell RNA-Seq); to identify distinct ERVs whose genotypes are associated with ME/CFS using whole-genome sequencing (including long-read sequencing); and to identify distinct ERVs whose demethylation activates ERV expression.
  • COVID-19, ERVs and ME/CFS: Coronaviruses (SARS and MERS) have been reported to trigger ME/CFS symptoms, implying that more Americans may exhibit ME/CFS-related symptoms following the COVID-19 pandemic (some COVID-19 “long-haulers” have already reported persistent fatigue and other ME/CFS symptoms). If an ERV–ME/CFS link is proven, it may explain why ME/CFS epidemics have historically followed viral outbreaks (e.g., various viruses can activate ERVs through demethylation of ERV sequences). Our research team is collecting longitudinal bio-specimens from COVID-19 survivors, and we will focus on omics profiling to study the development of ME/CFS. This study may pave the way to dissecting the genetic mechanism by which infectious agents trigger ME/CFS symptoms and may provide clues about new ME/CFS cases among COVID-19 “long-haulers”.
  • Clonal viral integration analyses to find viral causes of HIV+ tumors: We applied virome-wide viral integration methods to deep sequencing data from HIV+ lung cancer. Our initial study indicated that lung cancer in 12% of the HIV+ patients tested was caused by high-risk HPVs, based on HPV clonal integrations identified in early stages of lung tumorigenesis. This rate, if verified, will be important for this common cancer. Our analysis may lead to the adoption of immunotherapies for patients with virus-caused cancer. Our “clonal viral integration” methods can be applied broadly to other disease models.
  • Identify ERV variants and ERV expression in alcohol use disorder (AUD): We have collected WGS data from AUD samples and RNA-Seq data from AUD brain prefrontal cortex (PFC), ethanol-treated human embryonic stem cell-derived neurons, and alcohol-use mouse brains. We showed that ethanol induced ERV activation and expression in neurons; that AUD brains exhibited higher ERV expression than controls; and that ERV genotypes were associated with AUD. We will conduct systematic analyses to study the “ERV-immune response-neuroinflammation-disease” link. We will also examine developmental and environmental modulators, such as stress, childhood maltreatment, and nutrients, which may be ERV activators. This analytic platform can be applied to ME/CFS and other neuroinflammatory diseases. The fact that addiction treatment medications (such as naltrexone) are used to “treat” ME/CFS patients in current clinical practice supports the idea that the two diseases may share some biology and pathogenesis.
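
Several of the aims above ask whether the expression of an individual ERV differs between patients and controls. As a hedged illustration only, and not the method used in these projects, the sketch below applies a generic two-sided permutation test to the difference in group means for a single ERV locus. The function name, parameters, and expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical normalized expression values for one ERV locus.
case_expr = [8.1, 7.9, 8.4, 8.0, 8.3, 7.8]
control_expr = [5.2, 5.0, 5.5, 4.9, 5.1, 5.3]

p_separated = permutation_p_value(case_expr, control_expr)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(p_separated, p_identical)
```

In a genome-wide setting the same test would be repeated for every ERV, so the resulting p-values would need multiple-testing correction, and covariates such as age, sex, and batch would usually be modeled explicitly.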

Translational medicine

  • Anti-retroviral drugs and methyl supplements to reverse ERV effects: Anti-retroviral clinical trials to suppress ERV expression are already under way in multiple sclerosis (MS) and amyotrophic lateral sclerosis (MS and ME/CFS share symptoms). Our goal is to use innovative approaches to investigate whether ERVs are risk factors in ME/CFS. If they are, existing FDA-approved anti-retroviral or anti-inflammatory drugs could be repurposed to reverse ERV effects. Some ME/CFS patients who self-use these drugs have reported improved symptoms. In addition, if the hypothesis of demethylation-mediated ERV activation is proven, methyl supplements may be used to reverse ERV effects.
  • Identify alcohol dependence (AD) early-stage risks and predictors: AD is a progressive disease. It is hard to cure but is preventable, and the best period for prevention is adolescence. We will use a hybrid approach of software prediction (evolutionary theory, machine learning, etc.) and manual curation to analyze the lifespan phenome-genome data of a large number of AD patients. The project aims to identify the developmental pathway from early-onset symptoms to AD diagnosis and thus may lead to early intervention in at-risk children. This study may generate clinically actionable data that can guide intervention in and prevention of AD and related deaths. We will also generate a general workflow that can be applied to a broad range of similar research.