Introduction of disease models
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a disabling and debilitating disease characterized by unrelenting fatigue, post-exertional malaise, cognitive impairment, sleep problems, and pain. Most ME/CFS patients are unable to work, study, or participate in social activities. One-fourth are house- or bed-bound and fully dependent on the care of family members. ME/CFS, affecting as much as 1% of the population worldwide, disables at least 1–2.5 million Americans and costs $17–24 billion annually. The quality of life of ME/CFS patients is notably lower than that of any other medical condition to which it has been compared. Despite its high prevalence, impact, and economic costs, ME/CFS has long been neglected in medicine and remains under-studied.
The CDC labels ME/CFS “America’s hidden health crisis”. An estimated 90% of ME/CFS patients remain undiagnosed. There are currently no diagnostic biomarkers or FDA-approved treatments for ME/CFS. The prognosis is poor, and many patients remain ill for decades. The cause(s) of ME/CFS are unknown. ME/CFS research has long been a challenge, but it is now an opportunity. We are developing a large ME/CFS multi-omics program that uses a multidisciplinary approach to investigate the molecular basis of the disease, and we are developing various novel bioinformatics tools along the way. Our goal is to identify disease risks that can inform the diagnosis, treatment, and prevention of ME/CFS. We are also interested in substance use disorder (SUD) and virus-caused cancers. Our research projects can be grouped into three main directions: bioinformatics development, disease risk discovery, and translational genomics. Below are examples for each category.
Bioinformatics development
- Develop ERVcaller2 for individual transposable element (TE) expression quantification: We are finishing a novel toolkit (ERVcaller2) that uses RNA-Seq data to accurately quantify the expression of distinct ERVs and other TEs transcriptome-wide. (Our first version, ERVcaller, was published in Bioinformatics.)
- Develop ERVcaller3 to use single cell RNA-Seq to quantify TE expression: Because it remains largely unknown how TE expression varies among cell types, we will develop ERVcaller3, an innovative, unbiased tool based on single cell RNA-Seq. Its workflow will consist of importing the genome annotation and sequencing reads, performing barcode demultiplexing, quality filtering, mapping clean reads to individual TEs, and generating a matrix of read counts for each TE. This platform will give us unique, high-resolution insight into the cell-specific TE transcriptome.
- Develop ERVcaller4 to use long-read sequencing to genotype TE variants and quantify TE expression: We will develop a new algorithm, ERVcaller4, to genotype and quantify the expression of distinct individual TEs using PacBio and Nanopore long reads. ERVcaller4 will directly resolve the full-length sequence and genomic structure of each TE and identify TEs that are not detectable by short reads.
- Develop VIcaller2 to detect mosaic viral integrations: We will develop a new platform, VIcaller2, that uses single cell sequencing or ultra-deep sequencing to detect virome-wide mosaic viral integrations in any tissue type (VIcaller was published in Genome Research). VIcaller2 can also be applied to long-read sequencing data and extended to the human bacteriome.
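The ERVcaller3 workflow steps listed above (barcode demultiplexing, quality filtering, mapping reads to individual TEs, building a count matrix) can be illustrated with a minimal Python sketch. It assumes reads have already been mapped and are represented as hypothetical `AlignedRead` records (cell barcode, UMI, assigned TE, mapping quality). The record layout, the `te_count_matrix` function, the exact-match barcode whitelist, the MAPQ cutoff of 10, the TE names, and the choice to count distinct UMIs rather than raw reads (a common single cell convention) are all illustrative assumptions, not ERVcaller3's actual design.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """A mapped single cell read (hypothetical layout, not ERVcaller3's)."""
    barcode: str  # cell barcode extracted from the read
    umi: str      # unique molecular identifier
    te_id: str    # TE locus the read mapped to ("" if it hit no TE)
    mapq: int     # mapping quality


def te_count_matrix(
    reads: Iterable[AlignedRead],
    whitelist: set[str],
    min_mapq: int = 10,
) -> dict[str, dict[str, int]]:
    """Build a cell-by-TE matrix as {barcode: {te_id: distinct UMI count}}."""
    umis: dict[tuple[str, str], set[str]] = defaultdict(set)
    for read in reads:
        # Barcode demultiplexing: keep only reads from known cell barcodes.
        if read.barcode not in whitelist:
            continue
        # Quality filtering: drop low-MAPQ reads and reads not assigned to a TE.
        if read.mapq < min_mapq or not read.te_id:
            continue
        # Collapse PCR duplicates: each UMI counts once per cell and TE.
        umis[(read.barcode, read.te_id)].add(read.umi)

    matrix: dict[str, dict[str, int]] = defaultdict(dict)
    for (barcode, te_id), umi_set in umis.items():
        matrix[barcode][te_id] = len(umi_set)
    return dict(matrix)


if __name__ == "__main__":
    reads = [
        AlignedRead("AAAC", "u1", "HERVK_chr1_1", 30),
        AlignedRead("AAAC", "u1", "HERVK_chr1_1", 30),  # PCR duplicate (same UMI)
        AlignedRead("AAAC", "u2", "HERVK_chr1_1", 25),
        AlignedRead("TTTG", "u3", "L1HS_chr5_2", 40),
        AlignedRead("GGGG", "u4", "L1HS_chr5_2", 40),   # barcode not in whitelist
        AlignedRead("TTTG", "u5", "L1HS_chr5_2", 3),    # fails MAPQ filter
    ]
    print(te_count_matrix(reads, whitelist={"AAAC", "TTTG"}))
    # {'AAAC': {'HERVK_chr1_1': 2}, 'TTTG': {'L1HS_chr5_2': 1}}
```

Keying UMIs by (cell, TE) makes duplicate removal a set insertion; a real tool would also need to handle barcode sequencing errors and reads that map to several TE copies, which this sketch ignores.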
Disease risk discovery
- Identify ME/CFS risks using multi-omics: Our goals are to develop innovative bioinformatics approaches and to find ME/CFS causal factors. We are building a large multi-omics data collection for ME/CFS (genome, transcriptome, epigenome, phenome, etc.) and using comprehensive bioinformatics approaches to identify various genetic and epigenetic markers. In addition to the standard omics approaches, we have created a variety of innovative approaches.
For example, elevated inflammatory immune responses are central features of ME/CFS. Endogenous retroviruses (ERVs), which result from ancient viral infections that integrated into and became fixed in the human genome, are typically silenced but can be reactivated by various environmental triggers (e.g., infectious agents and stress). Transcriptional activation of ERVs enhances immune responses and inflammation. Together, these observations suggest possible links between ERVs and ME/CFS. We developed a two-pronged approach for genotyping and expression quantification of genome-wide individual ERVs. Our specific aims are to identify distinct ERVs whose expression is associated with ME/CFS using RNA-Seq (including single cell RNA-Seq); identify distinct ERVs whose genotypes are associated with ME/CFS using whole-genome sequencing (including long-read sequencing); and identify distinct ERVs whose demethylation activates their expression.
- COVID-19, ERVs and ME/CFS: Coronaviruses (SARS and MERS) have been reported to trigger ME/CFS symptoms, implying that more Americans may exhibit ME/CFS-related symptoms following the COVID-19 pandemic (some COVID-19 “long-haulers” have already reported persistent fatigue and other ME/CFS symptoms). If an ERV–ME/CFS link is proven, it may explain why ME/CFS epidemics have historically followed viral outbreaks (e.g., various viruses can activate ERVs through demethylation of ERV sequences). Our research team is collecting longitudinal bio-specimens from COVID-19 survivors and will focus on omics profiling to study the development of ME/CFS. This study may pave the way for dissecting the genetic mechanism by which infectious agents trigger ME/CFS symptoms, providing clues for identifying new ME/CFS patients among COVID-19 “long-haulers”.
- Clonal viral integration analyses to find viral causes of HIV+ tumors: We applied virome-wide viral integration methods to deep sequencing data from HIV+ lung cancers. Our initial study showed that the tumors of 12% of tested HIV+ lung cancer patients were caused by high-risk HPVs, based on HPV clonal integrations identified in early stages of lung tumorigenesis. This rate, if verified, would be important for this common cancer, and our analysis may lead to the adoption of immunotherapies in patients with virus-caused cancers. Our “clonal viral integration” methods can be applied broadly to other disease models.
- Identify ERV variants and ERV expression in alcohol use disorder (AUD): We have collected WGS data from AUD samples and RNA-Seq data from AUD brain prefrontal cortex (PFC), ethanol-treated human embryonic stem cell-derived neurons, and the brains of alcohol-exposed mice. We showed that ethanol induced ERV activation and expression in neurons, that AUD brains exhibited higher ERV expression than controls, and that ERV genotypes were associated with AUD. We will conduct systematic analyses to study the “ERV-immune response-neuroinflammation-disease” link. We will also examine developmental and environmental modulators, such as stress, childhood maltreatment, and nutrients, which may be ERV activators. This analytic platform can be applied to ME/CFS and other neuroinflammatory diseases. The fact that addiction treatment medications (such as naltrexone) are used to “treat” ME/CFS patients in current clinical practice supports the idea that the two diseases may share some biology and pathogenesis.
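Several aims above test whether ERV genotypes are associated with disease (ME/CFS or AUD). The text does not say which statistical test is used; one standard option for a single ERV insertion is Fisher's exact test on a 2×2 table of carriers and non-carriers among cases and controls. The sketch below implements the two-sided test in plain Python; the function name `fisher_exact_two_sided` and the carrier counts in the usage lines are hypothetical. A genome-wide analysis would also need to correct for multiple testing and adjust for covariates such as ancestry, which this sketch omits.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers
    of one ERV insertion (hypothetical layout for illustration).
    """
    n_cases, n_controls = a + b, c + d
    n_carriers = a + c
    total = n_cases + n_controls

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x carriers among cases, margins fixed.
        return comb(n_cases, x) * comb(n_controls, n_carriers - x) / comb(total, n_carriers)

    p_observed = table_prob(a)
    lowest = max(0, n_carriers - n_controls)
    highest = min(n_cases, n_carriers)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = table_prob(x)
        # Count every table no more probable than the observed one
        # (a small relative tolerance guards against floating-point ties).
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the ERV.
    print(fisher_exact_two_sided(30, 70, 15, 85))
```

Fixing the table margins and summing hypergeometric probabilities makes the test exact, which matters when an ERV insertion is rare and expected cell counts are small.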
Translational medicine
- Anti-retroviral drugs and methyl supplements to reverse ERV effects: There are already anti-retroviral clinical trials to suppress ERV expression in multiple sclerosis (MS) and amyotrophic lateral sclerosis (ALS); MS and ME/CFS share symptoms. Our goal is to use an innovative approach to investigate whether ERVs are risk factors for ME/CFS. If they are, existing FDA-approved anti-retroviral or anti-inflammatory drugs could be repurposed to reverse ERV effects. Some ME/CFS patients who self-administer these drugs have reported improved symptoms. In addition, if the hypothesis of demethylation-mediated ERV activation is proven, methyl supplements may be used to reverse ERV effects.
- Identify alcohol dependence (AD) early-stage risks and predictors: AD is a progressive disease that is hard to cure but preventable, and the best period for prevention is adolescence. We will use a hybrid approach of software prediction (evolution theories, machine learning, etc.) and manual curation to analyze lifespan phenome-genome data from a large number of AD patients. The project aims to identify the developmental pathway from early-onset symptoms to AD diagnosis and may thus lead to early intervention in at-risk children. This study may generate clinically actionable data to guide intervention and prevention for AD and related deaths. We will also generate a general workflow that can be applied to a broad range of similar research.