RBPs as Regulators
Deepening our understanding of how RBPs control gene expression
eCLIP: Identifying the transcriptome-wide binding sites of RBPs at nucleotide resolution
Despite the growing number of candidate RBPs and the recognition that disrupted RBP-RNA interactions and RNA processing defects underlie many human diseases, a systematic understanding of which RBPs interact with which RNAs to control cellular homeostasis is lacking. Studies by my lab (Brannan et al., Molecular Cell 2016; Jin et al., Molecular Cell 2023) and others (Castello et al., Cell 2012) predicted that the human genome encodes far more RBPs (over 2,500) than previously imagined. To enable sensitive discovery of RBP targets at transcriptome-wide scale and nucleotide resolution, in 2016 we developed enhanced CLIP (eCLIP), featuring 1,000-fold higher library yields and drastically improved signal-to-noise in identifying true RBP binding sites (Van Nostrand et al., Nature Methods 2016; previewed in Haque et al., Molecular Cell 2016). Its robustness, reproducibility and ease of use led to rapid adoption (cited >1,000 times). My lab also previously led an ambitious effort to validate 438 antibodies interrogating 365 human RBPs (Sundararaman et al., Molecular Cell 2016). We have continued these efforts and recently doubled the number of IP-grade antibodies characterized in the field.
Together with the Graveley (UConn), Burge (MIT) and Lécuyer (IRCM Canada) labs, as part of the ENCODE consortium we published the characterization of the largest number of human RBPs to date (Van Nostrand et al., Nature 2020), revealing RBP binding sites and their functions, as well as the binding preferences and subcellular localization of RBPs, in vivo and in vitro. We described the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation, and localization. As an example of the potential of eCLIP to discover new RBP biology, we resolved that the AQR protein associates after intronic lariat formation, clarified a branch point-based scanning model for 3’ splice site recognition, and found novel ribosomal RNA processing factors and RBPs that control retrotransposable elements in the human genome (Van Nostrand et al., Genome Biology 2020).
We have continued to improve eCLIP experimentally, for example by introducing non-radioactive visualization of protein-RNA complexes for their precise isolation (Van Nostrand et al., Genome Biology 2020) and a workflow that enables single-end library sequencing (Blue et al., Nature Protocols 2022). Together with Eclipse BioInnovations, we developed antibody barcode eCLIP (ABC), in which several RBP antibodies are multiplexed in the same sample (Lorenz et al., Nature Methods 2022). With Eclipse and Yeo lab alum Eric Van Nostrand at Baylor, we’ve also developed chimeric eCLIP for profiling microRNA binding sites on RNA transcripts (Manakov et al., bioRxiv preprint). On the computational side, we recently developed SKIPPER, an end-to-end workflow that converts unprocessed CLIP-seq reads into annotated binding sites using an improved statistical framework (Boyle et al., Cell Genomics 2023). SKIPPER shows dramatically improved sensitivity in calling true RBP binding sites compared with existing peak-calling algorithms (including our original peak caller, CLIPPER; Lovci et al., Nature Structural & Molecular Biology 2013) and identifies binding events at annotated repetitive elements.
STAMP: Mapping RBP binding sites and ribosome occupancy at the single-cell level and with isoform resolution
The dynamics of RBP regulation remain poorly understood, in part because transcriptome-wide, scalable, robust and sensitive methods to detect RBP interaction sites on RNAs are lacking. To address this gap, we developed STAMP (Surveying Targets by APOBEC-Mediated Profiling; Brannan et al., Nature Methods 2021). In STAMP, the RBP of interest is expressed in cells as a fusion to the RNA editing enzyme APOBEC1, which converts cytosine to uracil. RNA sites bound by the RBP are then detected by standard RNA-seq using edit-aware analysis tools that we developed (Deffit et al., eLife 2018). Critically, STAMP does not rely on ultraviolet cross-linking or immunoprecipitation and, when coupled with single-cell capture, can identify RBP-specific and cell-type-specific RNA-protein interactions for multiple RBPs and cell types in single, pooled experiments. Pairing STAMP with long-read sequencing yields RBP target sites in an isoform-specific manner. Finally, Ribo-STAMP uses APOBEC-tagged small ribosomal subunits to measure transcriptome-wide ribosome association in single cells. To illustrate, in Einstein et al., Molecular Cell 2021, we used single-cell Ribo-STAMP to characterize the heterogeneous translation landscape of myc-dependent breast cancer cells. STAMP enables the study of RBP-RNA interactomes and translational landscapes with unprecedented cellular resolution.
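The core idea of edit-based detection can be illustrated with a minimal sketch. This is not the lab's analysis pipeline: it simply assumes that per-position read coverage and C-to-T mismatch counts (C-to-U edits appear as C-to-T in sequenced cDNA) have already been tallied, and the `Site` type, the thresholds and the clustering window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One reference cytosine with read counts (hypothetical input format)."""
    chrom: str
    pos: int       # 0-based coordinate of the reference C
    coverage: int  # reads covering this position
    c_to_t: int    # reads showing T where the reference has C


def edit_fraction(site: Site) -> float:
    """Fraction of covering reads that carry the C-to-T mismatch."""
    return site.c_to_t / site.coverage if site.coverage else 0.0


def call_edit_sites(sites, min_coverage=10, min_fraction=0.05):
    """Keep sites with enough coverage and a high enough edit fraction.

    Both thresholds are illustrative assumptions, not published defaults.
    """
    return [
        s for s in sites
        if s.coverage >= min_coverage and edit_fraction(s) >= min_fraction
    ]


def cluster_sites(sites, max_gap=50):
    """Group called sites on the same chromosome that lie within max_gap nt."""
    clusters = []
    for s in sorted(sites, key=lambda x: (x.chrom, x.pos)):
        last = clusters[-1][-1] if clusters else None
        if last and last.chrom == s.chrom and s.pos - last.pos <= max_gap:
            clusters[-1].append(s)
        else:
            clusters.append([s])
    return clusters


example_sites = [
    Site("chr1", 100, 50, 10),  # well covered, 20% edited
    Site("chr1", 120, 40, 4),   # well covered, 10% edited
    Site("chr1", 500, 5, 5),    # too little coverage to trust
    Site("chr2", 10, 100, 1),   # 1% mismatch, likely sequencing noise
]
called = call_edit_sites(example_sites)
clusters = cluster_sites(called)
```

In practice one would also compare against a control expressing the editing enzyme alone, so that background editing is not mistaken for RBP-directed editing; that step is omitted here for brevity.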
We continue to improve this technology to enhance sensitivity and specificity of identifying true RBP binding sites, and to allow for cell-type selectivity in vivo and temporal resolution. In collaboration with the Kohli lab at U. Penn, we have tested different base editors and the utility of multiplexing base editors with different editing specificities (Medina-Muñoz et al., Nature Communications, in press). Together with the Lippi lab at TSRI, we’re building a STAMP toolkit geared towards mouse in vivo and neuroscience applications. On the analysis side, we recently published FLARE, a fast and flexible workflow to identify edited regions from RNA-seq data that is agnostic to the type of RNA editing (Kofman et al., BMC Bioinformatics 2023).
New Mechanistic Views of RNA Processing
We have leveraged eCLIP to continue to ask deeper, mechanistic questions about RBP function. For example, LIN28 is a centrally important post-transcriptional regulator of embryonic development, and changes in its expression levels are thought to maintain gene expression programs that promote tissue growth and morphogenesis. LIN28 is best known for its ability to regulate miRNA biogenesis, but it also has the capacity to broadly bind other mammalian transcripts, as my lab was the first to show (Wilbert et al., Molecular Cell 2012). However, the purpose of its binding to non-miRNA targets remained unclear. In 2019, we showed that LIN28 expression level is a key variable that sets the magnitude of protein translation of these non-miRNA targets (Tan et al., Cell Reports 2019). We systematically varied LIN28B protein levels in human cells and discovered a dose-dependent divergence in transcriptome-wide ribosome occupancy that enabled the formation of two discrete translational subpopulations composed of nearly all expressed genes. This bifurcation in gene expression was mediated by a redistribution of Argonaute association, from let-7 to non-let-7 microRNA families, resulting in a global shift in cellular miRNA activity. In 2021, we further refined our model to show that the binding of targets on non-miRNAs does not by itself confer large-scale direct post-transcriptional regulation. Rather, sequestration indirectly modulates gene expression by hindering LIN28’s ability to control miRNAs. In our alternative model, target sites on non-miRNAs act to sequester LIN28 protein from miRNAs and potentiate let-7-dependent regulation. I find this provocative: it shows that the binding properties of the transcriptome broadly influence the ability of an RBP to mediate changes in RNA metabolism and gene expression (Tan et al., Cell Reports 2021). In short, RBPs affect RNA, but RNA substrates in turn control RBP availability.
Alternative splicing regulation by multiple RBPs. For cells to maintain homeostasis, it is thought that multiple RBPs coordinately control RNA processing of individual transcripts. The specific hypothesis we sought to test was that multiple RBPs coordinately control alternative splicing events in human cells. As a pilot experiment, we asked how members of one of the most abundantly expressed RBP families, the heterogeneous nuclear ribonucleoprotein (hnRNP) proteins, coordinately affect alternative splicing in human cells. We published the first integrated genome-wide analyses of alternative splicing regulated by multiple RBPs and their binding sites in human cells (Huelga et al., Cell Reports 2012). Surprisingly, we showed that the hnRNPs had a high degree of cross- and auto-regulation, and in effect compensated for each other’s absence.
To understand coordinated control, it is important to systematically evaluate the composition of RBP-RNA complexes. RBPs interact with shared RNA substrates to form RNP complexes. Knowing the protein interactors of each RBP will enable us to attribute novel functions to known RBPs. In ongoing research, we performed unbiased quantitative proteomics to comprehensively identify proteins that interact with 10 hnRNPs, RBFOX2 and UPF1. We identified hundreds to thousands of interactors for each RBP, the majority of which are RNA-dependent. These protein-protein interaction data are a rich resource for identifying previously unknown linkages to cellular machineries, signaling pathways or subcellular structures.
As part of the NIH-funded ENCODE efforts, my lab screened more than 700 commercially available antibodies against >500 RBPs for their ability to selectively immunoprecipitate the target protein. We found antibodies that are effective against >250 RBPs, and all of this information is publicly available to the community. With these antibodies, my lab is in the process of generating the largest RBP-RNA interactome dataset to date, consisting of 250 RBPs in two human cell lines. Our improvements in the cross-linking and immunoprecipitation (CLIP) protocol have enabled us to operate at this scale in a highly reproducible fashion, setting the standard for such analyses in the field.
Adenosine-to-inosine (A-to-I) RNA editing is mediated by RBPs named adenosine deaminases acting on RNA (ADARs). With the advent of high-throughput sequencing, RNA-seq datasets can be mined for evidence of RNA editing in tissues and cell lines. However, it is often difficult to distinguish A-to-I RNA edit sites in the transcriptome from sequencing errors or nucleotide variation in the genome. We started a project to build a computational approach that takes RNA-seq data as input and predicts A-to-I sites. At that point, several high-profile papers had been published on genome-wide identification of A-to-I sites, rapidly followed by criticism in the community that the treatment of these data bordered on being incorrect. Taking a step back, we realized that many of these studies lacked the proper genetic control, which is to knock out the ADAR proteins: any A-to-I site still predicted in the ADAR knockout can then be counted as a false positive of the algorithm. In a close collaboration with Prof. Hundley (Indiana University), we found that the non-catalytic ADR-1 in worms could still affect RNA editing in vivo by acting on the same substrates as the catalytically active ADR-2 (Washburn et al., Cell Reports 2014).
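The logic of the knockout control can be sketched in a few lines. This is only an illustration under simple assumptions, not the method used in the published work: candidate sites are represented as hypothetical `(chrom, position, strand)` tuples, and the function name `filter_with_knockout` is made up for this example.

```python
def filter_with_knockout(wild_type_calls, knockout_calls):
    """Use an ADAR-knockout sample as a negative control for A-to-I calls.

    Any site also called in the knockout cannot be ADAR-dependent, so it is
    treated as a false positive and removed. Returns the retained sites and
    the fraction of wild-type calls that were removed.
    """
    wild_type = set(wild_type_calls)
    knockout = set(knockout_calls)
    false_positives = wild_type & knockout
    retained = wild_type - knockout
    removed_fraction = (
        len(false_positives) / len(wild_type) if wild_type else 0.0
    )
    return retained, removed_fraction


# Hypothetical candidate sites: (chromosome, position, strand)
wt_sites = [("chrI", 100, "+"), ("chrI", 200, "+"), ("chrII", 50, "-")]
ko_sites = [("chrI", 200, "+"), ("chrIII", 5, "+")]

retained_sites, removed = filter_with_knockout(wt_sites, ko_sites)
```

The number of sites called in the knockout alone also gives a rough estimate of how many false positives a caller produces on data of comparable depth, which is useful when tuning its thresholds.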
RNA quality control mechanisms are important for the removal of faulty RNAs. To understand how cells discriminate normal RNAs from those destined for degradation, Suzanne Lee, who is co-mentored with Prof. Lykke-Andersen (UCSD), led a study demonstrating that the ATPase cycle of the SF1 helicase Upf1 is necessary for mRNA target discrimination. Gabriel Pratt and Fernando Martinez also contributed significantly to the study: Pratt performed all the bioinformatics, analyzing the large CLIP-seq, RIP-seq and RNA-seq datasets generated by Martinez and Lee (Lee et al., Molecular Cell 2015).
Repetitive elements in the genome can interact with RBPs. In collaboration with Tim Behrens at Genentech, we found that the Ro60 RBP binds an RNA motif derived from endogenous Alu retroelements. Alu transcripts were found to be induced by type I interferon and to stimulate pro-inflammatory cytokine secretion by human peripheral blood cells. Importantly, Ro60 deletion resulted in enhanced expression of Alu RNAs and interferon-regulated genes. Tiffany Hung, a fellow at Genentech who led the study, went on to show that anti-Ro60-positive systemic lupus erythematosus (SLE) immune complexes contained Alu RNAs and that Alu transcripts were enriched in SLE whole blood samples compared with controls. This exciting finding was published in the journal Science in 2015.
Towards a comprehensive understanding of RNA processing networks
Predicting RBPs
As mentioned above, the number of known and predicted RNA binding proteins encoded in the human genome has increased dramatically over the past years. New experimental methods pioneered by many labs have identified proteins with RNA binding activity.