A compilation of resources for lncRNA research.
LncRNAs are long non-coding RNA molecules, usually longer than 200 bases, with features similar to those of mRNA, such as 5’ capping, splicing, and polyadenylation. However, lncRNAs have few or no open reading frames, and thus are not translated. A substantial number of lncRNAs may turn out to be mis-annotated, since they might code for essential proteins through non-canonical open reading frames, as in the case of Aw112010 [2]. In addition, lncRNAs are not well conserved across species. Even those that are conserved may display distinct processing in different species [3]. Thousands to tens of thousands of lncRNAs have been suggested in different species. It has been estimated that there are 91,000 human lncRNAs [4], which can be downloaded from MiTranscriptome. However, definitive functions have been experimentally identified for only a handful of them. For example, lncRNA SLERT was found to regulate phase separation of fibrillar center and dense fibrillar component units in the nucleolus [5]. D Prokopenko et al identified LINC00298 as a candidate Alzheimer's disease locus [6]. M Pradas-Juni et al identified the involvement of LincIRS2 in hepatic glucose metabolism [7]. Labonté B et al identified and characterized a novel lncRNA, MAALIN, that regulates the expression of the monoamine oxidase A (MAOA) gene in the brain and may, consequently, regulate impulsive and aggressive behaviours in mice and humans [8]. LncRNA CCR5AS protects CCR5 mRNA from Raly-mediated degradation by interfering with the interactions between Raly and the CCR5 3' untranslated region [9]. Dali, a 3.5-kb, CNS-expressed, mono-exonic, intergenic lncRNA, was shown to interact with a neighbouring transcription factor Pou3f3 and distally with DNMT1 DNA methyltransferase to affect the DNA methylation of promoters [10].
Long intergenic noncoding RNA HOTAIR might serve as a modular scaffold of histone modification complexes [11] and its transcription is regulated by chromatin topology modulation [12]. ARLNC1 (AR-regulated long non-coding RNA 1) interacts with and stabilizes the androgen receptor transcript and promotes prostate cancer growth [13]. The promoter of lncRNA gene PVT1 possesses tumor-suppressor function [14].
A large amount of information regarding lncRNA identities, properties, and functions, as well as many tools for their analysis, has become available during the last few years [15, 16]. Here we list databases and computational, prediction, and experimental tools related to lncRNAs. Valuable information about these resources can also be found in recent comparative reviews on databases [1, 17-19], computational and experimental methods [20], structure prediction methods [21], and lncRNA nomenclature [22].
A lot of effort has been put into organizing the vast amount of data on lncRNAs. The information curated in these databases includes basic genomic annotation, lncRNA expression profiles, sequence variants and lncRNA-protein, lncRNA-RNA or lncRNA-DNA interactions (Figure 1).

http://www.gencodegenes.org/ last update: ongoing. From ENCODE Consortium.
GENCODE is a large-scale effort that aims to annotate all evidence-based gene features in the entire human genome with high accuracy [23]. It is funded by NIH and Wellcome Trust. GENCODE combines manual curation, computational analysis, and targeted experimental validation of the GENCODE transcript database. The current human version, GENCODE 38, released in May 2021, includes 17,944 lncRNA genes and 48,752 lncRNA transcripts. The current mouse version, GENCODE M27, released in May 2021, includes 13,188 lncRNA genes and 18,838 lncRNA transcripts.
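Counts such as those above can be checked against a downloaded annotation file. The following is a minimal sketch, assuming a GTF file in which lncRNA records carry the attributes `gene_type "lncRNA"` and `transcript_type "lncRNA"`; the function names, the toy records, and the file name in the comment are illustrative assumptions, not part of any GENCODE tool.

```python
import gzip
from collections import Counter

def count_lncrna_features(gtf_lines):
    """Count gene and transcript records annotated as lncRNA in GTF-formatted lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:               # skip malformed lines
            continue
        feature, attributes = fields[2], fields[8]
        if feature == "gene" and 'gene_type "lncRNA"' in attributes:
            counts["gene"] += 1
        elif feature == "transcript" and 'transcript_type "lncRNA"' in attributes:
            counts["transcript"] += 1
    return counts

def count_from_file(path):
    """Same count for a gzip-compressed GTF file, e.g. a hypothetical 'annotation.gtf.gz'."""
    with gzip.open(path, "rt") as handle:
        return count_lncrna_features(handle)

def _row(feature, attrs):
    # Build one toy GTF record (nine tab-separated columns).
    return "\t".join(["chr1", "HAVANA", feature, "1", "1000", ".", "+", ".", attrs])

# Toy records, invented for illustration only.
sample = [
    "##description: toy example",
    _row("gene", 'gene_id "G1"; gene_type "lncRNA";'),
    _row("transcript", 'gene_id "G1"; gene_type "lncRNA"; transcript_id "T1"; transcript_type "lncRNA";'),
    _row("transcript", 'gene_id "G1"; gene_type "lncRNA"; transcript_id "T2"; transcript_type "lncRNA";'),
    _row("exon", 'gene_id "G1"; gene_type "lncRNA"; transcript_id "T1"; transcript_type "lncRNA";'),
    _row("gene", 'gene_id "G2"; gene_type "protein_coding";'),
]
result = count_lncrna_features(sample)
print(result["gene"], result["transcript"])
```

Older annotation releases may label lncRNA biotypes differently, so the attribute strings should be checked against the file actually in use.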
http://www.lncipedia.org/ last updated: Aug 2, 2018. From Ghent University, Belgium.
LNCipedia V5.2, the latest version, contains 127,802 transcripts from 56,946 genes. In addition to basic transcript information and structure, several statistics are calculated for each entry in the database, such as secondary structure information, protein-coding potential, and microRNA binding sites.
Publications about LNCipedia:
- LNCipedia 5: towards a reference set of human long non-coding RNAs [24].
- An update on LNCipedia: a database for annotated human lncRNA sequences [25].
- A database for annotated human lncRNA transcript sequences and structures [26].
http://www.noncode.org/ last update: Sep 2017. From Beijing, China.
The current version (5.0) presents an increased collection of lncRNAs from 17 species.
Publications about NONCODE:
- NONCODE: an integrated knowledge database of non-coding RNAs [27]
- NONCODE v2.0: decoding the non-coding [28]
- NONCODE v3.0: Integrative annotation of long noncoding RNAs [29]
- NONCODEv4: exploring the world of long non-coding RNA genes [30]
- NONCODEv4: Annotation of Noncoding RNAs with Emphasis on Long Noncoding RNAs [31]
http://bioinfo.life.hust.edu.cn/lncRNASNP/ last update: unknown. From Huazhong University of Science & Technology, Wuhan, China.
"Long non-coding RNAs (lncRNAs) are emerging as key factors in the regulation of various cellular processes and diseases. LncRNASNP is a database providing comprehensive resources of single nucleotide polymorphisms (SNPs) in human/mouse lncRNAs. It contains SNPs in lncRNAs, SNP effects on lncRNA structure, mutations in lncRNAs and lncRNA:miRNA binding.
In lncRNASNP2 [32], numbers of human lncRNAs and SNPs on them were updated to 141,353 and 10,205,295. Furthermore, we identified 859,534 Cosmic Noncoding Variations and 315,234 TCGA cancer mutations based on GRCh38 in these lncRNAs."
DIANA-LncBase, part of the DIANA lab tools (http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=site/index), provides a comprehensive annotation of putative microRNA (miRNA)-lncRNA functional interactions. It includes experimentally verified (> 5000 as of Jan 2013) and computationally predicted (> 10 million as of Jan 2013) miRNA recognition elements (MREs) on human and mouse lncRNAs. For each miRNA-lncRNA pair it provides “external links, graphic plots of transcripts' genomic location, representation of the binding sites, lncRNA tissue expression as well as MREs conservation and prediction scores” [33].
An enhanced version, DIANA-LncBase v2.0, became available in 2015 [34]. The database adds “more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions”, miRNA targets on lncRNAs predicted with the DIANA-microT algorithm, cell type-specific miRNA:lncRNA regulation, and lncRNA expression information derived from the analysis of more RNA-Seq reads.
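To illustrate what a predicted miRNA recognition element can look like at the sequence level, the sketch below scans a transcript for canonical seed matches, i.e. sites complementary to nucleotides 2-8 of a miRNA. This is a deliberately simplified illustration and not the DIANA-microT algorithm, which is considerably more elaborate; the function names and both sequences are example inputs chosen for this sketch.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA sequence written with A, C, G, U."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))

def find_seed_sites(mirna, target):
    """Return 0-based start positions in `target` that pair perfectly
    with miRNA nucleotides 2-8 (a 7-nt seed match)."""
    seed = mirna[1:8]                 # nucleotides 2-8, counted from the 5' end
    site = reverse_complement(seed)   # sequence the seed would pair with
    positions = []
    index = target.find(site)
    while index != -1:
        positions.append(index)
        index = target.find(site, index + 1)
    return positions

# Example inputs (illustrative only).
mirna = "UAGCUUAUCAGACUGAUGUUGA"
toy_lncrna = "GGAUAAGCUCCAUAAGCUAA"

sites = find_seed_sites(mirna, toy_lncrna)
print(sites)  # [2, 11]
```

A seed match alone is a weak predictor; resources such as LncBase additionally report conservation, prediction scores, and experimental support for each site.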
ATtRACT (A daTabase of RNA binding proteins and AssoCiated moTifs) can be used to predict RNA-binding proteins for lncRNAs [35]. CJ Guo et al used ATtRACT to predict FAST binding proteins [3].
LncDisease is a sequence-based bioinformatics method to predict lncRNA-disease associations based on the crosstalk between lncRNAs and miRNAs. The most recent update was in January 2019. "Current version of LncRNADisease database integrated near 3000 lncRNA-disease entries and 475 lncRNA interaction entries, including 914 lncRNAs and 329 diseases from ~2000 publications. LncRNADisease also provided the predicted associated diseases of 1564 human lncRNAs".
lncRscan-SVM is a tool for predicting lncRNAs using a Support Vector Machine (SVM). To make its predictions, it integrates features derived from gene structure, transcript sequence, potential codon sequence and conservation [36].
lncRNA-MFDL is a tool to identify lncRNAs by “fusing multiple features of the open reading frame, k-mer, the secondary structure and the most-like coding domain sequence and using deep learning classification algorithms” [37].
LncRNA-ID is a tool to calculate “the coding potential of a transcript using a machine learning model (random forest)”. The analysis takes into account multiple features including: “sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families” [38].
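The prediction tools above rely on sequence-derived features such as open reading frame properties and k-mer composition. As a rough illustration of what such features look like, the sketch below computes the longest forward-strand ORF and normalized k-mer frequencies for a transcript. It is a simplified example under stated assumptions (ATG start, standard stop codons, forward strand only) and does not reproduce the feature sets or models of lncRscan-SVM, lncRNA-MFDL, or LncRNA-ID; all names are hypothetical.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG...stop ORF
    across the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None                   # look for the next ORF
    return best

def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }

# Toy transcript, invented for illustration.
toy = "CCATGAAATGATAGCC"
orf_len = longest_orf_length(toy)          # ATG AAA TGA in the third frame
features = {
    "length": len(toy),
    "longest_orf": orf_len,
    "orf_coverage": orf_len / len(toy),
}
dimer_freqs = kmer_frequencies("ACGT", k=2)
print(features["longest_orf"], len(dimer_freqs))
```

In practice, feature vectors of this kind would be passed to a classifier (a support vector machine, random forest, or deep network in the tools listed here) trained on known coding and non-coding transcripts.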
LPBNI is a new computational method that aims to identify potential lncRNA-protein interactions, by making full use of the known lncRNA-protein interactions [39].
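One common way to exploit known interactions is resource-allocation inference on a bipartite network, in which lncRNAs and proteins form the two node sets. The sketch below shows a generic two-step propagation of this kind; it is an assumption made for illustration, and the exact scoring used by LPBNI should be taken from the original publication [39]. The network, node names, and function name are invented.

```python
from collections import defaultdict

def bipartite_scores(interactions, query):
    """Score proteins for `query` by two-step resource propagation.

    interactions: dict mapping each lncRNA to the set of proteins it is known to bind.
    Returns a dict protein -> score; proteins not yet linked to `query`
    with a non-zero score are candidate interactions.
    """
    # Invert the network: protein -> set of lncRNAs.
    protein_to_lnc = defaultdict(set)
    for lnc, proteins in interactions.items():
        for protein in proteins:
            protein_to_lnc[protein].add(lnc)

    # Step 1: each protein bound by the query holds one unit of resource
    # and shares it equally among all lncRNAs connected to it.
    lnc_resource = defaultdict(float)
    for protein in interactions[query]:
        share = 1.0 / len(protein_to_lnc[protein])
        for lnc in protein_to_lnc[protein]:
            lnc_resource[lnc] += share

    # Step 2: each lncRNA passes its resource back, split equally
    # among the proteins it is connected to.
    scores = defaultdict(float)
    for lnc, resource in lnc_resource.items():
        share = resource / len(interactions[lnc])
        for protein in interactions[lnc]:
            scores[protein] += share
    return dict(scores)

# Toy network, invented for illustration.
known = {
    "lnc1": {"P1", "P2"},
    "lnc2": {"P2", "P3"},
    "lnc3": {"P1"},
}
scores = bipartite_scores(known, "lnc1")
candidates = {p: s for p, s in scores.items() if p not in known["lnc1"]}
print(sorted(candidates.items()))
```

In this toy network, P3 receives a non-zero score for lnc1 because lnc2 shares the partner P2 with lnc1; the total resource is conserved across both steps.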
All the common mRNA research methods can be used to study lncRNA molecules. They include 3SEQ (3’ End Sequencing for Expression Quantification) [40, 41], RNA-seq, qRT-PCR, RNA in situ hybridization (which can be done in an array format) or RNAscope in situ hybridization [8], and Northern blot. A specific category of siRNAs against lncRNAs is available from Dharmacon (Lincode SMARTpool); Butler AA et al injected Lincode SMARTpool siRNAs against Neat1 into mouse hippocampal areas to evaluate the role of lncRNA Neat1 in memory formation [42].
- Materials and Methods [ISSN : 2329-5139] is a unique online journal with regularly updated review articles on laboratory materials and methods. If you are interested in contributing a manuscript or suggesting a topic, please leave us feedback.