Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that are involved in post-transcriptional regulation of gene expression. In this high-throughput sequencing era, a tremendous amount of RNA-seq data is accumulating, and making full use of publicly available miRNA data is an important challenge. These data are useful for determining expression values for each miRNA, but quantification pipelines are at an early stage and still evolving; many factors affect expression values significantly.
Results: We used 304 high-quality microRNA sequencing (miRNA-seq) datasets from NCBI-SRA and calculated expression profiles for different tissues and cell-lines. In each miRNA-seq dataset, we found an average of more than 500 miRNAs with higher than 5x coverage, and we explored the top five highly expressed miRNAs in each tissue and cell-line. This user-friendly miRmine database has options to retrieve expression profiles of single or multiple miRNAs for a specific tissue or cell-line, either normal or with disease information.
Results can be displayed in multiple interactive, graphical and downloadable formats.
1 Introduction
MicroRNAs (miRNAs) are small non-coding RNAs that target specific mRNAs through the RNA interference (RNAi) mechanism, and regulate gene expression and mRNA degradation. The average length of a mature miRNA is ∼22 nucleotides; it is generated from a hairpin RNA approximately 70–100 nucleotides long, called the precursor miRNA, or pre-miRNA.
These mature miRNAs control expression from most human genes; it was estimated that transcripts of more than 60% of human genes carry at least one conserved miRNA-binding site, and many non-canonical binding sites were also reported in the human miRNA interactome. Moreover, a single miRNA can target multiple mRNAs; likewise, a single mRNA can be targeted by more than one miRNA. Indirectly, miRNAs can also affect the expression level of multiple mRNAs by targeting a transcription factor. The expression level of a miRNA has a cumulative effect on many mRNAs, which ultimately results in diverse cellular functions, because the final protein output is defined by the miRNA expression level as well as by the availability of mRNA targets. Therefore, a change in the expression level of a particular miRNA sometimes leads to severe pathological conditions.
These miRNA expression profiles are also useful to classify different tumors and assist cancer diagnosis.
The current miRBase release (v21, June 2014) contains more than 2500 mature human miRNAs, and the number of entries is continuously increasing, especially from high-throughput studies. The rapid advancement of small RNA sequencing technologies has facilitated the quantification of all miRNAs in a particular condition with high sensitivity and accuracy. A major challenge is to process publicly available heterogeneous microRNA sequencing (miRNA-seq) data with a robust pipeline and measure the abundance of all miRNAs in different experiments. Some miRNAs are expressed preferentially or exclusively in certain tissues; those miRNAs are generally associated with tissue identity, differentiation and function. Therefore, it is important to explore and quantify expression values of these tissue-specific miRNAs to better understand their biological roles. Many extracellular circulating miRNAs expressed in blood, plasma or serum may be promising noninvasive biomarker candidates for many diseases, including for the diagnosis, prognosis and treatment of cancers.
In the past, some databases such as microRNA.org, miRGator, YM500, HMED, deepBase v2.0, miRBase and DASHR used small RNA-seq data and calculated expression values of miRNAs. Different library preparation protocols can affect the miRNA expression profiles.
It is more appropriate to use miRNA-seq data exclusively because miRNAs are sometimes underrepresented in small RNA-seq data. Several other factors affect the quantitative value of miRNA expression. For example, a simple adapter trimming step can change miRNA expression values drastically because public data contain different adapter sequences from diverse studies; therefore, manual detection and removal of those adapter sequences are necessary. There is a need to process public data with a robust pipeline and develop a comprehensive resource for human miRNA expression profiles.
In the present study, we have developed the miRmine database of miRNA expression profiles from publicly available human miRNA-seq data. This database provides a global view of tissue and cell-line based expression profiles and relative abundance of different human miRNAs.
We processed the miRNA-seq data with a robust pipeline and measured the expression values. All expression profiles were integrated into this user-interactive web-resource.
2 Materials and methods
2.1 Data source
We retrieved raw reads of 349 publicly available experiments of human miRNA-seq from NCBI-SRA by specifying the LibraryStrategy search as miRNA-Seq. All these experiments were performed on an Illumina platform; 360 total runs were available for these 349 experiments (as of 18 August 2014). Our workflow is provided in the figure captioned below.
Figure: A workflow of the data source and processing of miRNA-seq data (Color version of this figure is available at Bioinformatics online.)
2.2 Data pre-processing
The recently published pipeline CAP-miRSeq was applied for processing of miRNA-seq data. It was important to check the quality of public data before using them for any purpose.
Therefore, we used a repetitive 3-step strategy: first, the FastQC (v0.10.1) tool was applied to assess read quality; second, Cutadapt (v1.6) was used to remove adapter sequences; finally, FastQC was employed again to assess read quality after trimming. The public data contain diverse adapter sequences; therefore, we repeated these 3 steps with different adapter sequences until FastQC reported high quality. We used ‘CUTADAPTPARAMS=-b AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT -b GATCTACACGTTCAGAGTTCTACAGTCCGACGATC -b GATCGTCGGACTGTAGAACTCTGAACGTGTAGATC -O 3 -m 17 -f fastq’ to process the data. Ultimately, we retained only 304 high-quality experiments for further use in building the miRmine database.
2.3 Reads alignment
The miRDeep2 tool was used for alignment of high-quality reads from the 304 experiments. Initially, we used the miRDeep2 mapper to map trimmed reads to the reference human genome (GRCh38 release).
Subsequently, the miRDeep2 module was applied to map reads to predefined miRNA annotations based on miRBase v21. We used the following parameters to process the data: MAPPER = -e -h -q -m -r 5 -u -v -o 4; MIRDEEP2 = -P -t Human; QUANTIFIER = -P -W; and BOWTIE = -p 1 -S -q -n 1 -e 80 -l 30 -a -m 5 --best --strata.
2.4 miRNA quantification
All reads mapped to miRNA coordinates were used by miRDeep2 to calculate expression profiles. Although there were 2588 unique mature miRNAs, some mature miRNAs are derived from more than one precursor miRNA, so we calculated separate expression values for all 2822 mature miRNAs. We used the normalized read count, expressed as reads per million (RPM), for all known miRNAs. Some experiments have more than one sample run, so we used the average of the RPM values in our database.

| Features | deepBase v2.0 | HMED | miRBase | DASHR | miRmine |
| --- | --- | --- | --- | --- | --- |
| Functional website | Not working | Yes | Yes | Yes | Yes |
| Sequencing data | Small RNA-seq and RNA-seq | Small RNA-seq | Small RNA-seq | Small RNA-seq | miRNA-seq |
| Type of small RNAs | Multiple small RNAs | miRNAs | miRNAs | Multiple small RNAs | miRNAs |
| Species | Multi-species | Human | Multi-species | Human | Human |
| Single miRNA expression profile | – | Yes | Yes | Yes | Yes |
| Multiple (selected) miRNAs comparison | – | No | No | No | Yes |
| Downloadable expression profile data | – | Yes | No | Yes | Yes |

Different datasets and data processing pipelines make it difficult to directly compare miRmine with existing databases.
Only the approach of HMED is somewhat similar to that of miRmine, because it is specific to human miRNAs. HMED previously reported six highly expressed miRNAs, with hsa-miR-21-5p (average 219057 RPM) as the top expressed miRNA in humans. Only two (hsa-miR-21-5p and hsa-miR-191-5p) of the top six expressed miRNAs are shared between the HMED and miRmine databases.
The difference may be due to our different methodology; while processing these data we found that manual removal of adapter sequences is important for heterogeneous public data. Comparatively, HMED found 70% of the known miRNAs as low or not expressed (RPM …), whereas for highly expressed miRNAs (100 RPM threshold) miRmine found 12% and HMED reported 9% of known miRNAs.

| Features | HMED | miRmine |
| --- | --- | --- |
| Dataset (NCBI-SRA) | 410 small RNA-seq | 304 miRNA-seq |
| % of known miRNAs as highly expressed (100 RPM) | 9% | 12% |
| % of known miRNAs as low or not expressed (RPM …) | 70% | … |
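The RPM normalization and per-experiment averaging described in Section 2.4, together with the 100 RPM cut-off used in the comparison, can be illustrated with a minimal Python sketch. This is not the miRDeep2 implementation: the function names are hypothetical, the read counts are toy numbers, the denominator (total miRNA-mapped reads in a run) and the strict "greater than" comparison at the threshold are assumptions.

```python
def reads_per_million(counts):
    """Scale raw read counts of one run so that they sum to one million.

    Assumption: the denominator is the total number of miRNA-mapped reads
    in the run; the source does not spell out the exact denominator.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def average_rpm(runs):
    """Average RPM values over all runs of one experiment.

    A miRNA missing from a run is treated as 0 RPM in that run (assumption).
    """
    rpm_runs = [reads_per_million(run) for run in runs]
    names = set().union(*rpm_runs)
    return {
        name: sum(rpm.get(name, 0.0) for rpm in rpm_runs) / len(rpm_runs)
        for name in names
    }


def highly_expressed(avg_rpm, threshold=100.0):
    """Return miRNAs whose average RPM exceeds the threshold, sorted by name."""
    return sorted(name for name, value in avg_rpm.items() if value > threshold)


# Toy read counts for two runs of one hypothetical experiment.
run_1 = {"hsa-miR-21-5p": 900, "hsa-miR-191-5p": 100}
run_2 = {"hsa-miR-21-5p": 700, "hsa-miR-191-5p": 300}

experiment_rpm = average_rpm([run_1, run_2])
print(experiment_rpm["hsa-miR-21-5p"])   # average of 900000 and 700000 RPM
print(highly_expressed(experiment_rpm))
```

Averaging after normalization, rather than pooling raw counts, keeps a deeply sequenced run from dominating the experiment-level value.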
Figure: Tissue and cell-line specific patterns of all mature miRNAs based on the average expression values (RPM). The hierarchical clustering of tissues and cell-lines, and of miRNAs, is based on a distance metric derived from Pearson correlation values. In the upper panel, all tissues are highlighted in green whereas cell-lines are highlighted in red (Color version of this figure is available at Bioinformatics online.)
3.4 Web-interface
All expression profiles from the 304 experimental datasets have been implemented in the form of a web-resource called miRmine. Different modules are integrated into miRmine for query search and visualization purposes. We intend to update this resource with each new release of miRBase.
Figure: A schematic representation of different applications of the miRmine database. All these search options are provided in a user-friendly manner (please see section 3.3 for details) (Color version of this figure is available at Bioinformatics online.)
3.4.1 miRNA-based search
Users have the option to search single or multiple (comma separated) human miRNAs using standard accession IDs (e.g. …). An auto-completion option is provided so the user can enter any sub-part of an accession ID (e.g. miR-21 or 21), and it will automatically provide all the corresponding entries from our database.
3.4.2 Tissue or cell-line based search
There is an option to select any of the 15 tissues and 24 cell-lines in the database. The hair follicle tissue has been excluded because so few miRNAs are expressed in it, which may create a balancing problem in graphs.
Multiple tissues can be selected using the control + select option. By default, the database will give expression values from all tissues. The user can combine miRNA-based and tissue or cell-line based search for retrieving expression values.
3.4.3 User interactive and downloadable graphs
All graphs are provided in a user-interactive mode for better visualization and customization, as developed previously for proteomics data. A user can select or hide a particular miRNA in the graph by using an option at the bottom of the column graph.
In the box-plot, there are five different log2 values (maximum, upper quartile, median, lower quartile and minimum) of average RPM for each tissue type from different experiments. We also provide a link at the top for plotting separate box-plots of normal vs diseased samples. These graphs are downloadable in various formats (png, jpeg, pdf, svg); users can also print them using an option at the top right.
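The five box-plot values described above can be computed as in the following minimal Python sketch. It is not the code behind the miRmine web-interface: the function name is hypothetical, the pseudocount of 1 added before taking log2 (to keep zero RPM values defined) is an assumption, and the inclusive quartile method is one of several common conventions.

```python
import math
import statistics


def boxplot_summary(avg_rpm_values, pseudocount=1.0):
    """Five-number summary of log2-transformed average RPM values.

    Assumptions: a pseudocount is added so that 0 RPM does not break log2,
    and quartiles follow the "inclusive" interpolation method. At least two
    values are required by statistics.quantiles.
    """
    logs = sorted(math.log2(value + pseudocount) for value in avg_rpm_values)
    lower_q, median, upper_q = statistics.quantiles(logs, n=4, method="inclusive")
    return {
        "minimum": logs[0],
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "maximum": logs[-1],
    }


# Toy average RPM values of one miRNA across five experiments of one tissue.
summary = boxplot_summary([1, 3, 7, 15, 31])
print(summary)
```

Working on the log2 scale keeps a few very abundant miRNAs from compressing the rest of the plot.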
miRDeep2 tutorial
Extended miRDeep2 tutorial with step by step instructions. It covers mapper.pl for preprocessing and mapping and miRDeep2.pl for de-novo prediction.
Disclaimer
This tutorial comes with no warranty and demands common sense of the reader. I am not responsible for any damage that happens to your computer by using this tutorial. For comments or questions just create an 'issue' here. Apparently you will need a github account for that.
Preface
This is a step by step guide for a full small RNA sequencing data analysis using the miRDeep2 package and its patched files. The first part describes the general workflow for de-novo miRNA predictions based on a small RNA-seq data set, and the second part focuses on expression analysis with the quantification module.
Installation instructions
If you haven't installed them yet you can obtain the main package from … and the patched files from … by clicking on 'Clone or download' and then on 'Download Zip'. Extract the zipped files and then open a command line window. If you have git installed you can also obtain the packages directly from the command line by typing … Then change into the directory with
    cd drmirdeep.github.io-master
and continue with the tutorial.
The data and what else you will need
Usually you will have gotten a small RNA sequencing data file from a collaborator who wants you to analyze it.
Before you can start with any kind of analysis you should either already know the small RNA sequencing adapter that was used for the sequencing of the sample or ask your collaborator to send it to you. If you don't clip the adapter, the majority of the reads carrying an adapter are unlikely to align anywhere. Once you know the adapter sequence you should do a simple check to see how many of your sequences contain the adapter. You can do this by typing
    grep -c TGGAATTC examplesmallrnafile.fastq
where 'TGGAATTC' are the first 8 nucleotides of the adapter that was used for this sample. Replace it with your own sequencing adapter. MicroRNAs have a mean length of 22 nucleotides in animals, so if you have sequenced one of those, the read will likely have the sequencing adapter attached to it. If the resulting number of sequences with an adapter is around 70% of the number of your input sequences, the data set can be considered reasonably good.
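The same check can be expressed as a fraction directly. The following Python sketch is an illustration rather than part of miRDeep2: the function names and the toy reads are made up, and it assumes a plain FASTQ file with exactly four lines per record. Unlike the grep call, it looks only at the sequence line of each record.

```python
import io


def adapter_fraction(fastq_handle, adapter_prefix):
    """Return the fraction of reads whose sequence contains adapter_prefix.

    Assumption: every FASTQ record spans exactly four lines, so the
    sequence is the second line of each record.
    """
    total = 0
    with_adapter = 0
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # sequence line of the record
            total += 1
            if adapter_prefix in line:
                with_adapter += 1
    return with_adapter / total if total else 0.0


def make_fastq(sequences):
    """Build a toy FASTQ string with dummy quality scores (illustration only)."""
    records = []
    for index, seq in enumerate(sequences):
        records.append(f"@read{index}\n{seq}\n+\n{'I' * len(seq)}\n")
    return "".join(records)


# Four made-up reads; the first three end with the adapter start TGGAATTC...
toy_reads = [
    "TAGCTTATCAGACTGATGTTGATGGAATTCTCGG",
    "CAACGGAATCCCAAAAGCAGCTGTGGAATTCTCG",
    "TGAGGTAGTAGGTTGTATAGTTTGGAATTCTCGG",
    "ACGTACGTACGTACGTACGTACGTACGTACGTAC",
]

fraction = adapter_fraction(io.StringIO(make_fastq(toy_reads)), "TGGAATTC")
print(f"{fraction:.0%} of reads contain the adapter prefix")
```

For a real file, an open file handle can be passed in place of the io.StringIO object.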
Note: If predominantly adapters have been sequenced you will also get a high number, which is obviously not good. If only 10% of the sequences contain an adapter, then likely something went wrong during the sequencing library preparation or your sample doesn't contain many small RNAs.
For novel miRNA prediction we need to map the reads against a reference database, which has to be indexed by bowtie 1. For this we take a reference database file, let's call it refdb.fa (this can be a genome file or simply a file with scaffolds), and build a bowtie index by typing
    bowtie-build refdb.fa refdb.fa
The first argument is the actual file to index; the second argument is the prefix for the bowtie index files. You can name it differently, but for ease of use I use the same name as my reference database file. Depending on the input file size, indexing can take several hours (for the human genome, for example). However, the bowtie website already has some prebuilt index files for download.
If you decide to download index files you will also need to download the fasta file from which the index was built. Otherwise the results of the miRDeep2 prediction will not be reliable.
Data preprocessing for novel miRNA prediction
Since the miRDeep2 package was designed as a complete solution for miRNA prediction and quantification, it also contains data preprocessing routines that clip the sequencing adapter. The main function of the mapping module is to map the preprocessed reads file to the reference database. The reference database is typically an annotated genome sequence but can also be simply a scaffold assembly if no genome is available.
The scaffolds themselves, however, should be at least 200 nucleotides long so that a sane miRNA precursor plus some flanking region fits into them. Apart from clipping adapters, the module does sanity checks on your sequencing reads and also collapses read sequences to reduce the file size, which saves computing time.
    mapper.pl examplesmallrnafile.fastq -e -h -i -j -k TGGAATTC -l 18 -m -p refdb.fa -s readscollapsed.fa -t readsvsrefdb.arf -v -o 4
What does this command do? The first argument needs to be your sequencing file. Typically, this will be a fastq file. The format of the fastq file is designated by specifying option '-e'. If your file is in fasta format already you specify option '-c' instead.
If your reads file is not in fasta format you need to specify option '-h', which advises the mapper module to parse your file to fasta format. Option '-i' will convert RNA to DNA and option '-j' will remove sequences that contain characters other than ACGTN.
Now comes the actual adapter clipping, which is only done if an adapter sequence is given by option '-k'. Only the first 6 nucleotides of this sequence will be used to search for an exact match in the sequencing reads. Option '-m' will collapse the reads to remove redundancy and decrease the file size. A sequencing read seen 10 times in your raw file will occur only once in the collapsed file and have an x10 in its identifier.
After that the reads will be mapped to the given reference genome, whose index file was specified by option '-p'. Option '-s' indicates the preprocessed read file name, which is output by the mapper module, and option '-t' is the file name of the read mappings to the reference database ('refdb.fa') in miRDeep2's arf format.
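The read collapsing done by option '-m' can be illustrated with a short Python sketch. This is not the mapper.pl implementation: the function name is hypothetical, and the identifier layout `prefix_index_xCOUNT` is an assumption chosen only to show the "x10" suffix mentioned above.

```python
from collections import Counter


def collapse_reads(sequences, prefix="seq"):
    """Collapse identical read sequences into one record per unique sequence.

    Returns (identifier, sequence) pairs, most abundant first. The
    identifier ends in x<count>; its exact layout is an assumption.
    """
    counts = Counter(sequences)
    records = []
    for index, (sequence, count) in enumerate(counts.most_common()):
        records.append((f"{prefix}_{index}_x{count}", sequence))
    return records


# Toy input: one read seen ten times and one read seen once.
toy_reads = ["TAGCTTATCAGACTGATGTTGA"] * 10 + ["TGAGGTAGTAGGTTGTATAGTT"]

for identifier, sequence in collapse_reads(toy_reads):
    print(f">{identifier}\n{sequence}")
```

Eleven input reads become two fasta records, which is why collapsing reduces both file size and mapping time.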
A mapping file in arf format can easily be obtained from a standard bowtie 1 output file (this is NOT in 'sam' format but a proprietary bowtie text file format) by typing
    convert_bowtie_output.pl readsvsrefdb.bwt readsvsrefdb.arf
However, if you used the mapping module then the mapped output file is already in arf format.
Identification of known and novel miRNAs
For predicting novel miRNAs the miRDeep2 module from the package is called with a collapsed reads file and a reference genome file in fasta format. For better prediction results, reference files of miRNAs and related miRNAs should be given, since miRDeep2 considers predicted miRNAs with seeds conserved in other species more reliable than miRNAs with non-conserved seeds.
60 wrote:
Hey guys,
I would like to analyze a set of miRNA data using miRDeep2. I have three samples and each has two replicates.
I managed to find novel and known miRNAs by using config.txt in the mapper.pl command line like this:
    mapper.pl config.txt -d -c -i -j -l 18 -m -p genomeindex -s reads.fa -t readsvsgenome.arf
So I had one reads.fa and one readsvsgenome.arf, then I used miRDeep2.pl.
Now I would like to find the expression of miRNAs in each sample, to compare the expression of each miRNA across samples.
Should I run miRDeep2.pl for each sample and its replicate separately, or are there other ways? I know the miRDeep2 pipeline but I don't know how I can manage expression analysis of multiple samples with this tool.
Thank you in advance.