Talk:PLOS/Transcriptomics technologies


Reviewer 1: Alvis Brazma

Overall this is an important article and, except for a few minor things (see below), I did not spot anything that would be incorrect or biased. One issue I have is that the document really focuses mostly on transcriptomics technologies rather than on transcriptomics itself, defined by the authors as “the study of an organism’s transcriptome, the sum of all of its RNA transcripts” – a definition with which I would agree. There is a link to another article, “transcriptome”, but this article is rather short and does not say much about the transcriptome either. Where would topics such as the structure of the transcriptome – coding/noncoding, long/short, alternative splicing, the distribution of transcript abundance in a cell, transcriptomics and the Central Dogma, etc. – go? Perhaps this can be solved by renaming this article “Transcriptomics technologies”; otherwise one would expect to find out in this article what these technologies have taught us about human and other transcriptomes. I presume that an article titled “Genomics” will not be entirely devoted to sequencing technologies?

Assuming that the article is about transcriptomics technologies, I have relatively minor comments.

The section “Before Transcriptomics” could mention the early nylon membrane arrays, which arguably are predecessors of spotted solid-surface arrays, the first transcriptomics technology in the sense of having all genes on the same array. In the history section, ESTs are correctly presented as a “before transcriptomics” technology, but then ESTs appear again in “Data gathering”, and what is said there is repetitive.

The Microarrays section (under Data gathering) again repeats some of what was said in the “history” section. The last sentence there, “Microarray technology allowed the assay of 1000s of transcripts simultaneously, at a greatly reduced cost per gene and labour saving”, reads like a historic perspective. From a modern perspective one would use the present tense and would say “tens of 1000s of transcripts”. Also, whole-genome tiling arrays are not mentioned. Under methods, note that Affymetrix arrays are produced by photolithography technology.

In the section “RNA-Seq”, I’m not sure the definition is correct. I think RNA-seq refers to assaying gene expression through sequencing of cDNA obtained from RNA transcripts. How exactly we derive the gene expression from this sequencing, e.g., whether we align these sequences to the genome or first assemble them into longer transcript fragments, is secondary.

In “Data analysis” it would be appropriate to mention the size of typical microarray and raw RNA-seq datasets. The section “Sequence alignment” is not accurate, or perhaps is biased – many reviews about RNA-seq analysis have been written and should be used here. The steps that are typically mentioned are QC, alignment (to the genome or to the transcriptome), quantification (at the gene, transcript or exon level) and differential expression. New quantification methods (e.g., kallisto) that bypass alignment are gaining in popularity. I don’t think this is clearly described in the article, while at the same time niche things, such as the “Burrows-Wheeler transform”, are mentioned. I think this part should be shortened considerably (and renamed from Sequence alignment to RNA-seq data analysis, otherwise why is Differential expression there?) to just list the analysis steps with a reference to a couple of good review papers. I don’t think the section “Annotation” belongs in this article.
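
As a minimal sketch of the quantification step mentioned above (counting uniquely aligned reads per gene to obtain the values that are later tested for differential expression), consider this toy Python example; the gene names and input format are illustrative, not taken from any specific tool:

  from collections import Counter

  def gene_counts(aligned_gene_ids):
      """Tally reads per gene for one sample.

      `aligned_gene_ids` holds one gene ID per uniquely aligned read, i.e. the
      output of the QC and alignment steps; the resulting counts are the
      per-gene values passed on to differential expression testing.
      """
      return Counter(aligned_gene_ids)

  # Toy usage: three reads align to geneA, one to geneB.
  print(gene_counts(["geneA", "geneA", "geneB", "geneA"]))  # Counter({'geneA': 3, 'geneB': 1})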

In the table of gene expression databases, I’m not sure that the Sequence Read Archive counts as one – it is used as an archive for short reads from RNA-seq experiments, but it does not itself say anything about gene expression. ArrayExpress holds both microarray- and RNA-seq-derived expression data.

Overall I think the article needs some shortening (as indicated above) and a bit of cleaning up (in particular the RNA-seq data analysis section), after which it will be very valuable.

Reviewer 2: Ines Hellmann

Transcriptomics, or the transcriptomics technologies, have become an essential method for many molecular biology laboratories. Therefore it is important to provide an overview of what this all encompasses. This is a big task, especially because it is aiming at a fast-moving target. The field is developing so fast that the review is already partially out of date; the question then is whether the authors should report something that is changing as quickly as, for example, the price tag on RNA-Seq. I am pretty sure that we can get one sample characterized for under 200 Euros.

This drop in the price is mainly due to RNA-seq library preparation techniques that allow for early barcoding and thus the early pooling of tagged libraries, which then also increases the throughput – which, in the case of single-cell technologies such as Drop-Seq or 10x Genomics, is definitely higher than for microarrays, thus directly contradicting the comparison in Table 1. This brings me to another important difference between microarrays and RNA-seq: the required input amounts. Single-cell transcriptome analysis would simply not work with microarrays. I believe that this important development needs to be mentioned in a review about transcriptomics, since it allows us to move away from measuring cell and thus also cell-type averages.

The description under Data gathering is a bit redundant.

For the RNA-Isolation chapter it would be good to explain why mRNA enrichment is necessary; it is important to know that ~98% of the cellular RNA is rRNA.

The Microarray section contains some technical details, such as the fluorophores, but does not mention strategies for background and cross-hybridization correction, such as mismatch or GC-gradient probes, which are important for analysis. Such details should either be left to a more specialized section or be made more comprehensive. This randomness in the kind and level of detail is an issue throughout the wiki page.

For example, in the RNA-seq section only read lengths are discussed with respect to sequencing design; the choice between paired-end and single-end designs is not mentioned (paired-end would be preferred for annotation and isoform discovery), nor is the option to introduce unique molecular identifiers (UMIs), which avoid amplification bias. In the same chapter, the authors claim that the fragmentation method is dictated by the sequencing platform: to my knowledge, for Illumina one can choose TruSeq heat fragmentation, which is actually done at the RNA level, not at the cDNA stage, and tagmentation of cDNA. Those two methods are currently the most popular, and are not really mentioned. Also, I don’t see why these methods shouldn’t work with other sequencing platforms?
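
To make the UMI point concrete, here is a minimal Python sketch of UMI-based counting (the input format is hypothetical; real tools typically also collapse UMIs that differ only by sequencing errors):

  from collections import defaultdict

  def count_molecules(reads):
      """Collapse reads into molecule counts per gene using UMIs.

      `reads` is an iterable of (gene_id, umi) pairs; PCR duplicates of the
      same original molecule share both values, so counting distinct UMIs per
      gene estimates molecule numbers rather than amplified read numbers.
      """
      umis_per_gene = defaultdict(set)
      for gene_id, umi in reads:
          umis_per_gene[gene_id].add(umi)
      return {gene: len(umis) for gene, umis in umis_per_gene.items()}

  # Toy usage: geneA was amplified heavily, but only two distinct molecules exist.
  reads = [("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "TTAG"), ("geneB", "CCAA")]
  print(count_molecules(reads))  # {'geneA': 2, 'geneB': 1}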

The analysis part is also superficial and states outdated information. Current personal computers have enough RAM to easily analyse a normal RNA-seq experiment; I can run this on my laptop. Compute clusters and parallelization become an advantage if we deal with very large sample sizes. The image processing chapter can go: this is pure technology and has no special connection to transcriptomics.

When the authors discuss the mapping to a reference genome they mention several challenges, and the way this part is written implies that those challenges are being addressed, which is by and large true; however, multimapping remains an issue that is not solved yet. There are some attempts to resolve the true location after a first-pass mapping, but none of this is explicitly implemented in the commonly used mapping tools.

Concerning the splice-aware mapping: it is important to mention that mapping works much more reliably if the mapper is given an annotation file, and that for shallow sequencing errors will dominate. Note that gene annotation can make a big difference for expression quantification.

I don’t quite get the phasing of polyploid genomes. What exactly does this have to do with transcriptomics? The last sentence, “... a read aligner can be instructed to discard a read that aligns equally well to two regions, assign it to all regions, or randomly assign it to one of the regions”, would be standard for dealing with multimapped reads, which will occur in any genome, not only polyploid ones.
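
The three policies named in that quoted sentence can be sketched in Python as follows (the data structures are hypothetical; this illustrates the policies themselves, not the implementation of any particular aligner):

  import random

  def assign_multimapped(hit_regions, policy="discard"):
      """Return the region(s) a read is credited to under a given policy.

      `hit_regions` lists the regions the read aligns to equally well. The
      three common policies are: discard the read, count it towards every
      region, or assign it at random to one of them.
      """
      if len(hit_regions) == 1:
          return list(hit_regions)  # uniquely mapped: nothing to decide
      if policy == "discard":
          return []
      if policy == "all":
          return list(hit_regions)
      if policy == "random":
          return [random.choice(hit_regions)]
      raise ValueError(f"unknown policy: {policy}")

  print(assign_multimapped(["chr1:100-150", "chr7:900-950"], policy="random"))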

I think the chapter about sequence coverage is also a bit misleading. I believe transcriptome complexity, i.e. how many genes are expressed in a tissue or under a certain environmental condition, and the frequency distribution of the transcripts, can have as big an impact on the necessary sequencing depth as the number of genes in the genome. For example, in mouse ES cells around 10,000 genes are found, while T cells yield only ~6,000 genes (please check those numbers, this is off the top of my head, but there are publications to this end). Next, the authors correctly note that the real power lies in biological replicates; thus it is not the coverage per sample that is the limiting quantity.

The chapter about de novo assembly conveys only a little concrete information; I know a little about this topic, but could not learn a lot from Table 3 or the associated chapter. Yes, it makes sense that N50 is not a good measure, but it is not enough to just say that without spelling out the improvement. Also, a little more about the actual strengths and weaknesses of those different software packages/approaches would be helpful; otherwise that chapter could also be left out, since the possibility of de novo assembly has been mentioned earlier.
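
For context, N50 is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length; a minimal Python sketch with toy contig lengths:

  def n50(contig_lengths):
      """Return the N50 of an assembly given its contig lengths."""
      lengths = sorted(contig_lengths, reverse=True)
      half_total = sum(lengths) / 2
      running = 0
      for length in lengths:
          running += length
          if running >= half_total:
              return length

  # Toy usage: total length is 40, half is 20, reached within the 10 bp contig.
  print(n50([15, 10, 8, 4, 3]))  # 10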

For the differential expression -- why do the authors need to mention that these are command-line tools? Besides the CLC workbench, this is also true for the assemblers and the mappers, so no news here. Table 4 lists the most popular DE software tools, but it is very uninformative and partially incorrect. They are all designed to detect differential expression, so why is this the speciality of Cuffdiff2? Also, the beta negative binomial in Cuffdiff is used to model the isoforms within the same sample; the DE part comes from an empirical distribution that, in the case of low sample sizes, also uses information from genes with a similar mean. This information borrowing is in spirit what the Bayesian methods in edgeR, limma and DESeq2 do. edgeR and DESeq are different implementations of a generalized linear model with a negative binomial error distribution. limma is a linear model, but it requires precision weights (voom) to accommodate count data. The flexibility for experimental designs, including isoform comparisons, comes from the specification of model and contrast matrices, which allows similar flexibility for the GLMs (edgeR & DESeq) and LMs (limma), while in Cuffdiff only pair-wise comparisons are possible.
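
To illustrate the modelling distinction, a per-gene negative binomial GLM with a log link can be sketched in Python with statsmodels; the counts are invented and the dispersion (alpha) is fixed by hand, whereas edgeR/DESeq2 estimate it by sharing information across genes:

  import numpy as np
  import statsmodels.api as sm

  # Toy counts for one gene: three control samples, then three treated samples.
  counts = np.array([12, 15, 10, 40, 35, 50])
  condition = np.array([0, 0, 0, 1, 1, 1])
  design = sm.add_constant(condition)  # intercept column plus condition column

  # Negative binomial GLM with the default log link; alpha is a hand-picked
  # dispersion here, not estimated across genes as the dedicated tools do.
  fit = sm.GLM(counts, design, family=sm.families.NegativeBinomial(alpha=0.1)).fit()
  print(fit.params[1])   # log fold change attributable to the condition
  print(fit.pvalues[1])  # Wald p-value for differential expression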

For the validation part, I believe that most RNA-seq methods are so standardized that if a qPCR fails to validate an RNA-seq result, my first guess is that there might be something wrong with the qPCR. Also, in recent papers I rarely saw qPCR validation. The more important part would be a functional follow-up of the identified candidate genes, such as a knock-down/rescue experiment.

I agree with the other reviewer that Annotation should be its own topic.

Examples of applications are a matter of taste, hence I will not comment too much on this. Only two remarks: 1) I am strongly opposed to motivating non-coding RNAs with the observation that 75% of the human genome showed some signs of transcription. I am sure this number would go up if we re-analysed the short read archive now, and with time it would go up to 99.9%. If we sample deep enough, anything will be transcribed, but this is of little consequence and merely evidence that transcription is also in part a stochastic process. 2) Gene expression databases do not belong in the Applications section, which otherwise uses examples to illustrate the use of transcriptomics.

In summary, I think it is sad that a wiki page about such an important topic remains so shallow and oblivious to the new and exciting developments in the field of transcriptomics, such as for example single cell transcriptomics or in situ techniques.

Minor remarks:

The enzymes DNase and RNase are spelled with a small a.

Check for consistent spelling: RNA-seq vs. RNA-Seq.

Temporarily disabling references

Following Rohan's report that edits are unbearably slow, I've temporarily turned off pmid formatting for the page. To restore formatting, either revert my edit from the history or switch all the citations to use Template:cite pmid instead of Template:cite pmid disabled. Hopefully enabling more aggressive caching of references will solve the problem. --Spencer Bliven 02:14, 19 January 2017 (PST)

Author response to reviews

We thank the reviewers for their positive feedback on the review and for their specific concerns on how to improve it. We have considerably revised the text to address these concerns. Most sections were modified, generally evening out the level of detail and incorporating some missing aspects. As a result, we think the review is greatly improved.

We aimed to produce a broad introductory review of the field of transcriptomics, and this meant that, rather than focussing solely on contemporary techniques of the last 5 years, we tried to incorporate as full a representation of the field as possible. We have been directed by both reviewers to generally reduce overly detailed examples in the piece; due to space considerations we have tried to be as concise as possible.

We have broken down the reviewers' text into numbered actionable comments and then listed what changes we made in response. The reviewers' comments are in italics and our response follows the phrase "Action completed:", indented with a bullet point, e.g.

reviewer comment

  • Action completed: author response

Response to Reviewer 1: Alvis Brazma

Overall this is an important article and, except for a few minor things (see below), I did not spot anything that would be incorrect or biased.

1. One issue I have is that the document really focuses mostly on transcriptomics technologies rather than on transcriptomics itself, defined by the authors as “the study of an organism’s transcriptome, the sum of all of its RNA transcripts” – a definition with which I would agree. There is a link to another article, “transcriptome”, but this article is rather short and does not say much about the transcriptome either. Where would topics such as the structure of the transcriptome – coding/noncoding, long/short, alternative splicing, the distribution of transcript abundance in a cell, transcriptomics and the Central Dogma, etc. – go? Perhaps this can be solved by renaming this article “Transcriptomics technologies”; otherwise one would expect to find out in this article what these technologies have taught us about human and other transcriptomes. I presume that an article titled “Genomics” will not be entirely devoted to sequencing technologies?

  • Action completed: We have changed the title to Transcriptomics technologies. We agree this new title is a more accurate description of the review.

Assuming that the article is about transcriptomics technologies, I have relatively minor comments.

2. The section “Before Transcriptomics” could mention the early nylon membrane arrays, which arguably are predecessors of spotted solid-surface arrays, the first transcriptomics technology in the sense of having all genes on the same array. In the history section, ESTs are correctly presented as a “before transcriptomics” technology, but then ESTs appear again in “Data gathering”, and what is said there is repetitive.

  • Action completed: We have included a link to reverse northern blotting, which is the wiki page for nylon membrane arrays. I have moved the EST paragraph before the transcriptomics section.

3. The Microarrays section (under Data gathering) again repeats some of what was said in the “history” section. The last sentence there, “Microarray technology allowed the assay of 1000s of transcripts simultaneously, at a greatly reduced cost per gene and labour saving”, reads like a historic perspective. From a modern perspective one would use the present tense and would say “tens of 1000s of transcripts”. Also, whole-genome tiling arrays are not mentioned. Under methods, note that Affymetrix arrays are produced by photolithography technology.

  • Action completed: The statement on the advent of microarray technology has been moved to the section on the development of contemporary techniques, where its historic perspective is appropriate. The present tense and “10,000s of transcripts” are now used to describe microarrays.

4. In the section “RNA-Seq”, I’m not sure the definition is correct. I think RNA-seq refers to assaying gene expression through sequencing of cDNA obtained from RNA transcripts. How exactly we derive the gene expression from this sequencing, e.g., whether we align these sequences to the genome or first assemble them into longer transcript fragments, is secondary.

  • Action completed: New text: “RNA-Seq refers to the combination of a high-throughput sequencing methodology with computational methods to capture and quantify transcripts present in an RNA extract.” I think that we don’t need to define the RNA-to-cDNA aspect of RNA-Seq, as the advent of direct RNA sequencing suggests that part of the definition is not essential.

5. In “Data analysis” it would be appropriate to mention the size of typical microarray and raw RNA-seq datasets.

  • Action completed: New detail added on the size on disk of microarray and RNA-seq data files.

6. The section “Sequence alignment” is not accurate, or perhaps is biased – many reviews about RNA-seq analysis have been written and should be used here. The steps that are typically mentioned are 1) QC, 2) alignment (to the genome or to the transcriptome), 3) quantification (at the gene, transcript or exon level) and 4) differential expression. New quantification methods (e.g., kallisto) that bypass alignment are gaining in popularity. I don’t think this is clearly described in the article, while at the same time niche things, such as the “Burrows-Wheeler transform”, are mentioned. I think this part should be shortened considerably (and renamed from Sequence alignment to RNA-seq data analysis, otherwise why is Differential expression there?) to just list the analysis steps with a reference to a couple of good review papers.

  • Action completed: Several edits to reorganise the RNA-Seq section and harmonise level of detail.
  1. New heading called Quality Control
  2. New heading Alignment
  3. New heading Quantification
  4. Deleted heading “Introns”
  5. Deleted “A Burrows-Wheeler index of the human genome requires only 2 Gb of memory, an amount available in contemporary personal computers.”
  6. Deleted mention of different types of sequence alignment algorithms in Sequence alignment.
  7. Deleted section on Phasing
  8. Moved paragraph on sequence coverage from data analysis to RNA-Seq methods.

7. I don’t think the section “Annotation” belongs in this article.

  • Action completed: Reviewer 2 supported Reviewer 1’s comment. The section on Annotation has been removed.

8. In the table of gene expression databases, I’m not sure that the Sequence Read Archive counts as one – it is used as an archive for short reads from RNA-seq experiments, but it does not itself say anything about gene expression. ArrayExpress holds both microarray- and RNA-seq-derived expression data.

  • Action completed: The SRA entry has been deleted from Table 5, and it is instead listed in the data gathering section for RNA-Seq.

9. Overall I think the article needs some shortening (as indicated above) and a bit of cleaning up (in particular the RNA-seq data analysis section), after which it will be very valuable.

  • Action completed: The data analysis section has been heavily reorganised and shortened where indicated.
Thanks for the feedback!

Response to Reviewer 2: Ines Hellmann

11. Transcriptomics, or the transcriptomics technologies, have become an essential method for many molecular biology laboratories. Therefore it is important to provide an overview of what this all encompasses. This is a big task, especially because it is aiming at a fast-moving target. The field is developing so fast that the review is already partially out of date; the question then is whether the authors should report something that is changing as quickly as, for example, the price tag on RNA-Seq. I am pretty sure that we can get one sample characterized for under 200 Euros.

  • Action completed: We acknowledge it is indeed a fast-moving field. To address the out-of-date costs we have removed dollar values for RNA-Seq vs microarrays, particularly since comparable costs aren't really generalisable.

12. This drop in the price is mainly due to RNA-seq library preparation techniques that allow for early barcoding and thus the early pooling of tagged libraries, which then also increases the throughput – which, in the case of single-cell technologies such as Drop-Seq or 10x Genomics, is definitely higher than for microarrays, thus directly contradicting the comparison in Table 1. This brings me to another important difference between microarrays and RNA-seq: the required input amounts. Single-cell transcriptome analysis would simply not work with microarrays. I believe that this important development needs to be mentioned in a review about transcriptomics, since it allows us to move away from measuring cell and thus also cell-type averages.

  • Action completed: This is a good point on sensitivity. We have added a comparison of sensitivity and input amounts. We added a row to the comparison table on input amounts (low for RNA-Seq, high for microarrays). We added a sentence on sensitivity to Principles and advances, and mention was made that single-cell transcriptomics is possible with RNA-Seq.

13. The description under Data gathering is a bit redundant.

  • Action completed: We acknowledge that the first sentence of the Data gathering section lists the three major techniques for transcriptomics, which can be considered repetitive. We have updated it to better provide continuity and explain the broad principles of the forthcoming section headings.

14. For the RNA-Isolation chapter it would be good to explain why mRNA enrichment is necessary; it is important to know that ~98% of the cellular RNA is rRNA.

  • Action completed: We have added a statement on mRNA abundance in the cell.

15. The Microarray section contains some technical details, such as the fluorophores, but does not mention strategies for background and cross-hybridization correction, such as mismatch or GC-gradient probes, which are important for analysis. Such details should either be left to a more specialized section or be made more comprehensive. This randomness in the kind and level of detail is an issue throughout the wiki page.

  • Action completed: We agree that the level of detail is occasionally uneven; we have removed unnecessary detail on the Cy3/Cy5 fluorophores.

16. For example, in the RNA-seq section only read lengths are discussed with respect to sequencing design; the choice between paired-end and single-end designs is not mentioned (paired-end would be preferred for annotation and isoform discovery), nor is the option to introduce unique molecular identifiers (UMIs), which avoid amplification bias. In the same chapter, the authors claim that the fragmentation method is dictated by the sequencing platform: to my knowledge, for Illumina one can choose TruSeq heat fragmentation, which is actually done at the RNA level, not at the cDNA stage, and tagmentation of cDNA. Those two methods are currently the most popular, and are not really mentioned. Also, I don’t see why these methods shouldn’t work with other sequencing platforms?

  • Action completed: Tagmentation was incorporated in this section; however, it was considered under enzymatic cleavage. We have now listed transposase tagging explicitly in the text to avoid confusion. We have also added fragmentation of RNA to the text. Heat fragmentation was listed differently, as hydrolysis; we have improved the text and now list it as chemical hydrolysis. A whole paragraph has been added on unique molecular identifiers. Paired-end and single-end sequencing have now been directly addressed in the text. Thanks for pointing out these omissions from the review; it is greatly improved as a result.


17. The analysis part is also superficial and states outdated information. Current personal computers have enough RAM to easily analyse a normal RNA-seq experiment; I can run this on my laptop. Compute clusters and parallelization become an advantage if we deal with very large sample sizes. The image processing chapter can go: this is pure technology and has no special connection to transcriptomics.

  • Action completed: Image analysis has been greatly cut down to remove any possible extraneous details. The statement on hardware requirements has been edited and now indicates consumer hardware is sufficient for simple RNA-Seq experiments.

18. When the authors discuss the mapping to a reference genome they mention several challenges, and the way this part is written implies that those challenges are being addressed, which is by and large true; however, multimapping remains an issue that is not solved yet. There are some attempts to resolve the true location after a first-pass mapping, but none of this is explicitly implemented in the commonly used mapping tools.

  • Action completed: We agree that multimapping is a challenge that has not been solved as well as other challenges. The text on read mapping has been amended to incorporate this observation.

19. Concerning the splice-aware mapping: it is important to mention that mapping works much more reliably if the mapper is given an annotation file, and that for shallow sequencing errors will dominate. Note that gene annotation can make a big difference for expression quantification.

  • Action completed: Good point; we have added some information on supplying prior information on splice junctions to improve mapping.

20. I don’t quite get the phasing of polyploid genomes. What exactly does this have to do with transcriptomics? The last sentence, “... a read aligner can be instructed to discard a read that aligns equally well to two regions, assign it to all regions, or randomly assign it to one of the regions”, would be standard for dealing with multimapped reads, which will occur in any genome, not only polyploid ones.

  • Action completed: Reviewer 1 also reported more significant problems with the inclusion of such an extensive section on read alignments and in particular phasing in polyploid genomes. As a result, this section has been removed in order to keep closely to the topic and reduce extraneous information.

21. I think the chapter about sequence coverage is also a bit misleading. I believe transcriptome complexity, i.e. how many genes are expressed in a tissue or under a certain environmental condition, and the frequency distribution of the transcripts, can have as big an impact on the necessary sequencing depth as the number of genes in the genome. For example, in mouse ES cells around 10,000 genes are found, while T cells yield only ~6,000 genes (please check those numbers, this is off the top of my head, but there are publications to this end). Next, the authors correctly note that the real power lies in biological replicates; thus it is not the coverage per sample that is the limiting quantity.

  • Action completed: Reviewer 1 indicated they had serious reservations with the inclusion of the section on sequence coverage and as a result it has been removed to ensure the review stays on topic. This chapter is much more streamlined as a result.

22. The chapter about de novo assembly conveys only a little concrete information; I know a little about this topic, but could not learn a lot from Table 3 or the associated chapter.

Yes, it makes sense that N50 is not a good measure, but it is not enough to just say that without spelling out the improvement.

  • Action completed: We have attempted a better explanation of alternative metrics for transcriptome assemblies.

23. Also, a little more about the actual strengths and weaknesses of those different software packages/approaches would be helpful; otherwise that chapter could also be left out, since the possibility of de novo assembly has been mentioned earlier.

  • Action completed: As this was also mentioned by R1, we have re-arranged and edited the alignment and de novo assembly sections. They are now more streamlined. Table 3 has been updated to better accommodate R2’s viewpoint.

24. For the differential expression -- why do the authors need to mention that these are command-line tools? Besides the CLC workbench, this is also true for the assemblers and the mappers, so no news here.

  • Action completed: We have removed references to the command-line in the differential expression section.

25. Table 4 lists the most popular DE software tools, but it is very uninformative and partially incorrect. They are all designed to detect differential expression, so why is this the speciality of Cuffdiff2? Also, the beta negative binomial in Cuffdiff is used to model the isoforms within the same sample; the DE part comes from an empirical distribution that, in the case of low sample sizes, also uses information from genes with a similar mean. This information borrowing is in spirit what the Bayesian methods in edgeR, limma and DESeq2 do. edgeR and DESeq are different implementations of a generalized linear model with a negative binomial error distribution. limma is a linear model, but it requires precision weights (voom) to accommodate count data. The flexibility for experimental designs, including isoform comparisons, comes from the specification of model and contrast matrices, which allows similar flexibility for the GLMs (edgeR & DESeq) and LMs (limma), while in Cuffdiff only pair-wise comparisons are possible.

  • Action completed: We recognise the reviewer's feedback that this table contains incorrect details, mostly to do with the types of mathematical models used by each program. We have decided that the best course of action is to remove the ‘model’ column from Table 4. This is consistent with the general feedback from R1 and R2 to reduce instances of unnecessary or uneven detail in the review. We believe a more accurate appraisal of the methods in DE programs would likely unbalance the review, a concern already noted for other sections.

26. For the validation part, I believe that most RNA-seq methods are so standardized that if a qPCR fails to validate an RNA-seq result, my first guess is that there might be something wrong with the qPCR. Also, in recent papers I rarely saw qPCR validation. The more important part would be a functional follow-up of the identified candidate genes, such as a knock-down/rescue experiment.

  • Action completed: We added a comment on functional validation via gene knockdown.

27. I agree with the other reviewer that Annotation should be its own topic.

  • Action completed: The annotation section has been removed; it may find a useful home on another wiki page.

28. Examples of applications are a matter of taste, hence I will not comment too much on this. Only two remarks: 1) I am strongly opposed to motivating non-coding RNAs with the observation that 75% of the human genome showed some signs of transcription. I am sure this number would go up if we re-analysed the short read archive now, and with time it would go up to 99.9%. If we sample deep enough, anything will be transcribed, but this is of little consequence and merely evidence that transcription is also in part a stochastic process.

  • Action completed: We have removed the reference to ENCODE and now stick to the more relevant facts on ncRNA.

29. 2) Gene expression databases do not belong in the Applications section, which otherwise uses examples to illustrate the use of transcriptomics.

  • Action completed: gene expression databases have been moved to their own heading at the end of the document and renamed Transcriptome databases. This works much better, thanks.

30. In summary, I think it is sad that a wiki page about such an important topic remains so shallow and oblivious to the new and exciting developments in the field of transcriptomics, such as for example single cell transcriptomics or in situ techniques.

  • Action completed: We have not deliberately excluded notable developments in transcriptomics; however, we did focus on the most established techniques over the many niche developments. Having said that, single-cell transcriptomics has now been incorporated in the text, as it is a well-published specialisation of RNA-Seq and has been brought up in a separate section of the reviewer comments.

31. Minor remarks: The enzymes DNase and RNase are spelled with a small a. Check for consistent spelling: RNA-seq vs. RNA-Seq.

  • Action completed: all spellings for DNase, RNase, RNA-Seq have been reviewed and corrected.
Thanks for the feedback! - Rohan Lowe (talk) 21:53, 2 February 2017‎ (PST)

Wikification

I am through with the entire text — all changes here — and have the following comments:

  • While .png files are standard in the journal, Wikimedia Commons prefers SVG, so please provide Figure 1 in SVG as well.
  • Please fix the "for their by" phrase (marked in red).
  • In the "Data analysis" section, what about adding an image? The gist is in File:DNA microarray.svg from Category:DNA microarrays, which could be used here but some real data would be preferable.

Thanks, --Daniel Mietchen (talk) 19:56, 7 March 2017 (PST)

Thank you for your edits. We have also addressed your recommendations by:
Please let us know if there is anything else that we can do. I realise that the citations are now a mixture of {{cite pmid}} and the newer {{cite journal}} since the update. However, I assume there is some way of converting them, since previous Topic Page reviews seem to have been changed to {{cite journal}}. T Shafee (talk) 20:43, 9 March 2017 (PST)
This looks good — thanks. Don't worry about the reference formatting templates. Sorting them out basically just amounts to replacing {{cite pmid|PMID}} with {{cite journal|pmid=PMID}} when copying the page over to Wikipedia. --Daniel Mietchen (talk) 05:44, 10 March 2017 (PST)
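
For what it's worth, that substitution can be scripted; a rough Python sketch (simple cases only; templates carrying extra parameters would need a more careful pattern):

  import re

  def convert_cite_pmid(wikitext):
      """Rewrite {{cite pmid|12345}} templates as {{cite journal|pmid=12345}}."""
      return re.sub(r"\{\{cite pmid\|(\d+)\}\}", r"{{cite journal|pmid=\1}}", wikitext)

  # Toy usage with a placeholder PMID.
  print(convert_cite_pmid("See {{cite pmid|12345}}."))  # See {{cite journal|pmid=12345}}.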