Talk:PLOS/De Novo Gene Birth

Review by Joanna Masel

This piece is well-written, much-needed on wikipedia, and indeed also an excellent review relative to others in conventional peer-reviewed journals. I thank the authors for doing this.

Major comments:

The classification in the identification section and in Table 1 needs improvement. As currently written, the emphasis is on determining whether a candidate is de novo, which obscures an extremely important second dimension of classification, which asks whether the candidate is truly a gene. These latter problems are currently discussed under "Determination of de novo gene function". Here I believe the authors are in breach of POV criteria for wikipedia. They are the only advocates of using "effect" as a definition of "function" for the purpose of classifying de novo genes. Within evolutionary biology as a whole, there is consensus on the definition of function, shared with philosophers of biology (see eg https://plato.stanford.edu/entries/teleology-biology/), the latter who would seem to be the relevant experts. Within molecular biology as a whole, there is not consensus, but no serious rebuttal to pieces such as https://doi.org/10.1093/gbe/evt028 has ever been given. The study of de novo genes is intrinsically a study of evolution, and hence the evolutionary consensus should prevail. A NPOV needs to acknowledge the evolutionary definition of function as the consensus, while still being free to note the effect definition as a minority view. It should not present the two on equal footing as it currently does. For clarity, it should then use different words for the two, eg "function" and "effect" are unambiguous and have clear precedent in the cited literature.

In this light, the subheading "Determination of de novo gene function" is a strange way of putting things: there might not be a function to a de novo gene candidate, in which case it is not a de novo gene, and the existence of some function can be inferred from sequence conservation, even when we know nothing about what that function is. Inferring function is particularly difficult, and hence the methods for doing so need to be highlighted/classified, at short timescales when conservation is hard to find and/or ORF length is short (or the gene is non-coding).

The "Proto-gene model" section is also affected by this definitional issue, since the theory relies on "gene" being defined only via "effect" and not via "function". In this sense, the theories given are not alternatives, because they describe the evolution of a different set of entities. This should be made clear. Eg, it is stated that the difference with the preadaptation model is that the latter holds that “gene birth is a sudden transition to functionality.” This is a difference in how functionality is defined, rather than a difference in substance, and this is not made clear.

Re adding a second dimension of classification, note that the two current categories of phylostratigraphy and synteny-based methods both tend to use existing gene annotations. Other methods do their own usually more inclusive gene annotation, eg with the help of ribosomal profiling, defining that second dimension of methods. More exhaustive tblastn searches can also change phylostratigraphic classifications when unannotated homologous genes are uncovered: ref 59 (I think, it is corrupted) can be used as a source on this.

I recommend expanding Table 1 to rename existing column "Method" -> "Homology detection method", including noting whether existing gene annotations were used vs. a broader set of potentially genic sequences. Also in this column, "phylostratigraphy" is simply the application for blastp etc.: I recommend replacing the catch-all term with the specifics of what was used for homology detection. I also recommend adding a column for "Function?" that includes conservation studies, as well as noting any individual genes for which KO phenotypes or other important functional evidence is available. Under the column "# orphan/de novo genes", I do not understand the distinctions made: do they echo the terms used by the papers, rather than bring them into a common framework? The differences between orphan genes vs de novo genes vs TRGs vs young genes vs de novo gene candidates should be explained, and should follow consistent definitions across all studies in the Table.

The section "Features of de novo genes" also has NPOV issues. Length and evolutionary rate are susceptible to homology detection bias - this caveat should be mentioned. The authors' finding of low ISD in young yeast genes in ref 60 is given without caveat, even though ref 97 analyzed the exact same data to show that if non-genes are excluded, young yeast genes actually have high ISD, consistent with all evidence in the literature not using the phylostratigraphy of ref 60 and hence not biased by its tendency to include non-functional genes. Lower expression is also susceptible to the inclusion of non-genes.

"It is difficult to ascertain whether a gene is undergoing gene death from sequence data alone, as an acceleration in the rate of evolution could be indicative of either pseudogenization or of positive (aka Darwinian) selection for a novel, adaptive trait. Functional approaches are therefore employed to determine the biological role of a particular gene product; if it is deleterious, as in the case of a de novo microRNA gene in D. melanogaster[67], gene death may be immanent." It's not particularly difficult to spot gene death for coding genes - if no longer functional by the evolutionary definition, they very rapidly acquire premature stop codons. Evolutionary rate in the dN/dS sense is a much less effective tool. Gene death of an ORF of any substantive length is rapid not just if a gene is deleterious but also if it is neutral.

The section "Open Questions" reads to me more like a conventional review piece and not like an encyclopedia article - I would cut it.

I also request correction of the following minor points:

Refs 58, 59, 77, and 79 are broken.
Overprinting can also refer to short overlaps following C-terminal extensions that extend one pre-existing gene into the reading frame of a second nearby pre-existing gene: this should be disambiguated, i.e. the term overprinting sometimes but not always refers to a form of de novo gene birth.
Re BSC4, I suggest "functional intermediate state" -> "structurally intermediate functional state"
"A 2012 reanalysis of this data found that ~1,100 non-genic ORFs showed signatures of translation.[60]" This is an overstatement. The bar was very low, better described as ribosomal association than as translation. I realize it's still on biorxiv, but at this stage https://www.biorxiv.org/content/early/2018/05/24/329730 is nevertheless a better citation, with much more rigorous criteria for inferring translation. (I guess this is ref 58 which is broken.) Or, to move away from yeast while staying not just best-evidence but also peer-reviewed, a better citation (with suitable numbers of ORFs) is ref 90.
"enriched in potential adaptations relative to constitutively expressed sequences.[96]" should read "relative to completely non-expressed sequences". Ref 98 is probably more suitable here than ref 96, same general idea but more tailored to the case in question, only reason to include 96 is earlier publication, which is not really a reason for wikipedia as opposed to a conventional review.
"The revealing of cryptic variation may be due to molecular errors including increased rates of translational readthrough and aberrant gene splicing events." This sentence refers to the application of preadaptation theories to other examples, not to de novo gene birth. Cut or replace with specific application to de novo gene birth as outlined in ref 92.
For section organization, I don't think "Dynamics of de novo gene birth" should be a subsection of "Prevalence of de novo gene birth". It seems better grouped with "ORF first vs. transcription first", "“Out of Testis” hypothesis", and "Grow slow and moult model", which would do well grouped together in some broader section devoted to the order of events. I would also group the first two paragraphs of "Role of epigenetic modifications" with "Features of de novo genes", while the third paragraph belongs under "Pervasive gene expression" together with "Proto-gene model" and "Preadaptation model".
Link to and from the wikipedia page on "Overlapping gene".

JoannaMasel (talk) 18:24, 28 November 2018 (PST)

Review by Anon

This is a really great piece of wiki review, very comprehensive, balanced, well structured and easy to read. The authors have done a brilliant job in compiling a huge amount of literature on de novo genes and closely related relevant literature. This will be immensely helpful for anyone interested in the subject and undoubtedly help move the field forward significantly. I am happy for my name to be passed on to the authors.

I have a couple of comments but they are all relatively minor, I trust the authors will be able to decide which suggestions to take on board and which ones to ignore. I tried to check some of the literature (most of it I was aware of) but please note that there were a couple of errors in the hypertext such that the references could not be identified and therefore some of my comments may go astray or be futile -- apologies for that. Feel free to adopt whichever additional reference you find useful, I would only really insist on the Tretyachenko et al. (see below).

Lacking line numbers or consistent formatting upon printing I tried my best to make comments align-able to text but apologies if not, please ask back. I attach my .pdf so at least the pages should fit.

Writing

Generally great, a couple of minor issues: The very first paragraph (page 1) contains too many overly long sentences, many of which have reversed logic. For example, the sentence "Although in principle ..." -- occurrence is not related to detection but of course older events are more difficult to detect because general signals of homology and synteny become blurred. I think this is what you are trying to say. The following sentence insinuates that orphans are a subset of de novo genes but the opposite is in fact true -- and above you seem to use the word "novel" synonymously to "orphan". (I know this is all difficult to structure -- we are constantly having related troubles with reviewers in our own papers.)

When you say "de novo genes evolve from 'ancestrally non-genic' DNA", what is meant by non-genic here? Does this include intronic sequences? Some authors also consider frame-shift mutations in protein-coding genes as de novo. The authors even state this on page 2 (P1L14-15).

'de novo genes ... play a role in the generation of evolutionary innovation' A citation is not provided. Are there many solid examples of this from the literature? Khalturin et al. 2009 is often given as an example of the role of orphan genes in generating lineage-specific novelty.

Break down last sentence. etc. etc.

Also 1st sentence next page "As early as ..." -- break down.

The last 2-3 pages are not so well written as the beginning, the ms becomes more of an annotated bibliography.

Specific comments

page 2: Fig 1 -- please remove exon shuffling. It does not occur anywhere else in the text and it simply does not exist in that context. It sounds good but is an urban legend. Maybe you mean domain rearrangements but recombination does not play a role there (see e.g. https://www.ncbi.nlm.nih.gov/pubmed/23376183 ) as the emergence of all new combinations is a simple consequence of duplication, fusion and terminal losses. And exons do not correspond to domains (unless in artificial systems used for directed evolution).

Overprinting: could you pls distinguish between in-frame, antisense and frame-shifting overprinting? I think the virus example (Karlin et al.) is out of frame.

I am not aware of any evidence of "starter type exons" [16]. I think the current belief is that small fragments of peptides came first and later combined to form larger proteins in the early stages of life. There is recent literature on that by Andrei Lupas's lab and some also by Dan Tawfik's.

With all due respect -- but the "Big Bang" theory is pure nonsense. Sounds good but is an urban legend. Again, see Lupas and Tawfik. And a "space" can never expand since sequence space is metric and the dimensions are fixed. (The story may be different in a space-time continuum with possibly inverted boundaries but this is no longer molecular biology but astrophysics.)

page 3: "... extremely short (<90 bp) ... RNA genes" Not necessarily. See e.g. Galindo M Plos Biol 2007, Hsu+Benfey Proteomics 2017, Andrews+Rothnagel NatRevGenet 2013) and there was also a larger original Science paper ca. 3 yrs ago which I can't find unfortunately.

Ad ref 29: this is nice work but I think the common belief nowadays is that disorder is not a predictor of a (non-)functional protein and so there is no "in-between" (see also below).

"... all five genes have partial orthologs ..." What is meant by this? Surely the genes are either othologs or they are not? Or, is a portion of the sequence (e.g. a protein domain) orthologous between species. If this is the case should these sequences really be considered de novo genes?

"The first de novo ... in mice ..." I think it should be mouse as plural would insinuate using different species or population of Mus.

p4: around ref 38 -- you could mention that more advanced methods have also been used, such as PSI-BLAST or HMMs generated from e.g. de novo genes in 3 sister species to search against the nearest outgroups.

Just a small side note: it could be mentioned that synteny works very well in vertebrates but not at all in insects.

Using BLAST to detect 'ancestral homologs'. This should be inferring the presence/absence of ancestral homologs by detecting homologs in extant sister taxa. That is, sequences in living species are not ancestral to each other.

p5: The classification of de novo genes into types based on how much of their sequence is derived from ancestrally non-coding DNA was proposed by McLysaght and Hurst (2016). Has this terminology caught on outside of this one paper? I can't think of any other papers in which it is used.

p6: ref 59 is garbled -- I assume you refer to https://www.ncbi.nlm.nih.gov/pubmed/30346517 here -- if not yet please do so.

Last sentence of the paragraph ("Generally speaking ...") I fully agree but I think the story is quite simple: de novo genes arise much more often than single-gene duplicates but are also much more often lost, often even at the proto-gene or transcript level. Over -- say -- more than 100 my, the vast majority of novel genes will stem from duplications. So what we are really interested in is how de novo genes manage to be retained -- is it neutral drift, small populations allowing also slightly deleterious mutations, or how else do they become useful -- eventually.

p7

"Feature of de novo genes":

Pls remove or place somewhere else the first sentence -- it does not lead well to the following one, or maybe swap the two.

ORF 1st vs trc. 1st:

most <-- must

mind that Poldi is an RNA gene

P2L6 possible typo immanent -> imminent

p8

Proto gene model:

thESE data (is plural)

Around the penultimate paragraph "Most non-genic ORFs that are ...": You may want to refer to the paper https://www.ncbi.nlm.nih.gov/pubmed/29133927 which I think settles the debate about random sequences being structured or not and functional or not. Tretyachenko et al. (Scientific Rep. 2018) suggested that (i) random sequences may well fold but that (ii) also highly disordered proteins may be functional, as long as they do not aggregate. Taking these and a couple of other lines of evidence on the function of disordered proteins (e.g. in phase separation in the cell), it should be noted that there is absolutely no evidence or necessity to assume that (i) proteins have to become ordered and (ii) that any possible signal of positive selection is related to structural changes. On the contrary, disorder may very well be functional and promote important functional switches following e.g. phosphorylation or contribute to phase separations.

p9

I know this is supposed to be a balanced review but ref 97 (Wilson) used totally odd ID parameters (see reanalysis in Schmitz et al. 2018 Nat Ecol Evol) which biased the interpretation and as said above most of the proposed genes are not de novo.

"Out of testis":

Regarding the out-of-testis hypothesis and the general lack of functional evidence of de novo genes it might be worthwhile to consider https://www.ncbi.nlm.nih.gov/pubmed/28104747 , although the de novo genes are very old. (N.B.: we have expressed and structure analysed the protein and it looks nowhere close to an existing one but reasonably as it contains structural elements which are known to be emerging quite easily from random sequences)

p9/10 Grow slow:

In Klasberg et al. (FEBS J 2018) it was shown that novel fragments can be obtained either terminally or by exonisation from intronic regions. (The latter was btw. also shown in a completely different context in https://www.ncbi.nlm.nih.gov/pubmed/26541172). Interestingly, the novel fragments did again show no signs of adaptation but the hosting proteins showed signs of purifying selection, indicating that (maybe) the host protein has to fend off the slightly deleterious effects of the new and not yet optimised fragment. Disturbingly, even though the synteny should be very simple, no signs of the novel fragments were found in the outgroups (see also my above comment on insect synteny). This and another study (https://www.ncbi.nlm.nih.gov/pubmed/30247639) are extensions of the "slow grow ...." model. But it may be too specific and detailed to mention.

p10

Methylation: Keller + Yi, PNAS 2014

Again, feel free to keep this as is, but personally I think you will find everything to be related to cancer -- rigorous statistical tests are lacking, and given the highly entangled cellular networks every protein is just 2-3 interactions away from a couple of oncogenes.

p11

ref 120: could you check if this is the gene which has meanwhile been found to have been introgressed from Neanderthals?

"... but are enriched for transcription factors" what precisely is enriched for what?

Review by Diethard Tautz

Peer reviewer 3 comments and recommendations given as an annotated PDF of the submitted article:

Diethard Tautz reviewer comments

Response to Reviewers

We would like to thank all of the reviewers for their careful read of our review and their many helpful suggestions, the majority of which have been incorporated into the current draft. The following changes have been made in response to the specific comments and criticisms provided:

Response to Joanna Masel

1. The classification in the identification section and in Table 1 needs improvement. As currently written, the emphasis is on determining whether a candidate is de novo, which obscures an extremely important second dimension of classification, which asks whether the candidate is truly a gene. These latter problems are currently discussed under "Determination of de novo gene function". Here I believe the authors are in breach of POV criteria for wikipedia. They are the only advocates of using "effect" as a definition of "function" for the purpose of classifying de novo genes. Within evolutionary biology as a whole, there is consensus on the definition of function, shared with philosophers of biology (see eg https://plato.stanford.edu/entries/teleology-biology/), the latter who would seem to be the relevant experts. Within molecular biology as a whole, there is not consensus, but no serious rebuttal to pieces such as https://doi.org/10.1093/gbe/evt028 has ever been given. The study of de novo genes is intrinsically a study of evolution, and hence the evolutionary consensus should prevail. A NPOV needs to acknowledge the evolutionary definition of function as the consensus, while still being free to note the effect definition as a minority view. It should not present the two on equal footing as it currently does. For clarity, it should then use different words for the two, eg "function" and "effect" are unambiguous and have clear precedent in the cited literature.

We thank the reviewer for highlighting an area where our review needed further semantic clarification. To address her concern, we have combined the previous two sections, “Classification of de novo genes” and “Determination of de novo gene function,” into a single section, “Determination of de novo gene status.” This section has been largely reorganized/rewritten. We have deemphasized the classification into different types of de novo genes (also per reviewer Anon’s suggestion) and reworked our discussion surrounding determination of whether a candidate is truly a gene. We also lay out different contemporary views regarding what constitutes a gene, and how the definition of gene relates to the definition of function. We present a synopsis of the debate concerning how function is defined and present both the genetic/biochemical definition (“effect”) and the evolutionary definition (selection). We note that “Evolutionary biologists tend to view only those sequences under selective constraint as being functional in the strict sense of the word.”

We are confident the revised review is not in breach of NPOV. While there are many examples of studies that use the “effect” definition of function as part of a rigorous determination of de novo gene status (e.g. Cai et al., Genetics, 2008; Li et al., Cell Res, 2010), we have not specifically advocated using the “effect” definition to classify de novo genes beyond using expression as an inclusion criterion, a common practice. To avoid any misunderstanding about the meaning of “function”, we have re-organized Table 1 to clearly delineate evidence of expression, selection and physiological roles.

2. In this light, the subheading "Determination of de novo gene function" is a strange way of putting things: there might not be a function to a de novo gene candidate, in which case it is not a de novo gene, and the existence of some function can be inferred from sequence conservation, even when we know nothing about what that function is.

We have changed the subheading to “Determination of de novo gene status.”

3. Inferring function is particularly difficult, and hence the methods for doing so need to be highlighted/classified, at short timescales when conservation is hard to find and/or ORF length is short (or the gene is non-coding).

We have added a discussion of some of the evolutionary methods for detecting selection on short timescales (e.g. nucleotide divergence, coding score) as well as some additional detail regarding experimental methods for detecting genetic/biochemical activities.

4. The "Proto-gene model" section is also affected by this definitional issue, since the theory relies on "gene" being defined only via "effect" and not via "function". In this sense, the theories given are not alternatives, because they describe the evolution of a different set of entities. This should be made clear. Eg, it is stated that the difference with the preadaptation model is that the latter holds that “gene birth is a sudden transition to functionality.” This is a difference in how functionality is defined, rather than a difference in substance, and this is not made clear.

As we see it, the proto-gene model does not depend on the effect definition of function. We have taken care to avoid saying that proto-genes are functional, emphasizing that most non-genic sequences are evolving neutrally.

To address the important concern raised by the reviewer, we now specify that the preadaptation model uses the evolutionary definition of function and no longer place the idea that “gene birth is a sudden transition to functionality” in contrast with the proto-gene model. Instead we note the differences between the models with respect to their predictions about the features of young genes, relative to old genes and non-genes. We connect the models by noting that according to the preadaptation model, expressed sequences that are being purged might be considered proto-genes, and become genes if and when they come under purifying selection.

5. Re adding a second dimension of classification, note that the two current categories of phylostratigraphy and synteny-based methods both tend to use existing gene annotations. Other methods do their own usually more inclusive gene annotation, eg with the help of ribosomal profiling, defining that second dimension of methods. More exhaustive tblastn searches can also change phylostratigraphic classifications when unannotated homologous genes are uncovered: ref 59 (I think, it is corrupted) can be used as a source on this. I recommend expanding Table 1 to rename existing column "Method" -> "Homology detection method", including noting whether existing gene annotations were used vs. a broader set of potentially genic sequences. Also in this column, "phylostratigraphy" is simply the application for blastp etc.: I recommend replacing the catch-all term with the specifics of what was used for homology detection

We have renamed the column “Homology Detection Method(s)”, and now give a detailed description of the methods used for each study rather than the general terms “phylostratigraphy” and “synteny.” We thank the reviewer for this suggestion which renders Table 1 much more informative.

6. I also recommend adding a column for "Function?" that includes conservation studies, as well as noting any individual genes for which KO phenotypes or other important functional evidence is available.

We have added this information in two new columns, “Evidence of Selection” and “Evidence of Physiological Role,” in order to maintain the distinction between the evolutionary and genetic/biochemical definitions of function.

7. Under the column "# orphan/de novo genes", I do not understand the distinctions made: do they echo the terms used by the papers, rather than bring them into a common framework? The differences between orphan genes vs de novo genes vs TRGs vs young genes vs de novo gene candidates should be explained, and should follow consistent definitions across all studies in the Table.

To address this important semantic concern, we have added a caption to the table that defines how the different terms are used, placing them in a common and consistent framework. The caption also notes the special terms (i.e. “candidate” and “proto-gene”) that reflect the language used by the original authors.

8. "It is difficult to ascertain whether a gene is undergoing gene death from sequence data alone, as an acceleration in the rate of evolution could be indicative of either pseudogenization or of positive (aka Darwinian) selection for a novel, adaptive trait. Functional approaches are therefore employed to determine the biological role of a particular gene product; if it is deleterious, as in the case of a de novo microRNA gene in D. melanogaster[67], gene death may be immanent." It's not particularly difficult to spot gene death for coding genes - if no longer functional by the evolutionary definition, they very rapidly acquire premature stop codons. Evolutionary rate in the dN/dS sense is a much less effective tool. Gene death of an ORF of any substantive length is rapid not just if a gene is deleterious but also if it is neutral.

We have modified the writing to clarify that we are referring to the difficulty involved in determining if a gene is in the process of undergoing gene death, and that gene death is indeed easy to confirm after the fact.

9. The section "Open Questions" reads to me more like a conventional review piece and not like an encyclopedia article - I would cut it.

We agree and have removed this section.

10. Refs 58, 59, 77, and 79 are broken.

We have fixed these references.

11. Overprinting can also refer to short overlaps following C-terminal extensions that extend one pre-existing gene into the reading frame of a second nearby pre-existing gene: this should be disambiguated, i.e. the term overprinting sometimes but not always refers to a form of de novo gene birth.

We have added to the description of overprinting to make this clear.

12. Re BSC4, I suggest "functional intermediate state" -> "structurally intermediate functional state"

Per reviewer Anon’s comment, we have removed this phrase entirely.

13. "A 2012 reanalysis of this data found that ~1,100 non-genic ORFs showed signatures of translation.[60]" This is an overstatement. The bar was very low, better described as ribosomal association than as translation. I realize it's still on biorxiv, but at this stage https://www.biorxiv.org/content/early/2018/05/24/329730 is nevertheless a better citation, with much more rigorous criteria for inferring translation. (I guess this is ref 58 which is broken.) Or, to move away from yeast while staying not just best-evidence but also peer-reviewed, a better citation (with suitable numbers of ORFs) is ref 90.

We have changed the language to “showed evidence of ribosomal association, suggestive of translation.” We have also added the recommended citation.

14. "enriched in potential adaptations relative to constitutively expressed sequences.[96]" should read "relative to completely non-expressed sequences". Ref 98 is probably more suitable here than ref 96, same general idea but more tailored to the case in question, only reason to include 96 is earlier publication, which is not really a reason for wikipedia as opposed to a conventional review.

We have made the requested change to the text, and changed the reference.

15. "The revealing of cryptic variation may be due to molecular errors including increased rates of translational readthrough and aberrant gene splicing events." This sentence refers to the application of preadaptation theories to other examples, not to de novo gene birth. Cut or replace with specific application to de novo gene birth as outlined in ref 92.

We have replaced with the specific application to de novo gene birth, as suggested.

16. For section organization, I don't think "Dynamics of de novo gene birth" should be a subsection of "Prevalence of de novo gene birth".

We thank the reviewer for her suggestions regarding section organization. After several reorganizations (see comments 17 and 18), we found that the subsection “Prevalence of de novo gene birth” seemed to fit best under the “Dynamics of de novo gene birth” section. We have renamed the other subsection “Estimates of de novo gene numbers.” We hope that this section now captures what we feel are two major aspects of the prevalence of de novo gene birth, namely the different estimates of the specific numbers of de novo genes as well as the birth/death rate of de novo genes.

17. "ORF first vs. transcription first", "“Out of Testis” hypothesis", and "Grow slow and moult model", which would do well grouped together in some broader section devoted to the order of events.

We have placed the “ORF first vs. transcription first” and “Out of Testis” sections as subsections under an order of events section, as suggested. The “grow slow and moult model” seems to fit best with the proto-gene and preadaptation models under pervasive gene expression.

18. I would also group the first two paragraphs of "Role of epigenetic modifications" with "Features of de novo genes", while the third paragraph belongs under "Pervasive gene expression" together with "Proto-gene model" and "Preadaptation model".

We have reorganized the relevant sections as suggested.

19. Link to and from the wikipedia page on "Overlapping gene".

We have made the requested addition.

Response to Anon R2

1. The very first paragraph (page 1) contains too many overly long sentences, many of which have reversed logic. For example, the sentence "Although in principle ..." -- occurrence is not related to detection but of course older events are more difficult to detect because general signals of homology and synteny become blurred. I think this is what you are trying to say.

We have removed “in principle” to shorten the sentence and make it clear that occurrence is not related to detection. We have also split the last sentence in the paragraph into two sentences.

2. The following sentence insinuates that orphans are a subset of de novo genes but the opposite is in fact true… and above you seem to use the word "novel" synonymously to "orphan".

We have reorganized the first paragraph in order to provide more clarity regarding terminology. We have specifically emphasized that not all orphan genes have emerged de novo, i.e. that young de novo genes are a subset of orphan genes and not the reverse. We thank the reviewer for helping us sort through these difficult semantic issues.

We have also added a caption to our table that defines how some of the various terms are used.

3. When you say "de novo genes evolve from 'ancestrally non-genic' DNA", what is meant by non-genic here? Does this include intronic sequences? Some authors also consider frame-shift mutations in protein-coding genes as de novo. The authors even state this on page 2 (P1L14-15).

We have stated that overprinting is a “special case” of de novo gene birth, and now go into more detail in this section per reviewer Masel’s comments. We have also now added discussion of exonization as another special case and included this in Figure 1.

4. 'de novo genes ... play a role in the generation of evolutionary innovation': A citation is not provided. Are there many solid examples of this from the literature? Khalturin et al. 2009 is often given as an example of the role of orphan genes in generating lineage-specific novelty.

We have added the recommended citation, along with an additional one.

5. Break down last sentence. etc. etc.

The sentence referenced here has been split into two sentences.

6. Also 1st sentence next page "As early as ..." -- break down.

The sentence referenced here has been shortened.

7. The last 2-3 pages are not so well written as the beginning, the ms becomes more of an annotated bibliography.

Portions of the final pages have been rewritten, and the final section has been removed per reviewer Masel’s suggestion. We hope that the final pages read better now.

8. Page 2: Fig 1 -- please remove exon shuffling. It does not occur anywhere else in the text and it simply does not exist in that context. It sounds good but is an urban legend. Maybe you mean domain rearrangements, but recombination does not play a role there (see e.g. https://www.ncbi.nlm.nih.gov/pubmed/23376183 ), as the emergence of new combinations is a simple consequence of duplication, fusion and terminal losses. And exons do not correspond to domains (except in artificial systems used for directed evolution).

We have removed any reference to exon shuffling.

9. Overprinting: could you pls distinguish between in-frame, antisense and frame-shifting overprinting? I think the virus examples (Karlin et al.) is out of frame.

We thank the reviewer for pointing out this lack of clarity in the submitted review, and have made these distinctions clear in the revised text.

10. I am not aware of any evidence of "starter type exons" [16]. I think the current belief is that small fragments of peptides came first and later combined to form larger proteins in the early stages of life. There is recent literature on that by Andrei Lupas's lab and some also by Dan Tawfik's.

We have changed the writing here to indicate that the notion of “starter type exons” was a view held at the time, emphasizing how the sequencing revolution (among other things) challenged this view.

11. With all due respect -- but the "Big Bang" theory is pure nonsense. Sounds good but is an urban legend. Again, see Lupas and Tawfik. And a "space" can never expand since sequence space is metric and the dimensions are fixed. (The story may be different in a space-time continuum with possibly inverted boundaries, but this is no longer molecular biology but astrophysics.)

We thank the reviewer for this comment and have removed this portion.

12. Page 3: "... extremely short (<90 bp) ... RNA genes" Not necessarily. See e.g. Galindo M, PLoS Biol 2007; Hsu+Benfey, Proteomics 2017; Andrews+Rothnagel, NatRevGenet 2013. There was also a larger original Science paper ca. 3 yrs ago which I can't find, unfortunately.

We have added the suggested references to highlight examples of small, functional peptides.

13. Ad ref 29: this is nice work, but I think the common belief nowadays is that disorder is not a predictor of a (non-)functional protein, and so there is no "in-between" (see also below).

We have removed this reference.

14. "... all five genes have partial orthologs ..." What is meant by this? Surely the genes are either orthologs or they are not? Or is a portion of the sequence (e.g. a protein domain) orthologous between species? If this is the case, should these sequences really be considered de novo genes?

We have changed the text here to clarify precisely what was found in this study.

15. "The first de novo ... in mice ..." I think it should be mouse as plural would insinuate using different species or population of Mus.

This gene is in fact present in multiple Mus species, so we have kept the plural mice in the revised version.

16. p4: around ref 38 -- you could mention that more advanced methods have been used too, such as PSI-BLAST or HMMs generated from e.g. de novo genes in 3 sister species to search against the nearest outgroups.

We thank the reviewer for this suggestion and have added mention of PSI-BLAST towards the end of the phylostratigraphy section.

17. Just a small side note: it could be mentioned that synteny works very well in vertebrates but not at all in insects.

We now make mention of this.

18. Using BLAST to detect 'ancestral homologs'. This should be inferring the presence/absence of ancestral homologs by detecting homologs in extant sister taxa. That is, sequences in living species are not ancestral to each other.

We thank the reviewer for noting this imprecision of language and have made the suggested correction.

19. p5: The classification of de novo genes into types based on how much of their sequence is derived from ancestrally non-coding DNA was proposed by McLysaght and Hurst (2016). Has this terminology caught on outside of this one paper? I can't think of any other papers in which it is used.

We have kept the reference, but cut much of the detail, as part of a general rearrangement of this section per reviewer Masel’s comments.

20. p6: ref 59 is garbled -- I assume you refer to https://www.ncbi.nlm.nih.gov/pubmed/30346517 here -- if not yet please do so.

Yes, that is the reference, it has now been fixed.

21. Last sentence of the paragraph ("Generally speaking ...") I fully agree but I think the story is quite simple: de novo arise much more often than single-gene duplicates but are also much more often lost, often even at the proto-gene or transcript level. Over -- say -- more than 100 my, the vast majority of novel genes will stem from duplications. So what we are really interested in, is actually how de novo manage to be retained -- is it neutral drift, small populations allowing also slightly deleterious mutations or how else they become useful -- eventually.

We have added language to this section that helps clarify the point that the debate may be in part resolved by the difference in retention rates between the different types of novel genes.

22. "Feature of de novo genes": Pls remove or place somewhere else the first sentence -- it does not lead well to the following one, or maybe swap the two.

We have changed the first sentence to make it simpler and to lead better into the second sentence.

23. ORF 1st vs trc. 1st: most <-- must

We have fixed the typo.

24. mind that Poldi is an RNA gene

Thank you; Poldi was indeed a bad example for the point we were trying to make and we have removed it.

25. P2L6 possible typo immanent -> imminent

We have fixed the typo.

26. p8 Proto gene model: thESE data (is plural)

We have fixed the typo.

27. Around the penultimate paragraph "Most non-genic ORFs that are ...": You may want to refer to the paper https://www.ncbi.nlm.nih.gov/pubmed/29133927 which I think settles the debate about random sequences being structured or not and functional or not. Tretyachenko et al. (Scientific Rep. 2018) suggested that (i) random sequences may well fold but that (ii) also highly disordered proteins may be functional, as long as they do not aggregate. Taking these and other evidence on the function of disordered proteins (e.g. in phase separation in the cell), it should be noted that there is absolutely no evidence or necessity to assume (i) that proteins have to become ordered and (ii) that any possible signal of positive selection is related to structural changes. On the contrary, disorder may very well be functional and promote important functional switches following e.g. phosphorylation or contribute to phase separations.

We have added a description of the Tretyachenko findings along with another citation supporting the functionality of disordered proteins.

28. p9 I know this is supposed to be a balanced review but ref 97 (Wilson) used totally odd ID parameters (see reanalysis in Schmitz et al. 2018 Nat Ecol Evol) which biased the interpretation and as said above most of the proposed genes are not de novo

We have qualified our description of these results with the caveat that the ISD trend, specifically, was not observed over shorter time scales, per Schmitz et al.

29. "Out of testis": Regarding the out-of-testis hypothesis and the general lack of functional evidence of de novo genes, it might be worthwhile to consider https://www.ncbi.nlm.nih.gov/pubmed/28104747 , although the de novo genes are very old. (N.B.: we have expressed and structure analysed the protein and it looks nowhere close to an existing one, but reasonable, as it contains structural elements which are known to emerge quite easily from random sequences.)

Thank you very much for this suggestion, we have added the reference along with a brief description of the findings.

30. p9/10 Grow slow: In Klasberg et al. (FEBS J 2018) it was shown that novel fragments can be obtained either terminally or by exonisation from intronic regions. (The latter was btw. also shown in a completely different context in https://www.ncbi.nlm.nih.gov/pubmed/26541172). Interestingly, the novel fragments again showed no signs of adaptation, but the hosting proteins showed signs of purifying selection, indicating that (maybe) the host protein has to fend off the slightly deleterious effects of the new and not yet optimised fragment. Disturbingly, even though the synteny should be very simple, no signs of the novel fragments were found in the outgroups (see also my above comment on insect synteny). This and another study (https://www.ncbi.nlm.nih.gov/pubmed/30247639) are extensions of the "slow grow ...." model. But it may be too specific and detailed to mention.

We have added both of the references mentioned, along with a brief description of the findings, to the “grow slow and moult” section.

31. p10 Methylation: Keller + Yi, PNAS 2014

This reference is specific to duplicated genes and therefore doesn’t seem directly relevant.

32. Again, feel free to keep this as is, but personally you will find everything related to cancer -- rigorous statistical tests are lacking, and given the highly entangled cellular networks every protein is just 2-3 interactions away from a couple of oncogenes.

While we generally agree with this criticism, we think that, for the purposes of a review, it is OK to mention that many de novo genes have been linked to cancer.

33. p11 ref 120: could you check if this is the gene which has meanwhile been found to have been introgressed from Neanderthals?

We thank reviewer Anon for helping us confirm that the gene referenced here was not introgressed from Neandertals.

34. "... but are enriched for transcription factors" what precisely is enriched for what?

We have changed the text to “enriched for genes involved in transcriptional regulation relative to other functional classes” to make it more clear.

Response to Diethard Tautz

1. A specific account in the history of discovery of de novo gene evolution is provided in:

Tautz D. The discovery of de novo gene evolution. Perspect Biol Med. 2014;57(1):149-161. doi: 10.1353/pbm.2014.0006. PMID: 25345708

Thank you, we have added the suggested citation.

2. the currently most comprehensive paper on this would be: Samandi S et al. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. eLife. 2017;6. pii: e27860. doi: 10.7554/eLife.27860. PMID: 29083303

We have added the suggested citation.

3. another rather influential paper was: Chothia C. Proteins. One thousand families for the molecular biologist. Nature. 1992;357(6379):543-4. DOI: 10.1038/357543a0; PMID: 1608464

We have added the suggested citation along with a brief description of the findings.

4. While these Southern blot experiments were historically important to get the paper published, they would nowadays not be considered good evidence. Given that a de novo gene is expected to evolve out of non-coding sequences which would still show up in Southern blots, the absence of a Southern blot signal could actually suggest a horizontal gene transfer instead of a de novo gene evolution event. Accordingly, it would be better to emphasize here that the authors had actually shown syntenic alignments with related species in their suppl. Figure 5 of this paper.

Thank you, we have removed the Southern blot reference and emphasized the synteny result.

5. should also add: "functionally characterized"

We have made the suggested change to the text.

6. No, it is not really the "evolutionary distance", but a predetermined phylogeny that is used (there has been some confusion about this, since some people got the impression that the BLAST scores were used for constructing the phylogeny, but this is not what is done in phylostratigraphy).

Thank you for pointing this out, we have rewritten this portion to make it more accurate.

7. Note that the use of the term "orphan" is problematic, since it always requires the naming of the lineages that are considered as comparison. As discussed in ref 63, this implies that all genes that cannot be traced to LUCA are orphan genes to some extent.

While we agree with the problems associated with this term in general, for the purpose of this review we have defined the term “orphan” as synonymous with “species-specific” along with the caveat that it is dependent on the group of species being examined.

8. Strictly speaking it is the "status whether it is an orphan gene or not".

We have changed the text to reflect this.

9. replace "are often" with "can be", since most can actually be detected

We have made the requested change.

10. better: "was claimed to be detected"

We have made the requested change.

11. I think it would be important to make a clear distinction here between "de novo evolution" and "orphan genes". All the critique about BLAST artefacts relates to very old events, while the demonstration of true de novo evolution should always include a synteny step with ancestral non-coding regions, i.e. this excludes all the old events anyway. Hence, the "de novo evolution" question should be disentangled from the question of the detectability of the oldest ancestor of a gene.

12. again, given that this review is about de novo evolution, not about phylostratigraphy statistics, one should be careful to not intermix these questions

We have rewritten the phylostratigraphy section to address criticisms 11 and 12, making clear that phylostratigraphy is used for determining a gene’s age but not its mechanism of origin.

13. note that this issue was carefully discussed in ref 63

We have added the recommended reference.

14. I would generally recommend not citing bioRxiv papers in such articles. A published paper that looks at the tissue distribution (among other aspects) is: Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol. 2018 Aug 3;18(1):121. doi: 10.1186/s12862-018-1232-z. PMID: 30075701

We have added the recommended reference. With permission from the journal editor, we have however decided to keep the few existing bioRxiv references, one of which was suggested by another reviewer.

15. Note that at least in vertebrates there is an issue with the adaptive immune system in de novo gene emergence, as discussed in: Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol. 2018 Aug 3;18(1):121. doi: 10.1186/

We have added this reference and discussion of the findings to the “Features of de novo genes” and “Out of Testes” sections.

16. this point has been made by many authors - why were these two citations selected?

We have added two additional review citations that make this same point.

Second Review by Joanna Masel

Major comments:

Table 1 evidence for selection is often unclear - what kind of evidence is referred to? Eg, for positive selection, is it Ka/Ks>1, and if so, is it for the entire sequence, a portion of the sequence, a concatenation of all the sequences in some class, or something else such as coding scores? It is also important to note that there are serious issues with attempting to use Ka/Ks in orphan genes, which the main text describes without comment as simply being "calculated from different strains of the focal species" (language that works for microbes such as yeast, but not for all species in the table). When polymorphism data is used, that is normally referred to as pN/pS rather than dN/dS or Ka/Ks. dN/dS and pN/pS are not expected to be equal (this is the basis of the McDonald-Kreitman test), and deviations from 1 are more difficult to pick up with polymorphism data. Importantly, there is essentially no chance of ever picking up a signal for individual candidates, only for collections of sequences as a whole, something that it is important to note. Instead, the text refers to "the sequence in question", giving the false impression that pN/pS signals are being found specific to individual candidates. I think it is also worth noting that because of this problem, it is far easier to establish selection for de novo TRGs than it is for de novo orphan genes.
Point #8 of my previous review has not been satisfactorily addressed. The text still reads "an acceleration in the rate of evolution could be indicative of either pseudogenization or of positive (aka Darwinian) selection for a novel, adaptive trait", but such an acceleration will never be seen during the process of pseudogenization, because a premature stop codon is expected prior to accumulating enough changes to be able to detect such an acceleration. Indeed, references throughout the article to conservation all still seem to point to dN/dS or pN/pS (as best I can tell - see comment above), and there is still no mention of conservation of the boundaries of the ORF as a metric. Similarly, the text still reads "if it is deleterious..., gene death may be imminent", which fails to take into account that gene death is also fairly imminent given mere neutrality.
There was no response to my previous comment, repeated here: "The section "Features of de novo genes" also has NPOV issues. Length and evolutionary rate are susceptible to homology detection bias - this caveat should be mentioned. The authors' finding of low ISD in young yeast genes in ref 60 is given without caveat, even though ref 97 analyzed the exact same data to show that if non-genes are excluded, young yeast genes actually have high ISD, consistent with all evidence in the literature not using the phylostratigraphy of ref 60 and hence not biased by its tendency to include non-functional genes. Lower expression is also susceptible to the inclusion of non-genes." The relevant refs are now 74 and 126.
Re point 13, the language "suggestive of translation" is still too strong. In ref 117, we analyzed exactly the same 2009 data as you did in the cited ref 74. Manually inspecting ribohits to all of the transcripts, we found only a single ORF (RDT1) among them that really had evidence for translation. In wikipedia terms, when two papers analyzed the identical data and reached different conclusions, you should present either both perspectives or neither in order to conform to NPOV. The literature record not only of ref 117 vs. 74, but also of subsequent work with more stringent criteria for translation than a single ribohit to an ORF (like ref 72, which used better data to produce more candidates for translation than ref 117 but fewer than ref 74), shows that there is no scientific consensus that ribosomal association alone is enough to suggest translation. More accurate would be "suggesting the opportunity for translation", and let ref 72 give the much lower number actually likely to be translated.
"When translated, non-genic ORFs become “proto-genes,” with features intermediate between genes and non-genes." should be modified to say that "The proto-gene model proposes that...". Ref 126 points out that the observation of a smooth relationship of mean properties with conservation level does not prove intermediate features along a continuum, because this empirical observation is also expected under a model of distinct genes and non-genes in variable proportions as a function of age. A NPOV should therefore only refer to the proto-gene model as a model, not as something that has been shown by that data.
"young genes have higher ISD than old genes, while random non-genic sequences tend to show the lowest levels of ISD[126], although the observed trend may have resulted from a subset of young genes derived by overprinting[73] and was not observed over shorter timescales". Higher ISD in young genes was also seen among overlapping gene pairs (http://www.genetics.org/content/210/1/303), i.e. while the elevation may be partly due to overprinting, the evidence is that it is not entirely so. In mouse, high ISD has now been confirmed on shorter timescales, specifically genes shared by Mus musculus and Mus pahari (http://www.genetics.org/content/early/2019/01/28/genetics.118.301719). In the light of this published evidence, the statement here does not conform to NPOV.

Minor comments:

Table 1 "Ant-restricted orthologs" If they are restricted to ants, what are they orthologous to?
Typo "translatomoic"

JoannaMasel (talk) 12:38, 15 March 2019 (PDT)

Response to Second Review by Joanna Masel

1. Table 1 evidence for selection is often unclear - what kind of evidence is referred to? Eg, for positive selection, is it Ka/Ks>1, and if so, is it for the entire sequence, a portion of the sequence, a concatenation of all the sequences in some class, or something else such as coding scores? It is also important to note that there are serious issues with attempting to use Ka/Ks in orphan genes, which the main text describes without comment as simply being "calculated from different strains of the focal species" (language that works for microbes such as yeast, but not for all species in the table). When polymorphism data is used, that is normally referred to as pN/pS rather than dN/dS or Ka/Ks. dN/dS and pN/pS are not expected to be equal (this is the basis of the McDonald-Kreitman test), and deviations from 1 are more difficult to pick up with polymorphism data. Importantly, there is essentially no chance of ever picking up a signal for individual candidates, only for collections of sequences as a whole, something that it is important to note. Instead, the text refers to "the sequence in question", giving the false impression that pN/pS signals are being found specific to individual candidates. I think it is also worth noting that because of this problem, it is far easier to establish selection for de novo TRGs than it is for de novo orphan genes.

We thank the reviewer for pointing out several issues related to insufficient clarity and detail in the discussion of selection. To address this, we have made several changes to the text:

-In the “evidence for selection” column of Table 1, more detail has been added for many of the studies. In each case for which evidence of selection exists, the table now states the specific evidence. Additionally, care has been taken to correctly indicate whether Ka/Ks or pN/pS was used.

-We now make greater note of the issues surrounding detection of selection in young genes, giving an example of de novo genes with physiological roles for which selection cannot be detected, and giving other examples of signatures of selection that may be used, including ORF boundaries.

-We use the term populations along with strains in order to generalize the assertions to all species.

-We note that detecting selection is very challenging for species-specific genes.
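As an illustration of the dN/dS versus pN/pS distinction discussed in this comment, the following is a minimal sketch of the McDonald-Kreitman comparison. It is not taken from the manuscript or from any cited study: the class `MKCounts`, the helper names, and all counts are hypothetical, and summing tables across candidates is a deliberate simplification (naive pooling of heterogeneous genes can bias the result). It only shows why a ratio that is undefined for a single short candidate can become computable for a collection of sequences.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MKCounts:
    """Hypothetical per-candidate counts for a McDonald-Kreitman table."""
    dn: int  # nonsynonymous fixed differences (divergence)
    ds: int  # synonymous fixed differences (divergence)
    pn: int  # nonsynonymous polymorphisms (within species)
    ps: int  # synonymous polymorphisms (within species)


def neutrality_index(c: MKCounts) -> Optional[float]:
    """NI = (Pn/Ps) / (Dn/Ds); None when any denominator is zero."""
    if c.ps == 0 or c.ds == 0 or c.dn == 0:
        return None
    return (c.pn / c.ps) / (c.dn / c.ds)


def pool(counts: Iterable[MKCounts]) -> MKCounts:
    """Sum the four cells across candidates (a simplifying assumption)."""
    counts = list(counts)
    return MKCounts(
        dn=sum(c.dn for c in counts),
        ds=sum(c.ds for c in counts),
        pn=sum(c.pn for c in counts),
        ps=sum(c.ps for c in counts),
    )


# Made-up counts for three short, young candidates.
candidates = [
    MKCounts(dn=1, ds=0, pn=0, ps=1),
    MKCounts(dn=0, ds=2, pn=1, ps=1),
    MKCounts(dn=3, ds=2, pn=1, ps=3),
]

per_candidate = [neutrality_index(c) for c in candidates]
pooled = neutrality_index(pool(candidates))
```

With these invented counts the first two candidates have no defined neutrality index at all, while the pooled table does, which is the sense in which a signal can be sought for a class of sequences but not for an individual candidate.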

2. Point #8 of my previous review has not been satisfactorily addressed. The text still reads "an acceleration in the rate of evolution could be indicative of either pseudogenization or of positive (aka Darwinian) selection for a novel, adaptive trait", but such an acceleration will never be seen during the process of pseudogenization, because a premature stop codon is expected prior to accumulating enough changes to be able to detect such an acceleration. Indeed, references throughout the article to conservation all still seem to point to dN/dS or pN/pS (as best I can tell - see comment above), and there is still no mention of conservation of the boundaries of the ORF as a metric. Similarly, the text still reads "if it is deleterious..., gene death may be imminent", which fails to take into account that gene death is also fairly imminent given mere neutrality.

We thank the reviewer for helping us further clarify language regarding this issue, and pointing out the problematic logic of the paper we had previously cited. We have removed this reference entirely, shortening the “Dynamics of de novo gene birth” section. Additionally, we now mention conservation of ORF boundaries as an important alternative metric of conservation, under the “Determination of de novo status” section, and thank the reviewer for pointing out this previous omission.

3. There was no response to my previous comment, repeated here: "The section "Features of de novo genes" also has NPOV issues. Length and evolutionary rate are susceptible to homology detection bias - this caveat should be mentioned. The authors' finding of low ISD in young yeast genes in ref 60 is given without caveat, even though ref 97 analyzed the exact same data to show that if non-genes are excluded, young yeast genes actually have high ISD, consistent with all evidence in the literature not using the phylostratigraphy of ref 60 and hence not biased by its tendency to include non-functional genes. Lower expression is also susceptible to the inclusion of non-genes." The relevant refs are now 74 and 126.

We sincerely apologize for this omission in our previous response. A discussion of how homology detection methods influence these types of inferences is already included in the section “genomic phylostratigraphy” of our manuscript. We now cross-reference this discussion in the section “features of de novo genes”, making abundantly clear that this issue is relevant to length, evolutionary rate and expression level.

We respectfully disagree with the reviewer's statement that “young yeast genes actually have high ISD, consistent with all evidence in the literature not using the phylostratigraphy of ref 60”. In addition to Carvunis 2012, the following papers also show low ISD in young S. cerevisiae ORFs: Ekman et al., JMB 2010; Basile et al., PLOS Comp Bio 2017; Vakirlis et al., MBE 2018; none of these papers used the phylostratigraphy of Carvunis 2012. In fact, Ekman 2010 predates Carvunis 2012, and Vakirlis 2018 uses a synteny approach. As far as we know, the re-analysis by the reviewer of the data generated by Carvunis 2012 is in fact the only publication to claim that young yeast genes have high ISD. “All the evidence in the literature” that the reviewer mentions pertains to various lineages, but not budding yeast (this is why we discuss this in a section entitled “lineage-dependent features”). Nevertheless, we now mention Fig. 6a of the manuscript by the reviewer at the end of the section to ensure that we cite this paper here too.

4. Re point 13, the language "suggestive of translation" is still too strong. In ref 117, we analyzed exactly the same 2009 data as you did in the cited ref 74. Manually inspecting ribohits to all of the transcripts, we found only a single ORF (RDT1) among them that really had evidence for translation. In wikipedia terms, when two papers analyzed the identical data and reached different conclusions, you should present either both perspectives or neither in order to conform to NPOV. The literature record not only of ref 117 vs. 74, but also of subsequent work with more stringent criteria for translation than a single ribohit to an ORF (like ref 72, which used better data to produce more candidates for translation than ref 117 but fewer than ref 74), shows that there is no scientific consensus that ribosomal association alone is enough to suggest translation. More accurate would be "suggesting the opportunity for translation", and let ref 72 give the much lower number actually likely to be translated.

In order to avoid getting into a technical discussion of Ribo-seq analysis in this section about models of gene birth, we have shortened the “proto-gene model” section and kept the discussion at a more theoretical level (also in keeping with Reviewer comment 5).

5. "When translated, non-genic ORFs become “proto-genes,” with features intermediate between genes and non-genes." should be modified to say that "The proto-gene model proposes that...". Ref 126 points out that the observation of a smooth relationship of mean properties with conservation level does not prove intermediate features along a continuum, because this empirical observation is also expected under a model of distinct genes and non-genes in variable proportions as a function of age. A NPOV should therefore only refer to the proto-gene model as a model, not as something that has been shown by that data.

We thank the reviewer for pointing out that there is not consensus on the proto-gene model, and that the above statement is indeed merely true according to the proto-gene model, and have made the requested change to the text.
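The statistical point in comment 5 can be made concrete with a small sketch. It is an illustration only, assuming two strictly distinct classes with hypothetical property values and hypothetical gene fractions per conservation level; none of the numbers come from ref 126 or from the manuscript.

```python
def mixture_mean(p_gene: float, gene_value: float, nongene_value: float) -> float:
    """Mean of a property in a mix of two distinct classes (no intermediates)."""
    return p_gene * gene_value + (1.0 - p_gene) * nongene_value


# Hypothetical fraction of true genes at increasing conservation levels.
gene_fraction_by_level = [0.1, 0.3, 0.5, 0.7, 0.9]

# Hypothetical property values: every sequence is either 0.0 or 1.0.
NONGENE_VALUE = 0.0
GENE_VALUE = 1.0

means = [
    mixture_mean(p, GENE_VALUE, NONGENE_VALUE) for p in gene_fraction_by_level
]
```

Under these assumptions the class means rise smoothly with conservation level even though no individual sequence has an intermediate value, so a smooth trend in means does not by itself distinguish a continuum from a mixture.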

6. "young genes have higher ISD than old genes, while random non-genic sequences tend to show the lowest levels of ISD[126], although the observed trend may have resulted from a subset of young genes derived by overprinting[73] and was not observed over shorter timescales". Higher ISD in young genes was also seen among overlapping gene pairs (http://www.genetics.org/content/210/1/303), i.e. while the elevation may be partly due to overprinting, the evidence is that it is not entirely so. In mouse, high ISD has now been confirmed on shorter timescales, specifically genes shared by Mus musculus and Mus pahari (http://www.genetics.org/content/early/2019/01/28/genetics.118.301719). In the light of this published evidence, the statement here does not conform to NPOV.

We thank the reviewer for pointing out these recent publications, and have updated the text accordingly.

7. Table 1 "Ant-restricted orthologs" If they are restricted to ants, what are they orthologous to?

This study looked at seven Formicidae (ant) species with recently sequenced genomes and found “that 1:1 orthologs that are restricted to Formicidae evolve 2–3 times faster than nonrestricted 1:1 orthologs, and that in some cases, such sequence divergence may be driven by positive selection.” For clarification purposes, we have changed the entry in the table to “Some Formicidae-restricted orthologs appear under positive selection.” We have also changed “ants” to “Formicidae” in the Homology Detection Method(s) column.

8. Typo "translatomoic"

Thank you, we have corrected the typo.