Talk:WikiJournal of Science/Design effect
WikiJournal of Science
Open access • Publication charge free • Public peer review • Wikipedia-integrated
This article has been through public peer review.
It was adapted from the Wikipedia page Design_effect and contains some or all of that page's content licensed under a CC BY-SA license. Post-publication review comments or direct edits can be left at the version as it appears on Wikipedia.
DOI: 10.15347/WJS/2024.004
QID: Q116768211
Suggested citation format:
Tal Galili (5 May 2024). "Design effect". WikiJournal of Science 7 (1): 4. doi:10.15347/WJS/2024.004. Wikidata Q116768211. ISSN 2470-6345. https://upload.wikimedia.org/wikiversity/en/6/6d/Design_Effects.pdf.
Wikipedia: This work is adapted from the Wikipedia article Design effect (CC BY-SA). Content has also subsequently been used to update that same Wikipedia article Design effect.
License: This is an open access article distributed under the Creative Commons Attribution ShareAlike License, which permits unrestricted use, distribution, and reproduction, provided the original author and source are credited.
Editors: Alex O. Holcombe
Reviewers: Charles DiSogra, anonymous
Article information
Plagiarism check
Pass. Report from WMF copyvios tool detected only trivially short phrase overlap (e.g. "estimator for the variance of the weighted mean"). T.Shafee(Evo﹠Evo)talk 22:33, 8 February 2023 (UTC)
Editor notes
In the ''Common types of weights'' section, the term "reliability weights" probably needs a definition (was mentioned in the WP version). T.Shafee(Evo﹠Evo)talk 22:30, 8 February 2023 (UTC)
- Thanks for the comment T.Shafee(Evo﹠Evo).
- I've added a new reference for the definitions of types of weights (this one), and also decided to remove the term "reliability weights" from the article, because I couldn't find a good reference that defines it. For a discussion of this on Cross Validated (the statistics Stack Exchange site), see here. For the changes I've made, see here. Talgalili (discuss • contribs) 18:00, 2 September 2023 (UTC)
The preprint, as well as the current Wikipedia article, launches into technical language quite quickly, which isn’t ideal according to Wikipedia’s “Provide an accessible overview” guideline for what is called the lead section. Please attempt to provide a more accessible lead section. --Aoholcombe (discuss • contribs) 08:20, 4 June 2023 (UTC)
- Thank you Aoholcombe, I agree with your comment. I've made a new abstract, and updated the introduction accordingly (you can see the changes here) Talgalili (discuss • contribs) 08:03, 31 August 2023 (UTC)
- @Talgalili Second peer reviewer has submitted a second round of comments. Please view the PDF in this section and respond. Thanks. OhanaUnitedTalk page 05:44, 5 January 2024 (UTC)
First peer review
Review by Anonymous expert solicited by the handling editor ,
These assessment comments were submitted on , and refer to this previous version of the article
Comments were provided as a PDF and uploaded by the editor here: https://en.wikiversity.org/wiki/File:Comments_on_Design_effect_articleAnonymized.pdf
Item 1
Thanks, I'm happy to expand the article with your feedback and to add more sources beyond quotes from Kish.
Item 2
Thanks, good point. I've added a mention of "measure of interest" in the abstract. Also, the introduction includes a clear mention of how the estimator of interest is intertwined with the definition of the Deff.
Item 3
Thanks, good point. I've standardized the various notations of Deff across the article.
Item 4
Thanks, this is a great point. I've revised the section on Deft to discuss how the without-replacement aspect of the design might be ignored not just in the denominator but also in the numerator. I've added the examples you provided there.
Item 5
This sentence basically meant that if we computed Deff*var_SRS we would get the variance that includes all the complexities of the design. But after giving it some thought, this doesn't seem to add information or clarity beyond what's already written, so I've removed it. I did reference this briefly in the "uses" section, to indicate that the Deff is not likely to be used for building confidence intervals.
Item 6
This is good feedback. I've fixed the sentence so that it is very clear that the Deft cannot be generalized across statistics and measurements. I also moved it to a note (instead of a paragraph in the article). I kept it as a note because it's (IMHO) a worthwhile comment about the original hope for the Deft (to be generalizable) and about how follow-up work has not supported that aspiration.
Item 7
Agreed. I've removed the "put differently" paragraph. Instead, I've added a clearer example with a specific estimator and numbers. I also moved Kish's formula from this section to the section on Kish's design effect.
Item 8a
I've retitled the section to "Design effect depends on sampling design and statistical adjustments" and tried to improve the leading section to make it clearer. I also added a reference to "Introduction to Variance Estimation"
Item 8b
I've improved the section describing the impact of estimating sampling design aspect (e.g.: post stratification etc.).
Item 8c
Relating to the comment about "the Sources for unequal selection probabilities section" - thank you for the correction! I've fixed the text (to move to talk about using either SRS of selection of clusters in the first stage, or the second method which you provided). I also added the notations you've provided as a note in the text.
Item 9
Thanks, this is very helpful. I've fixed the text to make it clearer, and also added the example you've provided as a note in the text.
Item 10
To clarify I've added to the text: "Adjustments for non-coverage can lead to unequal survey weights.". If I understand correctly, non-coverage leads, by definition, to unequal probability of selection - since it means that the sampling frame has some items with some positive probabilities, and other items that have 0 probability of selection.
Item 11
Fair point. I've removed the term ad hoc from this section, and attempted to clarify the sentence further.
Item 12
Thanks. I've added a note with this clarification.
Item 13
This is great input. I've removed most of Kish's comments from the usage section, and moved some of his claims to the "History" section, where I mention some of Kish's original intent for Deff and how its applicability has diminished nowadays, now that we have more sophisticated software.
Item 14
Added citations to the software implementations you've mentioned (and also added citations to the ones already present)
Item 15
Thanks, I've added the relevant Deff papers.
Second peer review
Review by Charles DiSogra, Freelance Consultant
These assessment comments were submitted on , and refer to this previous version of the article
Accuracy
- Is anything incorrectly stated?
- No
Great, thanks.
- Do the references support the statements being made?
- Yes although I could not locate the 2006 reference for “Park and Lee” in the “Alternative definitions” section.
This is citation number 4, it also includes a link to a pdf of the paper.
- Are any important recent papers missed?
- Not to my knowledge
Great, thanks.
- Are any references out of date or obsolete?
- Not to my knowledge however, much of the work has been around for a while. Ask Dr. Raphael Nishimura at Univ. of Michigan (survey sampling statistician) for thoughts overall and most recent papers, if there are any.
Thanks. I sent him an email now, and hope to get his feedback soon. If so, I'll happily include it in the paper. TODO: give an update.
Balance
- Does it reflect current thinking in the field?
- Yes - Article is as much about weighting as it is about Deff
Great, thanks. In the future, it might make sense to split the article into several articles. But I think its current shape is a good framing.
- Is anything important missing (or cherry-picked)?
- There is no mention of replicate weighting.
Thanks. After giving this some thought, I've decided to mention them in the uses section, when discussing how the Deff is not likely to be used for building confidence intervals (there I mentioned that an alternative could be to use replication weights).
- No discussion using trimming methods to reduce Deff for analysis purposes.
Thanks. I added a mention of this to the "uses" section.
- Mention of Neyman’s optimal allocation requires a reference in section Unequal selection probabilities, number 1, first bullet.
Thanks. Done.
- Are viewpoints given due weight given the existing literature?
- I would think so.
Great, thanks.
- Are any conclusions/perspectives/outlook/opinions/originalresearch clearly indicated?
- Nothing indicated as original research but a decent tour of existing concepts.
Great, thanks.
Accessibility
- Is the language clear and unambiguous?
- Mostly. I did some editing for plurals and use of articles in sentence construction but these were minor. Use of “e.g.” seems like a lot but makes sense when there are examples to be mentioned.
Thanks for edits. I reviewed all of them, and agreed with all of them.
- Are any diagrams misleading or incomplete?
- n/a
NA
- Is the work written such that a knowledgable generalist can understand it?
- Absolutely not. Knowledge of statistics, especially sampling statistics is necessary.
Thanks, I agree. But I think it should be clear enough for reasonably statistically oriented readers.
- Is the abstract/lead understandable to a general audience?
- Somewhat. It doesn’t mention that the effective sample size is the size to be used when conducting statistical tests using weighted data.
Thanks, I agree. I've added a new abstract (lead), that should be accessible for the general public. It also mentions the use of the design effect in sample size determination (but without going into details, so to keep it "light" enough).
- 4th paragraph of intro change “quantifying the representative of a sample” to “quantifying the representativeness of a sample”.
Thanks, I see you've already fixed it.
- Does the lay summary (if included) capture the key points of the work while being understandable to a reader with only secondary school background?
- No. Reading level is 17.9 years, that would be >college.
Good point! In the abstract I added I simplified the language to adjust it to a reader with secondary level background.
Some other points:
- In text, “trace back” should be two words (as opposed to Python coding language which makes it all one word)
Thanks, I see you've already fixed it.
- Should “Formula” section be “Formulae”? … or “Formulas” if the Latin is to be ignored.
Thanks. Changed to "Formulas".
- First proof box under “Formula” needs to be lengthened to encompass the full length of the proof.
Thanks. Done.
Second round of comments
Review by Charles DiSogra, Freelance Consultant
These assessment comments were submitted on , and refer to this previous version of the article
In general - thanks a lot for your second round of feedback, it's helpful in making this manuscript better - much appreciated!
Item 1
Deff smaller than 1 occurs when using stratified sampling with a-priori knowledge (e.g.: stratum size, and variance within each stratum of the outcome of interest). This has the advantage of getting a sample with reduced variability, as compared with an SRS. I acknowledge that this is generally the rarer case for most practitioners, so I can add it to the text (although I don't have a citation for this statement, so I prefer not to state it in the abstract and to leave the abstract as is). I've added the following sentence to the text (second paragraph in the intro): "Intuitively we can get Deff < 1 when we have some a-priori knowledge we can exploit during the sampling process (which is somewhat rare). And, in contrast, we often get Deff > 1 when we need to compensate for some limitation in our ability to collect data (which is more common).". I also added the specific example of stratified sampling in which Deff is smaller than 1.
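To illustrate the stratified-sampling case mentioned in this response, here is a minimal sketch, not taken from the article. It assumes proportional allocation, ignores the finite population correction, and uses the large-population approximation that the overall variance is the within-stratum variance plus the between-stratum variance. The function name and the stratum numbers are hypothetical.

```python
def deff_proportional_stratified(strata):
    """Approximate Deff of the sample mean under proportional allocation.

    strata: list of (share, mean, variance) tuples, one per stratum, where
    the shares W_h sum to 1. Under the stated assumptions:
      Var_SRS(mean)   ~ (within + between) / n
      Var_strat(mean) ~ within / n
    so the ratio does not depend on the sample size n.
    """
    total_share = sum(share for share, _, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("stratum shares must sum to 1")
    overall_mean = sum(share * mean for share, mean, _ in strata)
    # Average variance inside the strata, weighted by stratum share.
    within = sum(share * var for share, _, var in strata)
    # Variance of the stratum means around the overall mean.
    between = sum(share * (mean - overall_mean) ** 2 for share, mean, _ in strata)
    return within / (within + between)


# Hypothetical population: two equally sized strata with different means.
different_means = [(0.5, 10.0, 4.0), (0.5, 20.0, 4.0)]
# Same strata, but with identical means: stratification gains nothing.
identical_means = [(0.5, 10.0, 4.0), (0.5, 10.0, 4.0)]

deff_gain = deff_proportional_stratified(different_means)
deff_no_gain = deff_proportional_stratified(identical_means)
print(deff_gain, deff_no_gain)
```

In this sketch the design effect falls below 1 only because the stratum means differ, which is the a-priori knowledge being exploited; when the means coincide the ratio is exactly 1.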
Item 2
Linked to Deft definition. Also fixed study->studies, strata->stratum.
Item 3
Non coverage could be corrected theoretically, but obviously not always. I've added more clarification on this (and mentioned that the weights might be “inadequate”).
Item 4
Fixed: pick up -> answer. Also fixed things around section “4. statistical adjustments”.
Item 5
Fixed: each strata -> each stratum. Regarding Ii, I clarified that it MAY be non-independent. It doesn't have to be, but the point I wanted to make is that EPSEM may or may not be independent (that it's about the marginal probability only).
Item 6
stratum -> strata. Thanks for keeping an eye on the correct use of strata/stratum and data/datum. I went through and tried my best to fix all discrepancies.
Item 7
"neff should correctly be Deff" - I think it should be Neff (N effective).
Item 8
Regarding Kish's Deff formula - I added clarification. The point of that section ("Assumptions and proofs") is to show that Kish's Deff can be derived from a model based perspective, since it's a relatively clean proof (and it's indeed added in the paper).
Item 9
Thanks for taking a look at the formulas for the two Deffs (Spencer and Lee).
Peer review from editorial board member with more mathematical expertise
Comment 0
- Hi, I am the editorial board member who had a readthrough. I am pro acceptance, but there were a few smaller mistakes/unclarities which should be fixed or clarified, see list below:
Response 0
Thanks a LOT for your review. I appreciate all the mistakes you caught, and fixed all of them. Thank you so much! Talgalili (discuss • contribs) 20:09, 26 March 2024 (UTC)
Comment 1
- “When Deff>1, then the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the data is even more accurate than a simple random sample.”
- I don’t think “accurate” is correct here, since the collected data in itself can’t be more or less accurate. It’s about the data in relation to the population. So I would suggest writing the following (which is more clunky, but also more precise)
- “As a result, an analyst cannot estimate a with replacement variance for the numerator even if desired. The standard workaround is to compute a variance estimator as if the PSUs were selected with replacement.“
- Is this correct, or should one of the “with replacement”s be “without replacement”?
Response 1
I agree that the word "accurate" is wrong here, since indeed it is not the data that is more or less accurate, but rather the inference made with it. I would rather not use your proposed alternative, since one of the requests in the template is that "the lay summary (if included) capture the key points of the work while being understandable to a reader with only secondary school background". So I worked to keep the text of the abstract at a relatively basic level.
Instead, I propose to change the text to be:
“When Deff>1, then inference from the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the inference is even more accurate than it would have been if a simple random sample was used.”
What do you think? Talgalili (discuss • contribs) 17:00, 24 March 2024 (UTC)
Comment 2
- The two paragraphs starting with “When the sampling design is not known upfront” and “When the sampling design isn’t set in advance” seem to be essentially duplicates. I don’t have a clear favourite, but it seems like the top version might be the more recent one, so maybe that one should stay.
Response 2
Great catch, thanks! I've removed the first version and kept the second one.
Comment 3
- The author gives a formula for Kish’s design effect as
- Deff = \frac{n \sum_{i=1}^n w_i^2}{(\sum_{i=1}^n w_i)^2}
- and then another version where both parts of the fraction are divided by n^2
- These are of course equivalent, but the two proofs that follow would be shorter and kind of nicer if the first version were used, instead of the second one.
Response 3
Fair point. In the definition, I'll keep both versions (as they are used interchangeably in the literature). However, I've now simplified both proofs so that they'll use the faster-to-get-to version of the formula. Thanks.
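The equivalence of the two forms of Kish's design effect discussed in Comment 3 can be checked numerically. The following is a minimal sketch rather than code from the article: the function names and the example weights are hypothetical, and the effective sample size is taken as n / Deff.

```python
def kish_deff(weights):
    """Kish's design effect: n * sum(w_i^2) / (sum(w_i))^2."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total == 0:
        raise ValueError("weights must be non-empty with a non-zero sum")
    return n * sum(w * w for w in weights) / total ** 2


def kish_deff_mean_form(weights):
    """Same quantity after dividing numerator and denominator by n^2:
    mean(w^2) / mean(w)^2."""
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    mean_w = sum(weights) / n
    mean_w_squared = sum(w * w for w in weights) / n
    return mean_w_squared / mean_w ** 2


def effective_sample_size(weights):
    """Effective sample size n / Deff, which equals (sum w)^2 / sum(w^2)."""
    return len(weights) / kish_deff(weights)


# Hypothetical survey weights, chosen only for illustration.
example_weights = [1.0, 1.0, 2.0, 4.0]

deff_sum_form = kish_deff(example_weights)
deff_mean_form = kish_deff_mean_form(example_weights)
n_eff = effective_sample_size(example_weights)
print(deff_sum_form, deff_mean_form, n_eff)
```

With equal weights both functions return 1, so the effective sample size equals the actual sample size; the more the weights vary, the larger the design effect and the smaller the effective sample size.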
Comment 4
- The proof in section “Assumptions and proofs” has little numbers on top of every equal sign; the numbers 6 to 11 are not needed
Response 4
Fair. I've simplified it further.
Comment 5
- First paragraph of section on Spencer’s Deff says “Each item has a probability of p_k (k from 1 to N) to be drawn in a single draw”. Should that not be “M” (the population size) instead of “N”? If it is indeed N, then it should be defined what N is.
Response 5
Good catch, thanks. I've moved the notation in the whole section to use n and N (instead of mixing it with m and M).
Comment 6
- Still in the section about Spencer’s Deff, it says that “Only if the variance of y is much larger than the mean then the right-most term is close to 0.” That is true, but the content of the parenthesis that follows is wrong. It’s 1/relvar(y) that will be approximately 0, not relvar(y).
Response 6
Good catch, thanks! Fixed.
Comment 7
- I did not read for grammar, but noticed a few plural apostrophes.
Response 7
Thanks, fixed a couple of these.
Comment 8
- In the section on Unequal selection probabilities x Cluster sampling, the brackets in the denominator should go outside the sum.
Response 8
Good catch! Fixed.
Peer review and editing timeline notes
On 8 Dec, in an email the anonymous reviewer explicitly granted a public domain license for their review above. They also said they had skimmed the revision and thought it looked ok.
On 7 Dec, in an email the reviewer Charles DiSogra explicitly granted a public domain license for his review above. He said he hoped to look at the revisions in the next week.
On 6 Dec, I sent some writing comments (below) to the author, who responded by making the appropriate changes.
Minor writing comments
Looks like the revision accidentally introduced some redundancy into the preprint, which now says “The term "Design effect" was coined by Leslie Kish in” in both the first and third paragraphs of the Introduction.
“it also matters if the design (e.g.: selection probabilities) are correlated with the outcome of interest”. “whether” is preferred over “if” in formal writing for this type of use of “if”. Also, there is a verb agreement problem, where “design” is singular but the verb (“are”) is plural. Also, I don’t fully understand the sentence, because “selection probabilities” hasn’t been introduced at this point in the article, and unfortunately it looks like it’s never explained properly even later, so I’m hoping you can fit a brief explanation of it in earlier.
“ a researcher might approximate the Deft with calculating the variance” - I think “by” calculating the variance is better.
“SRS with replacement (srswr)”. It’s highly unusual to not capitalize acronyms and, while I see that online some people don’t capitalize this acronym, I assume Wikipedia style is to capitalize all acronyms, so I think you should do that - similarly for srswor.
“Also, let it be combined with an estimator that rakes to totals for several demographic variables.” Did you really mean “rakes”? Only because I’m not familiar with that use of the word.
“some pairs of PSUs implying” - I think you should have a comma before “implying”.
“This is, in fact, the default choice in the software packages that will handle survey data—e.g., Stata, R survey package, and the SAS survey procedures”. I suggest you shorten this to “This is the default choice in software packages such as Stata, the R survey package, and the SAS survey procedures.”
I think that all of your headings that start with “Design effect” should probably say “The design effect”.
“For example, we might decide”. I think you can delete “for example” because the previous sentence already says that, plus the word “might” further implies it.
Where you wrote “enough of the bias”, can you reword it? Unfortunately there is no indication of the criterion for "enough"; maybe you can change it to “sufficient”.
At this point I started making some changes directly to the preprint, because I noticed you were happy when a reviewer did some of that. However, so far I haven’t gotten any further than the “Design effect depends on sampling design and statistical adjustments” section.
— Preceding unsigned comment added by Aoholcombe (talk • contribs) 22:53, 21 December 2023 (UTC)
After the author responded to the above and we both made some more minor edits to the pre-print, a second round of comments was received from Reviewer 2 the first week of January 2024. On 6 January 2024, I notified the author of them and asked him to respond.
In Feb / March 2024, the author responded to the comments of Reviewer 2 and he replied to say he was happy with the author's responses.
In late Mar 2024, an editorial board member, Mstefan, with more mathematical expertise, looked through the preprint and made a number of comments (above) that the author responded to.
Author further revision
Today I have finished reviewing the paper again. I fixed a typo and made tens of grammatical improvements. I also added four new summary tables for, hopefully, improved readability.
I'm ready for a final go-over by the relevant editors and (hopefully) an acceptance.
Talgalili (discuss • contribs) 18:02, 2 April 2024 (UTC)
Editor further comments
I made a few more wordsmithing edits after checking some of them with the author.
--Aoholcombe (discuss • contribs) 20:07, 10 April 2024 (UTC)