WikiJournal Preprints/Readability of Wikipedia's health information over time
Brezar, A; Heilman, J.
Design: The twenty-five most accessed Wikipedia articles on diseases in August 2018 were identified for this study. The content of the lead paragraphs was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations. An online tool was then used to assign a score to the readability of each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE). A single reading grade (RG) was calculated for each passage by averaging scores from the FOG, SMOG and F-K tests to facilitate interpretation. These steps were repeated for the lead paragraph of the same medical articles as visible 1, 5 and 10 years ago on Wikipedia.
Main Outcome Measures: Readability grade (RG) and reading ease (FRE score)
Results: The average (mean) RG of the twenty-five most accessed Wikipedia articles on diseases in 2018 was 12.73 (95% CI = 12.07-13.38), and the average FRE score was 39.91 (95% CI = 36.09-43.74), a score considered “difficult”. The number of articles that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001), but not significantly different when compared to 2017. When paired by titles and compared over time, a statistically significant difference in readability (RG and FRE) was seen in 2018 when compared to earlier years: 2017 (Friedman Chi-squared=13.70, p=0.0002), 2013 (Friedman Chi-squared=46.08, p<0.0001) and 2008 (Friedman Chi-squared=33.03, p=0.0001). None of the pages were written at the 7th or 8th grade level as recommended by the U.S. National Institutes of Health (NIH).
Conclusions: The average readability of Wikipedia’s medical pages has improved in 2018 when compared to previous years. Most of the health information, however, remains written at a level above the reading ability of average adults.
As one of the most popular websites on the Internet, Wikipedia is a frequently used source of health information by both health professionals and the lay public. Its English medical content was viewed more than 2.19 billion times in 2018, and more than 160 million times in the month of December 2018 alone. Fifty to 70% of physicians report using Wikipedia as a source of health care information and it was the single most used resource by medical students (94%). Although its content is often used by both healthcare professionals and patients, Wikipedia’s Manual of Style reminds authors that its target audience remains the general reader.
Much of the health information available online is written at a level that is not accessible to people with low health literacy. Patients with inadequate health literacy, defined as the skills necessary to access, understand and use health information, have reported worse health status and less understanding about their medical conditions and treatment. Health literacy has been found to be a stronger predictor of health than age, income, education level and racial or ethnic group. The World Health Organization (WHO) considers health literacy critical for public empowerment, as lower literacy level has also been linked to a lower quality of life and higher mortality.
In Canada, the average adult reading ability is between the 8th and 9th grade level. The U.S. National Institutes of Health (NIH) recommends that patient education materials should be written at the 7th or 8th grade level. Both in the United States and in Canada, about 25% of the adult population is considered to be functionally illiterate with reading abilities at or below 5th grade level. Wikipedia’s medical pages have been found in previous studies to have an average readability corresponding to a level of 12th grade and above.
This study aims to compare the readability of Wikipedia’s most popular articles on health conditions over time. Since a study comparing the readability of different online resources published in 2011 found that Wikipedia’s medical pages were among the hardest to read, efforts have been made to improve their accessibility while keeping them relevant to all of Wikipedia’s readership. While several studies have evaluated the readability of various specialty topics at a specific point in time, this is the first study to our knowledge to assess the readability of the most viewed Wikipedia articles on diseases over time.
A list of the top 1000 medical pages ordered by the number of views is available on Wikipedia and updated every month. The lead paragraph of each page was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations, as recommended by previous authors.
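The formatting step described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `clean_lead`, the abbreviation list, and the choice to delete decimal points (rather than whole numbers) are hypothetical, since the study does not specify how the formatting was carried out.

```python
import re

# Hypothetical, non-exhaustive list of abbreviations whose periods are dropped.
ABBREVIATIONS = ("e.g.", "i.e.", "U.S.", "Dr.", "vs.", "etc.")


def clean_lead(text: str) -> str:
    """Strip elements that can confuse sentence and syllable counters."""
    # Remove bare URLs (hyperlinks) before colons are stripped.
    text = re.sub(r"https?://\S+", "", text)
    # Remove periods inside known abbreviations, e.g. "U.S." -> "US".
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", ""))
    # Assumption: "removing decimals" means dropping the decimal point,
    # so that "12.5" is not counted as a sentence boundary.
    text = re.sub(r"(?<=\d)\.(?=\d)", "", text)
    # Remove colons and semicolons.
    text = re.sub(r"[:;]", "", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


sample = "In the U.S. about 12.5% of adults; see https://example.org/x for details: e.g. asthma."
cleaned = clean_lead(sample)
```

The cleaned text is meant only as input to a readability counter; numbers are distorted by design and the output should not be read as prose.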
The readability of each sample was assessed using multiple readability metrics to increase confidence in the test results. A readability score was assigned to each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE).
Gunning FOG, F-K and SMOG scores correspond to an academic grade level needed to understand the text easily on the first reading. For example, a Gunning FOG score of 8 indicates that a minimum of 8 years of education is necessary to easily understand the corresponding text. Scores above 12 correspond to a post-high school level.
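For reference, the three grade-level formulae can be written as a short Python sketch using their commonly published coefficients. The function and parameter names are illustrative, the word, sentence and syllable counts are assumed to be supplied by a separate counter, and the online tool used in this study may differ in how it counts syllables or complex words.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning FOG index; complex words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade; polysyllables are words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


# Illustrative counts only (not data from this study):
# 100 words, 5 sentences, 170 syllables, 20 words of 3+ syllables.
fk = flesch_kincaid_grade(100, 5, 170)
fog = gunning_fog(100, 5, 20)
smog_grade = smog(20, 5)
```

With these made-up counts, the three formulae give grade levels of roughly 12.3 (F-K), 16.0 (FOG) and 14.6 (SMOG), which illustrates why the individual scores for one passage can differ by several grades.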
The FRE score reports a readability score from zero to one hundred, with higher scores indicating a more readable text. The FRE scores are categorized using the following scale: 91-100 (“very easy”); 81-90 (“easy”); 71-80 (“fairly easy”); 61-70 (“standard”); 51-60 (“fairly difficult”); 31-50 (“difficult”); 0-30 (“very difficult”) (Table 1). For example, a score of 70 would be appropriate for most adults, whereas a score of 50 would be a text that is difficult to read. A previously validated online tool was used for the analysis of each text sample (http://www.online-utility.org/).
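The FRE formula and the interpretation scale above can be sketched as follows. The formula uses the commonly published Flesch coefficients; the band boundaries are those listed in this section. Rounding a score to the nearest integer before assigning a band is an assumption made here, because the scale is defined on whole numbers only, and the function names are illustrative.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Lower bound of each band, taken from the scale described in the text.
FRE_BANDS = [
    (91, "very easy"),
    (81, "easy"),
    (71, "fairly easy"),
    (61, "standard"),
    (51, "fairly difficult"),
    (31, "difficult"),
    (0, "very difficult"),
]


def fre_category(score: float) -> str:
    """Map an FRE score to its label on the scale above."""
    # Assumption: scores are rounded to the nearest integer before banding.
    rounded = round(score)
    for lower_bound, label in FRE_BANDS:
        if rounded >= lower_bound:
            return label
    # Scores below zero are possible for very dense text.
    return "very difficult"


# Illustrative counts only: 100 words, 5 sentences, 170 syllables.
example_fre = flesch_reading_ease(100, 5, 170)
```

Applied to the mean FRE score reported in this study, `fre_category(39.91)` returns "difficult".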
The strength of association between the FOG and SMOG scores (r = 0.964), the FOG and F-K scores (r = 0.965) and the SMOG and F-K scores (r = 0.958) was calculated across the 100 samples analyzed using Spearman correlations. The strong correlations obtained for each pair of scores justified the use of an averaged single reading grade (RG) to facilitate interpretation.
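The averaging step and the rank correlation can be illustrated with a short, dependency-free Python sketch. The study's own analysis was run in R; the helper names below are hypothetical and the sample values are made up for illustration.

```python
from statistics import mean


def reading_grade(fog: float, smog: float, fk: float) -> float:
    """Single reading grade (RG): the mean of the three grade-level scores."""
    return mean([fog, smog, fk])


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either input is constant.
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for four passages, for illustration only.
fog_scores = [16.0, 13.2, 11.5, 18.4]
smog_scores = [14.6, 12.9, 11.0, 16.1]
rho = spearman(fog_scores, smog_scores)
rg = reading_grade(16.0, 14.55, 12.27)
```

Because Spearman correlation depends only on rank order, a value near 1 indicates that the formulae order the passages similarly, even when their absolute grade levels differ.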
Friedman tests were used to compare RG and FRE scores from 2018 against the same scores in 2017, 2013 and 2008. Since higher FRE scores indicate text that is easier to read, whereas higher RG scores indicate text that is harder to read, the following transformation was used to allow comparison with RG scores: 100 – FRE score. Wilcoxon rank sum tests (paired by titles) were then used for pairwise comparisons if the Friedman test was statistically significant. Statistical analysis was performed using the R software package, version 3.5.1. All testing was two-sided and used a significance level of 0.05.
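The FRE transformation and the Friedman statistic can be sketched in plain Python as below. This is not the R code used in the study: the function names are hypothetical, the statistic is the textbook form without a correction for ties, and the arrangement of the data (one row per article, one column per condition) is an assumption. In practice a statistics library would normally be used, for example `scipy.stats.friedmanchisquare`, which requires at least three related samples, and `scipy.stats.wilcoxon` for paired comparisons.

```python
def invert_fre(fre: float) -> float:
    """Put FRE on the same direction as RG: higher means harder to read."""
    return 100 - fre


def average_ranks(values):
    """1-based ranks within one row; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_statistic(rows) -> float:
    """Friedman chi-squared (no tie correction).

    rows: one list per article, holding one score per condition
    (for example, one score per year), all rows of equal length.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Made-up RG values for three articles in three years, for illustration only.
example_rows = [
    [12.1, 13.4, 14.0],
    [11.8, 12.9, 15.2],
    [13.0, 13.1, 13.9],
]
q = friedman_statistic(example_rows)
```

The statistic is compared against a chi-squared distribution with k - 1 degrees of freedom, where k is the number of conditions, to obtain the p-value.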
The average RG of the twenty-five most accessed Wikipedia articles on diseases in 2018 was 12.73 (95% CI = 12.07-13.38), and the average reading ease (FRE score) was 39.91 (95% CI = 36.09-43.74), a score considered “difficult”. Table 2 shows the average readability grade and reading ease of each of the 25 pages included in this study. The distribution of these scores is shown in figure 1. Out of the 25 articles analyzed, 5 (20%) were “fairly difficult” to read, 16 (64%) “difficult” and 4 (16%) “very difficult” using the Flesch reading ease scoring system. The easiest article to read was on hand, foot and mouth disease (RG 9.84, FRE 53.47), whereas the hardest article to read was about cholangiocarcinoma (RG 18.04, FRE 11.36). The number of articles that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001), but not significantly different when compared to 2017 (figures 2 and 3). When paired by titles and compared over time, a statistically significant difference in readability (RG and FRE) was seen in 2018 when compared to earlier years: 2017 (Friedman Chi-squared=13.70, p=0.0002), 2013 (Friedman Chi-squared=46.08, p<0.0001) and 2008 (Friedman Chi-squared=33.03, p=0.0001).
|Table 1 | Flesch Reading Ease interpretation. Adapted from various sources.|
|Table 2 | Reading grade and reading ease of the twenty-five most accessed Wikipedia medical pages in 2018.|
Summary of results
Wikipedia is one of the most heavily visited sites on the Internet. A survey done in the United States showed that Wikipedia was far more popular among the well-educated and college-aged users than it was among those with lower education levels. Some of the suggested reasons to explain its popularity include the free and easy access online, exhaustive information on a multitude of topics and a high position within the results provided by search engines.
Prior studies have evaluated the readability of a variety of health-related articles on Wikipedia. In 2011, McInnes et al. found that the average readability of Wikipedia’s pages pertaining to thirteen different causes of burden and mortality was 15.21 (95% CI 14.44-15.99), a readability level typical of university students. In 2015, John et al. found that Wikipedia pages on pediatric ophthalmology were written at an average grade level of 17.4. Interventional radiology materials on Wikipedia were found to be written at a grade level of 15.0 (95% CI 13.9-16.1) in 2015. Modiri et al. found that the mean Flesch Reading Ease (FRE) score of neurosurgical articles on Wikipedia was 31.10, a score considered “difficult” and corresponding to a university level of education. Wikipedia pages on Parkinson’s disease were also found to have a low readability level (FRE 30.21).
Wikipedia’s Manual of Style for medicine-related articles was updated in 2015 to strengthen the recommendations around using easier-to-understand language. Its recommendations for authors include writing for the “general reader”, in plain English and as simply as possible without introducing errors. For example, technical terms are to first be explained in plain language, followed by the technical term in parentheses. It is also recommended to use links within the text to refer readers to a different page for additional information if needed.
This study reports an average readability of the twenty-five most accessed disease-related articles on Wikipedia of 12.73 (95% CI = 12.07-13.38) and an average reading ease (FRE score) of 39.91 (95% CI = 36.09-43.74) in 2018. In other words, on average 12 years of education are necessary to easily understand the articles on the first reading in 2018. The results of this study suggest that the majority of the most accessed Wikipedia articles on diseases remain difficult to read for people with low literacy. None of the articles were written at or below the 7th or 8th grade reading level, corresponding to an “easy” or “very easy” readability.
However, this study also shows an overall improvement in the readability of Wikipedia’s pages over time. The number of pages that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001). For example, 22 of the 25 (88%) articles analyzed had a lower reading grade and higher FRE score (i.e. easier to read) in 2018 when compared to the same article in 2008.
One hypothesis suggested for this improvement in the readability of disease-related articles is Wikipedia’s updated Manual of Style. Since its publication in 2015, it has reiterated Wikipedia’s target audience to be the general public and has emphasized the use of plain English. Since Wikipedia’s articles can be modified at any time and by anyone, it is difficult to ensure that all editors are aware of the updated guidelines. A suggestion to improve the overall readability of Wikipedia’s content could be to include an automated readability assessment once an article is submitted for publication.
Strengths and limitations
This is the first study to our knowledge to assess the readability of the most viewed disease-related articles on Wikipedia over time. Instead of focusing on articles related to a specific medical field, this study assessed the readability of the most viewed articles pertaining to a variety of medical conditions. Although analyzing the readability of a text has a long tradition in the literature, some authors have suggested that other factors such as the page display, navigation menus and hyperlinks should be evaluated when assessing the readability of online documents. Wikipedia articles regularly link key words or important concepts to a different Wikipedia page providing more information to aid readers’ understanding. Furthermore, this study was based on the readability of the lead paragraphs, which could underestimate the reading difficulty of the full article, since some studies have shown that the last paragraphs tend to be the hardest to read.
Wikipedia remains one of the most accessed websites on the Internet. Although the readability of Wikipedia’s disease-related articles remains overall fairly difficult, this study shows that a significant improvement has been made in the past few years.
Authorship and acknowledgements
Both authors participated in the study design. The literature review was done with the help of Interior Health librarians. Data collection and manuscript preparation were done by A.B. Statistical analysis was done by Veronika Moravan, Applied Statistician (email@example.com). Both authors reviewed the manuscript.
- "Wikipedia:WikiProject Medicine/Popular pages". Wikipedia. Wikipedia. 8 January 2019. Retrieved 20 January 2019.
From this list, the top 25 most viewed medical conditions in August 2018 were selected for ease of analysis. Since all content revisions are archived by Wikipedia, the content of the same pages 1, 5 and 10 years ago were obtained using the “view history” tab.
- Allahwala, UK; Nadkarni, A; Sebaratnam, DF (April 2013). "Wikipedia use amongst medical students - new insights into the digital revolution.". Medical teacher 35 (4): 337. doi:10.3109/0142159X.2012.737064. PMID 23137251.
- Heilman, JM; Kemmann, E; Bonert, M; Chatterjee, A; Ragar, B; Beards, GM; Iberri, DJ; Harvey, M et al. (31 January 2011). "Wikipedia: a key tool for global public health promotion.". Journal of medical Internet research 13 (1): e14. doi:10.2196/jmir.1589. PMID 21282098.
- Hughes, B; Joshi, I; Lemonde, H; Wareham, J (October 2009). "Junior physician's use of Web 2.0 for information seeking and medical education: a qualitative study.". International journal of medical informatics 78 (10): 645-55. doi:10.1016/j.ijmedinf.2009.04.008. PMID 19501017.
- "Wikipedia:Manual of Style/Medicine-related articles". Wikipedia. 4 June 2015. Retrieved 2 October 2019.
- "Health literacy: report of the Council on Scientific Affairs. Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs, American Medical Association.". JAMA 281 (6): 552-7. 10 February 1999. PMID 10022112.
- McInnes, N; Haglund, BJ (December 2011). "Readability of online health information: implications for health literacy.". Informatics for health & social care 36 (4): 173-89. doi:10.3109/17538157.2010.542529. PMID 21332302.
- Rootman, I; Goron-El-Bihbety, D (2008). "A Vision for a Health Literate Canada Report of the Expert Panel on Health Literacy" (PDF). Canadian Public Health Association. Retrieved 6 April 2019.
- Davis, TC; Williams, MV; Marin, E; Parker, RM; Glass, J (2001). "Health literacy and cancer communication.". CA: a cancer journal for clinicians 52 (3): 134-49. PMID 12018928.
- Merriman, B; Ades, T; Seffrin, JR (2001). "Health literacy in the information age: communicating cancer information to patients and families.". CA: a cancer journal for clinicians 52 (3): 130-3. PMID 12018927.
- Brigo, F; Erro, R (June 2015). "The readability of the English Wikipedia article on Parkinson's disease.". Neurological sciences : official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology 36 (6): 1045-6. doi:10.1007/s10072-015-2077-5. PMID 25596713.
- Koo, M (2014). "Complementary and alternative medicine on wikipedia: opportunities for improvement.". Evidence-based complementary and alternative medicine : eCAM 2014: 105186. doi:10.1155/2014/105186. PMID 24864148.
- Modiri, O; Guha, D; Alotaibi, NM; Ibrahim, GM; Lipsman, N; Fallah, A (March 2018). "Readability and quality of wikipedia pages on neurosurgical topics.". Clinical neurology and neurosurgery 166: 66-70. doi:10.1016/j.clineuro.2018.01.021. PMID 29408776.
- Phillips, J; Lam, C; Palmisano, L (2013). "Analysis of the accuracy and readability of herbal supplement information on Wikipedia.". Journal of the American Pharmacists Association : JAPhA 54 (4): 406-14. doi:10.1331/JAPhA.2014.13181. PMID 25063262.
- Watad, A; Bragazzi, NL; Brigo, F; Sharif, K; Amital, H; McGonagle, D; Shoenfeld, Y; Adawi, M (18 July 2017). "Readability of Wikipedia Pages on Autoimmune Disorders: Systematic Quantitative Assessment.". Journal of medical Internet research 19 (7): e260. doi:10.2196/jmir.8225. PMID 28720555.
- Santos, PJF; Daar, DA; Paydar, KZ; Wirth, GA (January 2018). "Readability of Online Materials for Rhinoplasty.". World journal of plastic surgery 7 (1): 89-96. PMID 29651397.
- Shafee, T; Masukume, G; Kipersztok, L; Das, D; Häggström, M; Heilman, J (November 2017). "Evolution of Wikipedia's medical content: past, present and future.". Journal of epidemiology and community health 71 (11): 1122-1129. doi:10.1136/jech-2016-208601. PMID 28847845.
- Kincaid, J; Fishburne, R; Rogers, R; Chissom, B (1 January 1975). Derivation Of New Readability Formulas (Automated Readability Index, Fog Count And Flesch Reading Ease Formula) For Navy Enlisted Personnel. Retrieved 6 April 2019.
- McLaughlin, H (1969). "SMOG grading - a new readability formula". Journal of Reading: 639-646.
- Flesch, R (12 July 2016). "Guide to Academic Writing Article - Management - University of Canterbury - New Zealand". web.archive.org. Retrieved 6 April 2019.
- Gunning, Robert (1968). The technique of clear writing (Revised ed.). McGraw-Hill. ISBN 978-0070252066.
- Team RDC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing 2018.
- "Flesch–Kincaid readability tests". Wikipedia. 20 April 2019. Retrieved 8 May 2019.
- Lee Rainie, B (24 April 2007). "Wikipedia users". Pew Research Center. Retrieved 6 April 2019.
- John, Ann M.; John, Elizabeth S.; Hansberry, David R.; Thomas, Prashant J.; Guo, Suqin (October 2015). "Analysis of online patient education materials in pediatric ophthalmology". Journal of American Association for Pediatric Ophthalmology and Strabismus 19 (5): 430–434. doi:10.1016/j.jaapos.2015.07.286.