# WikiJournal of Medicine/Readability of English Wikipedia's health information over time


## Article information

Authors: Aleksandar Brezar[a], James Heilman[a][i]

Aleksandar Brezar; James Heilman (2019), "Readability of English Wikipedia's health information over time", WikiJournal of Medicine, 6 (1): 7, doi:10.15347/WJM/2019.007, ISSN 2002-4436, Wikidata Q75392964

## Abstract

Objective: To assess and compare the readability of the twenty-five most accessed English medical articles on Wikipedia 0, 1, 5 and 10 years ago.

Design: The twenty-five most accessed Wikipedia articles on diseases in August 2018 were identified for this study. The content of the lead paragraphs was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations. An online tool was then used to assign a score to the readability of each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE). A single reading grade (RG) was calculated for each passage by averaging scores from the FOG, SMOG and F-K tests to facilitate interpretation. These steps were repeated for the lead paragraph of the same medical articles as visible 1, 5 and 10 years ago on Wikipedia.

Main Outcome Measures: Readability grade (RG) and reading ease (FRE score)

Results: The average (mean) RG of the twenty-five most accessed Wikipedia articles on diseases in 2018 was 12.73 (95% CI = 12.07-13.38), and the average FRE score was 39.91 (95% CI = 36.09-43.74), a score considered “difficult”. The number of articles that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001), but not significantly different when compared to 2017. When paired by titles and compared over time, a statistically significant difference in readability (RG and FRE) was seen in 2018 when compared to earlier years: 2017 (Friedman Chi-squared=13.70, p=0.0002), 2013 (Friedman Chi-squared=46.08, p<0.0001) and 2008 (Friedman Chi-squared=33.03, p=0.0001). None of the pages were written at the 7th or 8th grade level as recommended by the U.S. National Institutes of Health (NIH).

Conclusions: The average readability of English Wikipedia’s medical pages has improved in 2018 when compared to previous years. Most of the health information, however, remains written at a level above the reading ability of average adults.

## Introduction

As one of the most popular websites on the Internet, Wikipedia is a frequently used source of health information by both health professionals and the lay public. Its English medical content was viewed more than 2.19 billion times in 2018, and more than 160 million times in the month of December 2018 alone.[1] Fifty to 70% of physicians report using Wikipedia as a source of health care information and it was the single most used resource by medical students (94%).[2][3][4] Although its content is often used by both healthcare professionals and patients, Wikipedia’s Manual of Style reminds authors that its target audience remains the general reader.[5]

Much of the health information available online is written at a level that is not accessible to people with low health literacy. Patients with inadequate health literacy, defined as the skills necessary to access, understand and use health information, have reported worse health status and less understanding about their medical conditions and treatment.[6] Health literacy has been found to be a stronger predictor of health than age, income, education level and racial or ethnic group.[6][7] The World Health Organization (WHO) considers health literacy critical for public empowerment, as lower literacy level has also been linked to a lower quality of life and higher mortality.[8][9][10]

This study aims to compare the readability of Wikipedia’s most popular health condition related articles over the years. Since a study comparing the readability of different online resources published in 2011 found that Wikipedia’s medical pages were among the hardest to read, efforts have been put into improving their accessibility, while remaining relevant to all of its readership.[7][17] While several studies have evaluated the readability of various specialty topics at a specific point in time, this is the first study to our knowledge to assess the readability of the most viewed Wikipedia articles on diseases over time.

## Methods

### Selection criteria

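A list of the top 1000 medical pages ordered by the number of views is available on Wikipedia and updated every month.[1] The twenty-five most viewed articles on medical conditions in August 2018 were selected from this list, and the versions of the same articles from 1, 5 and 10 years earlier were retrieved from each page's revision history. The lead paragraph of each page was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations, as recommended by previous authors.[7]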
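The formatting step can be sketched in a few lines of Python. This is a minimal illustration only: the function name `clean_sample` and every regular expression below are assumptions, since the article does not specify the exact rules that were applied. In particular, treating "remove decimals" as dropping the decimal point, and detecting abbreviations as runs of single letters followed by periods, are guesses.

```python
import re


def clean_sample(text: str) -> str:
    """Hypothetical cleaning of a lead paragraph before readability scoring."""
    # Remove bare URLs (assumed meaning of "hyperlinks" for plain text).
    text = re.sub(r"https?://\S+", "", text)
    # Drop decimal points between digits so they are not counted as sentence ends.
    text = re.sub(r"(?<=\d)\.(?=\d)", "", text)
    # Drop periods inside abbreviations such as "U.S." or "e.g." (assumed pattern).
    text = re.sub(
        r"\b(?:[A-Za-z]\.){2,}",
        lambda m: m.group(0).replace(".", ""),
        text,
    )
    # Remove colons and semicolons.
    text = re.sub(r"[:;]", "", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


sample = (
    "In the U.S., about 2.5 million people are affected; "
    "see https://example.org for details: more soon."
)
print(clean_sample(sample))
```

The sample sentence is invented for illustration. A side effect of this approach is that an abbreviation at the very end of a sentence also loses its final period, so a real pipeline would need to handle that case separately.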

The readability of each sample was assessed using multiple readability metrics to increase confidence in the test results. A readability score was assigned to each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE).[18][19][20][21]

${\displaystyle {\mbox{Gunning FOG}}=0.4\left[\left({\frac {\mbox{words}}{\mbox{sentences}}}\right)+100\left({\frac {\mbox{polysyllables}}{\mbox{words}}}\right)\right]}$

${\displaystyle {\mbox{F-K}}=0.39\left({\frac {\mbox{words}}{\mbox{sentences}}}\right)+11.8\left({\frac {\mbox{syllables}}{\mbox{words}}}\right)-15.59}$

${\displaystyle SMOG=1.043{\sqrt {30\times {\frac {\mbox{polysyllables}}{\mbox{sentences}}}}}+3.1291}$

${\displaystyle FRE=206.835-1.015\left({\frac {\mbox{words}}{\mbox{sentences}}}\right)-84.6\left({\frac {\mbox{syllables}}{\mbox{words}}}\right)}$

Gunning FOG, F-K and SMOG scores correspond to an academic grade level needed to understand the text easily on the first reading. For example, a Gunning FOG score of 8 indicates that a minimum of 8 years of education is necessary to easily understand the corresponding text. Scores above 12 correspond to a post-high school level.
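As a rough illustration of how the four formulae turn text counts into scores, the sketch below implements them as plain Python functions. The function names and the example counts are hypothetical and not taken from the study, which used an online tool rather than custom code. The Gunning FOG function uses the standard 0.4 scaling factor from Gunning's formula, which is what yields a grade-level value.

```python
import math


def gunning_fog(words: int, sentences: int, polysyllables: int) -> float:
    """Gunning FOG index; polysyllables are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (polysyllables / words))


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (F-K)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog(sentences: int, polysyllables: int) -> float:
    """Simple Measure of Gobbledygook (SMOG)."""
    return 1.043 * math.sqrt(30 * polysyllables / sentences) + 3.1291


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE); higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical counts for a 100-word passage (not data from the study).
counts = {"words": 100, "sentences": 5, "syllables": 170, "polysyllables": 15}

fog = gunning_fog(counts["words"], counts["sentences"], counts["polysyllables"])
fk = flesch_kincaid_grade(counts["words"], counts["sentences"], counts["syllables"])
smog_score = smog(counts["sentences"], counts["polysyllables"])
fre = flesch_reading_ease(counts["words"], counts["sentences"], counts["syllables"])

print(f"FOG={fog:.2f} F-K={fk:.2f} SMOG={smog_score:.2f} FRE={fre:.2f}")
```

With these made-up counts the three grade-level formulae land between roughly 12 and 14, and the FRE score falls near 43, which shows how the grade scores rise as the ease score falls.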

The FRE score reports a readability score from zero to one hundred, with higher scores indicating a more readable text. The FRE scores are categorized using the following scale: 91-100 (“very easy”); 81-90 (“easy”); 71-80 (“fairly easy”); 61-70 (“standard”); 51-60 (“fairly difficult”); 31-50 (“difficult”); 0-30 (“very difficult”) (Table 1). For example, a score of 70 would be appropriate for most adults, whereas a score of 50 would be a text that is difficult to read. A previously validated online tool was used for the analysis of each text sample (http://www.online-utility.org/).[7]
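The scale above can be expressed as a small lookup. This is a sketch under one stated assumption: the bands are given as whole-number ranges, so a fractional score such as 50.5 is assigned here to the band above the boundary it exceeds (a score counts as "fairly difficult" once it is greater than 50). The function name `fre_category` is hypothetical.

```python
def fre_category(score: float) -> str:
    """Map a Flesch Reading Ease score to its descriptive category."""
    # Each tuple is (exclusive lower bound, label), checked from easiest to hardest.
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score > lower_bound:
            return label
    return "very difficult"


for value in (70, 53.47, 39.91, 11.36):
    print(value, fre_category(value))
```

Under this reading of the boundaries, the labels produced for 53.47, 39.91 and 11.36 match the categories reported for those scores in Table 2.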

### Statistical analysis

The strength of association between the FOG and SMOG scores (r = 0.964), the FOG and F-K scores (r = 0.965), and the SMOG and F-K scores (r = 0.958) was calculated across the 100 samples analyzed using Spearman correlations. The strong correlation obtained for each pair of scores justified averaging them into a single reading grade (RG) to facilitate interpretation.

${\displaystyle RG={\frac {\mbox{F-K score+FOG score+SMOG score}}{\mbox{3}}}}$

Friedman tests were used to compare RG and FRE scores from 2018 against the same scores in 2017, 2013 and 2008. Since higher FRE scores indicate easier text whereas higher RG scores indicate harder text, the FRE scores were transformed as 100 – FRE score so that both measures point in the same direction. If the Friedman test was statistically significant, Wilcoxon rank sums tests (paired by titles) were then used for pairwise comparisons. Statistical analysis was performed using the R software package, version 3.5.1.[22] All testing was two-sided, with a significance level of 0.05.
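To make the two steps concrete, the sketch below shows the FRE transformation and a basic Friedman chi-squared statistic in plain Python. It is not the authors' analysis, which was run in R. The function names are hypothetical, the statistic is the textbook form without a correction for ties, no p-value is computed, and the example rows are invented rather than taken from the study.

```python
def to_difficulty(fre_score: float) -> float:
    """Flip FRE so that larger values mean harder text, like RG."""
    return 100.0 - fre_score


def within_block_ranks(row):
    """Rank one article's scores across years (1 = lowest), averaging ties."""
    ranks = []
    for value in row:
        lower = sum(1 for other in row if other < value)
        equal = sum(1 for other in row if other == value)
        ranks.append(lower + (equal + 1) / 2)
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-squared for n blocks (articles) by k conditions (years).

    Uses Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), where R_j is the
    sum of within-article ranks for year j. No tie correction is applied.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(within_block_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Invented reading grades for three articles in three years (oldest to newest).
example = [
    [14.2, 13.5, 12.9],
    [15.0, 14.1, 13.8],
    [13.3, 12.7, 12.0],
]
print(friedman_statistic(example))
```

In the invented example every article gets easier each year, so the ranks agree perfectly and the statistic reaches its maximum of n(k - 1) = 6 for three articles and three years.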

## Results

### Tables

Table 1 | Flesch Reading Ease interpretation. Adapted from various sources.[20][23]

| FRE score | Grade level | Description |
| --- | --- | --- |
| 91-100 | 5th grade | Very easy to read. Easily understood by an average 11-year-old student. |
| 81-90 | 6th grade | Easy to read |
| 71-80 | 7th grade | Fairly easy to read |
| 61-70 | 8th and 9th grade | Standard/plain language. Easily understood by 13- to 15-year-old students. |
| 51-60 | 10th to 12th grade (high school) | Fairly difficult to read |
| 31-50 | College | Difficult to read |
| 0-30 | College graduate | Very difficult to read. Best understood by university graduates. |

Table 2 | Reading grade and reading ease of the twenty-five most accessed Wikipedia medical pages in 2018.

| Article | Reading grade (RG) | FRE score (category) |
| --- | --- | --- |
| Crohn’s disease | 13.46 | 37.81 (difficult) |
| Hand, foot, and mouth disease | 9.84 | 53.47 (fairly difficult) |
| Borderline personality disorder | 12.28 | 39.86 (difficult) |
| Lyme disease | 12.44 | 45.53 (difficult) |
| Bipolar disorder | 13.98 | 35.07 (difficult) |
| Asperger syndrome | 15.14 | 25.59 (very difficult) |
| Pentasomy X | 11.59 | 41.64 (difficult) |
| Pneumonia | 13.37 | 33.80 (difficult) |
| Schizophrenia | 14.36 | 27.21 (very difficult) |
| Sepsis | 12.61 | 40.56 (difficult) |
| Tuberculosis | 11.11 | 51.72 (fairly difficult) |
| Fibromyalgia | 13.76 | 28.65 (very difficult) |
| Guillain-Barré syndrome | 12.53 | 42.44 (difficult) |
| HIV/AIDS | 13.10 | 40.43 (difficult) |
| Multiple sclerosis | 12.73 | 43.19 (difficult) |
| Pancreatic cancer | 13.22 | 38.60 (difficult) |
| Parkinson’s disease | 12.65 | 41.24 (difficult) |
| Rabies | 10.86 | 52.07 (fairly difficult) |
| Cholangiocarcinoma | 18.04 | 11.36 (very difficult) |
| Vitiligo | 11.07 | 44.93 (difficult) |
| Diabetes mellitus | 12.42 | 42.35 (difficult) |
| Shingles | 11.16 | 52.70 (fairly difficult) |
| Syphilis | 13.72 | 33.24 (difficult) |
| Dengue fever | 12.03 | 42.73 (difficult) |
| Marfan syndrome | 10.68 | 51.63 (fairly difficult) |
| Average (± SD) | 12.73 ± 1.68 | 39.91 ± 9.76 |
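The summary row can be checked directly from the reading grades listed in Table 2. The sketch below is an illustration, not the authors' R code: it computes the mean, the sample standard deviation, and a 95% confidence interval. The use of a normal-approximation multiplier of 1.96 is an assumption, since the article does not state how the interval was derived; the function name `summarize` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Reading grades (RG) of the 25 articles, in the order listed in Table 2.
rg_2018 = [
    13.46, 9.84, 12.28, 12.44, 13.98, 15.14, 11.59, 13.37, 14.36, 12.61,
    11.11, 13.76, 12.53, 13.10, 12.73, 13.22, 12.65, 10.86, 18.04, 11.07,
    12.42, 11.16, 13.72, 12.03, 10.68,
]


def summarize(values, z=1.96):
    """Return mean, sample SD and a normal-approximation confidence interval."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = z * sd / sqrt(len(values))
    return m, sd, (m - half_width, m + half_width)


rg_mean, rg_sd, (ci_low, ci_high) = summarize(rg_2018)
print(f"mean={rg_mean:.2f} sd={rg_sd:.2f} 95% CI={ci_low:.2f}-{ci_high:.2f}")
```

Under that assumption the output agrees with the reported average of 12.73 ± 1.68 and with the interval of 12.07-13.38 given in the abstract.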

### Figures

Figure (panels A and B) | … B) Flesch reading ease (right) in 2018 (n=25). A higher FRE score indicates a text easier to read.

Figure (panels A and B) | … B) Flesch reading ease (right) by year. A higher FRE score indicates a text easier to read.

Figure (panels A and B) | … B) Flesch reading ease (right) in 2008 (darkest color), 2013, 2017 and 2018 (lightest color), respectively. A higher FRE score indicates a text easier to read.

## Discussion

### Summary of results

Wikipedia is one of the most heavily visited sites on the Internet. A survey done in the United States showed that Wikipedia was far more popular among the well-educated and college-aged users than it was among those with lower education levels.[24] Some of the suggested reasons to explain its popularity include the free and easy access online, exhaustive information on a multitude of topics and a high position within the results provided by search engines.[24]

Wikipedia’s Manual of Style for medicine-related articles was updated in 2015 to strengthen its recommendations on using language that is easier to understand.[5] Its recommendations for authors include writing for the “general reader”, in plain English and as simply as possible without introducing errors. For example, a technical term should first be explained in plain language, followed by the technical term in parentheses. It also recommends using links within the text to refer readers to a different page for additional information if needed.

This study reports an average reading grade (RG) of 12.73 (95% CI = 12.07-13.38) and an average reading ease (FRE score) of 39.91 (95% CI = 36.09-43.74) for the twenty-five most accessed disease-related articles on Wikipedia in 2018. In other words, on average more than 12 years of education are necessary to easily understand the articles on the first reading. The results of this study suggest that the majority of the most accessed Wikipedia articles on diseases remain difficult to read for people with low literacy. None of the articles were written at or below the 7th or 8th grade reading level, which corresponds to a “standard” readability or easier.

However, this study also shows an overall improvement in the readability of Wikipedia’s pages over time. The number of pages that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001). For example, 22 of the 25 (88%) articles analyzed had a lower reading grade and higher FRE score (i.e. were easier to read) in 2018 when compared to the same article in 2008.

One hypothesis suggested for this improvement in the readability of disease-related articles is Wikipedia’s updated Manual of Style. Since its publication in 2015, it has reiterated Wikipedia’s target audience to be the general public and has emphasized the use of plain English. Since Wikipedia’s articles can be modified at any time and by anyone, it is difficult to ensure that all editors are aware of the updated guidelines. A suggestion to improve the overall readability of Wikipedia’s content could be to include an automated readability assessment once an article is submitted for publication.

## Conclusion

Wikipedia remains one of the most accessed websites on the Internet. Although the readability of Wikipedia’s disease-related articles remains overall fairly difficult, this study shows that a significant improvement has been made in the past few years.

## Authorship and acknowledgements

Both authors participated in the study design. The literature review was done with the help of Interior Health librarians. Data collection and manuscript preparation were done by A.B. Statistical analysis was done by Veronika Moravan, Applied Statistician (vmoravan@vmstats.ca). Both authors reviewed the manuscript.

## References

1. "Wikipedia:WikiProject Medicine/Popular pages". Wikipedia. Wikipedia. 8 January 2019. Retrieved 20 January 2019. From this list, the top 25 most viewed medical conditions in August 2018 were selected for ease of analysis. Since all content revisions are archived by Wikipedia, the content of the same pages 1, 5 and 10 years ago were obtained using the “view history” tab.
2. Allahwala, UK; Nadkarni, A; Sebaratnam, DF (April 2013). "Wikipedia use amongst medical students - new insights into the digital revolution.". Medical teacher 35 (4): 337. doi:10.3109/0142159X.2012.737064. PMID 23137251.
3. Heilman, JM; Kemmann, E; Bonert, M; Chatterjee, A; Ragar, B; Beards, GM; Iberri, DJ; Harvey, M et al. (31 January 2011). "Wikipedia: a key tool for global public health promotion.". Journal of medical Internet research 13 (1): e14. doi:10.2196/jmir.1589. PMID 21282098.
4. Hughes, B; Joshi, I; Lemonde, H; Wareham, J (October 2009). "Junior physician's use of Web 2.0 for information seeking and medical education: a qualitative study.". International journal of medical informatics 78 (10): 645-55. doi:10.1016/j.ijmedinf.2009.04.008. PMID 19501017.
5. "Wikipedia:Manual of Style/Medicine-related articles". Wikipedia. 4 June 2015. Retrieved 2 October 2019.
6. "Health literacy: report of the Council on Scientific Affairs. Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs, American Medical Association.". JAMA 281 (6): 552-7. 10 February 1999. PMID 10022112.
7. McInnes, N; Haglund, BJ (December 2011). "Readability of online health information: implications for health literacy.". Informatics for health & social care 36 (4): 173-89. doi:10.3109/17538157.2010.542529. PMID 21332302.
8. Rootman, I; Goron-El-Bihbety, D (2008). "A Vision for a Health Literate Canada Report of the Expert Panel on Health Literacy" (PDF). Canadian Public Health Association. Retrieved 6 April 2019.
9. Davis, TC; Williams, MV; Marin, E; Parker, RM; Glass, J (2001). "Health literacy and cancer communication.". CA: a cancer journal for clinicians 52 (3): 134-49. PMID 12018928.
10. Merriman, B; Ades, T; Seffrin, JR (2001). "Health literacy in the information age: communicating cancer information to patients and families.". CA: a cancer journal for clinicians 52 (3): 130-3. PMID 12018927.
11. Brigo, F; Erro, R (June 2015). "The readability of the English Wikipedia article on Parkinson's disease.". Neurological sciences : official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology 36 (6): 1045-6. doi:10.1007/s10072-015-2077-5. PMID 25596713.
12. Koo, M (2014). "Complementary and alternative medicine on wikipedia: opportunities for improvement.". Evidence-based complementary and alternative medicine : eCAM 2014: 105186. doi:10.1155/2014/105186. PMID 24864148.
13. Modiri, O; Guha, D; Alotaibi, NM; Ibrahim, GM; Lipsman, N; Fallah, A (March 2018). "Readability and quality of wikipedia pages on neurosurgical topics.". Clinical neurology and neurosurgery 166: 66-70. doi:10.1016/j.clineuro.2018.01.021. PMID 29408776.
14. Phillips, J; Lam, C; Palmisano, L (2013). "Analysis of the accuracy and readability of herbal supplement information on Wikipedia.". Journal of the American Pharmacists Association : JAPhA 54 (4): 406-14. doi:10.1331/JAPhA.2014.13181. PMID 25063262.
15. Watad, A; Bragazzi, NL; Brigo, F; Sharif, K; Amital, H; McGonagle, D; Shoenfeld, Y; Adawi, M (18 July 2017). "Readability of Wikipedia Pages on Autoimmune Disorders: Systematic Quantitative Assessment.". Journal of medical Internet research 19 (7): e260. doi:10.2196/jmir.8225. PMID 28720555.
16. Santos, PJF; Daar, DA; Paydar, KZ; Wirth, GA (January 2018). "Readability of Online Materials for Rhinoplasty.". World journal of plastic surgery 7 (1): 89-96. PMID 29651397.
17. Shafee, T; Masukume, G; Kipersztok, L; Das, D; Häggström, M; Heilman, J (November 2017). "Evolution of Wikipedia's medical content: past, present and future.". Journal of epidemiology and community health 71 (11): 1122-1129. doi:10.1136/jech-2016-208601. PMID 28847845.
18. Kincaid, J; Fishburne, R; Rogers, R; Chissom, B (1 January 1975). Derivation Of New Readability Formulas (Automated Readability Index, Fog Count And Flesch Reading Ease Formula) For Navy Enlisted Personnel. Retrieved 6 April 2019.
19. McLaughlin, H (1969). "SMOG grading - a new readability formula". Journal of Reading: 639-646.
20. Flesch, R (12 July 2016). "Guide to Academic Writing Article - Management - University of Canterbury - New Zealand". web.archive.org. Retrieved 6 April 2019.
21. Gunning, Robert (1968). The technique of clear writing (Revis ed.). McGraw-Hill. ISBN 978-0070252066.
22. Team RDC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing 2018.
23. "Flesch–Kincaid readability tests". Wikipedia. 20 April 2019. Retrieved 8 May 2019.
24. Lee Rainie, B (24 April 2007). "Wikipedia users". Pew Research Center. Retrieved 6 April 2019.
25. John, Ann M.; John, Elizabeth S.; Hansberry, David R.; Thomas, Prashant J.; Guo, Suqin (October 2015). "Analysis of online patient education materials in pediatric ophthalmology". Journal of American Association for Pediatric Ophthalmology and Strabismus 19 (5): 430–434. doi:10.1016/j.jaapos.2015.07.286.