WikiJournal Preprints/Readability of Wikipedia's Health Information Over Time
Brezar, A; Heilman, J.
Design: The twenty-five most accessed Wikipedia articles on diseases in August 2018 were identified for this study. The content of the lead paragraphs was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations. An online tool was then used to assign a score to the readability of each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE). A single reading grade (RG) was calculated for each passage by averaging scores from the FOG, SMOG and F-K tests to facilitate interpretation. These steps were repeated for the lead paragraph of the same medical articles as visible 1, 5 and 10 years ago on Wikipedia.
Main Outcome Measures: Readability grade (RG) and reading ease (FRE score)
Results: The average RG of the twenty-five most accessed Wikipedia articles on diseases in 2018 was 12.73 (95% CI = 12.07-13.38), and the mean FRE score was 39.91 (95% CI = 36.09-43.74), a score considered “difficult”. The number of articles that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001), but not significantly different when compared to 2017. When paired by titles and compared over time, a statistically significant difference in readability (RG and FRE) was seen in 2018 when compared to earlier years: 2017 (Friedman Chi-squared=13.70, p=0.0002), 2013 (Friedman Chi-squared=46.08, p<0.0001) and 2008 (Friedman Chi-squared=33.03, p=0.0001). None of the pages were written at the 7th or 8th grade level as recommended by the U.S. National Institutes of Health (NIH).
Conclusions: The average readability of Wikipedia’s medical pages has improved in 2018 when compared to previous years. Most of the health information, however, remains written at a level above the reading ability of average adults.
As one of the most popular websites on the Internet, Wikipedia is a frequently used source of health information by both health professionals and the lay public. Its English medical content was viewed more than 2.19 billion times in 2018, and more than 160 million times in the month of December 2018 alone. Fifty to 70% of physicians report using Wikipedia as a source of health care information and it was the single most used resource by medical students (94%).
Much of the health information available online is written at a level that is not accessible to people with low health literacy. Patients with inadequate health literacy, defined as the skills necessary to access, understand and use health information, have reported worse health status and less understanding about their medical conditions and treatment. Health literacy has been found to be a stronger predictor of health than age, income, education level and racial or ethnic group. The World Health Organization (WHO) considers health literacy critical for public empowerment, as lower literacy level has also been linked to a lower quality of life and higher mortality.
In Canada, the average adult reading ability is between the 8th and 9th grade level. The U.S. National Institutes of Health (NIH) recommends that patient education materials be written at the 7th or 8th grade level. In both the United States and Canada, about 25% of the adult population is considered functionally illiterate, with reading abilities at or below the 5th grade level. Previous studies have found that Wikipedia’s medical pages have an average readability corresponding to the 12th grade level and above. As patients commonly access health information online, it is important for health professionals to understand how health literacy affects patients’ understanding and to refer them to resources adapted to their level of comprehension.
This study aims to compare the readability of Wikipedia’s most popular health condition-related articles over the years. Since a 2011 study comparing the readability of different online resources found that Wikipedia’s medical pages were among the hardest to read, efforts have been made to improve their accessibility while keeping them relevant to all readers. While several studies have evaluated the readability of various specialty topics at a specific point in time, this is, to our knowledge, the first study to assess the readability of the most viewed Wikipedia articles on diseases over time.
A list of the top 1000 medical pages ordered by the number of views is available on Wikipedia and updated every month.
The lead paragraph of each page was formatted to remove any hyperlinks, decimals, colons, semicolons and periods used in abbreviations, as recommended by previous authors.
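The formatting step described above can be sketched in code. The study does not give the exact rules that were applied, so the helper below (`clean_lead_text`, a hypothetical name) rests on assumptions: hyperlinks are treated as bare URLs, a decimal point is any period between two digits, and an abbreviation period is any period that follows a single stand-alone letter (as in "U.S." or "e.g.").

```python
import re

def clean_lead_text(text: str) -> str:
    """Illustrative pre-processing only; not the authors' actual procedure."""
    # Remove bare URLs first, because they contain colons and periods.
    text = re.sub(r"https?://\S+", "", text)
    # Drop decimal points so "2.5" is not read as a sentence boundary.
    text = re.sub(r"(?<=\d)\.(?=\d)", "", text)
    # Remove colons and semicolons.
    text = re.sub(r"[:;]", "", text)
    # Assumed abbreviation rule: a period right after a single stand-alone letter.
    # Limitation: a sentence ending in a single letter ("vitamin C.") loses its period.
    text = re.sub(r"\b([A-Za-z])\.", r"\1", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

sample = "The U.S. rate was 2.5%: see https://example.org now; e.g. this."
print(clean_lead_text(sample))
```

Removing these characters matters because readability tools usually count sentences by their terminal punctuation, so a stray period inside a number or abbreviation would shorten the apparent sentence length.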
The readability of each sample was assessed using multiple readability metrics to increase confidence in the test results. A readability score was assigned to each text sample using the following formulae: Gunning FOG (Frequency of Gobbledygook) index, Flesch-Kincaid Grade Level (F-K), Simple Measure of Gobbledygook (SMOG) and Flesch Reading Ease (FRE).
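For reference, the four formulae can be written out from their standard published definitions. This is only a sketch: the study used an online tool whose rules for counting sentences, syllables and complex words are not described here, so the function names and the pre-computed counts passed to them are assumptions.

```python
import math

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher scores indicate easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Result is a U.S. school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # complex_words: words of three or more syllables.
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog_grade(sentences: int, polysyllables: int) -> float:
    # polysyllables: words of three or more syllables in the sample,
    # scaled to the 30-sentence sample the formula was designed for.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

# Hypothetical counts for a 100-word, 5-sentence passage.
print(flesch_reading_ease(100, 5, 150))
print(flesch_kincaid_grade(100, 5, 150))
print(gunning_fog(100, 5, 10))
```

All four formulae depend on sentence length and on some measure of word length, which helps explain why their outputs tend to move together.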
Gunning FOG, F-K and SMOG scores correspond to an academic grade level needed to understand the text easily on the first reading. For example, a Gunning FOG score of 8 indicates that a minimum of 8 years of education is necessary to easily understand the corresponding text. Scores above 12 correspond to a post-high school level.
The FRE score reports a readability score from zero to one hundred, with higher scores indicating a more readable text. The FRE scores are categorized using the following scale: 91-100 (“very easy”); 81-90 (“easy”); 71-80 (“fairly easy”); 61-70 (“standard”); 51-60 (“fairly difficult”); 31-50 (“difficult”); 0-30 (“very difficult”) (Table 1). For example, a score of about 70 would be appropriate for most adults, whereas a score of 50 would indicate a text that is difficult to read.
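The scale above can be expressed as a small lookup. The bands are stated for whole numbers, so how fractional scores such as 50.5 are assigned is an assumption here: each band's upper bound is treated as inclusive and anything above it falls into the next easier band. The function name `fre_category` is hypothetical.

```python
def fre_category(score: float) -> str:
    """Map a Flesch Reading Ease score to the category labels used in Table 1."""
    # (exclusive lower bound, label), from easiest to hardest.
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score > lower_bound:
            return label
    return "very difficult"

# Scores taken from Table 2.
for title, fre in [("Hand, foot, and mouth disease", 53.47),
                   ("Crohn's disease", 37.81),
                   ("Cholangiocarcinoma", 11.36)]:
    print(title, fre_category(fre))
```

Under this assumption the labels agree with the ease levels listed for those pages in Table 2.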
The strength of association between the FOG and SMOG scores (r = 0.964), FOG and F-K scores (r = 0.965) and SMOG and F-K (r = 0.958) scores was calculated for each of the 100 samples analyzed using Spearman correlations. The strong correlations obtained for each pair of scores justified the use of an averaged single reading grade (RG) to facilitate interpretation.
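A minimal sketch of the two steps described above, averaging the three grade-level scores into a single RG and checking their rank correlation, is given below. The analysis in the study was done in R; this pure-Python version and its sample values are illustrative assumptions, not the study data.

```python
from statistics import mean

def reading_grade(fog: float, smog: float, fk: float) -> float:
    # Single reading grade (RG): the plain average of the three grade-level scores.
    return mean([fog, smog, fk])

def average_ranks(values):
    # Rank from 1 upward; tied values share the average of their ranks.
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def spearman(x, y) -> float:
    # Spearman correlation = Pearson correlation computed on the ranks.
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for four text samples.
fog_scores = [10.0, 12.0, 14.0, 16.0]
smog_scores = [9.5, 11.0, 13.5, 15.0]
print(spearman(fog_scores, smog_scores))
print(reading_grade(14.0, 12.5, 11.7))
```

Averaging is only meaningful when the three scores order the samples in nearly the same way, which is what the high correlations are meant to establish.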
Friedman tests were used to compare RG and FRE scores from 2018 against the same scores in 2017, 2013 and 2008. Because higher FRE scores indicate easier text whereas higher RG scores indicate harder text, FRE scores were transformed as 100 – FRE score so that both measures point in the same direction. If the Friedman test was statistically significant, Wilcoxon rank sums tests (paired by titles) were then used for pairwise comparisons.
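The FRE transformation and the Friedman statistic can be illustrated as follows. This is a sketch under stated assumptions: the exact layout of the data passed to the test in R is not described, the statistic below omits the tie correction, and the example values are invented for illustration. Each row is one article (the block) and each column is one year (the repeated measure).

```python
def fre_to_difficulty(fre: float) -> float:
    # Flip the FRE scale so that, like RG, larger values mean harder text.
    return 100 - fre

def row_ranks(row):
    # Rank values within one article; ties receive the average rank.
    return [sum(v < x for v in row) + (sum(v == x for v in row) + 1) / 2
            for x in row]

def friedman_statistic(table) -> float:
    """Friedman chi-squared for n blocks (rows) by k conditions (columns), no tie correction."""
    n, k = len(table), len(table[0])
    ranked = [row_ranks(row) for row in table]
    # Sum of ranks for each condition (column).
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    return 12 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums) - 3 * n * (k + 1)

# Hypothetical reading grades for four articles in three years.
rg_by_year = [
    [14.0, 13.0, 12.0],
    [15.5, 14.0, 13.5],
    [13.0, 13.5, 12.5],
    [16.0, 15.0, 14.0],
]
print(friedman_statistic(rg_by_year))
print(fre_to_difficulty(39.91))
```

Because the test works on within-article ranks, it asks whether one year is consistently easier or harder across titles rather than comparing raw score differences.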
Statistical analysis was performed using the R software package, version 3.5.1. All testing was two-sided, and used a significance level of p = 0.05.
The average RG of the twenty-five most accessed Wikipedia articles on diseases in 2018 was 12.73 (95% CI = 12.07-13.38), and the mean reading ease (FRE score) was 39.91 (95% CI = 36.09-43.74), a score considered “difficult”.
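The reported intervals can be checked approximately against the summary values in Table 2 (mean ± SD, n = 25). The method used to compute the intervals is not stated, so the normal approximation below (mean ± 1.96 × SD/√n) is an assumption, and small differences from rounding of the inputs are expected.

```python
import math

def normal_ci(mean_value: float, sd: float, n: int, z: float = 1.96):
    # Two-sided 95% interval under a normal approximation (assumed method).
    half_width = z * sd / math.sqrt(n)
    return mean_value - half_width, mean_value + half_width

# Summary values from Table 2; n = 25 articles.
rg_low, rg_high = normal_ci(12.73, 1.68, 25)
fre_low, fre_high = normal_ci(39.91, 9.76, 25)
print(round(rg_low, 2), round(rg_high, 2))
print(round(fre_low, 2), round(fre_high, 2))
```

With these inputs the computed bounds fall within about 0.01 of the reported intervals, which is consistent with rounding of the published mean and SD.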
Table 2 shows the average readability grade and reading ease of each of the 25 pages included in this study. The distribution of these scores is shown in Figure 1. Out of the 25 articles analyzed, 5 (20%) were “fairly difficult” to read, 16 (64%) “difficult” and 4 (16%) “very difficult” using the Flesch reading ease scoring system. The easiest article to read was on hand, foot and mouth disease (RG 9.84, FRE 53.47), whereas the hardest article to read was about cholangiocarcinoma (RG 18.04, FRE 11.36).
The number of articles that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001), but not significantly different when compared to 2017 (Figures 2 and 3).
When paired by titles and compared over time, a statistically significant difference in readability (RG and FRE) was seen in 2018 when compared to earlier years: 2017 (Friedman Chi-squared=13.70, p=0.0002), 2013 (Friedman Chi-squared=46.08, p<0.0001) and 2008 (Friedman Chi-squared=33.03, p=0.0001).
|FRE Score||Grade level||Description|
|91-100||5th grade||Very easy to read. Easily understood by an average 11-year-old student.|
|81-90||6th grade||Easy to read|
|71-80||7th grade||Fairly easy|
|61-70||8th and 9th grade||Standard/plain language. Easily understood by 13- to 15-year-old students.|
|51-60||10th to 12th grade (high school)||Fairly difficult to read|
|31-50||College||Difficult to read|
|0-30||College graduate||Very difficult to read. Best understood by university graduates.|
|Page Titles||Reading Grade||Reading Ease (Ease level)|
|Crohn’s disease||13.46||37.81 (difficult)|
|Hand, foot, and mouth disease||9.84||53.47 (fairly difficult)|
|Borderline personality disorder||12.28||39.86 (difficult)|
|Lyme disease||12.44||45.53 (difficult)|
|Bipolar disorder||13.98||35.07 (difficult)|
|Asperger syndrome||15.14||25.59 (very difficult)|
|Pentasomy X||11.59||41.64 (difficult)|
|Schizophrenia||14.36||27.21 (very difficult)|
|Tuberculosis||11.11||51.72 (fairly difficult)|
|Fibromyalgia||13.76||28.65 (very difficult)|
|Guillain-Barré syndrome||12.53||42.44 (difficult)|
|Multiple sclerosis||12.73||43.19 (difficult)|
|Pancreatic cancer||13.22||38.60 (difficult)|
|Parkinson’s disease||12.65||41.24 (difficult)|
|Rabies||10.86||52.07 (fairly difficult)|
|Cholangiocarcinoma||18.04||11.36 (very difficult)|
|Diabetes mellitus||12.42||42.35 (difficult)|
|Shingles||11.16||52.70 (fairly difficult)|
|Dengue fever||12.03||42.73 (difficult)|
|Marfan syndrome||10.68||51.63 (fairly difficult)|
|Average (± SD)||12.73 ± 1.68||39.91 ± 9.76|
Summary of results
Wikipedia is one of the most heavily visited sites on the Internet. A survey done in the United States showed that Wikipedia was far more popular among the well-educated and college-aged users than it was among those with lower education levels. Some of the suggested reasons to explain its popularity include the free and easy access online, exhaustive information on a multitude of topics and a high position within the results provided by search engines.
Prior studies have evaluated the readability of a variety of health-related articles on Wikipedia. In 2011, McInnes et al. found that the average readability of Wikipedia’s pages pertaining to thirteen different causes of burden and mortality was 15.21 (95% CI 14.44-15.99), a readability level typical of university students. In 2015, John et al. found that Wikipedia pages on pediatric ophthalmology were written at an average grade level of 17.4. Interventional radiology materials on Wikipedia were found to be written at a grade level of 15.0 (95% CI 13.9-16.1) in 2015. Modiri et al. found that the mean Flesch Reading Ease (FRE) score of neurosurgical articles on Wikipedia was 31.10, a score considered “difficult” and corresponding to a university level of education. Wikipedia pages on Parkinson’s disease were also found to have a low readability level (FRE 30.21).
This study reports an average readability of the twenty-five most accessed health-related articles on Wikipedia of 12.73 (95% CI = 12.07-13.38) and a mean reading ease (FRE score) of 39.91 (95% CI = 36.09-43.74) in 2018. In other words, on average 12 years of education are necessary to easily understand the articles on the first reading in 2018. The easiest article to read was on hand, foot and mouth disease (RG 9.84, FRE 53.47), while the article on cholangiocarcinoma had the highest score (RG 18.04, FRE 11.36).
The results of this study suggest that the majority of the most accessed Wikipedia articles on diseases remain difficult to read for people with low literacy. None of the articles were written at or below the 7th or 8th grade reading level, which corresponds to a “fairly easy” or “standard” readability in Table 1.
More importantly, however, this study shows an overall improvement in the readability of Wikipedia’s pages over time. The number of pages that were easier to read (lower RG and higher FRE) in 2018 was significantly higher when compared to 2013 and 2008 (p<0.0001). For example, 22 of the 25 (88%) articles analyzed had a lower reading grade and higher FRE score (i.e. were easier to read) in 2018 when compared to the same article in 2008.
Strengths and limitations
This is, to our knowledge, the first study to assess the readability of the most viewed disease-related articles on Wikipedia over time. Instead of focusing on articles related to a specific medical field, this study assessed the readability of the most viewed articles pertaining to a variety of medical conditions. Although analyzing the readability of a text has a long tradition in the literature, some authors have suggested that other factors such as the page display, navigation menus and hyperlinks should be evaluated when assessing the readability of online documents. Wikipedia articles regularly link key words or important concepts to a different Wikipedia page providing more information to aid the reader’s understanding. Furthermore, this study was based on the readability of the lead paragraphs, which could underestimate the reading difficulty of the full article, since some studies have shown that the last paragraphs tend to be the hardest to read.
Implications for practice
As the Internet has become an increasingly popular source of health information, it is important for healthcare professionals to be familiar with the online resources that are consulted by their patients. The overall complexity of Wikipedia’s disease-related articles could prevent successful transmission of health information. This could potentially lead patients to use more easily readable, but less accurate, online resources. Healthcare professionals should take into account the health literacy level of their patients when discussing and referring to online resources.
Wikipedia remains one of the most accessed websites on the Internet. Although the reading level of Wikipedia’s disease-related articles remains high overall, this study shows that a significant improvement has been made in the past few years. It is important for healthcare professionals to be aware of the readability of online resources and refer their patients to health resources that are adapted to their literacy level.
Authorship and acknowledgements
Both authors participated in the study design. The literature review was done with the help of Interior Health librarians. Data collection and manuscript preparation were done by A.B. Statistical analysis was done by Veronika Moravan, Applied Statistician. Both authors reviewed the manuscript.
- "Wikipedia:WikiProject Medicine/Popular pages". Wikipedia. Wikipedia. 8 January 2019. Retrieved 20 January 2019.
From this list, the top 25 most viewed medical conditions in August 2018 were selected. Since all content revisions are archived by Wikipedia, the content of the same pages 1, 5 and 10 years ago were obtained using the “view history” tab.
- Allahwala, UK; Nadkarni, A; Sebaratnam, DF (April 2013). "Wikipedia use amongst medical students - new insights into the digital revolution.". Medical teacher 35 (4): 337. doi:10.3109/0142159X.2012.737064. PMID 23137251.
- Heilman, JM; Kemmann, E; Bonert, M; Chatterjee, A; Ragar, B; Beards, GM; Iberri, DJ; Harvey, M et al. (31 January 2011). "Wikipedia: a key tool for global public health promotion.". Journal of medical Internet research 13 (1): e14. doi:10.2196/jmir.1589. PMID 21282098.
- Hughes, B; Joshi, I; Lemonde, H; Wareham, J (October 2009). "Junior physician's use of Web 2.0 for information seeking and medical education: a qualitative study.". International journal of medical informatics 78 (10): 645-55. doi:10.1016/j.ijmedinf.2009.04.008. PMID 19501017.
- "Health literacy: report of the Council on Scientific Affairs. Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs, American Medical Association.". JAMA 281 (6): 552-7. 10 February 1999. PMID 10022112.
- McInnes, N; Haglund, BJ (December 2011). "Readability of online health information: implications for health literacy.". Informatics for health & social care 36 (4): 173-89. doi:10.3109/17538157.2010.542529. PMID 21332302.
- Rootman, I; Goron-El-Bihbety, D (2008). "A Vision for a Health Literate Canada Report of the Expert Panel on Health Literacy" (PDF). Canadian Public Health Association. Retrieved 6 April 2019.
- Davis, TC; Williams, MV; Marin, E; Parker, RM; Glass, J (2001). "Health literacy and cancer communication.". CA: a cancer journal for clinicians 52 (3): 134-49. PMID 12018928.
- Merriman, B; Ades, T; Seffrin, JR (2001). "Health literacy in the information age: communicating cancer information to patients and families.". CA: a cancer journal for clinicians 52 (3): 130-3. PMID 12018927.
- Brigo, F; Erro, R (June 2015). "The readability of the English Wikipedia article on Parkinson's disease.". Neurological sciences : official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology 36 (6): 1045-6. doi:10.1007/s10072-015-2077-5. PMID 25596713.
- Koo, M (2014). "Complementary and alternative medicine on wikipedia: opportunities for improvement.". Evidence-based complementary and alternative medicine : eCAM 2014: 105186. doi:10.1155/2014/105186. PMID 24864148.
- Modiri, O; Guha, D; Alotaibi, NM; Ibrahim, GM; Lipsman, N; Fallah, A (March 2018). "Readability and quality of wikipedia pages on neurosurgical topics.". Clinical neurology and neurosurgery 166: 66-70. doi:10.1016/j.clineuro.2018.01.021. PMID 29408776.
- Phillips, J; Lam, C; Palmisano, L (2013). "Analysis of the accuracy and readability of herbal supplement information on Wikipedia.". Journal of the American Pharmacists Association : JAPhA 54 (4): 406-14. doi:10.1331/JAPhA.2014.13181. PMID 25063262.
- Watad, A; Bragazzi, NL; Brigo, F; Sharif, K; Amital, H; McGonagle, D; Shoenfeld, Y; Adawi, M (18 July 2017). "Readability of Wikipedia Pages on Autoimmune Disorders: Systematic Quantitative Assessment.". Journal of medical Internet research 19 (7): e260. doi:10.2196/jmir.8225. PMID 28720555.
- Santos, PJF; Daar, DA; Paydar, KZ; Wirth, GA (January 2018). "Readability of Online Materials for Rhinoplasty.". World journal of plastic surgery 7 (1): 89-96. PMID 29651397.
- Shafee, T; Masukume, G; Kipersztok, L; Das, D; Häggström, M; Heilman, J (November 2017). "Evolution of Wikipedia's medical content: past, present and future.". Journal of epidemiology and community health 71 (11): 1122-1129. doi:10.1136/jech-2016-208601. PMID 28847845.
- Kincaid, J; Fishburne, R; Rogers, R; Chissom, B (1 January 1975). Derivation Of New Readability Formulas (Automated Readability Index, Fog Count And Flesch Reading Ease Formula) For Navy Enlisted Personnel. Retrieved 6 April 2019.
- McLaughlin, H (1969). "SMOG grading - a new readability formula". Journal of Reading: 639-646.
- Flesch, R (12 July 2016). "Guide to Academic Writing Article - Management - University of Canterbury - New Zealand". web.archive.org. Retrieved 6 April 2019.
- Gunning, Robert (1968). The technique of clear writing (Revised ed.). McGraw-Hill. ISBN 978-0070252066.
- Team RDC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing 2018.
- "Flesch–Kincaid readability tests". Wikipedia. 20 April 2019. Retrieved 8 May 2019.
- Lee Rainie, B (24 April 2007). "Wikipedia users". Pew Research Center. Retrieved 6 April 2019.
- John, Ann M.; John, Elizabeth S.; Hansberry, David R.; Thomas, Prashant J.; Guo, Suqin (October 2015). "Analysis of online patient education materials in pediatric ophthalmology". Journal of American Association for Pediatric Ophthalmology and Strabismus 19 (5): 430–434. doi:10.1016/j.jaapos.2015.07.286.