Searching R Packages

From Wikiversity
This article (a) summarizes what the authors know of available search capabilities for R (programming language) and (b) invites readers to contribute ideas for improvement. It is placed on Wikiversity and currently listed as a “research project” to encourage wide discussion of the issues it raises, moderated by the Wikimedia rules that invite contributors to “be bold but not reckless,” to write revisions from a neutral point of view, to cite credible sources, and to raise other questions and concerns on the associated “Discuss” page. Your contribution(s) to this article may help transform it from a dream into a very useful reality.
initial draft by Spencer Graves with help from John Nash and Julia Silge

As of 2018-01-22, there were 12,126 active packages on the Comprehensive R Archive Network (CRAN). Just over a year earlier, 2017-01-07, John Nash[1] noted “There are now over 9000 packages on CRAN, with many more in Bioconductor, on Github, and other repositories. How can or should R users navigate this large and unruly collection of packages to find the tools they need and use them effectively?”

Almost eight years earlier, I had published the “sos” package that allowed users to search for packages, not just help pages as had been available with the previous RSiteSearch{utils} function.[2]

However, “sos” is still a command line solution and has largely been replaced by newer tools like the CRANsearcher addin for RStudio, crantastic, and RDocumentation. Only 2.2 percent of respondents in Julia Silge's recent survey[3] said they used “R packages built for search such as the sos package.” Some “sos” features could be improved, but R users might benefit more from using that effort to improve more popular search capabilities.

Summary table of search capabilities devoted to R

The following table summarizes our understanding of search capabilities devoted to R. The "base::readLines, vkR::getURLs" column summarizes the results of searching for those two terms with each of the existing search alternatives. The benchmarking done here suggests a strong preference for RDocumentation for most web-based searches, followed by Rseek. The sos package can create an Excel workbook with summary results by package. However, Jonathan Baron plans to stop maintaining his "RSiteSearch" database next year, because other options are better. This will also make the RSiteSearch{utils} function and the sos package obsolete unless someone modifies them to use one of the other existing databases, e.g., RDocumentation.

| Search capability | Introduced | Comments | base::readLines, vkR::getURLs | FOSS |
|---|---|---|---|---|
| Getting Help with R[4] | early | Official overview of various help facilities recognized by the R Core Team, updated to mention help(), vignettes, demo(), apropos(), help.search(), help.start(), CRAN Task Views, FAQs, Stack Overflow, and R Email Lists. | help(), demo(), apropos(), and help.search() access only locally installed documentation.[5] | Y |
| R site search[6] | (before 2004) | Search email lists and help pages of contributed packages. | Clumsy relative to RDocumentation.org for many things. If you want a URL for a package and function in the R Site Search database, you can often get it in a form similar to http://finzi.psych.upenn.edu/R/library/base/html/readLines.html.[5] | Y |
| Rseek[7] | 2007 | Search email lists and R web sites including, e.g., RDocumentation.org. | Rseek found these help files quickly. (It found them in RDocumentation.)[8] | ? |
| Nabble R Forum[9] | early | Search email lists and R web sites. | Searching for "readLines" and "base::readLines" produced nothing useful.[10] | N |
| Google Advanced Search[11] | 2010s | The advanced search feature of the Google search engine. | A naive search for "readLines", "getURLs", "base::readLines" and "vkR::getURLs" returned similar functions in other languages in addition to possibly relevant resources in R.[5] | N |
| RSiteSearch[12] | 2004 | R function in the “utils” package that searches a database of email lists plus all the help pages in packages on CRAN and Bioconductor plus a few others. | Same results as with the "R Site Search" web site above, but with the query entered from within R.[5] | Y |
| sos[13] | 2009 | R package to search for packages, not just help pages, in the RSiteSearch database.[2] | Found 'readLines' and 'getURLs', but it took longer than some of the other options. "???base::readLines" and "???vkR::getURLs" threw errors.[5] | Y |
| Metacran[14] | 2010s | Includes "Featured packages", "Most downloaded", "Trending", "Most depended upon", and "Recently updated", but CRAN only. Invites contributions for further development on their GitHub site.[15] | Searching for "base", "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned "did not match any packages."[5] | Y |
| crantastic[16] | 2010s | Includes "Most popular packages" and "Recent activity" with user reviews, but CRAN only. | Searches for "readLines" and "getURLs" returned "no results were found." Searches for "base::readLines" and "vkR::getURLs" returned "The page you were looking for doesn't exist."[5] | Y |
| CRANsearcher[17] | since 2011 | RStudio addin; CRAN only, with user reviews. | Searching for "base", "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned "Showing 0 to 0 of 0 entries".[5] | |
| RDocumentation[18] | 2010s | Includes "Top 5 packages", "Top 5 authors", and "Newest packages" on CRAN, Bioconductor and GitHub. Invites users to contribute (a) new examples and (b) ideas and code for further development on their GitHub site.[19] | Searching for "readLines" and "getURLs" produced the desired help files, including a URL (Uniform Resource Locator) that I could use in an R Markdown vignette. Similar searches for other packages and functions seemed to produce something useful. This seems to create a strong preference for RDocumentation over other options, especially since Metacran, crantastic, and CRANsearcher all failed to return anything useful for this request. When this fails, try Rseek.[20] | N |
| depsy[21] | 2012[22] | Text-mines papers for mentions of software they use, revealing impacts invisible to citation indexes like Google Scholar. Also analyzes code from over half a million GitHub repositories to find how packages are reused by other software projects and assigns fractional credit to contributors based on designated authorship, number of commits, and repo ownership.[23] | Searching for "base", "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned nothing.[5] | Y |
| rdrr.io[24] | 2016 | An index of R packages and documentation from CRAN, Bioconductor, GitHub and R-Forge by Ian Howson. The site also allows running R code online: "Snippets: Run R code online. Over 9,000 packages are preinstalled!" | The "base" package and "readLines" help were easily found on 2018-03-30, but searching for package "vkR" returned "No results found." (Also, as of 2018-02-26, the "Ecdat" package that has been on CRAN since 2009 was not found in 'rdrr.io'.) | ? |
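
Several tools in the table returned nothing, or raised errors, for package-qualified queries such as "base::readLines". The following is a minimal sketch of one way a search front end might handle such queries; it does not describe how any of the tools above actually work. The helper names `split_qualified` and `search_terms`, and the fallback order, are assumptions made purely for illustration.

```python
def split_qualified(query):
    """Split 'pkg::fn' (or 'pkg:::fn') into (package, function).

    Hypothetical helper for illustration; returns (None, query)
    when the query carries no package qualifier.
    """
    package, sep, function = query.partition("::")
    if not sep:
        return None, query
    # 'pkg:::fn' leaves a leading ':' on the function part
    return package, function.lstrip(":")


def search_terms(query):
    """List search strings to try, most specific first."""
    package, function = split_qualified(query)
    if package is None:
        return [function]
    # Try "package function" first, then fall back to the bare name.
    return [f"{package} {function}", function]


for q in ["base::readLines", "vkR::getURLs", "readLines"]:
    print(q, "->", search_terms(q))
```

With a fallback of this kind, a query that fails in its qualified form could still be retried with the bare function name, which is the form that succeeded in more of the benchmarks above.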

Questions to be considered in a proposal for improvement

Key questions from this comparison:

  • Might it be worth the effort to build a common database and search engine used by all with different defaults and options tailorable by users? The R Foundation might fund something like this if the concept were sufficiently well defined and compelling.
  • One of the simplest parts of such a system might be to share the user reviews between crantastic and CRANsearcher.
  • What might people want done with download statistics?
  • Task Views might include user ratings and download statistics.
  • Data on actual usage could be obtained from users who explicitly agree to having R monitor their usage of different packages. This facility could document which packages were tried, how long each was used, what errors were reported, and what package was used next. Data like these could be used to identify users switching between different related packages. For example, I recently tried gnumeric and quickly switched to readODS when I found that gnumeric required other software that I did not seem to have. Maintainers of gnumeric and readODS might be able to use information like this to improve both packages. Analyses of such data might be portrayed in a network diagram. Such a system would also allow users to turn this feature off and on at any time.
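
The idea of detecting users who switch between related packages can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the session-log format, the `sessions` variable, and the function `count_switches` are hypothetical, and the sample data are invented apart from the gnumeric to readODS switch described above. Each opt-in session is treated as an ordered list of packages tried, and consecutive pairs become weighted edges that could feed a network diagram.

```python
from collections import Counter


def count_switches(sessions):
    """Count how often users moved from one package to the next.

    `sessions` is a list of ordered package-name lists, one per
    opt-in user session (a made-up format for this sketch).
    """
    edges = Counter()
    for packages in sessions:
        for current, following in zip(packages, packages[1:]):
            if current != following:  # ignore repeated use of one package
                edges[(current, following)] += 1
    return edges


# Invented example data; only gnumeric -> readODS comes from the text.
sessions = [
    ["gnumeric", "readODS"],
    ["gnumeric", "readODS", "readODS"],
    ["readODS"],
]

for (source, target), weight in count_switches(sessions).most_common():
    print(f"{source} -> {target}: {weight}")
```

The resulting edge weights could be passed to any graph-drawing tool. A real system would also need the explicit consent and on/off controls described above, which this sketch does not attempt.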

These notes are being published on Wikiversity to invite anyone to add their own thoughts, either directly in this article or in the associated “Discuss” page.

Now it's your turn, dear reader: What would you like to see in a search capability for R?

Acknowledgements

This discussion and the accompanying Draft Proposal for improving the ability of R users to search R packages were inspired by the plenary session on "Navigating the R package universe" in the international useR!2017 conference in Brussels, Belgium, July 4-7, 2017.

See also

  • Silge (2017)

References

Notes

  1. Nash, John, University of Ottawa, http://www.telfer.uottawa.ca/en/directory/professors/nash-john
  2. Graves, Spencer; Dorai-Raj, Sundar; François, Romain (December 2009), "sos: Searching Help Pages of R Packages", R Journal (R Project for Statistical Computing) 1 (2), https://journal.r-project.org/archive/2009/RJ-2009-017/RJ-2009-017.pdf
  3. Silge (2017)
  4. Getting Help with R, R Foundation, https://www.r-project.org/help.html 
  5. 2018-03-12
  6. R Site Search, Jonathan Baron, Department of Psychology, School of Arts and Sciences (which provides this computer), University of Pennsylvania, http://finzi.psych.upenn.edu/search.html 
  7. Rseek, Sasha Goodman, https://rseek.org/ 
  8. On 2018-03-12 RDocumentation had a problem, since fixed, and could not find them, but Rseek found the pages that RDocumentation held and could not itself find.
  9. Nabble R Forum, Nabble, http://r.789695.n4.nabble.com/ (accessed 2018-03-12)
  10. 2018-03-30
  11. Google Advanced Search, Google, https://www.google.com/advanced_search
  12. RSiteSearch: Search for Key Words or Phrases in Documentation, RDocumentation, https://www.rdocumentation.org/packages/utils/versions/3.4.3/topics/RSiteSearch 
  13. sos: Search Contributed R Packages, Sort by Package, RDocumentation, https://www.rdocumentation.org/packages/sos/versions/2.0-0 
  14. METACRAN: Search and browse all CRAN/R packages, https://www.r-pkg.org/ 
  15. metacran: Tools for a better CRAN experience, METACRAN, https://github.com/metacran 
  16. Wickham, Hadley; Mæland, Bjørn, Welcome to crantastic, a community site for R packages where you can search for, review and tag CRAN packages., R Foundation, https://crantastic.org/ 
  17. CRANsearcher: RStudio Addin for Searching Packages in CRAN Database Based on Keywords, RStudio, https://cloud.r-project.org/web/packages/CRANsearcher/index.html 
  18. RDocumentation: Search all 14,381 CRAN, BioConductor and Github packages., DataCamp, https://www.rdocumentation.org/ 
  19. The web application running rdocumentation.org, DataCamp, https://github.com/datacamp/RDocumentation-app 
  20. 2018-03-30
  21. Credit when software is informally cited, Depsy, http://depsy.org 
  22. About Impactstory, Impactstory, https://profiles.impactstory.org/about 
  23. Dalmeet Singh Chawla (2016-01-04), "The unsung heroes of scientific software: Creators of computer programs that underpin experiments don’t always get their due — so the website Depsy is trying to track the impact of research code.", Nature 529 (7584), http://www.nature.com/news/the-unsung-heroes-of-scientific-software-1.19100 
  24. Ian Howson, https://ianhowson.com/