Digital Libraries/Evaluation and user studies

Older versions of the draft developed by UNC/VT Project Team (2009-10-07 PDF WORD)

Module name

Digital Library Evaluation, User Studies

Scope

While a number of kinds of evaluation/research studies may be conducted during the design and development of a digital library (e.g., usability testing), this module is concerned with methods for evaluating the outcomes, impacts, or benefits of a digital library, including cost/benefit analyses. It also includes methods that are useful for general user studies (i.e., studies that intend to more fully understand people's interactions with digital libraries). While some methods covered here are useful for usability testing, usability inspections and usability testing are explicitly covered in module 6-d, Interaction Design, Information Summarization and Visualization, and Usability Assessment.

Learning objectives:

By the end of this module, the student will be able to:

a. Understand the importance of DL evaluation;

b. List and describe the strengths and weaknesses of multiple approaches to evaluation; and

c. Apply an appropriate evaluation method to a particular DL.

5S characteristics of the module:

a. Streams: N/A

b. Structures: N/A

c. Spaces: N/A

d. Scenarios: Scenarios may form the basis of an evaluation plan, by describing particular situations of use that must be supported effectively by the DL.

e. Societies: The concept of societies may be useful in planning an evaluation because it helps the evaluator consider the potential stakeholders of the DL more systematically.

Level of effort required:

a. In-class time: 2-2 1/2 hours

b. Out-of-class time: 1 1/2 hours for assigned reading

c. Learning activities (optional): See notes on timing with each activity or assignment.

Relationships with other modules:

a. It is expected that this module will follow other modules on digital libraries, and will be in the final portions of the module sequence.

Prerequisite knowledge required:

a. Students will not be expected to have had prior training in social science research methods.

Introductory remedial instruction:

a. None

Body of knowledge:

Evaluation and user studies

1. Definition of evaluation: "An appraisal of the performance or functioning of a system, or part thereof, in relation to some objective(s)" (Saracevic, 2000, p.359)

a. Evaluation incorporates the making of value judgments about whether performance is adequate

b. Its purpose is to inform decision making (Reeves et al., 2003)

c. Evaluation is critical to any project

d. In its early DL initiatives, NSF recommended that at least 10% of the project budget be devoted to evaluation

2. User studies may be more general, in terms of the types of questions asked

a. They do not necessarily incorporate the making of value judgments about performance quality

3. User studies may be more specific, in that they involve users

a. Evaluations may be conducted on the DL collection or other aspects of the DL without involving users

4. This module will focus particularly on evaluations that involve users

The object of the evaluation or user study: digital libraries and their processes/functions

1. A particular aspect of a digital library

2. Individual digital libraries

3. Multiple digital libraries

4. DL processes that may be evaluated (based on Saracevic, 2005), and criteria for evaluation

a. Information representations, metadata and surrogates used in the DL

i. Task appropriateness

ii. Usability

iii. User satisfaction

b. Particular tools available in the DL

i. Index, search, and output features

ii. Navigation, browsing

iii. Failures in functionality or usability

iv. User satisfaction

c. Particular services offered by the DL

i. Collection quality

ii. Retrieval performance (recall, precision)

iii. Reliability

iv. Human-intermediated services (e.g., reference services)

v. User satisfaction with individual services or with collection of services

d. User behaviors when interacting with the DL (may or may not be evaluative)

i. Information seeking/searching behaviors

ii. Use of information retrieved

iii. Work patterns

5. "All efforts to design, implement, and evaluate digital libraries must be rooted in the information needs, characteristics, and contexts of the people who will or may use those libraries." (Marchionini, Plaisant, & Komlodi, 2003, p.1)
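As a rough illustration of the retrieval performance criteria listed under DL services above (recall, precision), the following sketch computes both measures for a single query. The function name and the document identifiers are hypothetical, and it assumes that relevance judgments for the query are already available as a set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the DL search
    relevant:  document ids judged relevant (assumed to be known)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result and relevance judgments
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

In practice the hard part is obtaining the relevance judgments, not the arithmetic; recall in particular requires knowing which relevant items the system failed to return.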

Questions that may be asked during an evaluation/user study

1. Frame the study questions based on the decisions that must be made about the DL's functions/processes (Reeves et al., 2003)

a. Focus on the questions that bear most directly on the most important decisions

b. Focus on impacts of DL functions/services (Marchionini, Plaisant, & Komlodi, 2003)

i. What types of impacts are there? On whom?

ii. Who and what influence those impacts?

2. Formative versus summative evaluation

a. Formative evaluation focuses on decisions about how to modify/change the DL's functions/services

b. Summative evaluation focuses on decisions about the worth or value of the DL's functions/services

Stages/steps in the evaluation/research process

1. Develop an evaluation/research plan (Chowdhury & Chowdhury, 2003; Reeves et al., 2003)

a. Clarify the decisions to be addressed and the questions they generate

b. Identify appropriate evaluation methods, including sampling procedures, data collection procedures and data analysis procedures

c. Carry out the evaluation/research plan

d. Report the results to appropriate stakeholders

i. Primarily the decision makers

ii. Also other constituencies of the DL

2. Review of an example evaluation, in terms of these steps

a. Possible example evaluations:

i. Bishop, A.P. (1998, December). Measuring access, use, and success in digital libraries. The Journal of Electronic Publishing, 4(2). Retrieved February 8, 2006, from www.press.umich.edu/jep/04-02/bishop.html.

ii. Marchionini, G. (2000). Evaluating digital libraries: A longitudinal and multifaceted view. Library Trends, 49(2), 304-333.

Evaluation design strategies

1. Naturalistic studies

a. For some evaluation studies, it is critical to conduct them in a natural or naturalistic setting/context

b. The constraints of the setting usually imply that fewer experimental controls can be applied to the study design

c. Usually, the evaluator will need to take into account aspects of the setting as part of the data collected for the evaluation study

2. Experiments

a. Usually conducted in a lab setting, or a setting in which control over the conditions of the evaluation study can be exerted

b. The researcher attempts to control all the potential effects on the results of the study, other than those effects being intentionally manipulated as the focus of the evaluation

c. Some important concepts in designing an experiment

i. Randomization is a key tool for control: random sampling and/or random assignment to treatment and control groups

ii. Variables: the researcher will manipulate the independent variables (e.g., whether the DL has a particular feature or not) and will evaluate the outcomes based on the dependent variable

iii. The design may be a within-subjects design (where each participant interacts with all the variations of the independent variables, and so comparisons on the dependent variable are made "within" each subject's performance) or a between-subjects design (where each participant interacts with only one version of the system and comparisons are made "between" groups of subjects)

3. Avoiding the effects of researcher bias

a. It's easy for a researcher's biases to influence the design of a study and, thus, its outcomes

i. Identify your biases

ii. Ensure that your study design and procedures keep those biases from influencing the study outcomes
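The randomization and design concepts described under Experiments above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the participant ids, condition names, and function names are all hypothetical. The first function performs random assignment for a between-subjects design; the second gives each participant every condition in a randomized order, as a within-subjects design would require.

```python
import random


def assign_between_subjects(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition,
    keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


def order_within_subjects(participants, conditions, seed=None):
    """Give every participant all conditions, each in an independently
    randomized order, to reduce order effects."""
    rng = random.Random(seed)
    orders = {}
    for person in participants:
        order = list(conditions)
        rng.shuffle(order)
        orders[person] = order
    return orders


participants = [f"P{i}" for i in range(1, 9)]     # hypothetical ids
conditions = ["with_feature", "without_feature"]  # independent variable levels

groups = assign_between_subjects(participants, conditions, seed=42)
orders = order_within_subjects(participants, conditions, seed=42)
```

Letting a random number generator make the assignment, rather than the researcher, is also one simple way to keep researcher bias out of who ends up in which group.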

Data collection and measurement methods

1. Collecting data from people requires ethical treatment of those people as study participants

a. Each institution will require review of the research proposal by an Institutional Review Board that verifies that study participants are being treated ethically

2. Observation of user behaviors, including transaction logs

a. To see what the user is doing as he or she interacts with the system

b. Observation of work (e.g., via contextual inquiry)

i. Special kind of interview

ii. Observe person while performing task to be supported

iii. Interrupt with questions about how and why, as needed

c. Think-aloud protocols

i. During DL use, the participant is asked to verbalize their thought processes

ii. Allows you to observe "unobservable" cognitive behaviors

iii. Usually videotaped or audiotaped

d. Indirect observation of work

i. Logging and metering techniques embedded in the software of the current system or intermediate versions

e. Diaries

i. For detailed descriptions of tasks

How much time they take
Sequential dependencies between tasks

ii. Allows observation over longer periods of time than contextual inquiry interviews

3. Interviews and focus groups

a. Augmenting other data collection methods, or on their own

b. Uses during DL evaluation

i. For identifying problems in the DL design

ii. For additional features needed in the DL

iii. For other improvements in the DL which the user can suggest

c. Individual interviews or group interviews (focus groups)

i. Focus groups require a skilled facilitator

4. Questionnaires

a. Surveys: typically one item/question per construct

b. Measures: intended to measure constructs that are not directly observable and not easily measured with a single item

i. The more subjective the construct, the more likely that you will need a multiple-item measure for it

Find a measure in the literature, rather than developing your own

c. Print vs. online administration

i. Print allows people to annotate (can be good or bad)

ii. Online eliminates the need for a separate data entry step
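Transaction logs, mentioned above as a form of indirect observation, usually need to be summarized before they say anything about user behavior. The sketch below assumes a hypothetical log format of (session id, action, detail) records; real DL logs vary widely and typically need cleaning and session reconstruction first.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (session_id, action, detail)
LOG = [
    ("s1", "search", "civil war maps"),
    ("s1", "view", "item-102"),
    ("s1", "search", "civil war maps virginia"),
    ("s2", "browse", "collections/photographs"),
    ("s2", "view", "item-554"),
    ("s3", "search", "oral history"),
]


def summarize_log(records):
    """Count actions overall and compute mean searches per session."""
    action_counts = Counter(action for _, action, _ in records)
    searches = defaultdict(int)
    sessions = set()
    for session_id, action, _ in records:
        sessions.add(session_id)
        if action == "search":
            searches[session_id] += 1
    mean_searches = sum(searches.values()) / len(sessions) if sessions else 0.0
    return {
        "sessions": len(sessions),
        "action_counts": dict(action_counts),
        "mean_searches_per_session": mean_searches,
    }


summary = summarize_log(LOG)
```

Counts like these describe what users did, not why; they are often paired with interviews or think-aloud sessions for that reason.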

Study sample: Who should be participants in your evaluation study?

1. Define the population of interest

a. Current users or a subset of them

b. Potential audiences (who are not current users)

2. Consider sample size

a. Usually a tradeoff between a small sample (cheaper) and generalizability (which improves with a larger sample)

b. Intensive versus extensive studies

i. Intensive studies: to thoroughly understand a phenomenon within its context

ii. Extensive studies: to understand the extent of a phenomenon within a population

3. Develop a sampling plan

a. Random sampling

i. Supports statistical inferences to the population

ii. Identify a population to which you want to generalize your findings

iii. Enumerate the population

iv. Draw a random sample

v. Problem: enumerating the entire population

Not necessarily problematic, but often is

4. Other methods of sampling

a. Quota sampling, purposive sampling, accidental/convenience sampling

b. Strive for representativeness

i. In range, as well as central tendency

5. Develop a plan for recruiting the sample you want

a. May need to offer incentives
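The random sampling steps above (enumerate the population, then draw a random sample) and the sample-size tradeoff can be sketched as follows. The population of user ids is hypothetical, and the margin-of-error function uses the standard large-population approximation for a proportion at roughly 95% confidence; it is an illustration of the tradeoff, not a substitute for a proper power analysis.

```python
import math
import random


def draw_simple_random_sample(population, n, seed=None):
    """Draw n distinct members from an enumerated population,
    each with equal probability of selection."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n (z=1.96 is roughly 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical enumerated population: 5,000 registered DL users
population = [f"user{i:04d}" for i in range(5000)]
sample = draw_simple_random_sample(population, 100, seed=1)

small = margin_of_error(100)    # smaller, cheaper sample
large = margin_of_error(1000)   # larger sample, tighter estimate
```

Comparing `small` and `large` shows the tradeoff: a tenfold increase in sample size shrinks the margin of error only by a factor of about three.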

Analysis and interpretation of data

1. Reporting the results and interpreting the results are two distinct steps

2. Interpretation should address the questions, "What do the results mean? How should they be understood?"

3. All results must be interpreted in the context of:

a. Prior empirical work and relevant theoretical frameworks

b. Situation

i. What is happening in the particular situation in which the study was done?

c. Weaknesses in the research

i. Measurements: level of reliability; validity

ii. Design: attrition, external events; internal and external threats to validity

iii. Analysis method: assumptions violated

4. Recommend particular actions, based on the interpretation of the results
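One common way to report the level of reliability of a multiple-item questionnaire measure is Cronbach's alpha, an internal-consistency estimate. The outline does not name a specific statistic, so this is offered only as one possible sketch; the function name and the ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a multi-item measure.

    responses: list of rows, one row per respondent,
               one column per questionnaire item.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple per item
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_vars / total_var)


# Hypothetical 5-point ratings: five respondents, 3-item satisfaction measure
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
```

Values closer to 1 indicate that the items vary together; a low value is one of the measurement weaknesses that should temper the interpretation of results.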

Resources

Assigned readings for students

i. Nicholson, Scott. (2004). A conceptual framework for the holistic measurement and cumulative evaluation of library services. Journal of Documentation, 60(2), 164-182.

ii. Reeves, Thomas, Apedoe, Xornam, & Woo, Young Hee. (2003). Evaluating digital libraries: A user-friendly guide. University Corporation for Atmospheric Research; National Science Digital Library. Retrieved 3/1/2007 from http://www.dpc.ucar.edu/projects/evalbook/EvaluatingDigitalLibraries.pdf.

a. Chapter 1, Why evaluate? (p.1-6)

b. Chapter 2, Evaluation planning (p.7-21)

Additional Potential Readings

i. Bishop, A.P. (1998, December). Measuring access, use, and success in digital libraries. The Journal of Electronic Publishing, 4(2). Retrieved 2/8/2006 from www.press.umich.edu/jep/04-02/bishop.html.

ii. Bishop, A.P., Mehra, B., Bazzell, I., & Smith, C. (2003). Participatory action research and digital libraries: Reframing evaluation. In Bishop, A.P., Van House, N.A., & Buttenfield, B.P. (eds.), Digital Library Use: Social Practice in Design and Evaluation. Cambridge, MA: MIT Press, 161-189.

iii. Bollen, J. and R. Luce. (2002). Evaluation of digital library impact and user communities by analysis of usage patterns. D-Lib Magazine, 8(6) June 2002. Retrieved 3/1/2007 from http://www.dlib.org/dlib/june02/bollen/06bollen.html

iv. Bryan-Kinns, Nick & Blandford, Ann. (2000). A survey of user studies for digital libraries. RIDL Working Paper. Retrieved 3/1/2007 from http://www.cs.mdx.ac.uk/ridl/DLuser.pdf.

v. Choudhury, G.S., Hobbs, B., Lorie, M., & Flores, N.E. (2002, July/August). A framework for evaluating digital library service. D-Lib Magazine, 8(7/8). Retrieved 3/1/2007 from http://www.dlib.org/dlib/july02/choudhury/07choudhury.html.

vi. Marchionini, G. (2000). Evaluating digital libraries: A longitudinal and multifaceted view. Library Trends, 49(2), 304-333.

vii. Rieger, R., & Gay, G. (1999, June 15). Tools and Techniques in Evaluating Digital Imaging Projects. RLG DigiNews.

viii. Saracevic, Tefko (2005). How were digital libraries evaluated? Presented at Libraries in the Digital Age (LIDA), Dubrovnik and Mljet, Croatia, May 30-June 3. Retrieved 3/1/2007 from http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf.

ix. Thong, J. (2002). Understanding user acceptance of digital libraries: What are the roles of interface characteristics, organizational context, and individual differences? International Journal of Human-Computer Studies, 57(3), 215-242.

Concept map

None

Exercises / Learning activities

1. Analyze a DL evaluation report

a. Exercise 13.a, "Analyze a DL evaluation report," could be adapted to an in-class small-group discussion exercise. If so, the results of each group's analysis could be reported orally or could be posted to a class wiki or discussion forum.

b. If used as an in-class exercise, assign the groups to read a particular evaluation report before class, to prepare for their in-class discussion and report.

c. Time requirements: 2 hours of preparation outside of class; 25-30 minutes for discussion in class; 20-30 minutes for report presentation in class, depending on the number of groups.

2. Develop an evaluation plan

a. Based on a DL that is familiar to all the students in the class (e.g., Flickr, MySpace, a music collection, the university's OPAC or a special collection that is well-known), have students work in small teams (3-4 people each) to develop a draft evaluation plan. They can play the role of an evaluation consulting firm, designing an evaluation study for their client, the DL managers.

b. Each plan should include the following:

i. The evaluation questions to be addressed

Stated briefly, in one sentence (preferably ending with ?)

ii. The sample to be included

How they will be selected
How they will be recruited

iii. The methods for data collection

The types of data to be collected, and how each pertains to the evaluation question
The procedures for collecting the needed data

c. Have each team of students present their plan to the class, as if it were an initial presentation to the client (the DL managers).

d. Time requirements: It is expected that the students will prepare their report after this module has been presented in class. Students should expect to spend 4-5 hours outside of class, preparing their reports. Each report should be presented in 7-10 minutes, during the next class session.

3. Interview a digital librarian about evaluation

a. Note: This exercise is only possible if there are a number of robust local digital library projects, and the students will have access to their directors/administrators.

b. For this exercise, students should work in pairs; each pair will be assigned to investigate a particular digital library. Prior to the interview, each pair should read the available documentation on the digital library on which they're focused. Using the following interview guide, they should interview the director/administrator of the digital library.

c. Interview guide:

i. When was the digital library first established?

ii. What are the primary goals of the DL?

iii. In what ways do you evaluate whether you're achieving those goals?

iv. Do you evaluate any other aspects of the DL's operations? If so, how?

v. How and to whom are the evaluation results reported?

d. Each pair should write up a brief (1-3 page) summary of their interview findings. In addition, they should be prepared to orally report on the most interesting aspects of those findings at the next class session.

e. Time requirements, outside of class: 1-2 hours for preparatory reading; 1 hour for conducting the interview; 2-3 hours for writing up the interview report.

f. Time requirements, in class: 30-40 minutes for the class to discuss the findings from the interviews.

Evaluation of learning outcomes

1. Analyze a digital library evaluation report

a. Using Saracevic's (2005) meta-analysis of digital library (DL) evaluations as a framework, evaluate an additional DL evaluation report. The report can be selected from the following:

i. Byrd, S., et al. (2001). Cost/benefit analysis for digital library projects: The Virginia Historical Inventory project (VHI). The Bottom Line: Managing Library Finances, 14(2), 65-75.

ii. Gambles, A. (2001). The HeadLine personal information environment: Evaluation Phase One. D-Lib Magazine, 7(3). www.dlib.org/dlib/march01/gambles/03gambles.html.

iii. Palmer, D., & Robinson, B. (2001). Agora: The hybrid library from a user's perspective. Ariadne, 26. www.ariadne.ac.uk/issue26/case-studies/intro.htm.

iv. Zhang, Y., Lee, K., & You, B.-J. (2001). Usage patterns of an electronic theses and dissertations system. Online Information Review, 25(6), 370-377.

v. Note: this is a list of possibilities; the reports still need to be reviewed.

b. Analyze the evaluation report in terms of the following aspects:

i. Construct for evaluation.

What was evaluated? What was actually meant by a "digital library"? What elements (components, parts, processes…) were involved in evaluation?

ii. Context of evaluation - selection of a goal, framework, viewpoint or level(s) of evaluation.

What was the basic approach or perspective? What was the level of evaluation? What was the objective(s)?

iii. Criteria reflecting performance as related to selected objectives.

What parameters of performance were concentrate[d] on? What dimension or characteristic [was] evaluated?

iv. Methodology for doing evaluation.

What measures and measuring instruments were used? What samples? What procedures were used for data collection? For data analysis?

v. Findings from evaluation studies

Only a single generalization is provided. (Aspects i-v quoted from Saracevic, 2005, p.2-3)

c. Prepare a report (2-5 pages, single-spaced) summarizing the findings of your analysis.

d. The report should be evaluated in terms of its demonstration that the authors understood the DL evaluation conducted, its coverage of the five aspects of evaluations posed by Saracevic, its identification of strengths and weaknesses in the DL evaluation, and its clarity (organization, grammar, etc.).

e. Time requirements: approximately 6-8 hours outside of class.

2. Develop an evaluation plan

a. Class exercise 10.b, "Develop an evaluation plan," could be adapted as a graded assignment. Each team would be expected to develop their plan over the week after the class's discussion of evaluation. If class time is available, the final plan can be presented orally; or, if preferred, the evaluation plans could be turned in as an evaluation proposal (2-4 pages).

b. The evaluation plans would be evaluated in terms of their completeness (were all the major components of an evaluation study addressed?), their feasibility (could the evaluation study actually be conducted, given reasonable resources?), and their clarity.

c. Time requirements: 6-8 hours outside of class, preparing and writing the evaluation plan.

Glossary

a. Between-subjects design: A research design in which "each research participant receives only one level of the independent variable" (Schmidt, 2000).

b. Dependent variable: "A variable that may, it is believed, be predicted by or caused by one or more other variables called independent variables." (U.S. Dept. of Justice, n.d.)

c. Evaluation: "An appraisal of the performance or functioning of a system, or part thereof, in relation to some objective(s)" (Saracevic, 2000, p.359)

d. Formative evaluation: An evaluation that is intended to "strengthen or improve the object being evaluated. Formative evaluations are used to improve [information systems] while they are still under development." (Trochim, 2001, p.347)

e. Independent variable: "A variable that may, it is believed, predict or cause fluctuation in a dependent variable." (U.S. Dept. of Justice, n.d.)

f. Research design: "A plan of what data to gather, from whom, how and when to collect the data, and how to analyze the data obtained." (U.S. Dept. of Justice, n.d.)

g. Sample: "The actual units you select to participate in your study." (Trochim, 2001, p.351)

h. Stakeholders: "People who have a vested interest in the success of the project or are involved in the implementation of the project." (California State University, Monterey Bay, n.d.)

i. Summative evaluation: An evaluation that "examine[s] the effects or outcomes of [an information system]." (Trochim, 2001, p.352)

j. Within-subjects design: A research design in which "each research participant provides data for all the levels of the independent variable" (Schmidt, 2000).

References for glossary:

a. Saracevic, T. (2000). Digital library evaluation: Toward evolution of concepts. Library Trends, 49(2), 350-369.

b. Trochim, W.M.K. (2001). The Research Methods Knowledge Base. Second ed. Cincinnati, OH: Atomic Dog Publishing.

c. U.S. Dept. of Justice, Office of Justice Programs, Bureau of Justice Assistance, Center for Program Evaluation. (n.d.). Glossary. Retrieved 5/23/2007 from http://www.ojp.usdoj.gov/BJA/evaluation/glossary/glossary_r.htm.

Additional useful links