Expert Judgement

According to Dick and Carey (1996), the expert judgement phase of summative evaluation is used to find out if either current or candidate instruction can meet an organization's identified instructional needs. The following activities are part of the expert judgement phase of summative evaluation when reviewing candidate instruction:

  1. evaluating the congruence between the organization's instructional needs and candidate instruction
  2. evaluating the completeness and accuracy of candidate instruction
  3. evaluating the instructional strategy contained in the candidate instruction
  4. evaluating the utility of the instruction
  5. determining current users' satisfaction with the instruction (p. 323)

The expert judgement phase has already been accomplished if the instruction was tailored to the identified needs of the organization, systematically designed and developed, and put through formative evaluation. However, the instruction must be subjected to expert judgement if the organization is unfamiliar with the instruction and its developmental history (Dick & Carey, 1996). Expert judgement is usually used to select from the available instructional options the one or two that are most promising for a field trial.

Dick, W. and Carey, L. (1996). The systematic design of instruction (4th ed.). New York: Harper Collins.

Field Trial

The purpose of the field trial phase of summative evaluation is to determine the effectiveness of instruction with the target group in the intended setting (Dick & Carey, 1996). There are two parts to the field trial phase: outcomes analysis and management analysis. The outcomes analysis reviews the impact of the instruction on the learner, the job and the organization. Management analysis assesses "instructor and supervisor attitudes related to learner performance, implementation feasibility, and costs" (p. 323).

Dick, W. and Carey, L. (1996). The systematic design of instruction (4th ed.). New York: Harper Collins.

Determine goals of evaluation

Smith and Ragan (1999) identify determining the goals of evaluation as the first step in a goal-based summative evaluation. The most important part of this stage is determining the questions that should be answered as a result of the evaluation. The client organization and/or funding agencies and other stakeholders should identify these questions, which will guide the remainder of the summative evaluation.

Both the client and evaluator should agree on the questions before moving on to subsequent steps of summative evaluation.

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Select indicators of success

In the select indicators of success phase of summative evaluation, Smith and Ragan (1999) recommend that the evaluator and clients "determine where to look for evidence of the impact of the instructional program" (p. 355). Questions such as the following can be used to help target the program's impact: If the program is successful, what will we observe?

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Select orientation of evaluation

Once the questions for summative evaluation are identified, the evaluator and clients must agree on the most appropriate orientation for answering them (Smith & Ragan, 1999). The decision on orientation settles several issues about how those questions will be answered.

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Select design of evaluation

According to Smith and Ragan (1999), "Evaluation designs describe what data will be collected, when the data should be collected, and under what conditions data should be collected in order to answer the evaluation questions" (p. 356). Instructional designers should begin developing this plan during needs assessment, when the learning goals and the reasons for identifying them can still be clearly recalled. Three issues to consider when designing an evaluation are internal validity (many things other than the instructional program can cause changes in learners' performance and attitudes), external validity (the ability to generalize the results of the evaluation to learners or contexts not part of the evaluation), and control (the designer determines the limits of what can be done about internal and external validity). Using comparison groups and randomization can help a designer deal with these issues.
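
As a minimal sketch of what randomized assignment to a comparison group can look like (the roster, seed, and group sizes below are hypothetical, not part of Smith and Ragan's model):

```python
import random

def assign_groups(learners, seed=None):
    """Randomly split a roster into a treatment group (receives the
    instructional program) and a comparison group (does not)."""
    rng = random.Random(seed)      # seeded so the assignment is reproducible
    shuffled = list(learners)      # copy so the original roster is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"treatment": shuffled[:midpoint], "comparison": shuffled[midpoint:]}

# Hypothetical roster used only for illustration.
roster = [f"learner_{i:02d}" for i in range(1, 21)]
groups = assign_groups(roster, seed=42)
print(len(groups["treatment"]), "in treatment,", len(groups["comparison"]), "in comparison")
```

Randomization of this kind supports internal validity by spreading other influences on performance evenly across the groups; it does not by itself guarantee external validity.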

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Design or select evaluation measures

When designing summative evaluation, most evaluators plan for several different measures of the effectiveness of the instructional program (Smith & Ragan, 1999). An evaluator most often plans for measurement in the categories of payoff, learning, attitude, implementation and cost. Multiple measures can also be used within any one of these categories.
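
One way to keep such a plan visible is a simple mapping from each category to its planned instruments; the instruments named below are hypothetical placeholders rather than measures prescribed by Smith and Ragan:

```python
# Hypothetical evaluation plan: each measurement category maps to one or more instruments.
evaluation_measures = {
    "payoff":         ["error-rate records before and after training", "time-to-completion logs"],
    "learning":       ["criterion-referenced posttest", "on-the-job skill checklist"],
    "attitude":       ["end-of-course questionnaire"],
    "implementation": ["instructor observation form"],
    "cost":           ["development and delivery cost worksheet"],
}

for category, instruments in evaluation_measures.items():
    print(f"{category}: {', '.join(instruments)}")
```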

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Collect data

After the evaluator selects or develops appropriate measurement instruments, the next phase of summative evaluation is to plan for the collection of data (Smith & Ragan, 1999). The data collection plan should include a schedule of data collection periods, which is determined by the evaluation design and by the types of payoff and implementation measures. It is the responsibility of the evaluator to ensure that all data collection policies are strictly followed.

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Analyze data

Smith and Ragan (1999) recommend analyzing data in a fashion that makes it easy for the decision makers "to see how the instructional program affected the problem presented in the needs assessment" (p. 360). Descriptive statistics (e.g., means, ranges, frequencies) or inferential statistics (e.g., tests of differences between two instructional programs or from pretest to posttest within the same program) may be required.
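
As a small illustration of both kinds of analysis, assuming hypothetical pretest and posttest scores for the same ten learners (only the Python standard library is used; a real evaluation would normally also report a p-value from statistical software):

```python
import statistics
from collections import Counter
from math import sqrt

# Hypothetical pretest and posttest scores for the same ten learners.
pretest  = [55, 60, 48, 72, 65, 50, 58, 63, 70, 61]
posttest = [70, 78, 60, 85, 80, 68, 66, 79, 88, 75]

# Descriptive statistics on the posttest: mean, range, and frequency.
print("mean:", statistics.mean(posttest))
print("range:", max(posttest) - min(posttest))
print("frequencies:", Counter(posttest))

# Inferential comparison from pretest to posttest: mean gain and a paired t statistic.
gains = [post - pre for pre, post in zip(pretest, posttest)]
n = len(gains)
t = statistics.mean(gains) / (statistics.stdev(gains) / sqrt(n))
print(f"mean gain: {statistics.mean(gains):.1f}, t({n - 1}) = {t:.2f}")
```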

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Report results

In Smith and Ragan's (1999) summative evaluation model, reporting results is the final phase. According to Morris (1978), the following sections should be included in a summative evaluation report:

  1. Summary
  2. Background
  3. Description of evaluation study
  4. Results
  5. Discussion
  6. Conclusions and recommendations

Smith, P. and Ragan, T. (1999). Instructional design (2nd ed.). New York: John Wiley & Sons, Inc.

Morris, L. L. (1978). Program evaluation kit. Beverly Hills, CA: Sage Publications.

Level 1 - Reaction

Level 1 of Kirkpatrick's model of evaluation measures the participants' reaction to the instructional program (Winfrey, 1999). After completing the program, students are asked to evaluate the training, usually with questionnaires sometimes referred to as "happy sheets" or "smile sheets" (Kruse, 2002). This level of evaluation differs from the surveys used in formative evaluation in that the questionnaires are distributed to the entire student population. A typical Level 1 questionnaire might ask about the relevance of the objectives, interest level, interactivity, ease of navigation, and perceived transferability to the workplace. Most organizations conduct at least a Level 1 evaluation because it is the easiest and cheapest to administer.
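
A minimal sketch of how Level 1 questionnaire data might be tallied, assuming hypothetical 1-to-5 ratings on a few of the items mentioned above:

```python
from statistics import mean

# Hypothetical Level 1 ("smile sheet") responses on a 1-5 scale.
responses = [
    {"relevance of objectives": 4, "interest level": 5, "ease of navigation": 3},
    {"relevance of objectives": 5, "interest level": 4, "ease of navigation": 4},
    {"relevance of objectives": 3, "interest level": 4, "ease of navigation": 5},
]

# Average rating per item across the whole student population.
for item in responses[0]:
    print(f"{item}: {mean(r[item] for r in responses):.2f}")
```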

Carliner, S. (2002). Summary of the Kirkpatrick model. Retrieved October 11, 2002, from VNU Business Media, An Overview of Online Learning http://www.vnulearning.com/wp/kirkpatrick.htm

Kruse, K. (2002). Evaluating e-Learning: Introduction to the Kirkpatrick model. Retrieved October 11, 2002, from the eLearning Guru.com Web site: http://www.e-learningguru.com/articles/art2_8.htm

Level 2 - Learning

Level 2 of Kirkpatrick's model of evaluation measures how much participants learned (Carliner, 2002). This is often accomplished with a criterion-referenced test, the criteria being the objectives for the course. This type of evaluation ensures quality through conformance to course requirements. "Assessing at this level moves the evaluation beyond learner satisfaction and attempts to assess the extent students have advanced in skills, knowledge, or attitude" (Winfrey, 1999, Level 2 Evaluation - Learning, para. 1). Several methods may be used to evaluate learning, including formal and informal testing, team assessment, and self-assessment. Usually pre- and posttests are administered to determine the amount of learning that occurs.
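
As a sketch of criterion-referenced scoring with pre- and posttests (the objectives, cutoffs, and scores below are hypothetical):

```python
# Hypothetical criteria: each objective must be met at or above its cutoff proportion correct.
criteria = {"objective_1": 0.80, "objective_2": 0.70, "objective_3": 0.90}

def mastery(scores, criteria):
    """Return which objectives were met and whether every criterion was satisfied."""
    met = {obj: scores.get(obj, 0.0) >= cutoff for obj, cutoff in criteria.items()}
    return met, all(met.values())

pretest  = {"objective_1": 0.50, "objective_2": 0.60, "objective_3": 0.55}
posttest = {"objective_1": 0.85, "objective_2": 0.75, "objective_3": 0.92}

for label, scores in (("pretest", pretest), ("posttest", posttest)):
    met, passed = mastery(scores, criteria)
    print(label, met, "all criteria met:", passed)
```

Comparing the two runs shows the amount of learning in terms of the course objectives rather than relative to other learners.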

Carliner, S. (2002). Summary of the Kirkpatrick model. Retrieved October 11, 2002, from VNU Business Media, An Overview of Online Learning http://www.vnulearning.com/WP/kirkpatrick.htm

Winfrey, E. C. (1999). Kirkpatrick's four levels of evaluation. Retrieved October 11, 2002, from San Diego State University, Encyclopedia of Educational Technology Web site: http://coe.sdsu.edu/eet/Articles/k4levels/index.htm

Level 3 - Transfer

Level 3 of Kirkpatrick's model of evaluation attempts to answer the question, "Are the newly acquired skills, knowledge, or attitude being used in the everyday environment of the learner?" (Winfrey, 1999, Level 3 Evaluation - Transfer, para. 1). Typically a Level 3 evaluation assesses the amount of learned material students actually use in their work environment 6 weeks to 6 months (or longer) after completing a course (Carliner, 2002). This type of evaluation may be conducted in the form of tests, observations, surveys, and interviews with co-workers and supervisors.

Carliner, S. (2002). Summary of the Kirkpatrick model. Retrieved October 11, 2002, from VNU Business Media, An Overview of Online Learning http://www.vnulearning.com/WP/kirkpatrick.htm

Winfrey, E. C. (1999). Kirkpatrick's four levels of evaluation. Retrieved October 11, 2002, from San Diego State University, Encyclopedia of Educational Technology Web site: http://coe.sdsu.edu/eet/Articles/k4levels/index.htm

Level 4 - Results

Level 4 of Kirkpatrick's model of evaluation measures the impact of training from a business perspective. Success of the training may be evaluated in terms of increased production, improved quality, decreased costs, reduced frequency of accidents, increased sales, and even higher profits or return on investment (Winfrey, 1999). According to Kruse (2002), "The only scientific way to isolate training as a variable would be to isolate a representative control group within the larger student population, and then rollout the training program, complete the evaluation, and compare against a business evaluation of the non-trained group" (Level Four: Business Results, para. 1). Unfortunately, Level 4 evaluation is rarely completed because of the difficulty of obtaining appropriate business data and of isolating training as a unique variable.
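
A minimal sketch of the control-group comparison Kruse describes, with hypothetical sales figures and training costs (a real evaluation would need far more care in matching groups and attributing the difference to training):

```python
from statistics import mean

# Hypothetical monthly sales per salesperson for the trained group and an untrained control group.
trained_sales = [120_000, 132_000, 128_000, 141_000]
control_sales = [118_000, 119_000, 121_000, 120_000]

# Treat the difference in group means as the benefit attributable to training,
# then set it against the (hypothetical) cost of developing and delivering the course.
benefit = mean(trained_sales) - mean(control_sales)
training_cost = 5_000
roi = (benefit - training_cost) / training_cost

print(f"estimated benefit: {benefit:,.0f}")
print(f"return on investment: {roi:.0%}")
```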

Kruse, K. (2002). Evaluating e-Learning: Introduction to the Kirkpatrick model. Retrieved October 11, 2002, from the eLearning Guru.com Web site: http://www.e-learningguru.com/articles/art2_8.htm

Winfrey, E. C. (1999). Kirkpatrick's four levels of evaluation. Retrieved October 11, 2002, from San Diego State University, Encyclopedia of Educational Technology Web site: http://coe.sdsu.edu/eet/Articles/k4levels/index.htm