Charles Champlin (2006), a journalist for Time and Life magazines, describes his experience of taking essay tests as a student at Harvard:
“The worst were the essay questions (which seemed only distantly related to whatever you’d read or heard in lectures). They made a statement and then simply said, ‘Discuss.’ O terrifying word, ‘Discuss.’ Nothing so simple as tossing in a few facts retained from all-night cramming. It was meaning that was sought – which was, as I’d already begun to appreciate, the way it should be. But it was a strained step up from the exams I’d known before, when memory, regurgitated, would get you around almost any corner.”
Champlin’s reminiscence reveals some of the strengths and dangers associated with essay questions. They are a wonderful way to test higher-level learning, but they require careful construction to maximize their assessment effectiveness.
I. Strengths Associated with Essay Examinations
Faculty who use essay examinations find them a valuable means of measuring higher-order learning and, when scored properly, a wonderful way to further student learning. To realize these strengths, essay tests require careful preparation and scoring.
1. Essay Questions Test Higher-Level Learning Objectives
Unlike objective test items that are ideally suited for testing students’ broad knowledge of course content in a relatively short amount of time, essay questions are best suited for testing higher-level learning. By nature, they require longer time for students to think, organize and compose their answers.
In the table below, appropriate testing strategies are associated with Bloom’s hierarchy of learning. The action verbs under each domain illustrate the kinds of activities that a test item might assess. Use the verbs when constructing your essay questions so that students know what you expect as they write. While essay questions can assess all the cognitive domains, most educators suggest that due to the time required to answer them, essay questions should not be used if the same material can be assessed through a multiple-choice or objective item. Reserve your use of essay questions for testing higher-level learning that requires students to synthesize or evaluate information.
2. Essay Questions When Scored Properly Can Further Learning
Teachers score essay exams by either the holistic approach or the analytic approach.
The holistic approach involves the teacher reading all the responses to a given essay question and assigning a grade based on the overall quality of the response. Some teachers use a holistic approach by ranking students’ answers into groups of best answers, average answers and poor answers and subdividing the groups to assign grades.
Holistic scoring works best for essay questions that are open-ended and can produce a variety of acceptable answers.
Analytic scoring involves reading the essays for the essential parts of an ideal answer. In this case, you will need to make a list of the major elements that students should include in an answer. You will grade the essays based on how well students’ answers match the components of the model answer.
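The analytic approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element descriptions, the point values, and the idea of recording partial credit as a fraction between 0 and 1 are hypothetical and are not prescribed by the guide.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One major element of the model answer (hypothetical structure)."""
    description: str
    points: float


def analytic_score(model_answer, credit):
    """Sum the points earned for each element of the model answer.

    `credit` maps an element description to the fraction of its points
    (0.0 to 1.0) that the grader judged the essay to have earned.
    Elements missing from `credit` earn nothing.
    """
    total = 0.0
    for element in model_answer:
        fraction = credit.get(element.description, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"credit for {element.description!r} must be in [0, 1]")
        total += element.points * fraction
    return total


# Hypothetical model answer worth 10 points in total.
model_answer = [
    Element("states a clear thesis", 4),
    Element("cites two supporting examples", 4),
    Element("addresses a counterargument", 2),
]

# The grader's judgment for one student's essay.
credit = {"states a clear thesis": 1.0, "cites two supporting examples": 0.5}

score = analytic_score(model_answer, credit)
print(f"{score} out of {sum(e.points for e in model_answer)}")
```

Writing the element list before reading any essays keeps the standard the same for every student, and the per-element credit doubles as a starting point for written comments.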
Whichever method you use to score the exam, holistic or analytic, you should write comments on the students’ papers to enhance their learning. Your comments will help students write better essays for future classes and reinforce what students know and need to learn. Your comments also serve as a useful reminder for you if students come to you with questions about their grades.
II. Dangers to Consider When Giving and Grading Essay Examinations
1. Establish limits within the essay question
The example of Charles Champlin’s experience at Harvard, where his teachers gave a statement and then simply said, ‘Discuss,’ shows a danger in using essay questions. Instructors should build limits into questions so that vague wording does not lead to needless writing: “With some essay questions, students can feel like they have an infinite supply of lead to write a response on an indefinite number of pages about whatever they feel happy to write about. This can happen when the essay question is vague or open to numerous interpretations. Remember that effective essay questions provide students with an indication of the types of thinking and content to use in responding to the essay question” (Reiner, 2002).
Another good way to prevent students from spending excessive time on essays is to give them testing instructions on how long they should spend on test items. McKeachie (2002) gives the following advice: “As a rule of thumb I allow about 1 minute per item for multiple-choice or fill-in-the-blank items, 2 minutes per short-answer question requiring more than a sentence answer, 10 to 15 minutes for a limited essay question, and half-hour to an hour for a broader question requiring more than a page or two to answer.”
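McKeachie’s rule of thumb lends itself to a quick estimate of how long a planned exam will take. The sketch below assumes the per-item minutes quoted above; the category names and the sample exam are hypothetical.

```python
# Minutes per item as (low, high), taken from McKeachie's rule of thumb.
# The dictionary keys are illustrative labels, not standard terms.
MINUTES_PER_ITEM = {
    "multiple_choice": (1, 1),
    "fill_in_the_blank": (1, 1),
    "short_answer": (2, 2),
    "limited_essay": (10, 15),
    "broad_essay": (30, 60),
}


def estimate_minutes(item_counts):
    """Return the (low, high) estimate of total minutes for an exam.

    `item_counts` maps an item type to the number of items of that type.
    """
    low = sum(MINUTES_PER_ITEM[kind][0] * n for kind, n in item_counts.items())
    high = sum(MINUTES_PER_ITEM[kind][1] * n for kind, n in item_counts.items())
    return low, high


# Hypothetical exam: 20 multiple-choice, 5 short-answer, 2 limited essays.
exam = {"multiple_choice": 20, "short_answer": 5, "limited_essay": 2}
low, high = estimate_minutes(exam)
print(f"Plan for roughly {low} to {high} minutes")
```

If the high estimate exceeds the class period, that is a signal to cut items or narrow an essay question before the exam is given.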
2. Remember that essays require more time to score
While essay exams are quicker to prepare than multiple-choice exams, essay exams take much longer to score. You should plan sufficient time for scoring the essays so that you are not pressed for time when final grades are due.
3. Avoid scoring prejudices
Essay exams are subject to scoring prejudices. Reading all of an individual’s essays at the same time can cause either a positive or a negative bias on the part of the reader. If a student’s first essay is strong, the examiner might read the student’s remaining essays with a predisposition that they are also going to be strong. The reverse is also true. To prevent this scoring prejudice, educators suggest reading all the answers to a single essay question at one time.
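For instructors who collect responses electronically, the question-by-question reading order suggested above can be produced with a short script. This is a minimal sketch: the tuple layout is hypothetical, and shuffling the students within each question is an added assumption that goes beyond the advice in the text.

```python
import random


def grading_order(responses, seed=None):
    """Group responses by question so each question is read in one sitting.

    `responses` is a list of (student_id, question_id, text) tuples.
    Within each question the student order is shuffled (an added
    assumption) so that no student is always read first.
    """
    rng = random.Random(seed)
    by_question = {}
    for student, question, text in responses:
        by_question.setdefault(question, []).append((student, text))

    order = []
    for question in sorted(by_question):
        batch = by_question[question]
        rng.shuffle(batch)
        order.extend((question, student, text) for student, text in batch)
    return order


# Hypothetical responses from two students to two questions.
responses = [
    ("s1", "Q1", "first answer"),
    ("s1", "Q2", "second answer"),
    ("s2", "Q1", "third answer"),
    ("s2", "Q2", "fourth answer"),
]

for question, student, text in grading_order(responses, seed=0):
    print(question, student, text)
```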
Champlin, C. (2006). A life in writing: the story of an American journalist. Syracuse: Syracuse University.
McKeachie, W. (2002). McKeachie’s teaching tips (11th ed.). New York: Houghton Mifflin.
Reiner, C., Bothell, T., Sudweeks, R., & Wood, B. (2002). Preparing effective essay questions. (http://testing.byu.edu/info/handbooks/WritingEffectiveEssayQuestions.pdf).
Writing Good Multiple Choice Test Questions
Multiple choice test questions, also known as items, can be an effective and efficient way to assess learning outcomes. Multiple choice test items have several potential advantages:
Versatility: Multiple choice test items can be written to assess various levels of learning outcomes, from basic recall to application, analysis, and evaluation. Because students are choosing from a set of potential answers, however, there are obvious limits on what can be tested with multiple choice items. For example, they are not an effective way to test students’ ability to organize thoughts or articulate explanations or creative ideas.
Reliability: Reliability is defined as the degree to which a test consistently measures a learning outcome. Multiple choice test items are less susceptible to guessing than true/false questions, making them a more reliable means of assessment. The reliability is enhanced when the number of MC items focused on a single learning objective is increased. In addition, the objective scoring associated with multiple choice test items frees them from problems with scorer inconsistency that can plague scoring of essay questions.
Validity: Validity is the degree to which a test measures the learning outcomes it purports to measure. Because students can typically answer a multiple choice item much more quickly than an essay question, tests based on multiple choice items can typically focus on a relatively broad representation of course material, thus increasing the validity of the assessment.
The key to taking advantage of these strengths, however, is construction of good multiple choice items.
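The reliability point about guessing can be made concrete with a little arithmetic. The sketch below assumes blind, independent guessing on every item, four alternatives per multiple choice item, and a hypothetical passing mark of 7 correct out of 10; none of these numbers come from the guide.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent tries (binomial)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_items = 10  # hypothetical test length
passing = 7   # hypothetical passing mark

# True/false: 1 chance in 2 per item. Four alternatives: 1 chance in 4.
p_true_false = prob_at_least(passing, n_items, 1 / 2)
p_multiple_choice = prob_at_least(passing, n_items, 1 / 4)

print(f"true/false:      {p_true_false:.4f}")
print(f"multiple choice: {p_multiple_choice:.4f}")
```

Under these assumptions a pure guesser passes the true/false test about 17% of the time but the four-alternative test less than 1% of the time, which is one way to see why guessing distorts true/false scores more.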
A multiple choice item consists of a problem, known as the stem, and a list of suggested solutions, known as alternatives. The alternatives consist of one correct or best alternative, which is the answer, and incorrect or inferior alternatives, known as distractors.
Constructing an Effective Stem
1. The stem should be meaningful by itself and should present a definite problem. A stem that presents a definite problem allows a focus on the learning outcome. A stem that does not present a clear problem, however, may test students’ ability to draw inferences from vague descriptions rather than serving as a more direct test of students’ achievement of the learning outcome.
2. The stem should not contain irrelevant material, which can decrease the reliability and the validity of the test scores (Haladyna and Downing 1989).
3. The stem should be negatively stated only when significant learning outcomes require it. Students often have difficulty understanding items with negative phrasing (Rodriguez 1997). If a significant learning outcome requires negative phrasing, such as identification of dangerous laboratory or clinical practices, the negative element should be emphasized with italics or capitalization.
4. The stem should be a question or a partial sentence. A question stem is preferable because it allows the student to focus on answering the question rather than holding the partial sentence in working memory and sequentially completing it with each alternative (Statman 1988). The cognitive load is increased when the stem is constructed with an initial or interior blank, so this construction should be avoided.
Constructing Effective Alternatives
1. All alternatives should be plausible. The function of the incorrect alternatives is to serve as distractors, which should be selected by students who did not achieve the learning outcome but ignored by students who did achieve the learning outcome. Alternatives that are implausible don’t serve as functional distractors and thus should not be used. Common student errors provide the best source of distractors.
2. Alternatives should be stated clearly and concisely. Items that are excessively wordy assess students’ reading ability rather than their attainment of the learning objective.
3. Alternatives should be mutually exclusive. Alternatives with overlapping content may be considered “trick” items by test-takers, excessive use of which can erode trust and respect for the testing process.
4. Alternatives should be homogeneous in content. Alternatives that are heterogeneous in content can provide cues to students about the correct answer.
5. Alternatives should be free from clues about which response is correct. Sophisticated test-takers are alert to inadvertent clues to the correct answer, such as differences in grammar, length, formatting, and language choice in the alternatives. It’s therefore important that alternatives:
- have grammar consistent with the stem.
- are parallel in form.
- are similar in length.
- use similar language (e.g., all unlike textbook language or all like textbook language).
6. The alternatives “all of the above” and “none of the above” should not be used. When “all of the above” is used as an answer, test-takers who can identify more than one alternative as correct can select the correct answer even if unsure about other alternative(s). When “none of the above” is used as an alternative, test-takers who can eliminate a single option can thereby eliminate a second option. In either case, students can use partial knowledge to arrive at a correct answer.
7. The alternatives should be presented in a logical order (e.g., alphabetical or numerical) to avoid a bias toward certain positions.
8. The number of alternatives can vary among items as long as all alternatives are plausible. Plausible alternatives serve as functional distractors, which are those chosen by students who have not achieved the objective but ignored by students who have achieved the objective. There is little difference in difficulty, discrimination, and test score reliability among items containing two, three, and four distractors.
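Guideline 7 in the list above, presenting alternatives in a logical order, is easy to automate when items are stored electronically. The following is a minimal sketch that assumes alternatives are plain strings: it sorts numerically when every alternative parses as a number and alphabetically (ignoring case) otherwise. The function name and sample items are hypothetical.

```python
def order_alternatives(alternatives):
    """Return the alternatives in numerical order if all are numbers,
    otherwise in case-insensitive alphabetical order."""
    try:
        return sorted(alternatives, key=float)
    except ValueError:
        # At least one alternative is not a number, so sort as text.
        return sorted(alternatives, key=str.casefold)


# Hypothetical numeric and text alternatives.
print(order_alternatives(["12", "3", "7.5", "100"]))
print(order_alternatives(["mitosis", "Meiosis", "apoptosis"]))
```

Because the position of the answer is then fixed by its value rather than chosen by the item writer, the answer key should be recorded by content and not by letter before reordering.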
Additional Guidelines
1. Avoid complex multiple choice items, in which some or all of the alternatives consist of different combinations of options. As with “all of the above” answers, a sophisticated test-taker can use partial knowledge to achieve a correct answer.
2. Keep the specific content of items independent of one another. Savvy test-takers can use information in one question to answer another question, reducing the validity of the test.
Considerations for Writing Multiple Choice Items that Test Higher-order Thinking
When writing multiple choice items to test higher-order thinking, design questions that focus on higher levels of cognition as defined by Bloom’s taxonomy. A stem that presents a problem that requires application of course principles, analysis of a problem, or evaluation of alternatives is focused on higher-order thinking and thus tests students’ ability to do such thinking. In constructing multiple choice items to test higher order thinking, it can also be helpful to design problems that require multilogical thinking, where multilogical thinking is defined as “thinking that requires knowledge of more than one fact to logically and systematically apply concepts to a …problem” (Morrison and Free, 2001, page 20). Finally, designing alternatives that require a high level of discrimination can also contribute to multiple choice items that test higher-order thinking.
- Burton, Steven J., Sudweeks, Richard R., Merrill, Paul F., and Wood, Bud. How to Prepare Better Multiple Choice Test Items: Guidelines for University Faculty, 1991.
- Cheung, Derek and Bucat, Robert. How can we construct good multiple-choice items? Presented at the Science and Technology Education Conference, Hong Kong, June 20-21, 2002.
- Haladyna, Thomas M. Developing and validating multiple-choice test items, 2nd edition. Lawrence Erlbaum Associates, 1999.
- Haladyna, Thomas M. and Downing, S. M. Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78, 1989.
- Morrison, Susan and Free, Kathleen. Writing multiple-choice test items that promote and measure critical thinking. Journal of Nursing Education 40: 17-24, 2001.