Joanna S. Gorin and Susan E. Embretson, "Item Difficulty Modeling of Paragraph Comprehension Items"
Applied Psychological Measurement, Vol. 30, No. 5, September 2006, pp. 394–411
DOI: 10.1177/0146621606288554
The online version of this article can be found at: http://apm.sagepub.com/cgi/content/abstract/30/5/394
Published by SAGE Publications (http://www.sagepublications.com)
© 2006 SAGE Publications. All rights reserved.
Item Difficulty Modeling of Paragraph
Comprehension Items
Joanna S. Gorin, Arizona State University
Susan E. Embretson, Georgia Institute of Technology
Recent assessment research joining cognitive
psychology and psychometric theory has introduced
a new technology, item generation. In algorithmic
item generation, items are systematically created
based on specific combinations of features that
underlie the processing required to correctly solve
a problem. Reading comprehension items have been
more difficult to model than other item types due to
the complexities of quantifying text. However,
recent developments in artificial intelligence for text
analysis permit quantitative indices to represent
cognitive sources of difficulty. The current study
attempts to identify generative components for the
Graduate Record Examination paragraph
comprehension items through the cognitive
decomposition of item difficulty. Text
comprehension and decision processes accounted
for a significant amount of the variance in item
difficulties. The decision model variables
contributed significantly to variance in item
difficulties, whereas the text representation variables
did not. Implications for score interpretation and
future possibilities for item generation are
discussed. Index terms: difficulty modeling,
construct validity, comprehension tests, item
generation
For the past century, the use of standardized tests has been one of the most successful methods
of evaluation and placement for large populations. The growing use of such tests has placed an
enormous demand on the assessment community to develop large numbers of test questions of suf-
ficient psychometric quality to sustain perpetual testing. One of the most recent advances in large-
scale assessment is the promise of a new technology for test developers called item generation
(Bejar, 1993; Embretson, 1998; Irvine & Kyllonen, 2002). The benefits of item generation for the
assessment field are numerous, but a major challenge exists: adequate knowledge of the response
processes must be acquired to permit accurate prediction of an item's psychometric features from its
generative components (Bejar, 1993). In fact, this requires two types of knowledge:
(a) knowledge of the relevant processes guiding the item solution and (b) knowledge of the manip-
ulable task features corresponding to cognitive processing.
To date, generative approaches have been applied to item types in several domains, including
mental rotation (Bejar, 1993; Embretson, 1994; Embretson & Gorin, 2001), abstract reasoning
(Embretson, 1999), and hidden figures (Bejar & Yocom, 1991), among others. These item types
have been particularly suited to generative approaches given the ease with which relevant task
features can be manipulated to control processing difficulty. For example, mental rotation item
difficulty can be manipulated by specifying the length of a line or the width of an angle. The
instantiation of the generative features is such that processing difficulty can be manipulated reli-
ably by a simple algorithm for either a human item writer or a computerized item generation
program. Other item types, specifically those measuring verbal processes, present a unique chal-
lenge to item generation. Although a large body of cognitive research on verbal reasoning exists,
clear construct definitions and specific processing models for standardized test items can be far
more complex than for nonverbal tasks. In addition, the typical tasks used to measure verbal rea-
soning, such as reading comprehension questions, are difficult to quantify in terms of task fea-
tures. Some advances in artificial intelligence (AI), such as automatic essay scorers like e-rater
(Powers, Burstein, Chodorow, Fowles, & Kukich, 2000) or text parsers like SourceFinder
(Passonneau, Hemat, Plante, & Sheehan, 2002), have improved our ability to measure text
characteristics reliably. Even with these advances, however, the technology must wait for the
theory to catch up: the generative features associated with processing must first be more
clearly identified.
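The feature-combination logic of algorithmic generation described above for nonverbal item types can be sketched minimally as follows. The feature names and levels here are invented purely for illustration; they are not the actual generative parameters of any operational test:

```python
from itertools import product

# Hypothetical generative features for a mental rotation item type.
# Each combination of levels defines one item structure whose processing
# difficulty is assumed to be controlled by the feature levels themselves.
rotation_angles = [45, 90, 135]   # width of the rotation angle (degrees)
line_lengths = [1.0, 2.0]         # length of the to-be-rotated line segment

def generate_item_specs():
    """Enumerate item specifications from all feature combinations."""
    return [
        {"angle": angle, "length": length}
        for angle, length in product(rotation_angles, line_lengths)
    ]

specs = generate_item_specs()
# 3 angle levels x 2 length levels -> 6 distinct item structures
```

Either a human item writer or a computer program could then instantiate concrete items from each specification, which is precisely why such item types have proven well suited to generative approaches.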
The current study seeks to gather evidence supporting a cognitive model of processing for read-
ing comprehension items on the Graduate Record Examination (GRE)–Verbal for several pur-
poses: (a) to provide a theoretical rationale for test score interpretation by explaining item
difficulty through cognitive processes, (b) to gather evidence of the substantive aspect of the valid-
ity of test score interpretation, and (c) to examine possible aspects of item design useful for future
developments in item generation. This study approaches the examination of item processing with
a systematic application of the first three steps of Embretson’s Cognitive Design System (CDS;
Embretson, 1993, 1998). The conceptual framework of the CDS approach distinguishes between
the two aspects of construct validity: construct representation and nomothetic span (Messick,
1995). Construct representation concerns the processes, strategies, and knowledge structures that
are involved in item solving and allows cognitive theory to have a central role in test development
and interpretation (Embretson & Gorin, 2001). Items can be created by combining processing
components of items that link directly back to a comprehensive definition of the construct based in
cognitive theory (Bejar, 1993; Embretson, 1994, 1999).
The procedural framework of the CDS not only elaborates the stages involved in developing
process models of item performance but also relates item processes to test validity. Embretson
(1994) outlines a procedural framework for generating items to measure specific constructs: (a)
specify the goals of measurement, (b) identify the features in the task domain, (c) develop a cogni-
tive model, (d) generate the items, (e) evaluate the model of generated tests, (f) bank the items by
cognitive complexity, and (g) validate by checking for nomothetic span. The methods in this study
focus on the first three stages of the CDS for reading comprehension items, emphasizing the devel-
opment and validation of a cognitive model, followed by suggestions for future research related to
item generation.
Reading Comprehension Questions
Text-based reading comprehension (RC) questions, consisting of short passages followed by
a related set of multiple-choice questions, are commonly included on tests of verbal reasoning. The
GRE, which is used as a partial criterion for admission to graduate school, requires students to
respond to these items as an indicator of general verbal reasoning. According to Educational Test-
ing Service’s (ETS; 1998a) supporting material for the GRE-V, the verbal ability measure is
designed to test one’s ability to reason with words in solving problems. Reasoning effectively in
a verbal medium depends primarily on ability to discern, comprehend, and analyze relationships
among words or groups of words and within larger units of discourse such as sentences and written
passages. Passages should present either an argument or a comparison between concepts, such as
theories, models, and approaches. Small parts of the passages can be descriptions, but comparisons
are needed to ask inference questions. The difficulty of the reading comprehension items is intended
to be based on the complexity of the passage, not on the difficulty of the question itself. Complexity
should therefore arise from the complexity of the argument and diction (ETS, 1998b). The validity of
these assumptions regarding sources of item difficulty is tied intrinsically to the validity of score-
based inferences.
Several studies have examined the relationship between cognitive models and processing
difficulty for reading comprehension test questions (Anderson, 1982; Embretson & Wetzel, 1987;
Freedle & Kostin, 1991, 1992, 1993; Mitchell, 1983; Sheehan & Ginther, 2000). The processing
components of prime interest vary across models and include propositional density (Embretson &
Wetzel, 1987; Kintsch & Van Dijk, 1978), context of to-be-learned information (Landauer, 1998;
Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998), location of relevant information
(Freedle & Kostin, 1992; Sheehan & Ginther, 2000), and correspondence between passage and
question information (Alderson, 1990; Embretson & Wetzel, 1987; Freedle & Kostin, 1993;
Sheehan & Ginther, 2000). Although the cognitive models examined in previous studies differ from
one another, the methodological approaches are similar. Once the relevant strategies and
knowledge structures are integrated into a cohesive cognitive model, related features of existing items can be
quantified. The validity of the items and the models can then be interpreted in terms of the empiri-
cally established links between task features and processing components.
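This common methodology, quantifying item features and linking them empirically to difficulty, typically takes the form of a regression of calibrated item difficulties on the coded features. A minimal sketch follows; all feature codes and difficulty values are invented for illustration and are not data from any of the cited studies:

```python
import numpy as np

# Hypothetical coded features for 8 items (columns: vocabulary level,
# propositional density) and their calibrated difficulties -- all invented.
features = np.array([
    [0.2, 0.30], [0.5, 0.45], [0.8, 0.40], [0.3, 0.60],
    [0.9, 0.55], [0.4, 0.35], [0.7, 0.65], [0.6, 0.50],
])
difficulty = np.array([-1.2, -0.3, 0.1, 0.0, 0.9, -0.8, 0.7, 0.2])

# Ordinary least squares: difficulty ~ intercept + weighted feature scores.
X = np.column_stack([np.ones(len(features)), features])
beta, *_ = np.linalg.lstsq(X, difficulty, rcond=None)

# R-squared: the proportion of item difficulty variance that the cognitive
# model's features explain -- the key validity statistic in such studies.
predicted = X @ beta
ss_res = ((difficulty - predicted) ** 2).sum()
ss_tot = ((difficulty - difficulty.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
```

The size and significance of each coefficient in `beta` then indicate which hypothesized processing components contribute to item difficulty.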
Embretson and Wetzel (1987) model. Embretson and Wetzel (1987) developed a cognitive pro-
cessing model of reading comprehension to describe the processing difficulty of the Armed Ser-
vices Vocational Aptitude Battery (ASVAB) items. To validate the model with these items, they
identified stimulus features that were theoretically related to processing components of the model
and then scored items in terms of the features. Figure 1 outlines the general order of processing,
as well as specific subcomponents of the processes. The model describes sources of cognitive
complexity derived from two general processes: text representation and response decision.
Text representation processes consist of the encoding and coherence of the passage for a set of
items. The difficulty of encoding is controlled by linguistic features of the passage, particularly
vocabulary difficulty (Drum, Calfee, & Cook, 1981; Graves, 1986). Passages with difficult
vocabulary are harder to encode and consequently harder to retrieve when responding to
comprehension questions. Coherence is the process of connecting word meanings and
propositions into a meaningful representation of the text. Kintsch and Van Dijk (1978) described text
comprehension as an iterative process of construction and integration, wherein text is processed
as propositional units that are continuously integrated with prior knowledge. The construction-
integration theory (Kintsch, 1988, 1998), derived from earlier work by Kintsch and Van Dijk, describes
text comprehension as cyclic propositional processing. During each processing cycle, propositions
are retrieved from the text and arranged into a network. At the integration phase, activation spreads
throughout the network and accumulates primarily at points of high interconnectivity. Following
each of these cycles, the most highly activated propositions are carried over into the next cycle in
working memory for further processing (Kintsch & Van Dijk, 1978).

Figure 1
Embretson and Wetzel's (1987) Reading Comprehension Processing Model
[Flowchart: a text representation stage (encoding/construction and coherence/integration processes) followed by a response decision stage (encoding and coherence processes, text mapping, and evaluation of truth status).]

The difficulty of coherence
processes is most strongly influenced by the propositional density of the text, which is the ratio of
the number of propositions to the total length of the passage. Several studies have concluded that
propositionally dense text is difficult to process and integrate for later recall and comprehension
(Kintsch, 1994; Kintsch & Keenan, 1973; Kintsch & Van Dijk, 1978). This finding may be related
to limitations in working memory capacity that preclude holding large amounts of information
simultaneously. If propositions are not well integrated into the working knowledge representation,
then the information may not be available for later recall (Kintsch, 1994; Kintsch & Van Dijk,
1978).
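The propositional density measure described above is simply the ratio of proposition count to passage length. A toy sketch follows; the proposition count is supplied by hand, since actual propositional analysis requires manual coding or a parser:

```python
def propositional_density(n_propositions: int, n_words: int) -> float:
    """Ratio of the number of propositions to passage length in words."""
    if n_words == 0:
        raise ValueError("passage must contain at least one word")
    return n_propositions / n_words

passage = "The committee, having reviewed the evidence, rejected the proposal."
n_words = len(passage.split())
# Suppose a manual propositional analysis identifies 3 propositions
# (reviewing, rejecting, and the temporal relation between them).
density = propositional_density(3, n_words)
```

Two passages of equal length can thus differ sharply in density, and the denser one is predicted to be harder to integrate and recall.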
The remainder of the model describes three decision processes: encoding and coherence, text
mapping, and evaluating the truth status of the response alternatives. Encoding and coherence are
the same as in text representation, except that they apply to questions and response alternatives
rather than the passage. Text mapping is the process of relating the propositions in the question
and response alternatives to the information retrieved from the passage. Difficulty in text mapping
is partially influenced by the amount of information needed from the text to answer the question.
According to Embretson and Wetzel (1987), as the amount of text relevant to answering a question
increases, so do the demands on memory, encoding, and item difficulty.
Finally, evaluating truth status involves a two-stage process of falsification and confirmation of
response alternatives. The decision processes of falsification and confirmation were the strongest
predictors of item difficulty in the Embretson and Wetzel (1987) study. These two decision pro-
cesses describe the extent to which information given in the passage could be used to make deci-
sions regarding the response options. Items with correct responses that were directly confirmable
or distractors that were explicitly contradicted by the text required little processing. Their findings
are consistent with other research suggesting that the overlap or matching between the text and
a question can affect response processes (Alderson, 1990; Freedle & Kostin, 1992, 1993).
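The two-stage falsification/confirmation decision can be sketched as follows. The text-support flags are hand-coded here for illustration; this is a hypothetical representation, not the coding scheme of any cited study:

```python
def evaluate_truth_status(alternatives):
    """Two-stage decision over response alternatives: falsify, then confirm.

    Each alternative is a dict with hand-coded flags for whether the passage
    explicitly contradicts it or directly confirms it (hypothetical coding).
    """
    # Stage 1: falsification -- eliminate options the passage contradicts.
    survivors = [a for a in alternatives if not a["contradicted"]]
    # Stage 2: confirmation -- prefer an option directly supported by the text.
    confirmed = [a for a in survivors if a["confirmed"]]
    candidates = confirmed or survivors
    return candidates[0]["label"] if candidates else None

alternatives = [
    {"label": "A", "contradicted": True,  "confirmed": False},
    {"label": "B", "contradicted": False, "confirmed": False},
    {"label": "C", "contradicted": False, "confirmed": True},
]
# Falsification removes A; confirmation then selects C over B.
```

Items whose options are all resolvable at these two stages require little processing, consistent with the findings summarized above.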
Embretson and Wetzel (1987) postulated that decision processes were also affected by vocabu-
lary difficulty of the response options. The vocabulary level of the response alternatives affected
the likelihood that an examinee would consider the alternative as a potential correct response. Dis-
tractors with difficult vocabulary were less likely to be processed for consideration as potential
alternatives and required less processing than low vocabulary distractors, as measured by
response time and item difficulty. The reverse effect was found for the vocabulary level of the cor-
rect response. Examinees were less likely to confirm a response alternative if the vocabulary level
was high.
In addition to vocabulary level, the phrasing of the information in the alternatives also affects
decision processes. The reasoning level of the response alternatives represents the relationship
between the structure of the propositions in the alternatives and those in the passage. Anderson
(1982) proposed a taxonomy describing the levels of transformation needed to match a question to
text. The lowest level is verbatim, in which the exact words used in the question are found in the
passage. These items are assumed to be easy because little to no transformation of information
must be conducted to identify the location of the item answer. The highest level question is trans-
formed paraphrase, in which neither the order nor the wording of information in the question
matches that of the passage text. Items with transformed paraphrase questions are assumed to be
hard because ideas in the passage must be reworded and reordered to map the question to the loca-
tion of the information needed to correctly answer it (Craik & Lockhart, 1972).
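A crude lexical proxy for the low (verbatim) end of Anderson's taxonomy is the fraction of question words appearing verbatim in the passage. The sketch below is a simplification for illustration, not Anderson's (1982) full coding scheme:

```python
def verbatim_overlap(question: str, passage: str) -> float:
    """Fraction of question word tokens found verbatim in the passage.

    High values suggest a verbatim item (little transformation needed);
    low values suggest paraphrased or transformed wording. This is a crude
    lexical proxy, not a full taxonomy of transformation levels.
    """
    question_words = question.lower().split()
    passage_words = set(passage.lower().split())
    matches = sum(word in passage_words for word in question_words)
    return matches / len(question_words) if question_words else 0.0

passage = "the glaciers retreated rapidly after the last ice age"
verbatim_q = "the glaciers retreated rapidly"
paraphrased_q = "how quickly did ancient sheets of frozen water shrink"
```

Under the taxonomy, the first question would be predicted easy (overlap of 1.0) and the second hard (overlap of 0.0), since the latter requires rewording and reordering to locate the relevant text.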
Sheehan and Ginther (2000) model. Similar research by Sheehan and Ginther (2000) examined
the relationships between task features and item difficulties for main idea RC questions from the
Test of English as a Foreign Language (TOEFL-2000). Sheehan and Ginther modeled item diffi-
culties in terms of activation processes by which an individual selects a response alternative. They
described a memory-type model of processing, in which the examinee selects the response that is
most highly activated in the individual's mind. First, grossly incorrect distractors are eliminated
during early global falsification. The remaining item difficulty model was defined by item and
passage features that define two intermediate structures: activation of the key and activation of the
remaining distractors. The activation of the response option is similar to activation of nodes in
memory theory; the element with the highest level of activation is most likely to be selected. In the
context of multiple-choice questions, the response alternative that is most highly activated is
selected as the correct answer. Questions with high key activation and low distractor activation
should be easy because the key is far more likely to be selected than any of the incorrect responses.
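Under this activation account, the predicted response is simply the most activated alternative, and item easiness grows with the margin between key activation and the strongest distractor. A minimal sketch with hypothetical activation values:

```python
def predicted_response(activations: dict) -> str:
    """Select the response alternative with the highest activation."""
    return max(activations, key=activations.get)

def key_margin(activations: dict, key: str) -> float:
    """Activation margin of the key over the strongest distractor.

    Large positive margins predict easy items; margins near zero (or
    negative) predict hard items. All values here are hypothetical.
    """
    distractor_max = max(v for k, v in activations.items() if k != key)
    return activations[key] - distractor_max

easy_item = {"A": 0.9, "B": 0.2, "C": 0.1, "D": 0.15}   # key = "A"
hard_item = {"A": 0.5, "B": 0.48, "C": 0.45, "D": 0.4}  # key = "A"
```

The easy item's key stands far above its distractors, whereas the hard item's distractors are activated almost as strongly as the key, mirroring the high-key/low-distractor prediction in the text.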
Sheehan and Ginther (2000) found three types of item and passage feature effects to be critical
for defining activation in main idea questions: location effects, correspondence effects, and elabo-
ration of information. Location effects refer to the location within the text of relevant information
for answering a particular question. Kintsch (1998) suggested that as comprehension proceeds
while reading a text, the location of information in the mental representation of the
text is related to the location of the information in the text itself. Therefore, information closely
positioned in the text is more easily found in and retrieved from memory because it is stored in rel-
atively close proximity. The location of relevant information within the text was found to be
related to comprehension item difficulty. Furthermore, Sheehan and Ginther (2000) found similar
results to those of Freedle and Kostin (1993) such that information found earlier in a passage was
more easily accessed than information found later in a passage. Sheehan and Ginther also found
that for different types of questions on a reading comprehension test (i.e., main idea), the expected
location of information differed. Response options for questions were more highly activated when
they were in the expected location of the information required to answer the question.
The second activation-related effect, correspondence effects, refers to the lexical and semantic
similarity between the response option and the text, or what Freedle and Kostin (1993) might call