The paper studying ETS SourceFinder

Applied Psychological Measurement, Vol. 30 No. 5, September 2006, 394–411
DOI: 10.1177/0146621606288554
http://apm.sagepub.com
The online version of this article can be found at: http://apm.sagepub.com/cgi/content/abstract/30/5/394
Published by: http://www.sagepublications.com
© 2006 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://apm.sagepub.com on June 23, 2008.

Item Difficulty Modeling of Paragraph Comprehension Items

Joanna S. Gorin, Arizona State University
Susan E. Embretson, Georgia Institute of Technology

Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more difficult to model than other item types due to the complexities of quantifying text. However, recent developments in artificial intelligence for text analysis permit quantitative indices to represent cognitive sources of difficulty. The current study attempts to identify generative components for the Graduate Record Examination paragraph comprehension items through the cognitive decomposition of item difficulty. Text comprehension and decision processes accounted for a significant amount of the variance in item difficulties.
The decision model variables contributed significantly to variance in item difficulties, whereas the text representation variables did not. Implications for score interpretation and future possibilities for item generation are discussed.

Index terms: difficulty modeling, construct validity, comprehension tests, item generation

For the past century, the use of standardized tests has been one of the most successful methods of evaluation and placement for large populations. The growing use of such tests has placed an enormous demand on the assessment community to develop large numbers of test questions of sufficient psychometric quality to sustain perpetual testing. One of the most recent advances in large-scale assessment is the promise of a new technology for test developers called item generation (Bejar, 1993; Embretson, 1998; Irvine & Kyllonen, 2002). The benefits of item generation for the assessment field are numerous, but a major challenge exists. Adequate knowledge of the response processes to permit accurate prediction of an item’s psychometric features from its generative components must be acquired (Bejar, 1993). In fact, this requires two types of knowledge: (a) knowledge of the relevant processes guiding the item solution and (b) knowledge of the manipulable task features corresponding to cognitive processing.

To date, generative approaches have been applied to item types in several domains, including mental rotation (Bejar, 1993; Embretson, 1994; Embretson & Gorin, 2001), abstract reasoning (Embretson, 1999), and hidden figures (Bejar & Yocom, 1991), among others. These item types have been particularly suited to generative approaches given the ease with which relevant task features can be manipulated to control processing difficulty. For example, mental rotation item difficulty can be manipulated by specifying the length of a line or the width of an angle.
The instantiation of the generative features is such that processing difficulty can be manipulated reliably by a simple algorithm for either a human item writer or a computerized item generation program. Other item types, specifically those measuring verbal processes, present a unique challenge to item generation. Although a large body of cognitive research on verbal reasoning exists, clear construct definitions and specific processing models for standardized test items can be far more complex than for nonverbal tasks. In addition, the typical tasks used to measure verbal reasoning, such as reading comprehension questions, are difficult to quantify in terms of task features. Some advances in artificial intelligence (AI), such as automatic essay scorers like e-rater (Powers, Burstein, Chodorow, Fowles, & Kukich, 2000) or text parsers like SourceFinder (Passonneau, Hemat, Plante, & Sheehan, 2002), have improved our capabilities of reliably measuring characteristics of texts. However, even with these advances, until the relevant generative features associated with processing are more clearly identified, the technology will need to wait for the theory to catch up.

The current study seeks to gather evidence supporting a cognitive model of processing for reading comprehension items on the Graduate Record Examination (GRE)–Verbal for several purposes: (a) to provide a theoretical rationale for test score interpretation by explaining item difficulty through cognitive processes, (b) to gather evidence of the substantive aspect of the validity of test score interpretation, and (c) to examine possible aspects of item design useful for future developments in item generation.
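
As a rough illustration of what explaining item difficulty through item features can look like, the sketch below fits a one-predictor least-squares line relating a single item feature to item difficulty and reports the proportion of variance explained. It is only a minimal sketch under stated assumptions: the function name `fit_difficulty_model`, the single feature, and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def fit_difficulty_model(feature, difficulty):
    """Fit difficulty = intercept + slope * feature by ordinary least squares.

    Returns (slope, intercept, r_squared). Illustrative only: a realistic
    difficulty model would combine several item features at once.
    """
    if len(feature) != len(difficulty) or len(feature) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    x_bar, y_bar = mean(feature), mean(difficulty)
    # Sums of squares and cross-products around the means.
    s_xx = sum((x - x_bar) ** 2 for x in feature)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(feature, difficulty))
    ss_total = sum((y - y_bar) ** 2 for y in difficulty)
    if s_xx == 0 or ss_total == 0:
        raise ValueError("feature and difficulty must both vary across items")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares of the fitted line.
    ss_resid = sum((y - (intercept + slope * x)) ** 2
                   for x, y in zip(feature, difficulty))
    return slope, intercept, 1 - ss_resid / ss_total


# Hypothetical toy data: one feature value and one difficulty value per item.
feature_values = [1, 2, 3]
item_difficulties = [1, 3, 2]
slope, intercept, r_squared = fit_difficulty_model(feature_values, item_difficulties)
print(slope, intercept, r_squared)
```

The returned `r_squared` plays the role of the "variance accounted for" language used in the abstract, although with one predictor and made-up data it says nothing about the actual findings.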
This study approaches the examination of item processing with a systematic application of the first three steps of Embretson’s Cognitive Design System (CDS; Embretson, 1993, 1998). The conceptual framework of the CDS approach distinguishes between the two aspects of construct validity: construct representation and nomothetic span (Messick, 1995). Construct representation concerns the processes, strategies, and knowledge structures that are involved in item solving and allows cognitive theory to have a central role in test development and interpretation (Embretson & Gorin, 2001). Items can be created by combining processing components of items that link directly back to a comprehensive definition of the construct based in cognitive theory (Bejar, 1993; Embretson, 1994, 1999).

The procedural framework of the CDS not only elaborates the stages involved in developing process models of item performance but also relates item processes to test validity. Embretson (1994) outlines a procedural framework for generating items to measure specific constructs: (a) specify the goals of measurement, (b) identify the features in the task domain, (c) develop a cognitive model, (d) generate the items, (e) evaluate the model of generated tests, (f) bank the items by cognitive complexity, and (g) validate by checking for nomothetic span. The methods in this study focus on the first three stages of the CDS for reading comprehension items, emphasizing the development and validation of a cognitive model, followed by suggestions for future research related to item generation.

Reading Comprehension Questions

Text-based reading comprehension (RC) questions, consisting of short passages followed by a related set of multiple-choice questions, are commonly included on tests of verbal reasoning. The GRE, which is used as a partial criterion for admission to graduate school, requires students to respond to these items as an indicator of general verbal reasoning.
According to Educational Testing Service’s (ETS; 1998a) supporting material for the GRE-V, the verbal ability measure is designed to test one’s ability to reason with words in solving problems. Reasoning effectively in a verbal medium depends primarily on ability to discern, comprehend, and analyze relationships among words or groups of words and within larger units of discourse such as sentences and written passages. Passages should either present an argument or a comparison between concepts, such as theories, models, and approaches. Small parts of the passages can be descriptions, but comparisons are needed to ask inference questions. The difficulty of the reading comprehension items is intended to be based in the passage complexity, not from the difficulty of the question itself. Complexity should therefore arise from the argument and diction complexity (ETS, 1998b). The validity of these assumptions regarding sources of item difficulty is tied intrinsically to the validity of scores-based inferences.

Several studies have examined the relationship between cognitive models and processing difficulty for reading comprehension test questions (Anderson, 1982; Embretson & Wetzel, 1987; Freedle & Kostin, 1991, 1992, 1993; Mitchell, 1983; Sheehan & Ginther, 2000).
The processing components of prime interest vary across models and include propositional density (Embretson & Wetzel, 1987; Kintsch & Van Dijk, 1978), context of to-be-learned information (Landauer, 1998; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998), location of relevant information (Freedle & Kostin, 1992; Sheehan & Ginther, 2000), and correspondence between passage and question information (Alderson, 1990; Embretson & Wetzel, 1987; Freedle & Kostin, 1993; Sheehan & Ginther, 2000). Although the cognitive models examined in previous studies differ from one another, the methodological approaches are similar. Once the relevant strategies and knowledge are integrated into a cohesive cognitive model, related features of existing items can be quantified. The validity of the items and the models can then be interpreted in terms of the empirically established links between task features and processing components.

Embretson and Wetzel (1987) model. Embretson and Wetzel (1987) developed a cognitive processing model of reading comprehension to describe the processing difficulty of the Armed Services Vocational Aptitude Battery (ASVAB) items. To validate the model with these items, they identified stimulus features that were theoretically related to processing components of the model and then scored items in terms of the features. Figure 1 outlines the general order of processing, as well as specific subcomponents of the processes. The model describes sources of cognitive complexity derived from two general processes: text representation and response decision.

Text representation processes consist of the encoding and coherence of the passage for a set of items. The difficulty of encoding is controlled by linguistic features of the passage, particularly vocabulary difficulty (Drum, Calfee, & Cook, 1981; Graves, 1986).
Passages with high levels of vocabulary are more difficult to encode and consequently more difficult to retrieve when responding to comprehension questions. Coherence is the process of connecting word meanings and propositions into a meaningful representation of the text. Kintsch and Van Dijk (1978) described text comprehension as an iterative process of construction and integration, wherein text is processed as propositional units that are continuously integrated with prior knowledge. The construction-integration theory (Kintsch, 1988, 1998), derived from earlier work by Kintsch and Van Dijk, describes text comprehension as cyclic propositional processing. During each processing cycle, propositions are retrieved from the text and arranged into a network. At the integration phase, activation spreads throughout the network and accumulates primarily at points of high interconnectivity. Following each of these cycles, the most highly activated propositions are carried over into the next cycle with working memory for further processing (Kintsch & Van Dijk, 1978).

[Figure 1. Embretson and Wetzel’s (1987) Reading Comprehension Processing Model. Text Representation: Encoding (construction); Coherence Processes (integration). Response Decision: Encoding & Coherence Processes; Text Mapping; Evaluate Truth Status.]

The difficulty of coherence processes is most strongly influenced by the propositional density of the text, which is the ratio of the number of propositions to the total length of the passage. Several studies have concluded that propositionally dense text is difficult to process and integrate for later recall and comprehension (Kintsch, 1994; Kintsch & Keenan, 1973; Kintsch & Van Dijk, 1978).
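
The propositional density ratio defined above can be written out in a few lines. This is a minimal sketch, not the scoring procedure used in any of the cited studies: the function name `propositional_density` is hypothetical, the counts are made up, and passage length is assumed to be a word count because the text does not fix the unit.

```python
def propositional_density(num_propositions, passage_length):
    """Return the number of propositions per unit of passage length.

    Assumption: passage_length is a word count; the source only says
    "total length of the passage" without naming the unit.
    """
    if passage_length <= 0:
        raise ValueError("passage_length must be positive")
    if num_propositions < 0:
        raise ValueError("num_propositions cannot be negative")
    return num_propositions / passage_length


# Hypothetical counts for two passages of equal length.
sparse = propositional_density(20, 200)
dense = propositional_density(50, 200)
print(sparse, dense)
```

Under this definition, two passages of the same length differ in density only through their proposition counts, so the second passage here would be the denser one.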
This finding may be related to limitations in working memory capacity that preclude holding large amounts of information simultaneously. If propositions are not well integrated into the working knowledge representation, then the information may not be available for later recall (Kintsch, 1994; Kintsch & Van Dijk, 1978).

The remainder of the model describes three decision processes: encoding and coherence, text mapping, and evaluating the truth status of the response alternatives. Encoding and coherence are the same as in text representation, except that they apply to questions and response alternatives rather than the passage. Text mapping is the process of relating the propositions in the question and response alternatives to the information retrieved from the passage. Difficulty in text mapping is partially influenced by the amount of information needed from the text to answer the question. According to Embretson and Wetzel (1987), as the amount of text relevant to answering a question increases, so do the demands on memory, encoding, and item difficulty. Finally, evaluating truth status involves a two-stage process of falsification and confirmation of response alternatives.

The decision processes of falsification and confirmation were the strongest predictors of item difficulty in the Embretson and Wetzel (1987) study. These two decision processes describe the extent to which information given in the passage could be used to make decisions regarding the response options. Items with correct responses that were directly confirmable or distractors that were explicitly contradicted by the text required little processing. Their findings are consistent with other research suggesting that the overlap or matching between the text and a question can affect response processes (Alderson, 1990; Freedle & Kostin, 1992, 1993).

Embretson and Wetzel (1987) postulated that decision processes were also affected by vocabulary difficulty of the response options.
The vocabulary level of the response alternatives affected the likelihood that an examinee would consider the alternative as a potential correct response. Distractors with difficult vocabulary were less likely to be processed for consideration as potential alternatives and required less processing than low vocabulary distractors, as measured by response time and item difficulty. The reverse effect was found for the vocabulary level of the correct response. Examinees were less likely to confirm a response alternative if the vocabulary level was high.

In addition to vocabulary level, the phrasing of the information in the alternatives also affects decision processes. The reasoning level of the response alternatives represents the relationship between the structure of the propositions in the alternatives and those in the passage. Anderson (1982) proposed a taxonomy describing the levels of transformation needed to match a question to text. The lowest level is verbatim, in which the exact words used in the question are found in the passage. These items are assumed to be easy because little to no transformation of information must be conducted to identify the location of the item answer. The highest level question is transformed paraphrase, in which neither the order nor the wording of information in the question matches that of the passage text. Items with transformed paraphrase questions are assumed to be hard because ideas in the passage must be reworded and reordered to map the question to the location of the information needed to correctly answer it (Craik & Lockhart, 1972).

Sheehan and Ginther (2000) model. Similar research by Sheehan and Ginther (2000) examined the relationships between task features and item difficulties for main idea RC questions from the Test of English as a Foreign Language (TOEFL-2000). Sheehan and Ginther modeled item difficulties in terms of activation processes by which an individual selects a response alternative. They described a memory-type model of processing, in which the examinee selects the response that is most highly activated in the individual’s mind. First, grossly incorrect distractors are eliminated during early global falsification. The remaining model of item difficulty was defined by item and passage features that define two intermediate structures: activation of the key and activation of the remaining distractors. The activation of the response option is similar to activation of nodes in memory theory; the element with the highest level of activation is most likely to be selected. In the context of multiple-choice questions, the response alternative that is most highly activated is selected as the correct answer. Questions with high key activation and low distractor activation should be easy because the key is far more likely to be selected than any of the incorrect responses.

Sheehan and Ginther (2000) found three types of item and passage feature effects to be critical for defining activation in main idea questions: location effects, correspondence effects, and elaboration of information. Location effects refer to the location within the text of relevant information for answering a particular question. Kintsch (1998) suggested that as comprehension proceeds while reading a text, the location of information in mental representation, the representational text, is related to the location of the information in the text itself. Therefore, information closely positioned in the text is more easily found in and retrieved from memory because it is stored in relatively close proximity. The location of relevant information within the text was found to be related to comprehension item difficulty.
Furthermore, Sheehan and Ginther (2000) found similar results to those of Freedle and Kostin (1993) such that information found earlier in a passage was more easily accessed than information found later in a passage. Sheehan and Ginther also found that for different types of questions on a reading comprehension test (i.e., main idea), the expected location of information differed. Response options for questions were more highly activated when they were in the expected location of the information required to answer the question. The second activation-related effect, correspondence effects, refers to the lexical and semantic similarity between the response option and the text, or what Freedle and Kostin (1993) might call