# Language Testing Examples

This hypothetical language test was introduced in Mislevy (1995) in order to illustrate how Bayesian networks could be used to untangle evidence coming from "integrated tasks"---tasks that tap more than one language modality.

## Purpose

Language generally consists of four modalities: Reading, Writing, Speaking and Listening. Reading and Listening can be measured in isolation. For example, multiple-choice Reading questions can have stimulus material, instructions and options all presented as written text. Similarly, Listening items can be created using audio instructions, stimulus and options. Writing and Speaking are more difficult to test in isolation: at a minimum, the instructions need to be in either text or audio form. However, integrated tasks, which use more than one modality, are highly valued in the language education community; after all, conversation consists of alternately using receptive and productive language skills.

The problem with integrated tasks is that if the student does not respond correctly, the blame must be apportioned among the skills that were tested. For example, if after reading a short passage a student asks a question which is difficult to understand and may or may not be on topic, how do we know whether the student is having difficulty with Reading, with Speaking, or with both? In this simple example, the receptive modalities, Reading and Listening, are tested with pure single-proficiency tasks, and the productive modalities, Writing and Speaking, are tested with integrated tasks.

## Proficiency Model

The proficiency model has four variables, Reading, Listening, Writing and Speaking, all of which are modelled with three levels. This is a saturated model, with all four variables dependent on each other. The receptive skills, Reading and Listening, are placed earlier in the order because foreign language learners usually acquire those skills first.

*(Figure: the proficiency model graph.)*

## Task Models

There are four task models, taken from Mislevy (1995): pure single-modality tasks for Reading and Listening, and integrated tasks for Writing and Speaking.
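The blame-apportionment problem described above can be illustrated with a tiny two-skill example. This is a minimal sketch, not part of the original model: the priors, the slip/guess noise rates, and the conjunctive (both-skills-needed) response model are all assumptions made for illustration.

```python
# Hypothetical illustration: apportioning "blame" for a failed integrated
# Speaking task that also requires Reading.  All numbers are made up.
from itertools import product

P_READ = 0.7             # assumed prior P(Reading mastered)
P_SPEAK = 0.6            # assumed prior P(Speaking mastered)
SLIP, GUESS = 0.1, 0.2   # assumed noise on the observed outcome

def p_correct(read, speak):
    """P(task correct | skills): conjunctive model with slip/guess noise."""
    return 1 - SLIP if (read and speak) else GUESS

# Posterior over the joint skill state given an incorrect response,
# by brute-force enumeration over the four combinations.
joint = {}
for read, speak in product([True, False], repeat=2):
    prior = (P_READ if read else 1 - P_READ) * (P_SPEAK if speak else 1 - P_SPEAK)
    joint[(read, speak)] = prior * (1 - p_correct(read, speak))
total = sum(joint.values())
posterior = {state: p / total for state, p in joint.items()}

p_read_low = sum(p for (read, _), p in posterior.items() if not read)
p_speak_low = sum(p for (_, speak), p in posterior.items() if not speak)
print(f"P(Reading weak | wrong)  = {p_read_low:.3f}")
print(f"P(Speaking weak | wrong) = {p_speak_low:.3f}")
```

Because the wrong answer is evidence against both parents at once, the posterior probability of weakness rises for Reading and Speaking simultaneously; the network apportions blame in proportion to the priors and the response model rather than assigning it to a single skill.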
As this was a conceptual paper, the test was never actually built. The simulations below assume that variants can be made of these tasks. The "fixed" simulations assume that the variants are identical except for incidental task model variables. The "random" simulations assume that the variants range in difficulty around the default parameters in the evidence model. The "high" ("low") variants assume that the radical task model variables have been manipulated to make the task very difficult (easy).

## Evidence Models

This section gives only an outline of the evidence models. For conditional probability tables, see the links below.
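The four variant-generation schemes described above ("fixed", "random", "high", "low") can be sketched as perturbations of a default difficulty parameter. The function name, the spread of the "random" scheme, and the size of the "high"/"low" shift are illustrative assumptions, not values from the original evidence models.

```python
# Sketch of the four variant-generation schemes; all numeric choices
# (spread=0.3, shift=0.5) are assumptions made for illustration.
import random

def variant_difficulty(default, scheme, rng, spread=0.3, shift=0.5):
    """Return a difficulty parameter for one replicated task variant."""
    if scheme == "fixed":    # identical except for incidental variables
        return default
    if scheme == "random":   # vary around the default evidence-model value
        return default + rng.gauss(0.0, spread)
    if scheme == "high":     # radical variables pushed toward difficult
        return default + shift
    if scheme == "low":      # radical variables pushed toward easy
        return default - shift
    raise ValueError(f"unknown scheme: {scheme}")

rng = random.Random(42)
for scheme in ["fixed", "random", "high", "low"]:
    print(scheme, round(variant_difficulty(0.0, scheme, rng), 3))
```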
The picture below shows the proficiency model and all four evidence models (using the original Mislevy, 1995, model structures and parameterization).

*(Figure: the proficiency model together with the four evidence models.)*

## Assembly Model

The original Mislevy (1995) paper did not provide a complete assessment; it described only four hypothetical tasks. As this is rather short, we created longer forms by treating the original tasks as task models and replicating them. We produced two forms.
The replication was done by generating new links with the same structure as the evidence model. The parameters for the links were determined by one of the four algorithms described above (fixed, random, high and low).
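The replication step can be sketched as follows. This is a hypothetical sketch: the task names, the number of replications, and the parent sets for the integrated tasks (Speaking paired with Reading, as in the passage-then-question example; Writing paired with Listening) are assumptions, not the structure of the original files.

```python
# Hypothetical sketch of assembling a longer form by replicating the four
# task models; each replica is a new task whose link keeps the same
# parent structure as its evidence model.  Parent sets are illustrative.
TASK_MODELS = {
    "Reading":   ["Reading"],               # pure receptive task
    "Listening": ["Listening"],             # pure receptive task
    "Writing":   ["Listening", "Writing"],  # integrated task (assumed pairing)
    "Speaking":  ["Reading", "Speaking"],   # integrated task (assumed pairing)
}

def assemble_form(n_replications):
    """Replicate each task model n times; each replica is a distinct task
    with a fresh link that shares the evidence model's parent structure."""
    form = []
    for name, parents in TASK_MODELS.items():
        for i in range(1, n_replications + 1):
            form.append({"task": f"{name}-{i}", "parents": list(parents)})
    return form

form_a = assemble_form(3)
print(len(form_a), "tasks")  # 4 task models x 3 replications each
```

In a full implementation, each replica's link parameters would then be drawn by one of the fixed/random/high/low schemes rather than copied verbatim.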
In theory, one could create new simulated test designs by mixing and matching from the various variants.

## Data Sets

The complete model description for both the original and reparameterized models, several randomly generated test forms, and random data sets generated from those test forms are available at: https://pluto.coe.fsu.edu/BNinEA/LanguageTesting/

## References

Mislevy, R. J. (1995). Test theory and language-learning assessment. *Language Testing*, 12(3), 341--369.