WO2001004862A2 - Method for automatically producing a computerized adaptive testing questionnaire - Google Patents

Info

Publication number
WO2001004862A2
WO2001004862A2 PCT/US2000/019002 US0019002W WO0104862A2 WO 2001004862 A2 WO2001004862 A2 WO 2001004862A2 US 0019002 W US0019002 W US 0019002W WO 0104862 A2 WO0104862 A2 WO 0104862A2
Authority
WO
WIPO (PCT)
Prior art keywords
questions
question
irt
irt model
statistical modeling
Prior art date
Application number
PCT/US2000/019002
Other languages
French (fr)
Other versions
WO2001004862A3 (en
Inventor
Thierry Levy
Original Assignee
Quiz Studio, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quiz Studio, Inc.
Priority to AU60913/00A (AU6091300A)
Publication of WO2001004862A2
Publication of WO2001004862A3

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Definitions

  • This invention relates to the field of computerized, interactive skills assessment and to statistical validation of adaptive questionnaires in particular.
  • Computerized Adaptive Testing refers to skill assessments that receive feedback from a test-taker and dynamically adapt the skill level, or difficulty, of subsequent questions put to the test-taker.
  • CAT Computerized Adaptive Testing
  • a CAT system is able to calculate an approximation of the skill level of the test-taker in order to next ask the most relevant question available in a set of questions.
  • CAT is a fast and accurate way of determining the proficiency of people in a given field, and yet it has been out of reach for almost all tests because of the difficulty and expenses of transforming a regular questionnaire into a CAT questionnaire.
  • a CAT system is based on Item Response Theory (IRT).
  • IRT Item Response Theory
  • the current state of the art is a three-parameter model, where the three parameters are indicative of: a difficulty level of the question; a discrimination of the question; and guessing.
  • a human specialist is required to begin with a set of questions and produce from it a CAT questionnaire.
  • the human specialist referred to as a psychometrician
  • the psychometrician then gathers data by administering the questionnaire on a sample population and statistically analyzing the results using an IRT to produce a CAT questionnaire.
  • producing the CAT questionnaire requires a long time and is expensive, due to the human involvement of psychometricians, statisticians, etc.
  • What is needed instead is a method for automatically producing a CAT questionnaire from a set of questions with reduced involvement of human specialists.
  • This invention is a method for generating a statistically validated CAT questionnaire on a computer.
  • An object of the invention is to enable a user to transform a set of questions into a CAT questionnaire.
  • Another object of the invention is to enable a user with no special training to transform a set of questions into a CAT questionnaire.
  • FIG. 1A illustrates the steps for transforming a regular questionnaire into a CAT questionnaire according to an embodiment.
  • FIG. 1B shows a process flow for transforming statistically non-calibrated questions into statistically calibrated questions according to an embodiment.
  • FIG. 2 illustrates calibration and statistical analysis steps according to an embodiment.
  • FIG. 3 is a flow chart summarizing a system embodiment for analyzing the global results of the calibration and statistical analysis for each question in a questionnaire.
  • FIG. 4 is a flow chart summarizing a system embodiment for diagnosing the results of the calibration and statistical analysis for each question in a questionnaire.
  • FIG. 1A illustrates the overall process flow in an embodiment where the method is used to build a skill assessment questionnaire.
  • the present invention is not limited to this application and may be applied to other applications. Alternate embodiments may be based on the same overall structure using different algorithms, formulas and/or different decision trees to produce similar types of results. Embodiments extend over fields where IRT may apply.
  • an alternate embodiment is a system to automatically build fast and efficient market survey systems (computerized adaptive surveys) where IRT is relevant.
  • Preferred embodiments have a 3-parameter model; however, other embodiments have a 2-parameter model or a 1-parameter model.
  • a formulary for an exemplary IRT model is given below at the end of the description.
  • a user authors a set of questions.
  • the only limitation placed on the type of questions is that the answer to a particular question must be either correct or incorrect. That is, there are no partially correct answers. Partially correct answers have to be considered as incorrect answers.
  • the user builds a questionnaire. This comprises sub-steps (not shown) of gathering questions, some of which may already have been calibrated (statistically validated) inside another questionnaire.
  • the questions must pertain to the same subject matter. If some of the questions have been previously calibrated, the calibration is most preferably to have been with a relevant sample population. That is, the same type of population as the one that will take the final CAT questionnaire produced by the method of this invention.
  • an indication is supplied of the number of sample candidates needed to calibrate the questions that are, as yet, not calibrated.
  • This indicative figure may be based on empirical laws that take into account the number of non-calibrated questions and the number of calibrated questions as well as the previous results of the calibration. As described, the number of sample candidates given at this step is only an indication. If the calibration does not give good results, the program produces a new estimation of the number of candidates needed.
  • the questionnaire is posted on the Internet or on an Intranet in order to gather data for the calibration and analysis. While such computer networks are preferred, other embodiments have an examinee take the questionnaire without using a computer network.
  • sample candidates take the test, or questionnaire, as a regular sequential test.
  • the test may be split into several parts with test candidates taking only a portion of the test.
  • results of preceding statistical analyses and calibration are reviewed to determine which questions are not suitable. If particular questions seem unsuitable, possible reasons are determined. Optionally, a report containing the results of this post-calibration analysis may be generated.
  • a generated report is given to the user.
  • the report contains two types of information. First, a report concerning the overall calibration and then several sub-reports concerning questions that showed a potential problem. Here, a user sees a report and decides on further action. Typical "Global" actions are: having more sample candidates take the test; automatically removing the bad questions and creating a questionnaire with the remainder; and reviewing the per question reports. Reviewing the question may still imply making a later choice between the two first global actions, above. The last global action, above, may not be proposed to the user if results show that the calibration globally failed. The user may be provided additional advice for that choice.
  • an evaluation for global failure of the calibration is made.
  • a per-question review shows the problem that appeared for some of the questions, the possible causes for the problem and the recommended actions in each case.
  • Possible "per-question" actions include: removing the question; modifying the key (the correct answer to the question); and modifying the question (rephrasing it, for instance).
  • If a user chooses to modify some of the questions or add new questions (see block 109), the questionnaire may be recalibrated with new sample candidates.
  • If the user modifies only the keys (correct answers) of some questions, the calibration may be restarted and the statistical analysis continued with the same data.
  • Block 112 is reached only when no question was modified. Possibly some questions were removed (those that showed a problem).
  • the questionnaire is a CAT test or a statistically validated questionnaire. It may now be used in a CAT test-taking system, or in a sequential test-taking system or in other types of systems.
  • the IRT parameters of each question as calculated at block 105, may be imported into a CAT test-taking system.
  • FIG. 1B illustrates how questions are managed according to an embodiment.
  • an author creates questions at blocks 130, 135 and 140.
  • the genesis of the questions may be direct, indirect, or by modification.
  • Logically, questions have two states: calibrated and non-calibrated. In practice, this may be a Boolean flag stored with the question in a database. The only way a question can change from a non-calibrated to calibrated state is through statistical analysis. Questions are created in a non-calibrated state, as illustrated by block 145. Questions are then calibrated by a statistical analysis at block
  • a question and calculated IRT parameters may be used in a CAT questionnaire concerning the same field and designed for the same type of population as used for the calibration in block 150. If calibration at block 150 reveals problems with the questions at block 170, the question may be removed entirely, block
  • FIG. 2 is a description of sub-steps included in block 105 (see FIG. 1A) according to an embodiment. The formulae and algorithm involved are detailed below.
  • Initializations including initialization of the discrete distribution of the levels (i.e. the proficiency variable), are performed at block 200.
  • all of the data are imported from a database.
  • initial values of 3 IRT parameters are calculated using standard statistics, as described in detail below.
  • a process loop including blocks 203, 204 and 205 obtains precise values for the IRT parameters of the questions by iteration, calculating a Bayes modal estimate (or maximum marginal likelihood estimate) of the three IRT parameters "a," "b" and "c."
  • an initial estimate of the IRT parameters "a," "b" and "c" for a question is obtained at block 202 of FIG. 2 using a standard statistical analysis of the answers given by candidates to this question. For example, consider a particular question posed to candidates. Let "p" be the proportion of candidates who answered this question correctly. Let "r" be the bi-serial correlation between the total score of candidates and the fact that they answered the particular question correctly (the score is simply the proportion of questions answered correctly).
  • N is the number of candidates that answered the question
  • s_n is the score of the nth candidate
  • x_n is 1 if candidate number n answered the question correctly, 0 if not.
  • the IRT "c” parameter is defined as the reciprocal of the number of possible answers for this question.
  • the IRT difficulty parameter "d" is calculated to take the guessing into account.
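The initial statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's formulary: a point-biserial (Pearson) correlation stands in for the bi-serial correlation, and the function name `initial_item_stats` and its arguments are hypothetical.

```python
import math

def initial_item_stats(scores, correct, n_choices):
    """Return (p, r, c) for one question.

    scores    -- s_n, total score (proportion correct) of each candidate
    correct   -- x_n, 1 if candidate n answered this question correctly, else 0
    n_choices -- number of possible answers for this question
    """
    n = len(scores)
    p = sum(correct) / n                      # proportion answering correctly
    mean_s = sum(scores) / n
    var_s = sum((s - mean_s) ** 2 for s in scores) / n
    var_x = p * (1 - p)
    cov = sum((s - mean_s) * (x - p) for s, x in zip(scores, correct)) / n
    # Assumption: point-biserial correlation used as a simple proxy for "r".
    r = cov / math.sqrt(var_s * var_x) if var_s > 0 and var_x > 0 else 0.0
    c = 1.0 / n_choices                       # reciprocal of the number of answers
    return p, r, c
```

For example, `initial_item_stats([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1], 4)` describes a four-choice question that only the two highest-scoring candidates answered correctly.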
  • the general algorithm used for the Bayes modal estimate in blocks 203-205 is an EM (Estimation - Maximization) algorithm.
  • the first part of this algorithm is the estimation step at block 203.
  • an estimation of the number of candidates in each level is calculated, as well as the proportion of candidates in each level that answered correctly each question.
  • the second part of the EM algorithm is the maximization step at block 204.
  • the parameters "a,” "b” and "c" are calculated to maximize a complete likelihood function.
  • Sub-steps in blocks 203-204 are detailed below.
  • Block 205 is a condition to end the EM algorithm. If the changes in the parameters calculated at block 204 are less than a test value, or if the maximum number of loops for the algorithm was reached, the program exits the EM algorithm and proceeds to block 206. Further detail regarding block 205 is set forth below after an explanation of the preceding blocks. At block 206, a very accurate estimation of the level of the sample candidates is calculated. At block 206, the IRT model and information function are used as in a CAT test-taking system.
  • standardized residuals are calculated to determine if the question truly fits the model.
  • an answer/level correlation is calculated for each proposed answer of each question.
  • the answer/level correlation is the bi-serial correlation between the estimated proficiency of the candidate and the fact that he gave a particular answer to a given question or not.
  • An endorsement rate calculated at block 209 is the proportion of candidates that gave a particular answer to a given question.
  • an endorsement rate is calculated for each proposed answer to each question.
  • the following formulas and algorithms are those applied during sub-steps at block 203 of FIG. 2.
  • the goal of the estimation step is to find an estimation of the number of candidates for each q_k level and the number of candidates who have a q_k level and who answered question j correctly.
  • the estimation step includes calculating those values using the following formulae:
  • nl %) y ⁇ u ⁇ v, (20)
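The estimation step described above can be illustrated with the following Python sketch. It is written under assumptions rather than taken from the patent's formulae: the response function is the common three-parameter logistic form without a scaling constant, and the names `p3pl` and `e_step` are hypothetical.

```python
import math

def p3pl(theta, a, b, c):
    """Assumed 3-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def e_step(Y, items, levels, weights):
    """Expected counts per level.

    Y       -- N x J list of 0/1 responses
    items   -- list of (a, b, c) tuples, one per question
    levels  -- the K discrete levels q_k
    weights -- the K prior probabilities of the levels
    Returns (n_k, r_jk): expected number of candidates at each level, and
    expected number at each level who answered question j correctly.
    """
    K, J = len(levels), len(items)
    n_k = [0.0] * K
    r_jk = [[0.0] * K for _ in range(J)]
    for row in Y:
        # Unnormalised posterior of each level for this candidate.
        post = []
        for q, w in zip(levels, weights):
            like = w
            for y, (a, b, c) in zip(row, items):
                p = p3pl(q, a, b, c)
                like *= p if y else (1.0 - p)
            post.append(like)
        total = sum(post)
        for k in range(K):
            h = post[k] / total               # posterior weight of level k
            n_k[k] += h
            for j, y in enumerate(row):
                if y:
                    r_jk[j][k] += h
    return n_k, r_jk
```

By construction the expected counts in `n_k` sum to the number of candidates, and each row of `r_jk` sums to the number of correct answers observed for that question.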
  • the second part of the EM algorithm is the maximization step at block 204 of FIG. 2.
  • the IRT parameters "a,” "b” and “c” are calculated to maximize a complete likelihood function.
  • the following formulae and algorithms are those applied during the step noted 204 in FIG. 2.
  • a new estimation of the three IRT parameters for each question is calculated by numerically solving a set of equations.
  • the Bayesian estimation includes solving a set of equations to find the maximum of a likelihood function (denoted L). For this, the point where the derivative of L is null is determined.
  • the set of equations can be split in J simple sets of three equations, each set corresponding to one question.
  • g_a is the prior distribution of the a parameter
  • g_b is the prior distribution of the b parameter
  • g_cj is the prior distribution of the c_j parameter.
  • the three formulae above represent the derivative of the likelihood with respect to a_j, b_j and c_j, which are null at the point that is the maximum for the complete likelihood.
  • la_jk^(s), lb_jk^(s), lc_jk^(s) are functions of a_j, b_j and c_j.
  • La_j, Lb_j, Lc_j are functions of a_j, b_j and c_j.
  • laa_jk^(s), lbb_jk^(s), lcc_jk^(s), lab_jk^(s), lac_jk^(s), lbc_jk^(s) are functions of a_j, b_j and c_j.
  • Laa_j^(s), Lbb_j^(s), Lcc_j^(s), Lab_j^(s), Lac_j^(s), Lbc_j^(s) are functions of a_j, b_j and c_j.
  • intermediate values are calculated and replace the expressions in the above formulas for efficiency. More precisely, the intermediate values are:
  • the gradient method includes modifying the parameters using a fraction of the gradient as the increment. This method is relatively slow but is the most stable for finding the maximum. Precisely, the formulas are:
  • K_1 may be either 0.0005 or 0.00025.
  • the parameters are modified using a fraction of the increment used for a Newton-Raphson method.
  • This method is more stable than the normal Newton-Raphson method and less stable than the gradient method, but it is faster than the gradient method and slower than the normal Newton-Raphson method.
  • K_2 is a real number. In a preferred embodiment, K_2 is 0.1.
  • the Newton-Raphson method includes solving the equation with an order one approximation of the (La, Lb, Lc) vector, that is, an order two approximation of L. This method is the least stable of the three but the fastest.
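The three update rules can be contrasted on a one-dimensional toy problem. The sketch below is an assumption-laden illustration: the patent updates the three parameters of a question jointly using a 3 by 3 matrix of second derivatives, whereas this example uses a single variable, and the function names are hypothetical.

```python
def gradient_step(x, grad, k1=0.0005):
    """Gradient method: move by a small fraction K_1 of the gradient."""
    return x + k1 * grad

def newton_step(x, grad, hess, k2=1.0):
    """Newton-Raphson step; k2 < 1 gives the modified (damped) variant."""
    return x - k2 * grad / hess

# Toy objective to maximise: L(x) = -(x - 2)^2, so L'(x) = -2(x - 2), L''(x) = -2.
x0 = 0.0
grad0 = -2.0 * (x0 - 2.0)
hess0 = -2.0

slow_but_stable = gradient_step(x0, grad0)          # tiny move toward 2
damped = newton_step(x0, grad0, hess0, k2=0.1)      # one tenth of the Newton move
full = newton_step(x0, grad0, hess0)                # jumps straight to the maximum
```

On this quadratic the full Newton-Raphson step reaches the maximum at once, the damped step covers a tenth of the distance, and the gradient step barely moves, which mirrors the speed and stability ordering stated in the text.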
  • the iterative process is done in two nested loops.
  • the outer loop will be called the "trial" loop
  • the inner loop will be called the "phase" loop.
  • La_j^(s), Lb_j^(s), Lc_j^(s), Laa_j^(s), Lbb_j^(s), Lcc_j^(s), Lab_j^(s), Lac_j^(s), Lbc_j^(s) are calculated.
  • the Hessian determinant which is the determinant of the A matrix defined above is calculated:
  • this inner loop is executed at most a thousand times for each trial. If, at the end of the thousand loops, there is no convergence, another trial is commenced.
  • the first trial is a "normal" trial.
  • the initial values taken for a_j^(t), b_j^(t) and c_j^(t) are A_0, B_0 and C_0.
  • the value for K_1 is 0.0005.
  • the initial values taken for a_j^(t), b_j^(t) and c_j^(t) are:
  • ⁇ ' P a + 1 ⁇ 5 ⁇ a random b, (,) - ⁇ h + 1.5 ⁇ h random 2 (56) random,
  • random_1, random_2 and random_3 are three uncorrelated random values between 0 and 1.
  • the value for K_1 is 0.00025.
  • the initial values taken for a_j^(t), b_j^(t) and c_j^(t) are:
  • random_1, random_2 and random_3 are three uncorrelated random values between 0 and 1.
  • the value for K_1 is 0.00025.
  • the inner loop is called "phase" loop, because there can be different phases in the process that use a different method to estimate the parameters.
  • there are 3 different phases: a first phase using the gradient method to calculate the parameters; a second phase using the modified Newton-Raphson method; and a third phase using the Newton-Raphson method.
  • the program can switch several times to a same phase.
  • a trial starts with the first phase.
  • the start values A_0, B_0 and C_0 are saved in the variables A_1, B_1 and C_1.
  • a switch to the second phase and then to the third phase should occur to find accurate results more quickly.
  • there is a complete branching system to detect whether a phase is converging enough or diverging, in order to switch from one phase to the other.
  • a typical procedure for an embodiment follows. Once La_j^(s), Lb_j^(s), Lc_j^(s), Laa_j^(s), Lbb_j^(s), Lcc_j^(s), Lab_j^(s), Lac_j^(s), Lbc_j^(s), G_j^(s) and H_j^(s) are calculated, the formulas for a, b and c that correspond to the current phase are applied. Then, tests are made to detect if a change of phase should occur. If the tests show that the current phase is diverging, a switch is made to a previous phase and the current values of a, b and c are replaced by the corresponding saved values A_1, B_1, C_1.
  • the criterion for determining when a switch from one phase to the other should occur is also changed. If the test shows that the current phase converged, a switch is made to the next step and the current values of "a," "b" and "c" are saved in A_1, B_1, C_1. In the case when the determinant is too small to apply the Newton-Raphson method safely (normal or modified), a switch is made directly to the first phase, even before doing the calculations corresponding to the current phase.
  • limit is a variable which is compared to the square norm of the gradient as a criterion to switch from one phase to the other.
  • the initial value of "Limit" is
  • Count is the number of loops spent in the same phase.
  • a stands for a s) , b for b s) , c for c s) , H for H, (s) and N for TABLE 1
  • an E step is performed, and then an M step.
  • all a, b and c parameters are compared with the values they had in the previous step.
  • the maximum of the absolute values of these differences is termed the maximum change in the parameters. In a preferred embodiment, if this maximum change is less than 0.05, the calibration is terminated because an adequate estimation of the parameters is complete. Whatever the changes in the parameters, however, after 12 loops the EM calibration is terminated in a preferred embodiment because continuing further will not bring additional precision.
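The stopping rule just described (maximum parameter change below 0.05, or 12 loops) can be sketched as follows. The thresholds come from the text; the function `run_em` and the callable `em_iteration`, which stands for one estimation step followed by one maximization step, are hypothetical names introduced for illustration.

```python
def run_em(params, em_iteration, tol=0.05, max_loops=12):
    """Repeat E+M iterations until the parameters stabilise.

    params       -- list of (a, b, c) tuples, one per question
    em_iteration -- hypothetical callable returning updated params
    Returns (final params, number of loops executed).
    """
    loops = 0
    while loops < max_loops:
        new_params = em_iteration(params)
        loops += 1
        # Maximum absolute change over every a, b and c of every question.
        max_change = max(
            abs(new - old)
            for new_item, old_item in zip(new_params, params)
            for new, old in zip(new_item, old_item)
        )
        params = new_params
        if max_change < tol:
            break
    return params, loops
```

The loop cap means the function always returns, even when an iteration never settles.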
  • an estimate of the level of the test candidates is determined.
  • the following formulas and algorithms are applied to arrive at the determination.
  • preferred embodiments first calculate intermediate values:
  • the information function is a function of a continuous variable and can be defined as follows:
  • the θ variable is continuous, and a Newton-Raphson method is used to find the maximum of the information function, which is the level of the candidate.
  • the first and second differentials of the information function with respect to θ are calculated. Since only the differentials of the information function are needed, the denominator of the information function, which is a constant, is unneeded. Therefore, define I_i(θ) as the modified information function (the logarithm of the real information function without the denominator). Let It_i(θ) be the first differential and Itt_i(θ) be the second differential of I_i(θ).
  • the first case is when y_ij is 1. Here, only itr_j^(s) and ittr_j^(s) are calculated.
  • the second case is when y_ij is 0.
  • a solution algorithm for an embodiment follows. For each question, an iterative process is used to calculate the estimation of the level of the candidate. This calculation is based on a Newton-Raphson method. Let (s) be the index of the current step. At each step, the values of the derivative and the second derivative of I_i(θ), called respectively It_i(θ) and Itt_i(θ), are calculated at the point θ_i^(s) using the formulas of the previous section. Then the value of θ is updated applying the formula:
  • the criterion for ending this iterative process is the absolute value of It_i(θ_i^(s)). If this value is less than 10^-7, the θ_i^(s) sequence has converged and the last θ_i^(s) is taken as the value of θ_i. If after 20 steps the absolute value of It_i(θ_i^(s)) is still greater than 10^-7, the θ_i^(s) sequence is considered not to have converged and θ_i^(0) is taken as the value of θ_i.
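The convergence rule above (tolerance 10^-7, at most 20 steps, fall back to the starting value) can be sketched generically. This is a minimal illustration assuming the standard Newton-Raphson update, θ minus first derivative over second derivative; the function name `estimate_level` and the callables `d1` and `d2`, which stand for the first and second derivatives of the modified information function, are hypothetical.

```python
def estimate_level(d1, d2, theta0=0.0, tol=1e-7, max_steps=20):
    """Newton-Raphson search for the level that zeroes the first derivative.

    Returns (theta, converged). If the first derivative is still not below
    tol after max_steps updates, the starting value theta0 is returned.
    """
    theta = theta0
    for _ in range(max_steps):
        g = d1(theta)
        if abs(g) < tol:
            return theta, True
        theta = theta - g / d2(theta)     # assumed standard Newton-Raphson update
    if abs(d1(theta)) < tol:
        return theta, True
    return theta0, False
```

For a toy concave objective with derivative `-(theta - 0.7)` and constant second derivative `-1`, the search converges to 0.7 after a single update.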
  • a residual is calculated.
  • the following formulae and algorithms are applied.
  • a solution algorithm of an embodiment first calculates the intermediate values S_k. Then, equation (84) is applied by calculating P_j(q_k) for each value of k.
  • a bi-serial correlation is determined.
  • the answer/level bi-serial correlation calculation is an extra statistical analysis used to determine how well a particular answer is correlated to the level of the candidate. Normally, the right answer's correlation should be the greatest and positive. Ideally, the other answers' correlations should all be negative. Those values are used to detect irrelevant questions or keying errors (when the right answer was not set correctly). In an embodiment, the following formulae and algorithms are applied.
  • Let n_j be the number of classes of answers for the jth question.
  • z_ijn may be defined, which equals 1 if candidate i gave an answer of the nth class for question number j, and 0 otherwise.
  • the item/level correlation for question j and answer n is defined by:
  • a solution algorithm calculates the average value of the level θ while calculating the level of each sample candidate. As well, the endorsement rate is calculated while building the classes of given answers. Then, equation (85) is applied for each class of answers for each question.
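The endorsement rate and answer/level correlation for each class of answers can be illustrated as below. Equation (85) is not reproduced in this text, so the sketch assumes a point-biserial correlation between the estimated level and the 0/1 indicator of having given a particular answer; the function name `answer_stats` is hypothetical.

```python
import math
from collections import Counter

def answer_stats(levels, answers):
    """Return {answer: (endorsement_rate, correlation)} for one question.

    levels  -- estimated level of each sample candidate
    answers -- the answer class each candidate gave to this question
    """
    n = len(levels)
    mean = sum(levels) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in levels) / n)
    stats = {}
    for answer, count in Counter(answers).items():
        rate = count / n                      # endorsement rate
        if sd == 0 or count == n:
            corr = 0.0                        # correlation undefined; report 0
        else:
            mean_sel = sum(t for t, a in zip(levels, answers) if a == answer) / count
            # Assumed point-biserial form of the answer/level correlation.
            corr = (mean_sel - mean) / sd * math.sqrt(rate / (1.0 - rate))
        stats[answer] = (rate, corr)
    return stats
```

With levels `[-1, -0.5, 0.5, 1]` and answers `["B", "B", "A", "A"]`, answer "A" gets a strongly positive correlation and "B" a strongly negative one, which is the pattern the text expects when "A" is the key.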
  • FIG. 3 is the schematic process of the global analysis of the calibration results and statistical analysis according to an embodiment.
  • a preferred embodiment includes a series of conditional branches, corresponding to block 106 of FIG. 1A. Messages and recommended actions contained in FIG. 3 are detailed below for an embodiment, as are values for the tests at blocks 301, 303 and 304.
  • block 301 checks if any estimated "a," "b" or "c" parameter of any question is Not a Number (noted NaN). This occurs when a non-allowable numerical operation occurs during the calculations (such as division by zero or operations on infinite values).
  • a condition for "too many questions showing a problem" for an embodiment is: the proportion of questions showing a problem is greater than 0.2. What is termed a "question showing a problem” is a question for which a report was generated, as described below.
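The two global conditions described above, the NaN check and the 0.2 threshold on the proportion of problem questions, can be sketched as follows. The thresholds come from the text; the function name `global_check` and the returned labels are hypothetical and do not correspond to the patent's messages.

```python
import math

def global_check(item_params, flagged, max_flagged_ratio=0.2):
    """Classify the overall calibration result.

    item_params -- list of (a, b, c) tuples, one per question
    flagged     -- indices of questions for which a report was generated
    """
    # Any NaN parameter means the calibration failed numerically.
    if any(math.isnan(v) for params in item_params for v in params):
        return "calibration_failed"
    # Too many questions showing a problem.
    if len(flagged) / len(item_params) > max_flagged_ratio:
        return "too_many_problem_questions"
    return "ok"
```

Note that the comparison is strict, so exactly 20 percent of flagged questions still passes.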
  • questions showing no problem, as defined above are classified in five groups according to difficulty levels. The intervals for the difficulty levels are, for an embodiment: [-3; -0.8416[ first interval: low level
  • FIG. 4 is the schematic process of the per-question analysis of the calibration results and statistical analysis. It includes a series of conditional branches. This process is performed for each question and corresponds to the step 302 of FIG. 3. The messages and recommended actions of this figure are detailed below for an embodiment as are values for the tests at blocks 401 to
  • Test 401 means that for question number j, the answer that has the highest answer/level correlation
  • Test 402 means that for question number j, the highest answer/level correlation (C_jn) is negative. Note that under "normal" circumstances this should never occur.
  • Test 403 means that for question j, c_j > 0.4.
  • Test 404 means that for question j, a_j < 0.51.
  • Test 405 means that for question j, b_j < -3.
  • Test 406 means that for question j, b_j > 3.
  • Test 407 means that for question number j, the answer/level correlation (C_jn) of the right answer is less than 2 times the second highest answer/level correlation.
  • Test 411 means that the answer that has the highest answer/level correlation (C_jn) is a partially correct answer (this can occur with multiple response questions, for instance; note that for the calibration, these answers were considered incorrect).
  • Test 412 means that the answer that has the second highest answer/level correlation (C_jn) is a partially correct answer (this can occur with multiple response questions, for instance; note that for the calibration, these answers were considered incorrect).
  • Tests 408 and 413 to 421 means that for question j, ⁇ > 2.
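The threshold tests 402 to 407 listed above can be sketched as one function. The numeric thresholds are those stated in the text; the function name `question_flags` and the flag labels are hypothetical, and tests whose definitions are incomplete here (401, 408, 413 to 421) are deliberately left out.

```python
def question_flags(a, b, c, right_corr, other_corrs):
    """Return the list of problems detected for one question.

    a, b, c     -- calibrated IRT parameters of the question
    right_corr  -- answer/level correlation of the keyed (right) answer
    other_corrs -- answer/level correlations of the other answers
    """
    ranked = sorted([right_corr] + list(other_corrs), reverse=True)
    flags = []
    if ranked[0] < 0:                         # test 402
        flags.append("all_correlations_negative")
    if c > 0.4:                               # test 403
        flags.append("high_guessing")
    if a < 0.51:                              # test 404
        flags.append("low_discrimination")
    if b < -3:                                # test 405
        flags.append("too_easy")
    if b > 3:                                 # test 406
        flags.append("too_difficult")
    if len(ranked) > 1 and right_corr < 2 * ranked[1]:   # test 407
        flags.append("key_not_dominant")
    return flags
```

A well-behaved question such as `question_flags(1.2, 0.0, 0.25, 0.6, [-0.2, -0.3])` returns an empty list.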
  • preferred embodiments include a three-parameter IRT model.
  • the three parameters are: "a," referred to as the discrimination of the question; "b," referred to as the level of the question; and "c," referred to as the pseudo-guessing of the question.
  • θ is the level of the candidate
  • j is the index of the question and ranges between 1 and J, the number of questions.
  • N is the number of sample candidates that took the test.
  • J is the number of questions.
  • the observed data are the responses of the N candidates to the J questions, which are contained in an N by J matrix called Y.
  • y_ij is 1 if candidate i answered question j correctly and 0 if candidate i answered question j incorrectly.
  • each examinee has a level θ_i, which is missing data; θ is referred to as the latent variable.
  • the latent variable is considered to be a discrete variable that can take K known discrete values q_k, with k ranging from 1 to K and the q_k evenly distributed between -Max and +Max. Therefore, each θ_i can take any of the q_k values.
  • π_k is the probability that a candidate has q_k as his level
  • (π_1, π_2, ..., π_K) is the distribution of the levels.
  • The distribution π is taken as a normal Gaussian (with 0 as mean and 1 as standard deviation).
  • The distribution of the levels will sometimes be considered as a continuous variable; in that case, it is called G(θ).
  • This function is a Gaussian with 0 as mean and 1 as standard deviation.
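The discrete level distribution described above can be built as in the sketch below. The values of K and Max are not given in this text, so they are left as arguments; the function name `level_grid` is hypothetical, and normalising the Gaussian density over the grid is an assumption about how the π_k are obtained.

```python
import math

def level_grid(K, max_level):
    """Return (levels, weights): K evenly spaced q_k in [-Max, +Max]
    and probabilities pi_k proportional to a standard normal density."""
    step = 2.0 * max_level / (K - 1)
    levels = [-max_level + k * step for k in range(K)]
    density = [math.exp(-q * q / 2.0) for q in levels]   # mean 0, sd 1
    total = sum(density)
    weights = [d / total for d in density]               # sums to 1
    return levels, weights
```

For instance, `level_grid(5, 2.0)` yields the levels -2, -1, 0, 1, 2 with symmetric weights that peak at 0.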
  • Bayesian estimation of IRT parameters requires a prior distribution for each variable of the IRT model.
  • a preferred embodiment uses the same distribution for all the a parameters of all the questions, the same distribution for all the b parameters, but a different distribution for each c_j parameter.
  • Preferred embodiments use a lognormal distribution for the IRT "a" parameter:
  • μ_a is referred to as the mean for this distribution and σ_a as the standard deviation; μ'_a and σ'_a are then defined by
  • μ_a is 1.28 and σ_a is 0.2.
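The patent's own definition of the lognormal parameters is not reproduced in this text. As a hedged illustration only, the sketch below uses the standard moment-matching conversion from the mean and standard deviation of a lognormal variable to the mean and standard deviation of its logarithm; the patent may define the primed quantities differently, and the function name `lognormal_params` is hypothetical.

```python
import math

def lognormal_params(mean, sd):
    """Return (mu_log, sigma_log) such that a lognormal variable with these
    log-scale parameters has the given mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu_log = math.log(mean) - sigma2 / 2.0
    return mu_log, math.sqrt(sigma2)

# Values stated in the text for the "a" parameter prior.
mu_log, sigma_log = lognormal_params(1.28, 0.2)
```

The conversion can be checked by mapping back: the implied mean `exp(mu_log + sigma_log**2 / 2)` recovers 1.28.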
  • μ_b is the mean for this distribution, and σ_b the standard deviation.
  • μ_b is 0 and σ_b is 2.
  • M_cj is the number of possible answers for question number j
  • K_c is a constant (which is the same for all the questions).
  • K_c is 0.25.
  • Candidates with high proficiency tend to give an answer that is not the one entered as a correct answer.
  • the answer given by candidates with high proficiency is <data>.
  • the calibration shows a high guessing parameter for this question, which indicates that there is too high a probability (<data>) of guessing the right answer.
  • Recommended Action 8 Modify this question in such a way that there are more plausible alternatives in the answers or remove it.
  • the calibration shows a very high difficulty level for this question (the proportion of the candidates that answered correctly is only <data>).
  • Recommended Action 14 Remove this question or, if possible, have candidates with more extreme levels (both high and low level to prevent biases) take the test to recalibrate this question.
  • Message 15 The calibration results are not relevant for this question. This can be due to the following problem: candidates with high proficiency relatively often give an answer that is not the one entered as a correct answer. The second most given answer is <data>.

Abstract

This invention is a method for generating a statistically validated CAT questionnaire on a computer. According to the invention, questions are calibrated with statistical modeling and a user is supplied with information indicative of the appropriateness of the question to a CAT questionnaire. Based on the information provided by the statistical modeling, the user may amend the set of questions, or corresponding answers, or a sample population used in the statistical modeling responsive to expert recommendations provided by the method. In an iterative manner, then, a user with no specialized training may arrive at a statistically validated CAT questionnaire.

Description

METHOD FOR AUTOMATICALLY PRODUCING A COMPUTERIZED ADAPTIVE TESTING QUESTIONNAIRE
BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates to the field of computerized, interactive skills assessment and to statistical validation of adaptive questionnaires in particular.
Description of Related Art
Computerized Adaptive Testing (CAT) refers to skill assessments that receive feedback from a test-taker and dynamically adapt the skill level, or difficulty, of subsequent questions put to the test-taker. Thus, for every question that the test-taker answers, a CAT system is able to calculate an approximation of the skill level of the test-taker in order to next ask the most relevant question available in a set of questions. CAT is a fast and accurate way of determining the proficiency of people in a given field, and yet it has been out of reach for almost all tests because of the difficulty and expenses of transforming a regular questionnaire into a CAT questionnaire.
Typically, a CAT system is based on Item Response Theory (IRT). According to IRT, the probability that a test-taker of a certain proficiency answers a particular question correctly is a mathematical function that can be described with a few parameters. The current state of the art is a three-parameter model, where the three parameters are indicative of: a difficulty level of the question; a discrimination of the question; and guessing. In order to implement a CAT test, then, some of the three IRT parameters for each question are needed. In related art, a human specialist is required to begin with a set of questions and produce from it a CAT questionnaire. The human specialist, referred to as a psychometrician, reviews a set of questions for bias, irrelevancy, ambiguity and other factors and assists the questionnaire's author in an iterative improvement of the questionnaire. The psychometrician then gathers data by administering the questionnaire to a sample population and statistically analyzing the results using an IRT model to produce a CAT questionnaire. In these related methods, producing the CAT questionnaire takes a long time and is expensive, due to the human involvement of psychometricians, statisticians, etc.
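As an illustration of the three-parameter model mentioned above, the following minimal sketch evaluates the probability of a correct answer, assuming the logistic form P = c + (1 - c) / (1 + e^(-a(q - b))) that the detailed description uses later. The function name `p_correct` and the sample values are hypothetical.

```python
import math

def p_correct(q, a, b, c):
    """Probability of a correct answer for proficiency q under a
    three-parameter logistic model (assumed form):
    a = discrimination, b = difficulty, c = guessing."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (q - b)))

# A candidate whose level equals the difficulty sits halfway
# between the guessing floor c and certainty.
print(p_correct(0.0, 1.0, 0.0, 0.25))
```

The guessing parameter acts as a floor: even a very weak candidate answers correctly with probability close to c.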
What is needed instead is a method for automatically producing a CAT questionnaire from a set of questions with reduced involvement of human specialists.
SUMMARY OF INVENTION
This invention is a method for generating a statistically validated CAT questionnaire on a computer. An object of the invention is to enable a user to transform a set of questions into a CAT questionnaire. Another object of the invention is to enable a user with no special training to transform a set of questions into a CAT questionnaire.
These and other objects of the invention are achieved in an embodiment that calibrates questions in a set of questions with statistical modeling and supplies a user with information indicative of the appropriateness of the question to a CAT questionnaire. Based on the information provided by the statistical modeling, the user may amend the set of questions, or corresponding answers, or a sample population used in the statistical modeling. In an iterative manner, the user arrives at a statistically validated CAT questionnaire. A preferred embodiment guides the user with expert recommendations regarding improvements to the questionnaire and operates over a computer network.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1A illustrates the steps for transforming a regular questionnaire into a CAT questionnaire according to an embodiment.
FIG. 1B shows a process flow for transforming statistically non-calibrated questions into statistically calibrated questions according to an embodiment.
FIG. 2 illustrates calibration and statistical analysis steps according to an embodiment.
FIG. 3 is a flow chart summarizing a system embodiment for analyzing the global results of the calibration and statistical analysis for each question in a questionnaire.
FIG. 4 is a flow chart summarizing a system embodiment for diagnosing the results of the calibration and statistical analysis for each question in a questionnaire.
DETAILED DESCRIPTION
FIG. 1A illustrates the overall process flow in an embodiment where the method is used to build a skill assessment questionnaire. However, the present invention is not limited to this application and may be applied to other applications. Alternate embodiments may be based on the same overall structure using different algorithms, formulas and/or different decision trees to produce similar types of results. Embodiments extend over fields where IRT may apply. For example, an alternate embodiment is a system to automatically build fast and efficient market survey systems (computerized adaptive surveys) where IRT is relevant. Preferred embodiments have a 3-parameter model; however, other embodiments have a 2-parameter model or a 1-parameter model. A formulary for an exemplary IRT model is given below at the end of the description.
Referring to FIG. 1A, block 100, a user authors a set of questions. The only limitation placed on the type of questions is that the answer to a particular question must be either correct or incorrect. That is, there are no partially correct answers. Partially correct answers have to be considered as incorrect answers.
At block 101, the user builds a questionnaire. This comprises sub-steps (not shown) of gathering questions, some of which may already have been calibrated (statistically validated) inside another questionnaire. The questions must pertain to the same subject matter. If some of the questions have been previously calibrated, the calibration should most preferably have been performed with a relevant sample population, that is, the same type of population as the one that will take the final CAT questionnaire produced by the method of this invention.
At block 102, an indication is supplied of the number of sample candidates needed to calibrate the questions that are, as yet, not calibrated. This indicative figure may be based on empirical laws that take into account the number of non-calibrated questions and the number of calibrated questions as well as the previous results of the calibration. As described, the number of sample candidates given at this step is only an indication. If the calibration does not give good results, the program produces a new estimation of the number of candidates needed.
At block 103, the questionnaire is posted on the Internet or on an Intranet in order to gather data for the calibration and analysis. While such computer networks are preferred, other embodiments have an examinee take the questionnaire without using a computer network.
At block 104, sample candidates take the test, or questionnaire, as a regular sequential test. In differing embodiments, the test may be split into several parts with test candidates taking only a portion of the test.
At block 105, all data (i.e., the answers of the sample candidates to the questions) are collected, and calculations are performed to determine the IRT parameters of the questions and whether each question is suitable for a CAT or statistically valid test. These sub-steps are fully described below for an embodiment in FIG. 2.
At block 106, results of preceding statistical analyses and calibration are reviewed to determine which questions are not suitable. If particular questions seem unsuitable, possible reasons are determined. Optionally, a report containing the results of this post-calibration analysis may be generated.
At block 107, a generated report is given to the user. The report contains two types of information: first, a report concerning the overall calibration, and then several sub-reports concerning questions that showed a potential problem. Here, a user sees a report and decides on further action. Typical "Global" actions are: having more sample candidates take the test; automatically removing the bad questions and creating a questionnaire with the remainder; and reviewing the per-question reports. Reviewing the questions may still imply making a later choice between the first two global actions, above. The last global action, above, may not be proposed to the user if results show that the calibration globally failed. The user may be provided additional advice for that choice. At block 108, an evaluation for global failure of the calibration is made.
If global failure occurred, more sample candidates should take the test. If there is no global failure and if the user wants to review the questions, the system goes to block 109.
At block 109, a per-question review shows the problem that appeared for some of the questions, the possible causes for the problem and the recommended actions in each case. Possible "per-question" actions include: removing the question; modifying the key (the correct answer to the question); and modifying the question (rephrasing the question, for instance). If a user chooses to modify some of the questions or add new questions (see block 110, FIG. 1A), the questionnaire may be recalibrated with new sample candidates. If the user modifies only keys (correct answers) of some questions (see block 111, FIG. 1A), the calibration may be restarted and statistical analysis continued with the same data.
Block 112 is reached only when no question was modified. Possibly some questions were removed (those that showed a problem).
Finally, at block 113, the questionnaire is a CAT test or a statistically validated questionnaire. It may now be used in a CAT test-taking system, or in a sequential test-taking system or in other types of systems. In the case of a CAT test, the IRT parameters of each question, as calculated at block 105, may be imported into a CAT test-taking system.
FIG. 1B illustrates how questions are managed according to an embodiment. In FIG. 1B, an author creates questions at blocks 130, 135 and 140. The genesis of the questions may be direct, indirect, or by modification. Logically, questions have two states: calibrated and non-calibrated. In practice, this may be a Boolean flag stored with the question in a database. The only way a question can change from a non-calibrated to calibrated state is through statistical analysis. Questions are created in a non-calibrated state, as illustrated by block 145. Questions are then calibrated by a statistical analysis at block 150, proceeding to a calibrated state, shown at block 160. Once calibrated, a question and calculated IRT parameters may be used in a CAT questionnaire concerning the same field and designed for the same type of population as used for the calibration in block 150. If calibration at block 150 reveals problems with the questions at block 170, the question may be removed entirely, block 165, or modified, block 170. After any modification at blocks 175 or 185, a question is non-calibrated and should return to block 150 for calibration. Modification also invalidates all answers given to that question.
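The two question states described above can be sketched as a small record with a Boolean flag. This is only an illustration under stated assumptions: the class name `Question`, its fields, and its methods are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Question:
    text: str
    key: str                                   # the answer entered as correct
    calibrated: bool = False                   # questions start non-calibrated
    irt: Optional[Tuple[float, float, float]] = None   # (a, b, c) once calibrated
    answers: List[str] = field(default_factory=list)   # collected sample answers

    def calibrate(self, a: float, b: float, c: float) -> None:
        """Statistical analysis is the only path to the calibrated state."""
        self.irt = (a, b, c)
        self.calibrated = True

    def modify(self, new_text: str) -> None:
        """Rewording returns the question to the non-calibrated state and
        invalidates every answer previously given to it."""
        self.text = new_text
        self.calibrated = False
        self.irt = None
        self.answers.clear()
```

For example, a question that has been calibrated and is then reworded loses both its IRT parameters and its stored answers, so it must go through calibration again.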
FIG. 2 is a description of sub-steps included in block 105 (see FIG. 1A) according to an embodiment. The formulae and algorithm involved are detailed below.
Initializations, including initialization of the discrete distribution of the levels (i.e. the proficiency variable), are performed at block 200. To set the scale of levels, this distribution is set to a normal Gaussian distribution (mean = 0, variance = 1). At block 201, all of the data (the answers of all sample candidates to each question) are imported from a database.
At block 202, initial values of the 3 IRT parameters ("a," "b" and "c") are calculated using standard statistics, as described in detail below. A process loop including blocks 203, 204 and 205 then obtains precise values for the IRT parameters of the questions by iteration, calculating a Bayes modal estimate (or maximum marginal likelihood estimate) of the three IRT parameters "a," "b" and "c." The initial estimate at block 202 of FIG. 2 uses a standard statistical analysis of the answers given by candidates to each question. For example, consider a particular question posed to candidates. Let "p" be the proportion of candidates who answered this question correctly. Let "r" be the bi-serial correlation between the total score of candidates and the fact that they answered the particular question correctly (the score is simply the proportion of questions answered correctly).
With the above, r is defined by
Figure imgf000010_0001
where N is the number of candidates that answered the question, $s_n$ is the score of the nth candidate, and $x_n$ is 1 if candidate number n answered the question correctly, 0 if not.
The IRT "c" parameter is defined as the reciprocal of the number of possible answers for this question.
The IRT difficulty parameter "d" is calculated to take the guessing into account.
$$d = \frac{p - c}{1 - c} \qquad (10)$$
Then, ρ, a corrected bi-serial correlation, is calculated.
Figure imgf000011_0001
with z such that:
Figure imgf000011_0002
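The initial value of "a" is
$$a = \frac{1.702\,\rho}{\sqrt{1 - \rho^{2}}} \qquad (13)$$
The initial value of "b" is
$$b = -\frac{z}{\rho} \qquad (14)$$
where ρ is the corrected bi-serial correlation and z is the implicitly defined value introduced above.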
Further, "a" is bounded at 0.85 and 3.4 and "b" is bounded at -3 and 3.
The above formulae are applied as written, with the exception of z, which is defined implicitly. To determine z, an approximate Gaussian integral using the trapezium method is preferred. Determining the Gaussian reciprocal function with more precision is not necessary since this first calculation of the IRT parameters is only approximate.
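A rough sketch of the trapezium approach follows. It assumes that z is defined by an equation of the form Φ(z) = target, where Φ is the standard Gaussian integral; the exact target value comes from the defining equation of z and is not reproduced here. The function names, the integration range, the step count and the bisection search are all illustrative choices, not the described implementation.

```python
import math

def normal_cdf_trapezium(z, lower=-8.0, steps=2000):
    """Approximate the standard normal integral from `lower` to z
    with the trapezium rule."""
    if z <= lower:
        return 0.0
    h = (z - lower) / steps
    pdf = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    total = 0.5 * (pdf(lower) + pdf(z))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

def inverse_normal_cdf(target, lo=-6.0, hi=6.0, tol=1e-4):
    """Find z with integral(z) close to target by bisection.
    A coarse tolerance is enough for an initial estimate."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf_trapezium(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the result only seeds the iterative calibration, a tolerance of about 1e-4 on z is more than sufficient.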
Referring again to the embodiment described in FIG. 2, the general algorithm used for the Bayes modal estimate in blocks 203-205 is an EM (Estimation - Maximization) algorithm. The first part of this algorithm is the estimation step at block 203. Here, an estimation of the number of candidates in each level is calculated, as well as the proportion of candidates in each level that answered correctly each question. The second part of the EM algorithm is the maximization step at block 204. Here, the parameters "a," "b" and "c" are calculated to maximize a complete likelihood function. Sub-steps in blocks 203-204 are detailed below.
Block 205 is a condition to end the EM algorithm. If the changes in the parameters calculated at block 204 are less than a test value, or if the maximum number of loops for the algorithm was reached, the program exits the EM algorithm and proceeds to block 206. Further detail regarding block 205 is set forth below after an explanation of the preceding blocks. At block 206, a very accurate estimation of the level of the sample candidates is calculated. At block 206, the IRT model and information function are used as in a CAT test-taking system.
At block 207, standardized residuals are calculated to determine if the question truly fits the model. At block 208, an answer/level correlation is calculated for each proposed answer of each question. Preferably, the answer/level correlation is the bi-serial correlation between the estimated proficiency of the candidate and the fact that the candidate gave, or did not give, a particular answer to a given question. An endorsement rate calculated at block 209 is the proportion of candidates that gave a particular answer to a given question. Preferably, an endorsement rate is calculated for each proposed answer to each question.
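The endorsement rate of block 209 is simple enough to sketch directly. The function name `endorsement_rates` and the sample answers are hypothetical.

```python
from collections import Counter

def endorsement_rates(given_answers):
    """Proportion of candidates that gave each answer to one question.
    `given_answers` holds one entry per candidate."""
    counts = Counter(given_answers)
    total = len(given_answers)
    return {answer: count / total for answer, count in counts.items()}

rates = endorsement_rates(["A", "B", "A", "C"])
```

A proposed answer that never appears in the result has an endorsement rate of zero.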
As described above, the first part of an EM algorithm is an estimation step at block 203. Here, an estimation of the number of candidates in each level is calculated, as well as the proportion of candidates in each level that answered each question correctly. The following formulas and algorithms are those applied during sub-steps at block 203 of FIG. 2. The goal of the estimation step is to find an estimation of the number of candidates for each level $q_k$ and the number of candidates who have level $q_k$ and who answered question j correctly.
Defining $n_k^{(s)}$ as the estimation at step (s) of the number of candidates among the sample candidates that have $q_k$ as level, and $r_{jk}^{(s)}$ as the number of candidates that have $q_k$ as level and that answered question j correctly, the estimation step includes calculating those values using the following formulae:
Figure imgf000013_0001
Figure imgf000013_0002
As an optimization to make this calculation in a reasonable time, first calculate intermediate values:
Figure imgf000013_0003
$$v_i = \frac{1}{\sum_{k} u_{ik}} \qquad (18)$$
Then the calculation of $n_k^{(s)}$ and $r_{jk}^{(s)}$ becomes
$$n_k^{(s)} = \sum_{i} u_{ik}\, v_i \qquad (20)$$
Figure imgf000013_0004
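The estimation step can be sketched as follows. This is a minimal illustration that assumes the usual posterior weighting over discrete levels: each candidate's weight at a level is the prior mass of that level times the likelihood of the candidate's answers, normalized over all levels. That weighting, the logistic form of the model, and every name in the code are assumptions, since the exact formulae are not reproduced in the text above.

```python
import math

def p_correct(q, a, b, c):
    # Assumed three-parameter logistic form.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (q - b)))

def e_step(responses, items, levels, weights):
    """responses[i][j] is 1 if candidate i answered question j correctly.
    items[j] = (a, b, c); levels[k] = q_k; weights[k] = prior mass of q_k.
    Returns n[k] (expected candidates at level k) and r[j][k]
    (expected candidates at level k who answered question j correctly)."""
    K, J = len(levels), len(items)
    n = [0.0] * K
    r = [[0.0] * K for _ in range(J)]
    for y in responses:
        u = []                       # unnormalized weight per level
        for k, q in enumerate(levels):
            like = weights[k]
            for j, (a, b, c) in enumerate(items):
                p = p_correct(q, a, b, c)
                like *= p if y[j] else (1.0 - p)
            u.append(like)
        v = 1.0 / sum(u)             # normalizing factor for this candidate
        for k in range(K):
            post = u[k] * v
            n[k] += post
            for j in range(J):
                if y[j]:
                    r[j][k] += post
    return n, r
```

Computing the per-candidate normalizing factor once and reusing it for every level is what keeps the cost proportional to candidates × levels × questions.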
As described above, the second part of the EM algorithm is the maximization step at block 204 of FIG. 2. Here, the IRT parameters "a," "b" and "c" are calculated to maximize a complete likelihood function. The following formulae and algorithms are those applied during the step noted 204 in FIG. 2.
At block 204, a new estimation of the three IRT parameters for each question is calculated by numerically solving a set of equations. The Bayesian estimation includes solving a set of equations to find the maximum of a likelihood function (denoted L). For this, the point where the derivative of L is zero is determined. In an embodiment, the set of equations can be split into J simple sets of three equations, each set corresponding to one question.
In an embodiment, then, for each question (viz. for each j) the three equations are:
Figure imgf000014_0001
where the three variables are $a_j$, $b_j$ and $c_j$; $g_a$ is the prior distribution of the a parameter; $g_b$ is the prior distribution of the b parameter; $g_{c_j}$ is the prior distribution of the $c_j$ parameter. The three formulae above represent the derivative of the likelihood with respect to $a_j$, $b_j$ and $c_j$, which are zero at the point that is the maximum for the complete likelihood.
To simplify the equations above, the following functions are defined:
$$la_{jk}^{(s)} = \frac{r_{jk}^{(s)} - n_k^{(s)} P_j(q_k)}{P_j(q_k)\,\bigl(1 - P_j(q_k)\bigr)}\;\frac{\partial P_j(q_k)}{\partial a_j}$$
Figure imgf000014_0002
$la_{jk}^{(s)}$, $lb_{jk}^{(s)}$, $lc_{jk}^{(s)}$ are functions of $a_j$, $b_j$ and $c_j$.
Figure imgf000015_0001
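$La_j^{(s)}$, $Lb_j^{(s)}$, $Lc_j^{(s)}$ are functions of $a_j$, $b_j$ and $c_j$.
$$
\begin{aligned}
laa_{jk}^{(s)} &= \frac{\partial la_{jk}^{(s)}}{\partial a_j}, &
lbb_{jk}^{(s)} &= \frac{\partial lb_{jk}^{(s)}}{\partial b_j}, &
lcc_{jk}^{(s)} &= \frac{\partial lc_{jk}^{(s)}}{\partial c_j},\\
lab_{jk}^{(s)} &= \frac{\partial la_{jk}^{(s)}}{\partial b_j} = \frac{\partial lb_{jk}^{(s)}}{\partial a_j}, &
lac_{jk}^{(s)} &= \frac{\partial la_{jk}^{(s)}}{\partial c_j} = \frac{\partial lc_{jk}^{(s)}}{\partial a_j}, &
lbc_{jk}^{(s)} &= \frac{\partial lb_{jk}^{(s)}}{\partial c_j} = \frac{\partial lc_{jk}^{(s)}}{\partial b_j}
\end{aligned}
\qquad (24)
$$
$laa_{jk}^{(s)}$, $lbb_{jk}^{(s)}$, $lcc_{jk}^{(s)}$, $lab_{jk}^{(s)}$, $lac_{jk}^{(s)}$, $lbc_{jk}^{(s)}$ are functions of $a_j$, $b_j$ and $c_j$.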
Figure imgf000016_0001
$Laa_j^{(s)}$, $Lbb_j^{(s)}$, $Lcc_j^{(s)}$, $Lab_j^{(s)}$, $Lac_j^{(s)}$, $Lbc_j^{(s)}$ are functions of $a_j$, $b_j$ and $c_j$.
The vector $\bigl(La_j^{(s)}, Lb_j^{(s)}, Lc_j^{(s)}\bigr)^{T}$ is therefore the gradient of the function $L^{(s)}(a_j, b_j, c_j)$, and
Figure imgf000016_0002
is the Hessian of the same function.
In a preferred embodiment all the analytical formulae for the derivatives and second derivatives, above, are factorized for computational efficiency.
The following paragraphs give explicit formulas of the terms involving the prior distributions (ga, gb and gCJ) used in equations (21) and in definitions (23) and (25).
Figure imgf000017_0001
Knowing that one is in step (s), dealing with question number j, and with level number k (called $q_k$), the following is a simplification of notation: write n instead of $n_k^{(s)}$; r instead of $r_{jk}^{(s)}$; a for the current value of $a_j$; b for the current value of $b_j$; c for the current value of $c_j$; q instead of $q_k$; and
$$E = e^{-a_j (q_k - b_j)}, \qquad P = c + \frac{1 - c}{1 + E}$$
Thus, the following are the formulas that are applied to get the first and second derivative of the likelihood function in a preferred embodiment.
Figure imgf000018_0001
$$lb_{jk}^{(s)} = \frac{\bigl((nc - r)E + n - r\bigr)\,a}{(cE + 1)(E + 1)}$$
$$lc_{jk}^{(s)} = \frac{(nc - r)E + n - r}{(cE + 1)(c - 1)}$$
Figure imgf000018_0002
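$$lcc_{jk}^{(s)} = \frac{\bigl[\bigl((2r - nc)c - r\bigr)E + (r - n)2c\bigr]E - n + r}{\bigl((cE + 1)(c - 1)\bigr)^{2}}$$
$$lab_{jk}^{(s)} = -\frac{\Bigl[\bigl((nc - r)(a(b - q) - 1)cE + r - nc^{2} + 2(n - r)(a(b - q) - 1)c\bigr)E + a(b - q)(n - rc) + (r - 2n)c + 2r - n\Bigr]E + r - n}{\bigl((cE + 1)(E + 1)\bigr)^{2}}$$
$$lac_{jk}^{(s)} = \frac{Er}{(cE + 1)^{2}}\,(b - q), \qquad lbc_{jk}^{(s)} = \frac{Er}{(cE + 1)^{2}}\,a \qquad (30)$$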
For efficiency in a preferred embodiment, a great number of intermediate values are calculated and the expressions are replaced in the above formulas. More precisely, the intermediate values are:
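$$
\begin{aligned}
I_1 &= b - q\\
I_2 &= a(b - q) = aI_1\\
E &= e^{-a(q - b)} = e^{I_2}\\
P &= c + \frac{1 - c}{1 + e^{-a(q - b)}} = c + \frac{1 - c}{1 + E}\\
I_3 &= \bigl(a(b - q) - 1\bigr)c = (I_2 - 1)c\\
I_4 &= (nc - r)E + n - r\\
I_5 &= cE + 1\\
I_6 &= \frac{1}{(cE + 1)(E + 1)} = \frac{1}{(E + 1)I_5}\\
I_7 &= \frac{1}{(cE + 1)(c - 1)} = \frac{1}{(c - 1)I_5}\\
I_8 &= \frac{(nc - r)E + n - r}{(cE + 1)(E + 1)}\\
I_9 &= \bigl\{\bigl[\bigl((nc - r)E + 2(n - r)\bigr)E - r\bigr]c + n\bigr\}E = \bigl[\bigl((I_4 + n - r)E - r\bigr)c + n\bigr]E\\
I_{10} &= \frac{-1}{\bigl((cE + 1)(E + 1)\bigr)^{2}}
\end{aligned}
$$
Figure imgf000019_0001
$$I_{12} = \frac{Er}{(cE + 1)^{2}} = \frac{Er}{I_5^{2}} \qquad (31)$$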
Using these intermediate values, formulae (26), (27) and (28) become:
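$$la_{jk}^{(s)} = I_8 I_1$$
Figure imgf000019_0002
$$laa_{jk}^{(s)} = I_{11} I_1^{2}, \qquad lbb_{jk}^{(s)} = I_{11}\,a^{2} \qquad (33)$$
$$lcc_{jk}^{(s)} = \bigl\{\bigl[\bigl((2r - nc)c - r\bigr)E + (r - n)2c\bigr]E - n + r\bigr\}\,I_7^{2}$$
$$lab_{jk}^{(s)} = \bigl\{\bigl[\bigl((nc - r)I_3E + r - nc^{2} + 2(n - r)I_3\bigr)E + I_2(n - rc) + (r - 2n)c + 2r - n\bigr]E + r - n\bigr\}\,I_{10}$$
$$lac_{jk}^{(s)} = I_{12} I_1, \qquad lbc_{jk}^{(s)} = I_{12}\,a \qquad (34)$$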
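The factorized first derivatives can be checked numerically. The sketch below assumes that each level contributes a term r·ln(P) + (n − r)·ln(1 − P) to the log-likelihood, with P = c + (1 − c)/(1 + E) and E = e^(a(b − q)); under that assumption the closed forms for la, lb and lc agree with finite differences. Function and variable names are hypothetical, and the prior terms are left out.

```python
import math

def log_lik_term(n, r, a, b, c, q):
    """Assumed contribution of one level to the log-likelihood."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (q - b)))
    return r * math.log(p) + (n - r) * math.log(1.0 - p)

def first_derivatives(n, r, a, b, c, q):
    """la, lb, lc computed through shared intermediate values."""
    i1 = b - q
    e = math.exp(a * i1)                  # E = e^(a(b - q))
    i4 = (n * c - r) * e + n - r
    i5 = c * e + 1.0
    i8 = i4 / (i5 * (e + 1.0))
    la = i8 * i1
    lb = i8 * a
    lc = i4 / (i5 * (c - 1.0))
    return la, lb, lc
```

Sharing `i4`, `i5` and `i8` means the exponential is evaluated once per (question, level) pair rather than once per derivative.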
When E, above, is too close to 0, preferred embodiments use different formulae to avoid numerical difficulties in computations. In a preferred embodiment, these modified formulae are used when P > 0.9999999, that is, when E is almost 0. These formulae are more stable: they avoid producing NaN (Not a Number) through division of zeros or infinities, and they give more accurate results as E gets closer to 0. These formulae are an asymptotic development of the formulae (28), (29) and (30).
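$$la_{jk}^{(s)} = (n - r)(b - q), \qquad lb_{jk}^{(s)} = (n - r)\,a \qquad (35)$$
$$lc_{jk}^{(s)} = \frac{n - r}{c - 1}$$
$$laa_{jk}^{(s)} = E(rc - n)(b - q)^{2}, \qquad lbb_{jk}^{(s)} = E(rc - n)\,a^{2} \qquad (36)$$
$$lcc_{jk}^{(s)} = -\frac{n - r}{(c - 1)^{2}}$$
$$lab_{jk}^{(s)} = n - r, \qquad lac_{jk}^{(s)} = Er(b - q) \qquad (37)$$
$$lbc_{jk}^{(s)} = Era$$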
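For efficiency, intermediate values are calculated and substituted into the above formulas. More precisely, the intermediate values are:
$$
\begin{aligned}
I_1 &= b - q\\
I_2 &= a(b - q) = aI_1\\
E &= e^{-a(q - b)} = e^{I_2}\\
P &= c + \frac{1 - c}{1 + e^{-a(q - b)}} = c + \frac{1 - c}{1 + E}\\
I_3 &= n - r\\
I_4 &= (rc - n)E\\
I_5 &= Er\\
I_6 &= \frac{1}{c - 1}\\
I_7 &= \frac{n - r}{c - 1} = I_3 I_6
\end{aligned}
\qquad (38)
$$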
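Using these intermediate values, formulas (35), (36) and (37) become:
Figure imgf000021_0001
$$lb_{jk}^{(s)} = I_3\,a \qquad (39)$$
$$lc_{jk}^{(s)} = I_7$$
$$laa_{jk}^{(s)} = I_4 I_1^{2}, \qquad lbb_{jk}^{(s)} = I_4\,a^{2} \qquad (40)$$
Figure imgf000021_0002
$$lab_{jk}^{(s)} = I_3, \qquad lac_{jk}^{(s)} = I_5 I_1 \qquad (41)$$
$$lbc_{jk}^{(s)} = I_5\,a$$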
When E is too close to positive infinity, preferred embodiments use different formulae to prevent numerical difficulties in computation. These modified formulae are used when (P - c) < 0.0000001, that is, when E is almost infinite. These formulae are more stable, avoiding NaN (Not a Number) results from division of zeros or infinities. Moreover, they give more accurate results as E gets greater. The formulae are an asymptotic development of the formulae (28), (29) and (30).
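$$la_{jk}^{(s)} = \frac{nc - r}{cE}\,(b - q)$$
Figure imgf000022_0001
$$lc_{jk}^{(s)} = \frac{nc - r}{c(c - 1)}$$
$$laa_{jk}^{(s)} = -\frac{nc - r}{cE}\,(b - q)^{2}, \qquad lbb_{jk}^{(s)} = -\frac{nc - r}{cE}\,a^{2} \qquad (43)$$
$$lcc_{jk}^{(s)} = \frac{(2r - nc)c - r}{\bigl(c(c - 1)\bigr)^{2}}$$
$$lab_{jk}^{(s)} = -\frac{(nc - r)\bigl(a(b - q) - 1\bigr)}{cE}, \qquad lac_{jk}^{(s)} = \frac{r}{c^{2}E}\,(b - q) \qquad (44)$$
$$lbc_{jk}^{(s)} = \frac{r}{c^{2}E}\,a$$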
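In a preferred embodiment, intermediate values are calculated and substituted into the above formulas for efficiency. More precisely, the intermediate values are:
$$
\begin{aligned}
I_1 &= b - q\\
I_2 &= a(b - q) = aI_1\\
E &= e^{-a(q - b)} = e^{I_2}\\
P &= c + \frac{1 - c}{1 + e^{-a(q - b)}} = c + \frac{1 - c}{1 + E}\\
I_3 &= nc - r\\
I_4 &= \frac{1}{cE}\\
I_5 &= \frac{1}{c(c - 1)}\\
I_6 &= \frac{nc - r}{cE} = I_3 I_4\\
I_7 &= \frac{r}{c^{2}E} = \frac{rI_4}{c}
\end{aligned}
$$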
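Using these intermediate values, formulas (42), (43) and (44) become:
$$la_{jk}^{(s)} = I_6 I_1, \qquad lb_{jk}^{(s)} = I_6\,a \qquad (46)$$
Figure imgf000023_0001
$$laa_{jk}^{(s)} = -I_6 I_1^{2}, \qquad lbb_{jk}^{(s)} = -I_6\,a^{2} \qquad (47)$$
$$lcc_{jk}^{(s)} = \bigl((2r - nc)c - r\bigr)\,I_5^{2}$$
$$lab_{jk}^{(s)} = -I_6 (I_2 - 1), \qquad lac_{jk}^{(s)} = I_7 I_1 \qquad (48)$$
$$lbc_{jk}^{(s)} = I_7\,a$$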
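The switch between the general formula and its two asymptotic forms can be sketched for one derivative, lb. The thresholds are the ones quoted in the text; the function name `lb_stable`, the clamp on the exponent, and the requirement c > 0 are assumptions added for this illustration.

```python
import math

def lb_stable(n, r, a, b, c, q):
    """lb with the general form and the two limiting forms (assumes c > 0)."""
    x = a * (b - q)
    e = math.exp(min(x, 700.0))   # clamp (an added safeguard) keeps exp() finite
    p = c + (1.0 - c) / (1.0 + e)
    if p > 0.9999999:             # E almost 0
        return (n - r) * a
    if p - c < 0.0000001:         # E almost infinite
        return (n * c - r) * a / (c * e)
    return ((n * c - r) * e + n - r) * a / ((c * e + 1.0) * (e + 1.0))
```

At the switching thresholds the limiting forms differ from the general formula only by terms of relative size E or 1/E, so the result is essentially continuous across the three regimes.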
The formulae, above, allow a computer program to calculate the gradient and the Hessian of the likelihood function at any given point. Since the equation systems (21) cannot be solved explicitly, a gradient and Hessian are used to iteratively compute approximations of the solutions to the equations. Let (t) be the step of this iterative calculation.
The gradient method includes modifying the parameters using a fraction of the gradient as increment. This method is relatively slow but is the most stable to find the maximum. Precisely the formulas are:
Figure imgf000024_0001
$$b_j^{(t+1)} = b_j^{(t)} + K_1\, Lb_j^{(s)}\bigl(a_j^{(t)}, b_j^{(t)}, c_j^{(t)}\bigr) \qquad (49)$$
Figure imgf000024_0002
In preferred embodiments, $K_1$ may be either 0.0005 or 0.00025.
In a preferred embodiment, the parameters are modified using a fraction of the increment used for a Newton-Raphson method. This method is more stable than the normal Newton-Raphson method and less stable than the gradient method, but it is faster than the gradient method and slower than the normal Newton-Raphson method.
Precisely, the formulas are given by the following. Let A be the Hessian of $L^{(s)}$ taken at the point $(a_j^{(t)}, b_j^{(t)}, c_j^{(t)})$:
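A gradient step of this kind is short enough to sketch directly. The function name `gradient_step` is hypothetical, and the sketch assumes that all three parameters are updated in the same way, each by the fraction K1 of its own gradient component.

```python
def gradient_step(params, gradient, k1=0.0005):
    """Move each parameter by the fraction k1 of the matching
    gradient component (assumed identical rule for a, b and c)."""
    return tuple(p + k1 * g for p, g in zip(params, gradient))

# One small uphill move from (a, b, c) = (1.0, 0.0, 0.25).
new_a, new_b, new_c = gradient_step((1.0, 0.0, 0.25), (2.0, -4.0, 0.0))
```

The very small step size is what makes this method slow but hard to destabilize.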
Figure imgf000024_0003
(50)
A is inverted using the transpose-of-cofactors method, and the new values of "a," "b" and "c" are obtained using:
Figure imgf000025_0001
where K2 is a real number. In a preferred embodiment, K2 is 0.1.
The Newton-Raphson method includes solving the equation with an order-one approximation of the (La, Lb, Lc) vector, that is, an order-two approximation of L. This method is the least stable of the three but the fastest.
Precisely, the formulas are given by the following. Let A be the Hessian of $L^{(s)}$ taken at the point $(a_j^{(t)}, b_j^{(t)}, c_j^{(t)})$:
$$
A = \begin{pmatrix}
Laa_j^{(s)} & Lab_j^{(s)} & Lac_j^{(s)}\\
Lab_j^{(s)} & Lbb_j^{(s)} & Lbc_j^{(s)}\\
Lac_j^{(s)} & Lbc_j^{(s)} & Lcc_j^{(s)}
\end{pmatrix}\bigl(a_j^{(t)}, b_j^{(t)}, c_j^{(t)}\bigr) \qquad (52)
$$
A is inverted using the transpose-of-cofactors method, and the new values of "a," "b" and "c" are obtained using:
Figure imgf000025_0003
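The cofactor-based inversion and the resulting update can be sketched as follows. The update rule used here, new = old − K2 · A⁻¹ · gradient, is an assumption about the formula (with K2 = 1 for the normal method and K2 = 0.1 for the modified one); the function names are hypothetical.

```python
def invert_3x3_by_cofactors(m):
    """Inverse of a 3x3 matrix as transpose(cofactors) / determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    if det == 0.0:
        raise ZeroDivisionError("singular Hessian")
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]

def newton_step(params, gradient, hessian, k2=1.0):
    """Assumed update: params - k2 * inverse(hessian) * gradient."""
    inv = invert_3x3_by_cofactors(hessian)
    return tuple(
        p - k2 * sum(inv[row][col] * gradient[col] for col in range(3))
        for row, p in enumerate(params)
    )
```

On an exactly quadratic function the full step (K2 = 1) lands on the maximum in one move, while K2 = 0.1 covers a tenth of that distance, which illustrates the speed-versus-stability trade-off described above.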
In a solution procedure, an iterative process is performed for each question to get an approximation of the solution to the equations (21). The criterion for the convergence will be the norm of the gradient vector at $(a_j^{(t)}, b_j^{(t)}, c_j^{(t)})$, noted $G_j^{(t)}$:
Figure imgf000025_0004
(54)
The iterative process is done in two nested loops. The outer loop will be called the "trial" loop, and the inner loop will be called the "phase" loop.
Before starting one of the loops, the current values of "a," "b" and "c" (which were obtained at the previous M step) are retained in variables called $A_0$, $B_0$, $C_0$.
At each step of the inner loop, $La_j^{(s)}$, $Lb_j^{(s)}$, $Lc_j^{(s)}$, $Laa_j^{(s)}$, $Lbb_j^{(s)}$, $Lcc_j^{(s)}$, $Lab_j^{(s)}$, $Lac_j^{(s)}$, $Lbc_j^{(s)}$ are calculated. Then, the Hessian determinant, which is the determinant of the A matrix defined above, is calculated:
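$H_j^{(s)}\bigl(a_j^{(t)}, b_j^{(t)}, c_j^{(t)}\bigr)$
Figure imgf000026_0001
$$= \Bigl[Laa_j^{(s)} Lbb_j^{(s)} Lcc_j^{(s)} + 2\,Lab_j^{(s)} Lbc_j^{(s)} Lac_j^{(s)} - Laa_j^{(s)} \bigl(Lbc_j^{(s)}\bigr)^{2} - Lbb_j^{(s)} \bigl(Lac_j^{(s)}\bigr)^{2} - Lcc_j^{(s)} \bigl(Lab_j^{(s)}\bigr)^{2}\Bigr]\bigl(a_j^{(t)}, b_j^{(t)}, c_j^{(t)}\bigr) \qquad (55)$$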
If at this point, the calculation failed (viz., one of the values La s), Lbj s)... or Hj( ) is either infinite or not a number), the process is stopped for this question, moving on to the next question. If (N(t))2 is sufficiently small, for instance less than 10" , it is considered that the maximum of L is close and the process for this question is stopped.
In the other cases, the values of "a," "b" and "c" are modified using one of the three methods described above. The values of "a," "b" and "c" are then bounded to prevent divergence. The maximum for "b" is 3.5 and the minimum -3.5; the maximum for "a" is 4.3 and the minimum 0.43; the maximum for "c" is $\mu_{c_j} + 3.5\sigma_{c_j}$ and the minimum is $\mu_{c_j} - 3.5\sigma_{c_j}$. In a preferred embodiment, this inner loop is executed at most a thousand times for each trial. If, at the end of the thousand loops, there is no convergence, another trial is commenced.
In a preferred embodiment, there are at most 3 trials for a question during an M step. The first trial is a "normal" trial. The initial values taken for $a_j^{(t)}$, $b_j^{(t)}$ and $c_j^{(t)}$ are $A_0$, $B_0$ and $C_0$. The value for $K_1$ is 0.0005. For the second trial, the initial values taken for $a_j^{(t)}$, $b_j^{(t)}$ and $c_j^{(t)}$ are:
$$a_j^{(t)} = \mu_a + 1.5\,\sigma_a\, random_1, \qquad b_j^{(t)} = \mu_b + 1.5\,\sigma_b\, random_2 \qquad (56)$$
Figure imgf000027_0001
where $random_1$, $random_2$ and $random_3$ are three uncorrelated random values between 0 and 1. The value for $K_1$ is 0.00025. For the third trial, the initial values taken for $a_j^{(t)}$, $b_j^{(t)}$ and $c_j^{(t)}$ are:
$$a_j^{(t)} = \mu_a + 2\,\sigma_a\, random_1, \qquad b_j^{(t)} = \mu_b + 2\,\sigma_b\, random_2, \qquad c_j^{(t)} = \mu_c + 2\,\sigma_c\, random_3 \qquad (57)$$
where $random_1$, $random_2$ and $random_3$ are three uncorrelated random values between 0 and 1. The value for $K_1$ is 0.00025.
The inner loop is called the "phase" loop because the process can pass through different phases, each using a different method to estimate the parameters. In a preferred embodiment, there are 3 different phases: a first phase using the gradient method to calculate the parameters; a second phase using the modified Newton-Raphson method; and a third phase using the Newton-Raphson method. Note that during one trial, the program can switch several times to the same phase. In a preferred embodiment, a trial starts with the first phase. The start values $A_0$, $B_0$ and $C_0$ are saved in the variables $A_1$, $B_1$ and $C_1$. Preferably, as the iterative process proceeds, a switch to the second phase and then to the third phase should occur to find accurate results more quickly. However, there is a complete branching system that detects whether a phase is converging sufficiently or diverging, in order to switch from one phase to another.
A typical procedure for an embodiment follows. Once $La_j^{(s)}$, $Lb_j^{(s)}$, $Lc_j^{(s)}$, $Laa_j^{(s)}$, $Lbb_j^{(s)}$, $Lcc_j^{(s)}$, $Lab_j^{(s)}$, $Lac_j^{(s)}$, $Lbc_j^{(s)}$, $G_j^{(s)}$ and $H_j^{(s)}$ are calculated, the formulas for a, b and c that correspond to the current phase are applied. Then, tests are made to detect if a change of phase should occur. If the tests show that the current phase is diverging, a switch is made to a previous phase and the current values of a, b, and c are replaced by the corresponding saved values $A_1$, $B_1$, $C_1$. The criterion for determining when a switch from one phase to the other should occur is also changed. If the tests show that the current phase converged, a switch is made to the next phase and the current values of "a," "b" and "c" are saved in $A_1$, $B_1$, $C_1$. In the case when the determinant is too small to apply the Newton-Raphson method safely (normal or modified), a switch is made directly to the first phase, even before doing the calculations corresponding to the current phase.
Table 1, below, summarizes the phase branching system. In Table 1, Limit is a variable which is compared to the square norm of the gradient as a criterion to switch from one phase to the other. The initial value of "Limit" is 100 in a preferred embodiment. Count is the number of loops spent in the same phase. In Table 1, a stands for $a_j^{(s)}$, b for $b_j^{(s)}$, c for $c_j^{(s)}$, H for $H_j^{(s)}$ and N for the norm of the gradient.
TABLE 1
Figure imgf000029_0001
Referring now to block 205 of FIG. 2, at each step of the main iterative process, an E step is performed, and then an M step. At the end of the M step, all a, b and c parameters are compared with the values they had in the previous step. The maximum of the absolute values of these differences is termed the maximum change in the parameters. In a preferred embodiment, if this maximum change is less than 0.05, the calibration is terminated because an adequate estimation of the parameters is complete. Whatever the changes in the parameters, however, after 12 loops the EM calibration is terminated in a preferred embodiment because continuing further will not bring additional precision.
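The stopping rule of block 205 can be sketched as an outer loop. The callables `e_step` and `m_step` are placeholders supplied by the caller, and the function name `run_em` and the data layout (one (a, b, c) tuple per question) are assumptions made for the illustration.

```python
def run_em(params, e_step, m_step, tolerance=0.05, max_loops=12):
    """Alternate E and M steps until the largest parameter change is below
    `tolerance` or `max_loops` iterations have run.
    `params` is a list of (a, b, c) tuples, one per question."""
    for loop in range(1, max_loops + 1):
        stats = e_step(params)
        new_params = m_step(params, stats)
        # Maximum absolute change over every parameter of every question.
        change = max(
            abs(new - old)
            for new_q, old_q in zip(new_params, params)
            for new, old in zip(new_q, old_q)
        )
        params = new_params
        if change < tolerance:
            break
    return params, loop
```

Returning the loop count lets the caller tell a converged calibration from one that simply ran out of iterations.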
At block 206 of FIG. 2, an estimate of the level of the test candidates is determined. In an embodiment, the following formulas and algorithms are applied to arrive at the determination. For computational efficiency, preferred embodiments first calculate intermediate values:
Figure imgf000030_0001
Figure imgf000030_0002
Figure imgf000030_0003
$w_{ik}$, considered as a function of k, is called the information function. First, this function is only calculated for each $q_k$. A first estimate of the level is calculated using the formula:
Figure imgf000030_0004
In fact, the information function is a function of a continuous variable and can be defined as follows:
Figure imgf000030_0005
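To obtain more precision in the level of the candidate, the θ variable is taken to be continuous and a Newton-Raphson method is used to find the maximum of the information function, which is the level of the candidate. For that purpose, the first and second differentials of the information function with respect to θ are calculated. Since only the differentials of the information function are needed, the denominator of the information function, which is a constant, is unneeded. Therefore, define $I_i(\theta)$ as the modified information function (the logarithm of the real information function without the denominator). Let $It_i(\theta)$ be the first differential and $Itt_i(\theta)$ be the second differential of $I_i(\theta)$.
$$I_i(\theta) = \ln\bigl(G(\theta)\bigr) + \sum_{j=1}^{J} \Bigl[y_{ij}\ln\bigl(P_j(\theta)\bigr) + (1 - y_{ij})\ln\bigl(1 - P_j(\theta)\bigr)\Bigr] \qquad (63)$$
Defining:
$$
\begin{aligned}
itr_j(\theta) &= \frac{d\ln\bigl(P_j(\theta)\bigr)}{d\theta}, &
itw_j(\theta) &= \frac{d\ln\bigl(1 - P_j(\theta)\bigr)}{d\theta},\\
ittr_j(\theta) &= \frac{d^{2}\ln\bigl(P_j(\theta)\bigr)}{d\theta^{2}}, &
ittw_j(\theta) &= \frac{d^{2}\ln\bigl(1 - P_j(\theta)\bigr)}{d\theta^{2}}
\end{aligned}
\qquad (64)
$$
Therefore
$$It_i(\theta) = \frac{dI_i(\theta)}{d\theta} = -\theta + \sum_{j=1}^{J} \Bigl[y_{ij}\,itr_j(\theta) + (1 - y_{ij})\,itw_j(\theta)\Bigr] \qquad (65)$$
$$Itt_i(\theta) = \frac{d^{2}I_i(\theta)}{d\theta^{2}} = -1 + \sum_{j=1}^{J} \Bigl[y_{ij}\,ittr_j(\theta) + (1 - y_{ij})\,ittw_j(\theta)\Bigr] \qquad (66)$$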
To calculate all the derivatives and second derivatives, above, preferred embodiments use analytical formulas. The following shows how such formulas are factorized for computational efficiency in an embodiment.
Knowing that one is in step (s), dealing with candidate number i, and with question number j, the following simplifications are applied: write θ instead of $\theta_i^{(s)}$, and a instead of $a_j$,
Figure imgf000032_0001
c instead of $c_j$,
Figure imgf000032_0002
$$P = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$
The following are the formulae that are normally applied to get the first and second derivative of the information function.
A first case is when yy is 1.Here, only calculate itrj(s) and ittr. (s)
itr. ( _ (l -c)aE
(E + l)(cE + l)
(67)
( \\ --cc))aaEE((ccE -\) ittr™ = - (E + \)2(cE + \)2
For efficiency, intermediate values are calculated and substituted into the above formulae. More precisely, the intermediate values are:
$$I_1 = \frac{1}{(E+1)(cE+1)}, \qquad I_2 = \frac{(1-c)aE}{(E+1)(cE+1)} = (1-c)aE\,I_1 \tag{68}$$
Using these intermediate values, formulas (67) become:
$$\mathrm{itr}_j^{(s)} = I_2, \qquad \mathrm{ittr}_j^{(s)} = I_2\,a\,(cE^2-1)\,I_1 \tag{69}$$
A second case is when $y_{ij}$ is 0. Here, only calculate $\mathrm{itw}_j^{(s)}$ and $\mathrm{ittw}_j^{(s)}$:
$$\mathrm{itw}_j^{(s)} = \frac{-a}{E+1}, \qquad \mathrm{ittw}_j^{(s)} = -\frac{a^2E}{(E+1)^2} \tag{70}$$
For efficiency, intermediate values are calculated and substituted into the above formulae. More precisely, the intermediate values are:
$$I_1 = \frac{1}{E+1}, \qquad I_2 = \frac{-a}{E+1} = -a\,I_1 \tag{71}$$
Using these intermediate values, formulas (70) become:
$$\mathrm{itw}_j^{(s)} = I_2, \qquad \mathrm{ittw}_j^{(s)} = I_2\,a\,E\,I_1 \tag{72}$$
When E is too close to 0, different formulae are used. These modified formulae are used when P > 0.9999999, that is, when E is almost 0. They are more stable: they avoid NaN (Not a Number) results caused by dividing zeros or infinities, and they give more accurate results as E gets closer to 0. These formulae are an asymptotic development of the formulae (63).
The first case is when $y_{ij}$ is 1. Here, only calculate $\mathrm{itr}_j^{(s)}$ and $\mathrm{ittr}_j^{(s)}$:
$$\mathrm{itr}_j^{(s)} = (1-c)aE, \qquad \mathrm{ittr}_j^{(s)} = -(1-c)a^2E \tag{73}$$
For efficiency, an intermediate value is calculated and substituted into the above formulae. More precisely, the intermediate value is:
$$I_1 = (1-c)aE \tag{74}$$
Using this intermediate value, formulas (73) become:
$$\mathrm{itr}_j^{(s)} = I_1, \qquad \mathrm{ittr}_j^{(s)} = -a\,I_1 \tag{75}$$
The second case is when $y_{ij}$ is 0. Here, only calculate $\mathrm{itw}_j^{(s)}$ and $\mathrm{ittw}_j^{(s)}$:
$$\mathrm{itw}_j^{(s)} = -a, \qquad \mathrm{ittw}_j^{(s)} = -a^2E \tag{76}$$
When E is too close to positive infinity, different formulae are used. These modified formulae are used when (P - c) < 0.0000001, that is, when E is almost infinite. They are more stable: they avoid NaN (Not a Number) results caused by dividing zeros or infinities, and they give more accurate results as E gets greater. These formulae are an asymptotic development of the formulae (63).
The first case is when $y_{ij}$ is 1. Here, only calculate $\mathrm{itr}_j^{(s)}$ and $\mathrm{ittr}_j^{(s)}$:
$$\mathrm{itr}_j^{(s)} = \frac{(1-c)a}{cE}, \qquad \mathrm{ittr}_j^{(s)} = \frac{(1-c)a^2}{cE} \tag{77}$$
For efficiency, an intermediate value is calculated and substituted into the above formulae. More precisely, the intermediate value is:
$$I_1 = \frac{(1-c)a}{cE} \tag{78}$$
Using this intermediate value, formulas (77) become:
$$\mathrm{itr}_j^{(s)} = I_1, \qquad \mathrm{ittr}_j^{(s)} = a\,I_1 \tag{79}$$
The second case is when $y_{ij}$ is 0. Here, only calculate $\mathrm{itw}_j^{(s)}$ and $\mathrm{ittw}_j^{(s)}$:
$$\mathrm{itw}_j^{(s)} = -\frac{a}{E}, \qquad \mathrm{ittw}_j^{(s)} = -\frac{a^2}{E} \tag{80}$$
For efficiency, an intermediate value is calculated and substituted into the above formulae. More precisely, the intermediate value is:
$$I_1 = -\frac{a}{E} \tag{81}$$
Using this intermediate value, formulas (80) become:
$$\mathrm{itw}_j^{(s)} = I_1, \qquad \mathrm{ittw}_j^{(s)} = a\,I_1 \tag{82}$$
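The three regimes described above (general case, E almost 0, E almost infinite) can be combined into a single routine. The following is a minimal Python sketch, not the patented implementation: the function name `log_likelihood_derivatives` and the `eps` parameter are hypothetical, E is assumed to be exp(-a(θ - b)), and the large-E branch assumes c > 0. Overflow of the exponential for extreme arguments is not guarded against.

```python
import math

def log_likelihood_derivatives(theta, a, b, c, correct, eps=1e-7):
    """Return (first, second) derivatives with respect to theta of
    ln P (correct answer) or ln(1 - P) (incorrect answer)."""
    E = math.exp(-a * (theta - b))      # assumed definition of E
    P = c + (1.0 - c) / (1.0 + E)       # three-parameter probability

    if P > 1.0 - eps:                   # E almost 0: formulas (73)-(76)
        if correct:
            i1 = (1.0 - c) * a * E
            return i1, -a * i1
        return -a, -a * a * E

    if P - c < eps:                     # E almost infinite: formulas (77)-(82)
        if correct:
            i1 = (1.0 - c) * a / (c * E)    # requires c > 0
            return i1, a * i1
        i1 = -a / E
        return i1, a * i1

    if correct:                         # general case: formulas (68)-(69)
        i1 = 1.0 / ((E + 1.0) * (c * E + 1.0))
        i2 = (1.0 - c) * a * E * i1
        return i2, i2 * a * (c * E * E - 1.0) * i1

    i1 = 1.0 / (E + 1.0)                # general case: formulas (71)-(72)
    i2 = -a * i1
    return i2, i2 * a * E * i1
```

Reusing the intermediate values means each branch costs one exponential and a handful of multiplications, which matters because the routine is called once per question at every Newton-Raphson step.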
A solution algorithm for an embodiment follows. For each question, an iterative process is used to calculate the estimation of the level of the candidate. This calculation is based on a Newton-Raphson method. Let (s) be the index of the current step. At each step, the values of the first and second derivatives of $I_i(\theta)$, called respectively $It_i(\theta)$ and $Itt_i(\theta)$, are calculated at the point $\theta_i^{(s)}$ using the formulas of the previous section. Then the value of θ is updated by applying the formula:
$$\theta_i^{(s+1)} = \theta_i^{(s)} - \frac{It_i(\theta_i^{(s)})}{Itt_i(\theta_i^{(s)})} \tag{83}$$
The criterion for ending this iterative process is the absolute value of $It_i(\theta_i^{(s)})$. If this value is less than $10^{-7}$, the $\theta_i^{(s)}$ sequence has converged and the last $\theta_i^{(s)}$ is taken as the value of $\theta_i$. If, after 20 steps, the absolute value of $It_i(\theta_i^{(s)})$ is still greater than $10^{-7}$, the $\theta_i^{(s)}$ sequence is considered not to have converged and $\theta_i^{(0)}$ is taken as the value of $\theta_i$.
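The iteration and its stopping rule can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented code: the names `posterior_derivatives` and `estimate_level` are hypothetical, E is assumed to be exp(-a(θ - b)), only the general-case derivative formulas are used (the stabilized variants for extreme E are omitted for brevity), and the starting value `theta0` is supplied by the caller.

```python
import math

def posterior_derivatives(theta, responses, items):
    """First and second derivatives of the modified information function:
    Gaussian prior terms (-theta and -1) plus one term per question."""
    first, second = -theta, -1.0
    for y, (a, b, c) in zip(responses, items):
        E = math.exp(-a * (theta - b))
        if y == 1:
            i1 = 1.0 / ((E + 1.0) * (c * E + 1.0))
            i2 = (1.0 - c) * a * E * i1
            first += i2
            second += i2 * a * (c * E * E - 1.0) * i1
        else:
            i1 = 1.0 / (E + 1.0)
            first += -a * i1
            second += -a * a * E * i1 * i1
    return first, second

def estimate_level(responses, items, theta0=0.0, tol=1e-7, max_steps=20):
    """Newton-Raphson search; returns (theta, converged).
    Falls back to theta0 when the tolerance is not met after max_steps."""
    theta = theta0
    for _ in range(max_steps):
        first, second = posterior_derivatives(theta, responses, items)
        if abs(first) < tol:
            return theta, True
        theta -= first / second         # update rule of equation (83)
    first, _ = posterior_derivatives(theta, responses, items)
    if abs(first) < tol:
        return theta, True
    return theta0, False

# Hypothetical item parameters (a, b, c) and one candidate's responses.
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (1.2, 1.0, 0.25), (0.8, 0.5, 0.2)]
responses = [1, 1, 0, 1]
theta_hat, converged = estimate_level(responses, items)
```

The second derivative of a three-parameter term can be positive for a correct answer, so a production version would likely need a safeguard for steps where the total second derivative is close to zero or positive.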
At block 207 of FIG. 2, a residual is calculated. In an embodiment, the following formulae and algorithms are applied.
To estimate how well the data fit the model, standardized residuals are used. With the IRT parameters being calculated for each question, the probability Pj(θ) is completely defined (see equation (1)). A residual (noted η), then, is calculated for each question using the formulae:
= ∑v<k
7=1
Figure imgf000036_0001
A solution algorithm of an embodiment first calculates the intermediate values Sk. Then, equation (84) is applied by calculating Pj(qk) for each value of k.
At block 208 of FIG. 2, a bi-serial correlation is determined. The answer/level bi-serial correlation calculation is an extra statistical analysis used to determine how well a particular answer is correlated to the level of the candidate. Normally, the right answer's correlation should be the greatest and positive. Ideally, the other answers' correlations should all be negative. Those values are used to detect irrelevant questions or keying errors (when the right answer was not set correctly). In an embodiment, the following formulae and algorithms are applied.
Taking question number j, all the answers given by the sample candidates for that question are considered. They are sorted into classes of "equivalent" answers. Let $n_j$ be the number of classes of answers for the jth question. Then define $z_{ijn}$, which equals 1 if candidate i gave an answer of the nth class for question number j, and 0 otherwise. The item/level correlation for question j and answer n is defined by:
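$$c_{jn} = \frac{\sum_{i=1}^{N} (z_{ijn} - \bar{z}_{jn})(\theta_i - \bar{\theta})}{\sqrt{\sum_{i=1}^{N} (z_{ijn} - \bar{z}_{jn})^2 \, \sum_{i=1}^{N} (\theta_i - \bar{\theta})^2}} \tag{85}$$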
with
$$\bar{\theta} = \frac{1}{N}\sum_{i=1}^{N} \theta_i, \qquad \bar{z}_{jn} = \frac{1}{N}\sum_{i=1}^{N} z_{ijn} \tag{86}$$
A solution algorithm calculates the average value of the level, $\bar{\theta}$, while calculating the level of each sample candidate. As well, the endorsement rate $\bar{z}_{jn}$ is calculated while building the classes of given answers. Then, equation (85) is applied for each class of answers for each question.
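The answer/level correlation for one question can be sketched in Python as below. The function name `answer_level_correlations` is hypothetical, and the normalization (a Pearson correlation between the 0/1 indicator of each answer class and the candidate levels) is an assumption about the denominator of equation (85), which is not legible in the source text.

```python
import math

def answer_level_correlations(answers, levels):
    """answers: answer-class label given by each of the N candidates to one question.
    levels: estimated level theta_i of each candidate.
    Returns a dict mapping each answer class to its correlation with the level."""
    n = len(levels)
    theta_bar = sum(levels) / n
    var_theta = sum((t - theta_bar) ** 2 for t in levels)
    result = {}
    for cls in dict.fromkeys(answers):          # classes in order of first appearance
        z = [1.0 if ans == cls else 0.0 for ans in answers]
        z_bar = sum(z) / n                      # endorsement rate of this class
        cov = sum((zi - z_bar) * (ti - theta_bar) for zi, ti in zip(z, levels))
        var_z = sum((zi - z_bar) ** 2 for zi in z)
        denom = math.sqrt(var_z * var_theta)
        # A class chosen by everybody (or identical levels) has no defined correlation.
        result[cls] = cov / denom if denom > 0 else float("nan")
    return result

# Hypothetical data: four candidates, two answer classes.
corr = answer_level_correlations(["A", "A", "B", "B"], [-1.0, -0.5, 0.5, 1.0])
```

In this toy data the higher-level candidates all chose "B", so "B" receives a positive correlation and "A" the opposite value, which is the pattern expected when "B" is the keyed answer.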
FIG. 3 is the schematic process of the global analysis of the calibration results and statistical analysis according to an embodiment. A preferred embodiment includes a series of conditional branches, corresponding to block 106 of FIG. 1A. Messages and recommended actions contained in FIG. 3 are detailed below for an embodiment, as are values for the tests at blocks 301, 303 and 304.
In FIG. 3, block 301 checks if any estimated "a," "b" or "c" parameter of any question is Not a Number (noted NaN). This occurs when a non-allowable numerical operation occurs during the calculations (such as a division of zeros or of infinite values). At block 303, a condition for "too many questions showing a problem" for an embodiment is: the proportion of questions showing a problem is greater than 0.2. What is termed a "question showing a problem" is a question for which a report was generated, as described below. At block 304, questions showing no problem, as defined above, are classified in five groups according to difficulty levels. The intervals for the difficulty levels are, for an embodiment:
[-3; -0.8416[ first interval: low level
[-0.8416; -0.2533[ second interval: medium low level
[-0.2533; 0.2533] third interval: medium level
]0.2533; 0.8416] fourth interval: medium high level
]0.8416; 3] fifth interval: high level
These intervals correspond to an even partition of a Gaussian distribution into five parts, with the condition that the values are bounded by -3 and 3.
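The interval boundaries can be checked against the quintiles of a standard Gaussian, and the classification itself is a simple lookup. The sketch below is illustrative only; the function name `difficulty_group` is hypothetical, and returning `None` for a difficulty outside [-3, 3] is an assumption about how out-of-range values are handled.

```python
from statistics import NormalDist

# Quintile cut points of a standard Gaussian, rounded to four decimals.
quintile_cuts = [round(NormalDist().inv_cdf(p), 4) for p in (0.2, 0.4, 0.6, 0.8)]

def difficulty_group(b):
    """Return the group number (1 to 5) for difficulty b, or None outside [-3, 3].
    Open and closed ends follow the intervals listed in the text."""
    if b < -3.0 or b > 3.0:
        return None
    if b < -0.8416:
        return 1        # low level
    if b < -0.2533:
        return 2        # medium low level
    if b <= 0.2533:
        return 3        # medium level
    if b <= 0.8416:
        return 4        # medium high level
    return 5            # high level
```

Each of the five intervals therefore holds roughly 20 percent of a standard Gaussian population, apart from the small mass cut off beyond -3 and 3.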
For each group, calculate:
7lβ,eG,
where $G_i$ is group number i (i from 1 to 5) and $Q_j$ is question number j (j from 1 to J). $M_i$ can be interpreted as the total of the maximum information of group number i. At block 304, a condition for "Not enough discriminative questions for each level" is "$\exists\, i \in \{1, 2, \ldots, 5\} \mid M_i < 30$".
FIG. 4 is the schematic process of the per-question analysis of the calibration results and statistical analysis. It includes a series of conditional branches. This process is performed for each question and corresponds to step 302 of FIG. 3. The messages and recommended actions of this figure are detailed below for an embodiment, as are values for the tests at blocks 401 to 408 and 411 to 421.
The following tests and corresponding block numbers are those used during the process described in FIG. 4 for an embodiment.
Test 401 means that for question number j, the answer that has the highest answer/level correlation ($c_{jn}$) is not the answer the author entered as the correct answer.
Test 402 means that for question number j, the highest answer/level correlation ($c_{jn}$) is negative. Note that under "normal" circumstances this should never occur.
Test 403 means that for question j, $c_j > 0.4$.
Test 404 means that for question j, $a_j < 0.51$.
Test 405 means that for question j, $b_j < -3$.
Test 406 means that for question j, $b_j > 3$.
Test 407 means that for question number j, the answer/level correlation ($c_{jn}$) of the right answer is less than 2 times the second highest answer/level correlation.
Test 411 means that the answer that has the highest answer/level correlation ($c_{jn}$) is a partially correct answer (this can occur with multiple-response questions, for instance; note that for the calibration, these answers were considered incorrect).
Test 412 means that the answer that has the second highest answer/level correlation ($c_{jn}$) is a partially correct answer (this can occur with multiple-response questions, for instance; note that for the calibration, these answers were considered incorrect).
Tests 408 and 413 to 421 mean that for question j, η > 2.
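The numeric conditions of tests 401 to 407 can be sketched as a single check per question. This Python sketch only reports which conditions hold; it does not reproduce the branching order of FIG. 4, which is not described in the text. The function name `triggered_tests` and its argument layout are hypothetical.

```python
def triggered_tests(a, b, c, correlations, keyed_answer):
    """Return the numbers of the tests (401 to 407) whose condition holds.
    correlations: dict mapping each answer class to its answer/level correlation.
    keyed_answer: the answer the author entered as correct."""
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    best = ranked[0]
    hits = []
    if best != keyed_answer:
        hits.append(401)                # highest correlation is not the keyed answer
    if correlations[best] < 0:
        hits.append(402)                # even the highest correlation is negative
    if c > 0.4:
        hits.append(403)                # high pseudo-guessing
    if a < 0.51:
        hits.append(404)                # low discrimination
    if b < -3:
        hits.append(405)                # very low difficulty
    if b > 3:
        hits.append(406)                # very high difficulty
    if len(ranked) > 1 and correlations[keyed_answer] < 2 * correlations[ranked[1]]:
        hits.append(407)                # right answer not clearly ahead of runner-up
    return hits
```

For a well-behaved question, for example `triggered_tests(1.2, 0.0, 0.2, {"A": 0.5, "B": -0.2, "C": -0.3}, "A")`, no condition holds and the list is empty.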
Addendum 1: IRT Formulary
As detailed above, preferred embodiments include a three-parameter IRT model. The three parameters are: "a" referred to as the discrimination of the question; "b" referred to as the level of the question; and "c" referred to as the pseudo-guessing of the question.
The mathematical formula describing the three-parameter IRT model is:
$$P_j(\theta) = c_j + \frac{1-c_j}{1+e^{-a_j(\theta-b_j)}} \tag{1}$$
Where θ is the level of the candidate, j is the index of the question and ranges between 1 and J, the number of questions.
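As a quick illustration of the three-parameter model, the probability of a correct answer can be computed as below. This is a sketch that assumes a plain logistic form with no additional scaling constant; the function name `p_correct` is hypothetical.

```python
import math

def p_correct(theta, a, b, c):
    """Probability that a candidate of level theta answers correctly a question
    with discrimination a, level b and pseudo-guessing c (assumed logistic form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When θ equals b the probability is halfway between c and 1; it tends to c for very low levels and to 1 for very high levels.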
Observed data come from people taking the questions in a test. N is the number of sample candidates that took the test. J is the number of questions.
The observed data, then, are the responses of the N candidates to the J questions, which are contained in an N by J matrix called Y. Thus, $y_{ij}$ is 1 if candidate i answered question j correctly, and 0 if candidate i answered question j incorrectly.
Each examinee has a level θ, which is missing data; θ is referred to as the latent variable. For the purpose of a statistical question calibration, the latent variable is considered to be a discrete variable that can take K known discrete values $q_k$, with k ranging from 1 to K and the $q_k$ evenly distributed between -Max and +Max. Therefore, each $\theta_i$ can take any of the $q_k$ values.
$\pi_k$ is the probability that a candidate has $q_k$ as his level, and $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$ is the distribution of the levels. To set the scale of the levels, π is taken as a standard Gaussian (with 0 as mean and 1 as standard deviation). As a result, the $\pi_k$ are not modified during the EM algorithm calculations. The distribution of the levels will sometimes be considered as that of a continuous variable; in that case, it is called G(θ). This function is a Gaussian with 0 as mean and 1 as standard deviation.
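The discrete level grid and its fixed Gaussian weights can be sketched as follows. The values of K and Max are not given in the text, so the ones used in the example are purely illustrative; the function name `level_grid` is hypothetical, and normalizing the weights so that they sum to 1 over the grid is an assumption.

```python
import math

def level_grid(K, max_level):
    """Return (q, pi): K evenly spaced levels between -max_level and +max_level,
    and their standard-Gaussian weights normalized to sum to 1 over the grid."""
    step = 2.0 * max_level / (K - 1)
    q = [-max_level + k * step for k in range(K)]
    weights = [math.exp(-0.5 * x * x) for x in q]
    total = sum(weights)
    return q, [w / total for w in weights]

# Illustrative values only: 7 levels between -3 and 3.
q, pi = level_grid(7, 3.0)
```

Because the weights depend only on the grid, they can be computed once before the EM iterations start.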
Bayesian estimation of IRT parameters requires a prior distribution for each variable of the IRT model. A preferred embodiment uses the same distribution for all the a parameters of all the questions, the same distribution for all the b parameters, but a different distribution for each $c_j$ parameter.
Preferred embodiments use a lognormal distribution for the IRT "a" parameter:
Figure imgf000041_0001
where $K_a$ is such that $\int g_a(x)\,dx = 1$. Note that an explicit value of $K_a$ is not needed.
If μa is referred to as the mean for this distribution and σa the standard deviation, then μ'a and σ'a are defined by
Figure imgf000041_0002
Figure imgf000041_0003
In the above implementation, μa is 1.28 and σa is 0.2.
For the IRT "b" parameter, preferred embodiments use a normal distribution:
Figure imgf000042_0001
where μb is the mean for this distribution, and σb the standard deviation. In the implementation μb is 0 and σb is 2.
For the IRT "c" parameter, preferred embodiments use a normal distribution:
Figure imgf000042_0002
where μCj is the mean for this distribution, and σcj the standard deviation. For the "c" parameters, the distribution is different for each question, more precisely:
$$\mu_{c_j} = \frac{1}{M_{c_j}}, \qquad \sigma_{c_j} = K_c\,\mu_{c_j} \tag{8}$$
where $M_{c_j}$ is the number of possible answers for question number j, and $K_c$ a constant (which is the same for all the questions). In the implementation, $K_c$ is 0.25.
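The per-question prior for the "c" parameter can be illustrated with a short Python sketch. It assumes that the prior mean is the reciprocal of the number of possible answers (the chance of a blind guess), which is how equation (8) is read here; the function name `c_prior_params` is hypothetical.

```python
def c_prior_params(num_answers, k_c=0.25):
    """Return (mean, standard deviation) of the normal prior on the
    pseudo-guessing parameter of a question with num_answers possible answers."""
    mu = 1.0 / num_answers      # assumed reading of equation (8)
    sigma = k_c * mu
    return mu, sigma
```

For a question with four possible answers this gives a prior centred on 0.25 with a standard deviation of 0.0625, so the prior tightens as the number of alternatives grows.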
Addendum 2: post-calibration result and messages in FIG. 3-4:
The following messages and recommended actions refer to the embodiment detailed in FIG. 3. NOTE: Data are provided by the program to help a user to make decisions and to give advice about what to do. The interpretation of the data depends on the type of message and recommended action. It will be noted <data> in the following.
On the other hand, the advised number of sample candidates needed to perform calibration is always given. In the following <additional number> will denote the number of additional sample candidates recommended for calibration.
Message 1 Questionnaire calibration failed. More sample candidates are needed to calibrate the questionnaire. Recommended Action 1 Have approximately <additional number> more sample candidates take the test.
Message 2 <data> percent of the questions showed a problem at calibration. Most probably the calibration failed because there were too few sample candidates. Recommended Action 2 Have approximately <additional number> more sample candidates take the test.
Message 3 Calibration succeeded, but figures show that there are too few questions for at least some levels of difficulty. This will result in a loss of efficiency if used in a CAT. Recommended Action 3 First review the per-question reports. Then create new items as indicated in table <data> (the table shows an approximate number of questions needed for the different levels). Then restart a calibration.
Message 4 Calibration succeeded. <data> percent of the questions showed a problem and can be either corrected or removed. A questionnaire without those questions will still be suitable for a CAT. Recommended Action 4 Review the per-question reports. You can then use the remaining questions.
The following messages and recommended actions refer to FIG. 4.
NOTE: Data are provided by the program to help a user to make decisions and to give advice about what to do. The interpretation of the data depends on the type of message and recommended action. It will be noted <data> in the following.
Message 1 The calibration results are not relevant for this question. This can be due to a keying error. Candidates with high proficiency tend to give an answer that is not the one entered as a correct answer. The answer given by candidates with high proficiency is <data>. However, this answer is partially correct, and this problem may occur without any keying error.
Recommended Action 1 First, check if the answer given as the correct answer is really the right one. If there is no error, this might mean that the question is misleading. If you find what can be misleading, modify or rephrase the question. If not, this is probably because partially correct answers can be given to this question. If you can, modify the question in such a way that there are fewer possible partially correct answers. If you are not sure about what the problem is or if you don't want to recalibrate the questionnaire, remove this question.
Message 2 The calibration showed a problem. This can be due to a keying error. Candidates with high proficiency tend to give an answer that is not the one entered as a correct answer. The answer given by candidates with high proficiency is <data>. However, this answer is partially correct, and this problem may occur without any keying error.
Recommended Action 2 First, check if the answer given as the correct answer is really the right one. If there is no error, this might mean that the question is misleading. If you find what can be misleading, modify or rephrase the question. If not, this is probably because partially correct answers can be given to this question. If you can, modify the question in such a way that there are fewer possible partially correct answers. If you are not sure about what the problem is or if you don't want to recalibrate the questionnaire, remove this question.
Message 3 The calibration results are not relevant for this question. This is apparently due to a keying error. Candidates with high proficiency tend to give an answer that is not the one entered as a correct answer. Actually, the answer given by candidates with high proficiency is <data>.
Recommended Action 3 Check if the answer given as the correct answer is really the right one. If there is no error, this means that the question is misleading. If you find what can be misleading, modify or rephrase the question. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 4 The calibration shows a possible keying error. Candidates with high proficiency tend to give an answer that is not the one entered as a correct answer. The answer given by candidates with high proficiency is <data>.
Recommended Action 4 Check if the answer given as the correct answer is really the right one. If there is no error, this means that the question is misleading. If you find what can be misleading, modify or rephrase the question. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 5 The calibration results are not relevant for this question. This question must be misleading since candidates with high proficiency tend to give the right answer less often than candidates with a lower proficiency do. Recommended Action 5 If you find what in the question can be misleading, modify or rephrase it. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 6 The calibration shows that this question must be misleading since candidates with high proficiency tend to give the right answer less often than candidates with a lower proficiency.
Recommended Action 6 If you find what in the question can be misleading, modify or rephrase it. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 7 The calibration results are not relevant for this question. This is apparently due to a high guessing parameter that indicates that there is too high a probability (<data>) for guessing the right answer. Recommended Action 7 Remove this question or modify it in such a way that there are more plausible alternatives among possible answers.
Message 8 The calibration shows a high guessing parameter for this question, which indicates that there is too high a probability (<data>) of guessing the right answer. Recommended Action 8 Modify this question in such a way that there are more plausible alternatives in the answers or remove it.
Message 9 The calibration results are not relevant for this question. This is apparently due to a low discrimination, which indicates that candidates with high proficiency don't answer correctly significantly more often than candidates with a low proficiency (Ratio is <data>). Recommended Action 9 Remove this question.
Message 10 The calibration shows a low discrimination for this question, which indicates that candidates with high proficiency don't answer correctly significantly more often than candidates with a low proficiency (Ratio is <data>).
Recommended Action 10 Remove this question.
Message 11 The calibration results are not relevant for this question. This is apparently due to a very low difficulty level (the proportion of the candidates that answered correctly is indeed <data>). Recommended Action 11 Remove this question or, if possible, have candidates with more extreme levels (both high and low level to prevent biases) take the test to recalibrate this question.
Message 12 The calibration shows a very low difficulty level for this question (the proportion of the candidates that answered correctly is indeed <data>).
Recommended Action 12 Remove this question or, if possible, have candidates with more extreme levels (both high and low level to prevent biases) take the test to recalibrate this question.
Message 13 The calibration results are not relevant for this question. This is apparently due to a very high difficulty level (the proportion of the candidates that answered correctly is only <data>). Recommended Action 13 Remove this question or, if possible, have candidates with more extreme levels (both high and low level to prevent biases) take the test to recalibrate this question.
Message 14 The calibration shows a very high difficulty level for this question (the proportion of the candidates that answered correctly is only <data>). Recommended Action 14 Remove this question or, if possible, have candidates with more extreme levels (both high and low level to prevent biases) take the test to recalibrate this question.
Message 15 The calibration results are not relevant for this question. This can be due to the following problem: candidates with high proficiency relatively often give an answer that is not the one entered as a correct answer. The second most given answer is <data>. This may be because the question is misleading. However, this answer is partially correct, and this problem may occur even if the question is not misleading. Recommended Action 15 If you find what can be misleading, modify or rephrase the question. If not, this is probably because partially correct answers can be given to this question. If you can, modify the question in such a way that there are fewer possible partially correct answers. Unless you find out what the problem is and you don't mind re-calibrating the questionnaire, remove this question.
Message 16 The calibration shows a problem for this question. Candidates with high proficiency relatively often give an answer that is not the one entered as a correct answer. The second most given answer is <data>. This may be because the question is misleading. However, this given answer is partially correct, and this problem may occur even if the question is not misleading. Recommended Action 16 If you find what can be misleading, modify or rephrase the question. If not, this is probably because partially correct answers can be given to this question. If you can, modify the question in such a way that there are fewer possible partially correct answers. If you are not sure about what the problem is or if you don't want to recalibrate the questionnaire, remove this question.
Message 17 The calibration results are not relevant for this question. This is apparently because the question is relatively misleading. Candidates with high proficiency relatively often give an answer that is not the correct answer. The second most given answer is <data>. Recommended Action 17 If you find what can be misleading, modify or rephrase the question. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 18 The calibration shows that this question is relatively misleading. Candidates with high proficiency relatively often give an answer that is not the correct answer. This second most given answer is <data>. Recommended Action 18 If you find what can be misleading, modify or rephrase the question. If not or if you don't want to recalibrate the questionnaire, remove this question.
Message 19 The calibration results are not relevant for this question. Probably, this question doesn't follow the general model (which assumes that the higher the proficiency of a candidate, the higher his probability of answering correctly).
Recommended Action 19 Remove this question.

Claims

What is claimed is:
1. A method for iteratively generating a CAT questionnaire on a computer, comprising: calibrating a set of questions with statistical modeling; and amending the set of questions responsive to the statistical modeling.
2. The method of claim 1, wherein calibrating a set of questions includes testing a sample population by posting the set of questions on a computer network.
3. The method of claim 2, wherein statistical modeling includes an IRT model.
4. The method of claim 3, wherein the IRT model is a 3-parameter IRT model.
5. The method of claim 3, wherein the IRT model is a 2-parameter IRT model.
6. The method of claim 3, wherein the IRT model is a 1-parameter IRT model.
7. The method of claim 3, wherein statistical modeling further includes a Bayesian modal estimate of IRT parameters.
8. The method of claim 7, wherein the Bayesian modal estimate includes an iterative EM algorithm.
9. A method for interactively generating a CAT questionnaire on a computer, comprising: calibrating a set of questions with statistical modeling; and interacting with a user to amend the set of questions responsive to the statistical modeling.
10. The method of claim 9, wherein calibrating a set of questions includes testing a sample population by posting the set of questions on a computer network.
11. The method of claim 10, wherein statistical modeling includes an IRT model.
12. The method of claim 11, wherein the IRT model is a 3-parameter IRT model.
13. The method of claim 11, wherein the IRT model is a 2-parameter IRT model.
14. The method of claim 11, wherein the IRT model is a 1-parameter IRT model.
15. The method of claim 11, wherein statistical modeling further includes a Bayesian modal estimate of IRT parameters.
16. The method of claim 15, wherein the Bayesian modal estimate includes an iterative EM algorithm.
17. A method for interactively generating a CAT questionnaire on a computer, comprising: calibrating questions in a set of questions with statistical modeling, wherein the statistical modeling generates at least one figure-of-merit measuring the appropriateness of questions to a CAT questionnaire; supplying a user with information indicative of the appropriateness of the question to a CAT questionnaire, the information based on at least one figure-of-merit; receiving user-input to amend a set of questions, corresponding answers from an answer key; and providing a statistically validated CAT questionnaire responsive to the user-input.
18. The method of claim 17, further comprising: receiving the set of questions and the answer key, wherein each question in the set of questions has an identified, exclusively correct answer in the answer key; and receiving responses to the set of questions, wherein the responses are from a population of sample candidates to which the set of questions was posed.
19. The method of claim 17, further comprising: generating the set of questions and the answer key interactively with a user, wherein each question in the set of questions has an identified, exclusively correct answer in the answer key; and receiving responses to the set of questions, wherein the responses are from a population of sample candidates to which the set of questions was posed.
20. The method of claim 17, wherein statistical modeling includes an IRT model.
21. The method of claim 20, wherein the IRT model is a 3-parameter IRT model.
22. The method of claim 20, wherein the IRT model is a 2-parameter IRT model.
23. The method of claim 20, wherein the IRT model is a 1-parameter IRT model.
24. The method of claim 20, wherein statistical modeling further includes a Bayesian modal estimate of IRT parameters.
25. The method of claim 24, wherein the Bayesian modal estimate includes an iterative EM algorithm.
26. The method of claim 17, wherein supplying a user with information further includes supplying the user with recommended actions, the recommended actions based on at least one figure-of-merit.
27. The method of claim 17, wherein the user-input is supplied over a computer network.
28. The method of claim 18, wherein responses are received over a computer network.
29. The method of claim 18, further comprising receiving user-input to amend the population of sample candidates responsive to at least one figure-of-merit.
30. The method of claim 19, wherein responses are received over a computer network.
31. The method of claim 19, further comprising receiving user-input to amend the population of sample candidates responsive to at least one figure-of-merit.
PCT/US2000/019002 1999-07-13 2000-07-13 Method for automatically producing a computerized adaptive testing questionnaire WO2001004862A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU60913/00A AU6091300A (en) 1999-07-13 2000-07-13 Method for automatically producing a computerized adaptive testing questionnaire

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14356599P 1999-07-13 1999-07-13
US60/143,565 1999-07-13

Publications (2)

Publication Number Publication Date
WO2001004862A2 true WO2001004862A2 (en) 2001-01-18
WO2001004862A3 WO2001004862A3 (en) 2001-10-18

Family

ID=22504611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/019002 WO2001004862A2 (en) 1999-07-13 2000-07-13 Method for automatically producing a computerized adaptive testing questionnaire

Country Status (2)

Country Link
AU (1) AU6091300A (en)
WO (1) WO2001004862A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009023802A1 (en) * 2007-08-14 2009-02-19 Knewton Inc. Methods, systems, and media for computer-based learning
US7580237B2 (en) 2003-05-29 2009-08-25 Taser International, Inc. Systems and methods for immobilization with repetition rate control
US7602598B2 (en) 2003-02-11 2009-10-13 Taser International, Inc. Systems and methods for immobilizing using waveform shaping
US8046251B2 (en) 2000-08-03 2011-10-25 Kronos Talent Management Inc. Electronic employee selection systems and methods
US10885803B2 (en) 2015-01-23 2021-01-05 Massachusetts Institute Of Technology System and method for real-time analysis and guidance of learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5059127A (en) * 1989-10-26 1991-10-22 Educational Testing Service Computerized mastery testing system, a computer administered variable length sequential testing system for making pass/fail decisions
EP0553674A2 (en) * 1992-01-31 1993-08-04 Educational Testing Service Method of item selection for computerized adaptive tests

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FRICK T W: "COMPUTERIZED ADAPTIVE MASTERY TESTS AS EXPERT SYSTEMS" JOURNAL OF EDUCATIONAL COMPUTING RESEARCH,US,FARMINGDALE, NY, vol. 8, no. 2, 1992, pages 187-213, XP000764312 ISSN: 0735-6331 *
RIOS A ET AL: "Internet based evaluation system" ARTIFICIAL INTELLIGENCE IN EDUCATION. OPEN LEARNING ENVIRONMENTS: NEW COMPUTATIONAL TECHNOLOGIES TO SUPPORT LEARNING, EXPLORATION AND COLLABORATION, PROCEEDINGS OF AIED99: 9TH CONFERENCE ON ARTIFICIAL INTELLIGENCE IN EDUCATION, LE MANS, FRANCE, 19-23, pages 387-394, XP000997978 1999, Amsterdam, Netherlands, IOS Press, Netherlands *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046251B2 (en) 2000-08-03 2011-10-25 Kronos Talent Management Inc. Electronic employee selection systems and methods
US8265977B2 (en) 2000-08-03 2012-09-11 Kronos Talent Management Inc. Electronic employee selection systems and methods
US7602598B2 (en) 2003-02-11 2009-10-13 Taser International, Inc. Systems and methods for immobilizing using waveform shaping
US7580237B2 (en) 2003-05-29 2009-08-25 Taser International, Inc. Systems and methods for immobilization with repetition rate control
US7586733B2 (en) 2003-05-29 2009-09-08 Taser International, Inc. Systems and methods for immobilization with time monitoring
US7916446B2 (en) 2003-05-29 2011-03-29 Taser International, Inc. Systems and methods for immobilization with variation of output signal power
WO2009023802A1 (en) * 2007-08-14 2009-02-19 Knewton Inc. Methods, systems, and media for computer-based learning
US8672686B2 (en) 2007-08-14 2014-03-18 Knewton, Inc. Methods, media, and systems for computer-based learning
US10885803B2 (en) 2015-01-23 2021-01-05 Massachusetts Institute Of Technology System and method for real-time analysis and guidance of learning

Also Published As

Publication number Publication date
AU6091300A (en) 2001-01-30
WO2001004862A3 (en) 2001-10-18

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (R.69(1)EPC)

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP