Measuring Item Reliability – Horst

What is Horst?

The Horst Partial Knowledge Index, or PKI (Maxinity terminology), is a statistic that can be used to help understand, and consequently improve, item reliability. The formula was developed by Horst in the 1950s. It takes the number of candidates who chose the correct option for an item, subtracts the number who chose the most popular distractor, and divides the result by the number of candidates who sat the item, giving a value between -1 and +1.

When can the Horst PKI be used?

The Horst PKI can only be used for ‘single select’ multiple choice items, i.e. items where each candidate picks exactly one option. The statistic does not work for any other type of question.

How useful is the Horst PKI?

The Horst PKI looks at an item in a different way to the PCC (Pearson’s product moment correlation coefficient) or discrimination. It is particularly useful for identifying problem items, as a negative Horst PKI indicates that more candidates chose one specific wrong answer (the most popular distractor) than the correct answer. This indicates either that the question itself is wrong (or possibly out of date, so that the single best answer has changed), or that the teaching or understanding is wrong. Either way it is a concern, particularly if a large proportion of the upper group also chose the most popular distractor.

Within the Maxinity software a negative Horst PKI would immediately be highlighted in red, indicating that there is a potential issue needing further investigation.

The Horst PKI can also be used in combination with other item reliability statistics to help understand how an item is performing – for example:

You would expect a low discrimination and a high Horst PKI (close to 1) for essential knowledge questions.

A middle value mean and a low PKI (close to 0) would suggest the item has only one performing distractor.

The theory behind the Horst PKI

Horst’s theory was that when candidates answer a multiple choice question, they first eliminate the options they know to be incorrect and then make a random guess among the remaining options. He assumed that all candidates with the same knowledge base will eliminate the same options first. As a candidate’s knowledge grows, more options can be recognised as incorrect and therefore ‘eliminated’ from the random selection.
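The elimination model described above can be sketched in a few lines of Python. This is only an illustration, not Horst’s own formulation or the Maxinity implementation: the function name expected_counts and the candidate numbers are hypothetical, and the sketch assumes that every candidate eliminates options in the same fixed order and guesses uniformly among whatever remains.

```python
from fractions import Fraction


def expected_counts(candidates_per_level):
    """Expected number of picks per option under the elimination model.

    candidates_per_level[k] is the number of candidates able to eliminate
    exactly k options (hypothetical input). Options are indexed in the order
    in which they are eliminated, so the last index is the correct answer.
    """
    n_options = len(candidates_per_level)
    counts = [Fraction(0)] * n_options
    for eliminated, n_candidates in enumerate(candidates_per_level):
        remaining = n_options - eliminated
        # Candidates at this level guess uniformly among the remaining options.
        share = Fraction(n_candidates, remaining)
        for option in range(eliminated, n_options):
            counts[option] += share
    return counts


# Hypothetical 4-option item: 8 candidates eliminate nothing, 6 eliminate one
# option, 6 eliminate two, and 10 eliminate all three distractors.
counts = expected_counts([8, 6, 6, 10])
print([int(c) for c in counts])  # [2, 4, 7, 17]
```

Under these assumptions each option collects an equal share from every knowledge level that has not yet eliminated it, which is why the counts rise steadily towards the correct answer.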

An example Horst PKI calculation

In the graph below, the item had six possible answers, A to F.

In order to make the explanation of Horst easier to understand, we have ordered these same answers below from the least picked to the most picked option – in this case ‘C’. In this example C is also the right answer.

  • 3 candidates answered ‘F’. According to Horst’s theory, the people who answered F had no knowledge of the subject: they were unable to eliminate any of the options, so this was a pure guess. If 3 people made a pure guess and picked F, it follows under Horst that 3 other people would have guessed A, 3 more D, another 3 E, etc. So in total 18 candidates (3 for each of the 6 options) had no subject knowledge.
  • In total 7 people answered A. According to the theory, 3 of those would have made a completely random guess (see above), but the other 4 knew enough to eliminate F, but no other options. These people therefore guessed A at random from the 5 remaining options, so following Horst a further 4 people would have guessed D, another 4 E, etc., giving 20 candidates at this level of knowledge.
  • 11 people answered D. Of these, 7 are accounted for by the two groups above; the other 4 knew enough to eliminate F and A and therefore chose at random from 4 options. So 16 candidates in total shared this level of subject knowledge.
  • 15 people answered E. Of these, 4 knew enough to eliminate F, A and D and therefore chose at random from 3 options. Another 4 would therefore have guessed B and another 4 C, giving 12 candidates at this level.
  • 21 people answered B (the biggest distractor). Of these, 6 knew enough to eliminate F, A, D and E and therefore chose at random from 2 options. It follows that 6 people with the same level of subject knowledge would have guessed C, giving 12 candidates at this level.
  • In total 43 people gave the right answer of C. 22 of these knew enough to eliminate all other options and so knew that C is the correct option. This shows they have full knowledge of the subject the question is trying to test.
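The breakdown above can be reproduced with a short Python sketch. The function name knowledge_levels is hypothetical, and the code assumes, as in this example, that the correct answer is also the most picked option; it is an illustration of the reasoning rather than a definitive implementation.

```python
def knowledge_levels(option_counts):
    """Split observed option counts into Horst-style knowledge levels.

    Returns a list of (options_eliminated, candidates_at_level) tuples,
    working from the least picked to the most picked option.
    """
    ordered = sorted(option_counts.values())
    n_options = len(ordered)
    levels = []
    previous = 0
    for eliminated, count in enumerate(ordered):
        # Picks beyond the previous option's count come from candidates who
        # could eliminate `eliminated` options and guessed among the rest.
        extra = count - previous
        remaining = n_options - eliminated
        levels.append((eliminated, extra * remaining))
        previous = count
    return levels


# Option counts from the worked example (C is the correct answer).
counts = {"A": 7, "B": 21, "C": 43, "D": 11, "E": 15, "F": 3}
for eliminated, candidates in knowledge_levels(counts):
    print(f"eliminated {eliminated} option(s): {candidates} candidates")
```

The six levels come out as 18, 20, 16, 12, 12 and 22 candidates, which together account for all 100 candidates who sat the item.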

Horst’s PKI = (count of C (correct answer) – count of B (biggest distractor)) / (total candidate count)

In this case the Horst PKI would be (43 – 21)/100 = 22/100 = 0.22
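As a check on the arithmetic, the formula can be written as a small Python function. The name horst_pki and the dictionary layout are assumptions made for this sketch and do not reflect how the Maxinity software computes the statistic.

```python
def horst_pki(option_counts, correct_option):
    """(correct count - most popular distractor count) / total candidates."""
    total = sum(option_counts.values())
    if total == 0:
        raise ValueError("no responses recorded for this item")
    correct = option_counts[correct_option]
    # The most popular distractor is the most picked option other than the key.
    top_distractor = max(
        count for option, count in option_counts.items() if option != correct_option
    )
    return (correct - top_distractor) / total


# Option counts from the worked example, where C is the correct answer.
counts = {"A": 7, "B": 21, "C": 43, "D": 11, "E": 15, "F": 3}
print(horst_pki(counts, "C"))  # 0.22

# Hypothetical variation: if B were keyed as correct, C would become the most
# popular distractor and the PKI would turn negative.
print(horst_pki(counts, "B"))  # -0.22
```

The result matches the 22 candidates with full knowledge in the breakdown: under Horst’s model the PKI is the share of candidates who could eliminate every distractor.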

Real example of Horst PKI showing a problematic item, where more candidates chose the most popular distractor than the correct answer.

The graph above shows how candidates from the upper (green), middle (yellow) and lower (red) groups answered a single select multiple choice question where option B is the correct answer.

On the surface, this looks like an averagely performing question, with a positive discrimination of 0.15 and a positive correlation of 0.21, indicating that the people who answered this question correctly were also likely to do well in the exam. If these were the only statistics looked at, there would be no argument for further investigation.

However, this question has a negative Horst PKI of -0.31, giving reason to delve deeper into the question’s performance characteristics. Further analysis shows that candidates in all groups are picking A (the most popular distractor) over option B. This is the case even in the upper group, which is evidence that candidates are not guessing but actively choosing the wrong answer. This could be the result of incorrect teaching or an error in the question itself.

In conclusion

The Horst PKI is another way to understand the reliability of a single select multiple choice item. Its greatest strength is that it offers a really clear way to identify problem items: a negative PKI indicates that there is a problem with the question or the teaching.