Measuring Item Reliability – What’s the point of Point Biserial?

What is point biserial?

In the context of an exam, point biserial is a way of measuring the consistency of the relationship between a candidate’s overall exam mark (a continuous variable, i.e. anywhere from 0 to 100%) and that candidate’s mark on a single item (a dichotomous variable, i.e. one with only two possible outcomes). It gives an indication of how strong or weak this correlation is compared to the other items in that exam. In other words, does the way candidates answer that item help to indicate whether they are strong or weak candidates?
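
As a rough illustration of the calculation described above, here is a minimal Python sketch of the standard point biserial formula: the difference between the mean exam mark of candidates who got the item right and those who got it wrong, divided by the standard deviation of all exam marks and scaled by the proportions in each group. The function name and the marks are hypothetical and invented purely for illustration; they do not come from Maxexam or from any real exam.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_marks, exam_marks):
    """Point biserial correlation between a 0/1 item mark and an exam mark.

    Hypothetical helper for illustration only.
    """
    # The item must be strictly dichotomous: 1 = correct, 0 = incorrect.
    if set(item_marks) - {0, 1}:
        raise ValueError("item marks must be strictly 0 or 1")

    correct = [e for i, e in zip(item_marks, exam_marks) if i == 1]
    incorrect = [e for i, e in zip(item_marks, exam_marks) if i == 0]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect answer")

    spread = pstdev(exam_marks)  # population standard deviation of exam marks
    if spread == 0:
        raise ValueError("exam marks must not all be identical")

    p = len(correct) / len(item_marks)  # proportion answering correctly
    q = 1 - p                           # proportion answering incorrectly
    return (mean(correct) - mean(incorrect)) / spread * sqrt(p * q)


# Invented marks for five candidates on one true/false item.
item_marks = [1, 1, 1, 0, 0]
exam_marks = [80, 70, 90, 40, 50]
print(round(point_biserial(item_marks, exam_marks), 3))
```

A value close to 1 suggests that candidates who answered the item correctly also tended to score well overall; a value near zero or below suggests the item is not separating strong candidates from weak ones.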

Is point biserial a useful way of calculating item correlation?

The theory behind using item-exam correlation to identify problematic items is strong. However, at Maxinity we believe there is a large misconception around the use of point biserial: in reality, it is useless for all but very basic tests. In our experience, most people who think they want to calculate point biserial are actually confusing it with an alternative approach, such as biserial or Pearson’s product-moment correlation coefficient (but more on that later!).

This stems from the fact that the point biserial calculation requires the item you are investigating to be strictly dichotomous, i.e. there can only be two possible answers, such as yes/no or true/false. The statistics simply don’t work if there are more than two possible answers for an item. This means that to calculate a point biserial correlation for every item in an exam, every item must offer just two possible answers, which we would not advise for a high-stakes exam!

What about creating an artificial dichotomy?

Some people will argue that a point biserial calculation can be made to work if there are more than two answers to the item by creating an ‘artificial dichotomy’. This effectively means creating a dichotomy by having the right answer on one side and all possible other answers being grouped together as one on the other side.
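
The grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name, the option labels and the responses are hypothetical.

```python
def dichotomise(responses, correct_option):
    """Collapse multi-option responses into 1 (correct) or 0 (anything else).

    Hypothetical helper for illustration only.
    """
    return [1 if response == correct_option else 0 for response in responses]


# Invented responses to a four-option item where "C" is the right answer.
responses = ["A", "C", "B", "C", "D"]
print(dichotomise(responses, correct_option="C"))
```

Note that options A, B and D all end up as the same value, so any difference between the wrong answers is lost.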

In fact, this way of measuring correlation is no longer ‘point biserial’, but instead ‘biserial’ (as a true point biserial only measures against a strictly dichotomous variable). Biserial, in our view, is still a poor way to measure correlation, particularly within high-stakes exams. It has limiting factors that make it neither as useful nor as flexible as something like Pearson’s product-moment correlation coefficient. These limitations are perhaps easiest to explain when an item offers half marks for a particular option (prevalent in OSCE exams). How do you group this into an artificial dichotomy? It isn’t incorrect, yet it also isn’t correct, and biserial correlation falls over at this point.
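
To make the half-mark problem concrete, the sketch below uses invented marks for five candidates on an item that awards 0, 0.5 or 1. It forces the half marks into a dichotomy in both possible ways and compares the results with a Pearson correlation calculated on the marks as awarded. The `pearson` function is a plain textbook implementation written for this example, and all names and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient (textbook formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("both variables must vary")
    return covariance / sqrt(var_x * var_y)


# Invented marks: the item awards 0, 0.5 or 1.
item_marks = [1, 0.5, 1, 0, 0.5]
exam_marks = [80, 60, 90, 40, 50]

# Two equally arguable ways of forcing the half marks into a dichotomy.
half_as_wrong = [1 if m == 1 else 0 for m in item_marks]
half_as_right = [1 if m > 0 else 0 for m in item_marks]

print("half marks counted as wrong:", round(pearson(half_as_wrong, exam_marks), 3))
print("half marks counted as right:", round(pearson(half_as_right, exam_marks), 3))
print("marks used as awarded:      ", round(pearson(item_marks, exam_marks), 3))
```

The two forced groupings give different answers for the same candidates, which is the ambiguity described above; using the marks as awarded avoids having to choose.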

There is a further limitation with biserial correlation: it can only be calculated at an item level. Unlike Pearson’s product-moment correlation coefficient, it is not possible to measure the biserial correlation at a question or scenario level (if they contain more than one item), essentially due to the same restriction of having to deal with the shades of grey between right and wrong.

So then, what is the point of point biserial?

Good question. In our view there isn’t really a point to point biserial, and in fact internally we refer to it as ‘pointless biserial’. As a result, we do not support point biserial in our Maxexam software. In a nutshell our reasoning for this view is as follows:

It can only be used on dichotomous items, potentially leading to poorer exams as it forces all items to offer only two potential answers.

Point biserial and biserial can only be calculated at the item level, making them much less flexible and powerful than other measures such as Pearson’s product-moment correlation coefficient, which can be calculated at item, question and scenario level.

Our experience is also that point biserial is often confused with biserial or with Pearson’s product-moment correlation coefficient (PCC), which is another method of determining item correlation. The PCC will be the focus of our next blog.

Do you think you are using point biserial?

As stated above, we believe there is often confusion about when point biserial is being used, or can be used. If you answer no to either of the following questions, then you cannot be using point biserial:

  1. Are there only two potential answers for each item being looked at (i.e. is it strictly dichotomous)?
  2. Is the calculation only being performed at an item level?

Do you agree?

We would really like to hear your comments and thoughts about point biserial correlation, as it is something that we have been asked for within Maxexam but do not provide for the reasons stated above. However, this is only our opinion, and we are open to having it changed if anyone can provide good arguments for where point biserial can be used effectively within high-stakes exams.