The Problem with Ebel.

Ebel is a well-established standard setting method used by universities and other educational institutions across the UK and abroad.  A full explanation of Ebel can be found in a previous blog here.

At the core of Ebel lies a matrix, developed by the examining body, that sets out what proportion of candidates would be expected to get a question right according to the difficulty and importance of the question. An example matrix can be seen below:


Subject matter experts then categorise each question/item according to how difficult (easy, medium or difficult) and relevant (essential, important, supplementary) it is. Each question’s final category is then settled either by averaging the experts’ responses or at a standard setting meeting, and a pass mark for the exam is calculated from the matrix.
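As a rough sketch of the calculation described above, the snippet below takes the pass mark to be the average of the matrix proportions across all items, assuming one mark per item. The matrix values are hypothetical placeholders (only the 30% for difficult/supplementary matches the example used later in this post), and the names EXAMPLE_MATRIX and ebel_pass_mark are illustrative, not part of any standard tool.

```python
# Hypothetical Ebel matrix: expected proportion of candidates answering
# correctly, keyed by (relevance, difficulty). Values are illustrative
# placeholders, not taken from any real examining body.
EXAMPLE_MATRIX = {
    ("essential", "easy"): 0.90,
    ("essential", "medium"): 0.80,
    ("essential", "difficult"): 0.70,
    ("important", "easy"): 0.80,
    ("important", "medium"): 0.60,
    ("important", "difficult"): 0.50,
    ("supplementary", "easy"): 0.70,
    ("supplementary", "medium"): 0.50,
    ("supplementary", "difficult"): 0.30,
}


def ebel_pass_mark(item_categories, matrix=EXAMPLE_MATRIX):
    """Return the pass mark as a fraction of the total marks.

    item_categories is a list of (relevance, difficulty) tuples, one per
    one-mark item, as agreed by the subject matter experts.
    """
    if not item_categories:
        raise ValueError("at least one item is required")
    expected = [matrix[category] for category in item_categories]
    return sum(expected) / len(expected)


# A ten-item exam weighted towards essential, easier questions.
exam = (
    [("essential", "easy")] * 5
    + [("important", "medium")] * 3
    + [("supplementary", "difficult")] * 2
)
print(f"Pass mark: {ebel_pass_mark(exam):.0%}")
```

Note that nothing in this calculation refers to the format of the questions, only to their category.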

The aim is to produce a balanced exam according to the purpose of the exam. For example, if the exam is intended to establish whether candidates know the core information, there would be a greater proportion of essential questions. However, if it is a ranking exam, you would see more important and supplementary questions, as well as more difficult questions.

So what’s the issue with using Ebel?

The problem with Ebel arises from the fact that the structure/type of question is often not taken into account when determining the matrix itself, or where a particular question sits within the matrix.

For example, if the question was:

‘What day of the week is it today?’

Most people would agree that was an easy question – whether it was essential, important or supplementary would depend on what was being examined. If the candidate was a newsreader it might be essential, while for a retail assistant, as long as they knew they were working that day, the actual day of the week might be supplementary.

However, a question that seems easy to an examiner will not be easy for every candidate, and a candidate’s chance of getting it right will also depend on the question type. To give an extreme example, let’s imagine a monkey were sitting the exam, using a pointer to indicate the answer.

If the question was:


‘Is today Tuesday?’ (presuming it is for this example), and the options for answers were:

  1. Yes
  2. No

The monkey would have a 50/50 chance of pointing and getting the question right.

If the question was multiple choice with five options:

‘What day of the week is it?’

  1. Monday
  2. Tuesday
  3. Wednesday
  4. Thursday
  5. Friday

The monkey would now have a 20% chance of getting the question right.

Open Ended Question

Finally, if the monkey were asked to write down the day of the week, we can assume it would have a 0% chance of getting it right.

As the example above illustrates, the way a question is structured will affect the candidate’s chance of getting it right. Or to ‘borrow’ from Orwell:

All questions are easy, but some are more easy than others.

Another Example

The previous example looked at an easy question, but this issue can have an even bigger effect if we look at difficult/supplementary questions, where the estimated likelihood of getting it right is typically much lower.

If the question was ‘Who won the FA Cup in 1958?’ the majority of people would find this difficult to answer. Even if the candidates were potential sports reporters this would still likely be a difficult, supplementary question.

Returning to the example matrix (repeated below), this would suggest that around 30% of candidates would be likely to get it right.


However, the question structure could hugely affect this percentage:

  1. Dichotomous Question

Who won the FA Cup in 1958?

  1. Bolton Wanderers
  2. Manchester United

Without any knowledge, the candidate’s chance of getting this right would be 50%, much higher than the 30% that might have been expected according to the matrix. Clearly this matrix was not created with dichotomous questions in mind.

  2. Multiple Choice (5 options)

Who won the FA Cup in 1958?

  1. Aston Villa
  2. Bolton Wanderers
  3. Manchester United
  4. Nottingham Forest
  5. Luton Town

In this case, candidates have a 20% (or higher if they can eliminate some options) chance of getting this right even if they don’t know the answer. The example matrix would seem reasonable in this case.

  3. Open ended question

Who won the FA Cup in 1958?

As there are so many potential answers to this, candidates are only likely to get it right if they have a good knowledge of the FA Cup. 30% looks very optimistic.
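To put the three formats side by side, the sketch below works out the chance of a correct blind guess for each and compares it with the 30% that the example matrix suggests for a difficult, supplementary question. It assumes a candidate with no knowledge picks uniformly at random among the options, and it treats the chance of a lucky open-ended answer as zero for simplicity. The function name is illustrative only.

```python
MATRIX_EXPECTATION = 0.30  # difficult/supplementary figure quoted in the text


def blind_guess_chance(num_options):
    """Chance of a correct answer by uniform random guessing.

    num_options is None for an open-ended question, where the chance of
    a lucky guess is assumed (for simplicity) to be zero.
    """
    if num_options is None:
        return 0.0
    if num_options < 2:
        raise ValueError("a closed question needs at least two options")
    return 1.0 / num_options


formats = {
    "Dichotomous": 2,
    "Multiple choice (5 options)": 5,
    "Open ended": None,
}

for name, options in formats.items():
    floor = blind_guess_chance(options)
    verdict = "above" if floor > MATRIX_EXPECTATION else "below"
    print(f"{name}: guess floor {floor:.0%}, {verdict} the matrix's 30%")
```

Only the dichotomous format has a guessing floor above the matrix figure, which is why the same 30% cannot suit all three formats.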

In Summary

Whether Ebel succeeds as a standard setting method depends both on the original matrix being right and on the questions being correctly categorised within the matrix.

However, in reality, we understand that in many cases a matrix is developed and then used across a series of exams. The question then becomes: when subject matter experts decide whether a question is easy, medium or hard, are they taking the structure of the question into account? It is very hard to do so when there are only 3 categories to choose between!

Returning to the days of the week example, can the question be anything other than easy regardless of the format?  Potentially not, but the way the question is structured will still affect the likelihood of candidates getting it right.

In that case, the Ebel matrix itself must reflect the structure of the questions, but this is often overlooked. The result is that if a matrix is reused for a different type of exam, or the balance of the exam changes (e.g. an exam that was previously mostly multiple choice questions becomes mostly dichotomous questions, or vice versa), then the pass mark calculated using Ebel is likely to be inaccurate.
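One way to see why a reused matrix can mislead is a simple guessing model, sketched below. The model is an assumption made for illustration and is not part of Ebel: a candidate knows each answer with some fixed probability and otherwise guesses uniformly among the options. The 60% knowledge level and the two exam mixes are hypothetical.

```python
def expected_score(p_know, option_counts):
    """Expected proportion correct under a simple guessing model.

    Assumes the candidate knows each answer with probability p_know and
    otherwise guesses uniformly among the options. option_counts holds
    the number of options per item (None for open ended, where the
    chance of a lucky guess is taken as zero).
    """
    if not option_counts:
        raise ValueError("at least one item is required")
    total = 0.0
    for k in option_counts:
        guess = 0.0 if k is None else 1.0 / k
        total += p_know + (1.0 - p_know) * guess
    return total / len(option_counts)


P_KNOW = 0.60  # hypothetical candidate who genuinely knows 60% of answers

all_multiple_choice = [5] * 10
mostly_dichotomous = [2] * 8 + [5] * 2

for label, items in [
    ("10 x five-option", all_multiple_choice),
    ("8 x dichotomous + 2 x five-option", mostly_dichotomous),
]:
    print(f"{label}: expected score {expected_score(P_KNOW, items):.0%}")
```

Under these assumptions the same candidate scores roughly ten percentage points higher on the mostly dichotomous paper, while a pass mark taken from an unchanged matrix would not move at all.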

Should examining bodies use Ebel as a standard setting method?

The simple answer to this is ‘it depends’.

If matrices are developed according to the structure as well as the difficulty & relevancy of the questions, then Ebel can be an effective method of standard setting – but as with other standard setting methods we would always suggest that more than one method be used.

To use Ebel effectively you need a different matrix according to the structure of the exam, and the more diverse the range of question types, the trickier it is to construct a matrix. In such cases it may not be advisable to use Ebel.

The following table suggests some examples of where Ebel can or cannot be effectively used and how straightforward it would be to develop an appropriate matrix:

In other words, the matrix needs to be sympathetic to the types and balance of questions used. If the combination of question structures changes then the matrix will need to be adjusted with it.

It is worth noting here that even modest changes to the combination of question types can have a big effect on the difficulty of the exam, so the matrix will need to be adjusted to alter the pass mark appropriately.


Ebel is a popular absolute standard setting method for setting an exam pass mark before the exam is sat.

However, we believe that in many cases, when subject matter experts are asked to rate how easy or difficult a question is, they de-couple their judgement from the format of the question being asked.

As long as exam administrators are aware of the issue and Ebel matrices are determined according to the type of questions in the exam, then Ebel can be used as a valid standard setting method. This is of particular importance if a regular exam has a change in the combination of question structures – the Ebel matrix must be adjusted or the resulting pass mark may be invalid and certainly out of line with previous exams.

Angoff can also be used to set a pass mark pre-exam and while it can be a more time intensive method than Ebel, it does not suffer from the same question structure issues.  This is because the decision about the difficulty of each item is explicitly tied to the nature of the question. Subject matter experts are asked “What percentage of borderline candidates would answer this item correctly?” and naturally take into consideration the full item details including its structure.

If you are uncertain either about the long-term structure of your exam or how to develop the right matrix for your exam, we can give you some guidance, or you may prefer to consider the use of Angoff instead.

With careful use, Ebel can be a great tool, but using it inappropriately can cause real issues in setting an appropriate pass mark.