This blog will summarise how the different methods of standard setting differ and where each is best used.
A reminder – What is standard setting?
Standard setting is a way to define levels of achievement or proficiency in an exam and the cut-off marks corresponding to those levels. A cut-off mark is the lowest mark that a student must achieve in order to reach a particular level (often to pass an exam), and it separates those who reach that level from those who do not.
Are all methods of standard setting equally valid?
Different types of standard setting suit different types of exam, because they determine cut-off marks in different ways.
Criterion-referenced (Absolute) standard setting methods
These set the cut-off mark based on performance in relation to a defined standard. Thus, depending on the cohort, any number of students can pass or fail.
With high stakes exams, e.g. qualifying medical or dental exams, it is vital that candidates reach a defined level if they are to be safe to practise, so criterion-referenced methods are more suitable and are widely used. Examples of criterion-referenced standard setting methods include:
- Borderline regression
Note that Borderline regression is only possible in scenario-based exams, not in standard written exams. The other methods can be used in both types of exam.
Norm-referenced (Relative) standard setting methods
Here, the cut-off mark is based on performance relative to the rest of the cohort sitting the exam. These methods are useful when you require a certain percentage of candidates to pass, e.g. in entrance exams. However, they are not often used in high stakes exams: in theory a cohort could be uniformly weak, in which case a ‘pass’ would not prove that a candidate was safe to practise.
Compromise methods
These combine elements of both criterion and norm referencing, e.g. Cohen and Hofstee.
Cohen, for example, is calculated by taking the score of the candidate at the 95th percentile (the candidate whose score is higher than 95% of the rest of the candidates taking the same exam) and finding 60% of that score. It is a compromise method, as it has elements of both absolute methods (based on the performance of candidates in relation to a defined standard) and relative methods (where the number of passing candidates is relative to the rest of the candidates taking the exam).
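As a rough illustration, the Cohen calculation described above could be sketched in Python as follows. This is a minimal sketch and not how Maxexam itself is implemented: the function name `cohen_cut_off`, the sample scores and the use of a nearest-rank percentile are all assumptions made for the example, and other percentile definitions will give slightly different reference scores.

```python
import math


def cohen_cut_off(scores, percentile=95, fraction=0.60):
    """Return `fraction` of the score found at the given percentile.

    Uses the nearest-rank percentile (an assumption for this sketch):
    the smallest score with at least `percentile`% of scores at or below it.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # 1-based rank of the reference candidate in the sorted list.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    reference_score = ordered[rank - 1]
    return fraction * reference_score


# Hypothetical cohort of 20 candidates (marks out of 100).
scores = [42, 48, 51, 55, 58, 60, 63, 65, 67, 70,
          71, 73, 75, 76, 78, 80, 82, 85, 88, 92]
cut_off = cohen_cut_off(scores)
print(f"Cohen cut-off mark: {cut_off:.1f}")
```

With these made-up scores the reference candidate is the 19th of 20, who scored 88, so the cut-off comes out at roughly 52.8 marks.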
At Maxinity, we would suggest that compromise methods are generally less valid than absolute methods as a standalone way of standard setting for high stakes exams. In Cohen’s case, the only thing that separates it from an arbitrary cut-off mark (for example, one standard error of measurement below 50%) is that it bases the cut-off mark on the performance of one of the highest performing candidates.
Compromise methods such as Cohen do have the advantage that they are much quicker and cheaper to calculate than absolute methods (as they do not involve a panel of judges), and with studies indicating Cohen produces similar cut-off marks for the same exam over different years and cohorts of students, we would suggest they are suitable for formative in-house exams.
Many institutions use Cohen as a kind of second check for high stakes exams: it is quick and easy to run, so it can be calculated to see whether it is in line with your other standard setting methods.
How can Maxexam help support standard setting?
Without software like Maxexam, Angoff/Ebel can require a lot of paperwork and meetings to discuss standard setters’ judgements. Maxexam supports Angoff/Ebel judgements (only for those with the relevant permissions, of course), with full audit trails of individual judgements against different question/scenario versions. It can be configured so that users are barred from submitting their standard setting review until it is complete, which ensures the whole job is done and you’re not left with any gaps. Meetings may still be required if there is disagreement among setters, but these meetings can be focused on those specific questions.
Maxexam automatically highlights where judgements disagree. While it calculates the average of all judgements as a suggested standard, it also colour-codes that average according to the level of agreement. For instance, if there are 5 judges who all chose 70%, they are in complete agreement, so the average of 70% will be shown in bright green. On the other hand, if the judgements are 20%, 30%, 50%, 70% and 70%, there is clearly a big disagreement, so the average of 48% will be shown in a strong red to draw attention to this.
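To make the idea concrete, here is a minimal Python sketch of averaging judgements and flagging disagreement. It is illustrative only: the function name `summarise_judgements`, the use of the standard deviation as the measure of agreement, and the `amber_at` and `red_at` thresholds are all assumptions for this example, and Maxexam's own rule for choosing colours may well differ.

```python
from statistics import mean, pstdev


def summarise_judgements(judgements, amber_at=5.0, red_at=15.0):
    """Return (average, spread, colour) for one question's judgements.

    `judgements` are percentages from each standard setter. The colour
    thresholds are hypothetical values chosen for this sketch.
    """
    average = mean(judgements)
    spread = pstdev(judgements)  # population standard deviation
    if spread >= red_at:
        colour = "red"
    elif spread >= amber_at:
        colour = "amber"
    else:
        colour = "green"
    return average, spread, colour


# The two examples from the text above.
agreed = summarise_judgements([70, 70, 70, 70, 70])
split = summarise_judgements([20, 30, 50, 70, 70])
print(agreed)
print(split)
```

Under these assumed thresholds, the five identical judgements average 70 and are flagged green, while the second set averages 48 and is flagged red.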
Calculates cut-off mark automatically
Once the standards for each question are calculated, then a cut-off mark can be auto-generated to give the whole exam’s expected standard.
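One plausible way to combine per-question standards into an exam-level cut-off mark is sketched below in Python. It assumes each question's standard is expressed as a percentage of that question's available marks and that questions are weighted by their marks; the function name `exam_cut_off` and the sample figures are hypothetical, and this is not necessarily the calculation Maxexam performs.

```python
def exam_cut_off(question_standards):
    """Combine per-question standards into an exam cut-off.

    `question_standards` is a list of (standard_percent, max_marks) pairs.
    Returns (cut_off_marks, cut_off_percent).
    """
    total_marks = sum(max_marks for _, max_marks in question_standards)
    if total_marks <= 0:
        raise ValueError("the exam must carry at least one mark")
    # Expected marks for a just-passing candidate, summed over questions.
    cut_off_marks = sum(percent / 100 * max_marks
                        for percent, max_marks in question_standards)
    return cut_off_marks, 100 * cut_off_marks / total_marks


# Hypothetical exam: four questions, one of them worth two marks.
standards = [(70, 1), (48, 1), (60, 2), (80, 1)]
marks, percent = exam_cut_off(standards)
print(f"Cut-off: {marks:.2f} of 5 marks ({percent:.1f}%)")
```

Weighting by marks means a two-mark question counts twice as much towards the cut-off as a one-mark question, which seems a reasonable default but is a design choice rather than a rule.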
Easier, more accurate analysis
Borderline regression takes quite a lot of computation to work out manually (and not everybody knows how to do it). Maxexam can compute the borderline regression of a scenario in 3 simple, quick steps that require no mathematical calculations by the end user, making it readily accessible to everyone. It is also more accurate than doing it by hand, as it eliminates human error, and significantly quicker.
Combination with previous results
Maxexam also allows you to use previous exams and statistics if a scenario has been used in a historic exam, making your results much more reliable thanks to a larger sample of candidates. A Borderline regression figure is only considered reliable if at least roughly 200 students have been put through the calculation. These could come from one big exam, or from 10 exams of 20 candidates each in which the scenario has been run.
Compromise and norm-referenced methods
Maxexam can calculate the cut-off mark for you using common methods such as Cohen and the exam mean minus one standard error of measurement. It also incorporates the option to define your own equations and use them as the standard setting method.
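As an illustration of the second method, the Python sketch below computes the exam mean minus one standard error of measurement, using the textbook formula SEM = SD × √(1 − reliability). The function name, the sample scores and the reliability value are hypothetical, and the sketch assumes you already have a reliability coefficient (such as Cronbach's alpha) for the exam; it is not a statement of how Maxexam derives these figures.

```python
import math
from statistics import mean, pstdev


def mean_minus_one_sem(scores, reliability):
    """Return the cohort mean minus one standard error of measurement.

    `reliability` is a coefficient between 0 and 1, supplied by the
    caller (an assumption of this sketch).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    # Population standard deviation is used here; a sample SD is also common.
    sem = pstdev(scores) * math.sqrt(1 - reliability)
    return mean(scores) - sem


# Hypothetical cohort scores (percentages) and reliability coefficient.
cohort = [48, 55, 61, 64, 67, 70, 73, 78, 82]
print(f"Cut-off: {mean_minus_one_sem(cohort, reliability=0.80):.1f}%")
```

The more reliable the exam, the smaller the standard error of measurement and the closer this cut-off sits to the cohort mean.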
The method (or methods) of standard setting you choose will depend on whether the examination you are running is low or high stakes, as well as on internal preferences. Often, institutions will use more than one kind of standard setting, pre and post exam, to check whether there are any anomalies and to try to work out why, e.g. is there a problem with the exam itself, the quality of the teaching or the standard setters?
Whichever method you choose, Maxexam can help to make standard setting much quicker and simpler for all involved, due both to how quickly it can run the figures required and also the features incorporated which highlight potential issues with particular questions.
If you would like to find out more, please do give our team a call, drop us an email or fill in our contact form, and we will be back in touch.