Products and Services
Our innovative products and services for learners, authors and customers are based on world-class research and are relevant, exciting and inspiring.
We unlock the potential of millions of people worldwide. Our assessments, publications and research spread knowledge, spark enquiry and aid understanding around the world.
Our staff – the largest dedicated research team of any UK-based language assessment organisation – are our greatest asset in delivering our commitment to excellence. Our rigorous systems of quality are subject to independent checks and meet international standards, providing accountability and giving confidence to those who rely on our exams.
The Cambridge English Principles of Good Practice outline the systems and processes that drive our search for excellence and continuous improvement. While these systems involve complex research and technology, the underlying philosophy is simple.
We have published Principles of Good Practice to explain these systems and processes to test users.
Download Principles of Good Practice (PDF 798kb)
In Principles of Good Practice we state our commitment to providing users with data that will allow them to evaluate for themselves the reliability of our exams (appendix, Reliability section F). That data can be found below in the Reporting Reliability section.
The tools and analyses used to produce these figures are also listed, for readers who are unfamiliar with how test reliability is analysed and reported.
Reliability and validity are the two most important properties of a test. They form part of the Cambridge English VRIPQ approach as described in the Principles of Good Practice booklet. It is a general principle that in any testing situation one needs to maximise validity and reliability to produce the most useful results for test users, within existing practical constraints.
Cambridge English takes the view that reliability is an integral component of validity; there can be no validity without reliability. Hence any approach to estimating reliability must reflect potential sources of evidence for the construct validity of the tests.
Reliability (normally expressed as a figure between 0 and 1) indicates the replicability of test scores: when a test is given twice or more to the same group of people, when two tests constructed in the same manner are given to the same group of people, or when the same performance is marked independently by two different examiners, the expectation is that the candidates would receive nearly the same results on all occasions. If candidates’ results are consistent across occasions, the test is said to be reliable; the degree of score consistency is therefore a measure of the reliability of the test.
There are various ways to estimate the reliability of an exam. Most Cambridge English exams have two main types of component: objective papers and performance papers. Objective papers are those that do not require human judgement for their scoring, i.e. tests of reading comprehension, listening comprehension and use of English. The scores for these sub-components are calculated simply by adding up the number of correct responses in each section. The reliability estimates for these papers are calculated using a statistic called Cronbach’s Alpha: the closer Alpha is to 1, the more reliable the test.
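Cronbach's Alpha for an objective paper can be sketched as follows. The 0/1 item matrix is invented, and this minimal implementation uses population variances, a common convention:

```python
# A minimal sketch of Cronbach's Alpha for an objective paper: each row is
# a candidate, each column an item scored 0/1. The data are invented.
def cronbach_alpha(item_scores):
    n_items = len(item_scores[0])
    def var(xs):                       # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in item_scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [1, 1, 1, 1],   # strongest candidate: all items correct
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],   # weakest candidate: no items correct
]
alpha = cronbach_alpha(responses)  # ≈ 0.8 for this consistently ordered data
```

Alpha rises when items vary together across candidates (stronger candidates answer more items correctly), which is why it serves as an internal-consistency estimate of reliability.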
Writing performance is usually marked by one human rater; however, a selection of responses is also marked by a second or third rater. We use this sample of responses marked by more than one examiner to estimate reliability for writing, calculating a statistic called Gwet’s AC2, which is an estimate of inter-rater reliability.
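For two examiners rating the same scripts, Gwet's AC2 can be sketched as below. The sketch assumes the standard formulation with linear agreement weights (Gwet, 2008); the rating data and category scale are invented:

```python
# A hedged sketch of Gwet's AC2 with linear weights for two examiners
# rating the same scripts on an ordinal scale. Data are invented.
def gwet_ac2(rater1, rater2, categories):
    q = len(categories)
    n = len(rater1)
    idx = {c: i for i, c in enumerate(categories)}
    # Linear agreement weights: 1 on the diagonal, shrinking with distance
    w = [[1 - abs(i - j) / (q - 1) for j in range(q)] for i in range(q)]
    # Observed (weighted) agreement across the n double-marked scripts
    pa = sum(w[idx[a]][idx[b]] for a, b in zip(rater1, rater2)) / n
    # Average prevalence of each category over the two raters
    pi = [(rater1.count(c) + rater2.count(c)) / (2 * n) for c in categories]
    # Chance agreement as defined for the weighted Gwet statistic
    tw = sum(sum(row) for row in w)
    pe = (tw / (q * (q - 1))) * sum(p * (1 - p) for p in pi)
    return (pa - pe) / (1 - pe)

scale = [0, 1, 2, 3]                            # hypothetical band scale
perfect = gwet_ac2([0, 1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1], scale)
partial = gwet_ac2([0, 1, 2, 3, 2, 1], [0, 1, 2, 2, 3, 1], scale)
```

Complete agreement yields an AC2 of 1, and near-misses are penalised less than distant disagreements because of the linear weights, which suits ordinal marking scales.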
For speaking, the Feldt reliability test is applied. It can be used when the score of a test is the sum of scores given by two raters or judges. We use it to assess reliability for speaking because almost all Cambridge English speaking tests use a paired format, in which two Oral Examiners assess the performance of the candidates.
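The text does not give the formula, so the sketch below assumes the Angoff–Feldt coefficient, a classical estimate for a score formed as the sum of two raters' marks; the marks are invented:

```python
# A hedged sketch assuming the Angoff-Feldt coefficient for a total score
# that is the sum of two examiners' marks. Marks are invented.
def angoff_feldt(rater1, rater2):
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):                       # population variance
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    def cov(xs, ys):
        mx, my = mean(xs), mean(ys)
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    total = [a + b for a, b in zip(rater1, rater2)]
    vt = var(total)
    # Angoff-Feldt: 4*cov / (var(total) - (var1 - var2)^2 / var(total))
    return 4 * cov(rater1, rater2) / (vt - (var(rater1) - var(rater2)) ** 2 / vt)

same = angoff_feldt([3, 4, 5, 2, 4], [3, 4, 5, 2, 4])   # identical raters → 1
close = angoff_feldt([3, 4, 5, 2, 4], [4, 4, 5, 3, 4])  # high but below 1
```

When the two examiners award identical marks the coefficient is exactly 1; small disagreements pull it below 1, so it behaves like the other reliability estimates described here.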
What is common to all these methods is that they are reported on a scale from 0 to 1, very similar to the Alpha used for objective papers.
Scores from the sub-components of a qualification are reported on the Cambridge English Scale (CES). These sub-component CES scores are used to calculate a candidate’s overall score, also reported on the CES, and it is this which determines a candidate’s grade, and CEFR level where relevant. While it is worth having a measure of the reliability of each sub-component, what matters most to candidates and test users is the reliability of the overall score. We use the standard error of measurement (SEM) from the sub-components, as well as the standard deviation of the overall CES scores to calculate the reliability of the overall score.
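That calculation can be sketched in classical test theory terms, where reliability = 1 − SEM²/SD². The component SEMs, the overall SD and the equal weighting of components below are all illustrative assumptions, since the exact weighting is not given above:

```python
import math

# Sketch of reliability = 1 - SEM^2 / SD^2 for the overall score. The
# overall score is assumed here to be the simple average of component CES
# scores (the exact weighting is not given in the text), so the overall
# SEM is sqrt(sum of squared component SEMs) / number of components.
def overall_reliability(component_sems, sd_overall):
    k = len(component_sems)
    sem_overall = math.sqrt(sum(s ** 2 for s in component_sems)) / k
    return 1 - (sem_overall / sd_overall) ** 2, sem_overall

# Invented figures: four component SEMs in CES points and an assumed
# overall SD of 12 CES points.
rel, sem = overall_reliability([5.0, 5.0, 6.0, 3.0], 12.0)
```

Averaging shrinks the measurement error: even with component SEMs of 3–6 CES points, the overall SEM comes out under 3 points and the overall reliability above 0.9 in this sketch, mirroring the pattern in the figures reported below.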
SEM is not a separate approach to estimating reliability, but rather a different way of reporting it. Language testing is subject to the influence of many factors that are not relevant to the ability being measured. Such irrelevant factors contribute to what is called ‘measurement error’. The SEM is a transformation of reliability in terms of test scores. While reliability refers to a group of test takers, the SEM shows the impact of reliability on the likely score of an individual: it indicates how close a test taker’s score is likely to be to their ‘true score’, to within some stated probability. For example, where a candidate receives an overall CES score of 186 with an SEM of 2.5, there is a high probability that their true score is between 181 and 191. This is a very useful piece of information that test users can use in their decision making.
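The arithmetic in the worked example above can be checked directly; the band of ±2 SEM corresponds to roughly 95% probability under the usual assumption of approximately normal measurement error:

```python
# Checking the worked example: overall CES score 186 with SEM 2.5.
# A +/-2 SEM band (roughly 95%, assuming normal measurement error):
score, sem = 186, 2.5
lower, upper = score - 2 * sem, score + 2 * sem  # → (181.0, 191.0)
```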
Tables 1–10 below report typical reliability and SEM figures for Cambridge English exams for 2022.
Components: The reliability figures for objective and performance papers are based on raw scores. SEM figures are based on CES scores for Cambridge English Qualifications and raw scores for TKT and YLE.
We can see from the tables below that reliability is typically above 0.8 across all components. SEM is around 5–7 CES score points for objective components, and around 2–3 for performance components as well as for TKT and YLE.
Overall score: The overall reliability for these exams is typically above 0.90 and the SEM is around 2.3. These figures demonstrate that the overall CES scores reported can be trusted to a high degree.
Table 1: Cambridge English: A2 Key (KET)
Table 2: Cambridge English: A2 Key for Schools (KET for Schools)
Table 3: Cambridge English: B1 Preliminary (PET)
Table 4: Cambridge English: B1 Preliminary for Schools (PET for Schools)
Table 5: Cambridge English: B2 First (FCE)
Table 6: Cambridge English: B2 First for Schools (FCE for Schools)
Table 7: Cambridge English: C1 Advanced (CAE)
Table 8: Cambridge English: C2 Proficiency (CPE)
Table 9: Cambridge English: Young Learners
Table 10: TKT (Teaching Knowledge Test)