Cambridge English Language Assessment
The Council of Europe’s Common European Framework of Reference for Languages (CEFR) is a series of descriptions of abilities which can be applied to any language. These descriptors can be used to set clear targets for achievements within language learning, to help define language proficiency levels and to interpret language qualifications. It has become accepted as a way of benchmarking language ability, not only within Europe but worldwide, and plays a central role in language and education policy.
Cambridge English Language Assessment was involved in the development of the CEFR and we continue to work towards its future development through projects such as SurveyLang and English Profile.
All of our examinations are aligned with the levels described by the CEFR, which offers a valuable frame of reference for our work and our stakeholders. This is consistent with the original aspirations behind the Framework as described by Professor John Trim, one of the CEFR authors:
‘What we were aiming at was something which will be a common reference point that people working in different fields and people using it for entirely different things and in very different ways could refer to in order to feel that they were part of a common universe’ (Saville 2005:281).
The relationship between Cambridge English exams and the CEFR can be seen from a number of perspectives.
There is growing evidence to support the view that the Cambridge English exams embody or reflect the CEFR in a variety of ways. The benefits of the relationship between the CEFR and Cambridge English exams are perhaps best judged by the extent to which together they enable language learning/teaching to flourish and encourage achievements to be recognised, and so enrich the lives of individuals and communities.
Find more information on the CEFR on the Council of Europe website
Examples of Speaking test performance at CEFR Levels A2–C2
English Profile is a long-term programme of research that will extend the CEFR.
English Profile’s main aim is to deliver the CEFR for English. It is producing Reference Level Descriptors – practical descriptions of how learners can be expected to use English at each level of the CEFR. English Profile is set to play a vital role in the production of resources for the development of curricula, wordlists, course materials and teaching guides, delivering materials of practical use for learners, teachers and, indeed, any professionals involved in language learning.
Cambridge University is playing a leading role in English Profile, working together with a range of partner organisations.
For more information, visit the English Profile website.
Saville, N (2005) An interview with John Trim at 80, Language Assessment Quarterly 2 (4), 263–288.
Taylor, L and Jones, N (2006) Cambridge ESOL exams and the Common European Framework of Reference for Languages (CEFR), Research Notes 24 (1).
The origins of the Common European Framework of Reference for Languages (CEFR) date back to the early 1970s when the Council of Europe sponsored work within its Modern Languages Project to develop two levels (Waystage and Threshold) as sets of specified learning objectives for language teaching purposes.
Waystage and Threshold were both relatively low proficiency levels designed to reflect achievable and meaningful levels of functional language competence. They were also to form part of a European unit/credit system for adult language learning. In 1977 David Wilkins (author of 'The Functional Approach') first proposed the concept of a set of ‘Council of Europe levels’, which could provide an explicit pathway for language teaching and learning with opportunities to accredit achievement outcomes along the way.
Cambridge English exams, starting with Cambridge English: Proficiency (CPE) in 1913, followed by the LCE exam – now known as Cambridge English: First (FCE) – in 1939, and then Cambridge English: Preliminary (PET) in 1980, were all designed to offer learners and teachers useful curriculum and examination levels. In the late 1980s Cambridge English Language Assessment was one of several stakeholder organisations to provide funding and professional support for revising Threshold and Waystage (Van Ek and Trim 1998a, 1998b). These revised level descriptions underpinned test specifications for a revised Cambridge English: Preliminary in the mid-1990s and a new Cambridge English: Key (KET) in the early 1990s. In 1991 the new Cambridge English: Advanced (CAE) bridged the gap between Cambridge English: First and Cambridge English: Proficiency. By the early 1990s, therefore, the range of Cambridge English exams was providing well-established and recognised accreditation ‘stepping stones’ along the language teaching/learning pathway. The concept of a ‘framework’ of reference levels for English language learning, teaching and assessment was beginning to take on a more concrete form, and the scene was set for the Council of Europe’s Common European Framework Project.
Van Ek, J A and Trim, J L M (1998a) Threshold 1990, Cambridge: Cambridge University Press.
— (1998b) Waystage 1990, Cambridge: Cambridge University Press.
The Council of Europe’s Common European Framework Project ran between 1993 and 1996, with considerable input from the Eurocentres organisation. The project’s overarching aim was to construct a common framework in the European context which would be comprehensive, transparent and coherent, and would assist a variety of users in defining language learning, teaching and assessment objectives.
The emerging descriptive framework formalised conceptual levels that the English language teaching world (i.e. learners, teachers and publishers) had been working with for many years, using familiar labels such as ‘preliminary’, ‘intermediate’ or ‘advanced’. While it aimed to build upon this shared understanding among teachers and other ELT stakeholders, it also sought to resolve some difficulties of relating language courses and assessments to one another. The goal was a common meta-language to talk about learning objectives and language levels and to encourage practitioners to reflect on and share their practice.
At the same time, member organisations of the new Association of Language Testers in Europe (ALTE) were also working systematically to co-locate their respective qualifications across different European languages and proficiency levels within a shared framework of reference. Their aim was to develop a framework establishing common proficiency levels to promote the transnational recognition of language certification in Europe. The process involved analysing test content, creating quality guidelines for exam production, and developing empirically validated performance indicators (Can Do statements) in different European languages.
During the mid-1990s the five-level ALTE Framework developed alongside the six-level CEFR, which was published in draft form in 1997. Both frameworks shared a common conceptual origin and similar aims – transparency and coherence – as well as comparable scales of empirically developed descriptors.
Cambridge English Language Assessment and its ALTE partners decided to conduct several studies to verify the alignment of the two frameworks. Description of a third, higher proficiency functional level began in the 1990s, with support and participation from ALTE; work on this level took account of Cambridge English: First (FCE) and led to the publication of Vantage in 1999 (Van Ek and Trim 2001). Following publication of the CEFR in 2001, the ALTE members adopted the six CEFR levels (A1–C2).
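The adoption of the six CEFR levels can be summarised in a short lookup table. The sketch below is illustrative only: the mapping shown is the commonly cited ALTE–CEFR alignment (the original five ALTE levels plus the later Breakthrough level), and the code and names are ours, not an ALTE artefact.

```python
# Illustrative sketch of the commonly cited correspondence between the
# ALTE Framework levels and the six CEFR levels (A1-C2).
ALTE_TO_CEFR = {
    "Breakthrough": "A1",
    "ALTE Level 1": "A2",
    "ALTE Level 2": "B1",
    "ALTE Level 3": "B2",
    "ALTE Level 4": "C1",
    "ALTE Level 5": "C2",
}

def cefr_level(alte_level: str) -> str:
    """Return the CEFR level conventionally aligned with an ALTE level."""
    return ALTE_TO_CEFR[alte_level]

print(cefr_level("ALTE Level 3"))  # B2
```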
One of the strengths of this conceptual approach to framework development has undoubtedly been its ‘organic’ nature and its ability to benefit from synergy with similar framework development projects.
Van Ek, J A and Trim, J L M (2001) Vantage, Cambridge: Cambridge University Press.
Claims of linkage or alignment to the CEFR need to be examined carefully. Simply asserting that a test is aligned with a particular CEFR level does not necessarily make it so, even if the assertion is based on intuitive or reasonable subjective judgement.
Measurement theory became increasingly important as attempts were made to validate aspects of the CEFR empirically (North and Schneider 1998, North 2000a) and to link assessments to it (North 2000b). To some extent, alignment can be achieved historically and conceptually, but empirical alignment requires more rigorous analytical approaches. Appropriate evidence needs to be accumulated and scrutinised.
The ALTE Can Do Project in 1998–2000 (Jones 2000, 2001, 2002) was one empirical approach used by Cambridge English Language Assessment for aligning its original five levels with the six-level CEFR. Other empirical support for alignment comes from the item-banking methodology underpinning the Cambridge English Language Assessment approach to all test development and validation (Weir and Milanovic 2003).
Latent trait methods have been used since the early 1990s to link the various Cambridge English exam levels onto a common measurement scale using a range of quantitative approaches, e.g. IRT Rasch-based methodology.
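As an illustration of the latent trait approach, the dichotomous Rasch model expresses the probability of a correct response as a logistic function of the difference between a person’s ability and an item’s difficulty, which is what allows items from different exams to be calibrated onto one common scale. A minimal sketch (the variable names are ours; this is not Cambridge English code):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model:
    P(correct) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is person ability and b is item difficulty (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A candidate whose ability equals the item's difficulty has a 50% chance
# of success; one logit of extra ability raises that to about 73%.
print(rasch_probability(0.0, 0.0))            # 0.5
print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
```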
More recently, Cambridge English Language Assessment supported the authoring and piloting of the Council of Europe’s Manual for Relating Language Examinations to the CEFR with its linking process based on three sets of procedures: specification; standardisation; empirical validation.
Specification procedures were used when Cambridge English: Preliminary (PET) and Cambridge English: Key (KET) were originally based upon Threshold and Waystage levels, and when the ALTE partners’ exams were aligned within the ALTE Framework. Extensive documentation for all the Cambridge English exams specifies the content and purpose of existing/new exams with direct reference to the CEFR. This documentation includes test specifications, item writer guidelines, examiner training materials, test handbooks and examination reports. In fact, the Manual alignment procedures are embedded within the test development and validation cycle of Cambridge English.
Cambridge English Language Assessment helped develop the standardised materials needed to benchmark tests against CEFR levels. These include calibrated test items and tasks of General English Reading and Listening together with exemplar Writing test performances from writing examiner co-ordination packs and Speaking test performances from Oral Examiner standardisation materials at each CEFR level. The benchmarking materials, incorporating both classroom-based and test-based materials, are available from the Council of Europe on CD or DVD.
Empirical validation studies are a greater challenge, sometimes requiring specialist expertise and resources. Cambridge English Language Assessment is among a relatively small number of examination providers undertaking this sort of research (e.g. Galaczi, ffrench, Hubbard and Green 2011; Lim, Geranpayeh, Khalifa and Buckendahl 2013), partly through our routine item-banking and test calibration methodology and also through instrumental research and case studies, such as the Common Scale for Writing Project (e.g. Hawkey and Barker 2004; Khalifa, ffrench and Salamoura 2010).
Galaczi, E D, ffrench, A, Hubbard, C and Green, A (2011) Developing assessment scales for large-scale speaking tests: a multiple method approach, Assessment in Education: Principles, Policy & Practice 18 (3), 217–237.
Hawkey, R and Barker, F (2004) Developing a Common Scale for the Assessment of Writing, Assessing Writing 9 (2), 122–159.
Jones, N (2000) Background to the validation of the ALTE ‘Can-do’ project and the revised Common European Framework, Research Notes 2, 11–13.
Jones, N (2001) The ALTE Can Do Project and the role of measurement in constructing a proficiency framework, Research Notes 5, 5–8.
Jones, N (2002) Relating the ALTE Framework to the Common European Framework of Reference, in Council of Europe, Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Case Studies, Strasbourg: Council of Europe Publishing, 167–183.
Khalifa, H, ffrench, A and Salamoura, A (2010) Maintaining Alignment to the CEFR: the FCE case study, in Martyniuk, W (Ed), Aligning Tests with the CEFR, Cambridge: UCLES/Cambridge University Press, 80–101.
Lim, G S, Geranpayeh, A, Khalifa, H and Buckendahl, C (2013) Standard Setting to an International Reference Framework: Implications for Theory and Practice, International Journal of Testing 13 (1), 32–49.
North, B (2000a) The development of a common framework scale of language proficiency, New York: Peter Lang.
North, B (2000b) Linking Language Assessments: an example in a low-stakes context, System 28, 555–577.
North, B and Schneider, G (1998) Scaling Descriptors for Language Proficiency Scales, Language Testing 15 (2), 217–262.
Weir, C J and Milanovic, M (Eds) (2003) Continuity and Innovation: The History of the CPE 1913–2002, Studies in Language Testing 15, Cambridge: Cambridge University Press/UCLES.
Cambridge English has a suite of level-based certificate exams which target particular levels of the CEFR, and candidates are encouraged to take the exam most suitable to their needs and level of ability.
However, while each level-based exam focuses on a particular CEFR level, each exam contains test material at the levels adjacent to the one targeted (e.g. for Cambridge English: First, which is aimed at B2, there are test items that cover B1 and C1). This allows inferences to be drawn about candidates’ abilities if they are a level below or above the one targeted. This being the case, and in keeping with the positive, achievement-oriented nature of the CEFR, candidates who are at a lower or higher level than the exam they sat are recognised for demonstrating such ability and are issued certificates indicating that level. Candidates whose abilities are more than one CEFR level below the target are not awarded a certificate.
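The certification rule just described can be sketched as a small function. This is a hypothetical illustration of the published rule only, not an official Cambridge English algorithm; the function and constant names are ours.

```python
from typing import Optional

# Hypothetical illustration of the certification rule described above;
# names and structure are ours, not an official Cambridge English algorithm.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def certificate_awarded(target: str, demonstrated: str) -> Optional[str]:
    """Return the CEFR level stated on the certificate, or None when the
    candidate's ability is more than one level below the exam's target."""
    gap = CEFR_ORDER.index(demonstrated) - CEFR_ORDER.index(target)
    if gap < -1:
        return None          # more than one level below the target: no certificate
    return demonstrated      # certificate indicates the level actually demonstrated

print(certificate_awarded("B2", "B2"))  # B2   (at target)
print(certificate_awarded("B2", "C1"))  # C1   (one level above, certified at C1)
print(certificate_awarded("B2", "A2"))  # None (two levels below target)
```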
The basis of this Cambridge English practice lies in careful test construction, to ensure coverage of content and inclusion of items at the appropriate levels of difficulty. These are described in detail in the Studies in Language Testing volumes dealing with each of the four skills (Geranpayeh and Taylor 2013; Khalifa and Weir 2009; Shaw and Weir 2007; Taylor 2011). In the case of Reading, Listening and Use of English, this is underpinned by a Rasch-based item banking approach whereby the exams are calibrated onto a common measurement scale. In the case of Speaking and Writing, the inferences are also supported by analytic assessment scales, the descriptors of which were developed explicitly taking the CEFR into account (Galaczi, ffrench, Hubbard and Green 2011; Lim 2012). The scales for each level include the descriptors for the levels adjacent to it, facilitating inferences about candidates’ abilities vis-à-vis the CEFR.
Careful triangulation confirms the appropriacy of the certification decisions made. This is done, for example, through analyses of the performance of a large number of candidates who regularly take more than one Cambridge English exam in quick succession, allowing Cambridge English to confirm how performance on one examination relates to performance on another examination.
For more on the work of Cambridge English with regard to the CEFR, see also Research Notes Volume 37.
Geranpayeh, A and Taylor, L (Eds) (2013) Examining listening: Research and practice in assessing second language listening, Cambridge: Cambridge ESOL/Cambridge University Press.
Galaczi, E D, ffrench, A, Hubbard, C and Green, A (2011) Developing assessment scales for large-scale speaking tests: a multiple method approach, Assessment in Education: Principles, Policy & Practice 18 (3), 217–237.
Khalifa, H and Weir, C J (2009) Examining reading: Research and practice in assessing second language reading, Cambridge: Cambridge ESOL/Cambridge University Press.
Lim, G S (2012) Developing and validating a mark scheme for writing, Research Notes 49, 6–10.
Shaw, S D and Weir, C J (2007) Examining writing: Research and practice in assessing second language writing, Cambridge: Cambridge ESOL/Cambridge University Press.
Taylor, L (Ed) (2011) Examining speaking: Research and practice in assessing second language speaking, Cambridge: Cambridge ESOL/Cambridge University Press.
While the Common European Framework of Reference for Languages (CEFR) describes speaking performance at each of its six levels, it is difficult to know precisely what these levels mean until they can be seen in the context of live interaction.
Each of the five clips below features the performance of students taking the Speaking test section of Cambridge English exams at CEFR Levels A2 to C2.
Each video clip has been carefully selected to represent a typical performance (of at least one student) at that level.
Please watch each video in conjunction with the commentary provided, which describes the performance and the test scores and explains how the elements of that performance make it typical of that particular CEFR level.
The persons shown on these recordings have given their consent to the use of these recordings for research and training purposes only. Permission is given for the use of this material for examiner and teacher training in non-commercial contexts. No part of this recording may be reproduced, stored, transmitted or sold without prior written permission. Written permission must also be sought for the use of this material in fee-paying training programmes. While you may link to this page in your website, blog or other online material, to protect the consent of those students who contributed, please do not link directly to the video clips.
A2 This video clip contains the speaking performances of Mansour (CEFR Level A2) and Arvids (CEFR Level A2) taking Cambridge English: Key.
B1 This video clip contains the recorded speaking performances of Veronica (CEFR Level B1) and Melisa (CEFR Level B1) taking Cambridge English: Preliminary.
B2 This video clip contains the recorded speaking performances of Gabriela (CEFR Level B2) and Rino (CEFR Level B2) taking Cambridge English: First.
C1 This video clip contains the recorded speaking performances of Christian (CEFR Level C1) and Laurent (CEFR Level C1) taking Cambridge English: Advanced. Please note that this test is from before the revision of 2008.
C2 This video clip contains the recorded speaking performances of Ben (CEFR Level C1/C1+) and Aliser (CEFR Level C2) taking Cambridge English: Proficiency.
The above examples were drawn from materials for use by Cambridge English Language Assessment in standardisation training for speaking examiners. We have published these extracts as an aid to understanding the CEFR levels for researchers and those with an interest in language learning and teaching. The link below is to a paper outlining how these clips were selected and the processes of data analysis, verification and calibration that enabled us to identify them as typical of their stated levels.
Project overview: Examples of Speaking Performances at CEFR Levels A2 to C2 (PDF 170Kb)
A test can only be considered good if it is also fair. This may seem self-evident, but defining what fairness is and how it should operate can be complex, requiring difficult judgements.
Fairness has been integral to Cambridge English Language Assessment since we offered our first exam in 1913. Outlined below is the approach to fairness we have developed over 100 years, combining our experience with the latest research and technology.
We are a not-for-profit department of the University of Cambridge: our commitment to fairness in educational testing and assessment is part of our larger concern for education and ethical behaviour within society. Our approach to test fairness, therefore, looks at the entire experience and consequences of testing for individuals, groups, and society as a whole.
We believe a fair test is one in which the ability being tested (in the assessment field this is called the ‘test construct’) is the primary focus and where all irrelevant barriers to candidate performance have been removed.
Tests are often used to make important decisions and can have serious consequences for an individual’s career or life chances. They can affect what happens in teaching and learning at classroom or school level and can also influence regional or national educational systems. They can affect employers’ selection and recruitment of staff and impact on civic life in areas such as immigration or access to university education. In light of this, test producers must be sensitive to the consequential issues surrounding their tests, monitoring them against accepted professional standards and ethical codes.
Fairness touches upon all aspects of a test:
Construction refers to how a test is conceived and designed. For example, the fairness of the test can be affected by who designs the test specifications and what level of expertise they bring to that task; or through the choice of the language domain (e.g. General or Business English, low or high-level proficiency) and the tasks or items that are chosen to represent it.
The use of technology is also becoming an important fairness issue in test construction: great care has to be taken to ensure that the technology fits the test purpose and the candidates. The view of Cambridge English Language Assessment is that different test formats (i.e. paper-based, face-to-face, computer-based) are not inherently superior, but have to be considered in the context of their fitness for a specific test purpose. Because of this, we believe that fairness to both candidates and the organisations that rely on computer-based exams is largely about matching the right test to the purpose.
Impact (the effect a test has on teaching and the wider community) is another important aspect of test fairness that has implications for test construction. It is inevitable that teachers will adapt their course content to prepare students to pass the test they are studying towards. If the test construct is too narrow, then students can pass their test, but not have sufficient knowledge and ability for real communication – which is unfair to the students and the organisations that might rely on their exam certificates. The test construct, therefore, must be sufficiently broad and carefully specified against known data of how language is used to communicate in the real world (such as written and spoken corpora – data banks of real language usage). Rigorous procedures must then be used to ensure that the developed test behaves in the way it has been specified to: this is called test validation.
Administration refers to how the test is delivered and conducted. To ensure fairness of the test certain conditions must always be in place, such as standardised conditions for taking the test and secure test despatch.
Evaluation refers to how the candidate’s performance is marked. A consistent approach to marking must be in place in order for a test to be fair. This is especially important when human raters are involved in assessing test performance. For this reason we conduct rigorous examiner training and have clear marking and grading procedures in place for all of our examinations.
As well as constantly pursuing fairness in its own examinations, Cambridge English Language Assessment has also contributed to the development of testing standards and professional practice throughout the world as a founder member of the Association of Language Testers in Europe (ALTE). The ALTE Code of Practice (1994) provides a set of general principles that offer an explicit framework for reviewing the fairness of language tests. The Code generates a set of ALTE Minimum Standards (2001), or guidelines for good practice, relating to the specific areas of:
The ALTE Code has informed other professional language testing standards such as the ILTA Code of Ethics (2001) and the EALTA Guidelines for Good Practice (2006).
Cambridge English Language Assessment systems contribute to fairness in each of these key areas of exam development and delivery:
This process needs to be evaluated and revised over time to ensure that it continues to be fair. By using the latest research and advanced statistical software we can carry out analyses and offer assurances about quality that would have been impractical in the past. The Cambridge English programme of continual evaluation and revision ensures all of our examinations meet the high standards of fairness, accuracy and reliability that we demand and on which the millions of people who take our tests each year rely. Read more in Saville, N (2003), in Weir, C J and Milanovic, M (Eds) Continuity and Innovation: The History of the CPE 1913–2002, Studies in Language Testing 15.
The Cambridge English Profile Corpus (CEPC) is a corpus of learner English produced by students worldwide, and is being built by Cambridge University Press and Cambridge English Language Assessment, in collaboration with a network of participating educational establishments across the world. These establishments include schools, universities, and private language schools, along with research centres, government bodies (such as ministries of education) and individual education professionals.
The CEPC aims to provide 10 million words of data, covering both spoken (20%) and written (80%) language. Both General English (60%) and English for Specific Purposes (40%) are included.
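Under the stated proportions, the target word counts work out as follows (a simple arithmetic sketch of the figures above; the variable names are ours):

```python
# Word-count targets implied by the CEPC composition figures quoted above.
# Integer arithmetic keeps the counts exact.
TOTAL_WORDS = 10_000_000

targets = {
    "spoken": TOTAL_WORDS * 20 // 100,                         # 2,000,000 words
    "written": TOTAL_WORDS * 80 // 100,                        # 8,000,000 words
    "General English": TOTAL_WORDS * 60 // 100,                # 6,000,000 words
    "English for Specific Purposes": TOTAL_WORDS * 40 // 100,  # 4,000,000 words
}

for name, words in targets.items():
    print(f"{name}: {words:,} words")
```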
Go to the English Profile website