Objective Structured Clinical Examination Test: Comparison with Traditional Clinical Examinations
Ojuka Daniel, Nyaim Elly, Kiptoon Dan, Ndaguatha Peter
Department of Surgery, University of Nairobi
Correspondence to: Dr. Ojuka Daniel, P.O Box 30197-00100, Nairobi. Email:
Background: Examination methods change over time, and audits are useful for quality assurance and improvement. Objective: To compare traditional clinical examinations with objective structured clinical examinations (OSCE) at the Department of Surgery. Methodology: Examination records of the fifth-year MBChB results for 2012–2013 (traditional) and 2014–2015 (OSCE) were analysed. The pass rate for the clinical examinations in each year was calculated, and the scores were subjected to a t-test to determine whether there were significant differences between the years and between the types of clinical examination. A p-value of <0.05 was taken to indicate statistical significance. Results: We analysed 1,178 results; the OSCE group had a slightly higher number of students (55.6%). The average clinical examination scores were 59.7% for the traditional examination vs 60.1% for the OSCE, with a significant difference in means between the two (p=0.001). The stations that tested physical performance, such as physical examination and basic surgical skills, were positively skewed. Conclusion: In the same setting of teaching and examiners, the OSCE may award more marks than the traditional clinical examination, and it is better at detecting areas of inadequacy to be emphasised in teaching.
Key words: Clinical examination, traditional, OSCE, comparison
Ann Afr Surg. 2020; 17(2):***
Conflicts of Interest: None
© 2020 Author. This work is licensed under the Creative Commons Attribution 4.0 International License.
The assessment of clinical competence is one of the major tasks that any medical teacher faces at the end of a term, and it becomes more daunting at the end of the year. The teacher is faced with 1) a decision that determines, in a short space of time, whether the candidate passes or fails, and 2) protecting the safety of the community in which the candidate will be released to practise medicine. When incorporated within the course, this assessment provides relevant feedback for students and teachers. It informs students about what is important and how they learn (1). It informs teachers on areas to improve in order to produce clinically competent doctors who will perform well in their internship years (2).

The traditional examination tests a different competence; the student is given time to take a patient's history, perform a physical examination, and form an impression of the patient's pathology and management (3). The candidate is then examined using an oral, unstructured examination that can test the breadth and depth of the subject. The weakness of this oral examination has been its reliability: good performance on one case does not predict good performance on another. The usual practice in this examination has been to give students one long case. Students may also be subjected to several short cases that do not allow them to obtain a detailed history or perform an extensive clinical examination. One long case was chosen because of logistics. The implicit reason for choosing one case was, perhaps rather naïvely, the assumption that experienced doctors had the skills to immediately identify good or weak students from a single patient interaction, and that this was predictive of any patient interaction (4). It is not surprising, therefore, that once the importance of context specificity was realised, both undergraduate and postgraduate clinical assessments moved to the multi-station format of the objective structured clinical examination (OSCE) (4).

OSCEs were first introduced by a Scottish doctor, Ronald Harden, in the 1970s. The format has since undergone numerous variations depending on resources and context. The OSCE is now considered the gold standard for medical examination because of its validity and reliability (5). We introduced the OSCE in the Department of Surgery in 2014 for the undergraduate summative surgical assessment of final-year MBChB candidates. In 2015, we decided to analyse the results and compare them with results from the two years of traditional clinical examinations preceding the introduction of the OSCE.
This study was a retrospective analysis of the examination records of candidates from 2012–2015. The Department of Surgery of the University of Nairobi has been examining undergraduates since its inception in 1967. Students complete a junior clerkship and a senior clerkship. In the five-year programme, the junior clerkship is 8 weeks in the third year, with an end-of-rotation assessment consisting mainly of multiple-choice questions (MCQs). The senior clerkship in the fifth year is 6 weeks, culminating in MCQs and a long-case clinical examination. The final assessment in surgery at the end of the five-year course comprises a written component with an essay and MCQs. A progressive assessment comprises marks obtained during both the junior and senior clerkships while rotating through the various departments (ophthalmology, otorhinolaryngology, radiology, anaesthesiology), taking into account attendance, the log book, and the clinical assessment during the rotation. The final total score is 700 marks, with the essay contributing 50, the MCQs 150, the progressive assessment 200, and the clinical examination 300. A student must pass the clinical examination component to obtain an overall pass. The clinical examination was the traditional examination until 2014.
The traditional clinical examination comprised one long case, in which the student interacted with a patient for 45 minutes to obtain a history, perform a physical examination, formulate a diagnosis and differentials, and make notes on how they would manage the patient. The examination itself took 15 minutes, during which the candidate was given time to present the history and physical findings. Discussion then took place without a structured way of awarding marks. This was followed by four to five short cases, in which the candidate was shown a patient with signs, or at times radiological images, for quick diagnosis and discussion. The number of short cases depended on the student's performance; a weaker student would be given more chances to prove themselves. The candidate would then be taken through an oral examination in which they were shown equipment and other anaesthesiology materials, with discussion ensuing after the candidate identified, or failed to recognise, each item. The student also went through otorhinolaryngology and ophthalmology clinical stations, which were patterned on the short cases. It should be noted that in this setting the long case could be either orthopaedics or general surgery. The short cases could be general surgery, orthopaedics, or even specialty cases such as burns, paediatric surgery, cardiothoracic surgery, or neurosurgery.
In 2014, when the OSCE was introduced, the department formed an OSCE committee, which agreed on ten active stations and two rest stations. The ten comprised four general surgery stations (history taking, physical examination, management, and communication skills) and three orthopaedic stations (history taking, physical examination, and management). The other departments (anaesthesiology, otorhinolaryngology, and ophthalmology) had one station each. The examination took seven days. In 2015, general surgery added one station (basic surgical skills), making five, and replaced the communication station with interpretation of results. Because of the large number of students, the number of rest stations varied between 5 and 7 on each of the seven days. Each station took about 10 minutes, with one minute for transfer to the next station. The examination was conducted in three ward settings, with each station asking the same question in each ward in both years, except in 2014, when four wards were used. The history-taking station used standardized patients, with a different case each day and an emphasis on technique; the other stations were run similarly. The questions were moderated by the OSCE committee, and marks were awarded according to a checklist.
Our examination takes about three weeks. The essay paper has six questions; two are compulsory, and the candidate must choose two of the remaining four. The two compulsory questions are from general surgery and orthopaedics, and the other four are spread among the subspecialties of paediatric surgery, cardiothoracic surgery, plastic surgery, and neurosurgery. The MCQ paper has 100 questions of the best-answer type.
We studied the records of the fifth-year MBChB examination results for 2012, 2013, 2014, and 2015. Candidates with incomplete results were excluded. Marks scored in the clinical examination, progressive assessment, and written paper were tabulated for each of the four years under study. Using 50% as the pre-agreed pass mark, the pass rate (percentage pass) for the clinical examinations in each year was calculated. We calculated and compared the means for each type of examination, each year, and each test. The mean scores (percent) for each type of examination were subjected to a two-factor analysis of variance, the two factors being the type of examination and the year of examination.
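For illustration, this analysis could be reproduced along the following lines. This is a minimal sketch only: the file name and column names (score, exam_type, year) are hypothetical and not taken from our records.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("exam_results.csv")  # hypothetical file and column names

# Pass rate per year against the pre-agreed 50% pass mark
pass_rate = df.groupby("year")["score"].apply(lambda s: (s >= 50).mean() * 100)
print(pass_rate)

# Two-factor analysis of variance: examination type and examination year
model = ols("score ~ C(exam_type) + C(year)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```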
There were 1,178 students who completed their examinations in the four years under analysis, with a progressive increase in the number of candidates from 2012 to 2015: 222, 301, 315, and 340, respectively. The mean scores for the various examination components also increased from 2012 to 2015, as shown in Table 1.
Analysis of variance was performed for the examinations; Table 2 shows the results of that analysis.
The clinical examination reliability (Cronbach's α) from 2012 to 2015 was 0.58, 0.77, 0.79, and 0.72, respectively. An independent-samples t-test was conducted to compare the mean traditional clinical examination score with the mean OSCE clinical examination score. There was a significant difference between the traditional (mean=178.96, standard deviation=24.47, variance=598.82) and OSCE (mean=186.42, standard deviation=29.18, variance=851.33) clinical examinations; t(1176)=4.68, p<0.001. The pass rates for the clinical examinations from 2012 to 2015 were 95%, 94%, 92.1%, and 97.4%, respectively. Overall, the pass rate for the traditional examination over its two years was 94.5%, while that of the OSCE over its two years was 94.8%.
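The two statistics above can be computed as follows; a minimal sketch in which the score files and the candidates-by-stations matrix are hypothetical inputs, not our data.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    # items: candidates x stations matrix of station scores
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()  # sum of per-station variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of candidates' totals
    return k / (k - 1) * (1 - item_var / total_var)

traditional = np.loadtxt("traditional_scores.txt")  # hypothetical files
osce = np.loadtxt("osce_scores.txt")

# Pooled-variance independent-samples t-test, matching the reported
# degrees of freedom (n1 + n2 - 2 = 1176)
t, p = stats.ttest_ind(traditional, osce, equal_var=True)
print(f"t({len(traditional) + len(osce) - 2}) = {t:.2f}, p = {p:.3g}")
```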
Analysis of the examination in 2015
Of the six essay questions, two were recall and four were comprehension questions. The majority of students (224/340) picked the sixth question because it was a recall question. The Pearson correlations between all the types of examination were weak (Table 3).
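The correlation matrix behind Table 3 could be computed as below; a sketch assuming a hypothetical table with one column of marks per examination component.

```python
import pandas as pd

# hypothetical file and column names, one column per examination component
marks = pd.read_csv("marks_2015.csv")
print(marks[["essay", "mcq", "progressive", "clinical"]].corr(method="pearson"))
```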
Only 13% of the MCQs were problem-based questions; 87% were of the recall type. The point-biserial correlation ranged from -0.13 to 0.36, with a Cronbach's α of 0.53.
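An item analysis of this kind can be sketched as follows, using a corrected point-biserial (each item correlated with the total score excluding that item). The 0/1 response matrix is a hypothetical input.

```python
import numpy as np
from scipy import stats

responses = np.loadtxt("mcq_responses.txt")  # candidates x items, 1 = correct
totals = responses.sum(axis=1)

for item in range(responses.shape[1]):
    rest = totals - responses[:, item]  # exclude the item from the criterion
    r_pb, _ = stats.pointbiserialr(responses[:, item], rest)
    print(f"item {item + 1}: r_pb = {r_pb:.2f}")
```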
In OSCE stations where students were required to demonstrate skills, such as physical examination and basic surgical skills, the score distributions were positively skewed compared with those of stations testing verbal skills, such as the history-taking or management stations (Figs. 1 and 2).
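A quick per-station skewness check illustrates this; a positive value indicates scores bunched at the low end with a tail of higher scorers. The mapping of station names to score arrays is hypothetical.

```python
import numpy as np
from scipy.stats import skew

# hypothetical mapping: station name -> array of candidate scores
station_scores = {
    "physical examination": np.loadtxt("physical_exam_scores.txt"),
    "history taking": np.loadtxt("history_taking_scores.txt"),
}
for name, scores in station_scores.items():
    print(f"{name}: skewness = {skew(scores):.2f}")
```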
The wide standard deviations may point to interobserver variability, given that the stations were run at different sites, testing the same questions but with different markers (Table 4).
The movement towards objective means of assessment in medical education has seen the replacement of the traditional long case, short cases, and orals with OSCEs in most medical schools the world over (6). The replacement has been occasioned by the poor reliability of the long case and orals. Wass and van der Vleuten, using the generalisability mathematical model, predicted that a student would need 10 long cases examined by two people to achieve a reliability of 0.80 (4). The reliability of orals varies between 0.50 and 0.80 (7), while that of ward ratings is 0.25–0.37 (8). Our study reveals an average reliability of 0.68 (0.58, 0.77) for the traditional clinical examination and 0.76 (0.79, 0.72) for the OSCE clinical examinations. This is within the range of 0.46–0.88 quoted for the reliability of the OSCE in the literature (9,10).
The mean score and pass rate for the OSCE would suggest that either the OSCE was easier or the candidates in our study were of better quality. This differs from the literature, which suggests that the OSCE tends to downgrade scores (11,12). Considering the context, however, the reason could lie with the examiners, who might still have been applying the long-case approach. In that test there was a great deal of prompting, whereas in the OSCE the examiner ought to be an observer of the performance: instead of just knowledge or "know how", it is about "show how". The OSCE examiner, when using a checklist, is a "recorder" of behaviour rather than an "interpreter" of behaviour (13,14). Transitioning from global rating to a checklist without clear training on how to use the checklist may inflate scores, for instance when marks are awarded even after the candidate has been prompted.
Another reason the OSCE scores may be higher in this study is that the OSCE, through its multi-station effects, has been shown to even out examiner stringency, whereas the traditional clinical examination offers little chance of evening out stringency in the award of marks (15). A further factor could be that standardized patients who are not properly trained may themselves help the candidate by giving cues (16).
If one considers the examination taken by candidates in 2015 a prototype, a number of issues arise that need attention for the quality of assessment to improve. The written essay questions' ability to test all that students learn could be strengthened by changing from the traditional essay to modified essay questions using clinical vignettes, which would test higher levels of Miller's pyramid (17). Though constructed-response questions or their modifications can test higher-order thinking, they have been found to take time to construct and to answer, and their inter-rater reliability in marking tends to be low; hence some reviewers think they should not be used in any high-stakes examination (18,19). Our essays did not cover all the subjects, and some were context-free. They would not assess higher-order functions as they were meant to, hence the need for change.
The point-biserial correlation for most of the MCQs was low, and there were very few problem-solving questions. These need to be improved through faculty training on how to set quality multiple-choice questions. Well-constructed selected-response questions with clinical vignettes have been shown to evaluate higher-order thinking in the modified Miller's pyramid (17), but our questions may need modification in that respect.
The correlation between the MCQs and the clinical examinations was very low. This could demonstrate that these examinations test different domains of knowledge or concepts (11,12). However, other studies have found that they correlate very well (17). Low correlation may also point to poorly constructed questions, as indicated by other indices such as the point-biserial correlation (17,20).
Assessment is one of the determinants of students' learning style (1). When students realise that what is required is "know-how" and not "show-how", as in the traditional long case, they may resort to learning styles that are considered superficial as opposed to deep (17). This is because it is a high-stakes examination in which people fear failure and its consequences. In the traditional clinical examination, where the history taking and physical examination are not observed, it is easy for candidates to learn the "know" and "know-how" and fail on the "show-how" side. That is seen in our OSCE results, where the scores of the "show-how" stations are positively skewed, bunching at the lower end. If a deep learning style among students is desirable, we need to change the mode of assessment so that what is valued is not merely the score of the "know" or "know-how"; the progressive assessment should be taken seriously and its marks used as a major determinant of a pass or a fail.
The OSCE in this study seems to award higher marks than the former traditional long-case clinical examination did, though the 2015 examination upgraded students in general. The study reveals a weakness in the stations where "show-how" is required, as opposed to "know-how".
Newble D, Jaeger K. The effect of assessments and examinations on the learning of medical students. Med Educ. 1983;17(3):165–71.
Newble D. Assessing clinical competence at the undergraduate level. Med Educ. 1992;26(6):503–11.
Ponnamperuma G, Karunathilake I, McAleer S, et al. The long case and its modifications: a literature review. Med Educ. 2009;43(10):936–41.
Wass V, Van Der Vleuten C. The long case. Med Educ. 2004;38(11):1176–80.
Hodges B. OSCE! Variations on a theme by Harden. Med Educ. 2003;37(12):1134–40.
Van der Vleuten C, Norman G, De Graaff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ. 1991;25(2):110–8.
Muzzin L, Hart L. Oral examinations. In: Neufeld V, Norman G (eds). Assessing Clinical Competence. New York: Springer; 1985. p. 71–83.
Streiner D. Global rating scales. In: Neufeld V, Norman G (eds). Assessing Clinical Competence. New York: Springer; 1985. p. 119–41.
Walters K, Osborn D, Raven P. The development, validity and reliability of a multimodality objective structured clinical examination in psychiatry. Med Educ. 2005;39(3):292–8.
Turner J, Dankoski M. Objective structured clinical exams: a critical review. Fam Med. 2008;40(8):574–8.
Coovadia H, Moosa A. A comparison of traditional assessment with the objective structured clinical examination (OSCE). S Afr Med J. 1985;67(20):810–2.
Harden R, Gleeson F. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ. 1979;13(1):39–54.
Yudkowsky R, Park Y, Riddle J, et al. Clinically Discriminating Checklists Versus Thoroughness Checklists: Improving the Validity of Performance Test Scores. Acad Med. 2014;89(7):1057–62.
MacRae H, Vu N, Graham B, et al. Comparing checklists and databases with physicians’ ratings as measures of students’ history and physical-examination skills. Acad Med. 1995;70(4):313–7.
Finn Y, Cantillon P, Flaherty G. Exploration of a possible relationship between examiner stringency and personality factors in clinical assessments: a pilot study. BMC Med Educ. 2014;14(1):1052.
Newble D, Hoare J, Sheldrake P. The selection and training of examiners for clinical examinations. Med Educ. 1980;14(5):345–9.
Hift R. Should essays and other “open-ended”-type questions retain a place in written summative assessment in clinical medicine? BMC Med Educ. 2014;14(1):249.
Palmer E, Duggan P, Devitt P, et al. The modified essay question: Its exit from the exit examination? Med Teach. 2010;32(7):e300–7.
Palmer E, Devitt P. Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple-choice questions? Research paper. BMC Med Educ. 2007;7(1):49.
Ramos P, Ramirez G, Vallejo E, et al. Performance of an objective structured clinical examination in a national certification process of trainees in rheumatology. Reumatol Clin. 2015;11(4):215–20.