How we train our residents to become orthopaedists affects all of us, whether as educators now, or patients of these young surgeons later. So naturally we ask, given time and resource constraints in our training programs, are residents competent when they graduate? And how do we determine competence, anyway? The Accreditation Council for Graduate Medical Education and the American Board of Orthopaedic Surgery’s Milestones program provide some more objective definitions, and perhaps more than before, we now assess competence rather than merely track time in service [6].

Fig. 1 Don Anderson PhD

Against this backdrop of more rigorous assessments in surgical education, surgical educators have developed some more objective tools. In particular, the Objective Structured Assessment of Technical Skills (OSATS) has gained wider acceptance, with recent iterations including global rating scales and detailed checklists [7]. A recent review [3] concluded that the OSATS is generally a valid tool for formative feedback to the learner, but evidence of OSATS’s validity remains insufficiently robust to use this tool for high-stakes examinations and certifications. Why might there be problems with the OSATS’s validity? First, we must remember that validity is a reflection of how well something conforms to reality, and depending on what aspect is measured, validity can be divided into different types. The OSATS, as an evaluation tool for technique and skill, performs well in terms of construct validity; that is, the test seems to measure accurately the things it sets out to measure [2]. However, it is less clear that we are measuring the most important things in the first place (content validity), and we just do not know to what degree the measurements predict important real-world endpoints, like the effective performance of actual surgery (concurrent validity).

How might we measure the technical proficiency of a surgical learner in the context of actual (or simulated) surgery? When piecing together intra-articular fracture fragments, we strive for perfect realignment of the joint surface; we believe that quality of the reduction matters [1]. Additionally, we strive for fixation stability [5]. Aside from directly checking patient outcomes that usually can only be measured long after a procedure—and so are hard to use for the assessment of trainees—reduction quality and mechanical integrity are probably as close as we can get to measuring how well an operation was performed.

Does the OSATS really measure how well an operation is performed? Donald D. Anderson PhD from the University of Iowa and his team explored the relationship between what we measure (OSATS scores) and what is important (joint reduction quality and fixation integrity). His group has extensive experience in using the OSATS tool [4], and in this environment of more granular assessments, they recognized the pressing need to develop more relevant evaluation tools. Dr. Anderson and his colleagues tackled the difficult issue of validity using a clever approach: judging the OSATS against measures that we know matter in a well-executed operation. By doing so, they found that the OSATS overestimated a trainee’s skills and correlated poorly with what really matters, the quality of the surgical result.

As we transition deeper into this new era of resident education and assessment, please join me for an exploration into this critical topic with Dr. Don Anderson in the Take-5 interview that follows.

Take Five Interview with Don Anderson PhD, lead author of “Objective Structured Assessments of Technical Skills (OSATS) Does Not Assess the Quality of the Surgical Result Effectively”

M. Daniel Wongworawat MD: What prompted your interest in the assessment of technical skills?

Don Anderson PhD: Our group saw orthopaedic residencies being mandated to implement surgical skills training curricula for use outside of the operating room, and we wanted to ensure that assessment of performance in this context could be done meaningfully. Most orthopaedic surgical procedures address specific mechanical shortcomings. So it makes sense to focus assessments on mechanical outcomes. With expertise in biomechanics, human factors and engineering, we felt that we could do this. We also are developing methods for surgical planning that we want to evaluate in the skills lab, as a prelude to evaluation in the operating room. Thus, the assessment of technical skills addressed urgent needs both in our research and in our training programs.

Dr. Wongworawat: In general, checklists work. We have seen that in multiple industries. Why do you think the results are not so clear here?

Dr. Anderson: Checklists are useful when a task requires specific and readily observable steps. Training for such tasks amounts to teaching one how to perform the steps, and assessing whether the steps were followed. A good example in orthopaedics might be a diagnostic arthroscopy, which should generally evaluate a joint in a systematic manner. However, fracture reduction and fixation tasks depend on a complex series of actions involving tactile interaction and dexterous manipulation. Such actions are difficult to assess unambiguously as pass or fail. In these cases, standalone checklists become less useful.

Dr. Wongworawat: Where do you think the OSATS fits into the educational toolbox for orthopaedic residents?

Dr. Anderson: We believe that using OSATS with task trainers can be a productive way to confirm that orthopaedic residents are adhering to well-accepted technical practices in surgery. Approaches can be readily taught, and underlying concepts explained. However, we think that unambiguous, physical, objective and unbiased measures of skill performance also have an important role to play in training and assessment. In the specific context of our work, we felt that, for articular fractures, the actual quality of the reduction achieved fills this role, and that, for extra-articular fractures, the structural integrity of the final fixation construct does the same.

Dr. Wongworawat: Is there another dimension to operative skill that we are not assessing, or that is difficult to assess objectively, and how might we assess that?

Dr. Anderson: When the desired mechanical outcome of a surgical procedure involves a complex combination of factors, something has to be measured at completion. This is where assessment gets tricky. Knot-tying strength can be readily measured. So, too, can the mechanical integrity of a fracture fixation construct. Finally, if precise reduction of articular fracture fragments is desired, doesn’t it make sense that measuring the reduction quality is the best way to assess that performance? These few examples highlight instances where we believe that objective measures can be used in the training of orthopaedic residents.

Dr. Wongworawat: In which direction do you think trainee assessments should go?

Dr. Anderson: Do not get us wrong—there is a clear role for OSATS in assessing the progress that trainees are making in attaining surgical skill proficiency. Our impression is that these assessments are relatively low-hanging fruit, and that as we continue down this path, we need to strive to also go after the other undeniably more challenging aspects of performance assessment in the context of training competent orthopaedic surgeons.