
Scoring Changes to the ABSITE: The Trainee Perspective on Impact and Ramifications

EP. 744 | 43 min 32 s
The American Board of Surgery In-Training Examination will officially be switching from reporting percentile scores by year level to percent of questions correct. What does this change mean for residents? Podcast hosts Dr. Ananya Anand, Dr. Joe L’Huillier, and Dr. Rebecca Moreci are joined by three fellow CoSEF members for this discussion: Dr. Gus Godley, Dr. Colleen McDermott, and Dr. Josh Roshal. 

Hosts:

–Dr. Ananya Anand, Stanford University, @AnanyaAnandMD, ananya_anand@stanford.edu

–Dr. Joseph L’Huillier, University at Buffalo, @JoeLHuillier101, josephlh@buffalo.edu

–Dr. Rebecca Moreci, Louisiana State University, @md_moreci, morecir@med.umich.edu

–COSEF: @surgedfellows

Special guests: 

-Dr. Gus Godley, University of Chicago, frederick.godley@uchicagomedicine.org, @GusGodley

-Dr. Colleen McDermott, University of Utah, colleen.mcdermott@hsc.utah.edu

-Dr. Josh Roshal, Brigham and Women’s Hospital, jaroshal@utmb.edu, @Joshua_Roshal

Learning Objectives: 

Listeners will:
 – Understand the changes to the ABSITE score reporting by the American Board of Surgery 

– Describe both positive impacts and limitations of this change from the resident perspective

– List possible ideas for further refinements to standardized exams in medicine  

References:
 -Yeo HL, Dolan PT, Mao J, Sosa JA. Association of Demographic and Program Factors With American Board of Surgery Qualifying and Certifying Examinations Pass Rates. JAMA Surg. Jan 1 2020;155(1):22-30. doi:10.1001/jamasurg.2019.4081 https://pubmed.ncbi.nlm.nih.gov/31617872/

-Sathe TS, Wang JJ, Yap A, Zhao NW, O’Sullivan P, Alseidi A. Proposed Reforms to the American Board of Surgery In-Training Examination (ABSITE). https://www.ideasurg.pub/proposed-absite-reforms/

-Miller AT, Swain GW, Midmar M, Divino CM. How Important Are American Board of Surgery In-Training Examination Scores When Applying for Fellowships? J Surg Educ. 2010;67(3):149-151. doi:10.1016/j.jsurg.2010.02.007 
https://pubmed.ncbi.nlm.nih.gov/20630424/

-Savoie KB, Kulaylat AN, Huntington JT, Kelley-Quon L, Gonzalez DO, Richards H, Besner G, Nwomeh BC, Fisher JG. The pediatric surgery match by the numbers: Defining the successful application. J Pediatr Surg. 2020;55(6):1053-1057. doi:10.1016/j.jpedsurg.2020.02.052 https://pubmed.ncbi.nlm.nih.gov/32197826/

-Alnahhal KI, Lyden SP, Caputo FJ, Sorour AA, Rowe VL, Colglazier JJ, Smith BK, Shames ML, Kirksey L. The USMLE® STEP 1 Pass or Fail Era of the Vascular Surgery Residency Application Process: Implications for Structural Bias and Recommendations. Annals of Vascular Surgery. 2023;94:195-204. doi:10.1016/j.avsg.2023.04.018 
https://pubmed.ncbi.nlm.nih.gov/37120072/

-Williams M, Kim EJ, Pappas K, Uwemedimo O, Marrast L, Pekmezaris R, Martinez J. The impact of United States Medical Licensing Exam (USMLE) step 1 cutoff scores on recruitment of underrepresented minorities in medicine: A retrospective cross-sectional study. Health Sci Rep. 2020;3(2):e2161. doi:10.1002/hsr2.161 https://pubmed.ncbi.nlm.nih.gov/32318628/

-Lucey CR, Saguil A. The Consequences of Structural Racism on MCAT Scores and Medical School Admissions: The Past Is Prologue. Academic Medicine. 2020;95(3):351. doi:10.1097/ACM.0000000000002939 https://pubmed.ncbi.nlm.nih.gov/31425184/

-Natanson H, Svrluga S. The SAT is coming back at some colleges. It’s stressing everyone out. Washington Post. https://www.washingtonpost.com/education/2024/03/18/sat-test-policies-confuse-students/. Published March 19, 2024. Accessed April 5, 2024.

-de Virgilio C, Yaghoubian A, Kaji A, Collins JC, Deveney K, Dolich M, Easter D, Hines OJ, Katz S, Liu T, Mahmoud A, Melcher ML, Parks S, Reeves M, Salim A, Scherer L, Takanishi D, Waxman K. Predicting Performance on the American Board of Surgery Qualifying and Certifying Examinations: A Multi-institutional Study. Archives of Surgery. 2010;145(9):852-856. doi:10.1001/archsurg.2010.177 https://pubmed.ncbi.nlm.nih.gov/20855755/

-Weighted test content from the ABS: 
https://www.absurgery.org/wp-content/uploads/2023/01/GS-ITE.pdf

-USMLE program announces upcoming policy changes | USMLE. Accessed April 9, 2024. https://www.usmle.org/usmle-program-announces-upcoming-policy-changes
 
Please visit https://behindtheknife.org to access other high-yield surgical education podcasts, videos and more.  

If you liked this episode, check out our recent episodes here: https://app.behindtheknife.org/listen

Scoring Changes to the ABSITE: The Trainee Perspective on Impact and Ramifications

[00:00:00]

Hello everybody and welcome to the third Behind the Knife surgical education episode brought to you by the Collaboration of Surgical Education Fellows or COSEF. We hope you enjoyed our last episode discussing the salary of resident physicians and we're excited to bring you more interesting content about surgical education today.

So I'm Joe L'Huillier, I'm a general surgery resident at the University at Buffalo. I'm in my academic development time away from the hospital pursuing my master's in health professions education from the MGH Institute of Health Professions. I'm also a research fellow working on an ECRIP grant from the New York State Department of Health, studying trauma outcomes.

But my favorite title lately has been dad to my twin one-year-olds. Hey everyone, I'm Rebecca Moreci. I'm a general surgery resident at LSU in New Orleans, currently in my second year of surgical education research at the University of Michigan, and also getting an MHPE, which is a master's in health professions education, also at the University of Michigan.

Thanks for tuning in. And my research is mainly around trainee assessment and the

[00:01:00]

transition to competency-based training models. And I'm Ananya Anand, the third host of our miniseries. I'm a surgery resident at Stanford, currently spending my professional development time as one of Stanford's Surgical Education Fellows.

And I'm in the second and final year of my fellowship, after which I'll be returning to residency. I'm also pursuing an MHPE at the University of Illinois Chicago. And my research interests are in wellness, professional identity formation, and improving feedback for surgical residents. So Rebecca, Joe, and I are your three consistent hosts for our six COSEF surgical education podcasts over the next two years.

And you might be wondering, what the heck is COSEF? Well, we're a multi institutional organization of surgical education research fellows, all working together to foster peer mentorship, networking, and scholarly collaboration. So we meet every week to discuss our ongoing research efforts either by individuals in the group or smaller groups within COSEF.

And we've actually recently become a formal part of the Association for Surgical

[00:02:00]

Education. So if you're a surgical education fellow or surgery resident interested in education, and you are thinking about potentially joining COSEF or learning more, please email us at cosefconnect at gmail. com.

That's C O S E F connect at gmail. com. Thanks, Ananya. And we are so excited to share an interesting topic for discussion today, which we hope will engage both students and residents and also attending surgeons across the country. This is quite a hot topic that might be controversial, but really, we just want to start the conversation and encourage multiple stakeholders to participate in this ongoing discussion.

So with all that being said, today we are going to talk about the ABSITE. And before we get started, we have three of our fellow COSEF members and friends joining us today. Gus Godley, he's a General Surgery Resident at the University of Chicago and a Surgical Education and Simulation Fellow at the Massachusetts General Hospital.

Josh Roshal, he's a General Surgery

[00:03:00]

Resident at Brigham and Women's Hospital. And the Surgical Simulation and Education Research Fellow at the University of Texas Medical Branch. And Colleen McDermott, she's a General Surgery Resident at the University of Utah and is a current Surgical Education and Leadership Research Fellow there as well.

So many of us start surgical training without even knowing what the ABSITE is or what it means. But almost all of us take this multi-hour-long test once a year. Gus, could you maybe share some info with us on the history of the ABSITE and what residents should know about it? Sure. Thanks, Rebecca. So the ABSITE, or the American Board of Surgery In-Training Examination, is an annual test given to surgical residents.

It was actually first administered in 1975 and features between 200 and 250 multiple-choice questions designed to assess residents' knowledge and expertise in core surgical care. It is administered by the ABS and intended to help monitor the progress of residents across various domains of surgery.

It was initially designed as

[00:04:00]

a low stakes formative tool, providing feedback without impacting board certification eligibility. This makes it not only a good preparatory step for the more significant ABS qualifying exam, but also a compliance measure with the ACGME program requirements for general surgery.

This allows program directors to track residents' knowledge growth without it being the sole determinant for advancement in their program. Originally, the ABSITE was developed due to the varying levels of knowledge and skill amongst surgical residents. Its primary aim was to standardize the assessment of resident progress nationwide, ensuring that each resident acquired essential knowledge to advance in their training.

Furthermore, it provided a clear and objective metric to highlight areas needing improvement, both for individuals and for their programs. Ultimately, the ABSITE as a tool is intended to shape educational curricula and align training programs with the essential competencies of our field. Each year it guides both residents and programs to recognize strengths and areas for improvement, ensuring consistent progress throughout their training.

All right,

[00:05:00]

so we know that the ABSITE is conducted every year and that the intention by the American Board of Surgery was for it to be low stakes and formative for residents. But recently, the ABSITE has made headlines among the surgical community for changes in score reporting that they plan to implement in 2025.

So Gus, could you tell us a little bit about what these changes are and why they were done by the ABS? Absolutely. Yeah, as you know, we're all surgeons, so we have a lot of trouble making things low stakes. The ABSITE has undergone a lot of changes, both in the length of the exam as well as the content tested each year.

It does try to closely follow the SCORE curriculum, which we all know is the guiding light for the ABSITE and is, you know, supposed to provide residents with a clear guideline for preparation and understanding of the material. One significant change that's garnering, I think, the most attention is the new method of score reporting.

Starting with the January 2025 exam, the ABS will no longer report percentile scores. The last exam to feature these scores will be in

[00:06:00]

January of 2024. This decision aligns with the ABSITE's primary role as this, like, low stakes formative tool intended to gauge resident progress. The move to eliminate percentile scores comes as these were occasionally used for high stakes decisions, such as resident promotions, terminations, and fellowship selections.

These applications often depended heavily on percentile scores, just like applying to residency, which can be problematic. Percentiles don't always distribute normally, and small variations in raw scores can lead to significant percentile changes. This makes them an imprecise reflection of a resident's actual knowledge level.
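For readers who want intuition for that percentile-swing point, here is a minimal sketch in Python. The score distribution below is invented purely for illustration and does not reflect real ABSITE data; it only assumes the general situation the speaker describes, where many examinees' raw scores cluster tightly together.

```python
import bisect

# Hypothetical, invented data: 540 examinees whose raw scores
# (questions correct out of 250) cluster tightly around 170.
raw_scores = sorted([168] * 50 + [169] * 120 + [170] * 200 + [171] * 120 + [172] * 50)

def percentile_rank(score, scores):
    """Percent of examinees scoring strictly below `score` (scores must be sorted)."""
    return 100.0 * bisect.bisect_left(scores, score) / len(scores)

# One additional correct answer produces a large percentile jump:
print(round(percentile_rank(170, raw_scores), 1))  # 31.5
print(round(percentile_rank(171, raw_scores), 1))  # 68.5
```

Because so many examinees tie near the middle of this made-up distribution, a one-question difference moves the hypothetical examinee from roughly the 31st to the 69th percentile, while the percent-correct figure barely changes (68.0% versus 68.4%), which is the imprecision the speaker is describing.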

So in response, the ABS recommended a shift towards using standard scores and introducing a new data tool that correlates these scores with the likelihood of passing the qualifying exam, which is really what we all care about. This approach aims to provide a more accurate measure of a resident's knowledge as well as progress.

However, this change comes amid broader shifts in medical exams everywhere, such as the USMLE moving Step 1 to a pass fail system. The

[00:07:00]

impact of these changes is significant, as evidenced by reports in the AMA, which highlight improved student experiences under the new system. I think we all remember what it was like trying to study for Step 1.

Months at a time, lots of work, long hours. Oh man, that was awful. It was, I know. So it seems that, at least for medical students now, some of that has gone away, and some of the stress that comes along with it is lessened, which is great. These changes to the ABSITE in general, though, aim to refine how resident progress is measured and reported, ensuring that evaluations are both fair and indicative of actual performance.

And really, it's going to be interesting to see how these adjustments influence both the exam itself as well as the broader field of surgical education. Thanks for that summary, Gus. Now, to put it all into context a little bit more, Colleen, could you share a little bit more about what the ABSITE was originally intended to be used for?

Yeah, so as we've been discussing, there are obviously a lot of

[00:08:00]

shortcomings of standardized testing, particularly with over-indexing on standardized test performance as a predictor of future success. For example, studies have shown that, while the ABSITE helps predict qualifying exam performance, there's little data to support using the ABSITE to evaluate anything other than a resident's ability to pass the American Board of Surgery qualifying exam.

And similarly, the USMLE has not been found to predict clinical performance. And while it was too late for all of us to avoid the gauntlet of studying for a graded Step 1 exam, it became pass-fail in January of 2022. Yeah, that must've been nice. But at the same time, program directors have to use some metric, right?

To sort through the hundreds, if not thousands, of applications they get. So I'm sympathetic to that piece of it as well. It sounds like Step 2 CK is just filling the void left by this transition. Ultimately, the stress of these exams has really just shifted, I think, from Step 1 to Step 2.

I don't know if you guys would agree with that. Yeah. Like not trying to be like back in our day, we had to walk

[00:09:00]

uphill to the library, both ways, studying for Step 1. I totally appreciate that for medical students now, Step 2 CK has just become another stressor for them. And that's probably a fair point that the NBME will have to contend with down the road.

And in their original February 2020 announcement regarding the scoring change, the NBME sort of re-emphasized USMLE Step 1's purpose as a licensing exam that was designed to be used by state medical boards, and indicated that secondary uses, such as residency application screening, had concerning implications for medical student well-being and overall medical education, right?

So, it was really only ever supposed to be a licensing exam that we had to pass to obtain a state medical license. Yeah, exactly. And eventually we sort of lost the plot and turned it into what we knew it as, which was an exam that essentially acted as a gatekeeper for residency interviews in particular.

There were strong concerns about how the graded Step 1 exam affected underrepresented in medicine applicants. Sorry, Colleen, you're

[00:10:00]

saying there are concerns that the graded USMLE Step 1 disadvantaged URiM applicants disproportionately? Statistically speaking, yes. Studies found that underrepresented applicants had scores that distributed throughout the spectrum, and certainly a lot of underrepresented students scored exceptionally well. However,

the average underrepresented student Step score has been lower, and a single-institution study of 10,000 residency applicants found that having a Step 1 cutoff of 220 excluded almost 70 percent of applicants from underrepresented groups, compared to only 39 percent of white applicants. And the authors of these studies indicated that multi-level systemic inequality over an individual's

life course can play a role in standardized testing outcomes, whether this is school funding in the area where someone grew up, inability to afford test prep materials, or a need to work to support family during college and medical school that might leave less time available for studying.

[00:11:00]

So that's a pretty big topic, Colleen.

I guess I'm wondering, is there any advice on how to equitably assess standardized test scores, then? I know I'm not expecting you to have, you know, a particular answer. This is a question really for society. But what are your thoughts? Yeah, and I think that is, like, the $10,000 question. There is a perspectives piece I liked, and we can link it in the show notes, regarding the use of the MCAT in underrepresented applicants, that urged admissions committees to consider scores in context and to consider that a range of scores can speak for an applicant's ability to be successful, and I think that this can be a good framework to start with.

For example, say that you have an applicant who is a first-generation student who has no access to test prep resources. They deal with the chronic stress and cognitive load of discrimination in their daily life.

They're worrying about the safety of their family and their community. They're working full time, et cetera. If that person has still managed to score in the average range of accepted students on the MCAT, they certainly still have the academic aptitude to handle medical school.

[00:12:00]

And schools may not need to see a 95th percentile MCAT score to feel truly confident that this person could be successful as a medical student.

And that was sort of what the perspectives piece was promoting. So going back to the USMLE, the reality is that the cognitive ability and fund of knowledge needed to handle the work of being a physician may be related to performance above a certain threshold, for example, a passing score, rather than the need to be at the top of a bell curve. And to be clear, on the ABSITE and the USMLE, this bell curve is already made up of the most dedicated students in the country.

Most of whom will make capable physicians and surgeons, and therefore many felt that pass-fail testing may lead to more equitable outcomes in graduate medical education. It's unfortunate that the change in exam scoring is still so recent that it's hard to draw broad conclusions yet about the USMLE change.

But anecdotally, I had a conversation with a colleague of mine a few years back that offered a really interesting perspective. So this colleague that I

[00:13:00]

was talking to is underrepresented across several identities, and she said that she actually disagrees with the move to pass-fail testing. The reason being that she often struggled with clinical evaluations in medical school: she felt that it was hard to relate to the attendings and residents evaluating her, that her evaluations often reflected a number of stereotypes, and that she was being viewed in a biased way.

And I'm sure we've all had experiences where, we know, our clinical evaluations in medical school are often based on the relationships and dynamics that we have, like, the interpersonal stuff. And we've seen that in the literature too: people describe women, or people identifying as women, by how they look in letters of rec, or as being, quote, nice, while they describe men as being more sort of competent and holding leadership positions.

So there's definitely some element of discrimination that exists there. Yeah, exactly. And it sounds like this was possibly even worse. So my colleague reasoned that she would have struggled with residency

[00:14:00]

applications if she had only been at the mercy of her clinical evaluations. But luckily, she had the ability to take a graded Step 1 exam, and felt that performing exceptionally well on Step allowed her to show her strengths in an unbiased way and let her be successful during residency applications.

Yeah, I think that's pretty interesting. I also saw a study published in 2020 by Yeo et al in JAMA Surgery. It showed an association between certain demographic and program factors and pass rates on the ABS qualifying and certifying exams. Dr. Sosa, the chair at UCSF, was the senior author on that, I think. But ultimately, standardized tests can definitely be biased.

But what it sounds like you're saying, Colleen, is that other assessment metrics may be even more biased, and that moving away from a graded exam may remove a less biased metric. Did I get that right? Absolutely. Yeah, unfortunately, that seems to be the case. And the same sentiment was echoed in a recent Washington Post article regarding SAT testing being optional for

[00:15:00]

college admissions during the pandemic.

Okay. Now we're going all the way back to the SATs. Yes. Get your number two pencil ready. Although I think that kids now take it on the computer. But anyway, so for several years, test score reporting was optional because of the pandemic, but more and more schools are bringing the SAT back as a requirement, mostly starting in 2025. And in the article, representatives from various selective schools commented that in the test-mandatory years,

they were able to matriculate more students from disadvantaged backgrounds, because, quote, without testing, they were less able to evaluate a student's chances of thriving at our institution. The article didn't describe in detail why that was, but it's likely a combination of conscious and unconscious biases from the admissions committees, or systematic differences

in opportunities that would otherwise convey academic rigor: for example, research opportunities, the ability to take

[00:16:00]

college courses that are very expensive, extracurricular activities, et cetera. So, overall, it seems that while standardized testing outcomes may be affected by an applicant's background, they still serve as an important and maybe less

subjective strategy to help students set themselves apart, especially those who may not have had other opportunities to demonstrate academic success. So what should we be doing then? So the perspectives piece I mentioned earlier, which emphasized evaluating scores in context, gives us a good starting point, but then it goes on to say that making an effort to create an equitable learning environment moving forward is important. So colleges have a lot of work to do to create an equitable environment for each student to be successful applying to medical school. But the goal is that this equity work in college and medical school will create less and less disparity as time goes on.

Although, obviously, we'll never get to perfect parity. In the meantime, the takeaway is that we can probably apply this

[00:17:00]

advice to surgery residency as well. But the primary goal should be to create equitable opportunities for research and leadership, and also to use more granular assessments, such as competency-based metrics

that may be less fraught with bias, although they're still not perfect. And while we're working towards this, the ABSITE can still serve as a less subjective metric that underrepresented residents can use to set themselves apart for fellowship. Yeah. Thanks for that, Colleen. Let's pause the underrepresented in medicine discussion and just turn slightly here.

So I think a lot of the residents listening to this podcast, particularly the interns who haven't taken the exam before, may be wondering, you know, regardless of the validity evidence or what folks are going to do with these test scores, I have to take the test. So how do I do well? But even before we address that, we should probably clarify what doing well even means.

And I'd say that everybody in a surgery residency is in the country's top 1 percent for intelligence, ambition, capability. It's a pretty tough group to compete against.

[00:18:00]

Yeah. I feel like it probably depends a lot on a variety of factors, which we'll get into, but also on an individual's personal goals.

But for now, Colleen, let's try to put some stressed-out, newly matched interns' minds at ease. What should they be doing to study for the ABSITE this upcoming year? Yeah, it's true that the test can be of different levels of importance to different people. And for preliminary interns hoping to secure a categorical position in general surgery or elsewhere, the exam can be perceived as high stakes, even if fellowship apps are years away.

So, my number one piece of advice to remember is to take care of yourselves during ABSITE season, and this can go for anyone in all of residency. The science is, we know that the brain doesn't consolidate memory well without sleep. Physical activity can help with alertness and test-taking stamina.

Wellness and psychological safety can mitigate cognitive overload and decision fatigue. And, of course, managing stress can help improve executive function as well. Right, but leisure time for physical activity,

[00:19:00]

wellness, sleep, psychological safety, doesn't quite sound like surgery intern year to me. You guys didn't have an abundance of free time for all those things as interns, did you?

Yeah, and that's probably a lot to unpack that we don't have time for in this episode, but the point is that there's likely very little benefit in staying up to do another hour of practice questions if this means you'll be sleeping for less than six hours, and it's important to be mindful of balancing studying efficiently while also taking care of yourself.

Yeah, I feel like cramming in that last month just makes life ultimately stressful without any gain. I mean, all of general surgery is considered fair game for this test, and you couldn't really cram it even if you wanted to, I feel. And that was my next point: in order to keep the ABSITE from destroying your ability to do all of your instrumental ADLs in the winter, studying for the ABSITE is something that needs to be started well before January.

So, spending a little time now before intern year to make a study

[00:20:00]

schedule or plan can help break up content review into manageable pieces. And one thing to think about is lining up study topics with your fall rotation schedule, like vascular with vascular, colorectal with colorectal, et cetera. This is another way to force yourself to synthesize information between rotation content, lectures, and ABSITE material. And it can help you prep for cases and look good in the OR. Also, it's important to be realistic when making a study schedule; the rotation that's really tough may not be the one that you do most of your studying on, and you may only have to do five questions per day to get through a block of 1,000 questions in a test bank.

And it can help prep for cases and look good in the also, it's important to be realistic when making a study schedule, like your rotation. That's really tough may not be the 1 that you do most of your studying on and you may only have to do 5 questions per day to get through a block of 1000 questions in a test bank.

And that's something that you can do on your phone on an app while you walk to a different ward of the hospital to see a consult, or even while waiting for anesthesia to induce the patient. So you're saying that in this one specific situation, we don't blame anesthesia. We actually say thank you for facilitating a few minutes for us to do some questions.

Yes. OR turnover time is not your attending's friend, but it's your friend for getting things done as a junior resident. And

[00:21:00]

another way to get the most bang for your buck with limited study time is that you have to really focus on active versus passive strategies. So passive things like rereading and highlighting are really easy to do, but they don't do a lot for you.

So it's better to spend that time with techniques that facilitate active learning, and this can be teaching the material to somebody else, like in a study group, synthesizing info from multiple sources, doing practice questions, et cetera. And some passive techniques, like listening to the BTK ABSITE podcast, can be great as a first pass for the material while you're still figuring out what information to know, but it's important to supplement these with active recall later on.

Yeah, I think everyone basically listens to Behind the Knife on repeat all of January while driving to and from the hospital. Yeah, my Spotify Wrapped thinks BTK ABSITE was my favorite artist last year, so I'm certainly guilty of it. Oh man, you were studying a lot more than me. But passive listening is still better than nothing, and the more time you have to work on an active method, the more you'll

[00:22:00]

get out of it.

And that being said, your intern fall schedule may or may not lend itself to high-yield study, because, unfortunately, not all the ABSITE content's created equal. Yeah, like transplant, a rotation that I've never done, but I think every year I get one question about Tacrolimus and that's about it.

Yeah. Is it Tacro? How do you even say that? Is it Tacrolimus? I don't think it's Tacrolimus. I thought it was Tacrolimus. I don't know. I'm in Camp Tacrolimus. I'm in Camp Tacrolimus. Camp Tacrolimus. Maybe just Tacro. Tacro. Okay, so with everything else an intern has to learn, you may be best taking the L on the Tacro, Tacrolimus question and focusing study time on high-yield topics.

So, things like breast surgery, trauma management, and thyroid lesions are very testable, and the BTK episodes always make a point of indicating when something is high yield. The ABS also publishes a weighted content outline. And as it gets close to the exam, BTK has some videos with mnemonics and

[00:23:00]

high yield info that might be good to review right before.

Yeah, you're right. I actually think the episode is called Don't Open This Until You're on the Way to the ABSITE. Yeah, it couldn't be more clear. So hopefully that helps as a rough study plan. Otherwise, I'd talk to other residents at your program and see what study tips they might have. Also, I alluded to this earlier, but going back to the bigger picture to help give our interns more context, what sort of things are ABSITE scores typically being used for?

A lot of people may think of fellowship applications or prelim to categorical applications, but what else is there? You know, that's a great question, Colleen, and like you mentioned, on the road to becoming a doctor, you have to take a lot of tests, whether it's SAT, MCAT, USMLE. And so I think it's important to think about how these tests are designed and how that influences their ultimate value.

So, in education, there are basically two types of test design. You have formative and summative exams. So, for example, the USMLE is this three step exam that's meant to

[00:24:00]

approximate how much medical knowledge somebody has, to inform licensure decisions in the United States. And that's so that our patients can be confident that any physician in the nation has passed all three steps of the USMLE and met a common standard of what a doctor should know.

And so essentially the USMLE acts as an objective measure of what somebody learns in medical school. And this carries a lot of value as a summative assessment. And so residency program directors, like we alluded to earlier, use it as a way to make selection decisions in the residency application process.

But even as a summative assessment, the USMLE website says, and I quote, it's appropriate to consider Step examination scores in conjunction with other criteria, such as course grades and faculty evaluations, rather than just using test scores as the sole basis for decisions. And in a time when program directors across all specialties are inundated with thousands of applications, as much objective data as possible can be tempting to use to try to make that selection process

[00:25:00]

easier.

So as you can imagine, in the next natural selection process, which is for surgical fellowships, those program directors are trying to use as many objective pieces of data as possible to make their selections too. And so it's very tempting to want to use this nice test called the ABSITE.

And in fact, there was a survey published in the Journal of Surgical Education back in 2010 that looked at how important ABSITE scores were in the fellowship application review process. Fellowship directors were asked to rank how they use things like residency program name, publication number, ABSITE score, and even who your letters of recommendation were from in their determinations.

And ABSITE scores came in third, after residency program name and the letters of recommendation. And although it's a small percentage, 20 percent of programs actually have a percentile cutoff for consideration to their fellowship programs. And there was actually another study that looked at pediatric surgery applicants specifically, and it found that matched applicants had higher ABSITE scores

[00:26:00]

compared to unmatched applicants.

And so from this data, you can get an idea that fellowships view ABSITE scores as some sort of objective measure of candidate quality. But this is just published data, and you always hear stories of how value is assigned to high ABSITE scores at individual residency programs. And I'm sure that each of us has some interesting stories of how that plays out in our own residency programs.

So, for example, do any of you have any stories to share about how the ABSITE has value in your programs? Yeah, so at our program, when the ABSITE was being reported based on percentiles, residents who scored above the 90th percentile would win an award in front of the department at the end of the year. And it wasn't just a paper certificate.

It also came with a financial award. So there was a lot of signaling that the ABSITE was something to be taken seriously and that you would be rewarded for, quote unquote, doing well on the exam. Yeah, same with our program, Ananya. We

[00:27:00]

similarly have an end-of-the-year graduation party, and the resident who scored the highest, like one per year, gets a little shout-out and an award as well.

Yeah, and in addition to the awards that we get in Chicago as well, we also have an incentive where the program supports the purchase of our loupes, which is, you know, not only a nice shout-out, but an incentive that's tied directly to the things we do every day. And I've even heard of programs going the other way, where they send out a name-and-shame email to the whole department listing the residents who were struggling and had the lowest percentiles, to, I guess, motivate them to do better. It seems confusing.

Wow. That's crazy to hear. Like the old carrots-and-sticks argument. I don't know, I'm a fan of carrots, not sticks, but yeah, in Buffalo, we, like many of you guys, use carrots and reward good performance.

So, you know, with all of the data that we've talked

[00:28:00]

about, and some of our own personal stories in mind, what does the American Board of Surgery say about what the ABSITE should be used for, and what should it actually be used for? What's the gap there? Ah, interesting. So let's actually take a look at the American Board of Surgery website, and I'm going to read this verbatim.

The exam results, that is, the ABSITE, are furnished to program directors as a formative evaluation instrument to assess trainees' progress. The results are released only to program directors and should not be shared outside of the department's graduate medical education division. Scores may be shared with the individual trainee.

And so that's the definition provided by the American Board of Surgery of what the scores should be used for. And what should they really be used for? It depends on what we as stakeholders want the value of the test to be, like I talked about earlier. If we really want it to be a summative assessment of trainees, so that we can have an objective benchmark of how they're doing at each level of their training, so that we can

[00:29:00]

make informed selection decisions, then we need to identify what those knowledge benchmarks are for each year of clinical training and consider redesigning the test to reflect those standards.

So either you need a level-specific ABSITE for PGY-1, PGY-2, et cetera, or you take a look at the questions on the current test, which everyone takes at the same time, and define at which point in training someone is supposed to get those questions right. So that when you take the ABSITE, you're like, okay, I got 99 percent of PGY-1 questions right, and only 25 percent of PGY-2-level questions right.

I'm on track if I'm at the end or the middle of my PGY-1 year. And that gives you at least some more level-specific milestones that people can strive for on their training journey. And I think the best way to understand this is with an analogy.

So for example, Joe, if you're taking a language test, you can't expect someone who's just starting to learn the language to perform as well as a native speaker.

[00:30:00]

Rather than saying, oh, you're performing better than 30 percent of people who have been learning this language for three months, which is less useful information,

you can define exactly what people who have been learning this language for three months should know. And the same thing goes for surgical education, and that can go into a whole discussion about competency metrics, et cetera. And if you decide to adopt this type of formal, objective-based assessment paradigm, then you need to invest in creating a learning and testing environment and a resource market that's effective in helping people achieve high scores on those high-valued exams.

But on the other hand, if we want a more formative assessment that trainees can really use to guide their progress through their training programs, then two things need to happen.

One, we need to stop using it for high-stakes selection decisions and stick to the script that's provided by the American Board of Surgery. And number two, we need to really invest in creating an infrastructure where the feedback from the

[00:31:00]

exam is actionable and residents feel supported in acting on it.

Like, Ananya or Gus, what do you think about how the exam is currently structured to provide you feedback when you get your score report? Right. So like you mentioned, we currently get a generated score report from the ABSITE, which has the content areas we could improve on. But really, this is very vague.

And we don't actually have access to the actual questions we got wrong, with explanations that we could then learn from. And formative assessments, which the ABSITE is touted to be, as we all know as surgical educators, really do require high-quality feedback. Otherwise, we can't actually learn from them, and a score report really may not cut it.

Yeah, for sure. I mean, let's talk about this formative feedback that we get from the score report. So here's a snippet of mine from last year, sharing all my secrets, and hopefully my future colorectal surgery program director won't be listening. Apparently I really need to focus on all of rectal cancer.

So the score report starts off by saying,

[00:32:00]

For the section in colorectal surgery, I got 25 of 33 questions correct. It went on to say that the topic areas for the incorrect questions are hemorrhoids; rectal cancer, no specifics, just rectal cancer; colitis; ischemic colitis; appendicitis, that one must have been before the coffee hit; dysphagia.

That must have been difficult to swallow. Got me. And then colonic Crohn's disease, operative management; ulcerative colitis, operative management. So, I mean, you know, it's Friday, I'm about to go into a weekend that's booked with studying for sure, but really, it would be fabulous for all of us to get a little bit more than, you know, "you could do better."

And then, you know, how about throwing in something like an actual question or an explanation? That would really be something that I would be interested in and would focus on. I don't really look at the score report, because it's not really useful for me. But Gus, be fair to yourself, though.

Some of those ABSITE questions are actually pretty hard, right? I appreciate you, Joe. Thank you. Yeah, it's like taking a

[00:33:00]

protractor and actually measuring how far the mass is from the tip of the appendix and if it involves the base. Wait, Josh, you don't do that?

That must just be an LSU thing. So, you know, we talk about how getting actual feedback for formative assessments is important, but I think one other really critical factor we need to consider when we're talking about the ABSITE is that the ability to be successful on standardized tests is often related to your access to prep materials and resources.

And given that not all residencies pay for comprehensive ABSITE prep resources, and many of us are actually expected to pay for these very expensive resources out of pocket, how do we navigate that issue? How do we create a more equitable way of distributing these types of resources, so that access isn't the main factor determining your success on the exam?

Yeah, I think that's a complicated and nuanced discussion. If you start with, for example, the

[00:34:00]

backward design theory of curriculum development, and you have an assessment, in response to that, there should either be a targeted push to create learning resources or instructional content to help people achieve those benchmarks on that assessment, or the market will create itself in a way to help people achieve those scores.

And in academic surgery, we do a good job of creating curricula within individual institutions that are validated and effective, but it's hard, especially as busy clinical surgeons, to scale learning resources so that they're effective for the entire country. And so that's why things like SCORE and TrueLearn are so valuable: there was a nationwide, or an economic, motivation for these institutions to come together to create these learning resources.

But when it comes down to who should be paying for those resources, that's a tough question. At the end of the day, in whose interest is it to achieve good scores on the assessments? Is it the individual, who's trying to use it as a formative assessment?

Or is it the program who wants

[00:35:00]

to show that their learners are doing well and will continue to do well and will pass the qualifying exam and the certifying exam, for example. So I think it depends, and I think the market ultimately shows who's more interested in supporting the learners to achieve their goals.

Yeah, I agree with all that, Josh. And the other thing to consider is the side of the fellowship directors. We know that historically they've used ABSITE scores as a way to rank or evaluate their candidates. And theoretically, if I'm thinking back to my algebra days, right, you can back-calculate percentiles if you know, I think it's the mean and the standard deviation, right?
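An editor's aside: the back-calculation mentioned here can be sketched in a few lines, under the assumption that scores are roughly normally distributed. The mean and standard deviation below are purely hypothetical illustration values, not actual ABSITE statistics.

```python
from math import erf, sqrt

def percentile_from_score(score: float, mean: float, sd: float) -> float:
    """Approximate percentile rank of a score, assuming scores follow a
    normal distribution with the given mean and standard deviation."""
    z = (score - mean) / sd
    # Normal CDF expressed via the error function
    return 0.5 * (1 + erf(z / sqrt(2))) * 100

# Hypothetical numbers: national mean of 75% correct, SD of 8 points.
# A raw score one SD above the mean lands near the 84th percentile.
print(round(percentile_from_score(83, mean=75, sd=8), 1))
```

So even with only percent-correct reporting, anyone with a published mean and standard deviation could roughly reconstruct a percentile, which is the point the hosts raise next.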

Yeah. Okay. So they could attempt that. Will they do this? I don't know. I think that's really up to them to decide. But ultimately, I think the message that this transition to percent correct sends is this: instead of perceiving a 20-percentile difference between two applicants, you'll be looking at a much smaller difference in the absolute score, which underlines

[00:36:00]

that, hey, there's a ceiling effect here.

The vast majority of us are smart people, right? Who work hard. The numbers of questions that separate us are really quite small. So in spirit, I definitely appreciate the change, regardless of what fellowship directors decide to do with it. Just curious, as a potential program director or fellowship director in the future, do we think there are alternatives to the ABSITE?

Specifically for fellowship directors, if they want to use some sort of test or metric to standardize the evaluation of their candidates? Like, are there other things we could evaluate besides the likelihood of passing the boards? I think that's an interesting point, and I think that all eyes will probably be on the new entrustable professional activities, or EPAs, assessment framework being rolled out by the American Board of Surgery as being that objective assessment tool that individuals, whether they're learners or people who are making

[00:37:00]

decisions can use in order to determine whether or not that person has the knowledge of a general surgeon that is expected within the United States.

And so as we start to accumulate more data and start to correlate these levels of competency with this objective measurement framework, and then see how that leads to people either going into practice or applying for fellowship, we'll be able to start to get an idea of whether or not it's helpful. But ideally, that would be the way, right? If you have a true, validated, effective competency framework with which you can say, this person knows what they're supposed to know in order to be a practicing clinical surgeon.

That's an objective measure; I can use that to make my selection decisions. And I think that, ultimately, that'll help reflect what every attending or faculty member within the program thinks of a trainee's abilities, and it will be a much broader, yet hopefully objective and valuable, measure of their capabilities.

I think with the EPAs, though, one of the

[00:38:00]

differences is that you may still see rater bias. Faculty raters differ across the country, and that kind of goes back to the program culture and the faculty at those programs, and you may see more of that than on, say, a standardized exam, for sure.

And I think that's a good point. And at some point, there might be a combination of the two, where a program director might get a set of your evaluations throughout your years of training, as well as some more objective, high-stakes exam that's thought to be less subject to bias. But even when we were talking with Colleen, there's clearly some bias associated with these exams as well.

And so the question will be, how do we combine those pieces of information to try to make the best informed decisions? Yeah, no, you're totally right. I think that, you know, even if it's just a matter of deciding how many EPAs a trainee needs to have in order for that faculty bias, or faculty variation, to be

[00:39:00]

eliminated or reduced, that might play a huge role in the future.

Sorry, Gus, I cut you off. No, no, totally fine. I was just going to say, talking about EPAs, every time I just think of how many times people have to assess and be assessed. And, you know, I wonder if someday we're going to have the whole world of AI in this conversation, thinking about ways that we can look at technical ability as well as medical knowledge in some way that's, you know, hopefully not biased and can be done efficiently and provide good feedback.

There are some rudimentary models that I've seen for providing video feedback and actually improving the quality of feedback that we give to residents. But I think we're in the early stages, and it'll be interesting to see where this goes. Yeah, great. Oh, go ahead. It seems like the ABSITE is going to be here to stay.

And so I do appreciate the ABS's intention behind the move to go away from reporting percentiles and focus more on, like, the actual

[00:40:00]

raw score and what it means in terms of its correlation with passing your qualifying exam. But I think our discussion has uncovered that even with that good intention, and the move to try to be consistent with the whole low-stakes formative assessment idea, there's still a tremendous amount that needs to be improved upon when it comes to this type of standardized testing, and particularly the implications of what it ends up being used for, which may not actually be consistent with what it was designed for.

So I truly think this group could talk about this topic for hours, but for the sake of creating a contained podcast, I think we'll wrap up the discussion there. If you're interested in learning more, one of our CoSEF members, Tejas Sathe, just published in Global Surgical Education a really interesting article called Proposed Reforms

to the American Board of Surgery In-Training Examination. Really interesting. Give it a read if you want to learn more. The senior author on that study is Dr. Alseidi. He is also a past president of the Association for Surgical

[00:41:00]

Education, so bringing in a lot of expertise there. Dr. Pat

O'Sullivan is also on this study. She's a senior researcher at UCSF with a lot of expertise in qualitative methods. So I think that's a really good article, written by a really good group of people. So, ultimately, the ABSITE is a really big undertaking, and we've taken some time to explore the trainee perspective on the scoring changes, how to do well on the exam, and, for the future, what else can be done to make it fair and equitable.

Thanks, Joe. And we'd really like to sincerely thank our three guests, who are fellow CoSEF members: Gus Godley, Colleen McDermott, and Josh Roshal. And as always, go forth, be kind, don't forget to collaborate, and dominate the day. Bye bye.
