The racist and classist roots of standardized testing found a home at Stanford — and they still endure today

As wealthy parents rush to hire teachers for private instruction while those with limited financial barriers choose to take time off from school, the extent of educational inequity in the United States is more apparent than ever. Yet, such inequities are not new — they are evident throughout the development of one of the most fundamental facets of U.S. education: standardized testing. The modern field of testing found its roots here at Stanford, where eugenics shaped the notion of meritocracy, and intellectual measurement systems advertised as “objective” were designed to reinforce the social order.
Racist and Classist Roots


The inception of standardized tests in the Western world can be traced back to the Industrial Revolution and the progressive movement of the early 19th century. With these changes came a growing emphasis on education and an accompanying need to assess students on a larger scale. For Alfred Binet, the French psychologist who conceptualized the intelligence quotient (IQ), the central goal was to identify students in need of assistance by evaluating their intellectual abilities.

“Binet introduced the IQ test as a response to the problem of kids from impoverished backgrounds coming into Paris in the 19th century and the need for some kind of measure that would tell you where those kids were in their intellectual development,” said Associate Professor of Psychology Gregory Walton, whose research interests include group differences in academic achievement and affirmative action. “So the original focus had an assumption that the intellectual qualities of a person can be developed, and there was a need for understanding where a person was to tailor the correct educational services to them.”

Binet’s goals, however, were co-opted by proponents of racist and classist ideologies in the United States. Amid rising European immigration, white Anglo-Saxon Protestant social scientists feared the arrival of “impure” and less intelligent students in American public schools. The “Negro education debates” that ensued in the aftermath of the 1896 Plessy v. Ferguson decision — which upheld the constitutionality of “separate but equal” racial segregation — brought the notion of Black educability to the forefront of the national conversation, leading many to promote the bigoted notion of Black intellectual inferiority. Jim Crow laws and the growth of the eugenics movement in the United States also emphasized racial categorization and division.

Lewis Terman
(Photo: Stanford Libraries)

For Stanford psychologist Lewis Terman (pictured), Binet’s intelligence test provided the ideal foundation with which to establish a “scientific” justification for white superiority. Terman, who chaired Stanford’s psychology department for 20 years, endorsed eugenics and viewed intelligence as an innate, biological difference between racial groups. Terman’s adaptation of Binet’s test to create the Stanford-Binet intelligence test — which is still used today as an IQ test — represented an “inversion” of Binet’s principles, according to Walton.

“The IQ test in Terman’s hands became a tool to identify the brilliant people, and those brilliant people were assumed to have their brilliance because of their genetic endowment,” Walton said. “That sort of basic assumption of growth and support for people who may have not had the same kinds of opportunities as others was inverted by Terman in the United States.”

Terman’s test overwhelmingly identified white people as “brilliant people.” Of the 1,528 subjects whom Terman dubbed “brilliant” in 1928, two were Black, six were Japanese American and one was Native American.

His individual test, which was published in 1916, defined intelligence in purely quantitative terms and was used to justify the forced sterilization of minority groups in the United States. Terman later administered his test to identify officers and separate soldiers by race and test scores during World War I. His introduction of the IQ test in America fueled an emphasis on meritocracy, and he and his allies soon began to advocate for widespread testing among the general public.

Carl Brigham
(Photo: Princeton University)

Carl Brigham (pictured), a Princeton psychologist and fellow member of the American Eugenics Society, built on Terman’s work to develop the SAT with the College Board in 1926. The test became a ubiquitous tool in college admissions by the end of World War II. Terman and Brigham’s work further impacted immigration policy in the United States: Brigham influenced the 1924 Johnson-Reed Act, which strengthened the immigration quota system and was founded upon the supposed inferiority of immigrants.

During the civil rights movement, eugenics ideology and intelligence testing again came into play, serving an important role in the debate over school integration by allowing detractors to hearken back to principles of genetic difference and scientific racism. The same ideology continues to impact modern discourse — the 1994 book “The Bell Curve” rejuvenated these beliefs and remains a source of debate.

Testing as a cultural product


For Terman, Brigham and their colleagues, standardized testing provided a means with which to justify arbitrary racial hierarchies under the guise of objectivity. Their racist and classist values were reflected in their tests, which measured the skills that they determined to be indicative of true intelligence. For education professor Guillermo Solano-Flores, this aspect of testing is central, but often overlooked — it is a cultural product.

Solano-Flores explained that tests mirror developers’ cultural experiences. They measure what developers deem reflective of intelligence and what they decide students need to know.

University of Illinois education scholar Clarence Karier has highlighted how Terman’s IQ test reflected his values. Karier wrote in 1972 that Terman “developed questions which were based on presumed progressive difficulty in performing tasks which he believed were necessary for achievement in ascending the hierarchical occupational structure. He then proceeded to find that according to the results of his tests the intelligence of different occupational classes fit his ascending hierarchy.”

“It was little wonder that IQ reflected social class bias,” Karier wrote. “It was, in fact, based on the social class order.”

The disproportionate performance between groups revealed by Terman’s test, according to Karier, was by design.

University of Washington education professor Wayne Au explains performance disparities by highlighting the inherent relationship between failure and testing. According to Au, tests emerged to try to affirm and justify existing social hierarchies — and upholding this system requires the failure of certain groups.

(Figure: A bar graph from Levine’s study shows the average percentage of points for authors by gender and race/ethnicity across 110 exams, revealing the slow inclusion from 1900-2018 of authors who are women and people of color in New York State Regents Exams.)

Assistant Professor of Education Sarah Levine, who has studied the impact of societal pressures on the development of standardized literature tests, evaluated a century of exams administered in New York’s public schools, and found that social and cultural thought was ingrained in the tests’ questions.

Levine cited “the generally shared belief among many white male and some female educators at the turn of the 20th century that the purpose of literature is to uplift and civilize” as an example of a value reflected in questions at the time. Later, during both world wars, she saw tests mirror the predominant belief that “literature is meant to be patriotic, to help instill love of country.”

Not until the 1970s were authors of color introduced into English education and testing. According to Levine, that was in large part due to the growing diversity of students. “It’s not until you follow the growth of who’s going to school [that] you will see the beginnings of shifts in the kinds of questions that the tests are asking,” she said. This change was especially impactful given the strong relationship between testing and curriculum.

W.E.B. DuBois
(Photo: United States Library of Congress)

Though some educators embraced the supposed objectivity of standardized tests, the assessments’ inherent inequity did not go unnoticed in the early and mid-20th century. Their impact on marginalized groups, and their use by elite institutions and the state, were apparent and unsurprising to some scholars at the time.

In 1940, sociologist and civil rights activist W.E.B. DuBois (pictured) described psychological tests as “quickly adjusted so as to put Black folk absolutely beyond the possibility of civilization.”

Since the inception of standardized tests, scholars, educators and developers have attempted to rectify testing inequities and disparities in performance. Identifying bias in testing is a central component of establishing fair tests, according to Solano-Flores, and requires looking beyond the experiences students from different backgrounds may share in school.

“Most of the learning that might be relevant to correctly answering an item comes from personal experience that is outside of school,” Solano-Flores said. “And if you have more opportunities to have those experiences in life because of your socioeconomic status, because of the income of your family, then you’re going to do better on those items.”

For Solano-Flores, one important and necessary reform to test-making is in the form of cultural bias reviews, which involve assessing the performance of distinct groups on testing items to identify disparities that may be due to different cultural norms. This process, however, requires time, patience and resources.

“When someone tells me that they need to develop a test in one year, I say, ‘There’s no way,’” he said. “I think that the field of educational measurement needs to somehow enrich their practice to give more time for test developers to work with their items, to address issues of language and culture and review and try them out with pilot students from multiple cultural groups so that they can find aspects of the items that may be an issue.”

Identifying cultural biases is a fundamental component of modern test development, according to education professor Edward Haertel, who specializes in the field of educational testing and assessment.

“Today there’s a great deal of care that goes into trying to make sure that individual items are not biased against one group or another, or that if there are subtle biases, those are balanced out in the course of the overall measurement,” he said.

Diversity, meritocracy and affirmative action


Despite efforts to reduce bias in testing, disparities in performance persist. In 2013, The New York Times found performance disparities on the National Assessment of Educational Progress between racial and socioeconomic groups. And a 2017 study by the Brookings Institution found that Black and Latinx students have disproportionately lower scores on the math section of the SAT in comparison to white and Asian students.

Standardized tests have also dramatically expanded in scope and number since the era of Terman and Brigham. Students now face a barrage of standardized tests, ranging from state-level exams to the SAT, ACT, AP and IB tests, all before they graduate from high school.

On a federal level, various presidential administrations have promoted testing. President Bush’s signing in 2002 of the No Child Left Behind Act placed additional emphasis on testing in public schools; districts whose students performed poorly faced losses in valuable federal funding. Arne Duncan, President Obama’s secretary of education, further endorsed the use of standardized tests to measure performance, a tactic many argue led to school closures and teacher firings that disproportionately impacted students of color.

Levine herself is a former Chicago public school teacher and has witnessed standardized testing operate up close. She explained that the importance of standardized testing in public schools differs by socioeconomic status and race.

“Schools in high-poverty neighborhoods and schools with lots of kids of color do more test prep than do wealthy white schools because the tests are often more important, and having good test scores for the school is often more important for that school,” Levine said.

Levine also criticized what she identified as a trend where tests are increasingly geared toward the identification of a “right” answer.

“[Students are] doing test prep that suggests that when you read there’s one answer to questions about literature, there's one meaning and you have to identify it — and even more than identify it is that you’ve got to figure out what a set of test-makers thinks you should say,” Levine said. “And I think, oftentimes, what that really means is, ‘What do the white people who made this test think I should say?’ It’s just another way that kids of color have to be playing a particular kind of game through white, middle-class norms.”

For Haertel, the way tests are used is perhaps their greatest flaw. While we often interpret “lower scores in some schools as evidence that those schools are not doing a good job,” Haertel said, such criticism may be misdirected: “Schools may be educating student populations that just have greater educational challenges. We end up blaming the school for things that may be happening in the community.”

Beyond the influence of socioeconomic and environmental differences on disparities in performance — wealthier students have access to test prep services and live in districts with greater resources, or may attend private schools — Walton’s research has led him to identify an additional source of inequality: negative stereotype threat. According to Walton, groups that are negatively stereotyped face a psychological barrier that leads them to perform worse on tests, so their scores do not reflect their true intellectual ability.

Walton cited a study which found that asking students to answer demographic questions, such as identifying their gender or race, right before the AP Calculus test exacerbates inequality in performance, especially in terms of gender disparities. He said that when test providers ask students to recall these aspects of their identity before the test, they are “calling forth to people’s minds, if they aren’t already thinking about it, that their gender and their race might be relevant to the test.” Generally speaking, minority students may fear that poor performance will give justification to stereotypes, making them anxious and leading them to perform below their true potential.

The notion of socially situated intelligence — that every individual approaches a test from a distinct social situation and with a unique collection of experiences, and some face disproportionate psychological and socioeconomic barriers — has given rise to Walton’s conception of “affirmative meritocracy.”

In Walton’s view, “affirmative action has run into this dilemma where people seem to think that it’s pitting diversity and meritocracy against each other.” In reality, standardized tests are systematically underestimating the true ability of disadvantaged groups, according to Walton. To make decisions that are truly based on merit, Walton contends, we must take group identities into account.

This year, the University of California (UC) Academic Senate authored a report which concluded that the UC’s use of standardized tests in admissions does not decrease diversity. The Senate’s task force found that because UC schools consider test scores in the context of each individual, the admissions process ultimately accounts for performance disparities between groups. “The distributions of test scores among applicants are very different by group,” the task force wrote, “but the distributions of test scores among admitted students are also very different by group, and in almost exactly the identical way.”

Elite institutions are increasingly committing to a contextual evaluation of applicants’ test scores and emphasizing a “holistic” admissions system that looks beyond scores. Amid the COVID-19 pandemic, standardized tests have become largely obsolete, with many schools — including Stanford — electing to go test-optional for the upcoming admissions cycle.

It remains an open question as to who will benefit most from this change. The New York Times columnist Frank Bruni wrote earlier this month, “on one hand, affluent students who are coached for these exams and usually take them repeatedly won’t get to flaunt their high scores,” but “on the other hand, less privileged students from high schools whose academic rigor is a question mark in screeners’ minds won’t have impressive scores to prove their mettle.”

Despite the complexities of COVID-19, testing remains a pillar of American “meritocracy.” And regardless of the role of contextual admissions, the very testing system that was designed to sterilize and seclude, to divide and diminish — but which some nevertheless argue to be wholly objective — in many ways continues to rob disadvantaged students of necessary resources and keep them from the halls of the nation’s elite institutions.

On average, students of color receive lower scores on standardized college admissions tests and are therefore less likely to gain admission and merit-based scholarships, according to The National Center for Fair and Open Testing. Almost 18% of Stanford’s Class of 2023 are legacy students or relatives of donors. Approximately 75% of Harvard’s white students who are athletes, legacy students, on the dean’s interest list or children of faculty or staff would have been rejected without this status. The 2019 college admissions scandal, in which wealthy parents paid thousands for fabricated test scores, highlighted the extent to which money can buy admission, especially when it comes to standardized testing.

“There are various ways in which testing can easily reinforce existing patterns of societal differences,” Haertel said. “We need to be very careful about that.”

The question, then, is how best to reform a system that is rooted in racist and classist values and, in many ways, continues to reinforce these divisions — or whether this system needs to be dismantled entirely.

“I think that tests like the SAT do a lot more work to re-enfranchise the social inequalities that exist in our society where they advantage students and perpetuate that advantage into the future,” Walton said.

In an attempt to gain access to the very campus where Lewis Terman promoted eugenics through intelligence testing, laying the foundation for the modern testing system, Stanford applicants race to increase their scores. Some face socioeconomic barriers to test preparation. Others reap the benefits of private instruction. Most will remember that number for the rest of their lives — and many will view it as having played a central role in their ultimate admission or rejection.

Contact Georgia Rosenberg at georgiar 'at' stanford.edu.