High-stakes testing has been a controversial issue in the United States since education became politicized by the Reagan administration in the eighties. As time has progressed, government regulations have continued to force schools to conduct standardized testing as a method for increasing standards-based accountability. However, these tests have been shown to be ineffective at increasing student achievement, have cost U.S. taxpayers tens of billions of dollars per year, have decreased the economic and educational opportunities of people from low socioeconomic backgrounds, have narrowed school curriculum and classroom pedagogy, and have been responsible for decreasing motivation in students. Resistance has begun to develop among parents, teachers, and students who have been continually frustrated by these high-stakes tests, and many researchers have begun to examine alternatives for resolving the testing-for-accountability problem. In order to better understand this issue, it is important to examine the historical accounts and social consequences of high-stakes testing, and then explore alternatives to high-stakes testing that increase school accountability, provide teachers with physical evidence of student learning outcomes, and increase the opportunities that teachers have to provide individualized instruction to students.
History of High-Stakes Testing
Standardized examinations were used to make high-stakes educational and employment-based decisions in China for over a thousand years. These assessments may have had positive effects on official decision-making procedures due to the restrictive nature of the testing process, but they also had many negative social consequences that remain a major concern in modern high-stakes testing practices. In order to better understand these consequences, it is necessary to explore the history of the Civil Service Examination (科举 kējǔ) used in China between the years 606 and 1905 AD, and its observational link to studies of consequential validity. Messick (1995) defines consequential constructs of validity as “evidence and rationale for evaluating the intended and unintended consequences of score interpretation and use in both the short and long term.” This is an important concept to understand, since the primary argument against high-stakes testing and the subsequent recommendations for high-stakes test replacement in this paper will be based on the elimination of consequential constructs of validity.
The Civil Service Exams consisted of three sets of tests that looked at each participant’s knowledge of the nine Confucian classic texts, their poetry writing skills, their ability to write official documents, and their understanding of national policy issues (Suen & Yu, 2006). These exams worked in a tier system that required the participant to pass the local exam before being accepted to take the provincial exam, and the provincial exam before taking the palace exam. The palace exam was considered to be the most difficult and took nine days to complete, but high scores on this test guaranteed the participant a high-ranking official appointment, high social status, and financial security. This promise of social status and riches led many people to spend their whole lives studying writing, poetry, national policy, and Confucian texts for the sole purpose of passing the test. Consequently, the high-stakes nature of the Civil Service Exams caused many issues involving cheating, rote learning, memorization of model essays, schools teaching to the test, and the narrowing of curriculum to meet the requirements of the test (Suen & Yu, 2006). During each dynasty, there were changes in the writing style of the test that were meant to force participants to show their knowledge of Confucian texts. However, the same problems mentioned above continued to occur regardless of the writing style, since the test had such an important outcome for the financial security of the participants’ families. Moreover, participants who could not pass the exam throughout their entire lives contributed nothing to society, and due to this stress many people suffered from psychological problems that often led to suicide (Suen & Yu, 2006).
The same problems persist in modern high-stakes testing arenas, which shows that the consequences of high-stakes testing are not only difficult to solve, but are also a social burden that restricts people economically and educationally and wastes resources that could be better spent.
United States Movement towards High Stakes Testing
The increasing dependence on high-stakes testing as a source of school accountability has been a controversial issue in the United States education system ever since test scores related to achievement in mathematics and science began to decline in the sixties (Hillocks, 2002). This decline in achievement, along with the failure to beat the Soviet Union to space, led US education policy to adopt minimum competency assessments in the seventies (Amrein & Berliner, 2002). During this time, Florida began requiring students to pass these competency assessments in order to graduate, but ceased this practice after high rates of minority students and students from low socioeconomic backgrounds failed to graduate as a consequence of testing (Amrein & Berliner, 2002). However, the problem of student underachievement and complaints from universities and businesses over poor writing skills continued to place education in the political spotlight (Hillocks, 2002). This led to the publication of A Nation at Risk by the National Commission on Excellence in Education in 1983, which proclaimed that American educational achievement had deteriorated to a point that “…commerce, industry, science, and technological innovation is being overtaken by competitors throughout the world.” A Nation at Risk supported its claims by stating that 23 million Americans were functionally illiterate, only 20% of 17-year-olds could write persuasive essays, only 33% could complete basic math problems, and achievement test scores were lower than those of other industrialized countries (Hillocks, 2002). This report led to the development of rigorous curriculum-based standards that were enforced through high-stakes standardized testing that would reward or penalize schools based on performance outcomes (Amrein & Berliner, 2002).
This reliance on high-stakes testing continued to expand during George W. Bush’s presidential campaign, which politicized accountability in education and led to the passing of the No Child Left Behind Act (NCLB) in 2001. The No Child Left Behind Act mandated that schools administer standardized tests in reading and math to all students in grades three through eight. If the scores of these tests stayed stagnant or did not increase by around 5 percent, federal funding would be redirected to charter schools as punishment (Hillocks, 2002). Moreover, in many states this punishment system extended to teachers and administrators through withheld salary increases, denial of tenure, and, in extreme cases, dismissal (Amrein & Berliner, 2002). The resulting effects of NCLB have been that schools have to spend a large portion of their budget on testing, there have been no substantial gains in student achievement, student motivation has decreased, students are unprepared for college and the job market, teachers have had to spend a large portion of instruction time teaching test-taking skills, and school administrations have resorted to manipulating test data through IEP and GED track loopholes (Baines & Stanley, 2005). Furthermore, Kohn points out that these tests do not give valid information about students and educational quality, other than that the results are influenced by poverty rates and the location of the community where the school is based (2000). Currently, there is still much political debate over the development of school standards and the role of high-stakes testing in education. Common Core State Standards were the most influential standards nationwide between 2014 and 2016, and hundreds of thousands of students have already opted out of high-stakes assessments as a form of protest (Singer, 2016).
Recent reports in the Washington Post claim that the 2016 Common Core State Test administered in New York public schools was unreasonably long, had poorly chosen content, and had questions that were overly confusing and above grade level (Strauss, 2016).
Social Consequences of High Stakes Testing
Restricting the Disadvantaged
High-stakes testing has been found historically to have consequences related to cheating, rote learning, teaching to the test, the narrowing of curriculum, decreased motivation in students, increased suicidal ideation, and increased financial costs for schools. However, the correlation between high-stakes testing and the economic and educational restriction that it imposes on low-income and ethnic minority populations should also be a major concern for educators and policy makers. Baker & Johnston point out that students who come from low-SES backgrounds face lower parental expectations for academic achievement, have lower self-esteem, have limited access to quality schools, and receive little financial, academic, technological, and emotional support (2010). Furthermore, Baker & Johnston explain that low-SES students have low reading scores in kindergarten, and that family structure, maternal attributes, the skill level of peers, and rural or non-rural location have a major impact on educational achievement (2010). This background on the causes of educational inequality is important to consider, because these factors directly affect how well students score on standardized assessments. Kohn explains that since standardized tests are norm-referenced tests, many questions that minority students do well on might be discarded from the assessment (2000). He states that low-SES students have less access to test preparation material and less access to quality education, which is continually defunded because of low standardized test scores; consequently, teachers are driven away from their profession and students fail to graduate (2000). This cycle of punishment for low assessment scores has had highly damaging effects on the education system, and continues to restrict disadvantaged students from having equal access to education and a fair chance of succeeding in the job market.
Financial Burden on Schools
High-stakes testing not only has social consequences for low-SES students, but also has a major impact on the allocation of financial support and resources to academic programs. Baines & Stanley explain that the United States spends between $20 billion and $50 billion annually on tests that have little impact on the learning outcomes of students, leaving only about 50 percent of the budget devoted to regular instruction (2005). Moreover, the expenses of testing do not include the cost of remediation for students, which would require an additional $25 billion, and in the next few years, many schools might have to spend up to an additional 30 percent of their available funds to meet the new accountability requirements of NCLB (Baines & Stanley, 2005). By comparison, the U.S. Department of Education is allocating only $200 billion in mandatory and discretionary funds for education in 2016, which includes grants, loans, and work-study assistance for post-secondary education. The allocation of capital to fund standardized assessments that have been shown to be ineffective at improving student academic achievement, and that actively damage the function of the educational system, should be reconsidered in the near future. Baines and Stanley point out that $50 billion could be better spent improving the infrastructure of public schools, paying for an additional one million teachers, providing every student with a laptop with internet access, or providing three meals a day to every child in North America (2005).
Narrowing School Curriculum (Teaching to the Test)
School standards and curriculum are at the center of any education system. They are the guidelines that define what knowledge and skills students are required to learn throughout the duration of an academic program. Unfortunately, high-stakes testing has caused a decrease in the diversity of the curriculum of many schools within the United States. Wayne Au conducted a qualitative metasynthesis of 49 studies that examined the relation between high-stakes testing and curriculum. Au found that an overwhelming majority of these studies reported a significant narrowing of curriculum toward the subjects covered by the high-stakes tests (2007). The study found that the majority of the curriculum being removed consisted of non-tested subjects, particularly social studies and language arts courses (2007). In terms of formal control, Au reported that the majority of studies saw an increase in the fragmentation of knowledge being taught in the classroom (2007). In terms of pedagogical control, Au found that the majority of studies observed an increase in teacher-centered instruction as a result of high-stakes testing. The results from this research show that high-stakes testing negatively affects the content and pedagogy associated with curriculum in schools. This trend is particularly worrying since most research suggests that a diverse curriculum that emphasizes student-centered learning and the integration of subjects has the greatest influence on student achievement.
Psychological Consequences of High Stakes Testing
Motivation, Anxiety, and Depression
Many proponents of high-stakes testing have claimed that examinations increase student motivation to achieve academically. This claim may be true when looking at motivation as it relates to formative assessments that are designed in regard to class content and student demographic information, since the test items can be written in ways that correspond to students’ background knowledge and goals. However, high-stakes testing may decrease student motivation since the test may lack relevance to the social environment of many students, and passing the test might be seen as an unachievable goal (Madaus & Clarke, 2001). Motivational issues that students experience may also have a connection to the stress and anxiety that are associated with high-stakes testing. Test anxiety in non-high-stakes testing has been found to affect 10 percent to 30 percent of the student population, with rates among African American students in elementary school being as high as 41 percent (Segool et al., 2013). However, Segool et al. found that even though all testing causes an increase in symptoms of anxiety among students, there were significantly greater levels of anxiety in students taking high-stakes assessments than in students taking classroom tests (2013). Test anxiety is known to increase the amount of stress hormones in students, and in many cases can lead to worry, rumination, and possible depression. Marzano and Marzano point out that negative thinking patterns should be a major concern for educators since fear, anger, worry, and rumination are strongly associated with negative emotional responses that cloud judgment, degrade mood, and increase the risk of depression (2015). Moreover, Wang found that when students perform worse than expected on high-stakes tests, there is an increase in the risk of suicide (2015). However, Wang also found that reducing the stakes and frequency of testing has the potential to decrease suicidal ideation in the at-risk student population (2015).
Developing Alternatives for Accountability and Evidence of Learning Outcomes
The Finland Model for Educational Reform
High-stakes testing was initially put into place as a way to increase accountability and the quality of education for all students in the U.S. However, the major consequences associated with high-stakes testing have caused many educators to look for alternatives that allow for improvements in the educational system. Many Western educational and political leaders have begun to examine Finland’s education system for answers, since it assigns less homework to students than most countries, devotes more time to creative play, and requires only one high-stakes test, taken by students who want to enter higher education, yet its students still score in the top three countries on international PISA assessments (Partanen, 2011). This leads many people to question how there can be accountability in the educational system if there are no high-stakes tests tracking student progress or education quality in schools, and how the Finland model can help guide school reforms in the U.S. First, every school in Finland, kindergarten through university, is free to attend and is financed entirely from government revenue (Partanen, 2011). Second, there are no standardized tests; instead, teachers design their own formative assessments and create individualized report cards at the end of each semester, and sample groups from various schools are chosen by the Minister of Education to take tests in order to track student progress (Partanen, 2011). Third, teachers and school administrators in Finland must complete a Master’s degree in a highly selective education program in order to work at schools, are given a decent salary, and are held in high esteem socially (Partanen, 2011).
Fourth, students in Finnish schools are not separated by ability level, additional teachers offer support in classrooms to assist students who have difficulties learning, there are after-school programs for tutoring, students are not ranked by ability, there are no non-academic activities such as athletics, and there is a wide variety of courses offered at schools (Hendrickson, 2012). Finally, one of the most important aspects of Finland’s high quality of education is that it strives for educational equality for all students, regardless of social, economic, and geographic background. This means that all students in Finland have access to free meals, health care, and counseling, and are guaranteed a safe environment for learning (Partanen, 2011). In the United States, on the other hand, there is a trend towards the privatization of education, schools are highly unequal due to school financing practices, there is a reliance on high-stakes testing for accountability, unqualified teachers who have not completed an education program are being given teaching positions within schools, and there are unequal opportunities for students based on their SES and geographic background. If the United States is going to succeed at producing well-informed citizens who are capable of filling skill-based job positions in the future, it will be extremely important to reform the education system in a way that strives to improve education, rather than just increasing standards through restrictive high-stakes tests.
Formative Assessments
Formative assessments are strategies that teachers can implement to gain information about student learning outcomes in order to monitor progress and adapt instruction. These strategies can be as simple as asking students questions, monitoring the class during activities, collecting exit tickets where students answer questions about what they learned in class, having students use white boards to answer questions, using response cards or clickers to assess students’ understanding of content, having students paraphrase what they learned in class, having groups discuss the main points of a lesson, playing games such as Jeopardy that use content knowledge learned in the class, having students complete graphic organizers, assigning bell work that reinforces content, or giving short quizzes that gauge student performance (Knight, 2013). In order to decide what strategies to implement in class, a teacher should first identify the knowledge, skills, and big ideas that students need to learn, make a class outline that addresses how students can learn them, identify proficiencies that address what to teach and assess, and choose which formative assessment strategies will provide the most accurate information regarding student learning outcomes (Knight, 2013). When students experience difficulties in achieving these proficiencies, the teacher can then modify instruction, increase individual teaching time, provide immediate feedback, meet with the student to discuss their work, and break down the content into smaller components in order to ensure that the students achieve the class objectives (Knight, 2013).
However, it is important that teachers prepare effective questions that check students’ understanding of the content, have students explain their answers during discussions so that other students can learn from their point of view, give positive feedback and reinforce learner-friendly behavior, and pay attention to non-verbal cues that provide hints about students’ understanding of the content (Knight, 2013).
Performance Assessments
Performance assessments are standards-based tasks that encompass many skills, relate to real-life situations, allow for student input in choosing the task, require the use of core content knowledge, have a scoring system, and have measurement validity (Burke, 2005). Performance tasks can be either restricted, meaning that they are structured to fit a single objective, or extended, meaning that they involve multiple objectives (Burke, 2005). These performance tasks can be completed over an extended period of time and can include multiple smaller tasks that connect to the main task but are graded separately (Burke, 2005). These tasks can involve problem-solving skills or psychomotor skills, with or without products, and can include but are not limited to: video interviews, art projects, situation-based projects, giving speeches, performing a scene from a book, communicating in a foreign language, writing lab reports, typing professional emails, applying for a job, solving a difficult problem, using lab equipment, writing a computer program, repairing an engine, and growing a garden (Burke, 2005). Performance tasks can be individualized to fit the background of students, but should relate to the knowledge, skills, and big ideas of the class objective and state proficiency standards. The task should also have clear instructions and specific criteria that can be used when scoring the assessment (Burke, 2005). A rubric that lists the requirements and how they will be scored should be completed before a task is assigned to a student. Furthermore, it is important that the student is actively involved in developing the task and its criteria, and that the teacher helps guide the student to set goals that are achievable, appropriate to the objective, and challenging (Burke, 2005).
Once a performance assessment is complete, the teacher should give the student immediate feedback and offer support for any observed gaps in knowledge or skills that the student has not mastered.
Portfolios
A portfolio is a collection of work or assignments that shows a student’s progress or achievement in a given subject area. Portfolios can be used to document a student’s progress towards meeting an academic standard, can provide evidence of achievement, can show a student’s understanding of content knowledge requirements, and can be used outside of school for finding jobs or applying to universities (Burke, 2005). The purposes of using portfolios in class can be to collect evidence of student achievement, to help assess student understanding of content knowledge, to increase student motivation for completing tasks, to better assess a student’s strengths and weaknesses, to create a sense of purpose for completing a task, and to help a student set and achieve goals that relate to the learning requirements of a class. Portfolios can also help students to achieve flow because the students will have clear goals, will have the resources to achieve the goals, will pay attention to the task, and will experience short-term success on the way to reaching a larger goal (Burke, 2005). Portfolios are also very simple to design, and the artifacts that are used to build the portfolio can consist of the homework assignments, classwork, and performance assessments that were completed throughout the school year. Furthermore, portfolios can help teachers to show evidence of student achievement as it relates to local, state, or national educational standards.
Conclusion
High-stakes testing is a major constraint in the United States education system that has fueled inequality and increased social and financial barriers for families from low socioeconomic backgrounds. It has cost the United States education system hundreds of billions of dollars over the past decade, and has done little to improve the overall achievement levels of students. On the contrary, high-stakes testing has been responsible for the defunding of low-income schools, has decreased the opportunities of low-SES students, has increased teacher-centered pedagogy, and has narrowed school curriculum to subjects that correspond to the test. Furthermore, high-stakes testing has been responsible for decreasing motivation in students and for increasing negative, anxiety-based emotions that have led many students to develop symptoms of depression or, in extreme cases, to attempt suicide. However, there are many alternatives to high-stakes testing that are more adaptive to modern classrooms, increase accountability, provide physical evidence of achievement, and allow for better informed individualized instruction and remediation. In the future, policy makers and educational leaders should take the steps necessary to eliminate the dependence on high-stakes testing by working to create accountability through evidence based on artifacts, formative assessments, and random group testing. Furthermore, policy makers and educational leaders should strive to improve the quality of our education system by hiring highly qualified teachers who have a degree in education, fully and equally funding all schools, eliminating the ranking of students, and providing free meals, counseling, and health care to all students in every school.
References
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing & student learning. Education Policy Analysis Archives, 10(18). doi:10.14507/epaa.v10n18.2002
Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36(5), 258-267. doi:10.3102/0013189X07306523
Baines, L. A., & Stanley, G. K. (2005). High-stakes hustle: Public schools and the new billion dollar accountability. The Educational Forum, 69(1), 8-15. doi:10.1080/00131720408984660
Baker, M., & Johnston, P. (2010). The impact of socioeconomic status on high stakes testing reexamined. Journal of Instructional Psychology, 37(3), 193-199. Retrieved from http://eric.ed.gov/?id=EJ952120
Burke, K. (2005). How to assess authentic learning (4th ed.). Thousand Oaks, CA: Corwin Press.
Hendrickson, K. (2012). Assessment in Finland: A scholarly reflection on one country’s use of formative, summative, and evaluative practices. Mid-Western Educational Researcher, 25(1/2), 33-43. Retrieved April 13, 2016, from http://www.mwera.org/MWER/volumes/v25/issue1-2/v25n1-2-Hendrickson-graduate-student-section.pdf
Hillocks, G. (2002). The testing trap: How state writing assessments control learning. New York: Teachers College Press.
Knight, J. (2013). High-impact instruction: A framework for great teaching. London, UK: Sage Publications.
Kohn, A. (2000). The case against standardized testing: Raising the scores, ruining the schools. Portsmouth, NH: Heinemann.
Madaus, G. F., & Clarke, M. (2001). The adverse impact of high stakes testing on minority students: Evidence from 100 years of test data. In G. Orfield & M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation.
Marzano, R. J., & Marzano, J. S. (2015). Managing the inner world of teaching: emotions, interpretations, and actions. Bloomington, IN: Marzano Research.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8.
Partanen, A. (2011, December 29). What Americans keep ignoring about Finland’s school success. The Atlantic. Retrieved April 11, 2016, from http://www.theatlantic.com/national/archive/2011/12/what-americans-keep-ignoring-about-finlands-school-success/250564/
Segool, N. K., Carlson, J. S., Goforth, A. N., Embse, N. V., & Barterian, J. A. (2013). Heightened test anxiety among young children: Elementary school students’ anxious responses to high-stakes testing. Psychology in the Schools, 50(5), 489-499. doi:10.1002/pits.21689
Singer, A. (2016, April 07). Thousands refuse Common Core testing, calls for national opt-out and Washington march. Huffington Post. Retrieved April 09, 2016, from http://www.huffingtonpost.com/alan-singer/thousands-refuse-common-c_b_9631956.html
Strauss, V. (2016, April 12). Teacher: What third-graders are being asked to do on 2016 Common Core test. The Washington Post. Retrieved April 13, 2016, from https://www.washingtonpost.com/news/answer-sheet/wp/2016/04/12/teacher-what-third-graders-are-being-asked-to-do-on-2016-common-core-test/
Suen, H. K., & Yu, L. (2006). Chronic consequences of high-stakes testing? Lessons from the Chinese civil service exam. Comparative Education Review, 50(1), 46-65. Retrieved April 03, 2016.
Wang, L. C. (2015). The effect of high-stakes testing on suicidal ideation of teenagers with reference-dependent preferences. Journal of Population Economics, 29(2), 345-364. doi:10.1007/s00148-015-0575-7