Points on SAT-9 Performance and Proposition 227
Kenji Hakuta
August 22, 2000
Recent media coverage, in particular an article in the New York Times (August 20, 2000), has occasioned a cascade of media calls to my office. Here are a set of points I have tried to make to the reporters. These points are based on the analysis I have conducted of SAT-9 scores with Evelyn Orr, Yuko Goto Butler and Michele Bousquet (click here), as well as on my own experiences as a researcher in the education of language minority students.
1. Any given school district's pattern of performance by LEP students should be considered in light of statewide patterns of performance by LEP students and by native English speakers. Our analysis shows that there have been statewide increases in SAT-9 scores for both LEP students and native English speakers, following patterns that are virtually identical -- large increases in the early grades, and then tapering off in the fourth grade and beyond. This is not a Proposition 227 effect, but something much more specific to the SAT-9.
2. The increases could have a number of possible causes. Advocates of reforms such as Proposition 227, class size reduction, and increased school accountability would certainly like to credit their own favored cause, but other explanations must also be considered. For example, schools and districts took the SAT-9 much more seriously this past year, and taught to the test. Younger children's scores are probably more likely to benefit from increased attention by teachers and school officials to the importance of the test. Also, districts seem to vary considerably in whom they counted as LEP or as non-LEP, and in the percentage of LEP students they tested. Of course, the results for a school or district's LEP students depend a great deal on who counts as LEP and which LEP students were tested. Each claim of "success" for LEP students needs to be scrutinized. It is certainly premature to claim any sort of victory for Proposition 227.
3. The SAT-9 is a poor excuse for a measure of English development and academic achievement for LEP students. The test was developed to give normative data in reading and math for native English speakers. It measures things that are qualitatively different from what would be expected of students learning English. Consider an analogy. Imagine that you had just finished a first set of golf lessons at a driving range, and then you were taken out to a golf course, asked to play a full 18 holes, and kept score. Unless you were a prodigy, your score would be virtually meaningless, measuring luck much more than ability. The golf score is very meaningful for those who have played for a while (Tiger Woods), but not for beginners (being one, I can testify that I never keep score -- or rather, I keep score in a different way, as the percentage of solid contacts I make per swing). Given that the SAT-9 is a weak measure of English for LEP students, we can only expect it to give us very gross information. It is certainly not refined enough to tell us about differences between program labels, such as bilingual vs. English immersion. (Would I really be able to tell the difference between the effectiveness of different golf instructional approaches based on the golf scores of beginners?)
4. The data from 1998 to 2000 show rises in all districts, pretty much following statewide patterns. There are increases in school districts that retained bilingual education, in school districts that had English immersion even before Proposition 227 (and therefore were not affected by the policy), and in Oceanside, which the press has recognized for having switched faithfully from bilingual to English-only instruction. Because the SAT-9 is a bad measure for LEP students (golf scores again), the scores for schools and districts contain a lot of random noise, but they did rise in a rough way. That is, all the scores are rising, but the margins of error are so large that it is not possible to distinguish between different types of language programs.
5. Why did Oceanside's LEP students show such big gains from 1998 to 2000? Partly, one has to wonder how the district managed to score so low in 1998 -- the average LEP 2nd grader was at the 12th percentile (compared to the 19th percentile for LEP students statewide), and the average LEP 3rd grader was at the 9th percentile (compared to the 14th percentile statewide). So Oceanside started out among the lowest in a group of students who score low to begin with. One of the laws of statistics is that the lower the beginning score, the more it can be expected to rise upon retesting. Also, an important perspective is that one can pretty easily find schools that report having well-run bilingual education programs and that show gains just as dramatic as Oceanside's. A picture is worth a thousand words. Click here to see how Oceanside stacks up against some bilingual schools. (click here for pdf format picture)
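The statistical point in 5 -- that a group selected for a very low first score will tend to rise on retesting even when nothing real has changed -- is regression toward the mean, and a small simulation makes it concrete. All numbers below are invented for illustration only; they are not Oceanside's or any district's actual data:

```python
import random

random.seed(0)

# Each simulated student has a stable "true" ability plus test-day noise.
# The means and spreads here are hypothetical, chosen only to show the effect.
n = 10_000
true_ability = [random.gauss(50, 10) for _ in range(n)]
year1 = [a + random.gauss(0, 10) for a in true_ability]  # noisy first test
year2 = [a + random.gauss(0, 10) for a in true_ability]  # noisy retest

# Select the students who scored lowest in year 1 (like a district that
# started near the bottom) and look at the same students' year-2 average.
lowest = sorted(range(n), key=lambda i: year1[i])[: n // 10]
mean_y1 = sum(year1[i] for i in lowest) / len(lowest)
mean_y2 = sum(year2[i] for i in lowest) / len(lowest)

print(f"bottom decile, year 1: {mean_y1:.1f}")
print(f"same students, year 2: {mean_y2:.1f}")
```

The bottom decile's year-2 average comes out well above its year-1 average even though no student's true ability changed at all: part of what made those first scores so low was bad luck, and luck does not repeat. A district that starts at the very bottom can therefore show a sizable "gain" on retesting with no instructional change whatsoever.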