INTRODUCTION

Since 1982, women have surpassed men in American college enrollment and graduation every year and are rapidly achieving gender parity in many traditionally male-dominated academic fields (Halpern et al., 2007). Since 1986, at the graduate school level, female enrollment has grown 2% each year, compared with 1% for men. In 2008, 61% of graduate students in the U.S. were women. Women outnumber men in all major fields of graduate education, except math, computer sciences, engineering, physical sciences and business (Snyder and Dillow, 2009).

However, despite these successes, women still score lower than men on the math section of the high-stakes standardized tests used for admission to college and graduate school, including the SAT and the Graduate Record Examination (Halpern et al., 2007). Although women generally receive higher grades than men in high school and college, women underperform men in math and the physical sciences when tests are not closely related to material that has been previously taught (Willingham and Cole, 1997; Halpern et al., 2007). The mean difference between men and women on the math portion of the SAT (SAT-M) has remained virtually unchanged for the past 35 years, with men outscoring women by an average of 38 points (College Board, 2009). In 1972, the mean SAT-M score for women was 38 points lower than that for men; by 2009, this difference was 35 points. Among the top scorers (700 points or higher out of a possible 800) on the 2009 SAT-M, men outnumbered women almost 2 to 1. This math-gender gap may be one reason behind the persistent sex segregation at the doctoral level in graduate education. Although the share of earned doctorates awarded to women rose from 14% in 1971 to nearly 50% in 2006, in the fields of math, engineering and economics, women made up only 21, 12 and 30% of doctoral degree recipients, respectively, in 2006 (England, 2010).

In a meta-analysis of 100 studies published between 1963 and 1987, Hyde et al. (1990a) observed a complex pattern regarding gender differences in math performance. At the elementary and middle school levels, girls are superior to boys in computation and equal to boys in understanding mathematical concepts. Gender differences favoring boys emerge in high school on problem-solving tasks, and the differences persist on the SAT-M. The magnitude of gender differences in math performance grows larger with more selective samples: while gender-math differences were moderate for samples of college students (d = 0.33), the differences were much larger (d = 0.54) for samples of students from highly selective colleges and graduate students.
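The d values above can be read as standardized mean differences. The following is a minimal sketch, assuming d is computed as the difference between two group means divided by the pooled within-group standard deviation; the function name and all numbers in the example are hypothetical and are not taken from Hyde et al. (1990a).

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference between group A and group B.

    Assumes the pooled within-group standard deviation is used as the
    denominator. A positive value means group A scored higher.
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical illustration: two groups of 50 with equal spread (SD = 100)
# whose means differ by 33 points.
example_d = cohens_d(mean_a=533, mean_b=500, sd_a=100, sd_b=100, n_a=50, n_b=50)
print(round(example_d, 2))
```

Under this reading, a d of 0.33 means the two group means lie about one third of a standard deviation apart, and a d of 0.54 slightly more than half.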

Among mathematically gifted children, boys have consistently outscored girls in math. Data collected from 1972 through 1991 in the Study of Mathematically Precocious Youth (SMPY) showed that among the intellectually talented 12- and 13-year-old American youths who scored 700 or more on the SAT-M, there were 13 boys for every 1 girl (Lubinski and Benbow, 1992). Since then, this gender-math gap has narrowed considerably, but there is still a 4:1 boy-girl ratio among those scoring 700 or more on the SAT-M (Halpern et al., 2007).

While some scholars have suggested that biological factors underlie the gender gap in math (e.g., Benbow, 1988; Halpern et al., 2007), most researchers have argued that the gender gap is a function of socio-cultural factors (e.g., Hyde et al., 1990b; Keller, 2001; Stage and Maple, 1996; Tiedemann, 2000).

Stereotype threat and group differences on standardized tests: Steele and Aronson (1995) introduced an intriguing phenomenon, stereotype threat, to explain racial differences in standardized test scores. In four laboratory experiments with Stanford University undergraduates, Steele and Aronson (1995) showed that "making African American participants vulnerable to judgment by negative stereotypes about their group's intellectual ability depressed their standardized test performance relative to White participants, while conditions designed to alleviate this threat, improved their performance, equating the two groups once their differences in SATs were controlled". To explain this phenomenon, Steele and Aronson (1995) posited that a well-known negative stereotype can become a self-induced threat to the members of the stereotyped group, depressing their performance on tasks that are the target of this specific stereotype. In their experiments, Steele and Aronson (1995) found that the stereotype threat was more evident when the threat was made salient by instructions telling participants that the test measured their cognitive ability. Pointing to the fact that the African-American participants in their study were strong students who identified with the material on the test, Steele and Aronson (1995) speculated that stereotype threat may have a particularly negative effect on the academically more able members of the stereotyped groups. Because they identify with the domain in question, strong students may be more anxious than weak students not to confirm the stereotype, and such fear may lead them to "try hard with impaired efficiency," resulting in low test scores (Steele and Aronson, 1995). Based on these findings, they suggested that stereotype threat may offer at least a partial explanation for the persistent gap in standardized test scores between black and white students.

Spencer et al. (1999) were the first to study the effects of a math-gender stereotype threat on women's math performance in a laboratory setting. Using female students from elite universities, they found that women scored significantly lower than equally qualified men on a difficult math test when they were told that there were gender differences on the test, but performance differences "could be eliminated" when the test givers "lowered stereotype threat by describing the test as not producing gender differences" (Spencer et al., 1999). Because this experiment demonstrated that the math-gender stereotype threat can dramatically impair the test performance of high-math-ability women, Spencer et al. suggested that stereotype threat may underlie the consistent gender differences in advanced math performance. Inasmuch as their study (and many subsequent laboratory studies) was conducted at elite American colleges and universities with participating students who were good at math, stereotype threat seems to offer a plausible explanation for the gender differences among high-math-ability students. To further test the nature and strength of the gender-math stereotype threat, several laboratory studies used procedures that explicitly rejected the negative math-gender stereotype and found significant improvement in female participants' performance on difficult math tests (McIntyre et al., 2003; Pronin et al., 2004; Walton and Cohen, 2003).

Carr and Steele (2009) used two classic psychological problem-solving tasks, the Luchins water-jar task and the Wisconsin Card Sorting Test (Berg, 1948), to show that stereotype threat engenders inflexibility. Under threat conditions, their participants exhibited significantly more maladaptive perseverance than did participants under reduced-threat conditions.

Gender and mathematics in China: In China, the belief that women are weaker than men in math has a long history. Despite a continuous official "women holding up half of the sky" campaign since the 1950s and a consistent government effort promoting equal education for women and men, most Chinese, both men and women, still see math and science as a male domain (Broaded and Liu, 1996). Today few top mathematicians, engineers and natural scientists in China are women. In 2004, among all university faculty members and scientists who served as doctoral-program advisors, only 9% were women (Educational Statistics Yearbook of China). In academic senior high schools, boys outnumber girls in the science track, while in the humanities track, 80% of the students are girls. This gender gap is surprising given that there are no gender differences in the mean math score on the Chinese College Entrance Examination, an equivalent of the SAT (Tsui, 2007).

Testing the stereotype threat hypothesis in China: Venator (2008) conducted a study in China to examine the cross-cultural generalizability of stereotype threat involving gender and math among Chinese students. Because this study was published in China, in Chinese, the description that follows is somewhat more detailed than typical accounts of previously published research.

In a 2 (gender) × 3 (threat) complete factorial design, the participants worked on a difficult math test (our dependent variable). Details concerning the math test can be found in the method section of Experiment 1 presented below. The first page of each test booklet included instructions and questions. Embedded in the instructions was one of three statements concerning gender norms for the math test (boys better than girls, no gender difference, girls better than boys); this statement was our threat manipulation. Our participants (196 men and 84 women) were biology, physics and computer science majors from three universities in Wuhan, China.
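The structure of this design can be made concrete by enumerating its cells. The sketch below is illustrative only: the variable and function names are hypothetical, and it says nothing about how participants were actually distributed across the threat conditions, which is not reported in this passage.

```python
from itertools import product

# Factor levels as described in the text; the labels themselves are
# illustrative and not taken from the original test booklets.
GENDERS = ("men", "women")
THREAT_STATEMENTS = (
    "boys better than girls",
    "no gender difference",
    "girls better than boys",
)


def design_cells():
    """Return every (gender, threat statement) cell of the 2 x 3 design."""
    return list(product(GENDERS, THREAT_STATEMENTS))


for gender, statement in design_cells():
    print(f"{gender}: {statement}")
```

A complete factorial design crosses every level of one factor with every level of the other, so the six cells listed here are all the conditions under which the math score can be compared.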

The experimenter told the students they would be taking a math test with questions taken from the GRE math subject test, which is taken by students who apply for admission to study mathematics at the graduate level. They were also told that the test would be part of their term evaluation and that their scores would be compared with those of students from other universities. The experimenter then read aloud the gender norms that were printed on the first page of the test booklet. After completing the test, the students were told that the test they had just taken was part of a research project and that the statements about gender were not true. They were also told that they would receive course credit for their participation, but that their individual scores would not be counted toward their term grades and...