The Predictive Power of Guangzhou Zhongkao English Word-Frequency Data for Zhongkao English Exams Nationwide: A Mini-Study

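I. Background and Research Question
From 100 Guangzhou Zhongkao (senior high school entrance exam) English papers and first-round mock exam papers, the following were compiled:
1. A word list.
2. Each word's range, i.e. the number of papers in which it appears. For example, "the" has a range of 100, meaning it appears in all 100 papers.
3. Each word's frequency (word frequency). For example, "the" has a frequency of 15,448.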
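The two statistics above can be illustrated with a minimal sketch. The article does not say which tool produced its counts, so the tokenizer, the function name `range_and_frequency`, and the two sample "papers" below are all assumptions made for illustration. The sketch also counts raw word forms, whereas the article's later tables group derived forms under a base form.

```python
import re
from collections import Counter


def range_and_frequency(papers):
    """Return (range_counts, freq_counts) for a list of paper texts.

    range_counts[w]: number of papers in which w appears at least once.
    freq_counts[w]:  total number of occurrences of w across all papers.
    """
    range_counts = Counter()
    freq_counts = Counter()
    for text in papers:
        # Assumed tokenizer: lowercase runs of ASCII letters.
        tokens = re.findall(r"[a-z]+", text.lower())
        freq_counts.update(tokens)        # every occurrence counts
        range_counts.update(set(tokens))  # each paper counts once per word
    return range_counts, freq_counts


# Hypothetical two-paper corpus, not data from the study.
papers = [
    "The club is open. The bus is late.",
    "We took the bus to the museum.",
]
rng, freq = range_and_frequency(papers)
print(rng["the"], freq["the"])    # range 2, frequency 4
print(rng["club"], freq["club"])  # range 1, frequency 1
```

Dividing a range count by the number of papers gives the percentage form used in the rest of the article.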
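The same statistics (word list, range distribution and frequency) were compiled from a further 200 Zhongkao English papers from across the country.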
Research question: to what extent can word frequencies in the Guangzhou corpus predict word frequencies in the national corpus?
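II. Method
Words with a frequency of at least 5% in the national corpus were selected, 1,522 in total. (The thresholds are given as percentages, so they presumably refer to range, i.e. the share of papers in which a word appears.) To remove the influence of very high-frequency words (such as the, in, we, two), the 94 words at or above 95% in the national corpus were excluded, leaving 1,428 words for the analysis.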
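The selection step can be sketched as a simple band filter. This assumes the thresholds apply to range expressed as a percentage of papers, which the article does not state explicitly; the function name `select_words` and the sample values are hypothetical.

```python
def select_words(range_pct, low=5.0, high=95.0):
    """Keep words whose range percentage is >= low and < high.

    Mirrors the described procedure: keep words at or above 5%,
    then drop those at or above 95%.
    """
    return {w: r for w, r in range_pct.items() if low <= r < high}


# Hypothetical national range percentages, for illustration only.
national_range_pct = {"the": 100.0, "club": 51.0, "until": 60.0, "zebra": 1.5}
selected = select_words(national_range_pct)
print(sorted(selected))  # ['club', 'until']
```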
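III. Findings
1. Correlation
Across all 1,428 words, the correlation between the two corpora is 0.95 for range and 0.95 for frequency, a strong correlation. In other words, word-frequency data from the Guangzhou Zhongkao predicts word frequency in Zhongkao papers nationwide well.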
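The article does not say which correlation coefficient was used. Assuming it is Pearson's r, a minimal standard-library sketch looks like this; the function name `pearson` and the paired values are made up for illustration and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical range percentages for the same words in each corpus.
guangzhou = [10.0, 20.0, 30.0]
national = [20.0, 40.0, 60.0]
r = pearson(guangzhou, national)
print(round(r, 2))  # 1.0 for this perfectly linear toy data
```

In the study, each list would hold one value per word (1,428 entries), once for range and once for frequency.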
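2. Differences
2.1 Words with similar range
Taking the difference in range (national minus Guangzhou) shows that 794 words fall within ±5 percentage points, or 55.6% of the 1,428 words. That is, more than half of the words have a range in Guangzhou close to their national range.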
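The difference calculation can be sketched as a three-way classification. The figures for club (51% vs 18%) and frighten (5% vs 33%) are the ones the article reports; the third pair is hypothetical. Whether a gap of exactly 5 points counts as "similar" is not stated in the article, so treating the band as inclusive is an assumption, as is the function name `classify`.

```python
def classify(national_pct, guangzhou_pct, band=5.0):
    """Classify a word by its range difference (national minus Guangzhou)."""
    diff = national_pct - guangzhou_pct
    if abs(diff) <= band:  # assumed inclusive boundary
        return "similar"
    return "higher nationally" if diff > 0 else "higher in Guangzhou"


print(classify(51, 18))  # club: higher nationally
print(classify(5, 33))   # frighten: higher in Guangzhou
print(classify(40, 42))  # hypothetical word: similar

# Share of the 1,428 analysed words that fall in the "similar" band.
print(round(794 / 1428 * 100, 1))  # 55.6
```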
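2.2 Words that appear more often in Zhongkao papers nationwide
For 239 words the national range exceeds the Guangzhou range by more than 5 percentage points; the average gap is 10.6 points. The largest gap is for club, which appears in 51% of national papers but only 18% of Guangzhou papers, a difference of 33 points. Other words with large gaps: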
| Base form | Derived forms (count) | National range (200 papers) | Guangzhou range (100 papers) | Difference |
|---|---|---|---|---|
| club | club (288); clubs (32) | 51% | 18% | 33% |
| museum | museum (164); museums (20) | 52% | 24% | 28% |
| weather | weather (199); weathers (4); weathered (1) | 55% | 29% | 26% |
| bus | bus (186); buses (8) | 52% | 27% | 25% |
| Sunday | sundays (8); sunday (82) | 34% | 11% | 23% |
| bank | bank (77); banks (3); banking (2) | 28% | 6% | 22% |
| tour | tourists (60); tour (59); tourist (19); tours (14); tourism (17); touring (2) | 42% | 20% | 22% |
| basketball | basketball (109) | 37% | 15% | 22% |
| bike | bike (150); bikes (14); biking (17); bikers (1); biked (1) | 45% | 24% | 21% |
| train | train (144); trainer (10); trained (36); training (101); trains (16); trainers (4); untrained (1); trainings (1) | 62% | 41% | 21% |
| library | library (176); libraries (17); librarian (6); librarians (2) | 50% | 29% | 21% |
| mate | classmates (126); classmate (35); schoolmates (5); roommate (6); workmates (3); roommates (3); workmate (1); deskmate (4); mated (1); mate (3); teammates (11); mates (2); schoolmate (1); deskmates (2) | 56% | 36% | 20% |
| swim | swimming (116); swim (63); swam (15); swimmer (9); swims (4); swimmers (3) | 45% | 25% | 20% |
| photo | photos (162); photo (78) | 49% | 29% | 20% |
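2.3 Words that appear more often in the Guangzhou Zhongkao
The same difference calculation as in 2.1 identifies the words tested more often in Guangzhou: for 395 words the Guangzhou range exceeds the national range by more than 5 percentage points, with an average gap of 10.4 points. The largest gap is for frighten, which appears in 33% of Guangzhou papers but only 5% of national papers, a difference of 29 points. (The ranges shown are rounded, so a reported difference can be one point off the difference of the displayed percentages; presumably the differences were computed before rounding.) Other words with large gaps: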
| Base form | Derived forms (count) | National range (200 papers) | Guangzhou range (100 papers) | Difference |
|---|---|---|---|---|
| frighten | frightened (8); frightening (3) | 5% | 33% | -29% |
| die | death (21); died (29); dead (18); die (37); deadly (3); dying (7); deaths (4); dies (6) | 38% | 64% | -26% |
| company | company (86); companies (39); companion (2); accompanying (2) | 28% | 53% | -26% |
| require | required (17); require (17); requires (28); requiring (2); requirement (1); requirements (5) | 26% | 51% | -26% |
| except | except (34); exception (1) | 16% | 41% | -26% |
| reduce | reduced (4); reduce (51); reduces (5); reduction (3); reducing (10) | 20% | 45% | -25% |
| search | search (36); searched (4); searching (15); searches (1) | 22% | 46% | -24% |
| effect | effect (27); effects (28); effective (12); effectively (4); effectiveness (2) | 19% | 42% | -24% |
| press | pressure (27); press (6); pressed (6); impressive (3); impress (2); presses (1); depression (1); impresses (2); impressed (3); impressions (1); impression (2); pressures (1); pressing (3) | 18% | 41% | -23% |
| type | types (32); type (33); typed (2) | 19% | 41% | -23% |
| adult | adult (29); adulthood (3); adults (39) | 21% | 43% | -22% |
| state | statements (10); states (22); state (20); stating (1); statement (1) | 19% | 41% | -22% |
| whether | whether (106) | 37% | 58% | -22% |
| event | events (50); event (50) | 36% | 57% | -22% |
| born | born (105) | 38% | 58% | -21% |
| appear | disappearing (9); disappears (5); appeared (47); appearance (13); appears (6); appearing (2); appear (28); disappeared (21); disappear (24); disappearances (2); appearances (1); disappearance (1) | 47% | 67% | -20% |
| cross | across (109); cross (27); crossed (6); crossing (10); crossings (6) | 42% | 62% | -20% |
| regular | regular (16); regularly (7); regulations (1) | 8% | 28% | -20% |
| until | until (163); till (23) | 60% | 79% | -20% |
| immediate | immediately (17); immediate (3) | 10% | 29% | -20% |
