We Tested AI on Real SAT Questions, and the Results Were a Bit Surprising

四季读书网 2026-06-18 08:55:25

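In the diagnostic mock exams students take before enrolling, we have run into this situation: a student finishes the test, finds that the official answer in our system differs from the one an AI tool gave, and then

rushes to an advisor: "Did your system record the answer wrong? The AI says this one should be C!"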

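Earlier, a student at a US high school had an AI grade an entire mock test. The AI reported a total score of 1500, and the student concluded that their foundation was solid and that systematic tutoring was unnecessary. Our standardized mock exam system, which follows the official scoring rules strictly, put the score at only 1400.

The parents were skeptical and did not act on it until the student's actual SAT score came back at around 1420. That was close to our mock result; the AI's higher score had been inflated and misleading. Having seen the gaps in AI grading and problem solving, the parents enrolled the student in our intensive course, and the student went on to score 1560 on the official exam.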

On SAT questions, which are tightly worded and full of traps, AI can easily be "confidently wrong."

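So PDA invited Astraea and Hellin, two teachers from our SAT faculty, to co-write this article. In it we use real SAT questions to put AI's answers to the test, show where AI falls short, and talk with parents about how to use AI tools sensibly during test prep.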

AI Slips Up Live: A Close Look at Real Questions

An SAT mock exam is a key step in finding out where a student actually stands.

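Recently we have heard from parents and students that, after a mock exam, students often check their answers with ChatGPT or other AI tools, and the comparison leaves them confused.

The AI's answers do not match the official answers in our mock exam system.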

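Many parents contact us anxiously, suspecting that answers in our question bank were entered incorrectly. Some worry that a wrong answer key would misjudge their child's real level and undermine the rest of their preparation.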

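Let us state the conclusion up front: in the vast majority of cases, the error is not in a professional mock exam question bank but in the seemingly all-capable AI.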

Today we break down real questions from the March 2024 SAT Asia-Pacific test:

Case 1 (logical completion question)

The Canadian Longitudinal Study on Aging (CLSA-ÉLCV) is a longitudinal study surveying approximately 1,000 individuals in Canada to glean extended trends in aging. To carry out the necessary sampling of the population over many years, CLSA-ÉLCV needs extensive financial support, but this method provides valuable insights into causal relationships. However, when questions of causation are irrelevant, as with a fitness study seeking only to reveal the percentage of regular exercisers in a city who do weight training, longitudinal methods are unnecessary, and so _____.

Which choice most logically completes the text?

A) the success of the fitness study likely requires significantly less financial outlay than that needed for CLSA-ÉLCV.

B) the expense of CLSA-ÉLCV is likely greater than the cost of longitudinal studies of fitness.

C) longitudinal methods are suitable for studies of aging but ought to be avoided for those of fitness.

D) conclusions drawn from CLSA-ÉLCV are likely to be more authoritative than those from the fitness study.

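AI's answer: C

AI's reasoning:

The first half of the passage affirms the value of longitudinal methods in the study of aging; the second half states explicitly that such methods are "unnecessary" in the fitness study.

Putting the two halves side by side, the most logical conclusion is that longitudinal methods suit studies of aging but "ought to be avoided" for studies of fitness. Choose C; the logic is flawless!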

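Correct answer: A

The PDA teachers' reasoning:

This question neatly exposes two AI weaknesses: a fuzzy sense of the logical boundaries of English words, and a habit of filling in claims the text never makes. On an SAT inference question, every part of the correct answer must be supported by the text.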

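Here is the passage's actual chain of cause and effect:

The drawback of a longitudinal study is that it needs extensive financial support.
The fitness study is not investigating causation, so longitudinal methods are unnecessary for it.
Conclusion (and so...): because the fitness study does not need the costly, time-consuming longitudinal method, it will naturally cost less. The financial outlay the fitness study requires is therefore likely to be significantly less than that needed for the aging study. Choice A completes the chain.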

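Choice C is wrong because it over-generalizes. The passage says that longitudinal methods are unnecessary when a study does not concern causation, and the fitness study appears only as an example. Choice C turns this into the blanket claim that studies of fitness should not use longitudinal methods, which is not what the passage says.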

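For example: a study of how income level affects the type of exercise that people who work out choose is still about fitness, but it does concern causation, so longitudinal methods could well be used.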

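☀ Teacher's mini-lesson:

One design pattern for SAT inference answer choices relies on a chain of relationships in the passage, A to B to C to D: when A changes, B, C, and D change with it. That is what this question tests. So when a passage introduces many terms, deliberately sort out how those terms relate to one another.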

The AI could not tell where "unnecessary" ends and "forbidden" or "to be avoided" begins, and walked straight into the SAT's classic over-inference trap.

Case 2 (grammar question)

The relationship between genomes and epigenomes reveals how cells with identical DNA develop different _____ whereas the genome in each cell contains a complete DNA sequence, the epigenome consists of chemical compounds that determine which traits in the sequence will be expressed.

Which choice completes the text so that it conforms to the conventions of Standard English?

A) functions:

B) functions,

C) functions and,

D) functions

AI's answer: B