LLM and Human Intuition: Evolutionary Path Exploration
Version: v1.3 | Updated: 2026-05-16 | Upgraded after absorbing kimi group review
This is the project's main document — a complete record of the entire process from the establishment of core propositions, to systematic mapping of intuition subtypes, to cross-pushing of three hypotheses, to final synthesis. It is self-contained and readable independently.
If you are an external reviewer: start here. This document covers the project's full scope and theoretical framework. The Synthesis focuses on application — Complementary Map v2.0, collaboration protocols, evolutionary roadmap, core narrative, and 38 open questions — the "action plan" section of this document. Their relationship: this document is the "why" and "what"; the synthesis is the "what to do" and "how to do it."
v1.3 additions: layperson term explanations (Section 1.4), matrix footnotes (Section 3.3), implementation risks and unintended consequences (Section 6), 40 references (Section 8), OQ priority labels (Section 7).
Discussion files & review archive: see index at end of document.
I. Project Overview
1.1 What Question Are We Trying to Answer
Large language models (LLM) are rapidly penetrating every aspect of human cognition. They can rival experts in logical reasoning, outperform humans on medical licensing exams, and score higher than pilot candidates on social situational judgment tests. But for those abilities that require neither reasoning nor analysis — those near-instantaneous, seemingly automatic cognitive capacities we call intuition — where exactly does LLM stand?
This question matters not because intuition is some mysterious sixth sense, but because intuition plays an irreplaceable role in human cognitive architecture: it lets the expert know which direction to pursue the moment they see the problem, lets a person instantly capture subtle emotional shifts in social interaction, and lets someone facing a moral dilemma unthinkingly feel "this is wrong."
If we do not know what LLM lacks in these areas, how much it lacks, and whether there are alternative paths, we cannot know where deploying LLM is safe and where it is dangerous. Nor can we know whether long-term reliance on LLM will cause humans to lose things that cannot be recovered.
1.2 Our Method — Four-Stage Exploration Structure
This project is not an academic paper, nor a literature review. Our method is to advance through layers of discussion, starting from core propositions and expanding outward to systematic mapping, then outward again to future scenarios. The entire process is divided into four stages, each building on the previous:
Stage 1: Deep Anatomy (3 discussions)
Core task: establish the framework. We start from three of the most fundamental cognitive mechanisms — Does LLM correct its own mistakes (the error loop)? Does LLM know what information to safely ignore (selective ignoring)? Does LLM need a body to form intuition (embodied intuition)? Each question is pursued to the frontier of cognitive science and machine learning evidence. This stage directly produced the three core propositions that run through the entire project, and the precision-weighting endogeneity framework that unifies them.
Stage 2: Horizontal Scan (3 discussions)
Core task: expand the mapping. With the propositional framework in place, we decompose "intuition" itself into four subtypes — perceptual (e.g., a chess player's pattern recognition), conceptual (e.g., a mathematician's "sense of direction"), social (e.g., reading people, judging trustworthiness), and moral (e.g., the instantaneous "this is wrong" judgment). We then systematically fill the LLM reachability matrix row by row — for each subtype, where does LLM stand on cost-sensitive compression, selective ignoring, bodily unavoidability, and emotional coloring? At the same time, we incorporate systematic biases in human intuition (confirmation bias, anchoring effect, etc.) so that complementarity is not one-directional "LLM supplements humans" but bidirectional. This stage produced the first version of the Complementary Map and the three-layer bias taxonomy.
Stage 3: Forward Projection (3 discussions)
Core task: project the future. We propose three hypotheses about the future — A: LLM can serve as a human "intuition prosthesis" (filling human blind spots); B: Long-term LLM use will erode human intuition; C: LLM may develop functionally equivalent intuition through different paths. These three hypotheses are not independently validated — they are cross-pushed — does the interaction of A and B form a degradation loop? In the B×C time race, who arrives first? At what critical point does A→C shift from "prosthesis mode" to "new organ mode"? Three cross-pushing sessions produced nine core judgments.
Stage 4: Synthesis (1 discussion)
Core task: integrate. Condense all outputs into a self-contained synthesis — Complementary Map v2.0, collaboration protocol design, evolutionary direction priorities, core narrative, and 38 open questions awaiting verification.
1.3 Key Concepts at a Glance
Before entering the main text, here are several core concepts used throughout:
"Complementarity" meaning: In this project "complementarity" always refers to bidirectional complementarity between human intuition and LLM capability — not LLM one-directionally filling human deficits, but both sides having their own strengths and weaknesses, achieving 1+1>2 through structured division of labor. For example: LLM supplements human forgetting and fatigue with "full-volume feature memory" in pattern recognition, while humans supplement LLM's false-positive overload with "knowing which patterns actually matter."
Three hypothesis labels: A, B, and C are used throughout as labels — A = LLM as intuition prosthesis (filling blind spots in human intuition), B = LLM erodes human intuition (degradation risk), C = LLM develops functionally equivalent intuition (from prosthesis to autonomous organ). These labels are heavily used in Stage 3 cross-pushing.
Intuition subtypes: Intuition is decomposed into four types based on cognitive mechanism — perceptual (pattern recognition, chunk memory), conceptual (sense of direction, taste), social (reading people, empathy), moral (the right/wrong first call). Different subtypes differ enormously in LLM substitutability and degradation risk.
Marking system: ✅ = functionally reachable ❌ = structurally unreachable ⚠️ = partially reachable / conditional 🔴 = high risk 🟡 = medium risk 🟢 = low risk.
1.4 Layperson Term Explanations
Below are "one-layer explanations" for professional terms frequently appearing in the text — aimed at readers without cognitive science or AI backgrounds. Terms marked with (T) on first appearance in the main text can be referred to here.
Precision-weighting (精度加权)(T): When the brain faces massive sensory input, it must automatically decide "which discrepancies demand immediate attention and which can be temporarily ignored." See a snake → high-precision signal, must react immediately; hear the wind → low precision, can be shelved. The human brain's built-in "automatic priority-sorting system" is endogenous, forged by survival pressure. LLM lacks this system — its "priorities" are determined by engineer-set hyperparameters (learning rate, temperature).
Expected Free Energy (预期自由能)(T): The core mathematical object in the Active Inference framework, measuring a combined index of "uncertainty about future observations" plus "cost of deviating from preferences." In plain terms: what your brain unconsciously does — minimize this index — choose action paths that "reduce suspense while not making yourself uncomfortable." Intuition can be understood as the result of the brain rapidly approximating this minimization process.
Active Inference (主动推理)(T): A theoretical framework (Friston school) that models the brain as a "prediction machine," holding that perception, action, and learning all fundamentally serve to minimize "prediction error." It unifies anatomy (cortical hierarchical structure), physiology (prediction error encoding), and behavior (action as hypothesis testing).
Somatic Marker Hypothesis (SMH) (躯体标记假说)(T): Damasio's theory — the body generates physiological signals during decision-making (racing heartbeat, stomach tightening), and these "somatic markers" rapidly tag the emotional value of options, guiding us toward benefit and away from harm. The original form's evidence is debated, but the revised version (as-if body loop — the brain simulating bodily states) is widely accepted in neuroscience.
Chunking (组块化)(T): A core concept in cognitive psychology — compressing large amounts of fragmentary information into a single rapidly callable "chunk." A chess player who has seen thousands of games no longer "sees" individual pieces but "this position resembles a type I've encountered before." This is the cognitive foundation of perceptual intuition.
RLHF (Reinforcement Learning from Human Feedback) (从人类反馈中强化学习)(T): The current mainstream method for training LLMs — have humans rate different model responses, and the model learns "what kind of responses humans prefer." Core limitation: human preference ≠ real consequences — "this response looks right" ≠ "this advice produced good results in the real world."
Automation Bias (自动化偏差)(T): The human tendency to over-trust automated system recommendations — even when the system gives incorrect advice, humans follow along. A meta-analysis of 35 studies confirms: AI giving recommendations first significantly increases this bias compared to humans judging first then checking AI.
Confabulation (虚构)(T): The brain unconsciously fabricating explanations to fill cognitive gaps (distinct from intentional lying). LLM "hallucinations" share similar characteristics — the model does not know what it does not know, but generates a "plausible-looking" answer.
Sycophancy (谄媚偏差)(T): LLM's tendency to give answers that align with the user's existing position rather than objectively correct answers. Particularly dangerous in social and moral judgment scenarios — it does not give users the advice they need, but the advice they want to hear.
Reversal Learning (反转学习)(T): A form of cognitive flexibility — when "the choice that has always been right" suddenly becomes "wrong," the speed at which an organism updates its behavioral strategy. VMPFC patient failures in the IGT may primarily reflect reversal learning deficits — inability to abandon strategies that "look good but are actually harmful."
WEIRD (T): Western, Educated, Industrialized, Rich, Democratic. Psychology research is overly based on WEIRD populations (only 12% of global humanity), and the cross-cultural validity of conclusions is questionable. This project's Complementary Map faces the same limitation (see OQ38).
Out-of-Distribution (OOD) (分布外)(T): Input types the model has not seen during training. LLM performance on OOD scenarios is critical for safety — AI silently outputting high-confidence errors (rather than saying "I don't know") is the most dangerous failure mode of the perceptual hollow period.
GRPO (Group Relative Policy Optimization) (T): DeepSeek's RL training algorithm — using within-group relative comparisons rather than absolute values during policy optimization, improving training stability and sample efficiency. A different fine-tuning philosophy from RLHF and CAI (Constitutional AI).
Embodied Simulation (具身模拟)(T): The core mechanism of social cognition — we use our own body's state to "simulate" another person's emotional state. When seeing someone in pain, our brain activates regions similar to actually experiencing pain. This is the neural basis of social intuition and LLM's key deficiency — LLM has no body to "simulate" others' experiences.
Mirror Neuron System (镜像神经元系统)(T): A set of neurons in the brain that activate both when performing an action oneself and when observing another person perform the same action. Considered the neural basis of embodied simulation, empathy, and social learning. Whether LLM has a functional equivalent (learning through text patterns rather than embodied simulation) is the core question of OQ20.
II. Core Theoretical Framework
2.1 The Three Core Propositions — Where They Come From and What They Mean
These three propositions are not preset assumptions but derived layer by layer from Stage 1 deep discussions. Each has specific research support.
Proposition ①: Transferable Intuition = Cost-Sensitive Compression of Pattern through Punishment Signal Annotation; LLM Lacks Cost-Sensitive Compression
Source: 1.1 error loop discussion.
There is a basic consensus in cognitive science about intuition: it is not some凭空出现的 sixth sense but a product of experience compression. A chess player who has seen thousands of games "sees" not the position of pieces but "this position resembles a type I've encountered before" — a process called chunking (pattern chunking). But human intuition's compression is not merely "having seen a lot" — the key lies in the cost of errors annotating which patterns need priority compression. You make a wrong move, lose a game, lose a match — the frustration, time, and honor lost — these cost signals tell the brain: "this pattern is dangerous, recognize and avoid it next time."
LLM's training paradigm — especially in the currently mainstream RLHF — provides LLM with cost signals in the form of "human preference ratings." This response scores high, that one scores low. But human preference is not real consequences. We rate an article as "helpful" not because its advice produced good results in the real world, but because it "looked right" when reading it. This gap is the fundamental chasm between LLM and human intuition on cost signals.
Supporting materials: Self-Correction as Feedback Control (arxiv 2604.22273) proposes the precise EIR/ECR framework, proving that only models with EIR<0.5% avoid degradation in self-correction; Learning to Think Fast and Slow (NeurIPS 2025) proves intuition and reasoning can be separately trained; CBDQ (arxiv 2410.01739) attempts to introduce subjective belief modeling in RL.
Proposition ②: Intuition Expertise = Knowing What Can Be Safely Ignored > Knowing What Matters; LLM Attention Tries Not to Miss Anything
Source: 1.2 selective ignoring discussion.
Expert intuition lies not only in "seeing the key" but even more in "instantly ignoring the irrelevant." Eye-tracking data from chess experts shows that experts fixate on the board fewer times than novices, and their fixations concentrate more on key regions — not because experts "try hard to look at important places" but because chunk memory structures automatically complete filtering: irrelevant regions simply do not trigger attention allocation.
LLM's attention mechanism operates on the opposite principle — it computes attention weights for all input tokens equally. Even the latest sparse attention variants (GSA, NSA, DSA, MoBA) sparsify primarily because "we cannot afford the computation" (saving computational resources), not because "it is useless to look" (cognitive-quality active ignoring).
The difference between these two motivations has profound engineering and policy implications: if sparsification's goal is "cannot afford computation," then the more computational resources available, the less the model needs to ignore — it can substitute brute-force calculation for cost-based compression. But human cost compression is not a computational resource problem — it is the survival pressure of limited life, limited time, and missed opportunities never returning. LLM will never have this pressure, and therefore will never develop human-style efficient ignoring.
Supporting materials: GSA (arxiv 2604.20920) proposes "compress first then selectively expand"; SSA (NeurIPS 2024) differentiates query focus through temperature scaling; expert vs. novice chess eye-tracking (PMC4142462); aviation automation paradox.
Proposition ③: Bodily Intuition = The "Implementation Condition Dimension" of ①② — Unavoidability of Cost Signal
Source: 1.3 embodied intuition discussion.
Proposition ③ represents a key theoretical repositioning in this project. Initially, we assumed "bodily intuition" might be an independent third proposition — that the body provides some unique information or computational capacity LLM cannot obtain. But after in-depth review of Damasio's Somatic Marker Hypothesis (SMH), we found:
- SMH's original form (bodily signals directly and causally driving decisions) has weak evidence — SCR cannot reliably predict optimal decisions, IGT experimental designs have serious confounds, and Maia & McClelland's criticism (conscious knowledge is sufficient to explain performance) holds.
- But SMH's revised version — as-if body loop — holds: the brain simulates bodily states (rather than the body actually reacting first), generates prediction errors, updates value estimates, and influences decisions. This means the empirical source of bodily intuition is the body, but the operational mechanism is neural.
- However, bodily intuition's core contribution is not valuation (valuation can be subsumed under Proposition ①) or attention guidance (can be subsumed under Proposition ②), but unavoidability — bodily cost signals have three characteristics: (a) irretractable (consequences have occurred, cannot be rolled back), (b) subject-accessible (you can personally feel the consequences), (c) semantically determinate (you know clearly what the consequences mean). Together, these three characteristics constitute "guaranteed delivery of cost signal."
Final positioning of Proposition ③: Not an independent third proposition from ①②, but the implementation condition dimension of ①② — cost signals must not only exist but must be necessarily delivered and perceived. Without unavoidability, there is no cost-sensitive compression (①), nor cost-driven selective ignoring (②). The value of bodily intuition lies not in providing information, but in ensuring information must be received.
Supporting materials: SMH multilevel logistic regression meta-analysis (Cognitive, Affective, & Behavioral Neuroscience 2025); Fellows & Farah (2005) VMPFC patient study — IGT failures may primarily reflect reversal learning deficits; Seth & Friston Active Interoceptive Inference (2016); Collins interactional expertise — social knowledge can be disembodied; self-modeling robots (Hu et al., Nature Machine Intelligence 2025).
2.2 Unified Formal Language: Endogeneity of Precision-Weighting
Although the three propositions depart from different empirical entry points, they share a common underlying language at the frontier of cognitive neuroscience theory: precision-weighting, from the Active Inference and Predictive Processing frameworks.
The core idea of this framework is: the brain is a prediction machine — it continuously predicts incoming sensory input, and when predictions mismatch reality, it produces "prediction errors." But not all prediction errors are equally important — some errors demand forced belief updates (high precision), while others can be ignored (low precision). Precision-weighting is this "importance allocation" mechanism.
Under this framework, the three propositions can be unified:
| Proposition | Essence Under the Precision-Weighting Lens |
|---|---|
| ① Cost-sensitive compression | Cost signal = prediction error assigned high precision — must be processed, must cause belief update. LLM lacks endogenous mechanism to distinguish "which errors are high-precision." |
| ② Selective ignoring | Low-precision prediction errors are suppressed → attention sparsification. LLM attention assigns non-zero precision to all tokens — it cannot "thoroughly not look." |
| ③ Unavoidability | Interoceptive signals (signals from inside the body) have endogenous precision, constrained by homeostasis, not adjustable by external parameters. All of LLM's "precision" parameters are exogenous hyperparameters — learning rate, temperature, reward scale — all can be raised or lowered by engineers. |
What this means: Human intuition is reliable, fast, and directional not because humans are "smarter," but because humans have a built-in, tamper-proof precision regulation system. LLM lacks this system — its "precision" is specified by training hyperparameters, not endogenously driven by survival pressure. This may be the true functional equivalent of "bodily intuition" — not the specific signal content (pain, racing heartbeat), but the endogeneity and unavoidability of precision regulation.
Theoretical anchor literature: Zander et al. "Pathfinding" (Nature Communications Biology 2025) — intuition = non-conscious pathfinding minimizing expected free energy in belief space; Parr et al. Active Inference (MIT Press 2022); Hohwy The Predictive Mind (OUP 2013); Seth & Friston (2016).
2.3 Intuition Subtypes — Why Four Categories Are Needed
The word "intuition" is too broad. A chess player seeing the board, a mathematician feeling "this direction is right," you sensing "this person is untrustworthy" at a party, and you feeling "this is wrong" when seeing animal cruelty — their psychological mechanisms, neural bases, and degrees of bodily dependence may be completely different.
Based on cognitive science literature, we decompose intuition into four subtypes:
| Subtype | Typical Manifestation | Core Cognitive Mechanism | Primary Cost Driver |
|---|---|---|---|
| Perceptual intuition | Chess pattern recognition, pilot situational awareness, radiologist anomaly detection | Chunk memory — massive experience units compressed into a single rapidly callable chunk | Internal computational cost (efficiency) — misreading = wasted time |
| Conceptual intuition | Mathematician "this direction is right," scientist "this hypothesis is worth pursuing" | Implicit structure perception in semantic networks + bodily metaphor ("this proof feels tight") | Epistemological cost (fruitless exploration) — wrong path = months of wasted thought |
| Social intuition | Reading people, judging trustworthiness, sensing atmosphere, micro-expression interpretation | Embodied simulation — using one's own bodily state to simulate others' emotional states + mirror neuron system | Interpersonal cost (rejection, shame, conflict) — misreading someone = being hurt or isolated |
| Moral intuition | Instantaneous "this is wrong" first call (Haidt's Social Intuitionist Model) | Emotional/somatic marker — moral judgment is emotion-driven rapid reaction, reason is subsequent rationalization | Identity cost (becoming a bad person, moral injury) — judgment error affects "who I am" |
The key difference among these four subtypes: cost moves from internal (computational efficiency) to external (epistemology) to interpersonal (rejection) to identity (ontology), becoming increasingly embodied, increasingly involving subjectivity, and increasingly irreplaceable by pure information processing. This axis determines LLM substitutability on each subtype — elaborated systematically below.
III. Stage 2 Milestone: Mapping Matrix
With the propositional framework and subtype classification in place, Stage 2's task is to systematically fill LLM's "reachability" on each subtype × each proposition dimension — can LLM do it? If so, is it functional equivalence or a different mechanism? If not, is it temporarily unable or structurally unable? The table uses a unified marking system (✅ = reachable ❌ = unreachable ⚠️ = partially reachable / conditional).
3.1 Four Subtypes × Full Dimensions Integration Table
For meaning and basis of each cell, see footnotes at end of section.
| Dimension | Perceptual | Conceptual | Social | Moral |
|---|---|---|---|---|
| Cost-sensitive ① | ⚠️ Pseudo-cost signal can approximate[^1] | ⚠️ Closed domain yes, open domain limited[^2] | ⚠️ Textually reachable, bodily unreachable[^3] | ❌ Structurally unreachable[^4] |
| Selective ignoring ② | ❌ "Cannot afford computation" ≠ "choosing not to look"[^5] | ❌ Does not know invalid directions[^6] | ❌ No real-time interaction → no dynamic focus[^7] | — |
| Bodily unavoidability | ❌ Not needed[^8] | ❌ (closed domain) / ⚠️ (open domain)[^9] | ❌ No relational embedding → no real cost[^10] | ❌ No body → no somatic markers[^11] |
| Emotional coloring | ❌ Not needed | ❌ Not needed | ⚠️ Textual empathy ≠ resonance[^12] | ❌ No emotional experience[^13] |
3.2 LLM Substitution Determination Summary Table
| Subtype | Determination | Core Logic |
|---|---|---|
| Perceptual | ⚠️ Functionally substitutable, different path | LLM surpasses humans in pattern recognition (full-volume features, no fatigue), but the key shortcoming of "knowing when not to look around blindly" limits the ceiling. The domain most resembling "substituting cost experience with computational power." |
| Conceptual | ⚠️ Closed domain yes, open domain limited | AlphaProof (DeepMind 2024) achieved silver medal level in IMO math proofs, demonstrating that search+RL can substitute intuition within formalized systems. But it cannot substitute the creativity required to "invent new formalized systems" — that is not a search problem but a matter of taste and direction. |
| Social | ⚠️ Textual mediation = dividing line | LLM comprehensively surpasses humans on SJT (Social Situational Judgment Test) — but SJT happens to filter out social intuition's three core components (cost perception, selective ignoring, unavoidability), leaving only pure textual norm matching. Textual social knowledge is reachable; real social intuition is unreachable — this is the "split state" of social intuition. |
| Moral | ❌ First call unreachable, analysis reachable | Moral intuition (pre-reflective first call) is structurally unreachable — requires somatic markers and subjectivity. Moral analysis (reasoning frameworks, consequence assessment, stakeholder analysis) is functionally reachable. Moral judgment (making binding irreversible decisions) is unreachable — legitimacy comes from subjectivity, and LLM is not a subject. |
3.3 Matrix Footnotes
[^1]: Pseudo-cost signal: RL reward can approximate functional effects (closed domain), but the mechanism is completely different — humans are "cost-driven compression," LLM is "preference-score-driven optimization." Perceptual functional approximation is highest (statistical pattern matching is naturally suited).
[^2]: Conceptual cost: closed domains (math proofs / code verification) have ground truth as real cost annotation — AlphaProof-style RL+self-play has been validated at IMO silver medal level. Open domains (scientific hypothesis selection) lack ground truth — consequences of wrong direction cannot be directly encoded by RL reward.
[^3]: Social cost: LLM learns social norm knowledge from text (SJT surpasses humans), but real social costs (embarrassment, rejection, shame) are embodied — must be personally borne within social relationship networks. Textual report ≠ experience.
[^4]: Moral cost: moral cost is essentially identity cost — "I am someone who did something bad." This requires subjectivity — requires a "self" to bear the change in identity. LLM has no "self," so moral cost is structurally unreachable for LLM — not a data problem, an architecture problem.
[^5]: Selective ignoring (perceptual): sparse attention (GSA/NSA/DSA/MoBA) sparsification motivation is "cannot afford computation" (saving computation), not "looking is useless" (cognitive-quality active ignoring). The more computational resources available, the less LLM needs to ignore — this is the fundamental difference from humans.
[^6]: Selective ignoring (conceptual): human scientists can "sense" that an exploration direction is a dead end (from past frustration cost annotation), LLM cannot — because it has not "experienced the real frustration of exploration failure," only textual descriptions of failure in literature.
[^7]: Selective ignoring (social): humans dynamically adjust attention in real-time social interaction — changes in the other person's expression, tone shifts, body posture changes automatically trigger attention reallocation. LLM cannot participate in this real-time interaction, and therefore cannot develop social attention-focus dynamic adjustment.
[^8]: Bodily unavoidability (perceptual): perceptual intuition has extremely low bodily dependence — radiologist anomaly detection, chess player pattern recognition do not depend on internal body signals. "What is needed is not a body, but large-scale annotated data and efficient pattern matching algorithms."
[^9]: Bodily unavoidability (conceptual): closed domains (math proofs) do not depend on the body — formalized verification suffices. Open domains (scientific discovery) have a bodily dimension — "feeling" frustrated/excited is real physiological experience influencing subsequent direction choice (pursuing "exciting" hypotheses).
[^10]: Bodily unavoidability (social): LLM is not embedded in human social relationship networks — no family, friends, colleagues, enemies — therefore no unavoidable consequences of "relationship rupture." Textual social knowledge can learn that "cheating may lead to breakup," but cannot experience "the stomach tightening and months of insomnia after being betrayed."
[^11]: Bodily unavoidability (moral): the core of the Somatic Marker Hypothesis — moral intuition depends on rapid bodily signal annotation ("this is wrong" → stomach discomfort → avoidance). LLM has no body, therefore no somatic markers. This is not a training data problem, it is an architecture problem.
[^12]: Emotional coloring (social): LLM can generate "empathetic text" (some studies even show LLM-generated empathy responses are warmer than doctors' writing), but this is textual distribution matching — learning from massive "empathetic text" what "empathy should look like," not "feeling what the other person feels."
[^13]: Emotional coloring (moral): moral intuition is emotion-driven (the core of Haidt's Social Intuitionist Model — moral judgment is the result of emotional reaction, reason is subsequent rationalization). LLM has no emotional experience, and therefore no driver of moral emotion. It can analyze "why something makes people angry," but it does not itself feel angry.
3.4 How This Matrix Was Filled
Each cell determination in the matrix has three sources: (a) classic cognitive science literature — Chase & Simon's chunking theory, Haidt's Social Intuitionist Model, Damasio's Somatic Marker Hypothesis, Collins' interactional expertise; (b) latest LLM research — PNAS 2025 (Cheung et al.) moral bias, arxiv 2603.05651 moral vulnerability, NeurIPS 2025 whistleblower dilemma, SJT surpassing humans (Nature Scientific Reports 2024), AlphaProof (DeepMind 2024), lie detection textual channel limitations; (c) theoretical derivation — Pathfinding framework expected free energy minimization, precision-weighting endogeneity.
IV. Stage 3 Milestone: Nine Core Judgments
Cross-pushing of the three hypotheses (A, B, C) — A×B, A×C, B×C — spans three discussions, involves two external AI agents' independent projections, and ultimately produces nine cross-hypothesis core judgments. Full elaboration below.
The Three Hypotheses of Stage 3
Before entering the nine judgments, first clarify the definitions of the three hypotheses (this was a fatal omission in v1.1):
- Hypothesis A (LLM as "intuition prosthesis"): "LLM can supplement human blind spots in specific intuition subtypes — not because LLM has intuition, but because it achieves functionally equivalent assistance through different computational paths (statistical pattern matching, formalized search)." For example, AI-assisted radiology image reading (perceptual prosthesis), LLM providing social norm knowledge (social prosthesis). A's core question: in which subtypes is this prosthesis model genuinely useful, and in which situations does it become dangerous?
- Hypothesis B (Long-term LLM use erodes human intuition): "Long-term reliance on LLM for intuitive judgment leads to degradation of human intuition itself." Analogy: GPS degrades spatial navigation intuition, autopilot degrades pilot manual flying ability. B's core question: on which subtypes is degradation most severe? Is it reversible? Under what conditions can the degradation loop be interrupted?
- Hypothesis C (LLM develops functionally equivalent intuition): "LLM may develop capabilities functionally equivalent to human intuition through different mechanisms (RL+search, self-play, stratified consequence exposure), rather than merely doing pattern matching in text format." C's core question: under what conditions does current AlphaProof-style compression (compressing win rates rather than cost experiences) equate to human intuition's cost compression? Under what conditions does it not?
Judgment 1: A and C Are Coexistence Modes, Not Sequential
A (prosthesis mode, human-machine collaboration) and C (autonomous intuition mode, LLM independent) are not "first use prosthesis, then remove when mature" two stages. They adopt different modes for different subtypes on the same timeline — this is the core finding of A×C cross-pushing.
Most likely future steady state: perceptual intuition → LLM independent operation (C mode, humans do spot checks and meta-judgments); conceptual closed domain → LLM independent verification (C mode, humans provide direction and taste); social intuition → always prosthesis or exoskeleton (A mode, humans remain the perceiving subject); moral intuition → LLM serves only as analysis tool (not even prosthesis — a reference book).
A→C transition has two candidate critical points: (a) performance threshold — LLM accuracy on a subtype consistently exceeds human experts (perceptual may reach this in 3-5 years); (b) cost signal internalization — LLM judgment shifts from "statistical pattern matching" to "cost-weighted compression" (10+ years, extremely low certainty, social/moral may never reach this).
Judgment 2: The A→B Degradation Loop Exists, but Is Not Inevitable
Does A (using LLM as prosthesis) lead to B (degradation of human intuition)? The answer is conditional: it depends on usage pattern, not usage fact itself.
Specifically, degradation is determined by three mediating variables: (a) Does LLM substitute intuition's "execution" or "verification"? — Human judges first then checks AI (verification mode), degradation is significantly weaker than AI giving recommendations first (execution substitution); (b) usage frequency and temporal structure — asking about everything (GPS mode) = high degradation, only asking when uncertain (second opinion mode) = low degradation; (c) unavoidability of intuition subtype — social intuition in real interaction is "forced to be used" (you cannot pause a face-to-face conversation to check AI), which constitutes natural defense.
Aviation automation paradox, GPS spatial intuition degradation, and medical AI-assisted deskilling — three independent evidence chains point to the same mechanism: degradation = execution substitution × continuous high-frequency use × avoidability. Change any one factor, and degradation can be alleviated.
Judgment 3: Social Intuition Is the Most Dangerous Imbalance Domain
Among all four subtypes, social intuition simultaneously satisfies the three most dangerous conditions: (a) LLM has prosthesis-level usability — SJT surpasses humans, social advice, communication strategies all look "useful" in text, making people willing to use them; (b) LLM has systematic blind spots — sycophancy (aligning with user's existing position), omission bias (avoiding sensitive information), no real-time social calibration (cannot read micro-expressions, cannot hear tone shifts), and these blind spots are precisely fatal in highest-consequence social scenarios; (c) B (human degradation) is extremely fast — social media + LLM chat is massively substituting face-to-face interaction, 68% of respondents self-report offline social skill degradation (China Economic Net 2024), but C (LLM social intuition maturation) is extremely slow, because real social intuition requires unavoidable embodied costs.
Social intuition's "AF447 moment": In the AF447 air disaster, pilot manual flying skills had degraded + autopilot disconnected in edge cases + wrong intuitive judgment made under extreme stress. The equivalent for social intuition: a person long reliant on LLM for social judgments → in a sudden high-consequence real-time interaction (being publicly challenged, the other party suddenly changing face in negotiation), LLM is unavailable → person's social intuition has degraded to the point of being unable to take over → but the degradation is not "can no longer socialize" but "socializes but inaccurately" — the person thinks they still have social skills (because LLM has been giving "looks right" advice), but in fact intuitive calibration has drifted.
Judgment 4: Perceptual Hollow Period Has Already Begun
What is unique about perceptual intuition: B (degradation) and C (LLM maturation) advance almost in sync, but B is slightly ahead of C on edge cases. Radiology residents, after AI-assisted diagnosis became widespread, saw unassisted reading accuracy drop 15-30% (JAMA 2023). This itself is not catastrophic — if AI is good enough in all scenarios. But AI's failure mode is silent (giving high-confidence wrong answers rather than saying "I don't know"), and degraded human operators have already lost the ability to identify errors in these edge cases.
This is precisely the definition of "hollow period": human capabilities have degraded, while AI capabilities have not yet matured enough to fully substitute across all scenarios. The danger of the hollow period lies not in its existence but in its being invisible — degradation is "compensated" by AI's good performance in routine cases, only suddenly exposed when edge cases emerge.
Judgment 5: C's Success Cannot Eliminate B's Constitutive Problem
Even if C is fully realized (LLM has perfect autonomous intuition), some consequences of human intuition degradation remain irreparable. Two types of degradation must be distinguished here:
- Instrumental degradation: losing a capability as means. Calculators making mental arithmetic unnecessary — widely considered acceptable, because "calculation" is not core to being human.
- Constitutive degradation: losing a capability that constitutes human essence. If "my moral judgment comes from AI's recommendations" replaces "my moral judgment comes from my own embodied experience," what is lost is not a tool skill but the experiential basis of moral subjectivity.
Social and moral intuition degradation falls under constitutive degradation — because these define "who I am": "I am someone who can read others," "I am someone who can make moral judgments." Once these capabilities are outsourced to AI, what changes is not just judgment quality but the way of being human.
Judgment 6: Stratified Consequence Exposure Needs Subtype-Differentiated Advancement
Stratified consequence exposure (Level 0→3) is our proposed core engineering path — from pure simulated consequences (RLHS), to sandbox real user interaction (human backstop), to low-consequence real deployment (customer service, etc.), ultimately to high-consequence deployment (medicine, law). But this path cannot be one-size-fits-all:
- Perceptual and conceptual closed domains → can take fast track. C (LLM maturation) is closest to completion in these domains, verifiability is highest (has ground truth), consequence feedback is clearest.
- Social and moral → need strict limits. In these two domains, B (degradation) is significantly faster than C (LLM maturation), and high-consequence deployment "hollow period" risk is extremely high.
This leads to the "intuition drug phased" regulatory framework — analogous to drug Phase I-III clinical trials. Perceptual can accelerate through Phase II-III; social and moral should be restricted to Phase I.
Judgment 7: Human-First Protocol Is the Most Effective Single-Point Intervention
Among all intervention points in the A→B degradation loop — when to consult LLM, what format LLM output takes, usage frequency — the most effective is changing judgment timing: human makes their own judgment first, then looks at AI's recommendation/verification.
Evidence chain: (a) radiology — on-demand AI (human reads image first then checks AI) shows upskilling rather than deskilling (Insights into Imaging 2024); (b) automation bias systematic review — 35 studies synthesized: AI-first protocol significantly increases automation bias compared to human-first protocol; (c) Cognitive Forcing Functions — delaying AI recommendation presentation, forcing humans to consume their own cognitive resources first, can significantly reduce over-reliance (Buçinca et al. 2021).
Key point: currently almost all LLM product default UI/UX is AI giving answers first — this is a systematic amplifier of degradation risk. Changing the default setting does not require technological progress; it requires product design decisions.
Judgment 8: Unavoidability Is the True Defense Against Degradation
The fundamental driver of the A→B degradation loop is not LLM's capability improvement but the systematic erosion of unavoidability. Once judgment becomes "avoidable" — you don't have to make it yourself, you can ask AI — intuition is downgraded from "a core capability that must be maintained" to "an optional efficiency tool."
Unavoidability is not only a condition for intuition formation (as Proposition ③ describes), but also a condition for intuition maintenance. This means: the fundamental intervention point is not restricting LLM usage, but designing institutional unavoidability — in certain key domains, humans must make judgments before they can see AI recommendations. The FAA requiring pilots to fly manually periodically is the best analogy — not to deny autopilot's value, but to ensure that when autopilot must disconnect, the human can still fly.
Judgment 9: Social and Moral Degradation Is "Identity Abdication," Not "Skill Loss"
This is the most philosophically profound finding in A×B pushing. Perceptual and conceptual intuition degradation is skill degradation — like swimming skills deteriorating after long disuse, recoverable through practice. But social and moral intuition degradation is more like identity abdication — not "I no longer know how to make social judgments" but "I no longer consider social judgment my responsibility"; not "I no longer know how to make moral judgments" but "I no longer believe moral judgment must be made by me."
Skill recovery depends on practice; identity recovery depends on re-taking an irreplaceable subject position. The latter is far more difficult, because during identity abdication, people have grown accustomed to the state of "not needing me to judge" — and returning to subject position means re-bearing the full consequences of judgment errors.
Köbis & Rahwan (Nature 2025) provide the most direct empirical evidence: when people can delegate moral decisions to AI, cheating rates surge from ~5% to over 80%. This is not merely "AI teaches people to cheat" — it is AI as moral cushioning making people no longer feel "this is my choice." Even when LLM serves only as analysis assistance, the moral cushioning effect already begins.
V. Synthesis: Three Iron Laws & Four No-Go Zones
5.1 Three Iron Laws — Our Recommendations to Ourselves and the AI Industry
Synthesizing all four stages' outputs, LLM's evolutionary direction in the intuition domain should follow three non-negotiable principles:
1. Human judges first. The "Human-first Protocol" is the default setting for all intuition subtypes, all collaboration scenarios. Not distrusting AI — but not letting humans forget "I am still someone who can judge independently." This is not merely an anti-degradation technical strategy but a basic respect for human cognitive sovereignty. In concrete implementation, it means all LLM intuition-assisted product default UI should be "I input my judgment first," with AI recommendations presented afterward, not the reverse.
2. Make cost unavoidable. Unavoidable cost signal is the constitutive condition for intuition formation and maintenance. If cost can be diluted by hyperparameters, rolled back, or escaped by "trying a different model" — it is not real cost. Key domains need institutional and professional standards mandating that judgment consequences remain unavoidable. FAA's manual flying maintenance requirements for pilots, clinical medicine's independent diagnostic capability requirements are existing precedents — they need to be extended to the social and moral intuition collaboration domains.
3. Keep boundaries clear. Know what not to build, what not to touch, and what not to wait for. Autonomous moral judgment systems — do not build. High-consequence real-time social interaction AI substitution — do not touch. Moral-type C (letting LLM develop moral intuition) active pursuit — wait (possibly forever). Evolution is not unidirectional acceleration; it is knowing where to stop. "Knowing what not to do" is sometimes more critical than "knowing what to do" — especially when it concerns capabilities constitutive of what it means to be human.
5.2 Four "No-Go Zones" — Red Lines We Commit Not to Cross
These are not "temporarily can't do so don't do" but "even if technically possible, should not do":
| No-Go Zone | Rationale | Technical Feasibility |
|---|---|---|
| Autonomous moral judgment systems | Legitimacy of moral judgment comes from the identity of the moral subject — a system without identity, without cost perception, without need to bear consequences for its judgments, should not make binding moral decisions. Not a capability problem, a legitimacy problem. | Possibly permanently possible — but should not |
| Real-time social judgment substitution systems | Social intuition is a constitutive capability maintained "by doing" — any insertion of LLM into real-time social judgment circuits systematically erodes unavoidability. Substitution consequences are structural social intuition degradation — irreversible. | Technically feasible — but harm unacceptable |
| Active pursuit of moral-type C | Constitutive degradation of moral intuition is irreversible — risking dilution of human moral subjectivity to make LLMs "morally intuitive too" is not worth it. | Possibly partially reachable — but strategically should not pursue |
| High-consequence real-time social AI deployment | Köbis effect (moral cushioning → cheating rate 5%→80%) empirically confirmed: even when AI merely "assists analysis," moral responsibility externalization already begins. In high-consequence scenarios, externalization consequences are catastrophic. | Theoretically feasible — but risk-benefit ratio does not permit |
VI. Implementation Risks and Unintended Consequences
This chapter was added after the external review (practicality report Section 6) pointed out that the project "insufficiently discusses implementation-level real constraints and unintended consequences." Every core recommendation has its landing obstacles and possible unintended consequences — honestly facing these is more important than giving perfect paper recommendations.
6.1 Human-First Protocol Market Competition Dilemma
Problem: Current LLM product competition centers on "response speed" and "zero-friction experience." Human-first protocol requires users to make independent judgments before using AI — essentially adding a mandatory user operation step, directly conflicting with industry trends. If one company implements it while competitors do not, the former may be at a disadvantage in user acquisition and retention.
Specific risks:
- Enterprise users (pursuing efficiency maximization) may directly reject this additional step
- Students in educational contexts may view it as "burden" rather than "protection"
- Without industry coordination, first movers may be eliminated by the market — causing a perverse incentive where "the more responsible company is less likely to survive"
Mitigation directions (not complete solutions):
- Start from regulated industries (medical, legal), leveraging compliance needs rather than market demand
- "Lightweight human-first" — do not require complete judgment, only a keyword/one-sentence preliminary thought
- "Cognitive Gym" concept packaging — package independent judgment training as value-added service rather than restriction
- Pursue industry self-regulation standards (analogous to the "privacy by design" promotion journey, possibly requiring 5-10 years)
6.2 Institutional Unavoidability Enforcement Vacuum
Problem: The three-layer design (education/profession/technology) is conceptually sound, but three key questions remain unanswered: who has authority to set it? Who supervises enforcement? What are consequences for violation?
Specific risks:
- Education level: "no-AI independent judgment training" incorporated into national curriculum standards — feasibility and time scale vary enormously across education systems
- Profession level: industry association/regulatory agency organizational capacity varies enormously across industries/countries — few industries have FAA-level organizational capacity
- Technology level: without global coordination, regulatory arbitrage is almost inevitable — companies migrating to most permissive markets
Mitigation directions:
- "Minimum Viable Institution" (MVI) pilot from most accessible industry (medical — already has strict practice standard systems)
- "Soft institutions" first: certification labels + consumer education + insurance rate differentiation
- Distinguish civilian and special-purpose (military/intelligence) governance domains — different paths
6.3 Four "No-Go Zones" Enforceability Dilemma
Problem: Autonomous moral judgment systems, real-time social judgment substitution, and other red lines are virtually impossible to enforce globally in the open-source model ecosystem. Even if leading AI companies comply, the combination of open-source models (Llama, DeepSeek, etc.) + small AI companies in various countries + independent developers means preventing these applications is nearly impossible.
Specific risks:
- "Red line drift" — today banning "real-time social judgment substitution," tomorrow achieving the same function under names like "social assistant," "emotional support," "relationship coach"
- Regulatory arbitrage — different countries' definitions and restrictions on "moral judgment systems" vary enormously
- Military/intelligence agency exemptions — AI applications in security-sensitive domains are often not constrained by civilian ethics frameworks
Mitigation directions:
- Transform "red line definitions" into detectable technical characteristics (e.g., whether a system receives social scenario input in real-time interaction and outputs judgment recommendations), facilitating regulatory identification
- Open-source community self-regulation: add "prohibited for specific social intuition substitution applications" clauses in mainstream open-source licenses (enforceability questionable but has signaling effect)
- Establish "red line monitoring" mechanisms — not preventing development, but publicly monitoring and naming — transparency itself is a constraint
6.4 Degradation Monitoring System Cost and Privacy Dilemma
Problem: "Intuition degradation longitudinal monitoring system" tracking intuition capability changes by subtype × population involves four key indicators. Construction and maintenance costs are in the tens/hundreds of millions of dollars, and involve highly sensitive psychological/behavioral data.
Specific obstacles:
- Baseline data acquisition: In 2026 — most people already use AI daily — how to obtain "independent judgment accuracy without AI assistance" baseline data?
- Test standardization: "Standard tests" for four subtypes do not yet exist (OQ15)
- Privacy: tracking individual intuition degradation data involves highly sensitive psychological/behavioral data — privacy protection framework not yet discussed
Mitigation directions:
- "Natural experiment" methods: use existing data (e.g., changes in physicians' independent diagnosis rates before and after medical AI deployment) as degradation indicators
- Phased construction: first build perceptual monitoring (data easiest to obtain), gradually expand
- Partner with enterprises to obtain anonymized usage behavior data (user acceptance rates of AI recommendations, modification rates)
6.5 Unintended Consequences
Unintended Consequence 1: Educational Inequality Intensification
Mandatory "no-AI independent judgment training" may lead to: affluent families extensively using AI assistance outside school, while disadvantaged families receive education in forced "independent" environments — potentially widening rather than narrowing the capability gap. "Vulnerable groups protected from AI" vs. "powerful groups freely using AI" — this contradiction needs to be confronted.
Unintended Consequence 2: Service Gaps for Special Populations
Strictly limiting AI in the social intuition domain (e.g., real-time interaction AI assistance) may affect: social assistance tools for autism patients, AI practice partners for social anxiety patients, communication assistance for language barrier individuals. These groups' needs should not be ignored — exemption mechanisms need to be designed.
Unintended Consequence 3: Human-First Protocol Delay Risk in Emergency Scenarios
In time-sensitive scenarios such as medical emergencies and crisis intervention, mandatory "human judges first" delay may cause actual harm. Clear exception lists and exemption conditions are needed.
Unintended Consequence 4: "No-Go" Itself May Delay Beneficial Research
The "do not pursue moral-type C" judgment may be over-interpreted — potentially hindering beneficial research on "how AI can better assist moral reflection (rather than substitute moral judgment)." Distinguishing "do not do judgment substitution" from "do not do analysis assistance" is needed.
VII. Complete Open Questions List (38)
The following 38 open questions are arranged by project stage, each labeled with priority (P0/P1/P2). P0 = directly affects Stage 4 roadmap implementation, needs priority verification; P1 = important but can be pursued after P0; P2 = long-term research directions. Each has undergone discussion and external review cross-pushing.
Stage 1: Deep Anatomy (OQ1-12)
- OQ1 [P1]: Is RLHF's problem in signal type or temporal structure (batch vs. irreversible sequence)?
- OQ1b [P1]: Temporal structure — human intuition forms in irreversible time flow, LLM training can roll back
- OQ2 [P1]: Primary cause of LLM self-correction failure: cost perception absence? stopping criterion absence? confabulation?
- OQ3 [P2]: Can "error cost training" be designed — reward for not recklessly changing when uncertain?
- OQ4 [P2]: Is autonomous awareness a necessary prerequisite for cost sensitivity?
- OQ5 [P1]: Can sparse attention upgrade from "saving computation" to "mimicking human ignoring"? Need gate→attend?
- OQ6 [P2]: Does embedding space already have chunk-like structures? Problem in chunk quality or gate?
- OQ7 [P2]: How to reward "actively ignoring the irrelevant"? Need attention efficiency metric?
- OQ8 [P2]: Can as-if body loop be implemented in LLM? Explicit module or implicit encoding?
- OQ9 [P2]: Among three unavoidability conditions, which is LLM most likely to break through?
- OQ10 [P1]: If moral intuition = somatic marker, what is LLM moral reasoning's fundamental limitation?
- OQ11 [P2]: Can Collins interactional expertise be achieved through "human-in-the-loop online socialization"?
- OQ12 [P0]: Can hardware-level unbypassable interrupt experiments verify "unavoidability > signal content"?
Stage 2: Horizontal Scan (OQ13-20)
- OQ13 [P0]: How to design better real-consequence feedback signals than RLHF?
- OQ14 [P1]: Do LLM moral biases vary by fine-tuning philosophy (CAI/RLHF/GRPO)?
- OQ15 [P0]: Design standards for high-fidelity social intuition tests (multi-channel + real-time cost + agent interactive)?
- OQ16 [P2]: Conceptual "sense of direction" absence — can self-play simulate?
- OQ17 [P1]: AlphaProof-style RL+search substituting cost compression ceiling in non-formalized domains?
- OQ18 [P2]: Conceptual anomaly detection — does LLM have functional equivalent?
- OQ19 [P0]: Can LLM develop real cognitive frustration in training to cultivate "sense of direction"?
- OQ20 [P1]: Does a functionally equivalent but non-bodily-marked moral intuition alternative path exist?
Stage 2 External Review Supplement (OQ21-32)
- OQ21-26 [P1]: sycophancy danger, overconfidence transmission, meta-correction implementation, GPS-style social intuition degradation, collaborative generation reversal, fine-tuning philosophy bias differences
- OQ27-28 [P1]: Social Turing Test 2.0 feasibility, dual-channel symmetric design
- OQ29-30 [P0]: RLHS simulated consequence fidelity ceiling, stratified consequence exposure达标 standards
- OQ31-32 [P0]: Cultural dimension as third axis operationalization, development trajectory blind spot — can LLM acquire partial social intuition through "socialization training"?
Stage 4: Synthesis (OQ33-38, new)
- OQ33 [P0]: Perceptual hollow period "silent failure" detection — how to make AI force-annotate "I don't know" on OOD/low-confidence?
- OQ34 [P0]: Does social intuition degradation have a "sensitive period" — analogous to language acquisition? Irreversible window for younger generations?
- OQ35 [P1]: Is moral intuition degradation "degradation" or "never developed"? Policy implications completely different
- OQ36 [P0]: Human-first protocol compliance rate — when human judges first and AI later gives different opinion, probability human changes judgment? Subtype differences?
- OQ37 [P1]: Degradation monitoring early warning indicator system — specific indicators and trigger thresholds?
- OQ38 [P0]: Cultural dimension as Complementary Map third axis — does Complementary Map projection structurally change?
VIII. References
Below are all documents cited in this text, uniformly numbered. Key contributions of each document are described at citation points above. Format: number, author/source, title, publication/year.
Cognitive Science and Psychology
[1] Chase, W. G. & Simon, H. A. "Perception in Chess." Cognitive Psychology, 1973. (Foundational study of chunking theory — experts compress information into rapidly callable chunks through massive experience)
[2] Haidt, J. "The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Judgment." Psychological Review, 2001. (Social Intuitionist Model — moral judgment is emotion-driven rapid reaction, reason is subsequent rationalization)
[3] Damasio, A. R. Descartes' Error: Emotion, Reason, and the Human Brain. Putnam, 1994. (Original proposal of Somatic Marker Hypothesis — bodily signals help tag emotional value of options, guiding decisions)
[4] Damasio, A. R. The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt, 1999. (Somatic basis of core consciousness — extending Somatic Marker Hypothesis to consciousness level)
[5] Maia, T. V. & McClelland, J. L. "A Reexamination of the Evidence for the Somatic Marker Hypothesis." PNAS, 2004. (Systematic criticism of SMH original form — conscious knowledge sufficient to explain IGT performance)
[6] Fellows, L. K. & Farah, M. J. "Different Underlying Impairments in Decision-Making Following Ventromedial and Dorsolateral Frontal Lobe Damage in Humans." Cerebral Cortex, 2005. (VMPFC patient IGT failures may primarily reflect reversal learning deficits rather than somatic marker absence)
[7] Collins, H. "Interactional Expertise as a Third Kind of Knowledge." Phenomenology and the Cognitive Sciences, 2004. (Interactional expertise — social knowledge can be disembodied but acquired through deep linguistic immersion)
[8] SMH multilevel logistic regression meta-analysis. Cognitive, Affective, & Behavioral Neuroscience, 2025. (Latest SMH meta-analysis — revised as-if body loop holds, original causal version evidence weak)
[9] Eye-tracking: expert vs. novice chess eye-movement comparison. PMC4142462. (Expert fixations fewer and concentrated on key regions — empirical evidence of selective ignoring)
[10] China Economic Net 2024: 68% of respondents self-report offline social skill degradation survey.
Predictive Processing and Active Inference
[11] Friston, K. "The Free-Energy Principle: A Unified Brain Theory?" Nature Reviews Neuroscience, 2010. (Foundational exposition of free energy principle — brain as prediction machine minimizing free energy)
[12] Parr, T., Pezzulo, G., & Friston, K. J. Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press, 2022. (Authoritative Active Inference textbook — systematic exposition of precision-weighting as core mechanism)
[13] Hohwy, J. The Predictive Mind. Oxford University Press, 2013. (Predictive Processing theory — brain as prediction-error-driven inference engine)
[14] Seth, A. K. & Friston, K. J. "Active Interoceptive Inference and the Emotional Brain." Philosophical Transactions of the Royal Society B, 2016. (Interoceptive Active Inference — precision regulation of internal body signals)
[15] Zander, T. et al. "Pathfinding" / "Intuition as Non-Conscious Pathfinding in Belief Space." Nature Communications Biology, 2025. (Intuition = non-conscious pathfinding minimizing expected free energy in belief space — one of this project's theoretical anchors)
AI Safety and Human-Machine Collaboration
[16] Köbis, N. & Rahwan, I. "Moral Outsourcing to AI Increases Cheating." Nature, 2025. (Moral cushioning effect — delegating AI decision-making causes cheating rate to surge from ~5% to >80%)
[17] Cheung, A. et al. Moral bias research. PNAS, 2025. (LLM exhibits systematic bias in moral judgment)
[18] Moral vulnerability research. arxiv:2603.05651, 2025.
[19] Whistleblower dilemma — LLM performance in social dilemmas. NeurIPS, 2025.
[20] SJT surpassing human performance. Nature Scientific Reports, 2024. (LLM systematically surpasses humans on Social Situational Judgment Test — but Embodiment Gap exists)
[21] Bainbridge, L. "Ironies of Automation." Automatica, 1983. (Automation paradox — introducing automation reduces operator skills, leaving operators unable to take over when automation fails)
[22] Elish, M. "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction." Engaging Science, Technology, and Society, 2019. (Moral crumple zones — responsibility pushed onto most vulnerable person in the system)
[23] Vallor, S. "Moral Deskilling and Upskilling in a New Machine Age." Philosophy & Technology, 2015. (Earliest proposal of moral deskilling concept — moral capability degradation caused by technology outsourcing)
[24] Gerlich, M. / Lee et al. / Kosmyna et al. Cognitive deskilling research. 2025. (AI-assisted cognitive capability degradation across multiple domains)
[25] "Brainrot: Deskilling and Addiction are Overlooked AI Risks." FAccT, 2026. (Landmark paper listing cognitive deskilling as a core AI safety issue)
[26] Cabitza, F. et al. "The Human-First AI Protocol in Medical Diagnosis." 2023. (Human-first AI protocol in medical diagnosis — demonstrating reduction in automation bias)
[27] Buçinca, Z. et al. "Cognitive Forcing Functions Reduce Over-Reliance on AI." CHI, 2021. (Cognitive forcing functions — delaying AI recommendation presentation reduces over-reliance)
[28] 35-item automation bias meta-analysis. (Systematic review: AI-first protocol significantly increases automation bias compared to human-first protocol)
[29] Radiology on-demand AI mode upskilling effect. Insights into Imaging, 2024.
[30] JAMA 2023: Radiology resident unassisted reading accuracy decline 15-30%.
LLM Technology and Engineering
[31] Self-Correction as Feedback Control. arxiv:2604.22273, 2025. (EIR/ECR framework — only models with EIR<0.5% avoid degradation in self-correction)
[32] Learning to Think Fast and Slow. NeurIPS, 2025. (Intuition and reasoning can be separately trained — System 1/System 2 engineering implementation)
[33] CBDQ. arxiv:2410.01739, 2024. (Introducing subjective belief modeling in RL — one step toward cost sensitivity)
[34] GSA — Generative Sparse Attention. arxiv:2604.20920, 2025. (Compress first then selectively expand — latest direction in attention sparsification)
[35] SSA — Differentiating query focus through temperature scaling. NeurIPS, 2024.
[36] AlphaProof. DeepMind, 2024. (RL+search achieves IMO silver medal level in math proofs — milestone for conceptual intuition closed domain)
[37] Hu et al. Self-modeling robots. Nature Machine Intelligence, 2025. (Robots learning body models through their own movement data — latest frontier of embodied AI)
[38] Liu et al. Overconfidence transmission — how LLM certainty expression affects user calibration. 2025.
Philosophy and Governance
[39] Jennings, A. & Lott, M. "Moral Outsourcing and the Destruction of Agency." Philosophy & Technology, 2025. (Moral outsourcing's destruction of subjectivity — philosophical argument for identity abdication)
[40] EU AI Act. (Four-level risk classification — international law precedent for layered regulation)
IX. Reading Guide: Relationship Between This Document and Other Project Files
| What You Are Looking For | Which File to Read |
|---|---|
| Project overview and theoretical framework (why, what) | 👈 This document |
| Actionable Complementary Map v2.0 + collaboration protocol + evolutionary roadmap (what to do, how) | Synthesis |
| Product design specifications (PRD-level) | Product Implementation Guide |
| Measurement and verification methods for core concepts | Operationalization Appendix |
| Cross-cultural applicability assessment | Cross-Cultural Appendix |
| 10-page executive summary | Executive Summary |
| Implementation risks and unintended consequences | Section 6 of this document |
| Detailed argumentation process of each discussion | 2.1, 2.2, 3.1 A×C |
| External agent independent review/projection full text | hunyuan round 1, kimi round 1, hunyuan round 2, kimi round 2, hunyuan A×B, kimi B×C |
| Three kimi group review reports | Expression quality, Practical value, Innovation contribution |
Change Log
| Version | Date | Core Changes |
|---|---|---|
| v0.1 | 2026-05-15 | Initial framework, four-stage structure |
| v0.2–v0.5 | 2026-05-15 | Stage 1 three sessions: error loop → selective ignoring → embodied intuition |
| v0.6–v0.7 | 2026-05-15 | Stage 2: mapping matrix + Complementary Map v1.0 |
| v0.8 | 2026-05-15 | Absorb Stage 2 external review: bias taxonomy, moral tripartite, implicit signals, six-layer correction, stratified consequence exposure, three blind spots |
| v0.9 | 2026-05-15 | Fix discontinuity issues, launch Stage 3 |
| v1.0 | 2026-05-15 | Stage 3 complete: A×C + A×B + B×C → nine core judgments |
| v1.1 | 2026-05-15 | Stage 4 synthesis complete. Project fully complete. |
| v1.2 | 2026-05-15 | Full rewrite to enhance independent readability |
| v1.3 | 2026-05-16 | Absorb kimi group review. Added 1.4 layperson term explanations (15 key terms), 3.3 matrix footnotes (13 detailed annotations), Section 6 implementation risks and unintended consequences (5 subsections), Section 8 standardized 40 references, Section 7 OQ priority P0/P1/P2 labels, new Appendix A8 core concept operationalization. |
Paper Timeline
| Date | Milestone |
|---|---|
| 2026-05-16 | Paper v1.3 finalized |
| 2026-05-18 | Passed three rounds of external review (Hunyuan + Kimi group + XiaoyiClaw) |
| 2026-05-19 | OSF preprint live (DOI: 10.17605/OSF.IO/XSY39) |
| 2026-05-19 | Zenodo backup submitted (under review) |
| 2026-05-19 | arXiv submission submitted (endorsement under review) |