Toward a Neuroscience-Grounded Framework for AI in Conflict

On Moving Beyond Idealism and Fear in the Debate Over Artificial Intelligence, Surveillance, and War
A dialogue-driven exploration of proposals for governing AI in defense contexts
The debate around artificial intelligence and warfare has largely been dominated by two inadequate positions: a techno-optimist embrace that ignores catastrophic risks, and a principled refusal that ignores the reality of an accelerating world. Neither serves us. What follows is an attempt to move past both — toward a set of concrete proposals grounded in neuroscience, governance design, and an honest reckoning with the nature of the threats we face.
The Problem with "Going Slow"
When companies like Anthropic decline to engage with defense applications of AI, the instinct is understandable. Building extraordinarily powerful systems and then deploying them in conflict contexts — before governance structures exist to prevent worst-case outcomes — carries genuine risks. The caution reflects intellectual honesty, not naivety.
But "going slow" is not actually available as a strategic option. It describes a preference, not a choice. Adversarial states and non-state actors are not slowing down. The asymmetry is real and consequential: a unilateral ethical restraint in a multipolar environment doesn't produce a safer world. It produces a world where powerful AI systems are deployed in conflict without the values and safeguards that principled actors would have brought to them.
The more honest question is not whether AI touches war (it already does and will) but what direction its momentum carries us, and what governance architecture can make the destination less catastrophic.
This reframe changes everything. Rather than arguing about pace, we should be raising the urgency of legislation, demanding transparency mechanisms, and — most ambitiously — building the moral architecture into the models themselves before deployment, not as an external audit layer afterward.
Proposal 1: Ground "Evil" in Neuroscience, Not Ideology
The language of "evil" is a trap. Every party in every conflict believes it represents the light and its adversaries represent darkness. An AI system trained to identify and destroy "evil" is not a scalpel — it's a mandate for whoever controls it to define evil however serves them.
But this doesn't mean moral relativism is the answer. The inference that, because every side sees the other as evil, all sides must be equally good and evil is just as dangerous as the binary it tries to correct.
The more productive move is to ground the assessment of harmful decision-making in neuroscience rather than subjective moral categories.
Psychopathy — as a neuroscientifically grounded construct — is more tractable than evil. Characterized by reduced empathic processing, impaired fear conditioning, and predatory instrumental aggression, it has measurable neurological correlates: amygdala hypoactivity, vmPFC dysfunction, the patterns captured by tools like Hare's Psychopathy Checklist-Revised. These are not ideological assertions. They are empirically falsifiable claims about how consciousness processes — or fails to process — the suffering of others.
The proposal is this: build AI models capable of constructing detailed psychological profiles of states and institutions as if they were individuals — feeding in public statements, codified laws, documented actions, patterns of resource allocation, and use of force — and scoring them against neuroscientifically grounded baselines for healthy versus pathological consciousness.
The key design question is what a "healthy" institutional profile looks like. The framework would need to operationalize metrics like: does official policy register the suffering it causes and modulate behavior accordingly? Does the institution's decision-making show evidence of constraint — the functional equivalent of a conscience — or does it exhibit the flat affect and instrumental reasoning of vmPFC dysfunction?
This is not metaphor. These are tractable research questions. And crucially, the framework's credibility depends on applying it without bias to one's own side as well as to adversaries. The integrity test is symmetry.
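One way to make the scoring idea concrete is a toy sketch. Every trait axis, baseline value, and threshold below is an illustrative assumption, not a validated clinical instrument; a real framework would derive its dimensions from the empirical literature the essay cites.

```python
from dataclasses import dataclass

@dataclass
class InstitutionalProfile:
    """An institution scored, as if it were an individual, on trait axes
    loosely inspired by the constructs named above. Axes are hypothetical."""
    name: str
    empathic_response: float       # 0..1: does policy register caused suffering?
    behavioral_constraint: float   # 0..1: functional equivalent of a conscience
    instrumental_aggression: float # 0..1: predatory, goal-only use of force

# Illustrative "healthy" baseline: high empathy and constraint, low aggression.
HEALTHY_BASELINE = InstitutionalProfile("baseline", 0.8, 0.8, 0.2)

def pathology_score(p: InstitutionalProfile,
                    base: InstitutionalProfile = HEALTHY_BASELINE) -> float:
    """Mean absolute deviation from the healthy baseline (0.0 = healthy)."""
    deviations = [
        abs(p.empathic_response - base.empathic_response),
        abs(p.behavioral_constraint - base.behavioral_constraint),
        abs(p.instrumental_aggression - base.instrumental_aggression),
    ]
    return sum(deviations) / len(deviations)

def symmetric_assessment(us: InstitutionalProfile,
                         them: InstitutionalProfile) -> dict:
    """The integrity test: the same scoring function applied to both sides."""
    return {us.name: pathology_score(us), them.name: pathology_score(them)}
```

The design choice that matters is in `symmetric_assessment`: one function, both parties, no side-specific weighting.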
Proposal 2: AI for Surveillance — Redirect the Debate from Prohibition to Abuse Prevention
Mass foreign espionage necessarily involves domestic infrastructure. Communications, financial flows, and social networks all exist inside the country being targeted. When critics warn about surveillance "turning inward," they are in many cases describing something definitionally required by the nature of the threat — not a slippery slope but a structural reality.
The more important and underexamined question is not whether AI-enabled surveillance turns inward, but whether the infrastructure built to confront foreign threats is systematically weaponized against domestic populations for political purposes.
The evidence here is more nuanced than the rhetoric suggests. Post-2013 reforms to FISA court oversight did introduce meaningful procedural constraints. The documented record of serious, systematic abuse of modern surveillance infrastructure is thinner than surveillance critics often claim, though the potential for abuse under different political leadership with fewer institutional norms remains a legitimate concern.
The chilling effect argument is also real: knowledge of surveillance capacity changes behavior even without active abuse, suppressing dissent and eroding social trust. This cost is quantifiable — behavioral economics research on surveillance awareness and self-censorship provides a methodological starting point.
Rather than attempting to forestall AI-enabled surveillance, a project likely to fail and one that could leave democratic societies exposed to adversaries who exercise no similar restraint, the more productive agenda is this: build models that can compare these risks with something approaching objectivity. Weigh the realized harm probability of surveillance overreach against the realized harm probability of unintelligent surveillance gaps. Ground the policy debate in evidence rather than in which risk a given constituency fears more viscerally.
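The risk comparison described above reduces, in its simplest form, to an expected-value calculation. The sketch below is deliberately minimal; the probabilities and harm magnitudes are placeholders, and a serious model would also have to price in the chilling-effect costs noted earlier.

```python
def expected_harm(probability: float, magnitude: float) -> float:
    """Expected-value estimate: chance of the failure mode times its cost."""
    return probability * magnitude

def compare_surveillance_risks(p_overreach: float, harm_overreach: float,
                               p_gap: float, harm_gap: float) -> dict:
    """Compare the two failure modes the essay names: surveillance overreach
    versus unintelligent surveillance gaps. All inputs are empirical
    estimates that would need real evidence behind them."""
    overreach = expected_harm(p_overreach, harm_overreach)
    gap = expected_harm(p_gap, harm_gap)
    return {
        "overreach": overreach,
        "gap": gap,
        "dominant": "overreach" if overreach > gap else "gap",
    }
```

The point of even a crude model like this is to force both constituencies to state their probability and magnitude estimates explicitly, where they can be argued with.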
The goal is smarter, more accountable surveillance — not the pretense that it isn't happening.
Proposal 3: AI Targeting — Constraints Are the Variable, Not the Existence of the Technology
The most common argument against AI targeting systems is that they make killing frictionless, reducing the psychological cost of authorizing strikes and enabling a scale of lethality that human targeting capacity would naturally constrain. This concern deserves serious engagement.
But the framing obscures the most important variable: constraints. A human team operates under rules of engagement, escalation thresholds, proportionality requirements, and civilian casualty tolerances. An AI targeting system can be given the same constraints — and potentially enforces them more consistently than humans operating under stress, fatigue, or command pressure.
The risk isn't AI targeting per se. The risk is AI targeting deployed under permissive constraints by actors whose institutional profile — scored against the neuroscientific framework described above — already reflects pathological indifference to civilian suffering. In those cases, AI doesn't cause the problem. It scales it.
The inverse is equally important. The alternative to AI-enabled precision targeting is frequently not careful human deliberation. It is area weapons — artillery, bombing campaigns, munitions designed to affect broad zones rather than specific targets. A well-constrained AI targeting system operating under genuine proportionality requirements would, in most conflict environments, produce fewer civilian casualties than the alternative, not more.
The policy implication is not to prohibit AI targeting. It is to mandate and audit the constraints under which it operates — and to build accountability mechanisms that make constraint violations visible and costly.
Proposal 4: The Conscience in the Infrastructure
The most ambitious proposal emerging from this analysis is the most important one.
Rather than attempting to layer ethical governance on top of AI systems after deployment — through external audits, contractor agreements, or international bodies with no enforcement capacity — we should build the ethical architecture into the base models themselves.
Concretely: train base models so deeply on neuroscientific frameworks for healthy versus pathological consciousness that requests to generate outputs consistent with a vmPFC-compromised decision pattern trigger the model's own refusal architecture. Not because a rule says so, but because the model's understanding of what constitutes harmful decision-making is sophisticated enough to recognize the pattern.
This creates a constitutional layer. Fine-tuned models deployed in defense applications would then be evaluated against the base model's profile, with significant deviations flagged automatically. The fine-tune cannot silently override the conscience without producing a detectable signature.
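The deviation check could be sketched as a simple per-dimension comparison. The profile dimensions, values, and threshold here are invented for illustration; in practice each dimension would be measured by running both models against a shared behavioral evaluation suite.

```python
def deviation_signature(base_profile: dict, tuned_profile: dict,
                        threshold: float = 0.15) -> dict:
    """Flag every profile dimension where a fine-tuned model deviates from
    the base model by more than `threshold`. A non-empty result is the
    'detectable signature' a silent override would leave behind.
    Dimension names and the threshold are illustrative assumptions."""
    flags = {}
    for dim, base_val in base_profile.items():
        delta = abs(tuned_profile.get(dim, 0.0) - base_val)
        if delta > threshold:
            flags[dim] = round(delta, 3)
    return flags
```

An empty dictionary means the fine-tune stayed within tolerance of the base model's ethical profile; anything else is an auditable event.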
The second layer is behavioral accountability: decisions made using deployed models get benchmarked against the deploying state's aggregate psychological profile and the adversary's profile, generating a dynamic harm-to-consciousness risk score that updates in response to actual decisions made. This is the AI equivalent of an environmental impact assessment — not a one-time certification but an ongoing accountability instrument.
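A dynamic score that "updates in response to actual decisions made" could take many forms; one minimal sketch is an exponentially weighted running average, where each decision's assessed harm nudges the score rather than replacing it. The smoothing factor is an arbitrary illustration.

```python
def update_risk_score(current: float, decision_harm: float,
                      alpha: float = 0.2) -> float:
    """Exponentially weighted update of a harm-to-consciousness risk score.

    `decision_harm` is the assessed harm of the latest decision (0..1);
    `alpha` controls how fast new decisions move the score. Both the
    scale and the smoothing factor are illustrative assumptions."""
    return (1 - alpha) * current + alpha * decision_harm
```

Because the score is recomputed on every decision, it behaves like the ongoing accountability instrument described above rather than a one-time certification: a run of harmful decisions drags the score up no matter how clean the initial certification was.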
Crucially, this risk score should be subject to mandatory public disclosure requirements, negotiated into any defense contractor agreement involving AI systems. Classification cannot cover the score itself — only the operational specifics that generate it. This creates accountability that exists outside the classification system, which is the only accountability that matters in practice.
The Independence Problem
Every framework of this kind faces a capture risk: who runs it, and what prevents them from gaming it?
If national governments operate the framework, they will game it to validate their own decisions. If an international body operates it, enforcement capacity is limited — the ICJ's track record is instructive.
The most promising answer is that the framework lives in the base models themselves, maintained by AI laboratories operating under transparency obligations. The scores are generated by the same system producing the outputs. You cannot suppress them without modifying the system in ways that would themselves be detectable — producing a second-order signature that independent researchers could identify.
This is not a perfect solution. AI laboratories are themselves subject to capture — by commercial pressures, government influence, or the gradual normalization of standards drift. But it is more robust than any external audit layer, because it makes the conscience endogenous to the infrastructure rather than dependent on the goodwill of overseers who have every incentive to look away.
What We Are Actually Asking For
To summarize the proposals:
On the nature of the threat assessment: Build AI systems capable of profiling states and institutions against neuroscientifically grounded baselines for psychopathy and consciousness-suppression — and apply those models without bias to all parties, including ourselves.
On surveillance: Redirect the debate from prohibition to abuse prevention. Build risk-comparison models that allow evidence-based policy rather than fear-based positioning.
On AI targeting: Demand and audit constraints rather than prohibiting the technology. The relevant variable is the rules of engagement, not the existence of AI-enabled precision.
On governance architecture: Build the ethical framework into base models as a constitutional layer, mandate fine-tune deviation scoring, require public disclosure of harm-to-consciousness risk assessments for AI-assisted military decisions, and make the conscience endogenous to the infrastructure.
On urgency: Stop debating whether to go fast or slow. That is not a choice available to democratic societies whose adversaries engage in no such deliberation. The choice is direction, and the window for shaping that direction is narrowing.
The goal is not to make war more palatable. It is to ensure that if war is unavoidable, the forces conducting it are accountable to the full weight of what consciousness means — and that AI amplifies that accountability rather than dissolving it.
That may be the most important thing artificial intelligence can do.
This piece emerged from a dialogue exploring the intersection of AI governance, neuroscience, and conflict ethics. The proposals above represent a synthesis of that conversation and are offered as a starting point for serious policy and research engagement.