Designing AI for Strategy Games Through Modding
How three Paradox game mods built coherent AI in a scripting language that can't compute, search, or plan, and what each gave up to get there.
Game AI is not machine learning. It's architecture: a system that reads world state, evaluates options, and issues commands every tick, through the same interface a human uses but via code.
When game studios build AI with full access to the codebase, they pick from a known menu of architectures, each with different strengths and computational demands:
Utility systems score every available action against a utility function and pick the highest. The Sims does this: each Sim's eight needs (Hunger, Energy, Social, etc.) produce urgency curves, and each object advertises how much it satisfies each need. The product of urgency × satisfaction gives the score; the Sim picks the highest. Dave Mark's Infinite Axis Utility System (used in AAA titles) generalizes this: each action has multiple "considerations" — response curves mapping normalized inputs to [0,1] — multiplied together so any consideration scoring zero vetoes the action entirely. The architecture is clean, composable, and naturally produces coherent behavior from independent scoring functions. It requires: floating-point math, arbitrary functions (curves, exponentials), and the ability to evaluate all candidate actions per tick.
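A minimal sketch of the multiplicative scoring described above may help. The class names, the example curves, and the `eat`/`sleep` actions are all hypothetical; this is not Dave Mark's implementation, only an illustration of how a zero-scoring consideration vetoes an action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def clamp01(x: float) -> float:
    """Keep a value inside the normalized [0, 1] range."""
    return max(0.0, min(1.0, x))


@dataclass
class Consideration:
    input_name: str                   # key into the normalized world state
    curve: Callable[[float], float]   # response curve, [0, 1] -> [0, 1]

    def score(self, state: Dict[str, float]) -> float:
        return clamp01(self.curve(clamp01(state[self.input_name])))


@dataclass
class Action:
    name: str
    considerations: List[Consideration]

    def utility(self, state: Dict[str, float]) -> float:
        total = 1.0
        for consideration in self.considerations:
            total *= consideration.score(state)
            if total == 0.0:          # any zero vetoes the whole action
                break
        return total


def choose(actions: List[Action], state: Dict[str, float]) -> Action:
    """Evaluate every candidate and pick the highest utility."""
    return max(actions, key=lambda a: a.utility(state))


# Hypothetical actions: quadratic urgency curves, linear availability.
eat = Action("eat", [Consideration("hunger", lambda x: x ** 2),
                     Consideration("food_nearby", lambda x: x)])
sleep = Action("sleep", [Consideration("fatigue", lambda x: x ** 2)])
```

With `hunger` at 0.9 but `food_nearby` at 0.0, `eat` scores zero and `sleep` wins despite the lower fatigue. Note what the sketch relies on: first-class functions, floating-point math, and a loop over all candidates.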
GOAP (Goal-Oriented Action Planning), used in F.E.A.R., runs A* over a state-space graph. Each action has preconditions (what must be true), effects (what changes after execution), and a cost. Given a goal, the planner searches backwards from the goal to the current state, finding the cheapest action sequence. This produces genuine multi-step plans: "to kill the enemy, I need a weapon; to get a weapon, I need to navigate to the armory." It requires: a symbolic world-state representation, correct action definitions, and enough CPU budget for A* search each time the agent replans. GOAP is primarily used in action and FPS games where the action space is small and well-defined. Strategy games rarely use formal planning systems, because the combinatorial explosion of possible action sequences across economic, diplomatic, and military domains makes planning impractical. What strategy games use instead is prioritized reactive logic: evaluate current needs, pick the most urgent action, repeat next tick. But the concept of a planner that searches for action sequences is useful as a theoretical comparison, because it clarifies what the modder loses when the language can't express search.
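The planning idea can be sketched in a few lines. To keep it short, this version uses a forward uniform-cost search rather than the backward A* described above, and actions only add facts (no delete effects). The action names and costs are hypothetical.

```python
import heapq
from itertools import count
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]   # facts that must hold before acting
    adds: FrozenSet[str]            # facts that hold afterwards
    cost: int


def plan(start: Iterable[str], goal: Iterable[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Return the cheapest action sequence reaching the goal, or None."""
    goal_facts = frozenset(goal)
    tie = count()                   # tie-breaker so the heap never compares states
    frontier = [(0, next(tie), frozenset(start), [])]
    seen = set()
    while frontier:
        cost, _, state, steps = heapq.heappop(frontier)
        if goal_facts <= state:
            return steps
        if state in seen:
            continue
        seen.add(state)
        for a in actions:
            if a.preconditions <= state:
                heapq.heappush(frontier, (cost + a.cost, next(tie),
                                          state | a.adds, steps + [a.name]))
    return None


ACTIONS = [
    Action("go_to_armory", frozenset(), frozenset({"at_armory"}), 2),
    Action("take_weapon", frozenset({"at_armory"}), frozenset({"armed"}), 1),
    Action("attack", frozenset({"armed"}), frozenset({"enemy_dead"}), 1),
    Action("punch", frozenset(), frozenset({"enemy_dead"}), 10),
]
```

Calling `plan([], ["enemy_dead"], ACTIONS)` finds the three-step armory route because its total cost (4) beats the single expensive `punch` (10). The priority queue, the visited set, and the growing list of steps are exactly the machinery a language without collections cannot express.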
HTNs (Hierarchical Task Networks), used in Killzone 2's high-level tactical layer, decompose compound tasks into subtasks using predefined methods. "Assault the position" decomposes into "throw grenade, advance while suppressed, clear area" — if the preconditions for that method hold. HTNs are faster than GOAP for well-structured domains because the decomposition rules prune the search space, but they can only find plans the author anticipated. There is no search for novel solutions.
Behavior trees structure reactive decision-making as a hierarchy of selectors (try children in priority order) and sequences (do all children in order). Halo 2 used them for combat AI. They generalize FSMs without the state-explosion problem, and they support modular addition — you can graft a new subtree without rewiring transitions. But they are reactive, not deliberative: a behavior tree cannot construct a plan. The designer must pre-enumerate every possible sequence as a subtree. For strategic reasoning (economy-military coordination, build-order planning), this means the tree either becomes a hand-authored encyclopedia of strategies or it breaks.
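As a rough illustration of selectors and sequences, here is a stripped-down tree in Python. It omits the "running" status that production behavior trees carry, and the combat conditions and action names are invented for the example.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Node = Callable[[State], bool]   # a node returns True on success


def condition(key: str) -> Node:
    """Leaf that succeeds when the named fact is true."""
    return lambda state: state.get(key, False)


def action(name: str, log: List[str]) -> Node:
    """Leaf that records the action and always succeeds."""
    def run(state: State) -> bool:
        log.append(name)
        return True
    return run


def selector(*children: Node) -> Node:
    """Try children in priority order; stop at the first success."""
    return lambda state: any(child(state) for child in children)


def sequence(*children: Node) -> Node:
    """Run children in order; stop at the first failure."""
    return lambda state: all(child(state) for child in children)


def build_tree(log: List[str]) -> Node:
    return selector(
        sequence(condition("enemy_visible"), condition("has_ammo"),
                 action("shoot", log)),
        sequence(condition("enemy_visible"), action("take_cover", log)),
        action("patrol", log),
    )
```

Every branch here was written by hand. The tree picks among pre-authored responses; it never assembles a new one.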
The common thread: all of these architectures require something that a modder scripting language typically doesn't provide — arbitrary computation, symbolic state representations, search algorithms, or at minimum, functions that return values. When you have C++ and a reasonable CPU budget, you choose the architecture that fits the problem. When you have a Paradox script file, you choose the architecture that fits the language.
Clausewitz/Jomini script has four primitives: events (fire on schedules or conditions), triggers (boolean tests of game state), effects (mutate game state), and data weights (influence the hardcoded AI's choices). Variables store numbers or booleans on scoped objects. Flags are boolean markers. Modifiers are persistent stat changes. Script values compute a number with left-to-right arithmetic (no operator precedence, 5 decimal places max) — these are functions that return a numeric value. Scripted triggers return booleans. But scripted effects, the workhorse of AI logic that changes game state, cannot return values.
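The consequence of left-to-right evaluation is easy to miss, so here is a small Python emulation. The token-list format is invented for illustration, and rounding to five decimals after every step is an assumption; the engine may truncate instead.

```python
import operator
from typing import List, Union

Token = Union[int, float, str]

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_left_to_right(tokens: List[Token]) -> float:
    """Apply each operator to the running total, ignoring precedence."""
    acc = float(tokens[0])
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], float(tokens[i + 1])
        # Assumed: precision is cut to 5 decimal places after every step.
        acc = round(OPS[op](acc, operand), 5)
    return acc
```

`eval_left_to_right([2, "+", 3, "*", 4])` yields 20.0, where conventional precedence would give 14. Any formula ported from a spreadsheet or from C++ has to be reordered by hand to survive this.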
What's missing: no arrays, no structs, no iteration over dynamic collections (both Clausewitz and Jomini have bounded while loops capped at 1000 iterations, and Jomini games have variable lists; neither lets you iterate over game-defined collections like "all building types" or "all candidates"), no generic runtime introspection, no exception handling. The language provides hundreds of named triggers for specific game-state queries — has_building, treasury > X, is_at_war, modifier:research_efficiency, scale_supply_demand — but these are specific introspection, not generic. You can't enumerate all modifiers on a scope, discover available operations, or query arbitrary object structure at runtime. You cannot write for each building in country. You enumerate every case explicitly. You cannot write a utility function that scores every candidate action and sorts by the result. You can multiply dynamically computed values inside a single script value — but you cannot evaluate that product for every candidate building in a loop, sort all candidates by the result, and pick the highest, which is what a utility system requires. You hardcode the scoring for each building into its own evaluation trigger.
Script runs on the main game thread, for every empire, on every tick. A poorly designed AI mod can halve game speed. The game is deterministic lockstep in multiplayer, and any non-determinism causes out-of-sync errors, so there's no room for threading, caching strategies that might diverge, or probabilistic shortcuts. The only execution model is the daily/monthly pulse system: on_action callbacks that fire at regular intervals for every scoped object. Performance is a first-class design constraint. Time distribution, idle periods, selective data collection, and data packing all exist because of this.
The language shapes the design as much as the game does. You design an AI that script can express efficiently, not the ideal AI compressed into script.
Paradox games are deeply systemic. Economy, diplomacy, military, and internal politics all interact: building a factory affects employment, which affects political support, which affects what laws you can pass, which affects what factories you can build. Standard game AI handles this kind of cross-domain coupling through one of several mechanisms: a shared blackboard (all subsystems read/write a common data store), a command hierarchy (a strategic layer delegates goals to subsystems), or shared utility scoring (one evaluation function naturally produces coherent trade-offs).
Vanilla Paradox AI has none of these accessible to the script layer. It is split between hardcoded C++ (unmoddable) and data weights (moddable but limited). The C++ layer runs its own decision loops, and those subsystems likely share state internally — the building AI and the military AI presumably exchange information through C++ data structures. But whatever coordination exists at the C++ level is invisible to the script. The script can't read it, can't influence it, and can't observe whether it's working. What the modder sees is the observable output: decisions that don't cohere. The coherence problem isn't necessarily that the C++ subsystems never share state — it's that whatever shared state exists isn't enough, and the script layer can't supplement it. The result is the behavior that multi-agent theory predicts when subsystems optimize independently: the building AI doesn't know what the military AI is doing, and neither knows what the economy needs. Each subsystem operates in its own bubble, making locally rational decisions that are globally incoherent — or at least, that's what it looks like from outside the C++ layer.
Common failures are observable: builds randomly without regard to market conditions, ignores shortages until the economy collapses, fleets don't repair after battles, transports get stuck with invalid orders, wars stagnate into decades-long stalemates, empires bankrupt themselves building defense platforms they can't afford, tribes never enact laws or reform their government, AI countries never build roads or manage character loyalty.
The modder can't fix this by writing a utility system or a planner — the language can't express either one. They can't add a shared blackboard because the C++ subsystems don't expose their internal state. They can't write a command hierarchy because they can't replace the C++ decision loops. What they can do is build a parallel system in script that makes its own decisions, and then align the C++ layer's remaining choices through weight overrides. The three projects in this document chose three different points on that spectrum: coexist, replace, or compensate.
The goal isn't to make the AI play optimally. It's to make it play coherently — all subsystems pulling in the same direction, producing behavior that looks intentional rather than random. You don't design the ideal AI and then implement it in script. You design an AI that script can express efficiently.
The AI can only decide based on what the script can read. If the scripting language can't query a piece of game state, no amount of clever logic can use it. This is the deepest asymmetry between standard game AI and modded AI: a C++ AI architect can add a new query to the world-state interface. A modder cannot. The modder's AI is bounded not by what it can reason about, but by what it can observe.
Victoria 3: The scripting language cannot query what goods a building type produces, what technology it requires, or what its input/output ratios are. A C++ utility system would read these from the building definition at runtime. The modder must manually hardcode each building's properties as scripted triggers and effects — 49 buildings, each with separate evaluation, sanction, and allocation logic, plus 3,000 generated script values for data extraction. When a modder adds new buildings, those buildings are invisible to the AI unless someone writes a compatibility patch.
Imperator: Before patch 2.0.5, the script couldn't check research efficiency. The AI couldn't know if a country was at 40% or 175% efficiency, so there was no way to tell it "stop building research buildings, you've overcapped." When modifier: syntax was added, a previously impossible improvement became straightforward. Similarly, governor policy selection couldn't use precise province loyalty change values, and mercenary recruiting couldn't check actual maintenance costs — both became possible with the same syntax change. In standard game AI, adding a new sensor is an engineering task. In modded AI, it requires waiting for the engine team to ship a new syntax.
Stellaris: The script can't read the vanilla ship build queue, so the mod emulates queue occupancy with six busy-slot flags per starbase. It can't detect fleet damage precisely, so repair decisions use coarse health thresholds that get stricter with distance. The fleet upkeep calculation requires estimation: the economy evaluation subtracts an approximated fleet maintenance cost because the script can't query actual upkeep directly.
When new engine capabilities expose new data, entire categories of improvement become possible overnight. A modder who keeps a wish list of data-blocked improvements is positioned to ship them as soon as the data appears.
These principles emerge across all three projects. Each one can be read as an approximation of a standard game AI architecture, adapted for what Clausewitz script can express and sacrificing something specific in the trade.
Approximates: The response curves in a utility system. In Dave Mark's Infinite Axis Utility System, each consideration maps a continuous input to a [0,1] output via a configurable curve — exponential for urgency, inverse for satiation, logistic for threshold behavior. The curve shape determines how the AI trades off competing needs, and multiple considerations multiply together so any zero vetoes the action. The modder can normalize values with arithmetic (divide by a known maximum, multiply by a scaling factor, use round/ceil for precision) — ARoAI's supply-vs-demand system does exactly this, mapping market imbalance to a 1-22 priority scale. What the language can't do is apply arbitrary mathematical functions: no exponentials, no logistic curves, no square roots without manually implementing them (ARoAI implements Newton's method for sqrt), no way to write a general response curve that maps an arbitrary input to an arbitrary output shape. The modder can linearly normalize and threshold; they can't bend the curve. What they can also do is quantize continuous values into discrete categories (low/medium/high) and use those as lookup keys. This is also a sound engineering choice on its own terms: fragile arithmetic chains break in scripting languages; threshold-based decisions are robust and composable. A quantized flag only changes when the underlying value crosses a defined boundary, so a rounding error can't flip a decision permanently.
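For a sense of what "manually implementing" a square root involves, here is Newton's method written under script-like constraints: a fixed iteration count (script loops are bounded) and five-decimal precision. This is a generic sketch, not ARoAI's actual code; the starting guess, the iteration count, and the per-step rounding are assumptions.

```python
def script_sqrt(value: float, iterations: int = 20) -> float:
    """Approximate sqrt(value) with a fixed number of Newton steps."""
    if value <= 0:
        return 0.0
    guess = value if value >= 1 else 1.0
    for _ in range(iterations):
        # Newton update for f(x) = x^2 - value, rounded the way a
        # 5-decimal script value would be (assumed behaviour).
        guess = round((guess + value / guess) / 2, 5)
    return guess
```

Twenty steps is generous for inputs up to a few thousand. A real script would size the loop to the largest value it expects, since every extra iteration runs for every country on every evaluation.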
Sacrifices: Smooth, arbitrary response curves. A utility system can express "energy is slightly below comfortable" and produce a proportionally moderate preference for energy-producing actions, with the shape of the response curve determining how quickly preference escalates. Stellaris's boolean economy flags lose this entirely — a flag can only say "energy low" or "energy not low." ARoAI's supply-vs-demand system preserves proportional preference (22 levels of shortage/surplus), and Imperator's weight system allows proportional preference through weight ratios (IRC 75% vs IRC 65%). What all three lose is the ability to shape the response curve: the mapping from input to urgency is fixed (linear steps for ARoAI, proportional-random weights for Imperator, binary thresholds for Stellaris). A utility system would use exponential curves for urgency (small deficits produce mild preference, large deficits produce overwhelming preference), logistic curves for thresholds (gradual transition around a critical point), or inverse curves for satiation (diminishing returns as need approaches satisfaction). The modder can stack multiple thresholds to approximate a curve — Stellaris uses low/medium/high/extreme — but each additional category costs script variables and evaluation time, and the approximation is piecewise-constant, not smooth.
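The difference between a smooth curve and stacked thresholds can be shown side by side. The exponential shape, the steepness value, and the four threshold boundaries are all hypothetical; only the low/medium/high/extreme vocabulary comes from the text.

```python
import math


def smooth_urgency(deficit: float, steepness: float = 4.0) -> float:
    """Exponential response curve on a normalized deficit in [0, 1]."""
    d = max(0.0, min(1.0, deficit))
    return (math.exp(steepness * d) - 1) / (math.exp(steepness) - 1)


# Hypothetical boundaries, checked from the most severe downwards.
THRESHOLDS = [(0.75, "extreme"), (0.50, "high"), (0.25, "medium")]


def quantized_urgency(deficit: float) -> str:
    """Piecewise-constant approximation: one label per band."""
    for bound, label in THRESHOLDS:
        if deficit >= bound:
            return label
    return "low"
```

Deficits of 0.30 and 0.45 produce different smooth scores but the same `"medium"` label. That lost distinction is the price of the quantized version; its benefit is that a small numeric wobble inside a band changes nothing.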
A shared vocabulary of quantized state also approximates a shared blackboard — the inter-subsystem communication mechanism that standard multi-agent architectures use for coherence. The building planner and the fleet producer don't need to agree on what "rich" means in absolute terms because they both check the same flags. This is the blackboard pattern with quantized categories instead of a shared numeric workspace: weaker (no arithmetic on shared values) but sufficient for coherence.
Stellaris: The economy evaluation turns stockpiles and incomes into boolean flags (aai_boolean_energy_income_low, acai_boolean_minerals_income_extreme). Every downstream system reads these same flags. Mineral income is quantized in 5-unit steps from 5 to 510. Energy stock is computed as production × 6 + 300. The building planner, colony reserve system, robot logic, and ship logic all read the same derived flags rather than inventing different ideas of "rich" or "poor."
Victoria 3: Budget health score (-3 to +3) uses two-dimensional curves where the surplus requirement varies with debt level and vice versa. Negative health levels require both weeks_of_reserves < 156 AND either a surplus or debt threshold. Reserves act as override (156+ weeks blocks all negative health) and gate (positive health can be achieved through either sufficient surplus or sufficient reserves). Debt-free countries get easier thresholds that scale with how full their gold reserves are, creating smoother degradation within each quantized level than a simple threshold would.
Approximates: The replanning cadence of a deliberative architecture. A planner with a finite action space doesn't replan every frame — that would be computationally prohibitive. It replans when the world changes enough, or on a fixed interval. The modder approximates this by staggering work across game days, creating the appearance of a persistent planner that thinks continuously, when really each subsystem runs infrequently on its own schedule.
Sacrifices: Responsiveness. A planning system would replan immediately when a significant event occurs: a fleet destroyed, a war declared, a market crash. The script must wait for the next scheduled pulse. Victoria 3's 28-day cycle means a sudden economic shock that lands just after a country's evaluation pass can't change construction priorities until the next cycle's evaluation, almost four weeks later. The idle buffer (days 15-28) is explicitly defensive engineering against cascading delays, but it also means the AI can't use that time even if something urgent happens. Stellaris's economy runs on days 1-7 and ships on days 8-15; a fleet destroyed on day 8 won't trigger economy recalibration until the next month's day 1. This is the fundamental trade-off of fixed-schedule planning: regularity and performance predictability in exchange for latency.
Stellaris: Economy on days 1-7, ships 8-15, robots 16-19, buildings in separate waves. Monthly and yearly cadences for systems that need less frequent updates.
Victoria 3: 28-day iteration cycle per country (minimum 14), with randomized start dates. Day 1 = preparation (data collection, tax/wage management, downsizing, tech redirection), Day 2 = evaluation, Days 3-14 = construction (one building type per day, up to 12 types per iteration), Days 15-28 = deliberately idle buffer. The idle time means a delayed Day 14 doesn't cascade into the next cycle.
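The Victoria 3 cycle above can be modelled as a simple day-to-phase mapping plus a per-country start offset. The function names, the seed, and the way offsets are drawn are assumptions for illustration; only the day ranges come from the description.

```python
import random
from collections import Counter
from typing import List

CYCLE_DAYS = 28


def phase_for(cycle_day: int) -> str:
    """Map a day within the cycle (1..28) to the work done that day."""
    if cycle_day == 1:
        return "preparation"
    if cycle_day == 2:
        return "evaluation"
    if 3 <= cycle_day <= 14:
        return "construction"   # one building type per day
    return "idle"               # days 15..28: deliberate buffer


def cycle_day(game_day: int, start_offset: int) -> int:
    """Day within a country's own cycle, given its start offset."""
    return (game_day - start_offset) % CYCLE_DAYS + 1


def load_by_phase(game_day: int, offsets: List[int]) -> Counter:
    """How many countries are in each phase on a given game day."""
    return Counter(phase_for(cycle_day(game_day, o)) for o in offsets)


# Hypothetical world: 100 countries with seeded random start offsets.
rng = random.Random(1836)
OFFSETS = [rng.randrange(CYCLE_DAYS) for _ in range(100)]
```

Because the offsets are spread across the cycle, `load_by_phase` shows only a fraction of countries doing expensive work on any one day, which is the point of the randomized start dates.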
Approximates: A command hierarchy. In standard game AI, a strategic layer sets high-level goals and delegates execution to tactical subsystems. A real-time strategy AI might have a commander that decides "expand economy" and a build manager that executes the construction queue. The key property: the commander can observe whether the build manager is succeeding and adjust goals accordingly.
Sacrifices: Bidirectional feedback. The modder's approximation is one-directional: the script sets a strategy (fleet doctrine, construction priority) and the static weights align the vanilla engine with it. But the static weights can't report back. If the script decides "corvette doctrine" and sets component weights accordingly, but the game state has shifted (enemy running point-defense counters), the weights can't observe that they're failing and request a doctrine change. They are baked-in biases, not a feedback loop. The architecture also can't handle conflicts between vanilla subsystems that the script doesn't control — if two hardcoded C++ subsystems have competing goals, the mod's static weights can only bias them, not mediate.
Stellaris: Ship doctrine flags (corvette-preference or battleship-preference) assigned at game start and cascaded through every layer. Tech weights, component weights, and ship section layouts all reference the doctrine flag — 44 tech/component checks align vanilla systems with the mod's strategic choice.
Victoria 3: Production method ai_values guide vanilla PM selection with clear progression (50,000 → 100,000 → 200,000 → 400,000) while three defines disable vanilla construction, tax management, and government spending AI entirely, giving full control to the scripted system.
Imperator: Invictus: Invention Relative Chance (IRC) values set high enough to overcome the randomized selection pool — IRC 95% translates to weight 361, which is 35x more likely than an IRC 35% option. The mod's own scripts handle law management, building decisions, and economic policies that the vanilla AI barely touches, while the static IRC weights ensure the vanilla research system picks reasonable inventions.
Approximates: The manual decomposition rules in an HTN. An HTN can only find plans the author anticipated: "Assault position" decomposes into "throw grenade, advance while suppressed, clear area" only if someone wrote that decomposition. The modder leans into this constraint: rather than fighting the language's inability to search for novel strategies, they reduce the decision space to a smaller set of known-good patterns. A narrow competent AI beats a broad confused one.
Sacrifices: Adaptability to change. When the game changes — patches, DLC, mods that add new content — the reduced decision space may miss viable strategies that a broader approach would find. Destroyers and cruisers aren't bad ship types in Stellaris; they're just harder to script competent behavior for, so the mod removes them from production. If a future patch makes destroyers dominant, the mod must be manually updated. A utility system with per-ship-type scoring would adapt automatically by shifting scores when the game's balance changed. The modder pays this cost willingly: a small number of strategies executed well beats a large number executed poorly, and the scripting language couldn't produce competent behavior across the full decision space anyway.
Stellaris: Corvette-or-battleship doctrine only; no mixed fleets, no destroyers or cruisers. Starbases are shipyard, anchorage, or trade hub — no hybrid layouts.
Victoria 3: Government buildings use formula-based evaluation (checking population, GDP, innovation targets, military threats) while production buildings use market supply/demand queries against actual trade data. Two fundamentally different methods, each suited to its domain. Within production buildings, a three-layer decision hierarchy: weight controls urgency, order breaks ties encoding dependency chains, and offset gates expansion by productivity requirements — three orthogonal levers, each tunable independently.
Imperator: Building priorities structured as ratio buildings first (matched to territory culture rights and trade goods), then modifier buildings that enhance what's already there. Once research efficiency is high enough, the AI stops building Academies and switches to Forums and Mills, preventing the failure where small AI countries filled every city with research buildings while starving for manpower and taxes.
The three projects chose three different integration strategies with the hardcoded C++ layer, and each one maps to a recognizable pattern in software architecture for integrating with legacy or unmodifiable systems.
The right choice depends on the game: how much the vanilla AI exposes to modding, how badly the vanilla logic fails, and how complex a replacement would be.
Approximates: The adapter pattern — wrapping a broken interface with one that produces correct behavior. In standard game AI, you'd refactor the decision function. The modder can't refactor C++ code, so they bypass the broken function entirely and reimplement its intended effects in script.
Sacrifices: Synchronization with the original system's state. When the mod zeros out vanilla edict weights and applies equivalent modifiers through script, the game's internal accounting for "which edicts are active" diverges from reality. Other systems that check "is this edict active?" will get wrong answers. The mod gets the functional effect right (the AI gets the bonuses it should) but may create inconsistencies in any system that tracks the original mechanism's state. This is acceptable when the original system is so broken that its state tracking is meaningless anyway, but it creates a maintenance burden: every new system added by Paradox that interacts with the bypassed mechanism needs to be checked against the mod's simulated version.
Stellaris: The mod sets ai_weight = 0 on most influence edicts, making the vanilla AI almost never cast them. Instead, script checks conditions and applies equivalent modifiers directly, gated by economy booleans and influence costs. Enclave trade deals are similarly simulated with tiered energy costs.
Imperator: Economic policy management. AI couldn't reliably choose between Free Trade and Trading Permits, so the mod implements the logic directly. Harsh Taxation (normally player-only) is used by AI when research efficiency is severely overcapped. Each policy decision that the vanilla AI ignores or bungles is handled directly.
Victoria 3: The strike event override. The entire vanilla strike event chain (9 events) is overridden so AI always chooses to break strikes rather than negotiate (negotiate=0, break=10 in the AI weighting). Negotiating creates economic promises the AI might not follow through on, leaving lingering negative modifiers.
Approximates: Satisficing rather than optimizing. A utility system maximizes expected utility across all possible actions; it considers what the AI should do. The modder minimizes worst-case damage — they consider what the AI is doing wrong and stop it. The objective function isn't "play well" but "stop playing badly." This is a deliberately different optimization target, and it's the correct one when the system can't express enough of the decision space to produce globally good behavior. If you can't write a utility function that correctly scores every option, you can still write targeted rules that prevent the worst options.
Sacrifices: Any behavior that isn't a known failure mode. The modder can only fix what they've observed failing. Suboptimal play that doesn't produce visible disasters — inefficient but stable economies, underutilized game mechanics, missed opportunities — persists because nobody filed a bug about it. An optimization-oriented AI would find these through its scoring function. A failure-mode AI only fixes what's broken enough to notice.
Stellaris: Transport stuck-fix (detect invalid orders, retry, eventually recreate). Strongest-fleet repair routing (detect damage, route to starbase, freeze and regen). Critical building enforcement (yearly scan ensures buildings that vanilla "forgets" still exist). No-platforms modifier (poor empires get -50 starbase defense capacity, stopping them from wasting money on defense platforms).
Victoria 3: Stalemate war resolution (every 30 days, advance counter through 24 levels over ~2 years, then force resolution: secessionists win secessionist wars, higher-population side wins revolutionary wars, other wars get white peace). Building downsizing (buildings with occupancy < 40% for 6+ iterations are removed).
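The stalemate rule can be sketched as a counter plus a resolution table. Two details are assumptions rather than facts from the mod: that the counter resets when a war stops being stalemated, and that a population tie goes to the defender. The war-type and outcome strings are hypothetical labels.

```python
PULSE_DAYS = 30   # the counter is advanced once per pulse
MAX_LEVEL = 24    # resolution is forced at this level


def advance(level: int, still_stalemated: bool) -> int:
    """Advance the stalemate counter by one pulse."""
    if not still_stalemated:
        return 0                      # assumed: progress resets the counter
    return min(level + 1, MAX_LEVEL)


def forced_outcome(war_type: str, attacker_pop: int, defender_pop: int) -> str:
    """Outcome applied once the counter reaches MAX_LEVEL."""
    if war_type == "secession":
        return "secessionists_win"
    if war_type == "revolution":
        # Assumed tie-break: equal population favours the defender.
        return "attacker_wins" if attacker_pop > defender_pop else "defender_wins"
    return "white_peace"


def days_until_forced() -> int:
    return PULSE_DAYS * MAX_LEVEL
```

`days_until_forced()` returns 720, matching the "~2 years" figure.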
Imperator: Preventing civil wars through bribing and free hands (vanilla was silently cheating by not paying political influence costs). Deleting excess ports that waste building slots. Farming settlements in states with many cities to prevent starvation. Moving province capitals away from weird post-annexation locations.
Built for Stellaris 2.1.3 with its tile-based planet system. The first generation: coexist with vanilla AI, built on top of the Glavius AI Fix Mod. The coexist strategy means the mod never fully owns any decision — it must constantly counteract vanilla AI's choices while producing its own. This produces an architecture that looks like a middleware layer with partial overrides: some decisions are intercepted and handled correctly, others are nudged via weight overrides, and still others are left to vanilla and hoped for the best.
The mod grew around Glavius's existing systems (colonization, critical building enforcement, edict simulation, war pressure) with a new layer: economy model, fleet production, starbase planning, repair, robot assembly, habitat building. This created two design eras in one mod — different naming conventions (gai_* for legacy, aai_*/acai_* for new), different assumptions, overlapping responsibilities.
In a C++ codebase, accumulated dead code would be flagged by static analysis and removed in a refactor. In a modding ecosystem, you can't refactor someone else's mod — you can only layer on top of it. The Clausewitz file-overwrite model (same filename wins entirely, last in load order) means merging requires replacing whole files, not patching individual lines. The archaeological layers persist because removing them risks breaking undocumented dependencies.
The edict simulation layer illustrates the coherence cost of this accumulation: most edict checks use inline energy thresholds rather than the shared economy booleans that the building and ship systems read, so tuning changes to the central economy model may not propagate to buff eligibility. This is the coherence problem again: two subsystems nominally sharing a blackboard, but one subsystem has a private copy of the data that drifts.
The central economy calculator (aai_calc_economy) converts raw stockpiles and incomes into a shared vocabulary of boolean flags. Every downstream system reads the same flags. Mineral income is quantized in 5-unit steps from 5 to 510. Food income uses 29 discrete levels for the low threshold, 61 for medium, and 61 for high, with scaling differing by empire size.
This is the blackboard pattern implemented through quantized flags. The economy calculator writes and every subsystem reads. Conflicts don't arise because the flag vocabulary is categorical, not propositional: "energy low" doesn't conflict with "minerals high." The trade-off is expressiveness: the blackboard holds booleans and small integers, not structured proposals, so subsystems can only observe shared state and independently decide what to do about it.
The calculator subtracts estimated main-fleet upkeep when the fleet is parked at a friendly crew-quarters starbase. This estimation is necessary because the script can't query actual fleet upkeep. The approximation drifts. The paired mineral deduction (exactly 2× energy) encodes a game-design assumption about the energy:mineral cost ratio that may not hold across all fleet compositions. The inability to query the real number is the deeper problem.
The vanilla AI makes acceptable individual decisions but they don't form a coherent strategy. The mod takes over essentially every productive activity: building on planets (reading tile yields to choose between food, energy, minerals, research, and special deposits), building on habitats, upgrading through full tech chains, enforcing critical support buildings yearly, handling Rogue Servitor bio-trophy mechanics, controlling colonization timing with a 300-mineral savings reserve, and keeping robot assembly active.
A utility system would score each tile's best building option against current needs and pick the highest. The modder can't write that function: there's no way to express "score this tile's yield for food, multiply by food urgency, compare to the same calculation for energy and minerals." Instead, the building logic is a hand-authored decision tree: if food low, build farm; else if energy low, build plant; else if minerals low, build mine; else build research. This is a priority-ordered selector, where the first satisfied condition wins. It's correct when the priorities are right, but it can't make trade-offs. A tile with +4 minerals and +1 food gets a farm when the empire is slightly low on food, because the food check always fires before the tile's mineral yield is considered. A utility system would see the +4 mineral yield and the slight food deficit and might choose differently. The modder accepts this rigidity because the alternative (encoding utility scoring in a language that can't iterate over candidates or sort them) requires hardcoding every score combination explicitly, which explodes combinatorially.
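The contrast between the two approaches fits in a few lines of Python. The selector follows the decision tree exactly as stated (food, then energy, then minerals, else research); the utility version and its urgency numbers are hypothetical and show only what the script cannot express.

```python
from typing import Dict

# Priority order taken from the decision tree in the text.
PRIORITY = [("food", "farm"), ("energy", "plant"), ("minerals", "mine")]


def selector_choice(low_flags: Dict[str, bool]) -> str:
    """First resource flagged low wins; tile yields are never consulted."""
    for resource, building in PRIORITY:
        if low_flags.get(resource, False):
            return building
    return "research"


def utility_choice(tile_yield: Dict[str, float],
                   urgency: Dict[str, float]) -> str:
    """Score yield x urgency per resource and build for the best one."""
    building_for = dict(PRIORITY)
    best = max(tile_yield, key=lambda r: tile_yield[r] * urgency.get(r, 0.0))
    return building_for[best]


# A tile rich in minerals, in an empire slightly short of food.
TILE = {"food": 1.0, "minerals": 4.0}
LOW_FLAGS = {"food": True}
URGENCY = {"food": 0.6, "minerals": 0.3}   # hypothetical urgencies
```

Here `selector_choice(LOW_FLAGS)` returns `"farm"` while `utility_choice(TILE, URGENCY)` returns `"mine"`, because 4 × 0.3 outweighs 1 × 0.6. The selector never sees the tile at all.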
Colonization is dual-gated: the outer check requires the shared economy boolean aai_energy_income_low = no, and the inner check requires sufficient stockpile plus 300+ minerals (or a full deposit reserve). A deposit system diverts minerals into a virtual reserve capped at 300, so empires with fewer than 3 planets deliberately save for colony ships rather than spending on buildings. On colonization, the mod removes the default shelter and relocates it to a better tile using cascading adjacency priority conditions. Species selection avoids pops under assimilation or purge, and for empires whose primary species has habitability below 20%, it creates an additional pop of a viable species.
Building upgrades form complete chains through tier 5 for all resource types, with empire-type-specific unity progressions (temple, uplink node, autochthon monument). Research labs specialize at tier 2 capital (random specialization weighted 20% physics / 40% biology / 30% engineering, gated behind years_passed > 10). The monthly building pass can also demolish and replace buildings for surplus rebalancing.
Ships are created at a starbase with free shipyard capacity, routed through a delayed delivery chain toward the main fleet. Six busy-slot flags per starbase emulate queue occupancy since the script can't read the vanilla build queue. Megastructure decisions use selective perk weighting (Voidborn 10→100, Master Builders 10→100, Galactic Wonders 10→200 only after Voidborn), and gateways replace vanilla's broad weighting with a single high-threshold case.
The key insight isn't any single subsystem — it's that all of them read the same economy flags and run on the same distributed scheduler. Coherent strategy emerges from many small decisions that share the same picture of the world.
Each empire is assigned a fleet doctrine at game start — corvette-preference or battleship-preference — and that decision propagates through every layer. 44 tech/component checks reference the doctrine flag to align vanilla systems with the mod's choice.
This is a static commitment to a strategy that a deliberative agent would replan around. Faced with an enemy running heavy point-defense that neutralizes corvette swarms, a planning system would search for a new strategy. The mod's doctrine is assigned once and never revisited. The trade-off is intentional: a mid-game doctrine switch would require simultaneously updating 44 weight entries, production logic, and the naval capacity formula — and any missed entry creates incoherence. The cost of replanning exceeds the cost of committing to a suboptimal doctrine and executing it well.
The naval capacity model reflects this specialization: default desired utilization runs from 0.60 (peace, minimal) to 1.10 (war, maximum). Fanatic militarists and genocidal civics shift upward; Inward Perfection and fanatic pacifists shift downward. Disbanding follows a strict hierarchy: corvettes first, then destroyers, then cruisers, then battleships — always trimming smallest first.
Starbases can serve many roles, but vanilla builds mixed layouts that are mediocre at everything. The mod separates role allocation from role filling through a multi-pass system: compute desired role targets (shipyards, trading hubs, anchorages), assign roles to existing starbases preferring reuse, then fill slots per role.
The two-pass structure — classify, then build — approximates what an HTN does when it decomposes a compound task into subtasks. The key advantage is that role assignment considers the global starbase portfolio before committing any individual starbase. The trade-off is rigidity: if a war creates sudden demand for more shipyards, the role allocation can't be recomputed until the next yearly pulse.
Personality is expressed through parameter shifts on the same underlying logic rather than separate code paths. The same naval capacity formula serves every empire, with civics shifting the inputs. A fanatic militarist doesn't have a different decision tree — it has the same tree with "desired fleet size" shifted upward. This is parameter-space variation rather than architectural variation: separate code paths per personality type would multiply the maintenance burden by N, while parameter shifts cost nothing extra.
Strong empires sit peacefully next to weak ones forever. The galaxy freezes into a static political map. This is the coherence problem from §1.3 in a different domain: vanilla's military AI and diplomatic AI don't share state, so neither knows whether war would serve the empire's strategic interests. The mod combats this through both push and pull. Pull: gai_war.1 scans peaceful AI empires with sufficient strength and looks for weaker neighboring AI empires as war targets, selecting war goals based on civics. Push: poor empires get the no_platforms modifier (§2.7), preventing them from wasting resources on defense platforms while their economy starves. Global defines make the base AI more expansionist, more claim-focused, and more willing to operate with large fleets.
Standard RTS games handle this through a strategic layer that evaluates the game state holistically — Age of Empires' AI decides when to transition from economy to military based on game time and perceived opponent strength. The modder can't write a strategic evaluator that reads the entire diplomatic landscape (the language lacks the iterators and data structures to score every potential conflict), so they approximate: a single event that checks "is this empire strong, at peace, and next to someone weak?" The heuristic is coarse — it doesn't consider whether the target has defensive pacts, whether the war would overextend the attacker, or whether the attacker's fleet composition counters the defender's. But it breaks the stagnation, which is the failure mode being targeted. The coherence problem that freezes the galaxy can't be fully solved without a shared strategic assessment; it can only be poked at with targeted push and pull events.
Vanilla diplomacy is dominated by inertia — another manifestation of the coherence problem from §1.3. The diplomatic AI and the military AI don't share state, so alliances generate large flat opinion bonuses that make them self-sustaining regardless of whether they still serve either empire's strategic interests. The diplomatic map freezes.
The mod rewrites the diplomatic model around current strategic alignment. Passive relationship bonuses are stripped: alliance opinion drops from +25 to 0, federation membership from +50 to 0, defensive pact from +20 to 0. Trust accumulates much more slowly and caps lower (federation trust growth from 1.0/month to 0.25/month). Pact and federation acceptance is reweighted toward active strategic factors: shared rivals jump from 10 to 100 for federation acceptance, shared threat factor from 0.25 to 0.40.
The diplomatic weight system in Clausewitz is the modder's closest approximation to a utility function. Each acceptance factor has a weight, and the engine sums them to produce a decision. The modder can't write a proper utility function — one that normalizes competing considerations and multiplies them — but they can adjust the weights so that the weighted-sum model produces strategically coherent outcomes. The key move is making passive bonuses near-zero and strategic bonuses dominant. This is equivalent to designing a utility function where "current strategic alignment" scores high and "historical inertia" scores low — except it's done by tuning additive weights rather than composing response curves. The result is less elegant (the modder can't express "trust matters more when threat is low" as a multiplicative interaction) but it moves the AI's diplomatic behavior in the right direction.
Border friction grows stronger and more local. Max friction rises from 100 to 150. Each bordering system generates double the tension (5 → 10). Threat also decays faster (-0.25 → -0.50) and scales more with distance, keeping it local and current.
The combined effect: empires form coalitions based on shared rivals and threats, not on the fact that they've been allied for fifty years. Alliance blocs dissolve when the threat that created them disappears.
Some failures are mechanical breakdowns that need targeted patches, not architectural redesigns.
The vanilla default country type still governs many AI behaviors that custom events can't reach. The mod overrides constructor and science ship targets, links ship fractions to gai_build_fleet (gating fleet building on economy, war state, and naval capacity), and switches army type fractions based on technology and empire type. On game start, a bootstrap event grants the AI arc emitter tech and the map_the_stars edict. The global define rewrite zeroes the mineral budget for navy, colony, robot, building, and starbase spending so the scripted planner controls those outlays directly. The ascension perk chain from §3.3 (Voidborn → Master Builders → Galactic Wonders) is reinforced at this level too.
The diplomatic rewrite (§3.8) extends further than the core changes: trust caps are reduced for all treaty types, gift diplomacy is weakened, SHARED_THREAT_MAX drops from 200 to 50 (anti-threat coalitions cap earlier), and acceptance formulas are restructured around strategic factors rather than flat opinion bonuses.
A full-stack AI replacement — the most ambitious approach of the three. The vanilla AI's construction logic is hidden in C++ with no hooks for modders, so partial fixes couldn't reach the core problem. Where Stellaris coexists and Imperator compensates, ARoAI replaces: the mod owns every economic decision the game allows script to control. This gives it the architectural freedom to build something close to a utility system — the closest any of these three projects gets to standard game AI architecture — while still being constrained by the same language limitations as the other two.
Three NAI defines disable construction, tax management, and government spending. Debt thresholds are set to impossible values, and interest group promotion and suppression are disabled. The mod owns every economic decision. The trade-off: total freedom, total maintenance burden. Every game patch potentially breaks everything, and the mod must handle every edge case that vanilla would otherwise cover.
The replace strategy is only viable when the vanilla system is so broken that coexistence produces worse outcomes than starting from scratch, and when the modder can express the replacement logic in script. ARoAI chose this because Victoria 3's vanilla construction AI couldn't be steered with weight overrides — it made decisions in C++ that no amount of ai_weight tuning could fix. The modder's alternative to "replace" was "accept broken." The cost is that every new building Paradox adds in a DLC is invisible to ARoAI until someone writes the compatibility patch — the same data-accessibility problem from §1.4, but now the entire economy depends on the patch being correct.
The scripting language can't read building definitions. Two fundamentally different evaluation approaches, chosen by domain:
The production building approach is the closest any of these three projects gets to a proper utility system. The supply-vs-demand level is a response curve — it maps a continuous input (market imbalance) to a normalized output (1-99 priority). Multiple buildings are scored against the same curve, and the highest score wins the next construction slot. This is utility scoring with a single consideration (market urgency), which is both its strength and its limitation. A full IAUS implementation would add more considerations: construction time, input-good availability for the building's inputs, geographic proximity to demand centers, synergy with existing buildings in the same state. The language can multiply dynamically computed values inside a single script value, but it cannot evaluate that product for every candidate building in a loop, sort all candidates by the result, and pick the highest — which is what a multi-consideration utility system requires. So geography and synergy get their own separate evaluation layers (state aptitude and branching, §4.5), and construction time is not considered at all (a known limitation, §4.15).
Each building's goods carry two independent parameters. Weight controls priority — which building gets built first — by being added to the supply/demand level. Offset controls expansion gating — how productive existing buildings must be before the AI builds more — by being added to the supply/demand level to produce the productivity requirement. Primary goods get low weight (1-4) and zero offset; secondary luxury goods get high weight (7-11) and high offset (4-6), creating a double barrier.
The weight/offset decomposition is what a two-consideration utility system would look like if you decomposed it into two independent parameters because you can't compose a multi-candidate scoring loop. In an IAUS, weight and productivity would each be response curves, and their product would determine the final score: high weight × low productivity = maybe build; low weight × high productivity = maybe build; high weight × high productivity = definitely build. The decomposition can't express this product across candidates, so it adds weight to the priority score and adds offset to the productivity threshold. These are additive decompositions of what should be a multiplicative interaction. The result is that the two parameters interact less gracefully than they would in a proper utility system: a building with very high weight but very low productivity still gets queued (because the high weight pushes it to the top of the priority list), and then it may fail the productivity gate — but the priority system already committed a construction slot to it. A utility system would never score it highly in the first place.
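The failure described above, where a candidate wins the ranking and then fails the gate, can be sketched as follows. The candidates, the `importance` and `productivity` scores normalized to [0, 1], and the gate value are all hypothetical; they are not ARoAI's actual weights or offsets.

```python
# Hypothetical candidates with scores normalized to [0, 1]; not the mod's data.
CANDIDATES = [
    {"name": "luxury_good", "importance": 0.9, "productivity": 0.1},
    {"name": "staple_good", "importance": 0.5, "productivity": 0.8},
]
GATE = 0.5  # assumed minimum productivity required to allow expansion


def sequential_pick(candidates, gate=GATE):
    """Rank on importance alone, then gate the winner on productivity."""
    top = max(candidates, key=lambda c: c["importance"])
    # If the winner fails the gate, the slot goes unused this round.
    return top["name"] if top["productivity"] >= gate else None


def multiplicative_pick(candidates):
    """Rank on importance x productivity, as a utility system would."""
    top = max(candidates, key=lambda c: c["importance"] * c["productivity"])
    return top["name"]


print(sequential_pick(CANDIDATES))      # ranking and gate disagree
print(multiplicative_pick(CANDIDATES))  # low productivity drags the score down
```

In the sequential version the high-importance candidate takes the top slot and is then rejected, while the product-based version never ranks it first.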
Knowing there's a shortage doesn't mean you should build more. If existing mines are unprofitable, building more accelerates the bleeding. The productivity requirement connects market urgency to actual building performance.
The productivity level is calculated as supply_vs_demand_level + offset, then mapped to a minimum earnings threshold against the country-wide median. During extreme shortage with zero offset, a building needs to earn about 42% of the median to justify expansion. At mild shortage plus moderate offset, the bar rises to 84-96%. Above balance, only buildings earning well above median (up to 145%) get expanded.
A crucial discount prevents deadlocks: when a good is critically important (importance ratings range from 5 for gold mines to 99 for railways), productivity requirements are discounted. They are divided by 1.30 for very crucial buildings (~23% discount) or by 1.15 for moderately crucial buildings (~13% discount). This prevents the AI from refusing to build something it desperately needs because existing instances are underperforming due to the very shortage the new building would resolve.
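The discount percentages follow directly from the divisors, which a short calculation confirms. The function names are illustrative, and applying the divisor to the 84% requirement from the mild-shortage case is just one worked example.

```python
def discount_fraction(divisor):
    """Fraction by which dividing by `divisor` lowers a requirement."""
    return 1.0 - 1.0 / divisor


def discounted_requirement(requirement_pct, divisor):
    """Productivity requirement (percent of median) after the crucial discount."""
    return requirement_pct / divisor


very_crucial = discount_fraction(1.30)        # roughly 0.23
moderately_crucial = discount_fraction(1.15)  # roughly 0.13

# Worked example: an 84%-of-median bar for a very crucial building.
example = discounted_requirement(84.0, 1.30)  # roughly 64.6% of median
print(round(very_crucial, 3), round(moderately_crucial, 3), round(example, 1))
```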
This is a heuristic approximation of a joint optimization over two variables — urgency and viability — that a planning system would solve by searching the action space. The modder decomposes it into two sequential checks: is there a shortage? (urgency) and are existing buildings profitable? (viability). The crucial-good discount is the escape hatch for the chicken-and-egg problem where the building is unprofitable because the shortage hasn't been resolved, and the shortage can't be resolved because the building is unprofitable. A planner would find a path through this; the modder hardcodes the exception.
Budget health (-3 to +3) uses two-dimensional threshold surfaces. Negative levels require weeks_of_reserves < 156 as a gate — a country with 3+ years of reserves cannot be in negative health regardless of cash flow. Positive health can be achieved through either sufficient surplus or sufficient reserves. Tax/wage adjustment is asymmetric: healthy budgets lower taxes and raise wages; negative budgets raise taxes first, then lower wages. Military wages floor at medium during active wars.
Spending shares are a command hierarchy implemented as a hard budget. Government administration gets 20% (plus up to 5% from lost taxes), university 10% (plus up to 2.5% for innovation deficit), port 10%, military 30% (barracks share hard-capped at 80%), and construction gets the residual plus the investment pool. An investment pool multiplier (up to 2.20) scales non-construction shares upward when private-sector funding covers construction costs. Country-specific military spending multipliers apply before 1870 (Egypt 2.5x, Turkey/Prussia 1.5x, declining to 1.0x). The shares are hardcoded — the military share stays at 30% regardless of war state, and the modder adjusts what that 30% buys (via threat assessment, §4.13) rather than the share itself.
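As a rough sketch of the share arithmetic, the snippet below computes the construction residual from the percentages quoted above. It assumes the shares are fractions of one total budget and deliberately leaves out the investment pool multiplier and the pre-1870 military multipliers, whose exact application the text does not spell out.

```python
# Base shares and bonus caps as quoted in the text; treating them as
# fractions of a single total budget is an assumption of this sketch.
BASE_SHARES = {
    "administration": 0.20,
    "university": 0.10,
    "port": 0.10,
    "military": 0.30,
}
MAX_BONUS = {"administration": 0.05, "university": 0.025}


def construction_residual(admin_bonus=0.0, university_bonus=0.0):
    """Share left for construction after the fixed shares and capped bonuses."""
    admin_bonus = min(admin_bonus, MAX_BONUS["administration"])
    university_bonus = min(university_bonus, MAX_BONUS["university"])
    fixed = sum(BASE_SHARES.values()) + admin_bonus + university_bonus
    return 1.0 - fixed


print(round(construction_residual(), 3))             # no bonuses applied
print(round(construction_residual(0.05, 0.025), 3))  # both bonuses at their caps
```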
State aptitude scoring varies by building type with randomization within tiers. States at the same aptitude level are interchangeable, so the AI builds in different places each game — trading a small amount of optimality for geographic diversity.
Resource and agriculture protection adds a strategic twist: if a state has potential for critical resources (rubber, oil, coal, iron) or luxury crops (tea, coffee, dye), building a non-critical good there is penalized. This prevents the AI from filling mining states with gold mines when iron deposits are still available.
Branching adds a second dimension: within each aptitude level, states are filtered by incorporation status, infrastructure headroom, and workforce availability into four branches. Branches are interleaved across aptitude levels (A1B1, A2B1, A1B2, A2B2...), so branch quality takes priority over aptitude quality: an aptitude-2 state with ideal conditions builds before an aptitude-1 state with poor conditions. Within the same branch group, aptitude order is preserved. "Right conditions in a decent location" beats "great location with wrong conditions," and the interleave makes this a structural guarantee, not a soft preference.
The aptitude × branching system is a heuristic approximation of what a utility system with geographic considerations would compute. Script can't evaluate and rank a multiplicative score across every candidate state, so the joint optimization gets decomposed into two sorting keys (aptitude, branch) and interleaved. This produces a coarse, tiered ordering with ties inside each tier rather than a fine-grained ranking, but it prevents the catastrophic failure of building a coal mine in a state with no workers.
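The interleave amounts to sorting the (aptitude, branch) groups with branch as the primary key and aptitude as the secondary key. A minimal sketch, using two levels of each for brevity and leaving out the randomization within tiers:

```python
from itertools import product


def build_order(aptitudes, branches):
    """Order (aptitude, branch) groups with branch first, then aptitude."""
    groups = product(aptitudes, branches)
    return sorted(groups, key=lambda g: (g[1], g[0]))


order = build_order([1, 2], [1, 2])
labels = [f"A{a}B{b}" for a, b in order]
print(labels)  # ['A1B1', 'A2B1', 'A1B2', 'A2B2']
```

Because branch is the leading key, every group in branch 1 is exhausted before any group in branch 2 is considered, regardless of aptitude.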
When multiple building types end up at the same priority, the order attribute breaks the tie and encodes dependency chains:
1 Construction Sector
2 Government Administration
3 Railway, Port
4 Oil Rig, Tooling Workshops, Power Plant
5 Logging Camp, Whaling Station, Coal Mine, Iron Mine, ...
6 Lead Mine, Rubber Plantation, Paper Mills, ...
7 Glassworks, Motor Industry, Arms Industry, ...
...
15 Tea Plantation, Coffee Plantation, Arts Academy
Construction capacity first (need points to build anything), then administration (need bureaucracy), then infrastructure (need to move goods), then tools and power (inputs for everything downstream), then raw materials, then heavy industry, then military, then grain, then luxury goods last. The bootstrapping problem is mostly about avoiding catastrophic mis-sequencing, not finding the optimal path. A static heuristic tiebreaker gets the broad sequencing right while dynamic priority handles country-specific adjustments.
This is a manual linearization of a dependency DAG — what a planning system would solve by searching for a valid build order given current state. The static order attribute can't adapt: if a country has abundant tooling workshops, the order still tells it to build more before moving to the next tier. The modder accepts this rigidity because writing a dependency-graph traversal in a language without recursion, dynamic data structures, or graph representations is not feasible. The order numbers are a hardcoded topological sort, and they work because the dependency structure is relatively stable across countries.
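The tiebreak can be sketched as a two-key comparison: dynamic priority first, static order second. The order numbers below come from the list above; the snake_case identifiers and the convention that a higher priority number means more urgent are assumptions of this sketch.

```python
# Order values taken from the list above; identifiers are illustrative.
ORDER = {
    "construction_sector": 1,
    "government_administration": 2,
    "railway": 3,
    "port": 3,
    "tooling_workshops": 4,
    "coal_mine": 5,
    "tea_plantation": 15,
}


def pick_next(priorities):
    """Pick the most urgent building type; ties fall back to the static order.

    `priorities` maps building name -> dynamic priority, where a higher
    number is assumed to mean more urgent.
    """
    return min(priorities, key=lambda b: (-priorities[b], ORDER[b]))


tied = {"tea_plantation": 50, "coal_mine": 50, "railway": 50}
print(pick_next(tied))  # railway: lowest order number among the tied types

untied = {"tea_plantation": 60, "railway": 50}
print(pick_next(untied))  # tea_plantation: dynamic priority still wins
```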
With 49+ building types × multiple countries × multiple states, the naive approach is prohibitively expensive.
Data packing stores multiple values in single integers by digit position — roughly 5x memory reduction. The trade-off is safety: if any value exceeds its digit range, it corrupts adjacent cells silently. A C++ struct would fail a type check; a packed integer silently produces a wrong priority.
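A small sketch shows both the packing scheme and the silent corruption it risks. The two-digits-per-value layout is an assumption chosen for readability; the mod's actual digit widths are not given in the text.

```python
SLOT = 100  # each value gets two decimal digits (0-99); an assumed layout


def pack(values):
    """Pack values into one integer, first value in the lowest digits."""
    packed = 0
    for v in reversed(values):
        packed = packed * SLOT + v  # no range check, mirroring the script
    return packed


def unpack(packed, count):
    """Read `count` two-digit values back out, lowest digits first."""
    out = []
    for _ in range(count):
        packed, v = divmod(packed, SLOT)
        out.append(v)
    return out


good = unpack(pack([12, 7, 99]), 3)
bad = unpack(pack([120, 7, 99]), 3)  # 120 does not fit in two digits
print(good)  # [12, 7, 99]
print(bad)   # [20, 8, 99]: the overflow leaked into the neighbouring value
```

Nothing raises an error in the second case: the out-of-range value loses its leading digit and the adjacent value is incremented, which is exactly the kind of wrong priority that is hard to trace back.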
Code generation: A JavaScript toolchain produces repetitive Paradox script: 3,000 script values for data extraction, a 999-case switch for technology progress, state filtering across 40 combinations. When you can't write "for each building in country," you write a program that writes it out for you, emitting 49 nearly identical script blocks. The meta-layer itself is the real AI architecture; the generated script is its compiled output.
Compatibility patches: 200 reserved slots with JavaScript generators and a GitHub issue tracker for ID registration. The compatibility system exists because new buildings are invisible to the AI without explicit registration (§1.4). When your target language lacks expressiveness, build a meta-layer that generates code.
Not everything is replaced. The three NAI defines disable only construction, tax management, and government spending; consumption tax AI, authority spending AI, and autonomous investment pool construction remain active. The replace strategy doesn't mean replace everything. It means replace what's broken enough to justify the maintenance cost and what script can express. Consumption tax AI is left active because the modder can't easily observe it from script. Autonomous investment pool construction is left active because it represents private-sector decisions.
Not everything should be replaced, either. Production method selection is handled by vanilla AI with added ai_value weights (administration: 50,000 → 100,000 → 200,000 → 400,000), and some options get ai_value = 0 to block risky upgrades. The vanilla strike event chain is overridden so AI always breaks strikes rather than negotiate — negotiating creates economic promises the AI might not follow through on, leaving lingering negative modifiers. No gameplay values are changed — only AI guidance is added or decisions overridden.
AI-vs-AI wars where both sides reach 0 war support can persist indefinitely. The fix is a stalemate counter that advances every 30 days through 24 levels (~2 years). At level 24: secessionists win secessionist wars, higher-population side wins revolutionary wars, other wars get white peace. Two years of patience with type-aware resolution feels like the world working itself out rather than an arbitrary coin flip.
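The counter can be sketched as a tiny state machine. The outcome labels and war-type strings are illustrative names, and resetting the counter when the stalemate condition stops holding is an assumption; the text does not say what happens in that case.

```python
MAX_LEVEL = 24   # levels before forced resolution
TICK_DAYS = 30   # days between counter advances


def resolve(war_type):
    """Type-aware outcome once the counter reaches its final level."""
    if war_type == "secession":
        return "secessionists_win"
    if war_type == "revolution":
        return "higher_population_side_wins"
    return "white_peace"


def tick(level, both_at_zero_support, war_type):
    """Advance one 30-day step; return (new_level, outcome or None)."""
    if not both_at_zero_support:
        return 0, None  # assumption: the counter resets when the stalemate breaks
    level += 1
    if level >= MAX_LEVEL:
        return level, resolve(war_type)
    return level, None


level, outcome, days = 0, None, 0
while outcome is None:
    level, outcome = tick(level, True, "conquest")
    days += TICK_DAYS
print(level, days, outcome)  # 24 720 white_peace
```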
Knowing when to shrink is as important as knowing when to build. Government building downsizing is gated on bureaucracy_load >= 0.75 (shrinking government infrastructure while overloaded is counterproductive) and uses a graduated health-tier structure: progressively deeper cuts as budget health declines from +1 through -3. A slightly declining budget trims the most obvious excess; a fiscal crisis cuts deep.
Production building downsizing tracks "abandoned" buildings: when occupancy drops below 40% and doesn't recover over six iterations (~147 days), the building is removed. Production downsizing is blocked under laissez-faire law.
Military threat assessment calculates building targets by comparing against the global landscape: the top 6 countries by army power and top 6 by navy, averaged for a "typical threat" value, then converted to required building counts. A Newton's method square root implementation handles population-to-military-strength curves — necessary because Paradox script has no sqrt().
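Both pieces are simple to sketch outside the scripting language. The fixed iteration count stands in for the unrolled steps a script without loops-until-convergence would need; the count of 30 and the starting guess are choices made for this sketch, not the mod's values.

```python
def newton_sqrt(x, iterations=30):
    """Approximate sqrt(x) with a fixed number of Newton steps."""
    if x <= 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)  # Newton update for f(g) = g*g - x
    return guess


def typical_threat(powers, top_n=6):
    """Average the `top_n` largest power values."""
    top = sorted(powers, reverse=True)[:top_n]
    return sum(top) / len(top) if top else 0.0


print(newton_sqrt(144.0))
print(typical_threat([10, 20, 30, 40, 50, 60, 70]))  # mean of the six largest
```

A fixed step count converges well for moderate inputs but would need to grow for very large ones, since each early step only roughly halves the guess.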
Technology guidance operates in three modes. Default (Assisted): conditionally redirects innovation toward critical techs (railways, nationalism, key military techs), preserving flexibility. Railroaded: forces a strict tech path, zeroing out natural innovation and manually adding progress to the target technology. Disabled: no intervention. If the redirection modifier is active but the scripted effect fails to fire, the country silently loses all innovation for that iteration.
ARoAI provides 10 game rules, including Power Level (0-100%), Construction scaling (0-200%), Building Priorities (Roleplay/Uniform), Research Assistance (Default/Railroaded/Disabled), autobuild for players, and stalemate prevention.
Default game rules give AI no stat advantages — the AI is smarter, not cheaty. Player autobuild uses the exact same evaluation, priority, and construction logic the AI uses, with per-category toggles. If players trust autobuild with their own economy, that's strong evidence the decision quality is real.
Neither evaluation strategy accounts for construction time; a 52-week building and a 4-week one get the same priority. Two countries in a customs union can both identify the same shortage and queue construction simultaneously, leading to eventual oversupply. Stalemate resolution ignores territorial control. Production downsizing uses an abandonment heuristic (occupancy < 40% over time) rather than a direct profitability check, so chronically unprofitable but fully staffed buildings escape downsizing. Consumption tax revenue is not integrated into budget calculations. The budget cooldown of 35 days prevents oscillation but creates a response gap. And the one-building-type-per-day construction cap limits the AI to at most 12 building types per 28-day iteration.
The lightest-touch approach, embedded in a team project with a communication-first workflow. Every patch came with a detailed public dev diary. Where ARoAI replaces and ACAI coexists, Invictus compensates: many critical systems are hardcoded and can't be replaced, so the mod builds parallel correction layers and fills in absent systems entirely. The architecture reflects the constraint — the modder can't override C++ logic, but they can observe its output and correct it, and they can add new logic where none existed.
Imperator: Invictus is AI work embedded in a team mod with its own vision, codebase, and players. The Imperator economy is simpler than Victoria 3's, making weight-based manipulation viable. Many critical systems (fort placement, ship building, unit movement) are hardcoded and can't be replaced. The architecture reflects the context: replacement isn't always wrong, but this project doesn't need it.
The team context adds a constraint that solo mods don't face: the modder must maintain compatibility with other contributors' work, and every change must be explainable to teammates who may need to maintain it later. This is partly why the dev diary practice emerged — it's a design tool for the team as much as documentation for players. The weight-based approach (tuning existing data rather than replacing systems) also has lower coordination cost: changing invention weights touches one file with clear semantics, while replacing a system requires coordinating with other contributors who may depend on the original behavior.
The patch-by-patch evolution: 1.9.1 laid the foundation with invention and building weight rework. 1.9.2 added law management, character loyalty, state investment, diplomatic stances, and national ideas. 1.10 leveraged the new modifier: syntax. 1.10.1 added roads, fort placement, and integrated everything into the Advanced AI game rule.
When setting AI weights for hundreds of inventions, humans lose track of what the numbers mean. A 2.5x weight difference is negligible in a randomized system choosing from dozens of options. The initial approach of setting "pretty good" to 200 and "very good" to 500 failed because giving the absolute best picks only 5x the weight of average ones doesn't produce reliable selection in a pool with dozens of options. Pushing "very good" above 9,000 worked but left the codebase filled with magic numbers of various magnitudes that no future maintainer could interpret without memorizing the scale.
This is a specific failure of the Clausewitz weight system as a utility function. The engine selects from available options using proportional random selection: probability = weight_i / sum(all_weights). In a pool of 20 options, setting one option to weight 200 while the rest average 50 gives that option only ~17% selection probability. The weight system's proportional-random selection means that meaningful preference differences require extreme weight ratios. An IAUS would handle this with response curves that naturally map importance to [0,1] and multiply considerations — the scoring is deterministic, the selection is greedy (pick the highest), and no weight inflation occurs. The Clausewitz weight system is a degraded utility function: it supports additive weighting and random selection, but not multiplicative scoring, deterministic selection, or response curves. The modder must work within this degraded framework.
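The ~17% figure is just the proportional-selection formula applied to that pool, as a quick check shows:

```python
def selection_probability(weights, index):
    """Chance of picking `weights[index]` under proportional random selection."""
    return weights[index] / sum(weights)


# One option at weight 200 and nineteen others at weight 50.
pool = [200] + [50] * 19
p = selection_probability(pool, 0)  # 200 / 1150
print(f"{p:.1%}")  # 17.4%
```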
Invention Relative Chance (IRC): a probability-based scale. IRC 35% means "this option has a 35% chance of being chosen if all others are at base weight." The formula converts a desired probability into the extreme weight ratio needed to achieve it in a proportional-random system: IRC 35% = weight 10, IRC 75% = weight 57, IRC 95% = weight 361. The 95% option is 35x more likely than the 35% option — a difference that would require typing ~9,000 as a raw weight.
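The text does not give the conversion formula, but the three quoted values are reproduced by inverting the proportional-selection formula under one assumption: the option competes against 19 others that each sit at base weight 1. The sketch below is that inference, not the mod's confirmed implementation.

```python
def irc_weight(probability, competitors=19, base_weight=1.0):
    """Weight needed to be picked with `probability` against base-weight rivals.

    Solves p = w / (w + competitors * base_weight) for w. The default of
    19 competitors at base weight 1 is an assumption inferred from the
    quoted values.
    """
    odds = probability / (1.0 - probability)
    return round(odds * competitors * base_weight)


for p in (0.35, 0.75, 0.95):
    print(p, irc_weight(p))
```

Because the weight grows with the odds p / (1 - p) rather than with p itself, each step up the scale costs far more raw weight than the last, which is why hand-picked raw weights drifted into the thousands.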
A self-documenting name like irc_35 immediately tells any maintainer the tier of importance. The probability framing prevents weight inflation and keeps values anchored to a meaningful scale. This is a human-factors solution to a language-design problem: the weight system doesn't support meaningful scales natively, so the modder builds a semantic layer on top of it.
The building system uses a layered sequence: non-cities first (mines, farming settlements, slave estates for trade goods and income), then ratio buildings in cities (Academy for nobles, Court of Law for citizens, Forum for freemen, Mill for slaves — matched to territory culture rights), then modifier buildings (Library, Training Camp, Market, Tax Office), then unique buildings conditionally (Foundry where trade goods are expensive, Great Temple for religious conversion, Grand Theatre for assimilation).
This is another manual linearization of a dependency DAG, the same pattern as ARoAI's order attribute. Ratio buildings (which convert pops into output types) should precede modifier buildings (which enhance existing output), because a Library's research bonus is wasted if there are no Academies producing research to enhance. The modder hardcodes the topological sort as a layered sequence. The conditional logic on each layer (e.g., "don't build Academy if research efficiency is high") provides enough adaptability to handle common cases.
Once research efficiency is high enough (checkable after patch 2.0.5 via modifier:), the AI stops building Academies and switches to Forums and Mills — preventing the failure where small AI countries filled every city with research buildings while starving for manpower and taxes. Farming settlements receive very high priority in states at risk of starvation. Port management limits ports to 1-2 per state, deletes excess ports, and downscales intermediate-level ports to level 1 (the building slot cost isn't justified).
Instead of weighting individual inventions, weight the entire tree leading to important targets. Priority layers: discipline inventions (directly determining combat effectiveness), national tax inventions (revenue is the lifeblood of expansion), then secondary priorities (economic, siege capability, culture-specific trees), then lower priority (diplomatic, navy).
Within each layer, weights are close enough that randomness creates variety between AI countries — every AI knows discipline is important, but some focus military first while others prioritize economy. The IRC scale enables this: IRC 75% vs IRC 65% creates meaningful preference without being deterministic.
Several game systems had no AI logic at all — not underperforming, just absent. These are the purest examples of what the modder does when the constraint isn't "the AI is broken" but "there is no AI." Standard game AI architectures don't have this category: a C++ AI architect would never ship a system without at least a default behavior. The modder encounters it because Paradox shipped features with player-facing mechanics but no AI to use them, and the modder fills the gap.
Patch 2.0.5's modifier: syntax enabled precise governor policy selection. The system factors in character traits: Corrupt governors prefer Acquisition of Wealth over Encourage Trade, Merciful governors avoid Harsh Treatment, and disloyal governors refuse expected policies. Regions needing Religious Conversion or Cultural Assimilation require governors of the country's religion and primary culture.
This is a decision-tree implementation of what a utility system would express as: utility(policy) = f(country_benefit, governor_personality, province_needs). The modder encodes the major interaction effects as branching conditions rather than continuous scores — the same pattern as Stellaris's building logic. The modifier: syntax made this possible; the entire feature was data-blocked until the engine exposed the data.
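To make the "branching conditions rather than continuous scores" point concrete, here is a small Python sketch of the decision-tree shape. The policy names are the ones mentioned above. The dictionary keys, the order of the branches, and the `pick_policy` function are hypothetical and only illustrate the structure, not the mod's actual rules.

```python
def pick_policy(governor, region):
    """Choose a governor policy with ordered branches instead of scores.

    governor: {"traits": set of str, "state_religion": bool}
    region:   {"needs_conversion": bool, "unrest_high": bool}
    """
    # Province needs come first, but only if the governor qualifies.
    if region["needs_conversion"] and governor["state_religion"]:
        return "Religious Conversion"
    # Unrest gets a harsh response unless the governor is merciful.
    if region["unrest_high"] and "merciful" not in governor["traits"]:
        return "Harsh Treatment"
    # Economic default, bent by personality.
    if "corrupt" in governor["traits"]:
        return "Acquisition of Wealth"
    return "Encourage Trade"
```

Each branch captures one interaction between the three inputs of f. Anything not covered by a branch falls through to the default, which is where a continuous score would have produced a graded answer instead.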
The same syntax enables economic policy decisions based on actual conditions: Free Trade only when exports justify it, and Harsh Taxation for countries over maximum research efficiency (shifting from research to revenue at diminishing returns). It also makes mercenary recruiting budget-aware: the recruiting algorithm checks budget limits and physical reachability.
Road building: Built from scratch because the game had no road AI. Three phases: inter-region connections for levy delivery, dense province-level networks, then city-to-city roads. This is the most architecturally novel feature in any of these three projects because it requires implementing pathfinding in a language that can't run graph search. The modder enumerates connections as a flat sequence of conditional checks — effectively unrolling Dijkstra's algorithm into scripted effects. This is only feasible because the Imperator map is finite and the road network is sparse; a market simulation with thousands of goods-flow paths would be impossible to express this way.
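One way to picture the "unrolled" approach is to separate the search from the runtime check. In the Python sketch below, Dijkstra runs once in a language that can express it, and the result is flattened into a fixed list of segments that a script could test one by one. The source does not say how the mod's list was produced, so the offline step, the toy graph, and all function names here are assumptions.

```python
import heapq

def shortest_path(graph, start, goal):
    """Plain Dijkstra over {node: {neighbor: cost}}; returns the node list."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, step in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None

def unroll(path):
    """Flatten a path into the fixed list of segments a script would check."""
    return list(zip(path, path[1:]))

def next_segment(segments, built):
    """Runtime side: no search, just the first listed segment not yet built."""
    for seg in segments:
        if seg not in built:
            return seg
    return None

# Hypothetical four-node map with symmetric road costs.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
SEGMENTS = unroll(shortest_path(GRAPH, "A", "D"))
```

Only `next_segment` resembles what the script does at runtime: a flat walk over precomputed conditions. The list grows with every connection enumerated, which is why the approach depends on a finite map and a sparse network.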
Fort optimization: A corrective layer over hardcoded vanilla logic. Vanilla AI was meant to match forts with province capitals but couldn't enforce it or adapt after territorial shifts. The workaround: reposition forts after vanilla places them — matching province capitals, prioritizing border provinces, maintaining higher-level forts in capitals, and deliberately exposing inner provinces.
This is the compensate pattern at its clearest: the modder can't prevent vanilla from making a bad placement, but they can detect the result and fix it. The cost is that vanilla's bad placement consumes resources for one tick before the correction fires; the benefit is resilience — the correction logic is simpler than a full replacement and automatically handles correct placements by doing nothing.
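The "handles correct placements by doing nothing" property is what makes the correction safe to run repeatedly. Here is a minimal Python sketch of that detect-and-fix shape. It covers only the capital-matching rule, and the data layout and `correct_forts` name are hypothetical.

```python
def correct_forts(provinces):
    """Move a misplaced fort to the province capital after vanilla has placed it.

    provinces maps a province name to
    {"capital": territory name, "forts": set of territory names}.
    Returns the list of (province, moved_from, moved_to) corrections made.
    """
    moves = []
    for name, prov in provinces.items():
        if not prov["forts"] or prov["capital"] in prov["forts"]:
            continue  # nothing placed, or placement already correct: do nothing
        misplaced = sorted(prov["forts"])[0]
        prov["forts"].remove(misplaced)
        prov["forts"].add(prov["capital"])
        moves.append((name, misplaced, prov["capital"]))
    return moves
```

Running it a second time on the same state returns an empty list, so the pass can fire every evaluation cycle without needing to know whether vanilla acted in between.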
When AI tribes learned to found cities and enact laws, they started abusing special low-requirement reform decisions that existed because the AI couldn't manage standard requirements. Once the AI could actually play, the training wheels became exploits — AI tribes mass-reformed to monarchies within 100 years. The fix: force AI tribes to meet the same requirements as players. Similarly, once proper loyalty management existed, vanilla's cheating on political influence costs was removed.
Patch 2.0.5's modifier: syntax unlocked new decisions: research efficiency became queryable (the AI could stop over-building Academies), governor policy selection could use precise loyalty values, and mercenary recruiting could check actual maintenance costs. Each was a known problem with solutions waiting for the data.
The "Advanced AI" game rule is a modular opt-in system: mercenary recruiting, internal development (higher civil war resistance, manual pop movement, heavy province investment), road building, and fort optimization. Base improvements (invention weights, building logic, law management) ship by default. Default game rules give AI no stat advantages — the AI is smarter, not cheaty.
AI behavior is invisible. Players can't tell why the AI did something, teammates don't know what assumptions the code relies on, and the next maintainer won't understand the design intent. Every Imperator: Invictus update came with a detailed dev diary explaining the design reasoning, the observed problems, and honest assessment of what still doesn't work.
Writing the explanation is a design tool. "I can't explain why this weight is 75% instead of 65%" signals that the choice is arbitrary. The dev diaries also serve as a historical record: when a later patch changes behavior, the original diary explains the original reasoning, preventing new contributors from "fixing" something that was deliberately designed that way.
Strategy game AI modding is not about making the AI play like a human. It's about making the AI stop playing against itself. Coherence beats cleverness. Robustness beats optimality.
Standard game AI uses architectures that compute, search, or plan. Utility systems score every candidate action against response curves and pick the best. Planning systems search for action sequences that achieve goals. HTNs decompose compound tasks along predefined paths. Behavior trees structure reactive priorities. Each architecture has requirements: arbitrary functions, symbolic state representations, search algorithms, or at minimum, functions that return values. Strategy games rarely use formal planning — the combinatorial explosion of the action space makes it impractical — and instead rely on prioritized reactive logic: evaluate current needs, pick the most urgent action, repeat.
The Clausewitz/Jomini scripting language provides none of these. It has events, triggers, effects, and weights. It can test conditions and fire actions. It cannot compute a utility score, search a graph, decompose a task, or plan a sequence. The gap between what the theory demands and what the language permits is where every interesting design decision in these three projects lives.
Each project found a different crossing point. Stellaris/ACAI coexists with vanilla AI, building a middleware layer — cheap architecturally, but accumulating archaeological layers and fighting a constant rearguard action. ARoAI replaces vanilla entirely, coming closest to a proper utility system (market supply-vs-demand as a response curve, weight/offset as decomposed utility parameters) — but paying with 49 individually hardcoded buildings, a code-generation toolchain, and a packed-integer data scheme that corrupts silently on overflow. Imperator/Invictus compensates, building correction layers over hardcoded logic and filling absent systems from scratch — the lightest touch with the heaviest impact per line of code, and the only project that treats communication with future maintainers as a first-class design concern.
The recurring pattern across all three projects is approximation with known losses. Each principle from §2 trades something: quantized thresholds lose smooth trade-offs, static weights lose bidirectional feedback, hardcoded build sequences lose adaptability, decision trees lose continuous scoring. The losses compound. When ARoAI's weight/offset system — the closest any project gets to utility scoring — decomposes a two-consideration evaluation into additive parameters, a high-urgency building claims a construction slot and then fails the productivity gate, wasting the slot. A utility system would never have scored it highly in the first place. The language doesn't just limit architectural choices; it creates failure modes that the architectures it can't express would avoid entirely.
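The wasted-slot failure can be reproduced in a few lines. The Python sketch below compares ranking by urgency with a productivity gate applied afterwards against a multiplicative score in which zero productivity vetoes the candidate up front. The building names and numbers are invented for illustration and are not ARoAI's actual parameters.

```python
# Hypothetical candidates: one urgent but unproductive, one modest but viable.
CANDIDATES = [
    {"name": "steel_mill", "urgency": 0.9, "productivity": 0.0},
    {"name": "farm", "urgency": 0.4, "productivity": 0.4},
]

def rank_then_gate(candidates):
    """Pick by urgency alone, then apply the productivity check afterwards.

    Returns None when the chosen building fails the gate: the slot is wasted.
    """
    top = max(candidates, key=lambda c: c["urgency"])
    return top["name"] if top["productivity"] > 0 else None

def multiplicative(candidates):
    """Utility-style product: zero productivity vetoes the candidate up front."""
    top = max(candidates, key=lambda c: c["urgency"] * c["productivity"])
    return top["name"]
```

With these inputs, `rank_then_gate` selects the steel mill and then rejects it, while `multiplicative` never ranks it first and returns the farm.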
Two constraints bind all three projects more tightly than any architectural choice. Performance: script runs on the main game thread, every empire every tick, deterministic lockstep in multiplayer. This forces staggered evaluation schedules and deliberate idle periods. Data accessibility: the AI can only decide based on what the script can read. Victoria 3 can't read building definitions, so the modder hardcodes 49 buildings by hand. Stellaris can't read the ship build queue, so the modder emulates it with six flags. Imperator couldn't check research efficiency until patch 2.0.5 shipped a new syntax. When the engine exposes new data, entire categories of improvement become possible overnight.
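A staggered schedule can be as simple as bucketing empires by an index. The Python sketch below assumes integer empire IDs and modulo bucketing. The text says only that evaluation is staggered, so the mechanism and the `due_this_tick` name are assumptions.

```python
def due_this_tick(empire_ids, tick, period):
    """Return the empires whose turn it is; each runs once every `period` ticks."""
    return [e for e in empire_ids if e % period == tick % period]
```

With ten empires and a period of five, each tick evaluates two empires instead of ten. The rule depends only on the tick counter and the ID, so every client in a lockstep multiplayer game computes the same schedule.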
The fleet repair system that freezes damaged ships with +150% hull regen is crude, but it prevents half-destroyed fleets from charging back into combat. Law management, character loyalty, national idea selection, and road building in Imperator weren't bad systems needing fixing; they were absent systems needing creation. The IRC system's core insight, that humans can't reliably assign raw weights across hundreds of options without a probability-based scale, transforms a maintenance nightmare into a self-documenting system.
And the dev diary that explains why the AI builds Forums instead of Academies after reaching the research cap is as valuable as the code that implements it. Without the explanation, the next person to touch that code will "fix" it back to Academies, not because they're wrong about what Academies do, but because they don't know what the current code is for.