Anbeeld

Designing AI for Strategy Games Through Modding

How I built 3 massive AI mods for Paradox grand strategy games (Stellaris, Victoria 3, Imperator: Rome) in a scripting language that doesn't even have arrays.

Vanilla Victoria 3 AI vs Anbeeld's Revision of AI
  1. The Problem Space
    1. What game AI looks like when you have the right tools
    2. What Paradox script can actually do
    3. Why vanilla Paradox AI fails, and why modders can't simply fix it
    4. Data accessibility as the binding constraint
  2. Design Philosophy
    1. Quantize the world into stable decisions
    2. Distribute computation over time
    3. Active planner + static override alignment
    4. Specialize over generalize
    5. Choose your relationship with vanilla: coexist, replace, or compensate
    6. Bypass broken systems by simulating their effects
    7. Target actual failure modes, not abstract optimization
  3. Stellaris: Anbeeld's Custom AI
    1. Building on inherited work
    2. Giving every subsystem the same picture of the world
    3. Taking over productive decisions
    4. One high-level decision, propagated everywhere
    5. Classification first, then build
    6. Making AI players feel different from each other
    7. Breaking mid-game stagnation
    8. Making diplomacy strategic instead of sentimental
    9. Duct tape for specific failure modes
    10. Country-type rewrite and startup bootstrap
  4. Victoria 3: Anbeeld's Revision of AI
    1. Deciding to replace vanilla entirely
    2. Using the best available signal for each domain
    3. Gating expansion on real-world performance
    4. Designing smooth degradation instead of cliff edges
    5. Deciding where to build, not just what
    6. Solving the cold start: implicit bootstrapping through build order
    7. Engineering around language limits
    8. What vanilla systems remain active
    9. Breaking infinite wars: patient stalemate resolution
    10. Building downsizing: graduated retreat instead of all-or-nothing
    11. Military threat assessment and technology guidance
    12. Configurability and validation through shared logic
    13. Known limitations and design trade-offs
  5. Imperator: Invictus
    1. Working within someone else's project
    2. Human-readable AI weights: the relative chance system
    3. Structured building decisions: sequencing over optimization
    4. Guiding research without scripting it: trees, not picks
    5. Building systems that simply didn't exist
    6. Governor policy and economic policy: character-driven decision making
    7. Road building and fort optimization: working around hardcoded logic
    8. Removing crutches and unlocking new decisions
    9. Difficulty through smarter play, not bigger numbers
    10. Communication as part of the design process
  6. Comparison with other AI mods
    1. Stellaris: StarNet AI and Sovereign AI
    2. Victoria 3: Kuromi's AI and Smarter AI
  7. Conclusion
    1. The gap between theory and language
    2. Three crossing points
    3. Approximation with known losses
    4. The binding constraints
    5. What worked
    6. Design philosophy

Case studies

  1. Anbeeld's Custom AI (ACAI) — Stellaris (personal project for a single 2.1.3 legacy version)
  2. Anbeeld's Revision of AI (ARoAI) — Victoria 3 up to 1.3.6
  3. AI in Imperator: Invictus — Imperator: Rome (patches 1.9.1 through 1.10.1)

1. The Problem Space

1.1 What game AI looks like when you have the right tools

Game AI is not machine learning. It's architecture: a system that reads world state, evaluates options, and issues commands every tick, through the same interface a human uses but via code.

When game studios build AI with full access to the codebase, they pick from a known menu of architectures, each with different strengths and computational demands.

The common thread is that all of these architectures require something Paradox script typically doesn't provide: arbitrary computation, symbolic state representations, search algorithms, or at minimum, functions that return values. When you have C++ and a reasonable CPU budget, you choose the architecture that fits the problem. When you have a Paradox script file, you choose the architecture that fits the language.

1.2 What Paradox script can actually do

Clausewitz script (Clausewitz is the engine behind all Paradox titles; newer games call it Jomini, which is the same engine at a different version) has four primitives: events (fire on schedules or conditions), triggers (boolean tests of game state), effects (mutate game state), and data weights (influence the hardcoded AI's choices). Variables store numbers or booleans on scoped objects. Flags are boolean markers. Modifiers are persistent stat changes. Script values compute a number with left-to-right arithmetic (no operator precedence, 5 decimal places max), which makes them functions that return a numeric value. Scripted triggers return booleans. But scripted effects, the workhorse of AI logic and the part that changes game state, cannot return values.
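To make the "left-to-right arithmetic, 5 decimal places" point concrete, here is a minimal Python sketch (not Paradox script) of how such a chain evaluates. The operator names and the choice to truncate after every step are assumptions for illustration; the engine's exact rounding behavior is not specified here.

```python
from decimal import Decimal, ROUND_DOWN

STEP = Decimal("0.00001")  # five decimal places, as described in the text


def script_value(start, ops):
    """Apply (operator, operand) pairs strictly left to right."""
    acc = Decimal(str(start))
    for op, operand in ops:
        x = Decimal(str(operand))
        if op == "add":
            acc += x
        elif op == "subtract":
            acc -= x
        elif op == "multiply":
            acc *= x
        elif op == "divide":
            acc /= x
        else:
            raise ValueError(f"unknown operator: {op}")
        # Assumption: precision is cut after every step, by truncation.
        acc = acc.quantize(STEP, rounding=ROUND_DOWN)
    return acc


# "2 + 3 * 4" read left to right is (2 + 3) * 4, not 2 + (3 * 4).
chained = script_value(2, [("add", 3), ("multiply", 4)])

# "1 / 3 * 3" loses everything past the fifth decimal place.
lossy = script_value(1, [("divide", 3), ("multiply", 3)])
```

Under these assumptions, the order of operations inside a script value is part of the formula itself, and long chains accumulate truncation error.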

What's missing: no arrays, no structs, no generic runtime introspection, no exception handling. Both Clausewitz and Jomini have bounded while loops (capped at 1000 iterations) and prebuilt iteration blocks (every_country, random_owned_planet, ordered_*) that iterate over specific scope types with built-in weight/order support. Jomini games also have variable lists for dynamic collection building. The limitation isn't that you can't iterate, but that you can only iterate over the specific collections the engine exposes (every_owned_starbase, every_tile, random_tile, etc.), not arbitrary or dynamically composed collections like "all building types in the game" or "all candidate actions".

The language provides hundreds of named triggers for specific game-state queries (like has_building, treasury > X, is_at_war, modifier:research_efficiency, scale_supply_demand), but these are specific and only available in certain code blocks. You can't enumerate all modifiers on a scope, discover available operations, or query arbitrary object structure at runtime. The actual constraint is compositional: the language is terrible for complex constructions.

Utility scoring is possible in limited contexts: you can compute weights inside iteration blocks and use random_/ordered_ effects with built-in weight and order parameters. But building a proper multi-candidate scoring loop that evaluates every candidate, sorts, and picks the best remains impractical. You can multiply dynamically computed values inside a single script value, and with variable lists and ordered_ effects you can build a target list, score each candidate via script values, and pick the highest or lowest. So a full utility scoring loop is technically possible. The problem is practical, not categorical: data access is limited (you can't query arbitrary building properties at runtime), math operations are slow and constrained (no operator precedence, 5 decimal places, no built-in exponentials or square roots), and composing multi-consideration scoring across many candidates in a language without functions that return values is clumsy enough that hardcoding the scoring for each building into its own evaluation trigger remains the better engineering choice.
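The shape of that trade-off can be sketched in Python (not Paradox script): one hand-written scorer per candidate, standing in for a per-building evaluation trigger, followed by an ordered pick of the best. The state fields, building names, and scoring formulas are all hypothetical.

```python
# Hypothetical world state; field names are illustrative only.
state = {"grain_shortage": 3, "iron_shortage": 7, "treasury": 500}


# One hardcoded scorer per candidate, because the candidate's properties
# cannot be looked up at runtime.
def score_farm(s):
    return s["grain_shortage"] * 10


def score_iron_mine(s):
    # A zero score acts as a veto, e.g. when the treasury is too low.
    return s["iron_shortage"] * 10 if s["treasury"] >= 200 else 0


SCORERS = {"farm": score_farm, "iron_mine": score_iron_mine}


def pick_best(s):
    """Build the candidate list, score each entry, and take the highest."""
    scored = [(name, scorer(s)) for name, scorer in SCORERS.items()]
    scored = [entry for entry in scored if entry[1] > 0]
    if not scored:
        return None
    return max(scored, key=lambda entry: entry[1])[0]


best = pick_best(state)
```

Adding a new building means adding a new scorer by hand, which is the maintenance cost the paragraph above describes.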

Script runs on the main game thread, for every empire, every tick. A poorly designed AI mod can halve game speed. The game is deterministic lockstep in multiplayer (any non-determinism causes out-of-sync errors), so there's no room for threading, caching strategies that might diverge, or probabilistic shortcuts. The only execution model is the daily/monthly pulse system: on_action callbacks that fire at regular intervals for every scoped object. Performance is a first-class design constraint. Time distribution, idle periods, selective data collection, and data packing all exist because of this.

The language shapes the design as much as the game does: you design an AI that script can express efficiently, not the ideal AI compressed into script.

1.3 Why vanilla Paradox AI fails, and why modders can't simply fix it

Paradox games are deeply systemic. Economy, diplomacy, military, and internal politics all interact: building a factory affects employment, which affects political support, which affects what laws you can pass, which affects what factories you can build. Standard game AI handles this kind of cross-domain coupling through one of several mechanisms: a shared blackboard (all subsystems read/write a common data store), a command hierarchy (a strategic layer delegates goals to subsystems), or shared utility scoring (one evaluation function naturally produces coherent trade-offs).

Vanilla Paradox AI has none of these accessible to the script layer. It is split between hardcoded C++ (unmoddable) and data weights (moddable but limited). The C++ layer runs its own decision loops, and those subsystems likely share state internally; the building AI and the military AI presumably exchange information through C++ data structures, and the overall architecture is functional at the C++ level. What you see is the interface: limited weight adjustments through defines and files like ai_strategies, and the observable output. The script can influence C++ behavior indirectly through these exposed parameters, but can't rewire the logic between subsystems or add entirely new behavior.

The coherence problem isn't that C++ subsystems never communicate. It's that whatever coordination exists isn't enough, the exposed weights can't steer it as strongly as I want, and the script layer can't supplement it. The result is the behavior that multi-agent theory predicts when subsystem coordination is underdeveloped: the building AI doesn't know what the military AI is doing, and neither knows what the economy needs. Each subsystem operates in its own bubble, making locally rational decisions that are globally incoherent, or at least, that's what it looks like from outside the C++ layer.

Common failures are observable: builds randomly without regard to market conditions, ignores shortages until the economy collapses, fleets don't repair after battles, transports get stuck with invalid orders, wars stagnate into decades-long stalemates, empires bankrupt themselves building defense platforms they can't afford, tribes never enact laws or reform their government, AI countries never build roads or manage character loyalty.

These failures are partly architectural but mostly developmental. Paradox's AI is permanently behind in priority compared to DLC content and mechanics. The technology and architecture are there, and likely include adequate foundations, but fine-tuning AI is expensive work, and sometimes entire game actions are left unavailable to AI because the team didn't have time to write the code. The design is complex enough that not every programmer can work on it independently. Paradox also balances AI efficiency against roleplay: sometimes an AI that plays perfectly would produce a less interesting game than one that makes human-like mistakes. This work is not fixing a fundamentally broken system but pushing an underdeveloped one further than the developers had time or inclination to take it.

You can't fix this by writing a utility system or a planner in script, not directly and not at scale. A utility scoring loop is technically possible with variable lists and ordered_ effects, and Dijkstra-like pathfinding was implemented for road building, but both required creative use of engine features and significant engineering work that doesn't generalize. The C++ subsystems won't expose their internal state, so a shared blackboard is out. And you can't replace the C++ decision loops, which rules out a command hierarchy.

What you can do is build a parallel system in script that makes its own decisions, and then align the C++ layer's remaining choices through weight overrides. The three projects in this document chose three different points on that spectrum: coexist, replace, or compensate.

The goal isn't to make the AI play optimally, but to make it play coherently: all subsystems pulling in the same direction, producing behavior that looks intentional rather than random. You don't design the ideal AI and then implement it in script. You design an AI that script can express efficiently.

1.4 Data accessibility as the binding constraint

The AI can only decide based on what the script can read. If the scripting language can't query a piece of game state, no amount of clever logic can use it. This is the deepest asymmetry between standard game AI and modded AI: a C++ AI architect can add a new query to the world-state interface. A modder cannot. The AI is bounded not by what it can reason about, but by what it can observe.

Victoria 3: The scripting language cannot query what goods a building type produces, what technology it requires, or what its input/output ratios are. A C++ utility system would read these from the building definition at runtime. I had to manually hardcode each building's properties as scripted triggers and effects: 49 buildings, each with separate evaluation, sanction, and allocation logic, plus 3,000 generated script values for data extraction. When someone adds new buildings, those buildings are invisible to the AI unless someone writes a compatibility patch.

Imperator: Before patch 2.0.5, the script couldn't check research efficiency. The AI couldn't know if a country was at 40% or 175% efficiency, so there was no way to tell it "stop building research buildings, you've overcapped". When modifier: syntax was added, a previously impossible improvement became straightforward. Similarly, governor policy selection couldn't use precise province loyalty change values, and mercenary recruiting couldn't check actual maintenance costs; both became possible with the same syntax change. In standard game AI, adding a new sensor is an engineering task. In modded AI, it requires waiting for the engine team to ship a new syntax.

Stellaris: The script can't read the vanilla ship build queue, so the mod emulates queue occupancy with six busy-slot flags per starbase. It can't detect fleet damage precisely, so repair decisions use coarse health thresholds that get stricter with distance. The fleet upkeep calculation requires estimation: the economy evaluation subtracts an approximated fleet maintenance cost because the script can't query actual upkeep directly.

When new engine capabilities expose new data, entire categories of improvement become possible overnight. Keep a wish list of data-blocked improvements, and you can ship quickly the moment the engine exposes what you were waiting for. But don't just wait: request the capabilities you need. The NAI defines that made ARoAI's replace strategy possible existed because I asked Paradox for them and the game director implemented them. The constraint is real but not immovable.


2. Design Philosophy

These principles emerge across all three projects. Each one can be read as an approximation of a standard game AI architecture, adapted for what Clausewitz script can express and sacrificing something specific in the trade.

2.1 Quantize the world into stable decisions

Approximates: The response curves in a utility system. In Dave Mark's Infinite Axis Utility System, each consideration maps a continuous input to a [0,1] output via a configurable curve: exponential for urgency, inverse for satiation, logistic for threshold behavior. The curve shape determines how the AI trades off competing needs, and multiple considerations multiply together so any zero vetoes the action. You can normalize values with arithmetic (divide by a known maximum, multiply by a scaling factor, use round/ceil for precision), and ARoAI's supply-vs-demand system does exactly this, mapping market imbalance to a 1-22 priority scale.

What the language can't do is apply arbitrary mathematical functions. No exponentials, no logistic curves, no square roots without manually implementing them (ARoAI implements Newton's method for sqrt), no way to write a general response curve that maps an arbitrary input to an arbitrary output shape. You can linearly normalize and threshold; you can't bend the curve. You can also quantize continuous values into discrete categories (low/medium/high) and use those as lookup keys. This is also a sound engineering choice on its own terms: fragile arithmetic chains break in scripting languages; threshold-based decisions are robust and composable. A quantized flag only changes when the underlying value crosses a defined boundary, so a rounding error can't flip a decision permanently.
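As an illustration of what "manually implementing" a square root involves, here is a minimal Python sketch of Newton's method restricted to addition, division, and 5-decimal rounding. The fixed iteration count and the starting guess are assumptions; this is not ARoAI's actual script.

```python
def script_sqrt(value, iterations=20):
    """Approximate sqrt(value) with Newton's method.

    Uses only addition and division, rounds to five decimals after each
    step, and runs a fixed number of iterations instead of testing for
    convergence (an assumption made to keep the loop bounded).
    """
    if value <= 0:
        return 0.0
    # Starting guess: the value itself for inputs >= 1, otherwise 1.
    guess = float(value) if value >= 1 else 1.0
    for _ in range(iterations):
        guess = round((guess + value / guess) / 2, 5)
    return guess


root_of_two = script_sqrt(2)
root_of_144 = script_sqrt(144)
```

A fixed iteration budget limits the input range that converges, so in practice the budget has to be chosen against the largest value the formula will ever see.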

Sacrifices: Smooth, arbitrary response curves. A utility system can express "energy is slightly below comfortable" and produce a proportionally moderate preference for energy-producing actions, with the shape of the response curve determining how quickly preference escalates. ACAI's boolean economy flags lose this entirely: a flag can only say "energy low" or "energy not low". ARoAI's supply-vs-demand system preserves proportional preference (22 levels of shortage/surplus). Imperator's IRC system is itself a utility function: it maps preference to probability (IRC 75% means "75% chance of selection vs base weight"), and could theoretically operate at any granularity (64.3%, not just 60 or 65), but the granularity was chosen for maintainability, not because finer resolution is impossible. What each project sacrifices differs: ACAI loses proportional preference entirely (binary thresholds), ARoAI loses curve shaping (linear steps, not exponentials or logistics), and IRC loses only fine granularity (the discrete steps approximate a continuous preference scale).

A utility system would use exponential curves for urgency (small deficits produce mild preference, large deficits produce overwhelming preference), logistic curves for thresholds (gradual transition around a critical point), or inverse curves for satiation (diminishing returns as need approaches satisfaction). You can stack multiple thresholds to approximate a curve (ACAI uses low/medium/high/extreme), but each additional category costs script variables and evaluation time, and the approximation is piecewise-constant, not smooth.
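A stacked-threshold approximation is easy to picture in Python (not Paradox script). The boundaries below are hypothetical, and the ordering low < medium < high < extreme is an assumption about how the categories are meant to be read.

```python
from bisect import bisect_right

# Hypothetical income boundaries; the real thresholds are not given here.
BOUNDARIES = [10, 30, 80]
LABELS = ["low", "medium", "high", "extreme"]


def quantize(value):
    """Map a continuous value to one of four discrete categories."""
    return LABELS[bisect_right(BOUNDARIES, value)]


# Piecewise-constant: every value inside a band yields the same category,
# so small numeric noise cannot change the decision unless a boundary is crossed.
same_band = quantize(11) == quantize(29)
```

Each extra boundary adds one more comparison and one more flag, which is the per-category cost mentioned above.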

A shared vocabulary of quantized state also approximates a shared blackboard, the inter-subsystem communication mechanism that standard multi-agent architectures use for coherence. The building planner and the fleet producer don't need to agree on what "rich" means in absolute terms because they both check the same flags. This is the blackboard pattern with quantized categories instead of a shared numeric workspace: weaker (no arithmetic on shared values) but sufficient for coherence.

Stellaris: The economy evaluation turns stockpiles and incomes into boolean flags (aai_boolean_energy_income_low, acai_boolean_minerals_income_extreme). Every downstream system reads these same flags. Raw incomes and stockpiles are quantized into discrete steps with a cap (the old engine couldn't expose per-resource data directly, so the mod had to compute proxies from what little was accessible). The building planner, colony reserve system, robot logic, and ship logic all read the same derived flags rather than inventing different ideas of "rich" or "poor".

Victoria 3: Budget health score (-3 to +3) uses two-dimensional curves where the surplus requirement varies with debt level and vice versa. Negative health levels require both weeks_of_reserves < 156 AND either a surplus or debt threshold. Reserves act as override (156+ weeks blocks all negative health) and gate (positive health can be achieved through either sufficient surplus or sufficient reserves). Debt-free countries get easier thresholds that scale with how full their gold reserves are, creating smoother degradation within each quantized level than a simple threshold would.

2.2 Distribute computation over time

Approximates: The replanning cadence of a deliberative architecture. A planner with a finite action space doesn't replan every frame, because that would be computationally prohibitive. It replans when the world changes enough, or on a fixed interval. The approximation is staggering work across game days, creating the appearance of a persistent planner that thinks continuously, when really each subsystem runs infrequently on its own schedule.

Sacrifices: Responsiveness. A planning system would replan immediately when a significant event occurs, whether a fleet is destroyed, a war is declared, or a market crashes. The script must wait for the next scheduled pulse. Victoria 3's 28-day cycle means a sudden economic shock can't trigger construction changes for up to four weeks, because data is only collected on day 1 of each cycle. The idle buffer (days 15-28) is explicitly defensive engineering against cascading delays, but it also means the AI can't use that time even if something urgent happens. Stellaris's economy runs on days 1-7 and ships on days 8-15; a fleet destroyed on day 8 won't trigger economy recalibration until the next month's day 1. This is the fundamental trade-off of fixed-schedule planning: regularity and performance predictability in exchange for latency.

Stellaris: Economy on days 1-7, ships 8-15, robots 16-19, buildings in separate waves. Monthly and yearly cadences for systems that need less frequent updates.

Victoria 3: 28-day iteration cycle per country (minimum 14), with randomized start dates. Day 1 = preparation (data collection, tax/wage management, downsizing, tech redirection), Day 2 = evaluation, Days 3-14 = construction (one building type per day, up to 12 types per iteration), Days 15-28 = deliberately idle buffer. The idle time means a delayed Day 14 doesn't cascade into the next cycle.

2.3 Active planner + static override alignment

Approximates: A command hierarchy. In standard game AI, a strategic layer sets high-level goals and delegates execution to tactical subsystems. A real-time strategy AI might have a commander that decides "expand economy" and a build manager that executes the construction queue. The key property: the commander can observe whether the build manager is succeeding and adjust goals accordingly.

Sacrifices: Bidirectional feedback. The approximation is one-directional: the script sets a strategy (fleet doctrine, construction priority) and the static weights align the vanilla engine with it. But the static weights can't report back. If the script decides "corvette doctrine" and sets component weights accordingly, but the game state has shifted (enemy running point-defense counters), the weights have no way to observe that they're failing and request a doctrine change. They are baked-in biases, not a feedback loop. The architecture also can't handle conflicts between vanilla subsystems that the script doesn't control: if two hardcoded C++ subsystems have competing goals, the mod's static weights can only bias them, not mediate.

Stellaris: Ship doctrine flags (corvette-preference or battleship-preference) assigned at game start and cascaded through every layer. Tech weights, component weights, and ship section layouts all reference the doctrine flag, and 44 tech/component checks align vanilla systems with the mod's strategic choice.

Victoria 3: Production method ai_values guide vanilla PM selection with clear progression (50,000 → 100,000 → 200,000 → 400,000) while three defines disable vanilla construction, tax management, and government spending AI entirely, giving full control to the scripted system.

Imperator: Invictus: Invention Relative Chance (IRC) values set high enough to overcome the randomized selection pool: IRC 95% translates to weight 361, which is 35x more likely than an IRC 35% option. The mod's own scripts handle law management, building decisions, and economic policies that the vanilla AI barely touches, while the static IRC weights ensure the vanilla research system picks reasonable inventions.
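The IRC figures quoted for Imperator can be reproduced with a small Python sketch. It treats an IRC value as the chance of winning against a single base-weight option. The base weight of 19 is an assumption inferred from the quoted numbers (it is the value that makes IRC 95% come out at 361), not something stated in the mod's documentation here.

```python
def irc_to_weight(chance, base_weight):
    """Solve weight / (weight + base_weight) = chance for weight."""
    if not 0 < chance < 1:
        raise ValueError("chance must be strictly between 0 and 1")
    return base_weight * chance / (1 - chance)


BASE_WEIGHT = 19  # assumption inferred from "IRC 95% translates to weight 361"

weight_95 = irc_to_weight(0.95, BASE_WEIGHT)
weight_35 = irc_to_weight(0.35, BASE_WEIGHT)
ratio = weight_95 / weight_35
```

Under that assumption, the ratio between the two weights comes out at roughly 35 to 1, which matches the comparison in the text. The odds term chance / (1 - chance) is why weights grow so steeply near the top of the scale.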

2.4 Specialize over generalize

Approximates: The manual decomposition rules in a Hierarchical Task Network (HTN), a planning architecture that decomposes compound tasks into subtasks using predefined methods rather than searching for plans. An HTN can only find plans the author anticipated: "Assault position" decomposes into "throw grenade, advance while suppressed, clear area" only if someone wrote that decomposition. This approach leans into the constraint: rather than fighting the language's inability to search for novel strategies, it reduces the decision space to a smaller set of known-good patterns. A narrow competent AI beats a broad confused one.

Sacrifices: Adaptability to change. When the game changes (patches, DLC, mods that add new content), the reduced decision space may miss viable strategies that a broader approach would find. Each specialization carries the same cost: if the game's balance shifts, the mod must be manually updated. A utility system with per-option scoring would adapt automatically by shifting scores when conditions changed.

Stellaris: Corvette-or-battleship doctrine only; no mixed fleets, no destroyers or cruisers. ACAI removes destroyers and cruisers not because they're bad ship types or harder to script, but because they were near-worthless in the 2.1.3 meta, where single-type fleets dominated. Starbases are specialized as shipyard, anchorage, or trade hub, again chasing the meta, not reflecting a scripting limitation. These are personal design choices in a project that chased the competitive meta freely.

Victoria 3: Government buildings use formula-based evaluation (checking population, GDP, innovation targets, military threats) while production buildings use market supply/demand queries against actual trade data. Two fundamentally different methods, each suited to its domain. Within production buildings, a three-layer decision hierarchy: weight controls urgency, order breaks ties encoding dependency chains, and offset gates expansion by productivity requirements. These are three orthogonal levers, each tunable independently.

Imperator: Building priorities structured as ratio buildings first (matched to territory culture rights and trade goods), then modifier buildings that enhance what's already there. Once research efficiency is high enough, the AI stops building Academies and switches to Forums and Mills, capping the research preference that the mod's own IRC weights create before it produces diminishing returns.
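The three levers described for Victoria 3 production buildings (weight, order, offset) can be sketched in Python as a filter followed by a two-key sort. The field semantics, the baseline productivity, and the building names are hypothetical; only the idea of three independent levers comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    weight: int          # urgency: higher builds sooner
    order: int           # tie-breaker: lower goes first (upstream inputs first)
    offset: float        # extra productivity required before expanding
    productivity: float  # observed productivity relative to the baseline


BASELINE = 1.0  # hypothetical baseline productivity


def choose(candidates):
    """Gate by offset, then sort by weight (descending) and order (ascending)."""
    allowed = [c for c in candidates if c.productivity >= BASELINE + c.offset]
    allowed.sort(key=lambda c: (-c.weight, c.order))
    return [c.name for c in allowed]


plan = choose([
    Candidate("tooling_workshop", weight=5, order=2, offset=0.0, productivity=1.2),
    Candidate("steel_mill", weight=5, order=1, offset=0.0, productivity=1.1),
    Candidate("motor_industry", weight=9, order=3, offset=0.5, productivity=1.2),
    Candidate("iron_mine", weight=7, order=1, offset=0.0, productivity=1.0),
])
```

In this sketch the highest-weight candidate is still excluded because it fails its offset gate, which shows why the levers can be tuned independently.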

2.5 Choose your relationship with vanilla: coexist, replace, or compensate

The three projects chose three different integration strategies with the hardcoded C++ layer, and each one maps to a recognizable pattern in software architecture for integrating with legacy or unmodifiable systems.

The right choice depends on the game: how much the vanilla AI exposes to modding, how badly the vanilla logic fails, and how complex a replacement would be. But the preference is always the lightest intervention that achieves the target behavior: not the minimum by default, but the minimum that suffices. Replace was chosen for Victoria 3 not because heavy intervention is preferred, but because lighter approaches couldn't produce the desired level of control. Invictus's compensate pattern worked where thin wrappers or weights sufficed. Where replacement-level logic was needed, road building disabled vanilla and ran alone; fort placement couldn't and carried coexistence overhead as a result.

2.6 Bypass broken systems by simulating their effects

Approximates: The adapter pattern: wrapping a broken interface with one that produces correct behavior. In standard game AI, you'd refactor the decision function. I can't refactor C++ code, so I bypass the broken function entirely and reimplement its intended effects in script.

Sacrifices: Completeness of the bypass. Event edicts (like masters_writings and improved_work_environment) retain ai_weight = 1, so the vanilla AI can still occasionally pick them independently of any scripted logic. The simulated replacements are applied as timed modifiers, not edicts, so any system checking for specific edict instances won't see them.

Stellaris: The mod handles two edict categories differently. Campaigns remain real edicts, retuned to energy-cost with structured AI weights: base weight 10-60, zeroed below 2000 energy, scaled up at 6000+ and 8000+. Core empire edicts all get ai_weight = 0; script applies replacements as timed modifiers, gated by influence thresholds (300-900 depending on ascension perks) and economy conditions. gai_edict.2 applies capacity overload (if energy income low), production targets (fallback), or research focus (materialist fallback), each with duplicate-prevention checks and 300 influence cost. Enclave trade deals use the same timed-modifier approach with tiered energy costs.

Imperator: Economic policy management. AI couldn't reliably choose between Free Trade and Trading Permits, so the mod implements the logic directly. Harsh Taxation (normally player-only) is used by AI when research efficiency is severely overcapped. Each policy decision that the vanilla AI ignores or bungles is handled directly.

Victoria 3: The strike event override. The entire vanilla strike event chain (9 events) is overridden so AI always chooses to break strikes rather than negotiate (negotiate=0, break=10 in the AI weighting). Negotiating creates economic promises the AI might not follow through on, leaving lingering negative modifiers.
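The energy gating on Stellaris campaign edicts described above (zeroed below 2000 energy, scaled up at 6000+ and 8000+) has a simple tiered shape. In this Python sketch the two multipliers are hypothetical, as is the assumption that the higher tier replaces the lower one rather than stacking with it.

```python
def campaign_weight(base_weight, energy, mid_factor=2, high_factor=3):
    """Tiered AI weight for a campaign edict.

    base_weight is expected in the 10-60 range mentioned in the text.
    mid_factor and high_factor are placeholder multipliers.
    """
    if energy < 2000:
        return 0  # never spend on campaigns when the stockpile is thin
    if energy >= 8000:
        return base_weight * high_factor
    if energy >= 6000:
        return base_weight * mid_factor
    return base_weight


weights = [campaign_weight(60, e) for e in (1999, 2000, 6000, 8000)]
```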

2.7 Target actual failure modes, not abstract optimization

Stellaris has transports that get stuck with invalid orders and never move. Victoria 3 has AI wars that stagnate for decades because neither side can reach the other. Imperator has province capitals left in absurd locations after annexation, and AI countries building ports in every coastal territory wasting building slots with no gains. Each of these is a visible disaster: the AI is doing something concretely wrong, and the fix is equally concrete. Detect stuck transports, retry, then recreate the fleet. Advance a stalemate counter over two years and force white peace. Move the capital to a logical location, and limit ports to 1-2 per state.

This is satisficing rather than optimizing. A utility system maximizes expected utility across all possible actions; it considers what the AI should do. The starting point here is often the opposite: what is the AI doing wrong, and can I stop it? But the objective isn't only "stop playing badly". ARoAI's market-driven construction builds economies pretty well; Invictus's road and fort systems produce genuinely good infrastructure. The work spans a spectrum from failure removal to designed improvement, and the workflow is consistent across all three projects: first observe the failure, then implement correct behavior, then optimize after seeing it work in practice. The failure-elimination framing is the correct starting point for what the language can't express: if you can't write a utility function that correctly scores every option, you can still write targeted rules that prevent the worst options. Many of these systems go further once the foundation is right.

The cost: any behavior that isn't a known failure mode or a designed improvement falls through the cracks. Suboptimal play that doesn't produce visible disasters (inefficient but stable economies, underutilized game mechanics, missed opportunities) persists because nobody filed a bug about it. An optimization-oriented AI would find these through its scoring function. You only fix what's broken enough to notice or build what's understood well enough to design.

Stellaris: Strongest-fleet repair routing (detect damage, route to starbase, freeze and regen). Critical building enforcement (yearly scan ensures buildings that vanilla "forgets" still exist). No-platforms modifier (poor empires get -50 starbase defense capacity, stopping them from wasting money on defense platforms).

Victoria 3: Building downsizing (buildings with occupancy < 40% for 6+ iterations are removed).

Imperator: Preventing civil wars through bribing and free hands (removing vanilla's cheating on political influence costs), deleting excess ports that waste building slots, farming settlements in states with many cities to prevent starvation, and moving province capitals away from weird post-annexation locations.


3. Stellaris: Anbeeld's Custom AI

Built for Stellaris 2.1.3 with its tile-based planet system. The first generation: coexist with vanilla AI, built on top of the Glavius AI Mod. The coexist strategy means the mod never fully owns any decision. It must constantly counteract vanilla AI's choices while producing its own. This produces an architecture that looks like a middleware layer with partial overrides: some decisions are intercepted and handled correctly, others are nudged via weight overrides, and still others are left to vanilla and hoped for the best.

3.1 Building on inherited work

The mod grew around Glavius's existing systems (colonization, critical building enforcement, edict simulation, war pressure) with a new layer: economy model, fleet production, starbase planning, repair, robot assembly, habitat building. This created two design eras in one mod: different naming conventions (gai_* for legacy, aai_*/acai_* for new), different assumptions, overlapping responsibilities.

In a C++ codebase, accumulated dead code would be flagged by static analysis and removed in a refactor. In a modding ecosystem, you can rewrite inherited code (I rewrote a lot of Glavius's systems), but you choose not to refactor what's already working, because the cost of rewriting functioning systems isn't worth it when you could spend that time moving forward instead. I did edit the old gai_* files directly when needed; I just avoided it where I could, because they already worked as-is. The archaeological layers persist because removing working code risks breaking undocumented dependencies, and the priority was forward progress over maintenance.

The edict simulation layer illustrates the coherence cost of this accumulation: most edict checks use inline energy thresholds rather than the shared economy booleans that the building and ship systems read. This is likely legacy code (old triggers that weren't updated to use the newer shared system) rather than an architectural choice. Either way, tuning changes to the central economy model may not propagate to buff eligibility. This is the coherence problem again: two subsystems nominally sharing a blackboard, but one subsystem has a private copy of the data that drifts.

3.2 Giving every subsystem the same picture of the world

The central economy calculator (aai_calc_economy) converts raw stockpiles and incomes into a shared vocabulary of boolean flags. Every downstream system reads the same flags. Resource incomes are quantized into discrete steps with a cap (a workaround for the old engine's limited data access), while food income uses separate scaling tiers differing by empire size.

The blackboard pattern, implemented through quantized flags: the economy calculator writes and every subsystem reads. Conflicts don't arise because the flag vocabulary is categorical, not propositional: energy low doesn't conflict with minerals high. The trade-off is expressiveness: the blackboard holds booleans and small integers, not structured proposals, so subsystems can only observe shared state and independently decide what to do about it.
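
As a rough illustration of this single-writer, many-reader arrangement, here is a minimal Python sketch (the mod itself is Paradox script). The step size, cap, thresholds, and all function and field names are assumptions made for the example, not values or identifiers taken from the mod.

```python
from dataclasses import dataclass

# Hypothetical tuning values, chosen only for illustration.
INCOME_STEP = 5          # incomes are rounded down to multiples of this
INCOME_CAP = 100         # quantized incomes never exceed this
LOW_INCOME = 10          # below this step, income counts as "low"
HIGH_STOCKPILE = 1000    # at or above this, a stockpile counts as "high"


@dataclass(frozen=True)
class EconomyFlags:
    """The shared vocabulary: booleans and small integers only."""
    energy_income_step: int
    energy_income_low: bool
    minerals_stockpile_high: bool


def quantize(value: float) -> int:
    """Clamp to [0, INCOME_CAP], then round down to a discrete step."""
    clamped = max(0, min(int(value), INCOME_CAP))
    return (clamped // INCOME_STEP) * INCOME_STEP


def calc_economy(energy_income: float, minerals_stockpile: float) -> EconomyFlags:
    """The single writer: turns raw numbers into flags once per update."""
    step = quantize(energy_income)
    return EconomyFlags(
        energy_income_step=step,
        energy_income_low=step < LOW_INCOME,
        minerals_stockpile_high=minerals_stockpile >= HIGH_STOCKPILE,
    )


# Readers never touch raw numbers; they only consult the flags.
def may_colonize(flags: EconomyFlags) -> bool:
    return not flags.energy_income_low


def may_build_ships(flags: EconomyFlags) -> bool:
    return not flags.energy_income_low and flags.minerals_stockpile_high
```

Because the flags object is frozen, a reader cannot write back to it, which mirrors the limitation described above: subsystems observe shared state but cannot post proposals to it.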

The calculator subtracts estimated main-fleet upkeep when the fleet is parked at a friendly crew-quarters starbase. This estimation is necessary because the script can't query actual fleet upkeep. A C++ AI would read the real number; I had to approximate with 15-year scaling brackets, and the approximation drifts. The paired mineral deduction (exactly 2× energy) encodes a game-design assumption about the energy:mineral cost ratio that may not hold across all fleet compositions. The inability to query the real number is the deeper problem.

3.3 Taking over productive decisions

The vanilla AI makes acceptable individual decisions but they don't form a coherent strategy. The mod takes over essentially every productive activity: building on planets (reading tile yields to choose between food, energy, minerals, research, and special deposits), building on habitats, upgrading through full tech chains, enforcing critical support buildings yearly, handling Rogue Servitor bio-trophy mechanics, controlling colonization timing with a 300-mineral savings reserve, and keeping robot assembly active.

A utility system would score each tile's best building option against current needs and pick the highest. There's no way to write that function: "score this tile's yield for food, multiply by food urgency, compare to the same calculation for energy and minerals". Instead, the building logic is a hand-authored decision tree: if food low, build farm; else if energy low, build plant; else if minerals low, build mine; else build research. This is a priority-ordered selector: the leftmost satisfied condition wins.

It's correct when the priorities are right, but it can't make trade-offs. A tile with +4 minerals and +1 food gets a farm when the empire is slightly low on food, because food always takes priority over minerals in the tree. A utility system would see the +4 mineral yield and the slight food deficit and might choose differently. This rigidity is accepted because the alternative, encoding utility scoring in a language without functions, requires hardcoding every score combination explicitly, which explodes combinatorially.
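
To make the contrast concrete, here is a small Python sketch of both shapes. The rule list follows the tree described above; the flag names, the urgency numbers, and the scoring function are hypothetical and only show what the script version cannot practically express.

```python
from typing import Dict, List, Tuple

# Priority-ordered rules, checked top to bottom.
RULES: List[Tuple[str, str]] = [
    ("food_low", "farm"),
    ("energy_low", "power_plant"),
    ("minerals_low", "mine"),
]


def select_building(flags: Dict[str, bool]) -> str:
    """Priority-ordered selector: the first satisfied condition wins."""
    for flag, building in RULES:
        if flags.get(flag, False):
            return building
    return "research_lab"


def utility_pick(tile_yields: Dict[str, float], urgency: Dict[str, float]) -> str:
    """Utility-style choice: score = tile yield x urgency, take the max."""
    scores = {
        resource: amount * urgency.get(resource, 0.0)
        for resource, amount in tile_yields.items()
    }
    return max(scores, key=scores.get)
```

The selector consults only the flags, so identical flags always produce the same building regardless of the tile. The scoring version can change its answer as yields or urgency shift, which is exactly the trade-off behavior the tree gives up.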

Colonization is dual-gated: the outer check requires the shared economy boolean aai_energy_income_low = no, and the inner check requires sufficient stockpile plus 300+ minerals (or a full deposit reserve). A deposit system diverts minerals into a virtual reserve capped at 300: empires below 3 planets deliberately save for colony ships rather than spending on buildings. On colonization, the mod removes the default shelter and relocates it to a better tile using cascading adjacency priority conditions. Species selection avoids pops under assimilation or purge, and for empires whose primary species has habitability below 20%, it creates an additional pop of a viable species.

Building upgrades form complete chains through tier 5 for all resource types, with empire-type-specific unity progressions (temple, uplink node, autochthon monument). Research labs specialize at tier 2 capital (random specialization weighted 20% physics / 40% biology / 30% engineering, gated behind years_passed > 10). The monthly building pass can also demolish and replace buildings for surplus rebalancing.

Ships are created at a starbase with free shipyard capacity, routed through a delayed delivery chain toward the main fleet. Six busy-slot flags per starbase emulate queue occupancy since the script can't read the vanilla build queue. Megastructure decisions use selective perk weighting (Voidborn 10→100, Master Builders 10→100, Galactic Wonders 10→200 only after Voidborn), and gateways replace vanilla's broad weighting with a single high-threshold case.

The key insight isn't any single subsystem but that all of them read the same economy flags and run on the same distributed scheduler. Coherent strategy emerges from many small decisions that share the same picture of the world.

3.4 One high-level decision, propagated everywhere

Each empire is assigned a fleet doctrine at game start, corvette-preference or battleship-preference, and that decision propagates through every layer. 44 tech/component checks reference the doctrine flag to align vanilla systems with the mod's choice.

This is a static commitment to a strategy that a deliberative agent would replan around. A planning system, faced with an enemy running heavy point-defense that neutralizes corvette swarms, would search for a new strategy. The mod's doctrine is assigned once and never revisited. The trade-off is intentional: a mid-game doctrine switch would require simultaneously updating 44 weight entries, production logic, and the naval capacity formula, and any missed entry creates incoherence. The cost of replanning exceeds the cost of committing to a suboptimal doctrine and executing it well.

The naval capacity model reflects this specialization: default desired utilization runs from 0.60 (peace, minimal) to 1.10 (war, maximum). Fanatic militarists and genocidal civics shift upward; Inward Perfection and fanatic pacifists shift downward. Disbanding follows a strict hierarchy: corvettes first, then destroyers, then cruisers, then battleships, always trimming smallest first.
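
A minimal Python sketch of these two rules, under stated assumptions: only the 0.60 and 1.10 endpoints and the disband order come from the text, while the shift sizes, the personality keys, and the function names are invented for illustration (the real model presumably has intermediate states that are omitted here).

```python
from typing import Dict, Optional

# Endpoints quoted in the text; intermediate states are omitted here.
PEACE_UTILIZATION = 0.60
WAR_UTILIZATION = 1.10

# Hypothetical shift sizes: the text gives only their direction.
PERSONALITY_SHIFT = {
    "fanatic_militarist": +0.10,
    "fanatic_pacifist": -0.10,
}

# Smallest hull first, as described above.
DISBAND_ORDER = ["corvette", "destroyer", "cruiser", "battleship"]


def desired_utilization(at_war: bool, personality: Optional[str] = None) -> float:
    """Same formula for every empire; personality only shifts the input."""
    base = WAR_UTILIZATION if at_war else PEACE_UTILIZATION
    return base + PERSONALITY_SHIFT.get(personality, 0.0)


def next_to_disband(fleet: Dict[str, int]) -> Optional[str]:
    """Return the smallest ship class still present, or None if empty."""
    for ship_class in DISBAND_ORDER:
        if fleet.get(ship_class, 0) > 0:
            return ship_class
    return None
```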

3.5 Classification first, then build

Starbases can serve many roles, but vanilla builds mixed layouts that are mediocre at everything. The mod separates role allocation from role filling through a multi-pass system: compute desired role targets (shipyards, trading hubs, anchorages), assign roles to existing starbases preferring reuse, then fill slots per role.

The two-pass structure (classify, then build) approximates what an HTN does when it decomposes a compound task into subtasks. The key advantage is that role assignment considers the global starbase portfolio before committing any individual starbase. The trade-off is rigidity: if a war creates sudden demand for more shipyards, the role allocation can't be recomputed until the next yearly pulse.
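
The classify-then-build split can be sketched in Python as follows. This is an illustration of the idea rather than the mod's logic: the role names echo the text, but the data shapes, the function name, and the rule for handing out leftover roles are assumptions.

```python
from typing import Dict, Optional


def assign_roles(
    current: Dict[str, Optional[str]], targets: Dict[str, int]
) -> Dict[str, str]:
    """Allocate roles across the whole portfolio before building anything."""
    remaining = dict(targets)
    assigned: Dict[str, str] = {}

    # Pass 1: prefer reuse. A starbase keeps its role while the target has room.
    for name, role in current.items():
        if role is not None and remaining.get(role, 0) > 0:
            assigned[name] = role
            remaining[role] -= 1

    # Pass 2: hand leftover role slots to starbases still without a role.
    for name in current:
        if name in assigned:
            continue
        for role, count in remaining.items():
            if count > 0:
                assigned[name] = role
                remaining[role] -= 1
                break

    return assigned
```

Only after this mapping exists would a separate step fill module slots per role. A starbase that ends up without a role is simply absent from the result, and nothing is recomputed until the function is called again, which corresponds to the rigidity noted above.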

3.6 Making AI players feel different from each other

Personality is expressed through parameter shifts on the same underlying logic rather than separate code paths. The same naval capacity formula serves every empire, with civics shifting the inputs. A fanatic militarist doesn't have a different decision tree. It has the same tree with "desired fleet size" shifted upward. This is parameter-space variation rather than architectural variation: separate code paths per personality type would multiply the maintenance burden by N, while parameter shifts cost nothing extra.

3.7 Breaking mid-game stagnation

Strong empires sit peacefully next to weak ones forever. The galaxy freezes into a static political map. This is the coherence problem from §1.3 in a different domain: vanilla's military AI and diplomatic AI don't share state, so neither knows whether war would serve the empire's strategic interests. The mod combats this through both push and pull. Pull: gai_war.1 scans peaceful AI empires with sufficient strength and looks for weaker neighboring AI empires as war targets, selecting war goals based on civics. Push: poor empires get the no_platforms modifier, preventing them from wasting resources on defense platforms while their economy starves. Global defines make the base AI more expansionist, more claim-focused, and more willing to operate with large fleets.

Standard RTS games handle this through a strategic layer that evaluates the game state holistically: Age of Empires' AI decides when to transition from economy to military based on game time and perceived opponent strength. I can't write a strategic evaluator that reads the entire diplomatic landscape (the language can iterate over specific scope types but cannot compose a multi-candidate scoring loop across all potential conflicts), so I approximate with a single event that checks "is this empire strong, at peace, and next to someone weak?" The heuristic is coarse: it doesn't consider whether the target has defensive pacts, whether the war would overextend the attacker, or whether the attacker's fleet composition counters the defender's. But it breaks the stagnation, which is the failure mode being targeted. The coherence problem that freezes the galaxy can't be fully solved without a shared strategic assessment. It can only be poked at with targeted push and pull events.

3.8 Making diplomacy strategic instead of sentimental

Vanilla diplomacy is dominated by inertia, another manifestation of the coherence problem from §1.3. The diplomatic AI and the military AI don't share state, so alliances generate large flat opinion bonuses that make them self-sustaining regardless of whether they still serve either empire's strategic interests. The diplomatic map freezes.

The mod rewrites the diplomatic model around current strategic alignment. Passive relationship bonuses are stripped: alliance opinion drops from +25 to 0, federation membership from +50 to 0, defensive pact from +20 to 0. Trust accumulates much more slowly and caps lower (federation trust growth from 1.0/month to 0.25/month). Pact and federation acceptance is reweighted toward active strategic factors: shared rivals jump from 10 to 100 for federation acceptance, shared threat factor from 0.25 to 0.40.

The weight system in Clausewitz is the closest approximation to a utility function available in script. Each factor has a base weight, and conditional factor multipliers scale those weights based on game state, so "trust matters more when threat is low" is expressible. This same system handles technologies, buildings, and other choices across the engine. You can tune weights and multipliers to produce strategically coherent outcomes.

The key move in diplomacy is making passive bonuses near-zero and strategic bonuses dominant. This is equivalent to designing a utility function where "current strategic alignment" scores high and "historical inertia" scores low. The result is less elegant than composing arbitrary response curves, but it moves the AI's diplomatic behavior in the right direction.
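
The shape of such a weight, a base value scaled by conditional multipliers, can be sketched in Python like this. The conditions and numbers are hypothetical and the state keys are invented; the sketch only shows how "trust matters more when threat is low" becomes expressible.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Modifier = Tuple[Callable[[State], bool], float]


def weight(base: float, modifiers: List[Modifier], state: State) -> float:
    """Base weight scaled by every multiplier whose condition holds."""
    result = base
    for condition, factor in modifiers:
        if condition(state):
            result *= factor
    return result


# Hypothetical numbers: trust counts double under low threat,
# and half when the two empires share no rivals.
TRUST_MODIFIERS: List[Modifier] = [
    (lambda s: s["threat"] < 0.2, 2.0),
    (lambda s: s["shared_rivals"] == 0, 0.5),
]
```

Tuning in this model means editing base values and factors, not composing new curves, which is why pushing passive bonuses toward zero and strategic factors upward is the practical lever.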

Border friction grows stronger and more local. Max friction rises from 100 to 150. Each bordering system generates double the tension (5 → 10). Threat also decays faster (-0.25 → -0.50) and scales more with distance, keeping it local and current.

The combined effect: empires form coalitions based on shared rivals and threats, not on the fact that they've been allied for fifty years. Alliance blocs dissolve when the threat that created them disappears.

3.9 Duct tape for specific failure modes

Some failures are mechanical breakdowns that need targeted patches, not architectural redesigns.

3.10 Country-type rewrite and startup bootstrap

The vanilla default country type still governs many AI behaviors that custom events can't reach. The mod overrides constructor and science ship targets, links ship fractions to gai_build_fleet (gating fleet building on economy, war state, and naval capacity), and switches army type fractions based on technology and empire type. On game start, a bootstrap event grants the AI arc emitter tech and applies the map_the_stars edict's effects as a timed modifier (7200 days) with 100 influence subtracted to emulate the cost. The global define rewrite zeroes the mineral budget for navy, colony, robot, building, and starbase spending so the scripted planner controls those outlays directly. The ascension perk chain from §3.3 (Voidborn → Master Builders → Galactic Wonders) is reinforced at this level too.

The diplomatic rewrite (§3.8) extends further than the core changes: trust caps are reduced for all treaty types, gift diplomacy is weakened, SHARED_THREAT_MAX drops from 200 to 50 (anti-threat coalitions cap earlier), and acceptance formulas are restructured around strategic factors rather than flat opinion bonuses.


4. Victoria 3: Anbeeld's Revision of AI

A full-stack AI replacement, the most ambitious approach of the three. The vanilla AI's construction logic is hidden in C++ with no hooks for modders, so partial fixes couldn't reach the core problem. Where Stellaris coexists and Imperator compensates, ARoAI replaces: the mod owns every economic decision the game allows script to control. This gives it the architectural freedom to build something close to a utility system, the closest any of these three projects gets to standard game AI architecture, while still being constrained by the same language limitations as the other two.

4.1 Deciding to replace vanilla entirely

Three NAI defines disable construction, tax management, and government spending. These defines exist because I requested them from Paradox. The game director Wiz implemented them after I explained on Discord why the replace strategy needed a way to disable specific vanilla AI subsystems. The constraint wasn't immutable; asking for the capability and getting it made ARoAI possible. Beyond the defines, debt thresholds are set to impossible values, and interest group promotion and suppression are disabled. The mod owns every economic decision. The trade-off: total freedom, total maintenance burden. Every game patch potentially breaks everything, and the mod must handle every edge case that vanilla would otherwise cover.

The replace strategy is only viable when the vanilla system produces results far enough below your preference that coexistence produces worse outcomes than starting from scratch, and when you can express the replacement logic in script. ARoAI chose this because Victoria 3's vanilla construction AI couldn't be steered strongly enough with weight overrides; it made decisions in C++ that weight tuning could only nudge, not redirect. The alternative to "replace" was "accept broken".

The cost is that every new building Paradox adds in a DLC is invisible to ARoAI until someone writes the compatibility patch, the same data-accessibility problem from §1.4, but now the entire economy depends on the patch being correct.

4.2 Using the best available signal for each domain

The scripting language can't read building definitions. Two fundamentally different evaluation approaches were chosen by domain:

The production building approach is the closest any of these three projects gets to a proper utility system. The supply-vs-demand level is a response curve: it maps a continuous input (market imbalance) to a normalized output (1-99 priority). Multiple buildings are scored against the same curve, and the highest score wins the next construction slot. This is utility scoring with a single consideration (market urgency), which is both its strength and its limitation.

A full Infinite Axis Utility System (IAUS) implementation would add more considerations: construction time, input-good availability for the building's inputs, geographic proximity to demand centers, synergy with existing buildings in the same state. The language can multiply dynamically computed values inside a single script value, and variable lists with ordered_ effects can technically score every candidate and sort by result, but the practical friction of data access limits, slow math, and compositional difficulty makes multi-consideration utility scoring across many candidates clumsy enough that splitting into separate evaluation layers is the better engineering choice. So geography and synergy get their own separate evaluation layers (state aptitude and branching, §4.5), and construction time is not considered at all (a known limitation, §4.13).

Each building's goods carry two independent parameters. Weight controls priority (which building gets built first) by being added to the supply/demand level. Offset controls expansion gating (how productive existing buildings must be before the AI builds more) by being added to the supply/demand level to produce the productivity requirement. Primary goods get low weight (1-4) and zero offset; secondary luxury goods get high weight (7-11) and high offset (4-6), creating a double barrier.

The weight/offset decomposition is what a two-consideration utility system would look like if you had to split two considerations across separate mechanisms. In an IAUS, weight and productivity would each be response curves, and their product would determine the final score: high weight × low productivity = maybe build; low weight × high productivity = maybe build; high weight × high productivity = definitely build. The decomposition can't express this product across candidates, because weight goes into the priority queue and offset goes into the productivity gate, and these two mechanisms don't multiply their inputs.

Instead it adds weight to the priority score and adds offset to the productivity threshold. These are additive decompositions of what should be a multiplicative interaction. The result is that the two parameters interact less gracefully than they would in a proper utility system: a building with very high weight but very low productivity still gets queued (because the high weight pushes it to the top of the priority list), and then it may fail the productivity gate. A utility system would never score it highly in the first place.
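
The difference between the two shapes can be shown with a deliberately generic Python sketch. It does not use ARoAI's weights, offsets, or level scale: "importance" and "viability" are invented stand-ins in the range 0 to 1, and the gate value is arbitrary.

```python
from typing import Dict, List, Tuple

# name -> (importance, viability), both in [0, 1]; invented values.
Candidates = Dict[str, Tuple[float, float]]


def rank_then_gate(cands: Candidates, gate: float) -> List[Tuple[str, bool]]:
    """Two separate mechanisms: sort by importance, then test viability."""
    ranked = sorted(cands, key=lambda name: cands[name][0], reverse=True)
    return [(name, cands[name][1] >= gate) for name in ranked]


def rank_by_product(cands: Candidates) -> List[str]:
    """One mechanism: importance and viability multiply into one score."""
    return sorted(
        cands, key=lambda name: cands[name][0] * cands[name][1], reverse=True
    )
```

With an important but barely viable candidate next to a moderately important, viable one, the first function still puts the important candidate at the head of the queue and only then marks it as failing the gate. The second function never ranks it first to begin with.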

4.3 Gating expansion on real-world performance

Knowing there's a shortage doesn't mean you should build more. If existing mines are unprofitable, building more accelerates the bleeding. The productivity requirement connects market urgency to actual building performance.

The productivity level is calculated as supply_vs_demand_level + offset, then mapped to a minimum earnings threshold against the country-wide median. During extreme shortage with zero offset, a building needs to earn about 42% of the median to justify expansion. At mild shortage plus moderate offset, the bar rises to 84-96%. Above balance, only buildings earning well above median (up to 145%) get expanded.

A crucial discount prevents deadlocks: when a good is critically important (ranging from 5 for gold mines to 99 for railways), productivity requirements are discounted, divided by 1.30 for very crucial buildings (~23% discount) or by 1.15 for moderately crucial buildings (~13% discount). This prevents the AI from refusing to build something it desperately needs because existing instances are underperforming due to the very shortage the new building would resolve.

This is a heuristic approximation of a joint optimization over two variables (urgency and viability) that a planning system would solve by searching the action space. I decomposed it into two sequential checks: is there a shortage? (urgency) and are existing buildings profitable? (viability). The crucial-good discount is the escape hatch for the chicken-and-egg problem where the building is unprofitable because the shortage hasn't been resolved, and the shortage can't be resolved because the building is unprofitable. A planner would find a path through this; the mod hardcodes the exception.
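
The discount arithmetic is easy to verify. The Python sketch below uses the two divisors quoted above; the tier labels and function names are made up for the example.

```python
# Divisors quoted in the text; the tier labels are hypothetical.
DIVISORS = {"very_crucial": 1.30, "moderately_crucial": 1.15}


def discounted_requirement(requirement: float, tier: str) -> float:
    """Divide the earnings requirement by the tier's divisor, if any."""
    return requirement / DIVISORS.get(tier, 1.0)


def discount_percent(tier: str) -> float:
    """Dividing by d lowers the bar by (1 - 1/d), expressed in percent."""
    return (1.0 - 1.0 / DIVISORS.get(tier, 1.0)) * 100.0
```

Dividing by 1.30 lowers the bar by about 23%, and dividing by 1.15 by about 13%. A requirement of 84% of the median, for example, falls to roughly 65% for a very crucial building.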

4.4 Designing smooth degradation instead of cliff edges

Budget health (-3 to +3) uses two-dimensional threshold surfaces. Negative levels require weeks_of_reserves < 156 as a gate: a country with 3+ years of reserves cannot be in negative health regardless of cash flow. Positive health can be achieved through either sufficient surplus or sufficient reserves. Tax/wage adjustment is asymmetric: healthy budgets lower taxes and raise wages; negative budgets raise taxes first, then lower wages. Military wages floor at medium during active wars.

Spending shares are a command hierarchy implemented as a hard budget. Government administration gets 20% (plus up to 5% from lost taxes), university 10% (plus up to 2.5% for innovation deficit), port 10%, military 30% (barracks share hard-capped at 80%), and construction gets the residual plus the investment pool. An investment pool multiplier (up to 2.20) scales non-construction shares upward when private-sector funding covers construction costs. Country-specific military spending multipliers apply before 1870 (Egypt 2.5x, Turkey/Prussia 1.5x, declining to 1.0x). The shares are hardcoded: the military share stays at 30% regardless of war state, and I adjust what that 30% buys (via threat assessment, §4.11) rather than the share itself.

4.5 Deciding where to build, not just what

State aptitude scoring varies by building type with randomization within tiers. States at the same aptitude level are interchangeable, so the AI builds in different places each game, trading a small amount of optimality for geographic diversity.

Resource and agriculture protection adds a strategic twist: if a state has potential for critical resources (rubber, oil, coal, iron) or luxury crops (tea, coffee, dye), building a non-critical good there is penalized. This prevents the AI from filling mining states with gold mines when iron deposits are still available.

Branching adds a second dimension: within each aptitude level, states are filtered by incorporation status, infrastructure headroom, and workforce availability into four branches. Branches are interleaved across aptitude levels (A1B1, A2B1, A1B2, A2B2...), so branch quality takes priority over aptitude quality: an aptitude-2 state with ideal conditions builds before an aptitude-1 state with poor conditions. Within the same branch group, aptitude order is preserved. "Right conditions in a decent location" beats "great location with wrong conditions," and because of the interleave this is a structural guarantee, not a soft preference.

The aptitude × branching system is a heuristic approximation of what a utility system with geographic considerations would compute. Multiplicative scoring across many candidates is impractical in script (§4.2), so the joint optimization gets decomposed into two sorting keys (aptitude, branch) and interleaved. This produces a partial ordering rather than a total ordering, but it prevents the catastrophic failure of building a coal mine in a state with no workers.
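
In sorting terms, the interleave amounts to using branch as the major key and aptitude as the minor key. A Python sketch, assuming that lower numbers are better for both and leaving out the randomization within tiers:

```python
from typing import List, NamedTuple


class State(NamedTuple):
    name: str
    aptitude: int  # 1 is the best aptitude level (assumed)
    branch: int    # 1 is the best branch, i.e. best conditions (assumed)


def build_order(states: List[State]) -> List[State]:
    """Branch is the major sort key, aptitude the minor one."""
    return sorted(states, key=lambda s: (s.branch, s.aptitude))


def slot_labels(aptitudes: int, branches: int) -> List[str]:
    """Enumerate the interleaved slots in evaluation order."""
    return [
        f"A{a}B{b}"
        for b in range(1, branches + 1)
        for a in range(1, aptitudes + 1)
    ]
```

For two aptitude levels and two branches, `slot_labels(2, 2)` yields A1B1, A2B1, A1B2, A2B2, the same sequence as in the text: every state in the better branch is considered before any state in the worse one.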

4.6 Solving the cold start: implicit bootstrapping through build order

When multiple building types end up at the same priority, the order attribute breaks the tie and encodes dependency chains:

1  Construction Sector
2  Government Administration
3  Railway, Port
4  Oil Rig, Tooling Workshops, Power Plant
5  Logging Camp, Whaling Station, Coal Mine, Iron Mine, ...
6  Lead Mine, Rubber Plantation, Paper Mills, ...
7  Glassworks, Motor Industry, Arms Industry, ...
...
15 Tea Plantation, Coffee Plantation, Arts Academy

Construction capacity first (need points to build anything), then administration (need bureaucracy), then infrastructure (need to move goods), then tools and power (inputs for everything downstream), then raw materials, then heavy industry, then military, then grain, then luxury goods last. The bootstrapping problem is mostly about avoiding catastrophic mis-sequencing, not finding the optimal path. A static heuristic tiebreaker gets the broad sequencing right while dynamic priority handles country-specific adjustments.

This is a manual linearization of a dependency DAG (directed acyclic graph: items with one-way dependencies where cycles are impossible, like "tools are needed before workshops"), which a planning system would instead solve by searching for a valid build order given current state. The static order attribute is a tiebreaker, not a top-level controller: when buildings are at different priority levels (because their goods have different supply-vs-demand urgency), priority always wins.

Order only matters between buildings at the same priority, and in that case it correctly sequences tooling workshops before the factories that depend on them. The order numbers are hardcoded because script can't compute interdependencies of buildings at runtime, but it would be completely possible to generate the same ordering dynamically by computing dependency trees (e.g., half the economy depends on tools while coffee is just a luxury). The dependency structure is relatively stable across countries, which makes the hardcoded sort work in practice even though it's not inherent.
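
The relationship between dynamic priority and the static order attribute is a two-key sort. In the Python sketch below the order numbers are taken from the table above, while the priority values and the assumption that a higher priority wins are illustrative.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    building: str
    priority: int  # dynamic; higher wins here (assumed direction)
    order: int     # static tiebreaker from the table; lower goes first


def construction_queue(cands: List[Candidate]) -> List[str]:
    """Priority decides first; order only separates equal priorities."""
    ranked = sorted(cands, key=lambda c: (-c.priority, c.order))
    return [c.building for c in ranked]


if __name__ == "__main__":
    queue = construction_queue([
        Candidate("Tea Plantation", 50, 15),
        Candidate("Glassworks", 50, 7),
        Candidate("Tooling Workshops", 50, 4),
        Candidate("Coal Mine", 80, 5),
    ])
    print(queue)
```

The coal mine goes first because its priority is higher, even though its order number is not the lowest. The three buildings tied at the same priority then fall back to the table: tooling workshops, glassworks, tea plantation.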

4.7 Engineering around language limits

With 49+ building types × multiple countries × multiple states, the naive approach is prohibitively expensive.

Data packing stores multiple values in single integers by digit position, roughly 5x memory reduction. The trade-off is safety: if any value exceeds its digit range, it corrupts adjacent cells silently. A C++ struct would fail a type check; a packed integer silently produces a wrong priority.
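
A minimal Python sketch of digit-position packing and its failure mode. The two-digit cell width is an assumption for the example, and the checked `pack` function is something the script version has no equivalent for.

```python
from typing import List

WIDTH = 2            # hypothetical: two decimal digits per cell
BASE = 10 ** WIDTH   # so each cell holds 0..99


def pack_unchecked(values: List[int]) -> int:
    """Mimics the script situation: no range check at all."""
    return sum(v * BASE ** i for i, v in enumerate(values))


def pack(values: List[int]) -> int:
    """Same packing, but refuses values that would spill over."""
    for v in values:
        if not 0 <= v < BASE:
            raise ValueError(f"{v} does not fit in {WIDTH} digits")
    return pack_unchecked(values)


def unpack(packed: int, count: int) -> List[int]:
    """Read `count` cells back, lowest digits first."""
    cells = []
    for _ in range(count):
        cells.append(packed % BASE)
        packed //= BASE
    return cells
```

Under these assumptions `[12, 7, 99]` packs to 990712 and unpacks cleanly. Packing `[12, 105, 99]` without the check and reading three cells back gives `[12, 5, 0]`: the oversized value spilled into its neighbor and nothing reported an error.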

Code generation: A JavaScript toolchain produces repetitive Paradox script: 3,000 script values for data extraction, a 999-case switch for technology progress, state filtering across 40 combinations. When the language can't iterate over arbitrary collections like building types, you write a program that writes the iteration by emitting 49 nearly-identical script blocks. The meta-layer itself is the real AI architecture; the generated script is its compiled output.

Compatibility patches: 200 reserved slots with JavaScript generators and a GitHub issue tracker for ID registration. The compatibility system exists because new buildings are invisible to the AI without explicit registration (§1.4). When your target language lacks expressiveness, build a meta-layer that generates code.
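
The generator idea itself fits in a few lines. The sketch below is written in Python rather than the JavaScript the toolchain actually uses, and both the identifiers and the emitted block format are schematic placeholders, not verified game syntax.

```python
from typing import List

# Illustrative identifiers only.
BUILDINGS = [
    "building_coal_mine",
    "building_iron_mine",
    "building_tooling_workshops",
]

# Schematic output format; not checked against real game syntax.
TEMPLATE = """example_value_for_{building} = {{
    # slot {slot}: per-building logic would go here
    value = {slot}
}}"""


def generate(buildings: List[str]) -> str:
    """Emit one near-identical block per building: the 'unrolled loop'."""
    blocks = [
        TEMPLATE.format(building=name, slot=slot)
        for slot, name in enumerate(buildings, start=1)
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(generate(BUILDINGS))
```

Adding a building then means appending one entry to the list and regenerating, which is the role a reserved-slot scheme for compatibility patches can play.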

4.8 What vanilla systems remain active

Not everything is replaced. Consumption tax AI, authority spending AI, and autonomous investment pool construction remain active. The three NAI defines disable construction, tax management, and government spending. The replace strategy doesn't mean replace everything. It means replace what's broken enough to justify the maintenance cost and what script can express. Consumption tax AI is left active because it can't easily be observed from script. Autonomous investment pool construction is left active because it represents private-sector decisions.

Not everything should be replaced, either. Production method selection is handled by vanilla AI with added ai_value weights (administration: 50,000 → 100,000 → 200,000 → 400,000), and some options get ai_value = 0 to block risky upgrades. The vanilla strike event chain is overridden so AI always breaks strikes rather than negotiate, because negotiating creates economic promises the AI might not follow through on, leaving lingering negative modifiers. This is one of dozens of event overrides across the three projects where some AI choices have very bad impact and need redirecting. No gameplay values are changed: only AI guidance is added or decisions overridden.

4.9 Breaking infinite wars: patient stalemate resolution

AI-vs-AI wars where both sides reach 0 war support can persist indefinitely. Every 30 days, the mod advances a stalemate counter through 24 levels (~2 years). At level 24, the war is resolved by type: secessionists win secessionist wars, the higher-population side wins revolutionary wars, and all other wars end in white peace. Two years of patience with type-aware resolution feels like the world working itself out rather than an arbitrary coin flip.
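
A small Python sketch of the counter and the type-aware resolution. The 30-day tick, the 24 levels, and the three outcomes come from the text; resetting the counter when the stalemate breaks, the tie rule for equal populations, and all names are assumptions.

```python
STALEMATE_LEVELS = 24
TICK_DAYS = 30


def advance(level: int, stalemated: bool) -> int:
    """One 30-day tick. Resetting on recovery is an assumption."""
    if not stalemated:
        return 0
    return min(level + 1, STALEMATE_LEVELS)


def resolve(war_type: str, attacker_pop: int = 0, defender_pop: int = 0) -> str:
    """Outcome once the counter reaches its final level."""
    if war_type == "secession":
        return "secessionists_win"
    if war_type == "revolution":
        # Ties going to the defender is an assumption of this sketch.
        return "attacker_wins" if attacker_pop > defender_pop else "defender_wins"
    return "white_peace"
```

Twenty-four ticks of 30 days add up to 720 days, which is where the "about two years" figure comes from.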

4.10 Building downsizing: graduated retreat instead of all-or-nothing

Knowing when to shrink is as important as knowing when to build. Government building downsizing is gated on bureaucracy_load >= 0.75 (shrinking government infrastructure while overloaded is counterproductive) and uses a graduated health-tier structure: progressively deeper cuts as budget health declines from +1 through -3. A slightly declining budget trims the most obvious excess; a fiscal crisis cuts deep.

Production building downsizing tracks "abandoned" buildings: when occupancy drops below 40% and doesn't recover over six iterations (~147 days), the building is removed. Production downsizing is blocked under laissez-faire law.
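
The abandonment rule can be sketched as a per-building counter. The 40% floor, the six iterations, and the laissez-faire block follow the text; the reset on recovery is inferred from "doesn't recover", and the function shape is hypothetical.

```python
from typing import Tuple

OCCUPANCY_FLOOR = 0.40
ITERATIONS_TO_REMOVE = 6


def update(counter: int, occupancy: float, laissez_faire: bool = False) -> Tuple[int, bool]:
    """Return (new_counter, remove_now) for one iteration."""
    if laissez_faire or occupancy >= OCCUPANCY_FLOOR:
        # Blocked by law, or the building recovered: start over.
        return 0, False
    counter += 1
    return counter, counter >= ITERATIONS_TO_REMOVE
```

A building that dips below the floor for five iterations and then recovers starts from zero again, so only sustained abandonment triggers removal.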

4.11 Military threat assessment and technology guidance

Military threat assessment calculates building targets by comparing against the global landscape: the top 6 countries by army power and top 6 by navy, averaged for a "typical threat" value, then converted to required building counts. A Newton's method square root implementation handles population-to-military-strength curves, necessary because Paradox script has no sqrt().
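
Both pieces are simple to sketch in Python. The fixed iteration count and the starting guess are assumptions that suit a language without convergence-driven loops, and the function names are invented; the mod's actual conversion from threat to building counts is not shown.

```python
from typing import List


def newton_sqrt(x: float, iterations: int = 20) -> float:
    """Square root by Newton's method with a fixed number of steps.

    Twenty steps are enough for inputs up to about a million with this
    starting guess; much larger inputs need more steps or a better guess.
    """
    if x <= 0:
        return 0.0
    guess = x if x >= 1.0 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


def typical_threat(powers: List[float], top_n: int = 6) -> float:
    """Average of the strongest `top_n` entries (fewer if the list is short)."""
    strongest = sorted(powers, reverse=True)[:top_n]
    return sum(strongest) / len(strongest) if strongest else 0.0
```

Each Newton step replaces the guess with the average of the guess and `x / guess`, so the whole routine needs only addition, multiplication, and division, which is what makes it workable in a language without a square-root function.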

Technology guidance operates in three modes. Default (Assisted): conditionally redirects innovation toward critical techs (railways, nationalism, key military techs), preserving flexibility. Railroaded: forces a strict tech path, zeroing out natural innovation and manually adding progress to the target technology. Disabled: no intervention. The entire system exists because at the time of mod support, there were no AI weights for technologies. The vanilla AI had no way to prioritize which techs to research. This gap is what justified an over-engineered scripted solution. If the engine later added AI weight support for techs, much of this system could be replaced with simple weight overrides. The scripted approach also carries a fragility: if the redirection modifier is active but the scripted effect fails to fire, the country silently loses all innovation for that iteration.

4.12 Configurability and validation through shared logic

ARoAI provides 10 game rules, including Power Level (0-100%), Construction scaling (0-200%), Building Priorities (Roleplay/Uniform), Research Assistance (Default/Railroaded/Disabled), autobuild for players, and stalemate prevention.

Default game rules give AI no stat advantages: the AI is smarter, not cheaty. Player autobuild uses the exact same evaluation, priority, and construction logic the AI uses, with per-category toggles. If players trust autobuild with their own economy, that's strong evidence the decision quality is real.

4.13 Known limitations and design trade-offs

Neither evaluation strategy accounts for construction time; a 52-week building and a 4-week one get the same priority. Two countries in a customs union can both identify the same shortage and queue construction simultaneously, leading to eventual oversupply. Stalemate resolution ignores territorial control. Production downsizing uses an abandonment heuristic (occupancy < 40% over time) rather than a direct profitability check, so chronically unprofitable but fully staffed buildings escape downsizing. Consumption tax revenue is not integrated into budget calculations. The budget cooldown of 35 days prevents oscillation but creates a response gap. And the one-building-type-per-day construction cap limits the AI to at most 12 building types per 28-day iteration.


5. Imperator: Invictus

The lightest-touch approach that still achieves the target behavior, embedded in a team project with a communication-first workflow. Every patch came with a detailed public dev diary. Where ARoAI replaces and ACAI coexists, Invictus compensates: many critical systems are hardcoded and can't be replaced, so the mod works around them. Where thin wrappers or weights sufficed (redirecting governor policy, steering economic decisions) the touch was light. Where the target behavior needed replacement logic (fort placement, road building) the mod built full evaluation and build systems. Road building disabled vanilla and ran alone; fort placement couldn't, so the replacement code also had to counteract vanilla's ongoing choices, making it heavier than a pure replacement. The constraint is the same either way: C++ logic can only be disabled entirely or worked around, not adjusted granularly for complex behavior.

5.1 Working within someone else's project

Imperator: Invictus is AI work embedded in a team mod with its own vision, codebase, and players. The Imperator economy is simpler than Victoria 3's, making weight-based manipulation viable. Many critical systems (fort placement, ship building, unit movement) are hardcoded and can't be directly adjusted for complex behavior. The architecture reflects the context: replacement isn't always wrong, but this project doesn't need it.

The team context adds a constraint that solo mods don't face: you must maintain compatibility with other contributors' work, and every change must be explainable to teammates who may need to maintain it later. This is partly why the dev diary practice emerged: it's a design tool for the team as much as documentation for players. The weight-based approach (tuning existing data rather than replacing systems) also has lower coordination cost: changing invention weights touches one file with clear semantics, while replacing a system requires coordinating with other contributors who may depend on the original behavior.

The evolution was incremental: 1.9.1 laid the foundation with invention and building weight rework; 1.9.2 added the first scripted systems (law management, character loyalty, state investment) that went beyond what weights could express; 1.10 leveraged the new modifier: syntax to make those systems precise and introduced the mercenary recruiting game rule (later expanded into Advanced AI); 1.10.1 added roads and fort placement, filling the last category of absent systems. Each patch extended the boundary between what weights could handle and what required script.

5.2 Human-readable AI weights: the relative chance system

When setting AI weights for hundreds of inventions, humans lose track of what the numbers mean. A 2.5x weight difference is negligible in a randomized system choosing from dozens of options. The initial approach, setting "pretty good" to 200 and "very good" to 500, failed because absolute best picks having only 5x higher weight than average ones doesn't produce reliable selection in a pool with dozens of options. Pushing "very good" above 9,000 worked but left the codebase filled with magic numbers of various magnitudes that no future maintainer could interpret without memorizing the scale.

The Clausewitz weight system fails as a utility function in a specific way. The engine selects from available options using proportional random selection: probability = weight_i / sum(all_weights). In a pool of 20 options, setting one option to weight 200 while the rest average 50 gives that option only ~17% selection probability. The weight system's proportional-random selection means that meaningful preference differences require extreme weight ratios.
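
The arithmetic behind that ~17% figure is short enough to check directly. The Python snippet below just evaluates probability = weight_i / sum(all_weights) for the pool described above; the function name is made up for the example.

```python
from typing import List


def selection_probability(weights: List[float], index: int) -> float:
    """Chance that weights[index] is picked under proportional random selection."""
    return weights[index] / sum(weights)


# One favoured option at 200, nineteen others averaging 50.
pool = [200.0] + [50.0] * 19
favoured = selection_probability(pool, 0)   # 200 / 1150
print(f"{favoured:.1%}")                    # 17.4%
```

A 4x weight advantage over each competitor still loses more than four times out of five, because the favoured option is competing against the sum of the pool, not against any single rival.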

An IAUS would handle this with response curves that naturally map importance to [0,1] and multiply considerations, where the scoring is deterministic, the selection is greedy (pick the highest), and no weight inflation occurs. The Clausewitz weight system can multiply within a single candidate's weight block (conditional factor multipliers) and select proportionally randomly, but it lacks deterministic greedy selection and continuous response curves, so weight inflation is inevitable when you need strong preferences. You must work within this constraint.

Invention Relative Chance (IRC): a probability-based scale. IRC 35% means "this option has a 35% chance of being chosen if all others are at base weight". The formula converts a desired probability into the extreme weight ratio needed to achieve it in a proportional-random system: IRC 35% = weight 10, IRC 75% = weight 57, IRC 95% = weight 361. The 95% option is 35x more likely than the 35% option, a difference that would require typing ~9,000 as a raw weight.

The self-documenting nature of irc_35 immediately tells any maintainer the tier of importance. The probability framing prevents weight inflation and keeps values anchored to a meaningful scale. This is a human-factors solution to a language-design problem: the weight system doesn't support meaningful scales natively, so I built a semantic layer on top of it.
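
The article quotes three IRC values but not the conversion formula itself. The Python sketch below is a reconstruction: multiplying the odds p / (1 - p) by a baseline total of 19 reproduces all three quoted weights, which would correspond to 19 competing options at base weight 1. Treat the constant 19 and the function name as inferences from those three data points, not as the mod's confirmed definition.

```python
BASELINE_TOTAL = 19  # assumption: combined weight of the competing base-weight options


def irc_weight(probability: float, baseline_total: float = BASELINE_TOTAL) -> int:
    """Weight w such that w / (w + baseline_total) is about `probability`."""
    return round(probability / (1 - probability) * baseline_total)


for p in (0.35, 0.75, 0.95):
    print(f"IRC {p:.0%} -> weight {irc_weight(p)}")
```

Under that assumption the mapping is steeply non-linear near 100%, which is exactly why a named probability tier is easier to maintain than the raw number it expands to.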

5.3 Structured building decisions: sequencing over optimization

The building system uses a layered sequence: non-cities first (mines, farming settlements, slave estates for trade goods and income), then ratio buildings in cities (Academy for nobles, Court of Law for citizens, Forum for freemen, Mill for slaves, matched to territory culture rights), then modifier buildings (Library, Training Camp, Market, Tax Office), then unique buildings conditionally (Foundry where trade goods are expensive, Great Temple for religious conversion, Grand Theatre for assimilation).

The same dependency-DAG linearization pattern appears here as in ARoAI's order attribute. Ratio buildings (which convert pops into output types) should precede modifier buildings (which enhance existing output), because a Library's research bonus is wasted if there are no Academies producing research to enhance. The topological sort is hardcoded as a layered sequence. The conditional logic on each layer (e.g., "don't build Academy if research efficiency is high") provides enough adaptability to handle common cases.
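
A hardcoded layered sequence with per-building conditions can be sketched as an ordered list of rules that is scanned from the top. The Python below is a reduced illustration with only two layers; the state keys (`built`, `research_capped`) and the specific conditions are hypothetical stand-ins for the mod's actual triggers.

```python
from typing import Optional

# Each layer is tried in order; within a layer, the first allowed rule wins.
LAYERS = [
    # Ratio buildings: convert pops into output types.
    [
        ("academy", lambda s: not s["research_capped"]),
        ("forum", lambda s: True),
    ],
    # Modifier buildings: only useful once there is output to enhance.
    [
        ("library", lambda s: "academy" in s["built"]),
    ],
]


def next_building(state: dict) -> Optional[str]:
    """Return the first unbuilt building whose condition holds, or None."""
    for layer in LAYERS:
        for name, allowed in layer:
            if name not in state["built"] and allowed(state):
                return name
    return None
```

The ordering of `LAYERS` is the linearized dependency graph; the lambdas are the conditional logic that keeps a fixed order from being completely rigid.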

Once research efficiency is high enough (checkable after patch 2.0.5 via modifier:), the AI stops building Academies and switches to Forums and Mills. The mod's invention weights heavily prioritize research, which naturally leads the AI toward Academies; the efficiency cap prevents that preference from producing diminishing returns at the cost of manpower and taxes. Farming settlements receive very high priority in states at risk of starvation. Port management limits ports to 1-2 per state, deletes excess ports, and downscales intermediate-level ports to level 1 (the building slot cost isn't justified).

5.4 Guiding research without scripting it: trees, not picks

Instead of weighting individual inventions, weight the entire tree leading to important targets. Priority layers: discipline inventions (directly determining combat effectiveness), national tax inventions (revenue is the lifeblood of expansion), then secondary priorities (economic, siege capability, culture-specific trees), then lower priority (diplomatic, navy).

Within each layer, weights are close enough that randomness creates variety between AI countries: every AI knows discipline is important, but some focus military first while others prioritize economy. The IRC scale enables this: IRC 75% vs IRC 65% creates meaningful preference without being deterministic.

5.5 Building systems that simply didn't exist

Several game systems had no AI logic at all. Not underperforming, just absent. These are the purest examples of what modding AI looks like when the constraint isn't "the AI is broken" but "there is no AI". Standard game AI architectures don't have this category: a C++ AI architect would never ship a system without at least a default behavior. This happens because Paradox shipped features with player-facing mechanics but no AI to use them, and the mod fills the gap.

5.6 Governor policy and economic policy: character-driven decision making

Patch 2.0.5's modifier: syntax enabled precise governor policy selection. The system factors in character traits: corrupt governors prefer Acquisition of Wealth over Encourage Trade, Merciful governors avoid Harsh Treatment, disloyal governors refuse expected policies. Regions needing Religious Conversion or Cultural Assimilation require governors of the country's religion and primary culture.

The decision-tree structure maps to what a utility system would express as: utility(policy) = f(country_benefit, governor_personality, province_needs). The major interaction effects are encoded as branching conditions rather than continuous scores, the same pattern as ACAI's building logic. The modifier: syntax made this possible; the entire feature was data-blocked until the engine exposed the data.

The same syntax makes mercenary recruiting budget-aware and enables economic policy decisions based on actual conditions: Free Trade only when exports justify it, Harsh Taxation for countries over maximum research efficiency (shifting from research to revenue at diminishing returns), and a mercenary recruiting algorithm that checks budget limits and physical reachability.

5.7 Road building and fort optimization: working around hardcoded logic

Road building: Vanilla had road AI, but it required micromanaging armies with a road-building action enabled, which the AI couldn't handle. The result was very few roads. The mod replaces vanilla road building entirely: the AI pays gold and roads appear, preserving all conditions and prices (including having a suitable, non-busy army) but bypassing the army-micro pathway that vanilla required and the AI couldn't manage. Three phases: inter-region connections for levy delivery, dense province-level networks, then city-to-city roads.

The implementation builds a dynamic list of territories using Jomini variable lists, assigning each a depth value (distance from the starting point), then picks the territory closest to the target. It's a Dijkstra's-like shortest-path traversal that would be trivial in any language with recursion or priority queues but required creative use of variable lists here. This is only feasible because the Imperator map is finite and the road network is sparse; a market simulation with thousands of goods-flow paths would be impossible to express this way.
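
To make the idea concrete, here is a Python sketch of a depth-labelled traversal that uses only a flat list, a depth table, and loops, with no recursion and no priority queue. It is a plain breadth-first search written under two assumptions: every road step costs the same, and adjacency is symmetric. The function name and the walk-back step are illustrative; the mod's actual procedure in script may differ.

```python
from typing import Dict, List


def road_path(neighbours: Dict[str, List[str]], start: str, target: str) -> List[str]:
    """Shortest territory chain from start to target, or [] if unreachable."""
    visited = [start]          # flat list, processed in insertion order
    depth = {start: 0}         # distance from the starting territory
    i = 0
    while i < len(visited) and target not in depth:
        current = visited[i]
        for nxt in neighbours.get(current, []):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                visited.append(nxt)
        i += 1
    if target not in depth:
        return []
    # Walk back from the target, always stepping to a neighbour one level shallower.
    path = [target]
    while path[-1] != start:
        here = path[-1]
        path.append(next(n for n in neighbours[here] if depth.get(n) == depth[here] - 1))
    path.reverse()
    return path
```

The `visited` list with an advancing index plays the role of a queue, which is the kind of structure a variable list can emulate.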

Fort optimization: A quasi-replacement running alongside hardcoded vanilla logic. Vanilla AI was meant to match forts with province capitals but couldn't enforce it or adapt after territorial shifts. The mod's system evaluates all options and builds where needed, but it also has to undo vanilla's ongoing bad placements before doing its own work. The priorities: match province capitals, prioritize border provinces, maintain higher-level forts in capitals, and deliberately expose inner provinces.

Roads are a full replacement: vanilla road building is disabled, the mod runs alone. Forts are a quasi-replacement: the mod builds a full evaluation system but can't disable vanilla, so some of that code exists only to undo vanilla's work. A true wrapper like governor policy redirect is thin — observe, redirect, done. Forts are thick: evaluate everything, undo vanilla, then build. The code that undoes vanilla is overhead that a pure replacement wouldn't need.

5.8 Removing crutches and unlocking new decisions

When AI tribes learned to found cities and enact laws, they started abusing special low-requirement reform decisions that existed because the AI couldn't manage standard requirements. Once the AI could actually play, the training wheels became exploits: AI tribes mass-reformed to monarchies within 100 years. The fix: force AI tribes to meet the same requirements as players. Similarly, once proper loyalty management existed, vanilla's cheating on political influence costs was removed.

Patch 2.0.5's modifier: syntax unlocked new decisions: research efficiency became queryable (the AI could stop over-building Academies), governor policy selection could use precise loyalty values, and mercenary recruiting could check actual maintenance costs. Each was a known problem with solutions waiting for the data.

5.9 Difficulty through smarter play, not bigger numbers

Invictus doesn't give the AI stat bonuses. Difficulty settings already do that. Instead, every improvement makes the AI play better: mercenary recruiting, internal development, road building, fort optimization, economic policy, invention and building priorities. The "Advanced AI" game rule makes some of these opt-in (mercs, heavy province investment, pop movement) because they lean on meta play, but even there the AI is spending the same resources a player would. No game rule hands the AI free stats.

5.10 Communication as part of the design process

AI behavior is invisible. Players can't tell why the AI did something, teammates don't know what assumptions the code relies on, and the next maintainer won't understand the design intent. Every Imperator: Invictus update came with a detailed dev diary explaining the design reasoning, the observed problems, and honest assessment of what still doesn't work.

Writing the explanation is a design tool. "I can't explain why this weight is 75% instead of 65%" signals that the choice is arbitrary. The dev diaries also serve as a historical record: when a later patch changes behavior, the original diary explains the original reasoning, preventing new contributors from "fixing" something that was deliberately designed that way. The practice emerged in a team context, but it's a general principle: any future project, solo or not, ships with dev diaries.


6. Comparison with other AI mods

These are not the only AI mods for these games. Other modders have approached the same problems from different directions, producing different trade-offs and revealing different aspects of what the scripting language can and can't do. Note that ACAI was built for Stellaris 2.1.3, now an ancient version. The game has changed enough that direct gameplay comparisons are no longer meaningful, but the design choices still illustrate approaches that generalize.

6.1 Stellaris: StarNet AI and Sovereign AI

StarNet AI (by salvor/OldEnt) takes a full-replace approach, overwriting personality definitions, opinion modifiers, diplomatic defines, and attitudes with modified values using a conditional inline-script system that falls back to vanilla when StarNet isn't loaded. Its distinctive contribution is the diplomatic model: empires initially cooperate during a ~10-year truce period, but once it expires, increased personality aggressiveness and amplified threat propagation cause militaristic empires to start wars while neighbors detect the rising threat and militarize in response, producing a chain reaction resembling a "Dark Forest" cascade. This cascade is emergent from tuned parameters (amplified SHARED_THREAT_MULT, heightened aggressiveness, modified attitudes that enable more war behaviors), not a scripted event chain detecting shipbuilding.

StarTech is a fork that extends the truce to ~40 years, reframing the same mechanic as a prisoner's dilemma where all empires act "superrationally" during the cooperation phase, which is the mod author's own framing. The Friendship Patch sub-mod reverts StarNet's diplomatic parameters to vanilla for players who want the economic improvements without the aggressive diplomacy, though personality combat behaviors and economic optimization remain StarNet's.

Where ACAI coexists with vanilla AI and pushes it in the right direction, StarNet replaces vanilla's diplomatic and personality systems entirely, a closer analogue to ARoAI's approach than to ACAI's. StarNet's diplomatic cascade is architecturally different from anything in the three projects analyzed here: it's an emergent behavior produced by tuned opinion modifiers, amplified threat propagation, and modified personality aggressiveness. No scripted event chain detects shipbuilding or triggers cascading wars. This is a notable example of what the Clausewitz weight system can produce when applied consistently: coherent emergent behavior from parameter tuning alone.

Sovereign AI (by Meme-Theory, for Stellaris 4.x, early development at v0.1.0) takes an even more radical approach: every building and district the mod defines starts at weight = 0, and the AI only builds when it detects an actual resource deficit. The mod currently covers a fraction of the game's full building roster (the design philosophy extends to everything, but implementation is partial). A gate system (G1-G6) checks upstream resource sustainability before building converters: G1 prevents foundries when minerals are below 15/mo, G2 prevents research when consumer goods are below 2/mo, and so on.
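
The combination of weight-zero defaults and upstream gates can be sketched in a few lines of Python. Only the G1 and G2 thresholds come from the description above; the building names, the mapping from building to the resource it produces, and the flat weight of 1 are hypothetical choices for the example, not Sovereign AI's actual data.

```python
from typing import Dict, Set

# building -> (upstream resource, minimum monthly income), as quoted for G1 and G2
GATES = {"foundry": ("minerals", 15.0), "research_lab": ("consumer_goods", 2.0)}
# building -> resource it produces (hypothetical mapping for this sketch)
OUTPUT = {"foundry": "alloys", "research_lab": "research"}


def build_weight(building: str, income: Dict[str, float], deficits: Set[str]) -> int:
    """Weight stays 0 unless a deficit exists and the upstream gate is open."""
    if OUTPUT[building] not in deficits:
        return 0                      # no detected deficit, nothing to build
    upstream, minimum = GATES[building]
    if income.get(upstream, 0.0) < minimum:
        return 0                      # gate closed: upstream can't sustain it
    return 1
```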

Nine personality types (Aggressive Expansionist, Knowledge Seeker, Fortress Guardian, etc.) shift weight factors rather than using separate code paths, the same parameter-space variation principle as ACAI. Sovereign AI also builds an evidence-based development pipeline: a Node.js economy simulator validates AI building decisions before they ship, and MCP-based tools provide structured access to wiki data and save game analysis.

The deficit-driven weight-zero approach is the inverse of ACAI's positive-threshold approach: instead of "build X when resource X is low," it's "build nothing unless you can prove you need it". Both prevent cascade failures. ACAI ensures the AI addresses the most critical need first; Sovereign AI ensures the AI never builds something it can't sustain. The gate system is a formalized version of what ACAI's economy booleans do implicitly: check upstream conditions before committing to a downstream building.

6.2 Victoria 3: Kuromi's AI and Smarter AI

These mods chose the opposite end of the coexist-replace spectrum from ARoAI.

Kuromi's AI (by KuromiAK, ~68K subscribers) works almost entirely within the vanilla AI framework. The approach is parameter tuning: adjusting ai_weight values, defines, and AI strategy entries to make the existing AI less self-destructive, with occasional zero-weight overrides on specific behaviors rather than replacing entire systems. Key changes include removing hardcoded non-European economic penalties, adjusting surplus thresholds to keep AI economies from collapsing into debt spirals, and per-strategy and per-religion market interest patterns (specific AI strategies for countries like Qing, Prussia, and the US, plus religion-based goods preferences). The philosophy is explicit: "I respect the sandbox nature of the game. As such the changes are made to stay as close as possible to my interpretation of the developer's intent". Each change is individually tested for its intended effect.

Smarter AI (by TheGamingNot, ~7K subscribers) is a hybrid: mostly vanilla parameter tuning plus some scripted additions. It directs AI construction strategy toward the "construction loop" meta (iron + tools → more construction → more iron + tools) through AI strategy entries and event-driven decisions, adds event-based law selection, and improves tech picks. It stays within the vanilla AI framework and doesn't replace core vanilla systems. Known issues include the AI cutting military to near-nothing (economy-first focus undermines military balance), private-sector port and railway spam, and AI navy deletion, failures that ARoAI avoids precisely because it replaces the whole system.

The key comparison: ARoAI, Kuromi's AI, and Smarter AI represent three distinct points on the intervention spectrum for the same game. ARoAI replaces everything and achieves the most control but carries the heaviest maintenance burden and is now discontinued. Kuromi's AI stays closest to vanilla, preserves the game's sandbox character, and remains actively maintained. Smarter AI pursues a meta-optimized middle ground but struggles with edge cases that full replacement or conservative tuning handle more cleanly.

The difference in outcomes is instructive. Kuromi's AI can't make the AI build an optimal economy. It can only prevent the worst outcomes within the vanilla decision framework. Smarter AI can push the AI toward meta strategies but inherits vanilla's inability to coordinate between subsystems. ARoAI could produce coherent economies precisely because it replaced the incoherent vanilla system entirely, but the cost was fragility: every game patch potentially broke everything, and the mod is now discontinued. The three approaches form a clean trade-off: more control requires more replacement, more replacement requires more maintenance, and more maintenance leads to more risk of abandonment.


7. Conclusion

Strategy game AI modding is not about making the AI play like a human. It's about making the AI stop playing against itself, and, when possible, making it play well. Coherence beats cleverness. Robustness beats optimality. As §6 shows, these aren't the only ways to approach the problem: other modders have found different trade-off points on the same spectrum, from conservative parameter tuning to deficit-driven architectures.

7.1 The gap between theory and language

Standard game AI uses architectures that compute, search, or plan: utility systems, planners, HTNs, behavior trees. Each requires something that Paradox script typically can't provide: arbitrary functions, symbolic state representations, search algorithms, or at minimum, functions that return values. The Clausewitz scripting language (named after the engine used across all Paradox titles, sometimes called Jomini in newer versions) has events, triggers, effects, and weights instead. It can compute utility scores in limited contexts (inside iteration blocks with built-in weight parameters, through carefully structured script values, or even via variable lists and ordered_ effects that sort and pick candidates), but not as a general-purpose, composable scoring system without significant practical friction from data access limits and constrained math. Strategy games rarely use formal planning anyway; the combinatorial explosion of the action space makes it impractical. What they rely on instead, prioritized reactive logic, is roughly what Clausewitz script expresses natively.

7.2 Three crossing points

Each project found a different crossing point. ACAI coexists with vanilla AI, building a middleware layer: cheap architecturally, but accumulating archaeological layers and fighting a constant rearguard action. ARoAI replaces vanilla entirely, coming closest to a proper utility system (market supply-vs-demand as a response curve, weight/offset as decomposed utility parameters), but paying with 49 individually hardcoded buildings, a code-generation toolchain, and a packed-integer data scheme that corrupts silently on overflow. Imperator: Invictus compensates with true wrappers where possible (governor policy as thin redirect layer) and quasi-replacements where needed (forts as full systems burdened by coexistence with vanilla), and is the only project that treats communication with future maintainers as a first-class design concern.

The same spectrum appears in other mods, at different points. StarNet replaced vanilla's diplomatic and personality systems on Stellaris; Sovereign AI is building a deficit-driven replacement; Kuromi's AI chose conservative tuning on Victoria 3. Each position carries its own costs and capabilities.

7.3 Approximation with known losses

The recurring pattern across all three projects is approximation with known losses. Each principle from §2 trades something: quantized thresholds lose smooth trade-offs, static weights lose bidirectional feedback, hardcoded build sequences lose adaptability, decision trees lose continuous scoring. The losses compound. ACAI can't query actual fleet upkeep, so it estimates with 15-year scaling brackets and a 2× mineral-to-energy cost assumption. The economy booleans derived from that estimate drive building, colonization, and ship production — three subsystems making consequential decisions from the same noisy signal. A utility system would read the real number and weight it proportionally; the boolean chain passes the approximation through as truth, and each downstream decision amplifies the error.

7.4 The binding constraints

Two constraints bind all three projects more tightly than any architectural choice. Performance: script runs on the main game thread, every empire every tick, deterministic lockstep in multiplayer. This forces staggered evaluation schedules and deliberate idle periods. Data accessibility: the AI can only decide based on what the script can read. Victoria 3 can't read building definitions, so I hardcoded 49 buildings by hand. Stellaris can't read the ship build queue, so ACAI emulates it with six flags. Imperator couldn't check research efficiency until patch 2.0.5 shipped a new syntax. When the engine exposes new data, entire categories of improvement become possible overnight.

7.5 What worked

The fleet repair system that freezes damaged ships with +150% hull regen is crude, but it prevents half-destroyed fleets from charging back into combat. Law management, character loyalty, national idea selection, and road building in Imperator weren't bad systems needing fixing. They were absent systems needing creation. The IRC system's core insight, that humans can't reliably assign raw weights across hundreds of options without a probability-based scale, transforms a maintenance nightmare into a self-documenting system.

The dev diary that explains why the AI builds Forums instead of Academies after reaching research cap is as valuable as the code that implements it. Without the explanation, the next person to touch that code will "fix" it back to Academies, not because they're wrong about what Academies do, but because they don't know what the current code is for.

7.6 Design philosophy

The pattern across these projects, and the approach I would bring to any constrained scripting environment: the lightest intervention that achieves the target behavior, not the minimum by default, but the minimum that suffices. Start from failure: observe what the AI does wrong, implement correct behavior, then optimize after seeing it work in practice.

Use utility-style scoring when the language supports it. Imperator's IRC system is a utility function, not a thresholding scheme: it maps preferences to probabilities and could operate at any granularity, but discrete steps were chosen for maintainability. Fall back to quantized thresholds when data access forces the issue; ACAI's economy booleans are pragmatic thresholding born from engine limits, not an inherent preference.

Don't accept the engine's constraints as immutable. Request the capabilities you need, because sometimes you get them: the NAI defines that made ARoAI possible existed because I asked Paradox for them. And write it down, because the next maintainer won't know why the code does what it does unless you tell them.
