Behind the Campaign AI of Total War: Rome II (Part 3 of 5) | AI and Games

Hi, I'm Tommy Thompson, this is AI and Games, and welcome to part 3 of The AI of Total War.
As the core systems of Total War have been established and redefined across the franchise
- a point I discussed in the first two parts of this series - there is always a need
to strive for better. RTS games continue to be one of the most demanding domains for AI
to operate within, and as such we seek new inspiration from outside of game AI practices.
With this in mind, I will be taking a look at 2013's Total War: Rome II, one of the
most important games in the franchise when it comes to the design and development of
AI practices. So let's take a look at what happened behind the scenes and what makes
Rome II such a critical step in Total War's future progression.
In part 2 of this series we concluded with an overview of the dramatic changes to the
underlying AI systems in Total War with the release of Empire and Napoleon in
2009 and 2010 respectively. What was once a simpler, more manageable, state-driven
and reactive AI system had made way for Goal Oriented Action Planning (GOAP),
a technique popularised by F.E.A.R. (First Encounter Assault Recon). The GOAP implementation within
Total War was ambitious but struggled at launch with Empire, requiring patching and updating
both post-launch and in the following year's Napoleon. The same AI tech was adopted
in 2011's Total War: Shogun 2, where it proved to be a less challenging experience for the
systems involved. Shogun 2 returned to Japan, which provided a much more balanced mix of
ranged combat and melee, with less emphasis on gun-driven combat. Even the campaign AI
didn't struggle with the same problems as in Empire and Napoleon, given a smaller and less
chaotic campaign structure. But while Creative Assembly seemed content with the combat
systems, the campaign AI still needed more work. This resulted in some significant changes
under the hood during the Fall of the Samurai DLC for Shogun 2, which, among other things,
included the naval warfare of Empire and Napoleon. One of the new problems this created
was that the army and naval logic had, until that point, been separate, meaning the AI needed
to be rewritten to consider how naval strategy could influence ground troops, such as armies being
bombarded on the coastline. At that point, the campaign AI's planning approach couldn't
foresee these issues well enough and was often stuck being reactive in its planning process
rather than deliberative and forging ahead with its own ambitions.
To resolve this, a new campaign AI system was prototyped in Shogun 2, which was later
expanded upon for Total War: Rome II.
2013's Total War: Rome II was a return to one of the most well-known entries in the
franchise, but with it came a rather seismic change for the campaign AI under the hood.
The drive for a more deliberative system that could consider the overlap between mechanics
resulted in a growing number of sub-systems, each responsible for a separate job: budgeting
money, conducting diplomacy, selecting tasks for attacking and defending
(be they attacking enemy forces or laying siege to settlements), deciding which issues
take high priority, figuring out how to navigate an army safely across the map, not to mention
managing construction and taxes.
All of these require the AI to consider the overall suite of resources it has at its disposal
and how best to utilise them. The system is still reliant on the belief, desire and intention
system mentioned in part 2, but now the sheer number of combinations is staggering.
Even if the system has decided on a smaller subset of tasks it wants to complete in a
given turn, there are still tens of thousands of different possible outcomes for that one
turn. The map for military deployment alone is quoted as having around 800,000 individual
hex points. How can the system hope to approach a task at this scale?
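To see why the numbers explode, here is a back-of-the-envelope calculation with purely hypothetical figures (not numbers quoted for Rome II): if each army independently picks one of several candidate tasks, the number of joint plans for a single turn is the product of every army's options, so it grows exponentially with the number of armies.

```python
from math import prod

def joint_plans(choices_per_army: list[int]) -> int:
    """Number of distinct one-turn plans when every army picks one task independently."""
    return prod(choices_per_army)

# Hypothetical faction: 5 armies, each with 8 candidate tasks this turn.
print(joint_plans([8] * 5))  # 32768
# One extra army with 8 options multiplies the count by 8 again.
print(joint_plans([8] * 6))  # 262144
```

Even these modest made-up numbers land in the "tens of thousands" range for one turn, before navies, agents, diplomacy or movement paths are considered.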
The answer comes in the form of Monte Carlo Tree Search: an AI algorithm that had recently
taken academic research by storm and was making big waves in general intelligence AI research.
MCTS allows the system to consider all of the different possibilities, explore the
ones that seem the most fruitful, but also continue to consider alternatives. In time,
those alternatives might yield some strong outcomes, so the system is able to keep doing
things it knows are good for it while also considering other opportunities along the way.
Now before we get into the meat of how the campaign AI in Rome II is managed through
MCTS, I need to take a moment to talk about how the algorithm works.
Monte Carlo Tree Search is a type of reinforcement learning algorithm. Reinforcement learning is
a branch of machine learning whose algorithms look at a problem and find good decisions by trying
out many possibilities, while largely focussing on the ones they find to be most useful. This is
really useful when you have a problem that is incredibly large, with a huge number of possibilities,
since we might find a good decision to make but can't say with any certainty that it's the best
decision. To get a better understanding of whether there are better options to take,
we need to consider alternatives periodically and see if they would be more useful. This
is known in reinforcement learning as the exploration/exploitation trade-off. We want
to exploit the actions and strategies we have found to be the best, but must also continue
to explore the local space of alternative decisions and see whether they could replace
the current best. This is a difficult balance to strike, given that sometimes we need to
really explore a series of decisions to discover that an action that looks bad now might
actually prove to be a really good idea somewhere down the line.
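A common way to manage this trade-off in MCTS is the UCB1 formula, which underpins the widely used UCT variant. The video doesn't say which selection rule Rome II uses, so treat this as a generic sketch: an option's average reward is the exploitation term, and a bonus that shrinks the more often the option has been tried is the exploration term. The constant `c` (here the square root of 2) is an assumed tuning parameter, and the option names are hypothetical.

```python
import math

def ucb1(total_reward: float, visits: int, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: average reward (exploit) plus an exploration bonus (explore)."""
    if visits == 0:
        return math.inf  # an untried option is always tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def pick(options: dict[str, tuple[float, int]]) -> str:
    """Choose the option with the highest UCB1 score; values are (total_reward, visits)."""
    parent_visits = sum(v for _, v in options.values())
    return max(options, key=lambda k: ucb1(*options[k], parent_visits))

# "attack" has the better average (0.6 vs 0.5), but "defend" has barely been
# tried, so its exploration bonus wins this round.
stats = {"attack": (60.0, 100), "defend": (1.0, 2)}
print(pick(stats))  # defend
```

As "defend" gets more visits its bonus shrinks, and if its average stays lower the search drifts back to exploiting "attack".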
This is what MCTS does best: it explores the potential options at a given decision point,
isolates the best ones and then decides which one is the best,
considering both its short- and long-term ramifications. The key component of MCTS is the ability to run
a playout, where the AI effectively plays the game from a given starting point all
the way to the end by making random decisions. Now, it can't actually play the real game to the
end, so MCTS uses what's called a forward model: an abstract approximation of the game logic
that allows it to consider the outcome of playing action X in state Y, resulting in outcome
Z. The algorithm gathers up all the decisions it can make in a given state of the game,
then runs thousands of random playouts across them in a structured and intelligent fashion.
It gathers data from each of these playouts (also called rollouts) and concludes the process by
selecting the action that had the best rollout score. It's both incredibly powerful and strangely stupid
in its execution.
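The idea of a forward model can be made concrete with a toy game. This sketch uses a simple take-away game (players alternately remove 1 to 3 counters from a pile, and whoever takes the last counter wins) as a stand-in for Total War's far richer rules; `legal_actions`, `step`, `random_playout` and `flat_rollout_scores` are our own hypothetical names, not Creative Assembly's. The forward model answers "what happens if I play action X in state Y?" without touching the real game, and a playout just chains random calls to it. This version has no search tree yet: it simply averages random playouts per action and picks the best score.

```python
import random
from typing import NamedTuple

class State(NamedTuple):
    pile: int      # counters left
    to_move: int   # player about to move: 0 or 1

def legal_actions(s: State) -> list[int]:
    return [n for n in (1, 2, 3) if n <= s.pile]

def step(s: State, action: int) -> State:
    """Forward model: the state that results from playing `action` in `s`."""
    return State(s.pile - action, 1 - s.to_move)

def random_playout(s: State, player: int, rng: random.Random) -> float:
    """Play random moves to the end; return 1.0 if `player` wins, else 0.0."""
    while s.pile > 0:
        s = step(s, rng.choice(legal_actions(s)))
    # Whoever took the last counter is the player who is NOT to move now.
    winner = 1 - s.to_move
    return 1.0 if winner == player else 0.0

def flat_rollout_scores(s: State, playouts: int = 2000, seed: int = 0) -> dict[int, float]:
    """Average playout result for each action available in `s`."""
    rng = random.Random(seed)
    me = s.to_move
    return {a: sum(random_playout(step(s, a), me, rng) for _ in range(playouts)) / playouts
            for a in legal_actions(s)}

scores = flat_rollout_scores(State(pile=5, to_move=0))
print(max(scores, key=scores.get))  # action with the best rollout score
```

Under random play, taking 1 from a pile of 5 wins about two thirds of playouts versus about half for the other moves, so the "strangely stupid" random games still point at the right move here.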
The smart part comes in how each rollout is decided upon and executed. To do this, it relies
on four key steps: selection, expansion, simulation and backpropagation.
Selection takes the current state of the game and picks decisions down the existing tree,
guided by a selection policy, until it reaches a state at the edge of what has been explored
so far. Next up comes expansion: provided the state we reached didn't end the game (either
as a win or a loss), we expand it one step further by adding a new child state to the tree.
Simulation is the random playout phase: from that new state it plays a game of completely
random decisions until it reaches either a terminal state (a win or a loss) or a simulation
cap. It then gives back a score for how well it performed. This is passed to the
backpropagation phase, where we update the perceived value not just of the state we ran the
rollout from, but of every state that led to it. So any score, be it positive or negative,
works its way back up the tree to the starting point.
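Putting the four phases together, here is a compact MCTS sketch on a toy take-away game (players alternately remove 1 to 3 counters; whoever takes the last counter wins). It is a generic textbook version, not Rome II's implementation: selection uses the UCT rule, a playout that hits the simulation cap is scored as a draw (an assumption of this sketch), and the final move is the most-visited root child, a common and robust alternative to picking the single best-scoring one.

```python
import math
import random

def legal_actions(pile):
    return [n for n in (1, 2, 3) if n <= pile]

class Node:
    def __init__(self, pile, to_move, parent=None, action=None):
        self.pile, self.to_move = pile, to_move
        self.parent, self.action = parent, action
        self.children = []
        self.untried = legal_actions(pile)
        self.visits = 0
        self.wins = 0.0  # credited to the player who moved INTO this node

    def uct_child(self, c=math.sqrt(2)):
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(pile, to_move, iterations=3000, max_depth=50, seed=0):
    rng = random.Random(seed)
    root = Node(pile, to_move)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: if the game isn't over, add one new child state.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - a, 1 - node.to_move, node, a)
            node.children.append(child)
            node = child
        # 3. Simulation: random moves until the game ends or the cap is hit.
        pile_left, mover, depth = node.pile, node.to_move, 0
        while pile_left > 0 and depth < max_depth:
            pile_left -= rng.choice(legal_actions(pile_left))
            mover = 1 - mover
            depth += 1
        winner = 1 - mover if pile_left == 0 else None  # None: capped playout
        # 4. Backpropagation: push the result up through every ancestor.
        while node is not None:
            node.visits += 1
            just_moved = 1 - node.to_move
            if winner is None:
                node.wins += 0.5  # capped playout treated as a draw
            elif winner == just_moved:
                node.wins += 1.0
            node = node.parent
    # Final choice: the most-visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).action

print(mcts(pile=5, to_move=0))  # 1 (leaves 4, a losing pile for the opponent)
```

Each node stores wins from the point of view of the player who moved into it, so a parent choosing the child with the highest win rate is always choosing what is best for itself.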
Through those four phases, we can take decisions to a point in the tree, simulate their
outcome and then propagate back the perceived value. Now, doing this once isn't enough:
you have to do it thousands of times and balance which playouts to make. Different MCTS algorithms
strike this balance differently, periodically shifting focus to other parts of the tree to make
sure there are no better solutions they would otherwise miss. But once the playout limit
is reached, it's done and takes the action leading to the best-scoring state.
What makes this system even more powerful is that it's what we call an anytime algorithm,
meaning that it will always give an answer regardless of how many playouts we let it
take. So in a context like a game, where CPU and memory resources are pretty tight, if
it needs to stop evaluating the game at a moment's notice, it will still give the best
answer it could find within that time. Despite this, giving it a massive amount of CPU
won't result in godlike AI, given that the knowledge accrued from repeatedly running playouts
eventually levels out.
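The anytime property is easy to see in code. To keep the sketch short it uses flat Monte Carlo (random playouts spread evenly over the root moves, with no tree) on a toy take-away game where players remove 1 to 3 counters and whoever takes the last one wins; the same stop-whenever-you-like loop can wrap a full MCTS just as well. The time budget `budget_s` and the function names are our own hypothetical choices.

```python
import random
import time

def legal_actions(pile):
    return [n for n in (1, 2, 3) if n <= pile]

def we_win_playout(pile, rng):
    """Random playout after our move (opponent to move); True if we take the last counter."""
    we_moved_last = True
    while pile > 0:
        pile -= rng.choice(legal_actions(pile))
        we_moved_last = not we_moved_last
    return we_moved_last

def anytime_best_move(pile, budget_s, seed=0):
    """Flat Monte Carlo under a time budget: returns (best move so far, playouts run)."""
    actions = legal_actions(pile)
    if not actions:
        raise ValueError("no legal moves: the game is already over")
    rng = random.Random(seed)
    wins = {a: 0 for a in actions}
    plays = {a: 0 for a in actions}
    deadline = time.monotonic() + budget_s
    i = 0
    # Always finish one playout per action so an answer exists, then keep
    # cycling through the actions until the time budget runs out.
    while i < len(actions) or time.monotonic() < deadline:
        a = actions[i % len(actions)]
        wins[a] += we_win_playout(pile - a, rng)
        plays[a] += 1
        i += 1
    return max(actions, key=lambda a: wins[a] / plays[a]), i

print(anytime_best_move(5, budget_s=0.0))   # answers after just 3 playouts
print(anytime_best_move(5, budget_s=0.05))  # more playouts, steadier answer
```

With a zero budget it still returns a move, just a poorly informed one. A larger budget steadies the estimates, but once the win rates settle, extra playouts change little, which is the levelling-out mentioned above.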
Alright, with all the science out of the way, how does this all work in Rome II?
First I need to explain how the Rome II campaign AI manages itself. It's broken down into three
chunks: pre-movement, task allocation and post-movement.
- Pre-movement identifies threats and areas of opportunity for the AI player. It also budgets
resources, conducts diplomacy and selects skills for armies.
- Task allocation is conducted by a highly complex Task Management System, which is
the focus of the MCTS. The task system handles armies, navies, agents and actions related
to diplomacy.
- Lastly there is post-movement: once all units have been moved and decisions made, the AI
will then focus on construction of buildings, setting taxes and technology research.
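The three-phase turn structure can be sketched as an ordered pipeline. The phase and job labels are paraphrased from the description above, and `run_turn` and `handle` are hypothetical names; the real engine's interfaces are not described in the video.

```python
# Hypothetical phase/job labels paraphrased from the description; not engine code.
CAMPAIGN_TURN = [
    ("pre-movement", ["identify threats and opportunities", "budget resources",
                      "conduct diplomacy", "select army skills"]),
    ("task allocation", ["armies", "navies", "agents", "diplomatic actions"]),  # MCTS-driven
    ("post-movement", ["construction", "taxes", "technology research"]),
]

def run_turn(handle):
    """Run every job of every phase in order; `handle(phase, job)` does the real work."""
    for phase, jobs in CAMPAIGN_TURN:
        for job in jobs:
            handle(phase, job)

log = []
run_turn(lambda phase, job: log.append(f"{phase}: {job}"))
print(log[0], "|", log[-1])
# pre-movement: identify threats and opportunities | post-movement: technology research
```

The ordering is the point: construction, tax and research decisions are only made once the turn's movements have been settled.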
MCTS is responsible for managing two critical components of the task allocation system:
the distribution of resources, so that the AI can approach the different tasks it wants to
complete, and the execution of specific tasks. The tasks themselves are driven by a variety
of different task generation systems, each with its own focus or perspective. So while there
is a task generator for armies, there is also one for navies, diplomacy actions and much
more. The thing is that there are often far more valid tasks to execute than there are
available resources: the actual units on the map and money to spend. As such, the system
then prioritises which tasks it will complete by selecting the most viable and then allocating
resources to them.
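The video says Rome II uses MCTS for this prioritisation; the sketch below swaps in a much simpler greedy rule purely to show the shape of the problem: there are more tasks than gold and armies, so only the highest-priority tasks that still fit get funded. The `Task` fields, task names and numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: float  # higher = more desirable (how it is scored is out of scope here)
    cost: int        # gold needed (hypothetical unit)
    armies: int      # armies needed

def allocate(tasks, gold, armies):
    """Greedy stand-in: fund the highest-priority tasks that still fit the remaining budget."""
    chosen = []
    for t in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if t.cost <= gold and t.armies <= armies:
            chosen.append(t.name)
            gold -= t.cost
            armies -= t.armies
    return chosen

tasks = [
    Task("siege Capua", 0.9, 800, 2),
    Task("defend Roma", 0.8, 300, 1),
    Task("raid Gaul", 0.5, 400, 1),
    Task("blockade Carthage", 0.4, 600, 1),
]
print(allocate(tasks, gold=1200, armies=3))  # ['siege Capua', 'defend Roma']
```

A greedy pass can miss combinations that would be better overall, which is one reason a search-based approach such as MCTS is attractive for this step.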
In addition, the viability check also applies some filtering to stop the AI trying anything
too stupid, such as removing actions that could cause diplomatic tensions, filtering out
actions that could harm long-term strategies, and factoring in what it has done recently
so it avoids contradicting itself. Once filtered, the tasks are then assessed using the MCTS
algorithm to grade their effectiveness and priority, with the best and most desirable-looking
opportunities graded a higher priority.
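The filter-then-grade order can be sketched as follows. The three filters mirror the kinds of checks described above, but the `Task` type, the `allies`, `future_partners` and `recent` inputs and all faction names and scores are hypothetical, and `score` is a placeholder for the MCTS grading rather than an implementation of it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    kind: str    # e.g. "attack" or "defend" (hypothetical task types)
    target: str  # the faction or settlement the task concerns

def viable(task, allies, future_partners, recent):
    """Hypothetical filters mirroring the three kinds described above."""
    if task.kind == "attack" and task.target in allies:
        return False  # would cause diplomatic tension
    if task.kind == "attack" and task.target in future_partners:
        return False  # would undermine a long-term strategy
    if task.kind == "attack" and ("peace", task.target) in recent:
        return False  # would contradict something done recently
    return True

def rank(tasks, allies, future_partners, recent, score):
    """Filter first, then grade survivors; `score` stands in for the MCTS grading."""
    kept = [t for t in tasks if viable(t, allies, future_partners, recent)]
    return sorted(kept, key=score, reverse=True)

tasks = [Task("attack", "Carthage"), Task("attack", "Massilia"),
         Task("attack", "Syracuse"), Task("attack", "Arverni"), Task("defend", "Roma")]
scores = {"Carthage": 0.7, "Massilia": 0.9, "Syracuse": 0.6, "Arverni": 0.95, "Roma": 0.8}
ranked = rank(tasks, allies={"Massilia"}, future_partners={"Arverni"},
              recent={("peace", "Syracuse")}, score=lambda t: scores[t.target])
print([(t.kind, t.target) for t in ranked])  # [('defend', 'Roma'), ('attack', 'Carthage')]
```

Filtering before grading means the expensive search is never spent on tasks the AI would reject anyway.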
After this, MCTS is called on again to run resource coordination: now that it knows what
it wants to do, it still needs to figure out exactly how to do it. As such, once the system
has made some approximations of appropriate targets and their locations on the map, it runs
further MCTS searches on army movement and army recruitment, factoring in the makeup of its
own forces as well as its opponents' in order to determine where best to move current forces,
as well as what types of unit to recruit for future turns.
In each case, the MCTS is limited such that it doesn't search all the way to the goal,
given that Total War as a game is so large that it would take too long to simulate
completing the game. In addition, the game is so complex that simulating that far out won't
yield any useful outcome. In fact, it was quoted that the system is only capable of
looking one turn ahead before starting random playouts, due to the complexity of the game.
Given the nature of Total War, the MCTS can only exhaustively search the entire state space
for the best action during the opening turns of the game. Over time the number of possible
states grows exponentially, to the point that it is simply beyond the algorithm's reach.
Despite this, the anytime property of the algorithm ensures we will still get a useful
and intelligent decision from the system.
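When the search can't reach the end of the game, a playout has to stop early and judge the position it reached. One common way to do this, assumed here rather than taken from the video, is to cap the number of random moves and then score the resulting state with a heuristic evaluation function. The helper is generic over state and action types so it could sit behind any forward model; the "regions held" toy and its evaluation are hypothetical.

```python
import random
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # game state
A = TypeVar("A")  # action

def capped_playout(state: S,
                   actions: Callable[[S], Sequence[A]],
                   step: Callable[[S, A], S],
                   is_terminal: Callable[[S], bool],
                   evaluate: Callable[[S], float],
                   max_steps: int,
                   rng: random.Random) -> float:
    """Random playout that stops after `max_steps` moves and scores wherever it ended up."""
    for _ in range(max_steps):
        if is_terminal(state):
            break
        state = step(state, rng.choice(actions(state)))
    return evaluate(state)

# Hypothetical toy: the state is the number of regions held (0 to 20);
# each move wins or loses one region at random.
score = capped_playout(
    state=10,
    actions=lambda s: [+1, -1],
    step=lambda s, a: s + a,
    is_terminal=lambda s: s in (0, 20),
    evaluate=lambda s: s / 20,  # heuristic: share of the map held
    max_steps=5,
    rng=random.Random(42),
)
print(score)
```

The quality of such a search then depends heavily on the evaluation function, since most playouts end at the cap rather than at a real win or loss.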
Rome II launched in September 2013 to a largely positive response, but with a few
problems. Most notably, the campaign AI took quite a long time to make its decisions in
the launch build, taking several minutes to conduct campaign movements that most players
conduct in a minute or two. This resulted in aggressive patching of the game for several weeks
after launch. In time this led to a noted improvement in campaign decision making that was
received favourably (though not universally) among fans and critics.
Revolutions aren't easy, nor are they clean, and the legacy of Total War: Rome II is no
exception. But it is nonetheless a major milestone for the development of AI systems and practices
in commercial video games and has led the way for many a successor seeking
to adopt MCTS as part of its own AI toolchain. MCTS is a hot topic in contemporary AI research
and has shown many useful applications in the fields of expert play and general intelligence.
To learn more about how it all works, be sure to check out the AI 101 on MCTS here on AI
and Games.
Thanks for watching this third entry in The AI of Total War. In part four, I'll be looking
at how the MCTS implementation was improved in Total War: Attila, combined with a deep
dive into exactly how the diplomacy AI works in more recent iterations of the game.
