Ms. Pacman

**Algorithm 1:** Generic MCTS
1.	start decision tree $V = \{ v_{0} \}$
2.	while within computational budget do
3.	$\qquad$ reset the simulation to $v_{0}$
4.	$\qquad$ use $TREEPOLICY(V)$ (usually UCB) to simulate from $v_{0}$ until a not-yet-expanded decision
5.	$\qquad$ create a new leaf node $v_{l}$ for that decision and append to $V$
6.	$\qquad$ use $ROLLOUTPOLICY$ to roll out the simulation until termination, receiving return $R$
7.	$\qquad$ $BACKUP(v_{l}, R)$ updates the values $n_{v}$, $Q_{v}$ of all parents of $v_{l}$
8.	end while
9.	return best child $\operatorname{argmax}_{v}\ \frac{Q_v}{n_v}$ of children $v \in \partial v_{0}$
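
The loop in Algorithm 1 can be sketched in Python on a toy problem. Everything specific here is an assumption made for illustration and does not come from the text: the four-step horizon, the two actions, the return function `episode_return`, the UCB1 form of the tree policy with exploration constant `c`, the uniformly random rollout policy, and all function names. It is a minimal sketch of the generic scheme, not the Ms. Pac-Man agent.

```python
import math
import random

HORIZON = 4        # toy assumption: every episode lasts exactly four decisions
ACTIONS = (0, 1)   # toy assumption: two actions per decision


class Node:
    def __init__(self, state, parent=None):
        self.state = state      # tuple of actions taken since v_0
        self.parent = parent
        self.children = {}      # action -> child Node
        self.n = 0              # visit count n_v
        self.q = 0.0            # summed returns Q_v


def is_terminal(state):
    return len(state) == HORIZON


def episode_return(state):
    # Toy return: read the action sequence as a binary number scaled to [0, 1].
    value = sum(a * 2 ** (HORIZON - 1 - k) for k, a in enumerate(state))
    return value / (2 ** HORIZON - 1)


def tree_policy(root, c=1.0):
    # Descend with UCB1 while the current node is fully expanded (step 4).
    node = root
    while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
        parent = node
        node = max(
            parent.children.values(),
            key=lambda ch: ch.q / ch.n + c * math.sqrt(math.log(parent.n) / ch.n),
        )
    return node


def expand(node, rng):
    # Create one new leaf for a not-yet-expanded decision (step 5).
    untried = [a for a in ACTIONS if a not in node.children]
    action = rng.choice(untried)
    child = Node(node.state + (action,), parent=node)
    node.children[action] = child
    return child


def rollout(state, rng):
    # Uniformly random rollout policy until termination (step 6).
    while not is_terminal(state):
        state = state + (rng.choice(ACTIONS),)
    return episode_return(state)


def backup(node, ret):
    # Update n_v and Q_v from the leaf up to the root (step 7).
    while node is not None:
        node.n += 1
        node.q += ret
        node = node.parent


def mcts(budget=200, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(budget):
        leaf = tree_policy(root)
        if not is_terminal(leaf.state):
            leaf = expand(leaf, rng)
        backup(leaf, rollout(leaf.state, rng))
    # Step 9: the root action whose child has the highest mean return Q_v / n_v.
    return max(root.children, key=lambda a: root.children[a].q / root.children[a].n)
```

In this toy problem the first action carries the largest weight in the return, so any run with a budget of at least two iterations selects action `1` at the root. The seed only affects how the deeper part of the tree is explored.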

**Algorithm 2:** Pseudocode of MCTS for Ms. Pac-Man
1.	$MCTS$(node $p$, cumulative path length $l$):
2.	$\qquad$ if $l>T_{path}$ then
3.	$\qquad \qquad$ return Playout(p)
4.	$\qquad$ else if $ExpandableNode(p, l)$ then
5.	$\qquad \qquad$ for $c \in C(p)$ do
6.	$\qquad \qquad \qquad c \gets NewLeafNode()$
7.	$\qquad \qquad \qquad$ Add new leaf $c$ to the tree
8.	$\qquad \qquad \qquad (R,c') \gets Playout(p)$
9.	$\qquad \qquad \qquad Update(c,R) ; Update(p,R)$
10.	$\qquad \qquad \qquad$ return $(p, R)$
11.	$\qquad$ else
12.	$\qquad \qquad$ Let $t$ be the tactic set at the root
13.	$\qquad \qquad$ $n_{p} \gets p.n_{\text{old}} + p.n_{\text{new}}$
14.	$\qquad \qquad$ for $c \in C(p)$ do
15.	$\qquad \qquad \qquad$ Let $i$ be the action associated with child $c$
16.	$\qquad \qquad \qquad$ $n_{i} \gets c.n_{\text{old}} + c.n_{\text{new}}$
17.	$\qquad \qquad \qquad v_i \gets M^i_t$
18.	$\qquad \qquad$ Select a move $i$ that maximizes $X_{i}$
19.	$\qquad \qquad (c,R',\Delta l)\gets p.ApplyMove(i)$
20.	$\qquad \qquad$ if $R'_{survival}=0$ then
21.	$\qquad \qquad \qquad (R,c') \gets Playout(p)$
22.	$\qquad \qquad \qquad Update(p,R)$
23.	$\qquad \qquad \qquad$ return $(p, R)$
24.	$\qquad \qquad R \gets MCTS(c,l+\Delta l)$
25.	$\qquad \qquad Update(p,R)$
26.	$\qquad \qquad$ return $(p, R)$
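
Steps 12 to 18 of Algorithm 2 can be illustrated with a short Python sketch. The listing does not define $X_i$, so the sketch assumes the standard UCB1 form $X_i = v_i + C \sqrt{\ln n_p / n_i}$. The class `NodeStats`, the function `select_move`, the constant `c`, and the tactic key `"survival"` are hypothetical names, and the per-tactic `value` dictionary is only a stand-in for $M^i_t$.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    n_old: float = 0.0   # visit count carried over (n_old in the listing)
    n_new: float = 0.0   # visit count from the current search (n_new)
    value: dict = field(default_factory=dict)  # tactic -> value, stand-in for M^i_t


def select_move(parent, children, tactic, c=1.0):
    """Return the move i that maximizes X_i, assuming X_i follows UCB1."""
    n_p = parent.n_old + parent.n_new              # step 13
    best_move, best_x = None, -math.inf
    for move, child in children.items():           # step 14
        n_i = child.n_old + child.n_new            # step 16
        v_i = child.value[tactic]                  # step 17
        # Assumed UCB1 score; requires n_i > 0, i.e. every child was visited.
        x_i = v_i + c * math.sqrt(math.log(n_p) / n_i)
        if x_i > best_x:
            best_move, best_x = move, x_i
    return best_move                               # step 18
```

With equal visit counts the child with the higher value estimate is chosen. A rarely visited child can still win through the exploration term: a child with one visit and value 0.5 beats a child with nine visits and value 0.6 when `c = 1`.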

¶ Introduction