Week 11: Notes

2-dimensional dynamic programming problems

Two weeks ago we began our study of dynamic programming. The dynamic programming problems we saw last week had a 1-dimensional structure. That means that each of those problems could be solved by writing a recursive function with 1 argument. The naive recursive solution was exponentially inefficient, since it solved the same subproblems over and over again. We found that we could solve these problems much more efficiently by filling in a 1-dimensional array in a certain order. This array recorded the solution to each subproblem, so that it was immediately available to use in solving other, larger, subproblems.

For example, previously we studied the rod-cutting problem. The naive recursive solution looked like this:

  // Return the best price for cutting a rod of length n, given a table
  // with the prices for pieces from lengths 1 .. n.
  static int profit(int n, int[] prices) {
      int best = 0;
      for (int i = 1 ; i <= n ; ++i)
          best = Max(best, prices[i] + profit(n - i, prices));
      return best;
  }

Notice that profit() is a function of a single argument "int n" (ignoring the constant array prices[], which could just be stored as a field outside the function). To compute the result more efficiently, we used a bottom-up approach that filled in a 1-dimensional array called best[], where best[i] held the best possible profit for a rod of size i, i.e. the value that would be returned by profit(i, prices).
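That bottom-up rod-cutting solution can be sketched as follows. (This is a self-contained sketch: the class name and the sample price table are made up for illustration.)

```csharp
using System;

class RodCutting {
    // Bottom-up version: best[i] holds the best possible profit for a rod of
    // length i, filled in increasing order of i.
    public static int profit(int n, int[] prices) {
        int[] best = new int[n + 1];   // best[0] = 0: an empty rod earns nothing
        for (int i = 1; i <= n; ++i)
            for (int cut = 1; cut <= i; ++cut)
                best[i] = Math.Max(best[i], prices[cut] + best[i - cut]);
        return best[n];
    }

    static void Main() {
        // prices[i] = price of a piece of length i (index 0 is unused)
        int[] prices = { 0, 1, 5, 8, 9 };
        Console.WriteLine(profit(4, prices));   // 10 (two pieces of length 2)
    }
}
```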

This week we will study slightly more challenging dynamic programming problems that have a more general 2-dimensional structure. For these problems, a naive recursive solution will be a function with 2 arguments. Once again, this naive solution will run in exponential time. We can solve these problems more efficiently using bottom-up dynamic programming in which we fill in a 2-dimensional array or array-like data structure. We will need to be careful about the order in which we fill in the array elements. The solution to any subproblem instance will depend on the solutions to smaller instances, so we need to fill in the array in some order that guarantees that the dependent solutions will already be available when we compute any subproblem solution.

Some examples will make these ideas clearer. Let's consider the problem of finding the longest palindromic subsequence of a given string. For example, consider the string "watermelon stew". The string "wteretw" is a longest palindromic subsequence:

W A T E R M E L O N   S T E W
W   T E R   E           T   W

That's because this subsequence is a palindrome (i.e. it reads the same forwards and backwards), and there is no longer palindromic subsequence in the string. Note that the longest palindromic subsequence is not necessarily unique: "wtemetw" is another possibility.

We can solve this problem using two-dimensional dynamic programming. Suppose that the input string s has length N, with character indices from 0 to (N - 1). Let L(i, j) be the length of the longest palindromic subsequence of s[i .. j].

We must first find a recursive formulation of the problem that will allow us to compute L(i, j) recursively for any i and j. Our base case is straightforward: if i = j, then clearly L(i, j) = 1, since any 1-character string is a palindrome. In fact it will be useful to have an additional base case that is even smaller: if j < i, then s[i .. j] is empty, so L(i, j) = 0.

Now suppose that i < j, which will be the recursive case. We may consider two possibilities:

- If s[i] == s[j], then these two characters can serve as the outer characters of a palindromic subsequence whose inner part is a longest palindromic subsequence of s[i + 1 .. j - 1]. So L(i, j) = L(i + 1, j - 1) + 2.

- If s[i] != s[j], then s[i] and s[j] cannot both appear in a single palindromic subsequence. So a longest palindromic subsequence of s[i .. j] is the longer of the longest palindromic subsequences of s[i .. j - 1] and s[i + 1 .. j], i.e. L(i, j) = max(L(i, j - 1), L(i + 1, j)).

Since we have found a recursive pattern, we may now write a recursive function to compute L(i, j):

    // Compute the length of the longest palindromic subsequence of s[i .. j].
    static int len(string s, int i, int j) {
        if (j < i) return 0;
        if (i == j) return 1;

        return s[i] == s[j] ? len(s, i + 1, j - 1) + 2
                            : Max(len(s, i, j - 1), len(s, i + 1, j));
    }

It works:

 string s = "watermelon stew";
 WriteLine(len(s, 0, s.Length - 1));   // writes 7

However, this function will run in exponential time, since it solves the same subproblems over and over again.
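One standard way to fix that inefficiency is top-down memoization: cache each value len(i, j) so that every subproblem is solved only once. Here is a sketch of that idea (the class LpsMemo and the wrapper lpsLength are names introduced here, not code from the course):

```csharp
using System;

class LpsMemo {
    // memo[i, j] caches len(i, j); 0 means "not yet computed", which is safe
    // because every subproblem with i <= j has length at least 1.
    static int[,] memo;

    static int len(string s, int i, int j) {
        if (j < i) return 0;
        if (i == j) return 1;
        if (memo[i, j] == 0)
            memo[i, j] = s[i] == s[j]
                ? len(s, i + 1, j - 1) + 2
                : Math.Max(len(s, i, j - 1), len(s, i + 1, j));
        return memo[i, j];
    }

    public static int lpsLength(string s) {
        memo = new int[s.Length, s.Length];
        return len(s, 0, s.Length - 1);
    }

    static void Main() {
        Console.WriteLine(lpsLength("watermelon stew"));   // 7
    }
}
```

Since there are only O(N²) distinct subproblems and each is computed once, this runs in O(N²) time, like the bottom-up version developed below.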

To solve this problem more efficiently, we will use bottom-up dynamic programming to fill in a 2-dimensional array len[], where len[i, j] = L(i, j). As mentioned above, we need to be careful about the order in which we fill in the array. Specifically, when we compute len[i, j] we must already know len[i + 1, j] and len[i, j - 1] and len[i + 1, j - 1]. An illustration will help here. Suppose that s = "india". Here is a table showing the values of len[i, j]:

                j = 0   1   2   3   4
                    i   n   d   i   a
    i = 0   i       1   1   1   3   3
        1   n       0   1   1   1   1
        2   d       0   0   1   1   1
        3   i       0   0   0   1   1
        4   a       0   0   0   0   1

Notice the following:

- Every cell on the diagonal has value 1, since any 1-character string is a palindrome.

- Every cell below the diagonal has value 0, since those cells correspond to empty substrings s[i .. j] with j < i.

- The solution to the entire problem is in the upper-right corner: len[0, 4] = 3. (For example, "idi" is a longest palindromic subsequence of "india".)

- Each cell above the diagonal depends only on the cells immediately below it, to its left, and diagonally below and to its left.

Now, we certainly cannot fill in the rows of this table from top to bottom, because then as we computed each row we would not already have the values below it. One possible order for filling in the rows is bottom to top, left to right:

When we fill in the table in this order, as we encounter each cell (i, j) we will already know the values of its neighbors below, to the left, and diagonally below and to the left. And so we will be able to compute len[i, j].

Now, there is really no need for us to iterate over all the cells below the diagonal in the table above, since they all have value 0 and will be initialized to that value when we create our array. So instead we can fill in the table from bottom to top, proceeding rightward from the diagonal position in each row:

We now have a strategy, so let's write code to fill in the array:

    // Compute the length of the longest palindromic subsequence of s.
    static int longestPalindromic(string s) {
        // len[i, j] is the length of the longest palindromic subsequence of
        // s[i .. j].
        int[,] len = new int[s.Length, s.Length];
        int i, j;
        for (i = s.Length - 1 ; i >= 0 ; --i) {
            len[i, i] = 1;
            for (j = i + 1 ; j < s.Length ; ++j)
                len[i, j] =
                    s[i] == s[j] ? len[i + 1, j - 1] + 2
                                 : Max(len[i, j - 1], len[i + 1, j]);
        }
        return len[0, s.Length - 1];
    }

Our function will run in O(N^2) time, where N = s.Length. This is a huge improvement over the exponential version.

Of course, we may want to know not only the length of the longest palindromic subsequence, but also the subsequence itself! So we'd like to extend our code to return that string. Here is one possible approach: after we have filled in the table above we can use the values in it to reconstruct the string we want. To do that, we start at the upper right, i.e. the value len[0, s.Length - 1]. We can reconstruct a path through the table that explains how that value was derived, and that path will reveal the string itself. At any cell (i, j), if s[i] == s[j] then we know that s[i] and s[j] are included in the string we want, and we can record those characters and proceed to (i + 1, j - 1). Otherwise, we proceed either to (i, j - 1) or to (i + 1, j), choosing the cell with the larger value. Here is an extended version of the function above that can reconstruct the string in this way:

  // Compute the longest palindromic subsequence of s.
  static string longestPalindromic(string s) {
      // len[i, j] is the length of the longest palindromic subsequence of
      // s[i .. j].
      int[,] len = new int[s.Length, s.Length];
    
      … same code as above for filling in the table … 
    
      // Now len[0, s.Length - 1] is the length of the longest palindromic
      // subsequence.  We now want the subsequence itself.  We need to build
      // the sequence from the outside inward, so we use two strings a and b
      // and will return (a + b).
    
      string a = "", b = "";
      i = 0;
      j = s.Length - 1;
      while (j >= i) {
          if (j == i) {
              a += s[i];
              break;
          }
          if (s[i] == s[j]) {
              a = a + s[i];
              b = s[j] + b;
              ++i;
              --j;
          } else if (len[i, j - 1] > len[i + 1, j])
              --j;
            else ++i;
      }
    
    return a + b;
  }

Study the function to understand how it works.

This code for constructing the longest palindromic subsequence may seem like a chore, and so you may be wondering if there is an easier way. Actually there is. When we build the table, instead of storing the length of the longest palindromic subsequence of s[i .. j] in each table cell, we can store the longest palindromic subsequence itself:

    // Compute the longest palindromic subsequence of s.
    static string longestPalindromic(string s) {
        // p[i, j] is the longest palindromic subsequence of s[i .. j].
        string[,] p = new string[s.Length, s.Length];
        int i, j;
        
        for (i = s.Length - 1 ; i >= 0 ; --i) {
            p[i, i] = s[i].ToString();
            for (j = i + 1 ; j < s.Length ; ++j) 
                if (s[i] == s[j])
                    p[i, j] = s[i] + p[i + 1, j - 1] + s[j];
                else {
                    string t = p[i, j - 1], u = p[i + 1, j];
                    p[i, j] = t.Length > u.Length ? t : u;
                }
        }
    
        return p[0, s.Length - 1];
    }

That was easier! This seems like a nicer solution for this problem.
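If you want to convince yourself that the table of strings really works, here is a self-contained check. It repeats the function above, plus two helper predicates (isPalindrome and isSubsequence, introduced here) that verify the result:

```csharp
using System;

class LpsDemo {
    // The string-table version from above, repeated so this sketch runs on
    // its own.
    public static string longestPalindromic(string s) {
        // p[i, j] is the longest palindromic subsequence of s[i .. j].
        string[,] p = new string[s.Length, s.Length];
        for (int i = s.Length - 1; i >= 0; --i) {
            p[i, i] = s[i].ToString();
            for (int j = i + 1; j < s.Length; ++j)
                if (s[i] == s[j])
                    p[i, j] = s[i] + p[i + 1, j - 1] + s[j];
                else {
                    string t = p[i, j - 1], u = p[i + 1, j];
                    p[i, j] = t.Length > u.Length ? t : u;
                }
        }
        return p[0, s.Length - 1];
    }

    // True if t reads the same forwards and backwards.
    public static bool isPalindrome(string t) {
        for (int i = 0, j = t.Length - 1; i < j; ++i, --j)
            if (t[i] != t[j]) return false;
        return true;
    }

    // True if t is a subsequence of s (greedy matching works here).
    public static bool isSubsequence(string t, string s) {
        int i = 0;
        foreach (char c in s)
            if (i < t.Length && t[i] == c) ++i;
        return i == t.Length;
    }

    static void Main() {
        string r = longestPalindromic("watermelon stew");
        Console.WriteLine(r.Length);   // 7
        Console.WriteLine(isPalindrome(r) && isSubsequence(r, "watermelon stew"));   // True
    }
}
```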

For some other dynamic programming problems, however, it may be easier or more efficient to store intermediate values in the table and then reconstruct the final solution at the end. (Actually we did that for a couple of one-dimensional dynamic programming problems last week.)

The subset sum problem

Let's look at another classic problem that we can solve using two-dimensional dynamic programming: the subset sum problem. Given a set of positive integers and an integer k, we wish to determine whether any subset of the given set has sum k.

As usual, we'd first like to come up with a recursive formulation of the problem. As a clue to how to do that, recall how a couple of weeks ago we wrote a function that generated all subsets of a given set. Here is the general approach we took. Given a set S, we can take any element x from the set, and let S' be the remaining elements in S. If we recursively generate all subsets of S', then each subset is itself a subset of S. In addition, we can add x to any of those subsets to form a subset of S.

So now suppose that we are given a set of positive integers S and an integer k, and we'd like to know whether any subset of S has the sum k. Take any integer x from S, and let S' be the remaining integers in S. If any subset of S' has sum k, then that is a subset of S which has sum k. In addition, if any subset of S' has sum (k - x), then we can add x to that subset to obtain a subset of S which has sum k.

We can use this idea to solve the problem recursively:

// Return true if any subset of a[0 .. (i - 1)] has sum k.
static bool hasSum(int[] a, int i, int k) {
    if (k == 0)
        return true;   // the empty set is a subset, and has sum k
    if (i == 0)
        return false;  // set is empty, cannot have non-zero sum
    return hasSum(a, i - 1, k)             // we can make k without a[i - 1]
        || a[i - 1] <= k &&
           hasSum(a, i - 1, k - a[i - 1]); // we can make k by adding a[i - 1]
}
    
static bool hasSum(int[] a, int k) => hasSum(a, a.Length, k);

As usual, this naive recursive solution may take exponential time to run.

The function hasSum above takes two parameters i and k (in addition to the constant array a). That is a sign that we can solve this problem using two-dimensional dynamic programming. We will need a two-dimensional array that holds the boolean value hasSum(a, i, k) for every possible value of i and k. In our bottom-up implementation, we will also call this array hasSum. Specifically, hasSum[i, k] will be true if any subset of a[0 .. (i - 1)] has sum k, just like in the recursive function above.

Once again we must consider the order in which we will fill the array elements. When we compute hasSum[i, k], we may need to know the value of hasSum[i - 1, j] for any value j in the range 0 <= j <= k. That shows that we should fill the array in increasing order of i. It does not matter whether we fill each row in increasing or decreasing order of k, since the computation of hasSum[i, k] does not depend on any other values in the same row.

After we have filled in the array, if it turns out that there was some subset with sum k, we would like to generate that subset. As in other dynamic programming problems, we can use an iterative loop to reconstruct the solution from the array. At the beginning of each loop iteration, we have values i and k such that hasSum[i, k] is true, so we know that some subset of a[0 .. (i - 1)] has sum k. If hasSum[i - 1, k] is also true, then we can make the sum without a[i - 1], so we simply decrement i. Otherwise the subset must contain a[i - 1], so we record that value, subtract it from k, and decrement i.

Combining these ideas, here is our bottom-up solution:

// Does any subset of the integers in a have sum s?  If so, return the subset;
// otherwise return null.
static int[] subsetSum(int[] a, int s) {
    // hasSum[i, k] is true if some subset of a[0 .. (i - 1)] adds up to k.
    bool[,] hasSum = new bool[a.Length + 1, s + 1];
    hasSum[0, 0] = true;   // we can make 0 from the empty set
    
    for (int i = 1 ; i <= a.Length ; ++i)
        for (int k = 0; k <= s ; ++k)
            hasSum[i, k] = hasSum[i - 1, k]    // we can make k without a[i - 1]
                        || a[i - 1] <= k &&
                           hasSum[i - 1, k - a[i - 1]];  // we can make k by adding a[i - 1]
    
    if (!hasSum[a.Length, s])
        return null;  // sum is not possible
      
    // Now construct the integers in the set.
    List<int> result = new List<int>();
    for (int i = a.Length ; i > 0 ; --i) {
        if (!hasSum[i - 1, s]) {
            result.Add(a[i - 1]); 
            s -= a[i - 1];
        }
    }
    
    return result.ToArray();
}

Once again, you may feel that having to reconstruct the solution at the end is a bother. Is there an easier way? Well, just as in the previous exercise we could store the solutions themselves in the array we are filling in. For example, instead of a two-dimensional array of booleans, we could store a two-dimensional array of List<int>:

List<int>[,] sumSet = new List<int>[a.Length + 1, s + 1];

And then for any i and k, if there is a subset of a[0 .. i - 1] whose sum is k, then sumSet[i, k] could hold a list of integers containing that subset, and could otherwise be null.

However this solution would be relatively inefficient. If we use lists to represent sets, then every time we want to construct a list that contains all the elements in a previous set plus a new integer, we must make a copy of the previous list. If the problem instance was large, all of these list copies could take a significant amount of time and memory.

If, however, we represent sets using linked lists rather than List objects (which are really arrays), then no copying would be necessary, since we can prepend an element to a linked list (while leaving the existing list intact) in O(1). So that would be a reasonable solution, and you may want to try to code that up as an exercise. Of course, even that solution would be less efficient than our solution above, since it is hard to beat an array of booleans if you are trying to save space or time. :) So is it worth using linked lists to avoid the extra loop at the end of the method above? You can make your own judgment about that. :)
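For reference, here is one way the linked-list idea might look. This is a sketch under the assumptions above: a minimal immutable Node class stands in for a linked list, and a separate bool array records which sums are achievable, since a null list (the empty set) must still be distinguishable from "no subset exists":

```csharp
using System;
using System.Collections.Generic;

class SubsetSumLists {
    // An immutable singly linked list cell ("cons cell"). Prepending an
    // element is O(1) and never modifies an existing list.
    class Node {
        public readonly int head;
        public readonly Node tail;
        public Node(int head, Node tail) { this.head = head; this.tail = tail; }
    }

    // Like subsetSum above, but set[i, k] holds an actual subset of
    // a[0 .. (i - 1)] with sum k whenever has[i, k] is true.
    public static List<int> subsetSum(int[] a, int s) {
        bool[,] has = new bool[a.Length + 1, s + 1];
        Node[,] set = new Node[a.Length + 1, s + 1];
        has[0, 0] = true;   // the empty set has sum 0; its list is null

        for (int i = 1; i <= a.Length; ++i)
            for (int k = 0; k <= s; ++k)
                if (has[i - 1, k]) {                    // we can make k without a[i - 1]
                    has[i, k] = true;
                    set[i, k] = set[i - 1, k];          // share the list, no copying
                } else if (a[i - 1] <= k && has[i - 1, k - a[i - 1]]) {
                    has[i, k] = true;                   // we can make k with a[i - 1]
                    set[i, k] = new Node(a[i - 1], set[i - 1, k - a[i - 1]]);
                }

        if (!has[a.Length, s])
            return null;   // sum is not possible

        var result = new List<int>();
        for (Node n = set[a.Length, s]; n != null; n = n.tail)
            result.Add(n.head);
        return result;
    }

    static void Main() {
        var r = subsetSum(new int[] { 3, 34, 4, 12, 5, 2 }, 9);
        Console.WriteLine(string.Join(" ", r));   // some subset with sum 9
    }
}
```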

game playing algorithms

Games such as Tic Tac Toe, checkers and chess are 2-player abstract strategy games, which are games with the following characteristics:

- There are two players, who move alternately.
- The game is deterministic: there are no random elements such as dice or shuffled cards.
- There is no hidden information: both players can see the entire game state.
- The game ends after finitely many moves, with an outcome such as a win for one player or a draw.

30 years ago the best human players could still defeat the top computer programs in games such as chess and Go. But this is no longer true. In the 1990s computer programs were developed (notably IBM's Deep Blue) that could defeat the top human chess players. And just in the last few years computer programs (notably Google's AlphaGo) have become stronger than the top human players at Go.

The newest and most powerful game-playing programs are based on neural networks. Those are beyond the scope of this course; we will instead focus on the classic minimax algorithm.

For any game we wish to play, we will implement a strategy, which is a function that takes the current game state as input and decides which move to play. Every deterministic game with sequential moves and no hidden information has some optimal strategy which will perform at least as well as any other strategy against any opponent. We would like our implementation to play an optimal strategy if possible.

By the way, a game with simultaneous moves may have no optimal strategy. For example, consider Rock, Paper, Scissors, in which players choose and present their moves simultaneously. In this game, for any strategy S there is some strategy that will be better than S against certain opponents. For example, if S is "always play Scissors", then against an opponent who always plays Rock, the strategy P "always play Paper" is better than S. In general there is no optimal strategy for this game.

However, games with sequential moves such as Tic-Tac-Toe and chess have optimal strategies. In Tic-Tac-Toe, if the first player plays optimally, they can never lose. If both players play optimally, the game will be a draw. (Most children realize this after playing Tic-Tac-Toe for a while.) Not every game is a draw with optimal play. For example, it has been proven that if both players play optimally in Connect Four, the first player will win.

For any abstract strategy game such as Tic-Tac-Toe, a game tree represents all possible sequences of moves. Each node of a game tree represents a state of a game in progress. Here is a partial game tree for Tic-Tac-Toe:

In the first move of the game, player 1 (X) has 9 possible moves. So the branching factor of Tic-Tac-Toe at the beginning is 9. In the next move, player 2 (O) has 8 possible moves, so the branching factor decreases to 8 at this point. As the game progresses, the branching factor decreases further. If every game of Tic-Tac-Toe continued until the board was completely full, there would be 9 · 8 · … · 2 · 1 = 9! possible games. However, the number of possible games of Tic-Tac-Toe is less than 9!, because many games will end before the board is full.

The total number of possible games of Tic-Tac-Toe is relatively small, since its average branching factor is fairly small and the game is short. As a result, we can write a program that explores the entire game tree of Tic-Tac-Toe. As we will see, this allows us to play an optimal strategy. More serious games have much larger game trees. For example, it is believed that there are at least 10^120 possible chess games. And so no program can ever entirely explore chess's game tree.
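In fact, Tic-Tac-Toe's game tree is small enough to count exhaustively. The following brute-force sketch (all names introduced here) counts every possible game, where a game stops as soon as one player completes a line:

```csharp
using System;

class TttCount {
    // The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
    static readonly int[][] lines = {
        new[] { 0, 1, 2 }, new[] { 3, 4, 5 }, new[] { 6, 7, 8 },
        new[] { 0, 3, 6 }, new[] { 1, 4, 7 }, new[] { 2, 5, 8 },
        new[] { 0, 4, 8 }, new[] { 2, 4, 6 }
    };

    static bool won(int[] b, int p) {
        foreach (var l in lines)
            if (b[l[0]] == p && b[l[1]] == p && b[l[2]] == p) return true;
        return false;
    }

    // Count the complete games reachable from board b, where 'player' moves
    // next. A game ends as soon as someone wins, or when the board is full.
    public static long count(int[] b, int player) {
        long total = 0;
        bool anyMove = false;
        for (int i = 0; i < 9; ++i) {
            if (b[i] != 0) continue;
            anyMove = true;
            b[i] = player;
            total += won(b, player) ? 1 : count(b, 3 - player);
            b[i] = 0;   // undo the move before trying the next one
        }
        return anyMove ? total : 1;   // full board with no winner = 1 drawn game
    }

    static void Main() {
        Console.WriteLine(count(new int[9], 1));   // 255168
    }
}
```

The count is 255,168, comfortably below the 9! = 362,880 upper bound mentioned above.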

scores and minimax values

We will assign a numerical score to each outcome in a game. In Tic-Tac-Toe, we will represent a win for player X (who plays first) using the score +1, and a win for O using the score -1. If the game is a draw, we will say that the score is 0. Thus X wishes to maximize the game score, and O wants to minimize it. In fact for all games we consider we'll adopt a similar convention: the first player (called X or "Max") wants to maximize the score and the second player (called O or "Min") wants to minimize it.

Some games might have only two possible outcomes, if a draw is not possible. For these games, the only possible scores are +1 and -1. On the other hand, some games could have more than three possible outcomes. For example, we can imagine a game in which players can capture each other's pieces and the score at the end is the difference between the numbers of pieces that each player has remaining. In a game like this, a player wishes not only to win, but also to win by as much as possible.

Consider the following game tree for a hypothetical abstract strategy game:

[tree diagram]

The game has only two moves. First X plays, bringing the game to either state a, b, or c. Now O plays, bringing the game to one of the nine states in the lowermost row. Each of these states is labelled with a numeric score. Once again, X wants to maximize the final score and O wishes to minimize it. How should X and O choose their moves?

In each of the states a, b, and c it is O's turn to play, and O should choose the move that yields the lowest score. So we may assign a value to each of these states, namely the minimum of the values of all successor states. This is the score of the outcome of the game, assuming that O plays perfectly, i.e. O chooses the move that will minimize the final score:

[tree diagram]

In the start state it is X's turn to play. X should choose the move that leads to a state of maximal value. So we may likewise assign a value to the start state, namely the maximum of the values of all successor states. This will be the score of the outcome of the game, assuming that both X and O play perfectly.

[tree diagram]

The game above is trivial, but we may apply the same analysis to a game tree of any depth, assigning each node its minimax value, obtained by minimizing whenever it is O's turn to play and maximizing when it is X's turn, starting at the leaves of the game tree and working upward. This process of labelling nodes in this way is the minimax algorithm.

Note the following two important points:

Now let's return to Tic-Tac-Toe. We have already noted that with optimal play by both players, Tic-Tac-Toe will be a draw. In other words, the minimax value of the initial state of Tic-Tac-Toe is 0. Here is the partial game tree for Tic-Tac-Toe that we saw before, with each node labelled with its minimax value:

Note the following:

As mentioned above, the game tree for chess is far too large for us to draw (it has more nodes than there are atoms in the universe) and is likewise much too large for us to analyze by computer. But if we could draw such a tree and label its nodes with the minimax algorithm, then the value of the start node would be either +1 (indicating that with the right strategy White can always win), 0 (indicating that best play leads to a draw) or -1 (meaning that Black can always win). It is not known what this value would be, since we don't know whether White or even Black can always win at chess. (It certainly seems very unlikely that Black can always win, but it has not been proven that this is not the case.)

implementing minimax in C#

We can implement the minimax algorithm in C# straightforwardly using recursion.

Consider the following simple game, which we will call 21. A counter initially has value 0. Players A and B alternate turns, starting with A. On each player's turn, they may add 1, 2, or 3 to the counter. The first player who reaches the number 21 loses. With optimal play, who will win – A or B?

We can have a class with a name such as Game that represents the game state. A Game object might hold, for example, an 8 x 8 array representing a chessboard, with array values that show which pieces are where. Typically the game state class will have a move() method that makes a single move, enforcing the rules of the game.

Here is an implementation:

// A game of 21.
class Game {
    int counter = 0;
    public int turn = 1;    // whose turn it is to play (1 or 2)
    public int winner = 0;    // the player who has won, or 0 if nobody has won yet
    
    public Game() { }
    
    public Game clone() {
        Game g = new Game();
        g.counter = counter; g.turn = turn; g.winner = winner;
        return g;
    }
    
    public void move(int i) {
        counter += i;
        turn = 3 - turn;    // it is now the other player's turn
        if (counter >= 21)
            winner = turn;    // the player who just moved loses, so the other player wins
    }
}

class TwentyOne {
    // Compute the minimax value at the given game state.
    static int minimax(Game game, out int best) {
        best = 0;
        
        if (game.winner > 0)     // game is done
            return game.winner == 1 ? 1 : -1;

        int val = game.turn == 1 ? int.MinValue : int.MaxValue;
        
        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, out int dummy);
            
            if (game.turn == 1 && v > val ||
                game.turn == 2 && v < val) {  // we have a new best move
              val = v;
              best = i;
            }
        }
        
        return val;
    }
    
    static void Main() {
        Game g = new Game();
        WriteLine(minimax(g, out int best));
    }
}

The method minimax() can compute the minimax value for any node, whether it is A's or B's turn to play.

Notice the important call to game.clone() in the recursive minimax() method above. In this object-oriented version, calling game.move() modifies the game state. If we did not clone the game state at each step, there would be only a single Game object, and as soon as the recursion reached an ending state that single Game would remain finished forever! In other words, the clone operation allows us to modify the state as we explore one part of the tree while leaving the state intact for exploring other branches.

For some games that have a significant amount of state, it can be expensive to clone the game state each time the recursive search makes a move. So some game state classes may have an unmove() method that allows the caller to undo a move that was made. If the Game class for 21 above had an unmove() method, then instead of

        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, out int dummy);

we could write

        for (int i = 1 ; i <= 3 ; ++i) {
            game.move(i);
            int v = minimax(game, out int dummy);
            game.unmove(i);   // undo the move

With this approach, a call to minimax() will always preserve the previous game state, just like in the version with clone(). For games with complex state, this approach can be a significant performance win.

As an exercise, you may wish to implement unmove() in the Game class for 21 above. It is not difficult.
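If you want to check your answer afterwards, here is one possible sketch. It repeats the Game class so it runs on its own, and the wrapper class and the roundTripRestores check are names introduced here just to exercise it. The key observation is that unmove() simply reverses each assignment that move() made; in particular, only the most recent move can have set winner, so resetting it to 0 is safe:

```csharp
using System;

class TwentyOneUndo {
    public class Game {
        public int counter = 0;
        public int turn = 1;     // whose turn it is to play (1 or 2)
        public int winner = 0;   // the player who has won, or 0 if nobody has won yet

        public void move(int i) {
            counter += i;
            turn = 3 - turn;
            if (counter >= 21)
                winner = turn;
        }

        // Undo a move of i, reversing each assignment made by move(i).
        public void unmove(int i) {
            winner = 0;         // only the most recent move can have ended the game
            turn = 3 - turn;    // give the turn back
            counter -= i;
        }
    }

    // Check: making some moves and then unmaking them restores the initial state.
    public static bool roundTripRestores() {
        Game g = new Game();
        g.move(3); g.move(2);       // counter = 5, player 1 to move again
        g.unmove(2); g.unmove(3);   // back to the initial state
        return g.counter == 0 && g.turn == 1 && g.winner == 0;
    }

    static void Main() {
        Console.WriteLine(roundTripRestores());   // True
    }
}
```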