I was recently very fortunate to take part in a 3-day CEED workshop which gave a solid introduction to Markov decision processes (MDPs), a widely used framework for modelling decision-making in ecology and conservation.
Led by artificial intelligence expert Frédérick Garcia and MDP maestro Iadine Chadès, our foray into decision theory was a mixture of lectures and practical work (including an introduction to the MDP toolbox that Fred and Iadine developed for Matlab). We then honed our newfound skills by replicating the findings from Mick McCarthy’s research on optimal fire management for the native shrub Banksia ornata.
So what are MDPs, and how can they be used to help solve conservation problems?
Consider a situation where you need to manage a plant population, such as a native Banksia, which is responsive to fire. Fire that is too frequent can harm the population, as there’s not enough time for young plants to mature and reproduce. However, if the time interval between fires is too long, the population may become dominated by old, mature plants that don’t provide space for new plants to germinate and grow. When fire does arrive, the build-up of fuel load can create conditions for catastrophic fire.
So in every year, a decision needs to be made about which action (lighting a prescribed fire, suppressing fire, or doing nothing) will maximise the population’s probability of surviving over the long term. Since the influence of fire on Banksia depends on the time since the last burn, we also need to consider the age and abundance of the population. Fire is more likely to be harmful to a small population with only young or only old plants than to one with more individuals across a range of age classes.
This problem translates into a Markov decision process, as it consists of:
- a number of system states (age and abundance classes),
- a number of possible actions (the management decision),
- the transition probabilities of moving from one state to another (that is, how likely an action will cause the population to move to a different age & abundance class), and
- the reward (or utility) of reaching a certain state at the end of the time period. In the case of McCarthy et al. (2001), it was assumed that this utility was proportional to the annual probability of fire, with a higher utility given to younger populations to reflect a management preference for lower fuel loads.
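The four ingredients above can be written down directly as data. Here is a minimal sketch in Python of a toy fire-management MDP. The three states, the action names and every probability and reward are invented for illustration; none of them come from McCarthy et al. (2001), and the helper `is_valid_mdp` is a hypothetical name.

```python
# Toy fire-management MDP. Every state, action, probability and reward
# below is hypothetical, chosen only to illustrate the four ingredients.

STATES = ["young", "mature", "old"]          # system states
ACTIONS = ["burn", "suppress", "nothing"]    # management actions

# TRANSITIONS[action][state] maps each possible next state to its probability.
TRANSITIONS = {
    "burn": {  # a prescribed fire resets the population to young plants
        "young": {"young": 1.0},
        "mature": {"young": 1.0},
        "old": {"young": 1.0},
    },
    "suppress": {  # no fire: the population can only stay put or age
        "young": {"young": 0.7, "mature": 0.3},
        "mature": {"mature": 0.8, "old": 0.2},
        "old": {"old": 1.0},
    },
    "nothing": {  # ageing, plus some chance of an unplanned fire
        "young": {"young": 0.75, "mature": 0.25},
        "mature": {"young": 0.1, "mature": 0.7, "old": 0.2},
        "old": {"young": 0.3, "old": 0.7},
    },
}

# Reward (utility) attached to each state.
REWARD = {"young": 0.5, "mature": 1.0, "old": 0.2}


def is_valid_mdp(states, actions, transitions, reward, tol=1e-9):
    """Return True if every (action, state) pair has a probability
    distribution over known states and every state has a reward."""
    for a in actions:
        for s in states:
            dist = transitions[a][s]
            if any(s2 not in states or p < 0 for s2, p in dist.items()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return all(s in reward for s in states)


print(is_valid_mdp(STATES, ACTIONS, TRANSITIONS, REWARD))
```

Checking that each row of probabilities sums to one is a cheap sanity test worth running before handing a model like this to any solver.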
To solve the MDP, an optimisation technique called stochastic dynamic programming (SDP) is often used to identify the action in each time step that will give the best overall outcome at the end of the management period (i.e. maximise the total reward). SDP finds the exact solution to an MDP, but a downside is that it can take too long to solve very large problems. This is the so-called ‘curse of dimensionality’: the number of system states grows exponentially with the number of variables used to describe the system. So unless you’re willing to wait for a billion years to solve a problem with a very large state space, heuristic algorithms which give a near-optimal solution can be a better alternative (see Wilson et al. 2006). There’s also some more recent work on a modified algorithm which gives near-optimal solutions to complex problems (Nicol et al. 2010).
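To make the idea concrete, here is a minimal sketch of finite-horizon SDP (backward induction) in Python. It is not the MDP toolbox and not the model from McCarthy et al. (2001): the function name `backward_induction`, the two-action toy problem and all of its numbers are assumptions made for illustration. The sketch also assumes a reward is collected for the state occupied in each time step, including the final one.

```python
def backward_induction(states, actions, transitions, reward, horizon):
    """Finite-horizon stochastic dynamic programming.

    Works backwards from the end of the management period. Returns
    (values, policy), where values[s] is the best expected total reward
    starting from state s with `horizon` decisions left, and policy[t][s]
    is the best action in state s at time step t (t = 0 is the first step).
    """
    values = {s: reward[s] for s in states}  # value at the final time step
    policy = []
    for _ in range(horizon):
        new_values, step_policy = {}, {}
        for s in states:
            best_action, best_value = None, float("-inf")
            for a in actions:
                # Expected value of the next state under action a.
                expected = sum(p * values[s2]
                               for s2, p in transitions[a][s].items())
                total = reward[s] + expected
                if total > best_value:
                    best_action, best_value = a, total
            new_values[s] = best_value
            step_policy[s] = best_action
        values = new_values
        policy.insert(0, step_policy)  # earlier time steps go to the front
    return values, policy


# Hypothetical toy problem: burn resets the population, wait lets it age.
TOY_STATES = ["young", "mature", "old"]
TOY_ACTIONS = ["burn", "wait"]
TOY_TRANSITIONS = {
    "burn": {s: {"young": 1.0} for s in TOY_STATES},
    "wait": {
        "young": {"young": 0.5, "mature": 0.5},
        "mature": {"mature": 0.6, "old": 0.4},
        "old": {"old": 1.0},
    },
}
TOY_REWARD = {"young": 0.5, "mature": 1.0, "old": 0.0}

values, policy = backward_induction(
    TOY_STATES, TOY_ACTIONS, TOY_TRANSITIONS, TOY_REWARD, horizon=10)
print(policy[0])  # best action for each state in the first year
```

The three nested loops hint at where the curse of dimensionality bites: each time step does work proportional to the number of states times the number of actions times the number of possible next states, so a state space that grows exponentially quickly becomes unmanageable.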
All of this said, MDPs and SDP are still quite technical, so to be most useful for real-life conservation management problems it is best if findings can be summarised into simple rules of thumb. For instance, in the case of Banksia ornata, the results of an optimisation suggested that prescribed fire should ideally take place no more than once every 30 years, or more frequently if there was a preference for younger plants (with a lower fuel load). In another recent example (this time using partially observable MDPs, or POMDPs), Chadès et al. (2008) found that it is best to continue managing for a cryptic threatened species (in this case, the Sumatran tiger) even if we are not sure whether the species is present. But if the species is not seen for a number of years, it may be best to direct conservation resources elsewhere instead.
Probably the most useful part of the workshop for me was the time spent discussing some of the assumptions behind MDPs. The first and most important is that MDPs must exhibit the Markov property: the future state of a system depends only on the present state (and the action taken), so the history of the decision process is irrelevant. This also implicitly assumes that a decision maker will decide on the best management strategy based only on the current state of the system, and not what has happened in the past.
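In symbols, the Markov property can be written as follows, where $s_t$ is the state and $a_t$ the action at time step $t$ (this notation is introduced here for illustration and is not taken from the workshop material):

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0)
  = P(s_{t+1} \mid s_t, a_t)
```

This is one way to see why the Banksia problem describes the population by its age and abundance: anything about the past that matters for the future, such as the time since the last burn, has to be carried in the current state.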
In general, decision theory assumes that a decision maker cares only about maximising the expected utility of an outcome, and that the preferences for different outcomes are dictated by a clear set of rules. Simply put, we’re assuming rational economic behaviour. In case you needed reminding, people are not always rational! In particular, there might be a number of factors aside from overall utility that have an influence on someone’s decision.
Finally, there is always uncertainty about the transition probabilities (that is, the effectiveness of a management action), and perhaps even uncertainty about the likely reward or utility of management. Making use of POMDPs is one way to find the optimal solution to a problem where we can only partially observe the system – for example, if we’re not sure about the presence of the Sumatran tiger. By being clear on what the assumptions are behind these methods, we can more easily work towards developing ways to overcome some of their limitations.
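The core bookkeeping in a POMDP is the belief: the probability we assign to each hidden state given what we have observed so far. Below is a minimal Python sketch of that belief update for a species that is either present or absent. It shows only the update step, not a POMDP solver, and it is not the model from Chadès et al. (2008). The function name `update_belief`, the parameters `p_detect` and `p_persist`, and the numbers in the usage loop are all assumptions, as are the simplifications of no false sightings and no recolonisation.

```python
def update_belief(belief_present, detected, p_detect, p_persist):
    """Update P(species present) after one survey and one year of dynamics.

    belief_present: prior probability that the species is present
    detected:       True if the survey recorded the species
    p_detect:       probability of detecting the species when it is present
    p_persist:      probability a present species is still present next year
    Assumes no false positives and no recolonisation once absent.
    """
    if detected:
        posterior = 1.0  # a sighting proves presence (no false positives)
    else:
        missed = belief_present * (1.0 - p_detect)  # present but overlooked
        absent = 1.0 - belief_present               # genuinely absent
        if missed + absent == 0.0:
            raise ValueError("non-detection is impossible under these inputs")
        posterior = missed / (missed + absent)      # Bayes' rule
    return posterior * p_persist


# Hypothetical run: five consecutive years without a sighting.
belief = 0.9
for year in range(1, 6):
    belief = update_belief(belief, detected=False, p_detect=0.3, p_persist=0.95)
    print(year, round(belief, 3))
```

Each year without a sighting lowers the belief that the species is still there, which is the intuition behind eventually redirecting resources elsewhere.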
All that said, MDPs still provide a useful and robust method for decision making in conservation, and there’s a lot of potential for future development. Thanks for a great workshop, Fred and Iadine!
Chadès, I., E. McDonald-Madden, M. A. McCarthy, B. Wintle, M. Linkie, and H. P. Possingham. 2008. When to stop managing or surveying cryptic threatened species. Proceedings of the National Academy of Sciences of the United States of America 105:13936-13940. doi:10.1073/pnas.0805265105
McCarthy, M. A., H. P. Possingham, and A. M. Gill. 2001. Using stochastic dynamic programming to determine optimal fire management for Banksia ornata. Journal of Applied Ecology 38:585-592. doi:10.1046/j.1365-2664.2001.00617.x
Nicol, S. C., I. Chadès, S. Linke, and H. P. Possingham. 2010. Conservation decision-making in large state spaces. Ecological Modelling 221:2531-2536. doi:10.1016/j.ecolmodel.2010.02.009
Wilson, K. A., M. F. McBride, M. Bode, and H. P. Possingham. 2006. Prioritizing global conservation efforts. Nature 440:337-340. doi:10.1038/nature04366