Geospatial Data Analysis and Simulation

CSAP Research Streams

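The three main streams of research undertaken by CSAP researchers in the Talisman project are:

  • Microsimulation models (MSMs)
  • Agent-based models (ABMs) and Cellular Automata (CA)
  • Spatial Interaction Models (SIMs)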

Ongoing work in each of these streams is described in more detail below.

Stream 1: Microsimulation models (MSMs)

Microsimulation models (MSMs) were introduced by Guy Orcutt in the late 1950s to better represent economic systems. Traditional mathematical models are often based on aggregated or averaged values, so individual characteristics can become blurred or disappear altogether. An MSM instead creates a synthetic population of individuals from aggregate census data, which allows the impact of policy changes to be examined at the level of the individual. A very good overview of microsimulation is provided in the recent publication by Prof Mark Birkin and Dr Belinda Wu:

Birkin, M. & Wu, B. (2012). A review of microsimulation and hybrid agent-based approaches. In A. J. Heppenstall, A. T. Crooks, L. M. See & M. Batty (Eds.), Agent-based models of geographical systems, pp. 51-68, Springer.

Below is a summary of ongoing or completed research in the area of MSM within TALISMAN:

Linking Twitter and MSM

Mark Birkin and Nick Malleson have begun work to link Twitter data to MSM. They have presented this work in Amsterdam:

  • Birkin, M. and Malleson, N. 2011. Microscopic simulations of complex metropolitan dynamics. Unpublished. ComplexCity. Amsterdam. 5-6 Dec 2011.

and at the NCRM Annual Meeting in Oxford (10-11 Jan 2013).

Kirk Harland has been working alongside researchers producing the Twitter Workbench to incorporate the Geographical Analysis Machine (GAM) spatial cluster hunting algorithm into the workflow for analysing the spatial distribution of tweets.

Flexible Modeling Framework (FMF)

A modeling framework has been developed for creating synthetic populations using microsimulation and simulated annealing. The software and a user manual are available. Contact Alison Heppenstall or Kirk Harland for more information. The framework will be used during the summer school at Leeds in July 2013.


Comparison of Methods for the Creation of Synthetic Populations

In this study, three established methods for creating synthetic populations were compared: deterministic reweighting, conditional probability (Monte Carlo simulation) and simulated annealing. The results showed that the simulated annealing algorithm produced the most consistent and accurate populations, but that each approach has advantages and disadvantages. These are discussed in the following paper produced from this work:

Harland, K., Heppenstall, A., Smith, D. and Birkin, M. 2012. Creating realistic synthetic populations at varying spatial scales: A comparative critique of population synthesis techniques. Journal of Artificial Societies and Social Simulation, 15(1).
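
To give a feel for the simulated annealing approach, the following is a minimal Python sketch. It is not the FMF implementation or the procedure used in the paper. The survey sample, the zone totals, the error measure (total absolute error) and the cooling schedule are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical survey sample: each individual has an age band and a tenure.
SAMPLE = [("young", "rent"), ("young", "own"), ("old", "rent"), ("old", "own")]

# Hypothetical aggregate constraints for one zone (each table sums to 20).
TARGETS = {
    "age": {"young": 12, "old": 8},
    "tenure": {"rent": 5, "own": 15},
}


def total_absolute_error(population):
    """Sum of absolute differences between synthetic and target counts."""
    counts = {"age": {}, "tenure": {}}
    for age, tenure in population:
        counts["age"][age] = counts["age"].get(age, 0) + 1
        counts["tenure"][tenure] = counts["tenure"].get(tenure, 0) + 1
    return sum(
        abs(counts[table].get(category, 0) - target)
        for table, categories in TARGETS.items()
        for category, target in categories.items()
    )


def synthesise(size=20, steps=5000, start_temp=5.0, cooling=0.999, seed=1):
    rng = random.Random(seed)
    population = [rng.choice(SAMPLE) for _ in range(size)]
    error = total_absolute_error(population)
    best, best_error = list(population), error
    temperature = start_temp
    for _ in range(steps):
        if best_error == 0:
            break
        # Propose replacing one synthetic individual with a random sample member.
        index = rng.randrange(size)
        previous = population[index]
        population[index] = rng.choice(SAMPLE)
        new_error = total_absolute_error(population)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        delta = new_error - error
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            error = new_error
            if error < best_error:
                best, best_error = list(population), error
        else:
            population[index] = previous
        temperature *= cooling
    return best, best_error


best, best_error = synthesise()
```

Accepting some worse moves early on is what distinguishes annealing from a purely greedy search: it lets the search escape poor configurations before the temperature drops.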

Updating the GAM Cluster Hunter

The Geographical Analysis Machine (GAM) has been updated to make it a lightweight processing algorithm that is completely separate from GeoTools. It has been successfully integrated into the Twitter Workbench. Some further performance tuning remains to be done: parallel processing will be enabled in the algorithm, and the processing flow will be restructured to make it more efficient.
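
The general idea behind a GAM-style cluster search can be sketched in a few lines of Python. This is an illustrative sketch only, not the updated GAM code: it assumes zone centroids with a population and a case count, lays circles of several radii over a regular grid, and uses a Poisson upper-tail test as the significance measure. The function names, the data and the `alpha` threshold are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)
    cumulative = term
    for k in range(1, observed):
        term *= expected / k
        cumulative += term
    return max(0.0, 1.0 - cumulative)


def gam_scan(zones, step, radii, alpha=0.01):
    """zones: list of (x, y, population, cases) tuples."""
    rate = sum(z[3] for z in zones) / sum(z[2] for z in zones)
    min_x, max_x = min(z[0] for z in zones), max(z[0] for z in zones)
    min_y, max_y = min(z[1] for z in zones), max(z[1] for z in zones)
    columns = int((max_x - min_x) / step) + 1
    rows = int((max_y - min_y) / step) + 1
    hits = []
    for radius in radii:
        for i in range(columns):
            for j in range(rows):
                cx, cy = min_x + i * step, min_y + j * step
                inside = [z for z in zones
                          if math.hypot(z[0] - cx, z[1] - cy) <= radius]
                population = sum(z[2] for z in inside)
                cases = sum(z[3] for z in inside)
                if population == 0:
                    continue
                # Flag circles with far more cases than the overall rate implies.
                p_value = poisson_upper_tail(cases, rate * population)
                if p_value < alpha:
                    hits.append((cx, cy, radius, cases, p_value))
    return hits


# Hypothetical data: a 5 x 5 grid of zones with 100 people and 1 case each,
# except for an excess of cases in the zone at (2, 2).
zones = [(x, y, 100, 15 if (x, y) == (2, 2) else 1)
         for x in range(5) for y in range(5)]
hits = gam_scan(zones, step=1, radii=[0.5, 1.5])
```

In practice the grid step is usually smaller than the circle radius so that neighbouring circles overlap, and the many overlapping tests mean flagged circles are best read as an exploratory map rather than as formal hypothesis tests.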

Kirk Harland has worked alongside Health and Social Care Information Centre (HSCIC) staff over the past three months (Oct to Dec 2013) to produce a national geographical cluster analysis of mortality events for several health conditions.  Outputs from the work are currently being compiled for publication from a methodological perspective while further substantive analysis is being undertaken to better understand the patterns revealed.

Kirk has also worked to integrate spatial modelling approaches (spatial microsimulation and spatial interaction modelling) with outputs produced by the Population 24/7 project at the University of Southampton. A working prototype of an integrated model has been created producing initial results for one time period of a working day for the study area of Leeds.  Further work is planned over the coming months to expand the approach to produce a simulation for a full day and to consolidate the modelling approach.

Research Visit of Mark Birkin to Oak Ridge National Laboratory

Mark Birkin was a guest at the Oak Ridge National Laboratory (ORNL) from 21 Oct to 1 Nov 2013, spending time with the GIS and Computer Science teams. ORNL has a group of more than 30 research scientists developing the LandScan product and related technologies for 24/7 demographic estimation, modelling and simulation (e.g. for emergency planning, disaster relief, short-term migration and population movements), both globally and at a fine spatial scale. This work overlaps strongly with Talisman concerns in microsimulation, demographic modelling, spatial analysis and the exploitation of big data.

Stream 2: Agent-based models (ABMs) and Cellular Automata (CA)

Example from some work on agent-based modelling of burglary in Leeds

Agent-based models (ABMs) have been used at CSAP for more than a decade to model individual agents and their behaviour in situations where aggregate models are not able to capture the complexity of the problem. Recent work includes the application of a Genetic Algorithm to the optimization of the model parameters of an ABM of burglary in Leeds. This work was recently presented at GISRUK 2013:

Malleson, N., Heppenstall, A., See, L. and Evans, A. 2013. Optimising an agent-based model to explore the behaviour of simulated burglars. GISRUK, University of Liverpool, 3-5 April 2013

and will appear in the following book:

Malleson, N., See, L., Heppenstall, A.J. and Evans, A. (in press). Optimising an agent-based model to explore the behaviour of simulated burglars. In: Mago, V. and Dabbaghian, V. (eds.) Modelling and Simulation of Complex Social Systems.
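
The way a Genetic Algorithm searches for model parameters can be illustrated with a short Python sketch. It is not the code used for the burglary model: a toy two-parameter function stands in for an ABM run, and the observed values, population size, selection scheme and mutation size are all hypothetical.

```python
import random

# Hypothetical "observed" data, generated by the toy model with a=1 and b=2.
INPUTS = [1.0, 2.0, 4.0]
OBSERVED = [3.0, 5.0, 9.0]


def model_error(params):
    """Stand-in for running an ABM and scoring it against observed data."""
    a, b = params
    return sum((a + b * x - y) ** 2 for x, y in zip(INPUTS, OBSERVED))


def genetic_algorithm(pop_size=30, generations=60, mutation_sd=0.2, seed=42):
    rng = random.Random(seed)
    population = [[rng.uniform(0, 5), rng.uniform(0, 5)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=model_error)
        history.append(model_error(population[0]))
        # Elitism: carry the two best parameter sets over unchanged.
        next_generation = [list(p) for p in population[:2]]
        while len(next_generation) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            mother = min(rng.sample(population, 2), key=model_error)
            father = min(rng.sample(population, 2), key=model_error)
            # Blend crossover followed by Gaussian mutation.
            weight = rng.random()
            child = [weight * m + (1 - weight) * f + rng.gauss(0, mutation_sd)
                     for m, f in zip(mother, father)]
            next_generation.append(child)
        population = next_generation
    best = min(population, key=model_error)
    history.append(model_error(best))
    return best, history


best_params, error_history = genetic_algorithm()
```

With a real ABM each call to the error function is a full (often stochastic) simulation run, so the number of evaluations per generation dominates the cost of the calibration.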

Cellular Automata (CA) have been used extensively in the past to model urban dynamics, as their grid-based structure and simple transition rules are well suited to characterizing this problem. A recent example of research in this area compared different methods for validating a CA model of urban growth in Riyadh, Saudi Arabia. The resulting book chapter can be downloaded from the link below:

Al-Ahmadi, K., See, L. and Heppenstall, A.J. (2013) Validating spatial patterns of urban growth from a Cellular Automata Model. In: Emerging Applications of Cellular Automata, A. Salcido (ed.)
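
The grid-and-rule structure of a CA, and the simplest kind of cell-by-cell validation, can be sketched in Python as follows. This is a deliberately minimal illustration, not the Riyadh model: the transition rule (a cell urbanises once enough of its eight neighbours are urban), the threshold and the grids are all hypothetical.

```python
def urban_neighbours(grid, row, col):
    """Count urban cells (value 1) in the Moore neighbourhood of a cell."""
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                count += grid[r][c]
    return count


def grow(grid, threshold=2):
    """One time step: urban cells stay urban; others urbanise if they
    have at least `threshold` urban neighbours."""
    return [
        [1 if cell == 1 or urban_neighbours(grid, r, c) >= threshold else 0
         for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]


def cell_agreement(simulated, observed):
    """Fraction of cells whose simulated state matches the observed state."""
    pairs = [(s, o)
             for s_row, o_row in zip(simulated, observed)
             for s, o in zip(s_row, o_row)]
    return sum(s == o for s, o in pairs) / len(pairs)


# Hypothetical 3 x 3 example: 1 = urban, 0 = non-urban.
start = [[0, 0, 0],
         [1, 1, 0],
         [0, 0, 0]]
observed = [[1, 1, 0],
            [1, 1, 0],
            [1, 1, 1]]
simulated = grow(start)
agreement = cell_agreement(simulated, observed)
```

Cell-by-cell agreement is only one possible measure; it says nothing about whether the overall shape of the simulated urban area is realistic, which is why pattern-based validation methods are also of interest.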


Stream 3: Spatial Interaction Models (SIMs)

Visualising migration flow patterns globally

Spatial interaction models (SIMs) have a long history of research in the School of Geography, University of Leeds, where they were developed by Prof Alan Wilson (now in CASA) in the late 1960s and early 1970s (Wilson, 1967; 1971). SIMs remain one of the core geospatial models in CSAP because they are as relevant today as they were then. SIMs are used to model flows between origins and destinations, e.g. the movements of people between locations.
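
As an illustration of how a SIM allocates flows, the sketch below implements a production-constrained model in Python. It assumes the commonly used form T_ij = A_i O_i W_j exp(-beta c_ij), where the balancing factor A_i ensures that the flows leaving each origin sum to its known total O_i. The zone data and the value of beta are hypothetical.

```python
import math


def production_constrained_flows(origins, attractiveness, costs, beta):
    """Return T[i][j] = A_i * O_i * W_j * exp(-beta * c_ij).

    A_i is the balancing factor that makes the flows leaving origin i
    sum to O_i.
    """
    flows = []
    for o_i, cost_row in zip(origins, costs):
        weights = [w_j * math.exp(-beta * c_ij)
                   for w_j, c_ij in zip(attractiveness, cost_row)]
        a_i = 1.0 / sum(weights)
        flows.append([a_i * o_i * weight for weight in weights])
    return flows


# Hypothetical inputs: two origin zones and three destination zones.
origins = [100.0, 50.0]            # O_i: total flow leaving each origin
attractiveness = [10.0, 20.0, 5.0]  # W_j: pull of each destination
costs = [[1.0, 2.0, 3.0],           # c_ij: travel cost from i to j
         [2.0, 1.0, 2.0]]
flows = production_constrained_flows(origins, attractiveness, costs, beta=0.5)
```

Calibration then amounts to choosing beta so that the modelled flows resemble observed ones, for example by matching the mean trip cost.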

Research on SIMs within the TALISMAN project is currently being undertaken by Michael Thomas, who holds a PhD studentship on using SIMs for migration. Michael began his studentship on 1 October 2011 under the supervision of Prof John Stillwell and Dr Myles Gould. He is using data from Acxiom’s Research Opinion Poll for 2005, 2006 and 2007 to investigate the lifestyle characteristics of migrants vis-à-vis non-migrants. Initial work has involved reviewing the determinants of migration and establishing what micro data sets are available from UK censuses and surveys. He has begun to extract data from the files supplied by Acxiom, to clean and geocode the data, and to compare the flows at the district level with equivalent flows from the 2001 Census and from the NHS Central Register.

His research findings to date are presented in the following working paper:

Thomas M, Gould M, Stillwell J. (2012) Exploring the potential of microdata from a large commercial survey for the analysis of demographic and lifestyle characteristics of internal migration in Great Britain. Working Paper 12/03. School of Geography, University of Leeds.

Since starting in October 2013, Robin has mostly been working on building SIMs in R and developing methods of calibrating them using Twitter data. This culminated in a paper submitted to the journal Geo-spatial Information Science in December 2013. In 2014 Robin plans to work on a methods paper about testing and optimising the iterative proportional fitting algorithm and on an NCRM working paper on implementing SIMs in R (based partly on Adam Dennett’s work), and will continue to explore the use of social media data to calibrate models of movement.
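
For readers unfamiliar with iterative proportional fitting (IPF), the following Python sketch shows the basic two-dimensional version: a seed table is alternately rescaled so that its row and column sums match known totals. It is an illustration only, not the code under development, and the seed table and marginal totals are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iterations=100, tolerance=1e-9):
    """Rescale `seed` until its row and column sums match the targets."""
    table = [list(row) for row in seed]
    for _ in range(max_iterations):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [value * target / row_sum for value in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns now match exactly, so convergence depends on the rows.
        worst = max(abs(sum(row) - target)
                    for row, target in zip(table, row_targets))
        if worst < tolerance:
            break
    return table


# Hypothetical seed (e.g. survey cross-tabulation) and known marginal totals.
seed = [[1.0, 2.0],
        [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The row and column targets must sum to the same total, and zero cells in the seed stay zero, which is one of the practical issues that testing the algorithm needs to address.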


Wilson, A.G. 1967. A statistical theory of spatial distribution models. Transportation Research, 1, 253-269.

Wilson, A.G. 1971. A family of spatial interaction models and associated developments. Environment and Planning A, 3, 1-32.

All three research streams will come together in a summer school to be held at the University of Leeds in July 2013. More details can be found here.