Tube Delays, Mean, Variance and Numerical Precision

This was something I came across while trying to calculate expected waiting times for tubes and buses. I’ve collected several months’ worth of transport data and wanted to calculate the mean and variance of waiting times at every station and platform. This can be achieved in a few hours for the tube, but there are over 600 bus routes in London and many more stops, so I needed something more computationally efficient than the naive algorithm that I was using.

After looking around, I found the following recurrence formulas for mean and variance:

[latex]M_k=M_{k-1}+(x_k-M_{k-1})/k[/latex]

[latex]S_k=S_{k-1}+(x_k-M_{k-1})*(x_k-M_k)[/latex]

See http://www.jstor.org/stable/2286154 for a comparison of different methods; the original method dates back to a 1962 paper by B. P. Welford published in Technometrics: http://www.jstor.org/stable/1266577. It’s also in Donald Knuth’s book, “The Art of Computer Programming, Volume 2: Seminumerical Algorithms”, so I probably should have read my own copy a bit more carefully. The section on “Numerical Precision” in floating point maths is essential reading for any kind of data-mining or mathematical modelling. It matters here not just because of the mantissa size and the “very big number minus very small number equals no change” problem, but also because I want to use running mean and variance to build an adaptive system that can detect problems in the transport network as they happen.
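To make the recurrences concrete, here is a minimal Python sketch of them. The class and method names (`RunningStats`, `push`, `variance`) are my own placeholders for illustration, not anything from the papers, and I am assuming the usual starting point of M = 0 and S = 0 before the first sample, with the sample variance recovered as S_k/(k-1).

```python
class RunningStats:
    """Running mean and variance from the M_k / S_k recurrences.

    Hypothetical helper for illustration; names are assumptions.
    """

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = 0.0   # M_k, the running mean
        self.s = 0.0      # S_k, running sum of squared deviations

    def push(self, x):
        """Fold one new observation x_k into the running totals."""
        self.k += 1
        delta = x - self.mean               # x_k - M_{k-1}
        self.mean += delta / self.k         # M_k = M_{k-1} + (x_k - M_{k-1}) / k
        self.s += delta * (x - self.mean)   # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)

    def variance(self):
        """Sample variance S_k / (k - 1); defined as 0.0 for fewer than two samples."""
        return self.s / (self.k - 1) if self.k > 1 else 0.0


# Values with a large common offset, the kind of data where
# subtracting two huge sums of squares would lose precision.
stats = RunningStats()
for wait in [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]:
    stats.push(wait)

print(stats.mean, stats.variance())
```

Each observation is processed once and only three numbers are stored per station or stop, which is what makes this attractive for the bus data. Because the update works with deviations from the current mean rather than raw squares, it avoids subtracting two very large, nearly equal numbers.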

At the moment, the real-time problem detection system for the tube uses statistics that I have pre-computed, so when a waiting time at a station exceeds what is normal, then it gets flagged on the map as a potential problem. With the bus data calculations being so computationally intensive, it makes more sense to use the running mean and variance formulas in an online system so that it adapts over time to what is considered to be the normal operating point of the system.
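As a rough sketch of what such an online detector could look like, the Python below keeps running statistics for one stop and flags a waiting time that sits well above the running mean. The specifics are assumptions for illustration only: the class name `WaitMonitor`, the "mean plus three standard deviations" rule, and the warm-up count are placeholders, not the thresholds used by the live system.

```python
import math


class WaitMonitor:
    """Flags unusually long waits using running mean and variance.

    Hypothetical sketch: the threshold rule and warm-up length are assumptions.
    """

    def __init__(self, threshold_sigmas=3.0, min_samples=30):
        self.threshold_sigmas = threshold_sigmas  # how many std devs counts as abnormal
        self.min_samples = min_samples            # samples needed before flagging starts
        self.k = 0
        self.mean = 0.0
        self.s = 0.0

    def observe(self, wait):
        """Return True if `wait` looks abnormal, then update the running stats."""
        flagged = False
        if self.k >= self.min_samples:
            std = math.sqrt(self.s / (self.k - 1))
            flagged = wait > self.mean + self.threshold_sigmas * std

        # Running mean/variance update, applied to every observation.
        self.k += 1
        delta = wait - self.mean
        self.mean += delta / self.k
        self.s += delta * (wait - self.mean)
        return flagged


monitor = WaitMonitor(min_samples=5)
for minutes in [4, 5, 6, 5, 4, 6, 5, 5, 30]:
    if monitor.observe(minutes):
        print("possible problem: wait of", minutes, "minutes")
```

Note that in this sketch the abnormal value is still folded into the statistics, so a sustained disruption gradually becomes the new "normal". Whether that is desirable, or whether flagged values should be excluded or down-weighted, is a design choice this sketch leaves open.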

TALISMAN at the Research Methods Festival

Virtual burglar space/time movements

I recently attended the National Centre for Research Methods (NCRM) 5th Research Methods Festival. Researchers from the Talisman project spoke in several sessions, presenting cutting-edge work on methods for collecting data (with a focus on new crowd-sourced data) as well as methods for spatial modelling, simulation and policy analysis. All of the presentations are available on the Research Methods Festival website.

The left image, taken from one of my presentations, shows some of the movement patterns produced by a ‘virtual burglar’ as they move around Leeds. We can use models like this to explore crime patterns at an individual level, and try to predict the effects that crime-reduction initiatives will have. The image was created using a piece of software called GeoTime, which is designed to allow users to explore spatio-temporal patterns of human movement. For more information about the burglary model itself, have a look at my Research page (scroll down to “Agent-Based Modelling of Crime”).

Harnessing Unique Methods of Visualisation

The first time I saw a tube map of London was a long time before I actually ever went there. It was during my one summer living in downtown Toronto with a bunch of crazy girls. One of the saner ones had a poster of the tube on her bedroom wall. It was from the Tate Gallery, where the different tube lines were created with thick lines of paint, which I’ve found again here. At the time, I found this poster really fascinating. If you know anything about the underground in Toronto, it’s pretty uninteresting and wouldn’t make much of a visual object!

Then more recently, I came across a super wacky map of the London Underground made by Franceso Dans, a visitor to UCL from Goldsmiths spending 6 weeks in CASA. You can view his map and his motivation behind this amazing construct on his blog. He’s working on more of these tube maps for different cities as well as an application that will allow you to travel from A to B anywhere in the world without using an airplane. He’s harvesting all the information, e.g. train and bus timetables, from the internet to build the site.

If you’d like to spend some time at CASA working with experts in visualisation, digital media, data harvesting and crowdsourcing, then consider applying for one of our User Fellowships. CASA has developed a number of interesting ways of visualising ‘big data’ that might benefit your organization. Or maybe you have an interest in crowdsourcing and want to see how you can harness the power of the crowd in providing data. Or you’d like to see what analysing Twitter feeds might mean for your business. If you are a non-academic and want to work with CASA on a project where you could pick up some visualisation or data analysis skills, then apply for a User Fellowship. The deadline is 21 Sep 2012 and there’s more information on the TALISMAN website.

Picking Up Raster GIS Skills

Do you need to work with satellite images or datasets that are gridded? By gridded I mean data that are stored in grid-like cells, such as heights of the earth (a digital elevation model), a global land cover map or gridded populations of the world. There are many other gridded datasets available, e.g. climate data, maps of biomass, ecosystem services, etc.

Example of a satellite image on the left and a LIDAR DEM on the right

Or have you collected data using a GPS that you need to interpolate to a continuous surface like that shown below:

An example of interpolating points to surfaces

If the answer to any of these questions is yes, or if you’ve suddenly realised this might be useful for your research, then come and learn more about raster data and how to manipulate these datasets in a Geographic Information System (GIS). On 26-27 July 2012, Dr Steve Carver will teach a 1.5-day course at the University of Leeds on how to work with raster datasets using ArcGIS. This course is open to staff and students at Higher Education institutions in the UK and Ireland.

The course will cover the following topics, with practical exercises throughout to gain hands-on experience with the concepts and the software:

  1. Introduction to raster modelling in ArcGIS
  2. Importing and converting raster data
  3. Point-surface interpolation
  4. Digital elevation models and terrain analysis
  5. Cartographic modelling (which allows you to chain processes together in a workflow and automate your modelling)

You can register for the course on the TALISMAN website or email Amy O’Neill if you have any questions. Alternatively, if you catch this post after the course has taken place, contact Amy and let her know that you are interested in future courses.