Throwing the spatial analysis ‘baby’ out with the big data ‘bathwater’

Three days at the splendidly organised Twelfth International Conference on GeoComputation (Wuhan, China, 23rd-25th May) have provided a welcome opportunity for intellectual refreshment in the company of old friends and colleagues. Nevertheless, an irritating feature of the meeting has been the apparently endless queue of speakers, from diverse national and intellectual backgrounds, all wanting to wax ever more lyrical on the need for new technologies to squeeze the value from rapidly overflowing reservoirs of social and physical data at ever finer scales of spatial and temporal resolution.
I was reminded somewhat of the old-fashioned idea of ‘throwing the baby out with the bathwater’, a multi-layered expression which conveys the general idea of a failure to distinguish the worthwhile from the worthless. In short, I’d like to hear a little bit less about all the new things we can do with our latest empirical goodies, and a bit more about how this helps us to build on the things to which many of us have already devoted the best of our careers.
It concerns me that the discipline of GeoComputation could too easily become harnessed to superficial readings of the ‘fourth paradigm’ rhetoric, in which analytical pursuits are emasculated by the succubus of inductive reasoning. Quantitative geographers have spent the last sixty years wrestling with the generalisation and analytic representation of spatial problems. Social scientists could legitimately trace similar concerns back to Chicago in the 1920s, if not to scholars of the nineteenth century such as Ravenstein or Charles Booth. Typically such enquiry has grappled gamely with a severe deficiency in Vitamin D(ata).
I’d be amongst the last to deny the possibilities for new styles of analysis with virtualised or real-time data, and the importance of emerging patterns of spatial and personal behaviour associated with new technologies such as social media. But surely we need to do this at the same time as staying true to our roots. The last thing we need to do is to abandon our traditional concerns with the theory of spatial analysis just when we finally have the information we need to start designing and implementing proper tests of what it means to understand the world around us. Wouldn’t it be nice to see a few more models out there (recalling that a model is no more or less than a concrete representation of theory) in which new sources of data are being exploited to test, iterate and refine real ideas which ultimately lead to real insights, and perhaps even real solutions to real problems?
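By way of illustration only (the zones, numbers and parameter below are entirely invented, and this is nobody’s actual code), here is roughly what that test-and-refine loop looks like when a simple gravity-style spatial interaction model is confronted with newly available flow data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical zones: trips produced at each origin, destination attractiveness,
# and inter-zone distances in km (all numbers invented for illustration)
origins = np.array([6400.0, 4500.0, 2900.0])           # O_i: trips produced
attract = np.array([900.0, 400.0, 250.0])              # W_j: e.g. retail floorspace
dist = np.array([[2.0, 6.0, 9.0],
                 [5.0, 3.0, 7.0],
                 [8.0, 6.0, 2.5]])

# "New" observed flows, e.g. assembled from loyalty-card or smart-card records
observed = np.array([[5200.0,  900.0,  300.0],
                     [1400.0, 2600.0,  500.0],
                     [ 300.0,  700.0, 1900.0]])

def predict(beta):
    """Production-constrained gravity model: T_ij proportional to O_i * W_j * exp(-beta * d_ij)."""
    weight = attract * np.exp(-beta * dist)             # W_j * f(d_ij), shape (3, 3)
    return origins[:, None] * weight / weight.sum(axis=1, keepdims=True)

def error(beta):
    """How badly the model misses the observed flows (sum of squared errors)."""
    return float(np.sum((predict(beta) - observed) ** 2))

# The "test and refine" step: choose the distance-decay parameter that best fits the data
fit = minimize_scalar(error, bounds=(0.01, 2.0), method="bounded")
print(f"calibrated distance-decay beta = {fit.x:.3f}")
print(np.round(predict(fit.x)))
```

The point is not this particular model, but the loop itself: a concrete representation of theory is confronted with data, a parameter is refined, and the resulting predictions can then be interrogated for real insight.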

Big Data Masochists must resist the Dominatrix of Social Theory

Here at the AAG in Los Angeles it has been good to witness at first hand the continued vibrancy and diversity of geographical inquiry. In particular, themes such as Agent-based Modelling, Population Dynamics, and CyberGIS have been well represented alongside sessions on Social Theory, Gender and Cultural and Political Ecology. At Friday’s sessions on big data and its limits (“More data, more problems”; “the value of small data studies”) I learned, however, that the function of geography is to push back against those who worship at the atheoretical altar of empirical resurrection, while simultaneously getting off on what one delegate branded the ‘fetishisation of numbers’. The ‘post-positivists’, it seems, are far too sophisticated to be seduced by anything so sordid as numerical evidence, while poor old homo quantifactus still can’t quite see past the entrance of his intellectual cave.

Exactly who is being raped here is a moot question, however. (For those unfamiliar with the reference to the ‘rape of geographia’, see the cartoon here: http://www.quantamike.ca/pdf/94-11_MapDiscipline.pdf.) Speaking as someone who has spent 30 years in the UK working with GIS, simulation and geocomputation, it is hard to escape the conclusion that the dominatrix of social theory has deployed her whips and chains with ruthless efficiency. Can we therefore be clear on a couple of points? First, the role of geography (and geographers) is not to push back against specific approaches, but to provide the means for spatial analysis (and theory) which complements other (non-spatial) perspectives. Second, whether my data is bigger than your data is not the issue, and any argument which couches the debate in terms of ‘the end of theory’ (cf. Anderson, 2008 – http://www.wired.com/science/discoveries/magazine/16-07/pb_theory) is missing the point. New data sources provide immense opportunity for creative approaches to the development of new ‘theories’ of spatial behaviour, and to the testing and refinement of existing theories (which I would probably prefer to call ‘models’, although I’ve not yet evolved into a post-positivist).

The challenges are tough. It is possible that some current attempts to navigate the big data maze lack rigour because the sub-discipline is so new, but it is also true that after 30 years of chronic under-investment in quantitative geography our community struggles to access the necessary skills. Far from pushing back against the mathematicians, engineers and other scientists, it may be necessary to engage with these constituencies over a long period, to our mutual benefit. Observations about what any of us might like to do with numbers in the privacy of our own offices and hotel rooms are a less than helpful contribution to the debate.

How many social scientists does it take to transform a lightbulb?

I was invited to contribute to a round-table meeting to discuss Computational and Transformational Social Science which took place at the University of Oxford on Monday 18th February.  In the background papers for the meeting I learned that the International Panel for the Review of the e-Science Programme, commissioned by the UK Research Councils in 2009, had reported that:

“Social science is on the verge of being transformed … in a way even more fundamental than research in the physical and life sciences”.

It continues in a similarly reasonable vein:

“… the ability to capture vast amounts of data on human interactions in a manner unimaginable from traditional survey data and related processes should, in the near term, transform social science research … (t)he impact of social science on both economic and social policy could be transformed as a result of new abilities to collect and analyse real-time data in far more granular fashion than from survey data.”

In this context, the participants were asked to comment (briefly!) on three questions:

  1. What is the state of the art in ‘transformative’ digital (social) research?
  2. Are there examples of transformative potential in the next few years?
  3. What are the special e-Infrastructural needs of the social science community to achieve that potential?

My first observation about these questions is that they are in the wrong order. Before anything else we need to think about examples where digital research can really start to make a difference. I would assert that such cases are abundant in the spatial domains with which TALISMAN concerns itself: for instance, the challenge of monitoring individual movement patterns in real time, of understanding and simulating the underlying behaviour, and of translating this into benefit in policy arenas relating to health, crime or transport.
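To make the ‘simulating the underlying behaviour’ part of that challenge concrete, here is a toy sketch only; the locations, transition probabilities and population size are all invented, and this stands in for no particular TALISMAN model:

```python
import random

# Hypothetical hourly transition probabilities standing in for a behavioural model
BEHAVIOUR = {
    "home":  {"home": 0.6, "work": 0.3, "shops": 0.05, "park": 0.05},
    "work":  {"home": 0.5, "work": 0.4, "shops": 0.1,  "park": 0.0},
    "shops": {"home": 0.7, "work": 0.1, "shops": 0.1,  "park": 0.1},
    "park":  {"home": 0.8, "work": 0.0, "shops": 0.1,  "park": 0.1},
}

def simulate_day(start="home", steps=12):
    """One agent's sequence of hourly locations under the assumed behaviour."""
    trace, here = [start], start
    for _ in range(steps):
        places, probs = zip(*BEHAVIOUR[here].items())
        here = random.choices(places, weights=probs)[0]
        trace.append(here)
    return trace

# A small synthetic population; in practice its aggregate patterns would be set
# against real-time traces to test and refine the assumed behavioural rules.
population = [simulate_day() for _ in range(1000)]
share_at_work = sum(trace[6] == "work" for trace in population) / len(population)
print(f"simulated share of agents at work six hours into the day: {share_at_work:.1%}")
```

Aggregates of real traces (smart-card records, GPS, social media check-ins) are exactly the kind of evidence against which rules like these could be tested and refined, and that is where the real-time data earn their keep.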

In relation to the state of the art I am somewhat less sanguine. My notes read that ‘the academic sector is falling behind every day’ – think Tesco Clubcard, Oystercard, SmartSteps, even Twitter as data sets to which our community either lacks access or is in imminent danger of losing access. How do we stay competitive with the groups who own and control these data? And in relation to current trends in funding, is it our destiny to become producers of the researchers but no longer of the research itself?

As regards e-Infrastructure, my views are, if anything, even more pessimistic. After N years of digital social research in the UK (where N >= 10), are there really people who still believe that the provision of even more exaflops of computational capacity is key? While data infrastructure could be of some significance, the people issues remain fundamental here – how do we engage a bigger community in these crucial projects (and why have we failed so abjectly to date?), and how do we fire the imagination of the next generation of researchers to achieve more?