It is four years today since MapTube was launched at the Barbican, and to mark the occasion I’ve made some changes to how the home page displays. This is a bit of an experiment, but I’ve tried to make the home page display topical data by using RSS feeds from the BBC News page, the Guardian and our own CASA blog aggregator. The basic method is to construct a list of keywords and frequencies from the RSS feeds, removing any words on a “stop words” list like “a”, “and”, “or” etc. Then a network graph of MapTube’s maps is constructed, where each vertex is a map and an edge links any two maps that share a keyword. So, for example, all the “London” maps form a fully connected group. This is similar to my previous post on using “Force Directed Graphs for Visualisation of Search Results”: http://talisman.blogweb.casa.ucl.ac.uk/2012/01/23/force-directed-graphs-for-visualisation-of-search-results/
Network Graph of MapTube London Maps
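To make the keyword and graph steps concrete, here is a minimal sketch in Python. It is not the MapTube code: the stop words, the feed text, the map names and the functions keyword_frequencies and build_map_graph are all made up for illustration, and the feed text is hard-coded rather than fetched from a real RSS feed.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop words list; the real one is longer and editable.
STOP_WORDS = {"a", "and", "or", "the", "of", "in", "to", "on"}

def keyword_frequencies(texts):
    """Count every non-stop word across a list of feed titles or summaries."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

def build_map_graph(map_keywords):
    """Link every pair of maps that shares at least one keyword."""
    graph = {name: set() for name in map_keywords}
    for a, b in combinations(map_keywords, 2):
        if map_keywords[a] & map_keywords[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Made-up feed text standing in for parsed RSS items.
feed_texts = [
    "London tube strike hits commuters",
    "Air quality warning in London",
]
freqs = keyword_frequencies(feed_texts)

# Made-up maps and keywords.
maps = {
    "tube_stations": {"london", "tube"},
    "air_quality": {"london", "air", "quality"},
    "us_census": {"us", "census"},
}
graph = build_map_graph(maps)
```

Here the two maps tagged “london” end up joined by an edge, while the census map shares no keyword with either and stays unconnected.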
Once the connections between the maps have been calculated, each vertex is visited in turn and assigned a topicality value based on the RSS word frequency of all the map’s matching keywords. This weight is then propagated through the network via any connected edges up to a distance of 2 links from the parent vertex, with the weight reduced by a factor of 1/(r^2), where “r” is the number of vertices traversed. I did experiment with how many links from the parent vertex to travel, but found that 1 or 2 links from the parent gave the best results. Any further than this and the propagation just ends up giving the most weight to the maps with the highest number of connections.
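The propagation step can be sketched as a breadth-first walk from each map. This is only an illustration under some assumptions of mine: “r” is taken to be the number of links from the parent (so a direct neighbour receives the full weight and a map two links away receives a quarter of it), each map spreads its own base value rather than anything it has received from others, and the function names and the toy graph are hypothetical.

```python
from collections import deque

def base_topicality(map_keywords, freqs):
    """Sum the feed frequency of each keyword attached to a map."""
    return {
        name: sum(freqs.get(word, 0) for word in words)
        for name, words in map_keywords.items()
    }

def propagate(graph, base, max_links=2):
    """Spread each map's base weight to maps within max_links edges.

    Assumption: a map reached after r links receives weight / r**2.
    """
    scores = dict(base)
    for parent, weight in base.items():
        if weight == 0:
            continue
        seen = {parent}
        queue = deque([(parent, 0)])
        while queue:
            node, links = queue.popleft()
            if links == max_links:
                continue  # do not travel further than max_links
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    r = links + 1
                    scores[neighbour] += weight / r ** 2
                    queue.append((neighbour, r))
    return scores

# Toy chain of maps: a - b - c - d, with only "a" topical.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
base = {"a": 8, "b": 0, "c": 0, "d": 0}
scores = propagate(chain, base)
```

With the cut-off at 2 links, map “d” (three links from “a”) receives nothing, which is the behaviour that stops heavily connected maps from soaking up weight from the whole network.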
As I stated at the beginning, this is still very much an experiment and I’ve deliberately built the system with enough degrees of freedom to allow for some tinkering with the algorithm. I can control which feeds we mine for the topical keywords, I can edit the stop words list (I had to put “us” back in as we have a lot of United States maps) and I can add my own keyword weights. At the moment I’ve artificially inflated the real-time tube locations map to get it onto the front page along with our most popular map of the London Underground tube station locations, which is now three years old. The first run of the system on the live server produced high values for a lot of the air quality maps, which was an interesting result.
The biggest criticism I had of MapTube was that the home page always displayed the most popular maps, sorted by the number of hits. This meant that the most popular maps stayed on the front page by virtue of people always clicking on the top ones. We did try showing the most recently added maps for a while, but that didn’t work as lots of test maps get uploaded with no data on them. Hopefully, as this new topical maps system evolves, we should see MapTube as a much more dynamic source of geographical information.
One final point: by knowing which of the data we have on MapTube is topical, we also know what’s topical that we don’t have and should perhaps try to track down and upload. This approach would form a closed-loop geographic information system.