National Rail Train Locations

The purpose of my previous post on the TransXChange timetable data was to make it possible to track National Rail trains in real time. Due to the large number of stations making up the network and the fact that you can’t obtain information for a whole line in one go, the only viable option is to use timetable data. The other limiting factor is the lack of any kind of unique train identifier on the National Rail website (see: http://ojp.nationalrail.co.uk/service/ldbboard/dep/EUS ).

The preliminary results are shown below:

Trains going into or out of Waterloo for 16:42 on a weekday

The technique is quite simple and involves making requests to the National Rail website to probe the current positions of trains. We first ask where the trains should be at the current point in time based on the timetable. Then we probe the running information for the stations just ahead of where the trains should be. Testing this on the whole of the Greater London network, only 319 unique station requests were required to determine train positions out of a total of 2,575 stations. This number can be reduced even further as we only need to hit a single station ahead of the train in order to find out whether it's on time. The position can then always be worked out from the timetable by asking where the train should be on its route at the current time minus the number of minutes it is running late.
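As a rough sketch of how the probing loop fits together, the following is the general idea. The helper functions here (ExpectedPosition, GetDepartureBoard and the rest) are placeholders standing in for my own timetable lookup and departure board scraping code, not part of any public National Rail API:

[code language="csharp"]
//Sketch of the probing step. ExpectedPosition() and GetDepartureBoard() are
//hypothetical helpers for the timetable lookup and the departure board request.
foreach (TimetabledTrain train in ActiveTrains(DateTime.Now))
{
    //where should this train be right now according to the timetable?
    string nextStationCode = ExpectedPosition(train, DateTime.Now).NextStopCode;

    //probe the departure board for the station just ahead of the train
    DepartureBoard board = GetDepartureBoard(nextStationCode);
    BoardEntry entry = board.Find(train.Destination, train.TimetabledDeparture(nextStationCode));
    if (entry != null)
    {
        //a single hit is enough: the timetabled position at (now - late minutes)
        //gives the train's current position along its route
        train.MinutesLate = entry.ExpectedMinutesLate;
        train.Position = ExpectedPosition(train, DateTime.Now.AddMinutes(-train.MinutesLate));
    }
}
[/code]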

Once all the data for the departure boards has been collected, the next stage is to match up trains to departure details for stations based on the passing points extracted from the TransXChange timetable. This links a train to the running service which tells us all the stopping points and times on its route, along with a unique route code. This unique route code is used to identify the same train on different departure boards so we can use the best position information available, in other words, the departure board that it is approaching next.
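Because the same train can appear on several departure boards at once, the route code is used to collapse the sightings down to one per train, keeping the reading from the board the train will reach next. A simplified sketch of that selection, again with illustrative types rather than my actual classes:

[code language="csharp"]
//Group departure board sightings by the unique route code and keep the best one,
//i.e. the board the train is approaching next (Sighting is an illustrative type).
Dictionary<string, Sighting> BestByRoute = new Dictionary<string, Sighting>();
foreach (Sighting S in AllSightings)
{
    Sighting Existing;
    if (!BestByRoute.TryGetValue(S.RouteCode, out Existing)
        || (S.TimetabledArrival < Existing.TimetabledArrival))
    {
        BestByRoute[S.RouteCode] = S; //the earliest upcoming board wins
    }
}
[/code]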

An interesting question is what happens if there is enough disruption to the services to make the timetable useless. In this situation, the concept of whether a train is late is meaningless, but we still have a system which can probe the departure boards and match trains using the runtimes between stations. Certain network geometries make it impossible to match trains accurately without timetable data if the destination is shared between two routes, for example a "Y" section where two trains with the same destination code merge onto one line. Another complicating factor is the circular route, where trains all start at Waterloo and end up at Waterloo again.

Using TransXChange Train Timetables

For a project on transport data I needed access to the National Rail timetables to calculate passing points for trains at every station. The National Public Transport Data Repository (NPTDR) TransXChange data is available for download on the data.gov.uk site, but it’s easier to download from the Department for Transport site: http://data.dft.gov.uk/NPTDR/index.html

TransXChange is an XML format which supersedes the old "CIF" format files, which were in a coded ASCII format. Once I had downloaded the October 2010 data, which is the latest available as the snapshot is only taken once a year in October, I had to write some software to calculate all the passing points.

I'm using C#, so the first thing I did was to autogenerate a class matching the TransXChange schema. All the necessary schema files can be found at the following link: http://www.transxchange.org.uk/schema/schemas.htm. My TransXChange data uses the 2.1 schema, so I downloaded that version.

Next, I used the following command to create a class for the schema using Visual Studio’s “xsd” tool:

[code]

xsd.exe -c -l:c# -n:TransXChangeXML
TransXChange_general.xsd TransXChange_common.xsd TransXChange_types.xsd TransXChange_registration.xsd
apd\AddressTypes-v1-3.xsd apd\BS7666-v1-3.xsd apd\CitizenIdentificationTypes-v1-3.xsd
apd\CommonSimpleTypes-v1-3.xsd apd\ContactTypes-v1-3.xsd apd\PersonalDetailsTypes-v1-3.xsd apd\PersonDescriptiveTypes-v1-0.xsd

[/code]

This generated a C# class called TransXChangeXML, although it produced lots of warnings about multiple definitions, which I ignored.

The next part involved deserialising the “ATCO_490_TRAIN.TXC” file into my class. I’m using area 490 which is Greater London, so I had to unpack the relevant zip files from the TransXChange download.

After reading the TransXChange manual for some time and some experimentation, I worked out that the method for calculating passing points is as follows:

FOR EVERY “<Service>”

Get the JourneyPattern id from inside “<StandardService>”, which is the journey reference code (e.g. “JP8755”) and also the sequence code from the “<JourneyPatternSectionRefs>” (e.g. “SEQ12SEC11”)

Look up the "<JourneyPatternRef>" code in the "<VehicleJourneys>/<VehicleJourney>" section. This gives us an absolute departure time and a list of "<JourneyPatternTimingLinkRef>" references (e.g. "SEC162POS95") containing runtimes and wait times for the "from" and "to" links. These sequence positions are merged with the ones from "<JourneyPatternSections>", looked up using the "SEQ_SEC_" reference from earlier. The vehicle timing links take precedence over the journey pattern timing links.

FOR EVERY TimingLink from previous stage

Link Arrival Time = Previous Link Departure Time + RunTime

Link Departure Time = Arrival Time + Wait Time from end of previous link (“TO”)

The code for this is copied below:

[code language=”csharp”]

//read TransXChange file
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
settings.IgnoreWhitespace = true;
settings.IgnoreComments = true;

using (XmlReader reader = XmlReader.Create(Filename, settings))
{
    XmlSerializer SerializerObj = new XmlSerializer(typeof(TransXChangeXML.TransXChange));
    TransXChangeXML.TransXChange TXC = (TransXChangeXML.TransXChange)SerializerObj.Deserialize(reader);

    //Build a JourneyPattern index by the section code
    Dictionary<string, TransXChangeXML.JourneyPatternSectionStructure> JPS = new Dictionary<string, TransXChangeXML.JourneyPatternSectionStructure>();
    foreach (TransXChangeXML.JourneyPatternSectionStructure JourneyPattern in TXC.JourneyPatternSections)
    {
        JPS.Add(JourneyPattern.id, JourneyPattern);
    }

    //build an index of vehicle journeys by the JP ref code so that we can look them up easily
    Dictionary<string, TransXChangeXML.VehicleJourneyStructure> VehicleJourneyByJPRef = new Dictionary<string, TransXChangeXML.VehicleJourneyStructure>();
    //VehicleJourneys have departure times
    foreach (TransXChangeXML.VehicleJourneyStructure VehicleJourney in TXC.VehicleJourneys.VehicleJourney)
    {
        string VJourneyCode = VehicleJourney.VehicleJourneyCode;
        DateTime VDepartureTime = VehicleJourney.DepartureTime;
        string VOperatorRef = VehicleJourney.OperatorRef.Value; //e.g. SW=SW TRAINS

        string JPRef = "";
        if (VehicleJourney.Item is TransXChangeXML.VehicleJourneyRefStructure) JPRef = (VehicleJourney.Item as string);
        else if (VehicleJourney.Item != null) JPRef = (VehicleJourney.Item as TransXChangeXML.JourneyPatternRefStructure).Value; //this must be an xsd error? All this just to get the JP code?

        //TODO: need an array of the duplicates indexed by JPRef
        if (!string.IsNullOrEmpty(JPRef))
        {
            if (VehicleJourneyByJPRef.ContainsKey(JPRef))
            {
                //System.Diagnostics.Debug.WriteLine("Duplicate key: " + JPRef);
            }
            else
                VehicleJourneyByJPRef.Add(JPRef, VehicleJourney);
        }
    }

    //Now go through all the services, linking them up to the VehicleJourneys via their JP reference. The VehicleJourney contains the
    //parent sequence of stops which is then overridden by any JourneyPatternSectionRefs contained in the JourneyPattern.

    //Services only have relative timings on them, but link to VehicleJourneys through the JP ref, which gives us absolute times
    foreach (TransXChangeXML.ServiceStructure Service in TXC.Services)
    {
        TransXChangeXML.StandardServiceStructure StandardService = Service.StandardService;
        string SDDestination = StandardService.Destination.Value;
        string SDOrigin = StandardService.Origin.Value;
        //The parent service destination and origin only seem to be to do with the grouping and not something you would actually display
        //System.Diagnostics.Debug.WriteLine("Service: Destination = " + SDDestination + " Origin = " + SDOrigin);

        foreach (TransXChangeXML.JourneyPatternStructure JP in StandardService.JourneyPattern)
        {
            string JPRef = JP.id; //this is the JP number
            if (JPRef == "JP8755") //This is Shepperton
            {
                string DestinationDisplay = JP.DestinationDisplay.Value; //this overrides the StandardService Destination
                //lookup the journey using the JP code, which gives us a departure time and the section links which can be overridden
                TransXChangeXML.VehicleJourneyStructure VehicleJourney = VehicleJourneyByJPRef[JPRef];
                DateTime DepartureTime = VehicleJourney.DepartureTime;

                //make a list of the timing link overrides from the VehicleJourneys which also gives us waiting times and additional runtimes
                Dictionary<string, TransXChangeXML.VehicleJourneyTimingLinkStructure> VTimingLinks = new Dictionary<string, TransXChangeXML.VehicleJourneyTimingLinkStructure>();
                if (VehicleJourney.VehicleJourneyTimingLink != null)
                {
                    foreach (TransXChangeXML.VehicleJourneyTimingLinkStructure TimingLink in VehicleJourney.VehicleJourneyTimingLink)
                    {
                        VTimingLinks.Add(TimingLink.JourneyPatternTimingLinkRef.Value, TimingLink);
                    }
                }

                //get journey pattern timing links from SEQ SEC ref number
                System.Diagnostics.Debug.WriteLine("Departure Time: " + DepartureTime + " Destination display: " + DestinationDisplay + " JP id=" + JP.id);
                //now traverse the Journey accumulating a TimeDelta at each stop relative to the VehicleJourney Departure Time
                TimeSpan TimeDelta = new TimeSpan();
                TimeSpan FromWaitTime = new TimeSpan();
                TimeSpan ToWaitTime = new TimeSpan();
                foreach (TransXChangeXML.JourneyPatternSectionRefStructure SeqSecRef in JP.JourneyPatternSectionRefs)
                {
                    System.Diagnostics.Debug.WriteLine("REF: " + SeqSecRef.Value);
                    TransXChangeXML.JourneyPatternSectionStructure JourneyPattern = JPS[SeqSecRef.Value]; //lookup JourneyPattern using SEQ SEC ref number
                    foreach (TransXChangeXML.JourneyPatternTimingLinkStructure TimingLink in JourneyPattern.JourneyPatternTimingLink)
                    {
                        //todo: need activity in here...
                        TimeSpan RunTime = ParseTime(TimingLink.RunTime);

                        FromWaitTime = ToWaitTime; //weird - wait time on end of last segment rolled around to start of this segment
                        ToWaitTime = new TimeSpan();
                        if (!string.IsNullOrEmpty(TimingLink.To.WaitTime)) ToWaitTime = ParseTime(TimingLink.To.WaitTime);

                        if (VTimingLinks.ContainsKey(TimingLink.id))
                        {
                            TransXChangeXML.VehicleJourneyTimingLinkStructure VTimingLink = VTimingLinks[TimingLink.id]; //VehicleTimingLink lookup
                            if (!string.IsNullOrEmpty(VTimingLink.RunTime)) RunTime = ParseTime(VTimingLink.RunTime);
                            if ((VTimingLink.To != null) && (!string.IsNullOrEmpty(VTimingLink.To.WaitTime)))
                            {
                                ToWaitTime = ParseTime(VTimingLink.To.WaitTime);
                            }
                        }

                        System.Diagnostics.Debug.WriteLine("Arrive: " + (DepartureTime + TimeDelta) + " Depart: " + (DepartureTime + TimeDelta + FromWaitTime)
                            + " " + TimingLink.id + " " + TimingLink.From.StopPointRef.Value + " " + TimingLink.To.StopPointRef.Value
                            + " Runtime=" + RunTime + " FromWaitTime=" + FromWaitTime + " ToWaitTime=" + ToWaitTime);
                        TimeDelta = TimeDelta + FromWaitTime + RunTime;
                    }
                }
                System.Diagnostics.Debug.WriteLine("Final arrival: " + (DepartureTime + TimeDelta));

            }
        }
    }
}

[/code]
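The listing calls a ParseTime helper that isn't shown above; it just converts the TransXChange run and wait time strings, which are XSD duration values like "PT3M", into TimeSpans. A minimal version, assuming the strings are always valid XSD durations, could be:

[code language="csharp"]
//Convert an XSD duration string (e.g. "PT3M" or "PT1M30S") into a TimeSpan.
//An empty string is treated as zero so that missing wait times behave sensibly.
private static TimeSpan ParseTime(string Duration)
{
    if (string.IsNullOrEmpty(Duration)) return new TimeSpan();
    return System.Xml.XmlConvert.ToTimeSpan(Duration);
}
[/code]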

I’ve limited the code to only produce the passing points for a single service, “JP8755”, which is a Waterloo to Shepperton service. This produces the following output:

[code]
Departure Time: 01/01/0001 05:12:00 Destination display: Shepperton Rail Station JP id=JP8755
REF: SEQ12SEC11
Arrive: 01/01/0001 05:12:00 Depart: 01/01/0001 05:12:00 SEQ12POS88 9100WATRLMN 9100VAUXHLM Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:01:00
Arrive: 01/01/0001 05:15:00 Depart: 01/01/0001 05:16:00 SEQ12POS89 9100VAUXHLM 9100CLPHMJM Runtime=00:04:00 FromWaitTime=00:01:00 ToWaitTime=00:01:00
Arrive: 01/01/0001 05:20:00 Depart: 01/01/0001 05:21:00 SEQ12POS90 9100CLPHMJM 9100ERLFLD Runtime=00:03:00 FromWaitTime=00:01:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:24:00 Depart: 01/01/0001 05:24:00 SEQ12POS91 9100ERLFLD 9100WDON Runtime=00:04:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:28:00 Depart: 01/01/0001 05:28:00 SEQ12POS92 9100WDON 9100RAYNSPK Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:31:00 Depart: 01/01/0001 05:31:00 SEQ12POS93 9100RAYNSPK 9100NEWMLDN Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:34:00 Depart: 01/01/0001 05:34:00 SEQ12POS94 9100NEWMLDN 9100NRBITON Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:37:00 Depart: 01/01/0001 05:37:00 SEQ12POS95 9100NRBITON 9100KGSTON Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:40:00 Depart: 01/01/0001 05:40:00 SEQ12POS96 9100KGSTON 9100HAMWICK Runtime=00:02:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:42:00 Depart: 01/01/0001 05:42:00 SEQ12POS97 9100HAMWICK 9100TEDNGTN Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:45:00 Depart: 01/01/0001 05:45:00 SEQ12POS98 9100TEDNGTN 9100FULWELL Runtime=00:04:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:49:00 Depart: 01/01/0001 05:49:00 SEQ12POS99 9100FULWELL 9100HAMPTON Runtime=00:04:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:53:00 Depart: 01/01/0001 05:53:00 SEQ12POS100 9100HAMPTON 9100KMPTNPK Runtime=00:03:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:56:00 Depart: 01/01/0001 05:56:00 SEQ12POS101 9100KMPTNPK 9100SUNBURY Runtime=00:02:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 05:58:00 Depart: 01/01/0001 05:58:00 SEQ12POS102 9100SUNBURY 9100UHALIFD Runtime=00:02:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Arrive: 01/01/0001 06:00:00 Depart: 01/01/0001 06:00:00 SEQ12POS103 9100UHALIFD 9100SHEPRTN Runtime=00:05:00 FromWaitTime=00:00:00 ToWaitTime=00:00:00
Final arrival: 01/01/0001 06:05:00
[/code]

Comparing this with the South West Trains timetable for the service, I can check that all the arrival and departure times are correct. It's worth pointing out that, while this data comes from the October 2010 timetable, the timetable currently in operation hasn't changed since then. This code should also be treated with caution as it hasn't been rigorously tested. As can be seen from some of the comments, there are parts of the TransXChange schema that I've ignored for the sake of simplicity, for example the activity at the stop and whether it's a "Dead Run".

Now that I have a list of passing points for every station in Greater London, I can use the information to build a real-time train tracking system.

Force Directed Graphs for Visualisation of Search Results

I’ve been looking at the D3.js library for data visualisation in Javascript. This is the new evolution of Protovis from the Stanford Visualization group, and uses SVG to create dynamic visualisations from raw data.

What interests me is how large archives of geospatial datasets can be searched for relevant information. On MapTube at the moment we have a large amount of Census data, but it's not particularly accessible as you can only search by keyword. Thinking ahead to the upcoming 2011 Census release, we really need a way of showing how datasets relate to one another. I've shown some examples before of how a network graph can be used to show how London data on population links to employment (workplace population), education, index of multiple deprivation and crime. Recently, I've been looking at how to use this technique to enhance the search results on MapTube. Using the MapTube database, which currently contains 715 maps, I create a graph with one vertex per map, linking vertices with undirected edges wherever two maps share a common keyword. The idea is that when the user inputs a search term, for example "population", a sub-graph is created of all the maps containing the "population" keyword. The image below shows this example displayed as a force-directed graph using D3:

Firstly, I should explain that all the edges labelled "population" have been removed: otherwise this would be a fully connected graph, since every map in the sub-graph contains the search keyword. The size of the red dots is also proportional to the number of hits that map has received, so it's possible to see the more popular maps very easily. The advantage this method of searching has over the standard list view is that you can see there are three distinct groupings of results. The two U.S. population maps form their own group on the left, the Italian population map is on its own at the top and everything else forms its own UK-based group. One other interesting property is that we can see that the "Female" population density map has received a lot more hits than its equivalent "Male" population density map.

The static image from the web page doesn’t do this example justice, as it’s possible to move the nodes around to investigate the structure further. The following link shows how the nodes can be moved around dynamically on the web page: http://youtu.be/TmRQx-Ogm0E

The example was created using a bespoke C# program which builds the graph structure from the MapTube database. This is exported as a JSON file containing the nodes, with the map title and number of hits, along with the links as pairs of node numbers.
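The exact export code isn't shown here, but the JSON has to match what the D3 code below binds to (d.name, d.hits, d.value, d.source and d.target). A cut-down sketch of the export, with illustrative record types standing in for the real MapTube database query, might look like this:

[code language="csharp"]
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web.Script.Serialization; //System.Web.Extensions assembly

//Illustrative records; the real values come from the MapTube database.
public class MapRecord { public string Title; public int HitCount; }
public class KeywordLink { public int FromIndex; public int ToIndex; }

public static class GraphExporter
{
    //Write a { "nodes": [...], "links": [...] } file in the shape the D3 force
    //layout examples expect: nodes carry the map title and hit count, links are
    //pairs of node indices with a weight used for the stroke width.
    public static void WriteGraphJson(IList<MapRecord> Maps, IList<KeywordLink> Links, string Filename)
    {
        var Graph = new
        {
            nodes = Maps.Select(m => new { name = m.Title, hits = m.HitCount }).ToList(),
            links = Links.Select(l => new { source = l.FromIndex, target = l.ToIndex, value = 1 }).ToList()
        };
        File.WriteAllText(Filename, new JavaScriptSerializer().Serialize(Graph));
    }
}
[/code]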

The D3 code is relatively simple as it's just a modification of one of the "force" examples, but with an SVG circle and label node added as children of an "svg:g" parent node so that the maps contain the map title:

[javascript]
var w = 960,
h = 500,
fill = d3.scale.category20();

var vis = d3.select("#chart").append("svg")
.attr("width", w)
.attr("height", h);

d3.json(/*"miserables.json"*/"MapTube.json", function(json) {

var force = d3.layout.force()
.gravity(0.02)
.charge(-120)
.linkDistance(/*30*/200)
.nodes(json.nodes)
.links(json.links)
.size([w, h])
.start();

var link = vis.selectAll("line.link")
.data(json.links)
.enter().append("line")
.attr("class", "link")
.style("stroke-width", function(d) { return Math.sqrt(d.value); })
.attr("x1", function(d) { return d.source.x; })
.attr("y1", function(d) { return d.source.y; })
.attr("x2", function(d) { return d.target.x; })
.attr("y2", function(d) { return d.target.y; });

var node = vis.selectAll("g.node")
.data(json.nodes)
.enter().append("svg:g")
.attr("class", "node")
.call(force.drag);

node.append("svg:circle")
.attr("class","circle")
.attr("r",function(d) { var r=d.hits/4; if (r25) r=25; return r; }); //this is going to be hits

node.append("svg:text")
.attr("class", "nodetext")
.attr("dx", function(d) { var r=d.hits/4; if (r25) r=25; return r; })
.attr("dy", ".35em")
.text(function(d) { return d.name });

force.on("tick", function() {

link.attr("x1", function(d) { return d.source.x; })
.attr("y1", function(d) { return d.source.y; })
.attr("x2", function(d) { return d.target.x; })
.attr("y2", function(d) { return d.target.y; });

node.attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
});
});

[/javascript]

This is still only an experiment as the layout rules need some work and I doubt that it will cope very well if there are large numbers of nodes and links, but it’s a different way of looking at search results.

Links:

D3

Examples

D3 Google Groups

InfoVis Toolkit

All the London Datastore Maps

This started out as an experiment in how to handle geospatial data published in Internet data stores. The idea was to make an attempt at structuring the data to make searching, comparison and visualisation easier. The London Datastore publishes a manifest file which contains links to CSV files that are in the correct format for MapTube to handle, so I wrote a process to make the maps automatically. The results are one thumbnail map for every field in the first hundred datasets on the London Datastore. I stopped the process once I got to a hundred as it was taking a long time. A section of the results is shown as an image below, but the link goes to the full 10,000 pixel image created using the Image Cutter.

Link to full zoomable image

The name of the dataset and the name of the column being visualised are shown in the top left of each map, while the colour scale is a five-class Jenks range between the minimum and maximum of the data. This sort of works, but raises more questions than it answers about the data. To start with, one interesting thing that jumps out of the data is that there was a step change in London population around 1939. This comes from the "London Borough Historic Population" dataset, shown on the top two lines of the image above (it's seven rows from the bottom of the zoomable image). The 1939 image is the third from the right on the top row.
