Two of the core deliverables of the Erewhon project are creating technical specifications for using geolocation with university resources and compiling a report on dynamic, location-dependent information delivery services. This certainly sounds very nice, and the JISC application contains more detail on both deliverables. Still, I thought it would be a good idea to tell you a bit more about what we actually want to do and how we plan to meet the deliverables.
In this post I will tell you about OxPoints, a simple geodatabase that is currently in use at the University of Oxford. We intend to redo it because it does not fulfill our requirements for a geodatabase of university resources.
OxPoints – the current system
If you browse the University websites, you might come across a dynamically generated map of all of Oxford’s colleges. This map is generated using the Google Maps API and data (the longitude and latitude of each college) provided by a system called OxPoints. OxPoints was developed at OUCS to provide geolinking information for the University of Oxford, and it can output its data in several forms, for example as KML, the input format used by Google Maps and Google Earth.
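To make the KML output step concrete, here is a minimal sketch that turns one longitude/latitude pair into a KML Placemark using Python’s standard `xml.etree.ElementTree`. It is not OxPoints’ actual exporter: the helper name `place_to_kml` is hypothetical, and targeting the KML 2.2 namespace is an assumption.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumption: KML 2.2 output


def place_to_kml(name: str, lon: float, lat: float) -> str:
    """Hypothetical helper: wrap one named point in a minimal KML document."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML lists coordinates as "longitude,latitude", the same order OxPoints uses.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Using the Lodge coordinates from the All Souls record.
print(place_to_kml("All Souls College", -1.253042221069336, 51.75278555467572))
```

A real exporter would loop over every college and emit one Placemark each inside the same `Document`.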
A good question to ask now would be: “It seems to do the job. So why do you want to create a new one?”
To answer this, we have to dig a bit deeper into the current system and have a look at how it stores its data.
OxPoints uses TEI, an XML vocabulary from the Text Encoding Initiative, to store information about colleges and units and their associated buildings, rooms, etc. A typical OxPoints record looks something like this:
<place type="college" xml:id="alls">
  <placeName>All Souls College</placeName>
  <place subtype="primary" type="building">
    <placeName>Lodge</placeName>
    <location when="2007-01-29T13:08:55.535Z">
      <geo rend="0">-1.253042221069336 51.75278555467572</geo>
    </location>
  </place>
  <place type="building">
    <place type="room">
      <placeName>Wharton Room</placeName>
    </place>
  </place>
</place>
What this bit of XML tells us is that there is a college called All Souls College that owns two buildings: the Lodge, located at -1.25 51.75 (longitude, latitude), and a second building without any geoinformation but containing a room called the Wharton Room.
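The nesting is the whole data model: to find a college’s buildings you read its child `place` elements, and coordinates sit in the `geo` text as “longitude latitude”. A minimal sketch with Python’s standard `xml.etree.ElementTree`, assuming the record is stored exactly as shown (no TEI namespace declared); the helper name `buildings` is hypothetical:

```python
import xml.etree.ElementTree as ET

# The All Souls record as shown above (it declares no TEI namespace).
RECORD = """
<place type="college" xml:id="alls">
  <placeName>All Souls College</placeName>
  <place subtype="primary" type="building">
    <placeName>Lodge</placeName>
    <location when="2007-01-29T13:08:55.535Z">
      <geo rend="0">-1.253042221069336 51.75278555467572</geo>
    </location>
  </place>
  <place type="building">
    <place type="room">
      <placeName>Wharton Room</placeName>
    </place>
  </place>
</place>
"""

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's name for xml:id


def buildings(college: ET.Element) -> list[tuple]:
    """Return (name, (lon, lat)) for each building nested directly in a college.

    Missing names or coordinates come back as None.
    """
    result = []
    for building in college.findall("place[@type='building']"):
        geo = building.find("location/geo")
        coords = tuple(float(v) for v in geo.text.split()) if geo is not None else None
        result.append((building.findtext("placeName"), coords))
    return result


college = ET.fromstring(RECORD)
print(college.get(XML_ID), buildings(college))
```

For the sample this yields the Lodge with its coordinates and `(None, None)` for the second, unnamed building; the Wharton Room is reachable only by descending one level further.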
It is easy to see that this format lets us store colleges, all the buildings each college owns, and even all the rooms inside each building. So we should be able to answer queries of the form: “Give me a list of all rooms owned by college A that have a capacity greater than X, and show them on a map.” But what about this query: “Give me a list of all the rooms used by college A”?
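Because ownership is implied by nesting, the first query reduces to collecting every room element below the college. The sketch below assumes a hypothetical `capacity` attribute on rooms (the sample record has none), placeholder names, and a hypothetical helper `rooms_owned`:

```python
import xml.etree.ElementTree as ET

# Hypothetical record: names are placeholders and the "capacity" attribute is
# invented for this sketch; the real sample record above has neither.
RECORD = """
<place type="college" xml:id="college-a">
  <placeName>College A</placeName>
  <place type="building">
    <placeName>Building 1</placeName>
    <place type="room" capacity="40"><placeName>Room 1</placeName></place>
    <place type="room" capacity="12"><placeName>Room 2</placeName></place>
  </place>
  <place type="building">
    <place type="room" capacity="80"><placeName>Room 3</placeName></place>
  </place>
</place>
"""


def rooms_owned(college: ET.Element, min_capacity: int = 0) -> list[str]:
    """Names of rooms anywhere below the college with capacity greater than min_capacity.

    Ownership is never stated explicitly: a room counts as owned because it is
    nested inside the college element. Rooms without a capacity count as 0.
    """
    return [
        room.findtext("placeName")
        for room in college.iterfind(".//place[@type='room']")
        if int(room.get("capacity", "0")) > min_capacity
    ]


college = ET.fromstring(RECORD)
print(rooms_owned(college, min_capacity=20))  # ['Room 1', 'Room 3']
```

There is no comparable expression for “rooms used by College A”: the tree has no place where that fact could be recorded.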
The problem with this query is that colleges tend to use buildings they do not own, and this is something we cannot express directly in the current storage format. Since the fact that college A owns building B is stored implicitly through the XML hierarchy, one solution would be to copy the record of every building a college uses into that college’s record, ending up with something like this:
<place type="college" xml:id="alls">
  <placeName>All Souls College</placeName>
  <!-- our own buildings -->
  <place subtype="primary" type="building" ownershipStatus="owned-by-us">
    <placeName>Lodge</placeName>
    <location when="2007-01-29T13:08:55.535Z">
      <geo rend="0">-1.253042221069336 51.75278555467572</geo>
    </location>
  </place>
  <!-- buildings that we use -->
  <place subtype="primary" type="building" ownershipStatus="used-by-us">
    <placeName>Museum</placeName>
    <location when="2007-01-23T10:21:44.462Z">
      <geo>-1.26018762588500 51.75536912069192</geo>
    </location>
  </place>
</place>
Now suppose that the University consisted of only 10 colleges, each owning only one building but using the buildings of all the other colleges. Instead of 10 building records, one for each building, we’d end up with 100 records, 10 copies of each building, duplicating all the information. Obviously, this is not a good solution.
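The arithmetic generalizes: with n colleges that each own one building and use every other college’s building, copying puts n building entries into each college record, so the total grows as n² rather than n. A tiny sketch of that count (the function name is hypothetical):

```python
def building_records(n_colleges: int, copy_used_buildings: bool) -> int:
    """Count building entries when each of n colleges owns one building and uses all the others.

    Hypothetical helper for the thought experiment above.
    """
    if not copy_used_buildings:
        return n_colleges  # one record per building, nothing duplicated
    # Each college record holds its own building plus a copy of the other n - 1.
    return n_colleges * (1 + (n_colleges - 1))


for n in (10, 20):
    print(n, building_records(n, False), building_records(n, True))
# 10 10 100
# 20 20 400
```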
You might now say: “Well, XML knows about IDs. Why not use them to link to other elements?” Let’s have a look at what this might look like:
<place type="college" xml:id="alls">
  <placeName>All Souls College</placeName>
  <!-- our own buildings -->
  <place xml:id="some-building-id" subtype="primary" type="building" ownershipStatus="owned-by-us">
    <placeName>Lodge</placeName>
    <location when="2007-01-29T13:08:55.535Z">
      <geo rend="0">-1.253042221069336 51.75278555467572</geo>
    </location>
  </place>
  <!-- buildings that we use -->
  <place linksto="#some-building-id" ownershipStatus="used-by-us"/>
</place>
This is clearly a much better design, since we are no longer storing any redundant information. However, suppose one of our colleges stops using the rooms of college A. How would we reflect that in the database? One simple and efficient way would be to remove the link. After that change, the database would again reflect the current state of the University, but the information that the college once used those rooms would be gone forever.
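Resolving such links means building an index from `xml:id` to element and following each `linksto` reference. The sketch below uses a hypothetical two-college file (the `<places>` wrapper, ids and names are invented) together with the illustrative `linksto` and `ownershipStatus` attributes from the example. It also shows the problem just described: deleting the link is the only way to record that a college stopped using a building, and afterwards no trace remains.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's name for xml:id

# Hypothetical file: the <places> wrapper, ids and names are invented for this sketch.
DOC = """
<places>
  <place type="college" xml:id="college-a">
    <placeName>College A</placeName>
    <place type="building" xml:id="bldg-a" ownershipStatus="owned-by-us">
      <placeName>Building A</placeName>
    </place>
  </place>
  <place type="college" xml:id="college-b">
    <placeName>College B</placeName>
    <place linksto="#bldg-a" ownershipStatus="used-by-us"/>
  </place>
</places>
"""


def used_buildings(root: ET.Element, college_id: str) -> list[str]:
    """Follow a college's used-by-us links to the names of the buildings they point at."""
    # Index every element carrying an xml:id so links can be resolved in one lookup.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    names = []
    for link in by_id[college_id].findall("place[@ownershipStatus='used-by-us']"):
        target = by_id[link.get("linksto").lstrip("#")]
        names.append(target.findtext("placeName"))
    return names


root = ET.fromstring(DOC)
print(used_buildings(root, "college-b"))  # ['Building A']

# College B stops using the building: the link is simply deleted...
college_b = next(el for el in root.iter("place") if el.get(XML_ID) == "college-b")
college_b.remove(college_b.find("place[@ownershipStatus='used-by-us']"))

# ...and nothing in the document records that it was ever used.
print(used_buildings(root, "college-b"))  # []
```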
When we thought about that problem and realized that it would indeed be very nice to have time as an extra dimension (allowing queries like “Give me a list of all the colleges that were present from 1500 to 1600”), we had to admit that the old system’s XML (and indeed any hierarchical XML) would not give us the flexibility we want for our geolocation database.
One of the first tasks in Erewhon is therefore to create a new database schema that gives us great flexibility in expressing relationships between various university entities, that knows about time and can annotate any statement with time information, and that is extensible, so that information we cannot yet think of, but that really should be in the system, can be added without changing the underlying schema. Otherwise we’d end up where we are at the moment, having to redo everything again, which we would clearly like to avoid.
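One way to meet these requirements, sketched here purely as an illustration and not as the schema Erewhon will adopt, is to store every fact as a (subject, predicate, object) statement with a validity interval. Predicates are free strings, so new kinds of relationship need no schema change, and ending a relationship means setting its end year rather than deleting it. The `Statement` class, the helper functions and all data below are hypothetical; the ids and years are invented and are not historical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """Hypothetical time-annotated fact: subject --predicate--> obj from start to end (years)."""
    subject: str
    predicate: str  # free-form, so new kinds of relationship need no schema change
    obj: str
    start: int
    end: Optional[int] = None  # None: still true today


def overlaps(s: Statement, start: int, end: int) -> bool:
    """True if the statement's validity overlaps the closed interval [start, end]."""
    s_end = s.end if s.end is not None else float("inf")
    return s.start <= end and start <= s_end


def colleges_present(stmts: list[Statement], start: int, end: int) -> list[str]:
    # Colleges whose membership of the university overlaps [start, end].
    return [s.subject for s in stmts
            if s.predicate == "part-of" and overlaps(s, start, end)]


def used_at(stmts: list[Statement], college: str, year: int) -> list[str]:
    # Buildings the college was using in the given year.
    return [s.obj for s in stmts
            if s.subject == college and s.predicate == "uses" and overlaps(s, year, year)]


# Illustrative data only: ids and years are invented, not historical.
statements = [
    Statement("college-a", "part-of", "university", 1400),
    Statement("college-b", "part-of", "university", 1620),
    Statement("college-c", "part-of", "university", 1450, 1550),
    Statement("college-b", "uses", "bldg-a", 1990, 2007),  # ended, not deleted
    Statement("college-b", "uses", "bldg-c", 2008),
]

print(colleges_present(statements, 1500, 1600))  # ['college-a', 'college-c']
print(used_at(statements, "college-b", 2000))     # ['bldg-a']
```

Because the ended “uses” statement is kept, the question the link-deleting design could no longer answer (“which buildings did college B use in 2000?”) still has an answer.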
So much for the old OxPoints. I’ll try to keep you posted on any developments, and I’d be more than happy to receive any comments.