OxPoints and the Semantic Web

In OxPoints – Providing geodata for the University of Oxford I described the old OxPoints system, which currently provides geolinking information for the University of Oxford. I also explained what is wrong with it and why we want to start from scratch and create a new OxPoints.

Before we talk about solutions, let’s define what we want the new system to look like:

[Image: Blackfriars Hall on Google Maps]

As we have seen, the old OxPoints system stores geodata and some additional information (for example, images and postal addresses) on all 38 colleges and the other important university entities. It can export this information as KML (an XML-based language for expressing geographic annotations), which can be imported into, for example, Google Maps or Google Earth. A simple frontend lets users query the data and display the results directly in Google Maps or Google Earth, or retrieve them as KML.

But even though it does not show it, the old system is already a bit more powerful than that. Let’s have a look at a typical OxPoints record, such as the one for Blackfriars:

<place type="college" xml:id="blac">
  <placeName>Blackfriars</placeName>
  <event when="1221" type="officialstatus">
    <label>Foundation</label>
  </event>
  <event when="1921" type="officialstatus">
    <label>refounded</label>
  </event>
  <location type="address">
    <address>
      <addrLine>Blackfriars Priory, Oxford</addrLine>
      <postCode>OX1 3LY</postCode>
    </address>
  </location>
  <trait type="url">
    <desc>
      <ptr target="http://www.bfriars.ox.ac.uk/"/>
    </desc>
  </trait>
  <place subtype="primary" type="building">
    <placeName>Lodge</placeName>
    <location when="2007-05-22T11:50:22.34+01:00">
      <geo rend="90">-1.2603700160980225 51.756916532903084</geo>
      <note> recorded by Janet McKnight</note>
    </location>
  </place>
</place>

Lines 3 and 6 contain two event tags. These tell us that Blackfriars was founded in 1221 and refounded 700 years later, in 1921. Although the data is there, the system does not let you query it (unless, of course, you take the trouble to look at the XML directly). If this data were queryable, it would already allow for some cool applications, such as historic maps (“Show all colleges that were present in 1600”) or even dynamic maps displaying the development of the University of Oxford over time.
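As a rough illustration of what querying the event data could look like, here is a minimal Python sketch that reads a trimmed copy of the Blackfriars record shown above and pulls out its events. The helper names (`events`, `has_event_by`) are hypothetical and not part of OxPoints. Note that the record only tells us when something happened; whether two events are enough to decide that a college was actually present in a given year is a separate modelling question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Blackfriars record shown above.
RECORD = """
<place type="college" xml:id="blac">
  <placeName>Blackfriars</placeName>
  <event when="1221" type="officialstatus">
    <label>Foundation</label>
  </event>
  <event when="1921" type="officialstatus">
    <label>refounded</label>
  </event>
</place>
"""

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def events(place):
    """Return sorted (year, label) pairs for the direct <event> children."""
    result = []
    for event in place.findall("event"):
        label = event.findtext("label", default="").strip()
        result.append((int(event.get("when")), label))
    return sorted(result)


def has_event_by(place, year):
    """True if the place has at least one recorded event in or before `year`."""
    return any(when <= year for when, _ in events(place))


place = ET.fromstring(RECORD)
print(place.get(XML_ID), events(place))
print(has_event_by(place, 1600))
```

A real query would of course run over all records rather than a single one, but the principle stays the same: the dates are already in the data and only need to be exposed.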

The minimum requirements for the new OxPoints system are therefore:

  • have all the information of the old system, and
  • make queryable the information that the old system stored but did not let you query.

But we do not want to stop here. As I described in OxPoints – Providing geodata for the University of Oxford, we would like to model the usage of buildings and rooms, that is, to be able to say which rooms and buildings are used by which university entity. We want to store more events, such as construction dates for buildings or the dates on which they were purchased by the university. And finally, we want to be able to model any kind of complex relationship between all entities stored in OxPoints (buildings, rooms, places etc. and colleges, units, research projects, departments etc.), not just to reflect the current state (as in: “The University of Oxford consists of 38 colleges”) but to reflect it over time (as in: “In 1300 the University of Oxford consisted of 3 colleges: University College, Balliol, and Merton”). And while we’re at it, why shouldn’t it be possible for others to store data on entities held in OxPoints in such a way that it can easily be mashed up? Since we cannot think of a reason why not, and we think this would be rather cool, we want to provide for it.
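To make the time dimension concrete, here is a minimal Python sketch of relationships that carry a validity period. It is only an illustration of the requirement, not a proposed design: the `Relationship` class, the `isCollegeOf` predicate and the `subjects_in` helper are all hypothetical, and the years are the commonly cited foundation dates, used here as sample data rather than OxPoints records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    subject: str
    predicate: str
    obj: str
    valid_from: int                 # first year in which the statement holds
    valid_to: Optional[int] = None  # last year it holds; None means "still holds"

    def holds_in(self, year: int) -> bool:
        """True if the statement is valid in the given year."""
        return self.valid_from <= year and (
            self.valid_to is None or year <= self.valid_to
        )


# Illustrative sample data only (commonly cited foundation years).
RELATIONSHIPS = [
    Relationship("University College", "isCollegeOf", "University of Oxford", 1249),
    Relationship("Balliol", "isCollegeOf", "University of Oxford", 1263),
    Relationship("Merton", "isCollegeOf", "University of Oxford", 1264),
    Relationship("Exeter", "isCollegeOf", "University of Oxford", 1314),
]


def subjects_in(year, predicate, obj):
    """All subjects for which (subject, predicate, obj) holds in `year`."""
    return [
        r.subject
        for r in RELATIONSHIPS
        if r.predicate == predicate and r.obj == obj and r.holds_in(year)
    ]


print(subjects_in(1300, "isCollegeOf", "University of Oxford"))
# → ['University College', 'Balliol', 'Merton']
```

The point of the sketch is that “which colleges existed in 1300” becomes an ordinary query once every relationship records when it holds.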

At first, the problem sounds as if a rather simple relational data model might be sufficient: one table for places, one for units, and a couple of relationship tables. But what if, in a year’s time, we want to add new relationship types that we have not yet thought of? How are we going to handle the time dimension? And how is this data to be mashed up?
With relational data models it always feels as if we have to predetermine either too much or too little. Predetermining too much would probably mean changing the data model quite frequently (once for each new relationship that we want to model); predetermining too little would leave us with a very loose schema, which would make querying a very tedious task.

After trying different relational approaches and not being happy with any of them, we took a step back and thought about a completely different approach. What we want to do is make assertions about entities (“A college was founded in …”, “The name of a building is …”, etc.), which is, when you think about it, exactly what the Semantic Web is about.

The idea behind the Semantic Web is to make the information on the web, which at the moment mostly comes in the form of simple text documents, machine-readable by using technologies such as RDF (Resource Description Framework) or OWL (Web Ontology Language). RDF and OWL were developed and are maintained by the W3C (World Wide Web Consortium) and provide means to describe resources (anything that can be uniquely identified) as statements made up of a subject, a predicate, and an object. A simple RDF triple could look like this:

oxp:college_123 oxp:hasName "Balliol" .

In plain English this could mean: “The resource identified by oxp:college_123 has a property called oxp:hasName and the property’s value is Balliol”.
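To show how little machinery the subject, predicate, object idea needs, here is a minimal Python sketch that treats a graph as a plain set of triples and queries it by pattern. This is not a real RDF library, just an illustration; the `match` helper, the `oxp:occupies` predicate and `oxp:building_456` are hypothetical names made up for the example.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of triples.
graph = {
    ("oxp:college_123", "oxp:hasName", "Balliol"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Adding a new kind of statement needs no schema change (hypothetical names).
graph.add(("oxp:college_123", "oxp:occupies", "oxp:building_456"))

print(match(graph, s="oxp:college_123"))  # everything known about the college
print(match(graph, p="oxp:hasName"))      # all names in the graph
```

Note that adding the second statement required no change to any table or schema, which is exactly the property the relational approaches made hard to get.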

We believe that Semantic Web technologies offer us almost everything we need for the new OxPoints system, and hence we plan to go ahead with them. Before getting into any implementation details and problems, I will give you (and myself, since I am also rather new to the Semantic Web :-} ) an introduction to RDF, OWL and maybe some other related technologies.

I hope to see you back soon and I am, as always, happy to receive comments.

5 Responses to OxPoints and the Semantic Web

  1. Yihong Ding says:

    I think the Semantic Web is a good fit for the project. I am interested in seeing how you achieve the goal. Good luck.

    Yihong

  2. […] After deciding to implement the new OxPoints system with Semantic Web technologies (see OxPoints and the Semantic Web) I started to read up on all I could find on RDF (Resource Description Framework) and related […]

  3. Andrew Chapman says:

    I am concerned at your apparent dismissal of RDBMS technology. XML and its variants were always designed for data exchange: to enable data stuck in complicated RDBMS (and other types of DB) to be retrieved and mashed up.
    XML and RDF were never intended to be datastores in their own right, just snapshots of the ‘real’ data which lurked in some datastore. They were ephemeral chunks of data abstracted from the whole for a particular purpose. Hence the limitations of their use, one of which you outline above.

    The solution should not be to make XML into something it is not. RDBMS are designed to store complex data structures and there are plenty of examples of databases dealing with complicated time-dependent data (most financial databases for example) or complex geo-spatial databases (just visit http://www.agi.org.uk).

    It is a useful systems analysis exercise to step back and see what exactly one wants to model, perhaps looking at outputs like the XML data chunks one might like to produce. That shouldn’t mean, though, that a proper RDBMS-based data design is pointless and that the data structures should be modelled and held entirely in XML.

    A better solution, to my mind, is to define a series of XML structures, say for property ownership, for property use, and for historical events. Then an RDBMS structure can be defined that stores this data and can generate the required XML data.

    Trying to define some massive XML structure that incorporates *all* this data is unwieldy, and will require elaborate code to parse out those bits of data that might be required in any particular situation. It is inefficient too, as XML requires far more bandwidth to transfer data than raw outputs.

    The task of the client should be to process XML datasets that are as compact as the application being used allows. The bulk of the data processing should be on the server. RDBMS systems are the engine in which this should be done; they are the result of years of development.

  4. amittelbach says:

    Dear Andrew,

    Thanks for your thoughts on OxPoints. You’ve raised a few good issues, but here are my reasons for not agreeing with all of them.

    I very much agree that RDBMS are designed to store complex data structures and that there are ways of introducing a time dimension into an RDBMS. But when using an RDBMS we are talking about very closely defined data structures. All relationships between the different entities have to be known beforehand. If you do not know the relationships between your entities, or in fact do not even know all the entities you want to model, you run into difficulties defining your RDBMS schema. So wouldn’t that mean that we were making RDBMS technologies into something they are not?

    I do agree that today XML can be seen as the de facto standard for data exchange, but I believe that it was originally designed as a markup language for electronic publishing. However, your claim that XML is not suited as a database back end is, I believe, wrong. Look at projects like eXist (http://exist-db.org/), a DBMS built entirely on XML technologies, or have a closer look at any of the leading RDBMS systems. Most of them support native XML processing.
    This is, in my opinion, because XML comes bundled with a large toolset that allows you to, for example, query or transform your data, whereas with SQL you are stuck with relations as your output format and need specialized tools to process them further. We are not talking about storing one huge XML file on disk and delivering that file to every application, leaving it to process the data. There are more sophisticated methods of storing large chunks of XML (for example, using an XML database like eXist), and with XSLT you can easily create the tailored subset needed by any querying application. This does not mean that I favor XML over RDBMS or believe that XML is superior to RDBMS. In fact, when I started to think about OxPoints I designed several RDBMS data models. But I think that RDBMS and XML are two approaches to storing large (and small) sets of data, and that both have their advantages and disadvantages.

    Anyway, as I said in the post, I do agree that general XML, with a huge schema created to store everything we need, is not the way forward. That is why we are currently looking into RDF. Now, throwing XML and RDF into one pot might make many people unhappy. In fact, RDF and XML are two quite different things. RDF/XML is only one of several syntaxes (see my introduction to RDF) for writing down an RDF graph. RDF itself, on the other hand, is a data model used to describe resources (anything identifiable) and in particular relationships between resources.
    Many so-called triple stores allow you to store your RDF graphs in memory or in an RDBMS, enabling you to store huge RDF graphs (several million triples) and providing you with standardized query mechanisms (e.g. SPARQL). Although I am still not sure whether RDF will offer us everything we need (see my posts on RDF), I believe that we can make it work. This does not mean that we want to force RDF into something it is not (at least we hope that this is not what we are doing). But we think that RDF might give us many advantages. One is that adding new entities and relationships to an RDF graph is relatively trivial. Another is that we believe RDF offers other application developers a much nicer framework for mashing up our data. However, we are still in the design phase, and one possibility that we are not ruling out is using an RDBMS back end for OxPoints after all.

  5. Pete Quinn says:

    Whichever solution or language is chosen, it should be able to meet the needs outlined above: to show who uses which building and, hopefully, from there to provide building accessibility information for disabled staff, students and visitors.

    Pete Quinn
