Talk to OxPoints! E-mail and Twitter

July 2, 2009
Tweeter tweeting #oxp

As our new OxPoints system starts to take shape, people have started coming to us with more use cases and feature requests. To make this easier, we’ve got two main channels of communication for you:

Twitter:
Follow us, message us @oxforderewhon
If you want to talk about OxPoints, please use the hashtag #oxp. It helps us track all conversation relating to OxPoints and will hopefully let us provide you with a better service.

E-Mail:
erewhon@oucs.ox.ac.uk
If you prefer e-mail, chuck us your suggestions, problems, etc. at this address.


A simple library mashup

June 1, 2009

At the Erewhon workshop in December we asked people to choose or suggest applications for geodata. One of the favourites was: “Find the nearest copy of a book from a reading list (bearing in mind which libraries you can use, and the opening hours of libraries)”, so we decided to use this as an example of how we’d begin to use OxPoints data to enhance other services.
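Setting aside which libraries a user may use and when they are open, the core of this use case is a nearest-neighbour lookup over library locations. The sketch below assumes each library is a single (longitude, latitude) point, the way OxPoints stores places, and uses straight-line great-circle distance (the haversine formula) rather than walking distance. The library names, coordinates and holdings are invented placeholders, and `nearest_copy` is a hypothetical helper, not part of OxPoints or any catalogue API.

```python
import math
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees

# Invented example data: names, coordinates and holdings are placeholders.
LIBRARIES: Dict[str, Point] = {
    "Library A": (-1.2540, 51.7540),
    "Library B": (-1.2600, 51.7600),
    "Library C": (-1.2450, 51.7500),
}
HOLDINGS: Dict[str, Set[str]] = {
    "Library A": {"isbn-1"},
    "Library B": {"isbn-1", "isbn-2"},
    "Library C": {"isbn-2"},
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_copy(book_id: str, here: Point) -> Optional[Tuple[str, float]]:
    """Closest library holding book_id as (name, km), or None if none holds it."""
    candidates = [
        (name, haversine_km(here, coords))
        for name, coords in LIBRARIES.items()
        if book_id in HOLDINGS.get(name, set())
    ]
    return min(candidates, key=lambda c: c[1], default=None)


if __name__ == "__main__":
    print(nearest_copy("isbn-2", (-1.2550, 51.7545)))
```

Access rights and opening hours would fit in as extra conditions in the `candidates` filter once that data is available in a usable form.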

Ingredients:

A library search results page

We couldn’t easily get hold of the patron data (i.e. which libraries a user has access to), and the opening hours looked fairly indigestible in their current form (see example); so we decided to leave these out of this mashup. Read the rest of this entry »


RDF and the Time Dimension – Part 2

December 10, 2008

In part 1 I claimed that you will run into problems when you try to model dimensional data in RDF: In basic RDF there is no way to properly model any form of dimension and with Named Graphs we are only able to model discrete dimensions. I received some feedback saying that the way I described the problem (in particular the distinction between discrete and continuous dimensions) was a bit unfortunate (and I agree). So before I get to describing my solutions, I’d like to rephrase the problem.

RDF and Dimensions – The Problem

What we want to model in OxPoints is the development of the University of Oxford over time. A very simple example is the name of Oxford University Computing Services (OUCS). OUCS has existed since 1957 and was originally named “Computing Laboratory”. In 1969 it was split in two, one branch becoming “Oxford University Computing Services”. Let’s say we simply want to model that a resource identified by “OUCS” has existed since 1957 and was named “Computing Laboratory” from 1957 to 1969, and that it then changed its name to “Oxford University Computing Services”.

Now, why can’t we encode this in RDF? In part 1 I outlined a proof sketch of why the basic form of RDF does not support describing this kind of conditional data: the problem is RDF’s notion of entailment. However, with the extension of “named graphs”, encoding this data should be an easy exercise: create one graph named “1957-1969” that describes OUCS as “Computing Laboratory” and one graph named “1969-” that describes it as “Oxford University Computing Services”.

As was pointed out to me (and I have to agree), not having enough names for all possible values does not stop us from talking and reasoning about a dimension (or, if you like, a set of values). Take the real numbers as an example. There are far more real numbers than we could ever make up names for, yet this does not stop us from happily talking about them and proving theorems about them. So what is the difference with RDF? If I asked you whether 2 is in the set $latex ]1,2[ \subset \mathbb{R}$, you would probably tell me that it is not. However, you can only give what I consider the correct answer if you share my understanding of the name ]1,2[, namely that ]a,b[ denotes the open interval $latex \{x \in \mathbb{R} \mid a < x < b\}$. Given such a common understanding, we can deduce information from a name.
However, for a computer the name “1957-1969” is just a string of characters with no further meaning attached to it, so a computer cannot tell whether 1960 is part of “1957-1969”. The name of a named graph is just a name (or, more precisely, a URI), not something you could (or should) deduce information from. So how are we to encode our data in a way that allows us to query for:

  • What is (was) the name of OUCS today (in 1960)?
  • What names did OUCS have from 1960 to 1980?
  • When is the following true: OUCS is called “Oxford University Computing Services”?
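For comparison, here is what those three queries need once the validity span is stored as structured data rather than buried in a graph name. This is plain Python, not RDF, and only pins down the intended semantics. It assumes year granularity and treats each span as half-open, [start, end), so that in 1969 OUCS already carries its new name; `Statement` and the query functions are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    start: int            # first year the statement holds (inclusive)
    end: Optional[int]    # first year it no longer holds (exclusive); None = still holds

    def holds_in(self, year: int) -> bool:
        return self.start <= year and (self.end is None or year < self.end)

    def holds_during(self, lo: int, hi: int) -> bool:
        """True if the statement holds in at least one year of [lo, hi]."""
        return self.start <= hi and (self.end is None or self.end > lo)


# The OUCS example from the text, with each span stored as two integers.
STATEMENTS = [
    Statement("OUCS", "name", "Computing Laboratory", 1957, 1969),
    Statement("OUCS", "name", "Oxford University Computing Services", 1969, None),
]


def names(subject: str) -> List[Statement]:
    return [s for s in STATEMENTS if s.subject == subject and s.predicate == "name"]


def name_in(subject: str, year: int) -> List[str]:
    """What is (was) the name of `subject` in `year`?"""
    return [s.obj for s in names(subject) if s.holds_in(year)]


def names_between(subject: str, lo: int, hi: int) -> List[str]:
    """Which names did `subject` have at some point between lo and hi?"""
    return [s.obj for s in names(subject) if s.holds_during(lo, hi)]


def when_named(subject: str, name: str) -> List[Tuple[int, Optional[int]]]:
    """When was `subject` called `name`? Returns (start, end) spans."""
    return [(s.start, s.end) for s in names(subject) if s.obj == name]


if __name__ == "__main__":
    print(name_in("OUCS", date.today().year))
    print(name_in("OUCS", 1960))
    print(names_between("OUCS", 1960, 1980))
    print(when_named("OUCS", "Oxford University Computing Services"))
```

The point is that “is 1960 within this span?” becomes an integer comparison a machine can evaluate, whereas nothing in the string “1957-1969” tells it how to.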

I have come up with two ideas for solving this problem: one using reification and a relaxed notion of entailment, and one using named graphs.

Read the rest of this entry »


RDF and the Time Dimension – Part 1

November 28, 2008

In RDF – an Introduction I claimed that introducing any kind of continuous dimension (for example, a time dimension) is not possible if you follow the official interpretation given in the RDF specifications. Actually it is even worse: in basic RDF even discrete dimensions cannot be modeled.

In this post I will elaborate on my claims and give a detailed description of the problem. In part 2 I will propose a new interpretation of RDF graphs that allows dimensions to be modeled in RDF. If you are new to RDF, or if terms such as reification, entailment, fact or model don’t mean much to you, you might want to read my introduction to RDF first, since we need these terms to talk about RDF’s inability to model dimensions. I will present everything in a semi-formal way, using some mathematical notation, but I will always try to keep the post understandable for those who would not describe themselves as “math people”. However, I feel that a certain amount of formality is necessary to outline the problem and the proposed solution.

Continuous and Discrete Dimensions

Let’s start by giving you an idea of what I mean by continuous and discrete dimensions in RDF. Think of a dimension as a variable d that can take values from a specified set (e.g. 1 and 2). You then define your triples (or facts) relative to d. This means that for d = 1 you have a different set of facts than for d = 2. Whether I speak of a continuous or a discrete dimension depends on the cardinality (number of elements) of the value set for d. If the value set contains an infinite number of elements, I speak of a continuous dimension, and if the number of elements is finite, I speak of a discrete dimension. Since in our example the cardinality of the value set is 2 (|{1,2}| = 2), we have a discrete dimension. Read the rest of this entry »
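To make the distinction concrete: with a finite value set, the facts for each value of d can simply be listed, one set of triples per value (roughly what named graphs give us). With an infinite value set, such as the real numbers, there is nothing finite to enumerate, so the facts can only be described by a rule over d. The Python sketch below shows both; the `ex:` triples are placeholders, not part of any real vocabulary.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

# Discrete dimension: d ranges over {1, 2}, so one set of facts per value can be listed.
FACTS_BY_D: Dict[int, FrozenSet[Triple]] = {
    1: frozenset({("ex:a", "ex:p", "ex:b")}),
    2: frozenset({("ex:a", "ex:p", "ex:c")}),
}


def facts_discrete(d: int) -> FrozenSet[Triple]:
    """Look up the facts that hold for a given value of d (empty if d is not in the set)."""
    return FACTS_BY_D.get(d, frozenset())


def facts_continuous(d: float) -> Set[Triple]:
    """Continuous dimension: d ranges over the reals, so no table of values can be
    written down; the facts are instead defined by conditions on d."""
    facts: Set[Triple] = set()
    if 1.0 <= d < 2.0:
        facts.add(("ex:a", "ex:p", "ex:b"))
    if d >= 2.0:
        facts.add(("ex:a", "ex:p", "ex:c"))
    return facts
```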


RDF – an Introduction

November 26, 2008

After deciding to implement the new OxPoints system with Semantic Web technologies (see OxPoints and the Semantic Web) I started to read up on all I could find on RDF (Resource Description Framework) and related technologies like RDFS and OWL. In particular I was looking for

  • specifications,
  • best practices and
  • reports on projects using RDF.

I was astonished to find that, even though many people talk about RDF, very few seem to have actually used it outside academic studies. Or if they have, they at least did not tell anyone about it.
However, one thing I definitely did not expect to find was what seems to be a fundamental design flaw in RDF. I have thought about this a lot, and I hope that by blogging about it, either you will tell me that I am wrong and how to do it right, or we might together find a solution to the problem.

But before talking about what I think is wrong with RDF and proposing one way to solve that problem (yes, luckily I think there is a solution), we need to establish a common language, which is what I want to achieve with this introduction. If you are already familiar with RDF, you might only want to have a look at the sections Triples are Facts, Reification and Entailment. If you are new to RDF, I hope this will give you a first start. However, I kept this introduction very short, so many aspects are missing. If you want to learn more about RDF, I would recommend starting with the RDF Primer, the W3C’s introduction to RDF. In most sections I have also linked to the relevant parts of the RDF specifications.

I will try to assume as little previous knowledge as possible, but since RDF is not a trivial topic, I have to start somewhere. Basic knowledge of XML and some knowledge of mathematical notation would therefore probably be of help.

RDF (Resource Description Framework)

The Resource Description Framework (RDF for short) is a set of W3C specifications that were first published in 1999 and revised in 2004 (more information on the history of RDF can be found on its Wikipedia page or on the W3C pages on RDF). RDF is “a language for representing information about resources in the World Wide Web” (RDF Primer [http://www.w3.org/TR/REC-rdf-syntax/]).

So what are resources in the World Wide Web?

Read the rest of this entry »


OxPoints and the Semantic Web

November 22, 2008

In OxPoints – Providing geodata for the University of Oxford I told you about the old OxPoints system, which currently provides geolinking information for the University of Oxford, talked about what is wrong with it and explained why we want to start from scratch and create a new OxPoints.

Before we start talking about solutions, let’s define what we want the new system to look like:

Blackfriars Hall on Google Maps

As we have seen, the old OxPoints system stores geo- and some additional information (for example images and postal addresses) on all 38 colleges and the other important university entities. It can export its information as KML (an XML-based language for expressing geographic annotations), which can be imported into, for example, Google Maps or Google Earth. A simple frontend allows users to query the data and display the results directly in Google Maps or Google Earth, or as KML.

But although it doesn’t advertise the fact, the old system is already a bit more powerful than that. Let’s have a look at a typical OxPoints record, like the one for Blackfriars: Read the rest of this entry »


OxPoints – Providing geodata for the University of Oxford

November 18, 2008

Two of the core deliverables for the Erewhon project are creating technical specifications for using geolocation for university resources and compiling a report on dynamic location-dependent information delivery services. Now, this certainly sounds very nice, and there is even more information on the two deliverables in the JISC application, but I thought it would be a good idea to tell you a bit more about what we actually want to do and how we plan to meet the deliverables.

In this post I will tell you about OxPoints, a simple geodatabase which is currently in use at the University of Oxford and which we intend to redo, since it does not fulfill our requirements for a geodatabase of university resources.

OxPoints – the current system

Map of all colleges on www.ox.ac.uk

If you browse the University websites, you might come across a dynamically generated map of all of Oxford’s colleges. This map is generated using the Google Maps API and data (the longitude and latitude of each college) provided by a system called OxPoints. OxPoints was developed at OUCS to provide geolinking information for the University of Oxford and can output its data in formats such as KML, which Google Maps and Google Earth can import.
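As an illustration of the kind of output involved, the sketch below turns (name, longitude, latitude) tuples into a minimal KML 2.2 document with one `Placemark` per college, using only Python’s standard library. It is not the OxPoints exporter; `colleges_to_kml` is a hypothetical name, and the only coordinates used are those of the All Souls Lodge record shown further down. KML writes coordinates in the order longitude,latitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name: str) -> str:
    """Qualify a KML element name with the KML 2.2 namespace."""
    return f"{{{KML_NS}}}{name}"


def colleges_to_kml(points: Iterable[Tuple[str, float, float]]) -> str:
    """Build a KML document with one Placemark per (name, lon, lat) tuple."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(_tag("kml"))
    document = ET.SubElement(kml, _tag("Document"))
    for name, lon, lat in points:
        placemark = ET.SubElement(document, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = name
        point = ET.SubElement(placemark, _tag("Point"))
        # KML coordinate order is longitude,latitude[,altitude]
        ET.SubElement(point, _tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(colleges_to_kml([("All Souls College", -1.253042221069336, 51.75278555467572)]))
```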

A good question to ask now would be: “It seems to do the job. So why do you want to create a new one?”

To answer this, we have to dig a bit deeper into the current system and have a look at how it stores its data.
OxPoints uses an XML language called TEI (Text Encoding Initiative) to store information about colleges and units and their associated buildings, rooms etc. A typical OxPoints record looks something like this:

<place type="college" xml:id="alls">
   <placeName>All Souls College</placeName>
   <place subtype="primary" type="building">
       <placeName>Lodge</placeName>
       <location when="2007-01-29T13:08:55.535Z">
           <geo rend="0">-1.253042221069336 51.75278555467572</geo>
       </location>
   </place>
   <place type="building">
       <place type="room">
           <placeName>Wharton Room</placeName>
       </place>
   </place>
</place>

What this bit of XML tells us is that there is a college called All Souls College and that it owns two buildings: one, the Lodge, located at -1.25 51.75 (longitude, latitude), and another without any geoinformation but with a room called Wharton Room.
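To see how much of the meaning lives in the nesting, here is a small Python sketch that reads the record above with the standard library’s ElementTree and recovers the college, its buildings, their coordinates and the rooms inside them. It assumes the record stands alone, without the TEI namespace declaration a full TEI document would carry; `summarise` and `owned_rooms` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# xml:id lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

RECORD = """<place type="college" xml:id="alls">
   <placeName>All Souls College</placeName>
   <place subtype="primary" type="building">
       <placeName>Lodge</placeName>
       <location when="2007-01-29T13:08:55.535Z">
           <geo rend="0">-1.253042221069336 51.75278555467572</geo>
       </location>
   </place>
   <place type="building">
       <place type="room">
           <placeName>Wharton Room</placeName>
       </place>
   </place>
</place>"""

Building = Tuple[Optional[str], Optional[Tuple[float, ...]], List[str]]


def summarise(record_xml: str) -> Tuple[str, str, List[Building]]:
    """Return (college id, college name, [(building name, (lon, lat), rooms)])."""
    college = ET.fromstring(record_xml)
    buildings: List[Building] = []
    for building in college.findall("place[@type='building']"):
        name = building.findtext("placeName")      # None if the building is unnamed
        geo = building.findtext("location/geo")    # "lon lat" or None
        coords = tuple(float(v) for v in geo.split()) if geo else None
        rooms = [r.findtext("placeName") for r in building.findall("place[@type='room']")]
        buildings.append((name, coords, rooms))
    return college.get(XML_ID), college.findtext("placeName"), buildings


def owned_rooms(record_xml: str) -> List[str]:
    """Ownership is implicit: every room nested anywhere under the college counts."""
    college = ET.fromstring(record_xml)
    return [p.findtext("placeName") for p in college.iter("place") if p.get("type") == "room"]
```

Note that `owned_rooms` works only because ownership is encoded as containment; there is no comparable element to walk for a building a college merely uses.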

It is easy to see that this system allows us to store colleges, information on all the buildings a college owns and even all the rooms inside each building. So we should be able to answer queries of the form: “Give me a list of all rooms owned by college A that have a capacity greater than X and show them on a map”. But what about this query: “Give me a list of all the rooms used by college A”?
The problem with this query is that colleges tend to use buildings they do not own, which is something we cannot express directly in the current storage format. Since the information that college A owns building B is stored implicitly through the XML hierarchy, one solution would be to start copying the record of every building a college uses into that college’s record, ending up with something like this:

<place type="college" xml:id="alls">
   <placeName>All Souls College</placeName>

   <!-- our own buildings -->
   <place subtype="primary" type="building" ownershipStatus="owned-by-us">
       <placeName>Lodge</placeName>
       <location when="2007-01-29T13:08:55.535Z">
           <geo rend="0">-1.253042221069336 51.75278555467572</geo>
       </location>
   </place>

   <!-- buildings that we use -->
   <place subtype="primary" type="building" ownershipStatus="used-by-us">
       <placeName>Museum</placeName>
       <location when="2007-01-23T10:21:44.462Z">
           <geo>-1.26018762588500 51.75536912069192</geo>
       </location>
   </place>
</place>

Now suppose that the University consisted of only 10 colleges, each owning only one building but using the buildings of all the other colleges. Instead of having 10 records, one for each building, we’d end up with 100 records, 10 for each building, duplicating all the information. Obviously, this is not a good solution.
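The arithmetic generalises: if each of n colleges embeds a copy of every one of the n·m buildings in the University (m buildings per college), the embedded design stores n·n·m building records instead of n·m, so the duplication grows quadratically with the number of colleges. A tiny Python check of that count, with hypothetical function names:

```python
def embedded_records(colleges: int, buildings_per_college: int = 1) -> int:
    """Every college record contains a copy of every building in the University."""
    total_buildings = colleges * buildings_per_college
    return colleges * total_buildings


def normalised_records(colleges: int, buildings_per_college: int = 1) -> int:
    """Each building is stored exactly once."""
    return colleges * buildings_per_college


if __name__ == "__main__":
    for n in (10, 38):
        print(n, embedded_records(n), normalised_records(n))
```

With 38 colleges and one building each, the embedded approach would already need 1,444 records to describe 38 buildings.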

You might now say: “Well, XML knows about IDs. Why not use that mechanism to link to other elements?” Let’s have a look at what this might look like:

<place type="college" xml:id="alls">
   <placeName>All Souls College</placeName>

   <!-- our own buildings -->
   <place xml:id="some-building-id" subtype="primary" type="building" ownershipStatus="owned-by-us">
       <placeName>Lodge</placeName>
       <location when="2007-01-29T13:08:55.535Z">
           <geo rend="0">-1.253042221069336 51.75278555467572</geo>
       </location>
   </place>

   <!-- buildings that we use -->
   <place linksto="#some-building-id" ownershipStatus="used-by-us"/>
</place>

This is clearly a much better design, since we are no longer storing any redundant information. However, suppose one of our colleges stops using the rooms of college A. How would we reflect that in the database? One simple and efficient way would be to remove the link. After that change our database would again reflect the current state of the University, but the information that the college once used those rooms would be gone forever.
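A minimal Python sketch of how such links could be resolved, and of what deleting one does. It uses the post’s illustrative `linksto` and `ownershipStatus` attributes (they are not TEI), trims the record to the elements that matter, requires each `linksto` value to be `#` followed by an existing `xml:id` (here the lodge’s `some-building-id`), and treats a link to an unknown id as an error. `resolve_links` and `stop_using` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

RECORD = """<place type="college" xml:id="alls">
   <placeName>All Souls College</placeName>
   <place xml:id="some-building-id" subtype="primary" type="building" ownershipStatus="owned-by-us">
       <placeName>Lodge</placeName>
   </place>
   <place linksto="#some-building-id" ownershipStatus="used-by-us"/>
</place>"""


def resolve_links(root: ET.Element) -> List[Tuple[str, ET.Element]]:
    """Return (ownershipStatus, target element) for every linking <place>."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}
    resolved = []
    for place in root.iter("place"):
        target = place.get("linksto")
        if target is None:
            continue
        if not target.startswith("#") or target[1:] not in by_id:
            raise KeyError(f"dangling link: {target}")
        resolved.append((place.get("ownershipStatus"), by_id[target[1:]]))
    return resolved


def stop_using(college: ET.Element, target: str) -> None:
    """Delete the link; afterwards nothing records that it ever existed."""
    for place in list(college.findall("place")):
        if place.get("linksto") == target:
            college.remove(place)
```

After `stop_using`, `resolve_links` simply returns an empty list: the record is consistent again, but the history is lost, which is exactly the problem described above.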

When we thought about that problem and realized that it would indeed be very nice to have that extra dimension (allowing for queries like “Give me a list of all the colleges that were present from 1500 to 1600”), we had to admit that the old system’s XML (and indeed any hierarchical XML) would not give us the flexibility that we want for our geolocation database.

One of the first tasks in Erewhon is therefore to create a new database schema that gives us great flexibility in expressing relationships between various university entities, that knows about time and can annotate any statement with time information, and that is extensible, so that information we cannot yet think of, but which really should be in the system, can be added without changing the underlying schema. Otherwise we’d end up where we are at the moment, having to redo everything again, which is clearly something we would like to avoid.

So much for the old OxPoints. I’ll try to keep you posted on any developments, and I’d be more than happy to receive any comments.