RDF and the Time Dimension – Part 2

December 10, 2008

In part 1 I claimed that you will run into problems when you try to model dimensional data in RDF: basic RDF offers no way to properly model any form of dimension, and with Named Graphs we can only model discrete dimensions. I received some feedback saying that the way I described the problem (in particular the distinction between discrete and continuous dimensions) was a bit unfortunate, and I agree. So before I describe my solutions, I’d like to rephrase the problem.

RDF and Dimensions – The Problem

What we want to model in OxPoints is the development of the University of Oxford over time. A very simple example is the name of the Oxford University Computing Services (OUCS). OUCS has existed since 1957 and was originally named “Computing Laboratory”. In 1969 it was split in two, one branch becoming “Oxford University Computing Services”. Let’s say we simply want to model that a resource identified by “OUCS” has existed since 1957, was named “Computing Laboratory” from 1957 to 1969, and then changed its name to “Oxford University Computing Services”.

Now, why can’t we encode this in RDF? In part 1 I outlined a proof sketch of why the basic form of RDF does not support this kind of conditional data: the problem is RDF’s notion of entailment. With the extension of “named graphs”, however, encoding this data should be an easy exercise: create one graph named “1957-1969” that describes OUCS as “Computing Laboratory” and one graph named “1969-” that describes it as “Oxford University Computing Services”.

As was pointed out to me (and I have to agree), not having enough names for all possible values does not stop us from talking and reasoning about a dimension (or call it a set of values). Take the real numbers as an example. There are many more real numbers than we can make up names for. However, this does not stop us from happily talking about them and proving things about them. So what is the difference with RDF? If I asked you whether 2 is in the set $]1,2[ \subset \mathbb{R}$, you would probably tell me that it is not. However, you can only give me what I consider the correct answer if you have the same understanding of the name $]1,2[$ as I have, namely that $]a,b[$ denotes the open interval $\{x \in \mathbb{R} \mid a < x < b\}$. If we share that understanding, then we can deduce information from a given name.

For a computer, however, the name “1957-1969” is only a bunch of characters without any further meaning attached to it. Therefore a computer could not tell me whether or not 1960 is part of “1957-1969”. The name of a named graph is just a name (or more precisely a URI), and nothing that you could (or should) deduce information from. So how are we to encode our data in a way that allows us to query for:

  • What is (was) the name of OUCS today (in 1960)?
  • What names did OUCS have from 1960 to 1980?
  • When is the following true: OUCS is called “Oxford University Computing Services”?
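To make these three queries concrete, here is a minimal Python sketch of what they ask for, outside of RDF altogether. It is not the OxPoints encoding and not one of the solutions discussed in this series. The names `NamePeriod`, `name_in`, `names_between` and `when_named` are made up for illustration, and treating each period as half-open (the start year belongs to it, the end year does not) is my assumption about how the change in 1969 should be read. The point is only that the interval has to be stored as data with a start and an end, not as an opaque label such as “1957-1969”.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NamePeriod:
    """A name together with the years during which it applies (hypothetical type)."""
    name: str
    start: int            # first year in which the name applies
    end: Optional[int]    # first year in which it no longer applies; None = open-ended

    def covers(self, year: int) -> bool:
        # Assumption: half-open period [start, end).
        return self.start <= year and (self.end is None or year < self.end)

    def overlaps(self, first: int, last: int) -> bool:
        # The queried range first..last is taken to include both end years.
        return self.start <= last and (self.end is None or first < self.end)


# The two facts from the OUCS example above.
OUCS_NAMES = [
    NamePeriod("Computing Laboratory", 1957, 1969),
    NamePeriod("Oxford University Computing Services", 1969, None),
]


def name_in(year: int) -> List[str]:
    """Query 1: which name(s) applied in a given year?"""
    return [p.name for p in OUCS_NAMES if p.covers(year)]


def names_between(first: int, last: int) -> List[str]:
    """Query 2: which names applied at some point in first..last?"""
    return [p.name for p in OUCS_NAMES if p.overlaps(first, last)]


def when_named(name: str) -> List[Tuple[int, Optional[int]]]:
    """Query 3: during which period(s) did the given name apply?"""
    return [(p.start, p.end) for p in OUCS_NAMES if p.name == name]
```

Under these assumptions, `name_in(1960)` yields the “Computing Laboratory” name, `names_between(1960, 1980)` yields both names, and `when_named` returns the start and end of each matching period, with `None` standing for “still valid”. None of this is possible if all the program has is the string “1957-1969”.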

I have come up with two ideas on how to solve this problem: one using reification and a relaxed notion of entailment, and one using named graphs.
