RDF and the Time Dimension – Part 2

December 10, 2008

In part 1 I claimed that you will run into problems when you try to model dimensional data in RDF: In basic RDF there is no way to properly model any form of dimension and with Named Graphs we are only able to model discrete dimensions. I received some feedback saying that the way I described the problem (in particular the distinction between discrete and continuous dimensions) was a bit unfortunate (and I agree). So before I get to describing my solutions, I’d like to rephrase the problem.

RDF and Dimensions – The Problem

What we want to model in OxPoints is the development of the University of Oxford over time. A very simple example would be the name of the Oxford University Computing Services (OUCS). OUCS has existed since 1957 and was originally named “Computing Laboratory”. In 1969 it was split in two, one branch becoming “Oxford University Computing Services”. Let’s say we simply want to model that a resource identified by “OUCS” has existed since 1957, was named “Computing Laboratory” from 1957 to 1969, and then changed its name to “Oxford University Computing Services”.

Now, why can’t we encode this in RDF? In part 1 I outlined a proof sketch of why the basic form of RDF does not support this kind of conditional data: the problem is RDF’s notion of entailment. However, with the extension of “named graphs”, encoding this data should be an easy exercise: create one graph named “1957-1969” that describes OUCS as “Computing Laboratory” and one graph named “1969-” that describes it as “Oxford University Computing Services”.

As was pointed out to me (and I have to agree), not having enough names for all possible values does not stop us from talking and reasoning about a dimension (or call it a set of values). Take the real numbers as an example. There are many more real numbers than we can make up names for. However, this does not stop us from happily talking about them and proving things about them. So what is the difference with RDF? If I asked you whether 2 is in the set $latex ]1,2[ \subset \mathbb{R}$, you would probably tell me that it is not. However, you can only give me what I consider the correct answer if you have the same understanding of the name ]1,2[ as I have, namely that ]a,b[ denotes the open interval $latex \{x \in \mathbb{R} \mid a < x < b\}$. If we share this understanding, we can deduce information from a given name.

For a computer, however, the name “1957-1969” is only a bunch of characters without any further meaning attached to it. Therefore a computer could not tell me whether or not 1960 is part of “1957-1969”. The name of a named graph is just a name (or more precisely a URI), and nothing that you could (or should) deduce information from. So how are we to encode our data in a way that allows us to query for:

  • What is (was) the name of OUCS today (in 1960)?
  • What names did OUCS have from 1960 to 1980?
  • When is the following true: OUCS is called “Oxford University Computing Services”?
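To make the three queries concrete, here is a minimal Python sketch of what a program needs beyond the bare graph name. It is not one of the two solutions announced in this post, only an illustration of the problem. The class `ValidityInterval`, the table `GRAPHS` and the three query functions are hypothetical names made up for this example, and treating 1969 as already belonging to the second name (half-open intervals) is an assumption, since the description above does not say which name applies in 1969 itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ValidityInterval:
    """Half-open interval of years [start, end); end=None means 'still valid'.

    Assumption: the year of a name change belongs to the new name.
    """
    start: int
    end: Optional[int] = None

    def contains(self, year: int) -> bool:
        return self.start <= year and (self.end is None or year < self.end)

    def overlaps(self, first: int, last: int) -> bool:
        # True if any year in the closed range [first, last] lies in the interval.
        return self.start <= last and (self.end is None or first < self.end)


# The graph name (the dict key) stays an opaque string. The interval is stored
# as explicit data next to it, so nothing has to be deduced from the name.
GRAPHS = {
    "1957-1969": (ValidityInterval(1957, 1969),
                  {("OUCS", "name", "Computing Laboratory")}),
    "1969-": (ValidityInterval(1969),
              {("OUCS", "name", "Oxford University Computing Services")}),
}


def names_in(year: int) -> List[str]:
    """Query 1: what is (was) OUCS called in a given year?"""
    return [o for interval, triples in GRAPHS.values() if interval.contains(year)
            for (s, p, o) in triples if s == "OUCS" and p == "name"]


def names_between(first: int, last: int) -> List[str]:
    """Query 2: which names did OUCS have at some point in [first, last]?"""
    return [o for interval, triples in GRAPHS.values() if interval.overlaps(first, last)
            for (s, p, o) in triples if s == "OUCS" and p == "name"]


def when_named(name: str) -> List[ValidityInterval]:
    """Query 3: during which intervals is OUCS called `name`?"""
    return [interval for interval, triples in GRAPHS.values()
            if ("OUCS", "name", name) in triples]
```

With only the strings “1957-1969” and “1969-” as keys, none of the three functions could be written without parsing the names, which is exactly the step a graph name (a URI) is not supposed to support.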

I have come up with two ideas on how to solve this problem: one uses reification and relaxes the notion of entailment, the other uses named graphs.



RDF and the Time Dimension – Part 1

November 28, 2008

In RDF – an Introduction I claimed that introducing any kind of continuous dimension (for example, a time dimension) is not possible, if you follow the official interpretation given in the RDF specifications. Actually it is even worse: In basic RDF even discrete dimensions cannot be modeled.

In this post I will elaborate on my claims, giving a detailed description of the problem. In part 2 I will propose a new interpretation of RDF graphs that allows for dimensions in RDF. If you are new to RDF, or terms such as reification, entailment, fact or model don’t mean much to you, you might want to read my introduction to RDF, since we need these terms to talk about RDF’s inability to model dimensions. I will try to present everything in a semi-formal way, using some mathematical notation, while keeping the post understandable for those who would not describe themselves as “math people”. However, I feel that a certain amount of formality is necessary to outline the problem and the proposed solution.

Continuous and Discrete Dimensions

Let’s start by giving you an idea of what I mean by continuous and discrete dimensions in RDF. Think of a dimension as a variable d that can take values from a specified set (e.g. 1 and 2). You now define your triples (or facts) relative to d. This means that for d = 1 you have a different set of facts than for d = 2. Whether I speak of a continuous or a discrete dimension depends on the cardinality (number of elements) of the value set for d. If the value set contains an infinite number of elements, I speak of a continuous dimension; if the number of elements is finite, I speak of a discrete dimension. Since in our example the cardinality of the value set is 2 (|{1,2}| = 2), we have a discrete dimension.
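As a rough illustration of this definition (a sketch only, not RDF itself), the following Python snippet models a dimension as a mapping from a value of d to a set of facts. The triples and the names `DISCRETE_FACTS`, `facts_discrete` and `facts_continuous` are made up for this example, and the cut-off at 1.5 is an arbitrary assumption.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]

# Discrete dimension: the value set {1, 2} is finite, so the facts for every
# value of d can simply be written down in a table.
DISCRETE_FACTS: Dict[int, FrozenSet[Triple]] = {
    1: frozenset({("ex:thing", "ex:label", "first")}),
    2: frozenset({("ex:thing", "ex:label", "second")}),
}


def facts_discrete(d: int) -> FrozenSet[Triple]:
    """Look up the set of facts that holds for a value d of the discrete dimension."""
    return DISCRETE_FACTS[d]


def facts_continuous(d: float) -> FrozenSet[Triple]:
    """Facts for a continuous dimension whose value set is the real numbers.

    An infinite value set cannot be enumerated in a table, so the facts have
    to be given by a rule over d (here: an arbitrary threshold at 1.5).
    """
    label = "first" if d < 1.5 else "second"
    return frozenset({("ex:thing", "ex:label", label)})
```

The table has exactly |{1,2}| = 2 entries, one per value of d, whereas the continuous case needs a rule that covers infinitely many values at once.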


RDF – an Introduction

November 26, 2008

After deciding to implement the new OxPoints system with Semantic Web technologies (see OxPoints and the Semantic Web) I started to read up on all I could find on RDF (Resource Description Framework) and related technologies like RDFS and OWL. In particular I was looking for

  • specifications,
  • best practices and
  • reports on projects using RDF.

I was astonished to find that, even though many people talk about RDF, it seems that only very few have actually ever used it (i.e. outside academic studies). Or if they have, they at least did not tell anyone about it.
However, one thing that I definitely did not expect to find was what seems to be a fundamental design flaw in RDF. I have thought about this a lot, and I hope that by blogging about it, either you will tell me that I am wrong and how to do it right, or we might find a way to solve the problem.

But before talking about what I think is wrong with RDF and proposing one way to solve that problem (yes, luckily I think there is a solution), we need to establish a common language, which is what I want to achieve with this introduction. If you are already familiar with RDF, you might want to have a look at the sections Triples are Facts, Reification and Entailment. If you are new to RDF, I hope that this will give you a first start. However, I have kept this introduction very short, so many aspects are missing. If you want to learn more about RDF, I would recommend starting with the RDF Primer, the introduction to RDF from the W3C. In most sections I have also linked to the relevant sections of the RDF specifications.

I will try to assume as little previous knowledge as possible, but since RDF is not a trivial topic, I have to start somewhere. Basic knowledge of XML and some knowledge of mathematical notation would therefore probably be of help.

RDF (Resource Description Framework)

The Resource Description Framework (RDF for short) is a set of W3C specifications which were first published in 1999 and revised in 2004 (more information on the history of RDF can be found on its Wikipedia page or on the W3C pages on RDF). RDF is “a language for representing information about resources in the World Wide Web” (RDF Primer [http://www.w3.org/TR/REC-rdf-syntax/]).

So what are resources in the World Wide Web?
