RDF and the Time Dimension – Part 1

In RDF – an Introduction I claimed that introducing any kind of continuous dimension (for example, a time dimension) is not possible if you follow the official interpretation given in the RDF specifications. Actually, it is even worse: in basic RDF, even discrete dimensions cannot be modeled.

In this post I will elaborate on my claims and give a detailed description of the problem. In part 2 I will propose a new interpretation of RDF Graphs that allows dimensions to be introduced into RDF. If you are new to RDF, or terms such as reification, entailment, fact or model don’t mean much to you, you might want to read my introduction to RDF, since we need these terms to talk about RDF’s inability to model dimensions. I will present everything in a semi-formal way, using some mathematical notation, but I will try to keep the post understandable for those who would not describe themselves as “math people”. However, I feel that a certain amount of formality is necessary to outline the problem and the proposed solution.

Continuous and Discrete Dimensions

Let’s start with what I mean by continuous and discrete dimensions in RDF. Think of a dimension as a variable d that can take values from a specified set (e.g. {1, 2}). You now define your triples (or facts) relative to d. This means that for d = 1 you have a different set of facts than for d = 2. Whether I speak of a continuous or a discrete dimension depends on the cardinality (number of elements) of the value set for d. If the value set contains an infinite number of elements, I speak of a continuous dimension; if the number of elements is finite, I speak of a discrete dimension. Since in our example the cardinality of the value set is 2 (|{1,2}| = 2), we have a discrete dimension.

Here is an example of a discrete dimension. In my introduction to RDF I asked how you would model the following in RDF:

August is a summer month if you are in the northern hemisphere. If you are in the southern hemisphere it is a winter month.

Let’s think of this in terms of dimensions. We could say that we have a dimension d that can take one of two values: “northern hemisphere” or “southern hemisphere”. For d = “northern hemisphere”, we define the fact: “August is a summer month” and for d = “southern hemisphere” we define the fact: “August is a winter month”.
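As a rough illustration (plain Python, not RDF itself), a discrete dimension can be pictured as a mapping from each dimension value to its own set of triples. The names `facts_by_hemisphere` and `facts_for`, and all `ex:` identifiers, are made up for this sketch.

```python
# A triple is modelled as a plain (subject, predicate, object) tuple.
# All identifiers (ex:August, ex:isA, ...) are hypothetical.
facts_by_hemisphere = {
    "northern hemisphere": {("ex:August", "ex:isA", "ex:SummerMonth")},
    "southern hemisphere": {("ex:August", "ex:isA", "ex:WinterMonth")},
}


def facts_for(d):
    """Return the set of facts asserted for dimension value d."""
    return facts_by_hemisphere[d]


# The value set is finite, so by the definition above the dimension is discrete.
print(len(facts_by_hemisphere))  # 2
print(facts_for("northern hemisphere"))
```

The two fact sets never have to be true at the same time, because a lookup always happens relative to one value of d.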

So what would be an example of a continuous dimension? The easiest one I can think of (and probably the most important) is time. Here our variable d can take any moment in time as its value, which allows us to specify facts for any moment in time and thereby to model change. With this we would, for example, be able to model that from 1990 to 1999 person A was married to person B, then got divorced, and that from 2001 until today person A has been married to person C. For all moments in time between 1990 and 1999 (an infinite number) we would assert the fact that person A is married to person B, and for those moments between 2001 and today we would assert the fact that person A is married to person C.
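The marriage example can be sketched in plain Python as facts with validity intervals over a real-valued time axis. This is only an illustration of the concept, not a proposal for how RDF should encode it. The half-open interval bounds, the use of fractional years, and the names `timed_facts` and `facts_at` are assumptions of this sketch.

```python
# Each entry is (start, end, triple). Time is a float number of years.
# Assumption: intervals are half-open [start, end); end=None means "until today".
timed_facts = [
    (1990.0, 1999.0, ("ex:A", "ex:marriedTo", "ex:B")),
    (2001.0, None, ("ex:A", "ex:marriedTo", "ex:C")),
]


def facts_at(t):
    """Return the set of facts that hold at moment t."""
    return {
        triple
        for start, end, triple in timed_facts
        if start <= t and (end is None or t < end)
    }


print(facts_at(1995.5))  # {('ex:A', 'ex:marriedTo', 'ex:B')}
print(facts_at(2000.0))  # set()
```

Because `t` can be any real number, there are infinitely many dimension values, yet only two interval entries are stored.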

RDF does not Support the Concept of Dimensions

Now, let’s get back to my claim that RDF does not support the concept of dimensions and give an idea of how a formal proof could be constructed.

Suppose RDF did support the concept of dimensions. In this case an RDF Graph would allow us to store semantically contradictory triples and provide some means of distinguishing between different contexts (dimension values). If we now remove from the graph all the triples that distinguish between the different contexts, thereby creating a proper subgraph, the following holds: a model satisfying the original graph does not satisfy the subgraph. Since the model has to distinguish between the different contexts for the original graph, it cannot make all the contradictory facts true simultaneously, which is what it would need to do to satisfy the constructed subgraph. This, however, violates the so-called subgraph lemma (see the section Entailment in my RDF Introduction), which states: “A graph entails all its subgraphs.” Therefore the assumption cannot be true, and RDF cannot support the concept of dimensions.

Let’s follow the argument using our example of a discrete dimension:

August is a summer month if you are in the northern hemisphere. If you are in the southern hemisphere it is a winter month.

An RDF Graph describing the above situation would necessarily contain the triples “August is a summer month” and “August is a winter month”. Additional triples would be used to distinguish between the different situations (being in the northern or the southern hemisphere). If we now constructed a subgraph by removing those additional triples, a model for the original graph would not satisfy the constructed subgraph, since in the subgraph August must be both a winter and a summer month. This, however, would violate the subgraph lemma, and therefore no RDF Graph can exist that describes the above example.
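The role of the subgraph lemma in this argument can be pictured with a deliberately simplified Python sketch. It treats a graph as a set of ground triples and a "world" as the set of triples that are true in it. This ignores blank nodes and the rest of RDF's model theory, so it is a toy, not the official semantics. The context triple and all names are hypothetical.

```python
from itertools import combinations


def satisfies(world, graph):
    """Toy satisfaction: every triple of the graph is true in the world."""
    return graph <= world


def subgraphs(graph):
    """Yield every subset of the graph's triples."""
    triples = list(graph)
    for size in range(len(triples) + 1):
        for subset in combinations(triples, size):
            yield frozenset(subset)


summer = ("ex:August", "ex:isA", "ex:SummerMonth")
winter = ("ex:August", "ex:isA", "ex:WinterMonth")
# Hypothetical triple meant to separate the two contexts.
context = ("ex:ctx1", "ex:appliesIn", "ex:NorthernHemisphere")

graph = frozenset({summer, winter, context})
world = set(graph)  # a world satisfying the graph makes all its triples true

# In this toy setting the same world satisfies every subgraph, including
# the one that keeps only the two "contradictory" triples.
print(all(satisfies(world, sub) for sub in subgraphs(graph)))  # True
print(satisfies(world, frozenset({summer, winter})))  # True
```

Whatever the context triple is meant to express, dropping it never makes the remaining triples less true, which is exactly why the contexts cannot keep the two statements apart.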

Why are Dimensions Important?

As we have seen, RDF does not support the concept of dimensions. But why should we care? Are there any good reasons why we should want to model dimensions in RDF? I believe the answer is yes: There are many good reasons, and we should try to incorporate dimensions in RDF.

Take FOAF as an example. FOAF stands for Friend of a Friend and is a widely used ontology for describing people. You could, for example, use FOAF and RDF to say that there is a person called John Doe and that he is part of the S-A-M-P-L-E project. But what happens if John leaves the project? FOAF and RDF only allow us to say that John is part of the project, to say that John is not part of the project, or to make no assertion at all about whether John is part of the project.

An odd thing, isn’t it? People change, situations change, the web changes. RDF is static. There is no way of properly modeling change in RDF. Even our very simple example, about August being a summer or a winter month depending on the point of view, cannot be expressed in RDF. You can say August is a summer month. You can say August is a winter month. You can even say August is both a summer and a winter month. However, you cannot say that it depends on your location whether August is a summer or a winter month.

In OxPoints we need, for example, to be able to say that a college has changed its name over time. Oxford University has existed for almost 800 years and things have changed. To be able to model Oxford University in RDF, we need RDF to support a time dimension.

An introduction to OxPoints can be found at https://oxforderewhon.wordpress.com/2008/11/18/oxpoints-providing-geodata-for-the-university-of-oxford/.

Named Graphs – The Solution for Discrete Dimensions

I admit, I exaggerated a bit. Things are not quite as bad as I outlined. There is an extension to RDF that allows you to model discrete dimensions: Named Graphs.

The idea behind named graphs is rather simple. Instead of one RDF Graph, you create multiple graphs. This allows you to make assertions about the RDF Graphs themselves, and since you can have multiple graphs you can easily implement our example: create one graph for the northern hemisphere and one graph for the southern hemisphere, and you’re done.

Even though named graphs are not directly a part of the RDF specifications, many RDF tools support the idea in one form or another. Sesame [http://www.openrdf.org/] and Jena [http://jena.sourceforge.net/], for example, two RDF triple stores written in Java, allow you to specify a context for each triple. These contexts are then used to group triples together (thereby creating a named graph). This concept of assigning a context to a triple is often referred to as quads: subject predicate object context.
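The quad idea can be sketched in a few lines of plain Python. This does not use the Sesame or Jena APIs. It only shows how a fourth "context" element groups triples into named graphs. The function `group_by_context` and all `ex:` identifiers are assumptions made for this illustration.

```python
from collections import defaultdict

# A quad is (subject, predicate, object, context); the context names the graph.
quads = [
    ("ex:August", "ex:isA", "ex:SummerMonth", "ex:northernHemisphere"),
    ("ex:August", "ex:isA", "ex:WinterMonth", "ex:southernHemisphere"),
]


def group_by_context(quads):
    """Group the triples of each quad under the graph named by its context."""
    named_graphs = defaultdict(set)
    for s, p, o, context in quads:
        named_graphs[context].add((s, p, o))
    return dict(named_graphs)


named_graphs = group_by_context(quads)
print(named_graphs["ex:southernHemisphere"])
# {('ex:August', 'ex:isA', 'ex:WinterMonth')}
```

Each named graph is consistent on its own, so the two statements about August never have to hold in the same graph.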

For more information on named graphs see http://www.w3.org/2004/03/trix/.

Named Graphs – No Solution for Continuous Dimensions

So if there is a solution, what exactly is my problem?

The answer is easy: named graphs do not work for continuous dimensions. Let’s take time as an example. Should we create one graph per year, one graph per day or one graph per second? It is easy to see that you end up creating hundreds or thousands of graphs, not really capturing the idea of time but only modelling a discrete subset of it. Let’s suppose you went for one graph per year. Not only would you be unable to say that something is valid for only a couple of months; if you then wanted to include a new dimension in your data (let’s use our simple northern/southern hemisphere example), you would end up doubling the number of graphs you have to maintain. And all this because of one very simple dimension.

Apart from not really being able to model continuous dimensions, named graphs would evidently not scale well even if you reduced your dimension to a discrete subset.
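The growth in the number of graphs is simple arithmetic: with one named graph per combination of dimension values, the count is the product of the sizes of the discretised value sets. A small Python sketch, assuming a 20-year span chosen purely for illustration:

```python
from itertools import product

# Assumption for illustration: 20 years, discretised to one graph per year.
years = range(1990, 2010)
hemispheres = ["northern hemisphere", "southern hemisphere"]

# One named graph per combination of dimension values.
graphs_time_only = [(year,) for year in years]
graphs_both = list(product(years, hemispheres))

print(len(graphs_time_only))  # 20
print(len(graphs_both))  # 40
```

Switching from one graph per year to one per day, or adding a dimension with more than two values, multiplies the count further in the same way.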

Conclusion

We have seen that basic RDF is not able to support the outlined concept of dimensions and that the named graph extension is only able to support discrete dimensions. I hope I have convinced you that support for continuous dimensions in RDF would be a very helpful extension, since it would allow us to use RDF to model change.

I believe that we can solve this problem by slightly changing the interpretation of RDF Graphs. I will outline my ideas in Part 2 of this post.

If you have also faced (or solved) this problem, I’d be delighted to hear from you.

5 Responses to RDF and the Time Dimension – Part 1

  1. […] will elaborate on this in my next blogpost (RDF and the Time Dimension). Until then, as a thinking exercise, try to encode the following into RDF: “August is a […]

  2. […] and the Time Dimension – Part 2 In part 1 I claimed that you will run into problems when you try to model dimensional data in RDF: In basic […]

  3. […] RDF and the Time Dimension – Part 1 […]

  4. […] RDF and the Time Dimension Part 1 — in this post the author explains succinctly where the problem lies although the example used is flawed because it contains hidden context (i.e. “August is a summer month…” is not true in general and needs the context “…for those in the Northern Hemisphere”, which can be modelled in RDF). The post also settles on named graphs as a solution but claims they cannot be used for continuous dimensions such as time (missing the solution of using something like OWL-Time to represent intervals and relative timings). […]

