RDF – an Introduction

After deciding to implement the new OxPoints system with Semantic Web technologies (see OxPoints and the Semantic Web) I started to read up on all I could find on RDF (Resource Description Framework) and related technologies like RDFS and OWL. In particular I was looking for

  • specifications,
  • best practices and
  • reports on projects using RDF.

I was astonished to find that, even though many people talk about RDF, very few seem to have actually used it outside academic studies. Or if they have, they at least did not tell anyone about it.
However, one thing I definitely did not expect to find was what seems to be a fundamental design flaw in RDF. I thought about this a lot, and I hope that by blogging about it, either you will tell me that I am wrong and how to do it right, or we will find a way to solve the problem together.

But before talking about what I think is wrong with RDF and proposing one way to solve that problem (yes, luckily I think there is a solution), we need to establish a common language, which is what I want to achieve with this introduction. If you are already familiar with RDF, you might want to skip ahead to the sections Triples are Facts, Reification and Entailment. If you are new to RDF, I hope that this will give you a first start. However, I kept this introduction very short, so many aspects are missing. If you want to learn more about RDF, I recommend starting with the RDF Primer, the introduction to RDF from the W3C. In most sections I have also linked the relevant parts of the RDF specifications.

I will try to assume as little previous knowledge as possible, but since RDF is not a trivial topic, I have to start somewhere. Basic knowledge of XML and some knowledge of mathematical notation would therefore probably be of help.

RDF (Resource Description Framework)

The Resource Description Framework (RDF for short) is a set of W3C specifications which were first published in 1999 and revised in 2004 (more information on the history of RDF can be found on its Wikipedia page or on the W3C pages on RDF). RDF is “a language for representing information about resources in the World Wide Web” (RDF Primer [http://www.w3.org/TR/REC-rdf-syntax/]).

So what are resources in the World Wide Web?

In RDF a resource is anything that can be uniquely identified by a URI (Uniform Resource Identifier). This can be a blogpost (this blogpost can be identified by the URI: https://oxforderewhon.wordpress.com/2008/11/25/rdf-an-introduction/), a person (I could for example define that the URI http://arno-mittelbach.de/contact#me should be used to identify myself) or even an abstract concept like being a parent of somebody (which could be described, using the FOAF relationship extension, as http://purl.org/vocab/relationship/parentOf). If I have already lost you by using abbreviations like FOAF (which stands for “Friend of a Friend” and is one of many RDF vocabularies), don’t worry. The point here is that a resource can be more or less anything (we just have to make up a URI for it). Once we have our resources, RDF claims to give us a framework with which we are able to represent (any) information about them.

Representing Information

Now, let’s start off by storing some information on this blogpost in RDF:

1) using Turtle (I abbreviated the URIs to fit them in one line)

<http[..]introduction> dc:creator <http[..]contact#me> .

2) using an RDF Graph

[Figure: RDF Graph]

3) using RDF/XML

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="https://oxforderewhon.wordpress.com/2008/11/25/rdf-an-introduction/">
      <dc:creator rdf:resource="http://arno-mittelbach.de/contact#me"/>
   </rdf:Description>
</rdf:RDF>

These are three examples of encoding the same information (that the resource identified by “http[..]introduction” has a creator identified by “http[..]contact#me”) using three different syntaxes: the first uses the so-called Turtle syntax, the second an RDF Graph and the third RDF/XML.

What we can already see is that RDF can come in many different forms, but that the way in which information is encoded stays the same. RDF is a set of triples, each consisting of a subject, a predicate and an object. In the example the only triple consists of the subject “https://oxforderewhon.wordpress.com/2008/11/25/rdf-an-introduction/”, the predicate “dc:creator” and the object “http://arno-mittelbach.de/contact#me”. Subject, predicate and object are always resources, with one exception: objects can also be atomic values (called literals). An example of a string literal would be a person’s name (e.g. “Arno Mittelbach”).
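To make the triple structure a bit more tangible, here is a minimal Python sketch that models an RDF Graph as a plain set of (subject, predicate, object) tuples. The `Literal` class, the `objects` helper and the second triple (using foaf:name) are my own assumptions for illustration and do not come from any RDF library; URIs are simply kept as strings.

```python
from dataclasses import dataclass


# Illustrative only: wraps an atomic value so it can be told apart from a URI.
@dataclass(frozen=True)
class Literal:
    value: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
POST = "https://oxforderewhon.wordpress.com/2008/11/25/rdf-an-introduction/"
ME = "http://arno-mittelbach.de/contact#me"

# An RDF graph is a set of triples (subject, predicate, object).
graph = {
    (POST, DC_CREATOR, ME),                        # the triple from the example
    (ME, FOAF_NAME, Literal("Arno Mittelbach")),   # assumed extra triple with a literal object
}


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


creators = objects(graph, POST, DC_CREATOR)
names = objects(graph, ME, FOAF_NAME)
```

Whatever syntax is used to write the triples down (Turtle, a drawing or RDF/XML), a structure like this set is what they all boil down to.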

More information on basic RDF concepts can be found in section 2 of the RDF Primer [http://www.w3.org/TR/REC-rdf-syntax/#statements].

Triples are Facts

I will henceforth speak of RDF Graphs (being one representation of a set of RDF triples).

In an RDF Graph each triple makes an assertion about a resource, thereby stating that in any model that satisfies this RDF Graph the subject-predicate-object relationship defined by this triple must be true. This might sound a bit strange at first, but have a look at this example: if our RDF Graph only talks about ants, then a world (or model) in which only ants exist might satisfy all the assertions specified in the RDF Graph.

Or, to take an example from the world of maths: one model for $\forall x \exists y : y > x$ would be the set of all natural numbers $\mathbb{N}$, since for each $x\in\mathbb{N}$ there exists a $y\in\mathbb{N}$ such that $y > x$. Another one would be the set of all real numbers $\mathbb{R}$. It’s the same with RDF. With each triple you add to your RDF Graph, you put another restriction on the models that satisfy the Graph. In most cases you will find that many different models satisfy your Graph. We will use this concept of models in later sections, so don’t worry if at this stage it sounds a bit abstract.
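The idea that every additional triple narrows down the set of satisfying models can be sketched in a few lines of Python. This is a deliberate simplification and my own assumption: a "world" is just the set of facts that hold in it, whereas the RDF Semantics document defines interpretations that map URIs to things. All ex: names below are made up.

```python
def satisfies(world, graph):
    """A world satisfies a graph if every triple of the graph holds in that world."""
    return graph <= world


# Three hypothetical worlds, each given as the set of facts that are true in it.
ant_world = {("ex:anna", "rdf:type", "ex:Ant"), ("ex:bert", "rdf:type", "ex:Ant")}
mixed_world = ant_world | {("ex:carl", "rdf:type", "ex:Bee")}
empty_world = set()
worlds = [ant_world, mixed_world, empty_world]

# A graph that only talks about one ant is satisfied by two of the worlds.
graph = {("ex:anna", "rdf:type", "ex:Ant")}
models_before = [w for w in worlds if satisfies(w, graph)]

# Adding a triple is an additional restriction, so fewer worlds remain.
bigger_graph = graph | {("ex:carl", "rdf:type", "ex:Bee")}
models_after = [w for w in worlds if satisfies(w, bigger_graph)]
```

In this sketch `models_before` contains the ant-only world and the mixed world, while `models_after` contains only the mixed world.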

More information on the interpretation of RDF Graphs can be found in section 1.3 in the RDF Semantics document [http://www.w3.org/TR/rdf-mt/#interp].

Blank Nodes

One interesting concept in RDF is the concept of blank nodes which can be used to encode n-ary relationships (with n > 2).

Let’s say we want to represent Sam Sample’s address (Sample Street 1, Sampleville, UK). With simple triples, we are only able to store binary relations, which leaves us with something like this:

[Figure: Using a string literal to encode the address]

But encoding the address as one string literal loses the information that it actually consists of several parts (in this case: street, city and country). We could therefore create a new resource for Sam’s address (e.g. ex:contactaddresses#Sam) and encode it like this:

[Figure: Using a named resource]

This is a perfectly fine solution and all the information is there. The only problem is that we had to make up a new URI for Sam’s address, and in many cases you might not want to do that (be it because you do not want to invent yet another URI, because you do not want to clutter your namespace, or for any other reason). The solution comes to us in the form of blank nodes, which are really just the same thing, but relieve us from having to come up with a URI for the “middle node”:

[Figure: Using a blank node]

In the Graph representation the central node is normally left blank, since some form of identifier is only needed in the textual syntaxes. W3C’s RDF Validator (which I used to create the above images) nevertheless puts in an automatically generated dummy id (genid:A14041).
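The three variants can also be written down as sets of triples. The following Python sketch is only an illustration: the property names (ex:hasAddress, ex:street, ex:city, ex:country) and the blank node label `_:addr` are my assumptions and not necessarily the ones used in the figures, and literals are simply written as quoted strings.

```python
SAM = "ex:contact#Sam"

# 1) The address as a single string literal: its parts cannot be addressed individually.
as_literal = {(SAM, "ex:hasAddress", '"Sample Street 1, Sampleville, UK"')}


def address_triples(person, node):
    """Encode the address via a middle node that carries street, city and country."""
    return {
        (person, "ex:hasAddress", node),
        (node, "ex:street", '"Sample Street 1"'),
        (node, "ex:city", '"Sampleville"'),
        (node, "ex:country", '"UK"'),
    }


# 2) Middle node is a named resource, 3) middle node is a blank node.
named = address_triples(SAM, "ex:contactaddresses#Sam")
blank = address_triples(SAM, "_:addr")


def rename(graph, old, new):
    """Replace every occurrence of the term `old` by `new`."""
    return {tuple(new if term == old else term for term in triple) for triple in graph}


# Apart from the name of the middle node, both encodings carry the same triples.
same_shape = rename(blank, "_:addr", "ex:contactaddresses#Sam") == named
```

The last line is the point of the exercise: the blank node version differs from the named version only in that nobody had to invent a URI for the middle node.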

More examples and information on blank nodes can be found in section 2.3 of the RDF Primer [http://www.w3.org/TR/REC-rdf-syntax/#structuredproperties].

Reification (Statements about Statements)

In the following examples I will use the Turtle encoding, since reification does not really go well with the Graph representation of RDF. Even though I haven’t really introduced that form of encoding, I am pretty sure that you are able to follow the example.

Let’s say we want to encode that John’s name is in fact John Doe.

ex:contact#John ex:hasName "John Doe" .

But what if we wanted to say that Sam said that John’s name is John Doe? The solution is to reify the statement. This means that we create a new resource (in the example a blank node called _:reif1, _: being the syntax for encoding blank nodes in Turtle) that represents the above triple:

_:reif1 rdf:type rdf:Statement .
_:reif1 rdf:subject ex:contact#John .
_:reif1 rdf:predicate ex:hasName .
_:reif1 rdf:object "John Doe" .

There are several concepts in this example to explain. The first triple (_:reif1 rdf:type rdf:Statement) uses a predefined property from the RDF vocabulary to state that the resource identified by _:reif1 is an instance of the class rdf:Statement. I will not talk about classes and concepts in RDF at this stage. If you are familiar with object-oriented programming you can think of it as _:reif1 being an instance of the class rdf:Statement, but I think a better way of thinking about types might be to say that _:reif1 conforms to the concept called rdf:Statement, which is defined somewhere in the rdf namespace.
The next three lines should look somewhat familiar. We said that an RDF triple always consists of a subject, a predicate and an object. These three triples simply say that our statement (identified by _:reif1) has the subject ex:contact#John, the predicate ex:hasName and the object “John Doe”, thereby identifying our initial triple.

Now that we have a resource identifying our triple we are able to say:

ex:contact#Sam ex:hasSaid _:reif1 .
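Since reification is purely mechanical, it can be sketched as a small Python function. The function name `reify` and the representation of triples as tuples of strings are my own assumptions for illustration; only the four rdf: terms come from the RDF vocabulary.

```python
def reify(triple, node):
    """Return the four triples that describe `triple` as an rdf:Statement named `node`."""
    subject, predicate, obj = triple
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    }


statement = ("ex:contact#John", "ex:hasName", '"John Doe"')

# Describe the statement with a blank node, then say something about that node.
graph = reify(statement, "_:reif1")
graph.add(("ex:contact#Sam", "ex:hasSaid", "_:reif1"))
```

Note that `graph` now holds five triples, none of which is the original statement itself.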

Let’s go back to thinking about models for our RDF Graphs (btw: in RDF terms a Turtle representation of RDF is also an RDF Graph. Confusing, isn’t it?). What exactly does it mean to have reified statements?

Imagine we have the following graph, consisting of only the reified statement, but not the triple as such:

_:reif1 rdf:type rdf:Statement .
_:reif1 rdf:subject ex:contact#John .
_:reif1 rdf:predicate ex:hasName .
_:reif1 rdf:object "John Doe" .
ex:contact#Sam ex:hasSaid _:reif1 .

This means that a world (or model) in which a resource identified by ex:contact#Sam said that the resource ex:contact#John has the name “John Doe” would satisfy our RDF Graph, even if ex:contact#John’s name is in fact “John Miller”. Just having a reified statement therefore does not make any assertion about the underlying statement. Whether that statement is true or not in our model is irrelevant.

Let’s consider the following graph:

ex:contact#John ex:hasName "John Doe" .
_:reif1 rdf:type rdf:Statement .
_:reif1 rdf:subject ex:contact#John .
_:reif1 rdf:predicate ex:hasName .
_:reif1 rdf:object "John Doe" .
ex:contact#Sam ex:hasSaid _:reif1 .

In this case, a world (or model) that satisfies the graph needs to have the resources ex:contact#John and ex:contact#Sam, ex:contact#John’s name must be “John Doe” and ex:contact#Sam must have said so.
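The difference between the two graphs can be checked mechanically. The following Python sketch collects the reified statements in a graph and reports, for each of them, whether the described triple is also asserted. It is a purely syntactic check under my own simplified representation (triples as tuples of strings); the helper names are made up.

```python
def reified_statements(graph):
    """Map each rdf:Statement node to the (subject, predicate, object) it describes."""
    parts = {}
    for s, p, o in graph:
        if p in ("rdf:subject", "rdf:predicate", "rdf:object"):
            parts.setdefault(s, {})[p] = o
    return {
        node: (d["rdf:subject"], d["rdf:predicate"], d["rdf:object"])
        for node, d in parts.items()
        if (node, "rdf:type", "rdf:Statement") in graph and len(d) == 3
    }


def asserted(graph):
    """For every reified statement, tell whether the triple itself is in the graph."""
    return {node: triple in graph for node, triple in reified_statements(graph).items()}


quoted_only = {
    ("_:reif1", "rdf:type", "rdf:Statement"),
    ("_:reif1", "rdf:subject", "ex:contact#John"),
    ("_:reif1", "rdf:predicate", "ex:hasName"),
    ("_:reif1", "rdf:object", '"John Doe"'),
    ("ex:contact#Sam", "ex:hasSaid", "_:reif1"),
}
quoted_and_asserted = quoted_only | {("ex:contact#John", "ex:hasName", '"John Doe"')}

first = asserted(quoted_only)
second = asserted(quoted_and_asserted)
```

For the first graph the check reports that the statement is only talked about; for the second it reports that the statement is also part of the graph.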

For further information on reification look at section 3.3.1 of RDF Semantics [http://www.w3.org/TR/rdf-mt/#Reif].

More information on classes and types can be found in the RDF Primer [http://www.w3.org/TR/rdf-primer/] or in the RDF Schema description [http://www.w3.org/TR/rdf-schema/].

Entailment

There is one last concept we need before being able to tackle “RDF’s fundamental problem” (or what I claim to be RDF’s fundamental problem): the concept of entailment.

Entailment is a term used in logic, describing a relation between two sets of sentences of a formal language: if $A \models B$ ($A$ entails $B$), then every interpretation that makes every sentence $s \in A$ true will also make every sentence $t \in B$ true.

So, what exactly is that supposed to mean, and what does it have to do with RDF?

Let’s substitute the sets of sentences of a formal language with RDF Graphs and the interpretation with a model. Now, this should all sound much more familiar. Let’s say we have two RDF Graphs A and B; then we say A entails B if and only if all models that satisfy A also satisfy B.

Let’s say we have one set of restrictions saying all cars are green (A) and another set of restrictions saying all cars are green and all motorbikes are red (B), then each model that fulfills all the restrictions of B also fulfills the restrictions of A. We can therefore say $B \models A$ (that is, B entails A).

This brings us to the so-called Subgraph Lemma. A subgraph is simply a subset of triples. So if A and B are RDF Graphs then B is a subgraph of A if and only if each triple t in B is also a triple in A (or in other words: $\forall t\in B: t\in A$).

The Subgraph Lemma now says: “A graph entails all its subgraphs.”

The proof follows directly from the definition of entailment and subgraphs: If a model satisfies all restrictions (triples) of an RDF Graph, then it will also satisfy the RDF Graph where you have removed some of the restrictions (triples).
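For small examples the Subgraph Lemma can be verified by brute force. The Python sketch below again uses my simplified notion of a world as the set of facts that hold in it (an assumption, not the interpretation structure of the RDF Semantics document) and enumerates all worlds over a tiny, made-up universe of three facts.

```python
from itertools import chain, combinations


def powerset(items):
    """Yield every subset of `items` as a set."""
    items = list(items)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return (set(subset) for subset in subsets)


def satisfies(world, graph):
    """A world satisfies a graph if every triple of the graph holds in that world."""
    return graph <= world


def entails(a, b, universe):
    """a entails b if every world over `universe` that satisfies a also satisfies b."""
    return all(satisfies(w, b) for w in powerset(universe) if satisfies(w, a))


# Hypothetical facts about one car and two motorbikes.
t1 = ("ex:car1", "ex:colour", "ex:green")
t2 = ("ex:bike1", "ex:colour", "ex:red")
t3 = ("ex:bike2", "ex:colour", "ex:red")
universe = {t1, t2, t3}

big = {t1, t2}   # the graph
small = {t1}     # one of its subgraphs

big_entails_small = entails(big, small, universe)
small_entails_big = entails(small, big, universe)
lemma_holds = all(entails(big, sub, universe) for sub in powerset(big))
```

The graph entails its subgraph, but not the other way round: a world containing only the fact about the car satisfies `small` without satisfying `big`.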

More on entailment can be found in section 2 of the RDF Semantics [http://www.w3.org/TR/rdf-mt/#rdfs_entailment].

Conclusion

I have tried to give you a short introduction to RDF (which became much longer than I anticipated) and have only covered a fraction of the RDF specifications. I have left out RDF Schemas and Ontologies and have not said a word about vocabularies. I still hope that the introduction was understandable, that if you came here without any knowledge of RDF you now have some idea of what it is all about, and that if you already knew the basic concepts I could give you some new insights (especially on the topics of reification and entailment).

If you have read this post, we are now able to talk about what I think is one of the fundamental problems with RDF, and which we have to solve in order to use RDF for OxPoints. I claim that, if you follow the official interpretation given in the RDF specifications, it is not possible to introduce a time dimension (or any kind of continuous dimension) into RDF.

I will elaborate on this in my next blogpost (RDF and the Time Dimension). Until then, as a thinking exercise, try to encode the following into RDF: “August is a summer month if you are in the northern hemisphere. If you are in the southern hemisphere it is a winter month.”

As always, I am looking forward to questions, comments and of course, solutions.
