Wednesday, December 29, 2004

What's in a name - Part 2

How to give a resource a name, an URI? Let's look at this statement:

movie:Terminator dc:creator "James Cameron".

Happy with that? This is a valid RDF statement, and you understand what I wanted to say, and your RDF machine will be able to read and process it, too, so everything is fine.

Well, almost. movie:Terminator is a QName, and movie: is just a shorthand prefix, a namespace, that actually has to be defined as something. But as what? URIs are well-defined, so we shouldn't just define the namespace arbitratily. The problem is, someone else could do the same, and suddenly, one URI could denote two different resources - this is called URI collision, and it is the next worst thing to immanentizing the Eschaton. That's why you should grab some URI space for yourself and there you go, you may define as many URIs there as you like (remember, the U in URI means Universal, that's why they make such a fuss about the URI space and ownership of it).

I am the webmaster of http://semantic.nodix.net, and the URI belongs to me and with it, all the URIs starting with it. Thus I decide, that movie: shall be http://semantic.nodix.net/movie/. Our example statement thus is the same as:

http://semantic.nodix.net/movie/Terminator http://purl.org/dc/elements/1.1/creator "James Cameron".

So this is actually what the computer sees. The short hand notation above is just for humans. But if you're like me, and you see the above Subject, you're already annoyed that it is not a link, that you can't click on it. So you copy it into your browser address bar, and go to http://semantic.nodix.net/movie/Terminator. Ups. A 404, the website is not found. You start thinking, oh man, stupid! Why you giving the resource such a name that looks so much like an web address, and then point it to 404-Nirvana?

Many think so. That's because they don't grasp the difference between URIs and URLs, and to be honest, this difference is maybe the worst idea the W3C ever had (that's a hard-to-achieve compliment, considering the introduction of XML/RDF-serialisation and XSD). We will return to this difference, but for now, let's see what usually happens.

Because http://semantic.nodix.net/movie/Terminator leads to nowhere, and I'm far too lazy to make a website for the Terminator just for this example, we will take another URI for the movie. Jumping to IMdb we quickly find the appropriate one, and then we can reformulate our statement:

http://www.imdb.com/title/tt0088247/ http://purl.org/dc/elements/1.1/creator "James Cameron".

Great! Our subject is a valid URI, clicking on it (or pasting it to a browser) will tell you more about the subject, and we have a valid RDF statement. Everything is fine again...

...until next time, where we will discuss the minor problems of our solution.

Tuesday, December 28, 2004

What's in a name - Part 1

There are tons of mistakes that may occur when writing down RDF statements. I will post a six part series of blog entries, starting with this one, about what can go wrong in the course of naming resources, why it is wrong, and why you should care - if at all. I'll try to mix experience with pragmatics, usability with philosophy. And I surely hope that, if you disagree, you'll do so in the comments or in your own blog.

The first one is the easiest to spot. Here we go:

"Politeia" dc:creator "Plato".

If you don't know about the differences between Literals, QNames and URIs, please take a look at the RDF Primer. It's easy to read and absolutely essential. If you know about the differences, you already know that the above said actually isn't a valid RDF statement: you can't have a literal as the subject of a statement. So, let's change this:

philo:Politeia dc:creator "Plato".

What's the difference between these two? In the first one you say that "Plato" is the creator of "Politeia" (we take the semantics of dc:creator for granted for now). But in the second you say that "Plato" is the creator of philo:Politeia. That's like in Dragonheart, where Bowen tries to find a name for the dragon because he can't just call him 'dragon', and he decides on 'draco'. The dragon comments: "So, instead of calling me dragon in your own language, you decide to call me dragon in another language."

Yep, we decide to talk about Politeia in another language. Because RDF is another language. It tries to look like ours, it even has subjects, objects, predicates, but it is not the language of humans. It is (mostly) much easier, so easy in fact even computers can cope with it (and that's about the whole point of the Semantic Web in the first place, so you shouldn't be too surprised here).

"Politeia" has a well defined meaning: it is a literal (the quotation marks tell you that) and thus it is interpreted as a value. "Politeia" actually is just a word, a symbol, a sign pointing to the meant string Plato (a better example would be: "42" means the number 42. "101010binary", "Fourty-Two" or "2Ah" would have been perfectly valid other signs denoting the number 42).

And what about philo:Politeia? How is it different from "Politeia", what does this point to?

philo:Politeia is a Qualified Name (QName), and thus ultimatively a short-hand notation for an URI, an Unified Resource Identifier. In RDF, everything has to be a resource (well, remember, RDF stands for Resource Description Framework), but that's not really a constraint, as you may simply consider everything a resource. Even you and me. And URIs are names for resources. Universally (well, at least globally) unique names. Like philo:Politeia.

You may wonder about what your URI is, the one URI denoting you. Or what the URI of Plato is, or of the Politeia? How to choose good URIs, and what may go wrong? And what do URIs actually denote, and how? We'll discuss this all in the next five parts of this series, don't worry, just stay tuned.

Monday, December 27, 2004

Why we will win

People keep saying that the Semantic Web is just a hype. That we are just an unholy chimaera of undead AI researchers talking about problems solved by the database guys 15 years ago. And that our work will never make any impact in the so called real world out there.

As I stated before: I'm a believer. I'm even a catholic, so this means I'm pretty good at ignoring hard facts about reality in order to stick to my beliefs, but it is different in this case: I slowly start to comprehend why Semantic Web technology will prevail and make life better for everyone out there. It' simply the next step in the IT RevoEvolution.

Let's remember the history of computing. Shortly after the invention of the abacus the obvious next step, the computer mainframe, appeared. Whoever wanted to work with it, had to learn to use this one mainframe model (well, the very first ones were one-of-a-kind machines). Being able to use one didn't necessarily help you using the other.

First the costs for software development were negligible. But slowly this changed, and Fred Brooks wrote down his experience with creating the legendary System/360 in the Mythical Man-Month (a must-read for software engineers), showing how much has changed.

Change was about to come, and it did come twofold. Dennis Ritchie is to blame for both of them: together with Ken Thompson he made Unix, but in order to make that, he had to make a programming language to write Unix in, this was C, which he made together with Brian Kernighan (this account is overly simplified, look at the history of Unix for a better overview).

Things became much easier now. You could port programs in a simpler way than before, just recompile (and introduce a few hundred #IFDEFs). Still, the masses used the Commodore 64, the Amiga, the Atari ST. Buying a compatible model was more important than looking at the stats. It was the achievement of the hardware development for the PC and of Microsoft to unifiy the operating systems for home computers.

Then came the dawning of the age of World Wide Web. Suddenly the operating system became uninteresting, the browser you use was more important. Browser wars raged. And in parallel, Java emerged. Compile once, run everywhere. How cool was that? And after the browser wars ended, the W3Cs cries for standards became heard.

That's the world as it is now. Working at the AIFB, I see how no one cares what operating system the other has, be it Linux, Mac or Windows, as long as you have a running Java Virtual Machine, a Python interpreter, a Browser, a C++ compiler. Portability really isn't the problem anymore (like everything in this text, this is oversimplified).

But do you think, being OS independent is enough? Are you content with having your programs run everywhere? If so, fine. But you shouldn't be. You should ask for more. You also want to be independent of applications! Take back your data. Data wants to be free, not locked inside an application. After you have written your text in Word, you want to be able to work with it in your Latex typesetter. After getting contact information via a Bluetooth connection to your mobile phone, you want to be able to send an eMail to the contact from your web mail account.

There are two ways to achieve this: the one is with standard data formats. If everyone uses vCard-files for contact information, the data should flow freely, shouldn't it? OpenOffice can read Word files, so there we see interoperability of data, don't we?

Yes, we do. And if it works, fine. But more often than not it doesn't. You need to export and import data explicitly. Tedious, boring, error prone, unnerving. Standards don't happen that easily. Often enough interoperability is achieved with reverse engineering. That's not the way to go.

Using a common data model with well defined semantics and solving tons of interoperability questions (Charset, syntax, file transfer) and being able to declare semantic mappings with ontologies - just try to imagine that! Applications being aware of each other, speaking a common language - but without standard bodies discussing it for years, defining it statically, unmoving.

There is a common theme in the IT history towards more freedom. I don't mean free like in free speech, I mean free like in free will.

That's why we will win.

Sunday, December 19, 2004

I am weak

Basically I was working today, instead of doing some stuff I should have finished a week ago for some private activities.

The challenge I posed myself: how semantic can I already get? What tools can I already use? Firefox has some pretty neat extensions, like FOAFer, or the del.icio.us plugin. I'll see if I can work with them, if there's a real payoff. The coolest, somehow semantic plugin I installed is the SearchStatus. It shows me the PageRank and the Alexa rating of the visited site. I think that's really great. It gives me just the first glimpse of what metadata can do in helping being an informed user. The Link Toolbar should be absolutely necessary, but pitily it isn't, as not enough people make us of HTMLs link element the way it is supposed to be used.

Totally unsemantic is the mouse gestures plugin. Nevertheless, I loved those with Opera, and I'm happy to have them back.

Still, there are such neat things like a RDF editor and query engine. Installed it and now I want to see how to work with it... but actually I should go upstairs, clean my room, organise my bills and insurance and doing all this real life stuff...

What's the short message? Get Firefox today and discover its extensions!

Wednesday, December 15, 2004

Imagine there's a revolution...

... and no one is going to it.

This notion sometimes scares me when I think abou the semantic web. What if all this great ideas are just to complex to be implemented? What if it remains an ivory tower dream? But, on the other hand, how much pragmatism can we take without loosing the vision?

And then, again, I see the semantic web working already: it's del.icio.us, it's flickr, it's julie, and there's so much more to come. The big time of the semantic web is yet to come, and I think none of us can really imagine the impact it is going to have. But it will definitively be interesting!

Friday, December 03, 2004

AcceLogiChip

Accelerated logic chips - that would be neat.

The problem with all this OWL stuff is, that it is computationally expensive. Google beats you in speed easily, having some 60.000 PCs or so, but indexing some 8 billion web pages, each with maybe a thousand words. And if you ever tried Googles Desktop Search, you will see they can perform this miracles right on your PC too! (Never mind that there are a dozen tools doing exactly the same stuff Googles Desktop Search does, just better - but hey, they lack the name!)

What does the Semantic Web achieve? Well, ever tried to run a logic inferencing engine with a few million instances? With a highly axiomatized TBox of, let's say, just a few thousand terms? No? You really should.

Sure, our PCs do get faster all the time (thanks to Moores Law!), but is that fast enough? We want to see the Semantic Web up and running not in a few more iterations of Moores Law, but much, much earlier. Why not use the same trick graphic magicians did? Highly specialized accelerated logic chips, things that can do your tableu reasoning in just a fraction of the time needed with your bloated all-purpose-CPU.

Thursday, December 02, 2004

World Wide Prolog

Today I had the idea - maybe this whole Semantic Web idea is nothing else than a big worldwide Prolog program. It's the AI researchers trying to enter the real world through the W3Cs backdoor...

No, really, think about it: almost all most people do with OWL is actually some logic programing. Declaring subsumptions, predicates, conjunctions, testing for entailment, getting answers out of this - but on a world wide scale. And your browser does the inferencing for you (or maybe the server? Depends on your architecture).

They are still a lot of questions open (and the actual semantic differences between Description Logics, and Logic Programming surely ain't the smalles ones of them), like how to infere anything with contradicting data (something that surely will happen in the World Wide Semantic Web), how to treat dynamics (I'm not sure how to do that without reification in RDF), and much more. Looking forward to see this issues resolved...