Last month, when we released the CrunchBase API, Benjamin Nowack came to our attention by developing Semantic CrunchBase, an RDF/SPARQL interface to CrunchBase. Since then he has remained an active user of the CrunchBase API, and last week he released a Twitter bot that responds to commands with CrunchBase info.
Nowack runs a small web agency that focuses on combining mainstream website creation with Semantic Web technologies. In addition, he works as a contractor for early adopters in that area and maintains an open source RDF toolkit for LAMP environments. Through his efforts he hopes to help the SemWeb agency market get off the ground.
For us, the Semantic Web is terra incognita. Eager to find out more about it, we contacted Nowack and asked him a few questions about Semantic CrunchBase and the Semantic Web.
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up, quickly releasing Semantic CrunchBase. Can you explain what Semantic CrunchBase is and what inspired you to create it?
Nowack: The graph-shaped CrunchBase data is ideal for showing that there is more (or rather *less*) to the Semantic Web than “AI on the Internet”. One of its core benefits is simplified data repurposing, plus the ability to extend applications at run-time. For Semantic CrunchBase, I’ve created machine-readable descriptions of all CrunchBase items, and also machine-readable links between related items (this process could be fully automated, thanks to the nice design of your API). Once we move from a website of linked *pages* to a graph of linked *data objects* (and crunchbase.com is already pretty close), lots of new possibilities arise. Semantic CB lets you explore and filter the CrunchBase dataset with a faceted browser. There is also a SPARQL endpoint for arbitrary graph queries, and a tool for defining custom API methods that can integrate related Web data (such as the job feed from CrunchBoard, or DBpedia, a SemWeb version of Wikipedia).
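To make the idea of automated, machine-readable descriptions and links more concrete, here is a minimal sketch of turning one JSON record into RDF statements in the N-Triples format. It is not Nowack's actual converter: the base URI, the `vocab/` property names, and the record's field names (`permalink`, `founders`, `type`) are hypothetical stand-ins for whatever the real API returns. Plain values become literals, while nested references to other items become links between data objects.

```python
import json

# Hypothetical base URI; Semantic CrunchBase's real URI scheme is not described here.
BASE = "http://example.org/cb/"
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"


def iri(*parts: str) -> str:
    """Build an IRI under the (hypothetical) base URI."""
    return "<" + BASE + "/".join(parts) + ">"


def literal(value) -> str:
    """Render a plain value as an N-Triples literal."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^{XSD_INT}'
    # json.dumps yields a double-quoted string with \" and \\ style escapes,
    # which N-Triples also accepts.
    return json.dumps(str(value), ensure_ascii=False)


def to_ntriples(kind: str, record: dict) -> list[str]:
    """Describe one record and link it to the related items it references."""
    subject = iri(kind, record["permalink"])
    lines = []
    for key, value in record.items():
        if key == "permalink":
            continue  # already used to mint the subject IRI
        predicate = iri("vocab", key)
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                # A nested reference becomes a link to another data object.
                obj = iri(item["type"], item["permalink"])
            else:
                obj = literal(item)
            lines.append(f"{subject} {predicate} {obj} .")
    return lines


# Sample record shaped like (but not copied from) an API response.
record = {
    "permalink": "flickr",
    "name": "Flickr",
    "founded_year": 2004,
    "founders": [
        {"type": "person", "permalink": "caterina-fake"},
        {"type": "person", "permalink": "stewart-butterfield"},
    ],
}
for line in to_ntriples("company", record):
    print(line)
```

Because every item gets its own IRI, the `founders` statements point at the people's descriptions rather than copying their names, which is what turns a set of pages into a graph of linked data objects.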
CrunchBase: Do you know of any apps that are using Semantic CrunchBase to enhance their functionality?
Nowack: Only a few experimental ones. There was a short thread on the mailing list about using the SPARQL endpoint to extract social graph fragments from CrunchBase. SWSE, a semantic search engine, is experimenting with the data. Another one, which I created myself, is a Twitter bot that can answer questions such as “Founder of Flickr”.
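The interview does not describe how the Twitter bot works internally, so the following is only a hypothetical sketch of how a command like “Founder of Flickr” could be mapped onto graph lookups. The tiny triple set, the `PHRASES` table, and the `answer` function are all illustrative assumptions: the command is split into a property phrase and an item name, the item is found by its name, the requested link is followed, and the names of the linked items are returned.

```python
import re

# Toy triple set standing in for the Semantic CrunchBase store (sample data).
TRIPLES = {
    ("company/flickr", "name", "Flickr"),
    ("company/flickr", "founder", "person/caterina-fake"),
    ("company/flickr", "founder", "person/stewart-butterfield"),
    ("person/caterina-fake", "name", "Caterina Fake"),
    ("person/stewart-butterfield", "name", "Stewart Butterfield"),
}

# Hypothetical phrase-to-predicate table; the real bot's grammar is not documented.
PHRASES = {"founder": "founder", "founders": "founder"}


def answer(command: str) -> list[str]:
    """Answer '<property> of <name>' commands by following links in the graph."""
    m = re.fullmatch(r"\s*(\w+)\s+of\s+(.+?)\s*", command, flags=re.IGNORECASE)
    if not m or m.group(1).lower() not in PHRASES:
        return []
    predicate = PHRASES[m.group(1).lower()]
    label = m.group(2).lower()
    # Step 1: find the item(s) whose name matches the label.
    subjects = {s for s, p, o in TRIPLES if p == "name" and o.lower() == label}
    # Step 2: follow the requested link from those items.
    targets = {o for s, p, o in TRIPLES if s in subjects and p == predicate}
    # Step 3: look up a human-readable name for each linked item.
    return sorted(o for s, p, o in TRIPLES if s in targets and p == "name")


print(answer("Founder of Flickr"))  # ['Caterina Fake', 'Stewart Butterfield']
```

Unknown phrases simply yield an empty list here; a real bot would presumably reply with a help message instead.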
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Nowack: It was a trap! I was tricked into this whole SemWeb stuff in 2003 when I was looking for a topic for my diploma thesis. I read TimBL’s Weaving the Web where he explains the Semantic Web idea, and it all sounded like a great area to explore. However, there were hardly any toolkits for mainstream coders back then, so I started to write my own. And it took a while to realize that there is absolutely no need to implement all the specifications the SemWeb community comes up with every month. After figuring out which technologies to use and which ones to skip, I got pretty excited about RDF for website development, especially for small development teams.
CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Nowack: The basic ideas behind the Semantic Web are increased content granularity and repurposing of Web data. The goal is to move from a Web of documents to a Web of information items. And with the Resource Description Framework (RDF), we can do just that: describe things in a more reusable way than with plain HTML, and let software utilize this “High-Resolution Web” (as Twine’s founder Nova Spivack likes to call it). RDF comes with a couple of its own data exchange formats (XML- and JSON-based ones, among others). The essential parts of the framework, however, are a simple, unifying data model (which, by the way, allows the integration of RSS, Atom, microformats, or other typical Web 2.0 information sources) and a query language, SPARQL. SPARQL is like SQL for the Web. Instead of tables, it joins (possibly distributed) resource descriptions. Think of a database-like interface to the Web. SPARQL also provides a standardized protocol, which enables something we could call “Mashup chaining”: the ability to successively build on the value created by other mashups. RDF and SPARQL make it almost trivial to open enhanced data to other apps.
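The phrase “instead of tables, it joins resource descriptions” can be illustrated with a toy evaluator for SPARQL’s core construct, the basic graph pattern. This is a minimal sketch in plain Python, not a real SPARQL engine: the `cb:` and `job:` identifiers and the job-feed triples are made-up sample data, and production engines use indexes and query optimization rather than nested loops. Note that integrating the second source is just list concatenation, because both sources share one data model.

```python
# Two sources describing overlapping items with shared identifiers (sample data).
crunchbase = [
    ("cb:flickr", "cb:name", "Flickr"),
    ("cb:flickr", "cb:founder", "cb:stewart-butterfield"),
    ("cb:stewart-butterfield", "cb:name", "Stewart Butterfield"),
]
jobs = [  # hypothetical job-feed triples
    ("job:42", "job:company", "cb:flickr"),
    ("job:42", "job:title", "Backend Engineer"),
]
graph = crunchbase + jobs  # merging RDF graphs is essentially a set union


def match(pattern, triple, binding):
    """Try to unify one triple pattern with one triple, extending a binding."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if extended.get(term, value) != value:
                return None  # conflicts with an earlier binding
            extended[term] = value
        elif term != value:  # a constant that does not match
            return None
    return extended


def query(patterns, graph):
    """Evaluate a basic graph pattern as a series of nested-loop joins."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?title ?founder WHERE { ?job job:company ?c .
#   ?c cb:founder ?f . ?f cb:name ?founder . ?job job:title ?title }
results = query(
    [
        ("?job", "job:company", "?c"),
        ("?c", "cb:founder", "?f"),
        ("?f", "cb:name", "?founder"),
        ("?job", "job:title", "?title"),
    ],
    graph,
)
print([(b["?title"], b["?founder"]) for b in results])
```

Shared variables such as `?c` and `?job` play the role that foreign keys and JOIN clauses play in SQL, but they join statements from both sources without any table definitions.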
RDF and SPARQL are developer-oriented; they should not be exposed to non-tech website visitors directly. Their portability and flexibility *can* be passed through to the UI to a certain extent, though. For example, all filtering options in the faceted browser at Semantic CB are generated by SPARQL operations. These user-driven queries could possibly be ported to another dataset, or to a different UI (which is basically what the Twitter bot is doing). Another example is the collection of resource descriptions (similar to RSS), where a website visitor could import or subscribe to very specific data objects. Users of the Operator Firefox plugin can already do some of these things today with microformats or RDFa (an RDF-in-HTML syntax). I did some tests with a semantic clipboard some time ago. It worked, but introducing new UI patterns is not trivial. For end-users, I don’t expect in-your-face RDF and SPARQL anytime soon.
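The faceted browser’s internals are not described in the interview, but the general idea of generating filters from SPARQL can be sketched as follows: each facet a visitor selects becomes one triple pattern in a `SELECT` query. The `facet_query` helper, the `cb:` vocabulary URI, and the property names `category_code` and `founded_year` are hypothetical. Passing a different `vocab` URI hints at how such user-driven queries could be reused against another dataset.

```python
import json


def facet_query(selected: dict, vocab: str = "http://example.org/cb/vocab/",
                limit: int = 50) -> str:
    """Build a SPARQL SELECT from facet selections (property -> required value)."""
    lines = [f"PREFIX cb: <{vocab}>", "SELECT DISTINCT ?item WHERE {"]
    for prop, value in sorted(selected.items()):
        if not prop.isidentifier():
            raise ValueError(f"unsafe property name: {prop!r}")
        if isinstance(value, int) and not isinstance(value, bool):
            obj = str(value)  # a bare integer is an xsd:integer literal in SPARQL
        else:
            obj = json.dumps(str(value), ensure_ascii=False)  # quoted, escaped string
        # Each selected facet adds one triple pattern that ?item must satisfy.
        lines.append(f"  ?item cb:{prop} {obj} .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = facet_query({"founded_year": 2004, "category_code": "web"})
print(query)
```

The printed query lists the `cb:category_code "web"` pattern before `cb:founded_year 2004`, because facets are sorted by property name so that the same selection always yields the same query text.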
CrunchBase: On your website you describe RDF and SPARQL as “productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Nowack: RDF, with its generic data model, supports “data first” approaches to Web development. There is no need to define a model or database tables in advance; you can start directly with the app’s UI. The only custom things I needed for the initial Semantic CB were a parser for the API’s JSON, a theme for the site, and HTML templates for the resource views. (Well, and a server, but that’s another story.) Once I had a working prototype online, I could extend the system based on early feedback, without touching the database structure, and at run-time. The data model simply evolves with the app. And with SPARQL, you can access your data more easily than with SQL. The syntax is simple, you don’t have to worry about complex table joins any more (because querying is done on the graph, not on the storage level), and you can always export and reuse the aggregated information, should you want to. RDF is mainly marketed to domains such as the life sciences or enterprises, but I personally think there is an equally large potential for Web agencies and startups, where a reduced time-to-market affects customer satisfaction and success. Some people have started work on an RDF toolkit for Ruby; it would be interesting to see that combined with an agile framework like Rails one day.
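A minimal sketch of the “data first” point, assuming nothing about Nowack’s actual toolkit: if every fact is stored as a (subject, predicate, object) triple, a new kind of property can be added at run-time without a schema migration, and a generic view function picks it up automatically. The `TripleStore` class, the `cb:jobFeed` property, and the feed URL are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Schema-free store: every fact is a (subject, predicate, object) triple."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def describe(self, subject: str) -> dict[str, list[str]]:
        """Group all facts about one resource, e.g. to fill an HTML resource view."""
        props = defaultdict(list)
        for s, p, o in self.triples:
            if s == subject:
                props[p].append(o)
        return {p: sorted(v) for p, v in sorted(props.items())}


store = TripleStore()
store.add("cb:flickr", "cb:name", "Flickr")
print(store.describe("cb:flickr"))  # {'cb:name': ['Flickr']}

# Later, at run-time: a new kind of fact, with no ALTER TABLE or migration step.
store.add("cb:flickr", "cb:jobFeed", "http://example.org/jobs/flickr.rss")
print(store.describe("cb:flickr"))
# {'cb:jobFeed': ['http://example.org/jobs/flickr.rss'], 'cb:name': ['Flickr']}
```

In a typical relational design, the same change would usually mean adding a column or table first; here the “schema” is simply whatever predicates happen to be present.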
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Nowack: I’m not a fan of version numbers (TimBL would probably consider the Semantic Web to be Web 1.0, as it’s close to his initial vision). But in the context of continued progress (the time after centralized social networks, incompatible data portability “standards”, and overly generic RSS feeds), I agree with Nova’s statement. Semantic Web technologies enable flexible remixing of information on the Web. When we waste less energy on the “how”, we can put more focus on the “what”, try more things at lower cost, and accelerate (and even distribute) innovation. The RDF community still has some work to do with regard to attracting (and listening to) the larger Web community. But many specs and toolkits are still evolving, and pragmatic contributors are clearly welcome.
Thank you to Benjamin Nowack for taking the time to answer our questions.




