Building A Semantic Web: Interview with Benjamin Nowack
4 Comments
by Rob Olson on August 26, 2008

Last month when we released the CrunchBase API, Benjamin Nowack came to our attention when he developed Semantic CrunchBase, an RDF/SPARQL interface to CrunchBase. Since then he has remained an active user of the CrunchBase API, and last week he released a Twitter bot that responds to commands with CrunchBase info.

Nowack runs a small web agency that focuses on combining mainstream website creation with Semantic Web technologies. In addition, he works as a contractor for early adopters in that area and maintains an open source RDF toolkit for LAMP environments. Through his efforts he hopes to get the SemWeb agency market off the ground.

For us, the Semantic Web is terra incognita. Eager to find out more about it, we contacted Nowack and asked him a few questions about Semantic CrunchBase and the Semantic Web.

CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released Semantic CrunchBase. Can you explain what Semantic CrunchBase is and what inspired you to create it?

Nowack: The graph-shaped CrunchBase data is ideal for showing that there is more (or rather *less*) to the Semantic Web than “AI on the Internet”. One of its core benefits is simplified data repurposing, plus the ability to extend applications at run-time. For Semantic CrunchBase, I’ve created machine-readable descriptions of all CrunchBase items, and also machine-readable links between related items (This process could be fully automated, thanks to the nice design of your API). Once we move from a Website of linked *pages* to a graph of linked *data objects* (and crunchbase.com is already pretty close), lots of new possibilities arise. Semantic CB allows the CrunchBase dataset to be explored and filtered using a faceted browser, there is a SPARQL endpoint for arbitrary graph queries, and a tool to define custom API methods which can integrate related Web data (such as the job feed from CrunchBoard, or dbpedia, a SemWeb version of Wikipedia).

CrunchBase: Do you know of any apps that are using Semantic CrunchBase to enhance their functionality?

Nowack: Only a few experimental ones. There was a short thread on the mailing list about using the SPARQL endpoint to extract social graph fragments from CrunchBase. SWSE, a semantic search engine, is experimenting with the data. One app I created myself is a Twitter bot that can answer questions such as “Founder of Flickr”.

CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?

Nowack: It was a trap! I was tricked into this whole SemWeb stuff in 2003 when I was looking for a topic for my diploma thesis. I read TimBL’s Weaving the Web where he explains the Semantic Web idea, and it all sounded like a great area to explore. However, there were hardly any toolkits for mainstream coders back then, so I started to write my own. And it took a while to realize that there is absolutely no need to implement all the specifications the SemWeb community comes up with every month. After figuring out which technologies to use and which ones to skip, I got pretty excited about RDF for website development, especially for small development teams.

CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?

Nowack: The basic ideas behind the Semantic Web are increased content granularity and repurposing of Web data. The goal is to move from a Web of documents to a Web of information items. And with the Resource Description Framework (RDF), we can do just that: Describe things in a more reusable way than with plain HTML, and let software utilize this “High-Resolution Web” (as Twine’s founder Nova Spivack likes to call it). RDF comes with a couple of its own data exchange formats (XML and JSON, among others). The essential parts of the framework, however, are a simple, unifying data model (which by the way allows the integration of RSS, Atom, microformats, or other typical Web 2.0 information sources) and a query language, SPARQL. SPARQL is like SQL for the Web. Instead of tables, it joins (possibly distributed) resource descriptions. Think of a database-like interface to the Web. SPARQL also provides a standardized protocol, which enables something we could call “Mashup chaining”: the ability to build on the value created by other mashups, successively. RDF and SPARQL make it almost trivial to open enhanced data to other apps.

RDF and SPARQL are developer-oriented, they should not be exposed to non-tech website visitors directly. Their portability and flexibility *can* be passed through to the UI to a certain extent, though. For example, all filtering options in the faceted browser at Semantic CB are generated by SPARQL operations. These user-driven queries could possibly be ported to another dataset, or a different UI (which is what the Twitter bot is basically doing). Another example is the collection of resource descriptions (similar to RSS), where a website visitor could import or subscribe to very specific data objects. Users of the Operator Firefox plugin can do some of these things with microformats or RDFa (an RDF-in-HTML syntax) already today. I did some tests with a semantic clipboard some time ago. It worked, but introducing new UI patterns is not trivial. For end-users, I don’t expect in-your-face RDF and SPARQL anytime soon.

CrunchBase: On your website you describe “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?

Nowack: RDF with its generic data model supports “data first” approaches for Web development. There is no need to define a model or database tables in advance, you can directly start with the app’s UI. The only custom things I needed for the initial Semantic CB were a parser for the API’s JSON, a theme for the site, and HTML templates for the resource views. (Well, and a server, but that’s another story.) Once I had a working prototype online, I could extend the system based on early feedback, without touching the database structure, and at run-time. The data model simply evolves with the app. And with SPARQL, you can access your data more easily than with SQL. The syntax is simple, you don’t have to worry about complex table joins any more (because querying is done on the graph, not on the storage level), and you can always export and reuse the aggregated information, should you want to. RDF is mainly marketed to domains such as Life Sciences or Enterprises, but I personally think there is an equally large potential for Web agencies and startups where a reduced time-to-market affects customer satisfaction and success. Some people have started work on an RDF toolkit for Ruby, it could be interesting to see that combined with an agile framework like Rails one day.

CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?

Nowack: I’m not a fan of version numbers (TimBL would probably consider the Semantic Web as Web 1.0, as it’s close to his initial vision). But in the context of continued progress (the time after centralized social networks, incompatible data portability “standards”, and overly generic RSS feeds) I agree with Nova’s statement. Semantic Web technologies enable flexible remixing of information on the Web. When we waste less energy on the “how”, we can put more focus on the “what”, try more things at lower costs, and accelerate (and even distribute) innovation. The RDF community still has some work to do with regard to attracting (and listening to) the larger Web community. But many specs and toolkits are still evolving and pragmatic contributors are clearly welcome.

Thank you to Benjamin Nowack for taking the time to answer our questions.

Track Changes With The CrunchBase Edit Timeline
4 Comments
by Henry Work on August 21, 2008

One of the biggest pieces of feedback we get is people wondering what’s happened to the edits they’ve made. And we’re the first to admit: our user edit process still has a long way to go. But today we’re happy to announce a nice new way of keeping track of the queue of edits made to the site, including your own: The CrunchBase Edit Timeline. Great title, right?

The timeline will show you a reverse-chronological, paginated list of all the edits made to the site (all 77,047 of them). We’re also flashing some edit stats on the sidebar; there have been 581 edits to the site today, 1852 this week, and 9431 this month (thanks to the ActiveSupport CoreExtension Calculations for making these easy).
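For the curious, here is a rough sketch of how per-period counts like these can be computed. It is not our actual code: the edit_stats helper is hypothetical and uses only Ruby's standard Date class, assuming weeks start on Monday. In a Rails app, ActiveSupport helpers such as beginning_of_day, beginning_of_week, and beginning_of_month cover the same ground.

```ruby
require "date"

# Hypothetical helper (not the site's code): counts edits made since the
# start of the current day, week (assumed to start on Monday), and month.
def edit_stats(edit_dates, today)
  week_start  = today - ((today.wday + 6) % 7)        # back up to Monday
  month_start = Date.new(today.year, today.month, 1)  # first day of the month

  {
    :today => edit_dates.count { |d| d == today },
    :week  => edit_dates.count { |d| d >= week_start && d <= today },
    :month => edit_dates.count { |d| d >= month_start && d <= today }
  }
end

# Made-up sample data, for illustration only.
sample_edits = [
  Date.new(2008, 8, 21), Date.new(2008, 8, 21), Date.new(2008, 8, 19),
  Date.new(2008, 8, 17), Date.new(2008, 8, 2),  Date.new(2008, 7, 30)
]
stats = edit_stats(sample_edits, Date.new(2008, 8, 21))
puts stats.inspect
```

In a real app you would push these counts down into the database rather than loading every edit into memory.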

Another good way to find out when your edits get applied is to subscribe to a company (or person or any other entity) RSS feed. Check out this earlier article on revision RSS feeds.

Rails link_to Weirdness Inside Namespaces
4 Comments
by Rob Olson on August 21, 2008

On our TechCrunch Elevator Pitches site we have an admin interface that lives inside an “admin” namespace. The route declaration in routes.rb looks like this:

map.namespace(:admin) do |admin|
  admin.resources :videos, :member => {:update_status => :put}
  admin.resources :comments
  admin.root :controller => "videos"
end
 
map.connect "logged_exceptions/:action/:id", :controller => "logged_exceptions"

We also use the Exception Logger plugin to track exceptions, which is why the last line above declares a route for logged_exceptions.

I ran into trouble in the admin/videos/index.html.erb view when I attempted to link outside the admin namespace to the logged_exceptions page. This is what I tried first, which doesn’t work:

<%= link_to "Exceptions", :controller => "logged_exceptions" %>

The code above creates the following URL: http://foo.com/admin/logged_exceptions. But what I needed is this: http://foo.com/logged_exceptions. This problem had not come up before because we normally use named routes, which resolve to the correct route regardless of the current namespace.

The solution is really simple but took me a while to figure out. To explicitly direct Rails to look for the controller at the site root, place a “/” before the controller name. The correct link_to statement looks like:

<%= link_to "Exceptions", :controller => "/logged_exceptions" %>

Hopefully this helps anyone in the same situation.
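As a rough mental model of why the leading slash matters, here is a toy sketch in plain Ruby. It is not how Rails implements routing; resolve_controller is a hypothetical helper that only mimics the behavior described above: a bare controller name is looked up relative to the current namespace, while a name starting with “/” is looked up from the site root.

```ruby
# Toy model only (NOT Rails' implementation): resolve a :controller option
# relative to the controller that is rendering the current view.
def resolve_controller(current_controller, target)
  # A leading slash means "start from the site root".
  return target.sub(%r{\A/}, "") if target.start_with?("/")

  # Otherwise stay inside the namespace of the current controller.
  namespace = File.dirname(current_controller)
  namespace == "." ? target : "#{namespace}/#{target}"
end

puts resolve_controller("admin/videos", "logged_exceptions")   # admin/logged_exceptions
puts resolve_controller("admin/videos", "/logged_exceptions")  # logged_exceptions
```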

Calling All Ruby Developers: We’re Hiring
19 Comments
by Henry Work on August 13, 2008

Here at TechCrunch HQ, we’re looking to add a couple of fellow Rubyists to help us build out CrunchBase, our pride and joy. Being a TechCrunch developer is a pretty sweet gig: we work on a technically interesting, growing structured wiki, attend a lot of startup events (including our own), and get to meet and partner with cool companies within the startup ecosystem. We also work with great tools (Ruby, Rails, RSpec, Git, etc).

So check out our official job description below and apply by emailing Gené if you’re interested in joining our small team.


Want to work for TechCrunch?

Founded on June 11, 2005, TechCrunch is a weblog dedicated to obsessively profiling and reviewing new Internet products and companies. Today TechCrunch is the most popular technology weblog on the Internet and is ranked #2 on the Technorati 100.

TechCrunch is building a small but intense team of web developers to work on CrunchBase, our online database of startup, investor and entrepreneur information. CrunchBase attempts to structure the world of tech companies; it aggregates funding, acquisitions, products, people, investors, and offices via mashups and user-submitted data. We’re all about opening up our data as much as possible; we recently launched an API that’s taking off and gives developers easy integration and complete access to CrunchBase. Since its inception, CrunchBase has grown into one of the largest structured wiki deployments on the net (and unofficially one of the top 50 trafficked Rails sites).

What’s it like working for TechCrunch?

TechCrunch is very much a startup. The culture is fast paced and dynamic with a significant amount of exposure to other startups in the technology industry. We throw big events including movie screenings, our annual August Capital party, and TechCrunch50.

As for development, we work with Ruby and we work with Rails. We use TextMate, RSpec, Capistrano, Git, GitHub, and Lighthouse, and we practice ‘agile web development’. We eat DRY code for breakfast, write specs in the afternoon, and deploy new stuff at night.

View the full job listing on CrunchBoard.

CrunchBase Team Interviewed For FiveRuns’ TakeFive
2 Comments
by Henry Work on August 1, 2008

We did a fun interview for the famous FiveRuns TakeFive series. Check it out here.

New Stock Chart Widget From Wikinvest
1 Comment
by Henry Work on July 31, 2008

Wikinvest, a wiki company that does a lot of cool things with investments, just released an embeddable, interactive stock chart widget today. We’ve been looking for a widget like this for a while (kind of like Compete or Quantcast graphs, but for stocks), so when we saw it we naturally had to add it for all of our public companies as quickly as possible. See Amazon’s or Google’s page to check it out. When Wikinvest itself goes public, its own widget will show up on its CrunchBase page, and that will be truly awesome.

CrunchBase Now With Full Revision History, Real Diffing, And RSS Feeds
1 Comment
by Henry Work on July 30, 2008

Today we’re exposing the complete revision history of all the edits made on CrunchBase, along with some cool ways of visualizing this historical data.

Revisions

Each CrunchBase entity page now has revisions: subpages where you can browse the edit history of a particular entity (say, Facebook’s edit history). From the revisions page you can view revision pages, which show what an entity page looked like at a historical point in time. For example, you can see what Facebook’s page looked like on March 19th, 2008, when it had its fifth edit. You can also step through the revision pages like a book, seeing how the page evolved over time.

Comparing Revisions

We’re particularly excited about diff pages, which offer visual comparisons of two revisions of an entity. In obligatory red and green colors, diff pages highlight the sections that were present in the old version (red) and the ones that have been changed in the new (green).

You can find the diff pages from the revisions page by clicking on the date and time of an edit. As with revision pages, you can step through the diff pages to gain a historical appreciation of user edits. Also, comparison between any two arbitrary revisions in time is a snap (here’s Facebook’s diff of revisions 5 and 74).

RSS Feeds

With an edit history, we figured we should generate a feed for entities so that users can receive notice when a page gets edited. If you go to an entity’s revisions page, you’ll see a Subscribe via RSS link in the top right-hand corner. Currently, the feed items include the time, user, and a link to the diff page (“see what’s changed”), but we hope to make them more useful in the future. If you want to keep close tabs on your company’s (or your own) page, this is definitely the easiest way to do so.

Great Apps Using the CrunchBase API
by Rob Olson on July 27, 2008

Since launching the CrunchBase API less than two weeks ago we’ve seen a great response from developers, who have already developed a number of impressive plugins and applications. The CrunchBase API offers access to information from thousands of tech companies, VCs and startup entrepreneurs. It’s free to use, there are no accounts to sign up for and no request throttling. The API returns clean, pretty-printed JSON, and only basic attribution is required.

To learn more, read the rest of the post and follow the discussion at TechCrunch.

New API Features: List, Search, and Callbacks
2 Comments
by Mark McGranaghan on July 17, 2008

We’ve just rolled out three new features for the CrunchBase API, all implemented based on feedback from our early users.

The first is the new “list” action that returns the name and permalink for all entities in CrunchBase of a certain type. For example:

http://api.crunchbase.com/v/1/companies.js

The second new feature is a “search” action. To search across CrunchBase for entities matching a given keyword or keywords, use:

http://api.crunchbase.com/v/1/search.js?query=techcrunch
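If you are building these URLs from Ruby, a minimal sketch might look like the following. The helper names list_url and search_url are made up for illustration, and the sketch assumes that multi-word queries should be form-encoded; check the documentation for the exact query format.

```ruby
require "uri"

# Base URL taken from the examples above.
API_BASE = "http://api.crunchbase.com/v/1"

# Hypothetical helper: URL for the "list" action of a plural entity type.
# Only "companies" is confirmed by the example above.
def list_url(plural_type)
  "#{API_BASE}/#{plural_type}.js"
end

# Hypothetical helper: URL for the "search" action.
# Assumption: the API accepts a form-encoded query string.
def search_url(query)
  params = URI.encode_www_form("query" => query)
  "#{API_BASE}/search.js?#{params}"
end

puts list_url("companies")
puts search_url("techcrunch")
```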

The third feature is JavaScript callbacks, which are enabled for both the existing “show” API action and the new “search” action. For example:

http://api.crunchbase.com/v/1/search.js?query=techcrunch&callback=callme

This request returns JavaScript that will call the function callme with the API data as a single argument.
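To make the callback mechanics concrete, here is a minimal sketch of how a server could wrap a JSON body in a callback. It is an illustration under stated assumptions rather than the API's actual code: render_with_callback is a hypothetical helper, and the callback-name check is our own addition for safety.

```ruby
require "json"

# Hypothetical helper: returns plain JSON, or JavaScript that calls the
# named function with the JSON data as its single argument.
def render_with_callback(data, callback = nil)
  body = JSON.generate(data)
  return body if callback.nil? || callback.empty?

  # Assumption: only simple identifiers are accepted as callback names.
  unless callback =~ /\A[A-Za-z_]\w*\z/
    raise ArgumentError, "invalid callback name"
  end

  "#{callback}(#{body})"
end

puts render_with_callback({ "name" => "TechCrunch" }, "callme")
# prints: callme({"name":"TechCrunch"})
```

A page can then load the URL in a script tag and define callme to receive the data.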

Be sure to check out the Google Group for complete documentation and mailing list information.

Ruby JSON Pretty-Printer for the CrunchBase API
3 Comments
by Mark McGranaghan on July 15, 2008

We’ve recently been working on the CrunchBase API. To encourage API use, we want to make it as easy as possible for users to access our data. An important part of this strategy is providing easy-to-read JSON output, which we accomplish with our now open source Ruby JSON pretty-printer library.

JSON is a lightweight and web-friendly data exchange format that we generally prefer to XML and YAML. However, we are not happy with the difficulty of reading the default ActiveSupport to_json output. For example, this is some typical output (some data attributes have been omitted to save space):

{"permalink":"techcrunch","products":[{"permalink":"techcrunch","name":
"TechCrunch"},{"permalink":"crunchgear","name":"CrunchGear"},{"permalink":
"crunchbase","name":"CrunchBase"}],"relationships":[{"is_past":false,"title":
"Founder and Co-Editor","person":{"permalink":"michael-arrington","first_name":
"Michael","last_name":"Arrington"}},{"is_past":false,"title":"CEO","person":
{"permalink":"heather-harde","first_name":"Heather","last_name":"Harde"}}],
"homepage_url":"http:\/\/www.techcrunch.com","name":"TechCrunch"}

That looks pretty bad to us, and we think it will deter potential API users. We want a user to be able to come to our CrunchBase API help page, click on one of the example API URLs, and see in their browser a nicely formatted and easily readable JSON response. Something like this:

{"name": "TechCrunch",
 "permalink": "techcrunch",
 "homepage_url": "http://www.techcrunch.com",
 "products":
  [{"name": "TechCrunch",
    "permalink": "techcrunch"},
   {"name": "CrunchGear",
    "permalink": "crunchgear"},
   {"name": "CrunchBase",
    "permalink": "crunchbase"}],
 "relationships":
  [{"is_past": false,
    "title": "Founder and Co-Editor",
    "person":
     {"first_name": "Michael",
      "last_name": "Arrington",
      "permalink": "michael-arrington"}},
   {"is_past": false,
    "title": "CEO",
    "person":
     {"first_name": "Heather",
      "last_name": "Harde",
      "permalink": "heather-harde"}}]}

Not finding any existing Ruby JSON pretty-printers on Google or GitHub, we wrote our own. The new JsonPrinter exposes a single class method, render, which returns a JSON representation of any given object consisting of arrays, hashes, symbols, strings, numbers, and false, true, and nil values.

The printer uses a simple but effective rendering algorithm. In addition to managing whitespace, the printer recognizes ordered hashes, which is nice when you’d prefer certain attributes like “name” and “permalink” to appear at the top of the output. Finally, our benchmarks indicate that the printer is faster than the JSON gem’s pure Ruby implementation.
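To give a feel for this kind of rendering, here is a simplified sketch that produces the layout shown above. It is not the JsonPrinter source: sketch_render and compound? are hypothetical names, scalars are delegated to the standard json library, and key order simply follows hash insertion order (which Ruby 1.9 and later preserve).

```ruby
require "json"

# True for non-empty hashes and arrays, which get their own lines.
def compound?(value)
  (value.is_a?(Hash) || value.is_a?(Array)) && !value.empty?
end

# Hypothetical sketch (not JsonPrinter itself): renders obj as indented
# JSON, where col is the column at which the opening bracket sits.
def sketch_render(obj, col = 0)
  pad = " " * (col + 1)  # indentation for entries after the first
  case obj
  when Hash
    entries = obj.map do |key, value|
      if compound?(value)
        # Nested structures start on the next line, one column deeper.
        "#{key.to_s.to_json}:\n#{pad} #{sketch_render(value, col + 2)}"
      else
        "#{key.to_s.to_json}: #{sketch_render(value, col)}"
      end
    end
    "{" + entries.join(",\n#{pad}") + "}"
  when Array
    "[" + obj.map { |item| sketch_render(item, col + 1) }.join(",\n#{pad}") + "]"
  else
    obj.to_json  # strings, symbols, numbers, true, false, nil
  end
end

puts sketch_render(
  "name" => "TechCrunch",
  "products" => [{ "name" => "CrunchGear", "permalink" => "crunchgear" }],
  "is_past" => false
)
```

The recursion only has to track one number, the current column, which is what keeps this style of printer short.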

You can see some live examples at these URLs:

api.crunchbase.com/v/1/company/facebook.js
api.crunchbase.com/v/1/person/brad-fitzpatrick.js

Check out our JsonPrinter project page on GitHub and see our CrunchBase API announcement post.