Awesome Open Source

Mundaneum

This is a tiny, highly incomplete Clojure wrapper around the Wikidata project's massive semantic database. It's named after the Mundaneum, which was Paul Otlet's mad and wonderful c. 1910 vision for something like the World Wide Web.

(There's a mini-doc about him and it here.)

Motivation

Wikidata is amazing! And it provides API access to all the knowledge it has collected! This is great, but exploratory programmatic access to that data can be fairly painful.

The official Wikidata API Java library offers a document-oriented interface that makes it hard to ask interesting questions. A better way to do most things is with the Wikidata query service, which uses the standard Semantic Web query language, SPARQL.

The SPARQL query service is nice, but the Wikidata data model must cope with (a) items that have multiple names in multiple languages, and (b) single names that map to multiple items. To handle this, it adds a layer of indirection: everything in the DB is referred to by an id such as P50 (property number 50, meaning "author") or Q6882 (entity number 6882, the author "James Joyce").

For example, to get a selection of works authored by James Joyce, one would issue a query like:

SELECT ?work
WHERE { ?work wdt:P50 wd:Q6882. } 
LIMIT 10

(Users of Datomic will recognize the ?work style of selector. This is no coincidence, as SPARQL and Datomic were both strongly influenced by Datalog.)

The above query is simple enough, except for the non-human-readable identifiers in the WHERE clause, both of which I found by manually searching the Wikidata web interface.
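Building that query programmatically from the raw ids is plain string formatting, which shows why the ids themselves are the sticking point. A minimal sketch; `works-by-query` is a hypothetical name, not part of this library, and it relies on the `wd:`/`wdt:` prefixes that the Wikidata query service predefines:

```clojure
(ns example.manual-query)

(defn works-by-query
  "Hypothetical helper, not part of mundaneum: builds the SPARQL text
  for works linked by `property-id` to the entity `entity-id`."
  [property-id entity-id limit]
  ;; wd: prefixes entities and wdt: prefixes direct property claims;
  ;; the Wikidata query service predefines both, so no PREFIX lines.
  (format "SELECT ?work\nWHERE { ?work wdt:%s wd:%s. }\nLIMIT %d"
          property-id entity-id limit))

;; The Joyce query, with both opaque ids looked up by hand beforehand.
(works-by-query "P50" "Q6882" 10)
```

Nothing here helps with finding "P50" or "Q6882" in the first place; that lookup is the part worth automating.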

The first order of business was to build a more human-friendly way to specify relationships and entities without leaving my coding environment. The approach I took was:

  • download and reformat the full list of ~2000 properties (fresh as of 2017-04-19), shape them into a map of keyword/string pairs where the keyword is the name of the property and the string is its id, and write a helper function to look up an id by name:
(property :author)
;;=> "P50"
  • create a helper function that tries to guess the correct id of an entity from a string similar to its "label" (its common name; sadly, this code currently handles only English labels):
(entity "James Joyce")
;;=> "Q6882"

;; the entity function tries to return the most notable entity 
;; that matches, but sometimes that isn't what you want.

(describe (entity "U2"))
;;=> "Irish alternative rock band"

;; not the one I meant, let's try with more info:
(describe (entity "U2" :part-of (entity "Berlin U-Bahn")))
;;=> "underground line in Berlin"
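The property map from the first bullet can be sketched roughly as follows, assuming the downloaded list has already been reduced to `[label id]` pairs. The three sample properties are real Wikidata properties, but `property-pairs`, `label->keyword`, `properties`, and this `property` function are illustrative stand-ins, not the library's actual code. Entity lookup is left out, since it presumably needs a live query against Wikidata.

```clojure
(ns example.properties
  (:require [clojure.string :as str]))

;; Illustrative sample of [label id] pairs; the real list has ~2000 entries.
(def property-pairs
  [["author" "P50"]
   ["instance of" "P31"]
   ["part of" "P361"]])

(defn label->keyword
  "Hypothetical helper: turns a label such as \"instance of\" into :instance-of."
  [label]
  (-> label
      str/lower-case
      str/trim
      (str/replace #"\s+" "-")
      keyword))

;; Map of keyword name -> property id string, e.g. {:author "P50", ...}
(def properties
  (into {} (map (fn [[label id]] [(label->keyword label) id])) property-pairs))

(defn property
  "Returns the id for a property keyword, or nil if it is unknown."
  [k]
  (get properties k))

(property :part-of)
```

Normalizing labels to keywords is what makes forms like `(property :part-of)` possible; a fuller version would also need to cope with labels containing punctuation.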

This already helps keep my Emacs-driven process running smoothly. The next point of irritation was assembling query strings by hand, like an animal, so I banged together a quick and sloppy DSL similar to the one offered by Datomic. It looks like this:

;; what are some works authored by James Joyce?
(query '[:select ?work ?workLabel
         :where [[?work (wdt :author) (entity "James Joyce")]]
         :limit 10])
;; #{{:work "Q864141", :workLabel "Eveline"}
;;   {:work "Q861185", :workLabel "A Little Cloud"}
;;   {:work "Q459592", :workLabel "Dubliners"}
;;   {:work "Q682681", :workLabel "Giacomo Joyce"}
;;   {:work "Q764318", :workLabel "Two Gallants"}
;;   {:work "Q429967", :workLabel "Chamber Music"}
;;   {:work "Q465360", :workLabel "A Portrait of the Artist as a Young Man"}
;;   {:work "Q6511", :workLabel "Ulysses"}
;;   {:work "Q866956", :workLabel "An Encounter"}
;;   {:work "Q6507", :workLabel "Finnegans Wake"}} 
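A rough sketch of how a vector DSL like this can be rendered as SPARQL text, and how the service's JSON results can be reshaped into the set of maps shown above. It assumes the `(wdt …)` and `(entity …)` terms have already been resolved to prefixed ids such as `"wdt:P50"`, that bindings arrive as parsed JSON in the standard SPARQL 1.1 results format with string keys, and that labels always come from the English label service. `parse-clauses`, `to-sparql`, and `bindings->set` are hypothetical names; this is an illustration, not mundaneum's own code.

```clojure
(ns example.dsl
  (:require [clojure.string :as str]))

;; Hypothetical sketch, not mundaneum's implementation.
(defn parse-clauses
  "Groups a flat vector such as [:select ?a ?b :where [...] :limit 10]
  into {:select [?a ?b], :where [[...]], :limit [10]}.
  Assumes every clause starts with exactly one keyword."
  [form]
  (->> form
       (partition-by keyword?)
       (partition 2)
       (map (fn [[[k] args]] [k (vec args)]))
       (into {})))

(defn to-sparql
  "Renders the DSL vector as SPARQL text. Variables are symbols and
  other terms are pre-prefixed id strings, so both render via `str`."
  [form]
  (let [{:keys [select where limit]} (parse-clauses form)
        patterns (first where)]
    (str "SELECT " (str/join " " (map str select)) "\n"
         "WHERE {\n"
         (str/join (for [triple patterns]
                     (str "  " (str/join " " (map str triple)) " .\n")))
         ;; Assumption: always ask the label service for English labels,
         ;; which is what binds ?workLabel-style variables.
         "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
         "}"
         (when limit (str "\nLIMIT " (first limit))))))

(def entity-prefix "http://www.wikidata.org/entity/")

(defn- simplify-value
  "Strips the entity URI prefix so results show bare ids like \"Q6511\"."
  [{t "type", v "value"}]
  (if (and (= t "uri") (str/starts-with? v entity-prefix))
    (subs v (count entity-prefix))
    v))

(defn bindings->set
  "Converts parsed SPARQL JSON result bindings (string keys) into a
  set of maps with keyword keys, e.g. #{{:work \"Q6511\" ...}}."
  [bindings]
  (into #{}
        (map (fn [row]
               (into {} (map (fn [[k v]] [(keyword k) (simplify-value v)])) row)))
        bindings))

(to-sparql '[:select ?work ?workLabel
             :where [[?work "wdt:P50" "wd:Q6882"]]
             :limit 10])
```

Keeping the query as a flat, keyword-delimited vector, as Datomic's query vectors do, means queries are plain data that can be built or transformed with ordinary sequence functions before rendering.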

This is actually quite similar to the programmatic query interface I created for the first purpose-built TripleStore around 15 years ago.

This code is much easier to understand if you have some familiarity with SPARQL and how it can be used to query Wikidata. I strongly recommend this introduction to get started. I'm trying to make sure all the examples are easy to translate to the DSL used here.

Condition

This is young code, and the APIs are likely to change. It is presented for entertainment purposes only. The mundaneum.examples namespace consists entirely of examples, should you care to have a play.

Enjoy!

License

Copyright © 2016-2019 Jack Rusher. Distributed under the BSD 0-clause license.

