Tuesday, July 19, 2011

Abiword & RDF Gusto

I recently blogged about updating Calligra to improve its RDF support and bring back support for viewing and editing location information using Marble. A computer loves RDF because it is nice and verbose and allows low level unambiguous expression of semantics in a format that a machine can work with. Some humans, however, might find having long descriptors, schemas and the like just to say "meet me at the Mall" a little tedious. One of the many challenges that I see for office applications wanting to offer RDF to the user is making it visible in a subtle way.

Abiword can now colour code parts of the document which have RDF associated with them and tell you how that association is formed, and how much RDF is linked at any point. Below, the purple text has some RDF associated with it. I have the mouse pointer over the purple "Mark", so the bubble text shows how the RDF is attached and how much of it there is.

In the future I of course want to let you know more: is the RDF a location, a contact, an event, or related to another domain? It would also be nice to highlight RDF only for certain types. So, for example, if you are interested only in the times that trains leave, then highlight departure logistics in bold red. The computer knows what you mean too, so it might also offer a menu button to check whether the train is on time or not.

Being able to highlight like this is a good start because it allows users who are unfamiliar with the document the chance to know exactly where there might be RDF "hiding".

Friday, July 15, 2011

Calligra & RDF Smaak

As some readers will already know, the ODF document standard has support for including one or more RDF/XML files. I have made a few such files, and will grow that collection over time. The weekend hike document cites a few people, links some to a location and also cites a time and place. While one could just say "Dom Plein" or "11 am Wednesday" these purely text references are subjective and require a human to read them.

On the other hand, the location could have an exact bounding polygon or point with digital longitude and latitude information. A computer is all too happy to use that precise description to offer maps and "how to get there from here" type information. And the 11 am is of course dubious because it doesn't link to a timezone. A human will know we are not talking UK time or the ACDT timezone, but that requires inference from the cited location and knowledge of which Plein that is, or rather, which timezone that Plein is in.
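To make the timezone ambiguity concrete, here is a small Python sketch. The date is a hypothetical Wednesday, and the three fixed UTC offsets are assumptions standing in for real timezone data; none of it comes from the hike document itself.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for real timezone rules (assumption for illustration).
CEST = timezone(timedelta(hours=2), "CEST")               # central European summer time
BST = timezone(timedelta(hours=1), "BST")                 # UK summer time
ACDT = timezone(timedelta(hours=10, minutes=30), "ACDT")  # the ACDT offset


def eleven_am(tz):
    """11 am on a hypothetical Wednesday, read in the given timezone."""
    return datetime(2011, 7, 20, 11, 0, tzinfo=tz)


def as_utc(moment):
    """Convert an aware datetime to UTC so readings can be compared."""
    return moment.astimezone(timezone.utc)


# The same text, "11 am Wednesday", names three different instants.
readings = {tz.tzname(None): as_utc(eleven_am(tz)) for tz in (CEST, BST, ACDT)}
for name, utc in readings.items():
    print(f"11:00 {name} is {utc:%Y-%m-%d %H:%M} UTC")
```

Once the time is tied to a location, and the location to a timezone, the computer can pick the right reading without a human doing the inference.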

I did a little hacking to freshen up some of the RDF code in Calligra and bring back optional support for using Marble to show and edit location information. The below is the weekend hike example from the github above. James, Joyce, and Mark all have contact RDF associated, with Mark also giving his location. The "next weekend" at the end of the first paragraph has both a time and place associated. In the screenshot I've opened up the place to edit it. It is rather simpler to drag a map around and click OK than to know the digital coordinates right off the top of your head ;)


I was hacking on fixing the same bug as boemann on #calligra, which is strange for me as I normally don't overlap on things. It seems along the way setCanvas() was called again so I removed my explicit view passing stuff in the update I just git pushed. It looks like the docker was fixed correctly by somebody else in the end. My SPARQL updates should still help. It has been a while since I hacked on this codebase, and I have to thank the Calligra guys for being so welcoming and having such a fun contribution environment! I'm fortunate to be able to hack on two projects, Abiword and Calligra, which are both so welcoming :)

Monday, July 11, 2011

RDF in ODF: Abiword & Calligra


RDF has been slowly making its way into office applications. The ODF standard includes support for shipping RDF/XML file(s) inside the zip file that is an odt file. This RDF can also be linked to particular part(s) of the document text so that you and your computer both know where the RDF is most relevant. For example, if "Fred" in the document has his phone number, location, and cake preference in RDF, that can all be linked just to the four characters "Fred" so that it all makes sense. Strange as it might be, not everybody likes Baumkuchen, and it is fairly likely not to be relevant to a stock quote in another part of the document.
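Because an odt file is an ordinary zip, the RDF it carries can be found with nothing more than a zip reader. The Python sketch below is illustrative only: it builds a tiny stand-in package in memory (the file names and contents are made up) and assumes RDF/XML members can be recognised by a `.rdf` extension. A real reader would follow the package's own metadata manifest rather than guess from names.

```python
import io
import zipfile


def rdf_members(odt_bytes):
    """Return the names of members that look like RDF/XML files."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        # Assumption: RDF/XML members end in ".rdf".
        return sorted(n for n in package.namelist() if n.endswith(".rdf"))


def build_demo_package():
    """Build a tiny stand-in for an .odt file; all contents are made up."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<office:document-content/>")
        package.writestr("manifest.rdf", "<rdf:RDF/>")
        package.writestr("fred.rdf", "<rdf:RDF/>")  # hypothetical extra RDF file
    return buffer.getvalue()


print(rdf_members(build_demo_package()))
```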

RDF has spread to OpenOffice, Abiword, KOffice, and Calligra. All of these applications can read and write RDF in text documents. The latter two also include a GUI to allow you to query, inspect, and update the RDF. Since I'm hacking on Abiword, I've been throwing around how to best expose RDF to the person using Abiword for document editing...

First, this is what Calligra does. The main document window includes an RDF docker which shows you the high level "Semantic objects". These are things which make use of many RDF triples to present a single object type such as a contact, calendar event, location, or explicit train trip. Note that the RDF docker only shows you the semantic objects for the RDF which is relevant to the current document cursor position.

The Document Information window also lets you get at all of the RDF which ships with an ODF file. The Semantic tab is very similar to the RDF docker but shows all the Semantic Objects regardless of where they are relevant in the document (if at all). As you can see below, when editing the "Dan" person semantic object you can set their name, nickname, phone number, and homepage. Of course, more information is relevant to people and this whole section should be expanded to cater for that. And yes, for Calligra having a good hookup to Akonadi would be of great use for all.

Contacts use the FOAF RDF schema in Calligra. This allows not only contact information but also the relations between contacts to be expressed. FOAF is Friend of a Friend after all. Looking at the above you might think name, phone etc are each going to be a triple in the RDF from the document. The triples tab lets you get at that lower level RDF goodness as shown below. A few things to note: while RDF is triples, each object also has a type (is it a chunk of text or a link to another subject?), and since the RDF/XML can come from many possible "files", the source file is tracked for each triple so that it can go back there on save.
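As a rough illustration of that bookkeeping (not Calligra's actual data model), a triple record might look like the Python sketch below. The `Triple` class, the `fold_contact` helper, the example URIs, and the file names are all hypothetical; only the FOAF namespace and the property names `name`, `nick`, `homepage`, and `knows` come from FOAF itself.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    obj_type: str     # "literal" for a chunk of text, "uri" for a link to another subject
    source_file: str  # which RDF/XML file the triple came from, for saving it back


def fold_contact(triples, subject):
    """Collect the FOAF properties of one subject into a plain dict."""
    contact = {}
    for t in triples:
        if t.subject == subject and t.predicate.startswith(FOAF):
            contact[t.predicate[len(FOAF):]] = t.obj
    return contact


dan = "http://example.org/people#dan"  # hypothetical subject URI
triples = [
    Triple(dan, FOAF + "name", "Dan", "literal", "contacts.rdf"),
    Triple(dan, FOAF + "nick", "dan", "literal", "contacts.rdf"),
    Triple(dan, FOAF + "homepage", "http://example.org/~dan", "uri", "contacts.rdf"),
    Triple(dan, FOAF + "knows", "http://example.org/people#fred", "uri", "friends.rdf"),
]

print(fold_contact(triples, dan))
```

The folded dict is roughly what a "person" semantic object presents, while the list of `Triple` records is what the triples tab would show.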


Notice in the above that prefixes are used in the subject, predicate, and object columns. This is an attempt to make the raw RDF less verbose and somewhat simpler to handle. The namespaces tab lets you set these up. Any namespaces that are used in the RDF/XML from the ODF file are automatically added and used for you.
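The prefix idea is simple enough to sketch in a few lines of Python. This is a minimal illustration, not how Calligra implements it; the `shorten` and `expand` helpers and the `NAMESPACES` table are hypothetical names, while the two namespace URIs are the standard FOAF and RDF ones.

```python
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def shorten(uri, namespaces=NAMESPACES):
    """Replace a known namespace URI with its prefix; leave other URIs alone."""
    # Try the longest namespace first so nested namespaces resolve correctly.
    for prefix, base in sorted(namespaces.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri


def expand(name, namespaces=NAMESPACES):
    """Turn prefix:local back into a full URI when the prefix is known."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return name


print(shorten("http://xmlns.com/foaf/0.1/name"))
print(expand("foaf:name"))
```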

The stylesheets tab I'll cover at another time. The SPARQL tab lets you run a query against all the RDF for the document. The one I've run here is the default one that Calligra shows you, which will select all the triples from the document without restriction. The resulting subject, predicate, and object are shown in the bottom half of the window.
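A "select everything" query has roughly the shape shown in the Python sketch below. The exact text of Calligra's default query is an assumption here, and the `match` function is only a toy stand-in for a SPARQL engine: it treats `None` as an unbound variable and does plain equality on the rest. The example triples are made up.

```python
# Assumed shape of a "select all triples" query; not copied from Calligra.
SELECT_ALL = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as an unbound variable."""
    pattern = (s, p, o)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


triples = [
    ("ex:fred", "foaf:name", "Fred"),
    ("ex:fred", "foaf:phone", "tel:+1-555-0100"),
    ("ex:dan", "foaf:name", "Dan"),
]

# All three positions unbound: every triple comes back, like SELECT_ALL.
print(match(triples))
# Binding the predicate narrows the result to the names only.
print(match(triples, p="foaf:name"))
```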


I was thinking about all of this recently because I'm now looking to add GUI stuff to Abiword to allow RDF interaction. The first idea was to simply add an "edit RDF" context menu item to allow you to associate one or more triples with the cursor position or current selection. The ability to define and reuse namespaces would also help to make such a dialog less painful to use. This brings the design close to the combined "Triples" and "Namespaces" tabs of the Calligra Document Information window. This might be OK for determined users who already really, really know they want to do these things. But I tend to think there are more folks who could take advantage of using RDF without necessarily caring about it.

Simplicity for users was the driving force behind the design of Semantic Objects and the use of Drag and Drop to and from other applications to create and harvest RDF data. I think it is much simpler to grab the "Fred" contact from Evolution and drop it into the document than to work out that you want to use FOAF and the exact predicates to create a well defined RDF graph for the Fred contact and then copy and paste each of those pieces of data individually.

One might like to consider the Triples+Namespaces as a special type of Semantic Object, a "raw" object if you will. This brings together the design of the advanced and user friendly interaction into a single dialog. As the namespaces are likely to have whole document scope they can be set up and edited elsewhere. Unfortunately I had a bit of trouble working out how to populate a tree or list in glade-2 or glade-3 for mock ups, so these are gimped a bit too.

The dialog below is a semantic object editor with the advanced tab allowing raw interaction. As there can be zero or more semantic objects of a given type in scope at any point there is a list on the left side allowing you to choose which object of a type to view. Perhaps that should be a drop down list at the top of the tab to save screen space.

The email and VoIP links should start a new message or request a phone call with the person respectively. Such actions should also be available without getting to the editor itself. My current plan is to have the advanced tab allow interaction with the raw triples. Remember though that triples carry type information, possibly extra context, and/or perhaps a range of the revisions in a change tracked document that the triple is valid for. So it's by no means just a list with three columns as the name triples might at first imply.
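To show why three columns are not enough, here is a minimal Python sketch of a triple that also carries a type, an optional context, and a revision range. The `TrackedTriple` class, its field names, and the example data are hypothetical; this is not Abiword's internal representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackedTriple:
    subject: str
    predicate: str
    obj: str
    obj_type: str = "literal"       # e.g. "literal" or "uri"
    context: Optional[str] = None   # optional extra context for the triple
    first_rev: int = 0              # first revision the triple is valid for
    last_rev: Optional[int] = None  # None means still valid

    def valid_at(self, revision):
        """True when the triple applies to the given document revision."""
        if revision < self.first_rev:
            return False
        return self.last_rev is None or revision <= self.last_rev


def triples_at(triples, revision):
    """Keep only the triples that are valid for one revision."""
    return [t for t in triples if t.valid_at(revision)]


fred = "http://example.org/people#fred"  # hypothetical subject
history = [
    TrackedTriple(fred, "foaf:phone", "tel:+1-555-0100", "uri", first_rev=1, last_rev=3),
    TrackedTriple(fred, "foaf:phone", "tel:+1-555-0199", "uri", first_rev=4),
]

print([t.obj for t in triples_at(history, 2)])
print([t.obj for t in triples_at(history, 5)])
```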

A somewhat problematic first blush at this gives the below. I'm thinking that the subj, pred, and object strings can be namespace:foo strings, possibly with some completion for known namespaces like foaf, et al. The type is fairly OK as these are fixed and mainly URI or Object.


The revision range selection is a real challenge. This might become some sort of date range bar like the timeline or timeplot from the simile widgets. The trick, as usual, is extrapolating the extra dimension from what is in its vanilla sense a linear one dimensional data set (time, revision). Though having the revisions and their descriptions in the top half of the timeline and the ability to pan and zoom seeing a density plot in the lower half would work for starters.


I'm thinking that, as well as showing you all the triples, it might be preferable to allow a simple one or two line SPARQL query to be run to find the triples to edit. Perhaps it doesn't add much for a small document range with only 20 triples associated, but when using the dialog on the whole document you might want to limit triples to "current revision" and FOAF related only, for example. Using a triple list allows you to sort by column and search, but such a search could also be performed with relatively simple SPARQL. And, extremely unfortunately, one normally doesn't get to stable sort lists by 2+ columns. A limitation I try to avoid inflicting.
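For what it's worth, both halves of that wish are cheap once the triples are in a plain list. The Python sketch below uses made-up rows and a `foaf:` prefix test as the filter; it shows a sort by two columns done in one pass with a compound key, and the same result reached by two stable single-column sorts, which is what clicking column headers would do if the list widget sorted stably.

```python
from operator import itemgetter

# Made-up (subject, predicate, object) rows for illustration.
rows = [
    ("ex:fred", "foaf:phone", "tel:+1-555-0100"),
    ("ex:dan", "foaf:name", "Dan"),
    ("ex:trip", "ex:departs", "11:00"),
    ("ex:fred", "foaf:name", "Fred"),
    ("ex:dan", "foaf:phone", "tel:+1-555-0123"),
]

# Limit to FOAF related triples only.
foaf_only = [row for row in rows if row[1].startswith("foaf:")]

# Sort by predicate, then subject, in one pass using a compound key.
by_pred_then_subj = sorted(foaf_only, key=itemgetter(1, 0))

# Same ordering from two stable sorts: secondary column first, primary last.
stepwise = sorted(sorted(foaf_only, key=itemgetter(0)), key=itemgetter(1))

for row in by_pred_then_subj:
    print(row)
```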

So in summary, raw triple editing can be just an advanced semantic object. The list of semantic objects should be able to be found from a document position (cursor) or arbitrary begin-end range. The latter caters for whole document RDF editing as a special case. For contacts there might be one or more semantic objects for any doc position or range, but there will only be one raw-triple semantic object for any range.

Though I'm still chucking around how to make the query/edit part most convenient for users for the raw triples semantic object.
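The lookup described in the summary could be sketched in Python as below. Everything here is hypothetical (the `SemanticObject` class, the half-open character ranges, and the example objects); it only illustrates that a cursor lookup and a whole document lookup are the same range query with different bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticObject:
    kind: str   # e.g. "contact", "event", "location"
    label: str
    start: int  # first document position covered (inclusive)
    end: int    # end of the covered span (exclusive)


def objects_in_range(objects, begin, end):
    """Objects whose span overlaps the half-open range [begin, end)."""
    return [o for o in objects if o.start < end and begin < o.end]


def objects_at(objects, position):
    """Objects relevant to a single cursor position."""
    return objects_in_range(objects, position, position + 1)


objects = [
    SemanticObject("contact", "Fred", 10, 14),         # the four characters "Fred"
    SemanticObject("contact", "Fred (work)", 10, 14),  # a second contact on the same text
    SemanticObject("location", "the Mall", 40, 48),
]
document_length = 100  # made-up length, used for the whole document case

print([o.label for o in objects_at(objects, 12)])
print([o.label for o in objects_in_range(objects, 0, document_length)])
```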