Friday, May 28, 2010

Second FISE Hackathon


At this week's IKS meeting in Paderborn, the second FISE Hackathon took place. FISE is an open source semantic engine that provides semantic annotation algorithms such as semantic lifting. The actual annotation algorithms are pluggable through OSGi. Existing CMSs can integrate the engine through an HTTP interface (inspired by Solr). Last week, Bertrand gave an introductory talk about FISE that is available online.


There was no explicitly set goal for the second Hackathon. Rather, the existing code base was extended in various directions. Some examples:

  • a language detection enhancement engine (I am particularly glad to see this - automatic language detection in CMSs is a pet passion of mine)
  • a UI for FISE users that allows humans to resolve ambiguities
  • a JCR-based storage engine for the content and annotations (this one I coded myself)

There was also a good amount of work done on the annotation structure used by FISE and documented on the IKS wiki.

A complete report of the Hackathon is available on the IKS wiki (the only thing it fails to mention: the event's good spirit).

One major non-code step was to get many participants up to speed with the FISE engine, enabling them to deploy it and to get accustomed to its architecture and code base.

It was only last week that I took a deeper look into FISE. I like its architecture a lot. The HTTP interface makes it easy to play with FISE as well as to integrate it. Even more important, the pluggable architecture, which is mostly inherited from the OSGi services architecture, makes FISE very flexible and extensible. This is particularly important given the different natures of the enhancement engines that we want to be able to deploy (hosted services, proprietary, open source, etc). I consider FISE to be a particularly well suited use case for OSGi.
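To give an idea of what "playing with FISE over HTTP" could look like from a client, here is a minimal Python sketch. Everything specific in it is my assumption for illustration, not documented FISE behaviour: the URL http://localhost:8080/engines, the plain-text request body, and the RDF/XML Accept header. Check the IKS wiki for the actual endpoint and formats.

```python
import urllib.request

# Hypothetical endpoint: host, port and path are assumptions for this sketch.
FISE_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, url=FISE_URL):
    """Build (but do not send) a POST request carrying plain text to annotate."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            # Assumed content negotiation: plain text in, RDF/XML out.
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_enhancement_request("The IKS meeting took place in Paderborn.")
    print(request.get_method(), request.full_url)
    # To actually send it against a running engine:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

The point is only that a CMS needs nothing beyond an HTTP client to talk to the engine; which enhancement engines run behind the endpoint is decided on the OSGi side.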

(cross-posting from here)

Saturday, April 10, 2010

NoSQL talk at Developer Summit

Three days ago I had the chance to talk about NoSQL at the Internet Briefing's Developer Summit. On top of general ideas and concepts like the CAP theorem I chose to talk about Apache Jackrabbit, CouchDB and Cassandra. My slides are embedded below.

It was a really good event with interesting speakers and a knowledgeable audience. I was especially pleased that when I talked about CouchDB's HTTP API someone from the audience mentioned that Apache Sling does something very similar for Jackrabbit.

Special kudos to Christian Stocker of Liip for daring to do a live demo of the "real-time web" - he took a picture from his phone and had it pop up on Jabber and Twitter in about 5 secs.

Vlad Trifa has posted a good summary of the whole event (part 1, part 2) - he also gave a great presentation about the application of the REST architectural style to the "Web of Things".


No Sql

Friday, March 12, 2010

CMS vendors now and then

CMS analyst Janus Boye has just published a post on CMS vendors that discontinue their products (because they get bought out or similar):
"During the past 10 years, a number of software products used by online professionals have been discontinued"
That sentence reminded me that I had given a talk almost 10 years ago (in 2001, to be exact) that contained a slide on the CMS market at that time:



The circles denote vendors that were part of CMS market overview articles by popular German IT magazines in that year (I wanted to show how differently the market place could be perceived). A vendor placed in any of the circles had enough attention to be part of at least one evaluation. The vendors outside of the circles were not part of any of these overview articles, but somehow present in the market place - at least I knew their names back then.

It is interesting to look at the landscape from that time. Of course there are a number of well-known vendors that got bought (Vignette, Obtree, Gauss), but the majority still seems to linger on - at least their web sites still exist, for example iRacer, Schema Text, or Contens.

On the other hand, one can ask how many vendors that were important enough to make it into a (German) market overview are still relevant in the market place today. I have used Janus Boye's spreadsheet of relevant European CMS vendors as a benchmark and checked which vendors on today's list were already in 2001's presentation: Day, Coremedia and Open Text were "in the circles". Tridion was there, but outside of the circles. The rest of the vendors that Janus considers relevant today were not on my radar in 2001.

The end of my presentation involved a couple of CMS-related predictions. Let's see how I did. I predicted:
  • product borders between CMS, DMS and app servers will blur further - my take now: wrong. I do not think that these borders are more blurry than they were in 2001.
  • more standards and standards-based software (Java, JSP/ASP, XML, XSL) - true. The underlying technologies of CMSs are more homogeneous than they were at that time. Remember TCL?
  • But no true compatibility. True. Nothing more to say.
  • Improved Personalization. Improved Multi-Channel support. Both not really true, but rather fads of those days.
  • Improved DMS features and Office integration. Don't ask me why I said that.
  • No quick market consolidation in sight. Right on the money here.
Mostly correct on general market considerations, mostly wrong on features.

Saturday, January 09, 2010

mp3tagger on GitHub

I have received quite a bit of feedback and feature requests on the mp3 tagger post. Therefore, I thought it might be a good idea to do "social coding" and put the code on GitHub where it can easily be forked (and the forks can be watched).

Other than that, the latest version of the tagger contains these improvements:
  • the Last.fm keys and secret are not stored in the code anymore, but entered on the first run and stored in ~/.mp3tagger.cfg
  • you can run the script in two additional modes: simulation and ask. In simulation mode no changes to mp3s will be saved, in ask mode you will be asked to save each change. Start the script with flags "-m simulation" or "-m ask", respectively.
  • It is now possible to specify a list of genre tags that will be considered (in addition to the mp3 default genre tags). The list needs to be stored in a config file at ~/.mp3tagger_genres.cfg (in the "generic" section of the file). The full format this file needs to have is shown below.
  • The last improvement is a tricky one: after tagging all my mp3s I ended up with hundreds of albums tagged with genre Electronic or Indie. I wanted to refine these genres into sub-genres. This again works by putting a list of possible sub-genres into ~/.mp3tagger_genres.cfg and running the tagger with flag "-r genre", e.g. "-r Electronic". You would run this option when you find that you have too many albums of one genre and want to split them up.
So in summary my config file ~/.mp3tagger_genres.cfg looks like:


[generic]
genres=Shoegaze,Dubstep,Grime,Dub,Drum And Bass
[refinements]
Electronic=Idm,Turntableism,Techno,Minimal,Dub,Big Beat,Ambient,Breakbeat,House,Lounge,Electroclash,Drum And Bass,Chillout
Indie=Indie Rock,Indie Pop,Singer-Songwriter,Shoegaze,Post-Rock,Americana,New Wave,Alt-Country
Reggae=Dancehall,Dub,Ska
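
If you want to check such a file before running the tagger, a file in this format can be read with Python's standard configparser module. The sketch below is not the tagger's own code: the function names split_list and parse_genre_config are made up for illustration, and the sample text is a shortened copy of the file above.

```python
import configparser

# Shortened sample in the same format as ~/.mp3tagger_genres.cfg
SAMPLE = """\
[generic]
genres=Shoegaze,Dubstep,Grime,Dub,Drum And Bass
[refinements]
Electronic=Idm,Techno,Minimal,Dub
Reggae=Dancehall,Dub,Ska
"""


def split_list(value):
    """Split a comma-separated value, trimming blanks and dropping repeats."""
    items = []
    for item in value.split(","):
        item = item.strip()
        if item and item not in items:
            items.append(item)
    return items


def parse_genre_config(text):
    """Return (generic genres, {genre: sub-genres}) from config text."""
    parser = configparser.ConfigParser(interpolation=None)
    # By default configparser lowercases option names; keep "Electronic" as is.
    parser.optionxform = str
    parser.read_string(text)
    generic = split_list(parser.get("generic", "genres", fallback=""))
    refinements = {}
    if parser.has_section("refinements"):
        for genre, value in parser.items("refinements"):
            refinements[genre] = split_list(value)
    return generic, refinements


if __name__ == "__main__":
    generic, refinements = parse_genre_config(SAMPLE)
    print("additional genres:", generic)
    # The sub-genres that a run with "-r Electronic" would choose from:
    print("Electronic ->", refinements.get("Electronic", []))
```

To read the real file instead of the sample, pass the contents of os.path.expanduser("~/.mp3tagger_genres.cfg") to parse_genre_config. Note the optionxform line: without it the genre names in the "refinements" section come back lowercased, so a lookup for "Electronic" would miss.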