Sunday, November 15, 2009

Running the iTunes genre tagger script with OS X Automator

Due to public demand here's a little recipe how to run last post's mp3 tagger without using the command line on OS X:
  • Open Automator
  • Start a new "Application" project
  • Drag the "Run Shell Script" action into the right workflow panel, set the "pass input" drop-down to "as arguments" and edit the script to (see screenshot below):
for f in "$@"
do
/opt/local/bin/python /Users/michaelmarth/Development/Code/mp3tagger/tag_groupings.py -d "$f"
done

(you will have to adapt the paths to your local setup)
  • Save the application and happily start dropping mp3 folders onto the application's icon.

Tuesday, November 03, 2009

Python script to set genre in iTunes with Last.fm tags

Now that I have started to seriously use iTunes I figured it might be nice to have the genre tag set in a meaningful way. Since I have a reasonably large collection of mp3s doing that manually was out of question - I wrote me a Python script to do that. There seems to be a large demand for such a functionality (at least I found a lot of questions on how to automatically set the genre tag) so maybe someone else finds the script useful. It is pasted below.

General Strategy

The basic idea is to use Last.fm's tags for genre tagging. In iTunes the genre tag is IMO best used when it only contains one single genre, i.e. something like "Electronica", not something like "Electronica / Dance". On the other hand dropping all but one tag would lose a lot of information, so I decided to use the groupings tag for additional information that is contained in the list of tags that an artist has on Last.fm. In the example above that would be something like "Electronica, Dance, 80s, German". In that way it is simple to use iTunes' Smart Playlist feature to create play lists of all, say, dance music. This approach is probably not suitable for classical music..

The ID3 field that is exposed in iTunes' UI as "grouping" is defined in the ID3v2 spec as:
TIT1
The 'Content group description' frame is used if the sound belongs to a larger category of sounds/music. For example, classical music is often sorted in different musical sections (e.g. "Piano Concerto", "Weather - Hurricane").
So, the strategy I described above seems to be kind of in line with the spec. In general, it is a good idea to have a look at the ID3v2 spec if you consider dabbling with mp3 tags.

Practical Considerations

If one would just take an artist's highest-rated Last.fm tag for the genre one would end up with pretty inconsistent genre tags (think "hip-hop", "hip hop", and "hiphop"). Therefore, I chose to use a fixed set of values for genre. In a previous version of ID3 the list of possible genres was fixed. While this is clearly a terrible idea to start with it came along handy in this case. I used his as a fixed list for genres.

The second practical consideration was which Last.fm tags to include. In Last.fm parlance each artist tag comes with a weight (values form 0 to 100). Selecting only the tags with weight larger than 50 worked out fine for me (usually I had 1-5 tags per artist).

A third thing you might want to be aware of: if you programmatically change tags in an mp3 iTunes will not pick up these changes automatically. A simple way of letting it know: select the "Get Info" command on these items. This will trigger a reload of the new tag values.

Script

To run the script you will need the Python libraries mutagen and pylast installed. Run it with the option
-d directory_with_mp3s
The script will walk along this directory and modify all mp3s it finds. Also, you will need a Last.fm API key and set your API_KEY and API_SECRET accordingly in the script.


#!/usr/bin/env python
# encoding: utf-8
"""
tag_groupings.py

Created by Michael Marth on 2009-11-02.
Copyright (c) 2009 marth.software.services. All rights reserved.
"""

import sys
import getopt
import pylast
import os.path
from mutagen.id3 import TCON, ID3, TIT1

help_message = '''
Adds ID3 tags to mp3 files for genre and groupings. Tag values are retrieved from Last.FM. Usage:
-d mp3_directory
'''

class Usage(Exception):
def __init__(self, msg):
self.msg = msg

all_genres = TCON.GENRES
genre_cache = {}
groupings_cache = {}
API_KEY = "your key here"
API_SECRET = "your secret here"
network = pylast.get_lastfm_network(api_key = API_KEY, api_secret = API_SECRET)

def artist_to_genre(artist):
if genre_cache.has_key(artist):
return genre_cache[artist]
else:
tags = network.get_artist(artist).get_top_tags()
for tag in tags:
if all_genres.__contains__(tag[0].name.title()):
genre_cache[artist] = tag[0].name.title()
print "%20s %s" % (artist,tag[0].name.title())
return tag[0].name.title()

def artist_to_groupings(artist):
if groupings_cache.has_key(artist):
return groupings_cache[artist]
else:
tags = network.get_artist(artist).get_top_tags()
relevant_tags = []
for tag in tags:
if int(tag[1]) >= 50:
relevant_tags.append(tag[0].name.title())
groupings = ", ".join(relevant_tags)
groupings_cache[artist] = groupings
print "%20s %s" % (artist,groupings)
return groupings

def walk_mp3s():
for root, dirs, files in os.walk('.'):
for name in files:
if name.endswith(".mp3"):
audio = ID3(os.path.join(root, name))
artist = audio["TPE1"]
genre = artist_to_genre(artist[0])
grouping = artist_to_groupings(artist[0])
if genre != None:
audio["TCON"] = TCON(encoding=3, text=genre)
if grouping != None:
audio["TIT1"] = TIT1(encoding=3, text=grouping)
audio.save()

def main(argv=None):
if argv is None:
argv = sys.argv
try:
try:
opts, args = getopt.getopt(argv[1:], "ho:vd:", ["help", "output="])
except getopt.error, msg:
raise Usage(msg)

# option processing
for option, value in opts:
if option == "-v":
verbose = True
if option in ("-h", "--help"):
raise Usage(help_message)
if option in ("-o", "--output"):
output = value
if option in ("-d"):
try:
os.chdir(value)
except Exception,e:
print "error with directory " + value
print e
walk_mp3s()

except Usage, err:
print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
print >> sys.stderr, "\t for help use --help"
return 2

if __name__ == "__main__":
sys.exit(main())

Monday, November 02, 2009

Interviewed by Internet Briefing Blog

Reto Hartinger of the Internet Briefing group has interviewed me about what I work on these days. The interview is in German.

Sunday, October 18, 2009

Colayer's approach to collaboration software

Chances are you have not heard of Colayer, a Swiss-Indian company producing a SaaS-based collaboration software. I did a small project with the guys, that is how I got to know. When I first saw their product I immediately thought the guys are onto something good, so it is worthwhile to share a bit of their application concepts. They follow an approach I have not seen anywhere else.

On first look Colayer seems to be a mixture between wikis and forums: the logical data structure resembles a hierarchical web-based forum and the forum entries are editable and versioned like in a wiki. But there is more: presence and real-time. All users that are currently logged in are visible and one can have real-time chats within the context of the page one is in or see updates to the page in real-time (similar as in Google Docs). These chats are treated as atomic page elements (called sems in Colayer parlance) just like the forum entries or other texts. Through this mechanism, all communication around one topic stays on one page and in the same context.

There are two more crucial elements: time and semantics. All sem's visibility is controlled by their age and their importance. As such, a simple chat is given less weight that a project decision and will fade out of view after some time. All new items from all pages (i.e. discussions or topics) are aggregated on a personal home page and shown within the context where they occurred.

Below is a screenshot of such different sems in one page. One page corresponds to one topic or forum or wiki page. You can see the hierarchical model and the different semantics (denoted by the colors).



Here is an example screen shot that aggregates different recent sems on one page (essentially a context-aware display of new items including time and context in the same display). Note that this way of displaying new items manages to map importance, time and context into a two-dimensional page, which I find a very cool achievement.



The funny thing about Colayer's product (especially when compared to Google Wave) is that one "gets it" when first looking at it. It solves a problem I am facing in my work on a daily basis: where to put or find crucial information - on an internal mailing list or on the wiki?

The Colayer application is delivered as a browser-based SaaS solution (mainly targeted towards company-internal collaboration). This limits potential usage scenarios outside of the firewall. It would be cool if Colayer found a way of opening up their application to other data sources or consumers. It would be worth it, the app rocks.

Thursday, October 01, 2009

Found out about FlexEvent.UPDATE_COMPLETE

Here is a small bit about the asynchronous nature of the Flex layout mechanisms that I learned while slapping together a presales demo yesterday:

When changing properties of UIComponents listen for FlexEvent.UPDATE_COMPLETE events. They get fired when the change is actually done. In my case I needed to get the textWidth of a Label after changing the label text. Right after calling the setter of text the getter of textWidth will still return the old value.

[Some background reading]

Tuesday, September 29, 2009

Talk at Java User Group

Yesterday, I gave a talk at the Java User Group Switzerland (JUGS) titled "Agile RESTful Web Development". It was about the REST style in principle and hands-on RESTful development with Apache Sling. I enjoyed giving the talk and think that it was well received. Here's the slide deck:

Thursday, September 17, 2009

Ruby script for generating Google sitemaps

I just wrote a small Ruby script to generate a Google sitemap out of file directory. I thought that it might come handy as a quick start for someone, so here is the code (requires the builder gem)

require 'builder'
htmlfiles = Dir.glob("*.html")
x = Builder::XmlMarkup.new(:target => $stdout, :indent => 1)
x.instruct!
x.urlset( "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9" ) {
for file in htmlfiles do
x.url{x.loc "http://www.example.com/#{file}"; x.lastmod "2009-09-17"; x.changefreq "monthly"; x.priority "0.8"}
end
}