Tuesday, November 03, 2009

Python script to set genre in iTunes with Last.fm tags

Now that I have started to use iTunes seriously, I figured it would be nice to have the genre tag set in a meaningful way. Since I have a reasonably large collection of mp3s, doing that manually was out of the question, so I wrote a Python script to do it. There seems to be a lot of demand for this kind of functionality (at least I found a lot of questions on how to set the genre tag automatically), so maybe someone else will find the script useful. It is pasted below.

General Strategy

The basic idea is to use Last.fm's tags for genre tagging. In iTunes the genre tag is IMO best used when it contains only a single genre, i.e. something like "Electronica", not something like "Electronica / Dance". On the other hand, dropping all but one tag would lose a lot of information, so I decided to use the grouping tag for the additional information contained in the list of tags that an artist has on Last.fm. In the example above that would be something like "Electronica, Dance, 80s, German". That way it is simple to use iTunes' Smart Playlist feature to create playlists of all, say, dance music. This approach is probably not suitable for classical music.

The ID3 field that is exposed in iTunes' UI as "grouping" is defined in the ID3v2 spec as:
TIT1
The 'Content group description' frame is used if the sound belongs to a larger category of sounds/music. For example, classical music is often sorted in different musical sections (e.g. "Piano Concerto", "Weather - Hurricane").
So, the strategy I described above seems to be kind of in line with the spec. In general, it is a good idea to have a look at the ID3v2 spec if you consider dabbling with mp3 tags.
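To make the spec a bit more concrete, here is a minimal sketch of what a TIT1 text frame looks like on disk, assuming the ID3v2.4 layout (4-byte frame ID, 4-byte synchsafe size, 2 flag bytes, then one encoding byte followed by the text). The helper names synchsafe and text_frame are made up for illustration and are not part of the script, and the sketch is written for Python 3. In practice mutagen builds these frames for you.

```python
def synchsafe(n):
    """Encode n as a 4-byte synchsafe integer (7 usable bits per byte)."""
    if not 0 <= n < 2 ** 28:
        raise ValueError("size out of range for a synchsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def text_frame(frame_id, text):
    """Build an ID3v2.4 text frame (hypothetical helper, illustration only)."""
    # Encoding byte 0x03 means UTF-8, the same value the script passes
    # to mutagen as encoding=3.
    body = b"\x03" + text.encode("utf-8")
    header = frame_id.encode("ascii") + synchsafe(len(body)) + b"\x00\x00"
    return header + body


frame = text_frame("TIT1", "Electronica, Dance, 80s, German")
```

The first ten bytes of frame are the header. Everything after that is the encoding byte plus the grouping string.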

Practical Considerations

If one just took an artist's highest-rated Last.fm tag as the genre, one would end up with pretty inconsistent genre tags (think "hip-hop", "hip hop", and "hiphop"). Therefore, I chose to use a fixed set of values for genre. In a previous version of ID3 (ID3v1) the list of possible genres was fixed. While this is clearly a terrible idea to start with, it came in handy in this case: I used this list as the fixed set of genres. The script takes the first of an artist's top tags that appears in the list.
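The matching against a fixed genre list can be sketched as follows. This is a slightly more forgiving variant than the script, which only title-cases the tag name before comparing: here punctuation and spaces are ignored too, so all three spellings of hip-hop end up as the same genre. The list below is a small excerpt of the ID3v1 genres (mutagen exposes the full list as TCON.GENRES), the function names are invented for this example, and the code is written for Python 3.

```python
import re

# Excerpt of the ID3v1 genre list, for illustration only.
ID3V1_GENRES = ["Blues", "Classic Rock", "Country", "Dance", "Hip-Hop",
                "Jazz", "Metal", "Pop", "Rock", "Techno"]


def _key(name):
    """Lower-case the name and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Map the normalized spelling to the canonical genre name.
GENRE_BY_KEY = {_key(genre): genre for genre in ID3V1_GENRES}


def first_known_genre(tag_names):
    """Return the canonical genre for the first tag that matches, else None."""
    for name in tag_names:
        genre = GENRE_BY_KEY.get(_key(name))
        if genre is not None:
            return genre
    return None


print(first_known_genre(["seen live", "hip hop", "rap"]))  # Hip-Hop
```

Tags are checked in the order Last.fm returns them, so the most popular matching tag wins.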

The second practical consideration was which Last.fm tags to include. In Last.fm parlance each artist tag comes with a weight (values from 0 to 100). Selecting only the tags with a weight of at least 50 worked out fine for me (usually I had 1-5 tags per artist).
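Stripped of the Last.fm calls, the weight filter boils down to a few lines. The sketch below works on plain (name, weight) pairs instead of pylast objects, mirrors the script's >= 50 comparison, and uses made-up sample data. The function name grouping_from_tags and the min_weight parameter are assumptions for this example, and the code is written for Python 3.

```python
def grouping_from_tags(weighted_tags, min_weight=50):
    """Join the title-cased names of all tags with weight >= min_weight."""
    names = [name.title()
             for name, weight in weighted_tags
             # Weights may arrive as strings, so convert before comparing.
             if int(weight) >= min_weight]
    return ", ".join(names)


# Invented sample data, not real Last.fm output.
sample = [("electronic", "100"), ("dance", "72"),
          ("german", "50"), ("seen live", "12")]

print(grouping_from_tags(sample))  # Electronic, Dance, German
```

One thing to be aware of: str.title() also capitalizes a letter that follows a digit, so a tag like "80s" comes out as "80S".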

A third thing you might want to be aware of: if you programmatically change tags in an mp3, iTunes will not pick up these changes automatically. A simple way of letting it know is to select the affected items and choose the "Get Info" command. This will trigger a reload of the new tag values.

Script

To run the script you will need the Python libraries mutagen and pylast installed. Run it with the option
-d directory_with_mp3s
The script will walk through this directory and its subdirectories and modify all mp3s it finds. You will also need a Last.fm API key; set API_KEY and API_SECRET accordingly in the script.


#!/usr/bin/env python
# encoding: utf-8
"""
tag_groupings.py

Created by Michael Marth on 2009-11-02.
Copyright (c) 2009 marth.software.services. All rights reserved.
"""

import sys
import getopt
import pylast
import os.path
from mutagen.id3 import TCON, ID3, TIT1

help_message = '''
Adds ID3 tags to mp3 files for genre and groupings. Tag values are retrieved from Last.FM. Usage:
-d mp3_directory
'''

class Usage(Exception):
    def __init__(self, msg):
        self.msg = msg

all_genres = TCON.GENRES
genre_cache = {}
groupings_cache = {}
API_KEY = "your key here"
API_SECRET = "your secret here"
network = pylast.get_lastfm_network(api_key = API_KEY, api_secret = API_SECRET)

def artist_to_genre(artist):
    if genre_cache.has_key(artist):
        return genre_cache[artist]
    else:
        tags = network.get_artist(artist).get_top_tags()
        for tag in tags:
            if all_genres.__contains__(tag[0].name.title()):
                genre_cache[artist] = tag[0].name.title()
                print "%20s %s" % (artist,tag[0].name.title())
                return tag[0].name.title()

def artist_to_groupings(artist):
    if groupings_cache.has_key(artist):
        return groupings_cache[artist]
    else:
        tags = network.get_artist(artist).get_top_tags()
        relevant_tags = []
        for tag in tags:
            if int(tag[1]) >= 50:
                relevant_tags.append(tag[0].name.title())
        groupings = ", ".join(relevant_tags)
        groupings_cache[artist] = groupings
        print "%20s %s" % (artist,groupings)
        return groupings

def walk_mp3s():
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith(".mp3"):
                audio = ID3(os.path.join(root, name))
                artist = audio["TPE1"]
                genre = artist_to_genre(artist[0])
                grouping = artist_to_groupings(artist[0])
                if genre != None:
                    audio["TCON"] = TCON(encoding=3, text=genre)
                if grouping != None:
                    audio["TIT1"] = TIT1(encoding=3, text=grouping)
                audio.save()

def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        try:
            opts, args = getopt.getopt(argv[1:], "ho:vd:", ["help", "output="])
        except getopt.error, msg:
            raise Usage(msg)

        # option processing
        for option, value in opts:
            if option == "-v":
                verbose = True
            if option in ("-h", "--help"):
                raise Usage(help_message)
            if option in ("-o", "--output"):
                output = value
            if option == "-d":
                # change into the mp3 directory, then tag everything below it
                try:
                    os.chdir(value)
                except Exception,e:
                    print "error with directory " + value
                    print e
                walk_mp3s()

    except Usage, err:
        print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
        print >> sys.stderr, "\t for help use --help"
        return 2

if __name__ == "__main__":
    sys.exit(main())

15 comments:

Mike T. said...

I've been looking for *exactly* this for over a year now... but I'm not very confident doing anything in Terminal. Could this be made into a standalone app, or possibly an iTunes plug-in?

Michael Marth said...

Mike, glad you find it useful. I was looking into non-terminal usage of the script myself as well, but have not implemented anything, yet. If you're on OS X I'd like to point you to Automator. Also, on OS X saving a script with a .command extension will make it double-clickable, but I do not know how that would pipe in the mp3 directory.
If I do something in that area I'll post an update.

Mike T. said...

I am indeed on OS X... With a little bit of hand-holding, I would love to give this script a shot. Could the script be saved as an action I could import into Automator?

I think a lot of people will find your script supremely useful. Thanks.

Michael Marth said...

Mike, I've written a little tutorial on how to use the script with OS X Automator. Enjoy.

Bram said...

Tagging my library now, works great!

Liam said...

Hi Michael, I'm so glad you made something to do this. However, I'm having a little trouble running the script, both from Terminal and as an app.

Terminal indicates that the problem is happening on line 38, in the artist_to_genre method:
if all_genres.__contains__(tag[0].name.title()):
It also says "KeyError: 0"

I'm not really experienced with Python or the Last.FM API, could you give me an idea of what's going wrong here?

Thanks, and my email is my name with no space or underscore at gmail.com by the way

Michael Marth said...

Liam,

I'm glad you find the script useful (in principle at least :) ).
Unfortunately, it is not really robust at the moment, but since there are some people who use it I will probably improve it some time soon.

Your problem might be caused by an mp3 by an artist that has weird tags on Last.fm. Can you let me know which artist it is?

I could not reach you at liam (at) gmail and did not see your full name. You can reach me at michael.marth (at) gmail

Michael Marth said...

For the benefit of others: Liam's error was caused by using an older version of Python. I developed and tested with version 2.6.2.

Pio said...

Wow, this is really great. I was thinking of writing the exact script and someone on IRC linked me to yours. Thanks!!

Nick Losier said...

Hey, does this work with M4A files (unprotected AAC)? It doesn't seem to be from my tests. Thanks.

Michael Marth said...

Hi Nick,
no, I don't think it does. The latest version on Github also supports .ogg and .flac. In order to support m4a you would need to add some code to the method "walk_audio_files", something like this (after the identical code fragments for flac and ogg in that method):

elif name.lower().endswith(".m4a"):
    try:
        audio = M4A(os.path.join(root, name))
    except Exception, e:
        print 'ERROR: m4a Comment Error %s : %s' % (e, os.path.join(root, name))
        continue
    if not audio.has_key('artist'):
        print 'ERROR: m4a comment has no "artist" key in file %s' % os.path.join(root, name)
        continue
    artist = audio['artist']
    genre = artist_to_genre(artist[0])
    if genre != None:
        audio["genre"] = genre
        audio_set = True


Please note that this code is likely not to work as is, because the tag names (artist and genre in the code above) differ for each format, so you will have to figure out what is used in m4a.

Hope that helps

Stephen said...

This is saving me years of life

Jobot// said...

I know this is a few years late... but would you believe that the music organization community and industry have not come up with a solution like this?

I can tag my photos, I can tag files on my computer, why can't I tag my music? It'd be much more helpful for me to search/build playlists of my music if I could tag an artist such as "The Flashbulb" with more than just a vague genre (would like to tag with "instrumental, electronic, experimental" and such).

Like many, my library consists of much more than just MP3 files, and having an across the board solution is ideal.

I admire the work you have put into it up to this point, absolutely. I was kind of wondering if there was any other leads on a working solution to achieve something like this? I know I'm not the only one who wants this!

So by this post, I guess I'm asking mainly two things: Is there a complete solution for this that you know of? and secondly, What can we do to push the music distribution services to provide us with the ability first hand?

Thanks for your work here, it's really quite amazing!

Michael Marth said...

Hi Jobot,

thanks for the kind words. Glad you find the script useful.

In terms of "leads on a working solution": it would be relatively straight forward to integrate the script into Picard which is a full (GUI) tagging solution. It supports Python scripts as well.

I believe one could also relatively easily integrate with iTunes' scripting capabilities but never had a closer look.

If you get anything like that working I'd appreciate a pointer :)

HTH
Michael