Now that I have started to seriously use iTunes I figured it might be nice to have the genre tag set in a meaningful way. Since I have a reasonably large collection of mp3s doing that manually was out of question - I wrote me a Python script to do that. There seems to be a large demand for such a functionality (at least I found a lot of questions on how to automatically set the genre tag) so maybe someone else finds the script useful. It is pasted below.
General StrategyThe basic idea is to use Last.fm's tags for genre tagging. In iTunes the genre tag is IMO best used when it only contains one single genre, i.e. something like "Electronica",
not something like "Electronica / Dance". On the other hand dropping all but one tag would lose a lot of information, so I decided to use the groupings tag for additional information that is contained in the list of tags that an artist has on Last.fm. In the example above that would be something like "Electronica, Dance, 80s, German". In that way it is simple to use iTunes' Smart Playlist feature to create play lists of all, say, dance music. This approach is probably not suitable for classical music..
The ID3 field that is exposed in iTunes' UI as "grouping" is defined in the
ID3v2 spec as:
- TIT1
- The 'Content group description' frame is used if the sound belongs to a larger category of sounds/music. For example, classical music is often sorted in different musical sections (e.g. "Piano Concerto", "Weather - Hurricane").
So, the strategy I described above seems to be kind of in line with the spec. In general, it is a good idea to have a look at the
ID3v2 spec if you consider dabbling with mp3 tags.
Practical ConsiderationsIf one would just take an artist's highest-rated Last.fm tag for the genre one would end up with pretty inconsistent genre tags (think "hip-hop", "hip hop", and "hiphop"). Therefore, I chose to use a fixed set of values for genre. In a previous version of ID3 the list of possible genres was fixed. While this is clearly a terrible idea to start with it came along handy in this case. I used his as a fixed list for genres.
The second practical consideration was which Last.fm tags to include. In Last.fm parlance each artist tag comes with a weight (values form 0 to 100). Selecting only the tags with weight larger than 50 worked out fine for me (usually I had 1-5 tags per artist).
A third thing you might want to be aware of: if you programmatically change tags in an mp3 iTunes will not pick up these changes automatically. A simple way of letting it know: select the "Get Info" command on these items. This will trigger a reload of the new tag values.
ScriptTo run the script you will need the Python libraries
mutagen and
pylast installed. Run it with the option
-d directory_with_mp3s
The script will walk along this directory and modify all mp3s it finds. Also, you will need a
Last.fm API key and set your API_KEY and API_SECRET accordingly in the script.
#!/usr/bin/env python
# encoding: utf-8
"""
tag_groupings.py
Created by Michael Marth on 2009-11-02.
Copyright (c) 2009 marth.software.services. All rights reserved.
"""
import sys
import getopt
import pylast
import os.path
from mutagen.id3 import TCON, ID3, TIT1
help_message = '''
Adds ID3 tags to mp3 files for genre and groupings. Tag values are retrieved from Last.FM. Usage:
-d mp3_directory
'''
class Usage(Exception):
def __init__(self, msg):
self.msg = msg
all_genres = TCON.GENRES
genre_cache = {}
groupings_cache = {}
API_KEY = "your key here"
API_SECRET = "your secret here"
network = pylast.get_lastfm_network(api_key = API_KEY, api_secret = API_SECRET)
def artist_to_genre(artist):
if genre_cache.has_key(artist):
return genre_cache[artist]
else:
tags = network.get_artist(artist).get_top_tags()
for tag in tags:
if all_genres.__contains__(tag[0].name.title()):
genre_cache[artist] = tag[0].name.title()
print "%20s %s" % (artist,tag[0].name.title())
return tag[0].name.title()
def artist_to_groupings(artist):
if groupings_cache.has_key(artist):
return groupings_cache[artist]
else:
tags = network.get_artist(artist).get_top_tags()
relevant_tags = []
for tag in tags:
if int(tag[1]) >= 50:
relevant_tags.append(tag[0].name.title())
groupings = ", ".join(relevant_tags)
groupings_cache[artist] = groupings
print "%20s %s" % (artist,groupings)
return groupings
def walk_mp3s():
for root, dirs, files in os.walk('.'):
for name in files:
if name.endswith(".mp3"):
audio = ID3(os.path.join(root, name))
artist = audio["TPE1"]
genre = artist_to_genre(artist[0])
grouping = artist_to_groupings(artist[0])
if genre != None:
audio["TCON"] = TCON(encoding=3, text=genre)
if grouping != None:
audio["TIT1"] = TIT1(encoding=3, text=grouping)
audio.save()
def main(argv=None):
if argv is None:
argv = sys.argv
try:
try:
opts, args = getopt.getopt(argv[1:], "ho:vd:", ["help", "output="])
except getopt.error, msg:
raise Usage(msg)
# option processing
for option, value in opts:
if option == "-v":
verbose = True
if option in ("-h", "--help"):
raise Usage(help_message)
if option in ("-o", "--output"):
output = value
if option in ("-d"):
try:
os.chdir(value)
except Exception,e:
print "error with directory " + value
print e
walk_mp3s()
except Usage, err:
print >> sys.stderr, sys.argv[0].split("/")[-1] + ": " + str(err.msg)
print >> sys.stderr, "\t for help use --help"
return 2
if __name__ == "__main__":
sys.exit(main())