For as much flak as Apple’s iTunes tends to get, we gotta say that the iTunes Genius feature works like a charm.
If you’re curious about the underlying algorithm used to make iTunes Genius such a success, Apple engineer Erik Goldman recently explained some of the more technical aspects behind the suggestion technology.
To uncover part of how iTunes Genius works, says Goldman, “look at information retrieval algorithms, especially those that leverage the vector-space model.” But before you can use a vector-space model to compare factors across iTunes libraries, such as how often a particular artist or genre shows up in a user’s library or playlists, you need a clever way to weight each factor so that the things that really matter count for more.
A simple way to weight factors for comparison is what’s known as term frequency-inverse document frequency (tf-idf). It compares how often a particular factor occurs in a single document (or song, or library) to how often that factor occurs in a larger body, such as the sum of all iTunes libraries stored by the Genius servers. Thus, a factor that occurs pretty often in a given user’s library (say, an affinity for an obscure indie band) will tend to be a more powerful determinant, unless it also happens to occur quite often in the total set of data, as would be the case if the factor were an affinity for the Beatles.
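To make the weighting concrete, here’s a minimal Python sketch of textbook tf-idf applied to a few toy libraries. The user names, the artist lists, and the `tf_idf` helper are all made up for illustration, and the formula is the standard one; nothing here is taken from Apple’s actual Genius code.

```python
import math
from collections import Counter


def tf_idf(libraries):
    """Return {user: {artist: weight}} for a dict of user -> list of artist entries.

    Hypothetical helper: textbook tf-idf, not Apple's implementation.
    """
    n = len(libraries)

    # Document frequency: how many libraries contain each artist at least once.
    df = Counter()
    for artists in libraries.values():
        df.update(set(artists))

    weights = {}
    for user, artists in libraries.items():
        counts = Counter(artists)
        total = len(artists)
        weights[user] = {
            # tf = share of this library taken up by the artist
            # idf = log(total libraries / libraries containing the artist)
            artist: (count / total) * math.log(n / df[artist])
            for artist, count in counts.items()
        }
    return weights


# Toy data: everyone owns the Beatles, only alice owns the obscure band.
libraries = {
    "alice": ["The Beatles", "The Beatles", "Obscure Indie Band", "Obscure Indie Band"],
    "bob": ["The Beatles", "Radiohead"],
    "carol": ["The Beatles", "Radiohead", "Radiohead"],
}

weights = tf_idf(libraries)
for artist, weight in sorted(weights["alice"].items()):
    print(f"{artist}: {weight:.3f}")
```

In this toy data the Beatles show up in every library, so their idf is log(3/3) = 0 and they contribute nothing to alice’s profile, while the obscure band, which only she owns, carries all of the weight.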
Once you’ve got your tf-idf weights sorted, you can represent each library in a vector space model as a vector of those weights, one dimension per factor.
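Here’s a rough sketch of what comparing libraries as vectors might look like. It uses cosine similarity, which is the usual way to compare vectors in a vector-space model, but that choice is our assumption: Goldman’s explanation doesn’t say which measure Genius uses. The weights and the `cosine_similarity` helper below are invented for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as {factor: weight}.

    Hypothetical helper; returns 0.0 when either vector has no weight at all.
    """
    # Only factors present in both vectors contribute to the dot product.
    dot = sum(weight * v[factor] for factor, weight in u.items() if factor in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up tf-idf weights for three libraries.
alice = {"Obscure Indie Band": 0.55, "Radiohead": 0.10}
bob = {"Obscure Indie Band": 0.40, "Radiohead": 0.20}
carol = {"Radiohead": 0.30}

print(cosine_similarity(alice, bob))    # both lean on the obscure band
print(cosine_similarity(alice, carol))  # only a lightly weighted artist in common
```

Because alice and bob both put heavy weight on the same obscure band, their vectors point in nearly the same direction and score much higher than alice and carol, who overlap only on a lightly weighted artist.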
Ah yes, of course. Tf-idf. The answer was right in front of us all along.
If the above snippet gets your juices flowin’, and you’ve actually heard of latent-factor algorithms before, check out a neat summary of Goldman’s explanation over here at MIT’s Tech Review.