Take your top 20 artists. For each of these artists, collect the top 5 similar artists. The resulting number of unique artists is your eclectic score. If the score is small (the minimum is 5), your musical preferences are very limited; if it is large (above 80, with a maximum of 100), you have eclectic musical tastes. You can compute your own score at http://anthony.liekens.net/pub/scripts/last.fm/eclectic.php
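A minimal sketch of the score as described above. It assumes the similar-artist lists have already been fetched into a dict keyed by artist name and ordered from most to least similar. The function name and example data are hypothetical, and the linked script may differ in details (for example, how it handles ties or missing data).

```python
from typing import Dict, List


def eclectic_score(top_artists: List[str], similar: Dict[str, List[str]],
                   n_top: int = 20, n_similar: int = 5) -> int:
    """Number of unique artists among the n_similar most similar artists
    of each of the user's n_top top artists."""
    related = set()
    for artist in top_artists[:n_top]:
        # Assumes each list is ordered from most to least similar.
        related.update(similar.get(artist, [])[:n_similar])
    return len(related)


# Made-up example: two top artists whose similar lists overlap.
top = ["Artist A", "Artist B"]
similar = {
    "Artist A": ["S1", "S2", "S3", "S4", "S5"],
    "Artist B": ["S3", "S4", "S6", "S7", "S8", "S9"],  # only the first five count
}
print(eclectic_score(top, similar))  # → 8
```

Overlap between lists lowers the count, which is why 20 artists sharing the same five similar artists give the minimum of 5, and 20 artists with fully disjoint lists give the maximum of 100.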
My eclectic score is currently 76/100.
The 76 related artists for my profile are: AFI, aiko, American Hi-Fi, Anti-Flag, ATB, Bad Religion, Blink-182 (4), Brooklyn Bounce, Capcom Sound Team, Evanescence, Eve 6, Every Little Thing, Fall Out Boy (2), Fatboy Slim, Foo Fighters, Gabriela Robin, George S. Clinton, Gigi D'Agostino, Good Charlotte (2), Gorillaz, Green Day (4), Groove Coverage, hitomi, Iron Maiden, Kasz & Beal, Konami Kukeiha Club, KoЯn, Lagwagon, Limp Bizkit (2), Liquid Tension Experiment, Massive Attack, Metallica, Millencolin, Moby, Mutha's Day Out, My Chemical Romance, New Found Glory, Nirvana, No Use for a Name, NOFX (2), Opeth, ORIGA, Osamu Kubota, Panic! At the Disco, Papa Roach (2), Pennywise (2), Pulsedriver, Rammstein, Rancid, Red Hot Chili Peppers (2), Rise Against, Rufio, Saves the Day, Shaun Imrei, Sugarcult, Sum 41 (3), Symphony X, System of a Down (3), Taking Back Sunday (2), The Chemical Brothers, The Movielife, The Offspring (2), The Seatbelts, The Starting Line, The Used (2), The Young Gods, Yellowcard (3), 下村陽子, 坂本真綾, 崎元仁, 日比野則彦, 桜庭統, 梶浦由記 (2), 植松伸夫 (2), 椎名林檎, 鬼束ちひろ.
My problem with the algorithm is that your score is actually punished for liking similar artists. For example, I love punk, which is why Bad Religion, Offspring, Millencolin and AFI are all in my top five. But just because I love punk doesn't mean I do not also love many other kinds of music ("eclectic: selecting or choosing from various sources").
I suppose you could argue that because punk dominates my top five, I like non-punk relatively less. But perhaps I simply listen to a large quantity of music. Say my entire profile consisted of a hundred Bad Religion tracks and one track each from nineteen other, unrelated artists. My score would be 100/100 (twenty artists with five distinct similar artists each), which is just stupid. Admittedly, giving a pathological example is a little unfair, since last.fm is supposed to work better as you listen to more tracks and your list evens out.
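To make the arithmetic of the pathological case concrete, here is a small sketch with placeholder artist names. It assumes, as the example does, that none of the twenty artists share any of their five similar artists; the "similar" names are invented stand-ins, not last.fm data.

```python
from collections import Counter

# Hypothetical play history: 100 Bad Religion tracks plus one track each
# from 19 other, mutually unrelated artists (names are placeholders).
plays = Counter({"Bad Religion": 100})
for i in range(19):
    plays[f"Unrelated Artist {i}"] = 1

# Top 20 artists by number of tracks played.
top20 = [artist for artist, _ in plays.most_common(20)]

# Assumption: every artist's five most similar artists are unique to it.
similar = {artist: [f"{artist} / similar {k}" for k in range(5)] for artist in top20}

related = set()
for artist in top20:
    related.update(similar[artist][:5])

print(f"{len(related)}/100")  # → 100/100
```

The 100 Bad Religion plays only decide that Bad Religion ranks first; they contribute no more than the single plays of the other nineteen artists.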
You might think the problem could be avoided by analyzing all of your artists, perhaps only those above a minimum threshold of plays (say 5) to reduce outliers. But even then you'll be punished for liking related artists, which makes no sense: if you like one 80s hair metal band, why wouldn't you like others? So expanding the analysis does not work. The whole idea that similar artists indicate a narrow musical taste is intrinsically flawed.
There is also a related, more general problem with last.fm's artist ranking: it counts only the number of tracks played, not time listened, when deciding how highly to rank each artist. So Dream Theater and Metallica, with their six-minute-plus songs, are not going to rank as highly as Millencolin, with their sub-two-minute tracks. Lame. And getting classical music into my top twenty without concentrating on a single composer would be difficult indeed. That is counterintuitive, since you would think that focusing solely on a single classical composer should make me less "eclectic."
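A toy comparison of the two ranking rules shows how ranking by plays can invert a ranking by time listened. The play counts and average track lengths below are made up for illustration and are not real scrobble data.

```python
# Hypothetical listening data: artist -> (plays, average track length in minutes).
listening = {
    "Millencolin": (60, 2.0),
    "Metallica": (30, 6.5),
    "Dream Theater": (25, 9.0),
}

# Rank by number of tracks played.
by_plays = sorted(listening, key=lambda a: listening[a][0], reverse=True)

# Rank by estimated minutes listened (plays * average length).
by_minutes = sorted(listening, key=lambda a: listening[a][0] * listening[a][1], reverse=True)

print(by_plays)    # → ['Millencolin', 'Metallica', 'Dream Theater']
print(by_minutes)  # → ['Dream Theater', 'Metallica', 'Millencolin']
```

With these numbers Millencolin leads on plays (60 vs. 30 vs. 25) but comes last on time (120 vs. 195 vs. 225 minutes).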
So how can we circumvent these issues to generate a more reasonable "eclectic score" for people's musical tastes? I propose that rather than counting top similar artists against the score, we instead count non-similar artists towards it. To put it another way, count the number of genres "covered" by a person's list of artists. The details of doing so are tricky, since "genre" is only vaguely defined, but here is one idea:
1) Make a list of every artist in the user's library above the play-count threshold (e.g., 5). These artists form the nodes of the graph we will be building.
2) For each artist on the list, get its list of similar artists.
3) Compare every pair of artists on the list. If one is on the other's list of similar artists, connect the nodes with an edge of weight 1. If not, but they share a similar artist, connect them with an edge of weight 2. If no commonality, no edge. (These values can be tuned.)
4) Now we analyze the graph. We will consider each connected subgraph (that is, each connected component) separately.
5) For each subgraph, compute the diameter. (The eccentricity of a vertex v in a graph G is the maximum distance from v to any other vertex; the diameter of G is the maximum eccentricity over all its vertices. Here "distance" means the length of the shortest path, summing the edge weights from step 3.) Each subgraph represents a "genre cluster," and its diameter indicates the cluster's "breadth."
6) Now we just need a formula to compute a total "eclectic score." Like the edge weights, this formula can be tuned. One reasonable choice is the sum of the diameters of the subgraphs plus the number of subgraphs. Summing the breadths of the genre clusters makes sense, because each breadth represents how "eclectic" that cluster is. Adding an extra point for each cluster represents the added diversity inherent in unconnected genres. Also, if you think about it, any path connecting two clusters would require inserting at least one extra node (in theory there could be an "uber-artist" to whom all artists in both clusters relate, but this seems highly unlikely).
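Steps 1 to 6 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a finished implementation: play counts and similar-artist lists are assumed to be already fetched (no last.fm API calls are shown), "above the threshold" is read as "at least `min_plays` plays", distances are weighted shortest paths computed with Dijkstra's algorithm, and all function names and artist names are hypothetical placeholders.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, int]]  # artist -> {neighbour: edge weight}


def build_graph(plays: Dict[str, int], similar: Dict[str, List[str]],
                min_plays: int = 5, direct: int = 1, shared: int = 2) -> Graph:
    # Step 1: nodes are artists with at least min_plays plays (assumed reading).
    nodes = [a for a, n in plays.items() if n >= min_plays]
    graph: Graph = {a: {} for a in nodes}
    # Steps 2-3: weight `direct` if either artist lists the other as similar,
    # weight `shared` if they only share a similar artist, otherwise no edge.
    for a, b in combinations(nodes, 2):
        sim_a, sim_b = set(similar.get(a, [])), set(similar.get(b, []))
        if b in sim_a or a in sim_b:
            weight = direct
        elif sim_a & sim_b:
            weight = shared
        else:
            continue
        graph[a][b] = graph[b][a] = weight
    return graph


def components(graph: Graph) -> List[Set[str]]:
    # Step 4: split the graph into connected components (iterative DFS).
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node])
        seen |= comp
        comps.append(comp)
    return comps


def eccentricity(graph: Graph, source: str) -> int:
    # Dijkstra from `source`; the largest distance reached is its
    # eccentricity within its own component.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, w in graph[node].items():
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr] = d + w
                heapq.heappush(heap, (d + w, nbr))
    return max(dist.values())


def proposed_score(plays: Dict[str, int], similar: Dict[str, List[str]],
                   min_plays: int = 5) -> int:
    graph = build_graph(plays, similar, min_plays)
    comps = components(graph)
    # Step 5: a component's diameter is the max eccentricity of its vertices.
    diameters = [max(eccentricity(graph, v) for v in comp) for comp in comps]
    # Step 6: sum of cluster breadths plus one point per cluster.
    return sum(diameters) + len(comps)


# Made-up example data with placeholder names.
plays = {"Punk A": 120, "Punk B": 80, "Punk C": 60, "Metal A": 40,
         "Metal B": 30, "Electronic A": 10, "Rare Artist": 2}
similar = {
    "Punk A": ["Punk B", "Punk X"],
    "Punk B": ["Punk Y", "Punk X"],
    "Punk C": ["Punk Y"],
    "Metal A": ["Metal B"],
    "Metal B": ["Metal Z"],
    "Electronic A": ["Electronic Q"],
}
print(proposed_score(plays, similar))  # → 7
```

With these inputs, "Rare Artist" falls below the threshold; the punk cluster has diameter 3 (Punk A to Punk B costs 1, Punk B to Punk C costs 2 via the shared "Punk Y"), the metal pair has diameter 1, and the lone electronic artist has diameter 0. Three clusters add 3 more points, for a total of 7. The `direct` and `shared` parameters correspond to the tunable edge weights from step 3.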
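This algorithm avoids the problem of similar artists hurting your score. It also does not significantly skew in favor of artists with shorter tracks, since play counts only determine whether an artist clears the threshold, not how much that artist contributes to the score.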
One shortcoming is that the algorithm does not take into account how long it has been since you listened to the artist. Maybe you haven't done so in a really long time. Maybe it was just a phase. On a related note: maybe you have been on last.fm much longer than someone else, and scrobbled many more tracks. Though the algorithm should be somewhat resistant to this problem (and it's arguable whether it's a "problem" at all), it could still skew your results. Or more precisely, people who have not been on last.fm long enough to fully flesh out their tastes will receive a lower score.
One thing that could arguably improve the algorithm would be weighting the genre clusters according to number of track listens somehow, though doing so would reintroduce the track length skewing issue discussed above. But it might still be worth it in case some genre clusters strongly outweigh others. In particular, it might make sense to assign a penalty for genre clusters orders of magnitude stronger than others. Initially, though, I'm leery of such a penalty factor.
Hmm, I really like this algorithm; maybe I'll implement it. For many users it would require a lot of queries, though, so it could take a long time to run. I'll have to think about it.