MAL’s Most Popular TV Anime Genres

It’s not easy to find an anime show that can be tagged with a single genre. Comedies are almost always romances, and shows portraying school life are almost never about students sitting behind their desks. It’s either mysteries or sports clubs or students fighting other students in exoskeletons or giant robots. So if I say that school life is the most popular genre on MAL, you can of course believe me, but that tells you almost nothing about the type of anime that the dominant fan groups like to watch.

This is a follow-up to my exploration of AniDB’s genre tags. The idea was to take a look at crowdsourced data and see which genres usually stick together. Last time I downloaded info from AniDB about every TV anime that started airing sometime in the last five years. I calculated how similar the shows were to one another in terms of genre information and grouped them accordingly. I was hoping that this grouping would correlate well with how popular the grouped anime are, and while I think the results were pointing in the right direction, I started questioning my methodology and my data source.

This time I tried doing the same, except that I used MyAnimeList’s data. I also changed my approach to similarity scores. Instead of using cosine similarity, which didn’t make much sense from a theoretical standpoint, I employed the Jaccard similarity coefficient, which can be used for unweighted genre information. What is Jaccard similarity? Basically, for any two anime, the score is the number of genres they share divided by the number of distinct genres that appear on either of them. Two shows with identical genre lists score 1, and two shows with no genre in common score 0. Do this pairwise for every two anime and you get a nice similarity matrix that we can run a clustering algorithm on. This algorithm spits out a grouping of anime titles, which enables us to check which genres are the most popular in each group. Below I present the results of this clustering procedure.
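Here is a minimal Python sketch of the Jaccard calculation described above, for illustration only. It is not the original script. The genre sets are made-up placeholders rather than real MAL entries, and the function names `jaccard` and `similarity_matrix` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two genre sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two titles with no genre tags at all count as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def similarity_matrix(genre_sets):
    """Symmetric matrix of pairwise Jaccard scores."""
    n = len(genre_sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = jaccard(genre_sets[i], genre_sets[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Placeholder genre sets (hypothetical shows, not taken from MAL).
shows = [
    {"comedy", "school", "romance"},
    {"comedy", "school", "harem"},
    {"action", "sci-fi", "mecha"},
]

matrix = similarity_matrix(shows)
# The first two shows share 2 of 4 distinct genres, so their score is 2 / 4.
print(matrix[0][1])
```

A matrix like this (or one minus it, read as a distance) is the kind of input a clustering algorithm can work with.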

Results

TV Anime Group 1
Titles: 136
Average voters: 57332
Most common genres: 
    1.  comedy           (128 titles)
    2.  school           (118 titles)
    3.  romance          (86 titles)
    4.  shounen          (51 titles)
    5.  ecchi            (48 titles)
    6.  harem            (42 titles)
    7.  supernatural     (28 titles)
    8.  action           (23 titles)

TV Anime Group 2
Titles: 149
Average voters: 57040
Most common genres: 
    1.  action           (145 titles)
    2.  fantasy          (101 titles)
    3.  shounen          (72 titles)
    4.  adventure        (61 titles)
    5.  supernatural     (48 titles)
    6.  comedy           (48 titles)
    7.  magic            (33 titles)
    8.  game             (21 titles)

TV Anime Group 3
Titles: 125
Average voters: 43241
Most common genres: 
    1.  action           (117 titles)
    2.  sci-fi           (72 titles)
    3.  mecha            (37 titles)
    4.  seinen           (26 titles)
    5.  drama            (20 titles)
    6.  super power      (18 titles)
    7.  supernatural     (16 titles)
    8.  adventure        (14 titles)

TV Anime Group 4
Titles: 184
Average voters: 33462
Most common genres: 
    1.  school           (49 titles)
    2.  drama            (43 titles)
    3.  slice of life    (43 titles)
    4.  romance          (40 titles)
    5.  fantasy          (37 titles)
    6.  shoujo           (32 titles)
    7.  supernatural     (31 titles)
    8.  mystery          (26 titles)

TV Anime Group 5
Titles: 110
Average voters: 25340
Most common genres: 
    1.  slice of life    (110 titles)
    2.  comedy           (109 titles)
    3.  school           (46 titles)
    4.  seinen           (27 titles)
    5.  shounen          (8 titles)
    6.  romance          (7 titles)
    7.  drama            (5 titles)
    8.  fantasy          (5 titles)

TV Anime Group 6
Titles: 142
Average voters: 21975
Most common genres: 
    1.  comedy           (142 titles)
    2.  supernatural     (27 titles)
    3.  shounen          (22 titles)
    4.  fantasy          (21 titles)
    5.  mystery          (16 titles)
    6.  romance          (16 titles)
    7.  action           (14 titles)
    8.  drama            (14 titles)

Methodology

I was working with 846 anime titles and 40 different genre tags. This time I paid special attention to the number of clusters. It’s hard to say what the correct number of clusters is (it could be just one, it could be all 846). Unless a metric such as the silhouette score from last time gives solid numbers, you have to ask yourself what types of anime these clusters should contain. We obviously want as few clusters as possible so that we can generalize well, while at the same time we want to see popular anime grouped in their own clusters, based above all on each title’s genre similarities.

Alongside each anime’s average user rating, MAL also lists the number of votes the title has received. Casting a vote is an expression of emotional involvement. A sign of this is that the vote count is relatively higher for very good shows and for very bad shows. We can exploit this property and say that shows with a higher vote count were, most likely, also the most watched.

Therefore we choose the number of anime clusters based on this vote count. We look at each cluster’s average vote count and try to maximize it by either lowering or increasing the number of clusters. Anime may fall into different clusters as the number of clusters changes, and their vote counts move with them.
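The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the original script: the function name `vote_summary` is hypothetical, the cluster labels are assumed to come from whatever clustering run is being evaluated, and the vote counts are toy numbers.

```python
from collections import defaultdict
from statistics import mean, median


def vote_summary(labels, votes):
    """Return {cluster label: (size, mean votes, median votes)}."""
    grouped = defaultdict(list)
    for label, count in zip(labels, votes):
        grouped[label].append(count)
    return {
        label: (len(counts), mean(counts), median(counts))
        for label, counts in sorted(grouped.items())
    }


# Toy vote counts for five hypothetical titles.
votes = [100, 300, 10, 20, 60]

# Two candidate groupings of the same titles (made-up labels).
candidates = {
    2: [0, 0, 1, 1, 1],
    3: [0, 1, 2, 2, 1],
}

for k, labels in candidates.items():
    print(k, vote_summary(labels, votes))
```

Running this for each candidate number of clusters gives a per-cluster table of sizes, means and medians that can then be compared by eye.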

There was more sorcery involved, but ultimately I settled on the six clusters that you can see above. Some have less popular importance than others. That’s because popular importance is based on the aforementioned vote count average.

This optimization problem wasn’t a cakewalk, but then again I didn’t bother with any mathematical procedure for it. I settled on analyzing the means and medians with the eyeball method. Ask if you want me to go into more detail, but I doubt you do. Also, I had to do some scripting to make all of this work in the first place. Code will be made available sometime in the future.

Conclusion

I still think AniDB’s tagging system is better, because it has over 180 tags, whereas MAL has only 40 or so. AniDB also lets users assign a weight to each tag. Unfortunately, these only become improvements with more active users, which AniDB doesn’t have compared to MAL. It’s hard to say what the future has in store for this experiment, but to give it the best conditions one would need to build a better, more detailed, and more popular anime database. Contact me if you’re into that kind of thing.
