I’d just like to share progress on a new Twitter project that I’ve been working on for the past few weeks. The subject is AniTwitter, a loosely connected community of Twitter users who like to talk about anime-related stuff. The goal of this project is to map it out, that is, to explore the social network and figure out which users are most likely part of it.
This map is just the first step, though. I took my first swing at analyzing AniTwitter last spring. I searched through user timelines for the most common hashtags, the most common adjectives, and the most retweeted tweets. When I was planning to redo the experiment last month, I found that I had unfortunately lost the user list I’d painstakingly built by hand by merging public user lists from various Twitter profiles. I couldn’t be arsed to do it again. But it soon occurred to me that I could have the user list built automatically.
So I started with @ANNZac. I ran a recursive search that looked through all the users he follows, then through all the users they follow in turn. I was looking for keywords that might qualify users as part of the AniTwitter network. The condition was to mention the words anime, manga or otaku twice, either within the last six months or within the last 1,000 tweets. Is the condition too harsh? Too permissive?
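For anyone curious what such a crawl might look like, here is a minimal Python sketch. It is not the script I actually ran. `get_following` and `get_tweets` are hypothetical stand-ins for whatever Twitter API calls you use, "twice" is read as "at least two tweets mentioning a keyword", the time window is read as "among the newest 1,000 tweets or posted in the last 180 days", and the crawl is assumed to expand only from users who pass the filter. The recursion is written iteratively as a breadth-first search capped at two hops.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Whole-word, case-insensitive match for the three keywords.
KEYWORDS = re.compile(r"\b(anime|manga|otaku)\b", re.IGNORECASE)


def qualifies(tweets, now, min_mentions=2, max_tweets=1000,
              max_age=timedelta(days=180)):
    """Return True if at least `min_mentions` tweets in the window mention a keyword.

    `tweets` is a list of (created_at, text) pairs, newest first. A tweet is in
    the window if it is among the newest `max_tweets` OR was posted within
    `max_age` of `now` (one reading of the condition; an assumption).
    """
    mentions = 0
    for i, (created_at, text) in enumerate(tweets):
        if i >= max_tweets and now - created_at > max_age:
            break  # newest first, so every remaining tweet is outside both limits
        if KEYWORDS.search(text):
            mentions += 1
            if mentions >= min_mentions:
                return True
    return False


def crawl(seed, get_following, get_tweets, now, max_depth=2):
    """Breadth-first crawl of the follow graph, at most `max_depth` hops from `seed`.

    Only users who pass `qualifies` are expanded further (an assumption).
    Returns the member set and the follow edges between members.
    """
    members = {seed}
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't look at who the outermost users follow
        for followee in get_following(user):
            edges.append((user, followee))
            if followee in seen:
                continue
            seen.add(followee)
            if qualifies(get_tweets(followee), now):
                members.add(followee)
                queue.append((followee, depth + 1))
    member_edges = [(u, v) for u, v in edges if u in members and v in members]
    return members, member_edges


# Toy data standing in for API responses (entirely made up).
now = datetime(2016, 1, 1)
day = timedelta(days=1)
FOLLOWING = {"ANNZac": ["a", "b"], "a": ["b", "c", "d"]}
TWEETS = {
    "a": [(now - day, "New anime season!"), (now - 2 * day, "Reading manga")],
    "b": [(now - 3 * day, "otaku life"), (now - 4 * day, "lunch")],
    "c": [(now - 5 * day, "Anime and manga")],
    "d": [(now - day, "more manga"), (now - 2 * day, "anime night")],
}
# b and c each have only one tweet with a keyword, so only a and d join.
members, edges = crawl("ANNZac", lambda u: FOLLOWING.get(u, []),
                       lambda u: TWEETS.get(u, []), now)
```

Because users two hops out are never expanded, follow edges among them are never discovered, which is one way the "bushy" edges of the graph can arise.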
At the moment the network boasts 4,640 discovered users and counting. I used the networkx Python library for graph building and visualization. If you look at the image, the bushy clusters of nodes at the edges look that way because my search goes only two levels deep, so a bunch of users remain unconnected to the messy core. I chose this limit because searching three levels deep makes the search far too wide. Later on I found that the average degree of separation between two random users on Twitter has been measured at around 3.44. Considering AniTwitter is only a small sub-network of the whole social network, I believe my intuition to limit the search was correct. Whatever the case, I expect the individual bushes to shrink as more pairs of users are discovered to be following each other.
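A minimal networkx sketch of the graph side, assuming the crawl produces a member set and a list of (follower, followee) edges. `build_graph` and `core_stats` are hypothetical helper names. The sketch builds a directed follow graph, measures the largest connected core and its average shortest-path length (with follow direction ignored), and uses `nx.reciprocity` to report what share of follow edges are mutual, which is the quantity I'd expect to rise as the bushes fold into the core.

```python
import networkx as nx


def build_graph(members, edges):
    """Directed follow graph: an edge (u, v) means u follows v."""
    G = nx.DiGraph()
    G.add_nodes_from(members)
    G.add_edges_from(edges)
    return G


def core_stats(G):
    """Size and average shortest-path length of the largest connected core.

    Follow direction is ignored here, so the path length is only loosely
    comparable to published degree-of-separation figures.
    """
    U = G.to_undirected()
    core_nodes = max(nx.connected_components(U), key=len)
    core = U.subgraph(core_nodes)
    return core.number_of_nodes(), nx.average_shortest_path_length(core)


# Toy graph: a follow chain with one mutual pair, plus an isolated user "z".
G = build_graph({"ANNZac", "a", "b", "c", "z"},
                [("ANNZac", "a"), ("a", "ANNZac"), ("a", "b"), ("b", "c")])
size, avg_hops = core_stats(G)
mutual_share = nx.reciprocity(G)  # fraction of edges that point both ways
```

If matplotlib is installed, `nx.draw_spring(G)` gives a quick force-directed picture similar in spirit to the one above.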
The next step will involve user clustering, so we’ll be able to see smaller cliques inside AniTwitter itself, based on user connectivity, @reply frequency, the topics users talk about and so on. I’m also planning to upgrade my earlier script to recognize named entities, such as names and locations, in tweet text.
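One possible starting point for the clustering step, sketched under assumptions: @reply counts between pairs of users (a hypothetical input, `reply_counts`) become edge weights in an undirected graph, and networkx's `greedy_modularity_communities` splits it into groups that talk to each other more than to everyone else. This covers only the connectivity and @reply signals, not topics. Note that these "cliques" are modularity communities, not cliques in the strict graph-theory sense that `nx.find_cliques` looks for.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def reply_graph(reply_counts):
    """Undirected graph weighted by @reply counts.

    `reply_counts` maps (user_a, user_b) to the number of @replies from a to b
    (hypothetical input). Replies in both directions are summed into one edge.
    """
    G = nx.Graph()
    for (a, b), n in reply_counts.items():
        if G.has_edge(a, b):
            G[a][b]["weight"] += n
        else:
            G.add_edge(a, b, weight=n)
    return G


def clusters(G):
    """Modularity-based communities, largest first, as plain sets."""
    return [set(c) for c in greedy_modularity_communities(G, weight="weight")]


# Toy data: two chatty trios joined by a single stray reply.
counts = {
    ("a1", "a2"): 3, ("a2", "a1"): 2, ("a2", "a3"): 5, ("a1", "a3"): 5,
    ("b1", "b2"): 5, ("b2", "b3"): 5, ("b1", "b3"): 5,
    ("a3", "b1"): 1,
}
groups = clusters(reply_graph(counts))
```

Modularity is a reasonable first pass because it needs no preset number of clusters; topic similarity could later be folded in as extra edge weight.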
If anyone has any questions or suggestions on how I should visualize the final map, I’m all ears. I’ll release the code with my next post on this topic.