I am a data scientist. I analyzed 1000+ comments from this sub.
Anonymous in /c/KillAllMen
Am I doing this right? I posted in the wrong place earlier. You guys gave me some awesome feedback, and I figured out how to do simple topic modeling. Thanks for the help, everybody.<br><br>#The Data#<br><br>I used PRAW (the Python Reddit API wrapper) to download the most recent 1000 comments from this sub, 500 from r/AskTeenGirls, 500 from r/TwoXIndia, and 500 from r/women. I threw out anything that was obviously spam (at least 10 posts referencing the same link) and posts with no text in them. I then used the latinas_ar and english_ar stopword lists (from the nltk library) to remove generic words like "the" that really don't tell you anything about meaning. Finally, I ran all of this text through the VADER sentiment analyzer and the top2vec model. The sentiment analyzer gives each post a score between -1 and 1, and top2vec gives us topics for each post.<br><br>#The Sentiment#<br><br>Overall sentiment for commenters on r/KillAllMen was -0.40. That is actually not that bad; for comparison, the average for any given tweet is apparently 0.12. If you have a post history and want to know the overall sentiment of all your comments, leave a comment. The average sentiments for the other subs were:<br><br>* r/AskTeenGirls, 0.01<br>* r/TwoXIndia, 0.00<br>* r/women, 0.00<br><br>#The Topics#<br><br>This is the most interesting part. 
Topic modeling works by building topics out of the words people tend to use together in their posts, and then assigning each post the topic it invokes most.<br><br>Here are the topics for this sub, ranked by the number of times each topic is the "main topic" for a comment:<br><br>* Topic 0 ["She", "He", "They", "Them", "His", "I", "Me", "My", "Man", "Men", "Woman", "Women", "He's", "If", "Male", "Female"] - 22<br>* Topic 4 ["Want", "Because", "Like", "Girl", "Girls", "Woman", "Women", "Lot", "Get", "Bad", "Theyre", "Wish", "I", "Me", "Female", "Was", "Why", "Know", "If", "He"] - 22<br>* Topic 13 ["I", "My", "Me", "Male", "Im", "Dont", "Men", "Woman", "Women", "They", "Them", "Their", "Like", "He", "Have", "But", "Any", "We", "Everybody", "Trust"] - 18<br><br>Topic 0 looks like a general topic about gender, topic 4 is about wanting to do things to women, and topic 13 appears to be a lot of ranting. These are the three main topics, but there are a few more that show up in people's comments a lot:<br><br>* Topic 9 ["Back", "They", "Them", "Male", "Just", "Any", "When", "Everybody", "Going", "He", "Their", "I", "Then", "Theyre", "Female", "Men"] - 13<br>* Topic 7 ["Everybody", "Any", "I", "Was", "They", "Theyre", "Been", "Already", "Ive", "Theyve", "Lot", "Want", "One", "Just", "Girl"] - 13<br><br>Topic 9 appears to be about people criticizing the actions of women, and topic 7 appears to be about wanting to do things to a single woman.<br><br>People on the other subs rarely talked about these topics. Top2vec isn't especially good for comparing results across datasets, but the topics on the other subs were very obviously different.<br><br>#Why Bother?#<br><br>Am I just a weird guy who likes data science and wants to do unnecessary projects? Yeah, most of the time. But I also think this stuff is important, for several reasons:<br><br>* **You don't sound like women.** A lot of people on this sub sound like men angry at women, but in reverse. 
I'm not going to tell you whether or not you *should* sound like this, but if you want to sound like women who hate men, you're not doing it right.<br>* **You don't sound like teenagers or Indian women.** If you want to sound like those people, you're not doing it right.<br>* **Everybody online sounds like shit.** Spend some time on other areas of the internet and you will realize that it is a complete garbage fire. Everybody is pissed off and angry, all of the time. I think that maybe the internet isn't as great as some people claim.<br><br>#TL;DR: Use science to track how pissed off people are when they talk, and the topics they talk about. These topics are very different from those on subs for women.
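For anyone who wants to try two of the steps described above, here is a minimal sketch in plain Python. It is not the code used for this post. The function names, the URL pattern, and the per-comment score dictionaries are all assumptions made for illustration, and top2vec itself is not called here. The first function applies the spam rule from the data section (a link showing up in at least 10 posts) and drops empty posts. The second picks each comment's "main topic" as its highest-scoring topic and ranks topics by how often they win.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple URL pattern; real links may need more care.
URL_RE = re.compile(r"https?://\S+")


def drop_spam_and_empty(comments, max_link_repeats=10):
    """Drop empty comments and comments whose link appears in
    max_link_repeats or more comments (the spam rule from the post)."""
    link_counts = Counter()
    for text in comments:
        # Count each link once per comment, so a single comment repeating
        # a link many times does not look like many spam posts.
        link_counts.update(set(URL_RE.findall(text)))
    spam_links = {link for link, n in link_counts.items() if n >= max_link_repeats}

    kept = []
    for text in comments:
        if not text.strip():
            continue  # no text in the post
        if spam_links.intersection(URL_RE.findall(text)):
            continue  # references a link that was flagged as spam
        kept.append(text)
    return kept


def rank_main_topics(topic_scores):
    """topic_scores holds one dict per comment, mapping topic id -> score
    (an assumed input shape). Returns (topic_id, count) pairs, with the
    most frequent main topic first."""
    main_topics = [max(scores, key=scores.get) for scores in topic_scores if scores]
    return Counter(main_topics).most_common()
```

Calling `rank_main_topics` on the per-comment scores would give a list shaped like the topic counts above, one `(topic_id, count)` pair per topic. Averaging the per-comment sentiment scores for each sub is the same kind of one-liner: `sum(scores) / len(scores)`.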