Measuring different types of gender-based harassment

Possible for:
Hate Speech
The purpose of this analysis is to quantify the types of abusive language that female public figures (e.g. politicians and activists) receive on social media compared with their male counterparts. This methodology is only possible with the Twitter API, as it requires access to individual users' posts.

1. Research Question

  • How are women being harassed on social media?

2. Sample Selection

  • Generate a list of female individuals of interest (e.g. politicians, activists). For example, you could select Greta Thunberg. To make the comparison with male counterparts, also list male individuals in comparable roles.
  • Time Period: Select a time period wide enough for your study. If possible, include several months of an election period.

3. Gather your data

  • Gather all Tweets mentioning your chosen individuals within the selected time period.
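A minimal sketch of how the collection query could be assembled, assuming access to the Twitter API v2 full-archive search endpoint (`GET /2/tweets/search/all`), which takes `query`, `start_time`, `end_time` and `max_results` parameters. The helper name `build_search_params`, the example handle and the decision to exclude retweets are illustrative assumptions; the sketch only builds the request parameters and does not send a request.

```python
from datetime import datetime, timezone


def build_search_params(handles, names, start, end, max_results=500):
    """Build request parameters for a v2 full-archive search (hypothetical helper).

    handles:    usernames without the leading '@' (e.g. "GretaThunberg")
    names:      full names matched as exact phrases in the Tweet text
    start, end: timezone-aware datetimes bounding the study period
    """
    # Match Tweets that mention the account OR contain the quoted full name.
    terms = [f"@{h}" for h in handles] + [f'"{n}"' for n in names]
    # Assumption: drop retweets so each message is counted once.
    query = "(" + " OR ".join(terms) + ") -is:retweet"

    def iso(dt):
        # Format as an ISO 8601 UTC timestamp, e.g. 2019-09-01T00:00:00Z
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "query": query,
        "start_time": iso(start),
        "end_time": iso(end),
        "max_results": max_results,  # full-archive search returns 10 to 500 per page
    }


params = build_search_params(
    handles=["GretaThunberg"],  # example handle
    names=["Greta Thunberg"],
    start=datetime(2019, 9, 1, tzinfo=timezone.utc),
    end=datetime(2019, 12, 1, tzinfo=timezone.utc),
)
print(params["query"])  # (@GretaThunberg OR "Greta Thunberg") -is:retweet
```

Results arrive in pages; each follow-up request would pass the `next_token` value from the previous response's `meta` object. Matching the full name as well as the @handle also catches Tweets that discuss a figure without tagging them, at the cost of more false positives (for example, other people with the same name).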

4. Classify your data

  • What types of harassment may women experience online? A study from Dalhousie University* defined four categories to classify online gender-based harassment: (a) indirect harassment (i.e. stereotypes and suggestions that women are inferior), (b) information threat (i.e. threats that information will be stolen, revealed or misused), (c) sexual harassment (i.e. insulting words of anger, violence or sex) and (d) physical harassment (i.e. threats based on female biology).
  • Using these categories, classify each Tweet in your sample as (0) no harassment, (1) indirect harassment, (2) information threat, (3) sexual harassment or (4) physical harassment.

  • Using the above classification, create a new binary column that keeps Tweets coded (0) as no harassment and recodes all other Tweets (categories 1 to 4) as (1) harassment.
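A minimal sketch of the coding scheme and the derived binary column, assuming the manually coded Tweets are held as a list of Python dictionaries with a hypothetical `category` field (0 to 4). The names `CATEGORIES` and `add_harassment_flag` and the placeholder Tweets are illustrative only.

```python
# Coding scheme from Step 4: "no harassment" plus the four Dalhousie categories.
CATEGORIES = {
    0: "no harassment",
    1: "indirect harassment",
    2: "information threat",
    3: "sexual harassment",
    4: "physical harassment",
}


def add_harassment_flag(tweets):
    """Add a binary 'harassment' field: 0 if category is 0, else 1 (hypothetical helper)."""
    for tweet in tweets:
        category = tweet["category"]
        # Catch coding typos before they silently inflate the harassment count.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category code: {category!r}")
        tweet["harassment"] = 0 if category == 0 else 1
    return tweets


coded = add_harassment_flag([
    {"text": "placeholder for a neutral Tweet", "category": 0},
    {"text": "placeholder for an indirect-harassment Tweet", "category": 1},
    {"text": "placeholder for a sexual-harassment Tweet", "category": 3},
])
print([t["harassment"] for t in coded])  # [0, 1, 1]
```

Keeping the original 0 to 4 code alongside the binary flag lets the same data answer both the overall question and the per-category questions in Step 5.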

5. Analyze your data

  • What percentage of Tweets overall are abusive in any form?
  • How many Tweets do you see in each category?
  • What are the top words used in each category?
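A minimal sketch of the three analysis questions, assuming coded Tweets stored as dictionaries with hypothetical `text`, `category` (0 to 4) and `target_gender` fields. The tokenizer is a deliberately simple regular-expression split with a small illustrative stop-word list (an assumption, not part of the original method), and `pct_abusive_by_group` adds the female vs male comparison described in the purpose of this analysis. The toy Tweets are invented.

```python
import re
from collections import Counter, defaultdict

# Deliberately tiny stop-word list for illustration; a real study needs a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "for", "is", "are", "should"}


def tokenize(text):
    """Lower-case, strip URLs and @mentions, keep letter-only tokens, drop stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"[^\W\d_]+", text) if w not in STOPWORDS]


def summarize(tweets, top_n=5):
    """Return (% abusive overall, Tweets per category, top words per category)."""
    total = len(tweets)
    abusive = sum(1 for t in tweets if t["category"] != 0)
    pct_abusive = 100 * abusive / total if total else 0.0

    per_category = Counter(t["category"] for t in tweets)

    word_counts = defaultdict(Counter)
    for t in tweets:
        word_counts[t["category"]].update(tokenize(t["text"]))
    top_words = {cat: [w for w, _ in counts.most_common(top_n)]
                 for cat, counts in word_counts.items()}
    return pct_abusive, per_category, top_words


def pct_abusive_by_group(tweets, key="target_gender"):
    """Percentage of abusive Tweets per group, e.g. aimed at female vs male figures."""
    totals, abusive = Counter(), Counter()
    for t in tweets:
        totals[t[key]] += 1
        abusive[t[key]] += int(t["category"] != 0)
    return {group: 100 * abusive[group] / totals[group] for group in totals}


toy = [  # invented examples, not real Tweets
    {"text": "Great speech on climate policy", "category": 0, "target_gender": "female"},
    {"text": "Women should stay out of politics", "category": 1, "target_gender": "female"},
    {"text": "No place for women here", "category": 1, "target_gender": "female"},
    {"text": "Strong speech, well argued @someone https://example.com",
     "category": 0, "target_gender": "male"},
]
pct, counts, top = summarize(toy, top_n=1)
print(pct)             # 50.0
print(counts[1])       # 2
print(top[1], top[0])  # ['women'] ['speech']
print(pct_abusive_by_group(toy))
```

Word counts from such a simple tokenizer are only a first pass; hashtags, emoji and deliberate misspellings would need extra handling before the top-word lists can be compared across categories.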

Further Resources:

  • *The classification scheme above was created by Sima Sharifirad and Stan Matwin, "When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP", arXiv, February 2019.
  • See DRI's Guide on Gender and Social Media for more information.