This is our project, Analysis of Tweets Red-Tagging Chad Booc and Lumad Schools, which aims to investigate tweets that red-tag the mentioned parties and analyse the behaviours of the Twitter users who post them.
This project hones students' critical thinking as they analyze tweets and detect misinformation and disinformation related to Chad Booc's death and his activities with the Lumads. Moreover, we examined tweets that contain misinformation regarding Lumad schools and communities.
Red-tagging is an issue that hits close to home for any UP student. Chad Booc, a graduate of our department and an alumnus of UP CURSOR, is a famous victim of this issue. The circumstances surrounding his death are particularly polarizing. While many condemned the state for his death, quite a few lauded the government for its actions. This is an indication of the success of the state's red-tagging. Our frustration regarding this and the inherent dangers of red-tagging led us to choose this issue as the focus of our CS 132 project.
Twitter has become a prominent platform for red-taggers to spread disinformation and target activists like Chad Booc. The ease with which information can be disseminated on Twitter means that disinformation can quickly gain traction and be amplified, leading to potentially severe consequences for targeted individuals and movements. These consequences include defamation, harassment, or even death, like in Chad Booc's case.
Through this project, we aim to analyze the behavior and intent of red-taggers on Twitter and identify their patterns of red-tagging. This gives us a better understanding of how red-taggers operate, which can serve as a reference for combatting this kind of disinformation in the future.
At this stage, we decided what we wanted our topic to be and what kind of question we hoped to answer. We then came up with our hypotheses and our general plan of action.
What kind of behaviour did red-taggers exhibit in 2022 concerning Chad Booc and his activities in relation to:
Analyse the frequency of tweets and usage of relevant keywords to identify red-taggers' behaviours.
We collected our data from Twitter and analyzed each tweet to remove irrelevant tweets from the dataset.
We gathered tweets that red-tagged Chad Booc and Lumad schools. The scope of our data collection is 2022, the year he was killed.
In scraping for tweets, we used two sets of keywords, one for Chad Booc and one for Lumad schools. Specifically, these are:
With this set of keywords, we were able to collect a majority of tweets from 2022 that red-tagged Chad Booc and Lumad schools.
To speed up the data collection process, we used snscrape, a web scraper for social networking services (SNS).
After gathering and sorting the data, we preprocessed it and formatted the values so that outliers and missing values would be handled properly. This helps provide better results for Natural Language Processing and time series analysis.
Each row in our dataset contains one tweet. The columns we collected to describe each tweet are the following:
However, for the purposes of our analysis, we only used the following columns:
In the columns that we used, we did not encounter any missing values that needed filling. Similarly, we did not have any columns that needed normalisation or standardisation.
The column Tweet Date Posted contained datetime values in the format MM/DD/YY hh:mm. To make these values readable by code, each one was parsed into a datetime object and written out as an ISO 8601 format string.
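The parsing step can be sketched as follows. This is a minimal illustration using Python's standard library, not necessarily the code used in the project; the 24-hour clock in the format string and the sample value are assumptions.

```python
from datetime import datetime

# Assumed strptime pattern for MM/DD/YY hh:mm (a 24-hour clock is an assumption).
SOURCE_FORMAT = "%m/%d/%y %H:%M"


def to_iso8601(raw: str) -> str:
    """Parse a 'MM/DD/YY hh:mm' string and return it as an ISO 8601 string."""
    parsed = datetime.strptime(raw.strip(), SOURCE_FORMAT)
    return parsed.isoformat()


# Placeholder value, not taken from the dataset.
print(to_iso8601("03/12/22 14:05"))  # 2022-03-12T14:05:00
```

Keeping the intermediate datetime object is what later makes it possible to bin the tweets by day.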
Due to the nature of our project objectives and data, we encountered no outliers that needed to be handled during preprocessing. This is because such outliers are exactly what we look for in our time series analysis, where they appear as peaks in the dataset.
We preprocessed the data in this stage in order to prepare it for Natural Language Processing (NLP). This was done by making use of procedures such as cleaning, tokenising, lemmatising and more.
In order to perform topic clustering on the tweets collected, the following basic procedures were performed:
This step was performed via Google Apps Script.
To keep only the important words and concepts in each tweet, we filtered its words against a list of English stopwords. This removes unnecessary words such as articles and pronouns from the content of the tweet. We used the stopwords from NLTK's corpus, extended with additional words such as meaningless expressions. NLTK's word_tokenize function was then used to get the tokenised words in preparation for the next steps.
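A rough sketch of this filtering step is shown below. The extra stopwords, the helper names, and the rule that drops non-alphabetic tokens are all assumptions made for illustration; the project's actual additions are not listed here. The NLTK calls are kept inside one function so that the small demo at the bottom runs without NLTK's data files.

```python
import re

# Hypothetical extra stopwords; the project's actual additions are not listed.
EXTRA_STOPWORDS = frozenset({"haha", "lol", "rt"})


def build_stopwords(base_words, extra_words=EXTRA_STOPWORDS):
    """Combine a base stopword list with project-specific additions."""
    return {w.lower() for w in base_words} | {w.lower() for w in extra_words}


def filter_tokens(tokens, stop_words):
    """Lower-case tokens, then keep alphabetic ones that are not stopwords."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t.isalpha() and t not in stop_words]


def preprocess_with_nltk(text):
    """Tokenise with NLTK and filter against NLTK's English stopwords.

    Needs nltk and its 'stopwords' and 'punkt' (or 'punkt_tab') data,
    which can be fetched with nltk.download(...).
    """
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = build_stopwords(stopwords.words("english"))
    return filter_tokens(word_tokenize(text), stop_words)


# Demo with a tiny stand-in stopword list and a regex tokeniser.
demo_stop_words = build_stopwords(["the", "a", "is", "of"])
demo_tokens = re.findall(r"\w+", "The school is a place of learning haha")
print(filter_tokens(demo_tokens, demo_stop_words))  # ['school', 'place', 'learning']
```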
To get the common topics found across the gathered tweets, it is important that the words in each tweet are in the same form. This was done by lemmatising the tokenised words using NLTK's WordNetLemmatizer. NLTK's pos_tag was used to determine each word's part of speech for a more accurate lemmatisation.
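One common way to combine the tagger and the lemmatiser is to map each Penn Treebank tag to the part-of-speech letter that WordNetLemmatizer expects. The sketch below assumes this approach; the function names are hypothetical, and the mapping used in the project may differ.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatiser's default


def lemmatise(tokens):
    """Lemmatise tokens using their tagged part of speech.

    Needs nltk and its 'wordnet' and 'averaged_perceptron_tagger' data
    (named 'averaged_perceptron_tagger_eng' in newer NLTK releases).
    """
    from nltk import pos_tag
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
        for word, tag in pos_tag(tokens)
    ]


print(penn_to_wordnet("VBD"))  # v
```

Without the tag, the lemmatiser treats every word as a noun, so verb forms would be left unchanged.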
After these preprocessing procedures, we were able to get the top keywords used in the tweets:
In order to group the tweets by date, pandas' resample method is used with a frequency of 1 day.
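A minimal sketch of this binning, assuming the parsed dates live in the Tweet Date Posted column; the timestamps below are placeholders, not values from the dataset.

```python
import pandas as pd

# Placeholder timestamps, not taken from the dataset.
tweets = pd.DataFrame({
    "Tweet Date Posted": pd.to_datetime([
        "2022-02-25 08:00",
        "2022-02-25 21:30",
        "2022-02-27 10:15",
    ]),
})

# resample needs a datetime index; size() counts the rows in each 1-day bin.
daily_counts = (
    tweets.set_index("Tweet Date Posted")
    .resample("1D")
    .size()
)
print(daily_counts)
```

Days with no tweets still appear in the result with a count of 0, which keeps the time axis continuous for plotting.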
Graphing this binned dataset, we can see the number of tweets posted per day:
We determined the topic clusters and dates where there was the most tweet activity from the given dataset using topic clustering and time series analysis.
Within 6 days, there was an influx of tweets regarding Chad Booc's death.
There was a relatively low but stable number of tweets for 25 days. A peak was also detected on March 12, when Chad's autopsy report was released.
Another peak was recorded during the week of President Marcos' first State of the Nation Address.
Using Latent Dirichlet Allocation (LDA), we were able to group the tweets into four clusters. The cluster of tweets predominantly talking about Chad Booc's death made up 27.46% of the total. Although this is one of the clusters with the highest number of tweets, we were unable to find a statistically significant difference between the count in that cluster and the counts in the others (χ² = 1.96, p = 0.58).
We identified several dates with distinct patterns of tweet frequency and corresponding characteristics: February 25, the day of Chad Booc's death; March 12, when the Inquirer's article on Chad's autopsy was released; and July 26, PBBM's SONA. Furthermore, topic clustering showed that the tweets are spread roughly evenly across four topics, with no statistically significant departure from a uniform distribution: (1) Lumad schools as NPA training grounds, (2) red-tagging of Makabayan bloc and Filipino activists, (3) Chad Booc's death, and (4) Chad being red-tagged as an NPA member.
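The reported test result (χ² = 1.96, p = 0.58) is consistent with a goodness-of-fit test against equal counts in four clusters, which has 3 degrees of freedom. That reading is an assumption, since the test setup is not spelled out. The sketch below recomputes the p-value from the reported statistic using only the standard library; the per-cluster counts are not given in the write-up, so the statistic function is shown only for reference.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_uniform_statistic(counts):
    """Goodness-of-fit statistic against equal expected counts in every cluster."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Reported statistic; 3 degrees of freedom assumed (four clusters minus one).
p_value = chi2_sf_df3(1.96)
print(round(p_value, 2))  # 0.58
```

A p-value this large means the observed cluster sizes are compatible with an even split, which is why no single topic can be said to dominate.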
From the results, we conclude that the frequency of tweets red-tagging Chad Booc and his activities significantly increased after his death, but these tweets did not predominantly talk about his death.
Hi! We are a group of second-year BS Computer Science students from the University of the Philippines Diliman and members of the UP Association of Computer Science Majors (UP CURSOR).
Hi! I'm Marius. I'm interested in web development and data science, particularly in the field of NLP.
Hello, I'm Julia! I'm interested in learning more about different fields of computer science. Outside of academics, I like to read, draw, and look at cat pictures.
Hi! I'm Joshua. I seek opportunities in different fields of computer technology, especially in software engineering and web development.