Research: Public's Opinion on DNA Fingerprinting using the Twitter API

tl;dr

Goal: To analyze the public's opinion on DNA fingerprinting using publicly available tweets.

Overview

As DNA and ancestry companies continue to compile millions of genetic samples into databases, controversy surrounds whether this practice is ethical. Ancestry and 23andMe have collectively obtained more than 26 million DNA samples in their databases (Source). To analyze the public’s sentiment on DNA fingerprinting, which is one application of such a DNA database, a Python script was written in Google Colab to search the Twitter API for a list of keywords.

List of technologies used:

Querying the Twitter API

  1. Create a list of keywords to search the Twitter API for.
  2. Query the Twitter API for 100 English tweets for each keyword in the keyword list, posted before a given cutoff date. The cutoff date is changed every day so that each query returns different results.
  3. Write each raw tweet into the sheet for its corresponding date.

*Tweets were queried for each day between 10-09-2021 and 11-03-2021.

Figure 1: Diagram of steps taken in collecting publicly available tweets.
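
The querying steps above can be sketched as follows. This is a minimal illustration, not the script used in this research: it assumes the Twitter API v2 recent search endpoint and its `query`, `max_results`, and `end_time` parameters, it assumes the date range is written as MM-DD-YYYY, and the helper names `build_search_params` and `daily_requests` are hypothetical. Authentication and the HTTP request itself are left out.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumed endpoint (Twitter API v2 recent search); the source does not say
# which API version or client library was used.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_params(keyword, before):
    """Build query parameters for 100 English tweets posted before a date."""
    end_time = datetime.combine(before, time.min, tzinfo=timezone.utc)
    return {
        # Quote the keyword so multi-word phrases match exactly;
        # `lang:en` restricts results to English tweets.
        "query": f'"{keyword}" lang:en',
        "max_results": 100,
        "end_time": end_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def daily_requests(keywords, first_day, last_day):
    """Yield (day, params) for every keyword on every day in the range."""
    day = first_day
    while day <= last_day:
        for keyword in keywords:
            yield day, build_search_params(keyword, day)
        day += timedelta(days=1)


# Hypothetical subset of the keyword list, over the collection window.
keywords = ["dna fingerprinting", "genetic profiling"]
requests_to_send = list(daily_requests(keywords, date(2021, 10, 9), date(2021, 11, 3)))
```

Each `(day, params)` pair would be sent to `SEARCH_URL` with a bearer token, and the returned tweets written to the sheet for that day.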

Data Cleaning

  1. Go through each sheet of raw tweets collected in Querying the Twitter API.
  2. For each tweet:
     a. Replace all image URLs with _IMG.
     b. Replace all regular URLs with _URL.
     c. Replace all emojis with their textual descriptions (🙂 → Slightly Smiling Face).
     d. Reduce any run of repeated letters to just 2 of the same character (heeeeeeeello → heello).

This preprocessing was done in order to prepare tweets for sentiment analysis.

Figure 2: Diagram of steps taken in cleaning tweets.
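
The cleaning steps above can be sketched with the standard library alone. This is an illustrative stand-in rather than the code used in this research: the rule for telling image URLs from regular URLs (a file extension or a `pic.twitter.com` link) is an assumption, and emoji descriptions are taken from Unicode character names via `unicodedata` instead of a dedicated emoji library. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: a URL counts as an image if it ends in a common image
# extension or is a pic.twitter.com link. The source does not say how
# image URLs were detected.
IMAGE_URL = re.compile(
    r"https?://\S+\.(?:png|jpe?g|gif)\b|\bpic\.twitter\.com/\S+", re.IGNORECASE
)
ANY_URL = re.compile(r"https?://\S+")
# Three or more of the same character in a row.
REPEATS = re.compile(r"(.)\1{2,}")


def describe_symbols(text):
    """Replace each 'Symbol, other' character (most emojis) with its name."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "") if unicodedata.category(ch) == "So" else ""
        out.append(f" {name.title()} " if name else ch)
    return "".join(out)


def clean_tweet(text):
    text = IMAGE_URL.sub("_IMG", text)   # image URLs first
    text = ANY_URL.sub("_URL", text)     # then every remaining URL
    text = describe_symbols(text)        # emojis -> textual descriptions
    text = REPEATS.sub(r"\1\1", text)    # heeeeeeeello -> heello
    return " ".join(text.split())        # normalize whitespace


cleaned = clean_tweet("heeeeeeeello 🙂 https://example.com/a.png and https://example.com/page")
```

Note that the `So` category also covers non-emoji symbols such as ©, and that the repeat rule collapses any repeated character, not only letters, so a real pipeline would likely need tighter rules.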

Example tweet before data cleaning: "We 💚love💚 these photos of some very impressive students learning gel electrophoresis and DNA profiling ... in first year! 🤯 Thank you for sharing the photos @GoreyEtss. We're looking forward to seeing what these scientists do next! #BiotechExperience @ABEProgOffice https://t.co/idow3wAkSd"

Example tweet after data cleaning: ": green heart : love : green heart : photos impressive students learning gel electrophoresis DNA profiling .. first year ! : exploding head : Thank sharing photos @ GoreyEtss . 're looking forward seeing scientists next ! #BiotechExperience @ ABEProgOffice _IMAGE"

Sentiment Analysis

After data collection and cleaning, sentiment analysis was performed on each tweet, allowing us to determine an average sentiment for each keyword we searched for.
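
As a rough sketch of this step, the snippet below averages per-tweet scores by keyword. The source does not name the sentiment library, so the scorer is passed in as a function; `toy_score` is a deliberately crude, hypothetical stand-in with a made-up word list, included only so the example runs. In practice it would be swapped for a real scorer that returns a polarity in [-1, 1].

```python
from collections import defaultdict
from statistics import mean


def average_by_keyword(rows, score):
    """rows: iterable of (keyword, cleaned_tweet). Returns {keyword: mean score}."""
    grouped = defaultdict(list)
    for keyword, text in rows:
        grouped[keyword].append(score(text))
    return {keyword: mean(scores) for keyword, scores in grouped.items()}


def toy_score(text):
    """Hypothetical stand-in scorer in [-1, 1]; NOT a real sentiment model."""
    positive = {"love", "impressive"}
    negative = {"creepy", "unethical"}
    words = text.lower().split()
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, hits / max(len(words), 1)))


# Made-up rows purely for illustration.
rows = [
    ("dna profiling", "love love"),
    ("dna profiling", "creepy"),
    ("dna typing", "neutral words"),
]
averages = average_by_keyword(rows, toy_score)
```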

Keyword                   Sentiment Score
genetic fingerprinting    -0.07
dna profiling              0.02
dna identification         0.04
dna typing                 0.05
dna fingerprint            0.06
genetic profiling          0.08
dna profile                0.09
dna fingerprinting         0.10
genetic profile            0.10
genetic fingerprint        0.13

Note: sentiment scores lie in the range [-1, 1], with -1 being the most negative and 1 being the most positive.

Through sentiment analysis, it is possible to estimate the public's opinion on DNA fingerprinting. Tweets for the keyword genetic fingerprint had the most positive sentiment (0.13), while tweets for genetic fingerprinting had the most negative sentiment (-0.07). The average sentiment across all keywords is 0.06, which suggests that the public has a slightly positive sentiment when discussing DNA fingerprinting.
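
The summary figures can be reproduced from the table with a few lines of Python. The values below are copied from the table of per-keyword sentiment scores; the variable names are only illustrative.

```python
from statistics import mean

# Per-keyword average sentiment, copied from the table above.
sentiment_by_keyword = {
    "genetic fingerprinting": -0.07,
    "dna profiling": 0.02,
    "dna identification": 0.04,
    "dna typing": 0.05,
    "dna fingerprint": 0.06,
    "genetic profiling": 0.08,
    "dna profile": 0.09,
    "dna fingerprinting": 0.10,
    "genetic profile": 0.10,
    "genetic fingerprint": 0.13,
}

# Unweighted mean across keywords, rounded to the table's precision.
overall = round(mean(list(sentiment_by_keyword.values())), 2)
most_positive = max(sentiment_by_keyword, key=sentiment_by_keyword.get)
most_negative = min(sentiment_by_keyword, key=sentiment_by_keyword.get)
```

This is an unweighted mean of the keyword averages, so it matches the mean over all tweets only if each keyword contributed the same number of tweets.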

Subjectivity Analysis

Subjectivity analysis, which measures how objective or opinionated a text is, was performed on each tweet.

Keyword                   Subjectivity Score
dna fingerprinting        0.35
genetic fingerprinting    0.37
dna profiling             0.38
dna fingerprint           0.39
genetic fingerprint       0.40
dna identification        0.40
genetic profile           0.40
dna profile               0.41
dna typing                0.43
genetic profiling         0.45

Note: The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

Through subjectivity analysis, it was possible to determine how objective or subjective the tweets for each keyword were. For example, tweets for dna fingerprinting were the most objective (0.35), while tweets for genetic profiling were the most subjective (0.45). The average subjectivity across all keywords was 0.398. Because this is below the midpoint of 0.5, the public is, on average, more objective than subjective when talking about DNA fingerprinting.

Results

Source Code

The source code, raw data, cleaned data, and analysis results are all published on GitHub in order to promote further research on this topic.

Errors

Further Research

Further research could be done in the following aspects: