Why Sentiment Models Misread Internet Language and What We Can Do About It

How culturally expressive language breaks traditional sentiment analysis and why it matters

In an age where emojis, slang, and internet-native expressions shape how we communicate, most NLP models still struggle to interpret them. This becomes especially problematic when those expressions come from historically marginalized communities, whose language styles are often misclassified, ignored, or flattened by algorithms.

In this project, I investigate how traditional sentiment analysis models misinterpret YouTube comments filled with internet language: emojis, Gen Z slang, and terms rooted in Black digital culture. By analyzing comments from two politically expressive music performances, I explore how this misinterpretation can lead to biased sentiment classifications, and why it’s critical to rethink how we build and train these models.

Project Overview:

  • I collected and preprocessed thousands of YouTube comments on two performances that center the Black experience in America.

  • The comments contain a high volume of emojis, internet slang, and culturally expressive phrases often rooted in African American Vernacular English (AAVE) and Gen Z digital culture.

  • I created custom labeling guidelines to classify sentiment more accurately than traditional tools allow.

  • I hand-labeled over 1,000 comments and ran baseline models (Naive Bayes, Logistic Regression) to evaluate performance and model behavior.

  • I’m now exploring the use of active learning (similar to CAMEL) to scale the annotation process efficiently and ethically, with a focus on reducing labeling bias.

The Problem with Current Sentiment Models

Off-the-shelf sentiment analysis tools are optimized for standardized language. But the internet is anything but standardized. When you feed them comments like:

  • GOAT energy! He killed it! 🐐👑

  • They really crazy fr fr 😮

  • All shade, no tea

The models either misclassify them or struggle to assign any sentiment at all. Why?

  1. Emojis carry context: A single emoji can reverse the tone of a sentence, express sarcasm, or add emotional weight. Most models still treat them as meaningless or remove them entirely.

  2. Slang evolves fast: Words like “finna”, “queen”, or “ate” carry meanings that models trained on Standard American English simply don’t capture.

  3. Cultural context matters: Much internet slang comes from African American Vernacular English (AAVE), but most models are not trained to understand it. This leads to misinterpretation or erasure.
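
As a quick illustration, here is a minimal sketch that runs the example comments above through VADER, a widely used off-the-shelf lexicon scorer (via the vaderSentiment package). This is an illustrative assumption for demonstration, not necessarily a tool used in this project:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

comments = [
    "GOAT energy! He killed it! 🐐👑",  # "killed" looks negative to a general-purpose lexicon
    "They really crazy fr fr 😮",       # "fr fr" isn't in Standard-English lexicons at all
    "All shade, no tea",                # an idiom whose individual words carry little lexicon sentiment
]

for text in comments:
    scores = analyzer.polarity_scores(text)  # dict with neg / neu / pos / compound
    print(f"{scores['compound']:+.2f}  {text}")
```

A lexicon scorer has no way to know that “killed it” is praise here, that “fr fr” is an intensifier, or that 🐐 stands for “greatest of all time,” so the compound scores tend to land far from how a human reader would rate these comments.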

This is both a technical issue and a fairness issue. When sentiment models are used in moderation, recommendation systems, or customer feedback analysis, misclassifying expressive language can silence or misrepresent communities.

My Approach: Building a Culturally-Informed Sentiment Pipeline

To address these gaps, I created a custom sentiment labeling process for my dataset:

  • Dataset: YouTube comments from two performances with political and cultural significance, each generating thousands of highly expressive responses.

  • Manual labeling: I created guidelines to capture tone, emoji use, and internet slang. I labeled over 1,000 comments as Positive, Negative, or Neutral—with space for ambiguity and mixed signals.

  • Modeling: I trained baseline models (Naive Bayes, Logistic Regression) on my labeled data and evaluated their accuracy, paying special attention to where they struggled, especially on emoji-heavy and slang-rich comments (a sketch of this setup follows this list).

  • Active Learning: To reduce manual labeling while preserving quality, I’m exploring CAMEL, a self-supervised method that selects the most informative samples to label next—allowing for better performance with less data.
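
For the modeling step, the sketch below shows the kind of baseline setup described above: TF-IDF features feeding Naive Bayes and Logistic Regression with scikit-learn. The file name, column names, and hyperparameters are placeholders, not the project’s actual configuration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hand-labeled comments: one text column, one Positive / Negative / Neutral label column.
# "labeled_comments.csv" and the column names are hypothetical placeholders.
df = pd.read_csv("labeled_comments.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["comment"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

for name, model in [
    ("Naive Bayes", MultinomialNB()),
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
]:
    # TF-IDF over word unigrams and bigrams; note that the default token
    # pattern silently drops emojis, which is part of the problem.
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), model)
    pipe.fit(X_train, y_train)
    print(f"=== {name} ===")
    print(classification_report(y_test, pipe.predict(X_test)))
```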

Key Insights: What I’ve Learned So Far

  1. Baseline models fall short on internet language. Even with cleaned data, traditional models struggled with short, emoji-rich comments or culturally specific slang.

  2. Custom labeling matters. Popular models labeled sentiment incorrectly, while my own guidelines captured a broader emotional spectrum.

  3. Cultural language is not noise. Many models treat slang and emojis as noise to be removed, but in this dataset they were essential to meaning.
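
To make that last point concrete, here is a minimal sketch of one way to keep emojis as features rather than stripping them during preprocessing. The regex and vectorizer settings are illustrative assumptions, not this project’s actual preprocessing.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Match word-like tokens OR any single character outside basic ASCII, so emojis
# (and other non-ASCII symbols) survive as standalone tokens instead of being dropped.
TOKEN_RE = re.compile(r"[a-zA-Z']+|[^\x00-\x7F]")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

print(tokenize("GOAT energy! He killed it! 🐐👑"))
# -> ['goat', 'energy', 'he', 'killed', 'it', '🐐', '👑']

# Plugging the tokenizer into the vectorizer turns emoji into features the model can weight.
vectorizer = TfidfVectorizer(tokenizer=tokenize, token_pattern=None)
```

Keeping 🐐 and 👑 as tokens gives a downstream classifier a chance to learn that they co-occur with praise, rather than throwing that signal away.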

Reflections + Next Steps

Moving forward, I’m:

  • Refining my labeling with active learning to reduce annotation fatigue and improve accuracy (see the uncertainty-sampling sketch after this list).

  • Comparing performance across different models to assess which architectures handle expressive language better.

  • Exploring how responsible AI frameworks could help guide dataset design, labeling, and model evaluation for projects like this.
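
For the active learning piece, here is a minimal sketch of generic uncertainty (margin) sampling, which surfaces the comments the current model is least sure about so they can be hand-labeled next. This is a simplified stand-in for illustration, not CAMEL itself, and the names and batch size are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def most_uncertain(labeled_texts, labels, unlabeled_texts, batch_size=50):
    """Rank unlabeled comments by how unsure the current model is about them."""
    vec = TfidfVectorizer(ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(labeled_texts), labels)

    probs = clf.predict_proba(vec.transform(unlabeled_texts))
    top_two = np.sort(probs, axis=1)[:, -2:]   # two highest class probabilities per comment
    margin = top_two[:, 1] - top_two[:, 0]     # small margin = the model can't decide
    ranked = np.argsort(margin)                # most uncertain first
    return [unlabeled_texts[i] for i in ranked[:batch_size]]

# Each round: hand-label the returned batch, add it to the labeled pool,
# retrain, and repeat until performance stops improving.
```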

Final Thoughts

Language on the internet is creative, emotional, and deeply cultural—and yet the models we use to interpret it are often rigid and narrow. If we want NLP systems that are truly inclusive, we need to start by questioning what our models ignore—and why.

If you’re working on sentiment analysis, cultural AI, or language bias in NLP, I’d love to connect. I’m also seeking opportunities to grow as a researcher, whether through hands-on roles, mentorship, or programs that bridge AI, language, and social impact.

Let’s make sentiment models that actually get the internet. Please reach out if you’re building responsible NLP, or if you’ve got feedback on this project. I’m always looking to learn.
