Steam Review Analyzer: Week 4 Content
Use VADER for sentiment analysis
Understand compound scores
Handle negations and emphasis
This week's code: https://drive.google.com/file/d/1RaHTwIE8f73y9tePc3L4D-df_Pp5Ocr7/view?usp=sharing
Last week, we built a basic sentiment analyzer using word counting. It worked, but had limitations - it couldn't handle "not good" or understand that "AMAZING!!!" is stronger than "good."
This week, we'll use VADER (Valence Aware Dictionary and sEntiment Reasoner) - a powerful sentiment analysis tool that solves these problems.
VADER is a pre-built sentiment analysis tool specifically designed for social media and short texts. It understands:
Negations: "not good" is negative
Emphasis: "GOOD" vs "good"
Punctuation: "good!" vs "good"
Degree modifiers: "very good" vs "good"
Context: Multiple rules working together
Our basic analyzer from Week 3:
"This game is not good" → Positive (found "good")
"This is AMAZING!!!" → score of 1 (same as "good")
VADER:
"This game is not good" → Negative (understands "not")
"This is AMAZING!!!" → Higher score (understands emphasis)
First, let's import and set up VADER.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
# Download VADER lexicon
nltk.download('vader_lexicon')
# Create VADER analyzer
sia = SentimentIntensityAnalyzer()
print("VADER is ready!")
VADER returns four scores for any text.
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
text = "This game is good."
scores = sia.polarity_scores(text)
print("Text:", text)
print("Scores:", scores)
Output:
Text: This game is good.
Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
neg: Proportion of negative sentiment (0.0 to 1.0)
neu: Proportion of neutral sentiment (0.0 to 1.0)
pos: Proportion of positive sentiment (0.0 to 1.0)
compound: Overall sentiment score (-1.0 to +1.0)
Note: neg + neu + pos ≈ 1.0 (they're proportions, each rounded to three decimals)
The compound score is the most useful - it's a single number from -1 to +1:
+1.0: Extremely positive
0.0: Neutral
-1.0: Extremely negative
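To see where a compound score comes from, here is a minimal sketch of the final normalization step. It assumes the formula used in VADER's reference implementation, x / sqrt(x² + 15), and a lexicon valence of 1.9 for "good". The name normalize_valence is ours for illustration, not an NLTK function.

```python
import math

ALPHA = 15  # normalization constant, assumed from VADER's reference implementation


def normalize_valence(raw_score, alpha=ALPHA):
    """Squash an unbounded valence sum into the range -1 to +1."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


# Assumed lexicon valence for "good"
good_valence = 1.9

print(round(normalize_valence(good_valence), 4))   # one positive word
print(round(normalize_valence(-good_valence), 4))  # the mirror image
print(round(normalize_valence(0.0), 4))            # no sentiment words at all
```

Under these assumptions the first value is 0.4404, the same compound score VADER reported for "This game is good." Because of the square root, the score approaches +1 or -1 but never quite reaches it.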
Classification rules:
compound >= 0.05 → Positive
compound <= -0.05 → Negative
-0.05 < compound < 0.05 → Neutral
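The rules above fit in one small helper. This is a sketch only: classify_compound and its threshold parameter are hypothetical names, and the rest of this lesson writes the same logic inline.

```python
def classify_compound(compound, threshold=0.05):
    """Map a compound score to a label using the rules above."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Scores on and around the boundaries
for score in [0.44, 0.05, 0.049, 0.0, -0.049, -0.05, -0.34]:
    print(f"{score:6.3f} → {classify_compound(score)}")
```

Keeping the threshold as a parameter makes it easy to try a wider neutral band later, for example threshold=0.1.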
Let's see how VADER handles negations compared to our basic analyzer.
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
# Test sentences with negations
sentences = [
    "This game is good.",
    "This game is not good.",
    "This game is not bad.",
    "I don't hate this game."
]
for sentence in sentences:
    scores = sia.polarity_scores(sentence)
    compound = scores['compound']
    if compound >= 0.05:
        sentiment = "Positive"
    elif compound <= -0.05:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    print(f"Text: {sentence}")
    print(f"Compound: {compound:.3f} → {sentiment}")
    print()
Expected Output (exact values may vary slightly with the NLTK version):
Text: This game is good.
Compound: 0.440 → Positive
Text: This game is not good.
Compound: -0.341 → Negative
Text: This game is not bad.
Compound: 0.431 → Positive
Text: I don't hate this game.
Compound: 0.458 → Positive
Notice: VADER correctly flips sentiment when it sees "not"!
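The flip is not a simple sign change. As a rough sketch, assuming the constants from VADER's reference implementation (a negation multiplier of -0.74 and the normalization x / sqrt(x² + 15)) and lexicon valences of 1.9 for "good" and -2.5 for "bad", a negated word is scaled down as well as reversed. The function names here are ours.

```python
import math

NEGATION_SCALAR = -0.74  # assumed from VADER's reference implementation


def normalize_valence(raw_score, alpha=15):
    """Squash an unbounded valence into the range -1 to +1."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def negated_compound(valence):
    """Compound score for a single sentiment word preceded by a negation."""
    return normalize_valence(valence * NEGATION_SCALAR)


# Assumed lexicon valences: good = 1.9, bad = -2.5
print(f"good:     {normalize_valence(1.9):.3f}")
print(f"not good: {negated_compound(1.9):.3f}")
print(f"bad:      {normalize_valence(-2.5):.3f}")
print(f"not bad:  {negated_compound(-2.5):.3f}")
```

This is why "not good" comes out milder than "bad", and "not bad" milder than "good": negation reverses the direction and keeps about three quarters of the strength.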
VADER understands that emphasis makes sentiment stronger.
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
# Different levels of emphasis
emphasis_tests = [
    "This game is good",
    "This game is GOOD",
    "This game is good!",
    "This game is GOOD!",
    "This game is GOOD!!!",
    "This game is gooood"
]
for text in emphasis_tests:
    compound = sia.polarity_scores(text)['compound']
    print(f"{text:30} → Compound: {compound:.3f}")
Output (exact values may vary slightly with the NLTK version):
This game is good              → Compound: 0.440
This game is GOOD              → Compound: 0.562
This game is good!             → Compound: 0.493
This game is GOOD!             → Compound: 0.603
This game is GOOD!!!           → Compound: 0.671
This game is gooood            → Compound: 0.000
Key Insights:
ALL CAPS increases sentiment strength
Exclamation marks boost sentiment
Multiple punctuation marks boost even more
Stretched spellings like "gooood" are not in VADER's lexicon, so they score 0; shorten them to the normal spelling before analysis
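For the curious, the caps and exclamation-mark boosts can be sketched in a few lines. The constants are assumptions taken from VADER's reference implementation (0.733 for an ALL CAPS word in otherwise mixed-case text, 0.292 per "!" for up to four marks), "good" is assumed to have a lexicon valence of 1.9, and emphasized_compound is a hypothetical helper that handles a sentence with a single sentiment word.

```python
import math

CAPS_BOOST = 0.733         # assumed: ALL CAPS word in otherwise mixed-case text
EXCLAMATION_BOOST = 0.292  # assumed: added per "!", counted up to 4 marks
MAX_EXCLAMATIONS = 4


def normalize_valence(raw_score, alpha=15):
    """Squash an unbounded valence into the range -1 to +1."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def emphasized_compound(valence, all_caps=False, exclamations=0):
    """Compound score for one sentiment word with optional emphasis."""
    direction = 1 if valence > 0 else -1 if valence < 0 else 0
    if all_caps:
        valence += direction * CAPS_BOOST
    # Punctuation pushes the score further in the direction it already has
    valence += direction * min(exclamations, MAX_EXCLAMATIONS) * EXCLAMATION_BOOST
    return normalize_valence(valence)


GOOD = 1.9  # assumed lexicon valence for "good"
print(f"good    → {emphasized_compound(GOOD):.3f}")
print(f"GOOD    → {emphasized_compound(GOOD, all_caps=True):.3f}")
print(f"good!   → {emphasized_compound(GOOD, exclamations=1):.3f}")
print(f"GOOD!!! → {emphasized_compound(GOOD, all_caps=True, exclamations=3):.3f}")
```

Both boosts are added to the raw valence before normalization, which is why each extra "!" moves the compound score a little less than the one before it.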
Words like "very", "extremely", and "somewhat" modify sentiment intensity.
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
modifiers = [
    "The game is good.",
    "The game is very good.",
    "The game is extremely good.",
    "The game is somewhat good.",
    "The game is barely good."
]
for text in modifiers:
    compound = sia.polarity_scores(text)['compound']
    print(f"{text:35} → {compound:.3f}")
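Output (exact values may vary slightly with the NLTK version):
The game is good.                   → 0.440
The game is very good.              → 0.493
The game is extremely good.         → 0.493
The game is somewhat good.          → 0.383
The game is barely good.            → 0.383
Notice: "very" and "extremely" raise the score by the same amount, while "somewhat" and "barely" lower it by the same amount. VADER treats each modifier as either a booster or a dampener.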
Let's create a complete function using VADER.
from nltk.sentiment import SentimentIntensityAnalyzer
# Create the analyzer once; building it loads the whole lexicon
sia = SentimentIntensityAnalyzer()
def analyze_sentiment_vader(text):
    """Analyze sentiment using VADER."""
    scores = sia.polarity_scores(text)
    # Get compound score
    compound = scores['compound']
    # Classify based on compound score
    if compound >= 0.05:
        classification = "Positive"
    elif compound <= -0.05:
        classification = "Negative"
    else:
        classification = "Neutral"
    return {
        'compound': compound,
        'classification': classification,
        'positive': scores['pos'],
        'neutral': scores['neu'],
        'negative': scores['neg']
    }
# Test the function
test_reviews = [
    "This game is absolutely AMAZING!!!",
    "Terrible game. Complete waste of money.",
    "The game is okay, nothing special.",
    "Not bad, but not great either."
]
for review in test_reviews:
    result = analyze_sentiment_vader(review)
    print(f"Review: {review}")
    print(f"Result: {result['classification']} (compound: {result['compound']:.3f})")
    print()
Let's compare our Week 3 analyzer with VADER on tricky examples.
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize
# Our basic analyzer from Week 3
def analyze_sentiment_basic(text):
    positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
    negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst']
    tokens = word_tokenize(text.lower())
    pos_count = sum(1 for token in tokens if token in positive_words)
    neg_count = sum(1 for token in tokens if token in negative_words)
    score = pos_count - neg_count
    if score > 0:
        return "Positive"
    elif score < 0:
        return "Negative"
    else:
        return "Neutral"
# VADER analyzer
sia = SentimentIntensityAnalyzer()
def analyze_vader(text):
    compound = sia.polarity_scores(text)['compound']
    if compound >= 0.05:
        return "Positive"
    elif compound <= -0.05:
        return "Negative"
    else:
        return "Neutral"
# Tricky test cases
tricky_reviews = [
    "This game is not good.",
    "The game is AMAZING!!!",
    "I really, really love this game!",
    "It's not terrible.",
    "Very bad game."
]
print("Basic vs VADER Comparison")
print("=" * 60)
for review in tricky_reviews:
    basic = analyze_sentiment_basic(review)
    vader = analyze_vader(review)
    match = "✓" if basic == vader else "✗"
    print(f"Review: {review}")
    print(f" Basic: {basic:8} | VADER: {vader:8} {match}")
    print()
Let's see how accurate VADER is on our movie reviews dataset.
import nltk
from nltk.corpus import movie_reviews
from nltk.sentiment import SentimentIntensityAnalyzer
# Download the movie reviews corpus if it is not installed yet
nltk.download('movie_reviews')
sia = SentimentIntensityAnalyzer()
# Test on positive reviews
correct_pos = 0
total_pos = 100
for fileid in movie_reviews.fileids('pos')[:total_pos]:
    text = movie_reviews.raw(fileid)
    compound = sia.polarity_scores(text)['compound']
    if compound >= 0.05:
        correct_pos += 1
# Test on negative reviews
correct_neg = 0
total_neg = 100
for fileid in movie_reviews.fileids('neg')[:total_neg]:
    text = movie_reviews.raw(fileid)
    compound = sia.polarity_scores(text)['compound']
    if compound <= -0.05:
        correct_neg += 1
# Calculate accuracy
accuracy_pos = (correct_pos / total_pos) * 100
accuracy_neg = (correct_neg / total_neg) * 100
overall = (correct_pos + correct_neg) / (total_pos + total_neg) * 100
print("VADER Accuracy on Movie Reviews:")
print(f" Positive reviews: {accuracy_pos:.1f}%")
print(f" Negative reviews: {accuracy_neg:.1f}%")
print(f" Overall accuracy: {overall:.1f}%")
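The counting above treats a Neutral prediction as a miss for both classes. The same bookkeeping can be written as a reusable helper that works on any list of predicted and true labels. This is a sketch: evaluate_predictions is a hypothetical name, and the labels below are toy data, not results from the movie reviews corpus.

```python
def evaluate_predictions(predictions, labels):
    """Return (per-class accuracy, overall accuracy), both in percent."""
    totals = {}   # how many examples each true label has
    correct = {}  # how many of those were predicted correctly
    for predicted, actual in zip(predictions, labels):
        totals[actual] = totals.get(actual, 0) + 1
        if predicted == actual:
            correct[actual] = correct.get(actual, 0) + 1
    per_class = {
        label: correct.get(label, 0) / count * 100
        for label, count in totals.items()
    }
    total_pairs = sum(totals.values())
    overall = sum(correct.values()) / total_pairs * 100 if total_pairs else 0.0
    return per_class, overall


# Toy data for illustration only
labels = ["Positive", "Positive", "Negative", "Negative"]
predictions = ["Positive", "Neutral", "Negative", "Positive"]

per_class, overall = evaluate_predictions(predictions, labels)
for label, accuracy in per_class.items():
    print(f"  {label} reviews: {accuracy:.1f}%")
print(f"  Overall accuracy: {overall:.1f}%")
```

Looking at the two classes separately matters: an analyzer that leans positive can score well on positive reviews while missing most negative ones, and the overall figure alone would hide that.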
Let's analyze some realistic game review examples.
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
game_reviews = [
    "This game is absolutely incredible! The graphics are STUNNING and gameplay is super smooth. Highly recommend!!!",
    "Buggy mess. Game crashes every 10 minutes. Not worth the money at all.",
    "It's okay. Graphics are decent but gameplay gets repetitive after a while. Not bad but not great.",
    "I didn't think I would like this game, but I was wrong! It's actually pretty fun.",
    "WORST GAME EVER! Total waste of time and money. Extremely disappointing."
]
print("Game Review Sentiment Analysis")
print("=" * 70)
for i, review in enumerate(game_reviews, 1):
    result = analyze_sentiment_vader(review)
    print(f"Review {i}: {review[:60]}...")
    print(f"Sentiment: {result['classification']}")
    print(f"Compound Score: {result['compound']:.3f}")
    print(f"Positive: {result['positive']:.2f} | Neutral: {result['neutral']:.2f} | Negative: {result['negative']:.2f}")
    print()
Test VADER on your own reviews and compare with the basic analyzer:
Write 3 reviews with tricky language:
One with negation ("not bad")
One with emphasis ("AMAZING!!!")
One with degree modifiers ("very good")
Analyze them with both methods:
Basic word counting (Week 3)
VADER (Week 4)
Compare the results - which is more accurate?
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
# Your practice reviews
my_review_1 = "Write your first review here"
my_review_2 = "Write your second review here"
my_review_3 = "Write your third review here"
# Analyze with VADER
print("Review 1:", analyze_sentiment_vader(my_review_1))
print("Review 2:", analyze_sentiment_vader(my_review_2))
print("Review 3:", analyze_sentiment_vader(my_review_3))
VADER is much more sophisticated than basic word counting
Compound score (-1 to +1) is the main score to use
Negations are handled automatically ("not good" → negative)
Emphasis through caps and punctuation increases strength; stretched spellings ("gooood") are not recognized
Degree modifiers like "very" and "extremely" adjust intensity
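VADER's accuracy depends on the text: it is designed for short, informal writing, so measure it on your own data (as we did with the movie reviews) rather than assuming a fixed figure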
Use VADER when:
You need high accuracy
Text has complex language (negations, emphasis)
Working with social media or reviews
You want a quick, ready-to-use solution
Use Basic Analyzer when:
Learning NLP fundamentals
Need full control over word lists
Working with very specific domain vocabulary
Building custom sentiment rules
In Week 5, we'll learn:
Combining multiple reviews into overall sentiment
Visualizing sentiment distributions
Finding sentiment trends over time
Building a complete sentiment analysis report
Preparing for the Chrome extension integration
You now have a powerful sentiment analysis tool! Next, we'll learn how to use it at scale.