Steam Review Analyzer Week 4

    Week 4 content of steam review analyzer

    By AI Club on 10/13/2025

    Week 4: VADER Sentiment Analysis

    Week 4 Goals

    • Use VADER for sentiment analysis

    • Understand compound scores

    • Handle negations and emphasis

    This week's code: https://drive.google.com/file/d/1RaHTwIE8f73y9tePc3L4D-df_Pp5Ocr7/view?usp=sharing

    Introduction

    Last week, we built a basic sentiment analyzer using word counting. It worked, but had limitations - it couldn't handle "not good" or understand that "AMAZING!!!" is stronger than "good."

    This week, we'll use VADER (Valence Aware Dictionary and sEntiment Reasoner) - a powerful sentiment analysis tool that solves these problems.

    What is VADER?

    VADER is a pre-built sentiment analysis tool specifically designed for social media and short texts. It understands:

    • Negations: "not good" is negative

    • Emphasis: "GOOD" scores higher than "good" when the rest of the text is not in all caps

    • Punctuation: "good!" vs "good"

    • Degree modifiers: "very good" vs "good"

    • Context: Multiple rules working together

    Why VADER is Better

    Our basic analyzer from Week 3:

    • "This game is not good" → Positive (found "good")

    • "This is AMAZING!!!" → score of 1 (same as "good")

    VADER:

    • "This game is not good" → Negative (understands "not")

    • "This is AMAZING!!!" → Higher score (understands emphasis)

    1. Setting Up VADER

    First, let's import and set up VADER.

    Installing and Importing VADER

    import nltk

    from nltk.sentiment import SentimentIntensityAnalyzer

    # Download VADER lexicon

    nltk.download('vader_lexicon')

    # Create VADER analyzer

    sia = SentimentIntensityAnalyzer()

    print("VADER is ready!")

    2. Understanding VADER Scores

    VADER returns four scores for any text.

    The Four VADER Scores

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    text = "This game is good."

    scores = sia.polarity_scores(text)

    print("Text:", text)

    print("Scores:", scores)

    Output:

    Text: This game is good.

    Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}

    What Each Score Means:

    • neg: Proportion of negative sentiment (0.0 to 1.0)

    • neu: Proportion of neutral sentiment (0.0 to 1.0)

    • pos: Proportion of positive sentiment (0.0 to 1.0)

    • compound: Overall sentiment score (-1.0 to +1.0)

    Note: neg + neu + pos = 1.0 (they're proportions)
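    To see where these proportions come from, here is a minimal sketch of the ratio calculation. It follows our understanding of the VADER reference implementation, so treat it as an illustration rather than the library's exact code. The function name proportions is hypothetical, and the valence of 1.9 for "good" is an assumption about the lexicon.

```python
def proportions(valences):
    """Approximate VADER's neg/neu/pos ratios from per-word valences.

    `valences` holds one number per word; use 0 for words not in the lexicon.
    """
    # Each sentiment word contributes its magnitude plus 1 (it is also one word)
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    if total == 0:
        return 0.0, 0.0, 0.0
    return (
        round(neg_sum / total, 3),
        round(neu_count / total, 3),
        round(pos_sum / total, 3),
    )

# "This game is good." -> three neutral words plus "good" (assumed valence 1.9)
print(proportions([0, 0, 0, 1.9]))  # (0.0, 0.508, 0.492)
```

    Because each value is rounded to three decimals, the three numbers can occasionally add up to 0.999 or 1.001 instead of exactly 1.0.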

    Understanding the Compound Score

    The compound score is the most useful - it's a single number from -1 to +1:

    • +1.0: Extremely positive

    • 0.0: Neutral

    • -1.0: Extremely negative

    Classification rules:

    • compound >= 0.05 → Positive

    • compound <= -0.05 → Negative

    • -0.05 < compound < 0.05 → Neutral
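    The compound score stays between -1 and +1 because VADER squashes the summed word valences with a normalization step. The sketch below shows that step together with the classification rules above. The helper names normalize and classify are ours, and both the constant 15 and the valence of 1.9 for "good" are assumptions based on the VADER reference implementation.

```python
import math

def normalize(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the range -1..+1."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

def classify(compound, threshold=0.05):
    """Apply the classification rules: Positive, Negative, or Neutral."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

compound = normalize(1.9)  # "good" on its own, assuming a lexicon valence of 1.9
print(round(compound, 4), classify(compound))  # 0.4404 Positive
```

    Keeping the threshold as a parameter makes it easy to experiment with a wider neutral band later.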

    3. Handling Negations

    Let's see how VADER handles negations compared to our basic analyzer.

    Negation Examples

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    # Test sentences with negations
    sentences = [
        "This game is good.",
        "This game is not good.",
        "This game is not bad.",
        "I don't hate this game."
    ]

    for sentence in sentences:
        scores = sia.polarity_scores(sentence)
        compound = scores['compound']

        if compound >= 0.05:
            sentiment = "Positive"
        elif compound <= -0.05:
            sentiment = "Negative"
        else:
            sentiment = "Neutral"

        print(f"Text: {sentence}")
        print(f"Compound: {compound:.3f} → {sentiment}")
        print()

    Expected Output:

    Text: This game is good.
    Compound: 0.440 → Positive

    Text: This game is not good.
    Compound: -0.341 → Negative

    Text: This game is not bad.
    Compound: 0.431 → Positive

    Text: I don't hate this game.
    Compound: 0.458 → Positive

    Notice: VADER correctly flips sentiment when it sees "not"!
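    The flip is not a simple sign change: a negated word is also dampened, which is why "not good" ends up weaker in magnitude than "good". Here is a minimal sketch of that idea for a single sentiment word. The name compound_for is hypothetical, and the scalar -0.74 and the valences 1.9 ("good") and -2.5 ("bad") are assumptions taken from the VADER reference implementation and lexicon.

```python
import math

N_SCALAR = -0.74  # assumed multiplier applied to a negated sentiment word

def compound_for(valence, negated=False):
    """Compound score for one sentiment word, optionally preceded by a negation."""
    if negated:
        valence *= N_SCALAR  # flips the sign and shrinks the magnitude
    return valence / math.sqrt(valence ** 2 + 15)

for label, valence, negated in [
    ("good", 1.9, False),
    ("not good", 1.9, True),
    ("not bad", -2.5, True),
]:
    print(f"{label:10} {compound_for(valence, negated):+.3f}")
```

    Run it and compare the printed values with what sia.polarity_scores returns for the full sentences.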

    4. Handling Emphasis and Punctuation

    VADER understands that emphasis makes sentiment stronger.

    Emphasis Examples

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    # Different levels of emphasis
    emphasis_tests = [
        "This game is good",
        "This game is GOOD",
        "This game is good!",
        "This game is GOOD!",
        "This game is GOOD!!!",
        "This game is gooood"
    ]

    for text in emphasis_tests:
        compound = sia.polarity_scores(text)['compound']
        print(f"{text:30} → Compound: {compound:.3f}")

    Output:

    This game is good              → Compound: 0.440
    This game is GOOD              → Compound: 0.562
    This game is good!             → Compound: 0.493
    This game is GOOD!             → Compound: 0.603
    This game is GOOD!!!           → Compound: 0.671
    This game is gooood            → Compound: 0.000

    Key Insights:

    • ALL CAPS increases sentiment strength

    • Exclamation marks boost sentiment

    • Multiple punctuation marks boost even more

    • Letter repetition is not recognized: "gooood" is not in VADER's lexicon, so it scores 0.000
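    The capitalization and punctuation rules can be pictured as fixed amounts added to a word's valence before normalization. The sketch below models a single sentiment word. The function name emphasized_compound is hypothetical, and the constants (0.733 for caps, 0.292 per exclamation mark, at most four marks counted) are assumptions based on the VADER reference implementation.

```python
import math

C_INCR = 0.733     # assumed boost for an ALL-CAPS sentiment word
EXCL_INCR = 0.292  # assumed boost per exclamation mark
MAX_EXCL = 4       # assumed cap on how many exclamation marks count

def emphasized_compound(valence, all_caps=False, exclamations=0):
    """Compound score for one sentiment word with optional emphasis."""
    if valence == 0:
        return 0.0  # emphasis cannot create sentiment that is not there
    sign = 1 if valence > 0 else -1  # emphasis pushes away from zero
    if all_caps:
        valence += sign * C_INCR
    valence += sign * min(exclamations, MAX_EXCL) * EXCL_INCR
    return valence / math.sqrt(valence ** 2 + 15)

# "good" (assumed valence 1.9) with increasing emphasis
for caps, marks in [(False, 0), (True, 0), (False, 1), (True, 1), (True, 3)]:
    print(caps, marks, round(emphasized_compound(1.9, caps, marks), 3))
```

    The early return for a valence of 0 mirrors what happens to a word VADER does not know: no amount of emphasis changes its score.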

    5. Handling Degree Modifiers

    Words like "very", "extremely", and "somewhat" modify sentiment intensity.

    Degree Modifier Examples

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    modifiers = [
        "The game is good.",
        "The game is very good.",
        "The game is extremely good.",
        "The game is somewhat good.",
        "The game is barely good."
    ]

    for text in modifiers:
        compound = sia.polarity_scores(text)['compound']
        print(f"{text:35} → {compound:.3f}")

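    Output:

    The game is good.                   → 0.440
    The game is very good.              → 0.493
    The game is extremely good.         → 0.493
    The game is somewhat good.          → 0.383
    The game is barely good.            → 0.383

    Note that VADER applies the same fixed boost to "very" and "extremely", and the same fixed reduction to "somewhat" and "barely".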
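    Degree modifiers work the same way as emphasis: a fixed amount is added to or subtracted from the following word's valence. The sketch below uses a tiny, hypothetical subset of VADER's booster dictionary. The name boosted_compound is ours, and the value 0.293 is an assumption based on the VADER reference implementation.

```python
import math

B_INCR = 0.293   # assumed boost for intensifiers such as "very"
B_DECR = -0.293  # assumed reduction for dampeners such as "somewhat"

# Hypothetical four-word subset of the booster dictionary
BOOSTERS = {
    "very": B_INCR,
    "extremely": B_INCR,
    "somewhat": B_DECR,
    "barely": B_DECR,
}

def boosted_compound(modifier, valence):
    """Score 'modifier + sentiment word', e.g. ("very", 1.9) for "very good"."""
    scalar = BOOSTERS.get(modifier, 0.0)
    if valence < 0:
        scalar = -scalar  # intensifiers push negative words further negative
    adjusted = valence + scalar
    return adjusted / math.sqrt(adjusted ** 2 + 15)

for modifier in ["very", "extremely", "somewhat", "barely"]:
    print(f"{modifier:10} good → {boosted_compound(modifier, 1.9):.3f}")
```

    Since every intensifier shares one constant, this model cannot tell "very" from "extremely". If that distinction matters for your reviews, a custom rule would be needed.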

    6. Building a VADER Sentiment Analyzer Function

    Let's create a complete function using VADER.

    Complete VADER Analyzer

    from nltk.sentiment import SentimentIntensityAnalyzer

    # Create the analyzer once; building it inside the function would reload the lexicon on every call
    sia = SentimentIntensityAnalyzer()

    def analyze_sentiment_vader(text):
        """Analyze sentiment using VADER"""
        scores = sia.polarity_scores(text)

        # Get compound score
        compound = scores['compound']

        # Classify based on compound score
        if compound >= 0.05:
            classification = "Positive"
        elif compound <= -0.05:
            classification = "Negative"
        else:
            classification = "Neutral"

        return {
            'compound': compound,
            'classification': classification,
            'positive': scores['pos'],
            'neutral': scores['neu'],
            'negative': scores['neg']
        }

    # Test the function
    test_reviews = [
        "This game is absolutely AMAZING!!!",
        "Terrible game. Complete waste of money.",
        "The game is okay, nothing special.",
        "Not bad, but not great either."
    ]

    for review in test_reviews:
        result = analyze_sentiment_vader(review)
        print(f"Review: {review}")
        print(f"Result: {result['classification']} (compound: {result['compound']:.3f})")
        print()
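    When you have many reviews, it can help to pass the scoring function in as an argument, so the same loop works with VADER, with the Week 3 analyzer, or with a stand-in during testing. This is one possible design, not part of this week's code. The names analyze_many and fake_scores are hypothetical, and fake_scores only exists so the sketch runs without NLTK installed.

```python
def classify(compound, threshold=0.05):
    """Map a compound score to Positive, Negative, or Neutral."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

def analyze_many(reviews, score_fn):
    """Score each review with score_fn, which must return a dict with a 'compound' key."""
    results = []
    for review in reviews:
        compound = score_fn(review)["compound"]
        results.append({
            "review": review,
            "compound": compound,
            "classification": classify(compound),
        })
    return results

def fake_scores(text):
    """Stand-in scorer for illustration only; it is NOT how VADER scores text."""
    return {"compound": 0.5 if "great" in text.lower() else -0.5}

for row in analyze_many(["Great game", "Crashes constantly"], fake_scores):
    print(row["classification"], "|", row["review"])
```

    With NLTK available, you would call analyze_many(reviews, sia.polarity_scores) instead of passing fake_scores.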

    7. Comparing Basic vs VADER Analyzers

    Let's compare our Week 3 analyzer with VADER on tricky examples.

    Side-by-Side Comparison

    from nltk.sentiment import SentimentIntensityAnalyzer
    from nltk.tokenize import word_tokenize

    # Note: word_tokenize needs NLTK tokenizer data ('punkt', or 'punkt_tab' on newer NLTK releases)

    # Our basic analyzer from Week 3
    def analyze_sentiment_basic(text):
        positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
        negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst']

        tokens = word_tokenize(text.lower())
        pos_count = sum(1 for token in tokens if token in positive_words)
        neg_count = sum(1 for token in tokens if token in negative_words)

        score = pos_count - neg_count
        if score > 0:
            return "Positive"
        elif score < 0:
            return "Negative"
        else:
            return "Neutral"

    # VADER analyzer
    sia = SentimentIntensityAnalyzer()

    def analyze_vader(text):
        compound = sia.polarity_scores(text)['compound']
        if compound >= 0.05:
            return "Positive"
        elif compound <= -0.05:
            return "Negative"
        else:
            return "Neutral"

    # Tricky test cases
    tricky_reviews = [
        "This game is not good.",
        "The game is AMAZING!!!",
        "I really, really love this game!",
        "It's not terrible.",
        "Very bad game."
    ]

    print("Basic vs VADER Comparison")
    print("=" * 60)

    for review in tricky_reviews:
        basic = analyze_sentiment_basic(review)
        vader = analyze_vader(review)
        match = "✓" if basic == vader else "✗"

        print(f"Review: {review}")
        print(f"  Basic: {basic:8} | VADER: {vader:8} {match}")
        print()

    8. Testing VADER on Movie Reviews

    Let's see how accurate VADER is on our movie reviews dataset.

    VADER Accuracy Test

    import nltk
    from nltk.corpus import movie_reviews
    from nltk.sentiment import SentimentIntensityAnalyzer

    # Download the movie reviews corpus (skipped automatically if already present)
    nltk.download('movie_reviews')

    sia = SentimentIntensityAnalyzer()

    # Test on positive reviews
    correct_pos = 0
    total_pos = 100

    for fileid in movie_reviews.fileids('pos')[:total_pos]:
        text = movie_reviews.raw(fileid)
        compound = sia.polarity_scores(text)['compound']
        if compound >= 0.05:
            correct_pos += 1

    # Test on negative reviews
    correct_neg = 0
    total_neg = 100

    for fileid in movie_reviews.fileids('neg')[:total_neg]:
        text = movie_reviews.raw(fileid)
        compound = sia.polarity_scores(text)['compound']
        if compound <= -0.05:
            correct_neg += 1

    # Calculate accuracy (a Neutral prediction counts as wrong for both classes)
    accuracy_pos = (correct_pos / total_pos) * 100
    accuracy_neg = (correct_neg / total_neg) * 100
    overall = (correct_pos + correct_neg) / (total_pos + total_neg) * 100

    print("VADER Accuracy on Movie Reviews:")
    print(f"  Positive reviews: {accuracy_pos:.1f}%")
    print(f"  Negative reviews: {accuracy_neg:.1f}%")
    print(f"  Overall accuracy: {overall:.1f}%")
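    The test above treats a Neutral prediction as a miss, which is easy to overlook when reading the final percentages. A small helper that also reports how often each class was predicted makes this visible. The sketch below is one way to write it: the name evaluate is hypothetical and the four scores are made-up numbers used only to exercise the function.

```python
from collections import Counter

def evaluate(compounds, labels, threshold=0.05):
    """Compare compound scores against 'pos'/'neg' labels.

    Returns (accuracy in percent, Counter of predicted classes).
    A 'neu' prediction never matches a label, so it always counts as wrong.
    """
    if len(compounds) != len(labels) or not labels:
        raise ValueError("compounds and labels must be non-empty and the same length")
    predictions = []
    for compound in compounds:
        if compound >= threshold:
            predictions.append("pos")
        elif compound <= -threshold:
            predictions.append("neg")
        else:
            predictions.append("neu")
    correct = sum(1 for p, label in zip(predictions, labels) if p == label)
    return 100 * correct / len(labels), Counter(predictions)

# Made-up scores for four labelled reviews
accuracy, counts = evaluate([0.9, 0.02, -0.6, 0.3], ["pos", "pos", "neg", "neg"])
print(f"{accuracy:.1f}%", dict(counts))  # 50.0% {'pos': 2, 'neu': 1, 'neg': 1}
```

    If one class dominates the predictions, that usually explains a large gap between the positive and negative accuracy figures.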

    9. Real-World Example: Analyzing Game Reviews

    Let's analyze some realistic game review examples.

    Game Review Analysis

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    # Uses analyze_sentiment_vader() from Section 6
    game_reviews = [
        "This game is absolutely incredible! The graphics are STUNNING and gameplay is super smooth. Highly recommend!!!",
        "Buggy mess. Game crashes every 10 minutes. Not worth the money at all.",
        "It's okay. Graphics are decent but gameplay gets repetitive after a while. Not bad but not great.",
        "I didn't think I would like this game, but I was wrong! It's actually pretty fun.",
        "WORST GAME EVER! Total waste of time and money. Extremely disappointing."
    ]

    print("Game Review Sentiment Analysis")
    print("=" * 70)

    for i, review in enumerate(game_reviews, 1):
        result = analyze_sentiment_vader(review)
        print(f"Review {i}: {review[:60]}...")
        print(f"Sentiment: {result['classification']}")
        print(f"Compound Score: {result['compound']:.3f}")
        print(f"Positive: {result['positive']:.2f} | Neutral: {result['neutral']:.2f} | Negative: {result['negative']:.2f}")
        print()

    Practice Exercise

    Test VADER on your own reviews and compare with the basic analyzer:

    1. Write 3 reviews with tricky language:

      • One with negation ("not bad")

      • One with emphasis ("AMAZING!!!")

      • One with degree modifiers ("very good")

    2. Analyze them with both methods:

      • Basic word counting (Week 3)

      • VADER (Week 4)

    3. Compare the results - which is more accurate?

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    # Your practice reviews

    my_review_1 = "Write your first review here"

    my_review_2 = "Write your second review here"

    my_review_3 = "Write your third review here"

    # Analyze with VADER

    print("Review 1:", analyze_sentiment_vader(my_review_1))

    print("Review 2:", analyze_sentiment_vader(my_review_2))

    print("Review 3:", analyze_sentiment_vader(my_review_3))

    Key Takeaways

    • VADER is much more sophisticated than basic word counting

    • Compound score (-1 to +1) is the main score to use

    • Negations are handled automatically ("not good" → negative)

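    • Emphasis through caps and exclamation marks increases strength; stretched words like "gooood" are not recognized

    • Degree modifiers like "very" and "extremely" adjust intensity

    • VADER's accuracy depends on the dataset: it is tuned for short, informal text, so measure it on your own data (as in Section 8) rather than assuming a fixed figure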

    When to Use VADER vs Basic Analyzer

    Use VADER when:

    • You need better accuracy than simple word counting gives

    • Text has complex language (negations, emphasis)

    • Working with social media or reviews

    • You want a quick, ready-to-use solution

    Use Basic Analyzer when:

    • Learning NLP fundamentals

    • Need full control over word lists

    • Working with very specific domain vocabulary

    • Building custom sentiment rules

    Next Week Preview

    In Week 5, we'll learn:

    • Combining multiple reviews into overall sentiment

    • Visualizing sentiment distributions

    • Finding sentiment trends over time

    • Building a complete sentiment analysis report

    • Preparing for the Chrome extension integration

    You now have a powerful sentiment analysis tool! Next, we'll learn how to use it at scale.
