Steam Reviewer Analyzer Week 3

    Week 3 content of Steam Reviewer Analyzer

    By AI Club on 10/6/2025

    Week 3: Introduction to Sentiment Analysis

    Week 3 Goals

    • Understand sentiment scoring

    • Build a sentiment word dictionary

    • Calculate basic sentiment scores

    Code for this week: https://drive.google.com/file/d/1b8azmiBPdYNMpUQVgzgRtENVDfuUG7yO/view?usp=sharing

    Introduction

    Welcome to Week 3! Now that we can preprocess text and analyze word frequencies, we're ready to dive into sentiment analysis.

    Sentiment analysis answers the question: Is this review positive, negative, or neutral?

    This week, we'll build a simple but effective sentiment analyzer from scratch. You'll understand exactly how sentiment scoring works before we use more advanced tools.

    What is Sentiment Analysis?

    Sentiment analysis (also called opinion mining) is the process of determining whether text expresses a positive, negative, or neutral opinion.

    Real-World Examples:

    • Product reviews: "This phone is amazing!" → Positive

    • Movie reviews: "Terrible acting and boring plot." → Negative

    • Social media: "The weather is okay today." → Neutral

    How It Works (Basic Approach):

    1. Create lists of positive and negative words

    2. Count how many positive/negative words appear in the text

    3. Calculate a sentiment score based on the counts

    4. Classify the text as positive, negative, or neutral

    1. Building a Sentiment Word Dictionary

    The foundation of this word-counting approach is a pair of lists: words that express positive opinions and words that express negative ones.

    Creating Basic Word Lists

    # Positive words
    positive_words = [
        'good', 'great', 'excellent', 'amazing', 'wonderful',
        'fantastic', 'awesome', 'love', 'best', 'perfect',
        'beautiful', 'brilliant', 'outstanding', 'superb', 'enjoyable'
    ]

    # Negative words
    negative_words = [
        'bad', 'terrible', 'awful', 'horrible', 'worst',
        'hate', 'disappointing', 'poor', 'waste', 'boring',
        'annoying', 'frustrating', 'ugly', 'useless', 'pathetic'
    ]

    print("Positive words:", len(positive_words))
    print("Negative words:", len(negative_words))

    Why these words?

    • They clearly express positive or negative opinions

    • They're common in reviews

    • They're unambiguous in meaning
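
One way to sanity-check word lists like these is to confirm that no word sits on both sides and that nothing is listed twice. The sketch below is a minimal illustration rather than part of this week's code: `check_word_lists` is a hypothetical helper name, and the lists are shortened copies of the ones above.

```python
def check_word_lists(positive, negative):
    """Return (overlap, duplicates) for two sentiment word lists."""
    # A word in both lists would add +1 and -1 to the same review
    overlap = set(positive) & set(negative)
    # A word repeated within one list inflates len() without adding coverage
    duplicates = {w for words in (positive, negative)
                  for w in words if words.count(w) > 1}
    return overlap, duplicates


positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst']

overlap, duplicates = check_word_lists(positive_words, negative_words)
print("Words in both lists:", sorted(overlap))
print("Repeated words:", sorted(duplicates))
```

Both results are empty for these lists, which is what you want before moving on to counting.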

    2. Counting Sentiment Words

    Now let's count how many positive and negative words appear in a review.

    Simple Sentiment Counter

    from nltk.tokenize import word_tokenize

    positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
    negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst']

    review = "This game is great! I love the graphics. The story is amazing."

    # Tokenize and lowercase
    tokens = word_tokenize(review.lower())

    # Count positive and negative words
    positive_count = 0
    negative_count = 0

    for token in tokens:
        if token in positive_words:
            positive_count += 1
        if token in negative_words:
            negative_count += 1

    print("Review:", review)
    print("Positive words found:", positive_count)
    print("Negative words found:", negative_count)

    Output:

    Review: This game is great! I love the graphics. The story is amazing.

    Positive words found: 3

    Negative words found: 0
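
When a count looks surprising, it helps to see which words were actually matched. The following sketch records the matches with `collections.Counter`. To stay self-contained it swaps `word_tokenize` for a simple regular-expression tokenizer, which is an assumption made for illustration and may split some text differently from NLTK; `find_sentiment_words` is a hypothetical helper name.

```python
import re
from collections import Counter

positive_words = {'good', 'great', 'excellent', 'amazing', 'love'}
negative_words = {'bad', 'terrible', 'awful', 'hate', 'worst'}


def find_sentiment_words(text):
    """Return Counters of the positive and negative words found in text."""
    # Stand-in tokenizer: lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = Counter(t for t in tokens if t in positive_words)
    negatives = Counter(t for t in tokens if t in negative_words)
    return positives, negatives


review = "This game is great! I love the graphics. The story is amazing."
positives, negatives = find_sentiment_words(review)
print("Positive matches:", dict(positives))
print("Negative matches:", dict(negatives))
```

The totals of each Counter are the same counts as before; the keys show where they came from.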

    3. Calculating Sentiment Scores

    We can calculate a simple sentiment score by subtracting negative counts from positive counts.

    Basic Sentiment Score

    positive_count = 3
    negative_count = 0

    # Calculate sentiment score
    sentiment_score = positive_count - negative_count
    print("Sentiment score:", sentiment_score)

    # Classify the sentiment
    if sentiment_score > 0:
        classification = "Positive"
    elif sentiment_score < 0:
        classification = "Negative"
    else:
        classification = "Neutral"

    print("Classification:", classification)

    Output:

    Sentiment score: 3

    Classification: Positive

    Understanding the Score:

    • Positive number = More positive words than negative → Positive sentiment

    • Negative number = More negative words than positive → Negative sentiment

    • Zero = Equal positive and negative (or none) → Neutral sentiment
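
To make the three cases concrete, here is a small worked sketch that applies the same subtraction to a few hand-picked counts. `classify_score` is a hypothetical helper name, and the counts are made-up examples rather than results from real reviews.

```python
def classify_score(positive_count, negative_count):
    """Return (score, label) using the positive-minus-negative rule."""
    score = positive_count - negative_count
    if score > 0:
        label = "Positive"
    elif score < 0:
        label = "Negative"
    else:
        label = "Neutral"
    return score, label


# (positive_count, negative_count) pairs chosen to hit each case
examples = [(3, 0), (1, 2), (2, 2), (0, 0)]
for pos, neg in examples:
    score, label = classify_score(pos, neg)
    print(f"{pos} positive, {neg} negative -> score {score:+d} -> {label}")
```

Note that (2, 2) and (0, 0) both come out Neutral, so a score of zero cannot tell a mixed review apart from one with no sentiment words at all.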

    4. Building a Complete Sentiment Analyzer Function

    Let's combine everything into one reusable function.

    Complete Sentiment Analysis Function

    from nltk.tokenize import word_tokenize

    def analyze_sentiment(text):
        """Analyze sentiment of text using word counting"""
        # Define sentiment word lists
        positive_words = [
            'good', 'great', 'excellent', 'amazing', 'wonderful',
            'fantastic', 'awesome', 'love', 'best', 'perfect',
            'beautiful', 'brilliant', 'outstanding', 'superb', 'enjoyable'
        ]
        negative_words = [
            'bad', 'terrible', 'awful', 'horrible', 'worst',
            'hate', 'disappointing', 'poor', 'waste', 'boring',
            'annoying', 'frustrating', 'ugly', 'useless', 'pathetic'
        ]

        # Tokenize and lowercase
        tokens = word_tokenize(text.lower())

        # Count sentiment words
        positive_count = sum(1 for token in tokens if token in positive_words)
        negative_count = sum(1 for token in tokens if token in negative_words)

        # Calculate score
        sentiment_score = positive_count - negative_count

        # Classify
        if sentiment_score > 0:
            classification = "Positive"
        elif sentiment_score < 0:
            classification = "Negative"
        else:
            classification = "Neutral"

        return {
            'score': sentiment_score,
            'classification': classification,
            'positive_words': positive_count,
            'negative_words': negative_count
        }

    # Test the function
    review1 = "This game is amazing! I love it."
    review2 = "Terrible game. Waste of money."
    review3 = "The game is okay."

    print("Review 1:", review1)
    print("Analysis:", analyze_sentiment(review1))
    print()
    print("Review 2:", review2)
    print("Analysis:", analyze_sentiment(review2))
    print()
    print("Review 3:", review3)
    print("Analysis:", analyze_sentiment(review3))

    Output:

    Review 1: This game is amazing! I love it.

    Analysis: {'score': 2, 'classification': 'Positive', 'positive_words': 2, 'negative_words': 0}

    Review 2: Terrible game. Waste of money.

    Analysis: {'score': -2, 'classification': 'Negative', 'positive_words': 0, 'negative_words': 2}

    Review 3: The game is okay.

    Analysis: {'score': 0, 'classification': 'Neutral', 'positive_words': 0, 'negative_words': 0}

    5. Testing on Movie Reviews Dataset

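    Let's test our sentiment analyzer on real movie reviews from NLTK's movie_reviews corpus. If the corpus is not installed yet, run nltk.download('movie_reviews') once first.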

    Analyzing Real Reviews

    from nltk.corpus import movie_reviews
    import random

    # Get a random positive review
    pos_fileid = random.choice(movie_reviews.fileids('pos'))
    pos_text = movie_reviews.raw(pos_fileid)

    # Get a random negative review
    neg_fileid = random.choice(movie_reviews.fileids('neg'))
    neg_text = movie_reviews.raw(neg_fileid)

    # Analyze both
    print("=== POSITIVE REVIEW ===")
    print("First 200 characters:", pos_text[:200])
    print("Our analysis:", analyze_sentiment(pos_text))
    print("Actual label: Positive")
    print()
    print("=== NEGATIVE REVIEW ===")
    print("First 200 characters:", neg_text[:200])
    print("Our analysis:", analyze_sentiment(neg_text))
    print("Actual label: Negative")

    Checking Accuracy

    from nltk.corpus import movie_reviews

    # Test on first 100 positive reviews
    correct = 0
    total = 0

    for fileid in movie_reviews.fileids('pos')[:100]:
        text = movie_reviews.raw(fileid)
        result = analyze_sentiment(text)
        if result['classification'] == 'Positive':
            correct += 1
        total += 1

    accuracy_positive = (correct / total) * 100
    print(f"Accuracy on positive reviews: {accuracy_positive:.1f}%")

    # Test on first 100 negative reviews
    correct = 0
    total = 0

    for fileid in movie_reviews.fileids('neg')[:100]:
        text = movie_reviews.raw(fileid)
        result = analyze_sentiment(text)
        if result['classification'] == 'Negative':
            correct += 1
        total += 1

    accuracy_negative = (correct / total) * 100
    print(f"Accuracy on negative reviews: {accuracy_negative:.1f}%")

    overall_accuracy = (accuracy_positive + accuracy_negative) / 2
    print(f"Overall accuracy: {overall_accuracy:.1f}%")
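
Because the analyzer can also answer "Neutral", a review counts as wrong whenever its positive and negative words cancel out or none are found. A breakdown of the predictions makes that visible. The sketch below is one possible way to do it: `evaluate` and `toy_analyzer` are hypothetical names, and the four labeled reviews are made up for illustration. In practice you would pass in `analyze_sentiment` and reviews from the corpus.

```python
from collections import Counter


def evaluate(analyzer, labeled_reviews):
    """Return (accuracy_percent, Counter of predicted labels)."""
    predictions = Counter()
    correct = 0
    for text, true_label in labeled_reviews:
        predicted = analyzer(text)['classification']
        predictions[predicted] += 1
        if predicted == true_label:
            correct += 1
    accuracy = 100 * correct / len(labeled_reviews)
    return accuracy, predictions


def toy_analyzer(text):
    """Stand-in for analyze_sentiment with a two-word vocabulary."""
    words = text.lower().split()
    score = words.count('great') - words.count('awful')
    label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    return {'classification': label}


# Made-up examples: (review text, true label)
labeled = [
    ("a great film", "Positive"),
    ("an awful film", "Negative"),
    ("great cast but awful script", "Negative"),
    ("i liked it", "Positive"),
]
accuracy, predictions = evaluate(toy_analyzer, labeled)
print(f"Accuracy: {accuracy:.1f}%")
print("Predictions:", dict(predictions))
```

Here both misses are Neutral predictions: one from words cancelling out, one from a review with no known words.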

    6. Improving the Sentiment Dictionary

    Our basic word list might miss some words. Let's expand it with gaming-specific terms.

    Adding Domain-Specific Words

    # Gaming-specific positive words
    gaming_positive = [
        'addictive', 'immersive', 'engaging', 'polished', 'smooth',
        'fun', 'exciting', 'thrilling', 'impressive', 'stunning'
    ]

    # Gaming-specific negative words
    gaming_negative = [
        'buggy', 'glitchy', 'broken', 'laggy', 'repetitive',
        'clunky', 'unfinished', 'unplayable', 'crashes', 'boring'
    ]

    # Combine with original lists
    all_positive = positive_words + gaming_positive
    all_negative = negative_words + gaming_negative

    print("Total positive words:", len(all_positive))
    print("Total negative words:", len(all_negative))

    Why add domain-specific words?

    • Different domains use different vocabulary

    • Game reviews mention "buggy" and "laggy"; movie reviews rarely do

    • More relevant words = better accuracy
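
Combining lists with `+` keeps any word that appears in both, which inflates the length of the combined list without changing the scores. One option is to merge the lists as sets and pass the result into the analyzer. This is a sketch under assumptions: `analyze_with_lists` is a hypothetical variant of `analyze_sentiment`, the word lists are shortened, and a regular expression stands in for `word_tokenize`.

```python
import re

positive_words = ['good', 'great', 'love', 'amazing']
negative_words = ['bad', 'terrible', 'boring', 'waste']
gaming_positive = ['smooth', 'stunning', 'immersive', 'fun']
gaming_negative = ['buggy', 'laggy', 'crashes', 'boring']

# Set union drops the repeated 'boring' and makes lookups fast
all_positive = set(positive_words) | set(gaming_positive)
all_negative = set(negative_words) | set(gaming_negative)


def analyze_with_lists(text, positive, negative):
    """Score text against caller-supplied positive and negative word sets."""
    tokens = re.findall(r"[a-z']+", text.lower())  # stand-in tokenizer
    return (sum(t in positive for t in tokens)
            - sum(t in negative for t in tokens))


review = "Smooth combat and stunning art, but the late game is laggy."
general = analyze_with_lists(review, set(positive_words), set(negative_words))
expanded = analyze_with_lists(review, all_positive, all_negative)
print("General lists only:", general)
print("With gaming words:", expanded)
print("Negative vocabulary size:", len(all_negative))
```

With only the general lists this review scores 0, because none of its opinion words are known; the gaming vocabulary picks up two positives and one negative.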

    Practice Exercise

    1. Add 5 more positive words and 5 more negative words to the sentiment lists

    2. Test your expanded analyzer on these reviews:

      • "The gameplay is smooth and the graphics are stunning!"

      • "Buggy mess. The game crashes constantly."

      • "It's okay, nothing special."

    3. Calculate the sentiment score for each

    # Your expanded word lists
    my_positive_words = ['good', 'great', 'excellent']  # Add 5 more
    my_negative_words = ['bad', 'terrible', 'awful']  # Add 5 more

    # Test reviews
    test1 = "The gameplay is smooth and the graphics are stunning!"
    test2 = "Buggy mess. The game crashes constantly."
    test3 = "It's okay, nothing special."

    # Your task: Analyze each review with your expanded word lists

    Key Takeaways

    • Sentiment analysis classifies text as positive, negative, or neutral

    • Word counting is a simple but effective approach

    • Sentiment dictionaries contain lists of positive and negative words

    • Domain-specific words improve accuracy for specific topics

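    • Even simple methods can achieve reasonable accuracy (roughly 50-70%); keep in mind that 50% is what random guessing would score on an evenly split positive/negative test set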

    Limitations of This Approach

    Our basic sentiment analyzer has some limitations:

    • Doesn't understand context: "not good" is counted as positive

    • Ignores word importance: "amazing" and "good" count the same

    • Misses sarcasm: "Oh great, another bug" seems positive

    • Limited vocabulary: Only knows words in our lists
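
The context and sarcasm limitations are easy to reproduce. The sketch below uses a pared-down scorer (a hypothetical `simple_score` with a four-word vocabulary and a regular-expression tokenizer standing in for `word_tokenize`) to show that negation and sarcasm are scored the same as sincere praise.

```python
import re

positive_words = {'good', 'great'}
negative_words = {'bad', 'terrible'}


def simple_score(text):
    """Positive-minus-negative word count with no notion of context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return (sum(t in positive_words for t in tokens)
            - sum(t in negative_words for t in tokens))


for text in ["This game is good.",
             "This game is not good.",
             "Oh great, another bug."]:
    print(f"{simple_score(text):+d}  {text}")
```

All three lines score +1, even though only the first is actually positive.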

    Next week, we'll address these issues by using VADER sentiment analysis - a more sophisticated tool that handles negations, emphasis, and context!

    Next Week Preview

    In Week 4, we'll learn:

    • VADER sentiment analysis (advanced tool)

    • Understanding compound scores

    • Handling negations ("not good")

    • Dealing with emphasis ("AMAZING!!!")

    • Comparing our basic analyzer with VADER

    You've built your first sentiment analyzer from scratch! Now you understand the fundamentals.
