Week 3: Introduction to Sentiment Analysis

Week 3 Goals

Understand sentiment scoring
Build a sentiment word dictionary
Calculate basic sentiment scores

Code for this week: https://drive.google.com/file/d/1b8azmiBPdYNMpUQVgzgRtENVDfuUG7yO/view?usp=sharing

Introduction

Welcome to Week 3! Now that we can preprocess text and analyze word frequencies, we're ready to dive into sentiment analysis.

Sentiment analysis answers the question: Is this review positive, negative, or neutral?

This week, we'll build a simple but effective sentiment analyzer from scratch. You'll understand exactly how sentiment scoring works before we use more advanced tools.

What is Sentiment Analysis?

Sentiment analysis (also called opinion mining) is the process of determining whether text expresses a positive, negative, or neutral opinion.

Real-World Examples:

Product reviews: "This phone is amazing!" → Positive
Movie reviews: "Terrible acting and boring plot." → Negative
Social media: "The weather is okay today." → Neutral

How It Works (Basic Approach):

Create lists of positive and negative words
Count how many positive/negative words appear in the text
Calculate a sentiment score based on the counts
Classify the text as positive, negative, or neutral

1. Building a Sentiment Word Dictionary

The foundation of sentiment analysis is having lists of words that express positive or negative opinions.

Creating Basic Word Lists

# Positive words
positive_words = [
    'good', 'great', 'excellent', 'amazing', 'wonderful',
    'fantastic', 'awesome', 'love', 'best', 'perfect',
    'beautiful', 'brilliant', 'outstanding', 'superb', 'enjoyable'
]

# Negative words
negative_words = [
    'bad', 'terrible', 'awful', 'horrible', 'worst',
    'hate', 'disappointing', 'poor', 'waste', 'boring',
    'annoying', 'frustrating', 'ugly', 'useless', 'pathetic'
]

print("Positive words:", len(positive_words))
print("Negative words:", len(negative_words))

Why these words?

They clearly express positive or negative opinions
They're common in reviews
Their meaning is usually unambiguous

2. Counting Sentiment Words

Now let's count how many positive and negative words appear in a review.

Simple Sentiment Counter

from nltk.tokenize import word_tokenize

# Tokenizer data: run nltk.download('punkt') once
# (recent NLTK releases need nltk.download('punkt_tab') instead)

positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst']

review = "This game is great! I love the graphics. The story is amazing."

# Tokenize and lowercase
tokens = word_tokenize(review.lower())

# Count positive and negative words
positive_count = 0
negative_count = 0

for token in tokens:
    if token in positive_words:
        positive_count += 1
    if token in negative_words:
        negative_count += 1

print("Review:", review)
print("Positive words found:", positive_count)
print("Negative words found:", negative_count)

Output:

Review: This game is great! I love the graphics. The story is amazing.
Positive words found: 3
Negative words found: 0
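Why tokenize and lowercase before counting? A naive split on whitespace keeps capital letters and punctuation attached to words, so they no longer match the word list. The sketch below uses a variant of the review with "GREAT" capitalized. To avoid the NLTK download, it assumes a simple regular expression as a stand-in for `word_tokenize`; the two do not split text identically (for example, `word_tokenize` splits "it's" into "it" and "'s", while this regex keeps it whole).

```python
import re

positive_words = {'good', 'great', 'excellent', 'amazing', 'love'}

review = "This game is GREAT! I love the graphics. The story is amazing."

def count_matches(tokens, words):
    """Count how many tokens appear in the given word set."""
    return sum(1 for token in tokens if token in words)

# Naive approach: split on whitespace, keeping case and punctuation
naive_tokens = review.split()
# 'GREAT!' and 'amazing.' do not match the lowercase list entries
naive_count = count_matches(naive_tokens, positive_words)

# Lowercase first, then keep runs of letters (regex stand-in for word_tokenize)
clean_tokens = re.findall(r"[a-z']+", review.lower())
clean_count = count_matches(clean_tokens, positive_words)

print("Naive count:", naive_count)  # 1 (only 'love' matches)
print("Clean count:", clean_count)  # 3 ('great', 'love', 'amazing')
```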

3. Calculating Sentiment Scores

We can calculate a simple sentiment score by subtracting negative counts from positive counts.

Basic Sentiment Score

positive_count = 3
negative_count = 0

# Calculate sentiment score
sentiment_score = positive_count - negative_count
print("Sentiment score:", sentiment_score)

# Classify the sentiment
if sentiment_score > 0:
    classification = "Positive"
elif sentiment_score < 0:
    classification = "Negative"
else:
    classification = "Neutral"

print("Classification:", classification)

Output:

Sentiment score: 3
Classification: Positive

Understanding the Score:

Positive number = More positive words than negative → Positive sentiment
Negative number = More negative words than positive → Negative sentiment
Zero = Equal positive and negative (or none) → Neutral sentiment
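A score of zero can mean two different things: the review contains no sentiment words at all, or its positive and negative words cancel out. In the sketch below, `classify` is a hypothetical helper that repeats the if/elif/else rules from the code above, applied to both cases.

```python
def classify(score):
    """Map a numeric sentiment score to a label (same rules as above)."""
    if score > 0:
        return "Positive"
    elif score < 0:
        return "Negative"
    return "Neutral"

# "I love the graphics but the story is terrible": 1 positive, 1 negative
mixed_score = 1 - 1
# "The game is okay.": no words from either list
empty_score = 0 - 0

print(mixed_score, classify(mixed_score))  # 0 Neutral
print(empty_score, classify(empty_score))  # 0 Neutral
print(classify(3), classify(-2))           # Positive Negative
```

Both reviews get the same label. To tell them apart, look at the individual positive and negative counts rather than the score alone.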

4. Building a Complete Sentiment Analyzer Function

Let's combine everything into one reusable function.

Complete Sentiment Analysis Function

from nltk.tokenize import word_tokenize

def analyze_sentiment(text):
    """Analyze sentiment of text using word counting"""
    # Define sentiment word lists
    positive_words = [
        'good', 'great', 'excellent', 'amazing', 'wonderful',
        'fantastic', 'awesome', 'love', 'best', 'perfect',
        'beautiful', 'brilliant', 'outstanding', 'superb', 'enjoyable'
    ]
    negative_words = [
        'bad', 'terrible', 'awful', 'horrible', 'worst',
        'hate', 'disappointing', 'poor', 'waste', 'boring',
        'annoying', 'frustrating', 'ugly', 'useless', 'pathetic'
    ]

    # Tokenize and lowercase
    tokens = word_tokenize(text.lower())

    # Count sentiment words
    positive_count = sum(1 for token in tokens if token in positive_words)
    negative_count = sum(1 for token in tokens if token in negative_words)

    # Calculate score
    sentiment_score = positive_count - negative_count

    # Classify
    if sentiment_score > 0:
        classification = "Positive"
    elif sentiment_score < 0:
        classification = "Negative"
    else:
        classification = "Neutral"

    return {
        'score': sentiment_score,
        'classification': classification,
        'positive_words': positive_count,
        'negative_words': negative_count
    }

# Test the function
review1 = "This game is amazing! I love it."
review2 = "Terrible game. Waste of money."
review3 = "The game is okay."

print("Review 1:", review1)
print("Analysis:", analyze_sentiment(review1))
print()
print("Review 2:", review2)
print("Analysis:", analyze_sentiment(review2))
print()
print("Review 3:", review3)
print("Analysis:", analyze_sentiment(review3))

Output:

Review 1: This game is amazing! I love it.
Analysis: {'score': 2, 'classification': 'Positive', 'positive_words': 2, 'negative_words': 0}

Review 2: Terrible game. Waste of money.
Analysis: {'score': -2, 'classification': 'Negative', 'positive_words': 0, 'negative_words': 2}

Review 3: The game is okay.
Analysis: {'score': 0, 'classification': 'Neutral', 'positive_words': 0, 'negative_words': 0}

5. Testing on Movie Reviews Dataset

Let's test our sentiment analyzer on real movie reviews from NLTK.

Analyzing Real Reviews

import random
from nltk.corpus import movie_reviews

# Corpus data: run nltk.download('movie_reviews') once.
# Uses analyze_sentiment() from section 4.

# Get a random positive review
pos_fileid = random.choice(movie_reviews.fileids('pos'))
pos_text = movie_reviews.raw(pos_fileid)

# Get a random negative review
neg_fileid = random.choice(movie_reviews.fileids('neg'))
neg_text = movie_reviews.raw(neg_fileid)

# Analyze both
print("=== POSITIVE REVIEW ===")
print("First 200 characters:", pos_text[:200])
print("Our analysis:", analyze_sentiment(pos_text))
print("Actual label: Positive")
print()
print("=== NEGATIVE REVIEW ===")
print("First 200 characters:", neg_text[:200])
print("Our analysis:", analyze_sentiment(neg_text))
print("Actual label: Negative")

Checking Accuracy

from nltk.corpus import movie_reviews

# Test on first 100 positive reviews
correct = 0
total = 0

for fileid in movie_reviews.fileids('pos')[:100]:
    text = movie_reviews.raw(fileid)
    result = analyze_sentiment(text)
    if result['classification'] == 'Positive':
        correct += 1
    total += 1

accuracy_positive = (correct / total) * 100
print(f"Accuracy on positive reviews: {accuracy_positive:.1f}%")

# Test on first 100 negative reviews
correct = 0
total = 0

for fileid in movie_reviews.fileids('neg')[:100]:
    text = movie_reviews.raw(fileid)
    result = analyze_sentiment(text)
    if result['classification'] == 'Negative':
        correct += 1
    total += 1

accuracy_negative = (correct / total) * 100
print(f"Accuracy on negative reviews: {accuracy_negative:.1f}%")

overall_accuracy = (accuracy_positive + accuracy_negative) / 2
print(f"Overall accuracy: {overall_accuracy:.1f}%")
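The overall figure above averages the two per-class accuracies. Because both halves use exactly 100 reviews, this equals the plain share of all 200 reviews classified correctly; with unequal class sizes the two numbers would differ. Note also that a Neutral prediction counts as wrong for either class. The sketch below runs the same calculation on a small, made-up list of (predicted, actual) pairs; `per_class_accuracy` is a hypothetical helper, not part of NLTK.

```python
from collections import defaultdict

def per_class_accuracy(pairs):
    """Return {label: accuracy %} from (predicted, actual) label pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, actual in pairs:
        total[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    return {label: 100 * correct[label] / total[label] for label in total}

# Made-up predictions for illustration only (not real results)
pairs = [
    ("Positive", "Positive"), ("Positive", "Positive"),
    ("Neutral", "Positive"), ("Negative", "Positive"),   # 2 of 4 correct
    ("Negative", "Negative"), ("Positive", "Negative"),
    ("Positive", "Negative"), ("Neutral", "Negative"),   # 1 of 4 correct
]

acc = per_class_accuracy(pairs)
balanced = sum(acc.values()) / len(acc)                     # mean of per-class scores
plain = 100 * sum(p == a for p, a in pairs) / len(pairs)    # share of all pairs correct

print(acc)       # {'Positive': 50.0, 'Negative': 25.0}
print(balanced)  # 37.5
print(plain)     # 37.5 (same, because both classes have 4 examples)
```

The distinction matters on unbalanced data: with 10 positive and 90 negative reviews, a classifier that always answers Negative gets 90% plain accuracy but only 50% when the two per-class accuracies are averaged.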

6. Improving the Sentiment Dictionary

Our basic word list might miss some words. Let's expand it with gaming-specific terms.

Adding Domain-Specific Words

# Gaming-specific positive words
gaming_positive = [
    'addictive', 'immersive', 'engaging', 'polished', 'smooth',
    'fun', 'exciting', 'thrilling', 'impressive', 'stunning'
]

# Gaming-specific negative words
gaming_negative = [
    'buggy', 'glitchy', 'broken', 'laggy', 'repetitive',
    'clunky', 'unfinished', 'unplayable', 'crashes', 'boring'
]

# Combine with original lists.
# 'boring' appears in both negative lists, so use a set to drop duplicates.
all_positive = set(positive_words + gaming_positive)
all_negative = set(negative_words + gaming_negative)

print("Total positive words:", len(all_positive))
print("Total negative words:", len(all_negative))

Why add domain-specific words?

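Different domains use different vocabulary
Game reviews often mention "buggy" and "laggy"; movie reviews rarely do
More relevant words means more reviews contain at least one sentiment word, which can improve accuracy (re-run the accuracy check to confirm)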
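The `analyze_sentiment` function from section 4 defines its word lists internally, so it does not use `all_positive` and `all_negative` as written. One way to plug expanded lists in is to pass them as arguments. The sketch below does this; the function name `analyze_sentiment_with`, the shortened word lists, and the example review are assumptions for illustration. As before, a regular expression stands in for NLTK's `word_tokenize` so the example runs without downloads.

```python
import re

def analyze_sentiment_with(text, positive_words, negative_words):
    """Word-counting sentiment analysis with caller-supplied word lists."""
    positive = set(positive_words)  # sets give fast membership tests
    negative = set(negative_words)
    # Regex stand-in for word_tokenize: lowercase, keep runs of letters/apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_count = sum(1 for token in tokens if token in positive)
    negative_count = sum(1 for token in tokens if token in negative)
    score = positive_count - negative_count
    if score > 0:
        classification = "Positive"
    elif score < 0:
        classification = "Negative"
    else:
        classification = "Neutral"
    return {
        'score': score,
        'classification': classification,
        'positive_words': positive_count,
        'negative_words': negative_count,
    }

# Shortened, hypothetical word lists for illustration
base_positive = ['good', 'great', 'love']
base_negative = ['bad', 'terrible', 'boring']
gaming_positive = ['smooth', 'stunning', 'immersive']
gaming_negative = ['buggy', 'laggy', 'crashes']

review = "Buggy and laggy, but the art is stunning."
# Neutral: no base-list words appear in the review
print(analyze_sentiment_with(review, base_positive, base_negative))
# Negative: 'buggy' and 'laggy' outweigh 'stunning'
print(analyze_sentiment_with(review, base_positive + gaming_positive,
                             base_negative + gaming_negative))
```

Because the function converts its inputs to sets, it accepts lists or sets, including the combined `all_positive` and `all_negative` collections.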

Practice Exercise

Add 5 more positive words and 5 more negative words to the sentiment lists
Test your expanded analyzer on these reviews:
- "The gameplay is smooth and the graphics are stunning!"
- "Buggy mess. The game crashes constantly."
- "It's okay, nothing special."
Calculate the sentiment score for each

# Your expanded word lists
my_positive_words = ['good', 'great', 'excellent']  # Add 5 more
my_negative_words = ['bad', 'terrible', 'awful']  # Add 5 more

# Test reviews
test1 = "The gameplay is smooth and the graphics are stunning!"
test2 = "Buggy mess. The game crashes constantly."
test3 = "It's okay, nothing special."

# Your task: Analyze each review with your expanded word lists
# (analyze_sentiment() uses its own built-in lists, so you will need to
# modify it or write a version that accepts word lists as arguments)

Key Takeaways

Sentiment analysis classifies text as positive, negative, or neutral
Word counting is a simple but effective approach
Sentiment dictionaries contain lists of positive and negative words
Domain-specific words improve accuracy for specific topics
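Even simple methods can achieve reasonable accuracy (often 50-70%), but on a balanced positive/negative test set random guessing already scores 50%, so compare your result against that baseline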

Limitations of This Approach

Our basic sentiment analyzer has some limitations:

Doesn't understand context: "not good" is counted as positive
Ignores word importance: "amazing" and "good" count the same
Misses sarcasm: "Oh great, another bug" seems positive
Limited vocabulary: Only knows words in our lists
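The first three limitations are easy to reproduce. The sketch below reuses the same positive-minus-negative scoring, with short illustrative word lists and a regex stand-in for `word_tokenize` (both assumptions for the example), on sentences whose intended meaning the counter gets wrong.

```python
import re

# Short, illustrative word lists
positive_words = {'good', 'great', 'love', 'amazing'}
negative_words = {'bad', 'terrible', 'hate', 'boring'}

def score(text):
    """Positive count minus negative count, as in the analyzer above."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return (sum(t in positive_words for t in tokens)
            - sum(t in negative_words for t in tokens))

# Both sentences are meant negatively, yet score as positive
print(score("This game is not good."))  # 1: 'not' is ignored, 'good' counts
print(score("Oh great, another bug."))  # 1: sarcasm is invisible to word counts

# Every matched word counts as exactly 1, whatever its strength
print(score("It was good.") == score("It was amazing."))  # True
```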

Next week, we'll address several of these issues with VADER, a more sophisticated rule-based tool that handles negations, emphasis, and some contextual cues!

Next Week Preview

In Week 4, we'll learn:

VADER sentiment analysis (advanced tool)
Understanding compound scores
Handling negations ("not good")
Dealing with emphasis ("AMAZING!!!")
Comparing our basic analyzer with VADER

You've built your first sentiment analyzer from scratch! Now you understand the fundamentals.

Steam Reviewer Analyzer Week 3
