
Week 5 content for the Amazon Review Analyzer project
Welcome to Week 5 of the Amazon Review Analyzer project! You've built, trained, and optimized a powerful XGBoost model. Now it's time to make it accessible by building a user interface (UI). This week, we'll use Streamlit to create an interactive application where anyone can input a review and instantly see if your model thinks it's AI-generated or human-written. We chose Streamlit because it provides a simple, Python-based framework for building interactive interfaces to visualize data and interact with machine learning models. Since our project is already implemented in Python, Streamlit allows for seamless integration without requiring additional web development frameworks.
First, Streamlit should already be installed in your venv; if it is not, activate your venv and install it:
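pip install streamlit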
Create a new directory called ‘webapp’ in the root of your project.
Create ‘webapp/utils/constants.py’ and add this category mapping dictionary. This maps user-friendly category names to the ones used in your dataset:
CATEGORY_MAPPING = {
    "Unknown": "unknown",
    "Books": "Books",
    "Clothing, Shoes, and Jewelry": "Clothing_Shoes_and_Jewelry",
    "Electronics": "Electronics",
    "Home and Kitchen": "Home_and_Kitchen",
    "Kindle Store": "Kindle_Store",
    "Movies and TV": "Movies_and_TV",
    "Pet Supplies": "Pet_Supplies",
    "Sports and Outdoors": "Sports_and_Outdoors",
    "Tools and Home Improvement": "Tools_and_Home_Improvement",
    "Toys and Games": "Toys_and_Games",
}
Important: Verify that these category names match the values in the category column your model was trained on! Check your saved feature names (‘feature_names.json’) to confirm. In case you are not able to import the category dict into your main file, create an ‘__init__.py’ file in your utils folder and paste in the following code:
from .constants import CATEGORY_MAPPING
__all__ = ["CATEGORY_MAPPING"]
Create ‘webapp/app.py’ and start with these imports:
import streamlit as st
import pandas as pd
import torch
import sys
from pathlib import Path
import joblib
import string
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
import json
import spacy
from collections import Counter
from utils.constants import CATEGORY_MAPPING # You shouldn’t have to change this unless you placed constants elsewhere
Then, add the src directory to sys.path so we can import our preprocess_text function (we have done this before, so look back at an earlier week if needed).
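One way to do this, assuming ‘app.py’ lives in ‘webapp/’ and your preprocessing code lives in ‘src/’ at the project root (adjust the folder and module names to your own layout):

# Make the project's src/ directory importable (assumes webapp/ and src/ are siblings)
sys.path.append(str(Path(__file__).resolve().parent.parent / "src"))
from preprocessing import preprocess_text  # hypothetical module name; import from whichever file defines preprocess_text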
Streamlit's ‘@st.cache_resource’ decorator ensures models are only loaded once, making your app much faster. Streamlit reruns your entire script every time a user interacts with the page, so it is important that your models are not reloaded on every rerun. Add these functions:
@st.cache_resource
def get_nlp_models():
    # Download the VADER lexicon if it is not already present (only needed if you included sentiment analysis as a feature)
    try:
        nltk.data.find("sentiment/vader_lexicon.zip")
    except LookupError:
        nltk.download("vader_lexicon", quiet=True)
    # Load the spaCy model (disable unused components for speed); this is used to tokenize reviews
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
    analyzer = SentimentIntensityAnalyzer()
    return nlp, analyzer
@st.cache_resource
def get_xgb_model():
    # 1. Create a Path object to your saved model (i.e. model_name.pkl)
    # 2. Load the model using joblib
    return {"best_model": best_model}
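If you get stuck, here is one possible shape for this function, assuming your Week 4 model was saved with joblib as ‘best_model.pkl’ under ‘scripts/xgb_model/’ (your filename and path may differ):

@st.cache_resource
def get_xgb_model():
    # Assumed location of the saved model; point this at your own .pkl file
    model_path = Path(__file__).resolve().parent.parent / "scripts" / "xgb_model" / "best_model.pkl"
    if not model_path.exists():
        return None  # main() checks for None and shows an error message
    best_model = joblib.load(model_path)
    return {"best_model": best_model}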
Now we need to recreate the same feature extraction process you used during training. This is critical because your features must match exactly!
2.3.1 POS (Part-of-Speech) Features
Based on your EDA, you should have identified which POS tags are important. First, create a constant called “POS_WHITELIST”. This should be a set of the POS tags you used as features.
Now, follow these steps to make functions to extract POS features:
Use the same ‘pos_counts’ function from our feature extraction script but add nlp as a parameter in order to tokenize the review
Do the same with ‘add_pos_features’, but again add nlp as a parameter
Note: You could probably import these functions from our src script, but it is easier to copy them in order to separate model logic from our web app directory. These are the kinds of choices that software devs have to make every day!
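In case it helps, here is a rough sketch of what the two helpers could look like, assuming ‘pos_counts’ tallies whitelisted POS tags in a review and ‘add_pos_features’ adds one count column per tag (mirror your own implementations from the feature extraction week rather than copying this verbatim):

POS_WHITELIST = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed tags; use the ones from your EDA

def pos_counts(text, nlp):
    # Tokenize with spaCy and count only the whitelisted POS tags
    doc = nlp(text)
    return Counter(token.pos_ for token in doc if token.pos_ in POS_WHITELIST)

def add_pos_features(df, nlp):
    # Add one column per whitelisted POS tag, defaulting to 0 when a tag does not appear
    for tag in POS_WHITELIST:
        df[tag] = df["cleaned_text"].apply(lambda text: pos_counts(text, nlp).get(tag, 0))
    return df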
2.3.2 Main Feature Extraction Function
This function should extract all the features your model was trained on and should look similar to the function we wrote in a previous week for feature extraction:
def extract_features(text, rating=5.0, include_pos=False):
    cleaned_text = preprocess_text(text)
    nlp, analyzer = get_nlp_models()
    df = pd.DataFrame(
        [
            {
                "rating": rating,
                "char_length": len(cleaned_text),
                "word_count": len(cleaned_text.split()),
                "punctuation_ct": sum(
                    1 for c in cleaned_text if c in string.punctuation
                ),
                "is_extreme_star": rating in [1.0, 5.0],
                "sentiment_score": analyzer.polarity_scores(cleaned_text)["compound"],
            }
        ]
    )
    # Keep the cleaned text around so the POS functions can tokenize it
    df["cleaned_text"] = cleaned_text
    if include_pos:
        df = add_pos_features(df, nlp)
    return df
Your model expects features in a specific order with specific column names. This function ensures everything is aligned:
def prepare_features_for_prediction(text, category="unknown", rating=5.0):
    # TODO: Decide if you used POS features in your final model
    include_pos = True
    # TODO: Call extract_features in order to create a df

    # Load the feature names your model expects
    # TODO: Update the path to your saved feature_names.json from Week 4
    with open("../scripts/xgb_model/feature_names.json", "r") as f:
        feature_data = json.load(f)

    # TODO: Initialize all category value columns to 0
    #       (the last 10 features in feature_data are likely your category columns)
    # TODO: Set the appropriate category column to 1 based on the input category
    # TODO: Ensure all expected features are present in the dataframe: loop through each feature in feature_data and check if it is in df.columns
    #       If a feature is missing, add it with a value of 0.0
    # TODO: Return the features in the exact order the model expects
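For reference, here is one way the finished function could come together, assuming ‘feature_names.json’ holds a flat list of column names and your one-hot category columns are named like "category_Books" (check your own file and adjust):

def prepare_features_for_prediction(text, category="unknown", rating=5.0):
    include_pos = True  # set to False if your final model did not use POS features
    df = extract_features(text, rating=rating, include_pos=include_pos)

    # Load the list of feature names the model was trained on (assumed path)
    with open("../scripts/xgb_model/feature_names.json", "r") as f:
        feature_data = json.load(f)

    # Assumed naming convention for the one-hot category columns
    category_cols = [name for name in feature_data if name.startswith("category_")]
    for col in category_cols:
        df[col] = 0
    if f"category_{category}" in category_cols:
        df[f"category_{category}"] = 1

    # Fill any remaining missing features with a neutral default, then reorder
    for name in feature_data:
        if name not in df.columns:
            df[name] = 0.0
    return df[feature_data]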
Now create the function that actually makes predictions:
def xgb_predict(text, model, category="unknown", rating=5.0):
    # TODO: Prepare features using prepare_features_for_prediction

    # Make the prediction
    prediction = model.predict(features)[0]
    probabilities = model.predict_proba(features)[0]
    confidence = probabilities[prediction]

    # Convert the prediction to a label (0 = AI/CG, 1 = Human/OR)
    label = "Human" if prediction == 1 else "AI"
    return label, confidence, probabilities.tolist()
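The remaining TODO is a single line placed before the predict calls; something like this, assuming the function signature sketched above:

features = prepare_features_for_prediction(text, category=category, rating=rating)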
Now for the fun part: building the actual user interface! Before you get started, I highly recommend watching the following videos and following along before you jump into the steps for creating your own UI:
https://www.youtube.com/watch?v=-IM3531b1XU&list=PLXhX6b6y_bWTegYvt-ed5SKTmQUtzwOn4&index=21
https://www.youtube.com/watch?v=QetpwPnEpgA&list=PLXhX6b6y_bWTegYvt-ed5SKTmQUtzwOn4&index=2
https://www.youtube.com/watch?v=CSv2TBA9_2E&list=PLXhX6b6y_bWTegYvt-ed5SKTmQUtzwOn4&index=13
The following steps to build your UI will be mostly pseudocode because it would be quite boring if everyone's UI looked the same. In addition, the videos above and the Streamlit documentation should be of great guidance. I also recommend sketching out what you would like your UI to look like before you even start programming it.
Now build out the main function:
def main():
    # Configure the page
    st.set_page_config(
        page_title="Amazon Review Analyzer",
        page_icon="🤖"
    )

    # TODO: Title and description

    # Load the model with a loading spinner
    with st.spinner("Loading XGBoost model..."):
        model_dict = get_xgb_model()
    if model_dict is None:
        st.error("Failed to load model. Please check if model files exist.")
        return
    st.success("XGBoost model loaded successfully!")

    # TODO: Create the input section (see 3.2 below)
    # TODO: Create the results section (see 3.3 below)


if __name__ == "__main__":
    main()
Add this code inside ‘main’ to create the input interface:
# TODO: Create two columns for layout
with col1:
    st.write("**Enter Review Text:**")
    # TODO: Create a text area for review input (st.text_area())

    # Optional inputs for better predictions
    col1_input, col2_input = st.columns(2)
    with col1_input:
        # TODO: Create a select box for product category
        # Hint: Use the keys from CATEGORY_MAPPING
    with col2_input:
        # TODO: Create a number input for rating (1-5)

    # Analyze button
    analyze_button = st.button("Analyze Review", type="primary")
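If you want a concrete starting point, one possible arrangement of these widgets looks like this (the labels, keys, and defaults are just suggestions, not the required design):

col1, col2 = st.columns(2)
with col1:
    st.write("**Enter Review Text:**")
    input_review = st.text_area("Review text", height=200, key="review_text")

    # Optional inputs for better predictions
    col1_input, col2_input = st.columns(2)
    with col1_input:
        category = st.selectbox("Product category", list(CATEGORY_MAPPING.keys()))
    with col2_input:
        rating = st.number_input("Star rating", min_value=1.0, max_value=5.0, value=5.0, step=1.0)

    # Analyze button
    analyze_button = st.button("Analyze Review", type="primary")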
Add this code to display results in the second column:
with col2:
    st.write("**Analysis Results:**")
    # Only run the analysis if the button is clicked and there's input
    if analyze_button and input_review.strip():
        with st.spinner("Analyzing with XGBoost model..."):
            try:
                # Map the user-friendly category to the dataset category
                dataset_category = CATEGORY_MAPPING[category]
                # TODO: Make a prediction with xgb_predict()
                # TODO: Display the prediction with appropriate styling
                # Hint: You could use st.error() for AI and st.success() for Human
                # TODO: Display the confidence score

                # Feature Analysis expander (check out 3.4 BONUS below!)
                with st.expander("Feature Analysis"):
                    # TODO: Extract and display the features used for the prediction
                    # This helps users understand what the model is "seeing"
                    pass
            except Exception as e:
                st.error(f"Error during analysis: {str(e)}")
                st.exception(e)
    elif analyze_button and not input_review.strip():
        st.warning("Please enter a review to analyze!")
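As one example of how the prediction TODOs inside the spinner might be filled in (the variable names assume the input sketch shown earlier; adapt them to your own widgets):

label, confidence, probabilities = xgb_predict(
    input_review, model_dict["best_model"], category=dataset_category, rating=rating
)
if label == "Human":
    st.success(f"Prediction: {label}-written review")
else:
    st.error(f"Prediction: {label}-generated review")
st.metric("Confidence", f"{confidence:.1%}")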
To make your app more educational, show users what features were extracted. This is also helpful for you to make sure that your model is seeing the correct features:
# Inside the expander in the results section:
with st.expander("Feature Analysis"):
    include_pos = True  # Match your model's feature set
    # TODO: Get the features using extract_features
    st.write("**Extracted Features:**")
    # Display the features in two columns:
    #   Column 1: basic features (char_length, word_count, etc.)
    #   Column 2: POS features (VERB, NOUN, etc.)
    col1_feat, col2_feat = st.columns(2)
    with col1_feat:
        # TODO: Display the basic features
        pass
    with col2_feat:
        # TODO: Display the POS features
        pass
Navigate to the webapp directory and run:
streamlit run app.py
Your browser should automatically open to ‘http://localhost:8501’ where you can interact with your app!
Make sure to test:
Review text input works
Category selection works
Rating input works
Analyze button makes predictions
Results display correctly
Feature analysis shows correct values (optional)
Error handling works (try submitting empty text)
Different review lengths work
Both AI and Human predictions work
Try testing with:
A review you know is human-written (maybe from the original dataset)
An obviously AI-generated review (use ChatGPT to generate one)
Edge cases: very short reviews, very long reviews, extreme ratings
Want to make your app even better? Try these:
# Add this after the title
st.write("**Try these examples:**")
example_human = "This product exceeded my expectations! The quality is outstanding."
example_ai = "This product is good. It works well. I recommend it to others."
# Note: pre-filling via session state assumes your review text area was created with key="review_text"
if st.button("Load Human Example"):
    st.session_state.review_text = example_human
if st.button("Load AI Example"):
    st.session_state.review_text = example_ai
# TODO: Add a sidebar with model info
# TODO: Load and display model metadata from selection_metadata.json
st.metric("Test AUC Score", f"{metadata['test_auc_best']:.4f}")
st.metric("Features Used", metadata['num_original_features'])
# TODO: Loop through metadata['best_params'] to print out the best hyperparameters
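Here is a sketch of what that sidebar could look like, assuming ‘selection_metadata.json’ from Week 4 contains the keys used below and lives next to your model files (adjust the path as needed):

with st.sidebar:
    st.header("Model Info")
    # Assumed path; point this at your own selection_metadata.json
    with open("../scripts/xgb_model/selection_metadata.json", "r") as f:
        metadata = json.load(f)
    st.metric("Test AUC Score", f"{metadata['test_auc_best']:.4f}")
    st.metric("Features Used", metadata['num_original_features'])
    st.write("**Best hyperparameters:**")
    for param, value in metadata["best_params"].items():
        st.write(f"- {param}: {value}")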
Allow users to upload a CSV of reviews:
st.write("**Or upload multiple reviews:**")
uploaded_file = st.file_uploader("Choose a CSV file", type="csv")
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    # TODO: Process each review and display the results in a table
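One way to finish that TODO, assuming the uploaded CSV has a "text" column with one review per row (rename the column to match your file):

results = []
for review in df["text"]:  # assumes a "text" column in the uploaded CSV
    label, confidence, _ = xgb_predict(review, model_dict["best_model"])
    results.append({"review": review, "prediction": label, "confidence": f"{confidence:.1%}"})
st.dataframe(pd.DataFrame(results))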
Show which features contributed most to the prediction:
# Use SHAP values for feature importance (requires: pip install shap)
import shap

explainer = shap.TreeExplainer(model_dict["best_model"])
shap_values = explainer.shap_values(features)
# Render the force plot for the single review as a matplotlib figure so Streamlit can display it
fig = shap.force_plot(explainer.expected_value, shap_values[0, :], features.iloc[0, :], matplotlib=True, show=False)
st.pyplot(fig)
By the end of this week, you should have:
A fully functional Streamlit web application
Real-time review classification
User-friendly interface with inputs and results
Feature visualization capabilities
Error handling and validation
Testing with multiple review types
Great job! You've built a complete machine learning project from data cleaning through model inference. You are very close to a portfolio-worthy project that demonstrates some key data analytics and machine learning principles.
Coming up next:
Touch on another approach to solving the issue of fake Amazon reviews: BERT
Deploy your model and Streamlit app so others can use it!