
Final week of content for the Amazon Review Analyzer project
Welcome to the final week of the Amazon Review Analyzer project! This week, you'll test your model against unseen competition data and deploy your Streamlit app so anyone can use it. Let's finish strong!
Now it's time to see how well your model performs on completely new data. This competition dataset has been held back all semester, so your model has never seen these reviews before. This is the true test of whether your model learned real patterns or just memorized the training data.
The competition script works similarly to your Streamlit app. It loads your trained model, extracts features from the competition reviews, makes predictions, and calculates your final scores.
Take a moment to open “comp_script.py” and skim through it. You'll notice it has functions you've seen before, like extract_features() and pos_counts().
Note: If you notice that the comp script and your Streamlit app have a lot of duplicate code (they do!), that's a sign you should refactor. Good programmers avoid repeating themselves. Consider creating a “feature_utils.py” file in your “webapp/utils/” folder to hold all the shared functions, then import them in both files. This makes your code cleaner and easier to maintain.
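As a rough sketch of what that refactor could look like (the file and function names follow the ones mentioned above; the bodies are placeholders for your existing code):

# webapp/utils/feature_utils.py
# Shared feature-extraction helpers used by both the Streamlit app and comp_script.py

def extract_features(text, nlp):
    # Move your existing feature-extraction code here
    ...

def pos_counts(doc):
    # Move your existing POS-counting code here
    ...

Both files can then import the helpers with something like from utils.feature_utils import extract_features, pos_counts (adjust the import path to match your folder structure).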
The competition script has several TODO comments that you need to complete. Let's go through them one by one:
Import your constants:
from utils.constants import CATEGORY_MAPPING
Change utils.constants to match wherever you saved your constants file.
Import preprocessing function:
sys.path.append(str(Path(__file__).resolve().parent.parent / "src"))
Update this path if your preprocess_text function is located somewhere different.
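For that line to work, comp_script.py also needs the matching imports near the top. A minimal sketch (the preprocess module name is an assumption; use whatever your file in src/ is actually called):

import sys
from pathlib import Path

sys.path.append(str(Path(__file__).resolve().parent.parent / "src"))
from preprocess import preprocess_text  # assumed module name; match your src/ file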
Set your model path:
MODEL_PATH = os.path.join(
    os.path.dirname(__file__),
    "scripts",
    "xgb_model",
    "best_model.pkl"
)
Update this to point to your trained model file (this snippet assumes import os at the top of the script). Make sure to use a relative path as shown above, not an absolute path like “C:\Users\YourName\...”. Relative paths work on any computer, including the deployment environment.
Update feature names path:
feature_names_path = "../scripts/xgb_model/feature_names.json"
Update this to wherever you saved your feature_names.json file.
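One caveat: a path starting with "../" is resolved relative to wherever you run the script from, not relative to the script file itself. To make it as robust as MODEL_PATH above, you can anchor it to __file__ too (a sketch; adjust the folder names to your layout):

import json
import os

feature_names_path = os.path.join(
    os.path.dirname(__file__),
    "scripts",
    "xgb_model",
    "feature_names.json"
)
with open(feature_names_path) as f:
    feature_names = json.load(f)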
Note: This script only uses one sample from the competition dataset. That's because I will provide the rest of the dataset, along with a function to load it all into a DataFrame, on Thursday during the competition. For now, your script should run, but expect some errors/warnings since only one sample is being predicted.
Once you've updated all the TODOs:
Make sure your virtual environment is activated
Run the script:
python comp_script.py
You should see output that looks like this:
============================================================
Amazon Review Classification Competition
Model Evaluation Script
============================================================
Loading your model...
✅ Model loaded successfully!
Loading NLP models...
✅ NLP models loaded!
Loading test data...
✅ Loaded 500 test samples
Extracting features...
Making predictions...
============================================================
📈 RESULTS
============================================================
🎯 Accuracy: 0.8420 (84.20%)
🎯 Precision: 0.8567
🎯 Recall: 0.8234
🎯 F1 Score: 0.8397
============================================================
🏆 Your model achieved 84.20% accuracy!
============================================================
Compare your competition results to your validation set performance from Week 5:
If scores are similar: Great! Your model generalizes well to new data.
If competition scores are much lower: Your model might be overfitting. It memorized patterns in the training data that don't apply to new reviews.
Tip from testing: When reviewing feature importance, we noticed that word_count and char_length can become dominant features that hide more meaningful patterns. If your model performs poorly, try training a new version without these features and see if it relies more on semantic features like sentiment and POS tags.
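If you want to run that experiment, a minimal sketch looks like this (the CSV path and "label" column are assumptions; point them at your own feature table and target column):

import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical feature table; load your training features however you normally do
df = pd.read_csv("data/review_features.csv")
X = df.drop(columns=["label", "word_count", "char_length"])  # drop the length features
y = df["label"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)
print(f"Validation accuracy without length features: {model.score(X_val, y_val):.4f}")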
Now for the exciting part - making your app available online so anyone can use it!
Before deploying, you need to make some important changes:
Update the model path in your Streamlit app: open your main Streamlit file and change the model path to be relative, just like you did in the competition script.
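For example, mirroring the competition script (the folder names come from the example above; match your own repo layout):

import os

MODEL_PATH = os.path.join(
    os.path.dirname(__file__),
    "scripts",
    "xgb_model",
    "best_model.pkl"
)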
Handle the SpaCy model installation:
Remember when we had to download SpaCy's English model earlier in the project? Streamlit's servers need it too, but we can't run commands manually on their servers. Add this line to your “requirements.txt”:
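The line in question is a direct link to the model's wheel file (the "direct wheel link" referenced in the troubleshooting section below); the version here is an example, so match it to the spacy version already in your requirements:

en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl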
This tells Streamlit to automatically download the SpaCy model during deployment. The download can make deployment noticeably slower, so if it becomes a problem, consider the alternatives discussed below.
Note: If you run into deployment issues (SpaCy models can be tricky), consider removing the POS tag features from your model entirely and using a simpler feature set. It's better to have a deployed app with slightly lower accuracy than no app at all! You can still use your more feature-rich model locally.
Your repository needs to include everything necessary to run your app:
✅ All Python scripts (Streamlit app, training scripts, utilities)
✅ Your trained model file (model_name.pkl)
✅ feature_names.json
✅ requirements.txt
✅ Any data files or constants files
Run these commands to push to your remote repo:
git add .
git commit -m "Prepare app for deployment"
git push origin main
Important: Make sure your .pkl model file isn't too large (ideally under 100MB, since GitHub rejects pushes containing files over that limit). Your model should be small enough to include in the repo; if it weren't (as is the case for many ML models), you would need to host it in the cloud and download it when the app starts.
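A quick way to check the size before pushing (the path shown is the example one from earlier; use your own):

import os

size_mb = os.path.getsize("scripts/xgb_model/best_model.pkl") / (1024 * 1024)
print(f"Model size: {size_mb:.1f} MB")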
If your GitHub repository is private, Streamlit needs permission to access it. You can easily allow Streamlit to access your private repos through your profile settings on share.streamlit.io.
Now for the actual deployment! Follow along with this video tutorial for detailed visual steps:
"My app is taking forever to deploy"
The requirements.txt installation can take 10-15 minutes because of all the machine learning libraries. If it's been much longer than that, check the logs for errors.
"ModuleNotFoundError" or "ImportError"
Make sure all file paths in your code are relative, not absolute
Check that your import statements match your actual file structure
"Model file not found"
Double-check that:
Your model file is actually in your GitHub repository
The path in your code matches the actual location
You used relative paths, not absolute paths
"SpaCy model download failed"
The SpaCy model can be tricky. If you keep getting errors:
Try the direct wheel link in “requirements.txt” (shown above)
If that doesn't work, consider removing POS features entirely
If you still want to use POS features, there should be other solutions online
Once deployment succeeds, you'll get a public URL. Test it thoroughly:
Try several different reviews (both real-sounding and fake-sounding)
Test different product categories
Make sure predictions load properly
Verify that all UI elements work
Share the URL with friends or classmates and get their feedback!
By the end of this week, you should have:
Competition Script: A script ready to predict fake reviews for the competition dataset
Deployed App URL: The link to your working Streamlit app
Clean GitHub Repository: If you have time, ensure your repo is well-organized with:
A clear README explaining your project
All necessary code files
No unnecessary files (check your .gitignore)
Clear folder structure
Congratulations! 🎉
You've completed the Amazon Review Analyzer project! You've gone from setting up Python for the first time to deploying a machine learning application that anyone in the world can use. That's a huge accomplishment.
You now know how to:
Perform exploratory data analysis on real datasets
Engineer meaningful features from text data
Train and evaluate machine learning models
Build interactive web applications with Streamlit
Deploy applications to the cloud
Work with version control using Git and GitHub
These are real skills that data scientists and machine learning engineers use every day in their jobs.
Great job on completing this project! 🚀