An email triage system that intelligently sorts large volumes of unread email into the relevant queues.
The email triage system uses a fine-tuned RoBERTa transformer model to perform multi-label text classification, allowing incoming messages to be automatically routed to the correct operational queue. For identifying high-urgency emails, the system employs a secondary ML pipeline built with TF-IDF text features, categorical metadata encoding, and a Linear SVM classifier. Both models are integrated into an Outlook-based automation script that processes unread mail, applies classification logic, and logs routing events in real time. The system also records detailed metadata including routing decisions, priority scores, sender information, and processing latency to support monitoring and visualization through a Streamlit dashboard.
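The urgency pipeline described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn and pandas, not the project's actual code: the column names (`text`, `sender_domain`), the toy training rows, and the hyperparameters are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real training set and column names are not given in the write-up.
train = pd.DataFrame({
    "text": [
        "Server is down, need a fix immediately",
        "URGENT: production outage, customers affected",
        "Critical failure in payment system, respond asap",
        "Monthly newsletter and team updates",
        "Lunch menu for next week",
        "Reminder: optional training session next month",
    ],
    "sender_domain": [
        "ops.example.com", "ops.example.com", "ops.example.com",
        "news.example.com", "hr.example.com", "hr.example.com",
    ],
    "urgent": [1, 1, 1, 0, 0, 0],
})

# TF-IDF on the message text, one-hot encoding on the categorical metadata.
# The text column is selected by a plain string so the vectorizer receives 1-D input.
features = ColumnTransformer([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2)), "text"),
    ("meta", OneHotEncoder(handle_unknown="ignore"), ["sender_domain"]),
])

urgency_model = Pipeline([
    ("features", features),
    ("svm", LinearSVC(C=1.0)),  # C=1.0 is an assumed default, not a tuned value
])
urgency_model.fit(train[["text", "sender_domain"]], train["urgent"])

new_mail = pd.DataFrame({
    "text": ["Outage in production, please respond immediately", "Team lunch next week"],
    "sender_domain": ["ops.example.com", "unknown.example.org"],
})

# The signed distance from the SVM decision boundary can serve as a priority score.
priority_scores = urgency_model.decision_function(new_mail)
labels = urgency_model.predict(new_mail)
```

Using the decision-function value as the priority score is one possible design choice: it gives a continuous ranking signal rather than only a binary urgent/not-urgent label, which is convenient for plotting a priority distribution later.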
Through this project, we gained hands-on experience fine-tuning transformer models (RoBERTa) for real-world text classification and integrating them into a production workflow. We strengthened our understanding of feature engineering and classical machine-learning techniques by building an SVM-based urgency classifier that complements the transformer model. We also learned how to automate Outlook using Python and COM APIs, enabling end-to-end email routing inside a live mailbox. Additionally, we developed a Streamlit dashboard that visualizes routing performance, priority distributions, and real-time processing logs, improving observability and system transparency.
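The processing logs that feed the dashboard could be produced along these lines. This is a sketch using only the Python standard library; the `RoutingEvent` fields, the `route_and_log` helper, and the stand-in `classify` and `score` callables are hypothetical names chosen for illustration, not the project's real interfaces.

```python
import csv
import io
import time
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class RoutingEvent:
    """One logged routing decision (hypothetical schema)."""
    timestamp: str
    sender: str
    queue: str
    priority_score: float
    latency_ms: float


def route_and_log(sender, body, classify, score, writer):
    """Classify one message, time the processing, and append a log row."""
    start = time.perf_counter()
    queue = classify(body)      # stand-in for the RoBERTa queue classifier
    priority = score(body)      # stand-in for the SVM urgency score
    latency_ms = (time.perf_counter() - start) * 1000.0
    event = RoutingEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        sender=sender,
        queue=queue,
        priority_score=priority,
        latency_ms=latency_ms,
    )
    writer.writerow(asdict(event))
    return event


# An in-memory buffer keeps the example self-contained; a real script
# would more likely append to a CSV file that the dashboard reads.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RoutingEvent)])
writer.writeheader()

event = route_and_log(
    sender="alice@example.com",
    body="Invoice overdue",
    classify=lambda text: "billing",
    score=lambda text: 0.8,
    writer=writer,
)
```

A flat CSV with one row per routed message is easy to load into a dataframe, so a dashboard can plot queue counts, priority distributions, and latency without any extra parsing.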
In the future, we would love to refine this tool by building a comprehensive front end so that users do not need to know how to run Python scripts. Given sufficient time, we would also further optimize our models and potentially add more features.