ML Model Detects Fraudulent Emails, Increases Efficiency
A B2B technology company provides computers, hardware, software and IT solutions for businesses, government entities and education organizations. With a large number of purchases and orders coming through its account representatives’ email inboxes, the company looked to streamline purchasing requests. It created a workflow to automatically populate an online shopping cart on behalf of the representative with the client’s desired products. The only catch was that some of the emails did not come from actual accounts and were fraudulent.
Finding Fraudulent Emails
The client wanted to use machine learning to sort emails into two workflows, one for possibly fraudulent emails and one for emails unlikely to be fraudulent, and turned to longtime partner SPR to build the ML model.
The model would assign each email a fraud score. Emails determined to be fraudulent would be routed to a separate workflow for review, and emails determined to be legitimate would enter the automatic cart creation workflow. The challenge was productionizing an ML model and integrating it into an active, existing process. The model would need to scale well and return accurate results quickly.
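The routing described above can be sketched in a few lines of Python. This is only an illustration: the threshold value and the workflow names ("manual_review", "cart_creation") are hypothetical, since the cutoff and naming actually used are not stated here.

```python
# Hypothetical cutoff; the threshold actually used is not stated in this case study.
FRAUD_THRESHOLD = 0.5


def route_email(fraud_score: float, threshold: float = FRAUD_THRESHOLD) -> str:
    """Return the (hypothetical) name of the workflow an email should enter."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    # Scores at or above the threshold go to human review;
    # everything else continues to automatic cart creation.
    if fraud_score >= threshold:
        return "manual_review"
    return "cart_creation"
```

Keeping the threshold as a parameter makes it easy to trade off missed fraud against unnecessary manual reviews without touching the model itself.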
Building the Model
To create the ML model, SPR leveraged existing legitimate and fraudulent emails received by the client to determine patterns common in fraudulent emails.
The SPR team prepared the data and vectorized the emails (turning text into numbers) so the model could detect patterns within the numbers. Then the model was tuned by adjusting parameters to provide the best balance between accuracy, false positives and false negatives. Because most emails were legitimate, the sample of fraudulent emails was small, so SPR adjusted the data set to overcome this class imbalance.
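The specific adjustments made to the data set are not described, so the sketch below shows just one common option, random oversampling of the minority class, together with a helper that counts false positives and false negatives at a given score threshold. The function names and the label convention (1 for fraudulent, 0 for legitimate) are assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_minority(emails, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest.

    This is one possible way to address class imbalance, not necessarily
    the adjustment used in the project.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_emails, out_labels = list(emails), list(labels)
    for label, count in counts.items():
        pool = [e for e, l in zip(emails, labels) if l == label]
        for _ in range(target - count):
            out_emails.append(rng.choice(pool))
            out_labels.append(label)
    return out_emails, out_labels


def error_counts(scores, labels, threshold):
    """Count (false positives, false negatives) at a score threshold.

    Assumed convention: label 1 = fraudulent, label 0 = legitimate.
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return false_pos, false_neg


# Made-up scores and labels, only to show the trade-off between thresholds.
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [0, 0, 1, 0, 1]
for threshold in (0.3, 0.5):
    print(threshold, error_counts(scores, labels, threshold))
```

Lowering the threshold catches more fraud at the cost of more legitimate emails sent to review. Oversampling, if used, is normally applied to the training data only, so that evaluation reflects the real class mix.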
SPR drew on its experience with count vectorization to flag common word patterns within emails, and with other vectorization strategies, including TF-IDF, to assign each email a fraud score.
The resulting text classification model provided a score between 0 and 1, representing the probability that the email was fraudulent.
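To make the two vectorization strategies named above concrete, here is a minimal standard-library sketch of count vectorization and TF-IDF weighting. It is not the project's code: the tokenizer, the function names and the smoothed IDF formula are illustrative choices, and a production pipeline would more likely rely on a library such as scikit-learn.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_vectors(docs):
    """Return (vocabulary, one term-count vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Weight each count by a smoothed inverse document frequency."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF, one common variant chosen here for illustration.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


# Made-up example emails, not taken from the client's data.
docs = ["Please ship ten laptops", "Urgent wire payment, ship today"]
vocab, weights = tfidf_vectors(docs)
for term, weight in zip(vocab, weights[0]):
    print(f"{term}: {weight:.3f}")
```

Words that appear in every email (such as "ship" in this toy example) receive a lower weight than words unique to one email, which is what lets a downstream classifier focus on distinctive terms when producing its score.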
The ML-Empowered Workflow
Throughout the process, SPR served as an extension of the client’s team, providing mentoring and advisory services to help client employees understand what needed to be done in the production systems where the ML model would run. After tuning, the client’s team successfully implemented the model, and the organization benefited from more streamlined email processes and less time spent dealing with fraudulent communications.
Technologies Used: Python Data Science Stack