Gwendolyn Ang - BI Portfolio

Gwendolyn Ang

Fraud Detection & Other ML Projects

Optimizing telco, banking and retail operations through ML models and strategic business recommendations

💵 Anomaly Detection for Bank Fraud

My individual contributions to the group term project include the literature review, data pre-processing, feature selection through principal component analysis, exploratory data analysis, research and comparison of ML models and evaluation metrics, and business implications and recommendations.

Presentation Deck Excerpts

🗼Model Explainability for Telco Churn

While complex ML models like deep learning and ensemble methods often achieve higher accuracy, they present challenges in transparency, making it difficult for stakeholders to understand how decisions are made. This lack of interpretability is especially critical in fields such as healthcare, finance, and the justice system, where trust, fairness, and societal impact are paramount.

Model explainability techniques, such as SHapley Additive exPlanations (SHAP), help address this issue by quantifying the contribution of each feature (e.g., income, age, credit score) to the model’s prediction, providing insights into the decision-making process (DataCamp, 2023).

This individual project investigates churn rates for a telecommunications company using the Random Forest Classifier and SHAP. I performed key stages of the ML process, including data acquisition, preprocessing, transformation, model development, SHAP implementation, and feature importance visualizations before wrapping up with business recommendations.
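To make the idea of per-feature contributions concrete, here is a minimal sketch of the Shapley value calculation that SHAP is built on. It is a brute-force version for a tiny hypothetical model, not the SHAP library's tree-based algorithm and not the project's code; the scoring function, feature names, and baseline are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction, by enumerating feature subsets.

    Features left out of a subset are set to their baseline value.
    Only practical for a handful of features (cost grows as 2^n).
    """
    n = len(instance)
    features = range(n)

    def value(subset):
        x = [instance[i] if i in subset else baseline[i] for i in features]
        return predict(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical churn score with one interaction term (NOT the project's model).
def toy_churn_score(x):
    fiber_optic, short_tenure, electronic_check = x
    return 0.10 + 0.30 * fiber_optic + 0.20 * short_tenure + 0.10 * fiber_optic * electronic_check


instance = [1, 1, 1]   # customer with all three hypothetical risk flags
baseline = [0, 0, 0]   # reference customer with none of them
phi = shapley_values(toy_churn_score, instance, baseline)

for name, contribution in zip(["fiber_optic", "short_tenure", "electronic_check"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the difference between the prediction for this customer and the prediction for the baseline, which is the property that makes SHAP values readable as a per-feature breakdown of a single prediction.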

Code Excerpts

Results and Discussion

Dependence Plots. Customers are more likely to churn when:

Their internet service is fiber optic as opposed to DSL or no internet.

Their tenure is under 20 years, after which the churn rate stabilizes and slowly decreases over time.

They choose to pay via Electronic check vs. mailed check, bank transfer (automatic) and credit card (automatic).

They do not have a two-year contract.

Their monthly charges are 30-40 dollars, and more so when they are 70-100 dollars and especially 100-110 dollars. However, customers paying 110-120 dollars have lower SHAP values.

Their total charges are at the lower end (< 500). SHAP values are much lower and evenly distributed when total charges are from 500-6000, and even lower beyond that.

Recommendations

Domain-Related

The top 2 features explain 45-48% of the model, so the business can prioritize these with its time and budget.

InternetService_Fiber optic

Assuming the telco company provides the internet service, conduct focus group discussions (FGDs) with customers to assess whether fiber optic churn is due to internal factors (ex. service quality, price, customer service) or external factors (ex. competitors offering better services). For example, if quality is the issue, prepare service recovery strategies such as prioritizing technicians for home visits and waiving fees.

Tenure

Assuming that this is the length of time a customer has stayed with the company, the goal should be to get more customers to cross the 10-20 year threshold. There are two key periods for the telco company to touch base with customers:

Upon account opening, offer freebies (ex. 2-year Netflix subscription, ClassPass credits, monthly 5% discount at a restaurant) in exchange for longer lock-in periods of 3-5 years.

Once their subscription is nearing its end, offer financial rewards like added service coverage for the same price so they once again renew for a longer period.

The next set of recommendations is less important but can still impact churn by 75% in total.

PaymentMethod_Electronic check

More FGDs are needed to understand if, how, and why payment method affects churn, but initial suggestions are to 1) streamline the payment process for electronic check customers (ex. if they have a hard time writing out the electronic check every month, they could be offered a plan to pay only every 2 months) and 2) educate them on other payment methods.

Contract_Two year

This is related to tenure and sets a bar for the ideal lock-in period as opposed to One-year and Month-to-Month contracts.

Monthly Charges

Segment users into 2 groups: those who pay below and those who pay above 70 dollars per month. In general, the latter group has higher churn rates, likely because customers who are shelling out more money rely more heavily on the services and have more to lose if the service doesn't perform well. Thus, FGD questions should focus on what they use telco services for.

Total Charges

This feature seems to be correlated to tenure since the more years a user has stayed with the telco, the higher the total amount paid. Both share the trend of having SHAP values peak towards the lower end of the x-axis.

However, another way of accumulating charges is through high utilization of telco services. Thus, the telco company should prioritize acquiring and retaining high-value customers. Although the > 70 USD / month group in Monthly Charges had high churn rates, the behavior reversed for customers in the 110-120 USD range.

Model-Related

The model performed best when there were 6 trees in the forest (n_estimators from a range of 5-25 incremented by 5), tree depth of 5 (max_depth from a range of 5-12 incremented by 2) and 5 samples required to be a leaf node (min_samples_leaf from a range of 5-20 incremented by 5).
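The search space described above can be enumerated with a short sketch. It reads each range as inclusive with the stated step, which is an assumption, and it only builds the candidate settings; how the project actually ran its search is not shown in the source.

```python
from itertools import product

# Hyperparameter ranges as described in the text, read as inclusive (assumption).
param_grid = {
    "n_estimators": list(range(5, 26, 5)),       # 5, 10, 15, 20, 25
    "max_depth": list(range(5, 13, 2)),          # 5, 7, 9, 11
    "min_samples_leaf": list(range(5, 21, 5)),   # 5, 10, 15, 20
}

# Every combination of the three settings, as one dict per candidate model.
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

print(f"{len(combos)} candidate settings")
print(combos[0])
```

A dictionary in this shape is also the format that grid-search utilities such as scikit-learn's `GridSearchCV` accept as `param_grid`, so the same ranges could be reused there with recall as the scoring metric.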

However, this still yielded a low recall of 48.79%, meaning the model misses roughly half of the customers who actually churn. As mentioned earlier, false negatives are a huge blow for the business, not just in terms of user count but also financially, since they reduce customer lifetime value (CLV) and increase customer acquisition cost (CAC). The gap between the 81% accuracy and the low recall suggests that the model may have overfitted or may be favoring the majority non-churn class.
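To see how accuracy and recall can diverge this much, consider a small worked example. The confusion-matrix counts below are hypothetical, picked only so that the metrics land near the figures quoted above; they are not the project's actual results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall and precision from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),       # share of actual churners that were caught
        "precision": tp / (tp + fp),    # share of churn predictions that were right
    }


# Hypothetical counts for 1,000 customers, 265 of whom churn (assumption).
metrics = classification_metrics(tp=129, fp=54, fn=136, tn=681)

# Accuracy of a model that always predicts "no churn" on the same customers.
always_no_churn_accuracy = (681 + 54) / 1000

for name, score in metrics.items():
    print(f"{name}: {score:.2%}")
print(f"always-no-churn accuracy: {always_no_churn_accuracy:.2%}")
```

With churners in the minority, most of the accuracy comes from correctly labeling customers who stay, which is why recall is the more informative metric when missed churners are the costly error.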

I also observed that the features generated are slightly different from those of my groupmates, possibly because I 1) added min_samples_leaf as a hyperparameter, and 2) encoded binary values as 0 and 1 while they used -1 and 1.

I recommend the following future experiments:

Reduce the number of features to the top 5, excluding Total Charges (check first its correlation with tenure)

Prioritize improving the Random Forest model by incorporating other hyperparameters

Compare SHAP with LIME and PDP (partial dependence plots), but use the same data preprocessing and Random Forest settings
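The correlation check suggested in the first experiment could be sketched as follows. The tenure and total charges values are made up for illustration (not taken from the dataset), and Pearson correlation is one reasonable choice of measure rather than the one the project necessarily used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for six customers (assumption, for illustration only).
tenure = [1, 5, 12, 24, 36, 60]
total_charges = [70, 340, 900, 1700, 2900, 4500]

r = pearson(tenure, total_charges)
print(f"correlation between tenure and total charges: {r:.3f}")
```

A coefficient close to 1 on the real data would support dropping Total Charges, since it would then carry largely the same information as tenure.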

🛒 Association Rule Learning for Market Basket Analysis

Association rule learning is an unsupervised learning technique used to discover interesting relationships or patterns between variables in large datasets, specifically looking at how the presence of one item is dependent on the presence of another item. Its applications include web usage mining, social network analysis and sentiment analysis.

For this project, the specific use case is market basket analysis, helping grocery store owners uncover co-occurrences of items in customers’ shopping carts and better understand purchasing patterns.
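As a minimal sketch of what an association rule measures, the example below computes support, confidence, and lift for two rules over a handful of baskets. The transactions and item names are hypothetical, and the code evaluates given rules directly instead of mining them with an algorithm such as Apriori.

```python
# Hypothetical shopping baskets (assumption, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return {"support": both, "confidence": confidence, "lift": lift}


bread_to_milk = rule_metrics({"bread"}, {"milk"}, transactions)
butter_to_bread = rule_metrics({"butter"}, {"bread"}, transactions)

print("bread -> milk:", bread_to_milk)
print("butter -> bread:", butter_to_bread)
```

A lift above 1 means the items appear together more often than they would if purchases were independent, which is the kind of signal a grocer could use for shelf placement or bundling; a lift below 1 means the opposite.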

In this group project, my main contributions were co-writing and streamlining code, generating insights, and developing recommendations.

Key Insights

Recommendations
