Salesforce Effectiveness: Most Important Features and a Multidimensional View

Introduction:

The accelerating shift to omnichannel retail, especially in lifestyle, electronics, and durables, has transformed the customer journey into a complex, nonlinear path.

Customers now interact with brands across multiple touchpoints, both online and offline, from discovery and exploration through adoption to post-purchase assurance and service. Each touchpoint plays a vital role in influencing decisions and generating referrals from the customer's network.

In this context, a well-trained and adaptable salesforce becomes critical across levels.

When sales managers (SMs) and associates are evaluated across levels, the highest weightage is given to sales achievement, which is a fair measure of performance.

However, sales achievement cannot be the only consideration when deciding whom to elevate to the next level or when designing custom upskilling programs. Several other factors come into play, making the exercise a multidimensional one.

The model proposed here evaluates these other factors (henceforth called features), which may seem unrelated to sales but affect the professional growth and the training/upskilling needs of a sales manager.

The idea is to bolster intuition and traditional wisdom with statistical measures.

The proposed statistical model and the motivation:

The statistical model aims to provide an objective view of the SMs' contribution to the business and the brand, going beyond the immediate objective of bringing in sales.

The model will assist in:

Eliminating decision bias and HiPPO (the highest-paid person's opinion) from decisions on SM hiring, training, and career progression.
Proposing with confidence the 'most important parameters' that drive the highest engagement and productivity, both for the current set of SMs and for future hiring.
Clustering the SMs so as to conclude on:
- Training/upskilling requirements and interventions needed
- Ideal ranges for salary, average sales value, etc.
As an outcome, helping to reduce the high churn rate that is typical in this line of work.
With some changes, the model can also be used for several other functions.

The model has been built on a synthetic dataset, so the observations and results are specific to that dataset. It is intended as a framework that can be applied to other, similar datasets.

Who should read this:

It is intended to benefit:

Sales and Marketing managers and functional heads.
Talent acquisition teams who are tasked with finding the best SM talent specific to a brand or category.
Learning and Development leaders.
Promotions/Incentives/Channel managers who create SM-specific and retail-specific programs.

What to expect:

The model has been created on synthetic data since real datasets pertaining to retail sales of organizations are confidential.

However, columns (or features as they are called in data science parlance) used in the dataset are consistent with what’s collected by most organizations.

The model will hold if new numerical columns are introduced into the dataset. If new categorical columns are introduced, those columns will need to be encoded first, which requires a basic familiarity with encoding techniques.

The summary results and conclusions are specific to this dataset; however, the methodology is flexible and generic, offering a blueprint for managing and growing SMs in organizations.
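As a minimal sketch of what such encoding can look like, the snippet below one-hot encodes a nominal column and maps an ordinal column onto an explicit order using pandas. The toy frame, the column names (Region, EducationLevel), and the category order are illustrative assumptions, not columns taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame; these column names are illustrative only.
df = pd.DataFrame({
    "Age": [25, 32, 41],
    "Region": ["North", "South", "North"],                      # nominal: no natural order
    "EducationLevel": ["Graduate", "Diploma", "Postgraduate"],  # ordinal: has an order
})

# One-hot encode the nominal column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["Region"], dtype=int)

# Map the ordinal column onto an explicit (assumed) order.
education_order = {"Diploma": 0, "Graduate": 1, "Postgraduate": 2}
encoded["EducationLevel"] = encoded["EducationLevel"].map(education_order)

print(encoded)
```

One-hot encoding suits categories with no inherent order, while an explicit mapping preserves the order of ordinal categories.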

What’s so unique about this approach:

Using the existing data, the SMs are first segmented into cohorts using unsupervised clustering. K-modes has been used instead of K-means because the data contains categorical as well as numerical columns.

After segmentation, each SM is assigned to one of the clusters. The assignment serves two purposes:

It reveals the nuances of every cluster (high performance versus low, high engagement versus low, etc.), so that programs can be created to cater to each cluster.
It provides labels that are used as targets when predicting the most important factors behind an SM's assignment to a particular cluster.
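To make the idea behind K-modes concrete, here is a small NumPy sketch, not the implementation used in the notebook. It assumes every column has already been converted to non-negative integer category codes (numerical columns would need to be binned first, or a mixed-type variant such as K-prototypes used instead). The function name k_modes and the toy data are hypothetical.

```python
import numpy as np

def k_modes(X, k, n_iter=20, seed=0):
    """Minimal K-modes sketch. X: 2D array of non-negative integer category codes."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows so that no two initial modes coincide.
    unique_rows = np.unique(X, axis=0)
    modes = unique_rows[rng.choice(len(unique_rows), size=k, replace=False)]
    for _ in range(n_iter):
        # Hamming dissimilarity: number of mismatching columns per (row, mode) pair.
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                # New mode = most frequent category in each column of the cluster.
                new_modes[j] = [np.bincount(col).argmax() for col in members.T]
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    # Final assignment and total cost (sum of mismatches to the nearest mode).
    dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
    labels = dist.argmin(axis=1)
    return labels, modes, int(dist.min(axis=1).sum())

# Toy data: two clean groups of identical categorical rows.
X = np.array([[0, 1, 2]] * 5 + [[1, 0, 0]] * 5)
labels, modes, cost = k_modes(X, k=2)
print(labels, cost)
```

Where K-means averages coordinates and uses squared distances, K-modes counts mismatches and uses the most frequent category, which is why it can work on categorical columns.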

Under the hood:

A synthetic dataset with 41 columns (features) and 1,200 rows has been used for the analysis. For more details, refer to Annexure-1.

Statistical and ML techniques used, along with the workflow:

Clusters are created using unsupervised learning. They not only offer insights into the existing set of promoters but also serve as labels (targets) for the supervised learning techniques used later to compute feature importances.

Step 1 – Data ingestion and exploration

While data can come from various sources, for the purpose of our modelling we have used a raw synthetic dataset in .csv format.

The intent is to start with as exhaustive a set of features as is available and to keep pruning until the model arrives at the most relevant set of features.

In this step, a correlation matrix is created, and columns that are highly or perfectly correlated with another column (>0.9) are removed.
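The correlation filter can be sketched as follows. This is one common way to do it, assuming the absolute Pearson correlation is used and that, for each highly correlated pair, the later column is the one dropped. The helper name drop_highly_correlated and the demo columns are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Hypothetical demo: 'a_copy' is almost a linear copy of 'a'.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + 0.01 * rng.normal(size=200),
    "b": rng.normal(size=200),
})
reduced, dropped = drop_highly_correlated(demo)
print(dropped)
```

Scanning only the upper triangle means the first column of each correlated pair is retained, so the information is kept once rather than removed entirely.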

Step 2 – Unsupervised clustering and pseudo labelling

Choose the number of clusters by applying the elbow method to the within-cluster sum of squares (WCSS), and validate the choice using a silhouette plot. We arrived at 3 clusters for our synthetic dataset; however, the number of clusters can differ for real datasets.
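The selection of the cluster count can be illustrated with the short sketch below. Note that it swaps in scikit-learn's KMeans on synthetic numeric blobs as a stand-in for the K-modes clustering described above, because KMeans exposes WCSS directly as inertia_. The blob centres and the candidate range of k are assumptions made for the illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated numeric blobs (assumed centres).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

wcss, silhouette = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = km.inertia_                          # within-cluster sum of squares
    silhouette[k] = silhouette_score(X, km.labels_)

# Elbow: look for the k after which WCSS stops dropping sharply.
# Silhouette: prefer the k with the highest average score.
best_k = max(silhouette, key=silhouette.get)
print(wcss)
print(silhouette)
print("k chosen by silhouette:", best_k)
```

WCSS always shrinks as k grows, so the elbow is read from where the curve flattens rather than from its minimum; the silhouette score gives a second, independent check on the same choice.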

Here is the view of the clusters in the 2D map using a scatter plot on synthetic data. This may change when the actual dataset changes.

Step 3 – Using labels from step 2 as targets, run supervised learning algorithms and generate consensus on most important features:

Before running the supervised algorithms, we run an ANOVA test on linear regression models, with the pseudo labels as targets, to identify features that are not statistically significant and can be removed from further analysis.

After ANOVA, the following features were dropped from our synthetic dataset.
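A simplified version of this significance screen is sketched below. It uses scikit-learn's f_classif, which runs a one-way ANOVA F-test for each feature across the cluster labels, as a simpler stand-in for the regression-based ANOVA described above. The 0.05 cut-off, the feature names, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)  # stand-in pseudo labels from clustering

# Hypothetical features: one whose mean shifts by cluster, one unrelated to it.
X = pd.DataFrame({
    "informative": labels * 2.0 + rng.normal(size=300),
    "noise": rng.normal(size=300),
})

# One-way ANOVA F-test of each feature against the cluster labels.
f_stat, p_val = f_classif(X, labels)
report = pd.DataFrame({"F": f_stat, "p_value": p_val}, index=X.columns)

alpha = 0.05  # assumed significance level
insignificant = report.index[report["p_value"] > alpha].tolist()
print(report)
print("Insignificant vars:", insignificant)
```

Features whose means do not differ meaningfully between clusters get a high p-value and become candidates for removal before the supervised models are trained.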

Insignificant vars: [‘EmpEnvironmentSatisfaction’, ‘EmpLastSalaryHikePercent’, ‘EmpRelationshipSatisfaction’, ‘ExperienceYearsInCurrentRole’, ‘YearsSinceLastPromotion’, ‘YearsWithCurrManager’, ‘Avg_no_of_demos_per_day_for_last_1_year’, ‘Avg_interaction_time_per_customer_seconds’, ‘Avg_sales_value’, ‘Avg_customer_feedback’, ‘Salary_per_Annum_lacs’, ‘Conversion_rate’, ‘_of_sales_from_high_value_products’, ‘No_of_interactions_per_customer_per_customer’, ‘Product_knowledge_score’, ‘Customers_attended_M_vs_F’]

This list can vary for real datasets.

Finally, the following features emerge as the consensus across all the models.

Feature	Consensus_Rank	Avg_Rank
ExperienceYearsInCurrentRole	1	1.6
NumCompaniesWorked	2	2.2
Age	3	2.5
YearsSinceLastPromotion	4	4.2
ExperienceYearsAtThisCompany	5	5
YearsWithCurrManager	6	5.2
EmpJobRolele	7	7
EmpJobLevel	8	8
EmpEducationLevel	9	8.2
DistanceFromHome	10	8.8
PerformanceRating	11	10.7
EmpJobInvolvement	12	11.2
Attritionle	13	14

Across the five models (Logistic Regression, Random Forest, Decision Tree, Gradient Boosting, Extra Trees), ExperienceYearsInCurrentRole, NumCompaniesWorked, and Age emerge as the top consensus features, ranking at or near the top in every model. These employee tenure and background metrics contribute most consistently to separating the salesforce effectiveness clusters.

We can also suggest the optimal range for each of the consensus features above. The ranges have been defined for the best-performing clusters (0 and 2).
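One way such a consensus ranking can be built is sketched below: each model ranks the features by importance, and the ranks are then averaged. This is an illustration on synthetic data, not the notebook's code. The use of mean absolute coefficients for Logistic Regression, the tie-breaking rule, and the feature names f0 to f5 are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 6 features, 3 classes playing the role of cluster labels.
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)
features = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names
X = StandardScaler().fit_transform(X)            # puts coefficients on one scale

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

ranks = pd.DataFrame(index=features)
for name, model in models.items():
    model.fit(X, y)
    if hasattr(model, "feature_importances_"):
        importance = model.feature_importances_
    else:
        # Linear model: mean absolute coefficient across classes (an assumption).
        importance = np.abs(model.coef_).mean(axis=0)
    ranks[name] = pd.Series(importance, index=features).rank(ascending=False)

consensus = pd.DataFrame({"Avg_Rank": ranks.mean(axis=1)})
consensus["Consensus_Rank"] = consensus["Avg_Rank"].rank(method="first").astype(int)
consensus = consensus.sort_values("Consensus_Rank")
print(consensus)
```

Averaging ranks rather than raw importances keeps models with differently scaled importance scores from dominating the consensus.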
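How the ranges are derived is not spelled out here, so the following is only one plausible sketch: it takes the interquartile range (25th to 75th percentile) of a feature within each best-performing cluster. The choice of quantiles, the column name cluster, and the synthetic Age values are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: one consensus feature (Age) and a cluster label per SM.
df = pd.DataFrame({
    "cluster": np.repeat([0, 1, 2], 50),
    "Age": np.concatenate([
        rng.normal(30, 3, 50),
        rng.normal(45, 3, 50),
        rng.normal(35, 3, 50),
    ]),
})

# Keep only the best-performing clusters named in the text (0 and 2).
best = df[df["cluster"].isin([0, 2])]

# Interquartile range per cluster as the suggested "optimal" band (an assumption).
ranges = best.groupby("cluster")["Age"].quantile([0.25, 0.75]).unstack()
ranges.columns = ["lower", "upper"]
print(ranges.round(1))
```

The same grouping can be repeated for every consensus feature to produce a table of suggested ranges per cluster.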

Step 4 – A deeper dive into step 3

We can use a combination of an ensemble model and global SHAP values to isolate the most important features for each cluster.

The following stacking classifier has been used for our model.
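The exact configuration is not reproduced in this text. Purely as an illustration, a stacking classifier of this kind could be assembled in scikit-learn as shown below. The base learners, the meta-learner, the cross-validation setting, and the synthetic data are all assumptions and should not be read as the notebook's actual setup. The SHAP step is left out of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 3 classes playing the role of the cluster labels.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Assumed architecture: two tree ensembles feeding a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions are used to train the meta-learner
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

Stacking lets the meta-learner weigh the base models' predictions against each other, which is why it is often paired with a model-agnostic explanation method such as SHAP.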

Exploded SHAP map for Cluster 0

Importance in the business context:

We believe that this statistical model can be used in several contexts with slight tweaking.

It not only helps to cluster the current set of employees meaningfully across several dimensions but also makes it possible to isolate the most important features/factors needed for them to succeed (or fail) in their current role.

This model also serves to uncover the ingredients (or features) that go into making a high-performing SM. These can inform grade changes and promotions, as well as the hiring of the next set of people.

Reference Material:

Download the notebook from

https://github.com/remixwithkj/AIMLDL/tree/salesforce-effectiveness-and-most-important-factors-for-explained

Authors:

https://www.linkedin.com/in/kumarj

https://www.linkedin.com/in/dhruv-lalani