by Anh H. Nguyen
Happy new year! Oh wait, it's almost May...
(Let me try again.)
Long time no see, dear readers! Ever since we ended the month of January on a high note with a trip and a newsletter summing up our 2020 journey, it has been pretty quiet around here. You may wonder what we have been up to for the last few months, and what we have in store for 2021. It may not look like it, but things have been heating up at Palexy recently, and we have a few exciting revelations in store. First of all, we are in the process of revamping our website to create a better experience for visitors. Please stay tuned and check back in a few weeks! More importantly, our CS team and engineering team have joined forces to develop a whole new metric. It is called EoS (effectiveness of staff), and it could be a game changer for retailers. So what is EoS, and how does it work?
Recall versus Precision: the old tug-of-war
We brand ourselves (rightly) as a B2B service provider, but Palexy is first and foremost a technology company. Machine learning is our lifeblood, the foundation of our continued success, and we have the constantly improving accuracy to prove it. At the end of the day, however, it is the humans behind the technology who matter the most. Our data engineers are skillful, for sure, but more importantly, they are deeply passionate about their work. It is this last part that helped give birth to EoS.
Oftentimes, "real world" data are brought to data engineers, who look for trends and build algorithms to interpret them. It is their job to make raw data more accessible to enterprises, and our team, if we may say so, routinely does a superb job of it. In some rare cases, however, the data scientists have a flash of insight, an inspiration born of their own technical expertise. They then shape that idea to fit the wishes and needs of the "real world". That is how EoS came to be.
Many machine learning professionals would testify that of the many confusing concepts in their field, Precision and Recall rank pretty highly on the difficulty scale. People can usually tell the two apart; it is when they need to pin down exactly what each one means that they run into trouble. It does not help that half-baked definitions of Precision and Recall float around, baffling the baffled even more. So here is a super mini crash course on Precision versus Recall for the uninitiated.
Precision is the ratio of correctly retrieved instances to all retrieved instances.
Recall is the ratio of correctly retrieved instances to all correct instances.
Both of these revolve around correctness but in different ways.
The problem that Precision versus Recall poses for data scientists is that these two are like jealous sisters. They look quite close but hate each other's guts. You cannot have too much of one without sacrificing the other. To balance the tradeoff between Precision and Recall and achieve a good fit, data scientists came up with the F Score, which unifies the two into a single metric. Of course, in certain cases it might be helpful to look at Precision and Recall individually as well. But the F Score is a good, solid yardstick for evaluating machine learning models. The most common variant, the F1 Score, is the harmonic mean of Precision and Recall, which weights both equally and punishes a lopsided pair; it is often regarded as the most useful metric when class distribution is unequal.
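For readers who prefer code to prose, the two definitions and the F1 Score that unifies them fit in a few lines. A minimal Python sketch (variable names are ours, for illustration):

```python
def precision(true_positives: int, false_positives: int) -> float:
    # Of everything we retrieved, what fraction was actually correct?
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    # Of everything that was correct, what fraction did we retrieve?
    return true_positives / (true_positives + false_negatives)

def f1_score(p: float, r: float) -> float:
    # Harmonic mean of Precision and Recall: one weak score drags
    # the result down much harder than an arithmetic average would
    return 2 * p * r / (p + r)
```

Note how a model with Precision 1.0 but Recall 0.1 gets an F1 of only about 0.18, while a simple average would flatter it with 0.55.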
At this point, you may scratch your head and think: this is pretty interesting and all, but how does it help a retailer like me? What does it have to do with assessing my staff?
Do not worry, we are getting close!
Choosing between opposites
Think about population-wide testing of Covid-19, for example. Let's say you had three test kits at your disposal and 1000 subjects, among which 10 were positive.
Test kit A cast the widest net: it identified all 1000 subjects as positive, so it caught every one of the 10 positive patients. Test A was the recall champion.
Test kit B focused on exactitude. Every single subject test B identified as positive was indeed positive. Test B could only detect 1 subject, though. Test B won on the precision front.
Test kit C identified 20 subjects as positive, 8 of whom were actually Covid-19 carriers.
Test C seems like the best, doesn't it? We instinctively think so, but let's quantify it.
Test A's F1 Score is 0.0198.
Test B's F1 Score is 0.1818.
Test C's F1 Score is 0.5333.
Let's hope your government chose test C!
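You can check those three scores yourself. Here is a quick sanity check, with the confusion counts read straight off the story above:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 Score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Test A: flags all 1000 subjects -> 10 true positives, 990 false positives
# Test B: flags 1 subject, correctly -> 9 positives missed
# Test C: flags 20 subjects, 8 correctly -> 12 false alarms, 2 misses
print(round(f1(10, 990, 0), 4))  # 0.0198
print(round(f1(1, 0, 9), 4))     # 0.1818
print(round(f1(8, 12, 2), 4))    # 0.5333
```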
That was an extreme example for the sake of demonstration. But dilemmas like that abound in daily life, and finding the optimal point is not always so straightforward. Given a fixed budget, should you buy one first-rate pair of shoes or five subpar ones? Should you accept an immensely well-paid but extremely stressful job, or settle for an easy one with low pay? When you are running late for a critical meeting and your body is pumped full of adrenaline, should you drive extra cautiously to avoid accidents or rush to make it? In each situation, the pros and cons counteract one another, and sometimes you have only a split second to decide. Imagine how helpful it would be if every conundrum in life came with its own score!
The Catch-22 of staff performance in retail
The test kits in the above example are fairly simple. Test kit A went wide (Recall). Test kit B went deep (Precision). Test kit C landed at a spot wide enough and deep enough to be chosen (F1 Score). But as we all know, retail staff are people and people are a tad more complex. How would one go about weighing various human attributes to arrive at the composition of the ideal worker?
After 2 years of working with retailers of all sizes in Vietnam, we narrowed it down to two components:
Staff interaction rate (IR): the quantity of customer engagements. The more the merrier! (Going wide)
Staff conversion rate (CVR): the quality of customer engagements. How skillful the staff are at converting shoppers to buyers. (Going deep)
Obeying the laws of physics, it is nearly impossible for retail personnel to excel at both. They can either try to connect with as many customers as possible, or devote their time and energy to a few. Since they are only human, top IR and top CVR cannot coexist, which, as you may have noticed, is quite similar to the Precision-versus-Recall problem. Following the same logic, the Customer Success team at Palexy hypothesized that there existed a point of equilibrium where staff effectiveness was the highest. Like the F1 Score, it would be the unifier of two opposing aspects.
Introducing the EoS
To calculate that metric, we needed two ingredients: the right IR and the right CVR. It goes without saying that both needed to be absolutely on the mark.
Thankfully, we already had them, owing to our proprietary software.
Using the same formula as the F1 Score, we delivered the EoS (effectiveness of staff), a quantifiable compound metric that harmoniously blends two competing traits of retail staff. With this new metric, retailers can select the best framework for their staffing strategy going forward.
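Since the EoS follows the F1 recipe, a back-of-the-envelope version is easy to sketch. The staff profiles and rates below are entirely invented for illustration, not real Palexy data:

```python
def eos(interaction_rate: float, conversion_rate: float) -> float:
    # Harmonic mean of IR and CVR, mirroring the F1 Score
    if interaction_rate + conversion_rate == 0:
        return 0.0
    return (2 * interaction_rate * conversion_rate
            / (interaction_rate + conversion_rate))

# Hypothetical staff profiles: (IR, CVR)
staff = {
    "goes wide": (0.90, 0.05),  # greets everyone, converts almost no one
    "goes deep": (0.10, 0.60),  # serves a handful of shoppers very well
    "balanced":  (0.45, 0.30),
}
for name, (ir, cvr) in staff.items():
    print(f"{name}: EoS = {eos(ir, cvr):.3f}")
```

In this toy example the balanced profile scores highest, which is exactly the equilibrium point the metric is designed to reward.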
We then tested the EoS in numerous stores and a pattern emerged: the higher the EoS, the better the stores were doing in terms of overall conversion rate, sales, and customer satisfaction. A low EoS alerted the management and guided them to underlying problems. The EoS could also be used as a new KPI for retail stores, laying the groundwork for more constructive adjustments to come.
We are confident in the promising potential of the EoS, but it is not the last of our inventions by any means. To us, data are not just numbers on a screen. Bent into the right shapes and viewed through the right lenses, they are revealing, vivid, dynamic, alive. At Palexy, we analyze, shake up, reconfigure, and play with data all day long. Bring your data to us today, and be prepared for the trove of wonders they surrender!