EP15: Why Data Normalization Costs Consumer Brands Millions in Sales

We dive deep into the complexities of data management within the consumer goods industry, focusing on how brands can connect data across retail, e-commerce, and supply chain partners.

Transcript

Abby Carruthers [00:00:02]:

From Alloy AI, this is Shelf Life. Today, we're doing a deep dive on the nuances of data in the consumer goods industry. Today, brands connect to dozens, if not hundreds, of partners. How can you look across all of your retail, e-commerce, distributor, and supply chain partners to get a complete picture of your business? I'm your host, Abby Carruthers, product manager at Alloy AI. We'll be back with two of my Alloy colleagues, Manfred Reiche, subject matter expert in CPG data, and Matthew Nyhus, engineering team lead, right after this. As a consumer brand, you connect with dozens of external partners and internal systems to get a complete picture of your business.

Abby Carruthers [00:00:58]:

Each one is different. Alloy AI makes it easy to connect data from retailers, e-commerce, supply chain partners, and even your own ERP, then easily surface insights to drive sales growth. Every day, brands use Alloy AI to see POS trends, measure promotion performance, and make better replenishment decisions with their retail partners. That's why we're trusted by Bicyde, Crayola, Valvoline, Melissa & Doug, Bosch, and many more. Get a demo at alloy.ai today. All right, Manfred, Matthew, welcome to Shelf Life.

Matthew Nyhus [00:01:30]:

Great to be here. Thanks for having me.

Manfred Reiche [00:01:33]:

Good to see you, Abby.

Abby Carruthers [00:01:34]:

Good to see you, too. Okay, to start us off, could you guys share for our listeners just a little bit about your professional backgrounds and how you came to work with consumer goods data?

Manfred Reiche [00:01:45]:

So I started my career in consulting, implementing SAP, these massive projects that involve configuring software to help companies operate more efficiently and mostly keep track of their finances. I was kind of lucky at the time. Many of my projects involved working with master data teams. I didn't know what it took to configure software to make it work, both for accounting purposes and for actually, like, running a business. So I did that for a few years. I actually traveled to, like, ten different states, rolling out SAP, and then I joined Alloy about, you know, six years ago, and I stumbled upon the same need, master data configuration and all these different things. So almost a decade of working with data across the board, and I don't know, I happen to be one of those two people that really likes this topic.

Abby Carruthers [00:02:41]:

Love that. A unique trait. And, Matthew, what about you?

Matthew Nyhus [00:02:46]:

Yeah, I came here straight out of university, so I went to UBC right here in Vancouver. Worked at some pretty big tech companies, but really liked Alloy. Being small, being a little scrappy, and working on really interesting problems, and being on a small team, getting a lot of ownership over those. One of the first big projects that I worked on when I was here was how to ingest really generic data and have standard validations for that. And then right after that, working on a new product model that allowed much more transparent product matching to happen, which I think we'll talk about a lot in depth today. And I really enjoyed the technical aspect of those problems, getting a lot of ownership and also seeing the impact of that with customers, working with customers, trying to handle all the edge cases and really feeling a lot of ownership over. Really interesting.

Abby Carruthers [00:03:28]:

Awesome. Well, we're definitely going to want to dive into those details. Okay, so zooming out, today we're talking about data normalization. Could one of you just give us a high-level view of what you understand data normalization to be and why it's something that consumer brands should care about at all?

Manfred Reiche [00:03:44]:

I've worked with normalization for almost a decade, and I've never stopped to think about what it actually means. So I actually asked ChatGPT this morning, as everyone does nowadays. Right? It gave me a super long answer. I'm not going to read the whole answer because I actually only liked one little piece of it. It says normalization is the process of trying to make data more informative, consistent, comparable, or standardized. It's very generic. The standardization piece is what I like, because ChatGPT then outlines these four or five different applications of normalization: signal processing, data science, and all these statistical things.

Manfred Reiche [00:04:23]:

But it missed what I think is important for retail companies, which I usually describe as a multi-language translation layer. So part of what normalization does when you think about standardizing retail data is that every retailer speaks their own unique language. And I'm 99% sure it's going to be different than the language a brand usually uses internally. So normalization really means standardizing all these different languages into a common language that can help companies analyze data in one place, rather than having 17 different Excel sheets in different languages that only apply to one retailer at a time. So, yeah, a little bit of what I think it means and what it does.

Matthew Nyhus [00:05:08]:

And I'll add that it's not just the CPG brands who are our customers or the retailers. There's also distributors, there are data providers and data harmonizers. There's category data that we get. There's so many different places that our customers are now getting data from. And just like Manfred said, they're all reporting it in different languages and not just in products or locations. We'll get to metrics and time. There's all these different dimensions that people are reporting in their own unique language. And to have a global view of what's happening within your organization.

Matthew Nyhus [00:05:39]:

You really need to do that normalization across a lot of different dimensions across many data providers.

Abby Carruthers [00:05:45]:

So you both mentioned language and translating a few times. Now, what's really the benefit of speaking one language? I know, Manfred, you said in one place, but these companies, what are the insights and the outcomes that they can unlock from having that data be in one consistent language?

Manfred Reiche [00:06:05]:

So if you think about running a business, when you sell consumer packaged goods, you are going to want to sell in as many retailers as possible. So from a business standpoint, you don't actually care about where you're selling. You're trying to sell as much as you can across the board. You could sell at Target, you could sell at Walmart, you can sell at Best Buy, Amazon, and you're really trying to open up channels every year and open up many sales streams. When you then consider data language, all these systems are integrated within their own walls to speak their own unique language. So I was blown away a few years ago when I found out that most brands make decisions using only their own data: what did they send into these retailers? You actually have no idea what the consumer ended up doing at the shelf at Target or on the website with Amazon. And the reason people or brands were not analyzing their business based on the consumer is because they didn't know how to translate between all these data sources.

Manfred Reiche [00:07:09]:

Right? Like there was no, let's call it, no Duolingo, right? No translation layer between Walmart language, Target language, Amazon language into your own. So I think that the analysis that they're now able to do by understanding the consumer, and not just looking at, you know, a very segmented world of one retailer at a time, but knowing what people are doing across the whole US, across these 20,000-plus points of sale, helps them really monitor what's actually going on at the store and react, versus having a siloed team where maybe your Target team talks to the Walmart team and maybe they can kind of align. But if you have people running on different Excel sheets that can't speak that same language, it's like doing business with Asia and you don't speak Mandarin: you're not going to be able to have a conversation.

Abby Carruthers [00:08:04]:

Awesome. And then, Matthew, can you tell us from a technical perspective, why is this translation something that's hard to do?

Matthew Nyhus [00:08:12]:

I think there's multiple levels of it. From the technical data warehouse perspective, we obviously have to ingest the data into a common language just to be able to store it. You don't want a different database schema for every single one of your different data providers. And so there's some form of transformation that has to happen just in order to ingest the data. But then on top of that, I think really what Manfred is talking about is how we actually use that data afterwards. Everything is being reported under different identifiers, maybe in different time granularities. People are saying that they have different identifiers for what's really the same product, different descriptions of locations that are really the same location.

Matthew Nyhus [00:08:51]:

And if I'm trying to view any of that, I need a common language, just like Manfred said. And we'll get into some of the details later about how we do that on a more technical basis. But really, this challenge is first transforming the data so that we can physically store it, but then also making all these relationships between common products, between common locations, having roll-up and disaggregation so you can see things at different time granularities, and converting so that all of our calculations are consistent for metrics. Those are all the things that we're thinking about in order to view data within this common language on the technical side.

Abby Carruthers [00:09:21]:

So you've mentioned a few different dimensions there. I've heard you both talk about products, locations, time. And there was a fourth. You're gonna have to remind me, what was it?

Matthew Nyhus [00:09:33]:

Metrics.

Abby Carruthers [00:09:35]:

Metrics. Okay. So I wanna work through each of those, starting with products. And I know, Manny, that even master data management on the product side is something that you and I have had many passionate discussions about over the years. Can you share a few examples about what makes product normalization or product master data management so difficult for consumer brands?

Manfred Reiche [00:09:55]:

So it's actually a combination of three things. Matthew is going to talk all about the technology because I think his team has done an awesome job on the tech. So I'm going to focus on the people and the process side of master data management, because let's say that, Abby, you work for Willy Wonka's chocolate factory, and you want to sell a brand new bar of chocolate, right? You worked super hard with Willy Wonka himself. You've designed this delicious flavor of something. You want to sell it, you want to produce it, you want to package it, you want to get it out the door. Well, you have to configure this product within your walls and create what they call a product master. Right? Like, it's going to get assigned a SKU number. For simplicity, let's call this bar of chocolate SKU 123, which is Abby's secret flavor. Now you want to start selling this at Target, at Amazon, at Walmart.

Manfred Reiche [00:10:49]:

Your sales team has to go to these customers and you have this new master data item, the SKU. You have to go convince Walmart and these retailers to place an order on your behalf. They're going to create their own SKU number internally to place this order for 1000 bars of chocolate from you. So there's all these steps involved in creating. For every item you want to sell, you need a product master, you need to understand the recipe, you need to understand how you're going to sell this. And typically that's just what people refer to as master data management. You in the Willy Wonka world have a team dedicated to making sure that every item is uniquely identified so you can run your business. As soon as you go outside your walls and you start talking to all these other retailers, it gets very difficult.

Manfred Reiche [00:11:41]:

You bring in more people. It's no longer just your team who manages your master data. You have to talk to the Walmart team, the Target team. You have to be very clear in understanding what they are going to call your item, keeping a cross-reference sheet between these two items. There are many technical components that are complicated, but even the fact that you have to loop in a bunch of people, you need very defined processes to create the SKU to make sure you can sell it under the right number. It's pretty involved in the amount of steps you have to go through to properly run a business and configure the software to even be available for you to do that.

Abby Carruthers [00:12:16]:

Can you share any examples or particular stories you've come across that have made product master data management challenging to solve?

Manfred Reiche [00:12:27]:

So, like everything with software, the hard part comes with all the exceptions you might encounter. Right, so I'll keep telling the story: we're launching Abby's new secret recipe of Willy Wonka chocolate, and you've now convinced Walmart, Target, and Amazon to place orders from you. Your supply chain team is going to sell cases of twelve bars at a time, right? So when they order, people are going to order in cases. Turns out, you know, Target just wants to order differently. They prefer ordering in eaches because they want more flexibility, and you want your product at the store. So although you have your entire system configured to know that when Walmart places an order, it represents a case of twelve units, it turns out that now Target wants to be treated differently.

Manfred Reiche [00:13:12]:

So you have to configure an exception somewhere in your system to remind you that when Target orders ten of your chocolate bars, it doesn't mean 120 bars in ten cases, it actually means ten bars. And if you don't keep track of this, you can imagine how you can just ship significantly more product. You can affect all your margins if you don't charge it accordingly. And now you're stuffing a channel because you haven't properly managed the master data associated with this one.
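
The case-versus-each exception described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Alloy's implementation: the tables `UNITS_PER_CASE` and `ORDER_UNIT` and the function `ordered_bars` are hypothetical names, and the default of ordering in cases is assumed from the story.

```python
# Hypothetical product master: SKU 123 ships in cases of twelve bars.
UNITS_PER_CASE = {"123": 12}

# Hypothetical exception table: the unit each retailer orders in.
# Retailers not listed here are assumed to order in cases.
ORDER_UNIT = {"Target": "each"}


def ordered_bars(retailer: str, sku: str, quantity: int) -> int:
    """Convert an order quantity into individual bars of chocolate."""
    unit = ORDER_UNIT.get(retailer, "case")
    if unit == "each":
        return quantity  # the quantity already counts single bars
    return quantity * UNITS_PER_CASE[sku]  # the quantity counts cases


print(ordered_bars("Walmart", "123", 10))  # 120
print(ordered_bars("Target", "123", 10))   # 10
```

Keeping the exception in a lookup table rather than in someone's head is the point: the same order quantity of ten resolves to 120 bars for one retailer and ten for another.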

Abby Carruthers [00:13:43]:

Actually, that's reminding me of a recent example where we were working with one brand where they called an each a pack and a case an each. And it's just another great example of not only the technical challenge, but the language challenge on top of that.

Manfred Reiche [00:13:58]:

We have a term for that here. Like, we always language police ourselves across the board. Right? Like, to your point, what is an each? Yeah, you know, I will assume that each represents one individual unit, one bar of chocolate. For some people in supply chain, an each means a case; some people refer to it as a pallet, right? So even the words in English we use can trick us into confusing ourselves on, like, what we actually mean. So we try to always language police ourselves and make sure that we are speaking in numbers, not in words, and confirming, okay, twelve bars or one case of twelve bars. It's tricky.

Abby Carruthers [00:14:36]:

Absolutely. Numbers never lie. So Matthew, what does it mean when we add unit conversion as another layer on top of this technical challenge of mapping products?

Matthew Nyhus [00:14:47]:

Yeah, so the way we think about product matching is that you're basically drawing links between products that in the real world are the same product. So Walmart might have their own identifier for a product that's really that same bar of chocolate. And you have your own identifier, a Wonka identifier, that's that same bar of chocolate. And with those links we are copying over attributes. Walmart has all of these colorful descriptions and store information and selling information about their chocolate under their own identifier. And then you might provide your own description, your category, your subcategory, some information about Abby's secret flavor on your own side. And whichever way you're viewing the data, we want you to see all of that rich information. If you're viewing it in terms of Walmart sales data or your own internal SKU, we want you to see all of those descriptions.

Matthew Nyhus [00:15:37]:

With unit conversion, that becomes more complicated because some of those attributes are going to be shared and some aren't. You can imagine that description is going to be relevant whether it's in eaches or a case of chocolate, but maybe a conversion factor, that there's twelve bars in a case, is only going to be relevant on a case. And in the inverse, one over twelve will only be relevant on the each. So we have to start distinguishing attributes and their purpose within our system. What is relevant across unit conversions? What's specific to a specific type of unit conversion? And we code this all in. So we have logic which is saying, what kind of attribute is this? What is the nature of this match? We do that as we're doing all of this attribute association, so that you're still viewing relevant, accurate data in your dashboards.
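
One way to picture the attribute logic described here is a small table that says which attributes are safe to copy across any match and which only make sense when both products are in the same unit. The sketch below is a guess at the shape of such logic, not the actual system; `ATTRIBUTE_SCOPE`, `copy_attributes`, and the product records are all hypothetical.

```python
# Hypothetical classification of attributes:
#   "shared"        -> relevant regardless of unit (description, category)
#   "unit_specific" -> only relevant when both products use the same unit
ATTRIBUTE_SCOPE = {
    "description": "shared",
    "category": "shared",
    "units_per_case": "unit_specific",
}


def copy_attributes(source: dict, target: dict) -> dict:
    """Return a copy of target enriched with the source attributes that apply to it."""
    same_unit = source["unit"] == target["unit"]
    attributes = dict(target["attributes"])
    for name, value in source["attributes"].items():
        # Unknown attributes are treated conservatively as unit-specific.
        scope = ATTRIBUTE_SCOPE.get(name, "unit_specific")
        if scope == "shared" or same_unit:
            attributes.setdefault(name, value)  # never overwrite the target's own value
    return {**target, "attributes": attributes}


wonka_case = {
    "id": "SKU-123",
    "unit": "case",
    "attributes": {"description": "Abby's secret flavor", "units_per_case": 12},
}
walmart_each = {"id": "WM-555", "unit": "each", "attributes": {"aisle": "candy"}}

enriched = copy_attributes(wonka_case, walmart_each)
print(enriched["attributes"])
```

In this example the description travels from the case-level record to the each-level record, while the twelve-per-case conversion factor stays behind because the units differ.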

Abby Carruthers [00:16:18]:

Sounds like unit of measure conversion is a fun one then. What else? Any other examples of fun edge cases that make this a tricky problem to solve?

Manfred Reiche [00:16:28]:

I have so many. So let's talk about product rollovers. Okay, so we'll keep telling the story. You just introduced your new recipe, SKU 123. It's actually done pretty well. But someone told you that if the packaging was red instead of blue, you'd actually sell more. People tend to buy red chocolate bars more than blue. So you go back to Willy Wonka, you convince him to change the packaging.

Manfred Reiche [00:16:53]:

Well, it's the same chocolate bar. So are you going to change your internal number for it, or are you going to keep manufacturing SKU 123? It becomes a decision, right? We go back to your process. What does your rulebook say about when you introduce a new SKU and when you don't? To keep it simple, we'll say your team says, no, no, no, it's the same recipe. You know, packaging is just, you know, an afterthought. We're going to keep it SKU 123. We go back to your retailers, we go back to Target, Walmart, Amazon.

Manfred Reiche [00:17:22]:

You think they're going to follow your best practice and your process? You know, we talked about how Target was the one with the tricky unit of measure. Let's say they actually accept and they keep the same number. Right. This time they're going to be easy on you. But Walmart doesn't like it. They want their internal system to differentiate a blue SKU from a red SKU, even if it's the same chocolate bar. So now you're dealing with the fact that Walmart's going to have two internal identifiers for your product. What are you supposed to do? Right, like, it's still the same bar of chocolate in your system.

Manfred Reiche [00:17:54]:

It represents one identifier of a master data item. Walmart orders are now reported on two different ones. How do you handle that?

Abby Carruthers [00:18:03]:

How do you handle that? Tell us, Matthew.

Matthew Nyhus [00:18:06]:

Yeah, so this is another example of what I think of as the long tail of product matches. When you're thinking about matching products, you can think, okay, Walmart reports a UPC, I report a UPC. That's all great. Maybe Walmart reports it as a different thing: Walmart UPC, and you have UPC. That's also pretty easy. You just remap the column names. All these other examples, unit conversion, these rotating SKUs, are all edge cases that we've encountered and that we've built systems to handle.

Matthew Nyhus [00:18:34]:

In Manfred's example, if you have a different granularity that you're reporting on, you have a different SKU for the red versus blue packaging, and Walmart doesn't, then you need a many-to-one match where many of your products map to one Walmart product. And when you're viewing the Walmart data, you have to be able to disaggregate the data into your SKUs or have a clean roll-up, so you're still seeing the sales data accurately within some higher category-level attribute from your own perspective. Similarly, we've seen retailers that actually rotate SKUs. Maybe it's seasonal, maybe there's other reasons. And so Target may be reusing SKUs maybe every quarter, maybe every year. And those all have to map to the same internal SKU of yours. You have many Target products that are matching one of your own vendor products, and so you have many-to-ones in both directions.
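
The many-to-one matching described here can be illustrated with a simple lookup from retailer identifiers to one internal SKU, followed by a roll-up. This is a minimal sketch with made-up identifiers; `MATCHES` and `roll_up` are hypothetical names and say nothing about how the real product model is stored.

```python
from collections import defaultdict

# Hypothetical match table: several retailer product IDs map to one internal SKU.
MATCHES = {
    ("Walmart", "WM-BLUE"): "123",    # Walmart splits by packaging color
    ("Walmart", "WM-RED"): "123",
    ("Target", "T-2023-Q4"): "123",   # Target rotates IDs over time
    ("Target", "T-2024-Q1"): "123",
}


def roll_up(sales_rows):
    """Sum unit sales by internal SKU; unmatched rows land under None for review."""
    totals = defaultdict(int)
    for retailer, retailer_id, units in sales_rows:
        totals[MATCHES.get((retailer, retailer_id))] += units
    return dict(totals)


rows = [
    ("Walmart", "WM-BLUE", 40),
    ("Walmart", "WM-RED", 60),
    ("Target", "T-2024-Q1", 25),
    ("Target", "T-UNKNOWN", 5),
]
print(roll_up(rows))  # {'123': 125, None: 5}
```

Because old and new retailer identifiers stay in the same table, history reported under a retired identifier still rolls up to the same internal SKU.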

Matthew Nyhus [00:19:21]:

And in all these cases, even as these product masters are changing, you have to have consistency. You have to be able to map historically as well as next quarter's sales, so that you can look back over two years of history and still see data that's fixed and accurate. Another example that we have is manual overrides. Even with all of this handling that we have, the reality is that data is messy. We see this all over the place, and we haven't designed a system that is supposed to handle 100% of it automatically, because that's just not realistic. The way that you really prevent this is that you build really good systems to have users come in and fix mistakes that happen. And even as you're receiving consistently incorrect values from retailers, your fixes still persist. So we have ways to make sure that when you're manually matching products that Walmart consistently insists are some other product, and they're wrong, your values are treated as true, so that all of your analysis is still accurate in your own system. And the tail continues.
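
The idea that a manual fix should survive every new (and still wrong) retailer feed can be shown by keeping overrides in their own table and applying them last. This is a sketch of the principle only; `resolve_matches` and the identifiers are hypothetical.

```python
def resolve_matches(feed_matches: dict, manual_overrides: dict) -> dict:
    """Combine automatic matches with user fixes; the user's fix always wins."""
    return {**feed_matches, **manual_overrides}


# A user has decided that retailer item WM-777 is really internal SKU 123.
overrides = {"WM-777": "123"}

# The retailer keeps reporting WM-777 against the wrong SKU in every feed.
monday = resolve_matches({"WM-777": "999", "WM-555": "123"}, overrides)
tuesday = resolve_matches({"WM-777": "999", "WM-888": "456"}, overrides)

print(monday["WM-777"], tuesday["WM-777"])  # 123 123
```

Storing the override separately, rather than editing the ingested data in place, is what lets the fix persist when the next feed arrives.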

Matthew Nyhus [00:20:16]:

Obviously, there's more and more edge cases as you go down this, and there's lots of things to think about.

Abby Carruthers [00:20:21]:

Edge cases are what make it fun, right?

Matthew Nyhus [00:20:23]:

Exactly.

Abby Carruthers [00:20:24]:

So I think it's interesting you talk there about the roll-up of the data at different levels, because that's starting to get into how we use it. Could you talk about why somebody might want to see the data at the packaging level, the red versus blue, versus somebody else who might want to just look at total sales for that chocolate bar?

Manfred Reiche [00:20:44]:

So it depends on who you are. The red and blue is a good example. I think Matthew reminded me that where this is actually very common is with seasonal SKUs. If you imagine going into the chocolate world, you're going to sell very well around Valentine's, you're going to sell very well in, you know, October for Halloween, and then usually around Christmas. And every year you might have slightly different packaging. Right. So let's say that last year was blue. This year is red.

Manfred Reiche [00:21:14]:

If you are a sales analyst who wants to understand, right, I need to be able to predict how much I'm going to sell next season versus what I did last season, you actually don't care about the red and blue. It actually gets in your way of doing analysis, because Matthew mentioned the inability to track historical data. If you know you're going to sell red this year and you're looking for sales on the red SKU, there's nothing there, because you sold it on the blue SKU last year. So that's where the model that Matthew designed, which can support multiple Walmart IDs and roll them up to one common SKU, comes in. For the sales guy, it's exactly what he needed. Right? What did we sell last year of this chocolate recipe, no matter the color, to do this analysis and forecast for next year?

Manfred Reiche [00:21:59]:

If you're a supply chain person, you're on the hook for producing the right product. You actually do care whether the orders are coming in on the red or the blue SKUs. So what's neat about our product model, and I don't think Matthew has hyped it up enough, is we are designed to support all these cases. At the end of the day, we operate a database. Databases are pretty rigid. But Matthew's team has been super creative at designing these databases to be custom built for consumer packaged goods to easily handle these scenarios. So the sales guy can look at it with the aggregated number, and the supply chain guy can look at it, depending on what he wants, with a different identifier, without actually having to ever write a SQL query. It just works out of the box because of how we designed it and how we read the data in the first place.

Matthew Nyhus [00:22:49]:

And one other thing to throw in: maybe the marketer is trying to figure out, did my red or my blue packaging do better? And they don't actually care about whether it was the single individually packaged thing or it was a whole bouquet display of your chocolates. They really care, out of all of my red sales versus all of my blue sales, which one did better. And we can also do that. It's completely flexible to granularity. So you can be rolling up across different product SKUs and just care about the one feature, what's the color of my packaging, and look at that year-over-year comparison.
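
The three views just described (the sales analyst's recipe total, the supply chain SKU view, and the marketer's color comparison) amount to grouping the same sales rows by different product attributes. The sketch below uses invented SKUs and numbers purely for illustration; `PRODUCTS`, `SALES`, and `sales_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical product master with the attributes each team cares about.
PRODUCTS = {
    "123-B": {"recipe": "secret flavor", "color": "blue", "pack": "single"},
    "123-R": {"recipe": "secret flavor", "color": "red", "pack": "single"},
    "900-R": {"recipe": "secret flavor", "color": "red", "pack": "display"},
}

# (sku, year, units sold); the numbers are made up.
SALES = [("123-B", 2023, 100), ("123-R", 2024, 130), ("900-R", 2024, 20)]


def sales_by(attribute: str) -> dict:
    """Total units per (attribute value, year), ignoring every other attribute."""
    totals = defaultdict(int)
    for sku, year, units in SALES:
        totals[(PRODUCTS[sku][attribute], year)] += units
    return dict(totals)


print(sales_by("color"))   # {('blue', 2023): 100, ('red', 2024): 150}
print(sales_by("recipe"))  # {('secret flavor', 2023): 100, ('secret flavor', 2024): 150}
```

Grouping by `color` answers the marketer's question across singles and displays, while grouping by `recipe` gives the analyst a year-over-year number that ignores packaging entirely.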

Abby Carruthers [00:23:19]:

I love that you mentioned displays there, Matthew, and the theme of long tail edge cases. I know that's another fun one. Displays are shipped in as a display and sold as individual units. We don't necessarily have to go into that right now. Any other examples you want to share on products before we move on to locations?

Manfred Reiche [00:23:37]:

Not necessarily an example, but I'll keep trying to praise Matthew's team and how we've designed this, because a lot of what we've talked about has focused on just the architecture design to support multiple-to-many mappings. Right. The red and the blue versus one roll-up. We support that out of the box. We can also support the fact that Target and Walmart are going to report a different number and maybe they don't do the red and the blue. We support that. Right. So our database does it automatically.

Manfred Reiche [00:24:06]:

But I think he very quickly mentioned something I wanted to double down on, which is the fact that we actually provide automatic product normalization for many of our big retailer data feeds. Right. So in my mind it's two steps. Is the system designed to support all the different levels of granularity? Right. Can I speak all the languages? Check. And then does a human have to go find the thesaurus? Right. The Duolingo. We tried, because we've done this for so many years.

Manfred Reiche [00:24:35]:

We know that Walmart, Target, and Amazon provide their translation buried deep in their portals. So we actually designed an automatic process so that out of the box, you know, Abby, you in your Willy Wonka world can get an automatically normalized report. And he mentioned exceptions. So our translation layer is only as good as what we find in there, and people are going to point to the wrong thing. It was like a year-long project of how we learned to be able to override these exceptions. And it's so powerful. When I go back to the people and process, most companies do not have the right people to be building translation layers across all their retail languages. It's insane.

Manfred Reiche [00:25:24]:

If you imagine the process requires you to go do this for every product, every retailer. Right. Like, the amount of work grows exponentially. So the fact that we can automatically do this, and then your only interaction is when we know there's an issue, you override an exception. We've cut your workload, I don't know, by like 95%. Right. Like, we can already do most of this, and you can have the analysis for your marketing team, your business team, your supply chain team, all in one place. It's kind of fun to think about all that we built over the last six years, because I used to be the human doing a lot of this by hand until these guys built the tech to do this.

Manfred Reiche [00:26:03]:

And it's awesome that we can offer it out of the box.

Abby Carruthers [00:26:06]:

Absolutely. And I think those escape hatches that you mentioned, Matthew, were also really key there. I know we've all struggled with when data isn't correct, data has problems, data has messiness, and making sure you have the correct tools in place to be able to work around those as well. We've spent a lot of time talking about products. I want to lead us on to the second dimension of four. Got a lot to get through here. We talked about location normalization as well. So help me understand, what does that mean relative to product normalization? How is it similar? How is it different? What are the challenges involved there?

Matthew Nyhus [00:26:42]:

From a technical side, it's quite similar. You still have the same concept, which is different places, different data providers are reporting locations under different identifiers. They are reporting them in different ways. Maybe one is reporting Walmart as the entire retailer, and they say, you have a bunch of sales or shipments into Walmart, and they can't be more specific than that. And somebody else is saying, actually, this specific Walmart location with this id is selling a lot of product, and you really care about that. And then someone else is saying, actually, this Walmart location, we don't know what id it is at this address, is selling a lot, and maybe the latter two are talking about the same place. And for the first case, you have to be disaggregating your data. You have to be breaking down those sales into where the sales are actually happening within different Walmart locations.

Matthew Nyhus [00:27:30]:

And so it's the same core problem, which is they're speaking different languages about the same real-world thing. In this case, a location rather than a product, and you want to be viewing the data consistently. I don't want to have to spin up a dashboard and be in my head summing different rows because the location identifiers are slightly different from different data providers.

Abby Carruthers [00:27:47]:

And so, Manny, what are the business problems that users are looking to analyze when they're trying to use data about a store or a warehouse or some other real world location? What are they trying to look into when they're using data that's coming from different sources that might be reported in these different languages?

Manfred Reiche [00:28:04]:

So if we go back to how I was defining normalization, which is standardizing data sets so they can be combined together, you can imagine how trying to analyze your e-commerce business is very different than trying to analyze your retail business because, you know, Target has 3000 stores. With Amazon, you can actually ship to unlimited postal areas. So how do you know where you're doing well? Right. But the fact that we can understand the geographical location of all those stores, as well as all the zip codes where you're sending your Amazon products, allows a business team to analyze how they're doing in Texas versus California. So we established that normalized language so that you can start combining these data sets and abstracting from individual stores to regions and different areas that you might want to analyze. You can even tag, you know, some people have their sales teams divided up by regions, right? So if you wanted to analyze, you know, you have a west coast team, an east coast team, and maybe some in the south. So three field reps or, like, you know, representatives. You can tag all your master data by state with the field rep. And now, in the same dashboard, you can just analyze how they are all performing versus each other by region.

Manfred Reiche [00:29:18]:

So it's all about making it seamless to report the data in a standard language in one place.
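
A minimal sketch of the rollup described above, not Alloy's implementation. The lookup tables, store IDs, zip codes, and field rep labels are all hypothetical; the point is only that once every store and every ship-to zip code resolves to a state, retail and e-commerce rows can be summed together and then regrouped by field rep.

```python
from collections import defaultdict

# Hypothetical master data: every location, whatever its source, maps to a state.
STORE_STATE = {("target", "T-1001"): "TX", ("target", "T-2002"): "CA"}
ZIP_STATE = {"73301": "TX", "90001": "CA"}
# Hypothetical tag: which field rep covers which state.
STATE_REP = {"TX": "South rep", "CA": "West Coast rep"}


def state_for(record):
    """Resolve a raw sales record to a state, by store ID or by ship-to zip code."""
    if record["source"] == "amazon":
        return ZIP_STATE[record["zip"]]
    return STORE_STATE[(record["source"], record["store_id"])]


def units_by(records, key_fn):
    """Sum units after grouping records with key_fn."""
    totals = defaultdict(int)
    for record in records:
        totals[key_fn(record)] += record["units"]
    return dict(totals)


sales = [
    {"source": "target", "store_id": "T-1001", "units": 12},
    {"source": "target", "store_id": "T-2002", "units": 7},
    {"source": "amazon", "zip": "73301", "units": 5},
    {"source": "amazon", "zip": "90001", "units": 9},
]

by_state = units_by(sales, state_for)
by_rep = units_by(sales, lambda r: STATE_REP[state_for(r)])
print(by_state)  # {'TX': 17, 'CA': 16}
print(by_rep)    # {'South rep': 17, 'West Coast rep': 16}
```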

Abby Carruthers [00:29:25]:

And so you talked there about retail stores and e-commerce, you know, consumer postal codes. What about higher up the supply chain? What are the challenges you face when it comes to normalizing data around distribution centers, warehouses, production facilities?

Manfred Reiche [00:29:45]:

So it's pretty straightforward to understand what's happening at the store because retailers usually report down to that level. They tell you unit sales, they tell you inventory. That allows you to analyze performance, in-stock rates, all that. When you start going up the supply chain and you are trying to take action to correct issues, you need to know who services that subset of stores, because you don't necessarily place orders at all the thousands of Walmart locations. There's actually between 30 and 50 Walmart DCs that you need to service. As the first step, we start building what Alloy calls a supply chain graph, right? So we have the ability to connect all the stores to where they're being serviced from within the retailer. And then if you're going all the way upstream to your warehouses, some people have one warehouse, they have a very simple distribution network, others have four, six, ten. Right? So if you want to start combining, you know, before you take action on Walmart, do you have enough inventory to respond to something? It's important to translate all these insights into the language that your team speaks, because usually your internal team is going to be aligned by your internal warehouses, not by, you know, Walmart DCs, not by Walmart stores. So it's important to be able to roll these up to the, you know, the right language to take action.
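
One way to picture the store-to-DC-to-warehouse rollup is as a chain of lookups. This is a sketch under assumptions, not the actual supply chain graph: the node IDs, the two mapping tables, and the "shortfall" numbers are hypothetical, and a real network would need to handle stores served by more than one DC.

```python
from collections import defaultdict

# Hypothetical edges of the graph: store -> retailer DC -> brand warehouse.
STORE_TO_DC = {"WM-0001": "WM-DC-6001", "WM-0002": "WM-DC-6001", "WM-0003": "WM-DC-6002"}
DC_TO_WAREHOUSE = {"WM-DC-6001": "WH-EAST", "WM-DC-6002": "WH-WEST"}


def roll_up(store_values, *hops):
    """Sum per-store values upstream through one or more mapping hops."""
    totals = defaultdict(int)
    for node, value in store_values.items():
        for hop in hops:
            node = hop[node]  # move one level up the supply chain
        totals[node] += value
    return dict(totals)


# Hypothetical units each store needs to get back in stock.
store_shortfall = {"WM-0001": 40, "WM-0002": 10, "WM-0003": 25}

by_dc = roll_up(store_shortfall, STORE_TO_DC)
by_warehouse = roll_up(store_shortfall, STORE_TO_DC, DC_TO_WAREHOUSE)

# Compare the rolled-up need with what each internal warehouse has on hand.
warehouse_on_hand = {"WH-EAST": 30, "WH-WEST": 100}
can_cover = {wh: warehouse_on_hand[wh] >= need for wh, need in by_warehouse.items()}
print(by_dc)         # {'WM-DC-6001': 50, 'WM-DC-6002': 25}
print(by_warehouse)  # {'WH-EAST': 50, 'WH-WEST': 25}
print(can_cover)     # {'WH-EAST': False, 'WH-WEST': True}
```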

Abby Carruthers [00:31:09]:

So Manfred, you mentioned Amazon and Amazon postal codes. I know one of the data quirks that we've had challenges with in the past is the fact that, you know, these days with e commerce, what is a store? What is an ecommerce sale, right? You've got tons of different types of ways that an e commerce sale can be fulfilled, can be by online pickup in store. It can be shipped from store, shipped from a DC. So when you're thinking about location normalization, how does that world of fulfillment methods come into that solution?

Manfred Reiche [00:31:38]:

Yeah, you're bringing up a tricky subject because when I started at Alloy, it was very easy to determine a brick and mortar sale versus an e-commerce sale. Usually e-commerce sales were associated with specific e-commerce warehouses in the retailer world, right? So if you placed an order on Target.com, they would fulfill it from a specific e-commerce warehouse. Super easy. You know, through the COVID push and through all this modernization of e-commerce, like you say, you can now place an order online and pick it up at a store. So even though the good is leaving a physical store, it should actually be recognized as an e-commerce sale, not a store sale. So, you know, talking about how we modernize our database to keep up with the industry, this is actively something we are exploring: how to, you know, tag things. It's no longer enough to just know the store number where the sale happened.

Manfred Reiche [00:32:33]:

You need to know the fulfillment type of that, you know, sale so that you can properly associate your e commerce channels to your store channels.
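
As a rough illustration of tagging by fulfillment type, the sketch below classifies each sale by how it was fulfilled instead of by where the goods left from. The fulfillment labels and store IDs are hypothetical, and the source describes this as something still being explored, so this is not a description of a finished design.

```python
from collections import defaultdict

# Hypothetical fulfillment types that count as e-commerce, wherever the goods left from.
ECOMMERCE_FULFILLMENT = {"ship_from_dc", "ship_from_store", "pickup_in_store"}


def channel(sale):
    """Classify a sale by how it was fulfilled, not by its location ID."""
    return "ecommerce" if sale["fulfillment"] in ECOMMERCE_FULFILLMENT else "store"


sales = [
    {"location_id": "T-1001", "fulfillment": "in_store", "units": 3},
    {"location_id": "T-1001", "fulfillment": "pickup_in_store", "units": 2},
    {"location_id": "EC-01", "fulfillment": "ship_from_dc", "units": 4},
]

units = defaultdict(int)
for sale in sales:
    units[(sale["location_id"], channel(sale))] += sale["units"]

# The same physical store now contributes to both channels.
print(dict(units))
# {('T-1001', 'store'): 3, ('T-1001', 'ecommerce'): 2, ('EC-01', 'ecommerce'): 4}
```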

Abby Carruthers [00:32:42]:

Absolutely. And then it gets even more complicated when you're looking at inventory. Right. Because where the inventory is deducted from depends on the type of sale as well.

Manfred Reiche [00:32:51]:

Yeah. So this goes back to our concept, right, of language policing. What used to be a super easy designation of e-commerce sales on the location dimension is now hard for us to know, because a given location can actually contribute to two types of sales now: a brick and mortar sale and an e-commerce sale. Now that I'm talking about language policing, we've talked about products, we've talked about locations. There's another level of harmonization that I think people often don't think about, which is metric names. So let me ask you this. What is net sales?

Abby Carruthers [00:33:23]:

Putting me on the spot here, Manny. Net sales, I would say, well, firstly, assuming we're talking about point of sale sales or the consumer sale, I'd say it's the total volume consumers bought in a certain day, not including returns. So net of returns, bringing that product back.

Manfred Reiche [00:33:44]:

I love it. You went with the alloy definition, which is what I call the supply chain definition. You are tracking the flow of goods. What left a store versus what was returned back into the store gives you the net outflow of goods. You talk to a business team. When you hear the word net, a lot of people usually think about, you know, in finance terms, it's what it brought in, in money versus what it cost. Right? So people will think about a different definition of net. And you can imagine, right, we've talked about all these portals speaking different languages.

Manfred Reiche [00:34:17]:

When you see the words net sales at each one of these portals, we have to make sure we know what we're looking at. And, like, our team is actually trained, as we design our feeds, to confirm whether a sales number includes returns. Does it include tax? Does it not include tax? Right. There's all these questions, because we're trying to get to the supply chain definition of sales. We're trying to track the flow of goods first, then the correct flow of money. But then we would call it margin when you want to track the money you made versus what it cost you. Right. There's a third dimension of normalization around metric names that we often don't think about. But you have to be really careful so that you're comparing apples to apples across all these data sources, not just what you think is the same number, because then you'd be completely underreporting somewhere.
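
The checks mentioned here (does a sales number include returns, does it include tax) can be written down per feed and applied mechanically. The sketch below assumes a hypothetical per-feed spec with two flags and made-up field names; it only illustrates the idea of converting every portal's "sales" into one shared definition.

```python
# Hypothetical per-feed answers to "what does this portal's sales number mean?"
FEEDS = {
    "retailer_a": {"units_net_of_returns": True, "dollars_include_tax": False},
    "retailer_b": {"units_net_of_returns": False, "dollars_include_tax": True},
}


def normalize(feed, row):
    """Convert a raw row to one shared definition: units net of returns, money excluding tax."""
    spec = FEEDS[feed]

    units = row["units"]
    if not spec["units_net_of_returns"]:
        units -= row["returned_units"]  # subtract returns the portal reports separately

    cents = row["sales_cents"]
    if spec["dollars_include_tax"]:
        cents -= row["tax_cents"]  # strip tax so both feeds are comparable

    return {"net_units": units, "net_sales_cents_ex_tax": cents}


a = normalize("retailer_a", {"units": 90, "sales_cents": 45000})
b = normalize("retailer_b", {"units": 100, "returned_units": 10,
                             "sales_cents": 54000, "tax_cents": 4000})
print(a)  # {'net_units': 90, 'net_sales_cents_ex_tax': 45000}
print(b)  # {'net_units': 90, 'net_sales_cents_ex_tax': 50000}
```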

Abby Carruthers [00:35:06]:

And so what are some of the things we can do if there's one retailer that reports a certain metric and another retailer doesn't report that metric, but that's a key KPI that we're wanting to analyze. What can we do to make sure that we have that data from both of those retailers?

Matthew Nyhus [00:35:20]:

It's a great question. One of the main approaches that we do to solve that is to derive metrics. We have a full list of hundreds of metrics that we can ingest, and they can both be ingested raw, which means that the retailer directly reports it. And this is what Manfred's talking about. Maybe the retailer, their definition lines up with our internal definition. That metric, we'll import it raw as that metric. The other option is to derive it from underlying raw or derived metrics. And you can imagine it like a graph where you're building up a series of metrics.

Matthew Nyhus [00:35:53]:

You start with whatever raw metrics the retailer provides. And then, based on satisfying the requirements, the dependencies of any potentially derived metrics, you derive those. A really simple example would be in-stock versus out-of-stock percentage. Maybe one retailer provides in-stock and another provides out-of-stock. Well, they're just the inverse of each other: one minus the other value. And so whichever one the retailer reports, we derive its flipped version. Maybe some retailers don't provide in-stock or out-of-stock. Instead, they provide their stock values, how many in-stock units they have, and a minimum that's required.

Matthew Nyhus [00:36:26]:

They need at least ten units to be treated as in stock. Well, we can calculate: is that store in stock or not? That's just a Boolean value. And then we can take a percentage across all of their locations that are supposed to supply that product, and we can calculate in-stock percentage from that. And we can calculate out-of-stock percentage. So you can see these get more and more complicated, from just one minus a value to having some per-location value that needs to roll up across tracked items and locations across lots of different destinations. And from that, you build this graph of derived metrics. That is a really rich, consistent, normalized view of both metrics that retailers provide, as well as all these other ones that we've added in.
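
The dependency-driven derivation described above can be sketched as a small resolver: start from whatever raw metrics arrived, then keep applying any derivation whose inputs are available. The metric names, the table layout, and the `derive_all` helper are assumptions made for illustration, not the production metric graph.

```python
def in_stock_flags(metrics):
    """One Boolean per location: on-hand units meet the required minimum."""
    return [on_hand >= metrics["min_units"] for on_hand in metrics["on_hand_by_location"]]


# Hypothetical derivation table: metric name -> list of (dependencies, function) options.
DERIVATIONS = {
    "in_stock_pct": [
        (("out_of_stock_pct",), lambda m: 1 - m["out_of_stock_pct"]),
        (("on_hand_by_location", "min_units"),
         lambda m: sum(in_stock_flags(m)) / len(m["on_hand_by_location"])),
    ],
    "out_of_stock_pct": [
        (("in_stock_pct",), lambda m: 1 - m["in_stock_pct"]),
    ],
}


def derive_all(raw):
    """Repeatedly apply any derivation whose dependencies are already satisfied."""
    metrics = dict(raw)
    progress = True
    while progress:
        progress = False
        for name, options in DERIVATIONS.items():
            if name in metrics:
                continue  # raw (retailer-reported) values are never overwritten
            for deps, fn in options:
                if all(dep in metrics for dep in deps):
                    metrics[name] = fn(metrics)
                    progress = True
                    break
    return metrics


# Retailer A reports in-stock directly; retailer B only reports on-hand units and a minimum.
retailer_a = derive_all({"in_stock_pct": 0.75})
retailer_b = derive_all({"on_hand_by_location": [12, 3, 10, 0], "min_units": 10})
print(retailer_a["out_of_stock_pct"])                          # 0.25
print(retailer_b["in_stock_pct"], retailer_b["out_of_stock_pct"])  # 0.5 0.5
```

Keeping the dependencies explicit means a new derived metric is one more table entry, and the loop stops on its own once nothing further can be derived.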

Abby Carruthers [00:37:09]:

So you stop needing to rely on the retailer necessarily reporting the data to get that particular insight.

Matthew Nyhus [00:37:15]:

Yeah, I mean, at a certain point, we have to rely on the retailer to give us some values because that's what we derive values from. But we don't need them to duplicate themselves. They only need to provide one of the values. And we'll do the math on the back end to fill in that picture.

Abby Carruthers [00:37:29]:

That's great. So, any customer stories you can share, Manny, where people have benefited from that capability?

Manfred Reiche [00:37:35]:

So like you're saying, retailers have different levels of maturity in how they report data, right? So you mentioned one scenario: they might not even report in-stock percent. It's a very important metric to track, right? Like, rather than just your inventory, knowing your historical trends of in-stock is really important. So now in Alloy, you get a computation that works out of the box. All we need is inventory numbers over a given time range, and we can actually compute it. We know whether or not a store should have inventory. We give you an in-stock percent that you can track, so you can now have a conversation with a retailer that didn't have the capability in their portal to provide this insight. So now when you go to them and you say, hey, I'm out of stock of this item, they might be like, well, we just ran out, and you're like, actually, we know you've been out for four weeks, six weeks, right.

Manfred Reiche [00:38:27]:

And it behooves them to order more. So it's all about bringing the right insight to order more.

Abby Carruthers [00:38:33]:

I've definitely seen that. Coming to your retail partners with that insight, with that data, before they bring it to you, is the way to make for happy retailers.

Manfred Reiche [00:38:42]:

They're overwhelmed. They have to manage many, many, many brands in many locations. Right. And if there's anything that Alloy can do to provide the right KPI, like you mentioned, in-stock percent is a good one. It's important to be able to give you that insight so they take action.

Abby Carruthers [00:38:55]:

Absolutely. All right, so I'll move us onto our fourth and final dimension that we were talking about normalizing, which is time. So what does that even mean, to normalize across time?

Manfred Reiche [00:39:04]:

Don't we all agree that time is like 60 seconds to a minute? No, 60 minutes to an hour. It's easy, huh? Well, let me introduce the concept of calendars and how they're different than your regular calendars. And every industry will have a different calendar. Consumer electronics might decide to start the year on a certain month, and sweets might decide to start the year on a different month. Every retailer usually lines up with their own fiscal calendar. So what we found is that if you at Wonka want to run a report that says, what did I sell last year? The first question should be, what does last year mean? What is your fiscal calendar? We will define it for you according to whether you're April, 4-4-5 or 5-5-4, whatever you want. We will define your calendar so we can speak your internal language, and then all the raw data sources, no matter the fiscal calendar that they report in, we bring in so that you can look at it in your own language.
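
For readers unfamiliar with a 4-4-5 calendar: each quarter is split into periods of four, four, and five weeks, so a fiscal "month" is 28 or 35 days and never matches a calendar month exactly. The sketch below builds such a calendar from a chosen start date. The function names and the example start date are hypothetical, and 53-week fiscal years are deliberately not handled.

```python
from datetime import date, timedelta


def fiscal_periods(year_start, pattern=(4, 4, 5)):
    """Return [(period_number, first_day, last_day)] for a 52-week fiscal year."""
    periods, start, number = [], year_start, 1
    for _quarter in range(4):
        for weeks in pattern:
            end = start + timedelta(weeks=weeks) - timedelta(days=1)
            periods.append((number, start, end))
            start, number = end + timedelta(days=1), number + 1
    return periods


def period_of(day, periods):
    """Find which fiscal period a calendar day falls into."""
    for number, first, last in periods:
        if first <= day <= last:
            return number
    raise ValueError(f"{day} is outside this fiscal year")


# Hypothetical fiscal year starting on Monday, January 29, 2024.
periods = fiscal_periods(date(2024, 1, 29))
lengths = [(last - first).days + 1 for _, first, last in periods]
print(lengths[:3])                            # [28, 28, 35]
print(period_of(date(2024, 3, 25), periods))  # 3
```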

Abby Carruthers [00:40:07]:

Tell me how that works. Technically, what does it mean to actually shift that data into a different fiscal calendar?

Matthew Nyhus [00:40:13]:

Yeah, on the backend, we just store metric values on particular days. So when you look at the data raw in our data warehouse, it looks pretty simple, but there's a lot going on both to get the data into that form and then also to view it in a helpful way in the UI and dashboards and analysis tools. So when we're ingesting data, there's two things you have to think about: the granularity that the data is reported in and also the cadence that it comes in. So you could have daily data that has a data point for every single day, but it only comes in once a month or once a week. And maybe they are restating data historically, and so maybe they say, hey, actually we told you that you sold ten chocolate bars last Wednesday. Actually, some were returned, and we don't track returns. So we're just going to tell you we sold eight last Wednesday. So that daily data has to be updated, it has to be the freshest data that we have, and we'll store it there.

Matthew Nyhus [00:41:03]:

They might also report weekly data. Hey, you sold 100 chocolate bars last week. We don't know whether it was a really good Monday or a really good Saturday, but it was somewhere in that week, and we store it. So we have different ways to disaggregate that once you're reporting in the UI, but we natively support daily and weekly data, and we just store that in our database. And then when you're querying it, depending on the granularity that you're requesting the data in, we'll roll that up. So if you're looking for monthly data, well, based on the fiscal calendar, we will group the days that line up with those months and just aggregate the data, whether it's a sum of sales or some other more complicated metric, across all those data points that line up with those days. If it's weekly data, similar thing. If it's daily data, maybe we have to disaggregate.

Matthew Nyhus [00:41:45]:

We'll do a guess. Hey, probably one seventh of the sales are happening on each day. And with those fiscal calendars, the really flexible thing about our backend is that we can support any arbitrary date selection. We don't limit you to, hey, you can look at the last day or the last week or the last four weeks. You can say, hey, I want March 19 in 2020 to April 17, I'm just making up dates, in 2022. You can give us any days that you want and we'll happily aggregate the data up and report it to you. And that's what's really powering this: these fiscal calendars are automatically converting "what does last month mean" into specific days based on the partners that you have, based on the fiscal calendar, and then reporting that data and aggregating it up for you.
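
The pieces described in this answer (values stored per day, restatements overwriting older values, weekly totals spread evenly, arbitrary date ranges) fit in a few lines. This is a simplified sketch with hypothetical function names and made-up numbers; it stores everything in one dictionary keyed by date, which a real warehouse would not do.

```python
from datetime import date, timedelta


def disaggregate_week(week_start, weekly_units):
    """Spread a weekly total evenly across its seven days (the 'one seventh' guess)."""
    return {week_start + timedelta(days=i): weekly_units / 7 for i in range(7)}


def total_between(daily, start, end):
    """Sum daily values over an arbitrary inclusive date range."""
    return sum(value for day, value in daily.items() if start <= day <= end)


daily = {}

# A retailer that only reports weekly: 70 units for the week starting Monday, March 14, 2022.
daily.update(disaggregate_week(date(2022, 3, 14), 70))

# A retailer that reports daily, then restates the same day; the freshest value wins.
daily[date(2022, 3, 21)] = 10
daily[date(2022, 3, 21)] = 8

# Any date range works, including one that cuts across the weekly report.
print(total_between(daily, date(2022, 3, 19), date(2022, 3, 21)))  # 28.0
```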

Abby Carruthers [00:42:28]:

So one of the questions I hear a lot is, how did our performance grow year on year? Curious to hear from both of you. When we say year on year, how would you think about defining that? Does that mean the same Walmart week as last year? Week 42 compared to week 42, does it mean the same calendar period? Does it depend on when consumers were shopping last year?

Was it the week before Thanksgiving or the week after Thanksgiving? Curious to hear how you think about that. Is there one particular way or do we need to support all of those different methods?

Manfred Reiche [00:43:00]:

So I will default to the fiscal definition. Let's say that last week was week 30 in your fiscal calendar. We should be comparing to week 30 in your fiscal calendar last year. That's usually how most of your accounting software is going to be tracking year over year. We're going to try to line up. If your fiscal calendar defines a week as starting on Monday, ending on Sunday, we will try to shift all the raw data to Monday to Sunday, find the 30th week of the year, and give you values from that same period a year ago. So it doesn't matter if it was a leap year or something crazy like that. It's not necessarily 365 days ago or 363. We will line up to the fiscal calendar.

Manfred Reiche [00:43:37]:

We have the ability to toggle to a different one. If you're having a conversation with a retailer who has a different definition and they want a slightly different definition of a year, you can actually configure that. So going back to all the languages we're able to speak, we're going to try to default to your language first. But you will always have the ability to speak a retailer's language to have that conversation.
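
To make the "week 30 versus week 30" comparison concrete, the sketch below finds the fiscal week for a given day and returns the matching date range in the prior fiscal year. The fiscal year start dates are hypothetical (both Mondays), and the helper names are invented for this example.

```python
from datetime import date, timedelta


def fiscal_week(day, year_start):
    """1-based fiscal week of `day`, given the first day of its fiscal year."""
    return (day - year_start).days // 7 + 1


def week_range(year_start, week):
    """First and last day of a given fiscal week."""
    first = year_start + timedelta(weeks=week - 1)
    return first, first + timedelta(days=6)


def yoy_ranges(day, this_year_start, last_year_start):
    """Return the fiscal week number plus this year's and last year's date ranges."""
    week = fiscal_week(day, this_year_start)
    return week, week_range(this_year_start, week), week_range(last_year_start, week)


# Hypothetical fiscal years, both starting on a Monday.
week, this_year, last_year = yoy_ranges(
    date(2024, 8, 21),
    this_year_start=date(2024, 1, 29),
    last_year_start=date(2023, 1, 30),
)
print(week)       # 30
print(this_year)  # (datetime.date(2024, 8, 19), datetime.date(2024, 8, 25))
print(last_year)  # (datetime.date(2023, 8, 21), datetime.date(2023, 8, 27))
```

In this example the comparison week starts 364 days earlier, not 365, which is the kind of offset the fiscal definition absorbs. Swapping in a retailer's fiscal year start dates gives the retailer's version of the same comparison.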

Abby Carruthers [00:43:58]:

So tell us why all of that matters from a business perspective.

Manfred Reiche [00:44:02]:

So if you are a brand who is starting to utilize retail data for the first time, you are probably operating your analysis with humans. You have analysts who are downloading data into Excel, okay? Small brands, or actually most people, tend to make decisions at the monthly level, right? Like, you don't have enough time in a week to operate at a weekly or daily level of insights. So you're going to be downloading reports from each portal, usually rolled up to the month. What can get tricky is if your Walmart portal, when you define a month, defines it as the four fiscal weeks that preceded your period, right? Four fiscal weeks with 28 days, versus a different portal. Right? I'll name Dollar General, a random one. Let's say that when they define last month, they actually mean last calendar month, which happened to have 31 days. If you are marrying the two different Excel reports at the monthly level, you're now comparing 31 days in one report with 28 days in another. Your analysis is going to be off. And people stick to the monthly level.

Manfred Reiche [00:45:00]:

Because Excel has a maximum row count, you max out at a little bit more than a million rows. The beauty of what Matthew mentioned earlier about the importance of daily data is that once you're using our database designed for this, we will actively search for all your data at the daily level. We want the lowest level possible because it's quite easy to roll up from day to week to month, and I can even define what week and month mean differently. But people shy away from that just because they don't have the tools. But when you come into Alloy and you bring, you know, data into our database, you're now unlocking the same monthly analysis powered by daily insights, like daily data points. And you don't have the problem of comparing 31 days versus 28. We will make sure that you're comparing whatever definition of the month you actually wanted, seamlessly. So it's pretty important to be able to make the right decisions.

Manfred Reiche [00:45:54]:

And again, time is one of those dimensions people don't think about because we all assume time is the same. And I've seen people make big mistakes by misconstruing what one means.

Abby Carruthers [00:46:04]:

Absolutely. Those three days can make a big difference. Right? Seems like there's a lot more than meets the eye when it comes to normalization. Wrapping up. If you're speaking to a newer consumer brand company, just getting started with all this, just trying to wrap their arms around it, what would your piece of advice be on where to start?

Manfred Reiche [00:46:22]:

Ask for help. Right. So actually, don't assume. Like, don't run away from this challenge because it's hard. Right? Like, most people shy away from it because for decades it's been impossible to normalize all these different data sources. Like, people are usually bound to analysis within their ERP walls because they're comfortable with their ERP walls, because they have IT teams that'll, you know, give them these insights. But you are way too removed from the end consumer and all the retailers you're working with.

Manfred Reiche [00:46:54]:

So ideally, don't be scared. Go one at a time, build these translation layers we're talking about. You now know the four dimensions that really are key to doing this and ask for help. We've done it. It works out of the box. So there's better ways to do this in sight than trying to rebuild all this from scratch.

Abby Carruthers [00:47:13]:

Ask for help. Solid advice there. And Matthew, any final words of wisdom?

Matthew Nyhus [00:47:17]:

Sure. From the IT side, I would say be really careful with the assumptions that you're making. We split this out into four different types of normalization because we've been doing this for a relatively long time. We have a lot of experience with different sizes of companies that we're working with and hundreds and hundreds of different retailers and how they represent data. And we've come to this idea of product, location, time and metrics, and then within that, the representation decisions we've made because of all that experience. And even while we've been doing these projects, we've had to revisit those technical decisions that we made. At what granularity are we representing things? What assumptions can we make about the data? And you always have to make some assumptions, otherwise it's impossible to represent anything. But we've had to kind of revisit ideas where we made too strong an assumption because we didn't understand the shape of the data that we'd be seeing in one month or one year. So I'd be really careful with the limitations that you're putting on yourself because there's so many edge cases.

Matthew Nyhus [00:48:12]:

The tail is really long, and at some point you're going to be surprised by what you have to model.

Abby Carruthers [00:48:18]:

The tail is really long. I like that, as well as a little takeaway. That's all we have time for today. Thank you, Manfred. Thank you, Matthew, for sharing your wisdom with us, and we'll see you next time on shelf life.
