Solving The Size Curve Problem - Ordering The Optimal Distribution Of Inventory By Variant


I recently interviewed with a startup that’s tackling the inventory demand forecasting problem, whereby retailers have to forecast the future demand for their products, so they know how much inventory to purchase today. During the interview, I got asked a question that would consume me for the following week. “How would you approach the size curve problem?” That problem goes like this..

The Size Curve Problem

Most retailers sell a catalog of products, where each product has a collection of variants. For example, suppose you sell t-shirts. Your products might include a Beatles t-shirt, a Rolling Stones t-shirt, and a Fleetwood Mac t-shirt. However, for each t-shirt, you offer various sizes: small, medium, large, and xl (these are the variants). Now suppose you design new t-shirt, say, a Kiss t-shirt, and you’d like to place an initial order of 1,000 units. How should you divvy up those 1000 units by variant/size?

This is the size curve problem in a nutshell. In practice, variants can also include things like color, shape, material, style, etc.

Now, you might be thinking this is easy. Just look at past sales and measure the total units sold per variant to get the appropriate relative distribution of demand per variant. ..Not so fast. It turns out that, for various periods, some of your t-shirts were out of stock. Maybe you were so poorly supplied that, there’s never a point in time at which all variants were in stock at the same time.

Okay, so why don’t we just pick 10 days where smalls were in stock, 10 days where mediums were in stock, etc., add up the sales and then compare the relative demand? Again, no. Sales tend to be highly seasonal, so that method will unfairly represent high demand for variants in stock during peak sale days.

See the challenge now?

The Solution

After a few days of scratching my head, a feasible solution hit me. The general approach is to use maximum likelihood estimation, searching for a discrete probability distribution that makes the sales data most likely to have occurred. For a single product, it goes like this..

  1. Index the variants in order, small = 1, medium = 2, large = 3, xl = 4.
  2. Initialize arbitrary probabilities for $ p_1 = 0.25 $, $ p_2 = 0.25 $, $ p_3 $, and $ p_4 = 0.25 $ where $ p_i $ gives the true probability that the next requested variant will be variant $ i $.
  3. Let $ p_{jk} $ be the probability of observing a request for variant $ j $ next when we restrict ourselves to viewing only variants $ j $ and $ k $. Assuming independence, we get $ p_{jk} = \frac{p_j}{p_j + p_k} $.
  4. Let $ x_{jk} $ be the number of $ j $ units sold on days where variants $ j $ and $ k $ were both in stock (and didn’t run out of stock mid-day). Similarly, let $ n_{jk} $ be the combined units sold for variants $ j $ and $ k $ on those days.
  5. $ x_{jk} $ is a Binomial random variable. Thus, if we believe in our discrete probability distribution, the probability of observing $ x_{jk} $ is

$$ \binom {n_{jk}}{x_{jk}}p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}} $$

  1. The probability of all our pair-wise observations will just be the product of their individual probabilities. So, our likelihood function looks like this

$$ L(p,x)=\prod_{jk} \binom {n_{jk}}{x_{jk}}\left( p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}} \right) $$

where $ jk $ spans all distinct pair-wise indexes that we observed.

  1. However, since our goal is merely to find the vector probabilities, $ \bf p $, that maximizes the likelihood function, $ n_{jk} $ and $ x_{jk} $ terms are constants, so we can drop the $ \binom {n_{jk}}{x_{jk}} $ terms.

$$ L(p,x)=\prod_{jk} \left( p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}} \right) $$

  1. Next, we take the log to get

$$ log(L)= \sum_{jk} x_{jk}log(p_{jk})+ (n_{jk}-x_{jk})log(1-p_{jk}) $$

  1. Then we can calculate the partial derivative of $ log(L) $ w.r.t. $ p_i $ as

$$ \frac{\partial log(L)}{\partial p_i} = \sum_{i=j} \left( \frac{x_{jk}}{p_{jk}} - \frac{n_{jk} - x_{jk}}{1-p_{jk}} \right) \frac{p_k}{(p_j + p_k)^2} + \sum_{i=k} \left( \frac{x_{jk}}{p_{jk}} - \frac{n_{jk} - x_{jk}}{1-p_{jk}} \right) \frac{-p_j}{(p_j + p_k)^2} $$

  1. This looks nasty, but we know all these terms so we can plug and chug. I.e. we can use gradient descent to find the $ \bf p $ that maximizes $ log(L) $.


If you’re interested in seeing a demo of how this works, I’ve posted a sample sales dataset with a Python solver and an R solver for finding the size curve, on GitHub.

Sales data

sales <- fread("")
##           date variant sales depleted
##  1: 2021-01-01   small    15    FALSE
##  2: 2021-01-01  medium    20    FALSE
##  3: 2021-01-01   large    23    FALSE
##  4: 2021-01-01      xl    11    FALSE
##  5: 2021-01-02   small    73    FALSE
##  6: 2021-01-02  medium    99    FALSE
##  7: 2021-01-02   large   114    FALSE
##  8: 2021-01-02      xl   105    FALSE
##  9: 2021-01-03   small     2     TRUE
## 10: 2021-01-03  medium    21    FALSE
## 11: 2021-01-03   large    20    FALSE
## 12: 2021-01-03      xl    15    FALSE

Python Solution

R Solution