# Solving The Size Curve Problem - Ordering The Optimal Distribution Of Inventory By Variant
## Introduction
I recently interviewed with a startup that's tackling the [inventory demand forecasting](/blog/a-machine-learning-approach-to-inventory-demand-forecasting) problem, whereby retailers have to forecast future demand for their products so they know how much inventory to purchase today. During the interview, I was asked a question that would consume me for the following week: "How would you approach the size curve problem?" That problem goes like this...
## The Size Curve Problem
Most retailers sell a catalog of *products*, where each product has a collection of *variants*. For example, suppose you sell t-shirts. Your products might include a Beatles t-shirt, a Rolling Stones t-shirt, and a Fleetwood Mac t-shirt. However, for each t-shirt, you offer various sizes: small, medium, large, and xl (these are the variants). Now suppose you design a new t-shirt, say, a Kiss t-shirt, and you'd like to place an initial order of 1,000 units. How should you divvy up those 1,000 units by variant/size?
This is the size curve problem in a nutshell. In practice, variants can also include things like color, shape, material, style, etc.
Now, you might be thinking this is easy. Just look at past sales and measure the total units sold per variant to get the relative distribution of demand per variant. ...Not so fast. It turns out that, for various periods, some of your t-shirts were out of stock. Maybe you were so poorly supplied that there was never a point in time at which all variants were in stock simultaneously.
Okay, so why don't we just pick 10 days where smalls were in stock, 10 days where mediums were in stock, etc., add up the sales, and then compare the relative demand? Again, no. Sales tend to be highly seasonal, so that method will overstate demand for whichever variants happened to be in stock during peak sales days.
See the challenge now?
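To make the seasonality trap concrete, here's a tiny simulation (all numbers are hypothetical): two variants with identical true demand, where one happens to be stocked only during the peak season. Naive sales totals make it look far more popular.

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(100)
season = np.where(days < 50, 100, 10)   # daily shoppers: peak season, then slow season

# Hypothetical stock pattern: smalls in stock only during the peak,
# larges in stock only during the off-season.
in_stock = {"small": days < 50, "large": days >= 50}

naive = {}
for variant in in_stock:
    demand = rng.binomial(season, 0.5)  # each shopper wants this size with prob 0.5
    naive[variant] = int(demand[in_stock[variant]].sum())  # units sell only when stocked

print(naive)  # smalls appear to vastly outsell larges despite identical true demand
```

Both variants have the same 50/50 demand split, but the raw totals differ by roughly a factor of ten, purely because of when each was in stock.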
## The Solution
After a few days of scratching my head, a feasible solution hit me. The general approach is to use [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation), searching for a discrete probability distribution that makes the observed sales data most likely to have occurred. For a single product, it goes like this...
1. Index the variants in order, small = 1, medium = 2, large = 3, xl = 4.
2. Initialize arbitrary probabilities $ p_1 = 0.25 $, $ p_2 = 0.25 $, $ p_3 = 0.25 $, and $ p_4 = 0.25 $, where $ p_i $ gives the true probability that the *next* requested variant will be variant $ i $.
3. Let $ p_{jk} $ be the probability of observing a request for variant $ j $ next when we restrict ourselves to viewing only variants $ j $ and $ k $. Assuming independence, we get $ p_{jk} = \frac{p_j}{p_j + p_k} $.
4. Let $ x_{jk} $ be the number of $ j $ units sold on days where variants $ j $ and $ k $ were both in stock (and didn't run out of stock mid-day). Similarly, let $ n_{jk} $ be the combined units sold for variants $ j $ and $ k $ on those days.
5. $ x_{jk} $ is a [Binomial random variable](https://en.wikipedia.org/wiki/Binomial_distribution). Thus, if we believe in our discrete probability distribution, the probability of observing $ x_{jk} $ is
$$
\binom {n_{jk}}{x_{jk}}p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}}
$$
6. Assuming our pair-wise observations are independent (an approximation, since different pairs share underlying days), the probability of *all* of them is just the product of their individual probabilities. So, our likelihood function looks like this
$$
L(p,x)=\prod_{jk} \binom {n_{jk}}{x_{jk}}\left( p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}} \right)
$$
where $ jk $ spans all distinct pair-wise indexes that we observed.
7. However, since our goal is merely to find the vector of probabilities, $ \bf p $, that maximizes the likelihood function, and the $ \binom {n_{jk}}{x_{jk}} $ terms don't depend on $ \bf p $, we can drop them.
$$
L(p,x)=\prod_{jk} \left( p_{jk}^{x_{jk}}(1-p_{jk})^{n_{jk}-x_{jk}} \right)
$$
8. Next, we take the log to get
$$
\log(L)= \sum_{jk} \left[ x_{jk}\log(p_{jk}) + (n_{jk}-x_{jk})\log(1-p_{jk}) \right]
$$
9. Then, using the chain rule with $ \frac{\partial p_{jk}}{\partial p_j} = \frac{p_k}{(p_j + p_k)^2} $ and $ \frac{\partial p_{jk}}{\partial p_k} = \frac{-p_j}{(p_j + p_k)^2} $, we can calculate the partial derivative of $ \log(L) $ w.r.t. $ p_i $ as
$$
\frac{\partial \log(L)}{\partial p_i} = \sum_{jk:\, j=i} \left( \frac{x_{jk}}{p_{jk}} - \frac{n_{jk} - x_{jk}}{1-p_{jk}} \right) \frac{p_k}{(p_j + p_k)^2} + \sum_{jk:\, k=i} \left( \frac{x_{jk}}{p_{jk}} - \frac{n_{jk} - x_{jk}}{1-p_{jk}} \right) \frac{-p_j}{(p_j + p_k)^2}
$$
10. This looks nasty, but we know all of these terms, so we can plug and chug. That is, we can run [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent#:~:text=Gradient%20descent%20is%20a%20first,the%20direction%20of%20steepest%20descent.) on $ -\log(L) $ (equivalently, gradient ascent on $ \log(L) $) to find the $ \bf p $ that maximizes $ \log(L) $. One subtlety: the likelihood depends only on the ratios $ \frac{p_j}{p_j + p_k} $, so $ \bf p $ is identified only up to a scale factor; normalizing it to sum to 1 at the end gives the size curve.
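The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the GitHub solver; `fit_size_curve`, the learning rate, and the iteration count are all arbitrary choices of mine.

```python
import numpy as np

def fit_size_curve(x, n, lr=1e-5, iters=20000):
    """Gradient ascent on the pairwise-binomial log-likelihood above.

    x[j, k] : units of variant j sold on days when variants j and k
              were both in stock all day
    n[j, k] : combined units of j and k sold on those same days
    Returns p normalized to sum to 1 (the likelihood depends only on
    the ratios p_j / (p_j + p_k), so the scale of p is arbitrary).
    """
    m = x.shape[0]
    p = np.full(m, 1.0 / m)                     # arbitrary starting point
    for _ in range(iters):
        grad = np.zeros(m)
        for j in range(m):
            for k in range(m):
                if j == k or n[j, k] == 0:
                    continue
                pjk = p[j] / (p[j] + p[k])      # P(next request is j | j or k)
                dL = x[j, k] / pjk - (n[j, k] - x[j, k]) / (1.0 - pjk)
                denom = (p[j] + p[k]) ** 2
                grad[j] += dL * p[k] / denom    # chain rule: d p_jk / d p_j
                grad[k] -= dL * p[j] / denom    # chain rule: d p_jk / d p_k
        p = np.clip(p + lr * grad, 1e-9, None)  # take a step, stay positive
        p /= p.sum()                            # fix the arbitrary scale
    return p
```

A handy sanity check: feed it pairwise counts generated from a known $ \bf p $ and verify that it recovers that $ \bf p $ after normalization.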
## Code
If you're interested in seeing a demo of how this works, I've posted a sample sales dataset along with a Python solver and an R solver for finding the size curve [on GitHub](https://github.com/ben519/sizecurve).
### Sales data
```r
library(data.table)
sales <- fread("https://raw.githubusercontent.com/ben519/sizecurve/master/sales.csv")
print(sales)
## date variant sales depleted
## 1: 2021-01-01 small 15 FALSE
## 2: 2021-01-01 medium 20 FALSE
## 3: 2021-01-01 large 23 FALSE
## 4: 2021-01-01 xl 11 FALSE
## 5: 2021-01-02 small 73 FALSE
## 6: 2021-01-02 medium 99 FALSE
## 7: 2021-01-02 large 114 FALSE
## 8: 2021-01-02 xl 105 FALSE
## 9: 2021-01-03 small 2 TRUE
## 10: 2021-01-03 medium 21 FALSE
## 11: 2021-01-03 large 20 FALSE
## 12: 2021-01-03 xl 15 FALSE
```
### Python Solution
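The full solver lives in the GitHub repo. As a sketch of the bookkeeping it needs, here is one way to build the pairwise counts $ x_{jk} $ and $ n_{jk} $ from a table shaped like the sales data above, using pandas. `pairwise_counts` is a hypothetical helper of mine; the column names match the printed data, and it assumes every variant has a row for every date.

```python
import numpy as np
import pandas as pd

def pairwise_counts(sales, variants):
    """Build the pairwise count matrices from the raw sales table.

    x[j, k] = units of variant j sold on days when variants j and k
              were both in stock all day; n[j, k] = x[j, k] + x[k, j].
    Assumes 'sales' has one row per (date, variant) with columns
    date, variant, sales, depleted.
    """
    units = sales.pivot(index="date", columns="variant", values="sales")
    depleted = sales.pivot(index="date", columns="variant", values="depleted").astype(bool)
    ok = ~depleted                          # True where the variant never ran out that day
    m = len(variants)
    x = np.zeros((m, m))
    for j, vj in enumerate(variants):
        for k, vk in enumerate(variants):
            if j == k:
                continue
            both = ok[vj] & ok[vk]          # days with both variants fully stocked
            x[j, k] = units.loc[both, vj].sum()
    n = x + x.T                             # combined units for each pair
    return x, n
```

For the three days printed above, day 3 drops out of every pair involving `small` (it was depleted that day), while pairs like (medium, large) keep all three days. The resulting `x` and `n` plug straight into the likelihood from the previous section.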
### R Solution