Bounded Random Walk

While working on a project involving Elo Ratings, I needed a way to continuously change the value of a certain probability for testing purposes. The problem was interesting enough that I deviated a bit from my intended path to write this post.

The problem required me to find a way to vary a value within the range $[0, 1]$ while maintaining continuity. This immediately made me think of random walks. That is exactly when everything broke loose.

Random Walks

Random Walks are one of the simplest Stochastic Processes. The idea is that if the process stands at $(x, y)$, then after a time step $t$ it ends up at $(x + t, y + S_t)$. For this specific implementation we will focus on the discrete form of the process (and, as a side effect, the discretization of the continuous form). We will assume that the increments are normally distributed, $S_t \sim N(\mu, \sigma)$, although they can actually follow any distribution. Adding up all the different $S_t$ over time has a cumulative effect, which makes the function appear to be "walking" randomly through its range. It is actually quite simple to code this kind of process in Python:

import numpy as np


def random_walk(size=(1000, 10), starting_point=0, drift=0, std=1):
    # Generate a Normal sample of increments
    sample = np.random.normal(loc=drift, scale=std, size=size)

    # Generate a standard Random Walk starting at 0 by accumulating the increments
    base_walk = sample.cumsum(0)

    # Change location of starting point
    walk = base_walk + starting_point

    return walk

We can see that modifying the parameter $\mu$ of $S$ adds an overall drift to the process. If we then let the value of the walk at time $t$ stand for $\mathbb{P}(\text{event at }t)$, the drift can help us better test our models in an environment where the probability changes faster.
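As a rough usage sketch (the sizes, drift and scale below are arbitrary values, purely for illustration):

# Ten walks of 1,000 steps each, with a slight upward drift at every step
drifting_walks = random_walk(size=(1000, 10), starting_point=0.5, drift=0.001, std=0.01)

# Without drift, the walks just wander around the starting point
flat_walks = random_walk(size=(1000, 10), starting_point=0.5, drift=0, std=0.01)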

Bounding

Now, we can see that Random Walks are stable to a certain degree, but there is always the possibility that they break any bounds we impose on them, no matter how low the probability. Therefore, we need to impose some limits on the realizations of the process (a realization being one line in the plot, an actual observation within the probability space of the process). This is easier than trying to modify the construction of the random walk itself. We will demonstrate all the concepts with simple linear functions first, but the method applies to any kind of continuous function.

Say we want to bound a function $f(x)$ to the interval $[a, b]$ and still maintain continuity without modifying the scale of $f(x)$. The picture, with no bounding yet, looks something like this:

A straightforward idea to bound the function is to bounce it between the boundaries. The first thing we want to do, then, is to locate the parts of the function that are out of bounds and reverse their direction. Be wary that I said reverse the direction and not reverse the sign, as this changes depending on where the bounds are located. Let's try that:

Now we have to handle the case where the first bounce itself lies out of bounds. Note that this can happen infinitely many times, so we need to be smart about it (and avoid using loops, for instance). We can apply this transformation to $f(x)$ to get the number of times it would have crossed the boundaries:

\[ bounces(x) = \left\lfloor \frac{f(x) - a}{b - a} \right\rfloor = c_x \]
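In NumPy this is a single vectorized operation; here `f` is assumed to be an array of function values and `a` and `b` the chosen bounds:

# c_x: how many boundary widths f(x) sits away from a (works element-wise)
c = np.floor((f - a) / (b - a))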

If we detect that $f(x)$ has bounced an odd number of times, we have to reverse the direction. If it has bounced an even number of times, then the direction is already correct. First, we flip the sign of $f(x)$ with this neat trick:

\[ g(x) = \left[ f(x) - a \right] \times (-1)^{\left\lvert c_x \right\rvert} + a \]
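For instance, taking $a = 0$, $b = 1$ and a point where $f(x) = 1.3$, we have $c_x = 1$ and

\[ g(x) = (1.3 - 0) \times (-1)^{1} + 0 = -1.3, \]

so the direction is indeed reversed, although the value itself still falls outside $[0, 1]$; that is the issue we deal with next.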

Let's give that a try:

As I mentioned earlier, changing the direction is a bit more nuanced than merely flipping the sign of $f(x)$. We also need to relocate the bouncing parts of the function. We can see that the translation needed is directly linked to the number of bounces, which we have already calculated. For this, let us add a translation term $\epsilon$, which depends only on the bounce count $c_x$:

\[ g(x) = \left[ f(x) - a \right] \times (-1)^{\left\lvert c_x \right\rvert} + a + \epsilon(c_x) \]

We just have to focus on an intersection point, $x_k$, such that:

\[ \begin{aligned} &bounces(x_k) = k \\ &bounces(x_k - \omega) = k - 1 \end{aligned} \]

with $k \in \mathbb{Z}$ and $0 < \omega < b - a$. This in turn means that:

\[ f(x_k) = k \times (b - a) + a \]

By continuity, we get that

\[ \lim_{\omega \to 0} f(x_k - \omega) = f(x_k), \]

and, since we also want $g$ to stay continuous at $x_k$, the value at $x_k$ (where $c_x = k$) must match the limit from the left (where $c_x = k - 1$):

\[ k \times (b - a) \times (-1)^{\left\lvert k \right\rvert} + a + \epsilon(k) = k \times (b - a) \times (-1)^{\left\lvert k - 1 \right\rvert} + a + \epsilon(k - 1) \]

Furthermore, bouncing off the boundaries requires that:

\[ g(x_k) = \begin{cases} a, &\text{if } k \text{ is even} \\ b, &\text{if } k \text{ is odd} \end{cases} \]

As one of my college professors used to say, after a bit of algebra we have that:

\[ \epsilon(k) = (-1)^{\left\lvert k \right\rvert + 1} \times \left[ k + 1 - even(k) \right] \times (b - a) \]

where $even(k)$ is $1$ if $k$ is even and $0$ if it is odd.
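As a quick sanity check, take $k = 1$, $a = 0$ and $b = 1$: then $\epsilon(1) = (-1)^{2} \times [1 + 1 - 0] \times 1 = 2$, and at the crossing point $g(x_1) = 1 \times (-1)^{1} + 0 + 2 = 1 = b$, so the bounced function touches the upper boundary exactly where it should.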

With this we can rewrite $g(x)$ as:

\[ g(x) = a + (-1)^{\left\lvert c_x \right\rvert} \times \left\{ f(x) - a - \left[ c_x + 1 - even(c_x) \right] \times (b - a) \right\} \]
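Here is a minimal NumPy sketch of $g(x)$; the function name `bound` and the use of `np.where` for the parity terms are my own choices, not something prescribed by the derivation:

def bound(f, lower, upper):
    # c_x: how many times f(x) has crossed the boundaries
    width = upper - lower
    c = np.floor((f - lower) / width)

    # even(c_x) is 1 for an even crossing count and 0 for an odd one;
    # (-1)^|c_x| reduces to +1 or -1 depending on the same parity
    even = np.where(c % 2 == 0, 1.0, 0.0)
    sign = np.where(c % 2 == 0, 1.0, -1.0)

    # g(x) = a + (-1)^|c_x| * { f(x) - a - [c_x + 1 - even(c_x)] * (b - a) }
    return lower + sign * (f - lower - (c + 1 - even) * width)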

Testing with the Random Walk

We can now plug $g(x)$, which is array-friendly, into our Random Walk code from the beginning of the post to get some nicely bounded processes.
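Putting the two pieces together with the `bound` function sketched above (the bounds and parameters below are arbitrary, purely for illustration):

# A few walks meant to act as probabilities, free to escape [0, 1]...
walks = random_walk(size=(1000, 10), starting_point=0.5, drift=0.002, std=0.02)

# ...bounced back into [0, 1] by g(x)
bounded_walks = bound(walks, lower=0.0, upper=1.0)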