Developer Guide

https://github.com/DeltaRCM/pyDeltaRCM/workflows/actions/badge.svg

This guide provides additional details for implementation that did not make it into the user guide. If you have not yet read the user guide, that is the place to start, then visit and refer to this guide for details as necessary.

Best Practices

Reproducibility

Important

tl;dr: any random numbers must be generated within a jitted function.

A major goal of the pyDeltaRCM project is to enable fully reproducible simulation results. Variability in pyDeltaRCM arises from the weighted random walks of water and sediment parcels during model iteration, so the state of the random source is essential to reproducibility. We encourage developers to ensure that their subclass models are also reproducible, and provide some information on how to do so in this section.

For many subclassing models, it will be straightforward to ensure models are reproducible. Out of the box, models will use the core pyDeltaRCM “seed” functionality to make models reproducible, and checkpointing should easily integrate with most use cases. However, models that implement processes or functions that rely on random numbers or samples from random distributions will need to take care to ensure models are reproducible.

Random numbers

For pyDeltaRCM, the source of random values is a pseudo-random number generator (RNG) from numpy, which works by inputting the current state of the RNG to an algorithm, and returning a sample (an integer) that can be mapped to any probability density function. Each time the RNG yields a sample, the RNG state is changed, such that repeated samples from the RNG appear to be random, but are actually a deterministic sequence for a given initial RNG state (i.e., for a given seed).

As a result of the deterministic RNG, model runs are exactly reproducible if they begin from the same initial RNG state. However, this means that any change to the state of the RNG that is not recorded will make a run non-reproducible. Additionally, any “random” behavior implemented in the model that does not also modify the state of the RNG, is non-reproducible.

With a default setup of numpy, the functions of np.random will utilize the same underlying random number generator, such that the following code always evaluates to the same results. In this case, it would be relatively simple to keep runs reproducible, because any call to a np.random function would modify the state of the underlying generator.

>>> np.random.seed(10); np.random.uniform(); np.random.normal(); np.random.uniform();
0.771320643266746
0.03777261440227079
0.7488038825386119

In pyDeltaRCM though, we use numba just-in-time compilation for several steps in the model routine to make execution faster, including getting the random values that drive parcel movement during model iteration. Within a Python console, calls to the numba RNG will not affect the state of the numpy RNG, and vice versa; even though the pre-JIT compiled code appears to call np.random.uniform (e.g., get_random_uniform).

Important

pyDeltaRCM only takes responsibility for the numba RNG!

A simple example

So why does it matter? Well, calling np.random.normal(100, 10) in a model subclass will give you a random value, and modify the state of the numpy RNG, but it will not affect the state of the numba RNG (the one pyDeltaRCM actually keeps track of). Thus, the values returned from the call to np.random.normal are not known and are not reproducible.

In the following simple example, see how the reproducible model uses a random number generated from a jitted function. This ensures the numba RNG is used for random variability in the model, and runs are reproducible.

>>> from numba import njit

>>> @njit
... def get_random_normal():
...     """Get a random number from standard normal distribution.
...     """
...     return np.random.normal(0, 1)


>>> class BrokenAndNotReproducible(pyDeltaRCM.DeltaModel):
...
...     def __init__(self, input_file=None, **kwargs):
...         """Initialize a model that can never be reproduced.
...         """
...
...         super().__init__(input_file=input_file, **kwargs)
...
...     def update(self):
...         """Reimplement update method for demonstration."""
...
...         # the core pyDeltaRCM RNG is used in computations, e.g.,
...         _sample0 = pyDeltaRCM.shared_tools.get_random_uniform(1)
...
...         # now, we do something custom in our subclass
...         _sample1 = np.random.normal(0, 1)
...
...         # and write it out to view
...         print(_sample0, _sample1)


>>> class BeautifulAndVeryReproducible(pyDeltaRCM.DeltaModel):
...
...     def __init__(self, input_file=None, **kwargs):
...         """Initialize a reproducible model.
...         """
...
...         super().__init__(input_file=input_file, **kwargs)
...
...     def update(self):
...         """Reimplement update method for demonstration."""
...
...         # the core pyDeltaRCM RNG is used in computations, e.g.,
...         _sample0 = pyDeltaRCM.shared_tools.get_random_uniform(1)
...
...         # now, we do something custom in our subclass
...         _sample1 = get_random_normal()
...
...         # and write it out to view
...         print(_sample0, _sample1)

Now, we will initialize and run each model for three timesteps, twice. Running each twice will allow us to see if the model is reproducible (i.e., are all of the numbers exactly the same between runs). First, run the Broken model:

broken = BrokenAndNotReproducible(seed=10)
>>> for i in range(3):
...     broken.update() 
0.771320643266746 1.213653088541954
0.0207519493594015 -0.40009453994985783
0.6336482349262754 0.7719410676752912
broken = BrokenAndNotReproducible(seed=10)
>>> for i in range(3):
...     broken.update() 
0.771320643266746 -0.17926017697487434
0.0207519493594015 -0.4421037872728855
0.6336482349262754 -0.2725394596633578

Now, run the reproducible model:

beautiful = BeautifulAndVeryReproducible(seed=10)
>>> for i in range(3):
...     beautiful.update()
0.771320643266746 0.03777261440227079
0.7488038825386119 -0.1354484915560101
0.4985070123025904 -0.6643797082723693
beautiful = BeautifulAndVeryReproducible(seed=10)
>>> for i in range(3):
...     beautiful.update()
0.771320643266746 0.03777261440227079
0.7488038825386119 -0.1354484915560101
0.4985070123025904 -0.6643797082723693

From these results, we can see that the values returned from the built-in uniform RNG as the first sample of each iteration (i.e., the left column) is always deterministic (in broken and beautiful), whereas both the built-in and the custom RNG are deterministic (in beautiful).

Important

Be sure to only generate random numbers inside jitted functions!

Note

It is generally okay to not worry about reproducibility when you are developing your subclassing model and trying to work out how model mechanics will depend on randomness – but once you start to do real simulations you may analyze, be sure to take the time to make your model reproducible.

Model development

Todo

add some notes about slicing arrays, and how we pad them with a custom pad operation when needed. Look at shared tools docs for starting point.