A primer on stochastic modeling and analysis

The word “stochastic” derives from the Greek stokhazesthai, which means “to aim” or “to guess.” In modern usage it means “random” or “governed by chance,” and its opposite is “deterministic” or “certain.” A “stochastic model” therefore produces a set of possible outcomes, each weighted by its likelihood or probability, rather than a single answer. It represents uncertainty explicitly by treating randomness as part of the process being modeled. The probabilities of events within the model are assigned by the observer.

A “deterministic model” calculates a single outcome from a given set of circumstances, so the same inputs always produce the same output. It suits situations in which outcomes can be predicted with certainty. The equations in a deterministic model precisely describe the relationship between its inputs and outputs.

Phenomena are not, in and of themselves, stochastic or deterministic. The observer decides whether to model a phenomenon as stochastic or deterministic based on the problem to be solved. A stochastic model allows for random variation in one or more inputs over time. In a deterministic model, the output is fully determined by the parameter values and the initial conditions.
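To make the contrast concrete, here is a minimal Python sketch that models the same quantity both ways. The growth scenario, the function names, and the parameter values (5% mean growth, 10% volatility) are assumptions chosen purely for illustration.

```python
import random


def deterministic_growth(initial, rate, steps):
    """Deterministic model: the same inputs always give the same single output."""
    value = initial
    for _ in range(steps):
        value *= 1 + rate
    return value


def stochastic_growth(initial, mean_rate, volatility, steps, rng):
    """Stochastic model: the per-step rate is a random input, so runs differ."""
    value = initial
    for _ in range(steps):
        value *= 1 + rng.gauss(mean_rate, volatility)
    return value


rng = random.Random(42)  # fixed seed only so the sketch is reproducible
fixed = deterministic_growth(100.0, 0.05, 10)
samples = [stochastic_growth(100.0, 0.05, 0.10, 10, rng) for _ in range(1000)]

print(f"deterministic outcome: {fixed:.2f}")
print(f"stochastic outcomes range from {min(samples):.2f} to {max(samples):.2f}")
```

The deterministic function returns one number, while the stochastic one has to be run many times to reveal the spread of outcomes it implies.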

Stochastic models use random variables (variables whose possible values depend on the outcomes of a chance phenomenon). A random variable is either discrete or continuous. A common example of a discrete random variable is the result of rolling a die. The roll is a purely random event, yet it has a finite number of possible outcomes: {1, 2, 3, 4, 5, 6}.

Each outcome of a discrete random variable has a certain probability. For example, the probability of each outcome of a fair die is 1/6 because all six outcomes are equally likely. Of special note, the probabilities of all outcomes of a discrete variable always sum to 1. Discrete data is countable, whereas continuous data is measurable. Discrete data consists of distinct, separate values.

A common example of a continuous variable is the return on investment of a stock. The return can take an infinite number of possible values (as percentages). The probability of any single exact value of a continuous random variable is zero, so probabilities are assigned to ranges of values instead. Continuous data includes any value within a range.

Each random variable has a specific probability distribution (a mathematical function that gives the probabilities of occurrence of all possible outcomes). The probability distributions of discrete variables are specified in terms of probability mass functions. The probability distributions of continuous variables are specified in terms of probability density functions.
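The difference between a probability mass function and a probability density function can be sketched in a few lines of Python. The normal distribution used for the stock return, along with its mean of 7% and standard deviation of 15%, is an assumption made only for this illustration.

```python
import math
from fractions import Fraction

# Discrete case: the probability mass function (PMF) of a fair die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_mass = sum(die_pmf.values())  # the masses sum to exactly 1

# Continuous case: a normal density as a hypothetical model of annual returns.
MU, SIGMA = 0.07, 0.15  # assumed mean and standard deviation


def normal_pdf(x):
    """Probability density at x (a density, not a probability)."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))


def normal_cdf(x):
    """Probability that the return is at most x."""
    return 0.5 * (1 + math.erf((x - MU) / (SIGMA * math.sqrt(2))))


def prob_between(low, high):
    """Probability that the return falls in the range [low, high]."""
    return normal_cdf(high) - normal_cdf(low)


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


p_range = prob_between(0.05, 0.10)
area = integrate(normal_pdf, 0.05, 0.10)  # area under the density, same range

# Probability of ever narrower ranges centred on a return of exactly 7%.
shrinking = [prob_between(0.07 - w / 2, 0.07 + w / 2) for w in (1e-2, 1e-4, 1e-6)]

print(total_mass, round(p_range, 4), round(area, 4))
print(shrinking)
```

The area under the density over a range matches the probability of that range, and the probability shrinks toward zero as the range narrows to a single value.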

The basic steps for building a stochastic model are (Deviant, 2011):

  • Create the sample space (Ω) — a list of all possible outcomes,
  • Assign probabilities to sample space elements,
  • Identify the events of interest,
  • Calculate the probabilities for the events of interest.

A very simple example of this process in action would be rolling a die in a casino. If you roll a six or a one, you win $10. The steps would be (Deviant, 2011):

  • The sample space includes all possibilities for die roll outcomes: Ω = {1,2,3,4,5,6}.
  • The probability for any number being rolled is 1/6.
  • The event of interest is “roll a 6 or roll a 1”.
  • The probability for “roll a 6 or 1” is 1/6 + 1/6 = 2/6 = 1/3.
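The four steps above can be written out as a short Python sketch. The variable names are illustrative, and the expected payout per roll is an extra quantity computed here for convenience; it is not part of the cited example.

```python
from fractions import Fraction

# Step 1: create the sample space.
omega = {1, 2, 3, 4, 5, 6}

# Step 2: assign a probability to each element (a fair die is assumed).
prob = {outcome: Fraction(1, len(omega)) for outcome in omega}

# Step 3: identify the event of interest, "roll a 6 or roll a 1".
win = {1, 6}

# Step 4: calculate the probability of the event.
p_win = sum(prob[outcome] for outcome in win)

# Extra step (not in the original example): expected payout of the $10 prize.
expected_payout = 10 * p_win

print(p_win)  # 1/3
```

Using exact fractions keeps the result identical to the hand calculation instead of a rounded decimal.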

One of the benefits of a stochastic model is that it makes uncertainty explicit. Thus, ranges and likely outcomes are easier to quantify. Nonetheless, the output of a stochastic model is a product of the assumptions placed into it. Stochastic modeling permits the construction of a simulation that exhibits volatility and variability (randomness). Consequently, reality is better represented from multiple perspectives.
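As a hedged illustration of how a simulation turns uncertainty into a quantified range, the sketch below repeats the casino die game many times. The choice of 60 rolls per session, 5,000 simulated sessions, and the 5% to 95% band are all assumptions made for this example.

```python
import random
import statistics


def simulate_winnings(rounds, rng):
    """Total payout over `rounds` rolls: $10 for a 1 or a 6, otherwise $0."""
    return sum(10 for _ in range(rounds) if rng.randint(1, 6) in (1, 6))


rng = random.Random(2024)  # fixed seed only so the sketch is reproducible
totals = [simulate_winnings(60, rng) for _ in range(5000)]

mean_total = statistics.mean(totals)
cuts = statistics.quantiles(totals, n=20)  # cut points at 5%, 10%, ..., 95%
low, high = cuts[0], cuts[-1]

print(f"average winnings over 60 rolls: ${mean_total:.2f}")
print(f"90% of simulated sessions fall between ${low:.0f} and ${high:.0f}")
```

The theoretical average is 60 × 10 × 1/3 = $200, and the simulated band shows how far a single session can plausibly stray from it. Changing the assumed rules or the number of rolls changes the band, which is the sense in which the output depends on the assumptions.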

Stochastic methods can be implemented with a wide range of programming languages

Stochastic methods have become a vital part of both scientific research and practice, across domains as wide as biology, chemistry, ecology, neuroscience, physics, image processing, signal processing, information theory, computer science, cryptography, telecommunications, and finance. For emerging applications in artificial intelligence and game theory, in particular, stochastic models and their solutions enable cutting-edge research. In both theoretical and practical challenges, bringing together the relevant stochastic, numerical, and mathematical methods is no easy task, so it’s important to at least have an easy time developing and executing the required software.

It’s easy to be confused about which programming languages and environments are the best for stochastic modeling and analysis. These languages and environments need to solve the typical stochastic optimization problems, which are often NP-hard. They also need specialized libraries that facilitate efficient and effective development, while still maintaining high performance. The technical details are important, but one can’t ignore soft factors, like developer experience. With these factors in mind, the best programming languages for stochastic modeling and analysis are Python, Julia, R, and Java.

Python

Python offers a powerful and attractive programming environment with an enormous number of libraries tailored to the needs of various fields. Moreover, writing code in Python is generally easier than in languages such as Java and C++. Python libraries such as stochastic and StochPy offer an assortment of easy-to-use features. With an ever-growing number of facilities and features made available through the open-source community, stochastic modeling and analysis libraries are always getting better.

A couple of useful examples of using stochastic models in Python are:

  1. A stochastic method was employed to refine preliminary spam detection and to find the maximum likelihood for spam e-mail classification. The method is based on Bayes’ theorem, a hidden Markov model (HMM), and the Viterbi algorithm (Mansourbeigi, 2019, March).
  2. An open-source Python package (GillesPy) was constructed for model construction and simulation of stochastic biochemical systems. GillesPy encompasses a Python framework for model building and an interface to the StochKit2 suite of efficient simulation algorithms based on the Gillespie stochastic simulation algorithms (Abel, et al., 2016).
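To give a feel for the Viterbi algorithm mentioned in the first example, here is a minimal pure-Python sketch on a toy hidden Markov model. It is not the cited paper's model: the two states, the three-word vocabulary, and every probability below are invented for illustration.

```python
import math

# Hypothetical toy HMM: hidden states are message types, observations are words.
states = ("ham", "spam")
start_p = {"ham": 0.6, "spam": 0.4}
trans_p = {
    "ham": {"ham": 0.7, "spam": 0.3},
    "spam": {"ham": 0.4, "spam": 0.6},
}
emit_p = {
    "ham": {"meeting": 0.5, "report": 0.4, "prize": 0.1},
    "spam": {"meeting": 0.1, "report": 0.1, "prize": 0.8},
}


def viterbi(observations):
    """Return the most likely sequence of hidden states for the observations."""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev = max(states, key=lambda r: prev_best[r] + math.log(trans_p[r][s]))
            best[s] = (
                prev_best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


print(viterbi(["meeting", "prize", "prize"]))  # ['ham', 'spam', 'spam']
```

Working in log-probabilities avoids numerical underflow on long observation sequences, a common design choice for this algorithm.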

Julia

Julia has become popular among students and industry experts who use stochastic programming to solve challenging problems. Its syntax is similar to Python's, and the comprehensive documentation makes Julia easy to learn and write. Stochastic equations often cannot be vectorized, which poses a considerable challenge for computation.

Typically, researchers resort to brute-force methods built on nested loops. Julia compiles such loops to efficient machine code, so they often run faster than in interpreted languages. Finally, the ability to write clean, composable code across object-oriented and functional programming techniques makes Julia one of our recommendations for stochastic modeling.
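The kind of loop in question can be sketched as follows. The example is written in Python rather than Julia, purely to show the loop structure, and it uses the Euler-Maruyama scheme on an Ornstein-Uhlenbeck process; the choice of process, the function name, and the parameter values are assumptions for illustration.

```python
import math
import random


def euler_maruyama_ou(x0, theta, mu, sigma, dt, steps, rng):
    """Simulate dX = theta * (mu - X) dt + sigma dW with the Euler-Maruyama scheme.

    Each new value depends on the previous one, so the time loop is sequential
    and cannot simply be replaced by one vectorized array operation.
    """
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        path.append(x + theta * (mu - x) * dt + sigma * dw)
    return path


rng = random.Random(7)  # fixed seed only so the sketch is reproducible
noisy_path = euler_maruyama_ou(5.0, 1.0, 0.0, 0.5, 0.01, 1000, rng)
smooth_path = euler_maruyama_ou(5.0, 1.0, 0.0, 0.0, 0.01, 1000, rng)

print(f"final value with noise:    {noisy_path[-1]:.4f}")
print(f"final value without noise: {smooth_path[-1]:.6f}")
```

With the noise term switched off (sigma = 0), the same loop reduces to a deterministic decay toward mu, which makes a convenient sanity check.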

Some examples of using stochastic models in Julia are:

  1. The stochastization of one-step processes was applied to the SIR (Susceptible-Infected-Recovered) epidemic model to demonstrate the advantages of a stochastic representation of the system. The approach was based on the paradigm of analytical-numerical calculations and implemented using auxiliary packages that provide domain-specific language (DSL) extensions: Catalyst.jl and DifferentialEquations.jl (Fedorov, et al., 2021).
  2. Stochastic models enable the exploration of various biochemical network motifs to explain particular phenomena observed through the use of modern, high-resolution experimental techniques. Implementations of key algorithms and examples are provided using the Julia programming language (Warne, et al., 2020).
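For readers who want to see what a stochastic SIR simulation looks like in code, the sketch below uses Gillespie's stochastic simulation algorithm, one standard way to simulate such models. It is written in plain Python, not Julia, and does not use or imitate the Catalyst.jl or DifferentialEquations.jl APIs; the function name, population size, and rate constants are assumptions.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, t_max, rng):
    """Simulate one trajectory of a stochastic SIR model, one event at a time."""
    n = s + i + r
    t = 0.0
    history = [(t, s, i, r)]
    while i > 0:
        rate_infect = beta * s * i / n  # S + I -> 2I
        rate_recover = gamma * i        # I -> R
        total_rate = rate_infect + rate_recover
        t += rng.expovariate(total_rate)  # waiting time until the next event
        if t > t_max:
            break
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total_rate < rate_infect:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        history.append((t, s, i, r))
    return history


rng = random.Random(1)  # fixed seed only so the sketch is reproducible
history = gillespie_sir(99, 1, 0, beta=0.3, gamma=0.1, t_max=200.0, rng=rng)

final_t, final_s, final_i, final_r = history[-1]
print(f"{len(history) - 1} events; final state S={final_s}, I={final_i}, R={final_r}")
```

Rerunning with a different seed can give a qualitatively different epidemic, including early extinction of the infection, which a deterministic SIR model with the same parameters would not show.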

R

R is another high-level programming language with extensive functionality for stochastic modeling and analysis. The stpm library provides built-in functions and modules to study how measured variables evolve over time. Such studies have been instrumental in stochastic medical research and in the study of hazard functions, which remains a considerable challenge for programmers. In addition, other libraries such as adaptivetau and ctsmr provide a great framework for block models, competitive game theory, and genetic and swarm algorithms. R is a great option for programming stochastic models.

Let’s look at some examples of using R to develop stochastic models:

  1. The ARIMA (Autoregressive Integrated Moving Average) model is a tool for understanding and predicting future values in autocorrelated time series. Stochastic models can be used successfully for time series analysis of sensor data in structural health monitoring. Applying these models in R through the arima() function provides a guideline for choosing the correct parameters of an ARIMA model (Kosorus, et al., 2011, August).
  2. Stochastic networks programmed in R (based on random graphs) were used in population-level HIV transmission modeling to investigate the impact of the frequency of circular migration (periodic migration between urban and rural segments of a larger population) on HIV prevalence (Gill, n.d.).
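To illustrate the autoregressive part of an ARIMA model, the following Python sketch simulates the simplest case, an AR(1) process, and recovers its coefficient from the data. It does not reproduce R's arima() function or the cited study; the function names, the coefficient 0.7, and the series length are assumptions.

```python
import random


def simulate_ar1(phi, sigma, n, rng):
    """Generate n values of x[t] = phi * x[t-1] + noise, starting from 0."""
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, sigma))
    return series


def estimate_phi(series):
    """Least-squares estimate of phi from regressing x[t] on x[t-1] (no intercept)."""
    numerator = sum(prev * cur for prev, cur in zip(series[:-1], series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


rng = random.Random(123)  # fixed seed only so the sketch is reproducible
series = simulate_ar1(phi=0.7, sigma=1.0, n=5000, rng=rng)
phi_hat = estimate_phi(series)

print(f"true phi = 0.7, estimated phi = {phi_hat:.3f}")
```

A full ARIMA fit also handles differencing and moving-average terms, which this sketch leaves out.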

Java

Since the proliferation of object-oriented programming concepts, Java has been a go-to programming language for both software development and software architecture. One salient feature of Java, as of any object-oriented programming language, is its ability to handle complicated computational structures. Simple Java libraries for stochastic modeling were developed as early as 2002, covering a variety of random variable distributions and the associated random variate generation algorithms. The familiarity of Java to many developers further improves its suitability for stochastic modeling and analysis.
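A random variate generation algorithm can be very short. The sketch below shows inverse transform sampling for the exponential distribution, written in Python rather than Java; it is a generic textbook technique and is not taken from the Java libraries mentioned above. The function name and the rate of 2.0 are assumptions.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one exponential variate by inverting the CDF F(x) = 1 - exp(-rate * x)."""
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate  # 1 - u lies in (0, 1], so the log is defined


rng = random.Random(99)  # fixed seed only so the sketch is reproducible
draws = [sample_exponential(2.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print(f"sample mean = {sample_mean:.3f} (theoretical mean = 1 / rate = 0.5)")
```

The same idea works for any distribution whose cumulative distribution function can be inverted in closed form.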

Finally, these examples of implementing stochastic models in Java are insightful:

  1. The Department of Mathematics and Informatics at Iuliu Hatieganu University Cluj-Napoca constructed a range of Java class libraries for simulation tools to be used in pharmaceutical research and education, with the purpose of optimizing the use of substances and reactants (Prodan & Prodan, 2002, April).
  2. BEAST, a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree, is written in Java. It provides a large number of popular stochastic models of sequence evolution and implements tree-based models suitable for both within- and between-species sequence data (Drummond & Rambaut, 2007).

Next steps

Stochastic modeling and analysis have a crucial part to play in several avenues of research, not only in computational mathematics but also in life-changing sciences such as medical imaging. Researchers who use stochastic modeling need to be equipped with the most effective and user-friendly tools to carry out their research efficiently. A programming language and environment, with all the necessary libraries and built-in packages, is perhaps the most important of these tools.

References:

Abel, J. H., Drawert, B., Hellander, A., & Petzold, L. R. (2016). GillesPy: A Python package for stochastic model building and simulation. IEEE Life Sciences Letters, 2(3), 35-38.

Deviant, S. (2011). The practically cheating statistics handbook (3rd ed.). Lulu.com.

Drummond, A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC evolutionary biology, 7(1), 1-8.

Fedorov, A. V., Masolova, A. O., Korolkova, A. V., & Kulyabov, D. S. (2021). Implementation of an analytical-numerical approach to stochastization of one-step processes in the Julia programming language. In XI International Conference Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems (ITTMM-2021) April 19–23, 2021, Moscow.

Gill, N. (n.d.). Deterministic and stochastic models of infectious disease: circular migrations and HIV transmission dynamics. Occasional Papers. Department of Mathematics, University of Chicago.

Kosorus, H., Honigl, J., & Kung, J. (2011, August). Using R, WEKA, and RapidMiner in time series analysis of sensor data for structural health monitoring. In 2011 22nd International Workshop on Database and Expert Systems Applications (pp. 306-310). IEEE.

Mansourbeigi, S. M. H. (2019, March). Stochastic methods to find maximum likelihood for spam e-mail classification. In Workshops of the International Conference on Advanced Information Networking and Applications (pp. 623-632). Springer.

Pinsky, M., & Karlin, S. (2010). An introduction to stochastic modeling (4th ed.). Academic Press.

Prodan, A., & Prodan, R. (2002, April). A collection of Java class libraries for stochastic modeling and simulation. In International Conference on Computational Science (pp. 1040-1048). Springer, Berlin, Heidelberg.

Warne, D. J., Baker, R. E., & Simpson, M. J. (2020). A practical guide to pseudo-marginal methods for computational inference in systems biology. Journal of Theoretical Biology, 496, 110255.