
# SymbolicRegression.jl

**Discover Mathematical Laws from Data**: a flexible, user-friendly framework that automatically finds interpretable equations from your data.

SymbolicRegression.jl searches for symbolic expressions which optimize a particular objective.

Check out PySR for a Python frontend.

## Quickstart

Install in Julia with:

```julia
using Pkg
Pkg.add("SymbolicRegression")
```

## MLJ Interface

The easiest way to use SymbolicRegression.jl is with MLJ. Let's see an example:

```julia
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

# Dataset with two named features:
X = (a = rand(500), b = rand(500))

# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2

# with some noise:
y = y .+ randn(500) .* 1e-3

model = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)
```

Now, let's create and train this model on our data:

```julia
mach = machine(model, X, y)

fit!(mach)
```

You will notice that expressions are printed using the column names of our table. If, instead of a table-like object, a simple array is passed (e.g., `X = randn(100, 2)`), the variable names will default to `x1, ..., xn`.
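For instance, a minimal sketch of the matrix case, reusing the `model` defined above (here `y` is unrelated to the random matrix, so this only demonstrates the naming behavior):

```julia
# Passing a plain matrix instead of a table; the discovered
# expressions will print with x1 and x2 as variable names.
X_matrix = randn(500, 2)
mach_matrix = machine(model, X_matrix, y)
fit!(mach_matrix)
```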

Let's look at the expressions discovered:

```julia
report(mach)
```
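The report contains the full Pareto front. For example, assuming the report fields `best_idx` and `equations` documented for `SRRegressor` (check the docstring of your installed version), you can pull out the automatically selected expression:

```julia
r = report(mach)

# `best_idx` is the index chosen by the model's selection method;
# `equations` holds the Pareto-front expressions.
best_equation = r.equations[r.best_idx]
```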

Finally, we can make predictions with the expressions on new data:

```julia
predict(mach, X)
```

This will make predictions using the expression selected by `model.selection_method`, which by default is a mix of accuracy and complexity.

You can override this selection and select an equation from the Pareto front manually with:

```julia
predict(mach, (data=X, idx=2))
```

which here evaluates the second equation on the Pareto front.
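You can also change the default selection when constructing the model by passing your own `selection_method`. A hedged sketch follows; the keyword names mirror the documented `choose_best` example, so verify them against your installed version:

```julia
# Hypothetical selection rule: always pick the most accurate
# equation, ignoring complexity entirely.
my_selection(; trees, losses, scores, complexities) = argmin(losses)

model_most_accurate = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
    selection_method=my_selection,
)
```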

For fitting multiple outputs, one can use `MultitargetSRRegressor` (and pass an array of indices to `idx` in `predict` for selecting specific equations). For a full list of options available to each regressor, see the API page.
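A minimal sketch of the multi-target workflow; the second target column is invented here for illustration:

```julia
import SymbolicRegression: MultitargetSRRegressor

# Two targets over the same features (`y2` is made up for this example):
Y = (y1 = y, y2 = @. X.a^2 - X.b)

multi_model = MultitargetSRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)
multi_mach = machine(multi_model, X, Y)
fit!(multi_mach)

# When selecting equations manually, pass one index per target:
predict(multi_mach, (data=X, idx=[2, 2]))
```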

## Low-Level Interface

The heart of SymbolicRegression.jl is the `equation_search` function. This takes a 2D array and attempts to model a 1D array using analytic functional forms. Note: unlike the MLJ interface, this assumes column-major input of shape `[features, rows]`.

```julia
import SymbolicRegression: Options, equation_search

X = randn(2, 100)
y = 2 * cos.(X[2, :]) + X[1, :] .^ 2 .- 2

options = Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    populations=20
)

hall_of_fame = equation_search(
    X, y, niterations=40, options=options,
    parallelism=:multithreading
)
```

You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:

```julia
import SymbolicRegression: calculate_pareto_frontier

dominating = calculate_pareto_frontier(hall_of_fame)
```

This is a vector of `PopMember`; each member contains the expression along with its cost. We can get the expressions with:

```julia
trees = [member.tree for member in dominating]
```

Each of these equations is an `Expression{T}` type for some constant type `T` (like `Float32`).

These expression objects are callable – you can simply pass in data:

```julia
tree = trees[end]
output = tree(X)
```

## Constructing expressions

Expressions are represented under the hood as the `Node` type, which is developed in the DynamicExpressions.jl package. The `Expression` type wraps this and includes metadata about operators and variable names.

You can manipulate and construct expressions directly. For example:

```julia
using SymbolicRegression: Options, Expression, Node

options = Options(;
    binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin]
)
operators = options.operators
variable_names = ["x1", "x2", "x3"]
x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3]

tree = cos(x1 - 3.2 * x2) - x1 * x1
```

This tree has `Float64` constants, so the type of the entire tree will be promoted to `Node{Float64}`.

We can convert all constants (recursively) to Float32:

```julia
float32_tree = convert(Expression{Float32}, tree)
```

We can then evaluate the converted tree on a `Float32` dataset:

```julia
X = rand(Float32, 3, 100)

float32_tree(X)
```

This callable format is the easy-to-use version, which automatically sets all outputs to `NaN` if any `Inf` or `NaN` was encountered during evaluation. You can call the raw evaluation method with `eval_tree_array`:

```julia
using SymbolicRegression: eval_tree_array

output, did_succeed = eval_tree_array(float32_tree, X)
```

where `did_succeed` explicitly declares whether the evaluation was successful.
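For example, a minimal guard on this flag (the printed messages are only illustrative):

```julia
if did_succeed
    println("First few outputs: ", output[1:5])
else
    println("Evaluation hit Inf or NaN; the outputs should be discarded.")
end
```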

## Exporting to SymbolicUtils.jl

We can view the equations in the dominating Pareto frontier with:

```julia
dominating = calculate_pareto_frontier(hall_of_fame)
```

We can convert the best equation to SymbolicUtils.jl with the following function:

```julia
import SymbolicRegression: node_to_symbolic
import SymbolicUtils: simplify

eqn = node_to_symbolic(dominating[end].tree)
println(simplify(eqn*5 + 3))
```

We can also print out the full Pareto frontier like so:

```julia
import SymbolicRegression: compute_complexity, string_tree

println("Complexity\tMSE\tEquation")

for member in dominating
    complexity = compute_complexity(member, options)
    loss = member.loss
    string = string_tree(member.tree, options)

    println("$(complexity)\t$(loss)\t$(string)")
end
```
