Toy Examples with Code

Preamble

using SymbolicRegression
using DataFrames

We'll also code up a simple function that returns the best expression from a hall of fame, along with a DataFrame describing the dominating Pareto frontier:

function get_best(; hof::HallOfFame{T,L}, options) where {T,L}
    # Take only the expressions that dominate the Pareto frontier
    # (i.e., the best loss seen at each complexity):
    dominating = calculate_pareto_frontier(hof)

    df = DataFrame(;
        tree=[m.tree for m in dominating],
        loss=[m.loss for m in dominating],
        complexity=[compute_complexity(m.tree, options) for m in dominating],
    )

    # Score each expression by how quickly the loss drops as complexity
    # increases (the negative slope of log-loss against complexity):
    df[!, :score] = vcat(
        [L(0.0)],
        -1 .* log.(df.loss[2:end] ./ df.loss[1:(end - 1)]) ./
        (df.complexity[2:end] .- df.complexity[1:(end - 1)]),
    )

    min_loss = minimum(df.loss)

    # Pick the highest-scoring expression among those within 2x of the minimum loss:
    best_idx = argmax(df.score .* (df.loss .<= (2 * min_loss)))

    return df.tree[best_idx], df
end

1. Simple search

Here's a simple example where we find the expression 2 cos(x4) + x1^2 - 2:

X = 2randn(5, 1000)
y = @. 2*cos(X[4, :]) + X[1, :]^2 - 2

options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos])
hof = EquationSearch(X, y; options=options, niterations=30)

Let's look at the most accurate expression:

best, df = get_best(; hof, options)
println(best)
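
The second return value, df, holds the entire dominating frontier, so we can also inspect the loss-complexity trade-off directly (the exact expressions found will vary from run to run):

# One row per complexity level on the frontier, with the loss
# and score computed by get_best above:
println(df[!, [:complexity, :loss, :score]])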

2. Custom operator

Here, we define a custom operator and use it to find an expression:

X = 2randn(5, 1000)
y = @. 1/X[1, :]

# Any plain Julia function of a scalar can be used as an operator:
my_inv(x) = 1/x

options = Options(; binary_operators=[+, *], unary_operators=[my_inv])
hof = EquationSearch(X, y; options=options)
println(get_best(; hof, options)[1])
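
Custom operators can also guard against invalid inputs. Here is a minimal sketch (safe_sqrt and safe_options are our own names, not part of the library): returning NaN from an operator makes the evaluator flag the expression as invalid, so it is assigned an infinite loss rather than crashing the search.

# Hypothetical protected operator: NaN outside the domain marks
# the whole expression as invalid during evaluation.
safe_sqrt(x) = x >= 0 ? sqrt(x) : convert(typeof(x), NaN)

safe_options = Options(; binary_operators=[+, *], unary_operators=[safe_sqrt])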

3. Multiple outputs

Here, we do the same thing, but with multiple expressions at once, each requiring a different feature.

X = 2rand(5, 1000) .+ 0.1
y = @. 1/X[1:3, :]
options = Options(; binary_operators=[+, *], unary_operators=[my_inv])
hofs = EquationSearch(X, y; options=options)
bests = [get_best(; hof=hofs[i], options)[1] for i=1:3]
println(bests)
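
Since each output gets its own hall of fame, we can also pull the frontier for each one separately. A quick usage sketch using the helper defined in the preamble (the exact expressions will vary between runs):

# Print each output's best expression next to its lowest frontier loss:
for i in 1:3
    best_i, df_i = get_best(; hof=hofs[i], options)
    println("Output ", i, ": ", best_i, "  (loss = ", minimum(df_i.loss), ")")
end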

4. Plotting an expression

For now, let's consider the expressions for output 1. We can see the SymbolicUtils version with:

eqn = node_to_symbolic(bests[1], options)

We can get the LaTeX version with:

using Latexify
latexify(string(eqn))
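
If you just want a plain-text rendering rather than LaTeX, you can also use string_tree from SymbolicRegression.jl:

println(string_tree(bests[1], options))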

Let's plot the prediction against the truth:

using Plots

scatter(y[1, :], bests[1](X), xlabel="Truth", ylabel="Prediction")

Here, we have used the convenience function (::Node{T})(X) to evaluate the expression. However, this only works because calling Options() automatically sets up calls to eval_tree_array behind the scenes. In practice, you should use the eval_tree_array function directly, which takes the form:

eval_tree_array(bests[1], X, options)
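
eval_tree_array returns both the evaluated array and a flag indicating whether the evaluation completed without producing a NaN or Inf, so a safer version of the plot above is:

# Only plot if the expression evaluated cleanly over all of X:
output, completed = eval_tree_array(bests[1], X, options)
if completed
    scatter(y[1, :], output, xlabel="Truth", ylabel="Prediction")
end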

5. Other types

SymbolicRegression.jl can handle most numeric types you wish to use. For example, passing a Float32 array will result in the search using 32-bit precision everywhere in the codebase:

X = 2randn(Float32, 5, 1000)
y = @. 2*cos(X[4, :]) + X[1, :]^2 - 2

options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos])
hof = EquationSearch(X, y; options=options, niterations=30)

We can see that the output types are Float32:

best, df = get_best(; hof, options)
println(typeof(best))
# Node{Float32}
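
The loss type follows the data precision as well (by default, L in HallOfFame{T,L} matches T for floating-point data):

println(eltype(df.loss))
# Float32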

We can also use Complex numbers:

# Complex input, complex output: cosine of the modulus, with a zero
# imaginary part so the operator stays closed over complex numbers:
cos_re(x::Complex{T}) where {T} = cos(abs(x)) + 0im

X = 15 .* rand(ComplexF64, 5, 1000) .- 7.5
y = @. 2*cos_re((2+1im) * X[4, :]) + 0.1 * X[1, :]^2 - 2

options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos_re], maxsize=30)
hof = EquationSearch(X, y; options=options, niterations=100)
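
As before, we can extract the best expression and evaluate it; the output eltype is ComplexF64 (the exact expression found will vary between runs):

best_c, _ = get_best(; hof, options)
output, completed = eval_tree_array(best_c, X, options)
println(eltype(output))
# ComplexF64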

6. Additional features

For the many other features available in SymbolicRegression.jl, check out the API page for Options. You might also find it useful to browse the documentation for PySR, the Python frontend, which includes additional tutorials and examples. In particular, the tuning page is useful for improving search performance.