API
EquationSearch
```julia
EquationSearch(X::AbstractMatrix{T}, y::AbstractVector{T};
               niterations::Int=10,
               weights::Union{AbstractVector{T}, Nothing}=nothing,
               varMap::Union{Array{String, 1}, Nothing}=nothing,
               options::Options=Options(),
               numprocs::Union{Int, Nothing}=nothing,
               procs::Union{Array{Int, 1}, Nothing}=nothing,
               runtests::Bool=true) where {T<:Real}
```

Search for equations that fit the dataset `(X, y)`, returning a hall of fame of the best equations found. The search is configured through the `options` keyword (see `Options` below); `weights`, if given, weight each data point in the loss, and `varMap` supplies variable names for printing.
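A minimal usage sketch (the features-by-rows data layout and the toy target below are illustrative assumptions, not part of the documented signature):

```julia
using SymbolicRegression

# Toy dataset: 5 features by 100 rows, with a known generating equation.
X = randn(Float32, 5, 100)
y = 2 .* cos.(X[4, :]) .+ X[1, :] .^ 2 .- 2

# Run a short search with default Options; the result summarizes the best
# equations found at each complexity.
hallOfFame = EquationSearch(X, y; niterations=5)
```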
Options
SymbolicRegression.../Options.jl.Options — Method

```julia
Options(;kws...)
```

Construct options for `EquationSearch` and other functions.
Arguments
- `binary_operators=(div, plus, mult)`: Tuple of binary operators to use. Each operator should be defined for two input scalars, and one output scalar. All operators need to be defined over the entire real line (excluding infinity; these are stopped before they are input). Thus, `log` should be replaced with `log_abs`, etc. For speed, define it so it takes two reals of the same type as input, and outputs the same type. For the SymbolicUtils simplification backend, you will need to define a generic method of the operator so it takes arbitrary types.
- `unary_operators=(exp, cos)`: Same, but for unary operators (one input scalar, gives an output scalar).
- `constraints=nothing`: Array of pairs specifying size constraints for each operator. The constraints for a binary operator should be a 2-tuple (e.g., `(-1, -1)`) and the constraints for a unary operator should be an `Int`. A size constraint is a limit to the size of the subtree in each argument of an operator. e.g., `[(^) => (-1, 3)]` means that the `^` operator can have arbitrary size (`-1`) in its left argument, but a maximum size of `3` in its right argument. Default is no constraints.
- `batching=false`: Whether to evolve based on small mini-batches of data, rather than the entire dataset.
- `batchSize=50`: What batch size to use if using batching.
- `loss=L2DistLoss()`: What loss function to use. Can be one of the following losses, or any other loss of type `SupervisedLoss`. You can also pass a function that takes a scalar target (left argument) and a scalar prediction (right argument), and returns a scalar. This will be averaged over the predicted data. If weights are supplied, your function should take a third argument for the weight scalar. Included losses:
  - Regression: `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square), `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`, `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
  - Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
- `npopulations=nothing`: How many populations of equations to use. By default this is set equal to the number of cores.
- `npop=1000`: How many equations in each population.
- `ncyclesperiteration=300`: How many generations to consider per iteration.
- `ns=10`: Number of equations in each subsample during regularized evolution.
- `topn=10`: Number of equations to return to the host process, and to consider for the hall of fame.
- `alpha=0.100000f0`: The probability of accepting an equation mutation during regularized evolution is given by exp(-delta_loss/(alpha * T)), where T goes from 1 to 0. Thus, alpha=infinity is the same as no annealing.
- `maxsize=20`: Maximum size of equations during the search.
- `maxdepth=nothing`: Maximum depth of equations during the search. By default this is set equal to `maxsize`.
- `parsimony=0.000100f0`: A multiplicative factor for how much complexity is punished.
- `useFrequency=false`: Whether to use a parsimony that adapts to the relative proportion of equations at each complexity; this will ensure that there are a balanced number of equations considered for every complexity.
- `fast_cycle=false`: Whether to thread over subsamples of equations during regularized evolution. Slightly improves performance, but is a different algorithm.
- `migration=true`: Whether to migrate equations between processes.
- `hofMigration=true`: Whether to migrate equations from the hall of fame to processes.
- `fractionReplaced=0.1f0`: What fraction of each population to replace with migrated equations at the end of each cycle.
- `fractionReplacedHof=0.1f0`: What fraction to replace with hall of fame equations at the end of each cycle.
- `shouldOptimizeConstants=true`: Whether to use NelderMead optimization to periodically optimize constants in equations.
- `optimizer_nrestarts=3`: How many different random starting positions to consider when using NelderMead optimization.
- `hofFile=nothing`: What file to store equations to, as a backup.
- `perturbationFactor=1.000000f0`: When mutating a constant, either multiply or divide by (1 + perturbationFactor)^(rand() + 1).
- `probNegate=0.01f0`: Probability of negating a constant in the equation when mutating it.
- `mutationWeights=[10.000000, 1.000000, 1.000000, 3.000000, 3.000000, 0.010000, 1.000000, 1.000000]`:
- `annealing=true`: Whether to use simulated annealing.
- `warmupMaxsize=0`: Whether to slowly increase the max size from 5 up to `maxsize`. If nonzero, specifies how many cycles (populations * iterations) before increasing by 1.
- `verbosity=convert(Int, 1e9)`: Whether to print debugging statements or not.
- `bin_constraints=nothing`:
- `una_constraints=nothing`:
- `seed=nothing`: What random seed to use. `nothing` uses no seed.
- `progress=false`: Whether to use a progress bar output (`verbosity` will have no effect).
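For example, a sketch of constructing a customized `Options` (the particular values are illustrative, and the `constraints` entry simply follows the description above):

```julia
using SymbolicRegression

options = Options(
    binary_operators=(plus, mult, div),
    unary_operators=(cos, exp),
    npopulations=20,         # e.g., one population per worker
    maxsize=25,
    parsimony=0.001f0,
    constraints=[cos => 5],  # limit cos to subtrees of size <= 5 (unary => Int)
)
```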
Printing and Evaluation
SymbolicRegression.../EquationUtils.jl.stringTree — Method

```julia
stringTree(tree::Node, options::Options; kws...)
```

Convert an equation to a string.

Arguments

- `varMap::Union{Array{String, 1}, Nothing}=nothing`: What variables to print for each feature.
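A brief sketch of printing a discovered equation (here `member` is assumed to be one entry of the dominating Pareto frontier described below, with the equation stored in its `tree` field):

```julia
# `options` must be the same Options object used during the search, since it
# defines which operators the tree's internal indices refer to.
println(stringTree(member.tree, options; varMap=["temperature", "pressure"]))
```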
SymbolicRegression.../EvaluateEquation.jl.evalTreeArray — Method

```julia
evalTreeArray(tree::Node, cX::AbstractMatrix{T}, options::Options)
```

Evaluate a binary tree (equation) over a given input data matrix. The options contain all of the operators used. This function fuses doublets and triplets of operations for lower memory usage.

Returns

- `(output, complete)::Tuple{AbstractVector{T}, Bool}`: The result, which is a 1D array, and whether the evaluation completed successfully (true/false). A `false` value of `complete` means an infinity or NaN was encountered, and a large loss should be assigned to the equation.
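As a sketch of how the returned tuple might be used (the variable names `tree`, `X_new`, and `y_new` are hypothetical):

```julia
output, complete = evalTreeArray(tree, X_new, options)
if complete
    mse = sum(abs2, output .- y_new) / length(y_new)
else
    # An Inf or NaN appeared during evaluation; assign a very large loss.
    mse = typemax(eltype(output))
end
```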
SymbolicUtils.jl interface
SymbolicRegression.../InterfaceSymbolicUtils.jl.node_to_symbolic — Method

```julia
node_to_symbolic(tree::Node, options::Options;
                 varMap::Union{Array{String, 1}, Nothing}=nothing,
                 evaluate_functions::Bool=false,
                 index_functions::Bool=false)
```

The interface to SymbolicUtils.jl. Passing a tree to this function will generate a symbolic equation in SymbolicUtils.jl format.

Arguments

- `tree::Node`: The equation to convert.
- `options::Options`: Options, which contains the operators used in the equation.
- `varMap::Union{Array{String, 1}, Nothing}=nothing`: What variable names to use for each feature. Default is [x1, x2, x3, ...].
- `evaluate_functions::Bool=false`: Whether to evaluate the operators, or leave them as symbolic.
- `index_functions::Bool=false`: Whether to generate special names for the operators, which then allows one to convert back to a `Node` format using `symbolic_to_node`.
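A rough sketch of round-tripping through SymbolicUtils.jl (this assumes `member.tree` holds a discovered equation and that `symbolic_to_node` accepts the same `options` and `varMap` keywords):

```julia
using SymbolicUtils

# Convert to a symbolic expression; index_functions=true keeps operator names
# in a form that can be mapped back to a Node.
eqn = node_to_symbolic(member.tree, options; varMap=["x1", "x2"], index_functions=true)
println(simplify(eqn))

# Convert the (possibly simplified) expression back into a Node.
tree2 = symbolic_to_node(simplify(eqn), options; varMap=["x1", "x2"])
```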
Pareto frontier
SymbolicRegression.../HallOfFame.jl.calculateParetoFrontier — Method

```julia
calculateParetoFrontier(X::AbstractMatrix{T}, y::AbstractVector{T},
                        hallOfFame::HallOfFame, options::Options;
                        weights=nothing, varMap=nothing) where {T<:Real}
```
Compute the dominating Pareto frontier for a given hallOfFame. This is the list of equations where each equation has a better loss than all simpler equations.
SymbolicRegression.../HallOfFame.jl.calculateParetoFrontier — Method

```julia
calculateParetoFrontier(dataset::Dataset{T}, hallOfFame::HallOfFame,
                        options::Options) where {T<:Real}
```
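Putting the pieces together, a sketch of printing the frontier (this assumes each frontier entry exposes `tree` and `score` fields, and that `countNodes` from EquationUtils.jl is available to measure complexity):

```julia
dominating = calculateParetoFrontier(X, y, hallOfFame, options)
for member in dominating
    complexity = countNodes(member.tree)
    println(complexity, "\t", member.score, "\t", stringTree(member.tree, options))
end
```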