Types
Equations
Equations are specified as binary trees with the Node type, defined as follows.
DynamicExpressions.NodeModule.Node (Type)
Node{T} <: AbstractExpressionNode{T}
Node defines a symbolic expression stored in a binary tree. A single Node instance is one "node" of this tree, and holds references to its children. By tracing through the children nodes, you can evaluate or print a given expression.
Fields
- degree::UInt8: Degree of the node: 0 for leaves (constants and features), 1 for unary operators, 2 for binary operators.
- constant::Bool: Whether the node is a constant.
- val::T: Value of the node. If degree==0 and constant==true, this is the value of the constant. Its type is the type parameter of the overall Node (e.g., Float64).
- feature::UInt16: Index of the feature to use in the case of a feature node. Only defined if degree == 0 && constant == false.
- op::UInt8: If degree==1, this is the index of the operator in operators.unaops. If degree==2, this is the index of the operator in operators.binops. In other words, this is an enum of the operators, and it depends on the specific OperatorEnum object. Only defined if degree >= 1.
- l::Node{T}: Left child of the node. Only defined if degree >= 1. Same type as the parent node.
- r::Node{T}: Right child of the node. Only defined if degree == 2. Same type as the parent node. This is passed as the right argument to the binary operator.
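To make the field semantics concrete, here is a minimal sketch of a node type with the same fields and a recursive evaluator. ToyNode, const_node, feat_node, unary_node, binary_node and eval_toy are hypothetical names for illustration; this is not the DynamicExpressions.jl implementation, which leaves unused fields undefined instead of filling them with placeholders.

```julia
# Hypothetical mirror of the Node field layout; not the DynamicExpressions.jl type.
mutable struct ToyNode{T}
    degree::UInt8                  # 0 = leaf, 1 = unary operator, 2 = binary operator
    constant::Bool                 # for leaves: constant (true) or feature (false)
    val::T                         # constant value (degree == 0 && constant)
    feature::UInt16                # feature index (degree == 0 && !constant)
    op::UInt8                      # operator index into unaops/binops (degree >= 1)
    l::Union{ToyNode{T},Nothing}   # left child (degree >= 1)
    r::Union{ToyNode{T},Nothing}   # right child (degree == 2)
end

# Convenience constructors; unused fields get placeholder values in this sketch.
const_node(v::T) where {T} = ToyNode{T}(0, true, v, 0, 0, nothing, nothing)
feat_node(::Type{T}, i::Integer) where {T} = ToyNode{T}(0, false, zero(T), i, 0, nothing, nothing)
unary_node(op::Integer, l::ToyNode{T}) where {T} = ToyNode{T}(1, false, zero(T), 0, op, l, nothing)
binary_node(op::Integer, l::ToyNode{T}, r::ToyNode{T}) where {T} =
    ToyNode{T}(2, false, zero(T), 0, op, l, r)

# Evaluate the tree on one sample x by recursing through the children.
function eval_toy(n::ToyNode{T}, x::AbstractVector{T}, unaops::Tuple, binops::Tuple) where {T}
    if n.degree == 0
        return n.constant ? n.val : x[Int(n.feature)]
    elseif n.degree == 1
        return unaops[Int(n.op)](eval_toy(n.l, x, unaops, binops))
    else
        # The right child supplies the right argument of the binary operator.
        left = eval_toy(n.l, x, unaops, binops)
        right = eval_toy(n.r, x, unaops, binops)
        return binops[Int(n.op)](left, right)
    end
end
```

For example, with binops = (+, *) and unaops = (cos,), the tree binary_node(2, unary_node(1, feat_node(Float64, 1)), const_node(2.0)) represents cos(x1) * 2.0, and evaluating it at x = [0.0, 5.0] gives 2.0.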
Constructors
Node([T]; val=nothing, feature=nothing, op=nothing, l=nothing, r=nothing, children=nothing, allocator=default_allocator)
Node{T}(; val=nothing, feature=nothing, op=nothing, l=nothing, r=nothing, children=nothing, allocator=default_allocator)
Create a new node in an expression tree. If T is not specified in either the type or the first argument, it will be inferred from the value of val passed or from l and/or r. If it cannot be inferred from these, it defaults to Float32.
The children keyword can be used instead of l and r and should be a tuple of children. This permits the use of splatting in constructors.
You may also construct nodes via the convenience operators generated by creating an OperatorEnum.
You may also pass a memory allocator for the node, other than simply Node{T}(), via the allocator keyword argument.
When you create an Options object, the operators passed are also re-defined for Node types. This allows you to use, e.g., t=Node(; feature=1) * 3f0 to create a tree, so long as * was specified as a binary operator. This works automatically for operators defined in Base, although you can also get this to work for user-defined operators by using @extend_operators:
SymbolicRegression.InterfaceDynamicExpressionsModule.@extend_operators (Macro)
@extend_operators options
Extends all operators defined in this options object to work on the AbstractExpressionNode type. By default this is already done for operators defined in Base when you create an Options object and pass define_helper_functions=true, but it does not apply to user-defined operators. To extend those, you must apply this macro to the operator enum in the same module where the operators are defined.
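The operator re-definition described above can be sketched with plain multiple dispatch: methods of Base operators are added for a tree type, so that arithmetic on trees returns a new tree instead of a number. Tree, Leaf, Op, feature, lift and render are hypothetical names; the library generates its own methods from the operators in the options, and this sketch does not reproduce that machinery.

```julia
# Hypothetical minimal tree types; not the library's Node.
abstract type Tree end
struct Leaf <: Tree
    label::String            # e.g. "x1" or "3.0"
end
struct Op <: Tree
    name::Symbol             # operator name, e.g. :*
    args::Vector{Tree}       # one child (unary) or two children (binary)
end

feature(i::Integer) = Leaf("x$i")
lift(t::Tree) = t
lift(c::Real) = Leaf(string(c))   # wrap plain numbers as constant leaves

# Extend Base operators so that arithmetic on trees builds new nodes
# instead of computing a number.
for f in (:+, :-, :*, :/)
    @eval begin
        Base.$f(a::Tree, b::Tree) = Op($(QuoteNode(f)), Tree[a, b])
        Base.$f(a::Tree, b::Real) = Op($(QuoteNode(f)), Tree[a, lift(b)])
        Base.$f(a::Real, b::Tree) = Op($(QuoteNode(f)), Tree[lift(a), b])
    end
end
Base.cos(a::Tree) = Op(:cos, Tree[a])

# Render a tree as a fully parenthesized string.
render(t::Leaf) = t.label
function render(t::Op)
    length(t.args) == 1 && return "$(t.name)($(render(t.args[1])))"
    return "($(render(t.args[1])) $(t.name) $(render(t.args[2])))"
end
```

Here render(cos(feature(1) * 3.0) + feature(2)) returns "(cos((x1 * 3.0)) + x2)": each operator call produced a new node rather than a value.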
When using these node constructors, types will automatically be promoted. You can convert the type of a node using convert:
Base.convert (Method)
convert(::Type{<:AbstractExpressionNode{T1}}, n::AbstractExpressionNode{T2}) where {T1,T2}
Convert an AbstractExpressionNode{T2} to an AbstractExpressionNode{T1}. This will recursively convert all children nodes to AbstractExpressionNode{T1}, using convert(T1, tree.val) at constant nodes.
Arguments
- ::Type{AbstractExpressionNode{T1}}: Type to convert to.
- tree::AbstractExpressionNode{T2}: AbstractExpressionNode to convert.
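The recursion can be sketched on a tiny hypothetical tree type: the structure is rebuilt node by node, and only constant leaves need convert(T1, val). CTree and convert_tree are illustrative names, not part of DynamicExpressions.jl; in the library this conversion is a method of Base.convert.

```julia
# Hypothetical tree whose type parameter is the constant type; not the library's Node.
struct CTree{T}
    val::Union{T,Nothing}      # constant value; nothing for feature leaves and operators
    feature::Int               # feature index for feature leaves (0 otherwise)
    op::Symbol                 # operator name; :leaf for leaves
    children::Vector{CTree{T}} # empty for leaves
end

# Rebuild the tree as CTree{T1}, converting each constant with convert(T1, val).
function convert_tree(::Type{T1}, t::CTree{T2}) where {T1,T2}
    newval = t.val === nothing ? nothing : convert(T1, t.val)
    kids = CTree{T1}[convert_tree(T1, c) for c in t.children]
    return CTree{T1}(newval, t.feature, t.op, kids)
end
```

Converting a CTree{Float64} for x1 * 0.5 with convert_tree(Float32, t) yields a CTree{Float32} whose constant leaf holds 0.5f0, while the feature leaf is carried over unchanged.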
You can set a tree (in-place) with set_node!:
DynamicExpressions.NodeModule.set_node! (Function)
set_node!(tree::AbstractExpressionNode{T}, new_tree::AbstractExpressionNode{T}) where {T}
Set every field of tree equal to the corresponding field of new_tree.
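Because the fields are overwritten in place, every other reference to tree observes the new contents. A minimal sketch on a hypothetical mutable node type (MNode and set_fields! are illustrative names); note that in this sketch the children vector ends up shared with new_tree, not copied.

```julia
# Hypothetical mutable node; not the library's Node.
mutable struct MNode
    degree::Int
    val::Float64
    children::Vector{MNode}
end

# Overwrite every field of `tree` with the corresponding field of `new_tree`, in place.
function set_fields!(tree::MNode, new_tree::MNode)
    for name in fieldnames(MNode)
        setfield!(tree, name, getfield(new_tree, name))
    end
    return tree
end
```

After set_fields!(a, b), any variable that was bound to a still points to the same object, which now carries b's values.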
You can create a copy of a node with copy_node:
DynamicExpressions.NodeModule.copy_node (Method)
copy_node(tree::AbstractExpressionNode; break_sharing::Val{BS}=Val(false)) where {BS}
Copy a node, recursively copying all children nodes. This is more efficient than the built-in copy.
If break_sharing is set to Val(true), sharing in a tree will be ignored.
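The difference between preserving and breaking sharing can be sketched with an IdDict memo: when sharing is preserved, a subtree reachable along several paths (the same object, by identity) is copied once and the copy is reused. SNode and copy_tree are hypothetical, and the Bool keyword preserve_sharing stands in for the library's break_sharing::Val argument, with the opposite meaning.

```julia
# Hypothetical mutable node, so that identity (===) distinguishes objects.
mutable struct SNode
    label::String
    children::Vector{SNode}
end

# Deep-copy a tree. With preserve_sharing=true, a subtree that appears several
# times is copied once and that copy is reused wherever the original appeared.
function copy_tree(n::SNode; preserve_sharing::Bool=true,
                   seen::IdDict{SNode,SNode}=IdDict{SNode,SNode}())
    if preserve_sharing && haskey(seen, n)
        return seen[n]
    end
    c = SNode(n.label, SNode[copy_tree(k; preserve_sharing, seen) for k in n.children])
    preserve_sharing && (seen[n] = c)
    return c
end
```

With root = SNode("+", [shared, shared]), copy_tree(root) returns a tree whose two children are one new object, whereas copy_tree(root; preserve_sharing=false) returns two independent copies.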
Expressions
Expressions are represented using the Expression type, which combines the raw Node type with an OperatorEnum.
DynamicExpressions.ExpressionModule.Expression (Type)
Expression{T, N, D} <: AbstractExpression{T, N}
(Experimental) Defines a high-level, user-facing expression type that encapsulates an expression tree (like Node) along with associated metadata for evaluation and rendering.
Fields
- tree::N: The root node of the raw expression tree.
- metadata::Metadata{D}: A named tuple of settings for the expression, such as the operators and variable names.
Constructors
- Expression(tree::AbstractExpressionNode, metadata::NamedTuple): Construct from the fields.
- @parse_expression(expr, operators=operators, variable_names=variable_names, node_type=Node): Parse a Julia expression with a given context and create an Expression object.
Usage
This type is intended for end-users to interact with and manipulate expressions at a high level, abstracting away the complexities of the underlying expression tree operations.
These types allow you to define and manipulate expressions with a clear separation between the structure and the operators used.
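A minimal sketch of that separation: the same raw tree is wrapped with two different metadata NamedTuples (operators and variable names), and each wrapper renders differently. XTree, Wrapped and to_string are hypothetical names, not the DynamicExpressions.jl API.

```julia
# Hypothetical raw tree: op == 0 marks a feature leaf, otherwise an operator index.
struct XTree
    op::Int
    feature::Int            # feature index for leaves
    children::Vector{XTree}
end

# Hypothetical high-level wrapper: the raw tree plus a NamedTuple of metadata.
struct Wrapped{M<:NamedTuple}
    tree::XTree
    metadata::M
end

# Render using the operators and variable names stored in the metadata.
function to_string(t::XTree, md::NamedTuple)
    t.op == 0 && return md.variable_names[t.feature]
    opname = string(md.operators[t.op])
    args = [to_string(c, md) for c in t.children]
    return length(args) == 1 ? "$opname($(args[1]))" : "($(args[1]) $opname $(args[2]))"
end
to_string(e::Wrapped) = to_string(e.tree, e.metadata)
```

Wrapping the tree for x1 + x2 once with variable_names=["x1", "x2"] and once with ["mass", "height"] renders "(x1 + x2)" and "(mass + height)" respectively, without touching the tree itself.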
Parametric Expressions
Parametric expressions are a type of expression that includes parameters which can be optimized during the search.
DynamicExpressions.ParametricExpressionModule.ParametricExpression (Type)
ParametricExpression{T,N<:ParametricNode{T},D<:NamedTuple} <: AbstractExpression{T,N}
(Experimental) An expression to store parameters for a tree.
DynamicExpressions.ParametricExpressionModule.ParametricNode (Type)
A type of expression node that also stores a parameter index.
These types allow you to define expressions with parameters that can be tuned to fit the data better. You can specify the maximum number of parameters using the expression_options argument in SRRegressor.
Template Expressions
Template expressions allow you to specify predefined structures and constraints for your expressions. These use the new TemplateStructure type to define how expressions should be combined and evaluated.
SymbolicRegression.TemplateExpressionModule.TemplateExpression (Type)
TemplateExpression{T,F,N,E,TS,D} <: AbstractStructuredExpression{T,F,N,E,D}
A symbolic expression that allows the combination of multiple sub-expressions in a structured way, with constraints on variable usage.
TemplateExpression is designed for symbolic regression tasks where domain-specific knowledge or constraints must be imposed on the model's structure.
Constructor
TemplateExpression(trees; structure, operators, variable_names)
- trees: A NamedTuple holding the sub-expressions (e.g., f = Expression(...), g = Expression(...)).
- structure: A TemplateStructure which holds functions that define how the sub-expressions are combined in different contexts.
- operators: An OperatorEnum that defines the allowed operators for the sub-expressions.
- variable_names: An optional Vector of String that defines the names of the variables in the dataset.
Example
Let's create an example TemplateExpression that combines two sub-expressions f(x1, x2) and g(x3):
using SymbolicRegression
# Define operators and variable names
options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
operators = options.operators
variable_names = ["x1", "x2", "x3"]
# Create sub-expressions
x1 = Expression(Node{Float64}(; feature=1); operators, variable_names)
x2 = Expression(Node{Float64}(; feature=2); operators, variable_names)
x3 = Expression(Node{Float64}(; feature=3); operators, variable_names)
# Create TemplateExpression
example_expr = (; f=x1, g=x3)
st_expr = TemplateExpression(
example_expr;
structure=TemplateStructure{(:f, :g)}(nt -> sin(nt.f) + nt.g * nt.g),
operators,
variable_names,
)
We can also define constraints on which variables each sub-expression is allowed to access:
variable_constraints = (; f=[1, 2], g=[3])
st_expr = TemplateExpression(
example_expr;
structure=TemplateStructure(
nt -> sin(nt.f) + nt.g * nt.g; variable_constraints
),
operators,
variable_names,
)
When fitting a model in SymbolicRegression.jl, you would provide the TemplateExpression as the expression_type argument, and then pass expression_options=(; structure=TemplateStructure(...)) as additional options. The variable_constraints will constrain f to only have access to x1 and x2, and g to only have access to x3.
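The effect of variable_constraints can be illustrated with fixed Julia functions standing in for f and g: each sub-expression is handed only the features it may access, re-indexed from 1, and the results are combined as in nt -> sin(nt.f) + nt.g * nt.g. The names f, g, combine and evaluate_template are hypothetical, and this sketch makes no claim about how SymbolicRegression.jl enforces the constraint internally; it only shows what the constraint means for the data each part can see.

```julia
# Hypothetical stand-ins for learned sub-expressions. Each receives only its
# allowed features, re-indexed from 1 (so g's x[1] is dataset feature 3).
f(x) = x[1] * x[2]
g(x) = x[1] + 1.0

constraints = (; f=[1, 2], g=[3])

# Mirrors the structure nt -> sin(nt.f) + nt.g * nt.g from the example above.
combine(nt) = sin(nt.f) + nt.g * nt.g

# X has shape (nfeatures, nrows), matching the Dataset convention.
function evaluate_template(X::AbstractMatrix{Float64}, constraints::NamedTuple)
    return [combine((; f=f(view(X, constraints.f, j)), g=g(view(X, constraints.g, j))))
            for j in axes(X, 2)]
end
```

Because g is only ever given row 3 of X, nothing it computes can depend on x1 or x2.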
SymbolicRegression.TemplateExpressionModule.TemplateStructure (Type)
TemplateStructure{K,S,N,E,C} <: Function
A struct that defines a prescribed structure for a TemplateExpression, including functions that define the result of combining sub-expressions in different contexts.
The K parameter is used to specify the symbols representing the inner expressions. If not declared using the constructor TemplateStructure{K}(...), the keys of the variable_constraints NamedTuple will be used to infer this.
Fields
- combine: Optional function taking a NamedTuple of function keys => expressions, returning a single expression. Fallback method used by get_tree on a TemplateExpression to generate a single Expression.
- combine_vectors: Optional function taking a NamedTuple of function keys => vectors, returning a single vector. Used for evaluating the expression tree. You may optionally define a method with a second argument X if you wish to include the data matrix X (of shape [num_features, num_rows]) in the computation.
- combine_strings: Optional function taking a NamedTuple of function keys => strings, returning a single string. Used for printing the expression tree.
- variable_constraints: Optional NamedTuple that defines which variables each sub-expression is allowed to access. For example, requesting f(x1, x2) and g(x3) would be equivalent to (; f=[1, 2], g=[3]).
Example usage:
# Define a template structure
structure = TemplateStructure(
combine=e -> e.f + e.g, # Create normal `Expression`
combine_vectors=e -> (e.f .+ e.g), # Output vector
combine_strings=e -> "($(e.f)) + ($(e.g))", # Output string
variable_constraints=(; f=[1, 2], g=[3]) # Constrain dependencies
)
# Use in options
model = SRRegressor(;
expression_type=TemplateExpression,
expression_options=(; structure=structure)
)
The variable_constraints field allows you to specify which variables can be used in different parts of the expression.
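Each combine function receives a NamedTuple keyed by sub-expression name. The sketch below applies functions shaped like those in the TemplateStructure example to hand-made NamedTuples (the values are made up for illustration) to show what each one receives and returns. Note the $(e.f) form: in Julia string interpolation, "$e.f" would interpolate e itself and then append the literal text .f.

```julia
# Functions shaped like the combine_vectors / combine_strings fields.
combine_vectors = e -> e.f .+ e.g           # elementwise sum of per-row outputs
combine_strings = e -> "($(e.f)) + ($(e.g))" # $(...) interpolates the field value

# Hypothetical per-sub-expression results for three rows of data.
vecs = (; f=[1.0, 2.0, 3.0], g=[10.0, 20.0, 30.0])
strs = (; f="x1 * x2", g="cos(x3)")

combined = combine_vectors(vecs)
printed = combine_strings(strs)
```

Here combined is [11.0, 22.0, 33.0] and printed is "(x1 * x2) + (cos(x3))".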
Population
Groups of equations are given as a population, which is an array of trees tagged with score, loss, and birthdate; these values are stored in the PopMember.
SymbolicRegression.PopulationModule.Population (Type)
Population(pop::Array{PopMember{T,L}, 1})
Create population from list of PopMembers.
Population(dataset::Dataset{T,L};
population_size, nlength::Int=3, options::AbstractOptions,
nfeatures::Int)
Create a random population and score its members on the dataset.
Population(X::AbstractMatrix{T}, y::AbstractVector{T};
population_size, nlength::Int=3,
options::AbstractOptions, nfeatures::Int,
loss_type::Type=Nothing)
Create a random population and score its members on the dataset.
Population members
SymbolicRegression.PopMemberModule.PopMember (Type)
PopMember(t::AbstractExpression{T}, score::L, loss::L)
Create a population member with a birth date at the current time. The type of the Node may be different from the type of the score and loss.
Arguments
- t::AbstractExpression{T}: The tree for the population member.
- score::L: The score (normalized to a baseline, and offset by a complexity penalty).
- loss::L: The raw loss to assign.
PopMember(
dataset::Dataset{T,L},
t::AbstractExpression{T},
options::AbstractOptions
)
Create a population member with a birth date at the current time. Automatically compute the score for this tree.
Arguments
- dataset::Dataset{T,L}: The dataset to evaluate the tree on.
- t::AbstractExpression{T}: The tree for the population member.
- options::AbstractOptions: What options to use.
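A population can be pictured as a vector of members, each tagged with a score, a raw loss and a birth time. The Member struct and best_member helper below are a hypothetical sketch in which a lower score is treated as better; the real PopMember carries more information, and the sketch does not reproduce how the score is computed from the loss.

```julia
# Hypothetical population member; a String stands in for the expression.
struct Member{L}
    tree::String
    score::L           # normalized, complexity-penalized score (lower is better here)
    loss::L            # raw loss
    birth::Float64     # creation time, as returned by time()
end

# Stamp the birth time at construction, as the PopMember constructors do.
Member(tree::String, score::L, loss::L) where {L} = Member{L}(tree, score, loss, time())

# Pick the member with the lowest score.
best_member(pop::AbstractVector{<:Member}) = pop[argmin([m.score for m in pop])]
```

A population is then simply a Vector of such members, e.g. [Member("x1", 0.5, 0.4), Member("x1 * x2", 0.2, 0.1)], from which best_member selects "x1 * x2".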
Hall of Fame
SymbolicRegression.HallOfFameModule.HallOfFame (Type)
HallOfFame{T<:DATA_TYPE,L<:LOSS_TYPE}
List of the best members seen all time in .members, with .members[c] being the best member seen at complexity c. To include only the members which have actually been set, you can run .members[exists].
Fields
- members::Array{PopMember{T,L},1}: List of the best members seen all time. These are ordered by complexity, with .members[1] the member with complexity 1.
- exists::Array{Bool,1}: Whether the member at the given complexity has been set.
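The members/exists layout can be sketched as parallel vectors indexed by complexity, where a slot is filled only when a candidate beats the current best at that complexity. Hof and update! are hypothetical names, with a String standing in for each member; the update rule here is an illustration, not taken from the library.

```julia
# Hypothetical hall of fame indexed by complexity 1..maxsize.
struct Hof
    losses::Vector{Float64}   # best loss seen at each complexity
    trees::Vector{String}     # stand-in for the best member at each complexity
    exists::Vector{Bool}      # whether each complexity slot has been set
end
Hof(maxsize::Int) = Hof(fill(Inf, maxsize), fill("", maxsize), fill(false, maxsize))

# Record a candidate if its slot is empty or it beats the current best there.
function update!(h::Hof, tree::String, complexity::Int, loss::Float64)
    if !h.exists[complexity] || loss < h.losses[complexity]
        h.losses[complexity] = loss
        h.trees[complexity] = tree
        h.exists[complexity] = true
    end
    return h
end
```

As with .members[exists], logical indexing such as h.trees[h.exists] returns only the slots that have been filled, in order of complexity.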
Dataset
SymbolicRegression.CoreModule.DatasetModule.Dataset (Type)
Dataset{T<:DATA_TYPE,L<:LOSS_TYPE}
Fields
- X::AbstractMatrix{T}: The input features, with shape (nfeatures, n).
- y::AbstractVector{T}: The desired output values, with shape (n,).
- index::Int: The index of the output feature corresponding to this dataset, if any.
- n::Int: The number of samples.
- nfeatures::Int: The number of features.
- weights::Union{AbstractVector,Nothing}: If the dataset is weighted, these specify the per-sample weight (with shape (n,)).
- extra::NamedTuple: Extra information to pass to a custom evaluation function. Since this is an arbitrary named tuple, you could pass any sort of dataset you wish here.
- avg_y: The average value of y (weighted, if weights are passed).
- use_baseline: Whether to use a baseline loss. This will be set to false if the baseline loss is calculated to be Inf.
- baseline_loss: The loss of a constant function which predicts the average value of y. This is loss-dependent and should be updated with update_baseline_loss!.
- variable_names::Array{String,1}: The names of the features, with shape (nfeatures,).
- display_variable_names::Array{String,1}: A version of variable_names for printing to the terminal (e.g., with unicode versions).
- y_variable_name::String: The name of the output variable.
- X_units: Unit information of X. When used, this is a vector of DynamicQuantities.Quantity{<:Any,<:Dimensions} with shape (nfeatures,).
- y_units: Unit information of y. When used, this is a single DynamicQuantities.Quantity{<:Any,<:Dimensions}.
- X_sym_units: Unit information of X. When used, this is a vector of DynamicQuantities.Quantity{<:Any,<:SymbolicDimensions} with shape (nfeatures,).
- y_sym_units: Unit information of y. When used, this is a single DynamicQuantities.Quantity{<:Any,<:SymbolicDimensions}.
SymbolicRegression.LossFunctionsModule.update_baseline_loss! (Function)
update_baseline_loss!(dataset::Dataset{T,L}, options::AbstractOptions) where {T<:DATA_TYPE,L<:LOSS_TYPE}
Update the baseline loss of the dataset using the loss function specified in options.
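To make avg_y, baseline_loss and use_baseline concrete, here is a sketch that assumes a (weighted) mean squared error loss. SymbolicRegression.jl uses the loss specified in the options, so baseline and weighted_mean are hypothetical helpers, not update_baseline_loss! itself. In this sketch any non-finite baseline disables the baseline, which covers the Inf case described above.

```julia
# Weighted mean of y (plain mean when no weights are given).
weighted_mean(y, w) = w === nothing ? sum(y) / length(y) : sum(w .* y) / sum(w)

# Loss of the constant predictor avg_y, under an assumed (weighted) MSE loss.
function baseline(y::AbstractVector{Float64},
                  w::Union{AbstractVector{Float64},Nothing}=nothing)
    avg_y = weighted_mean(y, w)
    sq = (y .- avg_y) .^ 2
    loss = w === nothing ? sum(sq) / length(y) : sum(w .* sq) / sum(w)
    use_baseline = isfinite(loss)   # disable the baseline when it is Inf (or NaN)
    return (; avg_y, baseline_loss=loss, use_baseline)
end
```

For y = [1.0, 2.0, 3.0] without weights, avg_y is 2.0 and the baseline loss is 2/3; with y = [1.0, 3.0] and weights [3.0, 1.0], avg_y is 1.5 and the baseline loss is 0.75.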