PySRRegressor Reference

High-performance symbolic regression algorithm.

This is the scikit-learn interface for SymbolicRegression.jl. This model will automatically search for equations which fit a given dataset, subject to a particular loss and set of constraints.

Most default parameters have been tuned over several example equations, but you should adjust niterations, binary_operators, and unary_operators to your requirements. You can view more detailed explanations of the options on the options page of the documentation.
Parameters:

model_selection : str
    Model selection criterion when selecting a final expression from
    the list of best expressions at each complexity.
    Can be 'accuracy' or 'best' (the default).

binary_operators : list[str]
    List of strings for binary operators used in the search.
    See the operators page for more details.
    Default is None.

unary_operators : list[str]
    Operators which only take a single scalar as input.
    For example, "cos" or "exp".
    Default is None.

niterations : int
    Number of iterations of the algorithm to run. The best
    equations are printed and migrate between populations at the
    end of each iteration.
    Default is 40.

populations : int
    Number of populations running.
    Default is 15.

population_size : int
    Number of individuals in each population.
    Default is 33.

max_evals : int
    Limits the total number of evaluations of expressions to
    this number. Default is None.

maxsize : int
    Max complexity of an equation. Default is 20.

maxdepth : int
    Max depth of an equation. You can use both maxsize and
    maxdepth; maxdepth is by default not used.
    Default is None.

warmup_maxsize_by : float
    Whether to slowly increase max size from a small number up to
    the maxsize (if greater than 0). If greater than 0, says the
    fraction of training time at which the current maxsize will
    reach the user-passed maxsize.
    Default is 0.0.

timeout_in_seconds : float
    Make the search return early once this many seconds have passed.
    Default is None.

constraints : dict[str, int | tuple[int, int]]
    Dictionary of int (unary) or 2-tuples (binary) that enforces
    maxsize constraints on the individual arguments of operators.
    E.g., constraints={'pow': (-1, 1)} says that power laws can have
    any complexity in the left argument, but only complexity 1 in
    the right argument.
    Default is None.

nested_constraints : dict[str, dict]
    Specifies how many times a combination of operators can be
    nested. For example, {"sin": {"cos": 0}} specifies that cos may
    never appear within a sin.
    Default is None.
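To build intuition for how nested_constraints restricts expressions, here is a minimal Python sketch (not the backend's implementation) that checks the nesting rule on a toy expression tree, where sin(cos(x)) is written as the tuple ("sin", ("cos", "x")):

```python
# Toy sketch of the nested_constraints rule: {"sin": {"cos": 0}} means
# "cos" may appear at most 0 times anywhere inside a "sin" call.
# Expressions are nested tuples: ("sin", ("cos", "x")) is sin(cos(x)).

def count_op(expr, op):
    """Count occurrences of `op` in a tuple-based expression tree."""
    if not isinstance(expr, tuple):
        return 0
    head, *args = expr
    return (head == op) + sum(count_op(a, op) for a in args)

def satisfies(expr, nested_constraints):
    """Check every subtree against the allowed nesting counts."""
    if not isinstance(expr, tuple):
        return True
    head, *args = expr
    limits = nested_constraints.get(head, {})
    for inner_op, max_count in limits.items():
        if sum(count_op(a, inner_op) for a in args) > max_count:
            return False
    return all(satisfies(a, nested_constraints) for a in args)

constraints = {"sin": {"cos": 0}}  # cos may never appear inside sin
```

With these constraints, sin(cos(x)) is rejected while cos(sin(x)) and sin(sin(x)) are allowed.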

loss : str
    String of Julia code specifying the loss function. Can either
    be a loss from LossFunctions.jl, or your own loss written as a
    function. Examples of custom written losses include
    myloss(x, y) = abs(x - y) for non-weighted, or
    myloss(x, y, w) = w * abs(x - y) for weighted.
    Default is 'L2DistLoss()'.

complexity_of_operators : dict[str, float]
    If you would like to use a complexity other than 1 for an
    operator, specify the complexity here. For example,
    {"sin": 2, "+": 1} gives each use of sin a complexity of 2 and
    each use of + a complexity of 1.
    Default is None.

complexity_of_constants : float
    Complexity of constants. Default is 1.

complexity_of_variables : float
    Complexity of variables. Default is 1.

parsimony : float
    Multiplicative factor for how much to punish complexity.
    Default is 0.0032.

use_frequency : bool
    Whether to measure the frequency of complexities, and use that
    instead of parsimony to explore equation space. Will naturally
    find equations of all complexities.
    Default is True.

use_frequency_in_tournament : bool
    Whether to use the frequency mentioned above in the tournament,
    rather than just the simulated annealing.
    Default is True.

alpha : float
    Initial temperature for simulated annealing
    (requires annealing to be True).
    Default is 0.1.

annealing : bool
    Whether to use annealing. Default is False.

early_stop_condition : float | str
    Stop the search early if this loss is reached. You may also
    pass a string containing a Julia function which
    takes a loss and complexity as input, for example:
    "f(loss, complexity) = (loss < 0.1) && (complexity < 10)".
    Default is None.
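The early-stopping string is just a predicate over (loss, complexity). For intuition, its Python equivalent would look like this (an illustration only; PySR itself requires the Julia string):

```python
# Python rendering of the Julia predicate
#   "f(loss, complexity) = (loss < 0.1) && (complexity < 10)"
def early_stop(loss, complexity):
    """Return True when the search should stop early."""
    return loss < 0.1 and complexity < 10
```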

ncyclesperiteration : int
    Number of total mutations to run, per 10 samples of the
    population, per iteration.
    Default is 550.

fraction_replaced : float
    How much of the population to replace with migrating equations
    from other populations.
    Default is 0.000364.

fraction_replaced_hof : float
    How much of the population to replace with migrating equations
    from the hall of fame. Default is 0.035.

weight_add_node : float
    Relative likelihood for mutation to add a node.
    Default is 0.79.

weight_insert_node : float
    Relative likelihood for mutation to insert a node.
    Default is 5.1.

weight_delete_node : float
    Relative likelihood for mutation to delete a node.
    Default is 1.7.

weight_do_nothing : float
    Relative likelihood for mutation to leave the individual as-is.
    Default is 0.21.

weight_mutate_constant : float
    Relative likelihood for mutation to change a constant slightly
    in a random direction.
    Default is 0.048.

weight_mutate_operator : float
    Relative likelihood for mutation to swap an operator.
    Default is 0.47.

weight_randomize : float
    Relative likelihood for mutation to completely delete and then
    randomly generate the equation.
    Default is 0.00023.

weight_simplify : float
    Relative likelihood for mutation to simplify constant parts by
    evaluation.
    Default is 0.002.

crossover_probability : float
    Absolute probability of a crossover-type genetic operation,
    instead of a mutation.
    Default is 0.066.
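The weight_* options above are relative, not absolute probabilities. A hypothetical sketch (not the backend's code) of how relative weights translate into a mutation choice:

```python
import random

# Relative mutation weights, matching several of the defaults above;
# they are normalized into probabilities before a mutation is chosen.
weights = {
    "add_node": 0.79,
    "insert_node": 5.1,
    "delete_node": 1.7,
    "do_nothing": 0.21,
    "mutate_constant": 0.048,
    "mutate_operator": 0.47,
}

total = sum(weights.values())
probabilities = {k: w / total for k, w in weights.items()}

# Sample one mutation according to the relative weights:
rng = random.Random(0)
choice = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
```

With these defaults, insert_node is by far the most likely mutation.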

skip_mutation_failures : bool
    Whether to skip mutation and crossover failures, rather than
    simply re-sampling the current member.
    Default is True.

migration : bool
    Whether to migrate. Default is True.

hof_migration : bool
    Whether to have the hall of fame migrate. Default is True.

topn : int
    How many top individuals migrate from each population.
    Default is 12.

should_optimize_constants : bool
    Whether to numerically optimize constants (Nelder-Mead/Newton)
    at the end of each iteration. Default is True.

optimizer_algorithm : str
    Optimization scheme to use for optimizing constants. Can
    currently be 'NelderMead' or 'BFGS'.
    Default is 'BFGS'.

optimizer_nrestarts : int
    Number of times to restart the constants optimization process
    with different initial conditions.
    Default is 2.

optimize_probability : float
    Probability of optimizing the constants during a single
    iteration of the evolutionary algorithm.
    Default is 0.14.

optimizer_iterations : int
    Number of iterations that the constants optimizer can take.
    Default is 8.

perturbation_factor : float
    Constants are perturbed by a max factor of
    (perturbation_factor * T + 1). They are either multiplied by
    this factor or divided by it.
    Default is 0.076.
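A sketch of the perturbation rule just described, under the assumed form that the multiplier is drawn between 1 and perturbation_factor*T + 1 (an illustration, not the backend's exact sampling scheme):

```python
import random

def perturb_constant(c, perturbation_factor=0.076, T=1.0, rng=random):
    """Perturb `c` by a factor of up to (perturbation_factor*T + 1),
    randomly choosing to multiply or divide, per the rule above."""
    max_factor = perturbation_factor * T + 1.0
    factor = 1.0 + rng.random() * (max_factor - 1.0)  # in [1, max_factor]
    return c * factor if rng.random() < 0.5 else c / factor

rng = random.Random(42)
new_c = perturb_constant(2.0, rng=rng)
```

The result is always within a factor of 1.076 (at T = 1) of the original constant.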

tournament_selection_n : int
    Number of expressions to consider in each tournament.
    Default is 10.

tournament_selection_p : float
    Probability of selecting the best expression in each
    tournament. The probability will decay as p*(1-p)^n for other
    expressions, sorted by loss.
    Default is 0.86.
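The decay p*(1-p)^n can be computed directly. In this sketch, n is the 0-based rank by loss within the tournament, and the leftover probability mass is assigned to the worst-ranked member so the distribution sums to 1 (a simplifying assumption of the sketch):

```python
def tournament_probabilities(p=0.86, n_members=10):
    """Selection probability for the rank-n expression (sorted by
    loss): p*(1-p)**n, with the remaining mass on the last rank."""
    probs = [p * (1 - p) ** n for n in range(n_members - 1)]
    probs.append(1.0 - sum(probs))  # remainder to the worst rank
    return probs

probs = tournament_probabilities()
```

With the defaults, the best expression is picked 86% of the time, the second best about 12% of the time, and so on geometrically.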

procs : int
    Number of processes (= number of populations running).
    Default is cpu_count().

multithreading : bool
    Use multithreading instead of the distributed backend.
    Using procs=0 will turn off both.
    Default is None.

cluster_manager : str
    For distributed computing, this sets the job queue system. Set
    to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
    "htc". If set to one of these, PySR will run in distributed
    mode, and use procs to figure out how many processes to launch.
    Default is None.

batching : bool
    Whether to compare population members on small batches during
    evolution. Still uses the full dataset for comparing against
    the hall of fame. Default is False.

batch_size : int
    The amount of data to use if doing batching. Default is 50.

fast_cycle : bool
    Batch over population subsamples. This is a slightly different
    algorithm than regularized evolution, but does cycles 15%
    faster. May be algorithmically less efficient.
    Default is False.

precision : int
    What precision to use for the data. By default this is 32
    (i.e., float32).
    Default is 32.

random_state : int, Numpy RandomState instance or None
    Pass an int for reproducible results across multiple function
    calls.
    Default is None.

deterministic : bool
    Make a PySR search give the same result every run.
    To use this, you must turn off parallelism
    (with procs=0 and multithreading=False) and set random_state.
    Default is False.

warm_start : bool
    Tells fit to continue from where the last call to fit finished.
    If false, each call to fit will be fresh, overwriting previous
    results.
    Default is False.

verbosity : int
    What verbosity level to use. 0 means minimal print statements.
    Default is 1e9.

update_verbosity : int
    What verbosity level to use for package updates.
    Will take the value of verbosity if not given.
    Default is None.

progress : bool
    Whether to use a progress bar instead of printing to stdout.
    Default is True.

equation_file : str
    Where to save the files (.csv extension).
    Default is None.

temp_equation_file : bool
    Whether to put the hall of fame file in the temp directory.
    Deletion is then controlled with the delete_tempfiles parameter.
    Default is False.

tempdir : str
    Directory for the temporary files. Default is None.

delete_tempfiles : bool
    Whether to delete the temporary files after finishing.
    Default is True.

julia_project : str
    A Julia environment location containing a Project.toml (and
    potentially the source code for SymbolicRegression.jl). The
    default gives the Python package directory, where a
    Project.toml file should be present from the install.
    Default is None.

update : bool
    Whether to automatically update Julia packages.
    Default is True.

output_jax_format : bool
    Whether to create a 'jax_format' column in the output,
    containing jax-callable functions and the default parameters
    in a jax array.
    Default is False.

output_torch_format : bool
    Whether to create a 'torch_format' column in the output,
    containing a torch module with trainable parameters.
    Default is False.

extra_sympy_mappings : dict[str, Callable]
    Provides mappings between custom operators defined in Julia
    strings and the same operators defined in SymPy. E.g., if
    inv(x) = 1/x is defined as a custom operator, you would
    provide {"inv": lambda x: 1/x}.
    Default is None.

extra_jax_mappings : dict[Callable, str]
    Similar to extra_sympy_mappings, but for model export to jax:
    maps SymPy functions to jax function names, e.g.,
    {sympy.sin: "jnp.sin"}.
    Default is None.

extra_torch_mappings : dict[Callable, Callable]
    The same as extra_jax_mappings, but for model export to
    pytorch, e.g., {sympy.sin: torch.sin}.
    Default is None.

denoise : bool
    Whether to use a Gaussian Process to denoise the data before
    inputting to PySR. Can help PySR fit noisy data.
    Default is False.

select_k_features : int
    Whether to run feature selection in Python using random
    forests, before passing to the symbolic regression code. None
    means no feature selection; an int means select that many
    features.
    Default is None.
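PySR's select_k_features option uses random forests; as a dependency-light stand-in, the same idea can be sketched with absolute correlation against the target instead of forest importances (a deliberate simplification, not what PySR does internally):

```python
import numpy as np

def select_k_features_by_correlation(X, y, k):
    """Rank features by |corr(feature, y)| and keep the top k column
    indices -- a correlation-based stand-in for PySR's
    random-forest-based feature selection."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[-k:].tolist())

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = 3.0 * X[:, 1] + 0.1 * rng.randn(200)  # only feature 1 matters
selected = select_k_features_by_correlation(X, y, k=1)
```

On this synthetic data, only the informative column (index 1) survives the selection.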

**kwargs : dict
    Supports deprecated keyword arguments. Other arguments will
    result in an error.
    Default is {}.

Attributes:

equations_ : pandas.DataFrame | list[pandas.DataFrame]
    Processed DataFrame containing the results of model fitting.

n_features_in_ : int
    Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)
    Names of features seen during fit.

nout_ : int
    Number of output dimensions.

selection_mask_ : list[int] of length select_k_features
    List of indices for input features that are selected when
    select_k_features is set.

tempdir_ : Path
    Path to the temporary equations directory.

equation_file_ : str
    Output equation file name produced by the Julia backend.

raw_julia_state_ : tuple[list[PyCall.jlwrap], PyCall.jlwrap]
    The state of the Julia SymbolicRegression.jl backend after
    fitting.

equation_file_contents_ : list[pandas.DataFrame]
    Contents of the equation file output by the Julia backend.

show_pickle_warnings_ : bool
    Whether to show warnings about which attributes can be pickled.
Examples:
>>> import numpy as np
>>> from pysr import PySRRegressor
>>> randstate = np.random.RandomState(0)
>>> X = 2 * randstate.randn(100, 5)
>>> # y = 2.5382 * cos(x_3) + x_0^2 - 0.5
>>> y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5
>>> model = PySRRegressor(
...     niterations=40,
...     binary_operators=["+", "*"],
...     unary_operators=[
...         "cos",
...         "exp",
...         "sin",
...         "inv(x) = 1/x",  # Custom operator (julia syntax)
...     ],
...     model_selection="best",
...     loss="loss(x, y) = (x - y)^2",  # Custom loss function (julia syntax)
... )
>>> model.fit(X, y)
>>> model
PySRRegressor.equations_ = [
0       0.000000                                          3.8552167  3.360272e+01     1
1       1.189847                                          (x0 * x0)  3.110905e+00     3
2       0.010626                           ((x0 * x0) + 0.25573406)  3.045491e+00     5
3       0.896632                              (cos(x3) + (x0 * x0))  1.242382e+00     6
4       0.811362                ((x0 * x0) + (cos(x3) * 2.4384754))  2.451971e-01     8
5 >>>> 13.733371         (((cos(x3) * 2.5382) + (x0 * x0)) + -0.5)  2.889755e-13    10
6       0.194695  ((x0 * x0) + (((cos(x3) + 0.063180044) * 2.53...  1.957723e-13    12
7       0.006988  ((x0 * x0) + (((cos(x3) + 0.32505524) * 1.538...  1.944089e-13    13
8       0.000955  (((((x0 * x0) + cos(x3)) + 0.8251649) + (cos(...  1.940381e-13    15
]
>>> model.score(X, y)
1.0
>>> model.predict(np.array([1,2,3,4,5]))
array([-1.15907818, -1.15907818, -1.15907818, -1.15907818, -1.15907818])

pysr.sr.PySRRegressor.fit(X, y, Xresampled=None, weights=None, variable_names=None)

Search for equations to fit the dataset and store them in self.equations_.
Parameters:

X : ndarray | pandas.DataFrame
    Training data of shape (n_samples, n_features).

y : ndarray | pandas.DataFrame
    Target values of shape (n_samples,) or (n_samples, n_targets).
    Will be cast to X's dtype if necessary.

Xresampled : ndarray | pandas.DataFrame
    Resampled training data, of shape (n_resampled, n_features), on
    which to generate denoised data. This will be used as the
    training data, rather than X. Used only if denoise=True.
    Default is None.

weights : ndarray | pandas.DataFrame
    Weight array of the same shape as y. Each element specifies how
    much to weight the loss for that particular element of y.
    Default is None.

variable_names : list[str]
    A list of names for the variables, rather than "x0", "x1", etc.
    If X is a pandas DataFrame, its column names are used instead.
    Default is None.

Returns:

self : object
    Fitted estimator.

pysr.sr.PySRRegressor.predict(X, index=None)

Predict y from input X using the equation chosen by model_selection.

You may see what equation is used by printing this object. X should have the same columns as the training data.
Parameters:

X : ndarray | pandas.DataFrame
    Data of shape (n_samples, n_features).

index : int | list[int]
    If you want to compute the output of an expression using a
    particular row of self.equations_, you may specify the index
    here.
    Default is None.

Returns:

y_predicted : ndarray of shape (n_samples, nout_)
    Values predicted by substituting X into the fitted symbolic
    regression model.

Raises:

ValueError
    Raised if the chosen equation cannot be evaluated.

pysr.sr.PySRRegressor.from_file(equation_file, *, binary_operators=None, unary_operators=None, n_features_in=None, feature_names_in=None, selection_mask=None, nout=1, **pysr_kwargs) (classmethod)

Create a model from a saved model checkpoint or equation file.
Parameters:

equation_file : str
    Path to a pickle file containing a saved model, or a csv file
    containing equations.

binary_operators : list[str]
    The same binary operators used when creating the model.
    Not needed if loading from a pickle file.
    Default is None.

unary_operators : list[str]
    The same unary operators used when creating the model.
    Not needed if loading from a pickle file.
    Default is None.

n_features_in : int
    Number of features passed to the model.
    Not needed if loading from a pickle file.
    Default is None.

feature_names_in : list[str]
    Names of the features passed to the model.
    Not needed if loading from a pickle file.
    Default is None.

selection_mask : list[bool]
    If using select_k_features, you must pass model.selection_mask_
    here. Not needed if loading from a pickle file.
    Default is None.

nout : int
    Number of outputs of the model.
    Not needed if loading from a pickle file.
    Default is 1.

**pysr_kwargs : dict
    Any other keyword arguments to initialize the PySRRegressor
    object. These will overwrite those stored in the pickle file.
    Not needed if loading from a pickle file.
    Default is {}.

Returns:

model : PySRRegressor
    The model with fitted equations.

pysr.sr.PySRRegressor.sympy(index=None)

Return sympy representation of the equation(s) chosen by model_selection.

Parameters:

index : int | list[int]
    If you wish to select a particular equation from
    self.equations_, give the index number here.
    Default is None.

Returns:

best_equation : str | list[str] of length nout_
    SymPy representation of the best equation.
pysr.sr.PySRRegressor.latex(index=None, precision=3)

Return latex representation of the equation(s) chosen by model_selection.

Parameters:

index : int | list[int]
    If you wish to select a particular equation from
    self.equations_, give the index number here.
    Default is None.

precision : int
    The number of significant figures shown in the LaTeX
    representation.
    Default is 3.

Returns:

best_equation : str | list[str] of length nout_
    LaTeX expression of the best equation.
pysr.sr.PySRRegressor.pytorch(index=None)

Return pytorch representation of the equation(s) chosen by model_selection.

Each equation (multiple given if there are multiple outputs) is a PyTorch module containing the parameters as trainable attributes. You can use the module like any other PyTorch module: module(X), where X is a tensor with the same column ordering as trained with.

Parameters:

index : int | list[int]
    If you wish to select a particular equation from
    self.equations_, give the index number here.
    Default is None.

Returns:

best_equation : torch.nn.Module
    PyTorch module representing the expression.
pysr.sr.PySRRegressor.jax(index=None)

Return jax representation of the equation(s) chosen by model_selection.

Each equation (multiple given if there are multiple outputs) is a dictionary containing {"callable": func, "parameters": params}. To call func, pass func(X, params). This function is differentiable using jax.grad.

Parameters:

index : int | list[int]
    If you wish to select a particular equation from
    self.equations_, give the index number here.
    Default is None.

Returns:

best_equation : dict[str, Any]
    Dictionary with a callable jax function under the "callable"
    key, and a jax array of parameters under the "parameters" key.
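The {"callable": func, "parameters": params} pattern can be illustrated with a plain numpy sketch (the real return value uses jax arrays, and func(X, params) is then differentiable with jax.grad); here func encodes the expression 2.5382*cos(x3) + x0**2 - 0.5 from the example above:

```python
import numpy as np

# Sketch of the returned structure: a pure function of (X, params).
def func(X, params):
    """Evaluate params[0]*cos(x3) + x0**2 + params[1] row-wise."""
    return params[0] * np.cos(X[:, 3]) + X[:, 0] ** 2 + params[1]

equation = {"callable": func, "parameters": np.array([2.5382, -0.5])}

X = np.zeros((1, 5))
y = equation["callable"](X, equation["parameters"])
```

At X = 0 this evaluates to 2.5382*cos(0) + 0 - 0.5 = 2.0382.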
pysr.sr.PySRRegressor.latex_table(indices=None, precision=3, columns=['equation', 'complexity', 'loss', 'score'])

Create a LaTeX/booktabs table for all, or some, of the equations.

Parameters:

indices : list[int] | list[list[int]]
    If you wish to select a particular subset of equations from
    self.equations_, give the row numbers here.
    Default is None.

precision : int
    The number of significant figures shown in the LaTeX
    representations.
    Default is 3.

columns : list[str]
    Which columns to include in the table.
    Default is ['equation', 'complexity', 'loss', 'score'].

Returns:

latex_table_str : str
    A string that will render a table in LaTeX of the equations.
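To show the kind of string a booktabs table amounts to, here is a rough sketch of its shape (not the exact output of latex_table):

```python
def booktabs_table(rows, columns=("equation", "complexity", "loss")):
    """Build a minimal LaTeX/booktabs table from row tuples --
    a sketch of the kind of string latex_table returns."""
    header = " & ".join(columns) + r" \\"
    body = "\n".join(
        " & ".join(str(v) for v in row) + r" \\" for row in rows
    )
    return "\n".join([
        r"\begin{tabular}{lcc}",
        r"\toprule", header, r"\midrule", body, r"\bottomrule",
        r"\end{tabular}",
    ])

table = booktabs_table([("$x_0^2$", 3, 3.11), (r"$\cos(x_3) + x_0^2$", 6, 1.24)])
```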
pysr.sr.PySRRegressor.refresh(checkpoint_file=None)

Update self.equations_ with any new options passed.

For example, updating extra_sympy_mappings will require a .refresh() to update the equations.

Parameters:

checkpoint_file : str
    Path to the checkpoint hall-of-fame file to be loaded.
    By default, the previously set equation_file_ is used.
    Default is None.