Learning Symbolic Physics with Graph Networks
Miles D. Cranmer
Princeton University
Princeton, NJ, USA
mcranmer@princeton.edu
Rui Xu
Princeton University
Princeton, NJ, USA
ruix@princeton.edu
Peter Battaglia
DeepMind
London, UK
peterbattaglia@google.com
Shirley Ho
Flatiron Institute
New York City, NY, USA
shirleyho@flatironinstitute.org
Abstract
We introduce an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. Our experiments show that our graph network models, which implement this inductive bias, can learn message representations equivalent to the true force vector when trained on n-body gravitational and spring-like simulations. We use symbolic regression to fit explicit algebraic equations to our trained model's message function and recover the symbolic form of Newton's law of gravitation without prior knowledge. We also show that our model generalizes better at inference time to systems with more bodies than had been experienced during training. Our approach is extensible, in principle, to any unknown interaction law learned by a graph network, and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.
1 Introduction
Discovering laws through observation of natural phenomena is the central challenge of the sciences. Modern deep learning also involves discovering knowledge about the world, but focuses mostly on implicit knowledge representations rather than explicit and interpretable ones. One reason is that the goal of deep learning is often to optimize test accuracy and learning efficiency in narrowly specified domains, while science seeks causal explanations and general-purpose knowledge across a wide range of phenomena. Here we explore an approach for imposing physically motivated inductive biases on neural networks, training them to predict the dynamics of physical systems, and interpreting their learned representations and computations to discover the symbolic physical laws which govern the systems. Moreover, our results show that this approach improves the generalization performance of the learned models.
The first ingredient in our approach is the "graph network" (GN) [Battaglia et al., 2018], a type of graph neural network [Scarselli et al., 2009, Bronstein et al., 2017, Gilmer et al., 2017] which is effective at learning the dynamics of complex physical systems [Battaglia et al., 2016, Chang et al., 2016, Sanchez-Gonzalez et al., 2018, Mrowca et al., 2018, Li et al., 2018, Kipf et al., 2018]. We impose inductive biases on the architecture and train models with supervised learning to predict the dynamics of 2D and 3D n-body gravitational systems and a hanging string. If the trained models can accurately predict the physical dynamics of held-out test data, we can assume they have discovered some level of general-purpose physical knowledge, which is implicitly encoded in their
weights. Crucially, we recognize that the forms of the graph network's message and pooling functions correspond to the forms of force and superposition in classical mechanics, respectively. The message pooling is what we call a "linearized latent space": a vector space in which latent representations of the interactions between bodies (forces or messages) are linear (summable). By imposing our inductive bias, we encourage the GN's linearized latent space to match the true one. Other interesting approaches for learning low-dimensional general dynamical models include Packard et al. [1980], Daniels and Nemenman [2015], and Jaques et al. [2019].
The second ingredient is symbolic regression: we use eureqa from Schmidt and Lipson [2009] to fit compact algebraic expressions to a set of inputs and messages produced by our trained model. eureqa works by randomly combining mathematical building blocks (operators, analytic functions, constants, and state variables) and iteratively searching the space of mathematical expressions for the model that best fits a given dataset. The resulting symbolic expressions are interpretable and readily comparable with physical laws.
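eureqa itself is proprietary, so we cannot show its actual algorithm, but the flavor of such a search can be sketched with a toy random search over expression strings; everything below (the variable set, the search budget, the scoring) is illustrative only:

```python
import random
import numpy as np

def random_expr(depth=0):
    # Build a random algebraic expression over variables x, r and constants.
    if depth > 2 or random.random() < 0.4:
        return random.choice(["x", "r", str(round(random.uniform(-2, 2), 2))])
    op = random.choice("+-*/")
    return f"({random_expr(depth + 1)} {op} {random_expr(depth + 1)})"

def mse(expr, x, r, target):
    # Toy evaluation of the expression string; unsafe outside a demo.
    with np.errstate(all="ignore"):
        pred = eval(expr, {"x": x, "r": r})
    err = np.mean((pred - target) ** 2)
    return err if np.isfinite(err) else np.inf

# Toy data following a known law, target = x / r**2:
x, r = np.random.randn(100), np.random.rand(100) + 0.5
target = x / r**2
best = min((random_expr() for _ in range(20000)),
           key=lambda e: mse(e, x, r, target))
print(best, mse(best, x, r, target))
```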
The contributions of this paper are:
1. A modified GN with inductive biases that promote learning general-purpose physical laws.
2. Using symbolic regression to extract analytical physical laws from trained neural networks.
3. Improved zero-shot generalization to larger systems than those in training.
2 Model
Graph networks are a type of deep neural network which operates on graph-structured data. The format of the graphs on which GNs operate is defined as 3-tuples, $G = (\mathbf{u}, V, E)$, where (we adhere closely to the notation used in Battaglia et al. [2018] to formalize our model):

$\mathbf{u} \in \mathbb{R}^{L_u}$ is a global attribute vector of length $L_u$;

$V = \{\mathbf{v}_i\}_{i=1:N_v}$ is a set of node attribute vectors, with $\mathbf{v}_i \in \mathbb{R}^{L_v}$ of length $L_v$; and

$E = \{(\mathbf{e}_k, r_k, s_k)\}_{k=1:N_e}$ is a set of edge attribute vectors, with $\mathbf{e}_k \in \mathbb{R}^{L_e}$ of length $L_e$, and indices $r_k, s_k \in \{1:N_v\}$ of the "receiver" and "sender" nodes connected by the $k$-th edge.
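As a concrete container for this 3-tuple, a minimal sketch might look as follows; the field names are ours for illustration, not a library API:

```python
from typing import NamedTuple
import numpy as np

class Graph(NamedTuple):
    u: np.ndarray          # global attribute vector, shape [L_u]
    V: np.ndarray          # node attribute vectors, shape [N_v, L_v]
    E: np.ndarray          # edge attribute vectors, shape [N_e, L_e]
    receivers: np.ndarray  # r_k indices into V, shape [N_e]
    senders: np.ndarray    # s_k indices into V, shape [N_e]
```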
Our GN implementation is depicted in fig. 1; note that it does not include global and edge attributes. This GN processes a graph by first computing pairwise interactions, or "messages", $\mathbf{e}'_k$, between nodes connected by edges, with a "message function" $\phi^e: \mathbb{R}^{L_v} \times \mathbb{R}^{L_v} \times \mathbb{R}^{L_e} \to \mathbb{R}^{L_{e'}}$. Next, the set of messages incident on each $i$-th receiver node is pooled into $\bar{\mathbf{e}}'_i = \rho^{e \to v}(\{\mathbf{e}'_k\}_{r_k=i,\,k=1:N_e})$, where $\bar{\mathbf{e}}'_i \in \mathbb{R}^{L_{e'}}$ and $\rho^{e \to v}$ is a permutation-invariant operation which can take variable numbers of input vectors, such as elementwise summation. Finally, the pooled messages are used to compute node updates, $\mathbf{v}'_i$, with a "node update function" $\phi^v: \mathbb{R}^{L_v} \times \mathbb{R}^{L_{e'}} \to \mathbb{R}^{L_{v'}}$. Our specific architectural implementation is very similar to the "interaction network" (IN) variant [Battaglia et al., 2016].
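A minimal sketch of this forward pass in plain Python, assuming no global or edge attributes (as in our implementation), where phi_e and phi_v stand in for the learned functions:

```python
import numpy as np

def gn_forward(V, edges, phi_e, phi_v, L_e_prime):
    """One GN pass. V: [N_v, L_v] node attributes; edges: list of (r_k, s_k)
    receiver/sender index pairs; phi_e, phi_v: arbitrary vector functions."""
    pooled = np.zeros((len(V), L_e_prime))             # accumulators for ē'_i
    for r, s in edges:
        pooled[r] += phi_e(V[r], V[s])                 # message e'_k, summed (ρ)
    return np.stack([phi_v(V[i], pooled[i]) for i in range(len(V))])  # v'_i
```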
The forms of $\phi^e$, $\rho^{e \to v}$, $\phi^v$, and the associated input and output attribute vectors have correspondences to Newton's formulation of classical mechanics, which motivated the original development of INs. The key observation is that $\mathbf{e}'_k$ could learn to correspond to the force vector imposed on the $r_k$-th body due to its interaction with the $s_k$-th body. In our examples, the force vector is equal to the derivative of the Lagrangian, $\partial L / \partial \mathbf{q}$; this could be imposed more generally if one knows $\frac{d}{dt}\left(\partial L / \partial \dot{\mathbf{q}}\right)$ and manually integrates the ODE with the output of the graph net. In a general n-body gravitational system in $D$ dimensions, the forces are minimally represented in an $\mathbb{R}^D$ vector space. Thus, if $L_{e'} = D$, we exploit the GN's "linearized latent space" for physical interpretability: we encourage $\mathbf{e}'_k$ to be the force.
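To make this concrete with a standard textbook step that is not specific to our model: for a Lagrangian of the form $L = \sum_i \frac{1}{2} m_i \dot{\mathbf{q}}_i^2 - V(\mathbf{q})$, the Euler-Lagrange equations give

$$\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{\mathbf{q}}_i}\right) = \frac{\partial L}{\partial \mathbf{q}_i} \quad\Longrightarrow\quad m_i\,\ddot{\mathbf{q}}_i = -\frac{\partial V}{\partial \mathbf{q}_i} = \mathbf{f}_i,$$

so the net force on a body is exactly the quantity the pooled messages must encode, up to an invertible transformation, for the node update to produce the correct velocity change.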
We sketch a non-rigorous, proof-like demonstration of our hypothesis. Newtonian mechanics prescribes that force vectors, $\mathbf{f}_k \in F$, can be summed to produce a net force, $\sum_k \mathbf{f}_k = \bar{\mathbf{f}} \in F$, which can then be used to update the dynamics of a body. Our model uses the $i$-th body's pooled messages, $\bar{\mathbf{e}}'_i$, to update the body's velocity via Euler integration, $\mathbf{v}'_i = \mathbf{v}_i + \phi^v(\mathbf{v}_i, \bar{\mathbf{e}}'_i)$. If we assume our GN is trained to predict velocity updates perfectly for any number of bodies, this means $\bar{\mathbf{f}}_i = \sum_{r_k=i} \mathbf{f}_k = \phi^v_i\left(\sum_{r_k=i} \mathbf{e}'_k\right) = \phi^v_i(\bar{\mathbf{e}}'_i)$, where $\phi^v_i(\cdot) = \phi^v(\mathbf{v}_i, \cdot)$. We have the result for a single interaction: $\bar{\mathbf{f}}_i = \mathbf{f}_{k,r_k=i} = \phi^v_i(\mathbf{e}'_{k,r_k=i}) = \phi^v_i(\bar{\mathbf{e}}'_i)$. Thus, we can substitute into the multi-interaction case: $\sum_{r_k=i} \phi^v_i(\mathbf{e}'_k) = \phi^v_i(\bar{\mathbf{e}}'_i) = \phi^v_i\left(\sum_{r_k=i} \mathbf{e}'_k\right)$, and so $\phi^v_i$ has to be a linear transformation. Therefore, for cases where $\phi^v_i$ is invertible (mapping between spaces of the same dimension), $\mathbf{e}'_k = (\phi^v_i)^{-1}(\mathbf{f}_k)$, and so the message vectors are linear transformations of the true forces when $L_{e'} = D$. We demonstrate this hypothesis on trained GNs in section 3.
Figure 1: A schematic depicting how we extract physical knowledge from a GN. Over a single timestep, node pairs are mapped to messages by the first MLP ($\phi^e$), messages incident on each node are summed, and the node update by the second MLP ($\phi^v$) produces the nodes at the next timestep; backpropagation learns the simulator, the message dimension is minimized, and the message function is converted to a symbolic equation via symbolic regression.
3 Experiments
We set up 100,000 simulations with random masses and initial conditions, for 1000 time steps each, in four environments: $1/r$ and $1/r^2$ force laws in 2D, a $1/r^2$ law in 3D, and a string in 2D with an $r^2$ force law between nodes and a global gravity. These laws are chosen arbitrarily as examples of different symbolic forms. The three n-body problems have six bodies in their training sets, and the string has ten nodes, of which the two end nodes are fixed. We train a GN on each of these problems, choosing $L_{e'} = D$; i.e., the length of the message vectors in the GN matches the dimensionality of the force vectors: 2 for the 2D simulations and 3 for the 3D simulations. Our GN, a pure TensorFlow [Abadi et al., 2015] model, implements both $\phi^e$ and $\phi^v$ as three-hidden-layer multilayer perceptrons (MLPs) with 128 hidden nodes per layer and ReLU activations. We optimize the L1 loss between the predicted velocity update and the true velocity update of each node.
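A sketch of this architecture in modern TensorFlow (the original model was TF1-era; the names and shapes below are illustrative, not the authors' code):

```python
import tensorflow as tf

def mlp(out_dim):
    # Three hidden layers of 128 ReLU units, matching the description above.
    return tf.keras.Sequential([tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(out_dim)])

D = 3                # message dimension L_e' = D (3 for the 3D simulations)
phi_e = mlp(D)       # message function
phi_v = mlp(D)       # node update function: predicts each body's velocity update

def predict_velocity_update(nodes, senders, receivers):
    # nodes: [N_v, L_v] float32; senders/receivers: [N_e] int32 edge indices.
    pair = tf.concat([tf.gather(nodes, receivers),
                      tf.gather(nodes, senders)], axis=-1)
    messages = phi_e(pair)                                   # e'_k
    pooled = tf.math.unsorted_segment_sum(                   # ē'_i = Σ_{r_k=i} e'_k
        messages, receivers, num_segments=tf.shape(nodes)[0])
    return phi_v(tf.concat([nodes, pooled], axis=-1))        # Δv_i

l1_loss = lambda true_dv, pred_dv: tf.reduce_mean(tf.abs(true_dv - pred_dv))
```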
Once we have a trained model, we record the messages, $\mathbf{e}'_k$, for all bodies over 1000 new simulations for each environment. We fit a linear combination of the vector components of the true force, $\mathbf{f}_k$, to each component of $\mathbf{e}'_k$, as can be seen in fig. 2 for $1/r$. The results for each system show that the $\mathbf{e}'_k$ vectors have learned to be a linear combination of the force components when $L_{e'} = D$. We see similar linear relations for all other simulations.
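A sketch of this per-component linear fit; the array names and files are hypothetical stand-ins for the recorded messages and simulator forces:

```python
import numpy as np

# messages[k] = e'_k from the trained GN, forces[k] = true pairwise force f_k.
messages = np.load("messages.npy")    # shape [K, D] (assumed file)
forces = np.load("forces.npy")        # shape [K, D] (assumed file)

X = np.hstack([forces, np.ones((len(forces), 1))])   # force components + intercept
for d in range(messages.shape[1]):
    coef, *_ = np.linalg.lstsq(X, messages[:, d], rcond=None)
    resid = messages[:, d] - X @ coef
    r2 = 1.0 - resid.var() / messages[:, d].var()
    print(f"message component {d}: weights {coef[:-1]}, R^2 = {r2:.3f}")
```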
We are also able to find the force law when it is unknown, by using symbolic regression to fit an algebraic function that approximates $\phi^e$. We demonstrate this on the GN trained on the $1/r$ problem, using eureqa from Schmidt and Lipson [2009] to fit algebraic equations to the messages. We allow it to use the algebraic operators $+, -, \times, /$, as well as input variables ($x$ and $y$ for the component separation, $r$ for distance, and $m_2$ for the sending body's mass) and real constants. Complexity is scored by counting the number of occurrences of each operator, constant, and input variable. This returns a list of the models with the lowest mean square error (MSE) at each complexity. We parametrize Occam's razor to find the "best" algebraic model by first sorting the best models by complexity, and then taking the model that maximizes the fractional drop in MSE over the next simplest model: $\arg\max_c\left(-\Delta\log(\mathrm{MSE}_c)/c\right)$, where $c$ is the complexity. The best model found by the symbolic regression for the first output element of $\phi^e$ is $(0.46\,m_2 y - 1.55\,m_2 x)/r^2$, which is a linear combination of the components of the true force, $m_2\hat{\mathbf{r}}/r$. We can see that this is approximately the same linear transformation as the components in the left plot of fig. 2, but here the algebraic expression was learned from scratch.
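A sketch of this selection rule on a hypothetical complexity-MSE Pareto front (the numbers are made up for illustration):

```python
import numpy as np

# (complexity c, lowest MSE found at that complexity), sorted by c.
front = [(1, 4.1e-1), (3, 3.9e-1), (5, 1.2e-3), (7, 9.8e-4)]

# Maximize the fractional drop in MSE over the next-simplest model,
# normalized by complexity: argmax_c( -Delta log(MSE_c) / c ).
cs, mses = zip(*front)
scores = [-(np.log(mses[i]) - np.log(mses[i - 1])) / cs[i]
          for i in range(1, len(front))]
best_c = cs[1 + int(np.argmax(scores))]
print(f"selected complexity: {best_c}")   # picks c=5, the large MSE drop
```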
Figure 2: These plots demonstrate that the graph network's messages have learned to be linear transformations of the two vector components of the true force, $f_x$ and $f_y$, for the $1/r$ law in 2D.
We now test whether the GN with $L_{e'} = D$ generalizes to more nodes better than a GN with a larger $L_{e'}$. Our motivation is that a GN with a high-dimensional message-passing space, trained on a fixed number of bodies, can "cheat". One example of cheating would be for $\phi^e$ to concatenate each sending node's properties along the message, and for $\phi^v$ to calculate forces from these and add them; when a new body is added, this calculation might break. While it is still possible for $\phi^e$ to develop an elaborate encoding scheme with $L_{e'} = D$ to cheat at this problem, it seems more natural for $\phi^e$ to learn the true force when $L_{e'} = D$, and therefore to show improved generalization to a greater number of nodes.
Figure 3: These plots demonstrate the improvement in generalization from minimizing the message-passing space. The losses of GNs with different message-passing dimensions ($L_{e'}$), trained on a 6-body system (left plot) and a 4-body system (right plot), with the training size indicated by the vertical line, are evaluated on a variable number of bodies in a $1/r^2$ simulation in 3D.
We test the hypothesis of better generalization with $L_{e'} = D$ in fig. 3 by training GNs with different $L_{e'}$ on the 3D $1/r^2$ simulations. The observed trend is that systems with $L_{e'} > D$ see their loss blow up at larger numbers of bodies, presumably because they have "cheated" slightly and learned the force law not in $\phi^e$ alone but in a combination of $\phi^e$ and $\phi^v$, whereas for the $L_{e'} \in \{2, 3\}$ systems, $\phi^e$ has learned a projection of the true forces and generalizes better to greater numbers of bodies. A practical conclusion is that one can optimize GNs by minimizing $L_{e'}$ down to the known minimum dimension required to transmit the interaction information (e.g., 3 for 3D forces) or, if this dimension is unknown, by decreasing $L_{e'}$ until performance drops off.
4 Conclusion
We have demonstrated an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. We have shown experimentally that graph network models which implement this inductive bias can learn message representations equivalent to the true force vector for n-body gravitational and spring-like simulations in 2D and 3D. We have also demonstrated a generic technique for finding an unknown force law: using symbolic regression to fit explicit algebraic equations to our trained model's message function. Because GNs have more explicit sub-structure than their more homogeneous deep learning relatives (e.g., plain MLPs, convolutional networks), we can draw more fine-grained interpretations of their learned representations and computations. Finally, we have demonstrated that our model generalizes better at inference time to systems with more bodies than were experienced during training.
Acknowledgments: Miles Cranmer and Rui Xu thank Professor S.Y. Kung for insightful suggestions
on early work, as well as Zejiang Hou for his comments on an early presentation. Miles Cranmer
would like to thank David Spergel for advice on this project, and Thomas Kipf, Alvaro Sanchez,
and members of the DeepMind team for helpful comments on a draft of this paper. We thank the
referees for insightful comments that both improved this paper and inspired future work.
References
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, et al. Interaction networks for learning about objects,
relations and physics. In Advances in neural information processing systems, pages 4502–4510,
2016.
P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
M. B. Chang, T. Ullman, A. Torralba, and J. B. Tenenbaum. A compositional object-based approach
to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.
B. C. Daniels and I. Nemenman. Automated adaptive inference of phenomenological dynamical
models. Nature Communications, 6(1):1–8, Aug. 2015. ISSN 2041-1723.
J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1263–1272. JMLR.org, 2017.
M. Jaques, M. Burke, and T. Hospedales. Physics-as-Inverse-Graphics: Joint Unsupervised Learning
of Objects and Physics from Video. arXiv:1905.11169 [cs], May 2019.
T. Kipf, E. Fetaya, K.-C. Wang, M. Welling, and R. Zemel. Neural relational inference for interacting
systems. arXiv preprint arXiv:1802.04687, 2018.
Y. Li, J. Wu, R. Tedrake, J. B. Tenenbaum, and A. Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566, 2018.
D. Mrowca, C. Zhuang, E. Wang, N. Haber, L. F. Fei-Fei, J. Tenenbaum, and D. L. Yamins. Flexible neural representation for physics prediction. In Advances in Neural Information Processing Systems, pages 8799–8810, 2018.
N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. Geometry from a time series. Physical
review letters, 45(9):712, 1980.
A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and
P. Battaglia. Graph networks as learnable physics engines for inference and control. arXiv preprint
arXiv:1806.01242, 2018.
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network
model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324
(5923):81–85, 2009.