Learning Symbolic Physics with Graph Networks
Miles D. Cranmer
Princeton University
Princeton, NJ, USA
mcranmer@princeton.edu
Rui Xu
Princeton University
Princeton, NJ, USA
ruix@princeton.edu
Peter Battaglia
DeepMind
London, UK
peterbattaglia@google.com
Shirley Ho
Flatiron Institute
New York City, NY, USA
shirleyho@flatironinstitute.org
Abstract
We introduce an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. Our experiments show that our graph network models, which implement this inductive bias, can learn message representations equivalent to the true force vector when trained on n-body gravitational and spring-like simulations. We use symbolic regression to fit explicit algebraic equations to our trained model's message function and recover the symbolic form of Newton's law of gravitation without prior knowledge. We also show that our model generalizes better at inference time to systems with more bodies than had been experienced during training. Our approach is extensible, in principle, to any unknown interaction law learned by a graph network, and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.
1 Introduction
Discovering laws through observation of natural phenomena is the central challenge of the sciences. Modern deep learning also involves discovering knowledge about the world, but focuses mostly on implicit knowledge representations rather than explicit and interpretable ones. One reason is that the goal of deep learning is often to optimize test accuracy and learning efficiency in narrowly specified domains, while science seeks causal explanations and general-purpose knowledge across a wide range of phenomena. Here we explore an approach for imposing physically motivated inductive biases on neural networks, training them to predict the dynamics of physical systems, and interpreting their learned representations and computations to discover the symbolic physical laws which govern the systems. Moreover, our results show that this approach improves the generalization performance of the learned models.
The first ingredient in our approach is the "graph network" (GN) [Battaglia et al., 2018], a type of graph neural network [Scarselli et al., 2009, Bronstein et al., 2017, Gilmer et al., 2017] which is effective at learning the dynamics of complex physical systems [Battaglia et al., 2016, Chang et al., 2016, Sanchez-Gonzalez et al., 2018, Mrowca et al., 2018, Li et al., 2018, Kipf et al., 2018]. We impose inductive biases on the architecture and train models with supervised learning to predict the dynamics of 2D and 3D n-body gravitational systems and a hanging string. If the trained models can accurately predict the physical dynamics of held-out test data, we can assume they have discovered some level of general-purpose physical knowledge, which is implicitly encoded in their
weights. Crucially, we recognize that the forms of the graph network's message and pooling functions correspond to the forms of force and superposition in classical mechanics, respectively. The message pooling is what we call a "linearized latent space": a vector space in which latent representations of the interactions between bodies (forces or messages) are linear (summable). By imposing our inductive bias, we encourage the GN's linearized latent space to match the true one. Other interesting approaches for learning low-dimensional general dynamical models include Packard et al. [1980], Daniels and Nemenman [2015], and Jaques et al. [2019].
The second ingredient is symbolic regression: we use eureqa from Schmidt and Lipson [2009] to fit compact algebraic expressions to a set of inputs and messages produced by our trained model. eureqa works by randomly combining mathematical building blocks (operators, analytic functions, constants, and state variables) and iteratively searching the space of mathematical expressions for the model that best fits a given dataset. The resulting symbolic expressions are interpretable and readily comparable with physical laws.
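eureqa itself is proprietary, so we cannot show its actual algorithm, but the flavor of such a search can be sketched with a toy random search over expression strings; everything below (the variable set, the search budget, the scoring) is illustrative only:

```python
import random
import numpy as np

def random_expr(depth=0):
    # Build a random algebraic expression over variables x, r and constants.
    if depth > 2 or random.random() < 0.4:
        return random.choice(["x", "r", str(round(random.uniform(-2, 2), 2))])
    op = random.choice("+-*/")
    return f"({random_expr(depth + 1)} {op} {random_expr(depth + 1)})"

def mse(expr, x, r, target):
    # Toy evaluation of the expression string; unsafe outside a demo.
    with np.errstate(all="ignore"):
        pred = eval(expr, {"x": x, "r": r})
    err = np.mean((pred - target) ** 2)
    return err if np.isfinite(err) else np.inf

# Toy data following a known law, target = x / r**2:
x, r = np.random.randn(100), np.random.rand(100) + 0.5
target = x / r**2
best = min((random_expr() for _ in range(20000)),
           key=lambda e: mse(e, x, r, target))
print(best, mse(best, x, r, target))
```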
The contributions of this paper are:
1. A modified GN with inductive biases that promote learning general-purpose physical laws.
2. Using symbolic regression to extract analytical physical laws from trained neural networks.
3. Improved zero-shot generalization to larger systems than those in training.
2 Model
Graph networks are a type of deep neural network which operates on graph-structured data. The format of the graphs on which GNs operate is defined as 3-tuples, $G = (\mathbf{u}, V, E)$, where (we adhere closely to the notation used in Battaglia et al. [2018] to formalize our model):

$\mathbf{u} \in \mathbb{R}^{L_u}$ is a global attribute vector of length $L_u$;

$V = \{\mathbf{v}_i\}_{i=1:N_v}$ is a set of node attribute vectors, with $\mathbf{v}_i \in \mathbb{R}^{L_v}$ of length $L_v$; and

$E = \{(\mathbf{e}_k, r_k, s_k)\}_{k=1:N_e}$ is a set of edge attribute vectors, with $\mathbf{e}_k \in \mathbb{R}^{L_e}$ of length $L_e$, and indices $r_k, s_k \in \{1:N_v\}$ of the "receiver" and "sender" nodes connected by the $k$-th edge.
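As a concrete container for this 3-tuple, a minimal sketch might look as follows; the field names are ours for illustration, not a library API:

```python
from typing import NamedTuple
import numpy as np

class Graph(NamedTuple):
    u: np.ndarray          # global attribute vector, shape [L_u]
    V: np.ndarray          # node attribute vectors, shape [N_v, L_v]
    E: np.ndarray          # edge attribute vectors, shape [N_e, L_e]
    receivers: np.ndarray  # r_k indices into V, shape [N_e]
    senders: np.ndarray    # s_k indices into V, shape [N_e]
```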
Our GN implementation is depicted in fig. 1; note that it does not include global and edge attributes. This GN processes a graph by first computing pairwise interactions, or "messages", $\mathbf{e}'_k$, between nodes connected by edges, with a "message function" $\phi^e: \mathbb{R}^{L_v} \times \mathbb{R}^{L_v} \times \mathbb{R}^{L_e} \to \mathbb{R}^{L_{e'}}$. Next, the set of messages incident on each $i$-th receiver node is pooled into $\bar{\mathbf{e}}'_i = \rho^{e \to v}(\{\mathbf{e}'_k\}_{r_k=i,\,k=1:N_e})$, where $\bar{\mathbf{e}}'_i \in \mathbb{R}^{L_{e'}}$ and $\rho^{e \to v}$ is a permutation-invariant operation which can take variable numbers of input vectors, such as elementwise summation. Finally, the pooled messages are used to compute node updates, $\mathbf{v}'_i$, with a "node update function" $\phi^v: \mathbb{R}^{L_v} \times \mathbb{R}^{L_{e'}} \to \mathbb{R}^{L_{v'}}$. Our specific architectural implementation is very similar to the "interaction network" (IN) variant [Battaglia et al., 2016].
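A minimal sketch of this forward pass in plain Python, assuming no global or edge attributes (as in our implementation), where phi_e and phi_v stand in for the learned functions:

```python
import numpy as np

def gn_forward(V, edges, phi_e, phi_v, L_e_prime):
    """One GN pass. V: [N_v, L_v] node attributes; edges: list of (r_k, s_k)
    receiver/sender index pairs; phi_e, phi_v: arbitrary vector functions."""
    pooled = np.zeros((len(V), L_e_prime))             # accumulators for ē'_i
    for r, s in edges:
        pooled[r] += phi_e(V[r], V[s])                 # message e'_k, summed (ρ)
    return np.stack([phi_v(V[i], pooled[i]) for i in range(len(V))])  # v'_i
```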
The forms of $\phi^e$, $\rho^{e \to v}$, $\phi^v$, and the associated input and output attribute vectors have correspondences to Newton's formulation of classical mechanics, which motivated the original development of INs. The key observation is that $\mathbf{e}'_k$ could learn to correspond to the force vector imposed on the $r_k$-th body due to its interaction with the $s_k$-th body. In our examples, the force vector is equal to the derivative of the Lagrangian, $\partial L / \partial \mathbf{q}$; this could be imposed more generally if one knows $\frac{d}{dt}\left(\partial L / \partial \dot{\mathbf{q}}\right)$ and manually integrates the ODE with the output of the graph net. In a general n-body gravitational system in $D$ dimensions, the forces are minimally represented in an $\mathbb{R}^D$ vector space. Thus, if $L_{e'} = D$, we exploit the GN's "linearized latent space" for physical interpretability: we encourage $\mathbf{e}'_k$ to be the force.
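To make this concrete with a standard textbook step that is not specific to our model: for a Lagrangian of the form $L = \sum_i \frac{1}{2} m_i \dot{\mathbf{q}}_i^2 - V(\mathbf{q})$, the Euler-Lagrange equations give

$$\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{\mathbf{q}}_i}\right) = \frac{\partial L}{\partial \mathbf{q}_i} \quad\Longrightarrow\quad m_i\,\ddot{\mathbf{q}}_i = -\frac{\partial V}{\partial \mathbf{q}_i} = \mathbf{f}_i,$$

so the net force on a body is exactly the quantity the pooled messages must encode, up to an invertible transformation, for the node update to produce the correct velocity change.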
We sketch a non-rigorous, proof-like demonstration of our hypothesis. Newtonian mechanics prescribes that force vectors, $\mathbf{f}_k \in F$, can be summed to produce a net force, $\sum_k \mathbf{f}_k = \bar{\mathbf{f}} \in F$, which can then be used to update the dynamics of a body. Our model uses the $i$-th body's pooled messages, $\bar{\mathbf{e}}'_i$, to update the body's velocity via Euler integration, $\mathbf{v}'_i = \mathbf{v}_i + \phi^v(\mathbf{v}_i, \bar{\mathbf{e}}'_i)$. If we assume our GN is trained to predict velocity updates perfectly for any number of bodies, this means $\bar{\mathbf{f}}_i = \sum_{r_k=i} \mathbf{f}_k = \phi^v_i\left(\sum_{r_k=i} \mathbf{e}'_k\right) = \phi^v_i(\bar{\mathbf{e}}'_i)$, where $\phi^v_i(\cdot) = \phi^v(\mathbf{v}_i, \cdot)$. We have the result for a single interaction: $\bar{\mathbf{f}}_i = \mathbf{f}_{k,r_k=i} = \phi^v_i(\mathbf{e}'_{k,r_k=i}) = \phi^v_i(\bar{\mathbf{e}}'_i)$. Thus, we can substitute into the multi-interaction case: $\sum_{r_k=i} \phi^v_i(\mathbf{e}'_k) = \phi^v_i(\bar{\mathbf{e}}'_i) = \phi^v_i\left(\sum_{r_k=i} \mathbf{e}'_k\right)$, and so $\phi^v_i$ has to be a linear transformation. Therefore, for cases where $\phi^v_i$ is invertible (mapping between spaces of the same dimension), $\mathbf{e}'_k = (\phi^v_i)^{-1}(\mathbf{f}_k)$, and so the message vectors are linear transformations of the true forces when $L_{e'} = D$. We demonstrate this hypothesis on trained GNs in section 3.
Figure 1: A schematic depicting how we extract physical knowledge from a GN. Over a single timestep, node pairs are mapped to messages by the first MLP ($\phi^e$), messages incident on each node are summed, and the node update by the second MLP ($\phi^v$) produces the nodes at the next timestep; backpropagation learns the simulator, the message dimension is minimized, and the message function is converted to a symbolic equation via symbolic regression.
3 Experiments
We set up 100,000 simulations with random masses and initial conditions, for 1000 time steps each, in four environments: $1/r$ and $1/r^2$ force laws in 2D, a $1/r^2$ law in 3D, and a string in 2D with an $r^2$ force law between nodes and a global gravity. These laws are chosen arbitrarily as examples of different symbolic forms. The three n-body problems have six bodies in their training sets, and the string has ten nodes, of which the two end nodes are fixed. We train a GN on each of these problems, choosing $L_{e'} = D$; i.e., the length of the message vectors in the GN matches the dimensionality of the force vectors: 2 for the 2D simulations and 3 for the 3D simulations. Our GN, a pure TensorFlow [Abadi et al., 2015] model, implements both $\phi^e$ and $\phi^v$ as three-hidden-layer multilayer perceptrons (MLPs) with 128 hidden nodes per layer and ReLU activations. We optimize the L1 loss between the predicted velocity update and the true velocity update of each node.
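A sketch of this architecture in modern TensorFlow (the original model was TF1-era; the names and shapes below are illustrative, not the authors' code):

```python
import tensorflow as tf

def mlp(out_dim):
    # Three hidden layers of 128 ReLU units, matching the description above.
    return tf.keras.Sequential([tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(128, activation="relu"),
                                tf.keras.layers.Dense(out_dim)])

D = 3                # message dimension L_e' = D (3 for the 3D simulations)
phi_e = mlp(D)       # message function
phi_v = mlp(D)       # node update function: predicts each body's velocity update

def predict_velocity_update(nodes, senders, receivers):
    # nodes: [N_v, L_v] float32; senders/receivers: [N_e] int32 edge indices.
    pair = tf.concat([tf.gather(nodes, receivers),
                      tf.gather(nodes, senders)], axis=-1)
    messages = phi_e(pair)                                   # e'_k
    pooled = tf.math.unsorted_segment_sum(                   # ē'_i = Σ_{r_k=i} e'_k
        messages, receivers, num_segments=tf.shape(nodes)[0])
    return phi_v(tf.concat([nodes, pooled], axis=-1))        # Δv_i

l1_loss = lambda true_dv, pred_dv: tf.reduce_mean(tf.abs(true_dv - pred_dv))
```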
Once we have a trained model, we record the messages, $\mathbf{e}'_k$, for all bodies over 1000 new simulations for each environment. We fit a linear combination of the vector components of the true force, $\mathbf{f}_k$, to each component of $\mathbf{e}'_k$, as can be seen in fig. 2 for $1/r$. The results for each system show that the $\mathbf{e}'_k$ vectors have learned to be a linear combination of the force components when $L_{e'} = D$. We see similar linear relations for all other simulations.
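A sketch of this per-component linear fit; the array names and files are hypothetical stand-ins for the recorded messages and simulator forces:

```python
import numpy as np

# messages[k] = e'_k from the trained GN, forces[k] = true pairwise force f_k.
messages = np.load("messages.npy")    # shape [K, D] (assumed file)
forces = np.load("forces.npy")        # shape [K, D] (assumed file)

X = np.hstack([forces, np.ones((len(forces), 1))])   # force components + intercept
for d in range(messages.shape[1]):
    coef, *_ = np.linalg.lstsq(X, messages[:, d], rcond=None)
    resid = messages[:, d] - X @ coef
    r2 = 1.0 - resid.var() / messages[:, d].var()
    print(f"message component {d}: weights {coef[:-1]}, R^2 = {r2:.3f}")
```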
We are also able to find the force law when it is unknown, by using symbolic regression to fit an algebraic function that approximates $\phi^e$. We demonstrate this on the GN trained on the $1/r$ problem, using eureqa from Schmidt and Lipson [2009] to fit algebraic equations to the messages. We allow it to use the algebraic operators $+, -, \times, /$, as well as input variables ($x$ and $y$ for the component separation, $r$ for distance, and $m_2$ for the sending body's mass) and real constants. Complexity is scored by counting the number of occurrences of each operator, constant, and input variable. This returns a list of the models with the lowest mean square error (MSE) at each complexity. We parametrize Occam's razor to find the "best" algebraic model by first sorting the best models by complexity, and then taking the model that maximizes the fractional drop in MSE over the next simplest model: $\arg\max_c\left(-\Delta\log(\mathrm{MSE}_c)/c\right)$, where $c$ is the complexity. The best model found by the symbolic regression for the first output element of $\phi^e$ is $(0.46\,m_2 y - 1.55\,m_2 x)/r^2$, which is a linear combination of the components of the true force, $m_2\hat{\mathbf{r}}/r$. We can see that this is approximately the same linear transformation as the components in the left plot of fig. 2, but here the algebraic expression was learned from scratch.
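A sketch of this selection rule on a hypothetical complexity-MSE Pareto front (the numbers are made up for illustration):

```python
import numpy as np

# (complexity c, lowest MSE found at that complexity), sorted by c.
front = [(1, 4.1e-1), (3, 3.9e-1), (5, 1.2e-3), (7, 9.8e-4)]

# Maximize the fractional drop in MSE over the next-simplest model,
# normalized by complexity: argmax_c( -Delta log(MSE_c) / c ).
cs, mses = zip(*front)
scores = [-(np.log(mses[i]) - np.log(mses[i - 1])) / cs[i]
          for i in range(1, len(front))]
best_c = cs[1 + int(np.argmax(scores))]
print(f"selected complexity: {best_c}")   # picks c=5, the large MSE drop
```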
Figure 2: These plots demonstrate that the graph network's messages have learned to be linear transformations of the two vector components of the true force, $f_x$ and $f_y$, for the $1/r$ law in 2D.
We now test whether the GN with $L_{e'} = D$ generalizes to more nodes better than a GN with a larger $L_{e'}$. Our motivation is that a GN with a high-dimensional message-passing space, trained on a fixed number of bodies, can "cheat". One example of cheating would be for $\phi^e$ to concatenate each sending node's properties along the message, and for $\phi^v$ to calculate forces from these and add them; when a new body is added, this calculation might break. While it is still possible for $\phi^e$ to develop an elaborate encoding scheme with $L_{e'} = D$ to cheat at this problem, it seems more natural for $\phi^e$ to learn the true force when $L_{e'} = D$, and therefore to show improved generalization to a greater number of nodes.
Figure 3: These plots demonstrate the improvement in generalization from minimizing the message-passing space. The losses of GNs with different message-passing dimensions ($L_{e'}$), trained on a 6-body system (left plot) and a 4-body system (right plot), with the training size indicated by the vertical line, are evaluated on a variable number of bodies in a $1/r^2$ simulation in 3D.
We test the hypothesis of better generalization with $L_{e'} = D$ in fig. 3 by training GNs with different $L_{e'}$ on the 3D $1/r^2$ simulations. The observed trend is that systems with $L_{e'} > D$ see their loss blow up at larger numbers of bodies, presumably because they have "cheated" slightly and learned the force law not in $\phi^e$ alone but in a combination of $\phi^e$ and $\phi^v$, whereas for the $L_{e'} \in \{2, 3\}$ systems, $\phi^e$ has learned a projection of the true forces and generalizes better to greater numbers of bodies. A practical conclusion is that one can optimize GNs by minimizing $L_{e'}$ down to the known minimum dimension required to transmit the interaction information (e.g., 3 for 3D forces) or, if this dimension is unknown, by decreasing $L_{e'}$ until performance drops off.
4 Conclusion
We have demonstrated an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. We have shown experimentally that graph network models which implement this inductive bias can learn message representations equivalent to the true force vector for n-body gravitational and spring-like simulations in 2D and 3D. We have also demonstrated a generic technique for finding an unknown force law: using symbolic regression to fit explicit algebraic equations to our trained model's message function. Because GNs have more explicit sub-structure than their more homogeneous deep learning relatives (e.g., plain MLPs, convolutional networks), we can draw more fine-grained interpretations of their learned representations and computations. Finally, we have demonstrated that our model generalizes better at inference time to systems with more bodies than were experienced during training.
Acknowledgments: Miles Cranmer and Rui Xu thank Professor S.Y. Kung for insightful suggestions
on early work, as well as Zejiang Hou for his comments on an early presentation. Miles Cranmer
would like to thank David Spergel for advice on this project, and Thomas Kipf, Alvaro Sanchez,
and members of the DeepMind team for helpful comments on a draft of this paper. We thank the
referees for insightful comments that both improved this paper and inspired future work.
References
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, et al. Interaction networks for learning about objects,
relations and physics. In Advances in neural information processing systems, pages 4502–4510,
2016.
P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
M. B. Chang, T. Ullman, A. Torralba, and J. B. Tenenbaum. A compositional object-based approach
to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.
B. C. Daniels and I. Nemenman. Automated adaptive inference of phenomenological dynamical
models. Nature Communications, 6(1):1–8, Aug. 2015. ISSN 2041-1723.
J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1263–1272. JMLR.org, 2017.
M. Jaques, M. Burke, and T. Hospedales. Physics-as-Inverse-Graphics: Joint Unsupervised Learning
of Objects and Physics from Video. arXiv:1905.11169 [cs], May 2019.
T. Kipf, E. Fetaya, K.-C. Wang, M. Welling, and R. Zemel. Neural relational inference for interacting
systems. arXiv preprint arXiv:1802.04687, 2018.
Y. Li, J. Wu, R. Tedrake, J. B. Tenenbaum, and A. Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566, 2018.
D. Mrowca, C. Zhuang, E. Wang, N. Haber, L. F. Fei-Fei, J. Tenenbaum, and D. L. Yamins. Flexible neural representation for physics prediction. In Advances in Neural Information Processing Systems, pages 8799–8810, 2018.
N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. Geometry from a time series. Physical
review letters, 45(9):712, 1980.
A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and
P. Battaglia. Graph networks as learnable physics engines for inference and control. arXiv preprint
arXiv:1806.01242, 2018.
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network
model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324
(5923):81–85, 2009.