When you can’t count, sample! Computable entropies beyond equilibrium from basin volumes.



I. INTRODUCTION
From steam engines to LCD displays and polymer materials, the field of thermodynamics has played a pivotal role in the development of modern technology. The key to this success has been the efficacy with which equilibrium statistical mechanics predicts the macroscopic properties of physical systems from knowledge of microscopic interactions alone. This predictive power hinges on the knowledge of the underlying probability distribution for the microscopic states of the system. Assuming that the system being considered is in thermal equilibrium, the probability of observing any of its microscopic configurations is given by the Boltzmann distribution, p_C = Z^{-1} exp(−βE_C), where Z = Σ_C exp(−βE_C) is the partition function, E_C is the energy of the configuration and β is an inverse temperature imposed by a thermostat. Knowledge of the partition function affords us the ability to compute the free energy, F = −β^{-1} ln Z, which determines the thermodynamic stability of the state of the system, and allows us to derive all thermodynamic observables from its derivatives.
An alternative, but completely equivalent [1], approach is to compute the entropy of the system,

S_G = −Σ_C p_C log p_C, (1)

and deduce thermodynamics from its maximisation. This definition of entropy was first introduced by Gibbs [2] and, when the states of the system occur with equal probability (i.e., p_C = 1/Ω with Ω the volume of the region of accessible states), it reduces to the well-known Boltzmann entropy, S_B = log Ω + const. [3]. The entropy also allows us to predict the direction of spontaneous thermodynamic transformations, as the entropy of an isolated system can only increase with time: put simply, heat flows from hot to cold [4,15]. The state of affairs is not as simple far from equilibrium. Let us consider an arbitrary system evolving according to some well-defined, but in general nonequilibrium, dynamics. In such a system, there is no guarantee that objects like temperature or energy can be defined, and there is generally no clear prescription for the probability distribution of microscopic configurations. Is it still possible to find analogies with equilibrium statistical mechanics? To address this question, we must introduce two theoretical notions.
First, entropy can be universally defined as the average amount of surprise (or uncertainty) inherent to a random variable (rare observations are more surprising, or informative, than common ones) [6]. Mathematically, Shannon [16] showed that this amounts exactly to a rewriting of the Gibbs entropy (1), typically referred to as the Shannon entropy,

S_S = −Σ_i p_i log p_i, (2)

where p_i is the probability of observing state i and the sum is over all accessible states, this time for an arbitrary statistical system with a generic (stationary) probability distribution. This information-theoretic interpretation of entropy can be used to define, measure, and interpret entropies in any physical system, whatever its dynamics might be. This is why there has been recent interest in direct measurements of this quantity using various algorithms, not restricted to the one we present here: for instance, the Shannon entropy of physical systems can be estimated using ideas from data compression [17-21].
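As a toy illustration of the latter idea (not one of the estimators of Refs. [17-21]), the sketch below compresses a long i.i.d. binary sequence with a general-purpose compressor and compares the compressed size per symbol to the exact Shannon entropy of the source; the bias probability, sequence length, and use of zlib are illustrative assumptions, and the compressed size is only a rough upper bound.

```python
# Toy illustration: the compressed size per symbol of a long i.i.d. binary sequence
# upper-bounds, and roughly tracks, its Shannon entropy (in bits).
import numpy as np
import zlib

p = 0.2                                                   # arbitrary bias of the binary source
rng = np.random.default_rng(0)
bits = (rng.random(500_000) < p).astype(np.uint8)

compressed = zlib.compress(bits.tobytes(), 9)             # one byte per symbol fed to the compressor
bits_per_symbol = 8 * len(compressed) / len(bits)

shannon = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))    # exact entropy of the source, in bits
print(f"compression estimate: {bits_per_symbol:.3f} bits, exact: {shannon:.3f} bits")
```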
Second, the states of interest of a system can often be described by the stable structures of its dynamics, be they extrema of a high-dimensional function that can be reached by steepest descent (e.g., energy minima of a potential energy landscape), or the fixed points and limit cycles of a generic dynamical system. To each of these structures, we can associate a basin of attraction, i.e., the set of all initial conditions leading to the same structure via the dynamics; see Fig. 1 for an illustration.
In this perspective we show how we can make use of these facts to arrive at a general protocol for estimating the entropy of systems out of equilibrium, from granular media to generic dynamical systems. The key observation is that while Shannon's entropy, Eq. 2, provides a universal definition of entropy, we do not know a priori the probabilities, p_i, of observing a given state of the system, suggesting that its evaluation may not be possible. On the contrary, we show that the problem of evaluating these probabilities, p_i, or alternatively the problem of enumerating the number of possible states, Ω (e.g., if one wishes to compute a Boltzmann-like entropy, S_B = log Ω + const.), can be reduced to a tractable sampling problem. Indeed, as discussed at length in Sec. II, one can compute the a priori probability of observing a given stable structure by measuring the volume of its associated basin of attraction [5], and it is therefore possible to estimate the Shannon entropy for an arbitrary system.
In the following, we start by presenting the basin-sampling approach for entropy measurements in Sec. II. Then, in Sec. III, we highlight its recent successes in the context of granular packings. Finally, in Sec. IV, we discuss exciting extensions of this idea that we believe shall shape the future of computational non-equilibrium physics.

II. ENTROPY FROM BASINS

A. Turning intractable counting into sampling
While the probabilities of observing the states of a nonequilibrium system are in general not known a priori, it is a generic feature of dynamical systems that for a given initial condition they will end up trapped in small regions of their state space, be they actual basins of a high-dimensional energy landscape, or more general dynamical attractors. Assuming that there are Ω such basins, one can readily compute a Boltzmann-like entropy, S_B = log Ω + const. In a thermodynamic system, this simply amounts to the entropy at zero temperature. If we restrict ourselves to a set of typical inherent structures observed at finite temperature (viz., the configurations that the system relaxes to by steepest descent from finite-temperature samples), this quantity becomes the system's configurational entropy (distinct from the vibrational contributions to the total entropy) [82]. By analogy, in any dynamical system, one can always define such an entropy, and compute it by counting basins of attraction.
Counting the number of basins of attraction is tantamount to enumerating all microstates in the microcanonical ensemble of statistical mechanics: while it is in principle a valid strategy, it is in practice numerically intractable. Until recently, this issue was believed to be impossible to overcome [22-24]. The problem has since been solved by the introduction of Monte Carlo methods capable of integrating the volume of individual basins of attraction using ideas originally introduced for the calculation of the free energy of solids in statistical mechanics [25]. Thanks to this new class of methods, pioneered by Frenkel, Xu [26], Asenjo [27], and Martiniani [28-31], alongside collaborators, it is now possible to approach these problems and measure quantities that could never have been computed with previous techniques.
The gist of the method is as follows: since the accessible volume, V, of configuration space is tiled by Ω basins of attraction, by definition of the arithmetic mean we have that

⟨v⟩ = (1/Ω) Σ_{i=1}^{Ω} v_i = V/Ω, (3)

where v_i is the volume of basin i in configuration space, and ⟨·⟩ is the mean taken over all basins. As a result, one can write the number of basins as Ω = V/⟨v⟩. This seemingly innocent equation is in fact crucial, as it turns the intractable enumeration problem of counting Ω into a sampling problem, namely the computation of the average basin volume ⟨v⟩, which can be performed over a finite set of basins.
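As a minimal numerical sketch of Eq. (3), assuming one knows the log of the accessible volume and has measured the log-volumes of a handful of sampled basins (the numbers below are purely hypothetical placeholders), the Boltzmann-like entropy follows directly:

```python
# Minimal sketch of Eq. (3): Omega = V / <v>, hence S_B = log Omega + const.
# The accessible volume and basin log-volumes below are hypothetical placeholders.
import numpy as np
from scipy.special import logsumexp

log_V = 120.0                                    # log of the accessible configuration-space volume
log_v = np.array([100.2, 101.5, 99.8, 100.9])    # measured basin log-volumes (placeholder values)

# <v> is an average of volumes, not of log-volumes: use logsumexp for numerical stability.
log_mean_v = logsumexp(log_v) - np.log(len(log_v))

S_B = log_V - log_mean_v                         # log Omega, up to an additive constant
print(S_B)
```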
There are two main challenges in computing this quantity in the context of many-body systems: on the one hand, high-dimensional geometry makes volume estimations difficult and, on the other hand, mean volume estimations by direct sampling are typically biased. We start by considering the challenges of estimating volumes in high dimensions, and how we can overcome them by means of suitably-modified free energy calculations.
B. A starter in high-dimensional geometry: there is plenty of room in the corners

The computation of high-dimensional basin volumes is a difficult task that cannot be accomplished by simple quadrature [22]. To illustrate this difficulty, we take the example of a simple shape with a volume that is known in any dimension: the hypercube. In the top line of Fig. 2, we present 2d sketches of the aspect of hypercubes, r ∈ [−a; a]^d, in dimensions d = 2, 3, 4, and 5. In each case, following recommendations on drawings of high-dimensional convex volumes [13], we represent the d-dimensional hypercube with half-sidelength a = 1 by linking each of the 2^d vertices, which sit at a distance √d a from the center, to 2 of its neighbors using hyperbolas tangent to the largest hypersphere contained by the hypercube, namely the unit hypersphere. The reason for choosing hyperbolas is that if one splits the cube into two radial regions of equal volumes, R_in = {r : r < R_{1/2}} and R_out = {r : r > R_{1/2}}, where r is the distance from the center and R_{1/2} a threshold distance, the (d − 1)-dimensional cross-section of each diagonal of the hypercube decays exponentially with r in R_out [13]. In other words, in large d, one corner of a hypercube has a vanishing volume-to-surface ratio, a property shared with hyperbolic objects, like Gabriel's horn (also called Torricelli's trumpet) in 3d [129].
This is an illustration of the fact that, in high dimensions, even simple compact objects present tendril- or tentacle-like regions [5,24,29,30,32] that extend very far but get very thin. Yet, somewhat counter-intuitively, most of the volume of high-dimensional objects is contained in these extended features! This is shown in Fig. 2 by the density plot inside each shape, which represents the actual weight of each hyperspherical shell within each hypercube. While in d = 2 the shell that contributes the most to the volume is the unit circle, as the dimensionality increases, it shifts to larger and larger radial distances, at R_max ∼ √(d/3) in the limit of large d. To complete this picture, in the second row of Fig. 2, we plot in orange the distribution of mass of hypercubes with a = 1 along the radial direction, w(r), and in purple the corresponding line for the hyperball, which is simply the surface area S_d of the (d − 1)-sphere with radius √d a, normalised like w(r), i.e., by the volume of the cube, (2a)^d. As the dimensionality of space increases, the volume of the hypercube concentrates deeper and deeper into disconnected tendril-like objects, at distances such that the intersection between the hypercube and the hypersphere becomes very small: in Fig. 2, as highlighted by lines and inset texts, the ratio between the value of the orange curve at its maximum and the value of the purple curve at the same distance decreases with d. Furthermore, the contribution of the unit ball to the total volume of the hypercube vanishes as the dimension increases. These last two properties make practical volume estimations extremely complicated: a naïve Monte Carlo integration using uniformly sampled points in a hyperball is bound to fail since the vast majority of points will fall outside of the shape of interest, while an integration within a ball close to the center will yield only a tiny fraction of the overall volume, an example of the so-called curse of dimensionality [24,30,32].
To more concretely illustrate how naïve random sampling methods fail at estimating volumes in high dimensions, let us consider the volume estimation problem for a simple hypercube, using Monte Carlo integration. We sample N_s points uniformly drawn in a hyperball, B(R), with radius R centered at the same point as the hypercube, measure the fraction f_MC of points that fall within the hypercube, and compute the corresponding volume, V_MC = f_MC Vol[B(R)]. In general, one does not necessarily know the largest linear size of the object of interest, so that R should in principle be varied. We use this strategy to estimate the volume of hypercubes with a = 1 by varying the ball radius, R, between 0 and the length of the longest diagonal of the cube, √d. In Fig. 3, we plot the measured volume divided by the true volume of the hypercube, V = 2^d, against R/√d, for d between 2 (mauve) and 64 (red), for N_s = 10^5. At low dimensions of space, this method converges smoothly to V_MC = V as R → √d. As d increases, the volume concentrates more and more around the mode of the distribution of mass of the cube, √(d/3), and as a result the curves become step-like. However, as d increases, the measurement also becomes less reliable upon approaching R → √d: it first displays increasingly large fluctuations (d ≤ 16), then violently falls to 0 (d ≥ 32). The reason is that most points in the ball are actually sampled at radii such that the hypercube is already made of narrow spikes, as illustrated in Fig. 2. Indeed, the ratio of the volume of the hypercube to that of the hyperball with radius R becomes vanishingly small as R → √d, so that with a finite number of samples essentially no point lands inside the cube. If we did not know that we were measuring a simple cube, and ball-picked from spheres of increasing radius to produce the same kind of curve as in Fig. 3, in high dimension our estimate would be far off as the mode of the mass distribution of the cube moves further and further into the corners. In the simple example here, in d = 64, the best estimate with our choice of N_s would be roughly 50% off! This shows that simple Monte Carlo integration, while efficient in small dimensions, fails at capturing volumes in high dimensions. Note that the case illustrated here is actually fairly ideal: we know the location of the center of the hypercube, which makes the measurement easier, and the hypercube is a rather regular object. In real-world situations, a Monte Carlo integration of high-dimensional volumes is likely to be far worse.
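The following sketch reproduces this naïve estimator for the hypercube, with illustrative parameter choices; uniform points in a d-ball are drawn from a Gaussian direction with a radius distributed as R U^{1/d}.

```python
# Sketch of the naive Monte Carlo estimate described above: sample uniformly in a ball of
# radius R, count the fraction of points inside the cube [-a, a]^d, and multiply by the
# ball volume. Parameter values are illustrative.
import numpy as np
from math import lgamma, pi

def uniform_ball(d, R, rng):
    x = rng.normal(size=d)                              # isotropic direction
    return x * (R * rng.random() ** (1.0 / d) / np.linalg.norm(x))

def mc_cube_volume(d, a=1.0, R=None, n_samples=10**5, seed=0):
    rng = np.random.default_rng(seed)
    R = np.sqrt(d) * a if R is None else R              # default: smallest circumscribed ball
    hits = sum(np.all(np.abs(uniform_ball(d, R, rng)) <= a) for _ in range(n_samples))
    log_ball_vol = (d / 2) * np.log(pi) - lgamma(d / 2 + 1) + d * np.log(R)
    return (hits / n_samples) * np.exp(log_ball_vol)    # V_MC = f_MC * Vol[B(R)]

for d in (2, 8, 32):
    print(d, mc_cube_volume(d) / 2**d)                  # ratio to the true volume 2^d; collapses to 0 in high d
```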

C. Free energy methods for volume computations
It has been shown that volume estimations and enumeration problems can be treated in a way that amounts to a free energy calculation in the spirit of the Frenkel-Ladd method [25]. The idea is that the volume v of a domain Γ can be rewritten as

v = ∫ dr O_Γ(r) = ∫ dr exp[−βU_Γ(r)], (4)

where O_Γ is the characteristic function of the domain to integrate over, or "oracle", which can be exponentiated to yield the potential U_Γ, which is 0 inside the domain and ∞ outside it. Written in this form, the volume can be interpreted as the partition function of a free Brownian walker exploring the domain Γ, with a hard wall at the boundary, at an inverse temperature β.
The most popular and earliest class of methods for the computation of partition functions is based on thermodynamic integration (TI) [7,35,36], which consists of parameterizing the system's Hamiltonian in such a way that we can "morph" an unknown partition function into one that we know how to compute analytically. In practice, this amounts to replacing a high-dimensional integral over phase space volume with a low-dimensional integral over one or more Hamiltonian parameters (each point in the integrand is obtained from an equilibrium simulation with a given choice of Hamiltonian parameters), or to estimating ratios of partition functions from equilibrium samples obtained from simulations with different Hamiltonian parameters [37,38], as we show presently.
In the context of volume estimations, the potential U_Γ that encodes the oracle is athermal, so that we can take β = 1 without loss of generality. Then, in the spirit of umbrella sampling [7,39], we can introduce simple biasing potentials that depend on a control parameter that allows us to go continuously from an integral of unknown volume to one of known volume. For instance, one can tether random walkers to a reference point, r_0, inside the basin of attraction (e.g., the minimum energy configuration) using harmonic springs of varying stiffness k, see Fig. 4. For a random walk constrained to remain within the domain of interest Γ, one can compute a volume v_k weighted by the Boltzmann factor (viz., the corresponding partition function) as

v_k = ∫_Γ dr exp(−k|r − r_0|²/2), (5)

and define the dimensionless basin free energy as the negative log-volume, f_k = −log v_k. When k = 0, the walker is completely free to explore the basin volume, while for k → ∞ the walk is reduced to a small region surrounding r_0 that fits entirely within the basin. Computing the basin volume amounts to measuring the dimensionless free energy difference between the walkers with k = 0 and k → ∞, so that

f_{k=0} = f_{k→∞} + (f̂_{k=0} − f̂_{k→∞}), (6)

where we added the analytical reference free energy f_{k→∞} to the numerical estimate (denoted by a "hat") of the free energy difference between k = 0 and k → ∞. This is necessary because free energies can only be computed numerically up to an additive constant equal for all k's.
The analytical reference can be computed by a Gaussian integral at the largest stiffness value, k_max, provided that k_max is large enough that the corresponding random walk is essentially unaffected by the boundary of the domain.
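A minimal sketch of one such tethered walk is given below, assuming access to an `oracle(x)` that returns True inside the domain; the step size, number of steps, and the hypercube oracle used in the example are illustrative choices rather than the production settings of Refs. [28-31].

```python
# Sketch of a single harmonically tethered walk (Eq. 5): Metropolis sampling of
# exp(-k|x - x0|^2 / 2) restricted to the domain by a hard-wall "oracle".
import numpy as np

def tethered_walk(oracle, x0, k, n_steps=50_000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    distances = np.empty(n_steps)
    for i in range(n_steps):
        trial = x + step * rng.normal(size=x.size)
        if oracle(trial):                                          # hard wall: reject moves leaving the domain
            dU = 0.5 * k * (np.sum((trial - x0) ** 2) - np.sum((x - x0) ** 2))
            if np.log(rng.random()) < -dU:                         # Metropolis rule on the harmonic bias
                x = trial
        distances[i] = np.linalg.norm(x - x0)                      # record r = |x - x0| for the histograms h_k(r)
    return distances

# Illustrative oracle: a hypercube of half-side a = 1 centered at the origin, in d = 20.
a, d = 1.0, 20
cube_oracle = lambda x: bool(np.all(np.abs(x) <= a))
r_samples = tethered_walk(cube_oracle, np.zeros(d), k=2.0)
```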
Intuitively, this importance sampling method should outperform brute-force Monte Carlo sampling because the steps of the walks are chosen to remain close to one another, so that even within the high-dimensional tentacles, the rejection rate of the steps will be much smaller than what we would get by ball-picking within a sphere that contains the shape of interest. In other words, this method samples points compactly within the volume being estimated, rather than throwing darts at random in a huge volume around it.
Nevertheless, running independent biased random walks remains a poor strategy in high dimensions. Take for instance the example of the hypercube: due to the r^{d−1} scaling of the surface of a hypersphere, most points of the walk will be concentrated far away from the center of the cube, and will instead live close to the heaviest shell of the mass distribution, at √(d/3). In high dimensions, this maximum lies far into the corners, so that a single random walk would typically spend very long times in one of the 2^d corners. As a result, attempting a direct estimation of v with independent free random walks would typically require either exponentially many (2^d) realizations, or exponentially long equilibration times so that the walk can escape a given corner, reach the (typically tiny) convex core, and explore another corner, 2^d times.
To overcome this problem, one can take inspiration from the large body of work on free energy estimations at low temperature in rugged energy landscapes (see for instance [7,8]). A typical strategy is to use parallel tempering [29,37,38], which amounts to running a collection of simulations with different control parameters (typically different temperatures), each called a "replica", and allowing for configuration exchanges between high and low temperature replicas so that the low temperature replicas do not become trapped in a local region of the energy landscape, all while respecting the detailed balance condition. In the context of basin volume calculations, we exchange coordinates between random walks with different stiffnesses, k, so that walkers at low stiffness can escape the tentacles by swapping coordinates with walkers at high values of k that are constrained to "live" closer to the hyperspherical core, as illustrated in Fig. 4.
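A sketch of such a swap move between two neighboring replicas is shown below; since both configurations already lie inside the domain, only the biasing potentials enter the acceptance rule, and the variable names are illustrative.

```python
# Sketch of a parallel-tempering swap between two tethered walks with stiffnesses k1 and k2.
# Detailed balance on the product distribution gives the usual exchange criterion.
import numpy as np

def try_swap(x1, x2, k1, k2, x0, rng):
    r1sq = np.sum((x1 - x0) ** 2)
    r2sq = np.sum((x2 - x0) ** 2)
    # Bias-energy change if the two walkers exchange their configurations.
    d_bias = 0.5 * (k1 - k2) * (r2sq - r1sq)
    if np.log(rng.random()) < -d_bias:
        return x2.copy(), x1.copy(), True      # accepted: configurations exchanged
    return x1, x2, False                       # rejected: walkers keep their configurations
```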
The dimensionless free energy difference in Eq. 6 can be computed by a simple thermodynamic integration over k [7] or using more sophisticated estimators like the multistate Bennett acceptance ratio method (MBAR) [33], which should yield more statistically accurate results. The MBAR estimator hinges on the idea that one can always relate the free energy at one value of k with the free energy at every other value of k, via

f_k = −ln Σ_{m=1}^{K} Σ_{n=1}^{N_m} { exp[−U_k(r_{mn})] / Σ_{m′=1}^{K} N_{m′} exp[f_{m′} − U_{m′}(r_{mn})] }, (7)

where U_k(r) = k|r − r_0|²/2 is the biasing potential with spring stiffness k (but can in principle assume any shape), K is the total number of biased random walks, N_m is the number of uncorrelated equilibrium samples obtained from the m-th random walk, and r_{mn} is the n-th such sample. This system of implicit equations can be solved numerically, typically using a self-consistent Newton-Raphson scheme [33]. Ideally, for the quality of the MBAR solution to be as good as possible, one should measure a similar amount of equilibrium samples in every region of the volume being measured. This problem boils down to the choice of U_k, and we discuss it briefly in the next subsection.
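For concreteness, a minimal self-consistent (fixed-point) solver for these equations is sketched below; it works directly with a matrix of bias energies evaluated on the pooled samples, and is meant as an illustration rather than a replacement for production implementations such as fastMBAR [34].

```python
# Minimal fixed-point solver for the MBAR equations (Eq. 7). u[k, n] is the bias U_k
# evaluated at the n-th pooled sample; N[k] is the number of samples drawn from walk k.
import numpy as np
from scipy.special import logsumexp

def mbar_free_energies(u, N, n_iter=5000, tol=1e-10):
    """u: (K, n_total) array of bias energies; N: (K,) array of per-walk sample counts."""
    K, n_total = u.shape
    f = np.zeros(K)                                           # dimensionless free energies f_k
    for _ in range(n_iter):
        # log of the mixture denominator, sum_k N_k exp(f_k - u_k(r_n)), for each pooled sample n
        log_den = logsumexp(f[:, None] + np.log(N)[:, None] - u, axis=0)
        f_new = -logsumexp(-u - log_den[None, :], axis=1)
        f_new -= f_new[0]                                     # fix the arbitrary additive constant
        if np.max(np.abs(f_new - f)) < tol:
            break
        f = f_new
    return f
```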
Finally, MBAR yields the optimal reweighting of the set of histograms h_k(r), where r ≡ |r − r_0|, allowing for the reconstruction of the full density of states, which, in the context of volume measurements, amounts to the mass distribution, h(r), within the object of interest, which verifies

h(r) ∝ Σ_k w_k(r) exp[U_k(r) − f_k] h_k(r), (8)

with w_k(r) = h_k(r)/Σ_{k′} h_{k′}(r) a set of normalised weights.

D. Measuring hypercubes
To illustrate how effective this technique is compared to brute-force Monte Carlo, we now apply it to the measurement of high-dimensional hypercubes, r ∈ [−a; a]^d. For each dimensionality, d, we run a collection of K_d random walks - which we refer to as replicas - associated with spring constants (k_1 = 0, k_2, . . ., k_{K_d}), for N_s = 5 × 10^5 steps, attempting a parallel tempering coordinate exchange between replicas with neighboring values of k every 10 steps. The number of replicas at a given dimensionality, K_d, is set so that the histogram of sampled distances from the center, h_{k_i}(r), overlaps significantly with the histograms of replicas with neighboring values of k, namely h_{k_{i±1}}(r).
The scaling of the number of required replicas with dimensionality, d, can be estimated from the mean and variance of the radial distribution for a d-dimensional Gaussian, h_k(r) ∼ r^{d−1} exp(−kr²/2). We start by noting that for parallel tempering to be efficient, and for the MBAR estimation to be reliable, these distributions must display significant overlap. Choosing the replicas so that their modes are separated by the standard deviation of the narrowest walk, one can show that the required number of replicas, K_d, scales linearly with d (see App. A). Recall, for context, that naïve Monte Carlo sampling typically requires a number of samples that grows exponentially with d.
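As a concrete illustration of this argument, the snippet below builds such a schedule by placing the modes of h_k(r) ∼ r^{d−1} exp(−kr²/2) (located at √((d−1)/k), with width ≈ 1/√(2k) in a Gaussian approximation) on a grid spaced by the width of the narrowest walk; the choices of k_max and of the target radius are illustrative assumptions.

```python
# Sketch of a replica schedule with modes separated by the width of the narrowest walk.
# For h_k(r) ~ r^(d-1) exp(-k r^2/2): mode r* = sqrt((d-1)/k), width sigma ~ 1/sqrt(2k).
import numpy as np

def stiffness_schedule(d, a=1.0, k_max=None):
    k_max = (d - 1) / a**2 if k_max is None else k_max   # narrowest walk: mode on the inscribed ball (illustrative)
    r_min = np.sqrt((d - 1) / k_max)                     # mode of the narrowest walk
    sigma = 1.0 / np.sqrt(2.0 * k_max)                   # its width in the Gaussian approximation
    r_target = a * np.sqrt(d / 3.0)                      # mode of the cube's mass distribution
    modes = np.arange(r_min, r_target, sigma)            # equally spaced modes
    return (d - 1) / modes**2                            # stiffness placing the mode at each grid point

for d in (50, 100, 200, 400):
    print(d, len(stiffness_schedule(d)))                 # the replica count grows roughly linearly with d
```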
We show the output histograms, h_k(r), for a practical implementation of these random walks in a d = 100-dimensional hypercube in Fig. 5. In this example, we used K = 64 replicas with k's chosen so that the modes of the distributions are linearly spaced in r. Of these 64 values, 48 were positive or zero (decreasing down to k = 0) and 16 were negative (the most negative being k = −4.5), so that the biasing potential pushes these walkers further into the corners of the cube. As expected, high values of k (red histograms) yield narrower distributions closer to the origin, while lower ones (towards purple) are broader and distant from the origin.
We chose k_max (the left-most curve in Fig. 5) so that its mode would coincide with that of the largest ball inscribed in the cube. Note how this leaves a significant gap in the histograms in the regions nearest to the origin due to the power-law factor r^{d−1} in h_k (yet another manifestation of the curse of dimensionality). In order to improve the MBAR free energy estimation, and the density of states reconstruction, we chose to sample independent points from h_in(r) ∝ r^{1−d} e^{−k_in r²/2}, also constrained to the inside of the cube, to accumulate samples near the origin (gray line in Fig. 5). The corresponding biasing potential is U_in = −(d − 1) ln r + k_in r²/2, where we choose k_in = 4 so that the distribution has width σ ∼ 1/√k_in = 0.5. Finally, to get absolute free energy values, we use the negative log-volume of the largest ball inscribed in the hypercube as the known reference in Eq. 6, instead of the free energy for a tight harmonic trap, so that the free energy becomes

f_{k=0} = −log V_{B(a)} + (f̂_{k=0} − f̂_{B(a)}), (9)

where V_{B(a)} is the analytically known volume of the largest inscribed ball, B(a), and f̂_{B(a)} is the estimated free energy for a set of points sampled directly from that ball. The black line overlayed onto the histograms in Fig. 5 is the reconstructed mass distribution, or density of states, of the cube obtained using MBAR according to Eq. (8). Notice that, as expected, the region of large r leading up to √d a = 5 contains very little volume.
We use the same procedure when varying the dimensionality of the cube. The result for the ratio V̂/V between the measured volume and the real volume of the cube, obtained using the fastMBAR [34] implementation of MBAR, is shown in Fig. 6 as a function of the dimensionality (red squares). The results are compared with a brute-force Monte Carlo method using the same order of magnitude of total number of samples, N_s = 10^8 (gray disks). Notice that the x axis is in log scale.
In spite of the finite number of samples, and of the rather crude choice of K_d, the error of the MBAR measurement grows roughly linearly with d and remains below 10% even as d reaches 500, while the Monte Carlo estimate essentially measures zero volume when dimensionality reaches d = 30, the point at which the ratio of the volume of the cube to that of the smallest circumscribed ball is roughly 1/N_s. Notice that in d = 500, that ratio is of the order of 10^{−156}, meaning that naïve Monte Carlo would be absolutely unfeasible. Here, using our approach instead of naïve MC sampling, with comparable numbers of samples, increases the maximum dimension for which a volume can be measured, albeit with some statistical error, from 30 to thousands of dimensions. Since the number of samples used per walk in this example is still rather modest (5 × 10^5), the statistical error can be reduced by making the random walks longer.
Our approach also yields the mass distribution, as reconstructed by unbiasing the histograms of individual random walks, according to Eq. (8). In Fig. 7a we show the hypercube mass distribution, h(r), for a few dimensionalities up to d = 1000. As expected, as d increases, h(r) tends to a sharp peak centered at √(d/3) a, indicated by a red dashed line for d = 1000.
To illustrate the difficulty of measuring not just the volume, but also the mass distribution of the cube in such high dimensionality, in Fig. 7b we plot h(r)/r^{d−1} for d = 1000, which amounts to the probability of landing inside the cube when sampling uniformly on the surface of the hypersphere with radius r. Because the intersection between the cube and a sphere becomes very small, we plot the base-10 log of that ratio. A value of particular interest is that obtained for the mode of h(r), as it gives an estimate of how many trials would be needed to find a single point inside the cube if we were performing brute-force Monte Carlo, restricted to the shell of the cube that contributes most to its mass. In d = 1000, we find that this number is of the order of 10^76. Even if one could somehow guess the radius of the shell that contributes most to a high-dimensional volume, direct Monte Carlo sampling of that shell alone is in general impossible. In fact, we manage to reconstruct the density of states reliably down to points that would require 10^120 Monte Carlo shots on a sphere, using only 10^8 points across all walks.

Altogether, we have shown for a simple example that using importance sampling and a free energy estimation method (here, MBAR) enables us to measure high-dimensional volumes much more efficiently than brute-force Monte Carlo, by virtue of the correlations between successive positions of the random walkers, and by avoiding getting stuck in singular features of the domain thanks to parallel tempering. Going beyond the simple example given here, the contrast in efficiency can be made even starker by refining the precise design of the algorithm we presented. First, one could choose a different set of biasing potentials than the one used above, which reproduces the strategy of Ref. [28]. For instance, the scaling we used for the number of replicas is likely to be a worst-case scenario, as it assumes equally-spaced distributions in r that are all as narrow as the narrowest one, h_{k_max}. One can instead use an iterative choice for the values of k such that the mode of h_{k_{n+1}}, r*_{n+1}, lies at r*_n + σ_n, with σ_n the standard deviation of h_{k_n}. Assuming perfect Gaussian distributions for each random walk, and requiring that the values go up to the mode of the mass distribution of the cube minus the standard deviation, d/3 − 1/15 − 4/(45d), some simple algebra leads to the asymptotic scaling K_d ∼ d^{1/2}. Since MBAR does not require the biasing potentials to be harmonic, it is likely that this scaling could be brought down even further with an altogether different choice of biasing potentials, e.g., by choosing a series of potentials of the form U_i(r) = (r − r_i)²/(2k²), where r = |r − r_0|, k is a fixed width, and r_i is a tunable scalar distance from r_0.
Second, one could accelerate the diffusion of the random walks by "cloud sampling", which amounts to biasing the Monte Carlo sampling on the average weight of a larger number of trial points at each step [5]. Alternatively, one could introduce some deterministic drift in the random walks, so as to make them closer to recently-proposed piecewise-deterministic processes [40], or so-called Galilean Monte Carlo [41,42] methods, where particles travel following straight lines and bounce on walls instead of diffusing around.
The higher exploration efficiency afforded to us by this method is far from being free in general basin volume computations: in order for the random walk to remain within the basin, we need to query an "oracle" at every step to determine if we are inside or outside the shape of interest [31]. While this oracle is a simple geometric condition in the case of the hypercube, for basins of attraction in a many-body energy landscape we must solve for the path of steepest descent at every step of the random walk [9], meaning that a single Monte Carlo step typically requires hundreds of energy (function) evaluations. In liquids, for instance, the cost of one such energy evaluation scales at best like the number of particles, N, which needs to be large for thermodynamic properties to be measured accurately. While costly, we will show in Sec. III that this cost is manageable in a practical example.
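A schematic version of such an oracle is sketched below, with a crude gradient descent standing in for a proper steepest-descent integrator; the `energy_grad` function, step size, and tolerances are all assumptions of this illustration.

```python
# Sketch of a basin-of-attraction oracle for an energy landscape: quench the trial point by
# (approximate) steepest descent and check whether it reaches the reference minimum.
import numpy as np

def quench(x, energy_grad, step=1e-3, g_tol=1e-8, max_iter=100_000):
    """Crude fixed-step gradient descent; production codes use better minimizers (e.g. FIRE)."""
    x = x.copy()
    for _ in range(max_iter):
        g = energy_grad(x)
        if np.linalg.norm(g) < g_tol:
            break
        x -= step * g
    return x

def basin_oracle(x_trial, x_min, energy_grad, tol=1e-4):
    """True if x_trial belongs to the basin of attraction of the minimum x_min."""
    return np.linalg.norm(quench(x_trial, energy_grad) - x_min) < tol
```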

E. Unbiasing sampled volumes
Recall that our original aim was to estimate the mean basin volume, ⟨v⟩, in Eq. 3. So, after having measured the volume of a large number of basins of attraction, we must deal with the issue that a naïve average over the basin volumes of randomly sampled energy minima is in general biased. Indeed, if one samples configuration space uniformly and tags each point with its basin of attraction, each basin is sampled with probability p_i = v_i/V, that is to say proportionally to its volume. Thus, to compute a Boltzmann entropy S_B = log Ω + const., we need to perform a fit of the observed basin volume distribution using a putative functional form to undo the bias and obtain the true mean, ⟨v⟩. Concretely, if the measured (biased) distribution of volumes is called B(v), and the unbiased (true) distribution U(v), one can write

B(v) = v U(v)/N, (10)

where the factor v is the bias proportional to volumes, and N is a normalisation factor. By integration, one finds that the unbiased mean is given by

⟨v⟩ = N = [∫ dv B(v)/v]^{−1}, (11)

where one typically needs to fit B(v) to some functional form, for instance a parametric generalized Gaussian distribution, or a nonparametric kernel density estimate. The Shannon entropy, S_S = −Σ_{i=1}^{Ω} p_i log v_i + const., instead is by definition the biased average of the negative log-volumes of basins, and it can be obtained directly with no assumptions.
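In the simplest case, where one uses the empirical distribution directly instead of a fitted form for B(v), Eq. (11) reduces to a harmonic-mean-like estimator over the sampled volumes, as sketched below with hypothetical log-volumes; working in log space avoids numerical under- and overflow.

```python
# Sketch of the unbiasing step: with basins sampled proportionally to their volume,
# Eq. (11) gives <v> = 1 / <1/v>_B. Input log-volumes are hypothetical placeholders.
import numpy as np
from scipy.special import logsumexp

log_v_sampled = np.array([100.2, 101.5, 99.8, 100.9, 102.3])      # biased sample of basin log-volumes

# log <1/v>_B = logsumexp(-log v) - log M, hence log <v> = -(that quantity)
log_mean_v = -(logsumexp(-log_v_sampled) - np.log(len(log_v_sampled)))

# The Shannon entropy needs no unbiasing: it is the biased average of -log v (up to a constant).
S_shannon = -np.mean(log_v_sampled)
```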
All in all, given a proper algorithmic basis, basin-volume measurements provide a generic way of computing the Gibbs-Shannon entropy of any dynamical system whose properties are controlled by the ensemble of its steady-state structures. In the following, we shall first show how this idea was applied to the special case of granular packings, and then how it could answer open questions in fundamental physics.

III. THE CASE OF GRANULAR ENTROPY
The first application of basin-volume calculations to measure an entropy was performed in the context of granular packings in two dimensions [26-29]. Due to their athermal nature, granular systems cannot be described by Gibbsian statistical mechanics [2]. Nevertheless, in the late 1980s, Edwards and Oakeshott [43] proposed that the collection of stable packings of a fixed number of particles in a fixed volume could play the role of an ensemble, and that one could arrive at a statistical mechanical formalism by making the assumption that all stable packings are equally probable once the system has settled in a jammed state. In other words, in the Edwards ensemble, jammed states occur with uniform probability measure (also known as the Edwards measure [44]). The existence of a Boltzmann-like distribution in volume and stress in granular media has been supported by a number of experiments [45-52], lending credence to Edwards' theory. However, theoretical checks of Edwards' key hypothesis on the equiprobability of packings were largely considered as not directly testable in simulations, as they seemingly required an explicit enumeration of all possible packings. This problem can be rephrased as a basin volume measurement: consider a system of polydisperse disks with a hard core and a soft corona, modelled as a shifted Weeks-Chandler-Andersen (WCA) [53] soft potential (see Fig. 8). Consider N such particles, put them into a periodic box with sidelength L at a packing fraction ϕ. The system is endowed with a many-body energy landscape that, at high enough densities, contains many different basins. At the bottom of each basin lies a single local minimum, which can be interpreted as a zero-temperature configuration of the system. At the characteristic unjamming density ϕ_J (the point at which the system goes from a liquid to a disordered solid), each minimum corresponds to a packing of frictionless hard particles. Among these minima, there is a subset of mechanically stable configurations. Depending on the system of interest, it is common to define mechanical stability through a non-zero bulk modulus, which is equivalent to saying that the number of contacts verifies N_c ≥ d(N_nr − 1), where N_nr is the number of particles that are not rattlers (mobile particles) [54], thus defining the ensemble of collectively jammed states [55]. A smaller subset of minima is that of states stable not only against compression, but also against shear: the condition of non-zero shear modulus this time imposes N_c ≥ d(N_nr − 1) + 1, which defines the ensemble of strictly jammed states [55]. In practice, we restrict ourselves to the latter case [28], which is a more broadly accepted definition of jamming. Therefore, by applying the machinery of Sec. II to the energy landscape of hard-WCA particles, and restricting the set of relevant basins to mechanically stable configurations, one can estimate the Boltzmann and Gibbs-Shannon entropies of granular packings.
The main result, originally obtained in Ref. [28] for N = 64 hard-WCA disks, is reproduced in Fig. 9a. This plot shows that the Gibbs-Shannon entropy, S_G = −Σ_{i=1}^{Ω} p_i log p_i + const., and the Boltzmann entropy, S_B = log Ω + const., both evaluated using basin volume measurements, converge precisely at ϕ_J ≈ 0.82, implying that only at this density is Edwards' hypothesis, p_i = 1/Ω, verified. It can be checked through finite-size analysis [28] that this packing fraction is consistent with the lower range of values for which the system unjams at N = 64 following the same preparation protocol, so that the two entropies coincide only at unjamming. Note that, while the value of the unjamming density observed in compression experiments depends on the precise preparation protocol, what we report is a property of the energy landscape of jammed packings, which is a generic feature of the system independent of preparation recipes.
Furthermore, as shown in Fig. 9b, the authors also found that above jamming, there exists a robust power-law relationship, p_i ∼ P^{−Nλ(ϕ)}, between the probability of observing a packing and its pressure. This power law suggests a hierarchical structure of the energy landscape of hard-WCA particles, where low-energy minima have large basin volumes and high-energy minima have small volumes, see Fig. 10 for an illustration of this property.
Finally, as shown in Fig. 9c, the exponent λ decreases roughly linearly as the packing fraction approaches jamming from above, and reaches 0 at ϕ_J. In other words, at jamming, the volume of basins becomes a flat distribution that does not depend on the pressure in the system, adding to the evidence that all basins are equiprobable at ϕ_J. As shown in App. B, this argument on λ can be made more formal: the difference between the Boltzmann and Gibbs entropies can be written as a limit, for N → ∞, that vanishes only when λ → 0. In other words, in order for the two entropies to agree at a given density in the thermodynamic limit, the exponent λ must go to 0, as observed.
All in all, a first application to granular packings has shown the potential of the method proposed in Sec. II. By replacing an intractable enumeration problem with a simpler sampling problem, the basin-volume method allowed us to test a hypothesis that had been left unchecked for 30 years. It also yielded insight into new physics at the jamming transition, by revealing a hierarchical structure of the basins of attraction above jamming.

IV. THE ROAD AHEAD: FROM PACKINGS TO GENERIC DYNAMICAL SYSTEMS
In the rest of the paper, we propose perspectives for the basin-volume method, ranging from extensions of the problem of granular packings to completely unrelated problems.

A. Future work on granular packings
As discussed in Sec. III, the basin-volume method led to a direct observation that Edwards' hypothesis is only valid strictly at jamming in d = 2. There are however still several aspects of this problem that remain unchecked, and could be addressed by the very same method.
First, these results were obtained within the isochoric ensemble, approaching unjamming where the measured pressure vanishes, P → 0+. It has been argued that the equiprobability of packings could break down when switching to the isobaric ensemble [56], viz., when allowing volume fraction fluctuations through particle inflation and deflation to maintain constant hydrostatic pressure, or more generally in the isostress ensemble that also constrains the shear stresses.
The isobaric and isostress ensembles can be explored within a basin volume framework. To do so, in the spirit of Parrinello-Rahman barostats [57-59], the trick is to allow for deformations of the simulation box (both isotropic compressions and constant-volume deformations), and to replace the energy landscape, E, by an enthalpy-like landscape [54],

H = E − V_0 σ : ε, (13)

which explicitly contains the dependence on the full stress and strain tensors σ and ε through their Frobenius inner product A : B = Σ_{i,j} A_ij B_ij, as well as a reference volume V_0 of the simulation box. Note that a subcase of this strategy is that in which only isotropic compression is allowed, ε = (V/V_0) I, with I the identity tensor. In that case, only the trace Tr[σ] ≡ −P contributes to the box deformation term, leading to the usual definition of enthalpy with respect to the hydrostatic pressure P, namely H = E + PV. Basins are then defined through steepest descent paths that lead to the same enthalpy minimum, in a configuration space comprising not only particle positions, but also the d(d + 1)/2 parameters describing simulation box deformations [54]. In practice, one would impose a finite but small pressure, and look at the asymptotic results as P → 0+. Indeed, a technical difficulty in this approach is that, at exactly zero pressure, fluid solutions with arbitrarily large volumes trivially minimize the enthalpy. One would then need to constrain the maximal volume of the system to avoid converging exclusively to fluid states. Finally, to perform basin volume calculations in the enthalpy landscape, a modification of the Monte Carlo algorithm used for the volume estimate is required. In the isochoric case presented above, the volumes of basins are computed by sampling configurations at constant N and V, using an oracle defined through a minimum of the energy, and umbrella sampling on the particles' positional degrees of freedom. In the isobaric case, this time, one needs to sample configurations at constant N and P, with an oracle defined through a minimum of the enthalpy, and using umbrella sampling not only on the particles' positions, but also on the degrees of freedom of the box shape. This requires the inclusion of independent shearing and stretching (viz., box deformation) moves in our Monte Carlo sampling [60-63] as the building block of the random walks. Each of these moves is accepted or rejected according to a Monte Carlo criterion set by a combination of the oracle, which checks that the new point still falls to the same minimum of the enthalpy, Eq. 13, and the umbrella sampling biasing potentials that contain box deformations. Implementing this algorithm as part of a basin volume calculation, while numerically costlier than the usual isochoric strategy, would constitute an important check on the validity of the Edwards hypothesis.
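A minimal sketch of the isotropic subcase (H = E + PV) is given below, to make explicit what the extra box degree of freedom looks like in an isobaric oracle; the `energy` function, the use of scaled coordinates, and the idea of quenching positions and volume jointly are assumptions of this illustration rather than a prescription.

```python
# Sketch of the isotropic enthalpy landscape H = E + P V discussed above. The degrees of
# freedom are scaled particle coordinates plus the box volume; an isobaric oracle would
# quench them jointly (e.g. with a generic minimizer over the concatenated vector) and
# compare the resulting enthalpy minimum to a reference one.
import numpy as np

def enthalpy(scaled_pos, volume, pressure, energy, d=2):
    """H = E(real positions, box) + P V, with real positions = L * scaled positions, L = V^(1/d)."""
    L = volume ** (1.0 / d)                  # side of a cubic periodic box
    return energy(L * scaled_pos, L) + pressure * volume
```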
Second, one may reasonably wonder whether the results are still observed in higher dimensions of space, notably in d = 3, where many open questions remain to be answered to fully understand the ensemble of jammed states, thus constituting a topic of current research [64].
Finally, the observation that, in the isochoric ensemble and above jamming, the basin volumes are linked to the pressure of the packings via a power law suggests that configuration space is tiled by a hierarchy of basins (see Fig. 10(a)), which has neither been fully appreciated, nor explored. We check that such a hierarchical dependence exists in Fig. 10(b), using the same data as in Fig. 9, and report a density-dependent power law between the volume v_i of a basin and the energy at the corresponding minimum. This power law results from P being a smooth function of the energy, as shown in Fig. 10(c), where we emphasize that unjamming is accompanied by an asymptote P ∝ √E. This behaviour is expected near unjamming, where overlaps are small so that the pair potential is well captured by a harmonic approximation, U(r) ∼ (r − r_0)² [5], and the pressure contribution from one interacting pair, as obtained from a virial expression, reads p ∼ r_0(r − r_0) ∼ √U, which, summed over pairs, can be shown to yield P ∼ √E as E → 0.

FIG. 10. Hierarchical energy landscape of soft sphere packings. (a) Sketch: the power law between basin volume, v_i, and pressure suggests that a minimum lying at E_i > 0 has a basin with volume v_i ∝ E_i^{−γ}, with γ a positive exponent. We verify this scaling relationship in (b), using the same data as in Fig. 9. Panel (c) shows a smooth dependence of P as a function of E, with an asymptote P ∝ √E at unjamming.

B. Beyond granular jams: glassy systems
Based on our discussion thus far, a reader may think that basin volume calculations can only be used to measure granular entropies. In fact, these methods could be used to measure a number of indicators for the shapes of the basins, in addition to the radial mass distribution, as well as their neighborhoods. Such measurements could have deep fundamental implications if one considers not a granular (or zero-temperature) system, but a jammed liquid at a finite temperature. Indeed, in a finite-temperature glassy system, the shape of basins controls transition rates and relaxation times [65-68], which has prompted a large body of recent work on the enumeration and characterization of minima of rough landscapes [66,69-72].
Achieving a clear picture of the basins of attraction in the configuration space of a dense liquid in physical dimensions would therefore be very exciting, as it would open avenues for fundamental checks regarding glassy dynamics. In an exact mean-field treatment of hard sphere glasses, a Kauzmann transition known as the random first-order transition (RFOT) is attained [76-78]. In this framework the configurational entropy, S_conf = log Ω_F, is defined in terms of the number of free energy minima, Ω_F, so that the notion and number of "glassy states" is well defined, and the ideal glass transition results from the population of glassy states becoming subextensive in system size. In finite dimensions, free energy minima are not infinitely long-lived and the precise definition of a "glassy state" remains debated [79]. A popular take on the problem was proposed by Stillinger and Weber [10,74,81], who suggested that the supercooled glassy states would reside close to the minima of the potential energy surface, known as inherent structures (IS) of the liquid. For each IS there is an associated basin of attraction, that is, the set of all initial conditions leading to the i-th IS by steepest descent. The partition function can then be expressed classically as a sum over the individual IS, Q = Σ_{i=1}^{Ω} q_i, where, for β = 1/k_B T,

q_i = ∫_{Γ_i} e^{−βU(r)} dr, (14)

and U(r) is the interaction potential. Until recently, q_i could not be computed, and the accepted operational definition for the configurational entropy was S_conf = S_liq − S_harm, where S_liq is the total entropy computed by thermodynamic integration, and S_harm is the vibrational entropy computed from the harmonic approximation (with or without anharmonic corrections), averaged over many IS [82]. Using this and another approach based on a Frenkel-Ladd-like computation, Berthier and coworkers recently produced compelling results suggesting that the change in S_conf for supercooled liquids in d = 3 is consistent with the existence of an ideal glass transition at T_K > 0 [79,83-85]. Through basin volume calculations, it is possible to compute q_i directly, and thereby obtain a general approach to estimating the configurational entropy of supercooled liquids, without resorting to any approximations. In particular, this method does not need to assume large system sizes, and could allow us to precisely study finite-size effects in complex free energy landscapes, a problem that is nontrivial even in simple theories in 1d [80].
Finally, in the context of glassy systems, free energy methods could in principle be used to obtain not just the (Boltzmann-weighted) volume of the basins of attraction of inherent structures, but also more complicated information about their geometry, topology, and connectivity (e.g., what is the chromatic number for the tiling of basins in the energy landscape? [130]). Understanding the properties of the most probable paths between minima of the energy landscape is of particular interest to understand the dynamics of relaxation in glasses, and could help bridge the gap between microscopic dynamics and mesoscopic models that recently shed new light on the interpretation of the relaxation spectrum of glassy materials [86,87].

FIG. 11. Generalised basins for dynamical systems. In a generic dynamical system, basins of attraction can be associated not only with fixed points, but also with limit cycles, or any kind of stable attractor of the dynamics. Black arrows represent example trajectories, which fall into steady-state structures (white). These structures can be fixed points (yellow basin), simple limit cycles (green basins) or any complicated high-dimensional attractor (red basin).
C. Generic dynamical systems: ecosystems, neural networks, and more.
In the remainder of this paper, we discuss what we believe to be some of the most exciting opportunities for basin-volume methods, namely their applications to generic dynamical systems. Indeed, going back to the method presented in Sec. II, we never explicitly relied on the existence of an underlying energy function. In principle, one can define any dynamics of one's choice, let it evolve, and classify initial conditions according to which steady state or dynamical attractor they eventually fall into, as sketched in Fig. 11. One can then always define the Shannon entropy associated with the volumes of these attractors, and use it as a generic descriptor of the system for a given set of parameters. This approach is completely general and can be applied to any system with dynamics regular enough to admit attractors. We propose a few promising leads for such approaches.
There are many examples of dynamical systems where such an approach would be relevant. For instance, mapping basins of attraction is a ubiquitous problem in control theory, where approximate methods for measuring basins have been proposed to determine the stability of the flight control law of the F/A-18 Hornet aircraft with respect to large perturbations [88]. Enumerating the number of stationary points, and their distribution, for certain classes of random functions is a classical problem in mathematics and statistics [11,89-102]. In particular, in combinatorial optimization, the size and connectedness of the space of solutions that satisfy a large number of constraints controls how easy it is to find a solution, which has implications in a variety of real-life problems ranging from computer science to optimal transport [103-106]. Understanding the structure of high-dimensional basins of attraction is also of great theoretical interest, as the problem of mapping high-dimensional basin volumes has been described as uncharted mathematical territory [24]. It is also a current challenge in cosmology, where characterising the number and volume of basins of attraction in axion landscapes constitutes a promising lead for elucidating the cosmological constant problem [107].
A particular example of dynamical systems where basin volume methods could prove fruitful is the study of ecosystems, i.e., ensembles of populations following Lotka-Volterra dynamics [108]. In the limit of symmetric antagonistic interactions, such systems have been shown to map onto glassy Hamiltonian systems, which leads to aging dynamics in a rugged landscape, the properties of which have been studied analytically at the mean-field scale [111-114]. Therefore, in this setting, one can apply the strategy of basin volume calculations to determine the existence and nature of phase transitions in the limit of a finite number of species. Furthermore, since a Hamiltonian structure is not needed to define an entropy via basin-volume calculations, the same approach can also be used in the much more complicated case of non-reciprocal, not-all-antagonistic interactions. In this scenario, qualitative results, both experimental and theoretical, have shown that stable structures tend to emerge in steady state [115-117]. Basin-volume approaches could provide insight into these systems, for instance by quantifying how many ecosystems are possible in steady state for a given mix of interactions, or which ecosystem is the most typical.
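As an illustration of what such a calculation could look like in its crudest form, the sketch below integrates a small generalised Lotka-Volterra system from random initial conditions, labels each run by its set of surviving species as a rough fingerprint of the attractor, and computes a Shannon entropy over the resulting basin frequencies; the interaction statistics, thresholds, and survival-based labelling are all illustrative assumptions (competitive interactions are chosen so that trajectories stay bounded).

```python
# Sketch of a "basin entropy" for a generalised Lotka-Volterra ecosystem: classify random
# initial conditions by their long-time attractor and apply Eq. (2) to basin frequencies.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
S = 8                                                  # number of species (illustrative)
A = -np.abs(rng.normal(0.0, 0.5, size=(S, S)))         # competitive, non-reciprocal interactions
np.fill_diagonal(A, -1.0)                              # self-regulation keeps trajectories bounded

def glv(t, x):
    return x * (1.0 + A @ x)                           # dx_i/dt = x_i (1 + sum_j A_ij x_j)

labels = []
for _ in range(200):
    x0 = rng.uniform(0.01, 1.0, size=S)
    sol = solve_ivp(glv, (0.0, 500.0), x0, rtol=1e-8, atol=1e-10)
    labels.append(tuple(sol.y[:, -1] > 1e-3))          # surviving species as a crude attractor label

_, counts = np.unique(np.array(labels, dtype=int), axis=0, return_counts=True)
p = counts / counts.sum()                              # attractor frequencies ~ relative basin volumes
print("basin entropy:", -np.sum(p * np.log(p)))
```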
Finally, another exciting lead is the application of basin-volume methods to Deep Neural Networks (DNNs) [118]. It has been widely observed that the parameters learned by DNNs during training correspond to "flat minima" of the loss-function landscape [119-124]. Basin volume calculations may enable researchers to answer questions concerning the structure of the error landscapes of DNNs and to identify the relationship between the probability of finding a given solution, its flatness, and its generalization performance. Addressing these questions would have a significant impact on our understanding of generalization in deep learning systems, with implications for high-stakes applications such as transportation, security and medicine.

V. CONCLUSIONS
In this perspective, we presented the principles, recent successes, and possible future applications of a powerful new approach to study disordered many-body systems through the estimation of the number and volume of basins of attraction. This approach enables the computation of the Shannon entropy, which appears as a natural extension of the thermodynamic entropy of equilibrium systems, in a much broader class of systems. It has already been shown that basin-volume computations work in the specific case of granular packings, but a broad avenue lies ahead. Not only can this method lead to estimations of the features of complex energy landscapes in thermal systems like supercooled liquids, molecular liquids or spin glasses, it also provides a systematic and quantitative way of characterising the state space of generic dynamical systems. At a time when numerical and experimental studies of a broad spectrum of non-equilibrium many-body systems are booming, in spite of the absence of a general theoretical framework to understand them, such a generic tool could prove to be invaluable. Our vision is that the theoretical ideas and protocols presented herein will become new canonical forms of computation adopted in multiple areas of science and engineering.

FIG. 1. Illustration of basins of attraction. Left: Example of a 2d energy landscape. High-energy points are shown in blue, low-energy points in red. Right: In the same landscape, trajectories of steepest descent and ascent, ẋ = ±∇U(x), are spawned from random points, and confined to an arbitrary region (blue circle). Trajectories converging to the same minimum belong to the same basin of attraction and are plotted in the same color.

FIG. 2. High-dimensional volumes. Top: 2d sketches of hypercubes in d = 2, 3, 4, and 5. The outer black line links vertices located at a distance √d from the center of the cube to midpoints located on the unit hypersphere with hyperbolic lines. The inside of each shape is colored with a density plot of the relative contribution w(r) of each radial shell inside the cube to the full volume, as obtained from Monte Carlo integration using 10^6 points uniformly drawn inside each cube. Bottom: Histograms of relative weights of radial shells w(r) for the hypercube (orange) and for the hyperball (purple). The lines highlight how large the intersection of the cube and a hyperspherical shell is at the distance that contains most of the cube's volume. As noted in the insets, this intersection here represents 1, 1, 0.78, and 0.62 times the surface of the full hyperspherical shell, respectively.

FIG. 3. Naïve random sampling shortcomings. Direct Monte Carlo evaluation V(R) of the volume of a hypercube with actual volume V, using N_s = 10^5 points uniformly drawn from a hyperball with radius R, for dimensions going from 2 to 64. Here the half-sidelength of the hypercube is a = 1. The dashed line indicates the asymptotic mode of the mass distribution of the cube, √(d/3).

FIG. 4. Probing basin volumes with random walks. Different random walkers are linked by springs with different rigidities to the configuration at the minimum. The first three panels represent the probability density of the trajectories of such walkers, with rigidities growing from left to right. At very small (possibly negative) rigidities, the walker explores the outer rim of the basins, following tentacle-like structures, while at large rigidities the random walk typically explores a hypersphere around the minimum. Putting together and properly reweighting the information sampled by these walkers, one can faithfully recover the full volume as if it had been uniformly sampled (last panel).

FIG. 5. Biased walks in a hypercube. Each colored line is a histogram h_k(r) of distances to the center for a random walk constrained to remain in a 100d hypercube with unit sidelength, a = 0.5, subjected to a biasing potential U_k(r) = kr²/2. Color encodes the value of the rigidity k, from large (red) to small (purple). Here, the K = 64 spring constants are chosen so that the modes of the distributions are equally spaced in r, from r = a = 0.5 (red) to r ≈ 3.1 > √(d/3) a (purple). A 1d Gaussian is sampled to obtain points near the center (gray line) to improve the MBAR solution. The reconstructed density of states of the cube is shown in black. Note that the 16 histograms to the right of the black line correspond to negative values of k.

FIG. 7. Mass distribution estimate. (a) Mass distribution of the unit-sidelength hypercube, a = 0.5, as estimated from MBAR, using the same runs as in Fig. 6, in d = 2, 10, 50, 100, 200, 500, 1000 dimensions. (b) Decimal log of the ratio between the mass distribution and that of the smallest ball containing the whole cube, for d = 1000. The dashed red line indicates the location √(d/3) a of the maximum for d = 1000, and the value of the ratio it corresponds to in (b). We indicate the probability p_MC of landing a single point inside the cube when drawing points uniformly from that specific hyperspherical shell.

FIG. 8. Energy landscape of hard-WCA particles. Left: Snapshot of a jammed packing of polydisperse disks with hard cores (dark shaded) plus soft repulsive coronas (light shaded). Right: Illustration of configuration space for jammed packings. The hatched regions are inaccessible due to hard-core overlaps. Single-colored regions with contour lines represent the basins of attraction of distinct minima. The blue region with solid dots indicates the coexisting unjammed fluid region (observed only for finite-size systems) and hypothetical marginally stable packings. Figures adapted from Ref. [28].

FIG. 9. Checking Edwards' hypothesis. (a) Gibbs-Shannon entropy S_G (blue) and Boltzmann entropy S_B (orange) as a function of ϕ, as obtained from a parameter-free kernel-density estimate (KDE) fit of the distribution of basin volumes. The dashed curves are second-order polynomial fits. As ϕ approaches the unjamming density for N = 64 particles, ϕ_J = 0.82 (dashed gray line), S_G tends to S_B, implying equiprobability of all states precisely at unjamming. (b) Scatter plot of the log of the probability of landing in a given basin, p_i (proportional to the volume v_i), against the log of the pressure P measured in that basin. Power-law scaling relations (solid lines) are found for several densities indicated in the inset. (c) Exponents found for the power laws of (b), plotted against ϕ. The dashed black line is a linear fit, and the dashed gray line indicates ϕ_J. The raw data from Ref. [28] was used here and re-analyzed independently.