Disconnectivity graphs for visualizing combinatorial optimization problems:
challenges of embedding to Ising machines

Dmitrii Dobrynin d.dobrynin@fz-juelich.de Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany Adrien Renaudineau Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany Mohammad Hizzani Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany
Dmitri Strukov University of California Santa Barbara, Santa Barbara, CA, USA Masoud Mohseni LSIP, Hewlett Packard Labs, Milpitas, CA, USA John Paul Strachan j.strachan@fz-juelich.de Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany

(March 2, 2024)

Abstract

Physics-based Ising machines (IM) have risen to the challenge of solving hard combinatorial optimization problems with higher speed and better energy efficiency. Generally, such dedicated systems employ local search heuristics to traverse energy landscapes in searching for optimal solutions. Extending landscape geometry visualization tools, disconnectivity graphs, we quantify and address some of the major challenges met by IMs in the field of combinatorial optimization. Using efficient sampling methods, we visually capture landscapes of problems having diverse structure and hardness and featuring strong degeneracies, which act as entropic barriers for IMs. Furthermore, we investigate energy barriers, local minima, and configuration space clustering effects caused by locality reduction methods when embedding combinatorial problems to the Ising hardware. For this purpose, we sample disconnectivity graphs of PUBO energy landscapes and their different QUBO mappings accounting for both local minima and saddle regions. We demonstrate that QUBO energy landscape properties lead to the subpar performance of quadratic IMs and suggest directions for their improvement.

^†^†preprint: APS/123-QED

I Introduction

Recent years have seen an increasing interest in using classical and quantum Ising machines (IM) for solving combinatorial optimization problems relevant for fundamental research and industrial applications [1]. Most of these devices rely on algorithms and physical principles implementing heuristic local search routines, e.g. discrete Monte-Carlo (MC) sampling (simulated annealing, parallel tempering) or noisy/chaotic continuous dynamics. Examples of the former are memristive crossbar arrays employed to efficiently perform vector-matrix multiplication [2, 3], or digital ASIC annealers [4]. The latter versions of IMs include coherent Ising machines, oscillator networks, quantum annealers, and others [5, 6, 7, 8]. The main attraction for the use of IM is the match of algorithm operations with their physical implementations, which offers to reduce the exponential scaling (polynomially or by a prefactor) of time-to-solution and/or energy-to-solution metrics for the hardest problems to date.

In this context, there are several outstanding challenges faced by IMs on both algorithmic and hardware levels, resulting in strong compromises being adopted in their practical deployment. One and possibly the most important difficulty concerns their application to practically interesting (large) problem sizes. The support of only second-order couplings of “spins”, together with connectivity topology constraints (e.g. the chimera graph [9]) results in the introduction of multiple auxiliary variables in order to either avoid higher-order terms, or reach necessary levels of sparsity. The added new variables can scale super-linearly in the number of original variables, not only further challenging the scaling to large problems, but also increasing the search space and modifying the optimization energy landscape. As a result, IMs can be limited to smaller-scale problems, and even these can become harder than their native formulation [10, 11] due to the worsened landscape geometry.

A second challenge lies in the algorithmic limitations of IMs. In particular, their reliance on local search heuristics fundamentally puts a bound on the problem classes they are capable of solving [12]. Being inherently local, IMs are prone to suffer from energy barriers rejecting MC moves, and from entropic barriers or degeneracies hampering both sensible exploration and exploitation [13]. Clear understanding of features of benchmark problems and the corresponding constraints of the Ising hardware is essential to facilitate future advances in the field.

A major challenge for optimization solver design and understanding is the navigation of the high-dimensional configuration space. Only a few methods have been developed over the years to visualize high-dimensional cost (energy) functions of optimization problems. One example is disconnectivity graphs (DG, also called barrier trees) [14, 15, 16], which aim to simplify the exponentially large configuration space by capturing local minima and their connectivity through energy barriers. It is possible to use DGs to gain quantitative insights into phenomena in a variety of applications ranging from metastable states of protein folding [17] to thermodynamic effects in Lennard-Jones systems [18], biomolecules [19], and spin glasses [20]. However, due to exponential complexity of DG construction and the degeneracy of optimization problems of interest, attaining energy landscape visualization is a computational feat on its own [21].

The contribution of this paper is as follows. Firstly, in Sec. III.1 we describe an extension for the efficient sampling algorithm of [22] to support DGs of energy landscapes featuring strong degeneracy of the configuration space (millions of states), capturing not only local minima but also degenerate saddle points. We further modify this approach providing means to construct DGs for quadratic optimization problems resulting from locality reduction due to IM hardware mapping. We achieve this by meaningfully reducing the search space over auxiliary variables and defining “effective” barriers. Secondly, in Sec. III.2 and Sec. III.3 using 3-SAT as a representative higher-order problem class, we plot DGs for problems of sizes inaccessible to the methods reported previously. With our methods we compare easy to hard instances, and random to industrial (structured) instances. Finally, in Sec. III.4 we demonstrate suboptimal energy landscape features of hardware embedding quadratization methods for 3-SAT from the perspective of clustering and entropy of energy minima, which are some of the culprits of algorithmic hardness [23, 24].

II Background

The Ising Hamiltonian (2-local), which IMs natively solve is:

H_{\mathrm{Ising}}=\sum_{i<j}^{N}J_{ij}s_{i}s_{j}+\sum_{i}^{N}h_{i}s_{i}\,,

(1)

where $s_{i}\in\{-1,1\}$ , $J_{ij}$ are spin interaction strengths, and $h_{i}$ denote local magnetic fields. Finding the ground state of Eq. 1 is an NP-Complete problem [25], and therefore solving (approximating) this task efficiently is of profound practical interest. Alternatively, the Ising Hamiltonian can be formulated as a quadratic pseudo-boolean function:

H_{\mathrm{QUBO}}=\sum_{i<j}^{N}Q_{ij}x_{i}x_{j}+\sum_{i}^{N}b_{i}x_{i}+C\,,

(2)

with binary variables $x_{i}\in\{0,1\}$ . Deciding the ground state of this function among $2^{N}$ possible configurations is what is commonly called quadratic unconstrained binary optimization (QUBO) problem. In this work we will use Ising and QUBO terms interchangeably due to their equivalence.

The generalization of QUBO to support higher order interactions of variables is what is usually referred to as PUBO (“P” for polynomial):

f(\mathbf{x})=f(x_{1},x_{2},\dots,x_{N})=\sum_{\{i\}_{k}\subseteq V}a_{\{i\}}% \prod_{\{i\}_{k}}x_{i}+C\,,

(3)

which is correspondingly equivalent to the k-local Ising. Here $\{i\}_{k}\subseteq V$ denote all possible subsets of the set of variables of highest cardinality $k\geq 1$ .

The present work devotes particular attention to the k-SAT problem (see below Eq. 4), one of the oldest and well-studied NP-Complete problems [26, 27, 28]. The motivation behind this choice lies in the fact that apart from being practically important for a variety of applications [29], k-SAT highlights the hardware and algorithmic challenges of IMs [30]. As will be discussed in this work, it features strong degeneracy of the solution space, an abundance of energy barriers, clustering of solutions, and can only be natively supported by the PUBO formulation, making it a formidable problem class for local search based quadratic IMs.

A general statement of the k-SAT decision problem is simple: is there a binary variable assignment $\mathbf{x}\in\mathbb{B}^{N}$ of the following conjunctive normal form (CNF):

(l_{i_{1,1}}\vee l_{i_{1,2}}\dots\vee l_{i_{1,k}})\wedge\dots\wedge(l_{i_{m,1}% }\vee l_{i_{m,2}}\dots\vee l_{i_{m,k}})\,,

(4)

where $i\in\{1,N\}$ , $m\in\{1,M\}$ , $l=x$ or $l=\bar{x}$ , so that all $M$ clauses are satisfied? With $k\geq 3$ it is NP-Complete like Ising/QUBO and thus worst case exponentially hard [31]. The k-local PUBO cost function Eq. 3 is easily obtained from Eq. 4 by the De-Morgan law of logic and using the equality $\bar{x}=1-x$ .

Many optimization landscape features have been established for hard constraint satisfaction problems [27, 32], of which k-SAT is a conventional example. By increasing the number of constraints from the small number, where the problem is easily satisfiable, to larger values up to a point of unsatisfiability, optimization landscapes undergo phase transitions where the dominating “simple” configuration region of connected solutions gets shattered into exponentially many clusters of solutions. Each cluster consists of several configurations which can be easily accessed from each other by local dynamics [33]. Furthermore, some of the variables in such cluster configurations could also be “frozen” [34, 35], i.e. remain unchanged regardless of the state of others. In other words, not only it can be difficult to traverse the landscape in the search of isolated clusters, but also to transition between such clusters, it is imperative to modify an extensive fraction of variables simultaneously; thus, non-local moves can be essential [36]. For the recent fundamental results on this topic we mention the works [37, 12, 38]. In order to highlight such hardness of combinatorial optimization, in this work we focus on illustrating how the landscapes are perceived by local search.

Early efforts to visualize energy/fitness landscapes arose in the context of theoretical chemistry and biology [14, 15, 16]. Authors of these works introduced the concept of disconnectivity graphs (DG) implementing a map of exponentially large potential energy configuration spaces to a two-dimensional tree. Fig. 1 sketches the idea behind such mapping: every leaf corresponds to a local minimum, while the branches represent the magnitude of energy barriers and connectivity (lowest barrier separation) of local minima with respect to each other.

Refer to caption — Figure 1: A simplified view of a disconnectivity graph. Every local minimum corresponds to a leaf; the height of energy barriers is reflected by the energy of branch connections. Arrangement of minima does not represent distance, i.e. by default has no explicit meaning.

In principle, arbitrary energy landscapes can be defined by a triplet [40]: $X$ being a set of configurations, neighbourhood $\mathcal{N}(x)$ of every state $x$ in $X$ , and energy/fitness function $f(X)\in\mathbb{R}$ . A local move of a solver can then be defined as a change of state from $x$ to a state in $\mathcal{N}(x)$ based on a certain search algorithm, e.g. choosing a random neighbour or the one with largest energy change.

Special attention, however, should be given to the degeneracy of such landscapes: many configurations form neighbourhoods which can be traversed by a local rule at no energy cost. Furthermore, the concept of a local minimum becomes ambiguous and non-local in degenerate landscapes [39, 41]. As Figs. 1(a) and 1(b) demonstrate, it is impossible to know if a descending energy path exists from the leftmost stable state $\mathbf{x}_{b}$ unless exploration finding the rightmost state $\mathbf{x}_{d}$ is performed. In this work we will call a stable “plateau” of Fig. 1(a) a saddle cluster, while the plateau in Fig. 1(b) will be called a local/global minimum cluster. The terminology of saddles/local minima of this work is chosen to resemble similar terms from continuous optimization. There, multiple works highlight profound difficulties of navigating high-dimensional landscapes arising from both types of critical points [42, 43].

The works of [39, 41, 21] have addressed the complexity of constructing DGs of degenerate landscapes with exhaustive enumeration of states. While being computationally infeasible for problems larger than $\approx 30\!-\!40$ variables, these works carried out classifications of saddles or local minima and the ways the states can be connected within a cluster and to other clusters. For example, a difference in possible connectivity of stable points is illustrated in Fig. 1(c). Approach of this work is closest to that of [20] in which the highlighted saddle points are treated as being disconnected. This choice is motivated by the golf-course-type energy landscapes of 3-XORSAT problems [24], where the paths to good solutions are mostly impeded by the entropic barriers, rather than the energy barriers.

In Fig. 1(c) there is no barrier between $\mathbf{x}_{b}$ and the global minimum $\mathbf{x}_{h}$ , but the path to it lies through a local minimum $\mathbf{x}_{d}$ . As a result, joining the states separated by a “hole” would result in a deceiving visualization hiding landscape features important for local search routines. With the methods this work (see Sec. III.1.2-III.1.3), we will address such diversity of scenarios by distinguishing the states with connections to global minimum from those separated from it by either barriers or “holes”. This will provide a clear explanation of why second order IMs can be greatly challenged by higher-order combinatorial optimization problems.

III Results

III.1 Methods

III.1.1 Locality reduction of k-SAT

The issue of locality reduction to support the formulation of Eq. 2 has been heavily investigated over the recent years [44], with the efforts aimed at introducing quadratizations that have the smallest possible number of auxiliary variables, minimize bit-precision requirements on the weights, or have algorithmically favourable properties, e.g. submodularity [45].

Perhaps the simplest method to meet the first requirement is to use quadratization by substitution, i.e. to introduce auxiliary variables $y$ for each pair of variables $x_{p}x_{q}$ in the original PUBO function (3) until the problem of required order is obtained, i.e. 2nd order for QUBO (2). The constraints are then enforced by either explicitly considering the equalities $x_{p}x_{q}=y$ , or by the addition of quadratic penalty terms in the cost function for each substitution:

	$\displaystyle f(\mathbf{x})$	$\displaystyle=$	$\displaystyle\pm x_{1}x_{2}\dots x_{k}\to$		(5)
		$\displaystyle\to$	$\displaystyle g(\mathbf{x},\mathbf{y})=\pm yx_{3}\dots x_{k}+P_{\pm}(x_{1},x_{% 2},y)\,.$		(5)

The choice of the $P$ function is not unique; for instance, one can make sure that

f(\mathbf{x})=\min_{y}g(\mathbf{x},\mathbf{y})

(6)

is satisfied, thereby preserving global minima of the original problem. Additionally, the choice of $x_{i}x_{j}$ admits some freedom and can be optimized for the minimum number of auxiliary variables by solving a vertex cover problem [46]. For simplicity, we use an efficient greedy routine to perform such optimization (for more details on quadratization methods outlined below cf. App. B).

For example, a commonly used quadratization penalty choice for locality reduction was suggested by Rosenberg (3rd order example) [47]:

\pm x_{p}x_{q}x_{k}=\min_{y}\left[\pm yx_{k}+(3y-2x_{p}y-2x_{q}y+x_{p}x_{q})% \right]\,,

(7)

where $x_{p}x_{q}$ was replaced by $y$ , and the remaining terms penalize the mismatch of $x_{p}x_{q}$ and $y$ . Thus, every appearance of $x_{p}x_{q}$ in the $k\geq 3$ terms of the PUBO function is substituted by the same $y$ , and for each such substitution $3y-2x_{p}y-2x_{q}y+x_{p}x_{q}$ penalty is added. This mapping is also implicitly used when the approach of reversible logic of [48, 49] is employed.

Performing standard simulated annealing optimization of 3-SAT problems we found a different mapping to be computationally superior to the Rosenberg version. The new mapping extends the quadratization ideas [50, 51] and [46] by approaching the monomials with positive and negative coefficients differently:

	$\displaystyle-x_{1}\dots x_{k}=\min_{y}\left[(k-1)y-\sum_{i=1}^{k}x_{i}y\right% ]\,,$
	$\displaystyle x_{1}\dots x_{k}=x_{2}\dots x_{k}-\bar{x}_{1}\dots x_{k}$
	$\displaystyle=x_{2}\dots x_{k}+\min_{y}\left[(k-1)y-\bar{x}_{1}y-\sum_{i=2}^{k% }x_{i}y\right]\,,$		(8)

and thus we call it KZFD-BG after the authors. However, contrary to the original application of these penalties individually for each term in the PUBO function, the variable substitution in this work is done as in the Rosenberg case, i.e. sharing substituted pairs across multiple monomials (see App. B). As a result, this yields the same number of native and auxiliary variables regardless of the mapping used. We address simulated annealing performance difference of the mappings in the context of PUBO and QUBO comparison in Sec. III.4.

Any locality reduction method modifies the “native” optimization landscape in non-trivial ways and can make its exploration algorithmically more challenging. In particular, Eq. 6 guarantees that for every stable state $\mathbf{x^{*}}$ of $f(\mathbf{x})$ with respect to a single bit-flip: $f(\dots,\bar{x}^{*}_{i},\dots)-f(\dots,x^{*}_{i},\dots)\geq 0,\,\forall i$ , the quadratization $g(\mathbf{x^{*}},\mathbf{y^{*}})$ is also in a stable state, which is given by $\min_{\mathbf{y}}g(\mathbf{x^{*}},\mathbf{y})\equiv g(\mathbf{x^{*}},\mathbf{y% ^{*}})$ . Indeed, the bit-flip energy changes with respect to auxiliary variables are non-negative due to the definition of $g$ : $g(\mathbf{x^{*}},\dots,\bar{y}_{i}^{*},\dots)\geq g(\mathbf{x^{*}},\dots,y_{i}% ^{*},\dots)=\min_{y}g(\mathbf{x^{*}},\mathbf{y})$ . In turn, the energy change of flipping $x$ is also non-negative because of the following chain:

	$\displaystyle g(\dots,\bar{x}_{i}^{},\dots,\mathbf{y^{}})\geq\min_{\mathbf{y% }}g(\dots,\bar{x}_{i}^{*},\dots,\mathbf{y})$		(9)
	$\displaystyle=f(\dots,\bar{x}_{i}^{},\dots)\geq f(\mathbf{x^{}})=g(\mathbf{x% ^{}},\mathbf{y^{}})\,.$		(10)

However, such correspondence does not hold in the opposite direction, i.e. a stable state of $g(\mathbf{x},\mathbf{y})$ is not guaranteed to be a stable state of $f(\mathbf{x})$ .

For illustration, in Fig. 3 a “linear” landscape represents states connected by a bitflip local move in the $N$ dimensional hypercube. The left sketch depicts a degenerate case with states $\mathbf{x}_{b}$ , $\mathbf{x}_{c}$ being stable, but $\mathbf{x}_{a}$ and $\mathbf{x}_{d}$ unstable. In turn, the right sketch shows how the quadratization mapping induces a rugged structure on top of the original manifold due to the auxiliary variables and the penalty terms. For every state $\mathbf{x}$ there is a corresponding minimizing auxiliary state $\mathbf{y}$ (possibly non-unique) according to Eq. 6. The low energy state $\mathbf{x}_{e}$ that was easily accessible by a greedy local search descend can now be separated by energy barriers due to the necessity to adapt $\mathbf{y}$ for every $\mathbf{x}$ .

If $\mathbf{x}$ and $\mathbf{y}$ are treated on equal footing, then one is forced to explore a configuration space $2^{|\{y\}|}$ times bigger than the native problem. The problem that already had highly non-trivial landscape structure caused by frustrations and long-distance correlations of variables, after quadratization will have these features hidden or worsened by the mismatch of “gradients” and energy barriers, ultimately causing significant deterioration of the IMs’ ability to find solutions [11].

Finally, we note that the sparsifying approaches that aim to reduce degrees of interaction between variables can introduce even more energy barriers into the problem due to auxiliary variables and penalties akin to the locality reduction methods. We do not focus on sparsification in this work; nonetheless, one example is given in App. A.

III.1.2 Sampling algorithm outline

In order to study and visualize with DGs the energy landscapes of degenerate optimization problems and their QUBO mapping modifications, we extend the Generalized Wang-Landau (GWL) [52, 53] sampling approach of the works [22, 54]. The GWL non-Markov Chain Monte-Carlo algorithm carries out random walks in the configuration space aiming to achieve approximately uniform attendance of all predefined energy levels $l\in[1,L]$ and all recorded basins of attraction $k\in[1,K]$ .

During preprocessing steps (see Fig. 4a) one defines the landscape partition into sectors in energy $[E^{1},E^{2},\dots,E^{L}]$ and affinity to a basin of attraction of a local minimum $\mathbf{x}^{k}$ : $\mathbf{x}\in B_{k,l}$ , if $E(\mathbf{x})\in[E^{l},E^{l+1})$ and $\mathbf{descent}(\mathbf{x})=\mathbf{x}^{k}$ . We note that the $\mathbf{descent}$ routine can be defined differently, and it makes sense to choose its definition mimicking the actual solver algorithm that would be used for solving studied problems in practice (see below Sec. III.1.3). Next, the sampling of states (random walk) is performed with the following acceptance probability:

p_{a\to b}=\min{\left[1,\exp{\left\{\beta(E_{a}-E_{b})\right\}\frac{\gamma_{k_% {a},l_{a}}}{\gamma_{k_{b},l_{b}}}}\right]}\,,

(11)

where $\gamma_{k,l}$ is a current estimation of the statistical weight of a sector:

\frac{1}{Z}\sum_{\mathbf{x}\in B_{l,k}}\exp{(-\beta E(\mathbf{x}))}\approx% \gamma_{k,l}\,.

(12)

This estimation is constantly updated, when the sector $B_{l,k}$ is visited, by:

\gamma_{l,k}^{t+1}=\gamma_{l,k}^{t}e^{f}\,,

(13)

where $f$ can follow a decreasing schedule usually starting from the value $f=1$ . It is numerically convenient to also define a histogram $\theta^{t+1}_{l,k}\equiv\ln\gamma^{t+1}_{l,k}=\theta^{t}_{l,k}+f$ , which is initialized at 0 for all $l,k$ at the beginning of the algorithm. If the exploration of as many minima as possible is preferred, then $f$ is not decreased over time [54], but in this case the estimation of $\gamma$ would not be accurate [55].

When a step $\mathbf{x}_{a}\to\mathbf{x}_{b}$ is tried (accepted or not), one saves the energy $\max{[E_{a},E_{b}]}$ as a current energy barrier estimation between basins $k_{a}$ and $k_{b}$ . This “educated guess” can then potentially be improved with the ridge descent algorithm [22]. If the zero energy barrier is found for a state perceived as local minimum (e.g. $\mathbf{x}_{f}\to\mathbf{x}_{g}\to\mathbf{x}_{h}$ in Fig. 1(c)), the status of such minimum is changed to a saddle, and its histogram is added to the corresponding lower basin of attraction (e.g. $\mathbf{x}_{h}$ ). If a saddle is connected to several lower basins, then the visits are distributed uniformly at random among them.

We keep track of only $K$ number of lowest in energy local minimum/saddle clusters adaptively uniting them by discovered connectivity and thus allowing space for additional clusters to be taken into account. In addition, we define a special $K+1$ column of the histogram for all of the states that do not fit into the first $K$ clusters [54]. If a certain basin has insufficiently flat histogram of visits, then it was not explored enough during sampling due to its negligible statistical weight. We found these states to be rare and at high energies; thus, if encountered, these minima are not displayed in the final DG.

More details on DG construction of this work are provided in App. C.

III.1.3 Extension for degeneracy

By design, GWL uniformly samples states across basins of attraction of local minima and energy levels. Its main purpose in this work is to discover as many regions of the landscape as possible without being stuck in a particular place, thereby not biasing the DG estimation. What is crucial in the definition of the algorithm is the $\mathbf{descent}$ routine, which identifies local minima and saddle points. When a problem has no degeneracies, e.g. S-K spin glass with Gaussian weights, one can define the descend (hill climbing) as the steepest descent, i.e. spins with the highest energy reduction are flipped. However, in this work we are interested in highly degenerate integer valued optimization problems, where such definition is not possible.

As briefly discussed above in Fig. 2, local minima and saddles are perceived differently depending on the algorithm employed for solving such problems. In this case, constructing exact DGs, apart from being infeasible for large problems, may result in misleading conclusions. For instance, a very large saddle point may have only one zero energy exit from itself, which may never be found by a local search routine, effectively being a local minimum, but it would still be depicted as a saddle on a DG, or even worse, not shown at all if saddles are not considered.

Here we aim to balance between efficient exploration of the landscape and visualization of relevant landscape features. For this purpose, we use random descent (see Fig. 4b), in which a greedy local move is performed in the first-seen random direction decreasing the energy. Once a stable state is encountered (white circle in Fig. 4b), a limited exploration of the “plateau” region is performed until either the budget of allowed moves is exhausted, or an exit from the saddle is found (green circle in Fig. 4b). The states encountered during such exploration of saddles/local minima are stored in a single cluster (including the unstable exit states, i.e. 4 states are stored in Fig. 4b). If some of the stored states are encountered again during GWL sampling, all of the states that belong to a single cluster are joined, with their histograms summed up.

Previously, the works [56, 57] addressed the difficulty of clustering in the context of improving the uniform sampling of the ground states. Once a ground state $\mathbf{x}$ was found, a ballistic search (BS) routine was carried out: starting from some global minimum state, a chain of zero energy states was constructed by flipping every variable maximum once. With the use of such chains, the cluster sizes and thus connectivity of states were estimated more reliably. We experimented with this method for clusters at every energy level and found it useful for clustering remote configurations when the number of states becomes infeasibly large.

Finally, in Fig. 4 we illustrate breadth-first search (BFS) that we use to exactly evaluate sizes of clusters at the end of sampling, when such statistics are of interest. Both stable and unstable states (exits) participate in BFS, and we confine their number by a predefined bound of states per energy level (usually $10^{7}$ in this work).

III.1.4 Extension for QUBO mapping

Two neighbouring configurations are considered to be a part of a single degenerate cluster in the native (PUBO Eq. 3) landscape if a local move separating them is of zero energy cost, as shown for states $\mathbf{x}_{b}$ and $\mathbf{x}_{c}$ in Fig. 3. The state $\mathbf{x}_{d}$ does not belong to a cluster since it has a move of negative energy to the state $\mathbf{x}_{e}$ . However, in the special case of the QUBO mapping landscape, the same state $\mathbf{x}_{d}$ would now be considered a saddle point since the decrease in energy is only achieved through an intermediate $\mathbf{y}$ adjustment.

The presence of a barrier in QUBO for a transition $\mathbf{x}_{a}\to\mathbf{x}_{b}$ (when originally there could be no barrier at all) puts the local search at a disadvantage due to the higher rejection rate of local moves. Raising the temperature of sampling, e.g. of simulated annealing, would not fully solve the problem since it would harm the necessary exploitation of the low-energy manifold. Additionally, once local search is complete, a solver discards $\mathbf{y}$ values using the states of $\mathbf{x}$ as a solution. The search over the subspace of $\mathbf{y}$ , thus, does not look for new solutions, but rather varies the induced QUBO barriers between the neighbours in the $\mathbf{x}$ space.

To highlight the significance of landscape ruggedness of quadratization compared to the native space and facilitate fairer comparison, here we provide QUBO with additional capabilities by assuming that a local search solver can “look beyond” the QUBO landscape barriers to a certain adjustable degree. Fig. 5 depicts the case where the penalty terms of a QUBO mapping introduce interactions that favour auxiliary variable states different by Hamming distance 3 for two native configurations separated by a single bit-flip, i.e. $x_{i}$ and $\bar{x}_{i}$ .

If the problem is approached head-on, one would need to either climb a steep barrier of $x_{i}\to\bar{x}_{i}$ and then adapt 3 auxiliary variables, or sequentially flip each of $y_{a}\to\bar{y}_{a}$ , i.e. climb a long barrier. Such scenario of long barriers is argued to be difficult for tunnelling in quantum annealers [58], considering that the mapping quadratization is essential due to strict hardware limitations. We note, however, that with every sequential flip of $y_{a}$ , the barrier of the $x_{i}\to\bar{x}_{i}$ move is reduced, i.e. allowing more $y_{a}$ to be explored raises the chance to overcome the QUBO barriers introduced by the mapping in the first place.

As a result, we augment disconnectivity graph analysis by introducing a QUBO factor $F$ , which stands for the maximum number of allowed auxiliary variable flips for every native move $x_{i}\to\bar{x}_{i}$ . With large enough $F$ the original (PUBO) landscape is recovered, while for small $F$ values “effective” energy barriers are still present, and thus the landscape connectivity is worsened by the mapping. In addition, $F$ serves as means to compare different QUBO mappings head-to-head, with mappings allowing small $F$ being arguably better for the local search of IMs. We perform such comparison supported by the simulated annealing results in Sec. III.4.

The algorithm to compute the effective barriers is as follows. First, at a fixed position in the space of $\mathbf{x}$ we set the auxiliary variables $\mathbf{y}$ in a valid state required by Eq. 6 ¹¹1 This valid state may not be unique, i.e. there could be degeneracy in the auxiliary variable space $\mathbf{y}$ for each fixed $\mathbf{x}$ . In this case we consider all such configurations and take the minimum of the computed effective QUBO barrier. . Next, the bit-flip energy change $\Delta E_{\mathrm{QUBO}}(\dots,x_{i}\to\bar{x}_{i},\dots\mathbf{y})$ is computed, which corresponds to the “vanilla” QUBO barrier at $F=0$ . Second, in order to calculate the effective QUBO barrier of $x_{i}\to\bar{x}_{i}$ , we list all auxiliary variables $\{y_{a}\}$ that interact with $x_{i}$ , i.e. $Q_{ia}\neq 0$ . Out of all listed $y_{a}$ , we choose $F$ variables with the minimum values of $\Delta E(\dots,\bar{x}_{i},\dots,y_{a}\to\bar{y}_{a},\dots)<0$ . Finally, the effective barrier (see Fig. 5) $\Delta E_{\mathrm{QUBO,F}}$ is obtained by (assuming $y$ variables don’t interact with each other):

		$\displaystyle\Delta E_{\mathrm{PUBO}}(x_{i}\to\bar{x}_{i})\leq\Delta E_{% \mathrm{QUBO,F}}(x_{i}\to\bar{x}_{i})\equiv$		(14)
		$\displaystyle\Delta E_{\mathrm{QUBO}}(x_{i}\to\bar{x}_{i})+\sum_{a=1}^{F}% \Delta E(\bar{x}_{i},y_{a}\to\bar{y}_{a})<$
		$\displaystyle\Delta E_{\mathrm{QUBO}}(x_{i}\to\bar{x}_{i})\,.$

III.1.5 Disconnectivity graphs notation

In the following sections we adopt the following convention when plotting disconnectivity graphs (e.g. see Fig. 6). The y-axis stands for the PUBO/QUBO energy. Every circle represents a separate local minimum/saddle cluster. The diameter of such circle corresponds to the square root of the cluster degeneracy, i.e. the area of a circle is proportional the number of connected configurations within a cluster. There is no explicit meaning behind the x-axis distance between the DG leaves and branches. If a circle is shown to have a zero-energy connection to lower clusters, then it represents a saddle cluster. If two or more saddles appear connected, then the situation depicted in Fig. 1(c) between $\mathbf{x}_{b}$ and $\mathbf{x}_{f}$ is in place. Red clusters in Figs. 7a and 7b have no direct connection to the global minimum denoted by green, i.e. all local minima are red, as well as some saddles (e.g. the state $\mathbf{x}_{b}$ in Fig. 1(c)). Blue saddle clusters in Figs. 7a and 7b were found to be connected to the global minimum by a descent algorithm of choice without energy barriers (e.g. $\mathbf{x}_{f}$ in Fig. 1(c)).

Every DG is accompanied by a histogram of the number of states obtained with BFS at each energy level. The degeneracy of every cluster is denoted by $\mathcal{N}_{k}$ , while the total number of states per energy is plotted as a normalized by $N$ (number of variables) natural logarithm of $\sum_{k}\mathcal{N}_{k}$ . The grey histogram shows the total number of BFS aggregated states (including saddle exits). The blue and red histograms count the corresponding stable states shown by circles on the DG.

III.2 Easy and hard problems

The finite size fluctuations of relatively small random 3-SAT problems usually employed for IM benchmarking in practice results in a strong spread of their hardness. In this section, with the help of DGs we aim to highlight the landscape features exhibited by such instances of different hardness. In [11] IM TTS ${}_{99}$ ²²2Time-to-solution $99\%$ (TTS ${}_{99}$ ) is the time needed for a stochastic solver to reach a solution at least once with probability $p=0.99$ . was estimated for a collection of randomly generated problems from the open benchmarking library SATLIB [61]. The instance uf50-920 shown in Fig. 5(a) was observed to be relatively “hard” with $\approx 57000$ TTS ${}_{99}$ algorithmic steps, while the instance uf50-933 in Fig. 5(b) was “easy” with $\approx 3700$ steps.

A clear distinction is seen in both the number of global minimum configurations, as well as the number of distinct global minimum clusters (1 vs 7). It is in general unclear what is the property of optimization landscapes that would measure hardness best for a particular solver. In [36] authors use the number of global minimum clusters as a proxy for predicting Survey Propagation’s ability to find global minima in 4-SAT random instances. The same correlation is clearly observed in this work. However, instead of trying to predict the hardness of instances by DGs, we demonstrate how DGs can be used to gain insights into experimentally observed algorithm behaviours by a diverse set of sampled landscape properties.

In addition to much larger cardinality of the set of global minima, the easy problem features a large saddle point at $E=1$ ( $20877$ states) which is only connected to global minima and acts as a basin of attraction for solvers. In comparison, the hard instance contains similar saddles with only $\approx 206$ states. There were no local minima found at $E\geq 6$ in the easy instance, while the hard instance features local minima even at $E=8$ .

At every energy level we observe massive saddle points which are connected to the global minimum. We note, however, that it becomes very important for local search not to descend in the wrong local minimum cluster, even though in principle it is possible to descent to a solution without overcoming any barriers. With this observation we highlight the higher ratio of local minima/disconnected saddles in the “hard” instance compared to the “easy” one.

The majority of energy barriers in both tested 3-SAT problems is the minimum possible one, $\Delta E=1$ (as also previously observed in [62]). In other words, the constructed DGs illustrate the significance of entropic barriers that are determined by probabilities of descending into better areas of the landscape, which resulted in the observed $\approx\!15$ times difference in the time-to-solution metric.

III.3 Random and industrial problems

The uniform random 3-SAT problems are a common benchmark for testing the performance of heuristic solvers. In the thermodynamic limit of $N\to\infty$ and $M\to\infty$ their static properties are understood within the framework of the replica symmetry breaking (or cavity) methods of statistics [32]. In general, the lack of structure of random CSP causes state-of-the-art exact solvers to struggle near the phase transition ratio $\alpha=M/N$ and ultimately take exponential time to find solutions due to the difficulty of truncating the search space based on exponentially growing deep decision trees [26].

On the other hand, it is not a difficult task to engineer a structured problem to challenge a heuristic solver. A very small basin of attraction of a global minimum with overall rugged landscape would make a local search heuristic relying on stochastic exploration get lost. As a result, stochastic by design, IMs can have a hard time outperforming exact routines exploiting inherent structures of problems. In order to draw conclusions about the capabilities that IMs would need to tackle both combinatorial optimization classes, we employ DGs to visualize the distinction in landscape properties between fully random and structured “industrial” instances.

To represent the structured industrial class, we generated a 3-SAT formulation of the factoring problem of the number 55 using the method from [63]. This resulted in a 3-SAT instance having 68 boolean variables and 248 clauses with only one global minimum. For comparison, a random uniform instance near the phase transition ratio $\alpha$ was generated of the same 68 variable size, but with 295 clauses. We obtained an instance with a single global minimum cluster having 4 configurations. The DG of the uniform random instance is shown in Fig. 6(a), while the DG of factoring — in Fig. 6(b).

We truncated the DG of the random problem to energy levels 5 or lower, resulting in $\approx 400$ distinct local minimum/saddle clusters, while the semi-prime factoring DG has managed to fit as many as $\approx 650$ clusters already at the energy $E\leq 3$ . This indicates much more pronounced ruggedness of the factoring 3-SAT problem with weak connectivity of saddle points.

As in the previous section, we observed exponentially large connected saddle clusters at every energy level of random 3-SAT: it is possible to traverse huge distances in the optimization landscape without the need to overcome any energy barriers. It means that the hardness of this problem class arises mainly from the entropic barriers, leaving gradient-based solvers oblivious about meaningful exploration directions.

In comparison, the number of saddle clusters disconnected from the global minimum in the factoring problem constitute a much bigger fraction of the overall number of captured clusters. Moreover, the number of stable states of the factoring problem at the given energies is much smaller than in the random case (we didn’t need to impose sampling limits), indicating that the energy barriers are the main contributor to the hardness. These results support the conclusion of [64], where authors argue there is no evidence for an advantage of employing SAT reductions for factoring problems, both using classical SOTA SAT solvers, and their hypothetical quantum counterparts.

The limitation of DGs is that they compress combinatorial landscape information to local minima and barriers between them, while the distances in solution space are left aside. Since we are able to store all of the discovered by GWL+BFS states, one approach to probing such distance information is by calculating the mutual overlaps of local minima. The mutual overlap of states for Ising formulated problems is defined by $q_{ab}=\frac{1}{N}\sum_{i=1}^{N}s^{a}_{i}s^{b}_{i}$ [28], where $s_{i}=2x_{i}-1$ . We show computed histograms of overlaps of local minima for the given random and industrial instances in Figs. 6(c) and 6(d).

The histograms of random and structured problems exhibit distinct behaviour with the random instance having the majority of states at zero overlap values. This property is not explicitly shown by the DG visualization. It is implied, however, by the very large saddle clusters. Compared to the random instance, the local minima of the factoring problem at $E\leq 2$ are closer to each other without showing evidence of a gap in the overlaps [12]. Thus, IMs can be challenged by different landscape features depending on the problem class, suggesting a strong algorithmic need for specialization.

III.4 QUBO mappings of 3-SAT

In this section we study energy landscapes of QUBO mappings of 3-SAT using the DGs construction extension introduced in Sec. III.1.4. We explicitly state the $F$ factor when mappings are compared with each other. We say that $F=\infty$ when all auxiliary variables are probed, essentially recovering the native (PUBO) landscape. At a given $F$ the “effective” barrier definition is illustrated in Fig. 5.

In Fig. 8 we plot a KZFD-BG QUBO mapping landscape of the instance from Fig. 5(a) truncated to the subspace of energies $E\leq 5$ . The QUBO barrier factor was chosen to be $F=1$ , meaning that at every step $\mathbf{x}_{a}\to\mathbf{x}_{b}$ only one QUBO auxiliary variable is allowed to be flipped in order to overcome the QUBO barriers between the native states $\mathbf{x}$ . One can observe the following features of the QUBO landscape of 3-SAT:

•

The connectivity of states is drastically reduced with large saddle point clusters of the PUBO landscape shattered into multiple disconnected saddles or local minima in the QUBO landscape (in total $1621$ clusters). This has a direct negative effect on the ability to find global minima for the local search heuristics at low temperatures/noise.
•

The global minimum cluster (which consists of 2 states for this instance) is preserved, but only $20$ from the original $206$ configurations were found to be connected from energy $E=1$ towards the solution. In other words, blue clusters have become red clusters. The same behaviour is observed higher in energies, i.e. local search faces energy barriers and “holes” in addition to entropic barriers.

In order to highlight the necessity to carefully approach mapping into Ising hardware, we would like to directly compare the QUBO mappings introduced in Sec. III.1.1 to each other from the perspective of connectivity of states (clustering) at different values of the QUBO factor $F$ . We use 1000 SATLIB instances of size 50 with 218 clauses (uf50 1-1000) to accumulate statistics from sampled DGs. On average, our QUBO mapping scheme of variable substitution and penalty terms introduced $138\pm 4$ auxiliary variables.

In Fig. 8(a) we show the histogram of the number of local minimum/disconnected saddle clusters $\mathcal{N}(s)\equiv\exp{(N\Sigma(s))}$ of size $S\equiv\exp{(Ns)}$ for KZFD-BG and Rosenberg mappings, and for the native space. The parameter $\Sigma(s)$ is usually referred to as cluster complexity, while $s$ is the cluster entropy [27]. As mentioned in Sec. III.1.3, the distinct clusters are sampled with the GWL algorithm; however, the cluster entropy estimations are improved further by BFS. The results are averaged over all considered instances for the states sampled at energies $e=E/N\leq 0.04$ .

We explicitly plot the global minima ( $E=0$ ) distribution as a sanity check: the RSB theory predicts its maximum value in the thermodynamic limit being at the entropy $s\approx 0.06$ , while the curve itself should be below 0 complexity when the clause-to-variable ratio of 3-SAT is above the phase transition value $4.267$ (we have $4.36$ ) [32]. Both features are present for our sampled data.

We observe shattering of the native landscape clusters by the Rosenberg mapping to be stronger than that of the KZFD-BG mapping. This result can be interpreted as follows. On average, in order to transition (overcome the barrier) from state $\mathbf{x}_{a}$ to state $\mathbf{x}_{b}$ having the same energy, the Rosenberg mapping needs to flip at least $F_{\mathrm{Ros}}$ auxiliary variables, introduced by quadratization, while KZFD-BG would safely transition after flipping only $F_{\mathrm{KZFD}}<F_{\mathrm{Ros}}$ . As a result, $F$ can be seen as a measure of the ruggedness or shattering of the quadratizatized optimization landscape induced by the mapping.

As displayed by the histogram in Fig. 8(a), both QUBO mappings feature large clusters that are not accounted for in the PUBO case. These are the native space saddle clusters connected to the global minimum (thus not shown on the histogram), which were transformed by quadratization to either local minima or disconnected saddles. In Sec. III.2 we discussed the value of connected saddle points at low temperatures/noise for finding global minima using IMs. QUBO mappings, thus, can transform saddle points into local minima effectively impeding the descend in energy. The ratio of local minima/disconnected saddles to all stable states is shown in Fig. 8(b). With increasing $F$ we approach the native landscape faster for the KZFD-BG mapping than for the Rosenberg mapping. This suggests a potential algorithm for IMs that are forced by hardware to use quadratization methods. With sufficient exploration of the auxiliary space, it is possible to recover the native (PUBO) landscape geometry and thereby benefit from the reduced number of local minima/disconnected saddles, provided that the costs of specific hardware implementations do not outweigh such benefits in terms of time-to-solution/energy-to-solution metrics.

With our analysis we would also like to highlight the effect of choosing quadratization methods on performance of solvers. While all such methods preserve global minima, the geometry of combinatorial landscapes may change beyond recognition, thereby drastically decreasing the local search performance. To support the observed landscape advantage of KZFD-BG mapping over Rosenberg, we performed simulated annealing for a collection of SATLIB 3-SAT problem instances in Fig. 10. The advantage of the KZFD-BG mapping clearly exhibits itself in the solver performance giving dominating TTS ${}_{99}$ for the majority of instances. For the problem size of $N=50$ none of the KZFD-BG mapped instances timed-out within our imposed computational budget of SA sweeps. On the other hand, the solver using the Rosenberg mapping failed to find a solution for several instances even at this modest problem size.

IV Conclusion

In this work we have suggested methods to sample disconnectivity graphs of degenerate combinatorial optimization problems, while also introducing extensions for quadratic embeddings motivated by hardware constraints of Ising machines. DGs have proven to be able to visually capture energy landscape properties of instances with different structure (industrial and random), hardness, and order (quadratic and higher order). To characterize clustering/ruggedness of the configuration space arising from locality reduction, we have introduced a new method, QUBO factor $F$ . From this perspective we have discussed the reasons behind observed experimental performance gap between different QUBO mappings, as well as between QUBO and PUBO.

The directions for future work include investigating other definitions of neighbourhoods beyond the simple bit-flip in order to visualize and gain intuition into how optimization landscapes are perceived by different local (or non-local) search routines. For example, isoenergetic cluster moves [65] allow solvers to make large Hamming distance steps, defining a different neighbourhood for each configuration, thus a new DG with distinct connectivity of states. Moreover, understanding of the energy landscape geometry is of great importance in a variety of fields ranging from inference and learning in energy-based models [66] to attractor dynamics and storage capacity in associative memories [67, 68]. One application example is non-equilibrium inhomogeneous sampling methods [36, 69] which essentially modify energy barriers reducing the hardness of sampling of high-quality and diverse solutions.

Sparsifying methods motivated by the available connectivity topology of the Ising hardware constitute a complementary problem which can also be studied with the methods of this work. Finally, the distinct properties of auxiliary variables imply the possibility to introduce adaptive algorithms leveraging the specific native-auxiliary interactions within the constrains of Ising machines.

V Acknowledgements

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under the Air Force Research Laboratory (AFRL) Agreement No. FA8650-23-3-7313. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The authors gratefully acknowledge computing time on the supercomputer JURECA [70] Forschungszentrum Jülich under grant no. “optimization”. We also gratefully acknowledge the generous funding of this work under NEUROTEC II (Verbundkoordinator / Förderkennzeichen: Forschungszentrum Jülich / 16ME0398K) by the Bundesministerium für Bildung und Forschung.

Appendix A Sparsification by auxiliary variables

Let’s assume that due to hardware limitations we are unable to support full interaction connectivity of a variable $x_{1}$ of the PUBO function in Eq. 3. Due to its multilinear form, we can split the interactions of $x_{1}$ , i.e.

f(\mathbf{x})=A(x_{i_{a}},\dots,x_{k_{a}})x_{1}+B(x_{i_{b}},\dots,x_{k_{b}})x_% {1}\,,

(15)

where $A$ and $B$ are independent functions (assuming such exist). $x_{1}$ in the second term can be substituted by an auxiliary variable $y$ with the introduction of a penalty as follows:

g(\mathbf{x},y)=Ax_{1}+By+P(x_{1}+y-2x_{1}y)\,,

(16)

which obeys $f(\mathbf{x})=\min_{y}g(\mathbf{x},y)$ as in locality reduction methods if $P\geq|B|$ . As a result, the local search move $(x_{1},y)=(0,0)\to(\bar{x}_{1},\bar{y})=(1,1)$ can only be made with single flips through a higher energy barrier than in the denser original formulation (see Tab. 1).

Table 1: Sparsification truth table

$x$	$y$	$g(\mathbf{x},y)$	$f(\mathbf{x})$
0	0	0	0
0	1	$B+\|B\|$	0
1	0	$A+\|B\|$	$A+B$
1	1	$A+B$	$A+B$

Appendix B 3-SAT to QUBO mappings

In this section we describe in detail the mappings of 3-SAT problems formulated as conjugate normal forms (CNF) to quadratic pseudo-boolean functions (QUBO).

Notation:

$x$ are boolean variables, $l$ are literals that stand for either $x$ or its negation $\bar{x}\equiv 1-x$ , $y$ are boolean auxiliary variables.

The problem of maximizing the total number of clauses of size $k=3$ is reformulated as a minimization problem of a third order pseudo-boolean polynomial of literals as follows (inverting the expression and using De-Morgan law):

	$\displaystyle(\bar{l}_{1_{1}}\vee\bar{l}_{2_{1}}\vee\bar{l}_{3_{1}})\wedge(% \bar{l}_{1_{2}}\vee\bar{l}_{2_{2}}\vee\bar{l}_{3_{2}})\wedge\dots$
	$\displaystyle\rightarrow l_{1_{1}}l_{2_{1}}l_{3_{1}}+l_{1_{2}}l_{2_{2}}l_{3_{2% }}+\dots\,,$		(17)

where each $l_{i}=x_{a_{i}}$ or $l_{i}=\bar{x}_{a_{i}}$ , $i\in[1,3M]$ , $a\in[1,N]$ . The simplest mapping of this expression to QUBO (quadratization) would be to introduce the Rosenberg penalties for every term in the sum:

		$\displaystyle l_{1_{1}}l_{2_{1}}l_{3_{1}}+l_{1_{2}}l_{2_{2}}l_{3_{2}}+\dots=% \min_{y\in\mathbb{B}}g(l(x),y)\,,$		(18)
		$\displaystyle g(l(x),y)=\sum_{i\in[1,M]}y_{i}l_{3_{i}}+(3y_{i}-2y_{i}l_{1_{i}}% -2y_{i}l_{2_{i}}+l_{1_{i}}l_{2_{i}})\,.$		(18)

The validity of such quadratizaton (Eq. 6) directly follows from the fact that auxiliary variables $y_{i}$ are introduced independently for each term of Eq. (17) and $l_{1}l_{2}l_{3}=\min_{y}\left[yl_{3}+(3y-2yl_{1}-2yl_{2}+l_{1}l_{2})\right]$ .

B.1 Non-term-wise Rosenberg

In order to get the “classic” Rosenberg [47] quadratizaton, we write the PUBO of Eq. (17) for variables $x$ :

	$\displaystyle l_{1_{1}}l_{2_{1}}l_{3_{1}}$	$\displaystyle+$	$\displaystyle l_{1_{2}}l_{2_{2}}l_{3_{2}}+\dots=\sum_{i<j<k}S_{ijk}x_{i}x_{j}x% _{k}$		(19)
		$\displaystyle+$	$\displaystyle\sum_{i<j}W_{ij}x_{i}x_{j}+\sum_{i}B_{i}x_{i}+C\,.$		(19)

The pairs $x_{m}x_{n}$ in a set covering all terms of order 3 are substituted by auxiliary variables $y_{(mn)}$ with the addition of a penalties as in Eq. (18):

	$\displaystyle\sum_{i<j<k}S_{ijk}x_{i}x_{j}x_{k}+\dots=\min_{y}\sum_{(mn),k}S_{% (mn)k}y_{(mn)}x_{k}+$
	$\displaystyle+\sum_{(mn)}P^{R}_{(mn)}(3y_{(mn)}-2y_{(mn)}x_{n}-2y_{(mn)}x_{m}+% x_{m}x_{n})$
	$\displaystyle+\dots[\leq\mathrm{2nd\;order\;terms}]\,,$		(20)

where the lower bound penalty coefficients are now index-dependent [71]:

	$\displaystyle P^{R}_{(mn)}\geq\max\left[\sum_{k}S^{+}_{(mn)k},-\sum_{k}S^{-}_{% (mn)k}\right]$
	$\displaystyle S^{+}_{(mn)k}>0,\;S^{-}_{(mn)k}<0\,.$		(21)

B.2 Non-term-wise KZFD-BG

Here we modify the Rosenberg mapping of Sec. B.1 applying quadratization ideas of [50, 51, 46], where the positive and negative monomials get different penalty terms of Eq. 8 (here third order):

	$\displaystyle-x_{1}x_{2}x_{3}=\min_{y}\left[2y-\sum_{i=1}^{3}yx_{i}\right]\,,$
	$\displaystyle x_{1}x_{2}x_{3}=x_{2}x_{3}-\bar{x}_{1}x_{2}x_{3}$
	$\displaystyle=x_{2}x_{3}+\min_{y}\left[2y-y\bar{x}_{1}-\sum_{i=2}^{3}yx_{i}% \right]\,.$		(22)

Rearranging the summands in these equations, we get for the positive monomial:

x_{m}x_{k}x_{n}\to x_{k}y+(y-yx_{m}-yx_{n}+x_{m}x_{n})\,,

(23)

and for the negative monomial:

-x_{m}x_{n}x_{k^{\prime}}\to-yx_{k^{\prime}}+y-x_{m}x_{n}+(y-yx_{m}-yx_{n}+x_{% m}x_{n})\,.

(24)

As a result, an arbitrary 3rd order pseudo-boolean function is quadratizatized as:

	$\displaystyle\sum_{i<j<k}S_{ijk}x_{i}x_{j}x_{k}+\dots=\min_{y}\sum_{(mn),k}S^{% +}_{(mn)k}y_{(mn)}x_{k}$
	$\displaystyle+\sum_{(mn),k^{\prime}}S^{-}_{(mn)k^{\prime}}(y_{(mn)}x_{k^{% \prime}}-y_{(mn)}+x_{m}x_{n})$
	$\displaystyle+\sum_{(mn)}P_{(mn)}(y_{(mn)}-y_{(mn)}x_{n}-y_{(mn)}x_{m}+x_{m}x_% {n})$
	$\displaystyle+\dots[\leq\mathrm{2nd\;order\;terms}]\,,$		(25)

where $S^{+}_{(mn)k^{\prime}},S^{-}_{(mn)k^{\prime}}$ denote coefficients of positive and negative monomials. The penalty parameters $P^{K}_{(mn)}$ are chosen as:

	$\displaystyle P^{K}_{(mn)}\geq\sum_{k}S^{+}_{(mn)k}-\sum_{k}S^{-}_{(mn)k}$
	$\displaystyle S^{+}_{(mn)k}>0,\;S^{-}_{(mn)k}<0\,.$		(26)

Indeed, for every auxiliary variable index $(mn)$ we have

	$\displaystyle g(x,y_{(mn)})=\left(N^{+}+N^{-}\right)y_{(mn)}-\|N^{-}\|(x_{m}x_{n% }-y_{(mn)})$
	$\displaystyle+P^{K}_{(mn)}(y_{(mn)}-y_{(mn)}x_{n}-y_{(mn)}x_{m}+x_{m}x_{n})\,,$		(27)

where we defined

	$\displaystyle N^{+}\equiv\sum_{k}S^{+}_{(mn)k}x_{k},\;N^{-}\equiv\sum_{k^{% \prime}}S^{-}_{(mn)k^{\prime}}x_{k}$
	$\displaystyle\|N^{+}\|=\sum_{k}S^{+}_{(mn)k},\;-\|N^{-}\|=\sum_{k}S^{-}_{(mn)k}\,.$		(28)

As a result, $f(x)=\min_{y}g(x,y)$ due to Eq. 26 is guaranteed, as shown in Tab. 2. Compared to the Rosenberg mapping, non-term-wise KZFD-BG has smaller dynamic range 2nd order interactions, since that $2P^{R}>P^{K}$ .

The same native variable pairs $x_{i}x_{j}$ are chosen for substitution for both mappings in this work for fair comparison. Their choice is a result of a greedy (i.e. efficient) optimization algorithm choosing the most frequent variable pairs which achieves significant reduction (possibly not the optimal ³³3Finding a minimal set of pairs of variables to be substituted is in general an NP-Hard problem.) of the QUBO space compared to the term-wise methods (e.g. Eq. 18).

Table 2: KZFD-BG QUBO mapping truth table for every substituted pair

x_{n}x_{m}

and an auxiliary

y_{(mn)}

$y_{(mn)}$	$x_{m}$	$x_{n}$	$g(x,y_{(mn)})$	$f(x)$
0	0	0	0	0
1	0	0	$N^{+}+N^{-}+\|N^{-}\|+P^{K}_{(mn)}$	0
0	1	0	0	0
1	1	0	$N^{+}+N^{-}+\|N^{-}\|$	0
0	1	1	$-\|N^{-}\|+P^{K}_{(mn)}$	$N^{+}+N^{-}$
1	1	1	$N^{+}+N^{-}$	$N^{+}+N^{-}$

Appendix C implementation details

The original code developed for this work uses GWL sampling described in Sec. III.1 and has the following output: GWL histogram of visits to basins of attraction and energy levels, sampled cluster sizes, symmetric matrix of energy barriers between clusters. The information about the connectivity of clusters and their type (local minimum/saddle) is then derived from the barrier matrix during postprocessing. As input the program takes the conjugate normal form of a 3-SAT problem, or the equivalent 3rd order sparse PUBO formulation.

The hyperparameters that can be tuned of DG sampling are: number of parallel threads of sampling, total GWL steps per thread, hill climbing type (random/steepest), usage of ballistic search, cluster exploration and breadth-first search limits.

Extended disconnectivity graphs construction from the aforementioned data sampled with the GWL algorithm is based on the functions from pele library [73].

References

Mohseni et al. [2022] N. Mohseni, P. L. McMahon, and T. Byrnes, Ising machines as hardware solvers of combinatorial optimization problems, Nature Reviews Physics 4, 363 (2022).
Mahmoodi et al. [2019] M. R. Mahmoodi, M. Prezioso, and D. B. Strukov, Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization, Nature Communications 10, 5113 (2019).
Cai et al. [2020] F. Cai, S. Kumar, T. Van Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. D. Lu, and J. P. Strachan, Power-efficient combinatorial optimization using intrinsic noise in memristor hopfield neural networks, Nature Electronics 3, 409 (2020).
Aramon et al. [2019] M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber, Physics-inspired optimization for quadratic unconstrained problems using a digital annealer, Frontiers in Physics 7, 10.3389/fphy.2019.00048 (2019).
Peng et al. [2008] X. Peng, Z. Liao, N. Xu, G. Qin, X. Zhou, D. Suter, and J. Du, Quantum adiabatic algorithm for factorization and its experimental implementation, Phys. Rev. Lett. 101, 220405 (2008).
Borders et al. [2019] W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and S. Datta, Integer factorization using stochastic magnetic tunnel junctions, Nature 573, 390 (2019).
Mallick et al. [2020] A. Mallick, M. K. Bashar, D. S. Truesdell, B. H. Calhoun, S. Joshi, and N. Shukla, Using synchronized oscillators to compute the maximum independent set, Nature Communications 11, 4689 (2020).
Goto et al. [2021] H. Goto, K. Endo, M. Suzuki, Y. Sakai, T. Kanao, Y. Hamakawa, R. Hidaka, M. Yamasaki, and K. Tatsumura, High-performance combinatorial optimization based on classical mechanics, Science Advances 7, eabe7953 (2021).
Könz et al. [2021] M. S. Könz, W. Lechner, H. G. Katzgraber, and M. Troyer, Embedding overhead scaling of optimization problems in quantum annealing, PRX Quantum 2, 040322 (2021).
Valiante et al. [2021] E. Valiante, M. Hernandez, A. Barzegar, and H. G. Katzgraber, Computational overhead of locality reduction in binary optimization problems, Computer Physics Communications 269, 108102 (2021).
Hizzani et al. [2023] M. Hizzani, A. Heittmann, G. Hutchinson, D. Dobrynin, T. V. Vaerenbergh, T. Bhattacharya, A. Renaudineau, D. Strukov, and J. P. Strachan, Memristor-based hardware and algorithms for higher-order hopfield optimization solver outperforming quadratic ising machines (2023), arXiv:2311.01171 [cs.ET] .
Gamarnik [2021] D. Gamarnik, The overlap gap property: A topological barrier to optimizing over random structures, Proceedings of the National Academy of Sciences 118, 10.1073/pnas.2108492118 (2021).
Bernaschi et al. [2021] M. Bernaschi, M. Bisson, M. Fatica, E. Marinari, V. Martin-Mayor, G. Parisi, and F. Ricci-Tersenghi, How we are leading a 3-xorsat challenge: From the energy landscape to the algorithm and its efficient implementation on gpus(a), Europhysics Letters 133, 60005 (2021).
Becker and Karplus [1997] O. M. Becker and M. Karplus, The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics, The Journal of Chemical Physics 106, 1495 (1997).
Wales et al. [1998] D. J. Wales, M. A. Miller, and T. R. Walsh, Archetypal energy landscapes, Nature 394, 758 (1998).
Doye et al. [1999] J. P. K. Doye, M. A. Miller, and D. J. Wales, The double-funnel energy landscape of the 38-atom Lennard-Jones cluster, The Journal of Chemical Physics 110, 6896 (1999).
Chakraborty et al. [2014] D. Chakraborty, R. Collepardo-Guevara, and D. J. Wales, Energy landscapes, folding mechanisms, and kinetics of rna tetraloop hairpins, Journal of the American Chemical Society 136, 18052 (2014).
Calvo et al. [2007] F. Calvo, T. V. Bogdan, V. K. de Souza, and D. J. Wales, Equilibrium density of states and thermodynamic properties of a model glass former, The Journal of Chemical Physics 127, 044508 (2007).
Wales [2005] D. J. Wales, Energy landscapes and properties of biomolecules, Physical Biology 2, S86 (2005).
Garstecki et al. [1999] P. Garstecki, T. X. Hoang, and M. Cieplak, Energy landscapes, supergraphs, and “folding funnels” in spin systems, Phys. Rev. E 60, 3219 (1999).
Biswas and Katzgraber [2020] K. Biswas and H. G. Katzgraber, Adding color: Visualization of energy landscapes in spin glasses (2020), arXiv:2004.12431 [cond-mat.dis-nn] .
Zhou [2011] Q. Zhou, Random walk over basins of attraction to construct ising energy landscapes, Phys. Rev. Lett. 106, 180602 (2011).
Coja-Oghlan et al. [2017] A. Coja-Oghlan, A. Haqshenas, and S. Hetterich, Walksat stalls well below satisfiability, SIAM Journal on Discrete Mathematics 31, 1160 (2017).
Bellitti et al. [2021] M. Bellitti, F. Ricci-Tersenghi, and A. Scardicchio, Entropic barriers as a reason for hardness in both classical and quantum algorithms, Phys. Rev. Res. 3, 043015 (2021).
Barahona [1982] F. Barahona, On the computational complexity of ising spin glass models, Journal of Physics A: Mathematical and General 15, 3241 (1982).
Selman and Kirkpatrick [1996] B. Selman and S. Kirkpatrick, Critical behavior in the computational cost of satisfiability testing, Artificial Intelligence 81, 273 (1996), frontiers in Problem Solving: Phase Transitions and Complexity.
Mé zard et al. [2005] M. Mé zard, T. Mora, and R. Zecchina, Clustering of solutions in the random satisfiability problem, Physical Review Letters 94, 10.1103/physrevlett.94.197205 (2005).
Mezard and Montanari [2009] M. Mezard and A. Montanari, Information, Physics, and Computation (Oxford University Press, Inc., USA, 2009).
Marques-Silva [2008] J. Marques-Silva, Practical applications of boolean satisfiability, in 2008 9th International Workshop on Discrete Event Systems (2008) pp. 74–80.
Bhattacharya et al. [2024] T. Bhattacharya, G. H. Hutchinson, G. Pedretti, X. Sheng, J. Ignowski, T. V. Vaerenbergh, R. Beausoleil, J. P. Strachan, and D. B. Strukov, Computing high-degree polynomial gradients in memory (2024), arXiv:2401.16204 [cs.ET] .
Garey and Johnson [1990] M. R. Garey and D. S. Johnson, Computers and Intractability; A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., USA, 1990).
Montanari et al. [2008] A. Montanari, F. Ricci-Tersenghi, and G. Semerjian, Clusters of solutions and replica symmetry breaking in random k-satisfiability, Journal of Statistical Mechanics: Theory and Experiment 2008, P04004 (2008).
Zdeborová and Krząkała [2007] L. Zdeborová and F. Krząkała, Phase transitions in the coloring of random graphs, Phys. Rev. E 76, 031131 (2007).
Ardelius and Zdeborová [2008] J. Ardelius and L. Zdeborová, Exhaustive enumeration unveils clustering and freezing in the random 3-satisfiability problem, Phys. Rev. E 78, 040101 (2008).
Li et al. [2009] K. Li, H. Ma, and H. Zhou, From one solution of a 3-satisfiability formula to a solution cluster: Frozen variables and entropy, Phys. Rev. E 79, 031102 (2009).
Mohseni et al. [2021] M. Mohseni, D. Eppens, J. Strumpfer, R. Marino, V. Denchev, A. K. Ho, S. V. Isakov, S. Boixo, F. Ricci-Tersenghi, and H. Neven, Nonequilibrium monte carlo for unfreezing variables in hard combinatorial optimization (2021), arXiv:2111.13628 [cond-mat.dis-nn] .
Bresler and Huang [2021] G. Bresler and B. Huang, The algorithmic phase transition of random $k$ -sat for low degree polynomials (2021), arXiv:2106.02129 [cs.CC] .
Gamarnik et al. [2022] D. Gamarnik, C. Moore, and L. Zdeborová, Disordered systems insights on computational hardness, Journal of Statistical Mechanics: Theory and Experiment 2022, 114015 (2022).
Flamm et al. [2002] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger, Barrier trees of degenerate landscapes, Zeitschrift für Physikalische Chemie 216, 155 (2002).
Reidys and Stadler [2002] C. M. Reidys and P. F. Stadler, Combinatorial landscapes, SIAM Review 44, 3 (2002).
Hallam and Prugel-Bennett [2005] J. Hallam and A. Prugel-Bennett, Large barrier trees for studying search, IEEE Transactions on Evolutionary Computation 9, 385 (2005).
Dauphin et al. [2014] Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (2014), arXiv:1406.2572 [cs.LG] .
Lee et al. [2019] J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, First-order methods almost always avoid strict saddle points, Mathematical Programming 176, 311 (2019).
Dattani [2019] N. Dattani, Quadratization in discrete optimization and quantum mechanics (2019), arXiv:1901.04405 [quant-ph] .
Schrijver [2000] A. Schrijver, A combinatorial algorithm minimizing submodular functions in strongly polynomial time, Journal of Combinatorial Theory, Series B 80, 346 (2000).
Boros and Gruber [2014] E. Boros and A. Gruber, On quadratization of pseudo-boolean functions (2014), arXiv:1404.6538 [math.OC] .
Rosenberg [1975] I. G. Rosenberg, Reduction of bivalent maximization to the quadratic case, Cahiers Centre Études Recherche Opér. 17 (1975).
Biamonte [2008] J. D. Biamonte, Nonperturbative $\mathit{k}$ -body to two-body commuting conversion hamiltonians and embedding problem instances into ising spins, Phys. Rev. A 77, 052331 (2008).
Aadit et al. [2022] N. A. Aadit, A. Grimaldi, M. Carpentieri, L. Theogarajan, J. M. Martinis, G. Finocchio, and K. Y. Camsari, Massively parallel probabilistic computing with sparse ising machines, Nature Electronics 5, 460 (2022).
Kolmogorov and Zabin [2004] V. Kolmogorov and R. Zabin, What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 147 (2004).
Freedman and Drineas [2005] D. Freedman and P. Drineas, Energy minimization via graph cuts: settling what is possible, in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2 (2005) pp. 939–946 vol. 2.
Landau et al. [2004] D. P. Landau, S.-H. Tsai, and M. Exler, A new approach to Monte Carlo simulations in statistical physics: Wang-Landau sampling, American Journal of Physics 72, 1294 (2004).
Liang [2005] F. Liang, A generalized wang-landau algorithm for monte carlo computation, Journal of the American Statistical Association 100, 1311 (2005).
Tang and Zhou [2012] W. Tang and Q. Zhou, Finding multiple minimum-energy conformations of the hydrophobic-polar protein model via multidomain sampling, Phys. Rev. E 86, 031909 (2012).
Barbu and Zhu [2020] A. Barbu and S.-C. Zhu, Monte Carlo Methods (Springer Singapore, 2020).
Hartmann [2000] A. K. Hartmann, A new method for analysing ground-state landscapes: ballistic search, Journal of Physics A: Mathematical and General 33, 657 (2000).
Mann and Hartmann [2010] A. Mann and A. K. Hartmann, Numerical solution-space analysis of satisfiability problems, Phys. Rev. E 82, 056702 (2010).
Denchev et al. [2016] V. S. Denchev, S. Boixo, S. V. Isakov, N. Ding, R. Babbush, V. Smelyanskiy, J. Martinis, and H. Neven, What is the computational value of finite-range tunneling?, Phys. Rev. X 6, 031015 (2016).
Note [1] This valid state may not be unique, i.e. there could be degeneracy in the auxiliary variable space $\mathbf{y}$ for each fixed $\mathbf{x}$ . In this case we consider all such configurations and take the minimum of the computed effective QUBO barrier.
Note [2] Time-to-solution $99\%$ (TTS ${}_{99}$ ) is the time needed for a stochastic solver to reach a solution at least once with probability $p=0.99$ .
Hoos and Stützle [2000] H. Hoos and T. Stützle, Satlib: An online resource for research on sat (2000) pp. 283–292.
Frank et al. [1997] J. Frank, P. Cheeseman, and J. Stutz, When gravity fails: Local search topology (1997), arXiv:cs/9712101 [cs.AI] .
[63] P. Purdom and A. Sabry, Cnf generator for factoring problems.
Mosca and Verschoor [2022] M. Mosca and S. R. Verschoor, Factoring semi-primes with (quantum) sat-solvers, Scientific Reports 12, 7982 (2022).
Houdayer [2001] J. Houdayer, A cluster monte carlo algorithm for 2-dimensional spin glasses, The European Physical Journal B - Condensed Matter and Complex Systems 22, 479 (2001).
Lecun et al. [2006] Y. Lecun, S. Chopra, and R. Hadsell, A tutorial on energy-based learning (2006).
Hopfield [1982] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities., Proceedings of the National Academy of Sciences 79, 2554 (1982), https://www.pnas.org/doi/pdf/10.1073/pnas.79.8.2554 .
Krotov and Hopfield [2016] D. Krotov and J. J. Hopfield, Dense associative memory for pattern recognition (2016), arXiv:1606.01164 [cs.NE] .
Mohseni et al. [2023] M. Mohseni, M. M. Rams, S. V. Isakov, D. Eppens, S. Pielawa, J. Strumpfer, S. Boixo, and H. Neven, Sampling diverse near-optimal solutions via algorithmic quantum annealing, Phys. Rev. E 108, 065303 (2023).
Jülich Supercomputing Centre [2021] Jülich Supercomputing Centre, JURECA: Data Centric and Booster Modules implementing the Modular Supercomputing Architecture at Jülich Supercomputing Centre, Journal of large-scale research facilities 7, 10.17815/jlsrf-7-182 (2021).
Babbush et al. [2013] R. Babbush, B. O’Gorman, and A. Aspuru-Guzik, Resource efficient gadgets for compiling adiabatic quantum optimization problems, Annalen der Physik 525, 877 (2013).
Note [3] Finding a minimal set of pairs of variables to be substituted is in general an NP-Hard problem.
[73] pele: Python energy landscape explorer.

Disconnectivity graphs for visualizing combinatorial optimization problems: challenges of embedding to Ising machines