License: CC BY 4.0
arXiv:2403.01320v1 [cond-mat.dis-nn] 02 Mar 2024

Disconnectivity graphs for visualizing combinatorial optimization problems:
challenges of embedding to Ising machines

Dmitrii Dobrynin d.dobrynin@fz-juelich.de Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany    Adrien Renaudineau Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany    Mohammad Hizzani Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany   
Dmitri Strukov
University of California Santa Barbara, Santa Barbara, CA, USA
   Masoud Mohseni LSIP, Hewlett Packard Labs, Milpitas, CA, USA    John Paul Strachan j.strachan@fz-juelich.de Peter Grünberg Institut (PGI-14), Forschungszentrum Jülich, Jülich, Germany RWTH Aachen University, Aachen, Germany
(March 2, 2024)
Abstract

Physics-based Ising machines (IM) have risen to the challenge of solving hard combinatorial optimization problems with higher speed and better energy efficiency. Generally, such dedicated systems employ local search heuristics to traverse energy landscapes in searching for optimal solutions. Extending landscape geometry visualization tools, disconnectivity graphs, we quantify and address some of the major challenges met by IMs in the field of combinatorial optimization. Using efficient sampling methods, we visually capture landscapes of problems having diverse structure and hardness and featuring strong degeneracies, which act as entropic barriers for IMs. Furthermore, we investigate energy barriers, local minima, and configuration space clustering effects caused by locality reduction methods when embedding combinatorial problems to the Ising hardware. For this purpose, we sample disconnectivity graphs of PUBO energy landscapes and their different QUBO mappings accounting for both local minima and saddle regions. We demonstrate that QUBO energy landscape properties lead to the subpar performance of quadratic IMs and suggest directions for their improvement.

preprint: APS/123-QED

I Introduction

Recent years have seen an increasing interest in using classical and quantum Ising machines (IM) for solving combinatorial optimization problems relevant for fundamental research and industrial applications [1]. Most of these devices rely on algorithms and physical principles implementing heuristic local search routines, e.g. discrete Monte-Carlo (MC) sampling (simulated annealing, parallel tempering) or noisy/chaotic continuous dynamics. Examples of the former are memristive crossbar arrays employed to efficiently perform vector-matrix multiplication [2, 3], or digital ASIC annealers [4]. The latter versions of IMs include coherent Ising machines, oscillator networks, quantum annealers, and others [5, 6, 7, 8]. The main attraction for the use of IM is the match of algorithm operations with their physical implementations, which offers to reduce the exponential scaling (polynomially or by a prefactor) of time-to-solution and/or energy-to-solution metrics for the hardest problems to date.

In this context, there are several outstanding challenges faced by IMs on both algorithmic and hardware levels, resulting in strong compromises being adopted in their practical deployment. One and possibly the most important difficulty concerns their application to practically interesting (large) problem sizes. The support of only second-order couplings of “spins”, together with connectivity topology constraints (e.g. the chimera graph [9]) results in the introduction of multiple auxiliary variables in order to either avoid higher-order terms, or reach necessary levels of sparsity. The added new variables can scale super-linearly in the number of original variables, not only further challenging the scaling to large problems, but also increasing the search space and modifying the optimization energy landscape. As a result, IMs can be limited to smaller-scale problems, and even these can become harder than their native formulation [10, 11] due to the worsened landscape geometry.

A second challenge lies in the algorithmic limitations of IMs. In particular, their reliance on local search heuristics fundamentally puts a bound on the problem classes they are capable of solving [12]. Being inherently local, IMs are prone to suffer from energy barriers rejecting MC moves, and from entropic barriers or degeneracies hampering both sensible exploration and exploitation [13]. Clear understanding of features of benchmark problems and the corresponding constraints of the Ising hardware is essential to facilitate future advances in the field.

A major challenge for optimization solver design and understanding is the navigation of the high-dimensional configuration space. Only a few methods have been developed over the years to visualize high-dimensional cost (energy) functions of optimization problems. One example is disconnectivity graphs (DG, also called barrier trees) [14, 15, 16], which aim to simplify the exponentially large configuration space by capturing local minima and their connectivity through energy barriers. It is possible to use DGs to gain quantitative insights into phenomena in a variety of applications ranging from metastable states of protein folding [17] to thermodynamic effects in Lennard-Jones systems [18], biomolecules [19], and spin glasses [20]. However, due to exponential complexity of DG construction and the degeneracy of optimization problems of interest, attaining energy landscape visualization is a computational feat on its own [21].

The contribution of this paper is as follows. Firstly, in Sec. III.1 we describe an extension for the efficient sampling algorithm of [22] to support DGs of energy landscapes featuring strong degeneracy of the configuration space (millions of states), capturing not only local minima but also degenerate saddle points. We further modify this approach providing means to construct DGs for quadratic optimization problems resulting from locality reduction due to IM hardware mapping. We achieve this by meaningfully reducing the search space over auxiliary variables and defining “effective” barriers. Secondly, in Sec. III.2 and Sec. III.3 using 3-SAT as a representative higher-order problem class, we plot DGs for problems of sizes inaccessible to the methods reported previously. With our methods we compare easy to hard instances, and random to industrial (structured) instances. Finally, in Sec. III.4 we demonstrate suboptimal energy landscape features of hardware embedding quadratization methods for 3-SAT from the perspective of clustering and entropy of energy minima, which are some of the culprits of algorithmic hardness [23, 24].

II Background

The Ising Hamiltonian (2-local), which IMs natively solve is:

HIsing=i<jNJijsisj+iNhisi,subscript𝐻Isingsuperscriptsubscript𝑖𝑗𝑁subscript𝐽𝑖𝑗subscript𝑠𝑖subscript𝑠𝑗superscriptsubscript𝑖𝑁subscript𝑖subscript𝑠𝑖H_{\mathrm{Ising}}=\sum_{i<j}^{N}J_{ij}s_{i}s_{j}+\sum_{i}^{N}h_{i}s_{i}\,,italic_H start_POSTSUBSCRIPT roman_Ising end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_J start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (1)

where si{1,1}subscript𝑠𝑖11s_{i}\in\{-1,1\}italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { - 1 , 1 }, Jijsubscript𝐽𝑖𝑗J_{ij}italic_J start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT are spin interaction strengths, and hisubscript𝑖h_{i}italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denote local magnetic fields. Finding the ground state of Eq. 1 is an NP-Complete problem [25], and therefore solving (approximating) this task efficiently is of profound practical interest. Alternatively, the Ising Hamiltonian can be formulated as a quadratic pseudo-boolean function:

HQUBO=i<jNQijxixj+iNbixi+C,subscript𝐻QUBOsuperscriptsubscript𝑖𝑗𝑁subscript𝑄𝑖𝑗subscript𝑥𝑖subscript𝑥𝑗superscriptsubscript𝑖𝑁subscript𝑏𝑖subscript𝑥𝑖𝐶H_{\mathrm{QUBO}}=\sum_{i<j}^{N}Q_{ij}x_{i}x_{j}+\sum_{i}^{N}b_{i}x_{i}+C\,,italic_H start_POSTSUBSCRIPT roman_QUBO end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_Q start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_C , (2)

with binary variables xi{0,1}subscript𝑥𝑖01x_{i}\in\{0,1\}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { 0 , 1 }. Deciding the ground state of this function among 2Nsuperscript2𝑁2^{N}2 start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT possible configurations is what is commonly called quadratic unconstrained binary optimization (QUBO) problem. In this work we will use Ising and QUBO terms interchangeably due to their equivalence.

The generalization of QUBO to support higher order interactions of variables is what is usually referred to as PUBO (“P” for polynomial):

f(𝐱)=f(x1,x2,,xN)={i}kVa{i}{i}kxi+C,𝑓𝐱𝑓subscript𝑥1subscript𝑥2subscript𝑥𝑁subscriptsubscript𝑖𝑘𝑉subscript𝑎𝑖subscriptproductsubscript𝑖𝑘subscript𝑥𝑖𝐶f(\mathbf{x})=f(x_{1},x_{2},\dots,x_{N})=\sum_{\{i\}_{k}\subseteq V}a_{\{i\}}% \prod_{\{i\}_{k}}x_{i}+C\,,italic_f ( bold_x ) = italic_f ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT { italic_i } start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⊆ italic_V end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT { italic_i } end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT { italic_i } start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_C , (3)

which is correspondingly equivalent to the k-local Ising. Here {i}kVsubscript𝑖𝑘𝑉\{i\}_{k}\subseteq V{ italic_i } start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⊆ italic_V denote all possible subsets of the set of variables of highest cardinality k1𝑘1k\geq 1italic_k ≥ 1.

The present work devotes particular attention to the k-SAT problem (see below Eq. 4), one of the oldest and well-studied NP-Complete problems [26, 27, 28]. The motivation behind this choice lies in the fact that apart from being practically important for a variety of applications [29], k-SAT highlights the hardware and algorithmic challenges of IMs [30]. As will be discussed in this work, it features strong degeneracy of the solution space, an abundance of energy barriers, clustering of solutions, and can only be natively supported by the PUBO formulation, making it a formidable problem class for local search based quadratic IMs.

A general statement of the k-SAT decision problem is simple: is there a binary variable assignment 𝐱𝔹N𝐱superscript𝔹𝑁\mathbf{x}\in\mathbb{B}^{N}bold_x ∈ blackboard_B start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT of the following conjunctive normal form (CNF):

(li1,1li1,2li1,k)(lim,1lim,2lim,k),subscript𝑙subscript𝑖11subscript𝑙subscript𝑖12subscript𝑙subscript𝑖1𝑘subscript𝑙subscript𝑖𝑚1subscript𝑙subscript𝑖𝑚2subscript𝑙subscript𝑖𝑚𝑘(l_{i_{1,1}}\vee l_{i_{1,2}}\dots\vee l_{i_{1,k}})\wedge\dots\wedge(l_{i_{m,1}% }\vee l_{i_{m,2}}\dots\vee l_{i_{m,k}})\,,( italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 , 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ ∨ italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ∧ ⋯ ∧ ( italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_m , 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_m , 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ ∨ italic_l start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_m , italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , (4)

where i{1,N}𝑖1𝑁i\in\{1,N\}italic_i ∈ { 1 , italic_N }, m{1,M}𝑚1𝑀m\in\{1,M\}italic_m ∈ { 1 , italic_M }, l=x𝑙𝑥l=xitalic_l = italic_x or l=x¯𝑙¯𝑥l=\bar{x}italic_l = over¯ start_ARG italic_x end_ARG, so that all M𝑀Mitalic_M clauses are satisfied? With k3𝑘3k\geq 3italic_k ≥ 3 it is NP-Complete like Ising/QUBO and thus worst case exponentially hard [31]. The k-local PUBO cost function Eq. 3 is easily obtained from Eq. 4 by the De-Morgan law of logic and using the equality x¯=1x¯𝑥1𝑥\bar{x}=1-xover¯ start_ARG italic_x end_ARG = 1 - italic_x.

Many optimization landscape features have been established for hard constraint satisfaction problems [27, 32], of which k-SAT is a conventional example. By increasing the number of constraints from the small number, where the problem is easily satisfiable, to larger values up to a point of unsatisfiability, optimization landscapes undergo phase transitions where the dominating “simple” configuration region of connected solutions gets shattered into exponentially many clusters of solutions. Each cluster consists of several configurations which can be easily accessed from each other by local dynamics [33]. Furthermore, some of the variables in such cluster configurations could also be “frozen” [34, 35], i.e. remain unchanged regardless of the state of others. In other words, not only it can be difficult to traverse the landscape in the search of isolated clusters, but also to transition between such clusters, it is imperative to modify an extensive fraction of variables simultaneously; thus, non-local moves can be essential [36]. For the recent fundamental results on this topic we mention the works [37, 12, 38]. In order to highlight such hardness of combinatorial optimization, in this work we focus on illustrating how the landscapes are perceived by local search.

Early efforts to visualize energy/fitness landscapes arose in the context of theoretical chemistry and biology [14, 15, 16]. Authors of these works introduced the concept of disconnectivity graphs (DG) implementing a map of exponentially large potential energy configuration spaces to a two-dimensional tree. Fig. 1 sketches the idea behind such mapping: every leaf corresponds to a local minimum, while the branches represent the magnitude of energy barriers and connectivity (lowest barrier separation) of local minima with respect to each other.

Refer to caption
Figure 1: A simplified view of a disconnectivity graph. Every local minimum corresponds to a leaf; the height of energy barriers is reflected by the energy of branch connections. Arrangement of minima does not represent distance, i.e. by default has no explicit meaning.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 2: (a-b) Two types of degenerate landscapes: a saddle cluster and a local minimum cluster. State 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT is a zero barrier exit point from the saddle. Outlined are the stable states. (c) The highlighted saddle points can be treated as connected [39] or disconnected [20] based on the adopted definition of disconnectivity graphs.

In principle, arbitrary energy landscapes can be defined by a triplet [40]: X𝑋Xitalic_X being a set of configurations, neighbourhood 𝒩(x)𝒩𝑥\mathcal{N}(x)caligraphic_N ( italic_x ) of every state x𝑥xitalic_x in X𝑋Xitalic_X, and energy/fitness function f(X)𝑓𝑋f(X)\in\mathbb{R}italic_f ( italic_X ) ∈ blackboard_R. A local move of a solver can then be defined as a change of state from x𝑥xitalic_x to a state in 𝒩(x)𝒩𝑥\mathcal{N}(x)caligraphic_N ( italic_x ) based on a certain search algorithm, e.g. choosing a random neighbour or the one with largest energy change.

Special attention, however, should be given to the degeneracy of such landscapes: many configurations form neighbourhoods which can be traversed by a local rule at no energy cost. Furthermore, the concept of a local minimum becomes ambiguous and non-local in degenerate landscapes [39, 41]. As Figs. 1(a) and 1(b) demonstrate, it is impossible to know if a descending energy path exists from the leftmost stable state 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT unless exploration finding the rightmost state 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT is performed. In this work we will call a stable “plateau” of Fig. 1(a) a saddle cluster, while the plateau in Fig. 1(b) will be called a local/global minimum cluster. The terminology of saddles/local minima of this work is chosen to resemble similar terms from continuous optimization. There, multiple works highlight profound difficulties of navigating high-dimensional landscapes arising from both types of critical points [42, 43].

The works of [39, 41, 21] have addressed the complexity of constructing DGs of degenerate landscapes with exhaustive enumeration of states. While being computationally infeasible for problems larger than 3040absent3040\approx 30\!-\!40≈ 30 - 40 variables, these works carried out classifications of saddles or local minima and the ways the states can be connected within a cluster and to other clusters. For example, a difference in possible connectivity of stable points is illustrated in Fig. 1(c). Approach of this work is closest to that of [20] in which the highlighted saddle points are treated as being disconnected. This choice is motivated by the golf-course-type energy landscapes of 3-XORSAT problems [24], where the paths to good solutions are mostly impeded by the entropic barriers, rather than the energy barriers.

In Fig. 1(c) there is no barrier between 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT and the global minimum 𝐱hsubscript𝐱\mathbf{x}_{h}bold_x start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, but the path to it lies through a local minimum 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT. As a result, joining the states separated by a “hole” would result in a deceiving visualization hiding landscape features important for local search routines. With the methods this work (see Sec. III.1.2-III.1.3), we will address such diversity of scenarios by distinguishing the states with connections to global minimum from those separated from it by either barriers or “holes”. This will provide a clear explanation of why second order IMs can be greatly challenged by higher-order combinatorial optimization problems.

III Results

III.1 Methods

III.1.1 Locality reduction of k-SAT

The issue of locality reduction to support the formulation of Eq. 2 has been heavily investigated over the recent years [44], with the efforts aimed at introducing quadratizations that have the smallest possible number of auxiliary variables, minimize bit-precision requirements on the weights, or have algorithmically favourable properties, e.g. submodularity [45].

Perhaps the simplest method to meet the first requirement is to use quadratization by substitution, i.e. to introduce auxiliary variables y𝑦yitalic_y for each pair of variables xpxqsubscript𝑥𝑝subscript𝑥𝑞x_{p}x_{q}italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT in the original PUBO function (3) until the problem of required order is obtained, i.e. 2nd order for QUBO (2). The constraints are then enforced by either explicitly considering the equalities xpxq=ysubscript𝑥𝑝subscript𝑥𝑞𝑦x_{p}x_{q}=yitalic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT = italic_y, or by the addition of quadratic penalty terms in the cost function for each substitution:

f(𝐱)𝑓𝐱\displaystyle f(\mathbf{x})italic_f ( bold_x ) =\displaystyle== ±x1x2xkplus-or-minussubscript𝑥1subscript𝑥2subscript𝑥𝑘absent\displaystyle\pm x_{1}x_{2}\dots x_{k}\to± italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT → (5)
\displaystyle\to g(𝐱,𝐲)=±yx3xk+P±(x1,x2,y).𝑔𝐱𝐲plus-or-minus𝑦subscript𝑥3subscript𝑥𝑘subscript𝑃plus-or-minussubscript𝑥1subscript𝑥2𝑦\displaystyle g(\mathbf{x},\mathbf{y})=\pm yx_{3}\dots x_{k}+P_{\pm}(x_{1},x_{% 2},y)\,.italic_g ( bold_x , bold_y ) = ± italic_y italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_P start_POSTSUBSCRIPT ± end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_y ) .

The choice of the P𝑃Pitalic_P function is not unique; for instance, one can make sure that

f(𝐱)=minyg(𝐱,𝐲)𝑓𝐱subscript𝑦𝑔𝐱𝐲f(\mathbf{x})=\min_{y}g(\mathbf{x},\mathbf{y})italic_f ( bold_x ) = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_g ( bold_x , bold_y ) (6)

is satisfied, thereby preserving global minima of the original problem. Additionally, the choice of xixjsubscript𝑥𝑖subscript𝑥𝑗x_{i}x_{j}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT admits some freedom and can be optimized for the minimum number of auxiliary variables by solving a vertex cover problem [46]. For simplicity, we use an efficient greedy routine to perform such optimization (for more details on quadratization methods outlined below cf. App. B).

For example, a commonly used quadratization penalty choice for locality reduction was suggested by Rosenberg (3rd order example) [47]:

±xpxqxk=miny[±yxk+(3y2xpy2xqy+xpxq)],plus-or-minussubscript𝑥𝑝subscript𝑥𝑞subscript𝑥𝑘subscript𝑦plus-or-minus𝑦subscript𝑥𝑘3𝑦2subscript𝑥𝑝𝑦2subscript𝑥𝑞𝑦subscript𝑥𝑝subscript𝑥𝑞\pm x_{p}x_{q}x_{k}=\min_{y}\left[\pm yx_{k}+(3y-2x_{p}y-2x_{q}y+x_{p}x_{q})% \right]\,,± italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ ± italic_y italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ( 3 italic_y - 2 italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_y - 2 italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT italic_y + italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) ] , (7)

where xpxqsubscript𝑥𝑝subscript𝑥𝑞x_{p}x_{q}italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT was replaced by y𝑦yitalic_y, and the remaining terms penalize the mismatch of xpxqsubscript𝑥𝑝subscript𝑥𝑞x_{p}x_{q}italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT and y𝑦yitalic_y. Thus, every appearance of xpxqsubscript𝑥𝑝subscript𝑥𝑞x_{p}x_{q}italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT in the k3𝑘3k\geq 3italic_k ≥ 3 terms of the PUBO function is substituted by the same y𝑦yitalic_y, and for each such substitution 3y2xpy2xqy+xpxq3𝑦2subscript𝑥𝑝𝑦2subscript𝑥𝑞𝑦subscript𝑥𝑝subscript𝑥𝑞3y-2x_{p}y-2x_{q}y+x_{p}x_{q}3 italic_y - 2 italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_y - 2 italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT italic_y + italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT penalty is added. This mapping is also implicitly used when the approach of reversible logic of [48, 49] is employed.

Performing standard simulated annealing optimization of 3-SAT problems we found a different mapping to be computationally superior to the Rosenberg version. The new mapping extends the quadratization ideas [50, 51] and [46] by approaching the monomials with positive and negative coefficients differently:

x1xk=miny[(k1)yi=1kxiy],subscript𝑥1subscript𝑥𝑘subscript𝑦𝑘1𝑦superscriptsubscript𝑖1𝑘subscript𝑥𝑖𝑦\displaystyle-x_{1}\dots x_{k}=\min_{y}\left[(k-1)y-\sum_{i=1}^{k}x_{i}y\right% ]\,,- italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ ( italic_k - 1 ) italic_y - ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y ] ,
x1xk=x2xkx¯1xksubscript𝑥1subscript𝑥𝑘subscript𝑥2subscript𝑥𝑘subscript¯𝑥1subscript𝑥𝑘\displaystyle x_{1}\dots x_{k}=x_{2}\dots x_{k}-\bar{x}_{1}\dots x_{k}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
=x2xk+miny[(k1)yx¯1yi=2kxiy],absentsubscript𝑥2subscript𝑥𝑘subscript𝑦𝑘1𝑦subscript¯𝑥1𝑦superscriptsubscript𝑖2𝑘subscript𝑥𝑖𝑦\displaystyle=x_{2}\dots x_{k}+\min_{y}\left[(k-1)y-\bar{x}_{1}y-\sum_{i=2}^{k% }x_{i}y\right]\,,= italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT … italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ ( italic_k - 1 ) italic_y - over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_y - ∑ start_POSTSUBSCRIPT italic_i = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y ] , (8)

and thus we call it KZFD-BG after the authors. However, contrary to the original application of these penalties individually for each term in the PUBO function, the variable substitution in this work is done as in the Rosenberg case, i.e. sharing substituted pairs across multiple monomials (see App. B). As a result, this yields the same number of native and auxiliary variables regardless of the mapping used. We address simulated annealing performance difference of the mappings in the context of PUBO and QUBO comparison in Sec. III.4.

Refer to caption
Figure 3: (left) PUBO landscape sketch: neighbouring states are connected by a single flip. (right) QUBO mapping landscape; the auxiliary 𝐲𝐲\mathbf{y}bold_y adaptation may introduce new energy barriers preventing the otherwise possible descent in energy.

Any locality reduction method modifies the “native” optimization landscape in non-trivial ways and can make its exploration algorithmically more challenging. In particular, Eq. 6 guarantees that for every stable state 𝐱*superscript𝐱\mathbf{x^{*}}bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT of f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ) with respect to a single bit-flip: f(,x¯i*,)f(,xi*,)0,i𝑓subscriptsuperscript¯𝑥𝑖𝑓subscriptsuperscript𝑥𝑖0for-all𝑖f(\dots,\bar{x}^{*}_{i},\dots)-f(\dots,x^{*}_{i},\dots)\geq 0,\,\forall iitalic_f ( … , over¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , … ) - italic_f ( … , italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , … ) ≥ 0 , ∀ italic_i, the quadratization g(𝐱*,𝐲*)𝑔superscript𝐱superscript𝐲g(\mathbf{x^{*}},\mathbf{y^{*}})italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) is also in a stable state, which is given by min𝐲g(𝐱*,𝐲)g(𝐱*,𝐲*)subscript𝐲𝑔superscript𝐱𝐲𝑔superscript𝐱superscript𝐲\min_{\mathbf{y}}g(\mathbf{x^{*}},\mathbf{y})\equiv g(\mathbf{x^{*}},\mathbf{y% ^{*}})roman_min start_POSTSUBSCRIPT bold_y end_POSTSUBSCRIPT italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , bold_y ) ≡ italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ). Indeed, the bit-flip energy changes with respect to auxiliary variables are non-negative due to the definition of g𝑔gitalic_g: g(𝐱*,,y¯i*,)g(𝐱*,,yi*,)=minyg(𝐱*,𝐲)𝑔superscript𝐱superscriptsubscript¯𝑦𝑖𝑔superscript𝐱superscriptsubscript𝑦𝑖subscript𝑦𝑔superscript𝐱𝐲g(\mathbf{x^{*}},\dots,\bar{y}_{i}^{*},\dots)\geq g(\mathbf{x^{*}},\dots,y_{i}% ^{*},\dots)=\min_{y}g(\mathbf{x^{*}},\mathbf{y})italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … , over¯ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … ) ≥ italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … ) = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , bold_y ). In turn, the energy change of flipping x𝑥xitalic_x is also non-negative because of the following chain:

g(,x¯i*,,𝐲*)min𝐲g(,x¯i*,,𝐲)𝑔superscriptsubscript¯𝑥𝑖superscript𝐲subscript𝐲𝑔superscriptsubscript¯𝑥𝑖𝐲\displaystyle g(\dots,\bar{x}_{i}^{*},\dots,\mathbf{y^{*}})\geq\min_{\mathbf{y% }}g(\dots,\bar{x}_{i}^{*},\dots,\mathbf{y})italic_g ( … , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … , bold_y start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) ≥ roman_min start_POSTSUBSCRIPT bold_y end_POSTSUBSCRIPT italic_g ( … , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … , bold_y ) (9)
=f(,x¯i*,)f(𝐱*)=g(𝐱*,𝐲*).absent𝑓superscriptsubscript¯𝑥𝑖𝑓superscript𝐱𝑔superscript𝐱superscript𝐲\displaystyle=f(\dots,\bar{x}_{i}^{*},\dots)\geq f(\mathbf{x^{*}})=g(\mathbf{x% ^{*}},\mathbf{y^{*}})\,.= italic_f ( … , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , … ) ≥ italic_f ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) = italic_g ( bold_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) . (10)

However, such correspondence does not hold in the opposite direction, i.e. a stable state of g(𝐱,𝐲)𝑔𝐱𝐲g(\mathbf{x},\mathbf{y})italic_g ( bold_x , bold_y ) is not guaranteed to be a stable state of f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ).

For illustration, in Fig. 3 a “linear” landscape represents states connected by a bitflip local move in the N𝑁Nitalic_N dimensional hypercube. The left sketch depicts a degenerate case with states 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT, 𝐱csubscript𝐱𝑐\mathbf{x}_{c}bold_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT being stable, but 𝐱asubscript𝐱𝑎\mathbf{x}_{a}bold_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT unstable. In turn, the right sketch shows how the quadratization mapping induces a rugged structure on top of the original manifold due to the auxiliary variables and the penalty terms. For every state 𝐱𝐱\mathbf{x}bold_x there is a corresponding minimizing auxiliary state 𝐲𝐲\mathbf{y}bold_y (possibly non-unique) according to Eq. 6. The low energy state 𝐱esubscript𝐱𝑒\mathbf{x}_{e}bold_x start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT that was easily accessible by a greedy local search descend can now be separated by energy barriers due to the necessity to adapt 𝐲𝐲\mathbf{y}bold_y for every 𝐱𝐱\mathbf{x}bold_x.

If 𝐱𝐱\mathbf{x}bold_x and 𝐲𝐲\mathbf{y}bold_y are treated on equal footing, then one is forced to explore a configuration space 2|{y}|superscript2𝑦2^{|\{y\}|}2 start_POSTSUPERSCRIPT | { italic_y } | end_POSTSUPERSCRIPT times bigger than the native problem. The problem that already had highly non-trivial landscape structure caused by frustrations and long-distance correlations of variables, after quadratization will have these features hidden or worsened by the mismatch of “gradients” and energy barriers, ultimately causing significant deterioration of the IMs’ ability to find solutions [11].

Finally, we note that the sparsifying approaches that aim to reduce degrees of interaction between variables can introduce even more energy barriers into the problem due to auxiliary variables and penalties akin to the locality reduction methods. We do not focus on sparsification in this work; nonetheless, one example is given in App. A.

Refer to caption
Figure 4: (a) Generalized Wang-Landau (GWL) for sampling and barrier estimation. GWL proposal xaxbsubscript𝑥𝑎subscript𝑥𝑏x_{a}\to x_{b}italic_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → italic_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT is sampled with probability of Eq. 11. The basin of attraction is identified by an algorithm of choice. (b) Random descent illustration. The highlighted red circle state belongs to the basin of the green highlighted local minimum state. (c) If necessary, breadth-first search accurately calculates cluster sizes at the end of sampling. The light grey states are unstable exits, the dark grey states are stable saddles.

III.1.2 Sampling algorithm outline

In order to study and visualize with DGs the energy landscapes of degenerate optimization problems and their QUBO mapping modifications, we extend the Generalized Wang-Landau (GWL) [52, 53] sampling approach of the works [22, 54]. The GWL non-Markov Chain Monte-Carlo algorithm carries out random walks in the configuration space aiming to achieve approximately uniform attendance of all predefined energy levels l[1,L]𝑙1𝐿l\in[1,L]italic_l ∈ [ 1 , italic_L ] and all recorded basins of attraction k[1,K]𝑘1𝐾k\in[1,K]italic_k ∈ [ 1 , italic_K ].

During preprocessing steps (see Fig. 4a) one defines the landscape partition into sectors in energy [E1,E2,,EL]superscript𝐸1superscript𝐸2superscript𝐸𝐿[E^{1},E^{2},\dots,E^{L}][ italic_E start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , italic_E start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , … , italic_E start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ] and affinity to a basin of attraction of a local minimum 𝐱ksuperscript𝐱𝑘\mathbf{x}^{k}bold_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT: 𝐱Bk,l𝐱subscript𝐵𝑘𝑙\mathbf{x}\in B_{k,l}bold_x ∈ italic_B start_POSTSUBSCRIPT italic_k , italic_l end_POSTSUBSCRIPT, if E(𝐱)[El,El+1)𝐸𝐱superscript𝐸𝑙superscript𝐸𝑙1E(\mathbf{x})\in[E^{l},E^{l+1})italic_E ( bold_x ) ∈ [ italic_E start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT , italic_E start_POSTSUPERSCRIPT italic_l + 1 end_POSTSUPERSCRIPT ) and 𝐝𝐞𝐬𝐜𝐞𝐧𝐭(𝐱)=𝐱k𝐝𝐞𝐬𝐜𝐞𝐧𝐭𝐱superscript𝐱𝑘\mathbf{descent}(\mathbf{x})=\mathbf{x}^{k}bold_descent ( bold_x ) = bold_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT. We note that the 𝐝𝐞𝐬𝐜𝐞𝐧𝐭𝐝𝐞𝐬𝐜𝐞𝐧𝐭\mathbf{descent}bold_descent routine can be defined differently, and it makes sense to choose its definition mimicking the actual solver algorithm that would be used for solving studied problems in practice (see below Sec. III.1.3). Next, the sampling of states (random walk) is performed with the following acceptance probability:

pab=min[1,exp{β(EaEb)}γka,laγkb,lb],subscript𝑝𝑎𝑏1𝛽subscript𝐸𝑎subscript𝐸𝑏subscript𝛾subscript𝑘𝑎subscript𝑙𝑎subscript𝛾subscript𝑘𝑏subscript𝑙𝑏p_{a\to b}=\min{\left[1,\exp{\left\{\beta(E_{a}-E_{b})\right\}\frac{\gamma_{k_% {a},l_{a}}}{\gamma_{k_{b},l_{b}}}}\right]}\,,italic_p start_POSTSUBSCRIPT italic_a → italic_b end_POSTSUBSCRIPT = roman_min [ 1 , roman_exp { italic_β ( italic_E start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT - italic_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) } divide start_ARG italic_γ start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG ] , (11)

where γk,lsubscript𝛾𝑘𝑙\gamma_{k,l}italic_γ start_POSTSUBSCRIPT italic_k , italic_l end_POSTSUBSCRIPT is a current estimation of the statistical weight of a sector:

1Z𝐱Bl,kexp(βE(𝐱))γk,l.1𝑍subscript𝐱subscript𝐵𝑙𝑘𝛽𝐸𝐱subscript𝛾𝑘𝑙\frac{1}{Z}\sum_{\mathbf{x}\in B_{l,k}}\exp{(-\beta E(\mathbf{x}))}\approx% \gamma_{k,l}\,.divide start_ARG 1 end_ARG start_ARG italic_Z end_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ italic_B start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( - italic_β italic_E ( bold_x ) ) ≈ italic_γ start_POSTSUBSCRIPT italic_k , italic_l end_POSTSUBSCRIPT . (12)

This estimation is constantly updated, when the sector Bl,ksubscript𝐵𝑙𝑘B_{l,k}italic_B start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT is visited, by:

γl,kt+1=γl,ktef,superscriptsubscript𝛾𝑙𝑘𝑡1superscriptsubscript𝛾𝑙𝑘𝑡superscript𝑒𝑓\gamma_{l,k}^{t+1}=\gamma_{l,k}^{t}e^{f}\,,italic_γ start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t + 1 end_POSTSUPERSCRIPT = italic_γ start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT , (13)

where f𝑓fitalic_f can follow a decreasing schedule usually starting from the value f=1𝑓1f=1italic_f = 1. It is numerically convenient to also define a histogram θl,kt+1lnγl,kt+1=θl,kt+fsubscriptsuperscript𝜃𝑡1𝑙𝑘subscriptsuperscript𝛾𝑡1𝑙𝑘subscriptsuperscript𝜃𝑡𝑙𝑘𝑓\theta^{t+1}_{l,k}\equiv\ln\gamma^{t+1}_{l,k}=\theta^{t}_{l,k}+fitalic_θ start_POSTSUPERSCRIPT italic_t + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT ≡ roman_ln italic_γ start_POSTSUPERSCRIPT italic_t + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT = italic_θ start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l , italic_k end_POSTSUBSCRIPT + italic_f, which is initialized at 0 for all l,k𝑙𝑘l,kitalic_l , italic_k at the beginning of the algorithm. If the exploration of as many minima as possible is preferred, then f𝑓fitalic_f is not decreased over time [54], but in this case the estimation of γ𝛾\gammaitalic_γ would not be accurate [55].

When a step 𝐱a𝐱bsubscript𝐱𝑎subscript𝐱𝑏\mathbf{x}_{a}\to\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT is tried (accepted or not), one saves the energy max[Ea,Eb]subscript𝐸𝑎subscript𝐸𝑏\max{[E_{a},E_{b}]}roman_max [ italic_E start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ] as a current energy barrier estimation between basins kasubscript𝑘𝑎k_{a}italic_k start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and kbsubscript𝑘𝑏k_{b}italic_k start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT. This “educated guess” can then potentially be improved with the ridge descent algorithm [22]. If the zero energy barrier is found for a state perceived as local minimum (e.g. 𝐱f𝐱g𝐱hsubscript𝐱𝑓subscript𝐱𝑔subscript𝐱\mathbf{x}_{f}\to\mathbf{x}_{g}\to\mathbf{x}_{h}bold_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT → bold_x start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT → bold_x start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT in Fig. 1(c)), the status of such minimum is changed to a saddle, and its histogram is added to the corresponding lower basin of attraction (e.g. 𝐱hsubscript𝐱\mathbf{x}_{h}bold_x start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT). If a saddle is connected to several lower basins, then the visits are distributed uniformly at random among them.

We keep track of only K𝐾Kitalic_K number of lowest in energy local minimum/saddle clusters adaptively uniting them by discovered connectivity and thus allowing space for additional clusters to be taken into account. In addition, we define a special K+1𝐾1K+1italic_K + 1 column of the histogram for all of the states that do not fit into the first K𝐾Kitalic_K clusters [54]. If a certain basin has insufficiently flat histogram of visits, then it was not explored enough during sampling due to its negligible statistical weight. We found these states to be rare and at high energies; thus, if encountered, these minima are not displayed in the final DG.

More details on DG construction of this work are provided in App. C.

III.1.3 Extension for degeneracy

By design, GWL uniformly samples states across basins of attraction of local minima and energy levels. Its main purpose in this work is to discover as many regions of the landscape as possible without being stuck in a particular place, thereby not biasing the DG estimation. What is crucial in the definition of the algorithm is the 𝐝𝐞𝐬𝐜𝐞𝐧𝐭𝐝𝐞𝐬𝐜𝐞𝐧𝐭\mathbf{descent}bold_descent routine, which identifies local minima and saddle points. When a problem has no degeneracies, e.g. S-K spin glass with Gaussian weights, one can define the descend (hill climbing) as the steepest descent, i.e. spins with the highest energy reduction are flipped. However, in this work we are interested in highly degenerate integer valued optimization problems, where such definition is not possible.

As briefly discussed above in Fig. 2, local minima and saddles are perceived differently depending on the algorithm employed for solving such problems. In this case, constructing exact DGs, apart from being infeasible for large problems, may result in misleading conclusions. For instance, a very large saddle point may have only one zero energy exit from itself, which may never be found by a local search routine, effectively being a local minimum, but it would still be depicted as a saddle on a DG, or even worse, not shown at all if saddles are not considered.

Here we aim to balance between efficient exploration of the landscape and visualization of relevant landscape features. For this purpose, we use random descent (see Fig. 4b), in which a greedy local move is performed in the first-seen random direction decreasing the energy. Once a stable state is encountered (white circle in Fig. 4b), a limited exploration of the “plateau” region is performed until either the budget of allowed moves is exhausted, or an exit from the saddle is found (green circle in Fig. 4b). The states encountered during such exploration of saddles/local minima are stored in a single cluster (including the unstable exit states, i.e. 4 states are stored in Fig. 4b). If some of the stored states are encountered again during GWL sampling, all of the states that belong to a single cluster are joined, with their histograms summed up.

Previously, the works [56, 57] addressed the difficulty of clustering in the context of improving the uniform sampling of the ground states. Once a ground state 𝐱𝐱\mathbf{x}bold_x was found, a ballistic search (BS) routine was carried out: starting from some global minimum state, a chain of zero energy states was constructed by flipping every variable maximum once. With the use of such chains, the cluster sizes and thus connectivity of states were estimated more reliably. We experimented with this method for clusters at every energy level and found it useful for clustering remote configurations when the number of states becomes infeasibly large.

Finally, in Fig. 4 we illustrate breadth-first search (BFS) that we use to exactly evaluate sizes of clusters at the end of sampling, when such statistics are of interest. Both stable and unstable states (exits) participate in BFS, and we confine their number by a predefined bound of states per energy level (usually 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT in this work).

III.1.4 Extension for QUBO mapping

Two neighbouring configurations are considered to be a part of a single degenerate cluster in the native (PUBO Eq. 3) landscape if a local move separating them is of zero energy cost, as shown for states 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT and 𝐱csubscript𝐱𝑐\mathbf{x}_{c}bold_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT in Fig. 3. The state 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT does not belong to a cluster since it has a move of negative energy to the state 𝐱esubscript𝐱𝑒\mathbf{x}_{e}bold_x start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT. However, in the special case of the QUBO mapping landscape, the same state 𝐱dsubscript𝐱𝑑\mathbf{x}_{d}bold_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT would now be considered a saddle point since the decrease in energy is only achieved through an intermediate 𝐲𝐲\mathbf{y}bold_y adjustment.

The presence of a barrier in QUBO for a transition 𝐱a𝐱bsubscript𝐱𝑎subscript𝐱𝑏\mathbf{x}_{a}\to\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT (when originally there could be no barrier at all) puts the local search at a disadvantage due to the higher rejection rate of local moves. Raising the temperature of sampling, e.g. of simulated annealing, would not fully solve the problem since it would harm the necessary exploitation of the low-energy manifold. Additionally, once local search is complete, a solver discards 𝐲𝐲\mathbf{y}bold_y values using the states of 𝐱𝐱\mathbf{x}bold_x as a solution. The search over the subspace of 𝐲𝐲\mathbf{y}bold_y, thus, does not look for new solutions, but rather varies the induced QUBO barriers between the neighbours in the 𝐱𝐱\mathbf{x}bold_x space.

To highlight the significance of landscape ruggedness of quadratization compared to the native space and facilitate fairer comparison, here we provide QUBO with additional capabilities by assuming that a local search solver can “look beyond” the QUBO landscape barriers to a certain adjustable degree. Fig. 5 depicts the case where the penalty terms of a QUBO mapping introduce interactions that favour auxiliary variable states different by Hamming distance 3 for two native configurations separated by a single bit-flip, i.e. xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and x¯isubscript¯𝑥𝑖\bar{x}_{i}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.

If the problem is approached head-on, one would need to either climb a steep barrier of xix¯isubscript𝑥𝑖subscript¯𝑥𝑖x_{i}\to\bar{x}_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and then adapt 3 auxiliary variables, or sequentially flip each of yay¯asubscript𝑦𝑎subscript¯𝑦𝑎y_{a}\to\bar{y}_{a}italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → over¯ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT, i.e. climb a long barrier. Such scenario of long barriers is argued to be difficult for tunnelling in quantum annealers [58], considering that the mapping quadratization is essential due to strict hardware limitations. We note, however, that with every sequential flip of yasubscript𝑦𝑎y_{a}italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT, the barrier of the xix¯isubscript𝑥𝑖subscript¯𝑥𝑖x_{i}\to\bar{x}_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT move is reduced, i.e. allowing more yasubscript𝑦𝑎y_{a}italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT to be explored raises the chance to overcome the QUBO barriers introduced by the mapping in the first place.

As a result, we augment disconnectivity graph analysis by introducing a QUBO factor F𝐹Fitalic_F, which stands for the maximum number of allowed auxiliary variable flips for every native move xix¯isubscript𝑥𝑖subscript¯𝑥𝑖x_{i}\to\bar{x}_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. With large enough F𝐹Fitalic_F the original (PUBO) landscape is recovered, while for small F𝐹Fitalic_F values “effective” energy barriers are still present, and thus the landscape connectivity is worsened by the mapping. In addition, F𝐹Fitalic_F serves as means to compare different QUBO mappings head-to-head, with mappings allowing small F𝐹Fitalic_F being arguably better for the local search of IMs. We perform such comparison supported by the simulated annealing results in Sec. III.4.

The algorithm to compute the effective barriers is as follows. First, at a fixed position in the space of 𝐱𝐱\mathbf{x}bold_x we set the auxiliary variables 𝐲𝐲\mathbf{y}bold_y in a valid state required by Eq. 6 111 This valid state may not be unique, i.e. there could be degeneracy in the auxiliary variable space 𝐲𝐲\mathbf{y}bold_y for each fixed 𝐱𝐱\mathbf{x}bold_x. In this case we consider all such configurations and take the minimum of the computed effective QUBO barrier. . Next, the bit-flip energy change ΔEQUBO(,xix¯i,𝐲)\Delta E_{\mathrm{QUBO}}(\dots,x_{i}\to\bar{x}_{i},\dots\mathbf{y})roman_Δ italic_E start_POSTSUBSCRIPT roman_QUBO end_POSTSUBSCRIPT ( … , italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , … bold_y ) is computed, which corresponds to the “vanilla” QUBO barrier at F=0𝐹0F=0italic_F = 0. Second, in order to calculate the effective QUBO barrier of xix¯isubscript𝑥𝑖subscript¯𝑥𝑖x_{i}\to\bar{x}_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, we list all auxiliary variables {ya}subscript𝑦𝑎\{y_{a}\}{ italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT } that interact with xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i.e. Qia0subscript𝑄𝑖𝑎0Q_{ia}\neq 0italic_Q start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT ≠ 0. Out of all listed yasubscript𝑦𝑎y_{a}italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT, we choose F𝐹Fitalic_F variables with the minimum values of ΔE(,x¯i,,yay¯a,)<0\Delta E(\dots,\bar{x}_{i},\dots,y_{a}\to\bar{y}_{a},\dots)<0roman_Δ italic_E ( … , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → over¯ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , … ) < 0. Finally, the effective barrier (see Fig. 5) ΔEQUBO,FΔsubscript𝐸QUBOF\Delta E_{\mathrm{QUBO,F}}roman_Δ italic_E start_POSTSUBSCRIPT roman_QUBO , roman_F end_POSTSUBSCRIPT is obtained by (assuming y𝑦yitalic_y variables don’t interact with each other):

ΔEPUBO(xix¯i)ΔEQUBO,F(xix¯i)Δsubscript𝐸PUBOsubscript𝑥𝑖subscript¯𝑥𝑖Δsubscript𝐸QUBOFsubscript𝑥𝑖subscript¯𝑥𝑖absent\displaystyle\Delta E_{\mathrm{PUBO}}(x_{i}\to\bar{x}_{i})\leq\Delta E_{% \mathrm{QUBO,F}}(x_{i}\to\bar{x}_{i})\equivroman_Δ italic_E start_POSTSUBSCRIPT roman_PUBO end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ≤ roman_Δ italic_E start_POSTSUBSCRIPT roman_QUBO , roman_F end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ≡ (14)
ΔEQUBO(xix¯i)+a=1FΔE(x¯i,yay¯a)<Δsubscript𝐸QUBOsubscript𝑥𝑖subscript¯𝑥𝑖superscriptsubscript𝑎1𝐹Δ𝐸subscript¯𝑥𝑖subscript𝑦𝑎subscript¯𝑦𝑎absent\displaystyle\Delta E_{\mathrm{QUBO}}(x_{i}\to\bar{x}_{i})+\sum_{a=1}^{F}% \Delta E(\bar{x}_{i},y_{a}\to\bar{y}_{a})<roman_Δ italic_E start_POSTSUBSCRIPT roman_QUBO end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_a = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT roman_Δ italic_E ( over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → over¯ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) <
ΔEQUBO(xix¯i).Δsubscript𝐸QUBOsubscript𝑥𝑖subscript¯𝑥𝑖\displaystyle\Delta E_{\mathrm{QUBO}}(x_{i}\to\bar{x}_{i})\,.roman_Δ italic_E start_POSTSUBSCRIPT roman_QUBO end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) .
Refer to caption
Figure 5: QUBO factor F=|{ya}|𝐹subscript𝑦𝑎F=|\{y_{a}\}|italic_F = | { italic_y start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT } | motivation example. By perturbing F=3𝐹3F=3italic_F = 3 auxiliary variables one is able to restore the PUBO zero barrier between xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and x¯isubscript¯𝑥𝑖\bar{x}_{i}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT states. For F(0,3)𝐹03F\in(0,3)italic_F ∈ ( 0 , 3 ) the effective barrier is defined, taking intermediate values between QUBO and PUBO.

III.1.5 Disconnectivity graphs notation

In the following sections we adopt the following convention when plotting disconnectivity graphs (e.g. see Fig. 6). The y-axis stands for the PUBO/QUBO energy. Every circle represents a separate local minimum/saddle cluster. The diameter of such circle corresponds to the square root of the cluster degeneracy, i.e. the area of a circle is proportional the number of connected configurations within a cluster. There is no explicit meaning behind the x-axis distance between the DG leaves and branches. If a circle is shown to have a zero-energy connection to lower clusters, then it represents a saddle cluster. If two or more saddles appear connected, then the situation depicted in Fig. 1(c) between 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT and 𝐱fsubscript𝐱𝑓\mathbf{x}_{f}bold_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT is in place. Red clusters in Figs. 7a and  7b have no direct connection to the global minimum denoted by green, i.e. all local minima are red, as well as some saddles (e.g. the state 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT in Fig. 1(c)). Blue saddle clusters in Figs. 7a and  7b were found to be connected to the global minimum by a descent algorithm of choice without energy barriers (e.g. 𝐱fsubscript𝐱𝑓\mathbf{x}_{f}bold_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT in Fig. 1(c)).

Every DG is accompanied by a histogram of the number of states obtained with BFS at each energy level. The degeneracy of every cluster is denoted by 𝒩ksubscript𝒩𝑘\mathcal{N}_{k}caligraphic_N start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, while the total number of states per energy is plotted as a normalized by N𝑁Nitalic_N (number of variables) natural logarithm of k𝒩ksubscript𝑘subscript𝒩𝑘\sum_{k}\mathcal{N}_{k}∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT caligraphic_N start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. The grey histogram shows the total number of BFS aggregated states (including saddle exits). The blue and red histograms count the corresponding stable states shown by circles on the DG.

III.2 Easy and hard problems

Refer to caption
(a)
Refer to caption
(b)
Figure 6: Disconnectivity graphs of “hard” (a, instance uf50-920) and “easy” (b, instance uf50-933) 3-SAT problems. States truncated at E9𝐸9E\leq 9italic_E ≤ 9. Sampling limit per energy level: 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT. (a) 147 clusters, 2 global minima. (b) 97 clusters, 1654 global minima.

The finite size fluctuations of relatively small random 3-SAT problems usually employed for IM benchmarking in practice results in a strong spread of their hardness. In this section, with the help of DGs we aim to highlight the landscape features exhibited by such instances of different hardness. In [11] IM TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT 222Time-to-solution 99%percent9999\%99 % (TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT) is the time needed for a stochastic solver to reach a solution at least once with probability p=0.99𝑝0.99p=0.99italic_p = 0.99. was estimated for a collection of randomly generated problems from the open benchmarking library SATLIB [61]. The instance uf50-920 shown in Fig. 5(a) was observed to be relatively “hard” with 57000absent57000\approx 57000≈ 57000 TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT algorithmic steps, while the instance uf50-933 in Fig. 5(b) was “easy” with 3700absent3700\approx 3700≈ 3700 steps.

A clear distinction is seen in both the number of global minimum configurations, as well as the number of distinct global minimum clusters (1 vs 7). It is in general unclear what is the property of optimization landscapes that would measure hardness best for a particular solver. In [36] authors use the number of global minimum clusters as a proxy for predicting Survey Propagation’s ability to find global minima in 4-SAT random instances. The same correlation is clearly observed in this work. However, instead of trying to predict the hardness of instances by DGs, we demonstrate how DGs can be used to gain insights into experimentally observed algorithm behaviours by a diverse set of sampled landscape properties.

In addition to much larger cardinality of the set of global minima, the easy problem features a large saddle point at E=1𝐸1E=1italic_E = 1 (20877208772087720877 states) which is only connected to global minima and acts as a basin of attraction for solvers. In comparison, the hard instance contains similar saddles with only 206absent206\approx 206≈ 206 states. There were no local minima found at E6𝐸6E\geq 6italic_E ≥ 6 in the easy instance, while the hard instance features local minima even at E=8𝐸8E=8italic_E = 8.

At every energy level we observe massive saddle points which are connected to the global minimum. We note, however, that it becomes very important for local search not to descend in the wrong local minimum cluster, even though in principle it is possible to descent to a solution without overcoming any barriers. With this observation we highlight the higher ratio of local minima/disconnected saddles in the “hard” instance compared to the “easy” one.

The majority of energy barriers in both tested 3-SAT problems is the minimum possible one, ΔE=1Δ𝐸1\Delta E=1roman_Δ italic_E = 1 (as also previously observed in [62]). In other words, the constructed DGs illustrate the significance of entropic barriers that are determined by probabilities of descending into better areas of the landscape, which resulted in the observed 15absent15\approx\!15≈ 15 times difference in the time-to-solution metric.

III.3 Random and industrial problems

The uniform random 3-SAT problems are a common benchmark for testing the performance of heuristic solvers. In the thermodynamic limit of N𝑁N\to\inftyitalic_N → ∞ and M𝑀M\to\inftyitalic_M → ∞ their static properties are understood within the framework of the replica symmetry breaking (or cavity) methods of statistics [32]. In general, the lack of structure of random CSP causes state-of-the-art exact solvers to struggle near the phase transition ratio α=M/N𝛼𝑀𝑁\alpha=M/Nitalic_α = italic_M / italic_N and ultimately take exponential time to find solutions due to the difficulty of truncating the search space based on exponentially growing deep decision trees [26].

On the other hand, it is not a difficult task to engineer a structured problem to challenge a heuristic solver. A very small basin of attraction of a global minimum with overall rugged landscape would make a local search heuristic relying on stochastic exploration get lost. As a result, stochastic by design, IMs can have a hard time outperforming exact routines exploiting inherent structures of problems. In order to draw conclusions about the capabilities that IMs would need to tackle both combinatorial optimization classes, we employ DGs to visualize the distinction in landscape properties between fully random and structured “industrial” instances.

To represent the structured industrial class, we generated a 3-SAT formulation of the factoring problem of the number 55 using the method from [63]. This resulted in a 3-SAT instance having 68 boolean variables and 248 clauses with only one global minimum. For comparison, a random uniform instance near the phase transition ratio α𝛼\alphaitalic_α was generated of the same 68 variable size, but with 295 clauses. We obtained an instance with a single global minimum cluster having 4 configurations. The DG of the uniform random instance is shown in Fig. 6(a), while the DG of factoring — in Fig. 6(b).

We truncated the DG of the random problem to energy levels 5 or lower, resulting in 400absent400\approx 400≈ 400 distinct local minimum/saddle clusters, while the semi-prime factoring DG has managed to fit as many as 650absent650\approx 650≈ 650 clusters already at the energy E3𝐸3E\leq 3italic_E ≤ 3. This indicates much more pronounced ruggedness of the factoring 3-SAT problem with weak connectivity of saddle points.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 7: Disconnectivity graph examples of 3-SAT problems: (a) Uniform random of 68 variables and 295 clauses. 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT sampling limit of states per energy. 4 global minima. (b) Semi-prime factoring of 55 mapped to the 3-SAT problem of 68 variables and 248 clauses [63]. 1 global minimum. (c-d) Overlap distributions of local minimum states.

As in the previous section, we observed exponentially large connected saddle clusters at every energy level of random 3-SAT: it is possible to traverse huge distances in the optimization landscape without the need to overcome any energy barriers. It means that the hardness of this problem class arises mainly from the entropic barriers, leaving gradient-based solvers oblivious about meaningful exploration directions.

In comparison, the number of saddle clusters disconnected from the global minimum in the factoring problem constitute a much bigger fraction of the overall number of captured clusters. Moreover, the number of stable states of the factoring problem at the given energies is much smaller than in the random case (we didn’t need to impose sampling limits), indicating that the energy barriers are the main contributor to the hardness. These results support the conclusion of [64], where authors argue there is no evidence for an advantage of employing SAT reductions for factoring problems, both using classical SOTA SAT solvers, and their hypothetical quantum counterparts.

The limitation of DGs is that they compress combinatorial landscape information to local minima and barriers between them, while the distances in solution space are left aside. Since we are able to store all of the discovered by GWL+BFS states, one approach to probing such distance information is by calculating the mutual overlaps of local minima. The mutual overlap of states for Ising formulated problems is defined by qab=1Ni=1Nsiasibsubscript𝑞𝑎𝑏1𝑁superscriptsubscript𝑖1𝑁subscriptsuperscript𝑠𝑎𝑖subscriptsuperscript𝑠𝑏𝑖q_{ab}=\frac{1}{N}\sum_{i=1}^{N}s^{a}_{i}s^{b}_{i}italic_q start_POSTSUBSCRIPT italic_a italic_b end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_s start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_s start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT [28], where si=2xi1subscript𝑠𝑖2subscript𝑥𝑖1s_{i}=2x_{i}-1italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 2 italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - 1. We show computed histograms of overlaps of local minima for the given random and industrial instances in Figs. 6(c) and 6(d).

The histograms of random and structured problems exhibit distinct behaviour with the random instance having the majority of states at zero overlap values. This property is not explicitly shown by the DG visualization. It is implied, however, by the very large saddle clusters. Compared to the random instance, the local minima of the factoring problem at E2𝐸2E\leq 2italic_E ≤ 2 are closer to each other without showing evidence of a gap in the overlaps [12]. Thus, IMs can be challenged by different landscape features depending on the problem class, suggesting a strong algorithmic need for specialization.

Refer to caption
Figure 8: QUBO mapping landscape (KZFD-BG) truncated to the stable states with E5𝐸5E\leq 5italic_E ≤ 5 of a 3-SAT instance in Fig. 5(a). Problem size: 50 (native) + 136 (auxiliary) variables. QUBO factor F=1𝐹1F=1italic_F = 1. Total number of stable states (blue and red) is 9.7×105absent9.7superscript105\approx 9.7\times 10^{5}≈ 9.7 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT across 1621162116211621 clusters. Total number sampled states (grey histogram) is 5.7×106absent5.7superscript106\approx 5.7\times 10^{6}≈ 5.7 × 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT.
Refer to caption
(a)
Refer to caption
(b)
Figure 9: (a) Local minimum and disconnected saddle cluster complexities vs cluster entropies for the native (PUBO) and QUBO landscapes at energies E2𝐸2E\leq 2italic_E ≤ 2 for 1000 sampled SATLIB 3-SAT problems of size N=50𝑁50N=50italic_N = 50. (b) The ratio of the number of local minima and saddles disconnected from the global minimum to all sampled stable states.

III.4 QUBO mappings of 3-SAT

In this section we study energy landscapes of QUBO mappings of 3-SAT using the DGs construction extension introduced in Sec. III.1.4. We explicitly state the F𝐹Fitalic_F factor when mappings are compared with each other. We say that F=𝐹F=\inftyitalic_F = ∞ when all auxiliary variables are probed, essentially recovering the native (PUBO) landscape. At a given F𝐹Fitalic_F the “effective” barrier definition is illustrated in Fig. 5.

In Fig. 8 we plot a KZFD-BG QUBO mapping landscape of the instance from Fig. 5(a) truncated to the subspace of energies E5𝐸5E\leq 5italic_E ≤ 5. The QUBO barrier factor was chosen to be F=1𝐹1F=1italic_F = 1, meaning that at every step 𝐱a𝐱bsubscript𝐱𝑎subscript𝐱𝑏\mathbf{x}_{a}\to\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT → bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT only one QUBO auxiliary variable is allowed to be flipped in order to overcome the QUBO barriers between the native states 𝐱𝐱\mathbf{x}bold_x. One can observe the following features of the QUBO landscape of 3-SAT:

  • The connectivity of states is drastically reduced with large saddle point clusters of the PUBO landscape shattered into multiple disconnected saddles or local minima in the QUBO landscape (in total 1621162116211621 clusters). This has a direct negative effect on the ability to find global minima for the local search heuristics at low temperatures/noise.

  • The global minimum cluster (which consists of 2 states for this instance) is preserved, but only 20202020 from the original 206206206206 configurations were found to be connected from energy E=1𝐸1E=1italic_E = 1 towards the solution. In other words, blue clusters have become red clusters. The same behaviour is observed higher in energies, i.e. local search faces energy barriers and “holes” in addition to entropic barriers.

In order to highlight the necessity to carefully approach mapping into Ising hardware, we would like to directly compare the QUBO mappings introduced in Sec. III.1.1 to each other from the perspective of connectivity of states (clustering) at different values of the QUBO factor F𝐹Fitalic_F. We use 1000 SATLIB instances of size 50 with 218 clauses (uf50 1-1000) to accumulate statistics from sampled DGs. On average, our QUBO mapping scheme of variable substitution and penalty terms introduced 138±4plus-or-minus1384138\pm 4138 ± 4 auxiliary variables.

In Fig. 8(a) we show the histogram of the number of local minimum/disconnected saddle clusters 𝒩(s)exp(NΣ(s))𝒩𝑠𝑁Σ𝑠\mathcal{N}(s)\equiv\exp{(N\Sigma(s))}caligraphic_N ( italic_s ) ≡ roman_exp ( italic_N roman_Σ ( italic_s ) ) of size Sexp(Ns)𝑆𝑁𝑠S\equiv\exp{(Ns)}italic_S ≡ roman_exp ( italic_N italic_s ) for KZFD-BG and Rosenberg mappings, and for the native space. The parameter Σ(s)Σ𝑠\Sigma(s)roman_Σ ( italic_s ) is usually referred to as cluster complexity, while s𝑠sitalic_s is the cluster entropy [27]. As mentioned in Sec. III.1.3, the distinct clusters are sampled with the GWL algorithm; however, the cluster entropy estimations are improved further by BFS. The results are averaged over all considered instances for the states sampled at energies e=E/N0.04𝑒𝐸𝑁0.04e=E/N\leq 0.04italic_e = italic_E / italic_N ≤ 0.04.

We explicitly plot the global minima (E=0𝐸0E=0italic_E = 0) distribution as a sanity check: the RSB theory predicts its maximum value in the thermodynamic limit being at the entropy s0.06𝑠0.06s\approx 0.06italic_s ≈ 0.06, while the curve itself should be below 0 complexity when the clause-to-variable ratio of 3-SAT is above the phase transition value 4.2674.2674.2674.267 (we have 4.364.364.364.36) [32]. Both features are present for our sampled data.

We observe shattering of the native landscape clusters by the Rosenberg mapping to be stronger than that of the KZFD-BG mapping. This result can be interpreted as follows. On average, in order to transition (overcome the barrier) from state 𝐱asubscript𝐱𝑎\mathbf{x}_{a}bold_x start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT to state 𝐱bsubscript𝐱𝑏\mathbf{x}_{b}bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT having the same energy, the Rosenberg mapping needs to flip at least FRossubscript𝐹RosF_{\mathrm{Ros}}italic_F start_POSTSUBSCRIPT roman_Ros end_POSTSUBSCRIPT auxiliary variables, introduced by quadratization, while KZFD-BG would safely transition after flipping only FKZFD<FRossubscript𝐹KZFDsubscript𝐹RosF_{\mathrm{KZFD}}<F_{\mathrm{Ros}}italic_F start_POSTSUBSCRIPT roman_KZFD end_POSTSUBSCRIPT < italic_F start_POSTSUBSCRIPT roman_Ros end_POSTSUBSCRIPT. As a result, F𝐹Fitalic_F can be seen as a measure of the ruggedness or shattering of the quadratizatized optimization landscape induced by the mapping.

As displayed by the histogram in Fig. 8(a), both QUBO mappings feature large clusters that are not accounted for in the PUBO case. These are the native space saddle clusters connected to the global minimum (thus not shown on the histogram), which were transformed by quadratization to either local minima or disconnected saddles. In Sec. III.2 we discussed the value of connected saddle points at low temperatures/noise for finding global minima using IMs. QUBO mappings, thus, can transform saddle points into local minima effectively impeding the descend in energy. The ratio of local minima/disconnected saddles to all stable states is shown in Fig. 8(b). With increasing F𝐹Fitalic_F we approach the native landscape faster for the KZFD-BG mapping than for the Rosenberg mapping. This suggests a potential algorithm for IMs that are forced by hardware to use quadratization methods. With sufficient exploration of the auxiliary space, it is possible to recover the native (PUBO) landscape geometry and thereby benefit from the reduced number of local minima/disconnected saddles, provided that the costs of specific hardware implementations do not outweigh such benefits in terms of time-to-solution/energy-to-solution metrics.

With our analysis we would also like to highlight the effect of choosing quadratization methods on performance of solvers. While all such methods preserve global minima, the geometry of combinatorial landscapes may change beyond recognition, thereby drastically decreasing the local search performance. To support the observed landscape advantage of KZFD-BG mapping over Rosenberg, we performed simulated annealing for a collection of SATLIB 3-SAT problem instances in Fig. 10. The advantage of the KZFD-BG mapping clearly exhibits itself in the solver performance giving dominating TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT for the majority of instances. For the problem size of N=50𝑁50N=50italic_N = 50 none of the KZFD-BG mapped instances timed-out within our imposed computational budget of SA sweeps. On the other hand, the solver using the Rosenberg mapping failed to find a solution for several instances even at this modest problem size.

Refer to caption
Figure 10: TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT of Simulated Annealing for the non-term-wise QUBO mappings: Rosenberg and KZFD-BG. Instances uf50 1-100 and uf75 1-100. The average mapping sizes are N50QUBO=188±5subscriptsuperscript𝑁QUBO50plus-or-minus1885N^{\mathrm{QUBO}}_{50}=188\pm 5italic_N start_POSTSUPERSCRIPT roman_QUBO end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 50 end_POSTSUBSCRIPT = 188 ± 5, N75QUBO=303±5subscriptsuperscript𝑁QUBO75plus-or-minus3035N^{\mathrm{QUBO}}_{75}=303\pm 5italic_N start_POSTSUPERSCRIPT roman_QUBO end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 75 end_POSTSUBSCRIPT = 303 ± 5. There are 7 instances that timed-out for both mappings represented by the rightmost point.

IV Conclusion

In this work we have suggested methods to sample disconnectivity graphs of degenerate combinatorial optimization problems, while also introducing extensions for quadratic embeddings motivated by hardware constraints of Ising machines. DGs have proven to be able to visually capture energy landscape properties of instances with different structure (industrial and random), hardness, and order (quadratic and higher order). To characterize clustering/ruggedness of the configuration space arising from locality reduction, we have introduced a new method, QUBO factor F𝐹Fitalic_F. From this perspective we have discussed the reasons behind observed experimental performance gap between different QUBO mappings, as well as between QUBO and PUBO.

The directions for future work include investigating other definitions of neighbourhoods beyond the simple bit-flip in order to visualize and gain intuition into how optimization landscapes are perceived by different local (or non-local) search routines. For example, isoenergetic cluster moves [65] allow solvers to make large Hamming distance steps, defining a different neighbourhood for each configuration, thus a new DG with distinct connectivity of states. Moreover, understanding of the energy landscape geometry is of great importance in a variety of fields ranging from inference and learning in energy-based models [66] to attractor dynamics and storage capacity in associative memories [67, 68]. One application example is non-equilibrium inhomogeneous sampling methods [36, 69] which essentially modify energy barriers reducing the hardness of sampling of high-quality and diverse solutions.

Sparsifying methods motivated by the available connectivity topology of the Ising hardware constitute a complementary problem which can also be studied with the methods of this work. Finally, the distinct properties of auxiliary variables imply the possibility to introduce adaptive algorithms leveraging the specific native-auxiliary interactions within the constrains of Ising machines.

V Acknowledgements

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under the Air Force Research Laboratory (AFRL) Agreement No. FA8650-23-3-7313. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The authors gratefully acknowledge computing time on the supercomputer JURECA [70] Forschungszentrum Jülich under grant no. “optimization”. We also gratefully acknowledge the generous funding of this work under NEUROTEC II (Verbundkoordinator / Förderkennzeichen: Forschungszentrum Jülich / 16ME0398K) by the Bundesministerium für Bildung und Forschung.

Appendix A Sparsification by auxiliary variables

Let’s assume that due to hardware limitations we are unable to support full interaction connectivity of a variable x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT of the PUBO function in Eq. 3. Due to its multilinear form, we can split the interactions of x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, i.e.

f(𝐱)=A(xia,,xka)x1+B(xib,,xkb)x1,𝑓𝐱𝐴subscript𝑥subscript𝑖𝑎subscript𝑥subscript𝑘𝑎subscript𝑥1𝐵subscript𝑥subscript𝑖𝑏subscript𝑥subscript𝑘𝑏subscript𝑥1f(\mathbf{x})=A(x_{i_{a}},\dots,x_{k_{a}})x_{1}+B(x_{i_{b}},\dots,x_{k_{b}})x_% {1}\,,italic_f ( bold_x ) = italic_A ( italic_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_B ( italic_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , (15)

where A𝐴Aitalic_A and B𝐵Bitalic_B are independent functions (assuming such exist). x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in the second term can be substituted by an auxiliary variable y𝑦yitalic_y with the introduction of a penalty as follows:

g(𝐱,y)=Ax1+By+P(x1+y2x1y),𝑔𝐱𝑦𝐴subscript𝑥1𝐵𝑦𝑃subscript𝑥1𝑦2subscript𝑥1𝑦g(\mathbf{x},y)=Ax_{1}+By+P(x_{1}+y-2x_{1}y)\,,italic_g ( bold_x , italic_y ) = italic_A italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_B italic_y + italic_P ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_y - 2 italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_y ) , (16)

which obeys f(𝐱)=minyg(𝐱,y)𝑓𝐱subscript𝑦𝑔𝐱𝑦f(\mathbf{x})=\min_{y}g(\mathbf{x},y)italic_f ( bold_x ) = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_g ( bold_x , italic_y ) as in locality reduction methods if P|B|𝑃𝐵P\geq|B|italic_P ≥ | italic_B |. As a result, the local search move (x1,y)=(0,0)(x¯1,y¯)=(1,1)subscript𝑥1𝑦00subscript¯𝑥1¯𝑦11(x_{1},y)=(0,0)\to(\bar{x}_{1},\bar{y})=(1,1)( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_y ) = ( 0 , 0 ) → ( over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , over¯ start_ARG italic_y end_ARG ) = ( 1 , 1 ) can only be made with single flips through a higher energy barrier than in the denser original formulation (see Tab. 1).

Table 1: Sparsification truth table
x𝑥xitalic_x y𝑦yitalic_y g(𝐱,y)𝑔𝐱𝑦g(\mathbf{x},y)italic_g ( bold_x , italic_y ) f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x )
0 0 0 0
0 1 B+|B|𝐵𝐵B+|B|italic_B + | italic_B |
1 0 A+|B|𝐴𝐵A+|B|italic_A + | italic_B | A+B𝐴𝐵A+Bitalic_A + italic_B
1 1 A+B𝐴𝐵A+Bitalic_A + italic_B

Appendix B 3-SAT to QUBO mappings

In this section we describe in detail the mappings of 3-SAT problems formulated as conjugate normal forms (CNF) to quadratic pseudo-boolean functions (QUBO).

Notation:

x𝑥xitalic_x are boolean variables, l𝑙litalic_l are literals that stand for either x𝑥xitalic_x or its negation x¯1x¯𝑥1𝑥\bar{x}\equiv 1-xover¯ start_ARG italic_x end_ARG ≡ 1 - italic_x, y𝑦yitalic_y are boolean auxiliary variables.

The problem of maximizing the total number of clauses of size k=3𝑘3k=3italic_k = 3 is reformulated as a minimization problem of a third order pseudo-boolean polynomial of literals as follows (inverting the expression and using De-Morgan law):

(l¯11l¯21l¯31)(l¯12l¯22l¯32)subscript¯𝑙subscript11subscript¯𝑙subscript21subscript¯𝑙subscript31subscript¯𝑙subscript12subscript¯𝑙subscript22subscript¯𝑙subscript32\displaystyle(\bar{l}_{1_{1}}\vee\bar{l}_{2_{1}}\vee\bar{l}_{3_{1}})\wedge(% \bar{l}_{1_{2}}\vee\bar{l}_{2_{2}}\vee\bar{l}_{3_{2}})\wedge\dots( over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ∧ ( over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∨ over¯ start_ARG italic_l end_ARG start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ∧ …
l11l21l31+l12l22l32+,absentsubscript𝑙subscript11subscript𝑙subscript21subscript𝑙subscript31subscript𝑙subscript12subscript𝑙subscript22subscript𝑙subscript32\displaystyle\rightarrow l_{1_{1}}l_{2_{1}}l_{3_{1}}+l_{1_{2}}l_{2_{2}}l_{3_{2% }}+\dots\,,→ italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + … , (17)

where each li=xaisubscript𝑙𝑖subscript𝑥subscript𝑎𝑖l_{i}=x_{a_{i}}italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT or li=x¯aisubscript𝑙𝑖subscript¯𝑥subscript𝑎𝑖l_{i}=\bar{x}_{a_{i}}italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT, i[1,3M]𝑖13𝑀i\in[1,3M]italic_i ∈ [ 1 , 3 italic_M ], a[1,N]𝑎1𝑁a\in[1,N]italic_a ∈ [ 1 , italic_N ]. The simplest mapping of this expression to QUBO (quadratization) would be to introduce the Rosenberg penalties for every term in the sum:

l11l21l31+l12l22l32+=miny𝔹g(l(x),y),subscript𝑙subscript11subscript𝑙subscript21subscript𝑙subscript31subscript𝑙subscript12subscript𝑙subscript22subscript𝑙subscript32subscript𝑦𝔹𝑔𝑙𝑥𝑦\displaystyle l_{1_{1}}l_{2_{1}}l_{3_{1}}+l_{1_{2}}l_{2_{2}}l_{3_{2}}+\dots=% \min_{y\in\mathbb{B}}g(l(x),y)\,,italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ⋯ = roman_min start_POSTSUBSCRIPT italic_y ∈ blackboard_B end_POSTSUBSCRIPT italic_g ( italic_l ( italic_x ) , italic_y ) , (18)
g(l(x),y)=i[1,M]yil3i+(3yi2yil1i2yil2i+l1il2i).𝑔𝑙𝑥𝑦subscript𝑖1𝑀subscript𝑦𝑖subscript𝑙subscript3𝑖3subscript𝑦𝑖2subscript𝑦𝑖subscript𝑙subscript1𝑖2subscript𝑦𝑖subscript𝑙subscript2𝑖subscript𝑙subscript1𝑖subscript𝑙subscript2𝑖\displaystyle g(l(x),y)=\sum_{i\in[1,M]}y_{i}l_{3_{i}}+(3y_{i}-2y_{i}l_{1_{i}}% -2y_{i}l_{2_{i}}+l_{1_{i}}l_{2_{i}})\,.italic_g ( italic_l ( italic_x ) , italic_y ) = ∑ start_POSTSUBSCRIPT italic_i ∈ [ 1 , italic_M ] end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ( 3 italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - 2 italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT - 2 italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) .

The validity of such quadratizaton (Eq. 6) directly follows from the fact that auxiliary variables yisubscript𝑦𝑖y_{i}italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are introduced independently for each term of Eq. (17) and l1l2l3=miny[yl3+(3y2yl12yl2+l1l2)]subscript𝑙1subscript𝑙2subscript𝑙3subscript𝑦𝑦subscript𝑙33𝑦2𝑦subscript𝑙12𝑦subscript𝑙2subscript𝑙1subscript𝑙2l_{1}l_{2}l_{3}=\min_{y}\left[yl_{3}+(3y-2yl_{1}-2yl_{2}+l_{1}l_{2})\right]italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ italic_y italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT + ( 3 italic_y - 2 italic_y italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 2 italic_y italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ].

B.1 Non-term-wise Rosenberg

In order to get the “classic” Rosenberg [47] quadratizaton, we write the PUBO of Eq. (17) for variables x𝑥xitalic_x:

l11l21l31subscript𝑙subscript11subscript𝑙subscript21subscript𝑙subscript31\displaystyle l_{1_{1}}l_{2_{1}}l_{3_{1}}italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT +\displaystyle++ l12l22l32+=i<j<kSijkxixjxksubscript𝑙subscript12subscript𝑙subscript22subscript𝑙subscript32subscript𝑖𝑗𝑘subscript𝑆𝑖𝑗𝑘subscript𝑥𝑖subscript𝑥𝑗subscript𝑥𝑘\displaystyle l_{1_{2}}l_{2_{2}}l_{3_{2}}+\dots=\sum_{i<j<k}S_{ijk}x_{i}x_{j}x% _{k}italic_l start_POSTSUBSCRIPT 1 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ⋯ = ∑ start_POSTSUBSCRIPT italic_i < italic_j < italic_k end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (19)
+\displaystyle++ i<jWijxixj+iBixi+C.subscript𝑖𝑗subscript𝑊𝑖𝑗subscript𝑥𝑖subscript𝑥𝑗subscript𝑖subscript𝐵𝑖subscript𝑥𝑖𝐶\displaystyle\sum_{i<j}W_{ij}x_{i}x_{j}+\sum_{i}B_{i}x_{i}+C\,.∑ start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT italic_W start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_C .

The pairs xmxnsubscript𝑥𝑚subscript𝑥𝑛x_{m}x_{n}italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT in a set covering all terms of order 3 are substituted by auxiliary variables y(mn)subscript𝑦𝑚𝑛y_{(mn)}italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT with the addition of a penalties as in Eq. (18):

i<j<kSijkxixjxk+=miny(mn),kS(mn)ky(mn)xk+subscript𝑖𝑗𝑘subscript𝑆𝑖𝑗𝑘subscript𝑥𝑖subscript𝑥𝑗subscript𝑥𝑘limit-fromsubscript𝑦subscript𝑚𝑛𝑘subscript𝑆𝑚𝑛𝑘subscript𝑦𝑚𝑛subscript𝑥𝑘\displaystyle\sum_{i<j<k}S_{ijk}x_{i}x_{j}x_{k}+\dots=\min_{y}\sum_{(mn),k}S_{% (mn)k}y_{(mn)}x_{k}+∑ start_POSTSUBSCRIPT italic_i < italic_j < italic_k end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ⋯ = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT ( italic_m italic_n ) , italic_k end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT +
+(mn)P(mn)R(3y(mn)2y(mn)xn2y(mn)xm+xmxn)subscript𝑚𝑛subscriptsuperscript𝑃𝑅𝑚𝑛3subscript𝑦𝑚𝑛2subscript𝑦𝑚𝑛subscript𝑥𝑛2subscript𝑦𝑚𝑛subscript𝑥𝑚subscript𝑥𝑚subscript𝑥𝑛\displaystyle+\sum_{(mn)}P^{R}_{(mn)}(3y_{(mn)}-2y_{(mn)}x_{n}-2y_{(mn)}x_{m}+% x_{m}x_{n})+ ∑ start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_P start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ( 3 italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT - 2 italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 2 italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT )
+[2ndorderterms],annotateddelimited-[]absent2ndorderterms\displaystyle+\dots[\leq\mathrm{2nd\;order\;terms}]\,,+ … [ ≤ 2 roman_n roman_d roman_order roman_terms ] , (20)

where the lower bound penalty coefficients are now index-dependent [71]:

P(mn)Rmax[kS(mn)k+,kS(mn)k]subscriptsuperscript𝑃𝑅𝑚𝑛subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘\displaystyle P^{R}_{(mn)}\geq\max\left[\sum_{k}S^{+}_{(mn)k},-\sum_{k}S^{-}_{% (mn)k}\right]italic_P start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ≥ roman_max [ ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT , - ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT ]
S(mn)k+>0,S(mn)k<0.formulae-sequencesubscriptsuperscript𝑆𝑚𝑛𝑘0subscriptsuperscript𝑆𝑚𝑛𝑘0\displaystyle S^{+}_{(mn)k}>0,\;S^{-}_{(mn)k}<0\,.italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT > 0 , italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT < 0 . (21)

B.2 Non-term-wise KZFD-BG

Here we modify the Rosenberg mapping of Sec. B.1 applying quadratization ideas of [50, 51, 46], where the positive and negative monomials get different penalty terms of Eq. 8 (here third order):

x1x2x3=miny[2yi=13yxi],subscript𝑥1subscript𝑥2subscript𝑥3subscript𝑦2𝑦superscriptsubscript𝑖13𝑦subscript𝑥𝑖\displaystyle-x_{1}x_{2}x_{3}=\min_{y}\left[2y-\sum_{i=1}^{3}yx_{i}\right]\,,- italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ 2 italic_y - ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT italic_y italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ,
x1x2x3=x2x3x¯1x2x3subscript𝑥1subscript𝑥2subscript𝑥3subscript𝑥2subscript𝑥3subscript¯𝑥1subscript𝑥2subscript𝑥3\displaystyle x_{1}x_{2}x_{3}=x_{2}x_{3}-\bar{x}_{1}x_{2}x_{3}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT - over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT
=x2x3+miny[2yyx¯1i=23yxi].absentsubscript𝑥2subscript𝑥3subscript𝑦2𝑦𝑦subscript¯𝑥1superscriptsubscript𝑖23𝑦subscript𝑥𝑖\displaystyle=x_{2}x_{3}+\min_{y}\left[2y-y\bar{x}_{1}-\sum_{i=2}^{3}yx_{i}% \right]\,.= italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT + roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT [ 2 italic_y - italic_y over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_i = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT italic_y italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] . (22)

Rearranging the summands in these equations, we get for the positive monomial:

xmxkxnxky+(yyxmyxn+xmxn),subscript𝑥𝑚subscript𝑥𝑘subscript𝑥𝑛subscript𝑥𝑘𝑦𝑦𝑦subscript𝑥𝑚𝑦subscript𝑥𝑛subscript𝑥𝑚subscript𝑥𝑛x_{m}x_{k}x_{n}\to x_{k}y+(y-yx_{m}-yx_{n}+x_{m}x_{n})\,,italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT → italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_y + ( italic_y - italic_y italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT - italic_y italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , (23)

and for the negative monomial:

xmxnxkyxk+yxmxn+(yyxmyxn+xmxn).subscript𝑥𝑚subscript𝑥𝑛subscript𝑥superscript𝑘𝑦subscript𝑥superscript𝑘𝑦subscript𝑥𝑚subscript𝑥𝑛𝑦𝑦subscript𝑥𝑚𝑦subscript𝑥𝑛subscript𝑥𝑚subscript𝑥𝑛-x_{m}x_{n}x_{k^{\prime}}\to-yx_{k^{\prime}}+y-x_{m}x_{n}+(y-yx_{m}-yx_{n}+x_{% m}x_{n})\,.- italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT → - italic_y italic_x start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT + italic_y - italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + ( italic_y - italic_y italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT - italic_y italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) . (24)

As a result, an arbitrary 3rd order pseudo-boolean function is quadratizatized as:

i<j<kSijkxixjxk+=miny(mn),kS(mn)k+y(mn)xksubscript𝑖𝑗𝑘subscript𝑆𝑖𝑗𝑘subscript𝑥𝑖subscript𝑥𝑗subscript𝑥𝑘subscript𝑦subscript𝑚𝑛𝑘subscriptsuperscript𝑆𝑚𝑛𝑘subscript𝑦𝑚𝑛subscript𝑥𝑘\displaystyle\sum_{i<j<k}S_{ijk}x_{i}x_{j}x_{k}+\dots=\min_{y}\sum_{(mn),k}S^{% +}_{(mn)k}y_{(mn)}x_{k}∑ start_POSTSUBSCRIPT italic_i < italic_j < italic_k end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ⋯ = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT ( italic_m italic_n ) , italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
+(mn),kS(mn)k(y(mn)xky(mn)+xmxn)subscript𝑚𝑛superscript𝑘subscriptsuperscript𝑆𝑚𝑛superscript𝑘subscript𝑦𝑚𝑛subscript𝑥superscript𝑘subscript𝑦𝑚𝑛subscript𝑥𝑚subscript𝑥𝑛\displaystyle+\sum_{(mn),k^{\prime}}S^{-}_{(mn)k^{\prime}}(y_{(mn)}x_{k^{% \prime}}-y_{(mn)}+x_{m}x_{n})+ ∑ start_POSTSUBSCRIPT ( italic_m italic_n ) , italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT )
+(mn)P(mn)(y(mn)y(mn)xny(mn)xm+xmxn)subscript𝑚𝑛subscript𝑃𝑚𝑛subscript𝑦𝑚𝑛subscript𝑦𝑚𝑛subscript𝑥𝑛subscript𝑦𝑚𝑛subscript𝑥𝑚subscript𝑥𝑚subscript𝑥𝑛\displaystyle+\sum_{(mn)}P_{(mn)}(y_{(mn)}-y_{(mn)}x_{n}-y_{(mn)}x_{m}+x_{m}x_% {n})+ ∑ start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT )
+[2ndorderterms],annotateddelimited-[]absent2ndorderterms\displaystyle+\dots[\leq\mathrm{2nd\;order\;terms}]\,,+ … [ ≤ 2 roman_n roman_d roman_order roman_terms ] , (25)

where S(mn)k+,S(mn)ksubscriptsuperscript𝑆𝑚𝑛superscript𝑘subscriptsuperscript𝑆𝑚𝑛superscript𝑘S^{+}_{(mn)k^{\prime}},S^{-}_{(mn)k^{\prime}}italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT denote coefficients of positive and negative monomials. The penalty parameters P(mn)Ksubscriptsuperscript𝑃𝐾𝑚𝑛P^{K}_{(mn)}italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT are chosen as:

P(mn)KkS(mn)k+kS(mn)ksubscriptsuperscript𝑃𝐾𝑚𝑛subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘\displaystyle P^{K}_{(mn)}\geq\sum_{k}S^{+}_{(mn)k}-\sum_{k}S^{-}_{(mn)k}italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ≥ ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT
S(mn)k+>0,S(mn)k<0.formulae-sequencesubscriptsuperscript𝑆𝑚𝑛𝑘0subscriptsuperscript𝑆𝑚𝑛𝑘0\displaystyle S^{+}_{(mn)k}>0,\;S^{-}_{(mn)k}<0\,.italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT > 0 , italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT < 0 . (26)

Indeed, for every auxiliary variable index (mn)𝑚𝑛(mn)( italic_m italic_n ) we have

g(x,y(mn))=(N++N)y(mn)|N|(xmxny(mn))𝑔𝑥subscript𝑦𝑚𝑛superscript𝑁superscript𝑁subscript𝑦𝑚𝑛superscript𝑁subscript𝑥𝑚subscript𝑥𝑛subscript𝑦𝑚𝑛\displaystyle g(x,y_{(mn)})=\left(N^{+}+N^{-}\right)y_{(mn)}-|N^{-}|(x_{m}x_{n% }-y_{(mn)})italic_g ( italic_x , italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ) = ( italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT + italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT - | italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT | ( italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT )
+P(mn)K(y(mn)y(mn)xny(mn)xm+xmxn),subscriptsuperscript𝑃𝐾𝑚𝑛subscript𝑦𝑚𝑛subscript𝑦𝑚𝑛subscript𝑥𝑛subscript𝑦𝑚𝑛subscript𝑥𝑚subscript𝑥𝑚subscript𝑥𝑛\displaystyle+P^{K}_{(mn)}(y_{(mn)}-y_{(mn)}x_{n}-y_{(mn)}x_{m}+x_{m}x_{n})\,,+ italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , (27)

where we defined

N+kS(mn)k+xk,NkS(mn)kxkformulae-sequencesuperscript𝑁subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘subscript𝑥𝑘superscript𝑁subscriptsuperscript𝑘subscriptsuperscript𝑆𝑚𝑛superscript𝑘subscript𝑥𝑘\displaystyle N^{+}\equiv\sum_{k}S^{+}_{(mn)k}x_{k},\;N^{-}\equiv\sum_{k^{% \prime}}S^{-}_{(mn)k^{\prime}}x_{k}italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ≡ ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ≡ ∑ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
|N+|=kS(mn)k+,|N|=kS(mn)k.formulae-sequencesuperscript𝑁subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘superscript𝑁subscript𝑘subscriptsuperscript𝑆𝑚𝑛𝑘\displaystyle|N^{+}|=\sum_{k}S^{+}_{(mn)k},\;-|N^{-}|=\sum_{k}S^{-}_{(mn)k}\,.| italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT | = ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT , - | italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT | = ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) italic_k end_POSTSUBSCRIPT . (28)

As a result, f(x)=minyg(x,y)𝑓𝑥subscript𝑦𝑔𝑥𝑦f(x)=\min_{y}g(x,y)italic_f ( italic_x ) = roman_min start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_g ( italic_x , italic_y ) due to Eq. 26 is guaranteed, as shown in Tab. 2. Compared to the Rosenberg mapping, non-term-wise KZFD-BG has smaller dynamic range 2nd order interactions, since that 2PR>PK2superscript𝑃𝑅superscript𝑃𝐾2P^{R}>P^{K}2 italic_P start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT > italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT.

The same native variable pairs xixjsubscript𝑥𝑖subscript𝑥𝑗x_{i}x_{j}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are chosen for substitution for both mappings in this work for fair comparison. Their choice is a result of a greedy (i.e. efficient) optimization algorithm choosing the most frequent variable pairs which achieves significant reduction (possibly not the optimal 333Finding a minimal set of pairs of variables to be substituted is in general an NP-Hard problem.) of the QUBO space compared to the term-wise methods (e.g. Eq. 18).

Table 2: KZFD-BG QUBO mapping truth table for every substituted pair xnxmsubscript𝑥𝑛subscript𝑥𝑚x_{n}x_{m}italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT and an auxiliary y(mn)subscript𝑦𝑚𝑛y_{(mn)}italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT.
y(mn)subscript𝑦𝑚𝑛y_{(mn)}italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT xmsubscript𝑥𝑚x_{m}italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT xnsubscript𝑥𝑛x_{n}italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT g(x,y(mn))𝑔𝑥subscript𝑦𝑚𝑛g(x,y_{(mn)})italic_g ( italic_x , italic_y start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT ) f(x)𝑓𝑥f(x)italic_f ( italic_x )
0 0 0 0 0
1 0 0 N++N+|N|+P(mn)Ksuperscript𝑁superscript𝑁superscript𝑁subscriptsuperscript𝑃𝐾𝑚𝑛N^{+}+N^{-}+|N^{-}|+P^{K}_{(mn)}italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT + italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT + | italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT | + italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT
0 1 0 0 0
1 1 0 N++N+|N|superscript𝑁superscript𝑁superscript𝑁N^{+}+N^{-}+|N^{-}|italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT + italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT + | italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT |
0 1 1 |N|+P(mn)Ksuperscript𝑁subscriptsuperscript𝑃𝐾𝑚𝑛-|N^{-}|+P^{K}_{(mn)}- | italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT | + italic_P start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ( italic_m italic_n ) end_POSTSUBSCRIPT N++Nsuperscript𝑁superscript𝑁N^{+}+N^{-}italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT + italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT
1 1 1 N++Nsuperscript𝑁superscript𝑁N^{+}+N^{-}italic_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT + italic_N start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT

Appendix C implementation details

The original code developed for this work uses GWL sampling described in Sec. III.1 and has the following output: GWL histogram of visits to basins of attraction and energy levels, sampled cluster sizes, symmetric matrix of energy barriers between clusters. The information about the connectivity of clusters and their type (local minimum/saddle) is then derived from the barrier matrix during postprocessing. As input the program takes the conjugate normal form of a 3-SAT problem, or the equivalent 3rd order sparse PUBO formulation.

The hyperparameters that can be tuned of DG sampling are: number of parallel threads of sampling, total GWL steps per thread, hill climbing type (random/steepest), usage of ballistic search, cluster exploration and breadth-first search limits.

Extended disconnectivity graphs construction from the aforementioned data sampled with the GWL algorithm is based on the functions from pele library [73].

References

  • Mohseni et al. [2022] N. Mohseni, P. L. McMahon, and T. Byrnes, Ising machines as hardware solvers of combinatorial optimization problems, Nature Reviews Physics 4, 363 (2022).
  • Mahmoodi et al. [2019] M. R. Mahmoodi, M. Prezioso, and D. B. Strukov, Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization, Nature Communications 10, 5113 (2019).
  • Cai et al. [2020] F. Cai, S. Kumar, T. Van Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. D. Lu, and J. P. Strachan, Power-efficient combinatorial optimization using intrinsic noise in memristor hopfield neural networks, Nature Electronics 3, 409 (2020).
  • Aramon et al. [2019] M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber, Physics-inspired optimization for quadratic unconstrained problems using a digital annealer, Frontiers in Physics 710.3389/fphy.2019.00048 (2019).
  • Peng et al. [2008] X. Peng, Z. Liao, N. Xu, G. Qin, X. Zhou, D. Suter, and J. Du, Quantum adiabatic algorithm for factorization and its experimental implementation, Phys. Rev. Lett. 101, 220405 (2008).
  • Borders et al. [2019] W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and S. Datta, Integer factorization using stochastic magnetic tunnel junctions, Nature 573, 390 (2019).
  • Mallick et al. [2020] A. Mallick, M. K. Bashar, D. S. Truesdell, B. H. Calhoun, S. Joshi, and N. Shukla, Using synchronized oscillators to compute the maximum independent set, Nature Communications 11, 4689 (2020).
  • Goto et al. [2021] H. Goto, K. Endo, M. Suzuki, Y. Sakai, T. Kanao, Y. Hamakawa, R. Hidaka, M. Yamasaki, and K. Tatsumura, High-performance combinatorial optimization based on classical mechanics, Science Advances 7, eabe7953 (2021).
  • Könz et al. [2021] M. S. Könz, W. Lechner, H. G. Katzgraber, and M. Troyer, Embedding overhead scaling of optimization problems in quantum annealing, PRX Quantum 2, 040322 (2021).
  • Valiante et al. [2021] E. Valiante, M. Hernandez, A. Barzegar, and H. G. Katzgraber, Computational overhead of locality reduction in binary optimization problems, Computer Physics Communications 269, 108102 (2021).
  • Hizzani et al. [2023] M. Hizzani, A. Heittmann, G. Hutchinson, D. Dobrynin, T. V. Vaerenbergh, T. Bhattacharya, A. Renaudineau, D. Strukov, and J. P. Strachan, Memristor-based hardware and algorithms for higher-order hopfield optimization solver outperforming quadratic ising machines (2023), arXiv:2311.01171 [cs.ET] .
  • Gamarnik [2021] D. Gamarnik, The overlap gap property: A topological barrier to optimizing over random structures, Proceedings of the National Academy of Sciences 11810.1073/pnas.2108492118 (2021).
  • Bernaschi et al. [2021] M. Bernaschi, M. Bisson, M. Fatica, E. Marinari, V. Martin-Mayor, G. Parisi, and F. Ricci-Tersenghi, How we are leading a 3-xorsat challenge: From the energy landscape to the algorithm and its efficient implementation on gpus(a), Europhysics Letters 133, 60005 (2021).
  • Becker and Karplus [1997] O. M. Becker and M. Karplus, The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics, The Journal of Chemical Physics 106, 1495 (1997).
  • Wales et al. [1998] D. J. Wales, M. A. Miller, and T. R. Walsh, Archetypal energy landscapes, Nature 394, 758 (1998).
  • Doye et al. [1999] J. P. K. Doye, M. A. Miller, and D. J. Wales, The double-funnel energy landscape of the 38-atom Lennard-Jones cluster, The Journal of Chemical Physics 110, 6896 (1999).
  • Chakraborty et al. [2014] D. Chakraborty, R. Collepardo-Guevara, and D. J. Wales, Energy landscapes, folding mechanisms, and kinetics of rna tetraloop hairpins, Journal of the American Chemical Society 136, 18052 (2014).
  • Calvo et al. [2007] F. Calvo, T. V. Bogdan, V. K. de Souza, and D. J. Wales, Equilibrium density of states and thermodynamic properties of a model glass former, The Journal of Chemical Physics 127, 044508 (2007).
  • Wales [2005] D. J. Wales, Energy landscapes and properties of biomolecules, Physical Biology 2, S86 (2005).
  • Garstecki et al. [1999] P. Garstecki, T. X. Hoang, and M. Cieplak, Energy landscapes, supergraphs, and “folding funnels” in spin systems, Phys. Rev. E 60, 3219 (1999).
  • Biswas and Katzgraber [2020] K. Biswas and H. G. Katzgraber, Adding color: Visualization of energy landscapes in spin glasses (2020), arXiv:2004.12431 [cond-mat.dis-nn] .
  • Zhou [2011] Q. Zhou, Random walk over basins of attraction to construct ising energy landscapes, Phys. Rev. Lett. 106, 180602 (2011).
  • Coja-Oghlan et al. [2017] A. Coja-Oghlan, A. Haqshenas, and S. Hetterich, Walksat stalls well below satisfiability, SIAM Journal on Discrete Mathematics 31, 1160 (2017).
  • Bellitti et al. [2021] M. Bellitti, F. Ricci-Tersenghi, and A. Scardicchio, Entropic barriers as a reason for hardness in both classical and quantum algorithms, Phys. Rev. Res. 3, 043015 (2021).
  • Barahona [1982] F. Barahona, On the computational complexity of ising spin glass models, Journal of Physics A: Mathematical and General 15, 3241 (1982).
  • Selman and Kirkpatrick [1996] B. Selman and S. Kirkpatrick, Critical behavior in the computational cost of satisfiability testing, Artificial Intelligence 81, 273 (1996), frontiers in Problem Solving: Phase Transitions and Complexity.
  • Mé zard et al. [2005] M. Mé zard, T. Mora, and R. Zecchina, Clustering of solutions in the random satisfiability problem, Physical Review Letters 9410.1103/physrevlett.94.197205 (2005).
  • Mezard and Montanari [2009] M. Mezard and A. Montanari, Information, Physics, and Computation (Oxford University Press, Inc., USA, 2009).
  • Marques-Silva [2008] J. Marques-Silva, Practical applications of boolean satisfiability, in 2008 9th International Workshop on Discrete Event Systems (2008) pp. 74–80.
  • Bhattacharya et al. [2024] T. Bhattacharya, G. H. Hutchinson, G. Pedretti, X. Sheng, J. Ignowski, T. V. Vaerenbergh, R. Beausoleil, J. P. Strachan, and D. B. Strukov, Computing high-degree polynomial gradients in memory (2024), arXiv:2401.16204 [cs.ET] .
  • Garey and Johnson [1990] M. R. Garey and D. S. Johnson, Computers and Intractability; A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., USA, 1990).
  • Montanari et al. [2008] A. Montanari, F. Ricci-Tersenghi, and G. Semerjian, Clusters of solutions and replica symmetry breaking in random k-satisfiability, Journal of Statistical Mechanics: Theory and Experiment 2008, P04004 (2008).
  • Zdeborová and Krząkała [2007] L. Zdeborová and F. Krząkała, Phase transitions in the coloring of random graphs, Phys. Rev. E 76, 031131 (2007).
  • Ardelius and Zdeborová [2008] J. Ardelius and L. Zdeborová, Exhaustive enumeration unveils clustering and freezing in the random 3-satisfiability problem, Phys. Rev. E 78, 040101 (2008).
  • Li et al. [2009] K. Li, H. Ma, and H. Zhou, From one solution of a 3-satisfiability formula to a solution cluster: Frozen variables and entropy, Phys. Rev. E 79, 031102 (2009).
  • Mohseni et al. [2021] M. Mohseni, D. Eppens, J. Strumpfer, R. Marino, V. Denchev, A. K. Ho, S. V. Isakov, S. Boixo, F. Ricci-Tersenghi, and H. Neven, Nonequilibrium monte carlo for unfreezing variables in hard combinatorial optimization (2021), arXiv:2111.13628 [cond-mat.dis-nn] .
  • Bresler and Huang [2021] G. Bresler and B. Huang, The algorithmic phase transition of random k𝑘kitalic_k-sat for low degree polynomials (2021), arXiv:2106.02129 [cs.CC] .
  • Gamarnik et al. [2022] D. Gamarnik, C. Moore, and L. Zdeborová, Disordered systems insights on computational hardness, Journal of Statistical Mechanics: Theory and Experiment 2022, 114015 (2022).
  • Flamm et al. [2002] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger, Barrier trees of degenerate landscapes, Zeitschrift für Physikalische Chemie 216, 155 (2002).
  • Reidys and Stadler [2002] C. M. Reidys and P. F. Stadler, Combinatorial landscapes, SIAM Review 44, 3 (2002).
  • Hallam and Prugel-Bennett [2005] J. Hallam and A. Prugel-Bennett, Large barrier trees for studying search, IEEE Transactions on Evolutionary Computation 9, 385 (2005).
  • Dauphin et al. [2014] Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (2014), arXiv:1406.2572 [cs.LG] .
  • Lee et al. [2019] J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, First-order methods almost always avoid strict saddle points, Mathematical Programming 176, 311 (2019).
  • Dattani [2019] N. Dattani, Quadratization in discrete optimization and quantum mechanics (2019), arXiv:1901.04405 [quant-ph] .
  • Schrijver [2000] A. Schrijver, A combinatorial algorithm minimizing submodular functions in strongly polynomial time, Journal of Combinatorial Theory, Series B 80, 346 (2000).
  • Boros and Gruber [2014] E. Boros and A. Gruber, On quadratization of pseudo-boolean functions (2014), arXiv:1404.6538 [math.OC] .
  • Rosenberg [1975] I. G. Rosenberg, Reduction of bivalent maximization to the quadratic case, Cahiers Centre Études Recherche Opér. 17 (1975).
  • Biamonte [2008] J. D. Biamonte, Nonperturbative k𝑘\mathit{k}italic_k-body to two-body commuting conversion hamiltonians and embedding problem instances into ising spins, Phys. Rev. A 77, 052331 (2008).
  • Aadit et al. [2022] N. A. Aadit, A. Grimaldi, M. Carpentieri, L. Theogarajan, J. M. Martinis, G. Finocchio, and K. Y. Camsari, Massively parallel probabilistic computing with sparse ising machines, Nature Electronics 5, 460 (2022).
  • Kolmogorov and Zabin [2004] V. Kolmogorov and R. Zabin, What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 147 (2004).
  • Freedman and Drineas [2005] D. Freedman and P. Drineas, Energy minimization via graph cuts: settling what is possible, in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2 (2005) pp. 939–946 vol. 2.
  • Landau et al. [2004] D. P. Landau, S.-H. Tsai, and M. Exler, A new approach to Monte Carlo simulations in statistical physics: Wang-Landau sampling, American Journal of Physics 72, 1294 (2004).
  • Liang [2005] F. Liang, A generalized wang-landau algorithm for monte carlo computation, Journal of the American Statistical Association 100, 1311 (2005).
  • Tang and Zhou [2012] W. Tang and Q. Zhou, Finding multiple minimum-energy conformations of the hydrophobic-polar protein model via multidomain sampling, Phys. Rev. E 86, 031909 (2012).
  • Barbu and Zhu [2020] A. Barbu and S.-C. Zhu, Monte Carlo Methods (Springer Singapore, 2020).
  • Hartmann [2000] A. K. Hartmann, A new method for analysing ground-state landscapes: ballistic search, Journal of Physics A: Mathematical and General 33, 657 (2000).
  • Mann and Hartmann [2010] A. Mann and A. K. Hartmann, Numerical solution-space analysis of satisfiability problems, Phys. Rev. E 82, 056702 (2010).
  • Denchev et al. [2016] V. S. Denchev, S. Boixo, S. V. Isakov, N. Ding, R. Babbush, V. Smelyanskiy, J. Martinis, and H. Neven, What is the computational value of finite-range tunneling?, Phys. Rev. X 6, 031015 (2016).
  • Note [1] This valid state may not be unique, i.e. there could be degeneracy in the auxiliary variable space 𝐲𝐲\mathbf{y}bold_y for each fixed 𝐱𝐱\mathbf{x}bold_x. In this case we consider all such configurations and take the minimum of the computed effective QUBO barrier.
  • Note [2] Time-to-solution 99%percent9999\%99 % (TTS9999{}_{99}start_FLOATSUBSCRIPT 99 end_FLOATSUBSCRIPT) is the time needed for a stochastic solver to reach a solution at least once with probability p=0.99𝑝0.99p=0.99italic_p = 0.99.
  • Hoos and Stützle [2000] H. Hoos and T. Stützle, Satlib: An online resource for research on sat (2000) pp. 283–292.
  • Frank et al. [1997] J. Frank, P. Cheeseman, and J. Stutz, When gravity fails: Local search topology (1997), arXiv:cs/9712101 [cs.AI] .
  • [63] P. Purdom and A. Sabry, Cnf generator for factoring problems.
  • Mosca and Verschoor [2022] M. Mosca and S. R. Verschoor, Factoring semi-primes with (quantum) sat-solvers, Scientific Reports 12, 7982 (2022).
  • Houdayer [2001] J. Houdayer, A cluster monte carlo algorithm for 2-dimensional spin glasses, The European Physical Journal B - Condensed Matter and Complex Systems 22, 479 (2001).
  • Lecun et al. [2006] Y. Lecun, S. Chopra, and R. Hadsell, A tutorial on energy-based learning (2006).
  • Hopfield [1982] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities., Proceedings of the National Academy of Sciences 79, 2554 (1982)https://www.pnas.org/doi/pdf/10.1073/pnas.79.8.2554 .
  • Krotov and Hopfield [2016] D. Krotov and J. J. Hopfield, Dense associative memory for pattern recognition (2016), arXiv:1606.01164 [cs.NE] .
  • Mohseni et al. [2023] M. Mohseni, M. M. Rams, S. V. Isakov, D. Eppens, S. Pielawa, J. Strumpfer, S. Boixo, and H. Neven, Sampling diverse near-optimal solutions via algorithmic quantum annealing, Phys. Rev. E 108, 065303 (2023).
  • Jülich Supercomputing Centre [2021] Jülich Supercomputing Centre, JURECA: Data Centric and Booster Modules implementing the Modular Supercomputing Architecture at Jülich Supercomputing Centre, Journal of large-scale research facilities 710.17815/jlsrf-7-182 (2021).
  • Babbush et al. [2013] R. Babbush, B. O’Gorman, and A. Aspuru-Guzik, Resource efficient gadgets for compiling adiabatic quantum optimization problems, Annalen der Physik 525, 877 (2013).
  • Note [3] Finding a minimal set of pairs of variables to be substituted is in general an NP-Hard problem.
  • [73] pele: Python energy landscape explorer.