Modern SAT solvers: fast, neat and underused (part 3 of N)

In the previous two parts (1, 2) we used a SAT solver as a black box that we feed input into, and it will (usually quickly) spit out an answer. In this part, we will look at how SAT solvers work and what heuristics and other tricks they employ in their quest for performance.

Approaches to SAT solving

Modern SAT solvers fall into one of two groups: local search based solvers and
Conflict Driven Clause Learning (CDCL) based solvers. This post will concern itself with the latter, for two simple reasons: first, most of my experience is with CDCL solvers; second, local-search based solvers are rarely used in practice.

There are two main reasons for local search based SAT solvers' dearth of practical usage:

  1. They are often not complete (they might not find a solution even if it exists)
  2. They are usually slower than the deterministic CDCL solvers

They do however have their uses, e.g. when solving the MaxSAT[1] problem, and they have some interesting theoretical properties[2].

CDCL solvers

The CDCL solvers are an evolution of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, which itself is a reasonably simple[3] improvement over the naive backtracking algorithm. CDCL is both complete (will answer "SAT" if a solution exists) and sound (it will not answer "SAT" for an unsatisfiable formula).

I think that the best way to explain how CDCL works is to start with a naive backtracking algorithm and then show how the DPLL and CDCL algorithms improve upon it.

Simple backtracking

A (very) naive backtracking algorithm could work as follows:

  1. Pick a variable without an assigned truth value. If there are none, return SAT.
  2. Assign it a truth-value (true/false).
  3. Check if all clauses in our formula are still potentially satisfiable.
    • If they are, go to 1.
    • If they are not satisfiable, go to 2 and pick the other truth-value.
    • If they are not satisfiable, and both truth-values have been tried, backtrack.
    • If there is nowhere to backtrack, return UNSAT.

This algorithm is obviously both complete and sound. It is also very inefficient, so let's start improving it.
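
Before we do, here is a minimal sketch of what the naive algorithm could look like in code. The representation is an assumption of this sketch rather than something taken from a real solver: a literal is a signed integer, a clause is a vector of literals, and every clause is re-checked after every assignment, exactly as described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

// A literal is a non-zero int: +v means "variable v", -v means "not v".
using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// -1 = unassigned, 0 = false, 1 = true (index 0 is unused).
using Assignment = std::vector<int>;

// A clause is still potentially satisfiable if it has a satisfied literal
// or at least one unassigned literal.
bool clause_possible(Clause const& clause, Assignment const& values) {
    for (int lit : clause) {
        int value = values[std::abs(lit)];
        if (value == -1) return true;                  // still open
        if ((lit > 0) == (value == 1)) return true;    // satisfied
    }
    return false;                                      // every literal is false
}

bool all_clauses_possible(Formula const& formula, Assignment const& values) {
    for (auto const& clause : formula) {
        if (!clause_possible(clause, values)) return false;
    }
    return true;
}

// Steps 1-3 of the naive algorithm: pick the next unassigned variable,
// try both truth values, re-check every clause, backtrack on failure.
bool solve(Formula const& formula, Assignment& values, std::size_t var) {
    if (var == values.size()) return true;             // every variable assigned: SAT
    for (int value : {1, 0}) {
        values[var] = value;
        if (all_clauses_possible(formula, values) &&
            solve(formula, values, var + 1)) {
            return true;
        }
    }
    values[var] = -1;                                  // undo and backtrack
    return false;
}

int main() {
    // (x1 or not x2) and (not x1 or x3) and (x2 or x3), over 3 variables.
    Formula formula = {{1, -2}, {-1, 3}, {2, 3}};
    Assignment values(3 + 1, -1);
    std::cout << (solve(formula, values, 1) ? "SAT\n" : "UNSAT\n");
}
```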

The first improvement we will make is to speed up the check for unsatisfiable clauses in step 3, but to do so we need to introduce two new concepts, positive literal and negative literal. A literal is positive if it evaluates to true under its variable's current truth value, and negative otherwise. As an example, $\neg x$ is a positive literal when variable $x$ is set to false, and a negative literal when variable $x$ is set to true.

The trick we will use to speed up the check for unsatisfiable clauses is to instead update the state of our clauses based on variable assignments. This means that after step 2 we will take all clauses that contain a literal of the variable selected in step 1, and update them accordingly. If they contain a positive literal, they are satisfied, and we can remove them from further consideration completely. If they contain a negative literal, they cannot be satisfied using this variable, and we can remove that literal from them.

If removing the negative literals creates an empty clause, then the clause is unsatisfiable under the current assignment, and we need to backtrack.

The improved backtracking algorithm can thus be described as:

  1. Pick a variable without an assigned truth value. If there are none, return SAT.
  2. Assign it a truth-value (true/false).
  3. Remove all clauses with positive literals of the variable assignment.
  4. Remove all negative literals of the variable assignment.
  5. Check if an empty clause was created.
    • If it was, try the other truth-value or backtrack.
    • If it was not, go to 1.

DPLL algorithm

Given the implementation above, it can be seen that if step 4 creates a clause consisting of a single literal (called a unit clause), we are provided with extra information. Specifically, it provides us with an assignment for the variable of the literal inside the unit clause, because the only way to satisfy a unit clause is to make the literal inside it positive. We can then also apply steps 3 and 4 for this forced assignment, potentially creating new unit clauses in the process. This is called unit propagation.

Another insight we could have is that if, at any point, all literals of a variable have the same polarity, that is, they are either all negated or none of them are, we can effectively remove that variable and all clauses that contain a literal of that variable[4]. This is called pure literal elimination.

By adding these two tricks to our backtracking solver, we have reimplemented a DPLL solver[5]:

  1. Pick a variable without an assigned truth value. If there are none, return SAT.
  2. Assign it a truth-value (true/false).
  3. Remove all clauses with positive literals of the variable assignment.
  4. Remove all negative literals of the variable assignment.
  5. Keep performing unit propagation and pure literal elimination while possible.
  6. Check if an empty clause was created.
    • If it was, try the other truth-value or backtrack.
    • If it was not, go to 1.

Obviously, the order in which variables are picked in step 1, and which truth-values are attempted first in step 2, has a significant impact on the solver's runtime, and we will get to heuristics for these later.
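
To make step 5 more concrete, here is a minimal sketch of the unit propagation and pure literal elimination loop, using the clause-simplification representation from the improved backtracking algorithm above. The function and type names are mine, not any particular solver's; an empty clause left behind by the simplification is the caller's signal to backtrack.

```cpp
#include <cstdlib>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

using Clause = std::vector<int>;     // a literal is a signed int: +v / -v
using Formula = std::vector<Clause>;

// Steps 3 and 4: drop clauses satisfied by the assignment, erase the literal
// that just became false from the rest. An empty clause left behind signals
// a conflict that the caller has to detect.
void assign(Formula& formula, int literal) {
    Formula next;
    for (auto const& clause : formula) {
        bool satisfied = false;
        Clause reduced;
        for (int lit : clause) {
            if (lit == literal) { satisfied = true; break; }
            if (lit != -literal) reduced.push_back(lit);
        }
        if (!satisfied) next.push_back(std::move(reduced));
    }
    formula = std::move(next);
}

// Step 5: keep performing unit propagation and pure literal elimination
// while either of them still finds something to do.
void simplify(Formula& formula, std::vector<int>& forced) {
    bool changed = true;
    while (changed) {
        changed = false;
        // Unit propagation: a clause with a single literal forces that literal.
        for (auto const& clause : formula) {
            if (clause.size() == 1) {
                forced.push_back(clause[0]);
                assign(formula, clause[0]);
                changed = true;
                break;                           // the formula changed, rescan
            }
        }
        if (changed) continue;
        // Pure literal elimination: a variable whose literals all have the same
        // polarity can be assigned so that every clause containing it is satisfied.
        std::map<int, int> polarity;             // var -> +1, -1, or 0 (mixed)
        for (auto const& clause : formula)
            for (int lit : clause) {
                int var = std::abs(lit), sign = lit > 0 ? 1 : -1;
                auto it = polarity.find(var);
                if (it == polarity.end()) polarity[var] = sign;
                else if (it->second != sign) it->second = 0;
            }
        for (auto const& [var, sign] : polarity) {
            if (sign != 0) {
                forced.push_back(sign * var);
                assign(formula, sign * var);
                changed = true;
                break;
            }
        }
    }
}

int main() {
    // (x1) and (not x1 or x2) and (x3 or x4): unit propagation forces x1 and
    // then x2, pure literal elimination then satisfies the last clause via x3.
    Formula formula = {{1}, {-1, 2}, {3, 4}};
    std::vector<int> forced;
    simplify(formula, forced);
    for (int lit : forced) std::cout << lit << ' ';   // prints "1 2 3"
    std::cout << '\n';
}
```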

CDCL algorithm

The difference between a DPLL solver and a CDCL solver is the introduction of something called non-chronological backtracking or backjumping. The idea behind it is that often, a conflict (an empty clause is created) is caused by a variable assignment that happened much sooner than it was detected, and if we could somehow identify when the conflict was caused, we could backtrack several steps at once, without running into the same conflict multiple times.

The implementation of backjumping analyzes the current conflict via something called a conflict clause, finds the earliest variable assignment involved in the conflict and then jumps back to that assignment[6]. The conflict clause is also added to the problem, to avoid revisiting the parts of the search space that were involved in the conflict.

If you want more details about how a CDCL SAT solver works, I recommend looking at the Chaff and MiniSat solvers. Chaff is often seen as the first SAT solver performant enough to be of practical interest, while MiniSat was written in 2003 to show that implementing a state of the art SAT solver can be quite easy, and its later versions are still used as the basis for some current solvers. Specifically, you can look at the paper on Chaff's construction, or at the nitty-gritty of MiniSat's implementation. MiniSat has a very liberal open source licence, and we provide a somewhat cleaned-up version in a GitHub repo.

Performance tricks of CDCL SAT solvers

As a reminder, the Sudoku solver we built in the first post creates SAT instances with 729 variables and ~12k clauses. MiniSat then needs ~1.5 ms to solve them. Similarly, my employer's translation of master-key systems often creates problems with 100k-1M[7] variables and an order of magnitude more clauses. These large instances are then solved within a couple of minutes.

In this section, we will look at the specific tricks used by the CDCL SAT solvers to achieve this excellent performance.

Data structures

Good data structures are the backbone of every performant program, and SAT solvers are no exception. Some of the data structures are generic and well-known outside of solvers, such as custom memory managers that batch allocations and keep data laid out in a cache-friendly manner; others are pretty much specific to CDCL SAT solvers, such as the (2) watched literals scheme.

I will skip over the tricks played with clause representation to ensure it is cache friendly because I want to make this post primarily about SAT specific tricks, and not generic tricks of the trade. This leaves us with the 2 watched literals trick.

Let's backtrack a bit and return to the first algorithm we wrote down for solving SAT. To improve upon it, we proposed a step where we update and evaluate clauses based on the currently assigned variable, so that satisfied clauses are removed, while unsatisfied clauses are shortened. This step is called BCP (Boolean constraint propagation).

The naive implementation is simple: you can create a mapping between each variable and the clauses that contain its literals when you are loading the problem, and then just iterate through all clauses relevant to a variable, either marking them as solved or shortening them. Backtracking is also surprisingly simple, because when you unset a variable, you can restore the related clauses.

However, the naive implementation is also very inefficient. The only time we can propagate a clause is when it is unsatisfied and is down to a single unassigned literal, in which case we can use the unassigned literal for unit propagation. Visiting clauses that are either already satisfied, or are not yet down to a single unassigned literal, is thus a waste of time. This poses a question: how do we keep track of clause status without explicitly updating clauses on every variable assignment change?

2 watched literals

Enter the 2 watched literals algorithm/data structure/trick, pioneered by the Chaff solver[8]. The basic idea is that 2 literals from each clause are selected (watched), and the clause is only visited when one of them would be removed from the clause (in other words, when its variable is assigned the value that makes the literal false). When a clause is visited, one of these four things happens:

  1. All but one literal evaluate to false. This last literal is then unit propagated further.
  2. All literals evaluate to false. This particular assignment is UNSAT, and the solver must backtrack.
  3. At least one literal evaluates to true. Nothing to do.
  4. At least 2 literals are not assigned, and the clause is not satisfied. Remove this clause from the watchlist that brought us here and add it to the watchlist of a different literal.

This trick ensures that we only visit clauses with the potential to become unit clauses, speeding up BCP significantly. It is not without its disadvantages though: using these lazy checks means that we cannot easily answer queries like "how many clauses currently have 3 unassigned literals", because the only thing we know about a clause is that it is either satisfied or it still has at least 2 unassigned literals. Implementing backtracking is also a bit trickier than with the naive implementation of BCP, but not overly so.

Note that we do not restore the original watches when backtracking, we keep the replaced ones. The invariant provided by the watches still holds, and there is no reason to do the extra work.
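
Here is a toy sketch of the watch-list bookkeeping described above, written for clarity rather than speed. It is not MiniSat's or Chaff's actual implementation, and all names are mine: clauses keep their two watched literals at positions 0 and 1, and only the clauses watching the literal that just became false are visited.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

// A literal is a signed int (+v / -v); watches[lit] lists the indices of
// clauses currently watching lit.
struct Solver {
    std::vector<std::vector<int>> clauses;
    std::unordered_map<int, std::vector<std::size_t>> watches;
    std::unordered_map<int, int> assignment;        // var -> 0 (false) or 1 (true)
    std::vector<int> pending;                       // literals queued for propagation

    // 1 = literal true, 0 = literal false, -1 = unassigned.
    int value_of(int lit) const {
        auto it = assignment.find(std::abs(lit));
        if (it == assignment.end()) return -1;
        return (lit > 0) == (it->second == 1) ? 1 : 0;
    }

    void add_clause(std::vector<int> clause) {      // assumes >= 2 distinct literals
        std::size_t index = clauses.size();
        watches[clause[0]].push_back(index);
        watches[clause[1]].push_back(index);
        clauses.push_back(std::move(clause));
    }

    // Make lit true and run BCP; returns false if a conflict was found.
    bool assign_and_propagate(int lit) {
        pending.push_back(lit);
        while (!pending.empty()) {
            int assigned = pending.back();
            pending.pop_back();
            if (value_of(assigned) == 1) continue;      // already true, nothing new
            if (value_of(assigned) == 0) return false;  // forced both ways: conflict
            assignment[std::abs(assigned)] = assigned > 0 ? 1 : 0;

            // Only clauses watching the literal that just became false are visited.
            auto watching = watches[-assigned];         // copy, we rebuild the list below
            std::vector<std::size_t> kept;
            for (std::size_t pos = 0; pos < watching.size(); ++pos) {
                auto& clause = clauses[watching[pos]];
                if (clause[0] == -assigned) std::swap(clause[0], clause[1]);
                if (value_of(clause[0]) == 1) {         // case 3: clause already satisfied
                    kept.push_back(watching[pos]);
                    continue;
                }
                bool replaced = false;                  // case 4: try to find a new watch
                for (std::size_t i = 2; i < clause.size(); ++i) {
                    if (value_of(clause[i]) != 0) {
                        std::swap(clause[1], clause[i]);
                        watches[clause[1]].push_back(watching[pos]);
                        replaced = true;
                        break;
                    }
                }
                if (replaced) continue;
                kept.push_back(watching[pos]);
                if (value_of(clause[0]) == -1) {        // case 1: unit clause, propagate
                    pending.push_back(clause[0]);
                } else {                                // case 2: every literal false
                    kept.insert(kept.end(), watching.begin() + pos + 1, watching.end());
                    watches[-assigned] = std::move(kept);
                    return false;
                }
            }
            watches[-assigned] = std::move(kept);
        }
        return true;
    }
};

int main() {
    Solver solver;
    solver.add_clause({1, 2, 3});
    solver.add_clause({-1, 2});
    solver.add_clause({-2, 3});
    // Making x1 true unit-propagates x2 and then x3 through the watch lists.
    bool ok = solver.assign_and_propagate(1);
    std::cout << (ok ? "no conflict" : "conflict")
              << ", value of x3: " << solver.value_of(3) << '\n';
}
```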

Over time, two more practical optimizations emerged:

  1. Store the literal to propagate directly in the watch for binary clauses

Binary clauses consist of precisely two literals, and we use 2 watches per clause. In other words, once one of the watches is triggered, unit propagation of the other literal is forced. By specializing the path for binary clauses, we can save the time it would take to bring the clause from memory and determine that there is only one literal left, and instead start propagating the assignment directly.

  2. Copy the watched literals into a separate location

This is another optimization based on decreasing cache pressure when working with watches. As it turns out, when a clause is examined because of a watch, the most common result of the visit is option 3, that is, the clause is satisfied and there is nothing to do. Furthermore, the most common reason for the clause being satisfied is the other watched literal.

Copying the watched literals of each clause into a separate location allows us to take advantage of this fact because we can check this case without reading the whole clause from memory, thus alleviating the cache pressure a bit[9].

Clause deletion

In the introduction, I said that the difference between the DPLL and CDCL algorithms is that the latter learns new clauses during its search for a solution. This learning improves the scalability of CDCL significantly[10], but it also carries a potential for a significant slowdown, because each learnt clause takes up valuable memory and increases the time needed for BCP. Given that the upper bound on the number of learnable clauses is $2^{|Vars|}$, storing all of the learnt clauses obviously does not work, and we need to have a strategy for pruning them.

Let's start with a very naive strategy: first in, first out (FIFO). In this strategy, we decide on an upper limit of learnt clauses, and when adding a newly learnt clause would exceed this limit, the oldest learnt clause is deleted. This strategy avoids the problem of the ballooning number of learnt clauses, but at the cost of discarding potentially useful clauses. In fact, we are guaranteed to eventually discard useful clauses, because every learnt clause has a deterministic lifetime.

Let's consider a different naive strategy: random removal. In this strategy, we again decide on an upper limit of learnt clauses, but this time the clause to remove is picked completely at random. This has the advantage that while we might remove a useful clause, we are not guaranteed to remove useful clauses. While this distinction might seem minor, the random pruning strategy usually outperforms the FIFO one.

It is evident that a strategy that just keeps the n best learnt clauses dominates both of these. The problem with this idea is that we need a way to score clauses by their usefulness, and doing so accurately might be even harder than solving the SAT instance in the first place. This means that we need to find a good (quickly computable and accurate) heuristic for scoring a clause's usefulness.

Clause usefulness heuristics

The number of possible heuristics is virtually unlimited, especially if you count various hybrids and small tweaks, but in this post, we will look only at 3 of them. They are:

  1. Clause activity

This heuristic is used by the MiniSat solver. A clause's activity is based on how recently it was used during conflict resolution, and clauses with low activity are removed from the learnt clause database. The idea behind this is that if a clause was involved in conflict resolution, it has helped us find a conflict quicker and thus let us skip over a part of the search space. Conversely, if a clause has not been used for a while, then the slowdown and memory pressure it introduces are probably not worth keeping it around.

  2. Literal Block Distance (LBD)

This heuristic was introduced in a 2009 paper and subsequently implemented in the Glucose solver. It assumes that we have a mapping between variables currently assigned a truth value and the decision level (recursion level) at which they were assigned that value. Given a clause $C$, $LBD(C)$ is then calculated by taking the decision levels of the variables of all literals in that clause and counting how many distinct decision levels are in this set.

The fewer there are, the better, and clauses for which $LBD(C) = 2$ are called glue clauses[11]. The idea is that they glue together variables from a higher (later) decision level (later in the search tree) to a variable[12] from a lower (earlier) decision level, and the solver can then use this clause to set these variables earlier after backtracking. Solvers that use the LBD heuristic for learnt clause management almost always keep all of the glue clauses, and only consider clauses where $LBD(C) \geq 3$ for removal.
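
Computing the LBD of a clause is straightforward. Here is a minimal sketch, assuming a +v/-v literal encoding and a map from variables to their decision levels; both the representation and the names are assumptions of this sketch, not a specific solver's API.

```cpp
#include <cstdlib>
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// LBD(C): the number of distinct decision levels among the variables of C.
// Assumes every variable in the clause is currently assigned (and thus has
// a decision level), which is the setting described above.
int lbd(std::vector<int> const& clause,
        std::unordered_map<int, int> const& decision_level) {
    std::unordered_set<int> levels;
    for (int lit : clause) levels.insert(decision_level.at(std::abs(lit)));
    return static_cast<int>(levels.size());
}

int main() {
    // x1 and x4 were assigned at decision level 2, x7 at level 5.
    std::unordered_map<int, int> decision_level = {{1, 2}, {4, 2}, {7, 5}};
    std::cout << lbd({1, -4, 7}, decision_level) << '\n';   // prints 2: a glue clause
}
```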

  3. Clause size

The third heuristic we will look at is extremely simple, it is just the clause's size, $|C|$, with a lower score being better. To understand the reason why shorter clauses are considered better, consider a unit clause $\neg x_3$. Adding this clause to a problem forces assignment $x_3 := false$, effectively removing about half of the possible search space. The story is similar for binary clauses, e.g. $(x_3 \vee x_5)$ cuts out about $1 \over 4$ of the possible variable assignments, because it forbids assignment $x_3 := false \wedge x_5 := false$. More generally, if we do not consider overlaps, an n-ary clause forbids $1 \over 2^{n}$ possible variable assignments.

Using the clause size metric for learnt clause management is then done by picking a threshold k and splitting the learnt clauses into two groups: those where $|C| \leq k$ and those where $|C| \gt k$. Pruning the learnt clauses then only considers the latter group for removal, where the longer clauses are deleted first. It should also incorporate a bit of randomness, to give useful but long clauses a chance to survive over useless but short(er) ones. The final rating of a clause is then $|C| + random()$.
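
As an illustration, here is one way this reduction could look in code. The threshold, the limit on the number of learnt clauses, and all names are mine; this is a sketch of the policy just described, not any specific solver's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

using Clause = std::vector<int>;

// Clause-size based reduction: clauses with |C| <= k are always kept, the
// rest are rated |C| + random() and the worst-rated ones are deleted until
// we are back under the limit.
void reduce_learnt(std::vector<Clause>& learnt, std::size_t k,
                   std::size_t limit, std::mt19937& rng) {
    if (learnt.size() <= limit) return;
    std::uniform_real_distribution<double> random(0.0, 1.0);
    std::vector<std::pair<double, std::size_t>> rated;      // (rating, index)
    for (std::size_t i = 0; i < learnt.size(); ++i) {
        if (learnt[i].size() > k)                            // short clauses are protected
            rated.push_back({learnt[i].size() + random(rng), i});
    }
    // Highest rating (longest, modulo the random tie-breaker) goes first.
    std::sort(rated.begin(), rated.end(), std::greater<>());

    std::size_t to_delete = learnt.size() - limit;
    std::vector<bool> doomed(learnt.size(), false);
    for (std::size_t i = 0; i < rated.size() && i < to_delete; ++i)
        doomed[rated[i].second] = true;

    std::vector<Clause> kept;
    for (std::size_t i = 0; i < learnt.size(); ++i)
        if (!doomed[i]) kept.push_back(std::move(learnt[i]));
    learnt = std::move(kept);
}

int main() {
    std::mt19937 rng(42);
    std::vector<Clause> learnt = {{1, 2}, {3, -4, 5}, {-1, 2, 6, 7}, {2, -3, 4, -5, 6}};
    reduce_learnt(learnt, /*k=*/2, /*limit=*/2, rng);
    std::cout << "clauses kept: " << learnt.size() << '\n';  // the two longest are deleted
}
```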

Let's compare these 3 heuristics across 3 criteria:

  1. How much is the clause's rating dependent on the path the solver took to learn this clause, or, how dynamic is the heuristic
  2. What does it base its claims of predictive strength on
  3. Real-world performance

Here is a quick overview:

|  | Clause activity | LBD | Clause size |
| --- | --- | --- | --- |
| Dynamicity | High | Some | None[13] |
| Prediction basis | Clause's recent performance | How many decision layers are involved in the clause | Size of the cut the clause makes in the decision tree |
| Performance in the real world | Used in MiniSat to good effect | Used in Glucose to good effect | MiniSat with randomized clause-size management supposedly outperforms Glucose[14] |

There are various reasons why it is hard to compare different strategies for learnt clause management objectively. For starters, they are often implemented in entirely different solvers, so they cannot be compared directly, and even if you port these different strategies to the same solver, the results do not have to generalize. The different solvers might use different learning algorithms, different variable-selection heuristics (see below), different restart strategies and so on, and all of these design considerations must be optimized to work together.

Another reason why generalization is hard is that different heuristics might perform differently on different kinds of instances, and the average user cares about their kind of instances a lot more than some idealized average. After all, my employer uses SAT in our core product, and if we could get 10% more performance for "our kind" of instances at the cost of a 10x slowdown on the other kinds, we would take it in a heartbeat.

So, instead of trying to compare these heuristics objectively, I will leave you with some food for thought:

  • Glucose is seen as better performing than MiniSat, but a lot of that is due to its better performance on unsatisfiable instances, and there are more differences between the two than just the learnt clause management
  • More dynamic heuristics likely need more CPU and RAM for bookkeeping
  • More static heuristics have to evaluate clauses with less instance-specific context
  • As is often disclaimed, "past performance is no guarantee of future results."

Variable heuristics

As was already mentioned, the solver's performance on a specific problem strongly depends on the order in which it assigns values to variables. In other words, a quickly computable heuristic approximating a "good" order is an essential part of each CDCL solver. The first strong heuristic, VSIDS (Variable State Independent Decaying Sum), was also introduced by the Chaff solver, and with minor tweaks has remained the strongest heuristic for many years[15].

Before we look at the heuristics, how they work and what facts about the SAT structure they exploit, it should be noted that they are usually employed in tandem with purely random selection, to balance between the needs to exploit and to explore the search space.

VSIDS

VSIDS works by assigning each variable a score and then picking the variable with the highest score. If there are multiple options with the same score, then the tie has to be broken somehow, but the specifics don't matter too much.

The scores are determined using a simple algorithm:

  • Start with all counters initialized to 0.
  • On conflict, increase the counter of all variables involved in the conflict by $c_{add}$.
  • Every j conflicts, decrease the counter of all variables by multiplying it with coefficient $c_{decay}$.

The values for j, $c_{add}$, and $c_{decay}$ are picked via empirical testing, and for any reasonable implementation of VSIDS, it must always hold that $0 < c_{decay} < 1$.

The original VSIDS implementation in the Chaff solver used to increase only the counters of the literals in the learnt clause, rather than of all the literals involved in the conflict, and it also decreased the counters significantly but rarely ($c_{decay} = 0.5$, $j = 1000$). More modern implementations update more literals and decay the counters less, but more often (e.g. $c_{decay} = 0.95$, $j = 1$). This increases the cost of computing VSIDS but makes the heuristic more responsive to changes in the current search space[16].
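
Here is a minimal sketch of the VSIDS bookkeeping described above, using the "modern" parameters (decay after every conflict, with a decay factor of 0.95). The class and method names are mine; a real solver typically keeps variables in a heap ordered by score instead of scanning them, and avoids the explicit decay loop using the rescaling trick described in the footnote.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

class Vsids {
    std::vector<double> score;                     // one counter per variable
    double c_add = 1.0;
    double c_decay = 0.95;
public:
    explicit Vsids(std::size_t num_vars) : score(num_vars + 1, 0.0) {}

    // Called once per conflict with the variables involved in it.
    void on_conflict(std::vector<int> const& involved_vars) {
        for (int var : involved_vars) score[var] += c_add;
        for (double& s : score) s *= c_decay;      // decay every j = 1 conflicts
    }

    // Pick the unassigned variable with the highest score for branching.
    int pick_branch_var(std::vector<bool> const& assigned) const {
        int best = 0;
        for (std::size_t var = 1; var < score.size(); ++var)
            if (!assigned[var] && (best == 0 || score[var] > score[best]))
                best = static_cast<int>(var);
        return best;                               // 0 means everything is assigned
    }
};

int main() {
    Vsids vsids(5);
    std::vector<bool> assigned(6, false);
    vsids.on_conflict({1, 3});                     // x1 and x3 were in a conflict
    vsids.on_conflict({3, 5});                     // x3 keeps showing up
    std::cout << vsids.pick_branch_var(assigned) << '\n';   // prints 3
}
```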

Over time, various different modifications of VSIDS have emerged, and I want to showcase at least one of them. The paper that introduced this modification called it adaptVSIDS[17], short for adaptative VSIDS. The idea behind it is to dynamically change the value of $c_{decay}$ depending on the quality of the learnt clauses, so that when the learnt clauses are of high quality, the solver stays in the same area of the search space for longer, and if the learnt clauses are of poor quality, it will move out of this area of the search space quicker. Specifically, it will increase $c_{decay}$ when the learnt clauses are good, and decrease it when the learnt clauses are bad, as measured by a clause-quality metric such as LBD mentioned above.

Learning Rate Based heuristics (LRB and friends)

This is a relatively new family of heuristics (~2016 onwards), with a simple motivation: one of the big differences between the old DPLL algorithm and the modern CDCL one is that the latter learns about the structure of the problem it is solving. Thus, optimizing variable selection towards learning more is likely to perform better in the long run.

However, while the idea is simple, the implementation is much less so. Computing a learning rate based heuristic boils down to solving an online reinforcement learning problem, specifically the multi-armed bandit (MAB) problem. Our MAB is also non-stationary, that is, the underlying reward (learning rate) distribution changes during play (solving the problem), which further complicates finding a solution.

In the end, the algorithm applied is in many ways similar to VSIDS, in that a variant of the exponential moving average (EMA) is applied to each variable, and the one with the best score is selected at each step for branching. The important difference is that while VSIDS bumps each variable involved in a conflict by a fixed amount, the LRB heuristic assigns each variable a different payoff based on the amount of learning it has led to[18].
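
Here is a rough sketch of the exponential-moving-average update at the core of such heuristics, following the approximate definition of learning rate given in the footnote. It is only an illustration of the idea, not a faithful reimplementation of the LRB heuristic; the names and the smoothing factor are assumptions of this sketch.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// When a variable is unassigned, its score is moved towards the observed
// reward (its learning rate over the interval it was assigned) by an EMA.
struct LrbScores {
    std::vector<double> score;
    double step = 0.4;                              // EMA smoothing factor (alpha)

    explicit LrbScores(std::size_t num_vars) : score(num_vars + 1, 0.0) {}

    // participated: clauses learnt while the variable was set that contain it,
    // learnt_total: all clauses learnt in that interval.
    void on_unassign(int var, int participated, int learnt_total) {
        double reward = learnt_total > 0
                            ? static_cast<double>(participated) / learnt_total
                            : 0.0;
        score[var] = (1.0 - step) * score[var] + step * reward;
    }
};

int main() {
    LrbScores lrb(3);
    lrb.on_unassign(2, /*participated=*/3, /*learnt_total=*/4);
    std::cout << lrb.score[2] << '\n';              // 0.4 * 0.75 = 0.3
}
```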

Restarts

As mentioned in the first post, solving NP-complete problems (such as SAT) naturally leads to heavy-tailed run times. To deal with this, SAT solvers frequently "restart" their search to avoid the runs that take disproportionately longer. What restarting means here is that the solver unsets all variables and starts the search using a different variable assignment order.

While at first glance it might seem that restarts should be rare and become rarer as the solving has been going on for longer, so that the SAT solver can actually finish solving the problem, the trend has been towards more aggressive (frequent) restarts.

The reason why frequent restarts help solve problems faster is that while the solver does forget all current variable assignments, it does keep some information. Specifically, it keeps learnt clauses, effectively sampling the search space, and it keeps the last assigned truth value of each variable, assigning them the same value the next time they are picked to be assigned[19].

Let's quickly examine 4 different restart strategies.

  1. Fixed restarts

This one is simple: a restart happens every n conflicts, and n does not change during the execution. This strategy is here only for completeness' sake, as it was abandoned long ago because of poor performance.

  2. Geometric restarts

This is another simple strategy, where the time between restarts increases geometrically. What this does in practice is restart often at the start, sampling the search space, and then give the solver enough uninterrupted time to finish the search for a solution.

  3. Luby restarts

In this strategy, the number of conflicts between 2 restarts is based on the Luby sequence. The Luby restart sequence is interesting in that it was proven to be the optimal restart strategy for randomized search algorithms where the runs do not share information. While this is not true for SAT solving, Luby restarts have been quite successful anyway.

The exact description of Luby restarts is that the ith restart happens after \(\DeclareMathOperator{\Luby}{Luby} u \cdot \Luby(i)\) conflicts, where u is a constant and \(\DeclareMathOperator{\Luby}{Luby}\Luby(i)\) is defined as

\begin{align}
\DeclareMathOperator{\Luby}{Luby}
\Luby(i) =
\begin{cases}
2^{k-1} & \text{if } i = 2^{k} - 1 \\
\Luby(i - 2^{k -1} + 1) & \text{if } 2^{k-1} \leq i \lt 2^{k} - 1
\end{cases}
\end{align}

A less exact but more intuitive description of the Luby sequence is that all numbers in it are powers of two, and after a number is seen for the second time, the next number is twice as big. The following are the first 16 numbers in the sequence:

\[
(1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, \ldots)
\]

From the above, we can see that this restart strategy tends towards frequent restarts, but some runs are kept running for much longer, and there is no upper limit on the longest possible time between two restarts.
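
Here is a direct, if naive, transcription of the definition above into code, printing the same 16 numbers; the function name and the use of recursion are my own choices.

```cpp
#include <iostream>

// Luby(i), 1-indexed: find the k for which i falls in [2^(k-1), 2^k - 1] and
// either return 2^(k-1) (when i == 2^k - 1) or recurse into the earlier part
// of the sequence.
long luby(long i) {
    long power = 1;
    while (power < i + 1) power *= 2;               // smallest 2^k >= i + 1
    if (power == i + 1) return power / 2;           // i == 2^k - 1
    return luby(i - power / 2 + 1);                 // 2^(k-1) <= i < 2^k - 1
}

int main() {
    // Prints 1 1 2 1 1 2 4 1 1 2 1 1 2 4 8 1, matching the sequence above.
    for (long i = 1; i <= 16; ++i) std::cout << luby(i) << ' ';
    std::cout << '\n';
    // With a constant u (e.g. u = 100 conflicts), the i-th restart would
    // then happen after u * luby(i) conflicts.
}
```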

  4. Glucose restarts

Glucose restarts were popularized by the Glucose solver, and they are an extremely aggressive, dynamic restart strategy. The idea behind them is that instead of waiting for a fixed number of conflicts, we restart when the last couple of learnt clauses are, on average, bad.

A bit more precisely, if there were at least X conflicts (and thus X learnt clauses) since the last restart, and the average LBD of the last X learnt clauses was at least K times higher than the average LBD of all learnt clauses, it is time for another restart. Parameters X and K can be tweaked to achieve different restart frequencies, and they are usually kept quite small, e.g. Glucose 2.1 uses \(X = 50\) and \(K = 1.25\)[20].
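
Here is a sketch of this trigger in code, keeping a sliding window of the last X LBDs and a running average over all learnt clauses. As the footnote notes, Glucose itself phrases the comparison slightly differently; the names and bookkeeping here are mine.

```cpp
#include <cstddef>
#include <deque>
#include <iostream>

class GlucoseRestarts {
    std::deque<int> recent;                          // LBDs of the last X clauses
    double lbd_sum_recent = 0.0;
    double lbd_sum_total = 0.0;
    long clauses_learnt = 0;
    std::size_t X = 50;
    double K = 1.25;
public:
    // Called once per learnt clause; returns true if the solver should restart.
    bool on_learnt_clause(int lbd) {
        ++clauses_learnt;
        lbd_sum_total += lbd;
        recent.push_back(lbd);
        lbd_sum_recent += lbd;
        if (recent.size() > X) {
            lbd_sum_recent -= recent.front();
            recent.pop_front();
        }
        if (recent.size() < X) return false;         // not enough conflicts since restart
        double recent_avg = lbd_sum_recent / X;
        double total_avg = lbd_sum_total / clauses_learnt;
        if (recent_avg > K * total_avg) {
            recent.clear();                           // start a fresh window after restart
            lbd_sum_recent = 0.0;
            return true;
        }
        return false;
    }
};

int main() {
    GlucoseRestarts restarts;
    // 100 good (low LBD) clauses, followed by a run of bad (high LBD) ones.
    for (int i = 0; i < 100; ++i) restarts.on_learnt_clause(2);
    int bad_clauses = 1;
    while (!restarts.on_learnt_clause(10)) ++bad_clauses;
    std::cout << "restart after " << bad_clauses << " bad clauses\n";
}
```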

So which restart strategy is the best? The only correct answer is neither, because while Glucose restarts have been very successful in SAT competitions, they are heavily optimized towards handling industrial (real world problems encoded as SAT) unsatisfiable instances, at the expense of being able to find solutions to problems that are actually satisfiable. In a similar vein, Luby restarts heavily favor finding solutions to satisfiable industrial instances, at the expense of finding solutions to problems that are unsatisfiable[21].

In practice, the current state of the art SAT solvers use various hybrids of these techniques, such as switching between periods with Glucose restarts and Luby restarts, where the lengths of the periods increase geometrically, or switching between Glucose restarts and running without any restarts, and so on. There have also been some experiments with using machine learning to learn a restart strategy.

Preprocessing and Inprocessing

The last (but not least) trick I want to cover is preprocessing and inprocessing of the input SAT instance. The motivation for preprocessing is quite simple: the provided encoding of the problem is often less than optimal. No matter the reasons for this, the end result is the same: modern state of the art SAT solvers use various preprocessing and inprocessing techniques.

The difference between preprocessing and inprocessing is straightforward. Preprocessing happens once, before the actual solving starts. Inprocessing occurs more than once because it is interleaved with the actual solving. While it is harder to implement inprocessing than preprocessing, using inprocessing carries 2 advantages:

  • The solver does not have to pay the full processing cost at the start if the problem is easy
  • Learnt clauses can be processed as well

There are too many processing techniques to show them all, so in the interest of keeping this already long post at least somewhat palatable, I will show only two. Specifically, I want to explain self-subsumption (or self-subsuming resolution) and (bounded) variable elimination (BVE), but to explain them, I first have to explain resolution and subsumption.

Let's start with subsumption. Given 2 clauses, A and B, A subsumes B, \(A \subseteq B\), iff every literal from A is also present in B. What this means practically is that A is more restrictive with regard to satisfiability than B, and thus B can be thrown away.

Resolution is an inference rule that, given a set of existing clauses, allows us to create new clauses that do not change the satisfiability of the whole set of clauses, because a new clause is satisfied whenever the clauses it was derived from are satisfied. This is done by taking a pair of clauses that contain complementary literals, removing these complementary literals and splicing the rest of the clauses together. Complementary literals are literals where one is the negation of the other, e.g. \(x_{1}\) and \(\neg x_{1}\) are complementary, while \(x_{1}\) and \(\neg x_{2}\) or \(x_{1}\) and \(x_{1}\) are not, because in the first pair the variables do not match and in the second pair both literals have the same polarity.

This sounds complex, but it really is not. Here is a simple example, where the two clauses above the line are originals, and the clause below the line is the result of resolving them together:

\[
\frac{x_1 \vee \neg x_2, \neg x_1 \vee x_3}{\neg x_2 \vee x_3}
\]

A good way of thinking about how resolution works (and why it is correct) is to think through both of the possible assignments of variable \(x_1\). First, let us consider the case of \(x_1 = true\). In this case, the first original clause is satisfied, and the only way to satisfy the second clause is to assign \(x_3 = true\). This assignment means that the resolvent clause is also satisfied. The second option is to assign \(x_1 = false\). This satisfies the second clause, and to satisfy the first one as well, we need to assign \(x_2 = false\). This assignment also means that the resolvent clause is satisfied.

With this knowledge in hand, we can look at self-subsumption. Given 2 clauses, A and B, and their resolvent R, A is self-subsumed by B iff \( R \subseteq A \) (A is subsumed by R). This means that we can replace A with R, in effect shortening A by one literal.

As an example, take \((x_1 \vee x_2 \vee \neg x_3)\) as clause A and \((\neg x_1 \vee \neg x_3 )\) as clause B. The resolvent of these two clauses is \((x_2 \vee \neg x_3)\), which subsumes A. This means that A is self-subsumed by B.
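
With clauses represented as sets of signed literals, subsumption, resolution and the self-subsumption check each take only a few lines. This is a sketch with names of my choosing, not a preprocessor's actual implementation.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <set>

using Clause = std::set<int>;       // signed literals: +v / -v

// A subsumes B iff every literal of A also appears in B.
bool subsumes(Clause const& a, Clause const& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Resolve a and b on variable var: drop the complementary pair, merge the rest.
Clause resolve(Clause const& a, Clause const& b, int var) {
    Clause result;
    for (int lit : a) if (std::abs(lit) != var) result.insert(lit);
    for (int lit : b) if (std::abs(lit) != var) result.insert(lit);
    return result;
}

// A is self-subsumed by B (w.r.t. var) if their resolvent subsumes A,
// in which case A can be replaced by the shorter resolvent.
bool self_subsumed(Clause const& a, Clause const& b, int var) {
    return subsumes(resolve(a, b, var), a);
}

int main() {
    Clause a = {1, 2, -3};          // (x1 or x2 or not x3)
    Clause b = {-1, -3};            // (not x1 or not x3)
    Clause r = resolve(a, b, 1);    // (x2 or not x3)
    std::cout << "resolvent subsumes A: " << subsumes(r, a) << '\n';      // prints 1
    std::cout << "A self-subsumed by B: " << self_subsumed(a, b, 1) << '\n';  // prints 1
}
```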

(Bounded) variable elimination (BVE) is also simple. If we want to remove a specific variable x from a set of clauses, all we have to do is split all clauses containing that particular variable into two groups: one with all clauses where the variable's literal has positive polarity, and one with all clauses where the variable's literal has negative polarity. If we then resolve each clause from the first group with each clause from the second group, we get a (potentially large) set of resolvents without x. If we then replace the original clauses with the resolvents, we will have removed x from the original set of clauses, without changing the satisfiability of the set as a whole.

Unlike self-subsumption, which will always simplify the SAT instance, variable elimination might make it harder. The reason is that it trades a variable for clauses, which might be beneficial, but does not have to be. This leads to the idea of bounded variable elimination, where a variable is only eliminated if the resulting number of clauses is bounded in some way, e.g. in the total number of added clauses[22], or the size of resulting clauses.
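
Here is a sketch of bounded variable elimination over the same clause representation as above. The specific bound (do not increase the number of clauses) follows the SatElite-style rule mentioned in the footnote; everything else, including the names, is an illustrative assumption.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <set>
#include <utility>
#include <vector>

using Clause = std::set<int>;       // signed literals: +v / -v

// Resolve two clauses on var (assumes they contain var with opposite polarity).
Clause resolve(Clause const& a, Clause const& b, int var) {
    Clause result;
    for (int lit : a) if (std::abs(lit) != var) result.insert(lit);
    for (int lit : b) if (std::abs(lit) != var) result.insert(lit);
    return result;
}

// A clause is tautological if it contains both polarities of some variable.
bool tautological(Clause const& clause) {
    return std::any_of(clause.begin(), clause.end(),
                       [&](int lit) { return clause.count(-lit) > 0; });
}

// Eliminate var only if the non-tautological resolvents do not outnumber the
// clauses being removed, i.e. the total number of clauses does not grow.
bool try_eliminate(std::vector<Clause>& formula, int var) {
    std::vector<Clause> positive, negative, rest;
    for (auto const& clause : formula) {
        if (clause.count(var)) positive.push_back(clause);
        else if (clause.count(-var)) negative.push_back(clause);
        else rest.push_back(clause);
    }
    std::vector<Clause> resolvents;
    for (auto const& p : positive)
        for (auto const& n : negative) {
            Clause r = resolve(p, n, var);
            if (!tautological(r)) resolvents.push_back(std::move(r));
        }
    if (resolvents.size() > positive.size() + negative.size()) return false;
    rest.insert(rest.end(), resolvents.begin(), resolvents.end());
    formula = std::move(rest);
    return true;
}

int main() {
    // (x1 or x2) and (x1 or x3) and (not x1 or x4): eliminating x1 yields
    // (x2 or x4) and (x3 or x4).
    std::vector<Clause> formula = {{1, 2}, {1, 3}, {-1, 4}};
    std::cout << "eliminated: " << try_eliminate(formula, 1)
              << ", clauses left: " << formula.size() << '\n';   // 1, 2
}
```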


That's it for part 3, but not for this series, because I still have at least two more posts planned, one of which will again be theoretical.



  1. A simple explanation of the MaxSAT problem is that you have to find how many clauses of an unsatisfiable SAT problem can be satisfied at once. ↩︎

  2. Determinizing a local-search algorithm has proven that the upper bound on the algorithmic complexity of solving a generic CNF-SAT with n variables and m clauses is \[\mathcal{O}\left(2^{n\left(1 - \frac{1}{\ln{\frac{m}{n}} + \mathcal{O}\left(\ln{\ln{m}}\right)}\right)}\right)\] You can improve this significantly if you limit yourself to 3-SAT (SAT where every clause consists of exactly 3 literals), to just \(\mathcal{O}(1.473^{n})\). ↩︎

  3. At least conceptually. Implementing DPLL/CDCL in a performant manner can be quite complex. ↩︎

  4. All literals of such a variable become positive if any of them does. Therefore, all clauses containing that variable can be trivially satisfied and thus removed. ↩︎

  5. In other words, we have reached the state of the art in the 1960s. ↩︎

  6. This part is very hand-wavy because the exact details of conflict analysis are complex and beyond the scope of this post. ↩︎

  7. The largest problem we've had a SAT solver successfully solve contained ~300M variables and almost 1 billion clauses. Using MiniSat and our know-how, it took 15 minutes to solve it. ↩︎

  8. The (z)Chaff solver was a significant upgrade on the capabilities and performance of SAT solvers, and is often considered as the first CDCL SAT solver with "modern" performance. It also pioneered at least 2 techniques that are still used with only minor modifications, (2-)Watched Literals and VSIDS heuristic. ↩︎

  9. This is because a single cache-line can hold these summaries for 8 clauses, as opposed to \( \leq 2 \) clauses. ↩︎

  10. The specific improvement depends on the problem, but for most problems, CDCL can deal with an order (or two) of magnitude more variables than plain DPLL. ↩︎

  11. Given the underlying learning mechanisms, LBD score of 2 is the lowest, and thus the best, that can be achieved when ranking a learnt clause. ↩︎

  12. If the conflict resolution and clause learning are done in a specific way, the learnt clause can be guaranteed to always have only one variable from the lower decision level. ↩︎

  13. As long as we assume that the random number generation is independent for each clause. ↩︎

  14. Said Jabbour, Jerry Lonlac, Lakhdar Sais and Yakoub Salhi, Revisiting the Learned Clauses Database Reduction Strategies, CoRR, 2014 ↩︎

  15. In fact, variants of the VSIDS heuristic function might still be the dominant heuristic. Recently there has been a paper on using the current understanding of SAT problem structure to design a new heuristic function, and the result was very promising, as the new heuristic dominated VSIDS when implemented in MiniSat. However, I have not kept up with it so it might not have panned out. ↩︎

  16. In practice, VSIDS updates are not as expensive as they would appear, because we can get the same results if, instead of decaying the counters of all variables, we keep multiplying \(c_{add}\) by \(\frac{1}{c_{decay}}\). In a sufficiently large problem, this approach will eventually lead to an overflow, but that can also be solved by periodically renormalizing the counters and \(c_{add}\). ↩︎

  17. There is a reason why naming things is one of the three "two hard things in CS". ↩︎

  18. The exact definition of learning rate (LR) of a variable is intentionally left ambiguous. The very rough answer is that it is the ratio of learnt clauses that contained a given variable to the total number of clauses learnt when that variable was set. ↩︎

  19. The general version of this idea is called phase saving and is surprisingly powerful. The motivation behind it is that if a variable was given a specific assignment via BCP, then there is likely a good reason for that assignment and thus it should be reused. ↩︎

  20. Strictly speaking, this is not actually true. Glucose 2.1 uses \(K = 0.8\) because it multiplies the average LBD of the last X clauses by K. I think multiplying the average LBD of all clauses is more intuitive and \(K = 1.25\) then has the same effect. ↩︎

  21. This is also the reason why we currently use MiniSat in our product. The absolute majority of our problems are satisfiable, and Glucose is badly outperformed by the venerable MiniSat. We have also tested CryptoMiniSat, but at the time it contained a bug that made some variable insertion patterns quadratic. I helped upstream to fix it, but we have not had the time to measure its performance again. ↩︎

  22. The 2005 SatElite SAT preprocessor only applies variable elimination if it does not add any clauses. This is less restrictive than it sounds, because the resolvents are often tautological, that is, they contain literals of the same variable in both polarities, and tautological clauses are always satisfied. ↩︎