*Without talking about Gustafson's law and the assumptions both make.*

Let me preface this by saying there is nothing wrong with Amdahl's law^{[1]} *itself*. After all, it is just an observation of the result of applying simple math.

But I believe there is a lot wrong with its current use *in pedagogy* around parallel programming^{[2]}.

Amdahl's law is a formula to calculate the *speedup* of a task from using more computing resources, e.g. multiple threads (assuming perfect parallelization). Formally, it is written as

\[

S(n) = \frac{1}{(1-p) + \frac{p}{n}}

\]

where

- \(n\) is the number of threads you have
- \(p\) is the fraction of runtime taken up by parallelizable code

There are two important things to notice about the formula.

- The result is the improvement in **latency** of the task, that is, how much less real time you have to wait from start to finish.
- For large \(n\), the result approaches \(\frac{1}{1 - p}\).

Informally we can also generalize it a bit:

Optimizing a single part of code cannot provide overall improvements larger than the fraction of time initially taken by the optimized part.

We expect experienced programmers to understand the underlying principle naturally: the colder a section of code is, the less payoff there is in optimizing it. This is because if a section of code takes up 1% of the overall runtime, you will, at most, knock 1% off the overall runtime by optimizing it.

My issue with Amdahl's law when talking about multithreading is simple. Amdahl's law is used to explain the limit of parallelizing code through a large number of threads. If the parallelizable code takes up 95% of the runtime before parallelization, even perfect parallelization across 1000s of cores won't decrease the runtime more than 20x (5% of the original time).
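A quick sketch makes these numbers concrete. The function below (the name is mine) just evaluates the formula:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup for parallel fraction p on n threads."""
    return 1 / ((1 - p) + p / n)

# With p = 0.95, throwing more threads at the task quickly stops helping:
print(round(amdahl_speedup(0.95, 8), 1))      # 5.9
print(round(amdahl_speedup(0.95, 1000), 1))   # 19.6
print(round(amdahl_speedup(0.95, 10**6), 1))  # 20.0 -- the 1/(1-p) ceiling
```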

This limit is correct, but further context is necessary so that students do not leave with the idea that parallelizing past a few cores is pointless. The issue is that Amdahl's law makes an important assumption about the problem, and students who have just been exposed to a bunch of new complex ideas (concurrency, synchronization, etc.) are unlikely to realize what that assumption is. Can you figure out the assumption?

Amdahl's law assumes that there is always a fixed amount of work.

This assumption holds for a lot of important software.

Take `grep` as a simple example. No matter how quickly you can search for a pattern in a single^{[3]} file, you will always have to wait the time it takes to transfer the bytes from the drive^{[4]} to the memory so you can read and match them.

Filters in Photoshop are another example. Most of the time, they will be applied to a single image, sometimes to a well-defined batch of images. No matter how fast/parallelized the computational part of the filter becomes, there will always be non-parallelizable steps, like dealing with IO.

Gustafson's law talks about the *slowdown* from forcing parallel code to be run on a serial computer.

\[

S(n) = (1 - p) + p \cdot n

\]

as with Amdahl's law,

- \(n\) is the number of threads you have
- \(p\) is the fraction of runtime taken up by parallelizable code

Notice that there are no diminishing returns on the parallelism level applied to the task. Where does this difference come from? From different basic assumptions about the shape of the work. Can you figure out the assumption underlying Gustafson's law?

Gustafson's law assumes a fixed amount of time to do the work.

This assumption also holds for a lot of important software.

For example, models for weather forecasting are massively parallel but must finish running within a set amount of time^{[5]}. So when we get more parallel resources, we can run more granular^{[6]} models or larger model cohorts, improving our ability to accurately forecast short-term weather.

Various other physics simulations have the same property, where we can saturate an arbitrary number of cores^{[7]} to achieve better and more precise results, but we need to get the results within some predetermined amount of time.

But the tasks do not have to be inherently parallel for the software to obey Gustafson's law. The work that uses up the extra resources can be orthogonal to the work that is always done; e.g. an old word processor running on an old machine might not even have the resources available to run a simple spell check, while a new one can do online machine learning on your writing style to tell you when your voice has changed.

Both Amdahl's and Gustafson's laws are correct while holding very different positions about the efficiency of parallelism. This difference comes from their contradictory assumptions about the properties of the problem our code is solving, and thus at most, one of them applies to our code.
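To see the contrast in numbers, here is Gustafson's formula as a small sketch (the function name is mine):

```python
def gustafson_speedup(p, n):
    """Gustafson's law: in a fixed time budget, n threads complete the
    parallel fraction p of the work n times over."""
    return (1 - p) + p * n

# With p = 0.95, where Amdahl's law caps the speedup at 20x, the scaled
# speedup keeps growing linearly with the thread count:
print(round(gustafson_speedup(0.95, 100), 2))   # 95.05
print(round(gustafson_speedup(0.95, 1000), 2))  # 950.05
```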

Somehow it became common to show students just Amdahl's law as part of multithreading courses without explaining the underlying assumption it makes. But showing your students just Amdahl's law^{[8]} without explaining it is doing them a huge disservice, so, please:

**stop talking about Amdahl's law without also explaining Gustafson's law**

If you do not have the time to do both, skip showing either.

*Quick terminology note; Keeping the problem size constant while increasing the number of cores is called strong scaling. Increasing the number of cores and the size of the problem is called weak scaling.*

1. The fact that it is called a "law" annoys me a bit. There is nothing factually wrong with Amdahl's law though^{[9]}. ↩︎
2. Amdahl's law is perfectly applicable to single-threaded programming, but I have never seen it brought up in that context. I find this interesting because we teach people about its implications, but we never name the rule as such. ↩︎
3. Arguably, as the implementation of your grep-like utility becomes faster, users will use it to search through more files. A common usage pattern is to look for the file containing a specific pattern, and once users realize that they can get results in a reasonable time even when they are lazy with prefiltering the files, they will get lazy. ↩︎
4. Assuming a cold FS cache. ↩︎
5. If the weather model does not finish in time, we won't get the results (prediction) before the weather is actually happening. And once the weather is happening, well, we can just observe it for much more accurate data. ↩︎
6. Higher granularity means that the "cells" used to simplify the simulations become smaller, which in turn means that there will be more of them, and thus there is more work to do. ↩︎
7. To quote a friend working on physics simulations in a field I won't pretend to understand: "If it got me 10x cores, I would turn Hindu". ↩︎
8. Obviously, this is also true for showing just Gustafson's law. But I have never met someone who said somebody showed them just Gustafson's law without further explanation. I have only seen this happen with Amdahl's law. ↩︎
9. *Technically*, second-order effects can violate Amdahl's law in some specific cases^{[10]}, but that requires code explicitly crafted to do so. Let's assume we are talking about plain old boring programs instead. ↩︎
10. The simplest way to violate Amdahl's law is through CPU cache effects on microbenchmark results. If an optimization of one part reduces how much i/d cache it uses, that can speed up the non-optimized parts by not evicting them from the CPU cache. ↩︎

In the previous two parts (1, 2) we used a SAT solver as a black box that we feed input into, and that (usually quickly) spits out an answer. In this part, we will look at how SAT solvers work and what heuristics and other tricks they employ in their quest for performance.

Modern SAT solvers fall into one of two groups: local search based solvers and *Conflict Driven Clause Learning* (CDCL) based solvers. This post will concern itself with the latter, for two simple reasons: most of my experience is with CDCL solvers, and local search based solvers are rarely used in practice.

There are two main reasons for local search based SAT solvers' dearth of practical usage:

- They are often not *complete* (they might not find a solution even if it exists)
- They are usually slower than the deterministic CDCL solvers

They do however have their uses, e.g. when solving the *MaxSAT*^{[1]} problem, and they have some interesting theoretical properties^{[2]}.

The CDCL solvers are an evolution of the *Davis-Putnam-Logemann-Loveland* (DPLL) algorithm, which itself is a reasonably simple^{[3]} improvement over the naive backtracking algorithm. CDCL is both *complete* (will answer "SAT" if a solution exists) and *sound* (it will not answer "SAT" for an unsatisfiable formula).

I think that the best way to explain how CDCL works is to start with a naive backtracking algorithm and then show how the DPLL and CDCL algorithms improve upon it.

A (very) naive backtracking algorithm could work as follows:

1. Pick a variable without an assigned truth value. If there are none, return SAT.
2. Assign it a truth-value (true/false).
3. Check if all clauses in our formula are still potentially satisfiable.
    - If they are, go to 1.
    - If they are not, go to 2 and pick the other truth-value.
    - If they are not, and both truth-values have been tried, backtrack.
    - If there is nowhere to backtrack, return UNSAT.

This algorithm is obviously both *complete* and *sound*. It is also very inefficient, so let's start improving it.
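Before improving it, here is the naive algorithm as a short Python sketch (all names are mine; the encoding is DIMACS-style: a clause is a list of non-zero integers, where `n` stands for variable *n* and `-n` for its negation):

```python
from itertools import chain

def solve_naive(clauses, assignment=None):
    """Naive backtracking SAT solver. Returns a satisfying assignment
    (variable -> bool) or None for UNSAT."""
    if assignment is None:
        assignment = {}
    # Step 3: a clause is unsatisfiable iff all its literals are
    # already assigned and evaluate to false.
    for clause in clauses:
        if all(abs(l) in assignment and assignment[abs(l)] != (l > 0)
               for l in clause):
            return None
    # Step 1: pick an unassigned variable; if none remain, we are done.
    variables = {abs(l) for l in chain.from_iterable(clauses)}
    free = variables - assignment.keys()
    if not free:
        return assignment
    var = min(free)
    # Step 2: try both truth-values, backtracking on failure.
    for value in (True, False):
        result = solve_naive(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

For example, `solve_naive([[1, 2], [-1], [-2, 3]])` finds the assignment `{1: False, 2: True, 3: True}`, while `solve_naive([[1], [-1]])` returns `None`.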

The first improvement we will make is to speed up the check for unsatisfiable clauses in step 3, but we need to introduce two new concepts to do so, *positive literal* and *negative literal*. A literal is *positive* if it evaluates to true given its variable's truth value and *negative* otherwise. As an example, $\neg x$ is a *positive* literal when variable $x$ is set to false, and a *negative* literal when variable $x$ is set to true.

The trick we will use to speed up the check for unsatisfiable clauses is to instead update the state of our clauses based on variable assignment. This means that after step 2 we will take all clauses that contain a literal of the variable selected in step 1, and update them accordingly. If they contain a positive literal, they are satisfied, and we can remove them from further consideration completely. If they contain a negative literal, they cannot be satisfied using this variable, and we can remove the literal from them.

If removing the negative literals creates an empty clause, then the clause is unsatisfiable under the current assignment, and we need to backtrack.

The improved backtracking algorithm can thus be described as:

1. Pick a variable without an assigned truth value. If there are none, return SAT.
2. Assign it a truth-value (true/false).
3. Remove all clauses with positive literals of the variable assignment.
4. Remove all negative literals of the variable assignment.
5. Check if an empty clause was created.
    - If it was, try the other truth-value or backtrack.
    - If it was not, go to 1.

Given the implementation above, it can be seen that if step 4 creates a clause consisting of a single literal (called a *unit clause*), we are provided with extra information. Specifically, it provides us with an assignment for the variable of the literal inside the unit clause, because the only way to satisfy a unit clause is to make the literal inside positive. We can then also apply steps 3 and 4 to this forced assignment, potentially creating new unit clauses in the process. This is called *unit propagation*.

Another insight we could have is that if at any point, all literals of a variable have the same polarity, that is, they are either all negated or not, we can effectively remove that variable and all clauses that contain a literal of that variable^{[4]}. This is called *pure literal elimination*.

By adding these two tricks to our backtracking solver, we have reimplemented a DPLL solver^{[5]}:

1. Pick a variable without an assigned truth value. If there are none, return SAT.
2. Assign it a truth-value (true/false).
3. Remove all clauses with positive literals of the variable assignment.
4. Remove all negative literals of the variable assignment.
5. Keep performing unit propagation and pure literal elimination while possible.
6. Check if an empty clause was created.
    - If it was, try the other truth-value or backtrack.
    - If it was not, go to 1.
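The algorithm above can be sketched in Python as well, using the same DIMACS-style clause encoding as before (all names are mine, and the branching trick of injecting a unit clause is a simplification, not how production solvers implement it):

```python
def dpll(clauses, assignment=None):
    """DPLL sketch: backtracking plus unit propagation and pure
    literal elimination. Returns an assignment dict or None (UNSAT)."""
    if assignment is None:
        assignment = {}
    clauses = [list(c) for c in clauses]

    def assign(lit):
        # Satisfied clauses are dropped, negative literals are removed.
        nonlocal clauses
        assignment[abs(lit)] = lit > 0
        clauses = [[l for l in c if l != -lit] for c in clauses
                   if lit not in c]

    while True:
        # Unit propagation: a unit clause forces its literal to be true.
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is not None:
            assign(unit)
            continue
        # Pure literal elimination: a literal whose negation never
        # appears can simply be satisfied.
        literals = {l for c in clauses for l in c}
        pure = next((l for l in literals if -l not in literals), None)
        if pure is not None:
            assign(pure)
            continue
        break

    if any(not c for c in clauses):
        return None          # empty clause: conflict, backtrack
    if not clauses:
        return assignment    # every clause satisfied
    var = abs(clauses[0][0])
    for lit in (var, -var):  # branch by injecting a unit clause
        result = dpll(clauses + [[lit]], dict(assignment))
        if result is not None:
            return result
    return None
```

For instance, `dpll([[1, 2], [-1], [-2, 3]])` returns `{1: False, 2: True, 3: True}` without ever branching, purely through unit propagation.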

Obviously, the order in which variables are picked in step 1, and which truth-values are attempted first in step 2, has a significant impact on the solver's runtime, and we will get to heuristics for these later.

The difference between a DPLL solver and a CDCL solver is the introduction of something called *non-chronological backtracking* or *backjumping*. The idea behind it is that often, a conflict (an empty clause is created) is caused by a variable assignment that happened much sooner than it was detected, and if we could somehow identify when the conflict was caused, we could backtrack several steps at once, without running into the same conflict multiple times.

The implementation of backjumping analyzes the current conflict via something called *conflict clause*, finds out the earliest variable assignment involved in the conflict and then jumps back to that assignment^{[6]}. The conflict clause is also added to the problem, to avoid revisiting the parts of the search space that were involved in the conflict.

If you want more details about how a CDCL SAT solver works, I recommend looking at the Chaff and MiniSat solvers. Chaff is often seen as the first SAT solver performant enough to be of practical interest, while MiniSat was written in 2003 to show that implementing a state of the art SAT solver can be quite easy, and its later versions are still used as the basis for some current solvers. Specifically, you can look at the paper on Chaff's construction, or at the nitty-gritty of MiniSat's implementation. MiniSat has a very liberal open source licence, and we provide a somewhat cleaned-up version in a GitHub repo.

As a reminder, the Sudoku solver we built in the first post creates SAT instances with 729 variables and ~12k clauses. MiniSat then needs ~1.5 ms to solve them. Similarly, my employer's translation of master-key systems often creates problems with 100k-1M^{[7]} variables and an order of magnitude more clauses. These large instances are then solved within a couple of minutes.

In this section, we will look at the specific tricks used by the CDCL SAT solvers to achieve this excellent performance.

Good data structures are the backbone of every performant program, and SAT solvers are no exception. Some of the data structures are generic and well-known outside solvers, such as custom memory managers that batch allocations and keep data laid out in a cache-friendly manner. Others are pretty much specific to CDCL SAT solvers, such as the *(2) watched literals* scheme.

I will skip over the tricks played with clause representation to ensure it is cache friendly because I want to make this post primarily about SAT specific tricks, and not generic tricks of the trade. This leaves us with the *2 watched literals* trick.

Let's backtrack a bit, and return to the first algorithm we wrote down for solving SAT. To improve upon it, we proposed a step where we update and evaluate clauses based on the currently assigned variable, so that satisfied clauses are removed, while unsatisfied clauses are shortened. This step is called *BCP* (Boolean constraint propagation).

The naive implementation is simple: you can create a mapping between a variable and each clause that contains its literal when you are loading the problem, and then just iterate through all clauses relevant to a variable, either marking them as solved or shortening them. Backtracking is also surprisingly simple, because when you unset a variable, you can restore the related clauses.

However, the naive implementation is also very inefficient. The only time when we can propagate a clause is when it is unsatisfied and down to a single *unassigned* literal, in which case we can use the unassigned literal for unit propagation. Visiting clauses that are either already satisfied, or not yet down to a single unassigned literal, is thus a waste of time. This poses a question: how do we keep track of clause status without explicitly updating clauses on every variable assignment change?

Enter the 2 watched literals algorithm/data structure/trick, pioneered by the Chaff solver^{[8]}. The basic idea is that 2 literals from each clause are selected (watched), and the clause is only visited when one of them would be removed from the clause (in other words, its variable takes the opposite polarity). When a clause is visited, one of these four things happens:

- All but one literal evaluate to false.
  *This last literal is then unit propagated further.*
- All literals evaluate to false.
  *This particular assignment is UNSAT, and the solver must backtrack.*
- At least one literal evaluates to true.
  *Nothing to do.*
- At least 2 literals are not assigned, and the clause is not satisfied.
  *Remove this clause from the watchlist that brought us here, add it to a watchlist of a different literal.*

This trick ensures that we only visit clauses with the *potential* to become unit clauses, speeding up BCP significantly. It is not without its disadvantages though: using these lazy checks means that we cannot easily answer queries like "how many clauses currently have 3 unassigned literals", because the only thing we know about a clause is that it is either satisfied, or it still has at least 2 unassigned literals. The implementation of backtracking is also a bit trickier than with the naive implementation of BCP updates, but not overly so.

Note that we do not restore the original watches when backtracking, we keep the replaced ones. The invariant provided by the watches still holds, and there is no reason to do the extra work.
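A minimal sketch of the scheme (all names mine; clauses are DIMACS-style integer lists with at least two literals, and the invariant is that the two watched literals are always the first two entries of each clause):

```python
from collections import defaultdict

class WatchedClauses:
    """Sketch of the 2 watched literals scheme. A clause is only
    visited when one of its two watches becomes false."""

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]  # each clause has >= 2 literals
        self.watchlist = defaultdict(list)         # literal -> clause indices
        for i, clause in enumerate(self.clauses):
            for lit in clause[:2]:                 # watch the first two literals
                self.watchlist[lit].append(i)

    def assign(self, lit, assignment):
        """Make `lit` true, visiting only clauses that watch -lit.
        Returns literals forced by unit propagation, or None on conflict."""
        assignment[abs(lit)] = lit > 0
        forced = []
        for i in list(self.watchlist[-lit]):
            clause = self.clauses[i]
            for candidate in clause[2:]:
                value = assignment.get(abs(candidate))
                if value is None or value == (candidate > 0):
                    # Case 4: a non-false replacement watch exists; swap it
                    # into the watched positions and move the clause onto
                    # the candidate's watchlist.
                    j, k = clause.index(candidate), clause.index(-lit)
                    clause[j], clause[k] = clause[k], clause[j]
                    self.watchlist[-lit].remove(i)
                    self.watchlist[candidate].append(i)
                    break
            else:
                other = clause[0] if clause[1] == -lit else clause[1]
                value = assignment.get(abs(other))
                if value is None:
                    forced.append(other)  # case 1: unit, propagate further
                elif value != (other > 0):
                    return None           # case 2: all literals false, conflict
                # case 3: the other watch is true, clause satisfied, nothing to do
        return forced
```

On the single clause `(x1 ∨ x2 ∨ x3)`, assigning `x1 := false` and then `x2 := false` forces `x3`, and further assigning `x3 := false` reports a conflict; a real solver would feed the forced literals back into `assign` recursively.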

Over time, two more practical optimizations emerged:

- Store literals to propagate directly in watch for binary clauses

Binary clauses consist of precisely two literals, and we use 2 watches per clause. In other words, once one of the watches is triggered, it will force unit propagation to happen on the other literal. By specializing the path for binary clauses, we can save the time it would take to bring the clause from memory and determine that there is only one literal left; instead, we can start propagating the assignment directly.

- Copy the watched literals into a separate location

This is another optimization based around decreasing cache pressure when working with watches. As it turns out, when a clause is examined because of a watch, the most common result of the visit is option 3, that is, the clause is satisfied and there is nothing to do. Furthermore, the most common reason for the clause being satisfied is the *other* watched literal.

Copying the watched literals of each clause into a separate location allows us to take advantage of this fact because we can check this case *without* reading the whole clause from memory, thus alleviating the cache pressure a bit^{[9]}.

In the introduction, I said that the difference between the DPLL and CDCL algorithms is that the latter learns new clauses during its search for a solution. This learning improves the scalability of CDCL significantly^{[10]}, but it also carries a potential for a significant slowdown, because each learnt clause takes up valuable memory and increases the time needed for BCP. Given that the upper bound on the number of learnable clauses is $2^{|Vars|}$, storing *all* of the learnt clauses obviously does not work, and we need to have a strategy for pruning them.

Let's start with a very naive strategy, *first in, first out* (FIFO). In this strategy, we decide on an upper limit of learnt clauses, and when adding a newly learnt clause exceeds this limit, the oldest learnt clause is deleted. This strategy avoids the problem with the ballooning number of learnt clauses, but at the cost of discarding potentially useful clauses. In fact, we are guaranteed to discard useful clauses because every learnt clause has a deterministic lifetime.

Let's consider a different naive strategy, *random removal*. In this strategy, we again decide on an upper limit of learnt clauses, but this time the clause to remove is picked completely randomly. This has the advantage that while we *might* remove a useful clause, we are not *guaranteed* that we remove useful clauses. While this distinction might seem minor, the random pruning strategy usually outperforms the FIFO one.

It is evident that a strategy that just keeps the *n* best learnt clauses dominates both of these. The problem with this idea is that we need a way to score clauses on their usefulness, and doing so accurately might be even harder than solving the SAT instance in the first place. This means that we need to find a good (quickly computable and accurate) heuristic that can score a clause's usefulness.

The number of possible heuristics is virtually unlimited, especially if you count various hybrids and small tweaks, but in this post, we will look only at 3 of them. They are:

- Clause activity

This heuristic is used by the MiniSat solver. A clause's activity is based on how recently it was used during conflict resolution, and clauses with low activity are removed from the learnt clause database. The idea behind this is that if a clause was involved in conflict resolution, it has helped us find a conflict quicker and thus let us skip over part of the search space. Conversely, if a clause has not been used for a while, then the slowdown and memory pressure it introduces are probably not worth keeping it around.

- *Literal Block Distance* (LBD)

This heuristic was introduced in a 2009 paper and subsequently implemented in the Glucose solver. This heuristic assumes that we have a mapping between variables currently assigned a truth value and the *decision level* (recursion level) at which they were assigned that value. Given clause $C$, $LBD(C)$ is then calculated by taking the decision levels from variables of all literals in that clause, and counting how many different decision levels were in this set.

The fewer there are, the better, and clauses for which $LBD(C) = 2$ are called *glue clauses*^{[11]}. The idea is that they *glue together* variables from a higher (later) decision level (later in the search tree) to a variable^{[12]} from a lower (earlier) decision level, and the solver can then use this clause to set these variables earlier after backtracking. Solvers that use the LBD heuristic for learnt clause management almost always keep *all* of the glue clauses and only consider clauses where $LBD(C) \geq 3$ for removal.

- Clause size

The third heuristic we will look at is extremely simple: it is just the clause's size, $|C|$, with a lower score being better. To understand why shorter clauses are considered better, consider a unit clause $\neg x_3$. Adding this clause to a problem forces the assignment $x_3 := false$, effectively removing about half of the possible search space. The story is similar for binary clauses; e.g. $(x_3 \vee x_5)$ cuts out about $1 \over 4$ of the possible variable assignments, because it forbids the assignment $x_3 := false \wedge x_5 := false$. More generally, if we do not consider overlaps, an *n*-ary clause forbids $1 \over 2^{n}$ of the possible variable assignments.

Using the clause size metric for learnt clause management is then done by picking a threshold *k* and splitting learnt clauses into two groups, those where $|C| \leq k$ and those where $|C| \gt k$. Pruning the learnt clauses then only considers the latter group for removal, where the longer clauses are deleted first. It should also incorporate a bit of randomness, to give a chance of *not* deleting a useful but long clause in favor of a useless but short(er) one. The final rating of a clause is then $|C| + random()$.
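Both the LBD and the clause size metrics are cheap to compute. A sketch (all names are mine, and the `level` map and pruning parameters are illustrative):

```python
import random

def lbd(clause, level):
    """Literal Block Distance: the number of distinct decision levels
    among the clause's variables. `level` maps a variable to the
    decision level at which it was assigned."""
    return len({level[abs(lit)] for lit in clause})

def size_rating(clause):
    """Clause size rating from the text: |C| plus a random tiebreaker;
    lower is better."""
    return len(clause) + random.random()

def prune_by_size(learnt, k, keep):
    """Keep every clause with |C| <= k; among the longer ones, keep the
    best-rated clauses until `keep` clauses remain in total."""
    short = [c for c in learnt if len(c) <= k]
    rest = sorted((c for c in learnt if len(c) > k), key=size_rating)
    return short + rest[:max(0, keep - len(short))]
```

For example, `lbd([1, -2, 3], {1: 1, 2: 1, 3: 4})` is `2`, since the three variables span only two decision levels.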

Let's compare these 3 heuristics across 3 criteria:

- How much the clause's rating depends on the path the solver took to learn the clause, or, how *dynamic* the heuristic is
- What it bases its claims of predictive strength on
- Real-world performance

Here is a quick overview:

| | Clause activity | LBD | Clause size |
|---|---|---|---|
| Dynamicity | High | Some | None^{[13]} |
| Prediction basis | Clause's recent performance | How many decision layers are involved in the clause | Size of the cut the clause makes in the decision tree |
| Performance in the real world | Used in MiniSat to good effect | Used in Glucose to good effect | MiniSat with randomized clause size as the management supposedly outperforms Glucose^{[14]} |

There are various reasons why it is hard to compare different strategies for learnt clause management objectively. For starters, they are often implemented in entirely different solvers, so they cannot be compared directly, and even if you extract these different strategies and port them to the same solver, the results do not have to generalize. The different solvers might use different learning algorithms, different variable-selection heuristics (see below), different restart strategies and so on, and all of these design considerations must be optimized to work together.

Another reason why generalization is hard is that different heuristics might perform differently on different kinds of instances, and the average user cares about *their* kind of instances a lot more than some idealized average. After all, my employer uses SAT in our core product, and if we could get 10% more performance for "our kind" of instances at the cost of a 10x slowdown on the other kinds, we would take it in a heartbeat.

So, instead of trying to compare these heuristics objectively, I will leave you with some food for thought:

- Glucose is seen as better performing than MiniSat, but a lot of that is due to its better performance on unsatisfiable instances, and there are more differences than just the learnt clause management
- More dynamic heuristics likely need more CPU and RAM for bookkeeping
- More static heuristics have to evaluate clauses with less instance-specific context
- As is often disclaimed, "past performance is no guarantee of future results."

As was already mentioned, the solver's performance on a specific problem strongly depends on the order in which it assigns values to variables. In other words, a quickly-computable heuristic approximating "good" order is an essential part of each CDCL solver. The first strong heuristic, *VSIDS* (Variable State Independent Decaying Sum), was also introduced by the Chaff solver and, with minor tweaks, has remained the strongest heuristic for many years^{[15]}.

Before we look at the heuristics, how they work and what facts about the SAT structure they exploit, it should be noted that they are usually employed in tandem with purely random selection, to balance between the needs to *exploit* and to *explore* the search space.

VSIDS works by assigning each variable a score and then picking the variable with the highest score. If there are multiple options with the same score, then the tie has to be broken somehow, but the specifics don't matter too much.

The scores are determined using a simple algorithm:

- Start with all counters initialized to 0.
- On conflict, increase the counter of all variables involved in the conflict by $c_{add}$.
- Every *j* conflicts, decrease the counter of *all* variables by multiplying it with the coefficient $c_{decay}$.

The values for *j*, $c_{add}$, and $c_{decay}$ are picked via empirical testing, and for any reasonable implementation of VSIDS, it must always hold that $0 < c_{decay} < 1$.

The original VSIDS implementation in the Chaff solver used to increase the counters of only the literals in the learnt clause, rather than of all involved literals, and it also decreased the counters significantly but rarely ($c_{decay} = 0.5$, $j = 1000$). More modern implementations update more literals and decay the counters less, but more often (e.g. $c_{decay} = 0.95$, $j = 1$). This increases the cost of computing VSIDS but makes the heuristic more responsive to changes in the current search space^{[16]}.
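The scheme can be sketched as follows (all names are mine; the defaults match the "modern" parameters just mentioned, $c_{add} = 1$, $c_{decay} = 0.95$, $j = 1$):

```python
from collections import defaultdict

class Vsids:
    """Sketch of VSIDS scoring with decay applied on every conflict."""

    def __init__(self, c_add=1.0, c_decay=0.95):
        self.scores = defaultdict(float)
        self.c_add, self.c_decay = c_add, c_decay

    def on_conflict(self, involved):
        for var in involved:     # bump variables involved in the conflict
            self.scores[var] += self.c_add
        for var in self.scores:  # decay all counters (j = 1)
            self.scores[var] *= self.c_decay

    def pick(self, unassigned):
        """Branch on the highest-scored unassigned variable."""
        return max(unassigned, key=lambda v: self.scores[v])
```

Note that production solvers avoid touching every counter on decay: instead of multiplying all scores by $c_{decay}$, they grow the bump amount by $1 / c_{decay}$ each conflict, which ranks variables identically while doing far less work.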

Over time, various different modifications of VSIDS have emerged, and I want to showcase at least one of them. The paper that introduced this modification called it *adaptVSIDS*^{[17]}, short for adaptive VSIDS. The idea behind it is to dynamically change the value of $c_{decay}$ depending on the quality of the learnt clauses, so that when the learnt clauses are of high quality, the solver stays in the same area of the search space for longer, and if the learnt clauses are of poor quality, it will move out of this area of the search space quicker. Specifically, it will increase $c_{decay}$ when the learnt clauses are good, and decrease it when the learnt clauses are bad, as measured by a clause-quality metric such as LBD mentioned above.

This is a relatively new family of heuristics (~2016 onwards), with a simple motivation: the big difference between the old DPLL algorithm and the modern CDCL one is that the latter learns about the structure of the problem it is solving. Thus, optimizing variable selection towards learning more is likely to perform better in the long run.

However, while the idea is simple, the implementation is much less so. Computing a learning rate based heuristic boils down to solving an online reinforcement learning problem, specifically the Multi-armed bandit (MAB) problem. Our MAB is also non-stationary, that is, the underlying reward (learning rate) distribution changes during play (solving the problem), which further complicates finding the solution.

In the end, the algorithm applied is in many ways similar to VSIDS, in that a variant of the *exponential moving average* (EMA) is applied to each variable, and the one with the best score is selected at each step for branching. The important difference is that while VSIDS bumps each variable involved in a conflict by a fixed amount, the LRB heuristic assigns each variable a different payoff based on the amount of learning it has led to^{[18]}.

As mentioned in the first post, solving NP-complete problems (such as SAT) naturally leads to heavy-tailed run times. To deal with this, SAT solvers frequently "restart" their search to avoid the runs that take disproportionately longer. What restarting here means is that the solver unsets all variables and starts the search using different variable assignment order.

While at first glance it might seem that restarts should be rare, and become rarer as the solving goes on so that the SAT solver can actually finish solving the problem, the trend has been towards more aggressive (frequent) restarts.

The reason why frequent restarts help solve problems faster is that while the solver does forget all current variable assignments, it does keep some information. Specifically, it keeps learnt clauses, effectively sampling the search space, and it keeps the last assigned truth value of each variable, assigning the same value the next time the variable is picked to be assigned^{[19]}.

Let's quickly examine 4 different restart strategies.

- Fixed restarts

This one is simple: a restart happens every *n* conflicts, and *n* does not change during the execution. This strategy is here only for completeness' sake, as it was abandoned long ago because of poor performance.

- Geometric restarts

This is another simple strategy, where the time between restarts increases geometrically. What this does in practice is to restart often at the start, sampling the search space, and then provide the solver enough uninterrupted time to finish the search for a solution.

- Luby restarts

In this strategy, the number of conflicts between 2 restarts is based on the Luby sequence. The Luby restart sequence is interesting in that it was proven to be an optimal restart strategy for randomized search algorithms where the runs *do not* share information. While this is not true for SAT solving, Luby restarts have been quite successful anyway.

The exact description of Luby restarts is that the *i*-th restart happens after \(\DeclareMathOperator{\Luby}{Luby} u \cdot \Luby(i)\) conflicts, where *u* is a constant and \(\DeclareMathOperator{\Luby}{Luby}\Luby(i)\) is defined as

\begin{align}
\DeclareMathOperator{\Luby}{Luby}
\Luby(i) =
\begin{cases}
2^{k-1} & \text{if } i = 2^{k} - 1 \\
\Luby(i - 2^{k-1} + 1) & \text{if } 2^{k-1} \leq i \lt 2^{k} - 1
\end{cases}
\end{align}

A less exact but more intuitive description of the Luby sequence is that all numbers in it are powers of two, and after a number is seen for the second time, the next number is twice as big. The following are the first 16 numbers in the sequence:

\[
(1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, \ldots)
\]

From the above, we can see that this restart strategy tends towards frequent restarts, but some runs are kept running for much longer, and there is no upper limit on the longest possible time between two restarts.
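The recursive definition above translates into code almost verbatim. Here is a minimal sketch using 1-based indexing, matching the definition; production solvers typically use an equivalent iterative formulation for speed:

```cpp
#include <cassert>
#include <cstdint>

// Luby(i) per the definition above: find the smallest k such that
// i <= 2^k - 1. If i == 2^k - 1, the value is 2^(k-1); otherwise
// recurse on i - 2^(k-1) + 1.
std::uint64_t luby(std::uint64_t i) {
    std::uint64_t k = 1;
    while (((std::uint64_t{1} << k) - 1) < i) {
        ++k;
    }
    if (((std::uint64_t{1} << k) - 1) == i) {
        return std::uint64_t{1} << (k - 1);
    }
    return luby(i - (std::uint64_t{1} << (k - 1)) + 1);
}
```

The conflict limit for the *i*-th restart is then `u * luby(i)` for some constant `u`.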

- Glucose restarts

Glucose restarts were popularized by the Glucose solver, and they are an *extremely* aggressive, dynamic restart strategy. The idea behind it is that instead of waiting for a fixed number of conflicts, we restart when the last couple of learnt clauses are, on average, bad.

A bit more precisely, if there were at least *X* conflicts (and thus *X* learnt clauses) since the last restart, and the average LBD of the last *X* learnt clauses was at least *K* times higher than the average LBD of *all* learnt clauses, it is time for another restart. Parameters *X* and *K* can be tweaked to achieve different restart frequencies, and they are usually kept quite small, e.g. Glucose 2.1 uses \(X = 50\) and \(K = 1.25\)^{[20]}.
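To make the mechanism concrete, here is a hedged sketch of such a restart condition. The class name and the exact bookkeeping are illustrative and simplified, not Glucose's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Glucose-style restart condition (illustrative sketch): restart when the
// window of the last X learnt-clause LBDs averages at least K times the
// average LBD of all learnt clauses seen so far.
class glucose_restarts {
public:
    glucose_restarts(std::size_t X, double K) : m_window_size(X), m_K(K) {}

    // Called once per conflict with the LBD of the newly learnt clause.
    // Returns true if the solver should restart now.
    bool on_conflict(unsigned lbd) {
        m_total_sum += lbd;
        ++m_total_count;
        m_window.push_back(lbd);
        m_window_sum += lbd;
        if (m_window.size() > m_window_size) {
            m_window_sum -= m_window.front();
            m_window.pop_front();
        }
        if (m_window.size() < m_window_size) {
            return false;  // not enough conflicts since the last restart
        }
        double recent_avg = m_window_sum / static_cast<double>(m_window.size());
        double global_avg = m_total_sum / static_cast<double>(m_total_count);
        if (recent_avg >= m_K * global_avg) {
            m_window.clear();  // start a fresh window after the restart
            m_window_sum = 0;
            return true;
        }
        return false;
    }

private:
    std::size_t m_window_size;
    double m_K;
    std::deque<unsigned> m_window;
    double m_window_sum = 0;
    double m_total_sum = 0;
    std::size_t m_total_count = 0;
};
```

A solver would call `on_conflict` with each learnt clause's LBD and restart whenever it returns true.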

So which restart strategy is the best? The only correct answer is neither, because while Glucose restarts have been very successful in SAT competitions, they are heavily optimized towards handling industrial (real-world problems encoded as SAT) unsatisfiable instances, at the expense of being able to find solutions to problems that are actually satisfiable. In a similar vein, Luby restarts heavily favor finding solutions to satisfiable industrial instances, at the expense of finding solutions to problems that are unsatisfiable^{[21]}.

In practice, the current state-of-the-art SAT solvers use various hybrids of these techniques, such as switching between periods of Glucose restarts and Luby restarts, where the lengths of the periods increase geometrically, or switching between Glucose restarts and running without any restarts, and so on. There have also been some experiments with using machine learning to learn a restart strategy.

The last (but not least) trick I want to cover is preprocessing and inprocessing of the input SAT instance. The motivation for preprocessing is quite simple: the provided encoding of the problem is often less than optimal. Whatever the reasons for this, the end result is the same: modern state-of-the-art SAT solvers use various preprocessing and inprocessing techniques.

The difference between preprocessing and inprocessing is straightforward. Preprocessing happens once, before the actual solving starts. Inprocessing occurs more than once, because it is interleaved with the actual solving. While inprocessing is harder to implement than preprocessing, it brings two advantages:

- The solver does not have to pay the full processing cost at the start if the problem is easy
- Learnt clauses can be processed as well

There are too many processing techniques to show them all, so in the interest of keeping this already long post at least somewhat palatable, I will show only two. Specifically, I want to explain *self-subsumption* (or *self-subsuming resolution*) and *(bounded) variable elimination* (BVE), but to explain them, I first have to explain *resolution* and *subsumption*.

Let's start with subsumption. Given two clauses, A and B, A *subsumes* B, \(A \subseteq B\), iff every literal from A is also present in B. What this means practically is that A is more restrictive with regard to satisfiability than B, and thus B can be thrown away.
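The subsumption check itself is just a subset test on literals. Here is a sketch, assuming the common DIMACS-style integer representation of literals (variable *x* as `+x`, its negation as `-x`); real solvers avoid re-sorting and prefilter candidate pairs with clause signatures:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A clause as a list of DIMACS-style integer literals.
using clause = std::vector<int>;

// Returns true if clause a subsumes clause b, i.e. every literal of a
// also appears in b. Both clauses are sorted so std::includes applies.
bool subsumes(clause a, clause b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```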

*Resolution* is an inference rule that, given a set of existing clauses, allows us to create new clauses without changing the satisfiability of the whole set, because each new clause is satisfied whenever its precursors are. This is done by taking a pair of clauses that contain complementary literals, removing these complementary literals and splicing the rest of the clauses together. Complementary literals are literals where one is the negation of the other; e.g. \(x_{1}\) and \(\neg x_{1}\) are complementary, while \(x_{1}\) and \(\neg x_{2}\) or \(x_{1}\) and \(x_{1}\) are not: in the first pair the variables do not match, and in the second pair both literals have the same polarity.

This sounds complex, but it really is not. Here is a simple example, where the two clauses *above* the line are originals, and the clause below the line is the result of resolving them together:

\[
\frac{x_1 \vee \neg x_2, \neg x_1 \vee x_3}{\neg x_2 \vee x_3}
\]

A good way of thinking about how resolution works (and why it is correct) is to think through both of the possible assignments of variable \(x_1\). First, let us consider the case of \(x_1 = true\). In this case, the first original clause is satisfied, and the only way to satisfy the second clause is to assign \(x_3 = true\). This assignment means that the resolvent clause is also satisfied. The second option is to assign \(x_1 = false\). This satisfies the second clause, and to satisfy the first one as well, we need to assign \(x_2 = false\). This assignment also means that the resolvent clause is satisfied.
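The splicing step can be sketched in a few lines, again using DIMACS-style integer literals. `resolve` is a hypothetical helper of my own: it assumes the first clause holds the positive literal of `var` and the second the negative one, and it does not handle tautological resolvents:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A clause as a list of DIMACS-style integer literals (+x / -x).
using clause = std::vector<int>;

// Resolve clauses a and b on variable var: drop +var from a and -var
// from b, then splice the remaining literals together (deduplicated).
clause resolve(const clause& a, const clause& b, int var) {
    clause result;
    for (int lit : a) {
        if (lit != var) result.push_back(lit);
    }
    for (int lit : b) {
        if (lit != -var &&
            std::find(result.begin(), result.end(), lit) == result.end()) {
            result.push_back(lit);
        }
    }
    return result;
}
```

Running it on the example above, resolving \((x_1 \vee \neg x_2)\) with \((\neg x_1 \vee x_3)\) on \(x_1\) yields \((\neg x_2 \vee x_3)\).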

With this knowledge in hand, we can look at self-subsumption. Given two clauses, A and B, and their resolvent R, A is *self-subsumed* by B iff \( R \subseteq A \) (A is subsumed by R). This means that we can replace A with R, in effect shortening A by one literal.

As an example, take \((x_1 \vee x_2 \vee \neg x_3)\) as clause A and \((\neg x_1 \vee \neg x_3 )\) as clause B. The resolvent of these two clauses is \((x_2 \vee \neg x_3)\), which subsumes A. This means that A is self-subsumed by B.
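Putting the two notions together, here is a hedged sketch of a self-subsumption check. The helper name is mine; it computes the resolvent of A and B on `var` (with `+var` in A and `-var` in B) and succeeds exactly when that resolvent subsumes A, returning the strengthened clause:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A clause as a list of DIMACS-style integer literals (+x / -x).
using clause = std::vector<int>;

// If A is self-subsumed by B on variable var, write the strengthened
// clause (A without +var) into out and return true. The resolvent
// subsumes A exactly when every literal B contributes is already in A.
bool try_self_subsume(const clause& a, const clause& b, int var, clause& out) {
    clause resolvent;
    for (int lit : a) {
        if (lit != var) resolvent.push_back(lit);
    }
    for (int lit : b) {
        if (lit != -var &&
            std::find(resolvent.begin(), resolvent.end(), lit) == resolvent.end()) {
            return false;  // B contributes a literal not in A: no subsumption
        }
    }
    out = resolvent;
    return true;
}
```

On the example above, with A = \((x_1 \vee x_2 \vee \neg x_3)\) and B = \((\neg x_1 \vee \neg x_3)\), the check succeeds and yields \((x_2 \vee \neg x_3)\).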

*(Bounded) variable elimination* (BVE) is also simple. If we want to remove a specific variable *x* from a set of clauses, all we have to do is split *all* clauses containing that particular variable into two groups: one with all clauses where the variable's literal has positive polarity, and one with all clauses where the variable's literal has negative polarity. If we then resolve each clause from the first group with each clause from the second group, we get a (potentially large) set of resolvents without *x*. If we then replace the original clauses with the resolvents, we have removed *x* from the original set of clauses without changing the satisfiability of the set as a whole.

Unlike self-subsumption, which will always simplify the SAT instance, variable elimination might make it harder. The reason is that it trades a variable for clauses, which might be beneficial, but does not have to be. This leads to the idea of *bounded* variable elimination, where a variable is only eliminated if the resulting number of clauses is *bounded* in some way, e.g. by the total number of added clauses^{[22]}, or by the size of the resulting clauses.
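The bounded check can be sketched as follows. The helper names and the specific bound (the number of non-tautological resolvents must not exceed the number of clauses removed) are illustrative choices of mine, not any particular solver's rule:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A clause as a list of DIMACS-style integer literals (+x / -x).
using clause = std::vector<int>;

// Resolve a (containing +var) with b (containing -var) into out.
// Returns false when the resolvent is tautological (contains l and -l).
bool resolve(const clause& a, const clause& b, int var, clause& out) {
    out.clear();
    for (int lit : a) {
        if (lit != var) out.push_back(lit);
    }
    for (int lit : b) {
        if (lit == -var) continue;
        if (std::find(out.begin(), out.end(), -lit) != out.end()) return false;
        if (std::find(out.begin(), out.end(), lit) == out.end()) out.push_back(lit);
    }
    return true;
}

// Bounded variable elimination sketch: pos/neg hold the clauses with
// +var/-var. Compute all non-tautological resolvents and allow the
// elimination only if it does not grow the clause database.
bool try_eliminate(const std::vector<clause>& pos, const std::vector<clause>& neg,
                   int var, std::vector<clause>& resolvents) {
    resolvents.clear();
    for (const clause& p : pos) {
        for (const clause& n : neg) {
            clause r;
            if (resolve(p, n, var, r)) resolvents.push_back(r);
        }
    }
    return resolvents.size() <= pos.size() + neg.size();
}
```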

*That's it for part 3, but not for this series, because I still have at least two more posts planned, one of which will again be theoretical.*

A simple explanation of the MaxSAT problem is that you have to find the maximum number of clauses of an unsatisfiable SAT problem that *can* be satisfied. ↩︎

Determinizing a local-search algorithm has proven that the upper bound on the algorithmic complexity of solving a generic CNF-SAT with *n* variables and *m* clauses is \(\mathcal{O}\left(2^{n\left(1 - \frac{1}{\ln{\frac{m}{n}} + \mathcal{O}(\ln{\ln{m}})}\right)}\right)\). You can improve this significantly if you limit yourself to 3-SAT (SAT where every clause consists of exactly 3 literals), to just \(\mathcal{O}(1.473^{n})\). ↩︎

At least conceptually. Implementing DPLL/CDCL in a performant manner can be quite complex. ↩︎

All literals of such variable become positive if any of them becomes positive. Therefore, all clauses containing that variable can be trivially satisfied and thus removed. ↩︎

In other words, we have reached the state of the art in the 1960s. ↩︎

This part is very hand-wavy because the exact details of conflict analysis are complex and beyond the scope of this post. ↩︎

The largest problem we've had a SAT solver successfully solve contained ~300M variables and almost 1 billion clauses. Using MiniSat and our know-how, it took 15 minutes to solve it. ↩︎

The (z)Chaff solver was a significant upgrade on the capabilities and performance of SAT solvers, and is often considered the first CDCL SAT solver with "modern" performance. It also pioneered at least two techniques that are still used with only minor modifications: (2-)watched literals and the VSIDS heuristic. ↩︎

This is because a single cache-line can hold these summaries for 8 clauses, as opposed to \( \leq 2 \) clauses. ↩︎

The specific improvement depends on the problem, but for most problems, CDCL can deal with an order (or two) of magnitude more variables than plain DPLL. ↩︎

Given the underlying learning mechanisms, LBD score of 2 is the lowest, and thus the best, that can be achieved when ranking a learnt clause. ↩︎

If the conflict resolution and clause learning are done in a specific way, the learnt clause can be guaranteed to always have only one variable from the lower decision level. ↩︎

As long as we assume that the random number generation is independent for each clause. ↩︎

Said Jabbour, Jerry Lonlac, Lakhdar Sais and Yakoub Salhi, Revisiting the Learned Clauses Database Reduction Strategies, CoRR, 2014 ↩︎

In fact, variants of the VSIDS heuristic function might still be the dominant heuristic. Recently there has been a paper on using the current understanding of SAT problem structure to design a new heuristic function, and the result was very promising, as the new heuristic dominated VSIDS when implemented in MiniSat. However, I have not kept up with it so it might not have panned out. ↩︎

In practice, VSIDS updates are not as expensive as they would appear, because we can get the same results if we, instead of decaying the counters of all variables, keep multiplying \(c_{add}\) by \(1 \over{c_{decay}}\). In a sufficiently large problem, this approach will eventually lead to an overflow, but that can be also solved by periodically renormalizing the counters and \(c_{add}\). ↩︎

There is a reason why naming things is one of the three "two hard things in CS". ↩︎

The exact definition of learning rate (LR) of a variable is intentionally left ambiguous. The very rough answer is that it is the ratio of learnt clauses that contained a given variable to the total number of clauses learnt when that variable was set. ↩︎

The general version of this idea is called *phase saving* and is surprisingly powerful. The motivation behind it is that if a variable was given a specific assignment via BCP, then there is likely a good reason for that assignment, and thus it should be reused. ↩︎

Strictly speaking, this is not actually true. Glucose 2.1 uses \(K = 0.8\) because it multiplies the average LBD of the *last X* clauses by K. I think multiplying the average LBD of *all* clauses is more intuitive, and \(K = 1.25\) then has the same effect. ↩︎

This is also the reason why we currently use MiniSat in our product. The absolute majority of our problems are satisfiable, and Glucose is badly outperformed by the venerable MiniSat. We have also tested CryptoMiniSat, but at the time it contained a bug that made some variable insertion patterns quadratic. I helped upstream fix it, but we have not had time to measure its performance again. ↩︎

The 2005 SatElite SAT preprocessor only applies variable elimination if it does not add any clauses. This is less restrictive than it sounds, because the resolvents are often tautological, that is, they contain literals of the same variable in both polarities, and tautological clauses are always satisfied. ↩︎

The previous post in this series was a quick introduction to the world of SAT and SAT solvers, including a simple example of how we can take a real-world problem and use a SAT solver to solve it. In this post, we will use SAT to solve a harder real-world problem, namely lock-chart solving, sometimes also known as *master-key system* (MKS) solving, and explore some of the more advanced techniques used to efficiently convert problems to SAT.

Before reading further, note that this post will go only over the basics of solving master-key systems and the approach will be to create a simple solver, rather than a production-ready one. If you are interested in all of the gory details of solving master-key systems in the real world, you should also look at:

- Radomír Černoch's dissertation that provides a theoretical framework for talking about different kinds of lock-charts and master-key system solvers
- My own master's thesis that goes over the nitty-gritty details of the production-ready master-key system solver developed by our research group. The solver described within is currently being used by an actual manufacturer of master-key systems.
- An open source master-key system solving research test bed that we develop to help other people test their own approaches and improvements for solving master-key systems.

A master-key system is a set of keys and locks where a key can open more than one lock (and thus a lock can be opened by more than one key). They are often found in business buildings, where the typical employee should have limited access, e.g. to their floor, the kitchen and their own office, but some employees (e.g. maintenance staff) need full access to most rooms on a floor (or in a building).

Before we start trying to solve a master-key system, we should talk about how plain old mechanical locks work, and how a master-key system is specified.

The idea behind mechanical locks is quite old, often dated back to ancient Egypt or even earlier, and while manufacturing has gotten better, the basic idea remains roughly the same. The idea is that the lock contains a *tumbler*, a movable part that prevents the lock from opening. The tumbler should be easy to move using the correct key, but impossible to move using the wrong key and hard to move using lock-picking tools. The exact design of the tumbler varies, e.g. in my country the most common design is the pin tumbler lock, but there are also other tumbler designs, such as the disc tumbler lock or the wafer tumbler lock.

Let's quickly look at a schema of the pin tumbler lock, named for the spring-loaded pins that rest against the inserted keys. The pins are separated into multiple parts by horizontal cuts, shown in this schema using blue and green colour. The right side shows a lock where the cuts in pins are aligned with the shear line, because a compatible key has been inserted, and the left side shows a lock where the cuts in pins are not aligned with the shear line, because an incompatible key has been inserted.

We will also use this schema to define some common terms:

- *(cutting) position* is a position at which the key can be cut. Denoted as $p_{i}$ in the schema above.
- *cutting depth* is a depth to which the key (or lock) is cut. Denoted as $d_{i}$ in the schema above.
- *(key) cutting* is the actual shape of a key. Usually represented as an ordered tuple; the cutting of the key on the left is (2, 2, 1), the cutting of the key on the right is (1, 2, 0).

There are 2 parts to specifying a master-key system:

- A lock-chart, provided by the customer. The lock-chart specifies the number of keys and locks in the system, and the *opens* and *is-blocked* relations between keys and locks.
- A geometry, provided by the manufacturer. The geometry defines the set of possible key cuttings by describing the overall shape of the key and providing a set of constraints on the key.

My preferred depiction of a lock-chart is a simple table, where the black squares denote a (key, lock) pair where the key *opens* a lock, the white squares denote a (key, lock) pair where the key *is-blocked* (or does not open) a lock:

For geometry, we will make a simplifying assumption that all positions have the same number of possible cutting depths and that the only kind of constraint we work with is something we call *gecon* (general constraint). This is not that far from the real world, because most real-world manufacturing constraints can be converted into a polynomial number of gecons, and while most of the geometries in the real world are "jagged" (they have a different number of possible cutting depths in each position), we can use gecons to encode such a geometry inside this simplified framework.

We will represent gecons as a tuple with the same length as there are positions in the geometry, where each element can either be a number or a wildcard, marked as `*`. When represented this way, a gecon can be seen as a forbidden cutting pattern, e.g. the gecon `(*, *, 2, *)` forbids all keys whose cutting depth at the 3rd position is 2.
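Checking whether a cutting matches a gecon is then a simple position-wise comparison. A sketch, using `-1` for the wildcard, the same convention the solver code later in this post uses:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A cutting and a gecon are tuples of the same length; in the gecon,
// -1 stands for the wildcard `*`. A gecon forbids a cutting iff every
// non-wildcard position matches the cutting's depth at that position.
bool matches_gecon(const std::vector<int>& cutting, const std::vector<int>& gecon) {
    for (std::size_t i = 0; i < gecon.size(); ++i) {
        if (gecon[i] != -1 && gecon[i] != cutting[i]) {
            return false;
        }
    }
    return true;
}
```

For example, the gecon `(*, *, 2, *)` matches (and thus forbids) the cutting `(0, 1, 2, 3)` but not `(0, 1, 1, 3)`.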

This is all we need to know about the inner workings of mechanical locks and master-key systems, so we can start working on solving them via conversion to SAT.

Before we start converting our problem to SAT, we need to determine the properties our system should have. In production use, there can be quite a few of them^{[1]}, but luckily most can be translated into gecons, and we will skip over the rest in our simplified example. This means that we end up with 5 properties:

- A key must have *exactly one* cutting depth selected for each position
- A lock must have *at least one* cutting depth selected for each position
- A key's cutting must not match any gecon
- A key must open all locks that the lock-chart specifies it should open
- A key must be blocked in all locks that the lock-chart specifies it should not open

As with the Sudoku example, we will need to decide which properties of the whole system will be modelled via variables and which will be modelled via clauses binding them together. We will start by using 2 groups of variables, $key_{p, d}^{k}$ for keys, and $lock_{p, d}^{l}$ for locks respectively. The meaning of these variables is that if $key_{p, d}^{k}$ is set to "true", then the key $k$ in position $p$ has a cutting depth $d$, and analogously for the $lock$ variables.

With the variables defined, we can start encoding the properties in CNF. The first two are handled the same way as in the Sudoku example:

**Property 1 (A key must have exactly one cutting depth at a position)**

$$
\forall (k, p) \in (keys \times positions): \operatorname{exactly-one}(key_{p, 0}^{k}, key_{p, 1}^{k}, \dots, key_{p, d}^{k})
$$

**Property 2 (A lock must have at least one cutting depth at a position)**

$$
\forall (l, p) \in (locks \times positions): \bigvee_{d \in depths} lock_{p, d}^{l}
$$

**Property 3 (A key's cutting must not match any gecon)**

Formulating this property in a set of CNF clauses is easier if we change how we think about gecons first.

A gecon is a tuple of the same length as there are positions in the geometry, and at each position, the gecon can either contain a wildcard or a specific cutting depth. Because wildcards match any depth, only the positions with specific cutting depth are relevant to reasoning about gecons. In other words, we can also think of gecon as a set of (position, depth) pairs that cannot be present in a key cutting at the same time.

Using this reformulation leads to the following simple clause, saying that at least one of the (position, depth) pairs must not be present in the key.

$$
\forall (k, g) \in (keys \times gecons): \bigvee_{(p, d) \in g} \neg key_{p, d}^{k}
$$

**Property 4 (A key must open all locks that the lock-chart says it should open)**

For a key to open a lock, the pins in the lock need to be cut so that the cuts align with the shear line when the key is inserted. In simpler terms, a key opens a lock when the lock is cut at the same (position, depth) pairs the key is. This leads to a simple translation to a set of binary clauses:

\[
\forall k \in keys,
\forall l \in \operatorname{opened-by}(k):
\bigwedge_{\substack{p \, \in \, positions \\ d \, \in \, depths}}
\left( key_{p, d}^{k} \implies lock_{p, d}^{l} \right)
\]

Because an implication can be converted into a disjunction as $\neg key_{p, d}^{k} \vee lock_{p, d}^{l}$, the produced clauses are trivially convertible to CNF.

**Property 5 (A key is blocked in all locks the lock-chart says it shouldn't open)**

For a key to be blocked in a lock, at least one of the pins in the lock must not be aligned with the shear line. In other words, a key is blocked in a lock when at least one of key's (position, depth) cutting pairs does not have a counterpart in the lock. This can easily be converted into a set of logical formulae:

\[
\forall k \in keys,
\forall l \in \operatorname{blocked-in}(k):
\bigvee_{\substack{p \, \in \, positions \\ d \, \in \, depths}} \left(key_{p, d}^{k} \wedge \neg \, lock_{p, d}^{l}\right)
\]

The problem with this translation is that the produced formulae are not in CNF but rather in DNF, and the naive conversion from DNF to CNF using the distributive law leads to an exponential explosion in the number of clauses. Specifically, a DNF formula consisting of $N$ conjunctions of $L$ literals each converts into $L^N$ clauses of $N$ literals each.

Instead, we have to move from using equivalent transformations, as shown in the previous post, to equisatisfiable transformations.

The Tseytin transformation is a simple algorithm that allows you to transform an arbitrary logic formula into a CNF formula that is equisatisfiable with the original one. The size of the resulting CNF formula is linear in the size of the original formula, but it contains new variables to achieve this.

The basic idea is that if we have a formula that explodes when being converted into CNF, such as $\left(x_{1} \wedge x_{2}\right) \vee \left(x_{3} \wedge x_{4}\right) \vee \left(x_{5} \wedge x_{6}\right)$, then, if we could replace each of the conjunctions with a new variable that would be "true" when the whole conjunction is "true" and vice versa, the conversion to CNF would become trivial: $\left(y_{1} \vee y_{2} \vee y_{3}\right)$.

Tying the new variables to their subexpressions is done using an equivalence, e.g. $y_{1} \iff \left(x_{1} \wedge x_{2} \right)$, but these new clauses also need to be converted to CNF. The first step is to split the logical equivalence into two implications, and then convert those implications into disjunctions, like so:

\[
\begin{align}
%% Step 1 -- the original
y_{1} & \iff \left(x_{1} \wedge x_{2} \right) \\
%% Step 2 -- two implications
\left( y_{1} \implies \left(x_{1} \wedge x_{2} \right)\right)
& \wedge
\left( y_{1} \impliedby \left(x_{1} \wedge x_{2} \right) \right)
\\
%% Step 3 -- implications to negated disjunctions
\left( \neg y_{1} \vee \left(x_{1} \wedge x_{2} \right)\right)
& \wedge
\left( y_{1} \vee \neg (x_{1} \wedge x_{2}) \right)
\\
%% Step 4 -- LHS multiplication
\left( \left( \neg y_{1} \vee x_{1} \right) \wedge \left( \neg y_{1} \vee x_{2} \right)\right)
& \wedge
\left( y_{1} \vee \neg x_{1} \vee \neg x_{2} \right)
\\
%% Step 5 -- Remove unneeded parentheses
\left( \neg y_{1} \vee x_{1} \right) \wedge \left( \neg y_{1} \vee x_{2} \right)
& \wedge
\left( y_{1} \vee \neg x_{1} \vee \neg x_{2} \right)
\end{align}
\]

Using the Tseytin transformation, we can convert the DNF generated by blocking keys in locks into a much smaller set of clauses by defining a new kind of variable, $block_{p, d}^{k, l}$:

\[
\left( key_{p, d}^{k} \wedge \neg lock_{p, d}^{l} \right) \iff block_{p, d}^{k, l}
\]

This definition means that $block_{p, d}^{k, l}$ is "true" when the key $k$ is blocked in lock $l$ at position $p$ and depth $d$, and lets us rewrite the formulation for **property 5** in this way^{[2]}:

\[
\forall k \in keys,
\forall l \in \operatorname{blocked-in}(k):
\bigvee_{\substack{p \, \in \, positions \\ d \, \in \, depths}} block_{p, d}^{k, l}
\]

The model we have created above is a valid logical model for a master-key system. However, some of the clauses in it are redundant; e.g. if we assume that every lock in a lock-chart is opened by at least one key, we can remove the clauses generated by **property 2**. This is because we already force keys to have exactly one cutting depth at each position, so a lock opened by a key will have at least one cutting depth at each position anyway.

We can also define the $block_{p, d}^{k, l}$ variables using a single implication,

\[
block_{p, d}^{k, l} \implies ( key_{p, d}^{k} \wedge \neg \, lock_{p, d}^{l} )
\]

saving one ternary clause per variable. Note the direction: because the $block$ variables occur only positively in the blocking clauses, the reverse implication is not needed for correctness. I will skip providing a proof of this fact because the proof is quite involved. There is also something much more interesting going on, namely that these optimisations might not be optimisations at all. Removing clauses from a problem, and thus "decreasing" the amount of work a SAT solver has to do, does not necessarily decrease its running time, for reasons I will talk about in another post.

There is one more thing to note about the formulation above, specifically that it does not forbid spurious cuts in locks. A spurious cut is a cut that does not correspond to a cut in any of the keys that open the lock. We want to avoid these cuts because they increase the manufacturing costs and decrease the security of the locks. There are two ways to solve this:

- Add a set of clauses that forbid spurious cuts in locks. Formulating them is simple enough, but it does add quite a few new clauses of low value (likely to lengthen the solver's running time).
- Post-process the results to remove spurious cuts. This has linear complexity in regards to the number of opening (key, lock) pairs, which is usually only a small multiple of the total number of keys in a lock-chart.

Because the post-processing option is easy and fast, in the real world, we would pick that one, but we won't use either of these two options in our toy example.
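For illustration, the post-processing variant can be sketched as computing, for each lock, the union of the cuts of the keys that open it; any cut not in this union is spurious and gets dropped. The representation and names are illustrative, not the toy solver's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// A key cutting: the depth at each position.
using cutting = std::vector<int>;

// Given the cuttings of all keys that open a lock, return the set of
// (position, depth) cuts the lock actually needs. Any other cut in the
// lock would be spurious and can be removed.
std::set<std::pair<int, int>>
lock_cuts(const std::vector<cutting>& opening_keys) {
    std::set<std::pair<int, int>> cuts;
    for (const cutting& key : opening_keys) {
        for (std::size_t pos = 0; pos < key.size(); ++pos) {
            cuts.insert({static_cast<int>(pos), key[pos]});
        }
    }
    return cuts;
}
```

The cost is linear in the number of opening (key, lock) pairs, matching the complexity argument above.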

Now that we know how to translate a master-key system to CNF-SAT, it is time to implement a solver for master-key systems in C++^{[3]}. As before, the full code lives in a GitHub repository, and this post will only contain the more interesting and relevant excerpts. Also, before we start writing the solver itself, we need to define its input and output formats.

The chosen formats are mostly picked for their simplicity and the ease of hand-rolling a simple parser for them. For the lock-chart, we will pick the simplest possible textual representation: translating the full lock-chart into `*` for black squares and `.` for white squares. As an example, the lock-chart shown in the "Specifying master-key systems" section would be encoded into this:

```
**.*.......
**..*......
**...*.....
**....*....
*.*....*...
*.*.....*..
*.*......*.
*.*.......*
***********
```

For geometry, we will use a simple, line-oriented format. As an example, a geometry with 3 positions and 6 depths at each position, where the first and the last position are not allowed to share a cutting depth, would be encoded like this:

```
base: 3x6
G: 0, *, 0
G: 1, *, 1
G: 2, *, 2
G: 3, *, 3
G: 4, *, 4
G: 5, *, 5
```

Finally, the output format will also be line-oriented, with one key output per line. The keys will be written in the same order as they appear in the lock-chart, and each key will be output as a comma-separated list of cutting depths, sorted by position, e.g. this output:

```
1,1,1
1,1,2
1,1,3
```

specifies 3 keys, where the first key's cutting has depth 1 at all three positions, the second key's cutting has depth 1 at the first and second positions and depth 2 at the third position, and the third key's cutting has depth 1 at the first and second positions and depth 3 at the third position.

As always, the first thing to do is to figure out how we can address the variables. Unlike with the Sudoku example in the previous post, we won't be computing the variables directly^{[4]}, but rather we will keep a map from variable indices (position, depth and key/lock order) to MiniSat's internal variables, and create new variables on demand. To simplify the code using our mapper, we will also cheat a bit: instead of storing the variables, we will store the appropriate literal in positive polarity:

```
// Inside the solver class:
using indices = std::tuple<size_t, size_t, size_t>;
std::map<indices, Minisat::Lit> m_key_vars;

// Implementation of the variable (literal) accessor for _key_ variables
Minisat::Lit solver::key_lit(size_t position, size_t depth, size_t key) {
    auto indices = std::make_tuple(position, depth, key);
    auto it = m_key_vars.find(indices);
    if (it != m_key_vars.end()) {
        return it->second;
    }
    return m_key_vars[indices] = Minisat::mkLit(m_solver.newVar());
}
```

We will skip over the implementation for $lock$ literals, as it is essentially the same modulo some variable names. What is more interesting is that in the final version of the code, we do not save the $block$ variables. This is because each block variable is used only twice: once when it is defined, and a second time when it is used to enforce the fact that a key doesn't open a lock. Because both of these uses happen at the same time, we never need to return to a previously defined blocking variable and thus do not need to store them.

With the variables ready, we can once again translate the SAT formulation of the problem into C++ code. In our MKS solver, the main work is done by three helper functions, `add_key`, `add_lock` and `add_gecon`, which are responsible for adding the clauses related to a specific key, lock or gecon respectively, so these three functions will be the focus of our investigation.

Let's start with the simplest one, `add_lock`. It is responsible for enforcing that each lock must have at least one cutting depth at each position (**property 2**).

```
void solver::add_lock(size_t lock) {
    for (size_t pos = 0; pos < m_geometry.positions; ++pos) {
        Minisat::vec<Minisat::Lit> literals;
        for (size_t depth = 0; depth < m_geometry.depths; ++depth) {
            literals.push(lock_lit(pos, depth, lock));
        }
        add_clause(literals);
    }
}
```

`add_gecon` is similarly easy, as it also has only one simple responsibility: enforce that no key cutting matches a specific gecon (**property 3**).

```
void solver::add_gecon(size_t gecon) {
    auto const& pattern = m_geometry.gecons[gecon].pattern;
    for (size_t key = 0; key < m_lockchart.keys(); ++key) {
        Minisat::vec<Minisat::Lit> lits;
        for (size_t pos = 0; pos < pattern.size(); ++pos) {
            // -1 is the wildcard marking
            if (pattern[pos] != -1) {
                lits.push(~key_lit(pos, pattern[pos], key));
            }
        }
        add_clause(lits);
    }
}
```

And finally, `add_key` is responsible for ensuring that each key has exactly one cutting depth at each position (**property 1**):

```
void solver::add_key(size_t key) {
    for (size_t pos = 0; pos < m_geometry.positions; ++pos) {
        Minisat::vec<Minisat::Lit> literals;
        for (size_t depth = 0; depth < m_geometry.depths; ++depth) {
            literals.push(key_lit(pos, depth, key));
        }
        exactly_one(literals);
    }
}
```

This leaves two things unimplemented: the *opens* and *is-blocked-in* relations between keys and locks. In our toy solver, these will also be part of `add_key`. The reason for that is a simple implementation detail, specifically that the internal `lockchart` implementation stores a mapping from keys to the locks they open/are blocked in.

This is the implementation of **property 4** (keys can open specific locks):

```
void solver::add_key(size_t key) {
    // ...
    for (auto lock : m_lockchart.opens(key)) {
        for (size_t pos = 0; pos < m_geometry.positions; ++pos) {
            for (size_t depth = 0; depth < m_geometry.depths; ++depth) {
                // key_{p, d} => lock_{p, d} <---> ~key_{p, d} v lock_{p, d}
                add_clause(~key_lit(pos, depth, key), lock_lit(pos, depth, lock));
            }
        }
    }
    // ...
}
```

And this is the implementation of **property 5** (keys are blocked in specific locks):

```
void solver::add_key(size_t key) {
    // ...
    for (auto lock : m_lockchart.blocked_in(key)) {
        Minisat::vec<Minisat::Lit> blocking_lits;
        for (size_t pos = 0; pos < m_geometry.positions; ++pos) {
            for (size_t depth = 0; depth < m_geometry.depths; ++depth) {
                auto block = Minisat::mkLit(m_solver.newVar());
                // block_{p, d} <=> (key_{p, d} && ~lock_{p, d})
                // 1) block_{p, d} => (key_{p, d} && ~lock_{p, d})
                //    ~block_{p, d} v (key_{p, d} && ~lock_{p, d})
                //    (~block_{p, d} v key_{p, d}) && (~block_{p, d} v ~lock_{p, d})
                add_clause(~block, key_lit(pos, depth, key));
                add_clause(~block, ~lock_lit(pos, depth, lock));
                // 2) block_{p, d} <= (key_{p, d} && ~lock_{p, d})
                //    block_{p, d} v ~key_{p, d} v lock_{p, d}
                add_clause(block, ~key_lit(pos, depth, key), lock_lit(pos, depth, lock));
                blocking_lits.push(block);
            }
        }
        add_clause(blocking_lits);
    }
    // ...
}
```

Now, with the solver done, it is time for benchmarks...

Benchmarking will be once more problematic, but for entirely different reasons. Benchmarking the sudoku solver from the previous post was hard because there are example sudokus *everywhere*, but there is no agreed-upon set of representative sudoku puzzles. I solved this by picking a set of 95 supposedly hard (containing only 17 givens) inputs and using those as a reasonable approximation. However, benchmarking the MKS solver has the exact opposite problem: there are *no* non-trivial inputs publicly available.

This doesn't mean there will be no benchmarks, because I do have access to some proprietary inputs thanks to our research partnership. It does mean, however, that I cannot publish them or describe them in too much detail. I can also use only a subset of them, because some of them require features that are not implemented in our toy solver. After further filtering this subset to only use lock-charts that have at least 100 keys, I have 7 inputs across 2 geometries to test our solver with.

Geometry A is interesting in that it is *very* long, with ~30 positions, but relatively shallow, with the shallowest position having only 2 cutting depths and the deepest having ~5. It also contains ~100 gecons. In contrast, geometry B is much closer to being squarish, with ~10 positions and ~10 depths at each position, and it contains ~80 gecons.

For geometry A, there are 2 lockcharts: the smaller one contains ~150 keys and the larger one ~250 keys. For geometry B, there are 5 lockcharts, ranging between ~100 and ~500 keys. We will refer to them in order of increasing size, so problem 1 will be the smallest one.

The measurements were once again taken on a stock i5-6600K @ 3.5 GHz, against binaries compiled with `g++` using the `-O3` and `-DNDEBUG` flags. Each input has been run 10 times, and the median and stddev can be found in the table below.

| Geometry | Problem | Median time to solve (s) | stddev (s) |
|---|---|---|---|
| A | Problem 1 | 23.74 | 0.09 |
| A | Problem 2 | 57.28 | 0.17 |
| B | Problem 1 | 5.37 | 0.01 |
| B | Problem 2 | 5.80 | 0.02 |
| B | Problem 3 | 48.43 | 0.14 |
| B | Problem 4 | 70.55 | 0.13 |
| B | Problem 5 | 394.82 | 9.32 |

As we can see from the table above, our toy solver can solve non-trivial lockcharts and geometries in a reasonable amount of time. However, because there are no public solvers or inputs available, we have no point of comparison. Instead, let me tell you an anecdote from our own research into solving master-key systems.

The original approach our research group picked was to write a specialised solver for the problem, including all the manufacturer-specific constraints. This solver was in development for multiple years, and while it produced correct solutions, it didn't work quite fast enough -- only about 80% of all test inputs were solved within a specific time limit. In other words, things weren't going too well, until one of our colleagues had a fit of inspiration and suggested converting the problem to SAT.

In ~3 months the SAT-based MKS solver went from an idea to having feature parity with the specialised solver, including system integration and supporting vendor-specific constraints. It also performed much better, and the prototype was able to successfully solve ~90% of the inputs within the time limit. Because this approach proved fruitful, the SAT-based solver, along with the underlying concepts, was then developed further in our partnership with Assa Abloy (née FAB), and, as described in my thesis, the solver can now solve lockcharts with ~4k keys inside a reasonable amount of time.

I think this anecdote illustrates my point from the previous article well, in that we were able to quickly create a reasonably performing solver by translating the problem into SAT and using a modern SAT solver. However, translating MKS to SAT does have its limitations^{[5]}, and we are currently working on an open source solver that exploits the structure of the MKS domain to (hopefully) scale to even larger lockcharts.

*This is all for part 2. Part 3 is out and it looks at the internals of modern SAT solvers.*

*Also, a small personal appeal: if you have an in with some key manufacturer, try to convince them to make obsoleted geometries public. Likewise, if you have access to large, real-world, complex lockcharts, see if you can get the rights to make them public.*

One of the more common constraints that cannot be translated into gecons is the desire for 2 key cuttings to differ at more than *X* positions. ↩︎

Some manufacturers require keys to be blocked in a lock by more than one position. This can be implemented by encoding an "at least 2" constraint over the $block_{p, d}^{k, l}$ variables for a given key-lock pair. ↩︎

There are good reasons why our production solver is written in C++, but the reason this toy example is written in C++ as well is just that I am used to it. ↩︎

Because our toy solver always uses rectangular geometry, we could compute the variables on demand just as we did previously. However, the first step towards a production-ready implementation would be to support jagged geometries directly, without gecons, which would make computing the variables directly harder to implement. Implementing variable access via a lazy mapping can also be a useful technique when translating different problems into SAT, so I decided to showcase the implementation here. ↩︎

You should read the links provided at the top of this post for a detailed explanation and discussion, but the short story is that the size of the translated problem is dominated by the $block$ variables, and the number of $block$ variables grows linearly in the number of positions, depths, and (key, lock) pairs that are blocked. Because the number of positions and depths is fixed for a given geometry, and the usual lockchart is mostly empty (blocked), the size of the SAT translation can be said to be quadratic in the number of keys. ↩︎

Before I started doing research for the Intelligent Data Analysis (IDA) group at FEE CTU, I saw SAT solvers as academically interesting but didn't think that they had many practical uses outside of other academic applications. After spending ~1.5 years working with them, I have to say that modern SAT solvers are fast, neat, and criminally underused by the industry.

The Boolean satisfiability problem (SAT) is the problem of deciding whether a formula in boolean logic is satisfiable. A formula is *satisfiable* when at least one interpretation (an assignment of `true` and `false` values to its logical variables) leads to the formula evaluating to `true`. If no such interpretation exists, the formula is *unsatisfiable*.

What makes SAT interesting is that a variant of it was the first problem to be proven NP-complete, which roughly means that a lot of other problems can be translated into SAT in reasonable^{[1]} time, and the solution to this translated problem can be converted back into a solution for the original problem.

As an example, the often-talked-about dependency management problem is also NP-complete and thus translates into SAT^{[2]}^{[3]}, and SAT can in turn be translated into the dependency management problem. The problem our group worked on, generating key and lock cuttings based on a user-provided lock-chart and a manufacturer-specified geometry, is also NP-complete.

I will likely write about master-key systems and our approach towards solving them later, but to keep this post reasonably short, we will instead use Sudoku for practical examples.

These days, SAT almost always refers to CNF-SAT^{[4]}, the boolean satisfiability problem for formulas in conjunctive normal form (CNF). This means that the entire formula is a conjunction (AND) of clauses, with each clause being a disjunction (OR) of literals. Some examples:

- $(A \vee B) \wedge (B \vee C)$
- $(A \vee B) \wedge C$
- $A \vee B$
- $A \wedge C$

There are two ways to pass a formula to a SAT solver: by using a semi-standard file format known as DIMACS, or by using the SAT solver as a library. In real-world applications, I prefer using a SAT solver as a library (e.g. MiniSat for C++), but the DIMACS format lets you prototype your application quickly and test the performance characteristics of different solvers on your problem.

DIMACS is a line-oriented format, consisting of 3 basic types of lines.

- A comment line. Any line that starts with "c" is a comment line.
- A summary line. This line contains information about the kind and size of the problem within the file. A summary line starts with "p", continues with the kind of the problem (in most cases "cnf"), the number of variables, and the number of clauses within the problem. Some DIMACS parsers expect this line to be the first non-comment line, but some parsers can handle files without it.
- A clause line. A clause line consists of space-separated numbers ending with a 0. Each non-zero number denotes a literal, with negative numbers being negative literals of that variable, and 0 being the terminator of the line.

As an example, this formula

$$(A \vee B \vee C) \wedge (\neg A \vee B \vee C) \wedge (A \vee \neg B \vee C) \wedge (A \vee B \vee \neg C)$$

would be converted into DIMACS as

```
c An example formula
c
p cnf 3 4
1 2 3 0
-1 2 3 0
1 -2 3 0
1 2 -3 0
```

MiniSat is a fairly simple and performant SAT solver that also provides a nice C++ interface and we maintain a modernised fork with CMake integration. The C++ interface to MiniSat uses 3 basic vocabulary types:

- `Minisat::Solver` - Implementation of the core solver and its algorithms.
- `Minisat::Var` - Representation of a *variable*.
- `Minisat::Lit` - Representation of a concrete (positive or negative) *literal* of a variable.

The difference between a variable and a literal is that the literal is a concrete "evaluation" of a variable inside a clause. As an example, formula $ (A \vee B \vee \neg C) \wedge (\neg A \vee \neg B) $ contains 3 variables, $A$, $B$ and $C$, but it contains 5 literals, $A$, $\neg A$, $B$, $\neg B$ and $\neg C$.

MiniSat's interface also uses one utility type: `Minisat::vec<T>`, a container similar to `std::vector`, that is used to pass clauses to the solver.

The following example uses MiniSat's C++ API to solve the same formula as we used in the DIMACS example.

```
// main.cpp:
#include <minisat/core/Solver.h>
#include <iostream>

int main() {
    using Minisat::mkLit;
    using Minisat::lbool;

    Minisat::Solver solver;
    // Create variables
    auto A = solver.newVar();
    auto B = solver.newVar();
    auto C = solver.newVar();

    // Create the clauses
    solver.addClause( mkLit(A),  mkLit(B),  mkLit(C));
    solver.addClause(~mkLit(A),  mkLit(B),  mkLit(C));
    solver.addClause( mkLit(A), ~mkLit(B),  mkLit(C));
    solver.addClause( mkLit(A),  mkLit(B), ~mkLit(C));

    // Check for solution and retrieve model if found
    auto sat = solver.solve();
    if (sat) {
        std::clog << "SAT\n"
                  << "Model found:\n";
        std::clog << "A := " << (solver.modelValue(A) == l_True) << '\n';
        std::clog << "B := " << (solver.modelValue(B) == l_True) << '\n';
        std::clog << "C := " << (solver.modelValue(C) == l_True) << '\n';
    } else {
        std::clog << "UNSAT\n";
        return 1;
    }
}
```

Because all of our clauses have length $\le 3$, we can get away with just using the utility overloads that MiniSat provides and don't need to use `Minisat::vec` for the clauses.

We will also need to build the binary. Assuming you have installed our fork of MiniSat (either from GitHub, or from vcpkg), it provides proper CMake integration and writing the CMakeLists.txt is trivial:

```
cmake_minimum_required(VERSION 3.5)
project(minisat-example LANGUAGES CXX)

set(CMAKE_CXX_EXTENSIONS OFF)

find_package(MiniSat 2.2 REQUIRED)

add_executable(minisat-example
    main.cpp
)
target_link_libraries(minisat-example MiniSat::libminisat)
```

Building the example and running it should^{[5]} give you this output:

```
SAT
Model found:
A := 0
B := 1
C := 1
```

Very few problems are naturally expressed as a logical formula in CNF, which means that after formulating a problem as SAT, we often still need to convert it into CNF. The most basic approach is to create an equivalent formula using De Morgan's laws, the distributive law, and the fact that two negations cancel out. This approach has two advantages: one, it is simple and obviously correct; two, it does not introduce new variables. However, it has one significant disadvantage: some formulas lead to an exponentially large CNF conversion.
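
As an illustration of the exponential case, consider distributing a disjunction of $n$ conjunctions:

\[
(a_1 \wedge b_1) \vee (a_2 \wedge b_2) \vee \dots \vee (a_n \wedge b_n)
\]

Repeated application of the distributive law turns this into a conjunction with one clause for every way of choosing either $a_i$ or $b_i$ from each disjunct:

\[
\bigwedge_{(c_1, \ldots, c_n),\; c_i \in \{a_i, b_i\}} (c_1 \vee c_2 \vee \dots \vee c_n)
\]

that is, $2^n$ clauses from a formula containing only $2n$ literals.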

The other approach is to create an equisatisfiable^{[6]} CNF formula, but we won't be covering that in this post.

Some common equivalencies are in the table below.

| Original clause | Equivalent clause |
|---|---|
| $ \neg \neg \alpha $ | $ \alpha $ |
| $ \alpha \implies \beta $ | $ \neg \alpha \vee \beta $ |
| $ \neg ( \alpha \wedge \beta ) $ | $ \neg \alpha \vee \neg \beta $ |
| $ \neg ( \neg \alpha \wedge \neg \beta ) $ | $ \alpha \vee \beta $ |
| $ (\alpha \wedge \beta) \vee \gamma $ | $ (\alpha \vee \gamma) \wedge (\beta \vee \gamma) $ |
| $ \alpha \iff \beta $ | $ \left(\alpha \implies \beta \right) \wedge \left(\alpha \impliedby \beta \right) $ |

Obviously, you don't have to remember these identities, but knowing at least some of them (implication) is much faster than deriving them from the truth tables every time.
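
As a quick worked example combining two rows of the table, converting $A \implies (B \wedge C)$ into CNF takes just two steps:

\[
A \implies (B \wedge C) \;\equiv\; \neg A \vee (B \wedge C) \;\equiv\; (\neg A \vee B) \wedge (\neg A \vee C)
\]

first the implication identity, then the distributive law.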

With this background, we can now look at how we can solve a real-world problem, such as Sudoku, using a SAT solver. First, we will go over the rules of Sudoku and how they can be translated into (CNF-)SAT. Then we will go over implementing this converter in C++ and benchmarking the results.

Sudoku is a puzzle where you need to place numbers 1-9 into a 9x9 grid consisting of 9 3x3 boxes^{[7]}, following these rules:

- Each row contains all of the numbers 1-9
- Each column contains all of the numbers 1-9
- Each of the 3x3 boxes contains all of the numbers 1-9

We can also restate these rules as:

- No row contains duplicate numbers
- No column contains duplicate numbers
- No 3x3 box contains duplicate numbers

Because these rules alone wouldn't make for a good puzzle, some of the positions are prefilled by the puzzle setter, and a proper Sudoku puzzle should have only one possible solution.

The first step in translating a problem to SAT is to decide what should be modelled via variables, and what should be modelled via clauses over these variables. With Sudoku, the natural thing to do is to model positions as variables, but in SAT, each variable can only have 2 values: "true" and "false". This means we cannot just assign each position a variable; instead, we have to assign each combination of position *and* value a variable. We will denote such a variable as $x_{r, c}^{v}$. If variable $x_{r, c}^{v}$ is set to "true", then the number in the $r$-th row and $c$-th column is $v$.

Using this notation, let's translate the Sudoku rules from the previous section into SAT.

**Rule 1 (No row contains duplicate numbers)**

\[
\forall (r, v) \in (rows \times values): \operatorname{exactly-one}(x_{r, 0}^{v}, x_{r, 1}^{v}, \dots, x_{r, 8}^{v})
\]

In plain words, for each row and each value, we want exactly one column in that row to have that value. We do that by using a helper called $\operatorname{exactly-one}$, which generates a set of clauses ensuring that exactly *one* of the passed-in literals evaluates to "true".

We will see how to define $\operatorname{exactly-one}$ later. First, we will translate the other Sudoku rules into these pseudo-boolean formulas.

**Rule 2 (No column contains duplicate numbers)**

\[
\forall (c, v) \in (columns \times values): \operatorname{exactly-one}(x_{0, c}^{v}, x_{1, c}^{v}, \dots, x_{8, c}^{v})
\]

This works analogously to Rule 1: for each column and each value, we want exactly one row to have that value.

**Rule 3 (None of the 3x3 boxes contain duplicate numbers)**

This rule works exactly the same way as the first two: for each box and each value, we want exactly one position in the box to have that value.

\[
\forall (box, value) \in (boxes \times values): \operatorname{exactly-one}(\operatorname{literals-in-box}(box, value))
\]

Even though these 3 rules might seem sufficient at first glance, they are in fact *not* enough to properly specify Sudoku. This is because a solution like this one:

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | x | . | . | . | . | . | . | . | . |
| 1 | . | . | . | x | . | . | . | . | . |
| 2 | . | . | . | . | . | . | x | . | . |
| 3 | . | x | . | . | . | . | . | . | . |
| 4 | . | . | . | . | x | . | . | . | . |
| 5 | . | . | . | . | . | . | . | x | . |
| 6 | . | . | x | . | . | . | . | . | . |
| 7 | . | . | . | . | . | x | . | . | . |
| 8 | . | . | . | . | . | . | . | . | x |

where "x" denotes a position where *all* variables are set to "true" and "." denotes a position where *no* variables are set to "true", is valid according to the rules as given to the SAT solver.

This is because we operate with an unstated assumption: that each position can contain only one number. This makes perfect sense to a human, but the SAT solver does not understand the meaning of the variables; it only sees the clauses it was given. We can fix this simply by adding one more rule:

**Rule 4 (Each position contains exactly one number)**

\[
\forall (r, c) \in (rows \times columns): \operatorname{exactly-one}(x_{r, c}^{1}, x_{r, c}^{2}, \ldots, x_{r, c}^{9})
\]

With this rule in place, we have fully translated the rules of Sudoku into SAT and can use a SAT solver to help us solve sudoku instances. But before we do that, we need to define the $\operatorname{exactly-one}$ helper our description of Sudoku relies on.

There is no way to encode numeric constraints natively in boolean logic, but often you can decompose these constraints into simpler terms and encode these. Many research papers have been written about the efficient encoding of specific constraints and other gadgets, but in this post, we only need to deal with the most common and one of the simplest constraints possible: "exactly one of this set of literals has to evaluate to true". Everyone who works with SAT often can write this constraint from memory, but we will derive it from first principles because it shows how more complex constraints can be constructed.

The first step is to decompose the constraint $x == n$ into two parts: $x \ge n$ and $x \le n$, or for our specific case, $x \ge 1$ and $x \le 1$, or, translated into the world of SAT, at least 1 literal has to evaluate to "true", and no more than 1 literal can evaluate to "true". Forcing at *least one* literal to be true is easy, just place all of them into one large disjunction:

\[
\bigvee_{lit \in Literals} lit
\]

Forcing *at most* one literal to be true seems harder, but with a slight restating of the logic, it also becomes quite easy. At most one literal is true when *there is no pair of literals where both of the literals are true at the same time*.

\[
\neg \bigvee_{i \in 1..n,\, j \in 1..n,\, i \neq j} (lit_{i} \wedge lit_{j})
\]

This formula says exactly that, but it has one problem: it is not in CNF. To convert it into CNF, we have to use some of the identities from the previous section. Specifically, the fact that negating a disjunction leads to a conjunction of negations, and negating a conjunction leads to a disjunction of negations. Using these, we get the following CNF formula:

\[
\bigwedge_{i \in 1..n,\, j \in 1..n,\, i \neq j} (\neg lit_{i} \vee \neg lit_{j})
\]

We can also use the fact that both conjunction and disjunction are commutative (there is no difference between $x \wedge y$ and $y \wedge x$) to halve the number of clauses we create, as we only need to consider literal pairs where $i < j$.

Now that we know how to limit the number of "true" literals to both *at least* 1 and *at most* 1, limiting the number of "true" literals to *exactly* 1 is trivial; just apply both constraints at the same time via conjunction.

Now that we know how to describe Sudoku as a set of boolean clauses in CNF, we can implement C++ code that uses this knowledge to solve arbitrary Sudoku puzzles. For brevity, this post will only contain relevant excerpts, but you can find the entire resulting code on GitHub^{[8]}.

The first thing we need to solve is addressing variables, specifically converting a (row, column, value) triple into a specific number that represents it in the SAT solver. Because Sudoku is highly regular, we can get away with linearizing the three dimensions into one and get the number of the variable corresponding to $x_{r, c}^{v}$ as `r * 9 * 9 + c * 9 + v`. We can also use the fact that `Minisat::Var` is just a plain `int` numbered from 0 to avoid storing the variables at all, because we can always compute the corresponding variable on demand:

```
Minisat::Var toVar(int row, int column, int value) {
    return row * columns * values + column * values + value;
}
```

Now we can quickly retrieve the SAT variable for any (row, column, value) triplet. But before we can use the variables, they need to be allocated inside the SAT solver:

```
void Solver::init_variables() {
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < columns; ++c) {
            for (int v = 0; v < values; ++v) {
                static_cast<void>(solver.newVar());
            }
        }
    }
}
```
```

With the variables allocated, we can start converting the SAT version of Sudoku rules into C++ code.

**Rule 1 (No row contains duplicate numbers)**

```
for (int row = 0; row < rows; ++row) {
    for (int value = 0; value < values; ++value) {
        Minisat::vec<Minisat::Lit> literals;
        for (int column = 0; column < columns; ++column) {
            literals.push(Minisat::mkLit(toVar(row, column, value)));
        }
        exactly_one_true(literals);
    }
}
```
```

**Rule 2 (No column contains duplicate numbers)**

```
for (int column = 0; column < columns; ++column) {
    for (int value = 0; value < values; ++value) {
        Minisat::vec<Minisat::Lit> literals;
        for (int row = 0; row < rows; ++row) {
            literals.push(Minisat::mkLit(toVar(row, column, value)));
        }
        exactly_one_true(literals);
    }
}
```
```

**Rule 3 (None of the 3x3 boxes contain duplicate numbers)**

This rule results in the most complex code, as it requires two iterations -- one to iterate over all of the boxes and one to collect variables inside each box. However, the resulting code is still fairly trivial:

```
for (int r = 0; r < 9; r += 3) {
    for (int c = 0; c < 9; c += 3) {
        for (int value = 0; value < values; ++value) {
            Minisat::vec<Minisat::Lit> literals;
            for (int rr = 0; rr < 3; ++rr) {
                for (int cc = 0; cc < 3; ++cc) {
                    literals.push(Minisat::mkLit(toVar(r + rr, c + cc, value)));
                }
            }
            exactly_one_true(literals);
        }
    }
}
```
```

**Rule 4 (Each position contains exactly one number)**

```
for (int row = 0; row < rows; ++row) {
    for (int column = 0; column < columns; ++column) {
        Minisat::vec<Minisat::Lit> literals;
        for (int value = 0; value < values; ++value) {
            literals.push(Minisat::mkLit(toVar(row, column, value)));
        }
        exactly_one_true(literals);
    }
}
```
```

We also need to define the `exactly_one_true` helper:

```
void Solver::exactly_one_true(Minisat::vec<Minisat::Lit> const& literals) {
    solver.addClause(literals);
    // Minisat::vec::size() returns int, so we use int indices
    for (int i = 0; i < literals.size(); ++i) {
        for (int j = i + 1; j < literals.size(); ++j) {
            solver.addClause(~literals[i], ~literals[j]);
        }
    }
}
```
```

With these snippets, we have defined a model of Sudoku as SAT. There are still 2 pieces of the solver missing: a method to specify values in the pre-filled positions of the board and a method that extracts the found solution to the puzzle.

Fixing the values in specific positions is easy; we can just add a unary clause for each specified position:

```
bool Solver::apply_board(board const& b) {
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < columns; ++col) {
            auto value = b[row][col];
            if (value != 0) {
                solver.addClause(Minisat::mkLit(toVar(row, col, value - 1)));
            }
        }
    }
    return true;
}
```
```

Because the only way to satisfy a unary clause is to set the appropriate variable to the polarity of the contained literal, this forces the specific position to always contain the desired value.

To retrieve a solution, we need to be able to determine a position's value. Because only one of the variables for any given position can be set to true, the value corresponding to that specific variable is the value of the given position:

```
board Solver::get_solution() const {
    board b(rows, std::vector<int>(columns));
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < columns; ++col) {
            for (int val = 0; val < values; ++val) {
                if (solver.modelValue(toVar(row, col, val)).isTrue()) {
                    b[row][col] = val + 1;
                    break;
                }
            }
        }
    }
    return b;
}
```
```

With the solver finished, we can go on to benchmarking its performance.

As far as I could tell from a cursory search, there are no standard test suites for benchmarking Sudoku solvers. I decided to follow Norvig's blog post about his own Sudoku solver and use this set of 95 hard Sudokus for measuring the performance of my solver.

The measurements were done on a PC with a factory-clocked i5-6600K CPU @ 3.5 GHz, the code was compiled with `g++` under Windows Subsystem for Linux, and each input was run 10 times. After that, I took the mean of the results for each problem and put them all into a boxplot. Since I am a proponent of LTO builds, I also compiled the whole thing, including MiniSat, with LTO enabled, and benchmarked that binary as well.

These are the results:

As you can see, the LTO build performed somewhat better, but not significantly so. What is interesting is that the number of outliers *above* the box, and the relative lengths of the whiskers, suggest that the underlying distribution of the solver's running time over all of the inputs is heavy-tailed. This means that the longest-running inputs need significantly longer to be solved than the rest, which is a common attribute of solvers for NP-complete problems: a single wrong decision during the search for a solution can lengthen the total runtime significantly.

There is one more question to answer, namely, how does this performance compare with high-performance Sudoku-specialized solvers? I picked 2, ZSolver and fsss2, and tried running them on the same set of problems. Not too surprisingly, they both outperformed our SAT-based solver by a wide margin. The sort of "converting" solver we wrote will always be slower than a well-tuned specialised solver, but converting solvers do have some advantages that can make them desirable. As an example, I have no prior domain-specific knowledge about solving Sudokus, but I was able to write the SAT-based Sudoku solver in less than 2 hours. It is also much more readable and extendable^{[9]}.

*That is all for part 1, but I have much more I want to say about SAT solvers, so you can expect more posts about both using them, and about their internals and the theory behind why are they so fast.*

*There are more benchmarks in part 1.5, and part 2 shows how to implement a SAT-based solver for master-key systems.*

This means polynomial, because when it comes to complexity theory, algorithms with polynomial complexity are generally considered tractable, no matter how high the exponent in the polynomial is, and algorithms with exponential complexity are considered intractable. ↩︎

At least as long as we assume that

- To install a package, all its dependencies must be installed
- A package can list specific versions of other packages as dependencies
- Dependency sets of each version of a package can be different
- Only one version of a package can be installed ↩︎

In fact, various dependency managers in the wild already use SAT solvers, such as Fedora's DNF, Eclipse's plugin manager, FreeBSD's pkg, Debian's apt (optionally), and others. ↩︎

There are some extensions, like XOR-SAT, which lets you natively encode XOR clauses, but these are relatively rare and only used in specialist domains, e.g. cryptanalysis. ↩︎

When using the library interface, MiniSat defaults to being entirely deterministic. This means that as long as you are using the same version of MiniSat, the result will always be the same, even when the formula has multiple valid models. ↩︎

Two formulas, f1 and f2, are equisatisfiable when f1 being satisfiable means that f2 is also satisfiable, and vice versa. ↩︎

There is also a notion of generalised sudoku, where you have to fill in numbers 1-N in an NxN grid according to the same rules. This generalisation is proven to be NP-complete. ↩︎

The real code differs in places, especially in that it is coded much more defensively and contains more validity checking in the form of assertions. ↩︎

Try reading through the code of the linked solvers and imagine extending them to work with 18x18 boards. With our solver, it is just a matter of changing 3 constants at the top of a file from 9 to 18. ↩︎