The Telegraph asks: World’s hardest sudoku: can you crack it?:
The answer is: “Sure thing!” …
If you expect Fruity Loops and Flip-Flop Chains in Darth Vader style (I can kill you with a single thought!), you will be disappointed. However, the method employed is not simple trial and error either.
The trial and error process usually offered for sudokus is a depth-first decision tree walk, which is as clueless as it gets. While the usual environment for abstract problems is in fact clueless, sudoku is nothing but all about the clues.
Therefore, a
is a much better choice. Especially, since the sudoku constraints are naturally very loopy and new facts often change the results of previous tests.
Although the depth of the search never exceeds 2 decision levels in this example, it does not mean that it is not a lot of work.
The method is designed to work well with computer assisted solver programs. The program should have bookmarks, but if not, saving positions in files will do (you never need more than 2 files, unless there is no good undo function, then you will need 3 files).
Each chain state is denoted by a marker:
Sudoku cells are denoted as <column letter><row number> (just as with spreadsheets. Or chess, if you wish).
A statement about the allowed values of a sudoku cell is made by enumerating the values:
<statement> =def= <cell>: <value1>,<value2>,..
E.g. A2: 4,9,3 is read as:
The cell A2 can hold the values 4, 9 and 3.
Special statements are used to emphasize disproof, proof or inconclusivity:
Several statements are concatenated with the operator +:
<statements> =def= <statement1> + <statement2> + ..
E.g. A2: 4,9 + B5: 1,6 is read as:
The cell A2 can hold the values 4 and 9, and the cell B5 can hold the values 1 and 6.
A hypothetical statement is marked with # and introduces a new level of reasoning based on all previous statements and hypotheses:
<hypothesis> =def= # <statement>
E.g. # A2: 7,8 is read as:
Assuming that A2 allows the values 7 and 8.
The hypothesis/statement (HS) chain # C3: 6 + B7: 9 # A2: 7,8 is read as:
In the context of the assumption that C3 must have the value 6 and the proven fact that B7 must have the value 9 in this context, the assumption is made that A2 can (only) hold the values 7 and 8.
A conclusion from a HS chain is denoted by =>:
<hs-chain> => <statements>
This defines a Hypothesis/Disproof/Proof chain (HDP chain)
Let me say beforehand, that I never use anything more advanced than single/double column and naked/hidden pair/triple/quad checks (no X-Wing, Swordfish, XY-Wing, XYZ-Wing, never ever a XY-Chain). In fact, the example solution was found in this way.
The special feature of this method lies in the structure of the binary decisions. The rest is just following a (tediously exhausting) process without giving in to the desire of finding a shortcut (there is none!). So, here goes …
Branch1 | Branch2 | Action |
---|---|---|
SOLVED | ??? | finish. The solution proves the correctness of the current HDP chain |
??? | SOLVED | finish. The solution proves the correctness of the current HDP chain |
UNSOLVED | UNSOLVED | no action (resist the temptation to follow one of the branches further!) |
CTR | UNSOLVED | terminate current level and use branch 2 as a new proven fact in the HPD chain |
UNSOLVED | CTR | terminate current level and use branch 1 as a new proven fact in the HPD chain |
CTR | CTR | terminate current and previous and evaluate the contradiction in the context of the new current level |
Since this is the main method for solving a sudoku, the unqualified term pair reduction always refers to cell pair reduction.
If there are no cell pairs:
The design of the pair reduction algorithm aims to reduce a sudoku to cells which only contain single values or pairs. So it is no surprise, that this eventually succeeds.
Here is a full reduction from the sudoku:
........3..1..9.6..5..8.4.....9...8...867.....1....2....6..7.2..3.8..5..4.......8
The pair chain produces a contradiction for both branches.
Based on the following simplified desicion tree, note that eliminating C4: 1,5,6,8 causes B5: 3,5,9 to detect a contradiction immediately instead of during resolution of hypothesis I6: 5,9.
Doing a depth-first walk delivers the following HDP chain:
INC # C4: 2,4
DIS # C4: 2,4 # B5: 2,4 => CTR => B5: 3,5,9
DIS # C4: 2,4 + B5: 3,5,9 # I6: 5,9 => CTR => I6: 3,7,8
DIS # C4: 2,4 + B5: 3,5,9 + I6: 3,7,8 => CTR => C4: 1,5,6,8
DIS C4: 1,5,6,8 => CTR
While a breadth-first walk delivers the following HDP chain:
INC # C4: 2,4
DIS # C4: 1,5,6,8 => CTR => C4: 2,4
DIS C4: 2,4 # B5: 2,4 => CTR => B5: 3,5,9
DIS C4: 2,4 + B5: 3,5,9 => CTR
The scenario is not uncommon, since sudokus have a single solution, which necessarily causes many contradictions in the decision tree.
This sudoku from menneske.no is a good example, where the benefit of staying on the same level can be seen directly.
The reduction pair analysis does not find a solution, since it only evaluates a hypothesis in itself. It does not propagate the consequences from a contradiction in the branches.
However, when the consequences of the contradictions are propagated, a new pair C1: 2,7 is detected, which is used in the hypothesis C1: 2,7 # B2: 2,7 to reveal the solution:
Another advantage of doing a breadth-first walk lies in the possibility of information gathering. For each unsolved decision branch, the number of resulting pairs can be recorded. If all checks reveal no further immediate contradictions, a hypothesis must be formed. In that case it is usually (but not necessarily) advantageous to choose the decision branch resulting in most pairs.
If one branch results in 5 pairs and another branch results in 19 pairs, it is definitely good to check the 19-pair branch first.
Here is the pair reduction analysis of the World’s Hardest Sudoku:
# * PAIR REDUCTION ..
# * LEVEL 0 PASS 1 ROUND 1 (AUTO SOLVE) (G8)
# * 8..........36......7..9.2...5...7.......457.....1...3...1....68..85...1..9....4..
# * PAIR G8: 3,9 BLK 9
# G7: 3,9,5 # reduction candidate for 3,9
# G7: 3,9 # 3 pairs
# I8: 3,9,2,7 # reduction candidate for 3,9
# I8: 3,9 # 7 pairs
# * PAIR G8: 3,9 ROW 8
# F8: 3,9,2,4,6 # reduction candidate for 3,9
# F8: 3,9 # 4 pairs
# F8: 2,4,6 # 2 pairs
# * PAIR G8: 3,9 COL G
# G1: 3,9,1,5,6 # reduction candidate for 3,9
# G1: 3,9 # 6 pairs
# G1: 1,5,6 # 1 pairs
# * UNSOLVED!
The decision graph for the first level does not look very promising:
However the actual pair reduction that leads to the solution does not exceed level 2 and gives a much better picture:
Note
The fact that sometimes more than one value is allowed/disallowed in a decision branch has nothing to do with the binary nature of the decision. I have a hard time understanding the necessity to consider only single values, which seems to be some sort of quality criterium for certain people. The strictness of a binary decision does not go away, just because more than one possible value is considered. I also do not find it harder to follow the consequences of multiple values than following the consequences of a single value.
The following hypothesis/disproof/proof chain leads to the solution:
- STA Original Sudoku
- HYP # I8: 3,9
- DIS # I8: 3,9 # B1: 1,2 => CTR => B1: 6
- DIS # I8: 3,9 # B1: 1,2 => CTR => B1: 6
- STA # I8: 3,9 + B1: 6
- DIS # I8: 3,9 + B1: 6 # A2: 1,2 => CTR => A2: 5,9
- STA # I8: 3,9 + B1: 6 + A2: 5,9
- DIS # I8: 3,9 + B1: 6 + A2: 5,9 # B5: 1,2 => CTR => B5: 3,8
- STA I8: 2,7
- HYP I8: 2,7 # G7: 5
- DIS I8: 2,7 # G7: 5 # G4: 6 => CTR => G4: 1,8
- STA I8: 2,7 # G7: 5 + G4: 1,8
- DIS I8: 2,7 # G7: 5 + G4: 1,8 # C5: 2,9 => CTR => C5: 6
- STA I8: 2,7 # G7: 5 + G4: 1,8 + C5: 6
- DIS I8: 2,7 # G7: 5 + G4: 1,8 + C5: 6 # H3: 4,5 => CTR => H3: 8
- DIS I8: 2,7 # G7: 5 + G4: 1,8 + C5: 6 + H3: 8 => CTR => G7: 3,9
- STA I8: 2,7 + G7: 3,9
- HYP I8: 2,7 + G7: 3,9 # A8: 3,4,6
- DIS I8: 2,7 + G7: 3,9 # A8: 3,4,6 # A9: 3 => CTR => A9: 6,7
- STA I8: 2,7 + G7: 3,9 # A8: 3,4,6 + A9: 6,7
- DIS I8: 2,7 + G7: 3,9 # A8: 3,4,6 + A9: 6,7 # D7: 2,7 => CTR => D7: 4,9
- PRF I8: 2,7 + G7: 3,9 # A8: 3,4,6 + A9: 6,7 + D7: 4,9 => SOL
- STA I8: 2,7 + G7: 3,9 # A8: 3,4,6 + A9: 6,7 + D7: 4,9
Section HDP Chain Illustrations shows Screenshots for all the stages.
I have since automated this algorithm in a first crude manner to allow for automatic classification of hardness. In its current completely unoptimized form, the program comes back with this chain of reasoning (see Analysis of xx-worlds-hardest-sudoku-work.sdk for automated result):
DIS # I8: 3,9 # B1: 1,2 => CTR => B1: 6
DIS # I8: 3,9 + B1: 6 # A2: 1,2 => CTR => A2: 5,9
DIS # I8: 3,9 + B1: 6 + A2: 5,9 # B5: 1,2 => CTR => B5: 3,8
DIS # I8: 3,9 + B1: 6 + A2: 5,9 + B5: 3,8 => CTR => I8: 2,7
DIS I8: 2,7 # G7: 5 # G4: 6 => CTR => G4: 1,8
DIS I8: 2,7 # G7: 5 + G4: 1,8 # A4: 2,9 => CTR => A4: 1,3
DIS I8: 2,7 # G7: 5 + G4: 1,8 + A4: 1,3 # A5: 2,9 => CTR => A5: 1,3,6
DIS I8: 2,7 # G7: 5 + G4: 1,8 + A4: 1,3 + A5: 1,3,6 # C5: 2,9 => CTR => C5: 6
DIS I8: 2,7 # G7: 5 + G4: 1,8 + A4: 1,3 + A5: 1,3,6 + C5: 6 => CTR => G7: 3,9
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 # E9: 2,6 => CTR => E9: 1,3,8
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 + E9: 1,3,8 # F9: 2,6 => CTR => F9: 1,3,8
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 + E9: 1,3,8 + F9: 1,3,8 # F7: 3,9 => CTR => F7: 2,4
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 + E9: 1,3,8 + F9: 1,3,8 + F7: 2,4 # A8: 4 => CTR => A8: 2,7
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 + E9: 1,3,8 + F9: 1,3,8 + F7: 2,4 + A8: 2,7 # B1: 1,2 => CTR => B1: 6
DIS I8: 2,7 + G7: 3,9 # F8: 3,9 + E9: 1,3,8 + F9: 1,3,8 + F7: 2,4 + A8: 2,7 + B1: 6 => CTR => F8: 2,4,6
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 # B8: 2,4 => CTR => B8: 3,6
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 # B6: 2,4 => CTR => B6: 6,8
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 # D7: 2,4 => CTR => D7: 7,9
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 # F7: 9 => CTR => F7: 2,4
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 # B1: 2,4 => CTR => B1: 1,6
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 + B1: 1,6 # A8: 3,6,7 => CTR => A8: 2,4
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 + B1: 1,6 + A8: 2,4 # A2: 2,4 => CTR => A2: 9
DIS I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 + B1: 1,6 + A8: 2,4 + A2: 9 # E1: 2,3 => CTR => E1: 5
PRF I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 + B1: 1,6 + A8: 2,4 + A2: 9 + E1: 5 => SOL
STA I8: 2,7 + G7: 3,9 + F8: 2,4,6 + B8: 3,6 + B6: 6,8 + D7: 7,9 + F7: 2,4 + B1: 1,6 + A8: 2,4 + A2: 9 + E1: 5
The level index of analyzed sudokus lists some sudokus that I have collected and run through my research tool.
So, the World’s Hardest Sudoku does not seem so hard after all. And in the context of this algorithm it actually isn’t. I have found some harder puzzles, of which tarx0119 is the hardest so far.
The initial pair reduction analysis is pretty meager:
Here is an actual 2-level pair reduction analysis for the sudoku tarx0119:
This sudoku requires 1 level of constraint pair reduction and 2 levels of cell pair reduction to be solved. Here is the inital disproof of hypothesis # A7: 8:
After the clue B7: 8 is detected, the full resolution only requires 2 levels of cell pair reduction. Although there is still only the same lonlely pair available and only one additional clue is detected, the 2-level decision chain presents itself very differently and is now actually quite simple:
[1] | I am well aware, that the algorithm presented is actually called Iterative deepening depth-first search (IDDFS). However, I wish to make a specific point. IMHO IDDFS is actually a misnomer, since it places emphasis on the depth-first part instead of the more important breadth-first component. Even the Wikipedia author seem to feel the need to expand on the breadth-first nature of IDDFS (emphasis added by me):
|