I've been writing about a Perl sudoku solver. Sudoku is a great example of a problem worthy of study. It looks easy from the outside, but when you have a go at solving it, it gets tricky very quickly. The comments on this series have included suggestions to go recursive. Well, that's definitely a better approach than the one I am documenting here – but the nice thing is that I know that for sure because I tried it the simple way first. Less is not always more. Anyway, this series is a retrospective, so if I may beg your indulgence, we shall continue to stick our heads in the sand for a little while longer.
So where were we? Well, at this point it may help to actually name some of the processes we have applied to solve sudoku puzzles. This page on solving sudoku provides us with the necessary nomenclature. If you review that page, you'll see that what is refered to as excluding
and reducing
in the Perl code is more commonly known as removing singles and hidden singles respectively.
Once I got this far I was a bit stuck. I refused to allow myself to look up anything on the web (yeah, a bit pointless, considering my original aim was to win €150!). Eventually I realised that if you have two cells, with just two number possibilities in them, then no other cells in the same vertical, horizontal or home square can have those two numbers. If other cells did have, say, one of the numbers, then that would solve one of the double number cells, leaving the other inconsistent with the other solved cell. Ok, an example: say we have 12
and 12
and 123
in the top left corner, with all other relevant cells solved. Then we can exclude 1
and 2
from the 3rd cell, giving 12
and 12
and 3
. This approach is known as naked pairs.
Actually if you think about it for a moment, this technique also applies to sets of three cells with three numbers, four cells with four numbers, and so on. I have only coded it up for pairs however, and that has solved all the sudokus that I have tried.
Before I show you the code, let's talk economics. You see, I had to stop doing the Irish Independent Super Sudoku. Firstly, and most importantly, entering a load of hex digits into square boxes every weekend get boring very quickly. Secondly, I wasn't winning anything and it was starting to look like I might actually actually lose money. In order to enter the competition you have to buy the paper, €1.50, and then you have to post off the entry, another €0.48. Do that every week and after 76 weeks, you'll hit €150.48 (I was already 10 weeks down). Plus it's boring filling out the squares. Did I mention that already?
Right, here's how we do naked pairs. By the way, Perl was really great for hacking out this solution, but it is still pretty verbose. I'm considering porting this to Javascript and some other scripting languages, just for the craic.
sub excludePair {
my @cell = @{shift()};
my $prow = shift;
my $pcol = shift;
my $valbin = $cell[$prow][$pcol];
if( 1 == keys(%{$valbin}) ) {
return;
}
my $cmpstr = join('',sort(keys(%{$valbin})));
my @vals = keys(%{$valbin});
my $valsmatch = 0;
for( my $c = 0; $c < ($sqsize*$sqsize); $c++ ) {
if( $c != $pcol ) {
my $testbin = $cell[$prow][$c];
if( 2 == keys(%{$testbin}) ) {
my $teststr = join('',sort(keys(%{$testbin})));
if( $cmpstr eq $teststr ) {
$valsmatch = 1;
}
}
}
}
if( $valsmatch ) {
for( my $c = 0; $c < ($sqsize*$sqsize); $c++ ) {
if( $c != $pcol ) {
my $testbin = $cell[$prow][$c];
if( 2 <= keys(%{$testbin}) ) {
my $teststr = join('',sort(keys(%{$testbin})));
if( $cmpstr ne $teststr ) {
for (@vals) {
delete(${$testbin}{$_});
}
}
}
}
}
}
$valsmatch = 0;
for( my $r = 0; $r < ($sqsize*$sqsize); $r++ ) {
if( $r != $prow ) {
my $testbin = $cell[$r][$pcol];
if( 2 == keys(%{$testbin}) ) {
my $teststr = join('',sort(keys(%{$testbin})));
if( $cmpstr eq $teststr ) {
$valsmatch = 1;
}
}
}
}
if( $valsmatch ) {
for( my $r = 0; $r < ($sqsize*$sqsize); $r++ ) {
if( $r != $prow ) {
my $testbin = $cell[$r][$pcol];
if( 2 <= keys(%{$testbin}) ) {
my $teststr = join('',sort(keys(%{$testbin})));
if( $cmpstr ne $teststr ) {
for (@vals) {
delete(${$testbin}{$_});
}
}
}
}
}
}
}
Well that's a candidate for The Daily WTF anyway. Notice that I was too lazy to actually code up the home square part. Oh well.
This subroutine assumes that it is only called on cells with two numbers left. It searches for another cell that matches the current one, and if found, removes the two numbers of these two cells from all the other relevant cells. The first two for loops handle the column, and the second two handle the row.
The expression join('',sort(keys(%{$testbin})))
gets the keys of the cell, that is, the numbers in the cell, sorts them and concatenates them together to form a canonical comparison string. Probably not the most efficient way of checking each cell for two numbers.
Ok, I guess I'll let you have the whole damn thing now. Download the Perl code at your peril. It's public domain and the usual disclaimers apply (not guilty for “thermonucular” destruction of your computer, etc.).
I am quite interested in seeing how far one can push this type of heuristics-based approach. So the plan now is to write up some sort of test harness and generate a few unsolvable sudokus. I would like to keep adding heuristics as described on the suduko hints page above and see how far we get. I may then compare it to a proper solution, using real algorithms, but for now, Good Bye, Good Night and Good Luck!.