Introduction

Sudoku Solver & Generator is a web application written in PHP. It's main intention is to be able to solve any N*N sudoku puzzle, but it can also perform basic sudoku puzzle generation. I've written this web application simply because I was interested in the algorithms behind solving Sudokus, rather than doing sudoku puzzles by hand.

Personally I think sudoku puzzles are quite boring, because they simply require lots of manual labor to solve. While good perception and intuition are great help in solving sudokus, in the end, all it requires is simple work and application of solving algorithms. Thus, generalizing these algorithms is much more interesting than actually solving the sudokus.

Of course, no disrespect intended for expert sudoku solvers, who can apply logics even beyond what is implemented in this solver. Solving those very hard sudokus is something quite different compared to solving your every day sudokus found in newspapers and magazines.

Sudoku terminology

If you are not familiar with the sudoku terminology, it may be hard to understand the solver and what's written on this page. Thus, here is a short reference to the meaning of different terms.

  • Size - Size refers to the number of columns or rows in the puzzle.
  • Region - Sudoku has same amount of regions as it has columns or rows.
  • Cell - Each row, column and region consists of equal amount of cells.
  • Unit - Unit means a column, a row or a region.
  • Solution - The correct number for a cell.
  • Candidate - Possible solution for a cell.
  • Symmetric Sudoku - A sudoku puzzle where every region is a square

Using the Solver

Using the sudoku Solver should be simple enough to solve standard 9x9 and 16x16 puzzles. Just choose the appropriate solver and enter the given clues for the sudoku solver and hit "Solve the Sudoku Puzzle" button.

Using the custom solver may be a bit more difficult, since you will have to manually enter each region for the puzzle. To enter regions, simply use the same letter to indicate the same region in the puzzle in the "Edit Regions" tab. It doesn't matter what characters you use. Just use enough of different kinds of characters to indicate the different regions. Once you go back to the "Edit Cells" tab, you will see colorized version of the regions to see indication of how you entered them. Just enter the clues for the sudoku puzzle after that and hit the solve button.

Alternatively, you can also go to the Templates section to select one of the premade sudoku puzzle templates that are somewhat commonly seen in sudokus. Note that it is not possible to use the custom sudoku solver to solve any sudoku puzzle that does not conform to the standard N*N format.

Once you've submitted the sudoku puzzle, the solver will attemp to solve it and display the results of it's attempt. You will be shown the puzzle as you submitted it and the solution to it in addition to information about how the solver solved the puzzle.

If you made some mistake while inputting the sudoku, you will be given an error message and you can go back to the solver in order to fix the problem in the sudoku puzzle. Note that sometimes you may be given two solutions to a sudoku puzzle. This happens if the sudoku has more than single solution. The solver will also inform that the puzzle is a not a proper sudoku puzzle, since it has more than single solution.

Advanced options

Advanced options in the solver allow you to change more advanced settings about the solver. These settings are usually best left in default position, but sometimes you may need to change them for various reasons. Here are the explanations for the different advanced settings

  • Guess algorithm - This setting determines whether the guessing algorithm is enabled. When the sudoku solver fails to make progress with any of the available logics, it will attempt to make a guess using a simple implementation of Ariadne's Thread. However, if this setting is disabled, the solver will stop all solving attemps once it reaches a point it can not get past without logic.
  • Candidate guess order - This settings changes the order in which different candidates are guessed for particular cell. The candidates can be guess from lowest to highest value, highest to lowest value or in random order. It is worth noting, that the candidate value is nonobvious. Value is determined by order of appearance in the sudoku puzzle, rather than, say, the ASCII value of the character.
  • Verify single solution - When sudoku solver needs to make a guess to solve the sudoku puzzle, it can not be certain that there are no other solutions without checking all possible guesses. Normally, the solver exhausts all options to verify a single solution or stops at the second solution. If this option is disabled, the solver will stop at the first found solution, and indicate that it is not sure whether there is multiple solutions or not.

Logic behind solving sudokus

When the sudoku puzzle is first loaded by the solver, it scans through the puzzle to find all the candidates for each cell. The solver keeps track of this information through solving the puzzle in many different ways, because it helps to solve and especially speed up the solving process quite a bit.

Solving the sudoku puzzle consists of two things. Finding correct solutions for each cell and eliminating candidates from cells. To do these tasks, the solver employs various different algorithms to find the solutions and candidates to remove. In sudoku, there are basically two different ways to find the correct solution for a cell.

  • (Naked) Singletons (A) - When a cell contains only one candidate, that candidate must be the solution for that cell, because no other candidate can be placed in that cell. To find these, the solver keeps track of number of candidates in each cell, and when it notices a cell with only one candidate, the solver puts that candidate in the cell.
  • Hidden Singletons (B) - A hidden singleton is a candidate that only exists once in a single unit. That is, only that cell has that particular candidate. Because each unit must have all different candidates in one of the cells, the solution for that particular cell is that candidate. To find these, the solver keeps track of the number of different candidates in each unit. When it finds a unit where one candidate exists only once, it will scan through the cells in that unit to find the cell with that candidate and place it that cell.

Generally, both of these solution finding methods are extremely fast to compute and provides no difficulty to a human solver either. Sudoku puzzles released in newspapers and magazines usually rate sodokus that can be solved using only these methods from easy to normal, because neither of these are particularly hard to notice

These two methods for finding solutions is not even nearly enough to solve harder sudoku puzzles. Solving harder puzzles requires logic to eliminate candidates from the cell. Some of these logics may be easy to do by hand, but more often than not, candidate elimination logic requires manual tracking of all candidates in each cell, which may become quite tedious to do with just pen and paper.

There exists many different obscure logics to remove candidates in many special cases. However, Most of the sudoku puzzles can be solved by eliminating candidates with couple common logics, which are implemented into this sudoku solver.

  • Candidates locked to regions (D) and rows or columns (E) - A candidate is "locked" to a region, when all cells with that candidate exist within a single row or column. Because that region must have that candidate in one of it's cells, the candidate can not exist anywhere else in that particular row or column.

    A candidate locked to row or column works in similar fashion, except the other way around. A candidate is locked to a row or column when all cells with that candidate exist within a single region. Because that row or column must also have all candidates, the candidate can not exist anywhere else in the region it's locked to.

    These two elimination logics are quite common in difficult sudoku puzzles. They can even be moderately easy to notice without tracking candidates manually, and often lead to direct results, since not many sudoku puzzles require applying various elimination logics in row.

  • Hidden (F) and Naked (G) chains - A naked chain (also known as naked pair/triple/quadruple) exists within a unit when exactly N different cells containly only N different candidates. Because each of these cells must be solved, they take up all the candidates in the naked chain. Because of this, the candidates part of the naked chain can not exist in any other cell in the unit.

    A hidden chain is the reverse of naked chain. A hidden chain exists within a unit when all the cells containing N different candidates are exactly N different cells. Because all of those candidates must exist in at least one of the cells, these cells can not contain any other candidates.

    Note that applying both of these two logics is useless, because they find exactly the same candidates to remove. In other words, a naked chain and hidden chain always exist together. It may sound like it makes them easy to find, but generally these can be hard to notice. Some naked chains, like when two cells contain the same two candidates are easy to notice, but when 4 different cells contain 2 candidates each (but the same 4 candidates), it may be harder to notice.

    Even though it's usually easier for person to see naked chains, the solver attempts to find hidden chains instead of naked chains by default. This is because the algorithm for finding hidden chains is better optimized compared to the algorithm for finding naked chains. Sometimes it may also be useful the disable both of these algorithms, because they are not particularly fast algorithms due to the number of different combinations they have to try.

  • N-Wing (H) - N-Wing is the arbitrary form of wing algorithm. Smaller forms, such as 2nd and 4rd form are better known as X-Wing and Swordfish algorithms. This algorithm is rarely needed even in the harder sudoku puzzles, because these can be very hard to notice manually. It is interested, however that computationally the solver is faster at finding N-Wings than naked or hidden chains.

    For any of the different candidates, take any N number of rows. If these N rows contain the candidate in total of exactly N different columns, you've found an N-Wing. Because each of these rows must have the candidate and the only possible cells are in these N different columns, the candidate can not exist in cell in these columns that are not part of these N rows.

While the intention of this solver is to be able to solve sudoku puzzles without resorting to guessing, the truth is that there exists many different logics that can be used to solve few spesific cases. This solver only implements the few common logics above which have been expanded to the Nth complexity. This leaves the solver unable to solve many very hard sudokus with logic only. That's why, the solver can also make guesses to solve sudokus. (which is also computationally very easy and fast to implement)

  • Guess algorithm (C) - When all else fails, the solver will simply make a guess about the correct solution. The solver will always guess in a cell with least amount of different candidates in order to reduce the number of possibilities as much as possible. If the solver ends up in state with no possible solution, it will simply restore a state from previous guess and make a new guess.

    The guessing is actually quite fast. Even if it has to return to previous points plenty of times, it's usually faster to just guess instead of trying to solve the sudoku by eliminating candidates. However, some puzzles create nasty "traps" for the guessing algorithm, which can result in very slow solving times depending on how early guessing is required and whether the solver is likely to guess right or not.

If you're wondering why I haven't implemented more elimination logics to this solver then the answer can be summarized in one word: difficulty. It's important to remember, that this sudoku solver can solve any N*N sudoku puzzle regardless of the region arrangement. Thus, any logic must be independent of the region arrangement and any logic must be generalized to the Nth complexity to be useful. I'm not really a mathematician, so implementing even the existing logics has proven to be difficult enough.

I have studied the elimination logics from various sites, but the actual programmed implementations are all my own. I have not even used any other implementations as an example or reference for ideas.

Javascript Benchmark

The sudoku solver takes advantage of javascript especially with the custom sudoku solver to make it more userfriendly. However, there were some issues with the speed of recreating the sudoku board. To help me determine the best option, I created a small javascript benchmark for the purpose.

It is not very comprehensive test of almost anything except for the most suitable method for this particular purpose, so don't take it too seriously.

The bench mark is available here.