Ages ago I used to play a simple game – or a puzzle, if you will – with pen and paper, where one tries to fill a 10×10 grid of cells with the numbers from 1 to 100. One starts by writing the number 1 in any cell. After that, there are strict rules on where the next number may be placed: it must be 3 cells away in one of the four main directions, or 2 cells away in one of the diagonal directions. One must of course stay within the bounds of the grid, and cannot enter a number in a cell that is already occupied by another number. This continues until the grid is completely filled with numbers, or – more likely – until there are no valid moves left.

Here is an example of starting the game near the center of the board and making a move diagonally right and upwards.

Possible moves after entering one number
Possible moves after entering two numbers
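
To make the rules concrete, here is roughly what the eight allowed moves and the validity check look like in code. This is just a sketch that represents the board as a plain int[][] with 0 meaning an empty cell – not how the actual implementation discussed later is structured:

```java
// A minimal sketch of the eight move offsets and a validity check, assuming the
// board is an int[][] where 0 marks an empty cell.
static final int[][] OFFSETS = {
        { 3, 0 }, { -3, 0 }, { 0, 3 }, { 0, -3 },   // 3 cells in the four main directions
        { 2, 2 }, { 2, -2 }, { -2, 2 }, { -2, -2 }  // 2 cells in the four diagonal directions
};

static boolean isValidMove(int[][] board, int row, int col) {
    // the target cell must be inside the board and still empty
    return row >= 0 && row < board.length
            && col >= 0 && col < board[row].length
            && board[row][col] == 0;
}
```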

After several attempts, never having succeeded in finishing the game with a grid full of numbers, I quickly wrote an algorithm to find a solution to it, just to see if one even exists. I picked a genetic algorithm that created random move combinations for random starting points and then tried to solve the game using those. Even though the choice of algorithm perhaps wasn't the best for this particular task, it didn't take long before it returned a valid solution.

The challenge

Some time later I started writing an algorithm that would find all possible solutions to the game. While this is technically a very simple task, the challenge comes from the vast search space one has to go through to really try all the possible move combinations from all the starting points.

There are 100 starting positions and, in theory, 8 possible directions to try for each of the following 99 moves, resulting in a massive search tree with 100 × 8^99 ≈ 2.546 × 10^91 unique paths and leaf nodes. Luckily, in reality there are far fewer options because many moves are not valid. For example, just taking into account that one cannot move back to the cell they just came from reduces the number of possible paths from each starting point to at most 8 × 7^98, and considering all the other already occupied cells and the board boundaries limits the options further. But checking a move's validity also has a cost, so it's easy to see that there's still a lot of calculation to be done.

As I wrote the initial version of the solving algorithm a very long time ago, I did it in Java, which was my main working language at the time. I quickly realized the algorithm wasn't efficient enough for solving a 10×10 grid, so I tried it out with smaller boards. I was actually quite surprised that there are solutions to this game on a 5×5 grid already, and fortunately finding those was rather quick.

Now I'm eager to see how much I can optimize the algorithm, and how big a board it can handle after the changes.

The current state of the algorithm

The algorithm is a naive depth-first search starting from each of the grid cells, one at a time. I ran it on a MacBook Pro laptop (with a 2.9 GHz i9 processor and 32GB of RAM) for boards of sizes 5×5 and 6×6 to get some statistics on how performant the algorithm currently is. I also wanted to get a list of all the solutions for those boards to have something I can validate my results against when making changes.
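
Stripped of the object-oriented structure of the real implementation, the search is essentially the following kind of recursion. This is a simplified sketch that only counts solutions, reusing the OFFSETS and isValidMove helpers from the earlier sketch; the method names are made up for illustration:

```java
// Naive depth-first search: place the next number, recurse into every valid
// move, and undo the placement when backtracking.
static long countSolutions(int[][] board, int row, int col, int nextNumber) {
    board[row][col] = nextNumber;
    long solutions = 0;
    if (nextNumber == board.length * board.length) {
        solutions = 1; // every cell is filled, so this path is a solution
    } else {
        for (int[] offset : OFFSETS) {
            int r = row + offset[0];
            int c = col + offset[1];
            if (isValidMove(board, r, c)) {
                solutions += countSolutions(board, r, c, nextNumber + 1);
            }
        }
    }
    board[row][col] = 0; // undo the move before backtracking
    return solutions;
}

static long countAllSolutions(int size) {
    int[][] board = new int[size][size];
    long total = 0;
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            total += countSolutions(board, row, col, 1); // try every cell as a starting point
        }
    }
    return total;
}
```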

It seems there are surprisingly many solutions to boards of this size already. The 5×5 board had 12,400 unique solutions, and the 6×6 board had a massive set of 8,250,272 different solutions.

These test runs logged all the solutions they found while running, which obviously affects performance. But I still timed the runs to get at least a ballpark figure to compare my later runs against. The algorithm found all the solutions for the smaller board in less than a second, with two of my latest documented runs finishing in 873 and 675 milliseconds respectively. For the bigger board my single test run took 25 minutes and 59 seconds, which just shows how a small change in the board size makes for an enormous change in the size of the search space.

It's obvious that the algorithm needs a lot of love if I ever want to use it for the actual 10×10 board.

Ideas for improvement

Fortunately I have some ideas of things to try. I'll list them in order from what I guess to be the least effective to the most effective, just to keep up the tension.

Using another programming language

...well, apart from this one; I don't actually think this is the least effective approach, but it's kind of a special case, so I'll get it out of the way first.

I actually got this whole optimization idea because I was thinking about rewriting the thing completely in Rust as a learning exercise, but felt that the current implementation is so inefficient that comparing it to a Rust version – which I would definitely write more efficiently in many ways – wouldn't be fair. So I decided to first improve the Java version at least a bit before making the jump.

I've also thought about doing a version in a purely functional language, most likely Clojure, out of curiosity. But I'm pretty sure I won't find the motivation and will just postpone that indefinitely.

Using some bit magic

This is one of those things that comes to mind when thinking of optimizations but very rarely gives any real benefit nowadays. Sure, I could change my code to do a right bit shift instead of dividing a value by 2 to micro-optimize a particular operation, or something like that. But one must also know the limitations of such tricks (e.g. the risk of losing the fractional part when the result isn't an integer), and more importantly, most compilers already do such optimizations when they can recognize them, so I don't have my hopes up when it comes to making the code faster with such changes.
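
For instance, the shift-versus-division case mentioned above would look like this, and it shows exactly the kind of limitation to watch out for:

```java
// For non-negative integers a right shift by one bit is the same as dividing by two.
// The two are NOT equivalent for negative odd values: -7 / 2 == -3 (rounds toward
// zero), while -7 >> 1 == -4 (rounds toward negative infinity).
static int halve(int value) {
    return value >> 1; // equivalent to value / 2 only when value >= 0
}
```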

Still, I think I'll scan the code with this in mind, just to not miss any opportunities. And it'd be nice to get to do some "low level" coding for a change anyway, even if for a line or two and not that seriously.

Using optimized data structures

My original code uses basic Java data structures to handle the data. I'm pretty sure I could find better-performing alternatives in many cases, e.g. replacing growable Lists with stacks or fixed-size arrays.

More importantly, the implementation is done in a very object-oriented way, which on one hand makes it rather easy to read and comprehend, but on the other hand quite possibly makes it less performant. I have used some primitive data types instead of their boxed alternatives, but I'm pretty sure this could still be improved.
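
As an example of the direction I mean – a sketch, not the actual code – the path of moves could be tracked in a preallocated primitive array instead of a growable List&lt;Integer&gt;, avoiding both resizing and boxing:

```java
// A fixed-capacity "stack" of moves backed by a primitive int array.
final class MovePath {
    private final int[] cells;   // one entry per board cell, allocated once
    private int length = 0;

    MovePath(int boardSize) {
        this.cells = new int[boardSize * boardSize];
    }

    void push(int cellIndex) { cells[length++] = cellIndex; }

    int pop() { return cells[--length]; }
}
```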

Using threads

The current implementation is a single-threaded application. As testing different solution paths can easily be done independently and in parallel, e.g. by starting the search from different cells simultaneously, I'd imagine that using threads would quite easily improve the overall performance of the search.
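
A rough sketch of what that per-starting-cell parallelization could look like, assuming the countSolutions method from the earlier sketch and giving each task its own board so the tasks share no mutable state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Submit one search task per starting cell to a fixed-size thread pool and sum the results.
static long countAllSolutionsParallel(int size, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Long>> results = new ArrayList<>();
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            final int r = row, c = col;
            // each task gets a fresh board of its own
            results.add(pool.submit(() -> countSolutions(new int[size][size], r, c, 1)));
        }
    }
    long total = 0;
    for (Future<Long> result : results) {
        total += result.get(); // wait for every starting point to finish
    }
    pool.shutdown();
    return total;
}
```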

However, I'm definitely not an expert on Java threading, and it seems there are very different opinions on when threads are actually beneficial and how many of them one should use. I've even seen posts claiming that there's no point in writing thread-handling code manually nowadays, because modern processors handle even (seemingly) single-threaded applications so well that any manual effort will just slow things down.

I don't think I'll abandon the idea that easily, though. I'll definitely have to try it out with different numbers of threads and different thread-pool configurations, and record some statistics myself before I can say whether it works for this particular case.

I know the code needs some changes to work well with threads, though, so this experiment doesn't come for free.

Precalculating often used values

When I think of all the steps and details of the algorithm, I'm pretty sure there are checks being done several times that could be done only once. That is, some values could likely be calculated before going through the starting points and moves, cached, and then used more efficiently during the actual search.

Even if there aren't many opportunities for this kind of optimization, a single one could make a big difference if the same value is needed many times during the search.
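
One concrete candidate for this – an assumption on my part, not necessarily what the final code will do – is computing the in-bounds target cells of every cell once, before the search starts, so the same bounds checks don't get repeated for every node of the search tree:

```java
import java.util.ArrayList;
import java.util.List;

// For each cell (as a flat index row * size + col), precompute the in-bounds target
// cells, so the inner search loop only needs to check whether a target is occupied.
static int[][] precomputeMoves(int size) {
    int[][] moves = new int[size * size][];
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            List<Integer> targets = new ArrayList<>();
            for (int[] offset : OFFSETS) {
                int r = row + offset[0];
                int c = col + offset[1];
                if (r >= 0 && r < size && c >= 0 && c < size) {
                    targets.add(r * size + c);
                }
            }
            moves[row * size + col] = targets.stream().mapToInt(Integer::intValue).toArray();
        }
    }
    return moves;
}
```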

Limiting the size of the search space with domain knowledge

This last option is by far the most important. While some of the other optimizations listed might help do things faster, this one helps avoid work that shouldn't be done in the first place.

Sometimes it's really hard to come up with ideas for narrowing down the search space. Luckily, in this case the game/puzzle has some characteristics that help cut the initial search space by roughly 85% from the very beginning.

I'll have to validate my theory before going into more detail publicly here, though. But since this is easily the most effective optimization I can think of, I'll start with it and do some benchmarking immediately after I get things working.

Off to the races, then

I think this is as good a plan as I'll ever come up with, so now I should just dive into the deep end and start writing some code. I'll be writing separate posts later about some of the approaches mentioned here, explaining them in more detail and sharing some numbers on the performance effect they had.

We'll see how efficient I can make the algorithm. I hope that one day I can say I have all the solutions for a 7×7 grid as well, and that it didn't take my machine years to calculate them.

Here are the first three posts about trying out these optimizations:

...and one about a small realization I had — which probably won't help with the optimizations: