Prime factors kata

The prime factors kata is popularised by Uncle Bob (Robert C. Martin). He is well known for his championing of Test Driven Development (TDD) and clean code. Uncle Bob often uses the prime factors kata to explain the power of TDD. The kata is interesting, because it shows the power of incremental steps. Every test you introduce will fine tune the algorithm until finally a working factorisation algorithm emerges. If you follow the rules of TDD it can sometimes feel as if the algorithm arrives out of thin air.

I would recommend performing this kata yourself before you continue reading. The best way to do the kata is by taking incremental steps like this:

  1. The number 1 returns an empty list of factors
  2. The number 2 returns a list with the factor 2
  3. The number 3 returns a list with the factor 3
  4. The number 4 returns a list of the factors 2 and 2
  5. Etc

For every new number you only make the minimum changes to pass the test. After a few steps you will notice the algorithm becomes better and better and more and more tests start passing without any code changes. You can follow these steps until a fully working algorithm emerges.

If you have no experience with TDD, I would recommend to read the introduction article: Test driven development: red, green, refactor. After trying the kata you can continue in this article to see how I would solve the kata.

The number 1 returns an empty list

The number one doesn't have any prime factors. Providing the number 1 to our function should return an empty list. We can test that in our first unit test:

class PrimeFactorTest extends TestCase
{
    /** @test */public function number_1_has_no_factors()
    {
        $factor = new PrimeFactor();
        $this->assertEquals([], $factor->factorize(1));
    }
}

This test fails, because we don't have a PrimeFactor class:

Error : Class "Webdevils\Katas\PrimeFactors\PrimeFactor" not found

We can solve that easily:

class PrimeFactor
{

}

Unfortunately our test still doesn't pass.

Error : Call to undefined method Webdevils\Katas\PrimeFactors\PrimeFactor::factorize()

Again, we can solve this error easily by adding a factorize method to our class. I return a non-empty list to make sure we see the actual test fail.

public function factorize(int $number) : array
{
    return [-1];
}

And as expected: our test fails, but this time for the right reason.

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
+    0 => -1
 )

Let's solve that by returning an empty array:

public function factorize(int $number) : array
{
    return [];
}

Anything to refactor? Not really, on to our next test.

The number two has the prime factor two

We can add the next test for the number two. The number two has a single prime factor: the number two.

/** @test */
public function number_2_has_2_as_a_factor()
{
    $factor = new PrimeFactor();
    $this->assertEquals([2], $factor->factorize(2));
}

Our algorithm doesn't support the number two yet.

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 2
 )

What is the simplest way to make the test pass? Just check if the number is larger than 2 and return 2. Otherwise return an empty list.

public function factorize(int $number) : array
{
    if($number > 1) {
        return [2];
    }
    
    return [];
}

And our test passes. Hooray!

Time to do some refactoring in our test. We have two tests that look similar and we know that more similar tests will follow. We can easily get rid of the repetition by using a dataProvider. We create two test methods: a method that performs the test, but expects a few parameters and another method, the data provider, that returns a list of values for these parameters. This is how it would look like in our test:

/**
 * @test
 * @dataProvider list_of_numbers_and_prime_factors
 */
public function returns_prime_factors_for_a_number(int $number, array $primeFactors)
{
    $factor = new PrimeFactor();
    $this->assertEquals($primeFactors, $factor->factorize($number));
}

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]]
    ];
}

Additional advantage of dataProviders is that adding new tests is relatively easy.

Next, we can also make our factorization code a bit nicer. We have two return statements, but what we actually want is a list of factors and if the number is larger than 1, we want to add an additional factor:

public function factorize(int $number) : array
{
    $factors = [];

    if($number > 1) {
        $factors[] = 2;
    }

    return $factors;
}

By reducing everything to a single

The number three has prime factor three

Let's add a test for the number three:

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
    ];
}

And we fail the test again:

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 3
+    0 => 2
 )

What is the easiest way to pass this test? Instead of adding the number two, we add the value passed to the factorize function:

public function factorize(int $number) : array
{
    $factors = [];

    if($number > 1) {
        $factors[] = $number;
    }

    return $factors;
}

And our tests pass again.

We have nothing to refactor, so let's continue with the next number

The number 4 has 2 and 2 as prime factors

This one is a bit more complicated. Let's add the test:

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
        [4, [2, 2]],
    ];
}

Our test will fail, because we add the number two only once:

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 2
-    1 => 2
+    0 => 4
 )

As expected we get the number 4. We have to do a check if the number is dividable by 2. If it is dividable, we store 2 as a factor and store the remainder. We then add the remainder to our list as a factor as well. In that way we should also pass the three case. Let's try that:

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;

    if($remainder > 1) {
        if($remainder % 2 == 0) {
            $factors[] = 2;
            $remainder /= 2;
        }

        $factors[] = $remainder;
    }

    return $factors;
}

Looks good, but when we run the test we get an error. Our second test case for the number two fails:

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
     0 => 2
+    1 => 1
 )

Two divided by Two is one, which gives us a remainder of one. One isn't a prime factor and shouldn't be added to the list. We can solve this problem by adding a check if the remainder is larger than one:

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;

    if($remainder > 1) {
        if($remainder % 2 == 0) {
            $factors[] = 2;
            $remainder /= 2;
        }

        if($remainder > 1) {
            $factors[] = $remainder;
        }
    }

    return $factors;
}

Now our test passes, but our code looks a bit ugly. Let's refactor.

We check if the remainder is larger than 1. Within the if statement we check if the remainder is larger than 1 again. We can move the nested if statement to the outside of the first if statement and the code would do the same thing:

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;

    if($remainder > 1) {
        if($remainder % 2 == 0) {
            $factors[] = 2;
            $remainder /= 2;
        }
    }

    if($remainder > 1) {
        $factors[] = $remainder;
    }

    return $factors;
}

Let's run the tests to verify. And yes! We are still green!

Factorizing the numbers 5 until 8

And now something extraordinarily happens. If we test for 5, 6 or 7, the test actually passes! Which means that even though it feels like our algorithm is kind of hacked together, it is moving in the right direction.

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
        [4, [2, 2]],
        [5, [5]],
        [6, [2, 3]],
        [7, [7]],
    ];
}

Unfortunately it is going wrong with the number 8:

 public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
        [4, [2, 2]],
        [5, [5]],
        [6, [2, 3]],
        [7, [7]],
        [8, [2, 2, 2]],
    ];
}

The number 8 has three prime factors: 2, 2 and 2. Our algorithm cannot work with more than 2 prime factors yet and our test fails:

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
     0 => 2
-    1 => 2
-    2 => 2
+    1 => 4
 )

We get the prime factors 2 and 4 instead of 2, 2 and 2.

Luckily we can solve this problem easily. What is the simplest step to go back to green again?

Right! We just change the if statement to see if the remainder is dividable by 2 into a while. How cool is that?

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;

    if($remainder > 1) {
        while($remainder % 2 == 0) {
            $factors[] = 2;
            $remainder /= 2;
        }
    }

    if($remainder > 1) {
        $factors[] = $remainder;
    }

    return $factors;
}

And our code passes again.

The number 9

Next up, we have the number 9. The prime factors for the number 9 are 3 and 3:

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
        [4, [2, 2]],
        [5, [5]],
        [6, [2, 3]],
        [7, [7]],
        [8, [2, 2, 2]],
        [9, [3, 3]],
    ];
}

And this test fails again. Nine isn't dividable by 2 and therefore falls into the last case, which means we just return the number 9 itself as prime factor:

Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 3
-    1 => 3
+    0 => 9
 )

Ok, how can we change our current code to also check for three? We can first extract the number 2 into its own variable called factor:

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;
    $factor = 2;

    if($remainder > 1) {
        while($remainder % $factor == 0) {
            $factors[] = $factor;
            $remainder /= $factor;
        }
    }

    if($remainder > 1) {
        $factors[] = $remainder;
    }

    return $factors;
}

We want to try the number 2 first, but if the remainder isn't dividable by 2 and greater than 1, we want to start dividing by 3. And after that 5, etc.

Most of the code is already there. We want to keep dividing while greater than 1, so we can change the first if statement into a while loop. In order to prevent infinite looping, we can increase the factor by one and try dividing by that.

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;
    $factor = 2;

    while($remainder > 1) {
        while($remainder % $factor == 0) {
            $factors[] = $factor;
            $remainder /= $factor;
        }

        $factor++;
    }

    if($remainder > 1) {
        $factors[] = $remainder;
    }

    return $factors;
}

And our code passes the test. We can do some refactoring though. After the while statement we know for sure that the remainder will be 1. Otherwise we would still be in the loop. This means that the if statement after the loop will always goes to false and can be removed. Let's try it and run the tests:

public function factorize(int $number) : array
{
    $factors = [];
    $remainder = $number;
    $factor = 2;

    while($remainder > 1) {
        while($remainder % $factor == 0) {
            $factors[] = $factor;
            $remainder /= $factor;
        }

        $factor++;
    }

    return $factors;
}

And great, we still pass the test!

Final test

I think our little algorithm actually is done. It will work with any number and will factorize it. Let's try this out with one more test for a number with many prime factors:

public function list_of_numbers_and_prime_factors() : array
{
    return [
        [1, []],
        [2, [2]],
        [3, [3]],
        [4, [2, 2]],
        [5, [5]],
        [6, [2, 3]],
        [7, [7]],
        [8, [2, 2, 2]],
        [9, [3, 3]],
        [2*2*2*3*5*11*13*17*17*23, [2, 2, 2, 3, 5, 11, 13, 17, 17, 23]],
    ];
}

Run the tests, and we are still green!

Conclusion

Creating a test for every following numbers and just doing the minimum necessary change let to a working algorithm for factorization. Of course we could find ways to optimize the algorithm. And this technique wouldn't work for every mathematical function (otherwise this would be the technique mathematicians would use to prove every maths problem). It is however a good way to practice and show the power of TDD.

The code for this episode is available on Github. Happy practising!

Author

Mark Kazemier's avatar
Mark Kazemier

Hi, my name is Mark. I'm the founder of webdevils.nl and love developing websites and other web applications. Through Webdevils.nl I want to spread my enthousiasm about the web and PHP. In my professional live I'm a security expert specialised in security monitoring.

View all posts