An Unreasonably Deep Dive into Project Euler Problem 3
Intro
I’ve been busy as usual for the last couple of months, and haven’t really had time to extend the Project Euler series beyond the Problem 1 and Problem 2 articles.
However, I finally did set aside some time to play around with Problem 3: finding the largest prime factor of a number.
Problem Statement
The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143?
Definitions and Theorems
Definition: For two integers a and b, we say that a divides b (equivalently: a is a divisor of b, b is a multiple of a, or a is a factor of b) if there is some integer y such that y*a = b.
Definition: an integer n > 1 is prime if its only positive divisors are itself and 1. Note that 1 and -1 are divisors of every integer.
Definition: an integer n > 1 is composite if it is not prime. That is, it has some positive divisor other than itself and 1.
Theorem (Fundamental Theorem of Arithmetic): Every integer greater than 1 is either prime or a product of primes (i.e., composite). Moreover, for a composite number, the prime factorization is unique up to the order of the factors: there is one, and only one, prime factorization for a given composite number.
Proof: see Wikipedia
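For example, 60 = 2 × 2 × 3 × 5, and no other multiset of primes multiplies out to 60. Likewise, from the problem statement, 13195 = 5 × 7 × 13 × 29.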
To start considering the problem, we know that the given number is either prime or composite (the product of a unique multiset of primes). We can assume the number is composite; otherwise the problem statement doesn’t make much sense and/or is trivial.
We must therefore find the unique prime factors of the number, and then find which of those primes is largest.
V1: Brute force
These sorts of approaches are often underestimated and can be very effective in practical applications. They’re a good place to start.
A naive algorithm could be to loop through all the numbers from 2 to n-1 and check whether each one divides n. If some number d does divide n, put it in the list of divisors. After finding all the divisors, check each one for primality: loop through all the numbers from 2 to d-1 and check whether any of them divides d. If you find such a number, d is not prime and therefore not a prime factor of n. If you find no such number, then d divides n and d has no divisors besides 1 and itself. Thus d is prime, and thus d is a prime factor of n.
package v1

// V1:
// Take a number n and calculate all of its factors. For each
// factor f, test whether f is itself prime by getting all of its
// factors. If only 1 and f divide f, then f is prime, and a factor
// of n, and therefore a prime factor of n.

import (
    "math"
)

// If the only divisors of n are 1 and n, then n is prime
func isPrime(n int) bool {
    if n == 0 || n == 1 {
        return false
    }
    // factors() only collects divisors strictly between 1 and n,
    // so an empty result means n is prime.
    return len(factors(n)) == 0
}

// Generate all factors of n between 2 and n-1
func factors(n int) []int {
    facs := make([]int, 0)
    for i := 2; i < n; i++ {
        if math.Mod(float64(n), float64(i)) == 0.0 {
            facs = append(facs, i)
        }
    }
    return facs
}

// Given some n, return the largest prime factor of n
func LargestPrimeFactor(n int) int {
    biggest := 0
    for i := 2; i < n; i++ {
        if math.Mod(float64(n), float64(i)) == 0.0 { // i divides n evenly
            if isPrime(i) { // i is prime
                if i > biggest { // i is the new biggest prime
                    biggest = i
                }
            }
        }
    }
    return biggest
}
We have three functions: one which calculates all the factors of a given number, one which tests whether a number is prime, and one which ties it all together (LargestPrimeFactor()). While this solution is very inefficient, it does work for small numbers, so the logic is correct. We can see this with a quick manual check, and then more rigorously by writing the necessary tests, using benchmark functions to look at the performance of the system.
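Here is that manual check (a sketch with a hypothetical main package; the import path is an assumption about where the v1 code lives):

package main

import (
    "fmt"

    "example.com/euler/p3/v1" // assumed import path for the v1 package above
)

func main() {
    // The problem statement says the largest prime factor of 13195 is 29.
    fmt.Println(v1.LargestPrimeFactor(13195)) // 29
}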
package v1

import (
    "reflect"
    "testing"
)

const NUMBER int = 600851475143
const benchmarkNumber int = 13195
const benchmarkPrime int = 29

func Test_factors(t *testing.T) {
    type args struct {
        n int
    }
    tests := []struct {
        name string
        args args
        want []int
    }{
        {name: "factors of zero", args: args{n: 0}, want: []int{}},
        {name: "factors of 6", args: args{n: 6}, want: []int{2, 3}},
        {name: "factors of a prime, 7", args: args{n: 7}, want: []int{}},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := factors(tt.args.n); !reflect.DeepEqual(got, tt.want) {
                t.Errorf("factors() = %v, want %v", got, tt.want)
            }
        })
    }
}

func Test_isPrime(t *testing.T) {
    type args struct {
        n int
    }
    tests := []struct {
        name string
        args args
        want bool
    }{
        {name: "even, not prime", args: args{n: 8}, want: false},
        {name: "odd, not prime", args: args{n: 15}, want: false},
        {name: "zero, not prime", args: args{n: 0}, want: false},
        {name: "one, not prime", args: args{n: 1}, want: false},
        {name: "a prime number, 7", args: args{n: 7}, want: true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := isPrime(tt.args.n); got != tt.want {
                t.Errorf("isPrime() = %v, want %v", got, tt.want)
            }
        })
    }
}

func Test_LargestPrimeFactor(t *testing.T) {
    type args struct {
        n int
    }
    tests := []struct {
        name string
        args args
        want int
    }{
        {name: "benchmarkNumber", args: args{n: benchmarkNumber}, want: benchmarkPrime},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := LargestPrimeFactor(tt.args.n); got != tt.want {
                t.Errorf("LargestPrimeFactor() = %v, want %v", got, tt.want)
            }
        })
    }
}

func Benchmark_factors(b *testing.B) {
    benchmarks := []struct {
        name  string
        input int
    }{
        {name: "benchmarkNumber", input: benchmarkNumber},
    }
    for _, bm := range benchmarks {
        b.Run(bm.name, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                factors(bm.input)
            }
        })
    }
}

func Benchmark_LargestPrimeFactor(b *testing.B) {
    benchmarks := []struct {
        name  string
        input int
    }{
        {name: "benchmarkNumber", input: benchmarkNumber},
    }
    for _, bm := range benchmarks {
        b.Run(bm.name, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                LargestPrimeFactor(bm.input)
            }
        })
    }
}

func Benchmark_isPrime(b *testing.B) {
    benchmarks := []struct {
        name  string
        input int
    }{
        {name: "benchmarkNumber", input: benchmarkNumber},
    }
    for _, bm := range benchmarks {
        b.Run(bm.name, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                isPrime(bm.input)
            }
        })
    }
}
Note that in the Benchmark_* functions, we are using the sub-test/sub-benchmark format, where we define multiple test cases which can each be run. In the benchmarks we are only using one case, but we could easily add more later if desired, so the flexibility is nice.
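A side benefit of this format (a usage note, with names taken from the code above): individual cases can be selected with the -bench or -run regular expressions, where a slash separates the function name from the case name and spaces in case names become underscores.

$ go test -bench='Benchmark_isPrime/benchmarkNumber' -benchmem
$ go test -run='Test_factors/factors_of_6' -v

Running the full suite gives the following.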
$ go test -bench=. -benchmem
Benchmark_factors/benchmarkNumber-4 3000 416669 ns/op 248 B/op 5 allocs/op
Benchmark_LargestPrimeFactor/benchmarkNumber-4 2000 625321 ns/op 624 B/op 28 allocs/op
Benchmark_isPrime/benchmarkNumber-4 3000 414224 ns/op 248 B/op 5 allocs/op
So far, not great for performance or for memory allocations. Let’s improve.
V2: Skip the even numbers
By the definition of prime, 2 is the only even prime number. Every other even number is divisible by 2, so it has a divisor besides itself and 1 and therefore is not prime. We can modify our solution to skip checking whether these numbers are prime, and see if that speeds up the system. The code is almost the same as the previous version; the difference is an isEven() function that checks the parity of a number.
package v2

// V2:
// We know that 2 is the only even prime, so we can
// exclude even numbers from being checked.

import (
    "math"
)

func isEven(n int) bool {
    return math.Mod(float64(n), 2.0) == 0.0
}

// Ignore even factors since they can never be prime
func factors(n int) []int {
    facs := make([]int, 0)
    for i := 2; i < n; i++ {
        if isEven(i) {
            continue
        }
        if math.Mod(float64(n), float64(i)) == 0.0 {
            facs = append(facs, i)
        }
    }
    return facs
}

// Even numbers can't be prime, so we don't need to check them
func isPrime(n int) bool {
    if n == 0 || n == 1 || isEven(n) {
        return false
    }
    return len(factors(n)) == 0
}

func LargestPrimeFactor(n int) int {
    biggest := 0
    for i := 2; i < n; i++ {
        if isEven(i) { // skip even numbers
            continue
        }
        if math.Mod(float64(n), float64(i)) == 0.0 {
            if isPrime(i) {
                if i > biggest {
                    biggest = i
                }
            }
        }
    }
    return biggest
}
We use the same testing and benchmarking code as in V1, with the addition of a test and benchmark for isEven(). One small adjustment is needed: since factors() now skips even divisors, the "factors of 6" test case must expect []int{3} instead of []int{2, 3}.
func Test_isEven(t *testing.T) {
    type args struct {
        n int
    }
    tests := []struct {
        name string
        args args
        want bool
    }{
        {name: "Even number, 4", args: args{n: 4}, want: true},
        {name: "Even number, 0", args: args{n: 0}, want: true},
        {name: "Even number, -2", args: args{n: -2}, want: true},
        {name: "Odd number, 3", args: args{n: 3}, want: false},
        {name: "Odd number, 1", args: args{n: 1}, want: false},
        {name: "Odd number, -5", args: args{n: -5}, want: false},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := isEven(tt.args.n); got != tt.want {
                t.Errorf("isEven() = %v, want %v", got, tt.want)
            }
        })
    }
}

func Benchmark_isEven(b *testing.B) {
    benchmarks := []struct {
        name  string
        input int
    }{
        {name: "Odd number, 3", input: 3},
    }
    for _, bm := range benchmarks {
        b.Run(bm.name, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                isEven(bm.input)
            }
        })
    }
}
Since we are expecting fewer numbers to be checked, we would expect better performance, right?
$ go test -bench=. -benchmem
Benchmark_factors/benchmarkNumber-4 1000 1903807 ns/op 248 B/op 5 allocs/op
Benchmark_LargestPrimeFactor/benchmarkNumber-4 500 2716207 ns/op 624 B/op 28 allocs/op
Benchmark_isPrime/benchmarkNumber-4 1000 1905843 ns/op 248 B/op 5 allocs/op
Benchmark_isEven/Odd_number,_3-4 50000000 24.7 ns/op 0 B/op 0 allocs/op
WRONG! Though it doesn’t look like our memory allocations have changed, performance is definitely worse than before. But by how much, exactly? For this, we can use the helpful benchcmp command. It’s easy to install with a go get golang.org/x/tools/cmd/benchcmp. Then we can redirect the output of the V1 benchmark and the V2 benchmark to files, and use benchcmp to compare the files.
$ go test -bench=. -benchmem > v1.bm
And in the V2 package directory
$ go test -bench=. -benchmem > v2.bm
Then, with both files in the same directory, we can easily see what’s going on.
$ benchcmp v1.bm v2.bm
benchmark old ns/op new ns/op delta
Benchmark_factors/benchmarkNumber-4 414686 1901842 +358.62%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 623854 2723608 +336.58%
Benchmark_isPrime/benchmarkNumber-4 414616 1907886 +360.16%
benchmark old allocs new allocs delta
Benchmark_factors/benchmarkNumber-4 5 5 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 28 +0.00%
Benchmark_isPrime/benchmarkNumber-4 5 5 +0.00%
benchmark old bytes new bytes delta
Benchmark_factors/benchmarkNumber-4 248 248 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 624 +0.00%
Benchmark_isPrime/benchmarkNumber-4 248 248 +0.00%
V3
It seems that checking the parity of the numbers in order to avoid testing even numbers for primality is actually slower in the new version, presumably because math.Mod on converted floats is an expensive way to test evenness. We know from the deep dive into Project Euler Problem 2 that testing parity with a bitwise AND is much faster, so let’s swap that implementation in and call it V3. There will be no other changes; we just replace the implementation of isEven().
func isEven(x int) bool {
    // The lowest bit of a two's-complement integer is 0 exactly
    // when the number is even, for negative values too.
    return x&1 == 0
}
Now we can use the same tests and benchmarks, and see if we obtained better performance over V2, and therefore hopefully over V1.
$ go test -bench=. -benchmem
Benchmark_factors/benchmarkNumber-4 10000 222491 ns/op 248 B/op 5 allocs/op
Benchmark_LargestPrimeFactor/benchmarkNumber-4 5000 345445 ns/op 624 B/op 28 allocs/op
Benchmark_isPrime/benchmarkNumber-4 10000 223073 ns/op 248 B/op 5 allocs/op
Benchmark_isEven/Odd_number,_3-4 2000000000 0.29 ns/op 0 B/op 0 allocs/op
As before, the bitwise version is much faster for testing parity. Was there any performance improvement over V2, and hopefully over V1?
$ benchcmp v2.bm v3.bm
benchmark old ns/op new ns/op delta
Benchmark_factors/benchmarkNumber-4 2016066 222356 -88.97%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 2915830 358904 -87.69%
Benchmark_isPrime/benchmarkNumber-4 2013975 222206 -88.97%
Benchmark_isEven/Odd_number,_3-4 26.9 0.29 -98.92%
benchmark old allocs new allocs delta
Benchmark_factors/benchmarkNumber-4 5 5 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 28 +0.00%
Benchmark_isPrime/benchmarkNumber-4 5 5 +0.00%
Benchmark_isEven/Odd_number,_3-4 0 0 +0.00%
benchmark old bytes new bytes delta
Benchmark_factors/benchmarkNumber-4 248 248 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 624 +0.00%
Benchmark_isPrime/benchmarkNumber-4 248 248 +0.00%
Benchmark_isEven/Odd_number,_3-4 0 0 +0.00%
Memory usage is the same, but the parity-checking time was reduced by about 99%, so it’s almost free now. Let’s see if that bought us a significant improvement over the first version. Intuitively, we are skipping primality checks on half the numbers (the even ones greater than 2), so we should expect the runtime to drop by about half.
$ benchcmp v1.bm v3.bm
benchmark old ns/op new ns/op delta
Benchmark_factors/benchmarkNumber-4 414686 222356 -46.38%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 623854 358904 -42.47%
Benchmark_isPrime/benchmarkNumber-4 414616 222206 -46.41%
benchmark old allocs new allocs delta
Benchmark_factors/benchmarkNumber-4 5 5 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 28 +0.00%
Benchmark_isPrime/benchmarkNumber-4 5 5 +0.00%
benchmark old bytes new bytes delta
Benchmark_factors/benchmarkNumber-4 248 248 +0.00%
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 624 +0.00%
Benchmark_isPrime/benchmarkNumber-4 248 248 +0.00%
That intuition seems to be correct: we reduced the runtime by roughly half. However, if we think about the properties of prime factors, we can take a different and MUCH faster approach to solving the problem.
V4
When we started out, we were given some number n, and our solution so far checks all the odd numbers from 3 to n-1 to see whether they divide n. For each divisor d found, we then check whether d is prime, and therefore a prime divisor. And to check whether d is prime, we generate all of its factors: if d has no factors between 3 and d-1, inclusive, then d is prime.
The problem is that this primality check does a lot of extra work. The reasoning is this: if a number is NOT prime, then it has at least one nontrivial divisor less than or equal to its square root. Suppose n has a pair of nontrivial divisors a and b with a * b = n. The largest both can simultaneously be is a = b = sqrt(n); if one of them is larger than sqrt(n), the other must be smaller. The two divisors pivot around the square root of n, so the smaller of the two is at most sqrt(n).
To see why this is true, consider the number 100, which has square root 10. In this case we can have a = b = 10; if a goes up to 20, b goes down to 5. So if we want to know whether a number is prime, we only need to check for divisors up to and including its square root.
Additionally, instead of generating all the factors and then checking whether each is prime, we can simply start at 3 (the smallest odd candidate) and work our way up to sqrt(n).
With that in mind, we can change our isPrime() function to check a much smaller set of numbers.
func isPrime(n int) bool {
    if n == 0 || n == 1 || isEven(n) {
        return false
    }
    for i := 3; i <= int(math.Sqrt(float64(n))); i++ {
        if isEven(i) {
            continue
        }
        if math.Mod(float64(n), float64(i)) == 0.0 {
            return false
        }
    }
    return true
}
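One micro-optimization worth noting (a sketch; not part of the benchmarked V4, and the compiler may already be smart about this): the loop condition above recomputes math.Sqrt(float64(n)) on every iteration, so hoisting the bound out of the loop makes that a one-time cost. The isPrimeHoisted name is hypothetical.

func isPrimeHoisted(n int) bool {
    if n == 0 || n == 1 || isEven(n) {
        return false
    }
    limit := int(math.Sqrt(float64(n))) // computed once instead of per iteration
    for i := 3; i <= limit; i++ {
        if isEven(i) {
            continue
        }
        if math.Mod(float64(n), float64(i)) == 0.0 {
            return false
        }
    }
    return true
}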
Now we can run the same set of tests (which pass) and benchmarks again to see how much performance we gained (if any).
$ benchcmp v3.bm v4.bm
ignoring Benchmark_factors/benchmarkNumber-4: before has 1 instances, after has 0
benchmark old ns/op new ns/op delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 358904 223015 -37.86%
Benchmark_isPrime/benchmarkNumber-4 222206 239 -99.89%
Benchmark_isEven/Odd_number,_3-4 0.29 0.29 +0.00%
benchmark old allocs new allocs delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 5 0 -100.00%
Benchmark_isEven/Odd_number,_3-4 0 0 +0.00%
benchmark old bytes new bytes delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 248 0 -100.00%
Benchmark_isEven/Odd_number,_3-4 0 0 +0.00%
Great! We now have 0 memory allocations, which makes things faster, and we check far fewer numbers, which also makes things faster. This shows up as a 99% reduction in runtime for isPrime() and a 38% reduction for finding the largest prime factor, compared to V3.
If we compare to V1, we are down about 64%.
$ benchcmp v1.bm v4.bm
ignoring Benchmark_factors/benchmarkNumber-4: before has 1 instances, after has 0
benchmark old ns/op new ns/op delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 623854 223015 -64.25%
Benchmark_isPrime/benchmarkNumber-4 414616 239 -99.94%
benchmark old allocs new allocs delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 5 0 -100.00%
benchmark old bytes new bytes delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 248 0 -100.00%
Bonus round: V5
If we wanted to squeeze out a bit more performance (without moving on to more advanced methods like the General Number Field Sieve, the fastest known classical algorithm for factoring large general integers), we can again take advantage of the fact that all primes larger than 2 are odd. With this in mind, we only need to check odd divisors, and when checking for primality, we need only check odd candidates. This removes the call to isEven() entirely, but it does make our isPrime() implementation more brittle, since we are now assuming the caller always provides odd numbers greater than 2.
package v5

import (
    "math"
)

// Assumes n is odd and greater than 2; see the caveat above.
func isPrime(n int) bool {
    for i := 3; i <= int(math.Sqrt(float64(n))); i = i + 2 {
        if math.Mod(float64(n), float64(i)) == 0.0 {
            return false
        }
    }
    return true
}

func LargestPrimeFactor(n int) int {
    biggest := 0
    // Check only odd candidate factors. Skipping 2 is safe for any n
    // that has at least one odd prime factor, since that factor is
    // necessarily larger than 2.
    for i := 3; i < n; i = i + 2 {
        if math.Mod(float64(n), float64(i)) == 0.0 {
            // i is a factor, so check if it's prime
            if isPrime(i) {
                if i > biggest {
                    biggest = i
                }
            }
        }
    }
    return biggest
}
Now we start at 3 and increment by 2 on each iteration, so we only ever test odd numbers. The performance is a bit better, but we’re down to marginal gains at this point.
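As an aside, if isPrime() had callers outside this problem, a defensive variant would restore correctness for arbitrary input. This is a sketch (the isPrimeSafe name is hypothetical, and it is not benchmarked here); it also uses Go’s integer operators i*i <= n and n%i == 0 in place of math.Sqrt and math.Mod, avoiding the float conversions entirely.

func isPrimeSafe(n int) bool {
    if n < 2 {
        return false // negatives, 0, and 1 are not prime
    }
    if n == 2 {
        return true // the only even prime
    }
    if n&1 == 0 {
        return false // every other even number is composite
    }
    for i := 3; i*i <= n; i += 2 {
        if n%i == 0 {
            return false
        }
    }
    return true
}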
$ benchcmp v4.bm v5.bm
ignoring Benchmark_isEven/Odd_number,_3-4: before has 1 instances, after has 0
benchmark old ns/op new ns/op delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 223015 201270 -9.75%
Benchmark_isPrime/benchmarkNumber-4 239 231 -3.35%
benchmark old allocs new allocs delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 0 0 +0.00%
Benchmark_isPrime/benchmarkNumber-4 0 0 +0.00%
benchmark old bytes new bytes delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 0 0 +0.00%
Benchmark_isPrime/benchmarkNumber-4 0 0 +0.00%
We get roughly another 10% improvement overall compared to V4. What about compared to our original implementation?
$ benchcmp v1.bm v5.bm
ignoring Benchmark_factors/benchmarkNumber-4: before has 1 instances, after has 0
benchmark old ns/op new ns/op delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 623854 201270 -67.74%
Benchmark_isPrime/benchmarkNumber-4 414616 231 -99.94%
benchmark old allocs new allocs delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 28 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 5 0 -100.00%
benchmark old bytes new bytes delta
Benchmark_LargestPrimeFactor/benchmarkNumber-4 624 0 -100.00%
Benchmark_isPrime/benchmarkNumber-4 248 0 -100.00%
Overall, we are down about 68% for the small benchmark number provided in the challenge. Let’s try a bigger number and see whether the percentages change as the input grows (they should). The random number I first typed was so slow to compute with V1 that I cancelled the benchmark; I’ve replaced it with a random 10-digit number, 8275832758. Here are the results (I removed all the benchmark data that was the same as before).
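(For reference, adding the case is presumably just another entry in the benchmark table from earlier, matching the sub-benchmark name in the output below; something like:)

benchmarks := []struct {
    name  string
    input int
}{
    {name: "benchmarkNumber", input: benchmarkNumber},
    {name: "8275832758", input: 8275832758},
}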
$ benchcmp v1.1.bm v5.1.bm
Benchmark_isPrime/8275832758-4 275850428911 9771163 -100.00%
I tried to do the benchmark for LargestPrimeFactor() as well, but go test died with a SIGQUIT somewhere along the way. Perhaps a bug report is in order, though a likelier explanation (an assumption on my part, not verified) is that the benchmark simply outlived go test’s default 10-minute timeout.
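If the timeout is the culprit, raising or disabling it should let the benchmark finish, eventually:

$ go test -bench=LargestPrimeFactor -benchmem -timeout 0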
Outro
At this point, we aren’t getting much more in the way of performance improvements, so there’s no need to keep tweaking this algorithm. If we want better performance, especially for numbers with many more digits, we will have to change the algorithm and use something more intelligent and parallelizable (there are sieve-based methods for which this is possible).
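For the curious, one classic restructuring of trial division (a sketch; not one of the benchmarked versions above) makes this problem almost instant: divide each factor out of n as soon as it is found. Every factor found this way is automatically prime, since all smaller primes have already been divided out, and n shrinks as you go, so the loop bound drops rapidly. It also uses the integer % operator rather than math.Mod.

package main

import "fmt"

// LargestPrimeFactor divides factors out of n as they are found.
func LargestPrimeFactor(n int) int {
    biggest := 0
    // Divide out all factors of 2 first.
    for n%2 == 0 {
        biggest = 2
        n /= 2
    }
    // Only odd candidates remain; stop once d*d exceeds what's left of n.
    for d := 3; d*d <= n; d += 2 {
        for n%d == 0 {
            biggest = d
            n /= d
        }
    }
    // Whatever remains above 1 is itself prime, and the largest factor.
    if n > 1 {
        biggest = n
    }
    return biggest
}

func main() {
    fmt.Println(LargestPrimeFactor(13195))        // 29
    fmt.Println(LargestPrimeFactor(600851475143)) // completes in microseconds
}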