Part 1: First Attempts
A walkthrough of my idea and some attempts to implement it.
Page Details
Author: Suraj S. Singh
Word Count: 4513 words
First Created: 2023-04-09
License: CC BY-NC-SA 4.0
Tags:
- programming-language
- python
In this post, I will go over some of my attempts to implement a better version of the Sieve of Eratosthenes.
If you would like to read up on the background information for this topic, please see Part 0.
TL;DR Summary
I have an idea to take a range generator and try to yield only prime numbers by figuring out a pattern to skip each multiple of the prime. The hard part is to somehow skip when previous skips are already applied (for example, when skipping multiples of 3, we don't need to consider 6, since we skipped it at 2). I then document 2 attempts at a solution, each of which works for small numbers, but fails at 1331 and 1573. I come to realize that I need to figure out how to find the pattern more mathematically.
Simple Generator Version
To set a baseline, I created a simple generator version adapted from my first iteration of the Sieve exercise for Rust on Exercism.
1
2 # List of primes currently found
3 =
4
5 # For every number from 2 up to the end number
6
7
8 # Note: (number % prime) when converted to boolean
9 # means the same as (number % prime) != 0
10
11 # If for every prime in currently found primes, number % prime != 0...
12
13 # Append it to currently found primes list,
14
15 # and yield the number.
16 yield
17
18 # Once finished, return the the list of found primes.
19 # As this is a generator, this list is the value assigned
20 # to the StopIteration exception
21 return
The goal for my attempts is to be as efficient and accurate as this.
Note 1: This is meant to be the minimal standard version. Improvements can easily be made to this function.
Note 2: This is a generator that also returns a value. One can either take primes as they are generated or wait until the StopIteration exception is raised to get the fully formed list.
Idea Summary
The main idea, as described in part 0, is to take a range object and refine it with a pattern of skips to get all prime numbers out. Both Python and Rust have lazy ranges, meaning the values inside the range are not held in memory, rather they are only produced when they are requested upon. Ranges can be modified with skip and step-by functions to automatically jump over certain values. All of this combined could help build a better generator function, as it would only need to hold the current range it is working on and yield from there. However, this only works if the pattern of skips, or rather pattern of keeping values, is tailored to each successive range so that it produces the appropriate value. In this case, producing only values that are not a multiple of the prime being looked at.
Setup
Imports
1
2
3
- collections from the standard library:
- itertools from the standard library:
- more_itertools library:
peekable
: Allow one item look-ahead for a generator/iterator.
Helper classes and functions
Below are a set of helper functions:
A function that takes a generator and returns the first value and the remaining generator. Think of this as a combined head
function and tail
function starting at the first item from other languages like Haskell and R. I will use this to retrieve the first value while also skipping it in the generator (acts like skip()
).
1
2
3 return ,
4
5 return None,
- A function that takes a generator and a list of how many indices to keep, and then yields that many values, skipping the next value when the pattern changes. This implements the
keeper()
function used above.
1
2 # Keep index when pattern is a list
3 = None
4 # If pattern is None or empty list, then keep everything
5
6 = 0
7
8 =
9
10 =
11 = 0
12
13
14
15
16
17 =
18 # Keep these value
19
20 yield
21 # Skip this one
22
23
24 # Move to next item in pattern if pattern is a list
25 # Cycle back around when it reaches the end
26
27 = %
28
29
30 break
- A function that takes a list and tries to find the shortest repeating pattern length. This is able to also handle partially defined patterns, for example: given $[2,4,5,2,4,5,2]$, it will determine that a length of 3 is enough to repeat, even though the last "pattern" is not fully defined. This does create an issue of how do we know if the pattern is correct (there might not be a pattern). This function does not address this, merely its goal is to find the shortest possible pattern which can satisfy if we assumed it repeated.
1
2 =
3
4 =
5 =
6
7 # Check for any failing chunks
8
9 break
10 # no-break - found length
11 return
12 # Default length is full length
13 return
Attempts
Attempt 1
Below is my first attempt at the problem, which successfully generates primes up to 1330.
The code is encapsulated in a function that has one parameter called final_number
, which is the number to generate the primes up to.
I broke up the code into specific sections, but you can find the full code at the end of the section.
The overall steps are as follows:
- Get the current prime to sieve the current range generator.
- Create a list of multiples of the current prime that are still in the range.
- Use that list of multiples to find a pattern for how many numbers to keep between each multiple.
- Update the range generator to follow the keep pattern found.
- Start again from step 1 if we have not reached the end.
Data Setup
Before the loop, I set up three variables to help out with the process:
already_sieved_primes
: A list of known primes that have already been used.helping_sieve_primes
: A list of prime numbers that still need to be looked at and have their multiples removed.current_running_range
: The dynamic range object that will change as it follows through the loop.
2 # Primes that have already been processed and whose multiples can never appear in the range
3 =
4
5 # Prime numbers that are between p and p^2.
6 # To help find the keep pattern
7 =
8
9 # The range that is running
10 # Starts with unbounded range: 3... step by 1
11 =
The already sieved primes list is there for building the final list to return.
If no return was needed, I could just remove that variable, as it serves no other purpose as of now.
On the other hand, the helping sieve primes list and current running range variables are central to the algorithm.
The helpers inside the list will provide all the primes still needing to go through the loop and will be used to help create the "in-between" items to figure out the pattern for keeping values.
The current range keeps track of all the applied keep functions to make sure that the range is producing the correct value.
Inside the Loop
In the main loop (currently implemented as a while True
), we can go through the process of refining the sieve.
The first thing to do is to figure out the prime number to sieve out (from here, referred to as p
).
In this case, we'll just take out the first prime number from the previous step(s) that still needs to be processed (in the helper list).
The currently running range can also be made peekable to allow looking at one next step.
15 # The current prime to sieve (ref as "p")
16 =
17
18 # Make current range with one look ahead
19 =
With the prime number to sieve determined, we can get all the numbers in the range up to the prime squared (p^2
), as they are guaranteed to be prime (as p^2
is the smallest current composite number in the range).
This will do two things: 1) provide more numbers to help sieve out more multiples of the current prime p
, and 2) move the range up to p^2
and thus reduce the numbers need to "check".
21 # While the next number is smaller than p^2
22
23 # Add next value in range to helping sieve
24
We will then create a list of multiples to remove from the range, using p
and the primes from the helper list.
It is not yet known if these numbers will be enough to determine the pattern, but I will work with the assumption that it will be.
We also need a copy of the current range and a counter.
The copy is to allow getting numbers with the range without actually moving the range.
The counter is to count the numbers that are in between each adjacent multiple.
26 # Create a list of multiples divisible by sieving prime not already removed from before
27 = +
28 # Create a copy of the current range
29 =
, 30 # Make the copied range peekable
31 =
32 # Create a counter to count how many numbers are between two multiples of p
33 =
Now, let's count the numbers between each adjacent set of two multiples and use that to update the counter.
35 # For each start and end point (a number in between list and the next number after)
36
37 # Count how many primes from helping primes list are in between start and end
38
39
40 += 1
41
42 # Skip the number if it is the start or end
43 =
44
45
46 =
47
48 # Count how many items that are between start and end inside the copied range
49
50
51 += 1
52 =
We can then use the values in the counter to create a list of numbers to keep following a specific pattern, using the shortest_periodic_pattern
function to get the most compact pattern.
56 # Skip first in list because already processed
57 =
58
59 # Find the pattern
60 =
61 =
Then we can start finishing up.
We can add p
into the list of already sieved primes and update the max prime with the latest number in the helper list.
If the max prime is bigger than or equal to the final number to calculate, we are done with the loop and so we stop.
63 # Finished with prime so move it to already sieved
64
65
66 # Get the max prime
67 =
68
69 # Check that we have produced enough primes
70
71 break
Otherwise, moving along, we can update the currently running range by removing the first value (should be p^2
) and running the keep-skip function with the pattern.
73 # Remove square of prime for the current range
74 =
, 75
76 # Verify that we have removed the prime squared as the first item
77 assert == ** 2
78
79 # Produce the next range
80 =
Return
Once the loop is finished, return all the primes that have been already sieved as well as the number of primes still in the queue to be processed (up to the final number).
83 return +
Full Code
Attempt 1 Full Code
1
2 # Primes that have already been processed and whose multiples can never appear in the range
3 =
4
5 # Prime numbers that are between p and p^2.
6 # To help find the keep pattern
7 =
8
9 # The range that is running
10 # Starts with unbounded range: 3... step by 1
11 =
12
13 # Run forever (allow infinite range)
14
15 # The current prime to sieve (ref as "p")
16 =
17
18 # Make current range with one look ahead
19 =
20
21 # While the next number is smaller than p^2
22
23 # Add next value in range to helping sieve
24
25
26 # Create a list of multiples divisible by sieving prime not already removed from before
27 = +
28 # Create a copy of the current range
29 =
, 30 # Make the copied range peekable
31 =
32 # Create a counter to count how many numbers are between two multiples of p
33 =
34
35 # For each start and end point (a number in between list and the next number after)
36
37 # Count how many primes from helping primes list are in between start and end
38
39
40 += 1
41
42 # Skip the number if it is the start or end
43 =
44
45
46 =
47
48 # Count how many items that are between start and end inside the copied range
49
50
51 += 1
52 =
53
54
55
56 # Skip first in list because already processed
57 =
58
59 # Find the pattern
60 =
61 =
62
63 # Finished with prime so move it to already sieved
64
65
66 # Get the max prime
67 =
68
69 # Check that we have produced enough primes
70
71 break
72
73 # Remove square for prime from the current range
74 =
, 75
76 # Verify that we have removed the prime squared as the first item
77 assert == ** 2
78
79 # Produce the next range
80 =
81
82
83 return +
Analysis
For my first attempt, the idea seems to be simple enough to follow. Have a range, gather some known primes, use that to generate a pattern for which numbers to keep, update the range, rinse and repeat. Because of this simplicity, it might not be surprising that there are issues with the function.
The main issue I found is that while it works well for small numbers, it fails when it reaches $1331$ (which is $11^3$). When running this, the function allows 1331 to be kept as a prime when it is not. I appear to be missing the repeating pattern for when the range reaches $11$. There likely is an issue with developing the pattern for $11$, either that it is incomplete or there is no pattern for $11$.
There is also the issue of the assertion being broken when sieving $53$, as the first value of the updated range is not its square ($2809$), but rather $2813$. This is likely a carry over for the issue that happens at $11$, or technically just before it. Notice that $53$ is the first prime after $49$, which is the square of $7$. This lends more evidence that something is amiss for the $11$ pattern, as all prime numbers below it are fine, but after that is not.
Besides those two issues, other improvements I could consider would be removing some unnecessary calculations. For example, start with the "evens" already sieved out and have 2 already processed. Maybe also do the same for multiples of 3, as that is also an easy one to process. Another idea might be to include more known primes in the helper list, say by getting all the additional primes between the first and second composite numbers, which could help with finding the pattern.
Attempt 2
My second attempt uses the framework of the previous attempt and tries to improve it in certain ways. I will only go over the changes to the code, all other code snippets should be the same as the previous attempt. See the full attempt towards the end of this section for the full code.
Update Setup
For the set-up before the loop, I chose to start from 3 rather than 2 to help with processing the range and remove one level of the keep function. This means adding a check for the special case of the final number less than 2 (since 2 will have been processed once we reach the loop). With that, 3 moves into the helping sieve list (in preparation for being the next prime to process) and the new range starts from 5 (one above the square of 2) and step up by 2 (skips even numbers).
One other code change I apply is calculating max_prime
before the loop.
Since 2 is already processed, max_prime
can be reasonably defined as 3 at the start (the last number in the primes to be processed).
This will help remove the need to add and conditional check inside the while loop (i.e. max prime check can directly be the condition for the while).
2 # Special case: There are no primes less than 2
3
4 return
5
6 # Primes that have already been processed and whose multiples can never appear in the range
7 =
8
9 # Prime numbers that are between the current prime and its square.
10 # To help find the keep pattern
11 =
12
13 # The range that is running
14 # Starts the unbounded range with 5 and step by 2: [5,7,9,...)
15 =
16
17 =
Update Loop
With the max prime value being initialized before the loop, I can update the while loop to be while max_prime < final_number
(i.e. conversion from a "do-while" to a regular "while" loop).
This update removes the max_prime
check inside the loop.
Then inside, I make a small update to the search for the number of primes. In the attempt before, the loop would stop once it reaches the prime squared. I still do this, but I skip over the prime value squared to run one more loop.
29 # Count of how many numbers (all primes) are between p and p^2
30
31 # Skip the first composite number and stop the loop
32
33
34 break
35 # Count the next prime helper number
36
Once the prime square is skipped, I can also take the number up to the multiplication of the current prime (prime_0
) and the next smallest prime (prime_1
).
These numbers are also guaranteed to be prime because the next composite number is guaranteed to be the prime_0
* prime_1
, since it is the combination of the two smallest distinct primes1.
This should provide more numbers to help with finding the pattern.
38 =
39 # While the next number is smaller than p*p+1, add the next number to helping list
40
41 # Count the next prime helper number
42
Because of the expanded list of helping primes, I updated the between list to include the current prime cubed (as the new final multiple).
45 =
The for-loop after is kept the same. Once all the numbers are found and tallied up, I find the shortest repeating pattern like before using the keep list, but with a small twist. First, the keep list does not exclude the first item anymore, as now two items are already accounted for in the pattern. Instead, once the pattern repetition is found, I have to move the beginning two numbers to the end. This is because getting the prime before the current prime squared has already exhausted the range. Similar for the primes up to current prime times next prime. In order to keep the order correct, this means "rotating" the list to shift everything two units back and move the first two numbers to the end of the pattern.
73 # Skip first in list because already processed
74 =
75
76 # Find the pattern
77 =
78 =
79
80 =
Once the pattern is determined, the current prime can be moved to the already processed list and the current range can be updated. Since the range has removed all numbers till the multiplication of the two lowest primes, I had to update the assertion.
88 # Remove prime_0 * prime_1 from the current range
89 =
, 90
91 # Verify that we have reached prime_0 * prime_1
92 assert == *
After the update, the while-loop should automatically stop once the number of primes reaches the final number (likely slightly faster since there are more numbers in the helper list).
Full Code
Attempt 2 Full Code
1
2
3 # Special case: There are no primes less than 2
4
5 return
6
7 # Primes that have already been processed and whose multiples can never appear in the range
8 =
9
10 # Prime numbers that are between p_0 (currently processing prime) and p_1 (prime just after)
11 # To help find the keep pattern
12 =
13
14 # The range that is running
15 # Starts the unbounded range with 5 and step by 2: [5,7,9,...)
16 =
17
18 =
19
20 # while max prime is less than the final number
21 # (meaning not all primes reached in range of final number)
22
23 # The current prime to sieve (ref as "p" in comments)
24 =
25
26 # Make current range with one look ahead
27 =
28
29 # Count of how many numbers (all primes) are between p and p^2
30
31 # Skip the first composite number and stop the loop
32
33
34 break
35 # Count the next prime helper number
36
37
38 =
39 # While the next number is smaller than p*p+1, add the next number to helping list
40
41 # Count the next prime helper number
42
43
44 # Create a list of multiples divisible by sieving prime not already removed from before
45 =
46 # Create a copy of the current range
47 =
, 48 # Make the copied range peekable
49 =
50 # Create a counter to count how many numbers are between two multiples of p
51 =
52
53 # For each start and end point (a number in between list and the next number after)
54
55 # Count how many primes from helping primes list are in between start and end
56
57
58 += 1
59
60 # Skip the number if it is the start or end
61 =
62
63
64 =
65
66 # Count how many items that are between start and end inside the copied range
67
68
69 += 1
70 =
71
72
73 # Skip first in list because already processed
74 =
75
76 # Find the pattern
77 =
78 =
79
80 =
81
82 # Finished with prime so move it to already sieved
83
84
85 # Get the max prime
86 =
87
88 # Remove prime_0 * prime_1 from the current range
89 =
, 90
91 # Verify that we have reached prime_0 * prime_1
92 assert == *
93
94 # Produce the next range
95 =
96
97 return +
Analysis
With this second attempt finished, I have done slightly better than my attempt 1. Now, it fails at $1573$ ($11^2 * 13$). Also, the assertion now fails at $61$, where the result should be $4087$ ($61*67$), but is actually $4093$ (which is surprisingly prime).
I think I see the problem being just that I don't use the square as another thing to multiply the helper primes by, which should be an easy patch. But this demonstrates a more fundamental issue: I have no idea how the pattern is calculated. While the idea to pre-process the range did help remove certain steps of the calculations and adding more helping primes helped increase the pattern sequence, neither did much for finding the actual pattern.
The big thing here is that I need to figure out how the pattern is derived. Currently, I get it by making a copy of the range and assuming the repetition happens. However, that might not be the case, so it might be time to look into the underlying mathematics of this problem.
Conclusion
So based on these results, did I succeed? While I didn't get to the result of actually getting the generator working, this did provide some helpful insight into how I might approach future attempts. Namely, I am going to need to figure out how best to find the pattern for any given prime and how much information is needed to accomplish that task.
In the next post, I will take a step back and try to review some of the mathematical ideas present for this problem and review other kinds of solutions related to the Sieve of Eratosthenes.
Footnotes
Possibly as an extension of this idea, one might try to see if the third composite number would always be the current prime times the second next prime (which could give even more primes to help). Unfortunately, that is note the case. The smallest counterexample for my created range is $5$: $5 * 11 = 55$ but $7^2 = 49$.