Solution: Evening Destinations
The puzzle statement says upper and lower case are considered distinct. Without this stipulation, one solution would be:
{connecticut, massachusetts, montana, ohio, tennessee}
which has 4 a’s, 4 c’s, 6 e’s, 2 h’s, 2 i’s, 2 m’s, 6 n’s, 4 o’s, 6 s’s, 6 t’s, and 2 u’s.
[Other letters don’t appear; zero is an even number.] As a partial check against overlooking a letter, we count:
4 + 4 + 6 + 2 + 2 + 2 + 6 + 4 + 6 + 6 + 2 = 44 total occurrences, agreeing with a combined 11 + 13 + 7 + 4 + 9 = 44 letters in the names.
Background
The strength of some cryptographic algorithms, such as RSA, relies upon the difficulty of integer factorization.We must know the difficulty of factoring in order to know how large to make cryptographic keys. The fastest known integer factorization algorithm, number field sieve [RSA512], collects hordes of data.This data (typified by the state names) consists of congruences in which one or both sides factor into small primes (typified by the letters). After enough data is collected, the algorithm tries to multiply a subset of these congruences to get squares on both sides. A product is a square precisely when all primes appear to even powers. The alphabet for the Evening Destinations puzzle has 41 letters. Peter Montgomery (Redmond) is tackling problems in which the alphabet has over 10 million letters (really, prime numbers).
Brute Force Attack
With 40 state names, there are 2^40 - 1 ~= 10^12 nonempty subsets to check. If we average 100 instructions to check each subset, we will need 10^14 instructions total.After searching for two days, a modern PC will print out:
{Idaho, Indiana, Massachusetts, Michigan, Mississippi, Missouri, Ohio, Oregon}
We check that these names have a total of 5 + 7 + 13 + 8 + 11 + 8 + 4 + 6 = 62 letters. The individual counts are:
2 I’s, 4 M’s, 2 O’s, 6 a’s, 2 c’s, 2 d’s, 2 e’s, 2 g’s, 4 h’s, 10 i’s, 4 n’s, 4 o’s, 2 p’s, 2 r’s, 10 s’s, 2 t’s, 2 u’s
Pruning the Alphabet
The time complexity of a brute force attack doubles with each additional state name. If a letter occurs only once amongst all state names, then we can eliminate the state having that letter, cutting the estimated time in half. This observation immediately eliminates ten state names.
D Delaware
F Florida
G Georgia
H Hawaii
L Louisiana
P Pennsylvania
U Utah
f California
x Texas
z Arizona
Additional rounds, on the remaining 30 names, eliminate
T Tennessee
v Nevada
w Iowa
N Nebraska
b Alabama
We are left with 25 names out of the original 40:
Alaska Arkansas Colorado Connecticut Idaho Illinois Indiana Kansas Kentucky Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Ohio Oklahoma Oregon Vermont Virginia Washington Wisconsin Wyoming
N.B.Although the letter p appears only in Mississippi, it occurs twice, so we retain Mississippi in our list.
Now there are only 2^25 - 1 ~= 33 million subsets to check.If we average 100 instructions per subset, the brute force attack will finish in a few seconds, which is acceptable. However, the brute force attack is infeasible for large alphabets.
Linear Algebra Formulation
This alphabet has 41 letters after we omit B, E, J, Q, R, S, X, Y, Z, j, q.
Call the others a_1 to a_41.
Let the state names be s_1 to s_40. Set up a 41 x 40 matrix M = (m_{ij}) in which element m_{ij} counts the number of occurrences (perhaps zero) of letter a_i in state s_j. An illustrative matrix M, with columns for the states Alabama (AL), Alaska (AK), and Arkansas (AR), appears at the left below.
Multiplying this illustrative matrix by the vector v = [1, 0, 1]^T adds the first and third columns of the matrix. The output vector Mv counts the total number of occurrences of each letter in the set {Alabama, Arkansas}, the two states corresponding to the 1’s in v.
For example, there are 5 total a’s in these two names.
AL AK AR Letter v Mv
( 1 1 1 ) A ( 2 )
( 3 2 2 ) a ( 1 ) ( 5 )
( 1 0 0 ) b ( 1 )
( 0 1 1 ) k ( 1 )
M = ( 1 1 0 ) l M * ( 0 ) = ( 1 )
( 1 0 0 ) m ( 1 )
( 0 0 1 ) n ( 1 )
( 0 0 1 ) r ( 1 ) ( 1 )
( 0 1 2 ) s ( 2 )
The puzzle asks for a non-empty subset of state names such that all entries in the output vector are even. This is equivalent to finding a vector v = [v_1, ..., v_40] of length 40 with ones and zeros, but not all zeros, such that all entries in the product Mv are even.
Viewing the matrix M and the vector v over the field GF(2) having two elements, we want a nonzero vector v in the null space of M. If M has more columns than rows, then a fundamental theorem of linear algebra ensures us that its column vectors are linearly dependent, and the null space will be non-trivial.
[Its rank is at most the number of rows, so the number of columns exceeds the rank.]
Here we have 41 rows (letters) but only 40 columns (states), so that theorem does not apply. Nonetheless we can apply Gaussian elimination [DEKSA, Algorithm N, Section 4.6.2] to find its null space. This null space has dimension 1, generated by the vector with 1’s for
{Idaho, Indiana, Massachusetts, Michigan, Mississippi, Missouri, Ohio, Oregon}.
The time for Gaussian elimination on an n x n matrix grows with O(n^3), much better than the O(2^n) for brute force. The implied constant in this asymptotic time is small since Gaussian elimination over GF(2) can act on 32 (or 64) bits at a time. Gaussian elimination is convenient for medium-sized alphabets, such as 10000 letters.
For very large alphabets, Gaussian elimination encounters memory problems.That algorithm stores an entire n x n binary matrix, or about n^2 / 8 bytes.When n = 100000, this is 1.25 gigabytes. The Block Lanczos algorithm [PLMBL] stores only the nonzero entries of the (very sparse) matrix rather than a full matrix. The Block Lanczos algorithm is iterative, repeatedly applying the matrix to a vector.
Its time is:
O(n^2 * (average number of nonzero entries per column))
compared to O(n^3) for Gaussian elimination. Block Lanczos has been used for matrices with 13 million rows and columns.
As with the brute force method, it helps to reduce the problem size before we build the matrix.We can eliminate 15 rows (and 15 columns) for the letters appearing only once, as described in the previous section.That leaves a 26 x 25 matrix.
[SCFIL] describes more strategies for reducing the matrix size.
References
[DEKSA]
Donald E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms, 2nd edition, Addison-Wesley, Reading, MA, 1981.
[PLMBL]
Peter L. Montgomery, A Block Lanczos Algorithm for Finding Dependencies over GF(2), pp. 106-120 of Advances in Cryptology — EUROCRYPT ’95, Louis C. Guillou and Jean-Jacques Quisquater (Eds), vol. 121 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1995. Available at ftp.cwi.nl://pub/pmontgom/BlockLanczos.psl.gz .
[RSA512]
RSA-512 paper from Eurocrypt 2000.
[SCFIL]
Stefania Cavallar, Strategies in Filtering in the Number Field Sieve, pp. 209-231 of Algorithmic Number Theory proceedings, Lecture Notes in Computer Science, vol. 1838, Wieb Bosma (Ed.), Springer-Verlag, Berlin, 2000.



