homeworks Homework 3 A list of Tetragrams (compressed) donated to the course by Jim Gillogly
You may benefit from reading draft of the lecture notes  (gzipped), for Shannon's Theory of Secrecy Systems.

More Information on the Hill Climbing Contest

Participation in the contest will be graded by additional 25% depending on your progress. There will be
5 levels of cryptograms (5 points each) encrypted with different keys:

8x8 Grandpre with about 300 letter of ciphertext
8x8 Grandpre with about 150 - 200
9x9 Grandpre with about 400 letters of ciphertext
9x9 Grandpre with about 300 letters of ciphertext
10x10 Grandpre with about 500 - 600 letters of ciphertext

The contest will be opened (problems published) at 18:00 IST and closed  at 20:00. I will mark those who solve some of the
problems after the deadline but will not be able to give formal credit. Submission of solutions is by e-mail.

Clarification for Problem 4B (30.04.2000)
In problem 4.B state the following:

  the number of queries (chosen or known plaintext) [data complexity]
  the amount of steps for the attack                                 [time complexity]
  the probability of success should be at least 50%  and for a significant fraction of all keys (50%-90% of all randomly chosen
  matrices A should be covered by the attack).

Note that it is always possible to solve the problem with few queries just by trying all possible
A matrices ( 26^2! steps) but this was not an intention. Your aim in this problem is to minimize
both the data and the time complexity.

Note  also, that you can access the "encryption/decryption oracle" only before you are presented with the challenge ciphertext.
When you finish all your queries the cipher box is taken from you. Otherwise it would be possible just to query the challenge
ciphertext to the decryption oracle and see the result.

More Clarifications to HW3 (02.05.2000)

2B-2C
You may calculate the UD using the random cipher model. Solving it exactly involves calculation of UD for a simple
substitution in the first place. You may look here how Shannon calculates it for a two letter language (letters 0 and 1).
You may also read pp.693-694 of his paper.

3ABC
Pages 9-10 of the lecture notes   give 3 examples of residue classes for simple substitution, transposition and Vigenere.
A good idea may be to describe residue classes for these ciphers yourself and then read the notes to check that you understood
this notion correctly.

4B
a) Solution should be algorithmic not information theoretic, i.e. you have to show an algorithm (method) of the attack
capable of solving the cipher with the given number of queries, certain time, and estimated probability of correct solution.

b) You may calculate UD to have a lower bound on the number of queries. Note that this problem can be solved with less then
700 chosen queries.

c) You may need to know some properties of a randomly chosen permutation on a set on N inputs:

    average number of cycles:              ln(N) (variance of this number is ln(N))
    probability to have no fixed points:   1/e
    on average the longest cycle contains 62% of all elements.

For more look here.

Contest

I think that a promising direction is not to start climbing from a random point, but to optimize the initial point as much as you can.
For example you may note that most of English texts have long repeated sequences. Finding several 5-7 repeats is not a rare case.
You may assume that at least two 7 letter repeats are present in every problem (it is true for the 8x8 test problem).
So do some pre-processing.

a. Build a 8x8 frequency matrix.
b. Build left-right contact diagram, reach contact relationships may give away vowels or frequent consonants like N,T,R,S.
    Few contacts most likely give away consonants (this may not always work in short texts).
c.  See the most frequent digrams and trigrams, check 10 most plausible substitutions (the, ing, ion, and, her, ere, ent, tha, for,...)
check if this assumption is supported by the other trigrams and the 8x8 frequency matrix.
d. Duplicated letters (71 71, for ex.) have relatively few possibilities in English
e. Look for almost repeats: assume that somewhere in the text two 7 letter words repeat and that they share at least one
homophone. Try to locate the possible pairs and work with the most probable of them (similar letter frequencies,
more homophones coincide) intensively, i.e. use them as initial climbing points. This may shrink the grandpre matrix
by as many as 10-12 letters. (You may try even to check for 10 letter repeats, in some texts you may find these.)
f.Also random stepping function may take too many steps, clever heuristic may be better:
instead of trying all possible values for some homophone (64),  see if you can guess or complete new trigrams. See if new
almost repeats appeared due to shrinking of the matrix in the previous stage.
g. You may have a list of 100 most frequent words or phrases: (is not, by the, in the, on the, at the, of the, if it is, have a, have the, ..)
and give them a try.

These are by far not all the ideas that can be used. Be creative.