More Information on the Hill Climbing Contest
Participation in the contest will be graded by additional 25%
depending on your progress. There will be
5 levels of cryptograms (5 points each) encrypted with different keys:
8x8 Grandpre with about 300 letter of ciphertext
8x8 Grandpre with about 150 - 200
9x9 Grandpre with about 400 letters of ciphertext
9x9 Grandpre with about 300 letters of ciphertext
10x10 Grandpre with about 500 - 600 letters of ciphertext
The contest will be opened (problems published) at 18:00
IST and closed at 20:00. I will
mark those who solve some of the
problems after the deadline but will not be able to give formal credit.
Submission of solutions is by e-mail.
Clarification for Problem 4B (30.04.2000)
In problem 4.B state the following:
the number of queries (chosen or known plaintext) [data complexity]
the amount of steps for the attack
[time complexity]
the probability of success should be at least 50% and
for a significant fraction of all keys (50%-90% of all randomly chosen
matrices A should be covered by the attack).
Note that it is always possible to solve the problem with few queries
just by trying all possible
A matrices ( 26^2! steps) but this was not an intention. Your
aim in this problem is to minimize
both the data and the time complexity.
Note also, that you can access the "encryption/decryption oracle"
only
before you are presented with the challenge ciphertext.
When you finish all your queries the cipher box is taken from you.
Otherwise it would be possible just to query the challenge
ciphertext to the decryption oracle and see the result.
More Clarifications to HW3 (02.05.2000)
2B-2C
You may calculate the UD using the random cipher
model. Solving it exactly involves calculation of UD for a simple
substitution in the first place. You may look
here
how Shannon calculates it for a two letter language (letters 0 and
1).
You may also read pp.693-694
of his paper.
3ABC
Pages 9-10 of the lecture
notes give 3 examples of residue classes for simple substitution,
transposition and Vigenere.
A good idea may be to describe residue classes
for these ciphers yourself and then read the notes to check that you understood
this notion correctly.
4B
a) Solution should be algorithmic not information
theoretic, i.e. you have to show an algorithm (method) of the attack
capable of solving the cipher with the given
number of queries, certain time, and estimated probability of correct solution.
b) You may calculate UD to have a lower bound
on the number of queries. Note that this problem can be solved with less
then
700 chosen queries.
c) You may need to know some properties of a randomly chosen permutation on a set on N inputs:
average number of cycles:
ln(N)
(variance of this number is ln(N))
probability to have no fixed
points: 1/e
on average the longest cycle
contains 62%
of all elements.
For more look here.
Contest
I think that a promising direction is not to start
climbing from a random point, but to optimize the initial point as much
as you can.
For example you may note that most of English
texts have long repeated sequences. Finding several 5-7 repeats is not
a rare case.
You may assume that at least two 7 letter repeats
are present in every problem (it is true for
the 8x8 test problem).
So do some pre-processing.
a. Build a 8x8 frequency matrix.
b. Build left-right contact diagram, reach contact
relationships may give away vowels or frequent consonants like N,T,R,S.
Few contacts most likely give
away consonants (this may not always work in short texts).
c. See the most frequent digrams and trigrams,
check 10 most plausible substitutions (the, ing, ion, and, her, ere, ent,
tha, for,...)
check if this assumption is supported by the
other trigrams and the 8x8 frequency matrix.
d. Duplicated letters (71 71, for ex.) have relatively
few possibilities in English
e. Look for almost repeats: assume that somewhere
in the text two 7 letter words repeat and that they share at least one
homophone. Try to locate the possible pairs and
work with the most probable of them (similar letter frequencies,
more homophones coincide) intensively, i.e. use
them as initial climbing points. This may shrink the grandpre matrix
by as many as 10-12 letters. (You may try even
to check for 10 letter repeats, in some texts you may find these.)
f.Also random stepping function may take too
many steps, clever heuristic may be better:
instead of trying all possible values for some
homophone (64), see if you can guess or complete new trigrams. See
if new
almost repeats appeared due to shrinking of the
matrix in the previous stage.
g. You may have a list of 100 most frequent words
or phrases: (is not, by the, in the, on the, at the, of the, if it is,
have a, have the, ..)
and give them a try.
These are by far not all the ideas that can be used. Be creative.