breaksub

Python implementation of a Markov Chain Monte Carlo method for breaking monoalphabetic substitution ciphers.

prerequisites

Python 3.7+ should work

The only dependency is the `tqdm` module which is optional (can be disabled with the `--no-progress` option)

options

--iters: Number of iterations of Monte Carlo method. Default is 5000

--no-progress: Do not show progress bar. If not specified, `tqdm` module must be installed

--alphabet: Comma separated file containing the alphabet. Default is lowercase English alphabet with period(".") and space(" ")

--letter-probs: Comma separated file containing the probability of occurrence of each letter. If not provided, reasonable default values are used for the default alphabet

--transition-probs: Comma separated file where the j-th number of i-th line is the probability of occurrence of j-th character right after i-th character of the alphabet. If not provided, reasonable default values are used for the default alphabet

It is assumed that input ciphertext consists of characters from the specified alphabet.

usage

use default alphabet with default parameters

./breaksub.py test_data/ciphertext_russel.txt > russel.txt

do 3000 iterations of the method (default is 5000) and do not show progress bar

./breaksub.py test_data/ciphertext_warandpeace.txt --no-progress --iters 3000 > warandpeace.txt

specifying alphabet and algorithm parameters

./breaksub.py --alphabet example_options/alphabet.csv --letter-probs example_options/letter_probabilities.csv --transition-probs example_options/letter_transition_matrix.csv --iters 7000 test_data/ciphertext_paradiselost.txt > paradiselost.txt

from breaksub import *

with open("test_data/ciphertext_warandpeace.txt") as f:
    ciphertext = f.read()

alphabet = default_alphabet() # a-z, space, period
ciphertext = [c for c in ciphertext if c in alphabet] # remove characters that are not in our alphabet

options = DecipherOptions(iters=4000, print_progress=True)
cipher = substitution_decipher(ciphertext, options) # break the cipher
print('cipher is ', cipher)

plaintext = recover_plaintext(cipher, ciphertext) # recover plaintext given the cipher
print(plaintext)

notes

The algorithm is probabilistic so it does not always converge to the optimal solution. Try running it multiple times if the output does not make sense.

The algorithm does not work well if input is too short

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

breaksub

prerequisites

options

usage

notes

Files

README.md

Latest commit

History

README.md

File metadata and controls

breaksub

prerequisites

options

usage

notes