README: add usefull tips from @ubergarm on issue sherjilozair#91
Sébastien Rombauts committed Apr 19, 2017
1 parent b577c96 commit 1aa4b72
Showing 1 changed file, README.md, with 19 additions and 2 deletions.
Inspired by Andrej Karpathy's [char-rnn](https://github.com/karpathy/char-rnn)
## Basic Usage
To train with default parameters on the tinyshakespeare corpus, run `python train.py`. To see all the available parameters, use `python train.py --help`.

To sample from a checkpointed model, run `python sample.py`. This can be run while training is in progress, to check progress as of the last checkpoint.

To continue training after an interruption, or to train for more epochs, run `python train.py --init_from=save`.

## Datasets
You can use any plain text file as input. For example you could download [The complete Sherlock Holmes](https://sherlock-holm.es/ascii/) as such:
`mv cnus.txt input.txt`

Then start training from the top-level directory using `python train.py --data_dir=./data/sherlock/`.

A quick tip for concatenating many small, disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`.
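As a sketch of the same idea, the shell glob can also be expanded directly instead of parsing `ls` output, which misbehaves on filenames containing spaces. The `part1.txt`/`part2.txt` files below are hypothetical stand-ins for a real corpus:

```shell
# Hypothetical sample files standing in for a real corpus.
printf 'first file\n' > part1.txt
printf 'second file\n' > part2.txt

# Merge every .txt file into one training file. Writing to a non-.txt
# name first keeps the output file out of the glob on repeated runs.
cat -- *.txt > input.tmp && mv input.tmp input.txt
```

Either form works; the glob version is simply safer with unusual filenames.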

## Tuning

Tuning your models is kind of a "dark art" at this point. In general:

1. Start with as much clean `input.txt` data as possible, e.g. 50 MiB.
2. Establish a baseline using the default settings.
3. Use TensorBoard to compare all of your runs visually, to aid experimentation.
4. Increase `--rnn_size` somewhat if you have a lot of input data.
5. Increase `--num_layers` from 2 to 3, but no higher unless you have experience.
6. Increase `--seq_length` based on the length of a valid input string
   (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.).
   An LSTM cell can "remember" for durations longer than this sequence, but the effect falls off over longer character distances.
7. Finally, once you have done all of the above, only then add some dropout.
   Start with `--output_keep_prob 0.8`, and move to both `--input_keep_prob 0.8 --output_keep_prob 0.5` only after exhausting all of the steps above.
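Putting the steps above together, a hypothetical follow-up run after a baseline might look like the following. The flag names come from the list above; the specific values (256, 3, 64) are illustrative assumptions, not recommendations from this README:

```shell
# Assemble an illustrative tuning command (values are assumptions, not
# defaults): larger rnn_size, 3 layers, longer sequences, dropout last.
cmd="python train.py --data_dir=./data/sherlock/ \
  --rnn_size 256 --num_layers 3 --seq_length 64 \
  --output_keep_prob 0.8"

# Echo it for inspection rather than launching a long training run here.
echo "$cmd"
```

Compare each such run against the baseline in TensorBoard before changing another flag.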

## Tensorboard
To visualize training progress, model graphs, and internal state histograms, fire up Tensorboard and point it at your `log_dir`.
