README: add useful tips from @ubergarm on issue sherjilozair#91

sherjilozair#91 (comment)
SRombauts · Apr 20, 2017
commit c5aa080 · 1 parent b9b64dc
Showing 1 changed file with 20 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -16,6 +16,10 @@ Inspired from Andrej Karpathy's [char-rnn](https://github.com/karpathy/char-rnn)
 To train with default parameters on the tinyshakespeare corpus, run `python train.py`. To access all the parameters use `python train.py --help`.
 
 To sample from a checkpointed model, `python sample.py`.
+Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on another GPU.
+To force CPU mode, use `export CUDA_VISIBLE_DEVICES=""` (on Windows, `set CUDA_VISIBLE_DEVICES=""`), then `unset CUDA_VISIBLE_DEVICES` afterward.
+
+To continue training after an interruption or to run for more epochs, use `python train.py --init_from=save`.
 
 ## Datasets
 You can use any plain text file as input. For example you could download [The complete Sherlock Holmes](https://sherlock-holm.es/ascii/) as such:
@@ -30,7 +34,22 @@ mv cnus.txt input.txt
 
 Then start training from the top-level directory using `python train.py --data_dir=./data/sherlock/`
 
-A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`
+A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`.
+
+## Tuning
+
+Tuning your models is kind of a "dark art" at this point. In general:
+
+1. Start with as much clean `input.txt` data as possible, e.g. 50 MiB.
+2. Start by establishing a baseline using the default settings.
+3. Use TensorBoard to compare all of your runs visually while experimenting.
+4. Tweak `--rnn_size` up somewhat from 128 if you have a lot of input data.
+5. Tweak `--num_layers` from 2 to 3, but no higher unless you have experience.
+6. Tweak `--seq_length` up from 50 based on the length of a valid input string
+   (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.).
+   An LSTM cell will "remember" across distances longer than this sequence length, but the effect falls off as the character distance grows.
+7. Finally, once you've done all that, only then would I suggest adding some dropout.
+   Start with `--output_keep_prob 0.8` and maybe end up with both `--input_keep_prob 0.8 --output_keep_prob 0.5`, but only after exhausting all the above values.
 
 ## Tensorboard
 To visualize training progress, model graphs, and internal state histograms:  fire up Tensorboard and point it at your `log_dir`.  E.g.: