README: add usefull tips from @ubergarm on issue sherjilozair#91
Sébastien Rombauts committed Apr 19, 2017
1 parent b577c96 commit 1aa4b72
Showing 1 changed file, README.md, with 19 additions and 2 deletions.
Inspired by Andrej Karpathy's [char-rnn](https://github.com/karpathy/char-rnn)
## Basic Usage
To train with default parameters on the tinyshakespeare corpus, run `python train.py`. To see all the available parameters, use `python train.py --help`.

To sample from a checkpointed model, run `python sample.py`. This can be run while training is in progress, to check progress as of the last checkpoint.

To continue training after an interruption, or to train for more epochs, run `python train.py --init_from=save`.

## Datasets
You can use any plain text file as input. For example you could download [The complete Sherlock Holmes](https://sherlock-holm.es/ascii/) as such:
`mv cnus.txt input.txt`

Then start training from the top-level directory using `python train.py --data_dir=./data/sherlock/`.

A quick tip for concatenating many small, disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`.
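As a sketch of the same idea, the shell glob can also be expanded directly instead of parsing `ls` output, which misbehaves on filenames containing spaces. The `part1.txt`/`part2.txt` files below are hypothetical stand-ins for a real corpus:

```shell
# Hypothetical sample files standing in for a real corpus.
printf 'first file\n' > part1.txt
printf 'second file\n' > part2.txt

# Merge every .txt file into one training file. Writing to a non-.txt
# name first keeps the output file out of the glob on repeated runs.
cat -- *.txt > input.tmp && mv input.tmp input.txt
```

Either form works; the glob version is simply safer with unusual filenames.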

## Tuning

Tuning your models is kind of a "dark art" at this point. In general:

1. Start with as much clean `input.txt` data as possible, e.g. 50 MiB.
2. Establish a baseline using the default settings.
3. Use TensorBoard to compare all of your runs visually, to aid experimentation.
4. Increase `--rnn_size` somewhat if you have a lot of input data.
5. Increase `--num_layers` from 2 to 3, but no higher unless you have experience.
6. Increase `--seq_length` based on the length of a valid input string
   (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.).
   An LSTM cell can "remember" for durations longer than this sequence, but the effect falls off over longer character distances.
7. Finally, once you have done all of the above, only then add some dropout.
   Start with `--output_keep_prob 0.8`, and move to both `--input_keep_prob 0.8 --output_keep_prob 0.5` only after exhausting all of the steps above.
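Putting the steps above together, a hypothetical follow-up run after a baseline might look like the following. The flag names come from the list above; the specific values (256, 3, 64) are illustrative assumptions, not recommendations from this README:

```shell
# Assemble an illustrative tuning command (values are assumptions, not
# defaults): larger rnn_size, 3 layers, longer sequences, dropout last.
cmd="python train.py --data_dir=./data/sherlock/ \
  --rnn_size 256 --num_layers 3 --seq_length 64 \
  --output_keep_prob 0.8"

# Echo it for inspection rather than launching a long training run here.
echo "$cmd"
```

Compare each such run against the baseline in TensorBoard before changing another flag.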

## Tensorboard
To visualize training progress, model graphs, and internal state histograms, fire up Tensorboard and point it at your `log_dir`.
