# CFG Parameters in the `[net]` section

## `[net]` section
- `batch=1` - number of samples (images, letters, ...) that will be processed in one batch
- `subdivisions=1` - number of mini-batches in one batch, `mini_batch = batch / subdivisions`, so the GPU processes `mini_batch` samples at once, and the weights will be updated for `batch` samples (1 iteration processes `batch` images) - see the sketch after this list
- `width=416` - network size (width), so every image will be resized to the network size during Training and Detection
- `height=416` - network size (height), so every image will be resized to the network size during Training and Detection
- `channels=3` - network size (channels), so every image will be converted to this number of channels during Training and Detection
- `inputs=256` - network size (inputs), used for non-image data: letters, prices, any custom data
- `max_chart_loss=20` - max value of Loss shown in the image `chart.png`
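
The arithmetic behind `batch` and `subdivisions` is easy to check by hand. Below is a minimal Python sketch; the values of `batch`, `subdivisions` and the dataset size are illustrative, not taken from any particular cfg file.

```python
# Minimal sketch of how batch, subdivisions and mini_batch relate
# (illustrative values, not from any particular .cfg).
batch = 64              # samples per weight update (1 iteration)
subdivisions = 16       # number of chunks the batch is split into
dataset_size = 118_000  # hypothetical number of training images

mini_batch = batch // subdivisions           # samples on the GPU at once
iterations_per_epoch = dataset_size / batch  # weight updates per epoch

print(mini_batch)                    # 4
print(round(iterations_per_epoch))   # ~1844
```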

## For training only

### Contrastive loss
- `contrastive=1` - use Supervised contrastive loss for training the Classifier (should be used with a `[contrastive]` layer) - a reference sketch of this loss follows this list
- `unsupervised=1` - use Unsupervised contrastive loss for training the Classifier on images without labels (should be used together with the `contrastive=1` parameter and a `[contrastive]` layer)
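
As a point of reference, the sketch below implements the standard supervised contrastive loss (Khosla et al., 2020) over L2-normalized embeddings; the exact formulation inside darknet's `[contrastive]` layer may differ, so treat this only as an illustration of the loss family these flags enable.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.07):
    """Standard supervised contrastive loss over embeddings z (N x D);
    reference sketch only, darknet's [contrastive] layer may differ."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / tau                                # scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # drop self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]           # same-class mask
    np.fill_diagonal(pos, False)
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()

# toy usage: 8 embeddings, 4 classes
z = np.random.randn(8, 16)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
print(supcon_loss(z, labels))
```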

### Data augmentation
- `angle=0` - randomly rotates images during training (classification only)
- `saturation = 1.5` - randomly changes the saturation of images during training
- `exposure = 1.5` - randomly changes the exposure (brightness) of images during training
- `hue=.1` - randomly changes the hue (color) of images during training https://en.wikipedia.org/wiki/HSL_and_HSV
- `blur=1` - blur will be applied randomly 50% of the time: if `1`, the background (everything except objects) will be blurred with `blur_kernel=31`; if `>1`, the whole image will be blurred with `blur_kernel=blur` (only for detection, and only if OpenCV is used) - see the sketch after this list
- `min_crop=224` - minimum size of a randomly cropped image (classification only)
- `max_crop=448` - maximum size of a randomly cropped image (classification only)
- `aspect=.75` - the aspect ratio can be changed during cropping, from `0.75` to `1/0.75` (classification only)
- `letter_box=1` - keeps the aspect ratio of loaded images during training (detection training only; to use it during detection/inference, add the flag `-letter_box` at the end of the detection command)
- `cutmix=1` - use CutMix data augmentation (for the Classifier only, not for the Detector)
- `mosaic=1` - use Mosaic data augmentation (combines 4 images into one)
- `mosaic_bound=1` - limits the size of objects when `mosaic=1` is used (does not allow bounding boxes to leave the borders of their images when Mosaic data augmentation is used)
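
The `blur` behaviour described above can be re-created outside darknet with OpenCV; the sketch below is a simplified illustration (the function name, the placement of the 50% coin flip, and the odd-kernel rounding are assumptions, not darknet's actual code).

```python
import random
import cv2
import numpy as np

def apply_blur(img, boxes, blur):
    """Simplified illustration of the `blur` parameter; `boxes` are
    (x1, y1, x2, y2) pixel rectangles of labelled objects."""
    if blur <= 0 or random.random() < 0.5:        # applied ~50% of the time
        return img
    if blur == 1:                                 # blur background, keep objects
        out = cv2.GaussianBlur(img, (31, 31), 0)  # blur_kernel=31
        for x1, y1, x2, y2 in boxes:
            out[y1:y2, x1:x2] = img[y1:y2, x1:x2]
        return out
    k = blur if blur % 2 == 1 else blur + 1       # OpenCV needs an odd kernel
    return cv2.GaussianBlur(img, (k, k), 0)       # blur the whole image

img = np.zeros((416, 416, 3), dtype=np.uint8)
print(apply_blur(img, [(100, 100, 200, 200)], blur=1).shape)
```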

Data augmentation in the last `[yolo]`-layer:

- `jitter=0.3` - randomly changes the size of the image and its aspect ratio from x`(1 - 2*jitter)` to x`(1 + 2*jitter)`
- `random=1` - randomly resizes the network size after each 10 batches (iterations) from `/1.4` to `x1.4`, keeping the initial aspect ratio of the network size - both effects are illustrated in the sketch after this list
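
To make the two scaling ranges above concrete, here is a small sketch; rounding the resized network dimensions to a multiple of 32 is an assumption made for illustration, not something stated above.

```python
import random

def jitter_scale_range(jitter=0.3):
    # image size / aspect ratio is scaled by a factor in this range
    return 1 - 2 * jitter, 1 + 2 * jitter

def random_network_resize(width=416, height=416, multiple=32):
    # with random=1, every 10 iterations the network is resized by a factor
    # between 1/1.4 and 1.4, keeping the initial aspect ratio
    # (rounding to a multiple of 32 is an assumption for this sketch)
    scale = random.uniform(1 / 1.4, 1.4)
    new_w = max(multiple, int(round(width * scale / multiple)) * multiple)
    new_h = max(multiple, int(round(new_w * height / width / multiple)) * multiple)
    return new_w, new_h

print(jitter_scale_range(0.3))    # (0.4, 1.6)
print(random_network_resize())    # e.g. (352, 352) or (544, 544)
```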

- `adversarial_lr=1.0` - changes all detected objects to make them unlike themselves from the neural network's point of view (the neural network performs an adversarial attack on itself)
- `attention=1` - shows points of attention during training
- `gaussian_noise=1` - adds Gaussian noise - a minimal sketch follows this list
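
A minimal sketch of Gaussian-noise augmentation is shown below; the noise strength (`sigma`) is a made-up value, since the text above does not specify how strong darknet's noise is.

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # sigma is an illustrative value; darknet's actual noise strength may differ
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

img = np.full((416, 416, 3), 128, dtype=np.uint8)
noisy = add_gaussian_noise(img)
print(noisy.dtype, float(noisy.mean()))
```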

### Optimizer
- `momentum=0.9` - accumulation of movement: how much the history affects the further change of weights (optimizer)
- `decay=0.0005` - a weaker updating of the weights for typical features, it counteracts imbalance in the dataset (optimizer) http://cs231n.github.io/neural-networks-3/
- `learning_rate=0.001` - initial learning rate for training
- `burn_in=1000` - a burn-in ramp is applied for the first 1000 iterations: `current_learning_rate = learning_rate * pow(iterations / burn_in, power) = 0.001 * pow(iterations/1000, 4)`, where `power=4` by default
- `max_batches = 500200` - training will be run for this number of iterations (batches)
- `policy=steps` - policy for changing the learning rate: `constant` (by default), `sgdr`, `steps`, `step`, `sig`, `exp`, `poly`, `random` (e.g., if `policy=random`, the current learning rate will be changed as `learning_rate * pow(rand_uniform(0,1), power)`)
- `power=4` - if `policy=poly`, the learning rate will be `learning_rate * pow(1 - current_iteration / max_batches, power)`
- `sgdr_cycle=1000` - if `policy=sgdr`, the initial number of iterations in the cosine cycle
- `sgdr_mult=2` - if `policy=sgdr`, the multiplier for the cosine cycle https://towardsdatascience.com/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163
- `steps=8000,9000,12000` - if `policy=steps`, at these iteration numbers the learning rate will be multiplied by the corresponding `scales` factor
- `scales=.1,.1,.1` - if `policy=steps`, e.g. with `steps=8000,9000,12000`, `scales=.1,.1,.1` and the current iteration number `10000`, then `current_learning_rate = learning_rate * scales[0] * scales[1] = 0.001 * 0.1 * 0.1 = 0.00001` - the burn-in, steps and poly schedules are sketched after this list
- `label_smooth_eps=0.1` - use label smoothing for training the Classifier
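
The learning-rate schedules described above boil down to a few formulas; the sketch below re-implements only the `burn_in`, `steps` and `poly` cases (the remaining policies are omitted, and this is an illustration rather than darknet's actual code).

```python
# Minimal sketch of the learning-rate schedules described above
# (simplified; not darknet's exact code).

def current_lr(iteration, learning_rate=0.001, burn_in=1000, power=4,
               policy="steps", max_batches=500200,
               steps=(8000, 9000, 12000), scales=(0.1, 0.1, 0.1)):
    # burn-in ramp for the first `burn_in` iterations
    if iteration < burn_in:
        return learning_rate * (iteration / burn_in) ** power
    if policy == "steps":
        lr = learning_rate
        for step, scale in zip(steps, scales):
            if iteration >= step:
                lr *= scale
        return lr
    if policy == "poly":
        return learning_rate * (1 - iteration / max_batches) ** power
    return learning_rate   # constant and the remaining policies omitted

print(current_lr(500))                      # burn-in: 0.001 * (500/1000)**4
print(current_lr(10000))                    # steps:   0.001 * 0.1 * 0.1 = 1e-05
print(current_lr(250100, policy="poly"))    # poly:    ~6.25e-05
```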

## For training Recurrent networks

- Object Detection/Tracking on Video - if `[conv-lstm]` or `[crnn]` layers are used in addition to `[connected]` and `[convolutional]` layers
- Text generation - if `[lstm]` or `[rnn]` layers are used in addition to `[connected]` layers
- `track=1` - if set to `1`, training will be performed in a recurrent style for image sequences
- `time_steps=16` - training will be performed on a random image sequence that contains 16 images from the `train.txt` file
  - for `[convolutional]` layers: `mini_batch = time_steps*batch/subdivisions`
  - for `[conv_lstm]` recurrent layers: `mini_batch = batch/subdivisions` and `sequence=16`
- `augment_speed=3` - if set to `3`, then every 1st, 2nd or 3rd image can be used randomly, i.e. the 16 images can have indexes `0, 1, 2, ... 15` or `110, 113, 116, ... 155` from the `train.txt` file
- `sequential_subdivisions=8` - a lower value increases the sequence of images: if `time_steps=16 batch=16 sequential_subdivisions=8`, then `time_steps*batch/sequential_subdivisions = 16*16/8 = 32` sequential images will be loaded with the same data augmentation, so the model will be trained on sequences of 32 video frames
- `seq_scales=0.5, 0.5` - increases the sequence of images at some steps, i.e. the coefficients by which the original `sequential_subdivisions` value will be multiplied (and `batch` will be divided, so the weights will be updated more rarely) at the corresponding `steps`, if `policy=steps` or `policy=sgdr` is used - the arithmetic for these parameters is sketched below
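
Putting the recurrent-training parameters together, the sketch below reproduces the arithmetic described above; the concrete values, the random start offset, and the index-selection logic are illustrative simplifications, not darknet's actual loader code.

```python
import random

# Illustrative arithmetic for the recurrent-training parameters above
# (simplified; not darknet's exact loader code).
batch, subdivisions = 16, 8
time_steps = 16
sequential_subdivisions = 8
augment_speed = 3

# mini-batch sizes as described above
conv_mini_batch = time_steps * batch // subdivisions    # [convolutional] layers
lstm_mini_batch = batch // subdivisions                 # [conv_lstm] layers (sequence=time_steps)

# number of sequential frames loaded with the same augmentation
sequence_len = time_steps * batch // sequential_subdivisions   # 16*16/8 = 32

# augment_speed: pick a random stride of 1..augment_speed and a random start,
# then take `time_steps` frame indexes from train.txt
stride = random.randint(1, augment_speed)
start = random.randint(0, 1000)                         # hypothetical offset
indexes = [start + i * stride for i in range(time_steps)]

print(conv_mini_batch, lstm_mini_batch, sequence_len)   # 32 2 32
print(indexes)
```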