diff --git a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/document.css b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/document.css
index e27f060..9003fdf 100644
--- a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/document.css
+++ b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/document.css
@@ -32,10 +32,12 @@
 .element,
 .expression,
 .file,
+.function,
 .menu,
 .package,
 .parameter,
 .tab,
+.variable,
 .window {
   font-family: monospace;

diff --git a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-categories.xml b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-categories.xml
index ad32359..def21c0 100644
--- a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-categories.xml
+++ b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-categories.xml
@@ -3,15 +3,253 @@
+
- Categories...
+ Categories allow captions to be grouped in a manner that lets the application assist with keeping image
+ captioning consistent.
-
+
+ When adding captions to images for use in training models such as LORAs, it is important to keep captions
+ consistent. Consistent in this case means avoiding false positive and false negative captions. To understand
+ what these terms mean and why this is important, it is necessary to understand how image training processes
+ typically work.
+
+ Let m be an existing text-to-image model that we're attempting to fine-tune. Let generate(k, p) be a function
+ that, given a model k and a text prompt p, generates an image. For example, if the model m knows about the
+ concept of laurel trees, then we'd hope that generate(m, "laurel tree") would produce a picture of a laurel
+ tree.
+
+ Let's assume that m has not been trained on pictures of rose bushes and doesn't know what a rose bush is. If
+ we evaluate generate(m, "rose bush"), then we'll just get arbitrary images that likely don't contain rose
+ bushes. We want to fine-tune m by producing a LORA that introduces the concept of rose bushes. We produce a
+ large dataset of images of rose bushes, and caption each image with (at the very least) the caption
+ "rose bush".
+
+ The training process then steps through each image i in the dataset and performs the following steps:
+
+ 1. Take the set of captions provided for i and combine them into a prompt p. The exact means by which the
+    captions are combined into a prompt is typically a configurable aspect of the training method. In
+    practice, the most significant caption ("rose bush") would be the first caption in the prompt, and all
+    other captions would be randomly shuffled and concatenated onto the prompt.
+ 2. Generate an image g with g = generate(m, p).
+ 3. Compare the images g and i. The differences between the two images are what the fine-tuning of the model
+    will learn.
+
+ In our training process, assuming that we've properly captioned the images in our dataset, we would hope that
+ the only significant difference between g and i at each step would be that i would contain an image of a rose
+ bush, and g would not. This would, slowly, cause the fine-tuning of the model to learn what constitutes a
+ rose bush.
+
+ Stepping through the entire dataset once and performing the above steps for each image is known as a single
+ training epoch. It will take most training processes multiple epochs to actually learn anything significant.
+ In practice, the model m can conceptually be considered to be updated on each training step with the new
+ information it has learned. For the sake of simplicity of discussion, we ignore this aspect of training here.
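+
+ To make the process concrete, the following is a minimal Python sketch of a single training epoch as
+ described above. The functions generate, image_difference, fine_tune, and load_image are hypothetical
+ stand-ins for the corresponding stages of a real training framework, not an actual API:
+
+   import random
+
+   def training_epoch(m, dataset):
+       """One pass over a dataset of {image_path: [captions]} entries."""
+       for i, captions in dataset.items():
+           # The most significant caption leads the prompt; all other
+           # captions are randomly shuffled and concatenated onto it.
+           primary, rest = captions[0], list(captions[1:])
+           random.shuffle(rest)
+           p = ", ".join([primary] + rest)
+           # Generate an image from the current model and the prompt.
+           g = generate(m, p)
+           # The differences between the generated image g and the
+           # dataset image i are what the fine-tuning learns.
+           m = fine_tune(m, image_difference(g, load_image(i)))
+       return m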
+
+ Given the above process, we're now equipped to explain the concepts of false positive and false negative
+ captions.
+
+ A false positive caption is a caption that's accidentally applied to an image when that image does not
+ contain the object being captioned. For example, if an image does not contain a red sofa, and a caption
+ "red sofa" is provided, then the "red sofa" caption is a false positive.
+
+ To understand why a false positive caption is a problem, consider the training process described above.
+ Assume that our original model m knows about the concept of "red sofas".
+
+ 1. The image i does not contain a red sofa. However, one of the captions provided for i is "red sofa", and
+    so the prompt p contains the caption "red sofa".
+ 2. An image g is generated with g = generate(m, p). Because p contains the caption "red sofa", the generated
+    image g will likely contain a red sofa.
+ 3. The process compares the images g and i. The source image i doesn't contain a red sofa, but the generated
+    image g almost certainly does. The system then, essentially, erroneously learns that it should be adding
+    red sofas to images!
+
+ Similarly, a false negative caption is a caption that's accidentally not applied to an image when it really
+ should have been. To understand how this might affect training, consider the training process once again:
+
+ 1. The image i contains a red sofa. However, none of the captions provided for i are "red sofa", and so the
+    prompt p does not contain the caption "red sofa".
+ 2. An image g is generated with g = generate(m, p). Because p does not contain the caption "red sofa", the
+    generated image g will probably not contain a red sofa.
+ 3. The process compares the images g and i. The source image i contains a red sofa, but the generated image
+    g almost certainly does not. The system then, essentially, erroneously learns that it should be removing
+    red sofas from images!
+
+ In practice, false negative captions happen much more frequently than false positive captions. The reason
+ for this is that it is impractical to know all of the concepts known to the model being trained, and
+ therefore it's impractical to know which concepts the model can tell are missing from the images it
+ inspects.
+
+ Given the above understanding of false positive and false negative captions, the following best practices
+ can be inferred for captioning datasets:
+
+ 1. Include a single primary caption at the start of the prompt of every image in the dataset. This primary
+    caption is effectively the name of the concept that you are trying to teach to the model. The reason for
+    this follows from an understanding of the training process: by making the primary caption prominent and
+    ubiquitous, the system should learn to primarily associate the image differences with this caption.
+ 2. Caption all elements of an image that you do not want the model to associate with your primary caption.
+    This helps ensure that the captioned objects do not show up as differences between the images,
+    differences that the training process would otherwise erroneously learn.
+ 3. Be consistent in your captioning between images with respect to which aspects of the image you caption.
+    For example, if in one of your images you caption the lighting or the background colour, then you should
+    caption the lighting and background colour in all of the images. This assumes, of course, that you are
+    not trying to teach the model about lighting or background colours! This practice is, ultimately, about
+    reducing false negatives.
+
+ In our example training process above, we should use "rose bush" as the primary caption for each of our
+ images, and we should caption the objects in each image that are not rose bushes (for example, "grass",
+ "soil", "sky", "afternoon lighting", "outdoors", and so on).
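+
+ For example, a dataset following these practices might caption its images as follows (the file names are
+ purely illustrative): the primary caption "rose bush" always comes first, and incidental aspects such as
+ lighting and surroundings are captioned in every image, not just some.
+
+   dataset = {
+       "image_001.png": ["rose bush", "grass", "soil", "afternoon lighting", "outdoors"],
+       "image_002.png": ["rose bush", "sky", "grass", "dramatic lighting", "outdoors"],
+   }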
+
+ When a category is marked as required, each image in the dataset must contain one or more captions from that
+ category.
+
- Required...
+ Unlike captions, which can share their meanings across different datasets, categories are a tool used to
+ help ensure consistent captioning within a single dataset. It is up to users to pick suitable categories for
+ their captions in order to ensure that they caption their images in a consistent manner. A useful category
+ for most datasets, for example, is "lighting". Assign captions such as "dramatic lighting",
+ "outdoor lighting", and so on, to a required "lighting" category. The validation process will then fail if a
+ user has forgotten to caption lighting in one or more images.
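+
+ As a rough sketch of the check that the validation process performs, assuming the simple in-memory
+ representation used in the earlier example (the real application's data model differs):
+
+   def validate_required(dataset, required_categories):
+       # For each image, every required category must contribute at
+       # least one caption; return a list of human-readable failures.
+       errors = []
+       for image, captions in dataset.items():
+           for category, members in required_categories.items():
+               if not any(c in members for c in captions):
+                   errors.append(
+                       f"{image}: no caption from required category '{category}'")
+       return errors
+
+   required_categories = {
+       "lighting": {"dramatic lighting", "outdoor lighting", "afternoon lighting"},
+   }
+
+ With the example dataset shown earlier, validate_required(dataset, required_categories) returns an empty
+ list, because both images carry a caption from the required "lighting" category.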
diff --git a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-validation.xml b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-validation.xml
new file mode 100644
index 0000000..9b52314
--- /dev/null
+++ b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/i-validation.xml
@@ -0,0 +1,9 @@
+
+ Validation...
+
diff --git a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/implementation.xml b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/implementation.xml
index 9a5cafc..fa91f24 100644
--- a/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/implementation.xml
+++ b/com.io7m.laurel.documentation/src/main/resources/com/io7m/laurel/documentation/implementation.xml
@@ -9,4 +9,5 @@
+