ComicEater

A collection of utilities for comic book archive management. Currently this is in alpha testing (AKA perfectly stable for me but YMMV).

Including the following:

Supports Rar, Rar5, Zip, 7z, CBZ, CBR, CB7
Verify archives aren't corrupt using 7Zip
Verify images aren't corrupt using ImageMagick
Option to upscale comic books with machine learning to higher resolutions using waifu2x-ncnn-vulkan
Option to remove white space on margins
Option to split double pages into individual pages
Download metadata (from sources like AniList)
Download covers for use in Komga
Converts all of your archives to the standard CBZ, comic book zip format
Rename and group files according to downloaded metadata
Split nested archives into individual archives
Intelligently split archives based on volume count, eg a single archive named Jojo v1-3 could be split into 3 separate archives
Convert a folder of images into an archive - intelligently splitting them according to their volume number
Easy to understand pattern matching to standardize naming across your library
Conversion of archives into individual folders so they can be recognized as a series in Komga
Removes distributor bloat, like URL file links to their site
Supports a Queue folder that can be used to automatically convert archives on a cron job
Moves failed conversions to a maintenance folder so you can manually fix and rerun any failed jobs

Examples

Upscale before and after:

For a more accurate comparison view here.

White space trim:

Double page split and white space trim:

It keeps the inner margin intact, to indicate which page is the inner book binding. Here's an inner book binding example I tweeted about.

Converts nested archives

Original archive:

Split archive with downloaded covers:

Archives read as series in Komga now with metadata from AniList:

Archives read as series in Tachiyomi now with Komga:

Support

If you're into this sort of thing, you might be interested in my podcast or the games I stream:

You can get support here: Discord

If you find my tools useful please consider supporting via Patreon.

Install

The app's current state requires both Windows and WSL.

The app currently only supports being run from the source code, though I'm open to pull requests to dockerize it or remove the Windows dependency. All dependencies are WSL specific, but all paths are input as Windows paths for convenience.

The base functionality requires the following to be installed:

sudo apt-get install p7zip imagemagick unrar

Also make sure the version of Node.js specified in the .nvmrc (found in the project root) is installed. Currently this is 16.6.2. I recommend using nvm.

Install yarn:

npm install --global yarn

In the project root folder execute:

yarn

Cover Image Fetching

Puppeteer is also an internal requirement for downloading cover images, so your system may require additional dependencies. On Ubuntu 20.04 I had to install these:

sudo apt install -y libx11-xcb1 libxcomposite1 libxcursor1 libxdamage1 libxi-dev libxtst-dev libnss3 libcups2 libxss1 libxrandr2 libasound2 libatk1.0-0 libatk-bridge2.0-0 libpangocairo-1.0-0 libgtk-3-0 libgbm1

AVIF Support

ImageMagick's convert --version | grep -i heic needs to show heic in the delegates list. This was a pain to set up, but if you follow these instructions on a modern version of Ubuntu you might get it working too. https://stackoverflow.com/a/66116056

In addition, if you want WEBP support, you might have to run sudo apt install build-essential pkg-config webp libwebp-dev libwebp7 before the commands listed in that Stack Overflow post. convert --version has to have your image formats listed, including webp if you want to be able to use webp files in your comics.

If you're building ImageMagick from source you also have to install these other libraries to get jpg working 🫠 https://gist.github.com/nickferrando/fb0a44d707c8c3efd92dedd0f79d2911

Customize your config file

I highly recommend reading through the Config section, then downloading my config and adjusting it as needed. This is the only part you should have to use your brain on ;).

You can now run any of the commands below from WSL!

Warning

~~I recommend running the commands from bash, not zsh, as zsh can crash WSL when run for long periods with a lot of text output.~~ It is 2023 and I no longer believe this to be the case.
Make sure you have a backup of any archives you put in your queueFolders. I've run this with ~~hundreds~~ thousands of archives now, so it does work well, but there could be bugs. I make no guarantees it will work well for you.

Commands

CLI Logging

You can use the -v flag with any command to change the log level. More v's means more verbose output. I recommend running all commands with -vv to see info logging, so you can see how many succeeded and at what steps.

Errors should always be logged.

-v: Shows WARN level

-vv: Shows INFO level

-vvv: Shows DEBUG level
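
As a rough illustration of the levels above, the mapping from the number of v's to a log level could look like the sketch below. The function name logLevelFromArgs and the clamping of extra v's to DEBUG are assumptions made for this example, not ComicEater's actual code.

```javascript
// Hypothetical helper: maps the number of v's in a -v flag to a log level.
// Errors are always logged, so ERROR is the level when no flag is given.
const LEVELS = ['ERROR', 'WARN', 'INFO', 'DEBUG'];

function logLevelFromArgs(argv) {
  // Find the first argument that looks like -v, -vv, -vvv, ...
  const flag = argv.find((arg) => /^-v+$/.test(arg));
  const count = flag ? flag.length - 1 : 0;
  // Assumption: more than three v's still resolves to DEBUG.
  return LEVELS[Math.min(count, LEVELS.length - 1)];
}

console.log(logLevelFromArgs(['-vv', '--maintainCollection'])); // INFO
```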

Maintain Collection

Description

It converts archives from the seriesFolders' queueFolders to CBZs, then converts them to series and updates their metadata.

Option

--configFile

--maintainCollection

--offline: Don't download metadata or use downloaded metadata for file renaming

Download Cover options are also valid.

Enhancement options are also valid.

Template

yarn main --configFile "<configFile>" --maintainCollection

Example

yarn main -vv --configFile "W:\Collection\ComicEater.test.yml" --maintainCollection

Flow

☑ Get all archives at queueFolders and move them to the maintenanceFolders
☑ Convert them to CBZ (See the Convert to CBZ Flow)
☑ Use folderPatterns to gather metadata from the folder about the files
☑ Use the filePatterns to gather data about the files
☑ Search remote sources for any additional metadata
☑ Download Covers

Convert to CBZ Flow

☑ Get all archives from the path (if maintainCollection, this will be your queueFolders)
☑ Get all image folders in path
☑ Test that the archives are valid archives with 7z t
☑ Get volumeRange from filePatterns to infer if multiple volumes are present
☑ Extract archive in current directory
☑ Recursively check for nested archives, and apply each of the following steps to each archive.
☑ Remove archive distributor bloat per user config (links to tracker etc.)
☑ Validate that there are images present in extracted archives
☑ Validate that images are valid using ImageMagick by doing a transform to a 5x5 image - Currently requires writing them to a /tmp/ directory that is automatically cleaned up after the test is run
☑ If multiple volumes are present, see if the parent of the image containing subfolder count matches, and if it does, consider each subfolder as a separate volume
☑ If --trimWhiteSpace is present, run trim through ImageMagick
If --upscale is present, run the content through waifu2x
☑ If --splitPages is present, cut each page into two
☑ Repack images
☑ If nested archives exist, flatten all nested archives in place of the original
☑ If there were no errors, remove the extracted working directory
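
The multi-volume step in the flow above (matching the subfolder count against the volume range) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name assignVolumesToSubfolders, the shape of volumeRange, and the idea that sorting folder names gives volume order are all hypothetical and may differ from what ComicEater does internally.

```javascript
// Hypothetical sketch: pair each subfolder with a volume number when the
// subfolder count equals the number of volumes in the range.
function assignVolumesToSubfolders(volumeRange, subfolders) {
  const { start, end } = volumeRange;
  const volumeCount = end - start + 1;
  if (volumeCount < 2 || subfolders.length !== volumeCount) {
    return null; // counts differ: do not split into separate volumes
  }
  // Assumption: numeric-aware name order matches volume order.
  const sorted = [...subfolders].sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true })
  );
  return sorted.map((folder, i) => ({ folder, volumeNumber: start + i }));
}

// An archive named "Jojo v1-3" whose extracted content has three subfolders.
console.log(assignVolumesToSubfolders({ start: 1, end: 3 }, ['Vol 2', 'Vol 1', 'Vol 3']));
```

When the counts do not match, the sketch returns null so the caller can keep the archive as a single unit with its volume range.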

Convert to Series

Description

This is useful for when your archives have already been validated but you want to manually change a series title (maybe it downloaded the wrong one off AniList). It moves CBZs to Series folders and updates their metadata based on local file and folder patterns. Your archives must already be valid CBZs.

Option

--configFile

--convertToSeries

--offline

Download Cover options are also valid.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --convertToSeries

Flow

☑ Get all archives at queueFolders path and move them to the maintenanceFolders
☑ Infer the series of each seriesRoot-level archive from its file if there is no existing metadata
☑ Get metadata from remote sources
☑ Name the series according to the available metadata
☑ Put archives in their seriesRoot series folder according to the config
☑ Rename the archive according to the metadata and configuration rules
☑ Download images for each volume and place in the series folder

Suggest Naming

Description

This makes no changes to archives. This is useful for when you want to see what ComicEater would rename your archive to. Currently, it won't be able to predict how nested archives or volumes would be extracted.

Option

--configFile

--suggestNaming

--offline

Download Cover options are also valid.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --suggestNaming

Download Covers

Description

Downloads a cover for each volume and places it in the series folder.

Option

--downloadCover: Expects a path. If none is given, it will use the series path of each individual series in the job.

--coverQuery "site:bookmeter.com 血界戦線 -Back"

Sometimes it may download the wrong series image even with the validation. For instance, the sequel to the manga 血界戦線 is 血界戦線 back 2 back. 血界戦線 is still in the name, so the sequel is considered valid. If you want to ignore the sequel you can manually run --coverQuery "site:bookmeter.com 血界戦線 -Back". Google will then exclude the sequel's results containing Back.

--noCoverValidate

Sometimes the validation will fail if a manga is named something like BEASTARS, but Google only found results containing ビースターズ. If you know the query will work, you can use --noCoverValidate to force the first image found in Google's results to be downloaded.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --getCovers --downloadCover "W:\Collection\シリーズ"

Flow

☑ Get all archives at queueFolders path
☑ Get metadata from online sources and local sources
☑ Query using coverQuery. This defaults to <volumeNumber> <seriesName> <authors> site:bookmeter.com (You can see this result for yourself on Google Images)
☑ If --noCoverValidate is not present, validate that the cover's title on Google Images contains the correct volume number and series name
☑ Downloads the cover to the --downloadCover path with the same name as the volume
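
The query and validation steps in this flow can be sketched roughly as below. The helper names buildCoverQuery and isCoverTitleValid are hypothetical, and it is an assumption here that a user-supplied --coverQuery replaces the default query entirely and that validation is a simple substring and number check.

```javascript
// Hypothetical helpers; the names are illustrative, not ComicEater's API.
function buildCoverQuery({ volumeNumber, seriesName, authors = [] }, coverQuery) {
  // Assumption: a user-supplied --coverQuery replaces the default query.
  if (coverQuery) return coverQuery;
  // Default: <volumeNumber> <seriesName> <authors> site:bookmeter.com
  return [volumeNumber, seriesName, ...authors, 'site:bookmeter.com']
    .filter((part) => part !== undefined && part !== '')
    .join(' ');
}

function isCoverTitleValid(title, { volumeNumber, seriesName }) {
  // The series name must appear in the result title...
  if (!title.includes(seriesName)) return false;
  // ...and the volume number must appear as a whole number (1 must not match 12).
  const numbers = title.match(/\d+/g) || [];
  return numbers.some((n) => Number(n) === Number(volumeNumber));
}

console.log(buildCoverQuery({ volumeNumber: 1, seriesName: 'MyManga', authors: ['Bob'] }));
// 1 MyManga Bob site:bookmeter.com
```

Note that a check like this accepts a title such as 血界戦線 back 2 back 3 for volume 3 of 血界戦線, which is the kind of false positive the --coverQuery override is meant to work around.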

Enhancement Options

Upscale

Option

--upscale

Description

Runs waifu2x on all images in the archive, and repacks them with their upscaled versions. Currently supports -n 2 -s 2, a setting of 2 denoise level and 2x scale factor. See here for more details.

Setup

Currently, upscaling relies on having waifu2x-ncnn-vulkan.exe on your path. You can get the most recent release from here.

I recommend first checking that a waifu2x command works on its own, something like this:

waifu2x-ncnn-vulkan.exe -i "W:\\Collection\\SomeFolderWithImages" -o "W:\\Collection\\SomeOutputFolder\\" -n 2 -s 2

NOTE: This program will run as fast as your hardware. It's best if you can confirm it's using your GPU.

If you can get this command working with waifu2x-ncnn-vulkan.exe on your path, the WSL app can call out to it.

Trim White Space

Option

--trimWhiteSpace

Trims white space using GraphicsMagick's trim option. It uses a fuzz factor of 10 so that border colors that are roughly the same color can be properly trimmed. See here for more details.

Split Double Pages

Option

--splitPages

Cuts pages in half. If the Trim White Space option is included, the split waits until after the trim is done. Currently assumes right-to-left reading order.
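
To make the right-to-left assumption concrete, here is a minimal sketch of how the two halves could be computed as ImageMagick-style crop geometries (WxH+X+Y). The function name splitGeometries and the handling of odd widths are assumptions for illustration only.

```javascript
// Hypothetical helper: crop geometries for splitting a double-page spread.
function splitGeometries(width, height) {
  const leftWidth = Math.floor(width / 2);
  const rightWidth = width - leftWidth; // the right half absorbs an odd pixel
  // Right-to-left reading order: the right half is the first page.
  return [
    `${rightWidth}x${height}+${leftWidth}+0`,
    `${leftWidth}x${height}+0+0`,
  ];
}

console.log(splitGeometries(2000, 1400));
// [ '1000x1400+1000+0', '1000x1400+0+0' ]
```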

Setting Metadata

Description

Metadata is no longer persisted to the archives. Instead use something more flexible like SND's KOMF.

Inside the app there are 3 ways of thinking about metadata.

metadata about the archive itself (History)
metadata about the content (Series, Volume, etc.)
metadata about the pipeline's progress (Context: Internal runtime info of the pipeline's "saga" work)

Config

Descriptions

Every time you run a command you give the app a .yml config file. I personally use one for automated things that I run on a nightly automated task (like converting weekly subscription magazines automatically), and a second config file for manual runs.

There's a lot here, so the easiest way to understand it is to read this, then spend less than 10 minutes trying to understand my real config here. If you still have difficulty, you can ask for help on Discord.

Patterns

The pattern matching uses the double curly brace syntax {{metaDataVariableName}} to indicate where metadata is located. The pattern matching also uses glob-like syntax to allow for subfolder matching. (Personally, I never use more than one folder level deep.) So something like {{seriesName}}/**/* matches the top level folder name as the seriesName, and no sub-folders would be used in the metadata.

Getting Metadata

The folderPatterns and filePatterns use custom pattern matching to know how to infer metadata from your file names and folders. They use an ordered list, and will take the top pattern it can match with all variables in the pattern. So if you have a file named [Bob] MyManga v01, and file patterns of

"VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}"
"[{{authors}}] {{seriesName}} v{{volumeNumber}}"
"{{seriesName}}"

it will automatically infer that the author is Bob, that the series name should be MyManga, and that it contains the first volume. Since the top pattern would not match, it would ignore it (VerySpecificPattern wasn't found in the file name [Bob] MyManga v01). Since the [] of the authors pattern, the space before the seriesName, and the v of the volumeNumber were all present, it matched the second pattern.

If instead the file had been named Bob's standalone Manga, it would match the third pattern, giving it a series name of Bob's standalone Manga. The author would not be inferred, and the volume number would also be unknown.
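
The ordered matching described above can be approximated with regular expressions. The sketch below is only an illustration of the idea: patternToRegExp and matchFirstPattern are hypothetical names, and translating each {{variable}} into a lazy named capture group is an assumption, not necessarily how ComicEater implements it.

```javascript
// Hypothetical sketch of ordered pattern matching; not ComicEater's real code.
function patternToRegExp(pattern) {
  const source = pattern
    .split(/(\{\{\w+\}\})/) // keep the {{variable}} tokens as separate parts
    .map((part) => {
      const variable = part.match(/^\{\{(\w+)\}\}$/);
      if (variable) return `(?<${variable[1]}>.+?)`; // lazy named group
      return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape literal text
    })
    .join('');
  return new RegExp(`^${source}$`);
}

function matchFirstPattern(name, patterns) {
  // Patterns are tried top to bottom; the first full match wins.
  for (const pattern of patterns) {
    const match = patternToRegExp(pattern).exec(name);
    if (match) return { pattern, metadata: { ...match.groups } };
  }
  return null;
}

const filePatterns = [
  'VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}',
  '[{{authors}}] {{seriesName}} v{{volumeNumber}}',
  '{{seriesName}}',
];
console.log(matchFirstPattern('[Bob] MyManga v01', filePatterns).metadata);
```

With these patterns, [Bob] MyManga v01 matches the second pattern, while Bob's standalone Manga falls through to the third and only yields a seriesName.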

Outputting

Based on the metadata picked up from the file & folder patterns, as well as the metadata gained from external sources like AniList, it will use the outputNamingConventions as a prioritized list of ways to name your files. It will not use a pattern unless ALL metadata variables were matched (besides fileName, which can be used as a default).

"{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻"
"{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻"
"{{seriesRoot}}{{fileName}}/{{fileName}}"

The first output pattern would match if [Bob] MyManga v1-4 wasn't able to be automatically split, and therefore had a volumeRange. The second pattern would be matched if a volume had a single letter after it, as is common in manga distribution: [Bob] MyManga v1e, and would result in: /YourSeriesRoot/MyManga/[Bob] MyManga - 第1e巻. The last case would be a fallback to keeping whatever the file's original name was, in case the other metadata variables couldn't be found.
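
The prioritized selection described above can be sketched as follows. This is a simplified illustration: renderOutputName is a hypothetical name, and treating a variable as matched whenever it is neither undefined nor null (so an empty volumeVariant still counts) is an assumption.

```javascript
// Hypothetical sketch: pick the first output convention whose variables are all known.
function renderOutputName(conventions, metadata) {
  for (const convention of conventions) {
    const variables = [...convention.matchAll(/\{\{(\w+)\}\}/g)].map((m) => m[1]);
    // Assumption: a variable counts as matched if it is not undefined or null.
    const allKnown = variables.every((name) => metadata[name] != null);
    if (allKnown) {
      return convention.replace(/\{\{(\w+)\}\}/g, (_, name) => String(metadata[name]));
    }
  }
  return null;
}

const outputNamingConventions = [
  '{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻',
  '{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻',
  '{{seriesRoot}}{{fileName}}/{{fileName}}',
];

console.log(renderOutputName(outputNamingConventions, {
  seriesRoot: '/YourSeriesRoot/',
  seriesName: 'MyManga',
  authors: 'Bob',
  volumeNumber: 1,
  volumeVariant: 'e',
  fileName: '[Bob] MyManga v1e',
}));
// /YourSeriesRoot/MyManga/[Bob] MyManga - 第1e巻
```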

Recognized Metadata variables

Input and Output metadata Variables

{{authors}}: the default author will be assumed to be the writer in Komga metadata. They will be split by splitAuthorsBy, so Bob・KanjiEater could be split into two separate authors: Bob & KanjiEater.
{{volumeNumber}}: Runs it through various validation checks to ensure it's actually a number, and also extracts a volumeVariant, which is at most one letter attached to the volume number. It can also recognize volume ranges, e.g. c2-5. Chapters and volumes are used without distinction currently.
{{publishYear}}: Any characters
{{publishMonth}}: Any characters
{{publishDate}}: Any characters
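
To illustrate how {{volumeNumber}} and {{authors}} might be interpreted, here is a minimal sketch. The names parseVolume and splitAuthors, the returned object shapes, and the exact regular expression are assumptions for this example; the real validation checks are likely more thorough.

```javascript
// Hypothetical parsers for the {{volumeNumber}} and {{authors}} variables.
function parseVolume(raw) {
  // Optional v/c prefix, a number, then either a range end or one variant letter.
  const match = /^[vc]?(\d+)(?:-(\d+)|([a-z]))?$/i.exec(raw.trim());
  if (!match) return null; // not a recognizable volume number
  const [, start, end, variant] = match;
  if (end !== undefined) {
    return { volumeRange: { start: Number(start), end: Number(end) } };
  }
  return { volumeNumber: Number(start), volumeVariant: variant || '' };
}

function splitAuthors(raw, splitAuthorsBy) {
  // Split on the configured separator and drop empty entries.
  return raw.split(splitAuthorsBy).map((a) => a.trim()).filter(Boolean);
}

console.log(parseVolume('1e'));   // { volumeNumber: 1, volumeVariant: 'e' }
console.log(parseVolume('c2-5')); // { volumeRange: { start: 2, end: 5 } }
console.log(splitAuthors('Bob・KanjiEater', '・')); // [ 'Bob', 'KanjiEater' ]
```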

Output only

{{seriesRoot}}: The folder from the seriesRoot
{{fileName}}: The original file name

Clean up

filesToDeleteWithExtensions will remove any files in the archive that have a matching file extension. (Common use case someAwfulSite.url)
junkToFilter will remove these patterns from your file names and folder names
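
The two clean-up settings can be pictured with the sketch below. The helper names shouldDeleteFile and removeJunk are hypothetical, and it is an assumption that junkToFilter entries are plain strings matched case-insensitively; the real config may accept richer patterns.

```javascript
// Hypothetical helpers mirroring the two clean-up settings.
function shouldDeleteFile(fileName, filesToDeleteWithExtensions) {
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return false; // no extension (or a dotfile)
  const extension = fileName.slice(dot + 1).toLowerCase();
  // Accept config entries written as "url" or ".url".
  return filesToDeleteWithExtensions.some(
    (ext) => ext.replace(/^\./, '').toLowerCase() === extension
  );
}

function removeJunk(name, junkToFilter) {
  let cleaned = name;
  for (const junk of junkToFilter) {
    // Assumption: junk entries are plain strings, matched case-insensitively.
    const escaped = junk.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    cleaned = cleaned.replace(new RegExp(escaped, 'gi'), '');
  }
  // Collapse the whitespace left behind by removed junk.
  return cleaned.replace(/\s+/g, ' ').trim();
}

console.log(shouldDeleteFile('someAwfulSite.url', ['url'])); // true
console.log(removeJunk('[Bob] MyManga v01 (Digital)', ['(digital)'])); // [Bob] MyManga v01
```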

Metadata

You can set default metadata according to the accepted ComicInfo.xml fields for Komga in the defaults:

defaults:
  contentMetaData:
    languageISO: ja
    manga: YesAndRightToLeft

Setting your language to ja will assume you want Japanese text in file names, instead of an English translation.

Maintenance Folder

maintenanceFolder

This is used when something goes wrong. All failed files are moved here.

Potential Future Features

~~File names w/ spaces breaks spawn~~
~~Saga orchestration~~
~~Save detailed file history~~
~~Nested Archives~~
~~Nested Rar test failing~~
~~Configuration from file~~
~~String paths https://stackoverflow.com/questions/29182244/convert-a-string-to-a-template-string~~
~~Better content cleanup~~
~~Suggest naming~~
~~Number padding~~
~~Ignore case of junk to filter~~
~~Prefer series name from series if native~~
~~Deeply nested folders with globbing~~
~~Write ComicInfo.xml~~
~~Remove junk from Image folders, names, content~~
~~Clean CBZs~~
~~Handle Volume ranges~~
~~If TotalVolumes matches folder count, extract to individual~~
~~Nested image folders that are multivolume~~
~~fix halfwidth fullwidth chars for file folder pattern~~
~~Get metadata from archive contents~~
~~Convert Image folders to CBZ~~
~~Fix deletion after extracting folders - doesn't delete the clean dir~~
~~Vendor Series metadata~~
~~Automate maintenance~~
~~Unified Series calls data vendors once per series~~
~~Cleanup regression~~
~~invalid images still being zipped~~
~~Stopped on moving series~~
~~Stat error not killing it on 7z -t~~
~~Start importing clean series~~
~~null in summary/description~~
~~Offline options~~
~~volume range with archives takes second as a batch, but then deletes the first, and leaves the rest as dupes~~
~~Archive with multiple folder volumes failed: Brave Story, not cleaned up but made. Ran on individual volumes, and each was a separate series~~
~~didn't clean up soil 9 & 10~~
~~Shuto Heru v13-14~~
~~keep the range if it wasn't split~~
~~Handle hakuneko folders~~
~~Add magazines~~
~~Download book covers~~
~~Trim white space~~
~~Split Double Images~~
~~Waifu2x~~
~~ to komga to update after modifying content~~
~~Move folders to prep before doing anything~~
~~config to rename automatically~~
~~Send API request to Komga~~
convert to typescript
~~Set cover to second (n) page based on komga tag~~
~~blur nsfw tag~~
~~Add tags: use komf~~
Maintain metadata outside of the archive
dockerify
Get names from google organic search
Undo naming / folder move
Master Config Test. > x results
Manual Series metadata
Scraper Series metadata
Get a new cover image based on existing dimension / reverse image lookup
Detect missing volumes/issues
Interactive naming
~~Webp~~
~~Avif deconversion~~
Record File hash drift events
