ComicEater

A collection of utilities for comic book archive management. Currently this is in alpha testing (AKA perfectly stable for me but YMMV).

Including the following:

  • Supports Rar, Rar5, Zip, 7z, CBZ, CBR, CB7
  • Verify archives aren't corrupt using 7Zip
  • Verify images aren't corrupt using ImageMagick
  • Option to upscale comic books with machine learning to higher resolutions using waifu2x-ncnn-vulkan
  • Option to remove white space on margins
  • Option to split double pages into individual pages
  • Download metadata (from sources like AniList)
  • Download covers for use in Komga
  • Converts all of your archives to the standard CBZ, comic book zip format
  • Rename and group files according to downloaded metadata
  • Split nested archives into individual archives
  • Intelligently split archives based on volume count, eg a single archive named Jojo v1-3 could be split into 3 separate archives
  • Convert a folder of images into an archive - intelligently splitting them according to their volume number
  • Easy to understand pattern matching to standardize naming across your library
  • Conversion of archives into individual folders so they can be recognized as a series in Komga
  • Removes distributor bloat, like .url file links to their site
  • Supports a Queue folder that can be used to automatically convert archives on a cron job
  • Moves failed conversions to a maintenance folder so you can manually fix and rerun any failed jobs

Examples

Upscale before and after:

For a more accurate comparison view here.

White space trim:

Double page split and white space trim:

It keeps the inner margin intact, to indicate which side of the page is the inner book binding. Here's an inner book binding example I tweeted about.

 

Converts nested archives

Original archive:

Split archive with downloaded covers:

Archives read as series in Komga now with metadata from AniList:

Archives read as series in Tachiyomi now with Komga:

Support

If you're into this sort of thing, you might be interested in my podcast or the games I stream:

              

You can get support here: Discord

If you find my tools useful please consider supporting via Patreon.

Install

The app's current state requires both Windows and WSL.

The app currently only supports being run from the source code, though I'm open to pull requests to dockerize it or remove the Windows dependency. All dependencies are WSL specific, but all paths are input as Windows paths for convenience.

The base functionality requires the following to be installed:

sudo apt-get install p7zip imagemagick unrar

Also make sure the version of Node.js specified in the .nvmrc (found in the project root) is installed. Currently this is 16.6.2. I recommend using nvm.

Install yarn:

npm install --global yarn

In the project root folder execute:

yarn

Cover Image Fetching

Puppeteer is also an internal requirement for downloading cover images, so your system may require additional dependencies. On Ubuntu 20.04 I had to install these:

sudo apt install -y libx11-xcb1 libxcomposite1 libxcursor1 libxdamage1 libxi-dev libxtst-dev libnss3 libcups2 libxss1 libxrandr2 libasound2 libatk1.0-0 libatk-bridge2.0-0 libpangocairo-1.0-0 libgtk-3-0 libgbm1

AVIF Support

For AVIF support, ImageMagick must list heic among its delegates: running convert --version | grep -i heic should show heic in the delegates list. This was a pain to set up, but if you follow these instructions on a modern version of Ubuntu you might get it working too: https://stackoverflow.com/a/66116056

In addition, if you want WebP support, you might have to run sudo apt install build-essential pkg-config webp libwebp-dev libwebp7 before the commands listed in that Stack Overflow post. The output of convert --version has to list your image formats, including webp if you want to be able to use WebP files in your comics.

If you're building ImageMagick from source, you also have to install these other libraries to get JPG working 🫠 https://gist.github.com/nickferrando/fb0a44d707c8c3efd92dedd0f79d2911

Customize your config file

I highly recommend reading through the Config section, then downloading my config and adjusting it as needed. This is the only part you should have to use your brain on ;).

You can now run any of the commands below from WSL!

Warning

  • I used to recommend running the commands from bash, not zsh, because zsh could crash WSL when run for long periods with a lot of text output. As of 2023, I no longer believe this to be the case.
  • Make sure you have a backup of any archives you put in your queueFolders. I've run this with hundreds of thousands of archives now, so it does work well, but there could be bugs. I make no guarantees it will work well for you.

Commands

CLI Logging

You can use the -v flag with any command to change the log level. More v's means more verbose output. I recommend running all commands with -vv to see INFO logging, so you can see how many archives succeeded and at which steps.

Errors should always be logged.

-v: Shows WARN level

-vv: Shows INFO level

-vvv: Shows DEBUG level
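
The mapping above can be sketched in a few lines. This is an illustration in Python, not ComicEater's actual implementation: the function names and the use of argparse are assumptions, and only the flag-to-level table comes from this section.

```python
import argparse
import logging

# Index = number of -v flags. Errors are always shown, so 0 maps to ERROR.
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def level_for(verbosity: int) -> int:
    """Clamp the -v count into the LEVELS table."""
    return LEVELS[max(0, min(verbosity, len(LEVELS) - 1))]


def parse_verbosity(argv: list[str]) -> int:
    """Count -v flags and ignore every other argument (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-v", action="count", default=0)
    args, _unknown = parser.parse_known_args(argv)
    return args.v


if __name__ == "__main__":
    count = parse_verbosity(["-vv", "--configFile", "ComicEater.yml", "--maintainCollection"])
    print(logging.getLevelName(level_for(count)))
```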

Maintain Collection

Description

Converts archives from the seriesFolders' queueFolders to CBZs, then converts them to series and updates their metadata.

Option

--configFile

--maintainCollection

--offline: Don't download metadata or use downloaded metadata for file renaming

Download Cover options are also valid.

Enhancement options are also valid.

Template

yarn main --configFile "<configFile>" --maintainCollection

Example

yarn main -vv --configFile "W:\Collection\ComicEater.test.yml" --maintainCollection

Flow

  1. ☑ Get all archives at queueFolders and move them to the maintenanceFolders
  2. ☑ Convert them to CBZ (See the Convert to CBZ Flow)
  3. ☑ Use folderPatterns to gather metadata from the folder about the files
  4. ☑ Use the filePatterns to gather data about the files
  5. ☑ Search remote sources for any additional metadata
  6. ☑ Download Covers

Convert to CBZ Flow

  1. ☑ Get all archives from the path (if maintainCollection, this will be your queueFolders)
  2. ☑ Get all image folders in path
  3. ☑ Test that the archives are valid archives with 7z t
  4. ☑ Get volumeRange from filePatterns to infer if multiple volumes are present
  5. ☑ Extract archive in current directory
  6. ☑ Recursively check for nested archives, and apply each of the following steps to each archive.
  7. ☑ Remove archive distributor bloat per user config (links to tracker etc.)
  8. ☑ Validate that there are images present in extracted archives
  9. ☑ Validate that images are valid using ImageMagick by doing a transform to a 5x5 image - Currently requires writing them to a /tmp/ directory that is automatically cleaned up after the test is run
  10. ☑ If multiple volumes are present, check whether the number of image-containing subfolders under their parent matches the volume count, and if it does, treat each subfolder as a separate volume
  11. ☑ If --trimWhiteSpace is present, run trim through ImageMagick
  12. If --upscale is present, run the content through waifu2x
  13. ☑ If --splitPages is present, cut each page into two
  14. ☑ Repack images
  15. ☑ If nested archives exist, flatten all nested archives in place of the original
  16. ☑ If there were no errors, remove the extracted working directory
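
Steps 4 and 10 can be illustrated with a small sketch. It is written in Python and is not ComicEater's actual code: the function name, the tuple representation of the volume range, and the choice to order subfolders lexically are all assumptions made for the example.

```python
from typing import Optional


def assign_volumes(
    volume_range: tuple[int, int], subfolders: list[str]
) -> Optional[dict[int, str]]:
    """Map volume numbers to image subfolders when the counts line up.

    volume_range is the inclusive (first, last) pair inferred from the file name,
    e.g. (1, 3) for "Jojo v1-3". Returns None when the archive should stay whole.
    """
    first, last = volume_range
    expected = last - first + 1
    if expected < 2 or len(subfolders) != expected:
        return None  # not a multi-volume archive, or the folder count disagrees
    ordered = sorted(subfolders)  # assumption: lexical order equals volume order
    return {first + offset: name for offset, name in enumerate(ordered)}


if __name__ == "__main__":
    print(assign_volumes((1, 3), ["vol 02", "vol 01", "vol 03"]))
    print(assign_volumes((1, 3), ["extras", "pages"]))
```

When the counts disagree the archive is left as a single range, which is the case where a volumeRange shows up in the output name.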

Convert to Series

Description

This is useful when your archives have already been validated but you want to manually change a series title (maybe it downloaded the wrong one from AniList). It moves CBZs to series folders and updates their metadata based on local file and folder patterns. Your archives must already be valid CBZs.

Option

--configFile

--convertToSeries

--offline

Download Cover options are also valid.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --convertToSeries

Flow

  1. ☑ Get all archives at queueFolders path and move them to the maintenanceFolders
  2. ☑ Infer the series of each seriesRoot level archive from the file if there is no existing metadata
  3. ☑ Get metadata from remote sources
  4. ☑ Name the series according to the available metadata
  5. ☑ Put archives in their seriesRoot series folder according to the config
  6. ☑ Rename the archive according to the metadata and configuration rules
  7. ☑ Download images for each volume and place in the series folder

Suggest Naming

Description

This makes no changes to archives. This is useful for when you want to see what ComicEater would rename your archive to. Currently, it won't be able to predict how nested archives or volumes would be extracted.

Option

--configFile

--suggestNaming

--offline

Download Cover options are also valid.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --suggestNaming

Download Covers

Description

Downloads a cover for each volume and places it in the series folder.

Option

--downloadCover Expects a path. If none is given, then it will use the series path of each individual series in the job.

--coverQuery "site:bookmeter.com 血界戦線 -Back"

Sometimes it may download the wrong series image even with the validation. For instance, the sequel to the manga 血界戦線 is 血界戦線 back 2 back. 血界戦線 is still in the name, so the sequel is considered valid. If you want to ignore the sequel, you can manually run --coverQuery "site:bookmeter.com 血界戦線 -Back". Google will then exclude the sequel's results containing Back.

--noCoverValidate

Sometimes the validation will fail if a manga is named something like BEASTARS, but Google only found results containing ビースターズ. If you know the query will work, you can use --noCoverValidate to force the first image found in Google's results to be downloaded.

Example

yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --getCovers --downloadCover "W:\Collection\シリーズ"

Flow

  1. ☑ Get all archives at queueFolders path
  2. ☑ Get metadata from online sources and local sources
  3. ☑ Query using coverQuery. This defaults to <volumeNumber> <seriesName> <authors> site:bookmeter.com (You can see this result for yourself on Google Images)
  4. ☑ If --noCoverValidate is not present, validate that the cover's title on Google Images contains the correct volume number and series name
  5. ☑ Downloads the cover to the --downloadCover path with the same name as the volume
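
Steps 3 and 4 can be sketched as follows. This is a Python illustration, not ComicEater's actual code: the function names are hypothetical, and the exact matching rules (case-insensitive series name, volume number as a standalone number) are assumptions about what "validate" means here.

```python
import re


def default_cover_query(volume_number: str, series_name: str, authors: list[str]) -> str:
    """Build the default query: <volumeNumber> <seriesName> <authors> site:bookmeter.com."""
    parts = [volume_number, series_name, " ".join(authors), "site:bookmeter.com"]
    return " ".join(part for part in parts if part)


def title_looks_valid(title: str, volume_number: str, series_name: str) -> bool:
    """Check that a result title mentions the series name and the volume number."""
    has_series = series_name.casefold() in title.casefold()
    # The lookarounds stop volume "1" from matching inside "10" or "21".
    has_volume = re.search(rf"(?<!\d){re.escape(volume_number)}(?!\d)", title) is not None
    return has_series and has_volume


if __name__ == "__main__":
    print(default_cover_query("1", "MyManga", ["Bob"]))
    # The sequel still contains the original series name, so it passes validation.
    print(title_looks_valid("血界戦線 Back 2 Back 1", "1", "血界戦線"))
    # A title found only under its katakana name fails validation.
    print(title_looks_valid("ビースターズ 1", "1", "BEASTARS"))
```

The last two calls reproduce the two situations described above: the first is why --coverQuery with -Back is sometimes needed, the second is why --noCoverValidate exists.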

Enhancement Options

Upscale

Option

--upscale

Description

Runs waifu2x on all images in the archive, and repacks them with their upscaled versions. Currently supports -n 2 -s 2, meaning a denoise level of 2 and a 2x scale factor. See here for more details.

Setup

Currently, upscaling relies on having waifu2x-ncnn-vulkan.exe on your path. You can get the most recent release from here.

I recommend first checking that a direct waifu2x command works, something like this:

waifu2x-ncnn-vulkan.exe -i "W:\\Collection\\SomeFolderWithImages" -o "W:\\Collection\\SomeOutputFolder\\" -n 2 -s 2

NOTE: This program will only run as fast as your hardware allows. It's best if you can confirm it's using your GPU.

If you can get this command working with waifu2x-ncnn-vulkan.exe on your path, the WSL app can call out to it.
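
As a rough picture of what calling out to the executable involves, here is a minimal Python sketch (not ComicEater's actual code; the function names are hypothetical). It only reuses the flags shown in the command above and assumes waifu2x-ncnn-vulkan.exe is on your path.

```python
import subprocess


def waifu2x_command(input_dir: str, output_dir: str) -> list[str]:
    """Build the argument list for the -n 2 -s 2 settings described above."""
    return [
        "waifu2x-ncnn-vulkan.exe",
        "-i", input_dir,
        "-o", output_dir,
        "-n", "2",  # denoise level
        "-s", "2",  # scale factor
    ]


def upscale(input_dir: str, output_dir: str) -> None:
    """Run the upscaler and raise CalledProcessError if it exits with an error."""
    subprocess.run(waifu2x_command(input_dir, output_dir), check=True)


if __name__ == "__main__":
    print(waifu2x_command(r"W:\Collection\SomeFolderWithImages", r"W:\Collection\SomeOutputFolder"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.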

Trim White Space

Option

--trimWhiteSpace

Trims white space using GraphicsMagick's trim option. It uses a fuzz factor of 10 so that border colors that are roughly the same color can be properly trimmed. See here for more details.

Split Double Pages

Option

--splitPages

Cuts pages in half. If the Trim White Space option is included, the split waits until after the trim is done. Currently assumes right-to-left reading order.
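
The arithmetic behind the split is simple. The Python sketch below is only an illustration, not ComicEater's actual code: the function name is hypothetical, and giving the extra pixel of an odd width to the right-hand page is an assumption.

```python
Box = tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def split_boxes(width: int, height: int) -> list[Box]:
    """Return the two crop boxes of a double page in right-to-left reading order."""
    middle = width // 2
    right_half = (middle, 0, width, height)  # read first in right-to-left books
    left_half = (0, 0, middle, height)  # read second
    return [right_half, left_half]


if __name__ == "__main__":
    for page_number, box in enumerate(split_boxes(2000, 1400), start=1):
        print(page_number, box)
```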

Setting Metadata

Description

Metadata is no longer persisted to the archives. Instead, use something more flexible like SND's KOMF.

Inside the app there are 3 ways of thinking about metadata.

  1. metadata about the archive itself (History)
  2. metadata about the content (Series, Volume, etc.)
  3. metadata about the pipeline's progress (Context: internal runtime info of the pipeline's "saga" work)

Config

Descriptions

Every time you run a command you give the app a .yml config file. I personally use one for automated things that I run on a nightly automated task (like converting weekly subscription magazines automatically), and a second config file for manual runs.

There's a lot here, so the easiest way to understand it is to read this, then spend less than 10 minutes trying to understand my real config here. If you still have difficulty, you can ask for help on Discord.

Patterns

The pattern matching uses the double curly brace syntax {{metaDataVariableName}} to indicate where metadata is located. It also uses glob-like syntax to allow for subfolder matching (though I personally never use more than one folder level deep). So something like {{seriesName}}/**/* matches the top level folder name as the seriesName, and no sub-folders are used in the metadata.

Getting Metadata

The folderPatterns and filePatterns use custom pattern matching to know how to infer metadata from your file names and folders. They use an ordered list, and will take the topmost pattern for which all variables in the pattern can be matched. So if you have a file named [Bob] MyManga v01, and file patterns of

  • "VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}"
  • "[{{authors}}] {{seriesName}} v{{volumeNumber}}"
  • "{{seriesName}}"

it will automatically infer that the author is Bob, the series name is MyManga, and the archive contains the first volume. The top pattern does not match (VerySpecificPattern isn't in the file name [Bob] MyManga v01), so it is ignored. The second pattern matches because the [] around the authors, the space before the seriesName, and the v before the volumeNumber are all present.

If instead the file had been named Bob's standalone Manga, it would match the third pattern, giving it a series name of Bob's standalone Manga. The author would not be inferred, and the volume number would also be unknown.
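
The ordered matching described above can be approximated with regular expressions. This Python sketch is not ComicEater's actual matcher: the function names are hypothetical, every variable is assumed to match any non-empty text, and the glob-like folder syntax is not handled.

```python
import re
from typing import Optional

VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Turn literal text into escaped text and each {{name}} into a named group."""
    parts = []
    position = 0
    for match in VARIABLE.finditer(pattern):
        parts.append(re.escape(pattern[position:match.start()]))
        parts.append(f"(?P<{match.group(1)}>.+?)")
        position = match.end()
    parts.append(re.escape(pattern[position:]))
    return re.compile("".join(parts))


def infer_metadata(name: str, patterns: list[str]) -> Optional[dict[str, str]]:
    """Return the variables of the first pattern that matches the whole name."""
    for pattern in patterns:
        match = pattern_to_regex(pattern).fullmatch(name)
        if match:
            return match.groupdict()
    return None


if __name__ == "__main__":
    file_patterns = [
        "VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}",
        "[{{authors}}] {{seriesName}} v{{volumeNumber}}",
        "{{seriesName}}",
    ]
    print(infer_metadata("[Bob] MyManga v01", file_patterns))
    print(infer_metadata("Bob's standalone Manga", file_patterns))
```

Because the list is ordered, putting the most specific patterns first and a bare {{seriesName}} last gives a catch-all fallback. Note that this sketch would fail on an input pattern that repeats a variable name, since a regular expression cannot define the same named group twice.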

Outputting

Based on the metadata picked up from the file & folder patterns, as well as the metadata gained from external sources like AniList, it will use the outputNamingConventions as a prioritized list of ways to name your files. It will not use a pattern unless ALL metadata variables were matched (besides fileName, which can be used as a default).

  • "{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻"
  • "{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻"
  • "{{seriesRoot}}{{fileName}}/{{fileName}}"

The first output pattern would match if [Bob] MyManga v1-4 couldn't be automatically split, and therefore had a volumeRange. The second pattern would match if a volume had a single letter after it, as is common in manga distribution: [Bob] MyManga v1e would result in /YourSeriesRoot/MyManga/[Bob] MyManga - 第1e巻. The last pattern is a fallback that keeps the file's original name, in case the other metadata variables couldn't be found.
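
The "first convention whose variables are all known" rule can be sketched like this. It is a Python illustration, not ComicEater's actual code: the function name and the flat dictionary of metadata are assumptions.

```python
import re
from typing import Optional

VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def choose_output_name(metadata: dict[str, str], conventions: list[str]) -> Optional[str]:
    """Fill in the first convention for which every variable has a non-empty value."""
    for convention in conventions:
        names = VARIABLE.findall(convention)
        if all(metadata.get(name) for name in names):
            return VARIABLE.sub(lambda match: metadata[match.group(1)], convention)
    return None


if __name__ == "__main__":
    conventions = [
        "{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻",
        "{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻",
        "{{seriesRoot}}{{fileName}}/{{fileName}}",
    ]
    metadata = {
        "seriesRoot": "/YourSeriesRoot/",
        "seriesName": "MyManga",
        "authors": "Bob",
        "volumeNumber": "1",
        "volumeVariant": "e",
        "fileName": "[Bob] MyManga v1e",
    }
    print(choose_output_name(metadata, conventions))
```

Here the first convention is skipped because there is no volumeRange, so the second one is used.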

Recognized Metadata variables

Input and Output metadata Variables

  • {{authors}}: the default author will be assumed to be the writer in Komga metadata. Authors will be split by splitAuthorsBy, so Bob・KanjiEater could be split into two separate authors: Bob & KanjiEater.
  • {{volumeNumber}}: Runs it through various validation checks to ensure it's actually a number, and also extracts a volumeVariant, which is at most one letter attached to the volume number. It can also recognize volume ranges, e.g. c2-5. Chapters and volumes are currently used without distinction.
  • {{publishYear}}: Any characters
  • {{publishMonth}}: Any characters
  • {{publishDate}}: Any characters
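
To make the {{volumeNumber}} rules concrete, here is a hedged Python sketch (not ComicEater's actual validation; the names and the exact regular expression are assumptions). It accepts an optional v or c prefix, a number, an optional range end, and at most one trailing letter as the variant.

```python
import re
from typing import NamedTuple, Optional


class Volume(NamedTuple):
    start: int
    end: int  # equal to start when the text is not a range
    variant: str  # at most one letter, "" when absent


VOLUME = re.compile(r"^[vc]?(\d+)(?:-(\d+))?([a-z]?)$", re.IGNORECASE)


def parse_volume(text: str) -> Optional[Volume]:
    """Parse strings such as "01", "1e" or "c2-5"; return None for anything else."""
    match = VOLUME.match(text.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        return None  # a backwards range is treated as invalid in this sketch
    return Volume(start, end, match.group(3))


if __name__ == "__main__":
    for sample in ["01", "1e", "c2-5", "one"]:
        print(sample, parse_volume(sample))
```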

Output only

  • {{seriesRoot}}: The folder from the seriesRoot
  • {{fileName}}: The original file name

Clean up

  • filesToDeleteWithExtensions will remove any files in the archive that have a matching file extension. (Common use case: someAwfulSite.url)
  • junkToFilter will remove these patterns from your file names and folder names
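
Both clean-up options can be pictured with a short Python sketch. This is not ComicEater's actual code: the function names are hypothetical, and treating junkToFilter entries as literal substrings (rather than regular expressions) is an assumption.

```python
from pathlib import PurePosixPath


def files_to_delete(file_names: list[str], extensions: list[str]) -> list[str]:
    """Return the files whose extension is listed, ignoring case and a leading dot."""
    wanted = {extension.lower().lstrip(".") for extension in extensions}
    return [
        name
        for name in file_names
        if PurePosixPath(name).suffix.lower().lstrip(".") in wanted
    ]


def filter_junk(name: str, junk: list[str]) -> str:
    """Remove each junk fragment from a name and collapse the leftover whitespace."""
    for fragment in junk:
        name = name.replace(fragment, "")
    return " ".join(name.split())


if __name__ == "__main__":
    print(files_to_delete(["001.jpg", "002.jpg", "someAwfulSite.url"], ["url"]))
    print(filter_junk("[Bob] MyManga v01 (someAwfulSite)", ["(someAwfulSite)"]))
```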

Metadata

You can set default metadata according to the accepted ComicInfo.xml fields for Komga in the defaults:

defaults:
  contentMetaData:
    languageISO: ja
    manga: YesAndRightToLeft

Setting your language to ja will assume you want Japanese text in file names, instead of an English translation.

Maintenance Folder

  • maintenanceFolder

This is used when something goes wrong. All failed files are moved here.

Potential Future Features

  1. File names w/ spaces breaks spawn

  2. Saga orchestration

  3. Save detailed file history

  4. Nested Archives

  5. Nested Rar test failing

  6. Configuration from file

  7. String paths https://stackoverflow.com/questions/29182244/convert-a-string-to-a-template-string

  8. Better content cleanup

  9. Suggest naming

  10. Number padding

  11. Ignore case of junk to filter

  12. Prefer series name from series if native

  13. Deeply nested folders with globbing

  14. Write ComicInfo.xml

  15. Remove junk from Image folders, names, content

  16. Clean CBZs

  17. Handle Volume ranges

  18. If TotalVolumes matches folder count, extract to individual

  19. Nested image folders that are multivolume

  20. fix halfwidth fullwidth chars for file folder pattern

  21. Get metadata from archive contents

  22. Convert Image folders to CBZ

  23. Fix deletion after extracting folders - doesn't delete the clean dir

  24. Vendor Series metadata

  25. Automate maintenance

  26. Unified Series calls data vendors once per series

  27. Cleanup regression

  28. invalid images still being zipped

  29. Stopped on moving series

  30. Stat error not killing it on 7z -t

  31. Start importing clean series

  32. null in summary/description

  33. Offline options

  34. volume range with archives takes second as a batch, but then deletes the first, and leaves the rest as dupes

  35. Archive with multiple folder volumes failed: Brave Story, not cleaned up but made. Ran on individual volumes, and each was a separate series

  36. didn't clean up soil 9 & 10

  37. Shuto Heru v13-14

  38. keep the range if it wasn't split

  39. Handle hakuneko folders

  40. Add magazines

  41. Download book covers

  42. Trim white space

  43. Split Double Images

  44. Waifu2x

  45. ~~ to komga to update after modifying content~~

  46. Move folders to prep before doing anything

  47. config to rename automatically

  48. Send API request to Komga

  49. convert to typescript

  50. Set cover to second (n) page based on komga tag

  51. blur nsfw tag

  52. Add tags: use komf

  53. Maintain metadata outside of the archive

  54. dockerify

  55. Get names from google organic search

  56. Undo naming / folder move

  57. Master Config Test. > x results

  58. Manual Series metadata

  59. Scraper Series metadata

  60. Get a new cover image based on existing dimension / reverse image lookup

  61. Detect missing volumes/issues

  62. Interactive naming

  63. Webp

  64. Avif deconversion

  65. Record File hash drift events