While checking whether ISRC tags are present in a large audio collection using
mid3v2 -l 00*/*3 | grep -a TSRC
I found that it dies halfway through, saying:
IDv2 tag info for 00-225167/mina - volami nel cuore.mp3
TIT2=Volami nel cuore
TPE1=MINA
TRCK=1
IDv2 tag info for Traceback (most recent call last):
File "/usr/bin/mid3v2", line 33, in <module>
sys.exit(load_entry_point('mutagen==1.46.0', 'console_scripts', 'mid3v2')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 484, in entry_point
return main(sys.argv)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 469, in main
list_tags(args)
File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 335, in list_tags
print("IDv2 tag info for", filename)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in position 13: surrogates not allowed
This isn't Mina's fault: the culprit is the name of the following file, "modà - la notte.mp3", which is ANSI or CP437 encoded, with à stored as the single byte 0x85.
The same goes for other files whose names contain 0x8A for è, 0xB4 for é, 0x95 for ò, 0x97 for ù, 0xA2 for ó and so on.
On Debian GNU/Linux with LANG=en_GB.UTF-8
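The crash can be reproduced without mutagen. Below is a minimal sketch, assuming the offending name is the byte string b"mod\x85 - la notte.mp3" (reconstructed from the description above, not copied from the real file). On POSIX, Python decodes file names with the surrogateescape error handler, so the undecodable byte 0x85 becomes the lone surrogate '\udc85' seen in the traceback. print() then fails because it encodes that string strictly as UTF-8. The traceback reports position 13 rather than 3, presumably because mid3v2's string also contains the directory part of the path. The last check leaves out 0xB4: in CP437 that byte is a box-drawing character (é is 0x82 there), so names using it may be in some other encoding.

```python
# Assumption: "à" was stored as the single CP437 byte 0x85, as reported;
# these exact bytes are a reconstruction, not taken from the actual file.
raw = b"mod\x85 - la notte.mp3"

# On POSIX, Python decodes file names from the filesystem encoding (UTF-8
# here) using "surrogateescape": each undecodable byte 0xNN becomes the
# lone surrogate U+DCNN instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'mod\udc85 - la notte.mp3'

# print() to a UTF-8 stdout encodes strictly, which is the step that fails in mid3v2.
try:
    name.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)
print(error)  # 'utf-8' codec can't encode character '\udc85' in position 3: surrogates not allowed

# The surrogate still carries the original byte, so nothing has been lost...
assert name.encode("utf-8", "surrogateescape") == raw

# ...and decoding the raw bytes as CP437 recovers the intended name.
fixed = raw.decode("cp437")
print(fixed)  # modà - la notte.mp3

# The other reported bytes (except 0xB4) decoded as CP437.
accents = bytes([0x8A, 0x95, 0x97, 0xA2]).decode("cp437")
print(accents)  # èòùó
```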
martinwguy changed the title
from: mid3v2 crashes with UnicodeEncodeError: surrogates not allowed
to: mid3v2 crashes with "UnicodeEncodeError: surrogates not allowed" on files with accented characters in the filename
May 20, 2024
I don't think this is mutagen's or Python's fault. If the filename is encoded in CP437 rather than UTF-8, which is what Python expects given your LANG setting, then I'd say the best fix is to re-encode the filenames correctly.
This can be done with convmv -f cp437 -t utf-8 *, which only shows how the files would be renamed and doesn't change anything. Once you have checked that the encoding is right, run convmv -f cp437 -t utf-8 --notest * to actually rename the files on disk.
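For anyone who wants to see what such a rename involves, or to apply it across a tree like 00*/*.mp3 (the convmv commands above act only on the entries matched by * unless given -r), here is a minimal Python sketch of the same idea. fix_cp437_names is a hypothetical helper, not part of mutagen or convmv. It assumes that every name which is not valid UTF-8 is CP437, so names in some other encoding (or CP437 names that happen to be valid UTF-8) would need checking by hand.

```python
import os


def fix_cp437_names(top: bytes, dry_run: bool = True) -> list[tuple[bytes, bytes]]:
    """Hypothetical helper: plan (and optionally apply) renames of every entry
    under `top` whose name is not valid UTF-8, assuming it is CP437-encoded.
    Returns the list of (old_path, new_path) pairs as bytes."""
    changes = []
    # Passing bytes makes os.walk yield raw byte names, so no information is
    # lost to decoding. Walking bottom-up renames a directory only after its
    # contents, so paths still to be visited stay valid.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8: leave it alone
            except UnicodeDecodeError:
                pass
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.decode("cp437").encode("utf-8"))
            changes.append((old, new))
            if not dry_run:
                # Never clobber an existing entry with the converted name.
                if os.path.lexists(new):
                    raise FileExistsError(f"refusing to overwrite {os.fsdecode(new)!r}")
                os.rename(old, new)
    return changes
```

As with convmv's default test mode and --notest, call it first with dry_run=True (the default), inspect the returned pairs, and only then pass dry_run=False.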