I've encountered an issue with glacier-cli failing because git-annex mistakenly appends text that looks like a file extension to the key when using the SHA256E backend. In practice, some files get characters appended to the key as if they were a file extension, even when they are not part of the extension.
Example:
% ls 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3
12. Change The World (feat. 웅산).mp3
% git annex info 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3
file: 12. Change The World (feat. 웅산).mp3
size: 7.48 megabytes
key: SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.웅산.mp3
present: true
% git annex calckey 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3
SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.웅산.mp3
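I've opened an issue with git-annex here:
https://git-annex.branchable.com/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/
There will be a fix for the case with brackets, but there are other cases in which a file extension might not be just ASCII. This is what happens then: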
% git annex copy 12.\ Change\ The\ World\ \(feat.\ 웅산\).mp3 --to glacier
copy 12. Change The World (feat. 웅산).mp3 (checking glacier...) Traceback (most recent call last):
File "/usr/local/bin/glacier", line 737, in <module>
main()
File "/usr/local/bin/glacier", line 733, in main
App().main()
File "/usr/local/bin/glacier", line 719, in main
self.args.func()
File "/usr/local/bin/glacier", line 600, in archive_checkpresent
self.args.vault, self.args.name)
File "/usr/local/bin/glacier", line 161, in get_archive_last_seen
result = self._get_archive_query_by_ref(vault, ref).one()
File "/usr/local/bin/glacier", line 136, in _get_archive_query_by_ref
if ref.startswith('id:'):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 83: ordinal not in range(128)
(user error (glacier ["--region=eu-west-1","archive","checkpresent","music","--quiet","SHA256E-s7479642--957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09.\50885\49328.mp3"] exited 1)) failed
git-annex: copy: 1 failed
As the bug report says, you can avoid this issue by changing your backend from SHA256E to SHA256, which does not add extensions to keys. But I think addressing this issue would be good anyway.
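The position reported in the traceback can be checked against the key itself. The sketch below is Python 3 and assumes git-annex hands the key to glacier as UTF-8 bytes (consistent with the 0xec byte in the error, but still an assumption). `first_non_ascii` is a hypothetical helper, not part of glacier-cli. The failure on `ref.startswith('id:')` would fit a Python 2 byte string being implicitly decoded as ASCII, though the source does not confirm that this is what happens at that line.

```python
# Key copied from the `git annex calckey` output in this report.
KEY = (
    "SHA256E-s7479642--"
    "957208748ae03fe4fc8d7877b2c9d82b7f31be0726e4a3dec9063b84cc64cf09"
    ".웅산.mp3"
)


def first_non_ascii(raw: bytes):
    """Return (index, byte) of the first byte the ASCII codec rejects, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


if __name__ == "__main__":
    index, byte = first_non_ascii(KEY.encode("utf-8"))
    print(index, hex(byte))  # prints: 83 0xec
```

Index 83 is the first byte after the `.` that follows the 64-character hash, i.e. the first byte of 웅 in UTF-8.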
Note that on unix, filenames have no defined encoding. No matter how the locale is set up, any filename can contain most any series of bytes. It would be good to just treat the filename passed to glacier as a binary blob if you can.
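As a rough illustration of the "binary blob" idea, in Python 3 arbitrary filename bytes can be carried through a `str` without loss by using the `surrogateescape` error handler. This is only a sketch: the helper names `to_text` and `to_bytes` are made up here, and nothing in this thread says glacier-cli works this way.

```python
def to_text(raw: bytes) -> str:
    """Decode filename bytes; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Reverse of to_text: lone surrogates turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # 0xe9 on its own is not valid UTF-8, yet it survives the round trip.
    raw = b"caf\xe9 \xec\x9b\x85.mp3"
    assert to_bytes(to_text(raw)) == raw
```

The round trip only preserves the bytes inside the program. It does not by itself make the name acceptable to a service that wants ASCII.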
On Tue, Mar 06, 2018 at 05:49:41PM +0000, Joey Hess wrote:
> Note that on unix, filenames have no defined encoding. No matter how the locale is set up, any filename can contain most any series of bytes. It would be good to just treat the filename passed to glacier as a binary blob if you can.
IIRC, AWS Glacier limits "descriptions" to 7-bit printable ASCII, and
glacier-cli uses the description as the "name" by default, so that
no state needs to be carried outside Glacier in order to be fully
restorable.
See #16 for another option for resolving this: asking glacier-cli to
use a lossless "encoding".
As far as I understand, there are only two options:
1. Limit what names glacier-cli is given to 7-bit printable ASCII.
2. Have glacier-cli encode the names it is given.
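The two options above can be sketched in Python 3 as follows. Percent-encoding is used here only as one example of a lossless ASCII encoding; it is an assumption and not necessarily what #16 proposes. `is_printable_ascii`, `encode_name` and `decode_name` are hypothetical names.

```python
from urllib.parse import quote, unquote_to_bytes


def is_printable_ascii(raw: bytes) -> bool:
    """Option 1: accept only names made of 7-bit printable ASCII (0x20-0x7e)."""
    return all(0x20 <= b <= 0x7E for b in raw)


def encode_name(raw: bytes) -> str:
    """Option 2: map arbitrary bytes to an ASCII-only name, losslessly."""
    return quote(raw, safe="")


def decode_name(name: str) -> bytes:
    """Recover the original bytes from an encoded name."""
    return unquote_to_bytes(name)


if __name__ == "__main__":
    raw = "웅산.mp3".encode("utf-8")
    print(is_printable_ascii(raw))  # prints: False
    print(encode_name(raw))         # prints: %EC%9B%85%EC%82%B0.mp3
    assert decode_name(encode_name(raw)) == raw
```

Any such encoding makes names longer and changes how they appear in Glacier, so archives stored under unencoded names would need to stay readable.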