Could not upload files with diacritics in name #16
Amazon Glacier does not permit anything non-ASCII in the name. Details are here: http://docs.amazonwebservices.com/amazonglacier/latest/dev/api-archive-post.html

"The description must be less than or equal to 1,024 characters. The allowable characters are 7-bit ASCII without control codes, specifically ASCII values 32-126 decimal or 0x20-0x7E hexadecimal."

The error message you get from glacier-cli is not helpful though, and I will leave this issue open to fix that.
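A more helpful error could come from checking the name against the documented limits before anything is sent. The following is only a sketch of that idea: `check_description` and `MAX_DESCRIPTION_LENGTH` are hypothetical names and are not part of glacier-cli or boto.

```python
# Hypothetical pre-flight check based on the limits quoted above.
# Neither name below exists in glacier-cli or boto.
MAX_DESCRIPTION_LENGTH = 1024


def check_description(description):
    """Return the description unchanged, or raise ValueError explaining why
    Glacier would be expected to reject it."""
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            "archive name is %d characters; the limit is %d"
            % (len(description), MAX_DESCRIPTION_LENGTH)
        )
    # Collect every character outside printable 7-bit ASCII (32..126).
    bad = sorted({ch for ch in description if not 32 <= ord(ch) <= 126})
    if bad:
        raise ValueError(
            "archive name contains characters Glacier does not accept: %r"
            % "".join(bad)
        )
    return description
```

Calling it with the file name from this issue would raise a `ValueError` that lists the offending characters instead of surfacing an encoding traceback.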
Boto can handle it by passing in decoded UTF-8. So we just have to pass it to boto as unicode and not as ascii. I tested this patch:
The second part makes uploading work. But
Which is weird to me, because when I try to reproduce it in the Python console it works:
I don't understand. If Amazon will only take ASCII in the range 32-126, how do you expect glacier-cli or boto to encode it for sending to Amazon? If, after a disaster, you use a different tool for recovery, how will that tool know how to decode your encoded archive names?
Let's take the character 'á'
I don't follow. "\xc3" is 195 decimal, which is greater than the Amazon 126 limit, no?
I think I've just understood what you are trying to do and now get what you mean by "8 character long string". The problem is though that this overloads the backslash character. If Amazon Glacier gives glacier-cli an archive of description '\xc3\xa1' (8 byte long literal), then how does glacier-cli know whether to create a filename of exactly 8 ASCII bytes ['\', 'x', 'c', '3', ...] or a filename of exactly 1 UTF-8 'á'?

Fundamentally, glacier-cli is a front end for Amazon Glacier, and Glacier doesn't support Unicode, so neither can glacier-cli without introducing ambiguities in decoding, which harms interoperability with other tools. So I regret that glacier-cli will never be able to support Unicode archive names by default.

If you want to add functionality so that the user can specify some kind of mapping as a command line option (that won't be default), then I'd be happy to accept that. It would need to either be some accepted standard method or be done in a pluggable way to support multiple mappings, and it needs to be free of conversion ambiguities.

Alternatively, a wrapper to glacier-cli might be able to do this, or users could use git-annex, which keeps filename metadata in the annex instead of in the special remote.
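The ambiguity can be shown with a small sketch. `naive_escape` is a hypothetical helper written only for this illustration, assuming an escaping scheme that rewrites non-ASCII bytes as `\xNN` and leaves everything else alone; it is not the code from the patch discussed above.

```python
def naive_escape(name):
    """Hypothetical scheme: keep printable ASCII as-is and write every other
    UTF-8 byte as a backslash-x escape. The backslash itself is not escaped."""
    out = []
    for byte in name.encode("utf-8"):
        if 32 <= byte <= 126:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


unicode_name = "á"               # one character, two UTF-8 bytes
literal_name = "\\xc3\\xa1"      # eight plain ASCII characters

# Both names produce the same description, so a tool reading it back
# cannot tell which file name was originally meant.
assert naive_escape(unicode_name) == naive_escape(literal_name)
```

A mapping that also escapes its own escape character does not have this collision, which is why the choice of encoding matters here.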
I don't suppose anyone names their files with literal escaped UTF-8 sequences, but OK. I think having this as an option that is off by default is fine as well. What about --allow-utf8? I find the problem with
But what encoding would --allow-utf8 use? I'm just looking at http://docs.python.org/2/library/codecs.html#standard-encodings. Why don't we pick one of these? A suitable one would be a coding that converts from Unicode to something that Amazon Glacier can accept (i.e. fits into the range 32-126). How about quopri-codec? Quoted-printable is a fairly standard way of embedding Unicode data into a 7-bit stream, right?

I'd prefer --convert-utf8 to make it clear that what goes into Glacier is being modified in some way. So then glacier-cli could do a simple

How does this sound?
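As a rough sketch of the quoted-printable idea, written for Python 3 with the standard `quopri` module rather than the Python 2 codec mentioned above: `encode_name` and `decode_name` are hypothetical helpers, not glacier-cli functions. One assumption worth flagging is that `quopri` wraps long output with soft line breaks (`=` followed by a newline), and a newline is outside the range Glacier accepts, so the sketch strips them. It also assumes the name itself contains no newline.

```python
import quopri


def encode_name(name):
    """Encode a Unicode name as quoted-printable ASCII (hypothetical helper)."""
    encoded = quopri.encodestring(name.encode("utf-8"))
    # Remove soft line breaks; a literal "=" is always written as "=3D",
    # so "=\n" can only be a soft break inserted by the encoder.
    return encoded.replace(b"=\n", b"").decode("ascii")


def decode_name(description):
    """Reverse encode_name (hypothetical helper)."""
    return quopri.decodestring(description.encode("ascii")).decode("utf-8")


name = "./2009/Agátka ve školce/PC090374.JPG"
description = encode_name(name)
assert all(32 <= ord(ch) <= 126 for ch in description)
assert decode_name(description) == name
```

Because `=` is itself escaped, a name that already looks like quoted-printable text still round-trips, which avoids the backslash ambiguity raised earlier in the thread.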
On 11/22/2012 02:14 PM, basak wrote:
But UTF8 is one of these - unicode_escape :)
I do not agree. That is standard in the email world. But in my experience
OK, --convert-utf8 then. Mirek
But UTF-8 is not unicode_escape! We cannot use UTF-8 since Amazon is not 8-bit clean for Glacier archive descriptions. And if we use unicode_escape, then we're limiting our interoperability only to other Python tools. Is there a common encoding that is 7-bit friendly that is generally accepted and not Python-specific? Apart from quoted-printable, I only see base64 and hex.
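For comparison, here is a small sketch of the other two candidates, base64 and hex, applied to the file name from this issue. It uses only the Python 3 standard library; the variable names are illustrative.

```python
import base64
import binascii

name = "./2009/Agátka ve školce/PC090374.JPG"
raw = name.encode("utf-8")

# Both encodings produce only printable 7-bit ASCII.
as_base64 = base64.b64encode(raw).decode("ascii")
as_hex = binascii.hexlify(raw).decode("ascii")

for candidate in (as_base64, as_hex):
    assert all(32 <= ord(ch) <= 126 for ch in candidate)

# Both are reversible without ambiguity.
assert base64.b64decode(as_base64).decode("utf-8") == name
assert binascii.unhexlify(as_hex).decode("utf-8") == name

# Hex doubles the byte length; base64 grows it by roughly a third.
# Either way the 1,024 character description limit is reached sooner.
print(len(raw), len(as_base64), len(as_hex))
```

Unlike quoted-printable, neither of these leaves plain ASCII names human-readable in the Glacier inventory, which may matter when recovering with a different tool.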
On 23.11.2012 09:33, basak wrote:
Hmm, I think everybody will have a different opinion.
I already have the patch ready and, checking it, I see that such a change
That sounds absolutely fine. How about
--transcode-names= I will send a pull request on Monday.
this will encode name to utf8 before sending closes: basak#16
This is just biting me. What is the status of this request?
I have a file named "./2009/Agátka ve školce/PC090374.JPG"
and I'm trying to upload it using:
`glacier archive upload --name "./2009/Agátka ve školce/PC090374.JPG" Photos "./2009/Agátka ve školce/PC090374.JPG"`
I end up with a traceback:
Not sure if this is a problem in boto or in glacier-cli.
Will investigate later.