-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable .json formats for uploading and metadata editing #443
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -78,7 +78,7 @@ | |||
from internetarchive.cli.argparser import get_args_dict, convert_str_list_to_unicode | ||||
from internetarchive.session import ArchiveSession | ||||
from internetarchive.utils import (InvalidIdentifierException, get_s3_xml_text, | ||||
is_valid_metadata_key, validate_s3_identifier) | ||||
is_valid_metadata_key, validate_s3_identifier, is_json) | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Move the new import into its proper alphabetical position, (between get_s3_xml_text and is_valid_metadata_key) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tendency seems to be towards 88 chars max per line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, 102 in internetarchive: Line 4 in 2456376
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the past I've shot for 90 chars max per line, personallly. If 88 is the tendency, I'm happy to change to that! much less than that tends to frustrate me personally though (creating a lot of weird multi-line statements). |
||||
|
||||
# Only import backports.csv for Python2 (in support of FreeBSD port). | ||||
PY2 = sys.version_info[0] == 2 | ||||
|
@@ -260,8 +260,12 @@ def main(argv, session): | |||
# Bulk upload using spreadsheet. | ||||
else: | ||||
# Use the same session for each upload request. | ||||
with io.open(args['--spreadsheet'], 'rU', newline='', encoding='utf-8') as csvfp: | ||||
spreadsheet = csv.DictReader(csvfp) | ||||
|
||||
with io.open(args['--spreadsheet'], 'rU', newline='', encoding='utf-8') as fp: | ||||
if is_json(args['--spreadsheet']): | ||||
spreadsheet = json.load(fp) | ||||
else: | ||||
spreadsheet = csv.DictReader(fp) | ||||
Comment on lines
+265
to
+268
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is strange to call a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The [
{"identifier": "item", "file": "./foobar"},
{"file": "./foobaz"}
] This needs test coverage though and an example in the documentation. |
||||
prev_identifier = None | ||||
for row in spreadsheet: | ||||
for metadata_key in row: | ||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -398,3 +398,8 @@ def merge_dictionaries(dict0, dict1, keys_to_drop=None): | |
# Items from `dict1` take precedence over items from `dict0`. | ||
new_dict.update(dict1) | ||
return new_dict | ||
|
||
|
||
def is_json(filename): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name should most likely be changed to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now that we have dropped legacy Python, let's add type hints on the function parameters and return type.
https://docs.python.org/3/library/os.path.html#os.path.splitext Of course, my favorite would be
https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see any value in doing either of those over There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd love to make more use of pathlib now that we're PY3 only. With that in mind, it might be worth using it here vs. the simplified I hear your argument against it though, too, @JustAnotherArchivist. I'm fine with either, but would lean towards cclauss's suggestion above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In my opinion the Yet if me managed to change to paths where possible and over time consistently manged to use pathlib, that would be pretty nice e.g. if functions where it made sense would already take a path as input instead of a string. |
||
# Checks whether filetype is .json, else fallsback to .csv file format | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "falls back" should be written as two words. |
||
return bool(re.fullmatch('^.*\.json$', filename)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @maxz In this instance, the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @cclauss I'm pretty sure they meant that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parentheses should be removed.