Better sheet utils #289

Merged
merged 110 commits into from
Oct 31, 2023
Changes from 101 commits
Commits
4c84f0b
First cut at tools for parsing workbooks.
netsettler Aug 14, 2023
7b73a67
Refactor to separate some functionality into a separate service class.
netsettler Aug 14, 2023
3d4573f
Add a csv file for testing.
netsettler Aug 14, 2023
f4e5cfa
Add some negative testing.
netsettler Aug 14, 2023
e9d2465
Update lock file.
netsettler Aug 14, 2023
6e9060f
Document new sheets_utils module.
netsettler Aug 14, 2023
df12c91
Issue a beta for this functionality.
netsettler Aug 15, 2023
6a39c8a
Fix documentation for sheet_utils.
netsettler Aug 15, 2023
eedb5c6
Add some declarations. Small refactors to improve modularity.
netsettler Aug 16, 2023
a6b68fe
Rearrange some methods for presentational reasons.
netsettler Aug 16, 2023
3ff63a9
First cut at useful functionality.
netsettler Aug 17, 2023
39bd2e0
Some name changes to make things more abstract. workbook becomes read…
netsettler Aug 17, 2023
77b72f6
Rename sheetname to tabname throughout, to be more clear that this is…
netsettler Aug 17, 2023
ba8c55c
Add some doc strings. Rename load_table_set to just load. Arrange for…
netsettler Aug 17, 2023
50488cb
Add load_items function. Fix some test names. Update changelog.
netsettler Aug 17, 2023
807e525
Experimental bug fix from Will to hopefully make get_schema_names work.
netsettler Aug 17, 2023
2a8e81a
update changelog
netsettler Aug 17, 2023
718054a
Update dcicutils/sheet_utils.py
netsettler Aug 17, 2023
682c95a
Merge branch 'master' into kmp_sheet_utils
netsettler Aug 17, 2023
582f002
Merge branch 'kmp_sheet_utils' into kmp_sheet_utils_refactor_for_csv
netsettler Aug 17, 2023
56d1459
Add some comments in response to Doug's code review.
netsettler Aug 17, 2023
2facf9e
Support TSV files.
netsettler Aug 17, 2023
bcc4e63
Add changelog info about tsv files.
netsettler Aug 17, 2023
9de282e
Add a missing data file.
netsettler Aug 17, 2023
8d6495f
First stable cut at schema hinting. Doesn't find schemas automaticall…
netsettler Aug 23, 2023
3a103ee
Merge branch 'master' into kmp_sheet_utils
netsettler Aug 23, 2023
56f702a
Mark chardet as an acceptable license for use.
netsettler Aug 24, 2023
08d428e
Merge branch 'kmp_sheet_utils' into kmp_sheet_utils_refactor_for_csv
netsettler Aug 24, 2023
42ad579
Merge branch 'kmp_sheet_utils_refactor_for_csv' into kmp_sheet_utils_…
netsettler Aug 24, 2023
60ada3f
Backport some small fixes and cosmetics from the schemas branch.
netsettler Aug 24, 2023
690a833
Cosmetic fix.
netsettler Aug 24, 2023
946b998
Add some missing newlines in data files.
netsettler Aug 24, 2023
36e7de0
Support for coping with .tsv files where trailing whitespace is 'help…
netsettler Aug 24, 2023
a51fb27
PEP8
netsettler Aug 24, 2023
6f097a6
Document our choice of why is_uuid is defined here as it is.
netsettler Aug 24, 2023
477c7a2
PEP8
netsettler Aug 24, 2023
09b4c43
Fix error handling to be clearer.
netsettler Aug 24, 2023
f3bd815
Fix CHANGELOG to reflect recent renamings.
netsettler Aug 24, 2023
7627f6f
Fix a type hint and some PEP8.
netsettler Aug 24, 2023
98cd37c
Implement a cut at escaping for tsv files.
netsettler Aug 24, 2023
3852e56
Add a test case for all of the pieces of parsing and schema hinting p…
netsettler Aug 25, 2023
660df9c
Small cosmetic changes and some additional support for upcoming work.
netsettler Aug 25, 2023
34d528b
Fix a unit test to conform to new google account name.
netsettler Aug 25, 2023
1c34ad0
Fix typo in comment (dcicutils/misc_utils.py)
netsettler Aug 25, 2023
41fad79
Add some doc strings and comments.
netsettler Aug 28, 2023
6e8ce2c
Rename tabname to tab_name throughout the sheet_utils interfaces.
netsettler Aug 30, 2023
04eb58c
Add support for reading inserts dirs, .json, .jsonl (two formats), an…
netsettler Aug 31, 2023
ce9f9bc
Bump beta version.
netsettler Aug 31, 2023
0ea5b62
Add yaml formats.
netsettler Aug 31, 2023
bcc1128
Add class AbstractItemManager. Rename InsertsItemManager to InsertsDi…
netsettler Sep 1, 2023
7de093a
Rename ._parser() to ._parse_json_data(). Factor type checks out of .…
netsettler Sep 1, 2023
b01e34b
Rename _parse_json_data, _load_json_data, and _check_json_data, respe…
netsettler Sep 1, 2023
0ae48ee
WIP. Testing good.
netsettler Sep 1, 2023
1e2c5a9
WIP. Tests passing.
netsettler Sep 2, 2023
b8a4c39
Rearrange the way escaping= works so both csv and tsv files can use …
netsettler Sep 2, 2023
a2fe079
Separate registration of regular table set managers from registration…
netsettler Sep 2, 2023
91ddce0
Stub in checking of required headers.
netsettler Sep 2, 2023
142a20b
Bump beta version.
netsettler Sep 5, 2023
e09af07
PEP8
netsettler Sep 5, 2023
70762c6
Merge branch 'kmp_schemas_from_vapp' into kmp_sheet_utils_with_vapp
netsettler Sep 7, 2023
7d2ecaa
Fix a bug in newly proposed ff_utils.get_schemas with vapp.
netsettler Sep 7, 2023
5e46273
Extend VirtualApp to make it easier to test by adding an AbstractVirt…
netsettler Sep 7, 2023
53de60a
Implement portal_vapp= in sheet_utils.
netsettler Sep 7, 2023
630720f
Simplifications per Will's code review.
netsettler Sep 7, 2023
5a07b69
Merge utils 7.10.0 from master.
netsettler Sep 7, 2023
295adfe
Merge pull request #282 from 4dn-dcic/kmp_sheet_utils_with_vapp
netsettler Sep 7, 2023
486adce
Merge branch 'master' into kmp_sheet_utils_schema_hinting
netsettler Sep 9, 2023
cb5125c
Refactor to have a separate bundle_utils.py
netsettler Sep 11, 2023
a7aac44
Mostly PEP8
netsettler Sep 12, 2023
54c51aa
Add support for zipped files.
netsettler Sep 12, 2023
d8cc211
Merge branch 'kmp_sheet_utils_schema_hinting' into kmp_sheet_utils_be…
netsettler Sep 12, 2023
a19e8e3
Add bundle_utils to autodoc.
netsettler Sep 12, 2023
36de089
Added sheet_utils and glacier_utils changes from 7.11.0.1b9
dmichaels-harvard Sep 23, 2023
13d40f3
Added sheet_utils and glacier_utils changes from 7.11.0.1b9
dmichaels-harvard Sep 23, 2023
23a87a7
Update dcicutils to Python 3.11 WITH sheet_utils
dmichaels-harvard Sep 23, 2023
ecea292
Update dcicutils to Python 3.11 WITH sheet_utils
dmichaels-harvard Sep 23, 2023
cdc4677
Updated boto versions.
dmichaels-harvard Sep 26, 2023
e4f846d
Updated boto versions.
dmichaels-harvard Sep 26, 2023
00f1541
Added Python 3.10, 3.11 to pyproject Python list
dmichaels-harvard Sep 28, 2023
3a73263
Fix to qa_utils for application/vnd.software602.filler.form+xml mime …
dmichaels-harvard Sep 28, 2023
cd4dd35
Fix to qa_utils for application/vnd.software602.filler.form+xml mime …
dmichaels-harvard Sep 28, 2023
860a433
Fix to qa_utils for application/vnd.software602.filler.form+xml mime …
dmichaels-harvard Sep 28, 2023
87fbcdb
Fix to qa_utils for application/vnd.software602.filler.form+xml mime …
dmichaels-harvard Sep 28, 2023
fd7a5b1
Merged in master
dmichaels-harvard Sep 29, 2023
7aeecb2
poetry update
dmichaels-harvard Sep 29, 2023
f820233
Merge branch 'master' into kmp_sheet_utils_schema_hinting
netsettler Oct 1, 2023
42c289b
Merge branch 'kmp_sheet_utils_schema_hinting' into kmp_sheet_utils_be…
netsettler Oct 1, 2023
3d25f56
WIP
netsettler Oct 14, 2023
f9c38a0
WIP
netsettler Oct 20, 2023
549f348
Reshuffle caching in SchemaManager so it sees instance-local schemas …
netsettler Oct 20, 2023
efffb51
Merge branch 'pyyaml-version-6-which-is-also-python311-and-sheet-util…
netsettler Oct 20, 2023
336142c
Fix PEP8 and some static checks.
netsettler Oct 20, 2023
e308a69
Merge branch 'master' into kmp_sheet_utils_better_schemas3
netsettler Oct 20, 2023
fa1f272
Make sure jsonschema support is loaded.
netsettler Oct 20, 2023
2a80c95
Import jsonschema better.
netsettler Oct 23, 2023
f2a1c4f
Tidy things up per Will's code review.
netsettler Oct 23, 2023
9992ad5
Bump alpha version.
netsettler Oct 23, 2023
e6815f7
Rearrange some items in dcicutils/common.py. No functional change.
netsettler Oct 23, 2023
7cb98bb
Revert some changes to glacier_utils.py
netsettler Oct 23, 2023
04a5aae
Update changelog.
netsettler Oct 23, 2023
68d9459
Bump alpha version.
netsettler Oct 23, 2023
46a2c09
Begin to address David's problems in C4-1111.
netsettler Oct 25, 2023
263ce0a
PEP8
netsettler Oct 25, 2023
60b5ef1
Correct a testing problem (hopefully).
netsettler Oct 25, 2023
d9fb9f6
Stub in support for checking non-flattened files.
netsettler Oct 25, 2023
748cde8
Refactor to make an extra entry point for type hinting.
netsettler Oct 25, 2023
61b8155
Support non-string elements of the sequences given to lang_utils.conj…
netsettler Oct 30, 2023
646a7bd
Merge pull request #290 from 4dn-dcic/kmp_sheet_utils_better_schemas3…
netsettler Oct 31, 2023
d910853
PEP8
netsettler Oct 31, 2023
dfc93e3
De-beta as 8.1.0
netsettler Oct 31, 2023
64 changes: 64 additions & 0 deletions CHANGELOG.rst
@@ -6,6 +6,69 @@ dcicutils
Change Log
----------

8.1.0
=====

* New module ``bundle_utils.py`` that is intended for schema-respecting worksheets ("metadata bundles").
  There are various modular bits of functionality here, but the main entry point is:

  * ``load_items`` to load data from a given table set, doing certain notational canonicalizations, and
    checking that things are in the appropriate format.

* In ``common.py``, new hint types:

  * ``CsvReader``
  * ``JsonSchema``
  * ``Regexp``

* In ``lang_utils.py``:

  * New argument ``just_are=`` to ``there_are`` to get verb conjugation without the details.

  * Add "while" to "which" and "that" as clause handlers in the string pluralizer
    (e.g., so that "error while parsing x" pluralizes as "errors while parsing x").

* In ``misc_utils.py``, miscellaneous new functionality:

  * New class ``AbstractVirtualApp`` that is either an actual ``VirtualApp`` or can be used to make mocks
    if the thing being called expects an ``AbstractVirtualApp`` instead of a ``VirtualApp``.

  * New function ``to_snake_case`` that assumes its argument is either a CamelCase string or a snake_case string
    and returns the snake_case form.

  * New function ``is_uuid`` (migrated from Fourfront)

  * New function ``pad_to``

  * New class ``JsonLinesReader``

* In ``qa_checkers.py``:

  * Change the ``VERSION_IS_BETA_PATTERN`` to recognize alpha or beta patterns. A rename would probably be better,
    but it would also be incompatible. As far as I know, this is used only to avoid fussing when you haven't made
    a changelog entry for a beta (or, now, an alpha).

* New module ``sheet_utils.py`` for loading workbooks in a variety of formats, but without schema interpretation.

  A lot of this is implementation classes for each of the kinds of files, but the main entry point
  is intended to be ``load_table_set`` if you are not working with schemas. For schema-related support,
  see ``bundle_utils.py``.

* New module ``validation_utils.py`` with these facilities:

  * New class ``SchemaManager`` for managing a set of schemas so that programs asking for a schema by name
    download it only once and then use a cache. There are also facilities here for populating a dictionary
    with all schemas in a table set (the kind of thing returned by ``load_table_set`` in ``sheet_utils.py``)
    in order to pre-process it as a metadata bundle for checking purposes.

  * New functions:

    * ``validate_data_against_schemas`` to validate that table sets (workbooks, or the equivalent) have rows
      in each tab conforming to the schema for that tab.

    * ``summary_of_data_validation_errors`` to summarize the errors obtained from ``validate_data_against_schemas``.

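The ``to_snake_case`` behavior described in the 8.1.0 notes (CamelCase or snake_case in, snake_case out) can be illustrated with a minimal sketch. This is not the ``dcicutils`` implementation: the name ``to_snake_case_sketch`` and the regular expression are assumptions made for illustration, and runs of capitals such as acronyms are deliberately not handled.

```python
import re


def to_snake_case_sketch(name: str) -> str:
    """Hypothetical stand-in for illustration; NOT the dcicutils implementation.

    Assumes the input is either CamelCase or already snake_case.
    """
    # Insert an underscore at each boundary where a lowercase letter or digit
    # is immediately followed by an uppercase letter, then lowercase everything.
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_breaks.lower()


print(to_snake_case_sketch("FileFormat"))   # file_format
print(to_snake_case_sketch("file_format"))  # file_format
```

Because a snake_case input contains no lowercase-to-uppercase boundary, it passes through unchanged, which matches the "either form in, snake_case out" contract stated in the notes.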

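The caching idea behind ``SchemaManager`` in the 8.1.0 notes (fetch a schema by name once, then serve it from a cache, and build a dictionary of schemas for the tabs of a table set) might look roughly like the sketch below. The class name ``CachingSchemaLookup``, its methods, the ``fetch_schema`` callable, and the assumption that a table set is a dictionary keyed by tab name are all hypothetical; none of them are taken from ``validation_utils.py``.

```python
from typing import Callable, Dict, List, Optional


class CachingSchemaLookup:
    """Hypothetical illustration of fetch-once schema caching; NOT the dcicutils SchemaManager."""

    def __init__(self, fetch_schema: Callable[[str], Optional[dict]]):
        # fetch_schema is an assumed callable that downloads one schema by name.
        self._fetch_schema = fetch_schema
        self._cache: Dict[str, Optional[dict]] = {}

    def get(self, name: str) -> Optional[dict]:
        # Only call the (possibly slow) fetcher the first time a name is requested.
        if name not in self._cache:
            self._cache[name] = self._fetch_schema(name)
        return self._cache[name]

    def schemas_for_table_set(self, table_set: Dict[str, List[dict]]) -> Dict[str, Optional[dict]]:
        # Assumes a table set maps each tab name to its list of rows,
        # and that each tab name doubles as the schema name.
        return {tab_name: self.get(tab_name) for tab_name in table_set}


if __name__ == "__main__":
    fetched = []

    def fake_fetch(name: str) -> dict:
        fetched.append(name)  # record each simulated download
        return {"title": name}

    lookup = CachingSchemaLookup(fake_fetch)
    lookup.get("User")
    lookup.get("User")
    print(fetched)  # ['User']
```

Passing the fetcher in as a callable keeps the sketch testable without a portal connection; a real implementation would also have to decide how to treat names for which no schema exists.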
8.0.0
=====

@@ -17,6 +80,7 @@ Change Log
and searching our GitHub organizations (4dn-dcic, dbmi-bgm, smaht-dac) the only ones which might
be affected are cwltools and parliament2, neither of which are dependent on dcicutils in any way.


7.13.0
======
