Fixing unittest errors on Windows (#222)
* fixing issue #86 from upstream: add get_spans() to the Field class, similar to get_spans() in the Session class
* add a unit test for the Field get_spans() function
* remove unhelpful line comments
* add dataset and dataframe classes
* closing issue #92: reset the dataset when field.data.clear is called
* closing issue #92: reset the dataset when field.data.clear is called
* add a unit test for the field.data.clear function
* restore the dataset file to avoid a merge error when fixing issue #92
* fix the end-of-file character in dataset.py
* add get_spans for the indexed string field
* unit tests for the get_spans functions on different field types, e.g. fixed string, indexed string, etc.
* basic dataframe methods and unit tests
* more dataframe operations
* minor fixes
* update get_spans in the field subclasses
* intermediate commit for testing PR 118
* Implement the get_spans(ndarray) and get_spans(ndarray1, ndarray2) functions in core.operations; provide get_spans methods on fields using the data attribute (see sketch 1 after this list)
* Move the get_spans functions from persistence to operations; modify the get_spans functions in Session to call the field and operation methods
* minor edits for the pull request
* remove dataframe for the pull request
* remove dataframe tests for the PR
* add dataframe
* fix get_spans_for_2_fields_by_spans and its unit test
* Initial commit for the is_sorted method on Field
* minor edits for the PR
* fix a minor edit error for the PR
* add apply_index and apply_filter methods to fields
* Adding missing is_sorted tests for all field types
* update apply_filter and apply_index on fields
* minor updates to line up with upstream
* update the apply_filter and apply_index methods in fields to branch on whether a destination field is set: if set, use dstfld.write because the new field is usually empty; if not set, write to self using fld.data[:]
* updated the apply_index and apply_filter methods in fields: use olddata[:] = newdata if the old and new data lengths are equal; clear() and write() the data if not (see sketch 2 after this list)
* add basic dataframe functions and operations; working on dataset to enable dataframe to create fields
* add functions in dataframe; add the dataset class; add functions in dataset; move the dataset module to csvdataset
* integrate the dataset and dataframe into the session
* update the fields importer and field.create_like methods to call dataframe.create; update the unit tests to follow the s.open_dataset and dataset.create_dataframe flow
* add license info to a few files
* csv_reader_with_njit
* change output_excel from string to int
* initialize the column_idx matrix outside of the njit function
* use np.fromfile to load the file into a byte array (see sketch 3 after this list)
* Refactoring and reformatting of some of the dataset / dataframe code; moving Session and Dataset to abstract types; fixing is_sorted tests that were broken by the merge of the new functionality
* Work on fast CSV reading
* Address issue #138 with minor tweaks; fix bug: create dataframes in the dataset constructor to map existing datasets; fully sync the dataset with the h5file when adding, removing, or setting a dataframe (group)
* remove the draft group.py from the repo
* Improved fast CSV reader performance by avoiding ndarray slicing
* fix the dataframe API
* fixing #13 and #14: add a dest parameter to get_spans() and tidy up the field/fields parameters
* minor fix: remove the dataframe and file properties from dataset, as they are not used so far
* minor fix to a unit test
* add docstrings for dataset
* copy/move for dataframe; docstrings
* categorical field: convert from byte int to value int within the njit function
* Adding a pseudocode version of the fast categorical lookup
* clean up the comments
* docstrings for dataframe
* Major reworking of apply_filter / apply_index for fields: they shouldn't destructively change self by default; also addition of further memory versions of fields and factoring out of common functionality; fix to fields when indices / values are cleared but data is left pointing to the old field
* add unit tests for various fields in dataframe; add dataframe.add/drop/move; add docstrings
* add unit tests for Dataframe.add/drop/move
* minor name change to keep names consistent across the dataframe, dataset.key, and the h5 group
* minor fix: add the b prefix to strings in test_session and test_dataset
* minor fix: add the b prefix to strings in test_session and test_dataset
* Completed the initial pass of memory fields for all types
* categorical field.keys returns byte keys as strings, hence a minor change to the unit test
* solved the bytes-to-string issue; the problem is a difference between Python 3.7 and 3.8
* Miscellaneous field fixes; fixed issues with dataframe apply_filter / apply_index
* Moving most binary-op logic out into a static method on FieldDataOps
* Dataframe copy, move, and drop operations have been moved out of the DataFrame static methods, as Python doesn't support overloading a name with both static and instance methods (my bad)
* Fixing the accidental introduction of CRLF line endings in abstract_types
* Fixed a bug where apply_filter and apply_index weren't returning a field on all code paths; beefed up tests to cover this
* Fixed an issue in timestamp_field_create_like when group is set and is a dataframe
* persistence.filter_duplicate_fields now supports fields as well as ndarrays
* the sort_on message now shows in verbose mode under all circumstances
* Fixed a bug in apply_filter when a destination dataset is supplied
* Added a test to catch the dataframe.apply_filter bug
* Bug fix: categorical_field_constructor in fields.py was returning a numeric field when passed an h5py group as the destination for the field
* Copying data before filtering, as filtering in h5py is very slow (see sketch 4 after this list)
* Adding apply_spans functions to fields
* Fixed TestFieldApplySpansCount.test_timestamp_apply_spans, which had been written but not run
* Issues found with indexed strings and merging; fixes for apply_filter and apply_index when passed a field rather than an ndarray, both with augmented testing
* Updated the merge functions to consistently return memory fields when provided with fields but not with outputs
* concatenate categorical keys instead of padding
* some docstrings for fields
* dataframe copy/move/drop and unit tests
* Fixing the issue with dataframe move/copy being static
* Updating HDF5Field writeable methods to account for prior changes
* Adding merge functionality for dataframes
* dataset.drop is a member method of Dataset, as it did not make sense for it to be static or outside of the class
* Added missing methods / properties to the DataFrame ABC
* minor update to a dataframe static function
* minor update
* minor update to session
* minor comment updates
* minor comment updates
* add unit tests for csv_reader_speedup.py
* count operation; logical_not for numeric fields
* remove the CSV speed-up work from the commit
* minor update
* unit test for logical_not on numeric fields
* patch get_spans for the datastore
* tests for two fields
* add astype to the numeric field
* separate the get_spans unit tests by datastore reader
* unit tests for astype
* update astype for fields; update logical_not for numeric fields
* remove the dataframe view commits
* remove kwargs from get_spans in session; add fields back for backward compatibility
* remove the filter view tests
* partial commit of the viewer
* remove view from git
* add a df.describe unit test
* sync with upstream
* Delete python-publish.yml
* Update python-app.yml
* Update python-app.yml
* dataframe describe function
* sync with upstream
* Update python-app.yml
* alternative get_timestamp notebook for discussion
* update the notebook output for Linux and Mac
* update the formatting
* update the to_timestamp and to_datetime functions in utils; fix the current datetime.timestamp() error in test_fields and test_sessions
* add unit tests for utils to_timestamp and to_datetime
* fix for the PR
* set up a Windows-specific GitHub Action for Cython
* minor workflow fix
* add an example .pyx file
* fix the package upload command on Windows, as the GitHub Action gh-action-pypi-publish works only on Linux
* add twine to the tools
* add a Linux action file
* update the Linux build command
* build workflow for macOS
* minor update to the macOS workflow
* fixed the timestamp issue on Windows by adding timezone info to datetime (see sketch 5 after this list)
* finalize the workflow file; compilation reacts to the publish action only
* avoid the bytearray-vs-string error on Windows by converting the result to a bytearray
* fixing the string-vs-bytearray issue
* update the categorical field key property: change the key and value to bytes if they are str (see sketch 6 after this list)
* solved the "index must be np.int64" error (see sketch 7 after this list)
* all unit-test errors on Windows removed
* minor update to the workflow file
* minor update to the workflow file
* minor fix: use pip install -r; remove an unused import in utils.py
* update the action file
* remove the uint32-to-int32 change in test_persistence

Co-authored-by: jie <[email protected]>
Co-authored-by: Ben Murray <[email protected]>
Co-authored-by: clyyuanzi-london <[email protected]>
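Sketch 1: a minimal version of the get_spans operations referenced above, assuming plain numpy arrays rather than ExeTera's field types; the real functions in core.operations may differ in detail. A span array `r` delimits runs of equal values, so `field[r[i]:r[i+1]]` is one run for every `i`.

```python
import numpy as np

def get_spans(field: np.ndarray) -> np.ndarray:
    """Return the boundaries of runs of equal values in a (sorted) array."""
    if len(field) == 0:
        return np.zeros(1, dtype=np.int64)
    # positions where a value differs from its predecessor, plus both ends
    changes = np.nonzero(field[1:] != field[:-1])[0] + 1
    return np.concatenate(([0], changes, [len(field)]))

def get_spans_two(field0: np.ndarray, field1: np.ndarray) -> np.ndarray:
    """Span boundaries where either of two equal-length key arrays changes."""
    mask = (field0[1:] != field0[:-1]) | (field1[1:] != field1[:-1])
    changes = np.nonzero(mask)[0] + 1
    return np.concatenate(([0], changes, [len(field0)]))

print(get_spans(np.array([1, 1, 2, 2, 2, 3])))  # [0 2 5 6]
```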
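Sketch 2: the destination / in-place branching described for apply_filter, written against a simplified, hypothetical field interface (a `data` attribute supporting `[:]`, `clear()` and `write()`); it shows the shape of the logic, not the project's exact implementation.

```python
import numpy as np

def apply_filter(field, filter_to_apply: np.ndarray, dstfld=None):
    """Apply a boolean filter to a field (hypothetical field API)."""
    newdata = field.data[:][filter_to_apply]    # filter a copy in memory
    if dstfld is not None:
        dstfld.data.write(newdata)              # destination is usually empty: append
        return dstfld
    if len(field.data) == len(newdata):
        field.data[:] = newdata                 # same length: overwrite in place
    else:
        field.data.clear()                      # length changed: reset, then rewrite
        field.data.write(newdata)
    return field
```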
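Sketch 3: loading a CSV into a flat byte array with np.fromfile so a numba-compiled scanner can walk it without Python-level string handling. The file name and the marker-collection function are illustrative, not the project's actual fast reader.

```python
import numpy as np
from numba import njit

def load_file_as_bytes(filename):
    # One bulk read of the whole file into a flat uint8 array.
    return np.fromfile(filename, dtype=np.uint8)

@njit
def find_markers(buffer, delimiter, newline):
    # Collect the offset of every delimiter / newline byte in one pass.
    offsets = np.empty(len(buffer), dtype=np.int64)
    n = 0
    for i in range(len(buffer)):
        c = buffer[i]
        if c == delimiter or c == newline:
            offsets[n] = i
            n += 1
    return offsets[:n]

with open('demo.csv', 'w') as f:                    # tiny stand-in input
    f.write('a,b\n1,2\n3,4\n')
content = load_file_as_bytes('demo.csv')
print(find_markers(content, ord(','), ord('\n')))   # [ 1  3  5  7  9 11]
```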
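Sketch 4: why data is copied before filtering. Boolean indexing directly on an h5py dataset turns into many small selections and reads, which is very slow; one bulk read followed by an in-memory numpy mask is much faster. File and dataset names here are illustrative.

```python
import numpy as np
import h5py

def filtered_read(dataset, filter_mask):
    data = dataset[:]           # single contiguous read from HDF5
    return data[filter_mask]    # fast in-memory boolean indexing

with h5py.File('demo.h5', 'w') as hf:
    ds = hf.create_dataset('x', data=np.arange(10))
    mask = np.arange(10) % 2 == 0
    print(filtered_read(ds, mask))   # [0 2 4 6 8]
```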
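Sketch 5: the timezone fix behind the Windows timestamp errors. On Windows, calling timestamp() on a naive datetime routes through the C runtime's local-time conversion, which raises OSError for dates before 1970; attaching an explicit UTC timezone sidesteps that code path. This sketches the technique, not necessarily the exact utils.to_timestamp implementation.

```python
from datetime import datetime, timezone

def to_timestamp(dt: datetime) -> float:
    """Convert a datetime to a POSIX timestamp portably across platforms."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)   # treat naive values as UTC
    return dt.timestamp()

print(to_timestamp(datetime(1960, 1, 1)))  # -315619200.0, works on Windows too
```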
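Sketch 6: normalizing categorical keys to bytes. Different Python versions (and h5py) can hand back str in some places and bytes in others, so coercing keys and values to bytes keeps the key property consistent across platforms. The helper name and the dict-shaped mapping are assumptions.

```python
def normalize_to_bytes(keys):
    # Coerce any str key or value to bytes so comparisons and unit
    # tests behave the same on every platform / Python version.
    out = {}
    for k, v in keys.items():
        k = k.encode() if isinstance(k, str) else k
        v = v.encode() if isinstance(v, str) else v
        out[k] = v
    return out

print(normalize_to_bytes({'yes': 1, b'no': 0}))  # {b'yes': 1, b'no': 0}
```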
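Sketch 7: the np.int64 index fix. On Windows, numpy's default integer is 32-bit (C long), so index arrays built with np.arange or cumulative sums come out as int32 and trip checks that require an int64 index; one explicit cast up front normalizes them. The helper is illustrative.

```python
import numpy as np

def ensure_int64_index(index: np.ndarray) -> np.ndarray:
    """Normalize an index array to int64 regardless of platform defaults."""
    if index.dtype != np.int64:
        index = index.astype(np.int64)
    return index

idx = np.arange(5)                    # int32 on Windows, int64 on Linux/macOS
print(ensure_int64_index(idx).dtype)  # always int64
```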