Dfview #297
Open
deng113jie wants to merge 239 commits into master from dfview
Changes from 3 commits
a2d7008
fixing issue #86 from upstream:
62925bb
add unit test for Field get_spans() function
0e313dc
remove unuseful line comments
e211371
add dataset, datafreame class
deng113jie 39e4535
Merge remote-tracking branch 'upstream/master'
deng113jie 329a7cc
closing issue 92, reset the dataset when call field.data.clear
deng113jie d9d8b02
closing issue 92, reset the dataset when call field.data.clear
deng113jie f7ba342
Merge branch 'master' into patch92
deng113jie 21f0fa9
add unittest for field.data.clear function
deng113jie c9363ef
recover the dataset file to avoid merge error when fixing issue 92
deng113jie 14fc1f3
fix end_of_file char in dataset.py
deng113jie 2d13342
add get_span for index string field
deng113jie 666073e
unittest for get_span functions on different types of field, eg. fixe…
deng113jie 73aa50e
Merge remote-tracking branch 'upstream/master'
deng113jie 689cc3f
Merge remote-tracking branch 'upstream/master' into dataframe
deng113jie 8ba818f
dataframe basic methods and unittest
deng113jie abb3337
more dataframe operations
deng113jie 3180cbd
fix upstream merge conflict
deng113jie 9b9c420
minor fixing
deng113jie 55989d6
update get_span to field subclass
deng113jie cd69d04
solve conflict
deng113jie f2136d5
intermedia commit due to test pr 118
deng113jie 30953e3
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 0dccc6e
Merge remote-tracking branch 'upstream/master' into dataframe
deng113jie 000463d
Implementate get_spans(ndarray) and get_spans(ndarray1, ndarray2) fun…
deng113jie 37972b5
Merge branch 'dataframe'
deng113jie 74c1dad
Move the get_spans functions from persistence to operations.
deng113jie bf210c4
Merge branch 'dataframe'
deng113jie 95c1645
minor edits for pull request
deng113jie 5db42d2
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 664e255
remove dataframe for pull request
deng113jie 02265fe
remove dataframe test for pr
deng113jie f536652
add dataframe
deng113jie bafe9cf
Merge remote-tracking branch 'upstream/master' into dataframe
deng113jie 223dbe9
fix get_spans_for_2_fields_by_spans, fix the unittest
deng113jie cc48016
Merge branch 'master' into dataframe
deng113jie 948ce1a
Initial commit for is_sorted method on Field
atbenmurray 37b8ac2
minor edits for the pr
deng113jie 0369c92
fix minor edit error for pr
deng113jie 2096828
Merge branch 'master' into dataframe
deng113jie f213240
add apply_index and apply_filter methods on fields
deng113jie b050d74
Merging from recent PRs
atbenmurray 76b5ff1
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera into da…
deng113jie fe36b94
Adding in missing tests for all field types for is_sorted
atbenmurray daa6012
update the apply filter and apply index on Fields
deng113jie 5c43f38
minor updates to line up w/ upstream
deng113jie 459b91c
update apply filter & apply index methods in fields that differ if de…
deng113jie c0ac960
updated the apply_index and apply_filter methods in fields. Use oldda…
deng113jie dd0867d
add dataframe basic functions and operations; working on dataset to e…
deng113jie e52d825
add functions in dataframe
deng113jie 463ea70
integrates the dataset, dataframe into the session
deng113jie 76d1952
update the fieldsimporter and field.create_like methods to call dataf…
deng113jie 7cfeceb
add license info to a few files
deng113jie b1cb082
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie eaac2b6
csv_reader_with_njit
Liyuan-Chen-1024 a9ce1fb
change output_excel from string to int
Liyuan-Chen-1024 113a83f
Merge branch 'master' of github.com:KCL-BMEIS/ExeTera into importer_c…
Liyuan-Chen-1024 375982c
solve merge conflict
Liyuan-Chen-1024 e9d1053
initialize column_idx matrix outside of the njit function
Liyuan-Chen-1024 e1ed80d
use np.fromfile to load the file into byte array
Liyuan-Chen-1024 f4fe394
Merge branch 'master' into field_is_sorted_method
atbenmurray a057677
Refactoring and reformatting of some of the dataset / dataframe code;…
atbenmurray 0845a63
Merge branch 'issort' into dataframe
deng113jie 4d2886a
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera into da…
deng113jie db3ec9f
Work on fast csv reading
atbenmurray f2efedc
Address issue #138 on minor tweaks
deng113jie 4926330
remove draft group.py from repo
deng113jie 56bb190
Improved performance from the fast csv reader through avoiding ndarra…
atbenmurray 04d810b
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie f0b7e37
fix dataframe api
deng113jie 18d49a6
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 737eeed
fixing #13 and #14, add dest parameter to get_spans(), tidy up the fi…
deng113jie 732762d
minor fix remove dataframe and file property from dataset, as not use…
deng113jie ab6508c
minor fix on unittest
deng113jie 39027f7
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 358d82b
add docstring for dataset
deng113jie 98a4d7f
copy/move for dataframe; docstrings
deng113jie e6b1a57
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie a0e0167
categorical field: convert from byte int to value int within njit fun…
Liyuan-Chen-1024 204bd39
merge
Liyuan-Chen-1024 c788b96
Adding in of pseudocode version of fast categorical lookup
atbenmurray 60f2ba9
clean up the comments
Liyuan-Chen-1024 bba4829
Merge branch 'importer_csv_reader' of github.com:KCL-BMEIS/ExeTera in…
Liyuan-Chen-1024 c341eb2
docstrings for dataframe
deng113jie b23f1d8
Major reworking of apply_filter / apply_index for fields; they should…
atbenmurray 63bd5a0
add unittest for various fields in dataframe
deng113jie 650014e
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie cb9f2a2
add unittest for Dataframe.add/drop/move
deng113jie 013f401
minor change on name to make sure name in consistent over dataframe, …
deng113jie 18ce7ce
minor fixed of adding prefix b to string in test_session and test_dat…
deng113jie 8657081
minor fixed of adding prefix b to string in test_session and test_dat…
deng113jie 51e2fec
Completed initial pass of memory fields for all types
atbenmurray 955aede
categloric field.keys will return byte key as string, thus minor chan…
deng113jie 039d8ee
solved the byte to string issue, problem is dof python 3.7 and 3.8
deng113jie 547bb88
Miscellaneous field fixes; fixed issues with dataframe apply_filter /…
atbenmurray 700635f
Moving most binary op logic out into a static method in FieldDataOps
atbenmurray dec92ca
Resolved conflicts in dataframe.py
atbenmurray b631932
Dataframe copy, move and drop operations have been moved out of the D…
atbenmurray 4804417
Fixing accidental introduction of CRLF to abstract_types
atbenmurray f16cb09
Fixed bug where apply_filter and apply_index weren't returning a fiel…
atbenmurray 37dac08
Fixed issue in timestamp_field_create_like when group is set and is a…
atbenmurray 8c62e0a
persistence.filter_duplicate_fields now supports fields as well as nd…
atbenmurray cfcb69b
sort_on message now shows in verbose mode under all circumstances
atbenmurray 22504ef
Fixed bug in apply filter when a destination dataset is applied
atbenmurray 23c373d
Added a test to catch dataframe.apply_filter bug
atbenmurray 98624e6
Bug fix: categorical_field_constructor in fields.py was returning num…
atbenmurray 76d8717
Copying data before filtering, as filtering in h5py is very slow
atbenmurray 44a9c3d
Adding apply_spans functions to fields
atbenmurray 210f847
Fixed TestFieldApplySpansCount.test_timestamp_apply_spans that had be…
atbenmurray f8829ae
Merge commit 'refs/pull/149/head' of https://github.com/KCL-BMEIS/Exe…
deng113jie a7d6673
Issues found with indexed strings and merging; fixes found for apply_…
atbenmurray 3d322c2
Updated merge functions to consistently return memory fields if not p…
atbenmurray 294ec3a
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie e8edd9d
concate cat keys instead of padding
Liyuan-Chen-1024 c2ba9ff
some docstring for fields
deng113jie 1a19815
dataframe copy/move/drop and unittest
deng113jie 1fb0362
Fixing issue with dataframe move/copy being static
atbenmurray 937368e
Updating HDF5Field writeable methods to account for prior changes
atbenmurray cddcf66
Adding merge functionality for dataframes
atbenmurray 534cbd4
dataset.drop is a member method of Dataset as it did not make sense f…
atbenmurray e5dc536
Added missing methods / properties to DataFrame ABC
atbenmurray 9b1a4a9
minor update on dataframe static function
deng113jie 1967685
minor update
deng113jie 6c3270a
Merge commit 'refs/pull/157/head' of https://github.com/KCL-BMEIS/Exe…
deng113jie 6bdb08e
minor update session
deng113jie 3680436
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie cf5f5a6
minor comments update
deng113jie 23ad71a
minor comments update
deng113jie 75eefc0
add unittest for csv_reader_speedup.py
Liyuan-Chen-1024 3ddc916
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 3a6dc51
Merge commit 'refs/pull/137/head' of https://github.com/KCL-BMEIS/Exe…
deng113jie c02fe32
count operation; logical not for numeric fields
deng113jie 58159d0
remove csv speed up work from commit
deng113jie a7b477d
minor update
deng113jie 903f3b4
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 29f736d
unit test for logical not in numeric field
deng113jie 7fd9bdc
patch for get_spans for datastore
deng113jie 04df757
tests for two fields
deng113jie e47e15c
add as type to numeric field
deng113jie a4b14fb
Merge branch 'master' of https://github.com/deng113jie/ExeTera
deng113jie 5492b94
seperate the unittest of get_spans by datastore reader
deng113jie 25320bd
unittest for astype
deng113jie e289c6b
Merge branch 'dspatch'
deng113jie a59c13a
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 87df0bc
update astype for fields, update logical_not for numeric fields
deng113jie 0875149
remove dataframe view commits
deng113jie c335831
remove kwargs in get_spans in session, add fields back for backward c…
deng113jie bdf783a
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie ea20c60
remove filter view tests
deng113jie 778d56c
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 611601a
partial commit on viewer
deng113jie 66867b7
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie fbe396f
remote view from git
deng113jie c2c7185
add df.describe unittest
deng113jie 78cc222
sync with upstream
deng113jie 001134c
Delete python-publish.yml
deng113jie eb0bb76
Update python-app.yml
deng113jie d646ac2
Update python-app.yml
deng113jie b55775b
dataframe describe function
deng113jie 0d23098
Merge branch 'master' of https://github.com/deng113jie/ExeTera
deng113jie 7774c6f
sync with upstream
deng113jie ae1d621
Update python-app.yml
deng113jie 3d5738e
alternative get_timestamp notebook for discussion
deng113jie 4685c6b
update the notebook output of linux and mac
deng113jie dc38d28
update format
deng113jie 0df34bc
update the to_timestamp and to_timestamp function in utils
deng113jie 87353e3
add unittest for utils to_timestamp and to_datetimie
deng113jie 87abe47
fix for pr
deng113jie a3719ef
setup github action specific for windows for cython
deng113jie ed42f70
minor workflow fix
deng113jie 2157da2
add example pyx file
deng113jie 1abeaa7
fix package upload command on win; as the git action
deng113jie e77562e
add twine as tools
deng113jie 03208aa
add linux action file
deng113jie 35430f2
update the linux build command
deng113jie de3e7e5
build workflow for macos
deng113jie a8af750
minor update the macos workflow
deng113jie d41a24b
fixed timestamp issue on windows by add timezone info to datetime
deng113jie c98b87c
finanlize workflow file, compile react to publish action only
deng113jie a57c413
avoid the bytearray vs string error in windows by converting result to
deng113jie 764650b
fixing string vs bytesarray issue
deng113jie 4676901
update categorical field key property, change the key, value to bytes if
deng113jie e5d74c6
solved index must be np.int64 error
deng113jie 030d587
all unittest error on windoes removed
deng113jie 55e62eb
Merge branch 'master' into win_actions
deng113jie 7cf7bae
minor update on workflow file
deng113jie 521142e
minor update workflow file
deng113jie 9373fd2
minor fix: use pip install -r ; remove unused import in utils.py
deng113jie 6f67ac4
update action file
deng113jie 703a19a
remove change on test_presistence on uint32 to int32
deng113jie 613532a
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie b981bb9
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie e35c1c4
Merge branch 'KCL-BMEIS:master' into master
deng113jie a7ee946
Merge branch 'master' of https://github.com/deng113jie/ExeTera
deng113jie 0f319d1
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie a5ab148
add output argument for describe function in dataframe, so that the r…
deng113jie f50ab1f
comment all print function in unittest
deng113jie 94a074a
modify the remap function for categorical field and categorical mem f…
deng113jie ef563e9
Added check to ensure ExeTera entry point actually works after pip in…
ericspod e0caad0
Attempted Fix
ericspod ba06c9f
find_packages
ericspod 25a3cc3
Tweak
ericspod 2e03557
Merge commit 'refs/pull/253/head' of https://github.com/KCL-BMEIS/Exe…
deng113jie 5f83d7e
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie d932731
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie fff9a19
fixing issue 214
deng113jie ffbc4f0
fixing bug on dataset set item
deng113jie 7f82d58
fixing apply_span_src in fields.py
deng113jie 474425a
revert change on field
deng113jie 1c59265
add unittest for dataset setitem bug
deng113jie 5002c65
examples using dataset generated by randomdataset
deng113jie efcde7c
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 825aaf1
update on examples
deng113jie 47e2e7e
update example: added two csv files and one json files and one import…
deng113jie 071be03
minor update on readme
deng113jie c908397
remove output from notebooks
deng113jie 72cca74
minor update on example notebooks
deng113jie bd9a85f
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 62d0d69
update examples
deng113jie 40e8770
update the example notebooks
deng113jie 81bbbcd
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 3507595
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera
deng113jie 099c8f6
df view init commit
deng113jie 3d6966e
dataframe view init commit:
deng113jie 735b7a5
dataframe view updates 3:
deng113jie 85169e6
change filtered data presentation from data array to field __getitem__
deng113jie 9667dd7
minor update
deng113jie c333f6d
updated view functions,
deng113jie bb764bd
modify the association between view fields with field array, by assig…
deng113jie 7803d76
update the view:
deng113jie dfb36ab
fixed the data[:] for indexed string fields
deng113jie 5c93b43
Merge branch 'master' of https://github.com/KCL-BMEIS/ExeTera into df…
deng113jie fe9cee8
update Eric's comments
deng113jie c1ad9ba
minor update
deng113jie 80c0339
update unittests for dataframe view
deng113jie 6cb1d3e
minor update
deng113jie e8cf7f2
add persistence over view so that view
deng113jie 3153f2b
add unittest for view presistence
deng113jie 135260e
documents on future work
deng113jie
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,27 +107,72 @@ def add(self, | |
nfield.data.write(field.data[:]) | ||
self._columns[dname] = nfield | ||
|
||
def add_view(self, field: fld.Field): | ||
def _add_view(self, field: fld.Field, filter: np.ndarray = None): | ||
""" | ||
Internal function called by apply_filter to add a field view into the dataframe. | ||
|
||
:param field: The field to apply filter to. | ||
:param filter: The filter to apply. | ||
:return: The field view. | ||
|
||
""" | ||
# add view | ||
if isinstance(field, fld.NumericField): | ||
view = fld.NumericField(field._session, field._field, self, write_enabled=True) | ||
view.data = field.data | ||
|
||
elif isinstance(field, fld.CategoricalField): | ||
view = fld.CategoricalField(field._session, field._field, self, write_enabled=True) | ||
elif isinstance(field, fld.TimestampField): | ||
view = fld.TimestampField(field._session, field._field, self, write_enabled=True) | ||
elif isinstance(field, fld.FixedStringField): | ||
view = fld.FixedStringField(field._session, field._field, self, write_enabled=True) | ||
elif isinstance(field, fld.IndexedStringField): | ||
view = fld.IndexedStringField(field._session, field._field, self, write_enabled=True) | ||
|
||
field.attach(view) | ||
self._columns[view.name] = view | ||
|
||
# add filter | ||
if filter is not None: | ||
nformat = 'int32' if filter[-1] < 2 ** 31 - 1 else 'int64' | ||
filter_name = view.name | ||
if filter_name not in self._filters_grp.keys(): | ||
fld.numeric_field_constructor(self._dataset.session, self._filters_grp, filter_name, nformat) | ||
filter_field = fld.NumericField(self._dataset.session, self._filters_grp[filter_name], self, | ||
write_enabled=True) | ||
filter_field.data.write(filter) | ||
else: | ||
filter_field = fld.NumericField(self._dataset.session, self._filters_grp[filter_name], self, | ||
write_enabled=True) | ||
if nformat not in filter_field._fieldtype: | ||
filter_field = filter_field.astype(nformat) | ||
filter_field.data.clear() | ||
filter_field.data.write(filter) | ||
|
||
view._filter_wrapper = filter_field.data | ||
|
||
return self._columns[view.name] | ||
|
||
# def add_reference(self, field: fld.Field): | ||
# def change_filter(self, field: fld.Field, filter: np.ndarray): | ||
# """ | ||
# | ||
# :param field: | ||
# :param filter: | ||
# :return: | ||
# """ | ||
# Add a field without coping the data over the HDF5 group. | ||
# :param field: field to be constructed in this dataframe. | ||
# pass | ||
# | ||
# def remove_filter(self, field: Union[str, fld.Field]): | ||
# """ | ||
# Remove filter from this dataframe specified by the field or field name. | ||
# """ | ||
# if isinstance(field, fld.NumericField): | ||
# fld.numeric_field_constructor(self._dataset.session, self, field.name, field._nformat) | ||
# nfield = fld.NumericField(self._dataset.session, field._field, self, write_enabled=True) | ||
# self._columns[field.name] = nfield | ||
# return self._columns[field.name] | ||
# if not isinstance(field, str) and not isinstance(field, fld.Field): | ||
# raise TypeError("The target field should be type field or string (name of the field in this dataframe).") | ||
# | ||
# name = field if isinstance(field, str) else field.name | ||
# if name not in self._columns: | ||
# raise ValueError("The target field is not in this dataframe.") | ||
# else: | ||
# del self._filters_grp[name] | ||
|
||
def drop(self, | ||
name: str): | ||
|
@@ -136,10 +181,9 @@ def drop(self, | |
|
||
:param name: name of field to be dropped | ||
""" | ||
if name in self._h5group.keys(): | ||
del self._columns[name] # should always be | ||
if name in self._h5group.keys(): # in case of reference only | ||
del self._h5group[name] | ||
if name in self._columns.keys(): | ||
del self._columns[name] | ||
|
||
def create_group(self, | ||
name: str): | ||
|
@@ -294,22 +338,6 @@ def contains_field(self, field): | |
return True | ||
return False | ||
|
||
def _write_filter(self, filter): | ||
""" | ||
|
||
""" | ||
nformat = 'int32' if filter[-1] < 2 ** 31 - 1 else 'int64' | ||
filter_name = '_filter' | ||
if filter_name not in self._filters_grp.keys(): | ||
fld.numeric_field_constructor(self._dataset.session, self._filters_grp, filter_name, nformat) | ||
filter_field = fld.NumericField(self._dataset.session, self._filters_grp[filter_name], self, write_enabled=True) | ||
filter_field.data.write(filter) | ||
else: | ||
filter_field = fld.NumericField(self._dataset.session, self._filters_grp[filter_name], self, write_enabled=True) | ||
if nformat not in filter_field._fieldtype: | ||
filter_field = filter_field.astype(nformat) | ||
filter_field.data.clear() | ||
filter_field.data.write(filter) | ||
|
||
def _get_filter_grp(self, field: Union[str, fld.Field]=None): | ||
""" | ||
|
@@ -318,68 +346,6 @@ def _get_filter_grp(self, field: Union[str, fld.Field]=None): | |
filter_name = '_filter' | ||
return self._filters_grp[filter_name] | ||
|
||
# def set_filter(self, field: Union[str, fld.Field], filter): | ||
# """ | ||
# Add or modify a filter of the field. | ||
# | ||
# :param field: The target field. | ||
# :param filter: The filter, as list or np.ndarray of indices. | ||
# """ | ||
# if not isinstance(field, str) and not isinstance(field, fld.Field): | ||
# raise TypeError("The target field should be type field or string (name of the field in this dataframe).") | ||
# | ||
# name = field if isinstance(field, str) else field.name | ||
# if name not in self._columns: | ||
# raise ValueError("The target field is not in this dataframe.") | ||
# | ||
# nformat = 'int32' if filter[-1] < 2 ** 31 - 1 else 'int64' | ||
# if name in self._filters_grp.keys(): | ||
# filter_field = fld.NumericField(self._dataset.session, self._filters_grp[name], self, | ||
# write_enabled=True) | ||
# if nformat not in filter_field._fieldtype: | ||
# filter_field = filter_field.astype(nformat) | ||
# filter_field.data.clear() | ||
# filter_field.data.write(filter) | ||
# else: | ||
# fld.numeric_field_constructor(self._dataset.session, self._filters_grp, name, nformat) | ||
# filter_field = fld.NumericField(self._dataset.session, self._filters_grp[name], self, | ||
# write_enabled=True) | ||
# filter_field.data.write(filter) | ||
# | ||
# self._columns[name].filter = self._filters_grp[name] | ||
# return filter_field | ||
|
||
def remove_filter(self, field: Union[str, fld.Field]): | ||
""" | ||
Remove filter from this dataframe specified by the field or field name. | ||
""" | ||
if not isinstance(field, str) and not isinstance(field, fld.Field): | ||
raise TypeError("The target field should be type field or string (name of the field in this dataframe).") | ||
|
||
name = field if isinstance(field, str) else field.name | ||
if name not in self._columns: | ||
raise ValueError("The target field is not in this dataframe.") | ||
else: | ||
del self._filters_grp[name] | ||
|
||
# def get_data(self, field: Union[str, fld.Field]): | ||
# """ | ||
# Get the data from a field. The data returned is masked by the filter. | ||
# | ||
# """ | ||
# if not isinstance(field, str) and not isinstance(field, fld.Field): | ||
# raise TypeError("The target field should be type field or string (name of the field in this dataframe).") | ||
# | ||
# name = field if isinstance(field, str) else field.name | ||
# if name not in self.columns.keys(): | ||
# raise ValueError("Can not found the field name from this dataframe.") | ||
# else: | ||
# if name in self.filters.keys(): | ||
# d_filter = self.filters[name].data[:] | ||
# return self.columns[name].data[d_filter] | ||
# else: | ||
# return self.columns[name].data[:] | ||
|
||
def __getitem__(self, name): | ||
""" | ||
Get a field stored by the field name. | ||
|
@@ -433,10 +399,10 @@ def __delitem__(self, name): | |
if not self.__contains__(name=name): | ||
raise ValueError("There is no field named '{}' in this dataframe".format(name)) | ||
else: | ||
if name in self._h5group.keys(): | ||
del self._columns[name] # should always be | ||
if name in self._h5group.keys(): # in case of reference only | ||
del self._h5group[name] | ||
if name in self._columns.keys(): | ||
del self._columns[name] | ||
|
||
|
||
def delete_field(self, field): | ||
""" | ||
|
@@ -596,25 +562,18 @@ def apply_filter(self, filter_to_apply, ddf=None): | |
:returns: a dataframe contains all the fields filterd, self if ddf is not set | ||
""" | ||
filter_to_apply_ = val.validate_filter(filter_to_apply) | ||
if ddf is not None: | ||
if ddf is not None and ddf is not self: | ||
if not isinstance(ddf, DataFrame): | ||
raise TypeError("The destination object must be an instance of DataFrame.") | ||
ddf._write_filter(np.where(filter_to_apply_ == True)[0]) | ||
filter_to_apply_ = filter_to_apply_.nonzero()[0] | ||
for name, field in self._columns.items(): | ||
# hard copy | ||
# newfld = field.create_like(ddf, name) | ||
# field.apply_filter(filter_to_apply_, target=newfld) | ||
# soft copy - view | ||
newfld = ddf.add_view(field) | ||
newfld.filter = ddf._get_filter_grp() | ||
|
||
ddf._add_view(field, filter_to_apply_) | ||
Review comment: check if the same dataset |
||
return ddf | ||
else: | ||
for field in self._columns.values(): | ||
field.apply_filter(filter_to_apply_, in_place=True) | ||
return self | ||
|
||
|
||
def apply_index(self, index_to_apply, ddf=None): | ||
""" | ||
Apply an index to all fields in this dataframe, returns \ | ||
|
@@ -638,34 +597,20 @@ def apply_index(self, index_to_apply, ddf=None): | |
:param ddf: optional- the destination data frame | ||
:returns: a dataframe contains all the fields re-indexed, self if ddf is not set | ||
""" | ||
if ddf is not None: | ||
if ddf is not None and ddf is not self: | ||
if not isinstance(ddf, DataFrame): | ||
raise TypeError("The destination object must be an instance of DataFrame.") | ||
if ddf == self: | ||
val.validate_all_field_length_in_df(self) | ||
for field in self._columns.values(): | ||
for name, field in self._columns.items(): | ||
# newfld = field.create_like(ddf, name) | ||
# field.apply_index(index_to_apply, target=newfld) | ||
ddf._add_view(field, index_to_apply) | ||
return ddf | ||
else: | ||
val.validate_all_field_length_in_df(self) | ||
|
||
if ddf == self: | ||
field.apply_index(index_to_apply, in_place=True) | ||
else: | ||
newfld = field.create_like(ddf, field.name) | ||
field.apply_index(index_to_apply, target=newfld) | ||
else: # | ||
nformat = 'int32' if index_to_apply[-1] < 2 ** 31 - 1 else 'int64' | ||
for field in self._columns.values(): | ||
if field.name in self._filters_grp.keys(): | ||
flt_fld = fld.NumericField(self._dataset.session, self._filters_grp[field.name], self, | ||
write_enabled=True) | ||
if nformat not in flt_fld._fieldtype: | ||
flt_fld = flt_fld.astype(nformat) | ||
flt_fld.data.clear() | ||
flt_fld.data.write(index_to_apply) | ||
else: | ||
fld.numeric_field_constructor(self._dataset.session, self._filters_grp, field.name, nformat) | ||
flt_fld = fld.NumericField(self._dataset.session, self._filters_grp[field.name], self, | ||
write_enabled=True) | ||
flt_fld.data.write(index_to_apply) | ||
field.filter = flt_fld._field | ||
field.apply_index(index_to_apply, in_place=True) | ||
return self | ||
|
||
|
||
def sort_values(self, by: Union[str, List[str]], ddf: DataFrame = None, axis=0, ascending=True, kind='stable'): | ||
|
@@ -1123,7 +1068,7 @@ def describe(self, include=None, exclude=None, output='terminal'): | |
def view(self): | ||
dfv = self.dataset.create_dataframe(self.name + '_view') | ||
for f in self.columns.values(): | ||
dfv.add_view(f) | ||
dfv._add_view(f) | ||
return dfv | ||
|
||
|
||
|
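The soft-copy path in apply_filter and apply_index stores an index array per field instead of writing a filtered copy of the data. The sketch below illustrates that idea with plain NumPy arrays rather than ExeTera fields. `FilteredView` and `index_dtype` are hypothetical names introduced here and are not part of this PR. The sketch also picks the index dtype from the maximum value instead of the last element used in `_add_view`, on the assumption that an index passed through apply_index is not guaranteed to be sorted.

```python
import numpy as np


def index_dtype(indices):
    """Pick the narrowest signed integer dtype that can hold every index.

    Hypothetical helper: it looks at the maximum value, so it does not
    rely on the indices being sorted, and it tolerates an empty selection.
    """
    indices = np.asarray(indices)
    if indices.size == 0 or indices.max() <= np.iinfo(np.int32).max:
        return np.int32
    return np.int64


class FilteredView:
    """Hypothetical stand-in for a field view.

    It shares the base array and stores only the row indices that pass
    the filter; no filtered copy of the data is kept.
    """

    def __init__(self, base, mask_or_indices):
        self._base = base
        selection = np.asarray(mask_or_indices)
        if selection.dtype == bool:
            # Convert a boolean mask to row indices, as apply_filter does.
            selection = selection.nonzero()[0]
        self._indices = selection.astype(index_dtype(selection))

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, item):
        # Resolve the request against the stored indices first, then read
        # from the shared base array.
        return self._base[self._indices[item]]


base = np.array([10, 20, 30, 40, 50])
view = FilteredView(base, np.array([True, False, True, False, True]))

filtered = view[:]       # rows 0, 2 and 4 of the base array
base[2] = 99
after_update = view[:]   # the view reflects the change made to the base
```

Because reads go through `__getitem__`, a later write to the base array is visible through the view, which is the behaviour a hard copy would not give.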
Review comment: ddf = self if ddf is None
Review comment: if ddf not in (None, self)
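The suggested `ddf not in (None, self)` and the diff's `ddf is not None and ddf is not self` are equivalent only when the dataframe class does not override `__eq__`: tuple membership checks identity first and then falls back to `==`. The sketch below shows where the two forms diverge. `Frame` is a hypothetical class with a name-based `__eq__`, used purely for illustration; whether ExeTera's DataFrame defines `__eq__` is not shown in this diff.

```python
class Frame:
    """Hypothetical stand-in for a dataframe whose __eq__ compares by name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Frame) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def needs_view_by_membership(src, ddf):
    # Tuple membership tests identity first, then falls back to ==.
    return ddf not in (None, src)


def needs_view_by_identity(src, ddf):
    # Mirrors the condition in the diff: identity checks only.
    return ddf is not None and ddf is not src


src = Frame("patients")
lookalike = Frame("patients")  # distinct object that compares equal to src
other = Frame("tests")

# Each entry is (membership result, identity result).
results = {
    "none": (needs_view_by_membership(src, None), needs_view_by_identity(src, None)),
    "self": (needs_view_by_membership(src, src), needs_view_by_identity(src, src)),
    "lookalike": (needs_view_by_membership(src, lookalike), needs_view_by_identity(src, lookalike)),
    "other": (needs_view_by_membership(src, other), needs_view_by_identity(src, other)),
}
```

The two forms agree for `None`, for the same object, and for an unrelated destination. They differ only for a distinct destination that compares equal to the source, where the membership form would take the in-place branch.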