Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feather files #390

Open
wants to merge 34 commits into
base: dev
Choose a base branch
from
Open

Conversation

AaronDJohnson
Copy link
Collaborator

@AaronDJohnson AaronDJohnson commented Aug 4, 2024

Adds the ability to store Pulsar objects and noise dictionaries into FeatherPulsar objects which can be saved and read using pyarrow feather types.

Additionally, this

  • Removes libstempo as a required dependency
  • Removes PINT as a required dependency
  • Removes to_pickle from BasePulsar: pickles should be removed and discouraged for many reasons
  • Adds to_feather to BasePulsar
  • Creates a new FeatherPulsar type which holds data and allows for reading/writing Pulsar objects
  • Removes to_pickle test and adds to_feather test

@AaronDJohnson
Copy link
Collaborator Author

An important design choice is left to be made: FeatherPulsar could potentially be absorbed into BasePulsar. This would require a minimal amount of work, but may make the feather objects a little bigger than they are currently due to storage of _isort and _iisort and other pieces which are not currently stored.

@AaronDJohnson
Copy link
Collaborator Author

Tests are failing because of an installation issue with fastshermanmorrison

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 66.19718% with 24 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (ae7521e) to head (6fa7e04).
Report is 1 commits behind head on dev.

Files with missing lines Patch % Lines
enterprise/pulsar.py 66.19% 24 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##              dev     #390       +/-   ##
===========================================
- Coverage   85.36%   71.64%   -13.72%     
===========================================
  Files          13       13               
  Lines        3163     3231       +68     
===========================================
- Hits         2700     2315      -385     
- Misses        463      916      +453     
Files with missing lines Coverage Δ
enterprise/signals/white_signals.py 98.61% <ø> (ø)
enterprise/pulsar.py 28.62% <66.19%> (-65.17%) ⬇️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e84f13...6fa7e04. Read the comment docs.

@vhaasteren
Copy link
Member

Hi @AaronDJohnson, I finally looked into this. As you know, I fully agree with making tempo2/pint optional. That is really great.

Are there any negatives that can come from use of feather files? And do we really need to abandon pickles altogether? I would love to abandon pickles, but at the same time some workflows might rely on them.

@vhaasteren
Copy link
Member

Really great work by the way, this is a significant update

@AaronDJohnson
Copy link
Collaborator Author

Are there any negatives that can come from use of feather files? And do we really need to abandon pickles altogether? I would love to abandon pickles, but at the same time some workflows might rely on them.

If the feather files don't include a part that is necessary for the models, then we could encounter issues. I have made sure that the tests pass, so hopefully this is sufficient.

I have strong opinions about using and passing around pickles. Personally, my reasons against them are two-fold: 1. There are very serious security concerns with using pickles (especially in the way that they have been used in the past with people passing them around), and 2. If the pickle is made using a different version of the software, it may become unusable. On top of this, they should be unnecessary at this point and take up far more space than the feather files. We can leave them in, and I will do so if people want it, but I don't think it's a good idea to do so.

@vhaasteren
Copy link
Member

I fully agree with your sentiment.

My one concern is that pickle files have been passed around a lot, and people will want to keep using them. Will those pickles still load if we update Enterprise? The same modules are there, but will we get problems if we remove the to_pickle functions?

@vhaasteren
Copy link
Member

Hi @AaronDJohnson , this one also supersedes #340 , right? Shall we close that?

@AaronDJohnson
Copy link
Collaborator Author

AaronDJohnson commented Dec 9, 2024

@vhaasteren, yes it does supersede that one. We can close it as soon as this one is merged.

There are a couple of things to finish before merging this. At the moment, the feather files fail when DMX isn't present. An update has been made to Discovery to fix this issue, and when I have a moment, I will update this PR with that change.

After that change has been made, I'll check the pickle files too and make sure that legacy pickle files will load with the modifications.

@AaronDJohnson
Copy link
Collaborator Author

There were a few extra lines added to support the case where DMX or another parameter doesn't show up in the Pulsar object. A warning is displayed when this happens to notify the user.

@AaronDJohnson
Copy link
Collaborator Author

This is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants