ENH: Add a PacketCollection class to work with multiple packets #110

Closed
wants to merge 1 commit into from

@greglucas commented Oct 18, 2024

This makes it easy to output packets to the data types of other Python packages (e.g. NumPy arrays, xarray Datasets) by adding methods onto the collection. It also adds a quick way to see a summary of what is in a packet file.

The idea here is to make a PacketCollection class (or PacketList? It is a list subclass with extra methods added) rather than individual functions. A PacketCollection can then be output to whatever else we want, so the collection becomes our standard storage mechanism. This is an alternative to adding standalone functions that users call to do the conversion.
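To make the "list subclass with extra methods" idea concrete, here is a minimal sketch. It is not the code in this PR: the class name `PacketListSketch`, the `values` method, and the assumption that each packet is a plain dict of parsed field values are all hypothetical, chosen only to show the shape of the design.

```python
from collections import Counter


class PacketListSketch(list):
    """Hypothetical list subclass; each item is assumed to be a dict of field values."""

    def __str__(self):
        # Summarize how many packets belong to each APID.
        counts = Counter(packet["PKT_APID"] for packet in self)
        lines = [
            f"<{type(self).__name__}>: {len(self)} packets",
            "Packets per apid (apid: npackets)",
        ]
        lines += [f"  {apid}: {n}" for apid, n in sorted(counts.items())]
        return "\n".join(lines)

    def values(self, variable):
        # Pull one named field out of every packet, in order.
        return [packet[variable] for packet in self]


# Made-up sample packets, for illustration only.
sample = PacketListSketch({"PKT_APID": 11, "SRC_SEQ_CTR": i} for i in range(3))
summary = str(sample)
```

Because it is still a list, indexing, slicing, and iteration keep working; the extra methods are only a convenience layer on top.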

I'll add some questions/thoughts inline as comments. I'll add tests and fix linting later.

Quick example with the test datasets

from space_packet_parser.packets import PacketCollection

packets = PacketCollection.from_packet_file("tests/test_data/jpss/J01_G011_LZ_2021-04-09T00-00-00Z_V01.DAT1",
                                            "tests/test_data/jpss/jpss1_geolocation_xtce_v1.xml")
# packets = PacketCollection.from_packet_file("tests/test_data/idex/sciData_2023_052_14_45_05",
#                                             "tests/test_data/idex/idex_combined_science_definition.xml")
print(packets)
print(packets.to_numpy("PKT_APID"))
print(packets.to_xarray())

output:

<PacketCollection>: 7200 packets
Packets per apid (apid: npackets)
  11: 7200
[11 11 11 ... 11 11 11]
<xarray.Dataset> Size: 598kB
Dimensions:      (packet: 7200)
Coordinates:
  * packet       (packet) int64 58kB 0 1 2 3 4 5 ... 7195 7196 7197 7198 7199
Data variables: (12/27)
    VERSION      (packet) uint8 7kB 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    TYPE         (packet) uint8 7kB 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    SEC_HDR_FLG  (packet) uint8 7kB 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
    PKT_APID     (packet) uint16 14kB 11 11 11 11 11 11 11 ... 11 11 11 11 11 11
    SEQ_FLGS     (packet) uint8 7kB 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
    SRC_SEQ_CTR  (packet) uint16 14kB 2606 2607 2608 2609 ... 9803 9804 9805
    ...           ...
    ADAET2MS     (packet) uint32 29kB 86399930 930 1930 ... 7197930 7198930
    ADAET2US     (packet) uint16 14kB 941 945 940 940 940 ... 939 942 938 938
    ADCFAQ1      (packet) float32 29kB -0.2164 -0.2162 ... -0.04277 -0.0426
    ADCFAQ2      (packet) float32 29kB 0.7625 0.7622 0.7619 ... 0.3403 0.3399
    ADCFAQ3      (packet) float32 29kB 0.257 0.2571 0.2572 ... 0.3341 0.3341
    ADCFAQ4      (packet) float32 29kB 0.553 0.5534 0.5538 ... 0.8779 0.8781

closes #75

Checklist

  • Changes are fully implemented without dangling issues or TODO items
  • Deprecated/superseded code is removed or marked with deprecation warning
  • Current dependencies have been properly specified and old dependencies removed
  • New code/functionality has accompanying tests and any old tests have been updated to match any new assumptions
  • The changelog.md has been updated

@greglucas greglucas requested a review from medley56 as a code owner October 18, 2024 22:05
@@ -353,6 +353,63 @@ def _get_container_base_container(
restrictions = []
return self._find_container(base_container_element.attrib['containerRef']), restrictions

def _get_minimum_numpy_datatype(self, name: str, *, raw_value: bool = False) -> str | None:

I'm not sure where this makes the most sense... Should we actually add the _min_dtype to the encoding classes themselves? The issue I see is that it would have to be a method/property, because size_in_bits is dynamic and not known until the data is read in.
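For illustration, the property-on-the-encoding option could look roughly like the sketch below. `IntegerEncodingSketch` and `min_dtype` are hypothetical names, not the library's encoding classes, and returning `None` when nothing fits is an assumption made here. The point is only that the dtype is computed on access, so it follows `size_in_bits` whenever that value changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntegerEncodingSketch:
    """Hypothetical stand-in for an integer encoding; not the library's class."""

    size_in_bits: int
    signed: bool = False

    @property
    def min_dtype(self) -> Optional[str]:
        # Computed on every access, so it reflects the current size_in_bits.
        prefix = "int" if self.signed else "uint"
        for width in (8, 16, 32, 64):
            if self.size_in_bits <= width:
                return f"{prefix}{width}"
        # Assumption: None signals that no fixed-width integer dtype is wide enough.
        return None


apid_encoding = IntegerEncodingSketch(size_in_bits=11)
apid_dtype = apid_encoding.min_dtype  # an 11-bit field needs a 16-bit container
```

The returned strings are valid NumPy dtype names, so they can be passed straight to `numpy.dtype(...)` or `ndarray.astype(...)`.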

packet_generator = packet_definition.packet_generator(binary_data)
return cls(packet_generator, packet_definition=packet_definition)

def to_numpy(self, variable, raw_value=False):

Do we want this to output a recarray/multi-dimensional array covering all variables in the packets, similar to the xarray dataset (i.e. shape: (npackets, nvariables))? Right now, it only returns the one specific variable a user requests.
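As a rough sketch of the multi-variable option, a NumPy structured array keeps one row per packet and one named field per variable, each with its own dtype. The helper name `packets_to_structured_array`, the dict-based packets, and the sample values are all assumptions for illustration, not part of this PR.

```python
import numpy as np


def packets_to_structured_array(packets, field_dtypes):
    """Hypothetical helper: build a structured array with one row per packet."""
    dtype = np.dtype(field_dtypes)
    # Structured arrays are filled from tuples, one tuple per row.
    rows = [tuple(packet[name] for name in dtype.names) for packet in packets]
    return np.array(rows, dtype=dtype)


# Made-up sample packets, for illustration only.
sample_packets = [
    {"PKT_APID": 11, "SRC_SEQ_CTR": 2606, "ADCFAQ1": -0.25},
    {"PKT_APID": 11, "SRC_SEQ_CTR": 2607, "ADCFAQ1": -0.5},
    {"PKT_APID": 11, "SRC_SEQ_CTR": 2608, "ADCFAQ1": -0.75},
]
table = packets_to_structured_array(
    sample_packets,
    [("PKT_APID", "uint16"), ("SRC_SEQ_CTR", "uint16"), ("ADCFAQ1", "float32")],
)
apids = table["PKT_APID"]  # selecting one field gives the single-variable view
```

Note that a structured array has shape `(npackets,)` with named fields rather than `(npackets, nvariables)`; a plain 2-D array would force every variable into one common dtype.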

@greglucas

I'm going to close this for now. After giving this some more thought, I don't love adding a PacketCollection. I think having functions that do what we need and that can be chained together makes more sense, i.e. we don't need this intermediate representation.
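A minimal sketch of what "functions that can be chained together" might look like is below. The function names `filter_by_apid` and `field_values` and the dict-based packets are hypothetical; nothing here is an existing API of the library.

```python
def filter_by_apid(packets, apid):
    """Hypothetical step: lazily keep only packets with the given APID."""
    return (packet for packet in packets if packet["PKT_APID"] == apid)


def field_values(packets, name):
    """Hypothetical step: collect one field from any iterable of packets."""
    return [packet[name] for packet in packets]


# Made-up sample packets, for illustration only.
stream = [
    {"PKT_APID": 11, "SRC_SEQ_CTR": 2606},
    {"PKT_APID": 12, "SRC_SEQ_CTR": 17},
    {"PKT_APID": 11, "SRC_SEQ_CTR": 2607},
]
# Each function accepts any iterable, so steps compose without a collection class.
counters = field_values(filter_by_apid(stream, 11), "SRC_SEQ_CTR")
```

Since each step takes and returns a plain iterable, the same functions work on a list or directly on a packet generator.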

@greglucas greglucas closed this Nov 22, 2024
Successfully merging this pull request may close these issues.

Support Direct Output of Xarray DataSets