Implement Level3 and Level4 subsetting logic #128
Comments
Thinking about L3 and L4 subsetting again.
As it turns out, we want to be retrieving the
Hi @lewismc, could you point me to a link or API endpoint that would let me retrieve an OPeNDAP DDX response? I would like to explore further. Is pydap one of the utilities for accessing the data? When I searched for it, all I could find were documentation links. Thanks.
Hi @Omkar20895, yes, one resides here. It's very simple XML.
Hi @lewismc, I see from the attached XML data that the following is the list of variables in the data:
Correct me if I have got something wrong. We can use XPath expressions via the lxml.etree module in the variable utility function to get the variable names. I will start working on it and write a prototype. Can you please give me a link to the API that the prototype function should call with the dataset name or ID to get the response? Please let me know if you have any questions or concerns about the approach. I will also look for documentation on L2, L3 and L4 subsetting on the PO.DAAC forums. I need to read more on this; honestly, I have forgotten a lot, so please suggest any documentation you think would be helpful. Thanks.
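A minimal sketch of the variable-extraction idea described above. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The sample DDX is hand-written for illustration and is not a real PO.DAAC response, and the helper names `local_name` and `ddx_variable_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written DDX-style document (an assumption for this sketch,
# not an actual PO.DAAC response).
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://xml.opendap.org/ns/DAP2">
  <Attribute name="NC_GLOBAL" type="Container"/>
  <Array name="lat">
    <Float32/>
    <dimension name="lat" size="4"/>
  </Array>
  <Array name="lon">
    <Float32/>
    <dimension name="lon" size="8"/>
  </Array>
  <Array name="sea_surface_temperature">
    <Int16/>
    <dimension name="lat" size="4"/>
    <dimension name="lon" size="8"/>
  </Array>
</Dataset>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def ddx_variable_names(ddx_text):
    """Return the names of the top-level variables declared in a DDX document."""
    root = ET.fromstring(ddx_text)
    return [
        child.attrib["name"]
        for child in root
        # Global attributes are metadata, not variables, so skip them.
        if local_name(child.tag) != "Attribute" and "name" in child.attrib
    ]


variables = ddx_variable_names(SAMPLE_DDX)
print(variables)
```

Only the direct children of the Dataset element are inspected, so elements nested inside a variable declaration are not reported as separate variables.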
@Omkar20895 thanks for stepping up here.
The only issue just now is that the Podaac.dataset_variables function is only available for a handful of datasets. This means that, by and large, level 3 and 4 subsetting is unavailable using the Webservices API. We need to be more creative in the implementation! I think we need to do as follows.
Edit the function called `dataset_variables` to do the following:
Execute a granule_search (because we can only obtain a DDX for an OPeNDAP granule), e.g.
This will return an Atom XML response which includes the OPeNDAP URL as follows
From that we can substitute the trailing
Add a new function called subset_L3_L4_granules().
Essentially, here we design the function as follows
This allows us to execute a granule_search, extract the OPeNDAP URL and then execute the OPeNDAP request with all of the parameters. The response can then be saved wherever it is needed. Does this make sense?
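The flow described above (take the granule search response, pull out the OPeNDAP URL, then build the OPeNDAP request) could be sketched roughly as follows. This is not the podaacpy implementation: the Atom sample, the host name and the helper names `opendap_html_urls`, `swap_suffix` and `subset_url` are all assumptions made for illustration. Picking links by their `.html` suffix is also an assumption, based on the `.html` to `.ddx` substitution discussed in this thread. The constraint uses the DAP2 hyperslab form `variable[start:stride:stop]`, which is index based, so translating a latitude/longitude box into indices is left out.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hand-written stand-in for a granule search response; the host and paths
# are assumptions made for this sketch.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>example_granule.nc</title>
    <link href="https://opendap.example.org/data/example_granule.nc.html"/>
    <link href="https://example.org/thumbnails/example_granule.png"/>
  </entry>
</feed>
"""


def opendap_html_urls(atom_text):
    """Collect link targets that end in '.html' (assumed to be OPeNDAP pages)."""
    root = ET.fromstring(atom_text)
    return [
        link.get("href")
        for link in root.iter(ATOM_NS + "link")
        if link.get("href", "").endswith(".html")
    ]


def swap_suffix(html_url, new_suffix):
    """Replace the trailing '.html' with another response suffix such as '.ddx'."""
    if not html_url.endswith(".html"):
        raise ValueError("expected an OPeNDAP URL ending in .html: " + html_url)
    return html_url[: -len(".html")] + new_suffix


def subset_url(html_url, constraints, suffix=".nc"):
    """Build a DAP2 request URL.

    constraints maps a variable name to one (start, stride, stop) index
    tuple per dimension of that variable.
    """
    parts = []
    for name, slices in constraints.items():
        hyperslab = "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
        parts.append(name + hyperslab)
    return swap_suffix(html_url, suffix) + "?" + ",".join(parts)


urls = opendap_html_urls(SAMPLE_ATOM)
ddx_url = swap_suffix(urls[0], ".ddx")
request_url = subset_url(
    urls[0],
    {"sea_surface_temperature": [(0, 1, 0), (10, 1, 20), (30, 1, 40)]},
)
print(ddx_url)
print(request_url)
```

Keeping URL construction separate from the HTTP call makes the string handling easy to test without network access; the actual download can then be done with whichever HTTP client the library already uses.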
@lewismc yes, it makes sense to me. I have a question: the example you mentioned above returns only one entry because it is subset using a start time and an end time. I tried removing the start and end times, and it returned multiple entries for the dataset, with the same set of variables in every entry. For example, I used:
I then replaced .html with .ddx for each entry and observed the set of variables for each one. I see that the entries are basically time-series datasets, measuring the same set of variables at different time instants. Still, is there a case where different entries have different variables?
In short, no, I don't think that scenario is possible. The contents of the
granule data products are consistent. It's only the actual sensor
observation values that change. Thanks for looking at this.
Hi @lewismc, I have one last question, please bear with me here. Each XML tag related to a variable in the .ddx response has either Array or Grid as the tag name. Are these in any way associated with different levels of datasets (L1, L2, L3)? It was not common for both Array and Grid tags to be present: in most cases the .ddx response has only Array tags, but some datasets have both, for example the .ddx response for the dataset PODAAC-GHGMB-3CO02: here. If they are associated with different levels, what are the other possible tags (besides Array and Grid)? Thanks.
No. If you look at the following collapsed XML snippet you will see that the
So really, it is the
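For context, Array and Grid are types in the DAP2 data model rather than markers of a processing level: a Grid wraps a data Array together with one Map per dimension. A rough sketch of how a variable utility might treat both is below. The DDX sample is hand-written for illustration (not the PODAAC-GHGMB-3CO02 response), and the helper name `describe_variables` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written DDX-style sample with one plain Array and one Grid
# (an assumption for this sketch, not a real PO.DAAC response).
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://xml.opendap.org/ns/DAP2">
  <Array name="time">
    <Int32/>
    <dimension name="time" size="1"/>
  </Array>
  <Grid name="analysed_sst">
    <Array name="analysed_sst">
      <Int16/>
      <dimension name="time" size="1"/>
      <dimension name="lat" size="4"/>
    </Array>
    <Map name="time">
      <Int32/>
      <dimension name="time" size="1"/>
    </Map>
    <Map name="lat">
      <Float32/>
      <dimension name="lat" size="4"/>
    </Map>
  </Grid>
</Dataset>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_variables(ddx_text):
    """Map each top-level Array or Grid name to (kind, [dimension names])."""
    root = ET.fromstring(ddx_text)
    summary = {}
    for child in root:
        kind = local_name(child.tag)
        if kind == "Array":
            # A plain Array lists its dimensions directly.
            dims = [d.get("name") for d in child if local_name(d.tag) == "dimension"]
        elif kind == "Grid":
            # A Grid names its dimensions through its Map children.
            dims = [m.get("name") for m in child if local_name(m.tag) == "Map"]
        else:
            continue
        summary[child.get("name")] = (kind, dims)
    return summary


summary = describe_variables(SAMPLE_DDX)
print(summary)
```

Because only top-level elements are visited, the Array nested inside the Grid is not counted as a second variable.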
@lewismc I am almost done writing new code (rewriting the original dataset_variables function) instead of using the API provided by the web services, as it does not support all the datasets. However, this increases our dependency on server behaviour, since we are basically providing a workaround. For example, what if replacing .html with .ddx stops working in the future? Feel free to correct me if I am missing something. Please let me know your thoughts on this; in the meantime I will send a pull request for review.
Hi, I'd like to help. Going through the comments, my understanding is that the function 'dataset_variables' isn't working for all L3 and L4 datasets. In the example from "Updating dataset_variable to support L3 and L4 datasets" #129, the dataset ID PODAAC-SASSX-L3UCD doesn't have OPeNDAP URL links, as per line 224 of the code
Do we want to handle this error, or do we want to find the variables for this dataset using other methods?
Hi @ShubhamShaswat, did you see the proposed solution at the following PR #129 (comment)?
Right now there is no standardized, user-friendly mechanism for subsetting level 3 or level 4 data from PO.DAAC. This is a major issue, and it is an area for podaacpy to address.
In order to subset, typical parameters include a dataset name OR short name AND space AND time AND variable information.
Currently, we do have a function for retrieving the variables for a dataset; however, this service is only available for a very small number of PO.DAAC datasets.
In order to address this, we would need to obtain the variables from an OPeNDAP DDX response for a given datasetId OR shortName.
We should implement a utility function for obtaining the variables for a given dataset, and then provide another utility function which accepts the relevant parameters and performs the subsetting operation.