Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct convertion to DimensionalArray #72

Closed
Datseris opened this issue Feb 3, 2020 · 4 comments
Closed

Direct convertion to DimensionalArray #72

Datseris opened this issue Feb 3, 2020 · 4 comments

Comments

@Datseris
Copy link
Contributor

Datseris commented Feb 3, 2020

(opening new issue to not further derail discussions in other issues)

So far we have talked about how it would be nice to have some kind of way to go from ncdataset into an in-memory data that are bundled to their dimensions, like e.g. AxisArray or DimensionalData. In the past couple of weeks I've spent a lot of time learning DimensionalData, which (at least to me) seems to be a nice package that covers most of my usage needs. I am happy with it and I've started contributing there regularly and soon it will also have more complete documentation.

I've also tried to learn GeoData.jl, but I've concluded that I have no reason to use it as it implements a lot of things that I don't need. As of now it is also unregistered and in beta state, but for me this is not the prime reason to not use it (see discussion below).

What I (personally of course) need is a simple way to go from NCDataset to a simple, low level Julia structure that binds the array with its dimensions. This structure is DimensionalArray. The code to do this is this, which at the moment only uses a dictionary from GeoData.jl is:

using DimensionalData: DimensionalArray, @dim, Time, X, Y, Z
@dim Lon "Longitude" "lon"
@dim Lat "Latitude" "lat"
@dim Vert "Vertical" "height"

const COMMONNAMES = Dict(
"lat" => Lat,
"latitude" => Lat,
"lon" => Lon,
"long" => Lon,
"longitude" => Lon,
"time" => Time,
"lev" => Vert,
"level" => Vert,
"vertical" => Vert,
"x" => X,
"y" => Y,
"z" => Z,
)

function DimensionalData.dims(ds::NCDataset)
    dnames = Tuple(keys(ds.dim)) # need tuple to construct A
    true_dims = getindex.(Ref(COMMONNAMES), dnames)
    dim_values = Array.(getindex.(Ref(ds), dnames))
    return dim_values .|> true_dims
end

function Base.getindex(ds::NCDataset, var::Symbol)
    svar = string(var)
    if svar  ds.dim
        DimensionalArray(Array(ds[svar]), dims(ds))
    else
        COMMONNAMES[svar](Array(ds[svar]))
    end
end

I am using this in my daily work so far.

I've really tried to understand the ncdatasets.jl file, and here is what I currently believe its purpose is: it is a wrapper over both the core GeoData structures, as well as NCDatasets.jl. For reasons I could not understand yet, it re-implements most functionality of NCDatasets, like e.g. loading .nc files, nc-stacks, multi-file files, etc., and wraps these as on-disk or in-memory files into GeoData specific structures like NCDstack, and also bundling metadata in there. The reason for doing this (and in general the reason for GeoData.jl) is to establish a standard way of representing geo data in Julia, based on DimensionalArrays.jl. Of course, the owner @rafaqz is welcome to shed more light, my point is not to do injustice or anything, I am just laying down the things as I've understood them so far. GeoData.jl has give-or-take around 2000 lines of code, so I can be certain that my "naive" snippet above is missing things.

The reason I am posting this code here is the following:

  1. (For me personally) the more packages I can avoid as dependencies, and the less amount of Julia code I need to get from A to B, the better.
  2. This code is small (thus faster to understand) and gives me the lowest level interface, which means I have to learn less things to use it.
  3. I believe that a conversion should exist that goes directly to the lowest level (in our scenario I believe this is DimensionalArray). I also believe that such a conversion should actually exist in the package that does the loading, but I guess many would not agree with this.

In any case, this simple code snippet might be useful to people that want to go from NCDatasets directly to DimensionalArray, and if you agree we could have this conversion here. In addition, once rafaqz/DimensionalData.jl#34 is resolved, one could take advantage of a lot (if not most) dispatch-related features to be implemented in GeoData.jl.


Here is an example of my snippet in action. I chose to overload ds[:var_name], but another way is to simply define DimensionalArray(ds::NCDataset, var_name::String).

ncdir = datadir("CERES", "CERES_EBAF_TOA.nc")
TOA = NCDataset(ncdir)
sw = TOA[:toa_sw_all_mon]
lw = TOA[:toa_lw_all_mon]
lon = TOA[:lon]
@Alexander-Barth
Copy link
Owner

What I (personally of course) need is a simple way to go from NCDataset to a simple, low level Julia structure that binds the array with its dimensions.

I understood you point that you want to avoid unnecessary dependencies, but is that not one of the the proposes of GeoData.jl ? (I am genuily asking, because I do not use GeoData.jl myself)

I am happy for you that you found your way to integrate with DimensionalData as you shown in your snipped. But integrating a code looking to all these spelling variants is too brittle for my taste.

@rafaqz
Copy link
Contributor

rafaqz commented Feb 7, 2020

But integrating a code looking to all these spelling variants is too brittle for my taste.

There is a fallback Dim{:name} dime for other names. In practise they hardly ever come up.

@Datseris
Copy link
Contributor Author

I understood you point that you want to avoid unnecessary dependencies, but is that not one of the the proposes of GeoData.jl ? (I am genuily asking, because I do not use GeoData.jl myself)

I don't understand this statement. How does "using one more package" avoids unnecessary dependencies...?

But integrating a code looking to all these spelling variants is too brittle for my taste.

One can always use the fallback Rafaqz mentioned, which means there is nothing brittle happening here. It is as simple as replacing the line true_dims = getindex.(Ref(COMMONNAMES), dnames) with a function that checks if COMMONNAMES has the key, and if not it calls Dim{Symbol(key)}.

I do want to mention that this name matching and spelling variant is exactly the kind of process that has to happen if one wants to have #54 resolved. If you have a look at the discussion and arguments in PainterQubits/Unitful.jl#298 , it is clear that different people and different communities have wildly different idea of how a string representing a unit should look like.

At the moment your comment about spelling variants applies to units as well. I hope that by pointing this out you can relax your requirements about brittleness of such processes.

@Alexander-Barth
Copy link
Owner

Alexander-Barth commented Feb 10, 2020

I don't understand this statement. How does "using one more package" avoids unnecessary dependencies...?

Clearly it does not avoid a dependency, but if you need this feature in GeoData.jl, then maybe GeoData.jl is not an unnecessary dependency for you after all.

At the moment your comment about spelling variants applies to units as well.

Units should be understood by UDUNITS with some exceptions specified in the CF standard. Does the standard mandates something about the dimension names? I don't think so, but if I am wrong, I would be happy to learn this.

Note that sometimes the grids are rotated relative to the lon/lat directions. There is no single dimension representing longitude or latitude.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants