indexing pyaerocom collocated data #151

Open
jshaw35 opened this issue Nov 1, 2019 · 3 comments


jshaw35 commented Nov 1, 2019

@jgliss
I'm having some trouble selecting data from pyaerocom collocated data objects. I want to use sns.regplot to create a scatter plot comparing EBASMC and GCM data, similar to what the function .plot_scatter does, but I can't select the data the way I am used to with xarray.

This is what my array object looks like:
<xarray.DataArray 'concoa' (data_source: 3, time: 180, station_name: 10)>
array([[[0.036875, 0.051544, ..., 0.051615, 0.954757],
    [0.103363, 0.1591  , ..., 0.054503, 1.945504],
    ...,
    [0.034183, 0.05595 , ..., 0.05753 , 1.224979],
    [0.048925, 0.055559, ..., 0.069318, 1.282269]],

   [[     nan, 0.33852 , ..., 0.57064 ,      nan],
    [     nan, 0.27265 , ...,      nan,      nan],
    ...,
    [     nan,      nan, ...,      nan,      nan],
    [     nan,      nan, ...,      nan,      nan]],

   [[0.047413, 0.06474 , ..., 0.191954, 1.217985],
    [0.069326, 0.0777  , ..., 0.195007, 1.07033 ],
    ...,
    [0.069153, 0.074234, ..., 0.181238, 1.02389 ],
    [0.053241, 0.064977, ..., 0.266541, 0.578958]]])

Coordinates:
  * data_source   (data_source) object 'CESM2-WACCM' 'EBASMC' 'NorESM2-LM'
    var_name      (data_source) object ...
    var_units     (data_source) object ...
    ts_type_src   (data_source) object ...
  * time          (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2014-12-01
  * station_name  (station_name) object 'Ambler' ... 'Virolahti II'
    latitude      (station_name) float64 ...
    longitude     (station_name) float64 ...
    altitude      (station_name) float64 ...
Attributes:
    data_source: ['EBASMC', 'CESM2-WACCM']
    var_name: ['concoa', 'concoa']
    ts_type: monthly
    filter_name: WORLD-wMOUNTAINS
    ts_type_src: ['weekly;hourly;hourly; weekly', 'monthly']
    start_str: 20000101
    stop_str: 20141231
    var_units: ['ug m-3', 'ug m-3']
    vert_scheme: None
    data_level: 3
    revision_ref: 20191001
    from_files: ['concoa_CESM2-WACCM_historical__1980_2014.nc', 'conc...
    from_files_ref: None
    stations_ignored: None
    colocate_time: False
    apply_constraints: True
    outliers_removed: True
    region: WORLD
    lon_range: [-180 180]
    lat_range: [-90 90]
    alt_range: None

JohanRaniseth commented Nov 1, 2019

@jshaw35
Not sure what form you want the data in, but:
object.data[1] gives you an xarray DataArray containing only data_source 1 (EBASMC).
object.data[1].values gives you the values of data_source 1 (EBASMC) as a plain NumPy array.

If you want it in the form of a pandas DataFrame, you could try:
EBASMC = object.data[1]
df = pd.DataFrame(EBASMC.data, columns=EBASMC.station_name.values, index=EBASMC.time.values)
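A minimal sketch of the same idea using label-based selection, run on a small synthetic array that mimics the layout in the question. The values and the reduced number of time steps and stations are made up, and `arr` is only a stand-in for `object.data`. Selecting by label avoids depending on where EBASMC sits along data_source.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the collocated DataArray (assumption: same dimension
# names and data_source labels as in the output above, but made-up values).
times = pd.date_range("2000-01-01", periods=4, freq="MS")
stations = ["Ambler", "Virolahti II"]
sources = ["CESM2-WACCM", "EBASMC", "NorESM2-LM"]
values = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)

arr = xr.DataArray(
    values,
    dims=("data_source", "time", "station_name"),
    coords={"data_source": sources, "time": times, "station_name": stations},
    name="concoa",
)

# Select by label instead of by position along data_source.
ebas = arr.sel(data_source="EBASMC")

# A 2D DataArray converts directly to a DataFrame
# (rows: time, columns: station_name).
df = ebas.to_pandas()
print(df)
```

With the real object, replacing `arr` by `object.data` should give a 180 x 10 DataFrame, assuming the dimensions are as printed in the question.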


jgliss commented Nov 1, 2019

@jshaw35: Have you created the ColocatedData object via one of the pyaerocom colocation routines (from the output above it looks like it)?
If that is true, then @JohanRaniseth is right. In the current definition of the ColocatedData object, the first dimension of the data array specifies the data source: index 0 of the data_source dimension corresponds to the observation data and index 1 to the model data. So, to get obs and model as 2 individual instances of xarray.DataArray, you can do:

obs_arr = coldata.data[0]
model_arr = coldata.data[1]

Both objects share the same dimensions, which are station_name and time. Based on these 2 arrays it should be easy to do a scatter plot or other analyses.
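As a rough sketch of how the two arrays could feed a scatter plot: flatten both to 1D, drop the pairs where either value is NaN, and hand the result to sns.regplot. The array below is synthetic (made-up values, two stations, three months), `data` stands in for `coldata.data`, and `plot_pairs` is a hypothetical helper name, not part of pyaerocom.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic two-entry array (assumption: index 0 = obs, index 1 = model, as
# described above; values are made up and the obs contain gaps).
times = pd.date_range("2000-01-01", periods=3, freq="MS")
stations = ["Ambler", "Virolahti II"]
obs = np.array([[np.nan, 0.3], [0.2, np.nan], [0.5, 0.6]])
mod = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.7]])

data = xr.DataArray(
    np.stack([obs, mod]),
    dims=("data_source", "time", "station_name"),
    coords={
        "data_source": ["EBASMC", "CESM2-WACCM"],
        "time": times,
        "station_name": stations,
    },
)

obs_arr = data[0]
model_arr = data[1]

# Flatten both arrays to 1D and keep only rows where both values are present.
pairs = pd.DataFrame(
    {"obs": obs_arr.values.ravel(), "model": model_arr.values.ravel()}
).dropna()


def plot_pairs(pairs):
    """Scatter plot with a regression line; needs seaborn and matplotlib."""
    import seaborn as sns  # imported lazily so the data prep runs without it

    return sns.regplot(x="obs", y="model", data=pairs)
```

Calling `plot_pairs(pairs)` draws the plot on the current matplotlib axes. Flattening discards which station and month each point came from, so keep the 2D arrays around if that information matters.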

Cheers


jgliss commented Nov 1, 2019

Just checked your output again: how did you manage to get 3 entries in the data_source dimension? A ColocatedData object should only contain 2 entries in the data_source dimension, one for obs and one for model (if it was created through the pyaerocom colocation routines), as explained in my previous comment.
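For an array that does carry more than 2 entries, one way to inspect the data_source dimension and pull out a single obs/model pair by label might look like the sketch below. It uses plain xarray on a synthetic array with made-up values. Whether pyaerocom routines accept such a subset is not checked here.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with three data_source entries, as in the output above
# (made-up values; dimension names and labels copied from that output).
sources = ["CESM2-WACCM", "EBASMC", "NorESM2-LM"]
arr = xr.DataArray(
    np.zeros((3, 2, 2)),
    dims=("data_source", "time", "station_name"),
    coords={"data_source": sources},
)

# Inspect how many entries there are and in which order they are stored.
n_sources = arr.sizes["data_source"]
labels = arr["data_source"].values.tolist()
print(n_sources, labels)

# Pick one obs/model pair by label, ordered obs first and model second.
pair = arr.sel(data_source=["EBASMC", "CESM2-WACCM"])
```

Passing a list of labels to `.sel` keeps the data_source dimension and returns the entries in the order requested, so `pair[0]` is the obs entry and `pair[1]` the model entry.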
