The components of a netCDF file are described in section 2 of the NUG [NUG] . In this section we describe conventions associated with filenames and the basic components of a netCDF file. We also introduce new attributes for describing the contents of a file.
The netCDF data types char
, byte
, short
, int
, float
or real
, and double
are all acceptable. The char
type is not intended for numeric data. One byte numeric data should be stored using the byte
data type. All integer types are treated by the netCDF interface as signed. It is possible to treat the byte
type as unsigned by using the NUG convention of indicating the unsigned range using the valid_min
, valid_max
, or valid_range
attributes.
NetCDF does not support a character string type, so these must be represented as character arrays. In this document, a one dimensional array of character data is simply referred to as a "string". An n-dimensional array of strings must be implemented as a character array of dimension (n,max_string_length), with the last (most rapidly varying) dimension declared large enough to contain the longest string in the array. All the strings in a given array are therefore defined to be equal in length. For example, an array of strings containing the names of the months would be dimensioned (12,9) in order to accommodate "September", the month with the longest name.
Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores. Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character. The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use.
Case is significant in netCDF names, but it is recommended that names should not be distinguished purely by case, i.e., if case is disregarded, no two names should be the same. It is also recommended that names should be obviously meaningful, if possible, as this renders the file more effectively self-describing.
This convention does not standardize any variable or dimension names. Attribute names and their contents, where standardized, are given in English in this document and should appear in English in conforming netCDF files for the sake of portability. Languages other than English are permitted for variables, dimensions, and non-standardized attributes. The content of some standardized attributes are string values that are not standardized, and thus are not required to be in English. For example, a description of what a variable represents may be given in a non-English language using the long_name
attribute (see [long-name] ) whose contents are not standardized, but a description given by the standard_name
attribute (see [standard-name] ) must be taken from the standard name table which is in English.
A variable may have any number of dimensions, including zero, and the dimensions must all have different names. COARDS strongly recommends limiting the number of dimensions to four, but we wish to allow greater flexibility . The dimensions of the variable define the axes of the quantity it contains. Dimensions other than those of space and time may be included. Several examples can be found in this document. Under certain circumstances, one may need more than one dimension in a particular quantity. For instance, a variable containing a two-dimensional probability density function might correlate the temperature at two different vertical levels, and hence would have temperature on both axes.
If any or all of the dimensions of a variable have the interpretations of "date or time" (T
), "height or depth" (Z
), "latitude" (Y
), or "longitude" (X
) then we recommend, but do not require (see [coards-relationship] ), those dimensions to appear in the relative order T
, then Z
, then Y
, then X
in the CDL definition corresponding to the file. All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.
Dimensions may be of any size, including unity. When a single value of some coordinate applies to all the values in a variable, the recommended means of attaching this information to the variable is by use of a dimension of size unity with a one-element coordinate variable. It is also acceptable to use a scalar coordinate variable which eliminates the need for an associated size one dimension in the data variable. The advantage of using either a coordinate variable or an auxiliary coordinate variable is that all its attributes can be used to describe the single-valued quantity, including boundaries. For example, a variable containing data for temperature at 1.5 m above the ground has a single-valued coordinate supplying a height of 1.5 m, and a time-mean quantity has a single-valued time coordinate with an associated boundary variable to record the start and end of the averaging period.
This convention does not standardize variable names.
NetCDF variables that contain coordinate data are referred to as coordinate variables, auxiliary coordinate variables, scalar coordinate variables, or multidimensional coordinate variables.
The NUG conventions
(NUG Appendix A, Attribute Conventions)
provide the _FillValue
, missing_value
,
valid_min
, valid_max
, and valid_range
attributes to indicate
missing data. Missing data is allowed in data variables and auxiliary coordinate variables.
Generic applications should treat the data as missing where any auxiliary coordinate variables
have missing values; special-purpose applications might be able to make use of the data.
Missing data is not allowed in coordinate variables.
The NUG conventions for missing data changed significantly between version 2.3 and version 2.4. Since version 2.4 the NUG defines missing data as all values outside of the valid_range
, and specifies how the valid_range
should be defined from the _FillValue
(which has library specified default values) if it hasn’t been explicitly specified. If only one missing value is needed for a variable then we recommend that this value be specified using the _FillValue
attribute. Doing this guarantees that the missing value will be recognized by generic applications that follow either the before or after version 2.4 conventions.
The scalar attribute with the name _FillValue
and of the same type as its
variable is recognized by the netCDF library as the value used to pre-fill disk
space allocated to the variable. This value is considered to be a special value
that indicates undefined or missing data, and is returned when reading values
that were not written. The _FillValue
should be outside the range
specified by valid_range
(if used) for a variable. The netCDF library
defines a default fill value for each data type
(See the "Note on fill values" in NUG Appendix B, File Format Specifications).
The missing values of a variable with scale_factor
and/or
add_offset
attributes (see [packed-data]) are
interpreted relative to the variable’s external values (a.k.a. the
packed values, the raw values, the values stored in the netCDF file),
not the values that result after the scale and offset are applied.
Applications that process variables that have attributes to indicate
both a transformation (via a scale and/or offset) and missing values
should first check that a data value is valid, and then apply the
transformation. Note that values that are identified as missing should
not be transformed. Since the missing value is outside the valid range
it is possible that applying a transformation to it could result in an
invalid operation. For example, the default _FillValue
is very
close to the maximum representable value of IEEE single precision
floats, and multiplying it by 100 produces an "Infinity" (using single
precision arithmetic).
This convention defines a two-element vector attribute actual_range
for
variables containing numeric data. If the variable is packed using the
scale_factor
and add_offset
attributes (see [packed-data]), the
elements of the actual_range
should have the type intended for the
unpacked data. The elements of actual_range
must be exactly equal to the
minimum and the maximum data values which occur in the variable (when unpacked
if packing is used), and both must be within the valid_range
if
specified. If the data is all missing or invalid, the actual_range
attribute cannot be used.
This standard describes many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. Such attributes do not represent a violation of this standard. Application programs should ignore attributes that they do not recognise or which are irrelevant for their purposes. Conventional attribute names should be used wherever applicable. Non-standard names should be as meaningful as possible. Before introducing an attribute, consideration should be given to whether the information would be better represented as a variable. In general, if a proposed attribute requires ancillary data to describe it, is multidimensional, requires any of the defined netCDF dimensions to index its values, or requires a significant amount of storage, a variable should be used instead. When this standard defines string attributes that may take various prescribed values, the possible values are generally given in lower case. However, applications programs should not be sensitive to case in these attributes. Several string attributes are defined by this standard to contain "blank-separated lists". Consecutive words in such a list are separated by one or more adjacent spaces. The list may begin and end with any number of spaces. See [attribute-appendix] for a list of attributes described by this standard.
We recommend that netCDF files that follow these conventions indicate
this by setting the NUG defined global attribute Conventions
to
the string value "CF-1.8
". The string is interpreted as a
directory name relative to a directory that is a repository of documents
describing sets of discipline-specific conventions. The conventions
directory name is currently interpreted relative to the directory
pub/netcdf/Conventions/
on the host machine
ftp.unidata.ucar.edu
. The web based versions of this document are
linked from the
netCDF
Conventions web page.
It is possible for a netCDF file to adhere to more than one set of conventions, even when there is no inheritance relationship among the conventions. In this case, the value of the Conventions attribute may be a single text string containing a list of the convention names separated by blank space (recommended) or commas (if a convention name contains blanks). This is the Unidata recommended syntax from NetCDF Users Guide, Appendix A. If the string contains any commas, it is assumed to be a comma-separated list.
When CF is listed with other conventions, this asserts the same full compliance with CF requirements and interpretations as if CF was the sole convention. It is the responsibility of the data-writer to ensure that all common metadata is used with consistent meaning between conventions.
The following attributes are intended to provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers. The attribute values are all character strings. For readability in ncdump outputs it is recommended to embed newline characters into long strings to break them into lines. For backwards compatibility with COARDS none of these global attributes is required.
The NUG defines title
and history
to be global attributes. We wish to allow the newly defined attributes, i.e., institution
, source
, references
, and comment
, to be either global or assigned to individual variables. When an attribute appears both globally and as a variable attribute, the variable’s version has precedence.
title
-
A succinct description of what is in the dataset.
institution
-
Specifies where the original data was produced.
source
-
The method of production of the original data. If it was model-generated,
source
should name the model and its version, as specifically as could be useful. If it is observational,source
should characterize it (e.g., "surface observation
" or "radiosonde
"). history
-
Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. We recommend that each line begin with a timestamp indicating the date and time of day that the program was executed.
references
-
Published or web-based references that describe the data or methods used to produce it.
comment
-
Miscellaneous information about the data or methods used to produce it.
The global external_variables
attribute is a blank-separated list of the names of variables which are named by attributes in the file but which are not present in the file.
These variables are to be found in other files (called "external files") but CF does not provide conventions for identifying the files concerned.
The only attribute for which CF standardises the use of external variables is cell_measures
.
Groups provide a powerful mechanism to structure data hierarchically. This convention does not standardize group names. It may be of benefit to name groups in such a way that human readers can interpret them. However, files that conform to this standard shall not require software to interpret or decode information from group names. References to out-of-group variable and dimensions shall be found by applying the scoping rules outlined below.
The scoping mechanism is in keeping with the following principal:
"Dimensions are scoped such that they are visible to all child groups. For example, you can define a dimension in the root group, and use its dimension id when defining a variable in a sub-group."
Any variable or dimension can be referred to, as long as it can be found with one of the following search strategies:
-
Search by absolute path
-
Search by relative path
-
Search by proximity
These strategies are explained in detail in the following sections.
If any dimension of an out-of-group variable has the same name as a dimension of the referring variable, the two must be the same dimension (i.e. they must have the same netCDF dimension ID).
A variable or dimension specified with an absolute path (i.e., with a leading slash "/") is at the indicated location relative to the root group, as in a UNIX-style file convention.
For example, a coordinates
attribute of /g1/lat
refers to the lat
variable in group /g1
.
As in a UNIX-style file convention, a variable or dimension specified with a relative path (i.e., containing a slash but not with a leading slash, e.g. child/lat
) is at the location obtained by affixing the relative path to the absolute path of the referring attribute.
For example, a coordinates
attribute of g1/lat
refers to the lat
variable in subgroup g1
of the current (referring) group.
Upward path traversals from the current group are indicated with the UNIX convention.
For example, ../g1/lat
refers to the lat
variable in the sibling group g1
of the current (referring) group.
A variable or dimension specified with no path (for example, lat
) refers to the variable or dimension of that name, if there is one, in the referring group.
If not, the ancestors of the referring group are searched for it, starting from the direct ancestor and proceeding toward the root group, until it is found.
A special case exists for coordinate variables. Because coordinate variables must share dimensions with the variables that reference them, the ancestor search is executed only until the local apex group is reached. For coordinate variables that are not found in the referring group or its ancestors, a further strategy is provided, called lateral search. The lateral search proceeds downwards from the local apex group width-wise through each level of groups until the sought coordinate is found. The lateral search algorithm may only be used for NUG coordinate variables; it shall not be used for auxiliary coordinate variables.
Note
|
This use of the lateral search strategy to find them is discouraged. They are allowed mainly for backwards-compatibility with existing datasets, and may be deprecated in future versions of the standard. |
The following attributes are optional for non-root groups. They are allowed in order to provide additional provenance and description of the subsidiary data. They do not override attributes from parent groups.
-
title
-
history
If these attributes are present, they may be applied additively to the parent attributes of the same name. If a file containing groups is modified, the user or application need only update these attributes in the root group, rather than traversing all groups and updating all attributes that are found with the same name. In the case of conflicts, the root group attribute takes precedence over per-group instances of these attributes.
The following attributes may only be used in the root group and shall not be duplicated or overridden in child groups:
-
Conventions
-
external_variables
Furthermore, per-variable attributes must be attached to the variables to which they refer. They may not be attached to a group, even if all variables within that group use the same attribute and value.
If attributes are present within groups without being attached to a variable, these attributes apply to the group where they are defined, and to that group’s descendants, but not to ancestor or sibling groups. If a group attribute is defined in a parent group, and one of the child group redefines the same attribute, the definition within the child group applies for the child and all of its descendants.