Improve LabeledArray and documentation (#11)
junyuan-chen authored Nov 28, 2022
1 parent 26c9daf commit 151e102
Showing 13 changed files with 517 additions and 217 deletions.
2 changes: 1 addition & 1 deletion docs/src/man/date-and-time-values.md
@@ -18,7 +18,7 @@ The full lists of recognized date/time formats for the statistical software
are stored as dictionary keys;
while the associated values are tuples of reference date/time and period length.[^2]
If a variable is in a date/time format that is contained in the dictionary keys,
[`readstat`](@ref) will handle the conversion into a Julia time type
[`readstat`](@ref) will handle the conversion to a Julia time type
(unless the `convert_datetime` option prevents it).
Otherwise, if a date/time format is not found in the dictionary keys,
no type conversion will be attempted.
46 changes: 23 additions & 23 deletions docs/src/man/getting-started.md
@@ -95,55 +95,55 @@ Variables with value labels are stored in [`LabeledArray`](@ref)s.
To convert a `LabeledArray` to another array type,
we may either obtain an array of [`LabeledValue`](@ref)s
or collect the values and labels separately.
If only the labels contain the relevant information,
we can make use of the `labels` function which returns an iterator for the labels.
For example, to convert a `LabeledArray` to a `CategoricalArray` from
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl):
The data values can be directly retrieved by calling [`refarray`](@ref):

```@repl getting-started
using CategoricalArrays
CategoricalArray(labels(tb.mylabl))
refarray(tb.mylabl)
```

Sometimes, the values have special meanings while the labels are not so important.
To access the array of values underlying a `LabeledArray` directly:
!!! note

```@repl getting-started
refarray(tb.mylabl)
```
The array returned by `refarray`
is exactly the same array underlying the `LabeledArray`.
Therefore, modifying the elements of the array
will also mutate the values in the associated `LabeledArray`.

Alternatively, convert a `LabeledArray` to an array with appropriate element type:
If only the value labels are needed,
we can obtain an iterator of the value labels via [`valuelabels`](@ref).
For example, to convert a `LabeledArray` to a `CategoricalArray` from
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl):

```@repl getting-started
convert(Vector{Int}, tb.mylabl)
using CategoricalArrays
CategoricalArray(valuelabels(tb.mylabl))
```

In the last example, the element type of the output array has become `Int`
while the labels are ignored.
It is also possible to only convert the type of the underlying data values:

!!! note
```@repl getting-started
convertvalue(Int32, tb.mylabl)
```

The array returned by `refarray` (and by `convert` if element type is not converted)
is exactly the same array underlying the `LabeledArray`.
Therefore, modifying the elements of the array
will also mutate the values in the associated `LabeledArray`.
```@docs
convertvalue
```

## More Options

The behavior of `readstat` can be adjusted by passing keyword arguments.
The behavior of `readstat` can be adjusted by passing keyword arguments:

```@docs
readstat
```

The accepted values for selecting certain variables (columns) are shown below:
The accepted types of values for selecting certain variables (data columns) are shown below:

```@docs
ReadStatTables.ColumnIndex
ReadStatTables.ColumnSelector
```

File-level metadata can be obtained without reading the entire data file.
File-level metadata can be obtained without reading the entire data file:

```@docs
readstatmeta
2 changes: 1 addition & 1 deletion docs/src/man/metadata.md
@@ -100,7 +100,7 @@ m["label"]
copy(m)
```

However, it can not be modified directly via `setindex!`:
However, it cannot be modified directly via `setindex!`:

```@repl meta
m["label"] = "A new label"
4 changes: 2 additions & 2 deletions docs/src/man/table-interface.md
@@ -1,6 +1,6 @@
# Table Interface

This page provides further details on the interface of `ReadStatTable`.
This page provides further details on the interface of [`ReadStatTable`](@ref).

```@docs
ReadStatTable
@@ -59,7 +59,7 @@ end
## Data Values

In addition to retrieving the data columns,
it is possible to directly retrieving and modifying individual data values
it is possible to directly retrieve and modify individual data values
via `getindex` and `setindex!`.

```@repl table
56 changes: 38 additions & 18 deletions docs/src/man/value-labels.md
@@ -1,47 +1,67 @@
# Value Labels

Value labels from the data files are incorporated into the data columns
via a customized array type `LabeledArray`.
Value labels collected from the data files are incorporated into the associated data columns
via a customized array type [`LabeledArray`](@ref).

## LabeledValue and LabeledArray

`LabeledValue` and `LabeledArray` are designed to
imitate how variables associated with value labels
are represented in the original data files from the statistical software.
The former wraps a single data value with a reference to the value labels;
while the latter wraps a data array.
The element of a `LabeledArray` is always a `LabeledValue`.
However, a `LabeledValue` obtained from a `LabeledArray`
is only constructed when being retrieved via `getindex` for efficient storage.

In general, variables associated with value labels
should not be treated as categorical data.
Here are some noteworthy distinctions of `LabeledArray` from
an array type designed for categorical data (e.g., `CategoricalArray`):
Some noteworthy distinctions of a `LabeledArray` are highlighted below:

- Values are never recoded when a `LabeledArray` is constructed.[^1]
- It is allowed for some values in a `LabeledArray` to not have a label.[^2]
- Values are never re-encoded when a `LabeledArray` is constructed.[^1]
- It is allowed for some values in a `LabeledArray` to not have a value label.[^2]
- A label is always a `String` even when it is associated with `missing`.

In essence, a `LabeledArray` is simply an array of data values (typically numbers)
bundled with a dictionary of value labels.
There is no restriction imposed on the correspondence
between the data values and value labels.
Namely, a data value in a `LabeledArray` does not necessarily have a value label
from the associated dictionary;
while the key of a value label contained in the dictionary
may not match any array element.
Furthermore, the dictionary of value labels may be switched and shared
across different `LabeledArray`s.
When setting values in a `LabeledArray`,
the array of data values is modified directly
with no additional check on the associated dictionary of value labels.
For this reason, the functionality of a `LabeledArray`
is not equivalent to that of an array type designed for categorical data
(e.g., `CategoricalArray` from
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl)).
They are not complete substitutes for each other.
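
The loose coupling between data values and value labels described above can be sketched with a toy type in plain Julia.
This is only an illustration under stated assumptions:
`ToyLabeled` and `toylabel` are hypothetical names and not part of the package,
and falling back to the printed value for an unlabeled element is a choice made for this sketch.

```julia
# Toy stand-in for illustration only; this is NOT the package's LabeledArray.
struct ToyLabeled{T}
    values::Vector{T}        # raw data values, stored as they are
    labels::Dict{T,String}   # dictionary of value labels, possibly shared
end

# Look up the label of element i.
# Assumption of this sketch: an unlabeled value falls back to its printed form.
toylabel(x::ToyLabeled, i::Integer) =
    get(x.labels, x.values[i], string(x.values[i]))

lbls = Dict(1 => "low", 2 => "high", 9 => "refused")
a = ToyLabeled([1, 2, 3], lbls)  # 3 has no label; key 9 matches no element
b = ToyLabeled([2, 2, 1], lbls)  # shares the same dictionary with a

a.values[1] = 7        # setting a value involves no check on the dictionary
lbls[3] = "medium"     # a change to the shared dictionary is seen by a and b

labels_a = [toylabel(a, i) for i in 1:3]
```

A categorical array type would instead tie each element to a fixed pool of levels,
which is why the two kinds of arrays are not interchangeable.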

More details are below.

```@docs
LabeledValue
LabeledArray
LabeledVector
LabeledMatrix
```

## Accessing Labels and Values
## Accessing Values and Labels

If only the labels of a `LabeledArray` are needed,
an iterator that maintains the shape of the `LabeledArray`
can be obtained by calling `labels`.
The iterator can be used for either collecting all labels in a different array type
or retrieving labels for specific values.
On the other hand, if only the values are needed,
the labels can be ignored
if one directly accesses the underlying array that holds the values.
For `LabeledValue`, the underlying data value can be retrieved via [`unwrap`](@ref).
The value label can be obtained via [`valuelabel`](@ref) or conversion to `String`.
For `LabeledArray`, the underlying data values can be retrieved via [`refarray`](@ref).
An iterator of value labels that maintains the shape of the `LabeledArray`
can be obtained by calling [`valuelabels`](@ref).

```@docs
labels
unwrap
valuelabel
getvaluelabels
refarray
valuelabels
```
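
As a minimal sketch of how these accessors fit together,
the example below builds a small `LabeledArray` by hand.
It assumes that ReadStatTables.jl is installed
and that `LabeledArray` can be constructed from an array of values and a `Dict` of labels;
the constructor form is an assumption here and should be checked against the docstrings above.

```julia
using ReadStatTables  # assumed to be installed in the active environment

lbls = Dict(1 => "a", 2 => "b")
x = LabeledArray([0, 1, 2], lbls)  # constructor form is an assumption

# A single element is a LabeledValue
e = x[2]
val = unwrap(e)       # the underlying data value
lbl = valuelabel(e)   # the associated value label

# The underlying array of data values
vals = refarray(x)
vals[1] = 5           # also changes the value seen through x

# An iterator of value labels, collected here into an ordinary array
all_labels = collect(valuelabels(x))
```

Because `refarray` returns the underlying array rather than a copy,
the assignment to `vals[1]` is visible through `x` as well.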

[^1]: