diff --git a/README.md b/README.md
index fd08518..e709686 100644
--- a/README.md
+++ b/README.md
@@ -26,21 +26,22 @@
 [ReadStatTables.jl](https://github.com/junyuan-chen/ReadStatTables.jl)
 is a Julia package for reading data files from Stata, SAS and SPSS
 into a [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible table.
-It relies on [ReadStat.jl](https://github.com/queryverse/ReadStat.jl),
-which is a Julia interface of the
-[ReadStat](https://github.com/WizardMac/ReadStat) C library,
+It utilizes the [ReadStat](https://github.com/WizardMac/ReadStat) C library
+developed by [Evan Miller](https://www.evanmiller.org)
 for parsing the data files.
-The same C library is also the backend
-for popular packages such as [pyreadstat](https://github.com/Roche/pyreadstat)
-and [haven](https://github.com/tidyverse/haven).
+The same C library is also the backend of popular packages in other languages such as
+[pyreadstat](https://github.com/Roche/pyreadstat) for Python
+and [haven](https://github.com/tidyverse/haven) for R.
+ReadStatTables.jl can be viewed as the Julia counterpart for similar purposes.
 
 ## Features
 
-ReadStatTables.jl adds the following features on top of the read support
-from [ReadStat.jl](https://github.com/queryverse/ReadStat.jl).
+ReadStatTables.jl provides the following features in addition to
+wrapping the C interface of ReadStat.
 
-- A lightweight [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible column table `ReadStatTable` for basic operations.
-- Integration of value labels via a customized array type `LabeledArray`.
+- Efficient data collection from ReadStat parser to a [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible column table `ReadStatTable`.
+- Interface of file-level and variable-level metadata compatible with [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl).
+- Integration of value labels into data columns via a customized array type `LabeledArray`.
 - Translation of date and time values into Julia time types `Date` and `DateTime`.
 
 ## Quick Start
@@ -66,7 +67,7 @@ To access a column from the above table:
 
 ```julia
 julia> tb.myord
-5-element LabeledVector{Union{Missing, Int8}, LabeledValue{Union{Missing, Int8}}}:
+5-element LabeledVector{Union{Missing, Int8}, Vector{Union{Missing, Int8}}, Union{Char, Int32}}:
   1 => low
   2 => medium
   3 => high
@@ -75,20 +76,31 @@ julia> tb.myord
 ```
 
 Notice that for data variables with value labels,
-both the original values and the labels are preserved.
+both the original values and the value labels are preserved.
 
-To access metadata including variable labels:
+File-level and variable-level metadata can be retrieved and modified
+via methods compatible with [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl):
 
 ```julia
-julia> getmeta(tb)
+julia> metadata(tb)
 ReadStatMeta:
-  variable labels: Dict(:myord => "ordinal", :mynum => "numeric", :mydate => "date", :mychar => "character", :dtime => "datetime", :mytime => "time", :mylabl => "labeled")
-  variable formats: Dict(:myord => "%16.0f", :mynum => "%16.2f", :mydate => "%td", :mychar => "%-1s", :dtime => "%tc", :mytime => "%tcHH:MM:SS", :mylabl => "%16.0f")
-  value label names: Dict(:myord => "myord", :mynum => "", :mydate => "", :mychar => "", :dtime => "", :mytime => "", :mylabl => "mylabl")
-  value labels: Dict{String, Dict{Any, String}}("myord" => Dict(2 => "medium", 3 => "high", 1 => "low"), "mylabl" => Dict(2 => "Female", 1 => "Male"))
-  file label: A test file
-  file timestamp: 2021-04-23T04:36:00
-  file extension: .dta
+  row count => 5
+  var count => 7
+  modified time => 2021-04-23T04:36:00
+  file format version => 118
+  file label => A test file
+  file extension => .dta
+
+julia> colmetadata(tb, :mylabl)
+ReadStatColMeta:
+  label => labeled
+  format => %16.0f
+  type => READSTAT_TYPE_INT8
+  value label => mylabl
+  storage width => 1
+  display width => 16
+  measure => READSTAT_MEASURE_UNKNOWN
+  alignment => READSTAT_ALIGNMENT_RIGHT
 ```
 
 For additional details, please see the [documentation][docs-stable-url].