Releases: easystats/datawizard
datawizard 0.7.1
BREAKING CHANGES
- add_labs() was renamed to assign_labels(). Since add_labs() existed only for a few days, there will be no alias for backwards compatibility.
NEW FUNCTIONS
- labels_to_levels(), to use the value labels of factors as their levels.
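The following is a minimal base R sketch of the idea behind labels_to_levels(). It assumes, hypothetically, that value labels are stored in a "labels" attribute as a named vector: the values match the factor levels and the names are the label text. The helper labels_to_levels_sketch() is made up for illustration and is not datawizard's implementation.

```r
# Hypothetical base R sketch, not datawizard's code. Assumes value labels
# live in a "labels" attribute: names = label text, values = level codes.
labels_to_levels_sketch <- function(x) {
  lab <- attr(x, "labels", exact = TRUE)
  if (is.null(lab)) return(x)  # nothing to do without value labels
  # Find which label (if any) belongs to each existing level.
  idx <- match(levels(x), as.character(lab))
  # Use the label text where one exists, otherwise keep the old level.
  levels(x) <- ifelse(is.na(idx), levels(x), names(lab)[idx])
  x
}

x <- factor(c(1, 2, 1, 3))
attr(x, "labels") <- c(low = 1, mid = 2, high = 3)
y <- labels_to_levels_sketch(x)
print(levels(y))
print(as.character(y))
```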
MINOR CHANGES
- data_read() now checks whether the imported object actually is a data frame (or is coercible to one). If not, it no longer errors but gives an informative warning about the type of object that was imported.
BUG FIXES
- Fixed a test for the CRAN check on macOS arm64.
datawizard 0.7.0
BREAKING CHANGES
- In selection patterns, expressions like -var1:var3 to exclude all variables between var1 and var3 are no longer accepted. The correct expression is -(var1:var3). This is for two reasons:
  - to be consistent with the behavior for numerics (-1:2 is not accepted but -(1:2) is);
  - to be consistent with dplyr::select(), which throws a warning and only uses the first variable in the first expression.
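The numeric rule that the new selection syntax mirrors can be checked in plain base R. Unary minus binds more tightly than :, so -1:2 is the vector c(-1, 0, 1, 2), which mixes negative and positive indices. This is only an illustration of base R indexing, not datawizard code.

```r
# Base R illustration of the numeric case (not datawizard code).
x <- c(a = 1, b = 2, c = 3, d = 4)

# Unary minus binds more tightly than ":", so -1:2 is c(-1, 0, 1, 2).
print(-1:2)

# Mixing negative and positive indices is an error in base R.
msg <- tryCatch(x[-1:2], error = function(e) conditionMessage(e))
print(msg)

# Parenthesising negates the whole range: elements 1 and 2 are dropped.
print(x[-(1:2)])
```

The select and exclude arguments now expect the same parenthesisation, -(var1:var3), for ranges of variable names.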
NEW FUNCTIONS
- recode_into(), similar to dplyr::case_when(), to recode values from one or more variables into a new variable.
- mean_sd() and median_mad() for summarizing vectors to their mean (or median) and a range of one SD (or MAD) above and below.
- data_write() as counterpart to data_read(), to write data frames into CSV, SPSS, SAS, Stata files and many other file types. One advantage over existing functions to write data in other packages is that labelled (numeric) data can be converted into factors (with value labels used as factor levels) even for text formats like CSV and similar. This allows exporting "labelled" data into those file formats, too.
- add_labs(), to manually add value and variable labels as attributes to variables. These attributes are stored as "label" and "labels" attributes, similar to the labelled class from the haven package.
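As a rough base R illustration of the summaries that mean_sd() and median_mad() produce, the sketch below returns the centre together with one SD (or MAD) below and above it. The function names and the element names of the result are hypothetical. Using stats::mad() with its default scaling constant (1.4826) is also an assumption. This is not datawizard's implementation.

```r
# Hypothetical base R sketch of the two summaries (not datawizard's code).
# Assumes stats::mad() with its default constant of 1.4826.
mean_sd_sketch <- function(x) {
  m <- mean(x)
  s <- sd(x)
  c(`-SD` = m - s, Mean = m, `+SD` = m + s)
}

median_mad_sketch <- function(x) {
  m <- median(x)
  d <- mad(x)
  c(`-MAD` = m - d, Median = m, `+MAD` = m + d)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
print(mean_sd_sketch(x))
print(median_mad_sketch(x))
```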
MINOR CHANGES
- data_rename() gets a verbose argument.
- winsorize() now errors if the threshold is incorrect (previously, it gave a warning and returned the unchanged data). The argument verbose is now useless but is kept for backward compatibility. The documentation now contains details about the valid values for threshold (#357).
- In all functions that have the arguments select and/or exclude, there is now one warning per misspelled variable. The previous behavior was to give only one warning.
- Fixed inconsistent behaviour in standardize() when only one of the arguments center or scale was provided (#365).
- unstandardize() and replace_nan_inf() now work with select helpers (#376).
- Added informative warning and error messages to reverse(). Furthermore, the docs now describe the range argument more clearly (#380).
- unnormalize() now errors on unexpected inputs (#383).
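This hypothetical base R sketch illustrates the role of reverse()'s range argument. It mirrors values within a scale running from lo to hi using lo + hi - x, and falls back to the observed range when no range is given. Both the formula and that fallback are assumptions made for illustration, not a description of datawizard's code.

```r
# Hypothetical base R sketch: mirror values within a known scale range,
# e.g. turning 1 into 5 and 5 into 1 on a 1-5 scale.
reverse_sketch <- function(x, range = base::range(x, na.rm = TRUE)) {
  stopifnot(is.numeric(x), length(range) == 2)
  min(range) + max(range) - x
}

print(reverse_sketch(c(1, 2, 5), range = c(1, 5)))
print(reverse_sketch(c(2, 3, 4)))  # falls back to the observed range 2-4
```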
BUG FIXES
- empty_columns() (and therefore remove_empty_columns()) now correctly detects columns containing only NA_character_ (#349).
- Select helpers now work in custom functions when the argument is called select (#356).
- Fixed an unexpected warning in convert_na_to() when select is a list (#352).
- Fixed an issue with the correct labelling of numeric variables with more than nine unique values and associated value labels.
datawizard 0.6.5
MAJOR CHANGES
- Etienne Bacher is the new maintainer.
MINOR CHANGES
- standardize(), center(), normalize() and rescale() can be used in model formulas, similar to base::scale().
- data_codebook() now includes the proportion for each category/value, in addition to the counts. Furthermore, if the data contain tagged NA values, these are included in the frequency table.
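For reference, the entry compares against the following base R pattern: scale() called directly inside a model formula. The sketch uses only base R. It checks that standardizing wt inside the formula gives the same coefficients as standardizing it beforehand.

```r
# Base R: transform a predictor inside the formula with scale().
fit_inline <- lm(mpg ~ scale(wt), data = mtcars)

# Equivalent fit with the predictor standardized beforehand.
d <- transform(mtcars, wt_z = (wt - mean(wt)) / sd(wt))
fit_manual <- lm(mpg ~ wt_z, data = d)

print(coef(fit_inline))
print(coef(fit_manual))
```

Per the entry above, standardize(wt) should be usable in the same position in the formula once datawizard is loaded; that variant is not run here.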
BUG FIXES
- center(x) now works correctly when x is a single value and either reference or center is specified (#324).
- Fixed an issue in data_codebook(), which failed for labelled vectors when the values of labels were not in sorted order.
datawizard 0.6.4
NEW FUNCTIONS
- data_codebook(): to generate codebooks of data frames.
- New functions to deal with duplicates: data_duplicated() (keeps all duplicates, including the first occurrence) and data_unique() (returns the data, excluding all duplicates except one instance of each, based on the selected method).
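The difference between the two duplicate helpers can be sketched with base R's duplicated(). Scanning from both ends flags every member of a duplicated group, first occurrence included. Negating a single forward scan keeps one instance per group. This is a base R sketch, not datawizard's code. Keeping the first instance is assumed here as just one possible selection method.

```r
# Base R sketch of the two behaviours (not datawizard's implementation).
df <- data.frame(
  id = c(1, 2, 2, 3, 3, 3),
  x = c("a", "b", "b", "c", "c", "c"),
  stringsAsFactors = FALSE
)

# Like data_duplicated(): every row that has a duplicate, first one included.
is_dup <- duplicated(df) | duplicated(df, fromLast = TRUE)
dups <- df[is_dup, ]

# Like data_unique() with an assumed "keep first" rule: one row per group.
uniq <- df[!duplicated(df), ]

print(dups)
print(uniq)
```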
MINOR CHANGES
- .data.frame methods should now preserve custom attributes.
- The include_bounds argument in normalize() can now also be a numeric value, defining the limit to the upper and lower bound (i.e. the distance to 1 and 0).
- data_filter() now works with grouped data.
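The following hypothetical base R sketch shows what a numeric include_bounds implies. After min-max scaling to [0, 1], a linear squeeze keeps every value at least eps away from 0 and 1. The linear mapping and the names normalize_sketch() and eps are assumptions; datawizard's exact computation may differ.

```r
# Hypothetical sketch: min-max normalisation, then a linear squeeze so that
# values stay a distance `eps` away from the bounds 0 and 1.
normalize_sketch <- function(x, eps = 0) {
  stopifnot(eps >= 0, eps < 0.5)
  z <- (x - min(x)) / (max(x) - min(x))  # plain [0, 1] scaling
  eps + z * (1 - 2 * eps)                # map [0, 1] onto [eps, 1 - eps]
}

x <- c(10, 20, 30, 40, 50)
print(normalize_sketch(x))
print(normalize_sketch(x, eps = 0.01))
```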
BUG FIXES
- data_read() no longer prints a message about empty columns when the data actually had no empty columns.
- data_to_wide() now drops columns that are not in id_cols (if specified), names_from, or values_from. This is the behaviour observed in tidyr::pivot_wider().
datawizard 0.6.3
MAJOR CHANGES
- There is a new publication about the {datawizard} package: https://joss.theoj.org/papers/10.21105/joss.04684
- Fixes failing tests due to changes in R-devel.
- data_to_long() and data_to_wide() have had significant performance improvements, sometimes as high as a ten-fold speedup.
MINOR CHANGES
- When column names are misspelled, most functions now suggest which existing columns could possibly be meant.
- Miscellaneous performance gains.
- convert_to_na() now requires the argument na to be of class 'Date' to convert specific dates to NA. For example, convert_to_na(x, na = "2022-10-17") must be changed to convert_to_na(x, na = as.Date("2022-10-17")).
BUG FIXES
- data_to_long() and data_to_wide() now correctly keep the date format.
datawizard 0.6.2
BREAKING CHANGES
- Methods for grouped data frames (.grouped_df) no longer support dplyr::group_by() for {dplyr} before version 0.8.0.
- empty_columns() and remove_empty_columns() now also remove columns that contain only empty characters. Likewise, empty_rows() and remove_empty_rows() remove observations that consist entirely of missing or empty character values.
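The following base R sketch applies the "empty" rule described above. A value counts as empty when it is NA (of any type, including NA_character_) or the empty string "". The data and helper names are illustrative only; this is not datawizard's implementation.

```r
# Base R sketch (not datawizard's code): a value is "empty" when it is NA
# of any type or the empty string "".
df <- data.frame(
  a = c(1, 2, NA),
  b = c(NA_character_, NA_character_, NA_character_),
  c = c("", "", ""),
  d = c("x", "", NA),
  stringsAsFactors = FALSE
)

is_empty_value <- function(v) is.na(v) | v %in% ""

# Columns where every value is empty.
empty_cols <- vapply(df, function(v) all(is_empty_value(v)), logical(1))
# Rows where every value is empty (combine the per-column checks with &).
empty_rows <- Reduce(`&`, lapply(df, is_empty_value))

cleaned <- df[!empty_rows, !empty_cols, drop = FALSE]
print(cleaned)
```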
CHANGES
- data_arrange() now works with data frames that were grouped using data_group() (#274).
- data_read() gains a convert_factors argument, to turn off automatic conversion from numeric variables into factors.
datawizard 0.6.1
- Updates tests for upcoming changes in the {tidyselect} package (#267).
datawizard 0.6.0
BREAKING CHANGES
- The minimum needed R version has been bumped to 3.6.
- The following deprecated functions have been removed: data_cut(), data_recode(), data_shift(), data_reverse(), data_rescale(), data_to_factor(), data_to_numeric().
- A new text_format() alias is introduced for format_text(), the latter of which will be removed in the next release.
- A new recode_values() alias is introduced for change_code(), the latter of which will be removed in the next release.
- data_merge() now errors if columns specified in by are not in both datasets.
- Using negative values in the arguments select and exclude now removes the columns from the selection/exclusion. The previous behavior was to start the selection/exclusion from the end of the dataset, which was inconsistent with the use of "-" with other selecting possibilities.
NEW FUNCTIONS
- data_peek(): to peek at values and type of variables in a data frame.
- coef_var(): to compute the coefficient of variation.
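The coefficient of variation is the standard deviation divided by the mean. A minimal base R sketch of that ratio follows. The name cv_sketch() is hypothetical, and coef_var() itself may offer more options. The ratio is only meaningful for ratio-scale data with a positive mean.

```r
# Base R sketch: coefficient of variation = SD / mean
# (not datawizard's implementation).
cv_sketch <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
print(cv_sketch(x))
```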
CHANGES
- data_filter() gives more informative messages on malformed syntax of the filter argument.
- It is now possible to use curly brackets to pass variable names to data_filter(). See the examples section in the documentation of data_filter().
- The regex argument was added to functions that use select helpers and did not already have this argument.
- The select helpers starts_with(), ends_with(), and contains() now accept several patterns, e.g. starts_with("Sep", "Petal").
- The arguments select and exclude, present in most functions, have been improved to work in loops and in custom functions. For example, the following code now works:
```r
library(datawizard)

foo <- function(data) {
  i <- "Sep"
  find_columns(data, select = starts_with(i))
}
foo(iris)

# The native pipe |> requires R >= 4.1.
for (i in c("Sepal", "Sp")) {
  head(iris) |>
    find_columns(select = starts_with(i)) |>
    print()
}
```
- There is now a vignette summarizing the various ways to select or exclude variables in most {datawizard} functions.
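In base R terms, passing several patterns to starts_with() amounts to keeping a column when its name starts with any of them. The helper starts_with_any() below is a hypothetical illustration built on startsWith(). It is not the select-helper machinery that datawizard uses.

```r
# Hypothetical base R sketch: keep names that start with ANY of the prefixes.
starts_with_any <- function(nms, ...) {
  prefixes <- c(...)
  # One logical vector per prefix, then OR them together.
  hits <- lapply(prefixes, function(p) startsWith(nms, p))
  nms[Reduce(`|`, hits)]
}

print(starts_with_any(names(iris), "Sep", "Petal"))
```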
datawizard 0.5.1
- Fixes tests for the {poorman} update.
datawizard 0.5.0
MAJOR CHANGES
- The following statistical transformation functions have been renamed to drop the data_*() prefix. They do not work exclusively with data frames but are typically used first of all with vectors, so the old names were misleading:
  - data_cut() -> categorize()
  - data_recode() -> change_code()
  - data_shift() -> slide()
  - data_reverse() -> reverse()
  - data_rescale() -> rescale()
  - data_to_factor() -> to_factor()
  - data_to_numeric() -> to_numeric()
  Note that these functions also have .data.frame() methods and still work for data frames as well. The former function names are still available as aliases, but will be deprecated and removed in a future release.
- Bumps the needed minimum R version to 3.5.
- Removed the deprecated function data_findcols(). Please use its replacement, data_find().
- Removed the alias extract() for the data_extract() function, since it collided with tidyr::extract().
- The argument training_proportion in data_partition() is deprecated. Please use proportion now.
- Given his continued and significant contributions to the package, Etienne Bacher (@etiennebacher) is now included as an author.
- unstandardise() now works for center(x).
- unnormalize() now works for change_scale(x).
- reshape_wider() now follows tidyr::pivot_wider() syntax more consistently. The arguments colnames_from, sep, and rows_from are deprecated and should be replaced by names_from, names_sep, and id_cols respectively. reshape_wider() also gains an argument names_glue (#182, #198).
- Similarly, reshape_longer() now follows tidyr::pivot_longer() syntax more consistently. The argument colnames_to is deprecated and should be replaced by names_to. reshape_longer() also gains new arguments: names_prefix, names_sep, names_pattern, and values_drop_na (#189).
CHANGES
- Some of the text formatting helpers (like text_concatenate()) gain an enclose argument, to wrap text elements with surrounding characters.
- winsorize() now accepts "raw" and "zscore" methods (in addition to "percentile"). Additionally, when robust is set to TRUE together with method = "zscore", it winsorizes via the median and median absolute deviation (MAD); otherwise it uses the mean and standard deviation (@rempsyc, #177, #49, #47).
- data_partition() now allows creating multiple partitions from the data, returning multiple training sets and a remaining test set.
- Functions like center(), normalize() or standardize() no longer fail when the data contain infinite values (Inf).
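The following hypothetical base R sketch covers the three winsorizing methods named in the entry above. The interpretation of threshold for each method is an assumption, not datawizard's documented behaviour: a tail proportion for "percentile", a number of SDs or MADs for "zscore", and c(lower, upper) bounds for "raw". The use of quantile()'s default type is also an assumption.

```r
# Hypothetical base R sketch of three winsorizing rules (not datawizard's code).
# Assumed meaning of `threshold` per method:
#   "percentile": proportion clipped in each tail, e.g. 0.2
#   "zscore":     number of SDs (or MADs when robust = TRUE) from the centre
#   "raw":        c(lower, upper) bounds on the original scale
winsorize_sketch <- function(x, method = c("percentile", "zscore", "raw"),
                             threshold = 0.2, robust = FALSE) {
  method <- match.arg(method)
  bounds <- switch(method,
    percentile = unname(quantile(x, c(threshold, 1 - threshold))),
    zscore = {
      centre <- if (robust) median(x) else mean(x)
      spread <- if (robust) mad(x) else sd(x)
      centre + c(-1, 1) * threshold * spread
    },
    raw = threshold
  )
  stopifnot(length(bounds) == 2)
  # Values outside the bounds are replaced by the nearest bound.
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(1, 2, 3, 4, 100)
print(winsorize_sketch(x))
print(winsorize_sketch(x, method = "zscore", threshold = 1, robust = TRUE))
print(winsorize_sketch(x, method = "raw", threshold = c(2, 10)))
```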
NEW FUNCTIONS
- row_to_colnames() and colnames_to_row() to move a row to column names, and column names to a row (@etiennebacher, #169).
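The following hypothetical base R sketch shows the two moves. It is restricted to all-character data so that no type conversion gets in the way. The function names, and the V1, V2, ... column names produced by the second helper, are assumptions made for illustration.

```r
# Hypothetical base R sketches (not datawizard's code); character data only.
row_to_colnames_sketch <- function(df, row = 1) {
  names(df) <- as.character(unlist(df[row, ], use.names = FALSE))
  df <- df[-row, , drop = FALSE]  # drop the row that became the header
  rownames(df) <- NULL
  df
}

colnames_to_row_sketch <- function(df) {
  # Prepend the current names as the first row, then use generic names.
  m <- rbind(names(df), as.matrix(df))
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(ncol(out)))
  out
}

raw <- data.frame(
  V1 = c("id", "1", "2"),
  V2 = c("score", "10", "20"),
  stringsAsFactors = FALSE
)
wide <- row_to_colnames_sketch(raw)
back <- colnames_to_row_sketch(wide)
print(wide)
print(back)
```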
BUG FIXES
- Fixed wrong column names in data_to_wide() (#173).