Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix snapshot tests #560

Merged
merged 9 commits into from
Nov 4, 2024
Merged

Fix snapshot tests #560

merged 9 commits into from
Nov 4, 2024

Conversation

strengejacke
Copy link
Member

Fixes #559

@strengejacke

This comment was marked as outdated.

@mattansb

This comment was marked as outdated.

@strengejacke
Copy link
Member Author

strengejacke commented Nov 3, 2024

See easystats/insight#953

Doesn't work perfect if the last row in the data frame is a duplicate - then, there are two table lines at the bottom, this still needs to be fixed:

And: which default? Which name for the argument? (currently: remove_duplicates)

print(datawizard::data_codebook(ggeffects::efc[, 1:4]), remove_duplicates = FALSE)
#> ggeffects::efc[, 1:4] (908 rows and 4 variables, 4 shown)
#> 
#> ID | Name     | Label                                    | Type    |  Missings
#> ---+----------+------------------------------------------+---------+----------
#> 1  | c12hour  | average number of hours of care per week | numeric |  6 (0.7%)
#> ---+----------+------------------------------------------+---------+----------
#> 2  | e15relat | relationship to elder                    | numeric |  7 (0.8%)
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#>    |          |                                          |         |          
#> ---+----------+------------------------------------------+---------+----------
#> 3  | e16sex   | elder's gender                           | numeric |  7 (0.8%)
#>    |          |                                          |         |          
#> ---+----------+------------------------------------------+---------+----------
#> 4  | e17age   | elder' age                               | numeric | 17 (1.9%)
#> ------------------------------------------------------------------------------
#> 
#> ID |    Values | Value Labels            |           N
#> ---+-----------+-------------------------+------------
#> 1  |  [4, 168] |                         |         902
#> ---+-----------+-------------------------+------------
#> 2  |         1 | spouse/partner          | 171 (19.0%)
#>    |         2 | child                   | 473 (52.5%)
#>    |         3 | sibling                 |  29 ( 3.2%)
#>    |         4 | daughter or son -in-law |  85 ( 9.4%)
#>    |         5 | ancle/aunt              |  23 ( 2.6%)
#>    |         6 | nephew/niece            |  22 ( 2.4%)
#>    |         7 | cousin                  |   6 ( 0.7%)
#>    |         8 | other, specify          |  92 (10.2%)
#> ---+-----------+-------------------------+------------
#> 3  |         1 | male                    | 296 (32.9%)
#>    |         2 | female                  | 605 (67.1%)
#> ---+-----------+-------------------------+------------
#> 4  | [65, 103] |                         |         891
#> ------------------------------------------------------
print(datawizard::data_codebook(ggeffects::efc[, 1:4]), remove_duplicates = TRUE)
#> ggeffects::efc[, 1:4] (908 rows and 4 variables, 4 shown)
#> 
#> ID | Name     | Label                                    | Type    |  Missings
#> ---+----------+------------------------------------------+---------+----------
#> 1  | c12hour  | average number of hours of care per week | numeric |  6 (0.7%)
#> ---+----------+------------------------------------------+---------+----------
#> 2  | e15relat | relationship to elder                    | numeric |  7 (0.8%)
#> ---+----------+------------------------------------------+---------+----------
#> 3  | e16sex   | elder's gender                           | numeric |  7 (0.8%)
#> ---+----------+------------------------------------------+---------+----------
#> 4  | e17age   | elder' age                               | numeric | 17 (1.9%)
#> ------------------------------------------------------------------------------
#> 
#> ID |    Values | Value Labels            |           N
#> ---+-----------+-------------------------+------------
#> 1  |  [4, 168] |                         |         902
#> ---+-----------+-------------------------+------------
#> 2  |         1 | spouse/partner          | 171 (19.0%)
#>    |         2 | child                   | 473 (52.5%)
#>    |         3 | sibling                 |  29 ( 3.2%)
#>    |         4 | daughter or son -in-law |  85 ( 9.4%)
#>    |         5 | ancle/aunt              |  23 ( 2.6%)
#>    |         6 | nephew/niece            |  22 ( 2.4%)
#>    |         7 | cousin                  |   6 ( 0.7%)
#>    |         8 | other, specify          |  92 (10.2%)
#> ---+-----------+-------------------------+------------
#> 3  |         1 | male                    | 296 (32.9%)
#>    |         2 | female                  | 605 (67.1%)
#> ---+-----------+-------------------------+------------
#> 4  | [65, 103] |                         |         891
#> ------------------------------------------------------

print(datawizard::data_codebook(ggeffects::efc[, 1:3]), remove_duplicates = FALSE)
#> ggeffects::efc[, 1:3] (908 rows and 3 variables, 3 shown)
#> 
#> ID | Name     | Label                                    | Type    | Missings
#> ---+----------+------------------------------------------+---------+---------
#> 1  | c12hour  | average number of hours of care per week | numeric | 6 (0.7%)
#> ---+----------+------------------------------------------+---------+---------
#> 2  | e15relat | relationship to elder                    | numeric | 7 (0.8%)
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#> ---+----------+------------------------------------------+---------+---------
#> 3  | e16sex   | elder's gender                           | numeric | 7 (0.8%)
#> ---+----------+------------------------------------------+---------+---------
#> -----------------------------------------------------------------------------
#> 
#> ID |   Values | Value Labels            |           N
#> ---+----------+-------------------------+------------
#> 1  | [4, 168] |                         |         902
#> ---+----------+-------------------------+------------
#> 2  |        1 | spouse/partner          | 171 (19.0%)
#>    |        2 | child                   | 473 (52.5%)
#>    |        3 | sibling                 |  29 ( 3.2%)
#>    |        4 | daughter or son -in-law |  85 ( 9.4%)
#>    |        5 | ancle/aunt              |  23 ( 2.6%)
#>    |        6 | nephew/niece            |  22 ( 2.4%)
#>    |        7 | cousin                  |   6 ( 0.7%)
#>    |        8 | other, specify          |  92 (10.2%)
#> ---+----------+-------------------------+------------
#> 3  |        1 | male                    | 296 (32.9%)
#>    |        2 | female                  | 605 (67.1%)
#> -----------------------------------------------------
print(datawizard::data_codebook(ggeffects::efc[, 1:3]), remove_duplicates = TRUE)
#> ggeffects::efc[, 1:3] (908 rows and 3 variables, 3 shown)
#> 
#> ID | Name     | Label                                    | Type    | Missings
#> ---+----------+------------------------------------------+---------+---------
#> 1  | c12hour  | average number of hours of care per week | numeric | 6 (0.7%)
#> ---+----------+------------------------------------------+---------+---------
#> 2  | e15relat | relationship to elder                    | numeric | 7 (0.8%)
#> ---+----------+------------------------------------------+---------+---------
#> 3  | e16sex   | elder's gender                           | numeric | 7 (0.8%)
#> ---+----------+------------------------------------------+---------+---------
#> -----------------------------------------------------------------------------
#> 
#> ID |   Values | Value Labels            |           N
#> ---+----------+-------------------------+------------
#> 1  | [4, 168] |                         |         902
#> ---+----------+-------------------------+------------
#> 2  |        1 | spouse/partner          | 171 (19.0%)
#>    |        2 | child                   | 473 (52.5%)
#>    |        3 | sibling                 |  29 ( 3.2%)
#>    |        4 | daughter or son -in-law |  85 ( 9.4%)
#>    |        5 | ancle/aunt              |  23 ( 2.6%)
#>    |        6 | nephew/niece            |  22 ( 2.4%)
#>    |        7 | cousin                  |   6 ( 0.7%)
#>    |        8 | other, specify          |  92 (10.2%)
#> ---+----------+-------------------------+------------
#> 3  |        1 | male                    | 296 (32.9%)
#>    |        2 | female                  | 605 (67.1%)
#> -----------------------------------------------------

Created on 2024-11-03 with reprex v2.1.1

@mattansb
Copy link
Member

mattansb commented Nov 3, 2024

Why is this only an issue for the last row? 🤔

@strengejacke
Copy link
Member Author

strengejacke commented Nov 3, 2024

I'm looking for consecutive duplicated lines, and these will be removed (from first duplicated to the one before the last - we want to keep one line that separates data rows).

In the 2nd example with three columns, we have following for the last data row (e16sex):

 ---+----------+------------------------------------------+---------+---------
 ---+----------+------------------------------------------+---------+---------
 -----------------------------------------------------------------------------

one of the two duplicated lines is removed, the other one is preserved as separator for the next data row - but there's no next data row, there's only the last line (which is not a duplicate, because it has no + signs).

@strengejacke
Copy link
Member Author

I think we have to fix this somewhere in the new code that removes the duplicates:

https://github.com/easystats/insight/pull/953/files#diff-cafb3e0523369ec1d779f9377d74656f6e10b4a9131008d3f87a4e2c8a5482e1R734-R777

@strengejacke
Copy link
Member Author

ok, should be fixed. Once the PR in insight is merged, we can run checks here again

Copy link

codecov bot commented Nov 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.29%. Comparing base (003e2b8) to head (f2b8670).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #560   +/-   ##
=======================================
  Coverage   91.29%   91.29%           
=======================================
  Files          76       76           
  Lines        5983     5983           
=======================================
  Hits         5462     5462           
  Misses        521      521           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@etiennebacher
Copy link
Member

I'm confused about those changes. Does the default print method for data_codebook() produces a different output? If so, I don't know whether this change is necessary but it should be mentioned in the news as a breaking change (and the required version of insight should be bumped?).

@mattansb
Copy link
Member

mattansb commented Nov 4, 2024

This is an easystats-wide change to make outputs more console-friendly.

The insight req versions should be bumped, yes.

@strengejacke
Copy link
Member Author

strengejacke commented Nov 4, 2024

Sorry, misunderstood

@strengejacke
Copy link
Member Author

strengejacke commented Nov 4, 2024

Basically, the table_width argument in insight::export_table() now defaults to "auto", and it can now split tables into as many parts as required to make the table-width fit into the console (previously, a maximum of two splits were possible).

The major issue is table_width = "auto" now. While previously wide tables were overflowing and printed in the next console line (and the table looked a bit messy), now tables are more readable. This doesn't work for this reprex of course, but the screenshots below show it quite well.

And these changes break certain snapshot tests. The other changes in this PR are just ensuring that ... is passed to insight::export_table() where necessary.

print(datawizard::data_codebook(ggeffects::efc[, 1:3]), table_width = Inf)
#> ggeffects::efc[, 1:3] (908 rows and 3 variables, 3 shown)
#> 
#> ID | Name     | Label                                    | Type    | Missings |   Values | Value Labels            |           N
#> ---+----------+------------------------------------------+---------+----------+----------+-------------------------+------------
#> 1  | c12hour  | average number of hours of care per week | numeric | 6 (0.7%) | [4, 168] |                         |         902
#> ---+----------+------------------------------------------+---------+----------+----------+-------------------------+------------
#> 2  | e15relat | relationship to elder                    | numeric | 7 (0.8%) |        1 | spouse/partner          | 171 (19.0%)
#>    |          |                                          |         |          |        2 | child                   | 473 (52.5%)
#>    |          |                                          |         |          |        3 | sibling                 |  29 ( 3.2%)
#>    |          |                                          |         |          |        4 | daughter or son -in-law |  85 ( 9.4%)
#>    |          |                                          |         |          |        5 | ancle/aunt              |  23 ( 2.6%)
#>    |          |                                          |         |          |        6 | nephew/niece            |  22 ( 2.4%)
#>    |          |                                          |         |          |        7 | cousin                  |   6 ( 0.7%)
#>    |          |                                          |         |          |        8 | other, specify          |  92 (10.2%)
#> ---+----------+------------------------------------------+---------+----------+----------+-------------------------+------------
#> 3  | e16sex   | elder's gender                           | numeric | 7 (0.8%) |        1 | male                    | 296 (32.9%)
#>    |          |                                          |         |          |        2 | female                  | 605 (67.1%)
#> --------------------------------------------------------------------------------------------------------------------------------

print(datawizard::data_codebook(ggeffects::efc[, 1:3]))
#> ggeffects::efc[, 1:3] (908 rows and 3 variables, 3 shown)
#> 
#> ID | Name     | Label                                    | Type    | Missings
#> ---+----------+------------------------------------------+---------+---------
#> 1  | c12hour  | average number of hours of care per week | numeric | 6 (0.7%)
#> ---+----------+------------------------------------------+---------+---------
#> 2  | e15relat | relationship to elder                    | numeric | 7 (0.8%)
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#>    |          |                                          |         |         
#> ---+----------+------------------------------------------+---------+---------
#> 3  | e16sex   | elder's gender                           | numeric | 7 (0.8%)
#>    |          |                                          |         |         
#> -----------------------------------------------------------------------------
#> 
#> ID |   Values | Value Labels            |           N
#> ---+----------+-------------------------+------------
#> 1  | [4, 168] |                         |         902
#> ---+----------+-------------------------+------------
#> 2  |        1 | spouse/partner          | 171 (19.0%)
#>    |        2 | child                   | 473 (52.5%)
#>    |        3 | sibling                 |  29 ( 3.2%)
#>    |        4 | daughter or son -in-law |  85 ( 9.4%)
#>    |        5 | ancle/aunt              |  23 ( 2.6%)
#>    |        6 | nephew/niece            |  22 ( 2.4%)
#>    |        7 | cousin                  |   6 ( 0.7%)
#>    |        8 | other, specify          |  92 (10.2%)
#> ---+----------+-------------------------+------------
#> 3  |        1 | male                    | 296 (32.9%)
#>    |        2 | female                  | 605 (67.1%)
#> -----------------------------------------------------

Created on 2024-11-04 with reprex v2.1.1

image

image

@etiennebacher
Copy link
Member

etiennebacher commented Nov 4, 2024

Ok, thanks for the explanation, this sounds good to me. I was wondering whether this would also change the output when the console is wide enough to support the full table without splitting, but apparently this is not the case. I don't believe anyone relies on the current overflowing behavior and the main change comes from insight, so I don't think we need to put this in news here.

@etiennebacher
Copy link
Member

@strengejacke I'm not sure the conversation with @mattansb is resolved so I'll let you merge if it is.

@strengejacke
Copy link
Member Author

Ok, thanks for the explanation, this sounds good to me. I was wondering whether this would also change the output when the console is wide enough to support the full table without splitting, but apparently this is not the case. I don't believe anyone relies on the current overflowing behavior and the main change comes from insight, so I don't think we need to put this in news here.

If the table width ("output") fits into the console (i.e. console is wide enough), no new behaviour occurs. It's literally only an improvement. And even in vignettes, e.g. here:
https://easystats.github.io/parameters/articles/model_parameters.html
the table is not split, because the markdown document sets the line width to 120:

knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 120))

only tables wider than 120 chars would be split into several chunks.

@strengejacke strengejacke merged commit b723df2 into main Nov 4, 2024
22 checks passed
@strengejacke strengejacke deleted the strengejacke/issue559 branch November 4, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix snapshot tests
3 participants