Wrong alignment for SAV (SPSS) files and enhancement to read roles #246

Open
lodonnel opened this issue Nov 15, 2023 · 2 comments
Labels
bug: Something isn't working
enhancement: New feature or request
requires changes in Readstat: waiting for changes in the C library Readstat
to be reported in Readstat

Comments

@lodonnel

Task: Read the file specification (variable metadata) of an SPSS data file, where FILE_PATH is the path to the attached file:

import pyreadstat
df, data = pyreadstat.read_sav(FILE_PATH, metadataonly=False)

Description
3 Problems
Bug: 1. The attributes missing_ranges and missing_user_values stay empty ({}) even though the file has missing-value entries in Variable View in SPSS.
As shown in the PyCharm debugger:
missing_ranges = {dict:0} {}
missing_user_values = {dict:0} {}

Bug: 2. variable_alignment is a dict of key/value pairs in which every value is 'unknown':
variable_alignment = {dict: 12}
{'ID': 'unknown', 'Responded': 'unknown', 'Previous': 'unknown', 'Controlpackage': 'unknown', 'Age': 'unknown', 'Income': 'unknown', 'Education': 'unknown', 'Reside': 'unknown', 'Gender': 'unknown', 'Married': 'unknown', 'Children': 'unknown', 'Region': 'unknown'}
Attached file: C22326161.zip

Enhancement: 3. The variable role is currently not included in the attributes of the returned metadata object (data).


To Reproduce
df, data = pyreadstat.read_sav(FILE_PATH, metadataonly=False)
The returned metadata object (data) holds the attributes mentioned above.

File example
Attached

Expected behavior

  1. Both missing_ranges and missing_user_values should be populated correctly.
  2. variable_alignment should be populated with one of {Left, Right, Center} instead of 'unknown'.
  3. Role should be added as an attribute with possible values {Input, Target, Both, None, Partition, Split}.
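
The expectations above could be checked with a small helper along these lines. This is only a sketch: `check_expectations` and the attribute name `variable_role` are hypothetical (pyreadstat does not expose roles), and the stand-in metadata object holds two of the values reported in this issue plus one made-up value, so it runs without the attached file.

```python
from types import SimpleNamespace

# Allowed values taken from the expected behavior listed above (compared in lowercase).
EXPECTED_ALIGNMENTS = {"left", "right", "center"}
EXPECTED_ROLES = {"input", "target", "both", "none", "partition", "split"}


def check_expectations(meta, role_attr="variable_role"):
    """Return a list of problems found in a metadata-like object.

    role_attr is a hypothetical attribute name used only for illustration.
    """
    problems = []
    for name, value in meta.variable_alignment.items():
        if str(value).lower() not in EXPECTED_ALIGNMENTS:
            problems.append(f"{name}: alignment is {value!r}")
    roles = getattr(meta, role_attr, None)
    if roles is None:
        problems.append(f"metadata has no {role_attr!r} attribute")
    else:
        for name, value in roles.items():
            if str(value).lower() not in EXPECTED_ROLES:
                problems.append(f"{name}: role is {value!r}")
    return problems


# Stand-in for the metadata returned by read_sav; 'right' is a made-up value.
example_meta = SimpleNamespace(variable_alignment={"ID": "unknown", "Age": "right"})
for problem in check_expectations(example_meta):
    print(problem)
```

With the metadata from the attached file, every variable would be listed, since all twelve alignments are reported as 'unknown'.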

Setup Information:
How did you install pyreadstat? (pip, conda, directly from repo)
using pip
Platform (windows, macOS, linux, 32 or 64 bit)
windows 10
Python Version
3.9
Python Distribution (System, plain python, Anaconda)
Using Virtualenv or condaenv?
venv

@ofajardo
Collaborator

ofajardo commented Nov 23, 2023

Thanks for the reproducible issue.

Regarding missing_ranges, please use the argument user_missing=True to get them. Please take the time to read the section of the README about missing values, where this is explained.
missing_user_values is only for SAS and Stata files, as described in the module documentation (https://ofajardo.github.io/pyreadstat_documentation/_build/html/index.html), and will therefore always be empty for SPSS files.
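
As a rough sketch of that suggestion: the helper names below are hypothetical, and the stand-in metadata uses made-up ranges so the example runs without the attached file. It assumes the documented shape of missing_ranges, a dict mapping each variable name to a list of dicts with 'lo' and 'hi' keys.

```python
from types import SimpleNamespace


def load_sav_with_user_missing(path):
    """Read a .sav file while keeping user-defined missing values."""
    import pyreadstat  # imported lazily so the helper below works without it

    # user_missing=True keeps user-defined missing values in the data
    # and populates meta.missing_ranges for SPSS files.
    return pyreadstat.read_sav(path, user_missing=True)


def describe_missing_ranges(meta):
    """Return {variable: [(lo, hi), ...]} from a metadata-like object."""
    return {
        name: [(r["lo"], r["hi"]) for r in ranges]
        for name, ranges in meta.missing_ranges.items()
    }


# Stand-in metadata with made-up ranges, not values from the attached file.
example_meta = SimpleNamespace(
    missing_ranges={
        "Income": [{"lo": 9999.0, "hi": 9999.0}],
        "Age": [{"lo": 0.0, "hi": 17.0}],
    }
)
print(describe_missing_ranges(example_meta))
```

With the real file, the call would be `df, meta = load_sav_with_user_missing(FILE_PATH)` followed by `describe_missing_ranges(meta)`.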

@ofajardo
Collaborator

For alignment it seems that the underlying C library Readstat is not reading them correctly. The same library does not currently define any function to extract role, so it cannot be obtained in pyreadstat.

If you can, please report these issues in Readstat directly with your example. Otherwise I may do it at a later point. Be aware that issues in Readstat typically take a very long time to be solved, so a solution to those two is unlikely to appear in the near future.

@ofajardo ofajardo added bug Something isn't working enhancement New feature or request labels Nov 23, 2023
@ofajardo ofajardo changed the title Reading SPSS (.sav) file specification Wrong alignment for SAV (SPSS) files and enhancement to read roles Jan 10, 2024
@ofajardo ofajardo added requires changes in Readstat waiting for changes in the C library Readstat to be reported in Readstat labels Sep 2, 2024