SDF CSV importer breaks when keys are missing #1293

Open
1 task done
marc-vdm opened this issue May 29, 2024 · 7 comments
Labels
bug Issues/PRs related to bugs

Comments

@marc-vdm
Member

marc-vdm commented May 29, 2024

Updating AB

  • Yes, I have updated AB and still experience this issue

What happened?

When loading a CSV, the 'from key' and 'to key' columns are passed through ast.literal_eval() because they should contain a tuple with the key. However, when no key is given, the cell is read as an empty string. This makes ast.literal_eval() fail with a SyntaxError, as it only accepts valid Python expressions.

Problem code:

df = pd.read_csv(path, compression='infer', sep=separator, index_col=False,
                 converters={'from key': ast.literal_eval, 'to key': ast.literal_eval})
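The failure can be reproduced without a real scenario file. The following is a minimal sketch, assuming a made-up two-row CSV in which the second row has an empty 'to key' cell; the data and the helper name `read_with_strict_converter` are illustrative and not part of AB.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row CSV: the second row has an empty 'to key' cell.
csv_text = (
    "from key,to key\n"
    "\"('db', 'a')\",\"('db', 'b')\"\n"
    "\"('db', 'c')\",\n"
)


def read_with_strict_converter(text):
    """Mimic the failing call: literal_eval is applied to every cell."""
    return pd.read_csv(
        io.StringIO(text),
        index_col=False,
        converters={"from key": ast.literal_eval, "to key": ast.literal_eval},
    )


try:
    read_with_strict_converter(csv_text)
    failed = False
except SyntaxError:
    # The empty cell reaches literal_eval as "" and cannot be parsed.
    failed = True
```
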

The Excel reader reads the file first and only afterwards applies literal_eval, tolerating cells that fail to parse. We should do something similar for the CSV reader:

# Convert specific columns that may have tuples as strings
columns = ["from categories", "from key", "to categories", "to key"]
data.loc[:, columns] = data[columns].map(convert_tuple_str)

def convert_tuple_str(x):
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError) as e:
        return x

However, note that when I simply remove the converter and add the Excel implementation, about 75% of my flows come back unlinked. This doesn't make sense, as the same CSV with missing keys loads normally when saved as an Excel file. This requires some more investigation.
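An alternative that keeps the conversion inside read_csv is a tolerant converter, combined with a count of how many cells actually became tuples, which may help narrow down where flows become unlinked. This is a rough sketch, not the AB implementation: `tolerant_literal_eval`, `read_sdf_csv` and `count_parsed_keys` are hypothetical names, and returning None for unparsable cells is an assumption (the Excel helper above returns the original value instead).

```python
import ast
import io

import pandas as pd

KEY_COLUMNS = ["from key", "to key"]


def tolerant_literal_eval(cell):
    """Parse a cell as a Python literal; return None when it is empty or invalid."""
    try:
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None


def read_sdf_csv(source, separator=","):
    """Hypothetical reader: the same read_csv call, but with the tolerant converter."""
    return pd.read_csv(
        source,
        compression="infer",
        sep=separator,
        index_col=False,
        converters={col: tolerant_literal_eval for col in KEY_COLUMNS},
    )


def count_parsed_keys(df):
    """Count, per key column, how many cells ended up as tuples."""
    return {
        col: int(df[col].map(lambda value: isinstance(value, tuple)).sum())
        for col in KEY_COLUMNS
    }
```

Comparing the counts from `count_parsed_keys` with the number of rows shows how many keys were lost during parsing, as opposed to being lost in a later linking step.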

Relevant errors

Traceback (most recent call last):
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\bwutils\superstructure\utils.py", line 70, in wrapper
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\layouts\tabs\LCA_setup.py", line 605, in load_action
    df = ABCSVImporter.read_file(path, separator=separator)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\bwutils\superstructure\file_imports.py", line 170, in read_file
    df = pd.read_csv(path, compression='infer', sep=separator, index_col=False,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 617, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 1748, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 920, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 1044, in pandas._libs.parsers.TextReader._convert_column_data
  File "parsers.pyx", line 2113, in pandas._libs.parsers._apply_converter
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 0

SyntaxError: invalid syntax

Operating system

Windows 11

Conda environment

No response

@marc-vdm marc-vdm added the bug Issues/PRs related to bugs label May 29, 2024
@romainsacchi

@marc-vdm I got the same issue (AB 2.10.1) but without missing values for from key and to key:


17:00:12 | INFO | Loading Scenario file. This may take a while for large files
17:00:14 | ERROR | SyntaxError: invalid syntax (<unknown>, line 0)
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/bwutils/superstructure/utils.py", line 81, in wrapper
    result = func(*args)
             ^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/layouts/tabs/LCA_setup.py", line 603, in load_action
    df = ABCSVImporter.read_file(path, separator=separator)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/bwutils/superstructure/file_imports.py", line 223, in read_file
    df = pd.read_csv(
         ^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 1045, in pandas._libs.parsers.TextReader._convert_column_data
  File "parsers.pyx", line 2116, in pandas._libs.parsers._apply_converter
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 0
    
SyntaxError: invalid syntax

As said above, my SDF is not missing any values in the from key and to key fields, at least not in the first few lines:

from activity name,from reference product,from location,from categories,from database,from key,to activity name,to reference product,to location,to categories,to database,to key,flow type,remind - SSP2-PkBudg1150 - 2035,remind - SSP2-PkBudg1150 - 2045
"market for heat, district or industrial, natural gas","heat, district or industrial, natural gas",RoW,,ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150,"('ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150', 'bd1f664bf4604b04864aa0bb912da7d7')","1,1-difluoroethane production","1,1-difluoroethane",US,,ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150,"('ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150', 'b226476e310ced5b991672eec4dfeac7')",technosphere,0.0,0.0

@romainsacchi

Could it be that the string representation of the tuples fails to be interpreted?
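One way to probe this is to read the key columns as raw text and list every cell that ast.literal_eval() rejects. The sketch below assumes the column names from the header above; `find_unparsable_cells` is a hypothetical helper, not an AB function.

```python
import ast
import io

import pandas as pd


def find_unparsable_cells(source, columns=("from key", "to key"), separator=","):
    """Return (row_index, column, raw_text) for every cell literal_eval rejects."""
    # dtype=str and keep_default_na=False keep each cell as the raw text from the file,
    # so empty cells arrive as "" instead of NaN.
    raw = pd.read_csv(
        source, sep=separator, index_col=False, dtype=str, keep_default_na=False
    )
    problems = []
    for column in columns:
        for row_index, cell in raw[column].items():
            try:
                ast.literal_eval(cell)
            except (ValueError, SyntaxError):
                problems.append((row_index, column, cell))
    return problems
```

An empty result would suggest the tuple strings themselves are fine. Any reported row points at the exact cell that breaks the import, which may sit far below the first few lines of a large file.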

@AlesPaulu

Hi, I have the same issue as Romain: I cannot load a scenario database and get the same syntax error. Has anyone managed to solve the issue yet?

@B3n0x

B3n0x commented Nov 15, 2024

Hi everyone, I'm encountering the same issue, and I would like to add my experience:
I need to create a Superstructure (and an SDF file) based on Ecoinvent 3.10 - SSP2 - BASE - consequential, with projections for two years (2025 and 2050). However, when generating the SDF file, six flows are incorrectly created due to missing "to key" fields. Specifically, the problematic flows are:

Output from miniconda prompt ->
<<not found key for consumer ('formaldehyde production, methanol oxidation', 'formaldehyde', None, 'RER', 'kilogram', 'production')
not found key for consumer ('formaldehyde production, methanol oxidation', 'formaldehyde', None, 'RER', 'kilogram', 'production')
not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RER', 'kilogram', 'production')
not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RER', 'kilogram', 'production')
not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')
not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')
not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')>>

When the SDF file is loaded into AB, it generates the error in question due to these missing flow keys.

Currently, we’re manually deleting the problematic flows from the SDF file, but the results aren’t satisfactory. Does anyone have suggestions on how to resolve this issue?

Thank you in advance for your help.

@romainsacchi

@B3n0x how did you generate the SDF: using the ScenarioLink plugin or premise?

@romainsacchi

@B3n0x:

  • 'formaldehyde production, methanol oxidation', 'formaldehyde', 'RER', 'kilogram'
  • 'hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', 'RER', 'kilogram'

Both exist in ecoinvent 3.10 consequential. Can you check that they also exist in the superstructure DB to which the SDF was meant to point?

@B3n0x

B3n0x commented Nov 16, 2024

Hi @romainsacchi, thank you for your reply.
We are using the ScenarioLink plugin for SDF generation (and have updated to the latest version of the plugin).

We reviewed the superstructure database by extracting it into an Excel file. The flows in question are included in the superstructure DB with the following key codes:

  • Activity -> formaldehyde production, methanol oxidation (RER): 9396d13e6fdebf532c026b42b31c9efb
  • Activity -> hydrocarbons purification, molecular sieve separation of naphtha (RER): 05deffa7a4f6087a134f0ac58fc7a48e
  • Activity -> hydrocarbons purification, molecular sieve separation of naphtha (RoW): 54e780b19bdedf33b7913bffe36a16f3

We attempted to manually add the missing "to key" values in the SDF file using the reported key codes. While the SDF file is successfully loaded into AB, it still does not correctly recognize these flows.
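When filling in keys by hand, the cell format may matter. In the sample row posted earlier in this thread, each key cell holds the string form of a (database name, code) tuple rather than the bare code. The sketch below assumes that same format applies here; `format_key_cell` is a hypothetical helper and "my superstructure db" is a placeholder database name.

```python
import ast


def format_key_cell(database, code):
    """Render a (database, code) pair as the text stored in a key column."""
    return repr((database, code))


# Placeholder database name; the code is the one reported for the RER formaldehyde activity.
cell = format_key_cell("my superstructure db", "9396d13e6fdebf532c026b42b31c9efb")

# The cell should round-trip through literal_eval back to the same 2-tuple.
round_trip = ast.literal_eval(cell)
```

If the manually added cells contain only the code, or a database name that differs from the superstructure database, the file could load while the flows still fail to match.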

To provide a clear overview, the steps we followed in AB are as follows:

    • Downloaded the PREMISE (v. 2.1.3) Remind Ecoinvent 3.10 consequential - SSP2 - BASE using the ScenarioLink plugin.
    • Selected the two required years for our analysis (e.g., 2025 and 2030).
    • Created the Superstructure database and the SDF file through the AB interface by clicking on "Produce superstructure DB" → "Import".
  1. During the "writing scenario difference file" process in the Miniconda prompt, the mentioned error occurred.
  2. The Superstructure DB and the SDF file were generated, but the "to key" parameter was missing in the specified flows of SDF.

Additionally, I would like to note that when we created in AB individual databases for each specific year (DB-2025 and DB-2050) separately, no errors occurred.

I hope my explanation was as clear as possible. Apologies for the lengthy description, and thank you in advance for your help.

Projects
None yet
Development

No branches or pull requests

4 participants