SDF CSV importer breaks when keys are missing #1293

marc-vdm · 2024-05-29T10:49:44Z

Updating AB

Yes, I have updated AB and still experience this issue

What happened?

When loading a CSV, the 'from key' and 'to key' columns are parsed with ast.literal_eval, because they should contain a tuple with the key. However, when no key is given, the cell is read as an empty string. This breaks ast.literal_eval() with a SyntaxError, as it only accepts strings that form a valid Python literal.

Problem code:

activity-browser/activity_browser/bwutils/superstructure/file_imports.py

Lines 170 to 171 in dd71811

    
    df = pd.read_csv(path, compression='infer', sep=separator, index_col=False,
                     converters={'from key': ast.literal_eval, 'to key': ast.literal_eval})
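A minimal reproduction of the failure, assuming a hypothetical two-row SDF excerpt in which the second row has an empty "to key" cell. pandas calls the converter on every cell of the column, including empty ones, so a single empty key is enough to abort the whole read:

```python
import ast
import io

import pandas as pd

# Hypothetical two-row SDF excerpt; the second row has an empty "to key" cell.
CSV_TEXT = (
    "from key,to key,flow type\n"
    "\"('db', 'a')\",\"('db', 'b')\",technosphere\n"
    "\"('db', 'a')\",,technosphere\n"
)

read_error = None
try:
    # Same converter setup as the CSV importer excerpt above.
    pd.read_csv(
        io.StringIO(CSV_TEXT),
        index_col=False,
        converters={"from key": ast.literal_eval, "to key": ast.literal_eval},
    )
except SyntaxError as exc:
    read_error = exc

# The root cause without pandas: an empty string is not a valid Python literal.
direct_error = None
try:
    ast.literal_eval("")
except SyntaxError as exc:
    direct_error = exc

print(type(read_error).__name__, type(direct_error).__name__)
```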

The Excel reader first reads the file and only afterwards applies literal_eval, keeping any value that fails to parse as-is. We should do something similar for the CSV reader:

activity-browser/activity_browser/bwutils/superstructure/excel.py

Lines 84 to 86 in dd71811

    
    # Convert specific columns that may have tuples as strings
    columns = ["from categories", "from key", "to categories", "to key"]
    data.loc[:, columns] = data[columns].map(convert_tuple_str)

activity-browser/activity_browser/bwutils/superstructure/excel.py

Lines 18 to 22 in dd71811

    
    def convert_tuple_str(x):
        try:
            return literal_eval(x)
        except (ValueError, SyntaxError) as e:
            return x
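A minimal sketch of what the CSV reader could look like if it followed the Excel approach: read without converters, then convert the key and category columns cell by cell. The function name `read_sdf_csv` is hypothetical, and it uses `Series.map` per column instead of `DataFrame.map`, assuming older pandas versions (which only have `applymap`) still need to be supported:

```python
import ast
import io

import pandas as pd

# Columns the Excel reader converts (see the excel.py excerpt above).
TUPLE_COLUMNS = ["from categories", "from key", "to categories", "to key"]


def convert_tuple_str(x):
    """Parse a stringified tuple; return the value unchanged if parsing fails."""
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        # Empty cells (read as NaN) and non-literal text end up here, kept as-is.
        return x


def read_sdf_csv(path_or_buffer, separator=","):
    """Hypothetical CSV reader: read first, convert key/category columns afterwards."""
    # No converters here, so an empty cell can no longer abort the whole read.
    df = pd.read_csv(path_or_buffer, compression="infer", sep=separator, index_col=False)
    for col in TUPLE_COLUMNS:
        if col in df.columns:
            # Series.map works on both older and newer pandas versions.
            df[col] = df[col].map(convert_tuple_str)
    return df


# Usage with an in-memory file: the second row has an empty "to key".
demo = read_sdf_csv(io.StringIO(
    "from key,to key\n"
    "\"('db', 'a')\",\"('db', 'b')\"\n"
    "\"('db', 'a')\",\n"
))
print(demo)
```

This only removes the crash; rows with missing keys still come through with NaN keys and would have to be handled (or reported) further down the import.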

However, note that when I simply remove the converter and add the 'Excel implementation', ~75% of my flows come back unlinked. This doesn't make sense, as the same CSV with missing keys, saved as Excel, loads normally. This requires some more investigation.
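As a possible first diagnostic (a hypothetical helper, not part of AB), one could count how many cells in the key columns actually end up as tuples after conversion. If nearly all of them are tuples, the unlinking more likely happens later, when keys are matched against the database, rather than during CSV parsing:

```python
import pandas as pd

KEY_COLUMNS = ["from key", "to key"]


def summarize_key_types(df: pd.DataFrame) -> dict:
    """Count, per key column, how many cells are tuples, missing (NaN/None), or anything else."""
    summary = {}
    for col in KEY_COLUMNS:
        if col not in df.columns:
            continue
        values = df[col]
        is_tuple = values.map(lambda v: isinstance(v, tuple))
        is_missing = values.isna()
        summary[col] = {
            "tuple": int(is_tuple.sum()),
            "missing": int(is_missing.sum()),
            # e.g. key strings that were never converted into tuples
            "other": int((~is_tuple & ~is_missing).sum()),
        }
    return summary
```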

Relevant errors

Traceback (most recent call last):
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\bwutils\superstructure\utils.py", line 70, in wrapper
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\layouts\tabs\LCA_setup.py", line 605, in load_action
    df = ABCSVImporter.read_file(path, separator=separator)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\activity_browser\bwutils\superstructure\file_imports.py", line 170, in read_file
    df = pd.read_csv(path, compression='infer', sep=separator, index_col=False,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 617, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\readers.py", line 1748, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 920, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 1044, in pandas._libs.parsers.TextReader._convert_column_data
  File "parsers.pyx", line 2113, in pandas._libs.parsers._apply_converter
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\meidemtvander\AppData\Local\miniconda3\envs\ab\Lib\ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 0

SyntaxError: invalid syntax

Operating system

Windows 11

Conda environment

No response


romainsacchi · 2024-09-06T15:06:16Z

@marc-vdm I get the same issue (AB 2.10.1), but without missing values for from key and to key:


17:00:12 | INFO | Loading Scenario file. This may take a while for large files
17:00:14 | ERROR | SyntaxError: invalid syntax (<unknown>, line 0)
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/bwutils/superstructure/utils.py", line 81, in wrapper
    result = func(*args)
             ^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/layouts/tabs/LCA_setup.py", line 603, in load_action
    df = ABCSVImporter.read_file(path, separator=separator)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/activity_browser/bwutils/superstructure/file_imports.py", line 223, in read_file
    df = pd.read_csv(
         ^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 1045, in pandas._libs.parsers.TextReader._convert_column_data
  File "parsers.pyx", line 2116, in pandas._libs.parsers._apply_converter
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/ab/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 0
    
SyntaxError: invalid syntax

As mentioned above, my SDF has no missing values in the from key and to key fields; here are the first few lines:

from activity name,from reference product,from location,from categories,from database,from key,to activity name,to reference product,to location,to categories,to database,to key,flow type,remind - SSP2-PkBudg1150 - 2035,remind - SSP2-PkBudg1150 - 2045
"market for heat, district or industrial, natural gas","heat, district or industrial, natural gas",RoW,,ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150,"('ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150', 'bd1f664bf4604b04864aa0bb912da7d7')","1,1-difluoroethane production","1,1-difluoroethane",US,,ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150,"('ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150', 'b226476e310ced5b991672eec4dfeac7')",technosphere,0.0,0.0

romainsacchi · 2024-09-06T15:07:17Z

Could it be that the string representation of the tuples fails to be interpreted?
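The key in the sample row above parses fine with `ast.literal_eval`, so one possibility worth checking is whether some other row in the file has an empty or malformed key cell: because the converter runs on every cell of the column, a single bad cell anywhere aborts the whole read. A minimal sketch to locate such cells, assuming a hypothetical helper name and reading everything as raw text (`dtype=str`, `keep_default_na=False`) so empty cells stay as empty strings:

```python
import ast
import io

import pandas as pd

KEY_COLUMNS = ["from key", "to key"]


def find_unparsable_keys(path_or_buffer, separator=","):
    """Return (row index, column, raw value) for key cells literal_eval cannot parse."""
    # Keep every cell as raw text; empty cells stay "" instead of becoming NaN.
    raw = pd.read_csv(path_or_buffer, sep=separator, index_col=False,
                      dtype=str, keep_default_na=False)
    problems = []
    for col in KEY_COLUMNS:
        if col not in raw.columns:
            continue
        for idx, value in raw[col].items():
            try:
                ast.literal_eval(value)
            except (ValueError, SyntaxError):
                problems.append((idx, col, value))
    return problems


# Usage with an in-memory file: the key from the sample row parses, the empty one does not.
sample = io.StringIO(
    "from key,to key\n"
    "\"('ecoinvent-3.10-cutoff - remind - SSP2-PkBudg1150', 'bd1f664bf4604b04864aa0bb912da7d7')\",\n"
)
print(find_unparsable_keys(sample))
```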

AlesPaulu · 2024-11-10T14:36:50Z

Hi, I have the same issue as Romain: I cannot load a scenario database and get the same syntax error. Has anyone managed to solve the issue yet?

B3n0x · 2024-11-15T18:26:58Z

Hi everyone, I'm encountering the same issue and would like to add my experience:
I need to create a Superstructure (and an SDF file) based on Ecoinvent 3.10 - SSP2 - BASE - consequential, with projections for two years (2025 and 2050). However, when generating the SDF file, six flows are incorrectly created due to missing "to key" fields. Specifically, the problematic flows are:

Output from the Miniconda prompt:

    not found key for consumer ('formaldehyde production, methanol oxidation', 'formaldehyde', None, 'RER', 'kilogram', 'production')
    not found key for consumer ('formaldehyde production, methanol oxidation', 'formaldehyde', None, 'RER', 'kilogram', 'production')
    not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RER', 'kilogram', 'production')
    not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RER', 'kilogram', 'production')
    not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')
    not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')
    not found key for consumer ('hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', None, 'RoW', 'kilogram', 'production')

When the SDF file is loaded into AB, it generates the error in question due to these missing flow keys.

Currently, we’re manually deleting the problematic flows from the SDF file, but the results aren’t satisfactory. Does anyone have suggestions on how to resolve this issue?
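Instead of deleting the affected rows by hand, the same workaround can be scripted. This is a minimal sketch with a hypothetical helper name, assuming the file is read with `dtype=str` and `keep_default_na=False` so empty cells stay as empty strings (and scenario values are written back exactly as they were read). It only drops the flows; it does not recover their missing keys, so results can still differ from those of a complete SDF:

```python
import pandas as pd

KEY_COLUMNS = ["from key", "to key"]


def split_rows_with_empty_keys(df: pd.DataFrame):
    """Return (rows to keep, rows with an empty 'from key' or 'to key')."""
    present = [c for c in KEY_COLUMNS if c in df.columns]
    if not present:
        return df, df.iloc[0:0]
    # Treat NaN and whitespace-only cells as empty as well.
    keys = df[present].fillna("").astype(str)
    empty = keys.apply(lambda s: s.str.strip() == "").any(axis=1)
    return df.loc[~empty], df.loc[empty]


# Example usage (file names and separator are placeholders):
# df = pd.read_csv("scenario.csv", sep=";", index_col=False, dtype=str, keep_default_na=False)
# kept, dropped = split_rows_with_empty_keys(df)
# kept.to_csv("scenario_cleaned.csv", sep=";", index=False)
```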

Thank you in advance for your help.

romainsacchi · 2024-11-16T11:23:55Z

@B3n0x how did you generate the SDF: using the ScenarioLink plugin or premise?

romainsacchi · 2024-11-16T11:30:01Z

@B3n0x:

'formaldehyde production, methanol oxidation', 'formaldehyde', 'RER', 'kilogram'
'hydrocarbons purification, molecular sieve separation of naphtha', 'hexane', 'RER', 'kilogram'

Both exist in ecoinvent 3.10 consequential. Can you check that they also exist in the superstructure DB to which the SDF was meant to point?

B3n0x · 2024-11-16T16:18:54Z

Hi @romainsacchi, thank you for your reply.
We are using the ScenarioLink plugin for SDF generation (and have updated to the latest version of the plugin).

We reviewed the superstructure database by extracting it into an Excel file. The flows in question are included in the superstructure DB with the following key codes:

Activity -> formaldehyde production, methanol oxidation (RER): 9396d13e6fdebf532c026b42b31c9efb
Activity -> hydrocarbons purification, molecular sieve separation of naphtha (RER): 05deffa7a4f6087a134f0ac58fc7a48e
Activity -> hydrocarbons purification, molecular sieve separation of naphtha (RoW): 54e780b19bdedf33b7913bffe36a16f3

We attempted to manually add the missing "to key" values in the SDF file using the reported key codes. Although the SDF file now loads into AB, AB still does not correctly recognize these flows.
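Brightway identifies an activity by a (database name, code) tuple, and the sample SDF rows earlier in this thread store that tuple as its Python repr. When filling keys by hand, one thing worth double-checking is that the database name inside the key matches the superstructure database the SDF is meant to point to, not just the code. A minimal sketch with hypothetical helper names and a placeholder database name:

```python
import ast


def format_key(database: str, code: str) -> str:
    """Render a (database, code) key the way the sample SDF rows store it."""
    return repr((database, code))


def check_key(cell: str, expected_database: str) -> bool:
    """True if the cell parses to a 2-tuple whose first element is the expected database."""
    try:
        key = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return False
    return isinstance(key, tuple) and len(key) == 2 and key[0] == expected_database


# "my-superstructure-db" is a placeholder for the actual superstructure database name.
cell = format_key("my-superstructure-db", "9396d13e6fdebf532c026b42b31c9efb")
print(cell, check_key(cell, "my-superstructure-db"))
```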

To provide a clear overview, the steps we followed in AB are as follows:

- Downloaded the PREMISE (v2.1.3) Remind Ecoinvent 3.10 consequential - SSP2 - BASE using the ScenarioLink plugin.
- Selected the two required years for our analysis (e.g., 2025 and 2030).
- Created the Superstructure database and the SDF file through the AB interface by clicking on "Produce superstructure DB" → "Import".
- During the "writing scenario difference file" step, the error mentioned above appeared in the Miniconda prompt.
- The Superstructure DB and the SDF file were generated, but the "to key" value was missing for the flows listed above.

Additionally, I would like to note that when we created individual databases in AB for each specific year (DB-2025 and DB-2050) separately, no errors occurred.

I hope my explanation was as clear as possible. Apologies for the lengthy description, and thank you in advance for your help.

marc-vdm added the bug label (Issues/PRs related to bugs) on May 29, 2024
