You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi @motin I created a cli tool for running the evaluation and cache results locally and upload to google spreadsheet. It's mostly the same process as in the run_evaluation.py notebook. But somehow I couldn't upload the session results to google spreadsheet.
When I enable this line (or append_data(session_df)), it result in an error
File ~/src/work/gapminder/experiments/gapminder-ai/automation-api/lib/gsheets/gsheets_worksheet_editor.py:90, in GsheetsWorksheetEditor.replace_data(self, df)
88 def replace_data(self, df: pd.DataFrame) -> None:
89 self.data.df = df
---> 90 export_df = self.data.export()
91 gspread_dataframe.set_with_dataframe(self.worksheet, export_df, resize=True)
File ~/src/work/gapminder/experiments/gapminder-ai/automation-api/lib/gsheets/gsheets_worksheet_data.py:94, in GsheetsWorksheetData.export(self)
93 def export(self) -> pa.typing.DataFrame[DfSchemaModel]:
---> 94 return self.restore_current_row_numbers_in_formulas(self.df).rename(
95 columns=self.attributes_to_columns_map
96 )
File ~/src/work/gapminder/experiments/gapminder-ai/automation-api/lib/gsheets/gsheets_worksheet_data.py:88, in GsheetsWorksheetData.restore_current_row_numbers_in_formulas(self, df)
82 row[col_name] = value.replace(
83 self.row_number_placeholder_in_formulas,
84 str(row["__row_number"]),
85 )
86 return row
---> 88 replaced_df = df.apply(
89 restore_current_row_numbers, axis=1, result_type="broadcast"
90 )
91 return replaced_df.drop(columns=["__row_number"])
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/frame.py:9568, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
9557 from pandas.core.apply import frame_apply
9559 op = frame_apply(
9560 self,
9561 func=func,
(...)
9566 kwargs=kwargs,
9567 )
-> 9568 return op.apply().__finalize__(self, method="apply")
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/apply.py:754, in FrameApply.apply(self)
752 # broadcasting
753 if self.result_type == "broadcast":
--> 754 return self.apply_broadcast(self.obj)
756 # one axis empty
757 elif not all(self.obj.shape):
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/apply.py:1013, in FrameColumnApply.apply_broadcast(self, target)
1012 def apply_broadcast(self, target: DataFrame) -> DataFrame:
-> 1013 result = super().apply_broadcast(target.T)
1014 return result.T
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/apply.py:870, in FrameApply.apply_broadcast(self, target)
867 result_compare = target.shape[0]
869 for i, col in enumerate(target.columns):
--> 870 res = self.f(target[col])
871 ares = np.asarray(res).ndim
873 # must be a scalar or 1d
File ~/src/work/gapminder/experiments/gapminder-ai/automation-api/lib/gsheets/gsheets_worksheet_data.py:80, in GsheetsWorksheetData.restore_current_row_numbers_in_formulas.<locals>.restore_current_row_numbers(row)
78 if col_name == "__row_number":
79 continue
---> 80 value = row[col_name]
81 if isinstance(value, str) and len(value) > 0 and value[0] == "=":
82 row[col_name] = value.replace(
83 self.row_number_placeholder_in_formulas,
84 str(row["__row_number"]),
85 )
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/frame.py:3807, in DataFrame.__getitem__(self, key)
3805 if self.columns.nlevels > 1:
3806 return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
3808 if is_integer(indexer):
3809 indexer = [indexer]
File ~/.pyvenv/gapm39/lib/python3.9/site-packages/pandas/core/indexes/base.py:3804, in Index.get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
3807 # InvalidIndexError. Otherwise we fall through and re-raise
3808 # the TypeError.
3809 self._check_indexing_error(key)
KeyError: 'session_id'
But I checked the session_df, it has passed SessionResultsDf validation and the dtypes looks correct.
Then I found that it's possibly caused by the size of session_df, because if I only upload a few rows from session_df, it worked without issue. 1990 rows worked, but 3980 rows not work.
I suggest that, we can check if the DataFrame to be uploaded is too large and give a more meaningful error message instead of KeyError .
Besides, it seems that 3980 rows is not a very big number.. Is it possible to make it work with larger DataFrames? For our usage, I think we are looking at 200 questions * 3 models * 10 rounds * 10 prompts = 60000 rows
The text was updated successfully, but these errors were encountered:
Hi @motin I created a cli tool for running the evaluation and cache results locally and upload to google spreadsheet. It's mostly the same process as in the
run_evaluation.py
notebook. But somehow I couldn't upload the session results to google spreadsheet.gapminder-ai/automation-api/lib/pilot/cli.py
Line 157 in 47851d7
When I enable this line (or
append_data(session_df)
), it result in an errorBut I checked the
session_df
, it has passedSessionResultsDf
validation and the dtypes looks correct.Then I found that it's possibly caused by the size of
session_df
, because if I only upload a few rows fromsession_df
, it worked without issue. 1990 rows worked, but 3980 rows not work.I suggest that, we can check if the DataFrame to be uploaded is too large and give a more meaningful error message instead of
KeyError
.Besides, it seems that 3980 rows is not a very big number.. Is it possible to make it work with larger DataFrames? For our usage, I think we are looking at 200 questions * 3 models * 10 rounds * 10 prompts = 60000 rows
The text was updated successfully, but these errors were encountered: