Export pivoted table into csv #30658
Hey there, @frlm! I'm here to help with any bugs, questions, or contributions you have. Let's tackle this together! The error you're encountering is due to the … Here's a summary of the fix:
This fix should resolve the …
Hi, I have exactly the same issue, and I see you found the solution as well. Best regards
@frlm would you be willing to open a PR with the change? That would be fantastic, and could auto-close this issue if/when merged.
@frim @frabenetti do either of you have an easy-to-reproduce test case? It seems trivial to take this small change and open a PR, but it would be nice to have a failing unit test we can make pass here, or at least a way to manually test the PR for success. CC @villebro @kgabryje in case either of you (or someone you can hot-potato this to) have any interest here.
Hi,
I'm sorry for the late reply; next week I will prepare a PR with the fix.
Best regards
(Sent by email in reply to Evan Rusackas's comment above, posted Fri, 15 Nov 2024, 23:32.)
UPDATE: I implemented the fix and added it in the following PR: #30961
Title: fix(csv_export): use custom CSV_EXPORT parameters in pd.read_csv
Bug description
Function: apply_post_process
The issue is that …
Steps to reproduce error:
Cause: The error is caused by an anomaly in the input DataFrame df, which has the following format: a single column containing all the distinct fields, separated by semicolons.
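To see why a semicolon-delimited export collapses into a single column, here is a minimal sketch using Python's built-in csv module, which, like pd.read_csv, splits on commas by default. The sample data is made up and assumed to have been written with ";" as the separator.

```python
import csv
import io

# Made-up export, assumed to have been written with sep=";".
raw = "country;year;sales\nIT;2023;10\nFR;2024;20\n"

# Read with the default comma delimiter: no commas are found, so each line
# stays one field, i.e. a single column holding the semicolon-joined values.
wrong = list(csv.reader(io.StringIO(raw)))

# Read with the delimiter that was actually used when writing.
right = list(csv.reader(io.StringIO(raw), delimiter=";"))

print(wrong[0])  # ['country;year;sales']
print(right[0])  # ['country', 'year', 'sales']
```

The same mismatch happens in pandas when the writer and the reader disagree on the separator.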
Fix: Read the data back with the CSV_EXPORT settings (separator, encoding, decimal marker) instead of pandas' defaults.
Code Changes:
```python
elif query["result_format"] == ChartDataResultFormat.CSV:
    # Parse with the separator/decimal used when the CSV was written,
    # falling back to pandas' defaults when CSV_EXPORT does not set them.
    df = pd.read_csv(
        StringIO(data),
        delimiter=superset_config.CSV_EXPORT.get("sep", ","),
        encoding=superset_config.CSV_EXPORT.get("encoding"),
        decimal=superset_config.CSV_EXPORT.get("decimal", "."),
    )
```
Complete Code
```python
# Imports this excerpt relies on; post_processors is defined in the same module.
from io import StringIO
from typing import Any, Optional, Union

import pandas as pd

import superset_config  # assumption: the deployment's config module is importable
from superset.common.chart_data import ChartDataResultFormat
from superset.utils.core import extract_dataframe_dtypes


def apply_post_process(
    result: dict[Any, Any],
    form_data: Optional[dict[str, Any]] = None,
    datasource: Optional[Union["BaseDatasource", "Query"]] = None,
) -> dict[Any, Any]:
    form_data = form_data or {}

    viz_type = form_data.get("viz_type")
    if viz_type not in post_processors:
        return result

    post_processor = post_processors[viz_type]

    for query in result["queries"]:
        if query["result_format"] not in (rf.value for rf in ChartDataResultFormat):
            raise Exception(  # pylint: disable=broad-exception-raised
                f"Result format {query['result_format']} not supported"
            )

        data = query["data"]
        if isinstance(data, str):
            data = data.strip()
        if not data:
            # do not try to process empty data
            continue

        if query["result_format"] == ChartDataResultFormat.JSON:
            df = pd.DataFrame.from_dict(data)
        elif query["result_format"] == ChartDataResultFormat.CSV:
            # Parse with the separator/decimal used when the CSV was written,
            # falling back to pandas' defaults when CSV_EXPORT does not set them.
            df = pd.read_csv(
                StringIO(data),
                delimiter=superset_config.CSV_EXPORT.get("sep", ","),
                encoding=superset_config.CSV_EXPORT.get("encoding"),
                decimal=superset_config.CSV_EXPORT.get("decimal", "."),
            )

        # convert all columns to verbose (label) name
        if datasource:
            df.rename(columns=datasource.data["verbose_map"], inplace=True)

        processed_df = post_processor(df, form_data, datasource)

        query["colnames"] = list(processed_df.columns)
        query["indexnames"] = list(processed_df.index)
        query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource)
        query["rowcount"] = len(processed_df.index)

        # Flatten hierarchical columns/index since they are represented as
        # `Tuple[str]`. Otherwise encoding to JSON later will fail because
        # maps cannot have tuples as their keys in JSON.
        processed_df.columns = [
            " ".join(str(name) for name in column).strip()
            if isinstance(column, tuple)
            else column
            for column in processed_df.columns
        ]
        processed_df.index = [
            " ".join(str(name) for name in index).strip()
            if isinstance(index, tuple)
            else index
            for index in processed_df.index
        ]

        if query["result_format"] == ChartDataResultFormat.JSON:
            query["data"] = processed_df.to_dict()
        elif query["result_format"] == ChartDataResultFormat.CSV:
            buf = StringIO()
            processed_df.to_csv(buf)
            buf.seek(0)
            query["data"] = buf.getvalue()

    return result
```
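The essential change is which keyword arguments reach pd.read_csv. A minimal sketch of that mapping, written as a hypothetical helper (csv_read_kwargs is not part of Superset) and assuming CSV_EXPORT is a plain dict of pandas to_csv options:

```python
from typing import Any


def csv_read_kwargs(csv_export: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: map CSV_EXPORT (to_csv options) onto the matching
    pd.read_csv options, falling back to pandas' own defaults."""
    return {
        "delimiter": csv_export.get("sep", ","),
        "encoding": csv_export.get("encoding"),
        "decimal": csv_export.get("decimal", "."),
    }


print(csv_read_kwargs({"encoding": "utf-8"}))
# {'delimiter': ',', 'encoding': 'utf-8', 'decimal': '.'}
print(csv_read_kwargs({"sep": ";", "decimal": ",", "encoding": "utf-8"}))
# {'delimiter': ';', 'encoding': 'utf-8', 'decimal': ','}
```

The result could be unpacked into the call, e.g. `pd.read_csv(StringIO(data), **csv_read_kwargs(CSV_EXPORT))`. The fallbacks matter because a configuration that sets only `encoding` would otherwise pass `decimal=None` instead of the one-character string pandas expects.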
Bug description
Function: pivot_df
Error: The function pivot_df raised a KeyError when trying to pivot the DataFrame due to a missing column.
Log:
Steps to reproduce error:
Click on Download > Export to Pivoted .CSV
Download is blocked by an error.
Cause: The error is caused by an anomaly in the input DataFrame df, which has the following format: a single column containing all the distinct fields, separated by semicolons.
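A quick way to reproduce the same failure mode outside Superset, assuming pandas is installed and using made-up column names (country, year, sales): read a semicolon-separated export with the default comma separator and then pivot on the expected columns. This does not call Superset's pivot_df; it only mimics the KeyError.

```python
import io

import pandas as pd

raw = "country;year;sales\nIT;2023;10\nFR;2024;20\n"

# Default sep="," -> one column literally named "country;year;sales".
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))  # ['country;year;sales']

try:
    df.pivot_table(index="country", columns="year", values="sales")
    raised = False
except KeyError as exc:
    # The expected columns do not exist on their own, so pandas raises.
    raised = True
    print("KeyError:", exc)
```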
Fix: Split the first column on ";", expand it into multiple columns, and then reassign the original column names.
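The split-and-rename approach can be sketched in pandas as follows. This is an illustration under stated assumptions, not the code from the issue: split_joined_column is a hypothetical name, and it assumes the single column's name is the joined header (e.g. "country;year;sales"), which is what pd.read_csv produces when the separator is wrong.

```python
import io

import pandas as pd


def split_joined_column(df: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Hypothetical helper: undo a mis-parse where all fields ended up in a
    single column whose name is the joined header."""
    names = str(df.columns[0]).split(sep)
    # Split each row's string value and spread the parts over new columns.
    expanded = df.iloc[:, 0].astype(str).str.split(sep, expand=True)
    # Reassign the original column names recovered from the header.
    expanded.columns = names
    return expanded


raw = "country;year;sales\nIT;2023;10\nFR;2024;20\n"
bad = pd.read_csv(io.StringIO(raw))  # one column, default sep=","
fixed = split_joined_column(bad)
print(fixed)
```

All values come back as strings, so numeric columns would need `pd.to_numeric` afterwards; reading the data with the right separator in the first place avoids that extra step.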
Code Changes:
Complete Code
Screenshots/recordings
No response
Superset version
4.0.2
Python version
3.10
Node version
16
Browser
Chrome
Additional context
No response
Checklist