Auto column width and row height with multi-line cell values #87

nachtkinder · 2020-10-16T15:41:40Z

Hi,

I'm having an issue with the auto determined column width and row height when using StyleFrame with a pandas DataFrame that contains multi-line values.

import pandas as pd
from styleframe import StyleFrame

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

What I expect test.xlsx to look like is:

   a   |  b
-------|------
Line 1 |Line 1
Line 2 |Line 2
Line 3 |Line 3
-------|------

What it actually looks like is:

                   a                   |                   b
---------------------------------------|--------------------------------------
                 Line 2                |                 Line 2
---------------------------------------|--------------------------------------

"Line 1" and "Line 3" are present in each cell but the row is not rendered with sufficient height for them to be visible.

This behaviour is the same regardless of any wrap_text or width properties set on the columns.

The README.md states:

The StyleFrame object will auto-adjust the columns width and the rows height but they can be changed manually

The API documentation for StyleFrame.to_excel states the formula used to determine a column's width as:

(len(longest_value_in_column) + A_FACTOR) * P_FACTOR

...which results in column widths a little bigger than the largest combined length of all lines plus the newline characters between them in a single value.

Is there a way to instruct StyleFrame to treat multi-line value differently such that:

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

and

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

Versions

Python 3.7.5
StyleFrame 3.0.5
pandas 1.0.4
openpyxl 2.6.3

Described behaviour also present with pandas 1.1.3 and openpyxl 3.0.5

The text was updated successfully, but these errors were encountered:

… calling to_excel. Closes issue #87

DeepSpace2 · 2020-10-16T17:36:20Z

Hi.

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

This is quite easily fixable, see the commit in which I referenced this issue. This will split each row in each column on \n and use the length of the longest line in each row as the length for the calculation.

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

This is a bit tricker because I suspect this is going to be more font-size sensitive.

For now, you can use the following workaround before calling .to_excel:

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_column_width_dict({column: (max(map(len, sf.data_df[column].astype(str).str.split('\n'))) + sf.A_FACTOR) * sf.P_FACTOR
                          for column in sf.data_df.columns})
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n') for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

fiendish · 2021-06-24T20:05:20Z

I don't think that set_row_height_dict works. Pandas doesn't like applying max to Series like that now. Also something about that arithmetic seems off to me.

fiendish · 2021-06-24T20:34:36Z

I have no idea what A_FACTOR and P_FACTOR are supposed to be, but I think rather for height you would want to multiply the number of lines (count("\n")+1) by a padded character height. Width ought to be similar as well counting line characters instead and also account for the column header widths. I don't see how the posted code could accomplish either of those.

DeepSpace2 · 2021-06-24T22:45:37Z

@fiendish

multiply the number of lines (count("\n")+1) by a padded character height

That's essentially what A_FACTOR and P_FACTOR are:

best_fit will attempt to calculate the correct column-width based on the longest value in each provided column. However this isn’t guaranteed to work for all fonts (works best with monospaced fonts). The formula used to calculate a column’s width is equivalent to (len(longest_value_in_column) + A_FACTOR) * P_FACTOR. The default values for A_FACTOR and P_FACTOR are 13 and 1.3 respectively, and can be modified before calling StyleFrame.to_excel by directly modifying StyleFrame.A_FACTOR and StyleFrame.P_FACTOR.

from the docs (although this refers to their usage when calculating columns' width, the same applies when using their values in the suggestion solution).

The above suggested solution does what it says on the tin :)

This

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

however

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n').max() for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

coolbeam · 2021-09-21T16:54:06Z

Hi
I got an error when processing my data. I guess when the table has multiple rows, this code may not work.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Maybe it is because the max operation in sf.set_row_height_dict() line. But I don't know.

Another problem is in apply_style_by_indexes, when no grid or index satisfy the provided condition, there is a key error.
KeyError: "None of [Index([.......dtype='object')] are in the [columns]"

DeepSpace2 · 2021-09-21T19:54:13Z

@coolbeam Indeed, sf.data_df[column].astype(str).str.count('\n') should be sf.data_df[column].astype(str).str.count('\n').max(). I will edit the original comment.

As for your second point, I think that should be a separate issue as I'm not sure this is related to this. Anyway, please provide a reproducible example as I'm not sure I follow.

buhtz · 2022-07-14T07:21:48Z

Maybe related #131

DeepSpace2 added a commit that referenced this issue Oct 16, 2020

Fixed auto column width calculation when using best_fit argument when…

e75d387

… calling to_excel. Closes issue #87

DeepSpace2 added the dont mark as stale label Oct 26, 2020

DeepSpace2 mentioned this issue Oct 16, 2022

Auto Adjusting Column Length and Row Height #34

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto column width and row height with multi-line cell values #87

Auto column width and row height with multi-line cell values #87

nachtkinder commented Oct 16, 2020 •

edited

Loading

DeepSpace2 commented Oct 16, 2020 •

edited

Loading

fiendish commented Jun 24, 2021 •

edited

Loading

fiendish commented Jun 24, 2021 •

edited

Loading

DeepSpace2 commented Jun 24, 2021 •

edited

Loading

coolbeam commented Sep 21, 2021

DeepSpace2 commented Sep 21, 2021 •

edited

Loading

buhtz commented Jul 14, 2022

Auto column width and row height with multi-line cell values #87

Auto column width and row height with multi-line cell values #87

Comments

nachtkinder commented Oct 16, 2020 • edited Loading

Versions

DeepSpace2 commented Oct 16, 2020 • edited Loading

fiendish commented Jun 24, 2021 • edited Loading

fiendish commented Jun 24, 2021 • edited Loading

DeepSpace2 commented Jun 24, 2021 • edited Loading

coolbeam commented Sep 21, 2021

DeepSpace2 commented Sep 21, 2021 • edited Loading

buhtz commented Jul 14, 2022

nachtkinder commented Oct 16, 2020 •

edited

Loading

DeepSpace2 commented Oct 16, 2020 •

edited

Loading

fiendish commented Jun 24, 2021 •

edited

Loading

fiendish commented Jun 24, 2021 •

edited

Loading

DeepSpace2 commented Jun 24, 2021 •

edited

Loading

DeepSpace2 commented Sep 21, 2021 •

edited

Loading