Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto column width and row height with multi-line cell values #87

Open
nachtkinder opened this issue Oct 16, 2020 · 7 comments
Open

Auto column width and row height with multi-line cell values #87

nachtkinder opened this issue Oct 16, 2020 · 7 comments

Comments

@nachtkinder
Copy link

nachtkinder commented Oct 16, 2020

Hi,

I'm having an issue with the auto determined column width and row height when using StyleFrame with a pandas DataFrame that contains multi-line values.

import pandas as pd
from styleframe import StyleFrame

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

What I expect test.xlsx to look like is:

   a   |  b
-------|------
Line 1 |Line 1
Line 2 |Line 2
Line 3 |Line 3
-------|------

What it actually looks like is:

                   a                   |                   b
---------------------------------------|--------------------------------------
                 Line 2                |                 Line 2
---------------------------------------|--------------------------------------

"Line 1" and "Line 3" are present in each cell but the row is not rendered with sufficient height for them to be visible.

This behaviour is the same regardless of any wrap_text or width properties set on the columns.

The README.md states:

The StyleFrame object will auto-adjust the columns width and the rows height but they can be changed manually

The API documentation for StyleFrame.to_excel states the formula used to determine a column's width as:

(len(longest_value_in_column) + A_FACTOR) * P_FACTOR

...which results in column widths a little bigger than the largest combined length of all lines plus the newline characters between them in a single value.

Is there a way to instruct StyleFrame to treat multi-line value differently such that:

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

and

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

Versions

Python 3.7.5
StyleFrame 3.0.5
pandas 1.0.4
openpyxl 2.6.3

Described behaviour also present with pandas 1.1.3 and openpyxl 3.0.5

DeepSpace2 added a commit that referenced this issue Oct 16, 2020
@DeepSpace2
Copy link
Owner

DeepSpace2 commented Oct 16, 2020

Hi.

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

This is quite easily fixable, see the commit in which I referenced this issue. This will split each row in each column on \n and use the length of the longest line in each row as the length for the calculation.

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

This is a bit tricker because I suspect this is going to be more font-size sensitive.

For now, you can use the following workaround before calling .to_excel:

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_column_width_dict({column: (max(map(len, sf.data_df[column].astype(str).str.split('\n'))) + sf.A_FACTOR) * sf.P_FACTOR
                          for column in sf.data_df.columns})
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n') for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

@fiendish
Copy link

fiendish commented Jun 24, 2021

I don't think that set_row_height_dict works. Pandas doesn't like applying max to Series like that now. Also something about that arithmetic seems off to me.

@fiendish
Copy link

fiendish commented Jun 24, 2021

I have no idea what A_FACTOR and P_FACTOR are supposed to be, but I think rather for height you would want to multiply the number of lines (count("\n")+1) by a padded character height. Width ought to be similar as well counting line characters instead and also account for the column header widths. I don't see how the posted code could accomplish either of those.

@DeepSpace2
Copy link
Owner

DeepSpace2 commented Jun 24, 2021

@fiendish

multiply the number of lines (count("\n")+1) by a padded character height

That's essentially what A_FACTOR and P_FACTOR are:

best_fit will attempt to calculate the correct column-width based on the longest value in each provided column. However this isn’t guaranteed to work for all fonts (works best with monospaced fonts). The formula used to calculate a column’s width is equivalent to (len(longest_value_in_column) + A_FACTOR) * P_FACTOR. The default values for A_FACTOR and P_FACTOR are 13 and 1.3 respectively, and can be modified before calling StyleFrame.to_excel by directly modifying StyleFrame.A_FACTOR and StyleFrame.P_FACTOR.

from the docs (although this refers to their usage when calculating columns' width, the same applies when using their values in the suggestion solution).

The above suggested solution does what it says on the tin :)

This

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

image

however

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n').max() for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

image

@coolbeam
Copy link

Hi
I got an error when processing my data. I guess when the table has multiple rows, this code may not work.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Maybe it is because the max operation in sf.set_row_height_dict() line. But I don't know.

Another problem is in apply_style_by_indexes, when no grid or index satisfy the provided condition, there is a key error.
KeyError: "None of [Index([.......dtype='object')] are in the [columns]"

@DeepSpace2
Copy link
Owner

DeepSpace2 commented Sep 21, 2021

@coolbeam Indeed, sf.data_df[column].astype(str).str.count('\n') should be sf.data_df[column].astype(str).str.count('\n').max(). I will edit the original comment.

As for your second point, I think that should be a separate issue as I'm not sure this is related to this. Anyway, please provide a reproducible example as I'm not sure I follow.

@buhtz
Copy link
Contributor

buhtz commented Jul 14, 2022

Maybe related #131

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants