Trim empty rows when converting xlsx to tsv
With this change, Excel-to-tabby conversion preserves empty lines in
the middle of a file (possibly a visual separation of sections in a
many-objects file) but truncates empty lines at the end (possibly an
Excel artefact). This requires iterating over the rows twice (first to
find where the data ends, then to export), but that seems inexpensive.

This should help in situations where an xlsx file saved by Excel (or
Calc) retains blank lines.

One test data file (tsv) used to test round-tripping is altered to
remove empty lines at the end. As a result, we no longer guarantee
round-tripping of these empty lines, but I feel this was a non-feature.
mslw committed Nov 21, 2023
1 parent 0408aaf commit cae025a
Showing 2 changed files with 10 additions and 5 deletions.
11 changes: 10 additions & 1 deletion datalad_tabby/io/xlsx.py
@@ -95,4 +95,13 @@ def _sheet2tsv(ws: Worksheet, dest: Path):
             tsvfile,
             delimiter='\t',
         )
-        writer.writerows(ws.iter_rows(values_only=True))
+
+        # find the last nonempty row
+        max_idx = 1
+        for i, row in enumerate(ws.iter_rows(values_only=True)):
+            if any(v is not None for v in row):
+                max_idx = i + 1  # max row is a 1-based index
+
+        # write tsv, truncating empty rows at the end
+        writer.writerows(ws.iter_rows(values_only=True, max_row=max_idx))
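
The two-pass trimming described in the commit message can be illustrated without a spreadsheet. The following is a minimal sketch, not the project's code: it assumes rows arrive as plain tuples in which an empty cell is `None` (as `iter_rows(values_only=True)` yields them), and the helper names `trim_trailing_empty_rows` and `rows_to_tsv` are hypothetical. Like the diff above, it starts the cut-off at 1, so a non-empty input always keeps at least its first row.

```python
import csv
import io
from typing import Any, Iterable, List, Sequence


def trim_trailing_empty_rows(rows: Iterable[Sequence[Any]]) -> List[Sequence[Any]]:
    """Drop empty rows at the end; keep empty rows in the middle.

    Hypothetical helper mirroring the logic of the diff above.
    """
    rows = list(rows)
    # First pass: find the 1-based index of the last row with any value.
    max_idx = 1
    for i, row in enumerate(rows):
        if any(v is not None for v in row):
            max_idx = i + 1
    # Second pass (here a slice): keep everything up to that row.
    return rows[:max_idx]


def rows_to_tsv(rows: Iterable[Sequence[Any]]) -> str:
    """Serialize rows as TSV text; csv.writer writes None as an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t')
    writer.writerows(trim_trailing_empty_rows(rows))
    return buf.getvalue()


rows = [
    ("a", "b"),
    (None, None),   # interior blank row: preserved
    ("c", None),
    (None, None),   # trailing blank rows: dropped
    (None, None),
]
trimmed = trim_trailing_empty_rows(rows)
```

Here `trimmed` keeps the first three rows, including the interior blank one. A consequence of this design is the one noted in the commit message: a TSV file that ends in blank lines will not come back unchanged after a round trip through xlsx.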

4 changes: 0 additions & 4 deletions datalad_tabby/tests/data/demorecord/tabbydemo_files.tsv
@@ -2,7 +2,3 @@ path[POSIX] size[bytes] checksum[md5] url
 raw/adelie.csv 23755 e7e2be6b203a221949f05e02fcefd853 https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.219.3&entityid=002f3893385f710df69eeebe893144ff
 raw/gentoo.csv 11263 1549566fb97afa879dc9446edcf2015f https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.3&entityid=e03b43c924f226486f2f0ab6709d2381
 raw/chinstrap.csv 18872 e4b0710c69297031d63866ce8b888f25 https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.221.2&entityid=fe853aa8f7a59aa84cdd3197619ef462
-
-
-
-