Parse the tabular data in the input file:
import fables
for parse_result in fables.parse('myfile.zip'):
for table in parse_result.tables:
print(table.name)
print(table.df.head())
for error in parse_result.errors:
print(error.message)
Inspect the contents of the input file:
node = fables.detect('myfile.zip')
print(node.name)
print(node.mimetype)
for child in node.children:
print(child.name)
print(child.mimetype)
Note if you've already discovered the input tree from detect()
,
you can pass it into parse()
:
parse_results = parse(tree=node)
Handling encrypted zip
, xlsx
, and xls
files:
node = fables.detect('encrypted.xlsx')
assert node.encrypted
node.add_password('encrypted.xlsx', 'fables')
assert not node.encrypted
You can also supply a passwords dictionary (filename -> password) into detect and parse:
node = fables.detect(
'encrypted.zip',
passwords={
'encrypted.zip': 'fables',
# an encrypted file inside the zip
'encrypted.xlsx': 'foobles',
}
)
# and/or parse
parse_results = fables.parse(
'sub_dir',
passwords={
'sub_dir/encrypted.xlsx': 'fables',
'sub_dir/encrypted.xls': 'foobles',
},
)
The python library python-magic
requires additional system dependencies. There are installation instructions
there, but here are recommended routes to try:
-
on OSX:
brew install libmagic
. -
on Windows: this
pip install python-magic-bin
will install a built version using ctypes to access the libmagic file type identification library.
Then pip install -r requirements.txt
should do the trick.
- all tests:
pytest
- coverage:
pytest --cov=fables tests
- coverage:
- integration:
pytest tests/integration
- coverage:
pytest --cov=fables tests/integration
- coverage:
- unit:
pytest tests/unit
- coverage:
pytest --cov=fables tests/unit
- coverage:
Note all the coverage statistics are for statements.
mypy fables
- We enforce flake8:
flake8 .
nox