HTML Table Extractor

Description

This Python script extracts tables from HTML files and converts them into Excel spreadsheets. It handles hyperlinks by creating separate URL columns and can add a custom 'type' column to all extracted tables.

Features

Extracts all tables from a given HTML file
Preserves hyperlinks by creating separate URL columns
Adds a custom 'type' column to all tables
Outputs data to an Excel file with multiple sheets (one per table)
Prepends "https://scan.merlinchain.io" to all extracted URLs
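The "one sheet per table" behaviour can be sketched with pandas and openpyxl. This is a minimal illustration, not the script's actual code: the helper name write_tables_to_excel and the sheet names "Table_1", "Table_2", ... are assumptions.

```python
import pandas as pd


def write_tables_to_excel(dataframes, output_file):
    """Write each DataFrame to its own sheet in a single workbook.

    Hypothetical helper: sheet names of the form Table_<n> are an assumption.
    Expects at least one DataFrame, since a workbook needs at least one sheet.
    """
    with pd.ExcelWriter(output_file, engine="openpyxl") as writer:
        for index, df in enumerate(dataframes, start=1):
            # index=False keeps the pandas row index out of the spreadsheet.
            df.to_excel(writer, sheet_name=f"Table_{index}", index=False)
```

Writing through a single ExcelWriter keeps all tables in one file instead of producing one workbook per table.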

Requirements

Python 3.x
pandas
BeautifulSoup4
openpyxl

Usage

Run the script (extract_tables.py) from the command line with two arguments, in this order:

Path to the HTML file
Value for the 'type' column
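As a rough sketch, the two arguments could be read from sys.argv as shown below. The helper name parse_args, the usage message, and the sample values page.html and token are hypothetical; only the script name extract_tables.py and the argument order come from this README.

```python
import sys


def parse_args(argv):
    """Return (html_path, type_value) from an argv-style list.

    Hypothetical helper: exits with a usage message unless exactly two
    arguments follow the script name.
    """
    if len(argv) != 3:
        sys.exit("Usage: python extract_tables.py <path_to_html_file> <type_value>")
    return argv[1], argv[2]


# Example invocation (file name and type value are made up):
#   python extract_tables.py page.html token
```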

Output

The script generates an Excel file named '[original_filename]_tables.xlsx' in the same directory as the input HTML file. Each table from the HTML file is placed in a separate sheet within the Excel file.
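The naming rule above can be expressed with pathlib. This is a sketch under the assumption that "[original_filename]" means the input file name without its extension; the function name output_path_for is hypothetical.

```python
from pathlib import Path


def output_path_for(html_path):
    """Build '<stem>_tables.xlsx' next to the input HTML file."""
    source = Path(html_path)
    # with_name swaps only the final path component, so the directory is kept.
    return source.with_name(f"{source.stem}_tables.xlsx")
```

For an input of data/page.html this yields page_tables.xlsx inside the same data directory.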

Main Functions

read_html_file(file_path): Reads the HTML file
html_table_to_dataframe(table, type_value): Converts an HTML table to a pandas DataFrame
extract_tables_to_excel(html_content, output_file, type_value): Extracts all tables and saves them to an Excel file
main(): Handles command-line arguments and orchestrates the extraction process
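To make the conversion step more concrete, here is one way html_table_to_dataframe could be written. It is a minimal sketch, not the script's real implementation: it assumes the first row holds the headers, that each cell contains at most one link, and that URL columns are named "<header>_URL" (that naming scheme is an assumption).

```python
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://scan.merlinchain.io"  # prefix stated in this README


def html_table_to_dataframe(table, type_value):
    """Convert one <table> element into a DataFrame with URL and type columns."""
    rows = table.find_all("tr")
    if not rows:
        return pd.DataFrame()
    # Assumption: the first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        record = {}
        for header, cell in zip(headers, row.find_all("td")):
            record[header] = cell.get_text(strip=True)
            link = cell.find("a", href=True)
            # Assumed naming scheme for the separate URL column.
            record[f"{header}_URL"] = BASE_URL + link["href"] if link else ""
        records.append(record)
    df = pd.DataFrame(records)
    df["type"] = type_value  # the same value is repeated on every row
    return df


# Usage with a made-up one-row table:
sample_html = (
    "<table><tr><th>Hash</th><th>Age</th></tr>"
    "<tr><td><a href='/tx/0xabc'>0xabc</a></td><td>1 min</td></tr></table>"
)
sample_table = BeautifulSoup(sample_html, "html.parser").find("table")
sample_df = html_table_to_dataframe(sample_table, "token")
```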

Notes

The script assumes that links in the HTML are relative URLs and prepends "https://scan.merlinchain.io" to every extracted URL.
URL columns that contain no links are automatically removed from the output.
Use at your own risk.
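The removal of empty URL columns mentioned in the notes could look like the sketch below. The helper name drop_empty_url_columns and the "_URL" column suffix are assumptions made for illustration; the actual script may identify URL columns differently.

```python
import pandas as pd


def drop_empty_url_columns(df, suffix="_URL"):
    """Drop every URL column whose cells are all empty strings or missing."""
    empty = [
        col for col in df.columns
        if str(col).endswith(suffix) and df[col].fillna("").eq("").all()
    ]
    return df.drop(columns=empty)


# Usage with made-up data: Age_URL holds no links, so it is dropped.
example = pd.DataFrame({
    "Hash": ["0xabc"],
    "Hash_URL": ["https://scan.merlinchain.io/tx/0xabc"],
    "Age": ["1 min"],
    "Age_URL": [""],
})
cleaned = drop_empty_url_columns(example)
```

Columns that do not end in the suffix are left alone even when empty, so only link columns are affected.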

Repository: hattizai/extract_tables_from_html
