python-docx-reader

A simple Microsoft Word .docx reader for Python.

Parses paragraphs, graphics, and inline equations (equations are converted to TeX).

Requirements

  • Python 2.7
  • lxml>=3.4.1

Installation

python setup.py install

Usage

from docx.document import Document
doc = Document('path/to/your/docx/file')
# or doc = Document('path/to/your/docx/file', graphics=True, equations=True)

# Get a generator over all paragraphs
paragraphs = doc.paragraphs
# Iterate over paragraphs and print paragraph text, graphics, and equations
for paragraph in paragraphs:
    print(paragraph.text)
    print(paragraph.graphics)
    print(paragraph.equations)
# Get all of the text, graphics, and equations in the document
print(doc.text)
print(doc.graphics)
print(doc.equations)
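
Since the usage above describes doc.paragraphs as a generator, it can presumably be iterated only once. A minimal sketch of one way to keep the results around for repeated use follows. The names ParagraphRecord and collect_paragraphs are hypothetical helpers written for this example and are not part of python-docx-reader; the sketch only assumes that each paragraph exposes the .text, .graphics, and .equations attributes shown above.

```python
from collections import namedtuple

# Hypothetical record type for illustration; not part of python-docx-reader.
ParagraphRecord = namedtuple("ParagraphRecord", ["text", "graphics", "equations"])


def collect_paragraphs(doc):
    """Materialize doc.paragraphs into a list of ParagraphRecord tuples.

    Assumes each paragraph exposes .text, .graphics, and .equations,
    as in the usage example above. The attribute values are stored
    unchanged, whatever types the library returns for them.
    """
    records = []
    for paragraph in doc.paragraphs:
        records.append(ParagraphRecord(
            text=paragraph.text,
            graphics=paragraph.graphics,
            equations=paragraph.equations,
        ))
    return records
```

With a doc created as in the usage above, records = collect_paragraphs(doc) gives a plain list that can be indexed, filtered, or looped over more than once, for example to keep only the paragraphs whose equations attribute is non-empty.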