Add OneNote support #55
base: main
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,9 @@ It presently supports: | |
- Audio (EXIF metadata, and speech transcription) | ||
- HTML (special handling of Wikipedia, etc.) | ||
- Various other text-based formats (csv, json, xml, etc.) | ||
- OneNote (.one) | ||
|
||
Note: OneNote is not supported. | ||
|
||
# Installation | ||
|
||
|
@@ -51,6 +54,18 @@ result = md.convert("example.jpg") | |
print(result.text_content) | ||
``` | ||
|
||
To convert OneNote files, you can use the following example: | ||
Review comment: This example is redundant; there's already one for converting files.
||
|
||
```python | ||
from markitdown import MarkItDown | ||
|
||
markitdown = MarkItDown() | ||
result = markitdown.convert("example.one") | ||
print(result.text_content) | ||
``` | ||
|
||
Note: For OneNote support, the `one-extract` package is used. | ||
|
||
## Contributing | ||
|
||
This project welcomes contributions and suggestions. Most contributions require you to agree to a | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,7 @@ dependencies = [ | |
"youtube-transcript-api", | ||
"SpeechRecognition", | ||
"pathvalidate", | ||
"onenote", | ||
Review comment: Is this the correct package name? I couldn't find it in the pip registry, but found this one instead: https://pypi.org/project/one-extract/. Can you link to the appropriate one?
||
] | ||
|
||
[project.urls] | ||
|
@@ -76,3 +77,5 @@ exclude_lines = [ | |
"if __name__ == .__main__.:", | ||
"if TYPE_CHECKING:", | ||
] | ||
|
||
# Note: OneNote is not supported. |
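The package-name question in this file comes down to which PyPI distribution actually provides OneNote parsing: `pyproject.toml` lists `onenote`, while the README says `one-extract` is used. A minimal sketch for checking which candidate is installed in the current environment, using only the standard library's `importlib.metadata`. It does not decide which package is the right dependency, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `dist_name` is the PyPI distribution name (what goes in pyproject.toml),
    which can differ from the name you `import`.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Candidate names taken from this PR's pyproject.toml and README.
    for name in ("onenote", "one-extract"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Checking the distribution name this way catches the mismatch between the dependency list and the README before anyone tries `pip install`.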
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,6 +87,15 @@ | |
"data:image/svg+xml,%3Csvg%20width%3D", | ||
] | ||
|
||
ONENOTE_TEST_STRINGS = [ | ||
"# Section 1", | ||
"## Page 1", | ||
"This is a test OneNote page.", | ||
"# Section 2", | ||
"## Page 2", | ||
"Another test OneNote page.", | ||
] | ||
|
||
|
||
@pytest.mark.skipif( | ||
skip_remote, | ||
|
@@ -164,6 +173,12 @@ def test_markitdown_local() -> None: | |
for test_string in SERP_TEST_STRINGS: | ||
assert test_string in text_content | ||
|
||
# Test OneNote processing | ||
result = markitdown.convert(os.path.join(TEST_FILES_DIR, "test.one")) | ||
Review comment: Did you forget to add the test file?
||
text_content = result.text_content.replace("\\", "") | ||
for test_string in ONENOTE_TEST_STRINGS: | ||
assert test_string in text_content | ||
|
||
|
||
@pytest.mark.skipif( | ||
skip_exiftool, | ||
|
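The OneNote test strips backslashes from the converted text before matching, presumably so that Markdown escapes do not break the comparison, and then asserts that every entry of `ONENOTE_TEST_STRINGS` is present. The review also notes that the `test.one` fixture appears to be missing from the PR. A minimal sketch, with hypothetical helper names `missing_strings` and `fixture_exists`, of the same check written so a failure reports exactly which strings are missing, plus a stdlib-only existence check for the fixture. It does not call MarkItDown.

```python
import os
from typing import List

ONENOTE_TEST_STRINGS = [
    "# Section 1",
    "## Page 1",
    "This is a test OneNote page.",
    "# Section 2",
    "## Page 2",
    "Another test OneNote page.",
]


def missing_strings(text_content: str, expected: List[str]) -> List[str]:
    """Return the expected strings that do not appear in the converted text.

    Backslashes are removed first, mirroring the test in this PR, so that
    Markdown escapes such as "\\#" do not cause false failures.
    """
    cleaned = text_content.replace("\\", "")
    return [s for s in expected if s not in cleaned]


def fixture_exists(test_files_dir: str, name: str = "test.one") -> bool:
    # Hypothetical guard: lets a test skip instead of erroring when the
    # binary fixture was not committed alongside the code change.
    return os.path.isfile(os.path.join(test_files_dir, name))


if __name__ == "__main__":
    sample = "# Section 1\n## Page 1\nThis is a test OneNote page.\n"
    print(missing_strings(sample, ONENOTE_TEST_STRINGS))
```

In the real test file, `fixture_exists(TEST_FILES_DIR)` could feed a `pytest.mark.skipif`, the same way the file already gates tests on `skip_remote` and `skip_exiftool`.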
@@ -179,7 +194,7 @@ def test_markitdown_exiftool() -> None: | |
assert target in result.text_content | ||
|
||
|
||
if __name__ == "__main__": | ||
if __name__main__": | ||
Review comment: It looks like there's a small typo in the
Reply: Thanks for checking. I've already fixed it.
||
"""Runs this file's tests from the command line.""" | ||
test_markitdown_remote() | ||
test_markitdown_local() | ||
|
Review comment: Is this a typo? The comment doesn't seem consistent.