#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Highly recommend installing using `pip install -U .` not `python setup.py install`
Uses pkgutil-style namespace package (Working on figuring out PEP 420)
Note: careful not to conflate install_requires with requirements.txt
https://packaging.python.org/discussions/install-requires-vs-requirements/
Reluctantly use setuptools for now to get install_requires & long_description_content_type
$ python -c "import amara3; import amara3.iri; import amara3.uxml; import amara3.uxml.version; print(amara3.uxml.version.version_info)"
('3', '0', '1')
'''
import sys
from setuptools import setup, Extension
#from distutils.core import setup, Extension
#from distutils.core import Extension
PROJECT_NAME = 'amara3.xml'
PROJECT_DESCRIPTION = 'Amara3 project, which offers a variety of data processing tools. This module adds the MicroXML support, and adaptation to classic XML.'
PROJECT_LICENSE = 'License :: OSI Approved :: Apache Software License'
PROJECT_AUTHOR = 'Uche Ogbuji'
PROJECT_AUTHOR_EMAIL = '[email protected]'
PROJECT_URL = 'https://github.com/uogbuji/amara3-xml'
PACKAGE_DIR = {'amara3': 'pylib'}
PACKAGES = [
    'amara3.uxml',
    'amara3.uxml.uxpath',
]
SCRIPTS = [
    'exec/microx',
]
CORE_REQUIREMENTS = [
    'amara3.iri>=3.0.1',
    'nameparser',
    'pytest',
    'ply',
    'html5lib',
]
# From http://pypi.python.org/pypi?%3Aaction=list_classifiers
CLASSIFIERS = [
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Development Status :: 4 - Beta",
    #"Environment :: Other Environment",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: Apache Software License",
    "Operating System :: OS Independent",
    "Topic :: Software Development :: Libraries :: Python Modules",
    "Topic :: Internet :: WWW/HTTP",
]
KEYWORDS = ['xml', 'web', 'data']
version_file = 'pylib/uxml/version.py'
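# Run version.py in this namespace to define version_info; the with block closes the file promptly
with open(version_file, 'rb') as fp:
    exec(compile(fp.read(), version_file, 'exec'), globals(), locals())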
__version__ = '.'.join(version_info)
#If you run into a problem with a missing limits.h on Ubuntu/Mint, try:
#sudo apt-get install libc6-dev
cxmlstring = Extension('amara3.cmodules.cxmlstring', sources=['clib/xmlstring.c'], include_dirs=['clib'])
# Raw string, so the backslash escapes in the code samples below reach the README verbatim
LONGDESC = r'''# Amara 3 XML
Python 3 tools for processing [MicroXML](http://www.w3.org/community/microxml/), a simplification of XML. Amara 3 XML implements the MicroXML data model, and allows you to parse into this from traditional XML and MicroXML.

The `microx` command line tool is especially useful for quick query and processing of XML.
## Install
Requires Python 3.4+. Just run:
```
pip install amara3.xml
```
## Use
Though Amara 3 is focused on MicroXML rather than full XML, the reality is that
most of the XML-like data you’ll be dealing with is full XML
1.0. This package provides capabilities to parse legacy XML and reduce it to
MicroXML. In many cases the biggest implication of this is that
namespace information is stripped. As long as you know what you’re doing
you can get pretty far by ignoring this, but make sure you know what
you’re doing.

    from amara3.uxml import xml

    MONTY_XML = """<monty xmlns="urn:spam:ignored">
    <python spam="eggs">What do you mean "bleh"</python>
    <python ministry="abuse">But I was looking for argument</python>
    </monty>"""

    builder = xml.treebuilder()
    root = builder.parse(MONTY_XML)
    print(root.xml_name) #"monty"
    children = iter(root.xml_children) #One shared iterator, so each next() advances
    child = next(children)
    print(child) #First text node: "\n "
    child = next(children)
    print(child.xml_value) #"What do you mean \"bleh\""
    print(child.xml_attributes["spam"]) #"eggs"

There are some utilities to make this a bit easier as well.

    from amara3.uxml import xml
    from amara3.uxml.treeutil import select_name, select_attribute

    MONTY_XML = """<monty xmlns="urn:spam:ignored">
    <python spam="eggs">What do you mean "bleh"</python>
    <python ministry="abuse">But I was looking for argument</python>
    </monty>"""

    builder = xml.treebuilder()
    root = builder.parse(MONTY_XML)
    py1 = next(select_name(root, "python"))
    print(py1.xml_value) #"What do you mean \"bleh\""
    py2 = next(select_attribute(root, "ministry", "abuse"))
    print(py2.xml_value) #"But I was looking for argument"

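
To make the namespace stripping described above concrete, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`, not Amara. It shows where the namespace lives in a classic XML parse and what remains once it is dropped. `local_name` is a hypothetical helper written for this illustration only.

```python
import xml.etree.ElementTree as ET

MONTY_XML = """<monty xmlns="urn:spam:ignored">
<python spam="eggs">What do you mean "bleh"</python>
<python ministry="abuse">But I was looking for argument</python>
</monty>"""


def local_name(tag):
    # Hypothetical helper: ElementTree spells a namespaced tag as "{uri}local";
    # keep only the part after the closing brace.
    return tag.rpartition("}")[2]


root = ET.fromstring(MONTY_XML)
print(root.tag)              # {urn:spam:ignored}monty
print(local_name(root.tag))  # monty

# Child elements inherit the default namespace, so they are stripped the same way.
names = [local_name(el.tag) for el in root]
print(names)                 # ['python', 'python']
```

Two documents that differ only in their namespace URI look identical after this step, which is why the caution above matters.
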
## Experimental MicroXML parser
For this parser the input truly must be MicroXML. Basics:

    >>> from amara3.uxml.parser import parse
    >>> events = parse('<hello><bold>world</bold></hello>')
    >>> for ev in events: print(ev)
    ...
    (<event.start_element: 1>, 'hello', {}, [])
    (<event.start_element: 1>, 'bold', {}, ['hello'])
    (<event.characters: 3>, 'world')
    (<event.end_element: 2>, 'bold', ['hello'])
    (<event.end_element: 2>, 'hello', [])
    >>>

Or…And now for something completely different!…Incremental parsing.

    >>> from amara3.uxml.parser import parsefrags
    >>> events = parsefrags(['<hello', '><bold>world</bold></hello>'])
    >>> for ev in events: print(ev)
    ...
    (<event.start_element: 1>, 'hello', {}, [])
    (<event.start_element: 1>, 'bold', {}, ['hello'])
    (<event.characters: 3>, 'world')
    (<event.end_element: 2>, 'bold', ['hello'])
    (<event.end_element: 2>, 'hello', [])

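
As a sketch of how the event tuples printed above might be consumed, the following folds them into nested lists. The `event` enum here is a local stand-in whose member names and values are copied from the printed output. `build_tree` and the `[name, attributes, children]` layout are assumptions made for illustration, not part of the package.

```python
from enum import Enum


class event(Enum):
    # Stand-in for the parser's event enum (assumed shape, see lead-in).
    start_element = 1
    end_element = 2
    characters = 3


def build_tree(events):
    # Fold (event, ...) tuples into nested [name, attributes, children] lists.
    root = None
    stack = []  # Currently open elements, innermost last
    for ev in events:
        kind = ev[0]
        if kind is event.start_element:
            node = [ev[1], dict(ev[2]), []]
            if stack:
                stack[-1][2].append(node)
            else:
                root = node
            stack.append(node)
        elif kind is event.characters:
            if stack:
                stack[-1][2].append(ev[1])
        elif kind is event.end_element:
            stack.pop()
    return root


# The same five events shown in the first session above
EVENTS = [
    (event.start_element, "hello", {}, []),
    (event.start_element, "bold", {}, ["hello"]),
    (event.characters, "world"),
    (event.end_element, "bold", ["hello"]),
    (event.end_element, "hello", []),
]

tree = build_tree(EVENTS)
print(tree)  # ['hello', {}, [['bold', {}, ['world']]]]
```

The stack mirrors the ancestor list carried in each tuple, so a consumer that only needs context can read that last field instead of tracking its own stack.
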
## Implementation notes
Switched to a hand-crafted parser because:
1) Worried about memory consumption of the needed PLY lexer
2) Lack of incremental feed parse for PLY
3) Inspiration from James Clark's JS parser https://github.com/jclark/microxml-js/blob/master/microxml.js
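
For comparison, the incremental feed style mentioned in point 2 looks like this with the standard library's `XMLPullParser`. This only illustrates the feed-and-drain pattern on classic XML; it is not the hand-crafted MicroXML parser, and `iter_events` is a hypothetical helper.

```python
from xml.etree.ElementTree import XMLPullParser


def iter_events(fragments):
    # Hypothetical helper: feed text fragments one at a time and yield
    # (kind, tag) pairs as soon as the parser makes them available.
    parser = XMLPullParser(events=("start", "end"))
    for frag in fragments:
        parser.feed(frag)
        for kind, elem in parser.read_events():
            yield kind, elem.tag
    # Closing flushes anything the parser was still holding back.
    parser.close()
    for kind, elem in parser.read_events():
        yield kind, elem.tag


# Same fragments as the parsefrags session above; the first one ends mid-tag.
events = list(iter_events(["<hello", "><bold>world</bold></hello>"]))
print(events)
# [('start', 'hello'), ('start', 'bold'), ('end', 'bold'), ('end', 'hello')]
```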
----
Author: [Uche Ogbuji](http://uche.ogbuji.net) <[email protected]>
'''
LONGDESC_CTYPE = 'text/markdown'
setup(
    #namespace_packages=['amara3'],
    name=PROJECT_NAME,
    version=__version__,
    description=PROJECT_DESCRIPTION,
    license=PROJECT_LICENSE,
    author=PROJECT_AUTHOR,
    author_email=PROJECT_AUTHOR_EMAIL,
    #maintainer=PROJECT_MAINTAINER,
    #maintainer_email=PROJECT_MAINTAINER_EMAIL,
    url=PROJECT_URL,
    package_dir=PACKAGE_DIR,
    packages=PACKAGES,
    scripts=SCRIPTS,
    ext_modules=[cxmlstring],
    install_requires=CORE_REQUIREMENTS,
    classifiers=CLASSIFIERS,
    long_description=LONGDESC,
    long_description_content_type=LONGDESC_CTYPE,
    keywords=KEYWORDS,
)