chrysanthème
jlmeunier committed Nov 21, 2019
1 parent e01af44 commit 1ae8878
Showing 233 changed files with 20,814 additions and 4,995 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
.cache/
.settings/
__pycache__/
*.bak
3 changes: 2 additions & 1 deletion LICENSE
@@ -1,6 +1,7 @@
BSD 3-Clause License

Copyright (c) 2016, Transkribus
Copyright (c) 2016-2019, NAVER LABS Europe

All rights reserved.

Redistribution and use in source and binary forms, with or without
13 changes: 10 additions & 3 deletions README.md
@@ -1,17 +1,24 @@
# TranskribusDU
Document Understanding tools

### Requirements, installation & testing
Updated: 2019-11-20

### Requirements, installation

#### Python

* Install [Python] 3.x

We recommend installing __anaconda3__ . You can then train using pystruct and/or tensorflow (both to be installed on top of anaconda).

* conda install shapely rtree
* conda install shapely rtree lxml scipy
* pip install future scikit-learn pytest --upgrade

To learn with pystruct (using a graph-CRF model):
* pip install cvxopt ad3 pystruct --upgrade

To learn with Tensorflow (using an Edge Convolutional Network):
* conda install -c anaconda tensorflow(-gpu)
* pip install future lxml scipy scikit-learn pytest cvxopt ad3 pystruct --upgrade

### Usage
* see use-cases
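
The installation steps above can be sanity-checked with a short script. This is only a sketch and not part of the repository: the helper `missing_modules` is hypothetical, and the import names are assumptions derived from the package names in the README (for instance, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed from the README package list (scikit-learn -> "sklearn").
CORE = ["shapely", "rtree", "lxml", "scipy", "future", "sklearn", "pytest"]
CRF = ["cvxopt", "ad3", "pystruct"]   # only needed for the pystruct / graph-CRF route
ECN = ["tensorflow"]                  # only needed for the Tensorflow / ECN route


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("core", CORE), ("pystruct", CRF), ("tensorflow", ECN)):
        missing = missing_modules(names)
        print(label, "OK" if not missing else "missing: " + ", ".join(missing))
```

Only the group matching the learner you plan to use needs to report OK in addition to the core group.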
50 changes: 32 additions & 18 deletions RELEASE_NOTES.txt
@@ -2,11 +2,36 @@
RELEASE NOTES - TranskribusDU
-----------------------------

--- Edelweiss - 2019-07-22
- ECN/GAT
- conjugate
- --g1 --g2
- table understanding
--- Chrysanthème - 2019-11-21
- ICDAR19 papers are reproducible
- major code reorganisation
- Multipage XML bug fixes
- standard projection profile method
- convex hull for cluster Coords
- ECN ensemble bug fix
- various bug fixes
- --server mode
- segmentation task using agglomerative clustering
- Json input
- pipe example
- table reconstruction
- generic features (when no page info)
- edge features reworked
- cluster evaluation metrics


--- Iris - 2019-04-25
- CRF, ECN GAT supported
- conjugate mode supported
- --vld option to specify a validation set, or a ratio of validation graphs
taken from the training set. The best model on validation set is kept.
- --graph option to store the edges in the output XML
- --max_iter applies to all learning methods
- --seed to seed the randomizer with a constant
- dynamic load of the learners
- major code re-organization
- for example of use, see in tasks: DU_TABLE_BIO.py or DU_Table_Row_Edge.py


--- Jonquille - 2017-04-28
- multi-type classification supported
@@ -30,21 +55,10 @@


---------------------------------------------------------------------------------
Copyright (C) 2016, 2017 H. Déjean, JL Meunier

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Copyright (C) 2016-2019 H. Déjean, JL Meunier

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Developed for the EU project READ. The READ project has received funding
from the European Union's Horizon 2020 research and innovation programme
under grant agreement No 674943.
under grant agreement No 674943.
11 changes: 7 additions & 4 deletions TranskribusDU/ObjectModel/XMLDSBASELINEClass.py
@@ -45,12 +45,10 @@ def computePoints(self):
if self.lPoints is None:
self.lPoints = self.getAttribute('blpoints')
# print 'after split?',self.lPoints

self.lPoints = self.lPoints.replace(" ",",")
if self.lPoints is not None:
lX = list(map(lambda x:float(x),self.lPoints.split(',')))[0::2]
lY = list(map(lambda x:float(x),self.lPoints.split(',')))[1::2]
self.lPoints = list(zip(lX,lY))
self.lPoints = zip(lX,lY)
# lY.sort()
# if len(lY)> 10: ## if basline automatically generated: beg and end noisy
# lY= lY[1:-2]
@@ -67,9 +65,14 @@ def computePoints(self):
import numpy as np
a,b = np.polyfit(lX, lY, 1)
self.setAngle(a)
self.setBx(b)
# ymax = a * self.getX2() +b
# ymin = a*self.getX() + b
# import libxml2
# verticalSep = libxml2.newNode('PAGEBORDER')
# verticalSep.setProp('points', '%f,%f,%f,%f'%(self.getX(),ymin,self.getX2(),ymax))
# # print 'p',self.getParent()
# # print 'pp',self.getParent().getParent()
# self.getParent().getNode().addChild(verticalSep)

"""
TO simulate 'DS' objects
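
The point-parsing logic touched by this hunk can be illustrated in isolation. The sketch below is not the repository's code: `parse_points` and `mean_y` are hypothetical helpers, and the input format (`x,y` pairs separated by spaces) is assumed from the `replace(" ", ",")` and `split(',')` calls in the diff. It also shows why wrapping `zip` in `list` matters under Python 3, where a bare `zip` is a one-shot iterator.

```python
def parse_points(s):
    """Parse 'x1,y1 x2,y2 ...' (or a fully comma-separated string) into [(x, y), ...]."""
    values = [float(v) for v in s.replace(" ", ",").split(",") if v]
    xs, ys = values[0::2], values[1::2]
    # list(...) keeps the pairs reusable; a bare zip() would be exhausted after one pass
    return list(zip(xs, ys))


def mean_y(points):
    """Average y coordinate, or None when there are no points."""
    return sum(y for _, y in points) / len(points) if points else None


pts = parse_points("10,20 30,40 50,60")
print(pts)
print(mean_y(pts))
```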
23 changes: 1 addition & 22 deletions TranskribusDU/ObjectModel/XMLDSCELLClass.py
@@ -8,18 +8,7 @@
a class for table cell from a XMLDocument
READ project
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Developed for the EU project READ. The READ project has received funding
@@ -96,17 +85,7 @@ def fromDom(self,domNode):
self.setNode(domNode)
# get properties
for prop in domNode.keys():
try:
self.addAttribute(prop,domNode.get(prop))
if prop =='x': self._x= float(domNode.get(prop))
elif prop =='y': self._y = float(domNode.get(prop))
elif prop =='height': self._h = float(domNode.get(prop))
elif prop =='width': self.setWidth(float(domNode.get(prop)))
except:
self._x=-1
self._y=-1
self._h=0
self._w=0
self.addAttribute(prop,domNode.get(prop))

self.setIndex(int(self.getAttribute('row')),int(self.getAttribute('col')))

12 changes: 7 additions & 5 deletions TranskribusDU/ObjectModel/XMLDSGRAHPLINEClass.py
@@ -40,10 +40,9 @@ def computePoints(self):
self.lPoints = self.getAttribute('points')
# print 'after split?',self.lPoints
if self.lPoints is not None:
lX=[float(x) for p in self.lPoints.split(' ') for x in p.split(',')[0::2]]
# lX = list(map(lambda x:float(x),self.lPoints.split(',')))[0::2]
lY = [float(x) for p in self.lPoints.split(' ') for x in p.split(',')[1::2]]
self.lPoints = list(zip(lX,lY))
lX = list(map(lambda x:float(x),self.lPoints.split(',')))[0::2]
lY = list(map(lambda x:float(x),self.lPoints.split(',')))[1::2]
self.lPoints = zip(lX,lY)
try:
self.avgY = 1.0 * sum(lY)/len(lY)
except ZeroDivisionError:
@@ -59,6 +58,10 @@ def computePoints(self):
# self.setAngle(a)
# ymax = a * self.getX2() +b
# ymin = a*self.getX() + b
# import libxml2
# verticalSep = libxml2.newNode('PAGEBORDER')
# verticalSep.setProp('points', '%f,%f,%f,%f'%(self.getX(),ymin,self.getX2(),ymax))
# self.getParent().getNode().addChild(verticalSep)

"""
TO simulate 'DS' objects
@@ -76,7 +79,6 @@ def getWidth(self): return abs(float(self.getAttribute('width')))


def setPoints(self,lp): self.lPoints = lp
def getPoints(self): return self.lPoints

def fromDom(self,domNode):
"""
13 changes: 5 additions & 8 deletions TranskribusDU/ObjectModel/XMLDSLINEClass.py
@@ -36,16 +36,13 @@ def fromDom(self,domNode):
self.setNode(domNode)

# get properties
# for prop in domNode.keys():
# self.addAttribute(prop,domNode.get(prop))

for prop in domNode.keys():
self.addAttribute(prop,domNode.get(prop))
if prop =='x': self._x= float(domNode.get(prop))
elif prop =='y': self._y = float(domNode.get(prop))
elif prop =='height': self._h = float(domNode.get(prop))
elif prop =='width': self.setWidth(float(domNode.get(prop)))


# ctxt = domNode.doc.xpathNewContext()
# ctxt.setContextNode(domNode)
# ldomElts = ctxt.xpathEval('./%s'%(ds_xml.sTEXT))
# ctxt.xpathFreeContext()
ldomElts = domNode.findall('./%s'%(ds_xml.sTEXT))
for elt in ldomElts:
myObject= XMLDSTEXTClass(elt)
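
The attribute handling in `fromDom` (copying every attribute, then converting `x`, `y`, `height` and `width` to floats) can be sketched on its own. This is an illustration under assumptions, not the project's implementation: `read_geometry` is a hypothetical helper, the `LINE` element is made up, and the standard-library `xml.etree.ElementTree` stands in for whatever XML library the project uses, since it offers the same `keys()` and `get()` calls. The fallback values (-1, -1, 0, 0) are borrowed from the `except` branch shown in the `XMLDSCELLClass.py` hunk, but are applied here per attribute rather than to all four at once.

```python
import xml.etree.ElementTree as ET


def read_geometry(node):
    """Return (x, y, height, width) as floats, keeping a default for bad or missing values."""
    geom = {"x": -1.0, "y": -1.0, "height": 0.0, "width": 0.0}
    for prop in node.keys():
        if prop in geom:
            try:
                geom[prop] = float(node.get(prop))
            except ValueError:
                pass  # non-numeric attribute: keep the default
    return geom["x"], geom["y"], geom["height"], geom["width"]


# Hypothetical element; the width is deliberately malformed.
line = ET.fromstring('<LINE x="12.5" y="40" height="18" width="abc"/>')
print(read_geometry(line))
```

Catching only `ValueError` rather than using a bare `except` keeps unrelated errors visible.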