cut multiple line segments #262

Open
alexander-winkler opened this issue Jun 24, 2021 · 2 comments
Labels
Priority: Low · Type: Enhancement (indicates an enhancement proposal for an existing feature)

Comments

@alexander-winkler

Hello!

This is a small feature request originating from my work with OCR4all/LAREX:

Line segmentation isn't always perfect. For some reason (maybe this can be avoided by tweaking the preferences) a number of lines are not segmented properly, for example:

[Image: cut_multiple_line_region]

As this happens rather often, drawing new rectangles and adding them to the reading order can become time-consuming, so I was wondering if you could add something like the cut line function (cut_function) in the Segments mode to the Lines mode as well.

Possible behaviour:

  1. Select cut tool
  2. Select multi-line TextLine-Element
  3. Draw one or multiple more or less horizontal lines that cut the entire TextLine-Element
  4. Add newly created elements to reading order
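
The steps above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the TextLine element is reduced to an axis-aligned rectangle `(x0, y0, x1, y1)`, the cuts are perfectly horizontal, and the names `cut_rectangle` and `replace_in_reading_order` are hypothetical, not part of LAREX.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downwards


def cut_rectangle(rect: Rect, cut_ys: Sequence[float]) -> List[Rect]:
    """Split a rectangle at the given y positions, returned top to bottom."""
    x0, y0, x1, y1 = rect
    # Ignore duplicate cuts and cuts that do not cross the rectangle.
    inner = sorted(y for y in set(cut_ys) if y0 < y < y1)
    bounds = [y0] + inner + [y1]
    return [(x0, top, x1, bottom) for top, bottom in zip(bounds, bounds[1:])]


def replace_in_reading_order(order: List[str], old_id: str, new_ids: List[str]) -> List[str]:
    """Put the new line ids where the cut line used to be in the reading order."""
    i = order.index(old_id)
    return order[:i] + new_ids + order[i + 1:]


pieces = cut_rectangle((0, 0, 100, 90), [60, 30])
order = replace_in_reading_order(["a", "b", "c"], "b", ["b1", "b2", "b3"])
```

Inserting the new elements at the position of the original element keeps the rest of the reading order untouched, which is the part that is tedious to do by hand.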

A similar function for vertical segmentation would be useful as well, but reorganizing the reading order is definitely more difficult.

Thank you!

maxnth added the Type: Enhancement label on Jun 24, 2021
@maxnth
Member

maxnth commented Jun 24, 2021

Hi,

> Line segmentation isn't always perfect. For some reason (maybe this can be avoided by tweaking the preferences) a number of lines are not segmented properly, for example:

Line segmentation in OCR4all isn't implemented optimally at the moment, and while, as you said, one can often improve the results by tweaking parameters, this doesn't always work. The upcoming release of OCR4all will feature refactored line segmentation code, which will hopefully improve the results.

> As this happens rather often, drawing new rectangles and adding them to the reading order can become time-consuming, so I was wondering if you could add something like the cut line function in the Segments mode to the Lines mode as well.

Would the subtract rectangle / subtract polygon tools work for your use case (see video)?

[Video: Peek.2021-06-24.17-25.mp4]

I just had a quick look at adding a cut-by-line feature (instead of rectangle / polygon) to LAREX, but Paper.js doesn't seem to like intersecting or dividing open paths (like lines) with closed paths (like polygons). Note that the cut function in Edit and Segments doesn't work "on the fly" via Paper.js but goes through the backend. I'm probably just missing something, so this might still get added as soon as I figure it out.

> Add newly created elements to reading order

Great idea, I guess adding a toggle for that would make a lot of sense for the current subtraction features as well.

> A similar function for vertical segmentation would be useful as well, but reorganizing the reading order is definitely more difficult.

Ordering the newly created segments (through subtraction or division) by lowest x or y coordinate (determined by the state of the added toggle) would probably work for most vertical / horizontal segmentation, wouldn't it?
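
To make the ordering idea concrete, here is a minimal sketch. It assumes each new segment is a list of `(x, y)` points and that the toggle is a boolean; `order_new_segments` and `vertical_cut` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def order_new_segments(polygons: List[Polygon], vertical_cut: bool = False) -> List[Polygon]:
    """Sort polygons by their lowest y (horizontal cuts) or lowest x (vertical cuts)."""
    axis = 0 if vertical_cut else 1  # index into (x, y)
    return sorted(polygons, key=lambda poly: min(p[axis] for p in poly))


new_lines = [
    [(0, 40), (100, 40), (100, 60), (0, 60)],
    [(0, 0), (100, 0), (100, 20), (0, 20)],
    [(0, 20), (100, 20), (100, 40), (0, 40)],
]
ordered = order_new_segments(new_lines)
top_edges = [min(y for _, y in poly) for poly in ordered]
```

Sorting on a single coordinate is simple, but it only looks at one axis, so layouts where neighbouring pieces overlap along that axis may still need manual correction.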

@alexander-winkler
Author

Hello!

> Would the subtract rectangle / subtract polygon tools work for your use case (see video)?

This could work for a number of use cases, I guess. Thanks for this idea! One will probably have to adjust the two resulting polygons, but that is not terribly cumbersome. Maybe one could add a polygon reduce function. If the drawn path is not closed, you could automatically add, for each point (x, y), a second point (x, y - 1px), turning the line into a very thin closed polygon and thus mimicking a cut function that is not implemented in Paper.js.
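
A rough sketch of that workaround, independent of Paper.js: the open polyline is turned into a thin closed polygon by appending a copy of its points shifted up by a small amount, in reverse order. The function name `polyline_to_thin_polygon` and the `thickness` parameter are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polyline_to_thin_polygon(points: Sequence[Point], thickness: float = 1.0) -> List[Point]:
    """Close an open polyline by adding, for each (x, y), the point (x, y - thickness)."""
    shifted = [(x, y - thickness) for x, y in points]
    # Walk along the drawn line, then back along the shifted copy so the outline closes.
    return list(points) + shifted[::-1]


cut_line = [(0, 10), (50, 12), (100, 10)]
thin_polygon = polyline_to_thin_polygon(cut_line)
```

The resulting polygon could then be handed to an existing subtract-polygon operation, at the cost of losing a one-pixel strip along the cut.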

> Add newly created elements to reading order

> Great idea, I guess adding a toggle for that would make a lot of sense for the current subtraction features as well.

Very much in favour of this idea!

> A similar function for vertical segmentation would be useful as well, but reorganizing the reading order is definitely more difficult.

> Ordering the newly created segments (through subtraction or division) by lowest x or y coordinate (determined by the state of the added toggle) would probably work for most vertical / horizontal segmentation, wouldn't it?

I'm not sure how this would work out on a skewed page with a two-column layout. In any case, one could also think of a way to move multiple lines in the reading order as a batch (select a group of lines, move them to a specific position in the reading order). More generally, however, I would advocate for a "redo reading order" function. When I add several new lines, it would be easier to have the reading order recognized once again instead of manually adding the new lines.
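
The batch move could look roughly like this. The sketch assumes the reading order is a plain list of line ids and that the target position refers to the list after the selected lines have been taken out; `move_block` is a hypothetical name.

```python
from typing import List, Sequence


def move_block(reading_order: List[str], selected: Sequence[str], target_index: int) -> List[str]:
    """Move the selected ids, keeping their relative order, to target_index."""
    selected_set = set(selected)
    moved = [item for item in reading_order if item in selected_set]
    remaining = [item for item in reading_order if item not in selected_set]
    # Clamp so an out-of-range target simply means "start" or "end".
    target_index = max(0, min(target_index, len(remaining)))
    return remaining[:target_index] + moved + remaining[target_index:]


order = ["l1", "l2", "l3", "l4", "l5"]
reordered = move_block(order, ["l4", "l5"], 1)
```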
