Constitutional reforms (réformes constitutionnelles) not handled #7

Open
RouxRC opened this issue May 12, 2018 · 9 comments

@RouxRC

RouxRC commented May 12, 2018

Hello, I'm trying to run DuraLex on the constitutional reform:

duralex --url http://www.assemblee-nationale.fr/15/projets/pl0911.asp 

But "Constitution" is not yet detected as a law or code, so I tried to fix that by adding the following to duralex/alinea_parser.py at line 239:

    # de la Constitution
    elif i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
        i += 4
        node['lawType'] = 'constitution'
        node['id'] = 'constitution_du_4_octobre_1958'

Debugging indicates that this does the trick for a few articles. Unfortunately, it then breaks with a maximum recursion depth exception, which I do not understand :(

Traceback (most recent call last):
  File "/home/roux/.pyenv/versions/duralex/bin/duralex", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/roux/dev/DuraLex/bin/duralex", line 118, in <module>
    sys.exit(main())
  File "/home/roux/dev/DuraLex/bin/duralex", line 107, in main
    handle_data(data, args)
  File "/home/roux/dev/DuraLex/bin/duralex", line 77, in handle_data
    SwapDefinitionAndReferenceVisitor().visit(tree)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 93, in visit
    self.visit_node(node)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  [Previous line repeated 990 more times]
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 83, in visit_node
    self.visitors[node['type']](node, False)
RecursionError: maximum recursion depth exceeded
@Seb35
Member

Seb35 commented May 22, 2018

There were two additional issues with this text once the Constitution case is handled. Regarding the Constitution specifically, I find it more sensible to handle it in the parse_code_reference method since, like a code, the Constitution is a (natively) consolidated text and has no ID number. I added the following code to parse_code_reference:

    # de la Constitution
    elif i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
        node['id'] = 'constitution du 4 octobre 1958'
        i = alinea_lexer.skip_to_token(tokens, i, tokens[i+4]) + 2
    # la Constitution
    elif i + 2 < len(tokens) and tokens[i + 2].lower() == u'constitution':
        node['id'] = 'constitution du 4 octobre 1958'
        i = alinea_lexer.skip_to_token(tokens, i, tokens[i+2]) + 2

A dedicated case for the Constitution could also be added, with a new constant in duralex/tree.py, for instance TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'.
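
To make the token offsets concrete, here is a minimal, self-contained sketch of such a dedicated case. It is not DuraLex code: `tokenize` and `parse_constitution_reference` are hypothetical helpers, and the sketch assumes that whitespace runs are tokens of their own, which would explain why "de la Constitution" puts the word at `i + 4` and "la Constitution" at `i + 2`.

```python
import re

# Hypothetical constant, named as suggested above; not confirmed to exist in duralex/tree.py.
TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'

# Word sequences to recognise, longest first.
PATTERNS = [
    [u'de', u'la', u'constitution'],
    [u'la', u'constitution'],
]


def tokenize(text):
    # Assumption: words and whitespace runs are separate tokens.
    return [t for t in re.split(r'(\s+)', text) if t]


def parse_constitution_reference(tokens, i):
    """Return (node, next_index), or (None, i) when nothing matches at i."""
    for words in PATTERNS:
        # Words alternate with whitespace tokens: w0 ' ' w1 ' ' w2
        span = 2 * len(words) - 1
        candidate = [t.lower() for t in tokens[i:i + span:2]]
        if candidate == words:
            node = {
                'type': TYPE_CONSTITUTION_REFERENCE,
                'id': u'constitution du 4 octobre 1958',
            }
            return node, i + span
    return None, i


tokens = tokenize(u"de la Constitution est ainsi modifié")
node, next_index = parse_constitution_reference(tokens, 0)
```

Unlike the `elif` branches above, this sketch also checks the words preceding "Constitution", so an unrelated word that happens to sit four tokens before it does not trigger a match.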


The other issues are probably due to specific sentence forms:

  1. In article 11 of the constitutional law, the part "Au sixième alinéa de l'article 16, à l'article 54, au deuxième alinéa de l'article 61, et au dernier alinéa de l'article 88-6 de la Constitution" is not handled; it works with a single article, as in "Au sixième alinéa de l'article 16 de la Constitution".
  2. In article 13 of the constitutional law, the part "Les articles 68-1 à 68-3 de la Constitution" is not handled; it works with a single article.
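
As an illustration of what the two unhandled forms contain, the following sketch extracts article numbers from an enumeration and expands a simple range. It is a regex-based toy, not the DuraLex parser: `article_ids` and `expand_range` are hypothetical names, alinéa qualifiers are ignored, and the range expansion assumes that every number between the bounds exists, which a real resolver would have to check against the consolidated text.

```python
import re

# Article numbers such as "16", "88-6" or "68-1".
ARTICLE_ID = r'\d+(?:-\d+)?'

# Either a range ("articles 68-1 à 68-3") or a single article ("article 16").
REFERENCE_RE = re.compile(
    (r'articles\s+(?P<start>{id})\s+à\s+(?P<end>{id})'
     r'|article\s+(?P<single>{id})').format(id=ARTICLE_ID),
    re.IGNORECASE,
)


def expand_range(start, end):
    """Expand '68-1'..'68-3' into ['68-1', '68-2', '68-3'].

    Only bounds sharing the same prefix are supported; other ranges need
    the actual list of articles and raise ValueError here.
    """
    s_prefix, _, s_suffix = start.rpartition('-')
    e_prefix, _, e_suffix = end.rpartition('-')
    if not s_prefix or s_prefix != e_prefix or int(s_suffix) > int(e_suffix):
        raise ValueError('unsupported range: %s à %s' % (start, end))
    return ['%s-%d' % (s_prefix, n)
            for n in range(int(s_suffix), int(e_suffix) + 1)]


def article_ids(text):
    """List the article numbers mentioned in text, in order of appearance."""
    ids = []
    for match in REFERENCE_RE.finditer(text):
        if match.group('single'):
            ids.append(match.group('single'))
        else:
            ids.extend(expand_range(match.group('start'), match.group('end')))
    return ids


enumeration = article_ids(
    u"Au sixième alinéa de l'article 16, à l'article 54, au deuxième alinéa "
    u"de l'article 61, et au dernier alinéa de l'article 88-6 de la Constitution")
article_range = article_ids(u"Les articles 68-1 à 68-3 de la Constitution")
```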

As of now I don’t understand exactly why these unhandled cases lead to infinite recursion.
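
One way to investigate this kind of RecursionError, sketched below, is to check whether a node ends up among its own descendants, since a recursive visitor never terminates on such a tree. This is only a guess at a possible cause, not a diagnosis of DuraLex. The helper `find_cycle` is hypothetical and assumes each node is a dict with an optional 'children' list; it walks the tree iteratively so that it cannot itself hit the recursion limit.

```python
def find_cycle(root):
    """Return a path from root that ends on one of its own ancestors, or None."""
    path = [root]              # nodes on the current root-to-node path
    on_path = {id(root)}       # identities of those nodes
    stack = [iter(root.get('children', []))]
    while stack:
        child = next(stack[-1], None)
        if child is None:
            # All children of the current node are done: go back up.
            stack.pop()
            on_path.discard(id(path.pop()))
            continue
        if id(child) in on_path:
            return path + [child]
        path.append(child)
        on_path.add(id(child))
        stack.append(iter(child.get('children', [])))
    return None


# A small tree where the grandchild points back to the root.
a = {'type': 'a', 'children': []}
b = {'type': 'b', 'children': []}
c = {'type': 'c', 'children': []}
a['children'].append(b)
b['children'].append(c)
c['children'].append(a)
cycle = find_cycle(a)
```

A node shared by two parents without a loop is not reported, because only the current path from the root is tracked.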

@Seb35
Member

Seb35 commented Jun 29, 2018

The infinite recursion is fixed by 46b6e66. Some syntaxes are not recognized yet (work in progress): as I write this, 8 of the 18 articles of the bill (pjl) are fully handled.

@Seb35
Member

Seb35 commented Jun 29, 2018

And the commit for the Constitution was the previous one, 6ffe532. As of now, 11 of the 18 articles of this pjl are recognized (we work in the git branch 'peg').

@Seb35
Member

Seb35 commented Jun 30, 2018

Now 13/18

@Seb35
Member

Seb35 commented Jul 16, 2018

Here are the details of the correct and incorrect articles in DuraLex:

✓ article 1
✓ article 2
✓ article 3 - fun fact: this article was removed during commission work https://twitter.com/ApffelArnaud/status/1012607263914242048
✓ article 4
✓ article 5
    article 6 (1° and 2° are correct, issue on 3°)
    article 7 (1° is correct, issue on 2°)
✓ article 8
✓ article 9
✓ article 10
    article 11 (expression "au dernier alinéa de l’article 88-6" unrecognised + no word-reference)
✓ article 12
    article 13 (expression "articles 68-1 à 68-3" unrecognised)
✓ article 14
✓ article 15
✓ article 16
    article 17 (a lot of issues)
✓ article 18

@Seb35
Member

Seb35 commented Jul 16, 2018

I’ve just tested the diffs generated by SedLex from the DuraLex output, article by article (SedLex hits fatal errors if you run it on the entire DuraLex output). With a small SedLex patch applied (and pushed as 46cff7b), there are 8 perfect articles, 5 partially perfect articles, and 2 half-good articles. Once the bad DuraLex outputs are set aside, there are 5 articles whose issues are SedLex-related.

By the way, Archéo Lex seems to have trouble creating a one-article-per-file repository, so I used an old Archéo Lex version, and to match SedLex conventions I renamed the generated folder to 'constitution'.

Perfect diff in SedLex:

  • Article 1
  • Article 3 - I-1° and II
  • Article 4
  • Article 5
  • Article 6 - 1° and 2°
  • Article 7 - 1°
  • Article 8 (just a typo: a comma preceded by a space)
  • Article 9 (just a typo: a comma preceded by a space)
  • Article 10 (just a typo: an empty line should not be introduced when a line is deleted)
  • Article 15 - 2°
  • Article 16 (some typography to improve: put an empty line between alinéas to be Markdown-compliant)
  • Article 18 - 1° (not applicable for SedLex: it is not an amendment, even if we could improve DuraLex/SedLex to understand (partial) entries into force)
  • Article 18 - 2° (not applicable for SedLex: it is not an amendment)

Partially correct diff in SedLex:

  • Article 6 - 3° (small issue in DuraLex with no effect here, plus a small issue in SedLex)
  • Article 7 - 2° (small issue in DuraLex with no effect here, plus a small issue in SedLex)

Not working at all in SedLex:

  • Article 2 (good DuraLex)
  • Article 3 - I-2° (good DuraLex)
  • Article 11 (bad DuraLex, fatal error in SedLex)
  • Article 12 (good DuraLex)
  • Article 13 (bad DuraLex: article ranges, and other types of ranges, need to be added to DuraLex)
  • Article 14 (good DuraLex, fatal error in SedLex)
  • Article 15 - 1° (good DuraLex)
  • Article 17 (very bad DuraLex, fatal error in SedLex)

@Seb35
Member

Seb35 commented Jul 18, 2018

Improved some parts of SedLex. I will update this comment as SedLex is further improved (so that the previous comment keeps the original state).

Perfect diff in SedLex (12 perfect + 2 partially perfect):

  • Article 1
  • Article 2
  • Article 3
  • Article 4
  • Article 5
  • Article 6 - 1° and 2°
  • Article 7 - 1°
  • Article 8
  • Article 9
  • Article 10
  • Article 12
  • Article 15
  • Article 16
  • Article 18 - 1° (not applicable for SedLex: it is not an amendment, even if we could improve DuraLex/SedLex to understand (partial) entries into force)
  • Article 18 - 2° (not applicable for SedLex: it is not an amendment)

Partially correct diff in SedLex (2):

  • Article 6 - 3° (small issue in DuraLex, which causes a mistake in the SedLex output; SedLex itself is correct once the DuraLex tree is fixed)
  • Article 7 - 2° (small issue in DuraLex, which causes a mistake in the SedLex output; SedLex itself is correct once the DuraLex tree is fixed)

Not working at all in SedLex (4):

  • Article 11 (bad DuraLex, fatal error in SedLex)
  • Article 13 (bad DuraLex: article ranges, and other types of ranges, need to be added to DuraLex)
  • Article 14 (good DuraLex, fatal error in SedLex)
  • Article 17 (very bad DuraLex, fatal error in SedLex)

@RouxRC
Author

RouxRC commented Jul 22, 2018

Good job, nearly there! ;)

@Seb35
Member

Seb35 commented Jul 23, 2018

SedLex now fails gracefully (and partially) instead of triggering an exception, so the DuraLex tree for this constitutional law can be tested with SedLex as a whole (without splitting it by article). Fatal errors are replaced by an 'error' property on the node.
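
The general pattern can be sketched as follows. This is not SedLex's actual code: `process_tree`, `collect_errors` and the handler are hypothetical, and nodes are assumed to be dicts with an optional 'children' list. A failure on one node is recorded in its 'error' property and the traversal carries on with the rest of the tree.

```python
def process_tree(node, handler):
    """Apply handler to node and its descendants, recording failures on the node."""
    try:
        handler(node)
    except Exception as exc:
        # Keep going: store the failure where it happened instead of aborting.
        node['error'] = '%s: %s' % (type(exc).__name__, exc)
    for child in node.get('children', []):
        process_tree(child, handler)
    return node


def collect_errors(node):
    """Return the list of (type, error) pairs found in the tree."""
    found = []
    if 'error' in node:
        found.append((node.get('type'), node['error']))
    for child in node.get('children', []):
        found.extend(collect_errors(child))
    return found


def handler(node):
    # Hypothetical handler: marks nodes as done, fails on an unsupported type.
    if node.get('type') == 'article-range':
        raise ValueError('unsupported node')
    node['done'] = True


tree = {'type': 'law', 'children': [
    {'type': 'article'},
    {'type': 'article-range'},
    {'type': 'article'},
]}
process_tree(tree, handler)
errors = collect_errors(tree)
```

With this approach the whole tree can be processed in one run, and the remaining problems are listed afterwards from the 'error' properties.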
