Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle newlines in the middle of a name #2107

Closed
wants to merge 1 commit into from

Conversation

VirginiaDooley
Copy link
Contributor

@VirginiaDooley VirginiaDooley commented Jun 5, 2023

co-authored with @Bekabyx

Closes #2106

While my initial fix was more of any improvement for this specific SOPN, in the process of review, @Bekabyx discovered it was actually camelot that is responsible for adding new lines. At her suggestion, I've moved the code to strip newlines from the SOPN earlier in the process, (we don't store addresses so as long as this is the case, this change does not post any issues, AFAIK) into the extract table step. I've updated tests to reflect this change to the process.

Test locally by uploading the SOPN for https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.west-lancashire.rural-south.2023-06-22/ and parse. Names should be formatted as expected.

@VirginiaDooley VirginiaDooley force-pushed the hotfix/fix-surname-parsing branch 2 times, most recently from 130924b to 3dc0e16 Compare June 6, 2023 10:15
@VirginiaDooley VirginiaDooley marked this pull request as ready for review June 6, 2023 10:15
@VirginiaDooley VirginiaDooley requested review from symroe and Bekabyx June 6, 2023 10:15
@VirginiaDooley
Copy link
Contributor Author

@VirginiaDooley VirginiaDooley force-pushed the hotfix/fix-surname-parsing branch from e6442f0 to 16ab6fa Compare June 26, 2023 15:29
Copy link
Contributor

@Bekabyx Bekabyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working as expected on my local machine, SOPN is parsing correctly and all names + party descriptions look fine. As long as we're cool with not having newlines in addresses I'm happy for this to merge 😁

@VirginiaDooley
Copy link
Contributor Author

@VirginiaDooley VirginiaDooley force-pushed the hotfix/fix-surname-parsing branch from 16ab6fa to 54285c2 Compare June 29, 2023 13:43
@VirginiaDooley
Copy link
Contributor Author

VirginiaDooley commented Jun 29, 2023

@symroe Based on SOPN testing (updated results in Trello card), there was a decrease in people parsed as a result of this change. I think we should not merge this work and instead focus parsing improvements in conjunction with the spike into AWS textract.

@VirginiaDooley
Copy link
Contributor Author

VirginiaDooley commented Jul 3, 2023

Closing based on #2107 (comment) and opening https://trello.com/b/N1TKOKQ5/dc-icebox?search_id=e2ecd086-9537-40f5-bac0-53a66235b90f. This solution proved to be too nuclear and led to some missed matches with parties. It's still worthwhile knowing about these issues so we can use them in testing in the spike.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing failure on surname
2 participants