Whitespace between headers #6

Open
bradenneal1 opened this issue Aug 31, 2020 · 4 comments

@bradenneal1

The regular expression MESSAGE_REGEX does not allow whitespace (or newlines) between the headers. For example, if the test message MESSAGE1 is defined as:

MESSAGE1 = """{1:F01ASDFJK20AXXX0987654321}
{2:I103ASDFJK22XXXXN}
{4: :20:20180101-ABCDEF :23B:GHIJ :32A:180117CAD5432,1 :33B:EUR9999,0 :50K:/123456-75901 SOMEWHERE New York 999999 GR :53B:/20100213012345 :57C://SC200123 :59:/201001020 First Name Last Name a12345bc6d789ef01a23 Nowhere NL :70:test reference test reason payment group: 1234567-ABCDEF :71A:SHA :77B:Test this
-}"""

It does not parse:

>>> import mt103
>>> message = mt103.MT103(MESSAGE1)
>>> message.text
>>>

Redefining the regex to accept whitespace characters between headers:

MESSAGE_REGEX = re.compile(
    r"^"
    r"({1:(?P<basic_header>[^}]+)})?\s*"
    r"({2:(?P<application_header>(I|O)[^}]+)})?\s*"
    r"({3:"
        r"(?P<user_header>"
            r"({113:[A-Z]{4}})?"
            r"({108:[A-Z 0-9]{0,16}})?"
            r"({111:[0-9]{3}})?"
            r"({121:[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-4[a-zA-Z0-9]{3}-[89ab][a-zA-Z0-9]{3}-[a-zA-Z0-9]{12}})?\s*"  # NOQA: E501
        r")"
    r"})?"
    r"({4:\s*(?P<text>.+?)\s*-})?\s*"
    r"({5:(?P<trailer>.+)})?"
    r"$",
    re.DOTALL
)

solves the issue:

>>> import mt103
>>> message = mt103.MT103(MESSAGE1)
>>> message.text
:20:20180101-ABCDEF :23B:GHIJ :32A:180117CAD5432,1 :33B:EUR9999,0 :50K:/123456-75901 SOMEWHERE New York 999999 GR :53B:/20100213012345 :57C://SC200123 :59:/201001020 First Name Last Name a12345bc6d789ef01a23 Nowhere NL :70:test reference test reason payment group: 1234567-ABCDEF :71A:SHA :77B:Test this

@danielquinn
Owner

I'm not sure about this one. Is there somewhere in the spec that says it's ok to have newlines in these locations and not others? Your suggested changes are simple enough, and making your suggested changes does indeed mean that you can parse a message with new lines in it, but I'm not clear on whether the mt103 message in question is valid with new lines in it, or that your suggested placements for new lines represents all the cases where this would be a problem. Do you have a spec I can reference for confirmation?

I ask because the placement of the \s* bits seems strangely arbitrary. You've got one after every section except 5, and they only appear after a header but not between sections.

If this is valid:

{1:F01ASDFJK20AXXX0987654321}
{2:I103ASDFJK22XXXXN}
{4: :20:20180101-ABCDEF :23B:GHIJ :32A:180117CAD5432,1 :33B:EUR9999,0 :50K:/123456-75901 SOMEWHERE New York 999999 GR :53B:/20100213012345 :57C://SC200123 :59:/201001020 First Name Last Name a12345bc6d789ef01a23 Nowhere NL :70:test reference test reason payment group: 1234567-ABCDEF :71A:SHA :77B:Test this
-}

Is this not?

{1:F01ASDFJK20AXXX0987654321}
{2:I103ASDFJK22XXXXN}
{
4: :20:20180101-ABCDEF :23B:GHIJ :32A:180117CAD5432,1 :33B:EUR9999,0 :50K:/123456-75901 SOMEWHERE New York 999999 GR :53B:/20100213012345 :57C://SC200123 :59:/201001020 First Name Last Name a12345bc6d789ef01a23 Nowhere NL :70:test reference test reason payment group: 1234567-ABCDEF :71A:SHA :77B:Test this
-}

Might it be better to just message.replace("\n", "") before parsing it, or is that likely to break things elsewhere? Until I'm certain, I'm not keen on making this change. If you have something I can reference to be sure, that'd go a long way toward helping me figure this out.

@bradenneal

I don't have a specification to provide unfortunately.

I was initially using message.replace("\n", ""), but came unstuck when parsing tags that contain more than one component. For example, if the above message was formatted:

{1:F01ASDFJK20AXXX0987654321}
{2:I103ASDFJK22XXXXN}
{4:
:20:20180101-ABCDEF
:23B:GHIJ
:32A:180117CAD5432,1
:33B:EUR9999,0
:50K:/123456-75901
SOMEWHERE
New York
999999
GR
:53B:/2010021301234
:57C://SC200123
:59:/201001020 
First Name Last Name
a12345bc6d789ef01a23
Nowhere
NL
:70:test reference
test reason
payment group:
1234567-ABCDEF
:71A:SHA
:77B:Test this
-}

Both 50K and 59 tags follow a format of Account, Name1, Name2, Address, City/Postal Code. With the newline characters removed, there is no way to determine where "Account" finishes and "Name1" starts etc. Keeping the newlines (and making the parser newline insensitive) allows message.ordering_customer.split('\n') to identify the individual components.
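
To make the point above concrete, here is a minimal sketch using only the 50K body from the line-broken message. It does not use the mt103 library; the variable names are illustrative assumptions, and the string stands in for what message.ordering_customer would hold if newlines were preserved.

```python
# Body of tag 50K as it appears in the line-broken message above.
# (Hypothetical stand-in for message.ordering_customer.)
ORDERING_CUSTOMER = "/123456-75901\nSOMEWHERE\nNew York\n999999\nGR"

# With newlines kept, each component sits on its own line.
components = ORDERING_CUSTOMER.split("\n")
account, *name_and_address = components

# With newlines stripped before parsing, the boundaries are lost and
# the account number runs straight into the name.
flattened = ORDERING_CUSTOMER.replace("\n", "")

print(account)            # the account line on its own
print(name_and_address)   # the remaining name/address lines
print(flattened)          # one undivided string
```

Replacing "\n" with a space instead would not help either, since values such as "New York" already contain spaces.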

@bradenneal

> You've got one after every section except 5

That's an oversight on my part. I would still consider a message with trailing whitespace valid; I have simply never seen one.
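
For what it's worth, a more uniform placement could look something like the sketch below: optional whitespace before, between, and after every block, including after block 5. This is a simplified pattern written for illustration (block 3 is omitted for brevity, and the name TOLERANT_REGEX is made up here); it is not the library's MESSAGE_REGEX, and whether the spec permits whitespace in all of these positions is still an open question.

```python
import re

# Simplified sketch, NOT the library's MESSAGE_REGEX: blocks 1, 2, 4 and 5
# only, with optional whitespace around every block.
TOLERANT_REGEX = re.compile(
    r"^\s*"
    r"(\{1:(?P<basic_header>[^}]+)\})?\s*"
    r"(\{2:(?P<application_header>[IO][^}]+)\})?\s*"
    r"(\{4:\s*(?P<text>.+?)\s*-\})?\s*"
    r"(\{5:(?P<trailer>.+)\})?\s*"
    r"$",
    re.DOTALL,
)

# Shortened message with newlines between blocks and a trailing newline.
message = (
    "{1:F01ASDFJK20AXXX0987654321}\n"
    "{2:I103ASDFJK22XXXXN}\n"
    "{4:\n:20:20180101-ABCDEF\n:23B:GHIJ\n-}\n"
)

match = TOLERANT_REGEX.match(message)
if match:
    # Newlines inside block 4 are preserved, so tags can be split later.
    print(match.group("text").split("\n"))
```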

@danielquinn
Owner

Alright, I've had a conversation with some more financially-minded people (as opposed to software-minded people like me), and it looks like line breaks are common in a message, so I'm going to make this change.

Do you perhaps have a few test messages I can use to ensure that everything works as expected? None of the messages I have access to contain line breaks.
