Lxmltreeserializer str versus list #1020

Closed
skinkie opened this issue Apr 22, 2024 · 5 comments · Fixed by #1022 or #1023
Labels
bug Something isn't working

Comments

@skinkie
Contributor

skinkie commented Apr 22, 2024

I am trying out #975.

import glob

from lxml import etree
from xsdata.formats.dataclass.context import XmlContext
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.config import ParserConfig
from xsdata.formats.dataclass.parsers.handlers import LxmlEventHandler
from xsdata.formats.dataclass.serializers import LxmlTreeSerializer

from netex import ServiceFrame

SERVICE_FRAME_PATH = ".//{http://www.netex.org.uk/netex}ServiceFrame"

def conversion(input_filename: str, output_filename: str):
    context = XmlContext()
    config = ParserConfig(fail_on_unknown_properties=False)
    parser = XmlParser(context=context, config=config, handler=LxmlEventHandler)
    tree = etree.parse(input_filename)

    # Parse the first ServiceFrame element into its dataclass.
    element = tree.find(SERVICE_FRAME_PATH)
    service_frame: ServiceFrame = parser.parse(element, ServiceFrame)

    # Render the dataclass back to an lxml element and swap it into the tree.
    lxml_serializer = LxmlTreeSerializer()
    element.getparent().replace(element, lxml_serializer.render(service_frame))

    tree.write(output_filename, pretty_print=True, strip_text=True)

if __name__ == '__main__':
    for input_filename in glob.glob("/tmp/NeTEx_WSF_WSF_20240415_20240415.xml.gz"):
        print(input_filename)
        output_filename = input_filename.replace('/tmp/', 'netex-output-epip/')
        conversion(input_filename, output_filename)

Writing fails at the lxml_serializer.render call, which raises:

Traceback (most recent call last):
  File "/home/skinkie/Sources/reference/gtfs-netex-test/test-lxml.py", line 36, in <module>
    conversion(input_filename, output_filename)
  File "/home/skinkie/Sources/reference/gtfs-netex-test/test-lxml.py", line 28, in conversion
    element.getparent().replace(element, lxml_serializer.render(service_frame))
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/skinkie/Sources/reference/venv/lib/python3.11/site-packages/xsdata/formats/dataclass/serializers/tree/lxml.py", line 21, in render
    self.build(obj, builder)
  File "/home/skinkie/Sources/reference/venv/lib/python3.11/site-packages/xsdata/formats/dataclass/serializers/tree/mixins.py", line 55, in build
    builder.end(*element)
  File "src/lxml/saxparser.pxi", line 848, in lxml.etree.TreeBuilder.end
  File "src/lxml/saxparser.pxi", line 780, in lxml.etree.TreeBuilder._handleSaxEnd
  File "src/lxml/saxparser.pxi", line 749, in lxml.etree.TreeBuilder._flush
TypeError: sequence item 0: expected str instance, list found

If it can parse the file, it must be able to write it, right?

With XmlTreeSerializer, serialisation works.

File can be downloaded from: https://data.ndovloket.nl/netex/wsf/NeTEx_WSF_WSF_20240415_20240415.xml.gz

I have also found other bugs when rendering via LxmlTreeSerializer, where some elements are only partially rendered: the element name is written, but its attributes, for example, are not.

tefra added the bug label Apr 23, 2024
@tefra
Owner

tefra commented Apr 23, 2024

Thanks for reporting @skinkie

The tree serializer wasn't encoding values before feeding them to the tree builders, so types like xs:NMTOKENS were failing.
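
To illustrate the kind of encoding step described here, the following is a minimal sketch using only the standard library. It is not xsdata's actual implementation: `encode_value` and `build_element` are hypothetical helpers, and the `Modes` element with its values is made up for the example. The sketch assumes the builder joins its buffered text chunks with `str.join`, which is what the `_flush` frame in the traceback suggests, so a list value has to be turned into a single space-separated string first.

```python
from xml.etree.ElementTree import TreeBuilder, tostring

# str.join rejects non-string items; a list passed through as text
# produces the same message that appears in the traceback.
try:
    "".join([["bus", "water"]])
except TypeError as exc:
    join_error = str(exc)


def encode_value(value) -> str:
    """Hypothetical helper: convert a Python value to its XML text form."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # List types such as xs:NMTOKENS serialize as space-separated tokens.
        return " ".join(encode_value(item) for item in value)
    return str(value)


def build_element(tag: str, attrs: dict, text=None):
    """Hypothetical helper: encode attributes and text before building."""
    builder = TreeBuilder()
    builder.start(tag, {name: encode_value(v) for name, v in attrs.items()})
    if text is not None:
        builder.data(encode_value(text))
    builder.end(tag)
    return builder.close()


element = build_element("Modes", {"version": 1}, ["bus", "water"])
xml = tostring(element, encoding="unicode")
print(join_error)
print(xml)
```

Note that the attribute values go through the same helper as the text, since a tree builder generally expects strings in both places.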

@skinkie
Contributor Author

skinkie commented Apr 23, 2024

> Thanks for reporting @skinkie
>
> The tree serializer wasn't encoding values before feeding them to the tree builders, so types like xs:NMTOKENS were failing.

I'll test if the rest works now too.

@skinkie
Contributor Author

skinkie commented Apr 23, 2024

@tefra I think it is worse now: all attributes are gone.

@skinkie
Contributor Author

skinkie commented Apr 23, 2024

What I get:

            <ServiceJourneyPattern>
              <RouteRef></RouteRef>
              <DirectionRef></DirectionRef>
              <DestinationDisplayRef></DestinationDisplayRef>
              <pointsInSequence>
                <StopPointInJourneyPattern>
                  <ScheduledStopPointRef></ScheduledStopPointRef>
                  <OnwardTimingLinkRef></OnwardTimingLinkRef>
                  <IsWaitPoint>true</IsWaitPoint>
                </StopPointInJourneyPattern>
                <StopPointInJourneyPattern>
                  <ScheduledStopPointRef></ScheduledStopPointRef>
                </StopPointInJourneyPattern>
              </pointsInSequence>
            </ServiceJourneyPattern>

What I expect:

            <ServiceJourneyPattern id="WSF:ServiceJourneyPattern:B-V" version="1">
              <RouteRef version="1" ref="WSF:Route:B-V"/>
              <DirectionRef version="1" ref="OPENOV:Direction:outbound"/>
              <DestinationDisplayRef version="1" ref="WSF:DestinationDisplay:V"/>
              <pointsInSequence>
                <StopPointInJourneyPattern id="WSF:StopPointInJourneyPattern:B-V-B" version="1" order="1">
                  <ScheduledStopPointRef version="1" ref="WSF:ScheduledStopPoint:B"/>
                  <OnwardTimingLinkRef version="1" ref="WSF:TimingLink:B-V"/>
                  <IsWaitPoint>true</IsWaitPoint>
                </StopPointInJourneyPattern>
                <StopPointInJourneyPattern id="WSF:StopPointInJourneyPattern:B-V-V" version="1" order="2">
                  <ScheduledStopPointRef version="1" ref="WSF:ScheduledStopPoint:V"/>
                </StopPointInJourneyPattern>
              </pointsInSequence>
            </ServiceJourneyPattern>
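
A difference like the one between these two fragments can also be checked mechanically. The sketch below uses only the standard library; `missing_attributes` is a hypothetical helper, and it assumes both documents contain the same elements in the same order. The two input strings are shortened from the fragments above.

```python
import xml.etree.ElementTree as ET


def missing_attributes(expected_xml: str, actual_xml: str) -> list[str]:
    """List attributes present in expected_xml but absent from actual_xml.

    Assumes both documents share the same element structure and order.
    """
    expected = list(ET.fromstring(expected_xml).iter())
    actual = list(ET.fromstring(actual_xml).iter())
    report = []
    for exp, act in zip(expected, actual):
        for name in exp.attrib:
            if name not in act.attrib:
                report.append(f"{exp.tag}@{name}")
    return report


expected = (
    '<StopPointInJourneyPattern id="WSF:StopPointInJourneyPattern:B-V-V" '
    'version="1" order="2">'
    '<ScheduledStopPointRef version="1" ref="WSF:ScheduledStopPoint:V"/>'
    "</StopPointInJourneyPattern>"
)
actual = (
    "<StopPointInJourneyPattern>"
    "<ScheduledStopPointRef></ScheduledStopPointRef>"
    "</StopPointInJourneyPattern>"
)

missing = missing_attributes(expected, actual)
for entry in missing:
    print(entry)
```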

@tefra
Owner

tefra commented Apr 23, 2024

Yep, I forgot to encode the attrs as well, @skinkie. Next time, please open new issues with simple examples that I can quickly reproduce.
