QGIS reads shp with 11 character attribute header #58152

Closed
2 tasks done
pigreco opened this issue Jul 17, 2024 · 11 comments
Labels
Won't fix By design, or won't be fixed for some other reason

Comments

@pigreco
Contributor

pigreco commented Jul 17, 2024

What is the bug or the crash?

A colleague used MATLAB to create an empty shapefile (a single record with NULL attributes) containing one attribute whose header is 11 characters long ("consistency").
QGIS reads it and even allows you to add features, while ArcGIS Pro reads it but truncates the header at the tenth character.

I'm reporting this because it could be a bug!

image

image

Steps to reproduce the issue

  1. Load the shapefile into QGIS.
  2. Look at the attribute properties.

Versions

OSGeo4W QGIS 3.34.8

Supported QGIS version

  • I'm running a supported QGIS version according to the roadmap.

New profile

Additional context

header11.zip

@pigreco added the Bug label Jul 17, 2024
@agiudiceandrea
Contributor

agiudiceandrea commented Jul 17, 2024

The DBF format uses 11 bytes to store the field name as a 0-terminated string, so the field name itself can be at most 10 characters long. It looks like MATLAB doesn't write the 0 as the final byte, which lets it use all 11 bytes for the field name.

I don't think this is actually an issue. In any case, the OGR provider sanitizes field names to a maximum of 10 characters when exporting to the ESRI Shapefile format.
Ping @rouault.
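
To make the 11-byte layout concrete, here is a minimal sketch in C. The helper `field_name_length` and the macro `FIELD_NAME_SLOT` are hypothetical names used only for illustration; they are not taken from QGIS, GDAL, or shapelib. The function counts the name bytes that precede the first 0 byte, capped at the size of the slot.

```c
#include <stddef.h>

/* Size of the field-name slot in a DBF field descriptor (assumed name). */
#define FIELD_NAME_SLOT 11

/* Hypothetical helper: number of name bytes before the first 0 byte,
 * capped at the slot size. A standard file yields at most 10; a slot
 * with no terminator yields 11. */
static size_t field_name_length(const unsigned char slot[FIELD_NAME_SLOT])
{
    size_t len = 0;
    while (len < FIELD_NAME_SLOT && slot[len] != '\0')
        len++;
    return len;
}
```

With a conventionally written slot the result is 10 or less. With a slot like the one in the attached file, where all 11 bytes are used by "consistency", the result would be 11.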

@rouault
Contributor

rouault commented Jul 17, 2024

Yes, this is the intended behavior. Cf. the following constants of the shapelib library:

/* Shapelib read up to 11 characters, even if only 10 should normally be used */
#define XBASE_FLDNAME_LEN_READ 11
/* On writing, we limit to 10 characters */
#define XBASE_FLDNAME_LEN_WRITE 10
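
As a rough illustration of what the two constants imply, the sketch below reads up to 11 name bytes but writes at most 10. Only the two `#define`s come from the snippet above; `read_field_name` and `write_field_name` are hypothetical helpers written for this example and are not shapelib's actual code.

```c
#include <stddef.h>
#include <string.h>

#define XBASE_FLDNAME_LEN_READ 11
#define XBASE_FLDNAME_LEN_WRITE 10

/* Hypothetical lax reader: copies all 11 slot bytes and appends a
 * terminator, so an unterminated 11-character name is still returned. */
static void read_field_name(const char *slot,
                            char out[XBASE_FLDNAME_LEN_READ + 1])
{
    memcpy(out, slot, XBASE_FLDNAME_LEN_READ);
    out[XBASE_FLDNAME_LEN_READ] = '\0';
}

/* Hypothetical strict writer: zero-fills the slot, then stores at most
 * 10 name bytes, so the 11th byte is always 0. */
static void write_field_name(const char *name,
                             char slot[XBASE_FLDNAME_LEN_READ])
{
    size_t n = strlen(name);
    if (n > XBASE_FLDNAME_LEN_WRITE)
        n = XBASE_FLDNAME_LEN_WRITE;
    memset(slot, 0, XBASE_FLDNAME_LEN_READ);
    memcpy(slot, name, n);
}
```

Under these assumptions, writing "consistency" and reading it back would give "consistenc", while reading a slot that already holds all 11 characters would give "consistency" unchanged.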

@rouault closed this as completed Jul 17, 2024
@rouault removed the Bug label Jul 17, 2024
@aborruso

> Yes, this is the intended behavior.

Thank you, @rouault. Do you know why this choice was made: to allow only one extra character and not, for example, three?

@rouault
Contributor

rouault commented Jul 18, 2024

> Do you know why this choice was made: to allow only one extra character and not, for example, three?

Answered by @agiudiceandrea in #58152 (comment)

@aborruso

> Answered by @agiudiceandrea in #58152 (comment)

Thank you.

We could introduce an additional check during reading. If the eleventh byte is a null character (\0), treat it as the standard terminator. Otherwise, raise an error indicating that the field name is not correctly terminated.
Therefore, it would be acceptable to have 11 characters only if the eleventh character is the \0 terminator.

What do you think?
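
The proposed check could look roughly like the following sketch. It is only an illustration of the idea in this comment: `read_field_name_strict` and `NAME_SLOT_SIZE` are hypothetical names, and nothing like this is confirmed to exist in shapelib, GDAL, or QGIS.

```c
#include <stddef.h>

/* Size of the field-name slot (assumed name for this sketch). */
#define NAME_SLOT_SIZE 11

/* Hypothetical strict reader: accepts the slot only if its 11th byte is
 * the 0 terminator. Returns 0 on success and -1 if the name is not
 * correctly terminated. On success, out holds a valid C string. */
static int read_field_name_strict(const unsigned char slot[NAME_SLOT_SIZE],
                                  char out[NAME_SLOT_SIZE])
{
    size_t i;
    if (slot[NAME_SLOT_SIZE - 1] != '\0')
        return -1;
    for (i = 0; i < NAME_SLOT_SIZE; i++)
        out[i] = (char)slot[i];
    return 0;
}
```

With such a check, a slot holding "consistenc" followed by a 0 byte would be accepted, while the 11-character "consistency" slot would be rejected.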

@rouault
Contributor

rouault commented Jul 18, 2024

> Therefore, it would be acceptable to have 11 characters only if the eleventh character is the \0 terminator.

The field name is encoded in exactly 11 bytes: no fewer, no more. So it is nominally 10 characters or fewer plus a \0 terminator (or the non-standard situation of 11 characters without a \0 terminator).

@aborruso

> or the non-standard situation of 11 characters without a \0 terminator

Please don't hate me: and why was it decided here that the software must accept a nonstandard situation, when it can easily be detected as wrong?
I probably don't fully understand, because I lack the technical background. I apologize for that.

@rouault
Contributor

rouault commented Jul 18, 2024

> and why was it decided here that the software must accept a nonstandard situation

I don't know. Probably the common logic: "be lax on reading, and strict on writing". This dates back to the initial revision of shapelib 29 years ago: OSGeo/shapelib@4af7724#diff-dad95b297ecd1ec47ff997ebfc36e4be7104f688b01954f704f0393a58ba5e09R543

@aborruso

But if it is 12 characters, is it still lax on reading? 🙃

Okay, I'll stop here. To me this is something that should be changed: the case is left open, since the name may be 10 or 11 characters for the reasons given, but it is not handled correctly.

Thank you for your time

@nyalldawson added the Won't fix label Jul 18, 2024
@pigreco
Contributor Author

pigreco commented Jul 18, 2024

Thanks @agiudiceandrea for the reply, thanks @rouault for your time, and thanks @aborruso for the time you wasted (it's wasted at this point).
I would just like to understand why Nyall added the Won't fix label. What does this label mean?

@nyalldawson
Collaborator

It means there'll be no action taken
