Test limited API #149

Open · wants to merge 1 commit into main
Conversation

@marcelm (Owner) commented Nov 12, 2024


After watching https://ep2024.europython.eu/session/cython-and-the-limited-api/, I wanted to test how well switching to the stable ABI would work.

It would be great to be able to build a single wheel for each platform and not have to update it when a new Python version is released.

There are two issues. The first is that using the limited API makes dnaio slower: reading FASTQ files takes 50% more time. I did not benchmark anything else. It was actually about 10 times slower when I used the limited API for Python 3.7. (This PR uses what is available in Python 3.12.) Maybe this will get better in newer versions.

The second issue is that some non-ABI3 symbols are still being used in our code, in particular PyUnicode_New.

abi3audit (https://github.com/pypa/abi3audit) outputs this:

```
$ abi3audit -v --assume-minimum-abi3 3.12 src/dnaio/_core.abi3.so
[09:47:43] 👎 : _core.abi3.so has non-ABI3 symbols
           ┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
           ┃ Symbol                  ┃
           ┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
           │ PyUnicode_New           │
           │ Py_CompileStringExFlags │
           │ _Py_FatalErrorFunc      │
           └─────────────────────────┘
           💁 _core.abi3.so: 1 extensions scanned; 0 ABI version mismatches and 3 ABI violations found
```

This PR is just meant to document the experiment, so I’ll close it right away.
@rhpvorderman (Collaborator)

PyUnicode_GET_LENGTH and PyUnicode_DATA depend on the internal representation of the string object, so they can never be part of the stable ABI. This is a bit problematic, because dnaio relies heavily on those macros for SequenceRecord operations. If we cannot use them, performance drops to Python speeds (calling len() and having to unwrap the result as an integer).

So it is not just the parsing: a lot of operations on the SequenceRecord objects will get slower as well.
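
For illustration, here is roughly what that difference looks like in C (a sketch; `record_field_length` is a hypothetical helper, not dnaio code):

```
#include <Python.h>

/* Hypothetical helper: length of a str field. Under the limited API
   only the function call is available; the macro reads the object
   struct directly and is therefore not ABI-stable. */
static Py_ssize_t
record_field_length(PyObject *field)
{
#ifdef Py_LIMITED_API
    return PyUnicode_GetLength(field);   /* real call, -1 on error */
#else
    return PyUnicode_GET_LENGTH(field);  /* inlined struct access */
#endif
}
```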

@marcelm (Owner, Author) commented Nov 12, 2024

I’m not saying we have to do this, but I want to document what I found.

There’s also PyUnicode_GetLength, which is in the stable ABI. It’s not a macro, so it’ll be slightly more expensive than PyUnicode_GET_LENGTH, but probably not too bad.

We use PyUnicode_DATA both for reading the data in a unicode object and for creating new ones (together with PyUnicode_New).

Victor Stinner proposed PEP 756. The PEP was ultimately withdrawn, but what I took away from the discussion is that PyUnicode_AsUTF8AndSize is perhaps an alternative for the case where we just want to read the data. If I understand correctly, the function just returns a pointer to the ASCII data if the string uses the ASCII representation anyway.

To replace usage of PyUnicode_New (needed, for example, in FastqIter.__next__ for the attributes of the SequenceRecord), we would need to switch to PyUnicode_DecodeUTF8 or PyUnicode_DecodeASCII as appropriate. My guess is that the overhead would not be too bad: we would no longer need to do an explicit memcpy(), and for ASCII data, we could remove our custom check for whether the data is ASCII and let PyUnicode_DecodeASCII do that check.
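
A rough sketch of what those two replacements might look like (hypothetical helper names, not actual dnaio code):

```
#include <Python.h>

/* Creating a field string without PyUnicode_New/PyUnicode_DATA:
   PyUnicode_DecodeASCII does the ASCII check and the copy itself
   and is part of the stable ABI. */
static PyObject *
field_to_str(const char *start, Py_ssize_t length)
{
    return PyUnicode_DecodeASCII(start, length, NULL);
}

/* Reading a field string without PyUnicode_DATA: for ASCII-only
   strings, PyUnicode_AsUTF8AndSize (stable ABI since 3.10) can
   return a pointer to the existing data, since ASCII is valid UTF-8. */
static const char *
str_data(PyObject *field, Py_ssize_t *length)
{
    return PyUnicode_AsUTF8AndSize(field, length);
}
```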

@rhpvorderman (Collaborator)

Interesting suggestions. PyUnicode_DecodeASCII was quite a lot slower because it checks each string individually rather than checking one big block and then using memcpy. It also does the ASCII check and the copying simultaneously: https://github.com/python/cpython/blob/main/Objects/unicodeobject.c#L4992

PyUnicode_DecodeLatin1 does what we do now and uses a find_max_char call to determine the maximum character. So it should be faster than PyUnicode_DecodeASCII (although this needs benchmarking). If we keep our own ASCII check in, we can still guarantee there is no violation of the FASTQ spec. Our custom ASCII check is much faster because it does not expect to find errors and checks larger blocks, as sketched below. (This is a typical case of tuning the algorithm to fit the data, something that the generic API calls in CPython cannot do.)
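
For reference, the core of such a block-wise check looks roughly like this (a sketch of the idea, not dnaio's actual implementation):

```
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Block-wise ASCII check: OR eight bytes at a time and test all the
   high bits at once, with a plain byte loop for the tail. Tuned for
   the common case where the input contains no non-ASCII byte at all. */
static int
is_ascii(const char *data, size_t length)
{
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= length; i += sizeof(uint64_t)) {
        uint64_t chunk;
        memcpy(&chunk, data + i, sizeof(chunk));  /* unaligned-safe load */
        acc |= chunk;
    }
    for (; i < length; i++) {
        acc |= (uint8_t)data[i];
    }
    return (acc & UINT64_C(0x8080808080808080)) == 0;
}
```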

I could not find a stable ABI equivalent for PyUnicode_New. PyUnicode_FromKindAndData leads to PyUnicode_From1ByteKind, which leads to PyUnicode_DecodeLatin1, which I already researched.

This is indeed interesting to pursue. If raw speed were the goal, this library would need to be written entirely in C anyway (yes, I have a branch somewhere that proves this...), but it was written in Cython to make maintenance easier. Limiting the code to the stable ABI is another step in reducing the maintenance burden.

@rhpvorderman (Collaborator)

I don't know how it will work out for Cython, but I tried it in pure C and ran into a few limitations, which Cython will also have to work around:

  • When raising errors, Py_TYPE(obj)->tp_name is not available; PyType_GetName should be used instead, and that is only available from the 3.11 stable ABI onwards.
  • PyBUF_READ and PyBUF_WRITE are only available from the 3.11 stable ABI. PyMemoryView_FromMemory is available from 3.4, but its last argument can only be filled in from 3.11 onwards.
  • Creating custom deallocators is harder, as accessing the tp_free slot now requires PyType_GetSlot rather than direct struct access (see the sketch after this list).
  • Only heap types are allowed. Types cannot be referenced in code as they are not static; they float somewhere in memory. This makes calling PyObject_New very hard, as you first have to find where the type lives. That requires fiddling with module states and passing the module as a parameter.
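
To illustrate the deallocator point, this is roughly what a dealloc looks like under the limited API (a sketch; MyObject is a hypothetical type):

```
#include <Python.h>

typedef struct {
    PyObject_HEAD
    char *buffer;  /* hypothetical owned resource */
} MyObject;

/* Under the limited API, PyTypeObject is opaque: tp_free must be
   fetched through PyType_GetSlot. A heap type is also referenced by
   each of its instances, and the deallocator must release that. */
static void
MyObject_dealloc(PyObject *op)
{
    MyObject *self = (MyObject *)op;
    PyTypeObject *tp = Py_TYPE(op);
    PyMem_Free(self->buffer);
    freefunc tp_free = (freefunc)PyType_GetSlot(tp, Py_tp_free);
    tp_free(op);
    Py_DECREF(tp);  /* drop the instance's reference to its heap type */
}
```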

Cython can probably work with the stable ABI before 3.11, but that will involve a lot of hacking on their side. The simplest things become very hard under this limitation. A simple library with a few C functions will work, sure, but classes that are custom iterators... I don't know. I will look at it again when the 3.14 release is imminent.

@rhpvorderman (Collaborator)

OK, I managed to convert sequali to stable ABI version 3.10 (with a minor hack: defining PyBUF_READ and PyBUF_WRITE myself).
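
Presumably the hack looks something like this (a sketch; the constant values match those in CPython's headers):

```
#include <Python.h>

/* The "minor hack": PyBUF_READ/PyBUF_WRITE are only exposed to the
   limited API from 3.11, but their values are stable in CPython. */
#ifndef PyBUF_READ
#define PyBUF_READ 0x100
#endif
#ifndef PyBUF_WRITE
#define PyBUF_WRITE 0x200
#endif

/* Sketch: wrap an existing buffer in a read-only memoryview.
   PyMemoryView_FromMemory itself is in older stable ABI versions. */
static PyObject *
chunk_view(char *data, Py_ssize_t size)
{
    return PyMemoryView_FromMemory(data, size, PyBUF_READ);
}
```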

PyUnicode_AsUTF8AndSize works pretty well and will suit most of our string reading cases. The ASCII check is also very simple. See this commit for details: rhpvorderman/sequali@9a64694

For dnaio it is best to wait for ABI 3.11, as that includes the entire buffer protocol, which is used for the chunking.

Currently I am looking into how to package all this and found this example project: https://github.com/joerick/python-abi3-package-sample

I found another reason to move to the stable ABI: it requires heap types rather than statically defined types. That makes type checking harder, so a module state is needed, which in turn leads to a properly defined module setup with a state and all the initialization required for proper module isolation: https://docs.python.org/3/howto/isolating-extensions.html

Technically this can also be done without the stable ABI, but the stable ABI makes it pretty much the only option. Isolating modules is important in a free-threaded Python future, and I am quite sure many bioinformaticians will jump at the opportunity to use dnaio in that context.
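
For reference, a minimal sketch of the module-state pattern from that howto (names are hypothetical, not dnaio's actual setup):

```
#include <Python.h>

/* Heap types live in per-module state instead of C statics, so each
   module instance (and each subinterpreter) gets its own copy. */
typedef struct {
    PyObject *record_type;  /* hypothetical heap type */
} module_state;

static PyType_Slot record_slots[] = {
    {0, NULL},
};

static PyType_Spec record_spec = {
    .name = "_core.SequenceRecord",
    .basicsize = sizeof(PyObject),
    .flags = Py_TPFLAGS_DEFAULT,
    .slots = record_slots,
};

static int
module_exec(PyObject *module)
{
    module_state *state = PyModule_GetState(module);
    /* PyType_FromModuleAndSpec ties the type to the module, so methods
       can find the state again via PyType_GetModule (stable ABI 3.10+).
       A full module would also clear state->record_type in m_traverse,
       m_clear and m_free. */
    state->record_type = PyType_FromModuleAndSpec(module, &record_spec, NULL);
    if (state->record_type == NULL)
        return -1;
    return PyModule_AddObjectRef(module, "SequenceRecord", state->record_type);
}

static PyModuleDef_Slot module_def_slots[] = {
    {Py_mod_exec, module_exec},
    {0, NULL},
};

static struct PyModuleDef core_def = {
    PyModuleDef_HEAD_INIT,
    .m_name = "_core",
    .m_size = sizeof(module_state),  /* per-module state enables isolation */
    .m_slots = module_def_slots,
};

PyMODINIT_FUNC
PyInit__core(void)
{
    return PyModuleDef_Init(&core_def);
}
```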

Sorry for the info dump here, but I figured it would be nice to have something to reference when implementing the stable ABI.
