Problems reading stdin on Windows 11 Python 3.12 #791

Closed
peterjc opened this issue Jun 21, 2024 · 7 comments

@peterjc
Contributor

peterjc commented Jun 21, 2024

This issue was identified on peterjc/thapbi-pict#627, where continuous integration tests using cutadapt to generate test inputs started to fail WITHOUT relevant code changes, i.e. something changed in the AppVeyor Windows environment (unclear what) and/or the PyPI packages (nothing obvious).

I have since reduced this to a local test case under Windows 11, Python 3.12.4 (installed from python.org with the PATH option ticked), and cutadapt 4.9 (installed with pip install -U cutadapt):

C:\Users\xxx>python --version
Python 3.12.4

C:\Users\xxx>pip install -U cutadapt
Collecting cutadapt
  Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl.metadata (3.5 kB)
Collecting dnaio>=1.2.0 (from cutadapt)
  Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl.metadata (3.6 kB)
Collecting xopen>=1.6.0 (from cutadapt)
  Downloading xopen-2.0.2-py3-none-any.whl.metadata (15 kB)
Collecting isal>=1.6.1 (from xopen>=1.6.0->cutadapt)
  Downloading isal-1.6.1-cp312-cp312-win_amd64.whl.metadata (10 kB)
Collecting zlib-ng>=0.4.1 (from xopen>=1.6.0->cutadapt)
  Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl.metadata (6.9 kB)
Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl (231 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 231.1/231.1 kB 939.6 kB/s eta 0:00:00
Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl (86 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.2/86.2 kB 808.2 kB/s eta 0:00:00
Downloading xopen-2.0.2-py3-none-any.whl (17 kB)
Downloading isal-1.6.1-cp312-cp312-win_amd64.whl (201 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.5/201.5 kB 942.2 kB/s eta 0:00:00
Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl (88 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.5/88.5 kB 1.7 MB/s eta 0:00:00
Installing collected packages: zlib-ng, isal, xopen, dnaio, cutadapt
Successfully installed cutadapt-4.9 dnaio-1.2.1 isal-1.6.1 xopen-2.0.2 zlib-ng-0.4.3

C:\Users\xxx>cutadapt --version
4.9

This was then patched to include d9cf273 for a clearer error message.

Working example using a filename and one of the cutadapt test files (the primer isn't really appropriate for this dataset):

C:\Users\xxx\cutadapt\tests\data>cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG 454.fa | find ">" /C
59

Restructured to read from stdin:

C:\Users\xxx\Projects\cutadapt\tests\data>type 454.fa | cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG - | find ">" /C
Input file format not recognized. The file starts with b'TGTA', but files in supported formats start with '>' (FASTA), '@' (FASTQ) or 'BAM'
0

Inserting additional logging suggests that the function detect_file_format is called twice: the first call works and reports FASTA format, but the second call happens partway through the file and fails.
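
To illustrate why a second format check partway through a stream fails, here is a minimal sketch. It is not dnaio's or cutadapt's actual code: the `detect_file_format` below is a hypothetical stand-in that only looks at the next few bytes, and an in-memory buffer stands in for the pipe. It shows the general failure mode only, not the Windows-specific cause.

```python
import io


def detect_file_format(stream: io.BufferedReader) -> str:
    """Hypothetical sniffer: inspect the next bytes without consuming them."""
    head = stream.peek(4)[:4]
    if head.startswith(b">"):
        return "FASTA"
    if head.startswith(b"@"):
        return "FASTQ"
    raise ValueError(f"format not recognized, stream starts with {head!r}")


def sniff_twice(data: bytes) -> tuple[str, str]:
    """Sniff at the start, consume one line, then sniff again."""
    stream = io.BufferedReader(io.BytesIO(data))
    first = detect_file_format(stream)
    stream.readline()  # the reader has now moved past the header line
    try:
        second = detect_file_format(stream)
    except ValueError as exc:
        second = f"error: {exc}"
    return first, second


result = sniff_twice(b">read1\nTGTAACGT\n>read2\nACGT\n")
print(result)
```

The first call sees the `>` header and reports FASTA; the second call sees sequence bytes, much like the `b'TGTA'` in the error message above.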

This reminds me of #774, but is something Windows specific it seems.

@rhpvorderman
Collaborator

Does it work with the latest xopen 1.x version? 2.0.0 was a massive refactoring that enabled all sorts of cool functionality while also using less code, but as a consequence there were unforeseen bugs.

@peterjc
Contributor Author

peterjc commented Jun 21, 2024

I will try and check that next week (won't have access to a Windows machine over the weekend).

@peterjc
Contributor Author

peterjc commented Jun 24, 2024

No change with xopen 1.7.0, 1.8.0, or 1.9.0 - and my debugging to stderr still shows the detect_file_format function being called twice.

@marcelm
Owner

marcelm commented Jun 24, 2024

I've now been able to reproduce the problem on Windows 10. I can also see where the function is called twice, will see what I can do.

@marcelm
Owner

marcelm commented Jul 29, 2024

Interesting, it works on Python 3.10, 3.11 and 3.12.0, but fails on 3.12.4. Bisecting points to commit python/cpython@de347c0 (part of 3.12.3).

@marcelm
Owner

marcelm commented Nov 13, 2024

I realize only now that this is the same issue as the one that we encountered in dnaio two days ago when trying to make a new release. This is now fixed by this PR: marcelm/dnaio#148

I’ve also changed the dnaio CI so that the Windows tests run on all supported Python versions, which should help to catch something like this better.

@marcelm marcelm closed this as completed Nov 13, 2024
@peterjc
Contributor Author

peterjc commented Nov 13, 2024

I actually adjusted my code's test suite after reporting this not to use stdin/stdout so much with cutadapt - otherwise you might have had more bug reports from me ;)
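
For anyone wanting a similar workaround in their own tests, a rough sketch of the idea (passing a filename instead of piping through stdin) might look like the following. The helper name `run_on_tempfile` is made up for illustration, and a small Python one-liner stands in for cutadapt so the example runs without it installed.

```python
import os
import subprocess
import sys
import tempfile


def run_on_tempfile(command: list[str], data: bytes) -> bytes:
    """Write data to a temporary file, append its path to command, return stdout."""
    # delete=False so the closed file can be reopened by the child process on Windows
    with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as handle:
        handle.write(data)
        path = handle.name
    try:
        return subprocess.run(command + [path], check=True, capture_output=True).stdout
    finally:
        os.remove(path)


# Stand-in child process that counts '>' characters in the named file.
# With cutadapt installed, the command prefix from the report would be:
#   ["cutadapt", "--quiet", "-g", "GAAGGTGAAGTCGTAACAAGG"]
counter = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read().count('>'))"]
output = run_on_tempfile(counter, b">read1\nACGT\n>read2\nTGCA\n")
print(output.decode().strip())
```

This only sidesteps the stdin path; it does not address the underlying bug, which the dnaio PR mentioned above fixes.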
