Supposed file corruption during Pixel Clustering step 2.2: Assign pixel SOM clusters #688

Closed
svntrx opened this issue Sep 5, 2022 · 18 comments · Fixed by #863
Labels: question (Further information is requested)

Comments

@svntrx

svntrx commented Sep 5, 2022

Dear ark team,

I was trying out the pixel clustering notebook today and ran into an issue I don't know how to resolve.

After successfully training the pixel SOM (step 2.1), trying to run the next cell first gives the output "The data for FOV has been corrupted, removing" for all FOVs in the fovs list, then throws the KeyError "pixel_som_cluster", as there are no FOVs left to process.

I tried reinstalling the codebase, updating Docker, and rerunning the notebook, unfortunately to no avail.
I also checked all involved paths and couldn't find any issues there.

Do you guys have an idea where the issue may lie?
Thanks already for your help!

Best wishes,
Sven

[screenshots of the notebook output and error traceback]

svntrx added the question (Further information is requested) label on Sep 5, 2022
@alex-l-kong
Contributor

@svntrx I'm assuming no errors were encountered during the create_pixel_matrix step.

After deleting your current pixel_output_dir, can you try changing the pixel SOM assignment cell to this:

som_utils.cluster_pixels(
    fovs,
    channels,
    base_dir,
    data_dir=pixel_data_dir,
    norm_vals_name=norm_vals_name,
    weights_name=pixel_weights_name,
    pc_chan_avg_som_cluster_name=pc_chan_avg_som_cluster_name,
    batch_size=1  # this is the new parameter being added
)

@alex-l-kong
Contributor

@svntrx I had another user encounter an error like this; their problem ended up being a multithreading issue with how the number of cores is set. I'll link the issue I just opened here: #695.

If you need to test a fix out immediately, you'll need to modify ark/phenotyping/run_pixel_som.R and ark/phenotyping/pixel_consensus_cluster.R. In both files, change the line nCores <- parallel::detectCores() - 1 to a more reasonable value, such as half the number of CPU cores on your machine; see the sketch below.
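
For example, a minimal version of that edit might look like the following (the halving heuristic is just an illustration; any fixed value well below your total core count should work):

# original line in run_pixel_som.R and pixel_consensus_cluster.R:
# nCores <- parallel::detectCores() - 1

# replacement: cap the worker count at roughly half the machine's cores
nCores <- max(1, floor(parallel::detectCores() / 2))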

@jonhsussman

jonhsussman commented Sep 18, 2022

I have encountered a similar error; however, neither of the solutions mentioned was able to solve the problem. I am running the pipeline in a conda environment, and it works successfully with 10 markers, but upon increasing the number of markers for pixel clustering it is now stuck on this step.

One of the marker files was potentially corrupt, but removing it still did not help, and it seems unlikely that many of the single-channel TIFF files are corrupt. Do you happen to have any other suggestions about what could cause this error in this case?

[screenshot of the error output]

@alex-l-kong
Contributor

@jonhsussman we recently updated our repo to include more detailed error messages in case of FOV processing errors. Can you ensure you have the most recent changes by running git pull, then post the error message after running the pipeline through again?

@jonhsussman

jonhsussman commented Sep 19, 2022

I ran it using the new files, and the current error message is below. Of note, in the preceding step I initially received the error Error in SOM(data = as.matrix(pixelSubsetData, rlen = numPasses, xdim = xdim, : unused argument (map = FALSE), but resolved this by removing the map = FALSE argument from the SOM call in create_pixel_som.R. Although it ran successfully with the example dataset, I wonder if there could be a dependency issue here or some other issue with the SOM step.

Please let me know if you have any other thoughts here!

[screenshot of the new error message, which fails at the fovStatuses[i, 'status'] == 1 check]

@alex-l-kong
Contributor

alex-l-kong commented Sep 20, 2022

@jonhsussman for the SOM error, I would say wait a bit until we get a new Docker image pushed. It seems like you may be using an older image which didn't implement the map = FALSE argument in the SOM function.

The fovStatuses[i, 'status'] == 1 error is surprising, since it seems to imply that the fovStatuses data frame being returned on your end doesn't contain a 'status' column, or is placing a NULL value there. I'm assuming you didn't run into this issue prior to the updates; can you try printing out fovStatuses and see what's happening?
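
For example, a quick way to inspect it (assuming these lines are added just before the fovStatuses[i, 'status'] == 1 check in the R script that throws the error) would be:

# dump the structure and contents of the per-FOV status data frame
str(fovStatuses)               # column names, types, and any NA/NULL entries
print(colnames(fovStatuses))
print(fovStatuses)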

@jonhsussman

jonhsussman commented Sep 20, 2022 via email

@jonhsussman

jonhsussman commented Sep 21, 2022

As a brief update, this is the result of printing out fovStatuses; I'm not sure how best to interpret it:

[screenshot of the printed fovStatuses data frame]

And this is some information from the log file (I added some print statements to print out the filename and path):

[screenshot of the log file output]

@ngreenwald
Member

Hey @jonhsussman, if you look in the Dockerfile you'll see that we're importing a forked version of FlowSOM.

Given the complexity of ark's dependencies, we won't be able to offer troubleshooting help if you decide to go the conda route. We totally understand the appeal of having everything work from the terminal, but we moved towards the Docker model because of all the hard-to-track-down issues that kept coming up.

@jonhsussman

I managed to solve this problem by processing this step in batches. I am linking to the solution here so others can find it if they encounter the same problem:

SofieVG/FlowSOM#54

@jonhsussman

jonhsussman commented Sep 24, 2022

I wanted to get back to this to expand on it and list the direct solution here so it can be more easily referenced. It is clear that on large FOVs, the existing code (Docker and conda alike) will crash at this .C step due to a hard limit on passing a vector of more than 2^31 elements. I tailored the excellent solution above to the code at hand and found that, in run_pixel_som.R, the line clusters <- FlowSOM:::MapDataToCodes(somWeights, as.matrix(fovPixelData)) can be directly replaced by the following:

block_size = 100  # number of rows mapped per call; can be adjusted
nwdata = as.matrix(fovPixelData)
clusters = matrix(0.0, nrow(nwdata), 2)  # MapDataToCodes returns 2 columns
block_end = nrow(nwdata)
for (block_i in 0:((block_end - 1) %/% block_size)) {
  i_start = 1 + block_i * block_size
  i_end = min((block_i + 1) * block_size, block_end)
  cat(i_start, i_end, "\n")
  # drop = FALSE keeps a single-row block as a matrix rather than a vector
  clusters[i_start:i_end, ] = FlowSOM:::MapDataToCodes(somWeights, nwdata[i_start:i_end, , drop = FALSE])
}

The block_size can be adjusted; I found that 100 works well, taking an hour or so to complete, which seemed reasonable. It could probably go up to 1,000,000 and might be much faster with a very large block, but I didn't have a good reason to test that, so some experimentation may be needed to optimize it. Perhaps the pipeline could use this approach only when the image is above a certain size, or, if it's fast enough in all cases, simply use it in general to avoid other possible errors that may arise.

@ngreenwald
Member

Hey @jonhsussman, thanks for following up! Can you confirm (if you haven't already) that this is only an issue, even on your end, with large FOVs? Maybe just take a small crop (around 2k x 2k, which is what our default is) from one of your larger images and make sure it runs with our current pipeline?

I just want to make sure there isn't some other error you've found with the pipeline in general, and that it really is specific to large images.

@jonhsussman

@ngreenwald That is correct; in fact, this error never came up at all with one of our images that was only slightly smaller (but still very massive), and the pipeline also ran just fine with the example images provided.

@ngreenwald
Member

Okay awesome, thanks!

@ngreenwald
Member

Hey @jonhsussman, we just had our weekly meeting to discuss this. Not sure what your bandwidth is, but if you're interested in contributing to the repository, we always welcome pull requests from collaborators! If not, no worries; we've added this to our to-do list and we'll get the change merged in.

@jonhsussman

jonhsussman commented Oct 10, 2022 via email

@ngreenwald
Member

That sounds great! No rush, totally understand that you have other responsibilities. We have some general guidelines here: https://ark-analysis.readthedocs.io/en/latest/_rtd/contributing.html

This one is pretty straightforward, and you already have a code solution, which is great. The main thing is just to confirm that your new implementation gives exactly the same results as the base version: generating some random vectors and passing them to both versions of the function, for example; see the sketch below. We don't need to test the whole pixie pipeline, just that specific FlowSOM call.
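
A minimal sketch of that kind of equivalence check, assuming the forked FlowSOM is installed and using random stand-ins for the real SOM weights and pixel data (all sizes here are arbitrary):

library(FlowSOM)

set.seed(42)
n_channels <- 20
n_pixels <- 1050                                            # deliberately not a multiple of block_size
somWeights <- matrix(runif(100 * n_channels), nrow = 100)   # stand-in for a 10x10 SOM grid
nwdata <- matrix(runif(n_pixels * n_channels), nrow = n_pixels)

# reference: map all pixels in a single call
clusters_ref <- FlowSOM:::MapDataToCodes(somWeights, nwdata)

# blocked version, mirroring the replacement proposed for run_pixel_som.R
block_size <- 100
clusters_blk <- matrix(0.0, nrow(nwdata), 2)
for (block_i in 0:((nrow(nwdata) - 1) %/% block_size)) {
  i_start <- 1 + block_i * block_size
  i_end <- min((block_i + 1) * block_size, nrow(nwdata))
  clusters_blk[i_start:i_end, ] <- FlowSOM:::MapDataToCodes(somWeights, nwdata[i_start:i_end, , drop = FALSE])
}

# the two results should match up to floating-point noise
stopifnot(isTRUE(all.equal(clusters_ref, clusters_blk, check.attributes = FALSE)))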

@jonhsussman

jonhsussman commented Oct 14, 2022 via email
