Supposed file corruption during Pixel Clustering step 2.2: Assign pixel SOM clusters #688

Closed
svntrx opened this issue Sep 5, 2022 · 18 comments · Fixed by #863
Labels: question (Further information is requested)

Comments

@svntrx

svntrx commented Sep 5, 2022

Dear ark team,

I was trying out the pixel clustering notebook today and ran into an issue I don't know how to resolve.

After successfully training the pixel SOM (step 2.1), trying to run the next cell first gives the output "The data for FOV has been corrupted, removing" for all FOVs in the fovs list, then throws the KeyError "pixel_som_cluster", as there are no FOVs left to process.

I tried reinstalling the codebase, updating Docker, and rerunning the notebook, unfortunately to no avail.
I also checked all involved paths and couldn't find any issues there.

Do you guys have an idea where the issue may lie?
Thanks already for your help!

Best wishes,
Sven

[screenshots of the notebook output and error traceback]

svntrx added the question (Further information is requested) label on Sep 5, 2022
@alex-l-kong
Contributor

@svntrx I'm assuming no errors were encountered during the create_pixel_matrix step.

After deleting your current pixel_output_dir, can you try changing the pixel SOM assignment cell to this:

som_utils.cluster_pixels(
    fovs,
    channels,
    base_dir,
    data_dir=pixel_data_dir,
    norm_vals_name=norm_vals_name,
    weights_name=pixel_weights_name,
    pc_chan_avg_som_cluster_name=pc_chan_avg_som_cluster_name,
    batch_size=1  # this is the new parameter being added
)

@alex-l-kong
Contributor

@svntrx I had another user encounter an error like this; their problem ended up being a multithreading issue with how the number of cores is set. I'll link the issue I just opened here: #695.

If you need to test a fix out immediately, you'll need to modify ark/phenotyping/run_pixel_som.R and ark/phenotyping/pixel_consensus_cluster.R. In both files, change the line nCores <- parallel::detectCores() - 1 to a more reasonable value, such as half the number of CPU cores on your machine; see the sketch below.
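
For example, a minimal version of that edit might look like the following (the halving heuristic is just an illustration; any fixed value well below your total core count should work):

# original line in run_pixel_som.R and pixel_consensus_cluster.R:
# nCores <- parallel::detectCores() - 1

# replacement: cap the worker count at roughly half the machine's cores
nCores <- max(1, floor(parallel::detectCores() / 2))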

@jonhsussman

jonhsussman commented Sep 18, 2022

I have encountered a similar error; however, neither of the solutions mentioned was able to solve the problem. I am running the pipeline in a conda environment, and it works successfully with 10 markers, but upon increasing the number of markers for pixel clustering it is now stuck on this step.

One of the marker files was potentially corrupt, but removing it still did not help, and it seems unlikely that many of the single-channel TIFF files are corrupt. Do you happen to have any other suggestions about what could cause this error in this case?

[screenshot of the error output]

@alex-l-kong
Contributor

@jonhsussman we recently updated our repo to include more detailed error messages in case of FOV processing errors. Can you ensure you have the most recent changes by running git pull, then post the error message after running the pipeline through again?

@jonhsussman

jonhsussman commented Sep 19, 2022

I ran it using the new files, and the current error message is below. Of note, in the preceding step I initially received the error Error in SOM(data = as.matrix(pixelSubsetData, rlen = numPasses, xdim = xdim, : unused argument (map = FALSE), but resolved this by removing the map = FALSE argument from the SOM call in create_pixel_som.R. Although it ran successfully with the example dataset, I wonder if there could be a dependency issue here or some other issue with the SOM step.

Please let me know if you have any other thoughts here!

[screenshot of the new error message, which fails at the fovStatuses[i, 'status'] == 1 check]

@alex-l-kong
Contributor

alex-l-kong commented Sep 20, 2022

@jonhsussman for the SOM error, I would say wait a bit until we get a new Docker image pushed. It seems like you may be using an older image which didn't implement the map = FALSE argument in the SOM function.

The fovStatuses[i, 'status'] == 1 error is surprising, since it seems to imply that the fovStatuses data frame being returned on your end doesn't contain a 'status' column, or is placing a NULL value there. I'm assuming you didn't run into this issue prior to the updates; can you try printing out fovStatuses and see what's happening?
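
For example, a quick way to inspect it (assuming these lines are added just before the fovStatuses[i, 'status'] == 1 check in the R script that throws the error) would be:

# dump the structure and contents of the per-FOV status data frame
str(fovStatuses)               # column names, types, and any NA/NULL entries
print(colnames(fovStatuses))
print(fovStatuses)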

@jonhsussman

jonhsussman commented Sep 20, 2022 via email

@jonhsussman

jonhsussman commented Sep 21, 2022

As a brief update, this is the result of printing out fovStatuses; I'm not sure how best to interpret it:

[screenshot of the printed fovStatuses data frame]

And this is some information from the log file (I added some print statements to print out the filename and path):

[screenshot of the log file output]

@ngreenwald
Member

Hey @jonhsussman, if you look in the Dockerfile you'll see that we're importing a forked version of FlowSOM.

Given the complexity of ark's dependencies, we won't be able to offer troubleshooting help if you decide to go the conda route. We totally understand the appeal of having everything work from the terminal, but we moved towards the Docker model because of all the hard-to-track-down issues that kept coming up.

@jonhsussman

I managed to solve this problem by processing this step in batches. I am linking to the solution here so others can find it if they encounter the same problem:

SofieVG/FlowSOM#54

@jonhsussman

jonhsussman commented Sep 24, 2022

I wanted to get back to this to expand on it and list the direct solution here so it can be more easily referenced. It is clear that on large FOVs, the existing code (Docker and conda alike) will crash at this .C step due to a hard limit on passing a vector of more than 2^31 elements. I tailored the excellent solution above to the code at hand and found that, in run_pixel_som.R, the line clusters <- FlowSOM:::MapDataToCodes(somWeights, as.matrix(fovPixelData)) can be directly replaced by the following:

block_size = 100  # number of rows mapped per call; can be adjusted
nwdata = as.matrix(fovPixelData)
clusters = matrix(0.0, nrow(nwdata), 2)  # MapDataToCodes returns 2 columns
block_end = nrow(nwdata)
for (block_i in 0:((block_end - 1) %/% block_size)) {
  i_start = 1 + block_i * block_size
  i_end = min((block_i + 1) * block_size, block_end)
  cat(i_start, i_end, "\n")
  # drop = FALSE keeps a single-row block as a matrix rather than a vector
  clusters[i_start:i_end, ] = FlowSOM:::MapDataToCodes(somWeights, nwdata[i_start:i_end, , drop = FALSE])
}

The block_size can be adjusted; I found that 100 works well, taking an hour or so to complete, which seemed reasonable. It could probably go up to 1,000,000 and might be much faster with a very large block, but I didn't have a good reason to test that, so some experimentation may be needed to optimize it. Perhaps the pipeline could use this approach only when the image is above a certain size, or, if it's fast enough in all cases, simply use it in general to avoid other possible errors that may arise.

@ngreenwald
Member

Hey @jonhsussman, thanks for following up! Can you confirm (if you haven't already) that this is only an issue, even on your end, with large FOVs? Maybe just take a small crop (around 2k x 2k, which is what our default is) from one of your larger images and make sure it runs with our current pipeline?

I just want to make sure there isn't some other error you've found with the pipeline in general, and that it really is specific to large images.

@jonhsussman

@ngreenwald That is correct; in fact, this error never came up at all with one of our images that was only slightly smaller (but still very massive), and the pipeline also ran just fine with the example images provided.

@ngreenwald
Member

Okay awesome, thanks!

@ngreenwald
Member

Hey @jonhsussman, we just had our weekly meeting to discuss this. Not sure what your bandwidth is, but if you're interested in contributing to the repository, we always welcome pull requests from collaborators! If not, no worries; we've added this to our to-do list and we'll get the change merged in.

@jonhsussman

jonhsussman commented Oct 10, 2022 via email

@ngreenwald
Member

That sounds great! No rush, totally understand that you have other responsibilities. We have some general guidelines here: https://ark-analysis.readthedocs.io/en/latest/_rtd/contributing.html

This one is pretty straightforward, and you already have a code solution, which is great. The main thing is just to confirm that your new implementation gives exactly the same results as the base version: generating some random vectors and passing them to both versions of the function, for example; see the sketch below. We don't need to test the whole pixie pipeline, just that specific FlowSOM call.
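
A minimal sketch of that kind of equivalence check, assuming the forked FlowSOM is installed and using random stand-ins for the real SOM weights and pixel data (all sizes here are arbitrary):

library(FlowSOM)

set.seed(42)
n_channels <- 20
n_pixels <- 1050                                            # deliberately not a multiple of block_size
somWeights <- matrix(runif(100 * n_channels), nrow = 100)   # stand-in for a 10x10 SOM grid
nwdata <- matrix(runif(n_pixels * n_channels), nrow = n_pixels)

# reference: map all pixels in a single call
clusters_ref <- FlowSOM:::MapDataToCodes(somWeights, nwdata)

# blocked version, mirroring the replacement proposed for run_pixel_som.R
block_size <- 100
clusters_blk <- matrix(0.0, nrow(nwdata), 2)
for (block_i in 0:((nrow(nwdata) - 1) %/% block_size)) {
  i_start <- 1 + block_i * block_size
  i_end <- min((block_i + 1) * block_size, nrow(nwdata))
  clusters_blk[i_start:i_end, ] <- FlowSOM:::MapDataToCodes(somWeights, nwdata[i_start:i_end, , drop = FALSE])
}

# the two results should match up to floating-point noise
stopifnot(isTRUE(all.equal(clusters_ref, clusters_blk, check.attributes = FALSE)))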

@jonhsussman

jonhsussman commented Oct 14, 2022 via email
