Test fail on GTX 1080Ti with CUDA_ERROR_OUT_OF_MEMORY #83

Open
Jiede1 opened this issue Sep 19, 2018 · 0 comments

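The environment is CentOS 7.4 with CUDA 9.0 and one GeForce GTX 1080 Ti. Two tests in the suite fail, the first with CUDA_ERROR_OUT_OF_MEMORY and the second with CUDA_ERROR_INVALID_CONTEXT: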

- Run map + reduce on datasets with 100,000,000 elements - multiple partitions
- Run map + map + reduce on datasets - multiple partitions
- Run map + map + map + collect on datasets
- Run map + map + map + reduce on datasets - multiple partitions
- Run map on dataset with a single primitive array column
- Run map with free variables on dataset with a single primitive array column
- Run reduce on dataset with a single primitive array column
- Run map & reduce on a single primitive array in a structure *** FAILED ***
  jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
  at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
  at jcuda.driver.JCudaDriver.cuCtxCreate(JCudaDriver.java:1444)
  at com.ibm.gpuenabler.GPUSparkEnv$.get(GPUSparkEnv.scala:143)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply$mcV$sp(CUDADSFunctionSuite.scala:743)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at com.ibm.gpuenabler.CUDADSFunctionSuite$$anonfun$47.apply(CUDADSFunctionSuite.scala:740)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- Run logistic regression *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 13, localhost, executor driver): jcuda.CudaException: CUDA_ERROR_INVALID_CONTEXT
        at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
        at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:102)
        at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:87)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:87)
        at com.ibm.gpuenabler.CUDAManager.getModule(CUDAManager.scala:62)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.processGPU(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$JCUDAIteratorImpl.hasNext(Unknown Source)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:152)
        at com.ibm.gpuenabler.MAPGPUExec$$anonfun$doExecute$1.apply(CUDADSUtils.scala:73)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      ...