The newer version of spark,adam,sparklingwater for "Genomic Analysis Using ADAM, Spark and Deep Learning" to the people who want to reproduce the test #2

car2008 · 2016-08-26T05:19:29Z

I have some advice for people who want to reproduce the "Genomic Analysis Using ADAM, Spark and Deep Learning" test using newer versions of the tools:

car2008 · 2016-08-26T05:49:39Z

Hi @nfergu, I have some advice for people who want to reproduce the "Genomic Analysis Using ADAM, Spark and Deep Learning" test. I am posting all the changes here in the hope that they are helpful to others.
First, in the pom.xml file:

Spark version 1.6.1 replacing 1.2.0
ADAM version 0.19.0 replacing 0.16.0
Sparkling Water version 1.6.5 replacing 1.2.5
H2O version 3.8.2.6 replacing 3.0.0.8 (only the version number needs to change; H2O does not need to be installed separately once Sparkling Water is installed)

<dependency>
    <groupId>org.bdgenomics.adam</groupId>
    <artifactId>adam-core</artifactId>
    <version>${adam.version}</version>
</dependency>
<dependency>
    <groupId>org.bdgenomics.adam</groupId>
    <artifactId>adam-apis</artifactId>
    <version>${adam.version}</version>
</dependency>

is modified to

<dependency>
    <groupId>org.bdgenomics.adam</groupId>
    <artifactId>adam-core_2.10</artifactId>
    <version>${adam.version}</version>
</dependency>
<dependency>
    <groupId>org.bdgenomics.adam</groupId>
    <artifactId>adam-apis_2.10</artifactId>
    <version>${adam.version}</version>
</dependency>

Then, in the code:

val header = StructType(Array(StructField("Region", StringType)) ++
      sortedVariantsBySampleId.first()._2.map(variant => {StructField(variant.variantId.toString, IntegerType)}))

is modified to

val header = DataTypes.createStructType(Array(DataTypes.createStructField("Region", DataTypes.StringType, false)) ++
      sortedVariantsBySampleId.first()._2.map(variant => {DataTypes.createStructField(variant.variantId.toString, DataTypes.IntegerType, false)}))

// Create the SchemaRDD from the header and rows and convert the SchemaRDD into a H2O dataframe
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val schemaRDD = sqlContext.applySchema(rowRDD, header)
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._
    val dataFrame = h2oContext.toDataFrame(schemaRDD)

is modified to

// Create the SchemaRDD from the header and rows and convert the SchemaRDD into a H2O dataframe
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // val dataFrame = sqlContext.createDataFrame(rowRDD, header)
    val schemaRDD = sqlContext.applySchema(rowRDD, header)
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._
    val dataFrame1 = h2oContext.asH2OFrame(schemaRDD)
    // Convert all string columns (here, Region) to categorical columns
    val dataFrame = H2OFrameSupport.allStringVecToCategorical(dataFrame1)

// Split the dataframe into 50% training, 30% test, and 20% validation data
    val frameSplitter = new FrameSplitter(dataFrame, Array(.5, .3), Array("training", "test", "validation").map(Key.make), null)

is modified to

// Split the dataframe into 50% training, 30% test, and 20% validation data
   val frameSplitter = new FrameSplitter(dataFrame, Array(.5, .3), Array("training", "test", "validation").map(Key.make[Frame](_)), null)
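As a rough illustration of why only two ratios are passed for three output frames: the last split receives whatever fraction is left over, which is how `Array(.5, .3)` yields the 20% validation set mentioned in the comment. The sketch below is plain Scala with no H2O dependency; `SplitRatios` and `withRemainder` are hypothetical names used only for this illustration and are not part of the H2O API.

```scala
object SplitRatios {
  // Hypothetical helper (not an H2O call): given the explicitly listed
  // ratios, append the leftover fraction that the final split receives.
  def withRemainder(explicit: Seq[Double]): Seq[Double] = {
    require(explicit.forall(r => r > 0.0 && r < 1.0), "each ratio must be in (0, 1)")
    val used = explicit.sum
    require(used < 1.0, "explicit ratios must leave room for the final split")
    explicit :+ (1.0 - used)
  }

  def main(args: Array[String]): Unit = {
    // Same names and ratios as in the FrameSplitter call above
    val names = Seq("training", "test", "validation")
    names.zip(withRemainder(Seq(0.5, 0.3))).foreach { case (name, ratio) =>
      println(f"$name%-10s ${ratio * 100}%.0f%%")
    }
  }
}
```

Because the remainder is computed with floating-point arithmetic, compare it with a small tolerance rather than with exact equality.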

// Set the parameters for our deep learning model.
    val deepLearningParameters = new DeepLearningParameters()
    deepLearningParameters._train = training
    deepLearningParameters._valid = validation

is modified to

// Set the parameters for our deep learning model.
    val deepLearningParameters = new DeepLearningParameters()
    deepLearningParameters._train = training._key
    deepLearningParameters._valid = validation._key

// Score the model against the entire dataset (training, test, and validation data)
    // This causes the confusion matrix to be printed
    deepLearningModel.score(dataFrame)('predict)

is modified to

// Score the model against the entire dataset (training, test, and validation data)
    // This causes the confusion matrix to be printed
    deepLearningModel.score(dataFrame)

Add the following imports:

import org.apache.spark.sql.types.DataTypes
import hex._
import water.fvec._
import water.support._
import _root_.hex.Distribution.Family
import _root_.hex.deeplearning.DeepLearningModel
import _root_.hex.tree.gbm.GBMModel
import _root_.hex.{Model, ModelMetricsBinomial}

OK, that's all. I have tested these changes successfully. Any further suggestions are welcome. Thank you again!

Added more population groups

1d8fbac
