This repository has been archived by the owner on Nov 28, 2020. It is now read-only.
BAMBDGDataSource genomic intervals predicate pushdowns using BAI
Marek Wiewiórka edited this page Jul 25, 2018 · 4 revisions
# start spark-shell locally (4 cores, 8 GB driver memory) with the SeQuiLa assembly jar
spark-shell --master=local[4] \
--driver-memory=8g \
--jars /Users/marek/git/forks/bdg-sequila/target/scala-2.11/bdg-sequila-assembly-0.4.1-SNAPSHOT.jar
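// the following statements are entered in the spark-shell REPL
import org.apache.spark.sql.SequilaSession
import org.biodatageeks.utils.{SequilaRegister, UDFRegister}
// create a SequilaSession from the shell's SparkSession
val ss = SequilaSession(spark)
/* inject bdg-granges strategy */
SequilaRegister.register(ss)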
// register the BAM file as a table read through BAMDataSource
ss.sql("""
CREATE TABLE reads_exome USING org.biodatageeks.datasources.BAM.BAMDataSource
OPTIONS(path '/Users/marek/Downloads/data/NA12878.ga2.exome.maq.recal.bam')""")
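// baseline: predicate pushdown disabled, so every read in the BAM file is scanned and then filtered
spark.time{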
ss.sqlContext.setConf("spark.biodatageeks.bam.predicatePushdown","false")
ss.sql("SELECT count(*) FROM reads_exome WHERE contigName='chr1' AND start=20138").show
}
18/07/25 12:57:44 WARN BAMRelation: GRanges: chr1:20138-20138, false
+--------+
|count(1)|
+--------+
| 20|
+--------+
Time taken: 186045 ms
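The `GRanges: chr1:20138-20138` log line shows that the WHERE clause on `contigName` and `start` is turned into a genomic interval before any data is read. The Scala sketch below illustrates one way such a translation can work; it is not SeQuiLa's `BAMRelation` code. The `Filter` cases are hypothetical, self-contained stand-ins for Spark's `org.apache.spark.sql.sources.Filter` classes, and `Interval`, `conjuncts` and `toInterval` are illustrative names only.

```scala
// Hypothetical stand-ins for a few of Spark's org.apache.spark.sql.sources.Filter
// cases, so the sketch runs without Spark on the classpath.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
final case class LessThanOrEqual(attribute: String, value: Any) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

// 1-based, closed genomic interval, printed like the BAMRelation log line.
final case class Interval(contig: String, start: Int, end: Int) {
  override def toString: String = s"$contig:$start-$end"
}

// Flatten nested ANDs into a list of conjuncts (ORs are not handled here).
def conjuncts(f: Filter): Seq[Filter] = f match {
  case And(l, r) => conjuncts(l) ++ conjuncts(r)
  case other     => Seq(other)
}

// Derive one interval from filters on contigName and start.
// Without a contig there is no interval: a BAI lookup needs a reference sequence.
def toInterval(filters: Seq[Filter]): Option[Interval] = {
  val flat = filters.flatMap(conjuncts)
  val contig = flat.collectFirst { case EqualTo("contigName", c: String) => c }
  val lowerBounds = flat.collect {
    case EqualTo("start", v: Int)            => v
    case GreaterThanOrEqual("start", v: Int) => v
  }
  val upperBounds = flat.collect {
    case EqualTo("start", v: Int)         => v
    case LessThanOrEqual("start", v: Int) => v
  }
  contig
    .map { c =>
      val lo = if (lowerBounds.isEmpty) 1 else lowerBounds.max
      val hi = if (upperBounds.isEmpty) Int.MaxValue else upperBounds.min
      Interval(c, lo, hi)
    }
    .filter(i => i.start <= i.end) // contradictory bounds yield no interval in this sketch
}

// The WHERE clause used above: contigName='chr1' AND start=20138
println(toInterval(Seq(And(EqualTo("contigName", "chr1"), EqualTo("start", 20138))))) // prints Some(chr1:20138-20138)
```

Every read whose start lies in the interval also overlaps it, so a region lookup returns a superset of the matching reads; the exact `start=20138` predicate still has to be applied to the returned reads so that pushdown does not change the result.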
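// same query with predicate pushdown enabled: the interval chr1:20138-20138 is looked up in the BAI index and only the matching part of the file is read
spark.time{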
ss.sqlContext.setConf("spark.biodatageeks.bam.predicatePushdown","true")
ss.sql("SELECT count(*) FROM reads_exome WHERE contigName='chr1' AND start=20138").show
}
18/07/25 13:01:40 WARN BAMRelation: GRanges: chr1:20138-20138, true
18/07/25 13:01:40 WARN BAMRelation: Interval query detected and predicate pushdown enabled, trying to do predicate pushdown using intervals chr1:20138-20138
Using Java builtin Inflater
+--------+
|count(1)|
+--------+
| 20|
+--------+
Time taken: 732 ms
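With pushdown enabled the count takes 732 ms instead of 186,045 ms, roughly 254 times faster, because the BAI index lets the reader jump to the compressed blocks that can hold alignments in the requested region. The sketch below is not SeQuiLa or htsjdk code: it is the binning scheme from the SAM/BAM format specification transcribed to Scala, shown only to illustrate which bins a query for chr1:20138-20138 touches. `reg2bin` and `reg2bins` follow the specification's function names; `linearWindow` is a hypothetical helper.

```scala
// Binning scheme from the SAM/BAM format specification, transcribed to Scala.
// Coordinates are 0-based and half-open: [beg, end).

// Smallest bin that fully contains [beg, end).
def reg2bin(beg: Int, end: Int): Int = {
  val last = end - 1
  if ((beg >> 14) == (last >> 14)) ((1 << 15) - 1) / 7 + (beg >> 14)
  else if ((beg >> 17) == (last >> 17)) ((1 << 12) - 1) / 7 + (beg >> 17)
  else if ((beg >> 20) == (last >> 20)) ((1 << 9) - 1) / 7 + (beg >> 20)
  else if ((beg >> 23) == (last >> 23)) ((1 << 6) - 1) / 7 + (beg >> 23)
  else if ((beg >> 26) == (last >> 26)) ((1 << 3) - 1) / 7 + (beg >> 26)
  else 0
}

// Every bin that may hold alignments overlapping [beg, end):
// bin 0 plus a contiguous run of bins on each of the five finer levels.
def reg2bins(beg: Int, end: Int): Seq[Int] = {
  val last = end - 1
  // (first bin id of the level, shift giving the level's bin width)
  val levels = Seq((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14))
  0 +: levels.flatMap { case (offset, shift) =>
    (offset + (beg >> shift)) to (offset + (last >> shift))
  }
}

// Hypothetical helper: 16 kbp linear-index window containing a 0-based position.
def linearWindow(pos: Int): Int = pos >> 14

// chr1:20138-20138 (1-based, closed) is [20137, 20138) in 0-based, half-open form.
println(reg2bins(20137, 20138).mkString(", ")) // prints 0, 1, 9, 73, 585, 4682
```

For each candidate bin the BAI stores a list of chunks (virtual file offset ranges), and its linear index records, for each 16 kbp window (here window 1), the smallest offset of any alignment overlapping that window, so chunks ending before that offset can be skipped. Only the remaining chunks need to be decompressed, rather than the whole BAM file.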