change the partial members of genomix-data, genomix-hyracks for adapting ray-style #60

Closed
69 commits
cb22472
add mate0 and mate1 readSequence
Nov 10, 2013
ec2081e
make up related functions in readHeadSet
Nov 10, 2013
1661432
comment the genomix-data temporally, will recover later
Nov 10, 2013
9c26dc2
use VKmerList to replace the EdgeMap
Nov 10, 2013
d672c7a
change edgeType in Node
Nov 10, 2013
9db851c
change graph building to adapt new ray style
Nov 10, 2013
0b81fcc
tweak graphviz
Nov 10, 2013
d417ccf
change function name of node from getEdgeMap to getEdgeList
Nov 10, 2013
547603f
Merge commit '1b7005d5a2f185c29ea677f4246a4889d805fa36' into nanzhang…
Nov 12, 2013
8522c18
Merge branch 'genomix/fullstack_genomix' into nanzhang/ray-genomix
Nov 13, 2013
a8e0e32
change this/thatSequence, and bitField for not writing any null members
Nov 14, 2013
90ebdfe
change SetAsCopy() and getLengthInBytes() to be consistent with writi…
Nov 14, 2013
9affc37
remove contain(Kmer kmer) and recover contain(VkmerList kmer)
Nov 14, 2013
87d90dd
change partial member function to be consistent with ReadHeadInfo
Nov 14, 2013
59346bc
get mateId in a indirect way, and consider the single long end read
Nov 14, 2013
c90a6ee
Merge branch 'genomix/fullstack_genomix' into nanzhang/ray-genomix
Nov 14, 2013
1033803
change setAsCopy() to use getThisReadSequence() and getThatReadSequen…
Nov 15, 2013
30f7255
use getKmerLetterLength() instead of getKmerBytesLength()
Nov 15, 2013
ce34294
check toString() in readHeadInfo
Nov 15, 2013
5f1d6ae
change getEdgeList() to getEdgeMap() in readKeyValueParser
Nov 15, 2013
8a6a000
small teaks and prepare for test new graph building
Nov 16, 2013
524f258
Merge branch 'genomix/fullstack_genomix' into nanzhang/ray-genomix
Nov 16, 2013
a4cadb0
recover it VkmerList initially and prepare for refining it
Nov 16, 2013
7887496
a tweak
Nov 16, 2013
cbb120f
Merge branch 'nanzhang/genomix_main_branch' into nanzhang/ray-genomix
Nov 18, 2013
0995d14
Merge branch 'genomix/fullstack_genomix' into nanzhang/ray-genomix
Nov 20, 2013
f975ac2
fix new instance bug in readHeadInfo
Nov 20, 2013
e21ceea
use getThatReadSeq instead of getThisReadSeq
Nov 20, 2013
97b5ce2
fix bug in readFiled() and getreadSeq(), and getMateSeq()
Nov 20, 2013
b227f65
change name from getThatReadSeq() to getMateReadSeq()
Nov 20, 2013
30beef5
change name from thatReadSeq to mateReadSeq
Nov 20, 2013
17886d9
fix the bug for supporting single read
Nov 20, 2013
cb84316
clean legacy code
Nov 20, 2013
4ab3820
remove uesless code
Nov 20, 2013
c4ead19
rm system.out.println()
Nov 20, 2013
cce134f
complete hyracks test
Nov 20, 2013
9d9eec7
fix the bug for forgetting add the byte length 1 in getLength() due t…
Nov 21, 2013
7b25a64
change genomix-hadoop to be consistent with our new genomix-data
Nov 21, 2013
ed6733c
hadoop test passed!
Nov 21, 2013
8d45898
change the name of readHeadSet
Nov 21, 2013
85539b8
change graphstatics to remove the readId in it
Nov 21, 2013
40846bc
genomix-hyracks test passed!
Nov 21, 2013
14d7781
prepare KMP algorithm
Nov 21, 2013
16523c3
complete the KMP test
Nov 21, 2013
9587903
complete fast detect sub Vkmer using KMP, not complete test
Nov 21, 2013
cc8838d
complete find sub-vkmer function’s test
Nov 21, 2013
3c7df0a
add readHeadInfo(data[] offset); remove asLong()
Nov 22, 2013
4acc66b
add readHeadInfo(data[] offset) and remove asLong()
Nov 22, 2013
7d604bc
remove updateEdgeRead()
Nov 22, 2013
67add02
modify the typo
Nov 22, 2013
12b7a7a
remove conflict for genomix-pregelix
Nov 22, 2013
603e47a
remove active_Field ’s thisReadSequence field
Nov 22, 2013
2566c8d
refactor readHeadInfo
Nov 22, 2013
f7b170b
use getMateReadSeq() in set() for readHeadInfo
Nov 22, 2013
42a5bb2
fix the bug for setting null to thisReadSeq, and add setUUID()
Nov 22, 2013
246c623
finish genomix-hyracks test
Nov 22, 2013
6763566
add exception
Nov 22, 2013
e5099ac
remove all code related to parsing filename
Nov 23, 2013
cd7c12d
remove all code related to parsing file name in genomix-hadoop
Nov 23, 2013
d1e1e4a
remove KMP to create a separate branch instead
Nov 23, 2013
8cae514
factor out edgeMap code in favor of VKmerList; remove SplitRepeat
jakebiesinger Nov 23, 2013
9e2fdd6
check both msg and vertex in symmetry checker
jakebiesinger Nov 25, 2013
2af7a94
add some error checking; remove unused functions
jakebiesinger Nov 25, 2013
4c61c2c
rename all `edgeMap`s and `edgeList`s to `edges`
jakebiesinger Nov 25, 2013
d96e25b
Merge remote-tracking branch 'origin/nanzhang/ray-genomix' into wbies…
jakebiesinger Dec 3, 2013
3048f59
Merge remote-tracking branch 'origin/genomix/fullstack_genomix' into …
jakebiesinger Dec 3, 2013
e4ec721
fix to support non-default pregelix cc http ports
jakebiesinger Dec 4, 2013
79664c2
Merge branch 'genomix/fullstack_genomix' into wbiesing/ray-genomix-re…
jakebiesinger Dec 4, 2013
d3e3869
Merge pull request #71 from uci-cbcl/wbiesing/ray-genomix-refactor
jakebiesinger Dec 4, 2013
@@ -18,7 +18,6 @@
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.lang3.StringUtils;
@@ -29,19 +28,17 @@
import org.kohsuke.args4j.Option;

import edu.uci.ics.genomix.minicluster.GenerateGraphViz.GRAPH_TYPE;
import edu.uci.ics.genomix.type.EdgeMap;
import edu.uci.ics.genomix.type.Kmer;
import edu.uci.ics.genomix.type.Node;
import edu.uci.ics.genomix.type.VKmer;

@SuppressWarnings("deprecation")
public class GenomixJobConf extends JobConf {

public static boolean debug = false;
public static ArrayList<VKmer> debugKmers;

private static Map<String, Long> tickTimes = new HashMap<String, Long>();

/* The following section ties together command-line options with a global JobConf
* Each variable has an annotated, command-line Option which is private here but
* is accessible through JobConf.get(GenomixConfigOld.VARIABLE).
@@ -162,7 +159,7 @@ private static class Options {

@Option(name = "-threadsPerMachine", usage = "The number of threads to use per slave machine. Default is 1.", required = false)
private int threadsPerMachine = 1;

@Option(name = "-extraConfFiles", usage = "Read all the job confs from the given comma-separated list of multiple conf files", required = false)
private String extraConfFiles;
}
@@ -293,6 +290,7 @@ public static void verifyPatterns(Patterns[] patterns) {
// GAGE Metrics Evaluation
public static final String STATS_EXPECTED_GENOMESIZE = "genomix.conf.expectedGenomeSize";
public static final String STATS_MIN_CONTIGLENGTH = "genomix.conf.minContigLength";

// intermediate date evaluation

public GenomixJobConf(int kmerLength) {
@@ -434,11 +432,11 @@ private void fillMissingDefaults() {
// hdfs setup
if (get(HDFS_WORK_PATH) == null)
set(HDFS_WORK_PATH, "genomix_out"); // should be in the user's home directory?

// default conf setup
if (get(EXTRA_CONF_FILES) == null)
set(EXTRA_CONF_FILES, "");

// hyracks-specific

// if (getBoolean(RUN_LOCAL, false)) {
@@ -505,7 +503,7 @@ private void setFromOpts(Options opts) {
if (opts.plotSubgraph_startSeed != null)
set(PLOT_SUBGRAPH_START_SEEDS, opts.plotSubgraph_startSeed);
setInt(PLOT_SUBGRAPH_NUM_HOPS, opts.plotSubgraph_numHops);

// read conf.xml
if (opts.extraConfFiles != null)
set(EXTRA_CONF_FILES, opts.extraConfFiles);
@@ -537,12 +535,13 @@ public static long tock(String counter) {
public static void setGlobalStaticConstants(Configuration conf) {
Kmer.setGlobalKmerLength(Integer.parseInt(conf.get(GenomixJobConf.KMER_LENGTH)));
// EdgeWritable.MAX_READ_IDS_PER_EDGE = Integer.parseInt(conf.get(GenomixJobConf.MAX_READIDS_PER_EDGE));
- EdgeMap.logReadIds = Boolean.parseBoolean(conf.get(GenomixJobConf.LOG_READIDS));
debug = conf.get(GenomixJobConf.DEBUG_KMERS) != null;
- debugKmers = new ArrayList<VKmer>();
- if (conf.get(GenomixJobConf.DEBUG_KMERS) != null) {
- for (String kmer : conf.get(GenomixJobConf.DEBUG_KMERS).split(",")) {
- debugKmers.add(new VKmer(kmer));
+ if (debugKmers == null) {
+ debugKmers = new ArrayList<VKmer>();
+ if (conf.get(GenomixJobConf.DEBUG_KMERS) != null) {
+ for (String kmer : conf.get(GenomixJobConf.DEBUG_KMERS).split(",")) {
+ debugKmers.add(new VKmer(kmer));
+ }
}
}
}
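
Note on the hunk above: `setGlobalStaticConstants` splits the `DEBUG_KMERS` value on commas and builds one `VKmer` per entry. The parsing step can be looked at in isolation. What follows is a minimal sketch only, not the PR's code: it uses plain strings in place of `VKmer`, the class name `DebugKmerSketch` and method name `parse` are hypothetical, and trimming whitespace and skipping empty entries are additions that the PR does not make.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of genomix: isolates the comma-splitting step.
final class DebugKmerSketch {

    // Returns the kmers named in a comma-separated config value.
    // A null value (option not set) yields an empty list.
    static List<String> parse(String raw) {
        List<String> kmers = new ArrayList<String>();
        if (raw == null) {
            return kmers;
        }
        for (String piece : raw.split(",")) {
            String kmer = piece.trim(); // assumption: tolerate "AAT, CGT"
            if (!kmer.isEmpty()) {      // assumption: ignore empty entries
                kmers.add(kmer);
            }
        }
        return kmers;
    }
}
```

Returning a fresh list on every call keeps the helper free of shared static state, so callers can decide separately whether to cache the result.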

@@ -2,7 +2,6 @@

import java.io.File;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
@@ -14,7 +13,6 @@

import edu.uci.ics.genomix.type.EDGETYPE;
import edu.uci.ics.genomix.type.Node;
import edu.uci.ics.genomix.type.ReadIdSet;
import edu.uci.ics.genomix.type.VKmer;

//TODO by Jianfeng: move this to script
@@ -146,37 +144,37 @@ public static byte[] convertGraphToImg(JobConf conf, String srcDir, String destD
public static String convertEdgeToGraph(String outputNode, Node value, GRAPH_TYPE graphType) {
String outputEdge = "";
for (EDGETYPE et : EDGETYPE.values) {
- for (Entry<VKmer, ReadIdSet> e : value.getEdgeMap(et).entrySet()) {
+ for (VKmer e : value.getEdges(et)) {
String destNode = "";
switch (graphType) {
case UNDIRECTED_GRAPH_WITHOUT_LABELS:
- if (map.containsKey(e.getKey().toString()))
- destNode += map.get(e.getKey().toString());
+ if (map.containsKey(e.toString()))
+ destNode += map.get(e.toString());
else {
count++;
- map.put(e.getKey().toString(), count);
+ map.put(e.toString(), count);
destNode += count;
}
outputEdge += outputNode + " -> " + destNode + "[dir=none]\n";
break;
case DIRECTED_GRAPH_WITH_SIMPLELABEL_AND_EDGETYPE:
- if (map.containsKey(e.getKey().toString()))
- destNode += map.get(e.getKey().toString());
+ if (map.containsKey(e.toString()))
+ destNode += map.get(e.toString());
else {
count++;
- map.put(e.getKey().toString(), count);
+ map.put(e.toString(), count);
destNode += count;
}
outputEdge += outputNode + " -> " + destNode + "[color = \"" + getColor(et) + "\" label =\""
+ et + "\"]\n";
break;
case DIRECTED_GRAPH_WITH_KMERS_AND_EDGETYPE:
- outputEdge += outputNode + " -> " + e.getKey().toString() + "[color = \"" + getColor(et)
+ outputEdge += outputNode + " -> " + e.toString() + "[color = \"" + getColor(et)
+ "\" label =\"" + et + "\"]\n";
break;
case DIRECTED_GRAPH_WITH_ALLDETAILS:
- outputEdge += outputNode + " -> " + e.getKey().toString() + "[color = \"" + getColor(et)
- + "\" label =\"" + et + ": " + e.getValue() + "\"]\n";
+ outputEdge += outputNode + " -> " + e.toString() + "[color = \"" + getColor(et)
+ + "\" label =\"" + et + "\"]\n";
Review comment (Contributor):

@anbangx Since this version has no read IDs on the edges, we'll have to reconsider what DIRECTED_GRAPH_WITH_ALLDETAILS now means. Perhaps it should print the internalKmer instead of the key?

Review comment (Collaborator):

@jakebiesinger WITH_ALLDETAILS includes all the other fields inside the node; this key is only used for the edge connection.

break;
default:
throw new IllegalStateException("Invalid input Graph Type!!!");
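
Note on the review thread above: with read IDs gone from the edges, the `DIRECTED_GRAPH_WITH_KMERS_AND_EDGETYPE` and `DIRECTED_GRAPH_WITH_ALLDETAILS` branches emit the same DOT edge line. The sketch below shows that shared line as a single helper. It is an illustration under stated assumptions, not the PR's code: `DotEdgeSketch`, `edgeLine`, and `edgeLines` are hypothetical names, kmers and edge types are passed as plain strings instead of `VKmer` and `EDGETYPE`, and the color is passed in rather than looked up through `getColor`.

```java
import java.util.List;

// Hypothetical helper, not part of genomix: builds DOT edge lines
// in the same textual shape as the strings concatenated in the hunk above.
final class DotEdgeSketch {

    // One edge: src -> dest[color = "..." label ="..."]
    static String edgeLine(String srcNode, String destKmer, String edgeType, String color) {
        return srcNode + " -> " + destKmer
                + "[color = \"" + color + "\" label =\"" + edgeType + "\"]\n";
    }

    // All edges of one type; a StringBuilder avoids repeated String += in the loop.
    static String edgeLines(String srcNode, List<String> neighborKmers, String edgeType, String color) {
        StringBuilder out = new StringBuilder();
        for (String dest : neighborKmers) {
            out.append(edgeLine(srcNode, dest, edgeType, color));
        }
        return out.toString();
    }
}
```

If the two graph types are meant to differ again later, only the label argument would need to change.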

@@ -1,12 +1,8 @@
package edu.uci.ics.genomix.type;

- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
-
- import org.apache.hadoop.io.Writable;

- public enum EDGETYPE implements Writable {
+ public enum EDGETYPE {
+ //public enum EDGETYPE implements Writable {

FF((byte) (0b00)),
FR((byte) (0b01)),
@@ -109,11 +105,11 @@ public static DIR dir(EDGETYPE edgeType) {
throw new RuntimeException("Unrecognized direction in dirFromEdgeType: " + edgeType);
}
}

public DIR neighborDir() {
return neighborDir(this);
}

public static DIR neighborDir(EDGETYPE et) {
switch (et) {
case FF:
@@ -238,14 +234,13 @@ public static boolean sameOrientation(byte b1, byte b2) {
return sameOrientation(et1, et2);
}

- @Override
- public void write(DataOutput out) throws IOException {
- out.writeByte(this.get());
- }
-
- @Override
- public void readFields(DataInput in) throws IOException {
- this.val = in.readByte();
- }
-
+ // @Override
+ // public void write(DataOutput out) throws IOException {
+ // out.writeByte(this.get());
+ // }
+ //
+ // @Override
+ // public void readFields(DataInput in) throws IOException {
+ // this.val = in.readByte();
+ // }
}
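
Note on the hunk above: the `readFields` that this change comments out assigned `this.val` on an enum constant. Enum constants are shared singletons in Java, so such an assignment would change the constant for every user of it. One alternative, sketched here as an assumption and not as what the PR does, is to keep the value `final` and decode a byte through a static lookup. The enum name `EdgeTypeSketch` and the methods `fromByte` and `read` are hypothetical, and the `RF` and `RR` byte values are assumed because only `FF` and `FR` are visible in the diff.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical stand-in for EDGETYPE; not the project's enum.
enum EdgeTypeSketch {
    FF((byte) 0b00),
    FR((byte) 0b01),
    RF((byte) 0b10), // assumed value, not shown in the diff
    RR((byte) 0b11); // assumed value, not shown in the diff

    private final byte val; // final: a constant can never be mutated by deserialization

    EdgeTypeSketch(byte val) {
        this.val = val;
    }

    byte get() {
        return val;
    }

    // Maps a serialized byte back to the matching constant.
    static EdgeTypeSketch fromByte(byte b) {
        for (EdgeTypeSketch et : values()) {
            if (et.val == b) {
                return et;
            }
        }
        throw new IllegalArgumentException("Unrecognized edge type byte: " + b);
    }

    // Writes the one-byte encoding, like the removed write() method.
    void write(DataOutput out) throws IOException {
        out.writeByte(val);
    }

    // Reads one byte and returns the constant instead of assigning to a field.
    static EdgeTypeSketch read(DataInput in) throws IOException {
        return fromByte(in.readByte());
    }
}
```

A caller would write `EdgeTypeSketch et = EdgeTypeSketch.read(in);` where the old interface would have called `readFields(in)` on an existing constant.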

This file was deleted.
