change the partial members of genomix-data, genomix-hyracks for adapting ray-style #60

Closed
wants to merge 69 commits into from


Nan-Zhang
Collaborator

@jakebiesinger @JavierJia @anbangx please give some feedback. I have changed part of the structure of genomix-data and genomix-hyracks to adapt them to ray-style. The tests are not complete yet but are in the works.

  1. In ReadHeadInfo, added two VKmer fields (mate0ReadSequence, mate1ReadSequence) to store the read sequences. The initial plan was to use Kmer for this, but kmer_size was initially set to 55, so storing a read sequence in a Kmer could cause a conflict. The .equals(), .compare(), and .hashCode() methods do not take the read sequences into account.
  2. Changed some functions in ReadHeadSet, Node, and Graphviz to match the changes in ReadHeadInfo.
  3. Because we don't use the readIdSet in EdgeMap, Node now stores a VKmerList instead of an EdgeMap.
  4. Changed the hyracks-graphbuilding code to match the changes above. The assumed input format is: readId '\t' mate0Sequence '\t' mate1Sequence (the left sequence must come from mate0 and the right sequence from mate1).
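
To make the assumed input format in item 4 concrete, here is a minimal sketch of parsing one line. `PairedReadRecord` and its fields are hypothetical names used only for illustration, not the actual ReadHeadInfo; treating readId as a long and as the only identity field are also assumptions. Equality and hashing deliberately skip the two sequences, in the spirit of item 1.

```java
// Hypothetical holder for one parsed input line; NOT the real genomix ReadHeadInfo.
final class PairedReadRecord {
    final long readId;           // assumption: read ids fit in a long
    final String mate0Sequence;  // left sequence, must come from mate0
    final String mate1Sequence;  // right sequence, must come from mate1

    PairedReadRecord(long readId, String mate0Sequence, String mate1Sequence) {
        this.readId = readId;
        this.mate0Sequence = mate0Sequence;
        this.mate1Sequence = mate1Sequence;
    }

    // Parses a line of the form: readId '\t' mate0Sequence '\t' mate1Sequence
    static PairedReadRecord parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                    "expected 3 tab-separated fields, got " + fields.length);
        }
        return new PairedReadRecord(Long.parseLong(fields[0]), fields[1], fields[2]);
    }

    // Identity ignores the sequences (assumption: readId alone identifies a record).
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PairedReadRecord)) {
            return false;
        }
        return readId == ((PairedReadRecord) other).readId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(readId);
    }
}
```

With this sketch, two records that share a readId compare equal even when their stored sequences differ, which is the behaviour item 1 describes.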

Nan Zhang added 9 commits November 9, 2013 17:10
…/ray-genomix

# By Jake Biesinger
# Via Jake Biesinger
* commit '1b7005d5a2f185c29ea677f4246a4889d805fa36':
  remove garbage GAGETEST.fasta
  fix scaffolding bug where incomingMsg details weren't included
  explicitly set the length to 0 at the beginning of BFS
  fix BubbleMerge bug where an incorrect major/minor to bubble dir was used
  use null for default values in all pregelix messages
  stopgap fix for pathmerge problem introduced by #51 in commit 447146c

Conflicts:
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/io/message/BubbleMergeMessage.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/io/message/PathMergeMessage.java
outputEdge += outputNode + " -> " + e.getKey().toString() + "[color = \"" + getColor(et)
+ "\" label =\"" + et + ": " + e.getValue() + "\"]\n";
outputEdge += outputNode + " -> " + e.toString() + "[color = \"" + getColor(et)
+ "\" label =\"" + et + "\"]\n";
Contributor

@anbangx since there are no readids on the edges in this version, we'll have to reconsider what DIRECTED_GRAPH_WITH_ALLDETAILS now means. Perhaps it should print the internalKmer instead of the key?

Collaborator

@jakebiesinger WITH_ALLDETAILS includes all the other fields inside the node; this key is only used for the edge connection.
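
For reference, an edge line in the shape the updated Graphviz code emits, labelled with the edge type only, could be built roughly like this. The class, the method names, the color table, and the edge-type strings are all assumptions made for the sketch, not the actual Graphviz-generation code.

```java
// Hypothetical sketch of formatting one DOT edge line; not the real genomix code.
final class DotEdgeFormatter {

    // Assumption: edge types are short strings and each maps to a fixed color.
    static String colorFor(String edgeType) {
        switch (edgeType) {
            case "FF": return "black";
            case "FR": return "blue";
            case "RF": return "red";
            case "RR": return "green";
            default:   return "gray";
        }
    }

    // Builds: from -> to[color = "<color>" label ="<edgeType>"]
    // The spacing mirrors the string concatenation shown in the diff above.
    static String formatEdge(String fromKmer, String toKmer, String edgeType) {
        return new StringBuilder()
                .append(fromKmer).append(" -> ").append(toKmer)
                .append("[color = \"").append(colorFor(edgeType))
                .append("\" label =\"").append(edgeType).append("\"]\n")
                .toString();
    }
}
```

Any extra per-node details would then go on the node declaration rather than on the edge line, since the key here only identifies the connection.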

@jakebiesinger
Contributor

I want to see those test cases back in place and I think we're probably going to need an optimized representation for these kmer lists (like a sorted list backed by a byte array for all the kmers).
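
As a rough illustration of the byte-array idea, here is a minimal sketch of a sorted kmer list kept in one contiguous array. It assumes fixed-length kmers stored as one ASCII byte per base to keep the example short; VKmerList holds variable-length kmers, and a real version would presumably pack bases more tightly. `SortedKmerByteList` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: a sorted, duplicate-free list of fixed-length kmers in one byte[].
final class SortedKmerByteList {
    private final int kmerLength;
    private byte[] storage;
    private int size; // number of kmers currently stored

    SortedKmerByteList(int kmerLength) {
        if (kmerLength <= 0) {
            throw new IllegalArgumentException("kmerLength must be positive");
        }
        this.kmerLength = kmerLength;
        this.storage = new byte[kmerLength * 4]; // room for 4 kmers initially
    }

    int size() {
        return size;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return new String(storage, index * kmerLength, kmerLength, StandardCharsets.US_ASCII);
    }

    boolean contains(String kmer) {
        return find(encode(kmer)) >= 0;
    }

    // Inserts in sorted position; returns false if the kmer was already present.
    boolean add(String kmer) {
        byte[] key = encode(kmer);
        int pos = find(key);
        if (pos >= 0) {
            return false;
        }
        int insertAt = -(pos + 1);
        if ((size + 1) * kmerLength > storage.length) {
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        // Shift the tail right by one kmer, then copy the new kmer into the gap.
        System.arraycopy(storage, insertAt * kmerLength,
                storage, (insertAt + 1) * kmerLength, (size - insertAt) * kmerLength);
        System.arraycopy(key, 0, storage, insertAt * kmerLength, kmerLength);
        size++;
        return true;
    }

    private byte[] encode(String kmer) {
        if (kmer.length() != kmerLength) {
            throw new IllegalArgumentException("expected length " + kmerLength);
        }
        return kmer.getBytes(StandardCharsets.US_ASCII);
    }

    // Binary search; returns the index if found, otherwise -(insertionPoint + 1).
    private int find(byte[] key) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareAt(mid, key);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    private int compareAt(int index, byte[] key) {
        int base = index * kmerLength;
        for (int i = 0; i < kmerLength; i++) {
            int diff = storage[base + i] - key[i];
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }
}
```

Lookups are O(log n) comparisons over contiguous memory, while inserts pay for shifting the tail, so this layout favors lists that are read more often than they are modified.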

Also, I think mate0 vs mate1 isn't the right way to think about this... It should be this vs other, and single-end reads should work without any change to the interface.
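
One possible reading of the this-vs-other suggestion, sketched under assumptions: the record always has a "this" sequence and only optionally an "other" one, so single-end reads go through the same accessors. `ReadEnds` and every method on it are hypothetical names, not part of genomix.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of a this/other view of a read; not an existing genomix type.
final class ReadEnds {
    private final String thisSequence;
    private final String otherSequence; // null when the read is single-end

    private ReadEnds(String thisSequence, String otherSequence) {
        this.thisSequence = thisSequence;
        this.otherSequence = otherSequence;
    }

    static ReadEnds singleEnd(String sequence) {
        return new ReadEnds(Objects.requireNonNull(sequence), null);
    }

    static ReadEnds paired(String thisSequence, String otherSequence) {
        return new ReadEnds(Objects.requireNonNull(thisSequence),
                Objects.requireNonNull(otherSequence));
    }

    String thisSequence() {
        return thisSequence;
    }

    // Empty for single-end reads, so callers never need a separate code path.
    Optional<String> otherSequence() {
        return Optional.ofNullable(otherSequence);
    }

    boolean isPaired() {
        return otherSequence != null;
    }
}
```

Callers that only care about the current end use `thisSequence()` and never see whether a mate exists.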

Nan Zhang added 3 commits November 13, 2013 10:33
# By buyingyi (16) and others
# Via [email protected] (8) and others
* genomix/fullstack_genomix: (41 commits)
  update to new pregelix aggregator interface
  fix application lifecyle mgmt in hyracks nc
  fix the pinned page issue during a node failure
  Make sure the validty bit in the metadata page is flushed to disk when marking a component to be valid.
  NodeControllers clean up appEntryPoints on shutdown (2nd try)
  fix an issue found by Sattam
  Fix for issue 127.
  minor fix for heartbeat state population
  support multiple user-defined global aggregators
  reverted the change of removing adjacent exchange operators
  updated hivestrix test case for running aggregation fix
  Fixed a bug on unclosed running aggregation runtime; fixed an issue on two adjacent exchange operators (connectors) when duplicate sort operator is removed.
  Fixed the incorrect exchange merging introduced by the previous commit; updated the IntroHashPartitionMergeExchange rule to handle the hash-merge-exchange operator.
  Fixed a bug on omitted order by columns when added an exchange operator to enforce the group-by property.
  Fixed a bug on unclosed running aggregation runtime; fixed an issue on two adjacent exchange operators (connectors) when duplicate sort operator is removed.
  revert a minor change
  revert a minor change
  fix IIndexAccessor interface, add a boolean exclusiveMode parameter for the createSearchCursor method
  fix file write race condition
  Revert changes to InlineVariablesRule.
  ...
@jakebiesinger
Contributor

Hey @Nan-Zhang the KMP stuff is cool but really belongs in a different branch. It's a totally separate feature!

Could you fix up this branch based on the comments here and submit a separate PR for the KMP stuff?

@jakebiesinger
Contributor

Also as a general comment for everyone, your commits should rarely cause syntax errors. Syntax errors will cause any future git bisect to fail without being able to detect the problem commit!

Nan Zhang and others added 23 commits November 22, 2013 11:10
…ing/ray-genomix-refactor

Conflicts:
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/pathmerge/BasicPathMergeVertex.java
…wbiesing/ray-genomix-refactor

Conflicts:
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/config/GenomixJobConf.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/minicluster/GenerateGraphViz.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/type/EDGETYPE.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/type/EdgeMap.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/type/Node.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/type/ReadHeadSet.java
	genomix/genomix-data/src/main/java/edu/uci/ics/genomix/type/VKmer.java
	genomix/genomix-data/src/test/java/edu/uci/ics/genomix/type/EdgeMapTest.java
	genomix/genomix-data/src/test/java/edu/uci/ics/genomix/type/NodeTest.java
	genomix/genomix-data/src/test/java/edu/uci/ics/genomix/type/ReadIdSetTest.java
	genomix/genomix-data/src/test/java/edu/uci/ics/genomix/type/VKmerTest.java
	genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/contrailgraphbuilding/GenomixReducer.java
	genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/graph/GraphStatistics.java
	genomix/genomix-hyracks/src/main/java/edu/uci/ics/genomix/hyracks/graph/dataflow/AggregateKmerAggregateFactory.java
	genomix/genomix-hyracks/src/main/java/edu/uci/ics/genomix/hyracks/graph/dataflow/ReadsKeyValueParserFactory.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/DeBruijnGraphCleanVertex.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/bubblemerge/SimpleBubbleMergeVertex.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/pathmerge/BasicPathMergeVertex.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/splitrepeat/SplitRepeatVertex.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/operator/unrolltandemrepeat/UnrollTandemRepeat.java
	genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/testhelper/BubbleAddVertex.java
…factor

Conflicts:
	genomix/genomix-pregelix/src/test/java/edu/uci/ics/genomix/pregelix/jobgen/JobGenerator.java
Refactor on top of Nan's changes to remove syntax errors
@jakebiesinger
Contributor

This PR was superseded by #71, which has now been merged in.

@jakebiesinger jakebiesinger deleted the nanzhang/ray-genomix branch December 4, 2013 01:28
5 participants