concatenated jobs save state #110
This affects #101. When the
I didn't quite get the problem. The
Ah, but the values are different classes. How can we expect to call
Yes, I think on the first iteration we'll have to add a "soft-reset" that just clears the state. The pregelix bug was just fixed, so our jobs may not work now. I'll investigate today.
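A minimal sketch of the "soft-reset" idea in a Python model (not pregelix code). `VertexValue`, `soft_reset`, `compute` and `FIRST_SUPERSTEP` are hypothetical names, and the model assumes supersteps are numbered from 1. Job-specific state is cleared on the first superstep of each job, while data shared across jobs is left alone.

```python
from dataclasses import dataclass, field

# Assumption: supersteps are numbered from 1 in this model.
FIRST_SUPERSTEP = 1


@dataclass
class VertexValue:
    """Hypothetical vertex value: `sequence` is shared by every job,
    `scratch` holds state that only the current job should see."""
    sequence: str
    scratch: dict = field(default_factory=dict)

    def soft_reset(self) -> None:
        # Drop job-specific state; keep the shared data untouched.
        self.scratch.clear()


def compute(value: VertexValue, superstep: int) -> None:
    """Hypothetical per-vertex compute step for one concatenated job."""
    if superstep == FIRST_SUPERSTEP:
        # Whatever the previous job left behind is stale for this job.
        value.soft_reset()
    # Example of job-specific bookkeeping that must start from zero.
    value.scratch["visits"] = value.scratch.get("visits", 0) + 1
```

For example, a value left over from a previous job with `scratch == {"visits": 5, "p4_flag": True}` ends up with `scratch == {"visits": 1}` after `compute(v, 1)`, and its `sequence` is unchanged.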
Got it.
@anbangx @Elmira88 @JavierJia @Nan-Zhang
Elmira recently found a pretty major problem when running without `-saveIntermediateResults`: the vertex value is unchanged between iterations. I had thought that the entire dataset would be scrubbed by going through the Input/OutputFormat adapters between jobs (Node -> P4VertexValue -> Node -> RayVertexValue -> Node -> ...). It turns out that's false, and the only reason we hadn't noticed before is a bug in pregelix that makes concatenated jobs all use the generic base class `VertexValue`. I've submitted a bug report to @sigmod, and there's a temporary workaround: we manually scrub the data on the first iteration.

In the meantime, I recommend using `-saveIntermediateResults` to make sure the state gets wiped between jobs. Specifying this flag forces an HDFS write and therefore a trip through the Input/OutputFormat adapters.
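To make the failure mode concrete, here is a minimal Python model (not pregelix code) of the two ways a vertex value can reach the next job. `Node`, `VertexValue`, `to_node`, `from_node` and `hand_off` are hypothetical stand-ins, and the model assumes, as described above, that only the fields stored in a Node survive the Input/OutputFormat round trip.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for the on-disk Node record."""
    sequence: str


@dataclass
class VertexValue:
    """Hypothetical in-memory value; `state` is job-specific."""
    sequence: str
    state: dict = field(default_factory=dict)


def to_node(value: VertexValue) -> Node:
    # OutputFormat side: only Node fields are written out.
    return Node(value.sequence)


def from_node(node: Node) -> VertexValue:
    # InputFormat side: the next job builds a fresh value from the Node.
    return VertexValue(node.sequence)


def hand_off(value: VertexValue, save_intermediate_results: bool) -> VertexValue:
    """Model of how a vertex value reaches the next job in the pipeline."""
    if save_intermediate_results:
        # HDFS write + read: the round trip through Node drops job state.
        return from_node(to_node(value))
    # Concatenated jobs without the flag: the same object is reused,
    # so any state set by the previous job is still there.
    return value
```

In this model, `hand_off(v, True)` returns a fresh value with an empty `state`, while `hand_off(v, False)` returns `v` itself with the previous job's state intact, which is the behaviour the workaround and the flag are meant to address.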