-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context building slog sender #10300
Context building slog sender #10300
Conversation
Deploying agoric-sdk with Cloudflare Pages
|
Grafana logs link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a really good start!
Besides the specific comments there are 2 changes I'd like to see:
- I am really skeptical of using a SQLite DB for persisting a simple JSON object. Can we just sync write as a JSON serialized value to a file? No need to fsync eother, I don't care about failure recovery in this case.
- Can we extract the function that does transformation from a slog event into a log record into a separate module that doesn't depend on otel stuff? I believe @warner may be interested in using this context building logic on its own. I can be wrapped in a maker that takes the
db
object as power to persist / restore the trigger context. The rest is fully stateless.
Oh yeah I'd also love to see this processing on an existing slog file from a mainnet follower! I have plenty around if needed. |
Done
Done. Separated a function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback so far. Some more refactoring suggestions. I also really want us to figure out this timestamp issue as I'd like to be able to upload old slogs to GCP with the original time the events were generated as the native timestamp of the event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more refactoring nits. Let's continue the timestamp discussion offline.
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH: | ||
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START: | ||
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: { | ||
assert(!!blockContext); | ||
break; | ||
} | ||
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: { | ||
assert(!!blockContext && !triggerContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all these can be merged together
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH: | |
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START: | |
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: { | |
assert(!!blockContext); | |
break; | |
} | |
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: { | |
assert(!!blockContext && !triggerContext); | |
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH: | |
case SLOG_TYPES.COSMIC_SWINGSET.BOOTSTRAP_BLOCK.FINISH: | |
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START: | |
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: | |
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: { | |
assert(!!blockContext && !triggerContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but holding my approval until I review the output of aa processed slog in case I missed anything. Is the grafana link up top still valid? I'll also look at the processed slog file you sent
const [secondsStr, fractionStr] = String(timestamp).split('.'); | ||
const seconds = parseInt(secondsStr, 10); | ||
const nanoSeconds = parseInt( | ||
(fractionStr || String(0)).padEnd(9, String(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't trust numbers, and want to be resilient in case the fractionStr
is more than 9 chars already.
In general I prefer string literals where possible
(fractionStr || String(0)).padEnd(9, String(0)), | |
(fractionStr.substr(0, 9) || '').padEnd(9, '0'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to microtime.nowDouble documentation, it returns microseconds precision so I don't think we should ever get 9 decimal places right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precision of the source data doesn't mean anything for the representation as an IEEE 754 number. Those tend to do weird things often enough that I wouldn't bet on there being a max of 9 digits when stringified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I knew I wouldn't get it right just by staring at the code generating slog events. After looking at the output, looks like we have some tweaks to make.
} | ||
// eslint-disable-next-line no-restricted-syntax | ||
case SLOG_TYPES.COSMIC_SWINGSET.RUN.FINISH: { | ||
triggerContext = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake on the logic of the trigger context persistence. We need to conditionally persist the context here:
triggerContext = null; | |
persistContext(finalBody.remainingBeans > 0 ? null : triggerContext); | |
triggerContext = null; |
Then we can simply restore the context on run start for runNum === 0
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think we can persist an empty context instead (so you can keep null
as missing).
triggerContext = null; | |
persistContext(finalBody.remainingBeans > 0 ? {} : triggerContext); | |
triggerContext = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have no assertion on triggerContext
here, it could be null
at this point. So I think we should either do
persistContext(finalBody.remainingBeans || !triggerContext ? {} : triggerContext);
Or have an assertion:
assert(!!triggerContext);
persistContext(finalBody.remainingBeans ? {} : triggerContext)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a trigger context by this point. Let's add an assertion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that in the final version the > 0
part of the check was dropped, which causes the persisted context to always be empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are missing the context in case of negative remaining beans right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing the persisted context roughly always because the remaining beans is almost never 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified that the processor seem to generate coherent context data.
Once the slicing of timestamp fraction is fixed, feel free to merge.
Also consider adding a context-aware-slog-file.js
(or better name) that simply writes the contextualized data to a file. Here is what I used (to experiment with ingest-slog
), but feel free to adapt (or add support for persisting context, probably by extracting the tool from the otel file into a standalone module).
import { makeFsStreamWriter } from '@agoric/internal/src/node/fs-stream.js';
import { makeContextualSlogProcessor } from './context-aware-slog.js';
import { serializeSlogObj } from './serialize-slog-obj.js';
/**
* @import {MakeSlogSenderOptions as Options} from './index.js'
*/
/**
* @param {Options} options
*/
export const makeSlogSender = async options => {
const { CHAIN_ID, CONTEXTUAL_SLOGFILE } = options.env || {};
if (!CONTEXTUAL_SLOGFILE)
return console.warn(
'Ignoring invocation of slogger "context-aware-slog-file" without the presence of "CONTEXTUAL_SLOGFILE"',
);
const stream = await makeFsStreamWriter(CONTEXTUAL_SLOGFILE);
if (!stream) {
return undefined;
}
const contextualSlogProcessor = makeContextualSlogProcessor({
'chain-id': CHAIN_ID,
});
/**
* @param {import('./context-aware-slog.js').Slog} slog
*/
const slogSender = slog => {
const contextualizedSlog = contextualSlogProcessor(slog);
// eslint-disable-next-line prefer-template
stream.write(serializeSlogObj(contextualizedSlog) + '\n').catch(() => {});
};
return Object.assign(slogSender, {
forceFlush: () => stream.flush(),
shutdown: () => stream.close(),
});
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about the return value of slogSender
. Let's make sure we don't end up with unhandled rejections.
refs: #10300 Incidental Best reviewed commit-by-commit ## Description While verifying #10300 I ran into some errors and lack of stdout streaming features. This is what I arrived at to let me process some slog files manually. ### Security Considerations None ### Scaling Considerations None production impacting This adds a new block throttle mechanism to the ingest-slog tool, while relaxing the line based throttle. ### Documentation Considerations None ### Testing Considerations Manually tested with the slog sender detailed in #10300 (review). ### Upgrade Considerations Affects chain software, but only the optional telemetry side. Not consensus affecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix the usage of non bound methods.
7046bbb
to
3ddcc6d
Compare
closes: #10269 ## Description Adds a slog sender which will build various contexts along the way and report them along with the slogs for better logs querying and identification ### Security Considerations None ### Scaling Considerations This uses a json file storage ### Documentation Considerations This is a new slogger which can be opted into ### Testing Considerations This will be deployed on testnets (already deployed on one of the testnets and log link is added in a comment below) ### Upgrade Considerations This can be configured on existing deployments by bumping the telemetry package
closes: #10269
Description
Adds a slog sender which will build various contexts along the way and report them along with the slogs for better logs querying and identification
Security Considerations
None
Scaling Considerations
This uses a json file storage
Documentation Considerations
This is a new slogger which can be opted into
Testing Considerations
This will be deployed on testnets (already deployed on one of the testnets and log link is added in a comment below)
Upgrade Considerations
This can be configured on existing deployments by bumping the telemetry package