Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new accumulo command #5073

Open
wants to merge 7 commits into
base: 2.1
Choose a base branch
from

Conversation

kevinrr888
Copy link
Member

@kevinrr888 kevinrr888 commented Nov 18, 2024

closes #5034

I did a deep dive into the code executed by accumulo check-server-config to figure out what is being checked to see if an instance can be avoided or what could be checked without an instance. Here is root code for reference:

public static void main(String[] args) {
try (var context = new ServerContext(SiteConfiguration.auto())) {
context.getConfiguration();
}
}

Here is a summary of my findings:

  • SiteConfiguration.auto() does validation: checks that the deprecated accumulo-site.xml file doesn't exist, loads properties from the config file (accumulo.properties), checks that both deprecated and the replacement properties aren't both declared in the file, replaces deprecated properties with their non-deprecated versions, and returns a new SiteConfiguration object (the SiteConfiguration object constructor calls ConfigCheckUtil.validate(config) to validate the config passed in)
  • context.getConfiguration() does validation. However, as far as I can tell, the only validation done from this call requires an instance (the only validation seems to stem from ZooBasedConfiguration/PropSnapshot/PropStore/PropStoreKey/etc.). Validates at least the instance.secret property (maybe more). Since this is related to an instance, seems fine that it wouldn't be validated here anyways.
  • new ServerContext(SiteConfiguration.auto()) (= new ServerContext(new ServerInfo(SiteConfiguration.auto()))) does validation. This creates a new ServerInfo(siteConfig) passing that to the ServerContext constructor. There is no validation done in the ServerContext constructor: most fields are memoized suppliers (the only one we access is ServerConfigurationFactory serverConfFactory which is accessed through context.getConfiguration() and we went into that code above). The ones that aren't memoized suppliers don't do any validation. But there is validation done from constructing ServerInfo. We will take what we can (does some form of validation and is not reliant on an instance) and use that in our new check.

With this info, I wrote a new check CheckServerConfigNoInstance which, to the best of my knowledge, is all the validation that the current CheckServerConfig would be able to do without a running instance. So, this PR adds a new accumulo check-server-config-no-inst (in addition to the existing accumulo check-server-config) which performs the checks that accumulo check-server-config does without the checks that require a running Accumulo instance. Useful for (at least mostly) validating the accumulo.properties file without an instance.

I'm not sure which branch this should target. Targeting 2.1 for now since the change could be made as early as 2.1 and it may be a desireable command here, but not sure if we would want to add a new command in the 2.1 line.

Any input/suggestions on this PR would be appreciated.

Adds new `accumulo check-server-config-no-inst` command which performs the checks that `accumulo check-server-config` does without the checks that require a running Accumulo instance. Useful for (at least mostly) validating the `accumulo.properties` file without a running instance.
@kevinrr888 kevinrr888 added this to the 2.1.4 milestone Nov 18, 2024
@kevinrr888 kevinrr888 self-assigned this Nov 18, 2024
@kevinrr888 kevinrr888 linked an issue Nov 18, 2024 that may be closed by this pull request
@dlmarion
Copy link
Contributor

@kevinrr888 - wondering if we could just run the no-instance checks if the error is thrown when an instance doesn't exist so that we can re-use the command? Thoughts?

@kevinrr888
Copy link
Member Author

@kevinrr888 - wondering if we could just run the no-instance checks if the error is thrown when an instance doesn't exist so that we can re-use the command? Thoughts?

I considered that, but thought it might hide some issues. For example, if a user expects that they have a running instance and run accumulo check-server-config without a running instance, it would do less checks than may be expected and would not result in any no-instance errors. To me, it seemed safest to split these into two separate commands and keep the no-instance errors of the original command.

@dlmarion
Copy link
Contributor

I was thinking a stern warning could printed to the screen that no instance is running so we are doing a reduced set of checks. I assume someone is going to run this interactively and not via a script. If we don't want to make that assumption, we could return a non-zero code.

@kevinrr888
Copy link
Member Author

I was thinking a stern warning could printed to the screen that no instance is running so we are doing a reduced set of checks. I assume someone is going to run this interactively and not via a script. If we don't want to make that assumption, we could return a non-zero code.

Yeah, I think a warning like that would be a way to replace the extra command. However, another thing I'm thinking of is it might be tricky to identify what a no-instance exception is. There is a certain exception that currently occurs, but that exception may change.

@dlmarion
Copy link
Contributor

We could just raise our own exception, NoInstanceException, for example.

Deleted CheckServerConfigNoInstance and moved functionality to
CheckServerConfig
@kevinrr888
Copy link
Member Author

I removed the new check and now check if an instance exists before the full set of checks runs in CheckServerConfig. The instance is checked with VolumeManager.getInstanceIDFromHdfs(instanceIdPath, hadoopConfig). Coincidentally, the vars needed for this call perform all* the checks that can be performed without a running instance: SiteConfiguration.auto(), VolumeManagerImpl.get(siteConfig, hadoopConfig), and new ServerDirs(siteConfig, hadoopConfig).

I wanted to avoid just surrounding the existing code:

    try (var context = new ServerContext(SiteConfiguration.auto())) {
      context.getConfiguration();
    }

with a try-catch as the exception that occurs is a few layers deep and may be changed in the future and it would not be clear that changing those exceptions would affect this command. Also, if we surrounded it with a try-catch we would have to identify what is an exception relating to no instance from many potential exceptions. Calling the method directly avoids these issues.

*to the best of my knowledge

@kevinrr888
Copy link
Member Author

One issue with this current impl is that the stack trace will still show an exception (this time a ConnectException) even though the calling code handles all Exceptions. My guess is that the exception is occurring in another thread. So, an output of accumulo check-server-config would look like this with no instance running:

2024-11-19T12:16:25,916 [conf.SiteConfiguration] INFO : Found Accumulo configuration on classpath at /home/krathbun/Desktop/github/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/conf/accumulo.properties
2024-11-19T12:16:26,300 [fs.VolumeManager] ERROR: Problem reading instance id out of hdfs at hdfs://localhost:8020/accumulo/instance_id
java.net.ConnectException: Call From groot/10.0.0.182 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:930) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:845) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1571) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) ~[hadoop-client-api-3.3.6.jar:?]
	at jdk.proxy2/jdk.proxy2.$Proxy30.getListing(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:689) ~[hadoop-client-api-3.3.6.jar:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[hadoop-client-api-3.3.6.jar:?]
	at jdk.proxy2/jdk.proxy2.$Proxy31.getListing(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1702) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1686) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1113) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:149) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1188) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1185) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1195) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:214) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.server.conf.CheckServerConfig.main(CheckServerConfig.java:54) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.server.conf.CheckServerConfig.execute(CheckServerConfig.java:78) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.start.Main.lambda$execKeyword$0(Main.java:122) ~[accumulo-start-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:600) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:652) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:773) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:347) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1632) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1457) ~[hadoop-client-api-3.3.6.jar:?]
	... 28 more
2024-11-19T12:16:26,300 [conf.CheckServerConfig] WARN : Performed only a subset of checks (which passed). There is a problem relating to the instance:
Can't tell if Accumulo is initialized; can't read instance id at hdfs://localhost:8020/accumulo/instance_id
and no further checks could be done. If this is unexpected, make sure an instance is running and re-run the command.

The subset of the checks have still occurred and passed (noted in the output), but the exception unfortunately makes the info less visible and it might be better if the exception was not seen here.

@dlmarion
Copy link
Contributor

Regarding the exception being printed, that might be due to the logging configuration for the client (log4j2.properties). If that is set to direct output to a file, then you likely would not see that. For that reason, any output that we have from the command may want to use System.out or System.err

@kevinrr888
Copy link
Member Author

kevinrr888 commented Nov 19, 2024

Gotcha. Changed to print the warning. That logged error from my running of it was my only concern with this, but if it's fine to you then I have no problems with it.

try {
VolumeManager.getInstanceIDFromHdfs(instanceIdPath, hadoopConfig);
} catch (Exception e) {
System.out.println("WARNING : Performed only a subset of checks (which passed). There is a "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctubbsii - Do you have thoughts on this and the discussion around it on the conversation tab?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure. It seems to me that the original intent of the check, like most checks, are to check this instance. I'm not sure that this command makes any sense at all to run without an instance, so I think just having an error is fine in those circumstances.

It seems like the use case that #5034 was trying to report was different than this check was written for. That use case seems to be "I want to validate this config file, independently of any Accumulo instance". For that case, I think a much simpler solution can be done... just provide a config file as an optional argument, do SiteConfiguration.fromFile(new File(path)).build() using the specified file to validate it, rather than using SiteConfiguration.auto(), and then skip any instance-specific checks after that. After all, that use case is just asking for validation of the contents of a config file, not validation of any particular instance's configuration.

Copy link
Member Author

@kevinrr888 kevinrr888 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid messing with the original functionality of the CheckServerConfig check (as it seems like it is working as intended), created new check to validate the Accumulo configuration file in 1255f90. Allows users to optionally provide the config file, as you mentioned. Also see comment.

Edit: removed the ability to accept a file path as an arg in 973aefa. Appears unacceptable in our build constraints:

[ERROR] High: This API (java/io/File.<init>(Ljava/lang/String;)V) reads a file whose location might be specified by user input [org.apache.accumulo.server.conf.CheckAccumuloConfig] At CheckAccumuloConfig.java:[line 44] PATH_TRAVERSAL_IN

Checks the accumulo config at the given file path if one is provided.
Otherwise, attempts to find and check the config file. Useful for
checking the accumulo config without a running instance.
Comment on lines +47 to +52
try {
VolumeManagerImpl.get(siteConfig, hadoopConfig);
} catch (IOException e) {
throw new IllegalStateException(e);
}
new ServerDirs(siteConfig, hadoopConfig);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also included these checks which were the only checks I found (other than SiteConfiguration.<...>.build()) in the original CheckServerConfig code which would have done some validation that did not require an instance. These check properties like instance.volumes and instance.volumes.replacements (maybe more). Seemed fine to include as they just perform extra checks on instance related properties without the instance running or needing to be started.

Maybe it would be better if these weren't included or maybe it would be better if more checks were included (would need info on what else to include). I'm not sure. Some input on this would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utility to check configuration cannot be run prior to init
3 participants