When upgrading the connector from hadoop3-1.9.17 to hadoop3-2.2.8 (using the shaded jar of the new version), I saw a performance degradation that almost doubled the run time of my tests.
I have a performance test case that I run against my file-system implementation, which uses org.apache.hadoop.fs.FileSystem. The test runs several operations [create, read, write, rename, checkIfExists, mkDir] on 100 files with multiple threads.
I ran the same tests several times on both versions of the Hadoop connector, and the new one [2.2.8] shows slower overall execution time (roughly 2-2.2x the old connector's time).
Below is a comparison between the average execution time for each operation while using each connector version:
I have checked this GitHub issue and tried to follow its recommendations for tuning performance through the configs/params, but found no improvement.
Are there any guidelines on parameter configuration to improve the times of the operations above?
Or might this performance issue be due to some incompatibility among the jars on my class path? Even though I am using the shaded jar, can other jars interfere?
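One way to narrow down the class-path question is to print which jar each relevant class is actually loaded from. The following is only a minimal sketch using standard JDK calls; `ClassOrigin` and `originOf` are hypothetical names made up for illustration, and the only class name taken from this issue is org.apache.hadoop.fs.FileSystem. Other class names to inspect would have to be passed as arguments.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, or a note if unknown. */
    public static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "(bootstrap or unknown source)";
        }
        return source.getLocation().toString();
    }

    /** Same as above, but looks the class up by name first. */
    public static String originOf(String className) {
        try {
            return originOf(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the one class named in this issue; pass others as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.hadoop.fs.FileSystem"};
        for (String name : names) {
            System.out.println(name + " -> " + originOf(name));
        }
    }
}
```

If a class that should come from the shaded connector jar is reported as coming from a different jar, that would point to a class-path conflict rather than a configuration issue.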
@Override
public InputStream read() throws IOException {
    return fs.open(path);
}
My test case simply creates many threads, each with its own instance of a file object that has a different path (a path to a unique GCS bucket object, e.g. gs://some-bucket/objectX), and then performs an operation such as the read shown above.
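The shape of such a test can be sketched roughly as below. This is not my actual test code: `ConcurrentOpTimer`, `Op`, and `averageMillis` are hypothetical names, the thread count of 8 is arbitrary, and the operation in `main` is a stand-in sleep so that the sketch runs without Hadoop on the class path. In the real test the operation would be the `fs.open(path)` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentOpTimer {

    /** One operation under test, e.g. opening a single object for reading. */
    public interface Op {
        void run(String path) throws Exception;
    }

    /**
     * Runs op once per path on a fixed thread pool and returns the
     * average per-call latency in milliseconds.
     */
    public static double averageMillis(List<String> paths, int threads, Op op)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (String path : paths) {
                Callable<Long> task = () -> {
                    long start = System.nanoTime();
                    op.run(path);
                    return System.nanoTime() - start;
                };
                results.add(pool.submit(task));
            }
            long totalNanos = 0;
            for (Future<Long> result : results) {
                // A failed operation surfaces here as an ExecutionException.
                totalNanos += result.get();
            }
            return totalNanos / 1_000_000.0 / paths.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            paths.add("gs://some-bucket/object" + i);
        }
        // Stand-in operation; the real test would open the path via the file system.
        double avg = averageMillis(paths, 8, p -> Thread.sleep(5));
        System.out.printf("average latency: %.2f ms%n", avg);
    }
}
```

Measuring each call separately, rather than only the wall-clock time of the whole run, keeps per-operation averages comparable between connector versions even when the thread count changes.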
I also created this Stack Overflow question.
Here is a list of jars I have in my class path: