Add support for HCatalog and Pig #114

Merged
merged 1 commit into from
Mar 8, 2024

Conversation

jphalip
Collaborator

@jphalip jphalip commented Nov 27, 2023

No description provided.


codecov bot commented Nov 27, 2023

Codecov Report

Attention: Patch coverage is 0%, with 261 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (def5ba8) to head (fb51317).
Report is 180 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...ery/connector/utils/bq/BigQueryValueConverter.java | 0.00% | 49 Missing ⚠️ |
| ...bigquery/connector/BigQueryStorageHandlerBase.java | 0.00% | 36 Missing ⚠️ |
| ...gquery/connector/utils/hcatalog/HCatalogUtils.java | 0.00% | 36 Missing ⚠️ |
| .../hive/bigquery/connector/BigQueryMetaHookBase.java | 0.00% | 30 Missing ⚠️ |
| ...igquery/connector/output/OutputCommitterUtils.java | 0.00% | 28 Missing ⚠️ |
| .../cloud/hive/bigquery/connector/utils/JobUtils.java | 0.00% | 24 Missing ⚠️ |
| .../hive/bigquery/connector/utils/hive/HiveUtils.java | 0.00% | 12 Missing ⚠️ |
| ...uery/connector/output/BigQueryOutputCommitter.java | 0.00% | 8 Missing ⚠️ |
| ...igquery/connector/output/BigQueryOutputFormat.java | 0.00% | 7 Missing ⚠️ |
| .../bigquery/connector/input/BigQueryInputFormat.java | 0.00% | 6 Missing ⚠️ |

... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #114       +/-   ##
============================================
- Coverage     80.40%   0.00%   -80.41%     
============================================
  Files            38      74       +36     
  Lines          1536    2276      +740     
  Branches        192     281       +89     
============================================
- Hits           1235       0     -1235     
- Misses          207    2276     +2069     
+ Partials         94       0       -94     
Flag Coverage Δ
integrationtest ?
integrationtest_hive2 0.00% <0.00%> (?)
integrationtest_hive3 0.00% <0.00%> (?)
unittest ?
unittest_hive2 0.00% <0.00%> (?)
unittest_hive3 0.00% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Collaborator Author

jphalip commented Jan 25, 2024

@yigress I've removed all the Spark code from this PR. We can leave the Spark stuff aside for now. Let me know what you think about this one!

@jphalip jphalip changed the base branch from sparksql-support to main February 15, 2024 19:00
Collaborator

yigress commented Feb 15, 2024

I tried to run a Pig example but got this error:

JobId	Alias	Feature	Message	Outputs
job_1707866711793_0028	A	MAP_ONLY	Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: nation
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:300)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1678)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1675)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1675)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
	at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:298)
Caused by: java.lang.NullPointerException
	at com.google.cloud.hive.bigquery.repackaged.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:889)
	at com.google.cloud.hive.bigquery.connector.input.BigQueryInputSplit.createSplitsFromBigQueryReadStreams(BigQueryInputSplit.java:179)
	at com.google.cloud.hive.bigquery.connector.input.BigQueryInputFormat.getSplits(BigQueryInputFormat.java:44)
	at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:163)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:283)
	... 18 more

I tried to debug a little but couldn't figure out why the NPE occurs. The code logic seems okay, unless the JobConf somehow got cleaned up.

Collaborator Author

jphalip commented Feb 16, 2024

> I tried to debug a little but couldn't figure out why the NPE occurs. The code logic seems okay, unless the JobConf somehow got cleaned up.

I have an if statement that updates the Hadoop conf for HCatalog:

https://github.com/GoogleCloudDataproc/hive-bigquery-connector/pull/114/files#diff-3522a32732801b4b87e3fccebc73d130a1304d99e487f0eef511631180c639bbR46-R49

It looks like somehow the if statement's condition (HCatalogUtils.isHCatalogInputJob(jobConf)) isn't satisfied in your environment. Odd... By chance are you able to debug that line?
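
One way to debug that line, as a rough sketch: dump the conf entries whose keys mention HCatalog right before the condition is evaluated, to see what the check has to work with. The example below uses a plain `Map` as a stand-in for the JobConf so that it runs on its own. The class name, the method name, the `"hcat"` substring filter, and the sample keys are all illustrative assumptions, not the connector's actual logic.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ConfDumpSketch {

  /** Returns the entries whose key contains the fragment (case-insensitive), sorted by key. */
  static Map<String, String> entriesMatching(Map<String, String> conf, String fragment) {
    String needle = fragment.toLowerCase(Locale.ROOT);
    Map<String, String> matches = new TreeMap<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      if (entry.getKey().toLowerCase(Locale.ROOT).contains(needle)) {
        matches.put(entry.getKey(), entry.getValue());
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Stand-in for the JobConf; these keys are made up for illustration.
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("example.hcat.job.info", "serialized-job-info");
    conf.put("example.other.setting", "42");

    // Print only the entries that look HCatalog-related.
    entriesMatching(conf, "hcat").forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

With a real JobConf the same loop should work directly on the conf object, since Hadoop's `Configuration` is iterable over its key/value entries.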

Collaborator

yigress commented Feb 16, 2024

> > I tried to debug a little but couldn't figure out why the NPE occurs. The code logic seems okay, unless the JobConf somehow got cleaned up.
>
> I have an if statement that updates the Hadoop conf for HCatalog:
>
> https://github.com/GoogleCloudDataproc/hive-bigquery-connector/pull/114/files#diff-3522a32732801b4b87e3fccebc73d130a1304d99e487f0eef511631180c639bbR46-R49
>
> It looks like somehow the if statement's condition (HCatalogUtils.isHCatalogInputJob(jobConf)) isn't satisfied in your environment. Odd... By chance are you able to debug that line?

The if branch does seem to execute, which is why I suspect the conf somehow gets swapped somewhere.

Collaborator Author

jphalip commented Feb 16, 2024

> The if branch does seem to execute, which is why I suspect the conf somehow gets swapped somewhere.

The if and checkNotNull statements should be running in the same thread and I don't see where the conf object would be modified between the two statements.

Perhaps tableInfo.getDataColumns().getFieldNames() returns null in your case. Are you able to verify that?
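
A quick way to verify, sketched under assumptions: check each intermediate value separately with a named message, so the exception says which link in the chain is null instead of surfacing as a bare NullPointerException. `TableInfo` and `Columns` below are minimal hypothetical stand-ins that only mirror the getter names mentioned above; they are not the real HCatalog classes, and the column names are made up.

```java
import java.util.List;
import java.util.Objects;

class NullChainSketch {

  // Hypothetical stand-in that mirrors only the getFieldNames() getter.
  static final class Columns {
    private final List<String> fieldNames;
    Columns(List<String> fieldNames) { this.fieldNames = fieldNames; }
    List<String> getFieldNames() { return fieldNames; }
  }

  // Hypothetical stand-in that mirrors only the getDataColumns() getter.
  static final class TableInfo {
    private final Columns dataColumns;
    TableInfo(Columns dataColumns) { this.dataColumns = dataColumns; }
    Columns getDataColumns() { return dataColumns; }
  }

  /** Walks the getter chain and fails with a message naming the first null link. */
  static List<String> fieldNamesOrFail(TableInfo tableInfo) {
    Objects.requireNonNull(tableInfo, "tableInfo is null");
    Columns columns =
        Objects.requireNonNull(tableInfo.getDataColumns(), "tableInfo.getDataColumns() is null");
    return Objects.requireNonNull(columns.getFieldNames(), "getFieldNames() returned null");
  }

  public static void main(String[] args) {
    // Healthy chain: prints the field names.
    System.out.println(fieldNamesOrFail(new TableInfo(new Columns(List.of("col_a", "col_b")))));

    // Broken chain: the message identifies which getter returned null.
    try {
      fieldNamesOrFail(new TableInfo(new Columns(null)));
    } catch (NullPointerException e) {
      System.out.println(e.getMessage());
    }
  }
}
```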

Collaborator

yigress commented Mar 7, 2024

LGTM + 1

@jphalip jphalip merged commit 64af1b0 into main Mar 8, 2024
4 of 6 checks passed