
Spark job fails during CREATE file operation on Azure Data Lake Gen1 #48

Open
sudhikul opened this issue Mar 25, 2021 · 2 comments

@sudhikul

Hi,

I am facing an issue with a Spark job that reads streaming data from Azure Event Hubs and stores it in an Azure Data Lake (ADL) Gen1 file system.

Spark Version: 3.0.0

Please help me understand:

  1. What is the root cause of the issue?
  2. How can it be fixed? Is it related to the size of the ADL Gen1 file system?
  3. One more observation: the failure usually occurs when the input volume is large (around 1 million transactions) and is rarely seen below that. Is this just a coincidence, or is the input load also a factor?

Brief Overview
Our Big Data Product runs in AKS Cluster deployed in Microsoft Azure.

All the jobs executed within the product are Apache Spark jobs. In addition to HDFS, Azure Data Lake Gen1 is one of the supported file systems.

Scenario
The source generates events and publishes them to Azure Event Hubs. A Spark Streaming job listens for events on a particular Event Hub (EH) and continuously writes the data into the ADL Gen1 file system.

  • A huge number of transactions (1 to 5 million records) were injected at the source side
  • The Spark streaming job runs continuously for hours, writing the data into the ADL Gen1 file system

All of a sudden, it fails in the middle with the error below:
```
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file /landing_home/delta_log/.00000000000000001748.json.9d2edecf-973c-4d61-a178-4db46bd70f2c.tmp
Operation CREATE failed with exception java.net.SocketTimeoutException : Read timed out
Last encountered exception thrown after 1 tries. [java.net.SocketTimeoutException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281)
at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374)
at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)
at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
```
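The "after 1 tries" line suggests the failed CREATE was not retried. Independent of any SDK fix, transient read timeouts can be absorbed by wrapping the operation in a generic exponential-backoff retry. This is a sketch only: `createWithRetry` and the `Callable` wrapper are illustrative, not part of the ADLS SDK API.

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Retry a transient operation up to maxTries times, doubling the delay
    // between attempts. Rethrows the last exception if every attempt fails.
    static <T> T createWithRetry(Callable<T> op, int maxTries, long baseDelayMs)
            throws Exception {
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // in practice, catch only transient I/O errors
                last = e;
                if (attempt < maxTries) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a create call that times out twice, then succeeds.
        final int[] calls = {0};
        String result = createWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.net.SocketTimeoutException("Read timed out");
            }
            return "created";
        }, 4, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```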

@sudhikul
Author

Hi All,

This is just a gentle reminder!

Can anyone look into this issue and share your input on how to fix it at the earliest?

Thanks and Regards,
Sudhindra

@rahuldutta90
Contributor

@sudhikul Apologies for the delay. Which version of the ADLS Java SDK are you using? You can find it in the jar name ("azure-data-lake-store-*"). I recall there was an issue in an older version of the SDK.
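One quick way to check is to look for the jar on the driver's classpath and read the version out of the filename. The paths and the filename below are assumptions for illustration; adjust them for your deployment.

```shell
# On a real cluster, list the actual jars directory, e.g.:
#   ls "$SPARK_HOME/jars" | grep azure-data-lake-store
# Hypothetical filename used here for illustration:
jar="azure-data-lake-store-2.3.6.jar"
ver=$(echo "$jar" | sed -E 's/azure-data-lake-store-([0-9.]+)\.jar/\1/')
echo "$ver"   # prints the SDK version embedded in the jar name
```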

Also, I would recommend opening a case through the Azure portal for your Gen1 account; that way you can provide more detail about the account.
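If upgrading the SDK alone does not help, another knob worth checking is the ADL connector's HTTP timeout. This is an assumption on my part: the `adl.http.timeout` property (milliseconds) was added by HADOOP-15356 and is only present in sufficiently recent Hadoop builds, so verify it exists in yours before relying on it. The submit command and value below are illustrative.

```shell
# Hypothetical job submission; adl.http.timeout raises the connector's
# per-request HTTP timeout (ms) if your Hadoop build includes HADOOP-15356.
spark-submit \
  --conf spark.hadoop.adl.http.timeout=60000 \
  --class com.example.StreamingJob \
  streaming-job.jar
```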
