
Spark job fails during CREATE file operation on Azure Data Lake Gen1 #48

Open
sudhikul opened this issue Mar 25, 2021 · 2 comments

@sudhikul

Hi,

I am facing an issue with a Spark job that reads streaming data from Azure Event Hubs and stores it in an Azure Data Lake (ADL) Gen1 file system.

Spark Version: 3.0.0

Please help me understand:

  1. What is the root cause of the issue?
  2. How can it be fixed? Is it related to the size of the ADL Gen1 file system?
  3. One more observation: the failure usually occurs when the input volume is large (around 1 million transactions) and is rarely seen below that. Is this just a coincidence, or is the input load also a factor?

Brief Overview
Our Big Data Product runs in AKS Cluster deployed in Microsoft Azure.

All the jobs executed within the product are Apache Spark jobs. In addition to HDFS, Azure Data Lake Gen1 is one of the supported file systems.

Scenario
The source generates events and publishes them to Azure Event Hubs. A Spark Streaming job listens for events on a particular Event Hub (EH) and continuously writes the data into the ADL Gen1 file system.

  • A huge number of transactions (1 to 5 million records) were injected at the source side
  • The Spark streaming job runs continuously for hours, writing the data into the ADL Gen1 file system

All of a sudden, it fails in the middle with the error below:
```
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file /landing_home/delta_log/.00000000000000001748.json.9d2edecf-973c-4d61-a178-4db46bd70f2c.tmp
Operation CREATE failed with exception java.net.SocketTimeoutException : Read timed out
Last encountered exception thrown after 1 tries. [java.net.SocketTimeoutException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281)
at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374)
at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)
at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
```
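The "after 1 tries" line suggests the failed CREATE was not retried. Independent of any SDK fix, transient read timeouts can be absorbed by wrapping the operation in a generic exponential-backoff retry. This is a sketch only: `createWithRetry` and the `Callable` wrapper are illustrative, not part of the ADLS SDK API.

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Retry a transient operation up to maxTries times, doubling the delay
    // between attempts. Rethrows the last exception if every attempt fails.
    static <T> T createWithRetry(Callable<T> op, int maxTries, long baseDelayMs)
            throws Exception {
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // in practice, catch only transient I/O errors
                last = e;
                if (attempt < maxTries) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a create call that times out twice, then succeeds.
        final int[] calls = {0};
        String result = createWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.net.SocketTimeoutException("Read timed out");
            }
            return "created";
        }, 4, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```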

@sudhikul
Author

Hi All,

This is just a gentle reminder!

Can anyone look into this issue and share your input on how to fix it at the earliest?

Thanks and Regards,
Sudhindra

@rahuldutta90
Contributor

@sudhikul Apologies for the delay. Which version of the ADLS Java SDK are you using? You can find it in the jar name ("azure-data-lake-store-*"). I recall there was an issue in an older version of the SDK.
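One quick way to check is to look for the jar on the driver's classpath and read the version out of the filename. The paths and the filename below are assumptions for illustration; adjust them for your deployment.

```shell
# On a real cluster, list the actual jars directory, e.g.:
#   ls "$SPARK_HOME/jars" | grep azure-data-lake-store
# Hypothetical filename used here for illustration:
jar="azure-data-lake-store-2.3.6.jar"
ver=$(echo "$jar" | sed -E 's/azure-data-lake-store-([0-9.]+)\.jar/\1/')
echo "$ver"   # prints the SDK version embedded in the jar name
```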

Also, I would recommend opening a case through the Azure portal for your Gen1 account; that way you can provide more detail about the account.
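If upgrading the SDK alone does not help, another knob worth checking is the ADL connector's HTTP timeout. This is an assumption on my part: the `adl.http.timeout` property (milliseconds) was added by HADOOP-15356 and is only present in sufficiently recent Hadoop builds, so verify it exists in yours before relying on it. The submit command and value below are illustrative.

```shell
# Hypothetical job submission; adl.http.timeout raises the connector's
# per-request HTTP timeout (ms) if your Hadoop build includes HADOOP-15356.
spark-submit \
  --conf spark.hadoop.adl.http.timeout=60000 \
  --class com.example.StreamingJob \
  streaming-job.jar
```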
