I am facing an issue with a Spark job that reads streaming data from Azure Event Hubs and stores it in an ADL (Azure Data Lake) Gen1 file system.
Spark version: 3.0.0
Please help and let me know:
What is the root cause of the issue?
How can it be fixed? Is this related to the size of the ADL Gen1 file system?
One more observation: this usually happens when the input load is large (around 1 million transactions), and is usually not seen when it is below 1M. Is this just a coincidence, or does the size of the input load also play a role?
Brief Overview
Our Big Data Product runs in AKS Cluster deployed in Microsoft Azure.
All jobs executed within the product are Apache Spark jobs. In addition to HDFS, Azure Data Lake Gen1 is one of the supported file systems.
Scenario
A source generates events and publishes them into Azure Event Hubs. A Spark Streaming job waits for events on a particular EH (Event Hub) and continuously writes the data into the Azure Data Lake Gen1 file system.
A huge number of transactions (1 to 5 million records) were injected on the source side.
The Spark streaming job runs continuously for hours, writing data into the ADL Gen1 file system.
All of a sudden, it fails in the middle with the below error:
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file /landing_home/delta_log/.00000000000000001748.json.9d2edecf-973c-4d61-a178-4db46bd70f2c.tmp
Operation CREATE failed with exception java.net.SocketTimeoutException : Read timed out
Last encountered exception thrown after 1 tries. [java.net.SocketTimeoutException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281)
at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374)
at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)
at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
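One detail worth noticing in the trace is `Last encountered exception thrown after 1 tries`: the client gave up after a single attempt, so one transient read timeout was enough to kill the whole stream. Where the caller knows the operation can safely be repeated (for example, re-creating the same temp file with overwrite semantics), a caller-side retry with exponential backoff can absorb such transient timeouts. The sketch below is illustrative only; `with_backoff` and the callable passed to it are hypothetical stand-ins, not part of the ADLS SDK:

```python
import time


def with_backoff(op, retryable=(TimeoutError,), max_tries=4, base_delay=0.5):
    """Run `op()`, retrying on transient errors with exponential backoff.

    Only safe when `op` is idempotent: a CREATE that must not be
    repeated should not be wrapped this way.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return op()
        except retryable:
            if attempt == max_tries:
                raise  # out of attempts; surface the last error
            # back off 0.5s, 1s, 2s, ... before the next attempt
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

A flaky operation that times out twice and then succeeds would survive under this wrapper instead of failing the job on the first timeout.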
@sudhikul Apologies for the delay. What version of the ADLS Java SDK are you using? You can find it from the jar name "azure-data-lake-store-*". I recall there was an issue in an older version of the SDK.
I would also recommend opening a case through the Azure portal for your Gen1 account; that way you can provide more detail about the account.