Spark Structured Streaming

Bug Fixes

  • SPAR-3091: When Structured Streaming writes output files to S3, task re-attempts can leave duplicate task data files. Spark ignores these duplicates, but other engines such as Hive and Presto include them in downstream ETL operations. With this fix, the duplicate files are deleted and task re-attempts caused by eventual consistency (EC) are reduced. This issue is fixed in Spark 2.3.2 and 2.4.0.
  • SPAR-3258: Issues with overridden Kinesis configuration are fixed. The Kinesis connector can now handle more than 100 shards in the input Kinesis stream. This issue is fixed in Spark 2.3.2 and 2.4.0.
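
To illustrate the duplicate-file problem described in SPAR-3091, the sketch below shows one way to spot files left behind by task re-attempts on a release without the fix. It is only an illustration, not the fix itself: the helper name find_uncommitted_files and the file names are hypothetical, and it assumes the caller already has both a listing of the output directory and the set of files the streaming job actually committed.

```python
def find_uncommitted_files(listed_files, committed_files):
    """Return files that appear in the directory listing but were never committed.

    Hypothetical helper: both arguments are plain iterables of file names that
    the caller has gathered by other means.
    """
    committed = set(committed_files)
    return sorted(f for f in listed_files if f not in committed)


# Hypothetical listing of an output directory after a task was re-attempted.
listed = [
    "part-00000-aaa.parquet",
    "part-00000-bbb.parquet",  # assumed leftover from a re-attempt
    "part-00001-ccc.parquet",
]

# Hypothetical set of files the job committed.
committed = ["part-00000-aaa.parquet", "part-00001-ccc.parquet"]

duplicates = find_uncommitted_files(listed, committed)
print(duplicates)
```

An engine that reads every file in the directory would also pick up the files this helper returns, which is why they matter for downstream ETL.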