Spark 3.3.2 Features

Spark on Qubole provides the following customized features with Spark 3.3.2.

Feature: Pandas API: increased coverage with new features

Description: PySpark now natively understands datetime.timedelta across Spark SQL and the Pandas API on Spark; this Python type maps to the day-time interval type in Spark SQL. Many previously missing parameters and new API features are now supported in the Pandas API on Spark, including ps.merge_asof, ps.timedelta_range, and ps.to_timedelta.

Reference: Pandas API Coverage
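As a quick illustration of the Python side of this mapping, the sketch below uses plain datetime.timedelta values (no Spark cluster required); the Spark-specific behavior is noted in the comments and is an assumption based on the description above:

```python
from datetime import timedelta

# datetime.timedelta values like this one can now be used directly in
# PySpark (e.g. as literals or column values); Spark 3.3 maps the type
# to the day-time interval type in Spark SQL.
delay = timedelta(days=1, hours=2, minutes=30)

# The usual timedelta arithmetic now has a natural Spark SQL
# counterpart, such as adding a day-time interval to a timestamp.
doubled = delay * 2                    # timedelta(days=2, hours=5)
seconds = delay.total_seconds()        # 95400.0
```

With this mapping, helpers such as ps.to_timedelta and ps.timedelta_range can produce and consume these values on the Pandas API on Spark side.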

Feature: ANSI compliance

Description: Spark now supports the ANSI interval data types: you can read and write interval values from and to tables, and use intervals in many functions and operators for date/time arithmetic, including aggregation and comparison. Implicit casting in ANSI mode now supports safe casts between types while protecting against data loss. A growing library of "try" functions, such as try_add and try_multiply, complements ANSI mode, letting users keep the safety of ANSI rules while still writing fault-tolerant queries.

Reference: ANSI Enhancements
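The behavior of the "try" functions can be sketched in plain Python. This is a hedged approximation of the Spark SQL semantics (return NULL, shown here as None, where strict ANSI mode would raise an error), not the actual PySpark API:

```python
def try_divide(a, b):
    """Approximates Spark SQL's try_divide: returns None (NULL) on
    division by zero or NULL input, where ANSI mode would raise."""
    if a is None or b is None or b == 0:
        return None
    return a / b

def try_add(a, b):
    """Approximates try_add: propagates None (NULL) instead of failing.
    Python ints do not overflow, so only NULL propagation is shown;
    in Spark, arithmetic overflow would also yield NULL."""
    if a is None or b is None:
        return None
    return a + b
```

For example, try_divide(1, 0) yields None here, mirroring how the Spark function returns NULL rather than failing the whole query under ANSI mode.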

Feature: New built-in functions

Description: A growing library of "try" functions, such as try_add and try_multiply; nine new linear regression and statistical functions; four new string-processing functions; aes_encrypt and aes_decrypt functions; generalized floor and ceil functions; to_number formatting; and others.

Reference: Built-in Functions
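The generalized floor and ceil functions accept an optional target scale (number of decimal places). The following is a minimal pure-Python sketch of that behavior, assuming the scale semantics described in the Spark 3.3 release notes; it is an illustration, not the PySpark API:

```python
import math

def floor_scaled(value, scale=0):
    """Approximates Spark 3.3's generalized floor(expr[, scale]):
    rounds down to the given number of decimal places."""
    factor = 10 ** scale
    return math.floor(value * factor) / factor

def ceil_scaled(value, scale=0):
    """Approximates the generalized ceil(expr[, scale]):
    rounds up to the given number of decimal places."""
    factor = 10 ** scale
    return math.ceil(value * factor) / factor
```

For example, floor_scaled(3.14159, 2) keeps two decimal places (3.14), while the default scale of 0 matches the classic floor and ceil behavior.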

For more information about all the features and enhancements of Apache Spark 3.3.2, see the Apache Spark 3.3.2 documentation.