Understanding Authorization of Hive Objects in Spark

Spark on Qubole supports SQL Standard authorization of Hive objects in Spark 2.0 and later versions. With this feature, Spark honors the privileges and roles set in Hive, as described in Understanding Qubole Hive Authorization, and offers Hive table data security through granular access to table data.

For more information on Hive authorization and privileges, see Understanding Qubole Hive Authorization. This feature is available for beta access. To enable it on a Qubole account, create a ticket with Qubole Support.

Spark on Qubole supports table-level security in all supported languages. This means that any Spark command that accesses Hive objects honors authorization, whether it is written in SQL, Scala, PySpark, or SparkR.

For details on how to configure the Hive Thrift Metastore Interface as a Spark cluster override, see Configuring Thrift Metastore Server Interface for the Custom Metastore (AWS).

Prerequisites for Enabling Authorization of Hive Objects in Spark

Authorization of Hive objects can be enabled on a QDS account only when the following prerequisite is met:

  • Per-user interpreter mode is enabled on all active Spark clusters. For more information on the user interpreter mode, see Using the User Interpreter Mode for Spark Notebooks. The legacy interpreter mode gets disabled after Hive authorization is enabled on the QDS account.

Running Hive Admin Commands Through SparkSQL

Starting with Spark 2.4, Spark on Qubole enables you to run Hive Admin commands through SparkSQL. A user with appropriate privileges can run the following commands:

  • Set role
  • Grant privilege (SELECT, INSERT, DELETE, UPDATE or ALL)
  • Revoke privilege (SELECT, INSERT, DELETE, UPDATE or ALL)
  • Grant role
  • Revoke role
  • Show Grant
  • Show current roles
  • Show roles
  • Show role grant
  • Show principals for role

The syntax of Hive Admin commands in Spark is the same as that of the Hive authorization commands. For more information about the syntax, see SQL Standard Based Hive Authorization.
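As a minimal sketch of how the commands listed above fit together, the following statements grant a role read access to a table, add a user to that role, inspect the result, and then undo the changes. The names `analyst`, `user1`, and `sales` are hypothetical, the `analyst` role is assumed to already exist in Hive, and the current user is assumed to be a member of the `admin` role. The statements follow the standard Hive syntax and have not been verified against a specific Spark on Qubole release.

```sql
-- Hypothetical names: role analyst, user user1, table sales.
-- Assumption: the analyst role already exists in Hive.

-- Activate the admin role for the current session.
SET ROLE admin;

-- Give the role read access to the table, then add the user to the role.
GRANT SELECT ON TABLE sales TO ROLE analyst;
GRANT analyst TO USER user1;

-- Inspect the result.
SHOW GRANT ON TABLE sales;
SHOW ROLE GRANT USER user1;
SHOW PRINCIPALS analyst;

-- Undo the changes.
REVOKE analyst FROM USER user1;
REVOKE SELECT ON TABLE sales FROM ROLE analyst;
```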

Limitations of Hive Admin Commands in Spark

  • Show Grant command: Currently, the ALL case is not supported. The supported forms of the Show Grant command are as follows:

      SHOW GRANT USER user1 ON TABLE table1;
      SHOW GRANT ON TABLE table1;

    Example of unsupported cases

      SHOW GRANT USER user1 ON ALL;
      SHOW GRANT ON ALL;
    
  • Set Role command: SET ROLE NONE and setting multiple roles in a single command are not supported.

    Example of unsupported cases

      SET ROLE NONE;
      SET ROLE role1, role2;
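For contrast with the unsupported cases above, the following sketch shows the single-role form of Set Role, followed by a check of the active role. The role name `role1` is hypothetical and is assumed to have already been granted to the current user.

```sql
-- Assumption: role1 has already been granted to the current user.
-- Activate exactly one role for the session.
SET ROLE role1;

-- Confirm which role is now active.
SHOW CURRENT ROLES;
```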
    

Known Issues in Authorization of Hive Objects in Spark

These are known issues only in Spark 2.0.0:

  • CREATE DATABASE does not pass the owner information. As a temporary workaround, create databases using Hive.
  • CREATE TABLE does not grant permissions to the owner of the table, so any query that the owner runs on the new table fails with an authorization error. As a temporary workaround, create tables using Hive.
  • SHOW COLUMNS does not honor authorization, so any user can run it on a table.
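The workarounds for the first two issues can be sketched as follows. The statements are meant to be run as a Hive command rather than through Spark, so that Hive records the owner. The database name `sales_db`, the table name `orders`, and the column definitions are hypothetical.

```sql
-- Run as a Hive command, not through Spark, so that Hive records the owner.
-- Hypothetical names: database sales_db, table orders.
CREATE DATABASE IF NOT EXISTS sales_db;

CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id INT,
  amount   DOUBLE
);
```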

This is a known issue only in Spark 2.1.0:

  • ANALYZE TABLE does not honor authorization, so any user can run it on a table.