SQL Authorization through Ranger in Presto

Presto on Qubole supports authorization using Apache Ranger. In addition to providing column-level authorization, Ranger provides additional features including row-level filtering and data masking. Ranger authorization works for all connectors in Presto on Qubole. For a connector that does not have a corresponding Ranger plugin in Ranger Admin, you can use an existing plugin such as Hive to define policies but with the limitation that auto-completion of fields in Ranger admin is not available for such connectors. For example, for MySQL connector of Presto, you can use Hive-Service in Ranger Admin and you can configure Presto to use that Hive-service for the given MySQL catalog.

Note

Currently, Apache Ranger is only supported from Presto version 0.208.

The following sections are:

Examples of SQL Authorization of Presto in Ranger provides examples that help you understand the SQL authorization.

Supported Ranger Features

The supported Ranger features are:

SQL Authorization

Presto on Qubole supports all forms of SQL Authorization with the Ranger plugin, which are: Schema level, Table level, and Column level. Supported

The supported Ranger policy constructs for authorization are: Allow-Conditions, Deny-Conditions, Resource-Inclusion, and Resource-Exclusion. Presto also supports User Groups defined in Ranger.

Row-level Filtering

In Ranger, admins can set policies to hide certain rows from particular users or groups. Assume, for example, an organization wants to prevent access by non-managers in the Finance department to certain details from the Salary table (empId, designation, salary) for executive staff. In this case, the admin adds all non-managers in the Finance department into a group, say finEmps and adds a row filter on the Salary table with the filter defined as designation != 'executive'. The results of queries run by users in the finEmps group does not include any rows where the value of the designation column is executive.

Click here to learn more about row-level filters in Ranger (see use case #2).

Data Masking Policies

For datasets that include sensitive information, admins can configure a policy with Mask Conditions to mask the sensitive data for a particular set of users. Ranger provides several options for masking data, including Redact, Hash, Nullify, Custom, and more. The Ranger plugin in QDS Presto, however, currently supports only the Custom masking option.

../_images/ranger_masking_options.png

Click here to learn more about data masking in Ranger (see use case #3).

Configuring Presto to use Ranger for Authorization

Before you begin configuring your Ranger plugin, note down the following information:

  • Ranger URL. This is the endpoint of the Ranger Admin that serves your authorization policies. Let us use http://myRangerAdmin.myOrg.com:6080 as the sample Ranger URL. Ensure the connectivity between the Presto coordinator instance and the Ranger Admin instance by opening up the necessary ports. In the sample URL, for example, port 6080 is used, so configure the network such that the Presto coordinator instance can communicate with port 6080 of the Ranger Admin instance.
  • Credentials. The credentials provide access to the Ranger Admin. They are used to communicate with Ranger Admin to fetch policies and user-group information. Let us use admin and password as sample credentials for the username and password, respectively.
  • Service Name. This is the name of the service in Ranger Admin that holds the policies you want to apply to your catalog.

These configurations, along with others, are provided in the configurations files as described in the following sections:

Ranger Plugin Configuration Files

You must configure these two files for Ranger: access-control.properties and the Ranger configuration file for catalogs. This section describes both these files in detail.

System-level Configuration File

System-level configurations are defined in the access-control.properties file. You must provide the following configurations in this file:

  • access-control.name=ranger-access-control: This configures Presto to use Ranger for authorization.

  • ranger.username and ranger.password: Credentials to communicate with Ranger Admin.

  • ranger.<catalog>.audit-config-xml: This is the location of the audit configuration file. Solr is the only audit store that is currently supported with Presto. An example of the ranger.<catalog>.audit-config-xml is provided below.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
         <property>
           <name>xasecure.audit.is.enabled</name>
           <value>true</value>
         </property>
         <property>
           <name>xasecure.audit.solr.is.enabled</name>
           <value>true</value>
         </property>
         <property>
           <name>xasecure.audit.solr.async.max.queue.size</name>
           <value>1</value>
         </property>
         <property>
           <name>xasecure.audit.solr.async.max.flush.interval.ms</name>
           <value>1000</value>
         </property>
         <property>
           <name>xasecure.audit.solr.solr_url</name>
           <value><Solr_ranger_audit_url></value>
         </property>
      </configuration>
    

    Note

    In the ranger.<catalog>.audit-config-xml file, xasecure.audit.solr.async.max.queue.size and xasecure.audit.solr.async.max.flush.interval.ms are optional configuration properties.

  • One or both of the following:

    • ranger.<Catalog>.security-config-xml: Location of the catalog-configuration XML file to use for a particular catalog.
    • ranger.<Catalog>.config-file: Location of the catalog-configuration file containing key-value pairs to use for a particular catalog’s Ranger configuration.

    When you provide both these configuration file, configuration from both files are used. In case of conflicting configuration keys, configuration from ranger.<Catalog>.security-config-xml gets preference over ranger.<Catalog>.config-file. The content of the file configured here are described in the next section, Ranger Configuration file(s) for Catalogs.

Optionally, you can also provide the following configuration:

  • ranger.user-group-cache-expiry-ttl: This governs the refresh interval in seconds of the Users-Groups cache, which is populated with information from Ranger. The default refresh interval is 30 seconds.

Ranger Configuration File(s) for Catalogs

This file defines Ranger configuration for the catalog. There must be one such file for each catalog for which you use Ranger. This is the file whose location is configured in the access-control.properties file as described above.

The file can either be a Hadoop configuration style XML or in key-value pairs of configName-configValue. As described in the System-level Configuration File, you can use either a security-config-xml or a config-file configuration to point to the location of the Ranger configuration file.

You must define the following in catalog configuration files. How they are defined depends on the file format you use, that is, either Hadoop configuration style XML or key-value pairs. Let us see examples of configuration in both styles in the next section:

  • ranger.plugin.<CatalogName>.service.name: Service Name in Ranger for the catalog.
  • ranger.plugin.<CatalogName>.policy.rest.url: Ranger endpoint from which to fetch policies.
  • ranger.service.store.rest.url: Endpoint to fetch Users-Groups information. This would be the same as the Ranger endpoint when using Ranger as the source of the User-Group information.

Additionally, you can provide the following:

  • ranger.plugin.<CatalogName>.policy.pollIntervalMs: Polling interval for policies. The default is 30000 (30 seconds).
  • Any other standard Ranger-client configuration.

Sample Ranger Plugin Configuration in QDS Presto for Hive Connector

Hive tables in QDS show up under the catalog name hive. Let us look at the Ranger configuration with hive as the catalog name. If you have configured the QDS Presto cluster to use a different catalog name, replace the name hive with your catalog name in the configuration described below.

Let us use the following sample Ranger configurations.

Sample XML-based Ranger Configuration File

The following sample values are shown in the screenshot that follows these values.

access-control.properties:
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.security-config-xml=/usr/lib/presto/etc/hive_ranger.xml

hive_ranger.xml:
<configuration>
    <property>
        <name>ranger.plugin.hive.service.name</name>
        <value>DepartmentBasedHiveAuth</value>
    </property>
    <property>
        <name>ranger.plugin.hive.policy.pollIntervalMs</name>
        <value>5000</value>
    </property>
    <property>
        <name>ranger.service.store.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
    <property>
        <name>ranger.plugin.hive.policy.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
</configuration>
../_images/ranger_config_overrides.png

The screenshot above is from the QDS configuration UI that shows the Presto configuration overrides. Through these configuration properties, Presto is configured to use ranger-access-control via access-control.properties. /usr/lib/presto/etc/hive_ranger.xml is configured as the XML-based Ranger configuration file for the hive catalog.

Sample Key-Value-based Ranger Configuration File

A second way to configure the Ranger plugin, as mentioned above, is using key-value pairs as shown in this section. The following sample values are shown in the screen shot that follows these values.

access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.config-file=/usr/lib/presto/etc/hive_ranger

hive_ranger:
ranger.plugin.hive.service.name=DepartmentBasedHiveAuth
ranger.plugin.hive.policy.pollIntervalMs=5000
ranger.service.store.rest.url=http://myRangerAdmin.myOrg.com:6080
ranger.plugin.hive.policy.rest.url=http://myRangerAdmin.myOrg.com:6080
../_images/ranger_config_with_key-value_pairs.png

Sample Ranger Plugin Configuration in QDS Presto for Mysql Connector

In this example, let us assume that Presto is configured with a MySQL catalog named mysqlcatalog, Connecting to MySQL and JDBC Sources using Presto Clusters describes how you can achieve this configuration. Although MySQL provides comprehensive authorization support but for this example, let us assume that Ranger is used for setting up authorization on this MySQL catalog. Given that Ranger does not have a MySQL plugin yet, you do not see a Mysql Service in Ranger. But you can create a Hive-Service to define policies for MySQL catalog with the limitation that you do not get the auto-complete feature for the fields while creating the policies.

Let us use the following sample Ranger configuration.

Sample XML-based Ranger Configuration File

The following sample values are shown in the screenshot that follows these values.

access-control.properties:
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.mysqlcatalog.security-config-xml=/usr/lib/presto/etc/mysqlcatalog_ranger.xml

mysqlcatalog_ranger.xml:
<configuration>
    <property>
        <name>ranger.plugin.mysqlcatalog.service.name</name>
        <value>DepartmentBasedMysqlAuth</value>
    </property>
    <property>
        <name>ranger.plugin.mysqlcatalog.policy.pollIntervalMs</name>
        <value>5000</value>
    </property>
    <property>
        <name>ranger.service.store.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
    <property>
        <name>ranger.plugin.mysqlcatalog.policy.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
</configuration>

You can also define the above configuration in the key-value pair-based configuration file as in the Hive connector example.

Limitations

These are the limitations:

  • Row filters and data masking policies with non ANSI SQL constructs are not supported and it leads to query failures. For example, strings are delimited by single quotes in Presto, while Hive uses both double quotes and single quotes for them. If existing Hive policies are used with Presto, admins should convert any double quotes to single quotes as supported by Presto.
  • Row filters and data masking policies with functions not defined in Presto are not supported and it leads to query failures. Qubole recommends that admins either define new policies for Presto using only functions supported by Presto or if they want to use existing Hive policies in Presto, they write the UDFS for the functions not defined in Presto.
  • Some SQL constructs have different semantics in Presto than in Hive. As a result, row filter policies defined for Hive might not give the same results when used in Presto. Admins can either use a new policy defined for Presto, or test the existing Hive policies thoroughly before using them.