SQL Authorization through Ranger in Presto

Introduction

QDS Presto supports authorization using Apache Ranger. In addition to providing column-level authorization, Ranger provides additional features, including row-level filtering and data masking. Ranger authorization works for the Hive connector in QDS Presto. This section describes the Ranger features supported in QDS Presto and how to configure Ranger with Presto.

Note

Currently, Apache Ranger is only supported with Presto version 0.208.

Supported Ranger features

Hive Authorization

All forms of Hive Authorization are supported with the Ranger plugin: Schema level, Table level, and Column level.

All Ranger policy constructs for authorization are also supported: Allow-Conditions, Deny-Conditions, Resource-Inclusion, and Resource-Exclusion. User Groups defined in Ranger are also supported.

Row-level Filtering

In Ranger, admins can set policies to hide certain rows from particular users or groups. Assume, for example, that an organization wants to prevent access by non-managers in the Finance department to certain details from the Salary table (empId, designation, salary) for executive staff. In this case, the admin adds all non-managers in the Finance department into a group, say finEmps, and adds a row filter on the table Salary with the filter defined as designation != 'executive'. The results of queries run by users in the finEmps group will not include any rows where the value of the designation column is executive.

Click here to learn more about row-level filters in Ranger (see use case #2).

Data Masking Policies

For datasets that include sensitive information, admins can configure a policy with Mask Conditions to mask the sensitive data for a particular set of users. Ranger provides several options for masking data, including Redact, Hash, Nullify, Custom, and more. The Ranger plugin in QDS Presto, however, currently supports only the Custom masking option. Support for other masking policies will be available in future releases.

../_images/ranger_masking_options.png

Click here to learn more about data masking in Ranger (see use case #3).

Configuring Presto to use Ranger for Authorization

Before you begin configuring your Ranger plugin, note down the following information:

  1. Ranger URL. This is the endpoint of the Ranger Admin that serves your authorization policies. We will use http://myRangerAdmin.myOrg.com:6080 as our sample Ranger URL. Connectivity between the Presto master instance and the Ranger Admin instance should be ensured by opening up the necessary ports. In the sample URL, for example, port 6080 is used, so the network should be configured such that the Presto master instance can communicate with port 6080 of the Ranger Admin instance.
  2. Credentials. The credentials provide access to the Ranger Admin. They are used to communicate with Ranger Admin to fetch policies and user-group information. We will use admin and password as sample credentials for the username and password, respectively.
  3. Service Name. This is the name of the service in Ranger Admin that holds the policies you want to apply to your catalog. We will use DepartmentBasedHiveAuth as our sample service name.

These configurations, along with others, are provided in the configurations files as described in the following sections.

Ranger Plugin Configuration Files

Two files must be configured for Ranger: access-control.properties and the Ranger configuration file for catalogs. This section describes both these files in detail.

System Level configuration file

System level configurations are defined in the access-control.properties file. The following configurations must be provided in this file:

  • access-control.name=ranger-access-control: This configures Presto to use Ranger for authorization

  • ranger.username and ranger.password: Credentials to communicate with Ranger Admin

  • One or both of the following:

    • ranger.<Catalog>.security-config-xml: Location of the catalog-configuration XML file to use for particular catalog.
    • ranger.<Catalog>.config-file: Location of the catalog-configuration file containing key-value pairs to use for particular catalog’s Ranger configuration.

    When both these configs are provided, configs from both are used and, in case of conflicting config keys, configs from ranger.<Catalog>.security-config-xml are given preference over ranger.<Catalog>.config-file. The contents of the file configured here are described in the next section, “Ranger Configuration file(s) for Catalogs.”

Optionally, the following config can also be provided:

  • ranger.user-group-cache-expiry-ttl: This governs the refresh interval in seconds of the Users-Groups cache, which is populated with information from Ranger. The default refresh interval is 30 seconds.

Ranger Configuration file(s) for Catalogs

This file defines Ranger configurations for the catalog. There must be one such file for each catalog for which you use Ranger. This is the file whose location is configured in the access-control.properties file, as described above.

The file can either be a Hadoop configuration style XML file from your existing Hive deployment, or in key-value pairs of configName-configValue. As described in the section System Level configuration file above, either a security-config-xml or a config-file config can be used to point to the location of the Ranger configuration file.

The following must be defined in catalog configuration files. How they are defined depends on the file format you use, that is, either Hadoop configuration style XML or key-value pairs. We will see examples of configuration in both syles in the next section:

  • ranger.plugin.<CatalogName>.service.name: Service Name in Ranger for the catalog.
  • ranger.plugin.<CatalogName>.policy.rest.url: Ranger endpoint from which to fetch policies.
  • ranger.service.store.rest.url: Endpoint to fetch Users-Groups information. This would be the same as the Ranger endpoint when using Ranger as the source of User-Group information.

Additionally, the following can be provided:

  • ranger.plugin.<CatalogName>.policy.pollIntervalMs: Polling interval for policies. The default is 30000 (30 seconds).
  • Any other Ranger client configs that have been used with your Hive installation.

Sample Ranger Plugin Configurations in QDS Presto

Hive tables in QDS show up under the catalog name “hive.” We will look at the Ranger configuration with “hive” as the catalog name. If you have configured your QDS Presto cluster to use a different catalog name, replace the name “hive” with your catalog name in the configs described below.

We will use the following sample Ranger configurations:

  1. Ranger URL: http://myRangerAdmin.myOrg.com:6080
  2. Ranger Credentials:
    • username: admin
    • password: password
  3. Service Name for the Hive Catalog: DepartmentBasedHiveAuth

Sample XML based Ranger Configuration file

The following sample values are shown in the screen shot that follows:

access-control.properties:
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.security-config-xml=/usr/lib/presto/etc/hive_ranger.xml

hive_ranger.xml:
<configuration>
    <property>
        <name>ranger.plugin.hive.service.name</name>
        <value>DepartmentBasedHiveAuth</value>
    </property>
    <property>
        <name>ranger.plugin.hive.policy.pollIntervalMs</name>
        <value>5000</value>
    </property>
    <property>
        <name>ranger.service.store.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
    <property>
        <name>ranger.plugin.hive.policy.rest.url</name>
        <value>http://myRangerAdmin.myOrg.com:6080</value>
    </property>
</configuration>
../_images/ranger_config_overrides.png

The screenshot above is from the QDS Cluster Edit page, and shows the Presto configuration overrides. Through these configs, we have configured Presto to use ranger-access-control via access-control.properties. /usr/lib/presto/etc/hive_ranger.xml is configured as the xml based Ranger configuration file for the hive catalog.

Sample Key-Value based Ranger Configuration File

A second way to configure the Ranger plugin, as mentioned above, is using key-value pairs, as shown in this section. The following sample values are shown in the screen shot that follows:

access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.config-file=/usr/lib/presto/etc/hive_ranger

hive_ranger:
ranger.plugin.hive.service.name=DepartmentBasedHiveAuth
ranger.plugin.hive.policy.pollIntervalMs=5000
ranger.service.store.rest.url=http://myRangerAdmin.myOrg.com:6080
ranger.plugin.hive.policy.rest.url=http://myRangerAdmin.myOrg.com:6080
../_images/ranger_config_with_key-value_pairs.png

Limitations

  • HTTPS mode of Ranger Admin is not yet supported. This will be supported in a future release.
  • Row filters and data masking policies with non ANSI SQL constructs are not supported, and will lead to query failures. For example, Strings are delimited by single quotes in Presto, while Hive uses both double quotes and single quotes for them. If existing Hive policies are used with Presto, admins should convert any double quotes to single quotes, as supported by Presto.
  • Row filters and data masking policies with functions not defined in Presto are not supported, and will lead to query failures. It is recommended that admins either define new policies for Presto using only functions supported by Presto or, if they want to use existing Hive policies in Presto, they write the UDFS for the functions not defined in Presto.
  • Some SQL constructs have different semantics in Presto than in Hive. As a result, row filter policies defined for Hive might not give the same results when used in Presto. Admins are advised to either use a new policy defined for Presto, or test the existing Hive policies thoroughly before using them.

Examples

We will walk through several examples of the Ranger plugin capabilities in this section.

Users and Groups

First lets see the users and groups configuration. We have two groups defined:

  1. Admins
  2. Analysts

There are three users who belong to one of these groups:

  1. Sakshi: username sakshia and belongs to Admins group
  2. Shubham: username stagra and belongs to Analyst group
  3. Sumit: username sumitm and belongs to Analyst group

Here are the above users as they appear in the Ranger Users/Groups page:

../_images/01_Users_and_Groups.png

Table Access Example

We add a policy in Ranger to allow access to the customer table to only stagra:

../_images/02_customer_access_to_shubham.png

With this, Shubham should be able to run any queries on the customer table:

../_images/03_Select_customer_by_shubham_works.png

But any other user, say Sumit, should be unable to access this table:

../_images/04_Select_customer_by_sumit_fails.png

As seen above, the query by sumitm@company.com fails with an AccessDeniedException.

Column Level Authorization

Next, lets configure a policy the the table nation. We give user Shubham full access to this table, and exclude user Sumit from access to the table:

../_images/05a_nation_table_access_to_shubham_not_sumit.png

For user Sumit, we provide access to all but n_regionkey column in this table. Notice the exclusion of this column in the policy below:

../_images/05b_nation_table_access_to_sumit_excluding_n_regionkey.png

With this setup, user Shubham should be able to access all columns of the table nation:

../_images/06_select_nation_by_shubham_works.png

But trying to select n_regionkey by user Sumit should throw an access denied exception:

../_images/07a_select_all_on_nation_by_sumit_fails.png

But, since we have allowed access to other columns of this table for user Sumit, queries on those columns work fine:

../_images/07b_select_n_name_on_nation_by_sumit_works.png

Row Filters

Next, we setup a Row Filter Policy to hide the row with n_name as INDIA from user Shubham:

../_images/08_row_filter_on_nation_hide_INDIA_row_for_shubham.png

With this policy, the user Shubham does not see any rows where the column n_name has the value INDIA:

../_images/09a_nothing_returned_for_INDIA_for_shubham.png

But since the policy does not filter any rows for user Sumit, he should be able to see the rows with n_name having value INDIA:

../_images/09b_row_returned_for_INDIA_for_sumit.png

Data Masking

To demonstrate Data Masking capability, we define a policy for user Shubham to see the n_comment values in table nation truncated to only the first five characters:

../_images/10_Data_mask_on_nation_to_show_only_5_characters_in_comment.png

Selecting n_comment by user Shubham returns the truncated data as follows:

../_images/11_Data_mask_in_action.png

Groups-based Policy

Finally, we will see an example where we set up policy on a group rather than at the user level. In this example we have given full access to the table promotion to the Admins group. The rest of the users do not have access to it:

../_images/12_policy_to_give_access_to_promotion_to_admins.png

This means that only user Sakshi, who is a part of the Admins group, should be able to access this table:

../_images/15_select_on_promotion_by_sakshi_passes.png

User Shubham should not be able to access the promotion table:

../_images/13_select_on_promotion_by_shubham_fail.png

User Sumit should also be unable to access the promotion table:

../_images/14_select_on_promotion_by_sumit_fail.png