SQL Authorization through Ranger in Presto¶
QDS Presto supports authorization using Apache Ranger. In addition to providing column-level authorization, Ranger provides additional features, including row-level filtering and data masking. Ranger authorization works for the Hive connector in QDS Presto. This section describes the Ranger features supported in QDS Presto and how to configure Ranger with Presto.
Supported Ranger features¶
All forms of Hive Authorization are supported with the Ranger plugin: Schema level, Table level, and Column level.
All Ranger policy constructs for authorization are also supported: Allow-Conditions, Deny-Conditions, Resource-Inclusion, and Resource-Exclusion. User Groups defined in Ranger are also supported.
In Ranger, admins can set policies to hide certain rows from particular users or groups. Assume, for example, that an organization wants to prevent access by non-managers in the Finance department to certain details from the Salary table (empId, designation, salary) for executive staff. In this case, the admin adds all non-managers in the Finance department into a group, say
finEmps, and adds a row filter on the table Salary with the filter defined as
designation != 'executive'. The results of queries run by users in the
finEmps group will not include any rows where the value of the designation column is
Data Masking Policies¶
For datasets that include sensitive information, admins can configure a policy with Mask Conditions to mask the sensitive data for a particular set of users. Ranger provides several options for masking data, including Redact, Hash, Nullify, Custom, and more. The Ranger plugin in QDS Presto, however, currently supports only the Custom masking option. Support for other masking policies will be available in future releases.
Configuring Presto to use Ranger for Authorization¶
Before you begin configuring your Ranger plugin, note down the following information:
- Ranger URL. This is the endpoint of the Ranger Admin that serves your authorization policies. We will use http://myRangerAdmin.myOrg.com:6080 as our sample Ranger URL. Connectivity between the Presto master instance and the Ranger Admin instance should be ensured by opening up the necessary ports. In the sample URL, for example, port 6080 is used, so the network should be configured such that the Presto master instance can communicate with port 6080 of the Ranger Admin instance.
- Credentials. The credentials provide access to the Ranger Admin. They are used to communicate with Ranger Admin to fetch policies and user-group information. We will use admin and password as sample credentials for the username and password, respectively.
- Service Name. This is the name of the service in Ranger Admin that holds the policies you want to apply to your catalog. We will use DepartmentBasedHiveAuth as our sample service name.
These configurations, along with others, are provided in the configurations files as described in the following sections.
Ranger Plugin Configuration Files¶
Two files must be configured for Ranger:
access-control.properties and the Ranger configuration file for catalogs. This section describes both these files in detail.
System Level configuration file¶
System level configurations are defined in the
access-control.properties file. The following configurations must be provided in this file:
access-control.name=ranger-access-control: This configures Presto to use Ranger for authorization
ranger.password: Credentials to communicate with Ranger Admin
One or both of the following:
ranger.<Catalog>.security-config-xml: Location of the catalog-configuration XML file to use for particular catalog.
ranger.<Catalog>.config-file: Location of the catalog-configuration file containing key-value pairs to use for particular catalog’s Ranger configuration.
When both these configs are provided, configs from both are used and, in case of conflicting config keys, configs from
ranger.<Catalog>.security-config-xmlare given preference over
ranger.<Catalog>.config-file. The contents of the file configured here are described in the next section, “Ranger Configuration file(s) for Catalogs.”
Optionally, the following config can also be provided:
ranger.user-group-cache-expiry-ttl: This governs the refresh interval in seconds of the Users-Groups cache, which is populated with information from Ranger. The default refresh interval is 30 seconds.
Ranger Configuration file(s) for Catalogs¶
This file defines Ranger configurations for the catalog. There must be one such file for each catalog for which you use Ranger. This is the file whose location is configured in the
access-control.properties file, as described above.
The file can either be a Hadoop configuration style XML file from your existing Hive deployment, or in key-value pairs of configName-configValue. As described in the section System Level configuration file above, either a security-config-xml or a config-file config can be used to point to the location of the Ranger configuration file.
The following must be defined in catalog configuration files. How they are defined depends on the file format you use, that is, either Hadoop configuration style XML or key-value pairs. We will see examples of configuration in both syles in the next section:
ranger.plugin.<CatalogName>.service.name: Service Name in Ranger for the catalog.
ranger.plugin.<CatalogName>.policy.rest.url: Ranger endpoint from which to fetch policies.
ranger.service.store.rest.url: Endpoint to fetch Users-Groups information. This would be the same as the Ranger endpoint when using Ranger as the source of User-Group information.
Additionally, the following can be provided:
ranger.plugin.<CatalogName>.policy.pollIntervalMs: Polling interval for policies. The default is 30000 (30 seconds).
- Any other Ranger client configs that have been used with your Hive installation.
Sample Ranger Plugin Configurations in QDS Presto¶
Hive tables in QDS show up under the catalog name “hive.” We will look at the Ranger configuration with “hive” as the catalog name. If you have configured your QDS Presto cluster to use a different catalog name, replace the name “hive” with your catalog name in the configs described below.
We will use the following sample Ranger configurations:
- Ranger URL: http://myRangerAdmin.myOrg.com:6080
- Ranger Credentials:
- Service Name for the Hive Catalog:
Sample XML based Ranger Configuration file¶
The following sample values are shown in the screen shot that follows:
access-control.properties: access-control.name=ranger-access-control ranger.username=admin ranger.password=password ranger.hive.security-config-xml=/usr/lib/presto/etc/hive_ranger.xml hive_ranger.xml: <configuration> <property> <name>ranger.plugin.hive.service.name</name> <value>DepartmentBasedHiveAuth</value> </property> <property> <name>ranger.plugin.hive.policy.pollIntervalMs</name> <value>5000</value> </property> <property> <name>ranger.service.store.rest.url</name> <value>http://myRangerAdmin.myOrg.com:6080</value> </property> <property> <name>ranger.plugin.hive.policy.rest.url</name> <value>http://myRangerAdmin.myOrg.com:6080</value> </property> </configuration>
The screenshot above is from the QDS Cluster Edit page, and shows the Presto configuration overrides. Through these configs, we have configured Presto to use
/usr/lib/presto/etc/hive_ranger.xml is configured as the xml based Ranger configuration file for the
Sample Key-Value based Ranger Configuration File¶
A second way to configure the Ranger plugin, as mentioned above, is using key-value pairs, as shown in this section. The following sample values are shown in the screen shot that follows:
access-control.name=ranger-access-control ranger.username=admin ranger.password=password ranger.hive.config-file=/usr/lib/presto/etc/hive_ranger hive_ranger: ranger.plugin.hive.service.name=DepartmentBasedHiveAuth ranger.plugin.hive.policy.pollIntervalMs=5000 ranger.service.store.rest.url=http://myRangerAdmin.myOrg.com:6080 ranger.plugin.hive.policy.rest.url=http://myRangerAdmin.myOrg.com:6080
- HTTPS mode of Ranger Admin is not yet supported. This will be supported in a future release.
- Row filters and data masking policies with non ANSI SQL constructs are not supported, and will lead to query failures. For example, Strings are delimited by single quotes in Presto, while Hive uses both double quotes and single quotes for them. If existing Hive policies are used with Presto, admins should convert any double quotes to single quotes, as supported by Presto.
- Row filters and data masking policies with functions not defined in Presto are not supported, and will lead to query failures. It is recommended that admins either define new policies for Presto using only functions supported by Presto or, if they want to use existing Hive policies in Presto, they write the UDFS for the functions not defined in Presto.
- Some SQL constructs have different semantics in Presto than in Hive. As a result, row filter policies defined for Hive might not give the same results when used in Presto. Admins are advised to either use a new policy defined for Presto, or test the existing Hive policies thoroughly before using them.
We will walk through several examples of the Ranger plugin capabilities in this section.
Users and Groups¶
First lets see the users and groups configuration. We have two groups defined:
There are three users who belong to one of these groups:
- Sakshi: username sakshia and belongs to Admins group
- Shubham: username stagra and belongs to Analyst group
- Sumit: username sumitm and belongs to Analyst group
Here are the above users as they appear in the Ranger Users/Groups page:
Table Access Example¶
We add a policy in Ranger to allow access to the customer table to only stagra:
With this, Shubham should be able to run any queries on the customer table:
But any other user, say Sumit, should be unable to access this table:
As seen above, the query by email@example.com fails with an AccessDeniedException.
Column Level Authorization¶
Next, lets configure a policy the the table nation. We give user Shubham full access to this table, and exclude user Sumit from access to the table:
For user Sumit, we provide access to all but n_regionkey column in this table. Notice the exclusion of this column in the policy below:
With this setup, user Shubham should be able to access all columns of the table nation:
But trying to select n_regionkey by user Sumit should throw an access denied exception:
But, since we have allowed access to other columns of this table for user Sumit, queries on those columns work fine:
Next, we setup a Row Filter Policy to hide the row with n_name as INDIA from user Shubham:
With this policy, the user Shubham does not see any rows where the column n_name has the value INDIA:
But since the policy does not filter any rows for user Sumit, he should be able to see the rows with n_name having value INDIA:
To demonstrate Data Masking capability, we define a policy for user Shubham to see the n_comment values in table nation truncated to only the first five characters:
Selecting n_comment by user Shubham returns the truncated data as follows:
Finally, we will see an example where we set up policy on a group rather than at the user level. In this example we have given full access to the table promotion to the Admins group. The rest of the users do not have access to it:
This means that only user Sakshi, who is a part of the Admins group, should be able to access this table:
User Shubham should not be able to access the promotion table:
User Sumit should also be unable to access the promotion table: