SQL Authorization through Ranger in Presto
Presto on Qubole supports authorization using Apache Ranger. In addition to providing column-level authorization, Ranger provides additional features including row-level filtering and data masking. Ranger authorization works for all connectors in Presto on Qubole. For a connector that does not have a corresponding Ranger plugin in Ranger Admin, you can use an existing plugin such as Hive to define policies but with the limitation that auto-completion of fields in Ranger admin is not available for such connectors. For example, for MySQL connector of Presto, you can use Hive-Service in Ranger Admin and you can configure Presto to use that Hive-service for the given MySQL catalog.
Note
Currently, Apache Ranger is only supported from Presto version 0.208.
The following sections are:
Examples of SQL Authorization of Presto in Ranger provides examples that help you understand the SQL authorization.
Supported Ranger Features
The supported Ranger features are:
SQL Authorization
Presto on Qubole supports all forms of SQL Authorization with the Ranger plugin, which are: Schema level, Table level, and Column level. Supported
The supported Ranger policy constructs for authorization are: Allow-Conditions
, Deny-Conditions
, Resource-Inclusion
, and
Resource-Exclusion
. Presto also supports User Groups defined in Ranger.
Row-level Filtering
In Ranger, admins can set policies to hide certain rows from particular users or groups. Assume, for example,
an organization wants to prevent access by non-managers in the Finance department to certain details from the Salary
table (empId, designation, salary) for executive staff. In this case, the admin adds all non-managers in the
Finance department into a group, say finEmps
and adds a row filter on the Salary table with the filter defined
as designation != 'executive'
. The results of queries run by users in the finEmps
group does not include any
rows where the value of the designation column is executive
.
Click here to learn more about row-level filters in Ranger (see use case #2).
Data Masking Policies
For datasets that include sensitive information, admins can configure a policy with Mask Conditions to mask the sensitive data for a particular set of users. Ranger provides several options for masking data, including Redact, Hash, Nullify, Custom, and more. The Ranger plugin in QDS Presto, however, currently supports only the Custom masking option.
Click here to learn more about data masking in Ranger (see use case #3).
Configuring Presto to use Ranger for Authorization
Before you begin configuring your Ranger plugin, note down the following information:
Ranger URL. This is the endpoint of the Ranger Admin that serves your authorization policies. Let us use http://myRangerAdmin.myOrg.com:6080 as the sample Ranger URL. Ensure the connectivity between the Presto coordinator instance and the Ranger Admin instance by opening up the necessary ports. In the sample URL, for example, port 6080 is used, so configure the network such that the Presto coordinator instance can communicate with port 6080 of the Ranger Admin instance.
Credentials. The credentials provide access to the Ranger Admin. They are used to communicate with Ranger Admin to fetch policies and user-group information. Let us use admin and password as sample credentials for the username and password, respectively.
Service Name. This is the name of the service in Ranger Admin that holds the policies you want to apply to your catalog.
These configurations, along with others, are provided in the configurations files as described in the following sections:
Sample Ranger Plugin Configuration in QDS Presto for Hive Connector
Sample Ranger Plugin Configuration in QDS Presto for Mysql Connector
Ranger Plugin Configuration Files
You must configure these two files for Ranger: access-control.properties
and the Ranger configuration file for catalogs.
This section describes both these files in detail.
System-level Configuration File
System-level configurations are defined in the access-control.properties
file. You must provide the following configurations
in this file:
access-control.name=ranger-access-control
: This configures Presto to use Ranger for authorization.ranger.username
andranger.password
: Credentials to communicate with Ranger Admin.ranger.<catalog>.audit-config-xml
: This is the location of the audit configuration file. Solr is the only audit store that is currently supported with Presto. An example of theranger.<catalog>.audit-config-xml
is provided below.<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>xasecure.audit.is.enabled</name> <value>true</value> </property> <property> <name>xasecure.audit.solr.is.enabled</name> <value>true</value> </property> <property> <name>xasecure.audit.solr.async.max.queue.size</name> <value>1</value> </property> <property> <name>xasecure.audit.solr.async.max.flush.interval.ms</name> <value>1000</value> </property> <property> <name>xasecure.audit.solr.solr_url</name> <value><Solr_ranger_audit_url></value> </property> </configuration>
Note
In the
ranger.<catalog>.audit-config-xml
file,xasecure.audit.solr.async.max.queue.size
andxasecure.audit.solr.async.max.flush.interval.ms
are optional configuration properties.One or both of the following:
ranger.<Catalog>.security-config-xml
: Location of the catalog-configuration XML file to use for a particular catalog.ranger.<Catalog>.config-file
: Location of the catalog-configuration file containing key-value pairs to use for a particular catalog’s Ranger configuration.
When you provide both these configuration file, configuration from both files are used. In case of conflicting configuration keys, configuration from
ranger.<Catalog>.security-config-xml
gets preference overranger.<Catalog>.config-file
. The content of the file configured here are described in the next section, Ranger Configuration file(s) for Catalogs.
Optionally, you can also provide the following configuration:
ranger.user-group-cache-expiry-ttl
: This governs the refresh interval in seconds of the Users-Groups cache, which is populated with information from Ranger. The default refresh interval is 30 seconds.
Ranger Configuration File(s) for Catalogs
This file defines Ranger configuration for the catalog. There must be one such file for each catalog for which you use
Ranger. This is the file whose location is configured in the access-control.properties
file as described above.
The file can either be a Hadoop configuration style XML or in key-value pairs of configName-configValue
. As described in the
System-level Configuration File, you can use either a security-config-xml
or a config-file configuration to point
to the location of the Ranger configuration file.
You must define the following in catalog configuration files. How they are defined depends on the file format you use, that is, either Hadoop configuration style XML or key-value pairs. Let us see examples of configuration in both styles in the next section:
ranger.plugin.<CatalogName>.service.name
: Service Name in Ranger for the catalog.ranger.plugin.<CatalogName>.policy.rest.url
: Ranger endpoint from which to fetch policies.ranger.service.store.rest.url
: Endpoint to fetch Users-Groups information. This would be the same as the Ranger endpoint when using Ranger as the source of the User-Group information.
Additionally, you can provide the following:
ranger.plugin.<CatalogName>.policy.pollIntervalMs
: Polling interval for policies. The default is 30000 (30 seconds).Any other standard Ranger-client configuration.
Sample Ranger Plugin Configuration in QDS Presto for Hive Connector
Hive tables in QDS show up under the catalog name hive
. Let us look at the Ranger configuration with hive
as the
catalog name. If you have configured the QDS Presto cluster to use a different catalog name, replace the name hive
with your catalog name in the configuration described below.
Let us use the following sample Ranger configurations.
Ranger URL: http://myRangerAdmin.myOrg.com:6080
Ranger Credentials:
username:
admin
password:
password
Service Name for the Hive Catalog:
DepartmentBasedHiveAuth
Sample XML-based Ranger Configuration File
The following sample values are shown in the screenshot that follows these values.
access-control.properties:
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.security-config-xml=/usr/lib/presto/etc/hive_ranger.xml
hive_ranger.xml:
<configuration>
<property>
<name>ranger.plugin.hive.service.name</name>
<value>DepartmentBasedHiveAuth</value>
</property>
<property>
<name>ranger.plugin.hive.policy.pollIntervalMs</name>
<value>5000</value>
</property>
<property>
<name>ranger.service.store.rest.url</name>
<value>http://myRangerAdmin.myOrg.com:6080</value>
</property>
<property>
<name>ranger.plugin.hive.policy.rest.url</name>
<value>http://myRangerAdmin.myOrg.com:6080</value>
</property>
</configuration>
The screenshot above is from the QDS configuration UI that shows the Presto configuration overrides. Through these configuration properties,
Presto is configured to use ranger-access-control
via access-control.properties
. /usr/lib/presto/etc/hive_ranger.xml
is configured as the XML-based Ranger configuration file for the hive
catalog.
Sample Key-Value-based Ranger Configuration File
A second way to configure the Ranger plugin, as mentioned above, is using key-value pairs as shown in this section. The following sample values are shown in the screen shot that follows these values.
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.hive.config-file=/usr/lib/presto/etc/hive_ranger
hive_ranger:
ranger.plugin.hive.service.name=DepartmentBasedHiveAuth
ranger.plugin.hive.policy.pollIntervalMs=5000
ranger.service.store.rest.url=http://myRangerAdmin.myOrg.com:6080
ranger.plugin.hive.policy.rest.url=http://myRangerAdmin.myOrg.com:6080
Sample Ranger Plugin Configuration in QDS Presto for Mysql Connector
In this example, let us assume that Presto is configured with a MySQL catalog named mysqlcatalog
, Connecting to MySQL and JDBC Sources using Presto Clusters describes
how you can achieve this configuration. Although MySQL provides comprehensive authorization support but for this example,
let us assume that Ranger is used for setting up authorization on this MySQL catalog. Given that Ranger does not have a
MySQL plugin yet, you do not see a Mysql Service
in Ranger. But you can create a Hive-Service to define policies for
MySQL catalog with the limitation that you do not get the auto-complete feature for the fields while creating the policies.
Let us use the following sample Ranger configuration.
Ranger URL: http://myRangerAdmin.myOrg.com:6080
Ranger Credentials:
username:
admin
password:
password
Service Name for the Mysql Catalog:
DepartmentBasedMysqlAuth
Sample XML-based Ranger Configuration File
The following sample values are shown in the screenshot that follows these values.
access-control.properties:
access-control.name=ranger-access-control
ranger.username=admin
ranger.password=password
ranger.mysqlcatalog.security-config-xml=/usr/lib/presto/etc/mysqlcatalog_ranger.xml
mysqlcatalog_ranger.xml:
<configuration>
<property>
<name>ranger.plugin.mysqlcatalog.service.name</name>
<value>DepartmentBasedMysqlAuth</value>
</property>
<property>
<name>ranger.plugin.mysqlcatalog.policy.pollIntervalMs</name>
<value>5000</value>
</property>
<property>
<name>ranger.service.store.rest.url</name>
<value>http://myRangerAdmin.myOrg.com:6080</value>
</property>
<property>
<name>ranger.plugin.mysqlcatalog.policy.rest.url</name>
<value>http://myRangerAdmin.myOrg.com:6080</value>
</property>
</configuration>
You can also define the above configuration in the key-value pair-based configuration file as in the Hive connector example.
Limitations
These are the limitations:
Row filters and data masking policies with non ANSI SQL constructs are not supported and it leads to query failures. For example, strings are delimited by single quotes in Presto, while Hive uses both double quotes and single quotes for them. If existing Hive policies are used with Presto, admins should convert any double quotes to single quotes as supported by Presto.
Row filters and data masking policies with functions not defined in Presto are not supported and it leads to query failures. Qubole recommends that admins either define new policies for Presto using only functions supported by Presto or if they want to use existing Hive policies in Presto, they write the UDFS for the functions not defined in Presto.
Some SQL constructs have different semantics in Presto than in Hive. As a result, row filter policies defined for Hive might not give the same results when used in Presto. Admins can either use a new policy defined for Presto, or test the existing Hive policies thoroughly before using them.