Understanding a Data Store

It is often useful to import or export data to and from data stores other than Cloud storage. For example, you may want to run a periodic Hive query to summarize a dataset and export the results to a MySQL database; or you may want to import data from a relational database into Hive. You can identify an external data store for such purposes. QDS supports the data stores shown on the Database Type drop-down list on the Add Data Store page (see below).

Qubole is actively working on supporting more databases. For more information about using a data store, see the Data Import and Data Export sections.

DbTap API describes the APIs to create, edit, view, and delete data stores.

Adding a Data Store

Note

You must be a system administrator to add a data store.

To add a new data store, navigate to the Explore page and proceed as follows:

  1. Pull down the drop-down list near the top left of the page (it defaults to Qubole Hive) and choose Add Data Store.
  2. Enter the Data Store Name field.
  3. Choose a Database Type.
  4. Enter the Catalog Name. (This field is only visible when the data store needs to be accessible through Presto and Spark clusters, or when you select Snowflake from the Database Type drop-down list.)

Note

This field is disabled by default for Database Types other than Snowflake. Contact Qubole Support to enable this feature for your account.

  1. Enter the Database Name.

  2. Enter the host address of the database server in the Host Address text field.

  3. Enter the port number in the Port text field or accept the default.

  4. Enter the username (to be used on the host) in the Username text field.

  5. Enter the password (for the username on the host) in the Password text field.

  6. Check the Skip validation check box if you do not want QDS to validate the connection immediately.

  7. Check the Use Bastion Node check box if the data store to be created will be in a VNet, and provide the following additional information to establish a connection from QDS to datastore:

    • The Bastion node’s IP address.
    • The port on which the Bastion node can be reached.
    • The username which QDS should use to log in to the Bastion node.
    • The private key for access to the Bastion node.

Note

Cluster nodes must have direct access to the datastore.

  1. Click Save to add the data store.

    Unless you checked Skip validation, QDS attempts to connect to the database. If QDS can connect to the database, the data store is marked activated, and you should see it on list of data stores in the drop-down list on at the top left of the Explore page. A green dot shows that a data store has been activated; a red dot means that it has not.

Editing a Data Store

You can edit a data store name and give it a name if it does not already have one. To edit a data store:

  1. Navigate to the Explore page and pull down the drop-down list that defaults to Qubole Hive. Select the data store that you want to edit.
  2. Click the gear icon near the top right and choose Edit.
  3. Make edits as needed and click Update to save the new values. (Click Reset to Default to revert to default values, or Cancel to retain the current values).

Firewall Configuration for QDS access to Database (AWS)

QDS supports the following AWS regions:

  • us-east-1
  • us-west-2
  • eu-west-1
  • ap-singapore-1
  1. If the database is an RDS instance, then the database security group to which the RDS instance belongs must grant access privileges to the QDS account/default security group pair. See RDS FAQ for reference.
  2. If the database is running on an EC2 machine, then the security group to which the EC2 instance belongs, must grant access to QDS account/default security group pair. See EC2 User Guide for reference.

If the database is not located in the AWS regions supported by Qubole or is located outside AWS, then the database must have its network ports open to everyone and use username/password based security to grant access to Qubole. When the database is not in any AWS region supported by Qubole, select on-premise as the database location.

SSL support for QDS access to Postgres Database

You can use SSL based access to the Postgres database if you configure the database to use SSL. While using SSL, QDS expects you to open database ports to everyone as SSL takes care of the QDS security. If the Postgres database is in us-east-1, you can continue with opening database ports to only QDS.