Data Store

It is often useful to import or export data to and from data stores other than Cloud storage. For example, you may want to run a periodic Hive query to summarize a dataset and export the results to a MySQL database; or you may want to import data from a relational database into Hive. You can identify an external data store for such purposes. QDS supports the data stores shown on the Database Type drop-down list on the Add Data Store page (see below) including:

  • Azure SQL Database
  • Azure SQL Data Warehouse
  • Microsoft SQL Server
  • MongoDB
  • MySQL
  • Oracle
  • Postgres
  • Redshift
  • Snowflake
  • Vertica

Qubole is actively working on supporting more databases. For more information about using a data store, see Data Import and Data Export sections.

DbTap API describes the APIs to create, edit, view, and delete data stores.

Adding a Data Store

Note

You must be a system administrator to add a data store.

To add a new data store, navigate to the Explore page and proceed as follows:

  1. Pull down the drop-down list near the top left of the page (it defaults to Qubole Hive) and choose Add Data Store.

  2. Choose a Database Type.

  3. Enter the Database Name.

  4. Enter the host address of the database server in the Host Address text field.

  5. Enter the port number in the Port text field or accept the default.

  6. Enter the username (to be used on the host) in the Username text field.

  7. Enter the password (for the username on the host) in the Password text field.

  8. Check the Skip validation box if you don’t want QDS to validate the connection immediately.

  9. Depending on your Cloud platform:

    • For AWS, select the Region; or on-premise/other. See Firewall Configuration for QDS access to Database for more information. Select on-premise/other if the region is not on the pull-down list of supported regions, or if the database is in an AWS VPC. In this case, provide the following additional information:

      • Enter the IP address or hostname of the gateway server.
      • Enter the user name to be used on the gateway server.
      • Add the private key to be used for access to the gateway server.

      In this case, QDS supports account-level custom SSH keys in the data store. To add the custom SSH keys, create a ticket with Qubole Support.

    • For Azure, select Use Bastion Node if the data store to be created will be in a VNet, and provide the following additional information:

      • The Bastion node’s IP address.
      • The port on which the Bastion node can be reached.
      • The username which QDS should use to log in to the Bastion node.
      • The private key for access to the Bastion node.
  10. Click Save to add the data store.

    Unless you checked Skip validation, QDS attempts to connect to the database. If QDS can connect to the database, the data store is marked activated, and you should see it on list of data stores in the drop-down list on at the top left of the Explore page. A green dot shows that a data store has been activated; a red dot means that it has has not.

Editing a Data Store

You can edit a data store name and give it a name if it does not already have one. To edit a data store:

  1. Navigate to the Explore page and pull down the drop-down list that defaults to Qubole Hive. Select the data store that you want to edit.
  2. Click the gear icon near the top right and choose Edit.
  3. Make edits as needed and click Update to save the new values. (Click Reset to Default to revert to default values, or Cancel to retain the current values).

Firewall Configuration for QDS access to Database (AWS)

QDS supports the following AWS regions:

  • us-east-1
  • us-west-2
  • eu-west-1
  • ap-singapore-1
  1. If the database is a RDS instance, then the database security group to which the RDS instance belongs must grant access privileges to the QDS account/default security group pair. See RDS FAQ for reference.
  2. If the database is running on an EC2 machine, then the security group to which the EC2 instance belongs, must grant access to QDS account/default security group pair. See EC2 User Guide for reference.

If the database is not located in the AWS regions supported by Qubole or is located outside AWS, then the database must have its network ports open to everyone and use username/password based security to grant access to Qubole. When database is not in any AWS region supported by Qubole, then select on-premise as the database location.

SSL support for QDS access to Postgres Database

You can use SSL based access to the Postgres database if you configure the database to use SSL. While using SSL, QDS expects you to open database ports to everyone as SSL takes care of the QDS security. If the Postgres database is in us-east-1, you can continue with opening database ports to only QDS.