Using StreamX (AWS)
StreamX captures data from Kafka logs and preserves it in a cloud store (currently Amazon S3), where engines such as Hive or Spark can process it.
Why Use StreamX?
StreamX is a managed, scalable, and reliable open-source service, built on the Kafka Connect framework and running in a dedicated Qubole Data Service (QDS) cluster. It provides ready access to usable streaming ingest with minimal configuration or maintenance.
Major features include:
- Output in Avro or Parquet
- Output can be partitioned into multiple topics, and written to multiple paths in the cloud store
How to Use StreamX
Proceed as follows to configure and start StreamX:
- If you have not already done so, install Kafka and start a Kafka cluster.
- Use these instructions to create a persistent AWS security group, and open port 9092 in the security configuration of your Kafka cluster to allow access by members of that group.
- In the QDS UI, navigate to Clusters and choose New.
- Choose StreamX as the cluster type.
- Accept the default Kafka version or use the drop-down to change it.
- Specify the Kafka brokers as a comma-separated list of DNS names, each in the form <fully-qualified-domain-name:port>.
- Choose the instance types for master and slave nodes from the drop-downs.
- Specify the number of nodes in the cluster. (StreamX clusters do not support auto-scaling.)
- Assuming that your Kafka cluster is running on AWS, select the same AWS Region and Availability Zone for the QDS cluster as for the Kafka cluster.
- Optionally specify the number, type, and size of EBS reserved volumes to be mounted to each instance as additional storage.
- Provide a file name to be appended to the path for the node bootstrap file, or accept the default. Click Next to proceed.
- If your Kafka cluster is running in an AWS VPC, specify the same VPC for the QDS cluster.
- Optionally specify parameter values to override the Kafka Connect configuration defaults (the UI provides an example).
- Provide the name of the persistent security group you created earlier (the group that is allowed to reach port 9092 on your Kafka cluster).
- When you are satisfied with the configuration, click Create. (For more information on optional fields, see Configuring Clusters.)
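Before entering the broker list in the UI, it can help to check that it is well formed and that each broker accepts connections on its port. The following is a minimal sketch, not part of StreamX or QDS: the function names `parse_brokers` and `is_reachable` and the host names in the usage note are hypothetical, and the script uses only the Python standard library.

```python
import socket


def parse_brokers(broker_list):
    """Split 'host1:9092,host2:9092' into [(host, port), ...].

    Raises ValueError if an entry is not in <fully-qualified-domain-name:port> form.
    """
    brokers = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        # rpartition splits on the last colon, separating the host from the port
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(
                f"expected <fully-qualified-domain-name:port>, got {entry!r}"
            )
        port_number = int(port)
        if not 0 < port_number < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        brokers.append((host, port_number))
    return brokers


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

For example, `parse_brokers("kafka1.example.com:9092,kafka2.example.com:9092")` returns a list of `(host, port)` pairs that can each be passed to `is_reachable`. A successful TCP connection only shows that the port is open to the machine running the check, so the result is most meaningful when the script runs from an instance that belongs to the persistent security group.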