Qubole-Talend Integration Guide (AWS)
Qubole Data Service (QDS) is a cloud-native autonomous data platform that removes the complexity and reduces the cost of managing Big Data, allowing the data team to focus on business outcomes rather than on managing infrastructure.
QDS manages itself and continuously analyzes and learns about the platform’s usage through a combination of heuristics and machine learning, providing insights and recommendations to optimize reliability, performance and cost. QDS automatically provisions, scales and manages Big Data clusters such as Hadoop, Spark, Presto, and Hive in any public cloud, and allows direct and secure access to Big Data services with user-access control through the QDS user interface (Analyzer and Notebooks), a REST API, and ODBC/JDBC interfaces.
Talend Cloud Big Data is a unified data management PaaS that encompasses data preparation, integration, data quality and data stewardship. Talend Cloud leverages Talend Studio’s rich graphical user interface to enable users to build data pipelines in a visual dashboard. Talend generates native Hive or Spark code which can be easily run on Qubole’s platform.
While companies get better at capturing more and more data, data teams are under intensifying pressure to make the data usable to the business and provide access to everyone who needs it. In fact, the exponential growth of data, data types, use cases and user expectations, combined with static IT budgets and a shortage of big data skills, makes the process of turning captured data into business value very expensive. As a result, companies do one of the following:
- Spend their time and money managing legacy technologies because they lack time, money or resources to leverage big data.
- Fail to deploy big data projects because managing big data technologies is complicated, costly, and requires expertise that is hard to find.
To address this issue, Talend and Qubole partnered to help companies close the gap between IT capabilities and users’ data demands.
The Qubole-Talend integration provides a simple and intuitive data integration and preparation solution (ETL) in the cloud, at a fraction of the cost and resources of traditional systems.
The integration enables users (data engineers) to create data jobs and pipelines using Talend and automatically execute them at scale on Qubole’s platform.
This integration leverages Qubole’s workload-aware auto-scaling feature, which automatically resizes the big data clusters based on the data jobs and pipelines built in Talend Studio.
Users do not have to spend time managing their big data infrastructure or writing complex MapReduce or Spark code for their data processing. Instead, they use the Talend Studio visual interface to create data jobs and pipelines that are executed on Qubole’s platform, thus using the cloud optimally in terms of cost and scalability.
The Qubole-Talend integration has the following benefits:
- Reduced data processing cost by 40%-70% compared to on-premises solutions.
- Qubole automatically manages and scales big data engines, and leverages the AWS spot market to find the best price-performance ratio when executing Talend data pipelines.
- Simplified data processing through the power of the cloud.
- No additional administration overhead for installing, configuring and maintaining Spark or Hadoop.
- Increased productivity by eliminating reliance on IT for data preparation in Hadoop and Spark.
- Enables self-service data preparation for non-technical users, and eliminates the time spent writing complex MapReduce or Spark code.
This guide provides information about the integration, architecture, the relevant configuration, and sample use cases.
- Understanding the Talend Integration with QDS
- Workflow for using the Qubole-Talend integration
- Understanding Requirements Related to the Integration
- Configuring Qubole to Interact with Talend
- Configuring Talend to Interact with QDS
- Sample Use Case for Creating a Job
- Additional Information about the Integration
- Known Issues