Talend 7.1

Introduction

Qubole is a cloud-native platform for self-service AI, machine learning, and analytics. It removes the complexity and reduces the cost of managing data, allowing data teams to focus on business outcomes rather than infrastructure management.

Qubole analyzes and learns from usage patterns through a combination of heuristics and machine learning to automate platform management. It provides insights and recommendations that improve the performance, reduce the cost, and increase the reliability of Big Data workloads. Qubole Data Service (QDS) automatically provisions, scales, and manages Big Data clusters running engines such as Hadoop, Spark, Presto, and Hive in the cloud of your choice. It also allows direct and secure access to Big Data services, with user-access control, through the QDS user interface (Analyzer and Notebooks), a REST API, and ODBC/JDBC interfaces.

Talend Cloud Big Data is a unified data management PaaS that encompasses data preparation, integration, data quality, and data stewardship. Talend Cloud leverages Talend Studio’s rich graphical user interface, which lets users build data pipelines visually. Talend generates native Hive or Spark code that can be run in Qubole.

While companies get better at capturing more and more data, data teams are under intensifying pressure to make that data usable for the business and to provide access to everyone who needs it. The exponential growth of data, data types, use cases, and user expectations, combined with static IT budgets and a shortage of Big Data skills, makes the process of turning captured data into business value very expensive. As a result, companies get trapped in one of the following scenarios:

  • Spend their time and money managing or updating legacy technologies that can’t keep up with the business’s demand for data.
  • Fail to deploy Big Data projects as managing Big Data technologies is complicated, costly, and requires expertise that is hard to find.

To address these issues, Talend and Qubole have partnered to help companies close the gap between IT capabilities and users’ data demands. The Qubole-Talend integration provides simple and intuitive data integration and preparation in the cloud, at a fraction of the cost and resources of traditional systems. The integration leverages Qubole’s workload-aware autoscaling, which automatically adjusts the size of the Big Data clusters according to the data jobs and pipelines built in Talend Studio. Companies don’t need to spend time managing their Big Data infrastructure or writing complex MapReduce/Spark code to prepare their data for business consumption. Instead, they can use Talend Studio’s visual interface to create data jobs and pipelines that are executed in Qubole, taking full advantage of the cloud’s cost efficiency and scalability.

What’s New in the Integration with Talend 7.1?

The integration with Talend 7.1 provides a serverless approach to executing Talend workflows. With Talend 7.1, customers don’t need to worry about having a cluster up and running before executing a Talend workflow, or about scaling the cluster to the right size to execute Talend data pipelines efficiently. Users just need to specify the Qubole cluster label when defining the Spark cluster in Talend Studio, and Qubole will automatically start, scale, and shut down the cluster on behalf of the user, eliminating the need to manually provision and manage Spark and Hadoop clusters. The Talend 7.1 integration also simplifies the configuration of Qubole Spark because users no longer have to hardcode the IP address of the Spark cluster in Talend Studio.
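
The label-based behavior described above can be pictured with a small conceptual sketch. This is a toy model only: the names ClusterManager, Cluster, submit, and shutdown_idle, and the label "spark-etl", are hypothetical and are not part of the Qubole or Talend APIs. It only illustrates the idea that a job names a cluster by label, and the platform takes care of starting, sizing, and stopping it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Toy stand-in for a managed Spark cluster (hypothetical, not a Qubole type)."""
    label: str
    running: bool = False
    nodes: int = 0


@dataclass
class ClusterManager:
    """Hypothetical manager that resolves a label to a cluster and starts it on demand."""
    clusters: Dict[str, Cluster] = field(default_factory=dict)

    def register(self, label: str) -> None:
        # A cluster is defined once, by label; no IP address is involved.
        self.clusters[label] = Cluster(label=label)

    def submit(self, label: str, job_name: str, nodes_needed: int) -> str:
        cluster = self.clusters[label]  # raises KeyError for an unknown label
        if not cluster.running:
            # Start the cluster automatically on first use.
            cluster.running = True
        # Scale up if the job needs more nodes than are currently allocated.
        cluster.nodes = max(cluster.nodes, nodes_needed)
        return f"{job_name} ran on '{label}' with {cluster.nodes} nodes"

    def shutdown_idle(self, label: str) -> None:
        # Stop the cluster and release its nodes once it is no longer needed.
        cluster = self.clusters[label]
        cluster.running = False
        cluster.nodes = 0


manager = ClusterManager()
manager.register("spark-etl")
print(manager.submit("spark-etl", "daily_load", nodes_needed=4))
manager.shutdown_idle("spark-etl")
```

In this sketch the caller never references a host or IP address; the label is the only identifier the job needs, which mirrors the configuration simplification described for Talend 7.1.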

This guide provides information about the integration, architecture, relevant configuration, and sample use cases.