Configuring Hive Tables Data Dependency
To configure a Hive partition dependency, select the Wait for Hive Partitions option in Dependencies. See Hive Datasets as Schedule Dependency for more information.
Use the tooltip beside each field or check box for more information about it.
After selecting Wait for Hive Partitions, perform these steps:
The Schema text field is displayed. Click the text field to display the list of schemas available in the account, as illustrated in the following figure, and select a schema from the list.
After you select a schema, the Table text field is displayed. Select a table that has partitions. The following figure illustrates a table with Hive partitions.
After you select the table, the Table Data settings are displayed as shown in the following figure.
In Global Settings:
Set the Interval and select its unit from the Increment drop-down list. The default unit is minutes.
Set the Window Start time. See Configuring S3/Azure Blob Storage Files Data Dependency for more information.
Set the Window End time. See Configuring S3/Azure Blob Storage Files Data Dependency for more information.
Select a partition column from the Column drop-down list. The following options are displayed:
Set Date Time Mask for the partition: the schedule's nominal time is formatted with this mask, and the resulting string is used as the partition value to check for the dependency.
Specify dependency on partition column values: the value you enter is used as-is, as a string, to check for the dependency.
Depending on which of these options you choose, do one of the following:
If you want to set a Date Time Mask, select the Specify DateTime Mask for this Partition check box and enter the Date/Time Mask. For example, `%Y-%M` specifies year and month as the dependency value. An example is illustrated in the following figure.
If you want to specify the dependency value, enter the values in the Partition Column field. An example is illustrated in the following figure.
Macros defined in a schedule are not supported for checking dependencies, so do not enter macro values in the Partition Column field.
Set Timeout (in minutes) to change the default or previously set timeout.
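The following Python sketch shows how the Global Settings and the partition-column options might combine into the set of partition values a schedule instance waits for. It assumes that Window Start and Window End are integer offsets, counted in steps of Interval, relative to the nominal time. For example, -2 to 0 with a one-day interval would cover the two previous days and the current day. See Configuring S3/Azure Blob Storage Files Data Dependency for the authoritative definition. The function names and the increment mapping are hypothetical. Python's `strftime` stands in for the scheduler's mask syntax. In `strftime`, `%m` is the month and `%M` is the minute, so the year-and-month mask is written `%Y-%m` here.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from the Increment drop-down to a unit of time.
INCREMENTS = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
}


def window_instants(nominal_time, interval, increment, window_start, window_end):
    """Return every instant in the dependency window, oldest first.

    Assumption: window_start and window_end are integer offsets counted in
    steps of (interval * increment) relative to the nominal time.
    """
    step = INCREMENTS[increment] * interval
    return [nominal_time + step * k for k in range(window_start, window_end + 1)]


def expected_partition_values(nominal_time, interval, increment, window_start,
                              window_end, datetime_mask=None, static_value=None):
    """Return the partition-column strings this schedule instance waits for."""
    if datetime_mask is not None:
        # Format each instant with the mask; identical strings collapse in the set.
        instants = window_instants(nominal_time, interval, increment,
                                   window_start, window_end)
        return sorted({t.strftime(datetime_mask) for t in instants})
    if static_value is not None:
        # A literal value is compared as a plain string, with no date logic.
        return [static_value]
    raise ValueError("provide either datetime_mask or static_value")


def dependency_met(existing_values, expected_values):
    """Return True when every expected partition value already exists."""
    return set(expected_values).issubset(existing_values)


if __name__ == "__main__":
    nominal = datetime(2024, 3, 1)
    # Daily interval; window runs from two days before the nominal time up to it.
    needed = expected_partition_values(nominal, 1, "days", -2, 0,
                                       datetime_mask="%Y-%m")
    print(needed)  # ['2024-02', '2024-03']
    print(dependency_met({"2024-01", "2024-02"}, needed))  # False
```

The sketch collects the formatted values into a set. Several instants that format to the same string therefore produce only one required partition. This happens, for example, with a monthly mask over a daily interval.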
Large data sets are typically divided into directories, and each directory maps to a partition in Hive. Currently, Hive partitions must be populated manually with the following command (shown here for the miniwikistats table):
```sql
ALTER TABLE miniwikistats RECOVER PARTITIONS;
```
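Hive stores each partition of a table as a directory named `column=value` beneath the table's location. `RECOVER PARTITIONS` registers such directories in the metastore. The Python sketch below illustrates that directory-to-partition mapping by turning directory paths into partition specs, which is how a partition column's values become available for a dependency check. The bucket, paths, and function names are hypothetical. The sketch does not handle Hive's escaping of special characters in directory names.

```python
def parse_partition_path(path, table_root):
    """Return {column: value} for a Hive-style partition directory path."""
    root = table_root.rstrip("/")
    if not path.startswith(root + "/"):
        raise ValueError(f"{path!r} is not under {table_root!r}")
    spec = {}
    for segment in path[len(root):].strip("/").split("/"):
        # Hive names each partition directory column=value.
        column, sep, value = segment.partition("=")
        if not sep or not column:
            raise ValueError(f"not a partition directory: {segment!r}")
        spec[column] = value
    return spec


def partition_values(paths, table_root, column):
    """Collect the distinct values of one partition column across paths."""
    values = set()
    for path in paths:
        spec = parse_partition_path(path, table_root)
        if column in spec:
            values.add(spec[column])
    return values


if __name__ == "__main__":
    root = "s3://example-bucket/miniwikistats"  # hypothetical table location
    paths = [
        root + "/dt=2024-02-29/hr=23",
        root + "/dt=2024-03-01/hr=00",
        root + "/dt=2024-03-01/hr=01",
    ]
    print(parse_partition_path(paths[0], root))  # {'dt': '2024-02-29', 'hr': '23'}
    print(sorted(partition_values(paths, root, "dt")))  # ['2024-02-29', '2024-03-01']
```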
To add multiple Hive Table dependencies, click +Add New and enter the required information as described above in this topic.