Recovering Table Partitions¶
ALTER TABLE RECOVER PARTITIONS is the command that is widely used in Hive to refresh partitions as new partitions are
directly added to the file system by other users.
Qubole has added path validation to the
ALTER TABLE RECOVER PARTITIONS command. The command only
recovers valid paths. This avoids the partition structure from being corrupted and the query from fetching an
For example, if you add a directory unrelated to table data in table location, then without validation, earlier, Hive would add it as a separate partition. The command may give incorrect result for queries or may corrupt the partition’s structure.
Configuring Validation for Recovering Partitions¶
hive.qubole.recover.partitions.path.validation as the Hive configuration property as Hive Overrides in the Hadoop
(Hive) cluster. Its value can be one of the following levels of validation:
ignore: It implies that no validation is done and it is the default value.
depth: It implies that the command does not recover path, which has less directory depth than the total number of partition columns in the table.
name: It implies that the command does not recover partition directories, which do not have a name such as
<partition column name>=<partition value>in addition to the validation performed when
depthis set as the validation level.
Qubole skips the invalid path, which does not pass validation and does not throw an error. Thus the query executes successfully even if some paths are invalid.
Let us understand the validation process from the example below.
Assume the following directory structure for a table
t1 path directory with 2 partition columns
.../t1/1/2/3/<data> .../t1/1/3/<data> .../t1/4/<data> .../t1/9/ -- empty directory .../t1/p1=5/p2=6/p3=7/<data> .../t1/p1=5/p2=7/<data> .../t1/p1=8/<data> .../t1/p1=10/ -- empty directory
hive.qubole.recover.partitions.path.validation=ignore recovers these partitions.
p1=1/p2=2 (.../t1/1/2/3/<data>) p1=1/p2=3 (.../t1/1/3/<data>) p1=TES-10565/p2=9 (.../t1/9/) -- corrupts partition structure p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>) p1=5/p2=7 (.../t1/p1=5/p2=7/<data>) p1=TES-10565/p2=p1%3D10 (.../t1/p1=10/) -- corrupts partition structure
hive.qubole.recover.partitions.path.validation=depth recovers these partitions.
p1=1/p2=2 (.../t1/1/2/3/<data>) p1=1/p2=3 (.../t1/1/3/<data>) p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>) p1=5/p2=7 (.../t1/p1=5/p2=7/<data>)
hive.qubole.recover.partitions.path.validation=name recovers these partitions.
p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>) p1=5/p2=7 (.../t1/p1=5/p2=7/<data>)