Recovering Table Partitions
ALTER TABLE RECOVER PARTITIONS
is the command that is widely used in Hive to refresh partitions as new partitions are
directly added to the file system by other users.
Qubole has added path validation to the ALTER TABLE RECOVER PARTITIONS
command. The command only
recovers valid paths. This avoids the partition structure from being corrupted and the query from fetching an
incorrect result.
For example, if you add a directory unrelated to table data in table location, then without validation, earlier, Hive would add it as a separate partition. The command may give incorrect result for queries or may corrupt the partition’s structure.
Configuring Validation for Recovering Partitions
Set hive.qubole.recover.partitions.path.validation
as the Hive configuration property as Hive Overrides in the Hadoop
(Hive) cluster. Its value can be one of the following levels of validation:
ignore
: It implies that no validation is done and it is the default value.depth
: It implies that the command does not recover path, which has less directory depth than the total number of partition columns in the table.name
: It implies that the command does not recover partition directories, which do not have a name such as<partition column name>=<partition value>
in addition to the validation performed whendepth
is set as the validation level.
Note
Qubole skips the invalid path, which does not pass validation and does not throw an error. Thus the query executes successfully even if some paths are invalid.
Let us understand the validation process from the example below.
Assume the following directory structure for a table t1
path directory with 2 partition columns p1
and p2
.
.../t1/1/2/3/<data>
.../t1/1/3/<data>
.../t1/4/<data>
.../t1/9/ -- empty directory
.../t1/p1=5/p2=6/p3=7/<data>
.../t1/p1=5/p2=7/<data>
.../t1/p1=8/<data>
.../t1/p1=10/ -- empty directory
Setting hive.qubole.recover.partitions.path.validation=ignore
recovers these partitions.
p1=1/p2=2 (.../t1/1/2/3/<data>)
p1=1/p2=3 (.../t1/1/3/<data>)
p1=TES-10565/p2=9 (.../t1/9/) -- corrupts partition structure
p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7 (.../t1/p1=5/p2=7/<data>)
p1=TES-10565/p2=p1%3D10 (.../t1/p1=10/) -- corrupts partition structure
Setting hive.qubole.recover.partitions.path.validation=depth
recovers these partitions.
p1=1/p2=2 (.../t1/1/2/3/<data>)
p1=1/p2=3 (.../t1/1/3/<data>)
p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7 (.../t1/p1=5/p2=7/<data>)
Setting hive.qubole.recover.partitions.path.validation=name
recovers these partitions.
p1=5/p2=6 (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7 (.../t1/p1=5/p2=7/<data>)