Recovering Table Partitions

ALTER TABLE RECOVER PARTITIONS is the command that is widely used in Hive to refresh partitions as new partitions are directly added to the file system by other users.

Qubole has added path validation to the ALTER TABLE RECOVER PARTITIONS command. The command only recovers valid paths. This avoids the partition structure from being corrupted and the query from fetching an incorrect result.

For example, if you add a directory unrelated to table data in table location, then without validation, earlier, Hive would add it as a separate partition. The command may give incorrect result for queries or may corrupt the partition’s structure.

Configuring Validation for Recovering Partitions

Set hive.qubole.recover.partitions.path.validation as the Hive configuration property as Hive Overrides in the Hadoop (Hive) cluster. Its value can be one of the following levels of validation:

  • ignore: It implies that no validation is done and it is the default value.
  • depth: It implies that the command does not recover path, which has less directory depth than the total number of partition columns in the table.
  • name: It implies that the command does not recover partition directories, which do not have a name such as <partition column name>=<partition value> in addition to the validation performed when depth is set as the validation level.

Note

Qubole skips the invalid path, which does not pass validation and does not throw an error. Thus the query executes successfully even if some paths are invalid.

Let us understand the validation process from the example below.

Assume the following directory structure for a table t1 path directory with 2 partition columns p1 and p2.

.../t1/1/2/3/<data>
.../t1/1/3/<data>
.../t1/4/<data>
.../t1/9/            -- empty directory
.../t1/p1=5/p2=6/p3=7/<data>
.../t1/p1=5/p2=7/<data>
.../t1/p1=8/<data>
.../t1/p1=10/        -- empty directory

Setting hive.qubole.recover.partitions.path.validation=ignore recovers these partitions.

p1=1/p2=2                   (.../t1/1/2/3/<data>)
p1=1/p2=3                   (.../t1/1/3/<data>)
p1=TES-10565/p2=9           (.../t1/9/)           -- corrupts partition structure
p1=5/p2=6                   (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7                   (.../t1/p1=5/p2=7/<data>)
p1=TES-10565/p2=p1%3D10     (.../t1/p1=10/)       -- corrupts partition structure

Setting hive.qubole.recover.partitions.path.validation=depth recovers these partitions.

p1=1/p2=2                   (.../t1/1/2/3/<data>)
p1=1/p2=3                   (.../t1/1/3/<data>)
p1=5/p2=6                   (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7                   (.../t1/p1=5/p2=7/<data>)

Setting hive.qubole.recover.partitions.path.validation=name recovers these partitions.

p1=5/p2=6                   (.../t1/p1=5/p2=6/p3=7/<data>)
p1=5/p2=7                   (.../t1/p1=5/p2=7/<data>)