Troubleshooting Oracle OCI Cluster Startup Failures

Diagnosing and Fixing Problems

The following entries describe some common error messages that may be logged when a cluster fails to start. Each entry gives the underlying cause and the remedy.

Error message: ``Hadoop Bring up failed. File <filename> could only be replicated to 0 nodes…``
Cause: The coordinator daemon cannot communicate with a worker daemon, or the worker is down or out of disk space.
What to do: Make sure you have configured the subnet to allow communication among all nodes; see Configuring Oracle OCI Resources. If the nodes can communicate, check that the workers are running and have free disk space.

Error message: ``The limit for this tenancy has been exceeded``
Cause: Bringing up this cluster would exceed the tenancy’s limit for instances of this type.
What to do: Decrease the cluster size or change the instance type, and try again. If that does not work, ask Oracle Support to raise the limit.

Error message: ``HEALTH-CHECK-FAILED. Reason: Failed to create socks proxy for cluster...``
Cause: QDS cannot reach the cluster coordinator node over SSH.
What to do: Use the subnet’s security list to allow inbound traffic on port 22 from the QDS NAT address (52.44.223.209).
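For the ``Hadoop Bring up failed`` error, a quick first check is whether nodes can open TCP connections to their peers and whether the workers have free disk. The Python sketch below is a minimal, hypothetical helper, not part of QDS or OCI tooling. The worker addresses and the port you probe are assumptions you must replace: 9866 is the Hadoop 3 default DataNode data-transfer port, and Hadoop 2 used 50010. Run the connectivity check on the coordinator node and the disk check on each worker.

```python
import shutil
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_peers(peers: list[tuple[str, int]], timeout: float = 3.0) -> list[tuple[str, int]]:
    """Return the (host, port) pairs from `peers` that could not be reached."""
    return [(host, port) for host, port in peers if not check_tcp(host, port, timeout)]


def free_disk_fraction(path: str = "/") -> float:
    """Return the free fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `unreachable_peers([("10.0.1.11", 9866), ("10.0.1.12", 9866)])` (hypothetical addresses) returns the peers that refused or timed out. An empty list suggests the subnet passes traffic on that port, so the next step is to run `free_disk_fraction()` on each worker.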
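For the tenancy-limit error, deciding how far to decrease the cluster size is simple arithmetic. Assuming every node of the chosen shape counts against the limit (including the coordinator, if it uses the same shape) alongside instances already running, the largest cluster that fits is the limit minus current usage. The sketch below assumes you have read the limit and current usage for the shape from the OCI console; the function names and inputs are hypothetical. Check the scope of the limit too, because some OCI compute limits are tracked per availability domain.

```python
def max_workers_within_limit(limit: int, in_use: int, coordinator_same_shape: bool = True) -> int:
    """Largest worker count that keeps this shape's instance count within `limit`.

    limit:  the tenancy's service limit for the instance shape.
    in_use: instances of that shape already running in the same limit scope.
    coordinator_same_shape: assumption that the coordinator uses the same shape
        and therefore also consumes one unit of the limit.
    """
    coordinator = 1 if coordinator_same_shape else 0
    return max(limit - in_use - coordinator, 0)


def cluster_fits(limit: int, in_use: int, workers: int, coordinator_same_shape: bool = True) -> bool:
    """True if bringing up `workers` worker nodes stays within the limit."""
    return workers <= max_workers_within_limit(limit, in_use, coordinator_same_shape)
```

With a hypothetical limit of 10 and 3 instances already in use, `max_workers_within_limit(10, 3)` gives 6, so an 8-worker cluster of that shape would not fit; either reduce it to 6 workers or pick a shape with more headroom.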
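For the ``HEALTH-CHECK-FAILED`` error, the subnet’s security list needs an ingress rule that admits TCP port 22 from 52.44.223.209. The sketch below builds such a rule as a dictionary in the camelCase JSON shape assumed to match what the OCI CLI accepts for `oci network security-list update --ingress-security-rules`, and checks whether an existing list of rules already admits the QDS NAT. The helper names are hypothetical, only CIDR sources are handled (not service CIDR labels), and the field names should be verified against the OCI CLI reference. Updating a security list this way replaces its ingress rules, so start from the rules you already have.

```python
import ipaddress
import json

# Ingress rule admitting SSH (TCP 22) from the QDS NAT address. Field names
# mirror the OCI CLI's JSON rule format; treat them as an assumption to verify.
QDS_NAT_SSH_RULE = {
    "source": "52.44.223.209/32",
    "sourceType": "CIDR_BLOCK",
    "protocol": "6",  # IANA protocol number 6 = TCP
    "isStateless": False,
    "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}},
}


def rule_admits(rule: dict, source_ip: str, port: int) -> bool:
    """True if one CIDR-based ingress rule admits TCP traffic from source_ip to port."""
    if rule.get("protocol") not in ("6", "all"):
        return False
    if ipaddress.ip_address(source_ip) not in ipaddress.ip_network(rule["source"], strict=False):
        return False
    # With protocol "all" (or a TCP rule without tcpOptions), every port is allowed.
    port_range = (rule.get("tcpOptions") or {}).get("destinationPortRange")
    if port_range is None:
        return True
    return port_range["min"] <= port <= port_range["max"]


def security_list_allows(rules: list[dict], source_ip: str, port: int = 22) -> bool:
    """True if any rule in the list admits source_ip on the given TCP port."""
    return any(rule_admits(rule, source_ip, port) for rule in rules)


def with_qds_ssh_rule(existing_rules: list[dict]) -> str:
    """Return JSON for the existing rules plus the QDS SSH rule, if it is missing."""
    rules = list(existing_rules)
    if not security_list_allows(rules, "52.44.223.209", 22):
        rules.append(QDS_NAT_SSH_RULE)
    return json.dumps(rules, indent=2)
```

One way to use it: export the current ingress rules, pass them to `with_qds_ssh_rule`, write the result to a file, and supply that file to the CLI with a `file://` path.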

Preventing Problems

Here are some guidelines to help you prevent similar problems in the future.