Troubleshooting Oracle OCI Cluster Startup Failures
Diagnosing and Fixing Problems
The following table lists common error messages that may be logged when a cluster fails to start, along with their underlying causes and remedies:
| Error message text | Cause | What to do |
|---|---|---|
| | worker daemon, or worker is down or out of disk space. | Make sure you have configured the subnet so as to allow communication among all nodes: see Configuring Oracle OCI Resources. |
| The limit for this tenancy has been exceeded | Bringing up this cluster would exceed this tenancy’s limit for instances of this type. | Decrease the cluster size, or change the instance type, and try again. If that fails, ask Oracle support for a higher limit. |
| HEALTH-CHECK-FAILED. Reason: Failed to create socks proxy for cluster... | QDS cannot contact the cluster coordinator node via SSH. | Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209); use the subnet’s security list to do this. |
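
As an illustration of the port 22 remedy, the following Python sketch checks whether a set of ingress rules would admit SSH traffic from the QDS NAT address. It is a minimal sketch under stated assumptions, not a Qubole or Oracle tool: the rule dictionaries are assumed to resemble the ingress rule shape used by OCI security lists (`source`, `protocol`, `tcpOptions.destinationPortRange`), and the function names and example rules are hypothetical. It covers only stateful TCP ingress rules with CIDR sources.

```python
import ipaddress

QDS_NAT_IP = "52.44.223.209"  # QDS NAT address given in the table above
SSH_PORT = 22


def rule_allows_ssh(rule, source_ip=QDS_NAT_IP, port=SSH_PORT):
    """Return True if a single ingress rule admits TCP traffic to `port` from `source_ip`.

    Assumed rule shape (hypothetical, modeled on OCI security list ingress rules):
    {"source": "<CIDR>", "protocol": "6" or "all", "tcpOptions": {...} or absent}
    """
    # "6" is the IANA protocol number for TCP; "all" matches every protocol.
    if rule.get("protocol") not in ("6", "all"):
        return False
    try:
        network = ipaddress.ip_network(rule["source"], strict=False)
    except (KeyError, ValueError):
        # Missing source, or a source that is not a CIDR block (for example a service label).
        return False
    if ipaddress.ip_address(source_ip) not in network:
        return False
    port_range = (rule.get("tcpOptions") or {}).get("destinationPortRange")
    if port_range is None:
        # No port restriction on the rule: every destination port is allowed.
        return True
    return port_range["min"] <= port <= port_range["max"]


def security_list_allows_ssh(rules):
    """Return True if at least one ingress rule admits SSH from the QDS NAT."""
    return any(rule_allows_ssh(rule) for rule in rules)


# Hypothetical example rules: intra-subnet traffic plus SSH from the QDS NAT only.
example_rules = [
    {"source": "10.0.0.0/16", "protocol": "all"},
    {
        "source": "52.44.223.209/32",
        "protocol": "6",
        "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}},
    },
]

print(security_list_allows_ssh(example_rules))
```

A check of this kind only confirms that the rules as written would permit the connection; it does not prove that the coordinator node is actually reachable.
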
Preventing Problems
Here are some guidelines to help you prevent similar problems in the future.
- Make sure you’ve read and understood the relevant Qubole and Cloud documentation.
- Make sure you have configured each subnet so as to allow communication among all nodes.
- Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209).
- Make sure that starting the cluster will not put you over the limit for your tenancy.
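
The tenancy-limit guideline comes down to simple arithmetic, which the sketch below works through in Python. The function names and all numbers are hypothetical, and it assumes that you have already looked up the instance limit and current usage for the relevant instance type, and that every cluster node (coordinator plus workers) counts as one instance against that limit.

```python
def max_startable_nodes(limit, in_use):
    """Number of additional instances that can start without exceeding the limit."""
    return max(limit - in_use, 0)


def cluster_fits(limit, in_use, requested_nodes):
    """Return True if starting `requested_nodes` instances stays within the limit."""
    return requested_nodes <= max_startable_nodes(limit, in_use)


# Hypothetical figures: a limit of 20 instances of this type, 14 already running.
limit, in_use = 20, 14
requested = 1 + 8  # one coordinator node plus eight worker nodes

if cluster_fits(limit, in_use, requested):
    print("Cluster fits within the tenancy limit.")
else:
    headroom = max_startable_nodes(limit, in_use)
    print(f"Requested {requested} nodes, but only {headroom} can start; "
          "reduce the cluster size or change the instance type.")
```

With these figures only 6 more instances can start, so a 9-node cluster would exceed the limit, which corresponds to the "limit for this tenancy has been exceeded" error described earlier.
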