Troubleshooting Oracle OCI Cluster Startup Failures
Diagnosing and Fixing Problems
The following table lists common error messages that may be logged when a cluster fails to start, together with their underlying causes and remedies:
| Error message text | Cause | What to do |
|---|---|---|
| | … worker daemon, or worker is down or out of disk space. | Make sure you have configured the subnet to allow communication among all nodes; see Configuring Oracle OCI Resources. |
| | Bringing up this cluster would exceed this tenancy’s limit for instances of this type. | Decrease the cluster size or change the instance type, and try again. If that fails, ask Oracle support for a higher limit. |
| | QDS cannot contact the cluster coordinator node via SSH. | Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209); use the subnet’s security list to do this. |
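The first and third causes in the table both come down to the subnet’s security list. The following is a minimal sketch of a pre-flight check, assuming you have copied the list’s ingress rules into plain Python dictionaries. The field names (`source`, `protocol`, `dst_ports`), the example subnet `10.0.1.0/24`, and both helper functions are hypothetical; they are only loosely modeled on OCI ingress rules, in which the protocol is `"all"` or an IANA protocol number such as `"6"` for TCP. The sketch checks whether TCP port 22 is open to the QDS NAT address and whether all traffic from the subnet itself is admitted.

```python
import ipaddress

QDS_NAT_IP = "52.44.223.209"  # QDS NAT address from the table above
TCP = "6"                     # IANA protocol number for TCP, as a string


def allows_ssh_from(rules, ip=QDS_NAT_IP):
    """Return True if some ingress rule admits TCP port 22 from `ip`."""
    addr = ipaddress.ip_address(ip)
    for rule in rules:
        # The rule's source CIDR must contain the address.
        if addr not in ipaddress.ip_network(rule["source"]):
            continue
        # "all" covers TCP; any other protocol number does not.
        if rule["protocol"] not in ("all", TCP):
            continue
        # A rule with no port range admits every destination port.
        low, high = rule.get("dst_ports", (1, 65535))
        if low <= 22 <= high:
            return True
    return False


def allows_all_traffic_within(rules, subnet_cidr):
    """Return True if some ingress rule admits all protocols from the whole subnet."""
    subnet = ipaddress.ip_network(subnet_cidr)
    for rule in rules:
        source = ipaddress.ip_network(rule["source"])
        # The rule's source must span the entire subnet, with no protocol or port limits.
        if subnet.subnet_of(source) and rule["protocol"] == "all" and "dst_ports" not in rule:
            return True
    return False


# Hypothetical security list for a subnet 10.0.1.0/24.
rules = [
    {"source": "10.0.1.0/24", "protocol": "all"},
    {"source": "52.44.223.209/32", "protocol": TCP, "dst_ports": (22, 22)},
]
print(allows_ssh_from(rules))                           # True
print(allows_all_traffic_within(rules, "10.0.1.0/24"))  # True
```

The sketch ignores egress rules, the stateful/stateless distinction, and any network security groups, so treat it as a quick first check rather than a full audit.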
Preventing Problems
Here are some guidelines to help you prevent similar problems in the future.
- Make sure you have read and understood the relevant Qubole and Oracle Cloud Infrastructure documentation.
- Make sure you have configured each subnet to allow communication among all nodes.
- Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209) in the subnet’s security list.
- Make sure that starting the cluster will not put you over your tenancy’s limit for the chosen instance type.
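The limit check in the last guideline is simple arithmetic: the instances of a given type already running in the tenancy, plus the instances the cluster will launch, must not exceed the tenancy’s limit for that type. Below is a minimal sketch, assuming you have looked up the limit and current usage yourself (for example, in the OCI Console). The function names are hypothetical, and `nodes_needed` should count every node the cluster starts with that instance type, including the coordinator if it uses the same type.

```python
def limit_headroom(limit, in_use):
    """Instances of one type that can still be started in the tenancy."""
    return max(limit - in_use, 0)


def can_start_cluster(limit, in_use, nodes_needed):
    """Return True if starting `nodes_needed` more instances stays within `limit`."""
    return nodes_needed <= limit_headroom(limit, in_use)


# Hypothetical figures: a limit of 20 instances of this type, 14 already running.
print(limit_headroom(20, 14))        # 6
print(can_start_cluster(20, 14, 8))  # False: 8 nodes exceed the headroom of 6
print(can_start_cluster(20, 14, 6))  # True
```

If the check fails, the remedies from the table apply: shrink the cluster to at most the headroom, choose an instance type that still has room under its limit, or ask Oracle support for a higher limit.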