I have the following configuration in Alibaba ECS:
Public Connector and Three Test Nodes
The `Connector` has network connections on the public internet and on the default VSwitch in the default VPC. The `Connector` was created using the ECS web interface. The `testnode[0-2]` machines were created in a script using the Alibaba CLI command `aliyun`.
When the instances start running, the `Connector` can ping none of them. If I set a password on any of the test nodes and then restart that test node, ping starts working. The script uses a snapshot of the `Connector` as the image for the test nodes. The `Connector` has a randomly generated, long, and forgotten root password. Root access is via ssh with a passphrase-protected key pair. A non-root user for the test code has the same kind of access.
What I have tried is creating test nodes with the following `CreateInstance` options:
1. No `--Password` and no `--InheritPassword` options (original intent: why set a password? I have the access I need from the `Connector` image)
2. `--InheritPassword` option (I need a root password in order for the private network interfaces to work, the root password in the `Connector` image is fine)
3. `--Password` option (I need to explicitly set a root password on the test nodes)
The result is the same in all three cases: until I use the ECS web interface to set a password and restart a test node, the `Connector` cannot ping the test nodes.
What I know:
This is not a problem with the default security group, VPC, or VSwitch as I touch no settings on these entities in order for ping to work.
This is not a problem with the instance image because as soon as ping works, ssh to the test nodes works as well.
What am I doing wrong, or what am I missing? The whole purpose is to spin up instances without having to type away at the ECS web interface. I only figured out what it took to get the private network traffic moving because I wanted to debug the situation on the test nodes. For that, I had to set a root password and gain access from the ECS web console, which again defeats the purpose of scripting.
The `aliyun` command for creating the test nodes:
aliyun ecs CreateInstance --ImageId m-2vchb2oxldfuloh51wp9 --RegionId=cn-chengdu --InstanceType=ecs.c6.xlarge --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25 --ZoneId cn-chengdu-a --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99 --InstanceName TEST_NODE-0 --HostName testnode0 --Password 'notgoingtotellyou'
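The script that creates `testnode[0-2]` is not shown in this post. As a rough sketch of how the command above might be parameterized per node, the snippet below only prints the three commands (a dry run) and does not call the Alibaba API. The function name `build_create_cmd` is hypothetical, the flags and values are copied from the command above, and the `--Password` option is deliberately left out.

```shell
# Dry-run sketch: print the CreateInstance command for test node N.
# build_create_cmd is a hypothetical helper, not part of the aliyun CLI.
# All flags are taken from the command in the post; only the index varies.
# The --Password option is omitted here on purpose.
build_create_cmd() {
  # $1: node index (0, 1 or 2)
  echo "aliyun ecs CreateInstance" \
    "--ImageId m-2vchb2oxldfuloh51wp9" \
    "--RegionId=cn-chengdu" \
    "--InstanceType=ecs.c6.xlarge" \
    "--SpotStrategy SpotWithPriceLimit" \
    "--SpotPriceLimit 0.25" \
    "--ZoneId cn-chengdu-a" \
    "--InternetChargeType PayByTraffic" \
    "--InternetMaxBandwidthOut 99" \
    "--InstanceName TEST_NODE-$1" \
    "--HostName testnode$1"
}

# Print one command per test node for review.
for n in 0 1 2; do
  build_create_cmd "$n"
done
```

Printing the commands first makes it easy to confirm that the instance name and host name line up for each node before anything is created.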
The operating system for all instances is Ubuntu 18.04.
Aliyun command version is 3.0.30.
I got two answers: one from a co-worker and one from Alibaba.
Co-worker's answer: The configuration fails because the Ubuntu 18.04 image that I created for the non-public test machines used a static address for the internal network interface. I changed the internal network interface (`eth0`) to use DHCP and everything worked. See the netplan configuration examples for how to change the IP address assignment.
Alibaba's answer: Try using `aliyun ecs RunInstances` instead of three individual `aliyun ecs CreateInstance` and `aliyun ecs StartInstance` invocations. I did not try this solution, as it would have involved rewriting my scripts. Alibaba could have done more to motivate me by explaining why `RunInstances` would produce a different result than the combination of `CreateInstance` and `StartInstance`.
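For the co-worker's fix, a minimal sketch of a netplan file that puts `eth0` on DHCP might look like the following. The helper name `write_netplan_dhcp` is hypothetical, and the exact configuration used on the image is not shown in this post, so treat the YAML as an assumption based on the standard netplan `dhcp4` key. The demo writes to a temporary file rather than to `/etc/netplan`.

```shell
# Sketch: write a netplan configuration that enables DHCP (IPv4) on eth0.
# write_netplan_dhcp is a hypothetical helper; the YAML is an assumed
# minimal configuration, not the exact file from the image in the post.
write_netplan_dhcp() {
  # $1: path of the file to write
  cat > "$1" <<'EOF'
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
EOF
}

# Demo: write to a temporary file and show the result.
tmp="$(mktemp)"
write_netplan_dhcp "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On the image itself, the file would presumably live under `/etc/netplan/` and replace the existing static `eth0` definition, followed by `sudo netplan apply` before taking the snapshot.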