I have a 3-node setup running Marathon, mesos-master, mesos-slave and ZooKeeper with the HA config enabled. I tested a deployment of a simple hello app using mesos-execute, and it works as expected.
Since everything looked fine, I connected to Marathon and deployed a simple app to test it: (echo "hello" >> /tmp/output.txt), but the application gets stuck in the "Waiting" status.
What could be preventing Marathon from using Mesos resources for the deployment?
Logs from mesos-master:
I0904 11:23:27.064332 19769 master.cpp:2813] Received SUBSCRIBE call for framework 'marathon' at [email protected]:36324
I0904 11:23:27.064623 19769 master.cpp:2890] Subscribing framework marathon with checkpointing enabled and capabilities [ PARTITION_AWARE ]
I0904 11:23:27.064669 19769 master.cpp:6272] Updating info for framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.064697 19769 master.cpp:2994] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324 failed over
I0904 11:23:27.065032 19770 hierarchical.cpp:342] Activated framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.065465 19770 master.cpp:7305] Sending 3 offers to framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324
I0904 11:23:27.907865 19769 http.cpp:1115] HTTP GET for /files/read?_=1504517007920&jsonp=jQuery17109098185077823333_1504516979864&length=50000&offset=352538&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
I0904 11:23:28.916651 19768 http.cpp:1115] HTTP GET for /files/read?_=1504517008930&jsonp=jQuery17109098185077823333_1504516979865&length=50000&offset=353797&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
E0904 11:23:30.071293 19775 process.cpp:2450] Failed to shutdown socket with fd 39, address 192.168.40.159:58072: Transport endpoint is not connected
I0904 11:23:30.073277 19768 master.cpp:1430] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324 disconnected
I0904 11:23:30.073307 19768 master.cpp:3160] Deactivating framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324
I0904 11:23:30.073485 19768 master.cpp:3137] Disconnecting framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324
I0904 11:23:30.073496 19768 master.cpp:1445] Giving framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at [email protected]:36324 1weeks to failover
I0904 11:23:30.073519 19768 hierarchical.cpp:374] Deactivated framework cb16118a-2257-4020-a907-63aa6294e11b-0000
curl -XGET 'http://mesosphere2:8098/v2/queue?pretty' | jq
{
"queue": [
{
"count": 1,
"delay": {
"timeLeftSeconds": 0,
"overdue": true
},
"since": "2017-09-04T13:12:42.024Z",
"processedOffersSummary": {
"processedOffersCount": 12,
"unusedOffersCount": 12,
"lastUnusedOfferAt": "2017-09-04T13:14:52.554Z",
"rejectSummaryLastOffers": [
{
"reason": "UnfulfilledRole",
"declined": 3,
"processed": 3
},
{
"reason": "UnfulfilledConstraint",
"declined": 0,
"processed": 0
},
{
"reason": "NoCorrespondingReservationFound",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientCpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientMemory",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientDisk",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientGpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientPorts",
"declined": 0,
"processed": 0
}
],
"rejectSummaryLaunchAttempt": [
{
"reason": "UnfulfilledRole",
"declined": 12,
"processed": 12
},
{
"reason": "UnfulfilledConstraint",
"declined": 0,
"processed": 0
},
{
"reason": "NoCorrespondingReservationFound",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientCpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientMemory",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientDisk",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientGpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientPorts",
"declined": 0,
"processed": 0
}
]
},
"app": {
"id": "/test03",
"acceptedResourceRoles": [
"slave_public"
],
"backoffFactor": 1.15,
"backoffSeconds": 1,
"container": {
"type": "DOCKER",
"docker": {
"forcePullImage": false,
"image": "laghao/hello-marathon",
"network": "BRIDGE",
"parameters": [],
"portMappings": [
{
"containerPort": 80,
"hostPort": 80,
"labels": {},
"protocol": "tcp",
"servicePort": 10003
}
],
"privileged": false
},
"volumes": []
},
"cpus": 0.1,
"disk": 0,
"executor": "",
"instances": 1,
"labels": {},
"maxLaunchDelaySeconds": 3600,
"mem": 64,
"gpus": 0,
"portDefinitions": [
{
"port": 10003,
"name": "default",
"protocol": "tcp"
}
],
"requirePorts": false,
"upgradeStrategy": {
"maximumOverCapacity": 1,
"minimumHealthCapacity": 1
},
"version": "2017-09-04T13:12:41.993Z",
"versionInfo": {
"lastScalingAt": "2017-09-04T13:12:41.993Z",
"lastConfigChangeAt": "2017-09-04T13:12:41.993Z"
},
"killSelection": "YOUNGEST_FIRST",
"unreachableStrategy": {
"inactiveAfterSeconds": 300,
"expungeAfterSeconds": 600
}
}
}
]
}
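Marathon 1.4 introduced information about stuck deployments. As described in the documentation, you can query /v2/queue and get statistics on why offers were declined.
In your case there is a mismatch between the role the application requires and the role of the agents. You can deduce it from the UnfulfilledRole reason: all 12 processed offers were declined for it, and the app only accepts resources from the slave_public role (see acceptedResourceRoles in the output).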
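The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `explain_queue` and the trimmed `SAMPLE` payload are hypothetical, and the field names are the ones visible in the /v2/queue output in the question, so other Marathon versions may return a different shape.

```python
from typing import List

# Trimmed copy of the /v2/queue response from the question (illustrative only).
SAMPLE = {
    "queue": [
        {
            "processedOffersSummary": {
                "processedOffersCount": 12,
                "unusedOffersCount": 12,
                "rejectSummaryLaunchAttempt": [
                    {"reason": "UnfulfilledRole", "declined": 12, "processed": 12},
                    {"reason": "InsufficientCpus", "declined": 0, "processed": 0},
                ],
            },
            "app": {"id": "/test03", "acceptedResourceRoles": ["slave_public"]},
        }
    ]
}


def explain_queue(payload: dict) -> List[str]:
    """Return one line per reason that declined at least one offer."""
    lines = []
    for item in payload.get("queue", []):
        app = item.get("app", {})
        app_id = app.get("id", "<unknown>")
        roles = app.get("acceptedResourceRoles", [])
        summary = item.get("processedOffersSummary", {})
        for entry in summary.get("rejectSummaryLaunchAttempt", []):
            declined = entry.get("declined", 0)
            if declined > 0:
                lines.append(
                    f"{app_id}: {entry['reason']} declined {declined} of "
                    f"{entry.get('processed', 0)} offers "
                    f"(acceptedResourceRoles={roles})"
                )
    return lines


if __name__ == "__main__":
    for line in explain_queue(SAMPLE):
        print(line)
```

For the payload in the question this reports only UnfulfilledRole for /test03, which points at acceptedResourceRoles. If the agents offer only unreserved resources (an assumption about this cluster, not something the logs confirm), either registering the agents with the slave_public role or setting acceptedResourceRoles to ["*"] in the app definition would likely let the offers match.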