Using seneca mesh on rancher

200 Views Asked by At

I have a Seneca mesh which is working flawlessly on my laptop. However it's being impossible making to make it work on rancher. All the containers end diying because mesh timeouts.

I've read the entire seneca-mesh source code and the underliying sneeze library. That gave me some clues about how to debug this, but that's the longest I got. For example, I realize that seneca-mesh provides several options to sneeze which in turn it provides those to the SWIM implementation they use.

I tried increasing the mesh join timeouts, suspecting that they were two narrow (which was true because they were 777ms). Here is what I set:

edit: I just realized that it will be better if I include the full

{
    pins: ['role:mesh,cmd:test'],
    host: '10.58.58.58',
    isbase: true,
    port: '39999',
    bases: ['10.40.40.1:39999', 'base-hostname:39001'],
    stop: false,
    balance_client: { debug: { client_updates: true } },
    jointime: 2000,
    sneeze:
    {
        silent: false,
        swim: { joinTimeout: 2777, pingTimeout: 2444, pingReqTimeout: 2333 }
    },
    discover:
    {
        custom: { active: true, find: dnsSeed },
        multicast: { active: false },
        registry: { active: false }
    }
}

Sadly the only difference it made was the timeout took longer. Thanks to the silent option you can see I set to false gives you very detailed logs, the only problem is that I can't understand why they are failing.

In my simplest scenario I have two servers, one acting as base (which starts without problems) and a mesh-node that should join the mesh through SWIM. Inspecting the logs I see that they are able to communicate, and the join request from the node are received by the base-node. This happens a couple of times, spaced by the defined timeout and after a couple of join request the base-node asks for leaving the mesh and a timeout occurs:

BASE NODE

SNEEZE 87271 add 10.88.250.29:46830 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh { config:
19/6/2018 16:21:27   { type: 'web',
19/6/2018 16:21:27     port: 57280,
19/6/2018 16:21:27     host: '10.88.250.29',
19/6/2018 16:21:27     path: '/act',
19/6/2018 16:21:27     protocol: 'http',
19/6/2018 16:21:27     timeout: 5555,
19/6/2018 16:21:27     max_listen_attempts: 11,
19/6/2018 16:21:27     attempt_delay: 222,
19/6/2018 16:21:27     serverOptions: {},
19/6/2018 16:21:27     pin:
19/6/2018 16:21:27      [ 'cmd:*,role:profile',
19/6/2018 16:21:27        'cmd:probes.get,role:profile',
19/6/2018 16:21:27        'cmd:campaigns.get,role:profile',
19/6/2018 16:21:27        'cmd:universes.get,role:profile',
19/6/2018 16:21:27        'cmd:probes.del,role:profile',
19/6/2018 16:21:27        'cmd:campaigns.del,role:profile',
19/6/2018 16:21:27        'cmd:universes.del,role:profile' ],
19/6/2018 16:21:27     model: 'consume',
19/6/2018 16:21:27     ismesh: true,
19/6/2018 16:21:27     msgprefix: 'seneca_',
19/6/2018 16:21:27     callmax: 111111,
19/6/2018 16:21:27     msgidlen: 12,
19/6/2018 16:21:27     role: 'transport',
19/6/2018 16:21:27     hook: 'listen' },
19/6/2018 16:21:27  instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:27  'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:27  'tag$': 'seneca~mesh',
19/6/2018 16:21:27  'v$': 0 }
19/6/2018 16:21:30SNEEZE 90301 add 10.88.250.29:49483 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh { config:
19/6/2018 16:21:30   { type: 'web',
19/6/2018 16:21:30     port: 57280,
19/6/2018 16:21:30     host: '10.88.250.29',
19/6/2018 16:21:30     path: '/act',
19/6/2018 16:21:30     protocol: 'http',
19/6/2018 16:21:30     timeout: 5555,
19/6/2018 16:21:30     max_listen_attempts: 11,
19/6/2018 16:21:30     attempt_delay: 222,
19/6/2018 16:21:30     serverOptions: {},
19/6/2018 16:21:30     pin:
19/6/2018 16:21:30      [ 'cmd:*,role:profile',
19/6/2018 16:21:30        'cmd:probes.get,role:profile',
19/6/2018 16:21:30        'cmd:campaigns.get,role:profile',
19/6/2018 16:21:30        'cmd:universes.get,role:profile',
19/6/2018 16:21:30        'cmd:probes.del,role:profile',
19/6/2018 16:21:30        'cmd:campaigns.del,role:profile',
19/6/2018 16:21:30        'cmd:universes.del,role:profile' ],
19/6/2018 16:21:30     model: 'consume',
19/6/2018 16:21:30     ismesh: true,
19/6/2018 16:21:30     msgprefix: 'seneca_',
19/6/2018 16:21:30     callmax: 111111,
19/6/2018 16:21:30     msgidlen: 12,
19/6/2018 16:21:30     role: 'transport',
19/6/2018 16:21:30     hook: 'listen' },
19/6/2018 16:21:30  instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:30  'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:30  'tag$': 'seneca~mesh',
19/6/2018 16:21:30  'v$': 0 }
19/6/2018 16:21:35SNEEZE 95647 remove 10.88.250.29:46830 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh { config:
19/6/2018 16:21:35   { type: 'web',
19/6/2018 16:21:35     port: 57280,
19/6/2018 16:21:35     host: '10.88.250.29',
19/6/2018 16:21:35     path: '/act',
19/6/2018 16:21:35     protocol: 'http',
19/6/2018 16:21:35     timeout: 5555,
19/6/2018 16:21:35     max_listen_attempts: 11,
19/6/2018 16:21:35     attempt_delay: 222,
19/6/2018 16:21:35     serverOptions: {},
19/6/2018 16:21:35     pin:
19/6/2018 16:21:35      [ 'cmd:*,role:profile',
19/6/2018 16:21:35        'cmd:probes.get,role:profile',
19/6/2018 16:21:35        'cmd:campaigns.get,role:profile',
19/6/2018 16:21:35        'cmd:universes.get,role:profile',
19/6/2018 16:21:35        'cmd:probes.del,role:profile',
19/6/2018 16:21:35        'cmd:campaigns.del,role:profile',
19/6/2018 16:21:35        'cmd:universes.del,role:profile' ],
19/6/2018 16:21:35     model: 'consume',
19/6/2018 16:21:35     ismesh: true,
19/6/2018 16:21:35     msgprefix: 'seneca_',
19/6/2018 16:21:35     callmax: 111111,
19/6/2018 16:21:35     msgidlen: 12,
19/6/2018 16:21:35     role: 'transport',
19/6/2018 16:21:35     hook: 'listen' },
19/6/2018 16:21:35  instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:35  'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:35  'tag$': 'seneca~mesh',
19/6/2018 16:21:35  'v$': 0 }

CLIENT NODE:

As you can see there are several join request and finally a remove one, let's examine the client logs

19/6/2018 16:21:21SNEEZE 81049 joining 4 10.88.250.29:41450 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh [ 'nanoservices-base1:39999', '10.42.209.107:39999' ] { joinTimeout: 2777,
19/6/2018 16:21:21  pingTimeout: 2444,
19/6/2018 16:21:21  pingReqTimeout: 2333,
19/6/2018 16:21:21  codec: 'msgpack',
19/6/2018 16:21:21  disseminationFactor: 22,
19/6/2018 16:21:21  interval: 111,
19/6/2018 16:21:21  pingReqGroupSize: 7,
19/6/2018 16:21:21  udp: { maxDgramSize: 2048 },
19/6/2018 16:21:21  local:
19/6/2018 16:21:21   { host: '10.88.250.29:41450',
19/6/2018 16:21:21     meta:
19/6/2018 16:21:21      { config: [Object],
19/6/2018 16:21:21        instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:21        'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:21        'tag$': 'seneca~mesh',
19/6/2018 16:21:21        'v$': 0 },
19/6/2018 16:21:21     incarnation: 1529418081049 } }
19/6/2018 16:21:24SNEEZE 84284 joining 5 10.88.250.29:42022 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh [ 'nanoservices-base1:39999', '10.42.209.107:39999' ] { joinTimeout: 2777,
19/6/2018 16:21:24  pingTimeout: 2444,
19/6/2018 16:21:24  pingReqTimeout: 2333,
19/6/2018 16:21:24  codec: 'msgpack',
19/6/2018 16:21:24  disseminationFactor: 22,
19/6/2018 16:21:24  interval: 111,
19/6/2018 16:21:24  pingReqGroupSize: 7,
19/6/2018 16:21:24  udp: { maxDgramSize: 2048 },
19/6/2018 16:21:24  local:
19/6/2018 16:21:24   { host: '10.88.250.29:42022',
19/6/2018 16:21:24     meta:
19/6/2018 16:21:24      { config: [Object],
19/6/2018 16:21:24        instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:24        'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:24        'tag$': 'seneca~mesh',
19/6/2018 16:21:24        'v$': 0 },
19/6/2018 16:21:24     incarnation: 1529418084284 } }
19/6/2018 16:21:27SNEEZE 87269 joining 6 10.88.250.29:46830 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh [ 'nanoservices-base1:39999', '10.42.209.107:39999' ] { joinTimeout: 2777,
19/6/2018 16:21:27  pingTimeout: 2444,
19/6/2018 16:21:27  pingReqTimeout: 2333,
19/6/2018 16:21:27  codec: 'msgpack',
19/6/2018 16:21:27  disseminationFactor: 22,
19/6/2018 16:21:27  interval: 111,
19/6/2018 16:21:27  pingReqGroupSize: 7,
19/6/2018 16:21:27  udp: { maxDgramSize: 2048 },
19/6/2018 16:21:27  local:
19/6/2018 16:21:27   { host: '10.88.250.29:46830',
19/6/2018 16:21:27     meta:
19/6/2018 16:21:27      { config: [Object],
19/6/2018 16:21:27        instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:27        'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:27        'tag$': 'seneca~mesh',
19/6/2018 16:21:27        'v$': 0 },
19/6/2018 16:21:27     incarnation: 1529418087269 } }
19/6/2018 16:21:30SNEEZE 90298 joining 7 10.88.250.29:49483 60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604 seneca~mesh [ 'nanoservices-base1:39999', '10.42.209.107:39999' ] { joinTimeout: 2777,
19/6/2018 16:21:30  pingTimeout: 2444,
19/6/2018 16:21:30  pingReqTimeout: 2333,
19/6/2018 16:21:30  codec: 'msgpack',
19/6/2018 16:21:30  disseminationFactor: 22,
19/6/2018 16:21:30  interval: 111,
19/6/2018 16:21:30  pingReqGroupSize: 7,
19/6/2018 16:21:30  udp: { maxDgramSize: 2048 },
19/6/2018 16:21:30  local:
19/6/2018 16:21:30   { host: '10.88.250.29:49483',
19/6/2018 16:21:30     meta:
19/6/2018 16:21:30      { config: [Object],
19/6/2018 16:21:30        instance: '60226afpvh7z/1529418068191/11/3.6.0/profiles',
19/6/2018 16:21:30        'identifier$': '60226afpvh7z/1529418068191/11/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529418068604',
19/6/2018 16:21:30        'tag$': 'seneca~mesh',
19/6/2018 16:21:30        'v$': 0 },
19/6/2018 16:21:30     incarnation: 1529418090298 } }
19/6/2018 16:21:30      ERR -   Error: seneca: Action name:mesh,plugin:define,role:seneca,seq:2,tag:undefined failed: [TIMEOUT].
19/6/2018 16:21:30      ERR -   Error: seneca: Action cmd:listen,role:transport failed: [TIMEOUT].
19/6/2018 16:21:30/home/dockerfileUser/node_modules/seneca/lib/common.js:278
19/6/2018 16:21:30        throw error(err, '[DEATH LOOP] die count: ' + diecount)
19/6/2018 16:21:30        ^
19/6/2018 16:21:30
19/6/2018 16:21:30Error: [TIMEOUT]
19/6/2018 16:21:30    at Object.act_tm [as ontm] (/home/dockerfileUser/node_modules/seneca/seneca.js:923:52)
19/6/2018 16:21:30    at Timeout.timeout_check [as _onTimeout] (/home/dockerfileUser/node_modules/gate-executor/gate-executor.js:216:16)
19/6/2018 16:21:30    at ontimeout (timers.js:475:11)
19/6/2018 16:21:30    at tryOnTimeout (timers.js:310:5)
19/6/2018 16:21:30    at Timer.listOnTimeout (timers.js:270:5)

It's obvious that the same join request are seen on both ends, but for some weird reason they don't agree on joining the mesh

As reference, here is a log that works on local:

[profiles - 70833]: SNEEZE 86079 joining 0 127.0.0.1:42294 uemigxtf6h6w/1529417884911/70833/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529417886077 seneca~mesh [ '127.0.0.1:39999', '127.0.0.1:39001' ] { joinTimeout:2777,
[profiles - 70833]:   pingTimeout: 2444,
[profiles - 70833]:   pingReqTimeout: 2333,
[profiles - 70833]:   codec: 'msgpack',
[profiles - 70833]:   disseminationFactor: 22,
[profiles - 70833]:   interval: 111,
[profiles - 70833]:   pingReqGroupSize: 7,
[profiles - 70833]:   udp: { maxDgramSize: 2048 },
[profiles - 70833]:   local:
[profiles - 70833]:    { host: '127.0.0.1:42294',
[profiles - 70833]:      meta:
[profiles - 70833]:       { config: [Object],
[profiles - 70833]:         instance: 'uemigxtf6h6w/1529417884911/70833/3.6.0/profiles',
[profiles - 70833]:         'identifier$':
[profiles - 70833]:          'uemigxtf6h6w/1529417884911/70833/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529417886077',
[profiles - 70833]:         'tag$': 'seneca~mesh',
[profiles - 70833]:         'v$': 0 },
[profiles - 70833]:      incarnation: 1529417886079 } }
[base0 - 70753]: SNEEZE 86089 add 127.0.0.1:42294 uemigxtf6h6w/1529417884911/70833/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529417886077 seneca~mesh { config:
[base0 - 70753]:    { type: 'web',
[base0 - 70753]:      port: 59881,
[base0 - 70753]:      host: '0.0.0.0',
[base0 - 70753]:      path: '/act',
[base0 - 70753]:      protocol: 'http',
[base0 - 70753]:      timeout: 5555,
[base0 - 70753]:      max_listen_attempts: 11,
[base0 - 70753]:      attempt_delay: 222,
[base0 - 70753]:      serverOptions: {},
[base0 - 70753]:      pin:
[base0 - 70753]:       [ 'cmd:*,role:profile',
[base0 - 70753]:         'cmd:probes.get,role:profile',
[base0 - 70753]:         'cmd:campaigns.get,role:profile',
[base0 - 70753]:         'cmd:universes.get,role:profile',
[base0 - 70753]:         'cmd:probes.del,role:profile',
[base0 - 70753]:         'cmd:campaigns.del,role:profile',
[base0 - 70753]:         'cmd:universes.del,role:profile' ],
[base0 - 70753]:      model: 'consume',
[base0 - 70753]:      ismesh: true,
[base0 - 70753]:      msgprefix: 'seneca_',
[base0 - 70753]:      callmax: 111111,
[base0 - 70753]:      msgidlen: 12,
[base0 - 70753]:      role: 'transport',
[base0 - 70753]:      hook: 'listen' },
[base0 - 70753]:   instance: 'uemigxtf6h6w/1529417884911/70833/3.6.0/profiles',
[base0 - 70753]:   'identifier$':
[base0 - 70753]:    'uemigxtf6h6w/1529417884911/70833/3.6.0/profiles~cmd:*,role:profile,cmd:probes.get,role:profile,cmd:campaigns.get,role:profile,cmd:universes.get,role:profile,cmd:probes.del,role:profile,cmd:campaigns.del,role:profile,cmd:universes.del,role:profile~1529417886077',
[base0 - 70753]:   'tag$': 'seneca~mesh',
[base0 - 70753]:   'v$': 0 }
[profiles - 70833]: SNEEZE 86139 add 127.0.0.1:39999 8tcsh6vibtus/1529417858791/70753/3.6.0/base0~cmd:test,role:mesh~1529417859689 seneca~mesh { config:
[profiles - 70833]:    { type: 'web',
[profiles - 70833]:      port: 53238,
[profiles - 70833]:      host: '0.0.0.0',
[profiles - 70833]:      path: '/act',
[profiles - 70833]:      protocol: 'http',
[profiles - 70833]:      timeout: 5555,
[profiles - 70833]:      max_listen_attempts: 11,
[profiles - 70833]:      attempt_delay: 222,
[profiles - 70833]:      serverOptions: {},
[profiles - 70833]:      pin: [ 'cmd:test,role:mesh' ],
[profiles - 70833]:      model: 'consume',
[profiles - 70833]:      ismesh: true,
[profiles - 70833]:      msgprefix: 'seneca_',
[profiles - 70833]:      callmax: 111111,
[profiles - 70833]:      msgidlen: 12,
[profiles - 70833]:      role: 'transport',
[profiles - 70833]:      hook: 'listen' },
[profiles - 70833]:   instance: '8tcsh6vibtus/1529417858791/70753/3.6.0/base0',
[profiles - 70833]:   'identifier$':
[profiles - 70833]:    '8tcsh6vibtus/1529417858791/70753/3.6.0/base0~cmd:test,role:mesh~1529417859689',
[profiles - 70833]:   'tag$': 'seneca~mesh',
[profiles - 70833]:   'v$': 0 }
[profiles - 70833]: Joined the mesh:  { id: 'uemigxtf6h6w/1529417884911/70833/3.6.0/profiles',
[profiles - 70833]:   tag: 'profiles',
[profiles - 70833]:   start: 1529417884911 }
[profiles - 70833]: Deferred services loaded

As I said, I tried every possible combination of timeouts and register methods, I asked for help on their gitter channel, I read the source code of the two main libraries used, an I'm unable to find an answer. I know there are users facing similar problems with rancher as explained here: https://github.com/senecajs/seneca-mesh/issues/75 (by the way, the last comment is mine) but they at least are able to run a basic mesh on rancher, I can't even reach that point.

0

There are 0 best solutions below