Pulsar Broker Fails to Start in Second Instance of GCP Two-Instance Cluster

I am trying to create an Apache Pulsar cluster on two GCE instances. Following the instructions in https://pulsar.apache.org/docs/en/deploy-bare-metal/, I wrote the Ansible playbook below (I started the ZooKeeper services beforehand, and they are running fine):

---
- name: Install Apache Pulsar on Ubuntu
  hosts: pulsar_servers
  become: true
  vars:
    pulsar_instance_1: "{{ hostvars['pulsar_instance_1'].ansible_host }}"
    pulsar_instance_2: "{{ hostvars['pulsar_instance_2'].ansible_host }}"

  tasks:
    - name: Download Pulsar binary package
      get_url:
        url: "https://archive.apache.org/dist/pulsar/pulsar-3.1.1/apache-pulsar-3.1.1-bin.tar.gz"
        dest: /tmp/pulsar.tgz

    - name: Extract Pulsar binary package
      unarchive:
        src: /tmp/pulsar.tgz
        dest: /opt
        remote_src: yes

    - name: Add custom hostnames to /etc/hosts file
      lineinfile:
        path: /etc/hosts
        line: "{{ item }}"
      with_items:
        - "{{ pulsar_instance_1 }} pulsar1"
        - "{{ pulsar_instance_2 }} pulsar2"

    - name: Set ansible_ssh_host variable
      set_fact:
        ansible_ssh_host: "{{ ansible_ssh_host | default(inventory_hostname) }}"

    - name: Initialize cluster metadata
      command: >-
        /opt/apache-pulsar-3.1.1/bin/pulsar initialize-cluster-metadata
        --cluster codex-pulsar-cluster
        --metadata-store zk:zookeeper1:2181,zookeeper2:2181
        --configuration-metadata-store zk:zookeeper1:2181,zookeeper2:2181
        --web-service-url http://pulsar1:8080,pulsar2:8080
        --broker-service-url pulsar://pulsar1:6650,pulsar2:6650
      when: ansible_ssh_host == pulsar_instance_1

    - name: Edit the BookKeeper configuration file
      lineinfile:
        path: /opt/apache-pulsar-3.1.1/conf/bookkeeper.conf
        regexp: "^metadataServiceUri="
        line: "metadataServiceUri=zk://{{ ansible_ssh_host }}:2181/ledgers"

    - name: Start the BookKeeper service
      command: /opt/apache-pulsar-3.1.1/bin/pulsar-daemon start bookie

    - name: Edit the Broker configuration file
      become: true
      lineinfile:
        path: /opt/apache-pulsar-3.1.1/conf/broker.conf
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: "^metadataStoreUrl=", line: "metadataStoreUrl=zk:zookeeper1:2181,zookeeper2:2181" }
        - { regexp: "^configurationMetadataStoreUrl=", line: "configurationMetadataStoreUrl=zk:zookeeper1:2181,zookeeper2:2181" }
        - { regexp: "^advertisedAddress=", line: "advertisedAddress={{ ansible_ssh_host }}" }
        - { regexp: "^clusterName=", line: "clusterName=codex-pulsar-cluster" }
        - { regexp: "^brokerServicePort=", line: "brokerServicePort=6650" }
        - { regexp: "^loadBalancerEnabled=", line: "loadBalancerEnabled=false" }
        - { regexp: "^webServicePort=", line: "webServicePort=8081" }

    - name: Start the Broker
      command: /opt/apache-pulsar-3.1.1/bin/pulsar-daemon start broker
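
A detail worth checking in the playbook above: `initialize-cluster-metadata` registers the web service as `http://pulsar1:8080,pulsar2:8080`, but the broker configuration task sets `webServicePort=8081`. The bash sketch below prints the values a broker would read from `broker.conf` so they can be compared on each instance. `conf_get` is a hypothetical helper, and it assumes plain `key=value` lines with no spaces around `=`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a Pulsar-style key=value
# config file. Commented-out lines are skipped; if KEY appears more than once,
# the last occurrence is printed. Assumes no spaces around '='.
conf_get() {
  local file="$1" key="$2"
  grep -E "^[[:space:]]*${key}=" "$file" | tail -n 1 | cut -d '=' -f 2-
}

# Install path assumed from the playbook; run this on each instance.
BROKER_CONF=/opt/apache-pulsar-3.1.1/conf/broker.conf

if [ -f "$BROKER_CONF" ]; then
  for key in advertisedAddress brokerServicePort webServicePort metadataStoreUrl; do
    printf '%s=%s\n' "$key" "$(conf_get "$BROKER_CONF" "$key")"
  done
fi
```

Since `advertisedAddress` is templated from `ansible_ssh_host`, it should differ between the two instances; the other printed values would be expected to match.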

The problem is that the broker on the second instance does not start, while the one on the first instance does. I can produce and consume messages through the first instance as long as the bookies on both instances are running. But when I start the broker on the second instance, the bookie process on that instance gets killed and the broker does not start either. This is the output I get when starting the broker on the second instance:

$ sudo /opt/apache-pulsar-3.1.1/bin/pulsar-daemon start broker
doing start broker ...
starting broker, logging to /opt/apache-pulsar-3.1.1/logs/pulsar-broker-kafka-02.log
Note: Set immediateFlush to true in conf/log4j2.yaml will guarantee the logging event is flushing to disk immediately. The default behavior is switched off due to performance considerations.
bin/pulsar-daemon: line 148: 11364 Killed                  nohup $pulsar $command "$1" > "$out" 2>&1 < /dev/null
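
`Killed` in that output is how the shell reports that the broker process (PID 11364) was terminated by SIGKILL rather than exiting on its own. A common source of an unexplained SIGKILL on Linux is the kernel's out-of-memory (OOM) killer, which would also be consistent with the bookie on the same machine dying at the same time; this is a hypothesis to check, not a confirmed diagnosis. The sketch below filters kernel log lines for OOM-killer entries. `oom_events` is a hypothetical helper, its patterns are a heuristic, and the env-file names in the comments assume the stock `conf/` directory of the 3.1.1 tarball.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter kernel log lines (read from stdin) down to
# entries that look like they were written by the Linux OOM killer.
# The patterns are a heuristic; exact wording varies between kernel versions.
oom_events() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Example usage on the second instance (reading the kernel log may need root):
#   sudo dmesg -T | oom_events
#   sudo journalctl -k --since "1 hour ago" | oom_events
#
# Then compare free memory with the JVM memory settings the start scripts use
# (assumed: PULSAR_MEM for the broker, BOOKIE_MEM for the bookie):
#   free -h
#   grep -hE '^(PULSAR_MEM|BOOKIE_MEM)=' \
#     /opt/apache-pulsar-3.1.1/conf/pulsar_env.sh \
#     /opt/apache-pulsar-3.1.1/conf/bkenv.sh
```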

Both the bookies and the brokers use the same configuration on the two instances. Am I missing something? Starting the broker works fine on the first instance, so why does this happen only on the second one?
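
To confirm that the two instances really run with the same settings, a bash sketch that compares only the effective (uncommented, non-blank) lines of two config files. `effective_conf` and `conf_diff` are hypothetical helper names, and `<instance-1>`/`<instance-2>` are placeholders for however each VM is reached over SSH. With the playbook above, some differences are expected: `advertisedAddress` in `broker.conf` and `metadataServiceUri` in `bookkeeper.conf` are both templated from `ansible_ssh_host`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print a config file's effective settings (no comments,
# no blank lines, sorted) and diff two files on those settings only.
effective_conf() {
  grep -vE '^[[:space:]]*(#|$)' "$1" | sort
}

conf_diff() {
  local a b rc
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  effective_conf "$1" > "$a"
  effective_conf "$2" > "$b"
  diff "$a" "$b"   # exit status: 0 = same, 1 = different
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}

# Example usage from a machine that can SSH to both instances:
#   scp <instance-1>:/opt/apache-pulsar-3.1.1/conf/broker.conf broker-1.conf
#   scp <instance-2>:/opt/apache-pulsar-3.1.1/conf/broker.conf broker-2.conf
#   conf_diff broker-1.conf broker-2.conf
```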
