SeldonDeployment stuck in Creating / pods stuck in Pending with Kubeflow installed via manifests


I followed the example at https://github.com/SeldonIO/seldon-core/tree/master/examples/kubeflow.

1. kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:80
2. kubectl create namespace kubeflow-user-example-com
3. kubectl config set-context $(kubectl config current-context) --namespace=kubeflow-user-example-com
4. s2i build . seldonio/seldon-core-s2i-python37:1.2.3 seldon-sentiment:0.1 --env MODEL_NAME=Transformer --env API_TYPE=REST --env SERVICE_TYPE=MODEL --env PERSISTENCE=0
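Step 4 can also be scripted from Python, which keeps each `--env` pair explicit and avoids shell quoting. A minimal sketch, assuming the `s2i` CLI is on `PATH`; `s2i_build_argv` and `run_s2i_build` are hypothetical helpers, not part of s2i or Seldon Core:

```python
import shlex
import subprocess


def s2i_build_argv(source_dir, builder_image, output_tag, env):
    """Return the argv list for `s2i build`, with one --env flag per variable."""
    argv = ["s2i", "build", source_dir, builder_image, output_tag]
    for key, value in env.items():
        argv += ["--env", f"{key}={value}"]
    return argv


def run_s2i_build(argv):
    """Run the build; raises CalledProcessError if s2i exits non-zero."""
    subprocess.run(argv, check=True)


# The same variables as in step 4.
SELDON_ENV = {
    "MODEL_NAME": "Transformer",  # name of the model class the Seldon wrapper loads
    "API_TYPE": "REST",
    "SERVICE_TYPE": "MODEL",
    "PERSISTENCE": "0",
}

if __name__ == "__main__":
    argv = s2i_build_argv(".", "seldonio/seldon-core-s2i-python37:1.2.3",
                          "seldon-sentiment:0.1", SELDON_ENV)
    # Print the equivalent shell command; call run_s2i_build(argv) to execute it.
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` means no shell is involved, so values containing spaces would need no extra quoting.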

s2i builds the following class:

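class Transformer(object):
    def __init__(self):
        # Model loading is disabled for this test. A method whose body is
        # only comments is rejected by Python (IndentationError), hence `pass`.
        # with open('/mnt/lr.model', 'rb') as model_file:
        #    self._lr_model = dill.load(model_file)
        pass

    def predict(self, X, feature_names):
        # Identity model: return the input unchanged.
        # logging.warning(X)
        # prediction = self._lr_model.predict_proba(X)
        # logging.warning(prediction)
        return X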
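Independently of the cluster, the class can be sanity-checked by calling `predict` directly, the way the Seldon Python wrapper calls it with the request data and the feature names. A minimal sketch; it uses plain lists instead of the NumPy arrays the wrapper typically passes for `ndarray` payloads, so it runs without extra dependencies:

```python
class Transformer(object):
    """Identity model: returns the request payload unchanged."""

    def __init__(self):
        pass  # model loading is commented out in the original class

    def predict(self, X, feature_names):
        return X


if __name__ == "__main__":
    model = Transformer()
    # Call predict() as the wrapper would, with data plus feature names.
    print(model.predict([[1.0, 2.0, 3.0]], ["a", "b", "c"]))  # prints [[1.0, 2.0, 3.0]]
```

If this fails locally, the container would fail the same way after deployment, so it is a cheap check before rebuilding the image.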

The build succeeds:

root@ubuntu-16gb-nbg1-3:/usr/src# s2i build . seldonio/seldon-core-s2i-python37:1.2.3 seldon-sentiment:0.1
---> Installing application source...
---> Installing dependencies ...
Looking in links: /whl
Collecting dill==0.3.2 (from -r requirements.txt (line 1))
  WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/e2/96/518a8ea959a734b70d2e95fef98bcbfdc7adad1c1e5f5dd9148c835205a5/dill-0.3.2.zip (177kB)
Requirement already satisfied: click==7.1.2 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (7.1.2)
Requirement already satisfied: numpy==1.19.1 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (1.19.1)
Collecting scikit-learn==0.23.2 (from -r requirements.txt (line 4))
  WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8MB)
Collecting joblib>=0.11 (from scikit-learn==0.23.2->-r requirements.txt (line 4))
  WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/91/d4/3b4c8e5a30604df4c7518c562d4bf0502f2fa29221459226e140cf846512/joblib-1.2.0-py3-none-any.whl (297kB)
Collecting scipy>=0.19.1 (from scikit-learn==0.23.2->-r requirements.txt (line 4))
  WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/58/4f/11f34cfc57ead25752a7992b069c36f5d18421958ebd6466ecd849aeaf86/scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1MB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn==0.23.2->-r requirements.txt (line 4))
  WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/61/cf/6e354304bcb9c6413c4e02a747b600061c21d38ba51e7e544ac7bc66aecc/threadpoolctl-3.1.0-py3-none-any.whl
Building wheels for collected packages: dill
Building wheel for dill (setup.py): started
Building wheel for dill (setup.py): finished with status 'done'
Created wheel for dill: filename=dill-0.3.2-cp37-none-any.whl size=78913 sha256=50910efb2cba1272a015391f4aff7604a5c4a48c855bfe5f597a43a54e44ab6d
Stored in directory: /root/.cache/pip/wheels/27/4b/a2/34ccdcc2f158742cfe9650675560dea85f78c3f4628f7daad0
Successfully built dill
Installing collected packages: dill, joblib, scipy, threadpoolctl, scikit-learn
Successfully installed dill-0.3.2 joblib-1.2.0 scikit-learn-0.23.2 scipy-1.7.3 threadpoolctl-3.1.0
WARNING: Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Collecting pip-licenses
Downloading https://files.pythonhosted.org/packages/61/f5/3038406547e36376c3a17a6774f61c2e9ccb65777eabf0a20708e4dacd3d/pip_licenses-3.5.4-py3-none-any.whl
Collecting PTable (from pip-licenses)
Downloading https://files.pythonhosted.org/packages/ab/b3/b54301811173ca94119eb474634f120a49cd370f257d1aae5a4abaf12729/PTable-0.9.2.tar.gz
Building wheels for collected packages: PTable
Building wheel for PTable (setup.py): started
Building wheel for PTable (setup.py): finished with status 'done'
Created wheel for PTable: filename=PTable-0.9.2-cp37-none-any.whl size=22906 sha256=7310d6e2974596f5fe2f72c3553b5386faab7eac59448d6c3e86b5d0bd3a775f
Stored in directory: /root/.cache/pip/wheels/22/cc/2e/55980bfe86393df3e9896146a01f6802978d09d7ebcba5ea56
Successfully built PTable
Installing collected packages: PTable, pip-licenses
Successfully installed PTable-0.9.2 pip-licenses-3.5.4
created path: ./licenses/license_info.csv
created path: ./licenses/license.txt
Build completed successfully

After the successful build, I proceeded with step 5:

5. kubectl create -f seldon-sentiment-test.yaml

The YAML looks like this:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: seldon-sentiment-test
  namespace: kubeflow-user-example-com
spec:
  annotations:
    project_name: NLP Pipeline
    deployment_version: v1
  name: seldon-sentiment-test
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldon-sentiment:0.1
          imagePullPolicy: IfNotPresent
          name: sentiment
          resources:
            requests:
              memory: 1Mi
        terminationGracePeriodSeconds: 20
    graph:
      children: []
      endpoint:
        type: REST
      name: sentiment
      type: MODEL
    name: sentiment
    replicas: 1
    annotations:
      predictor_version: v1
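The `image` field is what the container runtime on the scheduled node has to resolve. `seldon-sentiment:0.1` carries no registry prefix, so with `imagePullPolicy: IfNotPresent` the pod can only start if that image is already available to the node's runtime; an image that s2i built into a local Docker daemon may not be visible there (for example if the cluster runs containerd). A minimal sketch that lists every container image in a SeldonDeployment, with the manifest above mirrored as a Python dict (only the fields used here); `container_images` is a hypothetical helper:

```python
# The relevant part of the manifest above, as a Python dict.
SDEP = {
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {"name": "seldon-sentiment-test",
                 "namespace": "kubeflow-user-example-com"},
    "spec": {
        "predictors": [{
            "name": "sentiment",
            "componentSpecs": [{
                "spec": {"containers": [{
                    "name": "sentiment",
                    "image": "seldon-sentiment:0.1",
                    "imagePullPolicy": "IfNotPresent",
                }]},
            }],
        }],
    },
}


def container_images(sdep):
    """Yield (predictor, container, image) for every container in the spec."""
    for predictor in sdep["spec"]["predictors"]:
        for component in predictor.get("componentSpecs", []):
            for container in component["spec"].get("containers", []):
                yield predictor["name"], container["name"], container["image"]


if __name__ == "__main__":
    for row in container_images(SDEP):
        print(row)  # ('sentiment', 'sentiment', 'seldon-sentiment:0.1')
```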

Then I checked the status, and the deployment is stuck in Creating:

kubectl get sdep -n kubeflow-user-example-com seldon-sentiment-test -o json | jq .status

Output:

{
  "address": {
    "url": "http://seldon-sentiment-test-sentiment.kubeflow-user-example-com.svc.cluster.local:8000/api/v1.0/predictions"
  },
  "deploymentStatus": {
    "seldon-sentiment-test-sentiment-0-sentiment": {
      "replicas": 1
    }
  },
  "replicas": 1,
  "serviceStatus": {
    "seldon-sentiment-test-sentiment-sentiment": {
      "grpcEndpoint": "seldon-sentiment-test-sentiment-sentiment.kubeflow-user-example-com:9500",
      "httpEndpoint": "seldon-sentiment-test-sentiment-sentiment.kubeflow-user-example-com:9000",
      "svcName": "seldon-sentiment-test-sentiment-sentiment"
    }
  },
  "state": "Creating"
}
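The same check can be scripted. A minimal sketch, assuming the operator adds `availableReplicas` to each `deploymentStatus` entry once its pods are ready, as in the Seldon Core 1.x status output (treat this as an assumption for other versions); the missing field in the output above is consistent with no ready replica. `unavailable_deployments` and `summarize` are hypothetical helpers:

```python
import json


def unavailable_deployments(status):
    """Return names of deployments with fewer available than desired replicas.

    A missing availableReplicas field is treated as 0 ready replicas.
    """
    lagging = []
    for name, info in status.get("deploymentStatus", {}).items():
        if info.get("availableReplicas", 0) < info.get("replicas", 0):
            lagging.append(name)
    return lagging


def summarize(status_json):
    """Return (state, lagging deployments) for a SeldonDeployment .status JSON."""
    status = json.loads(status_json)
    return status.get("state", "Unknown"), unavailable_deployments(status)


# Trimmed copy of the .status output above.
SAMPLE = """{"deploymentStatus": {"seldon-sentiment-test-sentiment-0-sentiment": {"replicas": 1}},
             "replicas": 1, "state": "Creating"}"""

if __name__ == "__main__":
    # In practice the JSON would be the "status" key of:
    #   kubectl get sdep seldon-sentiment-test -n kubeflow-user-example-com -o json
    print(summarize(SAMPLE))
```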

The pod is stuck in Pending as well:

kubeflow-user-example-com   seldon-sentiment-test-sentiment-0-sentiment-8946df95-qq688   0/3     Pending            0          3m9s
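`0/3` means none of the pod's three containers is ready (presumably the model container, Seldon's executor, and an Istio sidecar injected in the Kubeflow user namespace). The `Events` section of `kubectl describe pod` normally states why a pod stays `Pending`. A minimal sketch that parses such a row and builds that command, assuming the column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...); `parse_pod_row` and `describe_command` are hypothetical helpers:

```python
from typing import NamedTuple


class PodRow(NamedTuple):
    namespace: str
    name: str
    ready: int
    total: int
    status: str


def parse_pod_row(line):
    """Parse one row of `kubectl get pods -A` into its first four columns."""
    namespace, name, ready, status = line.split()[:4]
    ready_count, total = (int(n) for n in ready.split("/"))
    return PodRow(namespace, name, ready_count, total, status)


def describe_command(row):
    """Return the kubectl command whose Events explain a pod's state."""
    return ["kubectl", "describe", "pod", row.name, "-n", row.namespace]


# The row from the question.
ROW = ("kubeflow-user-example-com   seldon-sentiment-test-sentiment-0-sentiment-8946df95-qq688"
       "   0/3     Pending            0          3m9s")

if __name__ == "__main__":
    row = parse_pod_row(ROW)
    if row.status != "Running" or row.ready < row.total:
        print(" ".join(describe_command(row)))
```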

Answer:


I pushed the image to a local Docker registry, and the pods start now:

docker run -d -p 5000:5000 --restart=always --name registry registry:2
docker tag seldon-sentiment:0.1 localhost:5000/seldon-sentiment:0.1
docker push localhost:5000/seldon-sentiment:0.1
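The answer does not show the manifest after the push. For the node to pull from that registry, the `image` field presumably has to name the pushed tag, `localhost:5000/seldon-sentiment:0.1`, instead of the bare `seldon-sentiment:0.1`; note that `localhost` is resolved on the node, so this assumes the registry runs on the same machine (for example a single-node cluster). A minimal sketch of that patch on a trimmed SeldonDeployment dict; `with_registry` is a hypothetical helper:

```python
import copy


def with_registry(sdep, registry):
    """Return a copy of `sdep` whose container images point at `registry`.

    Images that already start with the registry prefix are left unchanged.
    """
    patched = copy.deepcopy(sdep)
    prefix = registry.rstrip("/") + "/"
    for predictor in patched["spec"]["predictors"]:
        for component in predictor.get("componentSpecs", []):
            for container in component["spec"].get("containers", []):
                if not container["image"].startswith(prefix):
                    container["image"] = prefix + container["image"]
    return patched


# Trimmed to the fields the function touches.
SDEP = {"spec": {"predictors": [{"componentSpecs": [{"spec": {"containers": [
    {"name": "sentiment", "image": "seldon-sentiment:0.1"}]}}]}]}}

if __name__ == "__main__":
    patched = with_registry(SDEP, "localhost:5000")
    container = patched["spec"]["predictors"][0]["componentSpecs"][0]["spec"]["containers"][0]
    print(container["image"])  # localhost:5000/seldon-sentiment:0.1
```

Copying the dict first leaves the original manifest untouched, and the prefix check makes the patch safe to apply twice.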