I have the following code (fragment):
# 6. index documents to Elasticsearch
kb = ElasticsearchStore(
    es_connection=es,
    index_name=index_name,
    embedding=emb_func,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),
    distance_strategy="DOT_PRODUCT"
)
print("create vector store in Elasticsearch")
try:
    _ = kb.add_texts(
        texts=documents['page_content'].tolist(),
        metadatas=[{'source': source} for source in documents['source']],
        index_name=index_name,
        ids=[str(i) for i in range(len(documents))]  # unique for each doc
    )
except Exception as e:
    print("Failed to index documents:", e)
This runs inside a container. It works fine when the container runs as root, but when I run it as a non-root user it fails like this:
Error adding texts: 88 document(s) failed to index.
First error reason: [1:8471] failed to parse: The [dot_product] similarity can only be used with unit-length vectors. Preview of invalid vector: [-0.18178311, -0.02720352, 0.04890755, -0.10870888, -0.10545816, ...]
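As I understand the error, each embedding vector must have an L2 norm of 1 for dot_product similarity. Below is a minimal sketch of how that could be checked, assuming the embeddings are plain lists of floats. The helper names and the tolerance are hypothetical choices for illustration and are not part of LangChain or Elasticsearch.

```python
import math


def l2_norm(vector):
    """Return the Euclidean (L2) length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def is_unit_length(vector, tol=1e-4):
    """True if the L2 norm is within tol of 1.0 (tol is an arbitrary assumption)."""
    return abs(l2_norm(vector) - 1.0) <= tol


def normalize(vector):
    """Scale a non-zero vector so that its L2 norm becomes 1.0."""
    norm = l2_norm(vector)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Hypothetical 2-d example, not one of the real embeddings
example = [3.0, 4.0]
print(l2_norm(example))                    # length before normalizing
print(is_unit_length(example))             # unit-length check on the raw vector
print(is_unit_length(normalize(example)))  # unit-length check after normalizing
```

Assuming emb_func is a LangChain embeddings object that exposes embed_query, running something like is_unit_length(emb_func.embed_query("test")) as both root and the non-root user should show whether the embeddings themselves differ between the two runs.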
Here is my Dockerfile:
# Dockerfile
## 1: Base image
FROM registry.access.redhat.com/ubi8/ubi-minimal
USER root
## 2. Latest security updates && OS packages
RUN microdnf install -y python3.11
RUN microdnf install -y python3.11-pip
RUN microdnf clean all
## 4. Initialize application sources
WORKDIR /app
## 5. Application source
## Copy the application source and build artifacts from the builder image to this one
COPY --chown=1001:0 requirements.txt ingestion.py ./
# added to make cache writable
RUN chmod -R g+w /app
ENV TRANSFORMERS_CACHE = '/app/cache/'
# 6. Install the dependencies
RUN pip3 install -U "pip>=19.3.1" && \
    pip3 install --no-cache-dir -r requirements.txt
USER 1001
## Run script uses standard ways to run the application
CMD python ingestion.py
Any ideas why it fails when running as non-root?