I ingested all my docs and created a collection with embeddings using Chroma. I have a local directory, db, which contains chroma-collections.parquet and chroma-embeddings.parquet. Neither file is empty. Opening chroma-collections.parquet shows a collection name, a UUID, and null metadata.
When I load it later using LangChain, the collection is empty.
from chromadb.config import Settings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# embeddings_model_name is defined elsewhere in my program
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
CHROMA_SETTINGS = Settings(
    chroma_db_impl='duckdb+parquet',
    persist_directory='db',
    anonymized_telemetry=False,
)
db = Chroma(persist_directory='db', embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
db.get()
returns {'ids': [], 'embeddings': None, 'documents': [], 'metadatas': []}
I've tried several alternative approaches found online, e.g.:
import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                  persist_directory='./db'))
coll = client.get_or_create_collection("langchain", embedding_function=embeddings)
coll.count() returns 0
I'm expecting all the docs and embeddings to be available. What am I missing?
I ran into the same problem and found the cause: my program was running chromadb in JupyterLab (Jupyter Notebook behaves the same way).
The example in the official chromadb Git repository says:
So if your program also runs in a Jupyter environment, the best approach is to call client.persist() every time you need to save your changes to chromadb's local persistence. The example code is as follows:
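A minimal sketch of that pattern, assuming the legacy chromadb API from before version 0.4 (the one that accepts chroma_db_impl="duckdb+parquet" and exposes client.persist()). The helper names make_legacy_client and add_and_persist are hypothetical, as are the sample id, document, and embedding values; they are not taken from the chromadb repo.

```python
def make_legacy_client(persist_directory):
    """Build a duckdb+parquet client (assumes the legacy chromadb API, before 0.4)."""
    # Imported lazily so add_and_persist can be used with any client-like object.
    import chromadb
    from chromadb.config import Settings

    return chromadb.Client(Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=persist_directory,
        anonymized_telemetry=False,
    ))


def add_and_persist(client, collection_name, ids, documents, embeddings):
    """Add records to a collection, then flush them to disk explicitly."""
    collection = client.get_or_create_collection(collection_name)
    collection.add(ids=ids, documents=documents, embeddings=embeddings)
    # In a Jupyter kernel the automatic save may never happen,
    # so write the data to disk by hand after each modification.
    client.persist()
    return collection.count()
```

In a notebook this might be used as add_and_persist(make_legacy_client("db"), "langchain", ["doc-1"], ["hello world"], [[0.1, 0.2, 0.3]]). If the data was ingested through LangChain instead, the Chroma wrapper of that era offered an equivalent db.persist() method, which would need to be called in the ingestion notebook before the kernel is shut down.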