Why is PynamoDB not projecting attributes from the main table in my query results?

I've got a PynamoDB table and index mapping defined as follows:

class Id2Index(LocalSecondaryIndex):
    class Meta:
        projection = AllProjection()

    kind = UnicodeAttribute(hash_key=True)
    id2 = UnicodeAttribute(range_key=True)

class Resource(Model):

    kind = UnicodeAttribute(hash_key=True)
    id = UnicodeAttribute(range_key=True)
    id2 = UnicodeAttribute()

    serial_number = NumberAttribute(attr_name='serialNumber')

    created = JavaDateTimeAttribute()
    updated = JavaDateTimeAttribute()
    heartbeat = JavaDateTimeAttribute()

    id2_index = Id2Index()

When I run the following query:

r = Resource.id2_index.query('job', Job.id2 > self.last_job_key))

The records I get back have null (None) values for the attributes created, updated, and heartbeat. From everything I've read, and from code that ChatGPT gave me, specifying projection = AllProjection() should be all I need for the records returned by the query to have these attributes filled in from the main table. In other words, this test fails even though every record I get back has a value for the created attribute in the source table:

assert(next(r).created != None)

What am I missing? Isn't this supposed to just work?

If this isn't supposed to work, then how do I most efficiently go about getting the result that I'm looking for?

UPDATE:

It still isn't clear to me whether I need copies of all of the attributes in the index to get the behavior I want. I've found other threads online suggesting that I do, but I haven't found a definitive statement in the AWS docs. I asked ChatGPT to confirm that I don't, and it appears to confirm this:

You don't need to include the created and updated attributes in the secondary index table to retrieve their values in a query.

When you specify projection = AllProjection() in the index definition, it tells DynamoDB to include all attributes from the base table in the index. Therefore, when you perform a query on the index, you'll receive all of the attributes from the base table, including created and updated.

UPDATE 2:

I'll add some additional info in case it's useful. It explains where the Job reference in my query comes from. I have three subclasses of the Resource class. We persist these three types of resources in the same table, differentiated by the kind attribute, which is the table's hash key. The Job class looks something like this:

class Job(Resource):

    _kind = 'job'

    job_id = NumberAttribute(attr_name="jobId")
    thread_id = NumberAttribute(attr_name="threadId")
    ....

All of the attributes involved in this question are defined in the Resource class, so I can't see how the subclassing of that class should make this problem any more interesting.

We don't mix queries across the three types, so all of our queries specify a single constant hash key ('job', 'thread', or 'instance').

Answered by Doug Naphas

Using AllProjection should cause created, updated, and heartbeat to carry over from the base table to the Local Secondary Index (LSI). If you don't use AllProjection, you have to use IncludeProjection and list the desired attributes.

I believe I am accomplishing what you want to accomplish with the code below.

Carrying over attributes from base table to LSI

Attributes are copied over from the base table to the LSI if they are key attributes, if they are non-key attributes that are listed specifically using the INCLUDE projection option, or if you used the ALL projection option.

The AWS docs say that here:

When you create a secondary index, you need to specify the attributes that will be projected into the index. DynamoDB provides three different options for this:

KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.

INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.

ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
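
To make the three options concrete, here is a minimal pure-Python sketch of which attributes an index item would hold under each projection type. It does not call DynamoDB or PynamoDB: the helper project_item and the sample item are hypothetical, and the key names are simply taken from the Resource model in the question.

```python
# Illustrative model only: mimics which attributes are copied into a
# secondary index item for each projection type. Not a real AWS/PynamoDB API.

TABLE_KEYS = ("kind", "id")    # base table partition key and sort key
INDEX_KEYS = ("kind", "id2")   # index partition key and sort key


def project_item(item, projection_type, non_key_attributes=()):
    """Return the attributes of `item` that the index would hold."""
    if projection_type == "ALL":
        # Every attribute of the base-table item is duplicated in the index.
        return dict(item)
    if projection_type not in ("KEYS_ONLY", "INCLUDE"):
        raise ValueError(f"unknown projection type: {projection_type}")
    # KEYS_ONLY keeps the table keys plus the index keys.
    kept = set(TABLE_KEYS) | set(INDEX_KEYS)
    if projection_type == "INCLUDE":
        # INCLUDE adds the explicitly listed non-key attributes.
        kept |= set(non_key_attributes)
    return {name: value for name, value in item.items() if name in kept}


# Hypothetical sample item using the attribute names from the question.
item = {
    "kind": "job",
    "id": "j-1",
    "id2": "a-1",
    "serialNumber": 7,
    "created": "2023-05-13T16:30:45+0000",
}

keys_only = project_item(item, "KEYS_ONLY")
include = project_item(item, "INCLUDE", non_key_attributes=["created"])
everything = project_item(item, "ALL")

print(sorted(keys_only))   # ['id', 'id2', 'kind']
print(sorted(include))     # ['created', 'id', 'id2', 'kind']
print(sorted(everything))  # ['created', 'id', 'id2', 'kind', 'serialNumber']
```

Under this model, created is available from the index only with INCLUDE (when listed) or ALL, which matches the quoted description.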

Example code

The main difference between mine and yours is that I am using Resource instead of Job in my query statement.

Your query statement is:

r = Resource.id2_index.query('job', Job.id2 > self.last_job_key))

I am not sure if the extra ) is there because that line is part of a larger statement, or if it was a typo.

My query statement is:

resources = Resource.id2_index.query("wood", Resource.id2 > "a")

I used wood and id2 > "a" just as examples of an arbitrary resource kind and query condition intended to ensure my Item would be returned.
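
As a rough illustration of what that key condition selects, here is a pure-Python sketch of the query semantics: items with the given hash key whose id2 is greater than the bound, ordered by id2. The function query_index and the sample items are hypothetical and only model the behavior; they are not how PynamoDB or DynamoDB implement it.

```python
# Illustrative model only: which items a query on the id2 index would
# return for a hash key and an "id2 > bound" range condition.

def query_index(items, kind, id2_greater_than):
    """Select items in one partition whose id2 exceeds the bound, sorted by id2."""
    matches = [
        item for item in items
        if item["kind"] == kind and item["id2"] > id2_greater_than
    ]
    # An index query returns items ordered by the index sort key (id2 here).
    return sorted(matches, key=lambda item: item["id2"])


# Hypothetical sample data; only the first item comes from the answer's script.
items = [
    {"kind": "wood", "id": "w2", "id2": "pine1"},
    {"kind": "wood", "id": "w1", "id2": "oak3"},
    # "B" sorts before "a" in code point (and UTF-8 byte) order, so this
    # item does not satisfy id2 > "a".
    {"kind": "wood", "id": "w3", "id2": "Birch2"},
    # Different hash key, so it is never part of the "wood" query.
    {"kind": "metal", "id": "m1", "id2": "steel1"},
]

result = query_index(items, "wood", "a")
print([item["id2"] for item in result])  # ['oak3', 'pine1']
```

Note that a bound of "a" only matches id2 values that sort after it, so a value starting with an uppercase letter or a digit would be excluded.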

If there is a reason for using Job, please clarify it, and provide the source and explain the purpose of the Job class, and I will attempt to adapt my answer accordingly.

My code is as follows.

# create_and_query_table.py
from pynamodb.models import Model
from pynamodb.indexes import LocalSecondaryIndex, AllProjection
from pynamodb.attributes import UnicodeAttribute, NumberAttribute, UTCDateTimeAttribute
from datetime import datetime, timezone


class Id2Index(LocalSecondaryIndex):
    class Meta:
        # ALL projects every base-table attribute into the index.
        projection = AllProjection()

    kind = UnicodeAttribute(hash_key=True)
    id2 = UnicodeAttribute(range_key=True)


class Resource(Model):
    class Meta:
        table_name = "resources"

    kind = UnicodeAttribute(hash_key=True)
    id = UnicodeAttribute(range_key=True)
    id2 = UnicodeAttribute()

    serial_number = NumberAttribute(attr_name="serialNumber")

    created = UTCDateTimeAttribute()
    updated = UTCDateTimeAttribute()
    heartbeat = UTCDateTimeAttribute()

    id2_index = Id2Index()


if __name__ == "__main__":
    Resource.create_table(wait=True, billing_mode="PAY_PER_REQUEST")
    # Use timezone-aware UTC timestamps so the stored values are unambiguous.
    resource_item = Resource(
        "wood",
        "w2",
        serial_number=1,
        created=datetime.now(timezone.utc),
        updated=datetime.now(timezone.utc),
        heartbeat=datetime.now(timezone.utc),
        id2="pine1",
    )
    resource_item.save()
    # Query the LSI: hash key "wood", range key condition id2 > "a".
    resources = Resource.id2_index.query("wood", Resource.id2 > "a")
    for r in resources:
        assert r.created is not None
        print(
            "Resource {0}\ncreated {1}\nupdated {2}\nheartbeat {3}".format(
                r, r.created, r.updated, r.heartbeat
            )
        )

# Makefile
VENV := venv
MAIN_FILE := create_and_query_table.py

all: help

help:
    @echo "try: help, venv, install, test, run, clean"

$(VENV)/bin/activate:
    python3 -m venv $(VENV)

install: requirements.txt venv
    . $(VENV)/bin/activate; \
    ./$(VENV)/bin/pip install -r requirements.txt

venv: $(VENV)/bin/activate

run: venv
    ./$(VENV)/bin/python3 $(MAIN_FILE)

clean:
    rm -rf $(VENV)
    find . -type f -name '*.pyc' -delete

.PHONY: all help install venv test run clean

# requirements.txt
pynamodb

With the above files in place, I can run the following, which I believe shows the result you are seeking.

$ export AWS_ACCESS_KEY_ID=<access key id>
$ export AWS_SECRET_ACCESS_KEY=<aws secret access key>
$ export AWS_SESSION_TOKEN=<session token>
$ ls
Makefile create_and_query_table.py requirements.txt
$ make install
...<install output>...
$ make run
./venv/bin/python3 create_and_query_table.py
Resource resources<wood, w2>
created 2023-05-13 16:30:45.471429+00:00
updated 2023-05-13 16:30:45.471434+00:00
heartbeat 2023-05-13 16:30:45.471435+00:00

I'm using PynamoDB version 5.5.0.