Google Cloud Talent Solution: how to use page_token


I'm trying to use v4beta1 of GCTS - search_jobs()

The docs: https://cloud.google.com/talent-solution/job-search/docs/reference/rest/v4beta1/projects.jobs/search

There are references to a pageToken parameter, but in \google\cloud\talent_v4beta1\gapic\job_service_client.py there is no such parameter in the function definition:

def search_jobs(
    self,
    parent,
    request_metadata,
    search_mode=None,
    job_query=None,
    enable_broadening=None,
    require_precise_result_size=None,
    histogram_queries=None,
    job_view=None,
    offset=None,
    page_size=None,
    order_by=None,
    diversification_level=None,
    custom_ranking_info=None,
    disable_keyword_match=None,
    retry=google.api_core.gapic_v1.method.DEFAULT,
    timeout=google.api_core.gapic_v1.method.DEFAULT,
    metadata=None,
):

page_token is mentioned in the comments, e.g. in the description of the offset parameter.

How do I specify the page token for job searches?

I've specified require_precise_result_size=False, but the return value doesn't contain SearchJobsResponse.estimated_total_size. Is this a clue that search_jobs() isn't being set to the desired "mode"?


There is 1 answer below.


I believe the pageToken is abstracted away for you by the Python client library. If you go down to the end of the search_jobs method in the source, you will see it builds an iterator that is aware of the page_token and next_page_token fields:

iterator = google.api_core.page_iterator.GRPCIterator(
    client=None,
    method=functools.partial(
        self._inner_api_calls["search_jobs"],
        retry=retry,
        timeout=timeout,
        metadata=metadata,
    ),
    request=request,
    items_field="matching_jobs",
    request_token_field="page_token",
    response_token_field="next_page_token",
)
return iterator

So all you should need to do is the following - copied from the docs at https://googleapis.github.io/google-cloud-python/latest/talent/gapic/v4beta1/api.html:

from google.cloud import talent_v4beta1

client = talent_v4beta1.JobServiceClient()
parent = client.tenant_path('[PROJECT]', '[TENANT]')

# TODO: Initialize `request_metadata`:
request_metadata = {}

# Iterate over all results
for element in client.search_jobs(parent, request_metadata):
    # process element
    pass


# Alternatively:
# Iterate over results one page at a time
for page in client.search_jobs(parent, request_metadata).pages:
    for element in page:
        # process element
        pass

The default page size is apparently 10; you can modify it with the page_size parameter (see the sketch after the links below). Page iterator documentation can be found here:

Doco: https://googleapis.github.io/google-cloud-python/latest/core/page_iterator.html

Source: https://googleapis.github.io/google-cloud-python/latest/_modules/google/api_core/page_iterator.html#GRPCIterator
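
For example, a minimal sketch (reusing the same client, parent, and request_metadata as in the snippet above) that requests 50 results per page:

from google.cloud import talent_v4beta1

client = talent_v4beta1.JobServiceClient()
parent = client.tenant_path('[PROJECT]', '[TENANT]')
request_metadata = {}  # as in the question

# Each underlying request now fetches 50 results instead of the default 10;
# the iterator still pages through the full result set transparently.
for job in client.search_jobs(parent, request_metadata, page_size=50):
    pass  # process job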

Probably the simplest way to deal with this is to consume all results using

all_results = list(results_iterator)

If you have massive amounts of data and don't want to page through it in one go, I would do the following. The .pages property just returns a generator that you can work with as usual.

results_iterator = client.search_jobs(parent, request_metadata)
pages = results_iterator.pages

current_page = next(pages)
# do work with the page
current_item = next(current_page)

current_page = next(pages)
# etc...

You would need to catch the StopIteration error for when you run out of items or pages (there is a sketch after the link below):

https://anandology.com/python-practice-book/iterators.html
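
For example, a sketch of the manual paging loop above with the StopIteration handling added (same client, parent, and request_metadata as before):

results_iterator = client.search_jobs(parent, request_metadata)
pages = results_iterator.pages

while True:
    try:
        page = next(pages)  # raises StopIteration when no pages remain
    except StopIteration:
        break
    for job in page:
        pass  # process each job on this page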

The page iterator source shows why:

def _page_iter(self, increment):
    """Generator of pages of API responses.

    Args:
        increment (bool): Flag indicating if the total number of results
            should be incremented on each page. This is useful since a page
            iterator will want to increment by results per page while an
            items iterator will want to increment per item.

    Yields:
        Page: each page of items from the API.
    """
    page = self._next_page()
    while page is not None:
        self.page_number += 1
        if increment:
            self.num_results += page.num_items
        yield page
        page = self._next_page()

See how it calls _next_page again after the yield? That call checks for more pages and performs another request for you if any exist.

def _next_page(self):
    """Get the next page in the iterator.

    Returns:
        Page: The next page in the iterator or :data:`None` if
            there are no pages left.
    """
    if not self._has_next_page():
        return None

    if self.next_page_token is not None:
        setattr(self._request, self._request_token_field, self.next_page_token)

    response = self._method(self._request)

    self.next_page_token = getattr(response, self._response_token_field)
    items = getattr(response, self._items_field)
    page = Page(self, items, self.item_to_value)

    return page
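
To make the token plumbing concrete, here is a toy illustration of the same pattern (fake_api is made up for this example, not part of the real client): each response carries the token for the next request, and iteration stops when no token comes back:

def fake_api(page_token=None):
    # maps the incoming token to (items, next_page_token); None means first page
    data = {
        None: (["a", "b"], "t1"),
        "t1": (["c", "d"], "t2"),
        "t2": (["e"], None),  # last page: no next token
    }
    return data[page_token]

def iterate_all():
    token = None
    while True:
        items, token = fake_api(token)
        for item in items:
            yield item
        if token is None:  # mirrors the iterator running out of pages
            break

print(list(iterate_all()))  # ['a', 'b', 'c', 'd', 'e']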

If you want a sessionless option, you can use offset + page size and pass the current offset back to the user on each AJAX request:

offset (int) –

Optional. An integer that specifies the current offset (that is, starting result location, amongst the jobs deemed by the API as relevant) in search results. This field is only considered if page_token is unset.

For example, 0 means to return results starting from the first matching job, and 10 means to return from the 11th job. This can be used for pagination (for example, pageSize = 10 and offset = 10 means to return from the second page).
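
A minimal sketch of that sessionless approach, with a hypothetical fetch_page helper (reusing the client, parent, and request_metadata from above); each call returns one page of jobs plus the offset the caller should send back for the next request:

PAGE_SIZE = 10

def fetch_page(offset=0):
    # offset is only honored when page_token is unset, per the docs above
    results = client.search_jobs(
        parent,
        request_metadata,
        page_size=PAGE_SIZE,
        offset=offset,
    )
    # take only the first page rather than letting the iterator keep fetching
    page = next(results.pages)
    jobs = list(page)
    # assumes a full page came back; a real handler would check len(jobs)
    return jobs, offset + PAGE_SIZE

jobs, next_offset = fetch_page(0)            # first page
jobs, next_offset = fetch_page(next_offset)  # second page, and so on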