Lazy import from a module (a.k.a. lazy evaluation of variable)


Lazy imports in Python have long been discussed, and some proposals (for example PEP 690 - Lazy Imports) have been made to make them a built-in (optional) feature in the future.

I am developing a CLI package, so startup time is very important, and I would like to speed it up by lazy loading some of the modules I am using.

What I have so far
By adapting the lazy-import function from Python's importlib documentation, I built the following LazyImport class:

import importlib.util
import sys
from types import ModuleType

class LazyImport:
    def __init__(self):
        pass

    def __new__(
            cls,
            name: str,
    ) -> ModuleType:
        try:
            return sys.modules[name]
        except KeyError:
            spec = importlib.util.find_spec(name)
            if spec:
                loader = importlib.util.LazyLoader(spec.loader)
                spec.loader = loader
                module = importlib.util.module_from_spec(spec)
                sys.modules[name] = module
                loader.exec_module(module)
                return module
            else:
                raise ModuleNotFoundError(f"No module named '{name}'") from None

Note: This is the best way I could think of to turn the function into a class, but I welcome feedback on this too if you have a better way.

This works just fine for top-level module imports:

Instead of importing (for example) xarray as

import xarray as xr

I would run

xr = LazyImport('xarray')

and everything works as expected, with the difference that the xarray module is added to sys.modules but is not loaded yet (the module's code has not been executed).
The module is loaded (so its code runs) only when the variable xr is first used, for example by accessing an attribute or submodule, or by displaying it. So, for the example above, any of these statements would load the xarray module:

  • xr.DataArray([1,2,3])
  • print(xr)
  • xr
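
The deferred execution described above can be checked directly. This is only a minimal sketch: it assumes a throwaway module named lazy_demo_mod (a hypothetical name, written to a temporary directory just for the demo) and uses a plain-function version of the loader. It relies on the fact that the placeholder returned by LazyLoader is not a plain ModuleType until the first attribute access.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
import types


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named '{name}'")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only sets up the lazy placeholder
    return module


# Hypothetical throwaway module, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "lazy_demo_mod.py"), "w") as fh:
    fh.write("VALUE = 42\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

mod = lazy_import("lazy_demo_mod")
registered_early = "lazy_demo_mod" in sys.modules     # already registered
pending_before = type(mod) is not types.ModuleType    # still a placeholder
value = mod.VALUE                                      # first attribute access
pending_after = type(mod) is not types.ModuleType     # now a plain module
```

Note that type(mod) does not trigger the load, because it does not go through the module's attribute lookup.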

What I want
Now I would like to achieve the same result when importing a class, function or variable from a module.
So (for example) instead of importing the xarray.DataArray class through:

from xarray import DataArray as Da

I want to have something like:

Da = LazyImport('DataArray', _from='xarray')

so that the xarray module is added to sys.modules but not loaded yet, and gets loaded only when I first reference the Da variable. The Da variable would then refer to the DataArray class of the xarray module.

What I tried
I tried some options such as

xr = LazyImport('xarray')
Da = getattr(xr, 'DataArray')

or modifying the LazyImport class, but as soon as I access an attribute of xr, the xarray module gets loaded. I could not manage to create a Da variable without loading xarray.

Referring to the example above, what I need is basically lazy evaluation of the Da variable, so that it evaluates (to the DataArray class of the xarray module) only when I first reference Da (and therefore the module's code runs only at that point).

Also, I don't want to have to call a method on Da to trigger the evaluation (something like Da.load() for example); I want the variable to be evaluated directly when it is first referenced.

I looked at some external libraries (such as lazy_loader), but I haven't found one that allows lazy importing of Classes and variables from external modules (modules other than the one you are developing).

Does anyone know a solution for the implementation of lazy imports from a module?


There are 2 best solutions below

0
On

This answer may not be very satisfying, but I think I've come as close as possible to lazy loading of arbitrary objects.

import importlib.util
import sys
from types import ModuleType
from inspect import getattr_static
from typing import Any
from collections.abc import Callable

def get_real_object(lazy_attribute: 'LazyAttribute') -> Any:
    obj = getattr_static(lazy_attribute, 'obj')
    attr = getattr_static(lazy_attribute, 'attr')
    return getattr(obj, attr)

class LazyAttribute:
    def __init__(self, obj: ModuleType, attr: str) -> None:
        self.obj = obj
        self.attr = attr
    def __getattribute__(self, attr: str) -> Any:
        return getattr(get_real_object(self), attr)

def getmethod(name: str) -> Callable:
    def method(self, *args: Any, **kwargs: Any) -> Any:
        real_object = get_real_object(self)
        return getattr(type(real_object), name)(real_object, *args, **kwargs)
    method.__name__ = name
    return method

# proxied magic methods
# Python does not use its dynamic lookup mechanisms for those, so they will
# really need to be set on the LazyAttribute class
# I just did a few, you can add what you need. You can add the whole object model if you like (except __getattribute__ and __init__).
for name in ['__call__', '__lt__', '__eq__', '__repr__', '__str__', '__gt__']:
    setattr(LazyAttribute, name, getmethod(name))

def lazy_import(name: str) -> ModuleType:
    try:
        return sys.modules[name]
    except KeyError:
        spec = importlib.util.find_spec(name)
        if spec:
            loader = importlib.util.LazyLoader(spec.loader)
            spec.loader = loader
            module = importlib.util.module_from_spec(spec)
            sys.modules[name] = module
            loader.exec_module(module)
            return module
        else:
            raise ModuleNotFoundError(f"No module named '{name}'") from None

def lazy_from_import(module: str, name: str) -> LazyAttribute:
    return LazyAttribute(lazy_import(module), name)

########
# Example:

# in testmod.py:
#   one = 1
#   def two():
#       print('hello')
#   class Three:
#       pass
one = lazy_from_import('testmod', 'one')
two = lazy_from_import('testmod', 'two')
Three = lazy_from_import('testmod', 'Three')

print(one)
two()
print(Three())

# Limitations: they're not the real deal, just proxies
try:
    print(one > one)
except TypeError as e:
    print(e)
try:
    print(5 * one)
except TypeError as e:
    print(e)

import testmod # get access to the real objects

assert one is not testmod.one
assert Three is not testmod.Three
assert type(one) is not type(testmod.one)

Sadly, I think this is as close as you're gonna get to what you want to achieve. The guts-scooping approach LazyLoader uses (swapping the module's __class__ and executing the real code into the same object) can't be applied to immutable types. I have another idea I want to pursue, but I'm not confident it's going to lead anywhere.
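
To illustrate why the magic methods have to be set on the LazyAttribute class itself: implicit special-method lookups (len(), operators, calls) go to the type and skip instance-level lookup hooks. A minimal sketch, using a hypothetical Proxy class that is not part of the code above:

```python
class Proxy:
    """Forwards ordinary attribute lookups to a wrapped object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup on the proxy fails.
        return getattr(self._target, name)


p = Proxy([1, 2, 3])
explicit_len = p.__len__()  # explicit attribute access is forwarded

try:
    len(p)  # implicit lookup goes to type(p), which defines no __len__
    implicit_ok = True
except TypeError:
    implicit_ok = False
```

Here explicit_len is 3, while len(p) raises TypeError, which is why each proxied operation needs its own method on the proxy class.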


ADDITION

So, turns out you can go further, by writing functions like this:

import gc

def replace_in(ref, obj, by):
    if isinstance(ref, list):
        for i, item in enumerate(ref):
            if item is obj:
                ref[i] = by
    elif isinstance(ref, dict):
        for key, val in ref.items():
            if val is obj:
                ref[key] = by
        if obj in ref:
            ref[by] = ref.pop(obj)
    elif isinstance(ref, tuple):
        raise TypeError('tuples are immutable, consider storing lazy values in a list instead -- or make sure to evaluate the value before constructing this tuple')
    elif hasattr(ref, '__dict__'):
        # setattr also works for classes, whose __dict__ is a read-only mappingproxy
        for key, val in list(vars(ref).items()):
            if val is obj:
                setattr(ref, key, by)
    else:
        raise TypeError(f'cannot replace {type(ref)} object yet')

def replace(obj, by):
    for ref in gc.get_referrers(obj):
        replace_in(ref, obj, by)

This gives you a function replace which can be used to replace references. Caveats:

  1. This uses a function meant for debugging. From the documentation:

    Warning: Care must be taken when using objects returned by get_referrers() because some of them could still be under construction and hence in a temporarily invalid state. Avoid using get_referrers() for any purpose other than debugging.

  2. It only supports lists, dicts and objects with a __dict__ attribute for now (which includes modules and classes).
  3. Values directly stored in tuples and other immutable classes can't be replaced, for obvious reasons.
  4. gc.get_referrers doesn't find local variables.
  5. When replacing an object used as a key in a dictionary, it changes the ordering. This could be fixed at the cost of having worse runtime performance (moving all the items after the replaced key to the back).

If these caveats are acceptable to you, you can change LazyAttribute.__getattribute__ to:

def __getattribute__(self, attr):
    real = get_real_object(self)
    replace(self, real)
    return getattr(real, attr)

... and lazily loaded objects will be replaced wherever they can be replaced. This actually replaces them, rather than just proxying them or doing the guts-scooping LazyLoader does.
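
As a rough illustration of the reference-swapping idea, here is a trimmed-down sketch that only handles list items and dict values. Placeholder and swap_references are hypothetical names used for this demo, and the same caveats about gc.get_referrers apply.

```python
import gc


class Placeholder:
    """Hypothetical stand-in for a not-yet-loaded object."""


def swap_references(old, new):
    """Rebind list items and dict values that are `old` to `new`."""
    for ref in gc.get_referrers(old):
        if isinstance(ref, list):
            for i, item in enumerate(ref):
                if item is old:
                    ref[i] = new
        elif isinstance(ref, dict):
            # Copy the items so the dict is not iterated while being updated.
            for key, val in list(ref.items()):
                if val is old:
                    ref[key] = new
        # Other referrer types (tuples, frames, ...) are skipped here.


items = [Placeholder(), "other"]
mapping = {"lazy": items[0]}
swap_references(items[0], "real value")
# items   -> ["real value", "other"]
# mapping -> {"lazy": "real value"}
```

Because a module's globals are stored in a dict, module-level names bound to the placeholder would be rebound by the same mechanism.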

1
On

You could make your lazy object act as a proxy for the . (getattr) and () (call) operations:

import sys

class Lazy:
    def __init__(self, mod, name):
        self.mod = mod
        self.name = name

    def __getattr__(self, item):
        return getattr(self._target(), item)

    def __call__(self, *args, **kwargs):
        return self._target()(*args, **kwargs)

    def _target(self):
        if self.mod not in sys.modules:
            __import__(self.mod)
        return getattr(sys.modules[self.mod], self.name)


r = Lazy('random', 'randint')
print(r(1, 45))

C = Lazy('collections', 'Counter')
print(C([1, 2, 1]))

But this is quite fragile - what if someday you decide to lazily import a constant? Or pass the "imported" variable around? Your initial approach is much better, just stick with xr.DataArray.
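
The fragility with constants can be seen in a small sketch. It re-declares a compact variant of the proxy (using importlib.import_module, which is a substitution made for this demo) and wraps math.pi:

```python
import importlib


class Lazy:
    """Compact variant of the proxy above, for demonstration only."""

    def __init__(self, mod, name):
        self.mod = mod
        self.name = name

    def _target(self):
        return getattr(importlib.import_module(self.mod), self.name)

    def __getattr__(self, item):
        return getattr(self._target(), item)

    def __call__(self, *args, **kwargs):
        return self._target()(*args, **kwargs)


pi = Lazy("math", "pi")

real_part = pi.real              # attribute access is forwarded to the float
is_float = isinstance(pi, float) # False: pi is still a Lazy instance

try:
    pi * 2                       # operators are looked up on type(pi)
    arithmetic_ok = True
except TypeError:
    arithmetic_ok = False
```

Attribute access works, but arithmetic and isinstance checks do not, so code that receives the "imported" constant would have to know it is a proxy.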