One pattern I recently started using is writing class methods that return instances. In particular, I've been using it for dataclasses (with or without the `@dataclass` decorator). But it has also led me to define vague `__init__` methods like this:
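```python
def __init__(self, **kwargs):
    # iterate over (name, value) pairs; iterating a dict directly yields only names
    for k, v in kwargs.items():
        setattr(self, k, v)
```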
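One detail about that loop: iterating over a dict yields only its keys, so `for k, v in kwargs:` tries to unpack each key *string* into two names and raises `ValueError` for most keys. A minimal sketch of the working loop, using a hypothetical `Record` class:

```python
class Record:
    """Hypothetical class using the generic **kwargs constructor."""

    def __init__(self, **kwargs):
        # .items() yields (name, value) pairs; iterating kwargs alone yields names only
        for k, v in kwargs.items():
            setattr(self, k, v)


r = Record(subject="math", topic="algebra")
print(r.subject, r.topic)

# Without .items(), each key string itself gets unpacked into (k, v):
try:
    for k, v in {"subject": "math"}:
        pass
except ValueError as exc:
    print("ValueError:", exc)
```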
As a more fleshed-out example, let's say I'm writing a metadata class that holds the details of a standardized test question. I expect all instances of the class to have the same attributes, so I use `__slots__`, and I have functions defined in another module to read various parts of the question from an HTML file.
```python
import json

from bs4 import BeautifulSoup

import MyModule  # my own module with the parsing functions


class Metadata:
    __slots__ = ('question_id', 'testid', 'itemnum', 'subject', 'system',
                 'topic', 'images', 'tables', 'links')

    @classmethod
    def from_html(cls, html: BeautifulSoup):
        # these lines build the dict `metadata` with a key for
        # every name in __slots__
        metadata = MyModule.parse_details(html)
        metadata['images'] = MyModule.process_images(html)
        metadata['tables'] = MyModule.read_tables(html)
        metadata['links'] = MyModule.pull_links(html)
        return cls(**metadata)

    @classmethod
    def from_file(cls, filepath: str):
        with open(filepath, 'r') as f:
            metadata = json.load(f)
        return cls(**metadata)

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
```
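`MyModule` and BeautifulSoup aren't needed to see the shape of this pattern, so here is a stdlib-only sketch of the same idea: several classmethod constructors that all funnel into one `__init__`. The trimmed field list and the `from_dict` constructor are hypothetical stand-ins for the real fields and for `from_html`.

```python
import json
import os
import tempfile


class Metadata:
    # hypothetical, trimmed subset of the real field list
    __slots__ = ("question_id", "subject", "topic")

    @classmethod
    def from_dict(cls, data: dict) -> "Metadata":
        # stand-in for from_html: any source that produces a dict of fields
        return cls(**data)

    @classmethod
    def from_file(cls, filepath: str) -> "Metadata":
        with open(filepath) as f:
            return cls(**json.load(f))

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


fields = {"question_id": 7, "subject": "biology", "topic": "genetics"}
a = Metadata.from_dict(fields)

# round-trip the same fields through a temporary JSON file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(fields, tmp)
b = Metadata.from_file(tmp.name)
os.remove(tmp.name)

print(a.topic, b.topic)
```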
So to me this seems like the best way to accomplish the task, which is to create a dataclass to hold metadata that can be initialized from multiple different sources (files, dicts, other dataclasses I've defined, etc.). The downside is that `__init__` is very opaque. It also feels odd to use `**kwargs` when `__init__` has to receive the same keyword arguments every time for the class to work as I intend (that's partly why I used `__slots__` too: to make the definition of the dataclass clearer).
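To make the opacity concrete: with the `**kwargs` constructor, a misspelled name is caught only because `__slots__` rejects unknown attributes at runtime, and a missing field is not caught at all until someone reads it. A sketch comparing that with a `@dataclass` that spells out its fields (class names and fields here are hypothetical):

```python
from dataclasses import dataclass


class KwargsMeta:
    __slots__ = ("subject", "topic")

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


@dataclass
class ExplicitMeta:
    subject: str
    topic: str


# A missing field slips through the **kwargs constructor...
m = KwargsMeta(subject="math")
try:
    m.topic
except AttributeError:
    print("KwargsMeta: missing 'topic' only fails when it is read")

# ...but the generated __init__ rejects it at construction time.
try:
    ExplicitMeta(subject="math")
except TypeError as exc:
    print("ExplicitMeta:", exc)

# A typo is rejected by __slots__, but only when setattr runs.
try:
    KwargsMeta(subject="math", topik="algebra")
except AttributeError:
    print("KwargsMeta: unknown name 'topik' rejected by __slots__")
```

The explicit version still works with classmethod constructors, because `cls(**metadata)` maps dict keys onto the named parameters; on Python 3.10+ `@dataclass(slots=True)` also generates `__slots__`.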
Also, the documentation of the attrs package for Python says this:

> For similar reasons, we strongly discourage from patterns like:
>
>     pt = Point(**row.attributes)
>
> which couples your classes to the database data model. Try to design your classes in a way that is clean and convenient to use – not based on your database format. The database format can change anytime and you’re stuck with a bad class design that is hard to change. Embrace functions and classmethods as a filter between reality and what’s best for you to work with.
That passage is near the top of the page I linked, and I really don't understand what it's trying to say, hence my question.
So would you implement my code any differently, and what is the attrs documentation trying to say?
Suppose you have a JSON record with a `userId` field, and you initialize your class by passing the parsed record straight to the constructor, e.g. `post = Post(**data)`. Code that reads `post.userId` will then work as expected.

However, suppose you need to rename the `userId` column to `user_id` (the "The database format can change anytime" part of the documentation). Now you need to rename ALL the occurrences of `post.userId` to `post.user_id` in all of your code. That's fine if your codebase consists of only one Python file, but what if it contains a lot of files and dependencies?

Now suppose instead that you initialize your class by mapping each JSON field to an attribute explicitly, in the one place where you read the file. Now if `postId` is renamed to `post_id`, you only need to change ONE place in your whole codebase: where you read from the JSON file.

Other situations include:
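- a field named something like `aVeryLongFieldNameThatYouDoNotWantToInsertIntoYourPythonCode`
- wanting `snake_case` attribute names when the source uses `camelCase`
- `mypy`, which does not work very well with `setattr`, since it cannot see attributes created that way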
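A minimal sketch of the difference, using a hypothetical JSON record and hypothetical `CoupledPost`/`Post` classes (the field names are assumptions, not from any real API). The coupled version takes its attribute names straight from the JSON keys; the `from_json` classmethod is the single place that translates the file's `camelCase` keys into `snake_case` attributes with declared types that `mypy` can check.

```python
import json
from dataclasses import dataclass

# hypothetical JSON record, as it might arrive from a file or an API
RAW = '{"postId": 1, "userId": 42, "title": "Hello"}'


class CoupledPost:
    """Attribute names are whatever the JSON keys happen to be."""

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


@dataclass
class Post:
    """Attribute names and types chosen for the Python code, not the file."""

    post_id: int
    user_id: int
    title: str

    @classmethod
    def from_json(cls, text: str) -> "Post":
        data = json.loads(text)
        # The only code that knows the JSON key names: if "userId" is
        # renamed in the source, this mapping is the one place to update.
        return cls(
            post_id=data["postId"],
            user_id=data["userId"],
            title=data["title"],
        )


coupled = CoupledPost(**json.loads(RAW))
# every caller depends on the JSON spelling; mypy flags this attribute as unknown
print(coupled.userId)

post = Post.from_json(RAW)
# callers use the Python name, and mypy knows it is an int
print(post.user_id)
```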