[Architecture Refactor][1] Base architecture implementation and user package refactor#46
[Architecture Refactor][1] Base architecture implementation and user package refactor#46stefanotroncaro wants to merge 23 commits into
Conversation
5098a7b to
64a7e7d
Compare
fa06648 to
7db5a7c
Compare
d81a866 to
eee98e5
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements a significant architectural refactor to align the codebase with hexagonal architecture and Domain-Driven Design (DDD) principles. The changes introduce a clear separation between domain logic, infrastructure concerns, and API layers, replacing the previous service-oriented approach with a repository pattern and entity-based design.
Changes:
- Introduced a new domain layer with
Entitybase class,Repositorycontracts, andCriteriaobjects for flexible querying - Implemented SQLAlchemy-based infrastructure layer with base models, repositories, and criteria translators
- Refactored the users package to follow the new architecture with domain entities, repositories, use cases, and API dependencies
Reviewed changes
Copilot reviewed 46 out of 46 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| app/common/domain/entities/entity.py | Defines base Entity class with validation for required DTOs (CreateDto, UpdateDto) |
| app/common/domain/repository.py | Defines abstract Repository interface with CRUD and query operations |
| app/common/domain/criteria.py | Base Criteria class for filtering entities |
| app/common/infrastructure/base_sqlalchemy_repository.py | SQLAlchemy implementation of Repository pattern with criteria support |
| app/common/infrastructure/sqlalchemy_criteria_translator.py | Registry-based translator system for converting Criteria to SQL statements |
| app/users/domain/user.py | User entity with CreateDto and UpdateDto nested classes |
| app/users/infrastructure/sqlalchemy_user_repository.py | SQLAlchemy-specific User repository implementation |
| app/users/use_cases/create_user_use_case.py | Refactored to use repository pattern instead of service layer |
| app/users/api/endpoints.py | Updated to use dependency injection for repositories |
| app/common/exceptions/repository_errors.py | New exception hierarchy for repository operations |
| # TODO: raise a better message | ||
| msg = f"Couldn't create entity instance.\n{e}" | ||
| raise repository_errors.RepositoryError(msg) |
There was a problem hiding this comment.
The error message 'Couldn't create entity instance' provides insufficient context for debugging. Include the entity class name and specific validation failures to help developers identify which entity failed and why.
| # TODO: raise a better message | |
| msg = f"Couldn't create entity instance.\n{e}" | |
| raise repository_errors.RepositoryError(msg) | |
| error_details = "; ".join( | |
| f"{err.get('loc')}: {err.get('msg')}" for err in e.errors() | |
| ) | |
| entity_name = getattr(self.entity, "__name__", str(self.entity)) | |
| msg = ( | |
| "Couldn't create entity instance for " | |
| f"{entity_name}. Validation errors: {error_details}" | |
| ) | |
| raise repository_errors.RepositoryError(msg) from e |
There was a problem hiding this comment.
Such parsing is not required. Converting the pydantic.ValidationError to a string already exposes all the necessary information for debugging:
>>> import pydantic
>>> class A(pydantic.BaseModel):
... a: str
... b: int
... c: list[str]
... d: list[int
... ]
...
>>> try:
... A(a=3, b="hello", c="world", d=(3,4,5,6))
... except pydantic.ValidationError as e:
... print(str(e))
...
3 validation errors for A
a
Input should be a valid string [type=string_type, input_value=3, input_type=int]
For further information visit https://errors.pydantic.dev/2.12/v/string_type
b
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='hello', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/int_parsing
c
Input should be a valid list [type=list_type, input_value='world', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/list_type
Will remove the TODO comment.
|
|
||
|
|
||
| class UserCriteria(Criteria[User]): | ||
| pass |
There was a problem hiding this comment.
The empty UserCriteria base class appears unused as UserEmailFilter directly extends UserCriteria. Consider removing this intermediate class unless it serves a specific purpose for future criteria types, or document its intended use.
| pass | |
| """Base criteria type for queries operating on :class:`User` entities.""" |
There was a problem hiding this comment.
This is debatable. Obviously, it's meant to group all UserCriteria.
Currently it seems to not be useful because there is only one criteria defined, but the "logic step" is there, and in my opinion it's better that the abstraction is explicit rather than implicit. In the future it can be used to define common behavior for all user criteria, and for other purposes like aggregation.
Furthermore, it's existence does not hurt, so I am inclined to keep it. But I would certainly not add a docstring to it.
| CreateDto: t.ClassVar[type[pydantic.BaseModel]] | ||
| UpdateDto: t.ClassVar[type[pydantic.BaseModel]] |
There was a problem hiding this comment.
The CreateDto and UpdateDto ClassVars lack docstrings explaining their purpose and requirements. Add documentation to clarify that these must be defined in subclasses as nested classes for entity creation and update operations.
There was a problem hiding this comment.
This is already explained in the Entity docstring. No need for redundancy here.
| ) | ||
|
|
||
| from fastapi import Depends | ||
| import typing as t |
There was a problem hiding this comment.
Is there any benefit on importing it like this? I know the import is clearer but I think the code is uglier 😄 . This is not a fix, just a preference.
There was a problem hiding this comment.
First of all, this is code style preference, yes, so it's very difficult (if not impossible) to have programmers agree on these kinds of discussions 😁.
Anyways, in short, for imports I follow the following pattern for (IMO) maximum readability:
- for external builtin and external libraries imports, prefer importing modules instead of symbols
- if importing from another internal submodule, prefer importing the submodule
- create a facade for each submodule in the
__init__.py, similar to typescripts barrel files, that both exposes a clean interface, hides implementation details of the submodule that should not be imported outside of it, and allows other modules to have cleaner imports - if importing from the same submodule, import symbols from their specific file
Reasoning for these rules:
- the "import modules preference" ensures the source of each symbol does not get obfuscated in large files, as it is now explicit, so the code gains clarity
- prevents symbol name collisions
- importing modules over symbols has no performance hit, as the python interpreter must parse a module entirely before allowing symbols to be imported from it
- the "one import per line" works better with the restrictive settings of the linter (that make an import take 3 lines if it exceeds 80 characters, because it splits it with parenthesis, creating a lot of visual noise that hurt readability and inflate the line count)
- also, makes the imports of the file have a consistent "shape", making it faster to read
source -- symbolon every line and get a clear picture at a glance. - the "do import symbols if they belong to the same submodule" has to do with preventing circular imports, the explanation for this can be long, I can elaborate if you wish
Regarding the import typing as t, I generally do it like that to have a short alias for a module that is (or should) be used a lot. It's the same reasoning why alembic imports sqlalchemy as sa. It keeps the clarity of the above principles, helps with keeping line length short (again, important so the linter does not introduce weird artifacts that hurt readability), and makes typing the code easier. I generally always import it like that even if I require only one symbol from the typing module, in case the file grows later and more are needed, which creates better git diffs (as the imports are not shuffled around so much).
Anyways, this is a summary of the benefits I considered when I came up with these. For me, in conjunction they make the code easier to work with. That said, If import typing as t annoys you I can remove it.
What do you think?
| @@ -0,0 +1,5 @@ | |||
| __all__ = ("Criteria",) | |||
There was a problem hiding this comment.
Does it make sense tu put all here if we only have 1 class to import?
There was a problem hiding this comment.
It's good python practice to have an __all__ in every file regardless of how many symbols it exports. Besides, the amount of symbols a file exports can change over the project lifetime, so it's easier to already have it there.
Also, having an __all__ defined makes the __init__.pys require less maintenance.
If a change is made in this file, the __all__ is updated, and the __init__.py (which imports with from .criteria import *) already gets the change. So, changes are contained to one file. Otherwise, to keep the __init__.py updated, you would need to make a change in this file and then go and modify the explicit imports in the __init__.py.
So, IMO, maintaining a simple __all__ on each file is more maintainable, and adheres to python good practices.
| @abc.abstractmethod | ||
| def exists(self, *criteria: Criteria[TEntity]) -> bool: | ||
| raise NotImplementedError() |
There was a problem hiding this comment.
Does it make sense if we already have find?
There was a problem hiding this comment.
Yes, it's less expensive for the db. Look at the implementation, the query is optimized for that
| @abc.abstractmethod | ||
| def first(self, *criteria: Criteria[TEntity]) -> TEntity | None: | ||
| raise NotImplementedError() |
There was a problem hiding this comment.
Does it make sense if we already have where?
There was a problem hiding this comment.
Yes, this returns the first match or None (if not match is found). This has better performance than where for cases where at most one result is expected. Also, as this returns the entity or None, prevents annoying handling like accessing the first element of a list when there's only one result (getting an user by email, for example).
|
|
||
| @router.post("", status_code=fastapi.status.HTTP_201_CREATED) | ||
| def create_user( | ||
| user_repository: UserRepositoryDependency, |
There was a problem hiding this comment.
I know dependencies are a great tool, but doesn't this tie the controllers to the internal logic of the use case? What is tomorrow for creating a user the use case creates it in an external platform like Cognito?
There was a problem hiding this comment.
No, this does not tie the controllers to the internal logic of the use case. The controller has the responsibility of making the use case accesible "from the outside", it receives the dependencies the use case needs and passes them along with the data.
Because the dependency (the actual implementation of the user repository) is being injected on the controller itself by fastapi, and the controller only know about the contract: the UserRepository contract, which is part of the domain logic, not of the infrastructure layer. This is as detached from implementation details as we can go, as the dependency needs to be injected somewhere.
I don't know Cognito, but assuming it acts as an external persistence layer, I understand we could have a few paths.
If users are stored on an external platform instead of our db, then the case is very simple, an UserRepository is created that provides access to a persistence layer in such an external platform, and that implementation is injected on the controller. The use case logic does not need to change at all to accommodate for this.
If at any point two or more UserRepository implementations need to coexist, then the solution depends on the specifics of that case. If both implementations are not used at the same time or coordinated (for each use case it's one or the other), a UserRepositoryFactory would be a good solution.
If we need to sync our db with the external db, that is, both repository implementations need to be used at the same time, then I would create a "mixed" or "meta" repository that encapsulates the logic of that coordination (it would use both the sqlalchemy repository and the cognito repository internally). Then it can be injected into the use case, and the use case still does not care about what repository has been injected into it, it only cares about business logic.
OTOH, if cognito not only acts as a persistence layer, but the platform handles specific logic that is attached to it, then we could create it as a client. But the principle is still the same, the client would be injected into the use case to separate implementation details from domain logic.
I don't know if this answers your question. I hope it does, but if you have other scenarios in mind, please share so we can think about them and how the proposed architecture would accommodate for them.
2e7c776 to
35da134
Compare
Architecture Refactor - PR 1
This PR refactors the boilerplate architecture to better conform with hexagonal architecture and DDD standards.
Changes
Domain layer
Entities
The previous
InDBschemas are replaced byEntity, which represent the core of the business logic. To prevent schema proliferation, contracts for creating and updating entities are defined in the entities itself.Entity definition errors are raised at compile time, and effort has been made for them to provide examples on correct definitions, for ease of troubleshooting and getting familiar with the system.
Repositories
Repositories represent the persistence layer. They are defined as base classes with a common interface, where the methods have been inspire by Laravel methods, the try to present a common "feel" between both ecosystems and make developer migrations easier.
The infrastructure layer is responsible for providing the actual implementation. The use case layer can operate on this contract without being aware of the infrastructure.
Criteria
Complex filters and relationships are represented by
Criteriadtos that can be passed to the repository methods to customize the output. This allows the repositories to respect the open/closed principle, and prevent method proliferation, which has been an issue in past projects.Again, the infrastructure layer is responsible for implementing the contract.
Infrastructure layer
A base sqlalchemy model, implementation for sqlalchemy repositories, and the architecture for sqlalchemy criteria translators have been provided.
Creating repositories and translators for new entities requires very little work, which is a big advantage over the previous implementation.
Controller layer
FastAPI dependencies are used to "glue" the systems together.
Still TO-DO