Skip to content

Submission: Dynamox Developer Challenge - Luiz Henrique#203

Open
LuizHenrique31415 wants to merge 6 commits into
dynamox-s-a:mainfrom
LuizHenrique31415:feat/postgresql-persistence
Open

Submission: Dynamox Developer Challenge - Luiz Henrique#203
LuizHenrique31415 wants to merge 6 commits into
dynamox-s-a:mainfrom
LuizHenrique31415:feat/postgresql-persistence

Conversation

@LuizHenrique31415
Copy link
Copy Markdown

Descrição do Projeto

Para o gerenciamento deste desafio, utilizei um quadro Kanban para organizar as etapas de desenvolvimento, desde a modelagem do banco de dados até os testes finais de latência.

(Esse desafio utilizei IA como ferramenta.)

Link do Trello: https://trello.com/invite/b/6a038b04c09ab433f1b89271/ATTI0c990eb758f81d852c6efc9b13a28dc29B4F8A04/challenger

Implementação de uma API de processamento de sinais utilizando FastAPI, Pandas para cálculos matemáticos e PostgreSQL para persistência de dados. O projeto foi estruturado com foco em performance e facilidade de deploy via Docker. (Esse desafio utilizei IA como ferrramenta.)

Principais Implementações
Persistência em Banco de Dados: Utilização de SQLAlchemy para gerenciar o armazenamento das séries temporais no PostgreSQL.

Processamento de Dados: Integração com a biblioteca Pandas para cálculo eficiente de métricas (média, máximo, mínimo).

Observabilidade & SRE: Implementação de logs de latência e um script de validação automática (run.sh) para garantir que o sistema está operante e dentro dos limites de performance.

Containerização: Dockerfile e Docker Compose configurados para subir a stack completa (API + DB) com um único comando.

Como Testar
Para facilitar a revisão, a execução foi totalmente automatizada:

Execução Completa:

Bash
chmod +x run.sh
./run.sh
Este script sobe os containers, aguarda o banco de dados ficar saudável e executa o benchmark de performance.

Testes Unitários:
Para rodar os testes de integração que validam os endpoints e a lógica do banco:

Bash
docker-compose run --rm api pytest test_main.py
```

Tecnologias Utilizadas

  • Python 3.11 / FastAPI
  • PostgreSQL
  • Pandas
  • Docker / Docker Compose
  • Pytest

@wasdutra
Copy link
Copy Markdown

Hi @LuizHenrique31415, thanks for your PR submission! Your implementation has been evaluated and our team will reach out to you regarding the next steps. Below you can find the detailed feedback on your solution.


User Stories

  1. As a user, I want to be able to store a raw data series.

    Good — POST /series correctly persists the series by name and supports upsert (update if exists, insert if new). FastAPI binds the list[float] body correctly from the JSON payload. 👍

    That said, accepting a bare JSON array as the request body is unconventional. Without a Pydantic model wrapping the payload (e.g. {"name": "...", "data": [...]}) you lose automatic input validation, meaningful error messages on bad input, and a documented request schema in the auto-generated OpenAPI docs. Consider introducing a request schema.

  2. As a user, I want to be able to retrieve metrics about the time series.

    GET /series/{name}/metrics returns mean, max, min, and count — that covers the core requirement. 👍

    Pandas is pulled in as a dependency but only used for four aggregations that Python's standard library (statistics, min(), max(), len()) handles natively. Pandas is a large dependency; either remove it and use builtins, or justify it by doing something that actually needs it (e.g. rolling windows, resampling, FFT).

    The metrics endpoint is keyed by name (GET /series/{name}/metrics) rather than by a stable numeric id. For a frontend integration this is problematic: if a user renames a series (or if names contain spaces or special characters that need URL-encoding), the client loses its reference. A resource identifier should be stable and opaque — the id column already exists on the model and should be exposed. Ideally the API returns the id on POST /series so the client can store it and use GET /series/{id}/metrics for all subsequent requests.

  3. As a user, I want to be able to delete a time series I've sent to the server.

    DELETE /series/{name} works correctly and returns a 404 when the series doesn't exist. 👍

  4. As a user, I want to be able to retrieve the number of time series I've stored in the server.

    This endpoint is missing. A simple GET /series/count returning db.query(TimeSeriesModel).count() would satisfy it.

  5. As a user, I want to be able to retrieve a full time series I've stored.

    This endpoint is also missing. There is no route that returns the raw stored data for a given series name — you'd need something like GET /series/{name} returning the data field.


Technical Requirements

  1. Use Python.

    Yes. 👍

  2. Use a REST-API framework (e.g., FastAPI).

    FastAPI is used correctly for routing. 👍

    As noted in Story 1, the absence of Pydantic request/response schemas means the API's contract is implicit and undocumented. Adding schemas is a small effort with a large payoff in clarity and safety.

  3. The latency between client and the server side must be below 350ms in all requests.

    health_and_perf.py measures average and P95 latency and asserts the average is below 350ms — good initiative. 👍

    The benchmark only hits GET / (the root health check). The requirement says "all requests" — you should also measure POST /series, GET /series/{name}/metrics, and DELETE /series/{name}, since those involve database I/O and are where latency problems would actually appear.

    Additionally, your routes are declared async def but use synchronous SQLAlchemy sessions. Under concurrent load, the sync session blocks the event loop, which will inflate latency significantly. Either use asyncpg + SQLAlchemy async or declare the routes as plain def so Uvicorn runs them in a thread pool.

  4. Use a database to store the time series data.

    PostgreSQL via SQLAlchemy is correctly integrated. The get_db dependency properly manages session lifecycle (open → yield → close in finally). 👍

    Base.metadata.create_all runs at import time, which works for a prototype but is not production-safe. Schema changes cannot be tracked, versioned, or rolled back. Consider Alembic migrations even for a small project.

  5. Ensure correct business logic and behavior with automated unit tests (e.g., pytest).

    test_main.py uses TestClient and is discoverable by pytest. It covers the happy path for Stories 1–3 and the 404 case on delete. 👍

    The tests are not isolated — test_delete_series depends on the engine_01 series left behind by test_create_and_read_metrics. If that test fails or runs in a different order, test_delete_series gives a false result. Use a pytest fixture with setup/teardown or autouse to manage test data.

    There is no conftest.py that swaps DATABASE_URL for a lightweight test database (e.g. SQLite in-memory or a dedicated test PostgreSQL schema). As written, the tests require a live PostgreSQL instance, making them integration tests rather than unit tests. They will fail in a clean CI environment without extra setup.

    run_app.py contains a duplicate set of test functions but does not follow the test_*.py naming convention, so pytest won't discover it. Rename it to test_*.py or remove it to avoid confusion.


Bonus Requirements

  1. Deploy your application to a cloud provider and provide the API URL.

    The Docker setup is in place — deploying to a free tier (e.g. Render, Railway, Fly.io) would be a small step from here. No public URL was provided.

  2. Implement a functionality that gives a future prediction of the time series data.

    Not implemented. A simple linear extrapolation or ARIMA model via statsmodels would demonstrate this well.

  3. Add a load balancer to the application.

    Not implemented. Adding an Nginx or Traefik service to docker-compose.yaml in front of the api container would satisfy this.

  4. Add load tests to the application.

    health_and_perf.py runs 10 sequential requests, which is a latency smoke test, not a load test. A load test sends concurrent requests to stress the system — tools like Locust or k6 are straightforward to add.


General Evaluation Criteria

  1. Anyone should be able to follow the instructions and run the application.

    README.md, Dockerfile, docker-compose.yaml, and run.sh are all present, and ./run.sh provides a single entry point. The api service correctly waits for PostgreSQL to be healthy before starting (condition: service_healthy). 👍

    The README highlights a latency middleware (X-Response-Time-MS header) and structured logging (latency.log with WARNING alerts) as differentials of the solution. Neither feature exists in main.py. Be careful about documenting things that aren't implemented — it creates confusion for reviewers and raises questions about what else might be overstated.

  2. Back-end code successfully integrated with persistent storage.

    SQLAlchemy is cleanly wired to PostgreSQL. Session management via get_db and Depends is correct and idiomatic FastAPI. 👍

  3. Stories were implemented according to the functional requirements.

    Stories 1, 2, and 3 are implemented. Stories 4 (count) and 5 (full retrieval) are missing. Completing those two endpoints would fully satisfy the functional requirements.

  4. Problem-solving skills and ability to handle ambiguity.

    Good stack choices overall. The depends_on: condition: service_healthy in Docker Compose and the inclusion of a latency benchmark show you were thinking beyond the minimum. 👍

    The upsert pattern in POST /series is a design mistake in a time-series context. Time-series data should be immutable and append-only — silently overwriting an existing series with new values destroys historical data without any warning to the caller. The correct behaviour for a duplicate name should be a 409 Conflict error, forcing the client to make an explicit decision (delete first, use a different name, or call a dedicated update endpoint if mutation is intentional).

  5. Code quality, readability, and maintainability.

    Almost every line in main.py has a trailing inline comment — # Find highest value above series_data.max(), # Always close the connection after the request finishes on the finally block, and so on. These comments explain what the code literally does rather than why, which is noise rather than documentation. Comments should explain intent or non-obvious decisions, not restate the code.

    No Pydantic schemas, no error handling beyond 404 (e.g., what happens if data is an empty list?), and all concerns (DB engine, ORM model, app, routes) live in a single file. For a project this size, splitting into at least models.py, database.py, and routes.py would make the code easier to navigate and test.

  6. Code is well-organized and documented.

    The README.md and Docker setup show good intent around documentation and runability. 👍

    The README describes features that don't exist in the code (see point 1 above). Keep documentation honest — accurate and minimal is better than aspirational and misleading.

    main.py is a single-file monolith. Separating concerns into modules is a straightforward refactor that would significantly improve maintainability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants