Commit fdc284b

Merge pull request #64 from DataKitchen/python3.11
refactor: make codebase compatible with Python 3.11
2 parents 1d21ca5 + 7625c0b commit fdc284b

14 files changed

Lines changed: 135 additions & 94 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -84,14 +84,14 @@ As an alternative to the Docker Compose [installation with dk-installer (recomme
 
 | Software | Tested Versions | Command to check version |
 |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------------------|
-| [Python](https://www.python.org/downloads/) <br/>- Most Linux and macOS systems have Python pre-installed. <br/>- On Windows machines, you will need to download and install it. | 3.12 | `python3 --version` |
+| [Python](https://www.python.org/downloads/) <br/>- Most Linux and macOS systems have Python pre-installed. <br/>- On Windows machines, you will need to download and install it. | 3.11, 3.12, 3.13 | `python3 --version` |
 | [PostgreSQL](https://www.postgresql.org/download/) | 14.1, 15.8, 16.4 | `psql --version`|
 
 ### Install the TestGen package
 
 We recommend using a Python virtual environment to avoid any dependency conflicts with other applications installed on your machine. The [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) module, which is part of the Python standard library, or other third-party tools, like [virtualenv](https://virtualenv.pypa.io/en/latest/) or [conda](https://docs.conda.io/en/latest/), can be used.
 
-Create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.12`). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.
+Create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.11`). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.
 
 _On Linux/Mac_
 ```shell

docs/local_development.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ git clone https://github.com/YOUR-USERNAME/dataops-testgen
 
 We recommend using a Python virtual environment to avoid any dependency conflicts with other applications installed on your machine. The [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) module, which is part of the Python standard library, or other third-party tools, like [virtualenv](https://virtualenv.pypa.io/en/latest/) or [conda](https://docs.conda.io/en/latest/), can be used.
 
-From the root of your local repository, create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.12`). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.
+From the root of your local repository, create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.11`; we develop on 3.13). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.
 
 _On Linux/Mac_
 ```shell

pyproject.toml

Lines changed: 5 additions & 3 deletions
@@ -21,11 +21,13 @@ classifiers = [
     "License :: OSI Approved :: Apache Software License",
     "Development Status :: 5 - Production/Stable",
     "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
     "Topic :: System :: Monitoring",
 ]
 keywords = [ "dataops", "data", "quality", "testing", "database", "profiling" ]
-requires-python = ">=3.12"
+requires-python = ">=3.11"
 
 dependencies = [
     "PyYAML==6.0.3",
@@ -170,7 +172,7 @@ filterwarnings = [
 # for an explanation of their functionality.
 # WARNING: When changing mypy configurations, be sure to test them after removing your .mypy_cache
 [tool.mypy]
-python_version = "3.13"
+python_version = "3.11"
 check_untyped_defs = true
 disallow_untyped_decorators = true
 disallow_untyped_defs = true
@@ -211,7 +213,7 @@ exclude = [
 ]
 
 [tool.ruff]
-target-version = "py310"
+target-version = "py311"
 line-length = 120
 indent-width = 4
 include = [
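
The lowered `requires-python` floor can also be asserted at runtime. The following is a minimal sketch, assuming a hypothetical helper `satisfies_minimum` that is not part of TestGen:

```python
from __future__ import annotations

import sys

# Mirrors requires-python = ">=3.11" from pyproject.toml (assumed constant name).
MINIMUM_PYTHON = (3, 11)


def satisfies_minimum(version: tuple[int, int] | None = None) -> bool:
    """Return True when `version` (default: the running interpreter) is >= 3.11."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    # Tuples compare element by element, so (3, 10) < (3, 11) < (3, 13).
    return version >= MINIMUM_PYTHON
```

Calling `satisfies_minimum()` with no argument checks the interpreter that is currently running.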

testgen/common/database/flavor/bigquery_flavor_service.py

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,8 @@ def get_connection_string_head(self, params: ResolvedConnectionParams) -> str:
         return f"{self.url_scheme}://"
 
     def get_connection_string_from_fields(self, params: ResolvedConnectionParams) -> str:
-        return f"{self.url_scheme}://{params.service_account_key["project_id"] if params.service_account_key else ""}"
+        project_id = params.service_account_key["project_id"] if params.service_account_key else ""
+        return f"{self.url_scheme}://{project_id}"
 
     def get_connect_args(self, params: ResolvedConnectionParams) -> dict:  # noqa: ARG002
         return {}
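
This change works around a Python 3.11 limitation: before Python 3.12 (PEP 701), an f-string's replacement field cannot reuse the quote character that delimits the f-string. The following is a minimal sketch of the same hoisting pattern, using a hypothetical standalone function and a plain dict in place of `ResolvedConnectionParams`:

```python
from __future__ import annotations


def connection_string(url_scheme: str, service_account_key: dict | None) -> str:
    """Build '<scheme>://<project_id>' without nesting double quotes in the f-string."""
    # Evaluate the quoted subscript outside the f-string so that the
    # replacement field contains only a bare name (valid on 3.11 and later).
    project_id = service_account_key["project_id"] if service_account_key else ""
    return f"{url_scheme}://{project_id}"
```

With no key supplied, the sketch falls back to an empty project id, as the diff does.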

testgen/common/models/scores.py

Lines changed: 1 addition & 1 deletion
@@ -522,7 +522,7 @@ def get_as_sql(
             for _, field_filters in grouped_filters:
                 field_filters_sql = [f.get_as_sql(prefix=prefix, operand="AND") for f in field_filters]
                 filters_sql.append(
-                    f"({" OR ".join(field_filters_sql)})" if len(field_filters_sql) > 1 else field_filters_sql[0]
+                    f"({' OR '.join(field_filters_sql)})" if len(field_filters_sql) > 1 else field_filters_sql[0]
                 )
         else:
             filters_sql = [ f.get_as_sql(prefix=prefix, operand="AND") for f in self.filters ]
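
Where the nested literal contains no single quotes, swapping the inner delimiter is enough. A sketch with a hypothetical helper `or_group` (the name and signature are illustrative only):

```python
def or_group(field_filters_sql: list[str]) -> str:
    """OR-join SQL fragments, parenthesizing only when there is more than one."""
    if len(field_filters_sql) > 1:
        # Single quotes inside a double-quoted f-string parse on Python 3.11.
        return f"({' OR '.join(field_filters_sql)})"
    return field_filters_sql[0]
```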

testgen/ui/queries/profiling_queries.py

Lines changed: 82 additions & 62 deletions
@@ -180,8 +180,7 @@ def get_tables_by_condition(
     include_active_tests: bool = False,
     include_scores: bool = False,
 ) -> list[dict]:
-    query = f"""
-    {"""
+    active_tests_cte = """
     WITH active_test_definitions AS (
         SELECT
             test_defs.table_groups_id,
@@ -201,47 +200,66 @@ def get_tables_by_condition(
             test_defs.schema_name,
             test_defs.table_name
     )
-    """ if include_active_tests else ""}
-    SELECT
-        table_chars.table_id::VARCHAR AS id,
-        'table' AS type,
-        table_chars.table_name,
-        table_chars.schema_name,
-        table_chars.table_groups_id::VARCHAR AS table_group_id,
-        -- Characteristics
-        functional_table_type,
-        approx_record_ct,
-        table_chars.record_ct,
-        table_chars.column_ct,
-        add_date,
-        last_refresh_date,
-        drop_date,
-        {f"""
+    """ if include_active_tests else ""
+
+    table_tags_select = f"""
         -- Table Tags
         table_chars.description,
         table_chars.critical_data_element,
         {", ".join([ f"table_chars.{tag}" for tag in TAG_FIELDS ])},
         -- Table Groups Tags
         {", ".join([ f"table_groups.{tag} AS table_group_{tag}" for tag in TAG_FIELDS if tag != "aggregation_level" ])},
-        """ if include_tags else ""}
-        {"""
+    """ if include_tags else ""
+
+    has_test_runs_select = """
         -- Has Test Runs
         EXISTS(
             SELECT 1
             FROM test_results
             WHERE table_groups_id = table_chars.table_groups_id
                 AND table_name = table_chars.table_name
         ) AS has_test_runs,
-        """ if include_has_test_runs else ""}
-        {"""
+    """ if include_has_test_runs else ""
+
+    active_tests_select = """
         -- Test Definition Count
         active_tests.count AS active_test_count,
-        """ if include_active_tests else ""}
-        {"""
+    """ if include_active_tests else ""
+
+    scores_select = """
         -- Scores
         table_chars.dq_score_profiling,
         table_chars.dq_score_testing,
-        """ if include_scores else ""}
+    """ if include_scores else ""
+
+    active_tests_join = """
+    LEFT JOIN active_test_definitions active_tests ON (
+        table_chars.table_groups_id = active_tests.table_groups_id
+        AND table_chars.schema_name = active_tests.schema_name
+        AND table_chars.table_name = active_tests.table_name
+    )
+    """ if include_active_tests else ""
+
+    query = f"""
+    {active_tests_cte}
+    SELECT
+        table_chars.table_id::VARCHAR AS id,
+        'table' AS type,
+        table_chars.table_name,
+        table_chars.schema_name,
+        table_chars.table_groups_id::VARCHAR AS table_group_id,
+        -- Characteristics
+        functional_table_type,
+        approx_record_ct,
+        table_chars.record_ct,
+        table_chars.column_ct,
+        add_date,
+        last_refresh_date,
+        drop_date,
+        {table_tags_select}
+        {has_test_runs_select}
+        {active_tests_select}
+        {scores_select}
         -- Profile Run
         table_chars.last_complete_profile_run_id::VARCHAR AS profile_run_id,
         profiling_starttime AS profile_run_date,
@@ -255,13 +273,7 @@ def get_tables_by_condition(
     LEFT JOIN table_groups ON (
         table_chars.table_groups_id = table_groups.id
     )
-    {"""
-    LEFT JOIN active_test_definitions active_tests ON (
-        table_chars.table_groups_id = active_tests.table_groups_id
-        AND table_chars.schema_name = active_tests.schema_name
-        AND table_chars.table_name = active_tests.table_name
-    )
-    """ if include_active_tests else ""}
+    {active_tests_join}
     {filter_condition}
     ORDER BY LOWER(table_chars.table_name);
     """
@@ -347,24 +359,7 @@ def get_columns_by_condition(
     include_active_tests: bool = False,
     include_scores: bool = False,
 ) -> list[dict]:
-    query = f"""
-    SELECT
-        column_chars.column_id::VARCHAR AS id,
-        'column' AS type,
-        column_chars.column_name,
-        column_chars.table_name,
-        column_chars.schema_name,
-        column_chars.table_groups_id::VARCHAR AS table_group_id,
-        column_chars.ordinal_position,
-        -- Characteristics
-        column_chars.general_type,
-        column_chars.db_data_type,
-        column_chars.functional_data_type,
-        datatype_suggestion,
-        column_chars.add_date,
-        column_chars.last_mod_date,
-        column_chars.drop_date,
-        {f"""
+    column_tags_select = f"""
         -- Column Tags
         column_chars.description,
         column_chars.critical_data_element,
@@ -376,13 +371,9 @@ def get_columns_by_condition(
         {", ".join([ f"table_chars.{tag} AS table_{tag}" for tag in TAG_FIELDS ])},
         -- Table Groups Tags
         {", ".join([ f"table_groups.{tag} AS table_group_{tag}" for tag in TAG_FIELDS if tag != "aggregation_level" ])},
-        """ if include_tags else ""}
-        -- Profile Run
-        column_chars.last_complete_profile_run_id::VARCHAR AS profile_run_id,
-        run_date AS profile_run_date,
-        TRUE AS is_latest_profile,
-        query_error AS profiling_error,
-        {"""
+    """ if include_tags else ""
+
+    has_test_runs_select = """
         -- Has Test Runs
         EXISTS(
             SELECT 1
@@ -391,8 +382,9 @@ def get_columns_by_condition(
                 AND table_name = column_chars.table_name
                 AND column_names = column_chars.column_name
         ) AS has_test_runs,
-        """ if include_has_test_runs else ""}
-        {"""
+    """ if include_has_test_runs else ""
+
+    active_tests_select = """
         -- Test Definition Count
         (
             SELECT COUNT(*)
@@ -402,12 +394,40 @@ def get_columns_by_condition(
                 AND column_name = column_chars.column_name
                 AND test_active = 'Y'
         ) AS active_test_count,
-        """ if include_active_tests else ""}
-        {"""
+    """ if include_active_tests else ""
+
+    scores_select = """
         -- Scores
         column_chars.dq_score_profiling,
         column_chars.dq_score_testing,
-        """ if include_scores else ""}
+    """ if include_scores else ""
+
+    query = f"""
+    SELECT
+        column_chars.column_id::VARCHAR AS id,
+        'column' AS type,
+        column_chars.column_name,
+        column_chars.table_name,
+        column_chars.schema_name,
+        column_chars.table_groups_id::VARCHAR AS table_group_id,
+        column_chars.ordinal_position,
+        -- Characteristics
+        column_chars.general_type,
+        column_chars.db_data_type,
+        column_chars.functional_data_type,
+        datatype_suggestion,
+        column_chars.add_date,
+        column_chars.last_mod_date,
+        column_chars.drop_date,
+        {column_tags_select}
+        -- Profile Run
+        column_chars.last_complete_profile_run_id::VARCHAR AS profile_run_id,
+        run_date AS profile_run_date,
+        TRUE AS is_latest_profile,
+        query_error AS profiling_error,
+        {has_test_runs_select}
+        {active_tests_select}
+        {scores_select}
         table_chars.approx_record_ct,
         table_groups.project_code,
         table_groups.connection_id::VARCHAR AS connection_id,
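
The refactor in this file replaces triple-quoted strings nested inside an f-string with named fragments that are computed first and then interpolated by name. A reduced sketch of that pattern follows; the function, table, and column names are placeholders, not the real TestGen schema:

```python
def build_query(include_scores: bool = False, include_join: bool = False) -> str:
    """Assemble a SELECT from optional fragments, each empty when disabled."""
    # Each fragment is an ordinary string, so no triple quotes end up
    # nested inside the final f-string (which Python 3.11 would reject).
    scores_select = """
        t.dq_score_profiling,
        t.dq_score_testing,
    """ if include_scores else ""

    groups_join = """
    LEFT JOIN groups_placeholder g ON (t.group_id = g.id)
    """ if include_join else ""

    return f"""
    SELECT
        t.id,
        {scores_select}
        t.name
    FROM tables_placeholder t
    {groups_join}
    ORDER BY LOWER(t.name);
    """
```

Disabled fragments collapse to empty strings, so the final query contains only blank lines where they would have been.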

testgen/ui/scripts/patch_streamlit.py

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ def _create_tag(relative_filepath: str, html: BeautifulSoup) -> Tag | None:
         ),
     }
 
-    extension = f".{relative_filepath.split(".")[-1]}"
+    extension = f".{relative_filepath.split('.')[-1]}"
     if extension in tag_for_ext:
         return tag_for_ext[extension]()
     return None

testgen/ui/views/data_catalog.py

Lines changed: 3 additions & 2 deletions
@@ -251,12 +251,13 @@ def get_excel_report_data(
     data["excluded_data_element"] = data["excluded_data_element"].apply(lambda val: "Yes" if val else None)
     data["pii_flag"] = data["pii_flag"].apply(lambda val: "Yes" if val else None)
     data["top_freq_values"] = data["top_freq_values"].apply(
-        lambda val: "\n".join([f"{part.split(" | ")[1]} | {part.split(" | ")[0]}" for part in val[2:].split("\n| ")])
+        lambda val: "\n".join([f"{part.split(' | ')[1]} | {part.split(' | ')[0]}" for part in val[2:].split("\n| ")])
         if not pd.isna(val) and val != PII_REDACTED
         else val
     )
+    nl = "\n"  # For Python 3.11 compatibility
     data["top_patterns"] = data["top_patterns"].apply(
-        lambda val: "".join([f"{part}{'\n' if index % 2 else ' | '}" for index, part in enumerate(val.split(" | "))])
+        lambda val: "".join([f"{part}{nl if index % 2 else ' | '}" for index, part in enumerate(val.split(" | "))])
         if not pd.isna(val) and val != PII_REDACTED
         else val
     )
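
Before Python 3.12, a backslash cannot appear inside an f-string replacement field, which is why the newline is bound to a name first. The following is a standalone sketch of the `top_patterns` formatting, with a hypothetical function name and without the pandas and `PII_REDACTED` handling:

```python
NL = "\n"  # bound outside the f-string; a backslash inside {...} is a SyntaxError on 3.11


def format_patterns(val: str) -> str:
    """Put each ' | '-separated pair of parts on its own line."""
    return "".join(
        # Even-indexed parts are followed by ' | ', odd-indexed parts by a newline.
        f"{part}{NL if index % 2 else ' | '}"
        for index, part in enumerate(val.split(" | "))
    )
```

For example, `format_patterns("2 | AAA | 1 | NNN")` yields the two lines `2 | AAA` and `1 | NNN`, each terminated by a newline.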

testgen/ui/views/dialogs/table_create_script_dialog.py

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ def generate_create_script(table_name: str, data: list[dict]) -> str | None:
         separator = " " if index == len(table_data) - 1 else ","
         col_defs.append(f"{col['column_name']:<{max_name}} {(col_type):<{max_type}}{separator} {comment}")
 
+    col_defs_joined = "\n ".join(col_defs)
     return f"""
 CREATE TABLE {table_data[0]['schema_name']}.{table_data[0]['table_name']} (
-{"\n ".join(col_defs)}
+{col_defs_joined}
 );"""
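
The column lines in this function are padded with nested format specs (`:<{max_name}`) and joined before interpolation. The following is a simplified sketch, assuming each column is a `(name, type)` tuple and omitting the comments and per-line separators of the real function:

```python
def create_script(schema: str, table: str, columns: list[tuple[str, str]]) -> str:
    """Render a CREATE TABLE statement with left-aligned column names."""
    # Width of the longest column name; used as a nested format spec below.
    max_name = max(len(name) for name, _ in columns)
    col_defs = [f"{name:<{max_name}} {col_type}" for name, col_type in columns]
    # The separator is joined outside the f-string so that no backslash
    # appears inside a replacement field.
    col_defs_joined = ",\n    ".join(col_defs)
    return f"CREATE TABLE {schema}.{table} (\n    {col_defs_joined}\n);"
```

The nested spec pads every name to the same width, so the type column lines up in the generated script.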

testgen/ui/views/monitors_dashboard.py

Lines changed: 2 additions & 1 deletion
@@ -483,9 +483,10 @@ def _monitor_changes_by_tables_query(
     {"OFFSET :offset" if offset else ""}
     """
 
+    escaped_table_name_filter = table_name_filter.replace("_", "\\_") if table_name_filter else None
     params = {
         "table_group_id": table_group_id,
-        "table_name_filter": f"%{table_name_filter.replace('_', '\\_')}%" if table_name_filter else None,
+        "table_name_filter": f"%{escaped_table_name_filter}%" if escaped_table_name_filter else None,
         "sort_field": sort_field,
         "limit": limit,
         "offset": offset,
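
The hoisted expression escapes `_`, which is a single-character wildcard in SQL `LIKE` patterns, before wrapping the value in `%`. The following is a minimal sketch with a hypothetical helper name, assuming backslash is the `LIKE` escape character (the PostgreSQL default):

```python
from __future__ import annotations


def like_pattern(table_name_filter: str | None) -> str | None:
    """Return a '%...%' LIKE pattern with underscores escaped, or None if no filter."""
    if not table_name_filter:
        return None
    # Done outside the f-string: a backslash inside {...} needs Python 3.12+.
    escaped = table_name_filter.replace("_", "\\_")
    return f"%{escaped}%"
```

An empty or missing filter yields `None`, matching the diff's behavior of passing no pattern to the query.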
