Commit f435d36

feat: rename to data-profiling
1 parent 82479e9

261 files changed

Lines changed: 246 additions & 246 deletions


README.md

Lines changed: 51 additions & 51 deletions
Large diffs are not rendered by default.

docs/advanced_settings/analytics.md

Lines changed: 6 additions & 6 deletions
@@ -2,14 +2,14 @@
 
 ## Overview
 
-`ydata-profiling` is a powerful library designed to generate profile reports from pandas and Spark Dataframe objects.
-As part of our ongoing efforts to improve user experience and functionality, `ydata-profiling`
+`data-profiling` is a powerful library designed to generate profile reports from pandas and Spark Dataframe objects.
+As part of our ongoing efforts to improve user experience and functionality, `data-profiling`
 includes a telemetry feature. This feature collects anonymous usage data, helping us understand how the
 library is used and identify areas for improvement.
 
 The primary goal of collecting telemetry data is to:
 
-- Enhance the functionality and performance of the ydata-profiling library
+- Enhance the functionality and performance of the data-profiling library
 - Prioritize new features based on user engagement
 - Identify common issues and bugs to improve overall user experience
 
@@ -18,15 +18,15 @@ The primary goal of collecting telemetry data is to:
 The telemetry system collects non-personal, anonymous information such as:
 
 - Python version
-- `ydata-profiling` version
-- Frequency of use of `ydata-profiling` features
+- `data-profiling` version
+- Frequency of use of `data-profiling` features
 - Errors or exceptions thrown within the library
 
 ## Disabling usage analytics
 
 We respect your choice to not participate in our telemetry collection. If you prefer to disable telemetry, you can do so
 by setting an environment variable on your system. Disabling telemetry will not affect the functionality
-of the ydata-profiling library, except for the ability to contribute to its usage analytics.
+of the data-profiling library, except for the ability to contribute to its usage analytics.
 
 
 ### Set an Environment Variable
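The diff is truncated before the environment-variable step. A minimal sketch of what that step could look like, assuming the opt-out variable is named `YDATA_PROFILING_NO_ANALYTICS` (the exact key should be verified against the project's analytics docs and may change after the rename):

```python
import os

# Assumed variable name -- verify against the project's analytics docs.
# Set it before importing the library so telemetry is off from the first import.
os.environ["YDATA_PROFILING_NO_ANALYTICS"] = "True"

def telemetry_enabled() -> bool:
    # Sketch of an opt-out check: a truthy value in the environment disables collection.
    return os.environ.get("YDATA_PROFILING_NO_ANALYTICS", "").lower() not in ("true", "1")

print(telemetry_enabled())  # False once the variable is set
```

Setting the variable in the shell profile (rather than in code) keeps the opt-out in effect for every run.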

docs/advanced_settings/available_settings.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # Available Settings
 
-A set of options is available in order to customize the behaviour of ``ydata-profiling`` and the appearance of the generated report. The depth of customization allows the creation of behaviours highly targeted at the specific dataset being analysed. The available settings are listed below. To learn how to change them, check :doc:`changing_settings`.
+A set of options is available in order to customize the behaviour of ``data-profiling`` and the appearance of the generated report. The depth of customization allows the creation of behaviours highly targeted at the specific dataset being analysed. The available settings are listed below. To learn how to change them, check :doc:`changing_settings`.
 
 ## General settings
 
@@ -45,8 +45,8 @@ Configure the schema type for a given dataset.
 import json
 import pandas as pd
 
-from ydata_profiling import ProfileReport
-from ydata_profiling.utils.cache import cache_file
+from data_profiling import ProfileReport
+from data_profiling.utils.cache import cache_file
 
 file_name = cache_file(
     "titanic.csv",

docs/advanced_settings/changing_settings.md

Lines changed: 6 additions & 6 deletions
@@ -44,21 +44,21 @@ r = ProfileReport(
 
 ## Through a custom configuration file
 
-To control `ydata-profiling` through a custom file, you can start with
+To control `data-profiling` through a custom file, you can start with
 one of the sample configuration files below:
 
 - [default configuration
-  file](https://github.com/ydataai/ydata-profiling/blob/master/src/ydata_profiling/config_default.yaml)
+  file](https://github.com/Data-Centric-AI-Community/data-profiling/blob/master/src/data_profiling/config_default.yaml)
   (default)
 - [minimal configuration
-  file](https://github.com/ydataai/ydata-profiling/blob/master/src/ydata_profiling/config_minimal.yaml)
+  file](https://github.com/Data-Centric-AI-Community/data-profiling/blob/master/src/data_profiling/config_minimal.yaml)
   (minimal computation, optimized for performance)
 
 Change the configuration to your liking and point towards that
 configuration file when computing the report:
 
 ``` python linenums="1" title="Custom configuration file"
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 profile = ProfileReport(df, config_file="your_config.yml")
 profile.to_file("report.html")
@@ -70,7 +70,7 @@ Any configuration setting can also be read from environment variables.
 For example:
 
 ```python linenums="1" title="Setting title for the report with parameters"
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 profile = ProfileReport(df, title="My Custom Profiling Report")
 ```
@@ -79,7 +79,7 @@ is equivalent to setting the title as an environment variable
 
 ```python linenums="1" title="Set title through environment variables"
 import os
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 os.environ["PROFILE_TITLE"] = "My Custom Profiling Report"
 
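The environment-variable snippet can be exercised without the library at all. A sketch of the resolution order it implies -- explicit argument first, then the environment, then a default; this precedence is an assumption for illustration, not documented behaviour:

```python
import os

os.environ["PROFILE_TITLE"] = "My Custom Profiling Report"

def resolve_title(title=None, default="Profiling Report"):
    # Assumed precedence: explicit keyword > environment variable > default.
    return title if title is not None else os.environ.get("PROFILE_TITLE", default)

print(resolve_title())            # My Custom Profiling Report
print(resolve_title("Override"))  # Override
```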

docs/advanced_settings/collaborative_data_profiling.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 [YData Fabric](https://ydata.ai/products/fabric) is a Data-Centric AI
 development platform. YData Fabric provides all capabilities of
-ydata-profiling in a hosted environment combined with a guided UI
+data-profiling in a hosted environment combined with a guided UI
 experience.
 
 [Fabric's Data Catalog](https://ydata.ai/products/data_catalog)

docs/features/big_data.md

Lines changed: 11 additions & 11 deletions
@@ -2,15 +2,15 @@
 
 <img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=baa0e45f-0c03-4190-9646-9d8ea2640ba2" />
 
-By default, `ydata-profiling` comprehensively summarizes the input
+By default, `data-profiling` comprehensively summarizes the input
 dataset in a way that gives the most insights for data analysis. For
 small datasets, these computations can be performed in *quasi*
 real-time. For larger datasets, deciding upfront which calculations to
 make might be required. Whether a computation scales to large datasets
 not only depends on the exact size of the dataset, but also on its
 complexity and on whether fast computations are available. If the
 computation time of the profiling becomes a bottleneck,
-`ydata-profiling` offers several alternatives to overcome it.
+`data-profiling` offers several alternatives to overcome it.
 
 !!! info "Scale in a fully managed system"
 
@@ -27,9 +27,9 @@ computation time of the profiling becomes a bottleneck,
 This mode was introduced in version v4.0.0
 
 
-`ydata-profiling` now supports Spark Dataframes profiling. You can find
+`data-profiling` now supports Spark Dataframes profiling. You can find
 an example of the integration
-[here](https://github.com/ydataai/ydata-profiling/blob/master/examples/features/spark_example.py).
+[here](https://github.com/Data-Centric-AI-Community/data-profiling/blob/master/examples/features/spark_example.py).
 
 **Features supported:** - Univariate variables' analysis - Head and Tail
 dataset sample - Correlation matrices: Pearson and Spearman
@@ -38,7 +38,7 @@ dataset sample - Correlation matrices: Pearson and Spearman
 histogram computation
 
 Keep an eye on the
-[GitHub](https://github.com/ydataai/ydata-profiling/issues) page to
+[GitHub](https://github.com/Data-Centric-AI-Community/data-profiling/issues) page to
 follow the updates on the implementation of [Pyspark Dataframes
 support](https://github.com/orgs/ydataai/projects/16/views/2).
 
@@ -48,7 +48,7 @@ support](https://github.com/orgs/ydataai/projects/16/views/2).
 
 This mode was introduced in version v2.4.0
 
-`ydata-profiling` includes a minimal configuration file where the most
+`data-profiling` includes a minimal configuration file where the most
 expensive computations are turned off by default. This is the
 recommended starting point for larger datasets.
 
@@ -58,7 +58,7 @@ profile.to_file("output.html")
 ```
 
 This configuration file can be found here:
-[config_minimal.yaml](https://github.com/ydataai/ydata-profiling/blob/master/src/ydata_profiling/config_minimal.yaml).
+[config_minimal.yaml](https://github.com/Data-Centric-AI-Community/data-profiling/blob/master/src/data_profiling/config_minimal.yaml).
 More details on settings and configuration are available in
 `../advanced_usage/available_settings`{.interpreted-text role="doc"}.
 
@@ -103,7 +103,7 @@ that only the interactions with these variables in specific are
 computed.
 
 ``` python linenums="1" title="Disable expensive computations"
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 import pandas as pd
 
 # Reading the data
@@ -127,14 +127,14 @@ role="doc"}.
 
 # Concurrency
 
-`ydata-profiling` is a project under active development. One of the
+`data-profiling` is a project under active development. One of the
 highly desired features is the addition of a scalable backend such as
 [Modin](https://github.com/modin-project/modin) or
 [Dask](https://dask.org/).
 
 Keep an eye on the
-[GitHub](https://github.com/ydataai/ydata-profiling/issues) page to
+[GitHub](https://github.com/Data-Centric-AI-Community/data-profiling/issues) page to
 follow the updates on the implementation of a concurrent and highly
 scalable backend. Specifically, development of a Spark backend is
 [currently
-underway](https://github.com/ydataai/ydata-profiling/projects/3).
+underway](https://github.com/Data-Centric-AI-Community/data-profiling/projects/3).
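Beyond minimal mode and disabling expensive computations, profiling a random sample is a common fallback for very large data (the report then describes the sample, not the full dataset). A library-free sketch of the idea:

```python
import random

random.seed(42)
# Stand-in for a dataset too large to profile in full.
rows = [{"id": i, "value": random.random()} for i in range(100_000)]

# A 1% sample keeps the profiling cost bounded; with a pandas DataFrame the
# same idea is df.sample(frac=0.01) before calling ProfileReport.
sample = random.sample(rows, k=len(rows) // 100)
print(len(sample))  # 1000
```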

docs/features/collaborative_data_profiling.md

Lines changed: 2 additions & 2 deletions
@@ -9,11 +9,11 @@ A collaborative experience to profile datasets & relational databases
 
 [YData Fabric](https://ydata.ai/products/fabric) is a Data-Centric AI
 development platform. YData Fabric provides all capabilities of
-ydata-profiling in a hosted environment combined with a guided UI
+data-profiling in a hosted environment combined with a guided UI
 experience.
 
 [Fabric's Data Catalog](https://ydata.ai/products/data_catalog),
-a scalable and interactive version of ydata-profiling,
+a scalable and interactive version of data-profiling,
 provides a comprehensive and powerful tool designed to enable data
 professionals, including data scientists and data engineers, to manage
 and understand data within an organization. The Data Catalog acts as a

docs/features/comparing_datasets.md

Lines changed: 4 additions & 4 deletions
@@ -4,18 +4,18 @@
 !!! note "Dataframes compare support"
 
     Profiling compare is supported from
-    ydata-profiling version 3.5.0 onwards.
+    data-profiling version 3.5.0 onwards.
     Profiling compare is not *(yet!)* available for Spark Dataframes
 
-`ydata-profiling` can be used to compare multiple versions of the same
+`data-profiling` can be used to compare multiple versions of the same
 dataset. This is useful when comparing data from multiple time periods,
 such as two years. Another common scenario is to view the dataset
 profile for training, validation and test sets in machine learning.
 
 The following syntax can be used to compare two datasets:
 
 ``` python linenums="1" title="Comparing 2 datasets"
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 train_df = pd.read_csv("train.csv")
 train_report = ProfileReport(train_df, title="Train")
@@ -37,7 +37,7 @@ In order to compare more than two reports, the following syntax can be
 used:
 
 ``` python linenums="1" title="Comparing more than 2 datasets"
-from ydata_profiling import ProfileReport, compare
+from data_profiling import ProfileReport, compare
 
 comparison_report = compare([train_report, validation_report, test_report])
 
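The essence of the comparison report -- the same summary computed per dataset and shown side by side so drift stands out -- can be sketched without the library, using synthetic data and assumed statistics:

```python
import random
import statistics

random.seed(0)
# Two synthetic splits with a deliberate ~0.5 shift in the mean.
splits = {
    "train": [random.gauss(0.0, 1.0) for _ in range(1_000)],
    "test": [random.gauss(0.5, 1.0) for _ in range(1_000)],
}

# One summary per dataset, keyed the same way; the comparison report
# renders this kind of side-by-side table per variable.
summary = {
    name: {"mean": statistics.fmean(xs), "stdev": statistics.stdev(xs)}
    for name, xs in splits.items()
}
drift = summary["test"]["mean"] - summary["train"]["mean"]
print(drift)
```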

docs/features/custom_reports.md

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 In some situations, a user might want to customize the appearance
 of the report to match personal preferences or a corporate brand.
-``ydata-profiling`` offers two major customization dimensions:
+``data-profiling`` offers two major customization dimensions:
 the **styling of the HTML report** and the **styling of the visualizations
 and plots** contained within.
 
@@ -64,7 +64,7 @@ values overview can also be customized via the ``plot`` argument. To customize
 the palette used by the correlation matrix, use the ``correlation`` key:
 
 ``` python linenums="1" title="Changing visualizations color palettes"
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 profile = ProfileReport(
     df,
@@ -77,7 +77,7 @@ Similarly, the palette for *Missing values* can be changed using ``missing`` arg
 
 ``` python linenums="1"
 
-from ydata_profiling import ProfileReport
+from data_profiling import ProfileReport
 
 profile = ProfileReport(
     df,
@@ -87,7 +87,7 @@ Similarly, the palette for *Missing values* can be changed using ``missing`` arg
 )
 ```
 
-``ydata-profiling`` accepts all ``cmap`` values (colormaps) accepted by ``matplotlib``.
+``data-profiling`` accepts all ``cmap`` values (colormaps) accepted by ``matplotlib``.
 The list of available colour maps can [be accessed here](https://matplotlib.org/stable/tutorials/colors/colormaps.html).
 Alternatively, it is possible to create [custom palettes](https://matplotlib.org/stable/gallery/color/custom_cmap.html).
 
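The palette options above are plain nested values. A sketch of the structure the `plot` argument takes, with the `correlation` and `missing` keys from the snippets in this diff (treat the exact values as illustrative):

```python
# Nested options as they would be passed to ProfileReport(df, plot=plot_options);
# "correlation" and "missing" mirror the keys discussed above, and any
# matplotlib colormap name is accepted as "cmap".
plot_options = {
    "correlation": {"cmap": "RdBu", "bad": "#000000"},
    "missing": {"cmap": "RdBu"},
}

print(sorted(plot_options))  # ['correlation', 'missing']
```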

docs/features/metadata.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44

55
When sharing reports with coworkers or publishing online, it might be
66
important to include metadata of the dataset, such as author, copyright
7-
holder or descriptions. `ydata-profiling` allows complementing a report
7+
holder or descriptions. `data-profiling` allows complementing a report
88
with that information. Inspired by [schema.org\'s
99
Dataset](https://schema.org/Dataset), the currently supported properties
1010
are *description*, *creator*, *author*, *url*, *copyright_year* and
@@ -33,7 +33,7 @@ report.to_file(Path("stata_auto_report.html"))
3333

3434
In addition to providing dataset details, often users want to include
3535
column-specific descriptions when sharing reports with team members and
36-
stakeholders. `ydata-profiling` supports creating these descriptions, so
36+
stakeholders. `data-profiling` supports creating these descriptions, so
3737
that the report includes a built-in data dictionary. By default, the
3838
descriptions are presented in the *Overview* section of the report, next
3939
to each variable.
@@ -64,7 +64,7 @@ Alternatively, column descriptions can be loaded from a JSON file:
6464
``` python linenums="1" title="Generate a report with descriptions per variable from a JSON definitions file"
6565
import json
6666
import pandas as pd
67-
import ydata_profiling
67+
import data_profiling
6868

6969
definition_file = dataset_column_definition.json
7070

@@ -87,17 +87,17 @@ report.to_file("report.html")
8787

8888
In addition to providing dataset details, users often want to include
8989
set type schemas. This is particularly important when integrating
90-
`ydata-profiling` generation with the information already in a data
91-
catalog. When using `ydata-profiling` ProfileReport, users can set the
90+
`data-profiling` generation with the information already in a data
91+
catalog. When using `data-profiling` ProfileReport, users can set the
9292
type_schema property to control the generated profiling data types. By
9393
default, the `type_schema` is automatically inferred with [visions](https://github.com/dylan-profiler/visions).
9494

9595
``` python linenums="1" title="Set the variable type schema to Generate the profile report"
9696
import json
9797
import pandas as pd
9898

99-
from ydata_profiling import ProfileReport
100-
from ydata_profiling.utils.cache import cache_file
99+
from data_profiling import ProfileReport
100+
from data_profiling.utils.cache import cache_file
101101

102102
file_name = cache_file(
103103
"titanic.csv",
