This repository was archived by the owner on Apr 8, 2024. It is now read-only.

Commit ea6431a

Update README.md (#1)
Fix typos reported by marketing team
1 parent 7d3f221 commit ea6431a

1 file changed

Lines changed: 2 additions & 2 deletions

README.md

@@ -309,7 +309,7 @@ The log file name can be changed to describe the run.
 ## **Key Takeaways and Discussion**
 Synthetic data offers the flexibilty and accessibility to create innovative AI solutions to real problems. As synthetic data becomes a growing focus for AI solutions, it will become increasingly important to optimize the AI pipeline wherever possible. Intel® strives to bring those optimizations in all parts of the data science pipeline, from data generation through model inference.
 
-Looking at the high-level utility of the different datasets created in this reference kit, with the industry specific ones (utilities, e-commerce, etc) we took a more targeted approach which also allowed us to demonstrate the functionality of a diverse set of Intel® optimized technologies such as Intel® Distribution of Modin and Intel® Distribution for Python*'s optimizations for NumPy and SciPy. Having said that, the industry-agnostic dataset is equally and probably more valuable in terms of the actual utility as it allows the user to customize the minutiae of the synthetic data, e.g. defining number of columns needed, data type, distrubutions, weights, etc. Furthermore we have also added the capability to generate a basic target value based on regression, binary classification or multi-class classification. One thing that would make the synthetic data even more realistic is (1) adding correlation(s) between features and (2) adding correlation between the response and one or more features. There can be a very high number of possibilities in which these correlations occur in the real world. However, this would have made the scope of this reference kit harder to contain. As a result, this capability is not offered as part of the reference kit. However, we would like to mention it here, since it can easily be added during run-time by drawing from the users' knowledge and subject-matter expertise.
+Looking at the high-level utility of the different datasets created in this reference kit, with the industry specific ones (utilities, e-commerce, etc) we took a more targeted approach which also allowed us to demonstrate the functionality of a diverse set of Intel® optimized technologies such as Intel® Distribution of Modin and Intel® Distribution for Python*'s optimizations for NumPy and SciPy. Having said that, the industry-agnostic dataset is equally and probably more valuable in terms of the actual utility as it allows the user to customize the minutiae of the synthetic data, e.g. defining number of columns needed, data type, distributions, weights, etc. Furthermore we have also added the capability to generate a basic target value based on regression, binary classification or multi-class classification. One thing that would make the synthetic data even more realistic is (1) adding correlation(s) between features and (2) adding correlation between the response and one or more features. There can be a very high number of possibilities in which these correlations occur in the real world. However, this would have made the scope of this reference kit harder to contain. As a result, this capability is not offered as part of the reference kit. However, we would like to mention it here, since it can easily be added during run-time by drawing from the users' knowledge and subject-matter expertise.
 
 As a reminder, these experiments are provided to highlight the functionality of optimizations seen from Intel® oneAPI technologies in specific business contexts and how they scale. Please bear in mind that the exact gains you see here may or may not match the results of using these technologies in larger-scale real world operations.
 
@@ -409,4 +409,4 @@ The following are open-source codebases that helped with the foundation of this
 
 ## **Notes**
 
-**Please see this data set's applicable license for terms and conditions. Intel® does not own the rights to this data set and does not confer any rights to it.**
+**Please see this data set's applicable license for terms and conditions. Intel® does not own the rights to this data set and does not confer any rights to it.**
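
Note on the "Key Takeaways and Discussion" paragraph touched by this commit: it says that correlation between features, and between the response and one or more features, is not offered by the reference kit but can be added at run-time. The following is a minimal sketch of one common way a user might do that with NumPy, assuming Gaussian features and a linear response. The function names (`correlated_features`, `add_response`), the target correlation matrix, and the weights are all hypothetical and are not part of the reference kit.

```python
import numpy as np


def correlated_features(n_rows, corr, seed=0):
    """Draw standard-normal columns whose population correlation is `corr`.

    Hypothetical helper: uses the Cholesky factor of the target correlation
    matrix to mix independent normal columns.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)  # lower-triangular, corr == chol @ chol.T
    independent = rng.standard_normal((n_rows, corr.shape[0]))
    return independent @ chol.T


def add_response(features, weights, noise_scale=1.0, seed=1):
    """Build a regression target as a weighted sum of features plus noise.

    Hypothetical helper: features with non-zero weight end up correlated
    with the response.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_scale, size=features.shape[0])
    return features @ np.asarray(weights, dtype=float) + noise


# Assumed target: feature 0 and 1 strongly correlated, 1 and 2 weakly.
target_corr = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.3],
    [0.0, 0.3, 1.0],
]

X = correlated_features(50_000, target_corr)
y = add_response(X, weights=[2.0, 0.0, -1.0], noise_scale=0.5)

sample_corr = np.corrcoef(X, rowvar=False)      # feature-feature correlations
response_corr = np.corrcoef(X[:, 0], y)[0, 1]   # feature 0 vs. response
```

With a large number of rows, `sample_corr` should land close to `target_corr`. The target matrix must be symmetric positive definite, otherwise `np.linalg.cholesky` raises `LinAlgError`. Non-Gaussian marginals or non-linear dependence would need a different approach, which is where the subject-matter expertise mentioned in the README comes in.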
