Data Loading
============

Cardea uses a data-loading module to plug in the user's data and automatically organize it within
the framework. It expects data in the Fast Healthcare Interoperability Resources (FHIR) format, a
standard for health care data exchange published by HL7®. Among the advantages of FHIR over other
standards are:

* Fast and easy to implement
* The specification is free to use with no restrictions
* Strong foundation in Web standards: XML, JSON, HTTP, OAuth, etc.
* Support for RESTful architectures
* Concise and easily understood specifications
* A human-readable serialization format for ease of use by developers

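To give a sense of the human-readable JSON serialization, the sketch below parses a minimal,
hand-written FHIR ``Patient`` resource with Python's standard ``json`` module. The resource is an
illustrative example, not a record from Cardea's default dataset, and it uses only a few standard
``Patient`` fields (``resourceType``, ``id``, ``gender``, ``birthDate`` and ``address``).

.. code-block:: python

    import json

    # A minimal, hand-written FHIR Patient resource (illustrative only;
    # not taken from Cardea's default dataset).
    PATIENT_JSON = """
    {
      "resourceType": "Patient",
      "id": "example-1",
      "gender": "female",
      "birthDate": "1984-03-12",
      "address": [{"city": "Springfield", "country": "US"}]
    }
    """

    patient = json.loads(PATIENT_JSON)

    # Every FHIR resource declares its type, which tells a loader how to interpret it.
    resource_type = patient["resourceType"]

    # FHIR allows several addresses per patient, so "address" is a list.
    cities = [addr.get("city") for addr in patient.get("address", [])]

    print(resource_type, patient["gender"], cities)

Because each resource names its own type, a loader can route it to the matching entity without
extra configuration.
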
By default, Cardea loads a dataset hosted on `Amazon S3`_, which is a formatted version of the Kaggle
dataset `Medical Appointment No Shows`_. It also lets users load their own datasets from a local
folder of CSV files, using the ``load_data_entityset(...)`` method. For example, the following code
loads the default Kaggle dataset:

.. code-block:: python

    from cardea import Cardea

    cardea = Cardea()

    # Load the default dataset (Medical Appointment No Shows) hosted on Amazon S3
    cardea.load_data_entityset()

Local files can be loaded with the same method by passing the ``folder_path`` parameter:

.. code-block:: python

    # Load CSV files from a local folder instead of the default dataset
    cardea.load_data_entityset(folder_path="your/local/path/")

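The text does not spell out how the CSV files in that folder should be named or laid out. The
sketch below assumes, purely for illustration, one CSV file per FHIR resource named after the
resource (for example ``Patient.csv``); the helper ``write_resource_folder`` and the sample columns
are hypothetical, so check Cardea's loader for the layout it actually expects.

.. code-block:: python

    import csv
    import tempfile
    from pathlib import Path

    # Hypothetical layout: one CSV file per FHIR resource, named after the resource.
    # Both the naming convention and the columns are assumptions for illustration.
    TABLES = {
        "Patient": [
            {"identifier": "p1", "gender": "F", "address": "a1"},
            {"identifier": "p2", "gender": "M", "address": "a2"},
        ],
        "Address": [
            {"object_id": "a1", "city": "Springfield"},
            {"object_id": "a2", "city": "Shelbyville"},
        ],
    }


    def write_resource_folder(folder, tables):
        """Write each resource's rows to <folder>/<Resource>.csv and return the paths."""
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        paths = []
        for resource, rows in tables.items():
            path = folder / f"{resource}.csv"
            with path.open("w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
            paths.append(path)
        return paths


    folder_path = tempfile.mkdtemp()
    written = write_resource_folder(folder_path, TABLES)
    print(sorted(p.name for p in written))

    # The folder could then be passed on:
    # cardea.load_data_entityset(folder_path=folder_path)
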
Cardea represents a dataset as a collection of entities and the relationships between them, a
structure that is well suited to preparing raw, structured data for feature engineering. For this,
it uses the `featuretools.EntitySet`_ class.

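Conceptually, an entity is a table whose rows have a unique index, and a relationship states that a
column of a child entity holds index values of a parent entity. The plain-Python sketch below
illustrates that idea with two hypothetical tables; it models the concept with dictionaries and a
small dataclass rather than featuretools' own classes, and ``parent_row`` is a hypothetical helper.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Relationship:
        """child.child_column -> parent.parent_column (a foreign-key link)."""
        child: str
        child_column: str
        parent: str
        parent_column: str


    # Two hypothetical entities, each a list of rows keyed by column name.
    entities = {
        "Patient": [
            {"identifier": "p1", "address": "a1"},
            {"identifier": "p2", "address": "a2"},
        ],
        "Address": [
            {"object_id": "a1", "city": "Springfield"},
            {"object_id": "a2", "city": "Shelbyville"},
        ],
    }

    patient_address = Relationship("Patient", "address", "Address", "object_id")


    def parent_row(entities, rel, child_row):
        """Return the parent row referenced by child_row through rel, or None."""
        key = child_row[rel.child_column]
        for row in entities[rel.parent]:
            if row[rel.parent_column] == key:
                return row
        return None


    first_patient = entities["Patient"][0]
    print(parent_row(entities, patient_address, first_patient)["city"])  # Springfield

Following such links is what lets attributes of a parent (here, the city) be attached to each child
row as candidate features.
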
The following command prints a summary of the loaded dataset:

.. code-block:: pycon

    >>> cardea.es
    Entityset: fhir
      Entities:
        Address [Rows: 81, Columns: 2]
        Appointment_Participant [Rows: 6100, Columns: 2]
        Appointment [Rows: 110527, Columns: 5]
        CodeableConcept [Rows: 4, Columns: 2]
        Coding [Rows: 3, Columns: 2]
        Identifier [Rows: 227151, Columns: 1]
        Observation [Rows: 110527, Columns: 3]
        Patient [Rows: 6100, Columns: 4]
        Reference [Rows: 6100, Columns: 1]
      Relationships:
        Appointment_Participant.actor -> Reference.identifier
        Appointment.participant -> Appointment_Participant.object_id
        CodeableConcept.coding -> Coding.object_id
        Observation.code -> CodeableConcept.object_id
        Observation.subject -> Reference.identifier
        Patient.address -> Address.object_id

The summary lists the resources that were loaded into the framework (**Entities** section) and the
relationships between them (**Relationships** section).

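Each line under **Relationships** uses featuretools' ``child.column -> parent.column`` notation: the
column on the left holds values of the column on the right. As a small illustration, the sketch below
parses those lines, copied from the summary, into tuples and groups child entities by the parent they
point to; ``parse_relationship`` is a hypothetical helper, not part of Cardea or featuretools.

.. code-block:: python

    from collections import defaultdict

    # Relationship lines copied from the cardea.es summary.
    SUMMARY_RELATIONSHIPS = """
    Appointment_Participant.actor -> Reference.identifier
    Appointment.participant -> Appointment_Participant.object_id
    CodeableConcept.coding -> Coding.object_id
    Observation.code -> CodeableConcept.object_id
    Observation.subject -> Reference.identifier
    Patient.address -> Address.object_id
    """


    def parse_relationship(line):
        """Split 'Child.col -> Parent.col' into (child, child_col, parent, parent_col)."""
        left, right = (side.strip() for side in line.split("->"))
        child, child_col = left.split(".", 1)
        parent, parent_col = right.split(".", 1)
        return child, child_col, parent, parent_col


    relationships = [
        parse_relationship(line)
        for line in SUMMARY_RELATIONSHIPS.strip().splitlines()
    ]

    # Group child entities under the parent entity they reference.
    children_of = defaultdict(list)
    for child, _, parent, _ in relationships:
        children_of[parent].append(child)

    print(children_of["Reference"])  # ['Appointment_Participant', 'Observation']

Read this way, both ``Appointment_Participant`` and ``Observation`` point to ``Reference``, which is
how appointments and observations end up linked to the same patient references.
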

.. _Amazon S3: https://s3.amazonaws.com/dai-cardea/
.. _Medical Appointment No Shows: https://www.kaggle.com/joniarroba/noshowappointments
.. _featuretools.EntitySet: https://docs.featuretools.com/generated/featuretools.EntitySet.html#featuretools.EntitySet