Commit c5f3fdf

Add memory debug and profile (#130)
* Add code for memory consumption and optionally select which debug you would like to use.
* Add documentation about debugging
* Add psutil
* Tests updates
* Fix lint
* Add extra tests
* Update MLPrimitives version
* Rephrase documentation
1 parent 52653e0 commit c5f3fdf

4 files changed

Lines changed: 294 additions & 120 deletions

docs/advanced_usage/pipelines.rst

Lines changed: 36 additions & 0 deletions
@@ -423,6 +423,42 @@ An example of this situation, where we want to reuse the output of the first blo
    predictions = pipeline.predict(X_test)
    score = compute_score(y_test, predictions)

Pipeline debugging
------------------

Sometimes we want to debug a pipeline execution and obtain information about the time, the
memory usage, and the inputs and outputs of each step. This is possible by passing the ``debug``
argument to the ``fit`` and ``predict`` methods. This argument allows us to retrieve the
following information from the pipeline execution:

* ``Time``: Elapsed time for the primitive in the given stage (fit or predict).
* ``Memory``: Increase or decrease in memory usage observed for the given primitive in that pipeline.
* ``Input``: The input values that the primitive receives in that specific step.
* ``Output``: The output produced by the primitive.

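As a rough illustration of how ``Time`` and ``Memory`` values like the ones above could be
collected around a single step, the sketch below wraps a callable with a timer and a memory
probe. It is not the library's implementation: ``run_with_profile`` and ``make_squares`` are
hypothetical names, and the standard-library ``tracemalloc`` module is used here in place of
``psutil`` (which this commit adds) so that the example runs without extra dependencies. Note
that ``tracemalloc`` only sees allocations made by Python, not the whole process footprint.

```python
import time
import tracemalloc


def run_with_profile(step, *args, **kwargs):
    """Run ``step`` and return its result plus elapsed time and memory change.

    Hypothetical helper for illustration only; not part of the library.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    start = time.perf_counter()
    try:
        result = step(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # The result is still alive here, so its allocations are counted.
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, {"time": elapsed, "memory": after - before}


def make_squares(n):
    """Stand-in for a primitive: allocate a list of n squared integers."""
    return [i * i for i in range(n)]


squares, profile = run_with_profile(make_squares, 100_000)
print(profile)  # a dict with "time" (seconds) and "memory" (bytes)
```

The memory value can be negative when a step releases more than it allocates, which matches the
"increase or decrease" wording in the list above.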
If the ``debug`` argument is set to ``True``, a dictionary containing all the elements listed
above will be returned::

    result, debug_info = pipeline.fit(X_train, y_train, debug=True)

If you want to retrieve only some of the elements listed above and skip the rest, you can
pass a ``str`` to the ``debug`` argument with any combination of the following characters:

* ``i``: To include inputs.
* ``o``: To include outputs.
* ``m``: To include used memory.
* ``t``: To include elapsed time.

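One way to picture how such a flag string could be interpreted is the small parser below. This
is only a sketch under assumptions: ``DEBUG_FLAGS`` and ``parse_debug`` are hypothetical names,
and the behaviour for unknown characters (raising ``ValueError``) is a choice made for this
example, not something the documentation above specifies.

```python
# Hypothetical mapping from flag characters to the items described above.
DEBUG_FLAGS = {"i": "input", "o": "output", "m": "memory", "t": "time"}


def parse_debug(debug):
    """Translate a ``debug`` value into the set of items to record."""
    if debug is True:
        # ``True`` selects every item.
        return set(DEBUG_FLAGS.values())
    if not debug:
        # ``False``, ``None`` or an empty string selects nothing.
        return set()
    if not isinstance(debug, str):
        raise TypeError("debug must be a bool or a str")
    unknown = set(debug) - set(DEBUG_FLAGS)
    if unknown:
        raise ValueError(f"Unknown debug flags: {sorted(unknown)}")
    return {DEBUG_FLAGS[char] for char in debug}


print(sorted(parse_debug("tm")))
```

Because the result is a set, the order and repetition of characters do not matter: ``'tm'``,
``'mt'`` and ``'tmt'`` all select the same two items in this sketch.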
For example, if we are only interested in capturing the elapsed time and used memory during the
``fit`` process, we can call the method as follows::

    result, debug_info = pipeline.fit(X_train, y_train, debug='tm')

.. warning:: Bear in mind that using ``debug=True``, or saving the ``Input`` and ``Output``,
             consumes extra RAM, because copies of the input data and the output data are
             created for each primitive. For profiling, it is recommended to use the ``tm``
             option as shown in the previous example.
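To get a feel for why keeping inputs and outputs is costly, the following sketch measures the
extra memory taken by one copy of a toy dataset. It assumes the copies are deep copies, which
the text above does not state explicitly, and it uses the standard-library ``tracemalloc``
module rather than the library's own measurement; ``data`` and ``snapshot`` are illustrative
names.

```python
import copy
import tracemalloc

# Toy stand-in for a primitive's input: 1,000 rows of 100 values each.
data = [[float(i)] * 100 for i in range(1_000)]

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
snapshot = copy.deepcopy(data)  # what keeping a debug copy of an input implies
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

extra_bytes = after - before
print(f"Keeping one copy costs about {extra_bytes / 1024:.0f} KiB")
```

With one such copy per primitive for both inputs and outputs, the overhead grows with the
number of steps, which is why the ``tm`` option is the lighter choice for profiling.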

.. _API Reference: ../api_reference.html
.. _primitives: ../primitives.html
