learned_optimization/docs/notebooks/summary_tutorial.py at 11a697610ac0f5f17b9502f5ba2c1d989578022e · google/learned_optimization · GitHub

# coding=utf-8
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ---
# jupyter:
#   jupytext:
#     formats: ipynb,md:myst,py
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.14.1
#   kernelspec:
#     display_name: Python 3
#     name: python3
# pylint: disable=line-too-long
# ---

# + [markdown] id="ryqPvTKI19zH"
# # Summary tutorial: Getting metrics out of your models
#
# The goal of the `learned_optimization.summary` module is to seamlessly allow researchers to annotate and extract data from *within* a jax computation / machine learning model.
# This could be anything from mean and standard deviation of an activation, to looking at the distribution of outputs.
#
# Doing this in JAX can be challenging because JAX code often passes through many function transformations, such as `jit`, `grad`, and `vmap`, which makes it difficult to reach in and look at an intermediate value.
#
# This notebook discusses the `learned_optimization.summary` module, which provides one solution to this problem.

# + [markdown] id="aJfX-Mda59NC"
# ## Deps

# + [markdown] id="NGqJCT0MwlvI"
# In addition to `learned_optimization`, the summary module requires `oryx`. This can be a bit finicky to install at the moment as it relies upon particular versions of tensorflow (even though we never use these pieces).
# All of `learned_optimization` will run without `oryx`, but to get summaries this module must be installed.
# In a colab this is even more annoying as we must first upgrade the versions of some installed modules, restart the colab kernel, and then proceed to run the remainder of the cells.

# + colab={"base_uri": "https://localhost:8080/", "height": 1000} id="dLD5VE3EACSI" outputId="fe7f4bea-c9d1-4b0c-91e5-75d61a60a11f"
# !pip install --upgrade git+https://github.com/google/learned_optimization.git oryx tensorflow==2.8.0rc0 numpy

# + [markdown] id="gzqk9f_t6f8J"
# To check that everything is operating as expected we can check that the imports succeed.

# + id="kbWUpquq6XSu"
import oryx
from learned_optimization import summary
assert summary.ORYX_LOGGING

# + [markdown] id="Lz6MTETQ4R11"
# ## Basic Example
# Let's say that we have the following function and we wish to look at the `to_look_at` value.

# + id="Jugy6h9d4QT0"
import jax
import jax.numpy as jnp


# + colab={"base_uri": "https://localhost:8080/"} id="ozBppGEc4QT0" outputId="2b9b497f-5cf7-45e1-dd6f-0195e13b7a24"
def forward(params):
  to_look_at = jnp.mean(params) * 2.
  return params


def loss(parameters):
  loss = jnp.mean(forward(parameters)**2)
  return loss


value_grad_fn = jax.jit(jax.value_and_grad(loss))
value_grad_fn(1.0)


# + [markdown] id="9WzMuzf44j0L"
# With the summary module we can first annotate this value with `summary.summary`

# + id="K7N39btX4jUe"
def forward(params):
  to_look_at = jnp.mean(params) * 2.
  summary.summary("to_look_at", to_look_at)
  return params


@jax.jit
def loss(parameters):
  loss = jnp.mean(forward(parameters)**2)
  return loss


# + [markdown] id="AL9_xgfR4yPS"
# Then we can transform the `loss` function with the function transformation: `summary.with_summary_output_reduced`.
# This transformation goes through the computation and extracts all the tagged values and returns them to us by name in a dictionary.
# In implementation, all the hard work here is done by the wonderful `oryx` library (in particular [harvest](https://github.com/jax-ml/oryx/tree/main/oryx/core/interpreters/harvest.py)).
# When we wrap a function this way, the wrapped function returns a tuple containing the original result and a dictionary with the desired metrics.

# + colab={"base_uri": "https://localhost:8080/"} id="hZQkB6Um8PI5" outputId="984e4f64-7562-48ae-ca68-ff4014037553"
result, metrics = summary.with_summary_output_reduced(loss)(1.)
result, metrics

# + [markdown] id="EJ7sPabc-zby"
# As currently returned, the dictionary contains extra information as well as potentially duplicate values. We can collapse these metrics into a single value with the following:

# + colab={"base_uri": "https://localhost:8080/"} id="ZpZxTgsF-zJv" outputId="920de601-0c5e-4342-8bde-cf5297dc4635"
summary.aggregate_metric_list([metrics])

# + [markdown] id="tPZ-YM6I_Dhk"
# The keys of this dictionary first show how the metric was aggregated. In this case there is only a single metric so the aggregation is ignored. This is followed by `||` and then the summary name.
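
# + [markdown]
# As a minimal sketch of the key layout described above, the cell below splits a key of the form `aggregation||name` back into its two parts.
# The helper `split_metric_key` is hypothetical and is not part of `learned_optimization`; it only assumes the `||` separator mentioned in the text.

```python
def split_metric_key(key):
  """Splits 'aggregation||name' into an (aggregation, name) tuple."""
  # Split on the first '||' only, so a summary name that itself contains
  # '||' is kept intact in the second element.
  aggregation, name = key.split("||", 1)
  return aggregation, name


print(split_metric_key("mean||to_look_at"))  # ('mean', 'to_look_at')
```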

# + [markdown] id="Pd9RDqbVAB-L"
# One benefit of this function transformation is it can be nested with other jax function transformations. For example we can jit the transformed function like so:

# + colab={"base_uri": "https://localhost:8080/"} id="3f7vaGON_7Dc" outputId="7ad7c330-a0eb-4df7-96c2-0ae6d92ac252"
result, metrics = jax.jit(summary.with_summary_output_reduced(loss))(1.)
summary.aggregate_metric_list([metrics])


# + [markdown] id="ZeriYYoTJvKd"
# At this point `aggregate_metric_list` cannot be jitted. In practice this is fine as it performs very little computation.

# + [markdown] id="FfONK6la_QYr"
# ## Aggregation of summaries with the same name
#
# Consider the following fake function which calls a `layer` function twice.
# This `layer` function creates a summary, and thus two summaries are created.
# When running the transformed function we see not one, but two values returned.

# + colab={"base_uri": "https://localhost:8080/"} id="SDToo5ZH-8kP" outputId="1a6956f1-7968-4e36-e5bf-de05334ca58e"
def layer(params):
  to_look_at = jnp.mean(params) * 2.
  summary.summary("to_look_at", to_look_at)
  return params * 2


@jax.jit
def loss(parameters):
  loss = jnp.mean(layer(layer(parameters))**2)
  return loss


result, metrics = summary.with_summary_output_reduced(loss)(1.)
result, metrics

# + [markdown] id="JZsHOz6J_t3-"
# These values can be combined with `aggregate_metric_list` as before, but this time the aggregation takes the mean. This `mean` is specified by the `aggregation` keyword argument in `summary.summary` which defaults to `mean`.

# + colab={"base_uri": "https://localhost:8080/"} id="TW4wQ-kB-cCQ" outputId="824fd411-5cad-4368-9698-d1a04e309981"
summary.aggregate_metric_list([metrics])
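
# + [markdown]
# To make the mean aggregation concrete, here is a plain-Python sketch that averages all values recorded under the same key.
# It is an illustration only and not the library's implementation; `aggregate_mean` is a hypothetical helper.
# The values 2.0 and 4.0 are what the two `layer` calls above compute for an input of 1.

```python
import collections
import statistics


def aggregate_mean(metric_dicts):
  """Averages the values that share a key across a list of metric dicts."""
  grouped = collections.defaultdict(list)
  for metrics in metric_dicts:
    for key, value in metrics.items():
      grouped[key].append(value)
  return {key: statistics.fmean(values) for key, values in grouped.items()}


# First `layer` call: mean(1.) * 2 = 2.0. Second call sees 2.0, so 4.0.
readings = [{"mean||to_look_at": 2.0}, {"mean||to_look_at": 4.0}]
print(aggregate_mean(readings))  # {'mean||to_look_at': 3.0}
```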


# + [markdown] id="ZXtFMhmqANGJ"
# Another useful aggregation mode is "sample". As jax's random numbers are stateless, an additional RNG key must be passed in for this to work.

# + colab={"base_uri": "https://localhost:8080/"} id="hvfgKVZhAeRW" outputId="fc44541a-82b5-4a2d-f103-e3b8f929aa56"
def layer(params):
  to_look_at = jnp.mean(params) * 2.
  summary.summary("to_look_at", to_look_at, aggregation="sample")
  return params * 2


@jax.jit
def loss(parameters):
  loss = jnp.mean(layer(layer(parameters))**2)
  return loss


key = jax.random.PRNGKey(0)
result, metrics = summary.with_summary_output_reduced(loss)(
    1., sample_rng_key=key)
summary.aggregate_metric_list([metrics])
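
# + [markdown]
# The need for an explicit key can be illustrated with a standard-library analogy.
# This sketch assumes that "sample" keeps a single randomly chosen value; `sample_one` and its `seed` argument are hypothetical stand-ins for the JAX key and are not part of the library.
# Because the random state is passed in rather than hidden, the same seed always picks the same value.

```python
import random


def sample_one(values, seed):
  """Picks one recorded value using an explicitly supplied seed."""
  # A fresh generator per call keeps the function free of hidden state,
  # loosely mirroring how a JAX PRNG key is passed in explicitly.
  rng = random.Random(seed)
  return rng.choice(values)


values = [2.0, 4.0]
picked = sample_one(values, seed=0)
print(picked in values)  # True
print(picked == sample_one(values, seed=0))  # True: same seed, same pick
```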


# + [markdown] id="0Q98mny6AyNH"
# Finally, there is `"collect"` which concatenates all the values together into one long tensor after first raveling all the inputs. This is useful for extracting distributions of quantities.

# + colab={"base_uri": "https://localhost:8080/"} id="SzVbsqZIA2O9" outputId="9fec7c59-f8bf-4aa1-fa67-3e4bc75153d2"
def layer(params):
  to_look_at = jnp.mean(params) * 2.
  summary.summary(
      "to_look_at", jnp.arange(10) * to_look_at, aggregation="collect")
  return params * 2


@jax.jit
def loss(parameters):
  loss = jnp.mean(layer(layer(parameters))**2)
  return loss


key = jax.random.PRNGKey(0)
result, metrics = summary.with_summary_output_reduced(loss)(
    1., sample_rng_key=key)
summary.aggregate_metric_list([metrics])
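
# + [markdown]
# The "ravel, then concatenate" behavior can be sketched in plain Python with nested lists standing in for arrays.
# The helpers `ravel` and `collect` are hypothetical, and the ordering of the concatenated values is an assumption of this sketch rather than a documented guarantee of the library.

```python
def ravel(value):
  """Flattens an arbitrarily nested list of numbers into a flat list."""
  if isinstance(value, (list, tuple)):
    flat = []
    for item in value:
      flat.extend(ravel(item))
    return flat
  return [value]


def collect(values):
  """Ravels every recorded value and concatenates the results."""
  out = []
  for value in values:
    out.extend(ravel(value))
  return out


# Two recordings of ten values each, like the two `layer` calls above.
first = [i * 2.0 for i in range(10)]
second = [i * 4.0 for i in range(10)]
collected = collect([first, second])
print(len(collected))  # 20
```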


# + [markdown] id="nCN3ET9IQd7A"
# ## Summary Scope
# Sometimes it is useful to group all summaries created inside a code block under a common name. This can be done with the `summary_scope` context manager, and scopes can be nested.

# + colab={"base_uri": "https://localhost:8080/"} id="1SM-Ab-CQgWw" outputId="b9dbd717-006d-465e-c4af-b10445b08d2c"
@jax.jit
def loss(parameters):
  with summary.summary_scope("scope1"):
    summary.summary("to_look_at", parameters)

  with summary.summary_scope("nested"):
    summary.summary("summary2", parameters)

    with summary.summary_scope("scope2"):
      summary.summary("to_look_at", parameters)
  return parameters


key = jax.random.PRNGKey(0)
result, metrics = summary.with_summary_output_reduced(loss)(
    1., sample_rng_key=key)
summary.aggregate_metric_list([metrics])

# + [markdown] id="4pzFDb3FBfi-"
# ## Usage with function transforms.
# Thanks to `oryx`, all of this functionality works well across a variety of function transformations.
# Here is an example with a scan, vmap, and jit.
# The aggregation modes will aggregate across all timesteps, and all batched dimensions.

# + colab={"base_uri": "https://localhost:8080/"} id="FFW4l5RPBn7I" outputId="d621048f-921c-4d40-f6ae-9b516c12078a"
@jax.jit
def fn(a):
  summary.summary("other_val", a[2])

  def update(state, _):
    s = state + 1
    summary.summary("mean_loop", s[0])
    summary.summary("collect_loop", s[0], aggregation="collect")
    return s, s

  a, _ = jax.lax.scan(update, a, jnp.arange(20))
  return a * 2


vmap_fn = jax.vmap(fn)

result, metrics = jax.jit(summary.with_summary_output_reduced(vmap_fn))(
    jnp.tile(jnp.arange(4), (2, 2)))
summary.aggregate_metric_list([metrics])

# + [markdown] id="gwpKu6neCjyN"
# ## Optionally compute metrics: `@add_with_summary`
#
# Oftentimes it is useful to define two "versions" of a function -- one with metrics, and one without -- as sometimes the computation of the metrics adds unneeded overhead that does not need to be run every iteration.
# To create these two versions one can simply wrap the function with the `add_with_summary` decorator.
# This adds both a keyword argument, and an extra return to the wrapped function which switches between computing metrics, or not.

# + colab={"base_uri": "https://localhost:8080/"} id="Nlw9j1NUCuXD" outputId="421f5b94-5446-4fd9-d2aa-f944c54e4a97"
from learned_optimization import summary
import functools


def layer(params):
  to_look_at = jnp.mean(params) * 2.
  summary.summary("to_look_at", jnp.arange(10) * to_look_at)
  return params * 2


@functools.partial(jax.jit, static_argnames="with_summary")
@summary.add_with_summary
def loss(parameters):
  loss = jnp.mean(layer(layer(parameters))**2)
  return loss


res, metrics = loss(1., with_summary=False)
print("No metrics", summary.aggregate_metric_list([metrics]))

res, metrics = loss(1., with_summary=True)
print("With metrics", summary.aggregate_metric_list([metrics]))


# + [markdown] id="SnRKsVLbHA1t"
# ## Limitations and Gotchas

# + [markdown] id="LHCCoTCbLtpD"
# ### Requires value traced to be a function of input
#
# At the moment `summary.summary` MUST be called with a descendant of whatever input is passed into the function wrapped by `with_summary_output_reduced`. In practice this is almost always the case as we seek to monitor changing values rather than constants.
#
# To demonstrate this, note how the constant value is NOT logged, but if we add `a*0` to it, it is logged.

# + colab={"base_uri": "https://localhost:8080/"} id="iIbrjrJ4HEd-" outputId="b1ccd1db-5615-45b9-b52c-0ae78ea9369f"
def monitor(a):
  summary.summary("with_input", a)
  summary.summary("constant", jnp.asarray(2.0))
  summary.summary("constant_with_inp", 2.0 + (a * 0))
  return a


result, metrics = summary.with_summary_output_reduced(monitor)(1.)
summary.aggregate_metric_list([metrics])


# + [markdown] id="yvEW6XfTH5_N"
# The rationale for this is a bit of a rabbit hole; it is related to how tracing in JAX works and is beyond the scope of this notebook.

# + [markdown] id="wk9gKeH7LvHb"
# ### No support for jax.lax.cond
#
# At this point one cannot extract summaries out of JAX conditionals. Sorry. If this is a limitation for you, let us know, as we have some ideas to make this work.

# + [markdown] id="xMUpBsUUK4un"
# ### No dynamic names
#
# At the moment, the tag, or the name of the summary, must be a string known at compile time. There is no support for dynamic summary names.

# + [markdown] id="6s4B2eUoz7nV"
# ## Alternatives for extracting information
# Using this module is not the only way to extract information from a model. We discuss a couple of other approaches.

# + [markdown] id="ZoekEh4J1geQ"
# ### "Thread" metrics through
# One way to extract data from a function is to simply return the things we want to look at. As functions become more complex and nested this can become quite a pain as each one of these functions must pass out metric values. This process of spreading data throughout a bunch of functions is called "threading".
#
# Threading also requires all pieces of code to be involved -- as one must thread these metrics everywhere.

# + colab={"base_uri": "https://localhost:8080/"} id="q78BOx1X1uiN" outputId="4bb8807f-4345-4e8a-c231-9bb482222352"
def lossb(p):
  to_look_at = jnp.mean(123.)
  return p * 2, to_look_at


def loss(parameters):
  l = jnp.mean(parameters**2)
  l, to_look_at = lossb(l)
  return l, to_look_at


value_grad_fn = jax.jit(jax.value_and_grad(loss, has_aux=True))
(loss, to_look_at), g = value_grad_fn(1.0)
print(to_look_at)

# + [markdown] id="jNt9CNJf2HJN"
# ### jax.experimental.host_callback
#
# JAX has some support for sending data from an accelerator back to the host while a JAX program is running. This is exposed in `jax.experimental.host_callback`.
#
# One can use this to print values, which is a quick way to get data out of a network.

# + colab={"base_uri": "https://localhost:8080/"} id="1Ih2LxP22MZD" outputId="0dd0b8ec-2c9e-414d-eadf-843122b7b8ab"
from jax.experimental import host_callback as hcb


def loss(parameters):
  loss = jnp.mean(parameters**2)
  to_look_at = jnp.mean(123.)
  hcb.id_print(to_look_at, name="to_look_at")
  return loss


value_grad_fn = jax.jit(jax.value_and_grad(loss))
_ = value_grad_fn(1.0)

# + [markdown] id="06Dvp3xrEA-R"
# It is also possible to extract data out with [host_callback.id_tap](https://jax.readthedocs.io/en/latest/jax.experimental.host_callback.html#jax.experimental.host_callback.id_tap). We experimented with this briefly for a summary library but found both performance issues and increased complexity around custom transforms.
