The script_success metric reports 1 (success) for experiment="wehe" on machines matching machine=~".*(bru06|cgk01|chs0t|fra07|iad07|iad08|lax07|lax08|lax0t|lhr09|ord07|pdx0t|sea09|sin02|tpe02).*" (the cloud nodes). Since NDT is currently the only experiment running on cloud nodes, this output is incorrect.
To debug this issue, we connected to one of the script exporter containers in the prometheus-federation cluster and ran the script_exporter command for wehe on cloud machines. Although the reported value was 1, the logs showed a connection error.
Note: one way to find the set of cloud nodes is to run the following command in the "sites" folder of siteinfo:
$ grep default_virtual *
bru06.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
cgk01.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
chs0t.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
fra07.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
iad07.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
iad08.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
lax07.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
lax08.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
lax0t.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
lhr09.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
ord07.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
pdx0t.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
sea09.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
sin02.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
tpe02.jsonnet:local sitesDefault = import 'sites/_default_virtual.jsonnet';
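The grep output above can also be turned directly into the machine regex used in the query. The following is only a minimal sketch: the function name cloud_sites_regex is hypothetical, and it assumes that cloud sites are exactly the *.jsonnet files in the "sites" folder that mention default_virtual, and that files whose names start with "_" are templates rather than sites.

```shell
# Hypothetical helper (not part of siteinfo): print a regex alternation of
# cloud site names, suitable for a machine=~"..." label matcher.
# Assumption: a cloud site is any *.jsonnet file that mentions
# "default_virtual", as in the grep command above.
cloud_sites_regex() {
  dir="${1:-.}"   # path to the "sites" folder; defaults to the current directory
  sites=$(
    cd "$dir" &&
      grep -l default_virtual -- *.jsonnet |  # list matching file names only
      grep -v '^_' |                          # assumption: "_"-prefixed files are templates
      sed 's/\.jsonnet$//' |                  # bru06.jsonnet -> bru06
      sort |
      paste -sd '|' -                         # join the names with "|"
  )
  printf '.*(%s).*\n' "$sites"
}
```

Running `cloud_sites_regex .` from the "sites" folder prints the site names joined with `|` and wrapped in `.*( ... ).*`, which can be compared against the regex used for the script_success metric.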