
App state is not updating in cf cli and appsman ui when the app is down #4309

Merged
Samze merged 5 commits into cloudfoundry:main from nookala:app_state_down_not_updated
Apr 16, 2025

Conversation

@nookala nookala commented Apr 13, 2025

In some circumstances, CAPI reports an app instance as running when it is actually down.

CAPI iterates over all actual_lrps returned from Diego and uses the app index as the key. When Diego returns more than one actual LRP for the same index, each entry overwrites the previous one, so the state shown is determined by the order of the actual LRP instances.
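
The overwrite can be illustrated with a few lines of Ruby. This is a simplified sketch, not the actual reporter code: the `Lrp` struct and the `states_by_index` helper are hypothetical stand-ins for the objects Diego returns and for the keying logic described above.

```ruby
# Hypothetical, simplified stand-in for an actual LRP returned by Diego.
Lrp = Struct.new(:index, :state, :since, keyword_init: true)

# Naive keying by index (assumed shape of the buggy behaviour):
# a later entry with the same index silently replaces the earlier one.
def states_by_index(lrps)
  lrps.each_with_object({}) { |lrp, result| result[lrp.index] = lrp.state }
end

unclaimed = Lrp.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)
running   = Lrp.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)

# The reported state depends only on the order in which the entries arrive.
puts states_by_index([unclaimed, running])[3] # the stale RUNNING entry wins
puts states_by_index([running, unclaimed])[3] # the newer UNCLAIMED entry wins
```

With the same two records, the reported state flips depending on which one happens to come last.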

Example of a duplicate entry from cfdot actual-lrps. Note that the process_guid and index are the same in both records, and that the UNCLAIMED record has the later since value.

{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "",
  "cell_id": "",
  "address": "",
  "ports": null,
  "preferred_address": "UNKNOWN",
  "crash_count": 0,
  "state": "UNCLAIMED",
  "placement_error": "unable to communicate to compatible cells",
  "since": 1739568280529021112,
  "modification_tag": {
    "epoch": "780635af-9208-4d5e-5a08-ea49ebcb3f95",
    "index": 5758
  },
  "presence": "ORDINARY",
  "OptionalRoutable": {
    "routable": false
  },
  "availability_zone": ""
}
{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "1f3ffac3-be77-45e0-5075-7357",
  "cell_id": "23b06662-20e7-42dd-9377-6d8f10190ec4",
  "address": "10.0.4.17",
  "ports": [
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61001,
      "host_tls_proxy_port": 61014
    },
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61443,
      "host_tls_proxy_port": 0
    },
    {
      "container_port": 2222,
      "host_port": 61013,
      "container_tls_proxy_port": 61002,
      "host_tls_proxy_port": 61015
    }
  ],
  "instance_address": "10.255.233.24",
  "preferred_address": "HOST",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1739222044495241579,
  "modification_tag": {
    "epoch": "4a424a13-b5ba-47b7-771a-1a61d99c2524",
    "index": 2
  },
  "presence": "SUSPECT",
  "metric_tags": {
    "app_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "app_name": "static",
    "instance_id": "3",
    "organization_id": "c877a084-d65b-4758-9908-90201c6df339",
    "organization_name": "org-1",
    "process_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "process_instance_id": "1f3ffac3-be77-45e0-5075-7357",
    "process_type": "web",
    "source_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "space_id": "b248d5ab-2948-468b-ad0f-7b1b90e923d1",
    "space_name": "space-1"
  },
  "OptionalRoutable": {
    "routable": true
  },
  "availability_zone": "us-central1-f"
}
Fix
When duplicates exist for the same index, CAPI should compare the since values of the actual_lrp entries and keep the most recent one.
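
A minimal sketch of that selection rule in Ruby follows. It is not the merged change itself: `ActualLrp` and `latest_lrp_per_index` are hypothetical names used only for illustration, and `since` is assumed to be a comparable integer timestamp, as in the cfdot output above.

```ruby
# Hypothetical, simplified stand-in for an actual LRP returned by Diego.
ActualLrp = Struct.new(:index, :state, :since, keyword_init: true)

# Keep exactly one LRP per index: the one with the largest `since` value.
# The result no longer depends on the order of the input.
def latest_lrp_per_index(lrps)
  lrps.group_by(&:index).transform_values { |dups| dups.max_by(&:since) }
end

lrps = [
  ActualLrp.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112),
  ActualLrp.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)
]

# Both orderings select the UNCLAIMED record, which has the later timestamp.
puts latest_lrp_per_index(lrps)[3].state
puts latest_lrp_per_index(lrps.reverse)[3].state
```

Grouping first and then taking the maximum keeps the rule explicit. An alternative under the same assumptions would be to compare `since` while iterating and overwrite only when the new entry is more recent.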
Tested by killing the Diego cell VM with bosh delete-vm. cf app now returns the correct status:

instances:      0/2
memory usage:   1024M
     state   since                  cpu    memory     disk       logging        cpu entitlement   details
#0   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells
#1   down    2025-04-15T00:34:03Z   0.0%   0B of 0B   0B of 0B   0B/s of 0B/s                     unable to communicate to compatible cells

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@nookala nookala marked this pull request as ready for review April 15, 2025 14:36
@Samze Samze self-requested a review April 15, 2025 14:57
Comment thread lib/cloud_controller/diego/reporters/instances_stats_reporter.rb Outdated
@Samze Samze merged commit 53e13f7 into cloudfoundry:main Apr 16, 2025
8 checks passed
ari-wg-gitbot added a commit to cloudfoundry/capi-release that referenced this pull request Apr 16, 2025
Changes in cloud_controller_ng:

- App state is not updating in cf cli and appsman ui when the app is down
    PR: cloudfoundry/cloud_controller_ng#4309
    Author: Sriram Nookala <snookala@vmware.com>