Skip to content

Log Persistence#828

Merged
SteveMicroNova merged 1 commit into
mainfrom
PersistantLogs
Oct 10, 2024
Merged

Log Persistence#828
SteveMicroNova merged 1 commit into
mainfrom
PersistantLogs

Conversation

@SteveMicroNova
Copy link
Copy Markdown
Contributor

@SteveMicroNova SteveMicroNova commented Jul 24, 2024

What does this change intend to accomplish?

Supports #741 by adding a new tab to the updater called "Admin Settings" that contains the password reset stuff as well as the option to toggle log persistence

Checklist

  • Have you tested your changes and ensured they work?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • If applicable, have you updated the CHANGELOG?
  • Does your submission pass linting & tests? You can test on localhost using ./scripts/test
  • If this is a UI change, have you tested it across multiple browser platforms?
  • If this is a UI change, have you tested across multiple viewport sizes (ie. desktop versus mobile)?

@SteveMicroNova SteveMicroNova changed the title Create function, endpoint, frontend implementation for log persistence Log Persistence Jul 24, 2024
@rtertiaer
Copy link
Copy Markdown
Contributor

I'm of split mind between putting this in the main amplipi setup, and in the updater (which we should probably rename "admin" or similar.)

@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

I'm of split mind between putting this in the main amplipi setup, and in the updater (which we should probably rename "admin" or similar.)

@linknum23 Thoughts?

@linknum23
Copy link
Copy Markdown
Contributor

Sorry I didn't get a chance to look at this before my early weekend adventures.

Currently the only thing that modifies the logging is the updater so modifying it in amplipi as well feels awkward. However that would mean that this menu has to live in the updater/admin interface somewhere. The right choice is not obvious to me right now. That probably means we need a better interface for stuff like this.

Comment thread scripts/configure.py Outdated
@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

Currently the only thing that modifies the logging is the updater

Where at? I'm looking at the updater and I just see the latest update, a way to revert to previous updates, a way to upload custom updates, a way to set a password for the web interface, and the support tunnel. Nothing that has to do with logs as far as I can see

@SteveMicroNova SteveMicroNova marked this pull request as ready for review August 5, 2024 19:32
@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

Marking this as ready to review as both people that have chimed in are uncertain as to if this belongs in the updater or not
If we solidly say "this belongs in the updater", it will be reverted to draft while I work on that

Comment thread amplipi/app.py Outdated
Comment on lines +233 to +247
@api.post(
'/api/persist_logs',
tags=["config"],
response_class=Response,
responses={
200: {}
}
)
def log_persistance(ctrl: Api = Depends(get_ctrl)):
"""Toggles log persistance by directly changing the journald.conf and restarting the service"""
try:
ctrl.toggle_persist_logs()
except Exception as e:
logger.error("persist_logs toggle failed!")
raise e
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found reasons to call this function from within the Api object itself and so moved it there, this imports the Api object as ctrl and calls the function

Comment thread amplipi/ctrl.py Outdated
Comment on lines +235 to +250
# determine if logs are persistant, based on the contents of a file
try:
conf_path = '/etc/systemd/journald.conf'
confparse = configparser.ConfigParser(strict=False, allow_no_value=True)
confparse.read(conf_path)

storage_value = confparse.get("Journal", "Storage")
if storage_value == "volatile":
self.persist_logs = False
elif storage_value == "persistent":
self.persist_logs = True
else:
self.toggle_persist_logs(False)
except Exception as exc:
logger.exception("Error reading persist_logs flag, setting to False (volatile logs). Exception: {exc}")
self.toggle_persist_logs(False)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempts to read the config file to see what the current mode is. If it cannot be read, default to not persisting logs

Comment thread amplipi/ctrl.py Outdated
Comment on lines +457 to +460
if value is not None:
goal_value = value
else:
goal_value = not self.persist_logs
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the option to explicitly manually toggle the persistence rather than always flicking the switch to the inverse of the present setting

@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

Requesting review from Ryan as he reviews all my things

Requesting review from Lincoln as he's already had input and I would like to continue to ask his opinion, but I do not need his review per se

Requesting review from Klayton to potentially have a tiebreaker for the "should the be in the updater or not?" issue

@rtertiaer
Copy link
Copy Markdown
Contributor

Where at?
at a minimum here, called from the "updater side" of things.

@rtertiaer
Copy link
Copy Markdown
Contributor

my vote would be to put this in the updater, given the above.

@klay2000
Copy link
Copy Markdown
Contributor

klay2000 commented Aug 5, 2024

I don't feel like this goes in the updater because it's not exactly related to updates, however I also don't feel like the regular Amplipi web app should be able to change something like this. Given that the updater also has the support tunnel stuff I suppose there is some precedent for something like this to go there. Ultimately I think as was said before we need a better place for this kind of thing, but for now the updater is good enough.

@SteveMicroNova SteveMicroNova force-pushed the PersistantLogs branch 2 times, most recently from 8442fbe to 9f852a4 Compare August 6, 2024 03:05
@rtertiaer
Copy link
Copy Markdown
Contributor

imo, the "updater" has never been an updater, but an administration interface (and once upon a time, the only administration you could do was updating the appliance.) it's a separate app with elevated privileges, accessible even when the main app breaks.

@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

Recording.2024-08-06.141145.mp4

Log persistence has been moved to the updater page (now called admin) and plugged in properly
See video for changes and proof of function, or go to lamplipi now to see for yourself. Next steps are make sure you can actually access persisted logs from the logs page, and find where those are stored in the first place

@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

SteveMicroNova commented Aug 6, 2024

Next steps are make sure you can actually access persisted logs from the logs page, and find where those are stored in the first place

I've found the file they persist to and verified that the logs page can find them, the only thing is persistent logs always persist, and will be on top of the pile on the logs page even when you turn off persistent storage. We should garbage collect this a bit, the only question is on what basis.

My immediate ideas:

  1. simply wipe them out the second log persistence is turned off
  2. turn on a midnight cronjob to clear the persist logs directory of all files at midnight if persist logs is deactivated

@linknum23
Copy link
Copy Markdown
Contributor

Should we change the Settings name from "Admin" to "Admin / Updates" to maintain consistency with the old interface?

Comment thread CHANGELOG.md Outdated
* Move system state out of running software directory
* Fix updates on systems with an API password set
* Fix the display service on systems with an API password set
* Add toggleable option to persist logs
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to move to a different section now that it's no longer in the webapp

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially because this is now getting into a release a version or two later 🥴

@SteveMicroNova SteveMicroNova force-pushed the PersistantLogs branch 4 times, most recently from aa09db4 to b4e63d6 Compare October 3, 2024 18:39
Copy link
Copy Markdown
Contributor Author

@SteveMicroNova SteveMicroNova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments turned into a review at some point during writing so here's an obligatory box of text with the comment tag so I don't have to rewrite what I've said

Comment thread scripts/increment_auto_off.py Outdated
Comment on lines +47 to +52
if response.ok:
loop = False
except Exception as exc:
logger.exception(f"Auto off increment has failed with the following exception, trying again in 5 seconds: {exc}")
finally:
time.sleep(5) # Loop until there is a response.ok at the end
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only mechanism by which this can fail is network issues, so I think that a simple retry will suffice
There is no way to have a key issue as the file is verified as being there at the start of every endpoint, and if it isn't there or doesn't have all the proper headers then the file is created or the headers added.

Comment thread amplipi/updater/asgi.py
Comment on lines +197 to +239
except Exception as exc:
logger.exception(str(exc))
return 500
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think return 500 is a sensible way to handle things, but it is what we do everywhere else and one of the notes was "do this like everything else" and so I've simply followed the template of the endpoints beneath this

@SteveMicroNova SteveMicroNova force-pushed the PersistantLogs branch 3 times, most recently from a6a0879 to e76e8ea Compare October 4, 2024 15:39
@@ -0,0 +1,47 @@
#!/usr/bin/env python3
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last few commits I've entirely reworked this file

Calling attention to it because the whole thing is green either way

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is even more true now

Comment on lines +167 to +206
<span class="hover-tooltip">
This form lets you set a web interface password, which will make it so that you must input the same password to access the system's controls. Leaving the fields empty will remove password protection. Note that this only affects the web interface, not the system password, and will log out any active sessions.
</span>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both tooltips probably need some rewording

I moved them to be attached to the headers rather than to the input labels with the last commit to make them mobile compatible, previously you would've had to potentially inadvertently check the "enable" checkbox of persist logs to see what the tooltip has to say on mobile due to their lack of hoverover. This has had the effect of changing the context of the tooltips as well

Comment thread web/src/pages/Settings/Settings.jsx
@SteveMicroNova SteveMicroNova force-pushed the PersistantLogs branch 2 times, most recently from 08ec9ae to 2b63767 Compare October 4, 2024 19:23
@linknum23
Copy link
Copy Markdown
Contributor

Looks like we are going to need to update some update docs in several places here and on the interwebz

@SteveMicroNova
Copy link
Copy Markdown
Contributor Author

Looks like we are going to need to update some update docs in several places here and on the interwebz

Yeah, that was my concern with that change. There's a lot of places that we can't change either such as other people's posts on discourse and old email chains, though luckily those instances feel like they've got fairly limited blast radius
I'll make sure to update them in my #923 branch if this is merged before that (which I expect)

Comment thread scripts/increment_auto_off.py Outdated
Comment on lines +46 to +50
except Exception as exc:
logger.exception(f"increment_auto_off.py has failed with the following exception, trying again in 5 seconds: {exc}")
finally:
if loop:
time.sleep(5) # Loop until there is a response.ok at the end
Copy link
Copy Markdown
Contributor

@rtertiaer rtertiaer Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, an endless loop is a bad idea. there is no indication that the error in question will suddenly resolve itself by virtue of trying again, since we catch all exceptions instead of ones that might actually work after retrying, such as anything requests raises. you also missed an entire logical branch where this could "fall through" and use some stale value.

even if it is something that trying again could resolve, if it's not noticed in many days or months you could end up running hundreds of these if not caught. they should have an expiration date, at a bare minimum.

looping to hit a webserver can exacerbate the issue; if the web server is not able to respond in a timely fashion, you could overload it with requests. that's especially problematic because of the above - you could end up with 300 scripts looping and hitting the web server over and over, and everything breaks down. that's especially sketchy because the server we're Denial-of-Service-ing has multiple ways to fix this when it happens (the support tunnel and a firmware upgrade), though a user is more liable to just restart their box before either of those. it's extremely bad in larger web app setups - your own webapp can DDOS your own webserver, if you ship it to your 10000 daily users (or at least make it balloon in scale and become very costly.)

I would personally not wrap any of this in a loop and use a retry decorator with much better options, especially for the exponential backoff and jitter options.

Comment thread scripts/increment_auto_off.py Outdated
Comment on lines +23 to +25
if state["persist_logs"]:
future_persist_state = state["auto_off_delay"] != 1
delay = state["auto_off_delay"] - 1
Copy link
Copy Markdown
Contributor

@rtertiaer rtertiaer Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to check for these keys existence. please add this file to our mypy checking and resolve all issues it reports.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and our linter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I feel like I've added them to our workflows properly, though picking through the completed workflows I don't see the file getting checked either way. Here's something that approximates a badge of approval from the checkers, though.

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 81 lines in your changes missing coverage. Please review.

Project coverage is 50.09%. Comparing base (7499989) to head (6e5a026).
Report is 27 commits behind head on main.

Files with missing lines Patch % Lines
amplipi/updater/asgi.py 0.00% 81 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #828      +/-   ##
==========================================
- Coverage   50.67%   50.09%   -0.58%     
==========================================
  Files          40       40              
  Lines        7154     7232      +78     
==========================================
- Hits         3625     3623       -2     
- Misses       3529     3609      +80     
Flag Coverage Δ
unittests 50.09% <0.00%> (-0.58%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

logconf.read('/var/log/logging.ini')

persist = journalconf.get("Journal", "Storage", fallback="volatile")
state_persist = True if persist not in ("volatile", "persistent") else persist == "persistent" # If value is somehow invalid, assume it's True just in case
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that persistence is on in case you somehow corrupt your journald.conf file during a catastrophic issue, that way it remains on for troubleshooting purposes

I do not see a reality where you have that file set to an invalid option and meant for it to be off. If that is the case though, simply wait 14 days by default and you'll be fine.

if api_read_failure:
try:
# If the api is inaccessible, assume it is down for whatever reason and extrapolate that there is no race condition to reading it directly
# This section is a copy of the /settings/persist_logs GET in asgi.py
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a copy of any section of the code, you should probably create a function out of it and call it from both places

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is difficult due to them being two entirely separate programs unfortunately

I tried doing something like adding a symbolic link between the amplipi directory and /usr/local/bin where this script ends up so I could do a relative import and that almost worked except for the fact that this script is outside of the venv and so doesn't have the dependencies to import anything without errors

Maybe someday I'll come back to this and make it even better, for now it's been two hours since you approved this PR so I'm gonna stop spending time on making it more than it is

Update CHANGELOG.md

Lint ctrl.py

Add persist_logs bool and integrate throughout the config flow
Add many logger.info chunks for testing

Add more logic to persist_logs toggle endpoint

Change how default value is handled

Change default permissions of journald.conf

Fix spelling errors, add logger on frontend

Move toggle_log_persistance away from FastAPI and towards the Api object

Linting

Call reinit at end of log toggle, reword models.py description of persist_logs

Remove all uses of persist_logs from amplipi core, move to updater side

Remove lingering changes from app.py, ctrl.py

pylint

pylint (2)

Finish porting new admin settings tab to updater

Change name of Updates tab on settings page

Fix spelling in comment in app.py

Robustify setPersist function against edge cases, add documentation

pylint

Add delete logs endpoint, cronjob that calls it at midnight

Updates/Admin

Change comment on delete_persisted_logs_crontab

Update CHANGELOG.md

Remove deletion endpoint + supporting files

remove optional "value" bool from post persist_logs

Fix wording in index.html

Change configure.py to only set logs to volatile if the /var/log/journal directory doesn't exist (i.e., only set volatility on first deploy)

Reduce verbocity in configure.py

Add auto_off_delay feature to admin settings

Pylint

Pylint

Rename attempt_persist_logs_auto_off.py to increment_auto_off.py
Change logic paths of POST /settings/persist_logs to avoid interacting with the files more often than necessary

PyCodeStyle

Update crontab to have a user associated with it

remove unused variables from increment_auto_off.py

Swap from using subprocess.run and configparser to just responses in increment_auto_off.py

Copy files properly in configure.py

Cover some oversights in asgi.py

Add hoverover tooltip for Auto Off Delay label

Make sure to pip install requests in configure.py

pylint
segregate standard imports from package import in increment_auto_off.py

Update styling

Move password setting functionality to Admin Settings page, add loading spinner to save log state button

update IDs, classnames to use JavaScript naming instead of Python naming ( - instead of _ )

Add new success and failure notification for persist save button

Fix formatting for smaller screens on the updater

Update CHANGELOG.md

Update CHANGELOG.md

Fix styling of admin panel

Correct error handling, fix styling issue

pylint

Extend safety net

remove pip install requests from configure.py, that should be handled by requirements.txt

Update styling, tooltips

Gate the delay of increment_auto_off.py behind an if statement

Revamp increment_auto_off.py to be more efficient

Update verbiage around the updater

Update increment_auto_off.py to not run or loop if persistance is disabled

Edit github workflows to include more files
Add more, better file validation to /persist_logs endpoints
Remove looping, add checking to increment_auto_off.py

fix typo in python-app.yaml

Add docstring to increment_auto_off.py, add same file to github pylinting

Reconfigure file validation functions
Simplify persistence toggle function
Rework increment_auto_off.py to be able to use the backdoor if the frontdoor doesn't work out

linting

linting (2)

Linting (3)

Simplify file verification functions
Include minor verification in the non-api route of increment_auto_off.py

pycodestyle

Further robustify auto_off.py and validator functions against direct user interventions

pypcodestyle

Update password set function to give feedback in the same way as updating persistence and delay
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants