Log Persistence#828
Conversation
6496985 to
3c3cadb
Compare
|
I'm of split mind between putting this in the main amplipi setup, and in the updater (which we should probably rename "admin" or similar.) |
@linknum23 Thoughts? |
29e37f4 to
cfe9606
Compare
|
Sorry I didn't get a chance to look at this before my early weekend adventures. Currently the only thing that modifies the logging is the updater so modifying it in amplipi as well feels awkward. However that would mean that this menu has to live in the updater/admin interface somewhere. The right choice is not obvious to me right now. That probably means we need a better interface for stuff like this. |
Where at? I'm looking at the updater and I just see the latest update, a way to revert to previous updates, a way to upload custom updates, a way to set a password for the web interface, and the support tunnel. Nothing that has to do with logs as far as I can see |
53cba44 to
ef829ef
Compare
|
Marking this as ready to review as both people that have chimed in are uncertain as to if this belongs in the updater or not |
| @api.post( | ||
| '/api/persist_logs', | ||
| tags=["config"], | ||
| response_class=Response, | ||
| responses={ | ||
| 200: {} | ||
| } | ||
| ) | ||
| def log_persistance(ctrl: Api = Depends(get_ctrl)): | ||
| """Toggles log persistance by directly changing the journald.conf and restarting the service""" | ||
| try: | ||
| ctrl.toggle_persist_logs() | ||
| except Exception as e: | ||
| logger.error("persist_logs toggle failed!") | ||
| raise e |
There was a problem hiding this comment.
I found reasons to call this function from within the Api object itself and so moved it there, this imports the Api object as ctrl and calls the function
| # determine if logs are persistant, based on the contents of a file | ||
| try: | ||
| conf_path = '/etc/systemd/journald.conf' | ||
| confparse = configparser.ConfigParser(strict=False, allow_no_value=True) | ||
| confparse.read(conf_path) | ||
|
|
||
| storage_value = confparse.get("Journal", "Storage") | ||
| if storage_value == "volatile": | ||
| self.persist_logs = False | ||
| elif storage_value == "persistent": | ||
| self.persist_logs = True | ||
| else: | ||
| self.toggle_persist_logs(False) | ||
| except Exception as exc: | ||
| logger.exception("Error reading persist_logs flag, setting to False (volatile logs). Exception: {exc}") | ||
| self.toggle_persist_logs(False) |
There was a problem hiding this comment.
Attempts to read the config file to see what the current mode is. If it cannot be read, default to not persisting logs
| if value is not None: | ||
| goal_value = value | ||
| else: | ||
| goal_value = not self.persist_logs |
There was a problem hiding this comment.
Added the option to explicitly manually toggle the persistence rather than always flicking the switch to the inverse of the present setting
|
Requesting review from Ryan as he reviews all my things Requesting review from Lincoln as he's already had input and I would like to continue to ask his opinion, but I do not need his review per se Requesting review from Klayton to potentially have a tiebreaker for the "should the be in the updater or not?" issue |
|
|
my vote would be to put this in the updater, given the above. |
|
I don't feel like this goes in the updater because it's not exactly related to updates, however I also don't feel like the regular Amplipi web app should be able to change something like this. Given that the updater also has the support tunnel stuff I suppose there is some precedent for something like this to go there. Ultimately I think as was said before we need a better place for this kind of thing, but for now the updater is good enough. |
8442fbe to
9f852a4
Compare
|
imo, the "updater" has never been an updater, but an administration interface (and once upon a time, the only administration you could do was updating the appliance.) it's a separate app with elevated privileges, accessible even when the main app breaks. |
8d2eb25 to
bacf2e1
Compare
Recording.2024-08-06.141145.mp4Log persistence has been moved to the updater page (now called admin) and plugged in properly |
I've found the file they persist to and verified that the logs page can find them, the only thing is persistent logs always persist, and will be on top of the pile on the logs page even when you turn off persistent storage. We should garbage collect this a bit, the only question is on what basis. My immediate ideas:
|
|
Should we change the Settings name from "Admin" to "Admin / Updates" to maintain consistency with the old interface? |
755822b to
8b4ec19
Compare
| * Move system state out of running software directory | ||
| * Fix updates on systems with an API password set | ||
| * Fix the display service on systems with an API password set | ||
| * Add toggleable option to persist logs |
There was a problem hiding this comment.
This might need to move to a different section now that it's no longer in the webapp
There was a problem hiding this comment.
Especially because this is now getting into a release a version or two later 🥴
aa09db4 to
b4e63d6
Compare
SteveMicroNova
left a comment
There was a problem hiding this comment.
My comments turned into a review at some point during writing so here's an obligatory box of text with the comment tag so I don't have to rewrite what I've said
| if response.ok: | ||
| loop = False | ||
| except Exception as exc: | ||
| logger.exception(f"Auto off increment has failed with the following exception, trying again in 5 seconds: {exc}") | ||
| finally: | ||
| time.sleep(5) # Loop until there is a response.ok at the end |
There was a problem hiding this comment.
The only mechanism by which this can fail is network issues, so I think that a simple retry will suffice
There is no way to have a key issue as the file is verified as being there at the start of every endpoint, and if it isn't there or doesn't have all the proper headers then the file is created or the headers added.
| except Exception as exc: | ||
| logger.exception(str(exc)) | ||
| return 500 |
There was a problem hiding this comment.
I don't think return 500 is a sensible way to handle things, but it is what we do everywhere else and one of the notes was "do this like everything else" and so I've simply followed the template of the endpoints beneath this
a6a0879 to
e76e8ea
Compare
| @@ -0,0 +1,47 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
In the last few commits I've entirely reworked this file
Calling attention to it because the whole thing is green either way
There was a problem hiding this comment.
This is even more true now
| <span class="hover-tooltip"> | ||
| This form lets you set a web interface password, which will make it so that you must input the same password to access the system's controls. Leaving the fields empty will remove password protection. Note that this only affects the web interface, not the system password, and will log out any active sessions. | ||
| </span> |
There was a problem hiding this comment.
Both tooltips probably need some rewording
I moved them to be attached to the headers rather than to the input labels with the last commit to make them mobile compatible, previously you would've had to potentially inadvertently check the "enable" checkbox of persist logs to see what the tooltip has to say on mobile due to their lack of hoverover. This has had the effect of changing the context of the tooltips as well
7477767 to
31fbe93
Compare
08ec9ae to
2b63767
Compare
|
Looks like we are going to need to update some update docs in several places here and on the interwebz |
Yeah, that was my concern with that change. There's a lot of places that we can't change either such as other people's posts on discourse and old email chains, though luckily those instances feel like they've got fairly limited blast radius |
| except Exception as exc: | ||
| logger.exception(f"increment_auto_off.py has failed with the following exception, trying again in 5 seconds: {exc}") | ||
| finally: | ||
| if loop: | ||
| time.sleep(5) # Loop until there is a response.ok at the end |
There was a problem hiding this comment.
in this case, an endless loop is a bad idea. there is no indication that the error in question will suddenly resolve itself by virtue of trying again, since we catch all exceptions instead of ones that might actually work after retrying, such as anything requests raises. you also missed an entire logical branch where this could "fall through" and use some stale value.
even if it is something that trying again could resolve, if it's not noticed in many days or months you could end up running hundreds of these if not caught. they should have an expiration date, at a bare minimum.
looping to hit a webserver can exacerbate the issue; if the web server is not able to respond in a timely fashion, you could overload it with requests. that's especially problematic because of the above - you could end up with 300 scripts looping and hitting the web server over and over, and everything breaks down. that's especially sketchy because the server we're Denial-of-Service-ing has multiple ways to fix this when it happens (the support tunnel and a firmware upgrade), though a user is more liable to just restart their box before either of those. it's extremely bad in larger web app setups - your own webapp can DDOS your own webserver, if you ship it to your 10000 daily users (or at least make it balloon in scale and become very costly.)
I would personally not wrap any of this in a loop and use a retry decorator with much better options, especially for the exponential backoff and jitter options.
| if state["persist_logs"]: | ||
| future_persist_state = state["auto_off_delay"] != 1 | ||
| delay = state["auto_off_delay"] - 1 |
There was a problem hiding this comment.
you need to check for these keys existence. please add this file to our mypy checking and resolve all issues it reports.
|
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #828 +/- ##
==========================================
- Coverage 50.67% 50.09% -0.58%
==========================================
Files 40 40
Lines 7154 7232 +78
==========================================
- Hits 3625 3623 -2
- Misses 3529 3609 +80
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
| logconf.read('/var/log/logging.ini') | ||
|
|
||
| persist = journalconf.get("Journal", "Storage", fallback="volatile") | ||
| state_persist = True if persist not in ("volatile", "persistent") else persist == "persistent" # If value is somehow invalid, assume it's True just in case |
There was a problem hiding this comment.
I assume that persistence is on in case you somehow corrupt your journald.conf file during a catastrophic issue, that way it remains on for troubleshooting purposes
I do not see a reality where you have that file set to an invalid option and meant for it to be off. If that is the case though, simply wait 14 days by default and you'll be fine.
| if api_read_failure: | ||
| try: | ||
| # If the api is inaccessible, assume it is down for whatever reason and extrapolate that there is no race condition to reading it directly | ||
| # This section is a copy of the /settings/persist_logs GET in asgi.py |
There was a problem hiding this comment.
if this is a copy of any section of the code, you should probably create a function out of it and call it from both places
There was a problem hiding this comment.
This is difficult due to them being two entirely separate programs unfortunately
I tried doing something like adding a symbolic link between the amplipi directory and /usr/local/bin where this script ends up so I could do a relative import and that almost worked except for the fact that this script is outside of the venv and so doesn't have the dependencies to import anything without errors
Maybe someday I'll come back to this and make it even better, for now it's been two hours since you approved this PR so I'm gonna stop spending time on making it more than it is
585cc8d to
abe817e
Compare
Update CHANGELOG.md Lint ctrl.py Add persist_logs bool and integrate throughout the config flow Add many logger.info chunks for testing Add more logic to persist_logs toggle endpoint Change how default value is handled Change default permissions of journald.conf Fix spelling errors, add logger on frontend Move toggle_log_persistance away from FastAPI and towards the Api object Linting Call reinit at end of log toggle, reword models.py description of persist_logs Remove all uses of persist_logs from amplipi core, move to updater side Remove lingering changes from app.py, ctrl.py pylint pylint (2) Finish porting new admin settings tab to updater Change name of Updates tab on settings page Fix spelling in comment in app.py Robustify setPersist function against edge cases, add documentation pylint Add delete logs endpoint, cronjob that calls it at midnight Updates/Admin Change comment on delete_persisted_logs_crontab Update CHANGELOG.md Remove deletion endpoint + supporting files remove optional "value" bool from post persist_logs Fix wording in index.html Change configure.py to only set logs to volatile if the /var/log/journal directory doesn't exist (i.e., only set volatility on first deploy) Reduce verbocity in configure.py Add auto_off_delay feature to admin settings Pylint Pylint Rename attempt_persist_logs_auto_off.py to increment_auto_off.py Change logic paths of POST /settings/persist_logs to avoid interacting with the files more often than necessary PyCodeStyle Update crontab to have a user associated with it remove unused variables from increment_auto_off.py Swap from using subprocess.run and configparser to just responses in increment_auto_off.py Copy files properly in configure.py Cover some oversights in asgi.py Add hoverover tooltip for Auto Off Delay label Make sure to pip install requests in configure.py pylint segregate standard imports from package import in increment_auto_off.py Update styling Move password setting functionality to Admin Settings page, add loading spinner to save log state button update IDs, classnames to use JavaScript naming instead of Python naming ( - instead of _ ) Add new success and failure notification for persist save button Fix formatting for smaller screens on the updater Update CHANGELOG.md Update CHANGELOG.md Fix styling of admin panel Correct error handling, fix styling issue pylint Extend safety net remove pip install requests from configure.py, that should be handled by requirements.txt Update styling, tooltips Gate the delay of increment_auto_off.py behind an if statement Revamp increment_auto_off.py to be more efficient Update verbiage around the updater Update increment_auto_off.py to not run or loop if persistance is disabled Edit github workflows to include more files Add more, better file validation to /persist_logs endpoints Remove looping, add checking to increment_auto_off.py fix typo in python-app.yaml Add docstring to increment_auto_off.py, add same file to github pylinting Reconfigure file validation functions Simplify persistence toggle function Rework increment_auto_off.py to be able to use the backdoor if the frontdoor doesn't work out linting linting (2) Linting (3) Simplify file verification functions Include minor verification in the non-api route of increment_auto_off.py pycodestyle Further robustify auto_off.py and validator functions against direct user interventions pypcodestyle Update password set function to give feedback in the same way as updating persistence and delay
abe817e to
eee65a4
Compare

What does this change intend to accomplish?
Supports #741 by adding a new tab to the updater called "Admin Settings" that contains the password reset stuff as well as the option to toggle log persistence
Checklist
./scripts/test