Route backend jobs to use correct image #993

Merged: deer-wmde merged 4 commits into main from T408624 on Nov 14, 2025
tarrow commented Nov 6, 2025

This selects a pod that matches the selector[1]
for the appropriate backend service rather than
any random backend MediaWiki pod

[1] https://kubernetes.io/docs/concepts/services-networking/service/#services-in-kubernetes

Bug: T408624
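
To make the idea concrete, here is a minimal shell sketch of the same lookup done by hand with kubectl. It is not the PHP code from this PR. The helper names `service_selector` and `pods_for_service` are hypothetical, and the `default` namespace is assumed because that is the namespace used elsewhere in this thread.

```shell
# Sketch only: find the pods a Service routes to by reusing its selector.
# Helper names are hypothetical; the "default" namespace is an assumption.

# Print a Service's selector as "k1=v1,k2=v2".
service_selector() (
  sel="$(kubectl get service "$1" --namespace default \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  # The template leaves a trailing comma behind; strip it.
  printf '%s\n' "${sel%,}"
)

# List the pods whose labels match that selector.
# Note: a Service without a selector yields an empty string here,
# which would match every pod in the namespace.
pods_for_service() {
  kubectl get pods --namespace default -l "$(service_selector "$1")" -o name
}
```

For example, `pods_for_service mediawiki-143-app-backend` should list only the pods behind that one backend Service, assuming a Service with that name exists in the cluster.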

tarrow force-pushed the T408624 branch 3 times, most recently from 78fa82e to 530fb26, on November 6, 2025 19:46

tarrow commented Nov 6, 2025

In addition to this "unit-like" test with mocked k8s client responses, I also tried it out on a Kubernetes minikube setup, this time running in GKE. It was still painfully slow, but at least it didn't lock up my personal machine. I didn't actually get as far as testing that it routes correctly.

I would do this by creating a couple of wikis, one with a 139 DB and one with a 143 DB. I would then manually dispatch app/Jobs/ProcessMediaWikiJobsJob.php and inspect the jobs that are fired off.

@tarrow tarrow marked this pull request as ready for review November 6, 2025 20:20
Comment thread: app/Jobs/ProcessMediaWikiJobsJob.php (outdated)

    - public function handle(Client $kubernetesClient): void {
    + public function handle(Client $kubernetesClient, MediaWikiHostResolver $resolver): void {
    +     $domain = $resolver->getBackendHostForDomain($this->wikiDomain);
deer-wmde commented Nov 7, 2025

While reading, I confused this $domain var with a wiki domain. I'd suggest $mwBackendHost or something similar.

tarrow (author) replied:

Good idea; I picked a different name. I can see why you were confused for sure. I wanted to make it clear that this isn't a host like an IP address but actually a domain for a service. Hopefully this makes sense now.

Comment thread: app/Jobs/ProcessMediaWikiJobsJob.php (outdated)

    $serviceName = $domain;
    $kubernetesClient->setNamespace('default');
    $backendService = $kubernetesClient->services()->setLabelSelector([
        'name' => $serviceName,
deer-wmde commented:

I'm not sure this will work as written, since the name of the service seems to be mediawiki-143-app-backend rather than mediawiki-143-app-backend.default.svc.cluster.local?

see kubectl get service mediawiki-143-app-backend -o yaml

tarrow (author) replied:

Yeah, you're totally right. I now just take the first part of the host name. Apparently the label selector was also wrong, and I should have used the field selector with metadata.name. I basically only determined this by trial and error, though.
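
The two fixes described here can be tried by hand from a shell. The following is a rough sketch rather than the PHP change itself: the helper names are hypothetical, and it assumes the backend host has the form `<service>.<namespace>.svc.cluster.local`, as in the example above.

```shell
# Sketch only: derive the Service name from the backend host, then look the
# Service up by exact name with a field selector instead of a label selector.
# Helper names are hypothetical; the "default" namespace is an assumption.

# Print the first DNS label of the given host name.
service_name_from_host() {
  # ${1%%.*} removes everything from the first dot onwards.
  printf '%s\n' "${1%%.*}"
}

# Needs kubectl and cluster access, so it is defined here but not called.
get_backend_service() {
  kubectl get services --namespace default \
    --field-selector "metadata.name=$(service_name_from_host "$1")" -o name
}
```

Running `service_name_from_host mediawiki-143-app-backend.default.svc.cluster.local` prints `mediawiki-143-app-backend`. A field selector on `metadata.name` matches the object's name directly, whereas a label selector only matches if the Service happens to carry a label with that key and value.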

@tarrow tarrow requested a review from deer-wmde November 12, 2025 20:48

tarrow commented Nov 13, 2025

You might want to try this out in your local cluster, but it depends on the chart update being there, i.e. on wmde/wbaas-deploy#2325 being merged. If you want to try out the new chart locally, you can run the following to disable Argo self-heal and select a newer version of the chart than the one currently in use:

kubectl patch application app-of-apps -n argocd --type='json' -p='[{"op": "replace", "path": "/spec/syncPolicy/automated/selfHeal", "value": false}]'
kubectl patch application api -n argocd --type='json' -p='[{"op": "replace", "path": "/spec/sources/0/targetRevision", "value": "0.34.0"}]'

deer-wmde commented:

Even with the change in the chart I somehow still get this access error - is this something that works for you locally @tarrow ?

kubectl exec -ti deployments/api-app-backend -- php artisan job:dispatchNow ProcessMediaWikiJobsJob somewiki.wbaas.dev
In Client.php line 417:

  Authentication Exception: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"mediawiki-143-app-backend\" is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"services\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"mediawiki-143-app-backend","kind":"services"},"code":403}

deer-wmde commented:

I just checked: the clusterrole didn't get updated in my local cluster for some reason. I'll have a look.


tarrow commented Nov 13, 2025

You can always run k edit clusterrole api-defaultrole -o yaml so that it looks like this, if it helps:
[image not preserved]

deer-wmde commented:

I nuked my cluster because I didn't understand what the cause was, and indeed the defaultrole now looks better, but now I get an error for pods for some reason:

In Client.php line 417:
                                                                                                                                                              
  Authentication Exception: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}   

deer-wmde commented:

Okay - turns out I was holding it wrong. The api chart only provides the api-queue deployments with the defaultrole-api serviceaccount, which effectively prevents dispatching this job synchronously (as intended). Dispatching it to the queue then worked as expected - now I only need to create a database state with two wiki versions, but this seems great so far
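
One way to check this kind of permission gap without dispatching a job is `kubectl auth can-i` with impersonation. The sketch below is an assumption-laden illustration: `can_list` is a hypothetical helper, the namespace is taken from the error messages above, and impersonating with `--as` requires that your own kubectl user is allowed to do so (typically true for an admin on a local cluster).

```shell
# Sketch only: ask the API server whether a service account in the default
# namespace may list a resource. kubectl prints "yes" or "no".
# The helper name is hypothetical.
can_list() {
  # $1 = service account name, $2 = resource (for example: services, pods)
  kubectl auth can-i list "$2" --namespace default \
    --as "system:serviceaccount:default:$1"
}
```

For example, `can_list default pods` checks the account named in the error above (`system:serviceaccount:default:default`). Passing the service account used by the api-queue deployments instead should answer `yes` if the chart's role is bound as described; the exact account name should be confirmed in the cluster rather than taken from this sketch.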

deer-wmde commented:

can confirm it works! 🎉

@deer-wmde deer-wmde merged commit 03dad69 into main Nov 14, 2025
5 checks passed
@deer-wmde deer-wmde deleted the T408624 branch November 14, 2025 11:44
deer-wmde pushed a commit that referenced this pull request Dec 15, 2025
* Route backend jobs to use correct image

This selects a pod that matches the selector[1]
for the appropriate backend service rather than
any random backend MediaWiki pod

[1] https://kubernetes.io/docs/concepts/services-networking/service/#services-in-kubernetes
Bug: T408624

* parse out service name from backend host

* try getting service using field not label selector

* fix pint