As OpenShift's CI provider, the Test Platform team maintains a couple of GitHub bots and a GitHub app, collectively known as our "automation". The GitHub app is necessary for many of the CI's interactions with a PR, such as reporting the status of the configured tests. The bots serve several purposes that are crucial to repositories onboarded to the CI: maintaining branch protection, merging PRs, completing automatic fast-forwards for release branches, and so on. Many of these actions run as periodically scheduled jobs. When a repository doesn't grant the proper access to the bots or the app, these jobs fail and require manual intervention from our team.
We have long had detailed onboarding documentation that includes steps for granting our automation the access it requires to onboarded repositories. Despite this, we often ran into situations where users reached out to us because something wasn't working properly, or where jobs failed due to missing automation access. This necessitated tooling that automatically verifies that the requisite access has been granted.
The necessary access consists of:
- Our two bots, 'openshift-ci-robot' and 'openshift-merge-robot', must each be either a member of the repository's organization or a collaborator on the repository itself.
- Our GitHub app must be installed in the repository's organization and granted access to that repository.
Thankfully, by using the appropriate endpoints of GitHub's API, we can create a check that runs each time a new repository is onboarded to our CI, as well as weekly to verify that nothing has been modified. We created a new tool called check-gh-automation to perform this verification. It operates in three modes, detailed below.
Check a Candidate Repo for Onboarding
This mode uses git to determine the changes in any PR to 'openshift/release', the repository that houses our CI configuration. If the PR adds configuration for a new repository, the check is activated: it determines which repository or repositories are being proposed for onboarding and performs each of the checks on them.
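A minimal Python sketch of how candidate repositories could be derived from a PR's changes. It assumes that CI configuration lives under a `ci-operator/config/{org}/{repo}/` directory layout and that the set of already onboarded repos is known in advance; the helper names `added_files` and `candidate_repos` are hypothetical, not the tool's actual code.

```python
import subprocess

# Assumed layout: ci-operator/config/{org}/{repo}/<config files>
CONFIG_PREFIX = "ci-operator/config/"


def added_files(base: str, head: str = "HEAD") -> list[str]:
    """List files added between base and head (requires a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=A", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def candidate_repos(paths: list[str], onboarded: set[str]) -> list[str]:
    """Return the org/repo names that gained configuration but were not onboarded before."""
    found: set[str] = set()
    for path in paths:
        if not path.startswith(CONFIG_PREFIX):
            continue  # not CI configuration, ignore
        parts = path[len(CONFIG_PREFIX):].split("/")
        if len(parts) >= 3:  # org, repo, and at least one file
            found.add(f"{parts[0]}/{parts[1]}")
    return sorted(found - onboarded)
```

For example, a PR that adds `ci-operator/config/foo/bar/foo-bar-main.yaml` while `foo/bar` is absent from the onboarded set would yield `["foo/bar"]` as the repository to check.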
Check Every Onboarded Repo
This mode runs weekly to verify that our automation retains access to all onboarded repos. It uses the Prow configuration to find every onboarded repo and performs each of the checks on all of them.
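The weekly mode amounts to fanning the same checks out over every onboarded repo. The sketch below assumes the list of `org/repo` names has already been extracted from the Prow configuration (not shown) and that each check is a hypothetical function returning a list of problem descriptions. Collecting every failure, rather than stopping at the first one, lets a single job run report all repositories that need attention.

```python
from typing import Callable, Iterable

# Hypothetical check signature: (org, repo) -> list of problem descriptions.
Check = Callable[[str, str], list[str]]


def check_all(repos: Iterable[str], checks: list[Check]) -> dict[str, list[str]]:
    """Run every check on every org/repo and map each failing repo to its problems."""
    failures: dict[str, list[str]] = {}
    for full_name in sorted(set(repos)):  # de-duplicate and use a stable order
        org, repo = full_name.split("/", 1)
        problems = [p for check in checks for p in check(org, repo)]
        if problems:
            failures[full_name] = problems
    return failures
```

A non-empty result would then be reported and cause the weekly job to fail.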
Manually Check a Provided Repo
This mode is useful either for debugging the tool itself or for confirming, after the weekly job fails, that changes made to a repo's automation access have resolved the problem. It takes the name of a repo as an argument and runs the checks on it.
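For the manual mode, here is a hypothetical argument parser that accepts a single `org/repo` name and rejects malformed input before any API call is made. The positional argument shown is an illustration only, not check-gh-automation's real command-line interface.

```python
import argparse


def parse_repo(value: str) -> tuple[str, str]:
    """Split 'org/repo' into its parts, rejecting anything else."""
    org, sep, repo = value.partition("/")
    if not sep or not org or not repo or "/" in repo:
        raise argparse.ArgumentTypeError(f"expected org/repo, got {value!r}")
    return org, repo


parser = argparse.ArgumentParser(description="Manually check one repository (sketch).")
parser.add_argument("repo", type=parse_repo, help="repository in org/repo form")
```

Running `parser.parse_args(["openshift/release"])` yields a namespace whose `repo` attribute is `("openshift", "release")`, which can then be passed to the individual checks.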
Checks Performed
The GitHub app's own access token is used to perform the following checks for each repository:
Each bot is checked for membership in the organization that owns the repository by calling the GET '/orgs/{org}/members/{bot}' endpoint and verifying that it returns the '204' status code. If it does not, the GET '/repos/{org}/{repo}/collaborators/{bot}' endpoint is used to determine whether the bot is a collaborator on the repository, which is also indicated by a '204' status code. If either of these checks succeeds, then the bot in question has the required access to that repo.
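A sketch of the bot check in Python using only the standard library. It assumes `token` is a credential these endpoints accept and uses the public `https://api.github.com` base URL; the function names are hypothetical. The HTTP call is injected as a function so the decision logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable

GITHUB_API = "https://api.github.com"

# Maps an API path to the HTTP status code of a GET request.
StatusGetter = Callable[[str], int]


def github_status(token: str) -> StatusGetter:
    """Return a function that performs an authenticated GET and reports only the status code."""
    def get(path: str) -> int:
        req = urllib.request.Request(
            GITHUB_API + path,
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx responses; only the code matters here.
            return err.code
    return get


def bot_has_access(get: StatusGetter, org: str, repo: str, bot: str) -> bool:
    """A bot has access if it is an org member or, failing that, a repo collaborator."""
    if get(f"/orgs/{org}/members/{bot}") == 204:
        return True
    return get(f"/repos/{org}/{repo}/collaborators/{bot}") == 204


def missing_bots(get: StatusGetter, org: str, repo: str, bots: list[str]) -> list[str]:
    """Return the bots that lack access to org/repo."""
    return [bot for bot in bots if not bot_has_access(get, org, repo, bot)]
```

The collaborator endpoint is only queried when the membership check fails, mirroring the order described above. In a real run, `get` would be `github_status(token)`.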
To check that the app is installed, we use the GET '/repos/{org}/{repo}/installation' endpoint. If it returns a '200' status code, we can be sure that the app is installed in the organization and has been enabled for the repository.
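The installation check reduces to a single status-code comparison. In this sketch, `get` is assumed to be any function mapping an API path to the status code of an authenticated GET (GitHub's REST documentation describes this endpoint as requiring authentication as the app itself, via a JWT). The function name is hypothetical. A '404' is treated as the app not being installed or lacking access to the repo, while any other non-'200' status is reported separately so that, for example, an authentication failure is not mistaken for a missing installation.

```python
from typing import Callable, Optional

# Maps an API path to the HTTP status code of an authenticated GET request.
StatusGetter = Callable[[str], int]


def app_installation_problem(get: StatusGetter, org: str, repo: str) -> Optional[str]:
    """Return None if the app is installed for org/repo, else a description of the problem."""
    status = get(f"/repos/{org}/{repo}/installation")
    if status == 200:
        return None
    if status == 404:
        return f"app is not installed for {org}/{repo} or lacks access to it"
    return f"unexpected status {status} while checking the app installation on {org}/{repo}"
```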