A shot at running scalable, cost-efficient GitHub Actions self-hosted runners on Google Cloud using CloudRun.
The system runs GitHub Actions self-hosted runners on Google Cloud’s CloudRun service. Instances are created dynamically in response to workflow triggers, which keeps the setup both scalable and cost-effective.
What does it do?
The code is deployed as a CloudRun service. When a GitHub Actions workflow is triggered, GitHub sends a webhook call to the service, and CloudRun sets up an instance that acts as an agent and waits for the job. Once the job completes, the instance is cleaned up and the CloudRun service (eventually) scales back down to zero.
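The first decision the service has to make is whether an incoming webhook call should start an agent at all. A minimal sketch of that check, assuming the service only reacts to `workflow_job` deliveries whose `action` is `queued` and whose labels include `self-hosted`; the function name `should_start_runner` is hypothetical and not taken from this repo.

```python
import json


def should_start_runner(event_name: str, body: bytes) -> bool:
    """Return True if a webhook delivery should spin up an agent.

    Hypothetical helper: assumes only queued workflow_job events that
    target self-hosted runners should start a CloudRun agent.
    """
    # GitHub sends the event type in the X-GitHub-Event header.
    if event_name != "workflow_job":
        return False
    payload = json.loads(body)
    # workflow_job deliveries carry an action such as queued, in_progress or completed.
    if payload.get("action") != "queued":
        return False
    labels = payload.get("workflow_job", {}).get("labels", [])
    return "self-hosted" in labels


# A trimmed-down queued event for a job that uses runs-on: self-hosted.
queued = json.dumps(
    {"action": "queued", "workflow_job": {"id": 1, "labels": ["self-hosted"]}}
).encode()
print(should_start_runner("workflow_job", queued))  # True
```

Filtering on `queued` means the later `in_progress` and `completed` deliveries for the same job do not start extra instances.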
What are the advantages?
- CloudRun is billed based on usage. Because this solution scales down to zero when idle, you only pay for the CPU and memory used while jobs are running.
- CloudRun scales out automatically. You can run many pipelines or jobs at the same time without having to manage agent configuration.
- The container runs as a CloudRun service account, so it can access any GCP resource securely once that account is granted access through IAM. (Resources inside a VPC additionally require a serverless VPC connector to be set up so the CloudRun instance can reach them.)
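Because the agent runs as a service account, code or jobs inside the container can obtain credentials without storing any keys. A minimal sketch using only the standard library, assuming the GCP metadata server endpoint that CloudRun exposes to containers; the helper names are hypothetical, and a Google client library would normally handle this for you.

```python
import json
import urllib.request

# Metadata server endpoint for the runtime service account's access token.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    """Build a GET request for an OAuth access token of the runtime service account."""
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def fetch_access_token(timeout: float = 5.0) -> str:
    """Fetch a short-lived access token; only works when running on GCP."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        data = json.load(resp)
    # The JSON response contains access_token, expires_in and token_type.
    return data["access_token"]
```

The returned token can be sent as an `Authorization: Bearer` header to any Google API the service account has been granted access to.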
Are there any drawbacks?
Yes, of course. Nothing comes without a cost.
CloudRun has a maximum request timeout of 3600s (one hour). Any GitHub Actions job running on a CloudRun agent must finish within that window; otherwise CloudRun will likely terminate it. The workaround is to split long workflows into smaller, independent pipelines that each complete within an hour.
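A complementary safeguard is to give the agent process an explicit time budget just below the 3600s limit, so an overrunning job is stopped by the service itself rather than cut off by CloudRun. This does not replace splitting long pipelines; it only makes an overrun fail predictably. A sketch, assuming the agent is launched as a child process; the command, the margin value and the helper name are hypothetical.

```python
import subprocess
import sys

# CloudRun's maximum request timeout, in seconds.
CLOUDRUN_MAX_TIMEOUT_S = 3600
# Hypothetical safety margin so the handler can still clean up and respond.
CLEANUP_MARGIN_S = 120


def run_job_with_budget(cmd: list[str], budget_s: float) -> str:
    """Run the agent process, stopping it if it exceeds the time budget.

    Returns "ok", "failed" or "timed_out". `cmd` stands in for whatever
    command starts the agent; it is an assumption, not this repo's code.
    """
    try:
        result = subprocess.run(cmd, timeout=budget_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "timed_out"
    return "ok" if result.returncode == 0 else "failed"


budget = CLOUDRUN_MAX_TIMEOUT_S - CLEANUP_MARGIN_S
print(run_job_with_budget([sys.executable, "-c", "print('job done')"], budget))
```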
Things to note
Since this is not a native solution, some of its behaviour may cause confusion. The items below might look like problems, but they do not affect how the solution works.
- GitHub’s webhook delivery history will show some webhook calls as timed out. GitHub waits at most 10s for a response, and this timeout cannot be changed. However, a CloudRun instance is considered active only while it is processing a request; otherwise it is considered idle, and Google can terminate the container to save resources and costs. To keep the instance alive until the job completes, the service deliberately does not respond to the webhook call until the job has finished, so GitHub gives up waiting first.
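The pattern described above, holding the webhook request open until the job is done, can be sketched with the standard library HTTP server. This is an illustration under assumptions, not this repo's handler: `run_job` is a hypothetical placeholder for starting the agent and waiting for its job, and the payload is ignored.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_job() -> None:
    """Hypothetical stand-in for starting the agent and waiting for its job."""
    time.sleep(0.2)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read (and, in this sketch, ignore) the webhook payload.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Block until the job is done. CloudRun treats the instance as active
        # only while this request is in flight, so replying early would let it
        # be reclaimed mid-job.
        run_job()
        # For real jobs GitHub will already have logged a 10s timeout; that is expected.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"job finished\n")


def main() -> None:
    """Container entrypoint: CloudRun supplies the listening port via $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    ThreadingHTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `main()` from the container entrypoint starts the server; the 200 response reaches GitHub only after the job ends, which is why the delivery history records a timeout.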
How to set up GitHub Actions Agent on CloudRun?
Source Code setup
- Clone/Copy/Fork this repo and set it up in your Git environment.
- Build a Docker image from this source code. (Alternatively, you can configure CloudRun to build directly from the source code using Cloud Build.)
- Create a GCP project, if you don’t already have one.
- Create a secret in Secret Manager (GCP) to securely store your GitHub organisation access token. The token needs at least the admin:org scope.
- Create a CloudRun service from the image built above, using the following configuration. (If you configure CloudRun to build directly from source, include this configuration as well.)
- Add a webhook in your GitHub organisation’s settings: set the payload URL to the CloudRun service URL, set the content type to ‘application/json’, and choose ‘Workflow jobs’ as the event that triggers the webhook.
- Use ‘runs-on: self-hosted’ in your workflow files to use the agent we just created for running the workflows.
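The organisation token stored in Secret Manager is what allows a fresh instance to register itself as an agent. A sketch of that step, assuming the secret is exposed to the container as an environment variable named `GITHUB_ORG_TOKEN` (hypothetical), that the agent is GitHub’s standard actions/runner package configured with its `config.sh` script, and that it is registered as an ephemeral runner; none of these choices are confirmed by this repo.

```python
import json
import os
import urllib.request

API = "https://api.github.com"


def registration_token_request(org: str, org_token: str) -> urllib.request.Request:
    """Build the REST call that mints a short-lived runner registration token.

    The org access token needs the admin:org scope for this endpoint.
    """
    return urllib.request.Request(
        f"{API}/orgs/{org}/actions/runners/registration-token",
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {org_token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def get_registration_token(org: str) -> str:
    """Exchange the org token for a registration token (network call)."""
    # Hypothetical env var holding the Secret Manager secret.
    org_token = os.environ["GITHUB_ORG_TOKEN"]
    with urllib.request.urlopen(registration_token_request(org, org_token)) as resp:
        return json.load(resp)["token"]


def runner_config_args(org: str, reg_token: str) -> list[str]:
    """Arguments for the runner's config.sh; --ephemeral is an assumption here."""
    return [
        "./config.sh",
        "--url", f"https://github.com/{org}",
        "--token", reg_token,
        "--unattended",
        # Ephemeral runners take exactly one job and are then unregistered.
        "--ephemeral",
    ]
```

Self-hosted runners receive the `self-hosted` label by default, which is what `runs-on: self-hosted` matches.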
Running GitHub Actions self-hosted agents on CloudRun is a highly scalable and cost-efficient way to run many pipelines or jobs in parallel without managing agent configuration. Usage-based billing and automatic scaling make CloudRun well suited to this purpose. The main constraint is CloudRun’s maximum request timeout of 3600s, so long workflows should be split into smaller, independent pipelines that each finish within an hour. Overall, setting up this solution requires some configuration, but it offers significant advantages for organisations looking to optimise their workflow processes.