[tor-project] GitLab Runner updates

anarcat · June 16, 2022, 1:55pm

Hello!

We're making changes to the GitLab CI infrastructure you should know
about. TL;DR: new OSUOSL runners, tags are now lowercase, clarification
on the "tpa" tag.

First, we're adopting a few CI runners provided by the good people at
OSUOSL. Two new amd64 runners are joining the fleet and will be
executing untagged jobs across our instance. This should help relieve
the pressure on our existing runners, in particular the delays in
job processing while large simulations are running.

In addition, we gain three new runners running on arm64, ppc64le
and s390x architectures, again from OSUOSL.

Secondly, we've updated the tags on our existing runners to make them
consistent across both TPA and OSUOSL runners. In short, we've
lower-cased the former "Linux" and "TPA" tags, which are now "linux" and
"tpa". If you have CI jobs with the old uppercase tags, please make sure
to update your .gitlab-ci.yml files. Also refer to the CI documentation
for further details on the available tags:

Finally, note that the OSUOSL runners are *not* marked "tpa", because we
do not manage the underlying virtual machines. In that sense they are
slightly less "trusted" because we do not control the runner's
configuration, so if you want to limit certain jobs to those "trusted"
runners, be sure to limit your jobs to the `tpa` tag.
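
As an illustration of the tag rename described above, here is a minimal Python sketch that lowercases the old "Linux" and "TPA" tags in a .gitlab-ci.yml. It is not an official migration tool: the function name is made up for this example, and it assumes tags are written as a plain block list under a `tags:` key. Inline lists like `tags: [Linux]` are left untouched, so review the result by hand.

```python
# Old uppercase tags and their lowercase replacements, per the announcement.
RENAMED_TAGS = {"Linux": "linux", "TPA": "tpa"}


def lowercase_runner_tags(ci_yaml: str) -> str:
    """Rewrite `Linux`/`TPA` entries that appear under a `tags:` key.

    Deliberately line-based: it only understands the block-list layout
    (`tags:` followed by `- item` lines) and leaves everything else alone.
    """
    out = []
    in_tags = False
    for line in ci_yaml.splitlines():
        stripped = line.strip()
        if stripped == "tags:":
            in_tags = True
        elif in_tags and stripped.startswith("- "):
            tag = stripped[2:].strip().strip("'\"")
            if tag in RENAMED_TAGS:
                line = line.replace(tag, RENAMED_TAGS[tag])
        elif stripped:
            # Any other non-blank line ends the current tags list.
            in_tags = False
        out.append(line)
    # splitlines() drops the final newline, so put it back if it was there.
    return "\n".join(out) + ("\n" if ci_yaml.endswith("\n") else "")


if __name__ == "__main__":
    example = "test:\n  tags:\n    - Linux\n    - TPA\n  script:\n    - make check\n"
    print(lowercase_runner_tags(example))
```

A job that should only run on the "trusted" runners would then list `tpa` in its `tags:`.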

In general, you shouldn't really *trust* GitLab or GitLab CI for
anything else than running tests. Builds should be verified out of band
with reproducible builds. You can reproduce a local GitLab CI
environment by installing gitlab-runner and executing jobs locally,
without having to trust the entire GitLab installation or foreign
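runners. As a reminder, it is your responsibility to ensure the
integrity of your code and artifacts; see these links for further
discussion:

evaluate mitigation strategies to work around GitLab's attack surface for git hosting (#81) · Issues · The Tor Project / TPA / Gitlab · GitLab
gitlab · Wiki · The Tor Project / TPA / TPA team · GitLab
git · Wiki · The Tor Project / TPA / TPA team · GitLab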

This work was done as part of this ticket:

Feedback is welcome there, but new issues should probably be reported in
a new ticket. In any case, let us know if anything seems off.

A.

PS: Note that those runners are not *yet* online, but we expect them to
become live within a few days. The above ticket will be updated when
that happens.

--
Antoine Beaupré
torproject.org system administration

jnewsome · June 18, 2022, 10:56pm

<snip>

In general, you shouldn't really *trust* GitLab or GitLab CI for
anything else than running tests. Builds should be verified out of band
with reproducible builds. You can reproduce a local GitLab CI
environment by installing gitlab-runner and executing jobs locally,
without having to trust the entire GitLab installation or foreign
runners. As a reminder, it is your responsibility to ensure the
integrity of your code and artifacts, see those links for a further
discussion:

evaluate mitigation strategies to work around GitLab's attack surface for git hosting (#81) · Issues · The Tor Project / TPA / Gitlab · GitLab
gitlab · Wiki · The Tor Project / TPA / TPA team · GitLab
git · Wiki · The Tor Project / TPA / TPA team · GitLab

<snip>

We also had some discussion about reproducing gitlab-CI builds in Confirm Tor Project tor.git package builds are reproducible (#40615) · Issues · The Tor Project / Core / Tor · GitLab.
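
Whatever way the local build is produced, confirming reproducibility comes down to comparing the CI artifacts with the local ones byte for byte. A minimal sketch of that comparison, assuming both sets of artifacts have already been downloaded or built into two directories (the function names and file names here are illustrative, not taken from the ticket):

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_artifacts(ci_dir: Path, local_dir: Path) -> dict[str, bool]:
    """Map every file name seen in either directory to True if both copies match."""
    names = {p.name for p in ci_dir.iterdir() if p.is_file()}
    names |= {p.name for p in local_dir.iterdir() if p.is_file()}
    result = {}
    for name in sorted(names):
        a, b = ci_dir / name, local_dir / name
        # A file missing on one side counts as a mismatch.
        result[name] = a.is_file() and b.is_file() and sha256_of(a) == sha256_of(b)
    return result


if __name__ == "__main__":
    # Tiny self-contained demo with throwaway directories.
    with tempfile.TemporaryDirectory() as ci, tempfile.TemporaryDirectory() as local:
        (Path(ci) / "pkg.deb").write_bytes(b"identical")
        (Path(local) / "pkg.deb").write_bytes(b"identical")
        print(compare_artifacts(Path(ci), Path(local)))
```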

While it's fairly straightforward to install a gitlab-runner and execute locally, as far as I can tell a malicious GitLab installation could still send a modified "script" (post-processed .gitlab-ci.yml) or repo checkout down to the runner. Maybe there's some way to audit this, but I couldn't find an obvious one. Maybe configuring the runner to log at debug level would record enough? Advanced configuration | GitLab

For that issue I ended up hacking together a small python script that processes the .gitlab-ci.yml into something to feed directly through Docker. It's currently a bit hacky and specialized for the Debian tor package build. I think it could be generalized further to be reusable if that's of interest (maybe using Docker Compose to orchestrate jobs within a pipeline), but am still thinking about whether there's a better way... reproduce_pipeline.py · main · Jim Newsome / reproduce-tor-debian-build · GitLab
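
To give a rough idea of the approach (this is not the linked reproduce_pipeline.py, just a minimal sketch of the same idea), a job that has already been parsed from .gitlab-ci.yml into a dict can be turned into a `docker run` command line. The function name, the `/build` mount point and the `debian:stable` fallback image are assumptions made for this example. `before_script`, variables, artifacts, services and templating are all ignored.

```python
import shlex


def docker_command(job: dict, repo_dir: str, default_image: str = "debian:stable") -> list[str]:
    """Build a `docker run` argv that executes a job's script lines in its image.

    `job` is assumed to be an already-parsed job mapping with a `script`
    list and an optional `image` (either a string or a mapping with `name`).
    """
    image = job.get("image", default_image)
    if isinstance(image, dict):
        image = image["name"]
    # `set -e` stops at the first failing line, like a CI job would.
    script = "set -e\n" + "\n".join(job["script"])
    return [
        "docker", "run", "--rm",
        "--volume", f"{repo_dir}:/build",  # hypothetical mount point
        "--workdir", "/build",
        image,
        "sh", "-c", script,
    ]


if __name__ == "__main__":
    job = {"image": "debian:bookworm", "script": ["./build.sh"]}
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(docker_command(job, "/path/to/checkout")))
```

Printing the command rather than running it keeps the sketch auditable: nothing here talks to GitLab, and the exact script handed to Docker is visible.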

Right now my top candidate, which we haven't tried yet, is to install a full local GitLab in addition to a local gitlab-runner, maybe using their published Docker images (GitLab Docker images | GitLab). This seems like the least engineering effort (~none), but it means a bit more work for every individual wanting to do such a local build.

Keeping as much logic out of the .gitlab-ci.yml as possible, so that the gitlab yml is trivial to manually reproduce outside of gitlab (e.g. run `./build.sh`), is probably ideal, though it gives up some gitlab functionality. IIUC this is the approach we're using for the tor tarballs: The Tor Project / Core / Tor CI Reproducible · GitLab

_______________________________________________
tor-project mailing list
tor-project@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project

anarcat · June 20, 2022, 2:20pm

<snip>

In general, you shouldn't really *trust* GitLab or GitLab CI for
anything else than running tests. Builds should be verified out of band
with reproducible builds. You can reproduce a local GitLab CI
environment by installing gitlab-runner and executing jobs locally,
without having to trust the entire GitLab installation or foreign
runners. As a reminder, it is your responsibility to ensure the
integrity of your code and artifacts, see those links for a further
discussion:

evaluate mitigation strategies to work around GitLab's attack surface for git hosting (#81) · Issues · The Tor Project / TPA / Gitlab · GitLab
gitlab · Wiki · The Tor Project / TPA / TPA team · GitLab
git · Wiki · The Tor Project / TPA / TPA team · GitLab

<snip>

We also had some discussion about reproducing gitlab-CI builds in
Confirm Tor Project tor.git package builds are reproducible (#40615) · Issues · The Tor Project / Core / Tor · GitLab.

Interesting!

While it's fairly straightforward to install a gitlab-runner and execute
locally, as far as I can tell a malicious GitLab installation could
still send a modified "script" (post-processed .gitlab-ci.yml) or repo
checkout down to the runner. Maybe there's some way to audit this, but I
couldn't find an obvious one. Maybe configuring the runner to log at
debug level would record enough?
Advanced configuration | GitLab

That's not what I mean. I don't mean installing your own runner locally
and hooking it up with GitLab. I mean installing the gitlab-runner
package (only!) and *not* hooking it up in GitLab.

Instead, you run the job completely locally, without involving GitLab at
all. That's done with the `gitlab-runner exec` command:

GitLab Runner commands | GitLab

We have docs about this here:

ci · Wiki · The Tor Project / TPA / TPA team · GitLab

This removes a large part of the attack surface because GitLab is taken
out of the equation. It reduces the stack to:

* your local computer and operating system
* your git repository
* git
* gitlab-runner
* the executor (e.g. Docker) and its image

It's still pretty darn large, but it's better than before.
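
For illustration, here is a minimal Python wrapper around that command. It assumes the installed gitlab-runner ships the `exec` subcommand and that the Docker executor is wanted. The function names and the `build` job name are hypothetical.

```python
import shutil
import subprocess


def exec_command(job_name: str, executor: str = "docker") -> list[str]:
    """Return the argv for running one job locally with `gitlab-runner exec`."""
    return ["gitlab-runner", "exec", executor, job_name]


def run_job_locally(job_name: str, repo_dir: str = ".") -> int:
    """Run a single job from the .gitlab-ci.yml in `repo_dir`; return its exit code."""
    if shutil.which("gitlab-runner") is None:
        raise FileNotFoundError("gitlab-runner is not installed or not on PATH")
    # Runs from the repository checkout; no GitLab instance is contacted by this script.
    return subprocess.run(exec_command(job_name), cwd=repo_dir).returncode


if __name__ == "__main__":
    # `build` is a placeholder; use a job name defined in your own .gitlab-ci.yml.
    raise SystemExit(run_job_locally("build"))
```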

For that issue I ended up hacking together a small python script that
processes the .gitlab-ci.yml into something to feed directly through
Docker. It's currently a bit hacky and specialized for the Debian tor
package build. I think it could be generalized further to be reusable if
that's of interest (maybe using Docker Compose to orchestrate jobs
within a pipeline), but am still thinking about whether there's a better
way...
reproduce_pipeline.py · main · Jim Newsome / reproduce-tor-debian-build · GitLab

Note that @eighthave has done a similar thing for F-Droid, you might
want to collaborate.

I think the improvement of that over the above is that you remove the
"gitlab-runner" part of the attack surface. It's a pretty large attack
surface because the runners are a surprisingly large amount of code, but
I wonder if it's worth the trouble...

What's the threat model here specifically? Backdoored gitlab-runner code?

Right now my top candidate we haven't tried yet is to install a full
local GitLab in addition to a local gitlab-runner; maybe using their
published Docker images GitLab Docker images | GitLab.
This seems like the least engineering effort (~none) but a bit more work
for every individual wanting to do such a local build.

Other organisations run *two* GitLab instances for that purpose, by the
way. GitLab.com included, from what I understand.

Keeping as much logic out of the .gitlab-ci.yml as possible so that the
gitlab yml is trivial to manually reproduce outside of gitlab (e.g. run
`./build.sh`) is probably ideal, though gives up some gitlab
functionality.

What functionality are you thinking of here?

IIUC this is the approach we're using for the tor
tarballs. The Tor Project / Core / Tor CI Reproducible · GitLab

Thanks for the input!

a.


jnewsome · June 21, 2022, 3:33pm

While it's fairly straightforward to install a gitlab-runner and execute
locally, as far as I can tell a malicious GitLab installation could
still send a modified "script" (post-processed .gitlab-ci.yml) or repo
checkout down to the runner. Maybe there's some way to audit this, but I
couldn't find an obvious one. Maybe configuring the runner to log at
debug level would record enough?
Advanced configuration | GitLab

That's not what I mean. I don't mean installing your own runner locally
and hooking it up with GitLab. I mean installing the gitlab-runner
package (only!) and *not* hooking it up in GitLab.

Instead, you run the job completely locally, without involving GitLab at
all. That's done with the `gitlab-runner exec` command:

GitLab Runner commands | GitLab

We have docs about this here:

ci · Wiki · The Tor Project / TPA / TPA team · GitLab

This removes a large part of the attack surface because GitLab is taken
out of the equation. It reduces the stack to:

  * your local computer and operating system
  * your git repository
  * git
  * gitlab-runner
  * the executor (e.g. Docker) and its image

It's still pretty darn large, but it's better than before.

Ahhh right, I'd forgotten about `gitlab-runner`'s `exec` feature. Unfortunately the current implementation of the feature is a bit hacky and not very well documented. IIUC they took it from a third-party pull request, tried to rip it back out, but too many people screamed, so it's still there in a semi-zombie state. It looks like they're working on designing a new implementation that they'll be happier with: Local pipeline execution (#2797) · Issues · GitLab.org / gitlab-runner · GitLab.

The current version only runs a single job, not a whole pipeline, so you still need some wrapper logic for multi-job pipelines to run the jobs in the right order, copy artifacts between them, initialize pipeline-level variables, etc.
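
As a sketch of the "right order" part of that wrapper logic: given a .gitlab-ci.yml already parsed into a dict, jobs can be grouped by stage in the order the stages are declared. This is a simplification for illustration. It ignores `needs:`, `rules:`, `extends:` and includes, and the function name is made up. The fallback stage list and the default `test` stage follow GitLab's documented defaults.

```python
# Top-level keys that configure the pipeline rather than define a job.
GLOBAL_KEYS = {
    "stages", "variables", "default", "include", "workflow",
    "image", "services", "before_script", "after_script", "cache",
}


def jobs_in_stage_order(config: dict) -> list[list[str]]:
    """Group job names by stage, in the order the stages are declared."""
    stages = config.get("stages", ["build", "test", "deploy"])
    grouped = {stage: [] for stage in stages}
    for name, body in config.items():
        # Skip global keywords, hidden/template jobs, and non-mapping values.
        if name in GLOBAL_KEYS or name.startswith(".") or not isinstance(body, dict):
            continue
        stage = body.get("stage", "test")
        if stage not in grouped:
            raise ValueError(f"job {name!r} uses undeclared stage {stage!r}")
        grouped[stage].append(name)
    return [grouped[stage] for stage in stages if grouped[stage]]


if __name__ == "__main__":
    config = {
        "stages": ["build", "package"],
        "compile": {"stage": "build", "script": ["make"]},
        "deb": {"stage": "package", "script": ["./build.sh"]},
    }
    print(jobs_in_stage_order(config))
```

A driver would then run each inner list before moving to the next one, copying artifacts forward between stages itself.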

For the Debian package build I got it partly working, but couldn't find a way to run a single job out of a parameterized matrix (which they use to build for multiple platforms and architectures). Given the other headaches and lack of documentation, I shelved this approach for the moment (Confirm Tor Project tor.git package builds are reproducible (#40615) · Issues · The Tor Project / Core / Tor · GitLab).
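
For context, a `parallel: matrix:` entry expands into one job per combination of its variable values, so a home-grown driver has to do that expansion itself before it can pick one combination. A minimal sketch, assuming the matrix has already been parsed into a list of dicts (the `ARCH` and `SUITE` variable names are invented for this example and are not taken from the Debian build):

```python
from itertools import product


def expand_matrix(matrix: list[dict]) -> list[dict[str, str]]:
    """Expand a `parallel: matrix:` list into one variable mapping per job."""
    combos = []
    for entry in matrix:
        names = list(entry)
        # A scalar value behaves like a one-element list.
        value_lists = [v if isinstance(v, list) else [v] for v in entry.values()]
        for values in product(*value_lists):
            combos.append(dict(zip(names, map(str, values))))
    return combos


if __name__ == "__main__":
    matrix = [{"ARCH": ["amd64", "arm64"], "SUITE": ["bookworm", "sid"]}]
    for variables in expand_matrix(matrix):
        print(variables)
```

Each resulting mapping can then be passed as environment variables to whatever runs the job locally.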

I agree that this feature is potentially very useful. The "v2" proposal of the feature would run a whole pipeline, but it communicates with GitLab to do so, which may defeat the purpose again from our perspective (at least without some careful auditing of the communication between GitLab and the runner). Local pipeline execution (#2797) · Issues · GitLab.org / gitlab-runner · GitLab

For that issue I ended up hacking together a small python script that
processes the .gitlab-ci.yml into something to feed directly through
Docker. It's currently a bit hacky and specialized for the Debian tor
package build. I think it could be generalized further to be reusable if
that's of interest (maybe using Docker Compose to orchestrate jobs
within a pipeline), but am still thinking about whether there's a better
way...
reproduce_pipeline.py · main · Jim Newsome / reproduce-tor-debian-build · GitLab

Note that @eighthave has done a similar thing for F-Droid, you might
want to collaborate.

Thanks, good to know!

I think the improvement of that over the above is that you remove the
"gitlab-runner" part of the attack surface. It's a pretty large attack
surface because the runners are a surprisingly large amount of code, but
I wonder if it's worth the trouble...

What's the threat model here specifically? Backdoored gitlab-runner code?

Right - I agree there's not much security benefit over the `gitlab-runner exec` approach. I just found I ultimately wasn't getting that much benefit out of it since I was already having to write all the pipeline-orchestration, and got tired of wrestling with the lack of documentation etc :).

Right now my top candidate we haven't tried yet is to install a full
local GitLab in addition to a local gitlab-runner; maybe using their
published Docker images https://docs.gitlab.com/ee/install/docker.html.
This seems like the least engineering effort (~none) but a bit more work
for every individual wanting to do such a local build.

Other organisations run *two* GitLab instances for that purpose, by the
way. GitLab.com included, from what I understand.

Interesting

Keeping as much logic out of the .gitlab-ci.yml as possible so that the
gitlab yml is trivial to manually reproduce outside of gitlab (e.g. run
`./build.sh`) is probably ideal, though gives up some gitlab
functionality.

What functionality are you thinking of here?

For example the debian package build in particular makes heavy use of yml templating. The same thing could be achieved other ways - e.g. moving the yml snippets out to shell files/functions that can be invoked by the other "job scripts", but it adds more indirection and fragmentation vs having everything in one place in the yml file.

For multi-job pipelines, you also still end up duplicating the orchestration between the jobs in the pipeline: once in the yml and once in some other driver script. You can mitigate this by using fewer jobs (maybe just 1), but that's again giving up some gitlab functionality.

Thanks for the input!
