[tor-project] minutes from the sysadmin meeting

Hello everyone!

Long time no see! Here’s your usual dose of sysadmin minutes, sorry for
the late mail, we skipped a few…

Roll call: who’s there and emergencies

anarcat, gaba, kez, lavamind. No emergencies, apart from CiviCRM hogging
a CPU, but that has been happening for the last month or so.

Dashboard review

We went through our normal weekly per-user check-in.

We do not go through the general dashboards anymore as those are done
in triage (by the star of the week for TPA, with gaba and anarcat for
web).

Q2 prioritisation

We looked at the coming deliverables, mostly on the web side of things:

  • developer portal
    • repo: force-push the new Hugo site into The Tor Project / Web / dev (GitLab)
    • staging: use pages for it until build pipeline is ready
    • triage/clean issues in web/dev (gaba)
    • edit/curate content (gaba)
    • review by TPO
    • send to production (maybe Q4 2023)
  • donation page (next project meeting is on May 17th) ~ kez working on it
  • self-host forum ~ wrapping up by the end of June
  • download page, once the UX team is done with it

We also looked at the TPA milestones.

Out of those milestones, we hope for the gnt-dal migration to be
completed shortly. It’s technically done, but there’s still a bunch of
cleanup work to be completed to close the milestone completely.

Another item we want to make progress on, but which drags in a lot of
collateral work, is the bullseye upgrade, as that includes upgrading
Puppet, LDAP (!), Mailman (!!), possibly replacing Nagios, and so on.

Anarcat also wants to push the gitolite retirement forward as that has
been discussed in Costa Rican corridors and there’s momentum on this
now that a set of rewrite rules has been built…

Holidays planning

We reviewed the summer schedule to make sure everything is up to date
and there is not too much overlap.

Metrics of the month

  • hosts in Puppet: 85, LDAP: 86, Prometheus exporters: 155
  • number of Apache servers monitored: 33, hits per second: 658
  • number of self-hosted nameservers: 6, mail servers: 9
  • pending upgrades: 0, reboots: 2
  • average load: 1.17, memory available: 3.31 TiB/4.45 TiB, running
    processes: 580
  • disk free/total: 35.92 TiB/105.25 TiB
  • bytes sent: 306.33 MB/s, received: 198.85 MB/s
  • planned bullseye upgrades completion date: 2023-01-21 (!)
  • GitLab tickets: 192 tickets including…
    • open: 0
    • icebox: 143
    • backlog: 22
    • next: 16
    • doing: 6
    • needs information: 4
    • needs review: 1
    • (closed: 3121)
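
As a quick sanity check on the figures above, here is a minimal Python
sketch that sums the per-state ticket counts and turns the memory and
disk figures into percentages. The variable and function names are
made up for illustration; only the numbers come from the list above,
and this is not how the metrics are actually produced.

```python
# Figures copied from the metrics list; all names here are illustrative only.
tickets = {
    "open": 0,
    "icebox": 143,
    "backlog": 22,
    "next": 16,
    "doing": 6,
    "needs information": 4,
    "needs review": 1,
}
reported_total = 192  # "GitLab tickets: 192", closed tickets excluded


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


ticket_sum = sum(tickets.values())
memory_available_pct = percent(3.31, 4.45)   # TiB available / TiB total
disk_free_pct = percent(35.92, 105.25)       # TiB free / TiB total

print(f"tickets by state: {ticket_sum} (reported: {reported_total})")
print(f"memory available: {memory_available_pct}%")
print(f"disk free: {disk_free_pct}%")
```

The per-state counts do add up to the reported 192 tickets, roughly
three quarters of memory is available, and about a third of the disk
space is free.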

Upgrade prediction graph lives at:

Note that we’re behind schedule on the bullseye upgrade, but for the
first time in months we’ve made significant progress, retiring a
bunch of machines and rebuilding existing ones.

We’re also starting to deploy our first bookworm machines now,
although only on an as-needed basis, as we can’t actually install
bookworm machines directly yet: they need to be installed with
bullseye to get Puppet bootstrapped, and then we immediately upgrade
them to bookworm.

A more detailed post-mortem of the upgrade process is under discussion in the wiki: