[tor-project] minutes from the sysadmin meeting

Here’s your monthly dose of sysadmin news!

Roll call: who’s there and emergencies

anarcat, gaba, kez, lavamind

Dashboard review

We did our normal per-user check-in:

… and briefly reviewed the general dashboards:

We need to rethink the web board triage, as mentioned in the last
point of this meeting.

TPA-RFC-42: 2023 roadmap

Gaba brought up a few items we need to plan for and schedule:

  • donate page rewrite (kez)
  • sponsor9:
    • self-host discourse (Q1-Q2, before June 2023)
    • RT and cdr.link evaluation (Q1-Q2, gus): “improve our frontdesk
      tool by exploring the possibility of migrating to a better tool
      that can manage messaging apps with our users”
    • download page changes (kez? currently blocked on nico)
  • weblate transition (CI changes pending, lavamind following up)
  • developer portal (dev.torproject.org), in Hugo, from ura.design
    (tpo/web/dev#6)

Those are tasks that either TPA will need to do themselves or assist
other people in. Gaba also went through the work planned for 2023 in
general to see what would affect TPA.

We then discussed anarcat’s roadmap proposal (TPA-RFC-42):

  • do the bookworm upgrades, this includes:
    • puppet server 7
    • puppet agent 7
    • plan would be:
      • Q1-Q2: deploy new machines with bookworm
      • Q1-Q4: upgrade existing machines to bookworm
  • email services migration (i.e. execute TPA-RFC-31, still need to
    decide the scope, proposal coming up)
  • possibly retire schleuder (i.e. execute TPA-RFC-41, currently
    waiting for feedback from the community council)
  • complete the cymru migration (i.e. execute TPA-RFC-40)
  • retire gitolite/gitweb (i.e. execute TPA-RFC-36)
  • retire SVN (i.e. execute TPA-RFC-11)
  • monitoring system overhaul (TPA-RFC-33)
  • deploy a Puppet CI
    • e.g. make the Puppet repo public, possibly by removing private content
      and just creating a “graft” to have a new repository without old
      history (as opposed to rewriting the entire history, because then
      we don’t know if we have confidential stuff in the old history)
    • there are disagreements on whether or not we should make the
      repository public in the first place, as it’s not exactly “state
      of the art” puppet code, which could be embarrassing
    • there’s also a concern that CI is kind of pointless as long as
      we don’t have actual tests to run, but for now we already have
      the objective of running linting checks on push
      (tpo/tpa/team#31226)
  • plan for summer vacations

Web team organisation

Postponed to next meeting. anarcat will join Gaba’s next triage
session with gus to see how that goes.

Metrics of the month

  • hosts in Puppet: 95, LDAP: 95, Prometheus exporters: 163
  • number of Apache servers monitored: 29, hits per second: 715
  • number of self-hosted nameservers: 6, mail servers: 10
  • pending upgrades: 0, reboots: 4
  • average load: 0.64, memory available: 4.61 TiB/5.74 TiB, running
    processes: 736
  • disk free/total: 32.50 TiB/92.28 TiB
  • bytes sent: 363.66 MB/s, received: 215.11 MB/s
  • planned bullseye upgrades completion date: 2022-11-01
  • GitLab tickets: 175 tickets including…
    • open: 0
    • icebox: 144
    • backlog: 17
    • next: 4
    • doing: 7
    • needs review: 1
    • needs information: 2
    • (closed: 2934)
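
As a quick sanity check on the figures above, the ratios and the ticket
total can be recomputed. This is only a sketch: the variable names are
made up for illustration, and the numbers are copied by hand from the
list above rather than pulled from Prometheus or GitLab.

```python
# Figures copied by hand from the metrics list above; names are illustrative only.
memory_available_tib, memory_total_tib = 4.61, 5.74
disk_free_tib, disk_total_tib = 32.50, 92.28

# Ticket counts per board column (closed tickets are reported separately).
tickets = {
    "open": 0,
    "icebox": 144,
    "backlog": 17,
    "next": 4,
    "doing": 7,
    "needs review": 1,
    "needs information": 2,
}

# Express the free/total pairs as percentages, rounded to one decimal.
memory_available_pct = round(100 * memory_available_tib / memory_total_tib, 1)
disk_free_pct = round(100 * disk_free_tib / disk_total_tib, 1)

# Sum the per-column counts to compare against the reported total.
ticket_total = sum(tickets.values())

print(f"memory available: {memory_available_pct}%")
print(f"disk free: {disk_free_pct}%")
print(f"tickets: {ticket_total}")
```

The per-column counts add up to the 175 tickets reported, and roughly
80% of memory and 35% of disk space are free.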

Upgrade prediction graph lives at:

Now also available as the main Grafana dashboard. Head to
https://grafana.torproject.org/, change the time period to 30 days,
and wait a while for results to render.

Number of the month: 12

Progress on bullseye upgrades has mostly flat-lined at 12 machines
since August. We actually have three fewer bullseye servers now, down
to 83 from 86.
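
The number of the month appears to line up with the host counts in the
metrics above. A small sketch, assuming the 12 machines are the Puppet
hosts not yet running bullseye (that reading is an inference, not
something stated in these minutes):

```python
# Both figures are taken from these minutes; the subtraction is an assumed reading.
hosts_in_puppet = 95
bullseye_hosts = 83

# Hosts still waiting for the bullseye upgrade, under that assumption.
remaining = hosts_in_puppet - bullseye_hosts
print(remaining)
```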