2017-07-02

Rebuilding Debian with Java 9

It's about three months until Java 9 is supposed to be released. Debian contains around 1,200 packages that build against Java 8. I have been trying to build them with Java 9.

It's not been going well.

The first attempt found an 87% failure rate, the majority of which were either:

This is too bad to get an idea of what's actually broken, so I gave up.

The second attempt has gone better, only 57% failures. This had a number of issues fixed, but there's still a large number of problems masked by toolchain failures.

However, some real Java 9 breakages are coming to the fore!

Oh, and 135 packages have an unknown problem, so maybe there's a whole other class of bug I've missed.

This is (obviously) an ongoing project, but I thought I'd write up what I'd seen so far.

Also, I wanted to mention how cool it was to hack up a dashboard for your ghetto make/Docker build process in ghetto shell, although slightly less ghetto than the previous shell.

Every 2.0s: ./classify.sh

ascii           cast            deps            doclint         javadoc         keyword         modules         unknown         version
47 total        9 total         203 total       82 total        165 total       15 total        83 total        114 total       206 total
====            ====            ====            ====            ====            ====            ====            ====            ====
antelope        charactermanaj  access-*hecker  android*-tools  access-*hecker  avalon-*mework  activem*otobuf  adql            389-adm*onsole
axis            dom4j           activem*tiveio  antlr4          akuma           axis            android*ork-23  aspectj         389-ds-console
azureus         electric        activemq        args4j          animal-sniffer  biojava-live    android*dalvik  aspectj*plugin  airport-utils
clirr           findbugs        activem*otobuf  bindex          annotat*ndexer  bnd             android*oclava  bouncycastle    android*-tools
cmdreader       jajuk           afterburner.fx  cdi-api         antlr3          dbus-java       android*silver  bsh             antlr
cortado         jsymphonic      akuma           classycle       apache-log4j2   jalview         android*inding  closure*mpiler  artemis
cronometer      libjaud*r-java  animal-sniffer  commons-math3   apache-mime4j   javacc4         android*ibcore  cofoja          beansbinding
dita-ot         olap4j          annotat*ndexer  ditaa           async-h*client  java-gnome      android*ibrary  commons-jcs     bindex
eclipselink     sweethome3d     antlr3          felix-g*-shell  axmlrpc         jmol            android*apksig  convers*ruptor  biojava-live
eclipse                         antlr4          fest-assert     bcel            jruby-openssl   android*s-base  davmail         biomaj
entagged                        apache-log4j2   fest-reflect    bridge-*jector  libcomm*g-java  android*ls-swt  diffoscope      brig
fop                             args4j          fest-util       bsaf            libxml-*1-java  android*helper  dnsjava         brltty
geronim*0-spec                  atinjec*jsr330  ganymed-ssh2    build-h*plugin  libxpp2-java    ant             dumbster        cadencii
imagej                          bcel            gentlyw*-utils  canl-java       mvel            apktool         eclipse*nyedit  castor
jasmin-sable                    bridge-*jector  glassfish       cglib           squareness      basex           eclipse-cdt     cdi-api
jas                             build-h*plugin  hdf5            commons*nutils                  bintray*t-java  eclipse*config  ceph
jasypt                          carrots*h-hppc  hessian         commons*ration                  biojava4-live   eclipse-eclox   cobertura
javacc4                         cglib           intelli*ations  commons-csv                     easybind        eclipse-emf     coco-java
javaparser                      checkstyle      jacksum         commons-io                      eclipse-mylyn   eclipse-gef     colorpicker
jets3t                          codenarc        jcm             commons*vaflow                  eclipse-wtp     eclipse*clipse  commons*client
jgromacs                        commons*nutils  jfugue          commons-jci                     eclipse-xsd     eclipse*es-api  concurr*t-dfsg
king                            commons*ration  jmock           commons-math                    freeplane       eclipse-rse     cvc3
knopfle*h-osgi                  commons-csv     jnr-ffi         cssparser                       gant            eclipse*clipse  db5.3
libcds-*t-java                  commons-io      jpathwatch      csvjdbc                         gradle-*plugin  emma-coverage   dbus-java
libcomm*g-java                  commons*vaflow  jsurf-alggeo    dirgra                          gradle          gdcm            dicomscope
libidw-java                     commons-jci     jts             dnssecjava                      gradle-*otobuf  geronim*upport  ditaa
libiscwt-java                   commons-math    libcds-*c-java  dokujclient                     gradle-*plugin  gettext         docbook*-saxon
libitext-java                   commons-parent  libcomm*c-java  doxia-s*etools                  graxxia         gluegen2        doxia
libjdbm-java                    commons-vfs     libcomm*4-java  dtd-parser                      groovycsv       gnome-split     easyconf
libjt400-java                   core-ca*lojure  libcomm*2-java  easymock                        groovy          h2database      excalib*logger
libstax-java                    cssparser       libhac-java     felix-b*sitory                  gs-collections  ha-jdbc         excalib*logkit
libvldo*g-java                  data-xm*lojure  libhtml*r-java  felix-f*mework                  htsjdk          hdrhistogram    f2j
libxpp3-java                    dirgra          libirclib-java  felix-g*ommand                  ice-bui*gradle  icu4j           felix-osgi-obr
livetri*jsr223                  dnssecjava      libjaba*t-java  felix-g*untime                  insubstantial   icu4j-4.2       fontchooser
mathpiper                       dokujclient     libjgoo*n-java  felix-shell                     ivyplusplus     icu4j-4.4       ganymed-ssh2
maven-a*helper                  doxia           libjhla*s-java  felix-s*ll-tui                  jabref          icu4j-49        gentlyw*-utils
metastudent                     doxia-s*etools  libjoda*e-java  felix-utils                     jackson*tabind  istack-commons  geogebra
naga                            dtd-parser      libjsonp-java   geronim*0-spec                  jackson*-guava  jakarta-jmeter  gridengine
ognl                            easymock        libjsr1*y-java  geronim*1-spec                  java3d          janino          healpix-java

2017-03-21

find-deleted: checkrestart replacement

checkrestart, part of debian-goodies, checks what you might need to restart. You can run it after a system update, and it will find running processes using outdated libraries. If these can be restarted, and there's no new kernel updates, then you can save yourself a reboot.

However, checkrestart is pretty dated, and has some weird behaviour. It frequently reports that there are things needing a restart, but that it doesn't feel like telling you what they are. (This makes me moderately angry). It's pretty bad at locating services on a systemd-managed system. It tries to look through Debian packages, making it Debian specific (along with unreliable). This is especially odd, because systemd knows what pid belongs to a unit, as does /proc, and...

Instead of fixing it, I have rewritten it from scratch.

find-deleted is a tool to find deleted files which are still in use, and to suggest systemd units to restart.

The default is to try and be helpful:

% find-deleted
 * blip
   - sudo systemctl restart mysql.service nginx.service
 * drop
   - sudo systemctl restart bitlbee.service
 * safe
   - sudo systemctl restart fail2ban.service systemd-timesyncd.service tlsdate.service
 * scary
   - sudo systemctl restart dbus.service lxc-net.service lxcfs.service polkitd.service
Some processes are running outside of units, and need restarting:
 * /bin/zsh5
  - [1000] faux: 7161 17338 14539
 * /lib/systemd/systemd
  - [1000] faux: 2082
  - [1003] vrai: 8551 8556

Here, it is telling us that a number of services need a restart. The services are categorised based on some patterns defined in the associated configuration file, deleted.yml.

For this machine, I have decided that restarting mysql and nginx will cause a blip in the service to users; I might do it at an off-peak time, or ensure that there's other replicas of the service available to pick up the load.

My other categories are:

  • drop: A loss of service will happen that will be annoying for users.
  • safe: These services could be restarted all day, every day, and nobody would notice.
  • scary: Restarting these may log you out, or cause the machine to stop functioning.
  • other: things which don't currently have a classification

If you're happy with its suggestions, you can copy-paste the above commands, or you can run it in a more automated fashion:

systemctl restart $(find-deleted --show-type safe)

This can effectively be run through provisioning tools, on a whole selection of machines, if you trust your matching rules! I have done this with a much more primitive version of this tool at a previous employer.

It can also print the full state that it's working from, using --show-paths.


2017-02-21

Busy work: pasane and syscall extractor

Today, I wrote a load of Python and a load of C to work around pretty insane problems in Linux, my choice of OS for development.


pasane is a command-line volume control that doesn't mess up channel balance. This makes it unlike all of the other volume control tools I've tried that you can reasonably drive from the command line.

It's necessary to change volume from the command line as I launch commands in response to hotkeys (e.g. to implement the volume+ button on my keyboard). It's also occasionally useful to change the volume via. SSH. Maybe there's another option? Maybe something works via. dbus? This seemed to make the most sense at the time, and isn't too awful.

Am I insane?


Next up: Some code to parse the list of syscalls from the Linux source tree.

It turns out that it's useful to have these numbers available in other languages, such that you can pass them to tools, so that you can decode raw syscall numbers you've seen, or simply so that you can make the syscalls.

Anyway, they are not available in the Linux source. What? Yes, for most architectures, this table is not available. It's there on i386, amd64, and arm (32-bit), but not for anything else. You have to.. uh.. build a kernel for one of those architectures, then compile C code to get the values. Um. What?

This is insane.

The Python code (linked above) does a reasonably good job of extracting them from this insanity, and generating the tables for a couple more arches.

I needed to do this so I can copy a file efficiently. I think. I've kind of lost track. Maybe I am insane.


2017-02-11

Playing with prctl and seccomp

I have been playing with low-level Linux security features, such as prctl no new privs and seccomp. These tools allow you to reduce the harm a process can do to your system.

They're typically deployed as part of systemd, although the default settings in many distros are yet to be ideal. This is partly because it's hard to confirm what a service actually needs, and partly because many services support many more things than a typical user cares about.

For example, should a web server be able to make outgoing network connections? Probably not, it's accepting network connections from people, maybe running some code, then returning the response. However, maybe you're hosting some PHP that you want to be able to fetch data from the internet? Maybe you're running your web-server as a proxy?

To address these questions, Debian/Ubuntu typically err on the side of "let it do whatever, so users aren't inconvenienced". CentOS/RHEL have started adding a large selection of flags you can toggle to fiddle security (although through yet another mechanism, not the one we're talking about here..).


Anyway, let's assume you're paranoid, and want to increase the security of your services. The two features discussed here are exposed in systemd as NoNewPrivileges= and SystemCallFilter=.

The first, NoNewPrivileges=, prevents a process from getting privileges in any common way, e.g. by trying to change user, or trying to run a command which has privileges (e.g. capabilities) attached.

This is great. Even if someone knows your root password, they're still stuck:

% systemd-run --user -p NoNewPrivileges=yes --tty -- bash
$ su - root
Password:
su: Authentication failure

$ sudo ls
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid'
  option set or an NFS file system without root privileges?

The errors aren't great, as the tools have no idea what's going on, but at least it works!

This seems like a big, easy win; I don't need my php application to become a different user... or do I? It turns out that the venerable mail command, on a postfix system, eventually tries to run a setuid binary, which fails. And php defaults to sending mail via. this route. Damn!

Let's try it out:

% systemd-run --user -p NoNewPrivileges=yes --tty -- bash
faux@astoria:~$ echo hi | mail someone@example.com
faux@astoria:~$

postdrop: warning: mail_queue_enter: create file maildrop/647297.680: Permission denied

Yep, postdrop is setgid (the weird s in the permissions string):

% ls -al =postdrop
-r-xr-sr-x 1 root postdrop 14328 Jul 29  2016 /usr/sbin/postdrop

It turns out that Debian dropped support for alternative ways to deliver mail. So, we can't use that!


Earlier I implied that NoNewPrivileges=, despite the documentation, doesn't remove all ways to get some privileges. One way to do this is to enter a new user namespace (only widely supported by Ubuntu as of today). e.g. we can get CAP_NET_RAW (and its associated vulnerabilities) through user namespaces:

% systemd-run --user -p NoNewPrivileges=yes --tty -- \
    unshare --map-root-user --net -- \
    capsh --print \
        | fgrep Current: | egrep -o 'cap_net\w+'
cap_net_bind_service
cap_net_broadcast
cap_net_admin
cap_net_raw

To harden against this, I wrote drop-privs-harder which simply breaks unshare (and its friend clone)'s ability to make new user namespaces, using seccomp.


Unlike NoNewPrivileges=, SystemCallFilter= takes many more arguments, and requires significantly more research to work out whether a process is going to work. Additionally, systemd-run doesn't support SystemCallFilter=. I'm not sure why.

To assist people playing around with this (on amd64 only!), I wrote a tool named seccomp-tool and a front-end named seccomp-filter.

There's a binary of seccomp-tool available for anyone who doesn't feel like compiling it. It depends on only libseccomp2. sudo apt install libseccomp2. It needs to be in your path as seccomp-tool.

seccomp-filter supports the predefined system call sets from the systemd documentation, in addition to an extra set, named @critical, which systemd seems to silently include without telling you. Both of these tools set NoNewPrivilges=, so you will also be testing that.

Let's have a play:

% seccomp-filter.py @critical -- ls /
ls: reading directory '/': Function not implemented

Here, we're trying to run ls with only the absolutely critical syscalls enabled. ls, after starting, tries to call getdents() ("list the directory"), and gets told that it's not supported. Returning ENOSYS ("function not implemented") is the default behaviour for seccomp-filter.py.

We can have a permissions error, instead, if we like:

% seccomp-filter.py --exit-code EPERM @critical -- ls /
ls: reading directory '/': Operation not permitted

If we give it getdents, it starts working... almost:

% ./seccomp-filter.py --exit-code EPERM @critical getdents -- ls /proc
1
10
1001
11
1112

Why does the output look like it's been piped through a pager? ls has tried to talk to the terminal, has been told it can't, and is okay with that. This looks the same as:

seccomp-filter.py --blacklist ioctl -- ls /

If we add ioctl to the list again, ls pretty much works as expected, ignoring the fact that it segfaults during shutdown. systemd's @default group of syscalls is useful to include to remove this behaviour.


Next, I looked at what Java required. It turns out to be much better than I expected: the JVM will start up, compile things, etc. with just: @critical @default @basic-io @file-system futex rt_sigaction clone.

This actually works as a filter, too: if Java code tries to make a network connection, it is denied. Or, er, at least, something in that area is denied. Unfortunately, the JVM cra.. er.. "hard exits" for many of these failures, as they come through as unexpected asserts:

e.g.

Assertion 'sigprocmask_many(SIG_BLOCK, &t, 14,26,13,17,20,29,1,10,12,27,23,28, -1) >= 0' failed at ../src/nss-myhostname/nss-myhostname.c:332, function _nss_myhostname_gethostbyname3_r(). Aborting.

It then prints out loads of uninitialised memory, as it doesn't expect uname to fail. e.g.

Memory: 4k page, physical 10916985944372480k(4595315k free), swap 8597700727024688k(18446131672566297518k free)

uname: [box][box]s[box]


This demonstrates only one of the operation modes for seccomp. Note that, as of today, the Wikipedia page is pretty out of date, and the manpage is outright misleading. Consider reading man:seccomp_rule_add(3), part of libseccomp2, to work out what's available.


Summary: Hardening good, hardening hard. Run your integration test suite under seccomp-filter.py --blacklist @obsolete @debug @reboot @swap @resources and see if you can at least get to that level of security?


2016-10-11

HTTP2 slowed my site down!

At work, we have a page which asynchronously fetches information for a dashboard. This involves making many small requests back to the proxy, which is exactly the kind of thing that's supposed to be faster under HTTP2.

However, when we enabled HTTP2, the page went from loading in around two seconds, to taking over twenty seconds. This is bad. For a long time, I thought there was a bug in nginx's HTTP2 code, or in Chrome (and Firefox, and Edge..). The page visibly loads in blocks, with exactly five second pauses between the blocks.

The nginx config is simply:

resolver 8.8.4.4;
location ~ /proxy/(.*)$ {
  proxy_pass https://$1/some/thing;
}

.. where 8.8.4.4 is Google Public DNS.


It turns out that the problem isn't with HTTP2 at all. What's happening is that nginx is processing the requests successfully, and generating DNS lookups. It's sending these on to Google, and the first few are getting processed; the rest are being dropped. I don't know if this is due to the network (it's UDP, after all), or as Google think it's attack traffic. The remainder of the requests are retried by nginx's custom DNS resolver after 5s, and another batch get processed.

So, why is this happening under HTTP2? Under http/1.1, the browser can't deliver requests quickly enough to trigger this flood protection. HTTP2 has sped it up to the point that there's a problem. Woo? On localhost, a custom client can actually generate requests quickly enough, even over http/1.1.

nginx recommend not using their custom DNS resolver over the internet, and I can understand why; I've had trouble with it before. To test, I deployed dnsmasq between nginx and Google:

dnsmasq -p 1337 -d -S 8.8.4.4 --resolv-file=/dev/null

dnsmasq generates identical (as far as I can see) traffic, and is only slightly slower (52 packets in 11ms, vs. 9ms), but I am unable to catch it getting rate limited. On production, a much smaller machine than the one I'm testing, dnsmasq is significantly slower (100+ms), so it makes sense that it wouldn't trigger rate limiting. dnsmasq does have --dns-forward-max= (default 150), so there's a nice way out there.


In summary: When deploying HTTP2, or any upgrades, be aware of rate limits in your, or other people's, systems, that you may now be able to trigger.


2016-02-01

Fosdem 2016

I was at Fosdem 2016. Braindump:

  • Pottering on DNSSEC: Quick overview of some things Team Systemd is working on, but primarily about adding DNSSEC to systemd-resolvd.
    • DNSSEC is weird, scary, and doesn't really have any applications in the real world; it doesn't enable anything that wasn't already possible. Still important and interesting for defence-in-depth.
    • Breaks in interesting cases, e.g. home routers inventing *.home or fritz.box, the latter of which is a real public name now. Captive portal detection (assuming we can't just make those go away).
    • systemd-resolvd is a separate process with a client library; DNS resolution is too complex to put in libc, even if you aren't interested in in caching, etc.
  • Contract testing for JUnit: Some library sugar for instantiating multiple implementations of an interface, and running blocks of tests against them. Automatically running more tests when anything returns a class with any interface that's understood.
    • I felt like this could be applied more widely (or perhaps exclusively); if you're only testing the external interface, why not add more duck-like interfaces everywhere, and test only those? Speaker disagreed, perhaps because it never happens in practice...
    • Unfortunately, probably only useful if you are actually an SPI provider, or e.g. Commons Collections. This was what was presented, though, so not a big surprise.
    • The idea of testing mock/stub implementations came up. That's one case where, at least, I end up with many alternative implementations of an interface that are supposed to test the same. Also discussed whether XFAIL was a good thing, Rubby people suggested WIP (i.e. must fail or test suite fails) works better.
  • Frida for closed-source interop testing: This was an incredibly ambitious project, talk and demo. ("We're going to try some live demos. Things are likely to catch fire. But, don't worry, we're professionals.").
    • Injects the V8 JS VM into arbitrary processes, and interacts with it from a normal JS test tool; surprisingly nice API, albeit using some crazy new JS syntaxes. Roughly: file.js: {foo: function() { return JVM.System.currentTimeMillis(); } } and val api = injectify(pid, 'file.js'); console.log(api.foo());.
    • Great bindings for all kinds of language runtimes and UI toolkits, to enable messing with them, and all methods are overridable in-place, i.e. you can totally replace the read syscall, or the iOS get me a url method. Lots of work on inline hooking and portable and safe call target rewriting.
    • Live demo of messing with Spotify's ability to fetch URLs... on an iPhone... over the network. "System dialog popped up? Oh, we'll just inject Frida into the System UI too...".
  • USBGuard: People have turned off the maddest types of USB attack (e.g. autorun), but there's still lots of ways to get a computer to do something bad with an unexpected (modified) USB stick; generate keypresses and mouse movements, even a new network adaptor that may be chosen to send traffic to.
    • Ask the user if they expect to see a new USB device at all, and whether they think it should have these device classes (e.g. "can be a keyboard"). Can only reject or accept as a whole; kernel limitation. UX WIP.
    • Potential for read-only devices; filter any data being exfiltrated at all from a USB stick, but still allow reading from them? Early boot, maybe? Weird trust model. Not convinced you could mount a typical FS with USB-protocol level filtering of writes.
    • Mentioned device signing; no current way to do identity for devices, so everything is doomed if you have any keyboards. Also mentioned CVEs in USB drivers, including cdc-wdm, which I reviewed during the talk and oh god oh no goto abuse.
  • C safety, and whole-system ASAN: Hanno decided he didn't want his computer to work at all, so has been trying to make it start with ASAN in enforce mode on for everything. Complex due to ASAN loading order/dependencies, and the fact that gcc and libc have to be excluded because they're the things being overridden.
    • Everything breaks (no real surprise there). bash, coreutils, man, perl, syslog, screen, nano. Like with Fuzzing Project, people aren't really interested in fixing bugs they consider theoretical, but are really real on angry C compilers, or in the future. Custom allocators have to be disabled, which is widely but not totally supported.
    • Are there times where you might want ASAN on in production? Would it increase security (trading off outages), or would it add more vulnerabilities, due to huge attack surface? ~2x slowdown at the moment, which is probably irrelevant.
    • Claimed ASLR is off by default in typical Linux distros; I believe Debian's hardening-wrapper enables this, but Lintian reports poor coverage, so maybe a reasonable claim.
  • SSL management: Even in 2014, when Heartbleed happened, Facebook had not really got control of what was running in their infrastructure. Public terminators were fine, but everything uses SSL. Even CAs couldn't cope with reissue rate, even if you could find your certs. Started IDSing themselves to try and find missed SSL services.
    • Generally, interesting discussion of why technical debt accumulates, especially in infra. Mergers and legacy make keeping track of everything hard. No planning for things that seem like they'll never happen. No real definition of ownership; alerts pointed to now-empty mailing lists, or to people who have left (this sounds very familiar). That service you can't turn off but nobody knows why.
    • Some cool options. Lots of ways to coordinate and monitor SSL (now), e.g. Lemur. EC certs are a real thing you can use on the public internet (instead of just me on my private internet), although I bet it needs Facebook's cert switching middleware. HPKP can do report-only.
    • Common SSL fails that aren't publicly diagnosed right now: Ticket encryption with long-life keys (bad), lack of OCSP stapling and software to support that.
  • Flight flow control: Europe-wide flight control is complex, lots of scarce resources: runways, physical sky, radar codes, radio frequencies. Very large, dynamic, safety-critical optimisation problem.
    • 35k flights/day. 4k available radar codes to identify planes. Uh oh. Also, route planning much more computationally expensive in 4D. Can change routes, delay flights, but also rearrange ATC for better capacity of bits of the sky.
    • Massive redundancy to avoid total downtime; multiple copies of the live system, archived plans stored so they can roll-back a few minutes, then an independently engineered fall-back for some level of capacity if live fails due to data, then tools to help humans do it, and properly maintained capacity information for when nobody is contactable at all.
    • Explained optimising a specific problem in Ada; no actual Ada tooling so stuck with binary analysis (e.g. perf, top, ..). Built own parallelisation and pipelining as there's no tool or library support. Ada codebase and live system important, but too complex to change, so push new work into copies of it on the client. Still have concurrency bugs due to shared state.
  • glusterfs and beyond glusterfs: Large-scale Hadoop alternative, more focused on reliability, and NFS/CIFS behaviour than custom API, but also offer object storage and others.
    • Checksumming considered too slow, so done out of band (but actually likely to catch problems), people don't actually want errors to block their reads (?!?). Lots more things are part of a distributed system than I would have expected, e.g. they're thinking of adding support for geographical clustering, so you can ensure parts of your data are in different DCs, or that there is a copy near Australia (dammit).
    • The idea that filesystems have to be in kernel mode is outdated; real perf is from e.g. user-mode networking stacks. Significantly lower development costs in user-mode means FSes are typically faster in user space, as the devs have spent more time (with more tooling modifiers) getting them to work properly: real speedups are algorithmical and not code.
    • Went on to claim that Python is fine, don't even need C or zero-copy (but do need to limit copies), as everything fun is offloaded anyway. Ships binaries with debug symbols (1-5% slowdown) as it's totally irrelevant. Team built out of non-FS, non-C people writing C. They're good enough to know what not to screw up.
    • Persistent memory is coming (2TB of storage between DRAM and SSD speed), and cache people are behind. NFS-Ganesha should be your API in all cases.
  • What even are Distros?: Distros aren't working, can't support things for 6mo, 5y, 10y or whatever as upstream and devs hate you. Tried to build hierarchies of stable -> more frequently updated, but failed; build deps at the lower levels; packaging tools keep changing; no agreement on promotion; upstreams hate you.
    • PPAs? They had invented yet another PPA host, and they're all still bad. Packaging is so hard to use, especially as non-upstreams don't use it enough to remember how to use it.
    • Components/Modules? Larger granularity installable which doesn't expose what it's made out of, allowing distros to mess it up inside? Is this PaaS, is this Docker? It really feels like it's not, and it's really not where I want to go.
    • Is containerisation the only way to protect users from badly packaged software? Do we want to keep things alive for ten years? I have been thinking about the questions this talk had for a while, but need a serious discussion before I can form opinions.
  • Reproducible Builds: Definitely another post some time.
    • Interesting questions about why things are blacklisted, and whether the aim is to support everything. Yes. Everything. EVERYTHING.
  • Postgres buffer manager: Explaining the internals of the shared buffers data structure, how it's aged poorly, what can be done to fix it up, or what it should be replaced with. Someone actually doing proper CS data-structures, and interested in cache friendliness, but also tunability, portability, future-proofing, etc.
    • shared_buffers tuning advice (with the usual caveat that it's based on workload); basically "bigger is normally always better, but definitely worth checking you aren't screwing yourself".
    • Also talked about general IO tuning on Linux, e.g. dirty_writeback, which a surprising number of people didn't seem to have heard of. Setting it to block earlier reduces maximum latency; numbers as small as 10MB were considered.
  • Knot DNS resolver: DNS resolvers get to do a surprising number of things, and some people run them at massive scale. Centralising caching on e.g. Redis is worthwhile sometimes. Scriptable in Lua so you can get it to do whatever you feel like at the time (woo!).
    • Implements some interesting things: Happy Eyeballs, QNAME minimisation, built-in serving of pretty monitoring. Some optimisations can break CDNs (possibly around skipping recursive resolving due to caching), didn't really follow.
  • Prometheus monitoring: A monitoring tool. Still unable to understand why people are so much more excited about it than anything else. Efficient logging and powerful querying. Clusterable through alert filtering.
    • "Pull" monitoring, i.e. multiple nodes fetch data from production, which is apparently contentious. I am concerned about credentials for machine monitoring, but the daemon is probably not that much worse.
  • htop porting: Trying to convince the community to help you port to other platforms is hard. If you don't, they'll fork, and people will be running an awful broken fork for years (years). Eventually resolved by adding the ability to port to BSD, then letting others do the OSX port.
  • API design for slow things: Adding APIs to the kernel is hard. Can't change anything ever. Can't optimise for the sensible case because people will misuse it and you can't change it. Can't "reduce" compile-time settings as nobody builds a kernel to run an app.
    • Lots of things in Linux get broken, or just never work to start with, due to poor test coverage, maybe the actually funded kselftest will help, but people can help themselves by making real apps before submitting apis, or at least real documentation. e.g. recvmsg timeout was broken on release. timerfd tested by manpage.
    • API versioning is hard when you can't turn anything off. epoll_create1, renameat2, dup3. ioctl, prctl, netlink, aren't a solution, but maybe seccomp is. Capabilities are hard; 1/3rd of things just check SYS_ADMIN (which is ~= root). Big argument afterwards about whether versioning can ever work, and what deprecation means. Even worse for this than for Java, where this normally comes up.
  • Took a break to talk to some Go people about how awful their development process is. CI is broken, packaging is broken, builds across people's machines are broken. Everything depends on github and maybe this is a problem.
  • Fosdem infra review: Hardware has caught up with demand, now they're just having fun with networking, provisioning and monitoring. Some plans to make the conference portable, so others can clone it (literally). Video was still bad but who knows. Transcoding is still awfully slow.
    • Fosdem get a very large temporary ipv4 assignment from IANA. /17? Wow. Maybe being phased out as ipv6 and nat64 kind of works in the real world now.
    • Argument about why they could detect how many devices there were, before we realised mac hiding on mobile is probably disabled when you actually connect, because that's how people bill.
  • HOT OSM tasking: Volunteers digitising satellite photos of disaster zones, and ways to allocate and parallelise that. Surprisingly fast; 250 volunteers did a 250k person city in five days, getting 90k buildings.
    • Additionally, provide mapping for communities that aren't currently covered, and train locals to annotate with resources like hospitals and water-flow information.
    • Interesting that sometimes they want to prioritise for "just roads", allowing faster mapping. Computer vision is still unbelievably useless; claiming 75% success rate at best on identifying if people even live in an area.
    • Lots of ethical concerns; will terror or war occur because there's maps? Sometimes they ask residents and they're almost universally in favour of mapping. Sometimes drones are donated to get better imagery, and residents jump on it.
  • Stats talk. Lots of data gathered; beer sold, injuries fixed, network usage and locations. Mostly mobile (of which, mostly android). Non-mobile was actually dominated by OSX, with Linux a close second. Ouch.

Take-aways: We're still bad at software, and at distros, and at safety, and at shared codebases.

Predictions: Something has to happen on distros, but I think people will run our current distros (without containers for every app) for a long time.


- Next ยป