I have written a large mass of code this year, primarily in Rust.
With only one exception, none of this has reached the 'blog post' level
of maturity. Here lies a memorial for these projects, perhaps as
a reminder to me to resurrect them.
GitHub's contribution chart gives a good
indication of just how much code has been written. Clearly visible are
some holidays, and the associated productivity peaks on either side.
Some focuses this "year" (Nov 2017+):
Archives and storage
splayers recursively unpacks an
archive, supporting multiple formats. You have a
gzip file, on an
ext4 filesystem, inside a
tar.bzip2 archive, inside a Debian package? No problem.
The aim here was to "ingest" large volumes of "stuff", either for
comparison (e.g. diffoscope, from the
Reproducible Builds project), or for indexing and search.
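Recognising what you've been handed is the first step of any recursive unpacker. Below is a minimal sketch of magic-byte sniffing, assuming whole-file buffers; I'm not claiming this is how splayers detects formats. A real unpacker would decompress or extract whatever it recognises and call sniff again on each result, which is where the recursion comes from.

```rust
/// A few formats a recursive unpacker might meet (illustrative subset).
#[derive(Debug, PartialEq)]
enum Format {
    Gzip,
    Bzip2,
    Xz,
    Tar,
    ExtFs,     // ext2/3/4 all share the same superblock magic
    ArArchive, // .deb packages are `ar` archives
    Unknown,
}

/// Guess a format from magic bytes at well-known offsets.
fn sniff(data: &[u8]) -> Format {
    if data.starts_with(&[0x1f, 0x8b]) {
        Format::Gzip
    } else if data.starts_with(b"BZh") {
        Format::Bzip2
    } else if data.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Format::Xz
    } else if data.starts_with(b"!<arch>\n") {
        Format::ArArchive
    } else if data.get(257..262) == Some(&b"ustar"[..]) {
        // POSIX tar puts "ustar" at offset 257 of the first header block.
        Format::Tar
    } else if data.get(1080..1082) == Some(&[0x53u8, 0xef][..]) {
        // The superblock starts at byte 1024; its magic 0xEF53 sits at +0x38, little-endian.
        Format::ExtFs
    } else {
        Format::Unknown
    }
}

fn main() {
    let gz: [u8; 4] = [0x1f, 0x8b, 0x08, 0x00];
    println!("{:?}", sniff(&gz));
}
```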
Speaking of which, deb2pg
demonstrates various ways not to build an indexing search engine
for a large quantity of "stuff".
While working on these, I became a bit obsessed with how
gzipping a file, then running it through any kind of indexing,
or even other compression, gives very poor results. Very poor.
rezip is a tool to
gzip files into a more storable format.
It... never made it. I could complain for hours. See the README.
Much of this work was done against/for Debian. Debian's apt is
not a fun tool to use, so I started rewriting it. fapt
can download lists, and provide data in a usable form (e.g.
build files). gpgrv is enough
gpg implementation for
Once you start rewriting
apt, you might as well rewrite the rest
of the build and packaging system, right? fbuilder
and fappa are two ways not to
fappa needed to talk to Docker, so shipliftier
has a partial
swagger-codegen implementation for Rust.
Much of the way networking is done and explained for linux is not
netzact is a replacement for
the parts of
ss that people actually use. It has the
ss, but only one of the horrible bugs: no documentation
at all. That one is probably fixable, at least!
pinetcchio continued into
its fourth year; I like to think I made it even worse this year.
fdns was going to do something with
DNS but I can't really remember which thing I was going to fix first.
There's so much wrong with DNS.
quad-image is an image hosting
service. It works, I run it. I even tried to add new image formats, like
heifers. That was a mistake.
I still use IRC. The protocol is bad, but at least there are working
clients. Slack Desktop still segfaults on start on Ubuntu 18.10, months
after release, because they don't understand how to use Electron and
nobody can fix it for them.
unsnap is an IRC title bot. Yes,
there are hundreds of others. No, I don't like working with other people's
untested code in untestable plugin frameworks. Thanks for asking.
badchat is some kind of IRC thing.
This one might still have some life in it.
zrs is a re-implementation of
directory changing tool. It's good, you should use it.
sortuniq is a more efficient
| sort | uniq. It supports some of the flags that either tool supports.
This is probably enough of a blog post for that. I use it frequently.
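For the curious, here is a minimal sketch of why this can beat piping through sort then uniq: duplicates are dropped as they arrive, so memory and sorting work scale with the number of distinct lines rather than the total input. This is my illustration, not sortuniq's actual code, and it orders by bytes (like LC_ALL=C sort), not by locale collation.

```rust
use std::collections::BTreeSet;
use std::io::{self, BufRead, Write};

/// Same output as `sort | uniq`: each distinct line once, in sorted order.
/// Duplicates are discarded on insertion, so only distinct lines are kept.
fn sort_uniq<R: BufRead>(input: R) -> io::Result<Vec<String>> {
    let mut seen = BTreeSet::new();
    for line in input.lines() {
        seen.insert(line?);
    }
    Ok(seen.into_iter().collect())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for line in sort_uniq(stdin.lock())? {
        writeln!(out, "{}", line)?;
    }
    Ok(())
}
```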
kill-desktop tries to get your
"X" applications to exit cleanly, such that you can shutdown, or reboot.
"Watch" the "demo" in the repository readme,
or try it out for yourself:
cargo install kill-desktop
Many people just reboot. This risks losing unsaved work, such as documents,
the play position in your media player, or even
the shell history in your shell.
This feature is typically built in to desktop environments, but somewhat
lacking in the more minimalist of linux window managers, such as my favourite,
Even the more complex solutions, such as the system built into Windows, do
not deal well with naughty applications; ones that will just go hide in the
tray when you try to close them, or that show dialogs a while after you
asked them to exit.
kill-desktop attempts to solve this problem by keeping track of the state
of the system, and offering you ways to progress. If one of these naughty
hiding applications merely hides when you close the window,
kill-desktop doesn't forget. It tracks the process of that window, waiting for it to go
away. If it is not going away, you are able to ask the process to exit. Or
just shut down. It's probably a bad application anyway.
Interesting learnings from this project:
Firstly, writing an interface was a bit of a pain. I wanted to be able to
prompt the user for an action, but also be able to show them updates. It is
not possible to do this without
as there is no way to do a non-blocking read from
stdin. This surprised me.
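The usual workaround in Rust is to do the blocking read on a separate thread and hand lines back over a channel, which the main loop can poll with a timeout between screen updates. A minimal sketch follows; spawn_line_reader is a name I've made up, and this only covers line-at-a-time input. Single keypresses still need the terminal settings changed.

```rust
use std::io::{self, BufRead};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Read lines on a background thread so the main loop never blocks on input.
fn spawn_line_reader<R: BufRead + Send + 'static>(reader: R) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for line in reader.lines() {
            match line {
                // Stop if the receiving side has gone away.
                Ok(line) => {
                    if tx.send(line).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
        // Dropping `tx` here tells the receiver that input has ended.
    });
    rx
}

fn main() {
    let input = spawn_line_reader(io::BufReader::new(io::stdin()));
    loop {
        match input.recv_timeout(Duration::from_millis(500)) {
            Ok(answer) => {
                println!("got answer: {}", answer);
                break;
            }
            // No input yet: a real tool would refresh its status display here.
            Err(RecvTimeoutError::Timeout) => println!("still waiting..."),
            // stdin was closed.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```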
You can't even read a single character (think
Continue? y/n) without messing
with the terminal settings, which needs very low-level, non-portable libraries.
There are nicely packaged solutions to this problem, like
but this ended up messing with the terminal more than required (it puts it all
the way into
raw mode, instead of stopping at
wrote my own.
Secondly, it's amazing how a relatively simple application can end up tracking
more state than expected, and
manually diffing that state.
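A minimal sketch of that kind of diffing, assuming the state is just a set of hypothetical numeric window IDs (a real X11 tool tracks much more than this): compare two snapshots and report what appeared and what went away.

```rust
use std::collections::HashSet;

/// Hypothetical window identifier; a real tool would use X11 window IDs.
type WindowId = u32;

/// What changed between two snapshots of open windows.
#[derive(Debug, PartialEq)]
struct Diff {
    opened: Vec<WindowId>,
    closed: Vec<WindowId>,
}

fn diff(before: &HashSet<WindowId>, after: &HashSet<WindowId>) -> Diff {
    let mut opened: Vec<WindowId> = after.difference(before).copied().collect();
    let mut closed: Vec<WindowId> = before.difference(after).copied().collect();
    // HashSet iteration order is arbitrary; sort for stable output.
    opened.sort_unstable();
    closed.sort_unstable();
    Diff { opened, closed }
}

fn main() {
    let before: HashSet<WindowId> = [1, 2, 3].iter().copied().collect();
    let after: HashSet<WindowId> = [2, 3, 4].iter().copied().collect();
    let d = diff(&before, &after);
    println!("opened {:?}, closed {:?}", d.opened, d.closed);
}
```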
I also spent time
moving errors to be part of the domain,
which isn't even fully surfaced yet. It amazes me how much code ends up being
dedicated to error handling, even in a language with excellent terse error
handling. (Terminology from
It's also ended up with nine dependencies, although between four and six of
those are for loading the (trivial) config file, which could be done better.
The world's understanding of cryptography, the guarantees provided,
and the practical safety and limitations, is lacking.
Cryptography, and computer security in general, is discussed in terms
of some use-cases. A use-case is addressed by combining some
primitives, and there are frequently multiple different algorithms which
can provide a primitive.
First, let's look at some use-cases:
- I want to do some banking on my bank's website.
- My bank wants to know that my genuine EMV ("Chip and Pin") card
is doing a purchase.
- I want to store a big file privately, but only remember a short password.
- I want the recipient to know that it was actually me that wrote an email.
None of these mention cryptography, or even really that security is expected,
but the requirement for security is implied by the context.
Let's pick one of these, and have a look at what's involved: "I want to store
a big file privately, but only remember a short password."
This normally comes up with backups. You want to store your data (your family
photos?) on someone else's computer (Amazon's?), but you don't trust them. You
want to remember a password, and have this password (and only this password) be
able to unlock your precious data.
This is normally realised by:
- Making a key from the user's password.
- Using this key to scramble and protect the data.
Those are our two primitives.
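Written out in the usual textbook notation, the two primitives compose as below. KDF, Enc, Dec, the salt and the nonce n are conventional names, not something this post defines: m is the data, c the scrambled data, and t an authentication tag. The salt and nonce aren't secret and are stored next to the ciphertext; only the password is.

```latex
\begin{aligned}
k &= \mathrm{KDF}(\text{password},\ \text{salt}) \\
(c,\ t) &= \mathrm{Enc}_k(n,\ m) \\
\mathrm{Dec}_k(n,\ c',\ t') &=
  \begin{cases}
    m & \text{if } (c', t') = (c, t) \\
    \bot & \text{otherwise (except with negligible probability)}
  \end{cases}
\end{aligned}
```

The bottom line is the "modify the data without us realising" guarantee: anyone without k who alters c or t gets a rejection, not altered data.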
After these steps have been applied, it should be impossible for anyone to
un-scramble the data without guessing the password. It's also impossible for anyone
to modify the data without us realising.
Every one of our use-cases, and the vast majority of use-cases in the real
world, can be built from a small set of primitives. Here's a list, including the
two from above:
- Deriving a key from a password.
- Using a key to scramble and protect data.
- Agreeing on a key with an online, remote computer you know nothing about.
- Protecting something, such that it can only be read by someone you know something about.
- Proving you wrote something, given the other computer already knows something about you.
That's it. Those are our operations. Now, we can build the whole world.
But first, a quick note on security: in the modern Internet era, since ~1993 (25 years!),
only the fifth of these (proving you wrote something) has ever been practically attacked in any way. The others are practically perfect.
There have been lots of security problems, and things have had to change to remain secure.
These have mostly been:
- Computers have got fast enough that it's been possible to increase some of the
"security parameters" in some of the primitives, long before computers have
practically been fast enough to actually hurt any of the primitives.
- People have used weaker primitives, or kept old or weak systems running long past
when they should have been turned off. That, or they have been
to use these weaker systems.
- Software bugs. The primitives are complicated, built from lots of algorithms, and the
algorithms are hard enough to implement correctly on their own. It's hard to test, too!
A lot of components are
much more complicated than necessary,
but we're bad at fixing that.
- Problems with algorithms which don't translate to real world problems for most
use-cases, or that are easy to mitigate once discovered, assuming the relevant people
actually adopt the mitigations.
Now that we understand what we have to build stuff from, let's try and attack the hardest
problem: "I want to do some banking on my bank's website."
Once again, I have ordered the wrong hardware from eBay.
This time, it was a set of 433MHz radio transceivers for "Arduino".
The majority of these come with embedded circuitry for sending and
receiving bits. The ones I ordered, however, did not.
The transmitter emits power when its data line is powered. The
receiver emits a varying voltage, which can be ADC'd back into
a value between ~1 and ~800. This is not digital.
I decided to do everything from scratch. Everything.
A useful simple radio protocol is known as "OOK" or "ASK":
You turn the radio on when you're sending a "1", you turn it
off when you're not.
Sending is amazingly simple; you turn on the radio, and you turn it off.
These fourteen lines of code actually send two bits, for reasons
which will become horrifying later.
Or now. Radio is incredibly unreliable. This is worked around
by layering all kinds of encodings / checksums together, and
hoping everything works out. (Narrator: It doesn't work out.)
The first type of encoding used is called "Manchester Encoding".
This involves doubling the amount of data you send, but gives you
lots of scope for detecting problems. For a
1, you send 01, and for a 0, you send
10. That is, if you see a
111 or a 000 in
your stream, you know something's gone wrong.
So, to send the number
0110, we're going to send
10_01_01_10. This is why the sending code
sends two bits.
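Here's a minimal sketch of the encoding in Rust, using the mapping above (0 becomes 10, 1 becomes 01) with one bit per u8. The decoder rejects any 00 or 11 pair, which is how a 000 or 111 gets caught once the pairs are aligned: any three equal half-bits in a row must contain one such pair. The function names are mine, not from the Arduino code.

```rust
/// Manchester-encode bits: 0 -> 10, 1 -> 01.
fn manchester_encode(bits: &[u8]) -> Vec<u8> {
    bits.iter()
        .flat_map(|&b| if b == 0 { [1u8, 0] } else { [0u8, 1] })
        .collect()
}

/// Decode pairs of half-bits. On an encoding violation (00, 11, or a
/// trailing odd half-bit), return the index of the offending pair.
fn manchester_decode(half_bits: &[u8]) -> Result<Vec<u8>, usize> {
    half_bits
        .chunks(2)
        .enumerate()
        .map(|(i, pair)| match pair {
            [1, 0] => Ok(0),
            [0, 1] => Ok(1),
            _ => Err(i),
        })
        .collect()
}

fn main() {
    let encoded = manchester_encode(&[0, 1, 1, 0]);
    println!("{:?}", encoded);
    println!("{:?}", manchester_decode(&encoded));
}
```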
The receiver's job is much more horrifying. The receiver has
"samples" from a radio (a three-digit integer), at unknown time
intervals. The minimum value read varies wildly with environmental
conditions, as does the peak value (the value you hope to see
when the transmitter is sending).
For this purpose, the receiver has multiple levels of filtering.
First, it takes a
fast moving average over the received signal,
a "slow" moving average over the
background noise (the average
of all samples), and our guess as to the peak signal level. If the
fast moving average is greater than half way up this band,
it's probably a 1.
This can be observed in the code by enabling
and rebooting the board. This initially has a bad idea of what
the noise environment looks like, so will look like this:
background: 8 sig_high:99 high:47 trigger:53 -- ..XXX.XXXXXXX.XXXXXXXXXXXXXXX...................................XXX.............................................................
background: 6 sig_high:96 high:87 trigger:51 -- .....................................................XXX....XX..................................................................
Here, it's got a very narrow range, so it's triggering too often and
emitting lots of nonsense bits (the
XXXs). After a while, it will
background: 28 sig_high:159 high:757 trigger:93 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX....X..XX..XXX...............................................................................
background: 27 sig_high:163 high:450 trigger:95 -- ................................................................................................................................
background: 26 sig_high:165 high:26 trigger:95 -- ................................................................................................................................
Here, its background estimate is higher, but its signal estimate (sig_high)
is much higher, so the trigger is higher, and it doesn't
incorrectly trigger at all. (Those
XXXs are part of a real signal.)
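In both dumps, trigger is exactly the integer midpoint of background and sig_high: (8 + 99) / 2 = 53, (28 + 159) / 2 = 93, and so on. A minimal sketch of a detector built around that rule follows. The averaging weights, and the choice to update the signal estimate only while triggered, are my assumptions, not the values in the original code; with a low starting sig_high it triggers too easily, much like the first dump.

```rust
/// Hypothetical receiver state; the averaging weights below are guesses.
struct Detector {
    fast: u32,       // fast-moving average of the raw ADC signal
    background: u32, // slow average of all samples (the noise floor)
    sig_high: u32,   // estimate of the level seen while the transmitter is on
}

impl Detector {
    /// Half way between the noise floor and the signal level.
    fn trigger(&self) -> u32 {
        (self.background + self.sig_high) / 2
    }

    /// Feed one ADC sample; returns true if the transmitter looks on.
    fn sample(&mut self, raw: u32) -> bool {
        // Exponential moving averages with integer-friendly weights.
        self.fast = (self.fast * 3 + raw) / 4;
        self.background = (self.background * 63 + raw) / 64;
        let on = self.fast > self.trigger();
        if on {
            // Only learn the signal level while we believe we're receiving.
            self.sig_high = (self.sig_high * 15 + self.fast) / 16;
        }
        on
    }
}

fn main() {
    let mut d = Detector { fast: 0, background: 26, sig_high: 165 };
    let samples = [30, 25, 800, 790, 810, 40, 20];
    let marks: String = samples
        .iter()
        .map(|&s| if d.sample(s) { 'X' } else { '.' })
        .collect();
    println!("trigger:{} -- {}", d.trigger(), marks);
}
```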
Next, we "decimate" this signal down a lot, by taking a binary average of
finite blocks. As the sample rate is still significantly higher than
the length of a bit, it does not matter that these are not well
aligned. We then count the length of runs of each state we see,
ignoring single errors and overly long runs.
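A minimal sketch of those three steps: majority-vote decimation, flipping lone bits that disagree with both neighbours, then run-length counting that drops over-long runs. The block size and max_run values are parameters I've invented for illustration; the original presumably tunes them to its sample rate.

```rust
/// Collapse each block of raw samples into one bit by majority vote.
/// `block` must be non-zero.
fn decimate(samples: &[bool], block: usize) -> Vec<bool> {
    samples
        .chunks(block)
        .map(|c| c.iter().filter(|&&s| s).count() * 2 > c.len())
        .collect()
}

/// Flip any single bit that disagrees with both of its neighbours.
fn deglitch(bits: &[bool]) -> Vec<bool> {
    let mut out = bits.to_vec();
    for i in 1..bits.len().saturating_sub(1) {
        if bits[i - 1] == bits[i + 1] && bits[i] != bits[i - 1] {
            out[i] = bits[i - 1];
        }
    }
    out
}

/// Count runs of each state, dropping runs longer than `max_run`.
fn run_lengths(bits: &[bool], max_run: usize) -> Vec<(bool, usize)> {
    let mut runs: Vec<(bool, usize)> = Vec::new();
    for &b in bits {
        if let Some(last) = runs.last_mut() {
            if last.0 == b {
                last.1 += 1;
                continue;
            }
        }
        runs.push((b, 1));
    }
    runs.retain(|&(_, len)| len <= max_run);
    runs
}

fn main() {
    let raw = [true, true, true, false, true, true, false, false, false, false, false, false];
    let bits = deglitch(&decimate(&raw, 2));
    println!("{:?}", run_lengths(&bits, 100));
}
```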
As Arduinos, and the radio hardware, don't do anything like what
you tell them, it's impossible to know in advance how long (in
milliseconds) a pulse will be, or how long of a run represents a
Fixing this problem is called "clock recovery": we need to guess
how long a pulse is according to us, regardless of what the sender
thinks it's doing.
Manchester encoding helps with clock recovery. The transmitter
sends a "preamble"
of zeros, which are encoded as
10101010, that is, a series of
pulses. The receiver uses this
to guess how long a pulse is, and to check the guess is correct.
This code is looking for a high (and keeping the length of this
high), then a low of the same length, then another high/low.
If we see these, then we're reasonably confident we're
synchronised to the signal.
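A minimal sketch of that search, working on (state, length) runs: find four alternating runs, starting high, whose lengths agree within a tolerance, and take the first as the pulse length. Every run is then rounded to one or two half-bits, stopping at the first run of three or more pulses, since Manchester never produces that. The tolerance parameter is a guess, not a value from the original code.

```rust
/// Look for the preamble: high, low, high, low runs of roughly equal length.
/// Returns the guessed pulse length in decimated samples.
fn recover_clock(runs: &[(bool, usize)], tolerance: usize) -> Option<usize> {
    const PREAMBLE: [bool; 4] = [true, false, true, false];
    runs.windows(4).find_map(|w| {
        let pulse = w[0].1;
        let states_ok = w.iter().map(|r| r.0).eq(PREAMBLE.iter().copied());
        let lengths_ok = w.iter().all(|r| r.1.abs_diff(pulse) <= tolerance);
        if states_ok && lengths_ok {
            Some(pulse)
        } else {
            None
        }
    })
}

/// Turn runs into Manchester half-bits: about one pulse is one half-bit,
/// about two pulses is two. Anything longer ends the message.
fn runs_to_half_bits(runs: &[(bool, usize)], pulse: usize) -> Vec<u8> {
    assert!(pulse > 0, "pulse length must be non-zero");
    let mut out = Vec::new();
    for &(state, len) in runs {
        let count = (len + pulse / 2) / pulse; // round to the nearest whole pulse
        if count == 0 || count > 2 {
            break; // encoding violated
        }
        out.extend(std::iter::repeat(u8::from(state)).take(count));
    }
    out
}

fn main() {
    let runs = [(true, 7), (false, 7), (true, 6), (false, 8), (true, 14), (false, 20)];
    if let Some(pulse) = recover_clock(&runs, 1) {
        println!("pulse {}: {:?}", pulse, runs_to_half_bits(&runs, pulse));
    }
}
```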
There's also DEBUG_CLOCK, which watches this phase working:
7: (XXXXXXX_.......) 0 (XXXXXXX_.......) 0 (_XXXXXXX_..............) 0 (_XXXXXXXXXXXXXX) 1 (...................) end (encoding violated)
Here, it's guessed the length of seven, then seen two normal
0s, then a
1, with the double-length
0 pulse in
the centre. After this, the transmitter went silent, and hence
we saw a stream of
000s. Three zeros in a row are invalid in Manchester encoding,
so we stopped decoding.
So! We've got a stream of bits, and an end. From this, we need
to find the start of the message. I've chosen to implement this
by sending a long stream of zeros, then two ones, then immediately
the data. This scheme doesn't seem ideal, but it does work.
The decoder waits for this condition to happen,
then starts to read bytes.
The bytes are transmitted as eight bits (MSB last, unlike normal),
with a parity bit. This explains the
last piece of unexplained code
in the transmitter!
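A minimal sketch of the framing, operating on already-decoded bits: wait for enough zeros followed by 11, then read nine-bit frames, least significant bit first, with no re-synchronisation between bytes. The parity rule is an assumption: even parity (the parity bit is the XOR of the data bits) is consistent with every frame in the debug output that follows, but the post doesn't say. min_zeros is also a made-up parameter.

```rust
/// Find the end of the sync pattern (at least `min_zeros` zeros, then 11).
/// Returns the index of the first data bit.
fn find_start(bits: &[u8], min_zeros: usize) -> Option<usize> {
    let mut zeros = 0;
    for (i, pair) in bits.windows(2).enumerate() {
        match pair {
            [0, _] => zeros += 1,
            [1, 1] if zeros >= min_zeros => return Some(i + 2),
            _ => zeros = 0,
        }
    }
    None
}

/// Read one frame: eight data bits, least significant first, then a parity
/// bit. Assumes even parity; returns None on a short frame or parity error.
fn read_byte(bits: &[u8]) -> Option<u8> {
    let frame = bits.get(..9)?;
    let mut value = 0u8;
    let mut parity = 0u8;
    for (i, &bit) in frame[..8].iter().enumerate() {
        value |= bit << i;
        parity ^= bit;
    }
    if parity == frame[8] {
        Some(value)
    } else {
        None
    }
}

/// Read bytes after the sync pattern until the bits run out or parity fails.
fn read_message(bits: &[u8], min_zeros: usize) -> Vec<u8> {
    let mut out = Vec::new();
    if let Some(mut pos) = find_start(bits, min_zeros) {
        while let Some(byte) = bits.get(pos..).and_then(read_byte) {
            out.push(byte);
            pos += 9;
        }
    }
    out
}

fn main() {
    let mut bits: Vec<u8> = vec![0, 0, 0, 0, 1, 1];
    bits.extend_from_slice(&[0, 0, 0, 1, 1, 0, 0, 0, 0]);
    println!("{:?}", read_message(&bits, 4));
}
```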
There's also a debugger for this, in
we can see it waiting for
XX (the second, accepted X is
bracketed), then reading the next nine bits and checking the
parity. Note that there's no synchronisation for the second
byte, as it's assumed we're still synchronised:
..X(X)...XX.... => 24 (parity: 00)
X..XX..X. => 153 (parity: 00)
X......X. => 129 (parity: 00)
Here, a failure looks like:
X(.X(..X(X)..X.X.... => 20 (parity: 00)
..X.X...X(X)...XX.... => 24 (parity: 00)
To be honest, I have no real idea what's gone wrong here.
The cleaned up data stream looks like
001100 could be synchronisation, or it could be the
encoded "24". Argh! Why would you pick this sequence for
The actual data being sent is a temperature reading, encoded
as two bytes,
(int)Celsius, and the decimal part as a single
As corruption was still getting through at this level, an
extra checksum is computed, as the
xor of these two bytes
together. Finally, it's mostly reliable. With all the debugging
disabled, it looks like:
Value checks out: 24.70
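The checksum is simple enough to show whole. In this minimal sketch, the three bytes from the earlier debug output (24, 153 and 129) pass, since 24 XOR 153 = 129. It deliberately doesn't interpret the second byte, as the mapping to the decimal part isn't spelled out above.

```rust
/// Append an XOR checksum to a two-byte reading.
fn frame(whole: u8, frac: u8) -> [u8; 3] {
    [whole, frac, whole ^ frac]
}

/// Accept a three-byte message only if the checksum matches.
fn check(msg: &[u8]) -> Option<(u8, u8)> {
    match *msg {
        [whole, frac, sum] if (whole ^ frac) == sum => Some((whole, frac)),
        _ => None,
    }
}

fn main() {
    let msg = frame(24, 153);
    println!("{:?} -> {:?}", msg, check(&msg));
}
```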
Shame the temperature sensor varies (by about 2C) from my other
sensors. It also loses about half the messages to errors, as
there's no error recovery at all.
Wasn't that fun?
- What would a normal software decoder look like for this?
Probably about as bad. I wrote an example
FSK decoder as part
of quadrs, a radio manipulation tool.
- How far is this radio transmitting?
About three centimetres.
It's about three months until
Java 9 is supposed to be released.
Debian contains around 1,200 packages that build against Java 8.
I have been trying to build them with Java 9.
It's not been going well.
The first attempt
found an 87% failure rate, the majority of which were either:
This is too bad to get an idea of what's actually broken, so I gave up.
The second attempt
has gone better, only 57% failures. This had a number of issues fixed, but
there's still a large number of problems masked by toolchain failures.
However, some real Java 9 breakages are coming to the fore!
Oh, and 135 packages have an unknown problem,
so maybe there's a whole other class of bug I've missed.
This is (obviously) an ongoing project,
but I thought I'd write up what I'd seen so far.
Also, I wanted to mention how cool it was to hack up a dashboard for my
ghetto make/Docker build process
in ghetto shell,
although slightly less ghetto than the
Every 2.0s: ./classify.sh
ascii cast deps doclint javadoc keyword modules unknown version
47 total 9 total 203 total 82 total 165 total 15 total 83 total 114 total 206 total
==== ==== ==== ==== ==== ==== ==== ==== ====
antelope charactermanaj access-*hecker android*-tools access-*hecker avalon-*mework activem*otobuf adql 389-adm*onsole
axis dom4j activem*tiveio antlr4 akuma axis android*ork-23 aspectj 389-ds-console
azureus electric activemq args4j animal-sniffer biojava-live android*dalvik aspectj*plugin airport-utils
clirr findbugs activem*otobuf bindex annotat*ndexer bnd android*oclava bouncycastle android*-tools
cmdreader jajuk afterburner.fx cdi-api antlr3 dbus-java android*silver bsh antlr
cortado jsymphonic akuma classycle apache-log4j2 jalview android*inding closure*mpiler artemis
cronometer libjaud*r-java animal-sniffer commons-math3 apache-mime4j javacc4 android*ibcore cofoja beansbinding
dita-ot olap4j annotat*ndexer ditaa async-h*client java-gnome android*ibrary commons-jcs bindex
eclipselink sweethome3d antlr3 felix-g*-shell axmlrpc jmol android*apksig convers*ruptor biojava-live
eclipse antlr4 fest-assert bcel jruby-openssl android*s-base davmail biomaj
entagged apache-log4j2 fest-reflect bridge-*jector libcomm*g-java android*ls-swt diffoscope brig
fop args4j fest-util bsaf libxml-*1-java android*helper dnsjava brltty
geronim*0-spec atinjec*jsr330 ganymed-ssh2 build-h*plugin libxpp2-java ant dumbster cadencii
imagej bcel gentlyw*-utils canl-java mvel apktool eclipse*nyedit castor
jasmin-sable bridge-*jector glassfish cglib squareness basex eclipse-cdt cdi-api
jas build-h*plugin hdf5 commons*nutils bintray*t-java eclipse*config ceph
jasypt carrots*h-hppc hessian commons*ration biojava4-live eclipse-eclox cobertura
javacc4 cglib intelli*ations commons-csv easybind eclipse-emf coco-java
javaparser checkstyle jacksum commons-io eclipse-mylyn eclipse-gef colorpicker
jets3t codenarc jcm commons*vaflow eclipse-wtp eclipse*clipse commons*client
jgromacs commons*nutils jfugue commons-jci eclipse-xsd eclipse*es-api concurr*t-dfsg
king commons*ration jmock commons-math freeplane eclipse-rse cvc3
knopfle*h-osgi commons-csv jnr-ffi cssparser gant eclipse*clipse db5.3
libcds-*t-java commons-io jpathwatch csvjdbc gradle-*plugin emma-coverage dbus-java
libcomm*g-java commons*vaflow jsurf-alggeo dirgra gradle gdcm dicomscope
libidw-java commons-jci jts dnssecjava gradle-*otobuf geronim*upport ditaa
libiscwt-java commons-math libcds-*c-java dokujclient gradle-*plugin gettext docbook*-saxon
libitext-java commons-parent libcomm*c-java doxia-s*etools graxxia gluegen2 doxia
libjdbm-java commons-vfs libcomm*4-java dtd-parser groovycsv gnome-split easyconf
libjt400-java core-ca*lojure libcomm*2-java easymock groovy h2database excalib*logger
libstax-java cssparser libhac-java felix-b*sitory gs-collections ha-jdbc excalib*logkit
libvldo*g-java data-xm*lojure libhtml*r-java felix-f*mework htsjdk hdrhistogram f2j
libxpp3-java dirgra libirclib-java felix-g*ommand ice-bui*gradle icu4j felix-osgi-obr
livetri*jsr223 dnssecjava libjaba*t-java felix-g*untime insubstantial icu4j-4.2 fontchooser
mathpiper dokujclient libjgoo*n-java felix-shell ivyplusplus icu4j-4.4 ganymed-ssh2
maven-a*helper doxia libjhla*s-java felix-s*ll-tui jabref icu4j-49 gentlyw*-utils
metastudent doxia-s*etools libjoda*e-java felix-utils jackson*tabind istack-commons geogebra
naga dtd-parser libjsonp-java geronim*0-spec jackson*-guava jakarta-jmeter gridengine
ognl easymock libjsr1*y-java geronim*1-spec java3d janino healpix-java
checkrestart, part of
checks what you might need to restart. You can run it after a system
update, and it will find running processes using outdated libraries.
If these can be restarted, and there are no new kernel updates, then
you can save yourself a reboot.
checkrestart is pretty dated, and has some weird behaviour.
It frequently reports that there are things needing a restart, but
that it doesn't feel like telling you what they are. (This makes me
moderately angry). It's pretty bad at locating services on a
systemd-managed system. It tries to look through Debian packages,
making it Debian-specific (and unreliable). This is especially
odd, because systemd knows what pid belongs to a unit, as does
Instead of fixing it, I have rewritten it from scratch.
find-deleted is a tool
to find deleted files which are still in use, and to suggest systemd
units to restart.
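One way to get at this information on Linux, sketched below without claiming it's how find-deleted works: the kernel marks mappings of unlinked files with a " (deleted)" suffix in /proc/PID/maps, and /proc/PID/cgroup names the systemd unit a process belongs to. Reading other users' processes needs root. This parsing mishandles paths containing spaces, and a real tool would also want to skip things like memfd mappings.

```rust
use std::fs;

/// Deleted file paths from the contents of a /proc/PID/maps file.
/// Fields are: address perms offset dev inode pathname.
fn deleted_mappings(maps: &str) -> Vec<String> {
    let mut out: Vec<String> = maps
        .lines()
        .filter_map(|line| line.strip_suffix(" (deleted)"))
        .filter_map(|line| line.split_whitespace().nth(5).map(str::to_string))
        .collect();
    out.sort();
    out.dedup();
    out
}

/// The .service unit from /proc/PID/cgroup, e.g. "0::/system.slice/nginx.service".
/// Processes in session scopes come back as None ("outside of units").
fn unit_from_cgroup(cgroup: &str) -> Option<String> {
    cgroup
        .lines()
        .filter_map(|line| line.rsplit('/').next())
        .find(|name| name.ends_with(".service"))
        .map(str::to_string)
}

fn main() {
    let entries = match fs::read_dir("/proc") {
        Ok(entries) => entries,
        Err(_) => return, // not Linux, or /proc unavailable
    };
    for entry in entries.flatten() {
        let pid = match entry.file_name().to_str().and_then(|s| s.parse::<u32>().ok()) {
            Some(pid) => pid,
            None => continue, // not a process directory
        };
        let maps = match fs::read_to_string(format!("/proc/{}/maps", pid)) {
            Ok(maps) => maps,
            Err(_) => continue, // permission denied, or the process exited
        };
        let deleted = deleted_mappings(&maps);
        if deleted.is_empty() {
            continue;
        }
        let unit = fs::read_to_string(format!("/proc/{}/cgroup", pid))
            .ok()
            .and_then(|c| unit_from_cgroup(&c));
        println!("{} {:?}: {}", pid, unit, deleted.join(", "));
    }
}
```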
The default is to try and be helpful:
- sudo systemctl restart mysql.service nginx.service
- sudo systemctl restart bitlbee.service
- sudo systemctl restart fail2ban.service systemd-timesyncd.service tlsdate.service
- sudo systemctl restart dbus.service lxc-net.service lxcfs.service polkitd.service
Some processes are running outside of units, and need restarting:
-  faux: 7161 17338 14539
-  faux: 2082
-  vrai: 8551 8556
Here, it is telling us that a number of services need a restart.
The services are categorised based on some patterns defined in the
associated configuration file,
For this machine, I have decided that restarting
will cause a
blip in the service to users; I might do it at an off-peak
time, or ensure that there's other replicas of the service available
to pick up the load.
My other categories are:
- drop: A loss of service will happen that will be annoying for users.
- safe: These services could be restarted all day, every day, and nobody would notice.
- scary: Restarting these may log you out, or
cause the machine to stop functioning.
- other: things which don't currently have a classification
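The post doesn't show the configuration file's syntax, so here is a hypothetical stand-in: an ordered list of (category, pattern) rules where the first match wins, a trailing * matches a prefix, and anything unmatched falls into other. The example assignments are loosely modelled on the output above and are illustrative only.

```rust
/// Hypothetical rule matching: exact unit names, or prefixes ending in '*'.
/// The first matching rule wins; unmatched units are "other".
fn classify<'a>(unit: &str, rules: &[(&'a str, &str)]) -> &'a str {
    for &(category, pattern) in rules {
        let matched = match pattern.strip_suffix('*') {
            Some(prefix) => unit.starts_with(prefix),
            None => unit == pattern,
        };
        if matched {
            return category;
        }
    }
    "other"
}

fn main() {
    let rules = [
        ("safe", "systemd-timesyncd.service"),
        ("safe", "fail2ban.service"),
        ("scary", "dbus.service"),
        ("blip", "nginx.service"),
        ("drop", "bitlbee*"),
    ];
    let units = ["nginx.service", "bitlbee.service", "fail2ban.service", "cron.service"];
    // Roughly the kind of selection a "show only safe units" option would make.
    let safe: Vec<&str> = units
        .iter()
        .copied()
        .filter(|u| classify(u, &rules) == "safe")
        .collect();
    println!("{}", safe.join(" "));
}
```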
If you're happy with its suggestions, you can copy-paste the above commands,
or you can run it in a more automated fashion:
systemctl restart $(find-deleted --show-type safe)
This can effectively be run through provisioning tools, on a whole selection
of machines, if you trust your matching rules! I have done this with a much
more primitive version of this tool at a previous employer.
It can also print the full state that it's working from, using