Faux' blog

2015-11-18

xlines: stdin round-robiner

xlines is a combination of xargs and split. It takes a bunch of lines, and sends them to a number of child processes. Each line is seen by only one of the processes.

e.g.

seq 16 | xlines -c 'cat > $(mktemp)'

...will give you 8 temporary files (on an 8-core machine) containing:

1
9

and:

2
10

etc.
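
The dealing-out can be sketched in a few lines of Python. This is only an illustration of the round-robin idea, not how xlines itself is written; the function name round_robin is made up for the example, and it collects lines into lists rather than feeding child processes.

```python
from itertools import cycle

def round_robin(lines, workers):
    """Deal lines out to `workers` buckets in turn, like dealing cards."""
    buckets = [[] for _ in range(workers)]
    # cycle() keeps handing back bucket 0, 1, ..., workers-1, 0, 1, ...
    for bucket, line in zip(cycle(buckets), lines):
        bucket.append(line)
    return buckets

# Equivalent of `seq 16` split across 8 workers.
buckets = round_robin([str(n) for n in range(1, 17)], 8)
print(buckets[0])  # ['1', '9']
print(buckets[1])  # ['2', '10']
```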

Why would you care?

You have a bunch of INSERT statements coming off a stream, but your database will only use a single core if you run them in series:

zcat sql.gz | xlines -P 32 -- psql

Some speed-up.

zcat sql.gz | xlines -P 32 -c 'buffer | psql'

Zoom.

A specific tool to fix a specific job. I still don't think it makes up for the lack of limited parallelism in shell, however. Still thinking about that one...

2015-11-12

Teensy weensy crypto

As the UK's politicians continue to fail to understand what "strong cryptography" or "banning" even mean, I thought I would have a look at how simple strong cryptography can be.

nanorc4 is a working RC4 encryption and decryption implementation in 16-bit assembly. It will run on any 32-bit (or, presumably, 16-bit!) Windows machine (admittedly, those are going out of fashion), and on dosbox:

uwACiB/+w3X6MckxwIjIih6AAP7L9vOI44qHgv6Iy4onAOAAxegvAP7Bdd8xybQIzSH+wYjLAi/o
HACIy4oXiOsCF4jTihcwwrQCzSG0C80hhMB12c0giMuKF4jrijeIF4jLiDfD

Yep, that's it. base64 encoded. 102 bytes, or 138 encoded. Fits in a tweet. Probably small enough to memorise. Certainly pretty hard to ban.

With this (and your computer) you can secure a message with a password in a way that's unbreakable. I can't break it, your government can't break it, other people's governments can't break it. Secure.

Why's it so small?

The problem is (relatively) easy. This is known as "pre-shared key cryptography", or "symmetric cryptography", which is one of the easier problems in the science. Things get much harder when you don't have a good way to tell the target the key in advance.
RC4 is surprisingly secure for how simple the code is.
16-bit assembly, and the COM "format" have no preamble: it's just the code. It just starts executing at the start. (And I hacked at it a bit.)

Demo!

> echo hi | one.com secure password>out ; in DOS (note: no trailing space)

$ make c && ./c 'secure password' <out  # on linux
hi

Should you use it? No. There are many important missing features that are present in proper symmetric encryption tools, such as proper key derivation, protection against modification, IVs, and fewer bugs. Yes, even this 102 byte program has some significant bugs I couldn't be bothered to fix.

Is RC4 secure? For this use-case, yes. For TLS, most certainly not. Even today there are many plausible attacks against RC4 in the TLS context, but none of them apply to this static-data world.

I was actually hoping to be able to fit RC4-drop-N in, which is probably secure in many more contexts, but I couldn't get the byte count down to the (tweet-derived) target. I guess this makes for a reasonable golf competition...

Development notes:

dosbox is pretty annoying, but so is cmd. The dosbox debugger is cool, but there doesn't seem to be any current documentation on it. That Forum Post is pretty wrong.
dosbox doesn't support pipes or <input redirection, so I couldn't debug with binary files, which is one of the reasons it doesn't work.
I have no idea what the actual semantics of the input interrupts are; all the useful documentation seems to have been lost to history, or was commercial (and/or paper) in the first place.
Everything fits in three 256-byte blocks, so the bh register == block number, and there's no use of memory segmentation (WOOO).
block 0: the PSP, which I couldn't overwrite as it has the key in (as the command-line argument).
block 1: the code segment
block 2: the 256-byte state for RC4.
After the key setup, the bh is left at 2 forever.
cl and ch are used for the i and j state parts in RC4.
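
For comparison, here is the textbook RC4 algorithm in Python: a 256-byte state, shuffled by the key, then stepped with the i and j counters. It is a sketch of the standard algorithm, not a translation of nanorc4, and since the assembly version has known bugs its output may well not match this byte-for-byte. The function name rc4 is just for the example.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    # Key setup: start from the identity permutation and shuffle it with the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Stream generation: XOR each input byte with the next keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)

# Encryption and decryption are the same operation.
ciphertext = rc4(b"secure password", b"hi\n")
assert rc4(b"secure password", ciphertext) == b"hi\n"
```

Like the original, this has no key derivation, no IV and no integrity protection, so the same "should you use it? No." applies.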

Update:

A number of people pointed me at Odzhan's RC4 implementation in normal x86(_64) which shows a much better understanding of actual assembly programming. For example, their "swap" implementation is amazing compared to mine.
Some people asked how much hacking it took to get the size down. It took about six hours, but it was great. I love golf competitions, even if they're just against myself.
There was some concern that people might actually accidentally run or incorporate the code without understanding the flaws, as there isn't a big enough warning on this page, or on github. These people additionally didn't read any of the rest of the article, where it is explained that it's broken, 16-bit x86 assembly which you actually can't run anywhere, even if you wanted to.

2015-10-04

Capturing users' ssh keys

Four years ago, I was working on a project that would require users to connect to it over ssh. At the time, asking typical users (even developers!) to send you an ssh public key was a bit of an involved operation.

The situation hasn't improved much.

For example, github suggests generating the keys manually, then using Windows' clip.exe or apt-get install xclip && xclip (from the command line) to get the key into the clipboard, then pasting it into their web-interface. Ugh.

The situation is a little better for PuTTYTray: it has built-in support for SSH agent, and a reasonably streamlined way to get keys into the clipboard, but, then, we're still using the clipboard-into-the-web-interface story. This was written in 2013-08, two years too late (although I'm sure the author could have been convinced to move the development forward).

For this project, I came up with a better way.

I realised I could simply ask the new user to ssh in, and capture their keys. To distinguish concurrent users, I could issue them a fake username, and ask them to ssh account-setup-for-USERNAME@my.service.com. When they do, I can capture their keys and automatically associate them with their account. No platform specific commands, no unnecessary messing around in the terminal.

This is possible due to how ssh authentication works:

Client sends the username.
Server replies: Sure, you can try logging in with keys, or with passwords if you want.
Client sends Public Key 1.
Server replies: Nope, but you can try other keys or passwords.
Client sends Public Key 2.
Server replies: ...

That is, the standard ssh client will just send you all the user's public keys.

Note that this isn't (normally) considered a security problem; the keys are public, after all, and the server isn't leaking any information by saying "nope".
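
The server-side bookkeeping this needs is small. Below is a toy model in plain Python with no SSH library involved; the class and method names (KeyCapture, begin_setup, offer_key) are invented for this sketch and are not taken from the github project. It assumes the server calls offer_key once for every key the client presents, and it skips the signature check a real server would need to confirm the client actually holds the private key.

```python
import uuid

class KeyCapture:
    def __init__(self):
        self.pending = {}     # one-time setup name -> real account name
        self.authorized = {}  # real account name -> set of accepted public keys

    def begin_setup(self, account):
        """Issue a throwaway username for `account` to ssh in with."""
        token = str(uuid.uuid4())
        self.pending[token] = account
        return token

    def offer_key(self, username, public_key):
        """Called once per key the client offers; True means 'accepted'."""
        if username in self.pending:
            # Setup login: capture the first key offered and retire the token.
            account = self.pending.pop(username)
            self.authorized.setdefault(account, set()).add(public_key)
            return True
        # Normal login: accept only keys captured earlier.
        return public_key in self.authorized.get(username, set())

server = KeyCapture()
token = server.begin_setup("john")
server.offer_key(token, "ssh-ed25519 AAAA-example")          # captured
print(server.offer_key("john", "ssh-ed25519 AAAA-example"))  # True
```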

As I was already running a custom SSH server which practically required you to implement authentication yourself anyway, it was a simple step to add key capture to the account setup procedure. I've uploaded a stripped down version to github if you want to see how it works. For example,

Start the server:

server% git clone https://github.com/FauxFaux/ssh-key-capture.git
server% cd ssh-key-capture
server% ./gradlew -q run

The user can try and login, but gets rejected (this isn't required):

john% ssh -p 9422 john@localhost
Permission denied (publickey).

Server logs from the (unnecessary) failed authentication:

KeyCapture - john trying to authenticate with RSA MIIBIjANBg...
KeyCapture - john trying to authenticate with EC MFkwEwYHKoZ...

Tell the server that john has signed up, or wants to add keys, or...

Enter a new user name, or blank to exit: john
Ask 'john' to ssh to '18a74d9f-5c7d-41d0-8369-bae4aaba9867@...'

John now adds his keys, and hence can login:

john% ssh -p 9422 18a74d9f-5c7d-41d0-8369-bae4aaba9867@localhost
Added successfully!  You can now log-in normally.
Connection to localhost closed.

john% ssh -p 9422 john@localhost
Hi!  You've successfully authenticated as john
Bye!
Connection to localhost closed.

Future work:

It could capture all of the user's keys (it currently just captures the first).
More meaningful behaviour after the first authentication, or during the admin part of the setup?
Some way to do this on top of OpenSSH, or other tools people actually run in the wild. PAM?

Update: There was some decent discussion on reddit's /r/netsec about this post.

2015-10-01

ghetto_json for Ansible

ansible-ghetto-json is an ansible module for making quick edits to JSON files.

Ansible has great built-in support for ini files, but a number of more modern applications are using JSON for config files.

ghetto_json lets you make some types of edits to JSON files, and remains simple enough that it's hopefully easier just to extend than to switch to a different module, and you won't feel too guilty just copy-pasting it into your codebase.

More details are in its README, which you can view on the above github link.

It offers an interesting opportunity to think about type conversion: JSON actually supports more types than you would normally think of (ints, floats, nulls, booleans, as well as the trusty string type). Python, which I still don't think of as a typed language, uses and honours these types in its JSON module, meaning you have to do conversion.

And, if it explicitly supports null, how do you do removals? I made up a new keyword, unset, which removes the key. Pretty ghetto.
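
To make the conversion problem concrete, here is a rough Python sketch of the idea. It is not the module's actual code or interface: convert and apply_edits are hypothetical names, the edits are assumed to arrive as plain strings, and only top-level keys are handled.

```python
def convert(raw):
    """Guess which JSON type a plain string was meant to be."""
    if raw == "null":
        return None
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # anything else stays a string

def apply_edits(document, edits):
    """Set each key to its converted value; the keyword 'unset' removes the key."""
    for key, raw in edits.items():
        if raw == "unset":
            document.pop(key, None)
        else:
            document[key] = convert(raw)
    return document

config = {"port": 80, "debug": True, "name": "app"}
apply_edits(config, {"port": "8080", "debug": "unset", "ratio": "0.5", "owner": "null"})
print(config)  # {'port': 8080, 'name': 'app', 'ratio': 0.5, 'owner': None}
```

The obvious cost of this approach is that the literal strings "null", "true" and "unset" can no longer be stored as values.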

2015-05-07

lxc-autostart for limited users, on systemd

lxc comes with a tool named lxc-autostart which can help you start your containers at boot. All you have to do is set lxc.start.auto = 1 in the config file and it will start your containers for you... if you're running your containers as root.

For convenience and security, I'm not running my containers as root. Normally, if I wanted to start something on boot, as a limited user (or possibly as a service), I'd use the cron @reboot hack:

$ crontab -l
@reboot /usr/bin/lxc-autostart

This, however, fails for lxc-autostart (and for lxc-start, for the same reason): cron runs your command in a bizarre environment which, importantly, doesn't have the user's cgroups set up properly. These are set up somewhere scary (pam?), and cron apparently doesn't do a proper log-in for your user. You can observe the failure with some:

* * * * * cat /proc/self/cgroup

...which will show you have junk cgroups, which makes lxc-start angry with terrible, terrible errors:

cgmanager[1041]: cgmanager:do_create_main: pid 5679 (uid 1000 gid 1000) may not create under /run/cgmanager/fs/blkio/system.slice/autostart.service
cgmanager[1041]: cgmanager:do_create_main: pid 5679 (uid 1000 gid 1000) may not create under /run/cgmanager/fs/cpu/system.slice/autostart.service
...
cgmanager[1041]: cgmanager: Invalid path /run/cgmanager/fs/blkio/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/blkio/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager: Invalid path /run/cgmanager/fs/cpu/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/cpu/system.slice/autostart.service/lxc/utopic
...
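
To compare what cron sees with what a login shell sees, the /proc/self/cgroup listing can be parsed rather than eyeballed. Each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below assumes that format; the sample input is made up for illustration and is not output captured from the machine above.

```python
def parse_cgroups(text):
    """Map each controller list to the cgroup path the process sits in."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only, in case the path contains one.
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result

# Hypothetical sample; on Linux, use open("/proc/self/cgroup").read() instead.
sample = "4:blkio:/system.slice/cron.service\n3:cpu,cpuacct:/system.slice/cron.service\n"
for controllers, path in parse_cgroups(sample).items():
    print(controllers, "->", path)
```

Running the same thing from cron and from an ssh session, then comparing the two mappings, shows which controllers ended up under a service slice instead of the user's own cgroup.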

The easiest way for a limited user to solve this is, as far as I'm aware, ssh to localhost. Limited users can't configure sudo to be passwordless, and can't su without entering their password on a proper terminal, meaning neither works from cron.

$ ssh-keygen -t ed25519
$ ssh-copy-id localhost
$ crontab -l
@reboot /usr/bin/ssh me@localhost /usr/bin/lxc-autostart

This was working great, until the Ubuntu Vivid upgrade, which has brought the wonders of systemd.

Under systemd, the @reboot entries are sometimes processed before sshd has started, so the above massive hack fails.

$ crontab -l
@reboot sleep 10 && /usr/bin/ssh ...

NO. NO NO NO.

Under systemd, we can write a simple service file that does the auto-start. systemd understands cgroups, so if you ask it to run a service as a User=, it'll run the service in the user's cgroup, right? Nope: It runs everything in the service cgroup. Fair enough.

However, as the service is started as root, we can use su. A systemd service: /etc/systemd/system/autostart.service:

[Unit]
Description=lxc-autostart
After=network.target

[Install]
WantedBy=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/su me -c '/usr/bin/lxc-autostart'

And install it:

$ sudo systemctl enable autostart.service

This seems to work. I'm not sure if the After= is necessary; network.target is a complex beast but I still feel safer waiting for something to be alive.

2015-05-04

Everyday Shell

A couple of people mocked my use of shell in my last post, so I thought I'd write up a couple of problems I solved this week, to allow me to laugh at the solutions in the future, and, more importantly, for you to laugh at them now.

White-space has been inserted into the examples for bloggability, but all these are what I actually wrote, as one-liners.

Downloading matching files

I've got the http:// URL of an indexed directory, which contains a load of large files. I want to download all the files with -perl and .html in their names.

First thought: wget probably has a flag for this:

wget -np --mirror --accept='*-perl*.html' https://example.com/foo/

This actually produces the right output, but... it downloads all the files, then deletes the ones that it doesn't want to keep. My guess is that it's sucking links out of the intermediate files. Maybe this could be fixed by limiting the recursion depth, instead of using mirror? This isn't what I did, however:

curl https://example.com/foo/ | \
   cut -d'"' -f 8 | \
   fgrep -- -perl | \
   sed 's,^,https://example.com/foo/,' | \
   wget -i-

Yep, that works. Very unix-y solution, every tool only doing one thing. Breaks horribly if the input is wrong. Fast, as it only looks at files it needs, and wget manages a connection pool for you (whereas for u in $urls; do wget $u; done wouldn't).
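
A less fragile take on the cut/fgrep/sed stages is to parse the index page properly. This Python sketch only builds the list of URLs and leaves the downloading to something else; the names LinkCollector and matching_urls and the sample HTML are invented for the example, and it assumes the index is ordinary HTML with one link per file.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)

def matching_urls(index_html, base_url):
    collector = LinkCollector()
    collector.feed(index_html)
    # Keep links with -perl in the name that end in .html, made absolute.
    return [urljoin(base_url, href) for href in collector.links
            if "-perl" in href and href.endswith(".html")]

sample = ('<a href="../">Parent</a><a href="foo-perl.html">x</a>'
          '<a href="bar.html">y</a><a href="baz-perl.tar.gz">z</a>')
print(matching_urls(sample, "https://example.com/foo/"))
# ['https://example.com/foo/foo-perl.html']
```

The resulting list can still be handed to wget -i- so that the connection reuse is kept.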

Line counts in a git repository

I've got a checkout of a git repository, and I want to know roughly how many lines of production code there are in it. It's a Java codebase, so most production code is in */src/main or client/src.

find -maxdepth 3 -name main -o -name client | xargs sloccount

Why didn't I use -exec here? Probably paranoid of -exec with -o. Correct solution:

find -maxdepth 3 \( -name main -o -name client \) -exec sloccount {} +

Two massive problems, anyway:

There's a load of generated or downloaded code in those directories; build output, downloaded modules, ...
sloccount really hates taking multiple directories as input, especially when they have the same (base)name, and just ignores some of them.

Next up, let's use git ls-files to skip ignored files:

git ls-files | egrep 'src/main|client/src' | xargs cat | wc -l

Barfs as there's white-space in the file names, which I wasn't expecting. Could probably work around it with some:

... | while IFS= read -r line; do cat "$line"; done | wc -l

...but we may as well fix the real problem. git ls-files has -z for null-terminated output, and xargs has -0 for null-terminated input. grep has -Z for null-terminated output... but I couldn't find anything that would make it take null-terminated entries as input. Sigh.
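
One way around the missing piece is a tiny filter that speaks NUL on both sides. This is a sketch of that idea in Python, not what I actually ran; filter_nul_separated is a made-up name and the sample listing is invented.

```python
import re

def filter_nul_separated(data: bytes, pattern: bytes) -> bytes:
    """Keep the NUL-terminated entries that match `pattern`."""
    regex = re.compile(pattern)
    entries = [entry for entry in data.split(b"\0") if entry]
    kept = [entry for entry in entries if regex.search(entry)]
    # Re-terminate each surviving entry so xargs -0 can consume the result.
    return b"".join(entry + b"\0" for entry in kept)

listing = b"a/src/main/My File.java\0docs/readme.txt\0client/src/app.js\0"
print(filter_nul_separated(listing, rb"src/main|client/src"))
```

Wired up to sys.stdin.buffer and sys.stdout.buffer, this would sit between git ls-files -z and xargs -0 cat.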

Wait, it's git. We can just clone the repo. (cdt cds to a new temporary directory)

cdt; git clone ~/code/repo .

...then we can use find again:

find \( \
        -name \*.java -iwholename '*src/main*' \
    -o \
        -name \*.js -iwholename '*client*' \
\) -exec cat {} + | \
grep -v '^$' | \
egrep -v '^[[:space:]]*//' | \
wc -l
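
The two filtering stages of that pipeline amount to the following, sketched in Python. One difference to be aware of: this version also skips whitespace-only lines, which grep -v '^$' does not. The function name count_code_lines is just for the example.

```python
def count_code_lines(text):
    """Count lines that are neither blank nor whole-line // comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count

sample = "int a;\n\n  // a comment\n\tint b; // trailing comments still count\n"
print(count_code_lines(sample))  # 2
```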

Close enough to the expected numbers! Now, let's backfill graphite:

(for rev in $(g rev-list --all | sed -n '1~50p'); do
    g co -q $rev
    echo code.production $(!!) $(g show --format=%at | head -n1)
done) | grep -v ' 0 ' | nc localhost 2444

Woo!
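
What goes down the pipe to nc is Graphite's plaintext format: one "metric value timestamp" line per sample. A small Python sketch of building those lines, dropping zero counts the way the grep does; the function names and the sample numbers are made up for illustration.

```python
def graphite_line(metric, value, timestamp):
    """One sample in Graphite's plaintext format."""
    return f"{metric} {value} {timestamp}\n"

def backfill_lines(samples, metric="code.production"):
    """samples: iterable of (line_count, unix_timestamp); zero counts are skipped."""
    return [graphite_line(metric, count, ts) for count, ts in samples if count != 0]

# Hypothetical measurements, not real numbers from the repository.
lines = backfill_lines([(1200, 1430000000), (0, 1430003600), (1350, 1430007200)])
print("".join(lines), end="")
```

The joined text can be sent over a plain TCP socket to the same host and port that nc is pointed at above.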
