2015-11-18
xlines is a combination of xargs and split. It takes a bunch of lines,
and sends them to a number of child processes. Each line is seen by
only one of the processes.
e.g.
seq 16 | xlines -c 'cat > $(mktemp)'
...will give you 8 temporary files (on an 8-core machine) containing:
1
9
and:
2
10
etc.
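The distribution in that example looks like plain round-robin dealing. Here's a rough sketch of that idea in Python; it is not how xlines is implemented, and the round_robin name and the "buckets" standing in for child processes are made up for illustration:

```python
from itertools import cycle


def round_robin(lines, workers):
    """Deal lines out to `workers` buckets in turn, like dealing cards."""
    buckets = [[] for _ in range(workers)]
    # cycle() walks the buckets forever; zip() stops when the lines run out.
    for bucket, line in zip(cycle(buckets), lines):
        bucket.append(line)
    return buckets


# Same input as `seq 16`, dealt to 8 hypothetical workers.
buckets = round_robin([str(n) for n in range(1, 17)], workers=8)
print(buckets[0])  # ['1', '9']
print(buckets[1])  # ['2', '10']
```

A real tool would write each line to a child's stdin instead of appending to a list, but the dealing order is the same.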
Why would you care?
You have a bunch of INSERT statements coming off a stream, but your
database will only use a single core if you run them in series:
zcat sql.gz | xlines -P 32 -- psql
Some speed-up.
zcat sql.gz | xlines -P 32 -c 'buffer | psql'
Zoom.
A specific tool for a specific job. I still don't think it makes up
for the shell's lack of bounded parallelism, however. Still thinking about
that one...
2015-11-12
As the UK's politicians continue to fail to understand what "strong cryptography"
or "banning" even mean, I thought I would have a look at how simple strong
cryptography can be.
nanorc4 is a working RC4 encryption and decryption implementation in 16-bit assembly.
It will run on any 32-bit (or, presumably, 16-bit!) Windows machine (which, admittedly,
are going out of fashion), and on dosbox:
uwACiB/+w3X6MckxwIjIih6AAP7L9vOI44qHgv6Iy4onAOAAxegvAP7Bdd8xybQIzSH+wYjLAi/o
HACIy4oXiOsCF4jTihcwwrQCzSG0C80hhMB12c0giMuKF4jrijeIF4jLiDfD
Yep, that's it. base64 encoded. 102 bytes, or 138 encoded. Fits in a tweet.
Probably small enough to memorise. Certainly pretty hard to ban.
With this (and your computer) you can secure a message with a password in a way
that's unbreakable. I can't break it, your government can't break it, other
people's governments can't break it. Secure.
Why's it so small?
- The problem is (relatively) easy. This is known as "pre-shared key cryptography",
or "symmetric cryptography", which is one of the easier problems in the science.
Things get much harder when you don't have a good way to tell the target the key
in advance.
- RC4 is surprisingly secure for how simple the code is.
- 16-bit assembly, and the COM "format" have no preamble: it's just the code.
It just starts executing at the start. (And I hacked at it a bit.)
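To give a feel for how little there is to RC4, here's the textbook algorithm as a Python sketch. It is not a translation of the assembly above (which, as noted further down, has bugs), so don't expect the two to interoperate; the rc4 function name is just for illustration:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: encrypting and decrypting are the same operation."""
    # Key setup: start from the identity permutation and shuffle it with the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Keystream: keep shuffling, and XOR one state byte into each data byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())         # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))  # b'Plaintext'
```

The whole cipher is a 256-byte table, two indices, and a swap, which is why it squeezes into so few bytes of machine code.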
Demo!
> echo hi | one.com secure password>out ; in DOS (note: no trailing space)
$ make c && ./c 'secure password' <out # on linux
hi
Should you use it? No. It is missing many important features that are present in proper
symmetric encryption tools, such as proper key derivation, protection against modification,
IVs, and fewer bugs. Yes, even this 102 byte program has some significant bugs I couldn't
be bothered to fix.
Is RC4 secure? For this use-case, yes. For TLS, most certainly not. Even today there are
many plausible attacks against RC4 in the TLS context, but none of them apply to this
static-data world.
I was actually hoping to be able to fit RC4-drop-N in, which is
probably secure in many more contexts, but I couldn't get the byte count down to the
(tweet-derived) target. I guess this makes for a reasonable golf competition...
Development notes:
- dosbox is pretty annoying, but so is cmd. The dosbox debugger is cool, but there
doesn't seem to be any current documentation on it. That Forum Post is pretty wrong.
- dosbox doesn't support pipes or <input redirection, so I couldn't debug with binary
files, which is one of the reasons it doesn't work.
- I have no idea what the actual semantics of the input interrupts are, all the useful
documentation seems to have been lost to history, or was commercial (and/or paper)
in the first place.
- Everything fits in three 256-byte blocks, so the bh register == block number, and
there's no use of memory segmentation (WOOO).
  - block 0: the PSP, which I couldn't overwrite as it has the key in (as the command-line argument).
  - block 1: the code segment
  - block 2: the 256-byte state for RC4.
- After the key setup, the bh is left at 2 forever. cl and ch are used for the i and j
state parts in RC4.
Update:
- A number of people pointed me at
Odzhan's RC4 implementation in normal x86(_64)
which shows a much better understanding of actual assembly programming. For example,
their "swap" implementation is amazing compared to mine.
- Some people asked how much hacking it took to get the size down. It took about six hours,
but it was great. I love golf competitions, even if they're just against myself.
- There was some concern that people might actually accidentally run or incorporate the code
without understanding the flaws, as there isn't a big enough warning on this page, or on
github. These people additionally didn't read any of the rest of the article, where it
is explained that it's broken, 16-bit x86 assembly which you actually can't run anywhere,
even if you wanted to.
2015-10-04
Four years ago, I was working on a project that would require users to connect
to it over ssh. At the time, asking typical users (even developers!) to send
you an ssh public key was a bit of an involved operation.
The situation hasn't improved much.
For example, github suggests generating the keys manually,
then using Windows' clip.exe or apt-get install xclip && xclip (from the
command line) to get the key into the clipboard, then pasting it into their
web-interface. Ugh.
The situation is a little better for PuTTYTray: it has built-in support for SSH agent,
and a reasonably streamlined way to get keys into the clipboard, but then we're still
using the clipboard-into-the-web-interface story. This was
written in 2013-08, two years too late (although I'm sure the author could have been
convinced to move the development forward).
For this project, I came up with a better way.
I realised I could simply ask the new user to ssh in, and capture their keys. To
distinguish concurrent users, I could issue them a fake username, and ask them to
ssh account-setup-for-USERNAME@my.service.com. When they do, I can capture their
keys and automatically associate them with their account. No platform-specific
commands, no unnecessary messing around in the terminal.
This is possible due to how ssh authentication works:
- Client sends the username.
- Server replies: Sure, you can try logging in with keys, or with passwords if you want.
- Client sends Public Key 1.
- Server replies: Nope, but you can try other keys or passwords.
- Client sends Public Key 2.
- Server replies: ...
That is, the standard ssh client will just send you all the user's public keys.
Note that this isn't (normally) considered a security problem; the keys are public,
after all, and the server isn't leaking any information by saying "nope".
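The server side of that exchange can be modelled in a few lines. This is a toy simulation in Python, not the code from the project: the KeyCapture class, its method names, and the key strings are all hypothetical, and it skips the signature check a real server performs to prove the client holds the private key:

```python
class KeyCapture:
    """Toy model of a server that learns keys from a one-off setup username."""

    def __init__(self):
        self.pending = {}     # one-off setup username -> real account name
        self.authorised = {}  # account name -> set of accepted public keys

    def invite(self, account, setup_username):
        """Record that `setup_username` may register a key for `account`."""
        self.pending[setup_username] = account

    def offer_key(self, username, public_key):
        """Called once per key the client offers; True means 'accepted'."""
        if username in self.pending:
            # First key offered under a setup username is captured.
            account = self.pending.pop(username)
            self.authorised.setdefault(account, set()).add(public_key)
            return True
        return public_key in self.authorised.get(username, set())


server = KeyCapture()
print(server.offer_key("john", "KEY-1"))         # False: nothing registered yet
server.invite("john", "setup-token")
print(server.offer_key("setup-token", "KEY-1"))  # True: key captured for john
print(server.offer_key("john", "KEY-2"))         # False: "nope, try another"
print(server.offer_key("john", "KEY-1"))         # True: normal login
```

Popping the setup username after use means it only works once, which is one plausible way to stop a second person reusing the same invitation.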
As I was already running a custom SSH server which practically required you to implement
authentication yourself anyway, it was a simple step to add key capture to the account
setup procedure. I've uploaded a stripped down version to github if you want to see
how it works. For example:
Start the server:
server% git clone https://github.com/FauxFaux/ssh-key-capture.git
server% cd ssh-key-capture
server% ./gradlew -q run
The user can try to log in, but gets rejected (this isn't required):
john% ssh -p 9422 john@localhost
Permission denied (publickey).
Server logs from the (unnecessary) failed authentication:
KeyCapture - john trying to authenticate with RSA MIIBIjANBg...
KeyCapture - john trying to authenticate with EC MFkwEwYHKoZ...
Tell the server that john has signed up, or wants to add keys, or...
Enter a new user name, or blank to exit: john
Ask 'john' to ssh to '18a74d9f-5c7d-41d0-8369-bae4aaba9867@...'
John now adds his keys, and hence can log in:
john% ssh -p 9422 18a74d9f-5c7d-41d0-8369-bae4aaba9867@localhost
Added successfully! You can now log-in normally.
Connection to localhost closed.
john% ssh -p 9422 john@localhost
Hi! You've successfully authenticated as john
Bye!
Connection to localhost closed.
Future work:
- It could capture all of the user's keys (it currently just captures the first).
- More meaningful behaviour after the first authentication, or during the admin part of the setup?
- Some way to do this on top of OpenSSH, or other tools people actually run in the wild. PAM?
Update: There was some decent discussion on reddit's /r/netsec about this post.
2015-10-01
ansible-ghetto-json is an ansible module for making quick edits to JSON files.
Ansible has great built-in support for ini files,
but a number of more modern applications are using JSON for config files.
ghetto_json lets you make some types of edits to JSON files, and remains simple enough that
it's hopefully easier just to extend than to switch to a different module, and you won't feel
too guilty just copy-pasting it into your codebase.
More details are in its README, which you can view on the above github link.
It offers an interesting opportunity to think about type conversion:
JSON actually supports more types than you would normally think of;
ints, floats, nulls, booleans, as well as the trusty string type. Python,
which I still don't think of as a typed language, uses and honours these types
in its JSON module, meaning you have to do conversion.
And, if it explicitly supports null, how do you do removals? I made up a new
keyword, unset, which removes the key. Pretty ghetto.
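Here's roughly what that conversion problem looks like in Python. This is a sketch under my own assumptions, not the module's actual code: the convert and apply_edits names, and the exact rules for which strings become which types, are made up for illustration. Only the unset keyword comes from the description above:

```python
import json


def convert(raw: str):
    """Guess which JSON type a plain string was meant to be."""
    if raw == "null":
        return None
    if raw in ("true", "false"):
        return raw == "true"
    # Try the numeric types in order; note float() also accepts 'nan' and 'inf'.
    for parse in (int, float):
        try:
            return parse(raw)
        except ValueError:
            pass
    return raw


def apply_edits(document: dict, edits: dict) -> dict:
    """Apply string-valued edits; the special value 'unset' removes the key."""
    for key, raw in edits.items():
        if raw == "unset":
            document.pop(key, None)
        else:
            document[key] = convert(raw)
    return document


config = {"port": 80, "debug": True, "name": "old"}
apply_edits(config, {"port": "8080", "debug": "unset", "name": "null", "ratio": "0.5"})
print(json.dumps(config, sort_keys=True))
# {"name": null, "port": 8080, "ratio": 0.5}
```

The obvious cost of guessing is that you can no longer store the literal strings "null", "true" or "unset" as values.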
2015-05-07
lxc comes with a tool named lxc-autostart which can help you start
your containers at boot. All you have to do is set lxc.start.auto = 1 in the config file
and it will start your containers for you... if you're running your containers as root.
For convenience and security, I'm not running my containers as root. Normally, if I wanted to start
something on boot, as a limited user (or possibly as a service), I'd use the cron @reboot hack:
$ crontab -l
@reboot /usr/bin/lxc-autostart
This, however, fails for lxc-autostart (and for lxc-start, for the same reason): cron runs your
command in a bizarre environment which, importantly, doesn't have the user's cgroups set up properly.
These are set up somewhere scary (pam?), and cron apparently doesn't do a proper log-in for your user.
You can observe the failure with some:
* * * * * cat /proc/self/cgroup
...which will show you have junk cgroups, which makes lxc-start angry with terrible, terrible errors:
cgmanager[1041]: cgmanager:do_create_main: pid 5679 (uid 1000 gid 1000) may not create under /run/cgmanager/fs/blkio/system.slice/autostart.service
cgmanager[1041]: cgmanager:do_create_main: pid 5679 (uid 1000 gid 1000) may not create under /run/cgmanager/fs/cpu/system.slice/autostart.service
...
cgmanager[1041]: cgmanager: Invalid path /run/cgmanager/fs/blkio/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/blkio/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager: Invalid path /run/cgmanager/fs/cpu/system.slice/autostart.service/lxc/utopic
cgmanager[1041]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/cpu/system.slice/autostart.service/lxc/utopic
...
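If you'd rather check this from a script than eyeball the cron mail, the lines in /proc/self/cgroup have the form hierarchy-ID:controller-list:path. A small Python sketch for picking them apart follows; the sample text is made up to resemble the paths in the errors above, and treating anything under /system.slice as "not a user session" is my assumption, not a rule:

```python
def parse_cgroups(text):
    """Split /proc/<pid>/cgroup content into (id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Only split on the first two colons; the path may contain more.
        hierarchy, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy), names, path))
    return entries


# Hypothetical sample; on a real machine, read open("/proc/self/cgroup").read().
SAMPLE = """\
4:blkio:/system.slice/autostart.service
3:cpu,cpuacct:/system.slice/autostart.service
1:name=systemd:/system.slice/autostart.service
"""

entries = parse_cgroups(SAMPLE)
suspicious = [path for _, _, path in entries if path.startswith("/system.slice")]
print(entries[1])       # (3, ['cpu', 'cpuacct'], '/system.slice/autostart.service')
print(len(suspicious))  # 3
```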
The easiest way for a limited user to solve this is, as far as I'm aware, ssh to localhost.
Limited users can't configure sudo to be passwordless, and can't su without entering their
password on a proper terminal, meaning neither works from cron.
$ ssh-keygen -t ed25519
$ ssh-copy-id localhost
$ crontab -l
@reboot /usr/bin/ssh me@localhost /usr/bin/lxc-autostart
This was working great, until the Ubuntu Vivid upgrade, which has brought the wonders of systemd.
Under systemd, the @reboot entries are sometimes processed before sshd has started, so the above
massive hack fails.
$ crontab -l
@reboot sleep 10 && /usr/bin/ssh ...
NO. NO NO NO.
Under systemd, we can write a simple service file that does the auto-start. systemd understands cgroups,
so if you ask it to run a service as a User=, it'll run the service in the user's cgroup, right?
Nope: It runs everything in the service cgroup. Fair enough.
However, as the service is started as root, we can use su.
A systemd service: /etc/systemd/system/autostart.service:
[Unit]
Description=lxc-autostart
After=network.target
[Install]
WantedBy=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/su me -c '/usr/bin/lxc-autostart'
And install it:
$ sudo systemctl enable autostart.service
This seems to work. I'm not sure if the After= is necessary;
network.target is a complex beast
but I still feel safer waiting for something to be alive.
2015-05-04
A couple of people mocked my use of shell in my last post, so I thought
I'd write up a couple of problems I solved this week, to allow me to laugh at the solutions in the
future, and, more importantly, for you to laugh at them now.
White-space has been inserted into the examples for bloggability, but all these are what I actually wrote,
as one-liners.
Downloading matching files
I've got the http:// URL of an indexed directory,
which contains a load of large files. I want to download all the files with -perl and .html
in their names.
First thought: wget probably has a flag for this:
wget -np --mirror --accept='*-perl*.html' https://example.com/foo/
This actually produces the right output, but... it downloads all the files, then deletes the ones that it
doesn't want to keep. My guess is that it's sucking links out of the intermediate files. Maybe this could
be fixed by limiting the recursion depth, instead of using mirror? This isn't what I did, however:
curl https://example.com/foo/ | \
cut -d'"' -f 8 | \
fgrep -- -perl | \
sed 's,^,https://example.com/foo/,' | \
wget -i-
Yep, that works. Very unix-y solution, every tool only doing one thing.
Breaks horribly if the input is wrong. Fast, as it only looks at files it needs, and wget manages a
connection pool for you (whereas for u in $urls; do wget $u; done wouldn't).
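For comparison, here's a less fragile take on the link extraction in Python, parsing the HTML instead of counting quote characters. It's a sketch rather than what I ran: the LinkCollector and matching_urls names and the sample index page are invented, and unlike the pipeline it also insists on the .html suffix:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for name, v in attrs if name == "href" and v)


def matching_urls(base_url, index_html):
    """Absolute URLs of linked files with -perl in the name and a .html suffix."""
    collector = LinkCollector()
    collector.feed(index_html)
    return [urljoin(base_url, href) for href in collector.links
            if "-perl" in href and href.endswith(".html")]


# Made-up directory listing, for illustration only.
sample = ('<a href="../">Parent</a> <a href="libfoo-perl_1.0.html">x</a> '
          '<a href="libbar_2.0.html">y</a> <a href="libbaz-perl.tar.gz">z</a>')
print(matching_urls("https://example.com/foo/", sample))
# ['https://example.com/foo/libfoo-perl_1.0.html']
```

The resulting list could still be handed to wget -i- to keep the connection reuse.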
Line counts in a git repository
I've got a checkout of a git repository, and I want to know roughly how many lines of production code
there are in it. It's a Java codebase, so most production code is in */src/main or client/src.
find -maxdepth 3 -name main -o -name client | xargs sloccount
Why didn't I use -exec here? Probably paranoid of -exec with -o. Correct solution:
find -maxdepth 3 \( -name main -o -name client \) -exec sloccount {} +
Two massive problems, anyway:
- There's a load of generated or downloaded code in those directories; build output, downloaded modules, ...
- sloccount really hates taking multiple directories as input, especially when they have the
same (base)name, and just ignores some of them.
Next up, let's use git ls-files to skip ignored files:
git ls-files | egrep 'src/main|client/src' | xargs cat | wc -l
Barfs as there's white-space in the file names, which I wasn't expecting.
Could probably work around it with some:
... | while read -r line; do cat "$line"; done | wc -l
...but we may as well fix the real problem. git ls-files has -z for null-terminated output, and
xargs has -0 for null-terminated input. grep has -Z for null-terminated output... but I couldn't
find anything that would make it take null-terminated entries as input. Sigh.
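The filtering step itself is easy once you're outside the shell. A Python sketch of filtering NUL-terminated names, with an invented select_paths helper and made-up sample data standing in for real git ls-files -z output:

```python
import re


def select_paths(raw: bytes, pattern: str = r"src/main|client/src"):
    """Filter NUL-terminated file names, keeping those that match `pattern`."""
    wanted = re.compile(pattern)
    # Split on NUL rather than newline or space, so odd file names survive.
    names = [name.decode("utf-8", "surrogateescape")
             for name in raw.split(b"\0") if name]
    return [name for name in names if wanted.search(name)]


# Hypothetical listing containing a file name with a space in it.
sample = b"server/src/main/My File.java\0client/src/app.js\0README.md\0"
print(select_paths(sample))
# ['server/src/main/My File.java', 'client/src/app.js']
```

For real input, something like subprocess.run(["git", "ls-files", "-z"], capture_output=True, check=True).stdout would supply the bytes.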
Wait, it's git. We can just clone the repo. (cdt cds to a new temporary directory)
cdt; git clone ~/code/repo .
...then we can use find again:
find \( \
-name \*.java -iwholename '*src/main*' \
-o \
-name \*.js -iwholename '*client*' \
\) -exec cat {} + | \
grep -v '^$' | \
egrep -v '^[[:space:]]*//' | \
wc -l
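The counting rule in that pipeline (drop blank lines, drop lines that are only a // comment) is simple enough to restate in Python. A sketch, with a made-up count_code_lines name; one deliberate difference is that it also skips whitespace-only lines, which grep -v '^$' keeps:

```python
def count_code_lines(text: str) -> int:
    """Count lines that are neither blank nor a whole-line // comment."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("//")
    )


sample = "package a;\n\n  // comment\nint x; // trailing\n\t\n}"
print(count_code_lines(sample))  # 3
```

Like the shell version, it knows nothing about /* block comments */, so the result is still only a rough figure.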
Close enough to the expected numbers! Now, let's backfill graphite:
(for rev in $(g rev-list --all | sed -n '1~50p'); do
g co -q $rev
echo code.production $(!!) $(g show --format=%at | head -n1)
done) | grep -v ' 0 ' | nc localhost 2444
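What goes down the pipe to nc is graphite's plaintext format: one "metric value timestamp" line per data point. A Python sketch of building those lines, with an invented graphite_lines helper and made-up sample numbers; the metric name is the one used above:

```python
def graphite_lines(samples, metric="code.production"):
    """Format (count, unix_timestamp) pairs as graphite plaintext lines."""
    return "".join(
        f"{metric} {count} {timestamp}\n"
        for count, timestamp in samples
        if count != 0  # same idea as the grep -v ' 0 ' filter
    )


# Invented data points: (line count, commit time as a unix timestamp).
payload = graphite_lines([(1200, 1430000000), (0, 1430500000), (1350, 1430700000)])
print(payload, end="")
# code.production 1200 1430000000
# code.production 1350 1430700000
```

Sending it is then a matter of opening a TCP socket to the listener (localhost port 2444 in my setup) and writing payload.encode().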
Woo!