Finding even more untagged posts on Tumblr

When I wrote my original script for finding untagged Tumblr posts, I expected it to be a one-off. I never expected to write a dedicated site, or for that site to become the most popular thing I’ve ever made. I’ve been flattered by some of the emails and tweets I’ve received about the site.

But I’ve also been letting it stagnate. I’ve been putting off a steady trickle of bug reports and feature requests, and the site was getting rough around the edges. On Monday, I inadvertently broke the site completely with some changes on the blog, so I decided that it was finally time to fix it.

This is a fairly major update, which I’m calling “v2.0”. It’s a ground-up rewrite that makes the site much simpler and easier to maintain.

Along with a fresh coat of paint and lots of bug fixes, there are a few new features:

  • A progress meter, so you can see how many posts have been checked (and how many are still to go)
  • Better filtering, allowing you to include/exclude reblogs and filter on post type
  • An improved mobile design

As always, the URL is and the code is on GitHub.

Feedback, bug reports, etc. can be sent via email or on Twitter.

Useful Bash features: exit traps

I recently discovered exit traps, and I think they’re a really neat feature of Bash.

Exit traps are a resilient way to handle cleanup code in Bash scripts. You “trap” a function on the special EXIT pseudo-signal, and then when the script exits, that function gets called – even if the script died unexpectedly. That means the cleanup work still runs when something goes wrong (the main exception being a script that is killed outright with SIGKILL, which no trap can catch).

The linked article is a good introduction to traps. It explains how they work, and has several examples of how traps might be used. Here’s another example from my recent work:

As part of my day job, I’ve been doing some tests to monitor the performance impact of our code. As the test runs, I have a shell script running alongside which records the load on the system.

At the end of the test, I want to save a copy of the logs, so that I can correlate events with the load on the system. I was doing it by hand, but I would often forget. Then I found exit traps, and the solution was simple. I added a few lines to my monitoring script:

function save_logs {
    # Keep a copy of the Felix log, named after the current test
    if [[ -f /var/log/calico/felix.log ]]; then
        mv /var/log/calico/felix.log "${TESTID}_felix.log"
    fi
}

trap save_logs EXIT

and now I get a set of saved logs after every test. It’s reliable, robust, and one less thing for me to think about.

If you write any shell scripts yourself, I suggest you look at exit traps – I think you’ll find them very useful.

Persistent IPython notebooks in Windows

I’ve been using IPython for about six months, and I’ve grown to love the web-based notebook interface. It’s become my go-to environment when I want to do simple calculations, or test a new idea in Python. It’s also a lovely environment for literate programming, and I wish I’d had it for my university coursework.

On my Mac, I’ve been using some scripts by Nathan Grigg to keep my IPython notebook server running continuously, and to give it a nicer hostname. Over the weekend, I realised that 1) IPython would be really useful at work, and 2) since my work computer is a PC, I need to adapt his scripts to work with Windows.

This post explains how to get a persistent IPython notebook on Windows. The ideas are based on Nathan’s post, but the implementation is a little different.

Continue reading →

Safer file copying in Python

Using scripting and the command line can be a double-edged sword. They’re very powerful tools, but they make it easy to shoot yourself in the foot. They assume that you know what you’re doing, even if it might be dangerous.

An example I often run into is moving or copying files. If you try to copy over an existing file, these tools will often scribble over the old file without any warning. For example, here’s the description for shutil.copyfile:

shutil.copyfile(src, dst, *, follow_symlinks=True)

Copy the contents of the file named src to a file named dst and return dst. […] If dst already exists, it will be replaced.

The user is responsible for checking that the command won’t be destructive, not the utility.

Modern GUIs are a bit friendlier. For example, in the OS X Finder, if you try to copy over an existing file, you get this warning:

If you choose “Keep Both”, then the Finder leaves the original file intact, and picks a new name for your copy. (In this case, it would use myfile 2.txt, myfile 3.txt, and so on.) I use this option when I want to be cautious. When I’m working with something precious, like my photo collection, I’d rather create too many copies than risk accidentally deleting something.

I wanted to replicate that functionality in Python, as a drop-in for the copyfile() and move() functions of the shutil module. I couldn’t find existing code to do this, so I wrote my own script instead.
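To make the “Keep Both” idea concrete, here is a minimal sketch of how such a wrapper might look. It is not the script from the full post; the names safe_copyfile, safe_move and _unused_path are made up for illustration, and it assumes dst is always a file path rather than a directory.

```python
import os
import shutil


def _unused_path(dst):
    """Return dst if nothing exists there, else 'name 2.ext', 'name 3.ext', ..."""
    if not os.path.exists(dst):
        return dst
    base, ext = os.path.splitext(dst)
    counter = 2
    while os.path.exists(f"{base} {counter}{ext}"):
        counter += 1
    return f"{base} {counter}{ext}"


def safe_copyfile(src, dst):
    """Copy src to dst, choosing a fresh name instead of overwriting."""
    target = _unused_path(dst)
    shutil.copyfile(src, target)
    return target


def safe_move(src, dst):
    """Move src to dst, choosing a fresh name instead of overwriting.

    Assumption: dst names a file, not an existing directory.
    """
    target = _unused_path(dst)
    shutil.move(src, target)
    return target
```

Both functions return the path that was actually used, so the caller can tell whether a new name was picked. Note that there is a small window between checking that the name is free and writing to it, so this sketch is not safe if several processes write to the same folder at once.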

Continue reading →

Useful Git features: a per-clone exclude file

With Git, you can define a list of rules to tell it which files should never be checked in as part of a commit. These “ignore rules” could include files which are auto-generated, compiled from source or temporary – anything you don’t need to keep around.

Until this morning, I only knew about two places where I could store these rules:

  1. Globally: use ~/.gitignore_global (or whichever file your core.excludesfile setting points to).

    The rules in this file apply to every Git repo on your computer. It’s useful, but not very interesting – mine is just a list of file types that I (almost) never want to check in to Git.

    This isn’t attached to a particular repo, so I use Dropbox to sync it between my computers.

  2. Per-repo: use .gitignore.

    The rules in this file can be tailored to fit one repo. You can check in this file alongside your code, so the same rules apply whenever the repo gets checked out.

Those two files cover about 95% of my use cases. But sometimes I write rules that I don’t want to be checked in or globally applied: locally-generated files that I’m unlikely to create again.

Doing a bit of Googling, I stumbled upon a solution:

  3. Per-clone: use .git/info/exclude.

    This file contains a list of ignore rules, but it doesn’t get checked in with the repo. It’s exactly what I was looking for.
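The exclude file uses the same one-pattern-per-line format as .gitignore, so you can edit it by hand. If you would rather script it, a small Python sketch might look like this. The function name add_local_ignore is hypothetical, and it assumes repo_root is the top of an ordinary clone where .git is a directory (not a worktree or submodule, where .git is a file).

```python
from pathlib import Path


def add_local_ignore(repo_root, pattern):
    """Append pattern to .git/info/exclude; return False if already present."""
    git_dir = Path(repo_root) / ".git"
    if not git_dir.is_dir():
        # Assumption: only plain clones are supported in this sketch
        raise ValueError(f"{repo_root} does not look like a Git clone")

    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(exist_ok=True)

    text = exclude.read_text() if exclude.exists() else ""
    if pattern in text.splitlines():
        return False

    # Make sure the new rule starts on its own line
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude.open("a") as f:
        f.write(f"{prefix}{pattern}\n")
    return True
```

For example, add_local_ignore(".", "scratch/") would ignore a scratch folder in the current clone only, without touching .gitignore or the global rules.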

Despite using Git for almost five years, I’d never come across this file. It makes sense that it exists – it fills a natural gap left by the first two files – but I never knew it was there. It just goes to show: there’s always more to learn.

Cloning GitHub’s Contributions chart

I’m a total sucker for gamification. If you put a pretty chart in front of me to measure my progress, I fall for it every time. One place I’ve noticed this recently is with the Contributions graph on my user page on GitHub. This is a year-long calendar that shows a heatmap of all your activity. Here’s what mine looks like:

You can see that my activity started to pick up around March, which is when I started using GitHub at work. I was seeing this chart almost every day, and I began feeling guilty about the amount of blank space. So I’ve been trying to be more active – whether that’s on my own repos, or pull requests against other people’s work – and I think the change is noticeable.

The GitHub chart has been a particularly effective motivator. I think it’s the long tail that does it: if I don’t do something now, I’ll be looking at the blank space for another year. So it got me wondering, could I use this design for something else?

Continue reading →

The Secret Lives of Data, a visualisation of the Raft algorithm

One of the big problems in computer science is distributed consensus. This is the problem of getting a set of nodes in a network (or distributed system) to agree on something: perhaps a value, an action, or a record of history. Some of the nodes in this system will be faulty, and drop messages, so you need to be able to work around that as well. This turns out to be very difficult to solve.

There are two commonly-used algorithms for solving this problem: Paxos and Raft. Both have been mathematically proven to solve the consensus problem.

As part of my day job (Project Calico), we use etcd as a distributed database layer, which implements the Raft algorithm for consensus. Since Raft was explicitly designed to be understandable (as opposed to Paxos’s reputation for inscrutability), I thought it was worth trying to understand how it works.

I came across this visualisation, which explains how the Raft algorithm works. It takes you from the basics of the consensus problem, through the design of the Raft algorithm, and on to how it copes when the network starts to fail. I found it really interesting, and I think it’s well worth a read.

Previewing notes from nvALT in Marked

I find nvALT to be an indispensable note-taking application. I have thousands of plain-text notes, but it’s still incredibly fast and easy to look up a specific note. I also lean heavily on Marked for previewing notes – particularly complex notes with lots of links and images.

Until recently, I’ve been using a Keyboard Maestro macro from Patrick Welker to take a note from nvALT and preview it in Marked. The AppleScript in the macro takes the title of a note, converts it to a filename, and passes the filename to Marked.

That works in about 95% of cases, but I’ve encountered two problems:

  1. OS X does strange things with file separators (the colon and the slash). Having a colon or slash in the title of a note means that it isn’t picked up by this script.
  2. The script assumes that all my notes have the same extension: .md. This is almost always the case, but sometimes notes appear with the .txt or .mdown extension. I could play whack-a-mole with file extensions, but it’s easier to have the script do it for me.

I’ve written a Python script to replace the AppleScript, which seems to solve both of these problems. I’ve been using it for the last few weeks, and now I’d like to share it.
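As a rough illustration of the approach (not the script from the full post), the lookup might be sketched like this. The names find_note and NOTE_EXTENSIONS are invented for the example, and treating the colon and the slash as interchangeable when comparing names is an assumption about how the separator problem shows up on disk.

```python
import os

# Extensions mentioned above; an assumption that no others are in use
NOTE_EXTENSIONS = {".md", ".txt", ".mdown"}


def _normalise(name):
    # Assumption: a ':' on disk may stand in for a '/' in the title (or vice
    # versa), so compare names with both mapped to the same character
    return name.replace("/", ":")


def find_note(notes_dir, title):
    """Return the path of the note whose name matches title, or None."""
    wanted = _normalise(title)
    for filename in sorted(os.listdir(notes_dir)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() in NOTE_EXTENSIONS and _normalise(stem) == wanted:
            return os.path.join(notes_dir, filename)
    return None
```

Scanning the folder instead of building a filename from the title means the extension never has to be guessed, and the comparison step is the one place where separator quirks need handling.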

Continue reading →

One-step paste in the iOS Simulator

On the latest episode of The Talk Show, John Gruber lamented the two-step paste process into the iOS Simulator (about fifteen minutes in). It goes like this:

  1. In OS X, you copy some text, which gets saved to the OS X clipboard.
  2. You switch to the iOS Simulator, and paste, which saves it to the iOS clipboard.
  3. You long click in iOS within the Simulator, and paste again (this simulates the iOS paste action), which causes the text to be pasted in iOS.

Having to paste twice is mildly annoying, but it’s easy to fix. I have a Keyboard Maestro macro that lets me paste directly into the copy of iOS running in the Simulator.

The macro itself is very simple:

It’s just a tweaked version of Gabe Weatherhead’s Paste as Typed Text macro, which is useful for pasting passwords into fields where pasting is disabled. I have it inside a macro group, which is configured to only be enabled in the iOS Simulator.

For it to work, you need to have the hardware keyboard connected in the iOS Simulator (Hardware > Keyboard > Connect Hardware Keyboard). This is what allows you to type in the iOS Simulator with your regular keyboard, and lets Keyboard Maestro pass in keystrokes.

There are a few other notes:

  • I have this mapped to ⌘V. This is the same as the default Paste shortcut in the Simulator. In my usage, the conflict hasn’t caused any problems, but if it does, you can override the default Paste shortcut in the “Shortcuts” tab of the Keyboard Preference Pane.

  • Occasionally I find apps in OS X that break the Paste as Typed Text. This macro has always worked for me in the iOS Simulator, but that’s not to say it will always work everywhere.

  • This will only work for text, not images or files.

  • I’ve only tested this with the iOS Simulator that comes with Xcode 6. Apple has a habit of changing the Simulator between versions of Xcode, so it might not work if you use a different version. (Since the macro bypasses the Simulator, I’d expect it to keep working, but no guarantees.)

Electoral reform

A common criticism of the UK voting system (First Past the Post or FPTP) is that the share of the popular vote and the share of Parliamentary seats are often wildly different. This was true again in yesterday’s election.

I made this chart, which shows the difference between a party’s share of the seats, and their share of the popular vote. It’s not great:

[Chart: difference between % seats and % votes, by party]

Seat and vote counts from the BBC.
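For anyone who wants to reproduce the calculation behind the chart, it is just each party’s percentage of seats minus its percentage of votes. Here is a small Python sketch; the function name seat_vote_gap is my own, and the counts in the example are made up for illustration, not the real results.

```python
def seat_vote_gap(results):
    """Map each party to (% of seats) minus (% of votes).

    results: {party: (seats, votes)}
    """
    total_seats = sum(seats for seats, _ in results.values())
    total_votes = sum(votes for _, votes in results.values())
    return {
        party: 100 * seats / total_seats - 100 * votes / total_votes
        for party, (seats, votes) in results.items()
    }


# Invented counts, purely to show the shape of the input
example = {
    "Party A": (60, 450),
    "Party B": (35, 350),
    "Party C": (5, 200),
}
gaps = seat_vote_gap(example)
```

A positive number means a party won a bigger share of seats than of votes; a negative number means the reverse.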

Both the major parties did well from FPTP, as is usually the case. The other major beneficiary was the SNP, which isn’t surprising. The smaller parties (Greens, UKIP and Lib Dems) often suffer from having a voting base that’s spread across the country, but not concentrated in many areas. That makes it harder to win in individual seats. Since the SNP only run candidates in Scotland, I’m sure that helped them here.

The disparity between the large and the small parties is not new, nor is the disparity between votes and seats. But it is disappointing.

In 2011, I supported a change to the Alternative Vote (although I was too young to actually vote). I still think it would have been a good idea, but I worry that the failure of that campaign has shut out electoral reform for a generation. And as long as one of the two main parties is in government, they have no incentive to change the system, because it treats them so well.

I strongly believe that we need to scrap FPTP, and bring in a new, more proportional voting system. It will make our politics fairer, give smaller parties a proper voice, and go some way to correcting these disparities. But I’m at a total loss to see how we might do that.
