“The document could not be saved”

I try not to make unreasonable complaints about the quality of software. I write software for a living, and writing bug-free software is hard. A trawl through the code I’ve written would reveal many embarrassing or annoying bugs. People in glass houses, etc.

But I do have some basic expectations of the software I use. For example, I expect that my text editor should be able to open, edit, and save files.

A screenshot of a TextMate window showing a dialog “The document ‘hello.txt’ could not be saved. Please check Console output for reason.”

Two days ago, TextMate on my laptop decided it didn’t fancy saving files any more. Actually, it decided that ⌘+S should do nothing, and trying to click “Save” in the File menu would throw this error dialog. A trawl of Google suggests that I’m the only person who’s ever hit this error. If so, here’s a quick write-up of what I tried for the next person to run into it.



Hiding the YouTube search bar

This morning, I got an email from Sam, asking if I had a way to cover up the persistent YouTube search bar.

Three years ago, I wrote a bookmarklet for cleaning up the worst of the Google Maps interface, and we can adapt this to clean up YouTube as well. Unlike that post, this is one I’m likely to use myself. (Writing the Maps bookmarklet was a fun exercise in JavaScript, but I almost always use Google Maps on my phone, so I was never as annoyed by the clutter on the desktop version.)

If we do “Inspect Element” on a YouTube page, we can find the element that contains this search box: <div id="yt-masthead-container">. So we want to toggle the visibility of this element. Since it’s only one item, we can write a much smaller JavaScript snippet for toggling the visibility:

var search_bar = document.getElementById("yt-masthead-container");

// Check whether the bar is currently hidden
var display = window.getComputedStyle(search_bar).getPropertyValue("display");

// Toggle: hide the bar if it's visible, or clear our inline style to show it again
void(search_bar.style.display = (display === "none" ? "" : "none"));

To use this code, drag this link to your bookmarks bar:

Toggle the YouTube search bar

Simply click it once to make the bar disappear, and click it again to bring it all back.

Something that wasn’t in my original Google Maps bookmarklet is that void() call. It turns out that if a bookmarklet evaluates to a value, the browser is supposed to replace the current page with that value. That strikes me as bizarre, but it’s what Chrome does, so it broke the page. (Safari doesn’t; I’m not sure if that’s a bug or a feature.) Despite the parentheses, void is an operator rather than a function: it evaluates the expression and discards the result, so the bookmarklet returns undefined and the page is left alone.

This isn’t perfect – content below the bar doesn’t reflow to take up the available space – but the bar no longer hangs over content as you scroll. I think I’ll find this useful when I’m pressed for space on small screens. It’s a bit more screen real-estate I can reclaim. Thanks for the idea, Sam!


Treat regular expressions as code, not magic

Regular expressions (or regexes) have a reputation for being unreadable. They provide a very powerful way to manipulate text, in a very compact syntax, but it can be tricky to work out what they’re doing. If you don’t write them carefully, you can end up with an unmaintainable monstrosity.

Some regexes are just pathological[1], but the vast majority are more tractable. What matters is how they’re written. It’s not difficult to write regexes that are easy to read – and that makes them easy to edit, maintain, and test. This post has a few of my tips for making regexes that are more readable.

Here’s a non-trivial regex that we’d like to read:

MYSTERY = r'^v?([0-9]+)(\.([0-9]+)(\.([0-9]+[a-z]))?)?$'

What’s it trying to parse? Let’s break it down.

Tip 1: Split your regex over multiple lines

A common code smell is “clever” one-liners. Lots of things happen on a single line, which makes it easy to get confused and make mistakes. Since disk space is rarely at a premium (at least, not any more), it’s better to break these up across multiple lines, into simpler, more understandable statements.

Regexes are an extreme version of clever one-liners. Splitting a regex over multiple lines can highlight the natural groups, and make it easier to parse. Here’s what our regex looks like, with some newlines and indentation:

MYSTERY = (
    r'^v?'
    r'([0-9]+)'
    r'('
        r'\.([0-9]+)'
        r'('
            r'\.([0-9]+[a-z])'
        r')?'
    r')?$'
)

This is the same string, but broken into small fragments. Each fragment is much simpler than the whole, and you can start to understand what the regex is doing by analysing each fragment individually. And just as whitespace and indentation are helpful in non-regex code, here they help to convey the structure – different groups are indented to different levels.
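A quick way to convince yourself that splitting the regex doesn’t change it: Python joins adjacent string literals at compile time, so the parenthesised fragments produce exactly the same string as the one-liner. A minimal check (the names ONE_LINE and SPLIT are just for illustration):

```python
ONE_LINE = r'^v?([0-9]+)(\.([0-9]+)(\.([0-9]+[a-z]))?)?$'

# Adjacent string literals inside the parentheses are concatenated by
# the compiler, so SPLIT is a single string, not a tuple.
SPLIT = (
    r'^v?'
    r'([0-9]+)'
    r'('
        r'\.([0-9]+)'
        r'('
            r'\.([0-9]+[a-z])'
        r')?'
    r')?$'
)

print(SPLIT == ONE_LINE)  # True
```

Because it’s the same string, you can make this change to an existing regex without any risk of altering what it matches.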

So now we have some idea of what this regex is matching. But what was it trying to match?

Tip 2: Comment your regexes

Comments are really important for the readability of code. Good comments should explain why the code was written this way – what problem was it trying to solve?

This is helpful for many reasons. It helps us understand what the code is doing and why it makes non-obvious choices, and it helps us spot bugs. If we know what the code was supposed to do, and it does something different, we know there’s a problem. We can’t do that with uncommented code.

Regexes are a form of code, and should be commented as such. I like to have an overall comment that explains the purpose of the regex, as well as individual comments for the broken-down parts of the regex. Here’s what I’d write for our example:

# Regex for matching version strings of the form vXX.YY.ZZa, where
# everything except the major version XX is optional, and the final
# letter can be any character a-z.
#
# Examples: 1, v1.0, v1.0.2, v2.0.3a, 4.0.6b
VERSION_REGEX = (
    r'^v?'                          # optional leading v
    r'([0-9]+)'                     # major version number
    r'('
        r'\.([0-9]+)'               # minor version number
        r'('
            r'\.([0-9]+[a-z]?)'     # micro version number, plus
                                    # optional build character
        r')?'
    r')?$'
)

As I was writing these comments, I actually spotted a mistake in my original regex – I’d forgotten the ? for the optional final character.

With these comments, it’s easy to see exactly what the regex is doing. We can see what it’s trying to match, and jump to the part of the regex which matches a particular component. This makes it easier to do small tweaks, because you can go straight to the fragment which controls the existing behaviour.
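Python also offers a different way to get a similar effect: the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and treats an unescaped # as the start of a comment. That’s a swapped-in technique, not the string-splitting approach above, but here’s a sketch of the same regex (still with capturing groups) written that way:

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a
# comment, so the layout and comments live inside the pattern itself.
VERBOSE_VERSION_REGEX = re.compile(r"""
    ^v?                       # optional leading v
    ([0-9]+)                  # major version number
    (
        \.([0-9]+)            # minor version number
        (
            \.([0-9]+[a-z]?)  # micro version number, plus
                              # optional build character
        )?
    )?$
""", re.VERBOSE)

# Every example from the comment above should match...
for version in ["1", "v1.0", "v1.0.2", "v2.0.3a", "4.0.6b"]:
    assert VERBOSE_VERSION_REGEX.match(version), version

# ...but a trailing dot with no number after it shouldn't.
assert VERBOSE_VERSION_REGEX.match("v2.") is None

print(VERBOSE_VERSION_REGEX.match("v2.0.3b").groups())
# ('2', '.0.3b', '0', '.3b', '3b')
```

The trade-off is that any literal space or # you want to match has to be escaped, which is easy to forget.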

So now we can read the regex. How do we get information out of it?

Tip 3: Use non-capturing groups

The parentheses throughout my regex are groups. These are useful for organising and parsing information from a matching string. In this example:

  • The groups for minor and micro version numbers are followed by a ? – the dot and the associated number are both optional. Putting them both in a group, and making them optional together, means that v2 is a valid match, but v2. isn’t.

  • There’s a group for each component of the version string, so I can get them out later. For example, given v2.0.3b, it can tell us that the major version is 2, the minor version is 0, and the micro version is 3b.

In Python, we can look up the value of these groups with the .groups() method, like so:

>>> import re
>>> m = re.match(VERSION_REGEX, "v2.0.3b")
>>> m.groups()
('2', '.0.3b', '0', '.3b', '3b')

Hmm.

We can see the values we want, but there are a couple of extras. We could just code around them, but it would be better if the regex only captured interesting values.

If you start a group with (?:, it becomes a non-capturing group. We can still use it to organise the regex, but the value isn’t saved.

I’ve changed two groups to be non-capturing in our example:

# Regex for matching version strings of the form vXX.YY.ZZa, where
# everything except the major version XX is optional, and the final
# letter can be any character a-z.
#
# Examples: 1, v1.0, v1.0.2, v2.0.3a, 4.0.6b
NON_CAPTURING_VERSION_REGEX = (
    r'^v?'                          # optional leading v
    r'([0-9]+)'                     # major version number
    r'(?:'
        r'\.([0-9]+)'               # minor version number
        r'(?:'
            r'\.([0-9]+[a-z]?)'     # micro version number, plus
                                    # optional build character
        r')?'
    r')?$'
)

Now when we extract the group values, we’ll only get the components that we’re interested in:

>>> m = re.match(NON_CAPTURING_VERSION_REGEX, "v2.0.3b")
>>> m.groups()
('2', '0', '3b')
>>> m.group(2)
'0'

Now we’ve cut out the noise, and we can access the interesting values of the regex. Let’s go one step further.

Tip 4: Always use named capturing groups

What does m.group(2) mean? It’s not very obvious, unless I have the regex that m was matching against. When reading code, it can be difficult to know what the value of a capturing group means.

And suppose I later change the regex, and insert a new capturing group before one of the existing ones. I now have to renumber anywhere I was getting groups with the old numbering scheme. That’s incredibly fragile.

There’s a reason we use text, not numbers, to name variables in our programs. If a variable has a descriptive name, the code is much easier to read, because we know what the variable “means”. And when we’re writing code, we’re much less likely to get variables confused.

The same logic should apply to regexes.

Many regex parsers now support named capturing groups. You can supply an alternative name for looking up the value of a group. In Python, the syntax is (?P<name>...) – it varies slightly from language to language.

If we add named groups to our expression:

# Regex for matching version strings of the form vXX.YY.ZZa, where
# everything except the major version XX is optional, and the final
# letter can be any character a-z.
#
# Examples: 1, v1.0, v1.0.2, v2.0.3a, 4.0.6b
NAMED_CAPTURING_VERSION_REGEX = (
    r'^v?'                                # optional leading v
    r'(?P<major>[0-9]+)'                  # major version number
    r'(?:'
        r'\.(?P<minor>[0-9]+)'            # minor version number
        r'(?:'
            r'\.(?P<micro>[0-9]+[a-z]?)'  # micro version number, plus
                                          # optional build character
        r')?'
    r')?$'
)

We can now look up the groups by name, or access the entire collection with the groupdict() method:

>>> m = re.match(NAMED_CAPTURING_VERSION_REGEX, "v2.0.3b")
>>> m.groups()
('2', '0', '3b')
>>> m.group('minor')
'0'
>>> m.groupdict()
{'major': '2', 'minor': '0', 'micro': '3b'}

If I look up a group with m.group('minor'), it’s much clearer what it means. And if the underlying regex ever changes, the lookup is fine as-is. Named capturing groups make our code much more explicit and robust.
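To show where named groups pay off, here’s a sketch of a small helper that wraps the regex and hands back the named fields. The function name parse_version, and the choice to convert the major and minor numbers to integers, are my own assumptions for illustration:

```python
import re

NAMED_CAPTURING_VERSION_REGEX = (
    r'^v?'
    r'(?P<major>[0-9]+)'
    r'(?:'
        r'\.(?P<minor>[0-9]+)'
        r'(?:'
            r'\.(?P<micro>[0-9]+[a-z]?)'
        r')?'
    r')?$'
)


def parse_version(version_string):
    """Hypothetical helper: split a version string into (major, minor, micro).

    Raises ValueError if the string doesn't look like a version.
    """
    m = re.match(NAMED_CAPTURING_VERSION_REGEX, version_string)
    if m is None:
        raise ValueError(f"Not a version string: {version_string!r}")

    # Look groups up by name, so adding a new group to the regex
    # won't silently shift these lookups. Unmatched groups are None.
    parts = m.groupdict()
    major = int(parts['major'])
    minor = int(parts['minor']) if parts['minor'] is not None else None
    micro = parts['micro']  # may end with a build letter, so keep it a string
    return major, minor, micro


print(parse_version("v2.0.3b"))  # (2, 0, '3b')
print(parse_version("4"))        # (4, None, None)
```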

Conclusion

The tips I’ve suggested (whitespace and indentation, comments, descriptive names) are useful, but they’re hardly revolutionary. These are all hallmarks of good code.

Regexes are often allowed to bypass the usual metrics of code quality. They sit as black boxes in the middle of a codebase, monolithic strings that look complicated and scary. If you treat regexes as code, rather than magic, you end up breaking them down, and making them more readable. The result is always an improvement.

Regexes don’t have to be scary. Just treat them as another piece of code.


  1. Validating email addresses is a problem that you probably shouldn’t try to solve with regexes. Usually you want to know that the user has access to the address, not just that it’s correctly formatted. To check that, you need to actually send them an email – which ensures it’s valid at the same time. ↩︎


Get images from the iTunes/App/Mac App Stores with Alfred

Several weeks ago, Dr. Drang posted a Python script for getting artwork from the iTunes Store. It uses the iTunes API, which is super handy – I’d never even known it existed. I still rip a fair amount of music from CDs, and having artwork from iTunes is nice. (A script is a better approach than “buy one song from the album, get the artwork”, which is what I used to do.)

Thing is, I do most of my web searches through Alfred. I don’t really want to go out to the command-line for this one task. Wouldn’t it be nice if I could get iTunes artwork through Alfred?

Hmm.

Calling a script is a fairly simple Alfred workflow. I created a keyword input for “ipic”, which requires an argument, and then that argument is passed to the “Run Script” action. That action has a single-line Bash script: calling out to Dr. Drang’s script, passing my input as a command-line argument.

This works fine with Dr. Drang’s original script.

Unfortunately, Alfred passes the entire search as a single string. Although the original script has flags for filtering by content type (e.g. album, film, TV show), you can’t use that filtering in Alfred – the script only ever sees a single argument.

So I tweaked the script to add a special case for Alfred. When Alfred calls the script, it passes an undocumented --alfred flag. Although docopt is nominally parsing the command-line flags, it doesn’t know about this one. Instead, I intercept the flags before docopt sees them, and rearrange them if I detect the script is being called by Alfred:

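import sys

# Alfred sends the whole query as one string, e.g. "album Abbey Road"
if len(sys.argv) > 2 and sys.argv[1] == '--alfred':
    query = sys.argv[2]
    media_type, _, search_term = query.partition(' ')
    if search_term and media_type in ('ios', 'mac', 'album', 'film', 'tv', 'book', 'narration'):
        sys.argv = [sys.argv[0], '--{0}'.format(media_type), search_term]
    else:
        # No recognised media type (or a one-word query): search for all of it
        sys.argv = [sys.argv[0], query]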

By the time docopt is called, the arguments look as if I called the script from the command-line. It never knows the difference.
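One way to make this rewrite easier to test is to pull it into a pure function that takes and returns an argument list, rather than editing sys.argv in place. This is a hypothetical refactor: the function name rewrite_alfred_argv and the script name ipic.py are mine, and the media types are the same ones as above.

```python
MEDIA_TYPES = ('ios', 'mac', 'album', 'film', 'tv', 'book', 'narration')


def rewrite_alfred_argv(argv):
    """Turn ['prog', '--alfred', 'album Abbey Road'] into
    ['prog', '--album', 'Abbey Road'], the shape docopt expects.

    Argument lists without the --alfred flag are returned unchanged.
    """
    if len(argv) < 3 or argv[1] != '--alfred':
        return list(argv)

    query = argv[2]
    media_type, _, search_term = query.partition(' ')
    if search_term and media_type in MEDIA_TYPES:
        return [argv[0], '--{0}'.format(media_type), search_term]

    # No recognised media type: search for the whole query
    return [argv[0], query]


# In the script itself, this would run just before docopt:
#     sys.argv = rewrite_alfred_argv(sys.argv)
print(rewrite_alfred_argv(['ipic.py', '--alfred', 'album Abbey Road']))
# ['ipic.py', '--album', 'Abbey Road']
```

Keeping the logic free of global state means you can check the awkward cases (a one-word query, an unknown media type) without invoking Alfred at all.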

This change, along with long-name flags and writing to a tempfile instead of the Desktop, is in my GitHub fork of Dr. Drang’s original script.


Exclusively create a file in Python 3

I’ve been tidying up a lot of old Python code recently, and I keep running into this pattern:

import os

# Check, then write: there's a gap between these two steps
if not os.path.exists('newfile.txt'):
    with open('newfile.txt', 'w') as f:
        f.write('hello world')

The program wants to write some text to this file, but only if nobody’s written to it before, because it mustn’t overwrite the existing contents. This approach seems sensible: if we check that the file exists before writing, we can avoid scribbling over a pre-existing file.

But this code is subject to a race condition: if the file pops into existence between the if and the open(), we scribble all over it anyway.

To catch this race condition, Python 3.3 added a new file mode: x for exclusive creation. If you open a file in mode x, the file is created and opened for writing – but only if it doesn’t already exist. Otherwise you get a FileExistsError.

Here’s how I’d rewrite the snippet above:

try:
    with open('newfile.txt', 'x') as f:
        f.write('hello world')
except FileExistsError:
    print('File already exists.  Clean up!')

Using the x mode means you can be sure that you won’t overwrite an existing file. It’s safer than the existence check.
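For the curious: under the hood, CPython opens a file in mode x with the O_CREAT | O_EXCL flags, which ask the operating system to create the file and fail if it already exists, as one atomic step. Here’s a sketch of doing the same thing one level down with os.open, which can be handy if you also want to choose the new file’s permissions. The helper name create_exclusively and the 0o600 permissions are assumptions for illustration:

```python
import os
import tempfile


def create_exclusively(path, text, mode=0o600):
    """Hypothetical helper: write text to path, only if path doesn't exist yet.

    O_CREAT | O_EXCL makes "check" and "create" a single atomic step,
    so there's no gap for another process to slip into.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    # Wrap the raw file descriptor in a normal file object; closing
    # the file object also closes the descriptor.
    with os.fdopen(fd, 'w') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'newfile.txt')
    create_exclusively(path, 'hello world')
    try:
        create_exclusively(path, 'goodbye world')
    except FileExistsError:
        print('File already exists.  Clean up!')
    with open(path) as f:
        print(f.read())  # hello world
```

For everyday code, open(path, 'x') is shorter and does the same job; the lower-level version only earns its keep when you need the extra control.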

I probably won’t use this a lot, but when I do, I’ll appreciate it. This has been my general experience with Python 3: there’s no killer feature that I can’t live without, just a growing pile of small niceties that I miss when I go back to Python 2.


Backup paranoia

By now, you’ve probably read about the KeRanger ransomware. Ransomware is not a new idea, but this is the first time it’s come to the Mac. If it works as described, it’s a nasty piece of work. And if you read the same articles as me, you saw comments like “If you don’t have backups, you deserve what you get.”[1]

It’s important to keep good backups, but they’re not foolproof. In this case, I’m not sure backups would always save you.

Claud Xiao and Jin Chen, two security researchers, have worked out what the malware does:

After connecting to the C2 server and retrieving an encryption key, the executable will traverse the “/Users” and “/Volumes” directories, encrypt all files under “/Users”, and encrypt all files under “/Volumes” which have certain file extensions.

The “/Volumes” directory is where OS X mounts disks (both external and internal). It includes “Macintosh HD” and any external drives you have mounted. If your backup drives were mounted when the ransomware got to work, they’d be no help at all.

My backup regime has extra steps that I always thought were paranoid, but now I’m not so sure. Here are a few of my suggestions:

  • Only mount your backup drives when you need them.

    If your backup drive is permanently mounted, then it’s always exposed to problems on your computer. There’s a much higher risk of accidental data corruption, malware or random OS bugs. If you only mount the drives when backups are running, it’s much less exposed.

    I have scripts that auto-mount my drives before my nightly backups start, and auto-eject them when they finish. Most of the time, they’re not mounted.

    This gives you extra time. When something goes wrong, you’ve got a chance to spot it and take action – before it propagates to the backups.

  • Keep an offsite backup that’s hard to modify.

    It’s good to have a backup that’s completely isolated, so anything that goes wrong with your computer cannot possibly affect it. Keep a copy of your data on a drive that’s outside the house – it’s safe from your computer, and from environmental problems like theft or a fire. This is an offsite backup.

    I have two offsite backups: an external drive that I keep at the office, and online backups with CrashPlan. The latter is particularly nice, because it stores old versions of every file. Even if files do get corrupted or encrypted, I can always roll back to a known good version.

  • When you go travelling, don’t just leave your computer running.

    If you’re at home when something goes wrong, you have options. You can triage. Diagnose. Work out if you’re affected. If need be, you can pull the plug (literally). That’s much harder if you’re away from the house, perhaps impossible.

    So ask yourself: should I leave my computer on while I’m away? If it’s not doing anything useful, turn it off. And if you have to keep it running, does it need network access?

  • Disconnect your backups when you’re away from home.

    If you do have to leave your computer running while you’re away, you don’t need up-to-date backups – very little is changing. Unmount and unplug your backup drives, so they’re protected from any problems in your absence.
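A minimal sketch of the mount-then-eject idea from the first suggestion might look like this. It assumes macOS’s diskutil command, a hypothetical disk identifier (disk2s1) and a stand-in backup command; check `man diskutil` and `diskutil list` for the identifiers your drives use, and whether eject or unmount suits your setup. The run parameter exists only so the logic can be tried without real drives:

```python
import subprocess


def run_backup_with_drive_mounted(disk, backup_command, run=subprocess.run):
    """Mount a backup disk, run the backup, then eject the disk.

    The eject is in a `finally` block, so the drive is ejected even if
    the backup command fails, and spends as little time mounted as
    possible.
    """
    run(['diskutil', 'mount', disk], check=True)
    try:
        run(backup_command, check=True)
    finally:
        run(['diskutil', 'eject', disk], check=True)


# Dry run with a fake `run` that only records the commands.
calls = []
run_backup_with_drive_mounted(
    'disk2s1',                # hypothetical disk identifier
    ['echo', 'backing up'],   # stand-in for a real backup command
    run=lambda cmd, check: calls.append(cmd),
)
for cmd in calls:
    print(' '.join(cmd))
```

Scheduling it (with launchd or cron) is left out here; the point is that the drive is only mounted for the lifetime of the backup.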

Nothing is watertight – you could do everything above, and just get unlucky. Data loss happens to the best of us.

But what these suggestions get you is extra time when you have problems. When you’re in a rush, you can panic and make mistakes. In a crisis, having time to breathe and think is invaluable.


  1. This was mixed with the idea that BitTorrent is only used for piracy, which means your computer is fair game for malware authors. I’m not interested in that discussion (at least not today). ↩︎


How I use TextExpander to curb my language

I saw a tweet yesterday that I really liked:

I’m on my own personal quest to banish the word “simply” from all instructional content. Saying “simply” doesn’t make it simple.
Keri Maijala (@clamhead) Feb 22 2016 11:09 AM

I’ve been making a concerted effort to cut down on this sort of phrasing as well.

“Easily” was my personal weak spot: when writing instructions, I would say “you can easily do X”. That’s little comfort to a reader who is trying (and failing) to do X. Similar words are “just” and “clearly”.

I have some TextExpander snippets to help train me out of these bad habits. Whenever I type a word like “easily”, it gets replaced with “easily?”. The extra question mark forces me to think – is that word appropriate here?
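TextExpander does the replacement as you type, but the same check can be run over a finished draft. Here’s a rough Python sketch of that idea; it’s not a TextExpander feature, and the word list and the name flag_words are just examples. It appends a question mark to each flagged word, so every use has to be justified:

```python
import re

# Example word list; extend it with whatever you want to cut.
FLAGGED_WORDS = ['easily', 'simply', 'just', 'clearly']


def flag_words(text, words=FLAGGED_WORDS):
    """Append '?' to every whole-word, case-insensitive match of a flagged word."""
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(w) for w in words) + r')\b',
        re.IGNORECASE,
    )
    # \1 keeps the word exactly as it was typed, including capitalisation.
    return pattern.sub(r'\1?', text)


print(flag_words('You can easily do X. Simply click Save.'))
# You can easily? do X. Simply? click Save.
```

The \b word boundaries mean “justify” and “justice” are left alone; only the standalone words get flagged.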

Often, the answer is yes, so I delete the question mark and carry on typing. But just having that momentary pause is enough. I can no longer get away with slipping it in automatically, without thinking – I have to justify it every time.

(I don’t think this is an original idea, but I can’t remember where I heard it first. Sorry if that was you!)

And this isn’t just for condescending language. You can use this to help reduce any sort of word you want to cut out of your writing. A particularly important one for me is ableist language. I’m very liable to write this without really thinking. Now I have to check myself every time I use it.

This won’t fix your writing overnight. You’ll still write problematic phrases, but it’s a good way to start training yourself out of it. I recommend trying it.


Saved by the Prompt

One of the most-used apps on my iPhone is Prompt, an SSH client by Panic. I use it for connecting to the Linode web server that powers this blog. SSH on the iPhone might seem silly. (I only ever installed it as an experiment.) But in sticky situations, it turns out that having the Unix command line in my pocket can be really useful.

A recent example: I was staying at a hotel for work, and the booking reference had been sent in an Excel spreadsheet. When viewed in Safari on iOS, the booking number was “helpfully” recognised as a number, and shown in the approximated form 1.41E+9. Since I don’t have a spreadsheet app installed, I had no way to see the original number.

There are plenty of tools for converting Excel files into nicer formats, but they all need a command-line. No problem with Prompt: I synced the spreadsheet to my Linode with Dropbox, then used csvkit to turn the file into a CSV. Voila: the booking reference.

All that took less than five minutes, and used just a handful of megabytes of data.

I never use Prompt for long sessions – I’d always grab a laptop for that. But when I’m in a pinch, I can fall back on my trusty command-line tools. There’s a lot you can do with simple Linux tools that’s much harder to do with full-sized iOS apps. If you’re comfortable in the shell, it’s a great app to have handy.


The Harry Potter Collector’s iPod 

I like to think of myself as a Harry Potter geek, and I follow Apple as well. But I was blown away today by Stephen Hackett’s post about the Harry Potter Collector’s iPod. This is an intersection of my hobbies that I never knew existed.

Check out his post for what little information we know about this product — like Stephen, I can find almost nothing about it.

But I did manage to find a few things Stephen couldn’t — a few pictures, and the date it was discontinued.

There are some old terms and conditions on Apple’s UK website, which tell us when the promotion was discontinued, at least in Europe:

The closing date for this offer is March 7, 2006. All purchases must be made by this date.

And I found three sets of photographs:

  • A Polish website called “Museum iPodów”, in a post titled The most mysterious iPod. There’s a pretty good photo of the crest engraving.

  • On “Say Hello to iPod”, a post about all the editions of the fourth generation iPod, with another photo of a slightly beaten-up version.

  • A YouTube video from user “Bart” showing off a version of the device. You can see the back of the iPod towards the end of the video, although the engraving is barely visible.

I also found an eBay auction with two of the devices for sale. It claims the devices were only on sale for a few months in 2005, but unfortunately all the picture links are broken.

I’m impressed by this device. Partly because it’s evaded my knowledge for so long, and partly for how well it’s managed to disappear. In the age of the Internet, where everything is seemingly recorded forever, it’s amazing to find something that’s been almost completely forgotten.


The Skeletor clip loop, 2015 edition 

As is becoming a tradition, The Incomparable posted their 2015 Clip Show a few weeks ago, which extended Steve Lutz’s recursive Skeletor clip loop. Slightly later than usual, I’ve updated the chart I maintain to track the progress of the clip loop.

I call this the “Oh god, what have we done” edition.

A few changes this year:

  • Different mentions within the same episode get different points on the chart. Trying to contain all the mentions within an episode around a single point was getting unwieldy.

  • I completely redrew and rewrote the chart this year. The old version was made in OmniGraffle, but I’ve switched to using LaTeX and TikZ. I already use those tools very heavily, so it will be much easier for me to make updates in future. I’ve also provided source code — so you can make your own clip loop at home!

  • This year’s clip show added no fewer than four events to the chart. I also added an entry for Nathan Gouwens’s interactive audio chart.

It’s possible that this clip loop is getting a little silly. All we need now are some bowls of stew.

