Benjamin Esham

New project: Microformats2 on a Map

A screenshot of the Microformats2 on a Map application.

Inspired by Homebrew Website Club, which I’ve been going to recently, I created a web app. Microformats2 on a Map reads webpages that have been marked up according to the microformats2 standard, extracts locations from them, and displays the locations on — yes — a map. You can try it out here.

The project isn’t complete yet; the UI is unpolished and nested microformats aren’t yet supported. See the GitHub project page for more caveats, usage information, and technical details. The source code is available there under the GNU GPL, version 3. Pull requests are welcome! Bug reports are also welcome but slightly less so.

Motivation

As part of my endless process of tinkering with this website instead of actually writing posts for it, I went back and added location data to all of my blog entries. Each one now shows the city and state whence it was posted. This information is marked up with the microformats2 h-adr standard to make it machine-readable.
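
For illustration, the marked-up location might look something like this (the class names come from the h-adr vocabulary, though the exact markup on this site may differ):

<p class="h-adr">
  <span class="p-locality">Rochester</span>,
  <span class="p-region">New York</span>
</p>

A microformats2 parser can pick out the p-locality and p-region properties without any site-specific scraping.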

After going through the effort of figuring out where I posted each entry from, I wanted to be able to see all of these locations on a map. It’s a form of gamification: I feel a sense of accomplishment in general when I post a blog entry, and seeing all of my posts laid out visually just reinforces that feeling. (Instagram used to have a similar feature but it seems that they’ve discontinued it… bummer!)

Same-origin woes

The same-origin policy bit me while I was making this. My original plan was to offer an interface where users could put in multiple URLs and then the app would retrieve each page and extract all of its locations. I was also hoping to make a completely client-side web app so that I could host a few static files but not worry about any server-side logic. But I realized that the same-origin policy would prevent a page hosted on my server from downloading pages from other servers.1

My solution was an awkward compromise. I added a mode in which the user could just paste in some HTML and locations would be extracted from that. I also created a server with an endpoint that would take in a URL, retrieve it, and return its contents — the world’s simplest proxy. If someone downloads Microformats2 on a Map and runs the server, the client-side code detects that and offers the option of entering a list of URLs. (The pages are retrieved through this proxy, which the same-origin policy allows because the proxy and the client-side code are served from the same host and port on localhost.)
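
As a sketch, fetching a page through that proxy might look something like this; the port and endpoint name are hypothetical, not necessarily the project’s actual API:

# Hypothetical invocation; the real endpoint name and port are documented
# in the GitHub project.
curl "http://localhost:8000/fetch?url=https://example.com/blog/"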

This solution adds a little bit of complexity, but it allows me to host a copy of the application without being responsible for a server that could probably be made to DoS someone. It also provides the ability — if you run the server piece — to map multiple webpages’ locations, which I think is much more useful than just mapping the locations from a single page.

Standing on the node_modules of giants

It’s been said that modern web development consists mostly of hooking together other people’s libraries. In this case, that’s totally true. I pulled in Leaflet for the maps, Leaflet.markercluster to gracefully handle markers that were visually close together, and microformat-node to extract microformats2 data from HTML. The app uses OpenStreetMap for map tiles and OpenStreetMap’s Nominatim service for geocoding (turning addresses into latitudes and longitudes). My contribution was a paltry 255 lines of code on the client side and 32 for the server.
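
As a rough illustration of the geocoding step, Nominatim’s search endpoint can be queried directly (its usage policy asks for light traffic and an identifying User-Agent):

# Look up the coordinates of a place name. The JSON response is an array
# whose entries include "lat" and "lon" fields.
curl "https://nominatim.openstreetmap.org/search?q=Rochester,+New+York&format=json&limit=1"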

  1. It would work fine if the other server had the appropriate CORS headers set, but most sites don’t (and I think this is the right default for servers).
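
For the curious, the most permissive version of that opt-in is a single response header:

Access-Control-Allow-Origin: *

A server that sends it allows scripts from any origin to read its responses.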

Kibibytes are silly and we should all use them

The kibibyte and mebibyte and their ilk are ridiculous. Introduced in 1998, these units are related to each other by factors of 1,024, a number which is almost, but not quite, the same as 1,000, the basis of the metric system. The only reason these units exist is to work around the ambiguity of some people using “kilobyte” to mean 1,000 bytes and others — understandably but unwisely — using “kilobyte” to mean 1,024 bytes. The sole advantage of the newfangled binary-based units is that they’re unambiguous.

For that reason, though, everyone should measure things in kibibytes, mebibytes, and so on when at all possible. After a decade or two of such sanity maybe we’ll be able to ease back into using “kilobyte” when we mean “kilobyte” and not have to worry about being misinterpreted.
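
If the size of that ambiguity seems academic, GNU coreutils’ numfmt makes the arithmetic concrete:

$ numfmt --from=si 1M                # a megabyte: 1,000 × 1,000 bytes
1000000
$ numfmt --from=iec 1M               # a mebibyte: 1,024 × 1,024 bytes
1048576
$ numfmt --from=si --to=iec-i 500G   # why a "500 GB" drive shows up as about 466 GiB
466Gi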

Listing the last person who committed to each branch in a Git repository

Here is a command that will list all of the server branches for the current Git repository, ordered from least recently edited to most recently edited, and with the latest committer shown for each:

git branch -r --sort=creatordate \
    --format "%(creatordate:relative);%(committername);%(refname:lstrip=-1)" \
    | grep -v ";HEAD$" \
    | column -s ";" -t

If the column command is not available on your system,1 you can replace it with

    | sed -e "s/;/\t/g"

for a similar effect. Note also that you will need Git 2.13 (released in May 2017) or later.

Using the Jekyll repository as an example,2 the output will look like

6 years ago             Tom Preston-Werner  book
4 years, 4 months ago   Parker Moore        0.12.1-release
4 years ago             Matt Rogers         1.0-branch
3 years, 11 months ago  Matt Rogers         1.2_branch
3 years, 1 month ago    Parker Moore        v1-stable
12 months ago           Ben Balter          pages-as-documents
10 months ago           Jordon Bedwell      make-jekyll-parallel
6 months ago            Pat Hawks           to_integer
5 months ago            Parker Moore        3.4-stable-backport-5920
4 months ago            Parker Moore        yajl-ruby-2-4-patch
4 weeks ago             Parker Moore        3.4-stable
3 weeks ago             Parker Moore        rouge-1-and-2
19 hours ago            jekyllbot           master

Motivation

My most recent project at work had several contributors from multiple teams. I took it upon myself to periodically prune our branches, which meant that I needed to know who was responsible for each branch. Bitbucket didn’t seem to show that information anywhere, so I rigged up this command.

(By the way, I highly recommend GitUp for macOS if you’re interested in a novel way of visualizing your branches:

A screenshot of GitUp.

Be sure to turn on the options to show stale branch tips and remote branch tips.)

How it works

The git command lists all of the branches on the server,3 ordered from least recently edited to most recently edited. For each branch, it prints the relative timestamp of the latest commit; the name of the committer of that commit; and the branch name. The grep command removes the “HEAD” pointer from the list, since it’s probably just pointing to one of the other branches in the list and we don’t need to show that branch twice. Finally, the column command puts the information into a nice tabular form.
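
Because the remote-tracking branches only reflect the server as of your last fetch, it’s worth updating them first (and pruning ones that have been deleted on the server):

git fetch --prune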

  1. column is part of BSD, so it’s available on macOS. It’s available under Ubuntu if the “bsdmainutils” package is installed, which it seems to be by default.

  2. I’ve omitted some of Jekyll’s branches for brevity.

  3. To be precise, it lists all of the remote-tracking branches. If your local copy of the repo is up to date then this is the same as “all of the branches on the server.”

What I believe

Sarah Kendzior recently said, “Write a list of things you would never do. Because it is possible that in the next year, you will do them. Write a list of things you would never believe. Because it is possible that in the next year, you will either believe them or be forced to say you believe them.” Well, we’re almost over the line, so now is as good a time as any.

I reject Donald Trump as President of the United States. (Later today he will be the President — I don’t dispute that — but his sneering disregard for the formal and informal rules of the office makes him unfit for it.)

I reject authoritarianism and fascism.

I reject Trump’s contempt for the free press.

I reject the supremacy of cis, straight white men, even though I am all of those.

I reject the notion that anyone is not a “real American” because they are well-educated or well-off or liberal or live in a city.

I reject lying, whether shameless or subtle. I reject gaslighting. I reject the idea that truth is a meaningless concept. I reject anti-intellectualism. I reject climate-change denialism.

I reject the Electoral College. I reject our “first past the post” voting system. I reject the two-party system that they enforce. I reject the voter suppression that may have helped Trump to win. I reject the idea that Republican Party unity is more important than the health of the country.

I reject intolerance. I reject the idea that intolerance is an opinion as valid as any other.

I reject the assumption that a free society has capitalism as its core. I reject the notion that unfettered capitalism is even compatible with free society. I reject letting “the market” determine people’s health-care options, or their fates.

I reject racism. I reject bigotry. I reject sexual assault.

I refuse to accept that humanity is no better than this. I know that — eventually — we will do better.

Day One’s lack of encryption is crippling it for me

Two years ago I wrote about how I use Day One, the journalling app for iOS and macOS. At the time I used it for reflective “how I’m feeling” pieces, notes about fun things I was doing, and occasionally a photo of food. A year later Bloom Built released the second major version of Day One. This version brought many improvements but it also dropped support for Dropbox syncing in favor of a homegrown syncing service called Day One Sync.

I trusted the people at Dropbox to store my data securely. While I’m sure that the Bloom Built engineers have the best of intentions, the company simply doesn’t have the same level of security expertise. Therefore, I don’t trust Day One Sync with my journal — my most private of data — and so it lives only on my phone now. In turn, this means that the longer entries I would have typed on my laptop have mostly gone unwritten. Those were the introspective, “journally” pieces, so now my usage of Day One is mostly to record what I’m eating. That’s a disappointing turn of events.

The security of Day One Sync will be much less of an issue once Bloom Built adds some encryption features. If I can encrypt my journal before it gets synced, with a password that only I know, then it doesn’t really matter if the Day One Sync server is breached: the hackers would only be able to see the encrypted version of my journal (and they wouldn’t have my password in any form, hashed or otherwise). Bloom Built is working on this feature but they haven’t given any estimate of when it might be ready. Until then, my journal is reduced to the kinds of entries that are short enough that I can peck them out on my phone’s keyboard.
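
The feature I’m describing is ordinary client-side encryption with a passphrase the user holds. As a sketch of the general shape (purely illustrative, and not how Day One actually works), the same guarantee can be had for any file with openssl:

# Encrypt locally with a passphrase that never leaves the machine;
# only journal.txt.enc would ever be uploaded. (Illustrative only.)
openssl enc -aes-256-cbc -salt -in journal.txt -out journal.txt.enc

# Decryption reverses it, again entirely on the local machine.
openssl enc -d -aes-256-cbc -in journal.txt.enc -out journal.txt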

HTTP 410 Gone But Not Forgotten

When I first launched this blog I used FeedBurner to handle its RSS feed. FeedBurner is — was — a proxy that would serve your site’s RSS feed unmodified but record a bunch of analytics as it did so. (I was hosting this site on Amazon S3, which didn’t have any real way to do server-side logging or analytics.1) The way it worked was that you would publish an RSS feed at some publicly accessible URL, point FeedBurner to that URL, and then give out FeedBurner’s proxied URL instead of your original one.

A couple of years ago I started hosting this site on a “real” web server and I no longer needed to use FeedBurner. One downside of relying on this third-party service became clear: my few subscribers had FeedBurner’s URL, not mine, saved in their feed readers. Even if I could get FeedBurner to emit an HTTP redirect — I couldn’t — my subscribers’ feed readers would probably continue to request the FeedBurner feed indefinitely.

I did the best thing I could think of, which was to point FeedBurner to a dummy RSS feed that contained a single item: a note explaining that you were subscribed to the FeedBurner version of my feed and requesting that you subscribe to the new, “real” feed instead.

A little over a year ago I figured that this notice had been available for long enough. Apparently forgetting that I could just log in to FeedBurner and delete the feed, I set my web server to give an HTTP 410 “Gone” response when the FeedBurner feed was requested. (This status code indicates that “the target resource is no longer available at the origin server and that this condition is likely to be permanent.”)
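
With nginx, for example, that takes only a few lines of configuration (the feed path here is illustrative, not my actual URL):

# Report the old feed as permanently gone.
location = /feeds/main.xml {
    return 410;
}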

For the next twelve months, FeedBurner dutifully kept trying to fetch my dummy feed, never losing hope that the 410 Gone would one day be replaced by a beautiful 200 OK. Not only that, but when I finally remembered that I could just log in to FeedBurner and delete the damn thing, the health check told me that everything was sunny:

Screenshot of FeedBurner’s “FeedMedic” tool, reporting that my feed had been “quite healthy.”

“Quite healthy” seems like a weird way to say “There is no feed content and I get an error when I try to request it.”

  1. Maybe Amazon has better options now, but at the time I think the only way to log the activity on your S3 website was to have it spit out (into another S3 bucket) log files with one or two events per file. This produced an unmanageable number of files — even with my very modest traffic — and the files being stored on S3 didn’t help.

Never Again

Yesterday I signed the pledge at neveragain.tech.1 I, and 584 other members of the tech industry, have committed not to collaborate with the upcoming Trump administration by helping to create databases of people’s race, religion, or national origin. We will advocate that our companies collect as little of this data as possible; that they discard existing caches as quickly as they can; and that they refuse to turn data over to the government without a lawful order. We commit to push back if our companies collect, store, or release users’ data in an illegal or unethical way.

Signing this pledge, of course, is the easy part. As I quipped on Twitter, this really was the least I could do. Living up to the pledge will be the hard part — although, truthfully, I’m much less likely than some of the other signatories to find myself in a position where I need to speak out at work. If I do, though, it will be infinitely easier knowing that so many others in the tech community are behind me.

  1. This seems to be the name of both the website and the pledge. It doesn’t exactly roll off the tongue, does it?

Subtweeting without tweeting

Earlier this month, Nate Silver tweeted something that made me do a double take. He described a piece by New York Times columnist Paul Krugman as being “basically a subtweet of NYT’s campaign coverage.” The column, of course, wasn’t a tweet at all, but here was a perfectly erudite person calling it a subtweet.

“Subtweet,” a portmanteau of “subtext” and “tweet,” refers to a negative tweet about some subject that cattily avoids actually mentioning that subject.1 Mulling over Silver’s statement, I realized that I couldn’t think of another word to describe this stylistic device. “Subtext” itself refers to the hidden meaning, not the work that carries the hidden meaning. “Innuendo” refers to the latter, but is most often used for sex-related insinuations. (It’s also harder to work into a sentence: compare “a subtweet of their campaign coverage” to “an innuendo referring to their campaign coverage.”)

I eventually came to the same conclusion as Silver: although “subtweet” explicitly invokes Twitter, there’s simply no better word for the concept. The existing vocabulary was so lacking that the word has escaped its roots and become generally applicable.

(I was reminded of this subject again today when I read this review by Michiko Kakutani of a new Hitler biography. It’s a pretty masterful piece of, well, subtweeting.)

  1. Sorry for mansplaining the word “subtext” to you.