abstract
Not one.

Two spaces

When I end a sentence and intend to write a new one, I type two spaces instead of one.

I do this only to separate sentence endings from period-terminated abbreviations. Consider this sentence:

I eat couches, e.g. brown ones. They are nice.

If you didn't know that "e.g." is an abbreviation, you might think that there are three sentences: "I eat couches, e.g.", "brown ones.", and "They are nice."

Now consider this sentence:

I eat couches, e.g. brown ones.  They are nice.

By typing two spaces between the sentences, I have made clear that there are only two sentences, and that the period in "e.g." is not the end of a sentence.

The problem is that the period has two purposes: To end a sentence and to end some abbreviations. Always using two spaces to separate sentences solves this.
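The convention also makes sentence boundaries easy to find mechanically. Here is a minimal Python sketch, assuming a sentence ends with ".", "!" or "?" followed by at least two spaces; the name `split_sentences` is made up for illustration.

```python
import re

SENTENCE_GAP = " " * 2  # the two spaces that separate sentences


def split_sentences(text):
    """Split text wherever ".", "!" or "?" is followed by two or more spaces."""
    # The lookbehind keeps the punctuation attached to its sentence.
    return [s for s in re.split(r"(?<=[.!?]) {2,}", text) if s]


one_space = "I eat couches, e.g. brown ones. They are nice."
two_spaces = "I eat couches, e.g. brown ones." + SENTENCE_GAP + "They are nice."

print(split_sentences(two_spaces))
# ['I eat couches, e.g. brown ones.', 'They are nice.']
print(split_sentences(one_space))
# ['I eat couches, e.g. brown ones. They are nice.']
```

With one space the splitter finds no boundary at all, and the period in "e.g." is never mistaken for a sentence ending in either case.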

Other solutions

Revolution

The best solution would be to use a separate character for abbreviation termination, or none at all, so that the period is exclusively used for ending sentences.

No change

One might think that another solution is to use just one space, the very thing that I'm arguing against. In the example above with one space between sentences, it's actually /not/ difficult to see that there are only two sentences: We know that a sentence must start with an uppercase letter, and "brown" after "e.g." does not, so it's not a new sentence.

However, uppercase letters can occur after abbreviations if they are part of given names. Consider this sentence:

I eat couches, e.g. Priscilla's brown one. They are nice.

It's not clear that "Priscilla's" does not start a new sentence, because it's very similar to "They": Both words start with an uppercase letter and are placed after a period and a space. But "Priscilla's" is just another word in the first sentence!

This almost shows that the one-space methodology is insufficient, but not completely. One can argue that if we know all valid abbreviations, we can just check whether a period ends an abbreviation, and determine that way whether it ends a sentence.
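That lookup argument can be sketched in Python. This is only an illustration: `ABBREVIATIONS` is a hypothetical and deliberately tiny list, and `split_one_space` is a made-up name.

```python
# Hypothetical list of known abbreviations; a real one would be far longer.
ABBREVIATIONS = {"e.g.", "i.e."}


def split_one_space(text):
    """Split single-spaced text, treating known abbreviations as non-endings."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        # A trailing period ends the sentence unless the word is an abbreviation.
        if word.endswith(".") and word not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_one_space("I eat couches, e.g. Priscilla's brown one. They are nice."))
# ["I eat couches, e.g. Priscilla's brown one.", 'They are nice.']
```

The result depends entirely on the contents of the list, and each abbreviation can only ever be classified one way.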

But this is only true if the abbreviation can be used in only one way! Read this sentence:

I used to eat couches bef. I found the cow.

It uses the abbreviation "bef." for "before"; see http://public.oed.com/how-to-use-the-oed/abbreviations/.

The sentence can be read in two ways: Either you read it as one sentence -- "I used to eat couches before I found the cow" -- or you read it as two sentences -- "I used to eat couches before." and "I found the cow."

Both are valid (at least if you accept that a preposition can be the last word in a sentence).

I admit that this example is a bit extreme. After all, most abbreviations can be used only in unambiguous ways. Nevertheless, it still shows that just using a single space between sentences is insufficient!

Also, we have assumed that all abbreviations are known, which excludes temporary (and to some extent field-specific) abbreviations. This is not good! It's much easier to just use two spaces between your sentences!

Two spaces and fixed width output

Due to my background/foreground as a programmer, I have a tendency to limit myself to 80 characters per line, and write two newlines when I start a new paragraph (just look at the source of this page).

This is just a choice of representation which works well in many cases, but I won't write about that. The interesting thing is: How does this mix with using two spaces between sentences? This can actually be a problem; look at this sentence:

Bla bla bla bef. bla bla.

This is one sentence, as "bef." does not end the sentence. If we assume that the line width is not 80 characters, but instead 16 characters, then the line should be wrapped like this:

Bla bla bla bef.
bla bla.

But now it's not clear if "bef." ends a sentence or not! If we want to turn the fixed-width representation back into a simple line representation, we don't know if we should insert one or two spaces after "bef.". How do we solve that?

The answer is that, when you line-wrap, you don't split word sequences separated by ". ", i.e. you treat an abbreviation and its following word as a single word. That way, you would end up with:

Bla bla bla
bef. bla bla.

which would not cause any problems.
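The wrapping rule, and the way back to a single line, can be sketched in Python. The names `wrap` and `unwrap` are made up for illustration, and the sketch makes two assumptions that the text above does not: "!" and "?" always end a sentence, and a chunk longer than the line width is simply left on an overlong line of its own.

```python
import re

SENTENCE_GAP = " " * 2  # the two spaces that separate sentences


def wrap(text, width):
    """Greedy line wrap that never breaks after a period and a single space."""
    # Break points are runs of two or more spaces (sentence boundaries) and
    # single spaces not preceded by a period. The capturing group keeps the
    # separators, so parts alternates chunk, separator, chunk, ...
    parts = re.split(r"( {2,}|(?<!\.) )", text.strip())
    lines, line = [], parts[0]
    for sep, chunk in zip(parts[1::2], parts[2::2]):
        sep = SENTENCE_GAP if len(sep) >= 2 else " "
        if len(line) + len(sep) + len(chunk) <= width:
            line += sep + chunk
        else:
            lines.append(line)
            line = chunk
    lines.append(line)
    return lines


def unwrap(lines):
    """Join wrapped lines again: a line ending in ".", "!" or "?" ended a sentence."""
    if not lines:
        return ""
    text = lines[0]
    for line in lines[1:]:
        text += (SENTENCE_GAP if text.endswith((".", "!", "?")) else " ") + line
    return text


print(wrap("Bla bla bla bef. bla bla.", 16))
# ['Bla bla bla', 'bef. bla bla.']
```

Because an abbreviation is never the last word on a line, a period at the end of a line can only be a sentence ending, so `unwrap` knows to insert two spaces there and one space everywhere else.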

General thoughts

Most natural languages have some amount of ambiguity, and part of it seems to make some things easier, i.e. allowing speakers to be loose when talking about stuff.

This other kind of ambiguity doesn't help anyone.