abstract
Not one.

Two spaces

When I end a sentence and intend to write a new one, I type two spaces instead of one.

I do this only to separate sentence endings from period-terminated abbreviations. Consider this sentence:

I eat couches, e.g. brown ones. They are nice.

If you didn't know that "e.g." is an abbreviation, you might think that there are three sentences: "I eat couches, e.g.", "brown ones.", and "They are nice."

Now consider this sentence:

I eat couches, e.g. brown ones.  They are nice.

By typing two spaces between the sentences, I have made clear that there are only two sentences, and that the period in "e.g." is not the end of a sentence.

The problem is that the period has two purposes: To end a sentence and to end some abbreviations. Always using two spaces to separate sentences solves this.
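To make the benefit concrete, here is a minimal sketch of how a program could find sentence boundaries in text that follows the two-space rule. The function name `split_sentences` is made up for this illustration, and the sketch assumes that every sentence ends with a period (it ignores question and exclamation marks).

```python
import re


def split_sentences(text):
    """Split text at every period that is followed by two or more spaces."""
    # A period followed by a single space (as in "e.g. brown") is left alone,
    # so abbreviations never need to be recognised.
    return re.split(r"(?<=\.) {2,}", text)


print(split_sentences("I eat couches, e.g. brown ones.  They are nice."))
# prints ['I eat couches, e.g. brown ones.', 'They are nice.']
```

No list of abbreviations is needed: the spacing alone carries the information.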

Other solutions

Revolution

The best solution would be to use a separate character for abbreviation termination, or none at all, so that the period is exclusively used for ending sentences.

No change

One might think that another solution is to use just one space, the very thing that I'm arguing against. In the example above with one space between sentences, it's actually *not* difficult to see that there are only two sentences: We know that a sentence must start with an uppercase letter, and "brown" after "e.g." does not, so it's not a new sentence.

However, uppercase letters can occur after abbreviations if they are part of given names. Consider this sentence:

I eat couches, e.g. Priscilla's brown one. They are nice.

It's not clear that "Priscilla's" does not start a new sentence, because it looks just like "They": Both words start with an uppercase letter and are placed after a period and a space. But "Priscilla's" is just another word in the first sentence!

This almost shows that the one-space methodology is insufficient, but not completely. One can argue that if we know all valid abbreviations, we can just check whether a period ends an abbreviation or not, and determine that way whether it ends a sentence.

But this is only true if the abbreviation can be used in only one way! Read this sentence:

    I used to eat couches bef. I found the cow.

It uses the abbreviation "bef." for "before"; see
[http://public.oed.com/how-to-use-the-oed/abbreviations/](http://public.oed.com/how-to-use-the-oed/abbreviations/).

The sentence can be read in two ways: Either you read it as one sentence -- "I
used to eat couches before I found the cow" -- or you read it as two sentences
-- "I used to eat couches before." and "I found the cow."

Both are valid (at least if you accept that a preposition can be the last word
in a sentence).

I admit that this example is a bit extreme.  After all, most abbreviations
can be used only in unambiguous ways.  Nevertheless, it still shows that just
using a single space between sentences *is insufficient*!
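The abbreviation-list approach can be sketched as follows. This is only an illustration: the set `KNOWN_ABBREVIATIONS` and the function `split_one_space` are hypothetical, and the list is deliberately tiny.

```python
# Hypothetical, deliberately tiny abbreviation list for illustration only.
KNOWN_ABBREVIATIONS = {"e.g.", "i.e.", "bef."}


def split_one_space(text):
    """Split single-spaced text into sentences.

    A word ending in a period closes the sentence unless the word is a
    known abbreviation.
    """
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word.endswith(".") and word not in KNOWN_ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without a closing period
        sentences.append(" ".join(current))
    return sentences


print(split_one_space("I eat couches, e.g. Priscilla's brown one. They are nice."))
# prints ["I eat couches, e.g. Priscilla's brown one.", 'They are nice.']

print(split_one_space("I used to eat couches bef. I found the cow."))
# prints ['I used to eat couches bef. I found the cow.']
```

The Priscilla example comes out right, but for the "bef." example the function has to pick one reading. With "bef." in the list it always returns one sentence, and without it, it always returns two. Single-spaced text gives it nothing to decide with.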

Also, we have assumed that all abbreviations are known, which excludes temporary
(and to some extent field-specific) abbreviations.  This is not good!  It's much
easier to just use two spaces between your sentences!


# Two spaces and fixed width output

Due to my background/foreground as a programmer, I have a tendency to limit
myself to 80 characters per line, and write two newlines when I start a new
paragraph (just look at the source of this page).

This is just a choice of representation which works well in many cases, but I
won't write about that.  The interesting thing is: How does this mix with using
two spaces between sentences?  This can actually be a problem; look at this
sentence:

    Bla bla bla bef. bla bla.

This is one sentence, as "bef." does not end the sentence.  If we assume that
the line width is not 80 characters, but instead 16 characters, then the line
should be wrapped like this:

    Bla bla bla bef.
    bla bla

But now it's not clear if "bef." ends a sentence or not!  If we want to turn the
fixed-width representation back into a simple line representation, we don't know
if we should insert one or two spaces after "bef.".  How do we solve that?

The answer is that, when you line-wrap, you don't split word sequences separated
by ". ", i.e. you see an abbreviation and its following word as a single word.
That way, you would end up with:

    Bla bla bla
    bef. bla bla

which would not cause any problems.
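The wrapping rule above, and the way back from wrapped lines to a single line, can be sketched like this. The names `wrap` and `unwrap` are made up for the illustration, and the sketch assumes that sentences end with a period, that the text follows the two-space rule, and that a chunk longer than the width may simply overflow its line.

```python
import re


def wrap(text, width=16):
    """Greedy line wrap that never breaks at a period plus a single space."""
    # Breakable gaps: spaces not preceded by a period, or two or more
    # spaces after a period.  The capture group keeps the gaps in the result.
    parts = re.split(r"((?<!\.) +|(?<=\.) {2,})", text.strip())
    chunks, gaps = parts[0::2], parts[1::2]
    lines = [chunks[0]]
    for gap, chunk in zip(gaps, chunks[1:]):
        if len(lines[-1]) + len(gap) + len(chunk) <= width:
            lines[-1] += gap + chunk  # still fits on the current line
        else:
            lines.append(chunk)  # start a new line and drop the gap
    return lines


def unwrap(lines):
    """Join wrapped lines back into one line.

    Because wrap() never ends a line with an abbreviation, a line ending
    in a period must end a sentence and is followed by two spaces.
    """
    text = lines[0]
    for line in lines[1:]:
        text += ("  " if text.endswith(".") else " ") + line
    return text


print(wrap("Bla bla bla bef. bla bla"))
# prints ['Bla bla bla', 'bef. bla bla']

print(unwrap(wrap("I eat couches.  They are nice.")))
# prints I eat couches.  They are nice.
```

Since "bef. bla" is treated as one unbreakable chunk, a period at the end of a line always marks the end of a sentence, so joining the lines again is unambiguous.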


# General thoughts

Most natural languages have some amount of ambiguity, and part of it seems to
make some things easier, e.g. allowing speakers to be loose when talking about
stuff.

This other kind of ambiguity doesn't help anyone.