---
abstract: Not one.
---
# Two spaces
When I end a sentence and intend to write another, I type two spaces instead of
one.

I do this only to separate sentence endings from period-terminated
abbreviations. Consider this sentence:
```
I eat couches, e.g. brown ones. They are nice.
```
If you didn't know that "e.g." is an abbreviation, you might think that there
are three sentences: "I eat couches, e.g.", "brown ones.", and "They are nice."
Now consider this sentence:
```
I eat couches, e.g. brown ones.  They are nice.
```
By typing two spaces between the sentences, I have made clear that there are
only two sentences, and that the period in "e.g." is not the end of a sentence.
The problem is that the period has two purposes: To end a sentence and to end
some abbreviations. Always using two spaces to separate sentences solves this.
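
As a rough illustration of why this helps, here is a minimal Python sketch of a sentence splitter that relies only on the two-space convention. The name `split_sentences` is made up for this example, and treating "!" and "?" like the period is an assumption on top of what is described above.

```python
import re

# Sentence boundary under the two-space convention: sentence-ending
# punctuation followed by two or more spaces.  Punctuation followed by a
# single space is left alone, so abbreviations like "e.g." never split.
# (Handling "!" and "?" the same way as "." is an assumption.)
SENTENCE_BREAK = re.compile(r"(?<=[.!?]) {2,}")


def split_sentences(text):
    """Split text into sentences at punctuation followed by 2+ spaces."""
    return [s for s in SENTENCE_BREAK.split(text.strip()) if s]


one_space = "I eat couches, e.g. brown ones. They are nice."
two_spaces = "I eat couches, e.g. brown ones.  They are nice."

print(split_sentences(one_space))   # one item: no boundary can be detected
print(split_sentences(two_spaces))  # two items: split at the double space
```

No list of abbreviations is needed: the splitter never has to guess what a period means.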
# Other solutions
## Revolution
The best solution would be to use a separate character for abbreviation
termination, or none at all, so that the period is exclusively used for ending
sentences.
## No change
One might think that another solution is to use just one space, the very thing
that I'm arguing against. In the example above with one space between
sentences, it's actually *not* difficult to see that there are only two
sentences: We know that a sentence must start with an uppercase letter, and
"brown" after "e.g." does not, so it's not a new sentence.

However, uppercase letters *can* occur after abbreviations if they are part of
given names. Consider this sentence:
```
I eat couches, e.g. Priscilla's brown one. They are nice.
```
It's not clear that "Priscilla's" does not start a new sentence, because it's
very similar to "They": Both words start with an uppercase letter and are placed
after a period and a space. But "Priscilla's" is just another word in the first
sentence!

This almost shows that the one-space methodology is insufficient, but not
completely. One can argue that if we know all valid abbreviations, we can just
check whether a period ends an abbreviation or not, and determine that way
whether it ends a sentence.

But this is only true if the abbreviation can be used in only one way! Read
this sentence:
```
I used to eat couches bef. I found the cow.
```
It uses the abbreviation "bef." for "before"; see
[http://public.oed.com/how-to-use-the-oed/abbreviations/](http://public.oed.com/how-to-use-the-oed/abbreviations/).
The sentence can be read in two ways: Either you read it as one sentence -- "I
used to eat couches before I found the cow" -- or you read it as two sentences
-- "I used to eat couches before." and "I found the cow."
Both are valid (at least if you accept that a preposition can be the last word
in a sentence).

I admit that this example is a bit extreme. After all, most abbreviations
can be used only in unambiguous ways. Nevertheless, it still shows that just
using a single space between sentences *is insufficient*!

Also, we have assumed that all abbreviations are known, which excludes temporary
(and to some extent field-specific) abbreviations. This is not good! It's much
easier to just use two spaces between your sentences!
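
To make the weakness of the known-abbreviations approach concrete, here is a small Python sketch of it. The function `split_one_space` and the tiny `KNOWN_ABBREVIATIONS` set are hypothetical, invented for this illustration only.

```python
# Hypothetical abbreviation list.  A real one would be much longer and
# still could not contain temporary or field-specific abbreviations.
KNOWN_ABBREVIATIONS = {"e.g.", "i.e.", "bef."}


def split_one_space(text):
    """Guess sentence boundaries in text that uses one space everywhere.

    A word ending in a period is taken to end a sentence unless it is a
    known abbreviation.
    """
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word.endswith(".") and word not in KNOWN_ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


# Works: "e.g." is known, so "Priscilla's" does not start a sentence.
print(split_one_space("I eat couches, e.g. Priscilla's brown one. They are nice."))

# Fails to see the ambiguity: "bef." is known, so this is always read as
# one sentence, even if the writer meant two.
print(split_one_space("I used to eat couches bef. I found the cow."))
```

The heuristic has to pick one reading of "bef." up front, and any abbreviation missing from the list is silently treated as a sentence ending.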
# Two spaces and fixed width output
Due to my background/foreground as a programmer, I have a tendency to limit
myself to 80 characters per line, and write two newlines when I start a new
paragraph (just look at the source of this page).

This is just a choice of representation which works well in many cases, but I
won't write about that. The interesting thing is: How does this mix with using
two spaces between sentences? This can actually be a problem; look at this
sentence:
```
Bla bla bla bef. bla bla.
```
This is one sentence, as "bef." does not end the sentence. If we assume that
the line width is not 80 characters, but instead 16 characters, then the line
should be wrapped like this:
```
Bla bla bla bef.
bla bla.
```
But now it's not clear if "bef." ends a sentence or not! If we want to turn the
fixed-width representation back into a simple line representation, we don't know
if we should insert one or two spaces after "bef.". How do we solve that?

The answer is that, when you line-wrap, you don't split word sequences separated
by ". ", i.e. you see an abbreviation and its following word as a single word.
That way, you would end up with:
```
Bla bla bla
bef. bla bla.
```
which would not cause any problems.
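
The wrapping rule can be sketched in Python roughly as follows. The functions `wrap` and `unwrap` are names made up for this example, and the sketch assumes the input uses exactly one space between words and exactly two spaces after a sentence-ending period.

```python
import re

# Allowed break points: two spaces after a period (a sentence boundary),
# or a single space that does not follow a period.  A period followed by
# a single space is never a break point, so an abbreviation stays glued
# to the word after it.
BREAK_POINT = re.compile(r"((?<=\.)  |(?<!\.) )")


def wrap(text, width):
    """Greedily wrap text to at most `width` characters per line.

    A chunk that is longer than `width` on its own is left overlong.
    """
    parts = BREAK_POINT.split(text)
    chunks, separators = parts[0::2], parts[1::2]
    lines = [chunks[0]]
    for sep, chunk in zip(separators, chunks[1:]):
        if len(lines[-1]) + len(sep) + len(chunk) <= width:
            lines[-1] += sep + chunk
        else:
            lines.append(chunk)
    return lines


def unwrap(lines):
    """Join wrapped lines back into a single line.

    Because wrap() never breaks after an abbreviation, a line ending in
    a period must end a sentence and gets two spaces after it.
    """
    text = lines[0]
    for line in lines[1:]:
        text += ("  " if text.endswith(".") else " ") + line
    return text


text = "Bla bla bla bef. bla bla."
print(wrap(text, 16))          # ['Bla bla bla', 'bef. bla bla.']
print(unwrap(wrap(text, 16)))  # same as the original text
```

With this rule, wrapping followed by unwrapping gives back the original line, including the number of spaces after each period.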
# General thoughts
Most natural languages have some amount of ambiguity, and part of it seems to
make some things easier, i.e. by allowing speakers to be loose when talking
about stuff.

This other kind of ambiguity doesn't help anyone.