What would language be like without words?

Anastassia Loukina and Greg Kochanski

As part of a research project, we are looking to describe speech in ways that can be applied uniformly across the whole variety of human languages.  This might seem to be a straightforward task, but it isn’t.  One of the reasons is that familiar terms in English often don’t apply well in other languages.  Even the term “word” causes trouble.

The notion of ‘word’ (‘lexis’), like many other linguistic terms we still use today, appeared in the writings of ancient philosophers.  The Alexandrian grammarians defined the word as the smallest element of a sentence which has a certain meaning.  Even though this definition is rather vague, ancient grammarians would have had little trouble breaking a sentence into separate words.  That’s because in languages like Classical Greek or Latin most words are easily identified by grammatical markers which characterize the whole word.  For example, the ending of a noun would indicate the case and number, while the ending of a verb would tell us about the tense, voice, mood and other properties.  Thus ‘numerus impar’ (‘odd number’) clearly constitutes two words, since if we wanted to say ‘odd numbers’, each of the words would acquire its own plural marker and thus we would have ‘numeri impares’.

The model of language description developed in Antiquity for Greek and Latin was used in modern times to describe modern European languages as well as other world languages.  However, a model developed for one language does not always work well for other languages.  The definition of a word as something with a set of its own grammatical markers works well for many modern languages such as Russian, Spanish or Arabic.  For example, the plural of Spanish ‘número impar’ is ‘números impares’.

However, even in English such a grammatical criterion hardly works.  ‘Odd numbers’ is not really different from ‘oddballs’: in both cases only the second element has a plural marker.  However, ‘oddball’ is spelled without spaces and features in the Oxford English Dictionary as a separate word, while most people would agree that ‘odd number’ consists of two words.  In fact, if one looks at languages other than English, it gradually becomes apparent that “What is a word?” is a bad question.  It’s not that it has no answer.  Instead, the problem is that it has several reasonable answers that often disagree.

I won’t try to give a dictionary definition of what a word is any more than I could give a dictionary definition of what  life is.  In either case, the best you can do is to list some examples and some important properties. Ultimately, a word is something that behaves like other words.

Words, in English, are a coincidence of four separate ways of looking at language:

  1. Words are groups of letters surrounded by white space.
  2. Words are small chunks of language that have a clear meaning and are not made of words.  (If this were a technical paper, I’d mention “morphemes” here.)
  3. Words are the things that you can string together with a lot of freedom, and where many of the combinations make some kind of sense.
  4. Words are clumps of symbols (or sounds) that often occur together.

All of these criteria are a little bit fuzzy, but in English they all point to the same group of things: what we call words. (These definitions work pretty well for English, but remember that they are designed for English.  They are not universal, God-given axioms.)

By the first definition, “house” is a word because it is normally written with white space around it.  It’s also a word by the second definition because it has a clear meaning: a roof with some walls, and furniture inside.  “House” is a unit where the pieces don’t have meaning on their own: there’s nothing much to say about “hou” or “se” or “ous”.  (We can neglect the occasional fragment that turns out to be an unrelated word, like “Ho!”.  This just goes to show that language wasn’t designed by logicians or computer programmers.)

“Runner” is another example of the second definition.  Even though you can split it into two pieces that each mean something (“run” + a person who does it), the second part (“er”) isn’t a word on its own.

Words can also be defined by their ability to be combined to make complete thoughts and sentences.  While some combinations of words are prohibited by grammar rules, a remarkable number of combinations are grammatical, and even more combinations will get an idea across even if their grammar isn’t quite right.  It may look a little funny in print, but if your partner asks you where her book has gotten to, and you say “Sofa.  Under.”, the message will be understood: she’ll look under the sofa and may not even notice that you weren’t grammatical.

And, fourth, the letters or sounds of a word are found together more often than chance would allow.   In English, this seems blindingly obvious.  “Of course,” one is tempted to say, “‘hou’ will be followed by ‘se’.  That’s because we talk and write in words.”  But, tempting as that may be, it’s not an explanation because it’s a circular argument that assumes its own answer.  A good way to answer the question is to harness a statistician to a computer and feed both on a diet of sounds that a baby might hear. Certain combinations of sounds then do turn up more often than others.  In English, these common combinations contain a lot of simple words.
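One way to make the "statistician and computer" idea concrete is to estimate how predictable each sound is from the one before it. The sketch below is only an illustration under stated assumptions: the three nonsense "words" and their order are invented for this example, and `transition_probs` is a hypothetical helper, not a description of any particular published method. It shows that syllables inside a recurring chunk predict each other better than syllables that straddle a chunk boundary.

```python
from collections import Counter

def transition_probs(stream):
    """Estimate P(next | current) from adjacent pairs in a sequence."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Three invented nonsense "words", each made of three syllables (assumption).
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]
# An arbitrary order of word tokens; the stream itself carries no boundaries.
ORDER = [0, 1, 2, 0, 2, 1, 0]
stream = [syllable for i in ORDER for syllable in WORDS[i]]

probs = transition_probs(stream)
within = probs[("bi", "da")]   # both syllables belong to the same "word"
across = probs[("ku", "pa")]   # "ku" ends one "word", "pa" starts another
print(within, across)          # prints: 1.0 0.5
```

Nothing in the input marks where a "word" starts or stops, so the explanation is not circular: the clumps show up in the counts alone.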

For the case of “house” and “elephant” and many other English words, all of these rules agree.   But that’s not true for every word in the language.  Take “bookcase”.    It has a clear meaning, but it is made of the words “book” and “case”, so it’s not a word by the second definition, even though it meets the other three.

Or, take “dental floss”.  That’s not normally thought of as a single word, but the two parts occur together far more often than chance.  “Dental” is not a common word (it is 0.001% as common as “the”), but “dental floss” is more common than “the floss”; the two words stick to each other amazingly well.  Statistically, “dental floss” can be thought of as a single word with a space in it.  Much like “bookcase”, wherever you find one part, you usually find the other.  Here in the UK, the pair “dental floss” is associated almost as strongly as “candy floss”, which acts as a compound word, and is often spelled with a hyphen, or with the two parts jammed together.
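"More often than chance" can be given a number. One common choice (an assumption here; the article does not say which statistic was used) is pointwise mutual information, which compares how often a pair occurs with how often it would occur if the two words were independent. The counts below are invented purely for illustration and are not measurements from any corpus.

```python
import math

def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information, in bits, of an adjacent word pair."""
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    return math.log2(p_pair / (p_first * p_second))

# Invented counts for a hypothetical corpus (not real data).
TOTAL = 2 ** 20
COUNTS = {"the": 65536, "dental": 16, "floss": 8}
PAIRS = {("dental", "floss"): 4, ("the", "floss"): 1}

dental_floss = pmi(PAIRS[("dental", "floss")], COUNTS["dental"], COUNTS["floss"], TOTAL)
the_floss = pmi(PAIRS[("the", "floss")], COUNTS["the"], COUNTS["floss"], TOTAL)
print(dental_floss, the_floss)   # prints: 15.0 1.0
```

With these made-up numbers, "the floss" occurs only slightly more often than chance predicts, while "dental floss" is strongly associated even though "dental" is rare.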

Another English “word” that doesn’t match all the definitions is “of”.  It doesn’t have a clear meaning of its own, so it fails the second definition.

If we look back before about 700 CE, the white space criterion fails because spaces were not used in Latin.  (And, before the 1800s, it failed for most people because not many people knew how to write and there are no white spaces in speech.)  Even now there are compound words that can be written either with or without a space in the middle: “sandstorm”, “paper clip”, “whitespace” and “downstairs” are examples.  “Aircraft” is now spelled without a space, but before 1930, its common spelling had a space.

So, in English, this four-way definition of a word is just an approximation.  There are disagreements between the four definitions, but the disagreements are few enough that the idea of a word is useful.

However, other languages do things differently.  Some are much more enthusiastic about forming compound words; others might avoid spaces.  Some might have many more fragments that can be used to construct words than English does.  In many languages, the disagreements among the four ways of defining a word can be severe and frequent.  When that happens, the language doesn’t really have “words” any more.  This doesn’t mean that the language is entirely alien and unimaginable.  Rather, it can mean that the language makes heavy use of something that exists, but is rare, in English.  (Or, from a foreign point of view, English only makes weak use of some things that can be important parts of other languages.)

Mandarin (Chinese) is an example of a language which is brimming with compound words.  For example, the word “bookcase” comes from the combination of shū (book) and jià (case), the word “computer” is from diàn (electronic) and nǎo (brain), and the word “airplane” is from fēi (flying) and jī (machine).  Chinese also makes compounds out of two syllables with similar meanings.  For example, the word “debate” is the combination of biàn (debate) and lùn (discuss), and the word “clean” is qīng (clear) and jié (clean).  Individually, these compounds are really not much different from “bookcase” or “aircraft” in English; the difference is that most Mandarin “words” are formed this way, instead of just a few in English.  In Mandarin, the boundary between word and compound is blurry.

So, if we follow our second definition (something that has meaning and isn’t made of words), then we’d have to decide that most “words” in Mandarin have just one syllable.  But these compounds tend to act as units, so the other definitions point to most Mandarin words having two syllables.  The definitions do not agree, and this makes many linguists reluctant to talk about words in Mandarin.  And it’s not just linguists: if you ask several native Mandarin speakers to chop a typical Mandarin sentence into words, there will probably be disagreement about some of the words.  Usually, someone will want to break a four-syllable “word” into a pair of two-syllable “words”, or to break a two-syllable “word” into a pair of single syllables.

So, while an English writer may need to think about choosing between “sea shore” or “seashore”, a Chinese writer doesn’t try to make a distinction between “words” and compounds.  All are written the same way; words are not set apart by white space in writing.  So, for Mandarin, the “…surrounded by spaces…” definition isn’t very useful at all.
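The failure of the white space criterion is easy to see mechanically. The snippet below is a minimal sketch: `whitespace_words` is a hypothetical helper that simply applies the first criterion, and the two sentences are examples chosen for this illustration (the Mandarin one is a rough equivalent of the English one).

```python
def whitespace_words(text):
    """Apply the first criterion: words are runs of characters between white space."""
    return text.split()

english = "I like reading books"
mandarin = "我喜欢读书"   # rough equivalent of the English sentence

print(len(whitespace_words(english)))    # prints: 4
print(len(whitespace_words(mandarin)))   # prints: 1
```

The English sentence falls apart into four pieces, while the Mandarin sentence comes back as a single unbroken chunk, so any division into "words" has to come from one of the other criteria.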

German is also full of compound words such as “Schaumgummimatratze” (foam rubber mattress), which can easily be split into three parts corresponding to the words in the English translation.  So, Mandarin is hardly unique.

Other languages compose large word-like things out of small bits.  These are “synthetic” or “polysynthetic” languages.  In these languages, things you might think of as a word can have seven syllables and be constructed from several ideas, carrying as much information as a phrase in English.  But, unlike Mandarin, fragments as small as a single sound (like “-t-”) can carry a meaning.  (Much like the possessive apostrophe-“s” in English.)  In these languages, the small fragments aren’t really words because they fail the test of mixing and matching, and they fail the white space test.  On the other hand, the large compositions aren’t good words because they fail the second test (“small chunks and not made of words”).

So, what is a word?  In many languages, that’s just not a good question to ask.  And, what would language be like without words?  It would be a little like Mandarin, a little like Turkish, a little like Mohawk.  It would also be a little like English, if you exaggerated English in the right directions.  So, back at the research project, we decided not to use the word “word” in our descriptions of languages, because even though it may work well for Russian, it’s a sloppy idea for English and doesn’t work very well at all for Mandarin.

Thanks to John Coleman and Chilin Shih for comments and information.