<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Science and Language</title>
	<atom:link href="http://kochanski.org/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://kochanski.org/blog</link>
	<description>Slow blogging from the research side</description>
	<lastBuildDate>Sun, 15 Aug 2010 12:19:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Nature vs. Nurture and Linguistic Universals</title>
		<link>http://kochanski.org/blog/?p=399</link>
		<comments>http://kochanski.org/blog/?p=399#comments</comments>
		<pubDate>Sun, 15 Aug 2010 12:19:44 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[science and how it works]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[nature nurture]]></category>
		<category><![CDATA[prediction]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[UG]]></category>
		<category><![CDATA[universals]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=399</guid>
		<description><![CDATA[How much of human language is wired into our brains and part of the heritage of our biological evolution?  How much is a product of our culture and learned in childhood?  This is a long-standing question, and an important part of understanding ourselves. A good way to approach this question is to look for linguistic [...]]]></description>
			<content:encoded><![CDATA[<p>How much of human language is wired into our brains and part of the heritage of our biological evolution?  How much is a product of our culture and learned in childhood?  This is a long-standing question, and an important part of understanding ourselves.</p>
<p>A good way to approach this question is to look for linguistic universals.  A universal is something that is found in (almost) all languages.</p>
<p style="padding-left: 30px;"><span style="color: #003300;">It is an odd use of the word that universals don&#8217;t need to apply quite universally.  While odd, it is fairly sensible for three reasons.  First, there are a lot of languages out there, and many of them haven&#8217;t been studied very carefully.  So, we don&#8217;t really know all the world&#8217;s languages.   Second, there are many languages that have just gotten a season or two&#8217;s fieldwork by a graduate student, and a few of those quick studies will have reached the wrong conclusions.  So, the term has to allow for some mistakes.</span></p>
<p style="padding-left: 30px;"><span style="color: #003300;">The third reason is more subtle and interesting: culture can sometimes override biology.   Oh, not for everything: a cultural decision to breathe water wouldn&#8217;t cause it to happen.  It&#8217;d just cause everyone to lie about it, as one sees in Stanislaw Lem&#8217;s book &#8220;The Star Diaries&#8221; (Mariner Books, 1985, ISBN-10: 0156849054, ISBN-13: 978-0156849050).  But sometimes our physiology doesn&#8217;t constrain us completely: it just makes one behaviour easier/faster/better than another.  That&#8217;s the interesting middle ground where culture can sometimes override biology, but you can still have a universal preference for one behaviour over another.</span></p>
<p>The logic behind the connection between universals and the nature/nurture question is that human biology is fairly uniform, so that biologically constrained aspects of language would be expected to be much the same from one language to another.  Shared features are not proof of a biological origin since culturally determined aspects can spread due to contact, trade and politics.</p>
<p>However, any uniformity we see carries a suggestion that biology is important.  To pick one example, Martian linguists visiting the Earth might reasonably guess that the universal fact that females speak with a higher fundamental frequency than males is caused by an underlying biological difference.  And, of course, they would be correct, even though there are a few high-pitched males and low-pitched women.</p>
<p>On the other side of the coin, substantial differences in linguistic implementation are strong evidence for the importance of culture.  We can be fairly certain that there is no biological basis behind the word we choose to express the color blue.  Once you get outside the family of related European languages, the word varies wildly.</p>
<p style="padding-left: 30px;"><span style="color: #003300;">Some of you may ask &#8220;Well, what about Chomsky and his Universal Grammar hypothesis?&#8221;   All I can say is that it is best interpreted as a metaphor for a mixture of biological and cultural factors.  That hypothesis doesn&#8217;t seem to connect to modern neuroscience.  It&#8217;s also a bit vague: how many switches are there?   What do they switch?   And it sweeps a lot of its limitations under the rug, under the guise of distinguishing between performance and competence.  This may be the subject of a future blog post.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=399</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lost track of my algorithm</title>
		<link>http://kochanski.org/blog/?p=386</link>
		<comments>http://kochanski.org/blog/?p=386#comments</comments>
		<pubDate>Wed, 11 Aug 2010 23:33:41 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[publishing and copyright]]></category>
		<category><![CDATA[science and how it works]]></category>
		<category><![CDATA[techniques]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[citations]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[errors]]></category>
		<category><![CDATA[reproducibility]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[svn]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[version control]]></category>
		<category><![CDATA[who-watches-the-watchers]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=386</guid>
		<description><![CDATA[In a recent paper, I seem to have misled anyone who might try to replicate my work.  The paper is here, and the full reference is: &#8220;Long-Range Prosody Prediction and Rhythm&#8221;, Greg Kochanski, Anastassia Loukina, Elinor Keane, Chilin Shih and Burton Rosner, University of Oxford, Speech Prosody 2010 100222:1-4. (Note that &#8220;100222&#8221; is the volume [...]]]></description>
			<content:encoded><![CDATA[<p>In a recent paper, I seem to have misled anyone who might try to replicate my work.  The paper is <a href="http://kochanski.org/gpk/papers/2010/SpeechProsody/">here</a>, and the full reference is:</p>
<p>&#8220;Long-Range Prosody Prediction and Rhythm&#8221;, Greg Kochanski, Anastassia Loukina, Elinor Keane, Chilin Shih and Burton Rosner, University of Oxford, <em>Speech Prosody 2010</em> 100222:1-4. <span style="color: #008000;">(Note that &#8220;100222&#8221; is the volume number.)</span></p>
<p>The problem is in the sentence &#8220;&#8230;where <em>D</em> is the running duration measure from [3][9]&#8221; in Section 2.2, point 2.  Reference [3] is to a 2005 paper of mine that describes an algorithm for computing running duration.  <span style="color: #003300;">(G. Kochanski, E. Grabe, J. Coleman, and B. Rosner, “Loudness predicts prominence: Fundamental frequency lends little,” J. Acoustical Society of America, vol. 118, no. 2, pp. 1038–1054, 2005; available <a href="http://kochanski.org/gpk/papers/2005/04pnp.pdf">here</a>.)</span> Unfortunately, it&#8217;s <em>not</em> the same algorithm.  During the last four years, I had changed the algorithm that I routinely use.</p>
<p>The idea behind the algorithm is that it continuously measures how long each sound is stable.  If you make an &#8220;oooo&#8221; sound that lasts 2.0 seconds, the algorithm should give a value near 2.0 at any moment within that sound.   Then, if you follow it with a shorter &#8220;w&#8221; sound, the value the algorithm produces should be small inside the &#8220;w&#8221;.</p>
<div id="attachment_391" class="wp-caption alignnone" style="width: 460px"><img class="size-large wp-image-391" title="Running duration explanation" src="http://kochanski.org/blog/wp-content/uploads/2010/08/explanation-1024x541.jpg" alt="Schematic plot of how the running duration algorithm works." width="450" height="237" /><p class="wp-caption-text">Some words, and the running duration values that the algorithm might produce (schematic).</p></div>
<p>For those of you who like to think in terms of phonemes, you can approximate it as a bit of pseudocode:</p>
<pre>for all times t in the data file {
    find out which phoneme you are in at time t
    the returned value at time t is the duration of that phoneme.
    }</pre>
<p><span style="color: #003300;">(I emphasize that the above pseudocode is wrong in every detail.  It&#8217;s just intended to help get your brain around the idea of a continuously changing measurement of duration.  The real algorithm can be found <a href="http://kochanski.org/gpk/code/speechresearch/voicing/">here</a> by going to the &#8220;pseudoduration&#8221; script.)</span></p>
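<p>For readers who would rather run something, the same deliberately oversimplified idea can be sketched in Python.  This only illustrates the pseudocode above; it is not the real &#8220;pseudoduration&#8221; script.  The names SEGMENTS and running_duration are hypothetical, invented for this example, and the sketch assumes that a phoneme segmentation is already available.</p>
<pre>
```python
from bisect import bisect_right

# Hypothetical segmentation: (start_time, end_time, label), in seconds.
# These values are made up to mirror the "oooo" followed by "w" example.
SEGMENTS = [(0.0, 2.0, "oo"), (2.0, 2.1, "w"), (2.1, 2.6, "a")]


def running_duration(segments, t):
    """Return the duration of the segment that contains time t.

    Segments must be sorted by start time and must not overlap.
    """
    starts = [start for start, _end, _label in segments]
    # Index of the last segment whose start time is not after t.
    i = bisect_right(starts, t) - 1
    if i == -1:
        raise ValueError("t is before the first segment")
    start, end, _label = segments[i]
    if t >= end:
        raise ValueError("t is not inside any segment")
    return end - start


# One value per sample time, as in the loop of the pseudocode.
sample_times = [0.5, 1.5, 2.05, 2.3]
values = [running_duration(SEGMENTS, t) for t in sample_times]
```
</pre>
<p>With these made-up segments, any time inside the long vowel gives 2.0, and any time inside the short &#8220;w&#8221; gives a value near 0.1.</p>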
<p>Here&#8217;s a comparison between the old algorithm (<span style="color: #993300;">red</span>) and the new one (<span style="color: #000080;">blue</span>).  You can see that the two algorithms match pretty well in the loud regions, but are drastically different in the silences.    <span style="color: #003300;">(Silences are the vertical dark regions (the darkness shows the loudness at each point), as you can see from the green speech waveform.  This is computed from the l-reg3-f1 audio file in the IViE corpus; it&#8217;s the same audio file used in Figure 1 of the 2005 paper.)</span></p>
<div id="attachment_393" class="wp-caption alignnone" style="width: 610px"><img class="size-full wp-image-393    " title="comparison" src="http://kochanski.org/blog/wp-content/uploads/2010/08/comparison.png" alt="The red (2005 paper) and blue (2010 paper) curves track each other fairly well, except in the silences.  This shows three seconds of audio." width="600" height="420" /><p class="wp-caption-text">Comparison between 2005 (red) and 2010 (blue) versions of the running duration algorithm. The speech waveform is in green.</p></div>
<p>The new algorithm gives a more sensible value in silences: its value is approximately equal to the length of the silence.  The old version just gave a small value near zero.  The new algorithm is also written to be closer to the way the ear processes sound: it uses frequency bands that match the ear&#8217;s and includes the knowledge that perceived loudness is approximately the cube root of the acoustic power in each band.  So, I think it&#8217;s an improvement.</p>
<p>But, certainly, it&#8217;s not the same algorithm, so inserting reference [3] was not correct.  A better sentence would have been &#8220;&#8230;where <em>D</em> is the running duration measure from [9], which was evolved from the algorithm presented in [3]&#8221;.</p>
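<p>As a rough numerical illustration of the cube-root rule, here is a tiny Python sketch.  It shows only that one step: the band powers are made-up numbers, and band_loudness is a hypothetical helper, not a function from the real code.</p>
<pre>
```python
def band_loudness(band_powers):
    """Approximate perceived loudness as the cube root of the power in each band."""
    return [power ** (1.0 / 3.0) for power in band_powers]


# Made-up acoustic powers in three frequency bands (arbitrary units).
powers = [1.0, 8.0, 1000.0]
loudness = band_loudness(powers)
```
</pre>
<p>Under this approximation, a band with eight times the power sounds only about twice as loud, and a thousand-fold increase in power sounds only about ten times as loud.</p>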
<p>How did it happen?  Primarily, I was in a rush.  The scientific world is very much a case of &#8220;publish or perish&#8221;, and the deadline for a good conference was approaching.  It was very easy just to grab the reference where I had first described the idea, and not ask myself if it had changed since then.</p>
<p>Partially, I was thinking about saving space.  These conference papers are strictly limited to four pages, which is not a lot to describe an experiment that involves complicated software.  Add eight words here, and you have to remove eight words somewhere else.  So, one trims the descriptions to the bone, ideally, leaving just enough for another expert to understand.  This time, it was trimmed a bit too much.</p>
<p>Loose ends:</p>
<ul>
<li>I haven&#8217;t checked the algorithm in reference [9] to make sure it&#8217;s identical to the one described here.  That&#8217;ll involve excavating an old version of my code.   That&#8217;s possible: everything is kept in a Subversion repository (except the really old stuff which is in CVS), so I have all the old versions.   Subversion lets me know what code was being used on what day (almost).   The &#8220;almost&#8221; revolves around the fact that Subversion keeps track of the code that you have committed to a repository, not the actual code that is doing the computation, so there will still be some ambiguity.</li>
<li><a href="http://www.ling.upenn.edu/~myl/">Mark Liberman</a> tells me that the value of &#8220;C&#8221; in the 2005 published description is wrong.  That seems to be true, but I haven&#8217;t yet sorted out what the correct value is.  I have done some excavations in CVS and the description of the old algorithm seems correct, but the &#8220;C&#8221; value doesn&#8217;t match.  My suspicion is that &#8220;C&#8221; may have come from some intermediate version of the code, afterwards, when I was writing the paper.   However, the figures in the 2005 paper (e.g. Figure 1) were certainly computed with the same algorithm that was used to produce the results.  So, I believe the 2005 paper is entirely consistent, except that &#8220;C&#8221; is wrong.  I don&#8217;t think this problem affects any of the conclusions; it just makes it harder for someone to reproduce them.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=386</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What would language be like without words?</title>
		<link>http://kochanski.org/blog/?p=375</link>
		<comments>http://kochanski.org/blog/?p=375#comments</comments>
		<pubDate>Sat, 24 Jul 2010 19:50:07 +0000</pubDate>
		<dc:creator>aloukina</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[techniques]]></category>
		<category><![CDATA[Chinese]]></category>
		<category><![CDATA[compound words]]></category>
		<category><![CDATA[definitions]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Mandarin]]></category>
		<category><![CDATA[Mohawk]]></category>
		<category><![CDATA[morpheme]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[syllable]]></category>
		<category><![CDATA[Turkish]]></category>
		<category><![CDATA[words]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=375</guid>
		<description><![CDATA[Anastassia Loukina and Greg Kochanski As part of a research project, we are looking to describe speech in ways that can be applied uniformly across the whole variety of human languages.  This might seem to be a straightforward task, but it isn&#8217;t.  One of the reasons is that familiar terms in English often don&#8217;t apply [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Anastassia Loukina and Greg Kochanski</strong></p>
<p>As part of a research project, we are looking to describe speech in ways that can be applied uniformly across the whole variety of human languages.  This might seem to be a straightforward task, but it isn&#8217;t.  One of the reasons is that familiar terms in English often don&#8217;t apply well in other languages.  Even the term “word” causes trouble.</p>
<p>The notion of &#8216;word&#8217; (&#8216;<span style="color: #800000;">lexis</span>&#8217;), like many other linguistic terms we still use today, appeared in the writings of ancient philosophers.  The Alexandrian grammarians defined the word as the smallest element of a sentence which has a certain meaning.  Even though this definition is rather vague, ancient grammarians would have had little trouble breaking a sentence into separate words.  That&#8217;s because in languages like Classical Greek or Latin most words are easily identified by grammatical markers which characterize the whole word.  For example, the ending of a noun would indicate the case and number, while the ending of a verb would tell us about the tense, voice, mood and other properties.  Thus &#8216;<span style="color: #800000;">numerus impar</span>&#8217; (&#8216;odd number&#8217;) clearly constitutes two words, since if we wanted to say &#8216;odd numbers&#8217;, each of the words would acquire its own plural marker and thus we would have &#8216;<span style="color: #800000;">numeri impares</span>&#8217;.</p>
<p>The model of language description developed in Antiquity for Greek and Latin was used in modern times to describe modern European languages as well as other world languages.  However, a model developed for one language does not always work well for other languages.  The definition of a word as something with a set of its own grammatical markers works well for many modern languages such as Russian, Spanish or Arabic. For example, the plural of Spanish &#8216;<span style="color: #800000;">número impar</span>&#8217; is &#8216;<span style="color: #800000;">números impares</span>&#8217;.</p>
<p>However, even in English such a grammatical criterion hardly works. &#8216;Odd numbers&#8217; is not really different from &#8216;oddballs&#8217; – in both cases only the second element has a plural marker. However, &#8216;oddball&#8217; is spelled without spaces and features in the Oxford English Dictionary as a separate word, while most people would agree that &#8216;odd number&#8217; consists of two words. In fact, if one looks at languages other than English, it gradually becomes apparent that “What is a word?” is a bad question.  It&#8217;s not that it has no answer; instead, the problem is that it has several reasonable answers that often disagree.</p>
<p>I won&#8217;t try to give a dictionary definition of what a word is any more than I could give a dictionary definition of what  life is.  In either case, the best you can do is to list some examples and some important properties. Ultimately, a word is something that behaves like other words.</p>
<p>Words, in English, are a coincidence of four separate ways of looking at language:</p>
<ol>
<li>Words are groups of letters surrounded by white space.</li>
<li>Words are small chunks of language that have a clear meaning and are not made of words.  <span style="color: #008000;">(If this were a technical paper, I&#8217;d mention &#8220;morphemes&#8221; here.)</span></li>
<li>Words are the things that you can string together with a lot of freedom, and where many of the combinations make some kind of sense.</li>
<li>Words are clumps of symbols (or sounds) that often occur together.</li>
</ol>
<p>All of these criteria are a little bit fuzzy, but in English they all point to the same group of things: what we call words.<span style="color: #008000;"> (These definitions work pretty well for English, but remember that they are designed for English.  They are not universal, God-given axioms.)</span></p>
<p>By the first definition, &#8220;house&#8221; is a word because it is normally written with white space around it.  It&#8217;s also a word by the second definition because it has a clear meaning: a roof with some walls, and furniture inside.  &#8220;House&#8221; is a unit where the pieces don&#8217;t have meaning on their own: there&#8217;s nothing much to say about &#8220;hou&#8221; or &#8220;se&#8221; or &#8220;ous&#8221;. <span style="color: #008000;"> (We can neglect the occasional fragment that turns out to be an unrelated word, like &#8220;Ho!&#8221;.  This just goes to show that language wasn&#8217;t designed by logicians or computer programmers.)</span></p>
<p>&#8220;Runner&#8221; is another example of the second definition.  Even though you can split it into two pieces that each mean something (&#8220;run&#8221; + a person who does it), the second part (&#8220;er&#8221;) isn&#8217;t a word on its own.</p>
<p>Words can also be defined by their ability to be combined to make complete thoughts and sentences.  While some combinations of words are prohibited by grammar rules, a remarkable number of combinations are grammatical, and even more combinations will get an idea across even if their grammar isn&#8217;t quite right.  It may look a little funny in print, but if your partner asks you where her book has gotten to, and you say &#8220;Sofa.  Under.&#8221;, the message will be understood: she&#8217;ll look under the sofa and may not even notice that you weren&#8217;t grammatical.</p>
<p>And, fourth, the letters or sounds of a word are found together more often than chance would allow.   In English, this seems blindingly obvious.  &#8220;Of course,&#8221; one is tempted to say, &#8220;&#8216;hou&#8217; will be followed by &#8216;se&#8217;.  That&#8217;s because we talk and write in words.&#8221;  But, tempting as that may be, it&#8217;s not an explanation because it&#8217;s a circular argument that assumes its own answer.  A good way to answer the question is to harness a statistician to a computer and feed both on a diet of sounds that a baby might hear. Certain combinations of sounds then do turn up more often than others.  In English, these common combinations contain a lot of simple words.</p>
<p>For the case of &#8220;house&#8221; and “elephant” and many other English words, all of these rules agree.   But that&#8217;s not true for every word in the language.  Take &#8220;bookcase&#8221;.    It has a clear meaning, but it is made of the words &#8220;book&#8221; and &#8220;case&#8221;, so it&#8217;s not a word by the second definition, even though it meets the other three.</p>
<p>Or, take “dental floss”.   That&#8217;s not normally thought of as a single word, but the two parts occur together far more often than chance.  “Dental” is not a common word (it&#8217;s 0.001% as common as “the”) but “dental floss” is more common than “the floss”; the two words stick to each other amazingly well.  Statistically, “dental floss” can be thought of as a single word with a space in it.  Much like “bookcase”, wherever you find one part, you usually find the other.  Here in the UK, the pair “dental floss” is associated almost as strongly as “candy floss”, which acts as a compound word, and is often spelled with a hyphen, or with the two parts jammed together.</p>
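<p>One common way to put a number on how strongly two words stick together is pointwise mutual information (PMI): the logarithm of how much more often the pair occurs than it would if the two words were independent.  The Python sketch below is only an illustration under that assumption.  PMI is our choice for this example and is not necessarily the statistic behind the figures quoted above; the tiny corpus is invented, and the function name pmi is hypothetical.</p>
<pre>
```python
import math
from collections import Counter


def pmi(tokens, first, second):
    """Pointwise mutual information (in bits) of the adjacent pair (first, second)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(first, second)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs in this corpus
    p_pair = pair_count / (len(tokens) - 1)
    p_first = unigrams[first] / len(tokens)
    p_second = unigrams[second] / len(tokens)
    return math.log2(p_pair / (p_first * p_second))


# Invented toy corpus; a real estimate would need millions of words.
text = ("the dentist said dental floss helps . "
        "the dog ate the dental floss . the cat saw the dog")
tokens = text.split()

sticky = pmi(tokens, "dental", "floss")  # words that almost always co-occur
loose = pmi(tokens, "the", "dog")        # a common word followed by a noun
```
</pre>
<p>In this toy corpus both pairs occur twice, but &#8220;dental floss&#8221; scores higher than &#8220;the dog&#8221; because &#8220;the&#8221; turns up in many other places while &#8220;dental&#8221; almost never appears without &#8220;floss&#8221;.</p>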
<p>Another English &#8220;word&#8221; that doesn&#8217;t match all the definitions is &#8220;of&#8221;.  It doesn&#8217;t have a clear meaning of its own, so it fails the second definition.</p>
<p>If we look back before about 700 CE, the whitespace criterion fails because spaces were not used in Latin.  <span style="color: #008000;">(And, before the 1800s, it failed for most people because not many people knew how to write and there are no white spaces in speech.)</span> Even now there are compound words that can be written either with or without a space in the middle: &#8220;sandstorm&#8221;, &#8220;paper clip&#8221;, &#8220;whitespace&#8221; and &#8220;downstairs&#8221; are examples.  &#8220;Aircraft&#8221; is now spelled without a space, but before 1930, its common spelling had a space.</p>
<p>So, in English, this four-way definition of a word is just an approximation. There are disagreements between the four definitions, but the disagreements are few enough so that the idea of a word is useful.</p>
<p>However, other languages do things differently.   Some are much more enthusiastic about forming compound words; others might avoid spaces.  Some might have many more fragments that can be used to construct words than English does.  In many languages, the disagreements among the four ways of defining a word can be severe and frequent.  When that happens, the language doesn&#8217;t really have “words” any more.  That doesn&#8217;t mean that the language is entirely alien and unimaginable.   Rather, it can mean that the language makes heavy use of something that exists, but is rare, in English.  <span style="color: #008000;">(Or, from a foreign point of view, English only makes weak use of some things that can be important parts of other languages.)</span></p>
<p>Mandarin (Chinese) is an example of a language which is brimming with compound words.   For example, the word “bookcase” comes from the combination of <span style="color: #800000;">shû</span> (book) and <span style="color: #800000;">jià</span> (case), the word “computer” is from <span style="color: #800000;">diân</span> (electronic) and <span style="color: #800000;">nao<span style="color: #999999;">[tone 3]</span></span> (brain), and the word “airplane” is from <span style="color: #800000;">fêi</span> (flying) and <span style="color: #800000;">jî</span> (machine).  Chinese also makes compounds out of two syllables with similar meanings.  For example, the word “debate” is the combination of <span style="color: #800000;">biàn</span> (debate) and <span style="color: #800000;">lùn</span> (discuss), and the word “clean” is <span style="color: #800000;">qîng</span> (clear) and <span style="color: #800000;">jié</span> (clean).  Individually, these compounds are really not much different from “bookcase” or “aircraft” in English; the difference is that most Mandarin “words” are formed this way, instead of just a few in English.  In Mandarin, the boundary between word and compound is blurry.</p>
<p>So, if we follow our second definition (something that has meaning and isn&#8217;t made of words), then we&#8217;d have to decide that most “words” in Mandarin have just one syllable.  But, these compounds tend to act as units so the other definitions point to most Mandarin words having two syllables.  The definitions do not agree, and this makes many linguists reluctant to talk about words in Mandarin.  And, it&#8217;s not just linguists: if you ask several native Mandarin speakers to chop a typical Mandarin sentence into words, there will probably be disagreement about some of the words.  Usually, someone will want to break a four-syllable “word” into a pair of two-syllable “words”, or to break a two-syllable “word” into a pair of single syllables.</p>
<p>So, while an English writer may need to think about choosing between “sea shore” or “seashore”, a Chinese writer doesn&#8217;t try to make a distinction between “words” and compounds.   All are written the same way; words are not set apart by white-space in writing.   So, for Mandarin, the “&#8230;surrounded by spaces&#8230;” definition isn&#8217;t very useful at all.</p>
<p>German is also full of compound words such as <span style="color: #800000;">“Schaumgummimatratze”</span> (foam rubber mattress), which can easily be split into three parts corresponding to the words in the English translation.  So, Mandarin is hardly unique.</p>
<p>Other languages compose large word-like things out of small bits.  These are “synthetic” or “polysynthetic” languages.   In these languages, things you might think of as a word can have seven syllables and be constructed from several ideas, carrying as much information as a phrase in English.   But, unlike Mandarin, fragments as small as a single sound (like “-t-”) can carry a meaning.   <span style="color: #008000;">(Much like the possessive apostrophe-”s” in English.)</span> In these languages, the small fragments aren&#8217;t really words because they fail the test of mixing and matching, and they fail the white space test.   On the other hand, the large compositions aren&#8217;t good words because they fail the second test (“small chunks and not made of words”).</p>
<p>So, what is a word?   In many languages, that&#8217;s just not a good question to ask.    And, what would language be like without words?   It would be a little like Mandarin, a little like Turkish, a little like Mohawk.  It would also be a little like English, if you exaggerated English in the right directions.  So, back at the research project, we decided not to use the word “word” in our descriptions of languages, because even though it may work well for Russian, it&#8217;s a sloppy idea for English and doesn&#8217;t work very well at all for Mandarin.</p>
<p><em>Thanks to <a href="http://www.phon.ox.ac.uk/coleman">John Coleman</a> and <a href="https://netfiles.uiuc.edu/cls/www/">Chilin Shih</a> for comments and information.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=375</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>British Association of Academic Phoneticians</title>
		<link>http://kochanski.org/blog/?p=368</link>
		<comments>http://kochanski.org/blog/?p=368#comments</comments>
		<pubDate>Tue, 18 May 2010 09:07:02 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[kudos]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[people]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=368</guid>
		<description><![CDATA[The following youngish people impressed me at the conference: Claire Nance (Glasgow) Ruth Cumming Timothy Mills Tom Starr-Marshal Emina Kurtic (Sheffield) That&#8217;s not to say there weren&#8217;t other impressive people there; but these are the people who (a) I noticed, and (b) I managed to write down, and (c) were people I didn&#8217;t know already. [...]]]></description>
			<content:encoded><![CDATA[<p>The following youngish people impressed me at the conference:</p>
<ul>
<li>Claire Nance (Glasgow)</li>
<li>Ruth Cumming</li>
<li>Timothy Mills</li>
<li>Tom Starr-Marshal</li>
<li>Emina Kurtic (Sheffield)</li>
</ul>
<p>That&#8217;s not to say there weren&#8217;t other impressive people there; but these are the people who (a) I noticed, (b) I managed to write down, and (c) I didn&#8217;t know already.</p>
<p>BAAP is a friendly conference.   It&#8217;s not a &#8220;hot&#8221; conference, but it encourages student presentations, some interesting stuff appears, and it&#8217;s a good place to talk to people.  It&#8217;s low-key: people aren&#8217;t so busy bragging and advertising as they are at many conferences.  And, it&#8217;s a small conference so you can talk to nearly everyone and see just about all that&#8217;s going on.</p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=368</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nifty instructional technology</title>
		<link>http://kochanski.org/blog/?p=363</link>
		<comments>http://kochanski.org/blog/?p=363#comments</comments>
		<pubDate>Fri, 16 Apr 2010 05:07:39 +0000</pubDate>
		<dc:creator>hadmin</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[techniques]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[wild ideas]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[feedback]]></category>
		<category><![CDATA[gadgets]]></category>
		<category><![CDATA[impact]]></category>
		<category><![CDATA[man-in-the-loop]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=363</guid>
		<description><![CDATA[I&#8217;ve been taking my daughter on a tour of U.S. universities, because next autumn, she&#8217;ll be applying to them.  And some of these universities let you sit in on some of their classes: among the ones we visited, Stanford, Princeton, MIT, and Harvard do.  And much of it is what you&#8217;d expect: the historians are [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been taking my daughter on a tour of U.S. universities, because next autumn, she&#8217;ll be applying to them.  And some of these universities let you sit in on some of their classes: among the ones we visited, Stanford, Princeton, MIT, and Harvard do.  And much of it is what you&#8217;d expect: the historians are excellent story tellers, and the statisticians are always struggling to pass an appreciation of their complex subject on to people who would rather just use it as a tool.</p>
<p>But there was one nifty gadget at MIT (where else?).  The obvious bits are the little 12-button &#8220;remote controls&#8221; at all the tables.  There&#8217;s one per student and a few extras.   The professor uses them to do mini-surveys to see how many people are following the lecture.   He pops up a slide with a multiple-choice question, a timer ticks down for 30 seconds, then up pops the percentage of people who voted for each answer.</p>
<p>To the system, students are anonymous: this is not part of the grading process.  It is there to provide feedback for the professor; it is part of the learning process.  <span style="color: #008000;">(People often confuse grading and learning, perhaps because both happen in schools, but they have very different goals and are often in conflict.)</span></p>
<p>The system was impressive: the first time, 35% of the students got it right (chance=25%), and the professor groaned and said &#8220;I&#8217;m not going to ask how many of you did the reading.&#8221;  Then he did a whiteboard example on the topic.  <span style="color: #008000;">(The guy had a very good heart-felt groan, and he did a good job of adapting the class on the fly to the survey results.)</span> And the example worked (or perhaps the students got their brains in gear) because the later mini-surveys got about 85% correct (these were different questions, but the same general topic).</p>
<p>This system sounds like good educational technology.   You can know within a minute whether everyone is confused or not.  We had four questions during a 90-minute class.    The class were comfortable with the system; they&#8217;d have quick little discussions with their neighbours, and 80% or 90% would answer.  It worked extremely smoothly.</p>
<p>Knowing whether people are confused or not should be very useful.  Even in small classes where you can read body language and ask questions, it can be hard to tell.  In a lecture, it is harder.  It is easy to imagine that it could improve teaching by 10%, 20%, maybe more, just by reducing the amount of time that people spend in a confused state. <span style="color: #008000;"> (Though you&#8217;ll never eliminate all confusion.  In this class, we actually had two heart-felt groans, but the second one happened after the professor did the survey problem wrong, and before he sorted himself out.  Oh well.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=363</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Random interesting bits.</title>
		<link>http://kochanski.org/blog/?p=354</link>
		<comments>http://kochanski.org/blog/?p=354#comments</comments>
		<pubDate>Mon, 11 Jan 2010 09:11:39 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[accounting]]></category>
		<category><![CDATA[self-delusion]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=354</guid>
		<description><![CDATA[Humans are good at finding patterns &#8230;even when none really exist.  For instance, in the locations of Woolworth stores, prehistoric sites, or constellations. Where did the money go? Wealth is a state of mind.]]></description>
			<content:encoded><![CDATA[<h2>Humans are good at finding patterns</h2>
<p>&#8230;even when none really exist.  For instance, in the locations of <a href="http://timesonline.typepad.com/science/2010/01/aliens-with-a-taste-for-pick-n-mix-woolworths-stores-follow-uncanny-geometrical-patterns.html">Woolworth</a> stores, <a href="http://www.metro.co.uk/news/807855-did-prehistoric-satnav-help-britons-find-their-way">prehistoric sites</a>, or <a href="http://www.windows.ucar.edu/tour/link=/the_universe/Constellations/north_constellations.html">constellations</a>.</p>
<h2>Where did the money go?</h2>
<p>Wealth is a <a href="http://www2.warwick.ac.uk/fac/soc/economics/staff/phd_students/backus/money/">state of mind.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=354</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping your history</title>
		<link>http://kochanski.org/blog/?p=352</link>
		<comments>http://kochanski.org/blog/?p=352#comments</comments>
		<pubDate>Tue, 05 Jan 2010 17:41:57 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[techniques]]></category>
		<category><![CDATA[archive]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=352</guid>
		<description><![CDATA[Logbooks are important in research, but (realistically), they are completely useless if you do your work on the computer.    Back in grad school, I tried, but it was hopeless.   One spends all day printing things out and pasting them in; it is closer to a primary school art class than real research. If you program, [...]]]></description>
			<content:encoded><![CDATA[<p>Logbooks are important in research, but (realistically), they are completely useless if you do your work on the computer.    Back in grad school, I tried, but it was hopeless.   One spends all day printing things out and pasting them in; it is closer to a primary school art class than real research.</p>
<p>If you program, it&#8217;s even worse.   Suppose you change a few lines in the midst of a 1000 line program.   What do you print and glue into your logbook?  Nothing you can put there will actually be helpful.  So, use a logbook if you spend almost all your time in a real lab <span style="color: #008000;">(one with beakers)</span>, but the rest of us will have to do something else.</p>
<p>But, the idea behind logbooks cannot be ignored.  It&#8217;s hard to remember what you did when, in a complex project.   And, sometimes you will need to go back and check your work, or a reviewer of one of your papers will request more explanation, or whatever.  Without good logs, you may need to re-compute your entire paper from scratch. <span style="color: #008000;"> [Without logbooks, just think how embarrassing true honesty would be.  Suppose you work hard and analyze your heart out, and get a result that is statistically significant at the 0.01% level.  You'd have to report that there is a 25.01% chance that someone who repeats your experiment will get different results: 0.01% from the statistics and 25% because those are the odds that you forgot to report some important part of the method.]</span></p>
<p>So, I do almost everything with scripts (that leaves me with a record of my computations), and the scripts leave log files, and intermediate data files have headers to show where they come from.  That catches the biggest parts of  the computations, but little things still slip through the cracks.</p>
<p>Now, I&#8217;ve figured out how to log all the commands I type into a terminal on Linux to preserve an archive, in case I need to go back and figure out exactly what I was doing.</p>
<p>The standard bash shell already has a mechanism for this.  It&#8217;s called the .bash_history file.  All we need to do is use that mechanism to make a permanent archive.   To do it, just add these few lines to your .bashrc file in your home directory:</p>
<pre>#This creates an archive of all shell commands typed:
test -d ~/history || mkdir ~/history
history_exit_trap() {
dhist=$HOME/history/${HOSTNAME:-H}-$((${LINES:-40}/8)),$((${COLUMNS:-80}/12))-D${#PWD}-T$(date +%Y-%m).txt
history -a "$dhist" &amp;&amp; echo "####EOH####" $(pwd) $(date +%Y-%m) &gt;&gt;"$dhist"
}
trap "history_exit_trap" EXIT</pre>
<p>The &#8220;<strong>test</strong>&#8230;&#8221; line creates a folder to store your archives.   Then, <strong>history_exit_trap</strong> is a shell function that appends the commands you have executed to an archive file.  It cooks up a filename that depends on the host name, the month, the rough size of your terminal, and the length of your working directory&#8217;s path.  So, especially if your window manager preserves your windows, and if you don&#8217;t stretch your windows too often, commands from a given project will tend to end up in the same archive file.  Finally, the <strong>trap</strong> line arranges for the shell function to be called when your shell exits (<em>i.e.</em> when you log out and/or your terminal closes).</p>
<p>Do this, and your ~/history directory will fill up with files named like mace-5,7-D17-T2010-01.txt .  Internally, each one will look like this:</p>
<pre>ls
cd history/
ls
####EOH#### /home/gpk/history 2010-01
</pre>
<p>That is, a bunch of commands followed by an EOH line giving your working directory and the date.</p>
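<p>To make the naming scheme concrete, here is a minimal sketch that builds the same kind of filename from explicit values instead of the live shell variables.  The function name <strong>archive_name</strong> and its arguments are hypothetical, made up for illustration; the example values (a 40-line by 84-column terminal on a host called mace) are assumed so as to reproduce the filename shown earlier.</p>

```shell
# Hypothetical helper: build an archive filename from explicit values.
# Usage: archive_name HOST LINES COLUMNS DIRECTORY YEAR-MONTH
archive_name() {
    local host=$1 lines=$2 columns=$3 dir=$4 month=$5
    # Integer division buckets the terminal size, so small resizes
    # usually map to the same file; ${#dir} is the length of the path.
    echo "${host}-$((lines / 8)),$((columns / 12))-D${#dir}-T${month}.txt"
}

archive_name mace 40 84 /home/gpk/history 2010-01
# prints mace-5,7-D17-T2010-01.txt
```

<p>Because only the length of the path enters the name, two directories whose paths happen to be equally long will share a file; the EOH line is what records the actual directory.</p>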
<p>(I thank <a href="http://www.onerussian.com/Linux/bash_history.phtml">Yaroslav Halchenko</a> and the <a href="http://www.bash-hackers.org/wiki/doku.php/mirroring/bashfaq/088">Bash Hacker&#8217;s Wiki</a> for inspiration.)</p>
<p>Note added Jan 15 2010:</p>
<p>The above scheme works nicely, except that it seems to turn off the normal .bash_history file.    My new scheme is to delete those lines from .bashrc and add the following stuff to .bash_logout.  (That&#8217;s a better place for it, anyway.)</p>
<pre>test -d ~/history || mkdir ~/history
dhist=$HOME/history/${HOSTNAME:-H}-$((${LINES:-40}/8)),$((${COLUMNS:-80}/12))-D${#PWD}-T$(date +%Y-%m).txt
history -a "$dhist" &amp;&amp; echo "####EOH####" $(pwd) $(date +%Y-%m) &gt;&gt;"$dhist"</pre>
<pre>if test $(($$%10))=0;
then
history -n "$HISTFILE" &amp;&amp; history -w "$HISTFILE"
else
history -a "$HISTFILE"
fi</pre>
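<p>The first three lines create the $HOME/history directory if it is missing and append this session&#8217;s commands to the archive file.   The last six lines update your normal .bash_history file. <span style="color: #008000;">(As written, <strong>test</strong> sees a single word such as &#8220;3=0&#8221;, which is a non-empty string and therefore always true, so the first branch runs at every logout: it reads in any new lines from the history file and then rewrites the file.)</span></p>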
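<p>Once the archive exists, finding an old command is a matter of searching the files.  A minimal sketch, assuming the ~/history layout above; the function name <strong>hist_grep</strong> and the HISTORY_ARCHIVE variable are hypothetical, introduced here only for illustration:</p>

```shell
# Hypothetical helper: search every archive file for a pattern.
# HISTORY_ARCHIVE is an assumed override; it defaults to ~/history.
hist_grep() {
    local dir=${HISTORY_ARCHIVE:-$HOME/history}
    # -H prefixes each match with its filename (which encodes host and month);
    # -n adds the line number within that file.
    grep -Hn -e "$1" "$dir"/*.txt
}

# Example: hist_grep 'rsync'
```

<p>The ####EOH#### lines can be searched the same way, for instance to find which sessions ended in a given directory.</p>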
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=352</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tickbox Teaching via Essay</title>
		<link>http://kochanski.org/blog/?p=345</link>
		<comments>http://kochanski.org/blog/?p=345#comments</comments>
		<pubDate>Sun, 03 Jan 2010 00:44:02 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[publishing and copyright]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[communication]]></category>
		<category><![CDATA[essays]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[tickbox]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=345</guid>
		<description><![CDATA[Much of the effort Britain spends in educating its young involves forcing them to write essays to be graded.    Examinations consist of perhaps three questions, each of which gets a three page answer.  In response, of course, the teaching centres around writing essays.    After all, they need to practise for the exam papers.  And, also [...]]]></description>
			<content:encoded><![CDATA[<p>Much of the effort Britain spends in educating its young involves forcing them to write essays to be graded.    Examinations consist of perhaps three questions, each of which gets a three page answer.  In response, of course, the teaching centres around writing essays.    After all, they need to practise for the exam papers.  And, also of course, the essays that get written for practice come as close to the exam papers as possible.  &#8220;Don&#8217;t worry,&#8221; one of my daughter&#8217;s teachers told me recently, &#8220;by the time she&#8217;ll get to the exam, she&#8217;ll already have written on all the questions.&#8221;</p>
<p>Generally, essays are a good thing.  Communication is important and writing a good essay can force someone to confront the full complexity of the world, and sort through conflicting evidence.  A good essay will construct a logical argument based on solid facts that is simultaneously an interesting story.</p>
<p>Or will it?</p>
<p>A good paper, in the educational context, is whatever the examiners say is a good paper.  And, examiners, in an effort to be more objective, are treating papers more like box-ticking exercises. <span style="color: #008000;">[God help us when we <a href="http://www.ics.heacademy.ac.uk/Events/conf2003/james_christie.htm">automate</a> the process of marking essays.]</span> &#8220;Uh huh: mentioned pidgins.   No syntactic input, yup.  Creole, tick, Bickerton, tick.&#8221;  That makes essays much like multiple choice tests: you can imagine the student&#8217;s thinking in resonance: &#8220;OK.  There&#8217;s my mention of Universal Grammar, now, somewhere I need to mention `critical period&#8230;&#8217;&#8221;</p>
<p>It would be easy to bemoan how standardization emphasizes the mediocre at the expense of the brilliant.  Instead, let&#8217;s ask a question that will eventually have a solid answer.  If we are to require students to mention certain facts and ideas in an essay, we can ask &#8220;What are the odds that these alleged facts will survive?&#8221;  Suppose we were to step forward about 20 years (half of someone&#8217;s career) then look back at the exam papers.   Would we nod, laugh, or groan?</p>
<p>That might be a good goal:  to teach people a framework of facts and theories that will last a while.  But only a few facts, and only the best theories, because with the internet, they can search for everything else.</p>
<p>Facts are cheap these days.</p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=345</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Support is weak</title>
		<link>http://kochanski.org/blog/?p=334</link>
		<comments>http://kochanski.org/blog/?p=334#comments</comments>
		<pubDate>Fri, 01 Jan 2010 16:43:19 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[science and how it works]]></category>
		<category><![CDATA[techniques]]></category>
		<category><![CDATA[communication]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[disproof]]></category>
		<category><![CDATA[Karl Popper]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[paraphrase]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[theories]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=334</guid>
		<description><![CDATA[He stood up in front of the screen and said &#8220;Support for the theory is weak.&#8221; That&#8217;s not an unusual thing to hear at a scientific conference.  But, it was surprising in context because he had just spent 15 minutes systematically demolishing a theory (or so it seemed to me).  The theory was by a [...]]]></description>
			<content:encoded><![CDATA[<p>He stood up in front of the screen and said &#8220;Support for the theory is weak.&#8221;</p>
<p>That&#8217;s not an unusual thing to hear at a scientific conference.  But, it was surprising in context because he had just spent 15 minutes systematically demolishing a theory (or so it seemed to me).  The theory was by a couple of guys named Wilson and Wilson (2005).  Their theory made detailed predictions about when people would jump in on one another&#8217;s conversation in a dialogue.  I haven&#8217;t read their paper myself, but it seems to involve a model of two oscillators, coupled together.   The idea is that the phase of each oscillator tells whether or not each person is ready to speak.  &#8220;Coupled oscillators&#8221; means that the phase of one affects the oscillation of the other.</p>
<p>The speaker explained the predictions that the theory made, and, one by one, compared his data to what it predicted.  &lt;&lt;&#8230;and the prediction is for two peaks in the histogram, symmetrically before and after the end of the sentence&#8230;&gt;&gt;<span style="color: #008000;">[that's a <a href="http://kochanski.org/blog/?p=3">semi-quote</a>]</span> while the data he displayed showed a single peak, well after the end of the sentence.</p>
<p>I sat there, nodding, thinking &#8220;Good job that.&#8221; as he showed that one prediction after another failed to match his data.  Personally, I think it should be the goal in life for experimenters to shoot down a theory.  Theorists, after all, have an easy life in a field like Cognitive Science or Linguistics.  Everyone likes theories because they are much easier to remember than a lot of experimental detail, and unlike us experimenters, theorists never have to walk in through a blizzard to meet an enthusiastic volunteer who has a day off from classes and nothing better to do than your experiment.</p>
<p>So, I wrote off that theory.  He had tested a bunch of clear predictions; they didn&#8217;t work; game over.   I thought, &#8220;Good try, theorists, you just lost one to the experimenters.   Come back with another theory some time and we&#8217;ll work that one over, too.&#8221;</p>
<p>But, he hadn&#8217;t written it off.  On his conclusions slide, I saw this: &lt;&lt; • Support for Wilson &amp; Wilson is weak.&gt;&gt;  Well, yes.  That could have been a proper British understatement if accompanied with a raised eyebrow and that special intonation that says &#8220;You and I know what I <em>really</em> mean.&#8221;  Except that it wasn&#8217;t, and he wasn&#8217;t British.</p>
<p>Except that I asked him.    I stood up and said that science sometimes proceeds by disproving things.   That he had presented an excellent paper and that he had done a good job of disproving a theory, the best of all the papers that I&#8217;d seen at the conference.  That this would let us all focus on other theories, ones that might work.</p>
<p>But he didn&#8217;t believe his own work.   Despite having pointed out drastic differences between his work and the theory&#8217;s predictions, he didn&#8217;t want to conclude that there was something wrong with the theory.  &lt;&lt;There are some hints of the predicted effects&gt;&gt;, he said.  He flicked back a few slides and pointed to an insignificant bump on a histogram.  I can only guess that he was dedicated to proving the theory, not testing it or disproving it.</p>
<p>There is nothing too much wrong with hoping to confirm a theory, but if you&#8217;re not going to listen to your own data, why bother to collect it?</p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=334</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How much do they pay you to teach at Oxford?</title>
		<link>http://kochanski.org/blog/?p=331</link>
		<comments>http://kochanski.org/blog/?p=331#comments</comments>
		<pubDate>Tue, 29 Dec 2009 03:51:47 +0000</pubDate>
		<dc:creator>gpk</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[accounting]]></category>
		<category><![CDATA[colleges]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[finances]]></category>
		<category><![CDATA[pay]]></category>
		<category><![CDATA[scams]]></category>
		<category><![CDATA[taxes]]></category>

		<guid isPermaLink="false">http://kochanski.org/blog/?p=331</guid>
		<description><![CDATA[&#8230;a very un-British question.   But, given that I was raised in the USA, I&#8217;ll answer it. It depends on whether you are on the permanent teaching staff or not.  If you&#8217;re hired to teach, you might start near £37,000 as a University Lecturer, or start as a combination of a Faculty Lecturer position at around [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230;a very un-British question.   But, given that I was raised in the USA, I&#8217;ll answer it.</p>
<p>It depends on whether you are on the permanent teaching staff or not.  If you&#8217;re hired to teach, you might start near £37,000 as a University Lecturer, or start with a Faculty Lecturer position at around £22,000/year combined with some college responsibilities that would add a moderate chunk to the receipts.  <span style="color: #008000;">(Of course, when I say &#8220;start&#8221;, I mean after university and graduate school and maybe a post-doc position.  You &#8220;start&#8221; at age 26 or so.)<br />
</span></p>
<p>And, you might rise to be a full professor somewhere, but probably not Oxford.  People come and go to follow career opportunities.  Full professors are very improbable objects: very few people become one, either because they slipped up somewhere, got unlucky, or just decided the grass was greener outside academia.  But, despite their improbability, some manage to exist, and the ones that do earn about £62,000/year.</p>
<p>But if you&#8217;re not on teaching staff, teaching pay is low.</p>
<p>I taught some <a href="http://kochanski.org/gpk/teaching/0910statistics/">lectures</a> this last term, and my measure of how large the pay is, is that I haven&#8217;t even bothered to find out how much I was paid.  <span style="color: #008000;">(Have I been paid?   I wonder if I forgot to fill out some paperwork.) </span>Probably £50 per class, give or take a factor of two.  Because they were a new set of lectures, it took a full day of work to prepare each one. <span style="color: #008000;"> (And, I worried about that first lecture for all my spare minutes for the previous week or two&#8230;)</span> So, the actual pay turns out to be something like £6/hour, less if you count the worrying time.  But, of course, if I teach the same lectures next year, the worrying will be gone, the work will be less, and I may make £10/hour.  After a few years, once the course is well polished, I might manage to make £30/hour lecturing.</p>
<p>This sounds a bit like a complaint.   It is, and it isn&#8217;t.  I get to talk and people listen. <span style="color: #008000;">(Except for that tall woman who kept nodding off near the front, a bit to my right.  She has an amazing ability to sit bolt upright while semi- or un-conscious.  I kept expecting her to topple over or slouch, but no.)</span> I get the feeling that I&#8217;m making their lives easier by telling them stuff which I learned by hard work and sweat: hopefully they won&#8217;t have to do as much.  And a good feeling of contributing to our little bit of culture and intelligence, tiny amidst a universe that is mostly filled with stuff no more intelligent than hydrogen atoms.  So, there are other motivations beyond the money.  <span style="color: #008000;">(Which is just as well.)</span></p>
<p>But, what I am complaining about is the stupid financial system of Oxford University and the colleges.  A year ago I tutored a bunch of students in general linguistics and psycholinguistics.  Each tutorial brings in £24 or so.  But the hours&#8230;</p>
<p>Some of the hours aren&#8217;t hidden.  To do a good tutorial, you need to read the last few tests and pick a topic that is likely to be tested.  Then, you need to scrounge around and find some good papers and book chapters for the student(s) to read, arrange the class via e-mail and assign the topic and readings.  Finally, you read the papers, give the students some sensible comments, and then see the students. All more-or-less good fun.  And finally, you go to a website and do a little paperwork to get paid.</p>
<p>Did I say finally?  I thought so, but a few weeks later the P45 forms started rolling in.  Those are tax-related forms saying that my employment was being terminated at one college after another.  Oxford, you see, is a University and a collection of colleges.  The colleges are each separate corporate entities and they contract with the University for examination services and some teaching.  I knew this in theory, of course.  But I hadn&#8217;t quite realized that I would become a temporary employee of each of the colleges from which I had a student.  Eight of them, as it turns out.</p>
<p>So, I was a college employee for a few hours, then terminated.  OK.  No problem.  Except that means that I had 9 employers last year.  And that broke the only truly wonderful thing about the British taxman.</p>
<p>As an American, and one who has done a little consulting and this and that, I&#8217;ve gotten used to filling in 40 page tax forms.  But I hate them.  Taxes are a necessity of civilization, but I really detest finicky paperwork, and finicky paperwork that costs money is worse.  It&#8217;s kind of like rubbing your nose in the loss.</p>
<p>So, coming to the UK was a relief.  Taxes were computed automatically.  The taxman slides the money quietly out of the University&#8217;s coffers, you don&#8217;t see it go, and you don&#8217;t have to fill out any forms.  (It&#8217;s not always as simple as this, but it can be.)  It is a painless process, leaving only a mild feeling of genteel poverty.   I loved the PAYE system.</p>
<p>But, that only works when you have a single employer.  Now I had nine.  So, this year, I have British <em>and</em> American tax forms.  One or two of my employers paid me just £24.  But because of this, I have to learn lots of British tax law (hours!) and fiddle with finicky forms that have semi-incomprehensible instructions (you have to learn specialized phrases like &#8220;the total of renewals&#8221;).  And, then I lost one of the damned P45 forms, and anyway, it&#8217;s added 30 hours of work to the total work of tutorials.  I think my pay for tutoring is about £5/hour again.  That&#8217;s <em>after</em> computing the taxes and <em>before</em> paying them. And that&#8217;s not counting the time spent worrying about doing the taxes right.</p>
<p>I am amazed that Oxford cannot figure out how to pay me without forcing me to spend as much time taxing as teaching.</p>
]]></content:encoded>
			<wfw:commentRss>http://kochanski.org/blog/?feed=rss2&amp;p=331</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
