Post-thesis anti-chronological advice

I am mailing the following letter back in time to myself in the summer of 2005. That summer, I began the research that would ultimately become my DPhil in computational linguistics.

Dear Michel,

I hope you are enjoying your summer. The sun is bright, the weather warm and the beach as good a place as any to learn how to code in Java. Six years hence, I am the last person to begrudge you this sunny start to research that will take about three years longer than you think it will. So enjoy.

But let me pass on a few pieces of advice that all harken back to an old military saying – “Amateurs talk strategy; professionals talk logistics.” The big ideas are important, but in the end the hardest part of what you will do is refining those big ideas into testable hypotheses that you can then actually test. Once you do that, you will have proof that you are up to this DPhil and have something worth submitting. Until then, you are an amateur.

Don’t get me wrong. Right now, you are bursting with ideas for how to answer the most interesting questions at the frontiers of computational linguistics. And that is a very good thing. Those ideas will keep you going through the cruel days it will take you to fully understand how to compute an eigenvalue decomposition or a G-function. Enjoy them. Let them turn over in your mind as you have that second beer.

That said, please, for my sake, write them down. And not on the window in your room. You are going to move more than once over the next six years and the landlord just might come in when you are not home and clean off the mess. To be sure, the best of what you come up with will come back to you at some point, at least as far as I remember. But still, keep track of your ideas, your research and your sources. Maybe start a journal with dates and short, legible explanations. Six years from now when you are editing your bibliography in BibTeX (it’s a tool for formatting bibliographies in LaTeX, don’t worry about it now), you will thank me. It turns out that (Brown 2000) is kind of ambiguous as citations go, and hunting them all down again made me very mad at you for being so lazy.

Another thing to keep in mind is that you should not underestimate your capacity needs. Too much time and energy will be spent trying to fit what you want to do into the resources you have at your disposal. It is easier (and, if you value your time, cheaper) to just get more resources. Right now, you are coding on that trusty laptop you bought before you started school. To be sure, it will serve you surprisingly well. You will find that it will be up to the data processing needed for your MPhil thesis, but you will have to put it on its side on the floor with a desk fan at its back to prevent it from overheating.

Ultimately, you will build a small city of towers. ‘Towers?’, you’re thinking, ‘Isn’t that a little excessive?’ No. It’s not. You will get a number of them over the next few years. One will explode – loudly – and fill your apartment with the smell of burnt metal. Ultimately you will have about four, sometimes five, even six on the bad days, running at any given time.

Which leads me to a very important point: do not underestimate your ability to confuse yourself. Four to six computers running at the same time are surprisingly complex to manage. The obvious answer will be to figure out a way of networking them together and threading your code. Don’t waste your time. This is exactly what I mean about not confusing yourself. Your thesis is in computational linguistics. You use computer science; you don’t study it. If you want to inject an unholy amount of complexity and confusion into the next six years of your life, network your computers.

Or you could do what you will inevitably do anyway. Just copy the essential bits of your data five or six times and network them via your sneakers, flash drives and Gmail. Simple, dirty, cheap, but it works.

If you are worried that doing this means that you are not sufficiently challenging yourself, don’t be. Even this will continue to surprise you with opportunities to confuse yourself. At 3:30 AM, when you have searched every line of code to find where you are dividing by zero because infinities have apparently recruited all of the numbers in your results into their Borg, you will realize you are actually looking at code you wrote three months ago on a different machine. You can formalize this into a convenient equation: Self = Confusion/0.

This is how you will get to Sesame Street. Yes, sunny days, chasing the clouds away Sesame Street. High-resolution pictures of everyone in the neighborhood are freely available on Google Images and, as desktop wallpaper, they will give a face to each of the computers you will have churning away over the next few years. That way, when you finally get your act together and begin a proper coding log, you can amuse yourself with entries like, “Bert exploded after the power source apparently overheated, hard drive recovered.”

The hilarity never ends and some people, like your future wife, will argue that it never started. To be sure, turning your computers into Sesame Street characters is evidence of infantile regression and perhaps some light madness. But if there is one thing that is true about writing a DPhil, particularly one that demands extensive computational work, it is that it is a long, lonely process. More than once you will ask yourself why you are doing this at all, especially after you take that job in a few years and the thesis fills each and every one of your nights, weekends and days off.

But you will find, as you are looking at Ernie at 2:15 AM on a Saturday night with ‘Rubber Ducky’ stuck in your head as you again try to optimize and debug your code, that you have actually come a long way. You have gotten efficient at coding. You have actually learned statistics and a fair bit of upper-level linear algebra. Math is no longer voodoo to you, which is an accomplishment in itself. Indeed, when the lawyers at work ask you about your thesis and you begin to explain it, they will squint at you with the suspicion they would use if they suspected you were concealing a prior felony conviction. (Oh, and by the way, don’t tell them the Sesame Street thing. That makes the squinting much harder.)

The last piece of advice is to be grateful for all of those (real) people around you. When your advisor tells you that a certain experiment or data are crap, believe him and move on. He’s right. Take your future wife on plenty of dates. She will be surprisingly understanding of your need to spend all of your free time on Sesame Street, the SATA cables and menagerie of components with which you decorate the apartment and the occasional chorus of expletives from the other room that wakes her in the middle of the night. And be grateful for your future self, who was kind enough to write you this letter. He has just saved you a lot of time and aggravation.

Your friend,

Michel