Simulating a Cell and what you can do with it.


Yowza! A bunch of guys [of both genders] have simulated an entire cell, in a computer, starting from the biochemistry.  (Jonathan R. Karr, Jayodita C. Sanghvi, Derek N. Macklin, Miriam V. Gutschow, Jared M. Jacobs, Benjamin Bolival Jr., Nacyra Assad-Garcia, John I. Glass, and Markus W. Covert; Cell, Volume 150, Issue 2, 20 July 2012, Pages 389–401.)  In the best scientific tradition, these guys stood on the shoulders of giants, referencing more than 900 other papers that did the hard work of measuring how real cells behave in the real world.

Admittedly, it’s a very simple bacterium [only 521 genes, 583k DNA base pairs], but I didn’t expect anyone to do this for another decade. It is an extremely powerful result. Once you have a simulation, you can make it better with very simple experiments. Just put a cell on a microscope, give it a dose of some chemical, and watch how it responds. The simulation should do the same thing. If it doesn’t, you adjust the simulation and try again.

We know a lot of automated techniques for improving a mathematical model, once it’s half-way accurate.   So, it’s relatively straightforward to improve the model.  [We’re using the scientific/engineering meaning of “straightforward”.  Here, the word means “something that can pretty certainly be done if you have a bunch of smart people working for a few years.”]  The process won’t be strictly computational.  Computers can only improve the simulation by adjusting it in ways that their programmers imagine.  When the imagination fails, if the simulation’s designers have missed something, then the improvement process will stop with an imperfect (and possibly misleading) model.  The researchers will then scratch their heads, try to enlarge their imagination, re-design the simulation, and try again.  It’s a back-and-forth process between the data, the computer, and the human imagination.  [The computer simulation is a powerful tool: it lets you rapidly refine and test your imagination against the real data.  And, best of all, having a simulation forces you to attend to details and keeps you honest.  You can’t sweep anything under the rug: given an incomplete model, the implacable math of the simulation will give you a result that doesn’t match the real data.]
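[To make the adjust-and-retry loop concrete, here is a minimal sketch in Python. Nothing in it comes from the paper: the toy dose-response model, the two parameter names, and the “measurements” are all invented for illustration, and the real thing fits thousands of reaction rates, not two.]

```python
# A toy version of "adjust the simulation until it matches the data."
# The dose-response model, the parameters, and the "measurements" are invented.
import numpy as np
from scipy.optimize import least_squares

def simulate(params, dose):
    """Stand-in for the whole-cell simulation: predicted protein level
    as a function of chemical dose, given the current parameters."""
    k_production, k_half = params
    return k_production * dose / (k_half + dose)   # saturating response

# Pretend these came from the microscope experiments described above.
doses    = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
measured = np.array([0.09, 0.33, 0.52, 0.65, 0.82])

def mismatch(params):
    # How far the simulation's predictions are from what the cell actually did.
    return simulate(params, doses) - measured

# The "automated technique": a standard least-squares fitter nudges the
# parameters until the mismatch is as small as it can make it.
fit = least_squares(mismatch, x0=[1.0, 1.0], bounds=(0.0, np.inf))
print("fitted parameters:", fit.x)
```

[The head-scratching part of the process is exactly what this sketch leaves out: deciding what the function `simulate` should contain in the first place.]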

It’s easy to imagine an automated process.  A robot selects a bacterium, aims a microscope at it, doses it with some combination of interesting chemicals, and measures how it responds.  You could easily design the robot to measure size changes.  Or, you could have the robot dose the bacterium with some sort of fluorescent tag that binds to one of the bacterium’s proteins.  The total strength of the fluorescence could be measured automatically, and it would tell you how much of that protein exists in the bacterium.  The robot then goes through and does that experiment a million times with different combinations of chemicals, and you get an extremely rich data set.
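[For a feel of how that “million experiments” data set gets laid out, here is a sketch. The chemical names and dose levels are placeholders, not a real protocol; the point is only that a handful of chemicals at a handful of levels multiplies out very quickly.]

```python
# Sketch of generating the robot's to-do list: every combination of a few
# chemicals at a few dose levels. Chemicals and dose values are placeholders.
from itertools import product

chemicals   = ["glucose", "NaCl", "compound_A", "compound_B"]
dose_levels = [0.0, 0.1, 1.0, 10.0]   # arbitrary concentration units

conditions = [dict(zip(chemicals, doses))
              for doses in product(dose_levels, repeat=len(chemicals))]
print(len(conditions), "conditions")   # 4**4 = 256 here; a dozen chemicals
                                       # at a few levels gets you to millions

# Each condition would then be paired with what the robot measures, e.g.
# {"condition": conditions[0], "cell_size": ..., "fluorescence": ...}
```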

The computers then set to work matching the cell’s response to each of those million different conditions to the behavior of the mathematical model.  They tweak the model until it does a good job of reproducing the experimental results in each of the conditions.  The beauty of this approach is that everything inside a bacterium affects everything else.  So, even if you cannot directly measure some important molecule, you can measure it indirectly, because it will affect the things that you can measure.  Thus, given a sufficiently rich and diverse data set, if your model works under all possible conditions, you can be almost certain that everything in the model is correct.   [Actually, that’s only true if you get lucky.  All kinds of things can make life more difficult.  For instance, you could have two proteins that are both hard to measure, they have much the same effect, and a surplus of one is compensated by a deficit in the other.   Then you’d be stuck.  You couldn’t indirectly measure either one individually; you could only measure the sum of the two together.]
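[Here is a toy demonstration of that bracketed caveat, with two made-up rate constants standing in for the two proteins. The observable depends only on their sum, so fitting can recover the sum and nothing more.]

```python
# Two "hard to measure" quantities whose effects are degenerate: the thing
# we can actually observe depends only on k1 + k2. The model is invented.
import numpy as np

def observable(k1, k2, t):
    return np.exp(-(k1 + k2) * t)   # only the sum of the two rates matters

t    = np.linspace(0.0, 5.0, 50)
data = observable(0.3, 0.7, t)      # "true" values: k1 = 0.3, k2 = 0.7

for k1, k2 in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.1)]:
    worst = np.max(np.abs(observable(k1, k2, t) - data))
    print(f"k1={k1}, k2={k2}: worst-case mismatch = {worst:.1e}")
# All three pairs reproduce the data perfectly, so the fit is "stuck":
# it can only tell you that k1 + k2 = 1.0.
```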

That’s what’s going to be happening in cell biology for the next decade:  Big computer simulations of simple bacteria.   Big automated experiments that poke the bacteria in a wild variety of ways to see how they respond.  Algorithms running on computers to tweak the simulations until they match the experimental results.   Lots of biologists scratching their heads, happy that their simulation is getting better, but wondering why it doesn’t give the right answer when you raise the pH, then add acetaldehyde and sodium ions.  [I wish I were a biologist!]

And then, soon enough, we will have damn good simulations of a few simple bacteria.   From then on, you work towards more complicated cells, step by step.  Each step will build on the techniques and the numerical results of the simple cells.  Human cells have about 23,000 genes and 3.2 billion base pairs, so, counting splice variants, our cells have a few hundred times more distinct proteins than Mycoplasma genitalium.  The number of chemical reactions you need to understand to model a human cell might then be the square of (a few hundred) times larger: a simulation of a human cell might be 100,000 times more complex than the current work.  But it’ll happen, because we’ll be able to make incremental steps from simple cells to more and more complex ones.  [And, it’ll happen fairly fast, because we’ll be learning better techniques at each step.  Here “fast” might mean a few decades rather than a few years…]
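[The arithmetic behind that “100,000 times” guess, spelled out. The quadratic scaling is only an assumption: it treats every pair of distinct protein species as a potential interaction you might have to account for.]

```python
# Back-of-the-envelope scaling guess; both numbers are rough assumptions.
protein_ratio    = 300                  # "a few hundred times more proteins"
complexity_ratio = protein_ratio ** 2   # pairwise interactions scale ~quadratically
print(complexity_ratio)                 # 90,000 -- on the order of 100,000
```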

And, somewhere in there, once you can simulate an interestingly complex cell, you can start simulating multicellular organisms.   All it takes is a pile of computers [and a bunch of smart guys scratching their heads for a few years, then testing their ideas, et cetera].  Will we ever simulate a human?  That depends on how small, cheap, and fast we can make computers, but since the human body has trillions of cells and an individual cell is pretty complex, we are certainly talking about a large pile of computers.   The naive approach, where you set one computer to simulating each cell, is absurd, but people will doubtless figure out ways to simplify the process.  For example, you have a lot (billions!) of fat cells, and most of them are probably doing more-or-less the same thing.  So, it might be possible to simulate only the cells that are interestingly different from their neighbors.  [Just simulate a few thousand fat cells, a few from each arm, a few from the back of the neck, a few around the chest, et cetera.]
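[A sketch of the “only simulate the interestingly different cells” trick. Everything here is hypothetical: each cell is summarized by a tiny made-up state vector, near-identical cells are lumped into one bucket, and only one representative per bucket would actually be simulated in detail.]

```python
# Group cells whose (made-up) state vectors are nearly identical, and keep
# one representative per group to simulate in detail.
import numpy as np

def bucket(state, tolerance=0.1):
    # Cells whose states round to the same bucket are treated as copies.
    return tuple(np.round(np.asarray(state) / tolerance).astype(int))

rng   = np.random.default_rng(0)
cells = rng.normal(loc=1.0, scale=0.02, size=(100_000, 3))   # e.g. fat cells

representatives = {}
for state in cells:
    representatives.setdefault(bucket(state), state)

print(f"{len(cells)} cells -> {len(representatives)} detailed simulations")
```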

Still, a full human simulation would involve the brain, where probably all the cells are interestingly different from each other.  Or, if you were simulating the spread of cancer, you might have to look for rare processes where a cell falls off the main tumor, floats through the bloodstream, and sticks somewhere else.  That might also be hard.   But we’re into science fiction here: there’s no way to predict this far ahead.  It might be as close as a few decades, or maybe we’ll never get there.

[So, what does this mean to society and ethics?   Well, don’t get your degree in mathematical theology quite yet.   Also, don’t stop experimenting on animals quite yet.  Once we can simulate a mammalian cell, then we’ll be able to stop some animal experimentation, at least the simple cases where we are certain that all the interesting behavior happens within one cell.  But until we get to some sort of full-human simulation, there will still be a need for some animal experiments, if only to ensure that we can indeed use a single (simulated) cell as a proxy for the whole organism.]


2 responses to “Simulating a Cell and what you can do with it.”

  1. awesome!
    and there I thought that some of the cell membrane inner workings were poorly understood.

    perhaps now these proteins can be debugged in the simulation rather than with femtosecond experiments?

    • That’s a good question. This isn’t a simulation from basic physics principles; it’s a simulation based on a lot of measured biochemical reaction rates. So, femtosecond experiments on membrane proteins are more of an input to the simulation than an output.
      Another way of looking at it is that this simulated bacterium is basically a web of chemical reactions. The state of the bacterium is specified by a vector listing how many of this kind of molecule you have, and how many of that kind. So, when we start optimizing this kind of simulated bacterium, we’ll be optimizing reaction rates (how fast X reacts with Y to form Z), but we won’t be learning about how each reaction happens. [There’s a toy sketch of this at the end of this reply.]

      So, this kind of simulation and improvement won’t help you (much) if you want to understand how the reactions happen. We’ll need femtosecond experiments for that. It might eventually give better measurements of reaction rates, which could then (in principle) be used to constrain more detailed simulations of the individual reactions, but we’re not anywhere near that yet.

      FYI: the simulations don’t include predictions of protein folding, and they certainly don’t allow for the bacterium to change its genes.
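      [The toy sketch promised above: the state is just a vector of molecule counts, and the only knobs are reaction rates. The single reaction X + Y -> Z and all the numbers are invented.]

      ```python
      # The state is a vector (here a dict) of molecule counts; the only tunable
      # knob is a reaction rate. The reaction X + Y -> Z and the numbers are made up.
      state = {"X": 1000, "Y": 800, "Z": 0}   # molecule counts
      k = 1e-3                                # the kind of number fitting would adjust

      def step(state, k, dt=0.01):
          # Deterministic update: how many X-Y pairs react in one small time step.
          n = min(int(k * state["X"] * state["Y"] * dt), state["X"], state["Y"])
          return {"X": state["X"] - n, "Y": state["Y"] - n, "Z": state["Z"] + n}

      for _ in range(200):
          state = step(state, k)
      print(state)   # fitting tunes k until trajectories like this match the real cell
      ```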