# Viewpoint: Just-in-time DNA replication

Complete and timely replication of the genome is a prerequisite to fulfilling the “dream” of every cell to become two cells. So far, biologists have been successful in identifying the processes involved in DNA replication, but they have not been able to explain a fundamental control problem that cells face, the “random-completion” or “random-gap” problem: how do cells ensure that every last piece of the genome is replicated on time? In a paper in *Physical Review E*, Scott C.-H. Yang and John Bechhoefer of Simon Fraser University use insights from condensed-matter physics to answer this question. Using a physical model originally developed to describe the kinetics of first-order phase transitions, they show that, despite the intrinsic stochasticity of the initiation of DNA replication, cells can still control the amount of time it takes to replicate the genome. The authors thus provide a rigorous solution to a long-standing problem in cell biology. The elegance of their formal approach bridging physics and biology, and the depth of their analysis, should inspire scientists from both disciplines.

The heart of the problem is that the sites at which replication initiates are randomly distributed along the chromosomes in embryos of *Xenopus laevis*, a frog widely used in cell biology experiments. There are on the order of ${10}^{5}$ so-called origins where replication can start in *Xenopus* embryos, and it was quickly realized that, if these origins were truly randomly activated, one would expect an exponential distribution of distances between origins. Such a distribution would include infrequent large gaps between origins, and because each gap can be closed only by forks arriving from its two ends, the largest gap, not the typical one, sets the total replication time. Random origins would therefore imply a total replication time longer than the 20 minutes observed in frog embryos. In fact, early workers concluded for exactly that reason that origin distribution must not be random. However, over the years, experimental evidence for stochastic “firing” of origins has piled up. It is to this apparent conflict between stochastic origin firing and well-defined replication times that Yang and Bechhoefer bring analytical rigor.
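The arithmetic behind that worry is easy to check numerically. The following Python sketch (our own illustration, not taken from the paper) places a hypothetical number of origins uniformly at random on a linear genome, assumes that all of them fire at once, and compares the mean gap with the largest one. Because a gap of length $g$ is closed by two forks in time $g/(2v)$, the largest gap sets the completion time, and for $N$ roughly exponential gaps it is about $\ln N$ times the mean. The genome length, origin count, and fork speed are arbitrary round numbers, not *Xenopus* values.

```python
import numpy as np

def gap_stats(genome_length, n_origins, rng):
    """Scatter origins uniformly at random on a linear genome and return
    the mean and the largest distance between neighbouring origins
    (the two end segments, replicated by a single fork, are ignored)."""
    origins = np.sort(rng.uniform(0.0, genome_length, n_origins))
    gaps = np.diff(origins)
    return gaps.mean(), gaps.max()

# Hypothetical round numbers, chosen only for illustration.
genome_length = 1.0e5   # arbitrary length units
n_origins = 10_000      # all assumed to fire at t = 0
v = 1.0                 # fork speed, length units per time unit

mean_gap, max_gap = gap_stats(genome_length, n_origins, np.random.default_rng(0))

# A gap of length g is closed from both ends in time g / (2 v), so the
# largest gap, not the typical one, sets the completion time.
t_typical = mean_gap / (2.0 * v)
t_complete = max_gap / (2.0 * v)
print(f"typical gap closes at t = {t_typical:.1f}, last gap at t = {t_complete:.1f}")
print(f"largest/mean gap = {max_gap / mean_gap:.1f}, ln N = {np.log(n_origins):.1f}")
```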

Similar problems have confronted condensed-matter physicists. Consider a tray of water that is put into a freezer at time $t=0$. A short while later, the water is all frozen. What fraction $f(t)$ of water is frozen at time $t>0$? In the 1930s, several scientists independently derived a stochastic model that could predict the form of $f(t)$, and this “Kolmogorov-Johnson-Mehl-Avrami” (KJMA) model has since been widely used by metallurgists and other materials scientists to analyze phase-transition kinetics.

In the KJMA model, the kinetics of freezing results from three simultaneous processes: nucleation of solid domains, growth of existing domains, and coalescence, which occurs when two expanding domains merge (Fig. 1(a)). In the simplest form of KJMA, solid domains nucleate anywhere in the untransformed liquid at the same rate $I$ (a probability per unit volume per unit time) for all locations. Once a solid domain has been nucleated, it grows out as a sphere, typically at constant velocity $v$. When two growing domains impinge, growth ceases at the point of contact, while continuing elsewhere. Later workers revisited and refined KJMA’s methods to take into account various effects, such as finite system size and inhomogeneities in nucleation rates $I(x,t)$ in space and time.
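For a spatially uniform rate $I(t)$ and constant growth velocity $v$, the standard KJMA result is $f(t)=1-\exp[-\int_0^t I(t')\,V(t-t')\,dt']$, where $V(\tau)$ is the volume an isolated domain would reach after growing for a time $\tau$. The exponent is the mean number of "phantom" domains that would cover a given point if domains could grow through one another, and Poisson statistics turn it into the probability that the point is still untransformed. The Python sketch below evaluates this integral numerically. The function name `kjma_fraction` and the parameter values are our own, chosen for illustration. It checks the result against the closed form for a constant rate in three dimensions, $f(t)=1-\exp(-\pi I v^3 t^4/3)$.

```python
import numpy as np

def kjma_fraction(t, rate, v, dim=3, n_steps=2001):
    """Transformed fraction f(t) = 1 - exp(-X(t)) for spatially uniform
    nucleation at rate(t') and constant growth speed v, where the
    "extended volume" X(t) = integral_0^t rate(t') * V(t - t') dt' and
    V(tau) is the size of an isolated domain after growing for tau."""
    unit_ball = {1: 2.0, 2: np.pi, 3: 4.0 * np.pi / 3.0}[dim]
    tp = np.linspace(0.0, t, n_steps)
    integrand = rate(tp) * unit_ball * (v * (t - tp)) ** dim
    # Trapezoidal rule, written out to avoid version-specific NumPy helpers.
    extended = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(tp))
    return 1.0 - np.exp(-extended)

# Constant rate I0 in 3D: closed form f(t) = 1 - exp(-pi I0 v^3 t^4 / 3).
I0, v = 0.5, 1.0

def const(s):
    return np.full_like(s, I0)

for t in (0.5, 1.0, 1.5):
    exact = 1.0 - np.exp(-np.pi * I0 * v**3 * t**4 / 3.0)
    print(t, kjma_fraction(t, const, v), exact)
```

Setting `dim=1` gives $f(t)=1-\exp(-Ivt^2)$ for a constant rate. This is the form relevant to a linear chromosome.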

About ten years ago, Bechhoefer and colleagues, who have studied nonequilibrium processes such as the growth of snowflakes, realized that DNA replication can be mapped onto the basic assumptions of the KJMA model (Fig. 1(b)): (i) DNA replication starts at a large number of origins, where replication “forks” are created; (ii) DNA synthesis propagates bidirectionally from each activated origin, with the forks moving at a propagation speed, or fork velocity, $v$; and (iii) DNA synthesis stops when two replication forks meet. There is, however, one fundamental difference between the analysis of DNA replication and most other nucleation-and-growth systems. In crystal growth, for example, one is interested in $f(t)$ and the size distribution of “solid” and “liquid” domains for a known $I(x,t)$. In DNA replication, by contrast, $I(x,t)$ itself is the *unknown* quantity, and it is the key to understanding how the cell regulates the replication process in space and time. In other words, $I(x,t)$ is the replication “program”, which varies from organism to organism. For example, if all the origins are initiated at the beginning of replication, then $I(x,t)\propto\delta(t-t_0)$, where $t_0$ is the start time. Alternatively, if every origin has an equal probability of initiation at any time, then $I(x,t)$ is a constant. The question becomes: given an observed $f(t)$, can one extract $I(x,t)$?
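The forward problem, going from a program $I(t)$ to $f(t)$, can be sketched as a lattice Monte Carlo simulation in Python. This is a minimal illustration under simplifying assumptions of our own:

- a linear genome discretized into sites of size `dx`;
- a spatially uniform program;
- forks that advance exactly one site per time step.

The function name `simulate_replication` and its parameters are hypothetical. For a constant rate, the simulated $f(t)$ should track the one-dimensional KJMA prediction $f(t)=1-\exp(-Ivt^2)$.

```python
import numpy as np

def simulate_replication(program, n_sites, dx, v, t_max, rng):
    """Lattice Monte Carlo of 1D nucleation, growth, and coalescence.

    program(t) is a hypothetical initiation rate I(t) per unit length per
    unit time, uniform along the genome. Forks advance exactly one site per
    step, so the time step is dt = dx / v. Returns arrays (times, f)."""
    dt = dx / v
    replicated = np.zeros(n_sites, dtype=bool)   # linear genome, free ends
    times, fractions = [], []
    t = 0.0
    while t < t_max and not replicated.all():
        t += dt
        # Growth: each replicated domain gains one site at each end; two
        # domains that touch simply merge (coalescence).
        grown = replicated.copy()
        grown[1:] |= replicated[:-1]
        grown[:-1] |= replicated[1:]
        # Initiation: sites fire with probability I(t) dx dt. A "firing"
        # on an already replicated site changes nothing.
        p_fire = min(program(t) * dx * dt, 1.0)
        grown |= rng.random(n_sites) < p_fire
        replicated = grown
        times.append(t)
        fractions.append(replicated.mean())
    return np.array(times), np.array(fractions)

# Constant program; compare with 1D KJMA, f(t) = 1 - exp(-I v t^2).
I0, dx, v = 1e-4, 1.0, 1.0
times, f = simulate_replication(lambda t: I0, 200_000, dx, v, 1000.0,
                                np.random.default_rng(1))
f_kjma = 1.0 - np.exp(-I0 * v * times**2)
print(f"max |f_sim - f_KJMA| = {np.abs(f - f_kjma).max():.3f}")
```

Replacing the constant with an increasing program, for instance `lambda t: 2e-6 * t`, changes the shape of $f(t)$. The inverse problem is to read $I(t)$ back off that shape.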

In a series of papers since 2002, Bechhoefer and colleagues have worked out this mapping in detail. Importantly, by inverting the KJMA formalism, they managed to recover a spatially averaged, “mean-field” $I(t)$ from experimentally measured distributions of replicated and unreplicated domains along the chromosomes. To this end, they focused on the model system of *Xenopus* early-embryo replication, in which data collection is relatively easy. It is also a perfect system for studying the random-completion problem because, unlike cells of adult animals, which take many hours to replicate their genomes, these embryos finish everything in 20 minutes, making replication time a critical issue.
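In one dimension, the mean-field inversion has a particularly simple form. Writing $g(t)=-\ln[1-f(t)]$, the 1D KJMA relation is $g(t)=2v\int_0^t I(t')\,(t-t')\,dt'$, and differentiating twice gives $I(t)=g''(t)/(2v)$. The Python sketch below applies this with finite differences to a synthetic $f(t)$ generated from a linearly increasing program. The function name `invert_kjma_1d` and the program $I(t)=at$ are hypothetical. This is only the simplest version of the idea, because it uses $f(t)$ alone. The analyses described above fit the full distributions of replicated and unreplicated domains, and noisy experimental data would need smoothing or fitting before being differentiated twice.

```python
import numpy as np

def invert_kjma_1d(t, f, v):
    """Estimate a mean-field initiation rate I(t) from the replicated
    fraction f(t) on a time grid, using the 1D KJMA relation
        g(t) = -ln(1 - f(t)) = 2 v * integral_0^t I(t') (t - t') dt',
    whose second time derivative is g''(t) = 2 v I(t)."""
    g = -np.log1p(-f)
    d2g = np.gradient(np.gradient(g, t), t)   # central differences inside
    return d2g / (2.0 * v)

# Synthetic round trip with a hypothetical program I(t) = a t, for which
# g(t) = a v t^3 / 3.
a, v = 0.02, 1.0
t = np.linspace(0.0, 5.0, 501)
f = -np.expm1(-a * v * t**3 / 3.0)            # f = 1 - exp(-g), computed stably
I_est = invert_kjma_1d(t, f, v)
print(np.max(np.abs(I_est[5:-5] - a * t[5:-5])))   # small away from the edges
```

The first few and last few grid points are excluded from the comparison because `np.gradient` falls back to one-sided differences at the boundaries.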

Biologists have proposed two solutions to the random-completion problem. The first is that replication avoids big gaps (Fig. 1(c)) altogether by using a nonrandom spacing mechanism. However, this model has received little experimental support. The second assumes that there is an excess of randomly distributed potential origins and that origins that have not fired early in replication become more likely to fire as replication progresses, i.e., $I(t)$ increases with time. The intuitive idea is that if a gap persists late in replication, the origins within it become increasingly likely to fire, so the gap still gets replicated in a timely manner. The drawback of this kind of model has been that it was not clear how robust a solution it would be.

Recently, various theoretical and experimental studies have strengthened the second view. The emerging consensus is that *Xenopus* embryos, and probably all other animal cells, contain a pool of potential origins much larger than the number of initiations that actually occur during replication, and that the probability of initiation increases steeply as replication proceeds. However, these observations did not by themselves solve the random-completion problem. The solution requires understanding the distribution (as opposed to the mean) of the replication time for a given $I(t)$ and spatial distribution of potential origins. That is, knowing the average time it takes for replication to complete does not help; what one cares about is how often replication fails by taking longer than some threshold time $T$.

With this in mind, Bechhoefer and co-workers interpreted the time it takes to complete replication as a “first-passage” time $t^*$ of a stochastic process governed by the initiation rate $I(t)$. Here $t^*$ is the time at which the genome first becomes fully replicated, and $\rho(t^*)$ is its probability distribution. Equivalently, $t^*$ is the largest of the times at which pairs of growing replication bubbles collide. For biological success, $t^*$ does not have to be less than $T$ for every cell, but the frequency of $t^*>T$ has to be less than some acceptable failure rate. This question belongs to the domain of extreme-value statistics, a branch of statistics that is also used to evaluate rare but catastrophic events. The random-completion problem therefore becomes one of finding programs $I(x,t)$ that reproduce both the observed average time to complete replication and the observed failure rate.
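In the one-dimensional KJMA picture there is a convenient way to estimate such failure rates. On an effectively infinite genome with a spatially uniform program, the density of unreplicated gaps at time $t$ is $n_h(t)=[1-f(t)]\int_0^t I(t')\,dt'$. Suppose the few gaps remaining late in replication are roughly independent. Their number on a genome of length $L$ is then approximately Poisson distributed, so $P(t^*\le t)\approx\exp[-L\,n_h(t)]$, a Gumbel-like extreme-value form, and the failure rate is $1-\exp[-L\,n_h(T)]$. The Python sketch below implements this approximation. The function name, the Poisson assumption, and the example numbers are ours, chosen for illustration, and are not necessarily the formulation Yang and Bechhoefer use.

```python
import numpy as np

def failure_probability(T, rate, v, genome_length, n_steps=4001):
    """Approximate P(t* > T) for 1D KJMA replication with a spatially
    uniform initiation program rate(t) and fork speed v.

    Gap (hole) density:  n_h(T) = (1 - f(T)) * integral_0^T rate(t') dt',
    with 1 - f(T) = exp(-2 v * integral_0^T rate(t') (T - t') dt').
    Assumed Poisson statistics for the remaining gaps give
    P(t* <= T) ~= exp(-genome_length * n_h(T))."""
    tp = np.linspace(0.0, T, n_steps)
    r = rate(tp)
    dt = np.diff(tp)

    def integrate(y):                     # trapezoidal rule on the grid tp
        return np.sum(0.5 * (y[1:] + y[:-1]) * dt)

    unreplicated = np.exp(-2.0 * v * integrate(r * (T - tp)))
    n_holes = unreplicated * integrate(r)
    return -np.expm1(-genome_length * n_holes)   # 1 - exp(-L n_h), stably

# Hypothetical numbers in arbitrary units: constant rate, fork speed 1.
I0, v, L = 1e-3, 1.0, 1e4
for T in (80.0, 100.0, 120.0):
    print(T, failure_probability(T, lambda s: np.full_like(s, I0), v, L))
```

Any callable that returns an array of rates works. An increasing program such as `lambda s: 2e-5 * s` can therefore be compared directly with a constant one.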

Yang and Bechhoefer have provided the final, clear answer to the random-completion problem: for cells to achieve an acceptable distribution of replication completion times, the initiation rate $I(t)$ should increase during replication (Fig. 1(d)), in agreement with values of $I(t)$ extracted from experimental data. They show that this model can produce arbitrarily low failure rates. More importantly, it can produce the observed failure rate using plausible parameters that also give reasonable mean completion times. Finally, Yang and Bechhoefer show that their result is robust: an increasing $I(t)$ produces timely replication regardless of whether the potential origins are randomly or nonrandomly distributed. This latter point should allay biologists’ fears that, in this model, the replication time would double if one or two origins failed to initiate, creating a gap too large to finish within 20 minutes.
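The qualitative trend can be reproduced with a one-dimensional KJMA estimate that treats the gaps left late in replication as Poisson distributed. In that estimate, $P(t^*>t)\approx 1-\exp[-L\,n_h(t)]$, where $n_h(t)=[1-f(t)]\int_0^t I\,dt'$ is the gap density and $L$ is the genome length. The Python sketch below is a toy comparison of our own, not Yang and Bechhoefer's calculation. It proceeds in three steps:

1. It uses a hypothetical family of power-law programs $I(t)=c\,t^n$, for which $1-f(t)=\exp[-2vc\,t^{n+2}/((n+1)(n+2))]$ and $\int_0^t I\,dt'=c\,t^{n+1}/(n+1)$.
2. It tunes $c$ so that every program has the same median completion time.
3. It compares the probability of overrunning a deadline 20% past that median.

Steeper programs ($n>0$) should give smaller failure probabilities than the constant one ($n=0$).

```python
import numpy as np

def hole_density(t, c, n, v):
    """1D KJMA density of unreplicated gaps for a hypothetical power-law
    program I(t) = c * t**n that is uniform along the genome."""
    g = 2.0 * v * c * t ** (n + 2) / ((n + 1) * (n + 2))   # -ln(1 - f)
    return c * t ** (n + 1) / (n + 1) * np.exp(-g)

def failure_prob(t, c, n, v, L):
    """Poisson approximation: P(t* > t) = 1 - exp(-L * n_h(t))."""
    return -np.expm1(-L * hole_density(t, c, n, v))

def calibrate(t_median, n, v, L):
    """Find c such that P(t* > t_median) = 1/2, i.e. L n_h = ln 2.
    For c above c_peak, more initiation means fewer gaps, so L n_h
    decreases with c and bisection in log c converges."""
    c_peak = (n + 1) * (n + 2) / (2.0 * v * t_median ** (n + 2))
    lo, hi = np.log(c_peak), np.log(c_peak) + np.log(1e3)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if L * hole_density(t_median, np.exp(mid), n, v) > np.log(2.0):
            lo = mid          # too many gaps left: need more initiation
        else:
            hi = mid
    return np.exp(0.5 * (lo + hi))

v, L, t_m = 1.0, 1.0e5, 100.0     # hypothetical units, not fitted values
T = 1.2 * t_m                      # deadline 20% past the median
results = {}
for n in (0, 1, 3):
    c = calibrate(t_m, n, v, L)
    results[n] = failure_prob(T, c, n, v, L)
    print(f"I(t) ~ t^{n}: P(t* > {T:.0f}) = {results[n]:.2e}")
```

Matching median completion times keeps the comparison fair. Every program has the same median by construction, so any difference in the tail comes from the shape of $I(t)$ alone.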

Given the strong theoretical foundation that Yang and Bechhoefer provide for the increasing-$I(t)$ model in frog embryos, the big question is whether this model applies to all animal cells. Much of this work will fall to experimental biologists, but theoretical treatments that capture the more structured replication of adult cells will certainly be important.
