Emergence of Information Transmission in a Prebiotic RNA Reactor
The RNA world theory [1] posits that the first information carrying and catalytically active molecules at the origin of life were RNAlike polynucleotides [2]. This idea is empirically supported by the discovery of ribozymes, which perform many different reactions [3], among them the basic templatedirected ligation and polymerization steps [4,5] necessary for replicating RNA. However, a concrete scenario of how a selfreplicating RNA system could have arisen spontaneously from a pool of random polynucleotides is still lacking. Physical effects may have facilitated this step, as is believed to be the case in other transitions of prebiotic evolution [6].
From the perspective of information, an RNA replicator transmits sequence information from molecule to molecule, such that the information survives even when the original carrier molecules are degraded, for instance due to hydrolytic cleavage [7]. Rephrased in these terms, the problem of spontaneous emergence of an RNA replicator [8,9] becomes a question of a path from a short term to a lasting sequence memory. This transition occurred either as a single unlikely step or as a more gradual, multistep transition. Here, we explore a scenario of the latter type, based only on simple physicochemical processes (see Fig. 1): (i) random ligation of RNA molecules, e.g., in a hydrothermal “RNA reactor,” where polynucleotides are accumulated by thermophoresis [10], (ii) folding and hybridization of RNA strands, and (iii) preferential cleavage of single rather than doublestranded RNA segments [7]. Using extensive computer simulations and theoretical analysis, we study the behavior that emerges when these processes are combined.
Clearly, the preferential cleavage at unpaired bases effectively creates a selection pressure for base pairing in the reactor. We find that this effect increases the complexity of RNA structures in the sequence pool, which may favor the emergence of ribozymes. The underlying sequence bias also extends the expected lifetime of sequence motifs in the finite pool. Interestingly, we find that correlations between motifs persist even longer than expected. This memory effect is associated with information transmission via hybridization. Intriguingly, these correlations have the same statistical signature as templated selfreplication, only weaker. In this sense, the RNA reactor could constitute a steppingstone from which a true RNA replicator could emerge, e.g., assisted by a primitive ribozyme catalyzing templatedirected synthesis.
RNA reactor.—As illustrated in Fig. 1, we envisage an open reaction volume $V$ under nonequilibrium conditions as, e.g., inside a hydrothermal pore system where polynucleotides are strongly accumulated by a combination of convective flow and thermophoresis [10]. At any point in time, the reaction volume contains various sequences ${S}_{L}$ of length $L$. The full time evolution of this pool is a stochastic process with the reactions
We use the standard Gillespie algorithm to simulate the stochastic dynamics (1) of the sequence pool. The cleavage rate ${\beta}_{L,K}$, which is recalculated from the folding output for all molecules whenever necessary, effectively introduces a selection for basepair formation. Since RNA folding depends on the temperature $T$ and duplex formation is also concentrationdependent, we can vary the selection pressure via ${p}_{L,K}(T,V)$. We consider the reactions (1) under different possible conditions, with two different temperatures (a cold system at $10\text{\hspace{0.17em}}\xb0\mathrm{C}$ and a hot environment at $60\text{\hspace{0.17em}}\xb0\mathrm{C}$) and concentrations (in the $\mathrm{p}M$ and $\mathrm{m}M$ range, respectively). To study the differences from a random pool, we also consider a “neutral” scenario without folding ( ${p}_{L,K}=0$). These scenarios are chosen mainly to highlight the effects of base pairing and not to suggest specific environmental conditions at the origin of life.
Stationary length and shape distribution.—Disregarding sequencedependent selection, the ligationcleavage dynamics of the RNA reactor resembles the kinetics of cluster aggregation and fragmentation. Hence, the stationary sequence length distribution shown in Fig. 2(a) corresponds to a cluster size distribution, and its moments can be obtained using established methods [16,18]. In the limit of large influx $J$, the average total molecule number $\u27e8{N}_{\mathrm{tot}}\u27e9$ and their mean length $\u27e8L\u27e9$ are given by
where we have neglected the length dependence of the outflux ( ${L}_{c}\to \infty $; a finite value for ${L}_{c}$ shifts both $\u27e8{N}_{\mathrm{tot}}\u27e9$ and $\u27e8L\u27e9$ to larger values without strongly affecting the shape of the distribution). These analytical results readily explain why with stronger selection the total number of molecules decreases, but their mean length goes up [see Figs. 2(b) and 2(c)]: the cleavage rate ${\beta}_{0}$ is reduced as the mean basepairing probability $\u27e8{\overline{p}}_{L}\u27e9$ is increased especially for longer sequences [cf. Fig. 2(d)], and the distribution thus gains more weight in the tail of long sequences.
In order to characterize the structural repertoire of this RNA pool, we focused on the tail of the length distribution and analyzed the secondary structures of long sequences with $L>{L}^{*}$. We performed the analysis for ${L}^{*}=35$ as well as ${L}^{*}=50$ (the length of the minimal hairpin ribozyme [19]). Figure 2(f) shows the probability to observe structures within basic “shape” classes [20], such as hairpins or hammerheads [21]. We observe a significant enrichment of complex structures under selection compared to the neutral case defined above.
Information transmission via hybridization.—Base pairing and the ensuing correlations between sequences occur mostly within relatively short sequence regions. Therefore, we focus on the dynamics of shorter subsequences or “sequence motifs” of length $\ell $, which are informational entities not tied to a specific molecule. From our simulations, we extract time trajectories for the copy numbers ${n}_{i}(t)$ of all ${4}^{\ell}$ different motifs. Even for fairly small $\ell >3$, the sequence space of motifs is not fully covered in the finite ensemble; i.e., an average motif copy number is typically $\u27e8{n}_{i}(t)\u27e9\ll 1$. Hence, signatures of information transmission should appear as an unexpected increase in the lifetime of these motifs. Suitably averaged observables are provided by the auto and crosscorrelation functions, ${C}_{a}(t)={4}^{\ell}\sum _{i}\u27e8{n}_{i}(t){n}_{i}(0)\u27e9$ and ${C}_{c}(t)={4}^{\ell}\sum _{i}\u27e8{n}_{i}(t){n}_{i}^{*}(0)\u27e9$, respectively, where ${n}_{i}^{*}$ is the copy number of a motif’s (reverse) complement [21]. Figures 3(a) and 3(b) show data for these correlation functions for $\ell =6$ and the parameter set used in Fig. 2.
The observed motif correlations can be understood in the framework of a simple stochastic process. Motifs are created when sequence ends are ligated together and destroyed by cleavage [22]. Using a meanfieldtype approach, we pick an arbitrary probe motif with copy number $n(t)$. Its dynamics is described by a birthdeath process, where $n(t)$ is increased with constant rate ${k}_{+}$ and decreased with linear rate ${k}_{}$ [see schema (i) in Fig. 3(c)]. The birth rate ${k}_{+}$ can be computed from the steadystate length distribution $\u27e8{N}_{L}\u27e9$ by counting how many ends of long enough molecules are available for ligation. Assuming an annealed random ensemble, we obtain
The death rate ${k}_{}$ comprises the effects of cleavage and hybridization. A motif is cleaved with rate ${\beta}_{0}$ at any of its $\ell 1$ bonds, but this rate is reduced by the effective basepairing probability of its parent sequence, which in turn depends on the selection strength. On average, this reduction follows from averaging over the length and basepairing probability distributions $\u27e8{N}_{L}\u27e9$ and $\u27e8{\overline{p}}_{L}\u27e9$ of parent sequences, respectively. This gives the result
However, a birthdeath process based on these two effective rates alone necessarily fails to describe crosscorrelations between a motif and its complement [23]. The reduction in the cleavage rate of a particular motif due to hybridization is conditional on the presence of its complementary partner. Hence, we modulate the average death rate ${k}_{}$ with an additional factor $h(x)\le 1$, which accounts for the probability of hybridization and depends on the number $x={n}^{*}/n$ of available complements per motif. Since the average hybridization probability is small under the conditions considered here, it will be proportional to $x$. This leads us to a linear ansatz $h(x)\approx 1(r/{k}_{})x$, where the significance of the coefficient $r$ will shortly become apparent. We find that in the “hybridization process” of Fig. 3(c), the expected copy number $\u27e8n\u27e9$ of a motif obeys
A symmetric equation holds for $\u27e8{n}^{*}\u27e9$. Strikingly, this result is identical to the corresponding rate equations for a “replication process” [16], where motifs are born with rate ${k}_{+}$, destroyed with fixed rate ${k}_{}$, and copied from their complements with rate $r$, as in schema (ii) of Fig. 3(c). This observation suggests that we may interpret the coefficient $r$ as an apparent replication rate for motifs in the RNA reactor.
To validate this interpretation, and to measure the apparent replication rate in our simulations, we calculate the correlation functions of the hybridization process using the same approximation for $h(x)$ [16], yielding
In Figs. 3(a) and 3(b), we used these expressions with the rates ${k}_{+}$ and ${k}_{}$ calculated from Eqs. (3) and (4), and with $r$ as the only free parameter fitted simultaneously to both data sets. The equivalence between the hybridization and the replication processes is also exhibited by their correlation functions to leading order in $r/{k}_{}$ [16]. Hence, the good agreement with the simulation data indicates that the observed motif correlations are virtually indistinguishable from those expected for inefficient templatedirected replication. The replication efficiency $r/{k}_{}$ determined by the fits is plotted in Fig. 3(d) as a function of the bare cleavage rate ${\beta}_{0}$ for the different conditions. Remarkably, it reaches levels close to 30% in the cold and highly concentrated environment, where base pairing via duplex formation is favorable. Note that a true (exponential) replicator would require that motifs are copied faster than they are degraded ( $r>{k}_{}$), while our system with $r<{k}_{}$ is an inefficient realization.
These findings show that protection against cleavage due to folding and hybridization leads to an extended sequence memory in the RNA reactor. One global contribution to this longer motif lifetime is due to the “protection factor” in square brackets in Eq. (4), which renormalizes the bare cleavage rate to account for the average probability that a motif is paired. Another contribution stems from the correlation time in Eq. (6), which is increased as the apparent replication rate is subtracted from the renormalized cleavage rate, such that ${C}_{a,c}(t)$ decays on time scales of order $({k}_{}r{)}^{1}$. This specific increase occurs only when a motif and its complement mutually protect each other, and it therefore demonstrates the emergence of information transmission.
Conclusions.—We have analyzed stochastic simulations of a minimal prebiotic RNA reactor, where formation of double strands protects sequence parts from degradation. On the one hand, this selection for structure biases the resulting pool towards longer and more structured sequences, favoring the emergence of ribozymes. On the other hand, it leads to a weak apparent replication process based on “information transmission by hybridization,” conceptually similar to “sequencingbyhybridization” techniques [24]. Together, the structural complexity and the information transmission featured in the RNA reactor suggests this type of system as plausible intermediate for the emergence of a true replicator with $r>{k}_{}$. For instance, some of the relatively frequent simple structures observed in our simulation are similar to known ligase ribozymes [3]. This functionality in turn would facilitate the creation of more complex molecules from essential modular subunits [25]. Once ribozymes emerge, a selfreplicating system could be established by templatedirected ligation of suitably complementary oligomers [4]. So far, it has remained unclear how such autocatalytic RNA systems would be supplied with appropriate oligomer substrates. However, the strong crosscorrelations observed in the RNA reactor demonstrate a significantly enhanced chance of finding sequences complementary to those present in the pool, including the sequence to be replicated. Thus, the RNA reactor acts as an adaptive filter to preferentially keep potentially useful substrate sequences. This adaptive selectivity would allow for the “heritable” propagation of small variations and thus endow the replicator with basic evolutionary potential.
References
 W. Gilbert, Nature (London) 319, 618 (1986).
 L. Orgel, Crit. Rev. Biochem. Mol. Biol. 39, 99 (2004).
 J. Doudna and T. Cech, Nature (London) 418, 222 (2002).
 N. Paul and G. F. Joyce, Proc. Natl. Acad. Sci. U.S.A. 99, 12 733 (2002).
 W. Johnston et al., Science 292, 1319 (2001).
 I. Chen, R. Roberts, and J. Szostak, Science 305, 1474 (2004).
 D. Usher and A. Mchale, Proc. Natl. Acad. Sci. U.S.A. 73, 1149 (1976).
 C. Fernando, G. von Kiedrowski, and E. Szathmáry, J. Mol. Evol. 64, 572 (2007).
 M. Nowak and H. Ohtsuki, Proc. Natl. Acad. Sci. U.S.A. 105, 14 924 (2008).
 P. Baaske et al., Proc. Natl. Acad. Sci. U.S.A. 104, 9346 (2007).
 S. Duhr and D. Braun, Proc. Natl. Acad. Sci. U.S.A. 103, 19 678 (2006).

While nontemplated ligation occurs spontaneously [13] or via inorganic catalysis [14], we neglect templatedirected reactions, which are less plausible in early prebiotic chemistry in the absence of ribozymes [2].
 S. Pino, F. Ciciriello, G. Costanzo, and E. Di Mauro, J. Biol. Chem. 283, 36 494 (2008).
 J. Ferris and G. Ertem, J. Am. Chem. Soc. 115, 12 270 (1993).
 I. Hofacker et al., Monatsh. Chem. 125, 167 (1994).
 See supplemental material at http://link.aps.org/supplemental/10.1103/PhysRevLett.107.018101 for more details on the algorithm and the calculations, as well as supplementary results regarding GU pairs, selfcomplementarity, and shorter motifs.
 S. H. Bernhart et al., Algorithms Mol. Biol. 1, 3 (2006).
 R. Li, B. J. McCoy, and R. B. Diemer, J. Colloid Interface Sci. 291, 375 (2005).
 A. Hampel and R. Tritz, Biochemistry 28, 4929 (1989).
 R. Giegerich, B. Voss, and M. Rehmsmeier, Nucleic Acids Res. 32, 4843 (2004).

Results shown in Figs. 2 and 3 were obtained disallowing ambiguous GU wobble pairs. See [16] for the length and shape distribution including GU pairs.

Since most motifs live on long sequences, we can neglect outflux reactions $\propto {d}_{0}{e}^{\sqrt{L/{L}_{c}}}$ against cleavage $\propto {\beta}_{0}L$.

The presence of selfcomplementary sequences in a finite ensemble, which obey different statistics inherited by the corresponding motifs, leads to small crosscorrelations even in the neutral case. See [16] for more details.
 R. Drmanac et al., Advances in Biochemical Engineering/Biotechnology 77, 75 (2002).
 C. Briones, M. Stich, and S. C. Manrubia, RNA 15, 743 (2009).