Informational derivation of quantum theory

Phys. Rev. A 84, 012311
I. INTRODUCTION

More than 80 years after its formulation, quantum theory is still mysterious. The theory has a solid mathematical foundation, addressed by Hilbert, von Neumann, and Nordheim in 1928 [1] and brought to completion in the monumental work by von Neumann [2]. However, this formulation is based on the abstract framework of Hilbert spaces and self-adjoint operators, which, to say the least, are far from having an intuitive physical meaning. For example, the postulate stating that the pure states of a physical system are represented by unit vectors in a suitable Hilbert space appears rather artificial: what are the physical laws that lead to this very specific choice of mathematical representation? The problem with the standard textbook formulations of quantum theory is that the postulates therein impose particular mathematical structures without providing any fundamental reason for this choice: the mathematics of Hilbert spaces is adopted without further questioning as a prescription that “works well” when used as a black box to produce experimental predictions. In a satisfactory axiomatization of quantum theory, instead, the mathematical structures of Hilbert spaces (or C* algebras) should emerge as consequences of physically meaningful postulates, that is, postulates formulated exclusively in the language of physics: this language refers to notions like physical system, experiment, or physical process and not to notions like Hilbert space, self-adjoint operator, or unitary operator. Note that any serious axiomatization has to be based on postulates that can be precisely translated into mathematical terms. However, the point with the present status of quantum theory is that there are postulates that have a precise mathematical statement, but cannot be translated back into the language of physics. Those are the postulates that one would like to avoid.

The need for a deeper understanding of quantum theory in terms of fundamental principles has been clear since the very beginning. Von Neumann himself expressed his dissatisfaction with his mathematical formulation of quantum theory with the surprising words “I don’t believe in Hilbert space anymore,” reported by Birkhoff in [3]. Realizing the physical relevance of the axiomatization problem, Birkhoff and von Neumann made an attempt to understand quantum theory as a new form of logic [4]: the key idea was that propositions about the physical world must be treated in a suitable logical framework, different from classical logic, where the operations AND and OR are no longer distributive. This work inaugurated the tradition of quantum logics, which led to several attempts to axiomatize quantum theory, notably by Mackey [5] and Jauch and Piron [6] (see Ref. [7] for a review of the more recent progress in quantum logics). In general, a certain degree of technicality, mainly related to the emphasis on infinite-dimensional systems, makes these results far from providing a clear-cut description of quantum theory in terms of fundamental principles. Later Ludwig initiated an axiomatization program [8] adopting an operational approach, where the basic notions are those of preparation devices and measuring devices and the postulates specify how preparations and measurements combine to give the probabilities of experimental outcomes. However, despite the original intent, Ludwig’s axiomatization did not succeed in deriving Hilbert spaces from purely operational notions, as some of the postulates still contained mathematical notions with no operational interpretation.

More recently, the rise of quantum information science moved the emphasis from logics to information processing. The new field clearly showed that the mathematical principles of quantum theory imply an enormous amount of information-theoretic consequences, such as the no-cloning theorem [9,10], the possibility of teleportation [11], secure key distribution [12–14], or of factoring numbers in polynomial time [15]. The natural question is whether the implication can be reversed: is it possible to retrieve quantum theory from a set of purely informational principles? Another contribution of quantum information has been to shift the emphasis to finite dimensional systems, which allow for a simpler treatment but still possess all the remarkable quantum features. In a sense, the study of finite dimensional systems allows one to decouple the conceptual difficulties in our understanding of quantum theory from the technical difficulties of infinite dimensional systems.

In this scenario, Hardy’s 2001 work [16] re-opened the debate about the axiomatizations of quantum theory with fresh ideas. Hardy’s proposal was based on five main assumptions about the relation between the dimension of the state space and the number of perfectly distinguishable states of a given system, about the structure of composite systems, and about the possibility of connecting any two pure states of a physical system through a continuous path of reversible transformations. However, some of these assumptions directly refer to the mathematical properties of the state space (in particular, the “simplicity axiom” 2, which is an abstract statement about the functional dependence of the state space dimension on the number of perfectly distinguishable states). Very recently, building on Hardy’s work, there have been two new attempts at axiomatization by Dakić and Brukner [17] and Masanes and Müller [18]. Although these works succeeded in removing the “simplicity axiom,” they still contain mathematical assumptions that cannot be understood in elementary physical terms (see, e.g., requirement 5 of Ref. [18], which assumes that “all mathematically well-defined measurements are allowed by the theory”).

Another approach to the axiomatization of quantum theory was pursued by one of the authors in a series of works [19] that culminated in Ref. [20]. These works tackled the problem using operational principles related to tomography and calibration of physical devices, to experimental complexity, and to the composition of elementary transformations. In particular, this research introduced the concept of dynamically faithful states, namely states that can be used for the complete tomography of physical processes. Although this approach came very close to deriving quantum theory, in this case one mathematical assumption without operational interpretation was needed (see the CJ postulate of Ref. [20]).

In this paper we provide a complete derivation of finite dimensional quantum theory based on purely operational principles. Our principles do not refer to abstract properties of the mathematical structures that we use to represent states, transformations, or measurements, but only to the way in which states, transformations, and measurements combine with each other. More specifically, our principles are of informational nature: they assert basic properties of information processing, such as the possibility or impossibility of carrying out certain tasks by manipulating physical systems. In this approach the rules by which information can be processed determine the physical theory, in accordance with Wheeler’s program “it from bit,” for which he argued that “all things physical are information-theoretic in origin” [21]. Note, however, that our axiomatization of quantum theory is relevant, as a rigorous result, also for those who do not share Wheeler’s ideas on the informational origin of physics. In particular, in the process of deriving quantum theory we provide alternative proofs for many key features of the Hilbert space formalism, such as the spectral decomposition of self-adjoint operators or the existence of projections. The interesting feature of these proofs is that they are obtained by manipulation of the principles, without assuming Hilbert spaces from the start.

The main message of our work is simple: within a standard class of theories of information processing, quantum theory is uniquely identified by a single postulate: purification. The purification postulate, introduced in Ref. [22], expresses a distinctive feature of quantum theory, namely that the ignorance about a part is always compatible with the maximal knowledge of the whole. The key role of this feature was noticed already in 1935 by Schrödinger in his discussion about entanglement [23], of which he famously wrote “I would not call that one but rather the characteristic trait of quantum mechanics, the one that enforces its entire departure from classical lines of thought.” In a sense, our work can be viewed as the concrete realization of Schrödinger’s claim: the fact that every physical state can be viewed as the marginal of some pure state of a compound system is indeed the key to single out quantum theory within a standard set of possible theories. It is worth stressing, however, that the purification principle assumed in this paper includes a requirement that was not explicitly mentioned in Schrödinger’s discussion: if two pure states of a composite system AB have the same marginal on system A, then they are connected by some reversible transformation on system B. In other words, we assume that all purifications of a given mixed state are equivalent under local reversible operations [24].

The purification principle expresses a law of conservation of information, stating that at least in principle, irreversibility can always be reduced to the lack of control over an environment. More precisely, the purification principle is equivalent to the statement that every irreversible process can be simulated in an essentially unique way by a reversible interaction of the system with an environment, which is initially in a pure state [22]. This statement can also be extended to include the case of measurement processes, and in that case it implies the possibility of arbitrarily shifting the cut between the observer and the observed system [22]. The possibility of such a shift was considered by von Neumann as a “fundamental requirement of the scientific viewpoint” (see p. 418 of [2]) and his discussion of the measurement process was exactly aimed to show that quantum theory fulfils this requirement.

Besides Schrödinger’s discussion on entanglement and von Neumann’s discussion of the measurement process, the purification principle is deeply rooted in the structure of quantum theory. At the purely mathematical level it plays a crucial role in the theory of C* algebras of operators on separable Hilbert spaces, where the purification principle is equivalent to the Gelfand-Naimark-Segal (GNS) construction [25] and implies the celebrated Stinespring’s theorem [26]. On the other hand, purification is a cornerstone of quantum information, lying at the origin of most quantum protocols. As it was shown in Ref. [22], the purification principle directly implies crucial features like no-cloning, teleportation, no-information without disturbance, error correction, the impossibility of bit commitment, and the “no-programming” theorem of Ref. [27].

In addition to the purification postulate, our derivation of quantum theory is based on five informational axioms. The reason why we call them “axioms,” as opposed to the purification “postulate,” is that they are not at all specific to quantum theory. These axioms represent standard features of information processing that everyone would, more or less implicitly, assume. They define a class of theories of information processing that includes, for example, classical information theory, quantum information theory, and quantum theory with superselection rules. The question whether there are other theories satisfying our five axioms and, in case of a positive answer, the full classification of these theories is currently an open problem.

Here we informally illustrate the five axioms, leaving the detailed description to the remaining part of the paper:

  • (1) Causality: the probability of a measurement outcome at a certain time does not depend on the choice of measurements that will be performed later.

  • (2) Perfect distinguishability: if a state is not completely mixed (i.e., if it cannot be obtained as a mixture from any other state), then there exists at least one state that can be perfectly distinguished from it.

  • (3) Ideal compression: every source of information can be encoded in a suitable physical system in a lossless and maximally efficient fashion. Here lossless means that the information can be decoded without errors and maximally efficient means that every state of the encoding system represents a state in the information source.

  • (4) Local distinguishability: if two states of a composite system are different, then we can distinguish between them from the statistics of local measurements on the component systems.

  • (5) Pure conditioning: if a pure state of system AB undergoes an atomic measurement on system A, then each outcome of the measurement induces a pure state on system B. (Here atomic measurement means a measurement that cannot be obtained as a coarse graining of another measurement.)

All these axioms are satisfied by classical information theory. Axiom 5 is even trivial for classical theory, because the only pure states of a composite system AB are the products of pure states of the component systems A and B, and hence the state of system B will be pure irrespective of what we do on system A.

A stronger version of axiom 5, introduced in Ref. [20], is the following:

  • (5′) Atomicity of composition: the sequential composition of two atomic operations is atomic. (Here atomic transformation means a transformation that cannot be obtained from coarse graining.)

However, it turns out that axiom 5 is enough for our derivation: thanks to the purification postulate we will be able to show the nontrivial implication: axiom 5 $\Rightarrow$ axiom 5′ (see lemma 16).

The paper is organized as follows. In Sec. II we review the framework of operational-probabilistic theories introduced in Ref. [22]. This framework will provide the basic notions needed for the formulation of our principles. In Sec. III we introduce the principles from which we will derive quantum theory. In Sec. IV we prove some direct consequences of the principles that will be used later in the paper. In Sec. V we discuss the properties of perfectly distinguishable states, while in Sec. VI we prove the existence of a duality between pure states and atomic effects.

The results about distinguishability and duality of pure states and atomic effects allow us to show in Sec. VII that every system has a well defined informational dimension—the operational counterpart of the Hilbert space dimension. Section VIII contains the proof that every state can be decomposed as a convex combination of perfectly distinguishable pure states. Similarly, any element of the vector space spanned by the states can be written as a linear combination of perfectly distinguishable states. This result corresponds to the spectral theorem for self-adjoint operators on complex Hilbert spaces. In Sec. IX we prove some results about the maximum teleportation probability, which allow us to derive a functional relation between the dimension of the state space and the number of perfectly distinguishable states of the system. The mathematical representation of systems with two perfectly distinguishable states is derived in Sec. X, where we prove that such systems are indeed two-dimensional quantum systems—also known as qubits. In Sec. XI we construct projections on the faces of the state space of any system and prove their main properties. These results lead to the derivation of the operational analog of the superposition principle in Sec. XII which allows us to prove that systems with the same number of perfectly distinguishable states are operationally equivalent (Sec. XII B). The properties of the projections and the superposition principle are then exploited in Sec. XIII, where we extend the density matrix representation from qubits to higher dimensional systems, thus proving that a system with d perfectly distinguishable states is indeed a quantum system with d-dimensional Hilbert space. We conclude the paper with Sec. XIV, where we review our results, discussing future directions for this research.

II. THE FRAMEWORK

This section provides a brief summary of the framework of operational-probabilistic theories, which was formulated in Ref. [22]. We refer to Ref. [22] for an exhaustive presentation of the details of the framework and of the ideas behind it. The operational-probabilistic framework combines the operational language of circuits with the toolbox of probability theory: on the one hand experiments are described by circuits resulting from the connection of physical devices, on the other hand each device in the circuit can have classical outcomes and the theory provides the probability distribution of outcomes when the devices are connected to form closed circuits (that is, circuits that start with a preparation and end with a measurement).

The notions discussed in this section will allow us to draw a precise distinction between principles with an operational content and exclusively mathematical principles: with the expression “operational principle” we will mean a principle that can be expressed using only the basic notions of the operational-probabilistic framework.

A. Circuits with outcomes

A test represents one use of a physical device, like a Stern-Gerlach magnet, a beamsplitter, or a photon counter. The device will have an input system and an output system, labeled by capital letters. The corresponding test can have different classical outcomes, represented by different values of an index $i \in X$:

Each outcome $i \in X$ corresponds to a possible event, represented as

We denote by Transf(A,B) the set of all events from A to B. The reason for this notation is that in the next subsection the elements of Transf(A,B) will be interpreted as transformations with input system A and output system B. If A=B we simply write Transf(A) in place of Transf(A,A).

A test with a single outcome will be called deterministic. This name is justified by the fact that, if there is a single possible outcome, then this outcome will occur with certainty (cf. the probabilistic structure introduced in the next subsection).

Two devices can be composed in a sequence, as long as the input system of the second device is equal to the output system of the first. The events in the composite test are represented as

and are written in formulas as $\mathcal{D}_j \mathcal{C}_i$.

For every system A one can perform the identity test (or simply, the identity), that is, a test $\{\mathcal{I}_A\}$ with a single outcome, with the property

The subindex A will be dropped from $\mathcal{I}_A$ where there is no ambiguity.

The letter I will be reserved for the trivial system, which simply means “nothing” [28]. A device with input (or output) system I is a device with no input (or no output). The corresponding tests will be called preparation tests (or observation tests). In this case we replace the input (or output) wire with a round portion:

[Circuit diagrams: a preparation event with output wire B and an observation event with input wire A, each drawn with a rounded end in place of the missing wire.]
(1)

In formulas we will write $|\rho_i)_B$ [or $(a_j|_A$]. The sets Transf(I,A) and Transf(A,I) will be denoted as St(A) and Eff(A), respectively. The reason for this special notation is that in the next subsection the elements of St(A) [or Eff(A)] will be interpreted as the states (or effects) of system A.

From every pair of systems A and B one can form a composite system, denoted by AB. Clearly, composing system A with nothing still gives system A, in formula $AI = IA = A$. Two devices can be composed in parallel, thus obtaining a new device with composite input and composite output systems. The events in the composite test are represented as

and are written in formulas as $\mathcal{C}_i \otimes \mathcal{D}_j$. In the special case of states we will often write $|\rho_i)|\sigma_j)$ in place of $\rho_i \otimes \sigma_j$. Similarly, for effects we will write $(a_i|(b_j|$ in place of $a_i \otimes b_j$.

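Sequential and parallel composition commute: one has $(\mathcal{A}_i \otimes \mathcal{B}_j)(\mathcal{C}_k \otimes \mathcal{D}_l) = \mathcal{A}_i \mathcal{C}_k \otimes \mathcal{B}_j \mathcal{D}_l$ for every $\mathcal{A}_i, \mathcal{B}_j, \mathcal{C}_k, \mathcal{D}_l$ such that the output of $\mathcal{C}_k$ (or $\mathcal{D}_l$) coincides with the input of $\mathcal{A}_i$ (or $\mathcal{B}_j$).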
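The interchange of sequential and parallel composition can be checked numerically in a toy model. The sketch below is not part of the framework: it assumes, purely for illustration, that events are represented by integer matrices, with sequential composition given by the matrix product and parallel composition by the Kronecker product. The helper names `matmul` and `kron` and the four sample matrices are hypothetical.

```python
# Toy model (an assumption for illustration): events are integer matrices,
# sequential composition is the matrix product, parallel composition is the
# Kronecker product.

def matmul(a, b):
    """Matrix product a b: b is applied first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product of a and b, modelling parallel composition."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Hypothetical sample events on two-level systems.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

lhs = matmul(kron(A, B), kron(C, D))    # (A ⊗ B)(C ⊗ D)
rhs = kron(matmul(A, C), matmul(B, D))  # AC ⊗ BD
interchange_holds = lhs == rhs
print(interchange_holds)
```

Integer entries keep the comparison exact; with floating-point entries one would compare up to a tolerance instead.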

When one of the two tests is the identity, we will omit the box and draw only a straight line, as in

The rules summarized in this section define the operational language of circuits, which has been discussed in detail in a series of inspiring works by Coecke (see in particular Refs. [29,30]). The language of circuits allows one to represent the schematic of an experiment like, for example,

and also to represent a particular outcome of the experiment

In formula, the above circuit is given by

$$(B_k|_{BC}\,(\mathcal{C}_j \otimes \mathcal{I}_C)\,|\rho_i)_{AC}.$$

B. Probabilistic structure: States, effects, and transformations

On top of the language of circuits, we put a probabilistic structure [22]: we declare that the composition of a preparation-test $\{\rho_i\}_{i \in X}$ with an observation-test $\{a_j\}_{j \in Y}$ gives rise to a joint probability distribution

[Circuit diagram: the preparation event $\rho_i$ connected to the observation event $a_j$, defining the probability $p(i,j)$.]
(2)

with $p(i,j) \ge 0$ and $\sum_{i \in X} \sum_{j \in Y} p(i,j) = 1$. In formula we write $p(i,j) = (a_j|\rho_i)$. Moreover, if two experiments are run in parallel, we assume that the joint probability distribution is given by the product

$$(a_k|(b_l|\;|\rho_i)|\sigma_j) = p(i,k)\,q(j,l), \tag{3}$$

where $p(i,k) := (a_k|\rho_i)$, $q(j,l) := (b_l|\sigma_j)$.

The probabilistic structure defined by Eq. (2) turns every event $\rho_i \in \mathrm{St}(A)$ into a function $\hat{\rho}_i : \mathrm{Eff}(A) \to \mathbb{R}$, given by $\hat{\rho}_i(a_j) := (a_j|\rho_i)$. If two events $\rho_i, \rho_i' \in \mathrm{St}(A)$ induce the same function, then it is impossible to distinguish between them from the statistics of the experiments allowed by our theory. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $\rho_i \simeq \rho_i'$ if $\hat{\rho}_i = \hat{\rho}_i'$. To avoid introducing new notation, from now on we will assume that the equivalence classes have been taken from the start. We will identify the event $\rho_i \in \mathrm{St}(A)$ with the corresponding function $\hat{\rho}_i$ and will call it state. Accordingly, we will refer to preparation tests as collections of states $\{\rho_i\}_{i \in X}$. Note that, since one can take linear combinations of functions, the states in St(A) generate a real vector space, denoted by $\mathrm{St}_{\mathbb{R}}(A)$.

The same construction holds for observation tests: every event $a_j \in \mathrm{Eff}(A)$ induces a function $\hat{a}_j : \mathrm{St}(A) \to \mathbb{R}$, given by $\hat{a}_j(\rho_i) := (a_j|\rho_i)$. If two events $a_j, a_j' \in \mathrm{Eff}(A)$ induce the same function, then it is impossible to distinguish between them from the statistics of the experiments allowed in our theory. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $a_j \simeq a_j'$ if $\hat{a}_j = \hat{a}_j'$. To avoid introducing new notation, from now on we will identify the event $a_j \in \mathrm{Eff}(A)$ with the corresponding function $\hat{a}_j$ and we will call it effect. Accordingly, we will refer to observation tests as collections of effects $\{a_j\}_{j \in Y}$. The effects in Eff(A) generate a real vector space, denoted by $\mathrm{Eff}_{\mathbb{R}}(A)$.

A vector in $\mathrm{St}_{\mathbb{R}}(A)$ [or $\mathrm{Eff}_{\mathbb{R}}(A)$] can be extended to a linear function on $\mathrm{Eff}_{\mathbb{R}}(A)$ [or $\mathrm{St}_{\mathbb{R}}(A)$]. In this way, states and effects can be thought of as elements of two real vector spaces, one dual to the other. In this paper we will restrict our attention to finite dimensional vector spaces: operationally this means that the state of a given physical system is completely determined by the statistics of a finite number of finite-outcome measurements. The dimension of the vector space $\mathrm{St}_{\mathbb{R}}(A)$, which by construction is equal to the dimension of its dual $\mathrm{Eff}_{\mathbb{R}}(A)$, will be denoted by $D_A$. We will refer to $D_A$ as the size of system A.

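Finally, the vector spaces $\mathrm{St}_{\mathbb{R}}(A)$ and $\mathrm{Eff}_{\mathbb{R}}(A)$ can be equipped with suitable norms, which have an operational meaning related to optimal discrimination schemes [22]. The norm of an element $\delta \in \mathrm{St}_{\mathbb{R}}(A)$ is given by [22]

$$\|\delta\| = \sup_{a_0 \in \mathrm{Eff}(A)} (a_0|\delta) - \inf_{a_1 \in \mathrm{Eff}(A)} (a_1|\delta),$$

while the norm of an element $\xi \in \mathrm{Eff}_{\mathbb{R}}(A)$ is given by

$$\|\xi\| = \sup_{\rho \in \mathrm{St}(A)} |(\xi|\rho)|.$$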
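For intuition, the norm on the space spanned by the states (a supremum minus an infimum over effects) can be evaluated by brute force in a classical toy model. The sketch below assumes, and this is not taken from the text, a classical system with three perfectly distinguishable states, whose states are subnormalized probability vectors and whose effects are vectors with entries in [0,1], paired with states by the dot product. Since a linear functional on the cube of effects attains its extrema at a vertex, it suffices to scan the effects with entries 0 or 1. The function name `operational_norm` and the sample vectors are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def operational_norm(delta):
    """sup_a (a|delta) - inf_a (a|delta) over classical effects a in [0,1]^n.

    Only the 0/1 vertices are scanned: a linear functional on a cube
    is extremized at a vertex.
    """
    values = [sum(a_x * d_x for a_x, d_x in zip(a, delta))
              for a in product((0, 1), repeat=len(delta))]
    return max(values) - min(values)


# Two hypothetical classical states and their difference.
rho0 = (F(1, 2), F(1, 2), F(0))
rho1 = (F(0), F(1, 4), F(3, 4))
delta = tuple(x - y for x, y in zip(rho0, rho1))

norm = operational_norm(delta)
l1_distance = sum(abs(x) for x in delta)
print(norm, l1_distance)
```

In this toy model the operational norm of the difference coincides with the sum of the absolute values of its entries, which is the familiar distance governing optimal discrimination of two probability distributions.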

We will always take the set of states St(A) to be closed in the operational norm. The convenience of this choice is the convenience of using real numbers instead of rational ones: dealing with a single real number is much easier than dealing with a Cauchy sequence of rational numbers. Operationally taking St(A) to be closed is very natural: the fact that there is a sequence of states $\{\rho_n\}_{n=1}^{\infty}$ that converges to $\rho \in \mathrm{St}_{\mathbb{R}}(A)$ means that there is a procedure to prepare $\rho$ with arbitrary precision and hence that $\rho$ deserves the name of “state”.

We conclude this subsection by noting that every event $\mathcal{C}_k$ from A to B induces a linear map $\hat{\mathcal{C}}_k$ from $\mathrm{St}_{\mathbb{R}}(A)$ to $\mathrm{St}_{\mathbb{R}}(B)$, uniquely defined by

$$\hat{\mathcal{C}}_k : |\rho) \in \mathrm{St}(A) \mapsto \mathcal{C}_k |\rho) \in \mathrm{St}(B).$$

Likewise, for every system C the event $\mathcal{C}_k \otimes \mathcal{I}_C$ induces a linear map $\widehat{\mathcal{C}_k \otimes \mathcal{I}_C}$ from $\mathrm{St}_{\mathbb{R}}(AC)$ to $\mathrm{St}_{\mathbb{R}}(BC)$. If two events $\mathcal{C}_k$ and $\mathcal{C}_k'$ induce the same maps for every possible system C, then there is no experiment in the theory that is able to distinguish between them. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $\mathcal{C}_k \simeq \mathcal{C}_k'$ if $\widehat{\mathcal{C}_k \otimes \mathcal{I}_C} = \widehat{\mathcal{C}_k' \otimes \mathcal{I}_C}$ for every system C. In this case, we will say that two events represent the same transformation. Accordingly, we will refer to tests $\{\mathcal{C}_i\}_{i \in X}$ as collections of transformations. The deterministic transformations (corresponding to single-outcome tests) will be called channels.

C. Basic definitions in the operational-probabilistic framework

Here we summarize a few elementary definitions that will be used later in the paper. The meaning of the definitions in the case of quantum theory is also discussed.

1. Coarse graining, refinement, atomic transformations, pure, mixed and completely mixed states

First, we start from the notions of coarse graining and refinement. Coarse graining arises when we join together some outcomes of a test: we say that the test $\{\mathcal{D}_j\}_{j \in Y}$ is a coarse graining of the test $\{\mathcal{C}_i\}_{i \in X}$ if there is a disjoint partition $\{X_j\}_{j \in Y}$ of $X$ such that

$$\mathcal{D}_j = \sum_{i \in X_j} \mathcal{C}_i.$$

Conversely, if $\{\mathcal{D}_j\}_{j \in Y}$ is a coarse graining of $\{\mathcal{C}_i\}_{i \in X}$, we say that $\{\mathcal{C}_i\}_{i \in X}$ is a refinement of $\{\mathcal{D}_j\}_{j \in Y}$. Intuitively, a test that refines another is a test that extracts information in a more precise way: it is a test with better “resolving power.”
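The coarse-graining rule above, in which each coarse event is the sum of the fine events in one block of a partition of the outcome set, can be sketched in a few lines. The example is only illustrative: it assumes events are represented by tuples of numbers (here, the effects of a classical three-outcome observation test), and the function name `coarse_grain` and the outcome labels are hypothetical.

```python
def coarse_grain(test, partition):
    """Return the coarse-grained test {j: sum of test[i] for i in partition[j]}.

    `test` maps each fine outcome i to an event (a tuple of numbers);
    `partition` maps each coarse outcome j to the list X_j of fine outcomes
    that are joined together.
    """
    merged = sorted(i for block in partition.values() for i in block)
    if merged != sorted(test):
        raise ValueError("partition must cover every outcome exactly once")
    return {j: tuple(sum(column) for column in zip(*(test[i] for i in block)))
            for j, block in partition.items()}


# Hypothetical fine-grained observation test on a classical three-level system.
fine = {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}

# Join outcomes 1 and 2 into a single coarse outcome "high".
coarse = coarse_grain(fine, {"low": [0], "high": [1, 2]})
print(coarse)
```

Here `fine` is a refinement of `coarse`: it resolves the outcome "high" into two separate outcomes.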

The notion of refinement also applies to a single transformation: a refinement of the transformation $\mathcal{C}$ is given by a test $\{\mathcal{C}_i\}_{i \in X}$ and a subset $X_0 \subseteq X$ such that

$$\mathcal{C} = \sum_{i \in X_0} \mathcal{C}_i.$$

Accordingly, we say that each transformation $\mathcal{C}_i$, $i \in X_0$, is a refinement of $\mathcal{C}$. A transformation $\mathcal{C}$ is atomic if it has only trivial refinements: if $\mathcal{C}_i$ refines $\mathcal{C}$, then $\mathcal{C}_i = p\,\mathcal{C}$ for some probability $p \ge 0$. A test that consists of atomic transformations is a test whose “resolving power” cannot be further improved.

When discussing states (i.e., transformations with trivial input) we will use the word pure as a synonym of atomic. A pure state describes a situation of maximal knowledge about the system’s preparation, a knowledge that cannot be further refined.

As usual, a state that is not pure will be called mixed. An important notion is that of completely mixed state.

Definition 1 (Completely mixed state). A state is completely mixed if any other state can refine it: precisely, $\omega \in \mathrm{St}(A)$ is completely mixed if for every $\rho \in \mathrm{St}(A)$ there is a nonzero probability $p > 0$ such that $p\rho$ is a refinement of $\omega$.

Intuitively, a completely mixed state describes a situation of complete ignorance about the system’s preparation: if a system is described by a completely mixed state, then it means that we know so little about its preparation that, in fact, every preparation is possible.

We conclude this paragraph with a couple of definitions that will be used throughout the paper.

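Definition 2 (Reversible transformation). A transformation $\mathcal{U} \in \mathrm{Transf}(A,B)$ is reversible if there exists another transformation $\mathcal{U}^{-1} \in \mathrm{Transf}(B,A)$ such that $\mathcal{U}^{-1}\mathcal{U} = \mathcal{I}_A$ and $\mathcal{U}\mathcal{U}^{-1} = \mathcal{I}_B$. When A=B the reversible transformations form a group, indicated as $\mathbf{G}_A$.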

Definition 3 (Operationally equivalent systems). Two systems A and B are operationally equivalent if there exists a reversible transformation U from A to B.

When two systems are operationally equivalent one can convert one into the other in a reversible fashion.

2. Examples in quantum theory

Consider a quantum system with Hilbert space $\mathcal{H} = \mathbb{C}^d$, $d < \infty$. In this case a preparation test is a collection of unnormalized density matrices $\{\rho_i\}_{i \in X}$ (i.e., of nonnegative $d \times d$ complex matrices with trace bounded by 1) such that

$$\sum_{i \in X} \mathrm{Tr}[\rho_i] = 1.$$

Preparation tests are often called quantum information sources in quantum information theory. A generic state $\rho$ is an unnormalized density matrix. A deterministic state, corresponding to a single-outcome preparation test, is a normalized density matrix $\rho$, with $\mathrm{Tr}[\rho] = 1$.

Diagonalizing $\rho = \sum_i \alpha_i |\psi_i\rangle\langle\psi_i|$ we then obtain that each matrix $\alpha_i |\psi_i\rangle\langle\psi_i|$ is a refinement of $\rho$. More generally, every matrix $\sigma$ such that $\sigma \le \rho$ is a refinement of $\rho$. Up to a positive rescaling, all matrices with support contained in the support of $\rho$ are refinements of $\rho$. A quantum state $\rho$ is atomic (pure) if and only if it is proportional to a rank-one projection. A quantum state is completely mixed if and only if its density matrix has full rank. Note that the quantum state $\chi = I_d/d$, where $I_d$ is the identity $d \times d$ matrix, is a particular example of completely mixed state, but not the only example. Precisely, $\chi = I_d/d$ is the unique unitarily invariant state in dimension $d$.
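The rank criteria just stated (pure if and only if rank one, completely mixed if and only if full rank) can be illustrated for a qubit. The following is a minimal sketch under simplifying assumptions that are ours, not the paper's: density matrices are real symmetric 2×2 matrices with exact rational entries, and the rank is read off from the determinant. The names `rank_2x2` and `classify` and the sample states are hypothetical. For d = 2 every nonzero state has rank one or two, so here every mixed state is also completely mixed; this is special to qubits.

```python
from fractions import Fraction as F


def rank_2x2(m):
    """Rank of a 2x2 matrix: 2 if det != 0, else 1 if some entry is nonzero."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det != 0:
        return 2
    return 1 if any(x != 0 for row in m for x in row) else 0


def classify(rho):
    """Label a (possibly unnormalized) qubit state by the rank of its matrix."""
    rank = rank_2x2(rho)
    if rank == 1:
        return "pure"
    if rank == 2:
        return "completely mixed"
    return "zero"


half = F(1, 2)
states = {
    "ket0": [[F(1), F(0)], [F(0), F(0)]],        # |0><0|
    "plus": [[half, half], [half, half]],        # |+><+|
    "chi": [[half, F(0)], [F(0), half]],         # I_2 / 2
    "biased": [[F(3, 4), F(0)], [F(0), F(1, 4)]],  # full rank, not I_2 / 2
}
labels = {name: classify(rho) for name, rho in states.items()}
print(labels)
```

The state labelled `biased` is an instance of the remark above: it is completely mixed without being the unitarily invariant state.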

Let us now consider the case of observation tests: in quantum theory an observation test is given by a POVM (positive operator-valued measure), namely by a collection $\{P_j\}_{j \in Y}$ of nonnegative $d \times d$ matrices such that

$$\sum_{j \in Y} P_j = I_d.$$

An effect is then a nonnegative matrix $P \ge 0$ upper bounded by the identity. In quantum theory there is only one deterministic effect, corresponding to a single-outcome observation test: the unique deterministic effect given by the identity matrix. As we will see in the following section, the fact that the deterministic effect is unique is equivalent to the fact that quantum theory is a causal theory.

An effect P is atomic if and only if P is proportional to a rank-one projector. An observation test is atomic if it is a POVM with rank-one elements.

Finally, a general test from an input system with Hilbert space $\mathcal{H}_1 = \mathbb{C}^{d_1}$ to an output system with Hilbert space $\mathcal{H}_2 = \mathbb{C}^{d_2}$ is given by a quantum instrument, namely by a collection $\{\mathcal{C}_k\}_{k \in Z}$ of completely positive trace nonincreasing maps sending linear operators on $\mathcal{H}_1$ to linear operators on $\mathcal{H}_2$, with the property that

$$\mathcal{C}_Z := \sum_{k \in Z} \mathcal{C}_k$$

is trace preserving. A general transformation is then given by a trace nonincreasing map, called quantum operation, whereas a deterministic transformation, corresponding to a single-outcome test, is given by a trace-preserving map, called quantum channel.

Any quantum operation $\mathcal{C}$ can be written in the Kraus form $\mathcal{C}(\rho) = \sum_i C_i \rho C_i^{\dagger}$, where $C_i : \mathcal{H}_1 \to \mathcal{H}_2$ are the Kraus operators. Up to a positive scaling, every quantum operation $\mathcal{D}$ such that the Kraus operators of $\mathcal{D}$ belong to the linear span of the Kraus operators of $\mathcal{C}$ is a refinement of $\mathcal{C}$. A map $\mathcal{C}$ is atomic if and only if there is only one Kraus operator in its Kraus form. A reversible transformation in quantum theory is a unitary map $\mathcal{U}(\rho) = U \rho U^{\dagger}$, where $U : \mathcal{H}_1 \to \mathcal{H}_2$ is a unitary operator, that is, $U^{\dagger} U = I_1$ and $U U^{\dagger} = I_2$, where $I_1$ ($I_2$) is the identity operator on $\mathcal{H}_1$ ($\mathcal{H}_2$). Two quantum systems are operationally equivalent if and only if the corresponding Hilbert spaces have the same dimension.
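As a concrete instance of the Kraus form, the sketch below applies a qubit dephasing channel to a pure state. The example is an illustration of ours rather than one taken from the paper: the Kraus operators are (3/5)·I and (4/5)·Z, chosen so that all entries are rational and the arithmetic is exact, and they are real, so the adjoint reduces to the transpose. The helper names (`matmul`, `transpose`, `add`, `scale`, `apply_kraus`) are hypothetical.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(column) for column in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def apply_kraus(kraus_ops, rho):
    """Return sum_i K_i rho K_i^T (real Kraus operators: adjoint = transpose)."""
    dim = len(kraus_ops[0])
    out = [[F(0)] * dim for _ in range(dim)]
    for k in kraus_ops:
        out = add(out, matmul(matmul(k, rho), transpose(k)))
    return out


I2 = [[F(1), F(0)], [F(0), F(1)]]
Z = [[F(1), F(0)], [F(0), F(-1)]]
kraus = [scale(F(3, 5), I2), scale(F(4, 5), Z)]  # assumed dephasing channel

# Trace preservation: sum_i K_i^T K_i must equal the identity.
completeness = [[F(0)] * 2 for _ in range(2)]
for k in kraus:
    completeness = add(completeness, matmul(transpose(k), k))

plus = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]  # |+><+|
output = apply_kraus(kraus, plus)
print(completeness == I2, output)
```

Each term with a single Kraus operator is an atomic quantum operation, and the two terms together form a test that refines the channel; the channel itself has two Kraus operators in this form and maps the pure input to a mixed output with reduced off-diagonal entries.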

D. Operational principles

We are now in a position to make precise the usage of the expression “operational principle” in the context of this paper. By operational principle we mean here a principle that can be stated using only the operational-probabilistic language, i.e., using only

  • (1) the notions of system, test, outcome, probability, state, effect, transformation;

  • (2) their specifications: atomic, pure, mixed, completely mixed; and

  • (3) more complex notions constructed from the above terms (e.g., the notion of “reversible transformation”).

The distinction between operational principles and principles referring to abstract mathematical properties, mentioned in the Introduction, should now be clear: for example, a statement like “the pure states of a system cannot be cloned” is a valid operational principle, because it can be analyzed in basic operational-probabilistic terms as “for every system A there exists no transformation $\mathcal{C}$ with input system A and output system AA such that $\mathcal{C}|\phi) = |\phi)|\phi)$ for every pure state $\phi$ of A.” On the contrary, a statement like “the state space of a system with two perfectly distinguishable states is a three-dimensional sphere” is not a valid operational principle, because there is no way to express what it means for a state space to be a three-dimensional sphere in terms of basic operational notions. The fact that a state space is a sphere may eventually be derived from operational principles, but cannot be assumed as a starting point.

III. THE PRINCIPLES

We now state the principles used in our derivation. The first five principles express generic features that are shared by both classical and quantum theory. They could be even included in the definition of the background framework: they define the simple model of information processing in which we try to single out quantum theory. For this reason we will call them axioms. The sixth principle in our derivation has a different status: it expresses the genuinely quantum features. A major message of our work is that, within a broad class of theories of information processing, quantum theory is completely described by the purification principle. To emphasize the special role of the sixth principle we will call it postulate, in analogy with the parallel postulate of Euclidean geometry.

A. Axioms
1. Causality

The first axiom of our list, causality [22], is so basic that it could be considered as part of the background framework. We decided to explicitly present it as an axiom for two reasons: The first reason is that the framework of operational-probabilistic theories can be developed even without this requirement (see Ref. [22] for the general framework and Refs. [31,32] for two explicit examples of noncausal theories). The second reason is that we want to stress that causality is an essential ingredient in our derivation. This observation is important in view of possible extensions of quantum theory to quantum gravity scenarios where the causal structure is not defined from the start (see, e.g., Hardy in Ref. [33]).

Axiom 1 (Causality). The probability of preparations is independent of the choice of observations.

In technical terms: if $\{\rho_i\}_{i\in X}\subset\mathrm{St}(A)$ is a preparation test, then the conditional probability of the preparation $\rho_i$ given the choice of the observation-test $\{a_j\}_{j\in Y}$ is the marginal

$p(i|\{a_j\}):=\sum_{j\in Y}(a_j|\rho_i).$

The axiom states that the marginal probability $p(i|\{a_j\})$ is independent of the choice of the observation-test $\{a_j\}$: if $\{a_j\}_{j\in Y}$ and $\{b_k\}_{k\in Z}$ are two different observation tests, then one has $p(i|\{a_j\})=p(i|\{b_k\})$. Loosely speaking, one may refer to causality as a requirement of no signaling from the future: indeed, causality is equivalent to the fact that the probability of an outcome at a certain time does not depend on the choice of operations that will be done at later times [20].
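In quantum theory the axiom can be checked directly, since the effects of any observation-test sum to the identity. The sketch below, which assumes the quantum pairing $(a|\rho)=\mathrm{Tr}(a\rho)$ and uses illustrative states and tests chosen here, computes the marginal $p(i|\{a_j\})$ for two different qubit measurements.

```python
def trace_prod(a, b):
    """Tr(a b) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def marginal(rho_i, observation_test):
    """p(i|{a_j}) = sum_j (a_j|rho_i), with (a|rho) = Tr(a rho)."""
    return sum(trace_prod(a_j, rho_i) for a_j in observation_test)

# Preparation test: the unnormalized states 0.3|0><0| and 0.7|+><+|.
rho_0 = [[0.3, 0.0], [0.0, 0.0]]
rho_1 = [[0.35, 0.35], [0.35, 0.35]]

# Two different observation tests: projectors on the Z basis and on the X basis.
z_test = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
x_test = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

p_z = [marginal(r, z_test) for r in (rho_0, rho_1)]
p_x = [marginal(r, x_test) for r in (rho_0, rho_1)]
```

Both lists agree with the traces of the unnormalized states, so the probability of each preparation does not depend on which measurement is performed afterwards.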

An operational-probabilistic theory that satisfies the causality axiom 1 will be called causal. As we already mentioned, causality is a very basic requirement and could be considered as part of the framework: it provides the notions used to state the other axioms and it implies several facts that will be used frequently in the paper. In fact, in our derivation we do not use the causality axiom directly, but only through its consequences. In the following we briefly summarize the facts and the notations that characterize the framework of causal operational-probabilistic theories, introduced and discussed in detail in Ref. [22]. Similar structures have been subsequently considered in Refs. [34,35] within a formal description of circuits in foliable space-time regions.

First, causality is equivalent to the existence of an effect $e_A$ such that $e_A=\sum_{j\in Y}a_j$ for every observation-test $\{a_j\}_{j\in Y}$. We call the effect $e_A$ the deterministic effect for system A. By definition, the effect $e_A$ is unique. The subindex A in $e_A$ will be dropped when no confusion can arise.

In a causal theory every test $\{\mathcal{C}_i\}_{i\in X}\subset\mathrm{Transf}(A,B)$ satisfies the condition

$\sum_{i\in X}(e_B|\mathcal{C}_i=(e_A|.$

As a consequence, a transformation $\mathcal{C}\in\mathrm{Transf}(A,B)$ satisfies the condition

$(e_B|\mathcal{C}\le(e_A|,$
(4)

with the equality if and only if $\mathcal{C}$ is a channel (i.e., a deterministic transformation, corresponding to a single-outcome test). In Eq. (4) we used the notation $(a|\le(a'|$ to mean $(a|\rho)\le(a'|\rho)$ for every $\rho\in\mathrm{St}(A)$.

In a causal theory the norm of a state $\rho_i\in\mathrm{St}(A)$ is given by $\|\rho_i\|=(e|\rho_i)$. Accordingly, one can define the normalized state

$\bar\rho_i:=\frac{\rho_i}{(e|\rho_i)}.$

In a causal theory one can always allow for rescaled preparations: conditionally on the outcome $i\in X$ in the preparation-test $\{\rho_i\}_{i\in X}$ we can say that we prepared the normalized state $\bar\rho_i$. For this reason, every state in a causal theory is proportional to a normalized state.

The set of normalized states will be denoted by $\mathrm{St}_1(A)$. Since the set of all states $\mathrm{St}(A)$ is closed in the operational norm, the set of normalized states $\mathrm{St}_1(A)$ is also closed. Moreover, the set $\mathrm{St}_1(A)$ is convex [22]: this means that for every pair of normalized states $\rho_1,\rho_2\in\mathrm{St}_1(A)$ and for every probability $p\in[0,1]$ the convex combination $\rho_p=p\rho_1+(1-p)\rho_2$ is a normalized state. Operationally, the state $\rho_p$ is obtained by

  • (1) performing a binary test with outcomes $\{1,2\}$ and outcome probabilities $p_1=p$ and $p_2=1-p$;

  • (2) for outcome $i$ preparing $\rho_i$, thus realizing the preparation-test $\{p_i\rho_i\}_{i=1,2}$;

  • (3) coarse graining over the outcomes, thus obtaining $\rho_p=p\rho_1+(1-p)\rho_2$.

Step 2 (preparation of a state conditionally on the outcome of a previous test) is possible because the theory is causal [22].

The pure normalized states are the extreme points of the convex set $\mathrm{St}_1(A)$. For a normalized state $\rho\in\mathrm{St}_1(A)$ we define the face identified by $\rho$ as follows.

Definition 4 (Face identified by a state). The face identified by $\rho\in\mathrm{St}_1(A)$ is the set $F_\rho$ of all normalized states $\sigma\in\mathrm{St}_1(A)$ such that $\rho=p\sigma+(1-p)\tau$, for some nonzero probability $p>0$ and some normalized state $\tau\in\mathrm{St}_1(A)$.

In other words, $F_\rho$ is the set of all normalized states that show up in the convex decompositions of $\rho$. Clearly, if $\varphi$ is a pure state, then one has $F_\varphi=\{\varphi\}$. The opposite situation is that of completely mixed states: by definition 1, a state $\omega\in\mathrm{St}_1(A)$ is completely mixed if every state $\sigma\in\mathrm{St}_1(A)$ can stay in its convex decomposition, that is, if $F_\omega=\mathrm{St}_1(A)$. An equivalent condition for a state to be completely mixed is the following.

Lemma 1. A state $\omega\in\mathrm{St}_1(A)$ is completely mixed if and only if $\mathrm{Span}(F_\omega)=\mathrm{St}_{\mathbb{R}}(A)$.

Proof. The condition is clearly necessary. It is also sufficient because for a state $\sigma\in\mathrm{St}_1(A)$ the relation $\sigma\in\mathrm{Span}(F_\omega)$ implies $\sigma\in F_\omega$ (see lemma 16 of Ref. [22]).

A completely mixed state can never be distinguished from another state with zero error probability.

Proposition 1. Let $\rho\in\mathrm{St}_1(A)$ be a completely mixed state and $\sigma\in\mathrm{St}_1(A)$ be an arbitrary state. Then, the probability of error in distinguishing $\rho$ from $\sigma$ is strictly greater than zero.

Proof. By contradiction, suppose that one can distinguish between $\rho$ and $\sigma$ with zero error probability. This means that there exists a binary test $\{a_\rho,a_\sigma\}$ such that $(a_\rho|\sigma)=(a_\sigma|\rho)=0$. Since $\rho$ is completely mixed there exists a probability $p>0$ and a state $\tau\in\mathrm{St}_1(A)$ such that $\rho=p\sigma+(1-p)\tau$. Hence, the condition $(a_\sigma|\rho)=0$ implies $(a_\sigma|\sigma)=0$. Therefore, we have $(a_\rho|\sigma)+(a_\sigma|\sigma)=0$. This is in contradiction with the normalization of the probabilities in the test $\{a_\rho,a_\sigma\}$, which would require $(a_\rho|\sigma)+(a_\sigma|\sigma)=1$.

2. Perfect distinguishability

Our second axiom regards the task of state discrimination. As we saw in proposition 1, if a state is completely mixed, then it is impossible to distinguish it perfectly from any other state. Axiom 2 states the converse.

Axiom 2 (Perfect distinguishability). Every state that is not completely mixed can be perfectly distinguished from some other state.

Note that the statement of axiom 2 holds for quantum and for classical information theory. In quantum theory a completely mixed state is a density matrix with full rank. If a density matrix $\rho$ does not have full rank, then it must have a nontrivial kernel: hence, every density matrix $\sigma$ with support in the kernel of $\rho$ will be perfectly distinguishable from $\rho$, as stated in axiom 2. Applying the same reasoning to density matrices that are diagonal in a given basis, one can easily see that axiom 2 is also satisfied by classical information theory.
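The quantum instance of axiom 2 can be sketched in a few lines of Python. The states below are chosen here purely for illustration: $\rho=|+\rangle\langle+|$ is rank deficient, $\sigma=|-\rangle\langle-|$ lives in its kernel, and the binary test consists of the projectors on the support and on the kernel of $\rho$.

```python
def trace_prod(a, b):
    """Tr(a b) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

# rho = |+><+| has rank 1, so it is not completely mixed.
rho = [[0.5, 0.5], [0.5, 0.5]]
# sigma = |-><-| is supported in the kernel of rho.
sigma = [[0.5, -0.5], [-0.5, 0.5]]

# Binary test {a_rho, a_sigma}: projectors on the support and kernel of rho.
a_rho = [[0.5, 0.5], [0.5, 0.5]]
a_sigma = [[0.5, -0.5], [-0.5, 0.5]]

error_given_sigma = trace_prod(a_rho, sigma)  # outcome "rho" when sigma was prepared
error_given_rho = trace_prod(a_sigma, rho)    # outcome "sigma" when rho was prepared

# For comparison, a full-rank state triggers both outcomes of this test.
omega = [[0.5, 0.0], [0.0, 0.5]]
omega_overlap = trace_prod(a_sigma, omega)
```

Both error probabilities vanish, whereas the full-rank state $\omega$ gives a nonzero probability on the effect $a_\sigma$, in line with proposition 1 for this particular test.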

To the best of our knowledge, the perfect distinguishability property has never been considered in the literature as an axiom, probably because in most works it came for free as a consequence of stronger mathematical assumptions. For example, one can obtain the perfect distinguishability property from the no-restriction hypothesis of Ref. [22], stating that for every system A any binary probability rule [i.e., any pair of positive functionals $a_0,a_1\in\mathrm{Eff}_{\mathbb{R}}(A)$ such that $a_0+a_1=e_A$] actually describes a measurement allowed by the theory. This assumption was made, for example, in Ref. [18] in the case of systems with at most two distinguishable states (see requirement 5 of Ref. [18]). Note that the difference between the perfect distinguishability axiom and the no-restriction hypothesis is that the former can be expressed in purely operational terms, whereas the latter requires the notion of “positive functional,” which is not part of the basic operational language.

3. Ideal compression

The third axiom is about information compression. An information source for system A is a preparation-test $\{\rho_i\}_{i\in X}$, where each $\rho_i\in\mathrm{St}(A)$ is an unnormalized state and $\sum_{i\in X}(e|\rho_i)=1$. A compression scheme is given by an encoding operation $\mathcal{E}$ from A to a smaller system C, that is, to a system C such that $D_C\le D_A$. The compression scheme is lossless for the source $\{\rho_i\}_{i\in X}$ if there exists a decoding operation $\mathcal{D}$ from C to A such that $\mathcal{D}\mathcal{E}|\rho_i)=|\rho_i)$ for every value of the index $i\in X$. This means that the decoding allows one to perfectly retrieve the states $\{\rho_i\}_{i\in X}$. We say that a compression scheme is lossless for the state $\rho$ if it is lossless for every source $\{\rho_i\}_{i\in X}$ such that $\rho=\sum_{i\in X}\rho_i$. Equivalently, this means that the restriction of $\mathcal{D}\mathcal{E}$ to the face identified by $\rho$ is equal to the identity channel: $\mathcal{D}\mathcal{E}|\sigma)=|\sigma)$ for every $\sigma\in F_\rho$.

A lossless compression scheme is maximally efficient if the encoding system C has the smallest possible size, that is, if the system C has no more states than exactly those needed to compress $\rho$. This happens when every normalized state $\tau\in\mathrm{St}_1(C)$ comes from the encoding of some normalized state $\sigma\in F_\rho$, namely $|\tau)=\mathcal{E}|\sigma)$.

We say that a compression scheme that is lossless and maximally efficient is ideal. Our third axiom states that ideal compression is always possible.

Axiom 3 (Ideal compression). For every state there exists an ideal compression scheme.

It is easy to see that this statement holds in quantum theory and in classical probability theory. For example, if ρ is a density matrix on a d-dimensional Hilbert space and rank(ρ)=r, then the ideal compression is obtained by just encoding ρ in an r-dimensional Hilbert space. As long as we do not tolerate losses, this is the most efficient one-shot compression we can devise in quantum theory. Similar observations hold for classical information theory.
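The quantum case can be sketched as follows, under the assumption (made here for illustration) that the encoding is $\mathcal{E}(\sigma)=V^\dagger\sigma V$ and the decoding is $\mathcal{D}(\tau)=V\tau V^\dagger$, where $V$ is an isometry from the $r$-dimensional space onto the support of $\rho$. The specific isometry and state are hypothetical examples.

```python
from math import sqrt

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

h = 1 / sqrt(2)
# Real isometry V: C^2 -> C^3, so V^dagger is simply the transpose.
V = [[1, 0], [0, h], [0, h]]
Vt = transpose(V)

def encode(sigma):
    """E(sigma) = V^dagger sigma V, from the qutrit A to the qubit C."""
    return matmul(matmul(Vt, sigma), V)

def decode(tau):
    """D(tau) = V tau V^dagger, from the qubit C back to the qutrit A."""
    return matmul(matmul(V, tau), Vt)

tau = [[0.25, 0.1], [0.1, 0.75]]  # full-rank state of the compressed system C
rho = decode(tau)                 # rank-2 state of the qutrit A

lossless = close(decode(encode(rho)), rho)   # D E acts as the identity on rho
efficient = close(encode(decode(tau)), tau)  # E D acts as the identity on C
```

Note that in this sketch the encoding preserves the trace only for states supported on the range of $V$, which is the quantum counterpart of the encoding being deterministic upon input of $\rho$.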

It is important to emphasize the difference between our “ideal compression” axiom and the “subspace” axiom of Refs. [16–18]: differently from the subspace axiom, the compression axiom is not an axiom about the structure of perfectly distinguishable states available for a given system. For example, here we do not assume that all systems with the same number of perfectly distinguishable states are equivalent. This fact will be proved from the principles in Sec. XII B.

4. Local distinguishability

The fourth axiom consists in the assumption of local distinguishability, here presented in the formulation of Ref. [22].

Axiom 4 (Local distinguishability). If two bipartite states are different, then they give different probabilities for at least one product experiment.

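In more technical terms: if $\rho,\sigma\in\mathrm{St}_1(AB)$ are states and $\rho\neq\sigma$, then there are two effects $a\in\mathrm{Eff}(A)$ and $b\in\mathrm{Eff}(B)$ such that

$(a\otimes b|\rho)\neq(a\otimes b|\sigma).$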

Local distinguishability is equivalent to the fact that two distant parties, holding systems A and B, respectively, can distinguish between the two states $\rho,\sigma\in\mathrm{St}_1(AB)$ using only local operations and classical communication and achieving an error probability strictly smaller than $p_{\mathrm{ran}}=1/2$, the probability of error in a random guess [22]. Again, this statement holds in ordinary quantum theory (on complex Hilbert spaces) and in classical information theory.

Another condition equivalent to local distinguishability is the local tomography axiom, introduced in Refs. [19,36]. The local tomography axiom imposes that every bipartite state can be reconstructed from the statistics of local measurements on the component systems. Technically, local tomography is in turn equivalent to the relation $D_{AB}=D_AD_B$ [16] and to the fact that every state $\rho\in\mathrm{St}(AB)$ can be written as

$\rho=\sum_{i=1}^{D_A}\sum_{j=1}^{D_B}\rho_{ij}\,\alpha_i\otimes\beta_j,$

where $\{\alpha_i\}_{i=1}^{D_A}$ ($\{\beta_j\}_{j=1}^{D_B}$) is a basis for the vector space $\mathrm{St}_{\mathbb{R}}(A)$ [$\mathrm{St}_{\mathbb{R}}(B)$]. The analogous condition also holds for effects: every bipartite effect $E\in\mathrm{Eff}(AB)$ can be written as

$E=\sum_{i=1}^{D_A}\sum_{j=1}^{D_B}E_{ij}\,a_i\otimes b_j,$

where $\{a_i\}_{i=1}^{D_A}$ ($\{b_j\}_{j=1}^{D_B}$) is a basis for the vector space $\mathrm{Eff}_{\mathbb{R}}(A)$ [$\mathrm{Eff}_{\mathbb{R}}(B)$].

An important consequence of local distinguishability, observed in Ref. [22], is that a transformation $\mathcal{C}\in\mathrm{Transf}(A,B)$ is completely specified by its action on $\mathrm{St}(A)$: thanks to local distinguishability we have the implication

$\mathcal{C}\rho=\mathcal{C}'\rho\quad\forall\rho\in\mathrm{St}(A)\ \Longrightarrow\ \mathcal{C}=\mathcal{C}'$
(5)

(see lemma 14 of Ref. [22] for the proof). Note that Eq. (5) does not hold for quantum theory on real Hilbert spaces [22].
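The relation $D_{AB}=D_AD_B$ can be checked by simple parameter counting. The sketch below assumes the standard counts for a $d$-level system: $d^2$ real parameters for Hermitian matrices (complex quantum theory) and $d(d+1)/2$ for real symmetric matrices (quantum theory on real Hilbert spaces). The function names are introduced here for illustration.

```python
def dim_complex(d):
    """Real dimension of the space of d x d Hermitian matrices."""
    return d * d

def dim_real(d):
    """Real dimension of the space of d x d real symmetric matrices."""
    return d * (d + 1) // 2

def locally_tomographic(dim, d_a, d_b):
    """Check D_AB = D_A * D_B for local Hilbert-space dimensions d_a, d_b."""
    return dim(d_a * d_b) == dim(d_a) * dim(d_b)

complex_ok = all(locally_tomographic(dim_complex, a, b)
                 for a in range(2, 6) for b in range(2, 6))
real_ok = locally_tomographic(dim_real, 2, 2)  # compares 10 with 3 * 3
```

For two real qubits the composite state space has dimension 10 while the product of the local dimensions is 9, so product measurements cannot fix every bipartite state, consistent with the remark on real Hilbert spaces above.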

5. Pure conditioning

The fifth axiom states how the outcomes of a measurement on one side of a pure bipartite state can induce pure states on the other side. In this case we consider atomic measurements, that is, measurements described by observation-tests $\{a_i\}_{i\in X}$ where each effect $a_i$ is atomic. Intuitively, atomic measurements are those with maximum “resolving power.”

Axiom 5 (Pure conditioning). If a bipartite system is in a pure state, then each outcome of an atomic measurement on one side induces a pure state on the other.

The pure conditioning property holds in quantum theory and in classical information theory as well. In fact, the statement is trivial in classical information theory, because the only pure bipartite states are the product of pure states: no matter which measurement is performed on one side, the remaining state on the other side will necessarily be pure.
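A quantum illustration of pure conditioning is sketched below. It assumes the usual quantum rule for the conditional (unnormalized) state, $\mathrm{Tr}_B[(I\otimes a)|\Psi\rangle\langle\Psi|]$, and uses an entangled state and effects chosen here for illustration. Purity is measured by $\mathrm{Tr}(\rho^2)/\mathrm{Tr}(\rho)^2$, which equals 1 exactly for rank-one matrices.

```python
from math import sqrt

def conditional_state(psi, effect):
    """Unnormalized state on A after the effect occurs on B.

    psi[i][m] are the amplitudes of |i>_A |m>_B, and
    rho[i][k] = sum_{l,m} effect[l][m] * psi[i][m] * conj(psi[k][l]).
    """
    d_a, d_b = len(psi), len(psi[0])
    return [[sum(effect[l][m] * psi[i][m] * complex(psi[k][l]).conjugate()
                 for l in range(d_b) for m in range(d_b))
             for k in range(d_a)] for i in range(d_a)]

def purity(rho):
    """Tr(rho^2) / Tr(rho)^2 for a positive matrix rho."""
    n = len(rho)
    tr = sum(rho[i][i] for i in range(n)).real
    tr_sq = sum((rho[i][k] * rho[k][i]).real
                for i in range(n) for k in range(n))
    return tr_sq / tr ** 2

# Pure entangled state sqrt(0.8)|00> + sqrt(0.2)|11>.
psi = [[sqrt(0.8), 0.0], [0.0, sqrt(0.2)]]

atomic_effect = [[0.5, 0.5], [0.5, 0.5]]  # rank-one projector |+><+|
coarse_effect = [[1.0, 0.0], [0.0, 1.0]]  # deterministic effect, not atomic

purity_atomic = purity(conditional_state(psi, atomic_effect))
purity_coarse = purity(conditional_state(psi, coarse_effect))
```

The atomic effect leaves a pure state on A, whereas the deterministic effect returns the mixed marginal with purity 0.68.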

The pure conditioning property, as formulated above, has been recently introduced in Ref. [37]. A stronger version of axiom 5 is the atomicity of composition introduced in Ref. [20]:

  • 5′ Atomicity of composition: the sequential composition of two atomic operations is atomic.

Since pure states and atomic effects are a particular case of atomic transformations, axiom 5′ implies axiom 5. In our derivation, however, the converse implication also holds: indeed, thanks to the purification postulate we will be able to show that axiom 5 implies axiom 5′ (see lemma 16).

B. The purification postulate

The last principle in our list is the purification postulate, which was introduced and explored in detail in Ref. [22]. While the previous axioms were also satisfied by classical probability theory, the purification postulate introduces in our derivation the genuinely quantum features. A purification of the state $\rho\in\mathrm{St}_1(A)$ is a pure state $\Psi_\rho$ of some composite system AB, with the property that $\rho$ is the marginal of $\Psi_\rho$, that is,

$(e|_B\,|\Psi_\rho)_{AB}=|\rho)_A.$

Here we refer to the system B as the purifying system. The purification postulate states that every state can be obtained as the marginal of a pure bipartite state in an essentially unique way.

Postulate 1 (Purification). Every state has a purification. For fixed purifying system, every two purifications of the same state are connected by a reversible transformation on the purifying system.

Informally speaking, our postulate states that the ignorance about a part is always compatible with a maximal knowledge of the whole. The existence of pure bipartite states with mixed marginal was already recognized by Schrödinger as the characteristic trait of quantum theory [23]. Here, however, we also emphasize the importance of the uniqueness of purification up to reversible transformations: this property sets up a relation between pure states and reversible transformations that generates most of the structure of quantum theory. As shown in Ref. [22], an impressive number of quantum features are actually direct consequences of purification. In particular, purification implies the possibility of simulating any irreversible process through a reversible interaction of the system with an environment that is finally discarded.
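In quantum theory both parts of the postulate can be illustrated with a short sketch. It assumes, for simplicity, a state $\rho$ that is diagonal in the computational basis with eigenvalues `p`, purified by $\sum_i\sqrt{p_i}\,|i\rangle|i\rangle$. The helper names and the chosen unitary are illustrative.

```python
from math import sqrt

def marginal_on_a(psi):
    """rho_A[i][k] = sum_j psi[i][j] * conj(psi[k][j])."""
    d_a, d_b = len(psi), len(psi[0])
    return [[sum(psi[i][j] * complex(psi[k][j]).conjugate() for j in range(d_b))
             for k in range(d_a)] for i in range(d_a)]

def apply_on_b(psi, u):
    """(I tensor U)|psi): psi'[i][j] = sum_m u[j][m] * psi[i][m]."""
    d_b = len(psi[0])
    return [[sum(u[j][m] * row[m] for m in range(d_b)) for j in range(d_b)]
            for row in psi]

p = [0.7, 0.3]  # eigenvalues of the mixed state rho = diag(0.7, 0.3)
psi = [[sqrt(p[0]), 0.0], [0.0, sqrt(p[1])]]  # purification of rho

h = 1 / sqrt(2)
U = [[h, h], [h, -h]]        # a reversible transformation on the purifying system
psi_rotated = apply_on_b(psi, U)

rho_from_psi = marginal_on_a(psi)
rho_from_rotated = marginal_on_a(psi_rotated)
```

Both pure states have the same marginal on A, which shows one direction of the uniqueness clause: purifications related by a reversible transformation on B purify the same state.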

IV. FIRST CONSEQUENCES OF THE PRINCIPLES
A. Results about ideal compression

Let $\rho\in\mathrm{St}_1(A)$ be a state and let $\mathcal{E}\in\mathrm{Transf}(A,C)$ [or $\mathcal{D}\in\mathrm{Transf}(C,A)$] be its encoding (or decoding) in the ideal compression scheme of axiom 3.

Essentially the encoding operation $\mathcal{E}\in\mathrm{Transf}(A,C)$ identifies the face $F_\rho$ with the state space $\mathrm{St}_1(C)$. In the following we provide a list of elementary lemmas showing that all statements about $F_\rho$ can be translated into statements about $\mathrm{St}_1(C)$ and vice versa.

Lemma 2. The composition of decoding and encoding is the identity on C, namely $\mathcal{E}\mathcal{D}=\mathcal{I}_C$.

Proof. Since the compression is maximally efficient, for every state $\tau\in\mathrm{St}_1(C)$ there is a state $\sigma\in F_\rho$ such that $\mathcal{E}\sigma=\tau$. Using the fact that $\mathcal{D}\mathcal{E}\sigma=\sigma$ (the compression is lossless) we then obtain $\mathcal{E}\mathcal{D}\tau=\mathcal{E}\mathcal{D}\mathcal{E}\sigma=\mathcal{E}\sigma=\tau$. By local distinguishability [see Eq. (5)], this implies $\mathcal{E}\mathcal{D}=\mathcal{I}_C$.

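Lemma 3. The image of $\mathrm{St}_1(C)$ under the decoding operation $\mathcal{D}$ is $F_\rho$.

Proof. Since the compression is maximally efficient, for all $\tau\in\mathrm{St}_1(C)$ there exists $\sigma\in F_\rho$ such that $\tau=\mathcal{E}\sigma$. Then, $\mathcal{D}\tau=\mathcal{D}\mathcal{E}\sigma=\sigma$. This implies that $\mathcal{D}[\mathrm{St}_1(C)]\subseteq F_\rho$. On the other hand, since the compression is lossless, for every state $\sigma\in F_\rho$ one has $\mathcal{D}\mathcal{E}\sigma=\sigma$. This implies the inclusion $F_\rho\subseteq\mathcal{D}[\mathrm{St}_1(C)]$.

Lemma 4. If the state $\varphi\in F_\rho$ is pure, then the state $\mathcal{E}\varphi\in\mathrm{St}_1(C)$ is pure. If the state $\psi\in\mathrm{St}_1(C)$ is pure, then the state $\mathcal{D}\psi\in F_\rho$ is pure.

Proof. Suppose that $\varphi\in F_\rho$ is pure and that $\mathcal{E}\varphi$ can be written as $\mathcal{E}\varphi=p\sigma+(1-p)\tau$ for some $p>0$ and some $\sigma,\tau\in\mathrm{St}_1(C)$. Applying $\mathcal{D}$ on both sides we obtain $\varphi=p\mathcal{D}\sigma+(1-p)\mathcal{D}\tau$. Since $\varphi$ is pure we must have $\mathcal{D}\sigma=\mathcal{D}\tau=\varphi$. Now, applying $\mathcal{E}$ on all terms of the equality and using lemma 2 we obtain $\sigma=\tau=\mathcal{E}\varphi$. This proves that $\mathcal{E}\varphi$ is pure. Conversely, suppose that $\psi\in\mathrm{St}_1(C)$ is pure and $\mathcal{D}\psi=p\sigma+(1-p)\tau$ for some $p>0$ and some $\sigma,\tau\in\mathrm{St}_1(A)$. Since $\mathcal{D}\psi$ is in the face $F_\rho$ (lemma 3), $\sigma$ and $\tau$ are also in the same face. Applying $\mathcal{E}$ on both sides of the equality $\mathcal{D}\psi=p\sigma+(1-p)\tau$ and using lemma 2 we obtain $\psi=\mathcal{E}\mathcal{D}\psi=p\mathcal{E}\sigma+(1-p)\mathcal{E}\tau$. Since $\psi$ is pure we must have $\mathcal{E}\sigma=\mathcal{E}\tau=\psi$. Applying $\mathcal{D}$ on all terms of the equality we then have $\sigma=\tau=\mathcal{D}\psi$, thus proving that $\mathcal{D}\psi$ is pure.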

We say that a state $\sigma\in F_\rho$ is completely mixed relative to the face $F_\rho$ if every state $\tau\in F_\rho$ can stay in the convex decomposition of $\sigma$. In other words, $\sigma$ is completely mixed relative to $F_\rho$ if one has $F_\sigma=F_\rho$. Note that in general $\sigma\in F_\rho$ implies $F_\sigma\subseteq F_\rho$.

We then have the following.

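Lemma 5. If the state $\omega\in F_\rho$ is completely mixed relative to $F_\rho$, then the state $\mathcal{E}\omega\in\mathrm{St}_1(C)$ is completely mixed. If the state $\upsilon\in\mathrm{St}_1(C)$ is completely mixed, then the state $\mathcal{D}\upsilon\in F_\rho$ is completely mixed relative to $F_\rho$.

Proof. Suppose that $\omega$ is completely mixed relative to $F_\rho$. Then every state $\sigma\in F_\rho$ can stay in its convex decomposition, say $\omega=p\sigma+(1-p)\sigma'$ with $p>0$ and $\sigma'\in F_\rho$. Applying $\mathcal{E}$ we have

$\mathcal{E}\omega=p\mathcal{E}\sigma+(1-p)\mathcal{E}\sigma'.$
(6)

Since the compression is maximally efficient, for every state $\tau\in\mathrm{St}_1(C)$ there exists a state $\sigma\in F_\rho$ such that $\tau=\mathcal{E}\sigma$. Choosing the suitable $\sigma\in F_\rho$ and substituting $\tau$ for $\mathcal{E}\sigma$ in Eq. (6) we then obtain that for every state $\tau\in\mathrm{St}_1(C)$ there exists a probability $p>0$ and a state $\sigma'\in F_\rho$ such that

$\mathcal{E}\omega=p\tau+(1-p)\mathcal{E}\sigma'.$

This implies that $\mathcal{E}\omega$ is completely mixed. Suppose now that $\upsilon\in\mathrm{St}_1(C)$ is completely mixed. Then every state $\tau\in\mathrm{St}_1(C)$ can stay in its convex decomposition, say $\upsilon=p\tau+(1-p)\tau'$, with $p>0$ and $\tau'\in\mathrm{St}_1(C)$. Applying $\mathcal{D}$ on both sides we have

$\mathcal{D}\upsilon=p\mathcal{D}\tau+(1-p)\mathcal{D}\tau'.$
(7)

Now, using lemma 3 we have that every state $\sigma\in F_\rho$ can be written as $\sigma=\mathcal{D}\tau$ for some $\tau\in\mathrm{St}_1(C)$. Choosing the suitable $\tau\in\mathrm{St}_1(C)$ and substituting $\sigma$ for $\mathcal{D}\tau$ in Eq. (7) we then obtain that for every state $\sigma\in F_\rho$ there exists a probability $p>0$ and a state $\tau'\in\mathrm{St}_1(C)$ such that $\mathcal{D}\upsilon=p\sigma+(1-p)\mathcal{D}\tau'$. Therefore, $\mathcal{D}\upsilon$ is completely mixed relative to $F_\rho$.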

We now show that the system C used for ideal compression of the state ρ is unique up to operational equivalence.

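Lemma 6. If two systems $C$ and $C'$ allow for ideal compression of a state $\rho\in\mathrm{St}_1(A)$, then $C$ and $C'$ are operationally equivalent.

Proof. Let $\mathcal{E},\mathcal{D}$ and $\mathcal{E}',\mathcal{D}'$ denote the encoding and decoding schemes for systems $C$ and $C'$, respectively. Define the transformations $\mathcal{U}:=\mathcal{E}'\mathcal{D}\in\mathrm{Transf}(C,C')$ and $\mathcal{V}:=\mathcal{E}\mathcal{D}'\in\mathrm{Transf}(C',C)$. It is easy to see that $\mathcal{U}$ is reversible and $\mathcal{U}^{-1}=\mathcal{V}$. Indeed, since the restriction of $\mathcal{D}\mathcal{E}$ and $\mathcal{D}'\mathcal{E}'$ to the face $F_\rho$ is the identity, using lemma 3 one has $\mathcal{D}\mathcal{E}\mathcal{D}'=\mathcal{D}'$ and similarly $\mathcal{D}'\mathcal{E}'\mathcal{D}=\mathcal{D}$. Hence we have $\mathcal{U}\mathcal{V}=\mathcal{E}'\mathcal{D}\mathcal{E}\mathcal{D}'=\mathcal{E}'\mathcal{D}'=\mathcal{I}_{C'}$ and $\mathcal{V}\mathcal{U}=\mathcal{E}\mathcal{D}'\mathcal{E}'\mathcal{D}=\mathcal{E}\mathcal{D}=\mathcal{I}_C$.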

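It is useful to introduce the notion of equality upon input of $\rho$. We say that two transformations $\mathcal{A},\mathcal{A}'\in\mathrm{Transf}(A,B)$ are equal upon input of $\rho\in\mathrm{St}(A)$ if their restrictions to the face identified by $\rho$ are equal, that is, if $\mathcal{A}\sigma=\mathcal{A}'\sigma$ for every $\sigma\in F_\rho$. If $\mathcal{A}$ and $\mathcal{A}'$ are equal upon input of $\rho$ we write $\mathcal{A}=_\rho\mathcal{A}'$.

Using the notion of equality upon input of $\rho$ we can rephrase the fact that the compression is lossless for $\rho$ as $\mathcal{D}\mathcal{E}=_\rho\mathcal{I}_A$. Similarly, we can state the following.

Lemma 7. The encoding $\mathcal{E}$ is deterministic upon input of $\rho$, that is, $(e_C|\mathcal{E}=_\rho(e_A|$.

Proof. For every $\sigma\in F_\rho$ we have $(e_C|\mathcal{E}|\sigma)\ge(e_A|\mathcal{D}\mathcal{E}|\sigma)=(e_A|\sigma)=1$, having used Eq. (4) and the fact that the compression is lossless. Since probabilities are bounded by 1, this implies $(e_C|\mathcal{E}|\sigma)=(e_A|\sigma)$ for every $\sigma\in F_\rho$, that is, $(e_C|\mathcal{E}=_\rho(e_A|$.

A similar result holds for the decoding.

Lemma 8. The decoding $\mathcal{D}$ is deterministic, that is, $(e_A|\mathcal{D}=(e_C|$.

Proof. For every $\tau\in\mathrm{St}_1(C)$ we have $(e_A|\mathcal{D}|\tau)\ge(e_C|\mathcal{E}\mathcal{D}|\tau)=(e_C|\tau)$, having used Eq. (4) and lemma 2. Since Eq. (4) also gives $(e_A|\mathcal{D}\le(e_C|$, we conclude that $(e_A|\mathcal{D}=(e_C|$.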

B. Results about purification

The purification postulate 1 implies a large number of quantum features, as it was shown in Ref. [22]. Here we review only the facts that are useful for our derivation, referring to Ref. [22] for the proofs.

An elementary consequence of the uniqueness of purification is that the group GA of reversible transformations on A acts transitively on the set of pure states.

Lemma 9 (Transitivity on pure states). For every pair of pure states $\varphi,\varphi'\in\mathrm{St}_1(A)$ there is a reversible transformation $\mathcal{U}\in\mathbf{G}_A$ such that $\varphi'=\mathcal{U}\varphi$.

Proof. See lemma 20 of Ref. [22].

Transitivity implies that for every system A there is a unique state $\chi_A\in\mathrm{St}_1(A)$ that is invariant under reversible transformations, that is, a unique state such that $\mathcal{U}\chi_A=\chi_A$ for every $\mathcal{U}\in\mathbf{G}_A$.

Lemma 10 (Uniqueness of the invariant state). For every system A, there is a unique state $\chi_A$ invariant under all reversible transformations in $\mathbf{G}_A$. The invariant state has the following properties:

  • (1) $\chi_A$ is completely mixed;

  • (2) $\chi_{AB}=\chi_A\otimes\chi_B$.

Proof. See corollary 34 and theorem 4 of Ref. [22]. The proof of item 2 uses the local distinguishability axiom.

When there is no ambiguity we will drop the subindex A and simply write χ.

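The uniqueness of purification in postulate 1 requires that if $\Psi_\rho,\Psi'_\rho\in\mathrm{St}_1(AB)$ are two purifications of $\rho\in\mathrm{St}_1(A)$, then there exists a reversible transformation $\mathcal{U}\in\mathbf{G}_B$ such that $\Psi'_\rho=(\mathcal{I}_A\otimes\mathcal{U})\Psi_\rho$. The following lemma extends the uniqueness property to purifications with different purifying systems.

Lemma 11 (Uniqueness of the purification up to channels on the purifying systems). Let $\Psi\in\mathrm{St}_1(AB)$ and $\Psi'\in\mathrm{St}_1(AC)$ be two purifications of $\rho\in\mathrm{St}_1(A)$. Then there exists a channel $\mathcal{C}\in\mathrm{Transf}(B,C)$ such that

$|\Psi')_{AC}=(\mathcal{I}_A\otimes\mathcal{C})|\Psi)_{AB}.$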

Proof. See lemma 21 of Ref. [22].

Another consequence of the uniqueness of purification is the fact that any ensemble decomposition of a given mixed state can be obtained by performing a measurement on the purifying system.

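Lemma 12 (Purification of preparation-tests). Let $\rho\in\mathrm{St}_1(A)$ be a state and $\Psi_\rho\in\mathrm{St}_1(AB)$ be a purification of $\rho$. If $\{\rho_i\}_{i\in X}$ is a preparation test such that $\sum_{i\in X}\rho_i=\rho$, then there exists an observation-test $\{a_i\}_{i\in X}$ on the purifying system such that

$|\rho_i)_A=(a_i|_B\,|\Psi_\rho)_{AB}\quad\forall i\in X.$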

Proof. See lemma 8 of Ref. [22].

An easy consequence is the following.

Corollary 1. If $\Psi_\rho\in\mathrm{St}_1(AB)$ is a purification of $\rho\in\mathrm{St}_1(A)$ and $\sigma$ belongs to the face $F_\rho$, then there exists an effect $b$ and a nonzero probability $p>0$ such that

$(b|_B\,|\Psi_\rho)_{AB}=p\,|\sigma)_A.$

An important consequence of purification and local distinguishability is the relation between equality upon input of ρ and equality on the purifications of ρ.

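Theorem 1 (Equality upon input of $\rho$ vs equality on purifications of $\rho$). Let $\Psi_\rho\in\mathrm{St}_1(AC)$ be a purification of $\rho\in\mathrm{St}_1(A)$, and let $\mathcal{A},\mathcal{A}'\in\mathrm{Transf}(A,B)$ be two transformations. Then one has

$(\mathcal{A}\otimes\mathcal{I}_C)\Psi_\rho=(\mathcal{A}'\otimes\mathcal{I}_C)\Psi_\rho\ \Longleftrightarrow\ \mathcal{A}=_\rho\mathcal{A}'.$

Proof. See theorem 1 of Ref. [22]. The proof of the direction $\Longleftarrow$ uses the local distinguishability axiom.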

As a consequence, the purification of a completely mixed state allows for the tomography of transformations:

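Corollary 2. Let $\omega\in\mathrm{St}_1(A)$ be completely mixed and let $\Psi_\omega\in\mathrm{St}_1(AC)$ be a purification of $\omega$. Then, for all transformations $\mathcal{A},\mathcal{A}'\in\mathrm{Transf}(A,B)$ one has

$(\mathcal{A}\otimes\mathcal{I}_C)\Psi_\omega=(\mathcal{A}'\otimes\mathcal{I}_C)\Psi_\omega\ \Longleftrightarrow\ \mathcal{A}=\mathcal{A}'.$

Proof. By theorem 1 the first condition is equivalent to $\mathcal{A}=_\omega\mathcal{A}'$. Since $\omega$ is completely mixed, this means $\mathcal{A}\sigma=\mathcal{A}'\sigma$ for every $\sigma\in\mathrm{St}_1(A)$. By local distinguishability [see Eq. (5)] this implies $\mathcal{A}=\mathcal{A}'$.

Corollary 2 shows that the state $(\mathcal{A}\otimes\mathcal{I}_C)\Psi_\omega$ characterizes the transformation $\mathcal{A}$ completely. We will express this fact by saying that the state $\Psi_\omega$ is dynamically faithful [20], or just faithful, for short. Using this notion we can rephrase corollary 2.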

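Corollary 3. If $\Psi\in\mathrm{St}_1(AC)$ is pure and its marginal on system A is completely mixed, then $\Psi$ is dynamically faithful for system A.

Let us choose a fixed faithful state for system A, say $\Psi\in\mathrm{St}_1(AC)$. Then for every transformation $\mathcal{C}\in\mathrm{Transf}(A,B)$ we can define the Choi state $R_{\mathcal{C}}\in\mathrm{St}(BC)$ as

$|R_{\mathcal{C}})_{BC}:=(\mathcal{C}\otimes\mathcal{I}_C)|\Psi)_{AC}.$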

We then have the following.

Theorem 2 (Choi isomorphism). For a given faithful state $\Psi\in\mathrm{St}_1(AC)$ the map $\mathcal{C}\mapsto R_{\mathcal{C}}:=(\mathcal{C}\otimes\mathcal{I}_C)\Psi$ has the following properties:

  • (1) It defines a bijective correspondence between tests $\{\mathcal{C}_i\}_{i\in X}$ from A to B and collections of states $\{R_i\}_{i\in X}$ for BC satisfying

    $\sum_{i\in X}(e|_B\,|R_i)_{BC}=(e|_A\,|\Psi)_{AC}.$

  • (2) The transformation $\mathcal{C}$ is atomic if and only if the corresponding state $R_{\mathcal{C}}$ is pure.

Proof. See theorem 17 of Ref. [22].
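Item 2 of the theorem can be illustrated in quantum theory with the following sketch. It assumes the faithful state is the maximally entangled state $\sum_j|j\rangle|j\rangle/\sqrt{d}$ and represents each transformation by a list of Kraus operators. The function names and the example channels are choices made here for illustration.

```python
from math import sqrt

def choi_state(kraus_ops):
    """Choi state (C tensor I)|Phi><Phi| with |Phi> = sum_j |j>|j>/sqrt(d).

    Each Kraus operator K contributes the vector with entries K[i][j]/sqrt(d).
    """
    d = len(kraus_ops[0][0])  # input dimension
    vecs = [[complex(k[i][j]) / sqrt(d) for i in range(len(k)) for j in range(d)]
            for k in kraus_ops]
    n = len(vecs[0])
    return [[sum(v[a] * v[b].conjugate() for v in vecs) for b in range(n)]
            for a in range(n)]

def purity(r):
    """Tr(r^2) / Tr(r)^2, equal to 1 exactly when r has rank one."""
    n = len(r)
    tr = sum(r[a][a] for a in range(n)).real
    tr_sq = sum((r[a][b] * r[b][a]).real for a in range(n) for b in range(n))
    return tr_sq / tr ** 2

h = 1 / sqrt(2)
hadamard = [[[h, h], [h, -h]]]  # single Kraus operator: atomic

gamma = 0.3  # illustrative damping probability
damping = [[[1, 0], [0, sqrt(1 - gamma)]],
           [[0, sqrt(gamma)], [0, 0]]]  # two Kraus operators: not atomic

purity_atomic = purity(choi_state(hadamard))
purity_damping = purity(choi_state(damping))
```

The unitary channel yields a pure Choi state, while the two-operator channel yields a mixed one, matching the correspondence between atomic transformations and pure Choi states.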

A simple consequence of the Choi isomorphism is the following.

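Corollary 4. Let $\{\mathcal{C}_i\}_{i\in X}\subset\mathrm{Transf}(A,B)$ be a collection of transformations. Then, $\{\mathcal{C}_i\}_{i\in X}$ is a test if and only if

$\sum_{i\in X}(e_B|\mathcal{C}_i=(e_A|.$

In particular, let $\{a_i\}_{i\in X}\subset\mathrm{Eff}(A)$ be a collection of effects. Then, $\{a_i\}_{i\in X}$ is an observation test if and only if

$\sum_{i\in X}a_i=e.$
(8)

Proof. Apply item 1 of theorem 2 to the collection of states $\{R_i\}_{i\in X}$ defined by $R_i:=(\mathcal{C}_i\otimes\mathcal{I}_C)\Psi$.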

A much deeper consequence of the Choi isomorphism is the following theorem.

Theorem 3 (States specify the theory)

Let $\Theta,\Theta'$ be two theories satisfying the purification postulate. If $\Theta$ and $\Theta'$ have the same sets of normalized states, then $\Theta=\Theta'$.

Proof. See theorem 19 of Ref. [22].

Thanks to theorem 3, to derive quantum theory we will only need to prove that our principles imply that for every system A the normalized states $\mathrm{St}_1(A)$ can be described as positive Hermitian matrices with unit trace. Once this is proved, theorem 3 automatically ensures that all the dynamics and all the measurements allowed by the theory are exactly the dynamics and the measurements allowed in quantum theory.

Note that in the definition of the Choi state we left the freedom to choose the faithful state $\Psi\in\mathrm{St}_1(AC)$. Among many possibilities, one convenient choice is to take a faithful state $\Phi\in\mathrm{St}_1(AC)$ obtained as a purification of the invariant state $\chi\in\mathrm{St}_1(A)$. Moreover, as we will see in the next paragraph, we can always choose the purifying system C in such a way that the marginal on C is completely mixed.

C. Results about the combination of compression and purification

An important consequence of the combination of the purification postulate with the compression axiom is the fact that one can always choose a purification of ρ such that the marginal state on the purifying system is completely mixed. To prove this result we need the following lemma.

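Lemma 13. Let $\rho\in\mathrm{St}_1(A)$ be a state and let $\Psi_\rho\in\mathrm{St}_1(AB)$ be a purification of $\rho$. If $\mathcal{E}\in\mathrm{Transf}(A,C)$ is the encoding operation in the compression scheme of axiom 3, then the state $\Psi'_\rho:=(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ is pure.

Proof. Let $\mathcal{D}\in\mathrm{Transf}(C,A)$ be the decoding operation. Since the compression is lossless for $\rho$ we know that $\mathcal{D}\mathcal{E}=_\rho\mathcal{I}_A$. By theorem 1 this is equivalent to the condition $(\mathcal{D}\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho=\Psi_\rho$. Now, suppose that $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho=\sum_{i\in X}\Gamma_i$. Applying $\mathcal{D}$ on both sides we then obtain $\Psi_\rho=\sum_{i\in X}(\mathcal{D}\otimes\mathcal{I}_B)\Gamma_i$, and, since $\Psi_\rho$ is pure, for every $i\in X$ we must have $(\mathcal{D}\otimes\mathcal{I}_B)\Gamma_i=p_i\Psi_\rho$, where $p_i\ge 0$ is some probability. Finally, since $\mathcal{E}\mathcal{D}=\mathcal{I}_C$ (lemma 2), one has $\Gamma_i=p_i(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$. Hence, $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ admits only decompositions with $\Gamma_i=p_i(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$, that is, $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ is pure.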

We are now in position to prove the desired result.

Theorem 4. For every state $\rho\in\mathrm{St}_1(A)$ there exists a system $C$ and a purification $\Psi_\rho\in\mathrm{St}_1(AC)$ of $\rho$ such that the marginal state on system $C$ is completely mixed. Moreover, the system $C$ is unique up to operational equivalence.

Proof. Take an arbitrary purification of $\rho$, say $\Phi_\rho\in\mathrm{St}_1(AB)$ for some purifying system $B$. Define the marginal state on system $B$ as $|\theta)_B:=(e|_A\,|\Phi_\rho)_{AB}$ and define the state $\Psi_\rho:=(\mathcal{I}_A\otimes\mathcal{E})\Phi_\rho$, where $\mathcal{E}\in\mathrm{Transf}(B,C)$ is the encoding operation for the state $\theta$. By lemma 13 we know that $\Psi_\rho\in\mathrm{St}(AC)$ is pure. Using lemma 7 and theorem 1 we obtain $(e_C|_C\,|\Psi_\rho)_{AC}=[(e_C|\mathcal{E}]_B\,|\Phi_\rho)_{AB}=(e_B|_B\,|\Phi_\rho)_{AB}=|\rho)_A$, that is, $\Psi_\rho$ is a purification of $\rho$. Finally, the marginal on system $C$ is given by $\tilde\rho=\mathcal{E}\theta$, which by lemma 5 is completely mixed. This proves the first part of the statement. It remains to show that the system $C$ is uniquely defined up to operational equivalence. Suppose that $\Psi'_\rho\in\mathrm{St}(AC')$ is another purification of $\rho$ with the property that the marginal on system $C'$ is completely mixed. Since $\Psi_\rho$ and $\Psi'_\rho$ are two purifications of the same state, there must be two channels $\mathcal{C}\in\mathrm{Transf}(C,C')$ and $\mathcal{R}\in\mathrm{Transf}(C',C)$ such that $\Psi'_\rho=(\mathcal{I}_A\otimes\mathcal{C})\Psi_\rho$ and $\Psi_\rho=(\mathcal{I}_A\otimes\mathcal{R})\Psi'_\rho$ (lemma 11). Combining the two equalities one obtains $\Psi_\rho=(\mathcal{I}_A\otimes\mathcal{R}\mathcal{C})\Psi_\rho$. Now, the marginal of $\Psi_\rho$ on system $C$ is completely mixed, and this implies that $\Psi_\rho$ is faithful for system $C$ (corollary 3). Hence we have $\mathcal{R}\mathcal{C}=\mathcal{I}_C$. Repeating the same argument for $\Psi'_\rho$ we obtain $\mathcal{C}\mathcal{R}=\mathcal{I}_{C'}$. Therefore, $\mathcal{C}$ is reversible and $\mathcal{R}=\mathcal{C}^{-1}$. This proves that $C$ and $C'$ are operationally equivalent.

The following facts will also be useful.

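Corollary 5. Let $\Psi_\rho\in\mathrm{St}_1(AB)$ be a purification of $\rho\in\mathrm{St}_1(A)$ and let $\mathcal{E}\in\mathrm{Transf}(A,C)$ be the encoding for $\rho$. Then, the state $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho\in\mathrm{St}_1(CB)$ is dynamically faithful for C.

Proof. The marginal of $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ on system C is $\mathcal{E}\rho$, which is completely mixed by lemma 5. Hence, $(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ is dynamically faithful by corollary 3.

Lemma 14. The decoding transformation $\mathcal{D}\in\mathrm{Transf}(C,A)$ in the ideal compression for $\rho\in\mathrm{St}_1(A)$ is atomic.

Proof. Let $\Psi_\rho\in\mathrm{St}_1(AB)$ be a purification of $\rho$, for some purifying system B. Since $\mathcal{D}\mathcal{E}=_\rho\mathcal{I}_A$ (the compression is lossless), we have $(\mathcal{D}\mathcal{E}\otimes\mathcal{I}_B)|\Psi_\rho)=|\Psi_\rho)$ (theorem 1). Now, by corollary 5 $(\mathcal{E}\otimes\mathcal{I}_B)|\Psi_\rho)$ is faithful for C and by lemma 13 $(\mathcal{E}\otimes\mathcal{I}_B)|\Psi_\rho)$ is pure. Using the Choi isomorphism with the faithful state $\Psi:=(\mathcal{E}\otimes\mathcal{I}_B)\Psi_\rho$ we then obtain that $\mathcal{D}$ is atomic.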

D. Teleportation and the link product

For every system A one can choose a completely mixed state $\omega_A$ and a purification $\Psi^{(A)}\in\mathrm{St}(A\tilde{A})$ such that the marginal on system $\tilde{A}$ is completely mixed (cf. theorem 4). Any such purification allows for a probabilistic teleportation scheme:

Lemma 15 (Probabilistic teleportation). There exists an atomic effect $E^{(A)}\in\mathrm{Eff}(\tilde{A}A)$ and a nonzero probability $p_A$ such that

and

Proof. See corollary 19 of Ref. [22].
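The standard quantum instance of probabilistic teleportation can be sketched as follows. The sketch assumes that both the faithful state and the atomic effect are given by the maximally entangled vector $\sum_j|j\rangle|j\rangle/\sqrt{d}$, in which case the success probability is $1/d^2$. The function name and the input state are illustrative.

```python
from math import sqrt

def teleport(psi):
    """Amplitudes left on A when |psi>_{A'} tensor |Phi>_{A~ A} is projected
    on <Phi|_{A' A~}, with |Phi> = sum_j |j>|j>/sqrt(d)."""
    d = len(psi)
    phi = [[1 / sqrt(d) if i == j else 0.0 for j in range(d)] for i in range(d)]
    # phi[i][j] contracts the input index i with the A~ index j (phi is real),
    # and phi[j][k] carries the A~ index j to the output index k.
    return [sum(phi[i][j] * psi[i] * phi[j][k]
                for i in range(d) for j in range(d))
            for k in range(d)]

psi = [0.6, 0.8j]  # normalized qubit state to be teleported
out = teleport(psi)
success_probability = sum(abs(x) ** 2 for x in out)
```

The output amplitudes are those of the input rescaled by $1/d$, so the state is transferred unchanged with probability $1/d^2=1/4$ for a qubit.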

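Let us choose $\Psi^{(A)}$ to be the faithful state in the definition of the Choi isomorphism. Then the sequential composition of transformations induces a composition of Choi states in the following way.

Corollary 6 (Link product). For two transformations $\mathcal{C}\in\mathrm{Transf}(A,B)$ and $\mathcal{D}\in\mathrm{Transf}(B,C)$ the Choi state of $\mathcal{D}\mathcal{C}\in\mathrm{Transf}(A,C)$ is given by the link product

[Eq. (9): diagrammatic expression of the link product, in which the teleportation effect $E^{(B)}$ connects the Choi states $R_{\mathcal{C}}$ and $R_{\mathcal{D}}$; image not reproduced]
(9)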

Proof. See corollary 22 of Ref. [22].

We conclude this paragraph with an important result that follows from the combination of the link product structure with the pure conditioning axiom.

Lemma 16 (Atomicity of composition). The composition of two atomic transformations is atomic.

Proof. Let $\mathcal{C}\in\mathrm{Transf}(A,B)$ and $\mathcal{D}\in\mathrm{Transf}(B,C)$ be two atomic transformations. By the Choi isomorphism, the (unnormalized) states $R_{\mathcal{C}}$ and $R_{\mathcal{D}}$ are pure. Since the teleportation effect $E^{(B)}$ in Eq. (9) is atomic (lemma 15), the pure conditioning axiom 5 implies that the state $R_{\mathcal{D}\mathcal{C}}$ is pure. By the Choi isomorphism this means that $\mathcal{D}\mathcal{C}$ is atomic.

E. No information without disturbance

We say that a test {Ci}iXTransf(A) is nondisturbing upon input of ρ if iXCi=ρIA. If ρ is completely mixed, we simply say that the test is nondisturbing.

A consequence of the purification postulate is the following “no-information without disturbance” result.

Lemma 17 (No information without disturbance). A test $\{\mathcal{C}_i\}_{i\in X}\subset\mathrm{Transf}(A)$ is nondisturbing upon input of $\rho$ if and only if there is a set of probabilities $\{p_i\}_{i\in X}$ such that $\mathcal{C}_i=_\rho p_i\,\mathcal{I}_A$ for every $i\in X$.

Proof. See theorem 10 of Ref. [22].

The no-information without disturbance result implies the following geometrical limitation.

Corollary 7. For every system A the convex set of states St1(A) is not a segment.

Proof. The proof is by contradiction. Suppose that for some system A the set St1(A) is a segment. The segment has only two pure states, say ϕ1 and ϕ2, and every other state ρSt1(A) is completely mixed. Then the distinguishability axiom 2 imposes that ϕ1 and ϕ2 are perfectly distinguishable. Take the binary test {a1,a2} such that (ai|ϕj)=δij and define the “measure-and-prepare” test {C1,C2} as Ci=|ϕi)(ai|, i=1,2 (the possibility of preparing a state depending on the outcome of a previous measurement is guaranteed by causality [22]). Since every state ρ in the segment can be written as convex combination of the two extreme points, we have that the test {C1,C2} is nondisturbing: (C1+C2)ρ=ρ for every ρ. This is in contradiction with lemma 17 because C1 and C2 are not proportional to the identity.
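As an illustration outside the derivation itself, the contrast used in this proof can be checked in ordinary quantum theory, taken here as an assumed model. The sketch below applies the coarse-grained measure-and-prepare test $\mathcal{C}_1+\mathcal{C}_2$ in the computational basis of a qubit: it leaves every state on the segment between the two basis states untouched, but it disturbs a state outside that segment. The function name and the list-of-lists representation are hypothetical choices made for this example.

```python
# Illustration only: qubit density matrices as nested lists (assumed quantum model).

def measure_and_prepare(rho):
    """Coarse-grained test C1 + C2 with Ci = |i)(i|: keeps the diagonal, erases coherences."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]

classical_mix = [[0.3, 0.0], [0.0, 0.7]]  # a point of the segment between |0> and |1>
plus_state = [[0.5, 0.5], [0.5, 0.5]]     # |+><+|, which lies outside that segment

undisturbed = measure_and_prepare(classical_mix) == classical_mix
disturbed = measure_and_prepare(plus_state) != plus_state
print(undisturbed, disturbed)
```

If the state space were only the segment, the test would be nondisturbing while still yielding information, which is exactly what lemma 17 forbids.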

We know that no information can be extracted without disturbance. In the following we will prove a result in the converse direction: if a measurement extracts no information, then it can be realized in a nondisturbing fashion. To show this result we first need the following.

Lemma 18. For every observation test $\{a_i\}_{i\in X}\subset\mathrm{Eff}(A)$ with finite outcome set $X$ there is a system $C$ and a test $\{\mathcal{A}_i\}_{i\in X}\subset\mathrm{Transf}(A,C)$ consisting of atomic transformations such that $(a_i|=(e_C|\mathcal{A}_i$.

Proof. Let $|\Psi)_{AB}$ be a pure faithful state for system $A$ and let $|R_i)_B:=(a_i|_A|\Psi)_{AB}$ be the Choi state of $a_i$. Take a purification of $R_i$, say $|\Psi_i)_{CB}$ for some purifying system $C$ [38]. Then, by the Choi isomorphism there is a test $\{\mathcal{A}_i\}_{i\in X}$, with input $A$ and output $C$, such that

$$(\mathcal{A}_i\otimes\mathcal{I}_B)|\Psi)_{AB}=|\Psi_i)_{CB}$$

(see item 1 of theorem 2). Moreover, each transformation $\mathcal{A}_i:A\to C$ is atomic (item 2 of theorem 2). Applying the deterministic effect $(e_C|$ on both sides we then obtain $|R_i)_B=(e_C|\,|\Psi_i)_{CB}=(e_C|\mathcal{A}_i|\Psi)_{AB}$. By definition of $R_i$, this implies $(a_i|_A|\Psi)_{AB}=(e_C|\mathcal{A}_i|\Psi)_{AB}$, and, since $\Psi$ is dynamically faithful, $(a_i|_A=(e_C|\mathcal{A}_i$.

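Theorem 5. Let $\rho\in\mathrm{St}_1(A)$ be a state, $a\in\mathrm{Eff}(A)$ be an effect, and $\mathcal{A}\in\mathrm{Transf}(A,B)$ be an atomic transformation such that $(a|_A=(e|_B\mathcal{A}$. If $(a|=_\rho p\,(e|$ for some $p>0$, then there exists a channel $\mathcal{C}\in\mathrm{Transf}(B,A)$ such that $\mathcal{C}\mathcal{A}=_\rho p\,\mathcal{I}_A$.

Proof. Consider a purification of $\rho$, say $\Psi_\rho\in\mathrm{St}(AC)$, and define the state $\Sigma\in\mathrm{St}_1(BC)$ by $|\Sigma):=\frac{1}{p}(\mathcal{A}\otimes\mathcal{I}_C)|\Psi_\rho)$. By the atomicity of composition (lemma 16) the state $\Sigma$ is pure. Moreover, we have

$$(e|_B|\Sigma)_{BC}=\frac{1}{p}(a|_A|\Psi_\rho)_{AC}=(e|_A|\Psi_\rho)_{AC},$$

having used theorem 1 in the last equality. This implies that $\Psi_\rho$ and $\Sigma$ are different purifications of the same mixed state on system $C$. Then, by lemma 11 there exists a channel $\mathcal{C}\in\mathrm{Transf}(B,A)$ such that $|\Psi_\rho)=(\mathcal{C}\otimes\mathcal{I}_C)|\Sigma)=\frac{1}{p}(\mathcal{C}\mathcal{A}\otimes\mathcal{I}_C)|\Psi_\rho)$. By theorem 1, the last equality implies $\mathcal{C}\mathcal{A}=_\rho p\,\mathcal{I}_A$.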

We now make a simple observation that combined with theorem 5 will lead to some interesting consequences.

Lemma 19. If $(a|\rho)=\|a\|$, then $a=_\rho\|a\|\,e$. Similarly, if $(a|\rho)=0$, then $a=_\rho 0$.

Proof. By definition, $\sigma\in F_\rho$ iff there exist $p>0$ and $\tau\in\mathrm{St}_1(A)$ such that $\rho=p\sigma+(1-p)\tau$. If $(a|\rho)=\|a\|$, then we have $\|a\|=p(a|\sigma)+(1-p)(a|\tau)$. Since $(a|\sigma)$ and $(a|\tau)$ cannot be larger than $\|a\|$, the only way to have the equality is to have $(a|\sigma)=(a|\tau)=\|a\|$. By definition, this amounts to saying that $a=_\rho\|a\|\,e$. Similarly, if $(a|\rho)=0$, one has $0=p(a|\sigma)+(1-p)(a|\tau)$, which is satisfied only if $(a|\sigma)=(a|\tau)=0$, that is, if $a=_\rho 0$.

As a consequence, we have the following.

Corollary 8. Let $\rho\in\mathrm{St}_1(A)$ be a state, $a\in\mathrm{Eff}(A)$ be an effect, and $\mathcal{A}\in\mathrm{Transf}(A,B)$ be an atomic transformation such that $(a|_A=(e|_B\mathcal{A}$. If $(a|\rho)=1$, then $\mathcal{A}$ is correctable upon input of $\rho$, that is, there exists a correction operation $\mathcal{C}\in\mathrm{Transf}(B,A)$ such that $\mathcal{C}\mathcal{A}=_\rho\mathcal{I}_A$.

Proof. If $(a|\rho)=1$, then clearly $\|a\|=1$. Lemma 19 then implies $(a|=_\rho(e|$. Applying theorem 5 we finally obtain the thesis.

Corollary 9. Let $\rho\in\mathrm{St}_1(A)$ be a state and $a\in\mathrm{Eff}(A)$ be an effect such that $(a|\rho)=1$. Then there exists a transformation $\mathcal{C}\in\mathrm{Transf}(A)$ such that $(a|=(e|\mathcal{C}$ and $\mathcal{C}=_\rho\mathcal{I}$.

Proof. Straightforward consequence of lemma 18 and of corollary 8.

Finally, we say that an observation-test $\{a_i\}_{i\in X}$ is noninformative upon input of $\rho$ if we have $(a_i|=_\rho p_i\,(e|$ for every $i\in X$. This means that the test $\{a_i\}_{i\in X}$ is unable to distinguish the states in the face $F_\rho$. As a consequence of theorem 5 we have the following “no disturbance without information” result.

Corollary 10 (No disturbance without information). If the test $\{a_i\}_{i\in X}$ is noninformative upon input of $\rho$ then there is a test $\{\mathcal{D}_i\}_{i\in X}\subset\mathrm{Transf}(A)$ that is nondisturbing upon input of $\rho$ and satisfies $(e|\mathcal{D}_i=(a_i|$ for every $i\in X$.

Proof. By lemma 18 there exists a test $\{\mathcal{A}_i\}\subset\mathrm{Transf}(A,B)$ such that each transformation $\mathcal{A}_i$ is atomic and $(e|\mathcal{A}_i=(a_i|$. By theorem 5, for each $\mathcal{A}_i$ there is a correction channel $\mathcal{C}_i$ such that $\mathcal{C}_i\mathcal{A}_i=_\rho p_i\,\mathcal{I}_A$. Defining $\mathcal{D}_i:=\mathcal{C}_i\mathcal{A}_i$ we then obtain the thesis.

V. PERFECTLY DISTINGUISHABLE STATES

In this section we prove some basic facts about perfectly distinguishable states. Let us start from the definition.

Definition 5 (Perfectly distinguishable states). The normalized states $\{\rho_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ are perfectly distinguishable if there exists an observation-test $\{a_i\}_{i=1}^N$ such that $(a_j|\rho_i)=\delta_{ij}$. The observation-test $\{a_i\}_{i=1}^N$ is called perfectly distinguishing.

From the distinguishability axiom 2 it is clear that every nontrivial system has at least two perfectly distinguishable states.

Lemma 20. For every nontrivial system A there are at least two perfectly distinguishable states.

Proof. Let ϕ be a pure state of A. Obviously, ϕ is not completely mixed (unless the system A has only one state, that is, unless A is trivial). Hence, by axiom 2 there exists at least a state σ that is perfectly distinguishable from ϕ.

An equivalent condition for perfect distinguishability is the following.

Lemma 21. The states $\{\rho_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ are perfectly distinguishable if and only if there exists an observation-test $\{a_i\}_{i=1}^N$ such that $(a_i|\rho_i)=1$ for every $i$.

Proof. The condition $(a_i|\rho_i)=1$, $i=1,\dots,N$, is clearly necessary. On the other hand, the condition $(a_i|\rho_i)=1$, $i=1,\dots,N$, implies

$$(a_i|\rho_i)=1=\sum_{j=1}^N(a_j|\rho_i)=(a_i|\rho_i)+\sum_{j\neq i}(a_j|\rho_i).$$

Since all probabilities are nonnegative, we must have $(a_j|\rho_i)=0$ for $i\neq j$, and therefore, $(a_j|\rho_i)=\delta_{ij}$.

A very general fact about state discrimination is expressed by the following.

Lemma 22. If $\rho$ is perfectly distinguishable from $\sigma$ and $\rho'$ (or $\sigma'$) belongs to the face identified by $\rho$ (or $\sigma$), then $\rho'$ is perfectly distinguishable from $\sigma'$.

Proof. Let $\{a,e-a\}$ be the binary observation test that distinguishes perfectly between $\rho$ and $\sigma$. By definition, $a\in\mathrm{Eff}(A)$ is such that $(a|\rho)=1$ and $(a|\sigma)=0$. Now, by lemma 19, $(a|\rho')=1$ and $(a|\sigma')=0$ for all $\rho'\in F_\rho$ and $\sigma'\in F_\sigma$.

Thanks to purification and to the local distinguishability axiom 4, we are also in position to show a much stronger result.

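Lemma 23. Let $\{\rho_i\}_{i=1}^N\subset F_\rho$ and $\{\rho_j\}_{j=N+1}^{N+M}\subset F_\sigma$ be two sets of perfectly distinguishable states. If $\rho$ is perfectly distinguishable from $\sigma$, then the states $\{\rho_i\}_{i=1}^{N+M}$ are perfectly distinguishable.

Proof. Let $\{a,e_A-a\}$ be the observation test such that $(a|\rho)=1$ and $(a|\sigma)=0$. Now, by corollary 9 there is a transformation $\mathcal{C}\in\mathrm{Transf}(A)$ such that $(e_A|\mathcal{C}=(a|$ and $\mathcal{C}=_\rho\mathcal{I}_A$. Similarly, there exists a transformation $\mathcal{C}'\in\mathrm{Transf}(A)$ such that $(e_A|\mathcal{C}'=(e_A|-(a|$ and $\mathcal{C}'=_\sigma\mathcal{I}_A$. We can then define the following observation test:

$$c_i=\begin{cases}a_i\,\mathcal{C}, & i\le N,\\ b_i\,\mathcal{C}', & N+1\le i\le N+M,\end{cases}$$

where $\{a_i\}_{i=1}^N$ (or $\{b_j\}_{j=N+1}^{N+M}$) is the observation test that perfectly distinguishes among the states $\{\rho_i\}_{i=1}^N$ (or $\{\rho_j\}_{j=N+1}^{N+M}$). By corollary 4 [see in particular Eq. (8)], $\{c_i\}_{i=1}^{N+M}$ is indeed an observation test: each $c_i$ is an effect and one has the normalization

$$\sum_{i=1}^{N+M}c_i=\sum_{i=1}^{N}a_i\,\mathcal{C}+\sum_{i=N+1}^{N+M}b_i\,\mathcal{C}'=e_A\mathcal{C}+e_A\mathcal{C}'=a+e_A-a=e_A.$$

Moreover, since $\mathcal{C}=_\rho\mathcal{I}_A$ and $\mathcal{C}'=_\sigma\mathcal{I}_A$, one has $(c_i|\rho_i)=1$ for every $i=1,\dots,M+N$. By lemma 21, this implies that the states $\{\rho_i\}_{i=1}^{N+M}$ are perfectly distinguishable.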

Definition 6. A set of perfectly distinguishable states $\{\rho_i\}_{i=1}^N$ is maximal if there is no state $\rho_{N+1}\in\mathrm{St}_1(A)$ such that the states $\{\rho_i\}_{i=1}^{N+1}$ are perfectly distinguishable.

Theorem 6. A set of perfectly distinguishable states $\{\rho_i\}_{i=1}^N$ is maximal if and only if the state $\omega=\sum_{i=1}^N\rho_i/N$ is completely mixed.

Proof. We first prove that if $\omega$ is completely mixed, then the set $\{\rho_i\}_{i=1}^N$ must be maximal. Indeed, if there existed a state $\rho_{N+1}$ such that $\{\rho_i\}_{i=1}^{N+1}$ are perfectly distinguishable, then clearly $\rho_{N+1}$ would be distinguishable from $\omega$. This is absurd because by proposition 1 no state can be perfectly distinguished from a completely mixed state. Conversely, if $\{\rho_i\}_{i=1}^N$ is maximal, then $\omega$ is completely mixed. If it were not, by the distinguishability axiom 2, $\omega$ would be perfectly distinguishable from some state $\rho_{N+1}$. By lemma 23, this would imply that the states $\{\rho_i\}_{i=1}^{N+1}$ are perfectly distinguishable, in contradiction with the hypothesis that the set $\{\rho_i\}_{i=1}^N$ is maximal.

Lemma 24. Every set of perfectly distinguishable pure states can be extended to a maximal set of perfectly distinguishable pure states.

Proof. Let $\{\varphi_i\}_{i=1}^N$ be a nonmaximal set of perfectly distinguishable pure states. By definition, there exists a state $\sigma$ such that $\{\varphi_i\}_{i=1}^N\cup\{\sigma\}$ is perfectly distinguishable. Let $\varphi_{N+1}$ be a pure state in $F_\sigma$. By lemma 19 the states $\{\varphi_i\}_{i=1}^{N+1}$ will be perfectly distinguishable. Since the dimension of $\mathrm{St}_{\mathbb{R}}(A)$ is finite and distinguishable states are linearly independent, iterating this procedure one finally obtains a maximal set of pure states in a finite number of steps.

Corollary 11. Any pure state belongs to a maximal set of perfectly distinguishable pure states.

We conclude this section with a few elementary facts about how the ideal compression of axiom 3 preserves the distinguishability properties. In the following we will choose a state $\rho\in\mathrm{St}_1(A)$, and $\mathcal{E}\in\mathrm{Transf}(A,C)$ [or $\mathcal{D}\in\mathrm{Transf}(C,A)$] will be the encoding (or decoding) in the ideal compression scheme for $\rho$.

Lemma 25. If the states $\{\rho_i\}_{i=1}^k\subset F_\rho$ are perfectly distinguishable, then the states $\{\mathcal{E}\rho_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ are perfectly distinguishable. Conversely, if the states $\{\sigma_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ are perfectly distinguishable, then the states $\{\mathcal{D}\sigma_i\}_{i=1}^k\subset F_\rho$ are perfectly distinguishable.

Proof. Let $\{a_i\}_{i=1}^k$ be the observation test such that $(a_i|\rho_i)=1$ for every $i=1,\dots,k$. Since the compression is lossless, we have $\mathcal{D}\mathcal{E}|\rho_i)=|\rho_i)$ and $(a_i|\mathcal{D}\mathcal{E}|\rho_i)=1$. Now, consider the test $\{c_i\}_{i=1}^k$ defined by $(c_i|=(a_i|\mathcal{D}$. Clearly we have $(c_i|\mathcal{E}|\rho_i)=1$ for every $i=1,\dots,k$. By lemma 21 this means that the states $\{\mathcal{E}\rho_i\}_{i=1}^k$ are perfectly distinguishable. Similarly, let $\{b_i\}_{i=1}^k$ be the observation test that distinguishes the set $\{\sigma_i\}_{i=1}^k$. Since $\mathcal{E}\mathcal{D}=\mathcal{I}_C$ (lemma 2), we can conclude by the same argument that the states $\{\mathcal{D}\sigma_i\}_{i=1}^k$ are perfectly distinguishable.

We say that a set of perfectly distinguishable states $\{\rho_i\}_{i=1}^k\subset F_\rho$ is maximal in the face $F_\rho$ if there is no state $\rho_{k+1}\in F_\rho$ such that the states $\{\rho_i\}_{i=1}^{k+1}$ are perfectly distinguishable. We then have the following.

Corollary 12. If $\{\rho_i\}_{i=1}^k\subset F_\rho$ is a maximal set of perfectly distinguishable states in the face $F_\rho$, then $\{\mathcal{E}\rho_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ is a maximal set of perfectly distinguishable states. Conversely, if $\{\sigma_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ is a maximal set of perfectly distinguishable states, then $\{\mathcal{D}\sigma_i\}_{i=1}^k$ is a maximal set of perfectly distinguishable states in the face $F_\rho$.

Proof. Distinguishability of the states $\{\mathcal{E}\rho_i\}_{i=1}^k$ and $\{\mathcal{D}\sigma_i\}_{i=1}^k$ is proved by lemma 25. Let us now prove maximality. By contradiction, suppose that the set $\{\rho_i\}_{i=1}^k$ is maximal in the face $F_\rho$ while the set $\{\sigma_i\}_{i=1}^k$, $\sigma_i:=\mathcal{E}\rho_i$, is not maximal. This means that there exists a state $\sigma_{k+1}\in\mathrm{St}_1(C)$ such that the states $\{\sigma_i\}_{i=1}^{k+1}$ are perfectly distinguishable. By lemma 25 the states $\{\mathcal{D}\sigma_i\}_{i=1}^{k+1}$ are perfectly distinguishable. Since $\mathcal{D}\mathcal{E}\rho_i=\rho_i$ for every $i=1,\dots,k$, this means that the states $\{\rho_i\}_{i=1}^k\cup\{\mathcal{D}\sigma_{k+1}\}$ are perfectly distinguishable, in contradiction with the fact that $\{\rho_i\}_{i=1}^k$ is maximal. This proves that the set $\{\mathcal{E}\rho_i\}_{i=1}^k$ must be maximal. Conversely, if the set $\{\sigma_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ is maximal, using the same argument we can prove that the set $\{\mathcal{D}\sigma_i\}_{i=1}^k$ must be maximal in $F_\rho$.

VI. DUALITY BETWEEN PURE STATES AND ATOMIC EFFECTS

We now show the existence of a one-to-one correspondence between states and effects of any system A in the theory. Let us start from a simple observation.

Lemma 26. If $a$ is atomic and $(a|\rho)=\|a\|$ for $\rho\in\mathrm{St}_1(A)$, then $\rho$ must be pure.

Proof. By lemma 19 the condition $(a|\rho)=\|a\|$ implies $a=_\rho\|a\|\,e$. By theorem 1 the condition $a=_\rho\|a\|\,e$ implies

$$(a|_A|\Psi_\rho)_{AB}=\|a\|\,(e|_A|\Psi_\rho)_{AB},$$

where $\Psi_\rho\in\mathrm{St}_1(AB)$ is any purification of $\rho$. Since $a$ is atomic, the pure conditioning axiom 5 implies that the marginal state $|\tilde\rho)_B=(e|_A|\Psi_\rho)_{AB}$ is pure. Since the marginal of $\Psi_\rho$ on system $B$ is pure, $\Psi_\rho$ must be factorized, that is, $\Psi_\rho=\rho\otimes\tilde\rho$ (see lemma 19 of Ref. [1]). Hence, $\rho$ must be pure, otherwise we would have a nontrivial convex decomposition of the pure state $\Psi_\rho$.

We are now in position to show that every atomic effect is associated to a unique pure state.

Theorem 7. For every atomic effect $a\in\mathrm{Eff}(A)$, there exists a unique pure state $\varphi\in\mathrm{St}_1(A)$ such that $(a|\varphi)=\|a\|$.

Proof. Let $\rho$ be a state such that $(a|\rho)=\|a\|$. By lemma 26, $\rho$ must be pure. Moreover, this pure state must be unique: suppose that $\varphi$ and $\varphi'$ are pure states such that $(a|\varphi)=(a|\varphi')=\|a\|$. Then for $\omega=\frac{1}{2}(\varphi+\varphi')$ one has $(a|\omega)=\|a\|$. Since $\omega$ must be pure, one has $\varphi=\varphi'$.

We now show the converse result: for every pure state ϕSt1(A) there exists a unique atomic effect a such that (a|ϕ)=1. Let us start from the existence.

Lemma 27. Let $\{\varphi_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ be a maximal set of perfectly distinguishable pure states and let $\{a_i\}_{i=1}^N$ be the observation test such that $(a_i|\varphi_j)=\delta_{ij}$. Then each effect $a_i$ is atomic with $\|a_i\|=1$.

Proof. It is obvious that $\|a_i\|=1$ because of the condition $(a_i|\varphi_i)=1$. It remains to prove atomicity. Consider the state $\omega=\sum_{i=1}^N\varphi_i/N$, which is completely mixed by theorem 6. Let $\Psi_\omega\in\mathrm{St}_1(AB)$ be a purification of $\omega$, chosen in such a way that the marginal on system $B$ is completely mixed (theorem 4). As a consequence of purification (lemma 12), there exists an observation-test $\{b_i\}_{i=1}^N$ on system $B$ such that $(b_i|_B|\Psi_\omega)_{AB}=\frac{1}{N}|\varphi_i)_A$. Since $\Psi_\omega$ is dynamically faithful on system $B$, each effect $b_i$ must be atomic. Now, define the normalized states $\{\rho_i\}_{i=1}^N\subset\mathrm{St}_1(B)$ and the probabilities $\{p_i\}_{i=1}^N$ by

$$p_i\,|\rho_i)_B:=(a_i|_A|\Psi_\omega)_{AB}.$$
(10)

Applying the deterministic effect $e_B$ on both sides one has $p_i=(a_i|\omega)=1/N$. On the other hand, applying the effect $b_j$ one has instead $\frac{1}{N}(b_j|\rho_i)_B=\frac{1}{N}(a_i|\varphi_j)=\delta_{ij}/N$. This implies $(b_i|\rho_i)=1$ for every $i$. Since $b_i$ is atomic, lemma 26 forces each $\rho_i$ to be pure. Finally, each $a_i$ must be atomic since its Choi state $p_i|\rho_i)_B=(a_i|_A|\Psi_\omega)_{AB}$ is pure (theorem 2).

As a consequence, we can prove the following existence result.

Lemma 28. For every pure state $\varphi\in\mathrm{St}_1(A)$ there exists an atomic effect $a$ such that $(a|\varphi)=1$.

Proof. By corollary 11, every pure state belongs to a maximal set of perfectly distinguishable pure states {ϕi}i=1N, say ϕ=ϕ1. The thesis then follows from lemma 27.

We now prove that the atomic effect a such that (a|ϕ)=1 is unique. For this purpose we need two auxiliary lemmas.

Lemma 29. Let $\varphi\in\mathrm{St}_1(A)$ be an arbitrary pure state and let $p_\varphi$ be the probability defined by

$$p_\varphi=\max\{p:\ \exists\,\sigma,\ \chi=p\varphi+(1-p)\sigma\},$$
(11)

where $\chi$ is the invariant state of system $A$. Then the value of the probability $p_\varphi$ is independent of $\varphi$.

Proof. Since for every couple of pure states $\varphi$ and $\psi$ one has $\psi=\mathcal{U}\varphi$ for some reversible channel $\mathcal{U}$ (lemma 9), and since $\chi$ is invariant, one has $\chi=p\varphi+(1-p)\sigma$ if and only if $\chi=p\psi+(1-p)\mathcal{U}\sigma$. The maximum probabilities for $\varphi$ and $\psi$ are then equal.

Since $p_\varphi=p_\psi$ for every couple of pure states, from now on we will write $p_{\max}$ in place of $p_\varphi$.

Lemma 30. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state and $a\in\mathrm{Eff}(A)$ be an atomic effect such that $(a|\varphi)=1$. Let $|\Phi)_{AB}$ be a purification of the invariant state $|\chi)_A$, chosen in such a way that the marginal on system $B$ is completely mixed, and let $b$ be the unique atomic effect on $B$ such that

$$(b|_B|\Phi)_{AB}=p_{\max}\,|\varphi)_A$$
(12)

[note that $b$ exists by lemma 12 and is uniquely defined by Eq. (12) because $\Phi$ is faithful for system $B$]. Then one has

$$(a|_A|\Phi)_{AB}=p_{\max}\,|\psi)_B,$$
(13)

where $\psi$ is the unique pure state such that $(b|\psi)=1$.

Proof. Define the normalized pure state $\psi$ and the probability $q$ by

$$q\,|\psi)_B:=(a|_A|\Phi)_{AB}.$$
(14)

In order to prove the thesis we have to show that $q=p_{\max}$ and $(b|\psi)=1$. Applying $b$ on both sides of Eq. (14) and using Eq. (12) we obtain $q(b|\psi)=p_{\max}(a|\varphi)=p_{\max}$. This implies

$$q\ge p_{\max},$$
(15)

with the equality if and only if $(b|\psi)=1$. Let $b'$ be an atomic effect such that $(b'|\psi)=1$ (such an effect exists because of lemma 28). Define the normalized pure state $\varphi'$ and the probability $p'$ by

$$p'\,|\varphi')_A:=(b'|_B|\Phi)_{AB}.$$

Applying $a$ on both sides and using Eq. (14) we obtain $p'(a|\varphi')=q(b'|\psi)=q$, which implies $p'\ge q$, with the equality if and only if $(a|\varphi')=1$. Combining this with the inequality (15) we have $p'\ge q\ge p_{\max}$. On the other hand, by lemma 29 one has $p'\le p_{\max}$, and consequently $p'=q=p_{\max}$. This also implies that $(b|\psi)=1$ and $(a|\varphi')=1$.

Theorem 8. For every pure state $\varphi\in\mathrm{St}_1(A)$ there is a unique atomic effect $a\in\mathrm{Eff}(A)$ such that $(a|\varphi)=1$.

Proof. Existence has already been proved in lemma 28. Let us prove uniqueness: suppose that $a$ and $a'$ are two atomic effects such that $(a|\varphi)=(a'|\varphi)=1$. Then, applying lemma 30 to $a$ and $a'$ we obtain

$$(a|_A|\Phi)_{AB}=p_{\max}\,|\psi)_B=(a'|_A|\Phi)_{AB}.$$

Since $\Phi$ is dynamically faithful, this implies $a=a'$.

Finally, an important consequence of theorem 8.

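Corollary 13. If $a,a'\in\mathrm{Eff}(A)$ are two atomic effects with $\|a\|=\|a'\|=1$, then there is a reversible channel $\mathcal{U}\in\mathbf{G}_A$ such that $(a'|_A=(a|_A\mathcal{U}$.

Proof. Let $\varphi$ and $\varphi'$ be the (unique) normalized states such that $(a|\varphi)=1$ and $(a'|\varphi')=1$, respectively. Now, there is a reversible channel $\mathcal{U}\in\mathbf{G}_A$ such that $|\varphi)_A=\mathcal{U}|\varphi')_A$. Hence $(a'|\varphi')=(a|\varphi)_A=(a|\mathcal{U}|\varphi')$. By theorem 8 one has $(a'|_A=(a|_A\mathcal{U}$.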

We conclude this section with an elementary result that will be used later in the paper.

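Lemma 31. Let $\mathcal{E}\in\mathrm{Transf}(A,C)$ and $\mathcal{D}\in\mathrm{Transf}(C,A)$ be the encoding and the decoding in the ideal compression scheme for $\rho\in\mathrm{St}_1(A)$. If $|\varphi)\in F_\rho$ is a pure state and $(a|\in\mathrm{Eff}(A)$ is the atomic effect such that $(a|\varphi)=1$, then $|\gamma):=\mathcal{E}|\varphi)\in\mathrm{St}_1(C)$ is a pure state and $(c|:=(a|\mathcal{D}\in\mathrm{Eff}(C)$ is the atomic effect such that $(c|\gamma)=1$.

Proof. The state $|\gamma):=\mathcal{E}|\varphi)$ is pure by lemma 4. The effect $(c|:=(a|\mathcal{D}$ is atomic by lemmas 14 and 16. Since $\mathcal{D}\mathcal{E}=_\rho\mathcal{I}_A$, one has $(c|\gamma)=(a|\mathcal{D}\mathcal{E}|\varphi)=(a|\varphi)=1$.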

VII. DIMENSION

In this section we show that each system in our theory has a given informational dimension, defined as the maximum number of perfectly distinguishable pure states available in the system. In the Hilbert space framework, the informational dimension will be the dimension of the Hilbert space.

Lemma 32. All maximal sets of perfectly distinguishable pure states have the same number of elements.

Proof. Let $\{\varphi_i\}_{i=1}^N$ be a maximal set of perfectly distinguishable pure states for system $A$, and let $\{a_i\}_{i=1}^N$ be the observation test such that $(a_i|\varphi_j)=\delta_{ij}$. By lemma 27 each $a_i$ is atomic and $\|a_i\|=1$. Then, by corollary 13 one has $(a_i|_A=(a_0|_A\,\mathcal{U}_i$, where each $\mathcal{U}_i$ is a reversible channel and $a_0$ is a fixed atomic effect with $\|a_0\|=1$. By the invariance of $\chi_A$ we then obtain $(a_i|\chi_A)=(a_0|\mathcal{U}_i|\chi_A)=(a_0|\chi_A)$. On the other hand, one has $\sum_{i=1}^N(a_i|\chi_A)=1$, which implies $N=1/(a_0|\chi_A)$. Since $a_0$ is arbitrary, $N$ is independent of the choice of the set $\{\varphi_i\}_{i=1}^N$.

As a consequence, the number of perfectly distinguishable pure states in a maximal set is a property of the system $A$. We will call this number the informational dimension (or simply the dimension) of system $A$, and denote it by $d_A$. The informational dimension $d_A$ is not to be confused with the size $D_A$, defined as the dimension of the real vector space $\mathrm{St}_{\mathbb{R}}(A)$.

An immediate consequence of the proof of lemma 32 is the following.

Corollary 14. For every atomic effect $a$ with $\|a\|=1$ one has $(a|\chi_A)=1/d_A$.

This simple fact has two very important consequences. The first is that the dimension of a composite system is the product of the dimensions of the components.

Corollary 15. The dimension of the composite system $AB$ is the product of the dimensions of $A$ and $B$, namely $d_{AB}=d_Ad_B$.

Proof. From lemma 10 we know that $\chi_A\otimes\chi_B$ is the unique invariant state of system $AB$. Now, if $a\in\mathrm{Eff}(A)$ and $b\in\mathrm{Eff}(B)$ are atomic effects such that $\|a\|=\|b\|=1$, then $a\otimes b$ is such that $\|a\otimes b\|=1$. Hence we have $1/d_{AB}=(a\otimes b|\chi_A\otimes\chi_B)=(a|\chi_A)(b|\chi_B)=1/(d_Ad_B)$.
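Corollaries 14 and 15 can be checked numerically in ordinary quantum theory, used here only as an assumed model in which the invariant state is $I/d$ and a normalized atomic effect is a rank-one projector $|\psi\rangle\langle\psi|$. The sketch below uses exact rational unit vectors; the helper names are hypothetical.

```python
from fractions import Fraction as F

def prob_on_invariant_state(psi):
    """(a|chi) = Tr[|psi><psi| I/d] for a real unit vector psi (quantum model assumed)."""
    return sum(x * x for x in psi) / len(psi)

def kron(u, v):
    """Amplitudes of the product vector u (x) v."""
    return [x * y for x in u for y in v]

a = [F(3, 5), F(4, 5)]            # unit vector, d_A = 2
b = [F(1, 3), F(2, 3), F(2, 3)]   # unit vector, d_B = 3

p_a = prob_on_invariant_state(a)             # expected 1/d_A
p_b = prob_on_invariant_state(b)             # expected 1/d_B
p_ab = prob_on_invariant_state(kron(a, b))   # expected 1/(d_A * d_B)
print(p_a, p_b, p_ab)
```

The product rule appears here as the factorization of the probability of the product effect on the product of the two invariant states.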

The second consequence is the relation between the dimension and the maximum probability of a pure state in the convex decomposition of the invariant state |χ)A.

Lemma 33. For every system $A$ the maximum probability of a pure state in the convex decomposition of the invariant state is $p_{\max}=1/d_A$.

Proof. Let $\Phi\in\mathrm{St}_1(AB)$ be a purification of the invariant state $|\chi_A)$, chosen in such a way that the marginal on system $B$ is completely mixed. Let $a\in\mathrm{Eff}(A)$ be an atomic effect with $\|a\|=1$. Then, Eq. (13) becomes

$$(a|_A|\Phi)_{AB}=p_{\max}\,|\psi)_B,$$

where $\psi$ is some normalized pure state of system $B$. Applying the deterministic effect $e$ on system $B$ on both sides we obtain $(a|\chi_A)=p_{\max}$. Finally, corollary 14 states $(a|\chi_A)=1/d_A$. By comparison, we obtain $p_{\max}=1/d_A$.

Thanks to the compression axiom 3, the notion of dimension can be applied not only to the whole state space St1(A) but also to its faces. With face F of the convex set St1(A) we always mean the face Fρ identified by some state ρSt1(A).

Lemma 34. Let $F$ be a face of the convex set $\mathrm{St}_1(A)$. Every maximal set $\{\varphi_i\}_{i=1}^k$ of perfectly distinguishable pure states in $F$ has the same cardinality $k$. Precisely, if $F$ is the face identified by $\rho\in\mathrm{St}_1(A)$ and $\mathcal{E}\in\mathrm{Transf}(A,C)$ is the encoding in the ideal compression for $\rho$, then we have $k=d_C$.

Proof. The set $\{\mathcal{E}\varphi_i\}_{i=1}^k\subset\mathrm{St}_1(C)$ is perfectly distinguishable by lemma 25, and it is maximal by corollary 12. Moreover, the states $\{\mathcal{E}\varphi_i\}_{i=1}^k$ are pure by lemma 4. Hence the cardinality $k$ of the set $\{\varphi_i\}_{i=1}^k$ must be $k=d_C$.

From now on the maximum number of perfectly distinguishable states in the face F will be called the dimension of the face F and will be denoted by |F|.

VIII. DECOMPOSITION INTO PERFECTLY DISTINGUISHABLE PURE STATES

In this section we show that in a theory satisfying our principles any state can be written as a convex combination of perfectly distinguishable pure states. In quantum theory, this corresponds to the diagonalization of the density matrix.

To prove this result we need first a sufficient condition for the distinguishability of states, given in the following

Lemma 35. Let $\{\rho_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ be a set of states. If there exists a set of effects $\{b_i\}_{i=1}^N\subset\mathrm{Eff}(A)$ (not necessarily an observation test) such that $(b_i|\rho_j)=\delta_{ij}$, then the states $\{\rho_i\}_{i=1}^N$ are perfectly distinguishable.

Proof. For each $i=1,\dots,N$ consider the binary test $\{b_i,e-b_i\}$. Since by hypothesis $(b_i|\rho_j)=\delta_{ij}$, the test $\{b_i,e-b_i\}$ can perfectly distinguish $\rho_i$ from any mixture of the states $\{\rho_j\}_{j\neq i}$. In particular, this means that, for every $M<N$, $\rho_{M+1}$ can be perfectly distinguished from the mixture $\omega_M=\sum_{j=1}^M\rho_j/M$. Note that, by definition, the states $\{\rho_i\}_{i=1}^M$ belong to the face $F_{\omega_M}$. We now prove by induction on $M$ that the states $\{\rho_i\}_{i=1}^M$ are perfectly distinguishable. This is true for $M=1$. Now, suppose that the states $\{\rho_i\}_{i=1}^M$ are perfectly distinguishable. Since the state $\rho_{M+1}$ is perfectly distinguishable from $\omega_M$, by lemma 23 we have that the states $\{\rho_i\}_{i=1}^{M+1}$ are perfectly distinguishable. Taking $M=N-1$ the thesis follows.

We now show that the invariant state χ is a mixture of perfectly distinguishable pure states.

Theorem 9. For every maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}\subset\mathrm{St}_1(A)$ one has

$$\chi=\frac{1}{d_A}\sum_{i=1}^{d_A}\varphi_i.$$

Proof. Let $\{a_i\}_{i=1}^{d_A}$ be the observation test such that $(a_i|\varphi_j)=\delta_{ij}$, and $\Phi\in\mathrm{St}_1(AB)$ be a purification of $\chi$, chosen in such a way that the marginal on system $B$ is completely mixed (theorem 4). Let $\{\psi_i\}_{i=1}^{d_A}\subset\mathrm{St}_1(B)$ be the pure states defined by

$$(a_i|_A|\Phi)_{AB}=\frac{1}{d_A}\,|\psi_i)_B$$

and, for each $i$, let $b_i$ be the atomic effect such that

$$(b_i|_B|\Phi)_{AB}=\frac{1}{d_A}\,|\varphi_i)_A$$
(16)

(here we used lemma 30 and the fact that $p_{\max}=1/d_A$). Then we have

$$(b_i|\psi_j)=d_A\,(a_j\otimes b_i|\Phi)=(a_j|\varphi_i)=\delta_{ij}.$$
(17)

By lemma 35 this implies that the states $\{\psi_i\}_{i=1}^{d_A}$ are perfectly distinguishable. Now, since the marginal of $|\Phi)_{AB}$ on system $B$ is completely mixed, theorem 6 states that the set $\{\psi_i\}_{i=1}^{d_A}$ is maximal. Let $\{b'_i\}_{i=1}^{d_A}$ be the observation test such that $(b'_i|\psi_j)=\delta_{ij}$. By lemma 27 each $b'_i$ must be atomic. On the other hand, there is a unique atomic effect $b_i$ such that $(b_i|\psi_i)=1$ (theorem 8). Therefore, $b'_i=b_i$. This means that the effects $\{b_i\}_{i=1}^{d_A}$ form an observation test. Once this fact has been proved, using Eq. (16) we obtain

$$|\chi)_A=(e_B|\,|\Phi)_{AB}=\sum_i(b_i|\,|\Phi)_{AB}=\frac{1}{d_A}\sum_i|\varphi_i).$$

As a consequence, we have the following.

Corollary 16 (Existence of conjugate systems). For every system $A$ there exists a system $\tilde{A}$, called the conjugate system, and a purification $\Phi\in\mathrm{St}_1(A\tilde{A})$ of the invariant state $\chi_A$ such that $d_{\tilde{A}}=d_A$ and the marginal on $\tilde{A}$ is the invariant state $\chi_{\tilde{A}}$. The conjugate system $\tilde{A}$ is unique up to operational equivalence.

Proof. We first prove that $\tilde{A}$ is unique up to operational equivalence. The defining property of the conjugate system $\tilde{A}$ is that the marginal of $\Phi$ on $\tilde{A}$ is the invariant state $\chi_{\tilde{A}}$, which is completely mixed. Theorem 4 then implies that $\tilde{A}$ is unique up to operational equivalence. Let us now show the existence of $\tilde{A}$. Take a purification of $\chi_A$, with purifying system $\tilde{A}$ chosen so that the marginal of $\Phi$ on $\tilde{A}$ is completely mixed (this is possible thanks to theorem 4). Now, the states $\{\psi_i\}_{i=1}^{d_A}\subset\mathrm{St}(\tilde{A})$, defined by $\frac{1}{d_A}|\psi_i):=[(a_i|\otimes\mathcal{I}_{\tilde{A}}]|\Phi)$, are perfectly distinguishable [see Eq. (17) in the proof of theorem 9]. Hence, by theorem 6 they are a maximal set of perfectly distinguishable pure states. This implies $d_{\tilde{A}}=d_A$. Finally, by theorem 9 one has $\frac{1}{d_{\tilde{A}}}\sum_{i=1}^{d_{\tilde{A}}}\psi_i=\chi_{\tilde{A}}$.

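Corollary 17. The distance between the invariant state $\chi_A$ and an arbitrary pure state $\varphi\in\mathrm{St}_1(A)$ is

$$\|\chi-\varphi\|=\frac{2(d_A-1)}{d_A}.$$

Proof. Take a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$ such that $\varphi_1=\varphi$ (corollary 11). Since $\chi=\sum_{i=1}^{d_A}\varphi_i/d_A$ one has $\chi-\varphi=\frac{d_A-1}{d_A}(\sigma-\varphi_1)$, where $\sigma=\sum_{i=2}^{d_A}\varphi_i/(d_A-1)$. Hence, one has $\|\chi-\varphi\|=\frac{d_A-1}{d_A}\|\sigma-\varphi_1\|=\frac{2(d_A-1)}{d_A}$, having used that $\sigma$ and $\varphi_1$ are perfectly distinguishable and therefore $\|\sigma-\varphi_1\|=2$ (see subsection II-I in Ref. [22]).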
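For a quick sanity check of corollary 17, one can assume the quantum model, where the operational distance between states is the trace norm. In the eigenbasis of $\varphi$ the operator $I/d-|\varphi\rangle\langle\varphi|$ is diagonal, so the norm is the sum of the absolute values of its eigenvalues. The function name below is a hypothetical choice.

```python
from fractions import Fraction as F

def distance_to_invariant_state(d):
    """Trace norm of I/d - |phi><phi|, computed from its eigenvalues (quantum model assumed)."""
    # One eigenvalue 1/d - 1 (along phi) and d - 1 eigenvalues equal to 1/d.
    eigenvalues = [F(1, d) - 1] + [F(1, d)] * (d - 1)
    return sum(abs(x) for x in eigenvalues)

distances = {d: distance_to_invariant_state(d) for d in range(2, 6)}
print(distances)
```

Each value agrees with $2(d_A-1)/d_A$, which tends to the maximal distance 2 as the dimension grows.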

We can now prove the following strong result.

Theorem 10 (Spectral decomposition). For every system A, every mixed state can be written as a convex combination of perfectly distinguishable pure states.

Proof. The proof is by induction on the dimension of the system. If $d_A=1$, the thesis trivially holds. Now suppose that the thesis holds for any system $B$ with dimension $d_B\le N$, and take a mixed state $\rho\in\mathrm{St}_1(A)$ where $d_A=N+1$. There are two possibilities: either (1) $\rho$ is not completely mixed or (2) $\rho$ is completely mixed. Suppose that (1) $\rho$ is not completely mixed. Then by the compression axiom 3 one can encode it in a system $C$, using an encoding operation $\mathcal{E}\in\mathrm{Transf}(A,C)$. Now, the maximum number of perfectly distinguishable states in $C$ is equal to the maximum number of perfectly distinguishable states in the face $F_\rho$ (corollary 12). Since $\rho$ is not completely mixed, we must have $d_C\le N$. Using the induction hypothesis we then obtain that the state $\mathcal{E}\rho\in\mathrm{St}_1(C)$ is a mixture of perfectly distinguishable pure states, say $\mathcal{E}\rho=\sum_ip_i\psi_i$. Applying the decoding operation $\mathcal{D}\in\mathrm{Transf}(C,A)$ we get $\rho=\mathcal{D}\mathcal{E}\rho=\sum_ip_i\mathcal{D}\psi_i$. Since by lemmas 4 and 25 we know that the states $\{\mathcal{D}\psi_i\}_{i=1}^{d_C}$ are pure and perfectly distinguishable, this is the desired decomposition for $\rho$. Now suppose that (2) $\rho$ is completely mixed. Consider the half-line in $\mathrm{St}_{\mathbb{R}}(A)$ defined by $\sigma_t=(1+t)\rho-t\chi$, $t\ge0$. Since the set of normalized states $\mathrm{St}_1(A)$ is compact, the line will cross its border at some point $t_0$. Therefore, one will have

$$\rho=\frac{1}{1+t_0}\,\sigma_{t_0}+\frac{t_0}{1+t_0}\,\chi$$

for some state $\sigma_{t_0}$ on the border of $\mathrm{St}_1(A)$, that is, for some state that is not completely mixed. But we know from the discussion of point (1) that the state $\sigma_{t_0}$ is a mixture of perfectly distinguishable pure states, say $\sigma_{t_0}=\sum_{i=1}^kp_i\varphi_i$. By lemma 24 this set can be extended to a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$. On the other hand, theorem 9 states that $\chi=\sum_{i=1}^{d_A}\varphi_i/d_A$. This implies the desired decomposition

$$\rho=\sum_{i=1}^{d_A}\left[\frac{q_i}{1+t_0}+\frac{t_0}{d_A(1+t_0)}\right]\varphi_i,$$

where $q_i=p_i$ for $1\le i\le k$, and $q_i=0$ otherwise.
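The half-line construction of case (2) can be traced on a small example. The sketch below is an illustration under an added assumption: the state is represented only by its list of weights over a fixed maximal set of perfectly distinguishable pure states, so the example covers commuting states and does not replace the proof. The function name is hypothetical.

```python
from fractions import Fraction as F

def border_decomposition(p):
    """p: strictly positive weights of a completely mixed state rho != chi.

    Returns (t0, sigma, weights), where sigma = (1 + t0)*rho - t0*chi lies on the
    border and weights are the coefficients q_i/(1+t0) + t0/(d*(1+t0)).
    """
    d = len(p)
    p_min = min(p)
    if d * p_min == 1:
        raise ValueError("rho equals chi: the half-line never leaves the state space")
    # Largest t for which every weight of sigma_t = (1+t)*rho - t*chi stays nonnegative.
    t0 = d * p_min / (1 - d * p_min)
    sigma = [(1 + t0) * x - t0 / d for x in p]
    weights = [q / (1 + t0) + t0 / (d * (1 + t0)) for q in sigma]
    return t0, sigma, weights

rho = [F(1, 2), F(1, 3), F(1, 6)]
t0, sigma, weights = border_decomposition(rho)
print(t0, sigma, weights)
```

For this input the border state has one vanishing weight, so it is not completely mixed, and the final coefficients reproduce the weights of $\rho$.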

It is easy to show that the marginals of a pure bipartite state have the same spectral decomposition.

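Corollary 18. Let $\Psi\in\mathrm{St}_1(AB)$ be a pure state, and let $\rho$ and $\tilde\rho$ be the marginals of $\Psi$ on systems $A$ and $B$, respectively. If $\rho$ has spectral decomposition $\rho=\sum_{i=1}^{d_A}p_i\varphi_i$, with $p_i>0$ for every $i=1,\dots,r$, $r\le d_A$, then $\tilde\rho$ has spectral decomposition $\tilde\rho=\sum_{i=1}^rp_i\psi_i$.

Proof. Let $\{a_i\}_{i=1}^{d_A}$ be the observation test such that $(a_i|\varphi_j)=\delta_{ij}$, and let $\{b_i\}_{i=1}^r$ be the observation test such that $(b_i|_B|\Psi)_{AB}=p_i|\varphi_i)_A$ for every $i\le r$. For $i\le r$ define the pure state $\psi_i\in\mathrm{St}_1(B)$ and the probability $q_i$ via the relation

$$q_i\,|\psi_i)_B:=(a_i|_A|\Psi)_{AB}.$$

[Note that $\psi_i$ is pure due to the pure conditioning axiom.] By definition we have

$$q_i(b_j|\psi_i)=(a_i\otimes b_j|\Psi)=p_j(a_i|\varphi_j)=p_i\delta_{ij}\qquad\forall\,i\le r,\ j\le r.$$

The above relation implies $q_i=\sum_jq_i(b_j|\psi_i)=\sum_jp_i\delta_{ij}=p_i$ and $(b_j|\psi_i)=\delta_{ij}$. Hence the states $\{\psi_i\}_{i=1}^r$ are perfectly distinguishable. On the other hand, we have $(a_i\otimes e_B|\Psi)=(a_i|\rho)=0$ for every $i>r$, which implies $(a_i|_A|\Psi)_{AB}=0$ for every $i>r$. Therefore, we obtain

$$|\tilde\rho)_B=(e|_A|\Psi)_{AB}=\sum_{i=1}^{d_A}(a_i|_A|\Psi)_{AB}=\sum_{i=1}^{r}(a_i|_A|\Psi)_{AB}=\sum_{i=1}^{r}p_i\,|\psi_i)_B,$$

which is the desired spectral decomposition.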
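In quantum theory, taken here as an assumed model, corollary 18 is the statement that the two reduced density matrices of a pure bipartite state have the same nonzero eigenvalues. The sketch below checks this for a two-qubit state with real amplitudes: for $2\times2$ matrices, equal trace and determinant imply equal spectra. The amplitude matrix and helper names are hypothetical choices for this example.

```python
from fractions import Fraction as F

def marginals(m):
    """m[i][j]: real amplitudes of a two-qubit pure state; returns (rho_A, rho_B)."""
    rho_a = [[sum(m[i][k] * m[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    rho_b = [[sum(m[k][i] * m[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return rho_a, rho_b

def trace_and_det(r):
    """Trace and determinant fix the spectrum of a 2x2 symmetric matrix."""
    return r[0][0] + r[1][1], r[0][0] * r[1][1] - r[0][1] * r[1][0]

amps = [[F(2, 3), F(1, 3)], [F(0), F(2, 3)]]   # squared amplitudes sum to 1
rho_a, rho_b = marginals(amps)
print(trace_and_det(rho_a), trace_and_det(rho_b))
```

The two marginals differ as matrices but share the same trace and determinant, hence the same spectrum.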

The spectral decomposition of states has many consequences. Here we just discuss the simplest ones, which are needed for the purpose of the derivation of quantum theory.

A first consequence is the following lemma.

Lemma 36. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state and let $a\in\mathrm{Eff}(A)$ be the unique atomic effect such that $(a|\varphi)=1$. If $\varphi$ is perfectly distinguishable from $\rho$, then $(a|\rho)=0$.

Proof. Let us write $\rho=\sum_{i=1}^kp_i\varphi_i$, with $\{\varphi_i\}_{i=1}^k$ perfectly distinguishable pure states and $p_i>0$ for each $i$. Now, by lemma 23 the states $\{\varphi_1,\dots,\varphi_k,\varphi\}$ are perfectly distinguishable, and by lemma 24 this set can be extended to a maximal set of perfectly distinguishable pure states $\{\gamma_m\}_{m=1}^{d_A}$, with $\gamma_i=\varphi_i$ for $i\le k$ and $\gamma_{k+1}=\varphi$. Denote by $\{c_m\}_{m=1}^{d_A}$ the observation test that perfectly distinguishes between the states $\{\gamma_m\}$. Note that, by definition, $(c_{k+1}|\varphi)=1$ and $(c_{k+1}|\varphi_j)=0$ for every $j<k+1$. Also, recall that $c_{k+1}$ is atomic (lemma 27). By the duality of theorem 8 we have $a=c_{k+1}$ and, therefore, $(a|\rho)=\sum_{i=1}^kp_i(c_{k+1}|\varphi_i)=0$.

Another consequence of theorem 10 is the following characterization of the completely mixed states as full rank states.

Corollary 19 (Characterization of completely mixed states). A state $\rho\in\mathrm{St}_1(A)$, written as a mixture $\rho=\sum_{i=1}^{d_A}p_i\varphi_i$ of a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$, is completely mixed if and only if $p_i>0$ for every $i=1,\dots,d_A$.

Proof. Necessity: If $p_i=0$ for some $i$, then $\rho$ is perfectly distinguishable from $\varphi_i$. Hence, it cannot be completely mixed. Sufficiency: let $p_{\min}=\min\{p_i,\ i=1,\dots,d_A\}$. Then we have $\rho=p_{\min}\chi+(1-p_{\min})\sigma$, where $\sigma$ is the state defined by $\sigma=\frac{1}{1-p_{\min}}\sum_{i=1}^{d_A}(p_i-p_{\min}/d_A)\varphi_i$. Since $\rho$ contains $\chi$ in its convex decomposition, and since $\chi$ is completely mixed, we conclude that $\rho$ is completely mixed.
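The sufficiency step can be verified with exact arithmetic. The sketch below assumes, for illustration, that the state is given by its weights over a maximal set with $d_A\ge2$, and checks that the weights of $\sigma$ are nonnegative, sum to one, and rebuild $\rho$ together with $\chi$. The function name is hypothetical.

```python
from fractions import Fraction as F

def split_off_invariant_state(p):
    """p: strictly positive weights of rho over a maximal set (d >= 2 assumed).

    Returns (p_min, sigma) with rho = p_min*chi + (1 - p_min)*sigma.
    """
    d = len(p)
    p_min = min(p)
    sigma = [(x - p_min / d) / (1 - p_min) for x in p]
    return p_min, sigma

rho = [F(1, 2), F(1, 3), F(1, 6)]
p_min, sigma = split_off_invariant_state(rho)
# Recombine: chi has weight 1/d on every state of the maximal set.
rebuilt = [p_min / len(rho) + (1 - p_min) * s for s in sigma]
print(p_min, sigma)
```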

In particular, for two-dimensional systems we have the result.

Corollary 20. For dA=2 any state on the border of St1(A) is pure.

Another consequence of theorem 10 is that every element in the vector space StR(A) can be written as a linear combination of perfectly distinguishable states.

Corollary 21. For every $\xi\in\mathrm{St}_{\mathbb{R}}(A)$ there exists a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$ and a set of real numbers $\{c_i\}_{i=1}^{d_A}$ such that $|\xi)=\sum_ic_i|\varphi_i)$.

Proof. Write $\xi$ as $\xi=c_+\rho-c_-\sigma$, where $c_+,c_-\ge0$ and $\rho$ and $\sigma$ are normalized states. If $c_-=0$ there is nothing to prove, because $\xi$ is proportional to a state. Then, suppose that $c_->0$. Write $\sigma$ as $\sigma=\sum_ip_i\psi_i$ where $\{\psi_i\}$ are perfectly distinguishable and define $k=\max\{p_i\}$. Then one has $\chi+\frac{1}{c_-kd_A}\xi=\left(\chi-\frac{1}{kd_A}\sigma\right)+\frac{c_+}{c_-kd_A}\rho$. Now, by definition $\chi-\frac{1}{kd_A}\sigma$ is proportional to a state: indeed we have $\chi-\frac{1}{kd_A}\sigma=\frac{1}{d_A}\sum_i(1-p_i/k)\psi_i$, and, by definition, $1-p_i/k\ge0$. Therefore $\chi+\frac{1}{c_-kd_A}\xi$ is proportional to a state, say $\chi+\frac{1}{c_-kd_A}\xi=t\tau$, with $t>0$. Writing $\tau$ as $\tau=\sum_iq_i\varphi_i$, where $\{\varphi_i\}_{i=1}^{d_A}$ is a maximal set of perfectly distinguishable pure states, we then obtain $\xi=(c_-kd_A)(t\tau-\chi)=(c_-kd_A)\sum_i(tq_i-1/d_A)\varphi_i$, which is the desired decomposition.
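A minimal worked instance, assuming the qubit of standard quantum theory as the model: the traceless Hermitian matrix below is a real combination of two perfectly distinguishable pure states, with coefficients $c_1=+1$ and $c_2=-1$.

```latex
\xi = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
    = (+1)\,|{+}\rangle\langle{+}| \;+\; (-1)\,|{-}\rangle\langle{-}|,
\qquad
|\pm\rangle = \tfrac{1}{\sqrt{2}}\,\bigl(|0\rangle \pm |1\rangle\bigr).
```

Here $|{+}\rangle\langle{+}|$ and $|{-}\rangle\langle{-}|$ play the role of $\varphi_1$ and $\varphi_2$, and their equal mixture is the invariant state $I/2$.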

In quantum theory corollary 21 is equivalent to the fact that every Hermitian matrix is diagonal in a suitable orthonormal basis. A simple consequence of corollary 21 is the following.

Corollary 22. For every system $A$ with $d_A=2$ there is a continuous set of pure states.

Proof. Let $\xi\in\mathrm{St}_{\mathbb{R}}(A)$ be an arbitrary vector such that $(e|\xi)=0$. Note that since the convex set $\mathrm{St}_1(A)$ cannot be a segment (corollary 7), we must have $D_A=\dim[\mathrm{St}_{\mathbb{R}}(A)]>2$ and, therefore, the space of vectors $\xi$ such that $(e|\xi)=0$ is at least two dimensional. By corollary 21 we have $\xi=c(\varphi_1-\varphi_2)=2c(\varphi_1-\chi)$, where $c\ge0$, $\{\varphi_1,\varphi_2\}$ are two perfectly distinguishable pure states, and we used the fact that $\chi=\frac{1}{2}(\varphi_1+\varphi_2)$. Let us define $\varphi_\xi:=\varphi_1$. With this definition, if $\varphi_{\xi_1}=\varphi_{\xi_2}$ then one has $\xi_2=t\xi_1$ for some $t\ge0$. Now, since there is a continuous infinity of vectors $\xi$ (up to scaling), there must be a continuous set of pure states.

We conclude this section with the dual result to the “spectral decomposition” of corollary 21.

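Corollary 23. For every $x \in \mathrm{Eff}_{\mathbb R}(A)$ there exists a perfectly distinguishing observation test $\{a_i\}_{i=1}^{d_A}$ and a set of real numbers $\{d_i\}_{i=1}^{d_A}$ such that $(x| = \sum_{i=1}^{d_A} d_i (a_i|$.

Proof. Let $\Phi \in \mathrm{St}_1(A\tilde A)$ be a purification of the invariant state $\chi_A$, where $\tilde A$ is the conjugate system defined in corollary 16. Take the Choi vector $|R_x)_{\tilde A} := (x|_A |\Phi)_{A\tilde A}$. By corollary 21 there exists a maximal set of perfectly distinguishable pure states $\{\psi_i\}_{i=1}^{d_A}$ and a set of real numbers $\{c_i\}_{i=1}^{d_A}$ such that $|R_x) = \sum_i c_i |\psi_i)$. Let $\{a_i\}_{i=1}^{d_A} \subset \mathrm{Eff}(A)$ be the observation test such that $\frac{1}{d_A} |\psi_i)_{\tilde A} = (a_i|_A |\Phi)_{A\tilde A}$ for every $i = 1, \dots, d_A$ (recall that by corollary 16 the marginal of $\Phi$ on system $\tilde A$ is the invariant state $\chi_{\tilde A}$ and $d_{\tilde A} = d_A$). The test $\{a_i\}_{i=1}^{d_A}$ is perfectly distinguishing: if $\{b_i\}_{i=1}^{d_A}$ is the observation test such that $(b_i|\psi_j) = \delta_{ij}$ and $\varphi_i \in \mathrm{St}_1(A)$ is the state defined by $|\varphi_i)_A := d_A (b_i|_{\tilde A} |\Phi)_{A\tilde A}$, then we have

\[ (a_i|\varphi_j) = d_A\, (a_i|_A (b_j|_{\tilde A} |\Phi)_{A\tilde A} = (b_j|\psi_i) = \delta_{ij}. \]

Moreover, we have

\[ (x|_A |\Phi)_{A\tilde A} = |R_x)_{\tilde A} = \sum_i c_i |\psi_i)_{\tilde A} = \sum_i c_i d_A\, (a_i|_A |\Phi)_{A\tilde A}. \]

Since $\Phi$ is dynamically faithful, this implies $(x| = \sum_i d_i (a_i|$, where $d_i := c_i d_A$.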

IX. TELEPORTATION REVISITED

In this section we revisit probabilistic teleportation using the results about informational dimension. The key point of the section will be the proof of the equality $D_A = d_A^2$, which relates the dimension of the vector space $\mathrm{St}_{\mathbb R}(A)$ with the informational dimension $d_A$.

A. Probability of teleportation

We start by showing a probabilistic teleportation scheme that achieves success probability $p_A = 1/d_A^2$ for every system $A$.

Theorem 11 (Probability of teleportation). For every system $A$, probabilistic teleportation can be achieved with probability $p_A = 1/d_A^2$.

Proof. Let à and |Φ)Aà be the conjugate system and the pure state defined in corollary 16. Then, the state |Φ)AÃ|Φ)Aà satisfies the identity

On the other hand, by lemma 33 the maximum probability of a pure state in the convex decomposition of $\chi_{A\tilde A}$ is $p_{\max} = 1/d_{A\tilde A}$, and by corollaries 15 and 16 one has $p_{\max} = 1/(d_A d_{\tilde A}) = 1/d_A^2$. Therefore, by lemma 12 there exists an atomic effect $E$ such that

[Diagram: Eq. (18)]

and, since $\Phi$ is dynamically faithful,

[Diagram: Eq. (19)]

as can be verified applying both members of Eq. (19) to Φ, thus obtaining Eq. (18).

B. Isotropic states and effects

Here we define two maps that send reversible transformations of A to reversible transformations of Ã: the transpose and the conjugate. Using these maps we will also define the notions of isotropic states and effects and we will prove some properties of them.

Let us start from the definition of the transpose.

Lemma 37 (Transpose of a reversible transformation). Let $\Phi \in \mathrm{St}(A\tilde A)$ be a purification of the invariant state $\chi_A$. The reversible transformations of system $\tilde A$ are in one-to-one correspondence with the reversible transformations of system $A$ via the transposition $\tau$ defined as follows:

\[ (\mathcal U \otimes \mathcal I_{\tilde A}) |\Phi) = (\mathcal I_A \otimes \mathcal U^\tau) |\Phi) \tag{20} \]

[note that the transposition is defined with respect to the given state $\Phi$].

Proof. Since $(\mathcal U \otimes \mathcal I_{\tilde A})|\Phi)$ and $|\Phi)$ are purifications of the same state $\chi_A$, there exists a reversible transformation $\mathcal U^\tau \in \mathbf G_{\tilde A}$ such that Eq. (20) holds. Since $\Phi$ is dynamically faithful on $A$, the map $\mathcal U \mapsto \mathcal U^\tau$ is injective. Furthermore, the map is surjective: for every reversible $\mathcal V \in \mathbf G_{\tilde A}$ the states $(\mathcal I_A \otimes \mathcal V)|\Phi)$ and $|\Phi)$ are two purifications of the same state $\chi_{\tilde A}$, and, by the uniqueness of purification stated in postulate 1, there exists a reversible $\mathcal U \in \mathbf G_A$ such that

\[ (\mathcal I_A \otimes \mathcal V) |\Phi) = (\mathcal U \otimes \mathcal I_{\tilde A}) |\Phi), \tag{21} \]

namely $\mathcal V = \mathcal U^\tau$.

The conjugate is just defined as the inverse of the transpose.

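Definition 7. Let $\tau$ be the transpose defined with respect to the state $\Phi \in \mathrm{St}_1(A\tilde A)$. The conjugate of the reversible channel $\mathcal U \in \mathbf G_A$ is the reversible channel $\mathcal U^* \in \mathbf G_{\tilde A}$ defined by $\mathcal U^* := (\mathcal U^\tau)^{-1}$.

We can now give the definition of isotropic pure state (isotropic atomic effect).

Definition 8. A pure state $\Psi \in \mathrm{St}(A\tilde A)$ [an atomic effect $F \in \mathrm{Eff}(\tilde A A)$] is isotropic if it is invariant under $\mathcal U \otimes \mathcal U^*$ (under $\mathcal U^* \otimes \mathcal U$) for every $\mathcal U \in \mathbf G_A$. Explicitly,

\[ (\mathcal U \otimes \mathcal U^*) |\Psi) = |\Psi) \qquad \left[\, (F| (\mathcal U^* \otimes \mathcal U) = (F| \,\right]. \tag{22} \]

An example of isotropic state is $\Phi$: indeed, by definition of conjugate we have, for every $\mathcal U \in \mathbf G_A$,

\[ (\mathcal U \otimes \mathcal U^*) |\Phi) = (\mathcal U \otimes (\mathcal U^\tau)^{-1}) |\Phi) = (\mathcal I_A \otimes (\mathcal U^\tau)^{-1} \mathcal U^\tau) |\Phi) = |\Phi). \]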

As a consequence, the teleportation effect E is isotropic: indeed one has

which implies $(E|(\mathcal U^* \otimes \mathcal U) = (E|$, since the state $\Phi \otimes \Phi$ is dynamically faithful.
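For orientation, the transpose and conjugate can be checked numerically in the familiar qubit case. The sketch below assumes the standard quantum representation, which the paper has not yet derived at this point: $\Phi$ is taken to be the maximally entangled vector $(|00\rangle + |11\rangle)/\sqrt 2$, the transpose is the matrix transpose, and the conjugate is the entrywise complex conjugate. The helper names and the particular unitary `U` are illustrative choices.

```python
import cmath
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def apply(mat, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]

def transpose(m):
    return [list(col) for col in zip(*m)]

def conjugate(m):
    return [[complex(x).conjugate() for x in row] for row in m]

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

IDENTITY = [[1, 0], [0, 1]]
PHI = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]  # (|00> + |11>)/sqrt(2)

# An arbitrary 2x2 unitary (hypothetical example values).
theta, alpha = 0.7, 0.3
U = [[cmath.exp(1j * alpha) * math.cos(theta), -math.sin(theta)],
     [math.sin(theta), cmath.exp(-1j * alpha) * math.cos(theta)]]

# Analogue of Eq. (20): acting with U on A equals acting with U^T on the partner.
lhs = apply(kron(U, IDENTITY), PHI)
rhs = apply(kron(IDENTITY, transpose(U)), PHI)

# Isotropy of Phi: U on A together with the conjugate of U on the partner leaves Phi fixed.
invariant = apply(kron(U, conjugate(U)), PHI)
```

In this representation `close(lhs, rhs)` and `close(invariant, PHI)` both hold, mirroring Eq. (20) and the isotropy of $\Phi$.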

We now show that all isotropic pure states (isotropic atomic effects) are connected to the state Φ (to the effect E) through a local reversible transformation.

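Lemma 38. If a pure state $\Psi \in \mathrm{St}_1(A\tilde A)$ is isotropic then $|\Psi) = (\mathcal V \otimes \mathcal I_{\tilde A})|\Phi)$ for some reversible transformation $\mathcal V \in \mathbf G_A$ such that $\mathcal V \mathcal U = \mathcal U \mathcal V$ for every $\mathcal U \in \mathbf G_A$.

Proof. Since $\Psi$ satisfies Eq. (22), its marginal on system $\tilde A$ is the invariant state $|\chi_{\tilde A})$. Since $\Psi$ and $\Phi$ are purifications of the same state, there must exist a reversible channel $\mathcal V \in \mathbf G_A$ such that $|\Psi) = (\mathcal V \otimes \mathcal I_{\tilde A})|\Phi)$. Moreover, we have for every $\mathcal U \in \mathbf G_A$

\[ (\mathcal U \mathcal V \mathcal U^{-1} \otimes \mathcal I_{\tilde A}) |\Phi) = (\mathcal U \mathcal V \otimes \mathcal U^*) |\Phi) = (\mathcal U \otimes \mathcal U^*) |\Psi) = |\Psi) = (\mathcal V \otimes \mathcal I_{\tilde A}) |\Phi). \]

Since $\Phi$ is dynamically faithful, the above equation implies $\mathcal U \mathcal V \mathcal U^{-1} = \mathcal V$ for every $\mathcal U \in \mathbf G_A$.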

By the duality between states and effects, it is easy to obtain the following.

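Lemma 39. Let $A \in \mathrm{Eff}(A\tilde A)$ be the atomic effect such that $(A|\Phi) = 1$. If an atomic effect $F \in \mathrm{Eff}(\tilde A A)$ is isotropic then $(F|_{\tilde A A} = (A|_{\tilde A A} (\mathcal I_{\tilde A} \otimes \mathcal V)$ for some reversible transformation $\mathcal V \in \mathbf G_A$ such that $\mathcal V \mathcal U = \mathcal U \mathcal V$ for every $\mathcal U \in \mathbf G_A$.

Proof. Let $\Psi$ be the pure state such that $(F|\Psi) = 1$. Clearly $\Psi$ is isotropic: one has $(F|(\mathcal U \otimes \mathcal U^*)|\Psi) = (F|\Psi) = 1$, and, therefore, $(\mathcal U \otimes \mathcal U^*)|\Psi) = |\Psi)$. By lemma 38, there exists a reversible transformation $\mathcal V$ such that $|\Psi) = (\mathcal V^{-1} \otimes \mathcal I_{\tilde A})|\Phi)$ and $\mathcal V^{-1} \mathcal U = \mathcal U \mathcal V^{-1}$ for every $\mathcal U \in \mathbf G_A$. Now, this implies $(F|(\mathcal V^{-1} \otimes \mathcal I_{\tilde A})|\Phi) = (F|\Psi) = 1$, which by theorem 8 implies $(F| = (A|(\mathcal V \otimes \mathcal I_{\tilde A})$.

As a consequence, every isotropic effect is connected to the teleportation effect by a local reversible transformation:

Corollary 24. If an atomic effect $F \in \mathrm{Eff}(\tilde A A)$ is isotropic then $(F|_{\tilde A A} = (E|_{\tilde A A} (\mathcal I_{\tilde A} \otimes \mathcal V)$ for some reversible transformation $\mathcal V \in \mathbf G_A$ such that $\mathcal V \mathcal U = \mathcal U \mathcal V$ for every $\mathcal U \in \mathbf G_A$.

Proof. Since $(E|$ and $(F|$ are both isotropic, lemma 39 implies that they are both connected to $(A|$ through a local reversible transformation, say $\mathcal V$ and $\mathcal W$, respectively. Therefore, they are connected to each other through the transformation $\mathcal W \mathcal V^{-1}$.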

C. Dimension of the state space

In this subsection we use the local distinguishability axiom to prove the equality $D_A = d_A^2$ (see theorem 12). As a consequence, we will be able to represent the states of a system $A$ as square $d_A \times d_A$ Hermitian complex matrices, that is, Hermitian operators on the complex Hilbert space $\mathbb C^{d_A}$. Theorem 12 is thus the point where the complex field (as opposed to the real field) enters in our derivation. Notice that, even if local distinguishability excludes quantum theory on real Hilbert spaces from the very beginning, to prove the emergence of complex Hilbert spaces we need to use all six principles.

Due to local distinguishability, any bipartite state $\Psi \in \mathrm{St}(AB)$ can be written as

\[ |\Psi) = \sum_{i=1}^{D_A} \sum_{j=1}^{D_B} \Psi_{ij}\, |\alpha_i)|\beta_j), \]

where $\{\alpha_i\}$ ($\{\beta_j\}$) is a basis for the vector space $\mathrm{St}_{\mathbb R}(A)$ [$\mathrm{St}_{\mathbb R}(B)$]. Similarly, a bipartite effect $F \in \mathrm{Eff}(BA)$ can be written as

\[ (F| = \sum_{k=1}^{D_B} \sum_{l=1}^{D_A} F_{kl}\, (\beta_k^*|(\alpha_l^*| \]

with $(\alpha_l^*|\alpha_i) = \delta_{il}$ and $(\beta_k^*|\beta_j) = \delta_{jk}$. Finally, a transformation $\mathcal C$ from $A$ to $B$ can be written as

\[ \mathcal C = \sum_{j=1}^{D_B} \sum_{i=1}^{D_A} C_{ji}\, |\beta_j)(\alpha_i^*|. \]

In this matrix representation the teleportation diagram of Eq. (19) becomes

\[ \Phi E = \frac{I_{D_A}}{d_A^2}, \tag{23} \]

where $I_{D_A}$ is the identity matrix in dimension $D_A$. On the other hand, we also have

\[ 1 \ge (E|\Phi) = \mathrm{Tr}[\Phi E] = \frac{D_A}{d_A^2} \]

and, therefore,

\[ D_A \le d_A^2. \]

We now show that one has the equality, using the following standard lemma.

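Lemma 40. With a suitable choice of basis for the vector space $\mathrm{St}_{\mathbb R}(A)$, every reversible transformation $\mathcal U \in \mathbf G_A$ is represented by a matrix $M_{\mathcal U}$ of the form

\[ M_{\mathcal U} = \begin{pmatrix} 1 & 0 \\ 0 & O_{\mathcal U} \end{pmatrix}, \tag{24} \]

where $O_{\mathcal U}$ is an orthogonal $(D_A - 1) \times (D_A - 1)$ matrix.

Proof. Let $\{\xi_i\}$ be a basis for $\mathrm{St}_{\mathbb R}(A)$, chosen in such a way that the first basis vector is $\chi$, while the remaining vectors satisfy $(e|\xi_i) = 0$, $i = 2, \dots, D_A$. Such a choice is always possible since every vector $v \in \mathrm{St}_{\mathbb R}(A)$ can be written as $v = (e|v)\chi + \xi$, where $\xi$ satisfies $(e|\xi) = 0$. Now, since $\mathcal U \chi = \chi$, the first column of $M_{\mathcal U}$ must be $(1, 0, \dots, 0)^T$. Moreover, since for every normalized state $\rho$, $\mathcal U \rho$ is a normalized state, one must have $(e|\mathcal U|\xi) = 0$ for every $\xi$ such that $(e|\xi) = 0$. Hence the first row of $M_{\mathcal U}$ must be $(1, 0, \dots, 0)$, namely $M_{\mathcal U}$ has the block form of Eq. (24). It remains to show that, with a suitable choice of basis, the matrix $O_{\mathcal U}$ in the second block can be chosen to be orthogonal. Observe that by definition the matrices $\{M_{\mathcal U}\}_{\mathcal U \in \mathbf G_A}$ form a representation of the group $\mathbf G_A$: indeed, one has $M_{\mathcal I} = I_{D_A}$ and $M_{\mathcal U \mathcal V} = M_{\mathcal U} M_{\mathcal V}$ for every $\mathcal U, \mathcal V \in \mathbf G_A$. Consider the positive definite matrix $P$ defined by the integral

\[ P := \int d\mathcal U\; O_{\mathcal U}^T O_{\mathcal U}, \]

where $d\mathcal U$ is the Haar measure on the compact group $\mathbf G_A$ (see corollary 30 of Ref. [22] for the proof of compactness) and $A^T$ denotes the transpose of $A$. By definition, one has $P^T = P$ and $O_{\mathcal U}^T P O_{\mathcal U} = P$ for every $\mathcal U \in \mathbf G_A$. Let us now define the new representation

\[ O'_{\mathcal U} := P^{1/2} O_{\mathcal U} P^{-1/2}, \]

obtained from $O_{\mathcal U}$ by a change of basis in the subspace spanned by $\{\xi_i\}_{i=2}^{D_A}$. With this choice, each matrix $O'_{\mathcal U}$ is orthogonal:

\[ O'^{\,T}_{\mathcal U} O'_{\mathcal U} = \left(P^{1/2} O_{\mathcal U} P^{-1/2}\right)^T P^{1/2} O_{\mathcal U} P^{-1/2} = P^{-1/2} O_{\mathcal U}^T P\, O_{\mathcal U} P^{-1/2} = I_{D_A - 1}. \]

As a consequence, we have the following.

Corollary 25. For every system $A$, the group of reversible transformations $\mathbf G_A$ is (isomorphic to) a compact subgroup of $O(D_A - 1)$.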
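The averaging argument in the proof of lemma 40 can be illustrated on a toy example. The sketch below replaces the compact group $\mathbf G_A$ by a hypothetical two-element matrix group generated by a non-orthogonal reflection, so that the Haar integral becomes a plain average; the closed-form square root is valid only for 2x2 symmetric positive definite matrices. All names are illustrative.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def sqrt_spd(p):
    """Closed-form square root of a symmetric positive definite 2x2 matrix."""
    s = math.sqrt(p[0][0] * p[1][1] - p[0][1] * p[1][0])
    t = math.sqrt(p[0][0] + p[1][1] + 2 * s)
    return [[(p[i][j] + (s if i == j else 0)) / t for j in range(2)]
            for i in range(2)]

def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
REFLECTION = [[1.0, -2.0], [0.0, -1.0]]   # squares to the identity, not orthogonal
GROUP = [IDENTITY, REFLECTION]

# P = average over the group of O^T O (finite analogue of the Haar integral).
P = [[sum(matmul(transpose(g), g)[i][j] for g in GROUP) / len(GROUP)
      for j in range(2)] for i in range(2)]

root = sqrt_spd(P)
root_inv = inverse(root)

def orthogonalize(o):
    """Change of basis O -> P^(1/2) O P^(-1/2)."""
    return matmul(matmul(root, o), root_inv)

O_new = orthogonalize(REFLECTION)
gram = matmul(transpose(O_new), O_new)   # should be the identity up to rounding
```

The original `REFLECTION` is not orthogonal, whereas `gram` is the identity up to rounding, which is the content of the last display in the proof.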

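Lemma 41. Let $E \in \mathrm{Eff}(A\tilde A)$ be the teleportation effect of Eq. (19). Then one has $(E|\Phi) = 1$.

Proof. Let $A \in \mathrm{Eff}(A\tilde A)$ be the atomic effect such that $(A|\Phi) = 1$. We now prove that $A = E$. Indeed, by corollary 24 there exists a reversible transformation $\mathcal V \in \mathbf G_A$ such that $(A| = (E|(\mathcal V \otimes \mathcal I_{\tilde A})$. Using a basis for $\mathrm{St}_{\mathbb R}(A)$ such that the transformations in $\mathbf G_A$ are represented by orthogonal matrices as in Eq. (24), one has

\[ 1 = (A|\Phi) = (E|(\mathcal V \otimes \mathcal I_{\tilde A})|\Phi) = \mathrm{Tr}[E M_{\mathcal V} \Phi] = \mathrm{Tr}[\Phi E M_{\mathcal V}] = \frac{\mathrm{Tr}[M_{\mathcal V}]}{d_A^2}, \]

having used Eq. (23) for the last equality. Using the inequality $\mathrm{Tr}[M_{\mathcal V}] \le \mathrm{Tr}[I_{D_A}]$, which holds for every orthogonal $D_A \times D_A$ matrix, we then obtain

\[ 1 = \frac{\mathrm{Tr}[M_{\mathcal V}]}{d_A^2} \le \frac{\mathrm{Tr}[I_{D_A}]}{d_A^2} = \mathrm{Tr}[E\Phi] = (E|\Phi) \le 1, \]

and, therefore, $(E|\Phi) = 1$.

Theorem 12 (Dimension of the state space). The dimension $D_A$ of the vector space generated by the states in $\mathrm{St}(A)$ is $D_A = d_A^2$.

Proof. Using lemma 41 and Eq. (23) we obtain $1 = (E|\Phi) = \mathrm{Tr}[E\Phi] = \mathrm{Tr}[I_{D_A}]/d_A^2 = D_A/d_A^2$. Hence, $D_A = d_A^2$.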
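A quick dimension count shows why $D_A = d_A^2$ singles out complex rather than real Hilbert spaces. The sketch below uses two standard facts as assumptions: $d \times d$ complex Hermitian matrices form a real vector space of dimension $d^2$, while real symmetric matrices have dimension $d(d+1)/2$. Local distinguishability is read here as the requirement that the state-space dimension be multiplicative over composite systems, with $d_{AB} = d_A d_B$. The function names are hypothetical.

```python
def hermitian_dimension(d):
    """Real dimension of the space of d x d complex Hermitian matrices."""
    return d * d

def real_symmetric_dimension(d):
    """Real dimension of the space of d x d real symmetric matrices."""
    return d * (d + 1) // 2

def composes_locally(dimension, d_a, d_b):
    """Check D_AB = D_A * D_B for a composite of dimensions d_a and d_b."""
    return dimension(d_a * d_b) == dimension(d_a) * dimension(d_b)

complex_ok = composes_locally(hermitian_dimension, 2, 3)
real_ok = composes_locally(real_symmetric_dimension, 2, 3)
```

With these counts `complex_ok` is true, while `real_ok` is false: for real symmetric matrices a 2x3 composite has dimension 21, whereas the product of the local dimensions is 18.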

An interesting consequence of the relation (E|Φ)=1 is the following.

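Corollary 26 (No inversion). Let us write an arbitrary state $\rho \in \mathrm{St}_1(A)$ as $\rho = \chi_A + \xi$, with $(e|\xi) = 0$. Then, the linear map $\mathcal N$ defined by $\mathcal N(\rho) = \chi_A - \xi$ is not a physical transformation.

Proof. Write the state $\Phi$ as $\Phi = \chi_A \otimes \chi_{\tilde A} + \Xi$. Since $(e|_A |\Phi)_{A\tilde A} = |\chi)_{\tilde A}$ one must have $(e|_A |\Xi)_{A\tilde A} = 0$. Therefore, $\Xi$ must be of the form $\Xi = \sum_i \alpha_i \otimes \beta_i$ with $(e|\alpha_i) = 0$ for all $i$. Applying the transformation $\mathcal N$ one then obtains $(\mathcal N \otimes \mathcal I_{\tilde A})\Phi = \chi_A \otimes \chi_{\tilde A} - \Xi$. We now prove that this is not a state, and therefore, $\mathcal N$ cannot be a physical transformation. Let $E$ be the teleportation effect. Since $(E|\Phi) = 1$, we have $1 = (E|\chi_A \otimes \chi_{\tilde A}) + (E|\Xi) = 1/d_A^2 + (E|\Xi)$. Now, we have

\[ (E|(\mathcal N \otimes \mathcal I_{\tilde A})|\Phi) = \frac{1}{d_A^2} - (E|\Xi) = \frac{2}{d_A^2} - 1. \]

Since this quantity is negative for every $d_A > 1$, the map $\mathcal N$ cannot be a physical transformation.

Corollary 27. The matrix $M_{\mathcal N}$ defined as

\[ M_{\mathcal N} = \begin{pmatrix} 1 & 0 \\ 0 & -I_{D_A - 1} \end{pmatrix} \tag{25} \]

cannot represent a physical transformation of system $A$.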
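The value $2/d_A^2 - 1$ can be reproduced for a qubit in the standard quantum representation. The sketch below is an illustration under assumptions not made in the text at this point: $\Phi$ is the projector on $(|00\rangle + |11\rangle)/\sqrt 2$, the teleportation effect is represented by the same projector, and the inversion flips the sign of every Pauli component on the first system. All names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]

def kron(a, b):
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

phi = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
rho_phi = [[phi[i] * phi[j] for j in range(4)] for i in range(4)]  # real vector

def invert_first_system(rho):
    """Expand rho in the Pauli product basis and flip the sign of every term
    whose first factor is not the identity (the map N tensor identity)."""
    out = [[0j] * 4 for _ in range(4)]
    for a, sigma_a in enumerate(PAULIS):
        for sigma_b in PAULIS:
            op = kron(sigma_a, sigma_b)
            coeff = trace(matmul(rho, op)) / 4
            sign = 1 if a == 0 else -1
            for i in range(4):
                for j in range(4):
                    out[i][j] += sign * coeff * op[i][j]
    return out

inverted = invert_first_system(rho_phi)
# "Probability" assigned by the effect |Phi><Phi| to the inverted operator.
value = complex(trace(matmul(rho_phi, inverted))).real
```

For $d_A = 2$ the formula gives $2/4 - 1 = -1/2$, and `value` reproduces this negative number even though `inverted` still has unit trace, so the inverted operator cannot be a state.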

X. DERIVATION OF THE QUBIT

In this section we show that every two-dimensional system in our theory is a qubit. With this expression we mean that the normalized states in $\mathrm{St}_1(A)$ can be represented as density matrices for a quantum system with two-dimensional Hilbert space. With this choice of representation we also show that the effects in $\mathrm{Eff}(A)$ are all the positive Hermitian matrices bounded by the identity, and that the reversible transformations $\mathbf G_A$ act on the states by conjugation with unitary matrices in $SU(2)$.

The first step is to prove that the set of normalized states $\mathrm{St}_1(A)$ is a sphere. The idea of the proof is a simple geometric observation: in the ordinary three-dimensional space the sphere is the only compact convex set that has an infinite number of pure states connected by orthogonal transformations. The complete proof is given in the following.

Theorem 13 (The Bloch sphere). The normalized states of a system $A$ with $d_A = 2$ form a sphere and the group $\mathbf G_A$ is $SO(3)$.

Proof. According to corollary 25, the group of reversible transformations $\mathbf G_A$ is a compact subgroup of the orthogonal group $O(3)$. It cannot be the whole $O(3)$ because, as we saw in corollary 27, the inversion $-I$ cannot represent a physical transformation. We now show that $\mathbf G_A$ must be $SO(3)$ by excluding all the other possibilities. From corollary 22 we know that the system $A$ has a continuum of pure states. Therefore, the group $\mathbf G_A$ must contain a continuous set of transformations. Now, from the classification of the closed subgroups of $O(3)$ we know that there are only two possibilities: (i) $\mathbf G_A$ is $SO(3)$ and (ii) $\mathbf G_A$ is the subgroup generated by $SO(2)$, the group of rotations around a fixed axis, say the $z$ axis, and possibly by the reflections with respect to planes containing the $z$ axis. Note that the reflection in the $xy$ plane is forbidden, because the composition of this reflection with the rotation of $\pi$ around the $z$ axis would give the inversion, which is forbidden by corollary 26. Case (ii) is excluded because in this case the action of the group $\mathbf G_A$ cannot be transitive. The detailed proof is as follows: because of the $SO(2)$ symmetry, the set of pure states must contain at least a circle in the $xy$ plane. This circle will be necessarily invariant under all operations in the group. However, since the convex set of states is three dimensional, there is at least a pure state outside the circle. Clearly there is no way to transform a state on the circle into a state outside the circle by means of an operation in $\mathbf G_A$. This is in contradiction with the fact that every two pure states are connected by a reversible transformation. Hence, case (ii) is ruled out. The only remaining alternative is (i), namely that $\mathbf G_A = SO(3)$ and, hence, the set of pure states generated by its action on a fixed pure state is a sphere.

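Since the convex set of density matrices on a two-dimensional Hilbert space is a sphere, we can represent the states in $\mathrm{St}_1(A)$ as density matrices. Precisely, we can choose three orthogonal axes passing through the center of the sphere and call them the $x, y, z$ axes, take $\varphi_{+,k}, \varphi_{-,k}$, $k = x, y, z$, to be the two perfectly distinguishable pure states in the direction of the $k$ axis, and define $\sigma_k := \varphi_{+,k} - \varphi_{-,k}$. From the geometry of the sphere we know that any state $\rho \in \mathrm{St}_1(A)$ can be written as

\[ \rho = \chi + \frac{1}{2} \sum_{k=x,y,z} n_k \sigma_k, \qquad \sum_{k=x,y,z} n_k^2 \le 1, \tag{26} \]

where the pure states are those for which $\sum_{k=x,y,z} n_k^2 = 1$. The Bloch representation $S_\rho$ of the quantum state $\rho$ is then obtained by associating the basis vectors $\chi, \sigma_x, \sigma_y, \sigma_z$ to the matrices

\[ S_\chi = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad S_{\sigma_x} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad S_{\sigma_y} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad S_{\sigma_z} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \]

and by defining $S_\rho$ by linearity from Eq. (26). Clearly, in this way we obtain

\[ S_\rho = \frac{1}{2} \begin{pmatrix} 1 + n_z & n_x - i n_y \\ n_x + i n_y & 1 - n_z \end{pmatrix}, \]

which is the expression of a generic density matrix. Denoting by $M_2(\mathbb C)$ the set of complex two-by-two matrices we have the following.

Corollary 28 (Qubit density matrices). For $d_A = 2$ the set of states $\mathrm{St}_1(A)$ is isomorphic to the set of density matrices in $M_2(\mathbb C)$ through the isomorphism $\rho \mapsto S_\rho$.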
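The map $\rho \mapsto S_\rho$ is easy to check numerically. The following minimal sketch builds $S_\rho$ from a Bloch vector as in the display above and computes its two eigenvalues, which equal $(1 \pm |n|)/2$; the function names are hypothetical and the test vectors are arbitrary examples.

```python
import math

def bloch_to_density(nx, ny, nz):
    """Matrix S_rho associated with the Bloch vector (nx, ny, nz)."""
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def eigenvalues(s):
    """Eigenvalues of a 2x2 Hermitian matrix, from its trace and determinant."""
    mean = (s[0][0] + s[1][1]).real / 2
    det = (s[0][0] * s[1][1] - s[0][1] * s[1][0]).real
    radius = math.sqrt(max(0.0, mean * mean - det))
    return mean - radius, mean + radius

pure = bloch_to_density(0.6, 0.0, 0.8)    # |n| = 1: on the sphere
mixed = bloch_to_density(0.3, 0.4, 0.0)   # |n| = 0.5: inside the sphere
center = bloch_to_density(0.0, 0.0, 0.0)  # the invariant state chi
```

A unit Bloch vector gives eigenvalues 0 and 1 (a rank-one projector), an interior vector gives two strictly positive eigenvalues, and the center gives the maximally mixed matrix.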

Once we decide to represent the states in $\mathrm{St}_1(A)$ as matrices, the effects in $\mathrm{Eff}(A)$ are necessarily represented by matrices too. The matrix representation of an effect, given by the map $a \in \mathrm{Eff}(A) \mapsto E_a \in M_2(\mathbb C)$, is defined uniquely by the relation

\[ \mathrm{Tr}[E_a S_\rho] = (a|\rho) \qquad \forall \rho \in \mathrm{St}(A). \]

We then have the following.

Corollary 29. For $d_A = 2$ the set of effects $\mathrm{Eff}(A)$ is isomorphic to the set of positive Hermitian matrices $P \in M_2(\mathbb C)$ such that $P \le I$.

Proof. Clearly the matrix $E_a$ must be positive for every effect $a$, since we have $\mathrm{Tr}[E_a S_\rho] = (a|\rho) \ge 0$ for every density matrix $S_\rho$. Moreover, since we have $\mathrm{Tr}[E_a S_\rho] = (a|\rho) \le 1$ for every density matrix $S_\rho$, we must have $E_a \le I$. Finally, we know that for every couple of perfectly distinguishable pure states $\varphi, \varphi_\perp$ there exists an atomic effect $a$ such that $(a|\varphi) = 1$ and $(a|\varphi_\perp) = 0$. Since the two pure states $\varphi, \varphi_\perp$ are represented by orthogonal rank-one projectors $S_\varphi$ and $S_{\varphi_\perp}$, we must have $E_a = S_\varphi$. This proves that the atomic effects are the whole set of positive rank-one projectors. As a consequence, also every positive matrix $P$ with $P \le I$ must represent some effect $a$.

Finally, the reversible transformations are represented as conjugations by unitary matrices in SU(2).

Corollary 30. For every reversible transformation $\mathcal U \in \mathbf G_A$ with $d_A = 2$ there exists a unitary matrix $U \in SU(2)$ such that

\[ S_{\mathcal U \rho} = U S_\rho U^\dagger \qquad \forall \rho \in \mathrm{St}(A). \tag{27} \]

Conversely, for every $U \in SU(2)$ there exists a reversible transformation $\mathcal U \in \mathbf G_A$ such that Eq. (27) holds.

Proof. Every rotation of the Bloch sphere is represented by conjugation by some $SU(2)$ matrix. Conversely, every conjugation by an $SU(2)$ matrix represents some rotation on the Bloch sphere. On the other hand, we know that $\mathbf G_A$ is the group of all rotations on the Bloch sphere (theorem 13).
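The correspondence between $SU(2)$ conjugation and rotations can be seen in one line of algebra for rotations about the $z$ axis. The sketch below assumes the Bloch representation of Eq. (26) and uses the diagonal unitary $\mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$ as an example; function names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def bloch_to_density(n):
    nx, ny, nz = n
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def density_to_bloch(s):
    lower = complex(s[1][0])              # equals (nx + i ny) / 2
    return (2 * lower.real, 2 * lower.imag, complex(s[0][0] - s[1][1]).real)

def z_rotation(theta):
    """SU(2) matrix diag(exp(-i theta/2), exp(i theta/2))."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def rotate(n, u):
    """Bloch vector of U S U^dagger."""
    return density_to_bloch(matmul(matmul(u, bloch_to_density(n)), dagger(u)))

rotated = rotate((1.0, 0.0, 0.0), z_rotation(math.pi / 2))
```

Conjugating the state along $+x$ with this unitary at $\theta = \pi/2$ yields the state along $+y$, that is, a quarter turn about the $z$ axis.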

Note that we proved that all two-dimensional systems $A$ and $B$ in our theory have the same states [$\mathrm{St}_1(A) \simeq \mathrm{St}_1(B)$], the same effects [$\mathrm{Eff}(A) \simeq \mathrm{Eff}(B)$], and the same reversible transformations ($\mathbf G_A \simeq \mathbf G_B$), but we did not show that $A$ and $B$ are operationally equivalent. For example, $A$ and $B$ could be different when we compose them with a third system $C$: the sets of states $\mathrm{St}_1(AC)$ and $\mathrm{St}_1(BC)$ could be nonisomorphic. The fact that every couple of two-dimensional systems $A$ and $B$ are operationally equivalent will be proved later (cf. corollary 40).

We conclude this section with a simple fact that will be very useful later.

Corollary 31 (Superposition principle for qubits). Let $\{\varphi_1, \varphi_2\} \subset \mathrm{St}_1(A)$ be two perfectly distinguishable pure states of a system $A$ with $d_A = 2$. Let $\{a_1, a_2\}$ be the observation test such that $(a_i|\varphi_j) = \delta_{ij}$. Then, for every probability $0 \le p \le 1$ there exists a pure state $\psi_p \in \mathrm{St}_1(A)$ such that

\[ (a_1|\psi_p) = p, \qquad (a_2|\psi_p) = 1 - p. \tag{28} \]

Precisely, the set of pure states $\psi_p \in \mathrm{St}_1(A)$ satisfying Eq. (28) is a circle in the Bloch sphere.

Proof. Elementary property of density matrices.
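The elementary property behind corollary 31 can be spelled out in Bloch coordinates. The sketch below assumes that $\varphi_1$ and $\varphi_2$ lie along $+z$ and $-z$, so that $(a_1|\rho) = (1 + n_z)/2$; under that assumption the pure states with $(a_1|\psi_p) = p$ are the unit vectors with $n_z = 2p - 1$, a circle of radius $2\sqrt{p(1-p)}$. Function names are hypothetical.

```python
import math

def circle_state(p, angle):
    """Unit Bloch vector on the circle of pure states with (a_1|psi) = p."""
    radius = 2 * math.sqrt(p * (1 - p))
    return (radius * math.cos(angle), radius * math.sin(angle), 2 * p - 1)

def outcome_probabilities(n):
    """Probabilities of the test {a_1, a_2} aligned with the z axis."""
    return ((1 + n[2]) / 2, (1 - n[2]) / 2)

n = circle_state(0.25, 1.0)
norm_squared = sum(c * c for c in n)
probabilities = outcome_probabilities(n)
```

For any `angle` the vector has unit length, so the state is pure, and the outcome probabilities are $(p, 1 - p)$, here $(0.25, 0.75)$.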

XI. PROJECTIONS

In this section we define the projection on a face $F$ of the convex set $\mathrm{St}_1(A)$ and we prove several properties of projections. The projection on the face $F$ will be defined as an atomic operation $\Pi_F \in \mathrm{Transf}(A)$ that acts as the identity on states in the face $F$ and that annihilates the states on the orthogonal face $F^\perp$. In the following we first introduce the concept of orthogonal face, then prove the existence and uniqueness of projections, and finally give some useful results on the projection of a pure state on two orthogonal faces.

A. Orthogonal faces and orthogonal complements

In order to introduce the notion of orthogonal face we need first a few elementary results. We start by showing that there is a canonical way to associate a state ωF to a face F.

Lemma 42 (State associated to a face). Let $F$ be a face of the convex set $\mathrm{St}_1(A)$ and let $\{\varphi_i\}_{i=1}^{|F|}$ be a maximal set of perfectly distinguishable pure states in $F$. Then the state $\omega_F := \frac{1}{|F|} \sum_{i=1}^{|F|} \varphi_i$ depends only on the face $F$ and not on the particular set $\{\varphi_i\}_{i=1}^{|F|}$. Moreover, $F$ is the face identified by $\omega_F$.

Proof. Suppose that $F$ is the face identified by $\rho$ and let $\mathcal E \in \mathrm{Transf}(A, C)$ [or $\mathcal D \in \mathrm{Transf}(C, A)$] be the encoding (or decoding) in the ideal compression for $\rho$. By lemma 4 and corollary 12, $\{\mathcal E \varphi_i\}_{i=1}^{|F|}$ is a maximal set of perfectly distinguishable pure states of $C$ and by theorem 9 one has $\chi_C = \frac{1}{|F|} \sum_{i=1}^{|F|} \mathcal E \varphi_i$. Hence, $\omega_F = \frac{1}{|F|} \sum_{i=1}^{|F|} \varphi_i = \frac{1}{|F|} \sum_{i=1}^{|F|} \mathcal D \mathcal E \varphi_i = \mathcal D \chi_C$. Since the right-hand side of the equality is independent of the particular set $\{\varphi_i\}_{i=1}^{|F|}$, the state $\omega_F$ in the left-hand side is independent too. To prove that $F$ is the face identified by $\omega_F$ it is enough to observe that $\omega_F$ is completely mixed relative to $F$: this fact follows from the relation $\omega_F = \mathcal D \chi_C$ and from lemma 5.

We now define the orthogonal complement of the state ωF.

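Definition 9. The orthogonal complement of the state $\omega_F$ is the state $\omega_F^\perp \in \mathrm{St}_1(A) \cup \{0\}$ defined as follows:

  • (1) if $|F| = d_A$, then $\omega_F^\perp = 0$;

  • (2) if $|F| < d_A$, then $\omega_F^\perp$ is defined by the relation

    \[ \chi_A = \frac{|F|}{d_A}\, \omega_F + \frac{d_A - |F|}{d_A}\, \omega_F^\perp. \tag{29} \]

An easy way to write the orthogonal complement is the following.

Lemma 43. Take a maximal set $\{\varphi_i\}_{i=1}^{|F|}$ of perfectly distinguishable pure states in $F$ and extend it to a maximal set $\{\varphi_i\}_{i=1}^{d_A}$ of perfectly distinguishable pure states in $\mathrm{St}_1(A)$; then for $|F| < d_A$ we have

\[ \omega_F^\perp = \frac{1}{d_A - |F|} \sum_{i=|F|+1}^{d_A} \varphi_i. \]

Proof. By definition, for $|F| < d_A$ we have $\omega_F^\perp = \frac{1}{d_A - |F|} (d_A \chi_A - |F| \omega_F)$. Substituting the expressions $\chi_A = \frac{1}{d_A} \sum_{i=1}^{d_A} \varphi_i$ and $\omega_F = \frac{1}{|F|} \sum_{i=1}^{|F|} \varphi_i$ we then obtain the thesis.

Note, however, that by definition the orthogonal complement $\omega_F^\perp$ depends only on the face $F$ and not on the choice of the maximal set in lemma 43.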
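Definition 9 and lemma 43 can be checked on a toy example in which states are mixtures of a fixed maximal set of perfectly distinguishable pure states, represented simply by their vectors of weights. This is a simplifying assumption (it ignores all other pure states of the system), and the function names are hypothetical.

```python
from fractions import Fraction

def uniform_mixture(indices, d):
    """Weights of the equal mixture of the pure states labelled by `indices`."""
    weight = Fraction(1, len(indices))
    return [weight if i in indices else Fraction(0) for i in range(d)]

def orthogonal_complement(omega_f, size_f, d):
    """Solve Eq. (29) for the orthogonal complement of omega_F."""
    if size_f == d:
        return [Fraction(0)] * d
    chi = [Fraction(1, d)] * d
    return [(d * c - size_f * w) / (d - size_f) for c, w in zip(chi, omega_f)]

d = 4
face = {0, 1}                      # face spanned by the first two pure states
omega_f = uniform_mixture(face, d)
omega_perp = orthogonal_complement(omega_f, len(face), d)
```

Solving Eq. (29) gives the equal mixture of the remaining pure states, in agreement with lemma 43: here `omega_perp` has weight 1/2 on each of the last two states and zero on the first two.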

An obvious consequence of lemma 43 is the following.

Corollary 32. The states $\omega_F$ and $\omega_F^\perp$ are perfectly distinguishable.

Proof. Take a maximal set $\{\varphi_i\}_{i=1}^{|F|}$ of perfectly distinguishable pure states in $F$, extend it to a maximal set $\{\varphi_i\}_{i=1}^{d_A}$, and take the observation test such that $(a_i|\varphi_j) = \delta_{ij}$. Then the binary test $\{a_F, e - a_F\}$, defined by $a_F := \sum_{i=1}^{|F|} a_i$, distinguishes perfectly between $\omega_F$ and $\omega_F^\perp$.

We say that a state $\tau \in \mathrm{St}_1(A)$ is perfectly distinguishable from the face $F$ if $\tau$ is perfectly distinguishable from every state $\sigma$ in the face $F$. With this definition we have the following.

Lemma 44. The following are equivalent:

  • (1) $\tau$ is perfectly distinguishable from the face $F$,

  • (2) $\tau$ is perfectly distinguishable from $\omega_F$,

  • (3) $\tau$ belongs to the face identified by $\omega_F^\perp$, that is, $\tau \in F_{\omega_F^\perp}$.

Proof. ($1 \Leftrightarrow 2$) $\tau$ is perfectly distinguishable from $\omega_F$ if and only if there exists a binary test $\{a, e - a\}$ such that $(a|\tau) = 1$ and $(a|\omega_F) = 0$. By lemma 19 this is equivalent to the condition $(a|\tau) = 1$ and $a =_{\omega_F} 0$, that is, $\tau$ is distinguishable from any state $\sigma$ in the face identified by $\omega_F$, which by definition is $F$. ($2 \Rightarrow 3$) Let $\{\varphi_i\}_{i=1}^{|F|}$ be a maximal set of perfectly distinguishable states in $F$, $\omega_F = \frac{1}{|F|} \sum_{i=1}^{|F|} \varphi_i$, and let $\{\varphi_i\}_{i=|F|+1}^{k}$ be the maximal set of perfectly distinguishable pure states in the spectral decomposition $\tau = \sum_{i=|F|+1}^{k} p_i \varphi_i$, with $p_i > 0$ for every $i = |F|+1, \dots, k$. Since $\tau$ is perfectly distinguishable from $\omega_F$, by lemma 23 we have that the states $\{\varphi_i\}_{i=1}^{k}$ are all perfectly distinguishable. Let us extend this set to a maximal set $\{\varphi_i\}_{i=1}^{d_A}$. By lemma 43 we have $\omega_F^\perp = \frac{1}{d_A - |F|} \sum_{i=|F|+1}^{d_A} \varphi_i$. Hence, all the states $\{\varphi_i\}_{i=|F|+1}^{d_A}$ are in the face $F_{\omega_F^\perp}$. Since $\tau$ is a mixture of these states, it also belongs to the face $F_{\omega_F^\perp}$. ($3 \Rightarrow 2$) Since $\omega_F$ and $\omega_F^\perp$ are perfectly distinguishable, if $\tau$ belongs to the face identified by $\omega_F^\perp$, then by lemma 22 $\tau$ is perfectly distinguishable from $\omega_F$.

Corollary 33. If $\rho$ is perfectly distinguishable from $\sigma$ and from $\tau$, then $\rho$ is perfectly distinguishable from any convex mixture of $\sigma$ and $\tau$.

Proof. Let $F$ be the face identified by $\rho$. Then by lemma 44 we have $\sigma, \tau \in F_{\omega_F^\perp}$. Since $F_{\omega_F^\perp}$ is a convex set, any mixture of $\sigma$ and $\tau$ belongs to it. By lemma 44, this means that any mixture of $\sigma$ and $\tau$ is perfectly distinguishable from $\rho$.

We are now ready to give the definition of orthogonal face.

Definition 10 (Orthogonal face). The orthogonal face $F^\perp$ is the set of all states that are perfectly distinguishable from the face $F$.

By lemma 44 it is clear that $F^\perp$ is the face identified by $\omega_F^\perp$, that is, $F^\perp = F_{\omega_F^\perp}$.

In the following we list a few elementary facts about orthogonal faces.

Lemma 45. The following properties hold:

  • (1) $|F^\perp| = d_A - |F|$,

  • (2) $\chi_A = \frac{|F|}{d_A}\, \omega_F + \frac{|F^\perp|}{d_A}\, \omega_{F^\perp}$,

  • (3) $\omega_{F^\perp} = \omega_F^\perp$,

  • (4) $\omega_{F^\perp}^\perp = \omega_F$,

  • (5) $(F^\perp)^\perp = F$.

Proof. Item 1. If $|F| = d_A$ the thesis is obvious. If $|F| < d_A$, take a maximal set $\{\varphi_i\}_{i=1}^{|F|}$ (or $\{\varphi_j\}_{j=|F|+1}^{|F|+|F^\perp|}$) of perfectly distinguishable pure states in $F$ (or $F^\perp$). Hence we have

\[ \omega_F = \frac{1}{|F|} \sum_{i=1}^{|F|} \varphi_i \quad \text{and} \quad \omega_{F^\perp} = \frac{1}{|F^\perp|} \sum_{j=|F|+1}^{|F|+|F^\perp|} \varphi_j. \]

By corollary 32 the states $\omega_F$ and $\omega_F^\perp$ are perfectly distinguishable. Hence the states $\{\varphi_i\}_{i=1}^{|F|+|F^\perp|}$ are perfectly distinguishable jointly (lemma 23). Now we must have $|F| + |F^\perp| = d_A$, otherwise there would be a pure state $\psi$ that is perfectly distinguishable from the states $\{\varphi_i\}_{i=1}^{|F|+|F^\perp|}$. This implies that $\psi$ belongs to $F^\perp$ and that the states $\{\psi\} \cup \{\varphi_j\}_{j=|F|+1}^{|F|+|F^\perp|}$ are perfectly distinguishable in $F^\perp$, in contradiction with the hypothesis that the set $\{\varphi_j\}_{j=|F|+1}^{|F|+|F^\perp|}$ is maximal in $F^\perp$. Item 2. Immediate from item 1 and definition 9. Items 3 and 4. Both items follow by comparison of item 2 with Eq. (29). Item 5. By condition 3 of lemma 44, $(F^\perp)^\perp$ is the face identified by the state $\omega_{F^\perp}^\perp$, which, by item 4, is $\omega_F$. Since the face identified by $\omega_F$ is $F$, we have $(F^\perp)^\perp = F$.

We now show that there is a canonical way to associate an effect aF to a face F.

Definition 11 (Effect associated to a face). We say that $a_F \in \mathrm{Eff}(A)$ is the effect associated to the face $F \subseteq \mathrm{St}_1(A)$ if and only if $a_F =_{\omega_F} e$ and $a_F =_{\omega_F^\perp} 0$.

In other words, the definition imposes that $(a_F|\rho) = 1$ for every $\rho \in F$ and $(a_F|\sigma) = 0$ for every $\sigma \in F^\perp$.

Lemma 46. A state $\rho \in \mathrm{St}_1(A)$ belongs to the face $F$ if and only if $(a_F|\rho) = 1$.

Proof. By definition, if $\rho$ belongs to $F$, then $(a_F|\rho) = 1$. Conversely, if $(a_F|\rho) = 1$, then $\rho$ is perfectly distinguishable from $\omega_F^\perp$, because $(a_F|\omega_F^\perp) = 0$. Now, we know that $\omega_F^\perp$ is equal to $\omega_{F^\perp}$ (item 3 of lemma 45). By item 2 of lemma 44 the fact that $\rho$ is perfectly distinguishable from $\omega_{F^\perp}$ implies that $\rho$ belongs to $(F^\perp)^\perp$, which is just $F$ (item 5 of lemma 45).

We now show that the effect aF associated to the face F exists and is unique. A preliminary result needed to this purpose is the following.

Lemma 47. The effect $a_F$ must have the form $a_F = \sum_{i=1}^{|F|} a_i$, where $a_i$ is the atomic effect such that $(a_i|\varphi_i) = 1$ and $\{\varphi_i\}_{i=1}^{|F|}$ is a maximal set of perfectly distinguishable pure states in $F$.

Proof. By corollary 23 we have that $a_F$ can be written as $(a_F| = \sum_i d_i (a_i|$, where $\{a_i\}_{i=1}^{d_A}$ is a perfectly distinguishing test. Moreover, since $a_F$ is an effect, we must have $d_i \ge 0$ for all $i = 1, \dots, d_A$. Now, by definition we have $(a_F|\omega_F^\perp) = 0$, which implies $d_i (a_i|\omega_F^\perp) = 0$ for every $i = 1, \dots, d_A$, that is, $(a_i|\omega_F^\perp) = 0$ whenever $d_i \ne 0$. Let us focus on the values of $i$ for which $d_i \ne 0$. Let $\varphi_i$ be the pure state such that $(a_i|\varphi_i) = 1$. The condition $(a_i|\omega_F^\perp) = 0$ implies that $\varphi_i$ is perfectly distinguishable from $\omega_F^\perp$. Therefore, $\varphi_i$ belongs to $(F^\perp)^\perp$, which is $F$. Since by definition we must have $(a_F|\varphi_i) = 1$, this also implies that $d_i = 1$. In summary, we proved that $a_F = \sum'_i a_i$, where the prime means that the sum is restricted to those values of $i$ such that $\varphi_i \in F$. The condition $a_F =_{\omega_F} e$ also implies that the number of terms in the sum must be exactly $|F|$. The thesis is then proved by suitably relabelling the effects $\{a_i\}_{i=1}^{d_A}$, in such a way that $\varphi_i$ belongs to $F$ for every $i = 1, \dots, |F|$.

Lemma 48. The effect $a_F$ associated to the face $F$ is unique.

Proof. Suppose that $a_F = \sum_{i=1}^{|F|} a_i$ and $a'_F = \sum_{i=1}^{|F|} a'_i$ are two effects associated to the face $F$, both written as in lemma 47. Let $\{\varphi_i\}_{i=1}^{|F|}$ (or $\{\varphi'_i\}_{i=1}^{|F|}$) be the maximal set of perfectly distinguishable pure states in $F$ such that $(a_i|\varphi_i) = 1$ for every $i = 1, \dots, |F|$ [or $(a'_i|\varphi'_i) = 1$ for every $i = 1, \dots, |F|$], and let $\{\psi_j\}_{j=1}^{|F^\perp|}$ be a maximal set of perfectly distinguishable pure states in $F^\perp$. Since $\omega_F$ and $\omega_F^\perp$ are perfectly distinguishable, the states $\{\varphi_i\}_{i=1}^{|F|} \cup \{\psi_j\}_{j=1}^{|F^\perp|}$ (or $\{\varphi'_i\}_{i=1}^{|F|} \cup \{\psi_j\}_{j=1}^{|F^\perp|}$) are perfectly distinguishable (lemma 23). Moreover, the set is maximal since $|F| + |F^\perp| = d_A$. Let $b_j$ be the atomic effect such that $(b_j|\psi_j) = 1$. Then, the test that distinguishes the states $\{\varphi_i\}_{i=1}^{|F|} \cup \{\psi_j\}_{j=1}^{|F^\perp|}$ (or $\{\varphi'_i\}_{i=1}^{|F|} \cup \{\psi_j\}_{j=1}^{|F^\perp|}$) is given by $\{a_i\}_{i=1}^{|F|} \cup \{b_j\}_{j=1}^{|F^\perp|}$ (or $\{a'_i\}_{i=1}^{|F|} \cup \{b_j\}_{j=1}^{|F^\perp|}$) and its normalization reads

\[ e = \sum_{i=1}^{|F|} a_i + \sum_{j=1}^{|F^\perp|} b_j = a_F + \sum_{j=1}^{|F^\perp|} b_j, \qquad e = \sum_{i=1}^{|F|} a'_i + \sum_{j=1}^{|F^\perp|} b_j = a'_F + \sum_{j=1}^{|F^\perp|} b_j. \]

By comparison we obtain $a_F = a'_F$.

B. Projections

We are now in position to define the projection on a face.

Definition 12 (Projection). Let $F$ be a face of $\mathrm{St}_1(A)$. A projection on the face $F$ is an atomic transformation $\Pi_F$ such that

  • (1) $\Pi_F =_{\omega_F} \mathcal I_A$,

  • (2) $\Pi_F =_{\omega_F^\perp} 0$.

When $F$ is the face identified by a pure state $\varphi \in \mathrm{St}_1(A)$, we have $F = \{\varphi\}$ and call $\Pi_{\{\varphi\}}$ a projection on the pure state $\varphi$.

The first condition in definition 12 means that the projection $\Pi_F$ does not disturb the states in the face $F$. The second condition means that $\Pi_F$ annihilates all states in the orthogonal face $F^\perp$. As a notation, we will indicate with $\Pi_F^\perp$ the projection on the face $F^\perp$, that is, we will use the definition $\Pi_F^\perp := \Pi_{F^\perp}$.

An equivalent condition for ΠF to be a projection on the face F is the following.

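Lemma 49. Let $\{\varphi_i\}_{i=1}^{d_A}$ be a maximal set of perfectly distinguishable pure states for system $A$. The transformation $\Pi_F$ in $\mathrm{Transf}(A)$ is a projection on the face generated by the subset $\{\varphi_i\}_{i=1}^{|F|}$ if and only if

  • (1) $\Pi_F =_{\omega_F} \mathcal I_A$,

  • (2) $\Pi_F |\varphi_l) = 0$ for all $l > |F|$.

Proof. The condition is clearly necessary, since by definition 12 $\Pi_F |\varphi_l) = 0$ for $l > |F|$. On the other hand, if $\Pi_F |\varphi_l) = 0$ for $l > |F|$ then by definition of $\omega_F^\perp$ we have $\Pi_F |\omega_F^\perp) = 0$ and, therefore, $\Pi_F =_{\omega_F^\perp} 0$.

A result that will be useful later is the following.

Lemma 50. The transformation $\Pi_F \otimes \mathcal I_B$ is a projection on the face $\tilde F$ identified by the state $\omega_F \otimes \chi_B$.

Proof. $\Pi_F \otimes \mathcal I_B$ is atomic, being the product of two atomic transformations. We now show that $\Pi_F \otimes \mathcal I_B =_{\omega_F \otimes \chi_B} \mathcal I_A \otimes \mathcal I_B$. Indeed, by the local tomography axiom it is easy to see that every state $\sigma \in F_{\omega_F \otimes \chi_B}$ can be written as $|\sigma) = \sum_{i=1}^{r} \sum_{j=1}^{d_B} \sigma_{ij} |\alpha_i)|\beta_j)$, where $\{\alpha_i\}_{i=1}^{r}$ is a basis for $\mathrm{Span}(F)$ and $\{\beta_j\}_{j=1}^{d_B}$ is a basis for $\mathrm{St}_1(B)$. Since $\Pi_F =_{\omega_F} \mathcal I_A$, we have

\[ (\Pi_F \otimes \mathcal I_B)|\sigma) = \sum_{i=1}^{r} \sum_{j=1}^{d_B} \sigma_{ij}\, \Pi_F |\alpha_i)|\beta_j) = \sum_{i=1}^{r} \sum_{j=1}^{d_B} \sigma_{ij}\, |\alpha_i)|\beta_j) = |\sigma), \]

which implies $\Pi_F \otimes \mathcal I_B =_{\omega_F \otimes \chi_B} \mathcal I_A \otimes \mathcal I_B$. Finally, note that $\omega_{\tilde F} = \omega_F \otimes \chi_B$, while $\omega_{\tilde F}^\perp = \omega_F^\perp \otimes \chi_B$. Since we have $(\Pi_F \otimes \mathcal I_B)|\omega_{\tilde F}^\perp) = \Pi_F |\omega_F^\perp)|\chi_B) = 0$, we can conclude $\Pi_F \otimes \mathcal I_B =_{\omega_{\tilde F}^\perp} 0$. Hence $\Pi_F \otimes \mathcal I_B$ is a projection on $\tilde F$.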

In the following we will show that for every face F there exists a unique projection ΠF and we will prove several properties of projections. Let us start from an elementary observation.

Lemma 51. Let $\varphi$ be a pure state in the face $F \subseteq \mathrm{St}_1(A)$ and let $a \in \mathrm{Eff}(A)$ be the atomic effect such that $(a|\varphi) = 1$. If $\mathcal A \in \mathrm{Transf}(A)$ is an atomic transformation such that $\mathcal A =_{\omega_F} \mathcal I_A$, then $(a|\mathcal A = (a|$. Moreover, if $a_F$ is the effect associated to the face $F$, then we have $(a_F|\mathcal A = (a_F|$.

Proof. By lemma 16, the effect $(a|\mathcal A$ is atomic. Now, since $\mathcal A|\varphi) = |\varphi)$, we have $(a|\mathcal A|\varphi) = (a|\varphi) = 1$. However, by theorem 8 $(a|$ is the unique atomic effect such that $(a|\varphi) = 1$. Hence, $(a|\mathcal A = (a|$. Moreover, writing $a_F$ as $a_F = \sum_{i=1}^{|F|} a_i$ with $(a_i|\varphi_i) = 1$, $\varphi_i \in F$ (lemma 47), we obtain $(a_F|\mathcal A = \sum_{i=1}^{|F|} (a_i|\mathcal A = \sum_{i=1}^{|F|} (a_i| = (a_F|$.

When applied to the case of projections, the above lemma gives the following.

Corollary 34. Let $\varphi$ be a pure state in the face $F \subseteq \mathrm{St}_1(A)$ and let $a \in \mathrm{Eff}(A)$ be the atomic effect such that $(a|\varphi) = 1$. Then we have $(a|\Pi_F = (a|$. Moreover, if $a_F$ is the effect associated to the face $F$, then we have $(a_F| = (a_F|\Pi_F$.

The counterpart of corollary 34 is given as follows.

Lemma 52. Let $\psi^\perp$ be a pure state in the face $F^\perp$ and let $b^\perp$ be the atomic effect such that $(b^\perp|\psi^\perp) = 1$. Then, we have $(b^\perp|\Pi_F = 0$. Moreover, if $a_{F^\perp}$ is the effect associated to the face $F^\perp$, then we have $(a_{F^\perp}|\Pi_F = 0$.

Proof. By lemma 16, the effect $(b^\perp|\Pi_F$ is atomic. Hence $(b^\perp|\Pi_F$ must be proportional to an atomic effect $b$ with $\|b\| = 1$, for some proportionality constant $\lambda \in [0, 1]$, that is, $(b^\perp|\Pi_F = \lambda (b|$. We want to prove that $\lambda$ is zero. By contradiction, suppose that $\lambda \ne 0$. Let $\psi$ be the pure state such that $(b|\psi) = 1$. Now, since $\Pi_F|\omega_F^\perp) = 0$, we have $0 = (b^\perp|\Pi_F|\omega_F^\perp) = \lambda (b|\omega_F^\perp)$, which implies $(b|\omega_F^\perp) = 0$. Hence, $\psi$ is perfectly distinguishable from $\omega_F^\perp$, which in turn implies that $\psi$ belongs to $(F^\perp)^\perp = F$. We then have $\lambda = (b^\perp|\Pi_F|\psi) = (b^\perp|\psi) = 0$ (the last equality follows from the fact that $\psi$ and $\psi^\perp$ belong to $F$ and $F^\perp$, respectively, and hence are perfectly distinguishable). This is in contradiction with the assumption $\lambda \ne 0$, thus concluding the proof that $(b^\perp|\Pi_F = 0$. Moreover, writing $a_{F^\perp}$ as $a_{F^\perp} = \sum_{i=1}^{|F^\perp|} b_i^\perp$ with $(b_i^\perp|\psi_i^\perp) = 1$, $\psi_i^\perp \in F^\perp$, we obtain $(a_{F^\perp}|\Pi_F = \sum_{i=1}^{|F^\perp|} (b_i^\perp|\Pi_F = 0$.

Combining corollary 34 and lemma 52 we obtain an important property of projections, expressed by the following.

Corollary 35. If $\Pi_F$ is a projection on the face $F$, then one has $(e_A|\Pi_F = (a_F|$.

Proof. The thesis follows from corollary 34 and lemma 52 and from the fact that $a_F + a_{F^\perp} = e$.
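In ordinary quantum theory the projection on a face is the map $\rho \mapsto P \rho P$ with $P$ an orthogonal projector, and the properties above can be verified directly. The sketch below assumes this quantum representation for a three-level system, with the face supported on the first two basis vectors; it is an illustration rather than part of the derivation, and all names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def project(p, rho):
    """Quantum analogue of Pi_F: rho -> P rho P."""
    return matmul(matmul(p, rho), p)

P_F = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]           # projector defining the face F

half, third = Fraction(1, 2), Fraction(1, 3)
rho_in_face = [[half, half, 0], [half, half, 0], [0, 0, 0]]   # pure state in F
omega_perp = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]                # orthogonal complement
rho_generic = [[third] * 3 for _ in range(3)]                 # pure state outside F

undisturbed = project(P_F, rho_in_face)       # condition (1) of definition 12
annihilated = project(P_F, omega_perp)        # condition (2) of definition 12

# Corollary 35: the norm after projecting equals the probability of a_F.
norm_after = trace(project(P_F, rho_generic))
prob_a_f = trace(matmul(P_F, rho_generic))
```

States in the face are left untouched, the orthogonal complement is mapped to zero, and `norm_after` equals `prob_a_f` (both are 2/3 here), which is the quantum reading of $(e_A|\Pi_F = (a_F|$.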

In the following we will see that for every face F there exists a unique projection. To prove that, let us start from the existence.

Lemma 53 (Existence of projections). For every face F of St1(A) there exists a projection ΠF.

Proof. By lemma 18, there exists a system B and an atomic transformation ATransf(A,B) with (e|BA=(aF|. Then, if ΨωFSt(AC) is a purification of ωF, we can define the state |Σ)BC:=(AIC)|ΨωF)AC. By lemma 16 Σ is a pure state. Moreover, the pure states Σ and ΨωF have the same marginal on system C: indeed, we have (eB||Σ)=[(eB|A]|ΨωF)=(aF||ΨωF) and, by definition, aF=ωFeA, which by theorem 1 implies (aF||ΨωF)=(eA||ΨωF). If ϕ0 and ψ0 are two arbitrary pure states of A and B, respectively, the uniqueness of purification stated by postulate 1 implies that there exists a reversible channel UGAB such that

$$(\mathcal{U}\otimes\mathcal{I}_C)\bigl[|\Sigma)_{BC}\otimes|\varphi_0)_A\bigr]=|\psi_0)_B\otimes|\Psi_{\omega_F})_{AC}.$$
(30)

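Now, take the atomic effect $b\in\mathrm{Eff}(B)$ such that $(b|\psi_0)=1$, and define the transformation $\Pi_F\in\mathrm{Transf}(A)$ as

$$\Pi_F:=(b|_B\,\mathcal{U}\,\bigl(\mathcal{A}\otimes|\varphi_0)_A\bigr).$$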

Applying b on both sides of Eq. (30) we then obtain

$$(\Pi_F\otimes\mathcal{I}_C)|\Psi_{\omega_F})=|\Psi_{\omega_F})$$

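and, therefore, $\Pi_F=_{\omega_F}\mathcal{I}_A$. Moreover, the transformation $\Pi_F$ is atomic, being the composition of atomic transformations (lemma 16). Finally, we have $\Pi_F=_{\omega_F^\perp}0$: indeed, by construction of $\Pi_F$ we have

$$(e_A|\Pi_F|\rho)=[(e_A|\otimes(b|]\,\mathcal{U}(\mathcal{A}\otimes\mathcal{I}_A)\,|\rho)|\varphi_0)\le[(e_A|\otimes(e_B|]\,\mathcal{U}(\mathcal{A}\otimes\mathcal{I}_A)\,|\rho)|\varphi_0)=(e_B|\mathcal{A}|\rho)=(a_F|\rho).$$

This implies $(e_A|\Pi_F|\omega_F^\perp)\le(a_F|\omega_F^\perp)=0$ and, therefore, $\Pi_F=_{\omega_F^\perp}0$. In conclusion, $\Pi_F$ is the desired projection.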

To prove the uniqueness of the projection ΠF we need two auxiliary lemmas, given in the following.

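Lemma 54. Let $\Phi\in\mathrm{St}_1(A\tilde A)$ be a purification of the invariant state $\chi_A$, and let $\Pi_F\in\mathrm{Transf}(A)$ be a projection on the face $F\subseteq\mathrm{St}_1(A)$. Then, the pure state $\Phi_F\in\mathrm{St}_1(A\tilde A)$ defined by

$$|\Phi_F):=\frac{d_A}{|F|}(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)$$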
(31)

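is a purification of $\omega_F$.

Proof. The state $\Phi_F$ is pure by lemma 16. Let us choose a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$ such that $\{\varphi_i\}_{i=1}^{|F|}$ is maximal in $F$. Now, we have

$$(e|_{\tilde A}|\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}\Pi_F\,(e|_{\tilde A}|\Phi)_{A\tilde A}=\frac{d_A}{|F|}\Pi_F|\chi_A)$$

having used the relation $(e|_{\tilde A}|\Phi)_{A\tilde A}=|\chi_A)$ (corollary 16). We then obtain

$$(e|_{\tilde A}|\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}\Pi_F|\chi_A)=\frac{1}{|F|}\sum_{i=1}^{d_A}\Pi_F|\varphi_i)=\frac{1}{|F|}\sum_{i=1}^{|F|}|\varphi_i)=|\omega_F)$$

having used that $\chi_A=\sum_{i=1}^{d_A}\varphi_i/d_A$ (theorem 9), and the definition of $\Pi_F$.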

Lemma 55. Let $\Pi_F\in\mathrm{Transf}(A)$ be a projection. A transformation $\mathcal{C}\in\mathrm{Transf}(A)$ satisfies $\mathcal{C}=_{\omega_F}\mathcal{I}_A$ if and only if

$$\mathcal{C}\Pi_F=\Pi_F.$$
(32)

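Proof. Let $\Phi_F$ be the purification of $\omega_F$ defined in lemma 54. Since $\mathcal{C}=_{\omega_F}\mathcal{I}_A$, we have $(\mathcal{C}\otimes\mathcal{I}_{\tilde A})|\Phi_F)=|\Phi_F)$. In other words, we have $(\mathcal{C}\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)=(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)$. Since $\Phi$ is dynamically faithful, this implies that $\mathcal{C}\Pi_F=\Pi_F$. Conversely, Eq. (32) implies that for every $\sigma\in F_{\omega_F}$ one has $\mathcal{C}|\sigma)=\mathcal{C}\Pi_F|\sigma)=\Pi_F|\sigma)=|\sigma)$, namely $\mathcal{C}=_{\omega_F}\mathcal{I}_A$.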

Theorem 14 (Uniqueness of projections). The projection ΠF satisfying definition 12 is unique.

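Proof. Let $\Pi_F$ and $\Pi'_F$ be two projections on the same face $F$, and define the pure states $\Phi_F$ and $\Phi'_F$ as in lemma 54. Now, $\Phi_F$ and $\Phi'_F$ are both purifications of the same state $\tilde\omega_F$ of system $\tilde A$: indeed, one has

$$(e|_A|\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}[(e_A|\Pi_F]\,|\Phi)_{A\tilde A}=\frac{d_A}{|F|}(a_F|_A|\Phi)_{A\tilde A}=\frac{d_A}{|F|}[(e_A|\Pi'_F]\,|\Phi)_{A\tilde A}=(e|_A|\Phi'_F)_{A\tilde A}$$

having used the relation $(e_A|\Pi_F=(a_F|=(e_A|\Pi'_F$, which comes from corollary 34 and from the uniqueness of the effect $a_F$ (lemma 48). By the uniqueness of purification, we have $|\Phi'_F)=(\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi_F)$ for some reversible transformation $\mathcal{U}\in\mathbf{G}_A$. This implies $(\Pi'_F\otimes\mathcal{I}_{\tilde A})|\Phi)=(\mathcal{U}\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)$, and, since $\Phi$ is dynamically faithful, $\Pi'_F=\mathcal{U}\Pi_F$. Since by definition 12 we have $\Pi_F=_{\omega_F}\mathcal{I}_A$ and $\Pi'_F=_{\omega_F}\mathcal{I}_A$, we can conclude that $\mathcal{U}=_{\omega_F}\mathcal{I}_A$. Finally, using lemma 55 with $\mathcal{C}=\mathcal{U}$ we obtain $\Pi'_F=\mathcal{U}\Pi_F=\Pi_F$.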

We now show a few simple properties of projections. In the following, given a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$ and any subset $V\subseteq\{1,\dots,d_A\}$ we define (with a slight abuse of notation) $\omega_V:=\sum_{i\in V}\varphi_i/|V|$, and $\Pi_V$ as the projection on the face $F_V:=F_{\omega_V}$. We will refer to $F_V$ as the face generated by $V$.

Lemma 56. For two arbitrary subsets $V,W\subseteq\{1,\dots,d_A\}$ one has

$$\Pi_V\Pi_W=\Pi_{V\cap W}.$$

In particular, if $V\cap W=\emptyset$ one has $\Pi_V\Pi_W=0$.

Proof. First of all, $\Pi_V\Pi_W$ is atomic, being the product of two atomic transformations. Moreover, since the face $F_{V\cap W}$ is contained in the faces $F_V$ and $F_W$, we have $\Pi_V\Pi_W|\rho)=\Pi_V|\rho)=|\rho)$ for every $\rho\in F_{V\cap W}$. In other words, $\Pi_V\Pi_W=_{\omega_{V\cap W}}\mathcal{I}_A$. Moreover, if $l\notin V\cap W$ we have $\Pi_V\Pi_W|\varphi_l)=0$. By lemma 49 and by the uniqueness of projections (theorem 14) we then obtain that $\Pi_V\Pi_W$ is the projection on the face generated by $V\cap W$.

Corollary 36 (Idempotence). Every projection $\Pi_F$ satisfies the identity $\Pi_F^2=\Pi_F$.

Proof. Consider a maximal set of perfectly distinguishable pure states $\{\varphi_i\}_{i=1}^{d_A}$ such that $\{\varphi_i\}_{i\in V}$ is maximal in $F$. In this way $F$ is the face generated by $V$, and, therefore, $\Pi_F=\Pi_V$. The thesis follows by taking $V=W$ in lemma 56.
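Lemma 56 and corollary 36 can be sanity-checked in the familiar Hilbert-space model, which is only derived later in the paper. The sketch below assumes that the projection $\Pi_V$ acts on a matrix $\rho$ as $\rho\mapsto P_V\rho P_V$, with $P_V$ the diagonal projector on the basis vectors labeled by $V$; the helper names (`projector`, `apply_projection`) are ours and purely illustrative.

```python
def projector(V, d):
    """Diagonal 0/1 matrix P_V that keeps the basis indices in V (0-based)."""
    return [[1 if (r == s and r in V) else 0 for s in range(d)] for r in range(d)]


def matmul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    d = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(d)) for s in range(d)] for r in range(d)]


def apply_projection(V, rho):
    """Assumed quantum model of Pi_V: rho -> P_V rho P_V."""
    P = projector(V, len(rho))
    return matmul(matmul(P, rho), P)


d = 4
# Hermitian test matrix with small integer entries, so all checks are exact.
rho = [[complex(r + s + 1, r - s) for s in range(d)] for r in range(d)]
zero = [[0] * d for _ in range(d)]
V, W = {0, 1, 2}, {1, 2, 3}

product_VW = apply_projection(V, apply_projection(W, rho))
# Lemma 56: Pi_V Pi_W = Pi_{V intersect W}
assert product_VW == apply_projection(V & W, rho)
# Corollary 36: idempotence
assert apply_projection(V, apply_projection(V, rho)) == apply_projection(V, rho)
# Disjoint subsets give the zero transformation
assert apply_projection({0}, apply_projection({3}, rho)) == zero
```

The check only illustrates the statements in one concrete model; it does not replace the operational proofs above.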

Corollary 37. For every state $\rho\in\mathrm{St}_1(A)$ such that $\rho\notin F^\perp$, the normalized state $\rho'$ defined by

$$|\rho')=\frac{\Pi_F|\rho)}{(e|\Pi_F|\rho)}$$
(33)

belongs to the face F.

Proof. By corollary 35, we have $(e|\Pi_F=(a_F|$. Since $\rho\notin F^\perp$, we must have $(e|\Pi_F|\rho)=(a_F|\rho)>0$, and, therefore, the state $\rho'$ in Eq. (33) is well defined. Moreover, using the definition of $\rho'$ we obtain

$$(a_F|\rho')=\frac{(a_F|\Pi_F|\rho)}{(e|\Pi_F|\rho)}=1$$

having used corollaries 34 and 35 for the last equality. Finally, lemma 46 implies that $\rho'$ belongs to the face $F$.

Corollary 38. Let $\Pi_{\{\varphi\}}$ be the projection on the pure state $\varphi\in\mathrm{St}_1(A)$ and $a$ be the atomic effect such that $(a|\varphi)=1$. Then for every state $\rho\in\mathrm{St}_1(A)$ one has $\Pi_{\{\varphi\}}|\rho)=p|\varphi)$, where $p=(a|\rho)$.

Proof. Recall that, by corollary 35, we have $(a|=(e|\Pi_{\{\varphi\}}$. If $(a|\rho)=0$ then clearly $\Pi_{\{\varphi\}}|\rho)=0$. Otherwise, the proof is a straightforward application of corollary 37.

We conclude the present subsection with a result that will be useful in the next subsection.

Lemma 57. An atomic transformation $\mathcal{A}\in\mathrm{Transf}(A)$ satisfies $\mathcal{A}=_{\omega_F}\mathcal{I}_A$ if and only if

$$\Pi_F\mathcal{A}=\Pi_F.$$
(34)

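Proof. Suppose that $\mathcal{A}=_{\omega_F}\mathcal{I}_A$. Let $\Phi\in\mathrm{St}_1(A\tilde A)$ be a purification of the invariant state $\chi_A$ and define the two pure states

$$|\Phi_F):=\frac{d_A}{|F|}(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi),\qquad|\Phi'_F):=\frac{d_A}{|F|}(\Pi_F\mathcal{A}\otimes\mathcal{I}_{\tilde A})|\Phi).$$

Then we have

$$(e|_A|\Phi'_F)=\frac{d_A}{|F|}[(a_F|\mathcal{A}]\,|\Phi)=\frac{d_A}{|F|}(a_F|_A|\Phi)=(e|_A|\Phi_F)$$

having used the relation $(e_A|\Pi_F=(a_F|$ (corollary 35) and the condition $(a_F|\mathcal{A}=(a_F|$ (lemma 51). We have thus proved that $\Phi_F$ and $\Phi'_F$ have the same marginal on system $\tilde A$. By the uniqueness of purification, there exists a reversible transformation $\mathcal{V}\in\mathbf{G}_A$ such that $|\Phi'_F)=(\mathcal{V}\otimes\mathcal{I}_{\tilde A})|\Phi_F)$. Since $\Phi$ is dynamically faithful, this implies $\Pi_F\mathcal{A}=\mathcal{V}\Pi_F$.

Now, for every $\rho$ in $F$ one has $\mathcal{V}|\rho)=\mathcal{V}\Pi_F|\rho)=\Pi_F\mathcal{A}|\rho)=|\rho)$, namely $\mathcal{V}=_{\omega_F}\mathcal{I}_A$. Applying lemma 55 with $\mathcal{C}=\mathcal{V}\Pi_F$ and using the idempotence of projections we then obtain

$$\Pi_F\mathcal{A}=\mathcal{V}\Pi_F=(\mathcal{V}\Pi_F)\Pi_F=\Pi_F\Pi_F=\Pi_F.$$

Conversely, suppose that Eq. (34) is satisfied. Let $\varphi\in F$ be a pure state in $F$ and $a$ be the atomic effect such that $(a|\varphi)=1$. Then, we have

$$(a|\mathcal{A}|\varphi)=(a|\Pi_F\mathcal{A}|\varphi)=(a|\Pi_F|\varphi)=(a|\varphi)=1$$

having used the relation $(a|\Pi_F=(a|$ (corollary 34). Then, by theorem 7, $\mathcal{A}|\varphi)=|\varphi)$. Since $\varphi\in F$ is arbitrary, this implies $\mathcal{A}=_{\omega_F}\mathcal{I}_A$.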

C. Projection of a pure state on two orthogonal faces

In Sec. X we proved a number of results concerning two-dimensional systems. Some properties of two-dimensional systems will be extended to the case of generic systems using the following lemma.

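Lemma 58. Consider a pure state $\varphi\in\mathrm{St}_1(A)$ and two complementary projections $\Pi_F$ and $\Pi_{F^\perp}$. Then $\varphi$ belongs to the face identified by the state $|\theta):=(\Pi_F+\Pi_{F^\perp})|\varphi)$.

Proof. If $\Pi_F|\varphi)=0$ (or $\Pi_{F^\perp}|\varphi)=0$), then there is nothing to prove: this means that $\Pi_{F^\perp}|\varphi)=|\varphi)$ (or $\Pi_F|\varphi)=|\varphi)$) and the thesis is trivially true. Suppose now that $\Pi_F|\varphi)\neq0$ and $\Pi_{F^\perp}|\varphi)\neq0$. Using the notation $\Pi_1:=\Pi_F$, $\Pi_2:=\Pi_{F^\perp}$, we can define the two pure states $|\varphi_i):=\Pi_i|\varphi)/(e|\Pi_i|\varphi)$, $i=1,2$, and the probabilities $p_i=(e|\Pi_i|\varphi)$. In this way we have $\Pi_i|\varphi)=p_i|\varphi_i)$ for $i=1,2$ and $\theta=p_1\varphi_1+p_2\varphi_2$. Taking the atomic effect $(a_i|$ such that $(a_i|\varphi_i)=1$ we have $a_{F_\theta}=a_1+a_2$, where $a_{F_\theta}$ is the effect associated to the face $F_\theta$. Recalling that $(a_i|\Pi_i=(a_i|$ for $i=1,2$ (corollary 34), we then conclude the following:

$$(a_{F_\theta}|\varphi)=[(a_1|+(a_2|]\,|\varphi)=(a_1|\Pi_1|\varphi)+(a_2|\Pi_2|\varphi)=\sum_{i=1,2}p_i(a_i|\varphi_i)=1.$$

Finally, lemma 46 yields $\varphi\in F_\theta$.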

A consequence of lemma 58 is the following.

Lemma 59. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state, $a\in\mathrm{Eff}(A)$ be the unique atomic effect such that $(a|\varphi)=1$, and $F$ be a face in $\mathrm{St}_1(A)$. If $\rho$ is perfectly distinguishable from $\Pi_F|\varphi)$ and from $\Pi_{F^\perp}|\varphi)$, then $\rho$ is perfectly distinguishable from $|\varphi)$. In particular, one has $(a|\rho)=0$.

Proof. Since $\rho$ is perfectly distinguishable from $\Pi_F|\varphi)$ and $\Pi_{F^\perp}|\varphi)$, it is also perfectly distinguishable from any convex combination of them (corollary 33). Equivalently, $\rho$ is perfectly distinguishable from the face $F_\theta$ identified by $|\theta):=\Pi_F|\varphi)+\Pi_{F^\perp}|\varphi)$. In particular, it must be perfectly distinguishable from $\varphi$, which belongs to $F_\theta$ by virtue of lemma 58. If $a$ is the atomic effect such that $(a|\varphi)=1$, then by lemma 36 we have $(a|\rho)=0$.

The following technical result will be useful later.

Lemma 60. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state such that $\Pi_F|\varphi)\neq0$ and $\Pi_{F^\perp}|\varphi)\neq0$. Define the pure states $|\varphi_1):=\Pi_F|\varphi)/(e|\Pi_F|\varphi)$ and $|\varphi_2):=\Pi_{F^\perp}|\varphi)/(e|\Pi_{F^\perp}|\varphi)$ and the mixed state $|\theta):=(\Pi_F+\Pi_{F^\perp})|\varphi)$. Then, we have

$$\Pi_F\Pi_{F_\theta}=\Pi_{\{\varphi_1\}},\qquad\Pi_{F^\perp}\Pi_{F_\theta}=\Pi_{\{\varphi_2\}}.$$

Proof. Let $\{\psi_i\}_{i=1}^{|F|}$ be a maximal set of perfectly distinguishable pure states in $F$, chosen in such a way that $\psi_1=\varphi_1$, and let $\{\psi_i\}_{i=|F|+1}^{d_A}$ be a maximal set of perfectly distinguishable pure states in $F^\perp$, chosen in such a way that $\psi_{|F|+1}=\varphi_2$. Defining the sets $V:=\{1,\dots,|F|\}$, $W:=\{|F|+1,\dots,d_A\}$, and $U:=\{1,|F|+1\}$ we then have $\Pi_V=\Pi_F$, $\Pi_W=\Pi_{F^\perp}$, and $\Pi_U=\Pi_{F_\theta}$. Using lemma 56 we obtain

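$$\Pi_F\Pi_{F_\theta}=\Pi_V\Pi_U=\Pi_{V\cap U}=\Pi_{\{\psi_1\}}=\Pi_{\{\varphi_1\}}$$

and

$$\Pi_{F^\perp}\Pi_{F_\theta}=\Pi_W\Pi_U=\Pi_{W\cap U}=\Pi_{\{\psi_{|F|+1}\}}=\Pi_{\{\varphi_2\}}.$$

We conclude this subsection with an important observation about the group of reversible transformations that act as the identity on two orthogonal faces $F$ and $F^\perp$. If $F$ is a face of $\mathrm{St}_1(A)$, let us define $\mathbf{G}_{F,F^\perp}$ as the group of all reversible transformations $\mathcal{U}\in\mathbf{G}_A$ such that

$$\mathcal{U}=_{\omega_F}\mathcal{I}_A,\qquad\mathcal{U}=_{\omega_F^\perp}\mathcal{I}_A.$$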

Then we have the following.

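Theorem 15. For every face $F\subseteq\mathrm{St}_1(A)$ such that $F\neq\{0\}$ and $F\neq\mathrm{St}_1(A)$, the group $\mathbf{G}_{F,F^\perp}$ is topologically equivalent to a circle.

Proof. Let $\mathcal{U}$ be a transformation in $\mathbf{G}_{F,F^\perp}$, $\Phi\in\mathrm{St}(A\tilde A)$ be a purification of the invariant state $\chi_A$, and $|\Phi_{\mathcal U}):=(\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi)$ be the Choi state of $\mathcal{U}$. Define the orthogonal faces $\tilde F:=F_{\omega_F\otimes\chi_{\tilde A}}$ and $\tilde F^\perp=F_{\omega_F^\perp\otimes\chi_{\tilde A}}$, and the projections $\Pi_{\tilde F}:=\Pi_F\otimes\mathcal{I}_{\tilde A}$ and $\Pi_{\tilde F^\perp}:=\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A}$ (see lemma 50). Using lemma 57 we then obtain

$$\Pi_{\tilde F}|\Phi_{\mathcal U})=(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi_{\mathcal U})=(\Pi_F\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi)=(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)=\frac{|F|}{d_A}|\Phi_F)$$

and, similarly,

$$\Pi_{\tilde F^\perp}|\Phi_{\mathcal U})=(\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A})|\Phi_{\mathcal U})=(\Pi_{F^\perp}\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi)=(\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A})|\Phi)=\frac{|F^\perp|}{d_A}|\Phi_{F^\perp}).$$

This means that the projections of $\Phi_{\mathcal U}$ on the faces $\tilde F$ and $\tilde F^\perp$ are independent of $\mathcal{U}$. Also, it means that $\Phi_{\mathcal U}$ belongs to the face $F_\theta$ identified by the state $|\theta):=\frac{|F|}{d_A}|\Phi_F)+\frac{|F^\perp|}{d_A}|\Phi_{F^\perp})$ (lemma 58). Now, by the compression axiom, $F_\theta$ is isomorphic to the state space of a qubit, say with $\Phi_F$ and $\Phi_{F^\perp}$ indicating the north and south poles of the Bloch sphere, respectively, and we know that all the Choi states $\{\Phi_{\mathcal U}\}_{\mathcal{U}\in\mathbf{G}_{F,F^\perp}}$ are at the same latitude [precisely, the latitude is the angle $\zeta$ given by $\cos\zeta=(|F|-|F^\perp|)/d_A$]. This implies that the states $\{\Phi_{\mathcal U}\}_{\mathcal{U}\in\mathbf{G}_{F,F^\perp}}$ are a subset of a circle $C_\zeta$ in the Bloch sphere describing the face $F_\theta$. Precisely, the circle $C_\zeta$ is given by

$$C_\zeta:=\Bigl\{\Psi\in F_\theta\;\Big|\;\Pi_{\{\Phi_F\}}|\Psi)=\frac{|F|}{d_A}|\Phi_F),\ \Pi_{\{\Phi_{F^\perp}\}}|\Psi)=\frac{|F^\perp|}{d_A}|\Phi_{F^\perp})\Bigr\}.$$

We now prove that in fact they are the whole circle. Let $\Psi$ be a state in $C_\zeta$. Since $|\Psi)$ belongs to the face $F_\theta$, we obtain

$$(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Psi)=\Pi_{\tilde F}|\Psi)=\Pi_{\tilde F}\Pi_{F_\theta}|\Psi)=\Pi_{\{\Phi_F\}}|\Psi)=\frac{|F|}{d_A}|\Phi_F)$$

(the third equality comes from lemma 60 with the substitutions $F\to\tilde F$, $\varphi\to\Psi$, $\varphi_1\to\Phi_F$, and $\varphi_2\to\Phi_{F^\perp}$) and, similarly,

$$(\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A})|\Psi)=\Pi_{\tilde F^\perp}|\Psi)=\Pi_{\tilde F^\perp}\Pi_{F_\theta}|\Psi)=\Pi_{\{\Phi_{F^\perp}\}}|\Psi)=\frac{|F^\perp|}{d_A}|\Phi_{F^\perp}).$$

Therefore, we have

$$\begin{aligned}(e|_A|\Psi)&=[(a_F|+(a_{F^\perp}|]\,|\Psi)=[(e_A|\Pi_F\otimes\mathcal{I}_{\tilde A}]\,|\Psi)+[(e_A|\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A}]\,|\Psi)\\&=\frac{|F|}{d_A}(e|_A|\Phi_F)+\frac{|F^\perp|}{d_A}(e|_A|\Phi_{F^\perp})=[(e_A|\Pi_F\otimes\mathcal{I}_{\tilde A}]\,|\Phi)+[(e_A|\Pi_{F^\perp}\otimes\mathcal{I}_{\tilde A}]\,|\Phi)\\&=[(a_F|+(a_{F^\perp}|]\,|\Phi)=(e|_A|\Phi)=|\chi_{\tilde A}).\end{aligned}$$

Since $\Psi$ and $\Phi$ are both purifications of the invariant state $\chi_{\tilde A}$, by the uniqueness of purification there must be a reversible transformation $\mathcal{U}\in\mathbf{G}_A$ such that $|\Psi)=(\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi)$. Finally, it is easy to check that $\Pi_F\mathcal{U}=\Pi_F$ and $\Pi_{F^\perp}\mathcal{U}=\Pi_{F^\perp}$, which, by lemma 57, implies $\mathcal{U}=_{\omega_F}\mathcal{I}_A$ and $\mathcal{U}=_{\omega_F^\perp}\mathcal{I}_A$. This proves that the Choi states $\{\Phi_{\mathcal U}\}_{\mathcal{U}\in\mathbf{G}_{F,F^\perp}}$ are the whole circle $C_\zeta$. Since the Choi isomorphism is continuous in the operational norm (see theorem 14 of [22]), the group $\mathbf{G}_{F,F^\perp}$ is topologically equivalent to a circle.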
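For orientation, the circle of theorem 15 has a simple counterpart in the usual Hilbert-space model, which the paper derives only later. The sketch below assumes that the elements of $\mathbf{G}_{F,F^\perp}$ correspond to diagonal unitaries $U=P_F+e^{i\alpha}P_{F^\perp}$ acting by conjugation; the function names and the sample angle are our own illustrative choices.

```python
import cmath


def phase_diagonal(F, d, angle):
    """Diagonal of U = P_F + exp(i*angle) * P_Fperp, with F a set of 0-based indices."""
    return [1.0 + 0j if r in F else cmath.exp(1j * angle) for r in range(d)]


def conjugate(u, rho):
    """Action rho -> U rho U^dagger for a diagonal unitary with diagonal u."""
    d = len(rho)
    return [[u[r] * rho[r][s] * u[s].conjugate() for s in range(d)] for r in range(d)]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


d, F = 3, {0}
u = phase_diagonal(F, d, angle=0.7)

# States supported on F or on F_perp are left untouched.
in_F = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
in_F_perp = [[0, 0, 0], [0, 0.5, 0.5], [0, 0.5, 0.5]]
assert close(conjugate(u, in_F), in_F)
assert close(conjugate(u, in_F_perp), in_F_perp)

# A superposition across F and F_perp only acquires a relative phase.
across = [[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 0]]
rotated = conjugate(u, across)
assert abs(rotated[0][1] - 0.5 * cmath.exp(-0.7j)) < 1e-12

# Latitude quoted in the proof: cos(zeta) = (|F| - |F_perp|) / d_A.
cos_zeta = (len(F) - (d - len(F))) / d
```

In this model the single angle parametrizes the whole group, which is the circle described in the theorem.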

XII. THE SUPERPOSITION PRINCIPLE

The validity of the superposition principle, proved for two-dimensional systems using the geometry of the Bloch sphere (corollary 31), can be now extended to arbitrary systems thanks to lemma 58.

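Theorem 16 (Superposition principle for general systems). Let $\{\varphi_i\}_{i=1}^{d_A}\subset\mathrm{St}_1(A)$ be a maximal set of perfectly distinguishable pure states and $\{a_i\}_{i=1}^{d_A}$ be the observation test such that $(a_i|\varphi_j)=\delta_{ij}$. Then, for every choice of probabilities $\{p_i\}_{i=1}^{d_A}$, $p_i\ge0$, $\sum_{i=1}^{d_A}p_i=1$, there exists at least one pure state $\varphi_p\in\mathrm{St}_1(A)$ such that

$$p_i=(a_i|\varphi_p)\qquad\forall i=1,\dots,d_A$$
(35)

or, equivalently,

$$\Pi_{\{\varphi_i\}}|\varphi_p)=p_i|\varphi_i)\qquad\forall i=1,\dots,d_A,$$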
(36)

where Π{ϕi} is the projection on ϕi.

Proof. Let us first prove the equivalence between Eqs. (35) and (36). From Eq. (36) we obtain Eq. (35) using the relation $(e|\Pi_{\{\varphi_i\}}=(a_i|$, which follows from corollary 35. Conversely, from Eq. (35) we obtain Eq. (36) using corollary 38. Now, we will prove Eq. (35) by induction. The statement for $N=2$ is proved by corollary 31. Assume that the statement holds for every system $B$ of dimension $d_B=N$ and suppose that $d_A=N+1$. Let $F$ be the face identified by $\omega_F=\frac1N\sum_{i=1}^N\varphi_i$ and $F^\perp$ be the orthogonal face, identified by the state $\varphi_{N+1}$. Now there are two cases: either $p_{N+1}=1$ or $p_{N+1}\neq1$. If $p_{N+1}=1$, then there is nothing to prove: the desired state is $\varphi_{N+1}$. Then, suppose that $p_{N+1}\neq1$. Using the induction hypothesis and the compression axiom 3 we can find a state $\psi_q\in F$ such that $(a_i|\psi_q)=q_i$, with $q_i=p_i/(1-p_{N+1})$, $i=1,\dots,N$. Let us then define a new maximal set of perfectly distinguishable pure states $\{\varphi'_i\}_{i=1}^{N+1}$, with $\varphi'_1=\psi_q$ and $\varphi'_{N+1}=\varphi_{N+1}$. Note that one has $\omega_F=\frac1N\sum_{i=1}^N\varphi'_i$, that is, $F$ is the face generated by the states $\{\varphi'_i\}_{i=1}^N$. Now consider the two-dimensional face $F'$ identified by $\theta=\frac12(\varphi'_1+\varphi'_{N+1})$. By corollary 31 (superposition principle for qubits) we know that there exists a pure state $\varphi\in F'$ with $(a'_1|\varphi)=1-p_{N+1}$ and $(a'_{N+1}|\varphi)=p_{N+1}$. Let us define $V:=\{1,\dots,N\}$ and $W:=\{1,N+1\}$. Then, we have $\Pi_F=\Pi_V$ and $\Pi_{F'}=\Pi_W$, and by lemma 56,

$$\Pi_F|\varphi)=\Pi_F\Pi_{F'}|\varphi)=\Pi_{V\cap W}|\varphi)=\Pi_{\{\varphi'_1\}}|\varphi)=\Pi_{\{\psi_q\}}|\varphi)=(1-p_{N+1})|\psi_q)$$

having used corollary 38 for the last equality. Finally, for $i=1,\dots,N$ we have

$$(a_i|\varphi)=(a_i|\Pi_F|\varphi)=(1-p_{N+1})(a_i|\psi_q)=(1-p_{N+1})q_i=p_i.$$

On the other hand we have $(a_{N+1}|\varphi)=(a'_{N+1}|\varphi)=p_{N+1}$.
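The inductive step of the proof can be mirrored numerically. The following sketch assumes the Hilbert-space model (not yet derived at this point of the paper), where a pure state is given by amplitudes and the outcome probabilities are their squared moduli; the function name `superpose` and the choice of real, positive amplitudes are ours. For simplicity the recursion bottoms out at dimension one rather than at the qubit case used in the proof.

```python
import math


def superpose(p):
    """Real amplitudes of a pure state whose outcome probabilities are p."""
    if len(p) == 1:
        return [1.0]
    last = p[-1]
    if last == 1:
        # Case p_{N+1} = 1: the desired state is the last basis state.
        return [0.0] * (len(p) - 1) + [1.0]
    # Renormalised probabilities q_i = p_i / (1 - p_{N+1}) on the face F.
    q = [x / (1 - last) for x in p[:-1]]
    psi_q = superpose(q)  # induction hypothesis
    # Qubit-like superposition of psi_q and the last basis state.
    return [math.sqrt(1 - last) * a for a in psi_q] + [math.sqrt(last)]


p = [0.5, 0.3, 0.2]
amplitudes = superpose(p)
probabilities = [a * a for a in amplitudes]
assert all(abs(x - y) < 1e-12 for x, y in zip(probabilities, p))
```

Any choice of relative phases would give the same probabilities, which is why the theorem asserts the existence of at least one such pure state rather than uniqueness.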

A. Completeness for purification

Using the superposition principle and the spectral decomposition of theorem 10 we can now show that every state of system $A$ has a purification in $AB$ provided $d_B\ge d_A$:

Lemma 61. For every state $\rho\in\mathrm{St}_1(A)$ and for every system $B$ with $d_B\ge d_A$ there exists a purification of $\rho$ in $\mathrm{St}_1(AB)$.

Proof. Take the spectral decomposition of $\rho$, given by $\rho=\sum_{i=1}^{d_A}p_i\varphi_i$, where $\{p_i\}$ are probabilities and $\{\varphi_i\}_{i=1}^{d_A}\subset\mathrm{St}_1(A)$ is a maximal set of perfectly distinguishable pure states. Let $\{\psi_i\}_{i=1}^{d_B}$ be a maximal set of perfectly distinguishable pure states of $B$ and $\{a_i\}_{i=1}^{d_A}\subset\mathrm{Eff}(A)$ [or $\{b_i\}_{i=1}^{d_B}\subset\mathrm{Eff}(B)$] be the test such that $(a_i|\varphi_j)=\delta_{ij}$ [or $(b_i|\psi_j)=\delta_{ij}$]. Clearly $\{\varphi_i\otimes\psi_j\}$ is a maximal set of perfectly distinguishable pure states for $AB$. Then, by the superposition principle (theorem 16) there exists a pure state $\Psi_\rho$ such that $(a_i\otimes b_j|\Psi_\rho)=p_i\delta_{ij}$. Equivalently, we have $(b_i|_B|\Psi_\rho)_{AB}=p_i|\varphi_i)_A$ for every $i=1,\dots,d_A$ and $(b_i|_B|\Psi_\rho)_{AB}=0$ for $i>d_A$. Summing over $i$ we then obtain $(e|_B|\Psi_\rho)_{AB}=\sum_{i=1}^{d_B}(b_i|_B|\Psi_\rho)_{AB}=\sum_{i=1}^{d_A}p_i|\varphi_i)_A=|\rho)_A$.

In the terminology of Ref. [22], lemma 61 states that a system $B$ with $d_B\ge d_A$ is complete for the purification of system $A$.
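In the Hilbert-space model, the state $\Psi_\rho$ of the proof corresponds to the familiar vector $\sum_i\sqrt{p_i}\,|i\rangle|i\rangle$. The sketch below assumes that model and works in the eigenbasis of $\rho$; the names `purification` and `marginal_on_A` are illustrative, not taken from the paper.

```python
import math


def purification(p, d_B):
    """Amplitudes c[i][j] of sum_i sqrt(p_i) |i>|i> in C^{d_A} (x) C^{d_B}."""
    d_A = len(p)
    if d_B < d_A:
        raise ValueError("this construction needs d_B >= d_A")
    return [[math.sqrt(p[i]) if i == j else 0.0 for j in range(d_B)] for i in range(d_A)]


def marginal_on_A(c):
    """Partial trace over B: (rho_A)_{rs} = sum_j c[r][j] * conj(c[s][j])."""
    d_A, d_B = len(c), len(c[0])
    return [[sum(c[r][j] * c[s][j].conjugate() for j in range(d_B)) for s in range(d_A)]
            for r in range(d_A)]


p = [0.6, 0.4]  # eigenvalues of rho in its spectral decomposition
rho_A = marginal_on_A(purification(p, d_B=3))
assert all(abs(rho_A[r][s] - (p[r] if r == s else 0.0)) < 1e-12
           for r in range(2) for s in range(2))
```

The extra basis state of $B$ (index 2 here) simply carries zero amplitude, matching the condition $(b_i|_B|\Psi_\rho)_{AB}=0$ for $i>d_A$.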

As a consequence of lemma 61 we have the following.

Corollary 39. Every system B with dB=dA is operationally equivalent to the conjugate system Ã.

Proof. By lemma 61, the invariant state $\chi_A\in\mathrm{St}_1(A)$ has a purification $\Psi$ in $\mathrm{St}_1(AB)$. By corollary 18, the marginal of $\Psi$ on $B$ is the invariant state $\chi_B$. By definition, this means that $B$ is a conjugate system of $A$. Since the conjugate system $\tilde A$ is unique up to operational equivalence (corollary 16), this implies the thesis.

B. Equivalence of systems with equal dimension

We are now in position to prove that two systems A and B with the same dimension are operationally equivalent, namely that there is a reversible transformation from A to B. In other words, we prove that the informational dimension classifies the systems of our theory up to operational equivalence. The fact that this property is derived from the principles, rather than being assumed from the start, is one of the important differences of our work with respect to Refs. [16–18]. Another difference is that here the equivalence of systems with the same dimension is proved after the derivation of the qubit, whereas in Refs. [16–18] the derivation of the qubit requires the equivalence of systems with the same dimension.

Corollary 40 (Operational equivalence of systems with equal dimension). Every two systems $A$ and $B$ with $d_A=d_B$ are operationally equivalent.

Proof. By corollary 39, A and B are both operationally equivalent to the conjugate system Ã. Hence they are operationally equivalent to each other.

C. Reversible operations of perfectly distinguishable pure states

An important consequence of the superposition principle is the possibility of transforming an arbitrary maximal set of perfectly distinguishable pure states into another via a reversible transformation:

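Corollary 41. Let $A$ and $B$ be two systems with $d_A=d_B=:d$ and let $\{\varphi_i\}_{i=1}^d$ (or $\{\psi_i\}_{i=1}^d$) be a maximal set of perfectly distinguishable pure states in $A$ (or $B$). Then, there exists a reversible transformation $\mathcal{U}\in\mathrm{Transf}(A,B)$ such that $\mathcal{U}|\varphi_i)=|\psi_i)$ for every $i=1,\dots,d$.

Proof. Let $\Phi\in\mathrm{St}(A\tilde A)$ be a purification of the invariant state $\chi_A$. Although we know that $A$ and $\tilde A$ are operationally equivalent (corollary 39), we use the notation $A$ and $\tilde A$ to distinguish between the two subsystems of $A\tilde A$. Define the pure state $\tilde\varphi_i$ via the relation $(a_i|_A|\Phi)_{A\tilde A}=\frac1d|\tilde\varphi_i)_{\tilde A}$, where $\{a_i\}_{i=1}^d$ is the observation test such that $(a_i|\varphi_j)=\delta_{ij}$. Let $\{\tilde a_i\}_{i=1}^d$ be the observation test such that $(\tilde a_i|\tilde\varphi_j)=\delta_{ij}$. Then, by lemma 30 we have

$$(\tilde a_i|_{\tilde A}|\Phi)_{A\tilde A}=\frac1d|\varphi_i)_A.$$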
(37)

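On the other hand, if $\{b_i\}_{i=1}^d$ is the observation test such that $(b_i|\psi_j)=\delta_{ij}$, then using the superposition principle (theorem 16) we can construct a state $\Psi\in\mathrm{St}_1(B\tilde A)$ such that $(b_i\otimes\tilde a_j|\Psi)=\delta_{ij}/d$, or, equivalently,

$$(\tilde a_i|_{\tilde A}|\Psi)_{B\tilde A}=\frac1d|\psi_i)_B.$$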
(38)

Now, $\Phi$ and $\Psi$ have the same marginal on system $\tilde A$: they are both purifications of the invariant state $\chi_{\tilde A}$. Moreover, $A$ and $B$ are operationally equivalent because they have the same dimension (corollary 40). Hence, by the uniqueness of purification, there must be a reversible transformation $\mathcal{U}\in\mathrm{Transf}(A,B)$ such that

$$|\Psi)_{B\tilde A}=(\mathcal{U}\otimes\mathcal{I}_{\tilde A})|\Phi)_{A\tilde A}.$$
(39)

Combining Eqs. (37), (38), and (39) we finally obtain

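$$\frac1d\,\mathcal{U}|\varphi_i)_A=[\mathcal{U}\otimes(\tilde a_i|_{\tilde A}]\,|\Phi)_{A\tilde A}=(\tilde a_i|_{\tilde A}|\Psi)_{B\tilde A}=\frac1d|\psi_i)_B,$$

that is, $\mathcal{U}|\varphi_i)=|\psi_i)$ for every $i=1,\dots,d$.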

XIII. DERIVATION OF THE DENSITY MATRIX FORMALISM

The goal of this section is to show that our set of axioms implies that

  • (1) the set of states for a system $A$ of dimension $d_A$ is the set of density matrices on the Hilbert space $\mathbb{C}^{d_A}$,

  • (2) the set of effects is the set of positive matrices bounded by the identity, and

  • (3) the pairing between a state and an effect is given by the trace of the product of the corresponding matrices.

Using the result of theorem 3, we will then obtain that all the physical transformations in our theory are exactly the physical transformations allowed in quantum mechanics. This will conclude our derivation of quantum theory.

A. The basis

In order to specify the correspondence between states and matrices we choose a particular basis for the vector space $\mathrm{St}_{\mathbb R}(A)$. For this purpose, we adopt the choice of basis used in Ref. [16]. The basis is constructed as follows: Let us first choose a maximal set of $d_A$ perfectly distinguishable states $\{\varphi_m\}_{m=1}^{d_A}$, and declare that they are the first $d_A$ basis vectors. Then, for every $m<n$ the face $F_{mn}$ generated by $\{\varphi_m,\varphi_n\}$ defines a “two-dimensional subsystem”: precisely, the face $F_{mn}:=F_{\omega_{mn}}$ with $\omega_{mn}:=(\varphi_m+\varphi_n)/2$ can be ideally encoded in a two-dimensional system. Now, the convex set of states of a two-dimensional system is the Bloch sphere, and we can choose the $z$ axis to be the line joining the two states $\{\varphi_m,\varphi_n\}$, for example, with the positive direction of the $z$ axis being the direction from $\varphi_m$ to $\varphi_n$. Once the direction of the $z$ axis has been specified, we can choose the $x$ and $y$ axes. Note that any pair of orthogonal directions in the plane orthogonal to the $z$ axis is a valid choice for the $x$ and $y$ axes (here we do not restrict ourselves to the choice of a right-handed coordinate system). At the moment there is no relation among the different choices of axes made for different values of $m$ and $n$. However, to prove that the states are represented by positive matrices, later we will have to find a suitable way of connecting all these choices of axes.

Let $\varphi^{mn}_{x,+},\varphi^{mn}_{x,-}\in F_{mn}$ ($\varphi^{mn}_{y,+},\varphi^{mn}_{y,-}\in F_{mn}$) be the two perfectly distinguishable states in the direction of the $x$ axis ($y$ axis) and define

$$\sigma_k^{mn}:=\varphi^{mn}_{k,+}-\varphi^{mn}_{k,-},\qquad k=x,y.$$
(40)

An immediate observation is the following.

Lemma 62. The four vectors $\{\varphi_m,\varphi_n,\sigma_x^{mn},\sigma_y^{mn}\}\subset\mathrm{St}_{\mathbb R}(A)$ are linearly independent.

Proof. Linear independence is evident from the geometry of the Bloch sphere.

We now show that the collection of all vectors obtained in this way is a basis for StR(A). To this purpose we use the following.

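Lemma 63. Let $V\subseteq\{1,\dots,d_A\}$, and consider the projection $\Pi_V$. Then, for $m\in V$ and $n\notin V$, one has $\Pi_V|\sigma_k^{mn})=0$ for $k=x,y$.

Proof. Using lemma 56 and corollary 38 we obtain

$$\Pi_V|\varphi^{mn}_{k,\pm})=\Pi_V\Pi_{\{m,n\}}|\varphi^{mn}_{k,\pm})=\Pi_{\{m\}}|\varphi^{mn}_{k,\pm})=|\varphi_m)\,(a_m|\varphi^{mn}_{k,\pm}).$$

Since the face $F_{mn}$ is isomorphic to the Bloch sphere and the states $\varphi^{mn}_{k,\pm}$, $k=x,y$, lie on the equator of the Bloch sphere, we know that $(a_m|\varphi^{mn}_{k,\pm})=\frac12$. This implies

$$\Pi_V|\sigma_k^{mn})=\Pi_V\bigl[|\varphi^{mn}_{k,+})-|\varphi^{mn}_{k,-})\bigr]=|\varphi_m)\Bigl(\frac12-\frac12\Bigr)=0.$$

Lemma 64. The vectors $\{\varphi_m\}_{m=1}^{d_A}\cup\{\sigma_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ form a basis for $\mathrm{St}_{\mathbb R}(A)$.

Proof. Since the number of vectors is exactly $d_A^2$, to prove that they form a basis it is enough to show that they are linearly independent. Suppose that there exists a vector of coefficients $\{c_m\}\cup\{c_k^{mn}\}$ such that

$$\sum_mc_m\varphi_m+\sum_{n>m,\,k=x,y}c_k^{mn}\sigma_k^{mn}=0.$$

Applying the projection $\Pi_{\{m,n\}}$ on both sides and using lemma 63 we obtain

$$c_m|\varphi_m)+c_n|\varphi_n)+c_x^{mn}|\sigma_x^{mn})+c_y^{mn}|\sigma_y^{mn})=0.$$

However, from lemma 62 we know that the vectors $\{\varphi_m,\varphi_n,\sigma_x^{mn},\sigma_y^{mn}\}$ are linearly independent. Consequently, $c_m=c_n=c_k^{mn}=0$ for all $m,n,k$.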

B. The matrices

Since the state space $\mathrm{St}(A)$ for system $A$ spans a real vector space of dimension $D_A=d_A^2$, we can decide to represent the vectors $\{\varphi_m\}_{m=1}^{d_A}\cup\{\sigma_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ as Hermitian $d_A\times d_A$ matrices. Precisely, we associate the vector $\varphi_m$ to the matrix $S_{\varphi_m}$ defined by

$$[S_{\varphi_m}]_{rs}=\delta_{rm}\delta_{sm},$$
(41)

the vector $\sigma_x^{mn}$ to the matrix

$$[S_{\sigma_x^{mn}}]_{rs}=\delta_{rm}\delta_{sn}+\delta_{rn}\delta_{sm}$$
(42)

and the vector $\sigma_y^{mn}$ to the matrix

$$[S_{\sigma_y^{mn}}]_{rs}=i\lambda\,(\delta_{rm}\delta_{sn}-\delta_{rn}\delta_{sm}),$$
(43)

where $\lambda$ can take the values $+1$ or $-1$. The freedom in the choice of $\lambda$ will be useful in Sec. XIII C, where we will introduce the representation of composite systems of two qubits. However, this choice of sign plays no role in the present subsection, and for simplicity we will take the positive sign.

Recall that in principle any orthogonal direction in the plane orthogonal to the z axis can be chosen to be the x axis. In general, the other possible choices for the x axis will lead to matrices of the form

$$[S_{\sigma_{x,\theta}^{mn}}]_{rs}=\delta_{rm}\delta_{sn}e^{i\theta}+\delta_{rn}\delta_{sm}e^{-i\theta},\qquad\theta\in[0,2\pi),$$
(44)

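and the corresponding choice for the $y$ axis will lead to matrices of the form

$$[S_{\sigma_{y,\theta}^{mn}}]_{rs}=i\lambda\,(\delta_{rm}\delta_{sn}e^{i\theta}-\delta_{rn}\delta_{sm}e^{-i\theta}),\qquad\theta\in[0,2\pi).$$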
(45)

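Since the vectors $\{\varphi_m\}_{m=1}^{d_A}\cup\{\sigma_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ are a basis for the real vector space $\mathrm{St}_{\mathbb R}(A)$, we can expand any state $\rho\in\mathrm{St}(A)$ on them:

$$\rho=\sum_m\rho_m\varphi_m+\sum_{n>m,\,k=x,y}\rho_k^{mn}\sigma_k^{mn}$$
(46)

and the expansion coefficients $\{\rho_m\}_{m=1}^{d_A}\cup\{\rho_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ are all real. Hence each state $\rho$ is in one-to-one correspondence with a Hermitian matrix, given by

$$S_\rho=\sum_m\rho_mS_{\varphi_m}+\sum_{n>m,\,k=x,y}\rho_k^{mn}S_{\sigma_k^{mn}}.$$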
(47)
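The correspondence of Eqs. (41)–(43), (46), and (47) can be made concrete with a short sketch. It builds the $d_A^2$ basis matrices for $\lambda=+1$ and checks that a Hermitian matrix is recovered from its real expansion coefficients. The function names, the dictionary keys, and the test matrix are illustrative choices of ours, and the test matrix is only assumed to be Hermitian (it is not a normalized state).

```python
def basis_matrices(d, lam=1):
    """Matrices of Eqs. (41)-(43), keyed by ('phi', m), ('x', m, n), ('y', m, n)."""
    basis = {}
    for m in range(d):
        basis[("phi", m)] = [[int(r == m and s == m) for s in range(d)] for r in range(d)]
        for n in range(m + 1, d):
            basis[("x", m, n)] = [[int((r, s) in ((m, n), (n, m))) for s in range(d)]
                                  for r in range(d)]
            basis[("y", m, n)] = [[1j * lam * (int((r, s) == (m, n)) - int((r, s) == (n, m)))
                                   for s in range(d)] for r in range(d)]
    return basis


def coefficients(H):
    """Real expansion coefficients of a Hermitian matrix H, assuming lam = +1."""
    d = len(H)
    coeffs = {("phi", m): complex(H[m][m]).real for m in range(d)}
    for m in range(d):
        for n in range(m + 1, d):
            coeffs[("x", m, n)] = complex(H[m][n]).real
            coeffs[("y", m, n)] = complex(H[m][n]).imag
    return coeffs


def reconstruct(coeffs, basis, d):
    """Linear combination of the basis matrices, as in Eq. (47)."""
    return [[sum(c * basis[key][r][s] for key, c in coeffs.items()) for s in range(d)]
            for r in range(d)]


d = 3
H = [[2, 1 + 2j, 3 - 1j], [1 - 2j, 0, 4j], [3 + 1j, -4j, 5]]  # Hermitian test matrix
basis = basis_matrices(d)
coeffs = coefficients(H)
rebuilt = reconstruct(coeffs, basis, d)
assert len(basis) == d * d  # d diagonal vectors plus 2 * d(d-1)/2 off-diagonal ones
assert rebuilt == H
```

The count $d+2\cdot d(d-1)/2=d^2$ is the same counting used in the proof of lemma 64.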

Since effects are linear functionals on states, they are also represented by Hermitian matrices. We will indicate with $E_a$ the Hermitian matrix associated to the effect $a\in\mathrm{Eff}(A)$. The matrix $E_a$ is uniquely defined by the relation

$$(a|\rho)=\mathrm{Tr}[E_aS_\rho].$$

In the rest of the section we show that the set of matrices $\{S_\rho\mid\rho\in\mathrm{St}_1(A)\}$ is the whole set of positive Hermitian matrices with unit trace and that the set of matrices $\{E_a\mid a\in\mathrm{Eff}(A)\}$ is the set of positive Hermitian matrices bounded by the identity.

Let us start from some simple facts:

Lemma 65. The invariant state $\chi_A$ has matrix representation $S_{\chi_A}=I_{d_A}/d_A$, where $I_{d_A}$ is the identity matrix in dimension $d_A$.

Proof. Obvious from the expression $\chi_A=\frac1{d_A}\sum_m\varphi_m$ and from the matrix representation of the states $\{\varphi_m\}_{m=1}^{d_A}$ in Eq. (41).

Lemma 66. Let amEff(A) be the atomic effect such that (am|ϕm)=1. Then, the effect am has matrix representation Eam such that Eam=Sϕm.

Proof. Let ρSt1(A) be an arbitrary state. Expanding ρ as in Eq. (46) and using lemma 62 we obtain (am|ρ)=ρm. On the other hand, by Eq. (47) we have that ρm is the mth diagonal element of the matrix Sρ: by definition of Sϕm [Eq. (41)], this implies ρm=Tr[SϕmSρ]. Now, by construction we have Tr[EamSρ]=(am|ρ)=ρm=Tr[SϕmSρ] for every ρSt1(A). Hence Eam=Sϕm.

Lemma 67. The deterministic effect eEff(A) has matrix representation Ee=IdA.

Proof. Obvious from the expression $e=\sum_ma_m$, combined with lemma 66 and Eq. (41).

Corollary 42. For every state ρSt1(A) one has

Tr[Sρ]=1.

Proof. Tr[Sρ]=Tr[EeSρ]=(e|ρ)=1.

Theorem 17. The matrix elements of $S_\varphi$ for a pure state $\varphi\in\mathrm{St}_1(A)$ are $(S_\varphi)_{mn}=\sqrt{p_mp_n}\,e^{i\theta_{mn}}$, with $\sum_{m=1}^{d_A}p_m=1$, $\theta_{mn}\in[0,2\pi)$, $\theta_{mm}=0$, and $\theta_{mn}=-\theta_{nm}$.

Proof. First of all, the diagonal elements of $S_\varphi$ are given by $[S_\varphi]_{mm}=(a_m|\varphi)$ [cf. Eqs. (46) and (47)]. Denoting the $m$th element by $p_m$, we clearly have $\sum_{m=1}^{d_A}p_m=(e|\varphi)=1$. Now, the projection $\Pi_{\{m,n\}}|\varphi)$ is a state in the face $F_{mn}$, and, by our choice of representation, the corresponding matrix $S_{\Pi_{\{m,n\}}|\varphi)}$ is proportional to a pure qubit state (nonnegative rank-one matrix). On the other hand, it is easy to see from Eqs. (46) and (47) that $S_{\Pi_{\{m,n\}}|\varphi)}$ is the matrix with the same elements as $S_\varphi$ in the block corresponding to the qubit $(m,n)$ and 0 elsewhere. In order to be positive and rank-one, the corresponding $2\times2$ submatrix must have the off-diagonal elements $(S_\varphi)_{mn}=\sqrt{p_mp_n}\,e^{i\theta_{mn}}$ for some $\theta_{mn}\in[0,2\pi)$ with $\theta_{nm}=-\theta_{mn}$. Repeating the same argument for all choices of indices $m,n$, the thesis follows.
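A rank-one projector provides a quick illustration of the structure stated in theorem 17. The sketch below assumes the Hilbert-space picture with amplitudes $\psi_m=\sqrt{p_m}\,e^{i\alpha_m}$, for which $\theta_{mn}=\alpha_m-\alpha_n$; this is a special case of the form allowed by the theorem, and the phases $\alpha_m$ and the function name are arbitrary illustrative choices.

```python
import cmath
import math


def pure_state_matrix(p, alpha):
    """Matrix psi psi^dagger for amplitudes psi_m = sqrt(p_m) * exp(i * alpha_m)."""
    psi = [math.sqrt(pm) * cmath.exp(1j * am) for pm, am in zip(p, alpha)]
    return [[a * b.conjugate() for b in psi] for a in psi]


p = [0.5, 0.3, 0.2]
alpha = [0.0, 1.1, -0.4]  # arbitrary phases, chosen only for illustration
S = pure_state_matrix(p, alpha)

for m in range(3):
    for n in range(3):
        # Modulus of each entry is sqrt(p_m p_n).
        assert abs(abs(S[m][n]) - math.sqrt(p[m] * p[n])) < 1e-12
        # theta_mn = -theta_nm and theta_mm = 0, i.e. S is Hermitian.
        assert abs(S[m][n] - S[n][m].conjugate()) < 1e-12

trace = sum(S[m][m] for m in range(3))
assert abs(trace - 1) < 1e-12
```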

Theorem 18. For a pure state ϕSt1(A), the corresponding atomic effect aϕ such that (aϕ|ϕ)=1 has a matrix representation Eϕ with the property that Eϕ=Sϕ.

Proof. We already know that the statement holds for $d_A=2$, where we proved the Bloch sphere representation, equivalent to the fact that states and effects are represented as $2\times2$ positive complex matrices, with the set of pure states identified with the set of all rank-one projectors. Let us now consider a generic system $A$. For every $m<n$, the face $F_{mn}$ generated by $\{\varphi_m,\varphi_n\}$ can be encoded in a two-dimensional system. Therefore, the matrices $S_{\Pi_{\{m,n\}}|\varphi)}$ and $E_{(a|\Pi_{\{m,n\}}}$ are positive [also, recall that all matrix elements outside the $(m,n)$ block are zero]. Let $\varphi^{(mn)}$ be the pure state in the face $F_{mn}$ that is perfectly distinguishable from $\Pi_{\{m,n\}}|\varphi)$. Note that, since $\varphi^{(mn)}$ belongs to the face $F_{mn}$, it is also perfectly distinguishable from $\Pi_{\{1,\dots,d_A\}\setminus\{m,n\}}|\varphi)$. Hence $\varphi^{(mn)}$ is perfectly distinguishable from $\varphi$ and, in particular, $(a|\varphi^{(mn)})=0$ (lemma 59). This implies the relation

$$\mathrm{Tr}\bigl[E_{(a|\Pi_{\{m,n\}}}S_{\varphi^{(mn)}}\bigr]=(a|\Pi_{\{m,n\}}|\varphi^{(mn)})=(a|\varphi^{(mn)})=0.$$

Now, since the matrix $E_{(a|\Pi_{\{m,n\}}}$ is positive, the above relation implies $E_{(a|\Pi_{\{m,n\}}}=c_{mn}S_{\Pi_{\{m,n\}}|\varphi)}$, where $c_{mn}\ge0$. Finally, repeating the argument for all possible values of $(m,n)$, we obtain that $c_{mn}=c$ for every $m,n$, that is, $E_a=cS_\varphi$. Taking the trace on both sides we obtain $\mathrm{Tr}[E_a]=c$. To prove that $c=1$, we use the relation $\mathrm{Tr}[E_a]/d_A=(a|\chi_A)=1/d_A$.

We conclude with a simple corollary that will be used in the next subsection.

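Corollary 43. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state and let $\{\gamma_i\}_{i=1}^r\subset\mathrm{St}_1(A)$ be a set of pure states. If the state $\varphi$ can be written as

$$\varphi=\sum_ix_i\gamma_i$$

for some real coefficients $\{x_i\}_{i=1}^r$, then the atomic effect $a$ such that $(a|\varphi)=1$ is given by

$$(a|=\sum_ix_i(c_i|,$$

where $c_i$ is the atomic effect such that $(c_i|\gamma_i)=1$.

Proof. For every $\rho\in\mathrm{St}(A)$ by theorem 18 one has

$$(a|\rho)=\mathrm{Tr}[E_aS_\rho]=\mathrm{Tr}[S_\varphi S_\rho]=\sum_ix_i\mathrm{Tr}[S_{\gamma_i}S_\rho]=\sum_ix_i\mathrm{Tr}[E_{c_i}S_\rho]=\sum_ix_i(c_i|\rho),$$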

thus implying the thesis.

C. Choice of axes for a two-qubit system

If A and B are two systems with dA=dB=2, then we can use two different types of matrix representations for the states of the composite system AB.

The first type of representation is the representation Sϕ introduced through lemma 64: here we will refer to it as the standard representation. Note that there are many different representations of this type because for every pair (m,n) there is freedom in choice of the x and y axis [cf. Eqs. (44) and (45)].

The second type of representation is the tensor product representation $T_\varphi$, defined by the tensor product of matrices representing states of systems $A$ and $B$: for a state $|\rho)=\sum_{i,j}\rho_{ij}|\alpha_i)|\beta_j)$, with $\alpha_i\in\mathrm{St}(A)$, $\beta_j\in\mathrm{St}(B)$, we have

$$T_\rho:=\sum_{i,j}\rho_{ij}\,S^A_{\alpha_i}\otimes S^B_{\beta_j},$$
(48)

where $S^A$ (or $S^B$) is the matrix representation for system $A$ (or $B$). Here the freedom is in the choice of the axes for the Bloch spheres of qubits $A$ and $B$. Since $A$ and $B$ are operationally equivalent, we will indicate the elements of the bases for $\mathrm{St}_{\mathbb R}(A)$ and $\mathrm{St}_{\mathbb R}(B)$ with the same letters: $\{\varphi_m\}_{m=1}^2$ for the two perfectly distinguishable pure states and $\{\sigma_k\}_{k=x,y}$ for the remaining basis vectors.

We now show a few properties of the tensor representation. Let $F_A$ denote the matrix corresponding to the effect $A\in\mathrm{Eff}(AB)$ in the tensor representation, that is, the matrix defined by

$$(A|\rho):=\mathrm{Tr}[F_AT_\rho]\qquad\forall\rho\in\mathrm{St}(AB).$$
(49)

It is easy to show that the matrix representation for effects must satisfy the analog of Eq. (48).

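Lemma 68. Let $A\in\mathrm{Eff}(AB)$ be a bipartite effect, written as $(A|=\sum_{i,j}A_{ij}(a_i|(b_j|$. Then one has

$$F_A=\sum_{i,j}A_{ij}\,E^A_{a_i}\otimes E^B_{b_j},$$

where $E^A_{a_i}$ (or $E^B_{b_j}$) is the matrix representing the single-qubit effect $a_i$ (or $b_j$) in the standard representation for qubit $A$ (or $B$).

Proof. For every bipartite state $|\rho)=\sum_{k,l}\rho_{kl}|\alpha_k)|\beta_l)$ one has

$$\begin{aligned}\mathrm{Tr}[F_AT_\rho]=(A|\rho)&=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,(a_i|\alpha_k)(b_j|\beta_l)=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,\mathrm{Tr}[E^A_{a_i}S^A_{\alpha_k}]\,\mathrm{Tr}[E^B_{b_j}S^B_{\beta_l}]\\&=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,\mathrm{Tr}\bigl[(E^A_{a_i}\otimes E^B_{b_j})\,T_{\alpha_k\otimes\beta_l}\bigr]=\sum_{i,j}A_{ij}\,\mathrm{Tr}\bigl[(E^A_{a_i}\otimes E^B_{b_j})\,T_\rho\bigr],\end{aligned}$$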

which implies the thesis.

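Corollary 44. Let $\Psi\in\mathrm{St}_1(AB)$ be a pure state and let $A\in\mathrm{Eff}(AB)$ be the atomic effect such that $(A|\Psi)=1$. Then one has $F_A=T_\Psi$.

Proof. Let $\{\alpha_i\}_{i=1}^4$ (or $\{\beta_j\}_{j=1}^4$) be a set of pure states that span $\mathrm{St}_{\mathbb R}(A)$ [or $\mathrm{St}_{\mathbb R}(B)$] and expand $\Psi$ as $|\Psi)=\sum_{i,j}c_{ij}|\alpha_i)|\beta_j)$. Then, corollary 43 yields $(A|=\sum_{i,j}c_{ij}(a_i|(b_j|$, where $a_i$ and $b_j$ are the atomic effects such that $(a_i|\alpha_i)=(b_j|\beta_j)=1$. Therefore, we have

$$F_A=\sum_{i,j}c_{ij}\,E^A_{a_i}\otimes E^B_{b_j}=\sum_{i,j}c_{ij}\,S^A_{\alpha_i}\otimes S^B_{\beta_j}=T_\Psi.$$

Corollary 45. For every bipartite state $\rho\in\mathrm{St}_1(AB)$, $d_A=d_B=2$, one has $\mathrm{Tr}[T_\rho]=1$.

Proof. For each qubit we have

$$E_{a_1}=\begin{pmatrix}1&0\\0&0\end{pmatrix},\qquad E_{a_2}=\begin{pmatrix}0&0\\0&1\end{pmatrix}.$$
(50)

Hence $E^A_{e_A}=E^B_{e_B}=I$, where $I$ is the $2\times2$ identity matrix. By lemma 68 we then have $F_{e_A\otimes e_B}=I\otimes I$ and, therefore, $\mathrm{Tr}[T_\rho]=\mathrm{Tr}[F_{e_A\otimes e_B}T_\rho]=(e_A\otimes e_B|\rho)=1$.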

Finally, an immediate consequence of local distinguishability is the following.

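Lemma 69. Suppose that $\mathcal{U}\in\mathbf{G}_A$ and $\mathcal{V}\in\mathbf{G}_B$ are two reversible transformations for qubits $A$ and $B$, respectively, and that $U,V\in SU(2)$ are such that

$$S^A_{\mathcal{U}\rho}=US^A_\rho U^\dagger\quad\forall\rho\in\mathrm{St}_1(A),\qquad S^B_{\mathcal{V}\sigma}=VS^B_\sigma V^\dagger\quad\forall\sigma\in\mathrm{St}_1(B).$$

Then, we have $T_{(\mathcal{U}\otimes\mathcal{V})\tau}=(U\otimes V)\,T_\tau\,(U\otimes V)^\dagger$ for every $\tau\in\mathrm{St}_1(AB)$.

Proof. The thesis follows by linearity expanding $\tau$ as $\tau=\sum_{i,j=1}^4\tau_{ij}\,\alpha_i\otimes\beta_j$, where $\{\alpha_i\}_{i=1}^4$ and $\{\beta_j\}_{j=1}^4$ are bases for $\mathrm{St}_{\mathbb R}(A)$ and $\mathrm{St}_{\mathbb R}(B)$.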

The rest of this subsection is aimed at showing that, with a suitable choice of matrix representation for system B, the standard representation coincides with the tensor representation, that is, Sρ=Tρ for every ρSt(AB). This technical result is important because some properties used in our derivation are easily proved in the standard representation, while the property expressed by lemma 69 is easily proved in the tensor representation: it is then essential to show that we can construct a representation that enjoys both properties.

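The four states $\{\varphi_m\otimes\varphi_n\}_{m,n=1}^2$ are clearly a maximal set of perfectly distinguishable pure states in $AB$. In the following we will construct the standard representation starting from this set.

Lemma 70. For a composite system $AB$ with $d_A=d_B=2$ one can choose the standard representation in such a way that the following equalities hold:

$$S_{\varphi_m\otimes\varphi_n}=T_{\varphi_m\otimes\varphi_n},$$
(51)

$$S_{\varphi_m\otimes\sigma_k}=T_{\varphi_m\otimes\sigma_k},\qquad k=x,y,$$
(52)

$$S_{\sigma_k\otimes\varphi_m}=T_{\sigma_k\otimes\varphi_m},\qquad k=x,y.$$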
(53)

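Proof. Let us choose single-qubit representations $S^A$ and $S^B$ that satisfy Eqs. (41), (42), and (43). On the other hand, choosing the states $\{\varphi_m\otimes\varphi_n\}$ in lexicographic order as the four distinguishable states for the standard representation, we have

$$[S_{\varphi_1\otimes\varphi_1}]_{rs}=\delta_{1r}\delta_{1s},\quad[S_{\varphi_1\otimes\varphi_2}]_{rs}=\delta_{2r}\delta_{2s},\quad[S_{\varphi_2\otimes\varphi_1}]_{rs}=\delta_{3r}\delta_{3s},\quad[S_{\varphi_2\otimes\varphi_2}]_{rs}=\delta_{4r}\delta_{4s}.$$

With this choice we get $S_{\varphi_m\otimes\varphi_n}=S^A_{\varphi_m}\otimes S^B_{\varphi_n}=T_{\varphi_m\otimes\varphi_n}$ for every $m,n=1,2$. This proves Eq. (51). Let us now prove Eqs. (52) and (53). Consider the two-dimensional face $F_{11,12}$, generated by the states $\varphi_1\otimes\varphi_1$ and $\varphi_1\otimes\varphi_2$. This face is the face identified by the state $\omega_{11,12}:=\varphi_1\otimes\chi_B$, and we have $F_{11,12}\simeq\{\varphi_1\}\otimes\mathrm{St}_1(B)$. Therefore we can choose the vectors $\sigma_k^{11,12}$, $k=x,y$, to satisfy the relation $\sigma_k^{11,12}:=\varphi_1\otimes\sigma_k$, $k=x,y$. Now, in the standard representation we have

$$[S_{\sigma_x^{11,12}}]_{rs}=\delta_{r1}\delta_{s2}+\delta_{r2}\delta_{s1},\qquad[S_{\sigma_y^{11,12}}]_{rs}=i\lambda\,(\delta_{r1}\delta_{s2}-\delta_{r2}\delta_{s1})$$

[cf. Eqs. (42) and (43)]. This implies $S_{\sigma_k^{11,12}}=S^A_{\varphi_1}\otimes S^B_{\sigma_k}=T_{\varphi_1\otimes\sigma_k}$ for $k=x,y$. Repeating the same argument for the faces $F_{22,21}$, $F_{11,21}$, and $F_{21,22}$ we obtain the proof of Eqs. (52) and (53).

In order to prove that, with a suitable choice of axes, the standard representation coincides with the tensor representation (that is, $S_\rho=T_\rho$ for every $\rho\in\mathrm{St}(AB)$), it remains to find a choice of axes such that $S_{\sigma_k\otimes\sigma_l}=T_{\sigma_k\otimes\sigma_l}$, $k,l=x,y$. This will be proved in the following.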

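Lemma 71. Let $\Phi\in\mathrm{St}_1(AB)$ be a pure state such that $(a_1\otimes a_1|\Phi)=(a_2\otimes a_2|\Phi)=1/2$ [such a state exists due to the superposition principle]. With a suitable choice of the matrix representation $S^B$, the state $\Phi$ is represented by the matrix

$$T_\Phi=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix}.$$
(54)

Moreover, one has

$$\Phi=\chi_A\otimes\chi_B+\frac14(\sigma_x\otimes\sigma_x-\sigma_y\otimes\sigma_y+\sigma_z\otimes\sigma_z).$$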
(55)

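Proof. Let us start with the proof of Eq. (54). For every reversible transformation $\mathcal U\in\mathbf G_A$, let $\mathcal U^*\in\mathbf G_B$ be the conjugate of $\mathcal U$, defined with respect to the state $\Phi$. Since all $2\times2$ unitary (nontrivial) representations of $\mathrm{SU}(2)$ are unitarily equivalent, by a suitable choice of the standard representation $S^B_\rho$ for system B, one has

$$S^B_{\mathcal U^*\rho}=U^*S^B_\rho U^T,$$
(56)

where $U^*$ and $U^T$ are the complex conjugate and the transpose of the matrix $U\in\mathrm{SU}(2)$ such that $S^A_{\mathcal U\rho}=US^A_\rho U^\dagger$. Due to Eq. (56) and to lemma 69, the isotropic state $\Phi$ must satisfy the condition $(U\otimes U^*)T_\Phi(U^\dagger\otimes U^T)=T_\Phi$ for every $U\in\mathrm{SU}(2)$. Now, the unitary representation $\{U\otimes U^*\}$ has two irreducible subspaces and the projectors on them are given by the matrices

$$P_0=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix},\qquad P_1=\frac12\begin{pmatrix}1&0&0&-1\\0&2&0&0\\0&0&2&0\\-1&0&0&1\end{pmatrix}=I\otimes I-P_0,$$

where $I$ is the $2\times2$ identity matrix. The most general form for $T_\Phi$ is then the following:

$$T_\Phi=x_0P_0+x_1P_1=(x_0-x_1)P_0+x_1\,I\otimes I=\begin{pmatrix}\alpha+\beta&0&0&\beta\\0&\alpha&0&0\\0&0&\alpha&0\\\beta&0&0&\alpha+\beta\end{pmatrix},$$

having defined $\alpha:=x_1$ and $\beta:=(x_0-x_1)/2$. Now, by construction the state $\Phi$ satisfies the condition

$$(a_m|_A\,|\Phi)_{AB}=\frac12|\varphi_m)_B,\qquad m=1,2.$$

By definition of the tensor representation, the conditional states $(a_m|_A|\Phi)_{AB}$ are described by the diagonal blocks of the matrix $T_\Phi$:

$$S^B_{(a_1|_A|\Phi)_{AB}}=\begin{pmatrix}\alpha+\beta&0\\0&\alpha\end{pmatrix},\qquad S^B_{(a_2|_A|\Phi)_{AB}}=\begin{pmatrix}\alpha&0\\0&\alpha+\beta\end{pmatrix}.$$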
(57)

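Since the states $\varphi_1$ and $\varphi_2$ are pure, the above matrices must be rank-one. Moreover, their trace must be equal to $(a_m\otimes e_B|\Phi)=\frac12(e_B|\varphi_m)=\frac12$, $m=1,2$. Then we have two possibilities: either (i) $\alpha=0$ and $\beta=\frac12$, or (ii) $\alpha=-\beta=\frac12$. In case (i) Eq. (54) holds. In case (ii), to prove Eq. (54) we need to change our choice of matrix representation for the qubit B. Precisely, we make the following change:

$$S^B_{\sigma_x}\mapsto\tilde S^B_{\sigma_x}=-S^B_{\sigma_x},\qquad S^B_{\sigma_y}\mapsto\tilde S^B_{\sigma_y}=-S^B_{\sigma_y},\qquad S^B_{\sigma_z}\mapsto\tilde S^B_{\sigma_z}=-S^B_{\sigma_z},$$
(58)

where $\sigma_z:=\varphi_1-\varphi_2$. Note that the inversion of the axes, sending $\sigma_k$ to $-\sigma_k$ for every $k=x,y,z$, is not an allowed physical transformation, but this is not a problem here, because Eq. (58) is just a new choice of matrix representation, in which the set of states of system B is still represented by the Bloch sphere.

More concisely, the change of matrix representation $S^B\mapsto\tilde S^B$ can be expressed as

$$S^B_\rho\mapsto\tilde S^B_\rho:=Y^\dagger[S^B_\rho]^TY,\qquad Y:=\begin{pmatrix}0&-1\\1&0\end{pmatrix}.$$

Note that in the new representation $\tilde S^B$ the physical transformation $\mathcal U^*$ is still represented as $\tilde S^B_{\mathcal U^*\rho}=U^*\tilde S^B_\rho U^T$: indeed we have

$$\tilde S^B_{\mathcal U^*\rho}=Y^\dagger[S^B_{\mathcal U^*\rho}]^TY=Y^\dagger[U^*S^B_\rho U^T]^TY=Y^\dagger U[S^B_\rho]^TU^\dagger Y=(Y^\dagger UY)\,Y^\dagger[S^B_\rho]^TY\,(Y^\dagger U^\dagger Y)=U^*\,Y^\dagger[S^B_\rho]^TY\,U^T=U^*\tilde S^B_\rho U^T,$$

having used the relations $YY^\dagger=I$ and $Y^\dagger UY=U^*$ for every $U\in\mathrm{SU}(2)$. Clearly the change of standard representation $S\mapsto\tilde S$ for the qubit B induces a change of tensor representation $T\mapsto\tilde T$, where $\tilde T$ is the tensor representation defined by $\tilde T_{\rho\otimes\sigma}:=S^A_\rho\otimes\tilde S^B_\sigma$. With this change of representation, we have

$$\tilde T_\Phi=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix}.$$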

This concludes the proof of Eq. (54).
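The two cases above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the helper names (`kron`, `matmul`, `t_phi`, `change_rep_on_b`) and the sample SU(2) matrix are illustrative assumptions. It checks that $P_0$ is invariant under $U\otimes U^*$, and that in case (ii) the change of representation on qubit B, implemented here as a partial transpose on B followed by conjugation with $I\otimes Y$, turns $T_\Phi$ into the matrix of Eq. (54).

```python
import cmath
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
Y = [[0, -1], [1, 0]]
P0 = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]

def t_phi(alpha, beta):
    """General invariant matrix alpha * (I x I) + 2 * beta * P0."""
    return [[alpha * (r == c) + 2 * beta * P0[r][c] for c in range(4)]
            for r in range(4)]

def change_rep_on_b(t):
    """Apply S -> Y^dagger S^T Y to the second tensor factor of a 4x4 matrix."""
    # Partial transpose on B: swap the B-part of the row and column indices.
    pt = [[t[2 * (r // 2) + c % 2][2 * (c // 2) + r % 2] for c in range(4)]
          for r in range(4)]
    iy = kron(I2, Y)
    return matmul(matmul(dagger(iy), pt), iy)

# Sample SU(2) matrix (arbitrary illustrative parameters).
a = math.cos(0.3) * cmath.exp(0.7j)
b = math.sin(0.3) * cmath.exp(1.1j)
U = [[a, -b.conjugate()], [b, a.conjugate()]]
U_conj = [[z.conjugate() for z in row] for row in U]
W = kron(U, U_conj)

invariant = close(matmul(matmul(W, P0), dagger(W)), P0)     # P0 commutes with U x U*
case_i_ok = close(t_phi(0.0, 0.5), P0)                      # case (i) is Eq. (54)
case_ii_ok = close(change_rep_on_b(t_phi(0.5, -0.5)), P0)   # case (ii) after the change
print(invariant, case_i_ok, case_ii_ok)
```

A single sample matrix is of course not a proof of invariance; the sketch only illustrates the statements made in the text.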

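Let us now prove Eq. (55). Using the fact that by definition $T_{\rho\otimes\tau}=S^A_\rho\otimes S^B_\tau$, one can directly verify the relation

$$T_\Phi=S_{\chi_A}\otimes S_{\chi_B}+\frac14\left(S^A_{\sigma_x}\otimes S^B_{\sigma_x}-S^A_{\sigma_y}\otimes S^B_{\sigma_y}+S^A_{\sigma_z}\otimes S^B_{\sigma_z}\right).$$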

This is precisely the matrix version of Eq. (55).
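As a quick numerical check of this relation, here is a minimal Python sketch. It assumes (this is not stated in the present section) that the single-qubit standard representation maps the invariant state $\chi$ to $I/2$ and $\sigma_x,\sigma_y,\sigma_z$ to the usual Pauli matrices, with the sign $\lambda$ of Eq. (43) fixed accordingly; the helper names are illustrative.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def lin_comb(terms):
    """Sum of (coefficient, matrix) pairs for 4x4 matrices."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(4)]
            for r in range(4)]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
CHI = [[0.5, 0], [0, 0.5]]  # assumed representation of the invariant state

# Right-hand side of the relation: chi x chi + (XX - YY + ZZ) / 4.
t_phi = lin_comb([(1, kron(CHI, CHI)), (0.25, kron(X, X)),
                  (-0.25, kron(Y, Y)), (0.25, kron(Z, Z))])

eq_54 = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
matches = all(abs(t_phi[r][s] - eq_54[r][s]) < 1e-12
              for r in range(4) for s in range(4))
print(matches)
```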

Note that the choice of $S^B$ needed in Eq. (54) is compatible with the choice of $S^B$ needed in lemma 70: indeed, to prove compatibility we only have to show that the representation $S^B$ used in Eq. (54) has the property $[S^B_{\varphi_m}]_{rs}=\delta_{mr}\delta_{ms}$, $m=1,2$. This property is automatically guaranteed by the relation $(a_m|_A|\Phi)_{AB}=\frac12|\varphi_m)_B$, $m=1,2$, and by Eq. (57) with $\alpha=0$ and $\beta=1/2$.

Corollary 46. In the standard representation the state $\Phi\in\mathrm{St}_1(AB)$ is represented by the matrix

$$S_\Phi=\frac12\begin{pmatrix}1&0&0&e^{i\theta}\\0&0&0&0\\0&0&0&0\\e^{-i\theta}&0&0&1\end{pmatrix}.$$
(59)

Proof. The thesis follows from theorem 17 and lemma 70.

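We now define the reversible transformations $\mathcal U_{x,\pi}$ and $\mathcal U_{z,\pi/2}$ as follows:

$$S_{\mathcal U_{x,\pi}\rho}=XS_\rho X,\quad X:=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad S_{\mathcal U_{z,\pi/2}\rho}=e^{-i\frac\pi4Z}S_\rho\,e^{i\frac\pi4Z},\quad Z:=\begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
(60)

Also, we define the states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ as

$$\Psi:=(\mathcal U_{x,\pi}\otimes\mathcal I)\Phi,\qquad\Phi_{z,\pi/2}:=(\mathcal U_{z,\pi/2}\otimes\mathcal I)\Phi,\qquad\Psi_{z,\pi/2}:=(\mathcal U_{z,\pi/2}\otimes\mathcal I)\Psi.$$

Lemma 72. The states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ have the following tensor representation:

$$T_\Psi=\frac12\begin{pmatrix}0&0&0&0\\0&1&1&0\\0&1&1&0\\0&0&0&0\end{pmatrix},\quad T_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&-i\\0&0&0&0\\0&0&0&0\\i&0&0&1\end{pmatrix},\quad T_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&-i&0\\0&i&1&0\\0&0&0&0\end{pmatrix}.$$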
(61)

Moreover, one has

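$$\begin{aligned}\Psi&=\chi_A\otimes\chi_B+\tfrac14(\sigma_x\otimes\sigma_x+\sigma_y\otimes\sigma_y-\sigma_z\otimes\sigma_z),\\\Phi_{z,\pi/2}&=\chi_A\otimes\chi_B+\tfrac14(\sigma_y\otimes\sigma_x+\sigma_x\otimes\sigma_y+\sigma_z\otimes\sigma_z),\\\Psi_{z,\pi/2}&=\chi_A\otimes\chi_B+\tfrac14(\sigma_y\otimes\sigma_x-\sigma_x\otimes\sigma_y-\sigma_z\otimes\sigma_z).\end{aligned}$$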
(62)

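Proof. Equation (61) is obtained from Eq. (54) by explicit calculation using lemma 69 and Eq. (60). Then, the validity of Eq. (62) is easily obtained from Eq. (55) using the relations

$$\mathcal U_{x,\pi}|\sigma_x)=|\sigma_x),\qquad\mathcal U_{x,\pi}|\sigma_y)=-|\sigma_y),\qquad\mathcal U_{x,\pi}|\sigma_z)=-|\sigma_z),$$

and

$$\mathcal U_{z,\pi/2}|\sigma_x)=|\sigma_y),\qquad\mathcal U_{z,\pi/2}|\sigma_y)=-|\sigma_x),\qquad\mathcal U_{z,\pi/2}|\sigma_z)=|\sigma_z).$$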
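The explicit calculation behind Eq. (61) can be sketched numerically as follows. This is an illustrative Python check, under the assumption that in the tensor representation a local reversible transformation on A acts as $T\mapsto(U\otimes I)\,T\,(U^\dagger\otimes I)$ (which is how lemma 69 is used here), and that $\sigma_x,\sigma_y$ are represented by the usual Pauli matrices. All function names are hypothetical.

```python
import cmath

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
# exp(-i * pi/4 * Z) for Z = diag(1, -1), as in Eq. (60)
RZ = [[cmath.exp(-0.25j * cmath.pi), 0], [0, cmath.exp(0.25j * cmath.pi)]]

def act_on_a(u, t):
    """Conjugate a 4x4 matrix by u on the first factor: (u x I) t (u x I)^dagger."""
    w = kron(u, I2)
    return matmul(matmul(w, t), dagger(w))

T_PHI = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
t_psi = act_on_a(X, T_PHI)
t_phi_z = act_on_a(RZ, T_PHI)
t_psi_z = act_on_a(RZ, t_psi)

expected_psi = [[0, 0, 0, 0], [0, 0.5, 0.5, 0], [0, 0.5, 0.5, 0], [0, 0, 0, 0]]
expected_phi_z = [[0.5, 0, 0, -0.5j], [0, 0, 0, 0], [0, 0, 0, 0], [0.5j, 0, 0, 0.5]]
expected_psi_z = [[0, 0, 0, 0], [0, 0.5, -0.5j, 0], [0, 0.5j, 0.5, 0], [0, 0, 0, 0]]

# Single-qubit action of the z rotation on the Pauli matrices.
rot_x = matmul(matmul(RZ, X), dagger(RZ))   # should equal Y
rot_y = matmul(matmul(RZ, Y), dagger(RZ))   # should equal -X

print(close(t_psi, expected_psi), close(t_phi_z, expected_phi_z),
      close(t_psi_z, expected_psi_z))
```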

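Lemma 73. The states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ have a standard representation of the form

$$S_\Psi=\frac12\begin{pmatrix}0&0&0&0\\0&1&e^{i\gamma}&0\\0&e^{-i\gamma}&1&0\\0&0&0&0\end{pmatrix},\quad S_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&-\lambda ie^{i\theta}\\0&0&0&0\\0&0&0&0\\\lambda ie^{-i\theta}&0&0&1\end{pmatrix},\quad S_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&-\mu ie^{i\gamma}&0\\0&\mu ie^{-i\gamma}&1&0\\0&0&0&0\end{pmatrix},$$
(63)

with $\theta$ as in corollary 46, $\gamma\in[0,2\pi)$, and $\lambda,\mu\in\{-1,1\}$.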

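Proof. Let us start from $\Psi$. First, from Eq. (62) it is immediate to obtain $(a_1\otimes a_1|\Psi)=(a_2\otimes a_2|\Psi)=0$ and $(a_1\otimes a_2|\Psi)=(a_2\otimes a_1|\Psi)=1/2$. This gives the diagonal elements of $S_\Psi$. Then, using theorem 17 we obtain that $S_\Psi$ must be as in Eq. (63), for some value of $\gamma$. Let us now consider $\Phi_{z,\pi/2}$. Again, the diagonal elements of the matrix $S_{\Phi_{z,\pi/2}}$ are obtained from Eq. (62), which in this case yields $(a_1\otimes a_1|\Phi_{z,\pi/2})=(a_2\otimes a_2|\Phi_{z,\pi/2})=1/2$ and $(a_1\otimes a_2|\Phi_{z,\pi/2})=(a_2\otimes a_1|\Phi_{z,\pi/2})=0$. Hence, by theorem 17 we must have

$$S_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&e^{i\theta'}\\0&0&0&0\\0&0&0&0\\e^{-i\theta'}&0&0&1\end{pmatrix}$$

for some value of $\theta'\in[0,2\pi)$. Now, denote by $A$ the effect such that $(A|\Phi)=1$. We then have

$$(A|\Phi_{z,\pi/2})=\mathrm{Tr}[E_AS_{\Phi_{z,\pi/2}}]=\mathrm{Tr}[S_\Phi S_{\Phi_{z,\pi/2}}],\qquad(A|\Phi_{z,\pi/2})=\mathrm{Tr}[F_AT_{\Phi_{z,\pi/2}}]=\mathrm{Tr}[T_\Phi T_{\Phi_{z,\pi/2}}]=\frac12,$$

having used theorem 18, corollary 44, and Eq. (61). Hence we have $\mathrm{Tr}[S_\Phi S_{\Phi_{z,\pi/2}}]=1/2$, which implies $\theta'=\theta\pm\frac\pi2$, as in Eq. (63). Finally, the same arguments can be used for $\Psi_{z,\pi/2}$: the diagonal elements of $S_{\Psi_{z,\pi/2}}$ are obtained from the relations $(a_1\otimes a_1|\Psi_{z,\pi/2})=(a_2\otimes a_2|\Psi_{z,\pi/2})=0$ and $(a_1\otimes a_2|\Psi_{z,\pi/2})=(a_2\otimes a_1|\Psi_{z,\pi/2})=1/2$, which follow from Eq. (62). This implies that the matrix $S_{\Psi_{z,\pi/2}}$ has the form

$$S_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&e^{i\gamma'}&0\\0&e^{-i\gamma'}&1&0\\0&0&0&0\end{pmatrix}$$

for some $\gamma'\in[0,2\pi)$. The relation $\mathrm{Tr}[S_\Psi S_{\Psi_{z,\pi/2}}]=\mathrm{Tr}[T_\Psi T_{\Psi_{z,\pi/2}}]=1/2$ then implies $\gamma'=\gamma\pm\frac\pi2$.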
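For completeness, the step from the trace condition to the phase relation can be written out explicitly. Writing $\theta$ for the phase in the matrix of $\Phi$ and $\theta'$ for the phase in the matrix of $\Phi_{z,\pi/2}$, and using only the matrix forms given above, one finds

$$\mathrm{Tr}[S_\Phi S_{\Phi_{z,\pi/2}}]=\frac14\left(1+1+e^{i\theta}e^{-i\theta'}+e^{-i\theta}e^{i\theta'}\right)=\frac12\left[1+\cos(\theta-\theta')\right],$$

so that requiring the trace to equal $1/2$ is equivalent to $\cos(\theta-\theta')=0$, that is, $\theta'=\theta\pm\pi/2$ modulo $2\pi$. The same computation, with the two phases of the matrices of $\Psi$ and $\Psi_{z,\pi/2}$ in place of $\theta$ and $\theta'$, gives the corresponding relation for $\Psi_{z,\pi/2}$.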

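Let us now consider the four vectors $\Sigma_x^{(11,22)},\Sigma_y^{(11,22)},\Sigma_x^{(12,21)},\Sigma_y^{(12,21)}$ defined as follows:

$$\begin{aligned}\Sigma_x^{(11,22)}&=2\left(\Phi-\chi_A\otimes\chi_B-\tfrac14\sigma_z\otimes\sigma_z\right),&\Sigma_y^{(11,22)}&=2\left(\Phi_{z,\pi/2}-\chi_A\otimes\chi_B-\tfrac14\sigma_z\otimes\sigma_z\right),\\\Sigma_x^{(12,21)}&=2\left(\Psi-\chi_A\otimes\chi_B+\tfrac14\sigma_z\otimes\sigma_z\right),&\Sigma_y^{(12,21)}&=2\left(\Psi_{z,\pi/2}-\chi_A\otimes\chi_B+\tfrac14\sigma_z\otimes\sigma_z\right).\end{aligned}$$
(64)

By the previous results, it is immediate to obtain the matrix representations of these vectors. In the tensor representation, using Eqs. (54) and (61), we obtain

$$T_{\Sigma_x^{(11,22)}}=\begin{pmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&0\end{pmatrix},\quad T_{\Sigma_y^{(11,22)}}=\begin{pmatrix}0&0&0&-i\\0&0&0&0\\0&0&0&0\\i&0&0&0\end{pmatrix},\quad T_{\Sigma_x^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&0\end{pmatrix},\quad T_{\Sigma_y^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&-i&0\\0&i&0&0\\0&0&0&0\end{pmatrix},$$

while in the standard representation, using Eqs. (59) and (63), we obtain

$$S_{\Sigma_x^{(11,22)}}=\begin{pmatrix}0&0&0&e^{i\theta}\\0&0&0&0\\0&0&0&0\\e^{-i\theta}&0&0&0\end{pmatrix},\quad S_{\Sigma_y^{(11,22)}}=\begin{pmatrix}0&0&0&-\lambda ie^{i\theta}\\0&0&0&0\\0&0&0&0\\\lambda ie^{-i\theta}&0&0&0\end{pmatrix},\quad S_{\Sigma_x^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&e^{i\gamma}&0\\0&e^{-i\gamma}&0&0\\0&0&0&0\end{pmatrix},\quad S_{\Sigma_y^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&-\mu ie^{i\gamma}&0\\0&\mu ie^{-i\gamma}&0&0\\0&0&0&0\end{pmatrix}.$$

Comparing the two matrix representations we are now in a position to prove the desired result.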

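Lemma 74. With a suitable choice of axes, one has $S_{\sigma_k\otimes\sigma_l}=T_{\sigma_k\otimes\sigma_l}$ for every $k,l=x,y$.

Proof. For the face (11,22), using the freedom coming from Eqs. (43) and (44), we redefine the x and y axes so that $\sigma_x^{(11,22)}:=\Sigma_x^{(11,22)}$ and $\lambda\sigma_y^{(11,22)}:=\Sigma_y^{(11,22)}$. In this way we have

$$S_{\Sigma_k^{(11,22)}}=T_{\Sigma_k^{(11,22)}}\qquad\forall k=x,y.$$

Likewise, for the face (12,21) we redefine the x and y axes so that $\sigma_x^{(12,21)}:=\Sigma_x^{(12,21)}$ and $\mu\sigma_y^{(12,21)}:=\Sigma_y^{(12,21)}$, so that we have

$$S_{\Sigma_k^{(12,21)}}=T_{\Sigma_k^{(12,21)}}\qquad\forall k=x,y.$$

Finally, using Eqs. (55), (62), and (64) we have the relations

$$\begin{aligned}\sigma_x\otimes\sigma_x&=\Sigma_x^{(11,22)}+\Sigma_x^{(12,21)},&\sigma_y\otimes\sigma_y&=\Sigma_x^{(12,21)}-\Sigma_x^{(11,22)},\\\sigma_x\otimes\sigma_y&=\Sigma_y^{(11,22)}-\Sigma_y^{(12,21)},&\sigma_y\otimes\sigma_x&=\Sigma_y^{(11,22)}+\Sigma_y^{(12,21)}.\end{aligned}$$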

Since S and T coincide on the right-hand side of each equality, they must also coincide on the left-hand side.
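The four relations used in this proof can be checked in the tensor representation with a short Python sketch. It assumes that $\chi_A\otimes\chi_B$ is represented by $(I\otimes I)/4$ and that $\sigma_x,\sigma_y,\sigma_z$ are represented by the usual Pauli matrices; the state matrices are those of Eqs. (54) and (61), and the helper names (`comb`, `sigma_vec`) are illustrative.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def comb(terms):
    """Linear combination of 4x4 matrices from (coefficient, matrix) pairs."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(4)]
            for r in range(4)]

def same(a, b, tol=1e-12):
    """Entrywise comparison of two 4x4 matrices."""
    return all(abs(a[r][s] - b[r][s]) < tol for r in range(4) for s in range(4))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
II, ZZ = kron(I2, I2), kron(Z, Z)

# Tensor-representation matrices of Phi, Psi, Phi_{z,pi/2}, Psi_{z,pi/2}.
T_PHI = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
T_PSI = [[0, 0, 0, 0], [0, 0.5, 0.5, 0], [0, 0.5, 0.5, 0], [0, 0, 0, 0]]
T_PHI_Z = [[0.5, 0, 0, -0.5j], [0, 0, 0, 0], [0, 0, 0, 0], [0.5j, 0, 0, 0.5]]
T_PSI_Z = [[0, 0, 0, 0], [0, 0.5, -0.5j, 0], [0, 0.5j, 0.5, 0], [0, 0, 0, 0]]

def sigma_vec(t_state, sign):
    """Eq. (64): 2 * (state - chi x chi - (sign / 4) * Z x Z), with chi x chi = I x I / 4."""
    return comb([(2, t_state), (-0.5, II), (-0.5 * sign, ZZ)])

sx_a = sigma_vec(T_PHI, +1)     # Sigma_x^(11,22)
sy_a = sigma_vec(T_PHI_Z, +1)   # Sigma_y^(11,22)
sx_b = sigma_vec(T_PSI, -1)     # Sigma_x^(12,21)
sy_b = sigma_vec(T_PSI_Z, -1)   # Sigma_y^(12,21)

checks = {
    "xx": same(kron(X, X), comb([(1, sx_a), (1, sx_b)])),
    "yy": same(kron(Y, Y), comb([(1, sx_b), (-1, sx_a)])),
    "xy": same(kron(X, Y), comb([(1, sy_a), (-1, sy_b)])),
    "yx": same(kron(Y, X), comb([(1, sy_a), (1, sy_b)])),
}
print(checks)
```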

Theorem 19. With a suitable choice of axes, the standard representation coincides with the tensor representation, that is, $S_\rho=T_\rho$ for every $\rho\in\mathrm{St}(AB)$.

Proof. Combining lemma 70 with lemma 74 we obtain that $S$ and $T$ coincide on the tensor-product basis $B\times B$, where $B=\{\varphi_1,\varphi_2,\sigma_x,\sigma_y\}$. By linearity, $S$ and $T$ coincide on every state.

From now on, whenever we consider a composite system AB where A and B are two-dimensional, we will adopt the choice that guarantees that the standard representation coincides with the tensor representation.

D. Positivity of the matrices

In this paragraph we show that the states in our theory can be represented by positive matrices. This amounts to proving that for every system A, the set of states $\mathrm{St}_1(A)$ can be represented as a subset of the set of density matrices in dimension $d_A$. This result will be completed in Sec. XIII E, where we will see that, in fact, every density matrix in dimension $d_A$ corresponds to some state of $\mathrm{St}_1(A)$.

The starting point to prove positivity is the following.

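Lemma 75. Let A and B be two-dimensional systems. Then, for every pure state $\Psi\in\mathrm{St}(AB)$ one has $S_\Psi\ge0$.

Proof. Take an arbitrary vector $|Z\rangle\in\mathbb C^2\otimes\mathbb C^2$, written in the Schmidt form as $|Z\rangle=\sum_{n=1}^2\sqrt{\lambda_n}\,|v_n\rangle|w_n\rangle$. Introducing the unitaries $U,V$ such that $U|v_n\rangle=|n\rangle$ and $V|w_n\rangle=|n\rangle$ for every $n=1,2$, we have $|Z\rangle=(U^\dagger\otimes V^\dagger)|W\rangle$, where $|W\rangle=\sum_{n=1}^2\sqrt{\lambda_n}\,|n\rangle|n\rangle$. Therefore, we have

$$\langle Z|S_\Psi|Z\rangle=\langle W|S_{(\mathcal U\otimes\mathcal V)\Psi}|W\rangle,$$

where $\mathcal U$ and $\mathcal V$ are the reversible transformations defined by $S_{\mathcal U\rho}=US_\rho U^\dagger$ and $S_{\mathcal V\rho}=VS_\rho V^\dagger$, respectively ($\mathcal U$ and $\mathcal V$ are physical transformations by virtue of corollary 30). Here we used the fact that the standard two-qubit representation coincides with the tensor representation and, therefore, $S_{(\mathcal U\otimes\mathcal V)\Psi}=(U\otimes V)S_\Psi(U^\dagger\otimes V^\dagger)$. Denoting the pure state $(\mathcal U\otimes\mathcal V)|\Psi)$ by $|\Psi')$, we then have

$$\langle Z|S_\Psi|Z\rangle=\lambda_1[S_{\Psi'}]_{11,11}+\lambda_2[S_{\Psi'}]_{22,22}+2\sqrt{\lambda_1\lambda_2}\,\mathrm{Re}\left([S_{\Psi'}]_{11,22}\right).$$

Since by theorem 17 we have $[S_{\Psi'}]_{11,22}=\sqrt{[S_{\Psi'}]_{11,11}[S_{\Psi'}]_{22,22}}\,e^{i\theta}$, we conclude

$$\langle Z|S_\Psi|Z\rangle=\lambda_1[S_{\Psi'}]_{11,11}+\lambda_2[S_{\Psi'}]_{22,22}+2\cos\theta\sqrt{\lambda_1\lambda_2[S_{\Psi'}]_{11,11}[S_{\Psi'}]_{22,22}}\ \ge\ \left(\sqrt{\lambda_1[S_{\Psi'}]_{11,11}}-\sqrt{\lambda_2[S_{\Psi'}]_{22,22}}\right)^2\ \ge\ 0.$$

Finally, since the vector $|Z\rangle\in\mathbb C^2\otimes\mathbb C^2$ is arbitrary, the matrix $S_\Psi$ is positive.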
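The lower bound in the last display uses only $\cos\theta\ge-1$. Writing $a$ and $b$ (shorthand introduced here for convenience) for the two non-negative diagonal terms $\lambda_1[S]_{11,11}$ and $\lambda_2[S]_{22,22}$ of the transformed state, the step reads

$$a+b+2\cos\theta\,\sqrt{ab}\ \ge\ a+b-2\sqrt{ab}=\left(\sqrt a-\sqrt b\right)^2\ \ge\ 0.$$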

Corollary 47. Let C be a system of dimension $d_C=4$. Then, with a suitable choice of matrix representation the pure states of C are represented by positive matrices.

Proof. The system C is operationally equivalent to the composite system AB, where $d_A=d_B=2$. Let $\mathcal U\in\mathrm{Transf}(AB,C)$ be the reversible transformation implementing the equivalence. Now, we know that the states of AB are represented by positive matrices. If we define the basis vectors for C by applying $\mathcal U$ to the basis for AB, then we obtain that the states of C are represented by the same matrices representing the states of AB.

Corollary 48. Let A be a system with $d_A=3$. With a suitable choice of matrix representation, the matrix $S_\varphi$ is positive for every pure state $\varphi\in\mathrm{St}(A)$.

Proof. Let C be a system with $d_C=4$. By corollary 47 the states of C are represented by positive matrices. Define the state $\omega:=\frac13(\varphi_1+\varphi_2+\varphi_3)$, where $\{\varphi_m\}_{m=1}^4$ are four perfectly distinguishable pure states. By the compression axiom, the face $F_\omega$ can be encoded in a three-dimensional system D (corollary 40). In fact, since D is operationally equivalent to A, the face $F_\omega$ can be encoded in A. Let $\mathcal E\in\mathrm{Transf}(D,A)$ and $\mathcal D\in\mathrm{Transf}(A,D)$ be the encoding and decoding operation, respectively. If we define the basis vectors for A by applying $\mathcal E$ to the basis vectors for the face $F_\omega$, then we obtain that the states of A are represented by the same matrices representing the states in the face $F_\omega$. Since these matrices are positive, the thesis follows.

From now on, for every three-dimensional system A we will choose the x and y axes so that $S_\rho$ is positive for every $\rho\in\mathrm{St}(A)$.

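Corollary 49. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state with $d_A=3$. Then, the corresponding matrix $S_\varphi$, given by

$$S_\varphi=\begin{pmatrix}p_1&\sqrt{p_1p_2}\,e^{i\theta_{12}}&\sqrt{p_1p_3}\,e^{i\theta_{13}}\\\sqrt{p_1p_2}\,e^{-i\theta_{12}}&p_2&\sqrt{p_2p_3}\,e^{i\theta_{23}}\\\sqrt{p_1p_3}\,e^{-i\theta_{13}}&\sqrt{p_2p_3}\,e^{-i\theta_{23}}&p_3\end{pmatrix}$$
(65)

satisfies the property

$$e^{i\theta_{13}}=e^{i(\theta_{12}+\theta_{23})}.$$

Equivalently, $S_\varphi=|v\rangle\langle v|$, where $v\in\mathbb C^3$ is the vector given by $|v\rangle:=(\sqrt{p_1},\sqrt{p_2}\,e^{-i\theta_{12}},\sqrt{p_3}\,e^{-i\theta_{13}})^T$.

Proof. The relation can be trivially satisfied when $p_i=0$ for some $i\in\{1,2,3\}$. Hence let us assume $p_1,p_2,p_3>0$. Computing the determinant of $S_\varphi$ one obtains $\det(S_\varphi)=2p_1p_2p_3[\cos(\theta_{12}+\theta_{23}-\theta_{13})-1]$. Since $S_\varphi$ is positive, we must have $\det(S_\varphi)\ge0$. If $p_1,p_2,p_3>0$ the only possibility is $\theta_{13}=\theta_{12}+\theta_{23}\bmod2\pi$.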
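The determinant formula used in this proof, and the rank-one form obtained when the phase condition holds, can be illustrated with a small Python sketch. The sample probabilities and phases are arbitrary, the function names `s_matrix` and `det3` are hypothetical, and the matrix is built with the phases $e^{i\theta_{kl}}$ placed above the diagonal.

```python
import cmath
import math

def s_matrix(p, t12, t23, t13):
    """Hermitian 3x3 matrix of the form of Eq. (65)."""
    p1, p2, p3 = p
    a12 = math.sqrt(p1 * p2) * cmath.exp(1j * t12)
    a13 = math.sqrt(p1 * p3) * cmath.exp(1j * t13)
    a23 = math.sqrt(p2 * p3) * cmath.exp(1j * t23)
    return [[p1, a12, a13],
            [a12.conjugate(), p2, a23],
            [a13.conjugate(), a23.conjugate(), p3]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

p = (0.2, 0.3, 0.5)
t12, t23, t13 = 0.4, 1.3, 2.9   # arbitrary sample phases violating the condition

det_direct = det3(s_matrix(p, t12, t23, t13))
det_formula = 2 * p[0] * p[1] * p[2] * (math.cos(t12 + t23 - t13) - 1)

# When theta13 = theta12 + theta23 the matrix is rank one: S = |v><v|.
s_pure = s_matrix(p, t12, t23, t12 + t23)
v = [math.sqrt(p[0]),
     math.sqrt(p[1]) * cmath.exp(-1j * t12),
     math.sqrt(p[2]) * cmath.exp(-1j * (t12 + t23))]
outer = [[v[r] * v[s].conjugate() for s in range(3)] for r in range(3)]
is_projector = all(abs(s_pure[r][s] - outer[r][s]) < 1e-12
                   for r in range(3) for s in range(3))

print(abs(det_direct - det_formula) < 1e-12, det_formula < 0, is_projector)
```

For the sample phases the determinant is strictly negative, so such a matrix cannot be positive, in line with the argument of the proof.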

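Corollary 49 can be easily extended to systems of arbitrary dimension. To this purpose, we choose the x and y axes in such a way that the projection of every state $\rho\in\mathrm{St}_1(A)$ on a three-dimensional face is represented by a positive matrix.

Lemma 76. If $\varphi\in\mathrm{St}_1(A)$ is a pure state and $d_A=N$, then $S_\varphi=|v\rangle\langle v|$, where $v\in\mathbb C^N$ is the vector given by $v:=(\sqrt{p_1},\sqrt{p_2}\,e^{i\alpha_2},\dots,\sqrt{p_N}\,e^{i\alpha_N})^T$ with $\alpha_i\in[0,2\pi)$ for every $i=2,\dots,N$.

Proof. Consider a triple $V=\{p,q,r\}\subseteq\{1,\dots,N\}$. Then the state $\Pi_V|\varphi)$ is proportional to a pure state of a three-dimensional system, whose representation $S_{\Pi_V\varphi}$ is the $3\times3$ square submatrix of $S_\varphi$ with elements $[S_\varphi]_{kl}=\sqrt{p_kp_l}\,e^{i\theta_{kl}}$, $(k,l)\in V\times V$. Now, corollary 49 forces the relation $e^{i\theta_{pr}}=e^{i(\theta_{pq}+\theta_{qr})}$. Since this relation must hold for every choice of the triple $V=\{p,q,r\}$, if we define $\alpha_p:=\theta_{p1}$, then we have $e^{i\theta_{pq}}=e^{i(\theta_{p1}+\theta_{1q})}=e^{i(\theta_{p1}-\theta_{q1})}=e^{i(\alpha_p-\alpha_q)}$. It is then immediate to verify that $S_\varphi=|v\rangle\langle v|$, where $v=(\sqrt{p_1},\sqrt{p_2}\,e^{i\alpha_2},\dots,\sqrt{p_N}\,e^{i\alpha_N})^T$.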
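The reconstruction step of this proof, reading off the phases $\alpha_p:=\theta_{p1}$ from the first column and rebuilding $v$, can be illustrated as follows. This is only a sketch with arbitrary sample data for $N=4$: it starts from a rank-one matrix, so it illustrates the bookkeeping of the proof rather than proving anything, and all variable names are hypothetical.

```python
import cmath
import math
from itertools import permutations

N = 4
p = [0.1, 0.2, 0.3, 0.4]
alpha = [0.0, 0.5, 2.0, 4.0]   # alpha_1 = 0 by convention; other values arbitrary
v = [math.sqrt(p[k]) * cmath.exp(1j * alpha[k]) for k in range(N)]
S = [[v[k] * v[l].conjugate() for l in range(N)] for k in range(N)]

# Phases theta_kl of the matrix elements [S]_kl = sqrt(p_k p_l) exp(i theta_kl).
theta = [[cmath.phase(S[k][l]) for l in range(N)] for k in range(N)]

# Triple relation exp(i theta_pr) = exp(i (theta_pq + theta_qr)) for all triples.
triples_ok = all(
    abs(cmath.exp(1j * theta[a][c])
        - cmath.exp(1j * (theta[a][b] + theta[b][c]))) < 1e-12
    for a, b, c in permutations(range(N), 3))

# Rebuild v from the diagonal and the first-column phases alpha_p := theta_p1.
alpha_rec = [theta[k][0] for k in range(N)]
v_rec = [math.sqrt(S[k][k].real) * cmath.exp(1j * alpha_rec[k]) for k in range(N)]
rebuilt_ok = all(abs(S[k][l] - v_rec[k] * v_rec[l].conjugate()) < 1e-12
                 for k in range(N) for l in range(N))

print(triples_ok, rebuilt_ok)
```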

In conclusion, we proved the following.

Corollary 50. For every system A, the state space $\mathrm{St}_1(A)$ can be represented as a subset of the set of density matrices in dimension $d_A$.

Proof. For every state $\rho\in\mathrm{St}_1(A)$ the matrix $S_\rho$ is Hermitian by construction, with unit trace by corollary 42, and positive since it is a convex mixture of positive matrices.

E. Quantum theory in finite dimensions

Here we conclude our derivation of quantum theory by showing that every density matrix in dimension $d_A$ corresponds to some state $\rho\in\mathrm{St}_1(A)$.

We already know from the superposition principle (theorem 16) that for every choice of probabilities $\{p_i\}_{i=1}^{d_A}$ there is a pure state $\varphi\in\mathrm{St}_1(A)$ such that $\{p_i\}_{i=1}^{d_A}$ are the diagonal elements of $S_\varphi$. Thus the set of density matrices corresponding to pure states contains at least one matrix of the form $S_\varphi=|v\rangle\langle v|$, with $|v\rangle=(\sqrt{p_1},\sqrt{p_2}\,e^{i\beta_2},\dots,\sqrt{p_{d_A}}\,e^{i\beta_{d_A}})$. It only remains to prove that every possible choice of phases $\beta_i\in[0,2\pi)$ corresponds to some pure state.

Recall that for a face $F\subseteq\mathrm{St}_1(A)$ we defined the group $\mathbf G_{F,F^\perp}$ to be the group of reversible transformations $\mathcal U\in\mathbf G_A$ such that $\mathcal U=_{\omega_F}\mathcal I_A$ and $\mathcal U=_{\omega_F^\perp}\mathcal I_A$. We then have the following.

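Lemma 77. Consider a system A with $d_A=N$. Let $\{\varphi_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ be a maximal set of perfectly distinguishable pure states, $F$ be the face identified by $\omega_F=\frac1{N-1}\sum_{i=1}^{N-1}\varphi_i$ and $F^\perp$ its orthogonal face, identified by the state $\varphi_N$. If $\mathcal U$ is a reversible transformation in $\mathbf G_{F,F^\perp}$, then the action of $\mathcal U$ is given by

$$S_{\mathcal U\rho}=US_\rho U^\dagger,\qquad U=\begin{pmatrix}I_{N-1}&0\\0&e^{i\beta}\end{pmatrix},$$
(66)

where $I_{N-1}$ is the $(N-1)\times(N-1)$ identity matrix and $\beta\in[0,2\pi)$.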

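Proof. Consider an arbitrary state $\rho\in\mathrm{St}_1(A)$ and its matrix representation

$$S_\rho=\begin{pmatrix}S_{\Pi_F\rho}&f\\f^\dagger&S_{\Pi_{F^\perp}\rho}\end{pmatrix},$$

where $f\in\mathbb C^{N-1}$ is a suitable vector. Since $\mathcal U=_{\omega_F}\mathcal I_A$ and $\mathcal U=_{\omega_F^\perp}\mathcal I_A$, we have that

$$S_{\mathcal U\rho}=\begin{pmatrix}S_{\Pi_F\rho}&g\\g^\dagger&S_{\Pi_{F^\perp}\rho}\end{pmatrix},$$

where $g\in\mathbb C^{N-1}$ is a suitable vector. To prove Eq. (66), we will now prove that $g=e^{i\beta}f$ for some suitable $\beta\in[0,2\pi)$.

Let us start from the case $N=3$. Since $\mathcal U|\varphi_i)=|\varphi_i)$ for every $i=1,2,3$, we have $(a_i|\mathcal U=(a_i|$ for every $i=1,2,3$ (lemma 51). This implies that $\mathcal U$ sends states in the face $F_{13}$ to states in the face $F_{13}$: indeed, for every $\rho\in F_{13}$ one has $(a_{13}|\mathcal U|\rho)=(a_{13}|\rho)=1$, which implies $\mathcal U\rho\in F_{13}$ (lemma 46). In other words, the restriction of $\mathcal U$ to the face $F_{13}$ is a reversible qubit transformation. Therefore, the action of $\mathcal U$ on a state $\rho\in F_{13}$ must be given by

$$S_{\mathcal U\rho}=\begin{pmatrix}\rho_{11}&0&\rho_{13}\,e^{i\beta}\\0&0&0\\\rho_{31}\,e^{-i\beta}&0&\rho_{33}\end{pmatrix},$$

for some $\beta\in[0,2\pi)$. Similarly, we can see that $\mathcal U$ sends states in the face $F_{23}$ to states in the face $F_{23}$. Hence, for every $\sigma\in F_{23}$ we have

$$S_{\mathcal U\sigma}=\begin{pmatrix}0&0&0\\0&\sigma_{22}&\sigma_{23}\,e^{i\beta'}\\0&\sigma_{32}\,e^{-i\beta'}&\sigma_{33}\end{pmatrix}$$

for some $\beta'\in[0,2\pi)$. We now show that $e^{i\beta}=e^{i\beta'}$. To see that, consider a generic state $\varphi\in\mathrm{St}_1(A)$ with the property that $p_i=(a_i|\varphi)>0$ for every $i=1,2,3$ (such a state exists due to the superposition principle of theorem 16). Writing $S_\varphi$ as in Eq. (65) we then have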

SUϕ=p1p1p2eiθ12p1p3ei(θ13+β)p1p2eiθ12p2<