# Informational derivation of quantum theory

Phys. Rev. A 84, 012311
##### I. INTRODUCTION

More than 80 years after its formulation, quantum theory is still mysterious. The theory has a solid mathematical foundation, addressed by Hilbert, von Neumann, and Nordheim in 1928 [1] and brought to completion in the monumental work by von Neumann [2]. However, this formulation is based on the abstract framework of Hilbert spaces and self-adjoint operators, which, to say the least, are far from having an intuitive physical meaning. For example, the postulate stating that the pure states of a physical system are represented by unit vectors in a suitable Hilbert space appears rather artificial: what are the physical laws that lead to this very specific choice of mathematical representation? The problem with the standard textbook formulations of quantum theory is that the postulates therein impose particular mathematical structures without providing any fundamental reason for this choice: the mathematics of Hilbert spaces is adopted without further questioning as a prescription that “works well” when used as a black box to produce experimental predictions. In a satisfactory axiomatization of quantum theory, instead, the mathematical structures of Hilbert spaces (or C* algebras) should emerge as consequences of physically meaningful postulates, that is, postulates formulated exclusively in the language of physics: this language refers to notions like physical system, experiment, or physical process and not to notions like Hilbert space, self-adjoint operator, or unitary operator. Note that any serious axiomatization has to be based on postulates that can be precisely translated into mathematical terms. However, the point with the present status of quantum theory is that there are postulates that have a precise mathematical statement, but cannot be translated back into the language of physics. Those are the postulates that one would like to avoid.

The need for a deeper understanding of quantum theory in terms of fundamental principles was clear from the very beginning. Von Neumann himself expressed his dissatisfaction with his mathematical formulation of quantum theory with the surprising words “I don’t believe in Hilbert space anymore,” reported by Birkhoff in [3]. Realizing the physical relevance of the axiomatization problem, Birkhoff and von Neumann made an attempt to understand quantum theory as a new form of logic [4]: the key idea was that propositions about the physical world must be treated in a suitable logical framework, different from classical logic, where the operations AND and OR are no longer distributive. This work inaugurated the tradition of quantum logic, which led to several attempts to axiomatize quantum theory, notably by Mackey [5] and Jauch and Piron [6] (see Ref. [7] for a review of more recent progress in quantum logic). In general, a certain degree of technicality, mainly related to the emphasis on infinite-dimensional systems, makes these results far from providing a clear-cut description of quantum theory in terms of fundamental principles. Later Ludwig initiated an axiomatization program [8] adopting an operational approach, where the basic notions are those of preparation devices and measuring devices and the postulates specify how preparations and measurements combine to give the probabilities of experimental outcomes. However, despite the original intent, Ludwig’s axiomatization did not succeed in deriving Hilbert spaces from purely operational notions, as some of the postulates still contained mathematical notions with no operational interpretation.

More recently, the rise of quantum information science moved the emphasis from logics to information processing. The new field clearly showed that the mathematical principles of quantum theory imply an enormous amount of information-theoretic consequences, such as the no-cloning theorem [9,10], the possibility of teleportation [11], secure key distribution [12–14], or of factoring numbers in polynomial time [15]. The natural question is whether the implication can be reversed: is it possible to retrieve quantum theory from a set of purely informational principles? Another contribution of quantum information has been to shift the emphasis to finite dimensional systems, which allow for a simpler treatment but still possess all the remarkable quantum features. In a sense, the study of finite dimensional systems allows one to decouple the conceptual difficulties in our understanding of quantum theory from the technical difficulties of infinite dimensional systems.

In this scenario, Hardy’s 2001 work [16] re-opened the debate about the axiomatizations of quantum theory with fresh ideas. Hardy’s proposal was based on five main assumptions about the relation between dimension of the state space and the number of perfectly distinguishable states of a given system, about the structure of composite systems, and about the possibility of connecting any two pure states of a physical system through a continuous path of reversible transformations. However, some of these assumptions directly refer to the mathematical properties of the state space (in particular, the “simplicity axiom” 2, which is an abstract statement about the functional dependence of the state space dimension on the number of perfectly distinguishable states). Very recently, building on Hardy’s work there have been two new attempts at axiomatization by Dakić and Brukner [17] and Masanes and Müller [18]. Although these works succeeded in removing the “simplicity axiom,” they still contain mathematical assumptions that cannot be understood in elementary physical terms (see, e.g., requirement 5 of Ref. [18], which assumes that “all mathematically well-defined measurements are allowed by the theory”).

Another approach to the axiomatization of quantum theory was pursued by one of the authors in a series of works [19] culminating in Ref. [20]. These works tackled the problem using operational principles related to tomography and calibration of physical devices, experimental complexity, and to the composition of elementary transformations. In particular this research introduced the concept of dynamically faithful states, namely states that can be used for the complete tomography of physical processes. Although this approach came very close to deriving quantum theory, in this case one mathematical assumption without operational interpretation was needed (see the CJ postulate of Ref. [20]).

In this paper we provide a complete derivation of finite dimensional quantum theory based on purely operational principles. Our principles do not refer to abstract properties of the mathematical structures that we use to represent states, transformations, or measurements, but only to the way in which states, transformations, and measurements combine with each other. More specifically, our principles are of informational nature: they assert basic properties of information processing, such as the possibility or impossibility to carry out certain tasks by manipulating physical systems. In this approach the rules by which information can be processed determine the physical theory, in accordance with Wheeler’s program “it from bit,” for which he argued that “all things physical are information-theoretic in origin” [21]. Note that, however, our axiomatization of quantum theory is relevant, as a rigorous result, also for those who do not share Wheeler’s ideas on the informational origin of physics. In particular, in the process of deriving quantum theory we provide alternative proofs for many key features of the Hilbert space formalism, such as the spectral decomposition of self-adjoint operators or the existence of projections. The interesting feature of these proofs is that they are obtained by manipulation of the principles, without assuming Hilbert spaces from the start.

The main message of our work is simple: within a standard class of theories of information processing, quantum theory is uniquely identified by a single postulate: purification. The purification postulate, introduced in Ref. [22], expresses a distinctive feature of quantum theory, namely that the ignorance about a part is always compatible with the maximal knowledge of the whole. The key role of this feature was noticed already in 1935 by Schrödinger in his discussion about entanglement [23], of which he famously wrote “I would not call that one but rather the characteristic trait of quantum mechanics, the one that enforces its entire departure from classical lines of thought.” In a sense, our work can be viewed as the concrete realization of Schrödinger’s claim: the fact that every physical state can be viewed as the marginal of some pure state of a compound system is indeed the key to single out quantum theory within a standard set of possible theories. It is worth stressing, however, that the purification principle assumed in this paper includes a requirement that was not explicitly mentioned in Schrödinger’s discussion: if two pure states of a composite system $AB$ have the same marginal on system $A$, then they are connected by some reversible transformation on system $B$. In other words, we assume that all purifications of a given mixed state are equivalent under local reversible operations [24].
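As an illustration in the familiar Hilbert space setting (which the paper derives rather than assumes), the following NumPy sketch builds one purification of a qubit density matrix and checks that acting with a unitary on the purifying system alone leaves the marginal unchanged. The helper names `purify` and `marginal_on_A`, the example matrix, and the choice of unitary are illustrative assumptions. The code shows only the easy direction (local reversible operations on $B$ preserve the marginal on $A$); the principle stated above asserts the converse as well.

```python
import numpy as np

def purify(rho):
    """Return a state vector on A (x) B whose marginal on A is rho.

    Uses the eigendecomposition rho = sum_k p_k |e_k><e_k| and sets
    |psi> = sum_k sqrt(p_k) |e_k>_A (x) |k>_B (one standard construction).
    """
    evals, evecs = np.linalg.eigh(rho)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        psi += np.sqrt(evals[k]) * np.kron(evecs[:, k], np.eye(d)[k])
    return psi

def marginal_on_A(psi, d):
    """Partial trace over B of |psi><psi| for a (d x d)-dimensional pair."""
    m = psi.reshape(d, d)  # m[a, b] = <a, b | psi>
    return m @ m.conj().T

# Hypothetical mixed state of a qubit (positive, unit trace).
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
psi = purify(rho)
rho_A = marginal_on_A(psi, 2)

# A second purification, obtained by a reversible operation on B only.
U_B = np.array([[0, 1], [1, 0]], dtype=complex)
psi2 = np.kron(np.eye(2), U_B) @ psi
rho_A2 = marginal_on_A(psi2, 2)
```

Both `rho_A` and `rho_A2` reproduce `rho` up to numerical precision, although `psi` and `psi2` are different pure states of the composite system.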

The purification principle expresses a law of conservation of information, stating that at least in principle, irreversibility can always be reduced to the lack of control over an environment. More precisely, the purification principle is equivalent to the statement that every irreversible process can be simulated in an essentially unique way by a reversible interaction of the system with an environment, which is initially in a pure state [22]. This statement can also be extended to include the case of measurement processes, and in that case it implies the possibility of arbitrarily shifting the cut between the observer and the observed system [22]. The possibility of such a shift was considered by von Neumann as a “fundamental requirement of the scientific viewpoint” (see p. 418 of [2]) and his discussion of the measurement process was exactly aimed to show that quantum theory fulfils this requirement.

Besides Schrödinger’s discussion on entanglement and von Neumann’s discussion of the measurement process, the purification principle is deeply rooted in the structure of quantum theory. At the purely mathematical level it plays a crucial role in the theory of C* algebras of operators on separable Hilbert spaces, where the purification principle is equivalent to the Gelfand-Naimark-Segal (GNS) construction [25] and implies the celebrated Stinespring’s theorem [26]. On the other hand, purification is a cornerstone of quantum information, lying at the origin of most quantum protocols. As it was shown in Ref. [22], the purification principle directly implies crucial features like no-cloning, teleportation, no-information without disturbance, error correction, the impossibility of bit commitment, and the “no-programming” theorem of Ref. [27].

In addition to the purification postulate, our derivation of quantum theory is based on five informational axioms. The reason why we call them “axioms,” as opposed to the purification “postulate,” is that they are not at all specific to quantum theory. These axioms represent standard features of information processing that everyone would, more or less implicitly, assume. They define a class of theories of information processing that includes, for example, classical information theory, quantum information theory, and quantum theory with superselection rules. The question whether there are other theories satisfying our five axioms and, in case of a positive answer, the full classification of these theories is currently an open problem.

Here we informally illustrate the five axioms, leaving the detailed description to the remaining part of the paper:

• (1) Causality: the probability of a measurement outcome at a certain time does not depend on the choice of measurements that will be performed later.

• (2) Perfect distinguishability: if a state is not completely mixed (i.e., if it cannot be obtained as a mixture from any other state), then there exists at least one state that can be perfectly distinguished from it.

• (3) Ideal compression: every source of information can be encoded in a suitable physical system in a lossless and maximally efficient fashion. Here lossless means that the information can be decoded without errors and maximally efficient means that every state of the encoding system represents a state in the information source.

• (4) Local distinguishability: if two states of a composite system are different, then we can distinguish between them from the statistics of local measurements on the component systems.

• (5) Pure conditioning: if a pure state of system $AB$ undergoes an atomic measurement on system $A$, then each outcome of the measurement induces a pure state on system $B$. (Here atomic measurement means a measurement that cannot be obtained as a coarse graining of another measurement.)

All these axioms are satisfied by classical information theory. Axiom 5 is even trivial for classical theory, because the only pure states of a composite system $AB$ are the product of pure states of the component systems $A$ and $B$, and hence the state of system $B$ will be pure irrespectively of what we do on system $A$.

A stronger version of axiom 5, introduced in Ref. [20], is the following:

• (5′) Atomicity of composition: the sequential composition of two atomic operations is atomic. (Here atomic transformation means a transformation that cannot be obtained from coarse graining.)

However, it turns out that axiom 5 is enough for our derivation: thanks to the purification postulate we will be able to show the nontrivial implication axiom 5 $\Rightarrow$ axiom 5′ (see lemma 16).
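To make axiom 4 (local distinguishability) concrete, here is a minimal sketch in standard two-qubit quantum theory. It assumes the usual Hilbert space representation and uses expectation values of products of Pauli matrices as a stand-in for "the statistics of local measurements"; the function names and the choice of the two Bell states are illustrative assumptions.

```python
import itertools
import numpy as np

# Pauli matrices, used as a convenient basis of single-qubit observables.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def local_statistics(rho):
    """Expectation values of all product observables P_a (x) P_b on two qubits."""
    return {
        (a, b): np.trace(np.kron(PAULIS[a], PAULIS[b]) @ rho).real
        for a, b in itertools.product(PAULIS, repeat=2)
    }

def bell_state(sign):
    """Density matrix of (|00> + sign * |11>) / sqrt(2)."""
    psi = np.array([1, 0, 0, sign], dtype=complex) / np.sqrt(2)
    return np.outer(psi, psi.conj())

stats_plus = local_statistics(bell_state(+1))
stats_minus = local_statistics(bell_state(-1))

# Product observables on which the two states disagree.
witnesses = [k for k in stats_plus
             if not np.isclose(stats_plus[k], stats_minus[k])]
```

The two states have identical single-qubit marginals (all entries such as `("X", "I")` vanish for both), yet they differ on the correlations `("X", "X")` and `("Y", "Y")`, which are obtained from local measurements on the two qubits.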

The paper is organized as follows. In Sec. II we review the framework of operational-probabilistic theories introduced in Ref. [22]. This framework will provide the basic notions needed for the formulation of our principles. In Sec. III we introduce the principles from which we will derive quantum theory. In Sec. IV we prove some direct consequences of the principles that will be used later in the paper. In Sec. V we discuss the properties of perfectly distinguishable states, while in Sec. VI we prove the existence of a duality between pure states and atomic effects.

The results about distinguishability and duality of pure states and atomic effects allow us to show in Sec. VII that every system has a well defined informational dimension—the operational counterpart of the Hilbert space dimension. Section VIII contains the proof that every state can be decomposed as a convex combination of perfectly distinguishable pure states. Similarly, any element of the vector space spanned by the states can be written as a linear combination of perfectly distinguishable states. This result corresponds to the spectral theorem for self-adjoint operators on complex Hilbert spaces. In Sec. IX we prove some results about the maximum teleportation probability, which allow us to derive a functional relation between the dimension of the state space and the number of perfectly distinguishable states of the system. The mathematical representation of systems with two perfectly distinguishable states is derived in Sec. X, where we prove that such systems are indeed two-dimensional quantum systems—also known as qubits. In Sec. XI we construct projections on the faces of the state space of any system and prove their main properties. These results lead to the derivation of the operational analog of the superposition principle in Sec. XII which allows us to prove that systems with the same number of perfectly distinguishable states are operationally equivalent (Sec. XII B). The properties of the projections and the superposition principle are then exploited in Sec. XIII, where we extend the density matrix representation from qubits to higher dimensional systems, thus proving that a system with $d$ perfectly distinguishable states is indeed a quantum system with $d$-dimensional Hilbert space. We conclude the paper with Sec. XIV, where we review our results, discussing future directions for this research.

##### II. THE FRAMEWORK

This section provides a brief summary of the framework of operational-probabilistic theories, which was formulated in Ref. [22]. We refer to Ref. [22] for an exhaustive presentation of the details of the framework and of the ideas behind it. The operational-probabilistic framework combines the operational language of circuits with the toolbox of probability theory: on the one hand experiments are described by circuits resulting from the connection of physical devices, on the other hand each device in the circuit can have classical outcomes and the theory provides the probability distribution of outcomes when the devices are connected to form closed circuits (that is, circuits that start with a preparation and end with a measurement).

The notions discussed in this section will allow us to draw a precise distinction between principles with an operational content and exclusively mathematical principles: with the expression “operational principle” we will mean a principle that can be expressed using only the basic notions of the operational-probabilistic framework.

###### A. Circuits with outcomes

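A test represents one use of a physical device, like a Stern-Gerlach magnet, a beamsplitter, or a photon counter. The device will have an input system and an output system, labeled by capital letters. The corresponding test can have different classical outcomes, represented by different values of an index $i\in X$:

[Circuit diagram: a box labeled $\{\mathcal{C}_i\}_{i\in X}$ with input wire $A$ and output wire $B$.]

Each outcome $i\in X$ corresponds to a possible event, represented as

[Circuit diagram: a box labeled $\mathcal{C}_i$ with input wire $A$ and output wire $B$.]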

We denote by $Transf(A,B)$ the set of all events from $A$ to $B$. The reason for this notation is that in the next subsection the elements of $Transf(A,B)$ will be interpreted as transformations with input system $A$ and output system $B$. If $A=B$ we simply write $Transf(A)$ in place of $Transf(A,A)$.

A test with a single outcome will be called deterministic. This name is justified by the fact that, if there is a single possible outcome, then this outcome will occur with certainty (cf. the probabilistic structure introduced in the next subsection).

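Two devices can be composed in a sequence, as long as the input system of the second device is equal to the output system of the first. The events in the composite test are represented as

[Circuit diagram: the box $\mathcal{C}_i$ followed in sequence by the box $\mathcal{D}_j$.]

and are written in formulas as $\mathcal{D}_j\mathcal{C}_i$.

For every system $A$ one can perform the identity test (or simply, the identity), that is, a test $\{\mathcal{I}_A\}$ with a single outcome, with the property

[Circuit diagrams: composing any event $\mathcal{C}_i$ with the identity on its input wire or on its output wire gives back $\mathcal{C}_i$.]

The subindex $A$ will be dropped from $\mathcal{I}_A$ where there is no ambiguity.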

The letter $I$ will be reserved for the trivial system, which simply means “nothing” [28]. A device with input (or output) system $I$ is a device with no input (or no output). The corresponding tests will be called preparation tests (or observation tests). In this case we replace the input (or output) wire with a round portion:

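[Eq. (1), circuit diagrams: a preparation event $\rho_i$ with output wire $B$ and an observation event $a_j$ with input wire $A$, each drawn with a rounded end in place of the missing wire.]

In formulas we will write $|\rho_i)_B$ (or $(a_j|_A$). The sets $Transf(I,A)$ and $Transf(A,I)$ will be denoted as $St(A)$ and $Eff(A)$, respectively. The reason for this special notation is that in the next subsection the elements of $St(A)$ [or $Eff(A)$] will be interpreted as the states (or effects) of system $A$.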

From every pair of systems $A$ and $B$ one can form a composite system, denoted by $AB$. Clearly, composing system $A$ with nothing still gives system $A$, in formula $AI=IA=A$. Two devices can be composed in parallel, thus obtaining a new device with composite input and composite output systems. The events in the composite test are represented as

[Circuit diagram: the box $\mathcal{C}_i$ drawn in parallel with the box $\mathcal{D}_j$.]

and are written in formulas as $\mathcal{C}_i\otimes\mathcal{D}_j$. In the special case of states we will often write $|\rho_i)|\sigma_j)$ in place of $\rho_i\otimes\sigma_j$. Similarly, for effects we will write $(a_i|(b_j|$ in place of $a_i\otimes b_j$.

Sequential and parallel composition commute: one has $(\mathcal{A}_i\otimes\mathcal{B}_j)(\mathcal{C}_k\otimes\mathcal{D}_l)=\mathcal{A}_i\mathcal{C}_k\otimes\mathcal{B}_j\mathcal{D}_l$ for every $\mathcal{A}_i,\mathcal{B}_j,\mathcal{C}_k,\mathcal{D}_l$ such that the output of $\mathcal{C}_k$ (or $\mathcal{D}_l$) coincides with the input of $\mathcal{A}_i$ (or $\mathcal{B}_j$).
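In a matrix representation this rule is the mixed-product property of the Kronecker product. The sketch below checks it numerically with random complex matrices standing in for events (for instance, single Kraus operators in quantum theory); the dimensions and variable names are arbitrary assumptions chosen so that the sequential compositions are well defined.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_matrix(rows, cols):
    """Random complex matrix, a stand-in for a linear event."""
    return rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))

# C_k and D_l act first; A_i and B_j act second on the respective outputs.
C_k = random_matrix(3, 2)  # input dimension 2 -> output dimension 3
A_i = random_matrix(4, 3)  # input dimension 3 -> output dimension 4
D_l = random_matrix(2, 5)  # input dimension 5 -> output dimension 2
B_j = random_matrix(3, 2)  # input dimension 2 -> output dimension 3

# Parallel composition first, then sequential composition ...
lhs = np.kron(A_i, B_j) @ np.kron(C_k, D_l)
# ... versus sequential composition first, then parallel composition.
rhs = np.kron(A_i @ C_k, B_j @ D_l)

print(np.allclose(lhs, rhs))
```

The matrix products only make sense when the output dimension of the first factor matches the input dimension of the second, which mirrors the wire-matching condition in the text.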

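When one of the two tests is the identity, we will omit the box and draw only a straight line, as in

[Circuit diagram: a box on one wire in parallel with a bare wire representing the identity on the other system.]

The rules summarized in this section define the operational language of circuits, which has been discussed in detail in a series of inspiring works by Coecke (see in particular Refs. [29,30]). The language of circuits allows one to represent the schematic of an experiment like, for example,

[Circuit diagram: a preparation test $\{\rho_i\}$ on the composite system $AC$, a test $\{\mathcal{C}_j\}$ from $A$ to $B$, and an observation test $\{B_k\}$ on $BC$.]

and also to represent a particular outcome of the experiment

[Circuit diagram: the same circuit with the specific events $\rho_i$, $\mathcal{C}_j$, and $B_k$.]

In formula, the above circuit is given by

$(B_k|_{BC}\,(\mathcal{C}_j\otimes\mathcal{I}_C)\,|\rho_i)_{AC}.$
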
###### B. Probabilistic structure: States, effects, and transformations

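On top of the language of circuits, we put a probabilistic structure [22]: we declare that the composition of a preparation test $\{\rho_i\}_{i\in X}$ with an observation test $\{a_j\}_{j\in Y}$ gives rise to a joint probability distribution

[Eq. (2), circuit diagram: the preparation event $\rho_i$ connected to the observation event $a_j$, set equal to $p(i,j)$.]

with $p(i,j)\geqslant 0$ and $\sum_{i\in X}\sum_{j\in Y}p(i,j)=1$. In formula we write $p(i,j)=(a_j|\rho_i)$. Moreover, if two experiments are run in parallel, we assume that the joint probability distribution is given by the product

[Eq. (3), circuit diagram: the events $\rho_i$ and $a_k$ on one system in parallel with the events $\sigma_j$ and $b_l$ on another, set equal to $p(i,k)\,q(j,l)$.]

where $p(i,k):=(a_k|\rho_i)$ and $q(j,l):=(b_l|\sigma_j)$.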

The probabilistic structure defined by Eq. (2) turns every event $\rho_i\in St(A)$ into a function $\hat{\rho}_i:Eff(A)\to\mathbb{R}$, given by $\hat{\rho}_i(a_j):=(a_j|\rho_i)$. If two events $\rho_i,\rho_i'\in St(A)$ induce the same function, then it is impossible to distinguish between them from the statistics of the experiments allowed by our theory. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $\rho_i\simeq\rho_i'$ if $\hat{\rho}_i=\hat{\rho}_i'$. To avoid introducing new notation, from now on we will assume that the equivalence classes have been taken from the start. We will identify the event $\rho_i\in St(A)$ with the corresponding function $\hat{\rho}_i$ and will call it state. Accordingly, we will refer to preparation tests as collections of states $\{\rho_i\}_{i\in X}$. Note that, since one can take linear combinations of functions, the states in $St(A)$ generate a real vector space, denoted by $St_{\mathbb{R}}(A)$.

The same construction holds for observation tests: every event $a_j\in Eff(A)$ induces a function $\hat{a}_j:St(A)\to\mathbb{R}$, given by $\hat{a}_j(\rho_i):=(a_j|\rho_i)$. If two events $a_j,a_j'\in Eff(A)$ induce the same function, then it is impossible to distinguish between them from the statistics of the experiments allowed in our theory. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $a_j\simeq a_j'$ if $\hat{a}_j=\hat{a}_j'$. To avoid introducing new notation, from now on we will identify the event $a_j\in Eff(A)$ with the corresponding function $\hat{a}_j$ and we will call it effect. Accordingly, we will refer to observation tests as collections of effects $\{a_j\}_{j\in Y}$. The effects in $Eff(A)$ generate a real vector space, denoted by $Eff_{\mathbb{R}}(A)$.

A vector in $St_{\mathbb{R}}(A)$ [or $Eff_{\mathbb{R}}(A)$] can be extended to a linear function on $Eff_{\mathbb{R}}(A)$ [or $St_{\mathbb{R}}(A)$]. In this way, states and effects can be thought of as elements of two real vector spaces, one dual to the other. In this paper we will restrict our attention to finite dimensional vector spaces: operationally this means that the state of a given physical system is completely determined by the statistics of a finite number of finite-outcome measurements. The dimension of the vector space $St_{\mathbb{R}}(A)$, which by construction is equal to the dimension of its dual $Eff_{\mathbb{R}}(A)$, will be denoted by $D_A$. We will refer to $D_A$ as the size of system $A$.

Finally, the vector spaces $St_{\mathbb{R}}(A)$ and $Eff_{\mathbb{R}}(A)$ can be equipped with suitable norms, which have an operational meaning related to optimal discrimination schemes [22]. The norm of an element $\delta\in St_{\mathbb{R}}(A)$ is given by [22]

$\|\delta\|=\sup_{a_0\in Eff(A)}(a_0|\delta)-\inf_{a_1\in Eff(A)}(a_1|\delta),$

while the norm of an element $\xi\in Eff_{\mathbb{R}}(A)$ is given by

$\|\xi\|=\sup_{\rho\in St(A)}|(\xi|\rho)|.$
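For orientation, the norm on the state side can be evaluated explicitly in ordinary quantum theory. The sketch below assumes the standard representation (not introduced until the quantum examples later in this section) in which effects are matrices $0\leqslant a\leqslant I$ and the pairing is $(a|\delta)=\mathrm{Tr}[a\delta]$. Under that assumption the supremum is attained by the projector on the positive eigenspace of $\delta$ and the infimum by the projector on the negative eigenspace, so the norm coincides with the trace norm. The function name and the two example states are hypothetical.

```python
import numpy as np

def operational_norm_state(delta):
    """sup_a Tr[a delta] - inf_a Tr[a delta] over quantum effects 0 <= a <= I.

    For Hermitian delta the sup is the sum of the positive eigenvalues and
    the inf is the sum of the negative eigenvalues.
    """
    evals = np.linalg.eigvalsh(delta)
    sup_part = evals[evals > 0].sum()
    inf_part = evals[evals < 0].sum()
    return sup_part - inf_part

# Difference of two (hypothetical) qubit states: |0><0| and |+><+|.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho1 = np.array([[0.5, 0.5], [0.5, 0.5]])
delta = rho0 - rho1

norm = operational_norm_state(delta)
trace_norm = np.abs(np.linalg.eigvalsh(delta)).sum()
```

For this pair of states both quantities equal $\sqrt{2}$, in line with the connection to optimal discrimination mentioned above.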

We will always take the set of states $St(A)$ to be closed in the operational norm. The convenience of this choice is the convenience of using real numbers instead of rational ones: dealing with a single real number is much easier than dealing with a Cauchy sequence of rational numbers. Operationally taking $St(A)$ to be closed is very natural: the fact that there is a sequence of states $\{\rho_n\}_{n=1}^{\infty}$ that converges to $\rho\in St_{\mathbb{R}}(A)$ means that there is a procedure to prepare $\rho$ with arbitrary precision and hence that $\rho$ deserves the name of “state”.

We conclude this subsection by noting that every event $\mathcal{C}_k$ from $A$ to $B$ induces a linear map $\hat{\mathcal{C}}_k$ from $St_{\mathbb{R}}(A)$ to $St_{\mathbb{R}}(B)$, uniquely defined by

$\hat{\mathcal{C}}_k:|\rho)\in St(A)\mapsto\mathcal{C}_k|\rho)\in St(B).$

Likewise, for every system $C$ the event $\mathcal{C}_k\otimes\mathcal{I}_C$ induces a linear map $\widehat{\mathcal{C}_k\otimes\mathcal{I}_C}$ from $St_{\mathbb{R}}(AC)$ to $St_{\mathbb{R}}(BC)$. If two events $\mathcal{C}_k$ and $\mathcal{C}_k'$ induce the same maps for every possible system $C$, then there is no experiment in the theory that is able to distinguish between them. This means that for our purposes the two events are the same: accordingly, we will take equivalence classes with respect to the relation $\mathcal{C}_k\simeq\mathcal{C}_k'$ if $\widehat{\mathcal{C}_k\otimes\mathcal{I}_C}=\widehat{\mathcal{C}_k'\otimes\mathcal{I}_C}$ for every system $C$. In this case, we will say that two events represent the same transformation. Accordingly, we will refer to tests $\{\mathcal{C}_i\}_{i\in X}$ as collections of transformations. The deterministic transformations (corresponding to single-outcome tests) will be called channels.

###### C. Basic definitions in the operational-probabilistic framework

Here we summarize a few elementary definitions that will be used later in the paper. The meaning of the definitions in the case of quantum theory is also discussed.

###### 1. Coarse graining, refinement, atomic transformations, pure, mixed and completely mixed states

First, we start from the notions of coarse graining and refinement. Coarse graining arises when we join together some outcomes of a test: we say that the test $\{\mathcal{D}_j\}_{j\in Y}$ is a coarse graining of the test $\{\mathcal{C}_i\}_{i\in X}$ if there is a disjoint partition $\{X_j\}_{j\in Y}$ of $X$ such that

$\mathcal{D}_j=\sum_{i\in X_j}\mathcal{C}_i.$

Conversely, if $\{\mathcal{D}_j\}_{j\in Y}$ is a coarse graining of $\{\mathcal{C}_i\}_{i\in X}$, we say that $\{\mathcal{C}_i\}_{i\in X}$ is a refinement of $\{\mathcal{D}_j\}_{j\in Y}$. Intuitively, a test that refines another is a test that extracts information in a more precise way: it is a test with better “resolving power.”
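The coarse-graining rule is simple to mechanize once events are represented by matrices. The sketch below assumes such a representation (here, the elements of a three-outcome projective measurement on a qutrit) and stores a test as a dictionary from outcomes to matrices. The function name `coarse_grain`, the outcome labels, and the example partition are all hypothetical.

```python
import numpy as np

def coarse_grain(test, partition):
    """Join outcomes of `test` according to `partition`.

    test:      dict mapping each outcome i to a matrix C_i
    partition: dict mapping each new outcome j to the list X_j of old outcomes
    Returns the test {D_j} with D_j = sum of C_i over i in X_j.
    """
    joined = [i for block in partition.values() for i in block]
    if sorted(joined) != sorted(test):
        raise ValueError("partition must cover every outcome exactly once")
    return {j: sum(test[i] for i in block) for j, block in partition.items()}

# Fine-grained test: measurement of a qutrit in a fixed basis.
fine = {i: np.diag(np.eye(3)[i]) for i in range(3)}

# Coarse graining that only asks whether the outcome was 0 or not.
coarse = coarse_grain(fine, {"zero": [0], "nonzero": [1, 2]})
```

The coarse test still sums to the identity, but the event `"nonzero"` no longer resolves outcomes 1 and 2; conversely, `fine` is a refinement of `coarse`.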

The notion of refinement also applies to a single transformation: a refinement of the transformation $\mathcal{C}$ is given by a test $\{\mathcal{C}_i\}_{i\in X}$ and a subset $X_0$ of $X$ such that

$\mathcal{C}=\sum_{i\in X_0}\mathcal{C}_i.$

Accordingly, we say that each transformation $\mathcal{C}_i$, $i\in X_0$, is a refinement of $\mathcal{C}$. A transformation $\mathcal{C}$ is atomic if it has only trivial refinements: if $\mathcal{C}_i$ refines $\mathcal{C}$, then $\mathcal{C}_i=p\,\mathcal{C}$ for some probability $p\geqslant 0$. A test that consists of atomic transformations is a test whose “resolving power” cannot be further improved.

When discussing states (i.e., transformations with trivial input) we will use the word pure as a synonym of atomic. A pure state describes a situation of maximal knowledge about the system’s preparation, a knowledge that cannot be further refined.

As usual, a state that is not pure will be called mixed. An important notion is that of completely mixed state.

Definition 1 (Completely mixed state). A state is completely mixed if any other state can refine it: precisely, $ω∈St(A)$ is completely mixed if for every $ρ∈St(A)$ there is a nonzero probability $p>0$ such that $pρ$ is a refinement of $ω$.

Intuitively, a completely mixed state describes a situation of complete ignorance about the system’s preparation: if a system is described by a completely mixed state, then it means that we know so little about its preparation that, in fact, every preparation is possible.

We conclude this paragraph with a couple of definitions that will be used throughout the paper.

Definition 2 (Reversible transformation). A transformation $\mathcal{U}\in Transf(A,B)$ is reversible if there exists another transformation $\mathcal{U}^{-1}\in Transf(B,A)$ such that $\mathcal{U}^{-1}\mathcal{U}=\mathcal{I}_A$ and $\mathcal{U}\mathcal{U}^{-1}=\mathcal{I}_B$. When $A=B$ the reversible transformations form a group, indicated as $G_A$.

Definition 3 (Operationally equivalent systems). Two systems $A$ and $B$ are operationally equivalent if there exists a reversible transformation $\mathcal{U}$ from $A$ to $B$.

When two systems are operationally equivalent one can convert one into the other in a reversible fashion.

###### 2. Examples in quantum theory

Consider a quantum system with Hilbert space $\mathcal{H}=\mathbb{C}^d$, $d<\infty$. In this case a preparation test is a collection of unnormalized density matrices $\{\rho_i\}_{i\in X}$ (i.e., of nonnegative $d\times d$ complex matrices with trace bounded by 1) such that

$\sum_{i\in X}\mathrm{Tr}[\rho_i]=1.$

Preparation tests are often called quantum information sources in quantum information theory. A generic state $ρ$ is an unnormalized density matrix. A deterministic state, corresponding to a single-outcome preparation test, is a normalized density matrix $ρ$, with $Tr[ρ]=1$.

Diagonalizing $\rho=\sum_i\alpha_i|\psi_i\rangle\langle\psi_i|$ we then obtain that each matrix $\alpha_i|\psi_i\rangle\langle\psi_i|$ is a refinement of $\rho$. More generally, every matrix $\sigma$ such that $\sigma\leqslant\rho$ is a refinement of $\rho$. Up to a positive rescaling, all matrices with support contained in the support of $\rho$ are refinements of $\rho$. A quantum state $\rho$ is atomic (pure) if and only if it is proportional to a rank-one projection. A quantum state is completely mixed if and only if its density matrix has full rank. Note that the quantum state $\chi=I_d/d$, where $I_d$ is the identity $d\times d$ matrix, is a particular example of completely mixed state, but not the only example. Precisely, $\chi=I_d/d$ is the unique unitarily invariant state in dimension $d$.
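The statements about refinements and completely mixed states can be checked numerically. The following sketch is one possible way to do so, assuming a full-rank $\rho$: it computes the largest weight $p$ for which $p\sigma\leqslant\rho$, using the fact that this condition is equivalent to $p\,\rho^{-1/2}\sigma\rho^{-1/2}\leqslant I$. The function names, tolerance, and example matrices are illustrative assumptions.

```python
import numpy as np

def is_completely_mixed(rho, tol=1e-12):
    """Full-rank test: every eigenvalue of rho is strictly positive."""
    return bool(np.linalg.eigvalsh(rho).min() > tol)

def max_refinement_weight(sigma, rho, tol=1e-12):
    """Largest p such that rho - p * sigma is still nonnegative.

    This sketch only handles full-rank rho, where the answer is
    1 / lambda_max(rho^{-1/2} sigma rho^{-1/2}).
    """
    evals, evecs = np.linalg.eigh(rho)
    if evals.min() <= tol:
        raise ValueError("this sketch only handles full-rank rho")
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.conj().T
    top = np.linalg.eigvalsh(inv_sqrt @ sigma @ inv_sqrt).max()
    return 1.0 / top

rho = np.diag([0.75, 0.25])              # full rank, hence completely mixed
plus = np.array([1.0, 1.0]) / np.sqrt(2)
sigma = np.outer(plus, plus)             # pure state |+><+|

p = max_refinement_weight(sigma, rho)
remainder = rho - p * sigma              # nonnegative by construction
```

Because `rho` has full rank, a strictly positive `p` exists for every state `sigma`, which is the content of Definition 1 in the quantum case; `p * sigma` and `remainder` together form a two-outcome preparation test that coarse-grains to `rho`.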

Let us now consider the case of observation tests: in quantum theory an observation test is given by a POVM (positive operator-valued measure), namely by a collection $\{P_j\}_{j\in Y}$ of nonnegative $d\times d$ matrices such that

$\sum_{j\in Y}P_j=I_d.$

An effect is then a nonnegative matrix $P⩾0$ upper bounded by the identity. In quantum theory there is only one deterministic effect, corresponding to a single-outcome observation test: the unique deterministic effect given by the identity matrix. As we will see in the following section, the fact that the deterministic effect is unique is equivalent to the fact that quantum theory is a causal theory.

An effect $P$ is atomic if and only if $P$ is proportional to a rank-one projector. An observation test is atomic if it is a POVM with rank-one elements.
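Combining a preparation test with an observation test gives the joint distribution $p(i,j)$ of the probabilistic structure introduced earlier. The sketch below assumes the standard Born rule $p(i,j)=\mathrm{Tr}[P_j\rho_i]$ for the pairing between effects and states, which is the usual quantum rule rather than something derived in this section. The particular source and measurement are hypothetical.

```python
import numpy as np

def joint_distribution(preparation, povm):
    """Matrix p[i, j] = Tr[P_j rho_i] for unnormalized states rho_i and effects P_j."""
    return np.array([[np.trace(P @ rho).real for P in povm]
                     for rho in preparation])

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Preparation test: |0> or |+>, each with probability 1/2 (traces sum to 1).
preparation = [0.5 * np.outer(ket0, ket0), 0.5 * np.outer(ket_plus, ket_plus)]

# Observation test: atomic POVM made of two rank-one projectors.
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

p = joint_distribution(preparation, povm)
```

The entries of `p` are nonnegative and sum to one because the traces of the unnormalized states sum to 1 and the POVM elements sum to the identity.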

Finally, a general test from an input system with Hilbert space $\mathcal{H}_1=\mathbb{C}^{d_1}$ to an output system with Hilbert space $\mathcal{H}_2=\mathbb{C}^{d_2}$ is given by a quantum instrument, namely by a collection $\{\mathcal{C}_k\}_{k\in Z}$ of completely positive trace nonincreasing maps sending linear operators on $\mathcal{H}_1$ to linear operators on $\mathcal{H}_2$, with the property that

$\mathcal{C}_Z:=\sum_{k\in Z}\mathcal{C}_k$

is trace preserving. A general transformation is then given by a trace nonincreasing map, called quantum operation, whereas a deterministic transformation, corresponding to a single-outcome test, is given by a trace-preserving map, called quantum channel.

Any quantum operation $\mathcal{C}$ can be written in the Kraus form $\mathcal{C}(\rho)=\sum_iC_i\rho C_i^\dagger$, where $C_i:\mathcal{H}_1\to\mathcal{H}_2$ are the Kraus operators. Up to a positive scaling, every quantum operation $\mathcal{D}$ such that the Kraus operators of $\mathcal{D}$ belong to the linear span of the Kraus operators of $\mathcal{C}$ is a refinement of $\mathcal{C}$. A map $\mathcal{C}$ is atomic if and only if there is only one Kraus operator in its Kraus form. A reversible transformation in quantum theory is a unitary map $\mathcal{U}(\rho)=U\rho U^\dagger$, where $U:\mathcal{H}_1\to\mathcal{H}_2$ is a unitary operator, that is, $U^\dagger U=I_1$ and $UU^\dagger=I_2$, where $I_1$ ($I_2$) is the identity operator on $\mathcal{H}_1$ ($\mathcal{H}_2$). Two quantum systems are operationally equivalent if and only if the corresponding Hilbert spaces have the same dimension.
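A short numerical sketch of the Kraus form may help fix ideas. It uses the amplitude-damping channel on a qubit as a stand-in example (the channel and the value of the damping parameter `gamma` are assumptions, not taken from the paper) and checks the trace-preservation condition $\sum_i C_i^\dagger C_i=I_1$ for the channel and its failure for a single atomic element of the corresponding instrument.

```python
import numpy as np

def apply_operation(kraus_ops, rho):
    """Kraus form: rho -> sum_i C_i rho C_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def is_trace_preserving(kraus_ops, tol=1e-12):
    """Check sum_i C_i^dagger C_i = identity on the input space."""
    d_in = kraus_ops[0].shape[1]
    total = sum(K.conj().T @ K for K in kraus_ops)
    return bool(np.allclose(total, np.eye(d_in), atol=tol))

# Amplitude damping with a hypothetical damping probability gamma.
gamma = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])

rho = np.array([[0.5, 0.5], [0.5, 0.5]])

channel_output = apply_operation([K0, K1], rho)  # deterministic transformation
atomic_output = apply_operation([K0], rho)       # one atomic quantum operation
```

Here `[K0, K1]` describes a channel, while the two single-operator maps form a quantum instrument whose coarse graining is that channel; each of them is atomic and trace nonincreasing.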

###### D. Operational principles

We are now in a position to make precise the usage of the expression “operational principle” in the context of this paper. By operational principle we mean here a principle that can be stated using only the operational-probabilistic language, i.e., using only

• (1) the notions of system, test, outcome, probability, state, effect, transformation;

• (2) their specifications: atomic, pure, mixed, completely mixed; and

• (3) more complex notions constructed from the above terms (e.g., the notion of “reversible transformation”).

The distinction between operational principles and principles referring to abstract mathematical properties, mentioned in the Introduction, should now be clear: for example, a statement like “the pure states of a system cannot be cloned” is a valid operational principle, because it can be analyzed in basic operational-probabilistic terms as “for every system $A$ there exists no transformation $C$ with input system $A$ and output system $AA$ such that $C|ϕ)=|ϕ)|ϕ)$ for every pure state $ϕ$ of $A$.” On the contrary, a statement like “the state space of a system with two perfectly distinguishable states is a three-dimensional sphere” is not a valid operational principle, because there is no way to express what it means for a state space to be a three-dimensional sphere in terms of basic operational notions. The fact that a state space is a sphere may eventually be derived from operational principles, but cannot be assumed as a starting point.

##### III. THE PRINCIPLES

We now state the principles used in our derivation. The first five principles express generic features that are shared by both classical and quantum theory. They could be even included in the definition of the background framework: they define the simple model of information processing in which we try to single out quantum theory. For this reason we will call them axioms. The sixth principle in our derivation has a different status: it expresses the genuinely quantum features. A major message of our work is that, within a broad class of theories of information processing, quantum theory is completely described by the purification principle. To emphasize the special role of the sixth principle we will call it postulate, in analogy with the parallel postulate of Euclidean geometry.

###### 1. Causality

The first axiom of our list, causality [22], is so basic that it could be considered as part of the background framework. We decided to explicitly present it as an axiom for two reasons: The first reason is that the framework of operational-probabilistic theories can be developed even without this requirement (see Ref. [22] for the general framework and Refs. [31,32] for two explicit examples of noncausal theories). The second reason is that we want to stress that causality is an essential ingredient in our derivation. This observation is important in view of possible extensions of quantum theory to quantum gravity scenarios where the causal structure is not defined from the start (see, e.g., Hardy in Ref. [33]).

Axiom 1 (Causality). The probability of preparations is independent of the choice of observations.

In technical terms: if ${ρi}i∈X⊂St(A)$ is a preparation test, then the conditional probability of the preparation $ρi$ given the choice of the observation-test ${aj}j∈Y$ is the marginal

$p(i|{aj}):=∑j∈Y(aj|ρi).$

The axiom states that the marginal probability $p(i|{aj})$ is independent of the choice of the observation-test ${aj}$: if ${aj}j∈Y$ and ${bk}k∈Z$ are two different observation tests, then one has $p(i|{aj})=p(i|{bk})$. Loosely speaking, one may refer to causality as a requirement of no signaling from the future: indeed, causality is equivalent to the fact that the probability of an outcome at a certain time does not depend on the choice of operations that will be done at later times [20].
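In quantum theory the marginal $p(i|{aj})$ is $∑j Tr(aj ρi)=Tr(ρi)$, so it cannot depend on the observation test. A minimal numerical sketch follows; the specific preparation test and the two observation tests are illustrative assumptions chosen only for this check.

```python
def trace_product(a, b):
    """Tr(a b) for square matrices stored as lists of rows."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def marginal(observation_test, state):
    """p(i|{a_j}) = sum_j (a_j|rho_i), with the pairing given by Tr(a_j rho_i)."""
    return sum(trace_product(effect, state) for effect in observation_test)

# Preparation test {rho_1, rho_2} made of unnormalized qubit states.
preparation_test = [
    [[0.25, 0.0], [0.0, 0.0]],          # 0.25 |0><0|
    [[0.375, 0.375], [0.375, 0.375]],   # 0.75 |+><+|
]

# Two different observation tests: projective measurements in the Z and X bases.
z_test = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
x_test = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

marginals_z = [marginal(z_test, state) for state in preparation_test]
marginals_x = [marginal(x_test, state) for state in preparation_test]

print(marginals_z)
print(marginals_x)
```

Both lists coincide with the preparation probabilities, because the effects of each test sum to the identity, the unique deterministic effect of quantum theory.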

An operational-probabilistic theory that satisfies the causality axiom 1 will be called causal. As we already mentioned, causality is a very basic requirement and could be considered as part of the framework: it provides the notions used to state the other axioms and it implies several facts that will be used frequently in the paper. In fact, in our derivation we do not use the causality axiom directly, but only through its consequences. In the following we briefly summarize the facts and the notations that characterize the framework of causal operational-probabilistic theories, introduced and discussed in detail in Ref. [22]. Similar structures have been subsequently considered in Refs. [34,35] within a formal description of circuits in foliable space-time regions.

First, causality is equivalent to the existence of an effect $eA$ such that $eA=∑j∈Yaj$ for every observation-test ${aj}j∈Y$. We call the effect $eA$ the deterministic effect for system $A$. By definition, the effect $eA$ is unique. The subindex $A$ in $eA$ will be dropped when no confusion can arise.

In a causal theory every test ${Ci}i∈X⊂Transf(A,B)$ satisfies the condition

$\sum_{i\in X}(e_B|\,\mathcal C_i=(e_A|.$

As a consequence, a transformation $C∈Transf(A,B)$ satisfies the condition

$(eB|C⩽(eA|,$
###### (4)

with the equality if and only if $C$ is a channel (i.e., a deterministic transformation, corresponding to a single-outcome test). In Eq. (4) we used the notation $(a|⩽(a′|$ to mean $(a|ρ)⩽(a′|ρ)$ for every $ρ∈St(A)$.

In a causal theory the norm of a state $ρi∈St(A)$ is given by $∥ρi∥=(e|ρi)$. Accordingly, one can define the normalized state

$\bar\rho_i:=\frac{\rho_i}{(e|\rho_i)}.$

In a causal theory one can always allow for rescaled preparations: conditionally to the outcome $i∈X$ in the preparation-test ${ρi}i∈X$ we can say that we prepared the normalized state $ρ¯i$. For this reason, every state in a causal theory is proportional to a normalized state.

The set of normalized states will be denoted by $St1(A)$. Since the set of all states $St(A)$ is closed in the operational norm, also the set of normalized states $St1(A)$ is closed. Moreover, the set $St1(A)$ is convex [22]: this means that for every pair of normalized states $ρ1,ρ2∈St1(A)$ and for every probability $p∈[0,1]$ the convex combination $ρp=pρ1+(1−p)ρ2$ is a normalized state. Operationally, the state $ρp$ is obtained by

• (1) performing a binary test with outcomes ${1,2}$ and outcome probabilities $p1=p$ and $p2=1−p$;

• (2) for outcome $i$ preparing $ρi$, thus realizing the preparation-test ${piρi}i=1,2$;

• (3) coarse graining over the outcomes, thus obtaining $ρp=pρ1+(1−p)ρ2$.

Step 2 (preparation of a state conditionally on the outcome of a previous test) is possible because the theory is causal [22].

The pure normalized states are the extreme points of the convex set $St1(A)$. For a normalized state $ρ∈St1(A)$ we define the face identified by $ρ$ as follows.

Definition 4 (Face identified by a state). The face identified by $ρ∈St1(A)$ is the set $Fρ$ of all normalized states $σ∈St1(A)$ such that $ρ=pσ+(1−p)τ$, for some nonzero probability $p>0$ and some normalized state $τ∈St1(A)$.

In other words, $Fρ$ is the set of all normalized states that show up in the convex decompositions of $ρ$. Clearly, if $ϕ$ is a pure state, then one has $Fϕ={ϕ}$. The opposite situation is that of completely mixed states: by definition 1, a state $ω∈St1(A)$ is completely mixed if every state $σ∈St1(A)$ can stay in its convex decomposition, that is, if $Fω=St1(A)$. An equivalent condition for a state to be completely mixed is the following.

Lemma 1. A state $ω∈St1(A)$ is completely mixed if and only if $Span(Fω)=StR(A)$.

Proof. The condition is clearly necessary. It is also sufficient because for a state $σ∈St1(A)$ the relation $σ∈Span(Fω)$ implies $σ∈Fω$ (see lemma 16 of Ref. [22]). $▪$

A completely mixed state can never be distinguished from another state with zero error probability.

Proposition 1. Let $ρ∈St1(A)$ be a completely mixed state and $σ∈St1(A)$ be an arbitrary state. Then, the probability of error in distinguishing $ρ$ from $σ$ is strictly greater than zero.

Proof. By contradiction, suppose that one can distinguish between $ρ$ and $σ$ with zero error probability. This means that there exists a binary test ${aρ,aσ}$ such that $(aρ|σ)=(aσ|ρ)=0$. Since $ρ$ is completely mixed there exists a probability $p>0$ and a state $τ∈St1(A)$ such that $ρ=pσ+(1−p)τ$. Hence, the condition $(aσ|ρ)=0$ implies $(aσ|σ)=0$. Therefore, we have $(aρ|σ)+(aσ|σ)=0$. This is in contradiction with the normalization of the probabilities in the test ${aρ,aσ}$, which would require $(aρ|σ)+(aσ|σ)=1$. $▪$

###### 2. Perfect distinguishability

Our second axiom regards the task of state discrimination. As we saw in proposition 1, if a state is completely mixed, then it is impossible to distinguish it perfectly from any other state. Axiom 2 states the converse.

Axiom 2 (Perfect distinguishability). Every state that is not completely mixed can be perfectly distinguished from some other state.

Note that the statement of axiom 2 holds for quantum and for classical information theory. In quantum theory a completely mixed state is a density matrix with full rank. If a density matrix $ρ$ does not have full rank, then it has a nontrivial kernel: hence, every density matrix $σ$ with support in the kernel of $ρ$ will be perfectly distinguishable from $ρ$, as stated in axiom 2. Applying the same reasoning to density matrices that are diagonal in a given basis, one can easily see that axiom 2 is satisfied also by classical information theory.
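The quantum argument can be illustrated with a small numerical check. The sketch below assumes a rank-two density matrix on a three-dimensional Hilbert space and a state supported on its kernel; the matrices are illustrative choices, and the binary test consists of the projector on the support of $ρ$ and its complement.

```python
def trace_product(a, b):
    """Tr(a b) for square matrices stored as lists of rows."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

# rho has rank 2 in dimension 3, hence it is not completely mixed.
rho = [[0.5, 0.25, 0.0],
       [0.25, 0.5, 0.0],
       [0.0, 0.0, 0.0]]

# sigma is supported on the kernel of rho.
sigma = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]

# Binary observation test {a_rho, a_sigma}: projector on the support of rho
# and projector on its kernel; the two effects sum to the identity.
a_rho = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
a_sigma = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

error_on_rho = trace_product(a_sigma, rho)      # (a_sigma|rho)
error_on_sigma = trace_product(a_rho, sigma)    # (a_rho|sigma)

print(error_on_rho, error_on_sigma)
```

Both error probabilities vanish, so the test distinguishes $ρ$ from $σ$ perfectly.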

To the best of our knowledge, the perfect distinguishability property has never been considered in the literature as an axiom, probably because in most works it came for free as a consequence of stronger mathematical assumptions. For example, one can obtain the perfect distinguishability property from the no-restriction hypothesis of Ref. [22], stating that for every system $A$ any binary probability rule [i.e., any pair of positive functionals $a0,a1∈EffR(A)$ such that $a0+a1=eA$] actually describes a measurement allowed by the theory. This assumption was made, for example, in Ref. [18] in the case of systems with at most two distinguishable states (see requirement 5 of Ref. [18]). Note that the difference between the perfect distinguishability axiom and the no-restriction hypothesis is that the former can be expressed in purely operational terms, whereas the latter requires the notion of “positive functional” which is not part of the basic operational language.

###### 3. Ideal compression

The third axiom is about information compression. An information source for system $A$ is a preparation-test ${ρi}i∈X$, where each $ρi∈St(A)$ is an unnormalized state and $∑i∈X(e|ρi)=1$. A compression scheme is given by an encoding operation $E$ from $A$ to a smaller system $C$, that is, to a system $C$ such that $DC⩽DA$. The compression scheme is lossless for the source ${ρi}i∈X$ if there exists a decoding operation $D$ from $C$ to $A$ such that $DE|ρi)=|ρi)$ for every value of the index $i∈X$. This means that the decoding allows one to perfectly retrieve the states ${ρi}i∈X$. We say that a compression scheme is lossless for the state $ρ$ if it is lossless for every source ${ρi}i∈X$ such that $ρ=∑i∈Xρi$. Equivalently, this means that the restriction of $DE$ to the face identified by $ρ$ is equal to the identity channel: $DE|σ)=|σ)$ for every $σ∈Fρ$.

A lossless compression scheme is maximally efficient if the encoding system $C$ has the smallest possible size, that is, if the system $C$ has no more states than exactly those needed to compress $ρ$. This happens when every normalized state $τ∈St1(C)$ comes from the encoding of some normalized state $σ∈Fρ$, namely $|τ)=E|σ)$.

We say that a compression scheme that is lossless and maximally efficient is ideal. Our third axiom states that ideal compression is always possible.

Axiom 3 (Ideal compression). For every state there exists an ideal compression scheme.

It is easy to see that this statement holds in quantum theory and in classical probability theory. For example, if $ρ$ is a density matrix on a $d$-dimensional Hilbert space and $rank(ρ)=r$, then the ideal compression is obtained by just encoding $ρ$ in an $r$-dimensional Hilbert space. As long as we do not tolerate losses, this is the most efficient one-shot compression we can devise in quantum theory. Similar observations hold for classical information theory.
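The quantum case can be sketched numerically. The example below assumes a rank-two density matrix on a three-dimensional Hilbert space, with encoding $E(X)=WXW†$ and decoding $D(Y)=W†YW$ for a partial isometry `W` onto the support of $ρ$. The matrices and helper names are illustrative assumptions, and all entries are real so that the adjoint reduces to the transpose.

```python
def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# W maps the support of rho (first two basis vectors) onto a qubit.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

def encode(x):
    """E(X) = W X W^T, from the 3-level system A to the 2-level system C."""
    return matmul(matmul(W, x), transpose(W))

def decode(y):
    """D(Y) = W^T Y W, from C back to A."""
    return matmul(matmul(transpose(W), y), W)

rho = [[0.7, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.0]]      # rank 2
sigma = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]    # in the face of rho
outside = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # not in the face
tau = [[0.2, 0.1], [0.1, 0.8]]                                  # a state of C

print(close(decode(encode(rho)), rho))          # lossless on rho
print(close(decode(encode(sigma)), sigma))      # lossless on the whole face
print(close(encode(decode(tau)), tau))          # every state of C is an encoded state
print(close(decode(encode(outside)), outside))  # states outside the face are lost
```

The third check reflects maximal efficiency: every state of the qubit $C$ is the encoding of a state in the face of $ρ$.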

It is important to emphasize the difference between our “ideal compression” axiom and the “subspace” axiom of Refs. [16–18]: differently from the subspace axiom, the compression axiom is not an axiom about the structure of perfectly distinguishable states available for a given system. For example, here we do not assume that all systems with the same number of perfectly distinguishable states are equivalent. This fact will be proved from the principles in Sec. XII B.

###### 4. Local distinguishability

The fourth axiom consists in the assumption of local distinguishability, here presented in the formulation of Ref. [22].

Axiom 4 (Local distinguishability). If two bipartite states are different, then they give different probabilities for at least one product experiment.

In more technical terms: if $ρ,σ∈St1(AB)$ are states and $ρ≠σ$, then there are two effects $a∈Eff(A)$ and $b∈Eff(B)$ such that

$(a\otimes b|\rho)\neq(a\otimes b|\sigma).$

Local distinguishability is equivalent to the fact that two distant parties, holding systems $A$ and $B$, respectively, can distinguish between the two states $ρ,σ∈St1(AB)$ using only local operations and classical communication and achieving an error probability strictly smaller than $pran=1/2$, the probability of error in a random guess [22]. Again, this statement holds in ordinary quantum theory (on complex Hilbert spaces) and in classical information theory.

Another equivalent condition to local distinguishability is the local tomography axiom, introduced in Refs. [19,36]. The local tomography axiom imposes that every bipartite state can be reconstructed from the statistics of local measurements on the component systems. Technically, local tomography is in turn equivalent to the relation $DAB=DADB$ [16] and to the fact that every state $ρ∈St(AB)$ can be written as

$ρ=∑i=1DA∑j=1DBρijαi⊗βj,$

where ${αi}i=1DA$ ( ${βj}j=1DB$) is a basis for the vector space $StR(A)$ [ $StR(B)$]. The analogous condition also holds for effects: every bipartite effect $E∈Eff(AB)$ can be written as

$E=∑i=1DA∑j=1DBEijai⊗bj,$

where ${ai}i=1DA$ ( ${bj}j=1DB$) is a basis for the vector space $EffR(A)$ [ $EffR(B)$].
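The relation $DAB=DADB$ can be checked by counting real parameters. The sketch below assumes the standard counts for unnormalized states: $d^2$ for Hermitian matrices on a complex $d$-dimensional Hilbert space and $d(d+1)/2$ for symmetric matrices on a real one. The function names are hypothetical.

```python
def state_space_dim_complex(d):
    """Real dimension of the span of d x d Hermitian matrices."""
    return d * d

def state_space_dim_real(d):
    """Real dimension of the span of d x d real symmetric matrices."""
    return d * (d + 1) // 2

def satisfies_local_tomography(dim_fn, d_a, d_b):
    """Check D_AB = D_A * D_B, taking the composite Hilbert space dimension d_a * d_b."""
    return dim_fn(d_a * d_b) == dim_fn(d_a) * dim_fn(d_b)

for d_a, d_b in [(2, 2), (2, 3), (3, 3)]:
    print(d_a, d_b,
          satisfies_local_tomography(state_space_dim_complex, d_a, d_b),
          satisfies_local_tomography(state_space_dim_real, d_a, d_b))
```

For two real qubits one finds $DAB=10$ against $DADB=9$, so product bases do not span the bipartite state space of real-Hilbert-space quantum theory.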

An important consequence of local distinguishability, observed in Ref. [22], is that a transformation $C∈Transf(A,B)$ is completely specified by its action on $St(A)$: thanks to local distinguishability we have the implication

$Cρ=C′ρ∀ρ∈St(A)⟹C=C′.$
###### (5)

(see lemma 14 of Ref. [22] for the proof). Note that Eq. (5) does not hold for quantum theory on real Hilbert spaces [22].

###### 5. Pure conditioning

The fifth axiom states how the outcomes of a measurement on one side of a pure bipartite state can induce pure states on the other side. In this case we consider atomic measurements, that is, measurements described by observation-tests ${ai}i∈X$ where each effect $ai$ is atomic. Intuitively, atomic measurements are those with maximum “resolving power.”

Axiom 5 (Pure conditioning). If a bipartite system is in a pure state, then each outcome of an atomic measurement on one side induces a pure state on the other.

The pure conditioning property holds in quantum theory and in classical information theory as well. In fact, the statement is trivial in classical information theory, because the only pure bipartite states are the product of pure states: no matter which measurement is performed on one side, the remaining state on the other side will necessarily be pure.

The pure conditioning property, as formulated above, has been recently introduced in Ref. [37]. A stronger version of axiom 5 is the atomicity of composition introduced in Ref. [20]:

• 5′ Atomicity of composition: the sequential composition of two atomic operations is atomic.

Since pure states and atomic effects are a particular case of atomic transformations, axiom 5 $′$ implies axiom 5. In our derivation, however, also the converse implication holds: indeed, thanks to the purification postulate we will be able to show that axiom 5 implies axiom 5 $′$ (see lemma 16).

###### B. The purification postulate

The last postulate in our list is the purification postulate, which was introduced and explored in detail in Ref. [22]. While the previous axioms were also satisfied by classical probability theory, the purification axiom introduces in our derivation the genuinely quantum features. A purification of the state $ρ∈St1(A)$ is a pure state $Ψρ$ of some composite system $AB$, with the property that $ρ$ is the marginal of $Ψρ$, that is,

$(e|_B\,|\Psi_\rho)_{AB}=|\rho)_A.$

Here we refer to the system $B$ as the purifying system. The purification axiom states that every state can be obtained as the marginal of a pure bipartite state in an essentially unique way.

Postulate 1 (Purification). Every state has a purification. For fixed purifying system, every two purifications of the same state are connected by a reversible transformation on the purifying system.

Informally speaking, our postulate states that the ignorance about a part is always compatible with a maximal knowledge of the whole. The existence of pure bipartite states with mixed marginal was already recognized by Schrödinger as the characteristic trait of quantum theory [23]. Here, however, we also emphasize the importance of the uniqueness of purification up to reversible transformations: this property sets up a relation between pure states and reversible transformations that generates most of the structure of quantum theory. As shown in Ref. [22], an impressive number of quantum features are actually direct consequences of purification. In particular, purification implies the possibility of simulating any irreversible process through a reversible interaction of the system with an environment that is finally discarded.
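In quantum theory a purification can be written down explicitly. The sketch below assumes the qubit state $ρ=diag(p,1−p)$ and stores a bipartite pure state as an amplitude matrix `psi[a][b]`; it checks that the marginal on $A$ is $ρ$ and that a unitary acting on the purifying system yields another purification of the same state. The value of `p`, the choice of the Hadamard unitary, and the helper names are illustrative assumptions.

```python
import math

def marginal_on_A(psi):
    """Reduced state on A: rho[a][a2] = sum_b psi[a][b] * conj(psi[a2][b])."""
    d_a, d_b = len(psi), len(psi[0])
    return [[sum(psi[a][b] * complex(psi[a2][b]).conjugate() for b in range(d_b))
             for a2 in range(d_a)] for a in range(d_a)]

def apply_on_B(u, psi):
    """Amplitudes of (I_A tensor U)|psi>: new[a][b2] = sum_b U[b2][b] * psi[a][b]."""
    return [[sum(u[b2][b] * psi[a][b] for b in range(len(u)))
             for b2 in range(len(u))] for a in range(len(psi))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

p = 0.25
rho = [[p, 0.0], [0.0, 1.0 - p]]

# Purification sqrt(p)|00> + sqrt(1-p)|11> of rho.
psi = [[math.sqrt(p), 0.0], [0.0, math.sqrt(1.0 - p)]]

# A second purification, obtained by a reversible transformation on B.
h = 1.0 / math.sqrt(2.0)
hadamard = [[h, h], [h, -h]]
psi_rotated = apply_on_B(hadamard, psi)

print(close(marginal_on_A(psi), rho))
print(close(marginal_on_A(psi_rotated), rho))
```

The check covers only one direction of the uniqueness statement (reversible transformations on the purifying system preserve the marginal); the converse is the nontrivial content of the postulate.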

##### IV. FIRST CONSEQUENCES OF THE PRINCIPLES
###### A. Results about ideal compression

Let $ρ∈St1(A)$ be a state and let $E∈Transf(A,C)$ [or $D∈Transf(C,A)$] be its encoding (or decoding) in the ideal compression scheme of axiom 3.

Essentially the encoding operation $E∈Transf(A,C)$ identifies the face $Fρ$ with the state space $St1(C)$. In the following we provide a list of elementary lemmas showing that all statements about $Fρ$ can be translated into statements about $St1(C)$ and vice versa.

Lemma 2. The composition of decoding and encoding is the identity on $C$, namely $ED=IC$.

Proof. Since the compression is maximally efficient, for every state $τ∈St1(C)$ there is a state $σ∈Fρ$ such that $Eσ=τ$. Using the fact that $DEσ=σ$ (the compression is lossless) we then obtain $EDτ=EDEσ=Eσ=τ$. By local distinguishability [see Eq. (5)], this implies $ED=IC$. $▪$

Lemma 3. The image of $St1(C)$ under the decoding operation $D$ is $Fρ$.

Proof. Since the compression is maximally efficient, for all $τ∈St1(C)$ there exists $σ∈Fρ$ such that $τ=Eσ$. Then, $Dτ=DEσ=σ$. This implies that $D[St1(C)]⊆Fρ$. On the other hand, since the compression is lossless, for every state $σ∈Fρ$ one has $DEσ=σ$. This implies the inclusion $Fρ⊆D[St1(C)]$. $▪$

Lemma 4. If the state $ϕ∈Fρ$ is pure, then the state $Eϕ∈St1(C)$ is pure. If the state $ψ∈St1(C)$ is pure, then the state $Dψ∈Fρ$ is pure.

Proof. Suppose that $ϕ∈Fρ$ is pure and that $Eϕ$ can be written as $Eϕ=pσ+(1−p)τ$ for some $p>0$ and some $σ,τ∈St1(C)$. Applying $D$ on both sides we obtain $ϕ=pDσ+(1−p)Dτ$. Since $ϕ$ is pure we must have $Dσ=Dτ=ϕ$. Now, applying $E$ on all terms of the equality and using lemma 2 we obtain $σ=τ=Eϕ$. This proves that $Eϕ$ is pure. Conversely, suppose that $ψ∈St1(C)$ is pure and $Dψ=pσ+(1−p)τ$ for some $p>0$ and some $σ,τ∈St1(A)$. Since $Dψ$ is in the face $Fρ$ (lemma 3), also $σ$ and $τ$ are in the same face. Applying $E$ on both sides of the equality $Dψ=pσ+(1−p)τ$ and using lemma 2 we obtain $ψ=EDψ=pEσ+(1−p)Eτ$. Since $ψ$ is pure we must have $Eσ=Eτ=ψ$. Applying $D$ on all terms of the equality we then have $σ=τ=Dψ$, thus proving that $Dψ$ is pure. $▪$

We say that a state $σ∈Fρ$ is completely mixed relative to the face $Fρ$ if every state $τ∈Fρ$ can stay in the convex decomposition of $σ$. In other words, $σ$ is completely mixed relative to $Fρ$ if one has $Fσ=Fρ$. Note that in general $σ∈Fρ$ implies $Fσ⊆Fρ$.

We then have the following.

Lemma 5. If the state $ω∈Fρ$ is completely mixed relative to $Fρ$, then the state $Eω∈St1(C)$ is completely mixed. If the state $υ∈St1(C)$ is completely mixed, then the state $Dυ∈Fρ$ is completely mixed relative to $Fρ$.

Proof. Suppose that $ω$ is completely mixed relative to $Fρ$. Then every state $σ∈Fρ$ can stay in its convex decomposition, say $ω=pσ+(1−p)σ′$ with $p>0$ and $σ′∈Fρ$. Applying $E$ we have

$Eω=pEσ+(1−p)Eσ′.$
###### (6)

Since the compression is maximally efficient, for every state $τ∈St1(C)$ there exists a state $σ∈Fρ$ such that $τ=Eσ$. Choosing the suitable $σ∈Fρ$ and substituting $τ$ to $Eσ$ in Eq. (6) we then obtain that for every state $τ∈St1(C)$ there exists probability $p>0$ and a state $σ′∈Fρ$ such that

$Eω=pτ+(1−p)Eσ′.$

This implies that $Eω$ is completely mixed. Suppose now that $υ∈St1(C)$ is completely mixed. Then every state $τ∈St1(C)$ can stay in its convex decomposition, say $υ=pτ+(1−p)τ′$ with $p>0$ and $τ′∈St1(C)$. Applying $D$ on both sides we have

$Dυ=pDτ+(1−p)Dτ′.$
###### (7)

Now, using lemma 3 we have that every state $σ∈Fρ$ can be written as $σ=Dτ$ for some $τ∈St1(C)$. Choosing the suitable $τ∈St1(C)$ and substituting $σ$ to $Dτ$ in Eq. (7) we then obtain that for every state $σ∈Fρ$ there exists a probability $p>0$ and a state $τ′∈St1(C)$ such that $Dυ=pσ+(1−p)Dτ′$. Therefore, $Dυ$ is completely mixed relative to $Fρ$. $▪$

We now show that the system $C$ used for ideal compression of the state $ρ$ is unique up to operational equivalence.

Lemma 6. If two systems $C$ and $C′$ allow for ideal compression of a state $ρ∈St1(A)$, then $C$ and $C′$ are operationally equivalent.

Proof. Let $E,D$ and $E′,D′$ denote the encoding and decoding schemes for systems $C$ and $C′$, respectively. Define the transformations $U:=E′D∈Transf(C,C′)$ and $V=ED′∈Transf(C′,C)$. It is easy to see that $U$ is reversible and $U−1=V$. Indeed, since the restriction of $D′E′$ and $DE$ to the face $Fρ$ is the identity, using lemma 3 one has $D′E′D=D$ and similarly $DED′=D′$. Hence we have $UV=E′DED′=E′D′=IC′$ and $VU=ED′E′D=ED=IC$. $▪$
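The two identities used in this proof can be spelled out step by step. The following is a sketch that uses only lemma 2, lemma 3, and the lossless property; calligraphic letters denote transformations, and the passage from equality on all states of $C$ to equality of transformations relies on Eq. (5).

```latex
\begin{align*}
\mathcal D'\mathcal E'\mathcal D\,\tau
  &= \mathcal D'\mathcal E'(\mathcal D\tau) = \mathcal D\tau
  \quad \forall\,\tau\in \mathrm{St}_1(C)
  && \text{since } \mathcal D\tau\in F_\rho \text{ (lemma 3) and } \mathcal D'\mathcal E' =_\rho \mathcal I_A,\\
\mathcal D'\mathcal E'\mathcal D &= \mathcal D
  && \text{by Eq. (5)},\\
\mathcal U\mathcal V
  &= \mathcal E'(\mathcal D\mathcal E\mathcal D')
   = \mathcal E'\mathcal D' = \mathcal I_{C'}
  && \text{using } \mathcal D\mathcal E\mathcal D'=\mathcal D' \text{ and lemma 2 for } C',\\
\mathcal V\mathcal U
  &= \mathcal E(\mathcal D'\mathcal E'\mathcal D)
   = \mathcal E\mathcal D = \mathcal I_{C}
  && \text{using } \mathcal D'\mathcal E'\mathcal D=\mathcal D \text{ and lemma 2 for } C.
\end{align*}
```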

It is useful to introduce the notion of equality upon input of $ρ$. We say that two transformations $A,A′∈Transf(A,B)$ are equal upon input of $ρ∈St(A)$ if their restrictions to the face identified by $ρ$ are equal, that is, if $Aσ=A′σ$ for every $σ∈Fρ$. If $A$ and $A′$ are equal upon input of $ρ$ we write $A=ρA′$.

Using the notion of equality upon input of $ρ$ we can rephrase the fact that the compression is lossless for $ρ$ as $DE=ρIA$. Similarly, we can state the following.

Lemma 7. The encoding $E$ is deterministic upon input of $ρ$, that is $(eC|E=ρ(eA|$.

Proof. For every $σ∈Fρ$ we have $(eC|E|σ)⩾(eA|DE|σ)=(eA|σ)=1$, having used Eq. (4) and the fact that the compression is lossless. Since probabilities are bounded by 1, this implies $(eC|E|σ)=(eA|σ)$ for every $σ∈Fρ$, that is, $(eC|E=ρ(eA|$. $▪$

A similar result holds for the decoding.

Lemma 8. The decoding $D$ is deterministic, that is $(eA|D=(eC|$.

Proof. For every $τ∈St1(C)$ we have $(eA|D|τ)⩾(eC|ED|τ)=(eC|τ)$, having used Eq. (4) for the encoding $E$ and lemma 2. On the other hand, Eq. (4) applied to $D$ gives $(eA|D|τ)⩽(eC|τ)$. Hence $(eA|D=(eC|$. $▪$

###### B. Results about purification

The purification postulate 1 implies a large number of quantum features, as shown in Ref. [22]. Here we review only the facts that are useful for our derivation, referring to Ref. [22] for the proofs.

An elementary consequence of the uniqueness of purification is that the group $GA$ of reversible transformations on $A$ acts transitively on the set of pure states.

Lemma 9 (Transitivity on pure states). For every pair of pure states $ϕ,ϕ′∈St1(A)$ there is a reversible transformation $U∈GA$ such that $ϕ′=Uϕ$.

Proof. See lemma 20 of Ref. [22]. $▪$

Transitivity implies that for every system $A$ there is a unique state $χA∈St1(A)$ that is invariant under reversible transformations, that is, a unique state such that $UχA=χA$ for every $U∈GA$.

Lemma 10 (Uniqueness of the invariant state). For every system $A$, there is a unique state $χA$ invariant under all reversible transformations in $GA$. The invariant state has the following properties:

• (1) $χA$ is completely mixed

• (2) $χAB=χA⊗χB$.

Proof. See corollary 34 and theorem 4 of Ref. [22]. The proof of item 2 uses the local distinguishability axiom. $▪$
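In quantum theory the invariant state is the maximally mixed state $I/d$, and both properties of lemma 10 can be checked directly. The sketch below uses two sample unitaries (Hadamard and a phase gate), which are illustrative assumptions; invariance under two unitaries is of course only a spot check, not a proof of invariance under the whole group.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(m):
    """Conjugate transpose."""
    return [[complex(m[r][c]).conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def maximally_mixed(d):
    """The state I/d on a d-dimensional Hilbert space."""
    return [[(1.0 / d if i == j else 0.0) for j in range(d)] for i in range(d)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

h = 1.0 / math.sqrt(2.0)
hadamard = [[h, h], [h, -h]]
phase = [[1, 0], [0, 1j]]

chi = maximally_mixed(2)
rotated_by_hadamard = matmul(matmul(hadamard, chi), dagger(hadamard))
rotated_by_phase = matmul(matmul(phase, chi), dagger(phase))
chi_AB = kron(chi, chi)

print(close(rotated_by_hadamard, chi), close(rotated_by_phase, chi))
print(close(chi_AB, maximally_mixed(4)))
```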

When there is no ambiguity we will drop the subindex $A$ and simply write $χ$.

The uniqueness of purification in postulate 1 requires that if $Ψρ,Ψρ′∈St1(AB)$ are two purifications of $ρ∈St1(A)$, then there exists a reversible transformation $U∈GB$ such that $Ψρ′=(IA⊗U)Ψρ$. The following lemma extends the uniqueness property to purifications with different purifying systems.

Lemma 11 (Uniqueness of the purification up to channels on the purifying systems). Let $Ψ∈St1(AB)$ and $Ψ′∈St1(AC)$ be two purifications of $ρ∈St1(A)$. Then there exists a channel $C∈Transf(B,C)$ such that

$|\Psi')_{AC}=(\mathcal I_A\otimes\mathcal C)\,|\Psi)_{AB}.$

Proof. See lemma 21 of Ref. [22]. $▪$

Another consequence of the uniqueness of purification is the fact that any ensemble decomposition of a given mixed state can be obtained by performing a measurement on the purifying system.

Lemma 12 (Purification of preparation-tests). Let $ρ∈St1(A)$ be a state and $Ψρ∈St1(AB)$ be a purification of $ρ$. If ${ρi}i∈X$ is a preparation test such that $∑i∈Xρi=ρ$, then there exists an observation-test ${ai}i∈X$ on the purifying system such that

$|\rho_i)_A=(a_i|_B\,|\Psi_\rho)_{AB}\qquad\forall\,i\in X.$

Proof. See lemma 8 of Ref. [22]. $▪$

An easy consequence is the following.

Corollary 1. If $Ψρ∈St1(AB)$ is a purification of $ρ∈St1(A)$ and $σ$ belongs to the face $Fρ$, then there exists an effect $b$ on the purifying system $B$ and a nonzero probability $p>0$ such that

$(b|_B\,|\Psi_\rho)_{AB}=p\,|\sigma)_A.$

An important consequence of purification and local distinguishability is the relation between equality upon input of $ρ$ and equality on the purifications of $ρ$.

Theorem 1 (Equality upon input of $ρ$ vs equality on purifications of $ρ$). Let $Ψ∈St1(AC)$ be a purification of $ρ∈St1(A)$, and let $A,A′∈Transf(A,B)$ be two transformations. Then one has

$(A⊗IC)Ψρ=(A′⊗IC)Ψρ⟺A=ρA′.$

Proof. See theorem 1 of Ref. [22]. The proof of the direction $⟸$ uses the local distinguishability axiom. $▪$

As a consequence, the purification of a completely mixed state allows for the tomography of transformations:

Corollary 2. Let $ω∈St1(A)$ be completely mixed and let $Ψω∈St1(AC)$ be a purification of $ω$. Then, for all transformations $A,A′∈Transf(A,B)$ one has

$(A⊗IC)Ψω=(A′⊗IC)Ψω⟺A=A′.$

Proof. By theorem 1 the first condition is equivalent to $A=ωA′$. Since $ω$ is completely mixed, this means $Aσ=A′σ$ for every $σ∈St1(A)$. By local distinguishability [see Eq. (5)] this implies $A=A′$. $▪$

Corollary 2 shows that the state $(A⊗IC)Ψω$ characterizes the transformation $A$ completely. We will express this fact by saying that the state $Ψω$ is dynamically faithful [20], or just faithful, for short. Using this notion we can rephrase corollary 2.

Corollary 3. If $Ψ∈St1(AC)$ is pure and its marginal on system $A$ is completely mixed, then $Ψ$ is dynamically faithful for system $A$.

Let us choose a fixed faithful state for system $A$, say $Ψ∈St1(AC)$. Then for every transformation $C∈Transf(A,B)$ we can define the Choi state $RC∈St(BC)$ as

$R_{\mathcal C}:=(\mathcal C\otimes\mathcal I_C)\,\Psi.$

We then have the following.

Theorem 2 (Choi isomorphism). For a given faithful state $Ψ∈St1(AC)$ the map $C↦RC:=(C⊗IC)Ψ$ has the following properties:

• (1) It defines a bijective correspondence between tests ${Ci}i∈X$ from $A$ to $B$ and collections of states ${Ri}i∈X$ for $BC$ satisfying

$\sum_{i\in X}(e|_B\,|R_i)_{BC}=(e|_A\,|\Psi)_{AC}.$
• (2) The transformation $C$ is atomic if and only if the corresponding state $RC$ is pure.

Proof. See theorem 17 of Ref. [22]. $▪$
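Item 2 can be illustrated in quantum theory, where the Choi state of a map with Kraus operators $Ki$ is $R=∑i|vi⟩⟨vi|$ with $|vi⟩=(Ki⊗I)|Φ⟩$. The sketch below assumes the faithful state $|Φ⟩=(|00⟩+|11⟩)/√2$, whose marginal on $A$ is completely mixed, and tests purity of the (possibly unnormalized) Choi state through $Tr(R^2)=(Tr R)^2$, evaluated from the Gram matrix of the vectors $vi$. The example channels and helper names are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inner(u, v):
    """Hilbert-space inner product <u|v>."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def choi_vectors(kraus_ops, phi):
    """Vectors v_i = vec(K_i phi), where phi[a][c] are the amplitudes of |Phi>."""
    return [[entry for row in matmul(k, phi) for entry in row] for k in kraus_ops]

def choi_trace_and_purity(kraus_ops, phi):
    """Return (Tr R, Tr R^2) for R = sum_i |v_i><v_i|."""
    vectors = choi_vectors(kraus_ops, phi)
    tr = sum(inner(v, v).real for v in vectors)
    tr_sq = sum(abs(inner(u, v)) ** 2 for u in vectors for v in vectors)
    return tr, tr_sq

def is_atomic(kraus_ops, phi, tol=1e-12):
    """A positive operator R is rank one exactly when Tr(R^2) = (Tr R)^2."""
    tr, tr_sq = choi_trace_and_purity(kraus_ops, phi)
    return abs(tr_sq - tr * tr) < tol

h = 1.0 / math.sqrt(2.0)
phi = [[h, 0.0], [0.0, h]]  # amplitudes of (|00> + |11>)/sqrt(2)

hadamard = [[[h, h], [h, -h]]]                        # unitary channel
s = math.sqrt(0.5)
dephasing = [[[s, 0.0], [0.0, s]], [[s, 0.0], [0.0, -s]]]  # full dephasing
redundant = [[[s, 0.0], [0.0, s]], [[s, 0.0], [0.0, s]]]   # identity, written twice

print(is_atomic(hadamard, phi))
print(is_atomic(dephasing, phi))
print(is_atomic(redundant, phi))
```

The last example shows why the criterion is stated on the Choi state rather than on a particular Kraus decomposition: two proportional Kraus operators still describe an atomic map.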

A simple consequence of the Choi isomorphism is the following.

Corollary 4. Let ${Ci}i∈X⊂Transf(A,B)$ be a collection of transformations. Then, ${Ci}i∈X$ is a test if and only if

$\sum_{i\in X}(e_B|\,\mathcal C_i=(e_A|.$

In particular, let ${ai}i∈X⊂Eff(A)$ be a collection of effects. Then, ${ai}i∈X$ is an observation test if and only if

$∑i∈Xai=e.$
###### (8)

Proof. Apply item 1 of theorem 2 to the collection of states ${Ri}i∈X$ defined by $Ri:=(Ci⊗IC)Ψ$. $▪$

A much deeper consequence of the Choi isomorphism is the following theorem.

Theorem 3 (States specify the theory)

Let $Θ,Θ′$ be two theories satisfying the purification postulate. If $Θ$ and $Θ′$ have the same sets of normalized states, then $Θ′=Θ$.

Proof. See theorem 19 of Ref. [22]. $▪$

Thanks to theorem 3, to derive quantum theory we only need to prove that our principles imply that for every system $A$ the normalized states $St1(A)$ can be described as positive Hermitian matrices with unit trace. Once this is proved, theorem 3 automatically ensures that all the dynamics and all the measurements allowed by the theory are exactly the dynamics and the measurements allowed in quantum theory.

Note that in the definition of the Choi state we left the freedom to choose the faithful state $Ψ∈St1(AC)$. Among many possibilities, one convenient choice is to take a faithful state $Φ∈St1(AC)$ obtained as a purification of the invariant state $χ∈St1(A)$. Moreover, as we will see in the next paragraph, we can always choose the purifying system $C$ in such a way that the marginal on $C$ is completely mixed.

###### C. Results about the combination of compression and purification

An important consequence of the combination of the purification postulate with the compression axiom is the fact that one can always choose a purification of $ρ$ such that the marginal state on the purifying system is completely mixed. To prove this result we need the following lemma.

Lemma 13. Let $ρ∈St1(A)$ be a state and let $Ψρ∈St1(AB)$ be a purification of $ρ$. If $E∈Transf(A,C)$ is the encoding operation in the compression scheme of axiom 3, then the state $Ψρ′:=(E⊗IB)Ψρ$ is pure.

Proof. Let $D∈Transf(C,A)$ be the decoding operation. Since the compression is lossless for $ρ$ we know that $DE=ρIA$. By theorem 1 this is equivalent to the condition $(DE⊗IB)Ψρ=Ψρ$. Now, suppose that $(E⊗IB)Ψρ=∑i∈XΓi$. Applying $D⊗IB$ on both sides we then obtain $Ψρ=∑i∈X(D⊗IB)Γi$, and, since $Ψρ$ is pure, for every $i∈X$ we must have $(D⊗IB)Γi=piΨρ$, where $pi⩾0$ is some probability. Finally, applying $E⊗IB$ on both sides and using $ED=IC$ (lemma 2), one has $Γi=pi(E⊗IB)Ψρ$. Hence, $(E⊗IB)Ψρ$ admits only decompositions with $Γi=pi(E⊗IB)Ψρ$, that is, $(E⊗IB)Ψρ$ is pure. $▪$

We are now in position to prove the desired result.

Theorem 4. For every state $ρ∈St1(A)$ there exists a system $C$ and a purification $Ψρ∈St1(AC)$ of $ρ$ such that the marginal state on system $C$ is completely mixed. Moreover, the system $C$ is unique up to operational equivalence.

Proof. Take an arbitrary purification of $ρ$, say $Φρ∈St1(AB)$ for some purifying system $B$. Define the marginal state on system $B$ as $|θ)B:=(e|A|Φρ)AB$ and define the state $Ψρ:=(IA⊗E)Φρ$, where $E∈Transf(B,C)$ is the encoding operation for state $θ$. By lemma 13 we know that $Ψρ∈St(AC)$ is pure. Using lemma 7 and theorem 1 we obtain $(eC||Ψρ)=[(eC|E]|Φρ)=(eB||Φρ)=|ρ)$, that is, $Ψρ$ is a purification of $ρ$. Finally, the marginal on system $C$ is given by $ρ̃=Eθ$, which by lemma 5 is completely mixed. This proves the first part of the thesis. It remains to show that the system $C$ is uniquely defined up to operational equivalence. Suppose that $Ψρ′∈St(AC′)$ is another purification of $ρ$ with the property that the marginal on system $C′$ is completely mixed. Since $Ψρ$ and $Ψρ′$ are two purifications of the same state, there must be two channels $C∈Transf(C,C′)$ and $R∈Transf(C′,C)$ such that $Ψρ′=(IA⊗C)Ψρ$ and $Ψρ=(IA⊗R)Ψρ′$ (lemma 11). Combining the two equalities one obtains $Ψρ=(IA⊗RC)Ψρ$. Now, the marginal of $Ψρ$ on system $C$ is completely mixed, and this implies that $Ψρ$ is faithful for system $C$ (corollary 3). Hence we have $RC=IC$. Repeating the same argument for $Ψρ′$ we obtain $CR=IC′$. Therefore, $C$ is reversible and $R=C−1$. This proves that $C$ and $C′$ are operationally equivalent. $▪$

The following facts will also be useful.

Corollary 5. Let $Ψρ∈St1(AB)$ be a purification of $ρ∈St1(A)$ and let $E∈Transf(A,C)$ be the encoding for $ρ$. Then, the state $(E⊗IB)Ψρ∈St1(CB)$ is dynamically faithful for $C$.

Proof. The marginal of $(E⊗IB)Ψρ$ on system $C$ is $Eρ$, which is completely mixed by lemma 5. Hence, $(E⊗IB)Ψρ$ is dynamically faithful by corollary 3. $▪$

Lemma 14. The decoding transformation $D∈Transf(C,A)$ in the ideal compression for $ρ∈St1(A)$ is atomic.

Proof. Let $Ψρ∈St1(AB)$ be a purification of $ρ$, for some purifying system $B$. Since $DE=ρIA$ (the compression is lossless), we have $(DE⊗IB)|Ψρ)=|Ψρ)$ (theorem 1). Now, by corollary 5 $(E⊗IB)|Ψρ)$ is faithful for $C$ and by lemma 13 $(E⊗IB)|Ψρ)$ is pure. Using the Choi isomorphism with the faithful state $Ψ:=(E⊗IB)Ψρ$ we then obtain that $D$ is atomic. $▪$

###### D. Teleportation and the link product

For every system $A$ one can choose a completely mixed state $ωA$ and a purification $Ψ(A)∈St(AÃ)$ such that the marginal on system $Ã$ is completely mixed (cf. theorem 4). Any such purification allows for a probabilistic teleportation scheme:

Lemma 15 (Probabilistic teleportation). There exists an atomic effect $E(A)∈Eff(ÃA)$ and a nonzero probability $pA$ such that

and

Proof. See corollary 19 of Ref. [22]. $▪$

Let us choose $\Psi^{(\mathrm{A})}$ to be the faithful state in the definition of the Choi isomorphism. Then the sequential composition of transformations induces a composition of Choi states in the following way.

Corollary 6 (Link product). For two transformations $C∈Transf(A,B)$ and $D∈Transf(B,C)$ the Choi state of $DC∈Transf(A,C)$ is given by the link product

###### (9)

Proof. See corollary 22 of Ref. [22]. $▪$

We conclude this paragraph with an important result that follows from the combination of the link product structure with the pure conditioning axiom.

Lemma 16 (Atomicity of composition). The composition of two atomic transformations is atomic.

Proof. Let $C∈Transf(A,B)$ and $D∈Transf(B,C)$ be two atomic transformations. By the Choi isomorphism, the (unnormalized) states $RC$ and $RD$ are pure. Since the teleportation effect $E(B)$ in Eq. (9) is atomic (lemma 15), the pure conditioning axiom 5 implies the state $RDC$ is pure. By the Choi isomorphism this means that $DC$ is atomic. $▪$

###### E. No information without disturbance

We say that a test ${Ci}i∈X⊂Transf(A)$ is nondisturbing upon input of $ρ$ if $∑i∈XCi=ρIA$. If $ρ$ is completely mixed, we simply say that the test is nondisturbing.

A consequence of the purification postulate is the following “no-information without disturbance” result.

Lemma 17 (No information without disturbance). A test ${Ci}i∈X⊂Transf(A)$ is nondisturbing upon input of $ρ$ if and only if there is a set of probabilities ${pi}i∈X$ such that $Ci=ρpiIA$ for every $i∈X$.

Proof. See theorem 10 of Ref. [22]. $▪$

The no-information without disturbance result implies the following geometrical limitation.

Corollary 7. For every system $A$ the convex set of states $St1(A)$ is not a segment.

Proof. The proof is by contradiction. Suppose that for some system $A$ the set $St1(A)$ is a segment. The segment has only two pure states, say $ϕ1$ and $ϕ2$, and every other state $ρ∈St1(A)$ is completely mixed. Then the distinguishability axiom 2 imposes that $ϕ1$ and $ϕ2$ are perfectly distinguishable. Take the binary test ${a1,a2}$ such that $(ai|ϕj)=δij$ and define the “measure-and-prepare” test ${C1,C2}$ as $Ci=|ϕi)(ai|$, $i=1,2$ (the possibility of preparing a state depending on the outcome of a previous measurement is guaranteed by causality [22]). Since every state $ρ$ in the segment can be written as convex combination of the two extreme points, we have that the test ${C1,C2}$ is nondisturbing: $(C1+C2)ρ=ρ$ for every $ρ$. This is in contradiction with lemma 17 because $C1$ and $C2$ are not proportional to the identity. $▪$
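The contradiction in the proof above can be made concrete with a classical bit, whose normalized state space is exactly a segment. The sketch below is only an illustration under stated assumptions: states are written as probability pairs, transformations as 2×2 matrices, and the helper names `apply_map`, `add_maps`, and `is_proportional_to_identity` are hypothetical and not part of the framework of the paper.

```python
from fractions import Fraction

# Assumption: a classical bit, with states written as probability pairs
# (p0, p1) and transformations as 2x2 matrices acting on those pairs.

def apply_map(matrix, state):
    """Apply a 2x2 matrix to a 2-component state."""
    return tuple(sum(matrix[r][c] * state[c] for c in range(2)) for r in range(2))

def add_maps(m1, m2):
    """Entrywise sum of two 2x2 matrices (coarse-graining of a test)."""
    return [[m1[r][c] + m2[r][c] for c in range(2)] for r in range(2)]

def is_proportional_to_identity(matrix):
    """True if the matrix equals p * identity for some number p."""
    return matrix[0][1] == 0 and matrix[1][0] == 0 and matrix[0][0] == matrix[1][1]

# Measure-and-prepare branches C_i = |phi_i)(a_i| with (a_i|phi_j) = delta_ij.
C1 = [[1, 0], [0, 0]]
C2 = [[0, 0], [0, 1]]

rho = (Fraction(1, 3), Fraction(2, 3))   # a generic point of the segment
coarse_grained = add_maps(C1, C2)        # C1 + C2 leaves every state unchanged
# Probability of each outcome: total weight of the unnormalized output state.
outcome_probabilities = tuple(sum(apply_map(C, rho)) for C in (C1, C2))
```

In this toy model the test leaves every state of the segment untouched while its outcome probabilities reproduce the entries of `rho`, so it extracts information without disturbance. Lemma 17 rules this out, which is why a segment cannot occur as a state space in the theory.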

We know that no information can be extracted without disturbance. In the following we will prove a result in the converse direction: if a measurement extracts no information, then it can be realized in a nondisturbing fashion. To show this result we first need the following.

Lemma 18. For every observation test ${ai}i∈X⊂Eff(A)$ with finite outcome set $X$ there is a system $C$ and a test ${Ai}i∈X⊂Transf(A,C)$ consisting of atomic transformations such that $(ai|=(eC|Ai$.

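Proof. Let $|\Psi)_{\mathrm{AB}}$ be a pure faithful state for system $\mathrm{A}$ and let $|R_i)_{\mathrm{B}}=(a_i|_{\mathrm{A}}|\Psi)_{\mathrm{AB}}$ be the Choi state of $a_i$. Take a purification of $R_i$, say $|\Psi_i)_{\mathrm{CB}}$ for some purifying system $\mathrm{C}$ [38]. Then, by the Choi isomorphism there is a test $\{\mathcal{A}_i\}_{i\in X}$, with input $\mathrm{A}$ and output $\mathrm{C}$, such that

$(\mathcal{A}_i\otimes\mathcal{I}_{\mathrm{B}})|\Psi)_{\mathrm{AB}}=|\Psi_i)_{\mathrm{CB}}$

(see item 1 of theorem 2). Moreover, each transformation $\mathcal{A}_i:\mathrm{A}\to\mathrm{C}$ is atomic (item 2 of theorem 2). Applying the deterministic effect $(e|_{\mathrm{C}}$ on both sides we then obtain $|R_i)_{\mathrm{B}}=(e|_{\mathrm{C}}|\Psi_i)_{\mathrm{CB}}=(e|_{\mathrm{C}}\mathcal{A}_i|\Psi)_{\mathrm{AB}}$. By definition of $R_i$, this implies $(a_i|_{\mathrm{A}}|\Psi)_{\mathrm{AB}}=(e|_{\mathrm{C}}\mathcal{A}_i|\Psi)_{\mathrm{AB}}$, and, since $\Psi$ is dynamically faithful, $(a_i|_{\mathrm{A}}=(e|_{\mathrm{C}}\mathcal{A}_i$. $▪$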

Theorem 5. Let $ρ∈St1(A)$ be a state, $a∈Eff(A)$ be an effect, and $A∈Transf(A,B)$ be an atomic transformation such that $(a|A=(e|BA$. If $(a|=ρp(e|$ for some $p⩾0$, then there exists a channel $C∈Transf(B,A)$ such that $CA=ρpIA$.

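Proof. Consider a purification of $\rho$, say $\Psi_\rho\in\mathrm{St}(\mathrm{AC})$, and define the state $\Sigma\in\mathrm{St}_1(\mathrm{BC})$ by $|\Sigma):=\frac{1}{p}(\mathcal{A}\otimes\mathcal{I}_{\mathrm{C}})|\Psi_\rho)$. By the atomicity of composition (lemma 16) the state $\Sigma$ is pure. Moreover, we have

$(e|_{\mathrm{B}}|\Sigma)_{\mathrm{BC}}=\frac{1}{p}(a|_{\mathrm{A}}|\Psi_\rho)_{\mathrm{AC}}=(e|_{\mathrm{A}}|\Psi_\rho)_{\mathrm{AC}},$

having used theorem 1 in the last equality. This implies that $\Psi_\rho$ and $\Sigma$ are different purifications of the same mixed state on system $\mathrm{C}$. Then, by lemma 11 there exists a channel $\mathcal{C}\in\mathrm{Transf}(\mathrm{B},\mathrm{A})$ such that $|\Psi_\rho)=(\mathcal{C}\otimes\mathcal{I}_{\mathrm{C}})|\Sigma)=\frac{1}{p}(\mathcal{C}\mathcal{A}\otimes\mathcal{I}_{\mathrm{C}})|\Psi_\rho)$. By theorem 1, the last equality implies $\mathcal{C}\mathcal{A}=_\rho p\,\mathcal{I}_{\mathrm{A}}$. $▪$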

We now make a simple observation that combined with theorem 5 will lead to some interesting consequences.

Lemma 19. If $(a|ρ)=∥a∥$, then $a=ρ∥a∥e$. Similarly, if $(a|ρ)=0$, then $a=ρ0$.

Proof. By definition, $\sigma\in F_\rho$ iff there exist $p>0$ and $\tau\in\mathrm{St}_1(\mathrm{A})$ such that $\rho=p\sigma+(1-p)\tau$. If $(a|\rho)=\|a\|$, then we have $\|a\|=p(a|\sigma)+(1-p)(a|\tau)$. Since $(a|\sigma)$ and $(a|\tau)$ cannot be larger than $\|a\|$, the only way to have the equality is to have $(a|\sigma)=(a|\tau)=\|a\|$. By definition, this amounts to saying that $a=_\rho\|a\|e$. Similarly, if $(a|\rho)=0$, one has $0=p(a|\sigma)+(1-p)(a|\tau)$, which is satisfied only if $(a|\sigma)=(a|\tau)=0$, that is, if $a=_\rho 0$. $▪$

As a consequence, we have the following.

Corollary 8. Let $ρ∈St1(A)$ be a state, $a∈Eff(A)$ be an effect, and $A∈Transf(A,B)$ be an atomic transformation such that $(a|A=(e|BA$. If $(a|ρ)=1$, then $A$ is correctable upon input of $ρ$, that is, there exists a correction operation $C∈Transf(B,A)$ such that $CA=ρIA$.

Proof. If $(a|ρ)=1$, then clearly $∥a∥=1$. Lemma 19 then implies $(a|=ρ(e|$. Applying theorem 5 we finally obtain the thesis. $▪$

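Corollary 9. Let $\rho\in\mathrm{St}_1(\mathrm{A})$ be a state and let $a\in\mathrm{Eff}(\mathrm{A})$ be an effect such that $(a|\rho)=1$. Then there exists a transformation $\mathcal{C}\in\mathrm{Transf}(\mathrm{A})$ such that $(a|=(e|\mathcal{C}$ and $\mathcal{C}=_\rho\mathcal{I}_{\mathrm{A}}$.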

Proof. Straightforward consequence of lemma 18 and of corollary 8. $▪$

Finally, we say that an observation-test ${ai}i∈X$ is noninformative upon input of $ρ$ if we have $(ai|=ρpi(e|$ for every $i∈X$. This means that the test ${ai}i∈X$ is unable to distinguish the states in the face $Fρ$. As a consequence of theorem 5 we have the following “no disturbance without information” result.

Corollary 10 (No disturbance without information). If the test ${ai}i∈X$ is noninformative upon input of $ρ$ then there is a test ${Di}i∈X⊂Transf(A)$ that is nondisturbing upon input of $ρ$ and satisfies $(e|Di=(ai|$ for every $i∈X$.

Proof. By lemma 18 there exists a test ${Ai}⊂Transf(A,B)$ such that each transformation $Ai$ is atomic and $(e|Ai=(ai|$. By theorem 5, for each $Ai$ there is a correction channel $Ci$ such that $CiAi=ρpiIA$. Defining $Di:=CiAi$ we then obtain the thesis. $▪$

##### V. PERFECTLY DISTINGUISHABLE STATES

In this section we prove some basic facts about perfectly distinguishable states. Let us start from the definition.

Definition 5 (Perfectly distinguishable states). The normalized states ${ρi}i=1N⊆St1(A)$ are perfectly distinguishable if there exists an observation-test ${ai}i=1N$ such that $(aj|ρi)=δij$. The observation-test ${ai}i=1N$ is called perfectly distinguishing.

From the distinguishability axiom 2 it is clear that every nontrivial system has at least two perfectly distinguishable states.

Lemma 20. For every nontrivial system $A$ there are at least two perfectly distinguishable states.

Proof. Let $ϕ$ be a pure state of $A$. Obviously, $ϕ$ is not completely mixed (unless the system $A$ has only one state, that is, unless $A$ is trivial). Hence, by axiom 2 there exists at least a state $σ$ that is perfectly distinguishable from $ϕ$. $▪$

An equivalent condition for perfect distinguishability is the following.

Lemma 21. The states ${ρi}i=1N⊂St1(A)$ are perfectly distinguishable if and only if there exists an observation-test ${ai}i=1N$ such that $(ai|ρi)=1$ for every $i$.

Proof. The condition $(ai|ρi)=1,∀i=1,⋯,N$ is clearly necessary. On the other hand, the condition $(ai|ρi)=1,∀i=1,⋯,N$ implies

$(a_i|\rho_i)=1=\sum_{j=1}^{N}(a_j|\rho_i)=(a_i|\rho_i)+\sum_{j\neq i}(a_j|\rho_i).$

Since all probabilities are nonnegative, we must have $(aj|ρi)=0$ for $i≠j$, and therefore, $(aj|ρi)=δij$. $▪$

A very general fact about state discrimination is expressed by the following.

Lemma 22. If $ρ$ is perfectly distinguishable from $σ$ and $ρ′$ (or $σ′$) belongs to the face identified by $ρ$ (or $σ$), then $ρ′$ is perfectly distinguishable from $σ′$.

Proof. Let ${a,e−a}$ be the binary observation test that distinguishes perfectly between $ρ$ and $σ$. By definition, $a∈Eff(A)$ is such that $(a|ρ)=1$ and $(a|σ)=0$. Now, by lemma 19, $(a|ρ′)=1$ and $(a|σ′)=0$ for all $ρ′∈Fρ$ and $σ′∈Fσ$. $▪$

Thanks to purification and to the local distinguishability axiom 4, we are also in position to show a much stronger result.

Lemma 23. Let $\{\rho_i\}_{i=1}^{N}\subset F_\rho$ and $\{\rho_j\}_{j=N+1}^{N+M}\subset F_\sigma$ be two sets of perfectly distinguishable states. If $\rho$ is perfectly distinguishable from $\sigma$, then the states $\{\rho_i\}_{i=1}^{N+M}$ are perfectly distinguishable.

Proof. Let ${a,eA−a}$ be the observation test such that $(a|ρ)=1$ and $(a|σ)=0$. Now, by corollary 9 there is a transformation $C∈Transf(A)$ such that $(eA|C=(a|$ and $C=ρIA$. Similarly, there exists a transformation $C′∈Transf(A)$ such that $(eA|C′=(eA|−(a|$ and $C′=σIA$. We can then define the following observation test:

$c_i=\begin{cases}a_i\,\mathcal{C}, & i\leqslant N,\\ b_i\,\mathcal{C}', & N+1\leqslant i\leqslant N+M,\end{cases}$

where ${ai}i=1N$ (or ${bj}j=N+1N+M$) is the observation test that perfectly distinguishes among the states ${ρi}i=1N$ (or ${ρj}j=N+1N+M$). By corollary 4 [see in particular Eq. (8)], ${ci}i=1N+M$ is indeed an observation test: each $ci$ is an effect and one has the normalization

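$\sum_{i=1}^{N+M}c_i=\sum_{i=1}^{N}a_i\,\mathcal{C}+\sum_{i=N+1}^{N+M}b_i\,\mathcal{C}'=e_{\mathrm{A}}\mathcal{C}+e_{\mathrm{A}}\mathcal{C}'=a+e_{\mathrm{A}}-a=e_{\mathrm{A}}.$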

Moreover, since $C=ρIA$ and $C′=σIA$, one has $(ci|ρi)=1$ for every $i=1,⋯,M+N$. By lemma 21, this implies that the states ${ρi}i=1N+M$ are perfectly distinguishable. $▪$

Definition 6. A set of perfectly distinguishable states ${ρi}i=1N$ is maximal if there is no state $ρN+1∈St1(A)$ such that the states ${ρi}i=1N+1$ are perfectly distinguishable.

Theorem 6. A set of perfectly distinguishable states ${ρi}i=1N$ is maximal if and only if the state $ω=∑i=1Nρi/N$ is completely mixed.

Proof. We first prove that if $ω$ is completely mixed, then the set ${ρi}i=1N$ must be maximal. Indeed, if there existed a state $ρN+1$ such that ${ρi}i=1N+1$ are perfectly distinguishable, then clearly $ρN+1$ would be distinguishable from $ω$. This is absurd because by proposition 1 no state can be perfectly distinguished from a completely mixed state. Conversely, if ${ρi}i=1N$ is maximal, then $ω$ is completely mixed. If it were not, by the distinguishability axiom 2, $ω$ would be perfectly distinguishable from some state $ρN+1$. By lemma 23, this would imply that the states ${ρi}i=1N+1$ are perfectly distinguishable, in contradiction with the hypothesis that the set ${ρi}i=1N$ is maximal. $▪$
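As an orientation, theorem 6 can be read in ordinary quantum theory. The following worked example assumes the Hilbert-space formalism (which the present derivation has not yet established), a system of Hilbert-space dimension $d$, and $N$ orthonormal vectors $|1\rangle,\dots,|N\rangle$; it also uses the fact that, in quantum theory, the completely mixed states in the sense used here are the full-rank density matrices.

$\omega=\frac{1}{N}\sum_{i=1}^{N}|i\rangle\langle i|,\qquad \mathrm{rank}\,\omega=N.$

$\omega\ \text{has full rank}\iff N=d\iff\text{no unit vector is orthogonal to all of }|1\rangle,\dots,|N\rangle.$

Under these assumptions the set is maximal exactly when the uniform mixture $\omega$ is completely mixed, as the theorem states.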

Lemma 24. Every set of perfectly distinguishable pure states can be extended to a maximal set of perfectly distinguishable pure states.

Proof. Let ${ϕi}i=1N$ be a nonmaximal set of perfectly distinguishable pure states. By definition, there exists a state $σ$ such that ${ϕi}i=1N∪{σ}$ is perfectly distinguishable. Let $ϕN+1$ be a pure state in $Fσ$. By lemma 19 the states ${ϕi}i=1N+1$ will be perfectly distinguishable. Since the dimension of $StR(A)$ is finite and distinguishable states are linearly independent, iterating this procedure one finally obtains a maximal set of pure states in a finite number of steps. $▪$

Corollary 11. Any pure state belongs to a maximal set of perfectly distinguishable pure states.

We conclude this section with a few elementary facts about how the ideal compression of axiom 3 preserves the distinguishability properties. In the following we will choose a state $ρ∈St1(A)$ and $E∈Transf(A,C)$ [or $D∈Transf(C,A)$] will be the encoding (or decoding) in the ideal compression scheme for $ρ$.

Lemma 25. If the states ${ρi}i=1k⊂Fρ$ are perfectly distinguishable, then the states ${Eρi}i=1k⊂St1(C)$ are perfectly distinguishable. Conversely, if the states ${σi}i=1k⊂St1(C)$ are perfectly distinguishable, then the states ${Dσi}i=1k⊂Fρ$ are perfectly distinguishable.

Proof. Let $\{a_i\}_{i=1}^{k}$ be the observation test such that $(a_i|\rho_i)=1$ for every $i=1,\dots,k$. Since the compression is lossless, we have $\mathcal{D}\mathcal{E}|\rho_i)=|\rho_i)$ and $(a_i|\mathcal{D}\mathcal{E}|\rho_i)=1$. Now, consider the test $\{c_i\}_{i=1}^{k}$ defined by $(c_i|=(a_i|\mathcal{D}$. Clearly we have $(c_i|\mathcal{E}|\rho_i)=1$ for every $i=1,\dots,k$. By lemma 21 this means that the states $\{\mathcal{E}\rho_i\}_{i=1}^{k}$ are perfectly distinguishable. Similarly, let $\{b_i\}_{i=1}^{k}$ be the observation test that distinguishes the set $\{\sigma_i\}_{i=1}^{k}$. Since $\mathcal{E}\mathcal{D}=\mathcal{I}_{\mathrm{C}}$ (lemma 2), the same argument applied to the test $\{(b_i|\mathcal{E}\}_{i=1}^{k}$ shows that the states $\{\mathcal{D}\sigma_i\}_{i=1}^{k}$ are perfectly distinguishable. $▪$

We say that a set of perfectly distinguishable states ${ρi}i=1k⊂Fρ$ is maximal in the face $Fρ$ if there is no state $ρk+1∈Fρ$ such that the states ${ρi}i=1k+1$ are perfectly distinguishable. We then have the following.

Corollary 12. If $\{\rho_i\}_{i=1}^{k}\subset F_\rho$ is a maximal set of perfectly distinguishable states in the face $F_\rho$, then $\{\mathcal{E}\rho_i\}_{i=1}^{k}\subset\mathrm{St}_1(\mathrm{C})$ is a maximal set of perfectly distinguishable states. Conversely, if $\{\sigma_i\}_{i=1}^{k}\subset\mathrm{St}_1(\mathrm{C})$ is a maximal set of perfectly distinguishable states, then $\{\mathcal{D}\sigma_i\}_{i=1}^{k}$ is a maximal set of perfectly distinguishable states in the face $F_\rho$.

Proof. Distinguishability of the states ${Eρi}i=1k$ and ${Dσi}i=1k$ is proved by lemma 25. Let us now prove maximality. By contradiction, suppose that the set ${ρi}i=1k$ is maximal in the face $Fρ$ while the set ${σi}i=1k$, $σi:=Eρi$ is not maximal. This means that there exists a state $σk+1∈St1(C)$ such that the states ${σi}i=1k+1$ are perfectly distinguishable. By lemma 25 the states ${Dσi}i=1k+1$ are perfectly distinguishable. Since $DEρi=ρi$ for every $i=1,⋯,k$, this means that the states ${ρi}i=1k∪{Dσk+1}$ are perfectly distinguishable, in contradiction with the fact that ${ρi}i=1k$ is maximal. This proves that the set ${Eρi}i=1k$ must be maximal. Conversely, if the set ${σi}⊂St1(C)$ is maximal, using the same argument we can prove that the set ${Dσi}i=1k$ must be maximal in $Fρ$. $▪$

##### VI. DUALITY BETWEEN PURE STATES AND ATOMIC EFFECTS

We now show the existence of a one-to-one correspondence between the pure states and the atomic effects (of unit norm) of any system $\mathrm{A}$ in the theory. Let us start from a simple observation.

Lemma 26. If $a$ is atomic and $(a|ρ)=∥a∥$ for $ρ∈St1(A)$, then $ρ$ must be pure.

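Proof. By lemma 19 the condition $(a|\rho)=\|a\|$ implies $a=_\rho\|a\|e$. By theorem 1 the condition $a=_\rho\|a\|e$ implies

$(a|_{\mathrm{A}}|\Psi_\rho)_{\mathrm{AB}}=\|a\|\,(e|_{\mathrm{A}}|\Psi_\rho)_{\mathrm{AB}},$

where $\Psi_\rho\in\mathrm{St}_1(\mathrm{AB})$ is any purification of $\rho$. Since $a$ is atomic, the pure conditioning axiom 5 implies that the marginal state $|\tilde\rho)_{\mathrm{B}}=(e|_{\mathrm{A}}|\Psi_\rho)_{\mathrm{AB}}$ is pure. Since the marginal of $\Psi_\rho$ on system $\mathrm{B}$ is pure, $\Psi_\rho$ must be factorized, that is, $\Psi_\rho=\rho\otimes\tilde\rho$ (see lemma 19 of Ref. [1]). Hence, $\rho$ must be pure, otherwise we would have a nontrivial convex decomposition of the pure state $\Psi_\rho$. $▪$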

We are now in position to show that every atomic effect is associated to a unique pure state.

Theorem 7. For every atomic effect $a∈Eff(A)$, there exists a unique pure state $ϕ∈St1(A)$ such that $(a|ϕ)=∥a∥$.

Proof. Let $ρ$ be a state such that $(a|ρ)=∥a∥$. By lemma 26 $ρ$ must be pure. Moreover, this pure state must be unique: suppose that $ϕ$ and $ϕ′$ are pure states such that $(a|ϕ)=(a|ϕ′)=∥a∥$. Then for $ω=1/2(ϕ+ϕ′)$ one has $(a|ω)=∥a∥$. Since $ω$ must be pure, one has $ϕ=ϕ′$. $▪$

We now show the converse result: for every pure state $ϕ∈St1(A)$ there exists a unique atomic effect $a$ such that $(a|ϕ)=1$. Let us start from the existence.

Lemma 27. Let ${ϕi}i=1N⊂St1(A)$ be a maximal set of perfectly distinguishable pure states and let ${ai}i=1N$ be the observation test such that $(ai|ϕj)=δij$. Then each effect $ai$ is atomic with $∥ai∥=1$.

Proof. It is obvious that $∥ai∥=1$ because of the condition $(ai|ϕi)=1$. It remains to prove atomicity. Consider the state $ω=∑i=1Nϕi/N$, which is completely mixed by theorem 6. Let $Ψω∈St1(AB)$ be a purification of $ω$, chosen in such a way that the marginal on system $B$ is completely mixed (theorem 4). As a consequence of purification (lemma 12), there exists an observation-test ${bi}i=1N$ on system $B$ such that $(bi|B|Ψω)AB=1/N|ϕi)A$. Since $Ψω$ is dynamically faithful on system $B$, each effect $bi$ must be atomic. Now, define the normalized states ${ρi}i=1N⊂St1(B)$ and the probabilities ${pi}i=1N$ by

$p_i|\rho_i)_{\mathrm{B}}:=(a_i|_{\mathrm{A}}|\Psi_\omega)_{\mathrm{AB}}.$
###### (10)

Applying the deterministic effect $eB$ on both sides one has $pi=(ai|ω)=1/N$. On the other hand, applying the effect $bj$ one has instead $1/N(bj|ρi)B=1/N(ai|ϕj)=δij/N$. This implies $(bi|ρi)=1$ for every $i$. Since $bi$ is atomic, lemma 26 forces each $ρi$ to be pure. Finally, each $ai$ must be atomic since its Choi state $pi|ρi)B=(ai|A|Ψω)AB$ is pure (theorem 2). $▪$

As a consequence, we can prove the following existence result.

Lemma 28. For every pure state $\phi\in\mathrm{St}_1(\mathrm{A})$ there exists an atomic effect $a\in\mathrm{Eff}(\mathrm{A})$ such that $(a|\phi)=1$.

Proof. By corollary 11, every pure state belongs to a maximal set of perfectly distinguishable pure states ${ϕi}i=1N$, say $ϕ=ϕ1$. The thesis then follows from lemma 27. $▪$

We now prove that the atomic effect $a$ such that $(a|ϕ)=1$ is unique. For this purpose we need two auxiliary lemmas.

Lemma 29. Let $ϕ∈St1(A)$ be an arbitrary pure state and let $pϕ$ be the probability defined by

$p_\phi=\max\{p:\ \exists\,\sigma,\ \chi=p\,\phi+(1-p)\,\sigma\},$
###### (11)

where $χ$ is the invariant state of system $A$. Then the value of the probability $pϕ$ is independent of $ϕ$.

Proof. Since for every couple of pure states $ϕ$ and $ψ$ one has $ψ=Uϕ$ for some reversible channel $U$ (lemma 9), and since $χ$ is invariant, one has $χ=pϕ+(1−p)σ$ if and only if $χ=pψ+(1−p)Uσ$. The maximum probabilities for $ϕ$ and $ψ$ are then equal. $▪$

Since $pϕ=pψ$ for every couple of pure states, from now on we will write $pmax$ in place of $pϕ$.

Lemma 30. Let $ϕ∈St1(A)$ be a pure state and $a∈Eff(A)$ be an atomic effect such that $(a|ϕ)=1$. Let $|Φ)AB$ be a purification of the invariant state $|χ)A$, chosen in such a way that the marginal on system $B$ is completely mixed, and let $b$ be the unique atomic effect on $B$ such that

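$(b|_{\mathrm{B}}|\Phi)_{\mathrm{AB}}=p_{\max}|\phi)_{\mathrm{A}}$
###### (12)

[note that $b$ exists by lemma 12 and is uniquely defined by Eq. (12) because $\Phi$ is faithful for system $\mathrm{B}$]. Then one has

$(a|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}=p_{\max}|\psi)_{\mathrm{B}},$
###### (13)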

where $ψ$ is the unique pure state such that $(b|ψ)=1$.

Proof. Define the normalized pure state $ψ$ and the probability $q$ by

$q\,|\psi)_{\mathrm{B}}:=(a|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}.$
###### (14)

In order to prove the thesis we have to show that $q=pmax$ and $(b|ψ)=1$. Applying $b$ on both sides of Eq. (14) and using Eq. (12) we obtain $q(b|ψ)=pmax(a|ϕ)=pmax$. This implies

$q⩾pmax,$
###### (15)

with the equality if and only if $(b|\psi)=1$. Let $b'$ be an atomic effect such that $(b'|\psi)=1$ (such an effect exists because of lemma 28). Define the normalized pure state $\phi'$ and the probability $p'$ by

$p'\,|\phi')_{\mathrm{A}}:=(b'|_{\mathrm{B}}|\Phi)_{\mathrm{AB}}.$

Applying $a$ on both sides and using Eq. (14) we obtain $p′(a|ϕ′)=q(b′|ψ)=q$, which implies $p′⩾q$, with the equality if and only if $(a|ϕ′)=1$. Combining this with the inequality (15) we have $p′⩾q⩾pmax$. On the other hand, by lemma 29 one has $p′⩽pmax$, and consequently $p′=q=pmax$. This also implies that $(b|ψ)=1$ and $(a|ϕ′)=1$. $▪$
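For orientation, the statement of lemma 30 can be checked in ordinary quantum theory. The following worked example assumes the Hilbert-space formalism (which the present derivation has not yet established), a system of dimension $d$ with orthonormal basis $\{|i\rangle\}$, and the maximally entangled vector $|\Phi\rangle$ as the purification of $\chi=I/d$. The symbol $\phi^*$, denoting the vector whose components in that basis are the complex conjugates of those of $\phi$, is notation introduced here for the example.

$|\Phi\rangle=\frac{1}{\sqrt{d}}\sum_{i=1}^{d}|i\rangle|i\rangle,\qquad a=|\phi\rangle\langle\phi|,\qquad b=|\phi^*\rangle\langle\phi^*|,$

$\mathrm{Tr}_{\mathrm{B}}\big[(I\otimes b)\,|\Phi\rangle\langle\Phi|\big]=\frac{1}{d}\,|\phi\rangle\langle\phi|,\qquad \mathrm{Tr}_{\mathrm{A}}\big[(a\otimes I)\,|\Phi\rangle\langle\Phi|\big]=\frac{1}{d}\,|\phi^*\rangle\langle\phi^*|.$

Under these assumptions one reads off $p_{\max}=1/d$ and $\psi=\phi^*$, so that $(b|\psi)=1$, in agreement with the statement of the lemma.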

Theorem 8. For every pure state $ϕ∈St1(A)$ there is a unique atomic effect $a∈Eff(A)$ such that $(a|ϕ)=1$.

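Proof. Existence has been already proved in lemma 28. Let us prove uniqueness: suppose that $a$ and $a'$ are two atomic effects such that $(a|\phi)=(a'|\phi)=1$. Then, applying lemma 30 to $a$ and $a'$ we obtain

$(a|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}=p_{\max}|\psi)_{\mathrm{B}}=(a'|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}.$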

Since $Φ$ is dynamically faithful, this implies $a=a′$. $▪$

Finally, an important consequence of theorem 8.

Corollary 13. If $a,a′∈Eff(A)$ are two atomic effects with $∥a∥=∥a′∥=1$, then there is a reversible channel $U∈GA$ such that $(a′|A=(a|AU$.

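Proof. Let $\phi$ and $\phi'$ be the (unique) normalized pure states such that $(a|\phi)=1$ and $(a'|\phi')=1$, respectively. Now, there is a reversible channel $\mathcal{U}\in G_{\mathrm{A}}$ such that $|\phi)_{\mathrm{A}}=\mathcal{U}|\phi')_{\mathrm{A}}$. Hence $(a'|\phi')=(a|\phi)=(a|\mathcal{U}|\phi')$. Both $(a'|$ and $(a|\mathcal{U}$ are atomic effects that give probability 1 on $\phi'$, so by theorem 8 one has $(a'|_{\mathrm{A}}=(a|_{\mathrm{A}}\mathcal{U}$. $▪$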

We conclude this section with an elementary result that will be used later in the paper.

Lemma 31. Let $E∈Transf(A,C)$ and $D∈Transf(C,A)$ be the encoding and the decoding in the ideal compression scheme for $ρ∈St1(A)$. If $|ϕ)∈Fρ$ is a pure state and $(a|∈Eff(A)$ is the atomic effect such that $(a|ϕ)=1$, then $|γ):=E|ϕ)∈St1(C)$ is a pure state and $(c|:=(a|D∈Eff(C)$ is the atomic effect such that $(c|γ)=1$.

Proof. The state $|γ):=E|ϕ)$ is pure by lemma 4. The effect $(c|:=(a|D$ is atomic by lemmas 14 and 16. Since $DE=ρIA$, one has $(c|γ)=(a|DE|ϕ)=(a|ϕ)=1$. $▪$

##### VII. DIMENSION

In this section we show that each system in our theory has a given informational dimension, defined as the maximum number of perfectly distinguishable pure states available in the system. In the Hilbert space framework, the informational dimension will be the dimension of the Hilbert space.

Lemma 32. All maximal sets of perfectly distinguishable pure states have the same number of elements.

Proof. Let $\{\phi_i\}_{i=1}^{N}$ be a maximal set of perfectly distinguishable pure states for system $\mathrm{A}$, and let $\{a_i\}_{i=1}^{N}$ be the observation test such that $(a_i|\phi_j)=\delta_{ij}$. By lemma 27 each $a_i$ is atomic and $\|a_i\|=1$. Then, by corollary 13 one has $(a_i|_{\mathrm{A}}=(a_0|_{\mathrm{A}}\,\mathcal{U}_i$, where each $\mathcal{U}_i$ is a reversible channel and $a_0$ is a fixed atomic effect with $\|a_0\|=1$. By the invariance of $\chi$ we then obtain $(a_i|\chi_{\mathrm{A}})=(a_0|\mathcal{U}_i|\chi_{\mathrm{A}})=(a_0|\chi_{\mathrm{A}})$. On the other hand, one has $\sum_{i=1}^{N}(a_i|\chi_{\mathrm{A}})=1$, which implies $N=1/(a_0|\chi_{\mathrm{A}})$. Since $a_0$ is arbitrary, $N$ is independent of the choice of the set $\{\phi_i\}_{i=1}^{N}$. $▪$

As a consequence, the number of perfectly distinguishable pure states in a maximal set is a property of the system $\mathrm{A}$. We will call this number the informational dimension (or simply the dimension) of system $\mathrm{A}$, and denote it by $d_{\mathrm{A}}$. The informational dimension $d_{\mathrm{A}}$ should not be confused with the size $D_{\mathrm{A}}$, defined as the dimension of the real vector space $\mathrm{St}_{\mathbb{R}}(\mathrm{A})$.

An immediate consequence of the proof of lemma 32 is the following.

Corollary 14. For every atomic effect $a$ with $∥a∥=1$ one has $(a|χA)=1/dA$.

This simple fact has two very important consequences. The first is that the dimension of a composite system is the product of the dimensions of the components.

Corollary 15. The dimension of the composite system $AB$ is the product of the dimensions of $A$ and $B$, namely $dAB=dAdB$.

Proof. From lemma 10 we know that $χA⊗χB$ is the unique invariant state of system $AB$. Now, if $a∈Eff(A)$ and $b∈Eff(B)$ are such that $∥a∥=∥b∥=1$, then $a⊗b$ is such that $∥a⊗b∥=1$. Hence we have $1/dAB=(a⊗b|χA⊗χB)=(a|χA)(b|χB)=1/(dAdB)$. $▪$

The second consequence is the relation between the dimension and the maximum probability of a pure state in the convex decomposition of the invariant state $|χ)A$.

Lemma 33. For every system $A$ the maximum probability of a pure state in the convex decomposition of the invariant state is $pmax=1/dA$.

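Proof. Let $\Phi\in\mathrm{St}_1(\mathrm{AB})$ be a purification of the invariant state $|\chi_{\mathrm{A}})$, chosen in such a way that the marginal on system $\mathrm{B}$ is completely mixed. Let $a\in\mathrm{Eff}(\mathrm{A})$ be an atomic effect with $\|a\|=1$. Then, Eq. (13) becomes

$(a|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}=p_{\max}|\psi)_{\mathrm{B}},$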

where $ψ$ is some normalized pure state of system $B$. Applying the deterministic effect $e$ on system $B$ on both sides we obtain $(a|χA)=pmax$. Finally, corollary 14 states $(a|χA)=1/dA$. By comparison, we obtain $pmax=1/dA$. $▪$
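The three statements just proved (corollary 14, corollary 15, and lemma 33) can be checked against ordinary quantum theory. The following worked example is only a consistency check and assumes the Hilbert-space formalism, with $\chi_{\mathrm{A}}=I_{\mathrm{A}}/d_{\mathrm{A}}$, normalized atomic effects given by rank-one projectors $a=|\alpha\rangle\langle\alpha|$ and $b=|\beta\rangle\langle\beta|$, and $\phi$ an arbitrary pure state; none of this is used in the derivation itself.

$(a|\chi_{\mathrm{A}})=\mathrm{Tr}\Big[|\alpha\rangle\langle\alpha|\,\frac{I_{\mathrm{A}}}{d_{\mathrm{A}}}\Big]=\frac{1}{d_{\mathrm{A}}},$

$(a\otimes b|\chi_{\mathrm{A}}\otimes\chi_{\mathrm{B}})=\frac{1}{d_{\mathrm{A}}}\cdot\frac{1}{d_{\mathrm{B}}}=\frac{1}{d_{\mathrm{AB}}},$

$\frac{I_{\mathrm{A}}}{d_{\mathrm{A}}}-p\,|\phi\rangle\langle\phi|\geqslant 0\iff p\leqslant\frac{1}{d_{\mathrm{A}}}\quad\Longrightarrow\quad p_{\max}=\frac{1}{d_{\mathrm{A}}}.$

The last line uses that $\chi=p\,\phi+(1-p)\,\sigma$ requires $\chi-p\,\phi$ to be a positive operator, whose smallest eigenvalue is $1/d_{\mathrm{A}}-p$.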

Thanks to the compression axiom 3, the notion of dimension can be applied not only to the whole state space $\mathrm{St}_1(\mathrm{A})$ but also to its faces. By a face $F$ of the convex set $\mathrm{St}_1(\mathrm{A})$ we always mean the face $F_\rho$ identified by some state $\rho\in\mathrm{St}_1(\mathrm{A})$.

Lemma 34. Let $F$ be a face of the convex set $St1(A)$. Every maximal set ${ϕi}i=1k$ of perfectly distinguishable pure states in $F$ has the same cardinality $k$. Precisely, if $F$ is the face identified by $ρ∈St1(A)$ and $E∈Transf(A,C)$ is the encoding in the ideal compression for $ρ$, then we have $k=dC$.

Proof. The set ${Eϕi}i=1k⊂St1(C)$ is perfectly distinguishable by lemma 25, and it is maximal by corollary 12. Moreover, the states ${Eϕi}i=1k$ are pure by lemma 4. Hence the cardinality $k$ of the set ${ϕi}i=1k$ must be $k=dC$. $▪$

From now on the maximum number of perfectly distinguishable states in the face $F$ will be called the dimension of the face $F$ and will be denoted by $|F|$.

##### VIII. DECOMPOSITION INTO PERFECTLY DISTINGUISHABLE PURE STATES

In this section we show that in a theory satisfying our principles any state can be written as a convex combination of perfectly distinguishable pure states. In quantum theory, this corresponds to the diagonalization of the density matrix.

To prove this result we first need a sufficient condition for the distinguishability of states, given in the following lemma.

Lemma 35. Let ${ρi}i=1N⊂St1(A)$ be a set of states. If there exists a set of effects ${bi}i=1N⊂Eff(A)$ (not necessarily an observation test) such that $(bi|ρj)=δij$, then the states ${ρi}i=1N$ are perfectly distinguishable.

Proof. For each $i=1,\dots,N$ consider the binary test $\{b_i,e-b_i\}$. Since by hypothesis $(b_i|\rho_j)=\delta_{ij}$, the test $\{b_i,e-b_i\}$ can perfectly distinguish $\rho_i$ from any mixture of the states $\{\rho_j\}_{j\neq i}$. In particular, this means that, for every $M < N$, $\rho_{M+1}$ can be perfectly distinguished from the mixture $\omega_M=\sum_{j=1}^{M}\rho_j/M$. Note that, by definition, the states $\{\rho_i\}_{i=1}^{M}$ belong to the face $F_{\omega_M}$. We now prove by induction on $M$ that the states $\{\rho_i\}_{i=1}^{M}$ are perfectly distinguishable. This is true for $M=1$. Now, suppose that the states $\{\rho_i\}_{i=1}^{M}$ are perfectly distinguishable. Since the state $\rho_{M+1}$ is perfectly distinguishable from $\omega_M$, by lemma 23 we have that the states $\{\rho_i\}_{i=1}^{M+1}$ are perfectly distinguishable. Taking $M=N-1$ the thesis follows. $▪$

We now show that the invariant state $χ$ is a mixture of perfectly distinguishable pure states.

Theorem 9. For every maximal set of perfectly distinguishable pure states ${ϕi}i=1dA⊂St1(A)$ one has

$\chi=\frac{1}{d_{\mathrm{A}}}\sum_{i=1}^{d_{\mathrm{A}}}\phi_i.$

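Proof. Let $\{a_i\}_{i=1}^{d_{\mathrm{A}}}$ be the observation test such that $(a_i|\phi_j)=\delta_{ij}$, and let $\Phi\in\mathrm{St}_1(\mathrm{AB})$ be a purification of $\chi$, chosen in such a way that the marginal on system $\mathrm{B}$ is completely mixed (theorem 4). Let $\{\psi_i\}_{i=1}^{d_{\mathrm{A}}}\subset\mathrm{St}_1(\mathrm{B})$ be the pure states defined by

$(a_i|_{\mathrm{A}}|\Phi)_{\mathrm{AB}}=\frac{1}{d_{\mathrm{A}}}|\psi_i)_{\mathrm{B}},$

and, for each $i$, let $b_i$ be the atomic effect such that

$(b_i|_{\mathrm{B}}|\Phi)_{\mathrm{AB}}=\frac{1}{d_{\mathrm{A}}}|\phi_i)_{\mathrm{A}}$
###### (16)

(here we used lemma 30 and the fact that $p_{\max}=1/d_{\mathrm{A}}$). Then we have

$(b_i|\psi_j)=d_{\mathrm{A}}\,(a_j\otimes b_i|\Phi)=(a_j|\phi_i)=\delta_{ij}.$
###### (17)

By lemma 35 this implies that the states $\{\psi_i\}_{i=1}^{d_{\mathrm{A}}}$ are perfectly distinguishable. Now, since the marginal of $|\Phi)_{\mathrm{AB}}$ on system $\mathrm{B}$ is completely mixed, theorem 6 states that the set $\{\psi_i\}_{i=1}^{d_{\mathrm{A}}}$ is maximal. Let $\{b'_i\}_{i=1}^{d_{\mathrm{A}}}$ be the observation test such that $(b'_i|\psi_j)=\delta_{ij}$. By lemma 27 each $b'_i$ must be atomic. On the other hand, there is a unique atomic effect $b_i$ such that $(b_i|\psi_i)=1$ (theorem 8). Therefore, $b'_i=b_i$. This means that the effects $\{b_i\}_{i=1}^{d_{\mathrm{A}}}$ form an observation test. Once this fact has been proved, using Eq. (16) we obtain

$|\chi)_{\mathrm{A}}=(e|_{\mathrm{B}}|\Phi)_{\mathrm{AB}}=\sum_{i}(b_i|_{\mathrm{B}}|\Phi)_{\mathrm{AB}}=\frac{1}{d_{\mathrm{A}}}\sum_{i}|\phi_i)_{\mathrm{A}}.$ $▪$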

As a consequence, we have the following.

Corollary 16 (Existence of conjugate systems). For every system $A$ there exists a system $Ã$, called the conjugate system, and a purification $Φ∈St1(AÃ)$ of the invariant state $χA$ such that $dÃ=dA$ and the marginal on $Ã$ is the invariant state $χÃ$. The conjugate system $Ã$ is unique up to operational equivalence.

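Proof. We first prove that $\tilde{\mathrm{A}}$ is unique up to operational equivalence. The defining property of the conjugate system $\tilde{\mathrm{A}}$ is that the marginal of $\Phi$ on $\tilde{\mathrm{A}}$ is the invariant state $\chi_{\tilde{\mathrm{A}}}$, which is completely mixed. Theorem 4 then implies that $\tilde{\mathrm{A}}$ is unique up to operational equivalence. Let us now show the existence of $\tilde{\mathrm{A}}$. Take a purification $\Phi$ of $\chi_{\mathrm{A}}$, with purifying system $\tilde{\mathrm{A}}$ chosen so that the marginal of $\Phi$ on $\tilde{\mathrm{A}}$ is completely mixed (this is possible thanks to theorem 4). Now, let $\{a_i\}_{i=1}^{d_{\mathrm{A}}}$ be the observation test that perfectly distinguishes a maximal set of pure states of $\mathrm{A}$. The states $\{\psi_i\}_{i=1}^{d_{\mathrm{A}}}\subseteq\mathrm{St}_1(\tilde{\mathrm{A}})$, defined by $\frac{1}{d_{\mathrm{A}}}|\psi_i):=[(a_i|\otimes\mathcal{I}_{\tilde{\mathrm{A}}}]|\Phi)$, are perfectly distinguishable [see Eq. (17) in the proof of theorem 9]. Hence, by theorem 6 they are a maximal set of perfectly distinguishable pure states. This implies $d_{\tilde{\mathrm{A}}}=d_{\mathrm{A}}$. Finally, by theorem 9 one has $\frac{1}{d_{\tilde{\mathrm{A}}}}\sum_{i=1}^{d_{\tilde{\mathrm{A}}}}\psi_i=\chi_{\tilde{\mathrm{A}}}$. $▪$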

Corollary 17. The distance between the invariant state $χA$ and an arbitrary pure state $ϕ∈St1(A)$ is

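$\|\chi-\phi\|=\frac{2(d_{\mathrm{A}}-1)}{d_{\mathrm{A}}}.$

Proof. Take a maximal set of perfectly distinguishable pure states $\{\phi_i\}_{i=1}^{d_{\mathrm{A}}}$ such that $\phi_1=\phi$ (corollary 11). Since $\chi=\sum_{i=1}^{d_{\mathrm{A}}}\phi_i/d_{\mathrm{A}}$ one has $\chi-\phi=\frac{d_{\mathrm{A}}-1}{d_{\mathrm{A}}}(\sigma-\phi_1)$, where $\sigma=\sum_{i=2}^{d_{\mathrm{A}}}\phi_i/(d_{\mathrm{A}}-1)$. Hence, one has $\|\chi-\phi\|=\frac{d_{\mathrm{A}}-1}{d_{\mathrm{A}}}\|\sigma-\phi_1\|=\frac{2(d_{\mathrm{A}}-1)}{d_{\mathrm{A}}}$, having used that $\sigma$ and $\phi_1$ are perfectly distinguishable and therefore $\|\sigma-\phi_1\|=2$ (see subsection II-I in Ref. [22]). $▪$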
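The value in corollary 17 can be sanity-checked in quantum theory, where the operational norm of a difference of states is the trace norm. The sketch below is an illustration under stated assumptions: it takes $\chi=I/d$ and chooses $\phi$ as the first element of an orthonormal basis, so that $\chi-\phi$ is diagonal and its trace norm is the sum of the absolute values of its diagonal entries. The function name `distance_from_invariant_state` is hypothetical.

```python
from fractions import Fraction

def distance_from_invariant_state(d):
    """Trace norm of chi - phi for a d-level quantum system.

    Assumption: phi is the first basis state, so both chi = I/d and phi
    are diagonal and the trace norm reduces to a sum of absolute values.
    """
    chi_diag = [Fraction(1, d)] * d
    phi_diag = [Fraction(1)] + [Fraction(0)] * (d - 1)
    return sum(abs(c - p) for c, p in zip(chi_diag, phi_diag))

# Compare with the closed form 2(d - 1)/d of corollary 17.
comparison = {d: (distance_from_invariant_state(d), Fraction(2 * (d - 1), d))
              for d in range(1, 6)}
```

Exact rational arithmetic is used so that the comparison with $2(d_{\mathrm{A}}-1)/d_{\mathrm{A}}$ is an equality check rather than a floating-point approximation.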

We can now prove the following strong result.

Theorem 10 (Spectral decomposition). For every system $A$, every mixed state can be written as a convex combination of perfectly distinguishable pure states.

Proof. The proof is by induction on the dimension of the system. If $dA=1$, the thesis trivially holds. Now suppose that the thesis holds for any system $B$ with dimension $dB⩽N$, and take a mixed state $ρ∈St1(A)$ where $dA=N+1$. There are two possibilities: either (1) $ρ$ is not completely mixed or (2) $ρ$ is completely mixed. Suppose that (1) $ρ$ is not completely mixed. Then by the compression axiom 3 one can encode it in a system $C$, using an encoding operation $E∈Transf(A,C)$. Now, the maximum number of perfectly distinguishable states in $C$ is equal to the maximum number of perfectly distinguishable states in the face $Fρ$ (corollary 12). Since $ρ$ is not completely mixed, we must have $dC⩽N$. Using the induction hypothesis we then obtain that the state $Eρ∈St1(C)$ is a mixture of perfectly distinguishable pure states, say $Eρ=∑ipiψi$. Applying the decoding operation $D∈Transf(C,A)$ we get $ρ=DEρ=∑ipiDψi$. Since by lemmas 4 and 25 we know that the states ${Dψi}i=1dC$ are pure and perfectly distinguishable, this is the desired decomposition for $ρ$. Now suppose that $ρ$ is completely mixed (2). Consider the half-line in $StR(A)$ defined by $σt=(1+t)ρ−tχ$, $t⩾0$. Since the set of normalized states $St1(A)$ is compact, the line will cross its border at some point $t0$. Therefore, one will have

$\rho=\frac{1}{1+t_0}\,\sigma_{t_0}+\frac{t_0}{1+t_0}\,\chi$

for some state $σt0$ on the border of $St1(A)$, that is, for some state that is not completely mixed. But we know from the discussion of point (1) that the state $σt0$ is a mixture of perfectly distinguishable pure states, say $σt0=∑i=1kpiϕi$. By lemma 24 this set can be extended to a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$. On the other hand, theorem 9 states that $χ=∑i=1dAϕi/dA$. This implies the desired decomposition

$\rho=\sum_{i=1}^{d_{\mathrm{A}}}\left[\frac{q_i}{1+t_0}+\frac{t_0}{d_{\mathrm{A}}(1+t_0)}\right]\phi_i,$

where $qi=pi$ for $1⩽i⩽k$, and $qi=0$ otherwise. $▪$
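Step (2) of the proof is a purely convex-geometric construction, and it can be traced on a small example. The sketch below is an illustration under stated assumptions: it works in quantum theory with $\rho$ written in its eigenbasis, so that $\rho$, $\chi=I/d$, and every point $\sigma_t$ of the half-line are diagonal and can be represented by their lists of eigenvalues. The function names `split_along_half_line` and `recombine` are hypothetical.

```python
from fractions import Fraction

def split_along_half_line(eigenvalues):
    """Return (t0, sigma) with sigma = (1 + t0) * rho - t0 * chi on the border.

    Assumption: `eigenvalues` are the exact (Fraction) eigenvalues of a
    completely mixed rho, i.e. all strictly positive and summing to 1.
    """
    d = len(eigenvalues)
    uniform = Fraction(1, d)
    p_min = min(eigenvalues)
    if p_min <= 0:
        raise ValueError("rho must be completely mixed (all eigenvalues > 0)")
    if p_min >= uniform:
        raise ValueError("rho equals chi; theorem 9 already decomposes it")
    # The smallest eigenvalue of sigma_t is (1 + t) * p_min - t / d; it hits 0 at t0.
    t0 = p_min / (uniform - p_min)
    sigma = [(1 + t0) * p - t0 * uniform for p in eigenvalues]
    return t0, sigma

def recombine(t0, sigma):
    """Weights of rho from the final formula: q_i/(1+t0) + t0/(d(1+t0))."""
    d = len(sigma)
    return [q / (1 + t0) + t0 / (d * (1 + t0)) for q in sigma]

rho = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
t0, sigma = split_along_half_line(rho)   # here t0 == 1 and sigma == [2/3, 1/3, 0]
```

In this example the border state `sigma` has one vanishing eigenvalue, so it is not completely mixed and falls under case (1), while `recombine(t0, sigma)` returns the original eigenvalues of `rho`.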

It is easy to show that the marginals of a pure bipartite state have the same spectral decomposition.

Corollary 18. Let $Ψ∈St1(AB)$ be a pure state, and let $ρ$ and $ρ̃$ be the marginals of $Ψ$ on systems $A$ and $B$, respectively. If $ρ$ has spectral decomposition $ρ=∑i=1dApiϕi$, with $pi>0$ for every $i=1,⋯,r$, $r⩽dA$, then $ρ̃$ has spectral decomposition $ρ̃=∑i=1rpiψi$.

Proof. Let ${ai}i=1dA$ be the observation test such that $(ai|ϕj)=δij$, ${bi}i=1r$ be the observation test such that $(bi|B|Ψ)AB=pi|ϕi)A$ for every $i⩽r$. For $i⩽r$ define the pure state $ψi∈St1(B)$ and the probability $qi$ via the relation

$q_i\,|\psi_i)_B:=(a_i|_A\,|\Psi)_{AB}.$

[Note that $ψi$ is pure due to the pure conditioning axiom.] By definition we have

$q_i\,(b_j|\psi_i)=(a_i\otimes b_j|\Psi)=p_j\,(a_i|\phi_j)=p_i\,\delta_{ij}\qquad\forall i\leqslant r,\ \forall j\leqslant r.$

The above relation implies $qi=∑j=1dAqi(bj|ψi)=∑jpiδij=pi$ and $(bj|ψi)=δij$. Hence the states ${ψi}i=1r$ are perfectly distinguishable. On the other hand, we have $(ai⊗eB|Ψ)=(ai|ρ)=0∀i>r,$ which implies $(ai|A|Ψ)AB=0$, $∀i>r$. Therefore, we obtained

$|\tilde\rho)_B=(e|_A\,|\Psi)_{AB}=\sum_{i=1}^{d_A}(a_i|_A\,|\Psi)_{AB}=\sum_{i=1}^{r}(a_i|_A\,|\Psi)_{AB}=\sum_{i=1}^{r}p_i\,|\psi_i)_B,$

which is the desired spectral decomposition. $▪$

The spectral decomposition of states has many consequences. Here we just discuss the simplest ones, which are needed for the purpose of the derivation of quantum theory.

A first consequence is the following lemma.

Lemma 36. Let $ϕ∈St1(A)$ be a pure state and let $a∈Eff(A)$ be the unique atomic effect such that $(a|ϕ)=1$. If $ϕ$ is perfectly distinguishable from $ρ$, then $(a|ρ)=0$.

Proof. Let us write $\rho=\sum_{i=1}^{k}p_i\phi_i$, with $\{\phi_i\}_{i=1}^{k}$ perfectly distinguishable pure states and $p_i>0$ for each $i$. Now, by lemma 23 the states $\{\phi_1,\dots,\phi_k,\phi\}$ are perfectly distinguishable, and by lemma 24 this set can be extended to a maximal set of perfectly distinguishable pure states $\{\gamma_m\}_{m=1}^{d_A}$, with $\gamma_i=\phi_i$ for $i\leqslant k$ and $\gamma_{k+1}=\phi$. Denote by $\{c_m\}_{m=1}^{d_A}$ the observation test that perfectly distinguishes between the states $\{\gamma_m\}$. Note that, by definition, $(c_{k+1}|\phi)=1$ and $(c_{k+1}|\phi_j)=0$ for every $j\leqslant k$. Also, recall that $c_{k+1}$ is atomic (lemma 27). By the duality of theorem 8 we have $a=c_{k+1}$ and, therefore, $(a|\rho)=\sum_{i=1}^{k}p_i\,(c_{k+1}|\phi_i)=0$. $▪$

Another consequence of theorem 10 is the following characterization of the completely mixed states as full rank states.

Corollary 19 (Characterization of completely mixed states). A state $ρ∈St1(A)$, written as a mixture $ρ=∑i=1dApiϕi$ of a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$, is completely mixed if and only if $pi>0$ for every $i=1,⋯,dA$.

Proof. Necessity: If $pi=0$ for some $i$, then $ρ$ is perfectly distinguishable from $ϕi$. Hence, it cannot be completely mixed. Sufficiency: let $pmin=min{pi,i=1,⋯,dA}$. Then we have $ρ=pminχ+(1−pmin)σ$, where $σ$ is the state defined by $σ=1/(1−pmin)∑i=1dA(pi−pmin/dA)ϕi$. Since $ρ$ contains $χ$ in its convex decomposition, and since $χ$ is completely mixed, we conclude that $ρ$ is completely mixed. $▪$

In particular, for two-dimensional systems we have the result.

Corollary 20. For $dA=2$ any state on the border of $St1(A)$ is pure.

Another consequence of theorem 10 is that every element in the vector space $StR(A)$ can be written as a linear combination of perfectly distinguishable states.

Corollary 21. For every $ξ∈StR(A)$ there exists a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$ and a set of real numbers ${ci}i=1dA$ such that $|ξ)=∑ici|ϕi)$.

Proof. Write $ξ$ as $ξ=c+ρ−c−σ$, where $c+,c−⩾0$ and $ρ$ and $σ$ are normalized states. If $c−=0$ there is nothing to prove, because $ξ$ is proportional to a state. Then, suppose that $c−>0$. Write $σ$ as $σ=∑ipiψi$ where ${ψi}$ are perfectly distinguishable and define $k=max{pi}$. Then one has $χ+1/(c−kdA)ξ=(χ−1/(kdA)σ)+c+/(c−kdA)ρ$. Now, by definition $χ−1/(kdA)σ$ is proportional to a state: indeed we have $[χ−1/(kdA)σ]=1/dA∑i(1−pi/k)ψi$, and, by definition $1−pi/k⩾0$. Therefore $χ+1/(c−kdA)ξ$ is proportional to a state, say $χ+1/(c−kdA)ξ=tτ$, with $t>0$. Writing $τ$ as $τ=∑iqiϕi$, where ${ϕi}i=1dA$ is a maximal set of perfectly distinguishable pure states, we then obtain $ξ=(c−kdA)(tτ−χ)=(c−kdA)∑i(tqi−1/dA)ϕi$, which is the desired decomposition. $▪$

In quantum theory corollary 21 is equivalent to the fact that every Hermitian matrix is diagonal in a suitable orthonormal basis. A simple consequence of corollary 21 is the following.

Corollary 22. For every system $A$ with $dA=2$ there is a continuous set of pure states.

Proof. Let $\xi\in\mathrm{St}_{\mathbb R}(A)$ be an arbitrary vector such that $(e|\xi)=0$. Note that since the convex set $\mathrm{St}_1(A)$ cannot be a segment (corollary 7), we must have $D_A=\dim[\mathrm{St}_{\mathbb R}(A)]>2$ and, therefore, the space of vectors $\xi$ such that $(e|\xi)=0$ is at least two dimensional. By corollary 21 we have $\xi=c(\phi_1-\phi_2)=2c(\phi_1-\chi)$, where $c\geqslant 0$, $\{\phi_1,\phi_2\}$ are two perfectly distinguishable pure states and we used the fact that $\chi=\frac12(\phi_1+\phi_2)$. Let us define $\phi_\xi:=\phi_1$. With this definition, if $\phi_{\xi_1}=\phi_{\xi_2}$ then one has $\xi_2=t\xi_1$ for some $t\geqslant 0$. Now, since there is a continuous infinity of vectors $\xi$ (up to scaling), there must be a continuous set of pure states. $▪$

We conclude this section with the dual result to the “spectral decomposition” of corollary 21.

Corollary 23. For every $x∈EffR(A)$ there exists a perfectly distinguishing observation-test ${ai}i=1dA$ and a set of real numbers ${di}i=1dA$ such that $(x|=∑idi(ai|$.

Proof. Let $Φ∈St1(AÃ)$ be a purification of the invariant state $χA$, where $Ã$ is the conjugate system defined in corollary 16. Take the Choi vector $|Rx)Ã:=(x|A|Φ)AÃ$. By corollary 21 there exists a maximal set of perfectly distinguishable pure states ${ψi}i=1dA$ and a set of real numbers ${ci}i=1dA$ such that $|Rx)=∑ici|ψi)$. Let ${ai}i=1dA⊂Eff(A)$ be the observation test such that $1/dA|ψi)Ã=(ai|A|Φ)AÃ$ for every $i=1,⋯,dA$ (recall that by corollary 16 the marginal of $Φ$ on system $Ã$ is the invariant state $χÃ$ and $dÃ=dA$). The test ${ai}i=1dA$ is perfectly distinguishing: if ${bi}i=1dA$ is the observation test such that $(bi|ψj)=δij$ and $ϕi∈St1(A)$ is the state defined by $|ϕi)A:=dA(bi|Ã|Φ)AÃ$, then we have

$(a_i|\phi_j)=d_A\,(a_i\otimes b_j|\Phi)=(b_j|\psi_i)=\delta_{ij}.$

Moreover, we have

$(x|_A\,|\Phi)_{A\tilde A}=|R_x)_{\tilde A}=\sum_i c_i\,|\psi_i)_{\tilde A}=\sum_i c_i d_A\,(a_i|_A\,|\Phi)_{A\tilde A}.$

Since $Φ$ is dynamically faithful, this implies $(x|=∑idi(ai|$, where $di:=cidA$. $▪$

##### IX. TELEPORTATION REVISITED

In this section we revisit probabilistic teleportation using the results about informational dimension. The key point of the section will be the proof of the equality $D_A=d_A^2$, which relates the dimension of the vector space $\mathrm{St}_{\mathbb R}(A)$ to the informational dimension $d_A$.

###### A. Probability of teleportation

We start by showing a probabilistic teleportation scheme that achieves success probability $p_A=1/d_A^2$ for every system $A$.

Theorem 11 (Probability of teleportation). For every system $A$, probabilistic teleportation can be achieved with probability $p_A=1/d_A^2$.

Proof. Let $Ã$ and $|Φ)AÃ$ be the conjugate system and the pure state defined in corollary 16. Then, the state $|Φ)AÃ|Φ)AÃ$ satisfies the identity

On the other hand, by lemma 33 the maximum probability of a pure state in the convex decomposition of $χAÃ$ is $pmax=1/dAÃ$, and by corollaries 15 and 16 one has $pmax=1/(dAdÃ)=1/dA2$. Therefore, by lemma 12 there exists an atomic effect $E$ such that

###### (18)

and, since $Φ$ is dynamically faithful,

###### (19)

as can be verified applying both members of Eq. (19) to $Φ$, thus obtaining Eq. (18). $▪$

###### B. Isotropic states and effects

Here we define two maps that send reversible transformations of $A$ to reversible transformations of $Ã$: the transpose and the conjugate. Using these maps we will also define the notions of isotropic states and effects and we will prove some properties of them.

Let us start from the definition of the transpose.

Lemma 37 (Transpose of a reversible transformation). Let $Φ∈St(AÃ)$ be a purification of the invariant state $χA$. The reversible transformations of system $Ã$ are in one-to-one correspondence with the reversible transformations of system $A$ via the transposition $τ$ defined as follows:

###### (20)

[note that the transposition is defined with respect to the given state $Φ$].

Proof. Since $(U⊗IÃ)|Φ)$ and $|Φ)$ are purifications of the same state $χA$, there exists a reversible transformation $Uτ∈GÃ$ such that Eq. (20) holds. Since $Φ$ is dynamically faithful on $A$, the map $U↦Uτ$ is injective. Furthermore, the map is surjective: for every reversible $V∈GÃ$ the states $(IA⊗V)|Φ)$ and $|Φ)$ are two purifications of the same state $χÃ$, and, by the uniqueness of purification stated in postulate 1, there exists a reversible $U∈GA$ such that

###### (21)

namely $V=Uτ$. $▪$

The conjugate is just defined as the inverse of the transpose.

Definition 7. Let $τ$ be the transpose defined with respect to the state $Φ∈St1(AÃ)$. The conjugate of the reversible channel $U∈GA$ is the reversible channel $U*∈GÃ$ defined by $U*:=(Uτ)−1$.

We can now give the definition of isotropic pure state (isotropic atomic effect).

Definition 8. A pure state $\Psi\in\mathrm{St}(A\tilde A)$ [an atomic effect $F\in\mathrm{Eff}(\tilde A A)$] is isotropic if it is invariant under $U\otimes U^*$ (under $U^*\otimes U$) for every $U\in G_A$. Diagrammatically

###### (22)

An example of isotropic state is $Φ$: indeed, by definition of conjugate we have, for every $U∈GA$,

$(U\otimes U^*)\,|\Phi)=\bigl(U\otimes(U^\tau)^{-1}\bigr)|\Phi)=\bigl(I_A\otimes(U^\tau)^{-1}U^\tau\bigr)|\Phi)=|\Phi).$

As a consequence, the teleportation effect $E$ is isotropic: indeed one has

which implies $(E|(U*⊗U)=(E|$, since the state $Φ⊗Φ$ is dynamically faithful.

We now show that all isotropic pure states (isotropic atomic effects) are connected to the state $Φ$ (to the effect $E$) through a local reversible transformation.

Lemma 38. If a pure state $Ψ∈St1(AÃ)$ is isotropic then $|Ψ)=(V⊗IÃ)|Φ)$ for some reversible transformation $V∈GA$ such that $VU=UV$ for every $U∈GA$.

Proof. Since $Ψ$ satisfies Eq. (22), its marginal on system $Ã$ is the invariant state $|χÃ)$. Since $Ψ$ and $Φ$ are purifications of the same state, there must exist a reversible channel $V∈GA$ such that $|Ψ)=(V⊗IÃ)|Φ)$. Moreover, we have for every $U∈GA$

$(UVU^{-1}\otimes I_{\tilde A})\,|\Phi)=(UV\otimes U^*)\,|\Phi)=(U\otimes U^*)\,|\Psi)=|\Psi)=(V\otimes I_{\tilde A})\,|\Phi).$

Since $Φ$ is dynamically faithful, the above equation implies $UVU−1=V$ for every $U∈GA$. $▪$

By the duality between states and effects, it is easy to obtain the following.

Lemma 39. Let $A\in\mathrm{Eff}(A\tilde A)$ be the atomic effect such that $(A|\Phi)=1$. If an atomic effect $F\in\mathrm{Eff}(\tilde A A)$ is isotropic then $(F|_{\tilde A A}=(A|_{\tilde A A}(I_{\tilde A}\otimes V)$ for some reversible transformation $V\in G_A$ such that $VU=UV$ for every $U\in G_A$.

Proof. Let $Ψ$ be the pure state such that $(F|Ψ)=1$. Clearly $Ψ$ is isotropic: one has $(F|(U⊗U*)|Ψ)=(F|Ψ)=1$, and, therefore, $(U⊗U*)|Ψ)=|Ψ)$. By lemma 38, there exists a reversible transformation $V$ such that $|Ψ)=(V−1⊗IÃ)|Φ)$ and $V−1U=UV−1$ for every $U∈GA$. Now, this implies $(F|(V−1⊗IÃ)|Φ)=(F|Ψ)=1$, which by theorem 8 implies $(F|=(A|(V⊗IÃ)$. $▪$

As a consequence, every isotropic effect is connected to the teleportation effect by a local reversible transformation:

Corollary 24. If an atomic effect $F∈Eff(ÃA)$ is isotropic then $(F|ÃA=(E|ÃA(IÃ⊗V)$ for some reversible transformation $V∈GA$ such that $VU=UV$ for every $U∈GA$.

Proof. Since $(E|$ and $(F|$ are both isotropic, lemma 39 implies that they are both connected to $(A|$ through a local reversible transformation, say $V$ and $W$, respectively. Therefore, they are connected to each other through the transformation $WV−1$. $▪$

###### C. Dimension of the state space

In this subsection we use the local distinguishability axiom to prove the equality $D_A=d_A^2$ (see theorem 12). As a consequence, we will be able to represent the states of a system $A$ as square $d_A\times d_A$ Hermitian complex matrices, that is, Hermitian operators on the complex Hilbert space $\mathbb C^{d_A}$. Theorem 12 is thus the point where the complex field (as opposed to the real field) enters our derivation. Notice that, even though local distinguishability excludes quantum theory on real Hilbert spaces from the very beginning, to prove the emergence of complex Hilbert spaces we need to use all six principles.

Due to local distinguishability, any bipartite state $Ψ∈St(AB)$ can be written as

$\Psi=\sum_{i=1}^{D_A}\sum_{j=1}^{D_B}\Psi_{ij}\,|\alpha_i)|\beta_j),$

where ${αi}$ ( ${βj}$) is a basis for the vector space $StR(A)$ [ $StR(B)$]. Similarly, a bipartite effect $F∈Eff(BA)$ can be written as

$F=\sum_{k=1}^{D_B}\sum_{l=1}^{D_A}F_{kl}\,(\beta_k^*|(\alpha_l^*|$

with $(αl*|αi)=δil$ and $(βk*|βj)=δjk$. Finally, a transformation $C$ from $A$ to $B$ can be written as

$C=\sum_{j=1}^{D_B}\sum_{i=1}^{D_A}C_{ji}\,|\beta_j)(\alpha_i^*|.$

In this matrix representation the teleportation diagram of Eq. (19) becomes

$\Phi E=\frac{I_{D_A}}{d_A^2},$
###### (23)

where $IDA$ is the identity matrix in dimension $DA$. On the other hand, we also have

$1\geqslant(E|\Phi)=\mathrm{Tr}[\Phi E]=\frac{D_A}{d_A^2}$

and, therefore,

$D_A\leqslant d_A^2.$

We now show that one has the equality, using the following standard lemma.

Lemma 40. With a suitable choice of basis for the vector space $StR(A)$, every reversible transformation $U∈GA$ is represented by a matrix $MU$ of the form

$M_U=\begin{pmatrix}1&0\\0&O_U\end{pmatrix},$
###### (24)

where $OU$ is an orthogonal $(DA−1)×(DA−1)$ matrix.

Proof. Let ${ξi}$ be a basis for $StR(A)$, chosen in such a way that the first basis vector is $χ$, while the remaining vectors satisfy $(e|ξi)=0,∀i=2,⋯,DA$. Such a choice is always possible since every vector $v∈StR(A)$ can be written as $v=(e|v)χ+ξ$, where $ξ$ satisfies $(e|ξ)=0$. Now, since $Uχ=χ$, the first column of $MU$ must be $(1,0,⋯,0)T$. Moreover, since for every normalized state $ρ$, $Uρ$ is a normalized state, one must have $(e|U|ξ)=0$ for every $ξ$ such that $(e|ξ)=0$. Hence the first row of $MU$ must be $(1,0,⋯,0)$, namely $MU$ has the block form of Eq. (24). It remains to show that, with a suitable choice of basis, the matrix $OU$ in the second block can be chosen to be orthogonal. Observe that by definition the matrices ${MU}U∈GA$ form a representation of the group $GA$: indeed, one has $MI=IDA$ and $MUV=MUMV$ for every $U,V∈G$. Consider the positive definite matrix $P$ defined by the integral

$P:=\int dU\,O_U^{T}O_U,$

where $dU$ is the Haar measure on the compact group $GA$ (see corollary 30 of Ref. [22] for the proof of compactness) and $AT$ denotes the transpose of $A$. By definition, one has $PT=P$ and $OUTPOU=P$ for every $U∈GA$. Let us now define the new representation

$O'_U:=P^{1/2}\,O_U\,P^{-1/2},$

obtained from $OU$ by a change of basis in the subspace spanned by ${ξi}i=2DA$. With this choice, each matrix $OU′$ is orthogonal:

${O'_U}^{T}O'_U=\bigl(P^{1/2}O_UP^{-1/2}\bigr)^{T}\bigl(P^{1/2}O_UP^{-1/2}\bigr)=P^{-1/2}\,O_U^{T}PO_U\,P^{-1/2}=I_{D_A-1}.$

$▪$
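The averaging argument of lemma 40 can be checked on a toy example. The sketch below replaces the Haar integral with a sum over a hypothetical two-element group $\{I,A\}$ of $2\times2$ real matrices, where $A$ is an involution that is not orthogonal; the group, the helper names, and the closed-form $2\times2$ square root are assumptions made for illustration only.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def sqrt_spd(M):
    """Square root of a symmetric positive definite 2x2 matrix.

    Uses the closed form sqrt(M) = (M + s*I) / t with s = sqrt(det M)
    and t = sqrt(tr M + 2*s), which follows from Cayley-Hamilton.
    """
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def inverse(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Hypothetical group {I, A}: A squares to the identity but A^T A != I.
I = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0], [0.0, -1.0]]
group = [I, A]

# Group average P = sum_g g^T g (finite stand-in for the Haar integral).
P = [[0.0, 0.0], [0.0, 0.0]]
for g in group:
    gtg = matmul(transpose(g), g)
    P = [[P[i][j] + gtg[i][j] for j in range(2)] for i in range(2)]

R = sqrt_spd(P)
A_new = matmul(matmul(R, A), inverse(R))   # P^{1/2} A P^{-1/2}
gram = matmul(transpose(A_new), A_new)     # identity up to rounding
print(gram)
```

The invariance $g^{T}Pg=P$ holds because multiplying every group element by $g$ only reorders the sum, which is the same mechanism that the Haar measure provides in the continuous case.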

As a consequence, we have the following.

Corollary 25. For every system $A$, the group of reversible transformations $GA$ is (isomorphic to) a compact subgroup of $O(DA−1)$.

Lemma 41. Let $E∈Eff(AÃ)$ be the teleportation effect of Eq. (19). Then one has $(E|Φ)=1$.

Proof. Let $A∈Eff(AÃ)$ be the atomic effect such that $(A|Φ)=1$. We now prove that $A=E$. Indeed, by corollary 24 there exists a reversible transformation $V∈GA$ such that $(A|=(E|(V⊗IÃ)$. Using a basis for $StR(A)$ such that the transformations in $GA$ are represented by orthogonal matrices as in Eq. (24), one has

$1=(A|\Phi)=(E|(V\otimes I_{\tilde A})|\Phi)=\mathrm{Tr}[EM_V\Phi]=\mathrm{Tr}[\Phi EM_V]=\frac{\mathrm{Tr}[M_V]}{d_A^2},$

having used Eq. (23) for the last equality. Using the inequality $Tr[MV]⩽Tr[IDA]$, that holds for every orthogonal $DA×DA$ matrix, we then obtain

$1=\frac{\mathrm{Tr}[M_V]}{d_A^2}\leqslant\frac{\mathrm{Tr}[I_{D_A}]}{d_A^2}=\mathrm{Tr}[E\Phi]=(E|\Phi)\leqslant 1,$

and, therefore $(E|Φ)=1$. $▪$

Theorem 12 (Dimension of the state space). The dimension $DA$ of the vector space generated by the states in $St(A)$ is $DA=dA2$.

Proof. Using lemma 41 and Eq. (23) we obtain $1=(E|Φ)=Tr[EΦ]=Tr[IDA]/dA2=DA/dA2$. Hence, $DA=dA2$. $▪$
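In quantum theory the equality $D_A=d_A^2$ is the familiar count of real parameters of a $d\times d$ Hermitian matrix. The sketch below builds one standard real basis of the Hermitian matrices and counts its elements; the function names are illustrative assumptions.

```python
def hermitian_basis(d):
    """Return d*d Hermitian d x d matrices spanning all Hermitian matrices
    as a real vector space (nested lists of complex numbers)."""
    def zero():
        return [[0j] * d for _ in range(d)]

    basis = []
    for k in range(d):                      # d real diagonal directions
        M = zero()
        M[k][k] = 1
        basis.append(M)
    for k in range(d):
        for l in range(k + 1, d):
            S = zero()                      # real symmetric off-diagonal part
            S[k][l] = S[l][k] = 1
            T = zero()                      # imaginary antisymmetric part
            T[k][l], T[l][k] = -1j, 1j
            basis += [S, T]
    return basis

def is_hermitian(M):
    n = len(M)
    return all(M[i][j] == M[j][i].conjugate() for i in range(n) for j in range(n))

for d in (2, 3, 4):
    print(d, len(hermitian_basis(d)))
```

Dropping the imaginary antisymmetric matrices leaves the real symmetric ones, of which there are only $d(d+1)/2$, so the count $d^2$ is specific to the complex case.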

An interesting consequence of the relation $(E|Φ)=1$ is the following.

Corollary 26 (No inversion). Let us write an arbitrary state $ρ∈St1(A)$ as $ρ=χA+ξ$, with $(e|ξ)=0$. Then, the linear map $N$ defined by $N(ρ)=χA−ξ$ is not a physical transformation.

Proof. Write the state $Φ$ as $Φ=χA⊗χÃ+Ξ$. Since $(e|A|Φ)AÃ=|χ)Ã$ one must have $(e|A|Ξ)AÃ=0$. Therefore, $Ξ$ must be of the form $Ξ=∑iαi⊗βi$ with $(e|αi)=0$ for all $i$. Applying the transformation $N$ one then obtains $(N⊗IÃ)Φ=χA⊗χÃ−Ξ$. We now prove that this is not a state, and therefore, $N$ cannot be a physical transformation. Let $E$ be the teleportation effect. Since $(E|Φ)=1$, we have $1=(E|χA⊗χÃ)+(E|Ξ)=1/dA2+(E|Ξ)$. Now, we have

$(E|(N\otimes I_{\tilde A})|\Phi)=\frac{1}{d_A^2}-(E|\Xi)=\frac{2}{d_A^2}-1.$

Since this quantity is negative for every $dA>1$, the map $N$ cannot be a physical transformation. $▪$
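The value $2/d_A^2-1$ can be reproduced in the qubit case. The sketch below assumes the standard quantum realization, which is not used in the proof itself: $\Phi$ is the projector on $(|00\rangle+|11\rangle)/\sqrt2$, the teleportation effect $E$ is represented by the same projector, and $N$ flips the sign of every Pauli component that is traceless on the first system. All names are illustrative.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def trace_prod(A, B):
    """Tr[A B] for square matrices of equal size."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]

# Assumed quantum realization for d_A = 2 (see the lead-in).
PHI = [[0.5, 0, 0, 0.5],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0.5, 0, 0, 0.5]]
E = PHI

def invert_first_system(M):
    """Apply N (x) I via the Pauli expansion M = sum_PQ c_PQ P (x) Q."""
    out = [[0j] * 4 for _ in range(4)]
    for a, P in enumerate(PAULIS):
        sign = 1 if a == 0 else -1          # identity kept, traceless part flipped
        for Q in PAULIS:
            PQ = kron(P, Q)
            coeff = trace_prod(PQ, M) / 4   # Pauli expansion coefficient
            for i in range(4):
                for j in range(4):
                    out[i][j] += sign * coeff * PQ[i][j]
    return out

d = 2
value = trace_prod(E, invert_first_system(PHI)).real
print(value, 2 / d**2 - 1)
```

A negative value for a would-be probability is exactly the obstruction used in the proof: the image of $\Phi$ under $N\otimes I$ is not a state.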

Corollary 27. The matrix $MN$ defined as

$M_N=\begin{pmatrix}1&0\\0&-I_{D_A-1}\end{pmatrix}$
###### (25)

cannot represent a physical transformation of system $A$.

##### X. DERIVATION OF THE QUBIT

In this section we show that every two-dimensional system in our theory is a qubit. With this expression we mean that the normalized states in $St1(A)$ can be represented as density matrices for a quantum system with two-dimensional Hilbert space. With this choice of representation we also show that the effects in $Eff(A)$ are all the positive Hermitian matrices bounded by the identity, and that the reversible transformations $GA$ act on the states by conjugation with unitary matrices in $SU(2)$.

The first step is to prove that the set of normalized states $St1(A)$ is a sphere. The idea of the proof is a simple geometric observation: in the ordinary three-dimensional space the sphere is the only compact convex set that has an infinite number of pure states connected by orthogonal transformations. The complete proof is given in the following.

Theorem 13 (The Bloch sphere). The normalized states of a system $A$ with $dA=2$ form a sphere and the group $GA$ is $SO(3)$.

Proof. According to corollary 25, the group of reversible transformations $GA$ is a compact subgroup of the orthogonal group $O(3)$. It cannot be the whole $O(3)$ because, as we saw in corollary 27, the inversion $−I$ cannot represent a physical transformation. We now show that $GA$ must be $SO(3)$ by excluding all the other possibilities. From corollary 22 we know that the system $A$ has a continuum of pure states. Therefore, the group $GA$ must contain a continuous set of transformations. Now, from the classification of the closed subgroups of $O(3)$ we know that there are only two possibilities: (i) $GA$ is $SO(3)$ and (ii) $GA$ is the subgroup generated by $SO(2)$, the group of rotations around a fixed axis, say the $z$ axis, and possibly by the reflections with respect to planes containing the $z$ axis. Note that the reflection in the $xy$ plane is forbidden, because the composition of this reflection with the rotation of $π$ around the $z$ axis would give the inversion, which is forbidden by corollary 26. Case (ii) is excluded because in this case the action of the group $GA$ cannot be transitive. The detailed proof is as follows: because of the $SO(2)$ symmetry, the set of pure states must contain at least a circle in the $xy$ plane. This circle will be necessarily invariant under all operations in the group. However, since the convex set of states is three dimensional, there is at least a pure state outside the circle. Clearly there is no way to transform a state on the circle into a state outside the circle by means of an operation in $GA$. This is in contradiction with the fact that every two pure states are connected by a reversible transformation. Hence, case (ii) is ruled out. The only remaining alternative is (i), namely that $GA=SO(3)$ and, hence, the set of pure states generated by its action on a fixed pure state is a sphere. $▪$

Since the convex set of density matrices on a two-dimensional Hilbert space is a sphere, we can represent the states in $\mathrm{St}_1(A)$ as density matrices. Precisely, we can choose three orthogonal axes passing through the center of the sphere and call them the $x,y,z$ axes, take $\phi_{+,k},\phi_{-,k}$, $k=x,y,z$, to be the two perfectly distinguishable pure states in the direction of the $k$ axis, and define $\sigma_k:=\phi_{+,k}-\phi_{-,k}$. From the geometry of the sphere we know that any state $\rho\in\mathrm{St}_1(A)$ can be written as

$\rho=\chi+\frac12\sum_{k=x,y,z}n_k\sigma_k,\qquad\sum_{k=x,y,z}n_k^2\leqslant 1,$
###### (26)

where the pure states are those for which $∑k=x,y,znk2=1$. The Bloch representation $Sρ$ of quantum state $ρ$ is then obtained by associating the basis vectors $χ,σx,σy,σz$ to the matrices

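$S_\chi=\frac12\begin{pmatrix}1&0\\0&1\end{pmatrix},\quad S_{\sigma_x}=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad S_{\sigma_y}=\begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad S_{\sigma_z}=\begin{pmatrix}1&0\\0&-1\end{pmatrix},$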

and by defining $Sρ$ by linearity from Eq. (26). Clearly, in this way we obtain

$S_\rho=\frac12\begin{pmatrix}1+n_z&n_x-in_y\\n_x+in_y&1-n_z\end{pmatrix},$

which is the expression of a generic density matrix. Denoting by $M2(C)$ the set of complex two-by-two matrices we have the following.

Corollary 28 (Qubit density matrices). For $dA=2$ the set of states $St1(A)$ is isomorphic to the set of density matrices in $M2(C)$ through the isomorphism $ρ↦Sρ$.
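The map $\rho\mapsto S_\rho$ is easy to evaluate explicitly. The following sketch builds $S_\rho$ from a Bloch vector $(n_x,n_y,n_z)$ and checks the two properties that make it a density matrix; the function names and sample vectors are illustrative assumptions.

```python
def bloch_to_density(nx, ny, nz):
    """S_rho = (1/2) [[1 + nz, nx - i ny], [nx + i ny, 1 - nz]]."""
    if nx * nx + ny * ny + nz * nz > 1 + 1e-12:
        raise ValueError("Bloch vector must satisfy |n| <= 1")
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def trace(M):
    return M[0][0] + M[1][1]

def det(M):
    # Real for a Hermitian matrix; equals (1 - |n|^2) / 4 here.
    return (M[0][0] * M[1][1] - M[0][1] * M[1][0]).real

mixed = bloch_to_density(0.3, 0.0, 0.4)   # |n| = 0.5, inside the ball
pure = bloch_to_density(0.6, 0.0, 0.8)    # |n| = 1, on the sphere
print(trace(mixed), det(mixed), det(pure))
```

A $2\times2$ Hermitian matrix with unit trace is positive exactly when its determinant is non-negative, so the condition $\sum_k n_k^2\leqslant 1$ of Eq. (26) is the positivity condition, and pure states are those with vanishing determinant.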

Once we decide to represent the states in $St1(A)$ as matrices, the effects in $Eff(A)$ are necessarily represented by matrices too. The matrix representation of an effect, given by the map $a∈Eff(A)↦Ea∈M2(C)$ is defined uniquely by the relation

$\mathrm{Tr}[E_aS_\rho]=(a|\rho)\qquad\forall\rho\in\mathrm{St}(A).$

We then have the following.

Corollary 29. For $dA=2$ the set of effects $Eff(A)$ is isomorphic to the set of positive Hermitian matrices $P∈M2(C)$ such that $P⩽I$.

Proof. Clearly the matrix $Ea$ must be positive for every effect $a$, since we have $Tr[EaSρ]=(a|ρ)⩾0$ for every density matrix $Sρ$. Moreover, since we have $Tr[EaSρ]=(a|ρ)⩽1$ for every density matrix $Sρ$, we must have $Ea⩽I$. Finally, we know that for every couple of perfectly distinguishable pure states $ϕ,ϕ⊥$ there exists an atomic effect $a$ such that $(a|ϕ)=1$ and $(a|ϕ⊥)=0$. Since the two pure states $ϕ,ϕ⊥$ are represented by orthogonal rank-one projectors $Sϕ$ and $Sϕ⊥$, we must have $Ea=Sϕ$. This proves that the atomic effects are the whole set of positive rank-one projectors. As a consequence, also every positive matrix $P$ with $P⩽I$ must represent some effect $a$. $▪$

Finally, the reversible transformations are represented as conjugations by unitary matrices in $SU(2)$.

Corollary 30. For every reversible transformation $\mathcal U\in G_A$ with $d_A=2$ there exists a unitary matrix $U\in SU(2)$ such that

$S_{\mathcal U\rho}=US_\rho U^\dagger\qquad\forall\rho\in\mathrm{St}(A).$
###### (27)

Conversely, for every $U\in SU(2)$ there exists a reversible transformation $\mathcal U\in G_A$ such that Eq. (27) holds.

Proof. Every rotation of the Bloch sphere is represented by conjugation by some $SU(2)$ matrix. Conversely, every conjugation by an $SU(2)$ matrix represents some rotation on the Bloch sphere. On the other hand, we know that $GA$ is the group of all rotations on the Bloch sphere (theorem 13). $▪$

Note that we proved that all two-dimensional systems $A$ and $B$ in our theory have the same states [ $St1(A)≃St1(B)$], the same effects [ $Eff(A)≃Eff(B)$], and the same reversible transformations ( $GA≃GB$), but we did not show that $A$ and $B$ are operationally equivalent. For example, $A$ and $B$ could be different when we compose them with a third system $C$: the set of states $St1(AC)$ and $St1(BC)$ could be nonisomorphic. The fact that every couple of two-dimensional systems $A$ and $B$ are operationally equivalent will be proved later (cf. corollary 40).

We conclude this section with a simple fact that will be very useful later.

Corollary 31 (Superposition principle for qubits). Let ${ϕ1,ϕ2}⊂St1(A)$ be two perfectly distinguishable pure states of a system $A$ with $dA=2$. Let ${a1,a2}$ be the observation test such that $(ai|ϕj)=δij$. Then, for every probability $0⩽p⩽1$ there exists a pure state $ψp∈St1(A)$ such that

$(a_1|\psi_p)=p,\qquad(a_2|\psi_p)=1-p.$
###### (28)

Precisely, the set of pure states $ψp∈St1(A)$ satisfying Eq. (28) is a circle in the Bloch sphere.

Proof. Elementary property of density matrices. $▪$
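The elementary property invoked in the proof can be made explicit in the Bloch representation. The sketch below assumes that $\phi_1,\phi_2$ are represented by $|0\rangle\langle0|$ and $|1\rangle\langle1|$, so that $a_1,a_2$ are the corresponding projectors; the parametrization by an angle `theta` and all names are illustrative.

```python
import math

def psi_p(p, theta):
    """Density matrix of a pure qubit state with (a_1|psi) = p.

    The Bloch vector has height n_z = 2p - 1 and azimuthal angle theta,
    so varying theta sweeps a circle on the sphere.
    """
    nz = 2 * p - 1
    r = math.sqrt(max(0.0, 1 - nz * nz))    # radius of the circle at height n_z
    nx, ny = r * math.cos(theta), r * math.sin(theta)
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def prob(effect, state):
    """Tr[E_a S_rho] for 2x2 matrices."""
    return sum(effect[i][k] * state[k][i] for i in range(2) for k in range(2)).real

# Assumed representation of the observation test {a_1, a_2}.
E_A1 = [[1, 0], [0, 0]]
E_A2 = [[0, 0], [0, 1]]

for theta in (0.0, 1.0, 2.5):
    s = psi_p(0.25, theta)
    print(theta, prob(E_A1, s), prob(E_A2, s))
```

Every value of `theta` gives the same outcome probabilities, which is the circle of solutions of Eq. (28) mentioned in the corollary.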

##### XI. PROJECTIONS

In this section we define the projection on a face $F$ of the convex set $St1(A)$ and we prove several properties of projections. The projection on the face $F$ will be defined as an atomic operation $ΠF∈Transf(A)$ that acts as the identity on states in the face $F$ and that annihilates the states on the orthogonal face $F⊥$. In the following we first introduce the concept of orthogonal face, then prove the existence and uniqueness of projections, and finally give some useful results on the projection of a pure state on two orthogonal faces.

###### A. Orthogonal faces and orthogonal complements

In order to introduce the notion of orthogonal face we need first a few elementary results. We start by showing that there is a canonical way to associate a state $ωF$ to a face $F$.

Lemma 42 (State associated to a face). Let $F$ be a face of the convex set $\mathrm{St}_1(A)$ and let $\{\phi_i\}_{i=1}^{|F|}$ be a maximal set of perfectly distinguishable pure states in $F$. Then the state $\omega_F:=\frac{1}{|F|}\sum_{i=1}^{|F|}\phi_i$ depends only on the face $F$ and not on the particular set $\{\phi_i\}_{i=1}^{|F|}$. Moreover, $F$ is the face identified by $\omega_F$.

Proof. Suppose that $F$ is the face identified by $\rho$ and let $E\in\mathrm{Transf}(A,C)$ [or $D\in\mathrm{Transf}(C,A)$] be the encoding (or decoding) in the ideal compression for $\rho$. By lemma 4 and corollary 12, $\{E\phi_i\}_{i=1}^{|F|}$ is a maximal set of perfectly distinguishable pure states of $C$ and by theorem 9 one has $\chi_C=\frac{1}{|F|}\sum_{i=1}^{|F|}E\phi_i$. Hence, $\omega_F=\frac{1}{|F|}\sum_{i=1}^{|F|}\phi_i=\frac{1}{|F|}\sum_{i=1}^{|F|}DE\phi_i=D\chi_C$. Since the right-hand side of the equality is independent of the particular set $\{\phi_i\}_{i=1}^{|F|}$, the state $\omega_F$ on the left-hand side is independent too. To prove that $F$ is the face identified by $\omega_F$ it is enough to observe that $\omega_F$ is completely mixed relative to $F$: this fact follows from the relation $\omega_F=D\chi_C$ and from lemma 5. $▪$

We now define the orthogonal complement of the state $ωF$.

Definition 9. The orthogonal complement of the state $\omega_F$ is the state $\omega_F^\perp\in\mathrm{St}_1(A)\cup\{0\}$ defined as follows:

• (1) if $|F|=d_A$, then $\omega_F^\perp=0$;

• (2) if $|F|<d_A$, then $\omega_F^\perp$ is defined by the relation

$\chi_A=\frac{|F|}{d_A}\,\omega_F+\frac{d_A-|F|}{d_A}\,\omega_F^\perp.$
###### (29)

An easy way to write the orthogonal complement of $\omega_F$ is the following.

Lemma 43. Take a maximal set $\{\phi_i\}_{i=1}^{|F|}$ of perfectly distinguishable pure states in $F$ and extend it to a maximal set $\{\phi_i\}_{i=1}^{d_A}$ of perfectly distinguishable pure states in $\mathrm{St}_1(A)$. Then for $|F|<d_A$ we have

$\omega_F^\perp=\frac{1}{d_A-|F|}\sum_{i=|F|+1}^{d_A}\phi_i.$

Proof. By definition, for $|F|<d_A$ we have $\omega_F^\perp=\frac{1}{d_A-|F|}\bigl(d_A\chi_A-|F|\,\omega_F\bigr)$. Substituting the expressions $\chi_A=\frac{1}{d_A}\sum_{i=1}^{d_A}\phi_i$ and $\omega_F=\frac{1}{|F|}\sum_{i=1}^{|F|}\phi_i$ we then obtain the thesis. $▪$

Note, however, that by definition the orthogonal complement $ωF⊥$ depends only on the face $F$ and not on the choice of the maximal set in lemma 43.

An obvious consequence of lemma 43 is the following.

Corollary 32. The states $ωF$ and $ωF⊥$ are perfectly distinguishable.

Proof. Take a maximal set ${ϕi}i=1|F|$ of perfectly distinguishable pure states in $F$, extend it to a maximal set ${ϕi}i=1dA$, and take the observation test such that $(ai|ϕj)=δij$. Then the binary test ${aF,e−aF}$, defined by $aF:=∑i=1|F|ai$ distinguishes perfectly between $ωF$ and $ωF⊥$. $▪$
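Equation (29) and corollary 32 can be checked in a classical-like toy model. The sketch below assumes states diagonal in a fixed maximal set of $d$ perfectly distinguishable pure states, with the face $F$ spanned by the first $f$ of them; the function name and the values of `d` and `f` are illustrative.

```python
from fractions import Fraction

def face_states(d, f):
    """Diagonal (probability-vector) versions of omega_F, omega_F^perp, chi.

    The face F is spanned by the first f of d perfectly distinguishable
    pure states; requires 0 < f < d.
    """
    omega_f = [Fraction(1, f)] * f + [Fraction(0)] * (d - f)
    omega_perp = [Fraction(0)] * f + [Fraction(1, d - f)] * (d - f)
    chi = [Fraction(1, d)] * d
    return omega_f, omega_perp, chi

d, f = 5, 2
omega_f, omega_perp, chi = face_states(d, f)

# Eq. (29): chi = |F|/d * omega_F + (d - |F|)/d * omega_F^perp
mix = [Fraction(f, d) * a + Fraction(d - f, d) * b
       for a, b in zip(omega_f, omega_perp)]

# The effect a_F = a_1 + ... + a_|F| is the indicator of the first f outcomes.
a_f = [1] * f + [0] * (d - f)
p_on_f = sum(x * y for x, y in zip(a_f, omega_f))
p_on_perp = sum(x * y for x, y in zip(a_f, omega_perp))
print(mix == chi, p_on_f, p_on_perp)
```

The binary test $\{a_F,e-a_F\}$ gives probability one on $\omega_F$ and zero on $\omega_F^\perp$, which is the perfect distinguishability stated in corollary 32.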

We say that a state $τ∈St1(A)$ is perfectly distinguishable from the face $F$ if $τ$ is perfectly distinguishable from every state $σ$ in the face $F$. With this definition we have the following.

Lemma 44. The following are equivalent:

• (1) $τ$ is perfectly distinguishable from the face $F$,

• (2) $τ$ is perfectly distinguishable from $ωF$,

• (3) $τ$ belongs to the face identified by $ωF⊥$, that is, $τ∈FωF⊥$.

Proof. ($1\Leftrightarrow 2$) $\tau$ is perfectly distinguishable from $\omega_F$ if and only if there exists a binary test $\{a,e-a\}$ such that $(a|\tau)=1$ and $(a|\omega_F)=0$. By lemma 19 this is equivalent to the condition $(a|\tau)=1$ and $a=_{\omega_F}0$, that is, $\tau$ is distinguishable from any state $\sigma$ in the face identified by $\omega_F$, which by definition is $F$. ($2\Rightarrow 3$) Let $\{\phi_i\}_{i=1}^{|F|}$ be a maximal set of perfectly distinguishable states in $F$, so that $\omega_F=\frac{1}{|F|}\sum_{i=1}^{|F|}\phi_i$, and let $\{\phi_i\}_{i=|F|+1}^{k}$ be the maximal set of perfectly distinguishable pure states in the spectral decomposition $\tau=\sum_{i=|F|+1}^{k}p_i\phi_i$, with $p_i>0$ for every $i=|F|+1,\dots,k$. Since $\tau$ is perfectly distinguishable from $\omega_F$, by lemma 23 we have that the states $\{\phi_i\}_{i=1}^{k}$ are all perfectly distinguishable. Let us extend this set to a maximal set $\{\phi_i\}_{i=1}^{d_A}$. By lemma 43 we have $\omega_F^\perp=\frac{1}{d_A-|F|}\sum_{i=|F|+1}^{d_A}\phi_i$. Hence, all the states $\{\phi_i\}_{i=|F|+1}^{d_A}$ are in the face $F_{\omega_F^\perp}$. Since $\tau$ is a mixture of these states, it also belongs to the face $F_{\omega_F^\perp}$. ($3\Rightarrow 2$) Since $\omega_F$ and $\omega_F^\perp$ are perfectly distinguishable, if $\tau$ belongs to the face identified by $\omega_F^\perp$, then by lemma 22 $\tau$ is perfectly distinguishable from $\omega_F$. $▪$

Corollary 33. If $ρ$ is perfectly distinguishable from $σ$ and from $τ$, then $ρ$ is perfectly distinguishable from any convex mixture of $σ$ and $τ$.

Proof. Let $F$ be the face identified by $ρ$. Then by lemma 44 we have $σ,τ∈FωF⊥$. Since $FωF⊥$ is a convex set, any mixture of $σ$ and $τ$ belongs to it. By lemma 44, this means that any mixture of $σ$ and $τ$ is perfectly distinguishable from $ρ$. $▪$

We are now ready to give the definition of orthogonal face.

Definition 10 (Orthogonal face). The orthogonal face $F⊥$ is the set of all states that are perfectly distinguishable from the face $F$.

By lemma 44 it is clear that $F⊥$ is the face identified by $ωF⊥$, that is $F⊥=FωF⊥$.

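In the following we list a few elementary facts about orthogonal faces.

Lemma 45. The following properties hold:

• (1) $|F^\perp|=d_A-|F|$,

• (2) $\chi_A=\frac{|F|}{d_A}\,\omega_F+\frac{|F^\perp|}{d_A}\,\omega_{F^\perp}$,

• (3) $\omega_{F^\perp}=\omega_F^\perp$,

• (4) $\omega_{F^\perp}^\perp=\omega_F$,

• (5) $(F^\perp)^\perp=F$.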

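Proof. Item 1. If $|F|=d_A$ the thesis is obvious. If $|F|<d_A$, take a maximal set $\{\phi_i\}_{i=1}^{|F|}$ (or $\{\phi_j\}_{j=|F|+1}^{|F|+|F^\perp|}$) of perfectly distinguishable pure states in $F$ (or $F^\perp$). Hence we have

$\omega_F=\frac{1}{|F|}\sum_{i=1}^{|F|}\phi_i\quad\text{or}\quad\omega_{F^\perp}=\frac{1}{|F^\perp|}\sum_{j=|F|+1}^{|F|+|F^\perp|}\phi_j.$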

By corollary 32 the states $ωF$ and $ωF⊥$ are perfectly distinguishable. Hence the states ${ϕi}i=1|F|+|F⊥|$ are perfectly distinguishable jointly (lemma 23). Now we must have $|F|+|F⊥|=dA$, otherwise there would be a pure state $ψ$ that is perfectly distinguishable from the states ${ϕi}i=1|F|+|F⊥|$. This implies that $ψ$ belongs to $F⊥$ and that states ${ψ}∪{ϕj}j=|F|+1|F|+|F⊥|$ are perfectly distinguishable in $F⊥$, in contradiction with the hypotheses that the set ${ϕj}j=|F|+1|F|+|F⊥|$ is maximal in $F⊥$. Item 2 Immediate from item 1 and definition 9. Items 3 and 4 Both items follow by comparison of item 2 with Eq. (29). Item 5 By condition 3 of lemma 44, $(F⊥)⊥$ is the face identified by the state $ωF⊥⊥$, which, by item 4, is $ωF$. Since the face identified by $ωF$ is $F$, we have $(F⊥)⊥=F$. $▪$

We now show that there is a canonical way to associate an effect $aF$ to a face $F$.

Definition 11 (Effect associated to a face). We say that $a_F\in\mathrm{Eff}(A)$ is the effect associated to the face $F\subseteq\mathrm{St}_1(A)$ if and only if $a_F=_{\omega_F}e$ and $a_F=_{\omega_F^\perp}0$.

In other words, the definition imposes that $(aF|ρ)=1$ for every $ρ∈F$ and $(aF|σ)=0$ for every $σ∈F⊥$.

Lemma 46. A state $ρ∈St1(A)$ belongs to the face $F$ if and only if $(aF|ρ)=1$.

Proof. By definition, if $\rho$ belongs to $F$, then $(a_F|\rho)=1$. Conversely, if $(a_F|\rho)=1$, then $\rho$ is perfectly distinguishable from $\omega_F^\perp$, because $(a_F|\omega_F^\perp)=0$. Now, we know that $\omega_F^\perp$ is equal to $\omega_{F^\perp}$ (item 3 of lemma 45). By item 2 of lemma 44 the fact that $\rho$ is perfectly distinguishable from $\omega_{F^\perp}$ implies that $\rho$ belongs to $(F^\perp)^\perp$, which is just $F$ (item 5 of lemma 45). $▪$

We now show that the effect $aF$ associated to the face $F$ exists and is unique. A preliminary result needed to this purpose is the following.

Lemma 47. The effect $aF$ must have the form $aF=∑i=1|F|ai$, where $ai$ is the atomic effect such that $(ai|ϕi)=1$ and ${ϕi}i=1|F|$ is a maximal set of perfectly distinguishable pure states in $F$.

Proof. By corollary 23 we have that $a_F$ can be written as $(a_F|=\sum_i d_i(a_i|$ where $\{a_i\}_{i=1}^{d_A}$ is a perfectly distinguishing test. Moreover, since $a_F$ is an effect, we must have $d_i\geqslant 0$ for all $i=1,\dots,d_A$. Now, by definition we have $(a_F|\omega_F^\perp)=0$, which implies $d_i(a_i|\omega_F^\perp)=0$ for every $i=1,\dots,d_A$, that is, $(a_i|\omega_F^\perp)=0$ whenever $d_i\neq 0$. Let us focus on the values of $i$ for which $d_i\neq 0$. Let $\phi_i$ be the pure state such that $(a_i|\phi_i)=1$. The condition $(a_i|\omega_F^\perp)=0$ implies that $\phi_i$ is perfectly distinguishable from $\omega_F^\perp$. Therefore, $\phi_i$ belongs to $(F^\perp)^\perp$, which is $F$. Since by definition we must have $(a_F|\phi_i)=1$, this also implies that $d_i=1$. In summary, we proved that $a_F=\sum_i{}'\,a_i$ where the prime means that the sum is restricted to those values of $i$ such that $\phi_i\in F$. The condition $a_F=_{\omega_F}e$ also implies that the number of terms in the sum must be exactly $|F|$. The thesis is then proved by suitably relabelling the effects $\{a_i\}_{i=1}^{d_A}$, in such a way that $\phi_i$ belongs to $F$ for every $i=1,\dots,|F|$. $▪$

Lemma 48. The effect $aF$ associated to the face $F$ is unique.

Proof. Suppose that $aF=∑i=1|F|ai$ and $aF′=∑i=1|F|ai′$ are two effects associated to the face $F$, both written as in lemma 47. Let ${ϕi}i=1|F|$ (or ${ϕi′}i=1|F|$) be the maximal set of perfectly distinguishable pure states in $F$ such that $(ai|ϕi)=1$ for every $i=1,⋯,|F|$ [or $(ai′|ϕi′)=1$ for every $i=1,⋯,|F|$], and let ${ψj}j=1|F⊥|$ be a maximal set of perfectly distinguishable pure states in $F⊥$. Since $ωF$ and $ωF⊥$ are perfectly distinguishable, the states ${ϕi}i=1|F|∪{ψj}j=1|F⊥|$ (or ${ϕi′}i=1|F|∪{ψj}j=1|F⊥|$) are perfectly distinguishable (lemma 23). Moreover, the set is maximal since $|F|+|F⊥|=dA$. Let $bj$ be the atomic effect such that $(bj|ψj)=1$. Then, the test that distinguishes the states ${ϕi}i=1|F|∪{ψj}j=1|F⊥|$ (or ${ϕi′}i=1|F|∪{ψj}j=1|F⊥|$) is given by ${ai}i=1|F|∪{bj}j=1|F⊥|$ (or ${ai′}i=1|F|∪{bj}j=1|F⊥|$) and its normalization reads

$e=\sum_{i=1}^{|F|}a_i+\sum_{j=1}^{|F^{\perp}|}b_j=a_F+\sum_{j=1}^{|F^{\perp}|}b_j,\qquad e=\sum_{i=1}^{|F|}a_i'+\sum_{j=1}^{|F^{\perp}|}b_j=a_F'+\sum_{j=1}^{|F^{\perp}|}b_j.$

By comparison we obtain $aF=aF′$. $▪$

###### B. Projections

We are now in position to define the projection on a face.

Definition 12 (Projection). Let $F$ be a face of $St1(A)$. A projection on the face $F$ is an atomic transformation $ΠF$ such that

• (1) $\Pi_F=_{\omega_F}\mathcal{I}_A$,

• (2) $\Pi_F=_{\omega_F^{\perp}}0$.

When $F$ is the face identified by a pure state $ϕ∈St1(A)$, we have $F={ϕ}$ and call $Π{ϕ}$ a projection on the pure state $ϕ$.

The first condition in definition 12 means that the projection $\Pi_F$ does not disturb the states in the face $F$. The second condition means that $\Pi_F$ annihilates all states in the orthogonal face $F^{\perp}$. As a notation, we will indicate with $\Pi_F^{\perp}$ the projection on the face $F^{\perp}$, that is, we will use the definition $\Pi_F^{\perp}:=\Pi_{F^{\perp}}$.

An equivalent condition for $ΠF$ to be a projection on the face $F$ is the following.

Lemma 49. Let ${ϕi}i=1dA$ be a maximal set of perfectly distinguishable pure states for system $A$. The transformation $ΠF$ in $Transf(A)$ is a projection on the face generated by the subset ${ϕi}i=1|F|$ if and only if

• (1) $\Pi_F=_{\omega_F}\mathcal{I}_A$,

• (2) $\Pi_F|\phi_l)=0$ for all $l>|F|$.

Proof. The condition is clearly necessary, since by definition 12 $ΠF|ϕl)=0$ for $l>|F|$. On the other hand, if $ΠF|ϕl)=0$ for $l>|F|$ then by definition of $ωF⊥$ we have $ΠF|ωF⊥)=0$ and, therefore, $ΠF=ωF⊥0$. $▪$

A result that will be useful later.

Lemma 50. The transformation $ΠF⊗IB$ is a projection on the face $F̃$ identified by the state $ωF⊗χB$.

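Proof. $\Pi_F\otimes\mathcal{I}_B$ is atomic, being the product of two atomic transformations. We now show that $\Pi_F\otimes\mathcal{I}_B=_{\omega_F\otimes\chi_B}\mathcal{I}_A\otimes\mathcal{I}_B$. Indeed, by the local tomography axiom it is easy to see that every state $\sigma\in F_{\omega_F\otimes\chi_B}$ can be written as $|\sigma)=\sum_{i=1}^{r}\sum_{j=1}^{d_B}\sigma_{ij}|\alpha_i)|\beta_j)$, where $\{\alpha_i\}_{i=1}^{r}$ is a basis for $\mathrm{Span}(F)$ and $\{\beta_j\}_{j=1}^{d_B}$ is a basis for $\mathrm{St}_1(B)$. Since $\Pi_F=_{\omega_F}\mathcal{I}_A$, we have

$(\Pi_F\otimes\mathcal{I}_B)|\sigma)=\sum_{i=1}^{r}\sum_{j=1}^{d_B}\sigma_{ij}\,\Pi_F|\alpha_i)|\beta_j)=\sum_{i=1}^{r}\sum_{j=1}^{d_B}\sigma_{ij}|\alpha_i)|\beta_j)=|\sigma),$

which implies $\Pi_F\otimes\mathcal{I}_B=_{\omega_F\otimes\chi_B}\mathcal{I}_A\otimes\mathcal{I}_B$. Finally, note that $\omega_{\tilde F}=\omega_F\otimes\chi_B$, while $\omega_{\tilde F}^{\perp}=\omega_F^{\perp}\otimes\chi_B$. Since we have $(\Pi_F\otimes\mathcal{I}_B)|\omega_{\tilde F}^{\perp})=\Pi_F|\omega_F^{\perp})\otimes|\chi_B)=0$, we can conclude $\Pi_F\otimes\mathcal{I}_B=_{\omega_{\tilde F}^{\perp}}0$. Hence $\Pi_F\otimes\mathcal{I}_B$ is a projection on $\tilde F$. $▪$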

In the following we will show that for every face $F$ there exists a unique projection $ΠF$ and we will prove several properties of projections. Let us start from an elementary observation.

Lemma 51. Let $ϕ$ be a pure state in the face $F⊆St1(A)$ and let $a∈Eff(A)$ be the atomic effect such that $(a|ϕ)=1$. If $A∈Transf(A)$ is an atomic transformation such that $A=ωFIA$, then $(a|A=(a|$. Moreover, if $aF$ is the effect associated to the face $F$, then we have $(aF|A=(aF|$.

Proof. By lemma 16, the effect $(a|A$ is atomic. Now, since $A|ϕ)=|ϕ)$, we have $(a|A|ϕ)=(a|ϕ)=1$. However, by theorem 8 $(a|$ is the unique atomic effect such that $(a|ϕ)=1$. Hence, $(a|A=(a|$. Moreover, writing $aF$ as $aF=∑i=1|F|ai$ with $(ai|ϕi)=1$, $ϕi∈F$ (lemma 47), we obtain $(aF|A=∑i=1|F|(ai|A=∑i=1|F|(ai|=(aF|$. $▪$

When applied to the case of projections, the above lemma gives the following.

Corollary 34. Let $ϕ$ be a pure state in the face $F⊆St1(A)$ and let $a∈Eff(A)$ be the atomic effect such that $(a|ϕ)=1$. Then we have $(a|ΠF=(a|$. Moreover, if $aF$ is the effect associated to the face $F$, then we have $(aF|=(aF|ΠF$.

The counterpart of corollary 34 is given as follows.

Lemma 52. Let $ψ$ be a pure state in the face $F⊥$ and let $b$ be the atomic effect such that $(b|ψ)=1$. Then, we have $(b|ΠF=0$. Moreover, if $aF⊥$ is the effect associated to the face $F⊥$, then we have $(aF⊥|ΠF=0$.

Proof. By lemma 16, the effect $(b|ΠF$ is atomic. Hence $(b|ΠF$ must be proportional to an atomic effect $b′$ with $∥b′∥=1$, for some proportionality constant $λ∈[0,1]$, that is $(b|ΠF=λ(b′|$. We want to prove that $λ$ is zero. By contradiction, suppose that $λ≠0$. Let $ψ′$ be the pure state such that $(b′|ψ′)=1$. Now, since $ΠF|ωF⊥)=0$, we have $0=(b|ΠF|ωF⊥)=λ(b′|ωF⊥)$, which implies $(b′|ωF⊥)=0$. Hence, $ψ′$ is perfectly distinguishable from $ωF⊥$, which in turn implies that $ψ′$ belongs to $(F⊥)⊥=F$. We then have $λ=(b|ΠF|ψ′)=(b|ψ′)=0$ (the last equality follows from the fact that $ψ$ and $ψ′$ belong to $F⊥$ and $F$, respectively, and hence are perfectly distinguishable). This is in contradiction with the assumption $λ≠0$, thus concluding the proof that $(b|ΠF=0$. Moreover, writing $aF⊥$ as $aF⊥=∑i=1|F⊥|bi$ with $(bi|ψi)=1$, $ψi∈F⊥$, we obtain $(aF⊥|ΠF=∑i=1|F⊥|(bi|ΠF=0$. $▪$

Combining corollary 34 and lemma 52 we obtain an important property of projections, expressed by the following.

Corollary 35. If $ΠF$ is a projection on the face $F$, then one has $(eA|ΠF=(aF|$.

Proof. The thesis follows from corollary 34 and lemma 52 and from the fact that $aF+aF⊥=e$. $▪$

In the following we will see that for every face $F$ there exists a unique projection. To prove that, let us start from the existence.

Lemma 53 (Existence of projections). For every face $F$ of $St1(A)$ there exists a projection $ΠF$.

Proof. By lemma 18, there exists a system $B$ and an atomic transformation $A∈Transf(A,B)$ with $(e|BA=(aF|$. Then, if $ΨωF∈St(AC)$ is a purification of $ωF$, we can define the state $|Σ)BC:=(A⊗IC)|ΨωF)AC$. By lemma 16 $Σ$ is a pure state. Moreover, the pure states $Σ$ and $ΨωF$ have the same marginal on system $C$: indeed, we have $(eB||Σ)=[(eB|A]|ΨωF)=(aF||ΨωF)$ and, by definition, $aF=ωFeA$, which by theorem 1 implies $(aF||ΨωF)=(eA||ΨωF)$. If $ϕ0$ and $ψ0$ are two arbitrary pure states of $A$ and $B$, respectively, the uniqueness of purification stated by postulate 1 implies that there exists a reversible channel $U∈GAB$ such that

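$(U\otimes\mathcal{I}_C)\,|\Sigma)_{BC}|\phi_0)_A=|\psi_0)_B|\Psi_{\omega_F})_{AC}.$
###### (30)

Now, take the atomic effect $b\in\mathrm{Eff}(B)$ such that $(b|\psi_0)=1$, and define the transformation $\Pi_F\in\mathrm{Transf}(A)$ as

$\Pi_F:=[(b|_B\otimes\mathcal{I}_A]\,U\,[\mathcal{A}\otimes|\phi_0)_A].$

Applying $b$ on both sides of Eq. (30) we then obtain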

$(ΠF⊗IC)|ΨωF)=|ΨωF)$

and, therefore, $ΠF=ωFIA$. Moreover, the transformation $ΠF$ is atomic, being the composition of atomic transformations (lemma 16). Finally, we have $ΠF=ωF⊥0$: indeed, by construction of $ΠF$ we have

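$(e_A|\Pi_F|\rho)=(e_A\otimes b|\,U(\mathcal{A}\otimes\mathcal{I}_A)\,|\rho)|\phi_0)\leqslant(e_A\otimes e_B|\,U(\mathcal{A}\otimes\mathcal{I}_A)\,|\rho)|\phi_0)=(e_B|\mathcal{A}|\rho)=(a_F|\rho).$

This implies $(e_A|\Pi_F|\omega_F^{\perp})\leqslant(a_F|\omega_F^{\perp})=0$ and, therefore, $\Pi_F=_{\omega_F^{\perp}}0$. In conclusion, $\Pi_F$ is the desired projection. $▪$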

To prove the uniqueness of the projection $ΠF$ we need two auxiliary lemmas, given in the following.

Lemma 54. Let $Φ∈St1(AÃ)$ be a purification of the invariant state $χA$, and let $ΠF∈Transf(A)$ be a projection on the face $F⊆St1(A)$. Then, the pure state $ΦF∈St1(AÃ)$ defined by

$|\Phi_F):=\frac{d_A}{|F|}(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)$
###### (31)

is a purification of $ωF$.

Proof. The state $ΦF$ is pure by lemma 16. Let us choose a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$ such that ${ϕi}i=1|F|$ is maximal in $F$. Now, we have

$(e_{\tilde A}||\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}\,\Pi_F\,(e_{\tilde A}||\Phi)_{A\tilde A}=\frac{d_A}{|F|}\,\Pi_F|\chi_A)$

having used the relation $(eÃ||Φ)AÃ=|χA)$ (corollary 16). We then obtain

$(e_{\tilde A}||\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}\,\Pi_F|\chi_A)=\frac{1}{|F|}\sum_{i=1}^{d_A}\Pi_F|\phi_i)=\frac{1}{|F|}\sum_{i=1}^{|F|}|\phi_i)=|\omega_F)$

having used that $χA=∑i=1dAϕi/dA$ (theorem 9), and the definition of $ΠF$. $▪$

Lemma 55. Let $ΠF∈Transf(A)$ be a projection. A transformation $C∈Transf(A)$ satisfies $C=ωFIA$ if and only if

$CΠF=ΠF.$
###### (32)

Proof. Let $ΦF$ be the purification of $ωF$ defined in lemma 54. Since $C=ωFIA$, we have $(C⊗I)|ΦF)=|ΦF)$. In other words, we have $(CΠF⊗I)|Φ)=(ΠF⊗I)|Φ)$. Since $Φ$ is dynamically faithful, this implies that $CΠF=ΠF$. Conversely, Eq. (32) implies that for $σ∈FωF$, $C|σ)=CΠF|σ)=ΠF|σ)=|σ)$, namely $C=ωFIA$. $▪$

Theorem 14 (Uniqueness of projections). The projection $ΠF$ satisfying definition 12 is unique.

Proof. Let $ΠF$ and $ΠF′$ be two projections on the same face $F$, and define the pure states $ΦF$ and $ΦF′$ as in lemma 54. Now, $ΦF$ and $ΦF′$ are both purifications of the same state $ω̃F∈Ã$ : indeed, one has

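$(e_A||\Phi_F)_{A\tilde A}=\frac{d_A}{|F|}[(e_A|\Pi_F]\,|\Phi)_{A\tilde A}=\frac{d_A}{|F|}(a_F||\Phi)_{A\tilde A}=\frac{d_A}{|F|}[(e_A|\Pi_F']\,|\Phi)_{A\tilde A}=(e_A||\Phi_F')_{A\tilde A}$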

having used the relation $(e_A|\Pi_F=(a_F|=(e_A|\Pi_F'$, which comes from corollary 35 and from the uniqueness of the effect $a_F$ (lemma 48). By the uniqueness of purification, we have $|\Phi_F')=(U\otimes\mathcal{I}_{\tilde A})|\Phi_F)$ for some reversible transformation $U\in G_A$. This implies $(\Pi_F'\otimes\mathcal{I}_{\tilde A})|\Phi)=(U\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)$, and, since $\Phi$ is dynamically faithful, $\Pi_F'=U\Pi_F$. Since by definition 12 we have $\Pi_F'=_{\omega_F}\mathcal{I}_A$ and $\Pi_F=_{\omega_F}\mathcal{I}_A$, we can conclude that $U=_{\omega_F}\mathcal{I}_A$. Finally, using lemma 55 with $C=U$ we obtain $\Pi_F'=U\Pi_F=\Pi_F$. $▪$

We now show a few simple properties of projections. In the following, given a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$ and any subset $V⊆{1,⋯,dA}$ we define (with a slight abuse of notation) $ωV:=∑i∈Vϕi/|V|$, and $ΠV$ as the projection on the face $FV:=FωV$. We will refer to $FV$ as the face generated by $V$.

Lemma 56. For two arbitrary subsets $V,W⊆{1,⋯,dA}$ one has

$ΠVΠW=ΠV∩W.$

In particular, if $V∩W=∅$ one has $ΠVΠW=0$.

Proof. First of all, $\Pi_V\Pi_W$ is atomic, being the product of two atomic transformations. Moreover, since the face $F_{V\cap W}$ is contained in the faces $F_V$ and $F_W$, we have $\Pi_V\Pi_W|\rho)=\Pi_V|\rho)=|\rho)$ for every $\rho\in F_{V\cap W}$. In other words, $\Pi_V\Pi_W=_{\omega_{V\cap W}}\mathcal{I}_A$. Moreover, if $l\notin V\cap W$ we have $\Pi_V\Pi_W|\phi_l)=0$. By lemma 49 and by the uniqueness of projections (theorem 14) we then obtain that $\Pi_V\Pi_W$ is the projection on the face generated by $V\cap W$. $▪$

Corollary 36 (Idempotence). Every projection $ΠF$ satisfies the identity $ΠF2=ΠF$.

Proof. Consider a maximal set of perfectly distinguishable pure states ${ϕi}i=1dA$ such that ${ϕi}i∈V$ is maximal in $F$. In this way $F$ is the face generated by $V$, and, therefore $ΠF=ΠV$. The thesis follows by taking $V=W$ in lemma 56. $▪$
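Lemma 56 and corollary 36 can be checked numerically in the standard quantum model that the derivation ultimately recovers. The following minimal Python sketch assumes that the projection $\Pi_V$ acts on a matrix as $\rho\mapsto P_V\rho P_V$ with $P_V$ a diagonal projector; this concrete form, the helper names, and the test matrix are illustrative assumptions and are not part of the operational argument.

```python
# Sketch only: assumes the standard quantum model, where the projection
# Pi_V acts on a matrix rho as rho -> P_V rho P_V with P_V diagonal.

def diag_projector(d, V):
    """d x d matrix with ones at the diagonal positions listed in V (0-based)."""
    return [[1 if (r == s and r in V) else 0 for s in range(d)] for r in range(d)]

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def project(V, rho):
    """Apply the (assumed) quantum form of Pi_V: rho -> P_V rho P_V."""
    P = diag_projector(len(rho), V)
    return matmul(matmul(P, rho), P)

d = 4
V, W = {0, 1, 2}, {1, 2, 3}
# Hypothetical Hermitian test matrix; it only needs to be a generic input.
rho = [[complex(r + s, r - s) for s in range(d)] for r in range(d)]

lhs = project(V, project(W, rho))    # Pi_V Pi_W applied to rho
rhs = project(V & W, rho)            # Pi_{V ∩ W} applied to rho
twice = project(V, project(V, rho))  # Pi_V applied twice

print(lhs == rhs)                    # lemma 56
print(twice == project(V, rho))      # corollary 36 (idempotence)
```

With disjoint index sets, for example `project({0}, project({3}, rho))`, the same code returns the zero matrix, in agreement with the special case $V\cap W=\emptyset$.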

Corollary 37. For every state $ρ∈St1(A)$ such that $ρ∉F⊥$, the normalized state $ρ′$ defined by

$|\rho')=\frac{\Pi_F|\rho)}{(e|\Pi_F|\rho)}$
###### (33)

belongs to the face $F$.

Proof. By corollary 35, we have $(e|ΠF=(aF|$. Since $ρ∉F⊥$, we must have $(e|ΠF|ρ)=(aF|ρ)>0$, and, therefore, the state $ρ′$ in Eq. (33) is well defined. Moreover, using the definition of $ρ′$ we obtain

$(a_F|\rho')=\frac{(a_F|\Pi_F|\rho)}{(e|\Pi_F|\rho)}=1$

having used corollaries 34 and 35 for the last equality. Finally, lemma 46 implies that $ρ′$ belongs to the face $F$. $▪$

Corollary 38. Let $Π{ϕ}$ be the projection on the pure state $ϕ∈St1(A)$ and $a$ be the atomic effect such that $(a|ϕ)=1$. Then for every state $ρ∈St1(A)$ one has $Π{ϕ}|ρ)=p|ϕ)$ where $p=(a|ρ)$.

Proof. Recall that, by corollary 35, we have $(a|=(e|Π{ϕ}$. If $(a|ρ)=0$ then clearly $Π{ϕ}|ρ)=0$. Otherwise, the proof is a straightforward application of corollary 37. $▪$
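As a concrete instance of corollary 38, one can again assume the standard quantum model, in which the projection on a pure state acts as $\rho\mapsto P\rho P$ with $P=|\phi\rangle\langle\phi|$ and the atomic effect $a$ is represented by the same matrix $P$. The sketch below uses hypothetical qubit data and checks that $P\rho P=p\,P$ with $p=\mathrm{Tr}[P\rho]$.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def trace(X):
    """Sum of the diagonal entries."""
    return sum(X[i][i] for i in range(len(X)))

# Hypothetical qubit data: a pure state phi and a mixed state rho.
phi = [1 / math.sqrt(2), 1j / math.sqrt(2)]
P = [[a * b.conjugate() for b in phi] for a in phi]   # |phi><phi|
rho = [[0.75, 0.25], [0.25, 0.25]]

projected = matmul(matmul(P, rho), P)          # assumed form of Pi_{phi} on rho
p = trace(matmul(P, rho)).real                 # p = (a|rho) = Tr[P rho]
target = [[p * x for x in row] for row in P]   # p |phi)

# Largest entrywise deviation between the two sides of corollary 38.
max_dev = max(abs(projected[r][s] - target[r][s])
              for r in range(2) for s in range(2))
print(p, max_dev)
```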

We conclude the present subsection with a result that will be useful in the next subsection.

Lemma 57. An atomic transformation $A∈Transf(A)$ satisfies $A=ωFIA$ if and only if

$ΠFA=ΠF.$
###### (34)

Proof. Suppose that $A=ωFIA$. Let $Φ∈St1(AÃ)$ be a purification of the invariant state $χA$ and define the two pure states

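$|\Phi_F):=\frac{d_A}{|F|}(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi),\qquad|\Phi_F'):=\frac{d_A}{|F|}(\Pi_F\mathcal{A}\otimes\mathcal{I}_{\tilde A})|\Phi).$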

Then we have

$(e_A||\Phi_F')=\frac{d_A}{|F|}[(a_F|\mathcal{A}]\,|\Phi)=\frac{d_A}{|F|}(a_F||\Phi)=(e_A||\Phi_F)$

having used the condition $(aF|A=(aF|$ (lemma 51). Now we proved that $ΦF$ and $ΦF′$ have the same marginal on system $Ã$. By the uniqueness of purification, there exists a reversible transformation $V∈GA$ such that $|ΦF′)=(V⊗IÃ)|ΦF)$. Since $Φ$ is dynamically faithful, this implies $ΠFA=VΠF$.

Now, for every $ρ$ in $F$ one has $V|ρ)=VΠF|ρ)=ΠFA|ρ)=|ρ)$, namely $V=ωFIA$. Applying lemma 55 with $C=VΠF$ and using the idempotence of projections we then obtain

$ΠFA=VΠF=(VΠF)ΠF=ΠFΠF=ΠF.$

Conversely, suppose that Eq. (34) is satisfied. Let $ϕ∈F$ be a pure state in $F$ and $a$ be the atomic effect such that $(a|ϕ)=1$. Then, we have

$(a|\mathcal{A}|\phi)=(a|\Pi_F\mathcal{A}|\phi)=(a|\Pi_F|\phi)=(a|\phi)=1$

having used the relation $(a|ΠF=(a|$ (corollary 34). Then, by theorem 7 $Aϕ=ϕ$. Since $ϕ∈F$ is arbitrary this implies $A=ωFIA$. $▪$

###### C. Projection of a pure state on two orthogonal faces

In Sec. X we proved a number of results concerning two-dimensional systems. Some properties of two-dimensional systems will be extended to the case of generic systems using the following lemma.

Lemma 58. Consider a pure state $ϕ∈St1(A)$ and two complementary projections $ΠF$ and $ΠF⊥$. Then $ϕ$ belongs to the face identified by the state $|θ):=(ΠF+ΠF⊥)|ϕ)$.

Proof. If $ΠF|ϕ)=0$ (or $ΠF⊥|ϕ)=0$), then there is nothing to prove: this means that $ΠF⊥|ϕ)=|ϕ)$ (or $ΠF|ϕ)=|ϕ)$) and the thesis is trivially true. Suppose now that $ΠF|ϕ)≠0$ and $ΠF⊥|ϕ)≠0$. Using the notation $Π1:=ΠF$, $Π2:=ΠF⊥$, we can define the two pure states $|ϕi):=Πi|ϕ)/(e|Πi|ϕ)$, $i=1,2$, and the probabilities $pi=(e|Πi|ϕ)$. In this way we have $Πi|ϕ)=pi|ϕi)$ for $i=1,2$ and $θ=p1ϕ1+p2ϕ2$. Taking the atomic effect $(ai|$ such that $(ai|ϕi)=1$ we have $aFθ=a1+a2$, where $aFθ$ is the effect associated to the face $Fθ$. Recalling that $(ai|Πi=(ai|$ for $i=1,2$ (corollary 34), we then conclude the following:

$(a_{F_\theta}|\phi)=[(a_1|+(a_2|]\,|\phi)=(a_1|\Pi_1|\phi)+(a_2|\Pi_2|\phi)=\sum_{i=1,2}p_i(a_i|\phi_i)=1.$

Finally, lemma 46 yields $ϕ∈Fθ$. $▪$

A consequence of lemma 58 is the following.

Lemma 59. Let $ϕ∈St1(A)$ be a pure state, $a∈Eff(A)$ be the unique atomic effect such that $(a|ϕ)=1$, and $F$ be a face in $St1(A)$. If $ρ$ is perfectly distinguishable from $ΠF|ϕ)$ and from $ΠF⊥|ϕ)$ then $ρ$ is perfectly distinguishable from $|ϕ)$. In particular, one has $(a|ρ)=0$.

Proof. Since $ρ$ is perfectly distinguishable from $ΠF|ϕ)$ and $ΠF⊥|ϕ)$, it is also perfectly distinguishable from any convex combination of them (corollary 33). Equivalently, $ρ$ is perfectly distinguishable from the face $Fθ$ identified by $|θ):=ΠF|ϕ)+ΠF⊥|ϕ)$. In particular, it must be perfectly distinguishable from $ϕ$, which belongs to $Fθ$ by virtue of lemma 58. If $a$ is the atomic effect such that $(a|ϕ)=1$, then by lemma 36 we have $(a|ρ)=0$. $▪$

A technical result that will be useful in the following.

Lemma 60. Let $ϕ∈St1(A)$ be a pure state such that $ΠF|ϕ)≠0$ and $ΠF⊥|ϕ)≠0$. Define the pure states $|ϕ1):=ΠF|ϕ)/(e|ΠF|ϕ)$ and $|ϕ2):=ΠF⊥|ϕ)/(e|ΠF⊥|ϕ)$ and the mixed state $|θ):=(ΠF+ΠF⊥)|ϕ)$. Then, we have

$ΠFΠFθ=Π{ϕ1},ΠF⊥ΠFθ=Π{ϕ2}.$

Proof. Let ${ψi}i=1|F|$ be a maximal set of perfectly distinguishable pure states in $F$, chosen in such a way that $ψ1=ϕ1$, and let ${ψi}i=|F|+1dA$ be a maximal set of perfectly distinguishable pure states in $F⊥$, chosen in such a way that $ψ|F|+1=ϕ2$. Defining the sets $V:={1,⋯,|F|}$, $W:={|F|+1,⋯,dA}$, and $U:={1,|F|+1}$ we then have $ΠV=ΠF$, $ΠW=ΠF⊥$, and $ΠU=ΠFθ$. Using lemma 56 we obtain

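$\Pi_F\Pi_{F_\theta}=\Pi_V\Pi_U=\Pi_{V\cap U}=\Pi_{\{\psi_1\}}=\Pi_{\{\phi_1\}}$

and

$\Pi_F^{\perp}\Pi_{F_\theta}=\Pi_W\Pi_U=\Pi_{W\cap U}=\Pi_{\{\psi_{|F|+1}\}}=\Pi_{\{\phi_2\}}.$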

$▪$

We conclude this subsection with an important observation about the group of reversible transformations that act as the identity on two orthogonal faces $F$ and $F⊥$. If $F$ is a face of $St1(A)$, let us define $GF,F⊥$ as the group of all reversible transformations $U∈GA$ such that

$U=_{\omega_F}\mathcal{I}_A,\qquad U=_{\omega_F^{\perp}}\mathcal{I}_A.$

Then we have the following.

Theorem 15. For every face $F⊂St1(A)$ such that $F≠{0}$ and $F≠St1(A)$, the group $GF,F⊥$ is topologically equivalent to a circle.

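Proof. Let $U$ be a transformation in $G_{F,F^{\perp}}$, $\Phi\in\mathrm{St}(A\tilde A)$ be a purification of the invariant state $\chi_A$ and $|\Phi_U):=(U\otimes\mathcal{I}_{\tilde A})|\Phi)$ be the Choi state of $U$. Define the orthogonal faces $\tilde F:=F_{\omega_F\otimes\chi_{\tilde A}}$ and $\tilde F^{\perp}:=F_{\omega_F^{\perp}\otimes\chi_{\tilde A}}$, and the projections $\Pi_{\tilde F}:=\Pi_F\otimes\mathcal{I}_{\tilde A}$ and $\Pi_{\tilde F}^{\perp}:=\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A}$ (see lemma 50). Using lemma 57 we then obtain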

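$\Pi_{\tilde F}|\Phi_U)=(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi_U)=(\Pi_FU\otimes\mathcal{I}_{\tilde A})|\Phi)=(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Phi)=\frac{|F|}{d_A}|\Phi_F)$

and, similarly,

$\Pi_{\tilde F}^{\perp}|\Phi_U)=(\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A})|\Phi_U)=(\Pi_F^{\perp}U\otimes\mathcal{I}_{\tilde A})|\Phi)=(\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A})|\Phi)=\frac{|F^{\perp}|}{d_A}|\Phi_{F^{\perp}}).$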

This means that the projections of $\Phi_U$ on the faces $\tilde F$ and $\tilde F^{\perp}$ are independent of $U$. Also, it means that $\Phi_U$ belongs to the face $F_\theta$ identified by the state $|\theta):=\frac{|F|}{d_A}|\Phi_F)+\frac{|F^{\perp}|}{d_A}|\Phi_{F^{\perp}})$ (lemma 58). Now, by the compression axiom, $F_\theta$ is isomorphic to the state space of a qubit, say with $\Phi_F$ and $\Phi_{F^{\perp}}$ indicating the north and south poles of the Bloch sphere, respectively, and we know that all the Choi states $\{\Phi_U\}_{U\in G_{F,F^{\perp}}}$ are at the same latitude [precisely, the latitude is the angle $\zeta$ given by $\cos\zeta=(|F|-|F^{\perp}|)/d_A$]. This implies that the states $\{\Phi_U\}_{U\in G_{F,F^{\perp}}}$ are a subset of a circle $C_\zeta$ in the Bloch sphere describing the face $F_\theta$. Precisely, the circle $C_\zeta$ is given by

$C_\zeta:=\left\{\Psi\in F_\theta\;\middle|\;\Pi_{\{\Phi_F\}}|\Psi)=\frac{|F|}{d_A}|\Phi_F),\ \Pi_{\{\Phi_{F^{\perp}}\}}|\Psi)=\frac{|F^{\perp}|}{d_A}|\Phi_{F^{\perp}})\right\}.$

We now prove that in fact they are the whole circle. Let $Ψ$ be a state in $Cζ$. Since $|Ψ)$ belongs to the face $Fθ$, we obtain

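$(\Pi_F\otimes\mathcal{I}_{\tilde A})|\Psi)=\Pi_{\tilde F}|\Psi)=\Pi_{\tilde F}\Pi_{F_\theta}|\Psi)=\Pi_{\{\Phi_F\}}|\Psi)=\frac{|F|}{d_A}|\Phi_F)$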

(the third equality comes from lemma 60 with the substitutions $F→F̃$, $ϕ→Ψ$, $ϕ1→ΦF$, and $ϕ2→ΦF⊥$) and, similarly,

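$(\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A})|\Psi)=\Pi_{\tilde F}^{\perp}|\Psi)=\Pi_{\tilde F}^{\perp}\Pi_{F_\theta}|\Psi)=\Pi_{\{\Phi_{F^{\perp}}\}}|\Psi)=\frac{|F^{\perp}|}{d_A}|\Phi_{F^{\perp}}).$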

Therefore, we have

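$(e_A||\Psi)=[(a_F|+(a_{F^{\perp}}|]\,|\Psi)=[(e_A|\Pi_F\otimes\mathcal{I}_{\tilde A}]\,|\Psi)+[(e_A|\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A}]\,|\Psi)=\frac{|F|}{d_A}(e_A||\Phi_F)+\frac{|F^{\perp}|}{d_A}(e_A||\Phi_{F^{\perp}})=[(e_A|\Pi_F\otimes\mathcal{I}_{\tilde A}]\,|\Phi)+[(e_A|\Pi_F^{\perp}\otimes\mathcal{I}_{\tilde A}]\,|\Phi)=[(a_F|+(a_{F^{\perp}}|]\,|\Phi)=(e_A||\Phi)=|\chi_{\tilde A}).$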

Since $Ψ$ and $Φ$ are both purifications of the invariant state $χÃ$, by the uniqueness of purification there must be a reversible transformation $U∈GA$ such that $|Ψ)=(U⊗IÃ)|Φ)$. Finally, it is easy to check that $ΠFU=ΠF$ and $ΠF⊥U=ΠF⊥$, which, by lemma 57 implies $U=ωFIA$ and $U=ωF⊥IA$. This proves that the Choi states ${ΦU}U∈GF,F⊥$ are the whole circle $Cζ$. Since the Choi isomorphism is continuous in the operational norm (see theorem 14 of [22]), the group $GF,F⊥$ is topologically equivalent to a circle. $▪$

##### XII. THE SUPERPOSITION PRINCIPLE

The validity of the superposition principle, proved for two-dimensional systems using the geometry of the Bloch sphere (corollary 31), can be now extended to arbitrary systems thanks to lemma 58.

Theorem 16 (Superposition principle for general systems). Let ${ϕi}i=1dA⊆St1(A)$ be a maximal set of perfectly distinguishable pure states and ${ai}i=1dA$ be the observation test such that $(ai|ϕj)=δij$. Then, for every choice of probabilities ${pi}i=1dA$, $pi⩾0,∑i=1dApi=1$ there exists at least one pure state $ϕp∈St1(A)$ such that

$p_i=(a_i|\phi_p)\qquad\forall i=1,\dots,d_A$
###### (35)

or, equivalently,

$\Pi_{\{\phi_i\}}|\phi_p)=p_i|\phi_i)\qquad\forall i=1,\dots,d_A,$
###### (36)

where $Π{ϕi}$ is the projection on $ϕi$.

Proof. Let us first prove the equivalence between Eqs. (35) and (36). From Eq. (36) we obtain Eq. (35) using the relation $(e|Π{i}=(ai|$, which follows from corollary 35. Conversely, from Eq. (35) we obtain Eq. (36) using corollary 38. Now, we will prove Eq. (35) by induction. The statement for $N=2$ is proved by corollary 31. Assume that the statement holds for every system $B$ of dimension $dB=N$ and suppose that $dA=N+1$. Let $F$ be the face identified by $ωF=1/N∑i=1Nϕi$ and $F⊥$ be the orthogonal face, identified by the state $ϕN+1$. Now there are two cases: either $pN+1=1$ or $pN+1≠1$. If $pN+1=1$, then there is nothing to prove: the desired state is $ϕN+1$. Then, suppose that $pN+1≠1$. Using the induction hypothesis and the compression axiom 3 we can find a state $ψq∈F$ such that $(ai|ψq)=qi$, with $qi=pi/(1−pN+1)$, $i=1,⋯,N$. Let us then define a new maximal set of perfectly distinguishable pure states ${ϕi′}i=1N+1$, with $ϕ1′=ψq$ and $ϕN+1′=ϕN+1$. Note that one has $ωF=1/N∑i=1Nϕi′$, that is, $F$ is the face generated by the states ${ϕi′}i=1N$. Now consider the two-dimensional face $F′$ identified by $θ=1/2(ϕ1′+ϕN+1′)$. By corollary 31 (superposition principle for qubits) we know that there exists a pure state $ϕ∈F′$ with $(a1′|ϕ)=1−pN+1$ and $(aN+1′|ϕ)=pN+1$. Let us define $V:={1,⋯,N}$ and $W:={1,N+1}$. Then, we have $ΠF=ΠV$ and $ΠF′=ΠW$, and by lemma 56,

$ΠF|ϕ)=ΠFΠF′|ϕ)=ΠV∩W|ϕ)=Π{ϕ1′}|ϕ)=Π{ψq}|ϕ)=(1−pN+1)|ψq)$

having used corollary 38 for the last equality. Finally, for $i=1,⋯,N$ we have

$(ai|ϕ)=(ai|ΠF|ϕ)=(1−pN+1)(ai|ψq)=(1−pN+1)qi=pi.$

On the other hand we have $(aN+1|ϕ)=(aN+1′|ϕ)=pN+1$. $▪$
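In the standard quantum model, theorem 16 corresponds to the familiar fact that any probability vector is the diagonal of some rank-one density matrix. The following sketch assumes that model: it builds $|\psi\rangle=\sum_i\sqrt{p_i}\,e^{i\theta_i}|i\rangle$ for hypothetical probabilities and phases, and checks that the diagonal of $|\psi\rangle\langle\psi|$ reproduces the $p_i$ and that the state is pure. The function name and the numerical values are illustrative.

```python
import cmath
import math

def pure_state_with_probs(p, phases=None):
    """Rank-one matrix |psi><psi| with psi_i = sqrt(p_i) * exp(i * phase_i)."""
    if phases is None:
        phases = [0.0] * len(p)
    amps = [math.sqrt(pi) * cmath.exp(1j * th) for pi, th in zip(p, phases)]
    return [[a * b.conjugate() for b in amps] for a in amps]

# Hypothetical target probabilities and arbitrary phases.
p = [0.5, 0.3, 0.2]
S = pure_state_with_probs(p, phases=[0.0, 1.0, 2.5])
d = len(p)

# Diagonal entries play the role of (a_i|phi_p) in Eq. (35).
diag = [S[i][i].real for i in range(d)]
# Tr[S^2] equals 1 exactly when the state is pure.
purity = sum((S[r][s] * S[s][r]).real for r in range(d) for s in range(d))

print(diag, purity)
```

The phases are unconstrained by Eq. (35), which is consistent with the statement that there exists at least one such pure state rather than a unique one.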

###### A. Completeness for purification

Using the superposition principle and the spectral decomposition of theorem 10 we can now show that every state of system $A$ has a purification in $AB$ provided $dB⩾dA$:

Lemma 61. For every state $ρ∈St1(A)$ and for every system $B$ with $dB⩾dA$ there exists a purification of $ρ$ in $St1(AB)$.

Proof. Take the spectral decomposition of $ρ$, given by $ρ=∑i=1dApiϕi$, where ${pi}$ are probabilities and ${ϕi}i=1dA⊂St1(A)$ is a maximal set of perfectly distinguishable pure states. Let ${ψi}i=1dB$ be a maximal set of perfectly distinguishable pure states and ${ai}i=1dA⊂Eff(A)$ [or ${bi}i=1dB⊂Eff(B)$] be the test such that $(ai|ϕj)=δij$ [or $(bi|ψj)=δij$]. Clearly ${ϕi⊗ψj}$ is a maximal set of perfectly distinguishable pure states for $AB$. Then, by the superposition principle (theorem 16) there exists a pure state $Ψρ$ such that $(ai⊗bj|Ψρ)=piδij$. Equivalently, we have $(bi|B|Ψρ)AB=pi|ϕi)A$ for every $i=1,⋯,dA$ and $(bi|B|Ψρ)AB=0$ for $i>dA$. Summing over $i$ we then obtain $(e|B|Ψρ)AB=∑i=1dB(bi|B|Ψρ)AB=∑i=1dApi|ϕi)A=|ρ)A$. $▪$
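The construction in the proof of lemma 61 can be mirrored in the standard quantum model, where it reduces to the Schmidt form $|\Psi_\rho\rangle=\sum_i\sqrt{p_i}\,|i\rangle_A|i\rangle_B$. The sketch below assumes that model and a state $\rho$ already diagonal in the chosen basis; the function names and data are hypothetical. It checks that the joint probabilities equal $p_i\delta_{ij}$ and that the marginal on $A$ is $\rho$.

```python
import math

def purification_amplitudes(p, d_B):
    """Amplitudes c[i][j] = sqrt(p_i) * delta_ij of a pure state of AB."""
    d_A = len(p)
    if d_B < d_A:
        raise ValueError("lemma 61 requires d_B >= d_A")
    return [[math.sqrt(p[i]) if i == j else 0.0 for j in range(d_B)]
            for i in range(d_A)]

def marginal_on_A(c):
    """Reduced matrix on A: sum over the B index of c[r][j] * conj(c[s][j])."""
    d_A, d_B = len(c), len(c[0])
    return [[sum(c[r][j] * c[s][j].conjugate() for j in range(d_B))
             for s in range(d_A)] for r in range(d_A)]

# Hypothetical spectrum of rho, with d_A = 2 and d_B = 3.
p = [0.6, 0.4]
c = purification_amplitudes(p, d_B=3)

joint = [[abs(x) ** 2 for x in row] for row in c]   # plays the role of (a_i ⊗ b_j|Psi)
rho_A = marginal_on_A(c)                            # should equal diag(p)

print(joint)
print(rho_A)
```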

In the terminology of Ref. [22], lemma 61 states that a system $B$ with $dB⩾dA$ is complete for the purification of system $A$.

As a consequence of lemma 61 we have the following.

Corollary 39. Every system $B$ with $dB=dA$ is operationally equivalent to the conjugate system $Ã$.

Proof. By lemma 61, the invariant state $χA∈St1(A)$ has a purification $Ψ$ in $St1(AB)$. By corollary 18, the marginal of $Ψ$ on $B$ is the invariant state $χB$. By definition, this means that $B$ is a conjugate system of $A$. Since the conjugate system $Ã$ is unique up to operational equivalence (corollary 16), this implies the thesis. $▪$

###### B. Equivalence of systems with equal dimension

We are now in position to prove that two systems $A$ and $B$ with the same dimension are operationally equivalent, namely that there is a reversible transformation from $A$ to $B$. In other words, we prove that the informational dimension classifies the systems of our theory up to operational equivalence. The fact that this property is derived from the principles, rather than being assumed from the start, is one of the important differences of our work with respect to Refs. [16–18]. Another difference is that here the equivalence of systems with the same dimension is proved after the derivation of the qubit, whereas in Refs. [16–18] the derivation of the qubit requires the equivalence of systems with the same dimension.

Corollary 40 (Operational equivalence of systems with equal dimension). Every two systems $A$ and $B$ with $dA=dB$ are operationally equivalent.

Proof. By corollary 39, $A$ and $B$ are both operationally equivalent to the conjugate system $Ã$. Hence they are operationally equivalent to each other. $▪$

###### C. Reversible operations of perfectly distinguishable pure states

An important consequence of the superposition principle is the possibility of transforming an arbitrary maximal set of perfectly distinguishable pure states into another via a reversible transformation:

Corollary 41. Let $A$ and $B$ be two systems with $dA=dB=:d$ and let ${ϕi}i=1d$ (or ${ψi}i=1d$) be a maximal set of perfectly distinguishable pure states in $A$ (or $B$). Then, there exists a reversible transformation $U∈Transf(A,B)$ such that $U|ϕi)=|ψi)$.

Proof. Let $\Phi\in\mathrm{St}(A\tilde A)$ be a purification of the invariant state $\chi_A$. Although we know that $A$ and $\tilde A$ are operationally equivalent (corollary 39) we use the notation $A$ and $\tilde A$ to distinguish between the two subsystems of $A\tilde A$. Define the pure state $\tilde\phi_i$ via the relation $(a_i|_A|\Phi)_{A\tilde A}=\frac{1}{d}|\tilde\phi_i)_{\tilde A}$, where $\{a_i\}_{i=1}^{d}$ is the observation test such that $(a_i|\phi_j)=\delta_{ij}$. Let $\{\tilde a_i\}_{i=1}^{d}$ be the observation test such that $(\tilde a_i|\tilde\phi_j)=\delta_{ij}$. Then, by lemma 30 we have

$(\tilde a_i|_{\tilde A}|\Phi)_{A\tilde A}=\frac{1}{d}|\phi_i)_A.$
###### (37)

On the other hand, if ${bi}i=1d$ is the observation test such that $(bi|ψj)=δij$, then using the superposition principle (theorem 16) we can construct a state $Ψ∈St1(BÃ)$ such that $(bi⊗ãj|Ψ)=δij/d$, or, equivalently,

$(\tilde a_i|_{\tilde A}|\Psi)_{B\tilde A}=\frac{1}{d}|\psi_i)_B.$
###### (38)

Now, $Φ$ and $Ψ$ have the same marginal on system $Ã$: they are both purifications of the invariant state $χÃ$. Moreover, $A$ and $B$ are operationally equivalent because they have the same dimension (corollary 40). Hence, by the uniqueness of purification, there must be a reversible transformation $U∈Transf(A,B)$ such that

$|\Psi)_{B\tilde A}=(U\otimes\mathcal{I}_{\tilde A})|\Phi)_{A\tilde A}.$
###### (39)

Combining Eqs. (37), (38), and (39) we finally obtain

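$\frac{1}{d}\,U|\phi_i)_A=[U\otimes(\tilde a_i|_{\tilde A}]\,|\Phi)_{A\tilde A}=(\tilde a_i|_{\tilde A}|\Psi)_{B\tilde A}=\frac{1}{d}|\psi_i)_B,$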

that is, $U|ϕi)=|ψi)$ for every $i=1,⋯,d$. $▪$

##### XIII. DERIVATION OF THE DENSITY MATRIX FORMALISM

The goal of this section is to show that our set of axioms implies that

• (1) the set of states for a system $A$ of dimension $dA$ is the set of density matrices on the Hilbert space $CdA$,

• (2) the set of effects is the set of positive matrices bounded by the identity, and

• (3) the pairing between a state and an effect is given by the trace of the product of the corresponding matrices.

Using the result of theorem 3, we will then obtain that all the physical transformations in our theory are exactly the physical transformations allowed in quantum mechanics. This will conclude our derivation of quantum theory.

###### A. The basis

In order to specify the correspondence between states and matrices we choose a particular basis for the vector space $\mathrm{St}_{\mathbb R}(A)$. For this purpose, we adopt the choice of basis used in Ref. [16]. The basis is constructed as follows: Let us first choose a maximal set of $d_A$ perfectly distinguishable states $\{\phi_m\}_{m=1}^{d_A}$, and declare that they are the first $d_A$ basis vectors. Then, for every $m<n$ the face $F_{mn}$ generated by $\{\phi_m,\phi_n\}$ defines a “two-dimensional subsystem”: precisely, the face $F_{mn}:=F_{\omega_{mn}}$ with $\omega_{mn}:=(\phi_m+\phi_n)/2$ can be ideally encoded in a two-dimensional system. Now, the convex set of states of a two-dimensional system is the Bloch sphere, and we can choose the $z$ axis to be the line joining the two states $\{\phi_m,\phi_n\}$, for example, with the positive direction of the $z$ axis being the direction from $\phi_m$ to $\phi_n$. Once the direction of the $z$ axis has been specified, we can choose the $x$ and $y$ axes. Note that any pair of orthogonal directions in the plane orthogonal to the $z$ axis is a valid choice for the $x$ and $y$ axes (here we do not restrict ourselves to the choice of a right-handed coordinate system). At the moment there is no relation among the different choices of axes made for different values of $m$ and $n$. However, to prove that the states are represented by positive matrices, later we will have to find a suitable way of connecting all these choices of axes.

Let $ϕx,+mn,ϕx,−mn∈Fmn$ ( $ϕy,+mn,ϕy,−mn∈Fmn$) be the two perfectly distinguishable states in the direction of the $x$ axis ( $y$ axis) and define

$σkmn:=ϕk,+mn−ϕk,−mn,k=x,y.$
###### (40)

An immediate observation is the following.

Lemma 62. The four vectors ${ϕm,ϕn,σxmn,σymn}⊆StR(A)$ are linearly independent.

Proof. Linear independence is evident from the geometry of the Bloch sphere. $▪$

We now show that the collection of all vectors obtained in this way is a basis for $StR(A)$. To this purpose we use the following.

Lemma 63. Let $V⊂{1,⋯,dA}$, and consider the projection $ΠV$. Then, for $m∈V$ and $n∉V$, one has $ΠV|σkmn)=0$ for $k=x,y$.

Proof. Using lemma 56 and corollary 38 we obtain

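$\Pi_V|\phi_{k,\pm}^{mn})=\Pi_V\Pi_{\{m,n\}}|\phi_{k,\pm}^{mn})=\Pi_{\{m\}}|\phi_{k,\pm}^{mn})=|\phi_m)\,(a_m|\phi_{k,\pm}^{mn}).$

Since the face $F_{mn}$ is isomorphic to the Bloch sphere and the states $\phi_{k,\pm}^{mn}$, $k=x,y$, lie on the equator of the Bloch sphere, we know that $(a_m|\phi_{k,\pm}^{mn})=\frac{1}{2}$. This implies

$\Pi_V|\sigma_k^{mn})=\Pi_V\left[|\phi_{k,+}^{mn})-|\phi_{k,-}^{mn})\right]=|\phi_m)\left(\tfrac{1}{2}-\tfrac{1}{2}\right)=0.$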

$▪$

Lemma 64. The vectors $\{\phi_m\}_{m=1}^{d_A}\cup\{\sigma_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ form a basis for $\mathrm{St}_{\mathbb R}(A)$.

Proof. Since the number of vectors is exactly $d_A+2\binom{d_A}{2}=d_A^2$, to prove that they form a basis it is enough to show that they are linearly independent. Suppose that there exists a vector of coefficients $\{c_m\}\cup\{c_k^{mn}\}$ such that

$\sum_mc_m\phi_m+\sum_{n>m,\,k=x,y}c_k^{mn}\sigma_k^{mn}=0.$

Applying the projection $\Pi_{\{m,n\}}$ on both sides and using lemma 63 we obtain

$c_m|\phi_m)+c_n|\phi_n)+c_x^{mn}|\sigma_x^{mn})+c_y^{mn}|\sigma_y^{mn})=0.$

However, from lemma 62 we know that the vectors ${ϕm,ϕn,σxmn,σymn}$ are linearly independent. Consequently, $cm=cn=cmnk=0$ for all $m,n,k$. $▪$

###### B. The matrices

Since the state space $\mathrm{St}(A)$ for system $A$ spans a real vector space of dimension $D_A=d_A^2$, we can decide to represent the vectors $\{\phi_m\}_{m=1}^{d_A}\cup\{\sigma_k^{mn}\}_{n>m=1,\dots,d_A;\,k=x,y}$ as Hermitian $d_A\times d_A$ matrices. Precisely, we associate the vector $\phi_m$ to the matrix $S_{\phi_m}$ defined by

$[S_{\phi_m}]_{rs}=\delta_{rm}\delta_{sm},$
###### (41)

the vector $σxmn$ to the matrix

$[S_{\sigma_x^{mn}}]_{rs}=\delta_{rm}\delta_{sn}+\delta_{rn}\delta_{sm}$
###### (42)

and the vector $σymn$ to the matrix

$[S_{\sigma_y^{mn}}]_{rs}=i\lambda\,(\delta_{rm}\delta_{sn}-\delta_{rn}\delta_{sm}),$
###### (43)

where $λ$ can take the values $+1$ or $−1$. The freedom in the choice of $λ$ will be useful in Sec. XIII C, where we will introduce the representation of composite systems of two qubits. However, this choice of sign plays no role in the present subsection, and for simplicity we will take the positive sign.
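The matrices of Eqs. (41), (42), and (43) can be generated explicitly. The sketch below (with hypothetical helper names, 0-based indices, and $\lambda=+1$ by default) builds them for $d_A=3$ and checks that there are $d_A^2$ of them, that each is Hermitian, and that they are mutually orthogonal with respect to the trace inner product, which in particular implies linear independence.

```python
def unit(d, m, n):
    """Matrix unit with a single 1 in row m, column n (0-based)."""
    return [[1 if (r, s) == (m, n) else 0 for s in range(d)] for r in range(d)]

def combine(c1, X, c2, Y):
    """Entrywise linear combination c1*X + c2*Y."""
    return [[c1 * x + c2 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def basis_matrices(d, lam=1):
    """Matrices of Eqs. (41)-(43) for all m and all pairs m < n."""
    mats = [unit(d, m, m) for m in range(d)]                      # Eq. (41)
    for m in range(d):
        for n in range(m + 1, d):
            mats.append(combine(1, unit(d, m, n),
                                1, unit(d, n, m)))                # Eq. (42)
            mats.append(combine(1j * lam, unit(d, m, n),
                                -1j * lam, unit(d, n, m)))        # Eq. (43)
    return mats

def trace_product(X, Y):
    """Tr[X Y] for square matrices stored as nested lists."""
    n = len(X)
    return sum(X[r][s] * Y[s][r] for r in range(n) for s in range(n))

def is_hermitian(X):
    """True when X equals its conjugate transpose."""
    n = len(X)
    return all(X[r][s] == complex(X[s][r]).conjugate()
               for r in range(n) for s in range(n))

d = 3
mats = basis_matrices(d)
gram = [[trace_product(X, Y) for Y in mats] for X in mats]

print(len(mats) == d * d)
print(all(is_hermitian(X) for X in mats))
```

Calling `basis_matrices(d, lam=-1)` flips the sign of the $\sigma_y$ matrices only, which is the freedom in the choice of $\lambda$ mentioned in the text.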

Recall that in principle any orthogonal direction in the plane orthogonal to the $z$ axis can be chosen to be the $x$ axis. In general, the other possible choices for the $x$ axis will lead to matrices of the form

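$[S_{\sigma_{x,\theta}^{mn}}]_{rs}=\delta_{rm}\delta_{sn}e^{i\theta}+\delta_{rn}\delta_{sm}e^{-i\theta},\qquad\theta\in[0,2\pi),$
###### (44)

and the corresponding choice for the $y$ axis will lead to matrices of the form

$[S_{\sigma_{y,\theta}^{mn}}]_{rs}=i\lambda\,(\delta_{rm}\delta_{sn}e^{i\theta}-\delta_{rn}\delta_{sm}e^{-i\theta}),\qquad\theta\in[0,2\pi).$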
###### (45)

Since the vectors ${ϕm}m=1dA∪{σkmn}n>m=1,⋯,dA;k=x,y$ are a basis for the real vector space $StR(A)$, we can expand any state $ρ∈St(A)$ on them:

$\rho=\sum_m\rho_m\phi_m+\sum_{n>m,\,k=x,y}\rho_k^{mn}\sigma_k^{mn}$
###### (46)

and the expansion coefficients ${ρm}m=1dA∪{ρkmn}n>m=1,⋯,dA;k=x,y$ are all real. Hence each state $ρ$ is in one-to-one correspondence with a Hermitian matrix, given by

$S_\rho=\sum_m\rho_mS_{\phi_m}+\sum_{n>m,\,k=x,y}\rho_k^{mn}S_{\sigma_k^{mn}}.$
###### (47)

Since effects are linear functionals on states, they are also represented by Hermitian matrices. We will indicate with $Ea$ the Hermitian matrix associated to the effect $a∈Eff(A)$. The matrix $Ea$ is uniquely defined by the relation

$(a|\rho)=\mathrm{Tr}[E_aS_\rho].$

In the rest of the section we show that the set of matrices ${Sρ|ρ∈St1(A)}$ is the whole set of positive Hermitian matrices with unit trace and that the set of matrices ${Ea|a∈Eff(A)}$ is the set of positive Hermitian matrices bounded by the identity.

Let us start from some simple facts:

Lemma 65. The invariant state $\chi_A$ has matrix representation $S_{\chi_A}=I_{d_A}/d_A$, where $I_{d_A}$ is the identity matrix in dimension $d_A$.

Proof. Obvious from the expression $\chi_A=\frac{1}{d_A}\sum_m\phi_m$ and from the matrix representation of the states $\{\phi_m\}_{m=1}^{d_A}$ in Eq. (41). $▪$

Lemma 66. Let $am∈Eff(A)$ be the atomic effect such that $(am|ϕm)=1$. Then, the effect $am$ has matrix representation $Eam$ such that $Eam=Sϕm$.

Proof. Let $ρ∈St1(A)$ be an arbitrary state. Expanding $ρ$ as in Eq. (46) and using lemma 62 we obtain $(am|ρ)=ρm$. On the other hand, by Eq. (47) we have that $ρm$ is the $m$th diagonal element of the matrix $Sρ$: by definition of $Sϕm$ [Eq. (41)], this implies $ρm=Tr[SϕmSρ]$. Now, by construction we have $Tr[EamSρ]=(am|ρ)=ρm=Tr[SϕmSρ]$ for every $ρ∈St1(A)$. Hence $Eam=Sϕm$. $▪$

Lemma 67. The deterministic effect $e∈Eff(A)$ has matrix representation $Ee=IdA$.

Proof. Obvious from the expression $e=∑mam$, combined with lemma 66 and Eq. (41). $▪$

Corollary 42. For every state $ρ∈St1(A)$ one has

$Tr[Sρ]=1.$

Proof. $Tr[Sρ]=Tr[EeSρ]=(e|ρ)=1$. $▪$

Theorem 17. The matrix elements of $S_\phi$ for a pure state $\phi\in\mathrm{St}_1(A)$ are $(S_\phi)_{mn}=\sqrt{p_mp_n}\,e^{i\theta_{mn}}$, with $\sum_{m=1}^{d_A}p_m=1$, $\theta_{mn}\in[0,2\pi)$, $\theta_{mm}=0$, and $\theta_{mn}=-\theta_{nm}$.

Proof. First of all, the diagonal elements of $S_\phi$ are given by $[S_\phi]_{mm}=(a_m|\phi)$ [cf. Eqs. (46) and (47)]. Denoting the $m$th element by $p_m$, we clearly have $\sum_{m=1}^{d_A}p_m=(e|\phi)=1$. Now, the projection $\Pi_{\{m,n\}}|\phi)$ is a state in the face $F_{mn}$, and, by our choice of representation, the corresponding matrix $S_{\Pi_{\{m,n\}}|\phi)}$ is proportional to a pure qubit state (nonnegative rank-one matrix). On the other hand, it is easy to see from Eqs. (46) and (47) that $S_{\Pi_{\{m,n\}}|\phi)}$ is the matrix with the same elements as $S_\phi$ in the block corresponding to the qubit $(m,n)$ and 0 elsewhere. In order to be positive and rank-one the corresponding $2\times2$ submatrix must have vanishing determinant, so its off-diagonal elements have modulus $\sqrt{p_mp_n}$, that is, $(S_\phi)_{mn}=\sqrt{p_mp_n}\,e^{i\theta_{mn}}$ for some $\theta_{mn}\in[0,2\pi)$ with $\theta_{nm}=-\theta_{mn}$. Repeating the same argument for all choices of indices $m,n$, the thesis follows. $▪$
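The structure stated in theorem 17 can be illustrated in the standard quantum model, assuming that a pure state is represented by $S_\phi=|\psi\rangle\langle\psi|$. The sketch below uses hypothetical amplitudes in dimension 3 and checks, for every pair $m<n$, that the off-diagonal modulus is $\sqrt{p_mp_n}$, that the phases are antisymmetric, and that the $(m,n)$ block has vanishing determinant (rank one). Note that `cmath.phase` returns angles in $(-\pi,\pi]$ rather than $[0,2\pi)$, so antisymmetry is checked directly rather than modulo $2\pi$.

```python
import cmath
import math

# Hypothetical amplitudes of a pure state in dimension 3.
amps = [math.sqrt(0.5),
        math.sqrt(0.3) * cmath.exp(0.7j),
        math.sqrt(0.2) * cmath.exp(-1.1j)]

S = [[a * b.conjugate() for b in amps] for a in amps]   # S_phi = |psi><psi|
d = len(S)
p = [S[m][m].real for m in range(d)]                    # diagonal entries p_m

checks = []
for m in range(d):
    for n in range(m + 1, d):
        # |(S_phi)_mn| should equal sqrt(p_m p_n).
        modulus_ok = abs(abs(S[m][n]) - math.sqrt(p[m] * p[n])) < 1e-12
        # theta_mn = -theta_nm.
        phase_ok = abs(cmath.phase(S[m][n]) + cmath.phase(S[n][m])) < 1e-12
        # The (m, n) block is rank one exactly when its determinant vanishes.
        det = S[m][m] * S[n][n] - S[m][n] * S[n][m]
        checks.append(modulus_ok and phase_ok and abs(det) < 1e-12)

print(all(checks), sum(p))
```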

Theorem 18. For a pure state $ϕ∈St1(A)$, the corresponding atomic effect $aϕ$ such that $(aϕ|ϕ)=1$ has a matrix representation $Eϕ$ with the property that $Eϕ=Sϕ$.

Proof. We already know that the statement holds for $d_A=2$, where we proved the Bloch sphere representation, equivalent to the fact that states and effects are represented as $2\times2$ positive complex matrices, with the set of pure states identified with the set of all rank-one projectors. Let us now consider a generic system $A$. For every $m<n$, the face $F_{mn}$ generated by $\{\varphi_m,\varphi_n\}$ can be encoded in a two-dimensional system. Therefore, the matrices $S_{\Pi_{\{m,n\}}|\varphi)}$ and $E_{(a|\Pi_{\{m,n\}}}$ are positive [also, recall that all matrix elements outside the $(m,n)$ block are zero]. Let $\varphi_\perp^{(mn)}$ be the pure state in the face $F_{mn}$ that is perfectly distinguishable from $\Pi_{\{m,n\}}|\varphi)$. Note that, since $\varphi_\perp^{(mn)}$ belongs to the face $F_{mn}$, it is also perfectly distinguishable from $\Pi_{\{1,\dots,d_A\}\setminus\{m,n\}}|\varphi)$. Hence $\varphi_\perp^{(mn)}$ is perfectly distinguishable from $\varphi$ and, in particular, $(a|\varphi_\perp^{(mn)})=0$ (lemma 59). This implies the relation

$\mathrm{Tr}\big[E_{(a|\Pi_{\{m,n\}}}\,S_{|\varphi_\perp^{(mn)})}\big]=(a|\Pi_{\{m,n\}}|\varphi_\perp^{(mn)})=(a|\varphi_\perp^{(mn)})=0.$

Now, since the matrix $E_{(a|\Pi_{\{m,n\}}}$ is positive, the above relation implies $E_{(a|\Pi_{\{m,n\}}}=c_{mn}S_{\Pi_{\{m,n\}}|\varphi)}$, where $c_{mn}\geqslant0$. Finally, repeating the argument for all possible values of $(m,n)$, we obtain that $c_{mn}=c$ for every $m,n$, that is, $E_a=cS_\varphi$. Taking the trace on both sides we obtain $\mathrm{Tr}[E_a]=c$. To prove that $c=1$, we use the relation $\mathrm{Tr}[E_a]/d_A=(a|\chi_A)=1/d_A$. $▪$

We conclude with a simple corollary that will be used in the next subsection.

Corollary 43. Let $ϕ∈St1(A)$ be a pure state and let ${γi}i=1r⊂St1(A)$ be a set of pure states. If the state $ϕ$ can be written as

$ϕ=∑ixiγi$

for some real coefficients ${xi}i=1r$, then the atomic effect $a$ such that $(a|ϕ)=1$ is given by

$(a|=∑ixi(ci|,$

where $ci$ is the atomic effect such that $(ci|γi)=1$.

Proof. For every $ρ∈St(A)$ by theorem 18 one has

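$(a|\rho)=\mathrm{Tr}[E_aS_\rho]=\mathrm{Tr}[S_\varphi S_\rho]=\sum_ix_i\,\mathrm{Tr}[S_{\gamma_i}S_\rho]=\sum_ix_i\,\mathrm{Tr}[E_{c_i}S_\rho]=\sum_ix_i\,(c_i|\rho),$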

thus implying the thesis. $▪$

###### C. Choice of axes for a two-qubit system

If $A$ and $B$ are two systems with $dA=dB=2$, then we can use two different types of matrix representations for the states of the composite system $AB$.

The first type of representation is the representation $Sϕ$ introduced through lemma 64: here we will refer to it as the standard representation. Note that there are many different representations of this type because for every pair $(m,n)$ there is freedom in choice of the $x$ and $y$ axis [cf. Eqs. (44) and (45)].

The second type of representation is the tensor product representation $Tϕ$, defined by the tensor product of matrices representing states of systems $A$ and $B$: for a state $|ρ)=∑i,jρij|αi)|βj)$, with $αi∈St(A),βj∈St(B)$, we have

$T_\rho:=\sum_{i,j}\rho_{ij}\,S^A_{\alpha_i}\otimes S^B_{\beta_j},$
###### (48)

where $S^A$ (or $S^B$) is the matrix representation for system $A$ (or $B$). Here the freedom is in the choice of the axes for the Bloch spheres of qubits $A$ and $B$. Since $A$ and $B$ are operationally equivalent, we will indicate the elements of the bases for $\mathrm{St}_{\mathbb R}(A)$ and $\mathrm{St}_{\mathbb R}(B)$ with the same letters: $\{\varphi_m\}_{m=1}^2$ for the two perfectly distinguishable pure states and $\{\sigma_k\}_{k=x,y}$ for the remaining basis vectors.

We now show a few properties of the tensor representation. Let $F_A$ denote the matrix corresponding to the effect $A\in\mathrm{Eff}(AB)$ in the tensor representation, that is, the matrix defined by

$(A|\rho):=\mathrm{Tr}[F_AT_\rho]\quad\forall\rho\in\mathrm{St}(AB).$
###### (49)

It is easy to show that the matrix representation for effects must satisfy the analog of Eq. (48).

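Lemma 68. Let $A\in\mathrm{Eff}(AB)$ be a bipartite effect, written as $(A|=\sum_{i,j}A_{ij}(a_i|(b_j|$. Then one has

$F_A=\sum_{i,j}A_{ij}\,E^A_{a_i}\otimes E^B_{b_j},$

where $E^A_{a_i}$ (or $E^B_{b_j}$) is the matrix representing the single-qubit effect $a_i$ (or $b_j$) in the standard representation for qubit $A$ (or $B$).

Proof. For every bipartite state $|\rho)=\sum_{k,l}\rho_{kl}|\alpha_k)|\beta_l)$ one has

$\mathrm{Tr}[F_AT_\rho]=(A|\rho)=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,(a_i|\alpha_k)(b_j|\beta_l)=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,\mathrm{Tr}[E^A_{a_i}S^A_{\alpha_k}]\,\mathrm{Tr}[E^B_{b_j}S^B_{\beta_l}]=\sum_{i,j,k,l}A_{ij}\rho_{kl}\,\mathrm{Tr}[(E^A_{a_i}\otimes E^B_{b_j})\,T_{\alpha_k\otimes\beta_l}]=\sum_{i,j}A_{ij}\,\mathrm{Tr}[(E^A_{a_i}\otimes E^B_{b_j})\,T_\rho],$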

which implies the thesis. $▪$
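The step that turns a product of two single-qubit traces into one two-qubit trace is the identity $\mathrm{Tr}[(E\otimes F)(S\otimes T)]=\mathrm{Tr}[ES]\,\mathrm{Tr}[FT]$. The following is a minimal numerical sketch of that identity only; the random Hermitian matrices and the helper `random_hermitian` are illustrative stand-ins, not objects defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n, rng):
    """Random n x n Hermitian matrix (illustrative stand-in for an E or S matrix)."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

# Stand-ins for E^A_{a_i}, E^B_{b_j}, S^A_{alpha_k}, S^B_{beta_l}.
E_a, E_b, S_alpha, S_beta = (random_hermitian(2, rng) for _ in range(4))

# Left: trace of the product of tensor products (two-qubit, 4 x 4).
lhs = np.trace(np.kron(E_a, E_b) @ np.kron(S_alpha, S_beta))
# Right: product of the two single-qubit traces.
rhs = np.trace(E_a @ S_alpha) * np.trace(E_b @ S_beta)

trace_factorizes = bool(np.isclose(lhs, rhs))
```

Linearity in the coefficients $A_{ij}$ and $\rho_{kl}$ then extends the identity from product terms to arbitrary bipartite effects and states.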

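Corollary 44. Let $\Psi\in\mathrm{St}_1(AB)$ be a pure state and let $A\in\mathrm{Eff}(AB)$ be the atomic effect such that $(A|\Psi)=1$. Then one has $F_A=T_\Psi$.

Proof. Let $\{\alpha_i\}_{i=1}^4$ (or $\{\beta_j\}_{j=1}^4$) be a set of pure states that span $\mathrm{St}_{\mathbb R}(A)$ [or $\mathrm{St}_{\mathbb R}(B)$] and expand $\Psi$ as $|\Psi)=\sum_{i,j}c_{ij}|\alpha_i)|\beta_j)$. Then, corollary 43 yields $(A|=\sum_{i,j}c_{ij}(a_i|(b_j|$, where $a_i$ and $b_j$ are the atomic effects such that $(a_i|\alpha_i)=(b_j|\beta_j)=1$. Therefore, by lemma 68 and theorem 18, we have

$F_A=\sum_{i,j}c_{ij}\,E^A_{a_i}\otimes E^B_{b_j}=\sum_{i,j}c_{ij}\,S^A_{\alpha_i}\otimes S^B_{\beta_j}=T_\Psi.$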

$▪$

Corollary 45. For every bipartite state $ρ∈St1(AB)$, $dA=dB=2$ one has $Tr[Tρ]=1$.

Proof. For each qubit we have

$E_{a_1}=\begin{pmatrix}1&0\\0&0\end{pmatrix},\qquad E_{a_2}=\begin{pmatrix}0&0\\0&1\end{pmatrix}.$
###### (50)

Hence $E^A_{e_A}=E^B_{e_B}=I$, where $I$ is the $2\times2$ identity matrix. By lemma 68 we then have $F_{e_A\otimes e_B}=I\otimes I$ and, therefore, $\mathrm{Tr}[T_\rho]=\mathrm{Tr}[F_{e_A\otimes e_B}T_\rho]=(e_A\otimes e_B|\rho)=1$. $▪$

Finally, an immediate consequence of local distinguishability is the following.

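Lemma 69. Suppose that $\mathcal U\in\mathbf G_A$ and $\mathcal V\in\mathbf G_B$ are two reversible transformations for qubits $A$ and $B$, respectively, and that $U,V\in SU(2)$ are such that

$S^A_{\mathcal U\rho}=US^A_\rho U^\dagger\quad\forall\rho\in\mathrm{St}_1(A),\qquad S^B_{\mathcal V\sigma}=VS^B_\sigma V^\dagger\quad\forall\sigma\in\mathrm{St}_1(B).$

Then, we have $T_{(\mathcal U\otimes\mathcal V)\tau}=(U\otimes V)\,T_\tau\,(U^\dagger\otimes V^\dagger)$ for every $\tau\in\mathrm{St}_1(AB)$.

Proof. The thesis follows by linearity, expanding $\tau$ as $\tau=\sum_{i,j=1}^4\tau_{ij}\,\alpha_i\otimes\beta_j$, where $\{\alpha_i\}_{i=1}^4$ and $\{\beta_j\}_{j=1}^4$ are bases for $\mathrm{St}_{\mathbb R}(A)$ and $\mathrm{St}_{\mathbb R}(B)$. $▪$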
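The linearity argument can be illustrated numerically. The sketch below assumes an arbitrary real-linear combination of product matrices as a stand-in for $T_\tau$; the helpers `su2` and `random_density` and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def su2(a, b, c):
    """One convenient parametrization of an SU(2) matrix (illustrative)."""
    return np.array([
        [np.exp(1j * a) * np.cos(c), np.exp(1j * b) * np.sin(c)],
        [-np.exp(-1j * b) * np.sin(c), np.exp(-1j * a) * np.cos(c)],
    ])

def random_density(rng):
    """Random 2 x 2 density matrix (positive, unit trace)."""
    m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = m @ m.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
U, V = su2(0.4, 1.3, 0.9), su2(2.1, 0.2, 0.5)

# tau = sum_k c_k alpha_k (x) beta_k, with real coefficients c_k.
coeffs = rng.normal(size=3)
pairs = [(random_density(rng), random_density(rng)) for _ in range(3)]

# Tensor representation of tau, and of (U (x) V) tau computed term by term.
T_tau = sum(c * np.kron(a, b) for c, (a, b) in zip(coeffs, pairs))
T_transformed = sum(
    c * np.kron(U @ a @ U.conj().T, V @ b @ V.conj().T)
    for c, (a, b) in zip(coeffs, pairs)
)

W = np.kron(U, V)
lemma69_holds = bool(np.allclose(T_transformed, W @ T_tau @ W.conj().T))
```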

The rest of this subsection is aimed at showing that, with a suitable choice of matrix representation for system $B$, the standard representation coincides with the tensor representation, that is, $Sρ=Tρ$ for every $ρ∈St(AB)$. This technical result is important because some properties used in our derivation are easily proved in the standard representation, while the property expressed by lemma 69 is easily proved in the tensor representation: it is then essential to show that we can construct a representation that enjoys both properties.

The four states ${ϕm⊗ϕn}m,n=12$ are clearly a maximal set of perfectly distinguishable pure states in $AB$. In the following we will construct the standard representation starting from this set.

Lemma 70. For a composite system $AB$ with $dA=dB=2$ one can choose the standard representation in such a way that the following equalities hold:

$Sφm⊗φn=Tφm⊗φn,$
###### (51)

$Sφm⊗σk=Tφm⊗σk, k=x,y,$
###### (52)

$Sσk⊗φm=Tσk⊗φm, k=x,y.$
###### (53)

Proof. Let us choose single-qubit representations $S^A$ and $S^B$ that satisfy Eqs. (41), (42), and (43). Choosing the states $\{\varphi_m\otimes\varphi_n\}$ in lexicographic order as the four distinguishable states for the standard representation, we have

$[S_{\varphi_1\otimes\varphi_1}]_{rs}=\delta_{1r}\delta_{1s},\quad[S_{\varphi_1\otimes\varphi_2}]_{rs}=\delta_{2r}\delta_{2s},\quad[S_{\varphi_2\otimes\varphi_1}]_{rs}=\delta_{3r}\delta_{3s},\quad[S_{\varphi_2\otimes\varphi_2}]_{rs}=\delta_{4r}\delta_{4s}.$

With this choice we get $S_{\varphi_m\otimes\varphi_n}=S^A_{\varphi_m}\otimes S^B_{\varphi_n}=T_{\varphi_m\otimes\varphi_n}$ for every $m,n=1,2$. This proves Eq. (51). Let us now prove Eqs. (52) and (53). Consider the two-dimensional face $F_{11,12}$, generated by the states $\varphi_1\otimes\varphi_1$ and $\varphi_1\otimes\varphi_2$. This face is the face identified by the state $\omega_{11,12}:=\varphi_1\otimes\chi_B$, and we have $F_{11,12}\simeq\{\varphi_1\}\otimes\mathrm{St}_1(B)$. Therefore we can choose the vectors $\sigma_k^{11,12}$, $k=x,y$, to satisfy the relation $\sigma_k^{11,12}:=\varphi_1\otimes\sigma_k$, $k=x,y$. Now, in the standard representation we have

$[S_{\sigma_x^{11,12}}]_{rs}=\delta_{r1}\delta_{s2}+\delta_{r2}\delta_{s1},\qquad[S_{\sigma_y^{11,12}}]_{rs}=i\lambda(\delta_{r1}\delta_{s2}-\delta_{r2}\delta_{s1})$

[cf. Eqs. (42) and (43)]. This implies $S_{\sigma_k^{11,12}}=S^A_{\varphi_1}\otimes S^B_{\sigma_k}=T_{\varphi_1\otimes\sigma_k}$ for $k=x,y$. Repeating the same argument for the faces $F_{21,22}$, $F_{11,21}$, and $F_{12,22}$ we obtain the proof of Eqs. (52) and (53). $▪$

In order to prove that, with a suitable choice of axes, the standard representation coincides with the tensor representation (that is, $S_\rho=T_\rho$ for every $\rho\in\mathrm{St}(AB)$), it remains to find a choice of axes such that $S_{\sigma_k\otimes\sigma_l}=T_{\sigma_k\otimes\sigma_l}$ for $k,l=x,y$. This will be proved in the following.

Lemma 71. Let $\Phi\in\mathrm{St}_1(AB)$ be a pure state such that $(a_1\otimes a_1|\Phi)=(a_2\otimes a_2|\Phi)=1/2$ [such a state exists due to the superposition principle]. With a suitable choice of the matrix representation $S^B$, the state $\Phi$ is represented by the matrix

$T_\Phi=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix}.$
###### (54)

Moreover, one has

$\Phi=\chi_A\otimes\chi_B+\frac14\left(\sigma_x\otimes\sigma_x-\sigma_y\otimes\sigma_y+\sigma_z\otimes\sigma_z\right).$
###### (55)

Proof. Let us start with the proof of Eq. (54). For every reversible transformation $\mathcal U\in\mathbf G_A$, let $\mathcal U^*\in\mathbf G_B$ be the conjugate of $\mathcal U$, defined with respect to the state $\Phi$. Since all $2\times2$ unitary (nontrivial) representations of $SU(2)$ are unitarily equivalent, by a suitable choice of the standard representation $S^B_\rho$ for system $B$, one has

$S^B_{\mathcal U^*\rho}=U^*S^B_\rho U^T,$
###### (56)

where $U^*$ and $U^T$ are the complex conjugate and the transpose of the matrix $U\in SU(2)$ such that $S^A_{\mathcal U\rho}=US^A_\rho U^\dagger$. Due to Eq. (56) and to lemma 69, the isotropic state $\Phi$ must satisfy the condition $(U\otimes U^*)\,T_\Phi\,(U^\dagger\otimes U^T)=T_\Phi$ for all $U\in SU(2)$. Now, the unitary representation $\{U\otimes U^*\}$ has two irreducible subspaces and the projectors on them are given by the matrices

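$P_0=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix},\qquad P_1=\frac12\begin{pmatrix}1&0&0&-1\\0&2&0&0\\0&0&2&0\\-1&0&0&1\end{pmatrix}=I\otimes I-P_0,$

where $I$ is the $2\times2$ identity matrix. The most general form for $T_\Phi$ is then the following:

$T_\Phi=x_0P_0+x_1P_1=(x_0-x_1)P_0+x_1\,I\otimes I=\begin{pmatrix}\alpha+\beta&0&0&\beta\\0&\alpha&0&0\\0&0&\alpha&0\\\beta&0&0&\alpha+\beta\end{pmatrix},$

having defined $\alpha:=x_1$ and $\beta:=(x_0-x_1)/2$. Now, by construction the state $\Phi$ satisfies the condition

$(a_m|_A\,|\Phi)_{AB}=\tfrac12\,|\varphi_m)_B,\qquad m=1,2.$

By definition of the tensor representation, the conditional states $(a_m|_A|\Phi)_{AB}$ are described by the diagonal blocks of the matrix $T_\Phi$:

$S^B_{(a_1|_A|\Phi)_{AB}}=\begin{pmatrix}\alpha+\beta&0\\0&\alpha\end{pmatrix},\qquad S^B_{(a_2|_A|\Phi)_{AB}}=\begin{pmatrix}\alpha&0\\0&\alpha+\beta\end{pmatrix}.$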
###### (57)

Since the states $\varphi_1$ and $\varphi_2$ are pure, the above matrices must be rank-one. Moreover, their trace must be equal to $(a_m\otimes e_B|\Phi)=\frac12(e_B|\varphi_m)=\frac12$, $m=1,2$. Then we have two possibilities: either (i) $\alpha=0$ and $\beta=\frac12$ or (ii) $\alpha=-\beta=\frac12$. In case (i) Eq. (54) holds. In case (ii), to prove Eq. (54) we need to change our choice of matrix representation for the qubit $B$. Precisely, we make the following change:

$S^B_{\sigma_x}\mapsto\tilde S^B_{\sigma_x}=-S^B_{\sigma_x},\qquad S^B_{\sigma_y}\mapsto\tilde S^B_{\sigma_y}=-S^B_{\sigma_y},\qquad S^B_{\sigma_z}\mapsto\tilde S^B_{\sigma_z}=-S^B_{\sigma_z},$
###### (58)

where $σz:=ϕ1−ϕ2$. Note that the inversion of the axes, sending $σk$ to $−σk$ for every $k=x,y,z$ is not an allowed physical transformation, but this is not a problem here, because Eq. (58) is just a new choice of matrix representation, in which the set of states of system $B$ is still represented by the Bloch sphere.

More concisely, the change of matrix representation $SB↦S̃B$ can be expressed as

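$S^B_\rho\mapsto\tilde S^B_\rho:=Y\,(S^B_\rho)^T\,Y^\dagger,\qquad Y:=\begin{pmatrix}0&-1\\1&0\end{pmatrix}.$

Note that in the new representation $\tilde S^B$ the physical transformation $\mathcal U^*$ is still represented as $\tilde S^B_{\mathcal U^*\rho}=U^*\tilde S^B_\rho U^T$: indeed we have

$\tilde S^B_{\mathcal U^*\rho}=Y(S^B_{\mathcal U^*\rho})^TY^\dagger=Y(U^*S^B_\rho U^T)^TY^\dagger=YU(S^B_\rho)^TU^\dagger Y^\dagger=(YUY^\dagger)\,Y(S^B_\rho)^TY^\dagger\,(YU^\dagger Y^\dagger)=U^*\,Y(S^B_\rho)^TY^\dagger\,U^T=U^*\tilde S^B_\rho U^T,$

having used the relations $Y^\dagger Y=I$ and $YUY^\dagger=U^*$ for every $U\in SU(2)$. Clearly the change of standard representation $S\to\tilde S$ for the qubit $B$ induces a change of tensor representation $T\to\tilde T$, where $\tilde T$ is the tensor representation defined by $\tilde T_{\rho\otimes\sigma}:=S^A_\rho\otimes\tilde S^B_\sigma$. With this change of representation, we have

$\tilde T_\Phi=\frac12\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix}.$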

This concludes the proof of Eq. (54).
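Two facts used in this part of the proof lend themselves to a quick numerical check: the relation $YUY^\dagger=U^*$ for $U\in SU(2)$, and the fact that $\rho\mapsto Y\rho^TY^\dagger$ inverts the Bloch vector. The sketch below assumes the usual Pauli matrices as the single-qubit representation of $\sigma_x,\sigma_y,\sigma_z$; the helper names and numerical values are illustrative.

```python
import numpy as np

# Pauli matrices (assumed form of the single-qubit representation).
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

# The matrix Y used in the change of representation (real, not Pauli-y).
Y = np.array([[0, -1], [1, 0]], dtype=complex)

def su2(a, b, c):
    """One convenient parametrization of an SU(2) matrix (illustrative)."""
    return np.array([
        [np.exp(1j * a) * np.cos(c), np.exp(1j * b) * np.sin(c)],
        [-np.exp(-1j * b) * np.sin(c), np.exp(-1j * a) * np.cos(c)],
    ])

U = su2(0.3, 1.1, 0.7)
conjugation_ok = bool(np.allclose(Y @ U @ Y.conj().T, U.conj()))

def bloch_vector(rho):
    """Bloch components (x, y, z) of a 2 x 2 density matrix."""
    return np.real([np.trace(rho @ P) for P in (PX, PY, PZ)])

r = np.array([0.2, -0.5, 0.4])
rho = (np.eye(2) + r[0] * PX + r[1] * PY + r[2] * PZ) / 2
rho_tilde = Y @ rho.T @ Y.conj().T          # new representation of the same state

bloch_inverted = bool(np.allclose(bloch_vector(rho_tilde), -r))
```

The map is a relabeling of matrices, not a physical transformation, which is why the inversion of all three axes is admissible here.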

Let us now prove Eq. (55). Using the fact that by definition $Tρ⊗τ=(SρA⊗SτB)$ one can directly verify the relation

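$T_\Phi=S^A_{\chi}\otimes S^B_{\chi}+\frac14\left(S^A_{\sigma_x}\otimes S^B_{\sigma_x}-S^A_{\sigma_y}\otimes S^B_{\sigma_y}+S^A_{\sigma_z}\otimes S^B_{\sigma_z}\right).$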

This is precisely the matrix version of Eq. (55). $▪$
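The direct verification mentioned in the proof can be sketched numerically. The block below assumes that $S_{\sigma_x},S_{\sigma_y},S_{\sigma_z}$ are the Pauli matrices and $S_\chi=I/2$, and also checks the invariance of $T_\Phi$ under $U\otimes U^*$ for one sample $U$; helper names and values are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Matrix of Eq. (54).
T_Phi = 0.5 * np.array([[1, 0, 0, 1],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [1, 0, 0, 1]], dtype=complex)

# Right-hand side of Eq. (55) in matrix form, with S_chi = I/2 on each qubit.
rhs = np.kron(I2 / 2, I2 / 2) + (
    np.kron(PX, PX) - np.kron(PY, PY) + np.kron(PZ, PZ)
) / 4
eq55_ok = bool(np.allclose(T_Phi, rhs))

def su2(a, b, c):
    """One convenient parametrization of an SU(2) matrix (illustrative)."""
    return np.array([
        [np.exp(1j * a) * np.cos(c), np.exp(1j * b) * np.sin(c)],
        [-np.exp(-1j * b) * np.sin(c), np.exp(-1j * a) * np.cos(c)],
    ])

# Invariance (U (x) U*) T_Phi (U (x) U*)^dagger = T_Phi for a sample U.
U = su2(0.9, 0.4, 1.2)
W = np.kron(U, U.conj())
invariant = bool(np.allclose(W @ T_Phi @ W.conj().T, T_Phi))
```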

Note that the choice of $SB$ needed in Eq. (54) is compatible with the choice of $SB$ needed in lemma 70: indeed, to prove compatibility we only have to show that the representation $SB$ used in Eq. (54) has the property $[SϕmB]rs=δmrδms$, $m=1,2$. This property is automatically guaranteed by the relation $(am|A|Φ)AB=1/2|ϕm)$, $m=1,2$ and by Eq. (57) with $α=0$ and $β=1/2$.

Corollary 46. In the standard representation the state $Φ∈St1(AB)$ is represented by the matrix

$S_\Phi=\frac12\begin{pmatrix}1&0&0&e^{i\theta}\\0&0&0&0\\0&0&0&0\\e^{-i\theta}&0&0&1\end{pmatrix}.$
###### (59)

Proof. The thesis follows from theorem 17 and lemma 70. $▪$

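We now define the reversible transformations $\mathcal U_{x,\pi}$ and $\mathcal U_{z,\pi/2}$ as follows:

$S_{\mathcal U_{x,\pi}\rho}=XS_\rho X,\quad X:=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad S_{\mathcal U_{z,\pi/2}\rho}=e^{-i\frac\pi4Z}\,S_\rho\,e^{i\frac\pi4Z},\quad Z:=\begin{pmatrix}1&0\\0&-1\end{pmatrix}.$
###### (60)

Also, we define the states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ as

$\Psi:=(\mathcal U_{x,\pi}\otimes\mathcal I)\,\Phi,\qquad\Phi_{z,\pi/2}:=(\mathcal U_{z,\pi/2}\otimes\mathcal I)\,\Phi,\qquad\Psi_{z,\pi/2}:=(\mathcal U_{z,\pi/2}\otimes\mathcal I)\,\Psi.$

Lemma 72. The states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ have the following tensor representation:

$T_\Psi=\frac12\begin{pmatrix}0&0&0&0\\0&1&1&0\\0&1&1&0\\0&0&0&0\end{pmatrix},\qquad T_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&-i\\0&0&0&0\\0&0&0&0\\i&0&0&1\end{pmatrix},\qquad T_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&-i&0\\0&i&1&0\\0&0&0&0\end{pmatrix}.$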
###### (61)

Moreover, one has

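$\Psi=\chi_A\otimes\chi_B+\frac14\left(\sigma_x\otimes\sigma_x+\sigma_y\otimes\sigma_y-\sigma_z\otimes\sigma_z\right),\qquad\Phi_{z,\pi/2}=\chi_A\otimes\chi_B+\frac14\left(\sigma_y\otimes\sigma_x+\sigma_x\otimes\sigma_y+\sigma_z\otimes\sigma_z\right),\qquad\Psi_{z,\pi/2}=\chi_A\otimes\chi_B+\frac14\left(\sigma_y\otimes\sigma_x-\sigma_x\otimes\sigma_y-\sigma_z\otimes\sigma_z\right).$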
###### (62)

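Proof. Equation (61) is obtained from Eq. (54) by explicit calculation using lemma 69 and Eq. (60). Then, the validity of Eq. (62) is easily obtained from Eq. (55) using the relations

$\mathcal U_{x,\pi}|\sigma_x)=|\sigma_x),\qquad\mathcal U_{x,\pi}|\sigma_y)=-|\sigma_y),\qquad\mathcal U_{x,\pi}|\sigma_z)=-|\sigma_z),$

and

$\mathcal U_{z,\pi/2}|\sigma_x)=|\sigma_y),\qquad\mathcal U_{z,\pi/2}|\sigma_y)=-|\sigma_x),\qquad\mathcal U_{z,\pi/2}|\sigma_z)=|\sigma_z).$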

$▪$
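The explicit calculation behind Eqs. (61) and (62) can be reproduced in a few lines. This is a sketch under the assumption that $\sigma_x,\sigma_y,\sigma_z$ are represented by the Pauli matrices and that the two transformations act on the first qubit as in Eq. (60); variable names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
k = np.kron

# Eq. (54).
T_Phi = 0.5 * np.array([[1, 0, 0, 1], [0, 0, 0, 0],
                        [0, 0, 0, 0], [1, 0, 0, 1]], dtype=complex)

# Eq. (60): X and exp(-i pi Z / 4), each acting on qubit A only.
Ux = k(PX, I2)
Uz = k(np.diag([np.exp(-1j * np.pi / 4), np.exp(1j * np.pi / 4)]), I2)

T_Psi = Ux @ T_Phi @ Ux.conj().T
T_Phi_z = Uz @ T_Phi @ Uz.conj().T
T_Psi_z = Uz @ T_Psi @ Uz.conj().T

# Eq. (61): expected tensor representations.
eq61_ok = bool(
    np.allclose(T_Psi, 0.5 * np.array([[0, 0, 0, 0], [0, 1, 1, 0],
                                       [0, 1, 1, 0], [0, 0, 0, 0]]))
    and np.allclose(T_Phi_z, 0.5 * np.array([[1, 0, 0, -1j], [0, 0, 0, 0],
                                             [0, 0, 0, 0], [1j, 0, 0, 1]]))
    and np.allclose(T_Psi_z, 0.5 * np.array([[0, 0, 0, 0], [0, 1, -1j, 0],
                                             [0, 1j, 1, 0], [0, 0, 0, 0]]))
)

# Eq. (62): Pauli decompositions (chi (x) chi is I (x) I / 4).
chi_chi = k(I2, I2) / 4
eq62_ok = bool(
    np.allclose(T_Psi, chi_chi + (k(PX, PX) + k(PY, PY) - k(PZ, PZ)) / 4)
    and np.allclose(T_Phi_z, chi_chi + (k(PY, PX) + k(PX, PY) + k(PZ, PZ)) / 4)
    and np.allclose(T_Psi_z, chi_chi + (k(PY, PX) - k(PX, PY) - k(PZ, PZ)) / 4)
)
```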

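Lemma 73. The states $\Psi$, $\Phi_{z,\pi/2}$, and $\Psi_{z,\pi/2}$ have a standard representation of the form

$S_\Psi=\frac12\begin{pmatrix}0&0&0&0\\0&1&e^{i\gamma}&0\\0&e^{-i\gamma}&1&0\\0&0&0&0\end{pmatrix},\qquad S_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&\lambda ie^{i\theta}\\0&0&0&0\\0&0&0&0\\-\lambda ie^{-i\theta}&0&0&1\end{pmatrix},\qquad S_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&\mu ie^{i\gamma}&0\\0&-\mu ie^{-i\gamma}&1&0\\0&0&0&0\end{pmatrix},$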
###### (63)

with $θ$ as in corollary 46, $γ∈[0,2π)$ and $λ,μ∈{−1,1}$.

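Proof. Let us start from $\Psi$. First, from Eq. (62) it is immediate to obtain $(a_1\otimes a_1|\Psi)=(a_2\otimes a_2|\Psi)=0$ and $(a_1\otimes a_2|\Psi)=(a_2\otimes a_1|\Psi)=1/2$. This gives the diagonal elements of $S_\Psi$. Then, using theorem 17 we obtain that $S_\Psi$ must be as in Eq. (63), for some value of $\gamma$. Let us now consider $\Phi_{z,\pi/2}$. Again, the diagonal elements of the matrix $S_{\Phi_{z,\pi/2}}$ are obtained from Eq. (62), which in this case yields $(a_1\otimes a_1|\Phi_{z,\pi/2})=(a_2\otimes a_2|\Phi_{z,\pi/2})=1/2$ and $(a_1\otimes a_2|\Phi_{z,\pi/2})=(a_2\otimes a_1|\Phi_{z,\pi/2})=0$. Hence, by theorem 17 we must have

$S_{\Phi_{z,\pi/2}}=\frac12\begin{pmatrix}1&0&0&e^{i\theta'}\\0&0&0&0\\0&0&0&0\\e^{-i\theta'}&0&0&1\end{pmatrix}$

for some value of $\theta'\in[0,2\pi)$. Now, denote by $A$ the effect such that $(A|\Phi)=1$. We then have

$(A|\Phi_{z,\pi/2})=\mathrm{Tr}[E_AS_{\Phi_{z,\pi/2}}]=\mathrm{Tr}[S_\Phi S_{\Phi_{z,\pi/2}}],\qquad(A|\Phi_{z,\pi/2})=\mathrm{Tr}[F_AT_{\Phi_{z,\pi/2}}]=\mathrm{Tr}[T_\Phi T_{\Phi_{z,\pi/2}}]=\frac12,$

having used theorem 18, corollary 44, and Eq. (61). Hence we have $\mathrm{Tr}[S_\Phi S_{\Phi_{z,\pi/2}}]=1/2$, which implies $\theta'=\theta\pm\frac\pi2$, as in Eq. (63). Finally, the same arguments can be used for $\Psi_{z,\pi/2}$: the diagonal elements of $S_{\Psi_{z,\pi/2}}$ are obtained from the relations $(a_1\otimes a_1|\Psi_{z,\pi/2})=(a_2\otimes a_2|\Psi_{z,\pi/2})=0$ and $(a_1\otimes a_2|\Psi_{z,\pi/2})=(a_2\otimes a_1|\Psi_{z,\pi/2})=1/2$, which follow from Eq. (62). This implies that the matrix $S_{\Psi_{z,\pi/2}}$ has the form

$S_{\Psi_{z,\pi/2}}=\frac12\begin{pmatrix}0&0&0&0\\0&1&e^{i\gamma'}&0\\0&e^{-i\gamma'}&1&0\\0&0&0&0\end{pmatrix}$

for some $\gamma'\in[0,2\pi)$. The relation $\mathrm{Tr}[S_\Psi S_{\Psi_{z,\pi/2}}]=\mathrm{Tr}[T_\Psi T_{\Psi_{z,\pi/2}}]=1/2$ then implies $\gamma'=\gamma\pm\frac\pi2$. $▪$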

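Let us now consider the four vectors $\Sigma_x^{(11,22)},\Sigma_y^{(11,22)},\Sigma_x^{(12,21)},\Sigma_y^{(12,21)}$ defined as follows:

$\Sigma_x^{(11,22)}=2\left(\Phi-\chi_A\otimes\chi_B-\tfrac14\sigma_z\otimes\sigma_z\right),\qquad\Sigma_y^{(11,22)}=2\left(\Phi_{z,\pi/2}-\chi_A\otimes\chi_B-\tfrac14\sigma_z\otimes\sigma_z\right),\qquad\Sigma_x^{(12,21)}=2\left(\Psi-\chi_A\otimes\chi_B+\tfrac14\sigma_z\otimes\sigma_z\right),\qquad\Sigma_y^{(12,21)}=2\left(\Psi_{z,\pi/2}-\chi_A\otimes\chi_B+\tfrac14\sigma_z\otimes\sigma_z\right).$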
###### (64)

By the previous results, it is immediate to obtain the matrix representations of these vectors. In the tensor representation, using Eqs. (54) and (61), we obtain

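$T_{\Sigma_x^{(11,22)}}=\begin{pmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&0\end{pmatrix},\quad T_{\Sigma_y^{(11,22)}}=\begin{pmatrix}0&0&0&-i\\0&0&0&0\\0&0&0&0\\i&0&0&0\end{pmatrix},\quad T_{\Sigma_x^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&0\end{pmatrix},\quad T_{\Sigma_y^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&-i&0\\0&i&0&0\\0&0&0&0\end{pmatrix},$

while in the standard representation, using Eqs. (46) and (63), we obtain

$S_{\Sigma_x^{(11,22)}}=\begin{pmatrix}0&0&0&e^{i\theta}\\0&0&0&0\\0&0&0&0\\e^{-i\theta}&0&0&0\end{pmatrix},\quad S_{\Sigma_y^{(11,22)}}=\begin{pmatrix}0&0&0&\lambda ie^{i\theta}\\0&0&0&0\\0&0&0&0\\-\lambda ie^{-i\theta}&0&0&0\end{pmatrix},\quad S_{\Sigma_x^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&e^{i\gamma}&0\\0&e^{-i\gamma}&0&0\\0&0&0&0\end{pmatrix},\quad S_{\Sigma_y^{(12,21)}}=\begin{pmatrix}0&0&0&0\\0&0&\mu ie^{i\gamma}&0\\0&-\mu ie^{-i\gamma}&0&0\\0&0&0&0\end{pmatrix}.$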

Comparing the two matrix representations we are now in position to prove the desired result.

Lemma 74. With a suitable choice of axes, one has $Sσk⊗σl=Tσk⊗σl$ for every $k,l=x,y$.

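Proof. For the face $(11,22)$, using the freedom coming from Eqs. (43) and (44), we redefine the $x$ and $y$ axes so that $\sigma_x^{(11,22)}:=\Sigma_x^{(11,22)}$ and $\lambda\sigma_y^{(11,22)}:=\Sigma_y^{(11,22)}$. In this way we have

$S_{\Sigma_k^{(11,22)}}=T_{\Sigma_k^{(11,22)}}\quad\forall k=x,y.$

Likewise, for the face $(12,21)$ we redefine the $x$ and $y$ axes so that $\sigma_x^{(12,21)}:=\Sigma_x^{(12,21)}$ and $\mu\sigma_y^{(12,21)}:=\Sigma_y^{(12,21)}$, so that we have

$S_{\Sigma_k^{(12,21)}}=T_{\Sigma_k^{(12,21)}}\quad\forall k=x,y.$

Finally, using Eqs. (55), (62), and (64) we have the relations

$\sigma_x\otimes\sigma_x=\Sigma_x^{(11,22)}+\Sigma_x^{(12,21)},\qquad\sigma_y\otimes\sigma_y=\Sigma_x^{(12,21)}-\Sigma_x^{(11,22)},\qquad\sigma_x\otimes\sigma_y=\Sigma_y^{(11,22)}-\Sigma_y^{(12,21)},\qquad\sigma_y\otimes\sigma_x=\Sigma_y^{(11,22)}+\Sigma_y^{(12,21)}.$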

Since $S$ and $T$ coincide on the right-hand side of each equality, they must also coincide on the left-hand side. $▪$
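The four decompositions of $\sigma_k\otimes\sigma_l$ can be checked directly in the tensor representation. The sketch below assumes Pauli matrices for $\sigma_x,\sigma_y,\sigma_z$ and builds each $\Sigma$ as twice the full bracket (state minus $\chi_A\otimes\chi_B$ minus or plus $\frac14\sigma_z\otimes\sigma_z$), which is the reading that reproduces the matrices $T_\Sigma$ listed in the text; in this convention $\sigma_y\otimes\sigma_y$ comes out as $\Sigma_x^{(12,21)}-\Sigma_x^{(11,22)}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
k = np.kron

# Tensor representations of the four states, from Eqs. (55) and (62).
chi_chi = k(I2, I2) / 4
Phi = chi_chi + (k(PX, PX) - k(PY, PY) + k(PZ, PZ)) / 4
Psi = chi_chi + (k(PX, PX) + k(PY, PY) - k(PZ, PZ)) / 4
Phi_z = chi_chi + (k(PY, PX) + k(PX, PY) + k(PZ, PZ)) / 4
Psi_z = chi_chi + (k(PY, PX) - k(PX, PY) - k(PZ, PZ)) / 4

# The vectors of Eq. (64); the factor 2 multiplies the whole bracket.
zz = k(PZ, PZ) / 4
Sx_1122 = 2 * (Phi - chi_chi - zz)
Sy_1122 = 2 * (Phi_z - chi_chi - zz)
Sx_1221 = 2 * (Psi - chi_chi + zz)
Sy_1221 = 2 * (Psi_z - chi_chi + zz)

checks = {
    "xx": bool(np.allclose(k(PX, PX), Sx_1122 + Sx_1221)),
    "yy": bool(np.allclose(k(PY, PY), Sx_1221 - Sx_1122)),
    "xy": bool(np.allclose(k(PX, PY), Sy_1122 - Sy_1221)),
    "yx": bool(np.allclose(k(PY, PX), Sy_1122 + Sy_1221)),
}

# Sx_1122 should be the matrix with ones in the two far corners only.
corner = np.zeros((4, 4), dtype=complex)
corner[0, 3] = corner[3, 0] = 1
matches_listed_matrix = bool(np.allclose(Sx_1122, corner))
```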

Theorem 19. With a suitable choice of axes, the standard representation coincides with the tensor representation, that is, $Sρ=Tρ$ for every $ρ∈St(AB)$.

Proof. Combining lemma 70 with lemma 74 we obtain that $S$ and $T$ coincide on the tensor product basis $B\times B$, where $B=\{\varphi_1,\varphi_2,\sigma_x,\sigma_y\}$. By linearity, $S$ and $T$ coincide on every state. $▪$

From now on, whenever we consider a composite system $AB$ where $A$ and $B$ are two-dimensional, we will adopt the choice that guarantees that the standard representation coincides with the tensor representation.

###### D. Positivity of the matrices

In this subsection we show that the states in our theory can be represented by positive matrices. This amounts to proving that for every system $A$, the set of states $\mathrm{St}_1(A)$ can be represented as a subset of the set of density matrices in dimension $d_A$. This result will be completed in Sec. XIII E, where we will see that, in fact, every density matrix in dimension $d_A$ corresponds to some state of $\mathrm{St}_1(A)$.

The starting point to prove positivity is the following.

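Lemma 75. Let $A$ and $B$ be two-dimensional systems. Then, for every pure state $\Psi\in\mathrm{St}(AB)$ one has $S_\Psi\geqslant0$.

Proof. Take an arbitrary vector $|Z\rangle\in\mathbb C^2\otimes\mathbb C^2$, written in the Schmidt form as $|Z\rangle=\sum_{n=1}^2\sqrt{\lambda_n}\,|v_n\rangle|w_n\rangle$. Introducing the unitaries $U,V$ such that $U|v_n\rangle=|n\rangle$ and $V|w_n\rangle=|n\rangle$ for every $n=1,2$, we have $|Z\rangle=(U^\dagger\otimes V^\dagger)|W\rangle$, where $|W\rangle=\sum_{n=1}^2\sqrt{\lambda_n}\,|n\rangle|n\rangle$. Therefore, we have

$\langle Z|S_\Psi|Z\rangle=\langle W|S_{(\mathcal U\otimes\mathcal V)\Psi}|W\rangle,$

where $\mathcal U$ and $\mathcal V$ are the reversible transformations defined by $S_{\mathcal U\rho}=US_\rho U^\dagger$ and $S_{\mathcal V\rho}=VS_\rho V^\dagger$, respectively ($\mathcal U$ and $\mathcal V$ are physical transformations by virtue of corollary 30). Here we used the fact that the standard two-qubit representation coincides with the tensor representation and, therefore, $S_{(\mathcal U\otimes\mathcal V)\Psi}=(U\otimes V)S_\Psi(U\otimes V)^\dagger$. Denoting the pure state $(\mathcal U\otimes\mathcal V)|\Psi)$ by $|\Psi')$ we then have

$\langle Z|S_\Psi|Z\rangle=\lambda_1[S_{\Psi'}]_{11,11}+\lambda_2[S_{\Psi'}]_{22,22}+2\sqrt{\lambda_1\lambda_2}\,\mathrm{Re}\left([S_{\Psi'}]_{11,22}\right).$

Since by theorem 17 we have $[S_{\Psi'}]_{11,22}=\sqrt{[S_{\Psi'}]_{11,11}[S_{\Psi'}]_{22,22}}\,e^{i\theta}$, we conclude

$\langle Z|S_\Psi|Z\rangle=\lambda_1[S_{\Psi'}]_{11,11}+\lambda_2[S_{\Psi'}]_{22,22}+2\cos\theta\sqrt{\lambda_1\lambda_2[S_{\Psi'}]_{11,11}[S_{\Psi'}]_{22,22}}\geqslant\left(\sqrt{\lambda_1[S_{\Psi'}]_{11,11}}-\sqrt{\lambda_2[S_{\Psi'}]_{22,22}}\right)^2\geqslant0.$

Finally, since the vector $|Z\rangle\in\mathbb C^2\otimes\mathbb C^2$ is arbitrary, the matrix $S_\Psi$ is positive. $▪$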

Corollary 47. Let $C$ be a system of dimension $dC=4$. Then, with a suitable choice of matrix representation the pure states of $C$ are represented by positive matrices.

Proof. The system $C$ is operationally equivalent to the composite system $AB$, where $dA=dB=2$. Let $U∈Transf(AB,C)$ be the reversible transformation implementing the equivalence. Now, we know that the states of $AB$ are represented by positive matrices. If we define the basis vectors for $C$ by applying $U$ to the basis for $AB$, then we obtain that the states of $C$ are represented by the same matrices representing the states of $AB$. $▪$

Corollary 48. Let $A$ be a system with $dA=3$. With a suitable choice of matrix representation, the matrix $Sϕ$ is positive for every pure state $ϕ∈St(A)$.

Proof. Let $C$ be a system with $d_C=4$. By corollary 47 the states of $C$ are represented by positive matrices. Define the state $\omega:=\frac13(\varphi_1+\varphi_2+\varphi_3)$, where $\{\varphi_m\}_{m=1}^4$ are four perfectly distinguishable pure states. By the compression axiom, the face $F_\omega$ can be encoded in a three-dimensional system $D$ (corollary 40). In fact, since $D$ is operationally equivalent to $A$, the face $F_\omega$ can be encoded in $A$. Let $\mathcal E\in\mathrm{Transf}(D,A)$ and $\mathcal D\in\mathrm{Transf}(A,D)$ be the encoding and decoding operation, respectively. If we define the basis vectors for $A$ by applying $\mathcal E$ to the basis vectors for the face $F_\omega$, then we obtain that the states of $A$ are represented by the same matrices representing the states in the face $F_\omega$. Since these matrices are positive, the thesis follows. $▪$

From now on, for every three-dimensional system $A$ we will choose the $x$ and $y$ axes so that $Sρ$ is positive for every $ρ∈St(A)$.

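Corollary 49. Let $\varphi\in\mathrm{St}_1(A)$ be a pure state with $d_A=3$. Then, the corresponding matrix $S_\varphi$, given by

$S_\varphi=\begin{pmatrix}p_1&\sqrt{p_1p_2}\,e^{i\theta_{12}}&\sqrt{p_1p_3}\,e^{i\theta_{13}}\\\sqrt{p_1p_2}\,e^{-i\theta_{12}}&p_2&\sqrt{p_2p_3}\,e^{i\theta_{23}}\\\sqrt{p_1p_3}\,e^{-i\theta_{13}}&\sqrt{p_2p_3}\,e^{-i\theta_{23}}&p_3\end{pmatrix}$
###### (65)

satisfies the property

$e^{i\theta_{13}}=e^{i(\theta_{12}+\theta_{23})}.$

Equivalently, $S_\varphi=|v\rangle\langle v|$, where $v\in\mathbb C^3$ is the vector given by $|v\rangle:=(\sqrt{p_1},\sqrt{p_2}\,e^{-i\theta_{12}},\sqrt{p_3}\,e^{-i\theta_{13}})^T$.

Proof. The relation can be trivially satisfied when $p_i=0$ for some $i\in\{1,2,3\}$. Hence let us assume $p_1,p_2,p_3>0$. Computing the determinant of $S_\varphi$ one obtains $\det(S_\varphi)=2p_1p_2p_3[\cos(\theta_{12}+\theta_{23}-\theta_{13})-1]$. Since $S_\varphi$ is positive, we must have $\det(S_\varphi)\geqslant0$. If $p_1,p_2,p_3>0$ the only possibility is $\theta_{13}=\theta_{12}+\theta_{23}\bmod2\pi$. $▪$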
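The determinant formula and its consequence can be checked numerically. The following sketch builds the matrix of Eq. (65) for illustrative values of $p_i$ and $\theta_{ij}$ (chosen arbitrarily here), compares the determinant with $2p_1p_2p_3[\cos(\theta_{12}+\theta_{23}-\theta_{13})-1]$, and confirms that the matrix is rank-one exactly when the phase condition holds.

```python
import numpy as np

def S_phi(p, th12, th23, th13):
    """Matrix of Eq. (65) for probabilities p = (p1, p2, p3) and given phases."""
    r = np.sqrt
    return np.array([
        [p[0], r(p[0] * p[1]) * np.exp(1j * th12), r(p[0] * p[2]) * np.exp(1j * th13)],
        [r(p[0] * p[1]) * np.exp(-1j * th12), p[1], r(p[1] * p[2]) * np.exp(1j * th23)],
        [r(p[0] * p[2]) * np.exp(-1j * th13), r(p[1] * p[2]) * np.exp(-1j * th23), p[2]],
    ])

p = np.array([0.5, 0.3, 0.2])     # illustrative probabilities
th12, th23 = 0.7, 1.9             # illustrative phases

# Generic th13: the determinant is negative, so the matrix is not positive.
th13_generic = 0.4
S_generic = S_phi(p, th12, th23, th13_generic)
det_formula = 2 * np.prod(p) * (np.cos(th12 + th23 - th13_generic) - 1)
det_matches = bool(np.isclose(np.linalg.det(S_generic).real, det_formula))
not_positive = bool(np.linalg.eigvalsh(S_generic).min() < -1e-9)

# Phase condition th13 = th12 + th23: the matrix equals |v><v|.
S_pure = S_phi(p, th12, th23, th12 + th23)
v = np.sqrt(p) * np.exp(-1j * np.array([0.0, th12, th12 + th23]))
is_projector = bool(np.allclose(S_pure, np.outer(v, v.conj())))
```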

Corollary 49 can be easily extended to systems of arbitrary dimension. To this purpose, we choose the $x$ and $y$ axes in such a way that the projection of every state $ρ∈St1(A)$ on a three-dimensional face is represented by a positive matrix.

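Lemma 76. If $\varphi\in\mathrm{St}_1(A)$ is a pure state and $d_A=N$, then $S_\varphi=|v\rangle\langle v|$, where $v\in\mathbb C^N$ is the vector given by $v:=(\sqrt{p_1},\sqrt{p_2}\,e^{-i\alpha_2},\dots,\sqrt{p_N}\,e^{-i\alpha_N})^T$ with $\alpha_i\in[0,2\pi)$ for all $i=2,\dots,N$.

Proof. Consider a triple $V=\{p,q,r\}\subseteq\{1,\dots,N\}$. Then the state $\Pi_V|\varphi)$ is proportional to a pure state of a three-dimensional system, whose representation $S_{\Pi_V\varphi}$ is the $3\times3$ square submatrix of $S_\varphi$ with elements $[S_\varphi]_{kl}=\sqrt{p_kp_l}\,e^{i\theta_{kl}}$, $(k,l)\in V\times V$. Now, corollary 49 forces the relation $e^{i\theta_{pr}}=e^{i(\theta_{pq}+\theta_{qr})}$. Since this relation must hold for every choice of the triple $V=\{p,q,r\}$, if we define $\alpha_p:=\theta_{1p}$ (so that $\alpha_1=0$), then we have $e^{i\theta_{pq}}=e^{i(\theta_{p1}+\theta_{1q})}=e^{i(\theta_{1q}-\theta_{1p})}=e^{i(\alpha_q-\alpha_p)}$. It is then immediate to verify that $S_\varphi=|v\rangle\langle v|$, where $v=(\sqrt{p_1},\sqrt{p_2}\,e^{-i\alpha_2},\dots,\sqrt{p_N}\,e^{-i\alpha_N})^T$. $▪$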

In conclusion, we proved the following.

Corollary 50. For every system $A$, the state space $St1(A)$ can be represented as a subset of the set of density matrices in dimension $dA$.

Proof. For every state $ρ∈St1(A)$ the matrix $Sρ$ is Hermitian by construction, with unit trace by corollary 42, and positive since it is a convex mixture of positive matrices. $▪$

###### E. Quantum theory in finite dimensions

Here we conclude our derivation of quantum theory by showing that every density matrix in dimension $dA$ corresponds to some state $ρ∈St1(A)$.

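We already know from the superposition principle (theorem 16) that for every choice of probabilities $\{p_i\}_{i=1}^{d_A}$ there is a pure state $\varphi\in\mathrm{St}_1(A)$ such that $\{p_i\}_{i=1}^{d_A}$ are the diagonal elements of $S_\varphi$. Thus the set of density matrices corresponding to pure states contains at least one matrix of the form $S_\varphi=|v\rangle\langle v|$, with $|v\rangle=(\sqrt{p_1},\sqrt{p_2}\,e^{-i\beta_2},\dots,\sqrt{p_{d_A}}\,e^{-i\beta_{d_A}})^T$. It only remains to prove that every possible choice of phases $\beta_i\in[0,2\pi)$ corresponds to some pure state.

Recall that for a face $F\subseteq\mathrm{St}_1(A)$ we defined the group $\mathbf G_{F,F^\perp}$ to be the group of reversible transformations $\mathcal U\in\mathbf G_A$ such that $\mathcal U=_{\omega_F}\mathcal I_A$ and $\mathcal U=_{\omega_{F^\perp}}\mathcal I_A$. We then have the following.

Lemma 77. Consider a system $A$ with $d_A=N$. Let $\{\varphi_i\}_{i=1}^N\subset\mathrm{St}_1(A)$ be a maximal set of perfectly distinguishable pure states, $F$ be the face identified by $\omega_F=\frac{1}{N-1}\sum_{i=1}^{N-1}\varphi_i$, and $F^\perp$ its orthogonal face, identified by the state $\varphi_N$. If $\mathcal U$ is a reversible transformation in $\mathbf G_{F,F^\perp}$, then the action of $\mathcal U$ is given by

$S_{\mathcal U\rho}=US_\rho U^\dagger,\qquad U=\left(\begin{array}{c|c}I_{N-1}&\begin{matrix}0\\\vdots\\0\end{matrix}\\\hline\begin{matrix}0&\cdots&0\end{matrix}&e^{-i\beta}\end{array}\right),$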
###### (66)

where $IN−1$ is the $(N−1)×(N−1)$ identity matrix and $β∈[0,2π)$.

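Proof. Consider an arbitrary state $\rho\in\mathrm{St}_1(A)$ and its matrix representation

$S_\rho=\begin{pmatrix}S_{\Pi_F\rho}&f\\f^\dagger&S_{\Pi_{F^\perp}\rho}\end{pmatrix},$

where $f\in\mathbb C^{N-1}$ is a suitable vector. Since $\mathcal U=_{\omega_F}\mathcal I_A$ and $\mathcal U=_{\omega_{F^\perp}}\mathcal I_A$, we have that

$S_{\mathcal U\rho}=\begin{pmatrix}S_{\Pi_F\rho}&g\\g^\dagger&S_{\Pi_{F^\perp}\rho}\end{pmatrix},$

where $g\in\mathbb C^{N-1}$ is a suitable vector. To prove Eq. (66), we will now prove that $g=e^{i\beta}f$ for some suitable $\beta\in[0,2\pi)$.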
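That the matrix $U$ of Eq. (66) produces exactly this block structure can be seen in a short numerical sketch. The dimension, the angle $\beta$, and the random density matrix below are illustrative choices; the check covers only this direction (a $U$ of the form (66) leaves the diagonal blocks unchanged and multiplies the off-diagonal column by $e^{i\beta}$), not the converse proved in the text.

```python
import numpy as np

N, beta = 4, 0.8                       # illustrative dimension and phase
rng = np.random.default_rng(2)

# Random N x N density matrix as a stand-in for S_rho.
m = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
S_rho = m @ m.conj().T
S_rho = S_rho / np.trace(S_rho).real

# U = diag(1, ..., 1, exp(-i beta)), as in Eq. (66).
U = np.diag([1.0] * (N - 1) + [np.exp(-1j * beta)])
S_out = U @ S_rho @ U.conj().T

# Off-diagonal column f (before) and g (after).
f = S_rho[: N - 1, N - 1]
g = S_out[: N - 1, N - 1]

blocks_unchanged = bool(
    np.allclose(S_out[: N - 1, : N - 1], S_rho[: N - 1, : N - 1])
    and np.isclose(S_out[N - 1, N - 1], S_rho[N - 1, N - 1])
)
phase_rule = bool(np.allclose(g, np.exp(1j * beta) * f))
```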

Let us start from the case $N=3$. Since $\mathcal U|\varphi_i)=|\varphi_i)$ for all $i=1,2,3$, we have $(a_i|\mathcal U=(a_i|$ for all $i=1,2,3$ (lemma 51). This implies that $\mathcal U$ sends states in the face $F_{13}$ to states in the face $F_{13}$: indeed, for every $\rho\in F_{13}$ one has $(a_{13}|\mathcal U|\rho)=(a_{13}|\rho)=1$, which implies $\mathcal U\rho\in F_{13}$ (lemma 46). In other words, the restriction of $\mathcal U$ to the face $F_{13}$ is a reversible qubit transformation. Therefore, the action of $\mathcal U$ on a state $\rho\in F_{13}$ must be given by

$S_{\mathcal U\rho}=\begin{pmatrix}\rho_{11}&0&\rho_{13}\,e^{i\beta}\\0&0&0\\\rho_{31}\,e^{-i\beta}&0&\rho_{33}\end{pmatrix},$

for some $\beta\in[0,2\pi)$. Similarly, we can see that $\mathcal U$ sends states in the face $F_{23}$ to states in the face $F_{23}$. Hence, for every $\sigma\in F_{23}$ we have

$S_{\mathcal U\sigma}=\begin{pmatrix}0&0&0\\0&\sigma_{22}&\sigma_{23}\,e^{i\beta'}\\0&\sigma_{32}\,e^{-i\beta'}&\sigma_{33}\end{pmatrix}$

for some $\beta'\in[0,2\pi)$. We now show that $e^{i\beta'}=e^{i\beta}$. To see that, consider a generic state $\varphi\in\mathrm{St}_1(A)$, with the property that $p_i=(a_i|\varphi)>0$ for every $i=1,2,3$ (such a state exists due to the superposition principle of theorem 16). Writing $S_\varphi$ as in Eq. (65) we then have

$SUϕ=p1p1p2eiθ12p1p3ei(θ13+β)p1p2e−iθ12p2<$