
Solving Newcomb’s paradox for classical and quantum predictors

A recent Hacker News post reminded me of Newcomb’s paradox, which goes as follows (quoted from Wikipedia):

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:

  • Box A is clear, and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The question is whether the player should take both boxes, or only box B.

I first saw this problem many years ago but didn’t have a strong opinion. Now it seems clear that the controversy is about the definition of “reliable predictor”. This is usually left vague, leading to many unreliable philosophical and game-theory arguments. As usual, I will try to solve the problem using physics. Interestingly, the analysis is different for a classical versus quantum predictor, and also depends on the interpretation of quantum mechanics.

Classical predictor

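Assume the predictor is a classical supercomputer that, at prediction time, takes as input the state of the player and of all the objects that they interact with until the decision. Call this state S_i. By running the physics forward, it arrives at either a state S_{AB} or S_B, corresponding to the decision to take both boxes or only box B, respectively. Since classical evolution is deterministic, the prediction is always correct: a player who ends in S_B receives $1,000,000, while a player who ends in S_{AB} receives only $1,000. In this case, one should obviously take only box B.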
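The classical argument can be sketched as a toy program. This is only an illustration under strong assumptions: the player's dynamics are reduced to a deterministic function of the initial state, and the names `one_boxer`, `two_boxer` and `play` are hypothetical.

```python
# Toy model (an assumption, not real physics): the player's evolution is a
# deterministic function of the initial state S_i, and the predictor simply
# evaluates the same function ahead of time.

BOX_A = 1_000        # visible amount in box A
BOX_B = 1_000_000    # amount placed in box B if "B only" is predicted


def one_boxer(state):
    """A player whose dynamics always end in S_B (take only box B)."""
    return "B"


def two_boxer(state):
    """A player whose dynamics always end in S_AB (take both boxes)."""
    return "AB"


def play(decide, state):
    """Run one game against a perfect classical predictor."""
    prediction = decide(state)                 # predictor runs the physics forward
    box_b = BOX_B if prediction == "B" else 0  # box B is filled accordingly
    choice = decide(state)                     # the player then evolves identically
    return box_b if choice == "B" else BOX_A + box_b


print(play(one_boxer, state=None))  # 1000000
print(play(two_boxer, state=None))  # 1000
```

Because the prediction and the choice come from the same deterministic function, the outcome "take both and find box B full" never occurs in this model.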

Quantum predictor

In the quantum case, the initial wavefunction of the player and their surroundings is \psi_i. The computer cannot measure the wavefunction directly due to the no-cloning theorem. Instead, one way to make the prediction is as follows. The decision to take both boxes corresponds to a set of orthonormal states \{\psi_{AB}\}, and likewise for \{\psi_B\}. These two sets are mutually orthogonal and together form a complete basis, since there are only two choices. Given these sets, the computer can run Schrödinger’s equation back to prediction time to obtain the sets \{\psi_{ABi}\}=e^{i H t}\{\psi_{AB}\} and \{\psi_{Bi}\}=e^{i H t}\{\psi_B\}, respectively, where t is the time between prediction and decision and \hbar=1. These are also mutually orthogonal due to unitarity. At prediction time, it can measure the projection operator

\displaystyle P_{B}=\sum_a |\psi_{Bi}^a\rangle \langle\psi_{Bi}^a|.

The measurement gives 1 (take box B) with some probability p, and 0 (take both boxes) with probability 1-p. This collapses the player’s wavefunction to one of the states in \{\psi_{ABi}\} or \{\psi_{Bi}\}, which then evolves into a state in \{\psi_{AB}\} or \{\psi_B\}. Thus, from the predictor’s perspective, the predictor is always right.

The player models this measurement as the predictor becoming entangled with the player, so that the total wavefunction is something like

\displaystyle \sqrt{p}(\psi_{Bi}\otimes \psi_\text{predictB}) + \sqrt{1-p}(\psi_{ABi}\otimes\psi_\text{predictAB}).

If the player only makes a measurement at decision time, they will collapse the wavefunction to a state in \{\psi_{B}\} with probability p, or a state in \{\psi_{AB}\} with probability 1-p. We assume that this is the measurement basis since the player’s state should not become a superposition of (take B only) and (take both). The expected value is then simply:

\displaystyle E[p] = p B + (1-p)A = A+p(B-A)

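where A=\text{\$1,000} is the payoff for taking both boxes (box B is then empty, since the predictor is always right) and B=\text{\$1,000,000} is the payoff for taking only box B. This is maximized at p=1, so the best decision is to take only box B, just as in the classical case.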
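As a quick numerical illustration of the expected value above, here is a minimal sketch. It assumes each of the two subspaces is collapsed to a single amplitude, so the initial state is described by a hypothetical pair (c_B, c_AB); the function names are made up for this example.

```python
A = 1_000        # payoff when both boxes are taken (box B is empty)
B = 1_000_000    # payoff when only box B is taken (box B is full)


def born_probability(c_B, c_AB):
    """Probability p that measuring P_B gives 1, for amplitudes (c_B, c_AB)."""
    norm = abs(c_B) ** 2 + abs(c_AB) ** 2
    if norm == 0:
        raise ValueError("state vector must be nonzero")
    return abs(c_B) ** 2 / norm


def expected_payoff(p):
    """Expected value p*B + (1-p)*A for a predictor that is always right."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * B + (1 - p) * A


# An equal superposition (with an arbitrary relative phase) gives p close to 1/2.
p = born_probability(1 / 2**0.5, 1j / 2**0.5)
print(p, expected_payoff(p))

# The payoff grows linearly with p, so it is largest at p = 1.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, expected_payoff(p))
```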

Where we go from here depends on the interpretation of quantum mechanics. For many-worlds, there is only unitary evolution. The player ends up in the branch \psi_{B}\otimes \psi_\text{predictB} with probability p, giving the expected value above.

However, for Copenhagen-type interpretations where different observers can use different wavefunctions, the player can do better, since they are free to make any measurements between prediction and decision time, while the predictor assumes unitary evolution¹. In fact, they can make the predictor predict (take B only) with certainty, while they actually take both with certainty. One way is as follows. Assume the player makes the decision based on measuring a qubit at decision time, where |\uparrow\rangle means take B only and |\downarrow\rangle means take both. The state of the qubit oscillates between |\uparrow\rangle and |\downarrow\rangle with period T, where T is the time between prediction and decision. At prediction time, assume the state is |\uparrow\rangle. Under unitary evolution the qubit returns to |\uparrow\rangle after one full period, so the predictor predicts (take B only). At time T/2, the qubit is in the |\downarrow\rangle state, and from then on the player makes repeated measurements very quickly until decision time. The qubit stays in the |\downarrow\rangle state due to the quantum Zeno effect. Thus, at decision time, the player takes both boxes. The extra $1,000 can then contribute to funding the delicate and expensive equipment needed for the qubit.
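To see how quickly the Zeno effect sets in, here is a rough sketch under one specific assumption that the text does not fix: the qubit undergoes a simple two-level oscillation in which the probability of finding it unchanged after a time \tau is \cos^2(\pi\tau/T). The player makes n equally spaced ideal measurements between T/2 and T; the function name `zeno_survival` is hypothetical.

```python
import math


def zeno_survival(n_measurements):
    """Probability that all n measurements between T/2 and T find |down>.

    Assumes a two-level oscillation with period T, so that after a time tau
    the qubit is still in its starting state with probability
    cos^2(pi * tau / T). The measurements are spaced by tau = T / (2 n).
    """
    if n_measurements < 1:
        raise ValueError("need at least one measurement")
    per_step = math.cos(math.pi / (2 * n_measurements)) ** 2
    return per_step ** n_measurements


for n in (1, 10, 100, 1000):
    print(n, zeno_survival(n))
```

With a single measurement at decision time the qubit has rotated back to |\uparrow\rangle, so the survival probability is essentially zero. As n grows it approaches 1, roughly as 1-\pi^2/(4n). This is the probability that every measurement gives |\downarrow\rangle, which is a lower bound on the probability that the final one does.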

We can take this one step further in some cases. For human players, the knowledge of the measurement protocol is classically encoded in the player’s brain in some way. If the supercomputer can decode this information instead of merely running the time evolution, it can also predict which measurements the player makes, and the probabilities of the subsequent results. We then arrive back at the original case, where the best solution is to pick B only. However, such an encoding is not required by the postulates of quantum mechanics: the observer’s decision to make measurements on its state does not necessarily have to be encoded in the state itself.

Real predictor

In the real world, there are no such supercomputers, and no entity would risk $1,000,000 on a meaningless game. The best answer is to take both boxes.


1 In practice, a human’s measurements of their own state occur long after decoherence, so they have no control of their wavefunction in this way. However, if we are assuming all-powerful supercomputers, we may as well go all the way.

Fundamentals of classical mechanics, or why F = ma

Despite its simplicity, classical mechanics is not taught well in the typical physics curriculum. This is unfortunate because the general philosophy of constructing Lagrangians based on symmetries underlies all of modern physics. In this article, I explain basic Lagrangian mechanics in a systematic way starting from fundamental physical principles. It basically follows Landau and Lifshitz Vol. 1 but ties up some loose ends.

Principle of stationary action

Classical mechanics describes the motion of objects modeled as point particles. First, consider a single particle in empty space. At any given time, it has a position \vec x(t) and velocity \vec v(t)=\frac{d\vec{x}}{dt}.

Define a quantity S_{if}\{\vec x(t)\}, called the action, that depends on the path of the particle \vec x(t) from time t_i to t_f. The principle of stationary action, or action principle, states that the path the particle actually takes is one for which the action is stationary, i.e. unchanged to first order, under small perturbations of the path \vec x(t) \rightarrow \vec x(t) + \vec{\delta x}(t).

To elaborate, consider dividing the time interval from t_i to t_f into N segments, and take N\rightarrow \infty in the end. You may think of S_{if} as a function of many variables \{\vec{x}(t_i),t_i,\vec{x}(t_i+\Delta t),t_i+\Delta t,\cdots, \vec{x}(t_f), t_f\}, where \Delta t = (t_f-t_i)/N. (Note that the velocity \vec{v}(t) = \frac{\vec{x}(t+\Delta t)-\vec{x}(t)}{\Delta t}, so it is not an independent variable here.) Such a “function of a function” is called a functional. The principle of stationary action is then \frac{\delta S_{if}}{\delta x_k(t)}=0, i.e. the partial derivative of S_{if} with respect to any component x_k of the position at any time t is zero. The \delta symbol is generally used instead of \partial for functional derivatives.

Finally, the action principle only applies to perturbations that are zero at the boundaries: \vec{\delta x}(t_i) = \vec{\delta x}(t_f) = 0. This will become important later.
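The “function of many variables” picture can be tried out numerically. The sketch below assumes a one-dimensional free particle with Lagrangian \frac{1}{2}mv^2 (the form derived later in this article) and m=1. The helper names `action` and `action_gradient` are hypothetical.

```python
def action(path, dt, m=1.0):
    """Discretized free-particle action: sum of (1/2) m v^2 dt over segments."""
    return sum(0.5 * m * ((b - a) / dt) ** 2 * dt for a, b in zip(path, path[1:]))


def action_gradient(path, dt, eps=1e-6):
    """Central finite-difference dS/dx_k at interior points (endpoints stay fixed)."""
    grad = []
    for k in range(1, len(path) - 1):
        up = path[:k] + [path[k] + eps] + path[k + 1:]
        down = path[:k] + [path[k] - eps] + path[k + 1:]
        grad.append((action(up, dt) - action(down, dt)) / (2 * eps))
    return grad


N = 10
dt = 1.0 / N
straight = [k * dt for k in range(N + 1)]  # constant velocity from x=0 to x=1
bumped = list(straight)
bumped[N // 2] += 0.1                      # perturb a single interior point

# The straight path has (numerically) vanishing gradient; the bumped one does not.
print(max(abs(g) for g in action_gradient(straight, dt)))
print(max(abs(g) for g in action_gradient(bumped, dt)))
print(action(straight, dt), action(bumped, dt))
```

Only interior points are varied, which is the discrete version of requiring the perturbation to vanish at the boundaries.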

The Lagrangian

Consider the action S_{12} for time t_1 to t_2, and the action S_{34} for time t_3 to t_4, with t_1 < t_2 < t_3 < t_4. We require locality in time, meaning that a perturbation in the first interval only affects S_{12} and not S_{34}. Also, we assume additivity of the action: S_{12}+S_{23}=S_{13}. These conditions imply that S_{12} can be written as an integral from t_1 to t_2 of some quantity: S_{12}=\int_{t_1}^{t_2} dt\, \mathcal{L}(\vec{x}(t),\vec{v}(t), t). The quantity \mathcal{L}(\vec{x}(t),\vec{v}(t), t) is known as the Lagrangian. In general, it may depend on the position and velocity at time t, as well as the time t itself¹.

Note that we may add a total time derivative \frac{df}{dt}(\vec{x},t) to the Lagrangian without affecting the principle of stationary action. Such a term produces the action:

\displaystyle\int_{t_i}^{t_f} dt\frac{df}{dt}(\vec{x},t) = f(\vec{x}(t_f), t_f)-f(\vec{x}(t_i), t_i)

by the fundamental theorem of calculus. The perturbation \vec{\delta x}(t) is zero at the boundaries by definition, so it does not change this boundary term, and the stationary path is unaffected.

Let us now derive the form of the Lagrangian based on some other fundamental principles:

Homogeneity of space and time. No point in space or time is any different from any other, so the Lagrangian cannot depend on \vec{x} or t explicitly.

Isotropy of space. No direction in space is different from any other, so the Lagrangian can only depend on the magnitude (squared) of the velocity \vec{v}(t)^2.

Galilean invariance. The theory should be invariant under shifts by a constant velocity, \vec{x}\rightarrow \vec{x}+\vec{v}_0 t. In other words, there is no universal stationary frame of reference. Taking the time derivative, this is \vec{v}\rightarrow \vec{v}+\vec{v}_0. To first order in \vec{v}_0, the Lagrangian changes as

\displaystyle\mathcal{L}(\vec{v}^2)\rightarrow \mathcal{L}(\vec{v}^2+2\vec{v}\cdot \vec{v}_0) \approx \mathcal{L}(\vec{v}^2)+2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2)\, \vec{v}\cdot \vec{v}_0

The term 2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2)\, \vec{v}\cdot \vec{v}_0 will not affect the physics if it is a total time derivative of the form above. Since f depends only on \vec{x} and t, \frac{df}{dt}=\frac{\partial f}{\partial t}+\vec{v}\cdot\nabla f is at most linear in \vec{v}, so this only occurs if \frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) is a constant. Call this constant \frac{1}{2} m. Thus, the Lagrangian for a single particle in free space is: \mathcal{L} = \frac{1}{2} m \vec{v}^2. The constant m is, of course, the mass.
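As a consistency check, the boost can be worked out exactly rather than only to first order, assuming \vec{v}_0 is constant in time. The free-particle Lagrangian then changes by a total time derivative, so the physics is unchanged for finite \vec{v}_0 as well:

```latex
\mathcal{L}' = \frac{1}{2} m (\vec{v}+\vec{v}_0)^2
             = \frac{1}{2} m \vec{v}^2 + m\,\vec{v}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2
             = \mathcal{L} + \frac{d}{dt}\left( m\,\vec{x}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2\, t \right)
```

Here f(\vec{x},t) = m\,\vec{x}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2\, t plays the role of the function f in the total-derivative argument above.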

To summarize, we derived the unique action and Lagrangian (up to a total time derivative) for a single particle from the following postulates:

  1. Locality in time
  2. Additivity of the action
  3. Homogeneity of space and time
  4. Isotropy of space
  5. Galilean invariance

Multiple particles

Now consider the n-particle case. The Lagrangian may generally depend on all the positions and velocities \vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n. Following the postulates above, it must take the form²:

\displaystyle \mathcal{L} = \left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

where the function U(\Delta \vec{x}_{ij}) depends on all the separations between the particles \{\Delta\vec{x}_{12} = \vec{x}_1-\vec{x}_2, \Delta\vec{x}_{13} =\vec{x}_1-\vec{x}_3, \cdots\}.

Euler-Lagrange equations

Let us now apply the principle of stationary action to the action:

\displaystyle S=\int dt\left[\left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})\right]

Plugging in the variation \vec{x}_i\rightarrow \vec{x}_i+\vec{\delta x}_i for particle i, and expanding to first order in \vec{\delta x}_i, we get:

\displaystyle S\rightarrow S+ \int dt\left(m_i \vec{v}_i\cdot \vec{\delta v}_i - \nabla_i U \cdot \vec{\delta x}_i\right)

where \nabla_i U is the gradient of U with respect to \vec{x}_i. Using \vec{\delta v}_i=\frac{d}{dt}\vec{\delta x}_i, we can integrate the first term by parts, discarding the boundary term m_i \vec{v}_i\cdot \vec{\delta x}_i since \vec{\delta x}_i= 0 at the boundaries. We obtain:

\displaystyle \frac{\delta S}{\delta \vec{x}_i(t)}=-m_i \vec{a}_i(t)-\nabla_i U(t) = 0

where \vec{a}_i = \frac{d\vec{v}_i}{dt}. The equations obtained using the action principle are known as Euler-Lagrange equations or equations of motion. In this case, we have found Newton’s law for a conservative potential:

\displaystyle \vec{F}_i = -\nabla_i U=m_i \vec{a}_i
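The same result can be checked numerically. If the action is discretized as \sum_k \Delta t\,[\frac{1}{2}m((x_{k+1}-x_k)/\Delta t)^2-U(x_k)], setting its derivative with respect to each interior x_k to zero gives the recurrence m(x_{k+1}-2x_k+x_{k-1})/\Delta t^2=-U'(x_k), a discrete form of F=ma (the Störmer-Verlet scheme). The sketch below applies it to a one-dimensional harmonic oscillator; the potential, the parameters and the function name `integrate` are assumptions chosen for illustration.

```python
import math


def integrate(grad_U, x0, v0, dt, n_steps, m=1.0):
    """Step the discrete Euler-Lagrange recurrence n_steps times; return x(n_steps*dt)."""
    if n_steps < 1:
        raise ValueError("need at least one step")
    x_prev = x0
    # First step from a second-order Taylor expansion, using a = -grad_U / m.
    x = x0 + v0 * dt - 0.5 * dt**2 * grad_U(x0) / m
    for _ in range(n_steps - 1):
        # m (x_next - 2 x + x_prev) / dt^2 = -U'(x)
        x_next = 2 * x - x_prev - dt**2 * grad_U(x) / m
        x_prev, x = x, x_next
    return x


# Harmonic oscillator U(x) = x^2 / 2 with m = 1: exact solution is cos(t)
# for x(0) = 1, v(0) = 0.
def grad_U(x):
    return x


x_numeric = integrate(grad_U, x0=1.0, v0=0.0, dt=0.001, n_steps=1000)
print(x_numeric, math.cos(1.0))
```

The two printed numbers should agree closely, with a discrepancy that shrinks as dt^2.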

Beyond classical mechanics

Finally, it is interesting to see how the postulates above are modified in quantum and relativistic theories.

  1. Principle of stationary action. In quantum physics, the particle takes all paths instead of only the classical one! The quantum amplitude is given by summing up e^{i S\{x\}/\hbar} over all paths. This is known as a path integral.
  2. Locality in time gets promoted to locality in space and time in field theory.
  3. Additivity of the action remains the same.
  4. Homogeneity of space and time remains the same.
  5. Isotropy of space remains the same.
  6. Galilean invariance is promoted to Lorentz invariance in relativity. Lorentz transformations relate space and time.

In modern theories, there are often additional symmetry principles that constrain the Lagrangian, such as gauge invariance and conformal invariance.


1 It also cannot depend on higher time derivatives due to the Ostrogradsky instability.

2 A term like \vec{v}_i\cdot \vec{v}_j with i \neq j is possible, but would imply that particles infinitely far away can affect each other, violating common sense (or, if you like, the cluster decomposition principle).