All posts by pekingo

Solving Newcomb’s paradox for classical and quantum predictors

A recent HN post reminded me of Newcomb’s paradox, which goes as follows (from Wiki):

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:

  • Box A is clear, and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The question is whether the player should take both boxes, or only box B.

I first saw this problem many years ago but didn’t have a strong opinion. Now it seems clear that the controversy is about the definition of “reliable predictor”. This is usually left vague, leading to many unreliable philosophical and game-theory arguments. As usual, I will try to solve the problem using physics. Interestingly, the analysis is different for a classical versus quantum predictor, and also depends on the interpretation of quantum mechanics.

Classical predictor

Assume the predictor is a classical supercomputer that, at prediction time, records the state of the player and of all the objects they will interact with before the decision. Call this state S_i. By running the physics forward, it arrives at either a state S_{AB} or S_B, corresponding to the decision to take both boxes or only box B, respectively. Since the prediction is then guaranteed to be correct, one should obviously take only box B.

Quantum predictor

In the quantum case, the initial wavefunction of the player/etc is \psi_i. The computer cannot measure the wavefunction directly due to the no-cloning theorem. Instead, one way to make the prediction is as follows. The decision to take both boxes corresponds to a set of orthonormal states \{\psi_{AB}\}, and likewise for \{\psi_B\}. The two sets are mutually orthogonal and together form a complete basis, since there are only two choices. Given these sets, the computer can run Schrödinger’s equation back to prediction time to obtain the sets \{\psi_{ABi}\}=e^{i H t}\{\psi_{AB}\} and \{\psi_{Bi}\}=e^{i H t}\{\psi_B\}, respectively. These remain orthonormal and mutually orthogonal due to unitarity. At prediction time, it can measure the projection operator

\displaystyle P_{B}=\sum_a |\psi_{Bi}^a\rangle \langle\psi_{Bi}^a|.

The measurement gives 1 (take box B) with some probability p, and 0 (take both boxes) with probability 1-p. This collapses the player’s wavefunction to one of the states in \{\psi_{ABi}\} or \{\psi_{Bi}\}, which then evolves into a state in \{\psi_{AB}\} or \{\psi_B\}. Thus, from the predictor’s perspective, the predictor is always right.
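
As a toy illustration of this measurement (my own sketch; the 4-dimensional Hilbert space and its split into two 2-dimensional subspaces are arbitrary choices for the example), here is the Born-rule measurement of P_B and the resulting collapse in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the player's Hilbert space: 4 dimensions, where the
# first two basis vectors span {psi_Bi} (take B only) and the last two
# span {psi_ABi} (take both).
dim = 4
P_B = np.diag([1.0, 1.0, 0.0, 0.0])   # projector onto span{psi_Bi}

# An arbitrary initial player state psi_i
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

# Born rule: p = <psi_i| P_B |psi_i>
p = np.real(psi.conj() @ P_B @ psi)

# The measurement returns 1 with probability p and collapses the state
outcome = rng.random() < p
proj = P_B if outcome else np.eye(dim) - P_B
psi_post = proj @ psi
psi_post /= np.linalg.norm(psi_post)

# The collapsed state lies entirely in the measured subspace, so its
# subsequent unitary evolution confirms the prediction: from the
# predictor's perspective, it is always right.
print(outcome, np.real(psi_post.conj() @ P_B @ psi_post))
```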

The player models this measurement as the predictor becoming entangled with the player, so that the total wavefunction is something like

\displaystyle \sqrt{p}(\psi_{Bi}\otimes \psi_\text{predictB}) + \sqrt{1-p}(\psi_{ABi}\otimes\psi_\text{predictAB}).

If the player only makes a measurement at decision time, they will collapse the wavefunction to a state in \{\psi_{B}\} with probability p, or a state in \{\psi_{AB}\} with probability 1-p. We assume that this is the measurement basis since the player’s state should not become a superposition of (take B only) and (take both). The expected value is then simply:

\displaystyle E[p] = p B + (1-p)A = A+p(B-A)

where A=\text{\$1,000}, B=\text{\$1,000,000}. This is maximized at p=1, so the best decision is to take only box B, just as in the classical case.

Where we go from here depends on the interpretation of quantum mechanics. For many-worlds, there is only unitary evolution. The player ends up in the branch \psi_{B}\otimes \psi_\text{predictB} with probability p, giving the expected value above.

However, for Copenhagen-type interpretations where different observers can use different wavefunctions, the player can do better, since they are free to make any measurements between prediction and decision time, while the predictor assumes unitary evolution1. In fact, they can make the predictor predict (take B only) with certainty, while actually taking both with certainty. One way is as follows. Assume the player makes the decision based on measuring a qubit at decision time, where |\uparrow\rangle means take B only and |\downarrow\rangle means take both. The state of the qubit oscillates between |\uparrow\rangle and |\downarrow\rangle with period T, where T is the time between prediction and decision. At prediction time, the state is |\uparrow\rangle, so the predictor predicts (take B only). At time T/2, the qubit has evolved to |\downarrow\rangle; from then on, the player makes repeated measurements in quick succession until decision time. The qubit stays in the |\downarrow\rangle state due to the quantum Zeno effect, so at decision time, the player takes both boxes. The extra $1,000 can then contribute to funding the delicate and expensive equipment needed for the qubit.
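
The Zeno strategy is easy to check numerically. In this sketch (the period and measurement schedule are illustrative assumptions), a measurement after a delay dt finds the qubit still in |\downarrow\rangle with probability \cos^2(\pi\, dt/T), so rapid repetition pins the state:

```python
import math

# Toy check of the Zeno strategy. The qubit rotates |up> <-> |down>
# with period T, so a measurement after a delay dt finds it still in
# |down> with probability cos^2(pi * dt / T).
T = 1.0

def survival_probability(n):
    """Probability of finding the qubit in |down> at all n equally
    spaced measurements over the interval [T/2, T]."""
    dt = (T / 2) / n
    p_stay = math.cos(math.pi * dt / T) ** 2
    return p_stay ** n

# A single measurement at decision time: the qubit has rotated back
# toward |up>, so survival is ~0. Many rapid measurements: survival
# approaches 1 (quantum Zeno effect).
for n in (1, 10, 100, 1000):
    print(n, survival_probability(n))
```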

We can take this one step further in some cases. For human players, the knowledge of the measurement protocol is classically encoded in the player’s brain in some way. If the supercomputer can decode this information instead of merely running the time evolution, it can also predict which measurements the player makes, along with the probabilities of the subsequent results. We arrive back at the original case, where the best solution is to pick B only. Note that this is not required by the postulates of quantum mechanics: the observer’s decision to make measurements on its state does not have to be encoded in that state itself.

Real predictor

In the real world, there are no such supercomputers, and no entity would risk $1,000,000 on a meaningless game. The best answer is to take both boxes.


1 In practice, a human’s measurements of their own state occur long after decoherence, so they have no control of their wavefunction in this way. However, if we are assuming all-powerful supercomputers, we may as well go all the way.

Fundamentals of classical mechanics, or why F = ma

Despite its simplicity, classical mechanics is not taught well in the typical physics curriculum. This is unfortunate because the general philosophy of constructing Lagrangians based on symmetries underlies all of modern physics. In this article, I explain basic Lagrangian mechanics in a systematic way starting from fundamental physical principles. It basically follows Landau and Lifshitz Vol. 1 but ties up some loose ends.

Principle of stationary action

Classical mechanics describes the motion of objects modeled as point particles. First, consider a single particle in empty space. At any given time, it has a position \vec x(t) and velocity \vec v(t)=\frac{d\vec{x}}{dt}.

Define a quantity S_{if}\{\vec x(t)\} that depends on the path of the particle \vec x(t) from time t_i to t_f. The principle of stationary action, or action principle, states that the path the particle actually takes is one where the action is stable to small perturbations in the path \vec x(t) \rightarrow \vec x(t) + \vec{\delta x}(t).

To elaborate, consider dividing the time interval from t_i to t_f into N segments, and take N\rightarrow \infty in the end. You may think of S_{if} as a function of many variables \{\vec{x}(t_i),t_i,\vec{x}(t_i+\Delta t),t_i+\Delta t,\cdots, \vec{x}(t_f), t_f\}, where \Delta t = (t_f-t_i)/N. (Note that the velocity \vec{v}(t) = \frac{\vec{x}(t+\Delta t)-\vec{x}(t)}{\Delta t}, so it is not an independent variable here.) Such a “function of a function” is called a functional. The principle of stationary action is then \frac{\delta S_{if}}{\delta x_j(t)}=0, i.e. the partial derivative of S_{if} with respect to any component x_j of the position at any time t is zero. The \delta symbol is generally used instead of \partial for functional derivatives.

Finally, the action principle only applies to perturbations that are zero at the boundaries: \vec{\delta x}(t_i) = \vec{\delta x}(t_f) = 0. This will become important later.

The Lagrangian

Consider the action S_{12} for time t_1 to t_2, and the action S_{34} for time t_3 to t_4, with t_1 < t_2 < t_3 < t_4. We require locality in time, meaning that a perturbation in the first interval only affects S_{12} and not S_{34}. Also, we assume additivity of the action: S_{12}+S_{23}=S_{13}. These conditions imply that S_{12} can be written as an integral from t_1 to t_2 of some quantity: S_{12}=\int_{t_1}^{t_2} \mathcal{L}(\vec{x}(t),\vec{v}(t), t)\,dt. The integrand \mathcal{L}(\vec{x}(t),\vec{v}(t), t) is known as the Lagrangian. In general, it may depend on the position and velocity at time t, as well as the time t itself1.

Note that we may add a total time derivative \frac{df}{dt}(\vec{x},t) to the Lagrangian without affecting the principle of stationary action. Such a term produces the action:

\displaystyle\int_{t_i}^{t_f} dt\frac{df}{dt}(\vec{x},t) = f(\vec{x}(t_f), t_f)-f(\vec{x}(t_i), t_i)

by the fundamental theorem of calculus. The perturbation \vec{\delta x}(t) is zero at the boundaries by definition, so does not affect this action.

Let us now derive the form of the Lagrangian based on some other fundamental principles:

Homogeneity of space and time. No point in space or time is any different from any other, so the Lagrangian cannot depend on \vec{x} or t explicitly.

Isotropy of space. No direction in space is different from any other, so the Lagrangian can only depend on the magnitude (squared) of the velocity \vec{v}(t)^2.

Galilean invariance. The theory should be invariant under shifts by a constant velocity, \vec{x}\rightarrow \vec{x}+\vec{v}_0 t. In other words, there is no universal stationary frame of reference. Taking the time derivative, this is \vec{v}\rightarrow \vec{v}+\vec{v}_0. To first order in \vec{v}_0, the Lagrangian changes as

\displaystyle\mathcal{L}(\vec{v}^2)\rightarrow \mathcal{L}(\vec{v}^2+2\vec{v}\cdot \vec{v}_0) = \mathcal{L}(\vec{v}^2)+2\frac{\partial \mathcal{L}}{\partial \vec{v}^2} \vec{v}\cdot \vec{v}_0

The term 2\frac{\partial \mathcal{L}}{\partial \vec{v}^2} \vec{v}\cdot \vec{v}_0 will not affect the physics if it is a total time derivative of the form above. This only occurs if \frac{\partial \mathcal{L}}{\partial \vec{v}^2} is a constant. Call this constant \frac{1}{2} m. Thus, the Lagrangian for a single particle in free space is: \mathcal{L} = \frac{1}{2} m \vec{v}^2. The constant m is, of course, the mass.
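
To make the total-derivative step explicit: when the coefficient equals the constant \frac{1}{2}m, the first-order change in the Lagrangian is

\displaystyle \delta\mathcal{L} = 2\cdot\frac{1}{2} m\, \vec{v}\cdot\vec{v}_0 = \frac{d}{dt}\left(m\,\vec{x}\cdot\vec{v}_0\right)

which is a total time derivative with f(\vec{x},t)=m\,\vec{x}\cdot\vec{v}_0, exactly the form shown above. A velocity-dependent coefficient would make \delta\mathcal{L} depend on \vec{v}, and no f(\vec{x},t) could reproduce it.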

To summarize, we derived the unique action and Lagrangian (up to a total time derivative) for a single particle from the following postulates:

  1. Locality in time
  2. Additivity of the action
  3. Homogeneity of space and time
  4. Isotropy of space
  5. Galilean invariance

Multiple particles

Now consider the n-particle case. The Lagrangian may generally depend on all the positions and velocities \vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n. Following the postulates above, it must take the form2:

\displaystyle \mathcal{L} = \left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

where the function U(\Delta \vec{x}_{ij}) depends on all the separations between the particles \{\Delta\vec{x}_{12} = \vec{x}_1-\vec{x}_2, \Delta\vec{x}_{13} =\vec{x}_1-\vec{x}_3, \cdots\}.

Euler-Lagrange equations

Let us now apply the principle of stationary action to the action:

\displaystyle S=\int dt\left[\left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})\right]

Plugging in the variation \vec{x}_i\rightarrow \vec{x}_i+\vec{\delta x}_i for particle i, and expanding to first order in \vec{\delta x}_i, we get:

\displaystyle S\rightarrow S+ \int dt\left(m_i \vec{v}_i\cdot \vec{\delta v}_i - \nabla_i U \cdot \vec{\delta x}_i\right)

where \nabla_i U is the gradient of U with respect to \vec{x}_i. Using \vec{\delta v}=\frac{d}{dt}\vec{\delta x}, we can integrate the first term by parts, discarding the boundary term m_i \vec{v}_i\cdot \vec{\delta x}_i since \vec{\delta x}_i= 0 at the boundaries. We obtain:

\displaystyle \frac{\delta S}{\delta \vec{x}_i(t)}=-m_i \vec{a}_i(t)-\nabla_i U(t) = 0

where \vec{a} = \frac{d\vec{v}}{dt}. The equations obtained using the action principle are known as Euler-Lagrange equations or equations of motion. In this case, we have found Newton’s law for a conservative potential:

\displaystyle \vec{F} = -\nabla_i U=m_i \vec{a}_i
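
The Euler-Lagrange step can be sanity-checked numerically. Below is a toy discretization (my own example; the harmonic potential U=\frac{1}{2}kx^2 and the grid are arbitrary choices) showing that the action is stationary on the true trajectory: a perturbation vanishing at the endpoints changes S only at second order in its amplitude.

```python
import numpy as np

# Sanity check of the action principle for U(x) = k x^2 / 2 in 1D.
# The Euler-Lagrange equation m a = -k x has solution x(t) = cos(w t).
m, k = 1.0, 1.0
w = np.sqrt(k / m)
t = np.linspace(0.0, np.pi / 2, 2000)   # a quarter period
dt = t[1] - t[0]

def action(x):
    v = np.diff(x) / dt                  # discrete velocity
    return np.sum(0.5 * m * v**2 - 0.5 * k * x[:-1]**2) * dt

x_true = np.cos(w * t)

# A perturbation that vanishes at the endpoints, as the principle requires
bump = np.sin(np.pi * (t - t[0]) / (t[-1] - t[0]))

# The action is stationary at x_true: the change is quadratic in eps,
# so shrinking eps by 10x shrinks the change by ~100x.
S0 = action(x_true)
d1 = action(x_true + 1e-2 * bump) - S0
d2 = action(x_true + 1e-3 * bump) - S0
print(d1, d2, d1 / d2)
```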

Beyond classical mechanics

Finally, it is interesting to see how the postulates above are modified in quantum and relativistic theories.

  1. Principle of stationary action. In quantum physics, the particle takes all paths instead of only the classical one! The quantum amplitude is given by summing up e^{i S\{x\}} over all paths. This is known as a path integral.
  2. Locality in time gets promoted to locality in space and time in field theory.
  3. Additivity of the action remains the same.
  4. Homogeneity of space and time remains the same.
  5. Isotropy of space remains the same.
  6. Galilean invariance is promoted to Lorentz invariance in relativity. Lorentz transformations relate space and time.

In modern theories, there are often additional symmetry principles that constrain the Lagrangian, such as gauge invariance and conformal invariance.


1 It also cannot depend on higher time derivatives due to the Ostrogradsky instability.

2 A term like \vec{v}_i\cdot \vec{v}_j with i \neq j is possible, but would imply that particles infinitely far away can affect each other, violating common sense (or, if you like, the cluster decomposition principle).

Physics textbooks for self-study

Here are some physics textbooks that I’ve read over the years. Each textbook is rated from 1 to 5 Diracs on quality for self-study. Most topics are divided into (basic) and (advanced).

Figure 1. Areas of physics (biased toward high-energy theory). Special relativity and electromagnetism can be learned separately but complement each other. “Weak prerequisites” are math subjects that can usually be learned as you go along.

Tips for self-study:

  • Shorter is better when it comes to textbooks. The problem with self-study is missing the forest for the trees. Most textbooks can give you the details, but there is no one to explain how to fit the information in your head in a compact and memorable way. Shorter books are usually better for this. The flip side is that shorter books are harder to understand if you have no past exposure. Start by reading parts of a standard textbook to get the basics, then return to the shorter one.
  • Do enough exercises. But don’t feel the need to do every single one before moving on, even if you are a little confused. It can be more efficient to just keep going, since physics is interconnected and the new material often clarifies the old.
  • Write notes in the margins about any confusing steps in derivations or errata you discover. These will undoubtedly help when you revisit the book years later.

Personal (controversial) opinions:

  • Avoid mathematical physics-oriented books. When I started out, I thought more rigor could never hurt. But if you are interested in physics, learn physics. Math-oriented books often dwell on excessive formalism that is, at the end of the day, irrelevant for physics.
  • Amazon ratings are useless. Unless they’re really terrible, most books will have very good ratings. I suspect most reviewers used the book for a class, are already experts on the subject, or simply want to look smart. 🙃

Quantum mechanics (basic)

Griffiths, Introduction to Quantum Mechanics (5 Diracs)

I start by contradicting my own advice about shorter books. 😀  This is a long but very readable book that is even worth reading from cover to cover. There is a reason this is the standard textbook in many places. One tends to forget how much it covers: statistical mechanics, spontaneous and stimulated emission, band structure, WKB approximation… Not in great detail, but often enough.

Quantum mechanics (advanced)

Weinberg, Lectures on Quantum Mechanics (4 Diracs)

Weinberg’s books are known for their slow and systematic presentation. If you’re in a rush, my recommendation is to just read chapters 3 and 4, which contain the essentials of quantum mechanics and spin and are relatively self-contained.

Linear algebra (basic)

Strang, Introduction to Linear Algebra (5 Diracs)

Actually, I suggest the lectures instead of the book. One relaxing 45-minute lecture a day and you’ll know linear algebra in a month.

Classical mechanics (advanced)

Landau and Lifshitz, Mechanics (5 Diracs)

The Russian school excels at explaining things deeply and simply. The first two chapters contain the best exposition of classical mechanics there is. In my experience, even professional physicists are often confused by some foundational topics that are explained here. (For example, where does the Lagrangian \frac{1}{2}mv^2 come from? Answer: Homogeneity+isotropy of space, and Galilean invariance. Together with the principle of stationary action, this leads to F=ma.) If you’ve never seen a Lagrangian before, start with one of the numerous intros, like this one.

Special relativity (SR)/Electromagnetism (advanced)

Landau and Lifshitz, The Classical Theory of Fields (5 Diracs)

Amusingly, this does not actually cover the simplest classical field theories (scalar fields) since the only relevant classical fields in practice are the electromagnetic and gravitational. Chapters 1-4 are an excellent exposition of SR and how E&M fits into it, while chapters 10-12 are a decent introduction to general relativity that complements other texts.

General relativity (GR)

Dirac, General Theory of Relativity (5 Diracs)

Who said GR is hard to understand? This pamphlet by the big man himself weighs in at only 69 pages. Unlike most books, it explains curved spacetime as a surface embedded in a higher dimensional space with flat metric. In my view, this is the most intuitive way to understand it. Among other things, it leads to the covariant derivative as the projection of the directional derivative onto the tangent space, a very pleasing interpretation of an otherwise confusing concept.

No exercises though. So as an introduction, you will want:

Zee, Einstein Gravity in a Nutshell (5 Diracs)

This is the book I wish I had when starting GR. Zee is one of the most gifted physics expositors of our time. Unfortunately, it is rather long, so I would recommend first reading enough of it to understand Dirac, then returning to it for special topics.

Carroll, Spacetime and Geometry: An Introduction to General Relativity (2 Diracs)

This was my first exposure to GR. I got through about chapter 3 before getting confused and stopping. This is one of those mathematical physics books I mentioned above, with a lot of formalism surrounding manifolds, tensors, and differential forms at the outset. It is good to know eventually, but not what you need as an introduction. I suppose it would make a good reference, but Zee’s book also serves well in this regard.

Quantum field theory

The subjects above are all well-established and have a fairly defined “core”. On the other hand, QFT is an evolving field with a sprawling mess of important results. Each textbook emphasizes different aspects, so you will need multiple books.

Zee, Quantum Field Theory in a Nutshell (5 Diracs)

This was my first and favorite QFT book. Other textbooks have more detail, but none will make you fall in love with the subject like this one. Just get it and enjoy the magic of the path integral.

Schwartz, Quantum Field Theory and the Standard Model (4 Diracs)

This is a very thorough textbook, perhaps the modern successor to the classic Peskin and Schroeder. I particularly enjoyed the bottom-up construction of spin 1 and 2 Lagrangians in chapter 8. One criticism is that many calculations are rather clunky and involved. For example, scalar QED is heavily used, which is conceptually simpler but involves more diagrams than spinor QED. I prefer Zee’s approach of just starting with spinor QED.

(Also, his notation with all indices on the same level bugs me…)

Srednicki, Quantum Field Theory

No rating for this one since I haven’t read it in much detail. The first chapter (“Attempts at relativistic quantum mechanics”) is an excellent motivation for QFT. The chapters are short and to the point. If I could start over, I would probably read this one concurrently with Zee.

Group theory

Zee, Group Theory in a Nutshell for Physicists (5 Diracs)

For those like me who get bored to death reading pure math textbooks, Zee’s usual colloquial style makes even classifying representations of finite groups exciting. Not absolutely necessary if you’re in a hurry to learn more physics, but still a joy.


Advanced resources

Once you have a grasp of the areas above, additional topics can be learned without having to rearrange your entire worldview (with the possible exception of string theory). Here are some of my favorite advanced resources.

Shifman, Advanced Topics in Quantum Field Theory

Despite the title, this book focuses on simple explanations of modern topics without arduous derivations. Some interesting results cannot be found elsewhere, e.g. that domain walls antigravitate!

Terning, Modern Supersymmetry: Dynamics and Duality

This is a compact volume on supersymmetric field theory. The first three chapters are quite good, but I found some explanations in later chapters hard to understand. A better intro to Young tableaux is found here.

Polchinski, String Theory Vols. 1 and 2

This labor of love by the father of D-branes himself covers pre-AdS/CFT string theory. It seems to be the standard textbook on the subject, for good reason. The explanations are clear and the text contains many invaluable exercises. His passion for the topic is evident throughout.

Hartman, Lecture notes on quantum gravity and black holes

Not a textbook, but a good set of lecture notes by Tom Hartman. Explores many contemporary topics that have yet to make it into any textbooks I know of. Many useful exercises are included.

Notes on gravity as a gauge theory

Gravity has often been called a gauge theory of the Poincaré or Lorentz group. Here, I develop general relativity in direct analogy to Yang-Mills theory, avoiding geometry entirely1. None of this is original, but I have tried to simplify the presentation compared to the literature, where the similarities and differences between the two theories are often unclear.

Gauge fields and field strengths

The Poincaré algebra is:

[P_a, P_b] = 0

[P_a, M_{bc}]=\eta_{ab}P_c - \eta_{ac} P_b

[M_{ab}, M_{cd}] = \eta_{ad}M_{bc}+\eta_{bc}M_{ad} - \eta_{bd}M_{ac}-\eta_{ac}M_{bd}

We proceed just as in Yang-Mills theory, taking the Poincaré group as the gauge group. It has 10 generators: 4 translations P_a and 6 rotations/boosts M_{ab}. Roman letters a,b, \cdots are gauge indices, while Greek letters \mu, \nu, \cdots are coordinate indices. We use the “mathematician’s convention” for the generators, where the i is absorbed: T_{math}=i T_{physics}.

Introduce the covariant derivative:

\displaystyle D_\mu = \partial_\mu - e_\mu^a P_a - \frac{1}{2}\omega^{ab}_\mu M_{ab}

where e_\mu^a (the vielbein) and \omega^{ab}_\mu (the spin connection) are the gauge fields associated with translations and rotations, respectively. Note the mass dimensions: P_a has dimension 1, so e_\mu^a is dimensionless, while M_{ab} is dimensionless, so \omega^{ab}_\mu has dimension 1. We can take \omega^{ab}_\mu to be antisymmetric in ab since M_{ab} is antisymmetric. The field strengths are found in the usual way:

\begin{aligned} F_{\mu\nu}&=D_\mu D_\nu - D_\nu D_\mu \\ &= -C^a_{\mu\nu}P_a -\frac{1}{2}R^{ab}_{\mu\nu}M_{ab} \end{aligned}

where we have defined the field strengths C^a_{\mu\nu} (the torsion) and R^{ab}_{\mu\nu} (the curvature tensor).
We obtain:

C^a_{\mu\nu}=\partial_\mu e^a_\nu - \partial_\nu e^a_\mu - \omega^{a}_{\mu b} e^b_\nu + \omega^{a}_{\nu b} e^b_\mu

R^{ab}_{\mu\nu}=\partial_\mu \omega_\nu^{ab} - \partial_\nu \omega_\mu^{ab}-\omega_\mu^{ac}\omega_{\nu c}^{\;\;\;\;b} + \omega_\nu^{ac}\omega_{\mu c}^{\;\;\;\;b}

As usual, we raise and lower indices using \eta_{ab} and \eta^{ab}.

General relativity is obtained by setting the torsion C^a_{\mu\nu}=0. Theories with torsion have certainly been considered extensively, but we will not pursue them here. Experimental data have not ruled out theories involving both torsion and curvature; however, the bottom-up construction of the Lagrangian of an interacting massless spin-2 particle produces general relativity2.

This constraint allows us to solve for the spin connection in terms of the vielbein. After some calculation (e.g. listing out all possible terms and matching coefficients), the answer is:

\displaystyle \omega_\mu^{ab}=\frac{1}{2}(e^{\rho b}\partial_\mu e_\rho^a-e^{\rho a}\partial_\mu e_\rho^b+ e^{\rho a} e^{\sigma b} \partial_\rho g_{\mu\sigma}-e^{\rho b}e^{\sigma a}\partial_\rho g_{\mu\sigma} )

where g_{\mu\nu}=e_\mu^a \eta_{ab} e_\nu^b.

Representations and Lagrangians

Just as in Yang-Mills theory, the Poincaré group here acts as an internal symmetry group. Fields transform as a finite-dimensional representation of the Lorentz algebra, and transform trivially under translations3: P_a=0. This has an important consequence for constructing Lagrangians. Recall that the gauge field A_\mu(x)=A^a_\mu(x) T^a in Yang-Mills theory transforms as

A_\mu\rightarrow U A_\mu U^{-1} + (\partial_\mu U) U^{-1}

under a gauge transformation U(x). The (\partial_\mu U) U^{-1} term is required to cancel the (\partial_\mu U)\phi(x) term in the transformation of \partial_\mu\phi(x). However, since P_a=0, no such term is needed here: e^a_\mu is already gauge-covariant and can be placed directly in the Lagrangian.

The simplest term is:

\displaystyle \mathcal{S}_\Lambda = \frac{\Lambda}{4!} \int \epsilon_{abcd}e^a e^b e^c e^d

where e^a=e^a_\mu dx^\mu is a 1-form and \epsilon_{abcd} is the totally antisymmetric symbol4. This is the cosmological constant. It is equivalent to the standard form \Lambda\int d^4 x \sqrt{-g}.

On the other hand, the spin connection \omega^{ab}_\mu does show up in the gauge transformation, so we must use the field strength R^{ab}_{\mu\nu} in the Lagrangian. The next simplest term is then:

\displaystyle \mathcal{S}_{EH} = \frac{M_{Pl}^2}{3}\int \epsilon_{abcd} e^a e^b R^{cd}

where R^{cd}=R^{cd}_{\mu\nu}dx^\mu dx^\nu is a 2-form. This is the Einstein-Hilbert action. Unlike Yang-Mills, we are permitted a term that is only linear in the field strength R^{cd}.

Coupling to matter fields

Flat-space Lagrangians contain terms with global Lorentz indices, such as \partial_\mu \varphi and A_\mu. We would like these to transform under the local Lorentz group with indices a, b, \cdots. The only object that can switch between global and local indices is e_\mu^a, or its inverse, e^\mu_a. Thus, the general prescription for coupling a flat-space Lagrangian to gravity is:

  1. Contract all tensors with e_\mu^a or e^\mu_a.
  2. Make flat-space invariants use local indices: \eta_{\mu\nu}\rightarrow \eta_{ab}, \epsilon_{\mu\nu\rho\sigma}\rightarrow \epsilon_{abcd}.
  3. Use covariant derivatives: \partial_\mu\rightarrow \partial_\mu-\frac{1}{2}\omega_\mu^{ab} M_{ab}.

Note that this even works on the volume form d^4 x, producing the familiar invariant measure d^4 x \sqrt{-g}:

\displaystyle d^4 x = \frac{1}{4!}\epsilon_{\mu\nu\rho\sigma} dx^\mu dx^\nu dx^\rho dx^\sigma \rightarrow \frac{1}{4!} \epsilon_{abcd} e^a_\mu e^b_\nu e^c_\rho e^d_\sigma dx^\mu dx^\nu dx^\rho dx^\sigma

For example, a scalar field coupled to gravity has the action:

\displaystyle \mathcal{S} = \frac{1}{2\cdot 4!}\int \epsilon_{bcdf} e^b e^c e^d e^f (e_a^\mu e^{\nu a} \partial_\mu\varphi\partial_\nu\varphi - m^2 \varphi^2)

An advantage of the vielbein formalism is that spinors can be coupled to gravity. For Dirac spinors, the Dirac matrices should also be converted to local indices \gamma^\mu\rightarrow \gamma^a, since they satisfy the Clifford algebra \{\gamma^a,\gamma^b\}=2\eta^{ab}. The Lagrangian for a massless fermion becomes:

\mathcal{L}=i\bar\Psi \gamma^a e_a^\mu \left(\partial_\mu-\frac{1}{2}\omega^{bc}_\mu M_{bc}\right)\Psi

where

\displaystyle M_{ab}=S_{ab}=\frac{1}{4}[\gamma_a,\gamma_b]
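
As a consistency check, the sketch below (using the standard Dirac representation of the gamma matrices, one conventional choice not fixed by the text) verifies the Clifford algebra and that S_{ab}=\frac{1}{4}[\gamma_a,\gamma_b] satisfies the M_{ab} commutation relations from the Poincaré algebra at the top of this post.

```python
import numpy as np

# Dirac representation of the gamma matrices (one standard choice)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

gamma = [np.block([[I2, Z2], [Z2, -I2]])]                 # gamma^0
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in (sx, sy, sz)]
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Clifford algebra: {gamma^a, gamma^b} = 2 eta^{ab}
for a in range(4):
    for b in range(4):
        anti = gamma[a] @ gamma[b] + gamma[b] @ gamma[a]
        assert np.allclose(anti, 2 * eta[a, b] * np.eye(4))

def S(a, b):
    """S_ab = (1/4)[gamma_a, gamma_b], with indices lowered by eta."""
    ga, gb = eta[a, a] * gamma[a], eta[b, b] * gamma[b]
    return 0.25 * (ga @ gb - gb @ ga)

# Check [M_ab, M_cd] = eta_ad M_bc + eta_bc M_ad - eta_bd M_ac - eta_ac M_bd
for (a, b, c, d) in [(0, 1, 1, 2), (1, 2, 2, 3), (0, 2, 1, 3), (0, 3, 3, 1)]:
    lhs = S(a, b) @ S(c, d) - S(c, d) @ S(a, b)
    rhs = (eta[a, d] * S(b, c) + eta[b, c] * S(a, d)
           - eta[b, d] * S(a, c) - eta[a, c] * S(b, d))
    assert np.allclose(lhs, rhs)
print("Clifford algebra and Lorentz commutators verified")
```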


1 This is ironic from a historical perspective, since Yang and Mills were inspired by general relativity. Of course, in physics, there are many ways to skin a cat.

2 Schwartz, Matthew. Quantum Field Theory and the Standard Model, Ch. 8.

3 Thus, you could say gravity is the gauge theory of the Lorentz group instead. However, we had to introduce the vielbein as part of the covariant derivative in order to get the correct theory. So there is a slight wrinkle in the analogy.

4 Unlike Yang-Mills theory, we cannot write the Lagrangian using the “abbreviated” fields e=e^a_\mu P_a dx^\mu. In fact, e e vanishes due to [P_a,P_b]=0.

Quantum mechanics explained

After being a strong believer in the many-worlds interpretation of quantum mechanics for years, I have now completely changed my mind. Many-worlds is seriously flawed, and the good old Copenhagen interpretation is not so bad.

Specifically, the correct interpretation of quantum mechanics is the Von Neumann-Wigner interpretation, a flavor of Copenhagen that puts the Heisenberg cut at the observer’s consciousness. The orthodox Copenhagen interpretation, which allows placing the cut at a physical measuring device, is a useful approximation due to decoherence.

What is physics?

Understanding quantum mechanics requires thinking carefully about what physics is and is not. The point of a physical theory is to make predictions about sensory experience. It is only about modeling the world if this helps to make predictions. Thus, the observer’s consciousness1 is just as fundamental as the mathematical objects of the theory. In classical physics, this is obscured because the mathematical objects of the theory are shared among all observers, rendering the observer apparently redundant. Quantum mechanics relaxes this assumption and allows different observers to use different mathematical objects (wavefunctions).

Quantum and classical compared

Let me elaborate on classical and quantum physics.

Classical mechanics describes a system of particles with positions and momenta that evolve in time under Newton’s law. Quantum mechanics is quite similar: it describes a system of particles with a field called the wavefunction that evolves in time under Schrödinger’s equation2. If that were the whole story, quantum mechanics would be pretty much the same as classical mechanics.

However, these are just mathematical constructs so far. How do we actually verify classical mechanics? We can only sense the set of particles corresponding to our body/brain, so we must find a way to cause the system of interest to interact with these particles. In other words, we must split the universe into system and observer3. Then we must assign different states of our state space to different perceptions corresponding to the results of a measurement.

This is exactly what happens in quantum mechanics as well. The difference is that quantum mechanics contains superposition states, while observers can only distinguish between orthogonal states. Thus, there must be a rule to say which orthogonal state in a superposition the observer actually perceives: Born’s rule4.

Why many-worlds fails

Many-worlds seems like a simple and attractive idea that accomplishes the goal: it tells you what an observer perceives using only unitary evolution of a global wavefunction, similar to classical physics. However, it is seriously flawed. Many-worlds models a measurement as follows:

\displaystyle \left(\sum_i c_i | s_i \rangle \right) \otimes |O_0\rangle \rightarrow \sum_i c_i |s_i'\rangle \otimes |O_i\rangle

where |s_i\rangle are the system basis states, |s_i'\rangle are the new system states for each |s_i\rangle, |O_0\rangle is the initial observer state and |O_i\rangle are the final observer states. The |s_i'\rangle are left arbitrary to include both destructive and non-destructive measurements. Measurement is complete upon decoherence, when \langle O_i|O_j\rangle \approx \delta_{ij}. Then the states |O_i\rangle are interpreted as the different perceptions of the observer.
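As a sanity check, here is a minimal numerical sketch of this measurement map. The two-level system, single-qubit observer, and CNOT entangling interaction are illustrative choices, not anything specific from the argument above:

```python
import numpy as np

# Minimal sketch of the many-worlds measurement map, using a CNOT as the
# entangling unitary: (a|0> + b|1>) ⊗ |O_0>  ->  a|0>|O_0> + b|1>|O_1>.
a, b = 0.6, 0.8                      # normalized amplitudes: |a|^2 + |b|^2 = 1
system = np.array([a, b])            # a|0> + b|1>
observer = np.array([1.0, 0.0])      # initial observer state |O_0>

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

before = np.kron(system, observer)   # product state before measurement
after = CNOT @ before                # entangled state after measurement

# The result equals a|00> + b|11>: the two branches of the wavefunction.
expected = a * np.kron([1, 0], [1, 0]) + b * np.kron([0, 1], [0, 1])
assert np.allclose(after, expected)

# Branch weights reproduce the Born probabilities |a|^2 and |b|^2.
print(round(after[0] ** 2, 2), round(after[3] ** 2, 2))   # 0.36 0.64
```

For a one-qubit observer the final observer states are exactly orthogonal; for a macroscopic observer, \langle O_i|O_j\rangle only approaches zero through decoherence, which is problem 1 below.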

This has several problems. In order of least to most serious:

1. Decoherence is never complete.

What happens in this case? Observers can only distinguish between orthogonal states. One idea is to rewrite the final wavefunction as a sum of direct products in some orthonormal observed basis |O_i''\rangle:

\sum_i c_i'' |s_i''\rangle \otimes |O_i''\rangle

Then the observed system states c_i'' |s_i''\rangle would simply be slightly different from the original ones c_i |s_i'\rangle, corresponding to a small error in the measurement.

2. It assumes the observer is not entangled with the system before measurement.

This is obviously false most of the time! Everything is usually entangled with everything else. To generalize the above, what we actually want is some rule for “hopping” between perceived states of the observer, given an arbitrary entangled state \psi(t). I invite you to come up with such a hopping rule. Seriously, try it.

For example, consider this plausible attempt at a hopping rule. The probability of hopping from state i at time t, to state j at time t+\Delta t, is:

p_{i\rightarrow j} = \displaystyle \frac{\text{tr}\left( P_j e^{-iH\Delta t} P_i \rho(t) P_i e^{iH\Delta t}\right)}{\text{tr}\left( P_i \rho(t)\right)}

where P_i is a projection operator corresponding to state i and \rho(t) is the density matrix5. This has the required property that \sum_j p_{i\rightarrow j} = 1, since \sum_i P_i = 1. It gives the same probabilities that would be observed if the state had collapsed to i at time t, but without actually collapsing the state. The problem is that the denominator can be zero: the previous hop may have landed in state i even though \text{tr}\left(P_i \rho(t)\right) = 0 at the current time. The state actually has to collapse to ensure this doesn’t happen.
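To see the rule in action, here is a minimal numerical sketch for a single-qubit observer. The Hamiltonian H = \sigma_x, the time step, and the density matrix are arbitrary illustrative choices:

```python
import numpy as np

# Sketch of the candidate hopping rule for a single-qubit observer,
# so P_0 = |0><0| and P_1 = |1><1|, evolving under H = sigma_x.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
P = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

dt = 0.3
U = np.cos(dt) * np.eye(2) - 1j * np.sin(dt) * sx   # exp(-i H dt) for H = sigma_x

rho = np.diag([0.7, 0.3]).astype(complex)           # some density matrix rho(t)

def hop_prob(i, j):
    """p_{i->j} = tr(P_j U P_i rho P_i U^dag) / tr(P_i rho)."""
    num = np.trace(P[j] @ U @ P[i] @ rho @ P[i] @ U.conj().T)
    den = np.trace(P[i] @ rho)
    return (num / den).real

# The probabilities out of each state sum to 1, since sum_i P_i = 1.
for i in range(2):
    assert abs(sum(hop_prob(i, j) for j in range(2)) - 1) < 1e-12

print(hop_prob(0, 0), hop_prob(0, 1))   # cos^2(dt) and sin^2(dt) here

# The problem: if tr(P_i rho) = 0, the rule is undefined, even though the
# previous hop may have landed in state i.
rho_bad = np.diag([1.0, 0.0]).astype(complex)
print(np.trace(P[1] @ rho_bad).real)    # tr(P_1 rho_bad) = 0: hop_prob(1, j) divides by zero
```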

3. It assumes the many worlds never re-merge or overlap.

Consider the observer’s density matrix \rho_O(t)=\text{tr}_S(\rho(t)). The diagonal elements in the observed basis \rho_{Oii} = \langle O_i | \rho_O(t) | O_i\rangle are constantly evolving into each other, with \sum_i \rho_{Oii} = 1. A hopping rule is impossible because you cannot tell which previous state a certain \rho_{Oii} “came from” in the past, unless you assume each state comes from just one past state. This is clearly not true in general.

Many-worlds proponents sometimes argue that macroscopic systems in different states are unlikely to revisit the same state. However, then one must pick a certain size (dimensionality) above which re-merging becomes “acceptably” unlikely. There is clearly no fixed size. For an exact theory of physics, one cannot ignore edge cases like this just because they are rare. Ironically, while many-worlds proponents like to point to the seemingly arbitrary nature of wavefunction collapse, it is many-worlds that places arbitrary restrictions on what systems can be considered observers.

Why Copenhagen is fine

The key insight of the Copenhagen interpretation (i.e. quantum mechanics itself) is that a global (objective) reality is not required to make predictions.

One way to understand this is with the Wigner’s friend thought experiment, which I have slightly extended below.

Wigner prepares his friend and a two-state system in a superposition state

(a|\uparrow\rangle + b|\downarrow\rangle)\otimes |\psi_{friend}\rangle

When his friend measures the system, he may obtain the state |\uparrow\rangle. He then tells Wigner his result, so that in his view, Wigner knows that |\uparrow\rangle was measured. However, Wigner models this measurement as the total state

a |\uparrow\rangle\otimes |\uparrow_{observed}\rangle + b |\downarrow\rangle\otimes |\downarrow_{observed}\rangle

When Wigner measures his friend (by asking him about it, perhaps), he may see a different state |\downarrow\rangle\otimes |\downarrow_{observed}\rangle, so he believes that |\downarrow\rangle was measured. Thus, they may both experience totally different things. But each observer sees an internally consistent story, so the theory is consistent. That’s it.

Measuring devices

This subjective view of physics implies that measurements are made on the observer’s Hilbert space, not on external measuring devices. Then why can some objects be considered classical measuring devices in practice? The answer comes down to decoherence. I will explain this in a somewhat roundabout way that highlights the behavior of real measuring devices.

Recall the textbook measurement postulate: a measurement collapses the system to an eigenstate of the measured Hermitian operator, with probability given by Born’s rule. This is often false in practice! For example, in quantum optics, photodetectors may measure the position of a photon, but collapse the system to the state of “no photon”.

Real-world measurements are described by so-called general measurements6. These are defined by a set of operators M_i corresponding to the results of the measurement. The probability for result i is:

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

upon which the wavefunction collapses to

\displaystyle |\psi\rangle \rightarrow \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

The measurement operators satisfy the completeness relation

\sum_i M_i^\dagger M_i = 1

The M_i do not have to be Hermitian. For a photodetector, they would be something like M_\textbf{n}=|0\rangle\langle \textbf{n}|, where \textbf{n} collects properties of the photon, like position and polarization. General measurements reduce to conventional (projective) measurements when the M_i are Hermitian and orthogonal projectors: M_i M_j = \delta_{ij} M_i.
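Here is a minimal numerical sketch of such a destructive measurement, with Kraus operators M_0 = |0\rangle\langle 0| and M_1 = |0\rangle\langle 1| standing in for an idealized detector. This is a two-level toy model, not a real photon Hilbert space:

```python
import numpy as np

# Sketch of a general (Kraus) measurement modeling a destructive detector:
# the result i reveals the state, but the system collapses to |0> ("no photon").
# M_0 = |0><0|, M_1 = |0><1|; these are not projectors, and M_1 is not Hermitian.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)
M = [ket0 @ ket0.conj().T, ket0 @ ket1.conj().T]

# Completeness relation: sum_i M_i^dag M_i = 1
assert np.allclose(sum(m.conj().T @ m for m in M), np.eye(2))

psi = 0.6 * ket0 + 0.8 * ket1

for i, Mi in enumerate(M):
    p = (psi.conj().T @ Mi.conj().T @ Mi @ psi).real.item()   # Born probability
    post = Mi @ psi / np.sqrt(p)                              # collapsed state
    print(f"result {i}: p = {p:.2f}")                         # p = 0.36, then 0.64
    assert np.allclose(post, ket0)   # both outcomes leave the "vacuum" |0>
```

Both outcomes leave the system in |0\rangle, yet the probabilities still distinguish the original amplitudes, exactly the behavior of a real photodetector.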

General measurements are equivalent to unitary interaction of a system with an ideal environment, followed by a projective measurement on the environment. Specifically, consider coupling the system to an environmental Hilbert space: \mathcal{H} = \mathcal{H}_s \otimes \mathcal{H}_e. The environment is initially in the state |0\rangle. Introduce the operator U such that

U(|\psi\rangle \otimes |0\rangle)=\displaystyle \sum_i M_i|\psi\rangle \otimes |i_E\rangle

where |i_E\rangle are orthonormal states of the environment corresponding to the M_i.

You can check that U preserves inner products of the system Hilbert space:

(\langle v|\otimes \langle 0|) U^\dagger U (| w\rangle \otimes |0\rangle) = \langle v | w\rangle

It can be shown that such a U can be extended to a unitary operator U' on the entire Hilbert space. Now if we measure an operator on the environment with eigenstates |i_E\rangle, we obtain one of the system states

\displaystyle \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

with probability

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

just as above.

Look familiar? This interaction U is a more general version of the many-worlds “decoherence equation” above. Thus, the condition for a quantum object to implement a general measurement is that its internal states must interact with the system in this way. Decoherence propagates to the next object and so on until it reaches the observer, who makes the measurement.
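The dilation construction can be checked numerically. This sketch reuses the toy detector operators M_0 = |0\rangle\langle 0|, M_1 = |0\rangle\langle 1| and builds the isometry V|w\rangle = \sum_i M_i|w\rangle \otimes |i_E\rangle explicitly (all names here are illustrative):

```python
import numpy as np

# Build the isometry V|w> = sum_i M_i|w> ⊗ |i_E> for the toy detector
# Kraus operators M_0 = |0><0|, M_1 = |0><1|.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)
M = [ket0 @ ket0.conj().T, ket0 @ ket1.conj().T]
env = [ket0, ket1]                        # orthonormal environment states |i_E>

V = sum(np.kron(Mi, ei) for Mi, ei in zip(M, env))   # 4x2 isometry

# V preserves system inner products: V^dag V = 1 on the system space,
# so it extends to a unitary U' on the full Hilbert space.
assert np.allclose(V.conj().T @ V, np.eye(2))

# Measuring the environment in the |i_E> basis reproduces the general
# measurement probabilities <psi|M_i^dag M_i|psi>.
psi = 0.6 * ket0 + 0.8 * ket1
total = V @ psi                           # sum_i M_i|psi> ⊗ |i_E>
for i in range(2):
    Pi_env = np.kron(np.eye(2), env[i] @ env[i].conj().T)  # project env onto |i_E>
    p = (total.conj().T @ Pi_env @ total).real.item()
    print(f"p_{i} = {p:.2f}")             # p_0 = 0.36, p_1 = 0.64
```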

Conclusion

In a nutshell: quantum mechanics relaxes the assumption of an objective description of the universe, while still being a predictive physical theory.

FAQ

Q: How is the system measurement basis determined (the preferred-basis problem)?

A: First, recall that we do not measure the system directly, only our brain/body after it has interacted with the system. As to which of our internal states correspond to which perceptions, note that the same question applies to classical physics. In both cases, we must determine this empirically.

Q: Isn’t the boundary between system and observer also arbitrary? How do we determine which degrees of freedom can be perceived?

A: Again, the same question applies to classical physics, and must be determined empirically.

Q: What objects have consciousness?

A: No physical objects have consciousness. From your perspective, all physical objects are part of the wavefunction, and nothing else has the power to collapse the wavefunction. (Yes, this unfortunately leads to a kind of solipsism. It’s a lonely world out there.)


1 “Consciousness” is a dirty word among physicists, usually for good reason. Here, it simply means the ability to perceive things: cogito, ergo sum. In the formalism of quantum mechanics, this translates to the ability to collapse the wavefunction by inquiring about a measurement result. Much confusion results from trying to ascribe consciousness to physical objects or from giving the word additional meanings.

2 Or its field theory generalizations.

3 Semantic note: I sometimes use “observer” to refer to the subspace of the state space that is perceived, and sometimes to the conscious entity that does the measurement to collapse the wavefunction. Many-worlds says the latter does not exist. It should be clear from context which one is meant.

4 Can Born’s rule be derived? No: probability is nowhere to be found in unitary time evolution, so there must be some axiom introducing probability into the theory. Regardless, whether Born’s rule is fundamental or derived has no bearing on the next section.

5 P_i = P_{Oi} \otimes 1, where P_{Oi} is a projector on the observer’s space and 1 is the identity on the rest of the space.

6 This section mostly comes from Nielsen and Chuang, Quantum Computation and Quantum Information, Ch. 2.2.

What is spin?

This is the first in a series of posts explaining fundamental physics concepts in simple terms. I will try to explain as deeply as possible from first principles, but without assuming any math beyond high-school level. However, the footnotes contain details for more advanced readers.


The first topic is spin. Spin is a measure of the internal rotational degrees of freedom of a particle. Consider a particle at rest at the origin. We will assume the particle has nonzero mass for now and discuss the massless case later. What transformations can we make to it that leave it looking externally the same? There are just the three rotations, one around each axis1.

Spin 0

Now consider describing the particle with a sequence of numbers (degrees of freedom, or DOFs) that change in a defined way under rotations. The simplest case is just to give it a single number that doesn’t change under rotations. This is spin 0. The Higgs boson is the only known fundamental spin 0 particle.
 

What if we try to make this single number change? Let’s say a 180° rotation around the x, y, or z axis multiplies it by 2. This doesn’t work, because a 180° rotation around x followed by a 180° rotation around y is the same as a 180° rotation around z, which you can check. But the former results in a factor of 4, while the latter gives a factor of 2. So not every possible choice of transformation works: it must be compatible with the behavior of rotations.
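You can verify the rotation identity directly with 3D rotation matrices. This is just a quick numerical check, not part of the original argument:

```python
import numpy as np

# Check: a 180° rotation about x followed by a 180° rotation about y
# equals a 180° rotation about z (as 3D rotation matrices).
def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Matrix composition applies the right factor first: Ry(pi) Rx(pi) = Rz(pi).
assert np.allclose(Ry(np.pi) @ Rx(np.pi), Rz(np.pi))

# So a rule that multiplies the number by 2 for each 180° rotation is
# inconsistent: 2 * 2 = 4 on the left, but only 2 on the right.
```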

The little group

Finding all the possible ways that a set of numbers can transform under some symmetries (called a group) is known as representation theory. In a landmark paper, Wigner first classified particles as representations of the Poincaré group, the group of symmetries of special relativity. In addition to rotations, this group includes boosts and translations in space and time. He showed that internal DOFs are described by the little group, the group that leaves the particle externally the same. In this case, it is the group of rotations in three dimensions, called SO(3). Spin 0 is a 1-dimensional representation of the little group, since it is just a single number.

Spin 1

Anyway, back to our particle. Another obvious choice is to describe the particle with a 3D vector. Under 3D rotations, this just rotates in the usual way. This is called spin 1. It is a 3-dimensional representation of the little group. Spin-1 particles include the W and Z vector bosons, which mediate the weak force2.

Spin 1/2

We have been working with representations where all the numbers are real. For example, a real 3D vector stays real under rotations, since rotation matrices are real. But quantum mechanics says the universe uses complex numbers. It turns out that there is a complex representation in between spin 0 and spin 1, called spin 1/2. It has two complex DOFs. The majority of particles in our universe are spin-1/2: electrons, muons, quarks, etc.

Higher spins

We can continue upwards, constructing larger and larger representations with spin > 1. In general, a spin-s representation has 2s + 1 DOFs. There is a representation for every integer and half-integer s \geq 0. Integer spin particles are called bosons, and half-integer particles are called fermions.
 
Composite particles form so-called product representations that decompose into independent spin representations. For example, two spin-1/2 particles have 4 DOFs. These split into a spin-1 representation (the “triplet”) and a spin-0 (the “singlet”). In group theory notation this is sometimes written as
 
2\times 2 = 3 + 1
 
(Who said group theory was hard?) This means that there are particular linear combinations of these DOFs that transform as spin 1 and as spin 0: under rotations, the spin-1 combinations transform as a 3D vector, while the spin-0 combination doesn’t change. An arbitrary linear combination of DOFs will mix into all the others under rotations.
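Here is a small numerical illustration of the decomposition: rotating both spin-1/2 particles leaves the singlet combination unchanged, while the triplet combinations mix. The rotation axis and angle below are arbitrary:

```python
import numpy as np

# The singlet combination of two spin-1/2 particles is invariant under
# simultaneous rotations, i.e. it transforms as spin 0.
theta = 0.7                                   # arbitrary rotation angle
c, s = np.cos(theta / 2), np.sin(theta / 2)
U = np.array([[c, -s], [s, c]])               # spin-1/2 rotation about y

singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)     # (|01> - |10>)/sqrt(2)
triplet0 = np.array([0, 1, 1, 0]) / np.sqrt(2)     # (|01> + |10>)/sqrt(2)

UU = np.kron(U, U)                            # rotate both particles

assert np.allclose(UU @ singlet, singlet)        # spin 0: unchanged
assert not np.allclose(UU @ triplet0, triplet0)  # spin 1: mixes with |00>, |11>
```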
 
While composite particles can have high spin, no fundamental massive particles with spin \geq 3/2 are known to exist.

Spin and statistics

Fermion representations have the peculiar property3 that a full 360° rotation multiplies the state by -1 instead of +1. In fact, this implies that identical fermions cannot occupy the same state, known as the Pauli exclusion principle. It is responsible for the diverse matter in our universe, such as atoms and molecules; otherwise, all the fermions in a system would collapse near the state of lowest energy, the ground state. On the other hand, identical bosons acquire a +1 under rotation, so they can occupy the same state. At low temperatures, they almost all occupy the ground state, forming a Bose-Einstein condensate.
 
This connection between the behavior of large numbers of particles and their spin is also called the spin-statistics theorem. Proving this theorem requires quantum field theory and relativity, which are beyond the scope of this article.

Spin and angular momentum

DOFs associated with rotations are known as angular momentum. We have only discussed the internal DOFs (spin), but particles also carry external rotational DOFs called orbital angular momentum. The total angular momentum is the sum of spin and orbital contributions. Since spin representations are finite-dimensional, spin angular momentum is quantized4. A measurement in a particular direction will give a result from \{-s\hbar, (-s+1)\hbar, \cdots, (s-1)\hbar, s\hbar\} for a particle of spin s, where \hbar is the reduced Planck constant. You can see there are 2s+1 different values. Because rotations around different axes don’t commute, angular momentum in different directions cannot be measured simultaneously: once the angular momentum in one direction is known exactly, the other directions become uncertain.
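These statements are easy to verify with the spin-1/2 operators S_i = (\hbar/2)\sigma_i, setting \hbar = 1:

```python
import numpy as np

# Spin-1/2 angular momentum operators S_i = (hbar/2) sigma_i, with hbar = 1.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]])
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

# Measurement along z gives one of 2s + 1 = 2 values: -hbar/2 or +hbar/2.
print(np.linalg.eigvalsh(sz))                # [-0.5  0.5]

# Rotations about different axes don't commute: [S_x, S_y] = i S_z != 0,
# so angular momentum along x and y cannot be measured simultaneously.
assert np.allclose(sx @ sy - sy @ sx, 1j * sz)
```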

For spin-1/2, the two values are -\hbar/2 and \hbar/2, corresponding to states called “spin down” and “spin up” with respect to a particular direction. These states can be visualized as little arrows pointing in the direction of angular momentum.

Spin and magnetism

When electromagnetism is included in the theory, it turns out that spin couples to the magnetic field5. Classically, you can visualize spin angular momentum as arising from a particle literally spinning around.

The potential energy of a current loop in a magnetic field is6:

U = -IA\vec{B}\cdot\hat{n}

where I is the current, A is the area, and \hat{n} is the normal vector in the direction given by the right-hand rule on the current. Using A=\pi r^2, I = qv/2\pi r, and angular momentum \vec{L} = rmv\hat{n}, this becomes

U = -\frac{q}{2m} \vec{B}\cdot\vec{L}

Thus, spins tend to “align” with an external magnetic field, since the states with spin in the same direction as the field have lower energy than the states in the opposite direction (for a positive charge q). q/2m is known as the gyromagnetic ratio. It turns out in the quantum theory that this ratio is actually twice the value of the classical theory: q/m. So a particle with spin cannot really be thought of as a classical current loop.

Magnetic fields allow us to manipulate and measure spin, as in the Stern-Gerlach experiment.

Massless particles and helicity

Massless particles such as the photon do not have spin, because their little group is different. Special relativity tells us that massless particles must travel at the speed of light. Therefore, we cannot imagine them “at rest”: they are always moving in a particular direction. However, we can still make rotations around this direction and leave the particle the same. The little group in this case is SO(2), the rotation group in 2 dimensions. This is a very simple group that has only 1-dimensional representations7, and it is easy to find them all. A helicity-s representation acts on the number by multiplying it by

R(\theta) = e^{i s\theta}

under a rotation by angle \theta. There is a representation for every integer and half-integer s, where s can be less than 0 now. Note that half-integer representations are fermions again, since a rotation by 2\pi gives -1.
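Since each representation is just multiplication by a phase, this is trivial to check numerically (the helicity values below are illustrative):

```python
import math
import cmath

# Helicity-s representation: a rotation by theta about the direction of motion
# multiplies the single complex DOF by exp(i*s*theta).
def rotate(s, theta):
    return cmath.exp(1j * s * theta)

# A full 2*pi rotation: integer helicity gives +1 (boson), half-integer -1 (fermion).
assert abs(rotate(1, 2 * math.pi) - 1) < 1e-12       # photon-like
assert abs(rotate(2, 2 * math.pi) - 1) < 1e-12       # graviton-like
assert abs(rotate(0.5, 2 * math.pi) + 1) < 1e-12     # fermion: picks up -1
```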

It would then seem that all massless particles have only one degree of freedom. In fact, another symmetry principle requires us to stick two of these representations together: parity, or symmetry under spatial reflections (x\rightarrow -x, y\rightarrow -y, z\rightarrow -z). Under parity, a rotation in one direction around the particle’s velocity goes in the opposite direction:

[Figure: the black rotation arrow around the blue velocity vector v, before and after a parity reflection]

We have reflected both the black rotation arrow and the velocity vector v. Particle representations must be invariant under parity. If we include the +s representation, we also need to include the representation that transforms in the opposite way under rotation. Therefore, massless particles have two DOFs: both the +s and -s representations. Each degree of freedom transforms independently under this rotation (see footnote 7). Note that massive particles have no preferred direction when at rest, so representations are automatically invariant under parity. Imagine removing the blue arrows in the above figure; you will see that a rotation reflected is the same rotation.

The photon has helicity 1, and the graviton (the force carrier of gravity) has helicity 2. One fascinating result of quantum field theory is that it is impossible to have a locally interacting theory of massless particles with helicity greater than 2.

Further resources

So much for this whirlwind tour of spin and related topics. For more info, see any standard textbooks on quantum mechanics or quantum field theory. Some I recommend:

  • Griffiths, D.J. Introduction to Quantum Mechanics, Ch. 4.
  • Schwartz, Matthew. Quantum Field Theory and the Standard Model, Chs. 8, 10, 12.
  • Weinberg, Steven. The Quantum Theory of Fields, Ch. 2.4, 2.5.

1 In special relativity, there are also transformations called “boosts”, but these give the particle a constant velocity, so it is no longer at rest.

2 You may have heard that the photon is spin 1. But the photon is massless, so has helicity instead of spin. More on this later.

3 This may seem impossible, since a rotation by 360° is the same as no rotation at all. And you are right! I slightly lied earlier. Technically, we are finding representations of SU(2), the double cover of the rotation group SO(3). This is because the Lie algebras are identical, su(2) \sim so(3), but exponentiating the spinor representation of the algebra produces SU(2) instead of SO(3).

4 Orbital angular momentum is also quantized, but only integer representations exist. This is because the DOFs here are actually fields: functions of space. A rotation by 360° must take the field to the same field, because the position vector itself is a vector representation of rotations.

5 This can be derived by starting with the Lagrangian of quantum electrodynamics and taking the non-relativistic limit, where all energies are smaller than the rest energy of the electron E=mc^2. However, in the spirit of effective field theory, we can also consider writing down all terms consistent with the non-relativistic symmetries: SO(3), parity, and gauge invariance. The spin operator \vec{S} is a vector, so it must be dotted with another vector to create an SO(3) invariant. Actually, it is a pseudovector: \vec{L}=\vec{r}\times \vec{p} is invariant under parity \vec{r}\rightarrow -\vec{r}, \vec{p}\rightarrow -\vec{p}. It must be dotted with another pseudovector to be invariant under parity. This must be the magnetic field \vec{B}, since the electric field \vec{E} is a vector. It cannot be any other function of the vector potential A_\mu due to gauge invariance. Thus, the lowest order interaction is

c \vec{S}\cdot \vec{B}

for some constant c.

6 A nice derivation is as follows. Start with the integral form of Faraday’s law:

\frac{d\phi}{dt}=-\oint \vec{E}\cdot \vec{dl}

where \phi is the magnetic flux through the loop. Multiply by current and integrate over time:

I\Delta \phi=-\int dt IV=-\int dt P_{diss}

where V=\oint \vec{E}\cdot \vec{dl}, I is the current, and P_{diss}=IV is the dissipated power. Thus, it takes an energy

E=\int dt P_{diss}=-I\Delta \phi

to change the magnetic flux by an amount \Delta \phi. It is easiest to draw a picture to get the sign right. This derivation shows that the energy is independent of the shape of the loop.

7 A rotation matrix in 2D is, of course, two-dimensional. However, this is actually a reducible representation made of two irreducible, one-dimensional (complex) representations: helicity +1 and -1. You can see this by noting that (1, i) and (1,-i) are both eigenvectors under rotation, so do not transform into each other. We are only classifying the irreducible representations here.
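This reduction is a one-line check with numpy (a quick verification of the footnote, with an arbitrary angle):

```python
import numpy as np

# The 2D rotation matrix has eigenvectors (1, i) and (1, -i) with eigenvalues
# exp(-i*theta) and exp(+i*theta), so it splits into two one-dimensional
# complex representations: helicity +1 and -1.
theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v_plus = np.array([1, 1j])
v_minus = np.array([1, -1j])

assert np.allclose(R @ v_plus, np.exp(-1j * theta) * v_plus)
assert np.allclose(R @ v_minus, np.exp(1j * theta) * v_minus)
```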

Spin passing through wire loop

Consider a spin-1/2 particle passing through a wire loop with a light bulb in series:

[Figure: a spin-up particle moving toward a wire loop connected in series with a light bulb]

The spin produces a magnetic moment and therefore a magnetic field, which can induce a current in the loop due to the changing flux. For a pure upwards-pointing spin \lvert\uparrow\rangle (shown above), the upward flux through the loop increases as the spin approaches and decreases after it passes through, so the light bulb turns on both before and after the crossing. For \lvert\downarrow\rangle the flux is reversed, but the light bulb still turns on at the same times.

Now let the spin part of the wavefunction be \frac{1}{\sqrt{2}}(\lvert\uparrow\rangle + \lvert\downarrow\rangle). Does the light bulb turn on?

Perceiving many branches in many-worlds?

I’m a firm believer in the many-worlds interpretation, since one man’s “wavefunction collapse” is another man’s entanglement (i.e. decoherence). But some nagging details have been bothering me recently. In particular, many-worlds does not seem to answer the fundamental question of why humans only perceive one branch of the wavefunction at a time. If we take this as a fundamental postulate of the theory, then we’re in no better position than the Copenhagen interpretation. Is it just something to be chalked up to human perception, thrown into some vague realm of philosophy? Well, after thinking about it for far too long, I realized that it is in general impossible to measure a superposition state, as doing so violates linearity.

To see this, consider a two-level system in a state a|1\rangle+b|2\rangle, as well as some observer system in a state |B\rangle, so that the combined state before the interaction is

(a|1\rangle+b|2\rangle)|B\rangle.

In a standard measurement with full decoherence, the wavefunction after measurement would be

a|1\rangle|A_1\rangle + b|2\rangle|A_2\rangle

Suppose the observer has some way of sensing both branches, i.e. measuring |a| and |b|. Then they can communicate this information by preparing a state |f(|a|,|b|)\rangle. For example, they may send out a photon of wavelength \lambda_1 |a|^2+\lambda_2|b|^2. Then the full interaction is

(a|1\rangle+b|2\rangle)|B\rangle \rightarrow (a|1\rangle|A_1\rangle + b|2\rangle|A_2\rangle) |f(|a|,|b|)\rangle.

It is important that we can choose the |f(|a|,|b|)\rangle orthogonal for all different values of |a| and |b| (up to normalization, as in the photon case). This interaction clearly violates linearity, as we have

|1\rangle|B\rangle \rightarrow |1\rangle|A_1\rangle|f(1, 0)\rangle,

|2\rangle|B\rangle \rightarrow |2\rangle|A_2\rangle|f(0, 1)\rangle,

so that

(a|1\rangle+b|2\rangle)|B\rangle \rightarrow a|1\rangle|A_1\rangle|f(1,0)\rangle + b|2\rangle|A_2\rangle|f(0,1)\rangle

which is definitely not what we wanted!
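This can be made concrete with a small numerical sketch. Linearity fixes the output for the superposition; if the hoped-for output were achievable, the “report” register carrying |f(|a|,|b|)\rangle would end up in a pure state, but tracing out the rest shows it is mixed. The basis labels and the encoding of f(1,0), f(0,1) below are illustrative:

```python
import numpy as np

# Basis ordering: system ⊗ observer ⊗ "report" register (each one qubit).
def ket(*bits):
    v = np.array([1.0])
    for b in bits:
        v = np.kron(v, np.eye(2)[b])
    return v

a, b = 0.6, 0.8

# The interaction is linear, so its action on the superposition is fixed by
# its action on the basis states:
#   |1>|B> -> |1>|A_1>|f(1,0)>,   |2>|B> -> |2>|A_2>|f(0,1)>
# Encode f(1,0) = |0> and f(0,1) = |1> in the report register.
out = a * ket(0, 0, 0) + b * ket(1, 1, 1)

# The hoped-for output (a|1>|A_1> + b|2>|A_2>) ⊗ |f(|a|,|b|)> would leave the
# report register in a PURE state. Check its reduced density matrix instead:
rho = np.outer(out, out.conj())
rho_f = np.trace(rho.reshape(4, 2, 4, 2), axis1=0, axis2=2)   # trace out the rest

purity = np.trace(rho_f @ rho_f).real
print(f"purity of report register: {purity:.3f}")   # 0.539 < 1: mixed, not a product
assert purity < 1 - 1e-6
```

The report register inherits the entanglement between the branches, so it cannot carry a definite message about |a| and |b|, in agreement with the argument above.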

Decoherence is not really required; in fact the combined observer plus two-level system could evolve to any state |A(a, b)\rangle. The crucial point is that the observer’s wavefunction does not depend on a and b before the interaction, so we can use linearity. This means that we cannot use the same argument for the observer that prepared the two-level system, as their wavefunction must include the knowledge of the system wavefunction!

I feel that this should be some basic, well-known theorem, but it doesn’t seem to match any of the common no-go theorems, such as no-cloning.