Magnetic-core memory, Faraday’s Law and winding numbers

Like every adult male entering his thirties, I had recently developed an interest in military history, so what better way to spend a Saturday than at the local Midway Museum, the dubiously self-proclaimed #1 attraction in San Diego. Besides the endless labyrinth of tiny hallways and crew quarters, perhaps the most interesting exhibit was the UNIVAC CP-642B and its “LOL memory”, named after the “little old ladies” from the textile industry who painstakingly wove it by hand!


While LOL memory is read-only, it shares basically the same design as an early form of RAM called magnetic-core memory. Naturally, I wanted to investigate how this strange criss-cross structure worked.

Wikipedia has a good overview. In a nutshell, each ring of magnetic material stores one bit in its magnetization direction, clockwise or counterclockwise. The ring has an X line (green), Y line (red), sense line (s, orange), and inhibit line (z, purple) passing through it. Flipping the magnetization requires a current above a certain threshold. To write a bit, a current of half the threshold magnitude is sent through each of the ring’s X and Y lines. Only the ring where the two lines cross sees the full threshold current, so only that ring switches.

More interesting is how it is read. First, a 0 bit is written to the ring. If it was previously 0, nothing happens. If it was 1, the polarity switches and causes a changing magnetic field. By Faraday’s law, a voltage pulse is generated in the sense line, which is picked up by the sensing circuit.
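To make the write and read steps concrete, here is a toy sketch of a core plane. It is only an illustration under simplifying assumptions: the class name `CorePlane`, the arbitrary current units, and the omission of the inhibit line are my own choices, not a description of any real hardware.

```python
class CorePlane:
    """Toy model of a coincident-current core plane (illustrative only)."""

    THRESHOLD = 1.0  # current needed to flip a core, in arbitrary units

    def __init__(self, rows, cols):
        self.bits = [[0] * cols for _ in range(rows)]

    def _drive(self, x, y, target):
        """Send half-threshold current down X line x and Y line y.

        Returns True if a core changed state, i.e. a pulse would
        appear on the sense line.
        """
        half = self.THRESHOLD / 2
        flipped = False
        for r, row in enumerate(self.bits):
            for c, bit in enumerate(row):
                # Each core sees current only from the lines passing through it.
                current = half * (r == y) + half * (c == x)
                if current >= self.THRESHOLD and bit != target:
                    row[c] = target
                    flipped = True
        return flipped

    def write(self, x, y, bit):
        self._drive(x, y, bit)

    def read(self, x, y):
        # Destructive read: write 0 and watch the sense line for a pulse.
        was_one = self._drive(x, y, 0)
        if was_one:
            self._drive(x, y, 1)  # restore the bit we just erased
        return 1 if was_one else 0
```

For example, after `plane = CorePlane(4, 4)` and `plane.write(1, 2, 1)`, calling `plane.read(1, 2)` gives 1 and leaves the bit in place, while every other core on the same X or Y line only ever sees half the threshold and stays untouched.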

It was not obvious (to me) that this works for all the rings in the diagram above. The sense line forms quite a complicated self-overlapping path through all the rings. Would the magnitude of the voltage differ for rings on the edge, corner, or center? During switching, the magnetic flux goes into the page on one side of the ring, and out of the page on the other. The flux at either point induces an emf proportional to the winding number of the sense line around that point, viewing the sense line as a curve in the plane. We can plot the winding number of all the contiguous regions containing the flux:

The winding number can be found by unwinding the sense line as much as possible without intersecting the given point, and counting the number of windings around it. For example:

Finally, because the flux goes in opposite directions on either side of the ring, we can find the overall emf for a ring by subtracting the winding numbers on either side. Miraculously, the magnitude of this difference is 1 for every ring, so each ring induces the same voltage. What a clever design!

Or maybe not. In fact, any planar curve that goes through a ring will result in the same voltage. This is because the winding number around a point always changes by exactly one when the point crosses the curve:

Moving across a curve either unwinds a circle or creates a new circle. This is also proven visually in this Math.SE answer:

Here, we deform the contour (keeping winding number the same) until the point r becomes the point s with a closed clockwise loop around it. Thus, winding number of r equals winding number of s minus one.

Maybe all this is totally obvious to circuit engineers, but I found it quite interesting. Also, it gives a cool way to find the winding number for arbitrary curves: start from the outside and move the point in, adding or subtracting one at every crossing.
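The crossing-counting trick is easy to try out in code. The sketch below assumes the curve is a closed polygon given as a list of (x, y) vertices, and it walks in from the right along a horizontal line; the function name and the sign convention (counterclockwise is positive) are my own choices.

```python
def winding_number(point, curve):
    """Winding number of a closed polygonal curve around a point.

    Walk from far away (x = +infinity) in to `point` along a horizontal
    line, adding +1 for each crossed edge that heads upward and -1 for
    each crossed edge that heads downward.
    """
    px, py = point
    total = 0
    n = len(curve)
    for k in range(n):
        x0, y0 = curve[k]
        x1, y1 = curve[(k + 1) % n]
        # Does this edge cross the horizontal line y = py?
        if (y0 <= py) != (y1 <= py):
            # x-coordinate where the edge meets that line
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > px:  # the crossing lies on our path in from the right
                total += 1 if y1 > y0 else -1
    return total


# A self-overlapping curve: an outer square followed by an inner square.
double_square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0),
                 (1, 1), (3, 1), (3, 3), (1, 3), (1, 1)]
```

With this curve, a point outside gets 0, a point between the two squares gets 1, and a point inside both gets 2, so each crossing changes the count by exactly one. Points lying exactly on the curve are not handled.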


Programming languages summarized in one line

Inspired by the classic: A Brief, Incomplete, and Mostly Wrong History of Programming Languages.

BASIC: 10 PRINT "HELLO WORLD"; 20 GOTO 10 is the only BASIC program ever written.

C: Invented by Alan Turing as a precursor to the Turing Machine.

C++: Invented to give employment to C++ Standard Committee members.

CSS: The CSS standard formally consists of over 10,000 StackOverflow posts for every possible layout scenario.

D: C++ for hipsters.

Fortran: Used by Gauss to implement his linear algebra routines, never used since.

Go: All the advantages of C without the advantages of C++.

Haskell: The primary use case of Haskell is to write tooling for Haskell.

Java: All Java development has been fully automated since 2008.

Mathematica: What every scientist plans to learn, eventually.

MATLAB: The true backbone of all scientific research.

Lisp: The primary use case of Lisp is to write tooling for Lisp.

Perl: Used when Python programmers wake up and choose violence.

Python: All babies automatically learn Python at age 2-3.

Ruby: Lisp for hipsters’ hipsters.

Scala: A conspiracy by compiler writers to sell more compilers; does not actually exist.

Quantum mechanics for everyone

This post explains quantum mechanics (QM) without any advanced math. Unlike most introductions, I will focus on the interpretation of QM: what the objects in the theory mean and how they fit into a broader philosophy of doing physics. Specifically, I explain why the Von Neumann-Wigner interpretation, a variant of the standard Copenhagen interpretation, is the correct one. I also explain why a popular alternative to Copenhagen, the many-worlds interpretation, is incorrect.

The footnotes will contain details for more advanced readers. Also, see here for a shorter and more math-heavy version of this post.

What is science?

Let’s start with what we know. As Descartes said, “I think, therefore I am.” We know that subjective experience exists. In philosophy, subjective experiences are called qualia (singular quale). One purpose of science (including physics) is to predict what qualia we will experience, based on our past experiences. This is simply because qualia are, by definition, all that we can experience, so any attempt to verify a scientific theory necessarily involves qualia as inputs and outputs.

This focus on subjective experience may sound fuzzy and unrigorous, especially for those used to classical physics. However, it is actually a very conservative viewpoint. Some may say that the goal of science is instead to understand the objective world around us. That may be the case, but at a minimum, a theory must also be able to make predictions about our experiences. More on this as we go along.

The wavefunction and many-worlds

In this section, I will explain the basic ideas of QM, in the language of the many-worlds interpretation (MWI). MWI provides a convenient way to visualize QM as the continual splitting of a system’s state into many branches, or “worlds”. I will then show that MWI alone cannot be used to make predictions, for both practical and mathematical reasons. However, we can fix it by adding the concept of wavefunction collapse. This produces the Copenhagen interpretation.

Quantum mechanics describes the universe using a mathematical object called a wavefunction, with the symbol \psi. In the quantum world, a system can be in a combination of classical states instead of being in one state at a time. For example, a particle can be in two places at once. This is called a superposition.

Fig. 1 shows an example. The particle starts at position A, then over time, it evolves into an equal superposition of position A and B. (The boxes show instants in time.) At this time, if the experimenter measures the position of the particle, they will obtain either A or B with 50% probability1. This is indicated by the “probability amplitude” on top of each box. In QM, probabilities are given by the square of this amplitude. This is called Born’s rule. At any time, the squared amplitudes of all the branches must sum to 1. We get the number on each box as follows. When a box branches into multiple scenarios, we first multiply its amplitude with the number on each outgoing arrow (1/\sqrt{2}). Then, for each new scenario, we sum over all the incoming arrows. For example, on the top box in the superposition, 1/\sqrt{2} comes from 1 on the initial box times 1/\sqrt{2} from the one incoming arrow.

Fig. 1. Superposition.

The numbers on the arrows depend on the particular interactions between the particle and its environment. We will not be concerned with those here.
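The "multiply along the arrow, then sum the incoming arrows" recipe can be written in a few lines. This is just a sketch of the bookkeeping for Fig. 1; the dictionary layout and the function names `step` and `probabilities` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def step(amplitudes, arrows):
    """Advance the boxes by one time step.

    amplitudes: {state: amplitude} for the current boxes
    arrows:     {(src, dst): number written on the arrow}
    """
    new = defaultdict(float)
    for (src, dst), weight in arrows.items():
        # Multiply by the number on the arrow, then sum over incoming arrows.
        new[dst] += amplitudes.get(src, 0.0) * weight
    return dict(new)


def probabilities(amplitudes):
    """Born's rule: probability = squared amplitude."""
    return {state: abs(a) ** 2 for state, a in amplitudes.items()}


r = 1 / math.sqrt(2)
start = {"A": 1.0}
after = step(start, {("A", "A"): r, ("A", "B"): r})
probs = probabilities(after)  # roughly 0.5 each for A and B
```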

Of course, the experimenter is also composed of many particles, so should also be included as part of the wavefunction. This is shown in Fig. 2. When the experimenter measures the position, her brain’s particles record a state corresponding to seeing it at either A or B. We say that the experimenter’s state has become entangled with that of the particle.

Fig. 2. Measuring the position causes the experimenter’s brain to change state.

This shows how physics fundamentally works. To make predictions about qualia, a physical theory associates certain mathematical objects, or states, with qualia such as “seeing the particle in position A”. Given an initial state, classical physics predicts a certain future state, which is confirmed or denied by perceiving its associated qualia. In contrast, QM only predicts probabilities of obtaining future states. One way to confirm QM is then to do many identical experiments and then see if the results converge to the right probabilities2.

Does this mean that we must know the entire state of our brain in order to make or verify any predictions? Of course not. In practice, we rely on our eyes, ears, and other measuring devices to sense the world. This is because external inputs to these devices can reliably induce certain states in our brain. For example, light with a wavelength of 700nm that goes into our eyes can reliably induce the sensation of “seeing red”. More on this when we discuss measuring devices and decoherence later.

A prediction rule

Is the wavefunction all you need? No. As the experimenter, simply knowing the wavefunction at a given time does not allow you to make predictions, for the very obvious reason that you don’t know which branch you are on. At the least, you must also keep track of your current branch. For example, if you observe the particle at A, you know you are on the top branch of Fig. 2. Then, for future predictions, you must only use the arrows coming out of that state. Since the total probability must still equal one, you must then divide the probability (squared amplitude) on each future box by the current one on your box.

This is shown in Fig. 3 for multiple splittings. (Here, instead of drawing pictures in the boxes, I use letters A, B, etc. to show general states.) Let’s say you observe that you are in state B. Then in the future, you have a 1/3 chance of being in state D and a 2/3 chance of being in state E. This comes from (1/\sqrt{6})^2/(1/\sqrt{2})^2 = 1/3 and (1/\sqrt{3})^2/(1/\sqrt{2})^2 = 2/3. Even though the wavefunction contains states F and G at the same time as D and E, there is no probability of reaching those states because there are no arrows coming from B.

Fig. 3. Multiple splittings of the wavefunction. The highlighted branch corresponds to observing B instead of C.
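As a quick check of the numbers above, the rule "divide the future squared amplitude by the current one" can be evaluated directly. The amplitudes are the ones quoted for B, D, and E; the function name is hypothetical, and the shortcut is only valid when all of the future state's amplitude arrives through the current branch, as it does for D and E here.

```python
import math


def branch_probability(future_amplitude, current_amplitude):
    """Probability of a future box, given that we are on the current box."""
    return abs(future_amplitude) ** 2 / abs(current_amplitude) ** 2


amp_B = 1 / math.sqrt(2)
amp_D = 1 / math.sqrt(6)
amp_E = 1 / math.sqrt(3)

p_D = branch_probability(amp_D, amp_B)  # about 1/3
p_E = branch_probability(amp_E, amp_B)  # about 2/3
```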

This seems like a workable rule for making predictions: whenever you make a measurement, select your branch of the wavefunction and “follow the arrows” from there to predict future measurement results. Note that this rule does not discard the other branches entirely. All branches are still “there” at least mathematically, although most are unreachable in practice.

The wavefunction in this picture is globally shared among all observers. However, each person might perceive themselves to be in a different branch, depending on their random measurement results. This is shown in Fig. 4. Experimenters E1 and E2 measure the particle in turn. E1 may get A, so she selects the top branch. At the end of this branch, she perceives that both agree on position A. However, E2 may get B, so she selects the bottom branch, and perceives that both agree on position B. The key point is that in the end, each observer perceives an agreement on the position, so the measurement results are consistent from their own perspective.

Fig. 4. Experimenters E1 and E2 both measure the particle.

This example is similar to a famous thought experiment called Wigner’s friend. Wigner’s friend has historically been very confusing (as you can see from the Wiki article), so let me elaborate. Clearly, E1’s perceptions only depend on the particles in her own brain, not those in E2’s. When I say that she “perceives an agreement”, I mean that she treats E2 as a physical system and interacts with it, by asking her/it about the particle’s position, perhaps. That system then responds, by saying “A” or “B”, for example. This information gets received and stored in her brain in some form. From E1’s perspective, everything is a physical system, including other humans, animals, her own brain, etc. Only a subset of this system (her brain) corresponds to her perceptions3. Again, this is a very conservative viewpoint, since it does not assume other parts of the system correspond to some other entity’s perceptions. In other words, we do not assume other humans/animals/rocks/etc are “conscious”4.

Wavefunction collapse

So far so good, right? Unfortunately, this prediction rule does not quite work. Mathematically, you must completely discard the other branches every time you make an observation, and only keep the branch you are on. In other words, there can be no globally shared wavefunction. This is because probability amplitudes, unlike probabilities, can be negative. Quantum interference can cause the amplitude of a given scenario to be zero in a global wavefunction, even when that scenario is reachable in practice. If that branch is selected, it gives 0/0 for any future probabilities, which is undefined.

As usual, Fig. 5 shows an example. Assume you measure B. By the rule, you predict a 50% probability of either D or E ((1/2)^2/(1/\sqrt{2})^2=1/2). See Fig. 5(a). Note that we only consider arrows coming from B in this prediction. Then assume D is measured. We now try to apply the rule starting from D. See Fig. 5(b). However, the amplitude of D is zero! This comes from adding the two incoming arrows. We have 1/\sqrt{2}\times 1/\sqrt{2} from B, and 1/\sqrt{2}\times (-1/\sqrt{2}) from C, adding up to zero.

Fig. 5. (a) Making a prediction upon measuring B. (b) The prediction rule fails upon measuring D.
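Here is the same failure in code, using the amplitudes quoted for Fig. 5. The value 0.5 used for "some state after D" is a made-up placeholder; any number would trigger the same division by zero.

```python
import math

r = 1 / math.sqrt(2)
amp_B, amp_C = r, r                 # amplitudes after the first split
arrow_B_to_D, arrow_C_to_D = r, -r  # numbers on the two arrows into D

# The prediction made from B uses only the arrow leaving B.
p_D_given_B = (amp_B * arrow_B_to_D) ** 2 / amp_B ** 2

# In the global wavefunction, D's amplitude sums both incoming arrows.
amp_D = amp_B * arrow_B_to_D + amp_C * arrow_C_to_D


def rule_from(future_amplitude, current_amplitude):
    """The naive prediction rule: ratio of squared amplitudes."""
    return future_amplitude ** 2 / current_amplitude ** 2


try:
    rule_from(0.5, amp_D)  # 0.5 is a hypothetical amplitude downstream of D
    rule_failed = False
except ZeroDivisionError:
    rule_failed = True
```

So D is reachable from B half the time, yet the globally shared wavefunction assigns it amplitude zero, and the rule has nothing to divide by.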

The solution is to discard all other branches upon each measurement, and set the amplitude of the measured branch equal to 1. This is called wavefunction collapse. It is shown in Fig. 6. When B is measured, we remove C and give B amplitude 1. Then when D is measured, we remove E and give D amplitude 1. This guarantees that probabilities are always well-defined.

Fig. 6. (a) Once B is measured, we discard branch C. (b) Once D is measured, we discard branch E.
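A minimal sketch of the collapse rule, applied to the same example. The arrows into D are the ones from the text; the arrows into E (in particular C to E) are an assumption, chosen so that the overall evolution stays unitary.

```python
import math


def collapse(amplitudes, observed):
    """Discard every branch except `observed` and reset its amplitude to 1."""
    if observed not in amplitudes:
        raise KeyError(observed)
    return {observed: 1.0}


def step(amplitudes, arrows):
    """Follow the arrows leaving the branches that are still present."""
    new = {}
    for (src, dst), weight in arrows.items():
        if src in amplitudes:
            new[dst] = new.get(dst, 0.0) + amplitudes[src] * weight
    return new


r = 1 / math.sqrt(2)
# The ("C", "E") entry is assumed, not taken from the figure.
arrows = {("B", "D"): r, ("B", "E"): r, ("C", "D"): -r, ("C", "E"): r}

state = {"B": r, "C": r}
state = collapse(state, "B")   # B is measured: drop C, B gets amplitude 1
state = step(state, arrows)    # only arrows leaving B contribute now
p_D = state["D"] ** 2          # about 0.5, and never 0/0
state = collapse(state, "D")   # D is measured: drop E, D gets amplitude 1
```

Because the observed branch always restarts at amplitude 1, the probabilities of the next step are just the squared numbers on its outgoing arrows.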

Wavefunction collapse is the most controversial aspect of QM. However, from the discussion above, we see that it is basically just a mathematical formality, since the prediction rule is unchanged except in special cases. Remember, we are only concerned with making predictions, not “modeling the world”. This avoids meaningless philosophical issues about whether the wavefunction or its collapse is “real”. Many are uncomfortable with collapse because it differs from classical physics in the following ways:

  • Different observers use different wavefunctions. In MWI, although observers may find themselves in different branches, there is only one wavefunction. Similarly, the classical universe is in a single big classical state. However, by discarding the other branches, different observers use entirely different mathematical objects (wavefunctions) to describe the universe. Of course, the physics stays the same, since as just mentioned, the prediction rule is almost the same.
  • Wavefunction collapse happens instantaneously. In classical physics, the state evolves continuously in time under Newton’s laws. In quantum physics, apart from wavefunction collapse, the wavefunction also evolves continuously in time under an equation called Schrödinger’s equation5. (We have summarized this continuous evolution using the arrows with numbers on them.) Wavefunction collapse instantly discards the other branches and assigns a new amplitude to the observed branch. How is such a discontinuous process allowed? Because any predictions must specify a time when the measurement yields a definite result. This is when collapse occurs6. More on this later.

The Copenhagen interpretation

This theory of wavefunction evolution plus collapse is loosely called the Copenhagen interpretation. Actually, there is no widely-agreed-upon definition of the Copenhagen interpretation, but one hallmark is the separation of the world into classical and quantum systems. QM was originally developed to describe small objects such as single particles using a wavefunction. In contrast, large objects such as photon detectors or human beings were treated as classical systems that cause wavefunction collapse. For example, a particle detector appears to “collapse” the wavefunction of a superposition state like Fig. 1 into a state with definite position, either A or B. In this picture, the particle detector is not part of the wavefunction.

Of course, this led to much confusion about where exactly to draw the line between classical and quantum. How large does a system have to be in order to become classical? As we have argued above, there is no inherent difference between objects such as particles and humans; they are all quantum systems and all part of the wavefunction. In other words, we draw the line at the observer’s “consciousness”. The act of observation causes collapse. This variant of Copenhagen is sometimes called the Von Neumann-Wigner interpretation, or “consciousness causes collapse”.

Consciousness is a dirty word among serious physicists, almost always for good reason. However, we simply use it to mean the ability to have subjective experiences, which was our very first assumption.

Measuring devices and decoherence

This raises the question of why large systems like particle detectors tend to “look” classical. In fact, this was not fully understood until the theory of decoherence emerged in the 1950s-1970s, decades after QM was developed. The basic idea is quite simple. Take a small system S in one of a few states A, B, C, etc. When it interacts with an environmental system E, this environment turns into a corresponding state E_A, E_B, E_C, etc. For a large environment, these environmental states tend to become well-separated very quickly. This is because there are many more microscopic states that the large environment can take.

For example, Fig. 7 shows a single particle bouncing around in a box. This is a small environmental system. If another particle is placed at position A (top left), eventually they will hit each other, affecting the path of the first particle in some way. If instead the second particle is placed at position B (bottom left), it will affect the first particle in a different way. However, there is a good chance that at some future time, the first particle will happen to be at (nearly) the same location for both scenarios, as seen in Fig. 7.

Fig. 7. Particle in a box with another one placed at either A or B. At some future time, it is likely that the first particle will be at the same location in both scenarios, as shown here.

Now consider a huge number of particles bouncing around in the box. This is a large environmental system. If a new particle is introduced at position A, it will rapidly scramble the paths of all the other particles as they interact with it and with each other. If instead the new particle is introduced at position B, it will scramble the paths in a very different way. At any future time, there is very little chance that all the original particles will be at all the same locations in the two scenarios. The environmental states E_A and E_B are well-separated.
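A crude way to see how fast this happens: suppose each environment particle independently has some fixed chance of ending up at (nearly) the same place in both scenarios. Independence and the value 0.9 are assumptions for illustration only, not a real decoherence model.

```python
def chance_all_coincide(per_particle_chance, n_particles):
    """Chance that every environment particle sits at (nearly) the same
    place in scenarios A and B, treating the particles as independent."""
    return per_particle_chance ** n_particles


one_particle = chance_all_coincide(0.9, 1)       # the small box of Fig. 7
many_particles = chance_all_coincide(0.9, 1000)  # a modest "large" environment
```

Even with a generous 90% chance per particle, a thousand particles already push the combined chance far below anything observable, and a real environment has vastly more than a thousand particles.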

Fig. 8 shows a more accurate version of the measurement in Fig. 2, incorporating decoherence. The wavefunction initially splits into an equal superposition of position states A and B of the particle. At this time, the experimenter is in the same initial state for both branches. The experimenter then measures the particle by interacting with it. For example, there may be some light illuminating the particle, which goes into the experimenter’s eyes, which sends an electrical signal to the brain, etc. After a short amount of time, the experimenter’s brain is in very different states for the two scenarios A and B. This is seen by the nearly zero amplitude of the “observed B” state when the particle is at A (top-most branch), and the nearly zero amplitude of the “observed A” state when the particle is at B (bottom-most branch).

Fig. 8. More accurate version of Fig. 2 that incorporates decoherence.

To summarize: a measuring device looks classical if it causes decoherence. Therefore, you might think that decoherence can be used to define measurement, so that we do not need wavefunction collapse. This is not the case, for a couple of reasons. First, decoherence is never complete. In most decoherence models, the amplitude of the “wrong” branch approaches zero exponentially with time, but never reaches it. Therefore, we cannot define a time when the measurement is complete. Second, decoherence is only an emergent property of large systems. Why should conscious observers be limited to these systems? Indeed, how do we set a lower limit on the size or amount of decoherence anyway? Clearly, we cannot. The theory must still apply to general quantum systems as observers.

For example, consider an observer system that fluctuates rapidly in time, as in Fig. 9. The theory must still be able to associate states of this system with the observer’s perceptions. Since the branches do not remain separated over time, we cannot rely on decoherence. We also cannot say a state must be stable for a minimum amount of time in order to be measured. The observation, and thus collapse, must happen instantaneously.

Fig. 9. An observer in a rapidly fluctuating superposition.

Other interpretations

The Copenhagen interpretation has always been the standard one taught in textbooks. In the last few decades, many other interpretations have sprung up. I myself believed in MWI until I started thinking deeply about QM a few years ago. In my opinion, these other interpretations all stem from misunderstanding either the Copenhagen interpretation or the purpose of a physical theory. I will list some of them and their flaws here without further detail.

  • MWI is incomplete, as argued above.
  • Bohmian mechanics and consistent histories are ugly and overly complicated.
  • Quantum Bayesianism and relational quantum mechanics just dress up Copenhagen with some fancy words.

Summary


  • The minimum requirement for a scientific theory is that it makes predictions about an observer’s qualia. It does not have to predict the qualia of other entities, since they are not observable.
  • A theory does this by associating mathematical objects, or states, to certain qualia.
  • Classical physics predicts one future state, while quantum physics only predicts probabilities of each future state. This is done using a wavefunction that splits into multiple scenarios.
  • The wavefunction collapses upon an observation to the observed branch. Thus, different observers use different objects (wavefunctions) to describe the universe. Collapse is required mathematically for the theory to work.
  • Collapse must be instantaneous for the theory to apply to all possible observers.
  • Decoherence explains why certain objects look like classical measuring devices. However, it is only an approximation and does not replace the need for collapse.

1 Why can’t we observe the particle in two places at once? There are two ways to interpret this question in QM. 1) Why do we prefer the position basis instead of another basis? This is known as the preferred-basis problem. The short answer is that the preferred basis must be empirically determined, just as the perception of the color “red” must be correlated with certain wavelengths of light. More in the advanced version of this post. 2) Why can’t we perceive that we are in a superposition, in general? Because then we could prepare an identical state, violating the no-cloning theorem. More on this here, or see Nielsen & Chuang’s textbook.

2 To be pedantic, no experiments can be truly identical, because 1) the initial states cannot be exactly the same, and 2) the state of your brain has to include the memory of previous experiments. Of course, we really mean that for a series of experiments where we control all the relevant inputs, the results stored in your brain will converge to the predicted probabilities. Also, it goes without saying that many states are associated with the same quale: shifting the position of one molecule in your brain by a tiny amount has no observable effect.

3 This raises the question: how do we know what subset we can observe? As usual, we must determine this empirically!

4 Yes, this is basically solipsism. Unfortunately, that is where the logic of QM leads us. Don’t take it so seriously as to affect your personal moral code or anything.

5 Or more generally, the operator generated by the Hamiltonian.

6 Another common belief is that collapse is incompatible with relativity. This is false. Of course, we do not have a complete theory of quantum gravity, but for QFT in curved space, we can choose the collapse to occur on any spacelike hypersurface. This is because spacelike-separated operators commute, so can be simultaneously measured.

Solving Newcomb’s paradox for classical and quantum predictors

A recent HN post reminded me of Newcomb’s paradox, which goes as follows (from Wiki):

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:

  • Box A is clear, and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The question is whether the player should take both boxes, or only box B.

I first saw this problem many years ago but didn’t have a strong opinion. Now it seems clear that the controversy is about the definition of “reliable predictor”. This is usually left vague, leading to many unreliable philosophical and game-theory arguments. As usual, I will try to solve the problem using physics. Interestingly, the analysis is different for a classical versus quantum predictor, and also depends on the interpretation of quantum mechanics.

Classical predictor

Assume the predictor is a classical supercomputer that, at prediction time, takes as input the state of the player and all the objects that they interact with until the decision. Call this state S_i. By running the physics forward, it arrives at either a state S_{AB} or S_B, corresponding to the decision to take both boxes or only box B, respectively. Since classical evolution is deterministic, the prediction is always right: a player who takes both boxes always finds box B empty. In this case, one should obviously take only box B.

Quantum predictor

In the quantum case, the initial wavefunction of the player/etc is \psi_i. The computer cannot measure the wavefunction directly due to the no-cloning theorem. Instead, one way to make the prediction is as follows. The decision to take both boxes corresponds to a set of orthonormal states \{\psi_{AB}\}, and likewise for \{\psi_B\}. These two sets are mutually orthogonal and together form a complete basis, since there are only two choices. Given these sets, the computer can run Schrödinger’s equation back to prediction time to obtain the sets \{\psi_{ABi}\}=e^{i H t}\{\psi_{AB}\} and \{\psi_{Bi}\}=e^{i H t}\{\psi_B\}, respectively. These are also mutually orthogonal due to unitarity. At prediction time, it can measure the projection operator

\displaystyle P_{B}=\sum_a |\psi_{Bi}^a\rangle \langle\psi_{Bi}^a|.

The measurement gives 1 (take box B) with some probability p, and 0 (take both boxes) with probability 1-p. This collapses the player’s wavefunction to one of the states in \{\psi_{ABi}\} or \{\psi_{Bi}\}, which then evolves into a state in \{\psi_{AB}\} or \{\psi_B\}. Thus, from the predictor’s perspective, the predictor is always right.

The player models this measurement as the predictor becoming entangled with the player, so that the total wavefunction is something like

\displaystyle \sqrt{p}(\psi_{Bi}\otimes \psi_\text{predictB}) + \sqrt{1-p}(\psi_{ABi}\otimes\psi_\text{predictAB}).

If the player only makes a measurement at decision time, they will collapse the wavefunction to a state in \{\psi_{B}\} with probability p, or a state in \{\psi_{AB}\} with probability 1-p. We assume that this is the measurement basis since the player’s state should not become a superposition of (take B only) and (take both). The expected value is then simply:

\displaystyle E[p] = p B + (1-p)A = A+p(B-A)

where A=\text{\$1,000}, B=\text{\$1,000,000}. This is maximized at p=1, so the best decision is to take only box B, just as in the classical case.
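The expected value is simple enough to tabulate. In this sketch the function name is my own, and it assumes, as above, that the predictor’s measurement always matches the player’s eventual choice.

```python
A = 1_000       # contents of box A, in dollars
B = 1_000_000   # contents of box B when "take B only" was predicted


def expected_payout(p):
    """Expected winnings when the player ends up taking only box B with
    probability p and both boxes (finding B empty) with probability 1 - p."""
    return p * B + (1 - p) * A


payouts = {p: expected_payout(p) for p in (0.0, 0.5, 1.0)}
```

The payout rises linearly from $1,000 at p=0 to $1,000,000 at p=1, so nothing beats p=1.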

Where we go from here depends on the interpretation of quantum mechanics. For many-worlds, there is only unitary evolution. The player ends up in the branch \psi_{B}\otimes \psi_\text{predictB} with probability p, giving the expected value above.

However, for Copenhagen-type interpretations where different observers can use different wavefunctions, the player can do better, since they are free to make any measurements between prediction and decision time, while the predictor assumes unitary evolution1. In fact, they can make the predictor predict (take B only) with certainty, while they actually take both with certainty. One way is as follows. Assume the player makes the decision based on measuring a qubit at decision time, where |\uparrow\rangle means take B only and |\downarrow\rangle means take both. The state of the qubit oscillates between |\uparrow\rangle and |\downarrow\rangle with period T, where T is the time between prediction and decision. At prediction time, assume the state is |\uparrow\rangle, so the predictor predicts (take B only). At time T/2, the player can make repeated measurements very quickly until decision time. The qubit stays in the |\downarrow\rangle state due to the quantum Zeno effect. Thus, at decision time, the player takes both boxes. The extra $1,000 can then contribute to funding the delicate and expensive equipment needed for the qubit.
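To get a feel for how quickly the repeated measurements need to come, here is a small estimate. It assumes the qubit state is \cos(\pi t/T)|\uparrow\rangle + \sin(\pi t/T)|\downarrow\rangle up to a phase, which is one simple way to realize an oscillation with period T, and that the player makes n equally spaced ideal measurements between T/2 and T.

```python
import math


def survival_probability(n_measurements):
    """Probability that the qubit is found 'down' at every one of n equally
    spaced measurements between T/2 and T (the last one at decision time)."""
    # Between measurements the state rotates by pi * dt / T with
    # dt = (T / 2) / n, i.e. by an angle of pi / (2 n).
    angle = math.pi / (2 * n_measurements)
    # Each measurement finds 'down' again with probability cos^2(angle).
    return math.cos(angle) ** (2 * n_measurements)


single = survival_probability(1)       # one measurement at T: essentially 0
frequent = survival_probability(1000)  # close to 1
```

With a single measurement at decision time the qubit has rotated all the way back to |\uparrow\rangle, while a thousand measurements keep it in |\downarrow\rangle with better than 99% probability.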

We can take this one step further in some cases. For human players, the knowledge of the measurement protocol is classically encoded in the player’s brain in some way. If the supercomputer can decode this information instead of merely running the time evolution, it can also predict which measurements the player makes, and the probabilities of the subsequent results. We arrive back at the original case, where the best solution is to pick B only. However, such an encoding is not required by the postulates of quantum mechanics: the observer’s decision to make measurements on its state does not necessarily have to be encoded in its state itself.

Real predictor

In the real world, there are no such supercomputers, and no entity would risk $1,000,000 on a meaningless game. The best answer is to take both boxes.

1 In practice, a human’s measurements of their own state occur long after decoherence, so they have no control of their wavefunction in this way. However, if we are assuming all-powerful supercomputers, we may as well go all the way.

Fundamentals of classical mechanics, or why F = ma

Despite its simplicity, classical mechanics is not taught well in the typical physics curriculum. This is unfortunate because the general philosophy of constructing Lagrangians based on symmetries underlies all of modern physics. In this article, I explain basic Lagrangian mechanics in a systematic way starting from fundamental physical principles. It basically follows Landau and Lifshitz Vol. 1 but ties up some loose ends.

Principle of stationary action

Classical mechanics describes the motion of objects modeled as point particles. First, consider a single particle in empty space. At any given time, it has a position \vec x(t) and velocity \vec v(t)=\frac{d\vec{x}}{dt}.

Define a quantity S_{if}\{\vec x(t)\}, called the action, that depends on the path of the particle \vec x(t) from time t_i to t_f. The principle of stationary action, or action principle, states that the path the particle actually takes is one where the action is stationary under small perturbations in the path \vec x(t) \rightarrow \vec x(t) + \vec{\delta x}(t).

To elaborate, consider dividing the time interval from t_i to t_f into N segments, and take N\rightarrow \infty in the end. You may think of S_{if} as a function of many variables \{\vec{x}(t_i),t_i,\vec{x}(t_i+\Delta t),t_i+\Delta t,\cdots, \vec{x}(t_f), t_f\}, where \Delta t = (t_f-t_i)/N. (Note that the velocity \vec{v}(t) = \frac{\vec{x}(t+\Delta t)-\vec{x}(t)}{\Delta t}, so it is not an independent variable here.) Such a “function of a function” is called a functional. The principle of stationary action is then \frac{\delta S_{if}}{\delta x_k(t)}=0, i.e. the partial derivative of S_{if} with respect to any component x_k of the position at any time t is zero. The \delta symbol is generally used instead of \partial for functional derivatives.

Finally, the action principle only applies to perturbations that are zero at the boundaries: \vec{\delta x}(t_i) = \vec{\delta x}(t_f) = 0. This will become important later.

The Lagrangian

Consider the action S_{12} for time t_1 to t_2, and the action S_{34} for time t_3 to t_4, with t_1 < t_2 < t_3 < t_4. We require locality in time, meaning that a perturbation in the first interval only affects S_{12} and not S_{34}. Also, we assume additivity of the action: S_{12}+S_{23}=S_{13}. These conditions imply that S_{12} can be written as an integral from t_1 to t_2 of some quantity: S_{12}=\int_{t_1}^{t_2} dt\, \mathcal{L}(\vec{x}(t),\vec{v}(t), t). \mathcal{L}(\vec{x}(t),\vec{v}(t), t) is known as the Lagrangian. In general, it may depend on the position and velocity at time t, as well as the time t itself1.

Note that we may add a total time derivative \frac{df}{dt}(\vec{x},t) to the Lagrangian without affecting the principle of stationary action. Such a term produces the action:

\displaystyle\int_{t_i}^{t_f} dt\frac{df}{dt}(\vec{x},t) = f(\vec{x}(t_f), t_f)-f(\vec{x}(t_i), t_i)

by the fundamental theorem of calculus. The perturbation \vec{\delta x}(t) is zero at the boundaries by definition, so does not affect this action.

Let us now derive the form of the Lagrangian based on some other fundamental principles:

Homogeneity of space and time. No point in space or time is any different from any other, so the Lagrangian cannot depend on \vec{x} or t explicitly.

Isotropy of space. No direction in space is different from any other, so the Lagrangian can only depend on the magnitude (squared) of the velocity \vec{v}(t)^2.

Galilean invariance. The theory should be invariant under shifts by a constant velocity, \vec{x}\rightarrow \vec{x}+\vec{v}_0 t. In other words, there is no universal stationary frame of reference. Taking the time derivative, this is \vec{v}\rightarrow \vec{v}+\vec{v}_0. To first order in \vec{v}_0, the Lagrangian changes as

\displaystyle\mathcal{L}(\vec{v}^2)\rightarrow \mathcal{L}(\vec{v}^2+2\vec{v}\cdot \vec{v}_0) = \mathcal{L}(\vec{v}^2)+2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0

The term 2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0 will not affect the physics if it is a total time derivative of the form above. This only occurs if \frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) is a constant. Call this constant \frac{1}{2} m. Thus, the Lagrangian for a single particle in free space is: \mathcal{L} = \frac{1}{2} m \vec{v}^2. The constant m is, of course, the mass.
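As a quick consistency check (a worked step added here, not part of the derivation in Landau and Lifshitz as quoted), plug \mathcal{L} = \frac{1}{2} m \vec{v}^2 back into the Galilean shift. Assuming m and \vec{v}_0 are constant, the extra terms are indeed a total time derivative:

\displaystyle \frac{1}{2}m(\vec{v}+\vec{v}_0)^2 - \frac{1}{2}m\vec{v}^2 = m\vec{v}\cdot\vec{v}_0 + \frac{1}{2}m\vec{v}_0^2 = \frac{d}{dt}\left(m\vec{x}\cdot\vec{v}_0 + \frac{1}{2}m\vec{v}_0^2\, t\right)

so the equations of motion are unchanged, in this case even beyond first order in \vec{v}_0.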

To summarize, we derived the unique action and Lagrangian (up to a total time derivative) for a single particle from the following postulates:

  1. Locality in time
  2. Additivity of the action
  3. Homogeneity of space and time
  4. Isotropy of space
  5. Galilean invariance

Multiple particles

Now consider the n-particle case. The Lagrangian may generally depend on all the positions and velocities \vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n. Following the postulates above, it must take the form2:

\displaystyle \mathcal{L} = \left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

where the function U(\Delta \vec{x}_{ij}) depends on all the separations between the particles \{\Delta\vec{x}_{12} = \vec{x}_1-\vec{x}_2, \Delta\vec{x}_{13} =\vec{x}_1-\vec{x}_3, \cdots\}.

Euler-Lagrange equations

Let us now apply the principle of stationary action to the action:

\displaystyle S=\int dt\left[\left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})\right]

Plugging in the variation \vec{x}_i\rightarrow \vec{x}_i+\vec{\delta x}_i for particle i, and expanding to first order in \vec{\delta x}_i, we get:

\displaystyle S\rightarrow S+ \int dt\left(m_i \vec{v}_i\cdot \vec{\delta v}_i - \nabla_i U \cdot \vec{\delta x}_i\right)

where \nabla_i U is the gradient of U with respect to \vec{x}_i. Using \vec{\delta v}=\frac{d}{dt}\vec{\delta x}, we can integrate the first term by parts, discarding the boundary term m_i \vec{v}_i\cdot \vec{\delta x}_i since \vec{\delta x}_i= 0 at the boundaries. We obtain:

\displaystyle \frac{\delta S}{\delta \vec{x}_i(t)}=-m_i \vec{a}_i(t)-\nabla_i U(t) = 0

where \vec{a}_i = \frac{d\vec{v}_i}{dt}. The equations obtained using the action principle are known as Euler-Lagrange equations or equations of motion. In this case, we have found Newton’s law for a conservative potential:

\displaystyle \vec{F}_i = -\nabla_i U=m_i \vec{a}_i
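The link between stationary action and Newton's law can also be checked numerically. The following is a minimal Python sketch, assuming a single particle in one dimension with U = m g x (uniform gravity) and the time-sliced action described earlier. The function names, step count, and initial velocity are illustrative choices.

```python
def discrete_action(xs, dt, m=1.0, g=9.8):
    """Time-sliced action for L = (1/2) m v^2 - m g x in one dimension."""
    action = 0.0
    for k in range(len(xs) - 1):
        v = (xs[k + 1] - xs[k]) / dt  # finite-difference velocity
        action += (0.5 * m * v * v - m * g * xs[k]) * dt
    return action


def action_gradient(xs, dt, eps=1e-4):
    """Central-difference derivative of the action w.r.t. each interior point."""
    grad = []
    for k in range(1, len(xs) - 1):  # endpoints are held fixed
        up = list(xs)
        up[k] += eps
        down = list(xs)
        down[k] -= eps
        grad.append((discrete_action(up, dt) - discrete_action(down, dt)) / (2 * eps))
    return grad


n_steps, total_time, g = 50, 1.0, 9.8
dt = total_time / n_steps
times = [k * dt for k in range(n_steps + 1)]

# Path obeying m a = -m g, and a straight line with the same endpoints
newton_path = [2.0 * t - 0.5 * g * t * t for t in times]
straight_path = [
    newton_path[0] + (newton_path[-1] - newton_path[0]) * t / total_time
    for t in times
]

max_grad_newton = max(abs(d) for d in action_gradient(newton_path, dt))
max_grad_straight = max(abs(d) for d in action_gradient(straight_path, dt))
print(max_grad_newton, max_grad_straight)
```

The gradient vanishes (up to rounding) on the Newtonian parabola but not on the straight line, which is the discrete version of \frac{\delta S}{\delta \vec{x}_i(t)}=0 selecting the physical path.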

Beyond classical mechanics

Finally, it is interesting to see how the postulates above are modified in quantum and relativistic theories.

  1. Principle of stationary action. In quantum physics, the particle takes all paths instead of only the classical one! The quantum amplitude is given by summing up e^{i S\{x\}} over all paths. This is known as a path integral.
  2. Locality in time gets promoted to locality in space and time in field theory.
  3. Additivity of the action remains the same.
  4. Homogeneity of space and time remains the same.
  5. Isotropy of space remains the same.
  6. Galilean invariance is promoted to Lorentz invariance in relativity. Lorentz transformations relate space and time.

In modern theories, there are often additional symmetry principles that constrain the Lagrangian, such as gauge invariance and conformal invariance.

1 It also cannot depend on higher time derivatives due to the Ostrogradsky instability.

2 A term like \vec{v}_i\cdot \vec{v}_j with i \neq j is possible, but would imply that particles infinitely far away can affect each other, violating common sense (or, if you like, the cluster decomposition principle).

Physics textbooks for self-study

Here are some physics textbooks that I’ve read over the years. Each textbook is rated from 1 to 5 Diracs on quality for self-study. Most topics are divided into (basic) and (advanced).

Figure 1. Areas of physics (biased toward high-energy theory). Special relativity and electromagnetism can be learned separately but complement each other. “Weak prerequisites” are math subjects that can usually be learned as you go along.

Tips for self-study:

  • Shorter is better when it comes to textbooks. The problem with self-study is missing the forest for the trees. Most textbooks can give you the details, but there is no one to explain how to fit the information in your head in a compact and memorable way. Shorter books are usually better for this. The flip side is that shorter books are harder to understand if you have no past exposure. Start by reading parts of a standard textbook to get the basics, then go back.
  • Do enough exercises. But don’t feel the need to do every single one before moving on, even if you are a little confused. It can be more efficient to just keep going, since physics is interconnected and the new material often clarifies the old.
  • Write notes in the margins of any confusing aspects of derivations or errata you discover. These will undoubtedly help you when you revisit them years later.

Personal (controversial) opinions:

  • Avoid mathematical physics-oriented books. When I started out, I thought more rigor could never hurt. But if you are interested in physics, learn physics. Math books often dwell on excessive formalism that is irrelevant for physics at the end of the day.
  • Amazon ratings are useless. Unless they’re really terrible, most books will have very good ratings. I suspect most reviewers used the book for a class, are already experts on the subject, or simply want to look smart. 🙃

Quantum mechanics (basic)

Griffiths, Introduction to Quantum Mechanics (5/5 Diracs)

I start by contradicting my own advice about shorter books. 😀  This is a long but very readable book that is even worth reading from cover to cover. There is a reason this is the standard textbook in many places. One tends to forget how much it covers: statistical mechanics, spontaneous and stimulated emission, band structure, WKB approximation… Not in great detail, but often enough.

Quantum mechanics (advanced)

Weinberg, Lectures on Quantum Mechanics (4/5 Diracs)

Weinberg’s books are known for their slow and systematic presentation. If you’re in a rush, my recommendation is to just read chapters 3 and 4, which contain the essentials of quantum mechanics and spin and are relatively self-contained.

Linear algebra (basic)

Strang, Introduction to Linear Algebra (5/5 Diracs)

Actually, I suggest the lectures instead of the book. One relaxing 45-minute lecture a day and you’ll know linear algebra in a month.

Classical mechanics (advanced)

Landau and Lifshitz, Mechanics (5/5 Diracs)

The Russian school excels at explaining things deeply and simply. The first two chapters contain the best exposition of classical mechanics there is. In my experience, even professional physicists are often confused by some foundational topics that are explained here. (For example, where does the Lagrangian \frac{1}{2}mv^2 come from? Answer: Homogeneity+isotropy of space, and Galilean invariance. Together with the principle of stationary action, this leads to F=ma.) If you’ve never seen a Lagrangian before, start with one of the numerous intros, like this one.

Special relativity (SR)/Electromagnetism (advanced)

Landau and Lifshitz, The Classical Theory of Fields (5/5 Diracs)

Amusingly, this does not actually cover the simplest classical field theories (scalar fields) since the only relevant classical fields in practice are the electromagnetic and gravitational. Chapters 1-4 are an excellent exposition of SR and how E&M fits into it, while chapters 10-12 are a decent introduction to general relativity that complements other texts.

General relativity (GR)

Dirac, General Theory of Relativity (5/5 Diracs)

Who said GR is hard to understand? This pamphlet by the big man himself weighs in at only 69 pages. Unlike most books, it explains curved spacetime as a surface embedded in a higher dimensional space with flat metric. In my view, this is the most intuitive way to understand it. Among other things, it leads to the covariant derivative as the projection of the directional derivative onto the tangent space, a very pleasing interpretation of an otherwise confusing concept.

No exercises though. So as an introduction, you will want:

Zee, Einstein Gravity in a Nutshell (5/5 Diracs)

This is the book I wish I had when starting GR. Zee is one of the most gifted physics expositors of our time. Unfortunately, it is rather long, so I would recommend reading just enough of it to understand Dirac, then returning to it for special topics.

Carroll, Spacetime and Geometry: An Introduction to General Relativity (2/5 Diracs)

This was my first exposure to GR. I got through about chapter 3 before getting confused and stopping. This is one of those mathematical physics books I mentioned above, with a lot of formalism surrounding manifolds, tensors, and differential forms at the outset. It is good to know eventually, but not what you need as an introduction. I suppose it would make a good reference, but Zee’s book also serves well in this regard.

Quantum field theory

The subjects above are all well-established and have a fairly defined “core”. On the other hand, QFT is an evolving field with a sprawling mess of important results. Each textbook emphasizes different aspects, so you will need multiple books.

Zee, Quantum Field Theory in a Nutshell (5/5 Diracs)

This was my first and favorite QFT book. Other textbooks have more detail, but none will make you fall in love with the subject like this one. Just get it and enjoy the magic of the path integral.

Schwartz, Quantum Field Theory and the Standard Model (4/5 Diracs)

This is a very thorough textbook, perhaps the modern successor to the classic Peskin and Schroeder. I particularly enjoyed the bottom-up construction of spin 1 and 2 Lagrangians in chapter 8. One criticism is that many calculations are rather clunky and involved. For example, scalar QED is heavily used, which is conceptually simpler but involves more diagrams than spinor QED. I prefer Zee’s approach of just starting with spinor QED.

(Also, his notation with all indices on the same level bugs me…)

Srednicki, Quantum Field Theory

No rating for this one since I haven’t read it in much detail. The first chapter (“Attempts at relativistic quantum mechanics”) is an excellent motivation for QFT. The chapters are short and to the point. If I could start over, I would probably read this one concurrently with Zee.

Group theory

Zee, Group Theory in a Nutshell for Physicists (5/5 Diracs)

For those like me who get bored to death reading pure math textbooks, Zee’s usual colloquial style makes even classifying representations of finite groups exciting. Not absolutely necessary to read if you’re in a hurry to learn more physics, but still a joy.

Advanced resources

Once you have a grasp of the areas above, additional topics can be learned without having to rearrange your entire worldview (with the possible exception of string theory). Here are some of my favorite advanced resources.

Shifman, Advanced Topics in Quantum Field Theory

Despite the title, this book focuses on simple explanations of modern topics without arduous derivations. Some interesting results cannot be found elsewhere, e.g. that domain walls antigravitate!

Terning, Modern Supersymmetry: Dynamics and Duality

This is a compact volume on supersymmetric field theory. The first three chapters are quite good, but I found some explanations in later chapters hard to understand. A better intro to Young tableaux is found here.

Polchinski, String Theory Vols. 1 and 2

This labor of love by the father of D-branes himself covers pre-AdS/CFT string theory. It seems to be the standard textbook on the subject, for good reason. The explanations are clear and the text contains many invaluable exercises. His passion for the topic is evident throughout.

Hartman, Lecture notes on quantum gravity and black holes

Not a textbook, but a good set of lecture notes by Tom Hartman. Explores many contemporary topics that have yet to make it into any textbooks I know of. Many useful exercises are included.

Notes on gravity as a gauge theory

Gravity has often been called a gauge theory of the Poincaré or Lorentz group. Here, I develop general relativity in direct analogy to Yang-Mills theory, avoiding geometry entirely1. None of this is original, but I have tried to simplify the presentation compared to the literature, where the similarities and differences between the two theories are often unclear.

Gauge fields and field strengths

The Poincaré algebra is:

[P_a, P_b] = 0

[P_a, M_{bc}]=\eta_{ab}P_c - \eta_{ac} P_b

[M_{ab}, M_{cd}] = \eta_{ad}M_{bc}+\eta_{bc}M_{ad} - \eta_{bd}M_{ac}-\eta_{ac}M_{bd}

Roman letters a,b, \cdots are gauge indices, while Greek letters \mu, \nu, \cdots are coordinate indices. We use “mathematician’s convention” for the generators where the i is absorbed: T_{math}=i T_{physics}. We proceed just as in Yang-Mills theory, taking the Poincaré group as the gauge group. It has 10 generators: 4 translations P_a and 6 rotations/boosts M_{ab}.
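The commutation relations above, with these sign conventions, can be checked against an explicit matrix representation. The sketch below is one possible check, not taken from the literature cited here. It assumes the metric signature (+,-,-,-) (the relations hold for either sign choice), the vector representation (M_{ab})^\mu{}_\nu = \delta^\mu_a \eta_{b\nu} - \delta^\mu_b \eta_{a\nu}, and translations acting on 5-component vectors (x^\mu, 1). All function names are illustrative.

```python
from itertools import product

DIM = 4
ETA_DIAG = (1, -1, -1, -1)  # assumed signature; the check also passes for (-,+,+,+)


def eta(a, b):
    return ETA_DIAG[a] if a == b else 0


def zeros():
    return [[0] * (DIM + 1) for _ in range(DIM + 1)]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(*terms):
    """Linear combination of matrices given as (coefficient, matrix) pairs."""
    out = zeros()
    for coeff, mat in terms:
        for i, j in product(range(DIM + 1), repeat=2):
            out[i][j] += coeff * mat[i][j]
    return out


def comm(x, y):
    return combine((1, matmul(x, y)), (-1, matmul(y, x)))


def P(a):
    """Translation generator: shifts x^a when acting on (x, 1)."""
    m = zeros()
    m[a][DIM] = 1
    return m


def M(a, b):
    """Rotation/boost generator in the vector representation (upper-left 4x4 block)."""
    m = zeros()
    for nu in range(DIM):
        m[a][nu] += eta(b, nu)
        m[b][nu] -= eta(a, nu)
    return m


def check_poincare_algebra():
    idx = range(DIM)
    for a, b in product(idx, repeat=2):
        if comm(P(a), P(b)) != zeros():
            return False
    for a, b, c in product(idx, repeat=3):
        rhs = combine((eta(a, b), P(c)), (-eta(a, c), P(b)))
        if comm(P(a), M(b, c)) != rhs:
            return False
    for a, b, c, d in product(idx, repeat=4):
        rhs = combine(
            (eta(a, d), M(b, c)), (eta(b, c), M(a, d)),
            (-eta(b, d), M(a, c)), (-eta(a, c), M(b, d)),
        )
        if comm(M(a, b), M(c, d)) != rhs:
            return False
    return True


print(check_poincare_algebra())
```

All entries are integers, so the comparison is exact. Because the generators are real matrices with no factor of i, this representation corresponds to the “mathematician’s convention” used here.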

Introduce the covariant derivative:

\displaystyle D_\mu = \partial_\mu - e_\mu^a P_a - \frac{1}{2}\omega^{ab}_\mu M_{ab}

where e_\mu^a (the vielbein) and \omega^{ab}_\mu (the spin connection) are the gauge fields associated with translations and rotations, respectively. Note the units: P_a has mass dimension 1, so e_\mu^a is dimensionless, while M_{ab} is dimensionless, so \omega^{ab}_\mu has mass dimension 1. We can take \omega^{ab}_\mu to be antisymmetric in ab since M_{ab} is antisymmetric. The field strengths are found in the usual way:

\begin{aligned} F_{\mu\nu}&=D_\mu D_\nu - D_\nu D_\mu \\ &= -C^a_{\mu\nu}P_a -\frac{1}{2}R^{ab}_{\mu\nu}M_{ab} \end{aligned}

where we have defined the field strengths C^a_{\mu\nu} (the torsion) and R^{ab}_{\mu\nu} (the curvature tensor).
We obtain:

C^a_{\mu\nu}=\partial_\mu e^a_\nu - \partial_\nu e^a_\mu - \omega^{a}_{\mu b} e^b_\nu + \omega^{a}_{\nu b} e^b_\mu

R^{ab}_{\mu\nu}=\partial_\mu \omega_\nu^{ab} - \partial_\nu \omega_\mu^{ab}-\omega_\mu^{ac}\omega_{\nu c}^{\;\;\;\;b} + \omega_\nu^{ac}\omega_{\mu c}^{\;\;\;\;b}

As usual, we raise and lower indices using \eta_{ab} and \eta^{ab}.

General relativity is obtained by setting the torsion C^a_{\mu\nu}=0. Certainly, theories with torsion have been extensively considered, but we will not do so here. Experimental data have not ruled out theories involving both torsion and curvature. However, the bottom-up construction of the Lagrangian of an interacting massless spin-2 particle produces general relativity2.

This constraint allows us to solve for the spin connection in terms of the vielbein. After some calculation (e.g. listing out all possible terms and matching coefficients), the answer is:

\displaystyle \omega_\mu^{ab}=\frac{1}{2}(e^{\rho b}\partial_\mu e_\rho^a-e^{\rho a}\partial_\mu e_\rho^b+ e^{\rho a} e^{\sigma b} \partial_\rho g_{\mu\sigma}-e^{\rho b}e^{\sigma a}\partial_\rho g_{\mu\sigma} )

where g_{\mu\nu}=e_\mu^a \eta_{ab} e_\nu^b.

Representations and Lagrangians

Just as in Yang-Mills theory, the Poincaré group here acts as an internal symmetry group. Fields transform as a finite-dimensional representation of the Lorentz algebra, and transform trivially under translations3: P_a=0. This has an important consequence for constructing Lagrangians. Recall that the gauge field A_\mu(x)=A^a_\mu(x) T^a in Yang-Mills theory transforms as

A_\mu\rightarrow U A_\mu U^{-1} + (\partial_\mu U) U^{-1}

under a gauge transformation U(x). The (\partial_\mu U) U^{-1} is required to cancel out the (\partial_\mu U)\phi(x) in the transformation of \partial_\mu\phi(x). However, since P_a=0, this additional term is not needed. e^a_\mu is already gauge-covariant and can be placed directly in the Lagrangian.

The simplest term is:

\displaystyle \mathcal{S}_\Lambda = \frac{\Lambda}{4!} \int \epsilon_{abcd}e^a e^b e^c e^d

where e^a=e^a_\mu dx^\mu is a 1-form and \epsilon_{abcd} is the totally antisymmetric symbol4. This is the cosmological constant. It is equivalent to the standard form \Lambda\int d^4 x \sqrt{-g}.

On the other hand, the spin connection \omega^{ab}_\mu does show up in the gauge transformation, so we must use the field strength R^{ab}_{\mu\nu} in the Lagrangian. The next simplest term is then:

\displaystyle \mathcal{S}_{EH} = \frac{M_{Pl}^2}{3}\int \epsilon_{abcd} e^a e^b R^{cd}

where R^{cd}=R^{cd}_{\mu\nu}dx^\mu dx^\nu is a 2-form. This is the Einstein-Hilbert action. Unlike Yang-Mills, we are permitted a term that is only linear in the field strength R^{cd}.

Coupling to matter fields

Flat-space Lagrangians contain terms with global Lorentz indices, such as \partial_\mu \varphi and A_\mu. We would like these to transform under the local Lorentz group with indices a, b, \cdots. The only object that can switch between global and local indices is e_\mu^a, or its inverse, e^\mu_a. Thus, the general prescription for coupling a flat-space Lagrangian to gravity is:

  1. Contract all tensors with e_\mu^a or e^\mu_a.
  2. Make flat-space invariants use local indices: \eta_{\mu\nu}\rightarrow \eta_{ab}, \epsilon_{\mu\nu\rho\sigma}\rightarrow \epsilon_{abcd}.
  3. Use covariant derivatives: \partial_\mu\rightarrow \partial_\mu-\frac{1}{2}\omega_\mu^{ab} M_{ab}.

Note that this even works on the volume form d^4 x, producing the familiar invariant measure d^4 x \sqrt{-g}:

\displaystyle d^4 x = \frac{1}{4!}\epsilon_{\mu\nu\rho\sigma} dx^\mu dx^\nu dx^\rho dx^\sigma \rightarrow \frac{1}{4!} \epsilon_{abcd} e^a_\mu e^b_\nu e^c_\rho e^d_\sigma dx^\mu dx^\nu dx^\rho dx^\sigma
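To see why this reproduces \sqrt{-g}, note that the \epsilon contraction of four vielbeins is the determinant of e_\mu^a, while g_{\mu\nu}=e_\mu^a \eta_{ab} e_\nu^b gives \det g = -(\det e)^2. The following Python sketch checks this on a sample integer vielbein. The sample matrix and the signature (+,-,-,-) are assumptions made only for illustration.

```python
from itertools import permutations

ETA_DIAG = (1, -1, -1, -1)  # assumed signature; det(eta) = -1 either way


def parity(perm):
    """Sign of a permutation, computed by sorting it with swaps."""
    p, sign = list(perm), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign


def det_levi_civita(m):
    """Determinant as a sum over permutations (the epsilon contraction)."""
    total = 0
    for perm in permutations(range(len(m))):
        term = parity(perm)
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def metric(e):
    """g_{mu nu} = e_mu^a eta_{ab} e_nu^b for a vielbein stored as e[mu][a]."""
    return [
        [sum(e[mu][a] * ETA_DIAG[a] * e[nu][a] for a in range(4)) for nu in range(4)]
        for mu in range(4)
    ]


# Hypothetical sample vielbein (lower triangular, integer entries)
vielbein = [
    [2, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 3, 1, 0],
    [0, 0, 0, 5],
]

det_e = det_levi_civita(vielbein)
det_g = det_levi_civita(metric(vielbein))
print(det_e, det_g)
```

Since \det g = -(\det e)^2, the measure \sqrt{-g} equals |\det e|, matching the \epsilon_{abcd} e^a e^b e^c e^d form for a vielbein with positive determinant.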

For example, a scalar field coupled to gravity has the action:

\displaystyle \mathcal{S} = \frac{1}{2\cdot 4!}\int \epsilon_{bcdf} e^b e^c e^d e^f (e_a^\mu e^{\nu a} \partial_\mu\varphi\partial_\nu\varphi - m^2 \varphi^2)

An advantage of the vielbein formalism is that spinors can be coupled to gravity. For Dirac spinors, the Dirac matrices should also be converted to local indices \gamma^\mu\rightarrow \gamma^a, since they satisfy the Clifford algebra \{\gamma^a,\gamma^b\}=2\eta^{ab}. The Lagrangian for a massless fermion becomes:

\mathcal{L}=i\bar\Psi \gamma^a e_a^\mu \left(\partial_\mu-\frac{1}{2}\omega^{bc}_\mu M_{bc}\right)\Psi

where the Lorentz generators in the spinor representation are

\displaystyle M_{ab}=S_{ab}=\frac{1}{4}[\gamma_a,\gamma_b]
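One can verify that S_{ab} obeys the same Lorentz algebra as M_{ab} above. The Python sketch below does this with explicit matrices. It assumes the Dirac representation of the gamma matrices and the signature (+,-,-,-), neither of which is fixed by the text, and all helper names are illustrative.

```python
from itertools import product

ETA_DIAG = (1, -1, -1, -1)  # assumed signature


def eta(a, b):
    return ETA_DIAG[a] if a == b else 0


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def combine(*terms):
    """Linear combination of 4x4 matrices given as (coefficient, matrix) pairs."""
    out = [[0] * 4 for _ in range(4)]
    for coeff, mat in terms:
        for i, j in product(range(4), repeat=2):
            out[i][j] += coeff * mat[i][j]
    return out


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i, j in product(range(4), repeat=2))


def blocks(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[0] + tr[0], tl[1] + tr[1], bl[0] + br[0], bl[1] + br[1]]


def neg(x):
    return [[-v for v in row] for row in x]


I2, Z2 = [[1, 0], [0, 1]], [[0, 0], [0, 0]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
IDENTITY = blocks(I2, Z2, Z2, I2)

# Dirac representation with upper indices, then lowered with the diagonal metric
GAMMA_UP = [blocks(I2, Z2, Z2, neg(I2))] + [blocks(Z2, s, neg(s), Z2) for s in PAULI]
GAMMA = [combine((ETA_DIAG[a], GAMMA_UP[a])) for a in range(4)]


def S(a, b):
    """S_ab = (1/4) [gamma_a, gamma_b]."""
    return combine((0.25, matmul(GAMMA[a], GAMMA[b])), (-0.25, matmul(GAMMA[b], GAMMA[a])))


def check_clifford():
    for a, b in product(range(4), repeat=2):
        anti = combine((1, matmul(GAMMA_UP[a], GAMMA_UP[b])), (1, matmul(GAMMA_UP[b], GAMMA_UP[a])))
        if not close(anti, combine((2 * eta(a, b), IDENTITY))):
            return False
    return True


def check_lorentz_algebra():
    for a, b, c, d in product(range(4), repeat=4):
        lhs = combine((1, matmul(S(a, b), S(c, d))), (-1, matmul(S(c, d), S(a, b))))
        rhs = combine(
            (eta(a, d), S(b, c)), (eta(b, c), S(a, d)),
            (-eta(b, d), S(a, c)), (-eta(a, c), S(b, d)),
        )
        if not close(lhs, rhs):
            return False
    return True


print(check_clifford(), check_lorentz_algebra())
```

The second check uses the same right-hand side as the [M_{ab}, M_{cd}] relation in the Poincaré algebra, so S_{ab} can stand in for M_{ab} when the covariant derivative acts on spinors.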

1 This is ironic from a historical perspective, since Yang and Mills were inspired by general relativity. Of course, in physics, there are many ways to skin a cat.

2 Schwartz, Matthew. Quantum Field Theory and the Standard Model, Ch. 8.

3 Thus, you could say gravity is the gauge theory of the Lorentz group instead. However, we had to introduce the vielbein as part of the covariant derivative in order to get the correct theory. So there is a slight wrinkle in the analogy.

4 Unlike Yang-Mills theory, we cannot write the Lagrangian using the “abbreviated” fields e=e^a_\mu P_a dx^\mu. In fact, e e vanishes due to [P_a,P_b]=0.

Quantum mechanics explained

After being a strong believer in the many-worlds interpretation of quantum mechanics for years, I have now completely changed my mind. Many-worlds is seriously flawed, and the good old Copenhagen interpretation is not so bad.

Specifically, the correct interpretation of quantum mechanics is the Von Neumann-Wigner interpretation, a flavor of Copenhagen that puts the Heisenberg cut at the observer’s consciousness. The orthodox Copenhagen interpretation, which allows placing the cut at a physical measuring device, is a useful approximation due to decoherence.

What is physics?

Understanding quantum mechanics requires thinking carefully about what physics is and is not. The point of a physical theory is to make predictions about sensory experience. It is only about modeling the world if this helps to make predictions. Thus, the observer’s consciousness1 is just as fundamental as the mathematical objects of the theory. In classical physics, this is obscured because the mathematical objects of the theory are shared among all observers, rendering the observer apparently redundant. Quantum mechanics relaxes this assumption and allows different observers to use different mathematical objects (wavefunctions).

Quantum and classical compared

Let me elaborate on classical and quantum physics.

Classical mechanics describes a system of particles with positions and momenta that evolve in time under Newton’s law. Quantum mechanics is quite similar: it describes a system of particles with a field called the wavefunction that evolves in time under Schrödinger’s equation2. If that were the whole story, quantum mechanics would be pretty much the same as classical mechanics.

However, these are just mathematical constructs so far. How do we actually verify classical mechanics? We can only sense the set of particles corresponding to our body/brain, so we must find a way to cause the system of interest to interact with these particles. In other words, we must split the universe into system and observer3. Then we must assign different states of our state space to different perceptions corresponding to the results of a measurement.

This is exactly what happens in quantum mechanics as well. The difference is that quantum mechanics contains superposition states, while observers can only distinguish between orthogonal states. Thus, there must be a rule to say which orthogonal state in a superposition the observer actually perceives: Born’s rule4.

Why many-worlds fails

Many-worlds seems like a simple and attractive idea that accomplishes the goal: it tells you what an observer perceives using only unitary evolution of a global wavefunction, similar to classical physics. However, it is seriously flawed. Many-worlds models a measurement as follows:

\displaystyle \left(\sum_i c_i | s_i \rangle \right) \otimes |O_0\rangle \rightarrow \sum_i c_i |s_i'\rangle \otimes |O_i\rangle

where |s_i\rangle are the system basis states, |s_i'\rangle are the new system states for each |s_i\rangle, |O_0\rangle is the initial observer state and |O_i\rangle are the final observer states. The |s_i'\rangle are left arbitrary to include both destructive and non-destructive measurements. Measurement is complete upon decoherence, when \langle O_i|O_j\rangle \approx \delta_{ij}. Then the states |O_i\rangle are interpreted as the different perceptions of the observer.

This has several problems. In order of least to most serious:

1. Decoherence is never complete.

What happens in this case? Observers can only distinguish between orthogonal states. One idea is to rewrite the final wavefunction as a sum of direct products in some orthonormal observed basis |O_i''\rangle:

\sum_i c_i'' |s_i''\rangle \otimes |O_i''\rangle

Then the observed system states c_i'' |s_i''\rangle would simply be slightly different than the original ones c_i |s_i'\rangle, corresponding to a small error in the measurement.

2. It assumes the observer is not entangled with the system before measurement.

This is obviously false most of the time! Everything is usually entangled with everything else. To generalize the above, what we actually want is some rule for “hopping” between perceived states of the observer, given an arbitrary entangled state \psi(t). I invite you to come up with such a hopping rule. Seriously, try it.

For example, consider this plausible attempt at a hopping rule. The probability of hopping from state i at time t, to state j at time t+\Delta t, is:

p_{i\rightarrow j} = \displaystyle \frac{\text{tr}\left( P_j e^{-iH\Delta t} P_i \rho(t) P_i e^{iH\Delta t}\right)}{\text{tr}\left( P_i \rho(t)\right)}

where P_i is a projection operator corresponding to state i and \rho(t) is the density matrix5. This has the required property that \sum_j p_{i\rightarrow j} = 1, since \sum_i P_i = 1. This gives the same probabilities that would be observed if the state had collapsed to i at time t, but without actually collapsing the state. The problem is that the denominator can be zero, since there is a nonzero probability that the previous hop landed in the state i even if \text{tr}\left(P_i \rho(t)\right) = 0. The state actually has to collapse to ensure this doesn’t happen.
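To make the hopping rule concrete, here is a minimal Python sketch for a single qubit. It assumes H = \sigma_x in units where \hbar = 1, so that e^{-iH\Delta t} = \cos\Delta t - i\sigma_x\sin\Delta t, and takes P_0, P_1 to be the projectors onto the computational basis states. These choices and all names are illustrative.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(x):
    return [[complex(x[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(x):
    return x[0][0] + x[1][1]


PROJECTORS = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]


def evolution(theta):
    """exp(-i * sigma_x * theta), with theta playing the role of Delta t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]


def hop_probability(i, j, rho, theta):
    """p_{i->j} from the trial hopping rule; None when the denominator vanishes."""
    u = evolution(theta)
    p_i, p_j = PROJECTORS[i], PROJECTORS[j]
    evolved = matmul(u, matmul(p_i, matmul(rho, matmul(p_i, dagger(u)))))
    numerator = trace(matmul(p_j, evolved))
    denominator = trace(matmul(p_i, rho))
    if abs(denominator) < 1e-12:
        return None  # the rule is undefined here
    return (numerator / denominator).real


rho_plus = [[0.5, 0.5], [0.5, 0.5]]  # equal superposition of the two basis states
rho_zero = [[1, 0], [0, 0]]          # no weight on state 1

p00 = hop_probability(0, 0, rho_plus, 0.3)
p01 = hop_probability(0, 1, rho_plus, 0.3)
undefined = hop_probability(1, 0, rho_zero, 0.3)
print(p00, p01, undefined)
```

The two probabilities out of state 0 sum to 1, while the rule returns nothing usable when \text{tr}(P_i\rho(t)) = 0, which is exactly the failure mode described above.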

3. It assumes the many worlds never re-merge or overlap.

Consider the observer’s density matrix \rho_O(t)=\text{tr}_S(\rho(t)). The diagonal elements in the observed basis \rho_{Oii} = \langle O_i | \rho_O(t) | O_i\rangle are constantly evolving into each other, with \sum_i \rho_{Oii} = 1. A hopping rule is impossible because you cannot tell which previous state a certain \rho_{Oii} “came from” in the past, unless you assume each state comes from just one past state. This is clearly not true in general.
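For readers who want to see the object \rho_O explicitly, here is a small Python sketch of the partial trace for a pure state. The two-state system, two-state observer, and amplitudes are hypothetical choices for illustration.

```python
def observer_density_matrix(psi):
    """Trace out the system from a pure state.

    psi[s][o] is the amplitude for system state s and observer state o.
    """
    n_obs = len(psi[0])
    return [
        [sum(row[o1] * complex(row[o2]).conjugate() for row in psi) for o2 in range(n_obs)]
        for o1 in range(n_obs)
    ]


# a |up>|O_up> + b |down>|O_down> with hypothetical amplitudes
a, b = 0.6, 0.8j
psi = [[a, 0], [0, b]]

rho_o = observer_density_matrix(psi)
diagonal = [rho_o[k][k].real for k in range(2)]
print(diagonal, sum(diagonal))
```

The diagonal entries are the weights of the observer states and sum to 1. Nothing in them records which earlier weights they evolved from.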

Many-worlds proponents sometimes argue that macroscopic systems in different states are unlikely to revisit the same state. However, then one must pick a certain size (dimensionality) above which re-merging becomes “acceptably” unlikely. There is clearly no fixed size. For an exact theory of physics, one cannot ignore edge cases like this just because they are rare. Ironically, while many-worlds proponents like to point to the seemingly arbitrary nature of wavefunction collapse, it is many-worlds that places arbitrary restrictions on what systems can be considered observers.

Why Copenhagen is fine

The key insight of the Copenhagen interpretation (i.e. quantum mechanics itself) is that a global (objective) reality is not required to make predictions.

One way to understand this is with the Wigner’s friend thought experiment, which I have slightly extended below.

Wigner prepares his friend and a two-state system in a superposition state

(a|\uparrow\rangle + b|\downarrow\rangle)\otimes |\psi_{friend}\rangle

When the friend measures the system, he may obtain the state |\uparrow\rangle. He then tells Wigner his result, so that in the friend’s view, Wigner knows that |\uparrow\rangle was measured. However, Wigner models this measurement as the total state

a |\uparrow\rangle\otimes |\uparrow_{observed}\rangle + b |\downarrow\rangle\otimes |\downarrow_{observed}\rangle

When Wigner measures his friend (by asking him about it, perhaps), he may see a different state |\downarrow\rangle\otimes |\downarrow_{observed}\rangle, so he believes that |\downarrow\rangle was measured. Thus, they may both experience totally different things. But each observer sees an internally consistent story, so the theory is consistent. That’s it.

Measuring devices

This subjective view of physics implies that measurements are made on the observer’s Hilbert space, not on external measuring devices. Then why can some objects be considered classical measuring devices in practice? The answer comes down to decoherence. I will explain this in a somewhat roundabout way that highlights the behavior of real measuring devices.

Recall the textbook measurement postulate: a measurement collapses the system to an eigenstate of the measured Hermitian operator, with probability given by Born’s rule. This is often false in practice! For example, in quantum optics, a photodetector may measure the position of a photon, but it collapses the system to the state of “no photon”.

Real-world measurements are described by so-called general measurements6. These are defined by a set of operators M_i corresponding to the results of the measurement. The probability for result i is:

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

upon which the wavefunction collapses to

\displaystyle |\psi\rangle \rightarrow \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

The measurement operators satisfy the completeness relation

\sum_i M_i^\dagger M_i = 1

The M_i do not have to be Hermitian. For a photodetector, they would be something like M_\textbf{n}=|0\rangle\langle \textbf{n}|, where \textbf{n} labels properties of the photon, such as position and polarization. General measurements reduce to conventional (projective) measurements when the M_i are Hermitian and orthogonal projectors: M_i M_j = \delta_{ij} M_i.
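To make the photodetector example concrete, here is a minimal sketch on a two-level toy system (vacuum plus a single photon mode). The toy model, the operator names M_CLICK and M_SILENT, and the helper functions are illustrative assumptions, not part of any library.

```python
import math

# Toy photodetector on a two-level system (an illustrative assumption):
# index 0 is the vacuum |0>, index 1 is the one-photon state |1>.
M_CLICK = [[0, 1], [0, 0]]   # |0><1|: photon absorbed, detector clicks
M_SILENT = [[1, 0], [0, 0]]  # |0><0|: no photon, no click

def dagger_times(M):
    """Return M^dagger M for a square matrix given as nested lists."""
    n = len(M)
    return [[sum(complex(M[k][i]).conjugate() * M[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def measure(M, psi):
    """Probability of this result and the normalized post-measurement state."""
    out = [sum(M[i][k] * psi[k] for k in range(len(psi))) for i in range(len(M))]
    p = sum(abs(c) ** 2 for c in out)          # equals <psi| M^dagger M |psi>
    post = [c / math.sqrt(p) for c in out] if p > 0 else None
    return p, post

# Completeness: sum_i M_i^dagger M_i should be the identity.
completeness = [[a + b for a, b in zip(r1, r2)]
                for r1, r2 in zip(dagger_times(M_CLICK), dagger_times(M_SILENT))]

psi = [0.6, 0.8]                       # a|0> + b|1> with |a|^2 + |b|^2 = 1
p_click, post_click = measure(M_CLICK, psi)
p_silent, post_silent = measure(M_SILENT, psi)
print(completeness, p_click, post_click, p_silent, post_silent)
```

Both results leave the system in the vacuum state, even though M_CLICK is neither Hermitian nor a projector, which is exactly the behavior the projective postulate cannot describe.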

General measurements are equivalent to unitary interaction of a system with an ideal environment, followed by a projective measurement on the environment. Specifically, consider coupling the system to an environmental Hilbert space: \mathcal{H} = \mathcal{H}_s \otimes \mathcal{H}_e. The environment is initially in the state |0\rangle. Introduce the operator U such that

U(|\psi\rangle \otimes |0\rangle)=\displaystyle \sum_i M_i|\psi\rangle \otimes |i_E\rangle

where |i_E\rangle are orthonormal states of the environment corresponding to the M_i.

You can check that U preserves inner products of the system Hilbert space:

(\langle v|\otimes \langle 0|) U^\dagger U (| w\rangle \otimes |0\rangle) = \sum_i \langle v | M_i^\dagger M_i | w\rangle = \langle v | w\rangle

It can be shown that such a U can be extended to a unitary operator U' on the entire Hilbert space. Now if we measure an operator on the environment with eigenstates |i_E\rangle, we obtain one of the system states

\displaystyle \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

with probability

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

just as above.
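The construction can be checked numerically on a small case. This is a minimal sketch, assuming a two-level system, a two-state environment, and the toy operators M_0 = |0><0| and M_1 = |0><1|; the function names U_on and inner are hypothetical helpers.

```python
import math

# Toy two-outcome measurement (an illustrative assumption):
# system basis (|0>, |1>), environment basis (|0_E>, |1_E>).
M = [
    [[1, 0], [0, 0]],  # M_0 = |0><0|
    [[0, 1], [0, 0]],  # M_1 = |0><1|
]

def U_on(psi):
    """U(|psi> tensor |0>) = sum_i M_i|psi> tensor |i_E>, stored as amp[s][i]."""
    dim = len(psi)
    return [[sum(M[i][s][k] * psi[k] for k in range(dim)) for i in range(len(M))]
            for s in range(dim)]

def inner(a, b):
    """Inner product <a|b> on the joint system-environment space."""
    return sum(complex(x).conjugate() * y
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

v = [0.6, 0.8j]
w = [1 / math.sqrt(2), 1 / math.sqrt(2)]
lhs = inner(U_on(v), U_on(w))                                  # <v| U^dagger U |w>
rhs = sum(complex(x).conjugate() * y for x, y in zip(v, w))    # <v|w>

# Projecting the environment onto |i_E> leaves the unnormalized state M_i|v>,
# whose squared norm is the probability p_i.
joint = U_on(v)
probs = [sum(abs(joint[s][i]) ** 2 for s in range(len(v))) for i in range(len(M))]
print(lhs, rhs, probs)
```

The two inner products agree, and the environment probabilities reproduce p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle for the state v.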

Look familiar? This interaction U is a more general version of the many-worlds “decoherence equation” above. Thus, the condition for a quantum object to implement a general measurement is that its internal states must interact with the system in this way. Decoherence propagates to the next object and so on until it reaches the observer, who makes the measurement.


In a nutshell: quantum mechanics relaxes the assumption of an objective description of the universe, while still being a predictive physical theory.


Q: How is the system measurement basis determined (the preferred-basis problem)?

A: First, recall that we do not measure the system directly, only our brain/body after it has interacted with the system. As to which of our internal states correspond to which perceptions, note that the same question applies to classical physics. In both cases, we must determine this empirically.

Q: Isn’t the boundary between system and observer also arbitrary? How do we determine which degrees of freedom can be perceived?

A: Again, the same question applies to classical physics, and must be determined empirically.

Q: What objects have consciousness?

A: No physical objects have consciousness. From your perspective, all physical objects are part of the wavefunction, and nothing else has the power to collapse the wavefunction. (Yes, this unfortunately leads to a kind of solipsism. It’s a lonely world out there.)

1 “Consciousness” is a dirty word among physicists, usually for good reason. Here, it simply means the ability to perceive things: cogito, ergo sum. In the formalism of quantum mechanics, this translates to the ability to collapse the wavefunction by inquiring about a measurement result. Much confusion results from trying to ascribe consciousness to physical objects or from giving the word additional meanings.

2 Or its field theory generalizations.

3 Semantic note: I sometimes use “observer” to refer to the subspace of the state space that is perceived, and sometimes to the conscious entity that does the measurement to collapse the wavefunction. Many-worlds says the latter does not exist. It should be clear from context which one is meant.

4 Can Born’s rule be derived? No, since probability is nowhere to be found in unitary time evolution, so there must be some axiom introducing probability into the theory. Regardless, whether Born’s rule is fundamental or derived has no bearing on the next section.

5 P_i = P_{Oi} \otimes 1, where P_{Oi} is a projector on the observer’s space and 1 is the identity on the rest of the space.

6 This section mostly comes from Nielsen and Chuang, Quantum Computation and Quantum Information, Ch. 2.2.

What is spin?

This is the first in a series of posts explaining fundamental physics concepts in simple terms. I will try to explain as deeply as possible from first principles, but without assuming any math beyond high-school level. However, the footnotes contain details for more advanced readers.

The first topic is spin. Spin is a measure of the internal rotational degrees of freedom of a particle. Consider a particle at rest at the origin. We will assume the particle has nonzero mass for now and discuss the massless case later. What transformations can we make to it that leave it looking externally the same? There are just the three rotations, one around each axis1.

Spin 0

Now consider describing the particle with a sequence of numbers (degrees of freedom, or DOFs) that change in a defined way under rotations. The simplest case is just to give it a single number that doesn’t change under rotations. This is spin 0. The Higgs boson is the only known fundamental spin 0 particle.

What if we try to make this single number change? Let’s say a 180° rotation around the x, y, or z axis multiplies it by 2. This doesn’t work, because a 180° rotation around x followed by a 180° rotation around y is the same as a 180° rotation around z, which you can check. But the former results in a factor of 4, while the latter gives a factor of 2. So not every possible choice of transformation works: it must be compatible with the behavior of rotations.
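The claim about composing half-turns is easy to verify with explicit rotation matrices. A minimal sketch, with the matrix names RX, RY, RZ chosen here for illustration:

```python
# 180-degree rotations about the coordinate axes, written as 3x3 matrices.
# A half-turn about an axis keeps that axis fixed and flips the other two.
RX = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
RY = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
RZ = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Rotating about x first and then about y is the matrix product RY * RX.
composed = matmul(RY, RX)
same_rotation = composed == RZ

# The proposed rule "each half-turn multiplies the number by 2" then fails:
factor_via_x_then_y = 2 * 2   # two half-turns applied in sequence
factor_via_z = 2              # the single equivalent half-turn
print(same_rotation, factor_via_x_then_y, factor_via_z)
```

If all three half-turns multiply the number by the same factor c, consistency requires c·c = c, and the only nonzero solution is c = 1, which is the spin-0 case.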

The little group

Finding all the possible ways that a set of numbers can transform under some symmetries (called a group) is known as representation theory. In a landmark paper, Wigner first classified particles as representations of the Poincaré group, the group of symmetries of special relativity. In addition to rotations, this group includes boosts and translations in space and time. He showed that internal DOFs are described by the little group, the group that leaves the particle externally the same. In this case, it is the group of rotations in three dimensions, called SO(3). Spin 0 is a 1-dimensional representation of the little group, since it is just a single number.

Spin 1

Anyway, back to our particle. Another obvious choice is to describe the particle with a 3D vector. Under 3D rotations, this just rotates in the usual way. This is called spin 1. It is a 3-dimensional representation of the little group. Spin-1 particles include the W and Z vector bosons, which mediate the weak force2.

Spin 1/2

We have been working with representations where all the numbers are real. For example, a real 3D vector stays real under rotations, since rotation matrices are real. But quantum mechanics says the universe uses complex numbers. It turns out that there is a complex representation in between spin 0 and spin 1, called spin 1/2. It has two complex DOFs. All known fundamental matter particles are spin-1/2: electrons, muons, quarks, etc.

Higher spins

We can continue upwards, constructing larger and larger representations with spin > 1. In general, a spin-s representation has 2s + 1 DOFs. There is a representation for every integer and half-integer s \geq 0. Particles with integer spin are called bosons, and particles with half-integer spin are called fermions.

Composite particles form so-called product representations that decompose into independent spin representations. For example, two spin-1/2 particles have 4 DOFs. These split into a spin-1 representation (the “triplet”) and a spin-0 representation (the “singlet”). In group theory notation this is sometimes written as

2\times 2 = 3 + 1

(Who said group theory was hard?) This means that under rotations, the spin-1 DOFs transform as a 3D vector while the spin-0 part doesn’t change. Specifically, there are particular linear combinations of the 4 DOFs that transform as spin-1 and spin-0. In an arbitrary basis, the DOFs all mix into each other under rotations.

While composite particles can have high spin, no fundamental massive particles with spin \geq 3/2 are known to exist.
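The 2\times 2 = 3 + 1 split can be checked directly. The sketch below, in units where \hbar = 1, builds the total spin-squared operator for two spin-1/2 particles and confirms that it takes the value s(s+1) = 2 on the three triplet states and 0 on the singlet. The basis ordering and helper names are assumptions made for this example.

```python
import math

# Pauli matrices; spin operators are S_a = sigma_a / 2 in units where hbar = 1.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(A, B):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[A[i][j] * B[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

# Total spin squared:
# S^2 = S1^2 + S2^2 + 2 S1.S2 = (3/2) I + (1/2) sum_a (sigma_a tensor sigma_a)
pairs = [kron(s, s) for s in (SX, SY, SZ)]
S2 = [[1.5 * (i == j) + 0.5 * sum(p[i][j] for p in pairs) for j in range(4)]
      for i in range(4)]

def apply(M, v):
    """Apply a 4x4 matrix to a 4-component state."""
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

r = 1 / math.sqrt(2)
# Basis order: |up up>, |up down>, |down up>, |down down>
triplet = [[1, 0, 0, 0], [0, r, r, 0], [0, 0, 0, 1]]
singlet = [0, r, -r, 0]

# S^2 has eigenvalue s(s+1): 2 on the spin-1 states, 0 on the spin-0 state.
triplet_ok = all(abs(a - 2 * b) < 1e-12
                 for t in triplet for a, b in zip(apply(S2, t), t))
singlet_ok = all(abs(a) < 1e-12 for a in apply(S2, singlet))
print(triplet_ok, singlet_ok)
```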

Spin and statistics

Fermion representations have the peculiar property3 that a full rotation by 360° multiplies the state by -1 instead of 1. In fact, this implies that identical fermions cannot occupy the same state, which is known as the Pauli exclusion principle. This is responsible for the diverse matter in our universe such as atoms and molecules. Otherwise, fermions in a system would all collapse near the state of lowest energy, the ground state. On the other hand, identical bosons acquire a +1 under rotation, so they can occupy the same state. At low temperatures, they almost all occupy the ground state, forming a Bose-Einstein condensate.

This connection between the behavior of large numbers of particles and their spin is called the spin-statistics theorem. Proving this theorem requires quantum field theory and relativity, which are beyond the scope of this article.
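The sign flip under a full rotation can be seen numerically. This is a minimal sketch, assuming a rotation about the z axis and units where \hbar = 1; the function names are illustrative.

```python
import cmath
import math

def spin_half_rotation_z(theta):
    """Diagonal entries of exp(-i theta S_z) for spin 1/2 (S_z = +1/2, -1/2)."""
    return [cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)]

def vector_rotation_z(theta):
    """Ordinary 3D rotation matrix about z, the spin-1 case."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

full_turn_spinor = spin_half_rotation_z(2 * math.pi)   # both entries close to -1
full_turn_vector = vector_rotation_z(2 * math.pi)      # close to the identity
two_turns_spinor = spin_half_rotation_z(4 * math.pi)   # both entries close to +1
print(full_turn_spinor, two_turns_spinor)
```

A spin-1/2 state only returns to itself after two full turns, while a vector returns after one.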

Spin and angular momentum

DOFs associated with rotations are known as angular momentum. We have only discussed the internal DOFs (spin), but particles also carry external rotational DOFs called orbital angular momentum. The total angular momentum is the sum of spin and orbital contributions. Since spin representations are finite-dimensional, spin angular momentum is quantized4. A measurement in a particular direction will give a result from \{-s\hbar, (-s+1)\hbar, \cdots, (s-1)\hbar, s\hbar\} for a particle of spin s, where \hbar is the reduced Planck constant. You can see there are 2s+1 different values. Because rotations around different axes don’t commute, angular momentum in different directions cannot be measured simultaneously: once angular momentum in one direction is known exactly, the other directions become uncertain.
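As a small illustration of the allowed measurement results, the sketch below lists them in units of \hbar using exact fractions. The function name spin_projections is a hypothetical helper.

```python
from fractions import Fraction

def spin_projections(s):
    """Allowed measurement results for spin s, in units of hbar: -s, -s+1, ..., s."""
    s = Fraction(s)
    if s < 0 or (2 * s).denominator != 1:
        raise ValueError("spin must be a non-negative integer or half-integer")
    return [-s + k for k in range(int(2 * s) + 1)]

half = spin_projections(Fraction(1, 2))          # two values: -1/2 and +1/2
one = spin_projections(1)                        # three values: -1, 0, +1
three_halves = spin_projections(Fraction(3, 2))  # four values
print(half, one, three_halves)
```

Each list has 2s + 1 entries, matching the number of DOFs of the spin-s representation.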

For spin-1/2, the two values are -\hbar/2 and \hbar/2, corresponding to states called “spin down” and “spin up” with respect to a particular direction. These states can be visualized as little arrows in the direction of angular momentum:

[Figure: spin-up and spin-down states drawn as arrows along the direction of angular momentum]

Spin and magnetism

When electromagnetism is included in the theory, it turns out that spin couples to the magnetic field5. Classically, you can visualize spin angular momentum as arising from a particle literally spinning around:

[Figure: a charged particle moving in a circle, forming a current loop]

The potential energy of a current loop in a magnetic field is6:

U = -IA\vec{B}\cdot\hat{n}

where I is the current, A is the area, and \hat{n} is the normal vector in the direction given by the right-hand rule on the current. Using A=\pi r^2, I = qv/2\pi r, and angular momentum \vec{L} = rmv\hat{n}, this becomes

U = -\frac{q}{2m} \vec{B}\cdot\vec{L}
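Spelling out the substitution, as a short worked step that uses only the quantities defined above:

```latex
IA = \frac{qv}{2\pi r}\,\pi r^2 = \frac{qvr}{2} = \frac{q}{2m}\,(mvr),
\qquad
U = -IA\,\vec{B}\cdot\hat{n}
  = -\frac{q}{2m}\,\vec{B}\cdot(rmv\,\hat{n})
  = -\frac{q}{2m}\,\vec{B}\cdot\vec{L}
```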

Thus, spins tend to “align” with an external magnetic field, since the states with spin in the same direction as the field have lower energy than the states in the opposite direction (for a positive charge q). q/2m is known as the gyromagnetic ratio. It turns out that in the quantum theory, the ratio for the spin of a particle like the electron is actually (very nearly) twice the classical value: q/m. So a particle with spin cannot really be thought of as a classical current loop.

Magnetic fields allow us to manipulate and measure spin, as in the Stern-Gerlach experiment.

Massless particles and helicity

Massless particles such as the photon do not have spin, because their little group is different. Special relativity tells us that massless particles must travel at the speed of light. Therefore, we cannot imagine them “at rest”: they are always going in a particular direction. However, we can still make rotations around this direction and leave the particle the same. The little group in this case is SO(2), the rotation group in 2 dimensions. This is a very simple group whose irreducible representations are all 1-dimensional7. In fact, we can find them all here. A helicity-s representation acts on the number by multiplying it by

R(\theta) = e^{i s\theta}

under a rotation by angle \theta. There is a representation for every integer and half-integer s, where s can be less than 0 now. Note that half-integer representations are fermions again, since a rotation by 2\pi gives -1.

It would then seem that all massless particles have only one degree of freedom. In fact, another symmetry principle requires us to stick two of these representations together: parity, or symmetry under spatial reflections (x\rightarrow -x, y\rightarrow -y, z\rightarrow -z). Under parity, a rotation in one direction around the particle’s velocity goes in the opposite direction:

[Figure: a rotation around the velocity vector v, and the same rotation and velocity after a parity reflection]

We have reflected both the black rotation arrow and the velocity vector v. Particle representations must be invariant under parity. If we include the +s representation, we also need to include the representation that transforms in the opposite way under rotation. Therefore, massless particles have two DOFs: both the +s and -s representations. Each degree of freedom transforms independently under this rotation (see footnote 7). Note that massive particles have no preferred direction when at rest, so representations are automatically invariant under parity. Imagine removing the blue arrows in the above figure; you will see that a rotation reflected is the same rotation.

The photon has helicity 1, and the graviton (the force carrier of gravity) has helicity 2. One fascinating result of quantum field theory is that it is impossible to have a locally interacting theory of massless particles with helicity greater than 2.

Further resources

So much for this whirlwind tour of spin and related topics. For more info, see any standard textbook on quantum mechanics or quantum field theory. Some I recommend:

  • Griffiths, D.J. Introduction to Quantum Mechanics, Ch. 4.
  • Schwartz, Matthew. Quantum Field Theory and the Standard Model, Chs. 8, 10, 12.
  • Weinberg, Steven. The Quantum Theory of Fields, Ch. 2.4, 2.5.

1 In special relativity, there are also transformations called “boosts”, but these give the particle a constant velocity, so it is no longer at rest.

2 You may have heard that the photon is spin 1. But the photon is massless, so has helicity instead of spin. More on this later.

3 This may seem impossible, since a rotation by 360° is the same as no rotation at all. And you are right! I slightly lied earlier. Technically, we are finding representations of SU(2), the double cover of the rotation group SO(3). This is because the Lie algebras are identical, su(2) \sim so(3), but exponentiating the spinor representation of the algebra produces SU(2) instead of SO(3).

4 Orbital angular momentum is also quantized, but only integer representations exist. This is because the DOFs here are actually fields: functions of space. A rotation by 360° must take the field to the same field, because the position vector itself is a vector representation of rotations.

5 This can be derived by starting with the Lagrangian of quantum electrodynamics and taking the non-relativistic limit, where all energies are smaller than the rest energy of the electron E=mc^2. However, in the spirit of effective field theory, we can also consider writing down all terms consistent with the non-relativistic symmetries: SO(3), parity, and gauge invariance. The spin operator \vec{S} is a vector, so it must be dotted with another vector to create an SO(3) invariant. Actually, it is a pseudovector: \vec{L}=\vec{r}\times \vec{p} is invariant under parity \vec{r}\rightarrow -\vec{r}, \vec{p}\rightarrow -\vec{p}. It must be dotted with another pseudovector to be invariant under parity. This must be the magnetic field \vec{B}, since the electric field \vec{E} is a vector. It cannot be any other function of the vector potential A_\mu due to gauge invariance. Thus, the lowest order interaction is

c \vec{S}\cdot \vec{B}

for some constant c.

6 A nice derivation is as follows. Start with the integral form of Faraday’s law:

\frac{d\phi}{dt}=-\oint \vec{E}\cdot \vec{dl}

where \phi is the magnetic flux through the loop. Multiply by current and integrate over time:

I\Delta \phi=-\int dt IV=-\int dt P_{diss}

where V=\oint \vec{E}\cdot \vec{dl}, I is the current, and P_{diss}=IV is the dissipated power. Thus, it takes an energy

E=\int dt P_{diss}=-I\Delta \phi

to change the magnetic flux by an amount \Delta \phi. It is easiest to draw a picture to get the sign right. This derivation shows that the energy is independent of the shape of the loop.

7 A rotation matrix in 2D is, of course, two-dimensional. However, this is actually a reducible representation made of two irreducible, one-dimensional (complex) representations: helicity +1 and -1. You can see this by noting that (1, i) and (1,-i) are both eigenvectors under rotation, so do not transform into each other. We are only classifying the irreducible representations here.
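The eigenvector claim can be checked in a few lines. A minimal sketch, with an arbitrary test angle; which of the two vectors gets labeled helicity +1 is a matter of convention.

```python
import cmath
import math

def rotate(theta, vec):
    """Apply the 2D rotation matrix [[cos, -sin], [sin, cos]] to a complex 2-vector."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

theta = 0.7          # arbitrary test angle
plus = (1, 1j)
minus = (1, -1j)

# Each vector should come back multiplied by a pure phase, with no mixing.
rp = rotate(theta, plus)
rm = rotate(theta, minus)
phase_plus = rp[0] / plus[0]      # close to exp(-i theta)
phase_minus = rm[0] / minus[0]    # close to exp(+i theta)
print(phase_plus, phase_minus)
```

The two phases are complex conjugates of each other, which is the pair of opposite-helicity one-dimensional representations described above.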

Spin passing through wire loop

Consider a spin-1/2 particle passing through a wire loop with a light bulb in series:

[Figure: an upward-pointing spin passing through a wire loop with a light bulb in series]

The spin produces a magnetic moment and therefore a magnetic field, which can induce current in the loop due to changing flux. For a pure upwards-pointing spin \lvert\uparrow\rangle (shown above), we have an increasing upwards flux as it approaches the loop and a decreasing flux as it leaves, so the light bulb turns on both before and after the spin passes through the loop. For \lvert\downarrow\rangle the flux is reversed but the light bulb still turns on at the same times.

Now let the spin part of the wavefunction be \frac{1}{\sqrt{2}}(\lvert\uparrow\rangle + \lvert\downarrow\rangle). Does the light bulb turn on?