Symmetry factor of Feynman diagrams

Symmetry factors are often confusing, and existing calculation techniques seem to involve excessive drawings and computations. My personal technique is simple:

  1. Label both ends of each line with a number 1,2, \cdots.
  2. Count the number of labelings that give an equivalent diagram, where equivalent means they can be deformed into each other.

For example, this diagram has S=2, since we can interchange 2 \leftrightarrow 3 and 4 \leftrightarrow 5 at the same time, and get the same diagram (assuming the endpoints are fixed).

This one has S=8, coming from

  • 1\leftrightarrow 3
  • 2 \leftrightarrow 4
  • 1\leftrightarrow 2 and 3 \leftrightarrow 4 at the same time

together with the identity and all products of these swaps, giving 8 equivalent labelings in total.

With a bit of practice, it becomes quite easy to mentally visualize this labeling and quickly get the factor. It doesn’t involve “taking apart” the diagram, or considering the interchange of both vertices and lines separately.
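The counting rule can also be brute-forced on a computer. Here is a sketch (the half-edge encoding is my own, not from the post): each line contributes two labeled endpoints, and we count relabelings that map lines to lines and keep the grouping of endpoints at each vertex consistent, with external endpoints held fixed.

```python
from itertools import permutations

def symmetry_factor(halfedges, lines, at_vertex, fixed):
    """Count endpoint relabelings that give an equivalent diagram."""
    lineset = {frozenset(l) for l in lines}
    count = 0
    for perm in permutations(halfedges):
        sigma = dict(zip(halfedges, perm))
        # external endpoints are held fixed
        if any(sigma[h] != h for h in fixed):
            continue
        # every line must map to a line
        if any(frozenset((sigma[a], sigma[b])) not in lineset for a, b in lines):
            continue
        # all endpoints at one vertex must map to a single vertex
        vmap = {}
        if all(vmap.setdefault(at_vertex[h], at_vertex[sigma[h]]) == at_vertex[sigma[h]]
               for h in halfedges):
            count += 1
    return count

# Two propagators between internal vertices A and B, with fixed external legs:
# the only nontrivial relabeling swaps the two propagators, so S = 2.
ends = list(range(8))
lines = [(0, 1), (2, 4), (3, 5), (6, 7)]
at = {0: 'ext1', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'ext2'}
print(symmetry_factor(ends, lines, at, fixed=[0, 7]))  # -> 2

# Figure-eight vacuum diagram: two self-loops (1,3) and (2,4) on one vertex.
print(symmetry_factor([1, 2, 3, 4], [(1, 3), (2, 4)],
                      {1: 'V', 2: 'V', 3: 'V', 4: 'V'}, fixed=[]))  # -> 8
```

In this encoding, the count is just the number of automorphisms of the labeled diagram, which is what the two-step rule above computes by hand.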

I am actually not sure how to prove this is correct, but it seems to always work.

How to escape a black hole: adventures in causal structure

You’re an astronaut at a space station orbiting a black hole. You are assigned to fly a shuttle near the black hole and take some measurements. You are very careful to keep track of your position and not cross the invisible event horizon that marks the edge of the black hole, as you know that once you cross the horizon, there is no escape from being brutally crushed by the singularity at the black hole’s center. As you approach the horizon, disaster strikes! Your shuttle malfunctions, and you find yourself crossing the horizon into the black hole. Panicking, you set your thrusters at full blast to escape. After some time, you miraculously find yourself outside the black hole again. Take that, Einstein! But wait, something is off: instead of the black hole being, well, black, it seems there is light coming from the black hole. And your space station is nowhere to be found.

What happened? Let’s draw a picture:

It would seem that you left the station, entered the event horizon, then exited the horizon. But where is the station? There must be something wrong with this picture.

Black holes aren’t created equal

To understand what happened, we need to learn some facts about black holes. The simplest type of black hole is called the Schwarzschild black hole. It does not rotate and carries no electric charge. But black holes can also rotate about their axis like planets or stars. In addition, if the black hole was formed by the collapse of charged particles like protons or electrons, it can also carry electric charge. The Schwarzschild black hole follows the expected behavior above: once you cross the horizon, there is no escape from being crushed by the singularity. We can show this behavior with a diagram:

Figure 1. Schwarzschild black hole.

We divide the entire universe into the region inside the black hole (left box) and outside the black hole (right box). Starting outside and crossing the horizon, we enter what is called a T-region. The rule for this region is that we can only keep moving in the horizontal direction in which we entered. Since we entered going left, we can only move left. Eventually, we hit the singularity.

A few things to note:

  • Mathematically, a T-region is where time and space coordinates are switched. Take r to be the distance from the center of the black hole. The event horizon is located at some distance r = r_s. Upon crossing the horizon, r becomes a time coordinate. Since we entered the horizon moving with decreasing r, and the flow of time never changes direction, we must continue moving with decreasing r until we hit the singularity at r = 0.
  • This diagram is obviously not to scale. It only shows the possible end results depending on which region you enter. There are two possibilities here: either you stay outside the black hole (in “normal space”) forever, or you enter the black hole and hit the singularity.
  • You will feel nothing special when you enter the horizon. It is just an invisible line in space. However, it marks a change in causal structure: your fate is irreversibly changed once you cross.

Rotating or charged black holes have a radically different behavior:

Figure 2. Rotating or charged black hole.

Once again, the rightmost region is outside the black hole. There are two horizons, an inner and outer one. Once you enter the outer horizon, you are forced to go left until you cross the inner horizon. There, you are back in a normal space near the center of the black hole. There is still a singularity, but unlike the Schwarzschild case, you are not required to touch it! (Unless you have a death wish.) It is just like some object sitting there. For rotating black holes, the singularity is in the shape of a ring. For charged black holes, it is a point.

If you wish to leave, simply cross the inner horizon again. You are forced to exit the outer horizon back into the rest of the universe. You can cross in and out of the black hole as many times as you want. Perhaps you could build a vacation home near the center.

It’s universes all the way down

That explains why you could exit the black hole in the original scenario. It must have been rotating or charged! But it doesn’t explain why the black hole emitted light, or why the space station was gone.

For this, we must mention another crucial fact about horizons. Imagine your colleague at the space station watching you cross the horizon. From her perspective, it takes you an infinite time to reach the horizon. You will appear to get slower and dimmer as you approach, but never actually cross. Because you never seem to cross, you can never return! We can show this as a spacetime diagram where the vertical axis is time and the horizontal axis is space:

The station can only see what happens outside the horizon. For example, let’s say you raise your arm to wave at the station just as you cross the horizon. Your colleague will see your arm rising slower and slower, and only after waiting an infinite amount of time would she finally see the wave. What happens inside the horizon stays inside the horizon.

On the contrary, from your perspective, you easily cross both horizons into the region with the singularity, then go back and cross both again. So where did you end up? It turns out you end up in a new universe1. This universe may have a totally different origin and history from the old one. In particular, it probably does not contain a space station at the time and place you emerged into it.

And why is there light coming out? Well, just as you entered and exited the black hole, anything else could do the same. This includes any stray light from the original universe that happened to enter the black hole. For example, your colleague could send a signal from the space station into the black hole. This signal could reach you after you exit. But you cannot do the same. If you send the same signal back into the hole, it will end up in another new universe, not the original one. Likewise, if you travel back into the hole, you will end up in a new region with a singularity. Going back out, you enter another new universe, and so on. So much for the vacation home.

Penrose diagrams

It is hard to draw this behavior using diagrams like Fig. 1 or 2 that simply divide space into regions. Instead, physicists use drawings called Penrose diagrams to show all the possible paths you could go in a spacetime. Here is the Penrose diagram for a rotating or charged black hole:

Figure 3. Penrose diagram. Source: adapted from Wikipedia

How to read this:

  • The vertical axis is time, and the horizontal axis is space. The diagram is drawn so that light rays move diagonally at 45° angles. A massive body such as yourself is always slower than light, moving more in the vertical direction (time) than horizontal (space). In other words, the slope of your path is always greater than one.
  • The blue line shows the path of the space station. It starts in the infinite past, t=-\infty, shown as a point on the diagram. It ends in the infinite future, t=\infty. The red line shows your path as you enter and exit the black hole. It starts in our universe and crosses an outer and inner horizon to a region with a singularity (vertical dashed line). Then it crosses another inner and outer horizon into a new universe. Each time you enter and exit the hole, you end up in a new universe, so the diagram repeats an infinite number of times up and down.
  • In general, the path of any object must either (a) extend forever, or (b) terminate at a singularity or the upper/lower corner of a universe. The upper corner of a universe is the infinite future, and the lower corner is the infinite past. The space station starts in our universe and stays there forever, so it ends in the upper corner. If you stay in one of the “other universes” forever, your path will also terminate in its upper corner.
  • One technical detail: this diagram shows what is called the “maximally extended” spacetime, which essentially doubles every region: we have an “other universe” parallel to ours, two possible new universes, two singularities, etc. To match our situation described above, we can cut this diagram in half vertically. In any case, this doesn’t change the basic ideas.

Spacetime gone wild

If this multitude of universes isn’t enough, there are even more exotic spacetimes out there. One way to construct them is through the method2 of R-regions and T-regions. As we said above, T-regions are where time and space coordinates switch places. They act like a one-way ticket to the end of the region. R-regions are simply “normal space”. One can follow certain rules to glue together R- and T-regions to form new spacetimes. Here is one example:

Source: Rubin, S. G.; Bronnikov, K. A. Black Holes, Cosmology and Extra Dimensions.

Unlike black hole spacetimes, time starts and ends in T-regions here (on double lines). The red lines show some possible paths of objects. The left one starts in a T-region in the infinite past (t = -\infty), enters an R-region, then enters another T-region where it stays forever (t=\infty). Note that once it enters the final T-region, it must stay there forever, since it cannot travel faster than light (which would require a path with slope less than one). The right path ends in a singularity (thick line) in an R-region. Poor guy. We could give all these regions fancy names like “universe”, “wormhole”, “white hole”, but there’s no real point. Suffice it to say this would be a very confusing spacetime to live in.

Back to reality

Now for some bad news. The picture of black holes we have sketched so far is allowed by the theory of relativity, but probably wouldn’t happen in our universe. First, it assumes that the rotating/charged black hole has existed forever and continues to exist forever. Real black holes are formed by the gravitational collapse of stars, and did not exist at the start of the universe. So the Penrose diagram cannot extend infinitely down. More importantly, this ideal spacetime is not stable to small perturbations. The collapse process will likely mess up the structure of the interior horizon, leading to a conventional singularity that crushes the observer, with no possibility of entering a new universe. The more exotic spacetimes also require unrealistic conditions and are more of a theoretical curiosity.

However, the character of real black hole interiors is still an open question. It is not even possible to answer experimentally, since as we said above, information cannot get out of the horizon. The intrepid scientist that goes inside the black hole will know what happens, but the rest of us won’t (quantum gravity notwithstanding).


1 Terminology note. Usually we use the word universe to mean everything that exists. This can cause confusion when there are multiple universes. We use the word spacetime to mean a particular solution to the equations of relativity (such as the rotating or charged black hole), and universe as an informal term for certain regions in a spacetime. See the Penrose diagram in the next section.

2 This is a rather obscure topic that you won’t find in any standard relativity courses, but I thought it was too cool not to mention.

Magnetic-core memory, Faraday’s Law and winding numbers

Like every adult male entering his thirties, I had recently developed an interest in military history, so what better way to spend a Saturday than at the local Midway Museum, the dubiously self-proclaimed #1 attraction in San Diego. Besides the endless labyrinth of tiny hallways and crew quarters, perhaps the most interesting exhibit was the UNIVAC CP-642B and its “LOL memory”, named after the “little old ladies” from the textile industry that painstakingly wove it by hand!


While LOL memory is read-only, it shares basically the same design as an early form of RAM called magnetic-core memory. Naturally, I wanted to investigate how this strange criss-cross structure worked.

Wikipedia has a good overview. In a nutshell, each ring of magnetic material stores one bit in its magnetization direction, clockwise or counterclockwise. The ring has an X line (green), Y line (red), sense line (s, orange), and inhibit line (z, purple) passing through it. Changing the polarity requires a certain threshold current. To write a bit, a current of half the threshold magnitude is sent through both its X and Y lines; only the ring at their intersection sees the full current and switches.

More interesting is how it is read. First, a 0 bit is written to the ring. If it was previously 0, nothing happens. If it was 1, the polarity switches and causes a changing magnetic field. By Faraday’s law, a voltage pulse is generated in the sense line, which is picked up by the sensing circuit.
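The coincident-current scheme can be written out as a toy model (entirely my own sketch, not a real controller): a ring flips only when the total current through it reaches the threshold, and a read is a destructive write of 0 whose sense-line pulse reveals the old bit.

```python
THRESHOLD = 1.0  # switching current, arbitrary units

class Core:
    """One magnetic ring; `bit` is its magnetization direction."""
    def __init__(self, bit=0):
        self.bit = bit

    def drive(self, current, target_bit):
        """Apply a drive current toward `target_bit`. Returns True if the
        ring switches -- a switch means changing flux, i.e. a sense pulse."""
        if abs(current) >= THRESHOLD and self.bit != target_bit:
            self.bit = target_bit
            return True
        return False

def write(grid, x, y, bit):
    """Send half the switching current down line x and line y; only the
    ring at (x, y) sees the full current and switches."""
    for (i, j), core in grid.items():
        core.drive(0.5 * ((i == x) + (j == y)), bit)

def read(grid, x, y):
    """Destructive read: write a 0 and watch for a pulse on the sense line.
    A pulse means the old bit was 1 (real hardware then rewrites the 1)."""
    pulse = grid[(x, y)].drive(1.0, 0)
    return 1 if pulse else 0

# 2x2 demo plane
grid = {(i, j): Core() for i in range(2) for j in range(2)}
write(grid, 0, 1, 1)
print(read(grid, 0, 1), read(grid, 0, 0))  # -> 1 0
```

Note the half-selected rings on lines x and y receive only 0.5 units of current, below the threshold, so they keep their bits, which is the whole point of the criss-cross addressing.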

It was not obvious (to me) that this works for all the rings in the diagram above. The sense line forms quite a complicated self-overlapping path through all the rings. Would the magnitude of the voltage differ for rings on the edge, corner, or center? During switching, the magnetic flux goes into the page on one side of the ring, and out of the page on the other. The flux at either point induces an emf proportional to the winding number of the sense line around that point, viewing the sense line as a curve in the plane. We can plot the winding number of all the contiguous regions containing the flux:

The winding number can be found by unwinding the sense line as much as possible without intersecting the given point, and counting the number of windings around it. For example:

Finally, because the flux goes in opposite directions on either side of the ring, we can find the overall emf for a ring by subtracting the winding numbers on either side. Miraculously, the magnitude of this difference is 1 for every ring, so each ring induces the same voltage. What a clever design!

Or maybe not. In fact, any planar curve that goes through a ring will result in the same voltage. This is because the winding number around a point always changes by one when it crosses a curve:

Moving across a curve either unwinds a circle or creates a new circle. This is also proven visually in this Math.SE answer:

Here, we deform the contour (keeping winding number the same) until the point r becomes the point s with a closed clockwise loop around it. Thus, winding number of r equals winding number of s minus one.

Maybe all this is totally obvious to circuit engineers, but I found it quite interesting. Also, it gives a cool way to find the winding number for arbitrary curves: start from the outside and move the point in, adding or subtracting one at every crossing.
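Both recipes are easy to check numerically. A sketch (my own implementation, not from the post): the winding number of a closed polyline around a point is the total signed angle the curve sweeps out as seen from that point, divided by 2\pi. Moving the test point across the curve changes the result by exactly one.

```python
import math

def winding_number(curve, point):
    """Winding number of a closed polyline around `point`: sum the signed
    angle each segment subtends at the point, divide by 2*pi."""
    px, py = point
    total = 0.0
    for i in range(len(curve)):
        x1, y1 = curve[i]
        x2, y2 = curve[(i + 1) % len(curve)]
        d = math.atan2(y2 - py, x2 - px) - math.atan2(y1 - py, x1 - px)
        # wrap into [-pi, pi) so each segment takes the short way around
        d = (d + math.pi) % (2 * math.pi) - math.pi
        total += d
    return round(total / (2 * math.pi))

diamond = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # counterclockwise loop
print(winding_number(diamond, (0, 0)))  # -> 1  (inside)
print(winding_number(diamond, (2, 0)))  # -> 0  (outside)
```

Starting outside (winding number 0) and stepping the point inward across crossings, adding or subtracting one each time, reproduces the same numbers.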

Programming languages summarized in one line

Inspired by the classic: A Brief, Incomplete, and Mostly Wrong History of Programming Languages.

BASIC: 10 PRINT "HELLO WORLD"; 20 GOTO 10 is the only BASIC program ever written.

C: Invented by Alan Turing as a precursor to the Turing Machine.

C++: Invented to give employment to C++ Standard Committee members.

CSS: The CSS standard formally consists of over 10,000 StackOverflow posts for every possible layout scenario.

D: C++ for hipsters.

Fortran: Used by Gauss to implement his linear algebra routines, never used since.

Go: All the advantages of C without the advantages of C++.

Haskell: The primary use case of Haskell is to write tooling for Haskell.

Java: All Java development has been fully automated since 2008.

Mathematica: What every scientist plans to learn, eventually.

MATLAB: The true backbone of all scientific research.

Lisp: The primary use case of Lisp is to write tooling for Lisp.

Perl: Used when Python programmers wake up and choose violence.

Python: All babies automatically learn Python at age 2-3.

Ruby: Lisp for hipsters’ hipsters.

Scala: A conspiracy by compiler writers to sell more compilers; does not actually exist.

Quantum mechanics for everyone

This post explains quantum mechanics (QM) without any advanced math. Unlike most introductions, I will focus on the interpretation of QM: what the objects in the theory mean and how they fit into a broader philosophy of doing physics. Specifically, I explain why the Von Neumann-Wigner interpretation, a variant of the standard Copenhagen interpretation, is the correct one. I also explain why a popular alternative to Copenhagen, the many-worlds interpretation, is incorrect.

The footnotes will contain details for more advanced readers. Also, see here for a shorter and more math-heavy version of this post.

What is science?

Let’s start with what we know. As Descartes said, “I think, therefore I am.” We know that subjective experience exists. In philosophy, subjective experiences are called qualia (singular quale). One purpose of science (including physics) is to predict what qualia we will experience, based on our past experiences. This is simply because qualia are, by definition, all that we can experience, so any attempt to verify a scientific theory necessarily involves qualia as inputs and outputs.

This focus on subjective experience may sound fuzzy and unrigorous, especially for those used to classical physics. However, it is actually a very conservative viewpoint. Some may say that the goal of science is instead to understand the objective world around us. That may be the case, but at a minimum, a theory must also be able to make predictions about our experiences. More on this as we go along.

The wavefunction and many-worlds

In this section, I will explain the basic ideas of QM, in the language of the many-worlds interpretation (MWI). MWI provides a convenient way to visualize QM as the continual splitting of a system’s state into many branches, or “worlds”. I will then show that MWI alone cannot be used to make predictions, for both practical and mathematical reasons. However, we can fix it by adding the concept of wavefunction collapse. This produces the Copenhagen interpretation.

Quantum mechanics describes the universe using a mathematical object called a wavefunction, with the symbol \psi. In the quantum world, a system can be in a combination of classical states instead of being in one state at a time. For example, a particle can be in two places at once. This is called a superposition.

Fig. 1 shows an example. The particle starts at position A, then over time, it evolves into an equal superposition of position A and B. (The boxes show instants in time.) At this time, if the experimenter measures the position of the particle, they will obtain either A or B with 50% probability1. This is indicated by the “probability amplitude” on top of each box. In QM, probabilities are given by the square of this amplitude. This is called Born’s rule. At any time, the squared amplitudes of all the branches must sum to 1. We get the number on each box as follows. When a box branches into multiple scenarios, we first multiply its amplitude with the number on each outgoing arrow (1/\sqrt{2}). Then, for each new scenario, we sum over all the incoming arrows. For example, on the top box in the superposition, 1/\sqrt{2} comes from 1 on the initial box times 1/\sqrt{2} from the one incoming arrow.

Fig. 1. Superposition.

The numbers on the arrows depend on the particular interactions between the particle and its environment. We will not be concerned with those here.
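The arrow arithmetic can be captured in a tiny sketch (the dictionary encoding is my own): multiply each box's amplitude by the number on each outgoing arrow, sum the incoming contributions per new box, then square for probabilities by Born's rule.

```python
import math

def propagate(amplitudes, arrows):
    """One branching step: for each arrow (src, dst) with number a,
    add amplitude[src] * a into the new amplitude of dst."""
    new = {}
    for (src, dst), a in arrows:
        new[dst] = new.get(dst, 0.0) + amplitudes.get(src, 0.0) * a
    return new

def born_probabilities(amplitudes):
    """Born's rule: probability = squared amplitude."""
    return {state: a ** 2 for state, a in amplitudes.items()}

# Fig. 1: the particle starts at A and splits into an equal superposition.
start = {'A': 1.0}
arrows = [(('A', 'at A'), 1 / math.sqrt(2)), (('A', 'at B'), 1 / math.sqrt(2))]
psi = propagate(start, arrows)
print(born_probabilities(psi))  # both probabilities come out to 1/2
```

The squared amplitudes sum to 1, as required at every time step.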

Of course, the experimenter is also composed of many particles, so should also be included as part of the wavefunction. This is shown in Fig. 2. When the experimenter measures the position, her brain’s particles record a state corresponding to seeing it at either A or B. We say that the experimenter’s state has become entangled with that of the particle.

Fig. 2. Measuring the position causes the experimenter’s brain to change state.

This shows how physics fundamentally works. To make predictions about qualia, a physical theory associates certain mathematical objects, or states, with qualia such as “seeing the particle in position A”. Given an initial state, classical physics predicts a certain future state, which is confirmed or denied by perceiving its associated qualia. In contrast, QM only predicts probabilities of obtaining future states. One way to confirm QM is then to do many identical experiments and then see if the results converge to the right probabilities2.

Does this mean that we must know the entire state of our brain in order to make or verify any predictions? Of course not. In practice, we rely on our eyes, ears, and other measuring devices to sense the world. This is because external inputs to these devices can reliably induce certain states in our brain. For example, light with a wavelength of 700nm that goes into our eyes can reliably induce the sensation of “seeing red”. More on this when we discuss measuring devices and decoherence later.

A prediction rule

Is the wavefunction all you need? No. As the experimenter, simply knowing the wavefunction at a given time does not allow you to make predictions, for the very obvious reason that you don’t know which branch you are on. At the least, you must also keep track of your current branch. For example, if you observe the particle at A, you know you are on the top branch of Fig. 2. Then, for future predictions, you must only use the arrows coming out of that state. Since the total probability must still equal one, you must then divide the probability (squared amplitude) on each future box by the current one on your box.

This is shown in Fig. 3 for multiple splittings. (Here, instead of drawing pictures in the boxes, I use letters A, B, etc. to show general states.) Let’s say you observe that you are in state B. Then in the future, you have a 1/3 chance of being in state D and a 2/3 chance of being in state E. This comes from (1/\sqrt{6})^2/(1/\sqrt{2})^2 = 1/3 and (1/\sqrt{3})^2/(1/\sqrt{2})^2 = 2/3. Even though the wavefunction contains states F and G at the same time as D and E, there is no probability of reaching those states because there are no arrows coming from B.

Fig. 3. Multiple splittings of the wavefunction. The highlighted branch corresponds to observing B instead of C.
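The branch-tracking rule is easy to mechanize (a sketch with my own names): after observing a state, divide each reachable future squared amplitude by the squared amplitude of your current branch.

```python
import math

def conditional_probs(current_amp, outgoing):
    """Prediction rule: given the amplitude of your current branch and the
    amplitudes of the states reachable from it via its outgoing arrows,
    return the probability of each future state."""
    return {state: a ** 2 / current_amp ** 2 for state, a in outgoing.items()}

# Fig. 3: you observe B (amplitude 1/sqrt(2)); only D and E are reachable.
probs = conditional_probs(1 / math.sqrt(2),
                          {'D': 1 / math.sqrt(6), 'E': 1 / math.sqrt(3)})
print(probs)  # D has probability 1/3, E has probability 2/3
```

States F and G never appear in the result because no arrows lead to them from B.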

This seems like a workable rule for making predictions: whenever you make a measurement, select your branch of the wavefunction and “follow the arrows” from there to predict future measurement results. Note that this rule does not discard the other branches entirely. All branches are still “there” at least mathematically, although most are unreachable in practice.

The wavefunction in this picture is globally shared among all observers. However, each person might perceive themselves to be in a different branch, depending on their random measurement results. This is shown in Fig. 4. Experimenters E1 and E2 measure the particle in turn. E1 may get A, so she selects the top branch. At the end of this branch, she perceives that both agree on position A. However, E2 may get B, so she selects the bottom branch, and perceives that both agree on position B. The key point is that in the end, each observer perceives an agreement on the position, so the measurement results are consistent from their own perspective.

Fig. 4. Experimenters E1 and E2 both measure the particle.

This example is similar to a famous thought experiment called Wigner’s friend. Wigner’s friend has historically been very confusing (as you can see from the Wiki article), so let me elaborate. Clearly, E1’s perceptions only depend on the particles in her own brain, not those in E2’s. When I say that she “perceives an agreement”, I mean that she treats E2 as a physical system and interacts with it, by asking her/it about the particle’s position, perhaps. That system then responds, by saying “A” or “B”, for example. This information gets received and stored in her brain in some form. From E1’s perspective, everything is a physical system, including other humans, animals, her own brain, etc. Only a subset of this system (her brain) corresponds to her perceptions3. Again, this is a very conservative viewpoint, since it does not assume other parts of the system correspond to some other entity’s perceptions. In other words, we do not assume other humans/animals/rocks/etc are “conscious”4.

Wavefunction collapse

So far so good, right? Unfortunately, this prediction rule does not quite work. Mathematically, you must completely discard the other branches every time you make an observation, and only keep the branch you are on. In other words, there can be no globally shared wavefunction. This is because probability amplitudes, unlike probabilities, can be negative. Quantum interference can cause the amplitude of a given scenario to be zero in a global wavefunction, even when that scenario is reachable in practice. If that branch is selected, it gives 0/0 for any future probabilities, which is undefined.

As usual, Fig. 5 shows an example. Assume you measure B. By the rule, you predict a 50% probability of either D or E ((1/2)^2/(1/\sqrt{2})^2=1/2). See Fig. 5(a). Note that we only consider arrows coming from B in this prediction. Then assume D is measured. We now try to apply the rule starting from D. See Fig. 5(b). However, the amplitude of D is zero! This comes from adding the two incoming arrows. We have 1/\sqrt{2}\times 1/\sqrt{2} from B, and 1/\sqrt{2}\times -1/\sqrt{2} from C, adding up to zero.

Fig. 5. (a) Making a prediction upon measuring B. (b) The prediction rule fails upon measuring D.

The solution is to discard all other branches upon each measurement, and set the amplitude of the measured branch equal to 1. This is called wavefunction collapse. It is shown in Fig. 6. When B is measured, we remove C and give B amplitude 1. Then when D is measured, we remove E and give D amplitude 1. This guarantees that probabilities are always well-defined.

Fig. 6. (a) Once B is measured, we discard branch C. (b) Once D is measured, we discard branch E.
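The failure and its fix can be reproduced numerically. This sketch uses the numbers from Figs. 5 and 6; the +1/\sqrt{2} arrows into E are my assumption, chosen so the total probability stays 1.

```python
import math

r = 1 / math.sqrt(2)

def propagate(amps, arrows):
    """Sum amplitude * arrow-number over all incoming arrows per state."""
    new = {}
    for (src, dst), a in arrows:
        new[dst] = new.get(dst, 0.0) + amps.get(src, 0.0) * a
    return new

arrows = [(('B', 'D'), r), (('B', 'E'), r), (('C', 'D'), -r), (('C', 'E'), r)]

# Keeping the global wavefunction: interference between the B and C branches
# drives the amplitude of D to zero, so conditioning on D later gives 0/0.
global_psi = propagate({'B': r, 'C': r}, arrows)
print(global_psi['D'])  # ~ 0: the prediction rule breaks down here

# Collapse: upon measuring B, discard C and set B's amplitude to 1.
collapsed = propagate({'B': 1.0}, arrows)
print(collapsed)  # D and E each get amplitude 1/sqrt(2): a well-defined 50/50
```

After collapse, every later conditional probability is well-defined, which is exactly what the formal rule guarantees.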

Wavefunction collapse is the most controversial aspect of QM. However, from the discussion above, we see that it is basically just a mathematical formality, since the prediction rule is unchanged except in special cases. Remember, we are only concerned with making predictions, not “modeling the world”. This avoids meaningless philosophical issues about whether the wavefunction or its collapse is “real”. The reason many are uncomfortable with collapse is because it is different from classical physics, in the following ways:

  • Different observers use different wavefunctions. In MWI, although observers may find themselves in different branches, there is only one wavefunction. Similarly, the classical universe is in a single big classical state. However, by discarding the other branches, different observers use entirely different mathematical objects (wavefunctions) to describe the universe. Of course, the physics stays the same, since as just mentioned, the prediction rule is almost the same.
  • Wavefunction collapse happens instantaneously. In classical physics, the state evolves continuously in time under Newton’s laws. In quantum physics, apart from wavefunction collapse, the wavefunction also evolves continuously in time under an equation called Schrödinger’s equation5. (We have summarized this continuous evolution using the arrows with numbers on them.) Wavefunction collapse instantly discards the other branches and assigns a new amplitude to the observed branch. How is such a discontinuous process allowed? Because any predictions must specify a time when the measurement yields a definite result. This is when collapse occurs6. More on this later.

The Copenhagen interpretation

This theory of wavefunction evolution plus collapse is loosely called the Copenhagen interpretation. Actually, there is no widely-agreed-upon definition of the Copenhagen interpretation, but one hallmark is the separation of the world into classical and quantum systems. QM was originally developed to describe small objects such as single particles using a wavefunction. In contrast, large objects such as photon detectors or human beings were treated as classical systems that cause wavefunction collapse. For example, a particle detector appears to “collapse” the wavefunction of a superposition state like Fig. 1 into a state with definite position, either A or B. In this picture, the particle detector is not part of the wavefunction.

Of course, this led to much confusion about where exactly to draw the line between classical and quantum. How large does a system have to be in order to become classical? As we have argued above, there is no inherent difference between objects such as particles and humans; they are all quantum systems and all part of the wavefunction. In other words, we draw the line at the observer’s “consciousness”. The act of observation causes collapse. This variant of Copenhagen is sometimes called the Von Neumann-Wigner interpretation, or “consciousness causes collapse”.

Consciousness is a dirty word among serious physicists, almost always for good reason. However, we simply use it to mean the ability to have subjective experiences, which was our very first assumption.

Measuring devices and decoherence

This raises the question of why large systems like particle detectors tend to “look” classical. In fact, this was not fully understood until the theory of decoherence emerged in the 1970s-1980s, decades after QM was developed. The basic idea is quite simple. Take a small system S in one of a few states A, B, C, etc. When it interacts with an environmental system E, this environment turns into a corresponding state E_A, E_B, E_C, etc. For a large environment, these environmental states tend to become well-separated very quickly. This is because there are many more microscopic states that the large environment can take.

For example, Fig. 7 shows a single particle bouncing around in a box. This is a small environmental system. If another particle is placed at position A (top left), eventually they will hit each other, affecting the path of the first particle in some way. If instead the second particle is placed at position B (bottom left), it will affect the first particle in a different way. However, there is a good chance that at some future time, the first particle will happen to be at (nearly) the same location for both scenarios, as seen in Fig. 7.

Fig. 7. Particle in a box with another one placed at either A or B. At some future time, it is likely that the first particle will be at the same location in both scenarios, as shown here.

Now consider a huge number of particles bouncing around in the box. This is a large environmental system. If a new particle is introduced at position A, it will rapidly scramble the paths of all the other particles as they interact with it and with each other. If instead the new particle is introduced at position B, it will scramble the paths in a very different way. At any future time, there is very little chance that all the original particles will be at all the same locations in the two scenarios. The environmental states E_A and E_B are well-separated.
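The separation can be made quantitative with a toy model (entirely my own, not from the post): suppose each environment particle independently ends up at the same location in both scenarios with probability p. The chance that all N particles coincide, a stand-in for the overlap of E_A and E_B, then falls off as p^N.

```python
# Toy model: each of N environment particles independently lands in the
# same place in scenarios A and B with probability p, so the overlap of
# the environmental states E_A and E_B scales like p**N.
p = 0.5
for n in [1, 10, 100]:
    print(n, p ** n)
# one particle: overlap 0.5; a hundred particles: below 1e-30, effectively zero
```

This is why a single particle in a box makes a poor measuring device while a box full of particles makes a good one.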

Fig. 8 shows a more accurate version of the measurement in Fig. 2, incorporating decoherence. The wavefunction initially splits into an equal superposition of position states A and B of the particle. At this time, the experimenter is in the same initial state for both branches. The experimenter then measures the particle by interacting with it. For example, there may be some light illuminating the particle, which goes into the experimenter’s eyes, which sends an electrical signal to the brain, etc. After a short amount of time, the experimenter’s brain is in very different states for the two scenarios A and B. This is seen by the nearly zero amplitude of the “observed B” state when the particle is at A (top-most branch), and the nearly zero amplitude of the “observed A” state when the particle is at B (bottom-most branch).

Fig. 8. More accurate version of Fig. 2 that incorporates decoherence.

To summarize: a measuring device looks classical if it causes decoherence. Therefore, you might think that decoherence can be used to define measurement, so that we do not need wavefunction collapse. This is not the case, for a couple of reasons. First, decoherence is never complete. In most decoherence models, the amplitude of the “wrong” branch approaches zero exponentially with time, but never reaches it. Therefore, we cannot define a time when the measurement is complete. Second, decoherence is only an emergent property of large systems. Why should conscious observers be limited to these systems? Indeed, how do we set a lower limit on the size or amount of decoherence anyway? Clearly, we cannot. The theory must still apply to general quantum systems as observers.

For example, consider an observer system that fluctuates rapidly in time, as in Fig. 9. The theory must still be able to associate states of this system with the observer’s perceptions. Since the branches do not remain separated over time, we cannot rely on decoherence. We also cannot say a state must be stable for a minimum amount of time in order to be measured. The observation, and thus collapse, must happen instantaneously.

Fig. 9. An observer in a rapidly fluctuating superposition.

Other interpretations

The Copenhagen interpretation has always been the standard one taught in textbooks. In the last few decades, many other interpretations have sprung up. I myself believed in MWI until I started thinking deeply about QM a few years ago. In my opinion, these other interpretations all stem from misunderstanding either the Copenhagen interpretation or the purpose of a physical theory. I will list some of them and their flaws here without further detail.

  • MWI is incomplete, as argued above.
  • Bohmian mechanics and consistent histories are ugly and overly complicated.
  • Quantum Bayesianism and relational quantum mechanics just dress up Copenhagen with some fancy words.

Summary

  • The minimum requirement for a scientific theory is that it makes predictions about an observer’s qualia. It does not have to predict the qualia of other entities, since they are not observable.
  • A theory does this by associating mathematical objects, or states, to certain qualia.
  • Classical physics predicts one future state, while quantum physics only predicts probabilities of each future state. This is done using a wavefunction that splits into multiple scenarios.
  • The wavefunction collapses upon an observation to the observed branch. Thus, different observers use different objects (wavefunctions) to describe the universe. Collapse is required mathematically for the theory to work.
  • Collapse must be instantaneous for the theory to apply to all possible observers.
  • Decoherence explains why certain objects look like classical measuring devices. However, it is only an approximation and does not replace the need for collapse.

1 Why can’t we observe the particle in two places at once? There are two ways to interpret this question in QM. 1) Why do we prefer the position basis instead of another basis? This is known as the preferred-basis problem. The short answer is that the preferred basis must be empirically determined, just as the perception of the color “red” must be correlated with certain wavelengths of light. More in the advanced version of this post. 2) Why can’t we perceive that we are in a superposition, in general? Because then we could prepare an identical state, violating the no-cloning theorem. More on this here, or see Nielsen & Chuang’s textbook.

2 To be pedantic, no experiments can be truly identical, because 1) the initial states cannot be exactly the same, and 2) the state of your brain has to include the memory of previous experiments. Of course, we really mean that for a series of experiments where we control all the relevant inputs, the results stored in your brain will converge to the predicted probabilities. Also, it goes without saying that many states are associated with the same quale: shifting the position of one molecule in your brain by a tiny amount has no observable effect.

3 This begs the question: how do we know what subset we can observe? As usual, we must determine this empirically!

4 Yes, this is basically solipsism. Unfortunately, that is where the logic of QM leads us. Don’t take it so seriously as to affect your personal moral code or anything.

5 Or more generally, the operator generated by the Hamiltonian.

6 Another common belief is that collapse is incompatible with relativity. This is false. Of course, we do not have a complete theory of quantum gravity, but for QFT in curved space, we can choose the collapse to occur on any spacelike hypersurface. This is because spacelike-separated operators commute, so can be simultaneously measured.

Solving Newcomb’s paradox for classical and quantum predictors

A recent HN post reminded me of Newcomb’s paradox, which goes as follows (from Wiki):

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:

  • Box A is clear, and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The question is whether the player should take both boxes, or only box B.

I first saw this problem many years ago but didn’t have a strong opinion. Now it seems clear that the controversy is about the definition of “reliable predictor”. This is usually left vague, leading to many unreliable philosophical and game-theory arguments. As usual, I will try to solve the problem using physics. Interestingly, the analysis is different for a classical versus quantum predictor, and also depends on the interpretation of quantum mechanics.

Classical predictor

Assume it is a classical supercomputer that, at prediction time, takes the state of the player and all the objects that they interact with until the decision. Call this state S_i. By running the physics forward, it arrives at either a state S_{AB} or S_B, corresponding to the decision to take both boxes or only box B, respectively. In this case, one should obviously take only box B.

Quantum predictor

In the quantum case, the initial wavefunction of the player/etc is \psi_i. The computer cannot measure the wavefunction directly due to the no-cloning theorem. Instead, one way to make the prediction is as follows. The decision to take both boxes corresponds to a set of orthonormal states \{\psi_{AB}\}, and likewise for \{\psi_B\}. These two sets are mutually orthonormal and form a complete basis, since there are only two choices. Given these sets, the computer can run Schrödinger’s equation back to prediction time to obtain the sets \{\psi_{ABi}\}=e^{i H t}\{\psi_{AB}\} and \{\psi_{Bi}\}=e^{i H t}\{\psi_B\}, respectively. These are also mutually orthonormal due to unitarity. At prediction time, it can measure the projection operator

\displaystyle P_{B}=\sum_a |\psi_{Bi}^a\rangle \langle\psi_{Bi}^a|.

The measurement gives 1 (take box B) with some probability p, and 0 (take both boxes) with probability 1-p. This collapses the player’s wavefunction to one of the states in \{\psi_{ABi}\} or \{\psi_{Bi}\}, which then evolves into a state in \{\psi_{AB}\} or \{\psi_B\}. Thus, from the predictor’s perspective, the predictor is always right.

The player models this measurement as the predictor becoming entangled with the player, so that the total wavefunction is something like

\displaystyle \sqrt{p}(\psi_{Bi}\otimes \psi_\text{predictB}) + \sqrt{1-p}(\psi_{ABi}\otimes\psi_\text{predictAB}).

If the player only makes a measurement at decision time, they will collapse the wavefunction to a state in \{\psi_{B}\} with probability p, or a state in \{\psi_{AB}\} with probability 1-p. We assume that this is the measurement basis since the player’s state should not become a superposition of (take B only) and (take both). The expected value is then simply:

\displaystyle E[p] = p B + (1-p)A = A+p(B-A)

where A=\text{\$1,000}, B=\text{\$1,000,000}. This is maximized at p=1, so the best decision is to take only box B, just as in the classical case.

Where we go from here depends on the interpretation of quantum mechanics. For many-worlds, there is only unitary evolution. The player ends up in the branch \psi_{B}\otimes \psi_\text{predictB} with probability p, giving the expected value above.

However, for Copenhagen-type interpretations where different observers can use different wavefunctions, the player can do better, since they are free to make any measurements between prediction and decision time, while the predictor assumes unitary evolution1. In fact, they can make the predictor predict (take B only) with certainty, while they actually take both with certainty. One way is as follows. Assume the player makes the decision based on measuring a qubit at decision time, where |\uparrow\rangle means take B only and |\downarrow\rangle means take both. The state of the qubit oscillates between |\uparrow\rangle and |\downarrow\rangle with period T, where T is the time between prediction and decision. At prediction time, assume the state is |\uparrow\rangle, so the predictor predicts (take B only). At time T/2, the player can make repeated measurements very quickly until decision time. The qubit stays in the |\downarrow\rangle state due to the quantum Zeno effect. Thus, at decision time, the player takes both boxes. The extra $1,000 can then contribute to funding the delicate and expensive equipment needed for the qubit.

We can take this one step further in some cases. For human players, the knowledge of the measurement protocol is classically encoded in the player’s brain in some way. If the supercomputer can decode this information instead of merely running the time evolution, they can also predict which measurements the player makes, and the probabilities of the subsequent results. We arrive back to the original case, where the best solution is to pick B only. This is not required by the postulates of quantum mechanics. The observer’s decision to make measurements on its state does not necessarily have to be encoded in its state itself.

Real predictor

In the real world, there are no such supercomputers, and no entity would risk $1,000,000 on a meaningless game. The best answer is to take both boxes.


1 In practice, a human’s measurements of their own state occur long after decoherence, so they have no control of their wavefunction in this way. However, if we are assuming all-powerful supercomputers, we may as well go all the way.

Fundamentals of classical mechanics, or why F = ma

Despite its simplicity, classical mechanics is not taught well in the typical physics curriculum. This is unfortunate because the general philosophy of constructing Lagrangians based on symmetries underlies all of modern physics. In this article, I explain basic Lagrangian mechanics in a systematic way starting from fundamental physical principles. It basically follows Landau and Lifshitz Vol. 1 but ties up some loose ends.

Principle of stationary action

Classical mechanics describes the motion of objects modeled as point particles. First, consider a single particle in empty space. At any given time, it has a position \vec x(t) and velocity \vec v(t)=\frac{d\vec{x}}{dt}.

Define a quantity S_{if}\{\vec x(t)\} that depends on the path of the particle \vec x(t) from time t_i to t_f. The principle of stationary action, or action principle, states that the path the particle actually takes is one where the action is stable to small perturbations in the path \vec x(t) \rightarrow \vec x(t) + \vec{\delta x}(t).

To elaborate, consider dividing the time interval from t_i to t_f into N segments, and take N\rightarrow \infty in the end. You may think of S_{if} as a function of many variables \{\vec{x}(t_i),t_i,\vec{x}(t_i+\Delta t),t_i+\Delta t,\cdots, \vec{x}(t_f), t_f\}, where \Delta t = (t_f-t_i)/N. (Note that the velocity \vec{v}(t) = \frac{\vec{x}(t+\Delta t)-\vec{x}(t)}{\Delta t}, so it is not an independent variable here.) Such a “function of a function” is called a functional. The principle of stationary action is then \frac{\delta S_{12}}{\delta x_i(t)}=0, i.e. the partial derivative of S_{12} with respect to any component of the position x_i at any time t is zero. The \delta symbol is generally used instead of \partial for functional derivatives.

Finally, the action principle only applies to perturbations that are zero at the boundaries: \vec{\delta x}(t_i) = \vec{\delta x}(t_f) = 0. This will become important later.

The Lagrangian

Consider the action S_{12} for time t_1 to t_2, and the action S_{34} for time t_3 to t_4, with t_1 < t_2 < t_3 < t_4. We require locality in time, meaning that a perturbation in the first interval only affects S_{12} and not S_{34}. Also, we assume additivity of the action: S_{12}+S_{23}=S_{13}. These conditions imply that S_{12} can be written as an integral from t_1 to t_2 of some quantity: S_{12}=\int_{t_1}^{t_2} \mathcal{L}(\vec{x}(t),\vec{v}(t), t). \mathcal{L}(\vec{x}(t),\vec{v}(t), t) is known as the Lagrangian. In general, it may depend on the position and velocity at time t, as well as the time t itself1.

Note that we may add a total time derivative \frac{df}{dt}(\vec{x},t) to the Lagrangian without affecting the principle of stationary action. Such a term produces the action:

\displaystyle\int_{t_i}^{t_f} dt\frac{df}{dt}(\vec{x},t) = f(\vec{x}(t_f), t_f)-f(\vec{x}(t_i), t_i)

by the fundamental theorem of calculus. The perturbation \vec{\delta x}(t) is zero at the boundaries by definition, so does not affect this action.

Let us now derive the form of the Lagrangian based on some other fundamental principles:

Homogeneity of space and time. No point in space or time is any different from any other, so the Lagrangian cannot depend on \vec{x} or t explicitly.

Isotropy of space. No direction in space is different from any other, so the Lagrangian can only depend on the magnitude (squared) of the velocity \vec{v}(t)^2.

Galilean invariance. The theory should be invariant under shifts by a constant velocity, \vec{x}\rightarrow \vec{x}+\vec{v}_0 t. In other words, there is no universal stationary frame of reference. Taking the time derivative, this is \vec{v}\rightarrow \vec{v}+\vec{v}_0. To first order in \vec{v}_0, the Lagrangian changes as

\displaystyle\mathcal{L}(\vec{v}^2)\rightarrow \mathcal{L}(\vec{v}^2+2\vec{v}\cdot \vec{v}_0) = \mathcal{L}(\vec{v}^2)+2\frac{\delta \mathcal{L}}{\delta \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0

The term 2\frac{\delta \mathcal{L}}{\delta \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0 will not affect the physics if it is a total time derivative of the form above. This only occurs if \frac{\delta \mathcal{L}}{\delta \vec{v}^2}(\vec{v}^2) is a constant. Call this constant \frac{1}{2} m. Thus, the Lagrangian for a single particle in free space is: \mathcal{L} = \frac{1}{2} m \vec{v}^2. The constant m is, of course, the mass.

To summarize, we derived the unique action and Lagrangian (up to a total time derivative) for a single particle from the following postulates:

  1. Locality in time
  2. Additivity of the action
  3. Homogeneity of space and time
  4. Isotropy of space
  5. Galilean invariance

Multiple particles

Now consider the n-particle case. The Lagrangian may generally depend on all the positions and velocities \vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n. Following the postulates above, it must take the form2:

\displaystyle \mathcal{L} = \left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

where the function U(\Delta \vec{x}_{ij}) depends on all the separations between the particles \{\Delta\vec{x}_{12} = \vec{x}_1-\vec{x}_2, \Delta\vec{x}_{13} =\vec{x}_1-\vec{x}_3, \cdots\}.

Euler-Lagrange equations

Let us now apply the principle of stationary action to the action:

\displaystyle S=\int dt\left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

Plugging in the variation \vec{x}_i\rightarrow \vec{x}_i+\vec{\delta x}_i for particle i, and expanding to first order in \vec{\delta x}_i, we get:

\displaystyle S\rightarrow S+ \int dt\left(m_i \vec{v}_i\cdot \vec{\delta v}_i - \nabla_i U \cdot \vec{\delta x}_i\right)

where \nabla_i U is the gradient of U with respect to \vec{x}_i. Using \vec{\delta v}=\frac{d}{dt}\vec{\delta x}, we can integrate the first term by parts, discarding the boundary term m_i \vec{v}_i\cdot \vec{\delta x}_i since \vec{\delta x}_i= 0 at the boundaries. We obtain:

\displaystyle \frac{\delta S}{\delta \vec{x}_i(t)}=-m_i \vec{a}_i(t)-\nabla_i U(t) = 0

where \vec{a} = \frac{d\vec{v}}{dt}. The equations obtained using the action principle are known as Euler-Lagrange equations or equations of motion. In this case, we have found Newton’s law for a conservative potential:

\displaystyle \vec{F} = -\nabla_i U=m_i \vec{a}_i

Beyond classical mechanics

Finally, it is interesting to see how the postulates above are modified in quantum and relativistic theories.

  1. Principle of stationary action. In quantum physics, the particle takes all paths instead of only the classical one! The quantum amplitude is given by summing up e^{i S\{x\}} over all paths. This is known as a path integral.
  2. Locality in time gets promoted to locality in space and time in field theory.
  3. Additivity of the action remains the same.
  4. Homogeneity of space and time remains the same.
  5. Isotropy of space remains the same.
  6. Galilean invariance is promoted to Lorentz invariance in relativity. Lorentz transformations relate space and time.

In modern theories, there are often additional symmetry principles that constrain the Lagrangian, such as gauge invariance and conformal invariance.


1 It also cannot depend on higher time derivatives due to the Ostrogradsky instability.

2 A term like \vec{v}_i\cdot \vec{v}_j with i \neq j is possible, but would imply that particles infinitely far away can affect each other, violating common sense (or, if you like, the cluster decomposition principle).

Physics textbooks for self-study

Here are some physics textbooks that I’ve read over the years. Each textbook is rated from 1-5 Diracs (Paul_Dirac,_1933.jpg) on quality for self-study. Most topics are divided into (basic) and (advanced).

Screen Shot 2020-01-04 at 11.43.27 AM
Figure 1. Areas of physics (biased toward high-energy theory). Special relativity and electromagnetism can be learned separately but complement each other. “Weak prerequisites” are math subjects that can usually be learned as you go along.

Tips for self-study:

  • Shorter is better when it comes to textbooks. The problem with self-study is missing the forest for the trees. Most textbooks can give you the details, but there is no one to explain how to fit the information in your head in a compact and memorable way. Shorter books are usually better for this. The flip side is that shorter books are harder to understand if you have no past exposure. Start by reading parts of a standard textbook to get the basics, then go back.
  • Do enough exercises. But don’t feel the need to do every single one before moving on, even if you are a little confused. It can be more efficient to just keep going, since physics is interconnected and the new material often clarifies the old.
  • Write notes in the margins of any confusing aspects of derivations or errata you discover. These will undoubtedly help you when you revisit them years later.

Personal (controversial) opinions:

  • Avoid mathematical physics-oriented books. When I started out, I thought more rigor can never hurt. But if you are interested in physics, learn physics. Math books often dwell on excessive formalism that is irrelevant for physics at the end of the day.
  • Amazon ratings are useless. Unless they’re really terrible, most books will have very good ratings. I suspect most reviewers used the book for a class, are already experts on the subject, or simply want to look smart. 🙃

Quantum mechanics (basic)

Griffiths, Introduction to Quantum Mechanics (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

I start by contradicting my own advice about shorter books. 😀  This is a long but very readable book that is even worth reading from cover to cover. There is a reason this is the standard textbook in many places. One tends to forget how much it covers: statistical mechanics, spontaneous and stimulated emission, band structure, WKB approximation… Not in great detail, but often enough.

Quantum mechanics (advanced)

Weinberg, Lectures on Quantum Mechanics (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

Weinberg’s books are known for their slow and systematic presentation. If you’re in a rush, my recommendation is to just read chapters 3 and 4, which contain the essentials of quantum mechanics and spin and are relatively self-contained.

Linear algebra (basic)

Strang, Introduction to Linear Algebra (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

Actually, I suggest the lectures instead of the book. One relaxing 45-minute lecture a day and you’ll know linear algebra in a month.

Classical mechanics (advanced)

Landau and Lifshitz, Mechanics (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

The Russian school excels at explaining things deeply and simply. The first two chapters contain the best exposition of classical mechanics there is. In my experience, even professional physicists are often confused by some foundational topics that are explained here. (For example, where does the Lagrangian \frac{1}{2}mv^2 come from? Answer: Homogeneity+isotropy of space, and Galilean invariance. Together with the principle of stationary action, this leads to F=ma.) If you’ve never seen a Lagrangian before, start with one of the numerous intros, like this one.

Special relativity (SR)/Electromagnetism (advanced)

Landau and Lifshitz, The Classical Theory of Fields (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

Amusingly, this does not actually cover the simplest classical field theories (scalar fields) since the only relevant classical fields in practice are the electromagnetic and gravitational. Chapters 1-4 are an excellent exposition of SR and how E&M fits into it, while chapters 10-12 are a decent introduction to general relativity that complements other texts.

General relativity (GR)

Dirac, General Theory of Relativity (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

Who said GR is hard to understand? This pamphlet by the big man himself weighs in at only 69 pages. Unlike most books, it explains curved spacetime as a surface embedded in a higher dimensional space with flat metric. In my view, this is the most intuitive way to understand it. Among other things, it leads to the covariant derivative as the projection of the directional derivative onto the tangent space, a very pleasing interpretation of an otherwise confusing concept.

No exercises though. So as an introduction, you will want:

Zee, Einstein Gravity in a Nutshell (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

This is the book I wish I had when starting GR. Zee is one of the most gifted physics expositors of our time. Unfortunately, it is rather long, so I would recommend first reading enough of this one to understand Dirac, then going back to this one for special topics.

Carroll, Spacetime and Geometry: An Introduction to General Relativity (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

This was my first exposure to GR. I got through about chapter 3 before getting confused and stopping. This is one of those mathematical physics books I mentioned above, with a lot of formalism surrounding manifolds, tensors, and differential forms at the outset. It is good to know eventually, but not what you need as an introduction. I suppose it would make a good reference, but Zee’s book also serves well in this regard.

Quantum field theory

The subjects above are all well-established and have a fairly defined “core”. On the other hand, QFT is an evolving field with a sprawling mess of important results. Each textbook emphasizes different aspects, so you will need multiple books.

Zee, Quantum Field Theory in a Nutshell (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

This was my first and favorite QFT book. Other textbooks have more detail, but none will make you fall in love with the subject like this one. Just get it and enjoy the magic of the path integral.

Schwartz, Quantum Field Theory and the Standard Model (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

This is a very thorough textbook, perhaps the modern successor to the classic Peskin and Schroeder. I particularly enjoyed the bottom-up construction of spin 1 and 2 Lagrangians in chapter 8. One criticism is that many calculations are rather clunky and involved. For example, scalar QED is heavily used, which is conceptually simpler but involves more diagrams than spinor QED. I prefer Zee’s approach of just starting with spinor QED.

(Also, his notation with all indices on the same level bugs me…)

Srednicki, Quantum Field Theory

No rating for this one since I haven’t read it in much detail. The first chapter (“Attempts at relativistic quantum mechanics”) is an excellent motivation for QFT. The chapters are short and to the point. If I could start over, I would probably read this one concurrently with Zee.

Group theory

Zee, Group Theory in a Nutshell for Physicists (Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg Paul_Dirac,_1933.jpg)

For those like me that get bored to death reading pure math textbooks, Zee’s usual colloquial style makes even classifying representations of finite groups exciting. Not absolutely necessary to read if you’re in a hurry to learn more physics, but still a joy.


Advanced resources

Once you have a grasp of the areas above, additional topics can be learned without having to rearrange your entire worldview (with the possible exception of string theory). Here are some of my favorite advanced resources.

Shifman, Advanced Topics in Quantum Field Theory

Despite the title, this book focuses on simple explanations of modern topics without arduous derivations. Some interesting results cannot be found elsewhere, e.g. that domain walls antigravitate!

Terning, Modern Supersymmetry: Dynamics and Duality

This is a compact volume on supersymmetric field theory. The first three chapters are quite good, but I found some explanations in later chapters hard to understand. A better intro to Young tableaux is found here.

Polchinski, String Theory Vols. 1 and 2

This labor of love by the father of D-branes himself covers pre-AdS/CFT string theory. It seems to be the standard textbook on the subject, for good reason. The explanations are clear and the text contains many invaluable exercises. His passion for the topic is evident throughout.

Hartman, Lecture notes on quantum gravity and black holes

Not a textbook, but a good set of lecture notes by Tom Hartman. Explores many contemporary topics that have yet to make it into any textbooks I know of. Many useful exercises are included.

Notes on gravity as a gauge theory

Gravity has often been called a gauge theory of the Poincaré or Lorentz group. Here, I develop general relativity in direct analogy to Yang-Mills theory, avoiding geometry entirely1. None of this is original, but I have tried to simplify the presentation compared to the literature, where the similarities and differences between the two theories are often unclear.

Gauge fields and field strengths

The Poincaré algebra is:

[P_a, P_b] = 0

[P_a, M_{bc}]=\eta_{ab}P_c - \eta_{ac} P_b

[M_{ab}, M_{cd}] = \eta_{ad}M_{bc}+\eta_{bc}M_{ad} - \eta_{bd}M_{ac}-\eta_{ac}M_{bd}

Roman letters a,b, \cdots are gauge indices, while Greek letters \mu, \nu, \cdots are coordinate indices. We use “mathematician’s convention” for the generators where the i is absorbed: T_{math}=i T_{physics}. We proceed just as in Yang-Mills theory, taking the Poincaré group as the gauge group. It has 10 generators: 4 translations P_a and 6 rotations/boosts M_{ab}.

Introduce the covariant derivative:

\displaystyle D_\mu = \partial_\mu - e_\mu^a P_a - \frac{1}{2}\omega^{ab}_\mu M_{ab}

where e_\mu^a (the vielbein) and \omega^{ab}_\mu (the spin connection) are the gauge fields associated with translations and rotations, respectively. Note the units: P_a has unit 1, so e_\mu^a is unitless, while M_{ab} is unitless, so \omega^{ab}_\mu has unit 1. We can take \omega^{ab}_\mu to be antisymmetric in ab since M_{ab} is antisymmetric. The field strengths are found in the usual way:

\begin{aligned} F_{\mu\nu}&=D_\mu D_\nu - D_\nu D_\mu \\ &= -C^a_{\mu\nu}P_a -\frac{1}{2}R^{ab}_{\mu\nu}M_{ab} \end{aligned}

where we have defined the field strengths C^a_{\mu\nu} (the torsion) and R^{ab}_{\mu\nu} (the curvature tensor).
We obtain:

C^a_{\mu\nu}=\partial_\mu e^a_\nu - \partial_\nu e^a_\mu - \omega^{a}_{\mu b} e^b_\nu + \omega^{a}_{\nu b} e^b_\mu

R^{ab}_{\mu\nu}=\partial_\mu \omega_\nu^{ab} - \partial_\nu \omega_\mu^{ab}-\omega_\mu^{ac}\omega_{\nu c}^{\;\;\;\;b} + \omega_\nu^{ac}\omega_{\mu c}^{\;\;\;\;b}

As usual, we raise and lower indices using \eta_{ab} and \eta^{ab}.

General relativity is obtained by setting the torsion C^a_{\mu\nu}=0. Certainly, theories with torsion have been extensively considered, but we will not do so here. Experimental data have not ruled out theories involving both torsion and curvature. However, the bottom-up construction of the Lagrangian of an interacting massless spin-2 particle produces general relativity2.

This constraint allows us to solve for the spin connection in terms of the vielbein. After some calculation (e.g. listing out all possible terms and matching coefficients), the answer is:

\displaystyle \omega_\mu^{ab}=\frac{1}{2}(e^{\rho b}\partial_\mu e_\rho^a-e^{\rho a}\partial_\mu e_\rho^b+ e^{\rho a} e^{\sigma b} \partial_\rho g_{\mu\sigma}-e^{\rho b}e^{\sigma a}\partial_\rho g_{\mu\sigma} )

where g_{\mu\nu}=e_\mu^a \eta_{ab} e_\nu^b.

Representations and Lagrangians

Just as in Yang-Mills theory, the Poincaré group here acts as an internal symmetry group. Fields transform as a finite-dimensional representation of the Lorentz algebra, and transform trivially under translations3: P_a=0. This has an important consequence for constructing Lagrangians. Recall that the gauge field A_\mu(x)=A^a_\mu(x) T^a in Yang-Mills theory transforms as

A_\mu\rightarrow U A_\mu U^{-1} + (\partial_\mu U) U^{-1}

under a gauge transformation U(x). The (\partial_\mu U) U^{-1} term is required to cancel out the (\partial_\mu U)\phi(x) in the transformation of \partial_\mu\phi(x). However, since P_a=0, no such term is needed here: e^a_\mu is already gauge-covariant and can be placed directly in the Lagrangian.

The simplest term is:

\displaystyle \mathcal{S}_\Lambda = \frac{\Lambda}{4!} \int \epsilon_{abcd}e^a e^b e^c e^d

where e^a=e^a_\mu dx^\mu is a 1-form and \epsilon_{abcd} is the totally antisymmetric symbol4. This is the cosmological constant. It is equivalent to the standard form \Lambda\int d^4 x \sqrt{-g}.

On the other hand, the spin connection \omega^{ab}_\mu does show up in the gauge transformation, so we must use the field strength R^{ab}_{\mu\nu} in the Lagrangian. The next simplest term is then:

\displaystyle \mathcal{S}_{EH} = \frac{M_{Pl}^2}{3}\int \epsilon_{abcd} e^a e^b R^{cd}

where R^{cd}=R^{cd}_{\mu\nu}dx^\mu dx^\nu is a 2-form. This is the Einstein-Hilbert action. Unlike Yang-Mills, we are permitted a term that is only linear in the field strength R^{cd}.

Coupling to matter fields

Flat-space Lagrangians contain terms with global Lorentz indices, such as \partial_\mu \varphi and A_\mu. We would like these to transform under the local Lorentz group with indices a, b, \cdots. The only object that can switch between global and local indices is e_\mu^a, or its inverse, e^\mu_a. Thus, the general prescription for coupling a flat-space Lagrangian to gravity is:

  1. Contract all tensors with e_\mu^a or e^\mu_a.
  2. Make flat-space invariants use local indices: \eta_{\mu\nu}\rightarrow \eta_{ab}, \epsilon_{\mu\nu\rho\sigma}\rightarrow \epsilon_{abcd}.
  3. Use covariant derivatives: \partial_\mu\rightarrow \partial_\mu-\frac{1}{2}\omega_\mu^{ab} M_{ab}.

Note that this even works on the volume form d^4 x, producing the familiar invariant measure d^4 x \sqrt{-g}:

\displaystyle d^4 x = \frac{1}{4!}\epsilon_{\mu\nu\rho\sigma} dx^\mu dx^\nu dx^\rho dx^\sigma \rightarrow \frac{1}{4!} \epsilon_{abcd} e^a_\mu e^b_\nu e^c_\rho e^d_\sigma dx^\mu dx^\nu dx^\rho dx^\sigma
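As a quick numerical sanity check (my addition, not part of the original derivation), one can verify with numpy that this contraction equals \det e, which is exactly \sqrt{-g}:

```python
import itertools
import numpy as np

def levi_civita(n=4):
    """Totally antisymmetric symbol with eps[0,1,2,3] = +1."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        eps[perm] = np.linalg.det(np.eye(n)[list(perm)])  # sign of the permutation
    return eps

rng = np.random.default_rng(0)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# A random invertible vielbein e[a, mu] near the identity
e = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
g = np.einsum('am,ab,bn->mn', e, eta, e)   # g_{mu nu} = e^a_mu eta_ab e^b_nu

eps = levi_civita()
# (1/4!) eps_{abcd} eps^{mu nu rho sigma} e^a_mu e^b_nu e^c_rho e^d_sigma
vol = np.einsum('abcd,mnrs,am,bn,cr,ds->', eps, eps, e, e, e, e) / 24.0

assert np.isclose(vol, np.linalg.det(e))                 # the contraction is det(e)
assert np.isclose(abs(vol), np.sqrt(-np.linalg.det(g)))  # and |det(e)| = sqrt(-g)
print("volume form check passed")
```

The sign of \det e depends on the orientation of the frame; for an orientation-preserving vielbein the absolute value is redundant.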

For example, a scalar field coupled to gravity has the action:

\displaystyle \mathcal{S} = \frac{1}{2\cdot 4!}\int \epsilon_{bcdf} e^b e^c e^d e^f (e_a^\mu e^{\nu a} \partial_\mu\varphi\partial_\nu\varphi - m^2 \varphi^2)

An advantage of the vielbein formalism is that spinors can be coupled to gravity. For Dirac spinors, the Dirac matrices should also be converted to local indices \gamma^\mu\rightarrow \gamma^a, since they satisfy the Clifford algebra \{\gamma^a,\gamma^b\}=2\eta^{ab}. The Lagrangian for a massless fermion becomes:

\mathcal{L}=i\bar\Psi \gamma^a e_a^\mu (\partial_\mu-\frac{1}{2}\omega^{bc}_\mu M_{bc})\Psi

where

\displaystyle M_{ab}=S_{ab}=\frac{1}{4}[\gamma_a,\gamma_b]
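As a concrete check (my addition), the Dirac-representation gamma matrices satisfy the Clifford algebra above, and S_{ab}=\frac{1}{4}[\gamma_a,\gamma_b] is antisymmetric in ab, as a Lorentz generator must be:

```python
import numpy as np

# Pauli matrices
sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]
Z2 = np.zeros((2, 2), dtype=complex)
I2 = np.eye(2, dtype=complex)

# Dirac-representation gamma matrices gamma^a, a = 0, 1, 2, 3
gamma = [np.block([[I2, Z2], [Z2, -I2]])]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sig]

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Clifford algebra: {gamma^a, gamma^b} = 2 eta^{ab} 1
for a in range(4):
    for b in range(4):
        anti = gamma[a] @ gamma[b] + gamma[b] @ gamma[a]
        assert np.allclose(anti, 2 * eta[a, b] * np.eye(4))

# Lorentz generators S_ab = (1/4)[gamma_a, gamma_b], indices lowered with eta
glo = [eta[a, a] * gamma[a] for a in range(4)]           # eta is diagonal
S = [[(glo[a] @ glo[b] - glo[b] @ glo[a]) / 4 for b in range(4)]
     for a in range(4)]
assert all(np.allclose(S[a][b], -S[b][a]) for a in range(4) for b in range(4))
print("Clifford algebra check passed")
```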


1 This is ironic from a historical perspective, since Yang and Mills were inspired by general relativity. Of course, in physics, there are many ways to skin a cat.

2 Schwartz, Matthew. Quantum Field Theory and the Standard Model, Ch. 8.

3 Thus, you could say gravity is the gauge theory of the Lorentz group instead. However, we had to introduce the vielbein as part of the covariant derivative in order to get the correct theory. So there is a slight wrinkle in the analogy.

4 Unlike Yang-Mills theory, we cannot write the Lagrangian using the “abbreviated” fields e=e^a_\mu P_a dx^\mu. In fact, e e vanishes due to [P_a,P_b]=0.

Quantum mechanics explained

After being a strong believer in the many-worlds interpretation of quantum mechanics for years, I have now completely changed my mind. Many-worlds is seriously flawed, and the good old Copenhagen interpretation is not so bad.

Specifically, the correct interpretation of quantum mechanics is the Von Neumann-Wigner interpretation, a flavor of Copenhagen that puts the Heisenberg cut at the observer’s consciousness. The orthodox Copenhagen interpretation, which allows placing the cut at a physical measuring device, is a useful approximation due to decoherence.

What is physics?

Understanding quantum mechanics requires thinking carefully about what physics is and is not. The point of a physical theory is to make predictions about sensory experience. It is only about modeling the world if this helps to make predictions. Thus, the observer’s consciousness1 is just as fundamental as the mathematical objects of the theory. In classical physics, this is obscured because the mathematical objects of the theory are shared among all observers, rendering the observer apparently redundant. Quantum mechanics relaxes this assumption and allows different observers to use different mathematical objects (wavefunctions).

Quantum and classical compared

Let me elaborate on classical and quantum physics.

Classical mechanics describes a system of particles with positions and momenta that evolve in time under Newton’s law. Quantum mechanics is quite similar: it describes a system of particles with a field called the wavefunction that evolves in time under Schrödinger’s equation2. If that were the whole story, quantum mechanics would be pretty much the same as classical mechanics.

However, these are just mathematical constructs so far. How do we actually verify classical mechanics? We can only sense the set of particles corresponding to our body/brain, so we must find a way to cause the system of interest to interact with these particles. In other words, we must split the universe into system and observer3. Then we must assign different states of our state space to different perceptions corresponding to the results of a measurement.

This is exactly what happens in quantum mechanics as well. The difference is that quantum mechanics contains superposition states, while observers can only distinguish between orthogonal states. Thus, there must be a rule to say which orthogonal state in a superposition the observer actually perceives: Born’s rule4.

Why many-worlds fails

Many-worlds seems like a simple and attractive idea that accomplishes the goal: it tells you what an observer perceives using only unitary evolution of a global wavefunction, similar to classical physics. However, it is seriously flawed. Many-worlds models a measurement as follows:

\displaystyle \left(\sum_i c_i | s_i \rangle \right) \otimes |O_0\rangle \rightarrow \sum_i c_i |s_i'\rangle \otimes |O_i\rangle

where |s_i\rangle are the system basis states, |s_i'\rangle are the new system states for each |s_i\rangle, |O_0\rangle is the initial observer state and |O_i\rangle are the final observer states. The |s_i'\rangle are left arbitrary to include both destructive and non-destructive measurements. Measurement is complete upon decoherence, when \langle O_i|O_j\rangle \approx \delta_{ij}. Then the states |O_i\rangle are interpreted as the different perceptions of the observer.
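To make this concrete, here is a minimal sketch (my addition) of the smallest instance of this equation: a single qubit measured non-destructively by a one-qubit "observer" through a CNOT interaction, after which the observer states are exactly orthogonal:

```python
import numpy as np

# System in superposition, observer in the "ready" state |O_0> = |0>
c = np.array([0.6, 0.8])                      # c_0 |0> + c_1 |1>
psi = np.kron(c, np.array([1.0, 0.0]))        # (sum_i c_i |s_i>) ⊗ |O_0>

# CNOT: flips the observer qubit iff the system is |1>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
out = CNOT @ psi                              # sum_i c_i |s_i> ⊗ |O_i>

# Extract the (unnormalized) observer states conditioned on each system state
O = out.reshape(2, 2)                         # O[i] = c_i |O_i>
overlap = O[0] @ O[1]                         # proportional to <O_0|O_1>
assert np.isclose(overlap, 0.0)               # decoherence is complete here
print("observer states are orthogonal")
```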

This has several problems. In order of least to most serious:

1. Decoherence is never complete.

What happens in this case? Observers can only distinguish between orthogonal states. An idea is to rewrite the final wavefunction as a sum of direct products in some orthonormal observed basis |O_i''\rangle:

\sum_i c_i'' |s_i''\rangle \otimes |O_i''\rangle

Then the observed system states c_i'' |s_i''\rangle would simply be slightly different from the original ones c_i |s_i'\rangle, corresponding to a small error in the measurement.
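This rewriting is just the Schmidt decomposition of the final state, which can be computed with a singular value decomposition. A minimal sketch (my addition), with a hypothetical pair of almost-orthogonal observer states:

```python
import numpy as np

# Coefficient matrix Psi[s, o] of the final state sum_i c_i |s_i'> ⊗ |O_i>,
# with observer states that are almost, but not exactly, orthogonal.
c = np.array([0.6, 0.8])
O0 = np.array([1.0, 0.0])
O1 = np.array([0.05, 1.0])
O1 /= np.linalg.norm(O1)                         # <O0|O1> ≈ 0.05, not 0
Psi = c[0] * np.outer([1.0, 0.0], O0) + c[1] * np.outer([0.0, 1.0], O1)

# Schmidt decomposition via SVD: the rows of Vh are an exactly orthonormal
# observed basis |O_i''>, and the singular values are the new weights c_i''.
U, s, Vh = np.linalg.svd(Psi)
assert np.allclose(Vh @ Vh.conj().T, np.eye(2))  # orthonormal observed basis
assert np.isclose(np.sum(s**2), 1.0)             # total probability preserved
assert not np.allclose(s, [0.8, 0.6])            # weights shifted slightly
print("Schmidt decomposition check passed")
```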

2. It assumes the observer is not entangled with the system before measurement.

This is obviously false most of the time! Everything is usually entangled with everything else. To generalize the above, what we actually want is some rule for “hopping” between perceived states of the observer, given an arbitrary entangled state \psi(t). I invite you to come up with such a hopping rule. Seriously, try it.

For example, consider this plausible attempt at a hopping rule. The probability of hopping from state i at time t, to state j at time t+\Delta t, is:

p_{i\rightarrow j} = \displaystyle \frac{\text{tr}\left( P_j e^{-iH\Delta t} P_i \rho(t) P_i e^{iH\Delta t}\right)}{\text{tr}\left( P_i \rho(t)\right)}

where P_i is a projection operator corresponding to state i and \rho(t) is the density matrix5. This has the required property that \sum_j p_{i\rightarrow j} = 1, since \sum_j P_j = 1. This gives the same probabilities that would be observed if the state had collapsed to i at time t, but without actually collapsing the state. The problem is that the denominator can be zero: the previous hop may have landed in state i with nonzero probability even though \text{tr}\left(P_i \rho(t)\right) = 0. The state actually has to collapse to ensure this doesn’t happen.
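The normalization property is easy to verify numerically. A minimal sketch (my addition), with a random Hamiltonian and density matrix on a toy space with two observer states:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                   # 2 observer states ⊗ 2 "rest" states

# Random Hermitian H and the unitary step e^{-i H dt} via eigendecomposition
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * 0.1)) @ V.conj().T

# Projectors P_i = P_Oi ⊗ 1 onto the two observer states; they sum to 1
P = [np.kron(np.diag(v), np.eye(2)) for v in ([1.0, 0.0], [0.0, 1.0])]

# Random density matrix rho(t)
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = B @ B.conj().T
rho /= np.trace(rho)

# Hopping probabilities p_{i -> j} out of state i = 0
i = 0
p = [np.trace(P[j] @ U @ P[i] @ rho @ P[i] @ U.conj().T).real /
     np.trace(P[i] @ rho).real for j in range(2)]
assert np.isclose(sum(p), 1.0)          # sum_j p_{i->j} = 1
print("hopping probabilities:", np.round(p, 4))
```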

3. It assumes the many worlds never re-merge or overlap.

Consider the observer’s density matrix \rho_O(t)=tr_S(\rho(t)). The diagonal elements in the observed basis \rho_{Oii} = \langle O_i | \rho_O(t) | O_i\rangle are constantly evolving into each other, with \sum_i \rho_{Oii} = 1. A hopping rule is impossible because you cannot tell which previous state a certain \rho_{Oii} “came from” in the past, unless you assume each state comes from just one past state. This is clearly not true in general.

Many-worlds proponents sometimes argue that macroscopic systems in different states are unlikely to revisit the same state. However, then one must pick a certain size (dimensionality) above which re-merging becomes “acceptably” unlikely. There is clearly no fixed size. For an exact theory of physics, one cannot ignore edge cases like this just because they are rare. Ironically, while many-worlds proponents like to point to the seemingly arbitrary nature of wavefunction collapse, it is many-worlds that places arbitrary restrictions on what systems can be considered observers.

Why Copenhagen is fine

The key insight of the Copenhagen interpretation (i.e. quantum mechanics itself) is that a global (objective) reality is not required to make predictions.

One way to understand this is with the Wigner’s friend thought experiment, which I have slightly extended below.

Wigner prepares his friend and a two-state system in a superposition state

(a|\uparrow\rangle + b|\downarrow\rangle)\otimes |\psi_{friend}\rangle

When his friend measures the system, he may obtain the state |\uparrow\rangle. He then tells Wigner his result, so that, from the friend’s perspective, Wigner knows that |\uparrow\rangle was measured. However, Wigner models this measurement as the total state

a |\uparrow\rangle\otimes |\uparrow_{observed}\rangle + b |\downarrow\rangle\otimes |\downarrow_{observed}\rangle

When Wigner measures his friend (by asking him about it, perhaps), he may see a different state |\downarrow\rangle\otimes |\downarrow_{observed}\rangle, so he believes that |\downarrow\rangle was measured. Thus, they may both experience totally different things. But each observer sees an internally consistent story, so the theory is consistent. That’s it.

Measuring devices

This subjective view of physics implies that measurements are made on the observer’s Hilbert space, not on external measuring devices. Then why can some objects be considered classical measuring devices in practice? The answer comes down to decoherence. I will explain this in a somewhat roundabout way that highlights the behavior of real measuring devices.

Recall the textbook measurement postulate: a measurement collapses the system to an eigenstate of the measured Hermitian operator, with probability given by Born’s rule. This is often false in practice! For example, in quantum optics, a photodetector may measure the position of a photon, but collapse the system to the “no photon” (vacuum) state.

Real-world measurements are described by so-called general measurements6. These are defined by a set of operators M_i corresponding to the results of the measurement. The probability for result i is:

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

upon which the wavefunction collapses to

\displaystyle |\psi\rangle \rightarrow \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

The measurement operators satisfy the completeness relation

\sum_i M_i^\dagger M_i = 1

The M_i do not have to be Hermitian. For a photodetector, they would be something like M_\textbf{n}=|0\rangle\langle \textbf{n}|, where \textbf{n} are some properties of the photon, like position and polarization. General measurements reduce to conventional (projective) measurements when the M_i are Hermitian and orthogonal projectors: M_i M_j = \delta_{ij} M_i.
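A minimal numerical sketch of a photodetector-style general measurement (my addition; the three-level "photon" space and the specific M_n are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3                                   # toy "photon" space: 3 positions

# Photodetector-style measurement operators M_n = |0><n|: every outcome
# collapses the system to the vacuum |0>, not to |n>.
M = [np.outer(np.eye(d)[0], np.eye(d)[n]) for n in range(d)]

# Completeness relation: sum_n M_n^dagger M_n = 1
assert np.allclose(sum(m.conj().T @ m for m in M), np.eye(d))

# Random normalized state
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)

# Outcome probabilities p_n = <psi| M_n^dagger M_n |psi> = |<n|psi>|^2
p = [np.vdot(psi, m.conj().T @ m @ psi).real for m in M]
assert np.isclose(sum(p), 1.0)

# Post-measurement state for any outcome n is the vacuum (up to phase)
n = 1
post = M[n] @ psi / np.sqrt(p[n])
assert np.allclose(np.abs(post), np.eye(d)[0])
print("general measurement check passed")
```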

General measurements are equivalent to unitary interaction of a system with an ideal environment, followed by a projective measurement on the environment. Specifically, consider coupling the system to an environmental Hilbert space: \mathcal{H} = \mathcal{H}_s \otimes \mathcal{H}_e. The environment is initially in the state |0\rangle. Introduce the operator U such that

U(|\psi\rangle \otimes |0\rangle)=\displaystyle \sum_i M_i|\psi\rangle \otimes |i_E\rangle

where |i_E\rangle are orthonormal states of the environment corresponding to the M_i.

You can check that U preserves inner products of the system Hilbert space:

(\langle v|\otimes \langle 0|) U^\dagger U (| w\rangle \otimes |0\rangle) = \langle v | w\rangle

It can be shown that such a U can be extended to a unitary operator U' on the entire Hilbert space. Now if we measure an operator on the environment with eigenstates |i_E\rangle, we obtain one of the system states

\displaystyle \frac{M_i |\psi\rangle}{\sqrt{\langle \psi | M_i^\dagger M_i | \psi\rangle}}

with probability

p_i = \langle \psi | M_i^\dagger M_i | \psi\rangle

just as above.
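A minimal sketch of this equivalence (my addition; the particular two-outcome unsharp qubit measurement is an illustrative assumption): build the map |\psi\rangle \mapsto \sum_i M_i|\psi\rangle\otimes|i_E\rangle as a matrix and check that it preserves inner products and reproduces the probabilities p_i:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 2, 2                              # system dimension, number of outcomes

# A two-outcome unsharp qubit measurement (illustrative choice of M_i)
M = [np.diag([np.sqrt(0.9), np.sqrt(0.2)]),
     np.diag([np.sqrt(0.1), np.sqrt(0.8)])]
assert np.allclose(sum(m.conj().T @ m for m in M), np.eye(d))  # completeness

# V|psi> = sum_i M_i|psi> ⊗ |i_E>, as a (d*n) x d matrix acting on the system
# (basis ordering |s> ⊗ |i_E>, environment index fastest)
V = sum(np.kron(m, np.eye(n)[i].reshape(-1, 1)) for i, m in enumerate(M))
assert np.allclose(V.conj().T @ V, np.eye(d))    # V preserves inner products

psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)

# Projectively measuring |i_E> on the environment reproduces p_i
out = (V @ psi).reshape(d, n)            # out[:, i] = M_i |psi>
p = [np.linalg.norm(out[:, i]) ** 2 for i in range(n)]
for i, m in enumerate(M):
    assert np.isclose(p[i], np.vdot(psi, m.conj().T @ m @ psi).real)
print("dilation check passed")
```

Extending V to a unitary U' on the whole Hilbert space amounts to filling in its remaining columns with an orthonormal complement.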

Look familiar? This interaction U is a more general version of the many-worlds “decoherence equation” above. Thus, the condition for a quantum object to implement a general measurement is that its internal states must interact with the system in this way. Decoherence propagates to the next object and so on until it reaches the observer, who makes the measurement.

Conclusion

In a nutshell: quantum mechanics relaxes the assumption of an objective description of the universe, while still being a predictive physical theory.

FAQ

Q: How is the system measurement basis determined (the preferred-basis problem)?

A: First, recall that we do not measure the system directly, only our brain/body after it has interacted with the system. As to which of our internal states correspond to which perceptions, note that the same question applies to classical physics. In both cases, we must determine this empirically.

Q: Isn’t the boundary between system and observer also arbitrary? How do we determine which degrees of freedom can be perceived?

A: Again, the same question applies to classical physics, and must be determined empirically.

Q: What objects have consciousness?

A: No physical objects have consciousness. From your perspective, all physical objects are part of the wavefunction, and nothing else has the power to collapse the wavefunction. (Yes, this unfortunately leads to a kind of solipsism. It’s a lonely world out there.)


1 “Consciousness” is a dirty word among physicists, usually for good reason. Here, it simply means the ability to perceive things: cogito, ergo sum. In the formalism of quantum mechanics, this translates to the ability to collapse the wavefunction by inquiring about a measurement result. Much confusion results from trying to ascribe consciousness to physical objects or from giving the word additional meanings.

2 Or its field theory generalizations.

3 Semantic note: I sometimes use “observer” to refer to the subspace of the state space that is perceived, and sometimes to the conscious entity that does the measurement to collapse the wavefunction. Many-worlds says the latter does not exist. It should be clear from context which one is meant.

4 Can Born’s rule be derived? No: probability is nowhere to be found in unitary time evolution, so there must be some axiom introducing probability into the theory. Regardless, whether Born’s rule is fundamental or derived has no bearing on the next section.

5 P_i = P_{Oi} \otimes 1, where P_{Oi} is a projector on the observer’s space and 1 is the identity on the rest of the space.

6 This section mostly comes from Nielsen and Chuang, Quantum Computation and Quantum Information, Ch. 2.2.