
Quantum mechanics for everyone

This post explains quantum mechanics (QM) without any advanced math. Unlike most introductions, I will focus on the interpretation of QM: what the objects in the theory mean and how they fit into a broader philosophy of doing physics. Specifically, I explain why the Von Neumann-Wigner interpretation, a variant of the standard Copenhagen interpretation, is the correct one. I also explain why a popular alternative to Copenhagen, the many-worlds interpretation, is incorrect.

The footnotes will contain details for more advanced readers. Also, see here for a shorter and more math-heavy version of this post.

What is science?

Let’s start with what we know. As Descartes said, “I think, therefore I am.” We know that subjective experience exists. In philosophy, subjective experiences are called qualia (singular quale). One purpose of science (including physics) is to predict what qualia we will experience, based on our past experiences. This is simply because qualia are, by definition, all that we can experience, so any attempt to verify a scientific theory necessarily involves qualia as inputs and outputs.

This focus on subjective experience may sound fuzzy and unrigorous, especially for those used to classical physics. However, it is actually a very conservative viewpoint. Some may say that the goal of science is instead to understand the objective world around us. That may be the case, but at a minimum, a theory must also be able to make predictions about our experiences. More on this as we go along.

The wavefunction and many-worlds

In this section, I will explain the basic ideas of QM, in the language of the many-worlds interpretation (MWI). MWI provides a convenient way to visualize QM as the continual splitting of a system’s state into many branches, or “worlds”. I will then show that MWI alone cannot be used to make predictions, for both practical and mathematical reasons. However, we can fix it by adding the concept of wavefunction collapse. This produces the Copenhagen interpretation.

Quantum mechanics describes the universe using a mathematical object called a wavefunction, with the symbol \psi. In the quantum world, a system can be in a combination of classical states instead of being in one state at a time. For example, a particle can be in two places at once. This is called a superposition.

Fig. 1 shows an example. The particle starts at position A, then over time, it evolves into an equal superposition of positions A and B. (The boxes show instants in time.) At this time, if the experimenter measures the position of the particle, they will obtain either A or B, each with 50% probability1. These probabilities follow from the “probability amplitude” shown on top of each box: in QM, a probability is given by the square of the corresponding amplitude (more precisely, its squared magnitude, since amplitudes can be complex). This is called Born’s rule. At any time, the squared amplitudes of all the branches must sum to 1. We get the number on each box as follows. When a box branches into multiple scenarios, we first multiply its amplitude by the number on each outgoing arrow (1/\sqrt{2}). Then, for each new scenario, we sum over all the incoming arrows. For example, on the top box in the superposition, 1/\sqrt{2} comes from 1 on the initial box times 1/\sqrt{2} from the one incoming arrow.

Fig. 1. Superposition.

The numbers on the arrows depend on the particular interactions between the particle and its environment. We will not be concerned with those here.
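The bookkeeping for Fig. 1 can be sketched in a few lines of Python. This is only an illustration: the function names `propagate` and `born_probabilities` are my own, and the two arrows (both 1/\sqrt{2}) are the ones described in the text.

```python
import math

def propagate(amplitudes, arrows):
    """Push box amplitudes one step forward along weighted arrows.

    amplitudes: dict mapping state -> amplitude on its box
    arrows: dict mapping (source, target) -> number on the arrow
    """
    new = {}
    for (src, dst), weight in arrows.items():
        if src in amplitudes:
            # Multiply the box amplitude by the arrow, then sum over incoming arrows.
            new[dst] = new.get(dst, 0.0) + amplitudes[src] * weight
    return new

def born_probabilities(amplitudes):
    """Born's rule: probability = squared magnitude of the amplitude."""
    return {state: abs(a) ** 2 for state, a in amplitudes.items()}

s = 1 / math.sqrt(2)
initial = {"A": 1.0}                      # particle starts at A with amplitude 1
arrows = {("A", "A"): s, ("A", "B"): s}   # the two arrows of Fig. 1

final = propagate(initial, arrows)
probs = born_probabilities(final)
print(probs)  # both probabilities are (up to rounding) one half
```

The same two functions work for any diagram in this post; only the `arrows` dictionary changes.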

Of course, the experimenter is also composed of many particles, so should also be included as part of the wavefunction. This is shown in Fig. 2. When the experimenter measures the position, her brain’s particles record a state corresponding to seeing it at either A or B. We say that the experimenter’s state has become entangled with that of the particle.

Fig. 2. Measuring the position causes the experimenter’s brain to change state.

This shows how physics fundamentally works. To make predictions about qualia, a physical theory associates certain mathematical objects, or states, with qualia such as “seeing the particle in position A”. Given an initial state, classical physics predicts a certain future state, which is confirmed or denied by perceiving its associated qualia. In contrast, QM only predicts probabilities of obtaining future states. One way to confirm QM is then to do many identical experiments and then see if the results converge to the right probabilities2.

Does this mean that we must know the entire state of our brain in order to make or verify any predictions? Of course not. In practice, we rely on our eyes, ears, and other measuring devices to sense the world. This is because external inputs to these devices can reliably induce certain states in our brain. For example, light with a wavelength of 700 nm that goes into our eyes can reliably induce the sensation of “seeing red”. More on this when we discuss measuring devices and decoherence later.

A prediction rule

Is the wavefunction all you need? No. As the experimenter, simply knowing the wavefunction at a given time does not allow you to make predictions, for the very obvious reason that you don’t know which branch you are on. At the least, you must also keep track of your current branch. For example, if you observe the particle at A, you know you are on the top branch of Fig. 2. Then, for future predictions, you must only use the arrows coming out of that state. Since the total probability must still equal one, you must then divide the probability (squared amplitude) on each future box by the current one on your box.

This is shown in Fig. 3 for multiple splittings. (Here, instead of drawing pictures in the boxes, I use letters A, B, etc. to show general states.) Let’s say you observe that you are in state B. Then in the future, you have a 1/3 chance of being in state D and a 2/3 chance of being in state E. This comes from (1/\sqrt{6})^2/(1/\sqrt{2})^2 = 1/3 and (1/\sqrt{3})^2/(1/\sqrt{2})^2 = 2/3. Even though the wavefunction contains states F and G at the same time as D and E, there is no probability of reaching those states because there are no arrows coming from B.

Fig. 3. Multiple splittings of the wavefunction. The highlighted branch corresponds to observing B instead of C.
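As a minimal sketch, the prediction rule for Fig. 3 can be written out as follows. The box amplitudes for B, D and E are the ones quoted in the text; the dictionary layout and the function name `predict` are assumptions made for illustration.

```python
import math

# Global amplitudes on the boxes, as quoted in the text for Fig. 3.
amplitude = {
    "B": 1 / math.sqrt(2),
    "D": 1 / math.sqrt(6),
    "E": 1 / math.sqrt(3),
}

# Which boxes have an arrow coming from which box.
# F and G hang off C, so they are unreachable from B.
children = {"B": ["D", "E"], "C": ["F", "G"]}

def predict(current):
    """Divide each future squared amplitude by the current one."""
    p_now = abs(amplitude[current]) ** 2
    return {nxt: abs(amplitude[nxt]) ** 2 / p_now for nxt in children[current]}

prediction = predict("B")
print(prediction)  # roughly {'D': 0.333, 'E': 0.667}
```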

This seems like a workable rule for making predictions: whenever you make a measurement, select your branch of the wavefunction and “follow the arrows” from there to predict future measurement results. Note that this rule does not discard the other branches entirely. All branches are still “there” at least mathematically, although most are unreachable in practice.

The wavefunction in this picture is globally shared among all observers. However, each person might perceive themselves to be in a different branch, depending on their random measurement results. This is shown in Fig. 4. Experimenters E1 and E2 measure the particle in turn. E1 may get A, so she selects the top branch. At the end of this branch, she perceives that both agree on position A. However, E2 may get B, so she selects the bottom branch, and perceives that both agree on position B. The key point is that in the end, each observer perceives an agreement on the position, so the measurement results are consistent from their own perspective.

Fig. 4. Experimenters E1 and E2 both measure the particle.

This example is similar to a famous thought experiment called Wigner’s friend. Wigner’s friend has historically been very confusing (as you can see from the Wiki article), so let me elaborate. Clearly, E1’s perceptions only depend on the particles in her own brain, not those in E2’s. When I say that she “perceives an agreement”, I mean that she treats E2 as a physical system and interacts with it, by asking her/it about the particle’s position, perhaps. That system then responds, by saying “A” or “B”, for example. This information gets received and stored in her brain in some form. From E1’s perspective, everything is a physical system, including other humans, animals, her own brain, etc. Only a subset of this system (her brain) corresponds to her perceptions3. Again, this is a very conservative viewpoint, since it does not assume other parts of the system correspond to some other entity’s perceptions. In other words, we do not assume other humans/animals/rocks/etc are “conscious”4.

Wavefunction collapse

So far so good, right? Unfortunately, this prediction rule does not quite work. Mathematically, you must completely discard the other branches every time you make an observation, and only keep the branch you are on. In other words, there can be no globally shared wavefunction. This is because probability amplitudes, unlike probabilities, can be negative (or, more generally, complex), so contributions from different branches can cancel. This quantum interference can cause the amplitude of a given scenario to be zero in a global wavefunction, even when that scenario is reachable in practice. If that branch is selected, the rule gives 0/0 for any future probabilities, which is undefined.

As usual, Fig. 5 shows an example. Assume you measure B. By the rule, you predict a 50% probability of either D or E ((1/2)^2/(1/\sqrt{2})^2=1/2). See Fig. 5(a). Note that we only consider arrows coming from B in this prediction. Then assume D is measured. We now try to apply the rule starting from D. See Fig. 5(b). However, the amplitude of D is zero! This comes from adding the two incoming arrows. We have (1/\sqrt{2})\times (1/\sqrt{2}) = 1/2 from B, and (1/\sqrt{2})\times (-1/\sqrt{2}) = -1/2 from C, adding up to zero.

Fig. 5. (a) Making a prediction upon measuring B. (b) The prediction rule fails upon measuring D.
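The cancellation in Fig. 5 is easy to reproduce numerically. In this sketch, the arrows into D are the ones stated in the text; the value on the C-to-E arrow is not given in the text, so I assume +1/\sqrt{2}, which keeps the total probability equal to 1.

```python
import math

def propagate(amplitudes, arrows):
    """Multiply box amplitudes by arrow numbers and sum over incoming arrows."""
    new = {}
    for (src, dst), weight in arrows.items():
        if src in amplitudes:
            new[dst] = new.get(dst, 0.0) + amplitudes[src] * weight
    return new

s = 1 / math.sqrt(2)
step1 = {("A", "B"): s, ("A", "C"): s}
step2 = {
    ("B", "D"): s, ("B", "E"): s,
    ("C", "D"): -s,
    ("C", "E"): s,   # assumed value, not stated in the text
}

level1 = propagate({"A": 1.0}, step1)   # amplitudes of B and C
level2 = propagate(level1, step2)       # global amplitudes of D and E

# Prediction made from B, using only the arrow coming out of B (Fig. 5a).
via_b = level1["B"] * step2[("B", "D")]
prediction_from_b = abs(via_b) ** 2 / abs(level1["B"]) ** 2

# Squared amplitude of D in the global wavefunction (Fig. 5b).
p_d_global = abs(level2["D"]) ** 2

print(prediction_from_b, p_d_global)
```

Observing D is predicted with probability one half, yet D carries zero weight in the global wavefunction, so dividing by it to predict anything further is a division by zero.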

The solution is to discard all other branches upon each measurement, and set the amplitude of the measured branch equal to 1. This is called wavefunction collapse. It is shown in Fig. 6. When B is measured, we remove C and give B amplitude 1. Then when D is measured, we remove E and give D amplitude 1. This guarantees that probabilities are always well-defined.

Fig. 6. (a) Once B is measured, we discard branch C. (b) Once D is measured, we discard branch E.
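Here is a minimal sketch of the collapse rule applied to the same diagram. The arrows out of B are those from the text; the arrows out of D (to hypothetical states H and I, with numbers 0.6 and 0.8) are invented purely to show that predictions from D are now well-defined.

```python
import math

def propagate(amplitudes, arrows):
    """Multiply box amplitudes by arrow numbers and sum over incoming arrows."""
    new = {}
    for (src, dst), weight in arrows.items():
        if src in amplitudes:
            new[dst] = new.get(dst, 0.0) + amplitudes[src] * weight
    return new

def collapse(observed):
    """Discard every other branch and give the observed one amplitude 1."""
    return {observed: 1.0}

def probabilities(amplitudes):
    return {state: abs(a) ** 2 for state, a in amplitudes.items()}

s = 1 / math.sqrt(2)
second_step = {("B", "D"): s, ("B", "E"): s, ("C", "D"): -s, ("C", "E"): s}
third_step = {("D", "H"): 0.6, ("D", "I"): 0.8}   # hypothetical arrows out of D

state = collapse("B")                                   # Fig. 6(a): C is gone
after_b = probabilities(propagate(state, second_step))  # D and E, one half each

state = collapse("D")                                   # Fig. 6(b): E is gone
after_d = probabilities(propagate(state, third_step))   # no 0/0 anymore

print(after_b, after_d)
```

Because the collapsed state always has amplitude 1, the squared amplitudes one step later are already the probabilities, with no division needed.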

Wavefunction collapse is the most controversial aspect of QM. However, from the discussion above, we see that it is basically just a mathematical formality, since the prediction rule is unchanged except in special cases. Remember, we are only concerned with making predictions, not “modeling the world”. This avoids meaningless philosophical issues about whether the wavefunction or its collapse is “real”. Many are uncomfortable with collapse because it differs from classical physics in the following ways:

  • Different observers use different wavefunctions. In MWI, although observers may find themselves in different branches, there is only one wavefunction. Similarly, the classical universe is in a single big classical state. However, by discarding the other branches, different observers use entirely different mathematical objects (wavefunctions) to describe the universe. Of course, the physics stays the same, since as just mentioned, the prediction rule is almost the same.
  • Wavefunction collapse happens instantaneously. In classical physics, the state evolves continuously in time under Newton’s laws. In quantum physics, apart from wavefunction collapse, the wavefunction also evolves continuously in time under an equation called Schrödinger’s equation5. (We have summarized this continuous evolution using the arrows with numbers on them.) Wavefunction collapse instantly discards the other branches and assigns a new amplitude to the observed branch. How is such a discontinuous process allowed? Because any predictions must specify a time when the measurement yields a definite result. This is when collapse occurs6. More on this later.

The Copenhagen interpretation

This theory of wavefunction evolution plus collapse is loosely called the Copenhagen interpretation. Actually, there is no widely-agreed-upon definition of the Copenhagen interpretation, but one hallmark is the separation of the world into classical and quantum systems. QM was originally developed to describe small objects such as single particles using a wavefunction. In contrast, large objects such as photon detectors or human beings were treated as classical systems that cause wavefunction collapse. For example, a particle detector appears to “collapse” the wavefunction of a superposition state like Fig. 1 into a state with definite position, either A or B. In this picture, the particle detector is not part of the wavefunction.

Of course, this led to much confusion about where exactly to draw the line between classical and quantum. How large does a system have to be in order to become classical? As we have argued above, there is no inherent difference between objects such as particles and humans; they are all quantum systems and all part of the wavefunction. In other words, we draw the line at the observer’s “consciousness”. The act of observation causes collapse. This variant of Copenhagen is sometimes called the Von Neumann-Wigner interpretation, or “consciousness causes collapse”.

Consciousness is a dirty word among serious physicists, almost always for good reason. However, we simply use it to mean the ability to have subjective experiences, which was our very first assumption.

Measuring devices and decoherence

This raises the question of why large systems like particle detectors tend to “look” classical. In fact, this was not fully understood until the theory of decoherence emerged in the 1950s-1970s, decades after QM was developed. The basic idea is quite simple. Take a small system S in one of a few states A, B, C, etc. When it interacts with an environmental system E, this environment turns into a corresponding state E_A, E_B, E_C, etc. For a large environment, these environmental states tend to become well-separated very quickly. This is because there are many more microscopic states that the large environment can take.

For example, Fig. 7 shows a single particle bouncing around in a box. This is a small environmental system. If another particle is placed at position A (top left), eventually they will hit each other, affecting the path of the first particle in some way. If instead the second particle is placed at position B (bottom left), it will affect the first particle in a different way. However, there is a good chance that at some future time, the first particle will happen to be at (nearly) the same location for both scenarios, as seen in Fig. 7.

Fig. 7. Particle in a box with another one placed at either A or B. At some future time, it is likely that the first particle will be at the same location in both scenarios, as shown here.

Now consider a huge number of particles bouncing around in the box. This is a large environmental system. If a new particle is introduced at position A, it will rapidly scramble the paths of all the other particles as they interact with it and with each other. If instead the new particle is introduced at position B, it will scramble the paths in a very different way. At any future time, there is very little chance that all the original particles will be at all the same locations in the two scenarios. The environmental states E_A and E_B are well-separated.
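A toy calculation shows how quickly the separation grows with the number of particles. This is not the box model itself but a simplified stand-in: I assume the environment is a product of N independent particles, and that each particle's state in scenario A differs from its state in scenario B by a small angle, so that the single-particle overlap is the cosine of that angle. The overlap of the full environmental states is then the product of the single-particle overlaps.

```python
import math

def environment_overlap(n_particles, angle):
    """Overlap between E_A and E_B for a toy product-state environment.

    Each particle contributes a factor cos(angle), so the total overlap
    shrinks exponentially with the number of particles.
    """
    return math.cos(angle) ** n_particles

small = environment_overlap(1, 0.1)        # one particle: overlap close to 1
large = environment_overlap(10_000, 0.1)   # many particles: overlap is tiny

print(small, large)
```

Even though each particle is only slightly disturbed, the overlap for the large environment is vanishingly small, though never exactly zero.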

Fig. 8 shows a more accurate version of the measurement in Fig. 2, incorporating decoherence. The wavefunction initially splits into an equal superposition of position states A and B of the particle. At this time, the experimenter is in the same initial state for both branches. The experimenter then measures the particle by interacting with it. For example, there may be some light illuminating the particle, which goes into the experimenter’s eyes, which sends an electrical signal to the brain, etc. After a short amount of time, the experimenter’s brain is in very different states for the two scenarios A and B. This is seen by the nearly zero amplitude of the “observed B” state when the particle is at A (top-most branch), and the nearly zero amplitude of the “observed A” state when the particle is at B (bottom-most branch).

Fig. 8. More accurate version of Fig. 2 that incorporates decoherence.

To summarize: a measuring device looks classical if it causes decoherence. Therefore, you might think that decoherence can be used to define measurement, so that we do not need wavefunction collapse. This is not the case, for a couple of reasons. First, decoherence is never complete. In most decoherence models, the amplitude of the “wrong” branch approaches zero exponentially with time, but never reaches it. Therefore, we cannot define a time when the measurement is complete. Second, decoherence is only an emergent property of large systems. Why should conscious observers be limited to these systems? Indeed, how do we set a lower limit on the size or amount of decoherence anyway? Clearly, we cannot. The theory must still apply to general quantum systems as observers.

For example, consider an observer system that fluctuates rapidly in time, as in Fig. 9. The theory must still be able to associate states of this system with the observer’s perceptions. Since the branches do not remain separated over time, we cannot rely on decoherence. We also cannot say a state must be stable for a minimum amount of time in order to be measured. The observation, and thus collapse, must happen instantaneously.

Fig. 9. An observer in a rapidly fluctuating superposition.

Other interpretations

The Copenhagen interpretation has always been the standard one taught in textbooks. In the last few decades, many other interpretations have sprung up. I myself believed in MWI until I started thinking deeply about QM a few years ago. In my opinion, these other interpretations all stem from misunderstanding either the Copenhagen interpretation or the purpose of a physical theory. I will list some of them and their flaws here without further detail.

  • MWI is incomplete, as argued above.
  • Bohmian mechanics and consistent histories are ugly and overly complicated.
  • Quantum Bayesianism and relational quantum mechanics just dress up Copenhagen with some fancy words.

Summary


  • The minimum requirement for a scientific theory is that it makes predictions about an observer’s qualia. It does not have to predict the qualia of other entities, since they are not observable.
  • A theory does this by associating mathematical objects, or states, to certain qualia.
  • Classical physics predicts one future state, while quantum physics only predicts probabilities of each future state. This is done using a wavefunction that splits into multiple scenarios.
  • The wavefunction collapses upon an observation to the observed branch. Thus, different observers use different objects (wavefunctions) to describe the universe. Collapse is required mathematically for the theory to work.
  • Collapse must be instantaneous for the theory to apply to all possible observers.
  • Decoherence explains why certain objects look like classical measuring devices. However, it is only an approximation and does not replace the need for collapse.

1 Why can’t we observe the particle in two places at once? There are two ways to interpret this question in QM. 1) Why do we prefer the position basis instead of another basis? This is known as the preferred-basis problem. The short answer is that the preferred basis must be empirically determined, just as the perception of the color “red” must be correlated with certain wavelengths of light. More in the advanced version of this post. 2) Why can’t we perceive that we are in a superposition, in general? Because then we could prepare an identical state, violating the no-cloning theorem. More on this here, or see Nielsen & Chuang’s textbook.

2 To be pedantic, no experiments can be truly identical, because 1) the initial states cannot be exactly the same, and 2) the state of your brain has to include the memory of previous experiments. Of course, we really mean that for a series of experiments where we control all the relevant inputs, the results stored in your brain will converge to the predicted probabilities. Also, it goes without saying that many states are associated with the same quale: shifting the position of one molecule in your brain by a tiny amount has no observable effect.

3 This raises the question: how do we know what subset we can observe? As usual, we must determine this empirically!

4 Yes, this is basically solipsism. Unfortunately, that is where the logic of QM leads us. Don’t take it so seriously as to affect your personal moral code or anything.

5 Or more generally, the operator generated by the Hamiltonian.

6 Another common belief is that collapse is incompatible with relativity. This is false. Of course, we do not have a complete theory of quantum gravity, but for QFT in curved space, we can choose the collapse to occur on any spacelike hypersurface. This is because spacelike-separated operators commute, so can be simultaneously measured.


Solving Newcomb’s paradox for classical and quantum predictors

A recent HN post reminded me of Newcomb’s paradox, which goes as follows (from Wiki):

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:

  • Box A is clear, and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The question is whether the player should take both boxes, or only box B.

I first saw this problem many years ago but didn’t have a strong opinion. Now it seems clear that the controversy is about the definition of “reliable predictor”. This is usually left vague, leading to many unreliable philosophical and game-theory arguments. As usual, I will try to solve the problem using physics. Interestingly, the analysis is different for a classical versus quantum predictor, and also depends on the interpretation of quantum mechanics.

Classical predictor

Assume the predictor is a classical supercomputer that, at prediction time, takes as input the state of the player and of all the objects that they interact with until the decision. Call this state S_i. By running the physics forward, it arrives at either a state S_{AB} or S_B, corresponding to the decision to take both boxes or only box B, respectively. Because classical evolution is deterministic, the prediction always matches the actual decision, so one should obviously take only box B.

Quantum predictor

In the quantum case, the initial wavefunction of the player/etc is \psi_i. The computer cannot measure the wavefunction directly due to the no-cloning theorem. Instead, one way to make the prediction is as follows. The decision to take both boxes corresponds to a set of orthonormal states \{\psi_{AB}\}, and likewise for \{\psi_B\}. These two sets are mutually orthogonal and together form a complete basis, since there are only two choices. Given these sets, the computer can run Schrödinger’s equation back to prediction time to obtain the sets \{\psi_{ABi}\}=e^{i H t}\{\psi_{AB}\} and \{\psi_{Bi}\}=e^{i H t}\{\psi_B\}, respectively. These are also mutually orthogonal due to unitarity. At prediction time, it can measure the projection operator

\displaystyle P_{B}=\sum_a |\psi_{Bi}^a\rangle \langle\psi_{Bi}^a|.

The measurement gives 1 (take box B) with some probability p, and 0 (take both boxes) with probability 1-p. This collapses the player’s wavefunction to one of the states in \{\psi_{ABi}\} or \{\psi_{Bi}\}, which then evolves into a state in \{\psi_{AB}\} or \{\psi_B\}. Thus, from the predictor’s perspective, the predictor is always right.
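To make the projective measurement concrete, here is a small sketch in a three-dimensional toy space. All the numbers are invented for illustration: two basis states stand in for \{\psi_{Bi}\}, one for \{\psi_{ABi}\}, and `psi_i` is an arbitrary normalized initial state.

```python
def inner(u, v):
    """Inner product <u|v> of two vectors given as tuples of numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def projector_probability(basis_states, psi):
    """<psi|P|psi> where P is the sum of |b><b| over the given orthonormal states."""
    return sum(abs(inner(b, psi)) ** 2 for b in basis_states)

# Hypothetical orthonormal states at prediction time.
psi_B_basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # evolve into "take B only"
psi_AB_basis = [(0.0, 0.0, 1.0)]                   # evolves into "take both"

psi_i = (0.6, 0.0, 0.8)                            # hypothetical initial state

p = projector_probability(psi_B_basis, psi_i)      # outcome 1: take B only
q = projector_probability(psi_AB_basis, psi_i)     # outcome 0: take both

print(p, q)  # about 0.36 and 0.64
```

The two probabilities sum to 1 because the two sets together form a complete basis.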

The player models this measurement as the predictor becoming entangled with the player, so that the total wavefunction is something like

\displaystyle \sqrt{p}(\psi_{Bi}\otimes \psi_\text{predictB}) + \sqrt{1-p}(\psi_{ABi}\otimes\psi_\text{predictAB}).

If the player only makes a measurement at decision time, they will collapse the wavefunction to a state in \{\psi_{B}\} with probability p, or a state in \{\psi_{AB}\} with probability 1-p. We assume that this is the measurement basis since the player’s state should not become a superposition of (take B only) and (take both). The expected value is then simply:

\displaystyle E[p] = p B + (1-p)A = A+p(B-A)

where A=\text{\$1,000}, B=\text{\$1,000,000}. This is maximized at p=1, so the best decision is to take only box B, just as in the classical case.
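The expected value is linear in p, which a few lines of Python make explicit. The function name `expected_payoff` is just a label for the formula above.

```python
A = 1_000        # payoff when the player takes both boxes and was predicted to
B = 1_000_000    # payoff when the player takes only box B and was predicted to

def expected_payoff(p):
    """p*B + (1-p)*A, where p is the probability of ending up in 'take B only'."""
    return p * B + (1 - p) * A

for p in (0.0, 0.5, 1.0):
    print(p, expected_payoff(p))
```

Since B > A, the payoff increases with p and is largest at p = 1.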

Where we go from here depends on the interpretation of quantum mechanics. For many-worlds, there is only unitary evolution. The player ends up in the branch \psi_{B}\otimes \psi_\text{predictB} with probability p, giving the expected value above.

However, for Copenhagen-type interpretations where different observers can use different wavefunctions, the player can do better, since they are free to make any measurements between prediction and decision time, while the predictor assumes unitary evolution1. In fact, they can make the predictor predict (take B only) with certainty, while they actually take both with certainty. One way is as follows. Assume the player makes the decision based on measuring a qubit at decision time, where |\uparrow\rangle means take B only and |\downarrow\rangle means take both. The state of the qubit oscillates between |\uparrow\rangle and |\downarrow\rangle with period T, where T is the time between prediction and decision. At prediction time, assume the state is |\uparrow\rangle, so the predictor predicts (take B only). At time T/2, the player can make repeated measurements very quickly until decision time. The qubit stays in the |\downarrow\rangle state due to the quantum Zeno effect. Thus, at decision time, the player takes both boxes. The extra $1,000 can then contribute to funding the delicate and expensive equipment needed for the qubit.
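The Zeno trick can be sketched numerically. I assume here that the oscillation is sinusoidal, so that the probability of finding the qubit in |\downarrow\rangle a time t after it was |\uparrow\rangle is \sin^2(\pi t/T); the text only specifies the period. The function names are hypothetical.

```python
import math

def prob_down(t, T):
    """P(down) at time t under free oscillation, starting in 'up' at t = 0."""
    return math.sin(math.pi * t / T) ** 2

def zeno_survival(n, T=1.0):
    """Probability that n equally spaced measurements in (T/2, T] all give 'down'.

    The qubit is exactly 'down' at T/2. After each interval dt it is found
    'down' again with probability cos^2(pi*dt/T), and is then reset to 'down'.
    """
    dt = (T / 2) / n
    stay = math.cos(math.pi * dt / T) ** 2
    return stay ** n

print(prob_down(0.5, 1.0))   # the qubit is 'down' at T/2
print(zeno_survival(1))      # a single measurement at T: essentially 0
print(zeno_survival(1000))   # frequent measurements: close to 1
```

With only one measurement at decision time, the qubit has rotated back to |\uparrow\rangle and the player takes only box B. With many measurements, the probability of still being in |\downarrow\rangle approaches 1 as the number of measurements grows.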

We can take this one step further in some cases. For human players, the knowledge of the measurement protocol is classically encoded in the player’s brain in some way. If the supercomputer can decode this information instead of merely running the time evolution, it can also predict which measurements the player makes, and the probabilities of the subsequent results. We arrive back at the original case, where the best solution is to pick B only. However, such an encoding is not required by the postulates of quantum mechanics: the observer’s decision to make measurements on its state does not necessarily have to be encoded in its state itself.

Real predictor

In the real world, there are no such supercomputers, and no entity would risk $1,000,000 on a meaningless game. The best answer is to take both boxes.

1 In practice, a human’s measurements of their own state occur long after decoherence, so they have no control of their wavefunction in this way. However, if we are assuming all-powerful supercomputers, we may as well go all the way.

Fundamentals of classical mechanics, or why F = ma

Despite its simplicity, classical mechanics is not taught well in the typical physics curriculum. This is unfortunate because the general philosophy of constructing Lagrangians based on symmetries underlies all of modern physics. In this article, I explain basic Lagrangian mechanics in a systematic way starting from fundamental physical principles. It basically follows Landau and Lifshitz Vol. 1 but ties up some loose ends.

Principle of stationary action

Classical mechanics describes the motion of objects modeled as point particles. First, consider a single particle in empty space. At any given time, it has a position \vec x(t) and velocity \vec v(t)=\frac{d\vec{x}}{dt}.

Define a quantity S_{if}\{\vec x(t)\}, called the action, that depends on the path of the particle \vec x(t) from time t_i to t_f. The principle of stationary action, or action principle, states that the path the particle actually takes is one for which the action is stationary under small perturbations of the path \vec x(t) \rightarrow \vec x(t) + \vec{\delta x}(t).

To elaborate, consider dividing the time interval from t_i to t_f into N segments, and take N\rightarrow \infty in the end. You may think of S_{if} as a function of many variables \{\vec{x}(t_i),t_i,\vec{x}(t_i+\Delta t),t_i+\Delta t,\cdots, \vec{x}(t_f), t_f\}, where \Delta t = (t_f-t_i)/N. (Note that the velocity \vec{v}(t) = \frac{\vec{x}(t+\Delta t)-\vec{x}(t)}{\Delta t}, so it is not an independent variable here.) Such a “function of a function” is called a functional. The principle of stationary action is then \frac{\delta S_{if}}{\delta x_j(t)}=0, i.e. the partial derivative of S_{if} with respect to any component x_j of the position at any time t is zero. The \delta symbol is generally used instead of \partial for functional derivatives.

Finally, the action principle only applies to perturbations that are zero at the boundaries: \vec{\delta x}(t_i) = \vec{\delta x}(t_f) = 0. This will become important later.

The Lagrangian

Consider the action S_{12} for time t_1 to t_2, and the action S_{34} for time t_3 to t_4, with t_1 < t_2 < t_3 < t_4. We require locality in time, meaning that a perturbation in the first interval only affects S_{12} and not S_{34}. Also, we assume additivity of the action: S_{12}+S_{23}=S_{13}. These conditions imply that S_{12} can be written as an integral from t_1 to t_2 of some quantity: S_{12}=\int_{t_1}^{t_2} dt\, \mathcal{L}(\vec{x}(t),\vec{v}(t), t). The integrand \mathcal{L}(\vec{x}(t),\vec{v}(t), t) is known as the Lagrangian. In general, it may depend on the position and velocity at time t, as well as the time t itself1.

Note that we may add a total time derivative \frac{df}{dt}(\vec{x},t) to the Lagrangian without affecting the principle of stationary action. Such a term produces the action:

\displaystyle\int_{t_i}^{t_f} dt\frac{df}{dt}(\vec{x},t) = f(\vec{x}(t_f), t_f)-f(\vec{x}(t_i), t_i)

by the fundamental theorem of calculus. The perturbation \vec{\delta x}(t) is zero at the boundaries by definition, so does not affect this action.

Let us now derive the form of the Lagrangian based on some other fundamental principles:

Homogeneity of space and time. No point in space or time is any different from any other, so the Lagrangian cannot depend on \vec{x} or t explicitly.

Isotropy of space. No direction in space is different from any other, so the Lagrangian can only depend on the magnitude (squared) of the velocity \vec{v}(t)^2.

Galilean invariance. The theory should be invariant under shifts by a constant velocity, \vec{x}\rightarrow \vec{x}+\vec{v}_0 t. In other words, there is no universal stationary frame of reference. Taking the time derivative, this is \vec{v}\rightarrow \vec{v}+\vec{v}_0. To first order in \vec{v}_0, the Lagrangian changes as

\displaystyle\mathcal{L}(\vec{v}^2)\rightarrow \mathcal{L}(\vec{v}^2+2\vec{v}\cdot \vec{v}_0) = \mathcal{L}(\vec{v}^2)+2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0

The term 2\frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) \vec{v}\cdot \vec{v}_0 will not affect the physics if it is a total time derivative of the form above. This only occurs if \frac{\partial \mathcal{L}}{\partial \vec{v}^2}(\vec{v}^2) is a constant, in which case the term is proportional to \vec{v}\cdot \vec{v}_0 = \frac{d}{dt}(\vec{x}\cdot \vec{v}_0). (Here \mathcal{L} is an ordinary function of \vec{v}^2, so this is an ordinary derivative rather than a functional one.) Call this constant \frac{1}{2} m. Thus, the Lagrangian for a single particle in free space is: \mathcal{L} = \frac{1}{2} m \vec{v}^2. The constant m is, of course, the mass.

To summarize, we derived the unique action and Lagrangian (up to a total time derivative) for a single particle from the following postulates:

  1. Locality in time
  2. Additivity of the action
  3. Homogeneity of space and time
  4. Isotropy of space
  5. Galilean invariance
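
The free-particle result can be checked against the discretized picture of the action given earlier. This is a rough numerical sketch in one dimension, with m = 1 and a time interval from 0 to 1 chosen arbitrarily; the helper names are my own. It compares how the action changes under a small perturbation that vanishes at the endpoints, for a straight-line path and for a curved one.

```python
import math

N = 1000                      # number of time segments
DT = 1.0 / N                  # segment length for t in [0, 1]
TIMES = [k * DT for k in range(N + 1)]

def action(path, m=1.0):
    """Discretized action: sum of (1/2) m v^2 dt with v = (x[k+1] - x[k]) / dt."""
    total = 0.0
    for k in range(N):
        v = (path[k + 1] - path[k]) / DT
        total += 0.5 * m * v * v * DT
    return total

def perturbed(path, eps):
    """Add eps*sin(pi*t), which is zero at both endpoints."""
    return [x + eps * math.sin(math.pi * t) for x, t in zip(path, TIMES)]

def change(path, eps):
    return action(perturbed(path, eps)) - action(path)

straight = [t for t in TIMES]        # constant velocity: the classical path
curved = [t * t for t in TIMES]      # same endpoints, but accelerating

# Straight path: the change is second order in eps (no linear term).
print(change(straight, 1e-3), change(straight, 2e-3))
# Curved path: the change is first order in eps and flips sign with eps.
print(change(curved, 1e-3), change(curved, -1e-3))
```

Doubling the perturbation quadruples the change for the straight path, which is the numerical signature of a stationary point. For the curved path the change is linear in the perturbation, so that path is not stationary.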

Multiple particles

Now consider the n-particle case. The Lagrangian may generally depend on all the positions and velocities \vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n. Following the postulates above, it must take the form2:

\displaystyle \mathcal{L} = \left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})

where the function U(\Delta \vec{x}_{ij}) depends on all the separations between the particles \{\Delta\vec{x}_{12} = \vec{x}_1-\vec{x}_2, \Delta\vec{x}_{13} =\vec{x}_1-\vec{x}_3, \cdots\}.

Euler-Lagrange equations

Let us now apply the principle of stationary action to the action:

\displaystyle S=\int dt\left[\left(\sum_{i=1}^n \frac{1}{2} m_i \vec{v}_i^2\right) - U(\Delta \vec{x}_{ij})\right]

Plugging in the variation \vec{x}_i\rightarrow \vec{x}_i+\vec{\delta x}_i for particle i, and expanding to first order in \vec{\delta x}_i, we get:

\displaystyle S\rightarrow S+ \int dt\left(m_i \vec{v}_i\cdot \vec{\delta v}_i - \nabla_i U \cdot \vec{\delta x}_i\right)

where \nabla_i U is the gradient of U with respect to \vec{x}_i. Using \vec{\delta v}=\frac{d}{dt}\vec{\delta x}, we can integrate the first term by parts, discarding the boundary term m_i \vec{v}_i\cdot \vec{\delta x}_i since \vec{\delta x}_i= 0 at the boundaries. We obtain:

\displaystyle \frac{\delta S}{\delta \vec{x}_i(t)}=-m_i \vec{a}_i(t)-\nabla_i U(t) = 0

where \vec{a}_i = \frac{d\vec{v}_i}{dt}. The equations obtained using the action principle are known as Euler-Lagrange equations or equations of motion. In this case, we have found Newton’s law for a conservative potential:

\displaystyle \vec{F}_i = -\nabla_i U=m_i \vec{a}_i
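
As a small illustration, the forces can be computed numerically from a potential. The spring-like potential below, with constant k = 2, and the positions and masses are all hypothetical choices. The only feature taken from the text is that U depends on the separation x_1 - x_2 alone. The gradient is approximated by a central finite difference.

```python
def potential(xs, k=2.0):
    """Hypothetical two-particle potential in 1D: U = (1/2) k (x1 - x2)^2."""
    return 0.5 * k * (xs[0] - xs[1]) ** 2

def force(U, xs, i, h=1e-6):
    """F_i = -dU/dx_i, estimated with a central finite difference."""
    up = list(xs)
    dn = list(xs)
    up[i] += h
    dn[i] -= h
    return -(U(up) - U(dn)) / (2 * h)

positions = [1.0, 0.25]
masses = [1.0, 3.0]

forces = [force(potential, positions, i) for i in range(2)]
accelerations = [f / m for f, m in zip(forces, masses)]   # a_i = F_i / m_i

print(forces)          # approximately [-1.5, 1.5]
print(accelerations)   # approximately [-1.5, 0.5]
```

The two forces are equal and opposite. This follows directly from U depending only on the separation between the particles.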

Beyond classical mechanics

Finally, it is interesting to see how the postulates above are modified in quantum and relativistic theories.

  1. Principle of stationary action. In quantum physics, the particle takes all paths instead of only the classical one! The quantum amplitude is given by summing up e^{i S\{x\}} over all paths. This is known as a path integral.
  2. Locality in time gets promoted to locality in space and time in field theory.
  3. Additivity of the action remains the same.
  4. Homogeneity of space and time remains the same.
  5. Isotropy of space remains the same.
  6. Galilean invariance is promoted to Lorentz invariance in relativity. Lorentz transformations relate space and time.

In modern theories, there are often additional symmetry principles that constrain the Lagrangian, such as gauge invariance and conformal invariance.

1 It also cannot depend on higher time derivatives due to the Ostrogradsky instability.

2 A term like \vec{v}_i\cdot \vec{v}_j with i \neq j is possible, but would imply that particles infinitely far away can affect each other, violating common sense (or, if you like, the cluster decomposition principle).