# Post History

##
**#3: Post edited**

- "Disorder" and "order" don't mean anything with regards to entropy. This is a common "science popularization" level description of entropy that is "not even wrong" in that, as you've seen, these terms are not defined, and especially not defined in terms of usual physical models.
- Now one *could* define "(dis)order" but generally one of three things will happen. First, it will be used purely as a synonym for entropy or a closely related concept and thus provides no explanatory power. Second, "(dis)order" is defined as [some concept related to entropy in a non-trivial way](https://en.wikipedia.org/wiki/Entropy_(energy_dispersal)), but it doesn't well connect with the colloquial meaning of "(dis)order" and so is perhaps not the best choice of word. As a final option, "(dis)order" *is* defined [in a way that better aligns with colloquial usage](https://en.wikipedia.org/wiki/Phase_transitions#Order_parameters), but then it doesn't always align with entropy.
- In practice, as you've seen, when "(dis)order" is brought up in this context, *no* definition is given and the colloquial understanding of the term seems to be intended. Unfortunately, this 1) fails to properly explain entropy (in particular, it doesn't touch on the right property) and, more objectively, 2) simply doesn't always match the intuitive notion of "(dis)ordered".
- A good example of this mismatch arises in liquid crystals. The molecules making up certain types of liquid crystals are modeled as rods. One of the higher entropy phases is the [nematic phase](https://en.wikipedia.org/wiki/Liquid_crystal#Nematic_phase) where all the rods are aligned. Lower entropy states do allow the rods to have a greater variety of orientations. This seems at best non-obvious if not outright counter-intuitive with the "disorder" framing: all the rods aligned seems a pretty "ordered" arrangement. With the (actual) "volume of phase space" definition of entropy, what's happening is much easier to understand: [the reduction in the volume of the phase space arising from collapsing the orientation part to a point is more than made up for by the increase in the volume of phase space in the positional part](https://en.wikipedia.org/wiki/Liquid_crystal#Onsager_hard-rod_model). Roughly speaking, when the rods are aligned, they can more easily slide around and past each other. With respect to the [orientation order parameter](https://en.wikipedia.org/wiki/Liquid_crystal#Order_parameter), perfectly aligned rods have maximum (orientation) order, but it is also a higher entropy state than some other phases with less orientation order. So from the "disorder" view of entropy, increasing the orientation order can increase the "disorder".
~~So what is entropy? You gave Boltzmann's definition, but, unsurprisingly, there is a lot of implicit context and suppressed detail in that brief formula. It is also a rather special case of a more general definition, that is, in my opinion, also far more enlightening. In my opinion, the appropriately general level to formulate entropy is via the notion of [**relative entropy**](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) (aka [the negation of the] KL-divergence) defined as follows in the continuous case: $$S[p, q] = -\int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx$$ or, in the discrete case: $$S[p, q] = -\sum_x p(x)\log\left(\frac{p(x)}{q(x)}\right)$$ $S[p, q]$ should be read as "the entropy of (the probability distribution) $p$ relative to (the probability distribution) $q$". We require $q(x) = 0$ implies $p(x) = 0$. I've purposely omitted the domain of the integral/sum which should be taken as the entire domain that the probability distributions are defined over. The non-relative entropy can then be defined as $S[p] = S[p, p_U]$ where $p_U$ is the uniform distribution. This already starts to reveal issues with this non-relative notion since uniform distributions get tricky in continuous contexts, and if we have a function $f$ such that $p \circ f$ is a probability distribution whenever $p$ is, then we can show that $S[p \circ f, q \circ f] = S[p, q]$ but $S[p \circ f] \neq S[p]$ since $p_U \circ f$ certainly doesn't need to be a uniform distribution. In other words, relative entropy is invariant under parameterization while the non-relative form is not. To get Boltzmann's formula, we further narrow down to a $p$ of the form $p(x) = \begin{cases} 1/|\Omega|, & x \in \Omega \\\\ 0, & x \notin \Omega\end{cases}$ where $\Omega$ is some subset of the domain and $|\Omega|$ is its volume. 
For this choice of $p$, $S[p] = \log |\Omega|$ modulo an additive constant and absorbing the constant into the base of the logarithm.~~- One thing you might have noticed is the definition of relative entropy doesn't seem to have anything at all to do with physics. The definition makes just as much sense if the probability distributions are over products and specify how likely that product is to sell. This is because entropy is a tool of inference in general. Its use in physics is just an application of this tool. Once this is understood, many aspects of the notion of entropy as used in physics make much more sense. For example, the "statistical" nature of, e.g. the second law of thermodynamics, becomes expected as well as the fact that there isn't a "mechanical" derivation for that law. The fact that thermodynamics works for a huge diversity of materials becomes a less surprising fact. That things like [thermoeconomics](https://en.wikipedia.org/wiki/Thermoeconomics) succeed at all is also no longer very surprising. A common operation is to find the probability distribution subject to some constraints that maximizes (relative) entropy. That is, find a $p$ such that $F[p]=0$ for some functional $F$ for which $S[p,q]$ is maximum for a given $q$. (This is a little clearer in terms of KL-divergence where this is the same as saying: find the probability distribution $p$ that is as "close" to $q$ as possible while still satisfying the constraint $F$.) To emphasize that entropy is about inference, the operation of maximizing relative entropy gives a mechanism for updating a "prior" probability distribution into "posterior" probability distribution. This *directly* generalizes [Bayes' rule](https://en.wikipedia.org/wiki/Bayes'_theorem#Statement_of_theorem)^[One subtle but crucial aspect of this is to understand what Bayes' rule is actually saying. 
The formula $P(H\mid D) = P(D\mid H)P(H)/P(D)$ is easily derived from the basic laws of probability, but this formula isn't the content of Bayes' rule. Bayes' rule is about how we update our probabilities when learning new information. Let $P_s(H)$ be our probability for $H$ before learning $D$ and $P_f(H)$ be our probability of $H$ after learning (only) $D$. Bayes' rule is that $P_f(H) = P_s(H\mid D)$. While Bayes' formula is just a mathematical theorem given the laws of probability theory, Bayes' rule is something one could reasonably argue against.]. This is spelled out in section 6.6 of [Entropic Inference and the Foundations of Physics](https://livealbany.sharepoint.com/sites/web_physics/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha%2FACaticha%2DEIFP%2Dbook%2Epdf&parent=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha&p=true&originalPath=aHR0cHM6Ly9saXZlYWxiYW55LnNoYXJlcG9pbnQuY29tLzpiOi9zL3dlYl9waHlzaWNzL0VhTklGck1ERDIxR2lHV3BaWDlpR2JjQk4tdlZLT2NqN1NDejd4WGRucE9QWGc_cnRpbWU9amJRQTBmQ1UyVWc) by Ariel Caticha. (In fact, pretty much everything I'm saying in this answer is covered by this book.)
- Related to this, an additional aspect that makes things confusing when talking about entropy is what is used is not directly the (relative) entropy I defined above but rather its maximum subject to some constraints^[Another more subtle aspect is that traditional thermodynamics only defines temperature and entropy for states of thermodynamic equilibrium. This makes many non-technical arguments invoking entropy fail immediately, because they are being applied to systems that are not at thermodynamic equilibrium. That said, the understanding of entropy I'm describing here does open the door to a general theory of non-equilibrium statistical mechanics and a "second law" with broader generality.]. You often see expressions like $S(U, V)$, e.g. [here](https://en.wikipedia.org/wiki/Free_entropy#Entropy), where $U$ and $V$ are (internal) energy and volume respectively. What these expressions stand for is the *maximum* entropy that can be achieved subject to some constraints which are parameterized by $U$ and $V$. For example, for a simple gas, $S(U)$ might be the entropy, $S[p^\*]$, of the probability distribution $p^\*$ which is zero everywhere the (total) kinetic energy isn't $U$, i.e. $p^\*((x,v)) = 0$ when $\sum\_i mv\_i^2/2 \neq U$. This distribution is of the form that leads to Boltzmann's form of entropy. These variables which parameterize the constraints, $U$ in this case, determine the macrostates.
- The source of the second law of thermodynamics starts making itself known. If we start with a given macrostate, then we actually have a microstate that is compatible with that macrostate but we don't know which one. If we perform an experiment starting from this initial macrostate we'll get some final macrostate. If the experiment is reproducible, we will consistently get the same final macrostate given the same initial macrostate. That is, all or at least nearly all of the microstates compatible with the initial macrostate evolve to a microstate compatible with the same final macrostate. If we evolve the subvolume of phase space corresponding to all microstates compatible with the initial macrostate, then [Liouville's theorem](https://en.wikipedia.org/wiki/Liouville's_theorem_(Hamiltonian)) states that this volume is preserved and thus the Boltzmann entropy would be conserved. However, this entropy may be lower than the *maximum* entropy corresponding to the final macrostate, and this maximized entropy is what the second law of thermodynamics is talking about. The maximum entropy for a given macrostate is clearly not a function of how you arrived at that macrostate, while, of course, the subvolume of phase space the initial macrostate evolves to clearly is. The "information" of how we arrived at the final macrostate is lost when we consider the maximum entropy for that macrostate, and (some of) this lost "information" corresponds to the increase in entropy.
- In addition to Caticha's book referenced earlier, I'd also recommend the papers [The Evolution of Carnot's Principle](https://bayes.wustl.edu/etj/articles/ccarnot.pdf) and [The Gibbs Paradox](https://bayes.wustl.edu/etj/articles/gibbs.paradox.pdf) by E. T. Jaynes. These topics are also covered by the book but from a different perspective and with a very different style. These contain discussions of the second law of thermodynamics and how it relates to reproducibility that are also very good and clarifying.

- "Disorder" and "order" don't mean anything with regards to entropy. This is a common "science popularization" level description of entropy that is "not even wrong" in that, as you've seen, these terms are not defined, and especially not defined in terms of usual physical models.
- Now one *could* define "(dis)order" but generally one of three things will happen. First, it will be used purely as a synonym for entropy or a closely related concept and thus provides no explanatory power. Second, "(dis)order" is defined as [some concept related to entropy in a non-trivial way](https://en.wikipedia.org/wiki/Entropy_(energy_dispersal)), but it doesn't well connect with the colloquial meaning of "(dis)order" and so is perhaps not the best choice of word. As a final option, "(dis)order" *is* defined [in a way that better aligns with colloquial usage](https://en.wikipedia.org/wiki/Phase_transitions#Order_parameters), but then it doesn't always align with entropy.
- In practice, as you've seen, when "(dis)order" is brought up in this context, *no* definition is given and the colloquial understanding of the term seems to be intended. Unfortunately, this 1) fails to properly explain entropy (in particular, it doesn't touch on the right property) and, more objectively, 2) simply doesn't always match the intuitive notion of "(dis)ordered".
- A good example of this mismatch arises in liquid crystals. The molecules making up certain types of liquid crystals are modeled as rods. One of the higher entropy phases is the [nematic phase](https://en.wikipedia.org/wiki/Liquid_crystal#Nematic_phase) where all the rods are aligned. Lower entropy states do allow the rods to have a greater variety of orientations. This seems at best non-obvious if not outright counter-intuitive with the "disorder" framing: all the rods aligned seems a pretty "ordered" arrangement. With the (actual) "volume of phase space" definition of entropy, what's happening is much easier to understand: [the reduction in the volume of the phase space arising from collapsing the orientation part to a point is more than made up for by the increase in the volume of phase space in the positional part](https://en.wikipedia.org/wiki/Liquid_crystal#Onsager_hard-rod_model). Roughly speaking, when the rods are aligned, they can more easily slide around and past each other. With respect to the [orientation order parameter](https://en.wikipedia.org/wiki/Liquid_crystal#Order_parameter), perfectly aligned rods have maximum (orientation) order, but it is also a higher entropy state than some other phases with less orientation order. So from the "disorder" view of entropy, increasing the orientation order can increase the "disorder".
- So what is entropy? You gave Boltzmann's definition, but, unsurprisingly, there is a lot of implicit context and suppressed detail in that brief formula. It is also a rather special case of a more general definition, that is, in my opinion, also far more enlightening. In my opinion, the appropriately general level to formulate entropy is via the notion of [**relative entropy**](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) (aka [the negation of the] KL-divergence) defined as follows in the continuous case: $$S[p, q] = -\int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx$$ or, in the discrete case: $$S[p, q] = -\sum_x p(x)\log\left(\frac{p(x)}{q(x)}\right)$$ $S[p, q]$ should be read as "the entropy of (the probability distribution) $p$ relative to (the probability distribution) $q$". We require $q(x) = 0$ implies $p(x) = 0$. I've purposely omitted the domain of the integral/sum which should be taken as the entire domain that the probability distributions are defined over. The non-relative entropy can then be defined as $S[p] = S[p, p_U]$ where $p_U$ is the uniform distribution. This already starts to reveal issues with this non-relative notion since uniform distributions get tricky in continuous contexts, and if we have a function $f$ such that $p \circ f$ is a probability distribution whenever $p$ is, then we can show that $S[p \circ f, q \circ f] = S[p, q]$ but $S[p \circ f] \neq S[p]$ since $p_U \circ f$ certainly doesn't need to be a uniform distribution. In other words, relative entropy is invariant under **re**parameterization while the non-relative form is not. To get Boltzmann's formula, we further narrow down to a $p$ of the form $p(x) = \begin{cases} 1/|\Omega|, & x \in \Omega \\\\ 0, & x \notin \Omega\end{cases}$ where $\Omega$ is some subset of the domain and $|\Omega|$ is its volume. For this choice of $p$, $S[p] = \log |\Omega|$ modulo an additive constant and absorbing the constant into the base of the logarithm.
- One thing you might have noticed is the definition of relative entropy doesn't seem to have anything at all to do with physics. The definition makes just as much sense if the probability distributions are over products and specify how likely that product is to sell. This is because entropy is a tool of inference in general. Its use in physics is just an application of this tool. Once this is understood, many aspects of the notion of entropy as used in physics make much more sense. For example, the "statistical" nature of, e.g. the second law of thermodynamics, becomes expected as well as the fact that there isn't a "mechanical" derivation for that law. The fact that thermodynamics works for a huge diversity of materials becomes a less surprising fact. That things like [thermoeconomics](https://en.wikipedia.org/wiki/Thermoeconomics) succeed at all is also no longer very surprising. A common operation is to find the probability distribution subject to some constraints that maximizes (relative) entropy. That is, find a $p$ such that $F[p]=0$ for some functional $F$ for which $S[p,q]$ is maximum for a given $q$. (This is a little clearer in terms of KL-divergence where this is the same as saying: find the probability distribution $p$ that is as "close" to $q$ as possible while still satisfying the constraint $F$.) To emphasize that entropy is about inference, the operation of maximizing relative entropy gives a mechanism for updating a "prior" probability distribution into "posterior" probability distribution. This *directly* generalizes [Bayes' rule](https://en.wikipedia.org/wiki/Bayes'_theorem#Statement_of_theorem)^[One subtle but crucial aspect of this is to understand what Bayes' rule is actually saying. The formula $P(H\mid D) = P(D\mid H)P(H)/P(D)$ is easily derived from the basic laws of probability, but this formula isn't the content of Bayes' rule. Bayes' rule is about how we update our probabilities when learning new information. 
Let $P_s(H)$ be our probability for $H$ before learning $D$ and $P_f(H)$ be our probability of $H$ after learning (only) $D$. Bayes' rule is that $P_f(H) = P_s(H\mid D)$. While Bayes' formula is just a mathematical theorem given the laws of probability theory, Bayes' rule is something one could reasonably argue against.]. This is spelled out in section 6.6 of [Entropic Inference and the Foundations of Physics](https://livealbany.sharepoint.com/sites/web_physics/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha%2FACaticha%2DEIFP%2Dbook%2Epdf&parent=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha&p=true&originalPath=aHR0cHM6Ly9saXZlYWxiYW55LnNoYXJlcG9pbnQuY29tLzpiOi9zL3dlYl9waHlzaWNzL0VhTklGck1ERDIxR2lHV3BaWDlpR2JjQk4tdlZLT2NqN1NDejd4WGRucE9QWGc_cnRpbWU9amJRQTBmQ1UyVWc) by Ariel Caticha. (In fact, pretty much everything I'm saying in this answer is covered by this book.)
- Related to this, an additional aspect that makes things confusing when talking about entropy is what is used is not directly the (relative) entropy I defined above but rather its maximum subject to some constraints^[Another more subtle aspect is that traditional thermodynamics only defines temperature and entropy for states of thermodynamic equilibrium. This makes many non-technical arguments invoking entropy fail immediately, because they are being applied to systems that are not at thermodynamic equilibrium. That said, the understanding of entropy I'm describing here does open the door to a general theory of non-equilibrium statistical mechanics and a "second law" with broader generality.]. You often see expressions like $S(U, V)$, e.g. [here](https://en.wikipedia.org/wiki/Free_entropy#Entropy), where $U$ and $V$ are (internal) energy and volume respectively. What these expressions stand for is the *maximum* entropy that can be achieved subject to some constraints which are parameterized by $U$ and $V$. For example, for a simple gas, $S(U)$ might be the entropy, $S[p^\*]$, of the probability distribution $p^\*$ which is zero everywhere the (total) kinetic energy isn't $U$, i.e. $p^\*((x,v)) = 0$ when $\sum\_i mv\_i^2/2 \neq U$. This distribution is of the form that leads to Boltzmann's form of entropy. These variables which parameterize the constraints, $U$ in this case, determine the macrostates.
- The source of the second law of thermodynamics starts making itself known. If we start with a given macrostate, then we actually have a microstate that is compatible with that macrostate but we don't know which one. If we perform an experiment starting from this initial macrostate we'll get some final macrostate. If the experiment is reproducible, we will consistently get the same final macrostate given the same initial macrostate. That is, all or at least nearly all of the microstates compatible with the initial macrostate evolve to a microstate compatible with the same final macrostate. If we evolve the subvolume of phase space corresponding to all microstates compatible with the initial macrostate, then [Liouville's theorem](https://en.wikipedia.org/wiki/Liouville's_theorem_(Hamiltonian)) states that this volume is preserved and thus the Boltzmann entropy would be conserved. However, this entropy may be lower than the *maximum* entropy corresponding to the final macrostate, and this maximized entropy is what the second law of thermodynamics is talking about. The maximum entropy for a given macrostate is clearly not a function of how you arrived at that macrostate, while, of course, the subvolume of phase space the initial macrostate evolves to clearly is. The "information" of how we arrived at the final macrostate is lost when we consider the maximum entropy for that macrostate, and (some of) this lost "information" corresponds to the increase in entropy.
- In addition to Caticha's book referenced earlier, I'd also recommend the papers [The Evolution of Carnot's Principle](https://bayes.wustl.edu/etj/articles/ccarnot.pdf) and [The Gibbs Paradox](https://bayes.wustl.edu/etj/articles/gibbs.paradox.pdf) by E. T. Jaynes. These topics are also covered by the book but from a different perspective and with a very different style. These contain discussions of the second law of thermodynamics and how it relates to reproducibility that are also very good and clarifying.
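
As a rough illustration of the discrete relative entropy formula and of how Boltzmann's form falls out of it, here is a minimal Python sketch. The function name `relative_entropy`, the 8-point domain, and the 4-point subset standing in for $\Omega$ are all assumptions made up for this example; none of them come from the post or from Caticha's book.

```python
import math


def relative_entropy(p, q):
    """Discrete S[p, q] = -sum_x p(x) * log(p(x) / q(x)).

    Terms with p(x) = 0 are skipped (0 * log 0 is taken as 0).
    Raises ValueError if q(x) = 0 somewhere that p(x) > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            raise ValueError("q(x) = 0 must imply p(x) = 0")
        total -= px * math.log(px / qx)
    return total


# Hypothetical toy domain with n points and a uniform reference p_U.
n = 8
p_uniform = [1.0 / n] * n

# p is uniform on a subset Omega of size 4 and zero elsewhere.
omega_size = 4
p = [1.0 / omega_size] * omega_size + [0.0] * (n - omega_size)

# Non-relative entropy S[p] = S[p, p_U]; here it equals
# log|Omega| - log(n), i.e. log|Omega| up to an additive constant.
s = relative_entropy(p, p_uniform)
print(s, math.log(omega_size) - math.log(n))
```

The additive constant is $-\log n$, which depends only on the size of the whole domain and not on $\Omega$. Note also that `relative_entropy(q, q)` is zero for any `q`, which is the largest value the relative entropy can take.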

##
**#2: Post edited**

- "Disorder" and "order" don't mean anything with regards to entropy. This is a common "science popularization" level description of entropy that is "not even wrong" in that, as you've seen, these terms are not defined, and especially not defined in terms of usual physical models.
~~Now one *could* define "(dis)order" but generally one of three things will happen. First, it~~**is**used purely as a synonym for entropy or a closely related concept and thus provides no explanatory power. Second, "(dis)order" is defined as [some concept related to entropy in a non-trivial way](https://en.wikipedia.org/wiki/Entropy_(energy_dispersal)) but it doesn't well connect with the colloquial meaning of "(dis)order" and so is perhaps not the best choice of word. As a final option, "(dis)order" *is* defined [in a way that better aligns with colloquial usage](https://en.wikipedia.org/wiki/Phase_transitions#Order_parameters), but then it doesn't always align with entropy.~~In practice, as you've seen, when "(dis)order" is brought up in this context, *no* definition is given and it seems the intent is that the colloquial understanding of the term is intended. Unfortunately, this 1) fails properly explain entropy, in particular, it doesn't touch on the right property, and, more objectively, 2) it simply doesn't always match with the intuitive notion of "(dis)ordered".~~**A good example of this mismatch arises in liquid crystals. The molecules making up certain types of liquid crystals are modeled as rods. One of the higher entropy phases is the [nematic phase](https://en.wikipedia.org/wiki/Liquid_crystal#Nematic_phase) where all the rods are aligned. Lower entropy states do allow the rods to have a greater variety of orientations. This seems at best non-obvious if not outright counter-intuitive with the "disorder" framing: all the rods aligned seems a pretty "ordered" arrangement. With the (actual) "volume of phase space" definition of entropy, what's happening is much easier to understand: [the reduction in the volume of the phase space arising from collapsing the orientation part to a point is more than made up for by the increase in the volume of phase space in the positional part](https://en.wikipedia.org/wiki/Liquid_crystal#Onsager_hard-rod_model). 
Roughly speaking, when the rods are aligned, they can more easily slide around and past each other. With respect to the [orientation order parameter](https://en.wikipedia.org/wiki/Liquid_crystal#Order_parameter), perfectly aligned rods have maximum (orientation) order, but it is also a higher entropy state than some other phases with less orientation order. So from the "disorder" view of entropy, increasing the orientation order can increase the "disorder".**~~So what is entropy? You gave Boltzmann's definition, but, unsurprisingly, there is a lot of implicit context and suppressed detail in that brief formula. It is also a rather special case of a more general definition, that is, in my opinion, also far more enlightening. In my opinion, the appropriately general level to formulate entropy is via the notion of [**relative entropy**](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) (aka [the negation of the] KL-divergence) defined as follows in the continuous case: $$S[p, q] = -\int p(x)\log\left(\frac{p(x)}{q(x)} ight)dx$$ or, in the discrete case: $$S[p, q] = -\sum_x p(x)\log\left(\frac{p(x)}{q(x)} ight)$$ $S[p, q]$ should be read as "the entropy of (the probability distribution) $p$ relative to (the probability distribution) $q$". We require $q(x) = 0$ implies $p(x) = 0$. I've purposely omitted the domain of the integral/sum which should be taken as the entire domain that the probability distributions are defined over. The non-relative entropy can then be defined as $S[p] = S[p, p_U]$ where $p_U$ is the uniform distribution. This already starts to reveal issues with this non-relative notion since uniform distributions get tricky in continuous contexts, and if we have a function $f$ such that $p \circ f$ is a probability distribution whenever $p$ is, then we can show that $S[p \circ f, q \circ f] = S[p, q]$ but $S[p \circ f]~~~~eq S[p]$ since $p_U \circ f$ certainly doesn't need to be a uniform distribution. 
To get Boltzmann's formula, we further narrow down to a $p$ of the form $p(x) = \begin{cases} 1/|\Omega|, & x \in \Omega \\\\ 0, & x~~~~otin \Omega\end{cases}$ where $\Omega$ is some subset of the domain and $|\Omega|$ is its volume. For this choice of $p$, $S[p] = \log |\Omega|$ modulo an additive constant and absorbing the constant into the base of the logarithm.~~- One thing you might have noticed is the definition of relative entropy doesn't seem to have anything at all to do with physics. The definition makes just as much sense if the probability distributions are over products and specify how likely that product is to sell. This is because entropy is a tool of inference in general. Its use in physics is just an application of this tool. Once this is understood, many aspects of the notion of entropy as used in physics make much more sense. For example, the "statistical" nature of, e.g. the second law of thermodynamics, becomes expected as well as the fact that there isn't a "mechanical" derivation for that law. The fact that thermodynamics works for a huge diversity of materials becomes a less surprising fact. That things like [thermoeconomics](https://en.wikipedia.org/wiki/Thermoeconomics) succeed at all is also no longer very surprising. A common operation is to find the probability distribution subject to some constraints that maximizes (relative) entropy. That is, find a $p$ such that $F[p]=0$ for some functional $F$ for which $S[p,q]$ is maximum for a given $q$. (This is a little clearer in terms of KL-divergence where this is the same as saying: find the probability distribution $p$ that is as "close" to $q$ as possible while still satisfying the constraint $F$.) To emphasize that entropy is about inference, the operation of maximizing relative entropy gives a mechanism for updating a "prior" probability distribution into "posterior" probability distribution. 
This *directly* generalizes [Bayes' rule](https://en.wikipedia.org/wiki/Bayes'_theorem#Statement_of_theorem)^[One subtle but crucial aspect of this is to understand what Bayes' rule is actually saying. The formula $P(H\mid D) = P(D\mid H)P(H)/P(D)$ is easily derived from the basic laws of probability, but this formula isn't the content of Bayes' rule. Bayes' rule is about how we update our probabilities when learning new information. Let $P_s(H)$ be our probability for $H$ before learning $D$ and $P_f(H)$ be our probability of $H$ after learning (only) $D$. Bayes' rule is that $P_f(H) = P_s(H\mid D)$. While Bayes' formula is just a mathematical theorem given the laws of probability theory, Bayes' rule is something one could reasonably argue against.]. This is spelled out in section 6.6 of [Entropic Inference and the Foundations of Physics](https://livealbany.sharepoint.com/sites/web_physics/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha%2FACaticha%2DEIFP%2Dbook%2Epdf&parent=%2Fsites%2Fweb%5Fphysics%2FShared%20Documents%2FProfessor%20Docs%2FCaticha&p=true&originalPath=aHR0cHM6Ly9saXZlYWxiYW55LnNoYXJlcG9pbnQuY29tLzpiOi9zL3dlYl9waHlzaWNzL0VhTklGck1ERDIxR2lHV3BaWDlpR2JjQk4tdlZLT2NqN1NDejd4WGRucE9QWGc_cnRpbWU9amJRQTBmQ1UyVWc) by Ariel Caticha. (In fact, pretty much everything I'm saying in this answer is covered by this book.)
~~Related to this, an additional aspect that makes things confusing when talking about entropy is what is used is not directly the (relative) entropy I defined above but rather its maximum subject to some constraints^[Another more subtle aspect is that traditional thermodynamics only defines temperature and entropy for states of thermodynamic equilibrium. This makes many non-technical arguments invoking entropy fail immediately, because they are being applied to systems that are not at thermodynamic equilibrium.]. You often see expressions like $S(U, V)$, e.g. [here](https://en.wikipedia.org/wiki/Free_entropy#Entropy), where $U$ and $V$ are (internal) energy and volume respectively. What these expressions stand for is the *maximum* entropy that can be achieved subject to some constraints which are parameterized by $U$ and $V$. For example, for a simple gas, $S(U)$ might be the entropy, $S[p^\*]$, of the probability distribution $p^\*$ which is zero everywhere the (total) kinetic energy isn't $U$, i.e. $p^\*((x,v)) = 0$ when $\sum\_i mv\_i^2/2 \neq U$. This distribution is of the form that leads to Boltzmann's form of entropy. These variables which parameterize the constraints, $U$ in this case, determine the macrostates.~~- The source of the second law of thermodynamics starts making itself known. If we start with a given macrostate, then we actually have a microstate that is compatible with that macrostate but we don't know which one. If we perform an experiment starting from this initial macrostate we'll get some final macrostate. If the experiment is reproducible, we will consistently get the same final macrostate given the same initial macrostate. That is, all or at least nearly all of the microstates compatible with the initial macrostate evolve to a microstate compatible with the same final macrostate. 
If we evolve the subvolume of phase space corresponding to all microstates compatible with the initial macrostate, then [Liouville's theorem](https://en.wikipedia.org/wiki/Liouville's_theorem_(Hamiltonian)) states that this volume is preserved and thus the Boltzmann entropy would be conserved. However, this entropy may be lower than the *maximum* entropy corresponding to the final macrostate, and this maximized entropy is what the second law of thermodynamics is talking about. The maximum entropy for a given macrostate is clearly not a function of how you arrived at that macrostate, while, of course, the subvolume of phase space the initial macrostate evolves to clearly is. The "information" of how we arrived at the final macrostate is lost when we consider the maximum entropy for that macrostate, and (some of) this lost "information" corresponds to the increase in entropy.
- In addition to Caticha's book referenced earlier, I'd also recommend the papers [The Evolution of Carnot's Principle](https://bayes.wustl.edu/etj/articles/ccarnot.pdf) and [The Gibbs Paradox](https://bayes.wustl.edu/etj/articles/gibbs.paradox.pdf) by E. T. Jaynes. These topics are also covered by the book but from a different perspective and with a very different style. These contain discussions of the second law of thermodynamics and how it relates to reproducibility that are also very good and clarifying.

- Now one *could* define "(dis)order" but generally one of three things will happen. First, it **will be** used purely as a synonym for entropy or a closely related concept and thus provides no explanatory power. Second, "(dis)order" is defined as [some concept related to entropy in a non-trivial way](https://en.wikipedia.org/wiki/Entropy_(energy_dispersal))**,** but it doesn't well connect with the colloquial meaning of "(dis)order" and so is perhaps not the best choice of word. As a final option, "(dis)order" *is* defined [in a way that better aligns with colloquial usage](https://en.wikipedia.org/wiki/Phase_transitions#Order_parameters), but then it doesn't always align with entropy.
- In practice, as you've seen, when "(dis)order" is brought up in this context, *no* definition is given and the colloquial understanding of the term seems to be intended. Unfortunately, this 1) fails to properly explain entropy (in particular, it doesn't touch on the right property) and, more objectively, 2) simply doesn't always match the intuitive notion of "(dis)ordered".
- eq S[p]$ since $p_U \circ f$ certainly doesn't need to be a uniform distribution.
**In other words, relative entropy is invariant under parameterization while the non-relative form is not.**To get Boltzmann's formula, we further narrow down to a $p$ of the form $p(x) = \begin{cases} 1/|\Omega|, & x \in \Omega \\\\ 0, & x - Related to this, an additional aspect that makes things confusing when talking about entropy is what is used is not directly the (relative) entropy I defined above but rather its maximum subject to some constraints^[Another more subtle aspect is that traditional thermodynamics only defines temperature and entropy for states of thermodynamic equilibrium. This makes many non-technical arguments invoking entropy fail immediately, because they are being applied to systems that are not at thermodynamic equilibrium.
**That said, the understanding of entropy I'm describing here does open the door to a general theory of non-equilibrium statistical mechanics and a "second law" with broader generality.**]. You often see expressions like $S(U, V)$, e.g. [here](https://en.wikipedia.org/wiki/Free_entropy#Entropy), where $U$ and $V$ are (internal) energy and volume respectively. What these expressions stand for is the *maximum* entropy that can be achieved subject to some constraints which are parameterized by $U$ and $V$. For example, for a simple gas, $S(U)$ might be the entropy, $S[p^\*]$, of the probability distribution $p^\*$ which is zero everywhere the (total) kinetic energy isn't $U$, i.e. $p^\*((x,v)) = 0$ when $\sum\_i mv\_i^2/2 - eq U$. This distribution is of the form that leads to Boltzmann's form of entropy. These variables which parameterize the constraints, $U$ in this case, determine the macrostates.