\documentclass{article}
\usepackage{amsmath}
\begin{document}

Here is what resampling we need to do.  Content video is at $C_V$ fps, audio at $C_A$.

\section{Easy case 1}

$C_V$ and $C_A$ are both DCI rates, e.g.\ if $C_V = 24$, $C_A = 48\times{}10^3$.

\medskip
\textbf{Nothing to do.}

\section{Easy case 2}

$C_V$ is a DCI rate, $C_A$ is not.  e.g.\ if $C_V = 24$, $C_A = 44.1\times{}10^3$.

\medskip
\textbf{Resample $C_A$ to the DCI rate.}

\section{Hard case 1}
\label{sec:hard1}

$C_V$ is not a DCI rate, $C_A$ is, e.g.\ if $C_V = 25$, $C_A =
48\times{}10^3$.  We will run the video at a nearby DCI rate $F_V$,
meaning that it will run faster or slower than it should.  We resample
the audio to $C_V C_A / F_V$ and mark it as $C_A$ so that it, too,
runs faster or slower by the corresponding factor.

e.g.\ if $C_V = 25$, $F_V = 24$ and $C_A = 48\times{}10^3$, we
resample audio to $25 * 48\times{}10^3 / 24 = 50\times{}10^3$.

\medskip
\textbf{Resample $C_A$ to $C_V C_A / F_V$}

\section{Hard case 2}

Neither $C_V$ nor $C_A$ is not a DCI rate, e.g.\ if $C_V = 25$, $C_A =
44.1\times{}10^3$.  We will run the video at a nearby DCI rate $F_V$,
meaning that it will run faster or slower than it should.  We first
resample the audio to a DCI rate $F_A$, then perform as with
Section~\ref{sec:hard1} above.

\medskip
\textbf{Resample $C_A$ to $C_V F_A / F_V$}


\section{The general case}

Given a DCP running at $F_V$ and $F_A$ and a piece of content at $C_V$
and $C_A$, resample the audio to $R_A$ where
\begin{align*}
R_A &= \frac{C_V F_A}{F_V}
\end{align*}

Once this is done, consider 1 second's worth of content samples ($C_A$
samples).  We have turned them into $R_A$ samples which should still
last 1 second.  These samples are then played back at $F_A$ samples
per second, so they last $R_A / F_A$ seconds.  Hence there is a
scaling between some content time and some DCP time of $R_A / F_A$
i.e. $C_V / F_V$.


\section{Another explanation}

Say we have some content at a video rate $C_V$ and we want to
run it at DCP video rate $F_V$.  It's always the video rates that
decide what to do, since we don't have an equivalent to audio
resampling in the video domain.

We can just mark the video as $F_V$ and it will run $F_V / C_V$ faster
than it was.  Let's call the factor $S = F_V / C_V$.

An equivalent for audio would be to take the content audio at a rate
$C_A$ and mark it as $C_A S$.  Then the same audio frames will be run
more quickly, just as the same video frames are being.  The audio would be
in sync with the video since it has been sped up by the same amount.

In practice we can't do this, in general, as the only allowed DCP
audio rates are 48kHz and 96kHz.  Instead, we'll resample to some new
rate $P$ and mark it as $Q$ where $Q / P = S$.  Resampling does not
change the sound, just how many samples are being used to describe it,
so this is equivalent to marking the original, unsampled audio as $C_A S$.

Then we set $Q = 48$kHz so that $P = 48000 / S$, or $P = C_V F_A
/ F_V$.

Note that the original sampling rate of the audio content is
irrelevant.  Also, skipping or doubling of video frames is analagous
to audio resampling: the data are the same, just represented with more
or fewer samples.


\section{Further thoughts}

Consider the case where the content video rate $C_V = 24$ and the DCP
video rate $F_V = 25$.  Then 46080 (resampled) samples of audio
content last 1s at the original rate or $24/25$s at the DCP rate and
1s of DCP is made up of 48000 (resampled) content samples.

\end{document}