\documentclass{article}
\usepackage{amsmath}
\begin{document}

This note describes the audio resampling we need to do. Content video
runs at $C_V$ frames per second and content audio at $C_A$ samples per
second.

\section{Easy case 1}

$C_V$ and $C_A$ are both DCI rates, e.g.\ $C_V = 24$, $C_A = 48\times{}10^3$.

\medskip
\textbf{Nothing to do.}

\section{Easy case 2}

$C_V$ is a DCI rate, $C_A$ is not, e.g.\ $C_V = 24$, $C_A = 44.1\times{}10^3$.

\medskip
\textbf{Resample $C_A$ to the DCI rate.}

\section{Hard case 1}
\label{sec:hard1}

$C_V$ is not a DCI rate, $C_A$ is, e.g.\ $C_V = 25$, $C_A = 48\times{}10^3$.

We will run the video at a nearby DCI rate $F_V$, meaning that it will
run faster or slower than it should. We resample the audio to
$C_V C_A / F_V$ and mark it as $C_A$ so that it, too, runs faster or
slower by the corresponding factor.

e.g.\ if $C_V = 25$, $F_V = 24$ and $C_A = 48\times{}10^3$, we resample
the audio to $25 \times 48\times{}10^3 / 24 = 50\times{}10^3$.

\medskip
\textbf{Resample $C_A$ to $C_V C_A / F_V$}

\section{Hard case 2}

Neither $C_V$ nor $C_A$ is a DCI rate, e.g.\ $C_V = 25$, $C_A = 44.1\times{}10^3$.

We will run the video at a nearby DCI rate $F_V$, meaning that it will
run faster or slower than it should. We first resample the audio to a
DCI rate $F_A$, then proceed as in Section~\ref{sec:hard1} above. The
two steps combine into a single resample, with the result marked as
$F_A$.

\medskip
\textbf{Resample $C_A$ to $C_V F_A / F_V$}

\section{The general case}

Given a DCP running at $F_V$ and $F_A$ and a piece of content at $C_V$
and $C_A$, resample the audio to $R_A$ where
\begin{align*}
R_A &= \frac{C_V F_A}{F_V}
\end{align*}

Once this is done, consider 1 second's worth of content samples ($C_A$
samples). We have turned them into $R_A$ samples, which should still
last 1 second. These samples are then played back at $F_A$ samples per
second, so they last $R_A / F_A$ seconds. Hence content time is scaled
to DCP time by a factor of $R_A / F_A$, i.e.\ $C_V / F_V$.
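
\medskip
As a consistency check, the general formula can be evaluated for the
example rates used in each of the four cases above. This is only a
sketch under two assumptions that the cases themselves do not state
explicitly: the DCP audio rate is taken to be $F_A = 48\times{}10^3$
throughout, and Hard case 2 is taken to use the same $F_V = 24$ as
Hard case 1.
\begin{align*}
\text{Easy case 1 } (C_V = F_V = 24):\quad
  R_A &= \frac{24 \times 48\times{}10^3}{24} = 48\times{}10^3 = C_A \\
\text{Easy case 2 } (C_V = F_V = 24):\quad
  R_A &= \frac{24 \times 48\times{}10^3}{24} = 48\times{}10^3 = F_A \\
\text{Hard case 1 } (C_V = 25,\ F_V = 24):\quad
  R_A &= \frac{25 \times 48\times{}10^3}{24} = 50\times{}10^3 \\
\text{Hard case 2 } (C_V = 25,\ F_V = 24):\quad
  R_A &= \frac{25 \times 48\times{}10^3}{24} = 50\times{}10^3
\end{align*}
In the two easy cases $R_A / F_A = 1$, so timing is unchanged. In the
two hard cases $R_A / F_A = 25/24$, so, for instance, 60 seconds of
content would occupy $60 \times 25/24 = 62.5$ seconds of DCP time.
Note that $C_A$ does not appear in the formula, which is why Hard
cases 1 and 2 give the same $R_A$.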
\section{Another explanation}

Say we have some content at a video rate $C_V$ and we want to run it at
DCP video rate $F_V$. It is always the video rates that decide what to
do, since we don't have an equivalent to audio resampling in the video
domain.

We can just mark the video as $F_V$ and it will run $F_V / C_V$ times
as fast as it did. Let's call this factor $S = F_V / C_V$.

An equivalent for audio would be to take the content audio at a rate
$C_A$ and mark it as $C_A S$. Then the same audio frames will be played
at $S$ times their original speed, just as the same video frames are.
The audio would be in sync with the video since it has been sped up
(or slowed down) by the same factor.

In practice we can't do this, in general, as the only allowed DCP audio
rates are 48\,kHz and 96\,kHz. Instead, we resample to some new rate
$P$ and mark it as $Q$, where $Q / P = S$. Resampling does not change
the sound, just how many samples are used to describe it, so this is
equivalent to marking the original, un-resampled audio as $C_A S$.

Then we set $Q = F_A = 48$\,kHz so that $P = 48000 / S$, or in general
$P = F_A / S = C_V F_A / F_V$.

Note that the original sampling rate of the audio content is
irrelevant.

Also, skipping or doubling of video frames is analogous to audio
resampling: the data are the same, just represented with more or fewer
samples.

\section{Further thoughts}

Consider the case where the content video rate $C_V = 24$ and the DCP
video rate $F_V = 25$. With $F_A = 48$\,kHz, the audio is resampled to
$24 \times 48000 / 25 = 46080$ samples per second of content. Then
46080 (resampled) samples of audio content last 1\,s at the original
rate or $24/25$\,s at the DCP rate, and 1\,s of DCP is made up of 48000
(resampled) content samples.

\end{document}