re-encode video, more text in exploring perf.
genisplaja committed Dec 1, 2024
1 parent 870722c commit 3ceae48
Showing 2 changed files with 65 additions and 5 deletions.
68 changes: 63 additions & 5 deletions webbook/resources/exploring-performance.ipynb
@@ -18,7 +18,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook serves to demonstrate the wide range of tools available as part of the compIAM package. We demonstrate their use on a single performance from the Saraga Audiovisual Dataset, the multi-modal portion of the wider Saraga Dataset [1, 2]. The tools showcased here are not accompanied by exhaustive usage documentation, which can be found in their respective pages in other parts of this webbook, links for which are provided in each section."
"This notebook serves to demonstrate the wide range of tools available as part of the compIAM package. We demonstrate their use on a single performance from the Saraga Audiovisual Dataset, the multi-modal portion of the wider Saraga Dataset {cite}`saraga`. The tools showcased here are not accompanied by exhaustive usage documentation, which can be found in their respective pages in other parts of this webbook, links for which are provided in each section."
]
},
{
@@ -77,6 +77,13 @@
"AUDIO_PATH = os.path.join(\"..\", \"audio\", \"demos\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Saraga Audiovisual is a fresh new dataset very recently published in ISMIR 2024 (San Francisco, USA), it is still not available through mirdata and compIAM. However, we will manually download and load an example concert recording from this dataset, a concert performed by Dr. Brindha Manickavasakan during the December Season in Chennai in 2023. Dr. Manickavasakan is also a collaborator of the ongoing efforts on the computational melodic analysis of Carnatic Music in a collaboration between a group of researchers from the Music Technology Group and Dr. Lara Pearson from the Max Plank Institute of Empirical Aesthetics."
]
},
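{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal, illustrative sketch of the kind of download-and-extract step this requires: fetch a zip archive and unpack it locally. The URL in the sketch is a placeholder rather than the real location of the data; the next cell performs the actual download used in this demo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: the URL below is a placeholder, not the real location of the data\n",
"import zipfile\n",
"import requests\n",
"\n",
"demo_url = \"https://example.org/saraga-audiovisual-demo.zip\"  # placeholder URL\n",
"archive_path = os.path.join(AUDIO_PATH, \"saraga-audiovisual-demo.zip\")\n",
"\n",
"with open(archive_path, \"wb\") as f:\n",
"    f.write(requests.get(demo_url).content)  # download the archive to disk\n",
"\n",
"with zipfile.ZipFile(archive_path) as zf:\n",
"    zf.extractall(AUDIO_PATH)  # unpack next to the other demo material"
]
},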
{
"cell_type": "code",
"execution_count": null,
@@ -106,6 +113,13 @@
"os.remove(output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alright, the data is downloaded and uncompressed. Let's get the path to it and analyse a rendition from the concert."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -337,6 +351,16 @@
"print(\"20-second video segment processing complete. Output saved as 'output_segment.mp4'\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"subprocess.run([\"ffmpeg\", \"-vcodec\", \"libx264\", \"-acodec\", \"aac\", f\"{vid_out_path.replace('.mp4', '_re-encoded.mp4')}\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -353,7 +377,7 @@
"\n",
"from IPython.core.display import Video\n",
"\n",
"Video(vid_out_path, embed=True)"
"Video(vid_out_path.replace('.mp4', '_re-encoded.mp4'), embed=True)"
]
},
{
@@ -393,7 +417,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The tonic of the lead singer is useful for normalising pitch and comparing with other performers. Here we can use the compIAM TonicIndianMultiPitch tool."
"The tonic of the lead singer is useful for normalising pitch and comparing with other performers. Here we can use the TonicIndianMultiPitch tool which is available through Essentia and compIAM."
]
},
{
@@ -427,6 +451,13 @@
"print(f'Performer tonic: {round(tonic, 2)} Hz')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can quickly listed to the estimated tonic on top of the original audio to perceptually evaluate if the tonic sounds reasonable to the chosen rendition."
]
},
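{
"cell_type": "markdown",
"metadata": {},
"source": [
"One straightforward way to do this (sketched below, assuming `audio_mix` is a mono mixture and taking 44.1 kHz as its sampling rate) is to synthesise a soft sine tone at the estimated tonic frequency and overlay it on the original signal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: overlay a soft sine tone at the estimated tonic on the mixture and listen.\n",
"# `audio_mix` and `tonic` come from the cells above; the sampling rate here is an assumption.\n",
"import numpy as np\n",
"import IPython.display as ipd\n",
"\n",
"sr = 44100  # assumed sampling rate of the loaded mixture\n",
"t = np.arange(len(audio_mix)) / sr\n",
"tonic_tone = 0.2 * np.sin(2 * np.pi * tonic * t)  # soft sine at the estimated tonic\n",
"\n",
"ipd.Audio(audio_mix + tonic_tone, rate=sr)"
]
},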
{
"cell_type": "code",
"execution_count": null,
@@ -469,6 +500,15 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That sounds good! This is the tonic of the recording. This is a really valuable information that allows us to characterise and normalize the melodies, and may give relevant information about the artist and performed concert.\n",
"\n",
"For further reference, please visit the [tonic identification page](tonic-identification)."
]
},
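{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a common way to use the tonic is to express a pitch track in cents relative to it, so that melodies performed at different tonics become directly comparable. The sketch below assumes a pitch track in Hz (we extract one with FTA-Net later in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def hz_to_cents(freqs_hz, tonic_hz):\n",
"    \"\"\"Convert a pitch track in Hz to cents above the tonic; unvoiced (0 Hz) frames become NaN.\"\"\"\n",
"    freqs_hz = np.asarray(freqs_hz, dtype=float)\n",
"    cents = np.full_like(freqs_hz, np.nan)\n",
"    voiced = freqs_hz > 0\n",
"    cents[voiced] = 1200 * np.log2(freqs_hz[voiced] / tonic_hz)\n",
"    return cents\n",
"\n",
"# A pitch one octave above the tonic sits at 1200 cents\n",
"print(hz_to_cents([tonic, 2 * tonic, 0], tonic))"
]
},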
{
"cell_type": "markdown",
"metadata": {},
@@ -552,7 +592,7 @@
"freqs = ftanet_carnatic.predict(audio_mix)[:, 1]\n",
"tonic = tonic_multipitch.extract(audio_mix)\n",
"\n",
"k = 5\n",
"k = 9\n",
"N = 200\n",
"new_feat = []\n",
"\n",
@@ -565,7 +605,7 @@
" new_feat.append(feature[c : c + 5000])\n",
"new_feat = np.array(new_feat)\n",
"\n",
"raga = deepsrgm.predict(new_feat)"
"raga = deepsrgm.predict(new_feat[:8]) # Let's again only take alap frames"
]
},
{
@@ -584,6 +624,17 @@
"### 2.3 Music Source Separation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Music source separation is the task of automatically estimating the individual elements in a musical mixture. Apart from its creative uses, it may be very important as builing block in research pipeline, acting as a very handy pre-processing step {cite}`plaja_separation_2023`. To carefully analyse the singing voice and its components, normally having it isolated from the rest of the instruments is beneficial. \n",
"\n",
"There are several models in the literature to address this problem, most of them based on deep learning architectures, some of them provide pre-trained weights such as Demucs {cite}`demucs` or Spleeter {cite}`spleeter`, the latter is broadly used in Carnatic Music computational research works. However, thes systems have two problems: (1) the training data of these models does normally not include Carnatic Music examples, therefore there are instruments and practices which are completely unseed by these models, and (2) these models have a restricted set of target elements, namely (_vocals_, _bass_, _drums_, and _other_), which does not fit to Carnatic Music arrangements at all. \n",
"\n",
"To address problem (1), there have been few attemps on trying to use the multi-stem data presented above to develop Carnatic-tailored source separation systmes, although the available multi-stem recordings are collected from mixing consoles in live performances, and therefore the individual tracks are noisy (have background leakage from the rest of the sources). We can test one example of these systems here: {cite}`plaja_separation_2023`."
]
},
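{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of how such a Carnatic-tailored model can be used through compIAM (the model identifier and method below are indicative; please check the compIAM documentation and the cells that follow for the exact interface):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: the model identifier and call signature are indicative, not definitive.\n",
"import compiam\n",
"\n",
"separation_model = compiam.load_model(\"separation:cold-diff-sep\")  # Carnatic-trained separation model\n",
"separated_vocals = separation_model.separate(audio_mix)  # isolate the singing voice from the mixture"
]
},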
{
"cell_type": "code",
"execution_count": null,
@@ -662,6 +713,13 @@
"ipd.Audio(separated_vocals, rate=SEPARATION_SR)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although there is still a long way to go on this problem, the ongoing efforts on improving on the separation of singing voice (and also the rest of the instrumentation!) for Carnatic Music set an interesting baseline to bulid on top of."
]
},
{
"cell_type": "markdown",
"metadata": {},
2 changes: 2 additions & 0 deletions webbook/rhythmic_analysis/meter_analysis.ipynb
@@ -25,6 +25,8 @@
"if importlib.util.find_spec('compiam') is None:\n",
" ## Bear in mind this will only run in a jupyter notebook / Collab session\n",
" %pip install git+git://github.com/MTG/compIAM.git\n",
"# installing mirdata from master branch, including fix for Carnatic Rhythm dataset\n",
"%pip install -U https://github.com/mir-dataset-loaders/mirdata.git \n",
"import compiam\n",
"\n",
"# Import extras and supress warnings to keep the tutorial clean\n",
