
Autogenerate Show Notes with ChatGPT and Whisper


Introduction

Download YouTube Video with YouTube API

Configure YouTube API Key and Channel ID for curl Commands

Set the environment variables globally in your .zshrc file. See what’s currently in your system’s file by running cat $HOME/.zshrc. To export a global variable, include a line like export YOUTUBE_API_KEY=YOUR_KEY_HERE, substituting your own API key for YOUR_KEY_HERE. This can be done by opening .zshrc in a code editor or by running the echo commands below:

To get a YouTube API key, create a project in the Google Cloud Console, enable the YouTube Data API v3 for it, and generate an API key under the Credentials tab.

To find your channel ID, open YouTube Studio and go to Settings > Channel > Advanced settings, or copy the string after /channel/ in your channel page’s URL.

Terminal window
echo '\nexport YOUTUBE_API_KEY=YOUR_KEY_HERE' >> $HOME/.zshrc
# AIzaSyCaliciFAZ48IMEV5WhTjLZ0ejMIUMt_D4
echo '\nexport CHANNEL_ID=YOUR_CHANNEL_ID_HERE' >> $HOME/.zshrc
# UCpdzti0GURPfMjKzYK5FVSA
echo '\nexport YOUTUBE_API_ENDPOINT=www.googleapis.com/youtube/v3' >> $HOME/.zshrc
# no trailing slash: the curl commands below add their own
source $HOME/.zshrc

For the environment variables to take effect, either restart your terminal session or run source ~/.zshrc (included at the end of the block above).
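You can confirm the variables are set in the current shell by echoing them:

Terminal window
echo $YOUTUBE_API_KEY $CHANNEL_ID $YOUTUBE_API_ENDPOINT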

If you don’t want to use environment variables you can replace $CHANNEL_ID with your ID, $YOUTUBE_API_KEY with your own API key, and $YOUTUBE_API_ENDPOINT with www.googleapis.com/youtube/v3. Note that the YouTube Data API has a daily quota, so you can only make a limited number of requests each day.

Get All YouTube Videos from a Channel with YouTube API


Install jq, a command-line JSON processor we’ll use to parse the API responses:

Terminal window
brew install jq

To get all videos for a given channel, we can’t directly request “all videos of a channel”. However, a channel’s “uploads” playlist is a system-generated playlist that contains all of the channel’s public uploads. We can retrieve its contents in two steps:

  1. Get the playlist ID of the ‘uploads’ playlist for a channel.
  2. List the videos in that playlist.

Get playlist ID of uploads playlist

You first need to make a request to the channels endpoint (www.googleapis.com/youtube/v3/channels) to get the “uploads” playlist ID.

Terminal window
curl "https://$YOUTUBE_API_ENDPOINT/channels?part=contentDetails&id=$CHANNEL_ID&key=$YOUTUBE_API_KEY"

In the JSON response from this request, look for the items[0].contentDetails.relatedPlaylists.uploads field to find the “uploads” playlist ID.
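If you’d rather not scan the JSON by eye, a quick jq filter can pull the ID out directly (a minimal sketch, assuming the request above succeeds):

Terminal window
curl -s "https://$YOUTUBE_API_ENDPOINT/channels?part=contentDetails&id=$CHANNEL_ID&key=$YOUTUBE_API_KEY" | jq -r '.items[0].contentDetails.relatedPlaylists.uploads'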

Save the ID to a new environment variable called UPLOADS_PLAYLIST_ID.

Terminal window
echo '\nexport UPLOADS_PLAYLIST_ID=YOUR_UPLOADS_PLAYLIST_ID_HERE' >> $HOME/.zshrc
# UUOYqYy7ebj5j63TbdGB-Lcg
source $HOME/.zshrc

List videos in uploads playlist

Once you have the “uploads” playlist ID (UUOYqYy7ebj5j63TbdGB-Lcg), you can make a request to the playlistItems endpoint (www.googleapis.com/youtube/v3/playlistItems) to list the videos in that playlist:

Terminal window
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50"

The maxResults parameter in the previous command was set to 50 because this is the maximum number of results for a single request. For more than 50 videos, you will need to make additional requests with the pageToken parameter set to the nextPageToken from the previous response to get the next page of results.

Terminal window
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50" | jq -r '.items[].snippet.resourceId.videoId | "https://www.youtube.com/watch?v=\(.)"' > video_urls.txt
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50&pageToken=EAAaBlBUOkNESQ" | jq -r '.items[].snippet.resourceId.videoId | "https://www.youtube.com/watch?v=\(.)"' > video_urls2.txt
Terminal window
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=5" | jq '.items[].snippet | {publishedAt, title, description}'
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50" | jq '.items[].snippet | {publishedAt, title, description}' > episodes.json
curl "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50&pageToken=EAAaBlBUOkNESQ" | jq '.items[].snippet | {publishedAt, title, description}' > episodes2.json
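Rather than pasting each nextPageToken by hand, you can loop until the token runs out. Here’s a minimal sketch, assuming jq is installed and that the API treats an empty pageToken as the first page (which it does in practice):

Terminal window
page_token=""
while true; do
  response=$(curl -s "https://$YOUTUBE_API_ENDPOINT/playlistItems?part=snippet&playlistId=$UPLOADS_PLAYLIST_ID&key=$YOUTUBE_API_KEY&maxResults=50&pageToken=$page_token")
  # Append each video URL to the running list
  echo "$response" | jq -r '.items[].snippet.resourceId.videoId | "https://www.youtube.com/watch?v=\(.)"' >> all_video_urls.txt
  # nextPageToken is absent on the last page, which ends the loop
  page_token=$(echo "$response" | jq -r '.nextPageToken // empty')
  [ -z "$page_token" ] && break
done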

Download Extracted Audio with yt-dlp

For video transcriptions, you can download and extract audio from YouTube with yt-dlp. For podcasts, you’ll need to either download the audio file from the show’s native web audio player or, if there is no download button available in the UI, download the file from the show’s RSS feed.

yt-dlp is a command-line program to download videos from YouTube and other video platforms. It is a fork of yt-dlc, which itself is a fork of youtube-dl, with additional features and patches integrated from both yt-dlc and youtube-dl as well as other sources.

New Features in yt-dlp

yt-dlp introduces new options and improvements over its predecessors, including better handling of format sorting, configuration files, and plugins. Various defaults have changed from those earlier projects, such as format selection, output templates, and handling of metadata and playlists. If you’re already familiar with those projects, make sure to read the docs before reusing old commands. Key features and improvements include:

  • Format Sorting and Selection: Enhanced default format sorting to prefer higher resolution and better codecs over merely higher bitrates. Users can specify sort orders to simplify format selection.
  • Merged Features: Incorporates features and improvements from another fork (animelover1984/youtube-dl), including support for writing comments, BiliBili search and channel support, embedding thumbnails, and more.
  • YouTube Enhancements: Adds support for various YouTube content types (e.g., Clips, Stories, Music Search), fixes for throttling issues, support for age-gated content without cookies, and downloading livestreams from the start.
  • Cookies from Browser: Automatically extracts cookies from major web browsers to facilitate downloading content that may require a login.
  • Partial Downloads and Chapter Splitting: Enables downloading specific sections of videos based on timestamps or chapters and splitting videos into multiple files based on chapters.
  • Multi-threaded and Aria2c Downloads: Supports downloading video fragments in parallel and using aria2c for downloading DASH (mpd) and HLS (m3u8) formats.
  • New and Improved Extractors: Regularly updates and fixes extractors to support a wide range of video platforms.
  • Subtitle Extraction: Allows extracting subtitles directly from streaming media manifests.
  • Advanced Output and Configuration Options: Provides extensive customization for download paths, output templates, and configuration settings.
  • Automated Updates and Builds: Includes a self-updater for stable releases, nightly builds, and master builds.

Install YouTube Downloader CLI

yt-dlp relies on ffmpeg for audio extraction and conversion, so install both:

Terminal window
brew install yt-dlp ffmpeg

Extract Audio from Video and Rename File

Create a command that completes the following actions:

  1. Download a specified YouTube video.
  2. Extract the video’s audio.
  3. Convert the audio to WAV format.
  4. Save the file in Whisper’s samples directory.
  5. Give the file a name in the format <upload_date>-<video_id>.wav.
Terminal window
yt-dlp \
--extract-audio \
--audio-format wav \
-o "samples/%(upload_date)s-%(id)s.%(ext)s" \
--verbose \
"https://www.youtube.com/watch?v=zSnKSlZLY-A"
# "https://www.youtube.com/watch?v=QhXc9rVLVUo"

This command uses yt-dlp, a command-line utility for downloading videos from YouTube and other video platforms, to perform the following actions:

  • --extract-audio tells yt-dlp to download the video from a given URL and extract the audio from it.
  • --audio-format specifies the format to which the audio should be converted. In this case, it’s set to wav for WAV files.
  • -o specifies the output template for the downloaded files, "samples/%(upload_date)s-%(id)s.%(ext)s". Output files are dynamically named based on the video’s metadata including the video’s upload date, unique YouTube ID, and appropriate file extension determined by the --audio-format option.
  • --verbose provides detailed information about the downloading and conversion process, helpful for troubleshooting or understanding what yt-dlp is doing behind the scenes.
  • "https://www.youtube.com/watch?v=zSnKSlZLY-A" is the specific YouTube video that yt-dlp will download and extract the audio from. Each YouTube video has a unique identifier (in this case, zSnKSlZLY-A) which is used in its URL.
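With this output template, the extracted audio lands at a path like samples/<upload_date>-zSnKSlZLY-A.wav, where <upload_date> is the video’s upload date in YYYYMMDD format.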

Create a new command that uses the video’s upload date and title in the filename instead. This command will share several options with the first example but include an additional step that modifies the metadata of the extracted audio file. Unlike the first command, which uses the video’s YouTube ID, the filename format in the output template will be specified as samples/%(upload_date)s-%(title)s.%(ext)s.

--replace-in-metadata is the one option not present in the first example. It tells yt-dlp to rewrite the title and uploader metadata fields ("title,uploader") before they are used in the output template. For consistency, spaces, underscores, and commas ([ _,]) in these fields are replaced with hyphens (-).

Terminal window
yt-dlp \
--extract-audio \
--audio-format wav \
-o "samples/%(upload_date)s-%(title)s.%(ext)s" \
--replace-in-metadata "title,uploader" "[ _,]" "-" \
--verbose \
"https://www.youtube.com/watch?v=zSnKSlZLY-A"

Convert WAV File for Whisper.cpp

We will need to convert our file to the sample rate required by Whisper. Create a command to read an audio file (input.wav), resample it to a lower sampling rate (16 kHz), and save the result to a new file (output.wav). We’ll run ffmpeg with the following options:

  • -i samples/input.wav: This option specifies the input file for ffmpeg. samples/input.wav indicates that ffmpeg should read a file named input.wav located in the samples directory.
  • -ar 16000: Sets the audio sampling rate of the output file to 16,000 Hz, the rate the whisper.cpp library expects.
  • -ac 1: This option sets the number of audio channels in the output file to 1, meaning the output will be mono audio. Mono audio is sufficient for many applications, such as speech processing, and reduces the file size compared to stereo.
  • samples/output.wav: Output file’s location and name. In this case, output.wav in the samples directory.
Terminal window
ffmpeg -i samples/input.wav -ar 16000 -ac 1 samples/output.wav
ffmpeg -i samples/input.wav -ar 16000 -ac 1 samples/output.wav && rm samples/input.wav

The second command deletes the original file after the conversion by appending rm samples/input.wav. If you need to convert an mp3, or extract mp3 audio from an mp4, use the following commands:

Terminal window
ffmpeg -i samples/input.mp3 -ar 16000 -ac 1 samples/output.wav && rm samples/input.mp3
ffmpeg -i samples/input.mp4 -q:a 0 -map a samples/output.mp3 && rm samples/input.mp4
# ffmpeg -i samples/input.mp4 -map a samples/output.mp3 && rm samples/input.mp4

The mp3 conversion can be wrapped in a zsh function so you don’t have to retype the ffmpeg flags each time:

Terminal window
function wav() {
  local input_file=$1
  local base_name=${input_file%.mp3}
  # Resample to 16 kHz mono, 16-bit PCM, as whisper.cpp expects
  ffmpeg -i "samples/$input_file" \
    -ar 16000 \
    -ac 1 \
    -c:a pcm_s16le \
    "samples/${base_name}.wav"
}
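To convert a file such as samples/episode.mp3 (a stand-in filename), call the function with just the filename:

Terminal window
wav episode.mp3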

If you have a video file, extracting and converting the audio track into an MP3 format is also possible with ffmpeg. The -map a option is used to extract only audio streams from the video, a necessary step not present in the audio-focused commands.

  • -q:a 0: This option sets the audio quality for the output MP3 file. The -q:a option controls the quality of audio compression, with 0 being the highest quality (and highest bitrate). This contrasts with the previous commands, which were focused on manipulating the sampling rate, channel configuration, and codec without explicitly setting the compression quality for an output compressed format.

  • -map a: This option tells ffmpeg to map only the audio streams from the input file to the output. This is crucial when working with video files since you might not want to include the video streams, subtitles, or other data in the output audio file. This option ensures that only the audio track is processed and included in the final output.
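If you’re unsure which streams a video file contains before extracting, ffprobe (installed alongside ffmpeg) can list just the audio streams. A quick check, assuming samples/input.mp4 exists:

Terminal window
ffprobe -loglevel error -select_streams a -show_streams samples/input.mp4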

Setup and Run OpenAI Whisper Model for Transcription

We’ll use whisper.cpp, a C/C++ port of OpenAI’s Whisper model:

  • https://github.com/openai/whisper
  • https://github.com/ggerganov/whisper.cpp

Clone and Build Model

Clone the whisper.cpp repo and build the model.

Terminal window
git clone https://github.com/ggerganov/whisper.cpp
# or: gh repo clone ggerganov/whisper.cpp
cd whisper.cpp
bash ./models/download-ggml-model.sh large-v2
make

After the build completes, the repository contains the following notable files and directories:
  • ggml-large-v2.bin: Custom binary format (ggml) used by the whisper.cpp library representing a quantized or optimized version of OpenAI’s Whisper model tailored for high-performance inference on various platforms. The ggml format is designed to be lightweight and efficient, allowing the model to be easily integrated into different applications.

  • main: Executable compiled from the whisper.cpp repository for transcribing or translating audio files using the Whisper model. Running this executable with an audio file as input transcribes the audio to text.

  • samples: The directory for audio files including a sample file called jfk.wav provided for testing and demonstration purposes. The main executable can use it for showcasing the model’s transcription capabilities. The directory also contains audio files downloaded by yt-dlp.

  • whisper.cpp and whisper.h: These are the core C++ source and header files of the whisper.cpp project. They implement the high-level API for interacting with the Whisper automatic speech recognition (ASR) model, including loading the model, preprocessing audio inputs, and performing inference.

Run Whisper Main Function

Run the Whisper model. The first command uses short flags on the bundled sample file, the second uses long-form flags, and the third requests every available output format for the converted file:

Terminal window
./main -m models/ggml-large-v2.bin -f samples/jfk.wav
./main \
  --model models/ggml-large-v2.bin \
  --output-txt --output-vtt \
  --file samples/jfk.wav
./main --model models/ggml-large-v2.bin \
  --file samples/output.wav \
  --output-txt \
  --output-vtt \
  --output-srt \
  --output-lrc \
  --output-csv \
  --output-json
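Each --output-* flag writes a sibling file next to the input audio. Transcribing samples/jfk.wav with --output-txt and --output-vtt, for example, produces samples/jfk.wav.txt and samples/jfk.wav.vtt. The --output-file flag can be used to choose a different base name.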

Or wrap the transcription in a shell function:

Terminal window
function whisper() {
  ./main \
    --model models/ggml-large-v2.bin \
    --output-txt --output-vtt \
    --file "samples/${1}"
}
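Assuming the converted file from earlier sits at samples/output.wav, call the function with just the filename:

Terminal window
whisper output.wav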

Combine the previous three steps: downloading the file from YouTube, converting the audio, and transcribing with Whisper.

Terminal window
# Paste the video URL between the quotes before running
url="" && \
base_filename=$(yt-dlp --get-filename -o "samples/%(upload_date)s-%(id)s" "$url") && \
yt-dlp --extract-audio --audio-format wav -o "${base_filename}.%(ext)s" "$url" && \
output_filename="${base_filename}-16khz.wav" && \
ffmpeg -i "${base_filename}.wav" -ar 16000 "$output_filename" && \
rm "${base_filename}.wav" && \
./main --model models/ggml-large-v2.bin --output-txt --output-vtt --file "$output_filename"
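To run the chain, fill in the url assignment first, for example url="https://www.youtube.com/watch?v=zSnKSlZLY-A", the video used earlier in this post.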

Modify Transcript Output for Model

VTT to Timestamps

I have a markdown file with a transcript like so:

WEBVTT

00:00:00.000 --> 00:00:10.400
The following you are about to hear may seem strange and bizarre, but it is our intention

00:00:10.400 --> 00:00:17.720
to bring you only documented proof. The general public is unaware of the fact that for the

00:00:17.720 --> 00:00:23.280
past two hundred years, fires of revolution have been smoldering beneath the whole structure

I want to transform this so it:

  • just gives the hour, minute, and second digits for the starting time of each line and nothing else.
  • I’d also like the corresponding line of text to follow after the timestamp with a dash - in between.
  • I’d also like the empty lines in between to be removed so each timestamp is immediately followed by its line of text.

The output should look like this:

WEBVTT
00:00:00 - The following you are about to hear may seem strange and bizarre, but it is our intention
00:00:10 - to bring you only documented proof. The general public is unaware of the fact that for the
00:00:17 - past two hundred years, fires of revolution have been smoldering beneath the whole structure

Write a Node script to accomplish this. Have it take in a file called vtt.md, which will be left unmodified, and output a new file called timestamps.md. This script will:

  1. Read the contents of a file named vtt.md.
  2. Process the content according to your specifications.
  3. Write the processed content to a new file called timestamps.md.
const fs = require('fs');
const path = require('path');

// Function to transform the content correctly
function transformContent(content) {
  // Split the content by lines and initialize an array to hold the transformed lines
  const lines = content.split('\n');
  const transformedLines = [];
  // Iterate over the lines
  for (let i = 0; i < lines.length; i++) {
    // Check if the line contains a timestamp
    const timestampRegex = /^(\d{2}:\d{2}:\d{2})\.\d{3} -->/;
    const match = lines[i].match(timestampRegex);
    if (match) {
      // Extract and format the starting timestamp, then add the text from the next line
      const formattedLine = `${match[1]} - ${lines[i + 1]}`;
      transformedLines.push(formattedLine);
      i++; // Skip the next line since it's already included
    } else if (!lines[i].includes('-->') && lines[i].trim() !== '') {
      // Include lines that are not timestamps or empty, such as headers
      transformedLines.push(lines[i]);
    }
  }
  // Join the transformed lines back into a single string
  return transformedLines.join('\n');
}

// Specify the input and output file paths
const inputFile = path.join(__dirname, 'vtt.md');
const outputFile = path.join(__dirname, 'timestamps.md');

// Read the input file
fs.readFile(inputFile, 'utf8', (err, data) => {
  if (err) {
    console.error('Error reading the file:', err);
    return;
  }
  // Transform the content
  const transformedContent = transformContent(data);
  // Write the transformed content to the output file
  fs.writeFile(outputFile, transformedContent, 'utf8', (err) => {
    if (err) {
      console.error('Error writing the output file:', err);
    } else {
      console.log('File has been written successfully.');
    }
  });
});

This script reads the vtt.md file, formats each timestamp and its associated text, removes empty lines, and writes the result to timestamps.md. It extracts the starting timestamp (hour:minute:second) of each caption, drops the milliseconds and the end timestamp, and appends the text from the following line after a dash - separator.
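Save the script in the scripts folder as, say, vttToTimestamps.js (any name works), place vtt.md in the same folder, and run it with Node:

Terminal window
node scripts/vttToTimestamps.js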

Remove Hour Timestamp Prefix

.vtt files generated by Whisper include hour, minute, and second digits even if the audio file is less than an hour long. For transcripts shorter than an hour, there is no need for the timestamps to include the hour digits. So instead of:

00:00:00 - Smashing. Give everyone the best possible start to the day.
00:00:04 - See special packs for details.
00:00:07 - It's Smashing.
00:00:17 - In this episode of the Smashing Podcast, we're talking about RedwoodJS.
00:00:25 - What exactly does it mean to be a fullstack, Jamstack framework?

I would like to use a Node.js script to process the transcript and remove the 00: at the beginning of each line of text. The above excerpt would be processed and modified to instead look like:

00:00 - Smashing. Give everyone the best possible start to the day.
00:04 - See special packs for details.
00:07 - It's Smashing.
00:17 - In this episode of the Smashing Podcast, we're talking about RedwoodJS.
00:25 - What exactly does it mean to be a fullstack, Jamstack framework?

This Node.js script will take a markdown file called edit.md as input, leave that file unchanged, and create a new file called edited.md with the requested modifications. First, create a file called removeHours.js in the scripts folder.

Terminal window
echo > scripts/removeHours.js

In this file, the following instructions will be executed:

  1. Read the file: Use the fs module to read the contents of edit.md.
  2. Modify the content: Use a regular expression to find and replace the 00: prefix in each timestamp.
  3. Write to a new file: Save the modified content to edited.md.

Here’s a complete script that performs these tasks, save this code to the removeHours.js file:

scripts/removeHours.js
const fs = require('fs')
const path = require('path')

const originalFilePath = path.join(__dirname, 'edit.md') // Path to the original file
const newFilePath = path.join(__dirname, 'edited.md') // Path to the new file

// Read the original file
fs.readFile(originalFilePath, 'utf8', (err, data) => {
  if (err) {
    console.error('Error reading file:', err)
    return
  }
  // Modify the content by removing '00:' from the start of each line
  const modifiedData = data.replace(/^00:/gm, '')
  // Write the modified content to a new file
  fs.writeFile(newFilePath, modifiedData, 'utf8', (err) => {
    if (err) {
      console.error('Error writing to file:', err)
      return
    }
    console.log('File has been modified and saved as', newFilePath)
  })
})

This script reads edit.md, removes the 00: prefix from each timestamp, and saves the result to edited.md. To use this script, ensure you have a markdown file named edit.md in the same directory as the script and run the following terminal command:

Terminal window
node scripts/removeHours.js

Cut Transcript Lines in Half

I have a markdown file containing a transcript with timestamps like so:

00:00 - Smashing. Give everyone the best possible start to the day.
00:04 - See special packs for details.
00:07 - It's Smashing.
00:17 - In this episode of the Smashing Podcast, we're talking about RedwoodJS.
00:25 - What exactly does it mean to be a fullstack, Jamstack framework?
00:29 - We talked to community champion Anthony Campolo to find out.
00:32 - But first, did you know that Smashing Magazine publishes a brand new article to the website five days a week?
00:38 - That's a lot to keep up with, but we're here to help.

I want to cut the lines of text for each timestamp in half, so for every other line the timestamp would be deleted and the text would be shifted up to the timestamp right before it. For example, the previous excerpt would be modified to the following:

00:00 - Smashing. Give everyone the best possible start to the day. See special packs for details.
00:07 - It's Smashing. In this episode of the Smashing Podcast, we're talking about RedwoodJS.
00:25 - What exactly does it mean to be a fullstack, Jamstack framework? We talked to community champion Anthony Campolo to find out.
00:32 - But first, did you know that Smashing Magazine publishes a brand new article to the website five days a week? That's a lot to keep up with, but we're here to help.

I want to do this with a Node.js script that reads a file called edit.md, leaves it unmodified, and outputs the modified transcript to a new file called edited.md. The script will need to read the transcript lines, merge every other line as specified, and then write the modified transcript to a new file. Here’s how to create a Node.js script to perform this task:

scripts/halfTranscript.js
const fs = require('fs')
const path = require('path')

const originalFilePath = path.join(__dirname, 'edit.md')
const newFilePath = path.join(__dirname, 'edited.md')

fs.readFile(originalFilePath, 'utf8', (err, data) => {
  if (err) {
    console.error('Error reading file:', err)
    return
  }
  // Split the data into lines
  const lines = data.split('\n')
  const modifiedLines = []
  // Walk the lines, merging each pair of timestamped lines into one
  let i = 0
  while (i < lines.length) {
    // Match the MM:SS timestamp at the start of the line
    const match = lines[i].match(/^\d+:\d+/)
    if (match && i + 1 < lines.length) {
      const currentTimestamp = match[0]
      // Strip the timestamp and " - " separator from the current line
      const currentText = lines[i].substring(currentTimestamp.length + 3)
      // Strip everything up to and including the "- " on the next line
      const nextText = lines[i + 1].substring(lines[i + 1].indexOf('-') + 2)
      modifiedLines.push(`${currentTimestamp} - ${currentText} ${nextText}`)
      i += 2
    } else {
      // Keep a final unpaired line, or any line without a timestamp, as is
      modifiedLines.push(lines[i])
      i += 1
    }
  }
  // Join the modified lines back into a single string
  const modifiedData = modifiedLines.join('\n')
  // Write the modified data to the new file
  fs.writeFile(newFilePath, modifiedData, 'utf8', (err) => {
    if (err) {
      console.error('Error writing to file:', err)
      return
    }
    console.log('File has been modified and saved as', newFilePath)
  })
})

This script reads the original markdown file, processes the transcript according to your specifications by merging every other line (while preserving the timestamp of the first line in each pair), and then writes the modified transcript to a new file called edited.md.
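As with the previous script, place edit.md in the same folder as halfTranscript.js and run:

Terminal window
node scripts/halfTranscript.js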