I've been struggling to scrape the anime videos page [jkanime], specifically with extracting the mp4 video formats embedded in an iframe #document.
Despite trying to use cheerio for querying, I've only managed to retrieve src links from Facebook plugins instead of the desired mp4 sources within the iframe.
Running the following query in Chrome DevTools displays the mp4 src: $('#jkvideo_html5_api source')
The same query in cheerio, however, returns no results.
I've been attempting to extract the mp4 links for weeks without success. Any assistance would be greatly appreciated.
const getAnimeVideo = async (id: string, chapter: number) => {
  const res = await fetch(`${url}${id}/${chapter}/`);
  const body = await res.text();
  const $ = cheerio.load(body);
  const arr = [];
  $('iframe').each((index, element) => {
    const $element = $(element);
    const x = $element.attr('src');
    console.log(x);
    arr.push(x);
  });
  return arr;
}
Current Output
{
  "videos": [
    "https://www.facebook.com/plugins/like.php?href=https%3A%2F%2Fwww.facebook.com%2Fjkanimetv%2F&width=132&layout=box_count&action=like&size=large&show_faces=false&share=false&height=21&appId=149291901844100",
    "https://www.facebook.com/plugins/like.php?href=https://jkanime.net/tokyo-ghoul/1/&width=76&layout=box_count&action=like&size=small&show_faces=false&share=false&height=65&appId=149291901844100"
  ]
}
Desired Output
{
  "videos": [
    "https://storage.googleapis.com/markesito.appspot.com/blakkkk-88.mp4"
  ]
}
Update: 10:52 pm
Using puppeteer, I was able to access the iframe with the class "player_conte" and log the resulting frame object to the terminal.
The remaining problem is retrieving the link from _navigationURL so that I can query the video source with cheerio.
Updated Code
const getAnimeVideo = async (id: string, chapter: number) => {
  const BASE_URL = `${url}${id}/${chapter}/` // => https://jkanime.net/tokyo-ghoul/1/
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto(BASE_URL);
  const elementHandle = await page.$('.player_conte')
  const frame = await elementHandle.contentFrame();
  const $ = cheerio.load(`${frame}`);
  console.log(frame)
}
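
One direction that might work here, as a rough sketch rather than a confirmed fix: interpolating the frame into a template string only yields its string representation, not its HTML, so cheerio has nothing useful to parse. Puppeteer's Frame exposes url() and content(), which return the frame's URL and its serialized HTML. The sketch below assumes the player markup contains a <source> tag with a src attribute, as the DevTools query suggests. FrameLike, extractSourceUrls, and getFrameVideoSources are hypothetical names, and a plain regular expression stands in for cheerio so the snippet has no dependencies.

```typescript
// Minimal shape of the two Puppeteer Frame methods used here.
// A real Puppeteer Frame satisfies this interface structurally.
interface FrameLike {
  url(): string;
  content(): Promise<string>;
}

// Hypothetical helper: collect the src value of every <source> tag in an HTML string.
// A regex is used only to keep the sketch dependency-free; cheerio would do the same job.
function extractSourceUrls(html: string): string[] {
  const urls: string[] = [];
  const pattern = /<source\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Hypothetical helper: read the frame's own URL and HTML, then extract the video sources.
async function getFrameVideoSources(
  frame: FrameLike
): Promise<{ frameUrl: string; videos: string[] }> {
  const html = await frame.content(); // serialized HTML of the iframe document
  return { frameUrl: frame.url(), videos: extractSourceUrls(html) };
}
```

In the updated code this would be called as `await getFrameVideoSources(frame)` right after `contentFrame()`. The HTML from `frame.content()` could equally be passed to `cheerio.load` and queried with `$('#jkvideo_html5_api source')`. If the video element is injected by script after the page loads, the frame may need to be awaited first, for example with `frame.waitForSelector`; that is an assumption about this particular page.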