My Web Crawling Process:
I crawl websites by creating one promise per link from a list of links. Each promise acts as a crawler, and the crawlers run sequentially: with 10 links, I crawl the first link, wait for it to finish, then move on to the second link, and so on.
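For reference, the sequential process described above can be sketched as a plain `for...of` loop that awaits each crawl before starting the next. `fakeCrawl` and `crawlSequentially` are hypothetical names used only for illustration; `fakeCrawl` stands in for the real `crawlLink(link, page)` and just waits 10 ms while logging when each crawl starts and ends.

```typescript
// Event log so the order of crawl starts and finishes can be inspected.
const events: string[] = [];

// Hypothetical stand-in for the post's crawlLink(link, page).
async function fakeCrawl(link: string): Promise<string> {
  events.push(`start ${link}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  events.push(`end ${link}`);
  return `crawled ${link}`;
}

// Sequential crawl: the next link is only started after the previous one finishes.
async function crawlSequentially(links: string[]): Promise<string[]> {
  const results: string[] = [];
  for (const link of links) {
    results.push(await fakeCrawl(link));
  }
  return results;
}
```

Calling `crawlSequentially(["a", "b", "c"])` should leave `events` as `start a, end a, start b, end b, start c, end c`, because each `await` blocks the loop until the current crawl is done.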
What I'm Striving For:
My goal is to organize the promises into groups. The promises within a group run concurrently, but the groups run one after another. For example, with 10 links I would create 10 promises and split them into groups of at most 3 (that is, groups of 3, 3, 3 and 1). I would then crawl the first 3 (group 1), wait for all of them to finish, then crawl the 4th, 5th and 6th (group 2), and so on.
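The grouped pattern above can be sketched by splitting the links (not already-created promises) into groups and creating each group's promises only when that group is reached. This is a minimal sketch under assumptions: `fakeCrawl`, `chunk` and `crawlInGroups` are hypothetical names, and `fakeCrawl` stands in for the real crawl, waiting 10 ms while counting how many crawls are in flight so the group size can be checked.

```typescript
// Counters used only to observe how many hypothetical crawls overlap.
let inFlight = 0;
let maxInFlight = 0;

// Hypothetical crawl task; increments the counter synchronously when called.
async function fakeCrawl(link: string): Promise<string> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return `crawled ${link}`;
}

// Split any array into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Groups run one after another; links inside a group are crawled concurrently.
// A group's promises are created (and therefore started) only inside the loop.
async function crawlInGroups(links: string[], groupSize: number): Promise<string[]> {
  const results: string[] = [];
  for (const group of chunk(links, groupSize)) {
    const groupResults = await Promise.all(group.map((link) => fakeCrawl(link)));
    results.push(...groupResults);
  }
  return results;
}
```

With 10 links and a group size of 3, `maxInFlight` should never exceed 3, since the next `group.map(...)` only runs after `Promise.all` has resolved for the current group.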
My Attempt:
I wrote a function that splits an array of promises into groups:
```typescript
export function splitPromises<T>(promises: Promise<T>[], maxPerItem: number): Promise<T>[][] {
  const splitPromisesList: Promise<T>[][] = [];
  let currentSplit: Promise<T>[] = [];
  for (let i = 0; i < promises.length; i++) {
    currentSplit.push(promises[i]);
    if (currentSplit.length === maxPerItem || i === promises.length - 1) {
      splitPromisesList.push(currentSplit);
      currentSplit = [];
    }
  }
  return splitPromisesList;
}
```
Then I wrote a crawler function that uses this split and awaits each group in turn:
```typescript
async function crawler(links: string[], page: Page): Promise<MyData[]> {
  const list: MyData[] = [];
  const crawlPromises = links.map(async (link, index) => {
    try {
      const newPage = await page.browser().newPage();
      const detail = await crawlLink(link, newPage);
      await newPage.close();
      return detail;
    } catch (e) {
      console.log(e);
      return null as MyData;
    }
  });
  const groupedPromises = splitPromises<MyData>(crawlPromises, 3);
  let results: MyData[] = [];
  for (const group of groupedPromises) {
    results = await Promise.all(group);
    const filteredResults: MyData[] = results.filter((detail) => detail !== null) as MyData[];
    list.push(...filteredResults);
  }
  return list;
}
```
The Challenge:
However, all the promises run at the same time instead of group by group as intended.
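A likely cause, shown in a small sketch: calling an async function starts its work immediately, and `Promise.all` only waits for promises, it does not start them. So `links.map(async ...)` in `crawler` has already launched every crawl before `splitPromises` groups the resulting promises. The sketch uses a hypothetical `fakeCrawl` that counts how many times it has been started; storing functions that return promises (`() => Promise<string>`) instead of promises defers the work until each function is called.

```typescript
// Counts how many hypothetical crawl tasks have been started.
let started = 0;

async function fakeCrawl(link: string): Promise<string> {
  started++; // runs synchronously as soon as fakeCrawl is called
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `crawled ${link}`;
}

const links = ["a", "b", "c", "d", "e", "f"];

// Eager: map() calls fakeCrawl for every link right away, so all crawls
// are already running before anything is grouped or awaited.
const eager: Promise<string>[] = links.map((link) => fakeCrawl(link));
const startedAfterEager = started;
console.log(startedAfterEager); // 6

// Lazy: map() only builds functions; nothing starts until one is called.
started = 0;
const lazy: (() => Promise<string>)[] = links.map((link) => () => fakeCrawl(link));
const startedAfterLazy = started;
console.log(startedAfterLazy); // 0
```

With the lazy form, a group of functions can be started with something like `await Promise.all(group.map((run) => run()))`, so only that group's crawls are in flight at once.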