I am currently developing code to fetch data from various web resources using HTTP/HTTPS in a Node.js environment. My goal is to return the content as a string for text data and as a Buffer for binary data.
It is evident that any content whose MIME type starts with text/, such as text/html, should be treated as text data and returned as a string, using the appropriate character encoding if one is specified (e.g., text/html; charset=utf-8). Additionally, the presence of an explicit charset definition indicates that the content is text rather than binary, regardless of MIME type.
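To make the charset handling concrete, here is a minimal sketch of how the charset parameter might be pulled out of a Content-Type value. The helper name extractCharset is hypothetical (it is not part of my function below), and the sketch assumes the parameter may appear either bare or in double quotes.

```typescript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
// Returns the charset in lower case, or undefined when none is present.
function extractCharset(contentType: string): string | undefined {
  // Accepts both charset=utf-8 and charset="utf-8"; matching is case-insensitive.
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(contentType);
  return match ? match[2].toLowerCase() : undefined;
}

console.log(extractCharset("text/html; charset=utf-8"));              // "utf-8"
console.log(extractCharset('application/xml; charset="ISO-8859-1"')); // "iso-8859-1"
console.log(extractCharset("image/png"));                             // undefined
```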
Based on my analysis, most other content is binary. Audio and video formats are typically binary, as are most image types except for image/svg+xml. Generally speaking, most application/... types are binary, although there are exceptions such as application/json.
Does the following function effectively determine whether the content is binary? Are there any significant exceptions that I may have overlooked?
function isBinary(contentType: string): boolean {
  // exec() returns null when there is no match, so the type must allow it.
  let $: RegExpExecArray | null;

  // An explicit charset parameter implies text, whatever the MIME type.
  if (/;\s*charset\s*=/i.test(contentType))
    return false;

  // Remove anything other than MIME type.
  contentType = contentType.replace(/;.*$/, '').trim();

  if (/^text\//i.test(contentType) || /\+xml$/i.test(contentType))
    return false;
  else if (($ = /^application\/(.+)/i.exec(contentType)))
    return !/^(javascript|ecmascript|json|ld\+json|rtf)$/i.test($[1]);
  else
    return true;
}
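For context, this is roughly how I expect such a predicate to be used once the response body has been collected into a Buffer. It is only a sketch: toContent is a hypothetical wrapper, isBinaryFn stands in for a predicate like the one above, and defaulting to UTF-8 when no charset is given is my own assumption rather than something the header guarantees.

```typescript
import { TextDecoder } from "util";

// Hypothetical wrapper: returns the raw Buffer for binary content,
// or a decoded string for text content.
function toContent(
  body: Buffer,
  contentType: string,
  isBinaryFn: (contentType: string) => boolean,
  charset: string = "utf-8", // assumed default when the header names no charset
): string | Buffer {
  if (isBinaryFn(contentType)) {
    return body;
  }
  try {
    return new TextDecoder(charset).decode(body);
  } catch {
    // TextDecoder throws a RangeError for labels it does not recognize,
    // so fall back to UTF-8 instead of failing the whole request.
    return new TextDecoder("utf-8").decode(body);
  }
}

// Usage with stub predicates in place of the real one.
const text = toContent(Buffer.from("héllo", "utf8"), "text/plain", () => false);
const raw = toContent(Buffer.from([0x89, 0x50]), "image/png", () => true);
console.log(typeof text, Buffer.isBuffer(raw));
```

Passing the predicate in as a parameter keeps the decoding step independent of how binary content is detected, so the detection rules can change without touching this code.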