Update:
I have come up with what I believe to be the optimal final version in a single line:
var cells = (rows[i] + ',').split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/).slice(1).reduce((a, b) => (a.length > 0 && a[a.length - 1].length < 4) ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]] : [...a, [b]], []).map(e => e.reduce((a, b) => a !== undefined ? a : b, undefined))
Or, formatted for readability:
var cells = (rows[i] + ',')
    .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
    .slice(1)
    .reduce(
        (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
            ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
            : [...a, [b]],
        [],
    )
    .map(
        e => e.reduce(
            (a, b) => a !== undefined ? a : b, undefined,
        ),
    )
;
It is rather long, but it is still written in a functional style. Let me explain:
First, the regular expression. Each part you want falls into one of three cases:
 *?([^",]+?) *?,
matches a string that contains no double quote or comma, optionally surrounded by spaces and followed by a comma.
" *?(.+?)" *?,
matches a string enclosed in double quotes, with optional spaces after the opening quote and after the closing quote, followed by a comma.
( *?),
matches any number of spaces followed by a comma.
So splitting on a non-capturing group that wraps these three alternatives gives us what we want.
Remember that when you split with a regular expression, the resulting array contains:
- Strings separated by the separator (the regular expression)
- All the capturing groups identified in the separator
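As a quick illustration of that rule, here is a minimal sketch. The input string and the digit pattern are made up for the demonstration and are not part of the original problem:

```javascript
// With a capturing group in the separator, split() keeps the captured text.
const withGroup = 'a1b2c'.split(/(\d)/);
// Without a capturing group, only the pieces between separators are returned.
const withoutGroup = 'a1b2c'.split(/\d/);

console.log(withGroup);    // [ 'a', '1', 'b', '2', 'c' ]
console.log(withoutGroup); // [ 'a', 'b', 'c' ]
```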
In our case, the separators cover almost the whole string, so the separated strings are mostly empty, except for the last desired part, which has no trailing comma. The resulting array should therefore look like this:
- An empty string
- Three strings denoting the three matching capturing groups from the first delimiter
- An empty string
- Three strings signifying the three matching capturing groups from the second delimiter
- ...
- An empty string
- The concluding desired part standing independently
So, by appending a comma to the end, we get a perfectly regular pattern. That is why the code uses (rows[i] + ',').
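To make the shape of that array concrete, here is a small sketch that runs only the split step. The sample row is hypothetical, since the actual contents of rows[i] are not shown here:

```javascript
// Hypothetical sample row; it stands in for rows[i].
const row = 'a,"b, c",d';
const pattern = /(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/;

// Append a comma so that every cell, including the last, ends with a separator.
const raw = (row + ',').split(pattern);

console.log(raw.length);      // 13: one leading '' plus three groups of four
console.log(raw.slice(1, 5)); // [ 'a', undefined, undefined, '' ]
console.log(raw.slice(5, 9)); // [ undefined, 'b, c', undefined, '' ]
```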
The resulting array now consists of capturing groups separated by empty strings. After removing the first empty string, the elements come in groups of four: [ 1st capturing group, 2nd capturing group, 3rd capturing group, empty string ].
The reduce call arranges them into groups of four:
.reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
        ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
        : [...a, [b]],
    [],
)
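The grouping step can be tried on its own. The sketch below uses the same reducer with the group size turned into a parameter; chunk is a hypothetical helper name, and the numeric input is only for illustration:

```javascript
// Same reducer as in the answer, with the group size as a parameter.
// If the last group is not full yet, append to it; otherwise start a new group.
const chunk = (items, size) => items.reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < size)
        ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
        : [...a, [b]],
    [],
);

const groups = chunk([1, 2, 3, 4, 5, 6, 7, 8], 4);
console.log(groups); // [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ] ]
```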
Finally, taking the first non-undefined element of each group (an unmatched capturing group shows up as undefined) gives the parts we want:
.map(
    e => e.reduce(
        (a, b) => a !== undefined ? a : b, undefined,
    ),
)
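In isolation, the selection step behaves as sketched below. firstDefined is a hypothetical name for the inner reducer, and the sample groups are hand-written to mimic what the split produces:

```javascript
// Keep the accumulator once it holds a defined value; otherwise take the next element.
const firstDefined = group => group.reduce(
    (a, b) => a !== undefined ? a : b, undefined,
);

console.log(firstDefined(['a', undefined, undefined, '']));    // 'a'
console.log(firstDefined([undefined, 'b, c', undefined, ''])); // 'b, c'
console.log(firstDefined([undefined, undefined, '', '']));     // '' (an empty cell)
```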
That completes the solution.
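Putting the steps together, the whole pipeline can be wrapped in a function and tried on a couple of inputs. This is only a sketch: parseRow is a hypothetical name and the sample rows are invented for the test:

```javascript
// Hypothetical wrapper around the pipeline; takes one row instead of rows[i].
const parseRow = row => (row + ',')
    .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
    .slice(1)                                   // drop the leading empty string
    .reduce(                                    // arrange into groups of four
        (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
            ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
            : [...a, [b]],
        [],
    )
    .map(                                       // first defined element per group
        e => e.reduce((a, b) => a !== undefined ? a : b, undefined),
    );

console.log(parseRow('a,"b, c",d')); // [ 'a', 'b, c', 'd' ]
console.log(parseRow('a,,b'));       // [ 'a', '', 'b' ]
```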
That said, here are some shorter alternatives:
var cells = rows[i].split(/([^",]+?|".+?") *, */).filter(e => e)
Or, if the surrounding quotes should be stripped from the cells:
var cells = rows[i].split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e)
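For comparison, here is a sketch running both short versions on the same input. The sample row is hypothetical, and the variable names keepQuotes and stripQuotes are made up for the illustration:

```javascript
// Hypothetical sample row; it stands in for rows[i].
const sample = 'a,"b, c",d';

// One capturing group around both alternatives: quoted cells keep their quotes.
const keepQuotes = sample.split(/([^",]+?|".+?") *, */).filter(e => e);

// Separate capturing groups, with the quotes outside the group: quotes are dropped.
const stripQuotes = sample.split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e);

console.log(keepQuotes);  // [ 'a', '"b, c"', 'd' ]
console.log(stripQuotes); // [ 'a', 'b, c', 'd' ]
```

Note that filter(e => e) removes every falsy element, so empty strings are dropped along with undefined. These shorter versions therefore seem best suited to rows where every cell has content.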