Is there a way to split text on commas while ignoring any commas that are inside quotation marks?

I have a TypeScript file that splits each row of a CSV file using the code below:

var cells = rows[i].split(",");

However, I now need to modify this code so that commas inside quotes do not trigger a split. For instance, The,"quick, brown fox", jumped should be split into three cells (The, then quick, brown fox, then jumped) instead of being split between quick and brown fox. How can this be done?

Answer №1

Update:

I have come up with what I believe to be the optimal final version in a single line:

var cells = (rows[i] + ',').split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/).slice(1).reduce((a, b) => (a.length > 0 && a[a.length - 1].length < 4) ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]] : [...a, [b]], []).map(e => e.reduce((a, b) => a !== undefined ? a : b, undefined))

Or, formatted more readably:

var cells = (rows[i] + ',')
  .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
  .slice(1)
  .reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
      ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
      : [...a, [b]],
    [],
  )
  .map(
    e => e.reduce(
      (a, b) => a !== undefined ? a : b, undefined,
    ),
  )
;

It is rather long, but it is written in a purely functional style. Let me explain it:

First, the regular expression. Each cell you want falls into one of three cases:

  1. The pattern  *?([^",]+?) *?, (it begins with a space) matches a string that contains neither " nor , and may be surrounded by spaces, followed by a ,.
  2. The pattern " *?(.+?)" *?, matches a string enclosed in quotes, with any number of spaces after the opening quote and after the closing quote, followed by a ,.
  3. The pattern ( *?), matches any number of spaces followed by a ,.

Therefore, splitting on a non-capturing group that is the alternation of these three patterns gives us what we want.
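To see which of the three cases handles a given cell, here is a small sketch. It reuses the pattern from the answer but anchors it with ^ purely for illustration; the anchor and the names cellPattern and matchedGroup are my own assumptions, not part of the answer.

```typescript
// Same three alternatives as in the answer. The leading ^ is added for this
// demo only, so that each test string is matched from its first character.
const cellPattern = /^(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/;

// Hypothetical helper: returns the 1-based index of the capturing group that
// matched, or 0 when the pattern does not match at all.
function matchedGroup(text: string): number {
  const m = cellPattern.exec(text);
  if (m === null) return 0;
  return [1, 2, 3].find(i => m[i] !== undefined) ?? 0;
}

console.log(matchedGroup("The,"));                 // 1: plain cell
console.log(matchedGroup('"quick, brown fox",'));  // 2: quoted cell
console.log(matchedGroup(","));                    // 3: empty cell
```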

Keep in mind that when you split using a regular expression, the resulting array comprises:

  1. Strings separated by the separator (the regular expression)
  2. All the capturing groups identified in the separator
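A minimal illustration of that behaviour, using a toy separator of my own choosing (not the one from the answer): every match of the separator contributes one entry per capturing group, and a group that did not take part in the match shows up as undefined.

```typescript
// The separator has two capturing groups: one for "-" and one for "+".
// Each match adds both groups to the result, in order.
const toyParts: (string | undefined)[] = "a-b+c".split(/(-)|(\+)/);

console.log(toyParts);
// [ "a", "-", undefined, "b", undefined, "+", "c" ]
```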

In our case, the separator matches cover the entire string, so the strings between separators are all empty, except for the last wanted part, which has no trailing ,. Consequently, the resulting array looks like this:

  1. An empty string
  2. Three entries, one per capturing group of the first separator match (groups that did not match are undefined)
  3. An empty string
  4. Three entries, one per capturing group of the second separator match
  5. ...
  6. An empty string
  7. The last wanted part, on its own

So, by simply appending a , to the end of the row, we get a perfectly regular pattern. This is why there is (rows[i] + ',').

The resulting array now consists of capturing groups separated by empty strings. After removing the first empty string, they come in groups of four: [ 1st capturing group, 2nd capturing group, 3rd capturing group, empty string ].
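For the row from the question, the intermediate array can be inspected as follows. This is only a sketch to make the groups of four visible; the names sampleRow and rawPieces are mine. Note that, as far as I can tell, the leading space before jumped is kept, because the first alternative's character class also accepts spaces.

```typescript
const sampleRow = 'The,"quick, brown fox", jumped';

// Split exactly as in the answer, then drop the leading empty string.
const rawPieces: (string | undefined)[] = (sampleRow + ",")
  .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
  .slice(1);

console.log(rawPieces);
// [ "The",     undefined,          undefined, "",
//   undefined, "quick, brown fox", undefined, "",
//   " jumped", undefined,          undefined, "" ]
```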

The reduce call groups them into chunks of four:

  .reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
      ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
      : [...a, [b]],
    [],
  )
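If the nested spreads are hard to read, the same grouping can be written as a plain loop. The helper below is a hypothetical equivalent (the name chunk is my own), not code from the answer; it should produce the same chunks as the reduce call above.

```typescript
// Hypothetical helper: split an array into consecutive chunks of `size`
// elements (the last chunk may be shorter).
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

console.log(chunk(["The", undefined, undefined, "", undefined, "quick, brown fox", undefined, ""], 4));
// [ [ "The", undefined, undefined, "" ],
//   [ undefined, "quick, brown fox", undefined, "" ] ]
```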

Finally, taking the first non-undefined element of each chunk (an unmatched capturing group appears as undefined) gives exactly the parts you want:

  .map(
    e => e.reduce(
      (a, b) => a !== undefined ? a : b, undefined,
    ),
  )

That completes the solution.
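Putting the steps together, the whole pipeline can be wrapped in a function. This is a sketch under the assumption that rows are well formed (balanced quotes, no escaped quotes inside cells); the name splitCsvRow is hypothetical, and the loop replaces the reduce and map calls with an equivalent but more explicit form.

```typescript
// Hypothetical wrapper around the answer's approach.
function splitCsvRow(row: string): string[] {
  // Append "," so every cell ends with a separator, split, drop the leading "".
  const pieces: (string | undefined)[] = (row + ",")
    .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
    .slice(1);

  const cells: string[] = [];
  // Each cell occupies four entries: three capturing groups and an empty string.
  for (let i = 0; i < pieces.length; i += 4) {
    const firstMatched = pieces.slice(i, i + 3).find(g => g !== undefined);
    cells.push(firstMatched ?? "");
  }
  return cells;
}

console.log(splitCsvRow('The,"quick, brown fox", jumped'));
// [ "The", "quick, brown fox", " jumped" ]
console.log(splitCsvRow("a,,b"));
// [ "a", "", "b" ]
```

The leading space in " jumped" survives; calling trim() on each cell is one way to remove it if that matters for your data.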


With that said, I propose the following succinct alternatives:

var cells = rows[i].split(/([^",]+?|".+?") *, */).filter(e => e)

Alternatively, if the surrounding quotation marks are not needed in the result:

var cells = rows[i].split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e)
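A quick comparison of the two short variants on the row from the question (variable names are my own). It also shows one caveat worth knowing: neither pattern has a case for empty cells and filter(e => e) discards empty strings, so a row such as a,,b does not split cleanly.

```typescript
const shortRow = 'The,"quick, brown fox", jumped';

// Variant 1: one capturing group, so quoted cells keep their quotes.
const keepQuotes = shortRow.split(/([^",]+?|".+?") *, */).filter(e => e);

// Variant 2: separate groups for plain and quoted cells, so quotes are stripped.
const stripQuotes = shortRow.split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e);

console.log(keepQuotes);  // [ "The", "\"quick, brown fox\"", "jumped" ]
console.log(stripQuotes); // [ "The", "quick, brown fox", "jumped" ]

// Caveat: an empty cell is not recognised by the pattern.
const withEmptyCell = "a,,b".split(/([^",]+?|".+?") *, */).filter(e => e);
console.log(withEmptyCell); // [ "a", ",b" ]
```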
