Choosing a CSV parsing approach for the CSV Import feature in sao


  1. Papa Parse
    License: MIT
    Bower: :white_check_mark:
    I favor this alternative: it is heavily used by other projects such as Wikipedia, allows streaming large files, and can use web workers. A Stack Overflow answer even mentioned that it’s the fastest.

  2. CSV.js
    License: MIT
    Bower: :white_check_mark:
    Promises full RFC 4180 compliance. Doesn’t have an option to specify the encoding explicitly.

  3. Server-side parsing in Python
    I would like to avoid this option, as it would require sending the file to the server, so the time taken would depend on the internet speed. The user may experience more delay than when doing the same task in the GTK client.

I do not see why. The same amount of data is sent to the server whether it is CSV text or a JSON-encoded list of lists. Indeed, I think the raw CSV text will be a little bit smaller.
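As a rough illustration of the size argument, the sketch below serializes the same rows both ways with Python's standard `csv` and `json` modules. The sample rows and the `payload_sizes` helper are made up for this example; real payloads depend on the data and on how the client encodes the request.

```python
import csv
import io
import json


def payload_sizes(rows):
    """Return (csv_bytes, json_bytes) for the same list of rows."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    csv_bytes = len(buf.getvalue().encode("utf-8"))
    # Compact separators so JSON is not penalized for optional whitespace.
    json_text = json.dumps(rows, separators=(",", ":"))
    json_bytes = len(json_text.encode("utf-8"))
    return csv_bytes, json_bytes


# Hypothetical sample data, only for comparison purposes.
rows = [["name", "city"], ["Alice", "Paris"], ["Bob", "Liege"]]
csv_size, json_size = payload_sizes(rows)
print(csv_size, json_size)
```

For plain values like these, the JSON form carries extra brackets and quotes per value, so the CSV text comes out smaller, but the difference is small compared to the data itself.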

I don’t see in the docs that they support a custom quote char.

I do not see either that they support a custom quote char.

You are right, :disappointed_relieved: I didn’t look for that.

That’s right! And for auto-detect, should I write a minimal parser that reads the first row to get the fields?

But it doesn’t mean it is not a valid option. However, if one of those libraries is chosen, this feature cannot be implemented.
Another option would be to see whether such a feature could be added to those libraries.

I did not think about that. This makes server-side parsing less interesting if you need to implement a parser on the client side anyway. And uploading twice is not really a solution, unless it uploads only the first line.
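For what parsing only the first line could look like on the server side, here is a minimal sketch using Python's standard `csv` module. The `first_row` helper is hypothetical and not part of sao or Tryton; it only shows that the header can be read without parsing the rest of the text.

```python
import csv
import io


def first_row(text, delimiter=",", quotechar='"'):
    """Parse only the first CSV record of `text` and return its fields."""
    # newline="" leaves line endings untouched, as the csv module expects.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter=delimiter, quotechar=quotechar)
    # The reader is lazy: next() consumes just one record.
    return next(reader, [])


# Hypothetical sample: the second header contains the delimiter.
sample = 'name,"city, country"\nAlice,"Paris, France"\n'
header = first_row(sample)
print(header)
```

The same idea applies on the client: only the text up to the end of the first record needs to be parsed to list the fields.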

Could you tell me what the quote character actually does? IIUC it’s the text delimiter (as per the documentation), i.e. the character that encloses text values. I tried doing a few CSV exports in the GTK client, but the text was not enclosed within the custom quote character I specified.

It is used when the value contains the delimiter char.
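A small sketch with Python's standard `csv` module may make this concrete. The sample data, the `;` delimiter and the `'` quote character are chosen only for illustration, and the writer is assumed to use the module's default minimal quoting.

```python
import csv
import io

# Hypothetical sample: ';' is the delimiter and "'" the custom quote char.
raw = "name;note\nAlice;'likes a;b'\nBob;plain\n"

reader = csv.reader(io.StringIO(raw), delimiter=";", quotechar="'")
parsed = list(reader)
# The quoted value keeps its embedded ';' instead of being split in two.
print(parsed)

# Writing with the default QUOTE_MINIMAL: only values that need it are quoted.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", quotechar="'", lineterminator="\n")
writer.writerows(parsed)
written = out.getvalue()
print(written)
```

With minimal quoting, a value such as `plain` is written without quotes, so an export in which no value contains the delimiter shows no quote characters at all.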

Tried adding a custom quote char feature at prayashm/PapaParse demo

Nice! Maybe you should open an issue on the upstream project to discuss whether they want to add this functionality, along with other implementation details. That way we can get a little more feedback about the project and find out if they are interested in this feature. You can add the link here so we can follow the discussion.

There is already an open issue for PapaParse to allow a custom quote character.
I submitted a pull request.

  4. D3.js
    We already use it. Assumes files are RFC 4180-compliant. Can’t have a custom quote character. It seems to support only UTF-8 encoding.

It has been merged by the maintainer :slight_smile:

+1 for Papa Parse