Large CSV manipulation


I have a very large CSV file (over 100 million records) from which I would like to delete several columns. I have tried the application CSVed (http://csved.sjfrancke.nl/#csvuni), but it cannot open a file this size. Does anybody know what I would need to enter at the command line to delete specific columns? I am using Windows 7.

Below are the columns I currently have. I placed an "X" next to the columns I would like to remove.

  1. domainName
  2. registrarName - X
  3. contactEmail - X
  4. whoisServer - X
  5. nameServers - X
  6. createdDate - X
  7. updatedDate - X
  8. expiresDate - X
  9. standardRegCreatedDate - X
  10. standardRegUpdatedDate - X
  11. standardRegExpiresDate - X
  12. status - X
  13. Audit_auditUpdatedDate - X
  14. registrant_email
  15. registrant_name
  16. registrant_organization
  17. registrant_street1
  18. registrant_street2
  19. registrant_street3
  20. registrant_street4
  21. registrant_city
  22. registrant_state
  23. registrant_postalCode
  24. registrant_country
  25. registrant_fax - X
  26. registrant_faxExt - X
  27. registrant_telephone
  28. registrant_telephoneExt
  29. administrativeContact_email
  30. administrativeContact_name
  31. administrativeContact_organization
  32. administrativeContact_street1
  33. administrativeContact_street2
  34. administrativeContact_street3
  35. administrativeContact_street4
  36. administrativeContact_city
  37. administrativeContact_state
  38. administrativeContact_postalCode
  39. administrativeContact_country
  40. administrativeContact_fax - X
  41. administrativeContact_faxExt - X
  42. administrativeContact_telephone
  43. administrativeContact_telephoneExt

Answer by MC ND

What you need is called cut, and you can get it (for example) from GnuWin32, in the CoreUtils package.

Once you have it, keep only the columns that are not marked with an X (1, 14-24, 27-39 and 42-43):

cut -d , -f 1,14-24,27-39,42-43 fileInput.csv > fileOutput.csv
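Before running the command on 100 million records, it may be worth checking the field list on a tiny sample. The sketch below assumes a POSIX shell with GNU seq, cut and awk available (for example on Linux, Cygwin or MSYS rather than plain cmd.exe); the sample lines are hypothetical stand-ins, not data from the real file. It builds a line whose fields are simply their column numbers 1 to 43, applies the same field list, and confirms that 27 columns survive (43 minus the 16 marked with an X). It also shows one known limitation: cut splits on every comma and does not understand CSV quoting, so a quoted field such as "Acme, Inc." gets broken apart and every later column shifts.

```shell
#!/bin/sh
# Hypothetical stand-in for the 43-column header: fields are just 1..43.
sample=$(seq -s, 1 43)

# Apply the same field list as the answer's command.
kept=$(printf '%s\n' "$sample" | cut -d , -f 1,14-24,27-39,42-43)
echo "$kept"

# Count the surviving columns; 43 - 16 removed should leave 27.
printf '%s\n' "$kept" | awk -F, '{ print NF }'

# Limitation: cut treats the comma inside the quotes as a delimiter,
# so field 2 here is ' Inc."' rather than 'example.com'.
printf '%s\n' '"Acme, Inc.",example.com' | cut -d , -f 2
```

If the real data may contain quoted commas (organization names and street addresses are plausible candidates), a CSV-aware tool is the safer choice; if it does not, the plain cut command above streams the file line by line and never needs to load it into memory.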