Split string considering brackets - R

132 Views Asked by At

I would like to edit string in R while taking in consideration opening and closing brackets.

The character vector is a WKT reference of coordinate system like used in rgdal, sp, all of them using PROJ library.

The original string is structured with nested parameters with [ ]. The scheme can be sum up as follow :

WKT = COMPD_CS[ PROJCS[....] , VERT_CS[...] ]

WKT = "COMPD_CS[\"NAD83(2011) / California zone 2 (ftUS) + NAVD88 height - Geoid12B (ftUS)\",
PROJCS[\"NAD83(2011) / California zone 2 (ftUS)\",GEOGCS[\"NAD83(2011)\",DATUM[\"NAD83_National_Spatial_Reference_System_2011\",SPHEROID[\"GRS 1980\",6378137,298.257222101,AUTHORITY[\"EPSG\",\"7019\"]],AUTHORITY[\"EPSG\",\"1116\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"6318\"]],PROJECTION[\"Lambert_Conformal_Conic_2SP\"],PARAMETER[\"standard_parallel_1\",39.83333333333334],PARAMETER[\"standard_parallel_2\",38.33333333333334],PARAMETER[\"latitude_of_origin\",37.66666666666666],PARAMETER[\"central_meridian\",-122],PARAMETER[\"false_easting\",6561666.667],PARAMETER[\"false_northing\",1640416.667],UNIT[\"US survey foot\",0.3048006096012192,AUTHORITY[\"EPSG\",\"9003\"]],AXIS[\"X\",EAST],AXIS[\"Y\",NORTH],AUTHORITY[\"EPSG\",\"6418\"]],
VERT_CS[\"NAVD88 height - Geoid12B (ftUS)\",VERT_DATUM[\"North American Vertical Datum 1988\",2005,AUTHORITY[\"EPSG\",\"5103\"]],UNIT[\"US survey foot\",0.3048006096012192,AUTHORITY[\"EPSG\",\"9003\"]],AUTHORITY[\"EPSG\",\"6360\"]]]"

What I would like to get programmatically is the same string but without the characters between [ ] after `VERT_CS. The scheme, compared to the initial is the same, except that the VERT_CS[...] part has been removed:

WKT = COMPD_CS[ PROJCS[....] ]

   WKT2 = "COMPD_CS[\"NAD83(2011) / California zone 2 (ftUS) + NAVD88 height - Geoid12B (ftUS)\",
PROJCS[\"NAD83(2011) / California zone 2 (ftUS)\",GEOGCS[\"NAD83(2011)\",DATUM[\"NAD83_National_Spatial_Reference_System_2011\",SPHEROID[\"GRS 1980\",6378137,298.257222101,AUTHORITY[\"EPSG\",\"7019\"]],AUTHORITY[\"EPSG\",\"1116\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"6318\"]],PROJECTION[\"Lambert_Conformal_Conic_2SP\"],PARAMETER[\"standard_parallel_1\",39.83333333333334],PARAMETER[\"standard_parallel_2\",38.33333333333334],PARAMETER[\"latitude_of_origin\",37.66666666666666],PARAMETER[\"central_meridian\",-122],PARAMETER[\"false_easting\",6561666.667],PARAMETER[\"false_northing\",1640416.667],UNIT[\"US survey foot\",0.3048006096012192,AUTHORITY[\"EPSG\",\"9003\"]],AXIS[\"X\",EAST],AXIS[\"Y\",NORTH],AUTHORITY[\"EPSG\",\"6418\"]]]"

How can I do it programmatically with R?

1

There are 1 best solutions below

0
On BEST ANSWER

You can do this using gsub, specifying a regular expression that matches "VERT_CS" followed by square brackets containing any characters. You can choose to replace this with an empty VERT_CS[] or just removing it altogether.

To just empty the bracket contents you can do:

gsub("VERT_CS\\[.*?\\]+,", "VERT_CS[],", WKT)

To remove the entry completely you can do:

gsub("VERT_CS\\[.*?\\]+,", "", WKT)

Using the second version, we can see this completely removes VERT_CS[...]:

cat(WKT)
COMPD_CS["NAD83(2011) / California zone 2 (ftUS) + NAVD88 height - Geoid12B (ftUS)",
PROJCS["NAD83(2011) / California zone 2 (ftUS)",GEOGCS["NAD83(2011)",DATUM["NAD83_National_Spatial_Reference_System_2011",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","1116"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","6318"]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",39.83333333333334],PARAMETER["standard_parallel_2",38.33333333333334],PARAMETER["latitude_of_origin",37.66666666666666],PARAMETER["central_meridian",-122],PARAMETER["false_easting",6561666.667],PARAMETER["false_northing",1640416.667],UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],AXIS["X",EAST],AXIS["Y",NORTH],AUTHORITY["EPSG","6418"]],
VERT_CS["NAVD88 height - Geoid12B (ftUS)",VERT_DATUM["North American Vertical Datum 1988",2005,AUTHORITY["EPSG","5103"]],UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],AUTHORITY["EPSG","6360"]]]

versus

COMPD_CS["NAD83(2011) / California zone 2 (ftUS) + NAVD88 height - Geoid12B (ftUS)",
PROJCS["NAD83(2011) / California zone 2 (ftUS)",GEOGCS["NAD83(2011)",DATUM["NAD83_National_Spatial_Reference_System_2011",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","1116"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","6318"]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",39.83333333333334],PARAMETER["standard_parallel_2",38.33333333333334],PARAMETER["latitude_of_origin",37.66666666666666],PARAMETER["central_meridian",-122],PARAMETER["false_easting",6561666.667],PARAMETER["false_northing",1640416.667],UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],AXIS["X",EAST],AXIS["Y",NORTH],AUTHORITY["EPSG","6418"]],
UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],AUTHORITY["EPSG","6360"]]]