I have a PySpark DataFrame and I want to apply "Initcap" on a specific column. However, I want this transformation only on records whose value is not all capitals. For example ,in the sample dataset below, I don't want to apply "Initcap" on USA:
# Prepare Data
data = [(1, "Italy"), \
(2, "italy"), \
(3, "USA"), \
(4, "China"), \
(5, "china")
]
# Create DataFrame
columns= ["ID", "Country"]
df = spark.createDataFrame(data = data, schema = columns)
df.show(truncate=False)
The expected output will be:
| ID | Country |
|---|---|
| 1 | 'Italy' |
| 2 | 'Italy' |
| 3 | 'USA' |
| 4 | 'China' |
| 5 | 'China' |
