A little background info
By way of background, the Singer spec has taps and targets communicate with each other through SCHEMA and RECORD message types.
SCHEMA messages are sent from the tap to the target first, and they tell the target which tables (and columns) need to be created. This lets the target prepare the destination platform, if necessary, for the data that will arrive.
RECORD messages arrive after the SCHEMA message, and they contain the actual data.
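To make the message flow concrete, here is a minimal sketch of what a tap writes to stdout. It uses only the Python standard library rather than any Singer helper package. The `users` stream and its fields are hypothetical. The message keys (`type`, `stream`, `schema`, `key_properties`, `record`) follow the Singer spec.

```python
import json
import sys


def to_line(message: dict) -> str:
    """Serialize one Singer message as a single line of JSON."""
    return json.dumps(message) + "\n"


# Hypothetical "users" stream, used only for illustration.
schema_message = {
    "type": "SCHEMA",
    "stream": "users",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
    },
    "key_properties": ["id"],
}

record_message = {
    "type": "RECORD",
    "stream": "users",
    "record": {"id": 1, "name": "Ada", "email": "ada@example.com"},
}

# SCHEMA goes first so the target can prepare the table,
# then RECORD messages carry the actual rows.
messages = [schema_message, record_message]
for message in messages:
    sys.stdout.write(to_line(message))
```

A target reads these lines from stdin in order, which is why it already knows the full set of columns before the first row arrives.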
What's happening here
This symptom (columns being created even when the corresponding fields are deselected) occurs when SCHEMA messages are not filtered and are instead passed through raw from the source's data catalog. Ideally, SCHEMA messages should be filtered using the same selection logic that is applied to RECORD messages, but this is not always the case.
Because SCHEMA messages arrive before RECORD messages, the target goes ahead and creates a destination column for every field in the schema, including fields that will never receive data once the RECORD messages arrive.
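The mismatch can be reproduced with a toy target. In this sketch the tap filters its RECORD messages (so `email` never appears in the data) but sends an unfiltered SCHEMA. The `columns_from_schema` function is a hypothetical stand-in for a target's table-creation logic and is not taken from any real target.

```python
# Unfiltered SCHEMA: "email" is still present even though the user
# deselected it in the catalog (hypothetical stream and fields).
schema_message = {
    "type": "SCHEMA",
    "stream": "users",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
    },
    "key_properties": ["id"],
}

# RECORD messages *are* filtered, so "email" never shows up in the data.
record_messages = [
    {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}},
    {"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Grace"}},
]


def columns_from_schema(message: dict) -> list[str]:
    """Toy target logic: one destination column per schema property."""
    return list(message["schema"]["properties"])


columns = columns_from_schema(schema_message)
populated = {key for msg in record_messages for key in msg["record"]}
empty_columns = [c for c in columns if c not in populated]
print(empty_columns)  # ['email']
```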
How to fix it
The most direct fix is for the tap developer to add filtering logic to SCHEMA messages, just as they have for RECORD messages. Most tap maintainers will accept an issue or pull request on this topic. If the tap is built on Meltano's SDK, SCHEMA messages are filtered automatically along with RECORD messages, so other options are for the developer to port the tap to the SDK, or for the user to migrate to a variant of the tap that already uses the SDK.
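For a tap that is not on the SDK, the fix amounts to deriving one set of selected fields from the catalog metadata and applying it to both message types. The sketch below is a simplified illustration, not the Meltano SDK's implementation. It assumes the Singer catalog's breadcrumb metadata format (`inclusion`, `selected`, `selected-by-default`) and handles only top-level properties. The helper names (`selected_properties`, `filter_schema`, `filter_record`) are hypothetical.

```python
import copy

# Hypothetical catalog metadata for the "users" stream; "email" is deselected.
stream_metadata = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "name"], "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "email"], "metadata": {"inclusion": "available", "selected": False}},
]


def selected_properties(metadata: list[dict]) -> set[str]:
    """Simplified rule: 'automatic' fields are always kept, 'available'
    fields only when selected (falling back to 'selected-by-default')."""
    keep = set()
    for entry in metadata:
        crumb = entry["breadcrumb"]
        if len(crumb) != 2 or crumb[0] != "properties":
            continue  # skip stream-level and nested entries in this sketch
        md = entry["metadata"]
        inclusion = md.get("inclusion", "available")
        if inclusion == "automatic":
            keep.add(crumb[1])
        elif inclusion == "available" and md.get("selected", md.get("selected-by-default", False)):
            keep.add(crumb[1])
    return keep


def filter_schema(schema: dict, keep: set[str]) -> dict:
    """Return a copy of the JSON schema with only the kept top-level properties."""
    filtered = copy.deepcopy(schema)
    filtered["properties"] = {k: v for k, v in filtered["properties"].items() if k in keep}
    return filtered


def filter_record(record: dict, keep: set[str]) -> dict:
    """Apply the same selection to a record so SCHEMA and RECORD agree."""
    return {k: v for k, v in record.items() if k in keep}


full_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
}
keep = selected_properties(stream_metadata)
schema_message = {"type": "SCHEMA", "stream": "users",
                  "schema": filter_schema(full_schema, keep), "key_properties": ["id"]}
record_message = {"type": "RECORD", "stream": "users",
                  "record": filter_record({"id": 1, "name": "Ada", "email": "ada@example.com"}, keep)}
print(sorted(schema_message["schema"]["properties"]))  # ['id', 'name']
print(sorted(record_message["record"]))  # ['id', 'name']
```

The important design choice is that a single `keep` set drives both filters, so the target never sees a column in the SCHEMA that the RECORD messages will not populate. A real tap would also need to handle nested properties and any deselected fields listed in the schema's `required` array.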
Full disclosure: I work for Meltano, on Meltano's SDK for Singer Taps and Targets (https://sdk.meltano.com). I am also the author of several taps and targets.