Where are files from a tap kept in Meltano?


I have the following combination of tap/target in Meltano: tap-marketo and target-s3-parquet.

I want to extract data from tap-marketo from date A to date B in the past.

I saw that we can only define start_date and max_export_days.

I have tried to start with start_date A and stop the run once I reach B. But this does not work.

The loader only emits its state once its work is completely done, and the target was not called, so no load was performed.

I also saw in the logs that the export is being done:

{'run_id': '46ba5256-7019-48c7-890a-28746bb5272a', 'state_id': '2023-02-09T152428--tap-marketo--target-s3-parquet', 'stdio': 'stderr', 'cmd_type': 'extractor', 'name': 'tap-marketo', 'event': 'INFO GET: https://XXXXXXX/bulk/v1/activities/export/6636daf1-ad1e-41e1-b8d5-cdd31de5d4e0/file.json', 'level': 'info', 'timestamp': '2023-02-09T17:35:02.098016Z'}

But where do I find this file in my container?

I want to invoke the target separately, but for that I need a file to pass to --input.

# meltano invoke target-s3-parquet --help
Environment 'dev' is active
Usage: target-s3-parquet [OPTIONS]

  Execute the Singer target.

Options:
  --input FILENAME          A path to read messages from instead of from
                            standard in.
  --config TEXT             Configuration file location or 'ENV' to use
                            environment variables.
  --format [json|markdown]  Specify output style for --about
  --about                   Display package metadata and settings.
  --version                 Display the package version.
  --help                    Show this message and exit.

There is 1 best solution below


To invoke the tap and target separately:

meltano invoke tap-marketo > ./outfile.singer.jsonl
cat ./outfile.singer.jsonl | meltano invoke target-s3-parquet

Which is equivalent to:

meltano invoke tap-marketo > ./outfile.singer.jsonl
meltano invoke target-s3-parquet --input=./outfile.singer.jsonl

In both of the above cases, you can retry just the second step.
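As a rough sketch of that retry workflow, the two steps can be wrapped in small shell functions. The names TAP_CMD, TARGET_CMD, OUTFILE, extract and load are hypothetical, made up for this illustration and not part of Meltano. The defaults simply reuse the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper names; only the default commands come from the answer above.
TAP_CMD=${TAP_CMD:-"meltano invoke tap-marketo"}
TARGET_CMD=${TARGET_CMD:-"meltano invoke target-s3-parquet"}
OUTFILE=${OUTFILE:-./outfile.singer.jsonl}

# Step 1: run the tap and save its Singer messages.
# Writes to a temporary file first so a failed run does not leave a partial OUTFILE.
extract() {
    $TAP_CMD > "$OUTFILE.tmp" && mv "$OUTFILE.tmp" "$OUTFILE"
}

# Step 2: feed the saved messages to the target.
# Can be repeated without re-running the tap.
load() {
    $TARGET_CMD < "$OUTFILE"
}
```

After sourcing this, run `extract` once, then `load` as many times as needed. If `load` fails, only `load` has to be repeated.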

However, if you invoke both together, using meltano run tap-marketo target-s3-parquet or similar, the intermediate file will not be stored on disk, and you would not be able to replay just the target-side processing.
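If you want a single pass but still want a copy of the stream for later replay, one option is to pipe through `tee`. This is only a sketch: run_with_copy is a hypothetical helper name, and as far as I know a plain `meltano invoke` pipeline does not do the state bookkeeping that `meltano run` does, so treat it as a debugging aid rather than a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical helper: pipe the tap into the target while keeping a copy of the stream.
# Arguments: $1 = tap command, $2 = target command, $3 = path for the saved copy.
run_with_copy() {
    $1 | tee "$3" | $2
}

# Example with the plugins from the question (uncomment to run):
# run_with_copy "meltano invoke tap-marketo" "meltano invoke target-s3-parquet" ./outfile.singer.jsonl
```

The saved file can then be replayed with `meltano invoke target-s3-parquet --input=./outfile.singer.jsonl` as shown earlier. Note that the exit status of the pipeline is that of the target, so a tap failure may go unnoticed unless you check the saved file.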

Why these files aren't stored on disk by default

The stream of messages you'll see in the examples above will necessarily contain potentially secret or confidential data, and the volume contained within the stream can be extremely large, since it contains the records themselves as well as metadata used for coordinating between the tap and target. For this reason, this stream of messages from tap to target is not stored to disk during a normal sync operation.