Matching controls with time-dependent covariates to treated cases with varying treatment time without replacement


I want to estimate the effect of treatment X on variable Y by matching the treatment and control groups for covariate balance, using R and the MatchIt package.

I'm compiling a retrospective cohort, and the treatment time varies across the treated cases. Moreover, I have multiple covariates (COV_A, COV_B...) that depend on the treatment time. I use a large database to mine controls and query the dependent covariates for a given treatment time. This is a large sample with thousands of treated cases, tens of thousands of potential controls, and many covariates.

To achieve this, I used an SQL query to manually perform an "exact match" on some of the covariates as a kind of "initial matching" (for example, checking which controls have been monitored long enough to be treated at a given time). This initial step resulted in a table with multiple rows of potential control cases for each treated case (TREAT_ID). For each row of potential controls, I mined the time-dependent covariates as of the treated case's treatment time.

The result is a table of potential controls that are stratified for each treatment case. This means that a control case can appear more than once with a different or the same treatment time, and the covariates change accordingly.

My intention is to use the matchit function to perform distance matching within each stratum, using method = "nearest" and exact = "TREAT_ID", for example.

Simplified Example Table

CONTROL_ID TREAT_ID TREATMENT_TIME COV_A COV_B
C-1 T-1 1.5 0.6 185
C-2 T-1 1.5 0.7 123
C-3 T-1 1.5 0.8 182
C-4 T-1 1.5 0.6 185
C-1 T-2 2.2 0.9 160
C-2 T-2 2.2 1.4 150
C-5 T-2 2.2 0.9 48
C-6 T-2 2.2 3.3 113

* Notice that controls C-1 and C-2 appear twice...

The Question:

I want to do matching "without replacement" (each control unit is matched to only one treated unit). How can I achieve this if the initial table contains duplicates of the same control cases (some of which have different covariate values)?

I also want to be able to:

  • control the order of matching, beginning with the smallest stratum and moving on from there
  • do this with a 1:k matching ratio as well

(Maybe my whole approach to the problem is wrong; I'll also be happy to hear different solutions...)

Best Answer

TL;DR: I used @Noah's suggestion and the unit.id argument.

Full solution

I combined the treated cases with the stratified control cases from the example in the question and added the MATCHING_STRATA and MATCHING_CASE columns:

ID MATCHING_STRATA MATCHING_CASE TREATMENT_TIME COV_A COV_B
T-1 T-1 TREATED 1.5 1.2 112
C-1 T-1 CONTROL 1.5 0.6 185
C-2 T-1 CONTROL 1.5 0.7 123
C-3 T-1 CONTROL 1.5 0.8 182
C-4 T-1 CONTROL 1.5 0.6 185
T-2 T-2 TREATED 2.2 1.6 140
C-1 T-2 CONTROL 2.2 0.9 160
C-2 T-2 CONTROL 2.2 1.4 150
C-5 T-2 CONTROL 2.2 0.9 48
C-6 T-2 CONTROL 2.2 3.3 113
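
One way to build this stacked table in base R is sketched below. The `controls` and `treated` data frames are typed in by hand from the example rows and stand in for the SQL query results; their names, and the intermediate `controls_long` and `treated_long`, are assumptions made for illustration only.

```r
# Candidate controls, one row per (control, stratum) pair, as in the question's table.
controls <- data.frame(
  CONTROL_ID     = c("C-1", "C-2", "C-3", "C-4", "C-1", "C-2", "C-5", "C-6"),
  TREAT_ID       = c("T-1", "T-1", "T-1", "T-1", "T-2", "T-2", "T-2", "T-2"),
  TREATMENT_TIME = c(1.5, 1.5, 1.5, 1.5, 2.2, 2.2, 2.2, 2.2),
  COV_A          = c(0.6, 0.7, 0.8, 0.6, 0.9, 1.4, 0.9, 3.3),
  COV_B          = c(185, 123, 182, 185, 160, 150, 48, 113)
)

# Treated cases, with covariates measured at their own treatment time.
treated <- data.frame(
  TREAT_ID       = c("T-1", "T-2"),
  TREATMENT_TIME = c(1.5, 2.2),
  COV_A          = c(1.2, 1.6),
  COV_B          = c(112, 140)
)

# Give both tables the same columns, then stack them.
controls_long <- data.frame(
  ID              = controls$CONTROL_ID,
  MATCHING_STRATA = controls$TREAT_ID,
  MATCHING_CASE   = "CONTROL",
  controls[c("TREATMENT_TIME", "COV_A", "COV_B")]
)
treated_long <- data.frame(
  ID              = treated$TREAT_ID,
  MATCHING_STRATA = treated$TREAT_ID,
  MATCHING_CASE   = "TREATED",
  treated[c("TREATMENT_TIME", "COV_A", "COV_B")]
)
df <- rbind(treated_long, controls_long)

# Sort so each treated case sits directly above its candidate controls.
df <- df[order(df$MATCHING_STRATA, df$MATCHING_CASE != "TREATED"), ]
rownames(df) <- NULL
```

Note that the same control ID (for example C-1) is kept once per stratum, each time with the covariate values that belong to that stratum's treatment time.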

I then called the matchit function with exact="MATCHING_STRATA" to match within each stratum individually, and unit.id="ID" so that a control is not reused across strata:

MatchIt::matchit(MATCHING_CASE ~ COV_A + COV_B,
                 data = df,
                 method = "nearest",
                 exact = "MATCHING_STRATA",  # match only within each stratum
                 unit.id = "ID",             # do not reuse a control ID across strata
                 replace = FALSE)
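
To confirm that no control was matched in more than one stratum, a small base R check along these lines may help. `reused_controls` is a hypothetical helper, not part of MatchIt, and the `matched` table below is a hand-built stand-in with made-up pairings, not real matching output.

```r
# Hypothetical helper: return the IDs of controls that appear more than once
# in a matched data set, i.e. controls that were reused across strata.
reused_controls <- function(matched, id = "ID", case = "MATCHING_CASE") {
  ctrl_ids <- matched[[id]][matched[[case]] == "CONTROL"]
  unique(ctrl_ids[duplicated(ctrl_ids)])
}

# Hand-built stand-in for a matched data set (illustrative pairings only).
matched <- data.frame(
  ID              = c("T-1", "C-2", "T-2", "C-1"),
  MATCHING_STRATA = c("T-1", "T-1", "T-2", "T-2"),
  MATCHING_CASE   = c("TREATED", "CONTROL", "TREATED", "CONTROL")
)

reused <- reused_controls(matched)
length(reused) == 0  # TRUE when every control is used at most once
```

With a real fit, `matched` would be the data frame returned by `MatchIt::match.data()` when it is called on the object that `matchit()` returns.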