Converting .txt file to load into dataframe

520 Views Asked by At

I have a text file (.txt) something like this:

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-12-10T02:14:50", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
.
.
.
.
.
.
.

How to load it into a dataframe?

2

There are 2 best solutions below

0
Tyler On BEST ANSWER

Every line looks to be a separate json object. Using just Python:

  • read every line of the file to a string
  • convert each line from json to its Python dictionary equivalent
  • append those dicts to a list
  • convert the list of dicts to a Pandas DataFrame
import pandas as pd
import json

with open('data.json') as f:
    lines = f.readlines()

data = []
for line in lines:
    data.append(json.loads(line))

df = pd.DataFrame(data)

df

Looks like

accountNumber   customerId  creditLimit availableMoney  transactionDateTime transactionAmount   merchantName    acqCountry  merchantCountryCode posEntryMode    posConditionCode    merchantCategoryCode    currentExpDate  accountOpenDate dateOfLastAddressChange cardCVV enteredCVV  cardLast4Digits transactionType echoBuffer  currentBalance  merchantCity    merchantState   merchantZip cardPresent posOnPremises   recurringAuthInd    expirationDateKeyInMatch    isFraud
0   737265056   737265056   5000.0  5000.0  2016-08-13T14:27:32 98.55   Uber    US  US  02  01  rideshare   06/2023 2015-03-14  2015-03-14  414 414 1803    PURCHASE        0.0             False           False   False
1   737265056   737265056   5000.0  5000.0  2016-10-11T05:05:54 74.51   AMC #191138 US  US  09  01  entertainment   02/2024 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             True            False   False
2   737265056   737265056   5000.0  5000.0  2016-11-08T09:18:39 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False
3   737265056   737265056   5000.0  5000.0  2016-12-10T02:14:50 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False

If the file were just one json object rather than a new json object every line, you could just use pandas.read_json(file_path)

0
Shreya Adhvaryyu On

@Abhishek Mishra, just put 'transactions.txt' instead of 'data.json'. it worked like a magic. I have been struggling with the same task.

Thanks @Tyler