Finding string or empty column in pandas

44 Views Asked by thzCU At 15 August 2023 at 15:04

This is to create a general use case of cleaning up tabular Data from the SEC EDGAR database. Considering this below table, I need to remove columns that may contain '[any_int]'

Column A	Column B	Column C	Column D
val	val	nan	nan
val	val	nan	[1]
val	val	nan	nan

The assumption here is that column names are unknown, as each company in the Database will have unique table structures.

cols_to_drop = df.columns[df.columns.str.contains('\[')]

Using the string.contains() method yields no results, though I was expecting it to assign Column D

Original Q&A

There are 1 best solutions below

Navkar Jain On 15 August 2023 at 15:13

A simple approach to get desired output.

import pandas as pd
import numpy as np

# Sample data (replace this with your actual data)
data = {
    'Column A': ['val', 'val', 'val'],
    'Column B': ['val', 'val', 'val'],
    'Column C': [np.nan, np.nan, np.nan],
    'Column D': [np.nan, '[1]', np.nan]
}

# Convert the data dictionary into a DataFrame
df = pd.DataFrame(data)

# Regular expression pattern to match any integer value inside square brackets
pattern = r'\[\d+\]'

# Identify columns containing the specified pattern
columns_to_drop = [col for col in df.columns if any(df[col].apply(lambda x: re.search(pattern, str(x))))]
df_cleaned = df.drop(columns=columns_to_drop)

print("Original DataFrame:")
print(df)
print("\nCleaned DataFrame:")
print(df_cleaned)

Finding string or empty column in pandas

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in DATAFRAME

Related Questions in EDGAR

Trending Questions

Popular # Hahtags

Popular Questions