How to separate Hijri (Arabic) and Gregorian date ranges from one column into separate columns


This is a sample of the data:

Occasion Date Range
EVENT 1 2 / 1 / 1445 هـ - 17 / 6 / 1445 هـ - 20 / 7 / 2023 - 30 / 12 / 2023 م
EVENT 2 13 \ 1 \ 1445 هـ‍ - 16 \ 5 \ 1445 هـ‍ - 31 \ 7 \ 2023 م - 30 \ 11 \ 2023 م
EVENT 3 1445/4/11-1445/3/30 هـ - 15-2023/10/26  م

As you can see, the patterns differ depending on how long the event lasts: an event that lasts a few months looks like the first two examples, and an event that lasts a few days looks like the last example. These two patterns are only the ones I found manually, so ideally I would detect any other patterns in the column first and then separate the dates. Any suggestions?
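One possible way to survey the patterns before splitting anything is to reduce each string to a coarse "shape" and count the distinct shapes. The sketch below is only an illustration under some assumptions: `date_shape` is a hypothetical helper, digit runs of three or more characters are assumed to be years, and the letters heh (with optional tatweel) and meem are assumed to be the Hijri and Gregorian markers. The Arabic characters are written as `\u` escapes to avoid bidirectional rendering problems in the code.

```python
import re
from collections import Counter

# Invisible formatting characters (zero-width joiners, direction marks).
INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2066-\u2069]")

def date_shape(text):
    """Reduce a date-range string to a coarse shape such as 'D/D/YH-D/D/YH'."""
    s = INVISIBLE.sub("", str(text))
    s = s.replace("\\", "/")              # treat "\" and "/" as the same separator
    s = re.sub(r"\u0647\u0640?", "H", s)  # assumed Hijri marker: heh + optional tatweel
    s = s.replace("\u0645", "G")          # assumed Gregorian marker: meem
    # Assumption: long digit runs are years (Y), short ones are days or months (D).
    s = re.sub(r"\d+", lambda m: "Y" if len(m.group()) >= 3 else "D", s)
    return re.sub(r"\s+", "", s)          # whitespace carries no information here

# The three sample rows from the question, with Arabic letters as escapes.
samples = [
    "2 / 1 / 1445 \u0647\u0640 - 17 / 6 / 1445 \u0647\u0640 - 20 / 7 / 2023 - 30 / 12 / 2023 \u0645",
    "13 \\ 1 \\ 1445 \u0647\u0640\u200d - 16 \\ 5 \\ 1445 \u0647\u0640\u200d - 31 \\ 7 \\ 2023 \u0645 - 30 \\ 11 \\ 2023 \u0645",
    "1445/4/11-1445/3/30 \u0647\u0640 - 15-2023/10/26  \u0645",
]

shape_counts = Counter(date_shape(s) for s in samples)
for shape, count in shape_counts.most_common():
    print(count, shape)
```

On the real data, the same helper could be applied with `df['Date Range'].map(date_shape).value_counts()`, which lists every distinct shape in the column along with how often it occurs.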

I tried this to extract the different patterns in the Date Range column:

import pandas as pd
import re

# Load the Excel file
file_path = 'Local Festivals.xlsx'
df = pd.read_excel(file_path)

date_range_column = 'Date Range'

# Extract unique date patterns from the column
unique_date_patterns = set()

# Skip empty cells and force text so re.search never receives NaN
for date_range in df[date_range_column].dropna().astype(str):
    # Use a regular expression to extract date patterns
    date_pattern = re.search(r'(\d+ \/ \d+ \/ \d+ \s?[هـم]\s?-?[^0-9]*\s?\d+ \/ \d+ \/ \d+ \s?[مهـ]\s?)', date_range)
    if date_pattern:
        unique_date_patterns.add(date_pattern.group(0))

# Print each unique date pattern, then the total number found
for i, pattern in enumerate(unique_date_patterns, start=1):
    print(f"Pattern {i}: {pattern}")

print(f"Total unique date patterns found: {len(unique_date_patterns)}")

But it did not match all the date ranges in the data.
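A more tolerant alternative to one large regular expression might be to pull out every complete date on its own and sort it into a calendar by its year. The sketch below is only an illustration, not a tested solution for the full file: `split_calendars` and its `hijri_max_year` cutoff are hypothetical, the month is assumed to be the middle number, and the year is assumed to be whichever end is greater than 31. Partial dates such as the lone `15` in EVENT 3 are not captured.

```python
import re

# Invisible formatting characters (zero-width joiners, direction marks).
INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2066-\u2069]")
# Three numbers separated by "/" or "\", with optional spaces around the separators.
FULL_DATE = re.compile(r"(\d+)\s*[/\\]\s*(\d+)\s*[/\\]\s*(\d+)")

def split_calendars(text, hijri_max_year=1600):
    """Return (hijri_dates, gregorian_dates) as lists of 'YYYY-MM-DD' strings."""
    s = INVISIBLE.sub("", str(text))
    hijri, gregorian = [], []
    for a, b, c in FULL_DATE.findall(s):
        a, b, c = int(a), int(b), int(c)
        # Assumption: the month is the middle number; a first number above 31
        # means year/month/day order, otherwise day/month/year order.
        if a > 31:
            year, month, day = a, b, c
        else:
            day, month, year = a, b, c
        # Assumption: years up to hijri_max_year are Hijri, later ones Gregorian.
        target = hijri if year <= hijri_max_year else gregorian
        target.append(f"{year:04d}-{month:02d}-{day:02d}")
    return hijri, gregorian

event_1 = "2 / 1 / 1445 \u0647\u0640 - 17 / 6 / 1445 \u0647\u0640 - 20 / 7 / 2023 - 30 / 12 / 2023 \u0645"
hijri_dates, gregorian_dates = split_calendars(event_1)
print(hijri_dates)      # ['1445-01-02', '1445-06-17']
print(gregorian_dates)  # ['2023-07-20', '2023-12-30']
```

Because the year decides the calendar, the position of the Arabic markers no longer matters. Rows where either list does not contain exactly two dates (EVENT 3 yields only one complete Gregorian date) could be flagged for manual review instead of being split automatically. The Hijri values are kept as text, since they are not Gregorian timestamps.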
