Break/wrap long text of column names in Pandas dataframe plain text to_string output?

99 Views Asked by At

Consider this example:

import pandas as pd

df = pd.DataFrame({
  "LIDSA": [0, 1, 2, 3],
  "CAE": [3, 5, 7, 9],
  "FILA": [1, 2, 3, 4], # 2 is default, so table idx 1 is default
  "VUAMA": [0.5, 1.0, 1.5, 2.0],
})
df_colnames = { # https://stackoverflow.com/q/48243818
  "LIDSA": "Lorem ipsum dolor sit amet",
  "CAE": "Consectetur adipiscing elit",
  "FILA": "Fusce imperdiet libero arcu",
  "VUAMA": "Vitae ultricies augue molestie ac",
}

# "Pandas autodetects the size of your terminal window if you set pd.options.display.width = 0" https://stackoverflow.com/q/11707586
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', 0, 'max_colwidth', 20, 'display.float_format', "{:.2f}".format):
  df_str = df.rename(df_colnames,axis=1).to_string()

print(df_str)

This results with the terminal stdout printout, at the time 111 characters wide:

   Lorem ipsum dolor sit amet  Consectetur adipiscing elit  Fusce imperdiet libero arcu  Vitae ultricies augue
 molestie ac
0                           0                            3                            1
        0.50
1                           1                            5                            2
        1.00
2                           2                            7                            3
        1.50
3                           3                            9                            4
        2.00

So, only the last column got line-broken (and correspondingly, the values for it). I would have preferred that each long column name gets line-broken / word-wrapped at say 20 characters, and then the values output correspondingly, something like:

   Lorem ipsum dolor      Consectetur  Fusce imperdiet    Vitae ultricies
            sit amet  adipiscing elit      libero arcu  augue molestie ac
0                  0                3                1               0.50
1                  1                5                2               1.00
2                  2                7                3               1.50
3                  3                9                4               2.00

I thought 'max_colwidth', 20 would do that, but apparently it doesn't.

I even tried adding explicit linebreaks in the long column names, but they just get rendered as \n, and the column name is still in one line (as noted also in Linebreaks in pandas column names)

So, is it possible to "word-wrap"/"line break" long column names in Pandas for plain text string output?

2

There are 2 best solutions below

0
mozway On BEST ANSWER

You could use textwrap.wrap and tabulate:

#  pip install tabulate
from textwrap import wrap
from tabulate import tabulate

df_colnames_wrap = {k: '\n'.join(wrap(v, 20))
                    for k,v in df_colnames.items()}

print(tabulate(df.rename(columns=df_colnames_wrap),
               headers='keys', tablefmt='plain'))

Output:

      Lorem ipsum dolor        Consectetur    Fusce imperdiet      Vitae ultricies
               sit amet    adipiscing elit        libero arcu    augue molestie ac
 0                    0                  3                  1                  0.5
 1                    1                  5                  2                  1
 2                    2                  7                  3                  1.5
 3                    3                  9                  4                  2

With float formatting:

print(tabulate(df.rename(columns=df_colnames_wrap)
                 .convert_dtypes(),
               headers='keys', tablefmt='plain',
               floatfmt='.2f'
              ))

Output:

      Lorem ipsum dolor        Consectetur    Fusce imperdiet      Vitae ultricies
               sit amet    adipiscing elit        libero arcu    augue molestie ac
 0                    0                  3                  1                 0.50
 1                    1                  5                  2                 1.00
 2                    2                  7                  3                 1.50
 3                    3                  9                  4                 2.00
4
user3369545 On

Pandas doesn't offer a built-in way to automatically wrap or break long column names across lines when you're turning a DataFrame into a string. The setting called max_colwidth only affects the data inside the table, not the column titles themselves. If you've tried adding your own line breaks in column names, you've noticed they don't actually change how the title is displayed; instead, you see "\n" characters in your output, which isn't what you want.

To get your column names to wrap onto multiple lines, you'll have to get a bit creative and do it yourself. You'll need to:

Write a function that can take a long column name and break it up into smaller parts, each part being short enough (for example, no more than 20 characters) to fit on its own line. Use this function to process all your column names, then adjust how your DataFrame is displayed so that these multi-line names look right. This method involves manually changing the column names to include line breaks where you want them and then making sure the DataFrame's string representation (when you print it out) respects these breaks. This is more about preparing your data and display settings before you actually print or show your DataFrame.

import pandas as pd

# Original DataFrame
df = pd.DataFrame({
    "LIDSA": [0, 1, 2, 3],
    "CAE": [3, 5, 7, 9],
    "FILA": [1, 2, 3, 4],
    "VUAMA": [0.5, 1.0, 1.5, 2.0],
})

# Dictionary with long column names
df_colnames = {
    "LIDSA": "Lorem ipsum dolor sit amet",
    "CAE": "Consectetur adipiscing elit",
    "FILA": "Fusce imperdiet libero arcu",
    "VUAMA": "Vitae ultricies augue molestie ac",
}

# Custom function to word-wrap text
def word_wrap(text, max_width):
    """
    Word-wrap text at a specified width. Attempts to break lines at word boundaries
    where possible.
    """
    words = text.split()
    lines = []
    current_line = []
    current_length = 0

    for word in words:
        if current_length + len(word) <= max_width:
            current_line.append(word)
            current_length += len(word) + 1  # +1 for space
        else:
            lines.append(' '.join(current_line))
            current_line = [word]
            current_length = len(word) + 1
    lines.append(' '.join(current_line))  # Add the last line

    return '\n'.join(lines)

# Apply word-wrap to column names
wrapped_colnames = {col: word_wrap(name, 20) for col, name in df_colnames.items()}

# Rename DataFrame columns
df = df.rename(columns=wrapped_colnames)

# Print the DataFrame with modified display settings
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', 0, 'max_colwidth', 20, 'display.float_format', "{:.2f}".format):
    print(df.to_string())