Extracting significand and exponent for base-10 representation from decimal formatted string


I am looking for an efficient Python implementation of a function that takes a decimal formatted string, e.g.

2.05000
200
0.012

and returns a tuple of two integers representing the significand and exponent of the input in base-10 floating point format, e.g.

(205,-2)
(2,2)
(12,-3)

List comprehension would be a nice bonus.

I have a gut feeling that there exists an efficient (and possibly Pythonic) way of doing this but it eludes me...


Solution applied to pandas

import pandas as pd
import numpy as np
ser1 = pd.Series(['2.05000', '- 2.05000', '00 205', '-205', '-0', '-0.0', '0.00205', '0', np.nan])

# remove all white spaces
ser1 = ser1.str.replace(' ', '')
parts = ser1.str.split('.').apply(pd.Series)

# strip leading zeros (even those after a minus sign)
parts[0] = parts[0].str.replace(r'^(-?)0+', r'\1', regex=True)

parts[1] = parts[1].fillna('')                  # fill non-existent decimal places
exponents = -parts[1].str.len()
parts[0] += parts[1]                            # append decimal places to the digits before the decimal point

parts[1] = parts[0].str.rstrip('0')             # strip trailing zeros

exponents += parts[0].str.len() - parts[1].str.len()

# a significand that is empty (or only a minus sign) means zero
parts.loc[(parts[1] == '') | (parts[1] == '-'), 1] = '0'
significands = parts[1].astype(float)

df2 = pd.DataFrame({'exponent': exponents, 'significand': significands})
df2

Input:

0      2.05000
1    - 2.05000
2       00 205
3         -205
4           -0
5         -0.0
6      0.00205
7            0
8          NaN
dtype: object

Output:

   exponent  significand
0        -2          205
1        -2         -205
2         0          205
3         0         -205
4         0            0
5         0            0
6        -5          205
7         0            0
8       NaN          NaN

[9 rows x 2 columns]

There are 4 best solutions below

2
On BEST ANSWER

Here's a straightforward string processing solution.

def sig_exp(num_str):
    parts = num_str.split('.', 1)
    decimal = parts[1] if len(parts) > 1 else ''
    exp = -len(decimal)
    digits = parts[0].lstrip('0') + decimal
    trimmed = digits.rstrip('0')
    exp += len(digits) - len(trimmed)
    sig = int(trimmed) if trimmed else 0
    return sig, exp

>>> for x in ['2.05000', '200', '0.012', '0.0']:
...     print(sig_exp(x))
...
(205, -2)
(2, 2)
(12, -3)
(0, 0)

I'll leave the handling of negative numbers as an exercise for the reader.
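
For what it's worth, here is one possible way to do that exercise. It is only a sketch: the name sig_exp_signed is made up for this example, and it assumes the input is a plain decimal string with at most one leading '+' or '-' and no exponent part.

```python
def sig_exp_signed(num_str):
    """Return (significand, exponent) for a plain decimal string, sign included."""
    num_str = num_str.strip()
    negative = num_str.startswith('-')
    # Drop a single leading sign before doing the digit bookkeeping.
    if num_str[:1] in ('+', '-'):
        num_str = num_str[1:]
    int_part, _, decimal = num_str.partition('.')
    exp = -len(decimal)
    digits = int_part.lstrip('0') + decimal
    trimmed = digits.rstrip('0')
    # Every trailing zero removed from the digits raises the exponent by one.
    exp += len(digits) - len(trimmed)
    sig = int(trimmed) if trimmed else 0
    return (-sig if negative else sig), exp
```

Called on '-2.05000' this should give (-205, -2), and '-0.0' collapses to (0, 0) because the sign is applied to the integer 0.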

1
On

Take a look at decimal.Decimal:

>>> from decimal import Decimal
>>> s = '2.05000'
>>> x = Decimal(s)
>>> x
Decimal('2.05000')
>>> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 0, 5, 0, 0, 0), exponent=-5)

This does almost what you need; just convert the DecimalTuple to your desired format, for example:

>>> t = Decimal('2.05000').as_tuple()
>>> (''.join(str(x) for i, x in enumerate(t.digits) if any(t.digits[i:])),
...  t.exponent + sum(1 for i, x in enumerate(t.digits) if not any(t.digits[i:])))
('205', -2)

Just a sketch, but it satisfies your three test cases.

You might want to .normalize() your Decimal before calling .as_tuple() (thanks @georg); this takes care of trailing zeros. That way, you won't need to do as much formatting:

>>> Decimal('2.05000').normalize().as_tuple()
DecimalTuple(sign=0, digits=(2, 0, 5), exponent=-2)

So your function can be written as:

>>> def decimal_str_to_sci_tuple(s):
...     t = Decimal(s).normalize().as_tuple()
...     return (int(''.join(map(str, t.digits))), t.exponent)
...
>>> decimal_str_to_sci_tuple('2.05000')
(205, -2)
>>> decimal_str_to_sci_tuple('200')
(2, 2)
>>> decimal_str_to_sci_tuple('0.012')
(12, -3)

(Be sure to take t.sign into account when supporting negative numbers, though.)
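
A minimal sketch of what taking the sign into account could look like. The name signed_sci_tuple is hypothetical. It assumes finite inputs only (NaN and Infinity are not handled), and note that normalize() rounds to the current decimal context precision, which is 28 digits by default.

```python
from decimal import Decimal

def signed_sci_tuple(s):
    """Return (significand, exponent) as integers, honouring the sign."""
    t = Decimal(s).normalize().as_tuple()
    sig = int(''.join(map(str, t.digits)))
    # t.sign is 1 for negative numbers and 0 otherwise.
    if t.sign:
        sig = -sig
    return sig, t.exponent

# The list comprehension asked for in the question then comes for free.
inputs = ['2.05000', '200', '0.012', '-2.05000']
results = [signed_sci_tuple(s) for s in inputs]
```

With these inputs, results should be [(205, -2), (2, 2), (12, -3), (-205, -2)].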

0
On

If you are looking for scientific notation, you could use decimal and format the values like this:

from decimal import Decimal
numbers = ['2.05000','200','0.01','111']
print(["{:.2E}".format(Decimal(n)) for n in numbers])

output:

['2.05E+0', '2.00E+2', '1.00E-2', '1.11E+2']

If instead you want to:

  1. Strip the trailing zeros on the right-hand side
  2. Get the significand and exponent up to the last non-zero digit

you could do:

    from decimal import Decimal
    numbers = ['2.05000','200','0.01','111']
    # strip trailing zeros, but only if the string contains a decimal point
    numbers = [n.rstrip('0') if '.' in n else n for n in numbers]
    for n in numbers:
        if '.' in n:
            num, dec = n.split('.')
            tenthNumber = len(dec)
            print((Decimal(num + dec), -1 * tenthNumber))
        elif n.endswith('0'):
            # count the trailing zeros of an integer string
            tenthNumber = 0
            revN = n[::-1]
            for i in range(len(revN)):
                if revN[i] == '0':
                    tenthNumber = tenthNumber + 1
                else:
                    break
            print((n[:(len(n) - tenthNumber)], str(tenthNumber)))
        else:
            print((n, 0))

Output:

(Decimal('205'), -2)
('2', '2')
(Decimal('1'), -2)
('111', 0)
0
On

Here's one method that uses venpa's format string (all credit goes to him) but starts with numbers instead of strings. If you can afford to round the significand (e.g. to 2 decimal places), you could simply write:

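def scd_exp(scnum):
    # split on 'e' so that negative numbers and three-digit exponents also work
    significand, exponent = "{:.2e}".format(scnum).split('e')
    return (float(significand), int(exponent))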


numbers = [2.05, 205, 0.0001576, 111]
for number in numbers:
    print(scd_exp(number))

The result is

(2.05, 0)
(2.05, 2)
(1.58, -4)
(1.11, 2)

If you want to choose the rounding of the significand each time you call the function (say, 6 decimal places for this example), you could write

def scd_exp(scnum, roundafter):
    # the nested {} fills in the requested number of decimal places
    scnum = "{:.{}e}".format(scnum, roundafter)
    significand, exponent = scnum.split('e')
    return (float(significand), int(exponent))


numbers = [2.05, 205, 0.000157595678, 111]
for number in numbers:
    print(scd_exp(number, 6))

which gives back

(2.05, 0)
(2.05, 2)
(1.575957, -4)
(1.11, 2)