Simplify statement '.'.join( string.split('.')[0:3] )

I am used to coding in C/C++, and when I see the following array operation, it feels like wasted CPU:

version = '1.2.3.4.5-RC4'                 # the end can vary a lot
api = '.'.join( version.split('.')[0:3] ) # extract '1.2.3'

Therefore I wonder:

  • Is this line executed (interpreted) by creating a temporary array (memory allocation) and then concatenating the first three cells (another memory allocation)?
    Or is the Python interpreter smart enough to avoid that?
    (I am also curious about the optimizations made in this context by Pythran, Parakeet, Numba, Cython, and other Python interpreters/compilers...)

  • Is there a trick to write a replacement line that is more CPU-efficient and still understandable/elegant?
    (You can provide specific Python 2 and/or Python 3 tricks and tips.)

There are 3 best solutions below

0
On BEST ANSWER

"I am used to coding in C/C++, and when I see the following array operation, it feels like wasted CPU:"

A feeling of wasted CPU is perfectly normal for C/C++ programmers facing Python code. Your code:

version = '1.2.3.4.5-RC4'                 # the end can vary a lot
api = '.'.join(version.split('.')[0:3])   # extract '1.2.3'

is absolutely fine in Python; there is no simplification possible. Only if you have to do it thousands of times should you consider using a library function or writing your own.
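As one possible shape for the "write your own" option, here is a minimal sketch that locates the third '.' with str.find and slices the string once, so no intermediate lists are built. The helper name api_prefix and its fields parameter are hypothetical, and this has not been benchmarked here: whether it actually beats the split/join one-liner is an assumption to check with timeit on your own data.

```python
def api_prefix(version, fields=3):
    """Return the first `fields` dot-separated components of `version`.

    Hypothetical helper: it walks from one '.' to the next with str.find
    and slices once, instead of building lists with split().
    """
    end = -1
    for _ in range(fields):
        end = version.find('.', end + 1)
        if end == -1:
            # Fewer than `fields` dots: there is nothing to trim.
            return version
    return version[:end]


api = api_prefix('1.2.3.4.5-RC4')
```

For the inputs tried here it returns the same string as '.'.join(version.split('.')[0:3]), including versions with fewer than three fields.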

5
On

I have no idea about the CPU usage for this purpose, but isn't that, in a way, why we use high-level languages?

Another solution would be to use regular expressions; a compiled pattern should allow some optimisation behind the scenes:

import re
version = '1.2.3.4.5-RC4'
pat = re.compile(r'^(\d+\.\d+\.\d+)')  # raw string keeps the backslashes literal
res = pat.match(version)               # match using the compiled pattern
if res:
  print(res.group(1))                  # prints 1.2.3

Edit: As suggested by @jonrsharpe, I also ran a timeit benchmark. Here are my results:

def extract_vers(text):  # 'text' avoids shadowing the built-in str
    res = pat.match(text)
    if res:
        return res.group(1)
    else:
        return False

>>> timeit.timeit("api1(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.9013631343841553
>>> timeit.timeit("api2(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.3482811450958252
>>> timeit.timeit("extract_vers(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.174590826034546

Edit: In any case, Python already has libraries for this job, such as distutils.version. You should have a look at that answer.

4
On

To answer your first question: no, this will not be optimised out by the interpreter. Python will create a list from the string, then create a second list for the slice, then put the list items back together into a new string.
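To make those intermediate objects visible, the one-liner can be unrolled into its three steps. This is only an illustrative sketch; the names parts and head are introduced here and do not appear in the original code.

```python
version = '1.2.3.4.5-RC4'

parts = version.split('.')  # first allocation: a list holding five new strings
head = parts[0:3]           # second allocation: a new list of three references
api = '.'.join(head)        # third allocation: the resulting string

# Slicing copies the list, not the strings: both lists share the same items.
same_items = all(a is b for a, b in zip(parts, head))
```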

To cover the second, you can optimise this slightly by limiting the split with the optional maxsplit argument:

>>> v = '1.2.3.4.5-RC4'
>>> v.split(".", 3)
['1', '2', '3', '4.5-RC4']

Once the third '.' is found, Python stops searching through the string. You can also tidy the slice slightly by dropping its default start index of 0:

api = '.'.join(version.split('.', 3)[:3])

Note, however, that any difference in performance is negligible:

>>> import timeit
>>> def test1(version):
...     return '.'.join(version.split('.')[0:3])
...
>>> def test2(version):
...     return '.'.join(version.split('.', 3)[:3])
...

>>> timeit.timeit("test1(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0458565345561743
>>> timeit.timeit("test2(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0842980287537776

The benefit of maxsplit becomes clearer with longer strings containing more irrelevant '.'s:

>>> timeit.timeit("s.split('.')", setup="s='1.'*100")
3.460900054011617
>>> timeit.timeit("s.split('.', 3)", setup="s='1.'*100")
0.5287887450379003
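To rerun the comparison outside the interactive prompt, a small script along these lines could be used. It is a sketch: the function names and the number=100000 repeat count are arbitrary choices made here, and the timings it prints will vary with the machine and Python version.

```python
import timeit


def first_three_full(version):
    # Splits on every '.', then keeps the first three fields.
    return '.'.join(version.split('.')[0:3])


def first_three_capped(version):
    # Stops splitting after the third '.'; the remainder stays in one piece.
    return '.'.join(version.split('.', 3)[:3])


short = '1.2.3.4.5-RC4'
long_ = '1.' * 100

# Both variants must agree before their timings are worth comparing.
for s in (short, long_):
    assert first_three_full(s) == first_three_capped(s)

for name, s in (('short', short), ('long', long_)):
    t_full = timeit.timeit(lambda: first_three_full(s), number=100000)
    t_capped = timeit.timeit(lambda: first_three_capped(s), number=100000)
    print('%-5s full=%.4fs capped=%.4fs' % (name, t_full, t_capped))
```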