Python in Emacs: Jump to the definition of a global constant

After creating a TAGS file for my project (find . -name "*.py" | xargs etags) I can use M-. to jump to the definition of a function. That's great. But if I want the definition of a global constant -- say, x = 3 -- Emacs does not know where to find it.

Is there any way to explain to Emacs where constants, not just functions, are defined? I don't need this for anything defined within a function (or a for-loop or whatnot), just global ones.

More detail

Previous incarnations of this question used "top-level" instead of "global", but with @Thomas's help I realized that's imprecise. What I meant by a global definition is anything a module defines. Thus in

import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)

the things defined by the module are f and y, despite the sites of those definitions being indented to the right. x is a local variable, and z's definition is deleted before the end of the module.
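To make that claim concrete, here is a minimal sketch that executes the example and lists the names left in the module namespace. The stand-in for `m` (a fake module with only a `foo` attribute) and the helper name `names_defined` are assumptions made purely for illustration. Note that `m` itself shows up too, since `import m` also binds a module-level name.

```python
import sys
import types

MODULE_SOURCE = '''\
import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)
'''

def names_defined(source, foo):
    """Run `source` as a module body and return the names it leaves behind."""
    # Hypothetical stand-in for the question's module `m`.
    fake_m = types.ModuleType("m")
    fake_m.foo = foo
    sys.modules["m"] = fake_m
    namespace = {}
    try:
        exec(source, namespace)
    finally:
        del sys.modules["m"]
    # Drop dunder entries such as __builtins__ that exec() adds itself.
    return sorted(name for name in namespace if not name.startswith("__"))

print(names_defined(MODULE_SOURCE, foo=True))   # ['f', 'm', 'y']
print(names_defined(MODULE_SOURCE, foo=False))  # ['f', 'm', 'y']
```

Neither `x` (local to `f`) nor `z` (deleted at the end) survives, whichever branch runs.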

I believe that a sufficient rule to capture all global assignments would be to simply ignore assignments inside def statements (noting that the def keyword itself might be indented at any level) and otherwise look for any symbol to the left of = (noting that there might be more than one, because Python supports tuple assignments).

There are 2 best solutions below

13
On BEST ANSWER

Etags does not seem to be able to produce such information for Python files, which you can easily verify by running it on a trivial test file:

x = 3

def fun():
    pass

Running etags test.py produces a TAGS file with the following contents:

/tmp/test.py,13
def fun(3,7

As you can see, x is completely absent in this file, so Emacs has no chance of finding it.
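For readers unfamiliar with the TAGS format, the following sketch rebuilds the section shown above by hand. Each entry is the matched source text, a DEL character (\x7f, invisible in the listing above), then the line number and the byte offset of the start of that line. The number after the filename is the size in bytes of the entries that follow. The helper names `tags_entry` and `tags_section` are made up for this illustration, and the sketch covers only the simple entry form without an explicit tag name.

```python
DEL = "\x7f"  # separates the matched source text from "line,offset"

def tags_entry(pattern, line_number, byte_offset):
    """Build one TAGS entry in the simple 'pattern DEL line,offset' form."""
    return f"{pattern}{DEL}{line_number},{byte_offset}\n"

def tags_section(filename, entries):
    """Build a TAGS section: form feed, 'filename,size', then the entries."""
    body = "".join(entries)
    size = len(body.encode("utf-8"))
    return f"\x0c\n{filename},{size}\n{body}"

# Contents of the trivial test file from above.
source = "x = 3\n\ndef fun():\n    pass\n"
lines = source.splitlines(keepends=True)

# Byte offset at which line 3 ("def fun():") starts.
offset_of_line_3 = len("".join(lines[:2]).encode("utf-8"))

section = tags_section("/tmp/test.py",
                       [tags_entry("def fun(", 3, offset_of_line_3)])
print(repr(section))
```

This reproduces the `13` and the `3,7` from the etags output: 8 bytes of text, 1 byte for DEL, 3 bytes for `3,7`, and 1 for the newline.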

Invoking etags' man page informs us that there is an option --globals:

   --globals
          Create tag entries for global variables in  Perl  and  Makefile.
          This is the default in C and derived languages.

However, this seems to be one of those sad cases where the documentation is out of sync with the implementation, as this option does not seem to exist. (etags -h does not list it either, only --no-globals - probably because --globals is the default, as it says above.)

Even if --globals is the default, though, the documentation snippet says it applies only to Perl, Makefiles, C, and derived languages. We can check whether this is the case by creating another trivial test file, this time for C:

int x = 3;

void fun() {
}

And indeed, running etags test.c produces the following TAGS file:

/tmp/test.c,26
int x 1,0
void fun(3,12

You see that x is correctly identified for C. So it seems that global variables are simply not supported by etags for Python.

However, because of Python's use of whitespace, it is not too hard to identify global variable definitions in source files: you can basically grep for all lines that don't start with whitespace but contain an = sign (of course, there are exceptions).
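As a rough illustration of that heuristic (not the script that follows), here is the same idea expressed in Python. The regular expression and the function name `unindented_assignments` are assumptions made for this sketch. It only handles a single plain name on the left-hand side, so tuple, augmented, and annotated assignments are deliberately out of scope.

```python
import re

# An identifier at column 0, optional whitespace, then a single '='
# (the lookahead rejects '==' comparisons).
GLOBAL_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=(?!=)")

def unindented_assignments(source):
    """Return (line_number, name) for unindented 'name = ...' lines."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = GLOBAL_ASSIGNMENT.match(line)
        if match:
            found.append((lineno, match.group(1)))
    return found

sample = (
    "x = 3\n"
    "\n"
    "def fun(a=1):\n"
    "    y = 2\n"
    "\n"
    "if x == 3:\n"
    "    z = 4\n"
    "# c = 5\n"
)
print(unindented_assignments(sample))  # [(1, 'x')]
```

Note that `z` is not reported even though it is a module-level name; that is one of the exceptions mentioned above.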

So, I wrote the following script to do that, which you can use as a drop-in replacement for etags, as it calls etags internally:

#!/bin/bash

# make sure that some input files are provided, or else there's
# nothing to parse
if [ $# -eq 0 ]; then
    # the following message is just a copy of etags' error message
    echo "$(basename ${0}): no input files specified."
    echo "  Try '$(basename ${0}) --help' for a complete list of options."
    exit 1
fi

# extract all non-flag parameters as the actual filenames to consider
TAGS2="TAGS2"
argflags=($(etags -h | grep '^-' | sed 's/,.*$//' | grep ' ' | awk '{print $1}'))
files=()
skip=0 
for arg in "${@}"; do
    # the variable 'skip' signals arguments that should not be
    # considered as filenames, even though they don't start with a
    # hyphen
    if [ ${skip} -eq 0 ]; then
        # arguments that start with a hyphen are considered flags and
        # thus not added to the 'files' array
        if [ "${arg:0:1}" = '-' ]; then
            if [ "${arg:0:9}" = "--output=" ]; then
                TAGS2="${arg:9}2"
            else
                # however, since some flags take a parameter, we also
                # check whether we should skip the next command line
                # argument: the arguments for which this is the case are
                # contained in 'argflags'
                for argflag in ${argflags[@]}; do
                    if [ "${argflag}" = "${arg}" ]; then
                        # we need to skip the next 'arg', but in case the
                        # current flag is '-o' we should still look at the
                        # next 'arg' so as to update the path to the
                        # output file of our own parsing below
                        if [ "${arg}" = "-o" ]; then
                            # the next 'arg' will be etags' output file
                            skip=2                  
                        else
                            skip=1
                        fi
                        break
                    fi
                done
            fi
        else
            files+=("${arg}")
        fi
    else
        # the current 'arg' is not an input file, but it may be the
        # path to the etags output file
        if [ "${skip}" = 2 ]; then
            TAGS2="${arg}2"
        fi
        skip=0
    fi
done

# create a separate TAGS file specifically for global variables
for file in "${files[@]}"; do
    # find all lines that are not indented, are not comments or
    # decorators, and contain a '=' character, then turn them into
    # TAGS format, except that the filename is prepended; lines that
    # sed cannot rewrite (e.g. 'def f(a=1):') are dropped
    grep -P -Hbn '^[^@# \t].*=' "${file}" | sed -nE 's/([0-9]+):([0-9]+):([^= \t]+)\s*=.*$/\3\x7f\1,\2/p'
done |\

# count the bytes of each entry - this is needed for the TAGS
# specification
while read line; do
    echo "$(echo $line | sed 's/^.*://' | wc -c):$line"
done |\

# turn the information above into the correct TAGS file format
awk -F: '
    BEGIN { filename=""; numlines=0 }
    { 
        if (filename != $2) {
            if (numlines > 0) {
                print "\x0c\n" filename "," bytes+1

                for (i in lines) {
                    print lines[i]
                    delete lines[i]
                }
            }

            filename=$2
            numlines=0
            bytes=0
        }

        lines[numlines++] = $3;
        bytes += $1;
    }
    END {
        if (numlines > 0) {
            print "\x0c\n" filename "," bytes+1

            for (i in lines)
                print lines[i]
        }
    }' > "${TAGS2}"

# now run the actual etags, instructing it to include the global
# variables information
if ! etags -i "${TAGS2}" "${@}"; then
    # if etags failed to create the TAGS file, also delete the TAGS2
    # file
    /bin/rm -f "${TAGS2}"
fi

Store this script on your $PATH under a convenient name (I suggest something like etags+) and then call it like so:

find . -name "*.py" | xargs etags+

Besides creating a TAGS file, the script also creates a TAGS2 file for all global variable definitions, and adds a line to the original TAGS file that references the latter.
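If you want to check that the reference was actually written, the sketch below scans a TAGS file for include entries. As far as I know, etags records `-i FILE` as a section whose header line reads `FILE,include`; treat that detail, and the helper name `included_tables`, as assumptions of this example rather than a specification.

```python
def included_tables(tags_text):
    """Return the files referenced by ',include' sections in TAGS content."""
    included = []
    lines = tags_text.split("\n")
    # A section header is the line right after a lone form feed.
    for previous, line in zip(lines, lines[1:]):
        if previous == "\x0c" and line.endswith(",include"):
            included.append(line[: -len(",include")])
    return included

# Hand-written sample resembling what the wrapper script would produce.
sample = (
    "\x0c\nTAGS2,include\n"
    "\x0c\n/tmp/test.py,13\ndef fun(\x7f3,7\n"
)
print(included_tables(sample))  # ['TAGS2']
```

On a real file you would call it as `included_tables(open("TAGS", encoding="utf-8").read())`.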

From the perspective of Emacs, there is no difference in usage.

0
On

The other answer assumes that only unindented lines contain global variable definitions. While this effectively excludes the bodies of function and class definitions, it misses global variables defined inside if statements. Such definitions are not uncommon, e.g., for constants that differ depending on the operating system.

As argued in the comments under the question, any static analysis is necessarily imperfect because Python's dynamic nature makes it impossible to decide with perfect accuracy which variables are globally defined unless the program is actually executed.
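A small, contrived example of why static analysis cannot be complete: names created through `globals()` never appear as plain assignment targets in the parse tree, so only executing the code reveals them. The sample source below is invented for this illustration.

```python
import ast

source = '''\
globals()["hidden"] = 1
for name in ("a", "b"):
    globals()[name] = 0
visible = 2
'''

# Static view: plain names that are targets of an assignment statement.
tree = ast.parse(source)
static_names = sorted({
    target.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Assign)
    for target in node.targets
    if isinstance(target, ast.Name)
})

# Runtime view: whatever is in the namespace after executing the code.
namespace = {}
exec(source, namespace)
runtime_names = sorted(n for n in namespace if not n.startswith("__"))

print(static_names)   # ['visible']
print(runtime_names)  # ['a', 'b', 'hidden', 'name', 'visible']
```

Even the loop variable `name` is a global here, which an assignment-only scan does not notice.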

Therefore, the following is also just an approximation. However, it does consider global variable definitions inside if statements, as laid out above. As this is best done by actually analyzing the parse tree of the source file, a bash script is no longer an appropriate choice. Conveniently, though, Python itself allows easy access to a parse tree through its ast module, which is used here.

from argparse import ArgumentParser, SUPPRESS
import ast
from collections import Counter
from re import match as re_startswith
import os
import subprocess
import sys

# extract variable information from assign statements
def process_assign(target, results):
    if isinstance(target, ast.Name):
        results.append((target.lineno, target.col_offset, target.id))
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_assign(child, results)

# extract variable information from delete statements
def process_delete(target, results):
    if isinstance(target, ast.Name):
        results[:] = filter(lambda t: t[2] != target.id, results)
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_delete(child, results)

# recursively walk the parse tree of the source file
def process_node(node, results):
    if isinstance(node, ast.Assign):
        for target in node.targets:
            process_assign(target, results)
    elif isinstance(node, ast.Delete):
        for target in node.targets:
            process_delete(target, results)
    elif not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        for child in ast.iter_child_nodes(node):
            process_node(child, results)

def get_arg_parser():
    # create the parser to configure
    parser = ArgumentParser(usage=SUPPRESS, add_help=False)

    # run etags to find out about the supported command line parameters
    dashlines = list(filter(lambda line: re_startswith('\\s*-', line),
                            subprocess.check_output(['etags', '-h'],
                                                    encoding='utf-8').split('\n')))

    # ignore lines that start with a dash but don't have the right
    # indentation
    most_common_indent = max([(v,k) for k,v in
                              Counter([line.index('-') for line in dashlines]).items()])[1]
    arglines = filter(lambda line: line.index('-') == most_common_indent, dashlines)

    for argline in arglines:
        # the various 'argline' entries contain the command line
        # arguments for etags, sometimes more than one separated by
        # commas.
        for arg in argline.split(','):
            if 'or' in arg:
                arg = arg[:arg.index('or')]
            if ' ' in arg or '=' in arg:
                arg = arg[:min(arg.index(' ') if ' ' in arg else len(arg),
                               arg.index('=') if '=' in arg else len(arg))]
                action='store'
            else:
                action='store_true'
            arg = arg.strip()
            if arg and not (arg == '-h' or arg == '--help'):
                parser.add_argument(arg, action=action)

    # we know we need files to run on
    parser.add_argument('files', nargs='*', metavar='file')

    # the parser is configured now to accept all of etags' arguments
    return parser


if __name__ == '__main__':
    # construct a parser for the command line arguments, unless
    # -h/-help/--help is given in which case we just print the help
    # screen
    etags_args = sys.argv[1:]
    if '-h' in etags_args or '-help' in etags_args or '--help' in etags_args:
        unknown_args = True
    else:
        argparser = get_arg_parser()
        known_ns, unknown_args = argparser.parse_known_args()

    # if something's wrong with the command line arguments, print
    # etags' help screen and exit
    if unknown_args:
        subprocess.run(['etags', '-h'], encoding='utf-8')
        sys.exit(1)

    # we base the output filename on the TAGS file name.  Other than
    # that, we only care about the actual filenames to parse, and all
    # other command line arguments are simply passed to etags later on
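    # etags lists the output option as "-o FILE, --output=FILE", so
    # argparse may store it under either 'o' or 'output'
    output = getattr(known_ns, 'o', None) or getattr(known_ns, 'output', None)
    tags_file = output + '2' if output else 'TAGS2'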
    filenames = known_ns.files

    if filenames:
        # TAGS file sections, one per source file
        sections = []

        # process all files to populate the 'sections' list
        for filename in filenames:
            # read source file
            offsets, lines = [0], []
            with open(filename, 'r') as f:
                for line in f.readlines():
                    offsets.append(offsets[-1] + len(bytes(line, 'utf-8')))
                    lines.append(line)

            offsets = offsets[:-1]

            # parse source file
            source = ''.join(lines)
            root_node = ast.parse(source, filename)

            # extract global variable definitions
            vardefs = []
            process_node(root_node, vardefs)

            # create TAGS file section
            sections.append("")
            for lineno, column, varname in vardefs:
                line = lines[lineno-1]
                offset = offsets[lineno-1]
                end = line.index('=') if '=' in line else -1
                sections[-1] += f"{line[:end]}\x7f{varname}\x01{lineno},{offset + column - 1}\n"

        # write TAGS file
        with open(tags_file, 'w') as f:
            for filename, section in zip(filenames, sections):
                if section:
                    f.write("\x0c\n")
                    f.write(filename)
                    f.write(",")
                    f.write(str(len(bytes(section, 'utf-8'))))
                    f.write("\n")
                    f.write(section)
                    f.write("\n")

        # make sure etags includes the newly created file
        etags_args += ['-i', tags_file]

    # now run the actual etags to take care of all other definitions
    try:
        cp = subprocess.run(['etags'] + etags_args, encoding='utf-8')
        status = cp.returncode
    except OSError:
        status = 1

    # if etags did not finish successfully, remove the tags_file
    if status != 0:
        try:
            os.remove(tags_file)
        except FileNotFoundError:
            # nothing to be removed
            pass

As in the other answer, this script is meant to be a drop-in replacement for the standard etags, as it calls the latter internally. Hence it also accepts all of etags' command line parameters (but currently does not respect -a).
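To see what the tree walk collects without running etags at all, here is a condensed sketch of the same idea applied to the example from the question. The function name `global_assignments` is made up for this illustration. The sketch also skips async def bodies and descends into list targets, which is an assumption on my part about what one would want.

```python
import ast

EXAMPLE = '''\
import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)
'''

SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def global_assignments(source):
    """Return (line_number, name) for module-level assignment targets."""
    found = []

    def names_in(target):
        # Yield the plain names in a (possibly nested) assignment target.
        if isinstance(target, ast.Name):
            yield target
        elif isinstance(target, (ast.Tuple, ast.List)):
            for element in target.elts:
                yield from names_in(element)

    def visit(node):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                found.extend((n.lineno, n.id) for n in names_in(target))
        elif isinstance(node, ast.Delete):
            for target in node.targets:
                deleted = {n.id for n in names_in(target)}
                found[:] = [e for e in found if e[1] not in deleted]
        elif not isinstance(node, SCOPES):
            # Descend into if/for/while/try/with, but not into new scopes.
            for child in ast.iter_child_nodes(node):
                visit(child)

    visit(ast.parse(source))
    return found

print(global_assignments(EXAMPLE))  # [(7, 'y'), (12, 'y')]
```

Both definitions of `y` are found, `x` and `z` are not, and `f` is left to etags itself, which already handles def.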

It is recommended to amend the init file of one's shell with an alias, for instance by adding the following line to ~/.bashrc:

alias etags+='python3 -u /path/to/script.py'

where /path/to/script.py is the path to the file to which the above code was saved. With such an alias in place, you can simply call

etags+ /path/to/file

etc.