I am working on a project where I have to determine whether a particular call or import is:
- From the standard library of the language (Python) I'm using. (I am already considering using sys.stdlib_module_names for it.)
- From a third-party library, or
- An API call made to some service from within the repository.
Is there an efficient way or tool that could help me quickly differentiate between these types of calls or imports? I'm primarily using Python, but methods for other languages are welcome as well.
In this project I aim to compile a dataset of function calls, including library calls, made within a given repository from GitHub.
First, I download a given Python repository from GitHub.
Then my main objectives are:
- To extract all function calls made within the target repository.
- To gather details of these function calls, including the arguments they use.
For this purpose, I am using Python's ast (Abstract Syntax Tree) module to detect and catalogue function calls and their respective arguments. My entire analysis pipeline is a Python script built on the ast module.
Now I have to determine which of these function calls originate from within the repository itself.
For example, given these two files:

file_b.py

    def abc():
        ....

file_a.py

    import math
    import numpy as np
    from file_b import abc
    ....
    def foo():
        ..
        x = np.linspace(-math.pi, math.pi, 2000)
        y = np.sin(x)
        ...
        ..
        c = abc()

I want to capture only abc (as it is defined in that repository) and not the calls into the numpy module.
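One possible static approach to the example above, staying inside the ast module, is sketched here. It is only a sketch under stated assumptions: the helper names local_module_names and repo_calls are hypothetical, it needs Python 3.9+ for ast.unparse, and it only recognises calls whose base name was bound by an import of a top-level module or package found at the repository root (or by a relative import). Functions defined and called inside the same file are not covered.

```python
import ast
from pathlib import Path


def local_module_names(repo_root):
    """Names of top-level modules and packages found at the repository root."""
    root = Path(repo_root)
    names = {p.stem for p in root.glob("*.py")}
    names |= {p.name for p in root.iterdir()
              if p.is_dir() and (p / "__init__.py").is_file()}
    return names


def repo_calls(source, local_modules):
    """Return (callee, arguments) for calls that resolve to repo-local imports."""
    tree = ast.parse(source)
    local_aliases = set()  # names bound by imports of repo-local modules
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in local_modules:
                    local_aliases.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            top = (node.module or "").split(".")[0]
            # level > 0 marks a relative import, which is repo-local by definition
            if node.level > 0 or top in local_modules:
                for alias in node.names:
                    local_aliases.add(alias.asname or alias.name)

    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Follow attribute chains such as pkg.mod.func down to the base name.
            base = node.func
            while isinstance(base, ast.Attribute):
                base = base.value
            if isinstance(base, ast.Name) and base.id in local_aliases:
                args = [ast.unparse(a) for a in node.args]
                for kw in node.keywords:
                    value = ast.unparse(kw.value)
                    args.append(f"{kw.arg}={value}" if kw.arg else f"**{value}")
                calls.append((ast.unparse(node.func), args))
    return calls


SAMPLE = """\
import math
import numpy as np
from file_b import abc

def foo():
    x = np.linspace(-math.pi, math.pi, 2000)
    y = np.sin(x)
    c = abc()
    return c
"""

# In a real run the set would come from local_module_names(repo_root).
print(repo_calls(SAMPLE, {"file_b"}))  # [('abc', [])]
```

Because nothing is imported, this never executes code from the repository, at the cost of missing names that are rebound or imported dynamically.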
You can use inspect, since this module seems to have been written with your purpose in mind. A simple way to differentiate is to use inspect to find the location on disk of the library that defines a given function.
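As a minimal illustration of looking up a function's location on disk with inspect (a sketch, not necessarily the code originally posted; json.dumps is just a convenient standard-library function to try it on):

```python
import inspect
import json
import os

# inspect.getfile reports the source file in which an object was defined.
path = inspect.getfile(json.dumps)
print(path)

# The directory of a known standard-library module gives a reference location
# to compare against.
stdlib_dir = os.path.dirname(os.__file__)
print(path.startswith(stdlib_dir))
```

The printed path depends on the Python installation. Note that inspect.getfile raises TypeError for built-in objects implemented in C, such as len, so a real pipeline has to handle that case.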
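In your case, you seem to have 3 categories: the standard library, third-party libraries, and the repository itself. The first two can usually be located relative to the Python installation (see sys.executable), while the root of the repository is given by subprocess.check_output(['git', 'rev-parse', '--show-toplevel']) (run from within the repository).
Inspect can do a lot more than give you the location on disk in more complex situations. Here is an example along with some code. In PythonModuleOfTheWeek there are more uses and here you can find some further examples.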
A practical note: importing a module means running foreign code, so make sure you trust the code, or run it in a sandboxed environment. How to do the latter is a question of its own.
A theoretical note: in extreme cases I think this problem is undecidable. A formal proof might involve one function that halts and another that does not: any analysis that could discriminate between the two would thereby solve Turing's halting problem. For our case, using inspect, this means that there exist modules whose import can potentially take forever. In practice this should not be a problem, because any reasonable module can be imported in a reasonable time.