I am using bytecode analysis (with BCEL) to get all imported classes of a class file. When I read the constant pool, not all imported classes appear as CONSTANT_Class entries (see the spec); some appear only as CONSTANT_Utf8. My question: can I not rely solely on the CONSTANT_Class entries in the constant pool to find the imported classes? Do I really have to look at every entry and guess whether it is a class name? That does not seem correct in every situation either. Or do I have to read through the whole bytecode? Regards
missing classes in classfiles constant pool
Asked by wrm

See JVMS 4.2, The Internal Form of Fully Qualified Class and Interface Names.
In a nutshell: each CONSTANT_Class_info structure holds only an index that points to a CONSTANT_Utf8_info entry containing the class name.
(Or are you instead saying that not all referenced classes are represented by a class and name entry?)
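To illustrate how class entries point to UTF8 entries, here is a minimal sketch that reads the constant pool with plain java.io instead of BCEL and resolves every CONSTANT_Class_info entry to its name. The class and method names (ClassEntries, classEntryNames, classEntryNamesOfSystemClass) are made up for this example, and the tag sizes are taken from the constant pool table in JVMS 4.4.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClassEntries {
    /** Returns the names (internal form, e.g. java/lang/Object) of all CONSTANT_Class_info entries. */
    public static List<String> classEntryNames(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();             // minor_version
        in.readUnsignedShort();             // major_version
        int count = in.readUnsignedShort(); // constant_pool_count
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count]; // 0 means "not a class entry"
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  // Utf8: u2 length + modified UTF-8, the format readUTF expects
                    utf8[i] = in.readUTF();
                    break;
                case 7:  // Class: name_index pointing to a Utf8 entry
                    classNameIndex[i] = in.readUnsignedShort();
                    break;
                case 8: case 16: case 19: case 20: // String, MethodType, Module, Package
                    in.readUnsignedShort();
                    break;
                case 15: // MethodHandle: u1 kind + u2 index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readInt(); // all of these carry 4 bytes of data
                    break;
                case 5: case 6: // Long and Double occupy two constant pool slots
                    in.readLong();
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            // Array types show up here as descriptors such as [Ljava/lang/String;
            if (classNameIndex[i] != 0) names.add(utf8[classNameIndex[i]]);
        }
        return names;
    }

    /** Convenience wrapper: reads a class visible to the system class loader. */
    public static List<String> classEntryNamesOfSystemClass(String internalName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) throw new IllegalArgumentException("class file not found: " + internalName);
            return classEntryNames(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        classEntryNamesOfSystemClass("java/lang/String").forEach(System.out::println);
    }
}
```

This only lists what the class entries name; it says nothing about classes that are mentioned solely inside descriptors.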
FWIW, be wary of relying solely on this information to determine dependencies as classes can be loaded dynamically and may not appear at all.
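As a small illustration of the dynamic-loading caveat, the sketch below loads a class whose name is only assembled at run time. The class DynamicLoad and its load method are hypothetical names for this example. Because the target class is never named in the source, the compiled class file has no CONSTANT_Class_info entry for it, and no descriptor mentions it either.

```java
public class DynamicLoad {
    /**
     * Loads a class from a package name and a simple name supplied at run time.
     * The loaded class never appears in this class file's constant pool.
     */
    public static Class<?> load(String packageName, String simpleName) {
        try {
            return Class.forName(packageName + "." + simpleName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + simpleName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("java.util", "ArrayList").getName());
    }
}
```

A static analysis of DynamicLoad.class would at best see Class.forName being called, not which class ends up being loaded.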
No, it is not correct to use CONSTANT_Class_info entries alone to discover dependencies on other classes or interfaces. If you're parsing input files you trust, or you can tolerate incorrect information, you can get away with parsing only the constant pool, except for one corner case. To get precise information on arbitrary input, you need to parse the whole class file. (I assume that by "dependencies" you mean those classes or interfaces without which loading or linking a class may result in exceptions, as described in JVMS chapter 5. This doesn't include classes obtained via Class.forName or other reflective means.)

Consider a class Main with a method identity that takes a parameter of type Foo. In the output of javap -p -v Main.class, the class Foo does not appear in the constant pool as a CONSTANT_Class_info entry. It does appear in the method descriptor for identity (entry #12). Field descriptors may also reference classes that do not appear as CONSTANT_Class_info entries. Thus, to find all the dependencies from the constant pool alone, you need to look at all UTF8 entries.

The corner case: some UTF8 entries may exist only to be referenced by CONSTANT_String_info entries. Duplicate UTF8 entries will be merged, so one UTF8 entry might be a method descriptor, a string literal, or both. If you're only parsing the constant pool, you must live with this ambiguity (probably by overapproximating and treating the entry as a dependency).
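Looking at all UTF8 entries comes down to pulling class names out of descriptor strings. Here is a rough sketch of that step, assuming the grammar from JVMS 4.3 where an object type is written as L, the internal class name, then a semicolon. DescriptorRefs and classNamesIn are hypothetical names, and the descriptor (LFoo;)LFoo; used in main is an assumed shape for the identity method, not something taken from the javap output.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DescriptorRefs {
    /** Collects the class names (internal form) mentioned in a field or method descriptor. */
    public static Set<String> classNamesIn(String descriptor) {
        Set<String> names = new LinkedHashSet<>();
        int i = 0;
        while (i < descriptor.length()) {
            if (descriptor.charAt(i) == 'L') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) break;          // malformed: no terminating ';'
                names.add(descriptor.substring(i + 1, end));
                i = end + 1;                 // skip past the class name
            } else {
                i++;                         // '(', ')', '[' or a primitive type code
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed descriptor for a method like: Foo identity(Foo f)
        System.out.println(classNamesIn("(LFoo;)LFoo;"));
    }
}
```

Note that this cannot tell a real descriptor from a string literal that merely looks like one, which is exactly the ambiguity described above; applied blindly to every UTF8 entry it overapproximates.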
If you trust the input to have been produced by a well-behaved Java compiler under your control, you can parse all UTF8 entries, mindful of the string corner case, and stop reading here. If you need to defend against an attacker feeding your tool handcrafted class files (e.g., you're writing a decompiler and the attacker wants to prevent decompilation), you need to parse the entire class file. Here are a few examples of the potential problems.
A constant pool can contain a CONSTANT_Class_info entry that nothing else in the class file uses, for example one naming Thread added to Main. The JVM may or may not try to resolve this reference (JVMS 5.4 permits both lazy and eager loading). As the class exists, no error will be raised either way, so this extra entry is harmless, but it will fool tools looking at the constant pool into thinking Thread is a dependency.

That's just what I came up with off the top of my head. A clever attacker going through the JVMS with a fine-tooth comb could probably find more places to add entries to the constant pool that look used but aren't. If you need precise information even in the face of an attacker, you need to parse the whole class file and understand how a JVM will use it.