Reskewing GCP Document AI Result


GCP's Document AI pre-processes images to remove things like skew. The bounding boxes it returns correspond to the pre-processed image, not the image sent to the API. I need to reskew them so that they correspond to the original image. I can bring the bounding boxes back to their original rotation, but I can't figure out how to rescale/adjust them for the crop that GCP applied.

Below are links to the original image, processed image, and what the final bounding boxes look like on the page. As you can see, skew is being corrected for, but the boxes are not aligning with the words on the original image. Notably, you'll see that the title is 15% lower on the page than it should be.

Original image:


GCP pre-processed image:


Text output:


The API returns a transformation matrix:

[{'rows': 2, 'cols': 3, 'type': 6, 'data': 'r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA'}]

I decoded it into a skew angle of -100.84832000732422 degrees using:

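import base64
import math

import numpy as np

# matrix_data is the list of transformation matrices shown above.
rows = matrix_data[0]["rows"]
cols = matrix_data[0]["cols"]
data_type_code = matrix_data[0]["type"]  # 6 is CV_64F in OpenCV, i.e. float64
dtype = np.float64  # 48 decoded bytes / (2 * 3) values = 8 bytes per value
data_encoded = matrix_data[0]["data"]
data_binary = base64.b64decode(data_encoded)
matrix = np.frombuffer(data_binary, dtype=dtype).reshape((rows, cols))
rotation_radians = math.atan2(matrix[1, 0], matrix[0, 0])
skew_angle = math.degrees(rotation_radians)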
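
For reference, a 2×3 matrix in this layout matches an OpenCV-style affine matrix, so the third column carries a translation in addition to the rotation held in the first two columns. The following is a minimal sketch of reading both parts, assuming the `type` field follows OpenCV depth codes (5 = CV_32F, 6 = CV_64F); `decode_transform` and `CV_DEPTH_TO_DTYPE` are hypothetical names, not part of the Document AI client.

```python
import base64
import math

import numpy as np

# Assumption: `type` uses OpenCV depth codes (5 = CV_32F, 6 = CV_64F).
CV_DEPTH_TO_DTYPE = {5: np.float32, 6: np.float64}


def decode_transform(transform: dict) -> np.ndarray:
    """Decode one transform entry into a (rows, cols) NumPy array."""
    dtype = CV_DEPTH_TO_DTYPE[transform["type"]]
    raw = base64.b64decode(transform["data"])
    return np.frombuffer(raw, dtype=dtype).reshape(transform["rows"], transform["cols"])


transform = {
    "rows": 2,
    "cols": 3,
    "type": 6,
    "data": "r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA",
}
matrix = decode_transform(transform)
linear = matrix[:, :2]      # rotation (and any scale) part
translation = matrix[:, 2]  # offset in pixels, added after the linear part
angle = math.degrees(math.atan2(matrix[1, 0], matrix[0, 0]))
```

For the matrix above, the translation column is non-zero, so an angle-only correction leaves an offset unaccounted for.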

Each bounding box is being parsed with this:

box = Box(root=[Point(x=v.get("x", -1), y=v.get("y", -1)) for v in vertices])
box = box.skew_box(self.skew_angle)
box = box.scale_box(self.scale_x, self.scale_y)

This is how the ratios are calculated:

# These are the ratios for the original image sent to the API
x_ratio = x_len / y_len if y_len != 0 else 1
y_ratio = y_len / x_len if x_len != 0 else 1
longest_side = "x" if x_len > y_len else "y"

# These are the ratios from the GCP-processed image
gcp_x_ratio = gcp_height / gcp_width if gcp_width != 0 else 1
gcp_y_ratio = gcp_width / gcp_height if gcp_height != 0 else 1
gcp_longest_side = "x" if gcp_width > gcp_height else "y"

if longest_side != gcp_longest_side:
    gcp_x_ratio, gcp_y_ratio = gcp_y_ratio, gcp_x_ratio

self.scale_x = x_ratio / gcp_x_ratio
self.scale_y = y_ratio / gcp_y_ratio

This is how the scale is applied:

def scale_box(self, x: float, y: float, origin: Point = Point(x=0.5, y=0.5)):
    def scale_point_about_origin(point: Point, origin: Point) -> Point:
        translated_x = point.x - origin.x
        translated_y = point.y - origin.y
        scaled_x = translated_x * x
        scaled_y = translated_y * y
        return Point(x=scaled_x + origin.x, y=scaled_y + origin.y)

    return Box(root=[scale_point_about_origin(p, origin) for p in self.root])

For reference, the bounding boxes are being reskewed using the following code:

def skew_box(self, angle: float, width: float = 1, height: float = 1):
    if angle == 0:
        return self

    theta = math.radians(-angle)
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)
    origin_x = width / 2
    origin_y = height / 2

    def skew_point(point: Point) -> Point:
        x, y = point.x - origin_x, point.y - origin_y
        x_rotated = x * cos_theta - y * sin_theta
        y_rotated = x * sin_theta + y * cos_theta
        x_rotated, y_rotated = x_rotated + origin_x, y_rotated + origin_y

        return Point(x=x_rotated, y=y_rotated)

    return Box(root=[skew_point(p) for p in self.root])
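
An alternative to rotating and rescaling in separate steps would be to invert the whole 2×3 matrix and apply it to the vertices in pixel space, which handles rotation and offset together. This is only a sketch under several assumptions: that the matrix maps original-image pixels to processed-image pixels (the direction an OpenCV `warpAffine` matrix uses by default), that the vertices are normalized to the processed image, and that both image sizes are known. `unwarp_vertices` and its parameters are hypothetical names.

```python
import numpy as np


def unwarp_vertices(vertices, matrix, processed_size, original_size):
    """Map normalized vertices on the processed image back to the original image.

    vertices: iterable of (x, y) in [0, 1], relative to the processed image.
    matrix: 2x3 affine array, assumed to map original pixels -> processed pixels.
    processed_size, original_size: (width, height) in pixels.
    """
    processed_px = np.asarray(vertices, dtype=np.float64) * np.asarray(
        processed_size, dtype=np.float64
    )
    linear, translation = matrix[:, :2], matrix[:, 2]
    # The forward map is p' = A @ p + t, so the inverse is p = A^-1 @ (p' - t).
    # Points are stored as rows, hence the transpose of the inverse.
    original_px = (processed_px - translation) @ np.linalg.inv(linear).T
    return original_px / np.asarray(original_size, dtype=np.float64)


# Toy check with a made-up 90-degree rotation: (x, y) -> (y, 100 - x).
toy_matrix = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 100.0]])
restored = unwarp_vertices(
    [(0.1, 0.9)], toy_matrix, processed_size=(200, 100), original_size=(100, 200)
)
```

In the toy case, the original pixel (10, 20) lands at (20, 90) on the processed image, and the function maps it back to (0.1, 0.1) in normalized original coordinates. If the boxes come out mirrored or further off, the matrix direction assumption is probably the wrong way round and the forward matrix should be applied instead.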