Label Studio: Access file specified in task record


I am trying to implement a pre-labeler for Label Studio. The documentation (e.g. https://labelstud.io/guide/ml_create.html#Example-inference-call) is not very helpful here. I have imported images locally and need to read them from a task record to pass them to my model. When I print a task record, it looks like this:

{
  "id": 7,
  "data": {
    "image": "/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg"
  },
  "meta": {},
  "created_at": "2022-12-29T00:49:34.141715Z",
  "updated_at": "2022-12-29T00:49:34.141720Z",
  "is_labeled": false,
  "overlap": 1,
  "inner_id": 7,
  "total_annotations": 0,
  "cancelled_annotations": 0,
  "total_predictions": 0,
  "comment_count": 0,
  "unresolved_comment_count": 0,
  "last_comment_updated_at": null,
  "project": 1,
  "updated_by": null,
  "file_upload": 7,
  "comment_authors": [],
  "annotations": [],
  "predictions": []
}

My labeler implementation at the moment is this:

import json
import logging
from pathlib import Path
from uuid import uuid4  # assumed source of the original uuid() call

from PIL import Image
from label_studio_ml.model import LabelStudioMLBase

logger = logging.getLogger(__name__)


class MyModel(LabelStudioMLBase):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = ...
        self.query_fn = ...

    def make_result_record(self, path: Path):
        # Reference: https://labelstud.io/tutorials/sklearn-text-classifier.html

        mask = self.query_fn(path)
        image = Image.open(path)
        result = [
            {
                "original_width": image.width,
                "original_height": image.height,
                "image_rotation": 0,
                "value": {"format": "rle", "rle": [mask], "brushlabels": ["crack"]},
                "id": str(uuid4()),  # string form so the record is JSON-serializable
                "from_name": "tag",
                "to_name": "image",
                "type": "brushlabels",
                "origin": "inference",
            }
        ]
        return {"result": result, "score": 1.0}

    def predict(self, tasks, **kwargs):
        predictions = []
        for task in tasks:
            logger.info("task:")
            logger.info("\n" + json.dumps(task, indent=2))
            # task["data"]["image"] is the value shown in the task record above
            result = self.make_result_record(Path(task["data"]["image"]))
            predictions.append(result)
        return predictions

So where is /data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg? I suppose it is inside some storage that Label Studio spins up. How do I access it? (And why does the documentation not talk about this?)


There are 2 best solutions below

1
On

Those are the files Label Studio creates when you upload data to it.

You will find the file on your system if you search for the filename "cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg".

Or, if you run Label Studio in Docker, you will find it under the same path inside the container.

In my case, on macOS, I found the .jpg file under: "/Users/user/Library/Application Support/label-studio/media/upload/1/1deaeb75-0f29e9df11dbc1cce55cb3529517dcd5.jpg"
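Comparing that location with the task value suggests that the "/data/" prefix in the task corresponds to the "media" directory on disk. Here is a minimal sketch of that mapping, assuming the backend runs on the same machine as Label Studio. The helper name resolve_upload_path and the MEDIA_ROOT value are hypothetical; the media directory differs per installation, so check where yours lives.

```python
from pathlib import Path, PurePosixPath


def resolve_upload_path(task_image: str, media_root: Path) -> Path:
    """Map a task's "/data/..." value onto a file below media_root (hypothetical helper)."""
    prefix = "/data/"
    if not task_image.startswith(prefix):
        raise ValueError(f"not a local upload path: {task_image!r}")
    # Drop the prefix and any extra leading slashes so the result stays relative.
    relative = PurePosixPath(task_image[len(prefix):].lstrip("/"))
    if ".." in relative.parts:
        raise ValueError(f"refusing to leave the media directory: {task_image!r}")
    return media_root.joinpath(*relative.parts)


# Assumption: the macOS media directory from the example above; adjust per install.
MEDIA_ROOT = Path.home() / "Library" / "Application Support" / "label-studio" / "media"

task = {"data": {"image": "/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg"}}
local_path = resolve_upload_path(task["data"]["image"], MEDIA_ROOT)
print(local_path)
```

The resulting path can then be passed to Image.open in place of the raw task value, provided the file really exists at that location.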

I don't think you need to read them from a task yourself. If you create an ML backend and connect your Label Studio instance to it, Label Studio will create the tasks and send them to the backend for you.

Take a look at this backend, for example: https://www.kaggle.com/code/yujiyamamoto/semi-auto-annotation-label-studio-and-tf2-od/notebook

0
On

Assuming you are on localhost, it should be:

http://localhost:8080/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg

To access the file, you can use your API access token, found in the Label Studio UI → Account & Settings → Access Token.

Then in your script:

from urllib.request import Request, urlopen

LABEL_STUDIO_URL = 'http://localhost:8080'
API_KEY = 'xxxxxxxxxxxxxxxxx'  # your access token

request = Request(LABEL_STUDIO_URL + '/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg')
request.add_header('Authorization', f'Token {API_KEY}')
image_file = urlopen(request)  # returns a file-like object
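To use the same idea inside predict, the URL can be built from each task rather than hard-coded. The following is only a sketch under the assumptions above (Label Studio reachable at a known base URL, token authentication as shown). The helper names build_image_request and fetch_image_bytes are made up for illustration.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def build_image_request(task: dict, base_url: str, api_key: str) -> Request:
    """Build an authenticated request for the image referenced by a task (hypothetical helper)."""
    # task["data"]["image"] is a server-relative path such as "/data/upload/1/<name>.jpeg".
    url = urljoin(base_url, task["data"]["image"])
    request = Request(url)
    request.add_header("Authorization", f"Token {api_key}")
    return request


def fetch_image_bytes(task: dict, base_url: str, api_key: str, timeout: float = 10.0) -> bytes:
    """Download the task's image and return its raw bytes."""
    with urlopen(build_image_request(task, base_url, api_key), timeout=timeout) as response:
        return response.read()
```

The returned bytes can be wrapped in io.BytesIO and handed to PIL's Image.open, which accepts file-like objects, so the model does not need a path on the local filesystem.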

I hope that helps you or anyone encountering similar problems. :)