Label Studio: Access file specified in task record

I am trying to implement a pre-labeler for Label Studio. The documentation (e.g. https://labelstud.io/guide/ml_create.html#Example-inference-call) is not very helpful here. I have imported images locally and need to read them from a task record to pass them to my model. When I print a task record, it looks like this:

{
  "id": 7,
  "data": {
    "image": "/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg"
  },
  "meta": {},
  "created_at": "2022-12-29T00:49:34.141715Z",
  "updated_at": "2022-12-29T00:49:34.141720Z",
  "is_labeled": false,
  "overlap": 1,
  "inner_id": 7,
  "total_annotations": 0,
  "cancelled_annotations": 0,
  "total_predictions": 0,
  "comment_count": 0,
  "unresolved_comment_count": 0,
  "last_comment_updated_at": null,
  "project": 1,
  "updated_by": null,
  "file_upload": 7,
  "comment_authors": [],
  "annotations": [],
  "predictions": []
}

My labeler implementation at the moment is this:

import json
import logging
from pathlib import Path
from uuid import uuid4  # assumption: the original import behind uuid() is not shown

from label_studio_ml.model import LabelStudioMLBase
from PIL import Image

logger = logging.getLogger(__name__)


class MyModel(LabelStudioMLBase):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.model = ...
        self.query_fn = ...

    def make_result_record(self, path: Path):
        # Reference: https://labelstud.io/tutorials/sklearn-text-classifier.html

        # query_fn returns the RLE-encoded mask for the image at `path`.
        mask = self.query_fn(path)
        image = Image.open(path)
        result = [
            {
                "original_width": image.width,
                "original_height": image.height,
                "image_rotation": 0,
                "value": {"format": "rle", "rle": [mask], "brushlabels": ["crack"]},
                "id": str(uuid4()),
                "from_name": "tag",
                "to_name": "image",
                "type": "brushlabels",
                "origin": "inference",
            }
        ]
        return {"result": result, "score": 1.0}

    def predict(self, tasks, **kwargs):
        predictions = []
        for task in tasks:
            logger.info("task:")
            logger.info("\n" + json.dumps(task, indent=2))
            # This is the problem: the task's image value is not a real local path.
            result = self.make_result_record(Path(task["data"]["image"]))
            predictions.append(result)
        return predictions

So where is /data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg? I suppose it is inside some storage that Label Studio spins up. How do I access it? (And why does the documentation not mention this?)

There are 2 best solutions below

1
Anas Arodake On

Those are the files Label Studio creates when you upload data to it.

You will find the file on your system if you search for the filename "cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg".

If you run Label Studio in Docker, you will find it under the same path.

In my case, on macOS, I found the .jpg file under: "/Users/user/Library/Application Support/label-studio/media/upload/1/1deaeb75-0f29e9df11dbc1cce55cb3529517dcd5.jpg"
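Comparing the task value with the path on disk suggests that the "/data/" prefix corresponds to Label Studio's media directory. A minimal sketch of that mapping is below. The helper name resolve_upload_path and the media_root value are my own assumptions for illustration, not part of Label Studio's API, and the media directory will differ per system.

```python
from pathlib import Path, PurePosixPath


def resolve_upload_path(task_image: str, media_root: Path) -> Path:
    """Map a task's "/data/upload/..." value to a file under media_root.

    Hypothetical helper: it assumes the media directory is exposed under
    the "/data/" prefix, as the two paths above suggest.
    """
    # Raises ValueError if the value does not start with "/data".
    relative = PurePosixPath(task_image).relative_to("/data")
    return media_root.joinpath(*relative.parts)


# Example media directory on macOS, taken from the path above; adjust to your system.
media_root = Path("/Users/user/Library/Application Support/label-studio/media")
local_path = resolve_upload_path(
    "/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg", media_root
)
print(local_path)
```

Inside predict, the resolved path could then be passed on in place of Path(task["data"]["image"]). This only works when the ML backend can read the same filesystem as Label Studio.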

I think you don't need to read them from a task yourself. If you create an ML backend and connect Label Studio to it, Label Studio will create the tasks and send them to the backend for you.

Take a look at this backend, for example: https://www.kaggle.com/code/yujiyamamoto/semi-auto-annotation-label-studio-and-tf2-od/notebook

0
user1879484 On

Assuming you are on localhost, it should be:

http://localhost:8080/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg

To access the file, you can use your API access token, which you can find in the Label Studio UI under Account & Settings → Access Token.

Then in your script:

from urllib.request import Request, urlopen

LABEL_STUDIO_URL = 'http://localhost:8080'
API_KEY = 'xxxxxxxxxxxxxxxxx'  # Your token

# Request the uploaded file, authenticating with the access token.
request = Request(LABEL_STUDIO_URL + '/data/upload/1/cfcf4486-0cdd1413486f7d923e6eff475c43809f.jpeg')
request.add_header('Authorization', 'Token %s' % API_KEY)
image_file = urlopen(request)  # Should give a "file-like" object.
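
To use this from a predict method, the same steps can be wrapped in small functions that take the image value straight from the task record. This is only a sketch: build_image_request and fetch_image_bytes are hypothetical helper names, and it assumes the task's image value is a path relative to the Label Studio URL, as in the question.

```python
from urllib.request import Request, urlopen


def build_image_request(base_url: str, task_image: str, api_key: str) -> Request:
    """Build an authenticated request for a task's image path (hypothetical helper)."""
    # Avoid a double slash when base_url ends with "/".
    request = Request(base_url.rstrip("/") + task_image)
    request.add_header("Authorization", f"Token {api_key}")
    return request


def fetch_image_bytes(base_url: str, task_image: str, api_key: str) -> bytes:
    """Download the image and return its raw bytes."""
    with urlopen(build_image_request(base_url, task_image, api_key)) as response:
        return response.read()
```

The returned bytes can be wrapped in io.BytesIO and passed to PIL's Image.open, for example Image.open(io.BytesIO(fetch_image_bytes(LABEL_STUDIO_URL, task["data"]["image"], API_KEY))).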

I hope that helps you or anyone encountering similar problems. :)