Finetuning for OCR #1

SSamDav · 2024-06-24T16:47:25Z

Hi,

Do you know how I should change the dataset creation for the OCR task?

Is it just the concatenation of the bbox special tokens with the text, or do I need to do more?

Thanks for the finetuning code!

andimarafioti · 2024-06-26T12:33:30Z

Hi SSam!

For OCR, you just need to pass <OCR> as the task_prompt.
There is another task called "OCR with region", for which you need to prepare your dataset a bit more. Your task_prompt would be <OCR_WITH_REGION>, and your labels should be dictionaries with the keys "labels" and "quad_boxes". You can then redraw the boxes like this:

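import random
import numpy as np
from PIL import ImageDraw

# Assumed palette: the snippet uses `colormap` without defining it.
colormap = ['blue', 'orange', 'green', 'purple', 'brown', 'pink', 'gray', 'olive', 'cyan', 'red']

def draw_ocr_bboxes(image, prediction):
    scale = 1  # change this if the image was resized after prediction
    draw = ImageDraw.Draw(image)
    bboxes, labels = prediction['quad_boxes'], prediction['labels']
    for box, label in zip(bboxes, labels):
        color = random.choice(colormap)
        # Each box is a flat list of 8 coordinates: x1, y1, ..., x4, y4
        new_box = (np.array(box) * scale).tolist()
        draw.polygon(new_box, width=3, outline=color)
        # Write the recognized text next to the first corner of the box
        draw.text((new_box[0] + 8, new_box[1] + 2), str(label), align="right", fill=color)
    return image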
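
Since the model is trained on text targets, a label dictionary like the one described above would probably have to be flattened into a single string before tokenization. The sketch below is only one way that might look. It assumes each text is followed by eight <loc_N> tokens and that pixel coordinates are quantized into 1000 bins relative to the image size. Both points are assumptions that should be checked against the processor's post-processing code, and the helper names `quantize` and `region_label_to_text` are hypothetical.

```python
def quantize(value, size, bins=1000):
    """Map a pixel coordinate to an integer bin in [0, bins - 1]."""
    return min(max(int(value * bins / size), 0), bins - 1)


def region_label_to_text(label, width, height):
    """Flatten {"labels": [...], "quad_boxes": [...]} into one target string.

    Assumed format (not confirmed in this thread): text followed by eight
    <loc_N> tokens, x values scaled by image width, y values by image height.
    """
    parts = []
    for text, quad in zip(label["labels"], label["quad_boxes"]):
        locs = []
        for i, coord in enumerate(quad):
            # Even positions are x coordinates, odd positions are y coordinates
            size = width if i % 2 == 0 else height
            locs.append(f"<loc_{quantize(coord, size)}>")
        parts.append(text + "".join(locs))
    return "".join(parts)


# Made-up example: one word on a 200x100 image
example = {"labels": ["Hello"], "quad_boxes": [[10, 20, 110, 20, 110, 60, 10, 60]]}
target = region_label_to_text(example, width=200, height=100)
print(target)
```

A quick sanity check is to run the processor's post-processing on a string built this way and confirm that it gives back the original labels and boxes.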

SSamDav · 2024-06-26T21:27:47Z

Thanks for the answer! But when creating the training dataset, should the model's target output be a dictionary or text?

MonolithFoundation · 2024-07-09T08:18:30Z

Don't use it: this model's max length is limited to 1024...

A simple paper can exceed 2048 characters.

ctgushiwei · 2024-07-25T06:21:01Z

@SSamDav @andimarafioti
Hello! For the OCR task, how should the data be labeled, and how should the Dataset class be defined?
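
For plain OCR, going by the earlier reply that only the <OCR> task prompt is needed, a dataset could simply pair each image with that prompt and its transcription. The following is a minimal sketch, not the repository's actual class: the name `OCRDataset`, the record keys "image" and "text", and the `load_image` hook are all hypothetical. With PyTorch you would typically subclass torch.utils.data.Dataset and pass a loader such as PIL.Image.open.

```python
class OCRDataset:
    """Pairs each image with the <OCR> task prompt and its transcription."""

    def __init__(self, records, load_image=None):
        # records: list of dicts with the hypothetical keys "image" and "text"
        self.records = records
        # load_image turns the stored reference (e.g. a path) into an image;
        # by default the reference is returned unchanged
        self.load_image = load_image or (lambda ref: ref)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        image = self.load_image(record["image"])
        # (task prompt, target text, image) for a collate function to process
        return "<OCR>", record["text"], image


# Made-up record for illustration
records = [{"image": "page_001.png", "text": "Invoice 2024"}]
dataset = OCRDataset(records)
prompt, target_text, image = dataset[0]
print(prompt, target_text, image)
```

A collate function would then pass the prompts and images to the processor and tokenize the target texts as labels.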
