Architecture

This document describes the architecture of a website designed to annotate data automatically using artificial intelligence.

The architecture focuses on efficient data management and storage, and uses machine learning to annotate data automatically.

Our solution runs as a microservice that supports the Label Studio tool.

Modules:

  1. E-Motion Website:

       1. Responsible for the visual layer of the application regarding ML models (REST API module).

       2. Allows ML models to be trained on annotated data and used to generate new annotations.

       3. Allows the user to switch to Label Studio to correct annotations generated by the ML model.

  2. Label Studio Frontend:

       1. Responsible for the visual layer of the application regarding manual data annotation.

       2. Provides the ability to correct annotations generated by the ML model.

       3. Loads and saves annotation information using JSON files with the appropriate structure.

  3. REST API:

       1. Responsible for communicating with the Cloud Service and Local Database to train the corresponding ML models.

  4. Database:

       1. Stores user account data, saved ML models, and URL links to user projects stored in the cloud.

  5. Cloud Database:

       1. Stores data for user annotations.

  6. ML models:

       1. Responsible for generating new annotations based on manually annotated data.
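
As a rough illustration of how the Database and the Cloud Database divide responsibility, the following Python sketch models the records listed above. All class, field, and function names (`UserAccount`, `SavedModel`, `project_urls`, `resolve_annotations`) are hypothetical and not taken from the project, and the cloud store is replaced by an in-memory dictionary. The only point is that the local database keeps links, while the annotation data itself lives in the cloud.

```python
from dataclasses import dataclass, field


@dataclass
class SavedModel:
    """A trained ML model kept in the local database (hypothetical fields)."""
    name: str
    weights_path: str


@dataclass
class UserAccount:
    """What the local database stores per user (hypothetical fields)."""
    email: str
    models: list[SavedModel] = field(default_factory=list)
    # Only URLs are stored locally; the annotations stay in the cloud database.
    project_urls: list[str] = field(default_factory=list)


def resolve_annotations(account: UserAccount, cloud: dict[str, list[dict]]) -> list[dict]:
    """Follow each stored project URL to the annotations held in the cloud.

    `cloud` is an in-memory stand-in for the cloud database, keyed by URL.
    """
    annotations: list[dict] = []
    for url in account.project_urls:
        annotations.extend(cloud.get(url, []))
    return annotations
```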

User functional perspective:

  1. Annotating data initially using Label Studio Front-end.

  2. Training the ML model based on the created annotations.

  3. Generating the rest of the annotations using the ML model.

  4. Improving selected annotations using Label Studio Front-end.

  5. Saving the annotations to the database.
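
The five steps above can be sketched as one small Python function. This is a toy illustration, not the project's implementation. The word-vote "model" is a deliberately trivial stand-in for the real ML models, and the function names, the `"neutral"` default label, and the text-with-emotion-label data shape are all assumptions made for the example.

```python
from collections import Counter


def train(labeled):
    """Toy stand-in for ML training: count label votes for every word."""
    votes = {}
    for text, label in labeled:
        for word in text.lower().split():
            votes.setdefault(word, Counter())[label] += 1
    return votes


def predict(model, text, default="neutral"):
    """Label a text by summing the word votes learned in train()."""
    total = Counter()
    for word in text.lower().split():
        total.update(model.get(word, {}))
    return total.most_common(1)[0][0] if total else default


def annotation_cycle(manual, unlabeled, corrections):
    """Run steps 2-5 on data that was manually annotated in step 1."""
    model = train(manual)                                   # step 2: train
    generated = {t: predict(model, t) for t in unlabeled}   # step 3: generate
    generated.update(corrections)                           # step 4: user fixes
    return {**dict(manual), **generated}                    # step 5: data to save
```

In this sketch `corrections` plays the role of the edits a user would make in Label Studio Front-end: any text listed there overrides the generated label before the result is saved.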

Communication with Label Studio Front-end:

Label Studio is an open-source tool for manual data annotation.

It includes Label Studio Front-end, which allows us to integrate this tool into our project.

Label Studio uses JSON files to represent the state of data annotation.

Because of this, exchanging information between the ML models (training and annotation generation) and already finished annotations requires serializing and deserializing JSON files.

In our solution, the annotations will be generated in the JSON convention used by Label Studio, after which the user will be able to manually load them into Label Studio Front-end to correct errors.
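
A minimal sketch of this serialization step, assuming a text classification project whose labeling configuration uses a `Choices` control named `emotion` on a `Text` object named `text`. Those two names, the `sketch-v1` model version, and the helper functions are assumptions for illustration. The overall shape (a task with `data` and a `predictions` list whose `result` entries carry `from_name`, `to_name`, `type`, and `value`) follows the pre-annotation format documented by Label Studio.

```python
import json


def to_label_studio_task(text, label, score, model_version="sketch-v1",
                         from_name="emotion", to_name="text"):
    """Wrap one model output as a Label Studio task with a prediction.

    from_name / to_name must match the tag names in the project's labeling
    configuration; the defaults here are assumed, not taken from the project.
    """
    return {
        "data": {"text": text},
        "predictions": [{
            "model_version": model_version,
            "score": score,
            "result": [{
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [label]},
            }],
        }],
    }


def serialize_tasks(items):
    """Serialize (text, label, score) triples into a JSON string of tasks."""
    tasks = [to_label_studio_task(text, label, score) for text, label, score in items]
    return json.dumps(tasks, indent=2)
```

The resulting JSON string can be written to a file and loaded into Label Studio, where the predictions appear as pre-filled annotations that the user can correct.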

The final form of the annotated data is saved in the same JSON convention and can be used to train another model or simply stored.
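
Going the other way, a hedged sketch of deserializing corrected annotations into training pairs might look as follows. It assumes a Label Studio JSON export of a text classification project, where each task has a `data` object and an `annotations` list with `result` entries of type `choices`. The function name `load_training_pairs` and the `text` data key are assumptions for illustration.

```python
import json


def load_training_pairs(export_json, text_key="text"):
    """Turn a Label Studio JSON export into (text, label) training pairs."""
    pairs = []
    for task in json.loads(export_json):
        for annotation in task.get("annotations", []):
            if annotation.get("was_cancelled"):
                continue  # the annotator skipped this task, so there is no label
            for region in annotation.get("result", []):
                if region.get("type") == "choices":
                    for choice in region["value"]["choices"]:
                        pairs.append((task["data"][text_key], choice))
    return pairs
```

The returned pairs can then be passed to whatever training routine the next model uses.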

Authorization: Label Studio has its own internal email authorization system, which means that the user needs both an account with us and a Label Studio account to use the functionality provided by Label Studio Front-end.

C4 diagrams:

Context

_images/context.png

Container

_images/container.png

Component: WebApp

_images/Component_WebApp.png

Code: WebApp

_images/Api_uml_class.png

Component: Frontend components interaction

_images/frontend_uml_activity.png

Code: React components

_images/frontend_uml_class.png

Component: ML Component

_images/ml_component.png

Code: ML Code

_images/ml_uml.png