Follow the steps below to train an Azure AI Document Intelligence custom extraction machine learning (ML) model.
After you deploy the Azure resources, your Document Intelligence resource is ready for you to start creating a new Document Intelligence ML model. In your cloned repository, navigate from the project root directory to `Data/samples/train`.
You will see two folders, `contoso_set_1` and `contoso_set_2`, that contain sample training forms, as illustrated below.
- Go to the Azure Portal and select the Azure Data Lake Storage account that was created by the deployment script. The name of this account should be `adls`.
- In the left menu pane, select **Data storage**, then **Containers**, and go to the `samples` container.
- Create a new folder and name it `train`.
- In the `train` folder, create two folders, one named `contoso_set_1` and the other named `contoso_set_2`.
- Upload the sample labeling files in `Data/samples/train/contoso_set_1` and `Data/samples/train/contoso_set_2` into the corresponding folders.

You now have two full sets of pre-labeled data to create the machine learning models.
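If you prefer to script the upload instead of using the portal, the following is a minimal sketch, assuming the `azure-storage-blob` package and a connection string for the storage account. The functions `plan_uploads` and `upload_training_sets` are hypothetical helpers, not part of this repository. Each local file under `Data/samples/train/<set>/` is mapped to a blob name with the same `train/<set>/` prefix, so the prefix plays the role of the folders created manually in the portal.

```python
from pathlib import Path

# Training-set folder names used in the steps above.
TRAINING_SETS = ("contoso_set_1", "contoso_set_2")


def plan_uploads(local_root, sets=TRAINING_SETS, dest_prefix="train"):
    """Pair each local file under <local_root>/<set>/ with a blob name such as
    'train/contoso_set_1/<relative path>'. Directories are skipped."""
    root = Path(local_root)
    plan = []
    for set_name in sets:
        set_dir = root / set_name
        for path in sorted(set_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(set_dir).as_posix()
                plan.append((path, f"{dest_prefix}/{set_name}/{relative}"))
    return plan


def upload_training_sets(connection_string, local_root, container="samples"):
    """Upload every planned file to the container, overwriting existing blobs.

    Assumption: key-based access via a connection string is enabled on the
    storage account created by the deployment script."""
    # Imported here so plan_uploads() works even without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)
    for local_path, blob_name in plan_uploads(local_root):
        with open(local_path, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
```

For example, calling `upload_training_sets(conn_str, "Data/samples/train")` from the repository root would upload both sets. Keeping the planning step separate from the upload makes it easy to check the target blob names before anything is written.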
In this step, you will train custom Azure AI Document Intelligence extraction models and merge them into a composite model. For more information, refer to the Azure documentation article *Compose custom models*.
- Go to Document Intelligence Studio, scroll down to **Custom Extraction Model**, and select **Create new**, as illustrated below.
- Select **+ Create a project** to create a project.
- Enter a project name, for example `SafetyFormProject-Set-1` or any other name of your choice.
- Enter a project description, for example `Custom document intelligence model with samples contoso_set_1`, and click **Continue**.
- Select your subscription, resource group, and Document Intelligence resource.
- Select the latest non-preview API version.
- When you are prompted for the training data source, as illustrated below, select your subscription, then the resource group and the Azure storage account created by the deployment scripts. Enter `samples` in the **Blob container** field and `train/contoso_set_1` in the **Folder path** field, then click **Continue**.
- Review the information and click **Create project**. This step connects Document Intelligence Studio (formerly Form Recognizer Studio) to the Azure Data Lake Storage container in your subscription so that it can access the training data.
- After the project is created, the forms appear with their OCR results and labeled field key-value pairs, as illustrated below. Click **Train** in the upper-right corner.
- Fill in the information as illustrated below (for example, model ID `contoso-set-1`, the name used in the compose step), set the **Build Mode** dropdown to **Template**, and then click **Train**.
- Once training on the `contoso_set_1` samples finishes, the model appears on the **Models** tab with a confidence score for each field, as illustrated below.
- Train a second model on the files stored in `train/contoso_set_2` by repeating the steps above to create a new project and model. Name the second model `contoso-set-2` or a name of your choice.
- Click **Models** in your project to see the list of models already created. You can now merge the individual models into a composite model: select `contoso-set-1` and `contoso-set-2`, then click **Compose**. When prompted for a new model name and description, name the model `contoso-safety-forms`, enter a description, and click **Compose**.
- Your composite model ID, `contoso-safety-forms`, now appears in the **Model ID** list, as illustrated below.
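The Studio training and compose steps can also be sketched with the Python SDK. This is a minimal sketch, assuming the `azure-ai-formrecognizer` 3.3.x package, the endpoint and key of your Document Intelligence resource, and a SAS URL for the `samples` container with read and list permissions. The names `build_and_compose`, `model_id_for`, and `folder_prefix` are hypothetical. It builds one template model per training set with the model IDs used above, prints the per-field confidence reported for each document type, and composes the two models into `contoso-safety-forms`. The SDK uses its own default API version, which may differ from the version you select in Studio.

```python
# Training sets and composite model ID taken from the steps above.
SETS = ("contoso_set_1", "contoso_set_2")
COMPOSED_MODEL_ID = "contoso-safety-forms"


def model_id_for(set_name):
    """Derive the component model ID used above, e.g. 'contoso-set-1'."""
    return set_name.replace("_", "-")


def folder_prefix(set_name):
    """Blob prefix of a training set, matching the Folder path field in Studio."""
    return f"train/{set_name}/"


def build_and_compose(endpoint, key, container_sas_url):
    """Build a template model per set, then compose them. Returns the composite ID.

    Assumption: container_sas_url grants read and list access to 'samples'."""
    # Imported here so the helpers above work without the SDK installed.
    from azure.ai.formrecognizer import DocumentModelAdministrationClient, ModelBuildMode
    from azure.core.credentials import AzureKeyCredential

    client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))

    component_ids = []
    for set_name in SETS:
        # Template mode, as selected in the Build Mode dropdown in Studio.
        poller = client.begin_build_document_model(
            ModelBuildMode.TEMPLATE,
            blob_container_url=container_sas_url,
            prefix=folder_prefix(set_name),
            model_id=model_id_for(set_name),
            description=f"Custom document intelligence model with samples {set_name}",
        )
        model = poller.result()  # blocks until training finishes
        # Per-field confidence, comparable to what the Models tab shows.
        for doc_type, details in model.doc_types.items():
            print(model.model_id, doc_type, details.field_confidence)
        component_ids.append(model.model_id)

    composed = client.begin_compose_document_model(
        component_ids,
        model_id=COMPOSED_MODEL_ID,
        description="Composite model for contoso safety forms",
    ).result()
    return composed.model_id
```

Training and composition are long-running operations, so each `begin_*` call returns a poller and `.result()` waits for it to complete before the next step runs.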
- If you named your composite model `contoso-safety-forms`, you can move on to 3_solution_testing.
- If you did NOT name your composite model `contoso-safety-forms`, follow the instructions below:
  - From the Azure Portal, open the resource group you deployed this solution to.
  - Find the Azure Functions app and click the resource to open its overview page.
  - In the left panel, under **Settings**, click **Environment variables**. On the **App settings** tab, locate `CUSTOM_BUILT_MODEL_ID`, click it, and replace the default value with your composite model ID.
  - Click **OK** and then **Save**. Your Azure Functions app will then use your Document Intelligence extraction model.
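The portal change to `CUSTOM_BUILT_MODEL_ID` can also be scripted. The following is a minimal sketch, assuming the `azure-identity` and `azure-mgmt-web` packages, credentials that `DefaultAzureCredential` can resolve, and your subscription ID, resource group, and Functions app name. The helpers `with_model_id` and `set_model_id` are hypothetical.

```python
# App setting named in the steps above.
SETTING_NAME = "CUSTOM_BUILT_MODEL_ID"


def with_model_id(app_settings, model_id):
    """Return a copy of the app settings with only CUSTOM_BUILT_MODEL_ID changed."""
    if not model_id:
        raise ValueError("model_id must be a non-empty string")
    updated = dict(app_settings)
    updated[SETTING_NAME] = model_id
    return updated


def set_model_id(subscription_id, resource_group, app_name, model_id):
    """Point the Functions app at a composite model with a custom name."""
    # Imported here so with_model_id() works without the SDKs installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.web import WebSiteManagementClient
    from azure.mgmt.web.models import StringDictionary

    client = WebSiteManagementClient(DefaultAzureCredential(), subscription_id)
    # Read the current settings first so every existing value is written back.
    current = client.web_apps.list_application_settings(resource_group, app_name)
    updated = with_model_id(current.properties or {}, model_id)
    client.web_apps.update_application_settings(
        resource_group, app_name, StringDictionary(properties=updated)
    )
```

The read-modify-write pattern matters here because `update_application_settings` replaces the app's full collection of settings rather than merging a single value into it.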