---
description: Training Custom Object Detector Step by Step
---

🤖 TensorFlow Object Detection API

🌱 Introduction

  • ✨ The TensorFlow Object Detection API is a powerful tool that lets us create custom object detectors based on pre-trained, fine-tuned models, even if we don't have a strong AI background or deep TensorFlow knowledge.
  • 💁‍♀️ Building on pre-trained models saves us a lot of time and labor, since we are reusing models that may have been trained for weeks on very powerful machines; this principle is called Transfer Learning.
  • 🗃️ As a data set I will show you how to use the OpenImages data set and convert its data to a TensorFlow-friendly format.
  • 🎀 You can find this article on Medium too.

🚩 Development Pipeline

  1. 👩‍💻 Environment Preparation
  2. 🖼️ Image Acquiring
  3. 🤹‍♀️ Image Organization
  4. 🤖 Model Selecting
  5. 👩‍🔧 Model Configuration
  6. 👶 Training
  7. 👮‍♀️ Evaluation
  8. 👒 Model Exporting
  9. 📱 Converting to tflite

{% hint style="info" %} 🤕 If you get errors while applying the instructions, check out the 🐞 Common Issues section at the end of the article {% endhint %}

👩‍💻 Environment Preparation

🔸 Environment Info

| 💻 Platform | 🏷️ Version |
| :--- | :--- |
| Python | 3.7 |
| TensorFlow | 1.15 |

🥦 Conda env Setting

🔮 Create new env

  • 🥦 Install Anaconda
  • 💻 Open cmd and run:
# conda create -n <ENV_NAME> python=<REQUIRED_VERSION>
conda create -n tf1 python=3.7

▶️ Activate the new env

# conda activate <ENV_NAME>
conda activate tf1

🔽 Install Packages

💥 GPU vs CPU Computing

| 🚙 CPU | 🚀 GPU |
| :--- | :--- |
| Brain of the computer | Brawn of the computer |
| Very few complex cores | Hundreds of simpler cores with a parallel architecture |
| Single-thread performance optimization | Thousands of concurrent hardware threads |
| Can do a bit of everything, but not great at much | Good for math-heavy processes |

🚀 Installing TensorFlow

{% tabs %} {% tab title="🚀 GPU" %}

conda install tensorflow-gpu=1.15

{% endtab %}

{% tab title="🚙 CPU" %}

conda install tensorflow=1.15

{% endtab %} {% endtabs %}
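🧐 If you are not sure which build ended up in your env, a quick sanity check (my own sketch, not part of the official steps) is to print the version and see whether a GPU is visible:

# run inside the activated env
import tensorflow as tf

print(tf.__version__)              # should print 1.15.x
print(tf.test.is_gpu_available())  # True only if the GPU build found a usable device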

📦 Installing other packages

conda install pillow Cython lxml jupyter matplotlib
conda install -c anaconda protobuf

🤖 Downloading models repository

🤸‍♀️ Cloning from GitHub

  • A repository that contains the required utils for the training and evaluation processes
  • Open CMD in the E: drive and run:
# note that every time you open CMD you have 
# to activate your env again by running: 
# under E:\>
conda activate tf1
git clone https://github.com/tensorflow/models.git
cd models/research

{% hint style="warning" %} ๐Ÿง I assume that you are running your commands under E disk, {% endhint %}

๐Ÿ”ƒ Compiling Protobufs

{% tabs %} {% tab title="๐Ÿ’ป Windows" %}

# under (tf1) E:\models\research>
for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.

{% endtab %}

{% tab title="🐧 Linux" %}

# under /models/research
$ protoc object_detection/protos/*.proto --python_out=.

{% endtab %} {% endtabs %}

📦 Compiling Packages

# under (tf1) E:\models\research>
python setup.py build
python setup.py install

🚩 Setting Python Path Temporarily

{% tabs %} {% tab title="💻 Windows" %}

# under (tf1) E:\models\research> or anywhere 😅
set PYTHONPATH=E:\models\research;E:\models\research\slim

{% endtab %}

{% tab title="🐧 Linux" %}

# under /models/research
$ export PYTHONPATH=`pwd`:`pwd`/slim

{% endtab %} {% endtabs %}

{% hint style="info" %} ๐Ÿ‘ฎโ€โ™€๏ธ Every time you open CMD you have to set PYTHONPATH again {% endhint %}

๐Ÿ‘ฉโ€๐Ÿ”ฌ Installation Test

๐Ÿง Check out that every thing is done

๐Ÿ’ป Command

# under (tf1) E:\models\research>
python object_detection/builders/model_builder_tf1_test.py

🎉 Expected Output

Ran 17 tests in 0.833s

OK (skipped=1)

๐Ÿ–ผ๏ธ Image Acquiring

๐Ÿ‘ฎโ€โ™€๏ธ Directory Structure

  • ๐Ÿ—๏ธ I suppose that you created a structure like:
E:
|___ models
|___ demo
      |___ annotations
      |___ eval
      |___ images
      |___ inference
      |___ OIDv4_ToolKit
      |___ OpenImagesTool
      |___ pre_trainded_model
      |___ scripts
      |___ training
๐Ÿ“‚ Folder ๐Ÿ“ƒ Description
๐Ÿค– models the repo here
๐Ÿ“„ annotations will contain generated .csv and .record files
๐Ÿ‘ฎโ€โ™€๏ธ eval will contain results of evaluation
๐Ÿ–ผ๏ธ images will contain image data set
โ–ถ๏ธ inference will contain exported models after training
๐Ÿ”ฝ OIDv4_ToolKit the repo here (OpenImages Downloader)
๐Ÿ‘ฉโ€๐Ÿ”ง OpenImagesTool the repo here (OpenImages Organizer)
๐Ÿ‘ฉโ€๐Ÿซpre_trained_model will contain files of TensorFlow model that we will retrain
๐Ÿ‘ฉโ€๐Ÿ’ป scripts will contain scripts that we will use for pre-processing and training processes
๐Ÿšดโ€โ™€๏ธ training will contain generated check points during training
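💻 If you prefer to create these folders from a script instead of by hand, a small sketch like the following (my own convenience, not a required step; the models repo and the two toolkits are cloned later) does the job:

# create the demo sub-folders listed above
import os

for folder in ['annotations', 'eval', 'images', 'inference',
               'pre_trained_model', 'scripts', 'training']:
    os.makedirs(os.path.join('E:/demo', folder), exist_ok=True)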

🚀 OpenImages Dataset

  • 🕵️‍♀️ You can get images in various ways
  • 👩‍🏫 I will show the process of organizing the OpenImages data set
  • 🗃️ OpenImages is a huge data set that contains annotated images of 600 object classes
  • 🔍 You can explore the images by category from here

{% embed url="https://storage.googleapis.com/openimages/web/index.html" caption="" %}

🎨 Downloading By Category

OIDv4_ToolKit is a tool that we can use to download the OpenImages data set by category and by set (test, train, validation).

💻 To clone and build the project, open CMD and run:

# under (tf1) E:\demo>
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit

# under (tf1) E:\demo\OIDv4_ToolKit>
pip install -r requirements.txt

โฌ To start downloading by category:

# python main.py downloader --classes <OBJECT_LIST> --type_csv <TYPE>
# TYPE: all | test | train | validation 
# under (tf1) E:\demo\OIDv4_ToolKit>
python main.py downloader --classes Apple Orange --type_csv validation

{% hint style="warning" %} ๐Ÿ‘ฎโ€โ™€๏ธ If object name consists of 2 parts then write it with '_', e.g. Bell_pepper {% endhint %}

🤹‍♀️ Image Organization

🔮 OpenImagesTool

  • 👩‍💻 OpenImagesTool is a tool to convert OpenImages images and annotations to a TensorFlow-friendly structure.
  • 🙄 OpenImages provides annotations as .txt files in a format like <OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>, which is not compatible with TensorFlow, since it requires the VOC annotation format (see the illustrative example below)
  • 💫 To do that conversion we can do the following
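🔍 For illustration, here is the same hypothetical box in both formats — the OpenImages-style .txt line and the VOC-style .xml that the TensorFlow scripts expect (all values are made up):

# OpenImages-style annotation: one line per box
Apple 12 34 256 310

<!-- VOC-style annotation for the same box -->
<annotation>
  <filename>0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Apple</name>
    <bndbox>
      <xmin>12</xmin><ymin>34</ymin><xmax>256</xmax><ymax>310</ymax>
    </bndbox>
  </object>
</annotation>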

💻 To clone and build the project, open CMD and run:

# under (tf1) E:\demo>
git clone https://github.com/asmaamirkhan/OpenImagesTool.git
cd OpenImagesTool/src

💻 Applying Organizing

🚀 Now we will convert the images and annotations that we have downloaded and save them to the images folder

# under (tf1) E:\demo\OpenImagesTool\src> 
# python script.py -i <INPUT_PATH> -o <OUTPUT_PATH>
python script.py -i E:\demo\OIDv4_ToolKit\OID\Dataset -o E:\demo\images

{% hint style="info" %} ๐Ÿ‘ฉโ€๐Ÿ”ฌ OpenImagesTool adds validation images to training set by default, if you wand to disable this behavior you can add -v flag to the command. {% endhint %}

๐Ÿท๏ธ Creating Label Map

  • โ›“๏ธ label_map.pbtxt is a file that maps object names to corresponded IDs
  • โž• Create label_map.pbtxtfile under annotations folder and open it in a text editor
  • ๐Ÿ–Š๏ธ Write your objects names and IDs in the following format
item {
    id: 1
    name: 'Hamster'
}

item {
    id: 2
    name: 'Apple'
}

{% hint style="info" %} ๐Ÿ‘ฎโ€โ™€๏ธ id:0 is reserved for background, so don' t use it

๐Ÿž Related error: ValueError: Label map id 0 is reserved for the background label {% endhint %}

๐Ÿญ Generating CSV Files

  • ๐Ÿ”„ Now we have to convert .xml files to csv file
  • ๐Ÿ”ป Download the script xml_to_csv.py script and save it under scripts folder
  • ๐Ÿ’ป Open CMD and run:

👩‍🔬 Generating train csv file

# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\train -o E:\demo\annotations\train_labels.csv

👩‍🔬 Generating test csv file

# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\test -o E:\demo\annotations\test_labels.csv

👩‍🏭 Generating TF Records

  • 🙇‍♀️ Now we will generate the tfrecords that will be used in the training process
  • 🔻 Download the generate_tfrecords.py script and save it under the scripts folder

👩‍🔬 Generating train tfrecord

# under (tf1) E:\demo\scripts>
# python generate_tfrecords.py --label_map=<PATH_TO_LABEL_MAP> 
# --csv_input=<PATH_TO_CSV_FILE> --img_path=<PATH_TO_IMAGE_FOLDER>
# --output_path=<PATH_TO_OUTPUT_FILE>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\train_labels.csv --img_path=E:\demo\images\train --output_path=E:\demo\annotations\train.record

👩‍🔬 Generating test tfrecord

# under (tf1) E:\demo\scripts>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\test_labels.csv --img_path=E:\demo\images\test --output_path=E:\demo\annotations\test.record
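🧐 To make sure the records were written correctly, a small check (my own sketch, TF 1.x API) is to count the examples in each file — the test count is also handy later for num_examples in the eval config:

# count the examples inside the generated record files
import tensorflow as tf

for path in ['E:/demo/annotations/train.record', 'E:/demo/annotations/test.record']:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print(path, count)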

🤖 Model Selecting

  • 🎉 The TensorFlow Object Detection Zoo provides a lot of pre-trained models
  • 🕵️‍♀️ Models differ in terms of accuracy and speed; you can select a suitable model according to your priorities
  • 💾 Select a model, extract it and save it under the pre_trained_model folder
  • 👀 Check out my notes here to get insight into the differences between popular models

👩‍🔧 Model Configuration

⬇️ Downloading config File

  • 😎 We have downloaded the model (pre-trained weights), but now we have to download the configuration file that contains the training parameters and settings
  • 👮‍♀️ Every model in the TensorFlow Object Detection Zoo has a configuration file presented here
  • 💾 Download the config file that corresponds to the model you have selected and save it under the training folder

👩‍🔬 Updating config File

You have to update the following lines:

{% hint style="info" %} ๐Ÿ™„ Take a look at Loss exploding issue {% endhint %}

# number of classes
num_classes: 1 # set it to the total number of classes you have

# path of the pre-trained checkpoint
fine_tune_checkpoint: "E:/demo/pre_trained_model/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18/model.ckpt"

# path to the train tfrecord
tf_record_input_reader {
    input_path: "E:/demo/annotations/train.record"
}

# number of images that will be used in the evaluation process
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  # I suggest setting it to the total number of test images to get accurate results
  num_examples: 11193
}

eval_input_reader: {
  tf_record_input_reader {
    # path to the test tfrecord
    input_path: "E:/demo/annotations/test.record"
  }
  # path to the label map
  label_map_path: "E:/demo/annotations/label_map.pbtxt"
  # set it to true if you want to shuffle the test set at each evaluation
  shuffle: false
  num_readers: 1
}

{% hint style="info" %} ๐Ÿคนโ€โ™€๏ธ If you give the whole test set to evaluation process then shuffle functionality won't affect the results, it will only give you different examples on TensorBoard {% endhint %}

👶 Training

  • 🎉 Now we have done all the preparations
  • 🚀 Let the computer start learning
  • 💻 Open CMD and run:
# under (tf1) E:\models\research\object_detection\legacy> 
# python train.py --train_dir=<DIRECTORY_TO_SAVE_CHECKPOINTS> 
# --pipeline_config_path=<PATH_TO_CONFIG_FILE>
python train.py --train_dir=E:/demo/training --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config
  • 🕐 This process will take a long time (you can take a nap 🤭, but a long one 🙄)
  • 🕵️‍♀️ While the model is being trained you will see the loss values in the CMD window
  • ✋ You can stop the process when the loss reaches a good value (under 1)

👮‍♀️ Evaluation

🎳 Evaluating Script

  • 🤭 After the training process is done, let's do an exam to see how well (or how badly 🙄) our model is doing
  • 🎩 The following command will run the model on the whole test set and then print the results, so that we can do error analysis
  • 💻 So, open CMD and run:
# under (tf1) E:\models\research\object_detection\legacy> 
# python eval.py --logtostderr --pipeline_config_path=<PATH_TO_CONFIG_FILE>
# --checkpoint_dir=<DIRECTORY_OF_CHECKPOINTS> --eval_dir=<DIRECTORY_TO_SAVE_EVAL_RESULTS>
python eval.py --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --checkpoint_dir=E:/demo/training --eval_dir=E:/demo/eval

👀 Visualizing Results

  • ✨ To see the results as charts and images we can use TensorBoard for better analysis
  • 💻 Open CMD and run:

👩‍🏫 Training Values Visualization

  • 🧐 Here you can see graphs of the loss, the learning rate and other values
  • 🤓 And much more (you can investigate the tabs at the top)
  • 😋 It is feasible to use it while training (and exciting 🤩)
# under (tf1) E:\>
tensorboard --logdir=E:/demo/training

👮‍♀️ Evaluation Values Visualization

  • 👀 Here you can see images from your test set with the corresponding predictions
  • 🤓 And much more (you can inspect the tabs at the top)
  • ❗ You must use this after running the evaluation script
# under (tf1) E:\>
tensorboard --logdir=E:/demo/eval
  • 🔍 See the visualized results at localhost:6006
  • 🧐 You can also inspect the numerical values of the report in the terminal; example output:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.708
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.868
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.767
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.779
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.703
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.824
  • 🎨 If you want a metric report for each class, you have to change the evaluation protocol to PASCAL metrics by configuring metrics_set in the .config file:
eval_config: {
  ...
  metrics_set: "weighted_pascal_voc_detection_metrics"
  ...
}

👒 Model Exporting

  • 🔧 After the training and evaluation processes are done, we have to export the model in a format that we can use
  • 🦺 For now we only have checkpoints, so we have to export a .pb file
  • 💻 So, open CMD and run:
# under (tf1) E:\models\research\object_detection>
# python export_inference_graph.py --input_type image_tensor 
# --pipeline_config_path <PATH_TO_CONFIG_FILE> 
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_inference_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant
  • If you are using SSD and planning to convert the model to tflite later, you have to run:
# under (tf1) E:\models\research\object_detection>
# python export_tflite_ssd_graph.py --input_type image_tensor 
# --pipeline_config_path <PATH_TO_CONFIG_FILE> 
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_tflite_ssd_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant
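🧪 The steps above are all you need for export, but if you want to try the exported frozen graph quickly in Python, a minimal sketch (assuming the default tensor names produced by export_inference_graph.py and a test image of your own) looks like this:

# run the frozen graph on a single image (TF 1.x)
import numpy as np
import tensorflow as tf
from PIL import Image

PATH_TO_PB = 'E:/demo/inference/ssd_v1_quant/frozen_inference_graph.pb'

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_PB, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    # image must be RGB, shape [1, height, width, 3], dtype uint8
    image = np.expand_dims(np.array(Image.open('test.jpg').convert('RGB')), axis=0)
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0',
         'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image})
    print(scores[0][:5], classes[0][:5])  # top 5 detections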

📱 Converting to tflite

  • 💁‍♀️ If you want to use the model in mobile apps or on tflite-supported embedded devices, you have to convert the .pb file to a .tflite file

📙 About TFLite

  • 📱 TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded devices.
  • 🧐 It enables on-device machine learning inference with low latency and a small binary size.
  • 😎 TensorFlow Lite uses many techniques for this, such as quantized kernels that allow smaller and faster (fixed-point math) models.
  • 📍 Official site

๐Ÿซ Converting Command

  • ๐Ÿ’ป To apply converting open CMD and run:
# under (tf1) E:\>
# toco --graph_def_file=<PATH_TO_PB_FILE>
# --output_file=<PATH_TO_SAVE> --input_shapes=<INPUT_SHAPES>
# --input_arrays=<INPUT_ARRAYS> --output_arrays=<OUTPUT_ARRAYS>
# --inference_type=<QUANTIZED_UINT8|FLOAT> --change_concat_input_ranges=<true|false>
# --allow_custom_ops
# args for QUANTIZED_UINT8 inference
# --mean_values=<MEAN_VALUES> --std_dev_values=<STD_DEV_VALUES> 
toco --graph_def_file=E:\demo\inference\ssd_v1_quant\tflite_graph.pb --output_file=E:\demo\tflite\ssd_mobilenet.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_custom_ops
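🧐 A quick way to check that the converted file actually loads and runs is a minimal sketch like the following (my own check, using the TF 1.15 tf.lite.Interpreter; if the SSD post-process custom op is missing from your build, the invoke step will raise an error):

# load the converted model and push one dummy (all-zeros) quantized input through it
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='E:/demo/tflite/ssd_mobilenet.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # replace with a real image
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

# for the SSD post-process op the outputs are typically boxes, classes, scores, count
print([out['name'] for out in output_details])
print(interpreter.get_tensor(output_details[0]['index']).shape)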

๐Ÿž Common Issues

๐Ÿฅ… nets module issue

ModuleNotFoundError: No module named 'nets'

This means that there is a problem with setting PYTHONPATH; try running:

(tf1) E:\models\research>set PYTHONPATH=E:\models\research;E:\models\research\slim

๐Ÿ—ƒ๏ธ tf_slim module issue

ModuleNotFoundError: No module named 'tf_slim'

This means that tf_slim module is not installed, try to run:

(tf1) E:\models\research>pip install tf_slim

๐Ÿ—ƒ๏ธ Allocation error

2020-08-11 17:44:00.357710: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                 10661327
InUse:                 10656704
MaxInUse:              10657688
NumAllocs:                 2959
MaxAllocSize:           3045064

For me it was fixed by reducing batch_size in the .config file; it depends on your computational resources

train_config: {
  ....
  batch_size: 128
  ....
}

โ— no such file or directory error

train.py tensorflow.python.framework.errors_impl.notfounderror no such file or directory

๐Ÿคฏ LossTensor is inf issue

LossTensor is inf or nan. : Tensor had NaN values

  • 👀 A related discussion is here; it is commonly an annotation problem
  • 🙄 Maybe there are some bounding boxes outside the image boundaries
  • 🤯 The solution for me was reducing the batch size in the .config file

🙄 Ground truth issue

The following classes have no ground truth examples

  • 👀 A related discussion is here
  • 👩‍🔧 For me it was a misspelling issue in the label_map file
  • 🙄 Pay attention to lowercase and capital letters

๐Ÿท๏ธ labelmap issue

ValueError: Label map id 0 is reserved for the background label

  • 👮‍♀️ id: 0 is reserved for the background; we can not use it for objects
  • 🆔 Start IDs from 1

🔦 No Variable to Save issue

Value Error: No Variable to Save

  • 👀 A related solution is here
  • 👩‍🔧 Adding the following line to the .config file solved the problem
train_config: {
  ...
  fine_tune_checkpoint_type:  "detection"
  ...
}

🧪 pycocotools module issue

ModuleNotFoundError: No module named 'pycocotools'

{% tabs %} {% tab title="💻 Windows" %}

  • 👀 A related discussion is here
  • 👩‍🔧 Applying the installation instructions provided here solved the problem for me (on Windows 10) {% endtab %}

{% tab title="🐧 Linux" %}

$ conda install -c conda-forge pycocotools

{% endtab %} {% endtabs %}

🥴 pycocotools type error issue

pycocotools typeerror: object of type cannot be safely interpreted as an integer.

  • 👩‍🔧 I solved the problem by editing the following lines in the cocoeval.py script under the pycocotools package (by adding casting)
  • 👮‍♀️ Make sure that you are editing the package in your env, not in another env
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)

💣 Loss Exploding

INFO:tensorflow:global step 440: loss = 2106942657570782838784.0000 (0.405 sec/step)
INFO:tensorflow:global step 440: loss = 2106942657570782838784.0000 (0.405 sec/step)
INFO:tensorflow:global step 441: loss = 7774169971762292326400.0000 (0.401 sec/step)
INFO:tensorflow:global step 441: loss = 7774169971762292326400.0000 (0.401 sec/step)
INFO:tensorflow:global step 442: loss = 25262924095336287830016.0000 (0.404 sec/step)
INFO:tensorflow:global step 442: loss = 25262924095336287830016.0000 (0.404 sec/step)

🙄 For me there were 2 problems:

First:

  • Some annotations were wrong and overflowed the image (e.g. xmax > width)
  • I could check that by inspecting the .csv file (see the sketch after the table below)
  • Example:

| filename | width | height | class | xmin | ymin | xmax | ymax |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 104.jpg | 640 | 480 | class_1 | 284 | 406 | 320 | 492 |
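🧐 A quick way to hunt for such rows (my own sketch; assumes pandas is installed, e.g. pip install pandas):

# list rows whose boxes fall outside the image bounds
import pandas as pd

df = pd.read_csv('E:/demo/annotations/train_labels.csv')
bad = df[(df.xmin < 0) | (df.ymin < 0) |
         (df.xmax > df.width) | (df.ymax > df.height)]
print(bad)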

Second:

  • The learning rate in the .config file was too big (the default value was big 🙄)
  • The following values are valid and tested on mobilenet_ssd_v1_quantized (not great, though 🙄)
learning_rate: {
  cosine_decay_learning_rate {
    learning_rate_base: .01
    total_steps: 50000
    warmup_learning_rate: 0.005
    warmup_steps: 2000
  }
}

🥴 Getting convolution Failure

Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  • It may be a CUDA version incompatibility issue
  • For me it was a memory issue and I solved it by adding the following lines to the train.py script
import os  # if it is not already imported
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

📦 Invalid box data error

raise ValueError('Invalid box data. data must be a numpy array of '
ValueError: Invalid box data. data must be a numpy array of N*[y_min, x_min, y_max, x_max]
  • 🙄 For me it was a logical error: in test_labels.csv there were some invalid values like: file123.jpg,134,63,3,0,0,-1029,-615
  • 🏷 So it was a labeling issue; fixing these lines solved the problem
  • 👀 A related discussion is here

🔄 Image with id added issue

raise ValueError('Image with id {} already added.'.format(image_id))
ValueError: Image with id 123.png already added.
  • ☝ It is an issue in the .config file caused by giving num_examples a value greater than the total number of test images in the test directory (a quick way to count them is shown after the snippet below)
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  num_examples: 1265 # <--- this value was greater than the total number of test images
}
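🧐 A tiny helper (my own sketch) to count the test images so that num_examples matches the folder content:

# count the test images on disk
import glob
print(len(glob.glob('E:/demo/images/test/*.jpg')))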

๐Ÿง References