Testing GPU usage of Tensorflow

Use the following code to test whether tensorflow-gpu can use the GPU. This was written for TensorFlow 1.10.0.

import tensorflow as tf
import numpy as np

xx = np.random.normal(0,100,1200)
yy = np.random.normal(0,100,1200)

from tensorflow.python.client import device_lib

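def get_available_gpus():
    # Lists the names of all local devices TensorFlow can see (CPU included).
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

print(get_available_gpus())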
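
The helper above returns every local device, not only GPUs. A minimal sketch of narrowing that list down, assuming the device names follow the `/device:GPU:0` pattern reported by TensorFlow 1.x (the function name `only_gpus` is hypothetical and not part of TensorFlow):

```python
def only_gpus(device_names):
    """Keep only the device names that refer to a GPU.

    Assumption: names look like '/device:CPU:0' or '/device:GPU:0'.
    """
    return [name for name in device_names if ":GPU:" in name.upper()]


# Hand-written names for illustration; on a real machine, pass the output of
# get_available_gpus() instead.
sample_names = ["/device:CPU:0", "/device:GPU:0"]
print(only_gpus(sample_names))
```

An empty result would suggest that TensorFlow does not see any GPU on the machine.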




Object Detection using Tensorflow: bee and butterfly Part V, faster


This post is a faster alternative to the following post: Object Detection using Tensorflow: bee and butterfly Part V.

In Part IV, we finished training our Faster R-CNN model. Since we ran 2000 training steps, the last model checkpoint produced is model.ckpt-2000. To use it for prediction, we need to export a frozen graph from it (see PATH_TO_FROZEN_GRAPH in the prediction script further down).
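
If you trained for a different number of steps, the checkpoint suffix will differ. Here is a small sketch for finding the latest checkpoint prefix in a model directory, assuming the TensorFlow 1.x naming `model.ckpt-<step>.index`. The function name `latest_checkpoint_prefix` is hypothetical; TensorFlow's own `tf.train.latest_checkpoint` serves a similar purpose.

```python
import os
import re


def latest_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix with the highest step, or None if none is found.

    Assumption: checkpoints are named 'model.ckpt-<step>.index'.
    """
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    best_step, best_prefix = -1, None
    for filename in os.listdir(model_dir):
        match = pattern.match(filename)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(model_dir, match.group(1))
    return best_prefix
```

Called on the training output folder, this would return a path ending in model.ckpt-2000 for the run described here.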

Freezing the graph

Open the command line (cmd.exe). Remember to activate the virtual environment first if you set one up as instructed earlier.

cd C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research
SET INPUT_TYPE=image_tensor
SET TRAINED_CKPT_PREFIX="C:\Users\acer\Desktop\adhoc\myproject\models\model\model.ckpt-2000"
SET PIPELINE_CONFIG_PATH="C:\Users\acer\Desktop\adhoc\myproject\models\model\faster_rcnn_resnet101_coco.config"
SET EXPORT_DIR="C:\Users\acer\Desktop\adhoc\myproject\models\export"
python object_detection/export_inference_graph.py --input_type=%INPUT_TYPE% --pipeline_config_path=%PIPELINE_CONFIG_PATH% --trained_checkpoint_prefix=%TRAINED_CKPT_PREFIX% --output_directory=%EXPORT_DIR%
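
The same command can also be assembled from Python, which avoids retyping the SET lines for every export. This is only a sketch: the function name `build_export_command` is hypothetical, and the flags are exactly the ones used in the command above.

```python
import sys


def build_export_command(ckpt_prefix, pipeline_config, export_dir,
                         input_type="image_tensor"):
    """Build the argument list for export_inference_graph.py."""
    return [
        sys.executable,
        "object_detection/export_inference_graph.py",
        "--input_type=" + input_type,
        "--pipeline_config_path=" + pipeline_config,
        "--trained_checkpoint_prefix=" + ckpt_prefix,
        "--output_directory=" + export_dir,
    ]


base = "C:/Users/acer/Desktop/adhoc/myproject/models"
cmd = build_export_command(
    ckpt_prefix=base + "/model/model.ckpt-2000",
    pipeline_config=base + "/model/faster_rcnn_resnet101_coco.config",
    export_dir=base + "/export",
)
print(" ".join(cmd))
```

To actually run it, pass `cmd` to `subprocess.check_call` with `cwd` set to the research directory.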

Upon successful completion, the following will be produced in the export directory (EXPORT_DIR).

+ saved_model
  + variables
  - saved_model.pb
+ checkpoint
+ frozen_inference_graph.pb
+ pipeline.config
+ model.ckpt.data-00000-of-00001
+ model.ckpt.index
+ model.ckpt.meta

Notice that three model.ckpt files are created. They can be used for further training by replacing the three checkpoint files from Part IV.
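
A quick way to confirm that the export succeeded is to check for the files listed above. A minimal sketch, where `EXPECTED_FILES` and `missing_export_files` are hypothetical names introduced for illustration:

```python
import os

# File names taken from the export listing above.
EXPECTED_FILES = [
    "frozen_inference_graph.pb",
    "pipeline.config",
    "checkpoint",
    "model.ckpt.index",
    "model.ckpt.meta",
    "model.ckpt.data-00000-of-00001",
    os.path.join("saved_model", "saved_model.pb"),
]


def missing_export_files(export_dir):
    """Return the expected export files that are absent from export_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(export_dir, name))]
```

An empty list means every expected file is in place.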

frozen_inference_graph.pb is the file we will use for prediction. We just need to run the Python script further down with a suitable configuration. Create the directory for_predict and put into it all the images containing butterflies or bees that you want the algorithm to detect. In this example, we use 6 images, namely “1.jpeg”, “2.jpeg”, …, “6.jpeg”.

+ ...
+ for_predict

Finally, to perform prediction, move into the adhoc/myproject folder, where prediction2.py is placed (see the script below), and run the following in cmd.exe.

python prediction2.py

For each image, a marked copy (for example 1_MARKED.jpeg) will be produced in for_predict, with boxes showing the detected objects, either butterfly or bee.

Most of the configuration that needs to be done is in the path settings near the top of the script. The variable TEST_IMAGE_NAMES contains the names of the files we are going to predict on. You can rename the images or just change the variable. Note that in this code, the variable filetype stores the file type of the images, so each run can only handle images of a single type. Of course we can do better; modify the script accordingly.
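
One possible way to lift the single-file-type restriction is to collect images by extension instead of building paths from a fixed filetype. This is a sketch only: `IMAGE_EXTENSIONS` and `list_test_images` are hypothetical names, and skipping files ending in `_MARKED` is an assumption meant to avoid re-processing earlier outputs.

```python
import os

IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def list_test_images(image_dir, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of images in image_dir with an accepted extension."""
    paths = []
    for filename in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(filename)
        # Skip outputs of earlier runs and anything that is not an image.
        if ext.lower() in extensions and not stem.endswith("_MARKED"):
            paths.append(os.path.join(image_dir, filename))
    return paths
```

With this approach, the output file name would need to be built from each image's own extension rather than from filetype.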


import sys
import time

import numpy as np
import tensorflow as tf
from PIL import Image
from matplotlib import pyplot as plt

start_all = time.time()

# Path settings: adjust these to your own setup.
THE_PATH = "C:/Users/acer/Desktop/adhoc/myproject/Lib\\site-packages/tensorflow/models/research"
PATH_TO_FROZEN_GRAPH = "C:/Users/acer/Desktop/adhoc/myproject/models/export/frozen_inference_graph.pb"
filetype = '.jpeg'
PATH_TO_LABELS = "C:/Users/acer/Desktop/adhoc/myproject/data/butterfly_bee_label_map.pbtxt"
PATH_TO_TEST_IMAGES_DIR = 'C:/Users/acer/Desktop/adhoc/myproject/for_predict'
TEST_IMAGE_NAMES = [str(i) for i in range(1,7)]  # "1", ..., "6"
TEST_IMAGE_PATHS = [''.join((PATH_TO_TEST_IMAGES_DIR, '\\', x, filetype)) for x in TEST_IMAGE_NAMES]
IMAGE_SIZE = (12, 8) # Size, in inches, of the output images.

# Assumption: the research directory may not be on PYTHONPATH yet, so add it
# here to make the object_detection package importable.
sys.path.append(THE_PATH)
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

start = time.time()
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    # Parse the serialized frozen graph before importing it.
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

NUM_CLASSES = 2  # butterfly and bee
label_map  = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
end = time.time()  # assumed end point of "time 1": graph and label map loading

def load_image_into_numpy_array(image):
  # Convert a PIL RGB image into a (height, width, 3) uint8 array.
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph):
  with graph.as_default():  
    # Get handles to input and output tensors
    ops = tf.get_default_graph().get_operations()
    all_tensor_names = {output.name for op in ops for output in op.outputs}
    tensor_dict = {}
    for key in [
        'num_detections', 'detection_boxes', 'detection_scores',
        'detection_classes', 'detection_masks'
    ]:
      tensor_name = key + ':0'
      if tensor_name in all_tensor_names:
        tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
            tensor_name)
    if 'detection_masks' in tensor_dict:
      # The following processing is only for single image
      detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
      detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
      # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
      real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
      detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
      detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
      detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
          detection_masks, detection_boxes, image.shape[0], image.shape[1])
      detection_masks_reframed = tf.cast(
          tf.greater(detection_masks_reframed, 0.5), tf.uint8)
      # Follow the convention by adding back the batch dimension
      tensor_dict['detection_masks'] = tf.expand_dims(
          detection_masks_reframed, 0)
    image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

    # Run inference
    # sess is the session opened once in the main block below and reused for every image.
    start0X = time.time()
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})
    # all outputs are float32 numpy arrays, so convert types as appropriate
    output_dict['num_detections'] = int(output_dict['num_detections'][0])
    output_dict['detection_classes'] = output_dict[
        'detection_classes'][0].astype(np.uint8)
    output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
    output_dict['detection_scores'] = output_dict['detection_scores'][0]
    if 'detection_masks' in output_dict:
      output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

config = tf.ConfigProto()
# Thread counts are machine-specific; adjust them to your CPU.
config.intra_op_parallelism_threads = 44
config.inter_op_parallelism_threads = 44
timeset = []  # detection time per image, in seconds (assumed meaning of "time each")
with tf.Session(config=config,graph=detection_graph) as sess:
  for image_path, image_name in zip(TEST_IMAGE_PATHS, TEST_IMAGE_NAMES):
    image = Image.open(image_path).convert('RGB')
    # The array-based representation of the image is used later to prepare
    # the result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)

    # Actual detection.
    start0 = time.time() # bottleneck in the main detection, 22s per img
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    start1 = time.time()
    timeset.append(start1 - start0)

    # Visualization of the results of a detection.
    # Each element in detection_boxes is [ymin, xmin, ymax, xmax] in
    # normalized coordinates, i.e.
    #   (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
    #                                 ymin * im_height, ymax * im_height)
    # Only min_score_thresh comes from the original script; the other
    # arguments follow the Object Detection API tutorial (an assumption).
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8,
        min_score_thresh = 0.05)
    # print(output_dict['detection_scores'])  # uncomment to inspect the raw scores
    plt.imsave(''.join((PATH_TO_TEST_IMAGES_DIR, '\\',image_name,"_MARKED", filetype)), image_np)

end_all = time.time()
print("time 1 = ", end-start)
print("time each:")
for i in range(len(timeset)):
  print(" + ",timeset[i])
print("time all= ", end_all-start_all)

# plt.show()
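
The comment in the script notes that each detection box is [ymin, xmin, ymax, xmax] in normalized coordinates. If you need pixel coordinates, for instance to crop the detected objects, a conversion along the following lines should work. This is a sketch, not part of the original script; `boxes_to_pixels` is a hypothetical helper name.

```python
import numpy as np


def boxes_to_pixels(boxes, im_width, im_height):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to integer
    (left, top, right, bottom) pixel coordinates."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    ymin, xmin, ymax, xmax = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    pixels = np.stack(
        [xmin * im_width, ymin * im_height, xmax * im_width, ymax * im_height],
        axis=1)
    return np.round(pixels).astype(int)


# One box covering the middle of a 200x100 (width x height) image.
print(boxes_to_pixels([[0.1, 0.2, 0.5, 0.6]], im_width=200, im_height=100))
```

In the script, `boxes` would be `output_dict['detection_boxes']`, and the width and height would come from `image.size`.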

The results should be similar to the ones in Object Detection using Tensorflow: bee and butterfly Part V; the only difference is the processing speed. I used an NVIDIA GeForce GTX 1050, and the performance was as follows.

time 1 = 2.0489230155944824
time each:
+ 18.73304057121277
+ 1.6632516384124756
+ 1.7054014205932617
+ 1.5573828220367432
+ 1.6851420402526855
+ 0.5358219146728516
time all= 34.96004343032837

With the previous code, each image takes ~18 seconds. A better GPU can yield even faster performance. On another project, using a GTX 1080 on 1920×1080-pixel images, prediction took as little as 0.2 s per image from the second image onwards. Using the CPU only, one example I tried took ~4.5 s per image from the second image onwards.
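
The first image in the timings above is much slower than the rest, presumably because of one-time initialization. To compare setups, it helps to average only the later images. A small sketch using the numbers reported above (the helper name `steady_state_mean` is hypothetical):

```python
# Per-image times in seconds, copied from the output above.
per_image_seconds = [
    18.73304057121277,
    1.6632516384124756,
    1.7054014205932617,
    1.5573828220367432,
    1.6851420402526855,
    0.5358219146728516,
]


def steady_state_mean(times, warmup=1):
    """Mean time per image after skipping the first `warmup` images."""
    rest = times[warmup:]
    if not rest:
        raise ValueError("no images left after skipping the warm-up ones")
    return sum(rest) / len(rest)


print("first image: %.2f s" % per_image_seconds[0])
print("mean after warm-up: %.2f s" % steady_state_mean(per_image_seconds))
```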