The goal of this project is to apply transfer learning to a pre-trained Faster R-CNN model for traffic light detection.
The dataset contains several sub-folders with images and CSV annotation files. To feed them into the TensorFlow Object Detection API, the images and annotations need to be converted into TFRecord files with the expected features.
# The annotation dictionary for tf records and for pbtxt.
annotation_dict = {"go":1, "warning":2, "stop":3, "goLeft":4, "warningLeft":5, "stopLeft":6, "goForward":7}
# Label_map generation
# Exporting the label_map.pbtxt with the dictionary generated previously.
# Uncomment if needed
# Each entry has the form: item { id: <int> name: '<class>' }
with open("PATH_TO_PBTXT/label_map.pbtxt", 'w') as file:
    for key in annotation_dict.keys():
        # Use the ids from annotation_dict directly so the pbtxt matches the labels in the records.
        file.write("item {\n id: %i\n" % annotation_dict[key])
        file.write(" name: '%s'\n}\n\n" % key)
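As a quick sanity check (a minimal sketch, reusing the same placeholder path as above), the generated file can be parsed back with the Object Detection API's label_map_util and compared against annotation_dict:

from object_detection.utils import label_map_util

# Parse the generated pbtxt back into a {name: id} dictionary and compare.
parsed_map = label_map_util.get_label_map_dict('PATH_TO_PBTXT/label_map.pbtxt')
assert parsed_map == annotation_dict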
DATA_PATH = 'TOP_LEVEL_FOLDER'
TRAIN_PATH = 'TRAINING_folder'  # path to dayTrain
TRAIN_RATIO = 0.7
class single_record:
    '''
    A class for the tf_example protos.
    The image data is not loaded and the values are not featurized.
    '''
    def __init__(self):
        self.xmins = []
        self.xmaxs = []
        self.ymins = []
        self.ymaxs = []
        self.classes_text = []
        self.classes = []
        self.height = 0
        self.width = 0
        self.filename = None
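Below is a minimal sketch of how a single_record could be filled from one annotation CSV. The helper name records_from_csv, the column names ('Filename', 'Annotation tag', 'Upper left corner X/Y', 'Lower right corner X/Y'), and the ';' separator are assumptions for illustration; adjust them to the actual CSV header. Box coordinates are normalized to [0, 1], as the Object Detection API expects.

import os
import pandas as pd
from PIL import Image

def records_from_csv(csv_path, image_dir):
    '''Build single_record objects from one annotation csv (column names assumed).'''
    df = pd.read_csv(csv_path, sep=';')
    records = {}
    for _, row in df.iterrows():
        filename = os.path.join(image_dir, os.path.basename(row['Filename']))
        rec = records.setdefault(filename, single_record())
        if rec.filename is None:
            rec.filename = filename
            rec.width, rec.height = Image.open(filename).size
        # Normalize pixel coordinates by the image dimensions.
        rec.xmins.append(row['Upper left corner X'] / rec.width)
        rec.xmaxs.append(row['Lower right corner X'] / rec.width)
        rec.ymins.append(row['Upper left corner Y'] / rec.height)
        rec.ymaxs.append(row['Lower right corner Y'] / rec.height)
        tag = row['Annotation tag']
        rec.classes_text.append(tag.encode())
        rec.classes.append(annotation_dict[tag])
    return list(records.values())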
# Imports needed for record generation (TF 1.x API).
import tensorflow as tf
from object_detection.utils import dataset_util

def record_generator(records, writer):
    '''
    Create tf records from tf_example prototypes.
    Param :
        records - a list of tf_example protos (class single_record)
        writer - the tf.python_io.TFRecordWriter for the corresponding split
    Return :
        None, but the serialized examples are written to the given tfrecord file
        (called once for the training split and once for the evaluation split).
    '''
    image_format = 'png'.encode()
    for record in records:
        # Read the already-encoded PNG bytes for this frame.
        with tf.gfile.GFile(record.filename, 'rb') as fid:
            encoded_image_data = fid.read()
        filename = record.filename.encode()
        tf_example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(record.height),
            'image/width': dataset_util.int64_feature(record.width),
            'image/filename': dataset_util.bytes_feature(filename),
            'image/source_id': dataset_util.bytes_feature(filename),
            'image/encoded': dataset_util.bytes_feature(encoded_image_data),
            'image/format': dataset_util.bytes_feature(image_format),
            'image/object/bbox/xmin': dataset_util.float_list_feature(record.xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(record.xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(record.ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(record.ymaxs),
            'image/object/class/text': dataset_util.bytes_list_feature(record.classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(record.classes),
        }))
        writer.write(tf_example.SerializeToString())
    writer.close()
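A minimal usage sketch, assuming the hypothetical records_from_csv helper above and placeholder file paths: shuffle the records, split them by TRAIN_RATIO, and write one TFRecord file per split (TF 1.x API, matching the tf.gfile usage above).

import random

all_records = records_from_csv('PATH_TO_CSV/annotations.csv', 'PATH_TO_IMAGES')
random.shuffle(all_records)
split = int(len(all_records) * TRAIN_RATIO)

# One writer per split; record_generator closes each writer when done.
train_writer = tf.python_io.TFRecordWriter('train.record')
eval_writer = tf.python_io.TFRecordWriter('eval.record')
record_generator(all_records[:split], train_writer)
record_generator(all_records[split:], eval_writer)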
In the model section, I chose not to resize the input images and to keep the default anchor size. For smaller objects, such as the individual traffic light bulbs, I would change the anchor to 64x64. As for the maximum detections per class and in total, 10 is sufficient for most road scenes.
* fixed_shape_resizer
  - height: 960
  - width: 1280
* grid_anchor_generator
  - height: 128
  - width: 128
* max_detections_per_class: 10
* max_total_detections: 10
The initial learning rate is set at 0.00002. It is lower than the default value, but it gives me the chance to spot issues such as overfitting during training by monitoring the loss curve. A short config-inspection sketch follows these settings.
* initial_learning_rate: 0.00002
* from_detection_checkpoint: true
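The values above live in the pipeline config used for training. As a sketch (the config path below is a placeholder), they can be double-checked programmatically with the Object Detection API's config_util:

from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file('PATH_TO_CONFIG/pipeline.config')
frcnn = configs['model'].faster_rcnn

resizer = frcnn.image_resizer.fixed_shape_resizer
print(resizer.height, resizer.width)                            # 960 1280
anchor = frcnn.first_stage_anchor_generator.grid_anchor_generator
print(anchor.height, anchor.width)                              # 128 128
nms = frcnn.second_stage_post_processing.batch_non_max_suppression
print(nms.max_detections_per_class, nms.max_total_detections)   # 10 10
print(configs['train_config'].from_detection_checkpoint)        # True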
Here I took three YouTube driving videos from three different cities (Chicago, NYC, and SF), made annotated versions of them, and took some screenshots, shown below. If interested, links to the annotated videos are also attached for reference.
Please note that the images and the annotated videos are darker than the original files. This is a trick to compensate for exposure: most of the videos use weighted-average exposure, so the traffic lights can be over-exposed.
During visualization, min_score_thresh is set to 0.85 to reduce noise and false detections.
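The threshold is passed to the API's standard drawing helper. A minimal sketch, assuming the detection graph has already produced boxes, classes, and scores for a frame; draw_detections and its arguments are hypothetical names for illustration.

import numpy as np
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

category_index = label_map_util.create_category_index_from_labelmap('PATH_TO_PBTXT/label_map.pbtxt')

def draw_detections(frame, boxes, classes, scores):
    '''Draw boxes with score >= 0.85 onto the frame (inputs are the detection graph outputs).'''
    vis_util.visualize_boxes_and_labels_on_image_array(
        frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.85,   # suppress low-confidence detections
        line_thickness=4)
    return frame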
Although the results in the screenshots above look promising, the real-time detection accuracy is not great. This is consistent with the lower mAP and regression results above. With a longer training time, I expect better results. Also, as one can see in the video, the car moves in slow motion due to the lower detection speed.
My next step for this model is to apply it to my Cozmo robot and see how it goes!
Morten Bornø Jensen, Mark Philip Philipsen, Andreas Møgelmose, Thomas B Moeslund, and Mohan M Trivedi. “Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives”. In: IEEE Transactions on Intelligent Transportation Systems (2015).
Mark Philip Philipsen, Morten Bornø Jensen, Andreas Møgelmose, Thomas B Moeslund, and Mohan M Trivedi. “Learning Based Traffic Light Detection: Evaluation on Challenging Dataset”. In: 18th IEEE Intelligent Transportation Systems Conference (2015).