Project Management Plan: Developing an Explicit Plan, Assuming the Data Is Already Collected, in Response to an Organization's Request for Proposal

Durafshan Jawad
Dec 20, 2022


Section 1: Project Management Plan

1.1 Project Objective

The data-providing company is planning to open a new retail store. They are looking for a solution that can help them with sales-generating tasks such as automatic store checkout, automatic auditing of the retail shelves, and recognition of empty shelves.

1.2 Scope of Project

● Selection of a machine learning model that can detect objects with an accuracy of no less than 80%.

● The primary goal of the project is to help retail companies with their auditing, checkout, and organization of products.

● Project costs will be kept to a minimum and will not exceed the assigned budget. Moreover, a financial report will be published showing all cash inflows and outflows.

● The project will be completed within a year.

1.3 Project Requirements with Respect to the Dataset

The dataset used to propose a solution for this problem is Objectron by Google Research, which covers multiple classes of objects. We will use this dataset, together with technical expertise and specialized capabilities, to create a proposal for this problem.

1.4 Approach

This object detection approach is meant to detect SKUs within images of shelves and classify them by manufacturer and brand. Depending on the model's accuracy, the auditor will get real-time feedback on handheld devices to complete tasks such as organizing products and restocking empty shelves. According to studies, these missing products can reduce a store's sales by 32% [1]. Adding object detection to retail will make store checkout easy and productive and will empower auditors to fix inventory issues quickly whenever a shelf is empty.

Figure 1: Basic retail object detection flowchart
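To make the approach concrete, here is a minimal sketch of the detection step, using a generic pre-trained detector from TensorFlow Hub as a stand-in for the final SKU model; the model handle and the shelf image filename are illustrative assumptions, not choices fixed by this plan:

```python
import tensorflow as tf
import tensorflow_hub as hub

# A generic COCO-trained detector, used here only as a placeholder for
# the SKU model this plan proposes to select.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

# "shelf.jpg" is a hypothetical photo of a retail shelf.
image = tf.io.decode_jpeg(tf.io.read_file("shelf.jpg"), channels=3)
result = detector(tf.expand_dims(image, 0))

# Keep only confident detections; in the full system, the surviving
# boxes would be compared against the planogram to flag empty slots.
boxes = result["detection_boxes"][0]
scores = result["detection_scores"][0]
print(tf.boolean_mask(boxes, scores > 0.5))
```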

1.5 Stakeholders

The dataset was provided by Google Research, so they are one of the stakeholders. Other stakeholders include customers (as they are the buyers), the retail store employees (the solution will help them organize products), auditors (ease of auditing), and the company's finance department; people associated with these departments can be counted as secondary stakeholders. Moreover, project managers, business leaders, and sales and marketing staff will also be counted as stakeholders.

1.6 Deliverables

● A solution/project plan for the data-providing company that will help them with their retail sales.

● Implementation of the best-performing model, capable of supporting automatic store checkout and identifying empty shelves with good accuracy.

● Completion of the project at minimum cost, with supporting financial documentation.

Section 2: Project Implementation

Project implementation starts with the collection of data, which is provided by Google Research. The data is publicly available, with some privacy and quality issues discussed in the next subsection. A hardware and software solution will then be selected that is cost-friendly, feasible, and scalable in use.

2.1 Privacy, quality & other issues with data

The Objectron dataset is released under the Computational Use of Data Agreement 1.0 (C-UDA-1.0), which says you may use, modify, and distribute the data made available to you by the data provider under the C-UDA for computational use only. The C-UDA does not restrict your use, modification, distribution, or redistribution of any portions of the data that are in the public domain. So the data is publicly available and can be used for this project. For redistribution, however, you have to include all credit or attribution information that you received with the data, and you have to bind each recipient to whom you redistribute the data to the terms of the C-UDA. There are three ways to download the Objectron data to disk: using gsutil, downloading via the public HTTP API, and downloading with the Cloud Python client [2]. The Cloud Storage Python client method requires you to authenticate before downloading the dataset, which helps maintain the privacy of the data. When I tried to access the data this way, it threw the error below, showing that we need authentication to access the data.

Figure 2: Authentication before downloading the dataset
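In contrast, the public HTTP route needs no credentials. Here is a minimal download sketch for that route; the index and video paths follow the repository's download notebook [2] and should be treated as assumptions:

```python
import requests

# Public HTTP root of the Objectron bucket; unlike the Cloud Storage
# Python client, this route needs no authentication.
PUBLIC_URL = "https://storage.googleapis.com/objectron"

# Index files list the available video IDs per class. The index name
# below ("cup_annotations_test") is taken from the download notebook [2].
index = requests.get(f"{PUBLIC_URL}/v1/index/cup_annotations_test")
video_ids = index.text.strip().split("\n")

# Fetch the first cup video to local disk.
video = requests.get(f"{PUBLIC_URL}/videos/{video_ids[0]}/video.MOV")
with open("video.MOV", "wb") as f:
    f.write(video.content)
```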

Because the data was collected in ten different countries covering five continents, the samples are not uniformly distributed. First, the videos do not all have the same length; second, the number of samples per class is not equal. The data was also collected using five different smartphones, each with slightly different video recording quality. This makes it hard to maintain the quality and consistency of the data.

2.2 HW/SW Requirements

The people involved in this project will mainly be data scientists and data analysts. We will also need people from the finance and business departments, along with project managers, to run the project fully. The videos in the selected dataset were collected with smartphones, which count as a resource. The data is stored in the publicly available 'objectron' bucket on Google Cloud Storage. The models used for this problem can come from Vertex AI, a unified artificial intelligence platform used to build, deploy, and scale ML models faster, with pre-trained and custom tooling built in. The solution will be released in MediaPipe, Google's open-source framework for cross-platform, customizable machine learning solutions for live and streaming media. We will use Apache Beam to process the dataset on Google Cloud infrastructure, and PyTorch, TensorFlow, and JAX pipelines to process data efficiently and for visualization. In short, a bundle of different Google Cloud services can be used to run this project. I selected these services because the data is already in Google Cloud, so syncing data with these services will be easy.
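As a minimal sketch of what such a pipeline could look like, the snippet below reads the dataset's pre-sharded tf.Example records directly from the public bucket with TensorFlow; the shard pattern follows the Objectron repository README and should be treated as an assumption:

```python
import tensorflow as tf

# Objectron ships sharded tf.Example records in its public bucket; the
# "records_shuffled" pattern below follows the repository README and is
# an assumption, not verified here.
shards = tf.io.gfile.glob("gs://objectron/v1/records_shuffled/cup/cup_train*")

dataset = (
    tf.data.TFRecordDataset(shards)
    .take(100)  # sample a few records for inspection
    .batch(8)   # group into small batches, as a training loop would
)

for batch in dataset:
    print(batch.shape)  # serialized tf.Example strings, shape (8,)
```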

2.3 Build-vs-buy solution

For this solution, buying is better than building. The dataset is already large, in the terabytes, and is stored in Google Cloud; if the selected software tools are also from Google, syncing and processing the data will be easy. Buying a scalable solution saves us the hassle of building our own storage infrastructure. The data is also collected from ten different countries, so it is better to buy than to set up infrastructure in different countries, which would be expensive; with a bought solution, a few clicks fulfill our needs. Building would be infeasible, because every time the volume increases we would have to add resources ourselves, which quickly becomes tiring. Building could be less costly at the start, but a home-built solution needs expertise and infrastructure that grow expensive as the volume increases. We need a solution we can pay for based on need and start using instantly, with few interruptions to our tasks. Google Cloud services require relatively little skill to learn, which means they are easy to use: accessing data from a bucket, for example, takes just a few API commands rather than a load of effort. If we selected the build option instead, we would need experts who know everything from scratch. Buying lets us focus on the retail management project rather than on learning tools. Building needs a heavy budget to start, while buying is pay-as-you-go, and cost is a consideration in this proposal. We need a solution that is manageable, minimally complex, and easy to use. After considering all these scenarios, buying is the best option.

2.4 Relevant metadata

The dataset consists of videos, images, and text: it is basically videos and images of different categories of objects. Some of the objects are non-rigid and were kept stationary during data collection. In total, there are about 17k object instances appearing across 4M images and almost 15k videos [3]. These characteristics make the data unstructured in nature, so it cannot be processed using traditional, conventional data tools and methods. The data comes from different sources, i.e., it was collected in ten different countries covering five continents, and it varies in consistency. Almost every product's data will contain text and labels too, such as the product information on the back of bottles and cereal boxes. The samples contain data in different languages as well as different local environments. All of this marks the dataset as complex big data.

2.5 Statistical analyses & visualizations

One visualization/statistical analysis that will be shown in the proposal is the distribution of video lengths in frames [4], shown below:

Figure 3: Distribution of video lengths in frames

The analysis shows the distribution of video lengths in the dataset. The videos are not all the same length, which means the data is not consistent. The majority of videos are 10 seconds long (300 frames), the tallest bar in the distribution, while the longest video is 2,022 frames. Most of the data is concentrated between 280 and 320 frames. This inconsistency might introduce inaccuracy while training the model, but since this is big data in the terabytes, it is hard to keep the whole volume at exactly the same frame length. The problem could be addressed by truncating each video to 10 seconds, but that would result in loss of data and would take a lot of time and resources. This is one of the complexities of big data.
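As a minimal sketch, a histogram like Figure 3 could be produced as follows, assuming the per-video frame counts have already been extracted into a list (the values below are placeholders, not real dataset statistics):

```python
import matplotlib.pyplot as plt

# Hypothetical per-video frame counts; in practice these would be read
# from the dataset's annotation metadata.
frame_counts = [295, 300, 300, 305, 310, 280, 320, 2022]

plt.hist(frame_counts, bins=50)
plt.xlabel("Frames per video")
plt.ylabel("Number of videos")
plt.title("Distribution of video lengths")
plt.savefig("video_length_distribution.png")
```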

Another visualization to be added to the proposal is shown below. The first panel gives the requester an overview of how many instances of each class/object are in the training and testing datasets, the second shows the scale distribution of objects, and the third shows the number of annotated bounding-box instances in the training and testing subsets.

Figure 4: Training vs. testing distributions

2.6 Time/effort for the study

Object detection projects usually take 3 to 12 months to complete, depending on the complexity of the model. Since mostly built-in models are being used here, we can expect the project to take at most a year, from the very beginning to the implementation phase.

2.7 Expected value/benefits to the organization

Object detection is a widely used technique in computer vision nowadays. This study uses a dataset with multiple classes, an advancement over binary classification models. Object detection is used for face detection, human-computer interaction, vehicle detection, robotics, pedestrian counting, security systems, web images, driverless cars, and more [5]. The technique emerged from detecting objects in 2D, but 3D datasets are now widely available as well. With Google Research's release of the Objectron dataset, we hope the research community will push the limits of 3D object geometry and that the field will become more common and advanced. It will help bring new research and applications in 3D understanding, video models, object retrieval, view synthesis, and 3D reconstruction [6]. For the organization, it will help increase store sales by keeping the inventory maintained at all times and will help keep their customers happy.

Section 3: Defined Terms

3.1 Explanation of technical terms

IoU (Intersection over Union)

Intersection over Union, in terms of object detection, is an evaluation metric. It is the ratio of the overlap (intersection) between a ground-truth bounding box and the model's predicted bounding box to the area of their union. Any algorithm that provides predicted bounding boxes as output can be evaluated using IoU [7].
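A minimal sketch of the computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (an illustrative convention, not tied to any particular library):

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    # Union = sum of both areas minus the double-counted intersection.
    return inter / (area_a + area_b - inter)

# Example: a prediction partially overlapping a ground-truth box.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.14
```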

Epoch vs Batch vs Iterations

An epoch is one complete cycle in which the entire dataset is passed forward and backward through a neural network once. Batches are pieces or partitions of a dataset, and an iteration is one pass over a single batch; in other words, the number of iterations needed to complete one epoch equals the number of batches. Consider this example: if we have a dataset of 2,000 examples and divide it into batches of 500, it will take 4 iterations to complete 1 epoch/complete cycle [8].
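The same arithmetic as a tiny sketch, using the numbers from the example above:

```python
dataset_size = 2000  # total training examples
batch_size = 500     # examples processed per iteration
# Iterations needed to see every example once, i.e., one epoch.
iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 4
```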

Data Annotation

The process of labeling data is known as data annotation. The data could be in text, video, image, or audio format. The most common kinds are text annotation, image annotation, video annotation, audio annotation, and key-point annotation. Supervised machine learning requires labeled datasets, so data annotation is a compulsory part; this is how machines can easily and clearly understand the input patterns [9].
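For illustration, a single bounding-box label might look like the record below; the field names are hypothetical and follow a COCO-like convention, not the Objectron schema:

```python
# One hypothetical annotation record for a shelf image.
annotation = {
    "image_id": "shelf_0001.jpg",  # which image the label belongs to
    "category": "cereal_box",      # class assigned by the annotator
    "bbox": [120, 45, 200, 310],   # (x, y, width, height) in pixels
}
```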

3.2 Domain of the study

The domain of this dataset project is machine learning, which is a branch of artificial intelligence. Machine learning is a method of data analysis that automates analytical model building; it is based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention [10].

References

[1] Image Recognition and Object Detection in Retail. Retrieved from: https://www.kdnuggets.com/2020/02/image-recognition-object-detection-retail.html

[2] Objectron/notebooks/Download Data.ipynb. Retrieved from: https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Download%20Data.ipynb

[3] Objectron. Retrieved from: https://research.google/tools/datasets/objectron/

[4] Objectron Dataset. Retrieved on Aug 9, 2021 from: https://github.com/google-research-datasets/Objectron/blob/master/README.md

[5] Object Detection: Current and Future Directions. Retrieved on November 19, 2015 from: https://www.frontiersin.org/articles/10.3389/frobt.2015.00029/full

[6] Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations. Retrieved on December 18, 2020 from: https://arxiv.org/pdf/2012.09988.pdf

[7] Intersection over Union (IoU) for object detection. Retrieved on November 7, 2016 from: https://pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/

[8] Epoch vs Batch Size vs Iterations. Retrieved on Sep 23, 2017 from: https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9

[9] Different Types Of Data Annotation and Its Uses. Retrieved from: https://learningspiral.ai/data-annotation-and-its-uses/

[10] Machine Learning. Retrieved from: https://www.sas.com/en_us/insights/analytics/machine-learning.html
