A visual machine learning model is built by training it on many labelled images of the same object. This creates a kind of ‘memory’ that lets the model recognise that object from a range of perspectives. The model’s accuracy depends on factors such as the machine learning system used, the number of training images and their quality. Once trained, the model can be used to scan both still images and video. A familiar example of this ‘memory’ process is the image-based challenge (CAPTCHA) used by some websites, which asks you to identify text or objects within an image. As well as proving that you are a human and not a bot, these challenges can serve a secondary purpose: teaching a model what specific objects, such as bridges, traffic lights, cars and mountains, look like. With millions of people labelling these objects, an accurate model can be built. Such a model could then be used in a self-driving car to recognise traffic lights, signs and bridges. Facial recognition is another example: by building a model from multiple images of a person’s face, a system can identify that person accurately.
By building a model from images of correct and incorrect mechanical layouts or specific objects, it is possible to recognise an anomaly and trigger an alert. This can be applied during remote visual assistance for an inspection or audit. In that setting, the model acts as a second pair of eyes and reduces the chance of missing something very subtle. Smartphone cameras have improved enough to capture very high-resolution images, so a single photo can contain tens of millions of pixels. In a system that analyses every pixel, each one becomes a reference point for anomaly detection.
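As a rough illustration of pixel-level anomaly detection, the sketch below builds a per-pixel mean and standard deviation from a set of images of the ‘correct’ state. It then flags pixels in a new image that deviate by more than a chosen number of standard deviations. This variant only needs images of the correct state.

Several assumptions apply, and none of them describes how any particular product works:

- The images are aligned and the same size.
- They are grayscale grids held as plain Python lists.
- They were taken from the same viewpoint.
- The function names, the z-score method and the threshold of 3 are illustrative choices.

Production systems often use learned models rather than raw pixel statistics.

```python
from statistics import mean, pstdev


def build_reference(normal_images):
    """Per-pixel mean and standard deviation over images of the 'correct' state.
    Assumes every image is an aligned, same-sized 2D list of grayscale values."""
    height, width = len(normal_images[0]), len(normal_images[0][0])
    means = [[mean([img[y][x] for img in normal_images]) for x in range(width)]
             for y in range(height)]
    stds = [[pstdev([img[y][x] for img in normal_images]) for x in range(width)]
            for y in range(height)]
    return means, stds


def find_anomalies(image, reference, threshold=3.0, min_std=1.0):
    """Return (row, column) positions whose z-score against the reference exceeds
    the threshold. min_std stops pixels that barely varied in the reference set
    from causing a division by zero or flagging ordinary sensor noise."""
    means, stds = reference
    return [
        (y, x)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if abs(value - means[y][x]) / max(stds[y][x], min_std) > threshold
    ]


# Hypothetical usage: three 2x2 reference images of the correct layout and one
# test image in which the bottom-right pixel has changed noticeably.
normal_images = [
    [[100, 100], [100, 100]],
    [[102, 98], [101, 99]],
    [[98, 102], [99, 101]],
]
reference = build_reference(normal_images)
test_image = [[101, 100], [100, 160]]
anomalies = find_anomalies(test_image, reference)
if anomalies:
    # prints: Alert: 1 anomalous pixel(s) at [(1, 1)]
    print(f"Alert: {len(anomalies)} anomalous pixel(s) at {anomalies}")
```

The `min_std` floor is a deliberate design choice. Without it, a pixel that happened to be identical in every reference image would have a standard deviation of zero. Any change to that pixel, however small, would then register as an anomaly.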
Anomalous sound detection uses the same principle. During a remote visual inspection, the remote user’s smartphone can record sound, which is then compared against a model. This makes it possible to detect problems that may be too subtle for the human ear, or that the inspector or auditor simply does not recognise. This could be particularly helpful for inspections in manufacturing plants or any environment where machinery is operating. Recording quality, and therefore detection accuracy, can be improved with a high-quality external microphone, although most modern smartphones already have very effective built-in microphones.
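One simple way to compare a recording against a model of ‘normal’ sound works in three steps:

1. Summarise each clip as the log energy in a handful of frequency bands.
2. Learn the typical value and spread of each band from recordings of healthy machinery.
3. Score a new clip by how far its bands stray from those values.

The sketch below does this in plain Python. Everything in it is an illustrative assumption rather than a specific product’s method: the band features, the z-score scoring, the `min_std` floor and the synthetic test tones. Practical systems commonly use richer features, such as mel spectrograms, together with learned models.

```python
import cmath
import math
from statistics import mean, pstdev


def band_log_energies(samples, n_bands=8):
    """Log10 energy in n_bands equal-width frequency bands of one mono clip.
    A naive DFT is used for readability; a real system would use an FFT."""
    n = len(samples)
    half = n // 2
    power = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(half)
    ]
    size = half // n_bands
    # The 1e-12 offset avoids log10(0) for silent bands.
    return [math.log10(sum(power[b * size:(b + 1) * size]) + 1e-12) for b in range(n_bands)]


def fit_normal(clips, n_bands=8):
    """Per-band mean and spread of log energy over recordings of healthy machinery."""
    features = [band_log_energies(c, n_bands) for c in clips]
    per_band = list(zip(*features))
    return [mean(b) for b in per_band], [pstdev(b) for b in per_band]


def anomaly_score(clip, model, n_bands=8, min_std=0.5):
    """Largest per-band z-score of a new clip against the normal model.
    min_std (in log10 units) stops bands that barely varied in the healthy
    recordings from turning tiny differences into huge scores."""
    means, stds = model
    energies = band_log_energies(clip, n_bands)
    return max(abs(e - m) / max(s, min_std) for e, m, s in zip(energies, means, stds))


def tone(bin_index, amplitude, n=64):
    """Hypothetical test signal: a sine wave that falls exactly on one DFT bin."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


# Healthy machine: a steady low hum whose loudness varies slightly between recordings.
normal_clips = [tone(3, a) for a in (0.9, 1.0, 1.1)]
model = fit_normal(normal_clips)

healthy = tone(3, 1.05)
# Faulty machine: the same hum plus a high-pitched whine in a band that is normally quiet.
whining = [h + w for h, w in zip(tone(3, 1.0), tone(20, 0.5))]

print(f"healthy score: {anomaly_score(healthy, model):.2f}")
print(f"whining score: {anomaly_score(whining, model):.2f}")
```

An alert would be raised when the score passes a chosen threshold, such as 3. That threshold, like the band layout, would need tuning against real recordings of the machinery being inspected.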
Applying machine learning to remote visual assistance sessions with remote smartphones can make those sessions considerably more effective. Doing so without requiring a smartphone app to be installed means that anyone with a smartphone can take part at the remote end.