
Altaibayar Tseveenbayar

With a master’s degree in AI and 6+ years of professional experience, Altaibayar does full-stack and mobile development with a focus on AR.


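Over the past few years, the average performance of mobile phones has improved significantly. Whether in raw CPU power or RAM capacity, it is now easier to do computation-heavy tasks on mobile hardware. Although these mobile technologies are headed in the right direction, there is still a lot to be done on mobile platforms, especially with the advent of augmented reality, virtual reality, and artificial intelligence.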

A major challenge in computer vision is detecting objects of interest in images. The human eye and brain do an exceptional job at this, and replicating it in machines is still a dream. Over recent decades, approaches have been developed to mimic this in machines, and they keep getting better.

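In this tutorial, we will explore an algorithm for detecting blobs in images. We will also use that algorithm, as implemented in the open-source library OpenCV, to build a prototype iPhone application that uses the rear camera to acquire images and detect objects in them.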

OpenCV Tutorial

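OpenCV is an open-source library that provides implementations of the major computer vision and machine learning algorithms. If you want to implement an application to detect faces or playing cards on a poker table, or even a simple application for adding effects to an arbitrary image, then OpenCV is a great choice.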

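OpenCV is written in C/C++ and has wrapper libraries for all major platforms. This makes it especially easy to use within the iOS environment. To use it within an Objective-C iOS application, download the OpenCV iOS framework from the official website. Please make sure you are using version 2.4.11 of OpenCV for iOS (which this article assumes you are using), as the newer 3.x releases have some compatibility-breaking changes in how the header files are organized. Detailed installation instructions are documented on its website.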

MSER

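MSER, short for Maximally Stable Extremal Regions, is one of the many methods available for blob detection within images. In simple words, the algorithm identifies contiguous sets of pixels whose outer boundary pixel intensities are higher (by a given threshold) than the inner boundary pixel intensities. Such regions are called maximally stable if they do not change much over a varying range of intensity thresholds.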

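Although many other blob detection algorithms exist, MSER was chosen here because it has a fairly light run-time complexity of O(n log(log(n))), where n is the total number of pixels in the image. The algorithm is also robust to blur and scale, which is an advantage when processing images acquired from real-time sources, such as the camera of a mobile phone.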
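To make the idea of a "maximally stable" region more concrete, here is a deliberately naive C++ sketch of the stability test described by Matas et al. It is not how OpenCV implements MSER. The sketch thresholds a small grayscale image at each level t, measures the area |Q(t)| of the dark region (pixels with intensity ≤ t) that contains a chosen seed pixel, and picks the threshold that minimizes q(t) = (|Q(t+Δ)| − |Q(t−Δ)|) / |Q(t)|. The `GrayImage` type, the seed pixel, 4-connectivity, and the repeated flood fills are hypothetical simplifications for illustration. The real algorithm handles all regions and thresholds together by adding pixels in intensity order to a union-find structure, instead of re-running a flood fill for every threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy grayscale image: row-major intensities in [0, 255].
struct GrayImage {
    int width;
    int height;
    std::vector<unsigned char> pixels;
};

// Area of the 4-connected region around (seedX, seedY) whose pixels all have
// intensity <= t, i.e. a "dark" extremal region at threshold t.
// Returns 0 if the seed itself is brighter than t.
std::size_t regionArea(const GrayImage& img, int seedX, int seedY, int t) {
    const std::size_t seed = static_cast<std::size_t>(seedY) * img.width + seedX;
    if (img.pixels[seed] > t) return 0;
    std::vector<bool> visited(img.pixels.size(), false);
    std::queue<std::pair<int, int>> frontier;
    frontier.push(std::make_pair(seedX, seedY));
    visited[seed] = true;
    std::size_t area = 0;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        const int x = frontier.front().first;
        const int y = frontier.front().second;
        frontier.pop();
        ++area;
        for (int k = 0; k < 4; ++k) {
            const int nx = x + dx[k];
            const int ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) continue;
            const std::size_t idx = static_cast<std::size_t>(ny) * img.width + nx;
            if (!visited[idx] && img.pixels[idx] <= t) {
                visited[idx] = true;
                frontier.push(std::make_pair(nx, ny));
            }
        }
    }
    return area;
}

// Threshold t in [delta, 255 - delta] that minimizes
// q(t) = (|Q(t + delta)| - |Q(t - delta)|) / |Q(t)| for the region containing
// the seed; returns -1 if the seed never belongs to a region in that range.
int mostStableThreshold(const GrayImage& img, int seedX, int seedY, int delta) {
    int best = -1;
    double bestQ = std::numeric_limits<double>::max();
    for (int t = delta; t <= 255 - delta; ++t) {
        const std::size_t area = regionArea(img, seedX, seedY, t);
        if (area == 0) continue;
        const double growth =
            static_cast<double>(regionArea(img, seedX, seedY, t + delta)) -
            static_cast<double>(regionArea(img, seedX, seedY, t - delta));
        const double q = growth / static_cast<double>(area);
        if (q < bestQ) { bestQ = q; best = t; }
    }
    return best;
}
```

Consider a 5×5 image whose central 3×3 block has intensity 50 on a background of 200. With Δ = 5 and the seed at the center, the block keeps an area of 9 pixels across a wide range of thresholds. `mostStableThreshold` returns 55, the first threshold at which q(t) drops to 0.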

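In this tutorial, we will design the application to detect Toptal’s logo. The symbol has sharp corners, which might make you wonder how effective corner detection algorithms would be at detecting it. After all, such algorithms are both simple to use and easy to understand. Corner-based methods can have a high success rate when detecting objects that stand out clearly from the background, such as black objects on white backgrounds. Even so, it would be difficult to achieve real-time detection of Toptal’s logo in real-world images, where the algorithm would constantly be detecting hundreds of corners.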

Strategy


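The application first converts every image frame it acquires through the camera to grayscale. A grayscale image has only one color channel, but the logo remains visible in it. This makes the image easier for the algorithm to handle and significantly reduces the amount of data it has to process. The color information we give up would add little to no benefit.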
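The conversion itself is a weighted sum of the color channels. The sketch below works on a plain interleaved BGRA byte buffer instead of `cv::Mat`. It uses the luma weights that OpenCV documents for its RGB/BGR-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The function name and buffer layout are assumptions for illustration, and the alpha channel is simply ignored. The output uses one byte per pixel instead of four, which is the data reduction described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved BGRA buffer (4 bytes per pixel, the layout that
// CV_BGRA2GRAY expects) into a single-channel grayscale buffer.
std::vector<std::uint8_t> bgraToGray(const std::vector<std::uint8_t>& bgra) {
    std::vector<std::uint8_t> gray(bgra.size() / 4);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const double b = bgra[4 * i + 0];
        const double g = bgra[4 * i + 1];
        const double r = bgra[4 * i + 2];
        // bgra[4 * i + 3] is alpha and does not contribute to brightness.
        const double y = 0.299 * r + 0.587 * g + 0.114 * b;
        gray[i] = static_cast<std::uint8_t>(std::lround(y));
    }
    return gray;
}
```

For example, a pure red BGRA pixel (0, 0, 255, 255) maps to 76, and a pure blue one (255, 0, 0, 255) maps to 29.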

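Next, we will use OpenCV’s implementation of the algorithm to extract all MSERs. Then, each MSER is normalized by transforming its minimum bounding rectangle into a square. This step is important because the logo might be captured from different angles and distances, and normalization increases tolerance to perspective distortion.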
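OpenCV provides this step through `cv::getPerspectiveTransform` and `cv::warpPerspective`. To show what the transformation matrix actually is, here is a dependency-free sketch that solves for the 3×3 homography mapping four source corners onto the corners of a side × side square. The source corners could be, for example, those of an MSER’s minimum-area rectangle. The sketch fixes h33 = 1 and solves the resulting 8×8 linear system with Gauss-Jordan elimination. The names `Pt`, `homographyToSquare`, and `applyHomography` are hypothetical and do not come from the project.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

struct Pt { double x, y; };

using Homography = std::array<double, 9>;  // row-major; H[8] is fixed to 1

// Solves for H mapping src[i] onto the square corners (0,0), (side,0),
// (side,side), (0,side). Returns false if the points are degenerate.
bool homographyToSquare(const std::array<Pt, 4>& src, double side, Homography& H) {
    const std::array<Pt, 4> dst = {{{0, 0}, {side, 0}, {side, side}, {0, side}}};
    double A[8][9] = {};  // augmented system [A | b]
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        // u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, and likewise for v.
        const double r0[9] = {x, y, 1, 0, 0, 0, -x * u, -y * u, u};
        const double r1[9] = {0, 0, 0, x, y, 1, -x * v, -y * v, v};
        for (int j = 0; j < 9; ++j) { A[2 * i][j] = r0[j]; A[2 * i + 1][j] = r1[j]; }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        if (std::fabs(A[pivot][col]) < 1e-12) return false;
        for (int j = 0; j < 9; ++j) std::swap(A[col][j], A[pivot][j]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = A[r][col] / A[col][col];
            for (int j = col; j < 9; ++j) A[r][j] -= f * A[col][j];
        }
    }
    for (int k = 0; k < 8; ++k) H[k] = A[k][8] / A[k][k];
    H[8] = 1.0;
    return true;
}

// Applies H to a point, including the perspective division by w.
Pt applyHomography(const Homography& H, Pt p) {
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    return {(H[0] * p.x + H[1] * p.y + H[2]) / w, (H[3] * p.x + H[4] * p.y + H[5]) / w};
}
```

Mapping the axis-aligned square with corners (0, 0) and (10, 10) onto a 20 × 20 square produces a pure scaling, so the center (5, 5) lands on (10, 10).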

Furthermore, a number of properties are computed for each MSER:

  • Number of holes
  • Ratio of the area of MSER to the area of its convex hull
  • Ratio of the area of MSER to the area of its minimum-area rectangle
  • Ratio of the length of MSER skeleton to area of the MSER
  • Ratio of the area of MSER to the area of its biggest contour
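As an example of how one of these properties could be computed, the sketch below estimates the second one: the ratio of an MSER’s area to the area of its convex hull. It takes the region as a list of pixel coordinates, which is the form in which OpenCV returns MSERs. The sketch makes several illustrative assumptions and is not the project’s feature code. It treats every pixel as a unit square, builds the convex hull of the pixel corners with Andrew’s monotone chain algorithm, and measures the hull with the shoelace formula. Values near 1 indicate a compact, convex blob, while concave shapes score lower.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pixel { int x, y; };

// Cross product of (b - a) and (c - a); positive for a counter-clockwise turn.
long long cross(const Pixel& a, const Pixel& b, const Pixel& c) {
    return static_cast<long long>(b.x - a.x) * (c.y - a.y) -
           static_cast<long long>(b.y - a.y) * (c.x - a.x);
}

// Andrew's monotone chain; returns hull vertices without collinear points.
std::vector<Pixel> convexHull(std::vector<Pixel> pts) {
    std::sort(pts.begin(), pts.end(), [](const Pixel& a, const Pixel& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    pts.erase(std::unique(pts.begin(), pts.end(),
                          [](const Pixel& a, const Pixel& b) { return a.x == b.x && a.y == b.y; }),
              pts.end());
    if (pts.size() < 3) return pts;
    std::vector<Pixel> hull(2 * pts.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {  // lower hull
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (std::size_t i = pts.size() - 1, t = k + 1; i > 0; --i) {  // upper hull
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i - 1]) <= 0) --k;
        hull[k++] = pts[i - 1];
    }
    hull.resize(k - 1);  // the last point repeats the first one
    return hull;
}

// Shoelace formula for the area of a simple polygon.
double polygonArea(const std::vector<Pixel>& poly) {
    long long twice = 0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Pixel& p = poly[i];
        const Pixel& q = poly[(i + 1) % poly.size()];
        twice += static_cast<long long>(p.x) * q.y - static_cast<long long>(q.x) * p.y;
    }
    return std::fabs(static_cast<double>(twice)) / 2.0;
}

// Pixel count of the region divided by the area of its convex hull.
double areaToHullRatio(const std::vector<Pixel>& region) {
    std::vector<Pixel> corners;
    corners.reserve(region.size() * 4);
    for (const Pixel& p : region) {  // each pixel covers a unit square
        corners.push_back({p.x, p.y});
        corners.push_back({p.x + 1, p.y});
        corners.push_back({p.x, p.y + 1});
        corners.push_back({p.x + 1, p.y + 1});
    }
    const double hullArea = polygonArea(convexHull(corners));
    return hullArea > 0 ? static_cast<double>(region.size()) / hullArea : 0.0;
}
```

A solid 2 × 2 block of pixels scores exactly 1.0. An L-shaped region of three pixels covers 3 of the 3.5 units of its hull, so it scores about 0.857.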


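To detect Toptal’s logo in an image, the properties of every MSER are compared with the already-learned properties of the Toptal logo. For the purposes of this tutorial, the maximum allowed difference for each property was chosen empirically.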

Finally, the most similar region is chosen as the result.
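The comparison can be sketched as follows. Each candidate’s feature vector is first checked property by property against the learned template, using a maximum allowed difference per property. Candidates that pass are scored by the sum of absolute differences, which is one reading of the “sum of the differences” described for `MLManager`’s `distance` method. The lowest score wins. The feature layout, the tolerance values, and the use of absolute values are hypothetical placeholders, not the project’s empirically chosen settings. The 10.0 starting ceiling is borrowed from the initial `bestPoint` in `processImage`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout, in the order of the property list above: holes,
// area/hull, area/minAreaRect, skeletonLength/area, area/biggestContour.
using Feature = std::array<double, 5>;

// Placeholder per-property tolerances (the article tuned these empirically).
const Feature kMaxDiff = {0.0, 0.1, 0.1, 0.05, 0.1};

// True if every property is within its allowed difference from the template.
bool matchesTemplate(const Feature& f, const Feature& tmpl) {
    for (std::size_t i = 0; i < f.size(); ++i)
        if (std::fabs(f[i] - tmpl[i]) > kMaxDiff[i]) return false;
    return true;
}

// Sum of absolute per-property differences; lower means more similar.
double featureDistance(const Feature& f, const Feature& tmpl) {
    double sum = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i) sum += std::fabs(f[i] - tmpl[i]);
    return sum;
}

// Index of the most similar passing candidate, or -1 if none qualifies.
int bestCandidate(const std::vector<Feature>& candidates, const Feature& tmpl) {
    int best = -1;
    double bestScore = 10.0;  // same starting ceiling as bestPoint in processImage
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (!matchesTemplate(candidates[i], tmpl)) continue;
        const double d = featureDistance(candidates[i], tmpl);
        if (d < bestScore) { bestScore = d; best = static_cast<int>(i); }
    }
    return best;
}
```

Splitting the test into a hard per-property gate and a soft overall score lets a single badly mismatched property, such as the number of holes, reject a region outright. Among the plausible regions, the ranking still rewards the closest one.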

iOS Application

Using OpenCV on iOS is easy. If you haven’t done so yet, here is a quick outline of the steps for setting up Xcode to create an iOS application that uses OpenCV:

  1. Create a new project named “SuperCool Logo Detector.” As the language, select Objective-C.

  2. Add a new Prefix Header (.pch) file and name it PrefixHeader.pch

  3. Go to the build target of the project “SuperCool Logo Detector,” and in the Build Settings tab, find the “Prefix Headers” setting. You can find it in the LLVM Language section, or use the search feature.

  4. Add “PrefixHeader.pch” to the Prefix Header setting.

  5. At this point, if you haven’t installed OpenCV for iOS 2.4.11 yet, do it now.

  6. Drag and drop the downloaded framework into the project. Check that it is listed under “Linked Frameworks and Libraries” in your target settings. (It should be added automatically, but it is better to be safe.)

  7. In addition, link the following frameworks:

    • AVFoundation
    • AssetsLibrary
    • CoreMedia
  8. Open “PrefixHeader.pch” and add the following three lines:

     #ifdef __cplusplus
     #include <opencv2/opencv.hpp>
     #endif
    
  9. Change the extensions of the automatically created code files from “.m” to “.mm”. OpenCV is written in C++, and the *.mm extension tells the compiler that you will be using Objective-C++.

  10. Import “opencv2/highgui/cap_ios.h” in ViewController.h and make ViewController conform to the CvVideoCameraDelegate protocol:

    #import <opencv2/highgui/cap_ios.h>
    @interface ViewController : UIViewController <CvVideoCameraDelegate>
    
  11. Open Main.storyboard and put a UIImageView on the initial view controller.

  12. Create an outlet for it in ViewController.mm named “imageView”

  13. Create a variable “CvVideoCamera *camera;” in ViewController.h or ViewController.mm, and initialize it with a reference to the rear camera:

    camera = [[CvVideoCamera alloc] initWithParentView: _imageView];
    camera.defaultAVCaptureDevicePosition = AVCaptureDevicePositionBack;
    camera.defaultAVCaptureSessionPreset = AVCaptureSessionPreset640x480;
    camera.defaultAVCaptureVideoOrientation = AVCaptureVideoOrientationPortrait;
    camera.defaultFPS = 30;
    camera.grayscaleMode = NO;
    camera.delegate = self;
    
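  14. If you build the project now, Xcode will warn you that you haven’t implemented the “processImage” method from CvVideoCameraDelegate. For now, and for the sake of simplicity, we will just acquire the images from the camera and overlay them with a simple text: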

    • Add a single line to “viewDidAppear”:
    [camera start];
    
    • Now, if you run the application, it will ask for permission to access the camera. After that, you should see the video from the camera.

    • In the “processImage” method add the following two lines:

    const char* str = [@"Toptal" cStringUsingEncoding: NSUTF8StringEncoding];
    cv::putText(image, str, cv::Point(100, 100), CV_FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(0,0,255));
    

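That is pretty much it. You now have a very simple application that draws the text “Toptal” over the camera images. We can now build our target logo-detecting application on top of this simpler one. For brevity, this article discusses only the few code segments that are critical to understanding how the application works overall. The code on GitHub has a fair amount of comments explaining what each segment does.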

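Since the application has only one purpose, detecting Toptal’s logo, it extracts the MSER features from the given template image as soon as it launches and stores their values in memory: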

cv::Mat logo = [ImageUtils cvMatFromUIImage: templateImage];

//get gray image
cv::Mat gray;
cv::cvtColor(logo, gray, CV_BGRA2GRAY);

//get the MSER with the maximum area
std::vector<cv::Point> maxMser = [ImageUtils maxMser: &gray];

//get the 4 vertices of the maxMSER's minimum-area rectangle
cv::RotatedRect rect = cv::minAreaRect(maxMser);
cv::Point2f points[4];
rect.points(points);

//normalize image
cv::Mat M = [GeometryUtil getPerspectiveMatrix: points toSize: rect.size];
cv::Mat normalizedImage = [GeometryUtil normalizeImage: &gray withTranformationMatrix: &M withSize: rect.size.width];

//get maxMser from the normalized image
std::vector<cv::Point> normalizedMser = [ImageUtils maxMser: &normalizedImage];

//remember the template
self.logoTemplate = [[MSERManager sharedInstance] extractFeature: &normalizedMser];

//store the feature
[self storeTemplate];

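The application has only one screen, with a Start/Stop button. All the necessary information, such as the FPS and the number of detected MSERs, is drawn on the image automatically. As long as the application is not stopped, the following processImage method is invoked for every image frame from the camera: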

- (void) processImage:(cv::Mat &)image
{
    cv::Mat gray;
    cv::cvtColor(image, gray, CV_BGRA2GRAY);

    std::vector<std::vector<cv::Point>> msers;
    [[MSERManager sharedInstance] detectRegions: gray intoVector: msers];
    if (msers.size() == 0) { return; }

    std::vector<cv::Point> *bestMser = nullptr;
    double bestPoint = 10.0;

    std::for_each(msers.begin(), msers.end(), [&] (std::vector<cv::Point> &mser)
    {
        MSERFeature *feature = [[MSERManager sharedInstance] extractFeature: &mser];

        if(feature != nil)
        {
            if([[MLManager sharedInstance] isToptalLogo: feature] )
            {
                double tmp = [[MLManager sharedInstance] distance: feature ];
                if ( bestPoint > tmp ) {
                    bestPoint = tmp;
                    bestMser = &mser;
                }
            }
        }
    });

    // GREEN, RED, W and H are project constants not shown in this excerpt
    if (bestMser)
    {
        NSLog(@"minDist: %f", bestPoint);

        cv::Rect bound = cv::boundingRect(*bestMser);
        cv::rectangle(image, bound, GREEN, 3);
    }
    else
    {
        cv::rectangle(image, cv::Rect(0,0, W, H), RED, 3);
    }

    // Omitted debug code

    [FPS draw: image];
}

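In essence, this method creates a grayscale copy of the original image. It identifies all MSERs, extracts their relevant features, scores each MSER for similarity with the template, and picks the best one. Finally, it draws a green boundary around the best MSER and overlays the image with meta-information. If no MSER qualifies, it draws a red border around the whole frame instead.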

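Below are the definitions of a few important classes in this application and their methods. Their purposes are described in the comments.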

GeometryUtil.h

/*
 This static class provides perspective transformation functions
 */
@interface GeometryUtil: NSObject

/*
 Returns the perspective transformation matrix that maps the given points to a square
 with origin [0,0] and size (size.width, size.width)
 */
+ (cv::Mat) getPerspectiveMatrix: (cv::Point2f[]) points toSize: (cv::Size2f) size;

/*
 Returns a new, perspective-transformed image with the given size
 */
+ (cv::Mat) normalizeImage: (cv::Mat *) image withTranformationMatrix: (cv::Mat *) M withSize: (float) size;

@end

MSERManager.h

/*
 Singleton class providing MSER-related functions
 */
@interface MSERManager: NSObject

+ (MSERManager *) sharedInstance;

/*
 Extracts all MSERs into the provided vector
 */
- (void) detectRegions: (cv::Mat &) gray intoVector: (std::vector<std::vector<cv::Point>> &) vector;

/*
 Extracts the feature from an MSER. For some MSERs, the feature can be NULL!!!
 */
- (MSERFeature *) extractFeature: (std::vector<cv::Point> *) mser;

@end

MLManager.h

/*
 This singleton class wraps the object recognition functions
 */
@interface MLManager: NSObject

+ (MLManager *) sharedInstance;

/*
 Stores feature from the biggest MSER in the templateImage
 */
- (void) learn: (UIImage *) templateImage;

/*
 Sum of the differences between logo feature and given feature
 */
- (double) distance: (MSERFeature *) feature;

/*
 Returns true if the given feature is similar to the one learned from the template
 */
- (BOOL) isToptalLogo: (MSERFeature *) feature;

@end

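Once everything is wired together, you should be able to use this application with your iOS device’s camera to detect Toptal’s logo from different angles and orientations.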

Detecting an image (the Toptal logo) vertically.

Detecting an image (the Toptal logo) diagonally on a shirt.

Augmented reality applications start with understanding an image, and this is how you can do it.

Conclusion

In this article, we have shown how easy it is to detect simple objects in an image using OpenCV. The entire code is available on GitHub. Feel free to fork it and send pull requests, as contributions are welcome.

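As with any machine learning problem, the success rate of logo detection in this application could be increased by using a different set of features and a different method of object classification. Still, I hope this article helps you get started with object detection using MSER and, more generally, with applications of computer vision techniques.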

Further Reading

  • J. Matas, O. Chum, M. Urban, and T. Pajdla (2002). “Robust wide baseline stereo from maximally stable extremal regions.”
  • L. Neumann and J. Matas (2011). “A Method for Text Localization and Recognition in Real-World Images.”