G06V 10/46

Definition Statement

This place covers:

Feature extraction techniques in which additional (invariant) information is calculated from certain image regions or patches, or at certain points, that are visually more relevant for comparison or matching.

Feature extraction techniques in which information from multiple local image patches can be combined into a joint descriptor by using an approach called “bag of features” (from its origin in text document matching), “bag of visual features”, or “bag of visual words”.

Notes – technical background

These notes provide more information about the technical subject matter that is classified in this place:

1. The image regions referred to in this place are called “salient regions”, and the points are called “keypoints”, “interest points” or “salient points”. The information assigned to these regions or points is referred to as a local descriptor, because it is derived only from a local neighbourhood of the image.

A local descriptor aims to be invariant to transformations of the depicted image object, e.g. to affine transforms, object deformations, or changes in image capturing conditions such as contrast or scene illumination.

A local descriptor may capture image characteristics across different scales for reliably detecting objects at different sizes, distances, or resolutions. Typical descriptors of this kind include SIFT (scale-invariant feature transform) and SURF (speeded up robust features); salient regions are typically obtained with detectors such as MSER (maximally stable extremal regions) (see Glossary).

At a salient point, the pixels in its immediate neighbourhood have visual characteristics that differ from those of the vast majority of the other pixels. The visual appearance of patches around a salient point is therefore somewhat unique; this uniqueness increases the chance of finding a similar patch in other images showing the same object.

Generally, salient points can be expected to be located at boundaries of objects and at other image regions having a strong contrast.
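
As an illustration only (not part of the classification definition), the following minimal Python sketch shows how salient points and their local descriptors are typically obtained in practice; it assumes OpenCV's SIFT implementation, and the file name is a placeholder.

```python
# Minimal sketch (illustration only): detect salient points (keypoints) and
# compute an invariant local descriptor for the neighbourhood of each point.
# Assumes OpenCV (cv2) with SIFT available; "object.png" is a placeholder.
import cv2

image = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# Each keypoint marks a salient point; each row of "descriptors" is a
# 128-dimensional SIFT descriptor of the patch around that point, designed to
# be robust to scale, rotation and moderate illumination changes.
keypoints, descriptors = sift.detectAndCompute(image, None)

print(f"{len(keypoints)} salient points, descriptor array shape: {descriptors.shape}")
```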

2. A “bag of visual words” is a histogram that indicates the frequencies of patches with particular visual properties. These visual properties are expressed by a codebook, which is commonly obtained by clustering a collection of typical feature descriptors (e.g. SIFT features) in the feature space; each bin of the histogram corresponds to one specific cluster of the codebook.

The process of generating a bag of features typically involves:

A training phase comprising: extracting local feature descriptors from a collection of training images; and clustering these descriptors in the feature space to obtain a codebook of visual words (the cluster centres).

And an operating phase comprising: extracting local feature descriptors from the image to be analysed; assigning each descriptor to the closest visual word of the codebook; and accumulating these assignments into a histogram, the bag of visual words.
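
A minimal Python sketch of these two phases, assuming OpenCV SIFT descriptors and a scikit-learn k-means codebook; the file names and the codebook size of 64 visual words are illustrative placeholders:

```python
# Minimal sketch (illustration only) of the training and operating phases of a
# bag-of-visual-words representation. Assumes OpenCV (cv2) and scikit-learn.
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def local_descriptors(path):
    """SIFT descriptors of one image (empty array if no keypoints are found)."""
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(image, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

# Training phase: cluster the descriptors of the training set into a codebook;
# each cluster centre is one "visual word".
training_images = ["train_0.png", "train_1.png"]            # placeholder file names
all_descriptors = np.vstack([local_descriptors(p) for p in training_images])
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(all_descriptors)

# Operating phase: assign each local descriptor of a new image to its closest
# visual word and accumulate the assignments into a normalised histogram.
def bag_of_visual_words(path):
    words = codebook.predict(local_descriptors(path))
    histogram = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return histogram / max(histogram.sum(), 1.0)

query_histogram = bag_of_visual_words("query.png")          # placeholder file name
```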

Examples

Image reference: G06V0010460000_0

Defining key-patches for different object classes from a training set, computing features from them, and using a set of support vector machine (SVM) classifiers to detect those objects in new images.
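
Purely as an illustration of this example, the following sketch trains one-vs-rest linear SVM classifiers on bag-of-visual-words histograms and applies them to a new image; it reuses the bag_of_visual_words() helper from the previous sketch, and the file names and class labels are hypothetical.

```python
# Minimal sketch (illustration only): train a set of linear SVM classifiers on
# bag-of-visual-words histograms of training images and apply them to a new
# image. Reuses bag_of_visual_words() from the previous sketch; file names and
# class labels are hypothetical.
import numpy as np
from sklearn.svm import LinearSVC

training_images = ["car_0.png", "car_1.png", "cat_0.png", "cat_1.png"]
training_labels = np.array(["car", "car", "cat", "cat"])
training_histograms = np.vstack([bag_of_visual_words(p) for p in training_images])

# LinearSVC fits one linear SVM per class (one-vs-rest), i.e. a set of SVM classifiers.
classifier = LinearSVC().fit(training_histograms, training_labels)

new_histogram = bag_of_visual_words("new_image.png").reshape(1, -1)
print(classifier.predict(new_histogram))                    # e.g. ["car"]
```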

References

Limiting references

This place does not cover:
Colour feature extraction
G06V 10/56

Informative references

Image preprocessing for image or video recognition or understanding involving the determination of a region or volume of interest [ROI, VOI]
G06V 10/25
Global feature extraction, global invariant features (e.g. GIST)
G06V 10/42
Local feature extraction; Extracting of specific shape primitives, e.g. corners, intersections; Computing saliency maps with interactions such as reinforcement or inhibition
G06V 10/44
Local feature extraction, descriptors computed by performing operations within image blocks (e.g. HOG, LBP)
G06V 10/50
Organisation of the matching process; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
G06V 10/75
Obtaining sets of training patterns, e.g. bagging
G06V 10/774
Extracting salient feature points for character recognition
G06V 30/18
Image retrieval systems using metadata
G06F 16/583

Special rules of classification

The present group does not cover biologically inspired approaches to feature extraction based on modelling the receptive fields of visual neurons (e.g. Gabor filters), nor does it cover convolutional neural networks (CNN).

The use of neural networks for image or video pattern recognition or understanding is classified in group G06V 10/82.

When a document presents details on a sampling technique and a clustering technique (bagging), then it should also be classified in group G06V 10/774.

Classical “bag of words” techniques remove most image localisation information (geometry).

When local features are matched directly from one image to another without involving a bagging technique (and thereby retaining geometric information), e.g. when triplets of features are matched using a geometric transformation with a RANSAC algorithm, then the document should also be classified in group G06V 10/75.
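
For contrast with the bag-of-words case, the following minimal sketch (with placeholder file names, assuming OpenCV) matches local features directly between two images and keeps only the matches consistent with a RANSAC-estimated homography, i.e. the kind of technique that this rule additionally sends to G06V 10/75.

```python
# Minimal sketch (illustration only): match local SIFT descriptors directly
# between two images and keep only matches consistent with a homography
# estimated by RANSAC, thereby retaining geometric information.
import cv2
import numpy as np

sift = cv2.SIFT_create()
image1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)      # placeholder file names
image2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
kp1, des1 = sift.detectAndCompute(image1, None)
kp2, des2 = sift.detectAndCompute(image2, None)

# Nearest-neighbour descriptor matching followed by Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]

# Fit a geometric transformation (homography) with RANSAC; the inliers are the
# geometrically consistent feature correspondences.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
homography, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
print(f"{int(inlier_mask.sum())} geometrically consistent matches")
```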

Glossary

BOF

bag of features, see BOW

BOVF

bag of visual features, see BOW

BOVW

bag of visual words, see BOW

BOW

bag of words, a model originally developed for natural language processing; when applied to images, it represents an image by a histogram of visual words, each visual word representing a specific part of the feature space.

MSER

maximally stable extremal regions, a technique used for blob detection

RANSAC

random sample consensus, an iterative algorithm for robustly fitting a model (e.g. a geometric transformation) to data containing outliers

SIFT

scale-invariant feature transform

superpixels

sets of pixels obtained by partitioning a digital image for saliency assessment

SURF

speeded up robust features
