G06V 20/40

Definition Statement

This place covers:

Video summarisation or abstraction, e.g. key-frame extraction, extraction of video features or fingerprints, extraction of representative shots, or detection of important frames by analysing viewers' reactions or by monitoring parts of the video, such as the TV logo.
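
As a rough illustration of key-frame extraction, the sketch below picks, from the frames of one shot, the frame whose grey-level histogram is closest to the mean histogram of the shot. This is only one simple heuristic among many and is not prescribed by this definition. Frames are assumed to be flat lists of 8-bit grey values, and the names `histogram` and `key_frame_index` are hypothetical.

```python
def histogram(frame, bins=4):
    """Normalised grey-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def key_frame_index(frames, bins=4):
    """Index of the frame whose histogram is closest (L1 distance) to the shot's mean histogram."""
    hists = [histogram(f, bins) for f in frames]
    mean = [sum(h[b] for h in hists) / len(hists) for b in range(bins)]
    distances = [sum(abs(h[b] - mean[b]) for b in range(bins)) for h in hists]
    return distances.index(min(distances))
```

For a shot that fades from black to white, the half-dark, half-bright frame in the middle would be selected, since its histogram matches the shot average best.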

High-level semantic clustering, classification and understanding of video scenes, e.g. detection, labelling or Markovian modelling. Examples of video content subject to such analysis are sports broadcasts and TV news.

Low-level semantic clustering or determination of sections in videos such as scenes and shots; classification of shots, e.g. as close-up shot, medium shot or long shot.

Extraction of features, e.g. histogram similarity measures or manifolds, or by use of video fingerprints. Examples of low-level features are colour- or texture-based features, local interest points (key-points), filter responses, edge features, local descriptors (SIFT, SURF, etc.) or combinations of them (see also group G06V 10/40). Examples of high-level features are features related to camera motion (tracking visual features), the presence of skin, the number of faces present, the size of faces or other visible human features, and text or other objects that are identifiable in each frame.
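
A histogram similarity measure of the kind mentioned above can be sketched as follows. The example computes a coarse joint RGB histogram (a low-level colour feature) and compares two histograms by histogram intersection. The pixel format (a list of `(r, g, b)` tuples), the bin count and the function names are assumptions made for illustration only.

```python
def colour_histogram(pixels, bins_per_channel=2):
    """Normalised joint RGB histogram; pixels is a list of (r, g, b) tuples with 0-255 values."""
    n = bins_per_channel
    counts = [0] * (n ** 3)
    for r, g, b in pixels:
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        counts[(ri * n + gi) * n + bi] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 for identical normalised histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Histogram intersection ignores where colours occur in the frame, so it tolerates small object or camera motion but cannot distinguish frames that share the same colour distribution.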

Matching video sequences, e.g. by frame or temporal analysis.
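
As a minimal sketch of matching by temporal analysis, the example below slides a short query sequence over a longer reference sequence and reports the offset with the lowest mean absolute difference. Each frame is assumed to be reduced beforehand to a single scalar signature (for example its mean brightness); that reduction and the name `best_alignment` are hypothetical.

```python
def best_alignment(query, reference):
    """Return (offset, cost) of the best placement of query inside reference.

    Both arguments are lists of per-frame scalar signatures.
    cost is the mean absolute difference over the aligned frames.
    """
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")
    best = None
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        cost = sum(abs(q - r) for q, r in zip(query, window)) / len(query)
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best
```

A low cost at some offset suggests that the query clip occurs in the reference at that position; what counts as "low" depends on the signature used and would need to be tuned.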

Segmenting video sequences, e.g. parsing or cutting the sequence.
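
One common way to cut a sequence into shots is to declare a hard cut wherever consecutive frames differ strongly. The sketch below does this with the mean absolute pixel difference. The threshold value, the frame format (equally sized flat lists of grey values) and the function names are assumptions, and gradual transitions such as fades are not handled.

```python
def shot_boundaries(frames, threshold=50.0):
    """Indices i at which a new shot is assumed to start (hard cut between frame i-1 and i).

    threshold is the mean absolute pixel difference above which a cut is declared.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts


def split_into_shots(frames, threshold=50.0):
    """Cut the sequence at the detected boundaries and return a list of shots."""
    starts = [0] + shot_boundaries(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [frames[s:e] for s, e in zip(starts, ends)]
```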

Video categorisation, e.g. classifying video content as sport, music or news, or recognising commercials in media content for substitution.

Sport games analysis, e.g. tactic analysis in sport videos to assist coaches and players; indexing the final pitching shot in a baseball game; indexing the important parts, such as shots, score points, etc.; video monitoring of the score table.

Generation of compact representations of the video sequence as a result of pattern recognition or image understanding, e.g. creating thumbnails or representative icons.

Detection and recognition of harmful, sexual or violent content.

Discovery of relationships between objects or persons in videos.

Detecting a key/anchor person from a video; characterising the main characters.

Association of a video with semantic information, e.g. keywords, that describes its content, e.g. using Markov random fields.

Generation of semantic labels using a graph which describes the video content, where the nodes are objects or activities and edges are the relationships between them.
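
The graph described above can be sketched as a small data structure in which nodes are objects or activities, edges carry a named relationship, and each edge yields one semantic label. The class name `SceneGraph`, its methods and the label wording are hypothetical; in practice the nodes and edges would come from detectors rather than being entered by hand.

```python
from collections import defaultdict


class SceneGraph:
    """Directed graph: nodes are objects or activities, edges are named relationships."""

    def __init__(self):
        self.nodes = {}                 # node name -> kind ("object" or "activity")
        self.edges = defaultdict(list)  # source name -> [(relation, target name)]

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((relation, target))

    def labels(self):
        """One semantic label per edge, grouped by source node in insertion order."""
        return [f"{s} {rel} {t}" for s, out in self.edges.items() for rel, t in out]
```

Adding the nodes `player`, `kicking` and `ball` with the edges `player performs kicking` and `kicking acts on ball` would produce exactly those two strings as labels.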

Examples

Inline image: G06V0020400000_0

Clustering of the representative frames containing a given face and creation of face thumbnails of a video sequence containing faces.

Inline image: G06V0020400000_1

Recognising football players in a football match and displaying the representative shots in which a certain player was active.

References

Limiting references

This place does not cover:
Extracting overlay text (G06V 20/62)
Information retrieval of video data (G06F 16/70)
Processing of video elementary streams in video servers (H04N 21/234)
Processing of video elementary streams in video client devices (H04N 21/44)

Informative references

Arrangements for image or video understanding in general (G06V 10/00)
Global feature extraction by analysis of the whole pattern, e.g. global shape, global boundary descriptors or involving frequency domain transformations or autocorrelation (G06V 10/42)
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, edge linking or neighbouring slice analysis (G06V 10/44)
Pattern recognition or machine learning in images or video using clustering (G06V 10/762)
Recognition of scenes under surveillance or monitoring activities, e.g. recognising suspicious objects (G06V 20/52)
Labelling scene content, e.g. deriving syntactic or semantic representations (G06V 20/70)
Recognition of human or animal bodies (G06V 40/10)
Recognition of human faces, e.g. facial parts, sketches or expressions (G06V 40/16)
Recognition of movements or behaviour, e.g. gesture recognition (G06V 40/20)
Analysis of motion in images (G06T 7/20)
Image analysis using motion-based segmentation (G06T 7/215)
Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel (G11B 27/00)
Television picture signal circuitry for video frequency region (H04N 5/14)

Glossary

video fingerprinting

class of dimension-reduction techniques for identifying, extracting and summarising characteristic components of a video, enabling that video to be uniquely identified.
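
To make the idea of a fingerprint concrete, the following sketch reduces each frame to an "average hash" (one bit per pixel, set when the pixel is brighter than the frame mean) and compares two videos by Hamming distance. It is an illustrative assumption, not a technique named in this definition: frames are assumed to be small, already downscaled flat lists of grey values, all function names are hypothetical, and such a simple hash does not by itself guarantee unique identification.

```python
def frame_hash(frame):
    """Average hash: one bit per pixel, set when the pixel is brighter than the frame mean."""
    mean = sum(frame) / len(frame)
    bits = 0
    for value in frame:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def video_fingerprint(frames):
    """Compact representation of a video: one small integer hash per frame."""
    return [frame_hash(f) for f in frames]


def fingerprint_distance(fp1, fp2):
    """Total Hamming distance between two fingerprints of equal length."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must cover the same number of frames")
    return sum(bin(a ^ b).count("1") for a, b in zip(fp1, fp2))
```

Because each bit depends only on whether a pixel lies above the frame mean, a uniform brightness shift leaves the fingerprint unchanged, whereas different content flips many bits.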

video summarisation

generation of a short summary of the content of a longer video by selecting and presenting the most informative or interesting video frames.
