IPC entry: G10L 21/049 [Version 2017.01]
(Type codes: SK = section, KL = class, UKL = subclass, HGR = main group, UGRn = subgroup at dot level n.)

Symbol        Type  Title
G             SK    PHYSICS
G10           KL    MUSICAL INSTRUMENTS; ACOUSTICS
G10L          UKL   SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING [4]
G10L 13/00    HGR   Speech synthesis; Text to speech systems [7, 2006.01]
G10L 13/02    UGR1  . Methods for producing synthetic speech; Speech synthesisers [7, 2006.01, 2013.01]
G10L 13/027   UGR2  . . Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L 13/08) [2013.01]
G10L 13/033   UGR2  . . Voice editing, e.g. manipulating the voice of the synthesiser [2013.01]
G10L 13/04    UGR2  . . Details of speech synthesis systems, e.g. synthesiser structure or memory management [7, 2006.01, 2013.01]
G10L 13/047   UGR3  . . . Architecture of speech synthesisers [2013.01]
G10L 13/06    UGR1  . Elementary speech units used in speech synthesisers; Concatenation rules [7, 2006.01, 2013.01]
G10L 13/07    UGR2  . . Concatenation rules [2013.01]
G10L 13/08    UGR1  . Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination [7, 2006.01, 2013.01]
G10L 13/10    UGR2  . . Prosody rules derived from text; Stress or intonation [2013.01]
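Entry G10L 13/08 above covers grapheme to phoneme translation. As an illustrative sketch only (not part of the scheme), the following Python fragment shows the common lexicon-plus-fallback pattern; the toy lexicon, phone symbols and letter-to-sound map are invented for the example.

```python
# Minimal grapheme-to-phoneme sketch (cf. G10L 13/08). The lexicon and the
# single-letter fallback rules are illustrative; real systems use large
# pronunciation dictionaries plus trained letter-to-sound models.

TOY_LEXICON = {  # word -> phone sequence (ARPAbet-like, invented entries)
    "speech": ["S", "P", "IY1", "CH"],
    "synthesis": ["S", "IH1", "N", "TH", "AH0", "S", "AH0", "S"],
}

FALLBACK = {  # naive one-letter-to-one-sound map, illustration only
    "a": "AE", "e": "EH", "i": "IH", "o": "OW", "u": "AH",
    "b": "B", "c": "K", "d": "D", "f": "F", "g": "G", "h": "HH",
    "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N", "p": "P",
    "q": "K", "r": "R", "s": "S", "t": "T", "v": "V", "w": "W",
    "x": "K S", "y": "Y", "z": "Z",
}

def grapheme_to_phoneme(word):
    """Dictionary lookup first; letter-by-letter fallback otherwise."""
    word = word.lower()
    if word in TOY_LEXICON:
        return TOY_LEXICON[word]
    phones = []
    for letter in word:
        phones.extend(FALLBACK.get(letter, "").split())
    return phones

if __name__ == "__main__":
    print(grapheme_to_phoneme("speech"))   # lexicon hit
    print(grapheme_to_phoneme("vocoder"))  # fallback rules
```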
G10L 15/00    HGR   Speech recognition (G10L 17/00 takes precedence) [7, 2006.01, 2013.01]
G10L 15/01    UGR1  . Assessment or evaluation of speech recognition systems [2013.01]
G10L 15/02    UGR1  . Feature extraction for speech recognition; Selection of recognition unit [7, 2006.01]
G10L 15/04    UGR1  . Segmentation; Word boundary detection [7, 2006.01, 2013.01]
G10L 15/05    UGR2  . . Word boundary detection [2013.01]
G10L 15/06    UGR1  . Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L 15/14 takes precedence) [7, 2006.01, 2013.01]
G10L 15/065   UGR2  . . Adaptation [2013.01]
G10L 15/07    UGR3  . . . to the speaker [2013.01]
G10L 15/08    UGR1  . Speech classification or search [7, 2006.01]
G10L 15/10    UGR2  . . using distance or distortion measures between unknown speech and reference templates [7, 2006.01]
G10L 15/12    UGR2  . . using dynamic programming techniques, e.g. dynamic time warping [DTW] [7, 2006.01]
G10L 15/14    UGR2  . . using statistical models, e.g. Hidden Markov Models [HMM] (G10L 15/18 takes precedence) [7, 2006.01]
G10L 15/16    UGR2  . . using artificial neural networks [7, 2006.01]
G10L 15/18    UGR2  . . using natural language modelling [7, 2006.01, 2013.01]
G10L 15/183   UGR3  . . . using context dependencies, e.g. language models [2013.01]
G10L 15/187   UGR4  . . . . Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams [2013.01]
G10L 15/19    UGR4  . . . . Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules [2013.01]
G10L 15/193   UGR5  . . . . . Formal grammars, e.g. finite state automata, context free grammars or word networks [2013.01]
G10L 15/197   UGR5  . . . . . Probabilistic grammars, e.g. word n-grams [2013.01]
G10L 15/20    UGR1  . Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or of stress induced speech (G10L 21/02 takes precedence) [7, 2006.01]
G10L 15/22    UGR1  . Procedures used during a speech recognition process, e.g. man-machine dialog [7, 2006.01]
G10L 15/24    UGR1  . Speech recognition using non-acoustical features [7, 2006.01, 2013.01]
G10L 15/25    UGR2  . . using position of the lips, movement of the lips or face analysis [2013.01]
G10L 15/26    UGR1  . Speech to text systems (G10L 15/08 takes precedence) [7, 2006.01]
G10L 15/28    UGR1  . Constructional details of speech recognition systems [7, 2006.01, 2013.01]
G10L 15/30    UGR2  . . Distributed recognition, e.g. in client-server systems, for mobile phones or network applications [2013.01]
G10L 15/32    UGR2  . . Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems [2013.01]
G10L 15/34    UGR2  . . Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing [2013.01]
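Subgroups G10L 15/10 and G10L 15/12 above name template matching with distance measures and dynamic time warping [DTW]. As an illustration (not part of the scheme), a minimal pure-Python DTW sketch, assuming feature vectors are plain lists of floats and using the classic symmetric step pattern:

```python
# Minimal dynamic time warping [DTW] sketch (cf. G10L 15/12): aligns an
# unknown feature sequence against a reference template using a Euclidean
# local distance (cf. G10L 15/10).
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated local distance along the optimal warping path."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local distance
            # classic step pattern: diagonal match, insertion, deletion
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    return cost[n][m]

if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
    utterance = [[0.0], [0.9], [2.1], [2.0], [1.1], [0.2]]  # time-stretched
    print(dtw_distance(template, utterance))
```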
G10L 17/00    HGR   Speaker identification or verification [7, 2006.01, 2013.01]
G10L 17/02    UGR1  . Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction [2013.01]
G10L 17/04    UGR1  . Training, enrolment or model building [2013.01]
G10L 17/06    UGR1  . Decision making techniques; Pattern matching strategies [2013.01]
G10L 17/08    UGR2  . . Use of distortion metrics or a particular distance between probe pattern and reference templates [2013.01]
G10L 17/10    UGR2  . . Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems [2013.01]
G10L 17/12    UGR2  . . Score normalisation [2013.01]
G10L 17/14    UGR2  . . Use of phonemic categorisation or speech recognition prior to speaker recognition or verification [2013.01]
G10L 17/16    UGR1  . Hidden Markov models [HMMs] [2013.01]
G10L 17/18    UGR1  . Artificial neural networks; Connectionist approaches [2013.01]
G10L 17/20    UGR1  . Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions [2013.01]
G10L 17/22    UGR1  . Interactive procedures; Man-machine interfaces [2013.01]
G10L 17/24    UGR2  . . the user being prompted to utter a password or a predefined phrase [2013.01]
G10L 17/26    UGR1  . Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices [2013.01]
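Subgroup G10L 17/12 above covers score normalisation in speaker verification. One standard instance is zero normalisation (z-norm), where a raw score is rescaled by the mean and standard deviation of the claimed model's scores against an impostor cohort, so one global decision threshold can be used. A sketch with made-up cohort values:

```python
# Z-norm sketch for speaker-verification scores (cf. G10L 17/12).
# The cohort scores below are invented for illustration.
import statistics

def z_norm(raw_score, cohort_scores):
    """Normalise a raw score against impostor-cohort statistics."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma

if __name__ == "__main__":
    impostor_cohort = [-1.2, -0.8, -1.0, -0.9, -1.1]  # illustrative values
    print(z_norm(0.4, impostor_cohort))  # far above the cohort -> likely target
```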
G10L 19/00    HGR   Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments G10H) [7, 2006.01, 2013.01]
G10L 19/002   UGR1  . Dynamic bit allocation (for perceptual audio coders G10L 19/032) [2013.01]
G10L 19/005   UGR1  . Correction of errors induced by the transmission channel, if related to the coding algorithm [2013.01]
G10L 19/008   UGR1  . Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing [2013.01]
G10L 19/012   UGR1  . Comfort noise or silence coding [2013.01]
G10L 19/018   UGR1  . Audio watermarking, i.e. embedding inaudible data in the audio signal [2013.01]
G10L 19/02    UGR1  . using spectral analysis, e.g. transform vocoders or subband vocoders [7, 2006.01, 2013.01]
G10L 19/022   UGR2  . . Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring [2013.01]
G10L 19/025   UGR3  . . . Detection of transients or attacks for time/frequency resolution switching [2013.01]
G10L 19/028   UGR2  . . Noise substitution, e.g. substituting non-tonal spectral components by noisy source (comfort noise for discontinuous speech transmission G10L 19/012) [2013.01]
G10L 19/03    UGR2  . . Spectral prediction for preventing pre-echo; Temporary noise shaping [TNS], e.g. in MPEG2 or MPEG4 [2013.01]
G10L 19/032   UGR2  . . Quantisation or dequantisation of spectral components [2013.01]
G10L 19/035   UGR3  . . . Scalar quantisation [2013.01]
G10L 19/038   UGR3  . . . Vector quantisation, e.g. TwinVQ audio [2013.01]
G10L 19/04    UGR1  . using predictive techniques [7, 2006.01, 2013.01]
G10L 19/06    UGR2  . . Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients [7, 2006.01, 2013.01]
G10L 19/07    UGR3  . . . Line spectrum pair [LSP] vocoders [2013.01]
G10L 19/08    UGR2  . . Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters [7, 2006.01, 2013.01]
G10L 19/083   UGR3  . . . the excitation function being an excitation gain (G10L 25/90 takes precedence) [2013.01]
G10L 19/087   UGR3  . . . using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC [2013.01]
G10L 19/09    UGR3  . . . Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor [2013.01]
G10L 19/093   UGR3  . . . using sinusoidal excitation models [2013.01]
G10L 19/097   UGR3  . . . using prototype waveform decomposition or prototype waveform interpolative [PWI] coders [2013.01]
G10L 19/10    UGR3  . . . the excitation function being a multipulse excitation [7, 2006.01, 2013.01]
G10L 19/107   UGR4  . . . . Sparse pulse excitation, e.g. by using algebraic codebook [2013.01]
G10L 19/113   UGR4  . . . . Regular pulse excitation [2013.01]
G10L 19/12    UGR3  . . . the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders [7, 2006.01, 2013.01]
G10L 19/125   UGR4  . . . . Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP] [2013.01]
G10L 19/13    UGR4  . . . . Residual excited linear prediction [RELP] [2013.01]
G10L 19/135   UGR4  . . . . Vector sum excited linear prediction [VSELP] [2013.01]
G10L 19/16    UGR2  . . Vocoder architecture [2013.01]
G10L 19/18    UGR3  . . . Vocoders using multiple modes [2013.01]
G10L 19/20    UGR4  . . . . using sound class specific coding, hybrid encoders or object based coding [2013.01]
G10L 19/22    UGR4  . . . . Mode decision, i.e. based on audio signal content versus external parameters [2013.01]
G10L 19/24    UGR4  . . . . Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding [2013.01]
G10L 19/26    UGR2  . . Pre-filtering or post-filtering [2013.01]
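Subgroups G10L 19/04 and G10L 19/06 above concern predictive coding and the short-term prediction coefficients. As an illustration, a sketch of the textbook autocorrelation method with the Levinson-Durbin recursion; NumPy is assumed, and the model order and demo signal are illustrative:

```python
# Short-term (LPC) prediction coefficients via Levinson-Durbin
# (cf. G10L 19/04 and G10L 19/06).
import numpy as np

def lpc(frame, order):
    """Return the all-pole polynomial a[0..order] (a[0] == 1)."""
    # autocorrelation r[0..order] of the analysis frame
    r = np.array([frame[: len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction error power
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]      # forward prediction residual
        k = -acc / err                           # reflection coefficient
        a_prev = a.copy()
        a[1:i + 1] = a_prev[1:i + 1] + k * a_prev[i - 1::-1]
        err *= 1.0 - k * k                       # error update per stage
    return a

if __name__ == "__main__":
    # AR(2) test signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n]
    rng = np.random.default_rng(0)
    e = rng.standard_normal(4000)
    x = np.zeros_like(e)
    for n in range(2, len(e)):
        x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
    print(lpc(x, 2))  # approx [1, -1.3, 0.6]
```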
G10L 21/00    HGR   Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L 19/00 takes precedence) [7, 2006.01, 2013.01]
G10L 21/003   UGR1  . Changing voice quality, e.g. pitch or formants [2013.01]
G10L 21/007   UGR2  . . characterised by the process used [2013.01]
G10L 21/01    UGR3  . . . Correction of time axis [2013.01]
G10L 21/013   UGR3  . . . Adapting to target pitch [2013.01]
G10L 21/02    UGR1  . Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B 3/20; echo suppression in hands-free telephones H04M 9/08) [7, 2006.01, 2013.01]
G10L 21/0208  UGR2  . . Noise filtering [2013.01]
G10L 21/0216  UGR3  . . . characterised by the method used for estimating noise [2013.01]
G10L 21/0224  UGR4  . . . . Processing in the time domain [2013.01]
G10L 21/0232  UGR4  . . . . Processing in the frequency domain [2013.01]
G10L 21/0264  UGR3  . . . characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques [2013.01]
G10L 21/0272  UGR2  . . Voice signal separating [2013.01]
G10L 21/028   UGR3  . . . using properties of sound source [2013.01]
G10L 21/0308  UGR3  . . . characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques [2013.01]
G10L 21/0316  UGR2  . . by changing the amplitude [2013.01]
G10L 21/0324  UGR3  . . . Details of processing therefor [2013.01]
G10L 21/0332  UGR4  . . . . involving modification of waveforms [2013.01]
G10L 21/034   UGR4  . . . . Automatic adjustment [2013.01]
G10L 21/0356  UGR3  . . . for synchronising with other signals, e.g. video signals [2013.01]
G10L 21/0364  UGR3  . . . for improving intelligibility [2013.01]
G10L 21/038   UGR2  . . using band spreading techniques [2013.01]
G10L 21/0388  UGR3  . . . Details of processing therefor [2013.01]
G10L 21/04    UGR1  . Time compression or expansion [7, 2006.01, 2013.01]
G10L 21/043   UGR2  . . by changing speed [2013.01]
G10L 21/045   UGR3  . . . using thinning out or insertion of a waveform [2013.01]
G10L 21/047   UGR4  . . . . characterised by the type of waveform to be thinned out or inserted [2013.01]
G10L 21/049   UGR4  . . . . characterised by the interconnection of waveforms [2013.01]
G10L 21/055   UGR2  . . for synchronising with other signals, e.g. video signals [2013.01]
G10L 21/057   UGR2  . . for improving intelligibility [2013.01]
G10L 21/06    UGR1  . Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids (G10L 15/26 takes precedence) [7, 2006.01, 2013.01]
G10L 21/10    UGR2  . . Transforming into visible information [2013.01]
G10L 21/12    UGR3  . . . by displaying time domain information [2013.01]
G10L 21/14    UGR3  . . . by displaying frequency domain information [2013.01]
G10L 21/16    UGR2  . . Transforming into a non-visible representation (devices or methods enabling ear patients to replace direct auditory perception by another kind of perception A61F 11/04) [2013.01]
G10L 21/18    UGR2  . . Details of the transformation process [2013.01]
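Subgroups G10L 21/0208 and G10L 21/0232 above cover noise filtering with processing in the frequency domain. Spectral subtraction is one classic technique in this area; the sketch below assumes a noise-only stretch of signal is available for the noise estimate, and the frame size, overlap and spectral floor are illustrative choices (NumPy assumed):

```python
# Frequency-domain noise filtering by spectral subtraction
# (cf. G10L 21/0208, G10L 21/0232).
import numpy as np

def spectral_subtraction(noisy, noise_only, frame=256, floor=0.05):
    """Subtract an estimated noise magnitude spectrum frame by frame."""
    hop = frame // 2
    window = np.hanning(frame)
    # noise magnitude spectrum from a noise-only stretch of the recording
    noise_mag = np.abs(np.fft.rfft(noise_only[:frame] * window))
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        seg = noisy[start:start + frame] * window
        spec = np.fft.rfft(seg)
        mag = np.abs(spec) - noise_mag               # subtract noise estimate
        mag = np.maximum(mag, floor * np.abs(spec))  # floor limits musical noise
        # resynthesise with the noisy phase, overlap-add the frames
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame)
        out[start:start + frame] += clean            # Hann at 50% overlap ~ unity
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(8000) / 8000.0
    speechlike = np.sin(2 * np.pi * 200 * t)         # stand-in for speech
    noise = 0.3 * rng.standard_normal(t.size)
    enhanced = spectral_subtraction(speechlike + noise, noise)
```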
G10L 25/00    HGR   Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, H03G 3/34) [2013.01]
G10L 25/03    UGR1  . characterised by the type of extracted parameters [2013.01]
G10L 25/06    UGR2  . . the extracted parameters being correlation coefficients [2013.01]
G10L 25/09    UGR2  . . the extracted parameters being zero crossing rates [2013.01]
G10L 25/12    UGR2  . . the extracted parameters being prediction coefficients [2013.01]
G10L 25/15    UGR2  . . the extracted parameters being formant information [2013.01]
G10L 25/18    UGR2  . . the extracted parameters being spectral information of each sub-band [2013.01]
G10L 25/21    UGR2  . . the extracted parameters being power information [2013.01]
G10L 25/24    UGR2  . . the extracted parameters being the cepstrum [2013.01]
G10L 25/27    UGR1  . characterised by the analysis technique [2013.01]
G10L 25/30    UGR2  . . using neural networks [2013.01]
G10L 25/33    UGR2  . . using fuzzy logic [2013.01]
G10L 25/36    UGR2  . . using chaos theory [2013.01]
G10L 25/39    UGR2  . . using genetic algorithms [2013.01]
G10L 25/45    UGR1  . characterised by the type of analysis window [2013.01]
G10L 25/48    UGR1  . specially adapted for particular use [2013.01]
G10L 25/51    UGR2  . . for comparison or discrimination [2013.01]
G10L 25/54    UGR3  . . . for retrieval [2013.01]
G10L 25/57    UGR3  . . . for processing of video signals [2013.01]
G10L 25/60    UGR3  . . . for measuring the quality of voice signals [2013.01]
G10L 25/63    UGR3  . . . for estimating an emotional state [2013.01]
G10L 25/66    UGR3  . . . for extracting parameters related to health condition (detecting or measuring for diagnostic purposes A61B 5/00) [2013.01]
G10L 25/69    UGR2  . . for evaluating synthetic or decoded voice signals [2013.01]
G10L 25/72    UGR2  . . for transmitting results of analysis [2013.01]
G10L 25/75    UGR1  . for modelling vocal tract parameters [2013.01]
G10L 25/78    UGR1  . Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M 9/10) [2013.01]
G10L 25/81    UGR2  . . for discriminating voice from music [2013.01]
G10L 25/84    UGR2  . . for discriminating voice from noise [2013.01]
G10L 25/87    UGR2  . . Detection of discrete points within a voice signal [2013.01]
G10L 25/90    UGR1  . Pitch determination of speech signals [2013.01]
G10L 25/93    UGR1  . Discriminating between voiced and unvoiced parts of speech signals (G10L 25/90 takes precedence) [2013.01]
G10L 99/00    HGR   Subject matter not provided for in other groups of this subclass [2013.01]
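Among the extracted parameters listed under G10L 25/00, zero crossing rates (G10L 25/09) and pitch (G10L 25/90) have particularly compact reference implementations. A sketch of both follows; the autocorrelation pitch estimator's sampling rate and search band are illustrative choices (NumPy assumed):

```python
# Zero-crossing rate (cf. G10L 25/09) and autocorrelation pitch
# determination (cf. G10L 25/90) on a single analysis frame.
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))

def pitch_autocorr(frame, fs, fmin=50.0, fmax=400.0):
    """Pitch in Hz from the highest autocorrelation peak in [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for the search band
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(1024) / fs
    voiced = np.sin(2 * np.pi * 120.0 * t)     # 120 Hz "voiced" test tone
    print(round(pitch_autocorr(voiced, fs)))   # ~120
    print(zero_crossing_rate(voiced))          # low ZCR, typical of voiced speech
```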