Measurement of vocal fold vibrations: Invasive and Non-Invasive Procedures
There are many means of assessing the larynx indirectly, by aerodynamic,
spectrographic, and fundamental frequency evaluation, which provide a great
deal of useful information upon which inferences can be based. But the only way
to know how the larynx is actually working is to look at it more or less
directly. Laryngeal observation is also the only way to rule out specific
structural or functional pathologies.
There is a tendency to forgo laryngeal observation because of the invasiveness
involved, and because invasive techniques must be performed by a licensed
physician. Many other examination procedures, however, are non-invasive. These
can be used by the speech therapist for diagnostic purposes.
The following invasive methods are available:
1) Electromyography
2) Direct Laryngoscopy
3) Stroboscopy
4) Ultra high speed photography
5) Endoscopy
6) Ultrasonography
7) Photoglottography
8) Videokymography
The following are the noninvasive methods available:
1) Electroglottography (EGG)
2) Inverse Filtering
Parameters to be assessed in vocal fold vibration:
1. Horizontal excursion of the edge of the vocal folds: The edge is the part of
the vocal fold (VF) which is located most medially. The edge of the VF also
moves in the vertical and longitudinal directions, but it is difficult to
quantify the movements in these directions.
2. Glottal Width: It refers to the distance between the edges of the VFs in a
given frontal plane.
3. Glottal Area: The area surrounded by the edges of the VFs is called the
glottal area. In normal vibration, the glottal area waveform resembles the
glottal width waveform determined at the middle of the membranous part.
4. Fundamental frequency / fundamental period of vibration: The time span taken
by one vibratory cycle is known as the vibratory period.
5. Opening phase, closing phase, open phase and closed phase: One vibratory
cycle is divided into 2 major phases: the open phase and the closed phase. The
open phase is again divided into the opening and closing phases.
6. Open Quotient (OQ), Speed Quotient (SQ) and Speed Index (SI)
OQ = Duration of the open phase / Duration of the entire cycle
The larger the open phase, the larger the OQ. The value of OQ is 1 when there
is no complete glottal closure.
SQ = Duration of the opening phase / Duration of the closing phase
SI (Speed Index) = (SQ - 1) / (SQ + 1)
7. Amplitude: The size of the greatest displacement is called the maximum
amplitude.
8. Regularity / periodicity of successive vibrations
9. Homogeneity: The structure of the normal VF is roughly homogeneous.
10. Mucosal Wave: In the normal vocal fold, waves traveling on the mucosa from
the inferior to the superior surface are observed during vibration, except in
falsetto. This wave is called the mucosal wave. The speed of the wave is
normally 0.5 to 1 m/sec.
11. Symmetry of the bilateral vocal folds: Symmetrical movement of the vocal
folds indicates that their mechanical properties are the same.
12. Upper lip and lower lip: At certain phases during each vibratory cycle, two
lip-like eminences are observed near the edge of the vocal fold. They are
termed the upper lip and the lower lip. They are usually best observed
immediately after the maximum opening of the vocal folds.
13. Contact area of the vocal folds: During the closed phase, the contact area
of the bilateral vocal folds changes with time. This cannot be observed from
above. In electroglottography and ultrasound glottography, information about
the contact area is reflected in the output signals.
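The quotients listed under item 6 reduce to simple ratios of phase durations.
The following Python sketch computes them from hypothetical durations in
milliseconds. The function name and the example values are illustrative
assumptions, and the Speed Index is taken here in its normalized form,
(SQ - 1) / (SQ + 1), which is also an assumption about how the index is defined.

```python
def glottal_quotients(opening_ms, closing_ms, closed_ms):
    """Return (OQ, SQ, SI) for one vibratory cycle.

    The three arguments are hypothetical phase durations in milliseconds.
    """
    if opening_ms <= 0 or closing_ms <= 0 or closed_ms < 0:
        raise ValueError("opening and closing must be > 0, closed must be >= 0")
    open_ms = opening_ms + closing_ms   # open phase = opening + closing phase
    cycle_ms = open_ms + closed_ms      # entire vibratory cycle
    oq = open_ms / cycle_ms             # Open Quotient
    sq = opening_ms / closing_ms        # Speed Quotient
    si = (sq - 1) / (sq + 1)            # Speed Index (assumed normalized form)
    return oq, sq, si


# Example: 3 ms opening, 2 ms closing, 5 ms closed (a 10 ms cycle, i.e. 100 Hz)
oq, sq, si = glottal_quotients(3.0, 2.0, 5.0)
print(f"OQ={oq:.2f}  SQ={sq:.2f}  SI={si:.2f}")
```

With no closed phase (closed_ms = 0) the function returns OQ = 1, which matches
the statement that OQ is 1 when there is no complete glottal closure.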
INVASIVE PROCEDURES
CINEMATOGRAPHY
Motion pictures of the functioning vocal folds provide an excellent means of
assessing glottal function. Relatively fast filming (about 50 frames per
second) provides sufficient slow motion to determine the fine points of
laryngeal articulatory behavior.
Ultra high speed photography (> 4000 frames per second) allows the finer
aspects of vocal fold motion during each vibratory cycle to be explored at a
speed reduction of 1:250 or more when the film is viewed with a standard
projector. The production and analysis of ultra-high speed films is not easy
and requires much more complex instrumentation than a simple filming system.
There are several different
approaches to laryngeal cinematography, each with its advantages and
limitations. The clinician will need to select that method which is most likely
to provide the needed information with maximal ease and efficiency.
There are two methods:
· Endoscopy
· Ultra high speed photography
Endoscopy
Endoscopy means looking inside and typically refers to looking inside the human
body for medical reasons using an instrument called an endoscope. Endoscopy can
also refer to using a borescope in technical situations where direct
line-of-sight observation is not feasible.
It is a minimally invasive diagnostic medical procedure used to assess the
interior surfaces of an organ by inserting a tube into the body. The instrument
may have a rigid or flexible tube and not only provides an image for visual
inspection and photography, but also enables the taking of biopsies and the
retrieval of foreign objects. Endoscopy is the vehicle for minimally invasive
surgery.
Many endoscopic procedures are
considered to be relatively painless and, at worst, associated with mild
discomfort. Most patients tolerate the procedure with only topical anesthesia
of the oropharynx using lignocaine spray.
Components
An endoscope can consist of:
· a light delivery system to illuminate the organ or object under inspection.
The light source is normally outside the body and the light is typically
directed via an optical fiber system.
The fiberoptic endoscope, commonly called a fiberscope, is inserted through the
nasal cavity (nasoendoscope) and, using the positioning controls, is visually
guided through the velopharyngeal port, across the oropharynx and into the
hypopharynx. There, the tip is brought to the level of the epiglottis and
angled to provide an unobstructed view of the vocal folds. The patient does not
have to be positioned specially, and so the view is of the functioning larynx
in its normal posture and relationship to other structures. The fiberscope can
also be used to observe velopharyngeal activity. The ease with which patients
tolerate the fiberscope, and the excellent view of the larynx that it affords,
make it an instrument of choice in the routine clinical assessment of
disorders.
The fiberscope does not
interfere with the oral articulation and does not significantly hinder velar
closure. Therefore, it is of enormous value in assessing the articulatory
performance of the larynx.
Risks
· Infection
· Punctured organs
· Over-sedation
Laryngoscopy
Definition
Endoscopy that is mainly done for the visual examination of the voice box
(larynx) and the vocal cords is called laryngoscopy. It can also be done to
remove foreign objects stuck in the throat, and to biopsy growths on the vocal
cords. There are two main kinds:
1. Indirect Laryngoscopy – uses mirrors to examine the larynx and hypopharynx
(a portion of the passageway to the lungs and stomach)
2. Direct Laryngoscopy – uses a special instrument, most often a flexible scope
Reasons for Procedure
Laryngoscopy is used to examine and diagnose problems inside the throat. It is
most often performed for the following reasons:
- To diagnose the cause of a persistent cough, hoarseness, throat pain, or bad
breath
- To visualize a mass in the throat
- To evaluate reasons for difficulty swallowing
- To remove a foreign object
- To diagnose suspected cancer
During Procedure – The client should be given sedation, anesthesia, and
medication to decrease secretions. General anesthesia is not required for
office-based flexible laryngoscopy.
Anesthesia – Local; sometimes general for a direct laryngoscopy.
Description of the Procedure –
Direct Fiberoptic Laryngoscopy – The direct method is most often performed
after the more common indirect method to allow for viewing of a greater area,
or if the gag reflex did not allow the doctor to do a thorough examination. A
rigid angled laryngoscope, a thin, fiberoptic instrument that lights and
magnifies images, can provide a more continuous view of the larynx as the
client breathes. The doctor inserts the scope through the nostril or mouth and
into the throat. Through the eyepiece, the doctor examines the larynx. This
method is often done in the operating room under general anesthesia.
Possible Complications –
· Cuts on the bottom of the tongue from stretching it over the teeth
· Anesthesia-related problems
· Vomiting
· Abrasions or bleeding when using a fiberoptic scope
· Excessive swelling or bleeding
· Bleeding from the nose if the scope is passed through the nose
Ultra High Speed Photography
Ultra high speed photography is the science of taking pictures of very fast
phenomena. In 1948, the Society of Motion Picture and Television Engineers
(SMPTE) defined high-speed photography as any set of photographs captured by a
camera capable of 128 frames per second or greater and of at least three
consecutive frames.
In common usage, high speed
photography may refer to either or both of the following meanings. The first is
that the photograph itself may be taken in a way as to appear to freeze the
motion, especially to reduce motion
blur. The second is that a series of photographs may
be taken at a high sampling frequency or frame rate. The first requires a
sensor with good sensitivity and either a very good shuttering system or a very
fast light. The second requires some means of capturing successive frames,
either with a mechanical device or by moving data off electronic sensors very
quickly.
Ordinary camera shutter and transport mechanisms are simply incapable of the
speeds required for ultra high speed work. The shutter is thus replaced by a
rotating prism that, as it turns, projects successive images onto continuously
moving film.
APPARATUS: Basically consists
of an extremely bright light source, an optical system to reflect light from
the subject’s larynx to a camera unit, an ultra high speed camera, an electric
or electronic system to operate the light source and the camera, and a pulse
generator which provides time marks.
PROCEDURE: The vibrating vocal folds are photographed at a frame rate which is
about 20 to 30 times the fundamental frequency of phonation. For example, if
the vocal folds, vibrating at 120 Hz, are photographed at a film speed of 3,000
frames per second, images of 25 phase points are photographed within one
vibratory cycle. When one views this film running at normal speed (24 frames
per second), the events are observed in ultra slow motion: the time dimension
is expanded 125 times. Frame by frame analysis of various parameters
demonstrates the vibratory behavior in detail.
The advantage gained by the very high frame rate is significant. For example,
suppose a larynx phonates at 100 Hz, so that each glottal cycle requires 10 ms.
At a standard film speed of 16 fps, 62.5 ms elapses from the start of one frame
to the start of the next. In this period, more than six glottal cycles will
have occurred. Furthermore, the camera shutter is open for a significant
portion of the frame-to-frame period, certainly long enough for a full glottal
cycle to be completed. The record on standard speed film will therefore be just
as much of a smear as the blur seen by the unaided eye.
At 4000 fps, however, the situation
is different. The frame to frame interval is only ¼ ms, providing 40 successive
approximations during each 10 ms period of the 100 Hz phonation. The high speed
shutter is open about 40% of the time so there is relatively little blurring of
the picture.
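The arithmetic behind these examples can be checked with a few one-line
functions. This is only an illustrative sketch: the function names are
hypothetical, and the numbers passed in are the ones quoted in the text.

```python
def frames_per_cycle(frame_rate_fps, f0_hz):
    """Number of frames captured during one vibratory cycle."""
    return frame_rate_fps / f0_hz


def time_expansion(film_rate_fps, playback_fps):
    """Factor by which time is stretched when the film is played back."""
    return film_rate_fps / playback_fps


def frame_interval_ms(frame_rate_fps):
    """Time from the start of one frame to the start of the next, in ms."""
    return 1000.0 / frame_rate_fps


def cycles_per_frame(frame_rate_fps, f0_hz):
    """Glottal cycles that elapse during one frame-to-frame interval."""
    return f0_hz / frame_rate_fps


# 120 Hz phonation filmed at 3,000 fps and projected at 24 fps
print(frames_per_cycle(3000, 120), time_expansion(3000, 24))

# 100 Hz phonation at a standard film speed of 16 fps
print(frame_interval_ms(16), cycles_per_frame(16, 100))

# 100 Hz phonation at 4,000 fps
print(frame_interval_ms(4000), frames_per_cycle(4000, 100))
```

The same functions can be used to choose a frame rate for a given voice: a
rate of 20 to 30 times the expected fundamental frequency gives 20 to 30 phase
points per cycle.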
Ultra high speed photography using a
laryngeal mirror has proven to be an extraordinary means of getting information
about vocal cord function. Many improvements and modifications of the original
technique have been made to overcome limitations or to meet special
requirements.
This technique is very expensive, in terms of the requisite equipment and
operating costs, and film analysis is tedious and time consuming. It may
therefore not be the preferred approach for clinical purposes. It is, however,
extremely useful for research and teaching purposes.
Electromyography
Electromyography (EMG) is a technique for evaluating and recording physiologic
properties of muscles at rest and while contracting. EMG is performed using an
instrument called an electromyograph, to produce a record called an
electromyogram. An electromyograph detects the electrical potential generated
by muscle cells when these cells contract, and also when the cells are at rest.
INSTRUMENTATION:
The electromyograph essentially consists of an electrode system, an amplifier,
a cathode-ray oscilloscope, a loudspeaker and a recording system. Muscle action
potentials are detected by conductors, called electrodes, placed in the region
of the electrical disturbance.
Electrodes can be divided into 2 comprehensive classes:
1. Intramuscular, i.e. needle or hooked-wire.
2. Surface
The choice between the 2 types (and among the subcategories within each)
depends on the kind of information needed. Electrodes are used in pairs with
differential amplification. The EMG signal is the difference in the voltages
seen by the 2 electrodes. If the electrodes can be made small and if they can
be kept very close together, it is possible to observe the electrical response
of a single muscle fiber.
At least 3 important criteria must be met by the electrode:
1. It must not interfere with or alter normal motor function.
2. It must be able to move with the muscle without generating spurious signals
(movement artifacts).
3. It must be usable in confined areas such as the oral cavity.
In examining disorders of the motor units, a needle electrode should be used.
In order to avoid interfering signals from adjacent muscles, a bipolar needle
electrode is preferable to a monopolar concentric needle electrode.
When a kinesiological pattern of a muscle or that of a set of muscles is
examined, hooked-wire electrodes should be used. They have the following
important advantages:
1. They offer minimum discomfort to the subject, and consequently do not
interfere with normal phonation.
2. They stay fairly well in place regardless of rapid movements of the vocal
folds or displacements of the entire larynx during phonation.
3. They permit considerable localization of the area from which electrical
activity is recorded.
Monopolar needle electrodes simply
use the bare tip of a needle as an electrode. 2 monopolar needles are inserted
into the muscle for recording of the electrical activity in the zone between
them.
PROCEDURE:
This is the only procedure that directly demonstrates muscular activity. It is
very useful as one of the methods for the clinical examination of vocal fold
paralysis. It can be effectively used in examining patients with functional
voice disability. It can be applied to the study and evaluation of muscle and
nerve pathology and of neuromuscular disorders.
A muscle fiber maintains a steady potential across its membrane (inside
negative) at rest. When a nerve impulse arrives at the nerve ending, a chemical
transmitter substance, acetylcholine, is liberated from the nerve ending onto
the motor end plate of the muscle. This induces depolarization of the muscle
fiber membrane, producing an action potential. The action potential is
transmitted along the muscle fiber in both directions at a speed of
approximately 4 meters/sec, exciting the contractile mechanism of the fiber.
The muscle fiber begins to contract after an interval of approximately 1 msec.
When an electrode is placed outside the muscle fiber membrane, the action
potential can be recorded. Since the action potential is very small (0.1 to
1 mV), amplification is required in order to record it. The graphic display
obtained in this way from several muscle fibers is called an electromyogram. No
action potential is recorded in normal resting muscle.
During voluntary contraction of a normal muscle, all the muscle fibers
innervated by a single lower motor neuron act together. The tiny action
potentials of the muscle fibers sum to produce a larger action potential.
Although electromyography reveals a great deal about muscle activity, it
provides very little information about structural movement. A muscle contracts
in a context of many other opposing or augmenting forces.
Insertion of the electrode into the intrinsic laryngeal muscles:
Earlier, the needle electrode was inserted into most of the intrinsic laryngeal
muscles through the mouth. But this often makes it difficult for the subject to
phonate normally.
Hirano et al. (1962) described techniques of inserting needle electrodes into
the muscles through the cervical skin.
The method of insertion into each muscle is as follows:
Cricothyroid Muscle: The skin is pierced at a point above the lower edge of the
cricoid cartilage and lateral to the midline.
Lateral Cricoarytenoid Muscle: The needle is inserted through the cricothyroid
space, penetrating the cricothyroid muscle anterior to the inferior tubercle.
The needle is directed posteriorly, laterally and upwards until the LCA muscle
is pierced.
Posterior Cricoarytenoid Muscle: The procedure is similar to that for the LCA,
but the needle is inserted some 5-10 mm deeper.
Vocalis Muscle: After topical anesthesia of the laryngeal mucosa, the needle is
inserted into the subglottal space through the cricothyroid space at the
midline. It is easier to insert the needle into the vocal fold during phonation
than during respiration.
Interarytenoid Muscle: Following local anesthesia of the laryngeal mucosa, the
needle is inserted into the subglottal cavity through the cricothyroid space at
the midline. The needle is pushed upwards and backwards.
Processing & Display of EMG:
EMG signals are very small, ranging from 200 µV or less to about 1 µV. Very
high amplifier gain (on the order of 10,000) is often needed to get an output
of sufficient amplitude for recording.
The appearance of the electromyogram
will depend on the volume of muscle seen by the electrodes and the nature of
task being examined. If the active region between the electrodes is very small
(as with intramuscular bipolar electrodes) individual muscle action potentials
may be recorded, especially if the muscle is not very active.
The frequency of the spikes of electrical activity can range up to more than
40/sec in the ‘raw’ electromyogram recorded during a speech event. The number
of spikes per second is a direct measure of the degree of muscle activation.
Most often the EMG signal is the sum of many action potentials occurring at
quasi-random times with respect to each other. This kind of summation results
in an interference pattern.
Surface electrodes always yield such a pattern because many motor units
influence them.
Most often, EMG signals are rectified and integrated for analysis. This is the
best way of demonstrating the amount of electrical activity. It has been found
that the amplitude of the integrated signal is related to the force of an
isometric contraction. The optimal averaging time will vary with the muscle
being observed and the task being evaluated. The longer the averaging time, the
more individual muscle action potentials are leveled to create a smooth curve
showing electrical activity over time.
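Rectification and integration can be illustrated with a short pure-Python
sketch. It assumes a trailing moving average as the "integrator", which is only
one of several possible smoothing choices; the function name, the window length
in samples, and the sample values are all hypothetical.

```python
def rectify_and_integrate(samples, window):
    """Full-wave rectify an EMG trace, then smooth it with a moving average.

    samples: raw EMG values (hypothetical, arbitrary units)
    window:  averaging time expressed as a number of samples (>= 1)
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    rectified = [abs(x) for x in samples]       # full-wave rectification
    smoothed = []
    for i in range(len(rectified)):
        start = max(0, i - window + 1)          # trailing window, shorter at the start
        segment = rectified[start:i + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical burst of activity in the middle of the trace
raw = [0.0, 0.2, -0.3, 0.1, -0.8, 0.9, -0.7, 0.6, -0.1, 0.0]
print(rectify_and_integrate(raw, 2))   # short averaging time: spiky envelope
print(rectify_and_integrate(raw, 5))   # longer averaging time: smoother envelope
```

Comparing the two printed envelopes shows the trade-off described above: the
longer window levels individual spikes but also blurs the onset and offset of
the burst.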
Minor differences in muscle activation occur from one repetition of an
utterance to the next. These variations may interfere with an accurate
assessment of what a given muscle actually does during a speech task. To
mitigate this problem, special computer averaging techniques have been
developed. This technique involves sampling many repetitions of the same
utterance. The EMG data are stored in the computer and are aligned at some
reference point chosen by the user.
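The alignment and averaging of repetitions can be sketched as follows. This is
a minimal illustration, not the procedure of any particular EMG system: the
function name, the use of a sample index as the reference point, and the toy
data are assumptions.

```python
def ensemble_average(trials, ref_indices, before, after):
    """Average several EMG repetitions aligned at a user-chosen reference point.

    trials:      list of (already rectified) EMG traces, one per repetition
    ref_indices: index of the reference point within each trace
    before:      number of samples to keep before the reference point
    after:       number of samples to keep after the reference point
    """
    segments = []
    for trial, ref in zip(trials, ref_indices):
        start, stop = ref - before, ref + after + 1
        if start < 0 or stop > len(trial):
            continue                      # skip repetitions that do not cover the window
        segments.append(trial[start:stop])
    if not segments:
        raise ValueError("no repetition covers the requested window")
    n = len(segments)
    # Average sample by sample across the aligned repetitions
    return [sum(column) / n for column in zip(*segments)]


# Two hypothetical repetitions: both share a peak (5) at the reference point,
# while only the first contains an extra, unique event (9).
trials = [[0, 9, 1, 5, 1, 0], [0, 1, 5, 1, 0, 0]]
print(ensemble_average(trials, ref_indices=[3, 2], before=2, after=1))
```

In the printed result the shared peak keeps its full size, while the event that
occurs in only one repetition is reduced, which is the effect the averaging is
meant to achieve.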
The values for each of the samples at each point in time are averaged to
produce a final output in which events that are present in every one of the
samples are emphasized, but those that are unique are de-emphasized.
Significance of electromyography for vocal fold paralysis
Advantages:
1. It is helpful in differentiating vocal fold paralysis from immobility of the
vocal folds caused by various mechanical fixations.
2. It gives information about the degree and the extent of paralysis.
3. It is useful in determining the side of lateral fixation of the vocal fold
in cases of bilateral vocal fold paralysis.
4. It is helpful from a prognostic point of view: the presence of action
potentials induced by voluntary activity indicates a favorable prognosis.
STROBOSCOPY
Stroboscopic examination, as a routine clinical test, is the most practical
technique for examination of the vibratory pattern of the vocal folds. It was
first applied to the human larynx by Oertel in Munich in 1878.
Principle: The stroboscopic effect is based on an optical illusion that arises
from the persistence of vision. According to Talbot’s law, every light
impression on the retina leaves a positive after-image for 0.2 sec. A sequence
of individual frames presented at intervals shorter than 0.2 sec appears as a
continuously moving picture. The naked eye can therefore perceive no more than
five distinct images per second. Vocal fold vibration, which cannot be resolved
by the human eye, becomes visible under stroboscopic light for these reasons.
Apparatus: Basically the apparatus consists of a microphone, a light source, an
electronic control unit and a pedal. It has at least the following three
functions:
1. To extract the fundamental period of the voice signal and to emit flashes
synchronous with the signal.
2. To vary the phase point at which the light flashes.
3. To indicate the fundamental frequency of phonation.
The light source of the stroboscope emits intermittent flashes of light, which
are synchronous with the vibratory cycles. When the frequency of the flashes
coincides exactly with the frequency of vibration of the vocal folds
(synchronization), the vocal folds seem motionless. If the frequency of the
flashes is slightly different from the vibratory frequency (asynchronization),
the vocal folds are illuminated in each cycle, but not at exactly the same
moment, and the entire vibratory cycle can be seen in ‘slow motion’. In
videostroboscopy a permanent image of the apparent motion of the vocal folds is
recorded.
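The synchronized and slow-motion modes can be illustrated numerically. The
sketch below assumes an idealized, perfectly periodic vibration and
instantaneous flashes; the function names and the example frequencies are
hypothetical.

```python
def apparent_frequency(f0_hz, flash_hz):
    """Rate (Hz) at which the illusory slow-motion cycle repeats."""
    return abs(f0_hz - flash_hz)


def illuminated_phases(f0_hz, flash_hz, n_flashes):
    """Phase of the vibratory cycle (0..1) lit by each successive flash."""
    return [(f0_hz * k / flash_hz) % 1.0 for k in range(n_flashes)]


# Synchronized flashes: every flash hits the same phase, so the folds look still
print(apparent_frequency(200, 200), illuminated_phases(200, 200, 4))

# Flashes 1.5 Hz slower than a 200 Hz voice: the lit phase creeps forward a
# little with every flash, giving an apparent cycle 1.5 times per second
print(apparent_frequency(200, 198.5), illuminated_phases(200, 198.5, 4))
```

Because each flash samples a different cycle, the slow-motion picture is a
composite of many cycles rather than a record of any single one, which is why
the method depends on the vibration being regular.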
Clinical Process of Stroboscopy
The stroboscopic examination is more precise through a tele-video-endoscope. To
obtain stroboscopic frames, we need:
· A microphone placed on the patient’s neck near the thyroid cartilage.
· Emission of an Fo to light the stroboscopic lamp.
· A telescope introduced in the mouth, or a fiberscope through the nose, and an
activated foot pedal to control the light ignition.
The patient sustains the vowel ‘i’ for at least 2 seconds, and various pitches
are performed.
Early diagnosis of vocal fold lesions such as soft nodules, vascular
pathologies and premalignant lesions can be enhanced. Stroboscopy provides
information about Fo, symmetry of bilateral movement, regularity, glottic
closure, amplitude, mucosal wave, non-vibrating portions and other findings.
Several parameters that may be evaluated during the course of the stroboscopic
examination are:
· Fundamental frequency: The fundamental frequency is measured by the strobe
unit and used to set the frequency of the light flashes. Strobe light is
typically produced at a frequency several hertz slower than the vocal frequency
to produce the illusion of a slow-motion vibratory cycle. In the locked mode,
an identical frequency is emitted, which produces a still image of a single
portion of the vibratory cycle.
· Periodicity: Periodicity refers to the regularity of successive vocal
motions. Normal vibratory activity is regular and periodic.
· Amplitude: Amplitude refers to the lateral excursion of the vocal folds
during their displacement away from the midline in oscillation. Typical
amplitude is approximately one third of the total width of the vocal fold.
Amplitude is generally graded as normal, less than normal, or greater than
normal.
· Symmetry: Normal motion of the vocal folds is symmetric, both in vibratory
characteristics and in adductory and abductory motion.
· Glottic closure: In the healthy person, the membranous portion of the vocal
folds closes completely during the vibratory cycle. The posterior cartilaginous
glottis may remain open (posterior glottic chink) in some healthy people.
· Mucosal wave: The pattern of light traveling mediolaterally along the
superior surface of the vocal fold during vibration under illumination is
referred to as the mucosal wave. It is a correlate of the pliable cover
(epithelium and superficial lamina propria) of the vocal fold being displaced
relative to the body of the vocal fold (vocalis muscle). Focal abnormalities of
the mucosal wave help to localize pathology in the vocal fold.
· Non-vibrating portion: If there is any portion of the vocal fold which does
not vibrate, in other words, which remains immobile during phonation, it should
be specified.
PHOTOGLOTTOGRAPHY
PRINCIPLE: The principle on which photoglottography rests is simple enough. The
glottis is considered as a shutter through which light passes in proportion to
the degree of opening. If a light is made to shine on the glottis, the amount
of light passing through is directly proportional to the glottal area.
Optoelectronic devices are adequate to transduce changes in luminous intensity
at the rates typical of laryngeal function. It is, therefore, possible to
obtain an electrical voltage proportional to the glottal area. Here, glottal
area variation is recorded with the use of a photoelectric device which
converts light intensity into electric voltage. The glottis is illuminated from
above or below, and the intensity of the light passing through the glottis is
measured with a light sensor placed on the opposite side (relative to the
position of the light source).
APPARATUS: The apparatus consists of a fiber-optic cable, laryngeal mirror,
microphone and foot pedal. The foot pedal controls all functions of the
instrument. Stroboscopic illumination is delivered to the laryngeal mirror via
a fiber-optic cable, and the microphone transduces the fundamental frequency.
WORKING: Sonesson (1959, 1960) first used this device on the human larynx. In
his technique a bright DC light source is placed against the neck just below
the larynx. The source causes the subglottal space to be suffused with the
light that filters through the tissues of the neck and transilluminates the
glottis. A curved light-conducting rod is passed through the mouth and
terminates at the level of the epiglottis. A photomultiplier tube is connected
to the oral end of the rod.
Coleman & Wendahl (1968) recorded photo-electric glottograms simultaneously
with ultra high speed motion photography during sustained phonation. They found
a significant difference between the glottal waveforms obtained by these two
methods. They pointed out the following factors as possible sources of error in
photo-electric glottography:
1. The light density distribution within the vocal folds may not be constant.
2. The changing cross-sectional area of the vocal folds in an anterior plane
may result in an uneven illumination of the vocal folds.
3. Light reflections from the mucosal surfaces may be variable.
4. Vertical movements of the vocal folds towards and away from the light source
are not taken into account.
5. The location of the monitoring devices causes different waveforms.
ULTRASOUND GLOTTOGRAPHY / ULTRASONOGLOTTOGRAPHY / ECHOGLOTTOGRAPHY
Ultrasonic waves, that is, high frequency sound waves (1-10 MHz), can pass
through various kinds of media, including body tissues. This property can be
used for observation of the larynx. But the size of the vocal folds, their
location, the complexity of their movements and the small distances they
traverse during phonation create very special difficulties in adapting
ultrasound for laryngeal examination.
In this technique, vocal fold position can be tracked using the same principles
as SONAR. The process is also called echoglottography. During phonation, the
velocity of the edges of the vocal folds is moderately great. Tracking them
adequately requires that the ultrasound pulses be very short and their
repetition rate high. Thus, special instruments have been designed (Holmer,
Kitzing and Lindstrom, 1973) that provide up to 10,000 pulses per second and
offer characteristics for time-motion displays. While the opening and closing
phases are discernible, the closed and open intervals are not well understood.
Unfortunately, the edge of the vocal fold does not move as a single flat
reflecting plane (Saito, Fukuda, Isogai and Ono, 1981). The complexity of the
changes in its shape creates very confusing echoes. Thus, clear interpretation
of the echoglottogram is not likely to be a simple matter. Since the edges of
the vocal folds constitute a very small target, the ultrasonic beam must be
very narrow and well-defined. The laws of physics conspire to make such a beam
very difficult to obtain.
VIDEOKYMOGRAPHY
A recent development in the techniques of the
visualization of the larynx is that of videokymography. This technique trades
off spatial resolution to improve time resolution when imaging laryngeal
movement. Thus, instead of obtaining a relatively large image of the larynx at
50 or 60 frames per second, this technique images a single line at frame rates
of approximately 8000 frames per second. Because of the high speed imaging,
this technique also does not require stroboscopic illumination for viewing
vocal fold movement. Videokymography can also provide information about the
movement of the upper margin of the vocal folds, the mucosal wave and asymmetry
between the left and right vocal folds, open quotient differences along the
glottis, and sometimes in the closing phase, the lower margin of the vocal
folds.
Videokymography was developed as a means of using television technology to
accomplish some of what ultra-high speed filming can achieve. Videokymography
sacrifices two-dimensionality in order to gain speed. It does that by ignoring
most of the field of view and limiting scanning of the endoscopic image to
rapid repetition of a single line. Each new scan of the same line in the field
of view is displayed on the screen just under the previous scan, so that a
screen image is built up, with time (advancing downward) as the vertical
dimension.
The principle of videokymography is that each frame in the NTSC system
comprises 525 horizontal lines. These lines are read point by point, in
succession, starting from the upper left and finishing at the lower right of
the frame. The CCD camera system reads these 525 lines in two alternating
groups of 262.5 lines each. In the picture, the first line of the first group
is followed by the first line of the second group, then the second line of the
first group, and so on. These groups of lines are known as field A and field B.
Videokymography discards field B and, in each frame, reads only field A.
However, rather than reading all 262.5 lines of field A, the system selects
only one of these lines and reads it 262.5 times in one thirtieth of a second,
which amounts to approximately 7,875 readings of the same line in one second.
(The figure of 7,812.5 readings per second corresponds to the PAL system, with
312.5 lines per field at 25 frames per second.)
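The line rate follows directly from the number of lines per field and the frame
rate, as the short sketch below shows. The function name is hypothetical, and
the PAL values (625 lines, 25 frames per second) are an added assumption used
for comparison with the nominal NTSC values of 525 lines and 30 frames per
second.

```python
def single_line_scan_rate(lines_per_frame, frames_per_second):
    """Readings per second when one line is re-read for a whole field.

    Each frame holds two interlaced fields; only one field is used, and every
    line slot in that field is spent re-reading the same selected line.
    """
    lines_per_field = lines_per_frame / 2
    return lines_per_field * frames_per_second


print(single_line_scan_rate(525, 30))   # nominal NTSC values
print(single_line_scan_rate(625, 25))   # PAL values (assumption for comparison)
```

Either way the result is close to 8,000 line images per second, so a voice at
200 Hz would be sampled roughly 39 times per vibratory cycle.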
Standard vs. High-Speed Mode
The videokymography camera offered by KayPENTAX functions in either standard or
high-speed mode. The standard mode presents a black-and-white composite video
image. In standard mode, the frame rate is 60 Hz (NTSC) or 50 Hz (PAL). This
mode is used to properly position the endoscope for data recording during the
high-speed mode.
In high-speed mode, the camera scans a single line from the standard image at a
rate of nearly 8000 lines/second. With each line displayed on the monitor in
succession, a time history representing successive glottal cycles is produced.
Clinicians may position the endoscope to “select” which portion of the vocal
folds (e.g., middle, anterior commissure, etc.) is observed. A foot pedal
allows the clinician to easily switch between the standard and the high-speed
modes. KayPENTAX also offers a switch/distribution system (with S-Video/BNC
input and S-Video/BNC outputs) to allow easy switching between the standard
color camera used for stroboscopy and the black-and-white videokymography
camera.
Complement to Stroboscopy
Given
its unique capabilities, VKG is the ideal complement to stroboscopy. Although
the full screen display of the VKG “image” constituted of single lines is not
as intuitive as stroboscopy, VKG does allow direct viewing of vocal fold
behaviors which may not be observable with a stroboscopic image. For example,
the high scan rate of VKG allows the direct observation of vocal fold
motion, even if the motion is aperiodic. Thus, voicing initiation, diplophonia,
biphonia, vocal fry, creaky voice, and aperiodicity can all be viewed directly.
Even in normal quasi-periodic phonation, vocal asymmetry and mucosal waves are
clearly visible with this powerful technique.
VKG
promises to fill a key role in broadening the understanding of phonatory
dynamics.
Figure: VKG image of successive glottal cycles with key features labeled.
Figure: The two modes of the VKG camera. The standard mode displays a
black-and-white video image for proper orientation. In high-speed mode, a
single line selected from the standard image is displayed approximately 8000
times per second.
Differences between Videokymography and other Digital High-Speed Imaging Systems
The ability to provide both standard and high-speed images of the vocal folds
(between which the system can be switched immediately) distinguishes the VKG
system from (high-speed) linear cameras, which deliver only line images. The
two modes make the VKG system powerful and more practical for carrying out
meaningful laryngeal examinations. In contrast to high-speed digital imaging
systems, which provide full laryngeal images at high speed, videokymography
achieves its high image rate at the expense of reduced spatial information.
The VKG approach therefore has advantages as well as disadvantages when
compared to (full-image) digital high-speed systems.
Advantages of Videokymography
The advantages of VKG over (full-image) high-speed systems may be listed as
follows:
· VKG is significantly less expensive (in equipment as well as in storage
costs per recorded time).
· A smaller amount of data has to be stored and processed.
· The duration of the recording samples is virtually unlimited (especially
when VCRs and video tapes are used).
· The format of the image information (CCIR or NTSC television standard)
ensures that VKG works with standard, commercially available video equipment
(standard video monitors, VCRs, etc.; generally the same video equipment as in
videolaryngostroboscopy is used for VKG).
· It offers excellent spatial resolution along the scanned line (768
pixels/line in CCIR, cf. the 256 pixels/line usually used in today’s
full-image high-speed systems).
· It offers an excellent image rate (7812.5 images/s in CCIR; cf. 1000 – 2000
images/s, the most frequently used rate in today’s full-image high-speed
systems).
Disadvantages of Videokymography
The disadvantages of VKG, as compared to (full-image) high-speed imaging
systems, arise mainly from the fact that only a single image line is monitored
in VKG. These are:
1) The full image is not available in the high-speed mode.
2) Anterior-posterior phase differences in the vocal fold vibration are not
registered.
3) The measuring position has to be selected and properly adjusted before
recording, using the standard mode of the camera.
4) Gross movements of the larynx can make the recording position inaccurate.
Certain disadvantages are also related to the use of the CCIR (or NTSC) TV
standard:
5) VCRs and PCs often process and display two VKG images simultaneously, in an
interlaced form.
6) The standard television format requires a certain time interval within each
video field to be reserved for synchronization purposes (ca. 2 ms per 20 ms in
CCIR), and that interval cannot be used for image information. These
information gaps slightly complicate image analysis of longer passages of the
resulting VKG signal.
NON-INVASIVE PROCEDURES:
1. Inverse Filtering
The source-filter theory of speech
production provides theoretical background for the inverse filtering technique.
If the transfer function of the vocal tract filter is known, an inverse filter
can be constructed. In principle, the glottal excitation signal can then be
reconstructed by feeding the speech signal through the inverse of the vocal
tract filter.
In practice, the transfer function of the
vocal tract filter can be approximated based on the speech signal and general
knowledge about the voice production mechanism. An approximate inverse filter
can then be constructed. Applying the inverse filter to the speech signal yields
an estimate of the excitation signal, the glottal volume velocity waveform.
This signal is also known as the flow glottogram (FGG) (Hertegård et
al., 1992; Hertegård & Gauffin, 1995).
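The principle can be sketched on a toy signal. This example rests on strong assumptions and is not the method of any of the cited authors. The "speech" is an impulse train passed through a single two-pole resonance, with the formant frequency, pole radius and sampling rate chosen arbitrarily. The vocal tract is estimated with the autocorrelation method of linear prediction, and lip radiation and the shape of the glottal pulse are ignored. All function names are hypothetical.

```python
import math

def all_pole_filter(a, x):
    """Synthesis filter 1/A(z): y[n] = x[n] - sum_j a[j] * y[n-j]."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(a[j] * y[n - j] for j in range(1, len(a)) if n >= j))
    return y

def inverse_filter(a, s):
    """Inverse (FIR) filter A(z): e[n] = sum_j a[j] * s[n-j]."""
    return [sum(a[j] * s[n - j] for j in range(len(a)) if n >= j)
            for n in range(len(s))]

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

# Assumed toy parameters: 8 kHz sampling, 100 Hz pulses, one 700 Hz formant.
fs, f0, formant_hz, radius = 8000, 100, 700.0, 0.9
period = fs // f0
theta = 2 * math.pi * formant_hz / fs
true_a = [1.0, -2 * radius * math.cos(theta), radius ** 2]

excitation = [1.0 if n % period == 0 else 0.0 for n in range(800)]
speech = all_pole_filter(true_a, excitation)       # source -> filter

est_a = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(est_a, speech)           # estimate of the source

print([round(c, 3) for c in true_a])
print([round(c, 3) for c in est_a])
```

In this idealized case the estimated coefficients come out close to the true ones and the residual is close to the original impulse train. With real speech the glottal pulse, lip radiation and noise all have to be accounted for, so the result is an estimate rather than a measurement.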
Inverse filtering was first presented by
Miller (1959), who applied analog electronic filters to cancel the two lowest
formants and the lip radiation effect from a speech signal captured by a
microphone.
Rothenberg (1973) introduced a different
inverse filtering technique that uses the air flow at the mouth as the source
signal. The subject’s mouth and nose are surrounded by a special mask for
measuring the flow waveform. This method allows the estimation of absolute flow
values including the DC component, as opposed to the inverse filtering of the
pressure signal captured by a microphone, which loses the absolute zero level
of flow due to the lip radiation effect. Rothenberg’s technique is also less
sensitive to low-frequency noise.
However, the flow measurement mask imposes
an upper bound of approximately 1.6 kHz on the useful frequency range
(Hertegård & Gauffin, 1992).
Inverse filtering is sensitive to phase distortion in the speech signal within
the frequency range of interest. Traditional tape recorders are problematic
for signal storage in this respect, since they cause substantial phase
distortion, which must be compensated for (Childers et al., 1983). This
problem has been overcome by tape recorders utilizing frequency modulation
(FM), which fulfill the requirement of phase linearity (Miller, 1959). Digital
filtering techniques provide obvious advantages over analog techniques.
According to Hunt et al. (1978), a digital inverse filtering approach had
already been applied to speech by Holmes (1962). Since the 1970s, inverse
filtering has been increasingly realized using digital techniques (Hunt
et al., 1978; Javkin et al., 1987). Nowadays, practically all
inverse filtering methods in use are digital due to the flexibility,
repeatability, and ease of implementation of the digital techniques compared to
analog filters. Digital sampling and storage techniques also do not have the
phase distortion problem, provided that the equipment is of high quality and
the frequency range of flat amplitude response and linear phase response extends
to low frequencies.
Digital inverse filtering methods can be
categorized into manual and automatic techniques. Manual methods require the
human operator to manually adjust filters to match the formants of the speech
signal, whereas automatic methods build a vocal tract model and automatically find
filter parameters, often by means of LPC analysis (Hertegård et al.,
1992). There are also semiautomatic methods that lie somewhere between these
two extremes. For example, the method proposed by Alku (1992) basically finds
the vocal tract filters automatically but the user still controls a few
parameters that affect the resulting flow signal. Södersten et al. (1999)
compared an automatic and a manual inverse filtering method and reported high agreement
between the airflow parameters calculated from the flow signals of these two methods.
However, noticeable differences were also encountered. Inverse filtering
basically involves extracting two signals, the volume velocity waveform
at the glottis, and the effect of the
vocal tract filter, from a single source signal. The technique thus implies
strong assumptions about the glottal volume velocity waveform and the transfer
function of the acoustic vocal tract filter. Consequently, the result of
inverse filtering has to be regarded as an estimate of the glottal flow. The
actual volume velocity waveform at the glottis is not known exactly.
Furthermore, the accuracy of inverse
filtering deteriorates if the fundamental frequency of speech is high, because
the sparse harmonic structure of the excitation spectrum interferes with the formants, which are local
resonances in the spectrum. Nasalized vowels are also not suitable for inverse
filtering because their spectra contain antiformants that are difficult to compensate
for properly (Hertegård et al., 1992).
Despite these limitations of the method,
inverse filtering has proved to be a valuable tool both for clinical use and
for fundamental research of the voice production mechanism (Fritzell, 1992). It
is a non-invasive technique that does not require bulky or expensive equipment.
The restrictions of an application may make inverse filtering the only
practical means of examining the voice source of a subject.
The figure shows a typical example of a speech pressure signal and the
corresponding inverse filtered glottal flow waveform.
Figure: Speech pressure waveform of a
female speaker’s sustained /a/ vowel
and the corresponding inverse filtered glottal flow
waveform.
2. Electroglottography
Electroglottography (EGG) is a non-invasive method for examining vocal fold
vibration. According to several authors (e.g. Colton & Conture, 1990; Baken,
1992; Henrich et al., 2004), the method was first reported by Fabre (1940,
1957). It has since been used for clinical and research purposes for decades.
Electroglottography is based on measuring the electrical impedance across the
neck of the speaker. When the vocal folds are in contact, electric current can
pass through them. When the folds are apart, an insulating air gap separates
them, and the impedance across the larynx is higher. Thus, the impedance
changes across the larynx indicate the variation of the contact area between
the vocal folds.
Electrodes are placed on the subject’s
skin on each side of the larynx and a high-frequency alternating current is fed
through them in order to measure the impedance between the electrodes. The
frequency is typically in the megahertz region and the current is limited to a
few milliamperes to ensure that the electric current is imperceptible and
harmless to the subject (Baken, 1992). The voltage between the electrodes is
typically about 0.5 volts (Marasek, 1997).
The resulting electroglottographic signal,
the electroglottogram, shows the impedance variation as a function of time.
Impedance variation due to vibrating vocal folds is relatively small, typically
only 1–2 percent of the total measured impedance (Baken, 1992).
Furthermore, the impedance varies
considerably due to changing skin moistness and vertical movements of the
larynx. Therefore, high-pass filtering is applied to the obtained electroglottographic
signal in order to eliminate low-frequency noise and to extract only the
variations caused by vocal fold vibration. Additionally, automatic gain control
is often built into EGG devices to maintain appropriate signal level despite
considerable impedance
changes between subjects and also during a
single recording session. These techniques cause phase and amplitude distortion
that may influence the EGG waveform (Scherer et al., 1988, page 291).
Consequently, the EGG signal cannot be considered an absolute measure of vocal
fold contact, and care must be taken when interpreting the signal.
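The effect of the high-pass stage can be illustrated with a synthetic signal. This is a sketch under assumptions, not a description of any particular EGG device. The baseline impedance, the slow drift, the 100 Hz vibration, the 20 Hz cutoff and the simple first-order filter are all chosen for illustration, and `high_pass` is a hypothetical helper.

```python
import math

def high_pass(x, cutoff_hz, fs):
    """First-order (RC-type) high-pass filter; starting at 0 removes the DC level."""
    alpha = 1.0 / (1.0 + 2.0 * math.pi * cutoff_hz / fs)
    y = [0.0]
    for n in range(1, len(x)):
        y.append(alpha * (y[-1] + x[n] - x[n - 1]))
    return y

fs = 4000                                  # assumed sampling rate in Hz
t = [n / fs for n in range(2 * fs)]        # two seconds of signal

baseline = [100.0 for _ in t]              # total impedance (arbitrary units)
drift = [5.0 * math.sin(2 * math.pi * 0.5 * ti) for ti in t]        # larynx movement
vibration = [1.0 * math.sin(2 * math.pi * 100.0 * ti) for ti in t]  # vocal folds
raw_egg = [b + d + v for b, d, v in zip(baseline, drift, vibration)]

filtered = high_pass(raw_egg, cutoff_hz=20.0, fs=fs)

print(round(max(raw_egg) - min(raw_egg), 2))    # dominated by the slow drift
print(round(max(filtered) - min(filtered), 2))  # dominated by the vibration
```

The vibration component passes with a gain slightly below one and a small phase shift, which is a mild instance of the amplitude and phase distortion mentioned above.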
Despite its limitations, EGG yields useful
information about the behavior of the vocal folds during phonation.
Electroglottography has been studied widely and its validity has been assessed
by numerous studies comparing EGG with stroboscopic methods, high-speed imaging,
photoglottography, subglottal pressure measurements, and inverse filtering, see
Henrich et al. (2004) for references. The results show convincingly that
the EGG signal is related to the contact area between the vocal folds.
The figure shows a typical example of a high-quality
electroglottogram recorded during phonation. It has been high-pass filtered to
eliminate the low-frequency components that are not related to the vibration of
the vocal folds.
Rothenberg (1981b) presented a model of
the different phases of the EGG signal period and their relations to the
physiological events occurring in the larynx. This model is presented in Figure
3.6. Other similar models exist, see e.g. Childers et al. (1983). Such models
are, however, idealized simplifications that must not be interpreted literally.
Many authors have pointed out that the EGG signal does not allow exact
determination of the instant of closure, and that locating the instant of glottal
opening from the EGG signal alone is even less accurate; see e.g.
Colton & Conture (1990) and Baken (1992). Titze introduced a mathematical
model that describes the vibration pattern of the vocal folds and predicts the
contact area variation, see e.g. Titze (1990).
Figure: Electroglottogram of the normal
phonation of a male subject. The upper panel shows the EGG signal and the lower
panel its first derivative. Upward change in the signal represents decreasing
impedance and thus increased contact between the vocal folds.
A number of geometric and kinematic
parameters are used to describe the shape and movements of the vocal folds, and
the model gives the corresponding contact area waveform. The model explains
many features of the contact area waveform by relating them to the
physiological pattern of vocal fold vibration: EGG pulse widening is caused by
adduction of the vocal folds, and peak skewing is related to wedge-shaped vocal
folds and vertical phase difference. A knee in rising and falling edges of an
EGG pulse corresponds to the bulging of the contact surfaces of the vocal
folds. Vertical phasing also explains the variation of the pulse waveform
between a triangular and a rectangular shape. Varying characteristics of real
EGG pulses can be explained as combinations of these effects.
By comparing the EGG waveform with
high-speed filming, Childers et al. (1983) related the initial point of
vocal fold contact to a break in the negative slope of the EGG waveform, and
the glottal opening to the instant at which the differentiated EGG (DEGG)
waveform has its absolute maximum. Such DEGG peaks are clearly visible in
the figure. This approach was carried on by Henrich et al. (2004), who
regarded the peaks of the DEGG signal as reliable indicators of the glottal
opening and closing instants defined by reference to the glottal air flow.
However, such peaks are often imprecise or absent, or double peaks occur, and
in these cases the approach cannot be applied reliably.
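Peak picking on the DEGG signal can be sketched as follows. The example assumes a contact-positive polarity (upward means more contact), so that closing appears as the strongest positive DEGG peak and opening as the strongest negative peak in each cycle. It also assumes a clean synthetic waveform and a known period, which real recordings often do not offer. The functions `egg_cycle` and `degg_instants` are hypothetical.

```python
import math

def egg_cycle(n, rise=9, plateau_end=30, fall=31):
    """One synthetic contact-positive EGG cycle: fast closing, slow opening."""
    if n < rise:
        return 0.5 * (1.0 - math.cos(math.pi * n / rise))
    if n < plateau_end:
        return 1.0
    if n < plateau_end + fall:
        return 0.5 * (1.0 + math.cos(math.pi * (n - plateau_end) / fall))
    return 0.0

def degg_instants(egg, period):
    """Per cycle: index of the largest positive and largest negative DEGG sample."""
    degg = [0.0] + [egg[n] - egg[n - 1] for n in range(1, len(egg))]
    closings, openings = [], []
    for start in range(0, len(egg) - period + 1, period):
        window = degg[start:start + period]
        closings.append(start + window.index(max(window)))
        openings.append(start + window.index(min(window)))
    return closings, openings

period = 80                                   # samples per cycle (assumed known)
egg = [egg_cycle(n % period) for n in range(5 * period)]

closings, openings = degg_instants(egg, period)
contact_quotient = [(o - c) / period for c, o in zip(closings, openings)]

print(closings)
print(openings)
print(contact_quotient)
```

The contact quotient derived here is only as trustworthy as the two peaks. With weak, missing or double peaks the same code would still return numbers, so a practical implementation would need to check peak quality before using them.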
In addition to resistance across the neck,
the impedance measurement is also influenced by reactance (capacitance or
inductance) of the examined load. Varying capacitance may be hypothesized to
exist in the glottis when the two vocal folds are separated by a thin
insulating layer of air, as pointed out by Rothenberg (1981b). This hypothesis
can be checked by changing the frequency of the alternating current used for
impedance measurement: the current remains unchanged only if the load is purely
resistive. According to Gauffin (Scherer et al., 1988, page 291), the
impedance is essentially resistive in a wide frequency range.
Figure: The phases of the EGG signal
period and their relations to the glottal air flow and physiological events.
The figure illustrates the Rothenberg model (Rothenberg, 1981b). 1–2: Vocal
folds are maximally closed. 2–3: Vocal folds are separating from lower margins
towards upper margins. 3–4: Upper margins are opening. 4–5: Upper margins are
still opening. Changed slope is due to phase differences along the length of
the vocal folds. 5–6: Vocal folds are fully parted. The distance between the
vocal folds is varying but there is little change in contact area. 6–7: Lower
margins are closing with a phase difference along the length of the vocal
folds. 7–1: Vocal folds are closing from lower margins towards upper margins.
The flow pulse begins closely after point 3 and terminates closely before point
7.
JOURNAL ARTICLES:
Deviant vocal fold vibration as observed during videokymography: The effect on voice quality
Journal of Voice, 2001
Aim: To compare videokymographic image
sequences with the synchronized acoustical speech signal of 4 patients to
obtain more insight into the effect of deviant vocal fold vibration on voice
quality as observed during videokymography.
Method:
Videolaryngoscopic images of the larynx and videokymographic images of vocal
fold vibration were recorded using a rigid telescope, a continuous light
source, a charge coupled device black and white camera in normal mode and in
kymographic mode. Simultaneously with the videokymographic recordings, the
acoustic signal was recorded on the audio track of the video recorder.
Videokymographic and acoustic recordings were digitized.
Results: Observations in this study showed that comparing videokymographic
images with the speech signal provides objective evidence of dynamic voice
events. Improvements in the diagnosis and early detection of laryngeal
disorders can also be anticipated as a result of videokymographic imaging.
Effects of topical anesthetic and flexible fiberoptic laryngoscopy on professional sopranos
Journal of Voice, 2005
Aim: This study examined the acoustic and perceptual effects of topical
anesthetic and flexible fiberoptic laryngoscopy, compared with a control
condition, on the singing voices of ten professional sopranos.
Method: Each participant completed four musical tasks for each experimental
condition: 12 bars of an aria, two scales and a messa di voce exercise at 523 Hz.
Results: This study indicates that young, highly trained operatic sopranos, in
the presence of their opera pedagogue, can generally sing demanding operatic
arias during anesthesia of the nasal cavity and flexible fiberoptic
laryngoscopy without effects on the energy in the singing voice, vibrato rate
and extent, or vocal range. However, a few participants may show a reduction
in the level of energy in the formant region, which was speculated to be
associated with the singer's level of psychological coping in the demanding
situation.
Strobovideolaryngoscopy: results and clinical value
Annals of Otology, Rhinology and Laryngology, 1991
Aim: To determine whether additional experience with strobovideolaryngoscopy
has altered the clinical usefulness of the procedure.
Method:
Diagnoses were noted before and after stroboscopy prospectively for 377
strobovideolaryngoscopy procedures performed during the calendar year 1989.
Observations were recorded about voice
quality, laryngeal color, vocal fold motion, structural abnormalities,
supraglottic muscle function and other findings, symmetry, periodicity, glottic
closure, amplitudes, waveforms and nonvibrating segments.
Results: The procedure has proven very
helpful in caring for voice patients, modifying diagnoses in 47% and
confirming uncertain diagnoses in many of the other patients studied. It is
also helpful in documenting normal vocal fold function in cases of psychogenic
dysphonia or malingering.
The value of laryngeal electromyography in the evaluation of laryngeal motion abnormalities
Journal of Voice, 2006
Aim: To investigate the clinical utility of laryngeal EMG as a diagnostic aid
in the evaluation of movement disorders of the larynx in patients complaining
of dysphonia.
Method: A retrospective chart review was performed of all patients who
presented to a university-based tertiary laryngology referral center for
evaluation of dysphonia over a 13-month period.
Results:
Laryngeal EMG is a useful adjunct to the diagnosis and management of motion
abnormalities in the larynx in patients who present with dysphonia.
Laryngeal EMG findings
can affect the treatment plan in more than one half of the patients who present
with a voice disorder and who are found on examination to have abnormal or
asymmetric adduction, abduction and laryngeal tension.
References:
· Clinical Examination of Voice - Hirano et al.
· Clinical Measurement of Speech and Voice - Baken et al.
· Clinical Measurement of Speech and Voice - Baken & Orlikoff
· Voice Treatment for Children and Adolescents - Andrews & Summer
· Analysis of Human Voice Production Using Inverse Filtering, High-Speed
Imaging, and Electroglottography - Hannu Pulakka