Measurement of vocal fold vibrations: Invasive and Non Invasive Procedures



There are many means of assessing the larynx indirectly, such as aerodynamic, spectrographic, and fundamental frequency evaluation, which provide a great deal of useful information upon which inferences can be based. But the only way to know how the larynx is actually working is to look at it more or less directly. Laryngeal observation is also the only way to rule out specific structural or functional pathologies.
There is a tendency to forgo laryngeal observation because of the invasiveness involved and because these techniques must be performed by a licensed physician. Many other examination procedures, however, are non-invasive and can be used by the speech therapist for diagnostic purposes.

The following invasive methods are available:
1)      Electromyography
2)      Direct Laryngoscopy
3)      Stroboscopy
4)      Ultra high speed photography
5)      Endoscopy
6)      Ultrasonography
7)      Photoglottography
8)      Videokymography

The following are the noninvasive methods available:
1)      Electroglottography (EGG)
2)      Inverse Filtering

Parameters to be assessed in vocal fold vibration:
1.      Horizontal excursion of the edge of the vocal folds: The edge is the part of the vocal fold (VF) which is located most medially. The edge of the VF also moves in the vertical and longitudinal directions, but it is difficult to quantify the movements in these directions.
2.      Glottal Width: The distance between the edges of the VFs in a given frontal plane.
3.      Glottal Area: The area surrounded by the edges of the VFs is called the glottal area. In normal vibration, the glottal area waveform resembles the glottal width waveform determined at the middle of the membranous part.
4.      Fundamental frequency / fundamental period of vibration: The time span taken by one vibratory cycle is known as the vibratory period; the fundamental frequency is its reciprocal.
5.      Opening phase, closing phase, open phase and closed phase.
One vibratory cycle is divided into 2 major phases: the open phase and the closed phase. The open phase is again divided into the opening and closing phases.
6.      Open Quotient (OQ), Speed quotient (SQ) and Speed Index (SI)
OQ = (duration of the open phase) / (duration of the entire cycle)
The larger the open phase (OP), the larger the OQ. The value of OQ is 1 when there is no complete glottal closure.
SQ = (duration of the opening phase) / (duration of the closing phase)
SI (Speed Index) = (SQ - 1) / (SQ + 1)
7.      Amplitude: The size of the greatest displacements is called maximum amplitude.
8.      Regularity / periodicity of successive vibration
9.      Homogeneity: The structure of the normal VF is roughly homogeneous.
10.  Mucosal Wave: In the normal vocal fold, waves traveling on the mucosa from its inferior to its superior surface are observed during vibration, except in falsetto. This wave is called the mucosal wave. The speed of the wave is normally 0.5 to 1 m/sec.
11.  Symmetry of the bilateral vocal folds
Symmetrical movement of vocal folds indicates that their mechanical properties are the same.
12.  Upper lip and lower lip
At certain phases during each vibratory cycle, two lip-like eminences are observed near the edge of the vocal fold. They are termed the upper lip and the lower lip. They are usually best observed immediately after the maximum opening of the vocal folds.

13.   Contact area of the vocal folds
During the closed phase, the contact area of the bilateral vocal folds changes with time. This cannot be observed from above. In electroglottography and ultrasound glottography, information about the contact area is reflected in the output signals.
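
The quotients defined in item 6 above can be computed directly from the phase durations of one vibratory cycle. The following is a minimal sketch only; the function name and the example durations are hypothetical and are not taken from any measurement.

```python
def glottal_quotients(opening_ms, closing_ms, closed_ms):
    """Return (OQ, SQ, SI) from the phase durations of one vibratory cycle."""
    open_ms = opening_ms + closing_ms      # open phase = opening + closing phase
    period_ms = open_ms + closed_ms        # entire cycle = open + closed phase
    oq = open_ms / period_ms               # Open Quotient
    sq = opening_ms / closing_ms           # Speed Quotient
    si = (sq - 1) / (sq + 1)               # Speed Index
    return oq, sq, si


# Hypothetical cycle: 3 ms opening, 2 ms closing, 5 ms closed (10 ms period).
oq, sq, si = glottal_quotients(opening_ms=3.0, closing_ms=2.0, closed_ms=5.0)
print(f"OQ={oq:.2f} SQ={sq:.2f} SI={si:.2f}")  # OQ=0.50 SQ=1.50 SI=0.20
```

With a closed phase of zero the function returns OQ = 1, which matches the statement that OQ is 1 when there is no complete glottal closure.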

INVASIVE PROCEDURES

CINEMATOGRAPHY

Motion pictures of the functioning vocal folds provide an excellent means of assessing glottal function. Relatively fast filming (about 50 frames per second) provides sufficient slow motion to determine the fine points of laryngeal articulatory behavior.

Ultra high speed photography (> 4000 frames per second) allows the finer aspects of vocal fold motion during each vibratory cycle to be explored at a speed reduction of 1:250 or more when the film is viewed with a standard projector. The production and analysis of ultra-high speed films is not easy and requires much more complex instrumentation than the simple system.

There are several different approaches to laryngeal cinematography, each with its advantages and limitations. The clinician will need to select that method which is most likely to provide the needed information with maximal ease and efficiency.

There are two methods:

·         Endoscopy

·         Ultra high speed photography

Endoscopy
Endoscopy means looking inside and typically refers to looking inside the human body for medical reasons using an instrument called an endoscope. Endoscopy can also refer to using a borescope in technical situations where direct line-of-sight observation is not feasible. It is a minimally invasive diagnostic medical procedure used to assess the interior surfaces of an organ by inserting a tube into the body. The instrument may have a rigid or flexible tube and not only provide an image for visual inspection and photography, but also enable taking biopsies and retrieval of foreign objects. Endoscopy is the vehicle for minimally invasive surgery.
Many endoscopic procedures are considered to be relatively painless and, at worst, associated with mild discomfort. Most patients tolerate the procedure with only topical anesthesia of the oropharynx using lignocaine spray.

Components

An endoscope can consist of:
·         a rigid or flexible tube
·         a light delivery system to illuminate the organ or object under inspection. The light source is normally outside the body and the light is typically directed via an optical fiber system
·         a lens system transmitting the image to the viewer from the fiberscope
·         an additional channel to allow entry of medical instruments or manipulators
The fiberoptic endoscope, commonly called a fiberscope, is inserted through the nasal cavity (nasoendoscope) and, using the positioning controls, is visually guided through the velopharyngeal port, across the oropharynx and into the hypopharynx. There, the tip is brought to the level of the epiglottis and angled to provide an unobstructed view of the vocal folds. The patient does not have to be positioned specially, and so the view is of the functioning larynx in its normal posture and relationship to other structures. The fiberscope can also be used to observe velopharyngeal activity. The ease with which patients tolerate the fiberscope, and the excellent view of the larynx that it affords, make it an instrument of choice in the routine clinical assessment of disorders.
The fiberscope does not interfere with the oral articulation and does not significantly hinder velar closure. Therefore, it is of enormous value in assessing the articulatory performance of the larynx.

Risks

·         Infection
·         Punctured organs
·         Allergic reactions due to Contrast agents or dyes (such as those used in a CT scan)
·         Over-sedation

Laryngoscopy

Definition

Endoscopy that is done mainly for visual examination of the voice box (larynx) and the vocal cords is called laryngoscopy. It can also be done to remove foreign objects stuck in the throat and to biopsy growths on the vocal cords. There are two main kinds:
1.      Indirect Laryngoscopy–uses mirrors to examine the larynx and hypo-pharynx (a portion of the passageway to the lungs and stomach)
2.      Direct Laryngoscopy–uses a special instrument, most often a flexible scope

Reasons for Procedure

Laryngoscopy is used to examine and diagnose problems inside the throat. It is most often performed for the following reasons:
-To diagnose the cause of a persistent cough, hoarseness, throat pain, or bad breath
-To visualize a mass in the throat
-To evaluate reasons for difficulty swallowing
-To remove a foreign object
-To diagnose suspected cancer
-To evaluate a possible cause for persistent ear-ache

During Procedure –

The client should be given sedation, anesthesia, and medication to decrease secretions. General anesthesia is not required for the office-based flexible laryngoscopy.

Anesthesia –Local, sometimes general for a direct laryngoscopy

Description of the Procedure –

Direct Fiberoptic Laryngoscopy–The direct method is most often performed after the more common indirect method, either to allow viewing of a greater area or because the gag reflex did not allow the doctor to do a thorough examination. A rigid angled laryngoscope, a thin fiberoptic instrument that lights and magnifies images, can provide a more continuous view of the larynx as the client breathes. The doctor inserts the scope through the nostril or mouth and into the throat. Through the eyepiece, the doctor examines the larynx. This method is often done in the operating room under general anesthesia.

Possible Complications –

·         Cuts on the bottom of the tongue from stretching it over the teeth
·         Anesthesia-related problems
·         Vomiting
·         Abrasions or bleeding when using a fiberoptic scope
·         Excessive swelling or bleeding
·         Bleeding from the nose if the scope is passed through the nose

Ultra High Speed Photography
Ultra high Speed Photography is the science of taking pictures of very fast phenomena. In 1948, the Society of Motion Picture and Television Engineers (SMPTE) defined high-speed photography as any set of photographs captured by a camera capable of 128 frames per second or greater and of at least three consecutive frames.
In common usage, high speed photography may refer to either or both of the following meanings. The first is that the photograph itself may be taken in a way as to appear to freeze the motion, especially to reduce motion blur. The second is that a series of photographs may be taken at a high sampling frequency or frame rate. The first requires a sensor with good sensitivity and either a very good shuttering system or a very fast light. The second requires some means of capturing successive frames, either with a mechanical device or by moving data off electronic sensors very quickly.
Ordinary camera shutter and transport mechanisms are simply incapable of the speeds required for ultra high speed work. The shutter is thus replaced by a rotating prism that, as it turns, projects successive images onto continuously moving film.
APPARATUS: Basically consists of an extremely bright light source, an optical system to reflect light from the subject’s larynx to a camera unit, an ultra high speed camera, an electric or electronic system to operate the light source and the camera, and a pulse generator which provides time marks.
PROCEDURE: The vibrating vocal folds are photographed at a frame rate about 20 to 30 times the fundamental frequency of phonation. For example, if the vocal folds, vibrating at 120 Hz, are photographed at a film speed of 3,000 frames per second, images of 25 phase points are captured within one vibratory cycle. When this film is viewed at normal speed (24 frames per second), the events are observed in ultra slow motion: the time dimension is expanded 125 times. Frame-by-frame analysis of various parameters demonstrates the vibratory behavior in detail.
The advantage gained by the very high frame rate is significant. For example, suppose a larynx phonates at 100 Hz, so that each glottal cycle requires 10 ms. Standard film speed is 16 fps, meaning that 62.5 ms elapse from the start of one frame to the start of the next. In this period, more than six glottal cycles will have occurred. Furthermore, the camera shutter is open for a significant portion of the frame-to-frame period, certainly long enough for a full glottal cycle to be completed. The record on standard-speed film will therefore be just as much of a smear as the blur seen by the unaided eye.
At 4000 fps, however, the situation is different. The frame-to-frame interval is only 0.25 ms, providing 40 successive frames during each 10 ms period of the 100 Hz phonation. The high-speed shutter is open about 40% of the time, so there is relatively little blurring of the picture.
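
The frame-rate arithmetic in the examples above can be checked with a few lines of code. This is only an illustrative sketch; the function names are hypothetical, and the numbers are the ones quoted in the text.

```python
def frames_per_cycle(frame_rate_fps, f0_hz):
    """Number of frames captured during one vibratory cycle."""
    return frame_rate_fps / f0_hz


def slow_motion_factor(recording_fps, playback_fps):
    """How many times the time dimension is expanded on playback."""
    return recording_fps / playback_fps


def frame_interval_ms(frame_rate_fps):
    """Time from the start of one frame to the start of the next."""
    return 1000.0 / frame_rate_fps


print(frames_per_cycle(3000, 120))    # 25.0 phase points per cycle
print(slow_motion_factor(3000, 24))   # 125.0 times slower on playback
print(frame_interval_ms(16))          # 62.5 ms at standard film speed
print(frame_interval_ms(4000))        # 0.25 ms at 4000 fps
print(frames_per_cycle(4000, 100))    # 40.0 frames per 10 ms cycle
```

The same functions show why 16 fps is inadequate: `frames_per_cycle(16, 100)` is well below one frame per glottal cycle.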

Ultra high speed photography using a laryngeal mirror has proven to be an extraordinary means of getting information about vocal cord function. Many improvements and modifications of the original technique have been made to overcome limitations or to meet special requirements.
This technique is very expensive in terms of the requisite equipment and operating costs, and film analysis is tedious and time consuming. It may therefore not be the preferred approach for clinical purposes. It is, however, extremely useful for research and teaching purposes.


Electromyography
Electromyography (EMG) is a technique for evaluating and recording physiologic properties of muscles at rest and while contracting. EMG is performed using an instrument called an electromyograph, to produce a record called an electromyogram. An electromyograph detects the electrical potential generated by muscle cells when these cells contract, and also when the cells are at rest.

INSTRUMENTATION:
The electromyograph essentially consists of an electrode system, an amplifier, a cathode-ray oscilloscope, a loudspeaker and a recording system. Muscle action potentials are detected by conductors, called electrodes, placed in the region of the electrical disturbance.
Electrodes can be divided into 2 comprehensive classes:
1.      Intramuscular, i.e. needle or hooked-wire.
2.      Surface
The choice between the 2 types (and among the subcategories within each) depends on the kind of information needed. Electrodes are used in pairs with differential amplification. The EMG signal is the difference in the voltages seen by the 2 electrodes. If the electrodes can be made small and if they can be kept very close together, it is possible to observe the electrical response of a single muscle fiber.
At least 3 important criteria must be met by the electrode:
1.      Must not interfere with or alter normal motor function.
2.      Must be able to move with muscle without generating spurious signals (movements artifacts)
3.      Must be usable in confined areas such as the oral cavity.

In examining disorders of the motor units, a needle electrode should be used. In order to avoid interfering signals from adjacent muscles, a bipolar needle electrode is preferable to a monopolar concentric needle electrode.

When a kinesiological pattern of a muscle or that of a set of muscles is examined, hooked-wire electrodes should be used. They have following important advantages:

1.      They offer minimum discomfort to the subject, and consequently do not interfere with normal phonation.

2.      They stay fairly well in place regardless of rapid movements of the vocal folds or the displacements of entire larynx during phonation.

3.      They permit considerable localization of the area from which electrical activity is recorded.

Monopolar needle electrodes simply use the bare tip of a needle as an electrode. 2 monopolar needles are inserted into the muscle for recording of the electrical activity in the zone between them.

PROCEDURE:
This is the only procedure that directly demonstrates muscular activity. It is very useful as one of the methods for the clinical examination of vocal fold paralysis. It can be used effectively in examining patients with functional voice disability. It can also be applied to the study and evaluation of muscle and nerve pathology and of neuromuscular disorders.
A muscle fiber maintains a steady potential across its membrane (inside negative) at rest. When a nerve impulse arrives at the nerve ending, a chemical transmitter substance, acetylcholine, is liberated from the nerve ending onto the motor end plate of the muscle. This induces depolarization of the muscle fiber membrane, producing an action potential. The action potential is transmitted along the muscle fiber in both directions at a speed of approximately 4 meters/sec, exciting the contractile mechanism of the fiber. The muscle fiber begins to contract after an interval of approximately 1 msec.
When an electrode is placed outside the muscle fiber membrane, the action potential can be recorded. Since the action potential is very small (0.1 to 1 mV), amplification is required in order to record it. The graphic display obtained in this way from several muscle fibers is called an electromyogram. No action potential is recorded in normal resting muscle.
During voluntary contraction of a normal muscle, all the muscle fibers innervated by a single lower motor neuron act together. The tiny action potentials of the muscle fibers sum to produce a larger action potential.
Although electromyography reveals a great deal about muscle activity, it provides very little information about structural movement. A muscle contracts in a context of many other opposing or augmenting forces.

Insertion of the electrode into the intrinsic laryngeal muscles:
Earlier, needle electrodes were inserted into most of the intrinsic laryngeal muscles through the mouth. But this often makes it difficult for subjects to phonate normally.
Hirano et al. (1962) described techniques of inserting needle electrodes into the muscles through the cervical skin.
The method of insertion into each muscle is as follows:
Cricothyroid muscle: The skin is pierced at a point above the lower edge of the cricoid cartilage and lateral to the midline.
Lateral Cricoarytenoid Muscle: The needle is inserted through the cricothyroid space, penetrating the cricothyroid muscle anterior to the inferior tuberculum. The needle is directed posteriorly, laterally and upwards until the LCA muscle is pierced.
Posterior Cricoarytenoid Muscle: The procedure is similar to that for the LCA, but the needle is inserted some 5-10 mm deeper.
Vocalis Muscle: After topical anesthesia of the laryngeal mucosa, the needle is inserted into the subglottal space through the cricothyroid space at the midline. It is easier to insert the needle into the vocal fold during phonation than during respiration.
Interarytenoid muscle: Following local anesthesia of the laryngeal mucosa, the needle is inserted into the subglottal cavity through the cricothyroid space at the midline. The needle is pushed upwards and backwards.

Processing & Display of EMG:
EMG signals are very small, ranging from 200 µV or less to about 1 mV. Very high amplifier gain (on the order of 10,000) is often needed to get an output of sufficient amplitude for recording.
The appearance of the electromyogram will depend on the volume of muscle seen by the electrodes and the nature of task being examined. If the active region between the electrodes is very small (as with intramuscular bipolar electrodes) individual muscle action potentials may be recorded, especially if the muscle is not very active.
The frequency of the spikes of electrical activity can range up to more than 40/sec, making ‘raw’ electromyogram during a speech event. The No. of spikes per sec. is a direct measure of the degree of muscle activation.
Most often the EMG signal is the sum of electrical events occurring at quasi-random times with respect to each other. This kind of summation results in an interference pattern.
Surface electrodes always yield such a pattern because many motor units influence them.
Most often, EMG signals are rectified and integrated for analysis. This is the best way of demonstrating the amount of electrical activity. It has been found that the amplitude of the integrated signal is proportional to the force of an isometric contraction. The optimal averaging time will vary with the muscle being observed and the task being evaluated. The longer the averaging time, the more individual muscle action potentials are leveled to create a smooth curve showing electrical activity over time.
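
The rectify-and-integrate step can be illustrated with a short sketch. It assumes that a simple moving average stands in for the integrator, with the window length (in samples) playing the role of the averaging time; the function name and sample values are hypothetical.

```python
def rectify_and_smooth(samples, window):
    """Full-wave rectify the signal, then average over the last `window` samples."""
    rectified = [abs(s) for s in samples]
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]   # drop the sample leaving the window
        n = min(i + 1, window)                 # shorter window at the very start
        smoothed.append(running / n)
    return smoothed


# Hypothetical raw EMG samples (arbitrary units), alternating in sign.
raw = [0.0, 0.2, -0.4, 0.6, -0.6, 0.4, -0.2, 0.0]
short = rectify_and_smooth(raw, window=2)
long = rectify_and_smooth(raw, window=6)
print(max(short), max(long))  # the longer window gives a flatter, lower peak
```

A longer window levels more of the individual spikes, at the cost of blurring rapid changes in activation, which is the trade-off described above.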
Minor differences in muscle activation occur from one repetition of an utterance to the next. These variations may interfere with an accurate assessment of what a given muscle actually does during a speech task. To mitigate this problem, special computer averaging techniques have been developed. The technique involves sampling many repetitions of the same utterance. The EMG data are stored in the computer and are aligned at some reference point chosen by the user.
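
The alignment-and-averaging idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular EMG system: each repetition is assumed to be an already rectified and smoothed list of samples, the reference index for each repetition is supplied by the user, and the function name and data are hypothetical.

```python
def ensemble_average(repetitions, reference_indices, before, after):
    """Align repetitions at their reference index and average point by point."""
    aligned = []
    for signal, ref in zip(repetitions, reference_indices):
        start, stop = ref - before, ref + after + 1
        if start < 0 or stop > len(signal):
            continue                      # repetition does not cover the window
        aligned.append(signal[start:stop])
    if not aligned:
        raise ValueError("no repetition covers the requested window")
    count = len(aligned)
    return [sum(column) / count for column in zip(*aligned)]


# Three hypothetical repetitions with the same burst at different positions.
reps = [
    [0, 1, 5, 1, 0, 0, 0],   # burst peak at index 2
    [0, 0, 1, 5, 1, 0, 0],   # same burst, peak at index 3
    [1, 5, 1, 3, 0, 0, 0],   # peak at index 1, plus a one-off event of size 3
]
avg = ensemble_average(reps, reference_indices=[2, 3, 1], before=1, after=3)
print(avg)  # [1.0, 5.0, 1.0, 1.0, 0.0]
```

The burst common to all repetitions keeps its full height, while the event that occurs in only one repetition is reduced to a third of its size.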
The values for each of the samples at each point in time are averaged to produce a final output in which events that are present in every one of the samples are emphasized, while those that are unique are de-emphasized.
The significance of electromyography for vocal fold paralysis
Advantages:
1.      It is helpful in differentiating vocal fold paralysis from immobility of the vocal folds caused by various mechanical fixations.
2.      It gives information about the degree and the extent of paralysis.
3.      It is useful in determining the side of lateral fixation of the vocal fold in cases of bilateral vocal fold paralysis.
4.      It is helpful from a prognostic point of view: the presence of action potentials induced by voluntary activity indicates a favorable prognosis.


STROBOSCOPY
Stroboscopic examination, as a routine clinical test, is the most practical technique for examining the vibratory pattern of the vocal folds. It was first developed by Oertel in Munich in 1878 for use on the human larynx.
Principle: The stroboscopic effect is based on an optical illusion that arises from the persistence of vision. According to Talbot’s law, every light impression on the retina leaves a positive after-image for 0.2 sec. A sequence of individual frames presented at intervals shorter than 0.2 sec therefore appears as a continuously moving picture. The naked eye can perceive no more than five distinct images per second. For these reasons, vocal fold vibration that cannot be resolved by the human eye becomes visible under stroboscopic light.
Apparatus: Basically the apparatus consists of a microphone, a light source, an electronic control unit and a pedal. It has at least the following three functions:
1.      To extract the fundamental period of the voice signal and to emit flashes synchronous with that signal.
2.      To vary the phase point at which the light flashes.
3.      To indicate the fundamental frequency of phonation.
The light source of the stroboscope emits intermittent flashes of light, which are synchronous with the vibratory cycles. When the frequency of the flashes coincides exactly with the frequency of vibration of the vocal folds (synchronization), the vocal folds seem motionless. If the frequency of the flashes is slightly different from the vibratory frequency (asynchronization), the vocal folds are illuminated in each cycle but not at exactly the same moment, and the entire vibratory cycle can be seen in ‘slow motion’. In videostroboscopy a permanent image of the apparent motion of the vocal folds is recorded.
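
The relation between flash frequency and apparent motion can be sketched numerically. The sketch assumes the flash rate lies within a few hertz of the vibratory frequency, so that the apparent frequency is simply the difference between the two; the function names and example values are hypothetical.

```python
def apparent_frequency_hz(f0_hz, flash_hz):
    """Apparent (slow-motion) vibration frequency seen under strobe light."""
    return abs(f0_hz - flash_hz)


def apparent_cycle_duration_s(f0_hz, flash_hz):
    """Duration of one apparent vibratory cycle as seen by the examiner."""
    diff = apparent_frequency_hz(f0_hz, flash_hz)
    if diff == 0:
        return float("inf")   # synchronization: the folds appear motionless
    return 1.0 / diff


print(apparent_frequency_hz(200, 198.5))     # 1.5 apparent cycles per second
print(apparent_cycle_duration_s(200, 198))   # 0.5 s per apparent cycle
print(apparent_cycle_duration_s(200, 200))   # inf (still image)
```

A flash rate 1 to 2 Hz away from the vocal frequency therefore stretches each apparent cycle to between half a second and a second, slow enough for the eye to follow.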

Clinical Process of Stroboscopy
The stroboscopic examination is more precise when performed through a tele-video-endoscope. To obtain stroboscopic frames, we need:
·         A microphone placed on the patient’s neck near the thyroid cartilage.
·         Emission of an Fo to trigger the stroboscopic lamp.
·         A telescope introduced through the mouth, or a fiberscope through the nose, and a foot pedal to control the light ignition. The patient sustains the vowel ‘I’ for at least 2 seconds, and the task is repeated at various pitches.
Early diagnosis of vocal fold lesions such as soft nodules, vascular pathologies and premalignant lesions can be enhanced. Stroboscopy provides information about Fo, symmetry of bilateral movement, regularity, glottal closure, amplitude, mucosal wave, non-vibrating portions and other findings.
Several parameters that may be evaluated during the course of the stroboscopic examination are:
·         Fundamental frequency: The fundamental frequency is measured by using the strobe unit and used to set the frequency of the light flashes. Strobe light is typically produced at a frequency several hertz slower than the vocal frequency to produce the illusion of a slow-motion vibratory cycle. An identical frequency is emitted in the locked mode that produces a still image of a single portion of the vibratory cycle.
·         Periodicity: Periodicity refers to the regularity of successive vocal motions. Normal vibratory activity is regular and periodic.
·         Amplitude: Amplitude refers to the lateral excursion of the vocal folds during their displacement away from the midline in oscillation. Typical amplitude is approximately one third of the total width of the vocal fold. Amplitude is generally graded as normal, less than normal, or greater than normal.
·         Symmetry: Normal motion of the vocal folds is symmetric, both in vibratory characteristic and in adductory and abductory motion.
·         Glottic closure: In the healthy person, the membranous portion of the vocal folds completely closes during the vibratory cycle. The posterior cartilaginous glottis may remain open (posterior glottic chink) in some healthy people.
·         Mucosal wave: The pattern of light traveling mediolaterally along the superior surface of the vocal fold during vibration under illumination is referred to as the mucosal wave. It is a correlate of the pliable cover (epithelium and superficial lamina propria) of the vocal fold being displaced relative to the body of the vocal fold (vocalis muscle). Focal abnormalities of the mucosal wave help to localize pathology in the vocal fold.
·         Non-vibrating portion: Any portion of the vocal fold that does not vibrate, in other words that remains immobile during phonation, should be specified.

PHOTOGLOTTOGRAPHY
PRINCIPLE: The principle on which photoglottography rests is simple enough. The glottis is considered as a shutter through which light passes in proportion to the degree of opening. If a light is made to shine on the glottis, the amount of light passing through is directly proportional to the glottal area. Optoelectronic devices are adequate to transduce changes in luminous intensity at the rates typical of laryngeal function. It is therefore possible to obtain an electrical voltage proportional to the glottal area. Glottal area variation is recorded with a photoelectric device that converts light intensity into electric voltage. The glottis is illuminated from above or below, and the intensity of the light passing through the glottis is measured with a light sensor placed on the opposite side (relative to the position of the light source).
APPARATUS: The apparatus consists of a fiber-optic cable, laryngeal mirror, microphone and foot pedal. The foot pedal controls all functions of the instrument. Stroboscopic illumination is delivered to the laryngeal mirror via the fiber-optic cable, and the microphone transduces the fundamental frequency.
WORKING: Sonesson (1959, 1960) first used this device on the human larynx. In his technique a bright DC light source is placed against the neck just below the larynx. The source causes the subglottal space to be suffused with light that filters through the tissues of the neck and transilluminates the glottis. A curved light-conducting rod is passed through the mouth and terminates at the level of the epiglottis. A photomultiplier tube is connected to the oral end of the rod.
Coleman & Wendahl (1968) recorded photo-electric glottograms simultaneously with ultra high speed motion photography during sustained phonation. They found a significant difference between the glottal waveforms obtained by these two methods. They pointed out the following factors as possible sources of error in photo-electric glottography:
1.      The light density distribution within the vocal folds may not be constant.
2.      The changing cross-sectional area of the vocal folds in an anterior plane may result in an uneven illumination of the vocal folds.
3.      Light reflections from the mucosal surfaces may be variable.
4.      Vertical movements of the vocal folds towards and from the light source are not taken into account.
5.      The location of the monitoring devices causes different waveforms.

ULTRASOUND GLOTTOGRAPHY/ ULTRASONOGLOTTOGRAPHY/ ECHOGLOTTOGRAPHY
Ultrasonic waves, that is, high frequency sound waves (1-10 MHz), can pass through various kinds of media, including body tissues. This property can be used for observation of the larynx. But the size of the vocal folds, their location, the complexity of their movements and the small distances they traverse during phonation create very special difficulties in adapting ultrasound for laryngeal examination.
In this technique, vocal fold position can be tracked using the same principles as SONAR. The process is also called echoglottography. During phonation, the velocity of the edges of the vocal folds is moderately great. Tracking them adequately requires that the ultrasound pulses be very short and their repetition rate high. Thus, special instruments have been designed (Holmer, Kitzing and Lindstrom, 1973) that provide up to 10,000 pulses per second and offer characteristics suited to time-motion displays. While the opening and closing phases are discernible, the closed and open intervals are not well understood.
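
The SONAR-like ranging principle can be illustrated with a rough sketch. It assumes a typical soft-tissue sound speed of about 1540 m/s, which is not given in the text, and uses hypothetical function names and example values; it ignores the beam-width and echo-shape problems that affect real measurements.

```python
SPEED_OF_SOUND_TISSUE_M_S = 1540.0   # assumed typical value for soft tissue


def echo_distance_mm(round_trip_time_us, speed_m_s=SPEED_OF_SOUND_TISSUE_M_S):
    """Depth of a reflector from pulse-echo round-trip time: d = c * t / 2."""
    round_trip_s = round_trip_time_us * 1e-6
    return speed_m_s * round_trip_s / 2.0 * 1000.0


def pulses_per_cycle(pulse_rate_hz, f0_hz):
    """Number of position samples obtained during one vibratory cycle."""
    return pulse_rate_hz / f0_hz


# A hypothetical echo returning after 26 microseconds lies about 20 mm deep.
print(round(echo_distance_mm(26.0), 1))
# At 10,000 pulses per second, a 125 Hz voice is sampled 80 times per cycle.
print(pulses_per_cycle(10_000, 125))   # 80.0
```

The second function shows why a high pulse repetition rate is needed: at lower rates too few position samples fall within each vibratory cycle.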
Unfortunately, the edge of the vocal fold does not move as a single flat reflecting plane (Saito, Fukuda, Isogai and Ono, 1981). The complexity of the changes in its shape creates very confusing echoes. Thus, clear interpretation of the echoglottogram is not likely to be a simple matter. Since the edges of the vocal folds constitute a very small target, the ultrasonic beam must be very narrow and well-defined. The laws of physics conspire to make such a beam very difficult to obtain.
VIDEOKYMOGRAPHY
A recent development in the techniques of the visualization of the larynx is that of videokymography. This technique trades off spatial resolution to improve time resolution when imaging laryngeal movement. Thus, instead of obtaining a relatively large image of the larynx at 50 or 60 frames per second, this technique images a single line at frame rates of approximately 8000 frames per second. Because of the high speed imaging, this technique also does not require stroboscopic illumination for viewing vocal fold movement. Videokymography can also provide information about the movement of the upper margin of the vocal folds, the mucosal wave and asymmetry between the left and right vocal folds, open quotient differences along the glottis, and sometimes in the closing phase, the lower margin of the vocal folds.
Videokymography was developed as a means of using television technology to accomplish some of what ultra-high speed filming can achieve. Videokymography sacrifices two-dimensionality in order to gain speed. It does that by ignoring most of the field of view and limiting scanning of the endoscopic image to rapid repetition of a single line. Each new scan of the same line in the field of view is displayed on the screen just under the previous scan, so that a screen image is built up, with time (advancing downward) as the vertical dimension.
The principle of videokymography is as follows. Each frame in the NTSC system is composed of 525 horizontal lines. These lines are read point by point and successively, starting from the upper left and finishing at the lower right point of the frame. The CCD camera system reads these 525 lines in two alternating groups of 262.5 lines each. Thus, in the displayed frame, the first line of the first group is followed by the first line of the second group, then the second line of the first group, and so on. These groups of lines are known as field A and field B. Videokymography discards field B, and reading is made in each frame only in field A. However, rather than reading all 262.5 lines in the whole of field A, the system reads only one of these lines, and reads it 262.5 times in each thirtieth of a second, which represents nearly 8,000 readings of the same line in one second (262.5 x 30 = 7,875 for the nominal NTSC figures; the frequently quoted value of 7,812.5 corresponds to the 625-line, 25-frame PAL system).
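
The line-rate arithmetic can be checked with a short sketch. It uses the nominal NTSC figures from the text (525 lines, 30 frames per second); the PAL figures (625 lines, 25 frames per second) are added here for comparison, and the function name is hypothetical.

```python
def single_line_scan_rate(lines_per_frame, frames_per_second):
    """Readings per second when one field's worth of scans is spent on one line."""
    lines_per_field = lines_per_frame / 2      # two interlaced fields per frame
    return lines_per_field * frames_per_second


print(single_line_scan_rate(525, 30))   # 7875.0 (nominal NTSC)
print(single_line_scan_rate(625, 25))   # 7812.5 (nominal PAL)
```

Either way the result is close to the "approximately 8000 lines per second" quoted for the technique.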

Standard vs. High-Speed Mode

     The videokymography camera offered by KayPENTAX functions in either standard or high-speed mode. The standard mode presents a black-and-white composite video image. In standard mode, the frame rate is 60 Hz (NTSC) or 50 Hz (PAL). This mode is used to properly position the endoscope for data recording during the high-speed mode.
In high-speed mode, the camera scans a single line from the standard image at a rate of nearly 8000 lines/second. With each line displayed on the monitor in succession, a time history representing successive glottal cycles is produced. Clinicians may position the endoscope to “select” which portion of the vocal folds (e.g., middle, anterior commissure, etc.) is observed. A foot pedal allows the clinician to easily switch between the standard and the high-speed modes. KayPENTAX also offers a switch/distribution system (with S-Video/BNC input and S-Video/BNC outputs) to allow easy switching between the standard color camera used for stroboscopy and the black-and-white videokymography camera.

Complement to Stroboscopy

Given its capabilities, VKG is a natural complement to stroboscopy. Although the full-screen display of the VKG "image", which is made up of single lines, is not as intuitive as stroboscopy, VKG allows direct viewing of vocal fold behaviors that may not be observable in a stroboscopic image. For example, the high scan rate of VKG allows direct observation of vocal fold motion even if the motion is aperiodic. Thus, voicing initiation, diplophonia, biphonia, vocal fry, creaky voice, and aperiodicity can all be viewed directly. Even in normal quasi-periodic phonation, vocal fold asymmetry and mucosal waves are clearly visible with this technique.
VKG promises to fill a key role in broadening the understanding of phonatory dynamics.

Figure: A drawing of an idealized VKG image of two glottal cycles with key features labeled.

Figure: The two modes of the VKG camera. The standard mode displays a black-and-white video image for proper orientation. In high-speed mode, a single line selected from the standard image is displayed approximately 8000 times per second.

Figure: Three VKG images showing how this technique can be used for viewing vocal fold dynamics regardless of phonatory behavior: asymmetrical vibration (left), onset of phonation (center), and aperiodic phonation (right).

Differences between Videokymography and other Digital High-Speed Imaging Systems
The ability to provide both standard and high-speed images of the vocal folds (between which the system can be switched immediately) distinguishes the VKG system from (high-speed) linear cameras, which deliver only line images. The two modes make the VKG system powerful and more practical for carrying out meaningful laryngeal examinations. In contrast to high-speed digital imaging systems, which provide full laryngeal images at high speed, videokymography achieves its high image rate at the expense of reduced spatial information. The VKG approach therefore has both advantages and disadvantages when compared to (full-image) digital high-speed systems.
 
 
Advantages of Videokymography
The advantages of VKG in contrast to (full-image) high-speed systems may be listed as follows:
·         VKG is significantly less expensive (both the equipment and the storage costs per recorded time).
·         Less data has to be stored and processed.
·         The duration of the recording samples is virtually unlimited (especially when a VCR and video tapes are used).
·         The format of the image information (CCIR or NTSC television standard) ensures that VKG works with standard, commercially available video equipment (standard video monitors, VCRs, etc.; generally the same video equipment as in videolaryngostroboscopy is used for VKG).
·         It offers excellent spatial resolution along the scanned line (768 pixels/line in CCIR; cf. the 256 pixels/line usually used in today's full-image high-speed systems).
·         It offers an excellent image rate (7812.5 images/s in CCIR; cf. the 1000 – 2000 images/s most frequently used in today's full-image high-speed systems).

Disadvantages of Videokymography
The disadvantages of VKG, as compared to the (full) high-speed imaging systems, arise mainly as a consequence of the fact that only a single image line is monitored in VKG. These are:
1) The full image is not available in the high-speed mode.
2) Anterior-posterior phase differences in the vocal fold vibration are not registered.
3) The measuring position has to be selected and properly adjusted before recording, using the standard mode of the camera.
4) Gross movements of the larynx can make the recording position inaccurate.
Certain disadvantages are also related to the use of the CCIR (or NTSC) TV standard:
5) VCRs and PCs often process and display two VKG images simultaneously, in an interlaced form.
6) The standard television format requires a certain time interval within each video field to be reserved for synchronization purposes (ca. 2 ms per 20 ms in CCIR), and that interval cannot be used for image information. These information gaps slightly complicate image analysis of longer passages of the resulting VKG signal.

NON-INVASIVE PROCEDURES:
1.      Inverse Filtering

The source-filter theory of speech production provides theoretical background for the inverse filtering technique. If the transfer function of the vocal tract filter is known, an inverse filter can be constructed. In principle, the glottal excitation signal can then be reconstructed by feeding the speech signal through the inverse of the vocal tract filter.

In practice, the transfer function of the vocal tract filter can be approximated based on the speech signal and general knowledge about the voice production mechanism. An approximate inverse filter can then be constructed. Applying the inverse filter to the speech signal yields an estimate of the excitation signal, the glottal volume velocity waveform. This signal is also known as the flow glottogram (FGG) (Hertegård et al., 1992; Hertegård & Gauffin, 1995).
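
The principle can be illustrated with a toy example in Python. This is a minimal sketch under strong simplifying assumptions, not a clinical inverse filtering method: the vocal tract is reduced to a single resonance with made-up values (500 Hz, 80 Hz bandwidth), the excitation is an impulse train rather than a realistic glottal flow, lip radiation is ignored, and the filter coefficients are taken as known, whereas in practice they must be estimated from the speech signal. All function names are introduced here for illustration.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients of a two-pole resonance (one 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return 2.0 * r * math.cos(theta), -r * r


def all_pole_filter(x, a1, a2):
    """Vocal tract model: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y


def inverse_filter(y, a1, a2):
    """Inverse of the model above: x[n] = y[n] - a1*y[n-1] - a2*y[n-2]."""
    x = []
    for n, yn in enumerate(y):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        x.append(yn - a1 * y1 - a2 * y2)
    return x


fs = 8000                      # sampling rate in Hz (assumed)
a1, a2 = resonator_coeffs(500.0, 80.0, fs)

period = 80                    # samples per cycle, i.e. 100 Hz at 8 kHz
excitation = [1.0 if n % period == 0 else 0.0 for n in range(400)]

speech = all_pole_filter(excitation, a1, a2)   # "speech" = filtered source
estimate = inverse_filter(speech, a1, a2)      # recovered source

max_error = max(abs(e - x) for e, x in zip(estimate, excitation))
print(f"largest reconstruction error: {max_error:.2e}")
```

Here the reconstruction is essentially exact only because the inverse filter matches the model perfectly. With real speech the vocal tract filter is an approximation, so the output is an estimate of the glottal flow.
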

Inverse filtering was first presented by Miller (1959), who applied analog electronic filters to cancel the two lowest formants and the lip radiation effect from a speech signal captured by a microphone.

Rothenberg (1973) introduced a different inverse filtering technique that uses the air flow at the mouth as the source signal. The subject's mouth and nose are covered by a special mask for measuring the flow waveform. This method allows the estimation of absolute flow values including the DC component, as opposed to inverse filtering of the pressure signal captured by a microphone, which loses the absolute zero level of flow due to the lip radiation effect. Rothenberg's technique is also less sensitive to low-frequency noise. However, the flow measurement mask imposes an upper bound on the useful frequency range at approximately 1.6 kHz (Hertegård & Gauffin, 1992).

Successful inverse filtering is sensitive to phase distortion in the speech signal in the frequency range of interest. Traditional tape recorders are problematic for signal storage in this respect, since they cause substantial phase distortion that must be compensated for (Childers et al., 1983). This problem has been overcome by tape recorders utilizing frequency modulation (FM), which fulfills the requirement of phase linearity (Miller, 1959).

Digital filtering techniques provide obvious advantages over analog techniques. According to Hunt et al. (1978), a digital inverse filtering approach was applied to speech as early as Holmes (1962). Since the 1970s, inverse filtering has increasingly been realized using digital techniques (Hunt et al., 1978; Javkin et al., 1987). Nowadays, practically all inverse filtering methods in use are digital, owing to the flexibility, repeatability, and ease of implementation of digital techniques compared to analog filters. Digital sampling and storage techniques also avoid the phase distortion problem, provided that the equipment is of high quality and the frequency range of flat amplitude response and linear phase response extends to low frequencies.

Digital inverse filtering methods can be categorized into manual and automatic techniques. Manual methods require the human operator to adjust filters by hand to match the formants of the speech signal, whereas automatic methods build a vocal tract model and find the filter parameters automatically, often by means of LPC analysis (Hertegård et al., 1992). There are also semiautomatic methods that lie somewhere between these two extremes. For example, the method proposed by Alku (1992) basically finds the vocal tract filters automatically, but the user still controls a few parameters that affect the resulting flow signal. Södersten et al. (1999) compared an automatic and a manual inverse filtering method and reported high agreement between the airflow parameters calculated from the flow signals of the two methods. However, noticeable differences were also encountered.

Inverse filtering basically involves extracting two signals, the volume velocity waveform at the glottis and the effect of the vocal tract filter, from a single source signal. The technique thus relies on strong assumptions about the glottal volume velocity waveform and the transfer function of the acoustic vocal tract filter. Consequently, the result of inverse filtering has to be regarded as an estimate of the glottal flow. The actual volume velocity waveform at the glottis is not known exactly.

Furthermore, the accuracy of inverse filtering deteriorates if the fundamental frequency of speech is high, because the sparse harmonic structure of the excitation spectrum interferes with formants, which are local resonances in the spectrum. Nasalized vowels are also not suitable for inverse filtering because their spectra contain antiformants that are difficult to compensate for properly (Hertegård et al., 1992).
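
The sparseness of the harmonic structure at high fundamental frequencies can be made concrete with a small calculation. The harmonics of a periodic excitation lie at integer multiples of the fundamental frequency, so a higher fundamental leaves fewer spectral components from which the formants can be located. The two fundamental frequencies and the 3 kHz limit below are illustrative assumptions.

```python
def harmonics_up_to(f0_hz, limit_hz):
    """Harmonic frequencies (integer multiples of f0) not exceeding limit_hz."""
    return [k * f0_hz for k in range(1, limit_hz // f0_hz + 1)]


low_voice = harmonics_up_to(100, 3000)    # assumed low-pitched voice
high_voice = harmonics_up_to(400, 3000)   # assumed high-pitched voice

print(len(low_voice), "harmonics below 3 kHz at F0 = 100 Hz")
print(len(high_voice), "harmonics below 3 kHz at F0 = 400 Hz")
print("spacing between harmonics:",
      low_voice[1] - low_voice[0], "Hz vs",
      high_voice[1] - high_voice[0], "Hz")
```

With harmonics 400 Hz apart, a formant can easily fall between two harmonics, which makes its frequency and bandwidth harder to estimate.
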

Despite these limitations of the method, inverse filtering has proved to be a valuable tool both for clinical use and for fundamental research of the voice production mechanism (Fritzell, 1992). It is a non-invasive technique that does not require bulky or expensive equipment. The restrictions of an application may make inverse filtering the only practical means of examining the voice source of a subject.

The figure shows a typical example of a speech pressure signal and the corresponding inverse filtered glottal flow waveform.
Figure: Speech pressure waveform of a female speaker's sustained /a/ vowel and the corresponding inverse filtered glottal flow waveform.

2.      Electroglottography
Electroglottography (EGG) is a non-invasive method for examining vocal fold vibration. According to several authors (e.g. Colton & Conture, 1990; Baken, 1992; Henrich et al., 2004), the method was first reported by Fabre (1940, 1957). It has now been used for clinical and research purposes for decades.

Electroglottography is based on measuring impedance across the neck of the speaker. When the vocal folds are closed, electric current can pass through them. When the folds are apart, an insulating air gap separates them, and the impedance across the larynx is higher. Thus, the impedance changes across the larynx indicate the variation of the contact area between the vocal folds.

Electrodes are placed on the subject’s skin on each side of the larynx and a high-frequency alternating current is fed through them in order to measure the impedance between the electrodes. The frequency is typically in the megahertz region and the current is limited to a few milliamperes to ensure that the electric current is imperceptible and harmless to the subject (Baken, 1992). The voltage between the electrodes is typically about 0.5 volts (Marasek, 1997).

The resulting electroglottographic signal, the electroglottogram, shows the impedance variation as a function of time. Impedance variation due to vibrating vocal folds is relatively small, typically only 1–2 percent of the total measured impedance (Baken, 1992). Furthermore, the impedance varies considerably due to changing skin moistness and vertical movements of the larynx. Therefore, high-pass filtering is applied to the obtained electroglottographic signal in order to eliminate low-frequency noise and to extract only the variations caused by vocal fold vibration. Additionally, automatic gain control is often built into EGG devices to maintain an appropriate signal level despite considerable impedance changes between subjects and also during a single recording session. These techniques cause phase and amplitude distortion that may influence the EGG waveform (Scherer et al., 1988, page 291). Consequently, the EGG signal cannot be considered an absolute measure of vocal fold contact, and care must be taken when interpreting the signal.
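
The effect of the high-pass filtering step can be sketched as follows. This is a toy illustration, not the processing used in any particular EGG device: the signal is synthetic (a large constant impedance, a slow drift standing in for larynx movement, and a 100 Hz vibration component of about 1 percent of the baseline), and the first-order filter with a 20 Hz cutoff is an assumption chosen for simplicity.

```python
import math


def high_pass(signal, cutoff_hz, fs):
    """First-order high-pass filter (discrete RC approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = [0.0]
    for n in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[n] - signal[n - 1]))
    return out


fs = 10000                               # sampling rate in Hz (assumed)
t = [n / fs for n in range(fs)]          # one second of signal

# Synthetic impedance: baseline + slow drift + small vibration component.
raw = [100.0
       + 5.0 * math.sin(2.0 * math.pi * 0.5 * ti)
       + 1.0 * math.sin(2.0 * math.pi * 100.0 * ti)
       for ti in t]

filtered = high_pass(raw, cutoff_hz=20.0, fs=fs)

# Inspect the second half, after the filter has settled.
tail = filtered[fs // 2:]
raw_mean = sum(raw) / len(raw)
tail_mean = sum(tail) / len(tail)
tail_peak_to_peak = max(tail) - min(tail)

print(f"mean of raw signal:        {raw_mean:.2f}")
print(f"mean after high-pass:      {tail_mean:.2f}")
print(f"peak-to-peak after filter: {tail_peak_to_peak:.2f}")
```

The baseline and most of the drift are removed while the vibration component is largely preserved. A filter of this kind also shifts the phase of low-frequency components, which is one reason the filtered waveform should be interpreted with care.
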

Despite its limitations, EGG yields useful information about the behavior of the vocal folds during phonation. Electroglottography has been studied widely, and its validity has been assessed in numerous studies comparing EGG with stroboscopic methods, high-speed imaging, photoglottography, subglottal pressure measurements, and inverse filtering (see Henrich et al. (2004) for references). The results show convincingly that the EGG signal is related to the contact area between the vocal folds.

The figure shows a typical example of a high-quality electroglottogram recorded during phonation. It has been high-pass filtered to eliminate the low-frequency components that are not related to the vibration of the vocal folds.
Figure: Electroglottogram of the normal phonation of a male subject. The upper panel shows the EGG signal and the lower panel its first derivative. Upward change in the signal represents increasing impedance and thus reduced contact between the vocal folds.

Rothenberg (1981b) presented a model of the different phases of the EGG signal period and their relations to the physiological events occurring in the larynx. This model is illustrated in the figure on the phases of the EGG signal period. Other similar models exist (see e.g. Childers et al., 1983). Such models are, however, idealized simplifications that must not be interpreted literally. Many authors have pointed out that the EGG signal does not allow exact determination of the instant of closure, and that locating the instant of glottal opening from the EGG signal alone is even less accurate (see e.g. Colton & Conture, 1990; Baken, 1992). Titze introduced a mathematical model that describes the vibration pattern of the vocal folds and predicts the contact area variation (see e.g. Titze, 1990).

A number of geometric and kinematic parameters are used to describe the shape and movements of the vocal folds, and the model gives the corresponding contact area waveform. The model explains many features of the contact area waveform by relating them to the physiological pattern of vocal fold vibration: EGG pulse widening is caused by adduction of the vocal folds, and peak skewing is related to wedge-shaped vocal folds and vertical phase difference. A knee in rising and falling edges of an EGG pulse corresponds to the bulging of the contact surfaces of the vocal folds. Vertical phasing also explains the variation of the pulse waveform between a triangular and a rectangular shape. Varying characteristics of real EGG pulses can be explained as combinations of these effects.
By comparing the EGG waveform with high-speed filming, Childers et al. (1983) related the initial point of vocal fold contact to a break in the negative slope of the EGG waveform, and the glottal opening to the instant at which the differentiated EGG (DEGG) waveform has its absolute maximum. Such peaks of the DEGG are clearly visible in the lower panel of the electroglottogram figure. This approach was carried on by Henrich et al. (2004), who regarded the peaks of the DEGG signal as reliable indicators of glottal opening and closing instants defined by reference to the glottal air flow. However, such peaks are often imprecise or absent, or double peaks occur, and in all these cases the approach cannot be used.
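
Locating peaks in the differentiated EGG can be sketched on a synthetic signal. This is a toy illustration only, not the procedure of Childers et al. or Henrich et al.: the waveform shape is made up, the signal follows the convention that an upward change means reduced contact (so closing appears as a sharp negative DEGG peak and opening as a weaker positive one), and the period is assumed to be known in advance. All names are introduced here for illustration.

```python
import math


def synthetic_egg(n_cycles, period=50, fall_len=5, rise_len=15):
    """Toy EGG cycle: fast fall at closing, slower rise at opening."""
    cycle = []
    for k in range(period):
        if k < 10:
            v = 1.0                                   # folds apart
        elif k < 10 + fall_len:
            v = 0.5 * (1.0 + math.cos(math.pi * (k - 10) / fall_len))
        elif k < 30:
            v = 0.0                                   # folds in contact
        elif k < 30 + rise_len:
            v = 0.5 * (1.0 - math.cos(math.pi * (k - 30) / rise_len))
        else:
            v = 1.0
        cycle.append(v)
    return cycle * n_cycles


def degg(egg):
    """First difference of the EGG signal."""
    return [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]


def degg_peaks_per_cycle(d, period):
    """Index of the strongest negative and positive DEGG peak in each cycle."""
    closing, opening = [], []
    for start in range(0, len(d), period):
        segment = d[start:start + period]
        closing.append(start + segment.index(min(segment)))
        opening.append(start + segment.index(max(segment)))
    return closing, opening


period = 50
egg = synthetic_egg(n_cycles=4, period=period)
d = degg(egg)
closing, opening = degg_peaks_per_cycle(d, period)

print("closing instants (samples):", closing)
print("opening instants (samples):", opening)
print("strongest closing peak:", min(d))
print("strongest opening peak:", max(d))
```

On this clean signal every cycle yields one clear peak of each kind. Real recordings often show weak, missing, or double peaks, which is the limitation noted above.
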

In addition to resistance across the neck, the impedance measurement is also influenced by reactance (capacitance or inductance) of the examined load. Varying capacitance may be hypothesized to exist in the glottis when the two vocal folds are separated by a thin insulating layer of air, as pointed out by Rothenberg (1981b). This hypothesis can be checked by changing the frequency of the alternating current used for impedance measurement: the current remains unchanged only if the load is purely resistive. According to Gauffin (Scherer et al., 1988, page 291), the impedance is essentially resistive in a wide frequency range.
Figure: The phases of the EGG signal period and their relations to the glottal air flow and physiological events. The figure illustrates the Rothenberg model (Rothenberg, 1981b). 1–2: Vocal folds are maximally closed. 2–3: Vocal folds are separating from lower margins towards upper margins. 3–4: Upper margins are opening. 4–5: Upper margins are still opening. Changed slope is due to phase differences along the length of the vocal folds. 5–6: Vocal folds are fully parted. The distance between the vocal folds is varying but there is little change in contact area. 6–7: Lower margins are closing with a phase difference along the length of the vocal folds. 7–1: Vocal folds are closing from lower margins towards upper margins. The flow pulse begins closely after point 3 and terminates closely before point 7.


JOURNAL ARTICLES:
Deviant vocal fold vibration as observed during videokymography: The effect on voice quality
Journal of Voice, 2001
Aim: To compare videokymographic image sequences with the synchronized acoustical speech signal of 4 patients to obtain more insight into the effect of deviant vocal fold vibration on voice quality as observed during videokymography. 

Method: Videolaryngoscopic images of the larynx and videokymographic images of vocal fold vibration were recorded using a rigid telescope, a continuous light source, a charge coupled device black and white camera in normal mode and in kymographic mode. Simultaneously with the videokymographic recordings, the acoustic signal was recorded on the audio track of the video recorder. Videokymographic and acoustic recordings were digitized.
Results: Observations in this study showed that comparison of videokymographic images with the speech signal gives objective evidence of dynamic voice events. Improvements in diagnosis and early detection of laryngeal disorders can also be anticipated as a result of videokymographic imaging.

Effects of topical anesthetic and flexible fiberoptic laryngoscopy on professional sopranos
Journal of Voice, 2005
Aim: This study examined the acoustic and perceptual effects of topical anesthetic and flexible fiberoptic laryngoscopy, against a control condition, on the singing voices of ten professional sopranos.
Method: Each participant completed four musical tasks for each experimental condition: 12 bars of an aria, two scales, and a messa di voce exercise at 523 Hz.
Results: This study indicates that young, highly trained operatic sopranos, in the presence of their opera pedagogue, can generally sing demanding operatic arias during anesthesia of the nasal cavity and flexible fiberoptic laryngoscopy without effects on the energy in the singing voice, vibrato rate and extent, or vocal range. However, a few participants might show a reduction in the level of energy in the formant region, which was speculated to be associated with the level of psychological coping that the singer possessed in the demanding situation.

Strobovideolaryngoscopy: results and clinical value
Annals of Otology, Rhinology and Laryngology, 1991
Aim: To determine whether additional experience with strobovideolaryngoscopy has altered the clinical usefulness of the procedure.
Method: Diagnoses were noted prospectively before and after stroboscopy for 377 strobovideolaryngoscopy procedures performed during the calendar year 1989. Observations were recorded about voice quality, laryngeal color, vocal fold motion, structural abnormalities, supraglottic muscle function and other findings, symmetry, periodicity, glottic closure, amplitudes, waveforms, and nonvibrating segments.
Results: The procedure has proven very helpful in caring for voice patients, modifying diagnoses in 47% and confirming uncertain diagnoses in many of the other patients studied. It is also helpful in documenting normal vocal fold function in cases of psychogenic dysphonia or malingering.


The value of laryngeal electromyography in the evaluation of laryngeal motion abnormalities
Journal of Voice, 2006
Aim: To investigate the clinical utility of laryngeal EMG as a diagnostic aid in the evaluation of movement disorders of the larynx in patients complaining of dysphonia.
Method: A retrospective chart review was performed of all patients who presented to a university-based tertiary laryngology referral center for evaluation of dysphonia over the course of a 13-month period.
Results: Laryngeal EMG is a useful adjunct to the diagnosis and management of motion abnormalities in the larynx in patients who present with dysphonia. Laryngeal EMG findings can affect the treatment plan in more than one half of the patients who present with a voice disorder and who are found on examination to have abnormal or asymmetric adduction, abduction, and laryngeal tension.


References:
·         Clinical Examination of Voice - Hirano et al.
·         Clinical Measurement of Speech and Voice - Baken et al.
·         Clinical Measurement of Speech and Voice - Baken & Orlikoff
·         Voice Treatment for Children and Adolescents - Andrews & Summer
·         Analysis of Human Voice Production Using Inverse Filtering, High-Speed Imaging, and Electroglottography - Hannu Pulakka
