9. Connected Speech Reliability
Overview of the Reliability Transcription Process for Connected Speech Samples
To ensure the reliability of transcriptions for connected speech samples, the following structured process will be implemented:
Randomized Sample Selection:
A subset equivalent to 10% of the total transcription samples will be randomly selected for reliability testing. This selection will be project-specific (for example, DrSantos_Spa_WABs_FTLD_2024 or Kesha_nfvPPA_lvPPA_2024).
Reliability procedures for transcribers:
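For the randomized selection step, the 10% draw within one project could be sketched as follows. This is only an illustration, not a prescribed tool: the function name `select_reliability_samples`, the `percent` and `seed` parameters, the placeholder file names, and the choice to round up (so that small projects still contribute at least one sample) are all assumptions that the protocol does not specify.

```python
import random


def select_reliability_samples(transcripts, percent=10, seed=None):
    """Randomly pick roughly `percent`% of one project's transcripts.

    Assumption: the count is rounded up, with a minimum of one sample.
    """
    transcripts = list(transcripts)
    if not transcripts:
        return []
    # Integer ceiling division avoids floating-point rounding surprises.
    k = max(1, (len(transcripts) * percent + 99) // 100)
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(transcripts, k))


# Hypothetical file names for a single project folder.
project_files = [f"CODE{i:03d}_PicnicScene_Spa_Pre.cha" for i in range(1, 26)]
selected = select_reliability_samples(project_files, percent=10, seed=2024)
print(len(selected), "of", len(project_files), "samples selected")
```

Recording the seed alongside the project name would allow the same selection to be reproduced later.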
Once the samples are selected, transcribers will complete the following steps to begin the reliability process.
1. Reliability procedures for transcriber 1:
Copy the original transcript from its folder to this folder: 3. Picnic Scene_Rely (transcriber 1)
Rename the copied transcript by appending Coded_Rely and your initials (e.g., SMK): CODE001_BACC001_CatRescue_Spa_Pre_20230914_Coded_Rely_SMK.cha
2. Reliability procedures for transcriber 2:
Locate the audio files in this folder: 1. S3_PicnicScene_Picture Description_Audios_Reliability
Locate the whisper output files in this folder: 2. PicnicScene_whisper_output_Reliability
Create a transcription (.cha) file by copying the template in this folder: 4. Picnic Scene_Rely (transcriber 2)
Copy the contents of the whisper output (e.g., CODE001_BACC001_PicnicScene_Spa_Pre_20230914.txt) and paste them into the template (.cha)
Fill out the relevant header fields (lines beginning with @), such as language and participant code
Segment the text into utterances following the CHAT transcription protocol rules
Code the transcript following the transcription protocol rules
Use CLAN to detect typos or spelling mistakes (CHECK and MOR commands)
Save the file in 4. Picnic Scene_Rely (transcriber 2) and make sure the name is correct by appending Coded_Rely and your initials (e.g., AQ): CODE001_BACC001_CatRescue_Spa_Pre_20230914_Coded_Rely_AQ.cha
3. Running the initial RELY comparison for reliability:
Run the Coded_Rely samples from transcriber 1 and transcriber 2 through RELY
Save the RELY output to this folder: 5. Picnic Scene_Rely output. Name the file after the original file name (excluding Coded_Rely_Initials), for example: BISE010_PreTX_PicnicDescription_Castellano.rely
Enter the scores in the two score columns of the CS Data Analysis report in Smartsheet:
Spanish CS Data Analysis Report: https://app.smartsheet.com/reports/qJP8g3XRmJ2mqw5RgWwrxVCjxHFR2Xp3Pqg5MPf1
Catalan CS Data Analysis Report: https://app.smartsheet.com/reports/96Xjp4G54hwCXM3pVVHP5xcrvVFfPXgh9wq49g81
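The naming conventions used in the steps above (appending Coded_Rely and the transcriber's initials, then dropping that suffix for the RELY output file) can be sketched as two small helpers. The function names `reliability_name` and `rely_output_name` are hypothetical, and the sketch assumes the suffix always takes the exact form `_Coded_Rely_<initials>` shown in the examples.

```python
from pathlib import Path

RELY_TAG = "Coded_Rely"


def reliability_name(original_name, initials):
    """Append _Coded_Rely_<initials> before the file extension."""
    p = Path(original_name)
    return f"{p.stem}_{RELY_TAG}_{initials}{p.suffix}"


def rely_output_name(reliability_file):
    """Recover the original base name and give it a .rely extension."""
    stem = Path(reliability_file).stem
    base, sep, _initials = stem.partition(f"_{RELY_TAG}_")
    if not sep:
        raise ValueError(f"{reliability_file!r} does not contain _{RELY_TAG}_")
    return f"{base}.rely"


copy_name = reliability_name("CODE001_BACC001_CatRescue_Spa_Pre_20230914.cha", "SMK")
print(copy_name)
print(rely_output_name(copy_name))
```

Generating names this way, rather than typing them by hand, helps keep the two transcribers' files and the RELY output consistently paired.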
The selected samples will be assessed for transcription accuracy using the RELY command in the CLAN software (see How to use RELY on CLAN). Two key metrics will be evaluated:
Percentage of Utterances with Matching Codes: Calculated as the proportion of all utterances where the assigned codes match perfectly between transcribers.
Percentage of Words with Matching Codes: Calculated as the proportion of individual words with identical coding across transcriptions.
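To make the two proportions concrete, here is a simplified illustration of how they could be computed by hand. It is not RELY's implementation: it assumes the two transcripts have already been aligned utterance by utterance, that each utterance is a list of coded tokens, and that words are compared position by position. The function name `agreement_percentages` and the sample tokens are hypothetical.

```python
def agreement_percentages(utts_a, utts_b):
    """Return (% utterances matching, % words matching) for aligned transcripts."""
    if len(utts_a) != len(utts_b):
        raise ValueError("transcripts must have the same number of utterances")
    matching_utts = 0
    matching_words = 0
    total_words = 0
    for a, b in zip(utts_a, utts_b):
        if a == b:
            matching_utts += 1
        # Count every position that exists in either version (assumption).
        total_words += max(len(a), len(b))
        matching_words += sum(1 for x, y in zip(a, b) if x == y)
    pct_utts = 100.0 * matching_utts / len(utts_a) if utts_a else 0.0
    pct_words = 100.0 * matching_words / total_words if total_words else 0.0
    return pct_utts, pct_words


transcriber_1 = [["el", "gato", "[//]"], ["sube"]]
transcriber_2 = [["el", "gato"], ["sube"]]
print(agreement_percentages(transcriber_1, transcriber_2))  # (50.0, 75.0)
```

In this toy case one of two utterances matches exactly (50%), while three of the four word positions match (75%), which shows why the word-level figure is usually the higher of the two.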
Discrepancy Resolution:
If 100% agreement is not achieved:
A meeting will be scheduled between the involved transcribers to review discrepancies. It is recommended to hold this meeting after all the reliability samples have been run through RELY.
Consensus will be reached on the appropriate transcription for all disputed elements. The date of the meeting will be recorded in CS Data Analysis (example: https://app.smartsheet.com/reports/96Xjp4G54hwCXM3pVVHP5xcrvVFfPXgh9wq49g81)
Finalization of Reliable Transcriptions:
Following the consensus meeting:
Edits will be made to the original transcription located in its original folder (e.g. B--Connected Speech_Data) based on the revisions agreed upon in that meeting.
The transcription will then be considered finalized for reliability, and all the samples of that project will be considered reliable enough for extracting the linguistic measurements.
Documentation of Process:
All reliability calculations and consensus resolutions will be documented in the Connected Speech Reliability Smartsheet Reports for project records, contributing to transparency and repeatability in the transcription process.
Overview of Reliability Folder Structure:
See scheme here: Connected Speech Diagram
How to use RELY on CLAN
RELY Function 4:
“The fourth function of the RELY command is to estimate the overall match between two transcripts on the main line. It is very difficult to define this type of comparison precisely. Instead, RELY uses a rough-and-ready "bag of words" comparison method that simply looks at the overall match of the main line items in the two versions. The command for this type of analysis adds the +d switch, and the output is the percentage of overall overlap.” (This is directly sourced from the CLAN manual: https://talkbank.org/manuals/CLAN.pdf)
This function provides two output calculations:
% of all utterances with matching codes
% of all words examined with matching codes
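The CLAN manual describes the comparison only as a rough "bag of words" match and does not give a formula here, so the following is a guess at the general idea rather than a reproduction of RELY. It assumes main-line text is split on whitespace and that overlap is measured against the larger of the two word counts; the function name `bag_of_words_overlap` and the sample sentences are hypothetical.

```python
from collections import Counter


def bag_of_words_overlap(main_lines_a, main_lines_b):
    """Percentage of words shared by two transcripts, ignoring word order."""
    bag_a = Counter(w for line in main_lines_a for w in line.split())
    bag_b = Counter(w for line in main_lines_b for w in line.split())
    # Counter intersection keeps the minimum count of each shared word.
    shared = sum((bag_a & bag_b).values())
    total = max(sum(bag_a.values()), sum(bag_b.values()))
    return 100.0 * shared / total if total else 0.0


version_1 = ["el gato sube al árbol"]
version_2 = ["el gato sube el árbol"]
print(bag_of_words_overlap(version_1, version_2))  # 80.0
```

Because word order is ignored, this kind of measure rewards transcripts that contain the same items even when segmentation differs, which is consistent with the manual's description of it as an overall estimate.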
How to run this function:
Step 1:
Prepare Files: Make sure you have two coded files for comparison (e.g., two raters coding the same language sample) in the CLAN-compatible .cha format. Since the files should have the same title, it is helpful to modify each file name to differentiate between the two.
Step 2:
Type the following command into the command window: rely +d sample.cha samplea.cha
Select “file in” and add the two coded files you wish to process. Make sure that the CLAN MOR library is linked on BOX. Here are the instructions to complete this step.
Once you’ve done so, select “done”.
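When many file pairs need to be compared, the command text for Step 2 can be generated instead of typed. The helper below only builds the string to paste into the CLAN command window; it does not run CLAN. The function name `build_rely_command` is hypothetical, and the sketch assumes file names without spaces.

```python
from pathlib import Path


def build_rely_command(file_a, file_b):
    """Build the `rely +d` command line for two coded .cha files."""
    a, b = Path(file_a), Path(file_b)
    for p in (a, b):
        if p.suffix.lower() != ".cha":
            raise ValueError(f"{p.name} is not a .cha file")
    if a.name == b.name:
        # The two versions need distinct names so they can be told apart.
        raise ValueError("the two files must have different names")
    return f"rely +d {a.name} {b.name}"


print(build_rely_command("sample.cha", "samplea.cha"))
```

The checks mirror the preparation in Step 1: both inputs must be in .cha format, and their names must differ.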
Step 3:
After running, RELY provides a report showing percent agreement across your chosen categories. This breakdown helps identify areas where raters agree most and where further clarification might be needed.
Here is an example of what the reliability report looks like: