If you are completing the transcription training process, ignore steps 1-4, since the folder will contain the samples and templates that you will work from.

1. Go to Connected Speech Data: https://utexas.box.com/s/uz6206lel0544c3auou0nsw57egcv5q6
2. Choose either Therapy Trial or Observational, depending on the sample type.
3. Choose Spanish, Catalan, or English, depending on the language of the sample that you are formatting.
4. Choose clipped audio of tasks.
5. Choose the task that you are formatting. For example, if it's the WAB picnic description, choose S3_PicnicScene_Picture Description.
6. Go to "Taskname_formatted_for_clan." For example, if it's the WAB picnic description, choose 2. PicnicScene_formatted_for_clan
7. Once in that folder, you should see the template named according to the task. For example: CODE001_BACC001_PicnicScene_Spa_Timepoint_YYYYMMDD
  This template helps you name the file and contains the headers that you will need for the transcription.
8. Copy and paste the Whisper transcription into this .cha file.

File format for Connected Speech samples

1. You will need to fill out some fields in the headers of the file (headers are the first few lines in the document, starting with @)
2. Preserve the format of the headers
  1. There should be one tab after the colon in each header, so if it gets deleted, put a single tab back in
  2. There are a specific number of "pipes" (this symbol: |) in the @ID header, so don't delete any
  3. Don't add spaces in any of the fields where there are none in the template
  4. Fill out the fields in the headers:
    1. @Language
      1. This should be included in the template according to the language you previously chose. If not, enter the language of the sample (in lowercase):
        spa (Spanish)
        cat (Catalan)
        eng (English)
    2. @Participant Code: Enter the participant's code here, e.g. BISE016. You will also need to enter the correct BACC### (Not applicable for local participants. If it's a local participant, only enter participant code). You can find this information in the Connected Speech Data Analysis Smartsheet. https://app.smartsheet.com/sheets/q86RQPjgG33Q7c3PjxqG4P64c37JrJ3gjcfr5g71
    3. @ID: Timepoint: Enter LRT if the participant's code is BISD or BILP, VISTA if it is BISE, or OBS for observational samples. Also add the timepoint (e.g., Pre, Mid, Post), using the labels below.
      LRT (BISD, BILP)	VISTA (BISE)	OBSERVATIONAL (OBS)
      LRT_Pre	VISTA_Pre	Obs_1
      LRT_Mid	VISTA_Mid	Obs_2
      LRT_Post	VISTA_Post
      LRT_6m	VISTA_6m
      LRT_12m	VISTA_12m
      BISD (Bilingual Semantic Dementia), BILP (Bilingual Logopenic), BISE (Bilingual Speech Entrainment)
    4. @Time Duration: Enter the start and end time of the sample
      1. If you used a timer, enter 00:00:00 for the start
      2. Preserve the format of the time as indicated
      3. Note the timecode of the video at the onset of the first word a participant says after the clinician prompt (excluding any words you are omitting from the beginning, as discussed above)
      4. Note the timecode at the offset of the last word the participant says on the script topic/picture description
      5. If the clinician redirects or re-prompts during the probe, omit the duration of this from the total duration
  5. Name of script/sample
    1. After the "comment" header, type the name of the script topic, or type the title of the discourse sample, e.g., "PicnicScene" for the WAB picnic description
      1. @Comment: PicnicScene
        @Comment: CatRescue
        @Comment: ImportantEvent
3. After the Comment header, the transcription starts. Each utterance must begin with *PAR: followed by a single TAB (the wider spacing, not a space) and then the text. Do not leave a space between *PAR: and the text.
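
The formatting rules above (one tab after the colon in each header, the same number of pipes in @ID as in the template, and a tab after *PAR:) can be checked automatically. The following is only a minimal sketch in Python, not a lab tool: the function name check_cha_format and the sample header values are made up for illustration, and it assumes the template and the transcript have both been read in as plain text.

```python
import re

# A single tab that is not followed by another tab or a space.
ONE_TAB = re.compile(r"\t(?![\t ])")


def find_id_line(text):
    """Return the first line starting with @ID:, or None (hypothetical helper)."""
    for line in text.splitlines():
        if line.startswith("@ID:"):
            return line
    return None


def check_cha_format(text, template_text):
    """Return a list of formatting problems found in a .cha transcript."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("@") and ":" in line:
            # Header line: exactly one tab right after the first colon.
            rest = line.split(":", 1)[1]
            if not ONE_TAB.match(rest):
                problems.append(f"line {number}: header needs one tab after the colon")
        elif line.startswith("*PAR:"):
            # Utterance line: a tab (not a space) right after *PAR:
            rest = line[len("*PAR:"):]
            if not ONE_TAB.match(rest):
                problems.append(f"line {number}: *PAR: must be followed by one tab")

    # The @ID header must keep the same number of pipes as the template.
    template_id = find_id_line(template_text)
    file_id = find_id_line(text)
    if template_id is not None and file_id is not None:
        if file_id.count("|") != template_id.count("|"):
            problems.append("@ID: number of pipes differs from the template")
    return problems


# Made-up header contents, for illustration only.
template = "@Language:\tspa\n@ID:\ta|b|c|\n@Comment:\tPicnicScene\n"
transcript = "@Language: spa\n@ID:\ta|b|\n*PAR: hola .\n"
for problem in check_cha_format(transcript, template):
    print(problem)
```

The pipe count is compared against the template rather than a fixed number, so the check does not depend on knowing how many pipes the @ID header should have.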

Filename Format for VISTA Samples

1. The transcription file should be saved in the following format: CODE###_BACC###_TaskName_Language_Timepoint_Date
    1. For the Language, it should be Spa for Spanish, Cat for Catalan, or Eng for English.
      Pending: revise Spanish transcriptions and change naming from "Span" to "Spa".
    2. For the timepoint, it should be: Pre, Mid, Post, 6m, 12m followed by the number of the probe
    3. You will find the date in the date of administration column in the Connected Speech Data Analysis Smartsheet. It is very important that you carefully follow the format of YYYYMMDD. https://app.smartsheet.com/sheets/q86RQPjgG33Q7c3PjxqG4P64c37JrJ3gjcfr5g71
    4. NameofScriptORDiscourseTask
      1. NameofScript: For VISTA, for example, the second script probe of the script "My Hobbies" during post-treatment for SE001 would be:
        CODE###_BACC###_NameOfScript_Language_Timepoint_YYYYMMDD
        For example: BISE018_BACC001_MyHobbies_Spa_Post_20240525
      2. NameofDiscourseTask should be:
        CODE###_BACC###_Taskname_Language_Timepoint_YYYYMMDD
        For example:
        BILP022_BACC002_PicnicScene_Cat_Pre_20230517.cha
        BISE011_BACC004_CatRescue_Spa_Obs1_20230925.cha
    5. !! Don't include spaces in the filename
    6. For local participants (not Barcelona), omit the BACC### and follow this file naming format: CODE001_Taskname_Language_Timepoint_Date, where Language is Spa or Eng
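
The naming rules above can be put together in a few lines of Python. This is a hedged sketch rather than an official script: the function name build_filename and its parameters are hypothetical, the .cha extension follows the discourse-task examples, and the local-participant values in the second call are invented for illustration.

```python
from datetime import date

# Language codes allowed in filenames, per the rules above.
LANGUAGES = {"Spa", "Cat", "Eng"}


def build_filename(code, task, language, timepoint, administered, bacc=None):
    """Assemble CODE_BACC_Task_Language_Timepoint_YYYYMMDD.cha.

    Leave bacc as None for local participants, who have no BACC### code.
    """
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    parts = [code]
    if bacc is not None:
        parts.append(bacc)
    # strftime("%Y%m%d") always yields the 8-digit YYYYMMDD form.
    parts += [task, language, timepoint, administered.strftime("%Y%m%d")]
    name = "_".join(parts) + ".cha"
    if any(ch.isspace() for ch in name):
        raise ValueError("filenames must not contain spaces")
    return name


# Barcelona participant (has a BACC### code).
print(build_filename("BISE018", "MyHobbies", "Spa", "Post",
                     date(2024, 5, 25), bacc="BACC001"))
# prints BISE018_BACC001_MyHobbies_Spa_Post_20240525.cha

# Local participant (no BACC### code); values are made up.
print(build_filename("BISE016", "CatRescue", "Eng", "Pre", date(2023, 9, 25)))
# prints BISE016_CatRescue_Eng_Pre_20230925.cha
```

Passing the date as a date object instead of typing it by hand avoids the most common slip, a two-digit year.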

Please let us know when this is complete, and update the Connected Speech Data Analysis Smartsheet.
