Spanish Linguistic Pipeline – Guide

1. Access TACC Account

Access the shared credentials to enter the TACC Analysis Portal

Go to https://stache.utexas.edu/
Enter with your UT credentials
Click on “secret”

1.1

Access TACC Analysis Portal https://tap.tacc.utexas.edu/jobs/

Enter Dr. Grasso’s account information generated by Stache and log in.

1.2

Submit a job on TACC by clicking on the dropdowns and selecting:

Lonestar 6
Jupyter Notebook
DBS23006
vm-small
Nodes 1; Tasks 1
Job name (can be anything; we will use SpanishLing_Trial in this tutorial guide)
Time limit (we will use 2 hours in this tutorial guide)
Click on “submit”

*Note. If you're struggling to get nodes on the vm-small queue, I'd recommend trying the development queue. This applies to everything except transcription, where you should try gpu-a100-small, followed by gpu-a100-dev, followed by gpu-a100.
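The queue order in the note above can be written down as a small lookup, which may help if you keep notes or scripts alongside this guide. This is only a sketch: the queue names come from the note, while the function name and the task labels ("transcription", "default") are hypothetical and not part of TACC or the pipeline.

```python
# Queue order taken from the note above. The dictionary keys and the
# function name are illustrative assumptions, not TACC settings.
QUEUE_FALLBACKS = {
    "transcription": ["gpu-a100-small", "gpu-a100-dev", "gpu-a100"],
    "default": ["vm-small", "development"],
}


def queues_to_try(task):
    """Return the queues to try, in order, for the given kind of task."""
    return list(QUEUE_FALLBACKS.get(task, QUEUE_FALLBACKS["default"]))


print(queues_to_try("linguistic features"))
print(queues_to_try("transcription"))
```

Any task label other than "transcription" falls back to the vm-small and development order.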

2. Enter Jupyter

If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:

Click on “connect”
Click on “work”
Click on “MADR_LingFeatPipelines”

If there are no available nodes (picture B), you will have to wait in the queue until one becomes available.

3. Create a folder

Click on “new”
Click on “folder”
Name the folder. In this tutorial guide, I will use the name InputFolder_Trial

If you cannot see the folder you created, click on “last modified” a couple of times. Sometimes it doesn’t update immediately.

4. Upload input file

Enter the folder you just created
Click on “upload” and upload the input files

5. Go to the terminal

Once the files have been uploaded, click on the new menu dropdown
Click on terminal

7. Type the commands

Once you are in the terminal, follow these steps (if running several audio files):

Type cdw, press enter
Type cd MADR_LingFeatPipelines, press enter
Type conda activate spanishPipelineWithSpacyV3_6, press enter
Type or copy the command below, then press enter:

python spanishLingPipeline.py InputFolder_Trial/ spanishLing_Trial spanishLingTrialSupplementary_Output

Keep in mind that the parts circled in red, green, and gold change depending on the names of your input and output files. The purple section always remains the same.

InputFolder_Trial is the name of my input folder. Your command will change depending on the name you give your folder. Remember that the exact spelling must match, including capitalization. If the name of the folder that you created is all in lowercase, the command in the terminal must be all in lowercase.
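If you are unsure whether the name you typed matches what is on disk, a small check like the one below can help. It is a sketch only: the helper name check_folder_name is hypothetical and not part of the pipeline, and it assumes you run it from the directory that contains your input folder.

```python
import os


def check_folder_name(parent_dir, wanted):
    """Check whether `wanted` exists in `parent_dir` with exactly this spelling.

    Returns (found, near_misses), where near_misses lists entries whose
    names differ from `wanted` only in upper/lower case.
    """
    names = os.listdir(parent_dir)
    if wanted in names:
        return True, []
    near_misses = sorted(n for n in names if n.lower() == wanted.lower())
    return False, near_misses


# Example: look for the guide's input folder in the current directory.
found, near_misses = check_folder_name(".", "InputFolder_Trial")
if found:
    print("Exact match found.")
elif near_misses:
    print("Capitalization differs. Did you mean:", near_misses)
else:
    print("No folder with that name here.")
```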

spanishLing_Trial and spanishLingTrialSupplementary_Output are the names of the output file and output folder. You can change this part of the command (in the terminal) depending on the names you want to give your output file and folder.
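To avoid typos when adapting the command, you could assemble it from its three changeable parts. The sketch below assumes the script takes the input folder followed by the two output names, as described above; the helper name build_pipeline_command is hypothetical, and the trailing slash on the input folder simply mirrors the guide's example.

```python
import shlex


def build_pipeline_command(input_folder, output_name, supplementary_folder):
    """Assemble the terminal command from the three parts that change."""
    # Mirror the guide's example, where the input folder ends with a slash.
    if not input_folder.endswith("/"):
        input_folder += "/"
    parts = [
        "python",
        "spanishLingPipeline.py",  # fixed part of the command
        input_folder,
        output_name,
        supplementary_folder,
    ]
    # shlex.join quotes any part that contains spaces or special characters.
    return shlex.join(parts)


print(build_pipeline_command(
    "InputFolder_Trial",
    "spanishLing_Trial",
    "spanishLingTrialSupplementary_Output",
))
```

The printed line can be copied into the terminal after the conda environment has been activated.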

8. Run the python script

After typing your command and pressing enter, wait a few seconds. You will know it has finished running when you see this at the bottom of the terminal (see picture, circled in red).

9. Find output files

Once the script has finished running, follow the next steps:

Go back to the notebook
Click on the refresh button (if needed)

If the generated files did not pop up, click on “last modified.” Sometimes it takes a minute to update.

You will see 1 output file in comma-separated values (CSV) format and 1 folder (see picture).

There will be two outputs: a spreadsheet named spanishLing_Trial_LinguisticFeatures.csv and a folder named PosMorphDepTree_spanishLing_Trial
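If the notebook view is slow to refresh, you can also check for the outputs from Python. The sketch below assumes the naming pattern seen in the example above (the output name followed by _LinguisticFeatures.csv, and PosMorphDepTree_ followed by the output name); the pipeline may name things differently in other cases. The function names are hypothetical.

```python
from pathlib import Path


def expected_outputs(output_name, base_dir="."):
    """Return the two output paths, following the pattern in this guide's example."""
    base = Path(base_dir)
    return [
        base / f"{output_name}_LinguisticFeatures.csv",  # spreadsheet
        base / f"PosMorphDepTree_{output_name}",         # folder
    ]


def missing_outputs(output_name, base_dir="."):
    """Return the expected outputs that do not exist yet."""
    return [p for p in expected_outputs(output_name, base_dir) if not p.exists()]


missing = missing_outputs("spanishLing_Trial")
if missing:
    print("Not found yet:", [str(p) for p in missing])
else:
    print("Both outputs are present.")
```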

10. Download output files

First, we want to download the linguistic features. Check the box with the csv file and click on download (picture A).
Then, enter the supplementary output folder and download the files one by one (picture B).
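When the supplementary folder holds many files, downloading them one by one can be slow. One optional alternative, not part of the original workflow, is to bundle the folder into a single zip file first and download that instead. The sketch below uses Python's standard shutil.make_archive; the helper name zip_folder is hypothetical, and the folder name is the one from this guide's example.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to the folder and return the archive's path."""
    folder = Path(folder).resolve()
    # Write the archive beside the folder (not inside it) so the zip file
    # does not end up containing itself.
    archive_base = folder.parent / folder.name
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))


# Only runs if the example folder exists in the current directory.
example = Path("PosMorphDepTree_spanishLing_Trial")
if example.is_dir():
    print("Created", zip_folder(example))
```

After it runs, the zip file appears in the notebook's file list and can be downloaded with the same checkbox and download button.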

11. Clear cache and log out

Clear the cache by typing the command rm -r ~/.cache/
Press enter
Then, log out from the terminal by typing the command logout (see picture A)
Press enter
Go back to the original TACC page and click on “end job” (see picture B).

IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. THE JOB WILL KEEP RUNNING UNLESS YOU COMPLETE THIS STEP.

...

Version	Old Version 3	New Version 7
Changes made by	Stephanie Grasso	Kesha Pugalenthi
Saved on	Sep 06, 2024	Oct 16, 2024
