English Linguistic Pipeline – Guide
Contents
1. Access TACC Account | Use the shared credentials to log in to the TACC Analysis Portal.
1.1 | Go to the TACC Analysis Portal at https://tap.tacc.utexas.edu/ and log in with Dr. Grasso’s account information generated by Stache. | |
1.2 | Submit a job on TACC by clicking on the dropdowns and selecting:
*Note: If you cannot get a node on the vm-small queue, try the development queue. This applies to every step except transcription, where you should try gpu-a100-small first, then gpu-a100-dev, then gpu-a100. | |
2. Enter Jupyter | If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:
If there are no available nodes (picture B), you will have to wait in the queue until one becomes available. |
3. Create a folder |
If you cannot see the folder you created, click on “last modified” a couple of times. The file list sometimes does not update immediately. | |
4. Upload input file |
5. Go to the terminal |
7. Type the commands | Once you are in the terminal, follow these steps (if running several audio files):
python englishLingPipeline.py InputFolder_Trial/ outputName Keep in mind that the red and green sections change depending on your input and output names. The purple section always stays the same. | The parts circled in red, green, and gold change depending on the names of your input and output. InputFolder_Trial is the name of my input folder; your command will differ depending on the name you gave yours. The spelling and capitalization must match exactly: if the name of the file or folder you uploaded is all lowercase, it must be all lowercase in the terminal command too. outputName is the name given to the output file and output folder. You can change this part of the command (in the terminal) to whatever name you want for your output file and folder. |
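The exact-match rule for the input name can be checked before running the pipeline. The following is only a minimal sketch, not part of the pipeline itself: the helper names exact_name_exists and build_command are hypothetical, and it assumes you run it from the directory that contains both englishLingPipeline.py and your input folder.

```python
import os
import shlex


def exact_name_exists(name, parent="."):
    """Return True only if `name` is listed in `parent` with identical spelling and case."""
    # os.listdir reports names exactly as stored, so "inputfolder_trial"
    # will not match a folder that was uploaded as "InputFolder_Trial".
    return name.rstrip("/") in os.listdir(parent)


def build_command(input_folder, output_name, script="englishLingPipeline.py"):
    """Assemble the command shown in this guide as a list of arguments."""
    folder = input_folder.rstrip("/") + "/"  # the guide's example ends the folder with "/"
    return ["python", script, folder, output_name]


if __name__ == "__main__":
    # Example values taken from this guide; replace them with your own names.
    folder, output = "InputFolder_Trial", "outputName"
    if exact_name_exists(folder):
        print(" ".join(shlex.quote(part) for part in build_command(folder, output)))
    else:
        print(f"No folder named exactly {folder!r} here; check spelling and capitalization.")
```

If the folder is found, the sketch prints the command to type in the terminal; otherwise it points to a likely spelling or capitalization mismatch.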
8. Run the Python script | After typing your command and pressing Enter, wait a few seconds. You will know the script has finished running when you see this at the bottom of the terminal (see picture, circled in red). | |
9. Find output files | Once the script has finished running, follow these steps:
If the generated files do not appear, click on “last modified”; sometimes it takes a minute to update. You will see one output file in comma-separated values (CSV) format and one folder (see picture). | The code will produce a spreadsheet called outputName_LinguisticFeatures and a folder named PosMorphDepTree_outputName. |
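The output naming pattern described above can also be checked from a notebook or the terminal. This is a rough sketch under stated assumptions: the helper names expected_outputs and missing_outputs are hypothetical, and the ".csv" extension on the spreadsheet is an assumption based on the comma-separated format mentioned above, so adjust it if your file is named differently.

```python
import os


def expected_outputs(output_name):
    """Return the spreadsheet and folder names the guide says the pipeline produces."""
    spreadsheet = f"{output_name}_LinguisticFeatures.csv"  # ".csv" extension is an assumption
    folder = f"PosMorphDepTree_{output_name}"
    return spreadsheet, folder


def missing_outputs(output_name, directory="."):
    """List the expected output names that are not yet present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected_outputs(output_name) if name not in present]


if __name__ == "__main__":
    # "outputName" is the example name used in this guide; use your own.
    missing = missing_outputs("outputName")
    if missing:
        print("Not found yet:", ", ".join(missing))
    else:
        print("Both outputs are present.")
```

An empty result from missing_outputs means both the spreadsheet and the folder are in place and ready to download.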
10. Download output files |
11. Clear cache and log out |
IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. THE JOB WILL KEEP RUNNING UNLESS YOU COMPLETE THIS STEP. |