Catalan Linguistic Pipeline – Guide

1. Access TACC Account

Access the shared credentials to enter the TACC Analysis Portal.

1.1

Go to the TACC Analysis Portal: https://tap.tacc.utexas.edu/jobs/

Enter Dr. Grasso’s account information generated by Stache and log in.

1.2

Submit a job on TACC by clicking on the dropdowns and selecting:

  • Lonestar 6

  • Jupyter Notebook

  • DBS23006

  • vm-small

  • Nodes 1; Tasks 1

  • Job name (can be anything, I will use CatalanLing_Trial in this tutorial guide)

  • Time limit (I will use 2 hours in this tutorial guide)

  • Click on “submit”

*Note. If you're struggling to get nodes on the vm-small queue, I'd recommend trying the development queue. This applies to everything except transcription, where you should try gpu-a100-small first, then gpu-a100-dev, then gpu-a100.

image-20240716-174434.png

2. Enter Jupyter

If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:

  • Click on “connect”

  • Click on “work”

  • Click on “MADR_LingFeatPipelines”

If there are no available nodes (picture B), you will have to wait in a queue until one becomes available.

image-20240716-174523.png

image-20240716-174553.png

3. Create a folder

  • Click on “new”

  • Click on “folder”

  • Name the folder. In this tutorial guide, I will use the name catalanPipelineTrialFiles

If you cannot see the folder you created, click on “last modified” a couple of times. Sometimes it doesn’t update immediately.

image-20240716-174641.png

image-20240716-174703.png

4. Upload input file

  • Enter the folder you just created

  • Click on “upload” and upload the input files

image-20240716-174744.png

5. Go to the terminal

  • Once the files have been uploaded, click on the “new” dropdown menu

  • Click on “terminal”

image-20240716-174825.png

7. Type the commands

Once you are in the terminal, follow these steps (if running several audio files):

  • Type cdw, press enter

  • Type cd MADR_LingFeatPipelines, press enter

  • Type conda activate catalanPipeline, press enter

  • Type or copy the command below, then press enter:

python catalanLingPipeline.py catalanPipelineTrialFiles/ catalanLing_Trial catalanLingTrialSupplementary_Output

Keep in mind that the red and green sections change depending on your input and output files. The purple section always remains the same.

image-20240716-174902.png

The parts circled in red, green, and gold change depending on the name of your input and output files.

  • catalanPipelineTrialFiles is the name of my input folder. Your command will change depending on the name you gave your folder. The spelling and capitalization must match exactly: if the name of the folder you created is all in lowercase, the name in the terminal command must be all in lowercase too.

  • catalanLing_Trial and catalanLingTrialSupplementary_Output are the names of the output file and output folder. You can change this part of the command (in the terminal) depending on the names you want to give your output file and folder.
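
If you would like to double-check the command before running it, here is a minimal sketch of a helper that assembles it from the three names. The function name build_pipeline_cmd is my own invention (it is not part of the pipeline), and it only prints the command; it does not run anything. It also checks that the input folder exists with exactly the spelling you typed.

```shell
# Hypothetical helper, not part of the pipeline: prints the command for a given
# input folder, output file name, and output folder name. Nothing is executed.
build_pipeline_cmd() {
    input_dir="${1%/}"   # drop a trailing slash if one was typed
    output_name="$2"
    output_dir="$3"
    # Stop early if the input folder is missing or spelled differently
    if [ ! -d "$input_dir" ]; then
        echo "Input folder not found (check spelling and capitalization): $input_dir" >&2
        return 1
    fi
    echo "python catalanLingPipeline.py $input_dir/ $output_name $output_dir"
}

# Example with the names used in this guide:
# build_pipeline_cmd catalanPipelineTrialFiles catalanLing_Trial catalanLingTrialSupplementary_Output
```

Run it from inside MADR_LingFeatPipelines, then copy the printed line into the terminal and press enter.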

8. Run the Python script

After typing your command and pressing enter, wait a few seconds. You will know it has finished running when you see this at the bottom of the terminal (see picture, circled in red).

image-20240716-175004.png
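
If you prefer an explicit message over watching the bottom of the terminal, the sketch below wraps any command and reports whether it finished. run_and_report is a hypothetical helper name. It assumes the script exits with a non-zero status when it fails, which is the usual behavior for a Python script that stops on an error, but I have not verified this for catalanLingPipeline.py.

```shell
# Hypothetical helper: runs the command given as arguments and reports the result.
run_and_report() {
    if "$@"; then
        echo "Finished successfully"
    else
        status=$?   # exit status of the command that just failed
        echo "Stopped with exit status $status"
        return "$status"
    fi
}

# Example with the names used in this guide:
# run_and_report python catalanLingPipeline.py catalanPipelineTrialFiles/ catalanLing_Trial catalanLingTrialSupplementary_Output
```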

9. Find output files

Once the script has finished running, follow the next steps:

  • Go back to the notebook

  • Click on the refresh button (if needed)

If the generated files did not pop up, click on “last modified”; sometimes it takes a minute to update.

You will see one output file in comma-separated values (CSV) format and one folder (see picture).

image-20240716-175035.png

The code will produce a spreadsheet called catalanLingTrial_LinguisticFeatures and a folder named PosMorphDepTree_catalanLingTrial.
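
You can also check for the new files from the terminal instead of the notebook view. The sketch below lists the most recently modified entries in a folder, which is the terminal equivalent of sorting by “last modified”. list_newest is a hypothetical helper name; it uses only the standard ls and head commands.

```shell
# Hypothetical helper: show the newest entries in a folder, most recent first.
# First argument: folder (default: current folder). Second: how many to show (default: 5).
list_newest() {
    ls -t "${1:-.}" | head -n "${2:-5}"
}

# Example: show the five newest entries in the current folder
# list_newest
```

If the pipeline has finished, the output file and the output folder should appear at the top of the list.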

10. Download output files

  • First, we want to download the linguistic features. Check the box next to the CSV file and click on “download” (picture A).

  • Then, enter the supplementary output folder and download the files one by one (picture B).

image-20240716-175114.png

image-20240716-175142.png
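
If the supplementary folder holds many files, downloading them one by one can be slow. As an optional alternative, the sketch below packs the whole folder into a single .tar.gz archive that you can download in one click. bundle_outputs is a hypothetical helper name, and it assumes the standard tar command is available in the terminal, which I have not confirmed on this system.

```shell
# Hypothetical helper: pack an output folder into one .tar.gz archive next to it.
bundle_outputs() {
    folder="${1%/}"   # drop a trailing slash if one was typed
    tar -czf "$folder.tar.gz" "$folder" && echo "Created $folder.tar.gz"
}

# Example (replace the name with your own supplementary output folder):
# bundle_outputs catalanLingTrialSupplementary_Output
```

After running it, go back to the notebook, check the box next to the new .tar.gz file, and click on “download”.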

11. Clear cache and log out

  • Clear cache by typing the code rm -r ~/.cache/

  • Press enter

  • Then, log out from the terminal by typing the command logout (see picture A)

  • Press enter

  • Go back to the original TACC page and click on “end job” (see picture B).

IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. THE JOB WILL KEEP RUNNING UNLESS YOU COMPLETE THIS STEP.

image-20240716-175216.png

image-20240716-175243.png
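
The cache step above can also be sketched as a small helper that first shows how much space the cache takes and then removes it, without printing an error if the folder is already gone. clear_cache is a hypothetical helper name. By default it targets ~/.cache, the same folder as the command in this guide.

```shell
# Hypothetical helper: report the size of the cache folder, then delete it.
# Optional argument: a different folder to clear (default: ~/.cache).
clear_cache() {
    cache_dir="${1:-$HOME/.cache}"
    if [ -d "$cache_dir" ]; then
        du -sh "$cache_dir"   # show how much space is about to be freed
        rm -r "$cache_dir"
    else
        echo "No cache folder at $cache_dir"
    fi
}

# Example: clear the default cache folder
# clear_cache
```

You still need to type logout and click on “end job” afterwards.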