Basic UNIX/Linux tutorial

For this class, we will be using various software applications to study NGS data. You will need to know the basic commands required to download, compile, and run the software, as well as to view and analyze the results. This tutorial is intended to address this requirement at a basic level. If you are looking for a more comprehensive tutorial, the Unix and Perl Primer for Biologists written by Ian Korf's lab at UC Davis is a great resource.

Note

Linux is not the exact same thing as the UNIX operating system. There are several popular variants of UNIX, including Linux, Mac OS X, and Solaris. A lot of scientific software will run on all UNIX machines, whereas some will only run on Linux. As far as command-line basics are concerned, though, UNIX=Linux and I will use these terms interchangeably throughout the rest of the tutorial.

Terminal

The most common and most powerful method of interacting with a Linux machine is through the command line (sometimes called the “terminal”, the “command prompt”, or just “prompt”). The basic idea behind the terminal is pretty simple–you enter a command, and the computer executes that command. Sometimes a command will print some text to your screen, sometimes it will modify or create or delete files. This tutorial will give you an introduction to the types of commands you will need to know to navigate through the file system and run the scientific software.

Moving around

Files on any computer are stored in a nested directory structure. When you use a Mac or Windows computer, you click on folders to open them, and continue clicking on the appropriate folders until you have located the file you are looking for.

When using the terminal, you navigate through the file system a bit differently. When you open your terminal, you start at a default location called your “home directory”. You then use the cd command to change your location—analogous to clicking on a folder on Mac or Windows.

Whenever the prompt is open, you can type the pwd command to show your current location (pwd is short for print working directory). If I log in to a Linux machine and type pwd, it will show me the location of my default directory.

Last login: Wed Aug 31 13:54:26 2011 from dhrasmus.gdcb.iastate.edu
    _   _                             _
   / \ | |_ _ __ ___   ___  ___ _ __ | |__   ___ _ __ ___
  / _ \| __| '_ ` _ \ / _ \/ __| '_ \| '_ \ / _ \ '__/ _ \
 / ___ \ |_| | | | | | (_) \__ \ |_) | | | |  __/ | |  __/
/_/   \_\__|_| |_| |_|\___/|___/ .__/|_| |_|\___|_|  \___|
                               |_|

iPlant Collaborative 
Cloud Machine Image
Project Atmosphere
Seung-jin Kim <seungjin@email.arizona.edu>
[standage@localhost ~]$ pwd
/home/standage
[standage@localhost ~]$

The pwd command printed /home/standage as my current location. That means by default, my command prompt is in a directory called standage, which is inside of another directory called home, which is at the root of the file system. To see the files in this directory, I can use the ls command.

[standage@localhost ~]$ ls
Desktop
[standage@localhost ~]$

It looks like the only file in my home directory is another directory called Desktop. If I want to go to that directory, I use the cd command (short for “change directory”).

[standage@localhost ~]$ cd Desktop
[standage@localhost Desktop]$

You can see that the command did not print out any output like the previous ones did, but now my prompt looks different. Before, it showed ~ as my current location, but now it shows Desktop. As you navigate around the file system, this will be updated. Remember, you can always figure out where you are by typing the pwd command.

If I want to go back to my home directory, I can do this in several ways.

  • The symbol ~ is a shortcut for my home directory. I can go to my home directory at any time using the command cd ~.
  • The symbol .. represents the parent directory of the current directory. For example, if I am in the directory /home/standage/Desktop, then .. corresponds to /home/standage, ../.. corresponds to /home, and ../../.. corresponds to the root directory /. Another way to get to my home directory from /home/standage/Desktop is to use the command cd ...
  • If you enter the command cd without any file or directory name after it, the command will take you back to your home directory.
[standage@localhost Desktop]$ cd ~
[standage@localhost ~]$ pwd
/home/standage
[standage@localhost ~]$ cd /usr/lib64/httpd/modules/
[standage@localhost modules]$ pwd
/usr/lib64/httpd/modules
[standage@localhost modules]$ cd ..
[standage@localhost httpd]$ pwd
/usr/lib64/httpd
[standage@localhost httpd]$ cd ../..
[standage@localhost usr]$ pwd
/usr
[standage@localhost usr]$ cd
[standage@localhost ~]$ pwd
/home/standage
[standage@localhost ~]$

Tab completion

If you want to save time when typing long directory or file names, start typing the first 2 or 3 letters of the directory and then hit the Tab button. This will autocomplete (fill in) the rest of the directory/file name for you. If more than one directory starts with those 2 or 3 letters, then it will fill in as much as it can automatically. This will save you a lot of time and help prevent typos.

Commands

Path configuration

When you type a command like ls or pwd or cd, you are actually running a program. There is a special program file in some directory on your computer called ls, and when you enter the ls command your computer executes that program.

How does your computer know where to find the ls program (or any other program command you type)? There is a setting on your computer called your path–it is a list of directories on your computer. When you type in the ls command, your computer looks at all the directories in your path until it finds a program called ls, and then it executes it. You will need to remember this when you begin compiling programs on your VM. If you do not copy your program into a path folder or update your path settings, then your computer will not know how to find and execute the program you just installed.

Redirecting output

Many programs will print output to your screen. When you enter the pwd command, the output is a single line of text showing current directory. However, some commands may print thousands of lines–more than can fit on your screen. How do you manage the output?

One way is to redirect the output to a file using the > character.

[standage@localhost ~]$ cd /usr/bin
[standage@localhost bin]$ ls > ~/newfile.txt   # Save the output to a new file
[standage@localhost bin]$ wc -l ~/newfile.txt  # Count the lines in this new file
1672 /home/standage/newfile.txt
[standage@localhost bin]$

There are 1672 files in the /usr/bin directory. If we simply typed in ls without redirecting the output to a file, it would have overloaded the screen!

Another common approach in Linux is to “pipe” programs together. If you place a pipe “|” character between two programs, then the terminal will use the output of the first program as the input for the next program. For example, we can pipe the ls output into the grep program to select only those lines that contain “eat”.

[standage@localhost bin]$ ls | grep eat
create-branching-keyboard
create-jar-links
euca-create-snapshot
euca-create-volume
[standage@localhost bin]$

If you want to or need to, you can use multiple pipes in a single command. In each case, the output from the former command will be used as input for the latter command.

Command history

In the command prompt, you can use the up and down arrows on your keyboard to search through commands you have entered previously. If you want to see your entire (recent) command history, type history. This could be a lot of output, so you may want to redirect it to a file or pipe it into grep to search for a specific command.

Manuals

If you forget how to use a command, there are manuals (or “man pages”) available on your system to remind you. To see the manual for grep, type man grep. You can use your up and down arrows to scroll, and then just hit q when you're done.

File management

Now that you've got the basics down, I will use a lot less words and simply demonstrate by example. Remember that Linux filenames are case-sensitive (in other words, “Desktop” is not the same as “desktop”).

Copying files and directories

[standage@localhost bin]$ cd
[standage@localhost ~]$ ls
Desktop  newfile.txt
[standage@localhost ~]$ cp newfile.txt anotherfile.txt        # Create a copy of 'newfile.txt' called 'anotherfile.txt'
[standage@localhost ~]$ ls                     
Desktop  anotherfile.txt  newfile.txt
[standage@localhost ~]$ ls Desktop/
idrop.desktop
[standage@localhost ~]$ cp newfile.txt Desktop                # Create a copy of 'newfile.txt' and place it in the 'Desktop' directory
[standage@localhost ~]$ ls Desktop/
idrop.desktop  newfile.txt
[standage@localhost ~]$ cp newfile.txt Desktop/crazyfile.txt  # Create a copy of 'newfile.txt' called 'crazyfile.txt' and place it in the 'Desktop' directory
[standage@localhost ~]$ ls Desktop 
crazyfile.txt  idrop.desktop  newfile.txt
[standage@localhost ~]$ cp -r Desktop anotherDirectory        # Create a copy of the 'Desktop' directory and all its contents and call it 'anotherDirectory'
[standage@localhost ~]$ ls
Desktop  anotherDirectory  anotherfile.txt  newfile.txt
[standage@localhost ~]$ ls anotherDirectory 
crazyfile.txt  idrop.desktop  newfile.txt
[standage@localhost ~]$

Moving and renaming files and directories

[standage@localhost ~]$ ls
Desktop  anotherDirectory  anotherfile.txt  newfile.txt
[standage@localhost ~]$ mv newfile.txt oldfile.txt            # Rename 'newfile.txt' to 'oldfile.txt'
[standage@localhost ~]$ ls
Desktop  anotherDirectory  anotherfile.txt  oldfile.txt
[standage@localhost ~]$ mv oldfile.txt anotherDirectory       # Move 'oldfile.txt' to the directory 'anotherDirectory'
[standage@localhost ~]$ ls
Desktop  anotherDirectory  anotherfile.txt
[standage@localhost ~]$ ls anotherDirectory 
crazyfile.txt  idrop.desktop  newfile.txt  oldfile.txt
[standage@localhost ~]$ mv anotherDirectory Desktop           # Move the directory 'anotherDirectory' into the directory 'Desktop'
[standage@localhost ~]$ ls
Desktop  anotherfile.txt
[standage@localhost ~]$ ls Desktop 
anotherDirectory  crazyfile.txt  idrop.desktop  newfile.txt
[standage@localhost ~]$

Creating an empty directory

[standage@localhost ~]$ ls               
Desktop  anotherfile.txt
[standage@localhost ~]$ mkdir testDirectory
[standage@localhost ~]$ ls
Desktop  anotherfile.txt  testDirectory
[standage@localhost ~]$ ls testDirectory 
[standage@localhost ~]$

Deleting files and directories

Be careful…by default, Linux will not ask you if you are sure you want to delete the file. Once you delete it, it's gone. No recovery from Recycle Bin or Trash or anything like that.

[standage@localhost ~]$ ls
Desktop  anotherfile.txt  testDirectory
[standage@localhost ~]$ rm anotherfile.txt    # Delete file
[standage@localhost ~]$ ls
Desktop  testDirectory
[standage@localhost ~]$ rm -r testDirectory/  # Delete directory
[standage@localhost ~]$ ls
Desktop
[standage@localhost ~]$

Viewing files

  • less: will display the file; use up/down arrows to scroll, f/b to page, q to quit
  • head: print the first 10 lines of the file to the terminal; use head -n x to print the first x lines of the file
  • tail: print the last 10 lines of the file to the terminal

Editing files

  • nano: not very popular, but probably the simplest for beginners; command hints are shown at the bottom of the screen
  • vi and/or vim: popular text editor, but different editing modes can be confusing at first
  • emacs: also popular, but it has its own quirks

Beware! Linux users often have very strong feelings about which text editor is best. My favorite is vim, but nano is probably the simplest and best option for beginners.

Archives and compression

Sending large data files or programs over the internet can take a long time, so to speed the process up files are often compressed. You will commonly see .zip files used with Windows and Macs, but there are a few others you will commonly see with Linux. Depending on the type of compression used, you will need to use a different command to decompress and access the files.

Extension Command Example
.zip unzip unzip myfiles.zip
.bz2 bunzip2 bunzip2 allMySequences.fasta.bz2
.gz gunzip gunzip allMyGenes.gff3.gz
.tar tar tar xf myDirectory.tar
.tar.gz or .tgz tar tar xzf myProgram.tar.gz
.tar.bz2 tar tar xjf myApp.tar.bz2

Installing software

Some software can be run directly from plain text files–these are typically called “scripts” (Perl, Python, bash, etc). However, many software applications have source code that must be compiled into a executable binary file before it can be run. This is the case with most of the software we will be using this semester.

The first step to installing software on your Linux machine is to download the source code, using either your web browser or a command like wget or curl. If necessary, you will need to decompress the source code. If you downloaded the program from a web page, then that page probably includes installation instructions as well. However, it is common for installation instructions to be included with the source code as well. They are usually kept in files called README or INSTALL or something like that. You should follow the instructions in these files to install the software. A very typical installation process goes as follows.

./configure
make
sudo make install

This isn't universal, but it is very common and you will see it come up frequently in this course. Each of these commands will generally print a lot of output to the terminal. Don't worry about trying to read it all, you can ignore most of it. However, if there is a problem, hopefully the last few lines of output will make that clear. The more experience you get with this, the easier it will be to recognize.

Take a look at the following terminal recording for a simple example of a typical software installation.

cgss15/orientation/linux-tut.txt · Last modified: 2015/01/12 11:31 by standage
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki