Compiling and installing open source software

  1. Introduction
  2. The Filesystem Hierarchy Standard (FHS)
  3. Environment Variables
  4. Compiling code
  5. Configuration
    1. GNU Autoconf configure script
    2. CMake
  6. Building, testing and installing with make
  7. Post-install tasks
  8. Tips
    1. Take notes
    2. Keep output logs
    3. Make your commands readable

 Introduction

Compiling and installing open-source software is an essential skill for any computational scientist. The majority of scientific software is distributed as open-source software that must be compiled and installed properly before it can be used. If you're a computational scientist and don't know how to compile and install open source software yourself, you may be missing out on the following:

  • Running the latest version of software with improved features or performance
  • Running a version of the software that is optimized to get the maximum performance out of the computer hardware you are using it on.
  • Running the software on your own laptop or desktop system at home, or even your work desktop.

Many Linux distributions (aka "distros") come with a large selection of software you can install as part of the operating system, and there are even supplementary software repositories (aka "repos") that you can use to increase the number of software packages you can install using the simple package management tools provided with your Linux distro. Unfortunately, many scientific applications are not available this way because they are not used widely enough to be included in these sources. On top of that, the maintainers of these distros and repos often have quality-control processes in place that each new version of an application must pass through, so even if an application is available through one of these channels, it can be weeks or months before a new version becomes available. If you need a new feature in the latest version of an application for your research, you may not be able to wait for that version to "trickle down" to your particular Linux distro or a repo.

If you can install the application(s) you need as part of your Linux distro or a repo, the odds are that they will be compiled for a "generic" x86_64 processor that supports only a minimal set of x86_64 instructions. This is done on purpose to maximize code portability - newer, more powerful processors can run code compiled for older processors, but not the other way around. Much of the performance gain in newer processor cores comes from additional instructions, such as vector instructions, that increase low-level parallelism in the processor. This is true for GPUs as well. These new features can provide significant speedup for scientific applications, so if you're running applications that were compiled for "generic" x86_64 processors, you could be leaving a lot of performance behind.

Even if you are using clusters that have system administrators who can install software for you, that isn't always going to meet your needs, either. You may have applications that don't need to run on a cluster to be useful, so you may want to run them on your work desktop or laptop, or even your personal desktop or laptop at home. Most research computing organizations don't have the resources to manage user desktops or laptops, and supporting applications on personally-owned systems is often against policy. Being able to install your own open source software will allow you to run small jobs on your work desktop or laptop, or on personal systems. Being able to run applications on laptops is very valuable if you travel a lot - it will allow you to work while sitting on a plane or train, or give live demonstrations at a conference without worrying about how stable your internet connection is.

In this tutorial we'll provide you with the necessary background information you need to understand the process of installing open source software, walk you through an example or two, and give you some advice on troubleshooting problems in the build process when they arise.

The Filesystem Hierarchy Standard (FHS)

Before we start talking about compiling code or the process of configuring, building and installing applications, we need to know where the files we need to build our software (header files, libraries, and various commands) are located, as well as where we can put the software we're installing.

Most of the problems that occur during the build process occur when a needed file can't be found at a particular step in the process. To fix those problems, you need to check if the needed file exists on your system and then tell the build process where that file is located, so being familiar with where files should be located will be a big help with troubleshooting the majority of errors you may encounter.

Fortunately, the Filesystem Hierarchy Standard (FHS) was created to standardize the location of different types of files across different Linux distros. The directories listed below are just some of the directories defined by the FHS, but they are the ones most relevant to this process:

  • /usr -  A major section of the filesystem. It is read-only, and holds most of the programs, libraries, and other files used by the users. The only time this section of the filesystem should be written to is when packages provided by the operating system are added or removed by the system administrator. Three subdirectories important to this lesson, /usr/bin, /usr/lib, and /usr/include, are located in this directory.
  • /usr/bin -  Contains binaries and other executable commands (shell scripts, Python scripts, etc.) that any user can run (no
    administrator privileges necessary). In general, directories named ’bin’ anywhere in the filesystem will usually contain programs to be run by non-root users.
  • /usr/include - Contains header files for libraries stored in /usr/lib.
  • /usr/lib - Contains shared and static library files. After the introduction of 64-bit x86 processors, it became common to use this location to store 32-bit libraries, and store 64-bit libraries in /usr/lib64.
  • /usr/lib64 - Similar to /usr/lib, but contains 64-bit libraries.
  • /usr/share - This part of the /usr filesystem contains architecture-independent files that can be shared between systems with different processor architectures. In practice this is where a lot of non-executable text files are stored, including software documentation (/usr/share/doc) and man pages (/usr/share/man)
  • /usr/local - This directory is similar in purpose and organization to /usr, and is meant as a place where the system administrator can install additional software on the system without interfering with the software provided by the operating system.
  • /home - This directory contains subdirectories named after each user account. These subdirectories are known as home
    directories. They are owned by the user they are named after, and the user has full read-write-execute privileges over the entire contents of their home directory. Personal settings are stored here, and users can save their files here, and install and run
    software here, too.
  • /opt - This directory is reserved for the installation of add-on application software packages. Like /usr/local/, it can be used by system administrators to install additional software without interfering with software provided by the operating system. It's also where third-party software should be installed, using the conventions /opt/<package> or /opt/<provider>. That last convention isn't always followed by third-party vendors, and system administrators still have the option of overriding it and installing that software elsewhere.

While knowing about these file locations is a good start, it's not the whole story. Not every system administrator will follow the FHS rigorously, and even if they did, the FHS states "local placement of local files is a local issue, so FHS does not attempt to usurp system administrators." This means that your system administrators can place files wherever they want. Therefore, it's always a good idea to familiarize yourself with local conventions or ask your system admins where you should be looking for locally-managed software.

If you're working in an environment where environment modules are used to make locally-managed software available to you, you probably don't even need to know where the software is installed to use it in the build process. The environment modules should set the environment variables the build process uses to find the locations of those files.

We'll be talking about environment variables in the next section of this tutorial.

To learn more about environment modules, please see our documentation on environment modules. Specifically, you want to use the command 'module show <module name>', which will show you what changes that module makes to your environment, including what environment variables it defines or modifies.
 

Environment Variables

Environment variables are variables that, once defined in a shell, are available to child processes of that shell. For example, if I define the variable FOO as an environment variable and then run another program from that shell, that program can access the value of FOO. Environment variables can be used to modify the behavior of commands running in the environment where they are set.

To define an environment variable in Bash, you use the "export" command. You can use the export command in two ways. You can define the variable first, and then use the export command:

ENV_VAR="I'm an environment variable"
export ENV_VAR

In this way, you can define multiple variables and then export them all at once:

ENV_VAR_1="value_1"
ENV_VAR_2="value_2"
ENV_VAR_3="value_3"
export ENV_VAR_1 ENV_VAR_2 ENV_VAR_3

The second way is to define the variable and use the export command all on one line, like this:

export ENV_VAR="I'm an environment variable"

Both syntaxes are acceptable and used in bash.
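A quick way to see the difference between a plain shell variable and an exported environment variable is to check whether a child process can see each one. This is a minimal sketch; the variable names here are invented for illustration:

```shell
# A plain variable is NOT visible to child processes:
LOCAL_VAR="only in this shell"
child_sees_local=$(bash -c 'echo "${LOCAL_VAR:-unset}"')

# An exported variable IS inherited by child processes:
export ENV_VAR="visible to children"
child_sees_env=$(bash -c 'echo "${ENV_VAR:-unset}"')

echo "$child_sees_local"   # unset
echo "$child_sees_env"     # visible to children
```

The `bash -c` commands are child processes, so they only see the exported variable.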

Here are a couple of common misconceptions about environment variables I've seen over the years:

  1. Naming an environment variable using ALL CAPS makes it an environment variable.
  2. Environment variables must be named in ALL CAPS.
  3. Once an environment variable is set in one shell, it affects all other shells.
  4. Once an environment variable is set, it is persistent and does not need to be set again.

It's a common convention to name environment variables using all caps (for example, PATH and HOME), but this is not required. Also, using all caps in a variable name does not automatically make it an environment variable. The only thing required to make a variable an environment variable is to export it using the export command.

If you have multiple terminal windows open and set an environment variable in one of them, it will not be available to processes running in the other terminal windows. Once you close a shell, the environment variables you set in that shell are lost. You need to define environment variables every time a new shell starts. You can automate this by defining environment variables in your ~/.bashrc or ~/.bash_profile.

Environment variables are important to the build process for several reasons:

  1. Environment variables like CC, CXX, F77, F90, and FC tell the build process which compiler(s) to use for different programming languages.
  2. Environment variables like C_INCLUDE_PATH and LIBRARY_PATH provide lists of directories that should be searched for header files or library files, respectively.
  3. Environment variables like PATH, MANPATH, and LD_LIBRARY_PATH need to be set after your software is installed in order for you to use it.

Here are some environment variables that are relevant to the install process and their significance. The exact environment variables used will vary depending on the needs of the software package you're installing.

  • CC - Command to be used for the C compiler
  • CXX - Command to be used for the C++ compiler
  • F77 - Command to be used for the Fortran77 compiler. This variable is being used less and less as Fortran77 is replaced by newer Fortran standards.
  • F90 - Command to be used for the Fortran90 compiler. Like F77, F90 is being used less and less as newer Fortran standards become more common.
  • FC - Command to be used for the Fortran compiler. This is becoming more common, replacing F77 and F90 variables.
  • PATH - Ordered list of directories to search for commands. If you want to use compilers that are not installed in standard locations, you'll need to add their location to your PATH before starting the build process. After installation, you'll need to add the location of programs you just installed to your PATH before you can use them.
  • CPATH - Used by GCC. A list of directories to be searched for header files, regardless of the programming language.
  • C_INCLUDE_PATH - Used by GCC. A list of directories to be searched for header files, but only when compiling files written in C.
  • CPLUS_INCLUDE_PATH - Used by GCC. A list of directories to be searched for header files, but only when compiling files written in C++.
  • LIBRARY_PATH - A list of directories to be searched for library files by the compile-time linker. Only used when compiling.
  • LD_LIBRARY_PATH - A list of directories to be searched for libraries by the run-time linker. Used when trying to run applications.

For environment variables that define search paths, like PATH and CPATH, the value is a list of directories separated by colons (:). The list of directories is searched from left to right when searching for a file. For example, this is what my PATH looks like on Adroit:

$ echo $PATH
/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/opt/dell/srvadmin/bin:/home/pbisbal/.local/bin:/home/pbisbal/bin

This means when I type in a command at the command prompt and hit [ENTER], my shell will look for that command in the directories listed in PATH in the following order:

  1. /usr/share/Modules/bin
  2. /usr/local/bin
  3. /usr/bin
  4. /usr/local/sbin
  5. /usr/sbin
  6. /opt/puppetlabs/bin
  7. /opt/dell/srvadmin/bin
  8. /home/pbisbal/.local/bin
  9. /home/pbisbal/bin
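Rather than counting the entries by eye, you can split any colon-separated search path into one directory per line with tr. The path used here is an invented example:

```shell
# Replace each colon with a newline to list the search order, one directory per line.
SAMPLE_PATH="/usr/local/bin:/usr/bin:/home/alice/bin"
echo "$SAMPLE_PATH" | tr ':' '\n'
```

The same trick works on the real thing: `echo "$PATH" | tr ':' '\n'`.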

If the command is not in any of those directories, my shell will report an error that the command is not found, like this:

$ foo
-bash: foo: command not found

If you've just installed a new application, you probably want to use it before any other versions installed in your environment. To tell your shell to look in its directory first, you need to prepend that directory to your existing PATH. If I installed a command in the apps/bin directory in my home directory, I would do this to add that directory to the first position in my PATH:

$ export PATH=$HOME/apps/bin:$PATH

I can then use echo to print the new value of my PATH to make sure it has been changed to what I want it to be:

$ echo $PATH
/home/pbisbal/apps/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/opt/dell/srvadmin/bin:/home/pbisbal/.local/bin:/home/pbisbal/bin
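You can also ask the shell directly which file it will run for a given command name, using command -v. This sketch uses an invented script name (hello-demo) and a scratch directory standing in for $HOME/apps/bin:

```shell
# Create a scratch directory with a trivial script in it.
demo_bin=$(mktemp -d)
cat > "$demo_bin/hello-demo" <<'EOF'
#!/bin/sh
echo "hello from my own install"
EOF
chmod +x "$demo_bin/hello-demo"

# Prepend the directory to PATH, just like the $HOME/apps/bin example above.
export PATH="$demo_bin:$PATH"

# 'command -v' reports which file the shell will now run for this name:
command -v hello-demo
```

If an older copy of the same command exists elsewhere in your PATH, command -v confirms that your newly prepended directory wins.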

Knowing how to make changes like this to the various search paths defined in your environment is essential for fixing issues with the build process and for using the applications you just installed. We'll talk more about that later.

Compiling code

Compiling software is the process of taking text files containing the source code written in a language that requires compiling (C, C++, Fortran, etc.) and converting it into machine code that the processor can understand. There are 3 main steps in the process of compiling code and converting it into a library or executable application you can use:

  1. Preprocessing
  2. Compiling
  3. Linking

During preprocessing, the source code is scanned for preprocessor directives. Preprocessor directives are instructions that tell the preprocessor how to modify the source code before it is actually processed by the compiler. The most common use for preprocessor directives is to include header files. For example:

#include "foo.h"

tells the preprocessor to take the entire contents of the file foo.h and insert it at this point in the source file. The include directive can also be written like this, which has a slightly different meaning:

#include <foo>

The difference between these two syntaxes is that in the first example, using quotes, you are providing a relative or absolute path to the header file. If the preprocessor doesn't find it there, it will then look for that relative path in the locations defined in the preprocessor search path. In the second example, with angle brackets, we are telling the preprocessor to look for the header only in the locations defined in the preprocessor search path.

Another common preprocessor directive is define, which can be used to define variables. For example:

#define BUFF_SIZE 32

This assigns the value 32 to the string "BUFF_SIZE", so wherever the preprocessor finds the string "BUFF_SIZE" in the source code it will replace it with "32". There are other preprocessor directives, but discussing them is out of scope for this tutorial. In fact, for the purposes of this tutorial, we will only be concerned with the #include directive.

In this context, the compiling step is where the text source code is converted to object files that contain machine code that the processors can understand. There's actually a number of substeps in this step, but understanding the process at that level is unnecessary for this discussion.

The final step is linking. When a source file is compiled, it becomes either an executable or an object file. Object files are collected into larger files known as libraries. Libraries can be either static (filename ending in ".a", for archive) or dynamic (ending in ".so", for shared object). These library files contain the actual instructions for executing functions used by the executable. For example, if an executable needs to take the square root of a number, it will call the function sqrt(), whose instructions are provided by a library. The linker looks at the executable to see what external functions it needs, and then searches for the library that provides each function. For a static library, it copies the needed code from the library into the executable. For a dynamic library, it adds information to the executable about how to find and use the function in that external library at run time.

These libraries don't have to be provided by the program you are compiling - they can be, and often are, provided by other applications/libraries. For example, if your application needs to read HDF5 files, it will link to libraries provided by the HDF5 package itself so it can use the functions provided by HDF5 to operate on HDF5 files.

Most of the problems that occur when compiling are due to one of 3 reasons:

  1. A header file cannot be found by the preprocessor
  2. A library file cannot be found by the linker
  3. The linking stage fails with 'unresolved symbol' errors, caused by the libraries being listed in the wrong order, or a necessary library not being specified on the command line.

Errors occurring during the compilation step itself are not as common. When they do occur, they are normally harder to solve: the code uses a version of the language syntax that is older or newer than what the compiler supports, or it uses language extensions supported by a compiler other than the one being used. Not only are these errors relatively rare, but fixing them requires knowledge of the programming language being used, and teaching programming languages is beyond the scope of this tutorial.

Let's take a look at how to address these 3 common errors.

If a header file cannot be found, determine the correct location of that file and then specify its location on the compiler command line using the -I switch. For example, if the correct header file is in $HOME/include, you would tell the compiler to look there like this:

$ gcc -I $HOME/include ... ...

This directory can also be added to the environment variables used by the preprocessor (CPATH, C_INCLUDE_PATH, etc.) as described in the previous section, but I don't recommend that. I recommend using the command line whenever possible, since that consolidates all your settings in a single command. Since environment variables only affect the shell they're defined in, if you have a lot of terminal windows open, it's very easy to accidentally type the command (or cut-and-paste it) into a terminal where those variables are not defined, leading to unexpected errors.

For a library that cannot be found by the linker, the solution is almost the same as for a missing header file: determine the correct location of the file, and specify it on the compiler command-line using the -L switch. Assuming the needed library file is located in $HOME/lib, that would look like this:

$ gcc -L $HOME/lib ... ...

You can also set the LIBRARY_PATH environment variable, as described for header files above, but I don't recommend it.

If the linker reports unresolved symbol errors, that means either the file being compiled makes references to a function that is not provided by one of the included libraries, or one of the libraries being linked to relies on a function provided by another library that is not included. There are two possible causes for this.

The first is that the libraries are not listed on the compiler command line with the -l switch in the correct order. The libraries must be listed in the correct order for the linker to resolve the symbols correctly: libraries are searched from left to right as they appear on the command line, and the library needing a function must be listed before the library providing it. This issue is easy to test by changing the order of the -l options and seeing if that eliminates the error. Sometimes you can do an Internet search for the names of the unresolved symbols to determine which library provides them, and use that information to correct the order of the libraries.

The other cause of unresolved symbol errors is that the library providing the symbol has been omitted from the list of libraries to link to. If you're not sure which library needs to be added, an Internet search can provide useful clues.

Configuration

Now that we've covered some important prerequisites to help you understand the build process, we can talk about the first step of the actual build process - configuring the software with the correct settings for building it in your environment. If you're lucky, this is automated by running a utility that inspects your environment and determines the proper settings.

These utilities run a number of small tests to probe the environment the software is being built in, to determine whether the prerequisites for mandatory and optional features are present, as well as to determine how to optimize the code for your environment. For example, if a utility detects that the processor supports AVX2 or AVX512 instructions, it may enable optimizations that take advantage of those processor features.

An example of an optional feature would be a command-line program with an optional GUI interface that uses X11 (X-windows). The program can still perform all of its functions from the command-line without the GUI. If the configuration process can't detect the X11 headers or libraries on the system, it will print out a brief statement mentioning that and continue on with the configuration process. On the other hand, if you specified that you wanted the program to be built with the GUI and X11 wasn't found, the configuration step will fail, since it's unable to configure the software as requested.

There are a number of these configuration utilities available, and which one is used for a particular application is a decision made by the developer. The most popular configuration tool is the configure script, which the developer creates using GNU Autoconf. The second most popular configuration tool is CMake. CMake is a distant second to GNU Autoconf configure scripts, but is still way ahead of whatever is in 3rd place. These two tools cover the majority of open source software, so we're going to take a closer look at both.

GNU Autoconf configure script

When you download and untar the source code for your application, there will often be a script named "configure" in the top-level directory of the source code. This is a shell script created by the software's developer(s) using GNU Autoconf. In the simplest case, all you need to do is run this script, and it will determine the proper settings for your environment, after which you will be ready to run 'make', which will actually compile your code.

Of course, nothing is ever as simple as we want it to be, and every configure script has a number of options. Some options will be the same from package to package, but most will vary from package to package. The best way to see what options are available for your package is to run the configure script with the --help switch to have it list all of its arguments. This is often more than a single screenful of output, so it's best to pipe the output into less so you can scroll up and down through the output and read it at your own pace:

$ ./configure --help | less

In general, there are 3 types of configuration options available:

  1. Options that specify where the software is installed - where the libraries, header files, and any executables go, for example.
  2. Options that specify which features are enabled or disabled, like whether to build static libraries or dynamic libraries, or include support for HDF5 libraries.
  3. Environment variables that control the behavior of the configure script. These variables can be used to specify which compiler to use (CC, CXX, FC, etc.), or what flags should be passed to the preprocessor (CPPFLAGS) or linker (LDFLAGS).

As mentioned earlier, there are some configuration options that are common to all configure scripts. The most important of these is the --prefix option. This option tells configure in what directory the software should be installed. If this is not specified, the default is used, which is usually /usr/local. All other directories and files are then installed under that prefix. For example, if the default is used, all header files will be installed in /usr/local/include, all libraries will be installed in /usr/local/lib, and all executables will be installed in /usr/local/bin.

This is typically not what you want, since it makes it harder to keep track of which files in those directories belong to which application, and prevents having multiple versions of an application installed, since the files from whichever version is installed last will overwrite those installed earlier.

For software that’s installed manually like this, it’s much easier to put each application in its own directory, in a path that makes it easy to see which application is installed where. For example, if you want to install versions 1.1 and 2.2 of an application named "example", you might install them in /usr/local/example-1.1 and /usr/local/example-2.2, respectively, or /usr/local/example/1.1 and /usr/local/example/2.2, respectively.

If you're installing software in your home directory, it is recommended that you create a directory named ’apps’ or ’software’ there, and then install everything under that. For example, continuing the previous example but installing in $HOME, those versions could be installed in $HOME/apps/example-1.1 or $HOME/apps/example/2.2.

Some other common options that I like to set are:

  • --disable-silent-rules  - This enables verbose output from the make process, which makes debugging problems much easier.
  • --enable-shared -  Build shared libraries. This is usually the default, but not always, so it’s easier to be explicit every time.
  • --enable-static -  Build static libraries. This is usually not the default. Since I often build libraries for a number of users, some of whom may need/prefer static libraries, I always specify this.

It's not possible or practical to discuss all the options specific to any software package. Just as an example, here's some of the options from the configure script for FFTW 3.3.10:

  --enable-single         compile fftw in single precision
  --enable-float          synonym for --enable-single
  --enable-long-double    compile fftw in long-double precision
  --enable-quad-precision compile fftw in quadruple precision if available
  --enable-sse            enable SSE optimizations
  --enable-sse2           enable SSE/SSE2 optimizations
  --enable-avx            enable AVX optimizations
  --enable-avx2           enable AVX2 optimizations
  --enable-avx512         enable AVX512 optimizations
  --enable-avx-128-fma    enable AVX128/FMA optimizations
  --enable-kcvi           enable Knights Corner vector instructions
                          optimizations
  --enable-altivec        enable Altivec optimizations
  --enable-vsx            enable IBM VSX optimizations
  --enable-neon           enable ARM NEON optimizations

At the end of the --help output, the environment variables that can be used to influence the behavior of configure will be listed. This list will be different for every package, but some are common to just about every package, such as these:

  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if
              you have headers in a nonstandard directory <include dir>
  CPP         C preprocessor
  MPICC       MPI C compiler command
  F77         Fortran 77 compiler command
  FFLAGS      Fortran 77 compiler flags

It is always a good idea to use these environment variables to specify which compilers you want to use, such as the C compiler with CC. This makes sure you are using the desired compiler, which is especially critical in environments where you have more than one compiler installed (Intel and GCC, for example). If you have different versions of the same compiler, specify the full path to the correct version in CC to make sure you are using the correct version. To install version 2.2 of package "example" in $HOME/apps/example/2.2, using the gcc compiler in /usr/local/bin, and enabling some of the common options recommended above, the configure command would look like this:

./configure \
--prefix=$HOME/apps/example/2.2 \
--disable-silent-rules \
--enable-shared \
--enable-static \
CC=/usr/local/bin/gcc

Note that in the above example, the backslashes at the end of each line escape the newline character, which enables the shell to treat those multiple lines as if they were one line. Nothing can follow a backslash other than the newline character for this to work. I prefer this syntax since it makes long configure commands with many options easier to read.

CC can be defined as an environment variable before running configure, or put on the command-line before the configure command instead of after it, but the style shown above, where CC (and other environment variables) are defined on the command-line after the configure command, is actually recommended in Section 7.1 of the GNU Coding Standards.

Actually running configure can take several minutes, depending on how large and complicated the package being configured is. When configure completes, it will create a number of makefiles, which will then guide the actual compiling and installation of all the files with the correct settings as determined by configure. We'll be talking more about makefiles soon.

 CMake

CMake performs the same function as a configure script, but it's implemented quite differently. Instead of a script provided with the source code, the top-level directory will contain several files with 'cmake' in the name (most importantly, CMakeLists.txt), which are used as inputs to the cmake command. You need to have cmake installed on your system in order to process these files. Rather than doing everything on the command line, CMake provides a text-based user interface (TUI) that allows you to use the arrow keys to navigate from setting to setting.

The command that starts this TUI is actually 'ccmake'. ccmake won't start if you run it in the same directory as the source code. You can run ccmake from anywhere other than the source directory, but it makes sense to keep the build directory close to the source directory. I recommend creating a directory called "build" within the source directory, cd-ing into that directory, and then running ccmake. You need to provide the path to the source directory as an argument to ccmake. Since the source code directory is the parent of the build directory, you can use the relative path to the parent directory as your argument to ccmake, like this:

$ ccmake ../

This will start the TUI interface, which looks like this:

Initial screen when starting CMake

Now press "c" to run the initial configuration step. Once this completes, you'll see the configuration options for your program, which will look similar to this:

CMake screen after pressing "c"

Once you reach this point, you can use the arrow keys to move up and down through the options. When you get to an option you want to change, hit the enter key to change it. Once you're done making changes to the settings, press "c" again, and then "g" to generate new files and exit.

Now that we've finished configuring our application, the next step is to use make to build, test, and install our software.

Building, testing and installing with make

The actual command that compiles and installs the software is make. Make is a tool that automates the compiling of software based on instructions provided to it in makefiles. Make is a very powerful tool that deserves its own training session. Knowing how it works is not really essential to this lesson, but if you build software a lot, it may be useful to learn a little bit about how it works.
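To get a feel for the idea, here's a tiny self-contained sketch (all file and target names here are made up): make reads a rule from a Makefile and runs the rule's recipe to build the target from its prerequisite.

```shell
# Build a throwaway Makefile with one rule: greeting.txt depends on
# name.txt, and the recipe (which must begin with a tab character)
# regenerates it. Run in a scratch directory to avoid clutter.
mkdir -p /tmp/make-demo && cd /tmp/make-demo
printf 'World\n' > name.txt
printf 'greeting.txt: name.txt\n\tcat name.txt > greeting.txt\n' > Makefile
make            # runs the recipe because greeting.txt doesn't exist yet
cat greeting.txt
```

If you run make a second time without touching name.txt, it reports that the target is up to date and does nothing - that dependency tracking is what makes large builds fast.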

To actually build the software, at this point, simply run the make command:

$ make

When the above command completes, it's a good idea to test that the software you just built runs correctly. Not every software package provides this functionality. If it is provided, the command to run it is usually 'make check', but sometimes it will be 'make test'. These commands run a series of tests provided by the developers to make sure the software you just compiled functions correctly. Whether or not a software package provides this is not always documented, so just try running those commands and see if anything happens:

$ make check

The next step is to actually install the software, which means to copy the files to the correct locations and make sure ownership and permissions are set correctly. That is done with the make install command, like this:

$ make install

Post-Install tasks

Once your software is installed, you will need to make some changes to your environment in order to use it. This means updating your PATH, LD_LIBRARY_PATH, MANPATH, and other environment variables. You can easily determine which ones need updating by looking in the install directory (what you defined as the prefix in the configuration step) and seeing what directories exist. You should see, at a minimum, directories named "bin" and "lib". You will want to add the full paths to them to your PATH and LD_LIBRARY_PATH environment variables, respectively.

If you see a directory named "share", that directory probably contains a subdirectory 'man', which includes man pages for that software. You'll want to add the full path to that share/man directory to your MANPATH so you can read those man pages using the 'man' command.
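As a concrete sketch of that inspection step, here's what it might look like against a simulated install prefix. The path and directory layout below are fabricated for the demonstration; a real "make install" would have created these directories for you.

```shell
# Simulate an installed package so we can inspect its layout.
prefix=/tmp/apps/example/2.2
mkdir -p "$prefix/bin" "$prefix/lib" "$prefix/share/man"
ls "$prefix"    # shows which directories need to go in your environment
```

Seeing bin, lib, and share in the listing tells you to update PATH, LD_LIBRARY_PATH, and MANPATH, respectively.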

To update those environment variables to include these directories for an application installed in $HOME/example/2.2, you would do this:

$ export PATH=$HOME/example/2.2/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/example/2.2/lib:$LD_LIBRARY_PATH
$ export MANPATH=$HOME/example/2.2/share/man:$MANPATH

There may be other environment variables unique to your application you need to define. To learn about those variables, you need to read the documentation for your particular application.

If you want to always be able to use the software automatically when you log in, you can add those lines to your ~/.bashrc file so these changes are made in every new shell you start. However, if your system is set up to use environment modules, I recommend you create a modulefile for the application instead, if you have the skills (see our documentation on creating your own custom environment modules to learn more).

Tips

Take notes

It's highly recommended that you take notes of the exact commands you use to install the software (cut-and-paste) along with any errors you encounter. This has a number of benefits:

  1. Building software seldom goes off without a hitch. Keeping notes helps you keep track of what commands worked and didn't work, so you don't repeat the same mistakes.
  2. If you get stuck and need help, providing the exact commands you ran and any error messages you encountered will be a tremendous help to whoever is assisting you, especially if they're assisting you remotely over e-mail or an instant messaging service like Slack or Google Chat.
  3. You will probably want to install later versions of the software in the future, or install the same package again with different configuration options. Being able to copy what you've already done with only minor modifications will save you a lot of time compared to starting from scratch.
  4. If you're ambitious, you can use your notes to create a shell script to automate installing the software on other systems, or installing later versions. If you're really ambitious, you can write your notes as that shell script!

When taking notes, it's best to use a lightweight plain-text editor like vi, Emacs, nano, or something similar. Using a full-featured word processor is overkill for such a simple task, and the saved files aren't plain ASCII text files, which is what you want shell scripts to be (if you choose option #4 above). I prefer vi because I can use it on the remote system I'm working on in a separate terminal window. This requires very little bandwidth so there's no network lag, and it allows me to keep the documentation on the same system where I'm installing the software.

As an example, here are some old notes I took back in 2017 for installing FFTW 3.3.6, patch level 1, with OpenMPI 1.10.3 using the Intel 2015 compilers. Yes, these are old notes, and that's deliberate - old notes can still be useful, so hang on to your notes and keep them organized. You never know when you'll need them again. At the bottom of the notes, I've included the contents of the modulefile I created for this install so I don't need to recreate that from scratch in the future, either.

# Notes for installing FFTW 3.3.6-pl1 with Intel 2015 and Open MPI 1.10.3 on CentOS 6.8
# 
# Prentice B.
# January 31, 2017

# http://www.fftw.org/

# As non-root user

module purge
module load intel/2015.u1
module load openmpi/1.10.3
umask 002

mkdir -p /usr/pppl/src/fftw/3.3.6-pl1

cd !$ 

wget http://www.fftw.org/fftw-3.3.6-pl1.tar.gz

mkdir -p /local/pbisbal

cd !$ 

tar xvf /usr/pppl/src/fftw/3.3.6-pl1/fftw-3.3.6-pl1.tar.gz

cd fftw-3.3.6-pl1

./configure \
  --prefix=/usr/pppl/intel/2015-pkgs/openmpi-1.10-pkgs/fftw-3.3.6-pl1 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi \
  --enable-openmp \
  --enable-threads \
  CC=icc \
  MPICC=mpicc \
  F77=ifort \
  2>&1 | tee configure.log

make 2>&1 | tee make.log

make check 2>&1 | tee check.log

# No failures. Install 

# as root

module purge
module load intel/2015.u1
module load openmpi/1.10.3
umask 002

make install 2>&1 | tee install.log

# grep -A1 ^Librar install.log   | sort -r | uniq
Libraries have been installed in:
--
   /usr/pppl/intel/2015-pkgs/openmpi-1.10-pkgs/fftw-3.3.6-pl1/lib

# Create module file 

# cat /usr/pppl/Modules/compiler-pkg/intel/2015/openmpi/1.10.3/fftw/3.3.6-pl1 
#%Module
##
## FFTW 3.3.6-pl1/Intel 2015/OpenMPI 1.10.3
proc ModulesHelp {} {
    puts stderr "This module loads FFTW-3.3.6-pl1 for Intel 2015 + OpenMPI 1.10.3"
}
module-whatis "FFTW 3.3.6-pl1 for Intel 2015 + OpenMPI 1.10.3"

conflict fftw 
prereq intel openmpi

set name     fftw
set version     3.3.6-pl1
set compiler     intel
set compvers     2015
set mpi        openmpi
set mpivers    1.10.3

set prefix  /usr/pppl/$compiler/$compvers-pkgs/$mpi-$mpivers-pkgs/$name-$version

setenv FFTWHOME "${prefix}"
prepend-path MANPATH "${prefix}/share/man"
prepend-path INFOPATH "${prefix}/share/info"
prepend-path PATH "${prefix}/bin"
prepend-path LD_LIBRARY_PATH "${prefix}/lib"
prepend-path LD_RUN_PATH "${prefix}/lib"
prepend-path INCLUDE_PATH "${prefix}/include"
prepend-path C_INCLUDE_PATH "${prefix}/include"

# Done

Keep output logs

When running the major commands (configure, make, make check, make install), I find it useful to write all of the output from those commands to log files that I keep around until my software is successfully installed and has been tested. Just like with the notes mentioned above, if you encounter a problem, referring to these log files will be very helpful in determining what went wrong. I've found the best way to do this is to redirect the standard output and standard error of these commands to the tee command, like this:

$ configure --prefix=$HOME/example/2.2 --enable-static 2>&1 | tee configure.log

In the example above, the "2>&1" tells bash to send output stream 2 (standard error) to the same location as output stream 1 (standard output). The "|" is known as the pipe symbol, and it takes the output of the preceding command (configure) and sends it to the input of the second command (tee). The tee command works just like a tee fitting in plumbing: it takes one input stream and sends it to two different places. In this case, it's sending the output of configure to both the screen and the file named 'configure.log'. This allows you to watch the output of the configure command while logging it at the same time.
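You can see the mechanics with any command that writes to both streams. Here's a small self-contained demonstration (the file name demo.log is arbitrary):

```shell
# Emit one line on stdout and one on stderr, merge the streams with
# 2>&1, then use tee to both display them and save them to demo.log.
{ echo "normal output"; echo "error output" >&2; } 2>&1 | tee demo.log
cat demo.log    # both lines were captured in the file
```

Without the "2>&1", the "error output" line would bypass the pipe entirely and never reach the log file.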

One warning: when logging the output of the configure command, do NOT use the file name 'config.log', because the configure command already uses that filename to log what it's doing internally. That file is also very useful (but an advanced topic), so you don't want to overwrite it with tee.

You want to do this with the make commands, too:

$ make 2>&1 | tee make.log

$ make check 2>&1 | tee check.log

$ make install 2>&1 | tee install.log

Make your commands readable

Configure commands can have lots of options and can get pretty long. I like to write my commands in my notes before I run them. When I think I have them complete, I cut-and-paste them onto the command-line. When I write my configure commands, or any long command, I like to use the backslash character, \, to escape the newlines so I can write my configure command across multiple lines, but still have it be a single command, like this:

../configure \
  --prefix=/usr/local/app/1.2.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --with-foo=/path/to/foo \
  CC=gcc \
  CXX=g++ \
  FC=gfortran \
  CFLAGS=-I/path/to/include \
  LDFLAGS=-L/path/to/lib \
  2>&1 | tee configure.log

I find this useful for a couple of reasons:

  1. It makes it easier to read, and therefore find errors. The eye has trouble following long lines of text across a page, and when a command wraps to the next line or 2, or 3, or..., it gets even harder to follow. Writing your command like this is much easier to read. It's almost as if every option is a separate bullet-point in a presentation slide.
  2. If using vi (like I do), you can remove a single option by yanking it and then putting it (yank and put are vi commands) anywhere else in your notes, as long as it's outside your command. You can then cut-and-paste the resulting command onto your command-line.