Chapter 4. Preparing Software for Packaging

This chapter covers source code and how software is built from it, both of which are necessary background for an RPM Packager.

4.1. What is Source Code?

Source code is human-readable instructions to the computer, which describe how to perform a computation. Source code is expressed using a programming language.

This tutorial features three versions of the Hello World program, each written in a different programming language. Programs written in these three different languages are packaged differently, and cover three major use cases of an RPM packager.

Note

There are thousands of programming languages. This document features only three of them, but they are enough for a conceptual overview.

Hello World written in bash:

bello

#!/bin/bash

printf "Hello World\n"

Hello World written in Python:

pello.py

#!/usr/bin/env python

print("Hello World")

Hello World written in C:

cello.c

#include <stdio.h>

int main(void) {
    printf("Hello World\n");
    return 0;
}

Each of the three programs outputs Hello World on the command line.

Note

Knowing how to program is not necessary for a software packager, but is helpful.

4.2. How Programs Are Made

There are many methods by which human-readable source code becomes machine code - instructions the computer follows to actually execute the program. However, all methods can be reduced to these three:

  1. The program is natively compiled.
  2. The program is interpreted by raw interpreting.
  3. The program is interpreted by byte compiling.

4.2.1. Natively Compiled Code

Natively compiled software is software written in a programming language that compiles to machine code, with a resulting binary executable file. Such software can be run stand-alone.

RPM packages built this way are architecture-specific. This means that if you compile such software on a computer that uses a 64-bit (x86_64) AMD or Intel processor, it will not execute on a 32-bit (x86) AMD or Intel processor. The resulting package will have the architecture specified in its name.

4.2.2. Interpreted Code

Some programming languages, such as bash or Python, do not compile to machine code. Instead, their programs' source code is executed step by step, without prior transformations, by a Language Interpreter or a Language Virtual Machine.

Software written entirely in interpreted programming languages is not architecture-specific. Hence, the resulting RPM Package will have the string noarch in its name.
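To see where the architecture appears, a package file name can be split into its fields. The sketch below assumes the conventional name-version-release.architecture.rpm layout. The function name split_rpm_filename and both file names are hypothetical and used only for illustration; they are not output from a real build.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four fields."""
    if not filename.endswith(".rpm"):
        raise ValueError("not an RPM file name: " + filename)
    stem = filename[: -len(".rpm")]
    # The architecture is everything after the last dot.
    stem, arch = stem.rsplit(".", 1)
    # Release and version are the last two dash-separated fields;
    # the package name itself may contain dashes.
    stem, release = stem.rsplit("-", 1)
    name, version = stem.rsplit("-", 1)
    return name, version, release, arch


# Hypothetical file names for a compiled and an interpreted program.
for example in ("cello-1.0-1.el7.x86_64.rpm", "bello-0.1-1.el7.noarch.rpm"):
    print(split_rpm_filename(example))
```

The last field is x86_64 for the natively compiled example and noarch for the interpreted one.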

Interpreted languages are either byte-compiled or raw-interpreted. These two types differ in program build process and in packaging procedure.

4.2.2.1. Raw-interpreted programs

Raw-interpreted language programs do not need to be compiled at all; they are executed directly by the interpreter.

4.2.2.2. Byte-compiled programs

Byte-compiled languages need to be compiled into byte code, which is then executed by the language virtual machine.

Note

Some languages give a choice: they can be raw-interpreted or byte-compiled.
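Python is one such language. As a rough illustration of the concept, its built-in compile() function turns source text into a code object that holds byte code, and exec() hands that object to the virtual machine. This is a sketch of the idea, not a packaging procedure; the file name passed to compile() is only a label used in error messages.

```python
import dis

source = 'print("Hello World")\n'

# Byte-compile: translate the source text into a code object.
code_object = compile(source, "pello.py", "exec")

# List the byte code instructions; the exact listing varies
# between Python versions.
dis.dis(code_object)

# Execute the byte code on the Python virtual machine.
exec(code_object)
```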

4.3. Building Software from Source

This section explains building software from its source code.

  • For software written in compiled languages, the source code goes through a build process, producing machine code. This process, commonly called compiling or translating, varies for different languages. The resulting built software can be run or "executed", which makes the computer perform the task specified by the programmer.
  • For software written in raw interpreted languages, the source code is not built, but executed directly.
  • For software written in byte-compiled interpreted languages, the source code is compiled into byte code, which is then executed by the language virtual machine.

4.3.1. Natively Compiled Code

In this example, you will build the cello.c program written in the C language into an executable.

cello.c

#include <stdio.h>

int main(void) {
    printf("Hello World\n");
    return 0;
}

4.3.1.1. Manual Building

Invoke the C compiler from the GNU Compiler Collection (GCC) to compile the source code into binary:

gcc -g -o cello cello.c

Execute the resulting output binary cello.

$ ./cello
Hello World

That is all. You have built and run natively compiled software from source code.

4.3.1.2. Automated Building

Instead of building the source code manually, you can automate the building. This is a common practice used by large-scale software. Automating building is done by creating a Makefile and then running the GNU make utility.

To set up automated building, create a file named Makefile in the same directory as cello.c:

Makefile

cello:
        gcc -g -o cello cello.c

clean:
        rm cello

Now to build the software, simply run make:

$ make
make: 'cello' is up to date.

Since a build is already present, run make clean to remove it and then run make again:

$ make clean
rm cello

$ make
gcc -g -o cello cello.c

Again, trying to build after another build would do nothing:

$ make
make: 'cello' is up to date.
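The "up to date" message follows from how make decides whether to run a rule. The following Python sketch simplifies that decision to a target file and a list of prerequisite files. The helper name needs_rebuild is hypothetical and not part of make, and the sketch assumes that every listed prerequisite exists.

```python
import os


def needs_rebuild(target, prerequisites=()):
    """Return True if a make-like tool would run the rule for target."""
    # A missing target is always rebuilt.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild if any prerequisite is newer than the target.
    # Assumption: all prerequisites exist on disk.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


# The cello rule above lists no prerequisites, so once the cello
# file exists, the rule does not run again.
print(needs_rebuild("cello"))
```

Because the cello rule names no prerequisites, make does not notice edits to cello.c. Writing the rule as cello: cello.c would make it rebuild whenever the source file is newer than the binary.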

Finally, execute the program:

$ ./cello
Hello World

You have now compiled a program both manually and using a build tool.

4.3.2. Interpreted Code

The next two examples showcase byte-compiling a program written in Python and raw-interpreting a program written in bash.

Note

In the two examples below, the #! line at the top of the file is known as a shebang and is not part of the programming language source code.

The shebang enables using a text file as an executable: the system program loader parses the line containing the shebang to get a path to the binary executable, which is then used as the programming language interpreter.
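As a simplified illustration of what the loader extracts from that line, the Python sketch below splits a shebang into the interpreter path and an optional argument. The function name parse_shebang is hypothetical, and real program loaders apply additional rules, such as length limits, that are not modeled here.

```python
def parse_shebang(first_line):
    """Return (interpreter, argument) from a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    # Split off the interpreter path; the remainder, if any, is kept
    # as one optional argument string.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


print(parse_shebang("#!/bin/bash"))            # ('/bin/bash', None)
print(parse_shebang("#!/usr/bin/env python"))  # ('/usr/bin/env', 'python')
```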

4.3.2.1. Byte-Compiled Code

In this example, you will compile the pello.py program written in Python into byte code, which is then executed by the Python language virtual machine. Python source code can also be raw-interpreted, but the byte-compiled version starts faster because the interpreter can skip the compilation step. Hence, RPM Packagers prefer to package the byte-compiled version for distribution to end users.

pello.py

#!/usr/bin/env python

print("Hello World")

The procedure for byte-compiling programs differs between languages. It depends on the language, the language’s virtual machine, and the tools and processes used with that language.

Note

Python is often byte-compiled, but not in the way described here. The following procedure aims not to conform to the community standards, but to be simple. For real-world Python guidelines, see Software Packaging and Distribution.

Byte-compile pello.py:

$ python -m compileall pello.py

$ file pello.pyc
pello.pyc: python 2.7 byte-compiled

Execute the byte code in pello.pyc:

$ python pello.pyc
Hello World
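The output above comes from Python 2.7. With Python 3, compileall writes the byte code under a __pycache__ directory instead of next to the source file. The sketch below, which assumes Python 3.7 or later, uses the standard py_compile module to reproduce the same two steps in a temporary directory. The function name byte_compile_and_run is hypothetical.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def byte_compile_and_run(source_text):
    """Byte-compile source_text to a .pyc file and return its output."""
    with tempfile.TemporaryDirectory() as workdir:
        source_path = os.path.join(workdir, "pello.py")
        bytecode_path = os.path.join(workdir, "pello.pyc")
        with open(source_path, "w") as handle:
            handle.write(source_text)
        # cfile places the byte code next to the source; by default
        # Python 3 would write it under __pycache__ instead.
        py_compile.compile(source_path, cfile=bytecode_path, doraise=True)
        # Run the byte code with the same interpreter that compiled it.
        result = subprocess.run(
            [sys.executable, bytecode_path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


print(byte_compile_and_run('print("Hello World")\n'), end="")
```

Byte code is specific to the interpreter version that produced it, which is why the sketch runs the .pyc file with the same interpreter.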

4.3.2.2. Raw Interpreted Code

In this example, you will raw-interpret the bello program written in the bash shell built-in language.

bello

#!/bin/bash

printf "Hello World\n"

Programs written in shell scripting languages, like bash, are raw-interpreted. Hence, you only need to make the file with source code executable and run it:

$ chmod +x bello
$ ./bello
Hello World

4.4. Patching Software

A patch is source code that updates other source code. It is formatted as a diff, because it represents what is different between two versions of text. A diff is created using the diff utility and is then applied to the source code using the patch utility.

Note

Software developers often use Version Control Systems such as git to manage their code base. Such tools provide their own methods of creating diffs or patching software.

In the following example, we create a patch from the original source code using diff and then apply it using patch. Patching is used in a later section when creating an RPM, Section 5.1.7, “Working with SPEC files”.

How is patching related to RPM packaging? In packaging, instead of simply modifying the original source code, we keep it, and use patches on it.

To create a patch for cello.c:

  1. Preserve the original source code:

    $ cp cello.c cello.c.orig

    This is a common way to preserve the original source code file.

  2. Change cello.c:

    #include <stdio.h>
    
    int main(void) {
        printf("Hello World from my very first patch!\n");
        return 0;
    }

  3. Generate a patch using the diff utility:

    Note

    We use several common arguments for the diff utility. For more information on them, see the diff manual page.

    $ diff -Naur cello.c.orig cello.c
    --- cello.c.orig        2016-05-26 17:21:30.478523360 -0500
    +++ cello.c     2016-05-27 14:53:20.668588245 -0500
    @@ -1,6 +1,6 @@
     #include <stdio.h>
    
     int main(void) {
    -    printf("Hello World\n");
    +    printf("Hello World from my very first patch!\n");
         return 0;
     }
    \ No newline at end of file

    Lines starting with a - are removed from the original source code and replaced with the lines that start with +.

  4. Save the patch to a file:

    $ diff -Naur cello.c.orig cello.c > cello-output-first-patch.patch

  5. Restore the original cello.c:

    $ cp cello.c.orig cello.c

    We retain the original cello.c, because when an RPM is built, the original file is used, not a modified one. For more information, see Section 5.1.7, “Working with SPEC files”.
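The unified diff format produced in the steps above can also be generated programmatically. The following sketch uses Python's standard difflib module on the two versions of cello.c. It illustrates the format only and is not a replacement for the diff utility; unlike diff, it adds no timestamps to the header lines unless they are passed explicitly.

```python
import difflib

original = """#include <stdio.h>

int main(void) {
    printf("Hello World\\n");
    return 0;
}
"""

modified = original.replace(
    "Hello World", "Hello World from my very first patch!"
)

# unified_diff works on lists of lines that keep their line endings.
patch_lines = list(
    difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile="cello.c.orig",
        tofile="cello.c",
    )
)

print("".join(patch_lines), end="")
```

The printed hunk has one line starting with - for the old printf call and one line starting with + for the new one, surrounded by unchanged context lines.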

To patch cello.c using cello-output-first-patch.patch, redirect the patch file to the patch command:

$ patch < cello-output-first-patch.patch
patching file cello.c

The contents of cello.c now reflect the patch:

$ cat cello.c
#include <stdio.h>

int main(void) {
    printf("Hello World from my very first patch!\n");
    return 0;
}

To build and run the patched cello.c:

$ make clean
rm cello

$ make
gcc -g -o cello cello.c

$ ./cello
Hello World from my very first patch!

You have created a patch, patched a program, built the patched program, and run it.

4.5. Installing Arbitrary Artifacts

A big advantage of Linux and other Unix-like systems is the Filesystem Hierarchy Standard (FHS). It specifies in which directory which files should be located. Files installed from the RPM packages should be placed according to FHS. For example, an executable file should go into a directory that is in the system PATH variable.

In the context of this guide, an Arbitrary Artifact is anything installed from an RPM to the system. For RPM and for the system it can be a script, a binary compiled from the package’s source code, a pre-compiled binary, or any other file.

We will explore two popular ways of placing Arbitrary Artifacts in the system: using the install command and using the make install command.

4.5.1. Using the install command

Sometimes using build automation tooling such as GNU make is not optimal - for example, if the packaged program is simple and does not need extra overhead. In these cases, packagers often use the install command (provided to the system by coreutils), which places the artifact in the specified directory in the filesystem with a specified set of permissions.
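The core of what install does, copying a file and then setting its permission bits, can be approximated in a few lines of Python. This is a sketch under simplifying assumptions: the helper name install_file is hypothetical, and the real install command has further options, such as setting ownership or creating directories, that are not modeled here.

```python
import os
import shutil
import tempfile


def install_file(source, destination, mode=0o755):
    """Copy source to destination and set its permission bits."""
    # Copy the file contents only; the mode is set explicitly below.
    shutil.copyfile(source, destination)
    # Counterpart of the -m option: apply the requested mode.
    os.chmod(destination, mode)
    return destination


# Demonstration in a throwaway directory, so no root access is needed.
with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "bello")
    with open(script, "w") as handle:
        handle.write('#!/bin/bash\n\nprintf "Hello World\\n"\n')
    target = install_file(script, os.path.join(workdir, "bello.installed"))
    print(oct(os.stat(target).st_mode & 0o777))
```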

The example below uses the bello file created earlier as the arbitrary artifact to install. Note that you need either sudo permissions or a root shell; when running as root, omit the sudo portion of the command.

In this example, install places the bello file into /usr/bin with permissions common for executable scripts:

$ sudo install -m 0755 bello /usr/bin/bello

Now bello is in a directory that is listed in the $PATH variable. Therefore, you can execute bello from any directory without specifying its full path:

$ cd ~

$ bello
Hello World
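The lookup that makes this work can be sketched as a walk through the directories listed in PATH. The function name find_in_path below is hypothetical, and the sketch ignores details that real shells handle, such as command caching and empty PATH entries. Python's standard library offers shutil.which for the same purpose.

```python
import os


def find_in_path(command, path_value=None):
    """Return the first executable named command found in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            # Simplification: empty entries are skipped.
            continue
        candidate = os.path.join(directory, command)
        # A match must be a regular file that the user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints the full path if bello is installed, otherwise None.
print(find_in_path("bello"))
```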

4.5.2. Using the make install command

A popular automated way to install built software to the system is to use the make install command. It requires you to specify how to install the arbitrary artifacts to the system in the Makefile.

Note

Usually the Makefile is written by the developer and not by the packager.

Add the install section to the Makefile:

Makefile

cello:
        gcc -g -o cello cello.c

clean:
        rm cello

install:
        mkdir -p $(DESTDIR)/usr/bin
        install -m 0755 cello $(DESTDIR)/usr/bin/cello

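The $(DESTDIR) variable is not defined by GNU make itself. It is a widely followed convention, described in the GNU Coding Standards, for installing into a directory other than the root directory. When DESTDIR is not set, it expands to an empty string and the files are installed directly under /usr/bin.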
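The effect of the variable on the install path is plain text substitution, as the following Python sketch shows. The function name staged_path and the /tmp/stage directory are hypothetical and chosen only for illustration.

```python
def staged_path(destdir, install_path):
    """Mimic how $(DESTDIR)/usr/bin/cello expands in the Makefile."""
    # make performs plain text substitution, so the result is simply
    # the two strings joined together.
    return destdir + install_path


print(staged_path("", "/usr/bin/cello"))            # /usr/bin/cello
print(staged_path("/tmp/stage", "/usr/bin/cello"))  # /tmp/stage/usr/bin/cello
```

On the command line, the second case corresponds to running make install DESTDIR=/tmp/stage.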

Now you can use the Makefile not only to build software, but also to install it to the target system.

To build and install the cello.c program:

$ make
gcc -g -o cello cello.c

$ sudo make install
mkdir -p /usr/bin
install -m 0755 cello /usr/bin/cello

Now cello is in a directory that is listed in the $PATH variable. Therefore, you can execute cello from any directory without specifying its full path:

$ cd ~

$ cello
Hello World

You have installed a build artifact into a chosen location on the system.

4.6. Preparing Source Code for Packaging

Note

The code created in this section can be found here.

Developers often distribute software as compressed archives of source code, which are then used to create packages. In this section, you will create such compressed archives.

Note

Creating source code archives is not normally done by the RPM Packager, but by the developer. The packager works with a ready source code archive.

Software should be distributed with a software license. For the examples, we will use the GPLv3 license. The license text goes into the LICENSE file for each of the example programs. An RPM packager needs to deal with license files when packaging.

For use with the following examples, create the /tmp/LICENSE file with the following content:

$ cat /tmp/LICENSE
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

4.7. Putting Source Code into a Tarball

In the examples below, we put each of the three Hello World programs into a gzip-compressed tarball. Software is often released this way to be later packaged for distribution.

4.7.1. bello

The bello project implements Hello World in bash. The implementation only contains the bello shell script, so the resulting tar.gz archive will have only one file apart from the LICENSE file. Let us assume that this is version 0.1 of the program.

Prepare the bello project for distribution:

  1. Put the files into a single directory:

    $ mkdir /tmp/bello-0.1
    
    $ mv ~/bello /tmp/bello-0.1/
    
    $ cp /tmp/LICENSE /tmp/bello-0.1/

  2. Create the archive for distribution and move it to ~/rpmbuild/SOURCES/:

    $ cd /tmp/
    
    $ tar -cvzf bello-0.1.tar.gz bello-0.1
    bello-0.1/
    bello-0.1/LICENSE
    bello-0.1/bello
    
    $ mv /tmp/bello-0.1.tar.gz ~/rpmbuild/SOURCES/
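The same archive layout, a single top-level name-version directory inside a gzip-compressed tarball, can be produced with Python's standard tarfile module. The sketch below uses placeholder files in a temporary directory rather than the real bello sources, and the function name make_source_tarball is hypothetical.

```python
import os
import tarfile
import tempfile


def make_source_tarball(project_dir, output_dir):
    """Pack project_dir into output_dir/<dirname>.tar.gz."""
    top_level = os.path.basename(os.path.normpath(project_dir))
    archive_path = os.path.join(output_dir, top_level + ".tar.gz")
    # "w:gz" writes a gzip-compressed tar archive.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the top-level directory name in the
        # archive instead of the full path on disk.
        archive.add(project_dir, arcname=top_level)
    return archive_path


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    project = os.path.join(workdir, "bello-0.1")
    os.mkdir(project)
    for filename in ("bello", "LICENSE"):
        with open(os.path.join(project, filename), "w") as handle:
            handle.write("placeholder\n")
    tarball = make_source_tarball(project, workdir)
    with tarfile.open(tarball) as archive:
        print(sorted(archive.getnames()))
```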

4.7.2. pello

The pello project implements Hello World in Python. The implementation only contains the pello.py program, so the resulting tar.gz archive will have only one file apart from the LICENSE file. Let us assume that this is version 0.1.1 of the program.

Prepare the pello project for distribution:

  1. Put the files into a single directory:

    $ mkdir /tmp/pello-0.1.1
    
    $ mv ~/pello.py /tmp/pello-0.1.1/
    
    $ cp /tmp/LICENSE /tmp/pello-0.1.1/

  2. Create the archive for distribution and move it to ~/rpmbuild/SOURCES/:

    $ cd /tmp/
    
    $ tar -cvzf pello-0.1.1.tar.gz pello-0.1.1
    pello-0.1.1/
    pello-0.1.1/LICENSE
    pello-0.1.1/pello.py
    
    $ mv /tmp/pello-0.1.1.tar.gz ~/rpmbuild/SOURCES/

4.7.3. cello

The cello project implements Hello World in C. The implementation only contains the cello.c and Makefile files, so the resulting tar.gz archive will have only two files apart from the LICENSE file. Let us assume that this is version 1.0 of the program.

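Note that the patch file is not distributed in the archive with the program. The RPM Packager applies the patch when the RPM is built. The patch will be placed in the ~/rpmbuild/SOURCES/ directory alongside the .tar.gz. Because the archive must contain the unpatched source, restore cello.c from cello.c.orig first if it is still patched from Section 4.4.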

Prepare the cello project for distribution:

  1. Put the files into a single directory:

    $ mkdir /tmp/cello-1.0
    
    $ mv ~/cello.c /tmp/cello-1.0/
    
    $ mv ~/Makefile /tmp/cello-1.0/
    
    $ cp /tmp/LICENSE /tmp/cello-1.0/

  2. Create the archive for distribution and move it to ~/rpmbuild/SOURCES/:

    $ cd /tmp/
    
    $ tar -cvzf cello-1.0.tar.gz cello-1.0
    cello-1.0/
    cello-1.0/Makefile
    cello-1.0/cello.c
    cello-1.0/LICENSE
    
    $ mv /tmp/cello-1.0.tar.gz ~/rpmbuild/SOURCES/

  3. Add the patch:

    $ mv ~/cello-output-first-patch.patch ~/rpmbuild/SOURCES/

Now the source code is ready for packaging into an RPM.