Chapter 4. Preparing Software for Packaging
This chapter is about source code and creating software, which are a necessary background for an RPM Packager.
4.1. What is Source Code?
Source code is human-readable instructions to the computer, which describe how to perform a computation. Source code is expressed using a programming language .
This tutorial features three versions of the Hello World program, each written in a different programming language. Programs written in these three different languages are packaged differently, and cover three major use cases of an RPM packager.
There are thousands of programming languages. This document features only three of them, but they are enough for a conceptual overview.
Hello World written in bash:
bello
#!/bin/bash printf "Hello World\n"
Hello World written in Python:
pello.py
#!/usr/bin/env python
print("Hello World")
Hello World written in C :
cello.c
#include <stdio.h>
int main(void) {
printf("Hello World\n");
return 0;
}
The purpose of every one of the three programs is to output Hello World on the command line.
Knowing how to program is not necessary for a software packager, but is helpful.
4.2. How Programs Are Made
There are many methods by which human-readable source code becomes machine code - instructions the computer follows to actually execute the program. However, all methods can be reduced to these three:
- The program is natively compiled.
- The program is interpreted by raw interpreting.
- The program is interpreted by byte compiling.
4.2.1. Natively Compiled Code
Natively compiled software is software written in a programming language that compiles to machine code, with a resulting binary executable file. Such software can be run stand-alone.
RPM packages built this way are architecture -specific. This means that if you compile such software on a computer that uses a 64-bit (x86_64) AMD or Intel processor, it will not execute on a 32-bit (x86) AMD or Intel processor. The resulting package will have architecture specified in its name.
4.2.2. Interpreted Code
Some programming languages, such as bash or Python, do not compile to machine code. Instead, their programs' source code is executed step by step, without prior transformations, by a Language Interpreter or a Language Virtual Machine.
Software written entirely in interpreted programming languages is not architecture -specific. Hence, the resulting RPM Package will have string noarch in its name.
Interpreted languages are either byte-compiled or raw-interpreted. These two types differ in program build process and in packaging procedure.
4.2.2.1. Raw-interpreted programs
Raw-interpreted language programs do not need to be compiled at all, they are directly executed by the interpreter.
4.2.2.2. Byte-compiled programs
Byte-compiled languages need to be compiled into byte code, which is then executed by the language virtual machine.
Some languages give a choice: they can be raw-interpreted or byte-compiled.
4.3. Building Software from Source
This section explains building software from its source code.
- For software written in compiled languages, the source code goes through a build process, producing machine code. This process, commonly called compiling or translating, varies for different languages. The resulting built software can be run or "executed", which makes computer perform the task specified by the programmer.
- For software written in raw interpreted languages, the source code is not built, but executed directly.
- For software written in byte-compiled interpreted languages, the source code is compiled into byte code, which is then executed by the language virtual machine.
4.3.1. Natively Compiled Code
In this example, you will build the cello.c program written in the C language into an executable.
cello.c
#include <stdio.h>
int main(void) {
printf("Hello World\n");
return 0;
}4.3.1.1. Manual Building
Invoke the C compiler from the GNU Compiler Collection (GCC) to compile the source code into binary:
gcc -g -o cello cello.c
Execute the resulting output binary cello.
$ ./cello Hello World
That is all. You have built and ran natively compiled software from source code.
4.3.1.2. Automated Building
Instead of building the source code manually, you can automate the building. This is a common practice used by large-scale software. Automating building is done by creating a Makefile and then running the GNU make utility.
To set up automated building, create a file named Makefile in the same directory as cello.c:
Makefile
cello:
gcc -g -o cello cello.c
clean:
rm cello
Now to build the software, simply run make:
$ make make: 'cello' is up to date.
Since there is already a build present, make clean it and run make again:
$ make clean rm cello $ make gcc -g -o cello cello.c
Again, trying to build after another build would do nothing:
$ make make: 'cello' is up to date.
Finally, execute the program:
$ ./cello Hello World
You have now compiled a program both manually and using a build tool.
4.3.2. Interpreted Code
The next two examples showcase byte-compiling a program written in Python and raw-interpreting a program written in bash.
In the two examples below, the #! line at the top of the file is known as a shebang and is not part of the programming language source code.
The shebang enables using a text file as an executable: the system program loader parses the line containing the shebang to get a path to the binary executable, which is then used as the programming language interpreter.
4.3.2.1. Byte-Compiled Code
In this example, you will compile the pello.py program written in Python into byte code, which is then executed by the Python language virtual machine. Python source code can also be raw-interpreted, but the byte-compiled version is faster. Hence, RPM Packagers prefer to package the byte-compiled version for distribution to end users.
pello.py
#!/usr/bin/env python
print("Hello World")Procedure for byte-compiling programs is different for different languages. It depends on the language, the language’s virtual machine, and the tools and processes used with that language.
Python is often byte-compiled, but not in the way described here. The following procedure aims not to conform to the community standards, but to be simple. For real-world Python guidelines, see Software Packaging and Distribution.
Byte-compile pello.py:
$ python -m compileall pello.py $ file pello.pyc pello.pyc: python 2.7 byte-compiled
Execute the byte code in pello.pyc:
$ python pello.pyc Hello World
4.3.2.2. Raw Interpreted Code
In this example, you will raw-interpret the bello program written in the bash shell built-in language.
bello
#!/bin/bash printf "Hello World\n"
Programs written in shell scripting languages, like bash, are raw-interpreted. Hence, you only need to make the file with source code executable and run it:
$ chmod +x bello $ ./bello Hello World
4.4. Patching Software
A patch is source code that updates other source code. It is formatted as a diff, because it represents what is different between two versions of text. A diff is created using the diff utility, which is then applied to the source code using the patch utility.
Software developers often use Version Control Systems such as git to manage their code base. Such tools provide their own methods of creating diffs or patching software.
In the following example, we create a patch from the originial source code using diff and then apply it using patch. Patching is used in a later section when creating an RPM, Section 5.1.7, “Working with SPEC files”.
How is patching related to RPM packaging? In packaging, instead of simply modifying the original source code, we keep it, and use patches on it.
To create a patch for cello.c:
Preserve the original source code:
$ cp cello.c cello.c.orig
This is a common way to preserve the original source code file.
Change
cello.c:#include <stdio.h> int main(void) { printf("Hello World from my very first patch!\n"); return 0; }Generate a patch using the
diffutility:NoteWe use several common arguments for the
diffutility. For more information on them, see thediffmanual page.$ diff -Naur cello.c.orig cello.c --- cello.c.orig 2016-05-26 17:21:30.478523360 -0500 +++ cello.c 2016-05-27 14:53:20.668588245 -0500 @@ -1,6 +1,6 @@ #include<stdio.h> int main(void){ - printf("Hello World!\n"); + printf("Hello World from my very first patch!\n"); return 0; } \ No newline at end of fileLines starting with a
-are removed from the original source code and replaced with the lines that start with+.Save the patch to a file:
$ diff -Naur cello.c.orig cello.c > cello-output-first-patch.patch
Restore the original
cello.c:$ cp cello.c.orig cello.c
We retain the original
cello.c, because when an RPM is built, the original file is used, not a modified one. For more information, see Section 5.1.7, “Working with SPEC files”.
To patch cello.c using cello-output-first-patch.patch, redirect the patch file to the patch command:
$ patch < cello-output-first-patch.patch patching file cello.c
The contents of cello.c now reflect the patch:
$ cat cello.c
#include<stdio.h>
int main(void){
printf("Hello World from my very first patch!\n");
return 1;
}
To build and run the patched cello.c:
$ make clean rm cello $ make gcc -g -o cello cello.c $ ./cello Hello World from my very first patch!
You have created a patch, patched a program, built the patched program, and run it.
4.5. Installing Arbitrary Artifacts
A big advantage of Linux and other Unix-like systems is the Filesystem Hierarchy Standard (FHS). It specifies in which directory which files should be located. Files installed from the RPM packages should be placed according to FHS. For example, an executable file should go into a directory that is in the system PATH variable.
In the context of this guide, an Arbitrary Artifact is anything installed from an RPM to the system. For RPM and for the system it can be a script, a binary compiled from the package’s source code, a pre-compiled binary, or any other file.
We will explore two popular ways of placing Arbitrary Artifacts in the system: using the install command and using the make install command.
4.5.1. Using the install command
Sometimes using build automation tooling such as GNU make is not optimal - for example, if the packaged program is simple and does not need extra overhead. In these cases, packagers often use the install command (provided to the system by coreutils), which places the artifact to the specified directory in the filesystem with a specified set of permissions.
The example below is going to use the bello file that we had previously created as the arbitrary artifact subject to our installation method. Note that you will either need sudo permissions or run this command as root excluding the sudo portion of the command.
In this example, install places the bello file into /usr/bin with permissions common for executable scripts:
$ sudo install -m 0755 bello /usr/bin/bello
Now bello is in a directory that is listed in the $PATH variable. Therefore, you can execute bello from any directory without specifying its full path:
$ cd ~ $ bello Hello World
4.5.2. Using the make install command
A popular automated way to install built software to the system is to use the make install command. It requires you to specify how to install the arbitrary artifacts to the system in the Makefile.
Usually Makefile is written by the developer and not by the packager.
Add the install section to the Makefile:
Makefile
cello:
gcc -g -o cello cello.c
clean:
rm cello
install:
mkdir -p $(DESTDIR)/usr/bin
install -m 0755 cello $(DESTDIR)/usr/bin/celloThe $(DESTDIR) variable is a GNU make built-in and is commonly used to specify installation to a directory different than the root directory.
Now you can use Makefile not only to build software, but also to install it to the target system.
To build and install the cello.c program:
$ make gcc -g -o cello cello.c $ sudo make install install -m 0755 cello /usr/bin/cello
Now cello is in a directory that is listed in the $PATH variable. Therefore, you can execute cello from any directory without specifying its full path:
$ cd ~ $ cello Hello World
You have installed a build artifact into a chosen location on the system.
4.6. Preparing Source Code for Packaging
The code created in this section can be found here.
Developers often distribute software as compressed archives of source code, which are then used to create packages. In this section, you will create such compressed archives.
Creating source code archives is not normally done by the RPM Packager, but by the developer. The packager works with a ready source code archive.
Software should be distributed with a software license . For the examples, we will use the GPLv3 license. The license text goes into the LICENSE file for each of the example programs. An RPM packager needs to deal with license files when packaging.
For use with the following examples, create a LICENSE file:
$ cat /tmp/LICENSE This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
4.7. Putting Source Code into Tarball
In the examples below, we put each of the three Hello World programs into a gzip-compressed tarball. Software is often released this way to be later packaged for distribution.
4.7.1. bello
The bello project implements Hello World in bash. The implementation only contains the bello shell script, so the resulting tar.gz archive will have only one file apart from the LICENSE file. Let us assume that this is version 0.1 of the program.
Prepare the bello project for distribution:
Put the files into a single directory:
$ mkdir /tmp/bello-0.1 $ mv ~/bello /tmp/bello-0.1/ $ cp /tmp/LICENSE /tmp/bello-0.1/
Create the archive for distribution and move it to
~/rpmbuild/SOURCES/:$ cd /tmp/ $ tar -cvzf bello-0.1.tar.gz bello-0.1 bello-0.1/ bello-0.1/LICENSE bello-0.1/bello $ mv /tmp/bello-0.1.tar.gz ~/rpmbuild/SOURCES/
4.7.2. pello
The pello project implements Hello World in Python. The implementation only contains the pello.py program, so the resulting tar.gz archive will have only one file apart from the LICENSE file. Let us assume that this is version 0.1.1 of the program.
Prepare the pello project for distribution:
Put the files into a single directory:
$ mkdir /tmp/pello-0.1.1 $ mv ~/pello.py /tmp/pello-0.1.1/ $ cp /tmp/LICENSE /tmp/pello-0.1.1/
Create the archive for distribution and move it to
~/rpmbuild/SOURCES/:$ cd /tmp/ $ tar -cvzf pello-0.1.1.tar.gz pello-0.1.1 pello-0.1.1/ pello-0.1.1/LICENSE pello-0.1.1/pello.py $ mv /tmp/pello-0.1.1.tar.gz ~/rpmbuild/SOURCES/
4.7.3. cello
The cello project implements Hello World in C . The implementation only contains the cello.c and Makefile files, so the resulting tar.gz archive will have only two files apart from the LICENSE file. Let us assume that this is version 1.0 of the program.
Note that the patch file is not distributed in the archive with the program. The RPM Packager applies the patch when the RPM is built. The patch will be placed in the ~/rpmbuild/SOURCES/ directory alongside the .tar.gz.
Prepare the cello project for distribution:
Put the files into a single directory:
$ mkdir /tmp/cello-1.0 $ mv ~/cello.c /tmp/cello-1.0/ $ mv ~/Makefile /tmp/cello-1.0/ $ cp /tmp/LICENSE /tmp/cello-1.0/
Create the archive for distribution and move it to
~/rpmbuild/SOURCES/:$ cd /tmp/ $ tar -cvzf cello-1.0.tar.gz cello-1.0 cello-1.0/ cello-1.0/Makefile cello-1.0/cello.c cello-1.0/LICENSE $ mv /tmp/cello-1.0.tar.gz ~/rpmbuild/SOURCES/
Add the patch:
$ mv ~/cello-output-first-patch.patch ~/rpmbuild/SOURCES/
Now the source code is ready for packaging into an RPM.

Where did the comment section go?
Red Hat's documentation publication system recently went through an upgrade to enable speedier, more mobile-friendly content. We decided to re-evaluate our commenting platform to ensure that it meets your expectations and serves as an optimal feedback mechanism. During this redesign, we invite your input on providing feedback on Red Hat documentation via the discussion platform.