Alchemist.Digital

// Jeffrey Reeves

Part 4 – Managing Files

October 14, 2017

Managing Files

Exam Objectives

  • 103.3 – Perform basic file management
  • 104.4 – Manage disk quotas
  • 104.5 – Manage file permissions
  • 104.6 – Create and change hard and symbolic links
  • 104.7 – Find system files and place files in the correct location

Using File Management Commands

Unix-like systems, particularly Linux, treat almost everything as a file, including most hardware devices and various specialized interfaces.

Naming Files

Linux filenames can contain uppercase letters, lowercase letters, numbers, and most punctuation and control characters.

Filenames in Linux are case sensitive. 

To avoid confusion, it is recommended to restrict any non-alphanumeric symbols in filenames to the period (.), the hyphen (-), and the underscore (_).

Some programs create backup files that end in a tilde (~).

Note: While filenames can contain spaces, they must be escaped on the command line with backslashes or by enclosing the entire filename in quotes (ex. “my picture.png” or my\ picture.png).
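The escaping rules above can be tried out with a short sketch like the following. The temporary directory and the variable names (demo_dir, line_count) are assumptions made for this illustration only:

```shell
# Work in a throwaway directory (created here only for the demonstration).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Three equivalent ways to refer to the same file with a space in its name.
touch "my picture.png"
echo "first"  >> my\ picture.png
echo "second" >> 'my picture.png'

# Unquoted, the shell would split the name into two arguments instead:
# touch my picture.png   # would create "my" and "picture.png"

line_count=$(wc -l < "my picture.png")
echo "lines written: $((line_count))"
```

All three forms reach the same file, so only one directory entry exists afterwards.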

A few characters have special meaning and should never be used in filenames:

  • asterisk (*)
  • question mark (?)
  • forward slash (/) — Note: cannot be used even with escaping.
  • backslash (\)
  • quotation mark (")

The filename length depends on the filesystem in use. On ext2fs, ext3fs, ext4fs, XFS, Btrfs, and others, the limit is 255 characters in length.

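Note: A single ASCII character requires 1 byte of storage, so the limit is often listed in bytes instead of characters: a 255-character limit is often stated as a 255-byte limit. Characters outside ASCII can take more than 1 byte each in encodings such as UTF-8, which reduces the number of characters that fit.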

Filenames beginning with a dot, often called “dot files”, are hidden files. They are generally used in a home directory for storing configuration files.

Note: If a FAT (File Allocation Table) filesystem is accessed using the msdos filesystem type code it limits filenames to 8.3 style filenames. If the vfat filesystem type code is used instead, Windows-style long filenames can be used. The umsdos filesystem type code for Linux-style long filenames stopped being supported after the 2.6.11 kernel.

A filename that consists of a single dot (.) refers to the current directory. A filename consisting of two dots (..) refers to the parent directory.

Wildcard Expansion Rules

Wildcards can be used with many commands.

A wildcard is a symbol or set of symbols that stands for other characters.

Three classes of wildcards are common in Linux:

Wildcard | Matches | Example
? | Any single character | b??k would match book, balk, buck, etc.
* | Any character or set of characters (including none) | b*k would match book, buck, bk, backtrack, etc.
[] | Any single character from the set or range inside the square brackets | b[ae]k would match bak and bek; b[a-z]k would match bak, bbk, bck, …, byk, and bzk

Wildcards are implemented in the shell, and are passed to the command called with them. For example, calling ls b??k would be the same as typing ls balk book buck (assuming those three files existed in the current directory). 

Expanding wildcards to form filenames is known as file globbing, filename expansion, or simply globbing.
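Because the shell performs the expansion, any command can be used to see the result. A minimal sketch, assuming a scratch directory populated with made-up filenames (demo_dir and the *_matches variables are names chosen for this example):

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch book balk buck bk backtrack bak bek

# The shell expands each pattern before the command runs; echo simply
# prints the argument list it was handed.
q_matches=$(echo b??k)
star_matches=$(echo b*k)
set_matches=$(echo b[ae]k)

echo "b??k   -> $q_matches"
echo "b*k    -> $star_matches"
echo "b[ae]k -> $set_matches"
```

Here b??k only matches the four-character names, b*k matches every name that starts with b and ends with k, and b[ae]k matches bak and bek.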

Understanding File Commands

The ls Command

The ls command is short for “list”, and it displays the names of files within a directory:

ls [options] [files]

If the files argument is omitted, ls displays the contents of the current directory.

File globbing is often used with this command:
$ ls *.txt
myfile.txt backup.txt linux_notes.txt 1995.txt

By default, ls creates a listing that is sorted alphabetically by filename.

Note: In the past, uppercase filenames appeared before lowercase filenames. However, recent versions of ls sort in a case-insensitive manner in most locales.

One of the most common ls options is -l, which outputs a “long listing” that includes the permissions, ownership, file size, last modification time, etc. in addition to the filenames.

The most commonly used options for ls:

Feature | Option | Description
Display All Files | -a, --all | Displays all files, including hidden dot files.
Long Listing | -l | Displays file permissions, ownership, size, last modification time, etc.
Display File Type | -F, --classify | Appends an indicator code to the end of the filename to identify its file type (the codes are listed after this table).
Color Listing | --color | Produces color-coded output to differentiate directories, symbolic links, files, etc. Note: No standardization exists for the colors used.
Display Directory Names | -d, --directory | Lists only a directory’s name. Sometimes useful when file globbing for directories, as the files included in these directories are not displayed in the output.
Recursive Listing | -R, --recursive | Displays the contents of a directory and the contents of all subdirectories within it.

Indicator codes appended by -F:

  • / Directory
  • * Executable
  • | Named pipe
  • = Socket
  • @ Symbolic link

Note: The ls command supports combining multiple options. For example, ls -al instead of ls -a -l.

The cp Command

The cp command copies a file:

cp [options] <source> <destination>

The source argument can be one or more files.

The destination argument can be a directory when the source is one or more files.

When copying to a directory, cp preserves the original filenames in the source argument, unless new filenames are manually specified in the destination argument.

Note: Placing a forward slash (/) at the end of a destination directory is recommended, as this avoids accidentally copying a file to a new filename when the directory name is mistyped or the directory doesn’t exist (ex. cp myfile.txt tmp/ fails with an error if the tmp/ directory doesn’t exist, whereas cp myfile.txt tmp would silently create a regular file named tmp).

The most commonly used options for cp:

Feature | Option | Description
Force Overwrite | -f, --force | Forces the system to overwrite any existing files without prompting.
Use Interactive Mode | -i, --interactive | Causes cp to ask before overwriting any existing files.
Preserve Ownership and Permissions | -p, --preserve | Preserves the ownership and permissions of the source file(s), instead of creating the copy with the ownership and permissions of the user that ran the cp command.
Recursive Copy | -R, --recursive | Copies an entire directory and its subdirectories when the source argument is a directory. Note: Although -r also performs a recursive copy, its behavior with anything other than ordinary files and directories is unspecified.
Perform an Archive Copy | -a, --archive | Recursively copies files and links while preserving ownership. Unlike -R, -a will copy symbolic links themselves instead of the files that the symbolic links were pointing to.
Perform an Update Copy | -u, --update | Copies only if the original is newer than the target, or if the target doesn’t exist.
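As a rough illustration of an archive copy and of the trailing-slash habit, the sketch below works in a scratch directory. All file and directory names in it (src, backup, notes.txt, latest, tmp) and the variables demo_dir and slash_result are assumptions for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p src/sub backup
echo "hello" > src/sub/notes.txt
ln -s notes.txt src/sub/latest

# Archive copy: recurses into src/ and keeps the symbolic link as a link.
cp -a src backup/

# Trailing slash on a directory that does not exist: cp reports an error
# instead of silently creating a regular file named "tmp".
if cp src/sub/notes.txt tmp/ 2>/dev/null; then
    slash_result="copied"
else
    slash_result="refused"
fi
echo "copy to missing tmp/: $slash_result"
```

After the archive copy, backup/src/sub/latest is still a symbolic link to notes.txt rather than a second copy of the file.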

The mv Command

The mv command is short for “move”, and is used for both moving files/directories and renaming them:

mv [options] <source> <destination>

The options for mv are the same as cp, except for --preserve, --recursive, and --archive.

Note: If a move occurs on one low-level filesystem, Linux completes the action very quickly by rewriting directory entries; the file’s data is not read or rewritten. However, if the target directory is on another partition or disk, Linux must take more time to read the original file, rewrite it to the new location, and delete the original.

The rm Command

The rm command stands for “remove”, and is used to delete a file or directory:

rm [options] <files>

All the options for cp apply, with the exception of --preserve, --archive, and --update.

Note: On rm the -r option is synonymous with -R, unlike with the cp command.

The touch Command

Linux-native filesystems maintain three time stamps for every file:

  • Last file-modification time
  • Last inode change time
  • Last access time

Various programs rely on these time stamps. For example, the make utility uses the time stamps to determine which source-code files must be recompiled if an object file already exists when compiling a program from source code.

The touch command can be used to modify these time stamps:

touch [options] <files>

By default, touch sets the modification and access times to the current time. 

If a file passed to touch doesn’t yet exist, it will create it.

The most commonly used options for touch:

Feature | Option | Description
Change Only the Access Time | -a, --time=atime | Changes the access time alone, not the modification time.
Change Only the Modification Time | -m, --time=mtime | Changes the modification time alone, not the access time.
Do Not Create File | -c, --no-create | Prevents touch from creating a file if it doesn’t exist.
Set the Time as Specified | -t <timestamp>, -r <reffile>, --reference=<reffile> | <timestamp> is a value in the format [[CC]YY]MMDDhhmm[.ss], where [CC]YY is the year (ex. 2012 or 12), MM is the month, DD is the day, hh is the hour (24-hour clock), mm is the minute, and ss is the seconds. <reffile> is a file whose time stamp will be replicated.
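The options above can be combined in a short sketch. It assumes the CCYYMMDDhhmm form of the -t timestamp accepted by GNU and POSIX touch, and the filenames and the demo_dir variable are invented for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a file stamped 14 Oct 2017, 12:00 using a -t timestamp.
touch -t 201710141200 old-file.txt

# Create a second file with the current time, then copy the first file's
# time stamps onto it with -r.
touch new-file.txt
touch -r old-file.txt new-file.txt

# -c must not create a file that does not already exist.
touch -c missing-file.txt

ls -l old-file.txt new-file.txt
```

Both files should now show the same 2017 modification time in the listing, and missing-file.txt should not exist.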

Archiving File Commands

A group of files can be collected into a single package file, called an archive.

Linux supports several archiving commands, the most prominent being tar and cpio.

The dd command, although not technically an archiving command, is similar because it can copy an entire partition or disk into a file and vice versa.

Note: The zip format is supported on Linux via the zip and unzip commands.

Using tar

The tar program is short for “tape archiver”, even though tapes are rarely used for storage these days:

tar [options] <archive> <files-to-archive>

Note: The leading hyphen (-) is optional for tar’s single-letter options, so tar cvf and tar -cvf are equivalent.

Options consist of both commands and qualifiers.

Only one command is used at a time, and one or more qualifiers may be used with it.

tar commands:

Command | Abbreviation | Description
--create | c | Creates an archive
--concatenate | A | Appends tar files to an archive
--append | r | Appends non-tar files to an archive
--update | u | Appends files that are newer than those in an archive
--diff, --compare | d | Compares an archive to files on disk
--list | t | Lists an archive’s contents
--extract, --get | x | Extracts files from an archive

tar qualifiers:

Qualifier | Abbrev. | Description
--directory <dir> | C | Changes to the <dir> directory before performing operations.
--file <[host:]file> | f | Uses the file called <file> on the machine called <host> as the archive file.
--listed-incremental <file> | g | Performs an incremental backup or restore, using <file> as a list of previously archived files.
--multi-volume | M | Creates or extracts a multi-volume archive.
--preserve-permissions | p | Preserves all protection information.
--absolute-names | P | Retains the leading / on filenames.
--verbose | v | Lists all files read or extracted. When used with --list, displays file sizes, ownership, and time stamps.
--verify | W | Verifies the archive after writing it.
--gzip, --ungzip | z | Compresses with gzip.
--bzip2 | j (older versions may use I or y) | Compresses with bzip2.
--xz | J | Compresses with xz.

The most commonly used commands are:

  • c — create archive
  • x — extract archive
  • t — list archive

The most useful qualifiers are:

  • g — perform incremental backup
  • p — keep permissions
  • z — use gzip compression
  • j — use bzip2 compression
  • J — use xz compression
  • v — verbose output
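A minimal sketch of the three most common commands (c, t, and x) combined with the z, f, and C qualifiers. The directory layout and the names demo_dir, project, restore, and listing are assumptions made for this example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p project/docs restore
echo "notes" > project/docs/readme.txt

# c = create, z = gzip compression, f = archive file name follows.
tar czf project.tar.gz project

# t = list the archive's contents.
listing=$(tar tzf project.tar.gz)
echo "$listing"

# x = extract; -C changes to restore/ before unpacking.
tar xzf project.tar.gz -C restore
cat restore/project/docs/readme.txt
```

Adding v to any of these option strings would print each file as it is processed.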

Using cpio

The cpio utility has three operating modes:

Mode | Option | Description
Copy-Out | -o, --create | Creates an archive and copies files into it.
Copy-In | -i, --extract | Extracts data from an existing archive. If a filename or pattern is provided as an argument, cpio extracts only files that match the pattern.
Copy-Pass | -p, --pass-through | Combines copy-out and copy-in modes.

Commonly used cpio options:

Option | Abbrev. | Description
--reset-access-time | -a | Resets the access time after reading a file so that it doesn’t appear to have been read.
--append | -A | Appends data to an existing archive.
--pattern-file=<filename> | -E <filename> | Uses the contents of <filename> as a list of files to be extracted in copy-in mode.
--file=<filename> | -F <filename> | Uses <filename> as the cpio archive file. If omitted, cpio uses standard input or output.
--format=<format> | -H <format> | Uses a specified format for the archive file. Common values for format: bin (old binary format), crc (newer binary format with a checksum), tar (format used by tar).
(none) | -I <filename> | Uses <filename> instead of standard input. Note: Unlike -F, this does not redirect output data.
--no-absolute-filenames | (none) | Extracts files relative to the current directory when in copy-in mode, even if filenames in the archive contain full directory paths.
(none) | -O <filename> | Uses <filename> instead of standard output. Note: Unlike -F, this does not redirect input data.
--list | -t | Displays a table of contents for the input.
--unconditional | -u | Replaces all files without first asking for verification.
--verbose | -v | Displays filenames as they are added or extracted. When combined with -t, additional listing information is provided (similar to ls -l).

To use cpio to archive a directory, a list of files must be passed using standard input (STDIN). One way to do this is by piping the STDOUT of the find utility into the cpio command:
$ find ./my-backups | cpio -o > ~/my-backup.cpio

The archive created above is uncompressed. To compress it, the output of cpio could instead be piped through gzip or another compression utility.

Using dd

The dd utility is a low-level copying program. It can be used to archive an entire filesystem at a very low level:

# dd if=<file> of=<file>

The dd utility can be used to create exact backups of entire partitions (including empty space).

If an empty file of a particular size is needed, the dd utility can do that as well:

$ dd if=/dev/zero of=empty-file.img bs=1024 count=720

The bs option sets the block size in bytes, and count sets the number of blocks to copy (i.e. the above example creates an empty file of 720 blocks x 1024 bytes = 737,280 bytes, or 720 KiB).
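The size arithmetic can be checked directly. This sketch repeats the command in a scratch directory and counts the bytes; demo_dir and size_bytes are names chosen for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# 720 blocks of 1024 bytes each, all zeros (dd's status report is discarded).
dd if=/dev/zero of=empty-file.img bs=1024 count=720 2>/dev/null

size_bytes=$(wc -c < empty-file.img)
echo "size: $((size_bytes)) bytes"
```

The reported size is bs multiplied by count.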

Managing Links

A link in Linux gives a file multiple identities, and is similar to shortcuts in Windows and aliases in Mac OS.

A few reasons Linux uses links:

  • Help make files more accessible.
  • Give commands multiple names.
  • Enable programs to access the same files when they look for the same files in different locations.

Two types of links exist:

  1. hard links
  2. symbolic links (soft links)

Hard links are made by creating two directory entries that point to the same inode / file.

Both filenames are equally valid and prominent; and neither is regarded as the “truer” filename over the other.

To delete the file, all hard links to the file must be deleted.

The filesystem must support hard links in order to use them; and all Linux-native filesystems support hard links. 

Because of the way hard links are created, they must exist on a single low-level filesystem (i.e. hard links cannot be created across multiple mounted filesystems).

Symbolic links (soft links) are a special file type.

A symbolic link is a separate file whose contents point to the linked-to file.

Because Linux knows how to access the linked-to file, accessing a symbolic link works just like accessing the original file in most respects.

Unlike hard links, symbolic links can point across low-level filesystems — as symbolic links are essentially files that contain filenames.

The lookup process for accessing the original file from the link consumes a tiny bit of time, so symbolic links are slower than hard links — but not enough that it would be noticed by anything other than very odd conditions or artificial tests.

The ln command creates both types of links:

ln [options] <source> <link>

<source> represents the original file, and <link> is the name of the link to be created.

The options for ln:

Feature | Option | Description
Prompt Before Changes | -i, --interactive | Prompts before replacing existing files and/or links.
Remove Target Files | -f, --force | Removes any existing links or files that have the target <link> name.
Create Directory Hard Links | -d, -F, --directory | Ordinarily, hard links cannot be created to directories. The root user can attempt to do so by passing one of these options. Note: In practice, this feature is unlikely to work because most filesystems don’t support it.
Create a Symbolic Link | -s, --symbolic | Creates a symbolic link instead of a hard link.

Note: By default, the ln command will create a hard link.

To determine if a file has any hard links to it, check the link count number within the output of ls -l:

$ ls -l /home/jeff/*.txt
-rwxr--r--. 3 jeff alchemists my-file.txt

In the above output, my-file.txt has 3 hard links to the same inode / file.

To get the exact inode number that a file is hard linked to, use the -i option with the ls command:

$ ls -i /home/jeff/*.txt
519205 my-file.txt

In the above example, my-file.txt’s inode number is 519205.

Unlike with hard links, symbolic links do not increase a file’s link count number.

Symbolic links can be identified with ls -l:

$ ls -l /home/jeff/links/
-rw-rw-r--. 1 jeff alchemists a-normal-file.txt
lrwxrwxrwx. 1 jeff alchemists a-link-file -> linked-file.txt

Symbolic links show both an l for the file type code, as well as an arrow -> to show where the symbolic link points to.
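The difference between the two link types can be seen in a short sketch. The filenames and the variables demo_dir and link_count are assumptions for this illustration:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
echo "data" > original.txt

ln original.txt hard-link.txt       # hard link: second name, same inode
ln -s original.txt soft-link.txt    # symbolic link: a file holding a path

# Field 2 of ls -l is the link count; the symbolic link does not raise it.
link_count=$(ls -l original.txt | awk '{print $2}')
echo "link count: $link_count"

# Both hard-linked names report the same inode number.
ls -i original.txt hard-link.txt

# Removing one name leaves the data reachable through the other, while
# the symbolic link now points at a name that no longer exists.
rm original.txt
cat hard-link.txt
```

After the rm, soft-link.txt is a dangling link, which is one practical reason to prefer hard links when both names must stay valid on the same filesystem.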

Understanding Directory Commands

Using mkdir

The mkdir command creates a new directory:

mkdir [options] <directory-names>

The options for mkdir:

Feature | Option | Description
Set Mode | -m <mode>, --mode=<mode> | Sets the permissions of a new directory to the desired octal values.
Create Parent Directories | -p, --parents | Creates all necessary parent directories for the new directory, if they do not already exist.

Using rmdir

The rmdir command removes directories:

rmdir [options] <directory-names>

Note: rmdir can only remove empty directories. If any files are present, it will generate an error.

The options for rmdir:

Feature | Option | Description
Ignore Failures on Non-Empty Directories | --ignore-fail-on-non-empty | Prevents error messages from being displayed when attempting to remove non-empty directories. Note: Use rm -R or rm -r to remove directories filled with files instead.
Delete Tree | -p, --parents | Deletes the directory and then each parent directory named in the path. Note: Directories must be empty.
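A small sketch of mkdir and rmdir working together, assuming a scratch directory and made-up names (demo_dir, a/b/c, mode, non_empty):

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# -p creates a/ and a/b/ on the way to a/b/c; -m sets the mode of the
# final directory (here rwx for the owner only).
mkdir -p -m 700 a/b/c
mode=$(ls -ld a/b/c | cut -c1-10)
echo "a/b/c mode: $mode"

# rmdir refuses to remove a directory that still has contents.
touch a/b/c/file.txt
if rmdir a/b/c 2>/dev/null; then non_empty="removed"; else non_empty="refused"; fi
echo "non-empty rmdir: $non_empty"

# Once the directory is empty, -p removes c, then b, then a.
rm a/b/c/file.txt
rmdir -p a/b/c
```

Note that -m applies only to the last directory created; the parents created by -p get permissions derived from the current umask.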

Managing File Ownership

Security for files is built on file ownership and file permissions.

Accessing File Ownership

Each file has an individual owner and a group that it is associated with.

$ ls -l example-dir/
-rwxrw-r--. 1 jeff alchemists my-file.txt

The owner of this file is jeff, and the group of this file is alchemists.

 

It is important to know that if an account is deleted, the account’s files do not get deleted.

Files left over from a removed user are known as “orphaned files”.

Because Linux uses numbers to represent users internally (rather than usernames), these numbers will be displayed in place of the username and group on orphaned files.

If a new user is later assigned the same number as a removed user, the new user gains access to the removed user’s orphaned files. As such, it is recommended to reassign ownership of orphaned files to an existing user, archive them, or delete them.

Changing a File’s Owner

The chown command can be used to change a file’s owner and group:

chown [options] [<new-owner>][:<new-group>] <filenames>

Note: Linux’s chown command accepts a dot (.) in place of a colon (:) to delimit the owner and group. However, the use of a dot has been deprecated, and the colon should be used instead.

Several options are available for the chown command, but the most likely to be used is -R / --recursive, which implements changes on an entire directory tree.

Note: Only the root user can use the chown command to change the ownership of files. However, ordinary users may use chown to change the group of files that they own, as long as the user belongs to the target group as well. 

Changing a File’s Group

The chgrp command only changes a file’s group:

chgrp [options] <newgroup> <filenames>

There are several options for chgrp but the most commonly used is -R / --recursive, which updates the group for an entire directory tree.

Note: Ordinary users can only change the group to one that they belong to.

Controlling Access to Files

Understanding Permissions and Permission Bits

File permissions are set using permission bits.

The permissions of a file are easily viewed with ls -l:

$ ls -l example-dir/
-rwxrw-r--. 1 jeff alchemists my-file.txt

The permission string is the first column of output (-rwxrw-r--).

The first character of the permission string represents the file type code:

Code | Represents | Description
- | Normal data file | Text, executable programs, graphics, compressed data, etc.
d | Directory | Contains filenames and pointers to disk inodes.
l | Symbolic link | Contains the name of another file or directory.
p | Named pipe | Enables two running Linux programs to communicate with each other. One opens the pipe for reading, and the other opens it for writing, enabling data to be transferred between the programs.
s | Socket | Similar to a named pipe, but permits network and bidirectional links.
b | Block device | A file that corresponds to a hardware device to and from which data is transferred in blocks of more than 1 byte. Disk devices (hard disks, flash drives, CDs, etc.) are common block devices.
c | Character device | Similar to a block device, but used for devices such as parallel ports, RS-232 serial ports, and audio devices.

The remaining nine characters in the permission string (rwxrw-r--) can be broken into three groups of 3 characters each. The first group represents the owner, the second group represents the group, and the third group represents everyone else (often referred to as “other”).

$ ls -l example-dir/
-rwxrw-r--. 1 jeff alchemists my-file.txt

In the above output, the owner of this file (jeff) has read, write, and execute permissions (rwx).
The group of this file (alchemists) has read and write permissions (rw).
All other users, who are not jeff and do not belong to the alchemists group, would have only read permissions (r).

Absence of a permission is denoted by a hyphen (-).

Execute permissions allow the file to be executed as a program.

Because the permission bits in a permission string are binary, they can also be expressed as a single 9-bit number. This number is usually expressed in octal (base 8) form because each octal digit represents exactly 3 bits, so one digit covers each of the owner, group, and other sets.

The read, write, and execute permissions correspond to these bits:

  • 4 – read permission
  • 2 – write permission
  • 1 – execute permission

Permission bits for a particular permission type are added together as a single digit. For example, read (4), write (2), and execute (1) permissions would be represented as 7.

An example of common octal codes for permissions:

Octal Code | Permission String | Description
777 | rwxrwxrwx | Read, write, and execute for all (owner, group, and other).
755 | rwxr-xr-x | Read and execute for all. Read, write, and execute for owner.
700 | rwx------ | Read, write, and execute only for owner. No permissions for group or other.
666 | rw-rw-rw- | Read and write for all. No execute permissions for any.
644 | rw-r--r-- | Read for all. Read and write for owner.
640 | rw-r----- | Read and write for owner. Read for group. No permissions for other.
600 | rw------- | Read and write for owner. No permissions for group or other.
400 | r-------- | Read for owner. No permissions for group or other.
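The addition of 4, 2, and 1 per permission set can be sketched as a small shell function. The name to_octal and its variables are hypothetical, written only for this illustration, and the function assumes a plain nine-character string without special bits:

```shell
# to_octal is a hypothetical helper: it converts a nine-character
# permission string such as rwxr-xr-x into its three-digit octal code.
to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        # Take the next three characters (owner, then group, then other).
        triplet=$(printf '%s' "$perms" | cut -c1-3)
        perms=$(printf '%s' "$perms" | cut -c4-)
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

to_octal rwxr-xr-x
to_octal rw-r-----
```

The two calls print 755 and 640, matching the corresponding rows of the table of common octal codes.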

Execute permissions are meaningless for most file types, such as device files. However, directories use the execute bit to allow their contents to be searched.

Note: symbolic links always have 777 (rwxrwxrwx) permissions, regardless of the permissions of the file it links to. Attempting to change the link’s permissions will alter the permissions of the file it links to instead.

The root user can read or write any file — even ones set to 000 permissions. However, the execute bit still needs to be set on a file to run it, even as root.

Understanding Special Permission Bits

Set User ID (SUID)

The set user ID (SUID) option is used in conjunction with executable files.

This tells Linux to run the program with the permissions of whoever owns the file, rather than with the permissions of the user who runs the program.

SUID programs are identified by an s in the owner’s execute bit position of the permission string (ex. rwsr-xr-x).

If the SUID bit is set but execute permissions are not, the permission string will show a capital S (ex. rwSr-xr-x). In this case, SUID will not function.

Set Group ID (SGID)

The set group ID (SGID) option is similar to SUID, but it sets the running program’s group to the file’s group.

If SGID is set, the permission string will show an s in the group’s execute bit (ex. rwxr-sr-x). 

If the execute bit is not set but SGID is, a capital S will show in the group’s execute bit (ex. rwxr-Sr-x) — making it benign.

Note: SGID is useful on directories. When the SGID bit is set on a directory, new files or subdirectories created in the original directory will inherit the group ownership of the directory, rather than being set to the user’s current default group.

Sticky Bit

The sticky bit is used to protect files from being deleted by those who do not own the files.

When the sticky bit is present on a directory, a file in that directory can be deleted or renamed only by the file’s owner, the directory’s owner, or root.

The sticky bit is identified as a t in the execute bit of the other/world (ex. rwxr-xr-t).
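How these bits appear in a permission string can be observed on scratch files. The sketch below uses four-digit octal modes with chmod, where a leading 4 sets SUID and a leading 1 sets the sticky bit; the file names and variables (demo_dir, tool, shared, with_exec, without_exec, sticky) are made up for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch tool
mkdir shared

chmod 4755 tool      # SUID plus rwxr-xr-x
with_exec=$(ls -l tool | cut -c1-10)

chmod 4655 tool      # SUID set, owner execute bit cleared
without_exec=$(ls -l tool | cut -c1-10)

chmod 1777 shared    # sticky bit on a world-writable directory
sticky=$(ls -ld shared | cut -c1-10)

echo "$with_exec  $without_exec  $sticky"
```

On a typical Linux filesystem this shows a lowercase s while the owner's execute bit is set, a capital S once it is cleared, and a t in the last position for the sticky directory.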

DAC, ACLs, and SELinux

The permissions covered so far fall under the discretionary access control (DAC) model.

Security professionals consider the DAC model insufficient for properly securing a Linux system.

An improved permission system is the access control list (ACL): a list of users or groups and the permissions they’re given.

Linux ACLs, like Linux owner, group, and other permissions, consist of three permission bits. Each bit is for read, write, and execute permissions. 

The setfacl command must be used to set an ACL, and the getfacl command must be used to display ACLs for a file.

An even better security approach is a model called mandatory access control (MAC), and its subcategory, role-based access control (RBAC). These models are implemented by the SELinux utility — available on many Linux distributions.

SELinux is a very complex utility. It implements RBAC security using the Bell-LaPadula model and either type enforcement or multi-level security. 

Changing a File’s Mode

A file’s permissions can be modified with the chmod command:

chmod [options] [mode[,mode...]] <filename>

The chmod command has similar options to chown and chgrp, including the -R / --recursive option to apply changes to an entire directory tree.

The mode of chmod can be specified in two basic forms:

  1. A three-digit octal number.
  2. A symbolic mode.

Octal Mode

The octal numbers are the same as described earlier, and could be set like so:

$ chmod 644 my-file.txt
$ ls -l my-file.txt
-rw-r--r--. 1 jeff alchemists my-file.txt

In addition to using the three-digit octal mode, a fourth digit can be prepended to set SUID, SGID, and/or sticky bit permissions:

  • 4 – SUID permission
  • 2 – SGID permission
  • 1 – Sticky bit permission

If this digit is omitted, Linux clears all three special permission bits (i.e. using chmod 644 is the same as chmod 0644).

Note: Setting SGID requires root privileges.

Symbolic Mode

The symbolic representation consists of three components:

  1. A code indicating the permission set to modify (i.e. the owner, the group, other, etc.).
  2. A symbol indicating whether to add, delete, or set the mode equal to a stated value.
  3. A code specifying what the permission should be.

The permission set codes used in symbolic mode:

Permission Set Code Description
u Owner / User
g Group
o Other / World
a All (owner, group, and other)

The change type codes used in symbolic mode:

Change Type Code Description
+ Add
- Remove
= Set equal to

The permission to modify codes used in symbolic mode:

Permission to Modify Code Description
r Read
w Write
x Execute
X Execute (only if the file is a directory or already has execute permissions)
s SUID or SGID
t Sticky bit
u Existing owner’s permissions
g Existing group permissions
o Existing other permissions

Examples of symbolic permissions used with chmod:

Command Permissions (Before) Permissions (After)
chmod a+x rw-r--r-- rwxr-xr-x
chmod ug=rw r-------- rw-rw----
chmod o-rwx rwxrwxr-x rwxrwx---
chmod g=u rw-r--r-- rw-rw-r--
chmod g-w,o-rw rw-rw-rw- rw-r-----
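The rows of the table can be reproduced on a scratch file. In this sketch, each line first sets the "before" permissions with an octal mode and then applies the symbolic change; the helper show and the variables demo_dir and r1 through r5 are names invented for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch my-file.txt

# Print only the nine permission characters of the file.
show() { ls -l my-file.txt | cut -c2-10; }

chmod 644 my-file.txt; chmod a+x my-file.txt;      r1=$(show)
chmod 400 my-file.txt; chmod ug=rw my-file.txt;    r2=$(show)
chmod 775 my-file.txt; chmod o-rwx my-file.txt;    r3=$(show)
chmod 644 my-file.txt; chmod g=u my-file.txt;      r4=$(show)
chmod 666 my-file.txt; chmod g-w,o-rw my-file.txt; r5=$(show)

printf '%s\n' "$r1" "$r2" "$r3" "$r4" "$r5"
```

Each printed line corresponds to the "after" column of the matching table row.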

A file’s owner and root are the only users that may adjust a file’s permissions.

Setting the Default Mode and Group

When a user creates a file, that file has default ownership and permissions.

The default owner is set to the user who created the file, and the default group is set to the user’s primary group.

Default permissions are configurable via the user mask, which is set by the umask command.

The umask command takes an octal value as input, which represents the bits to be removed from 777 permissions on directories and 666 permissions on files when they are created.

Examples of umask values and their effects:

umask Files Directories
000 666 (rw-rw-rw-) 777 (rwxrwxrwx)
002 664 (rw-rw-r--) 775 (rwxrwxr-x)
022 644 (rw-r--r--) 755 (rwxr-xr-x)
027 640 (rw-r-----) 750 (rwxr-x---)
077 600 (rw-------) 700 (rwx------)
277 400 (r--------) 500 (r-x------)

Note: The umask isn’t a simple subtraction from the values of 777 or 666; it is a bit-wise removal.
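The difference between subtraction and bit-wise removal shows up with a mask such as 033, which is not in the table above and is used here only as an illustration. The helper mask_mode is hypothetical; it clears the mask bits from a base mode using shell arithmetic:

```shell
# mask_mode is a hypothetical helper: it prints the base mode with every
# bit that is set in the mask cleared (base AND NOT mask), in octal.
mask_mode() {
    base=$1
    mask=$2
    # The leading 0 makes the shell read both values as octal numbers.
    printf '%03o\n' $(( 0$base & ~0$mask ))
}

mask_mode 666 027   # files with umask 027
mask_mode 777 027   # directories with umask 027
mask_mode 666 033   # naive octal subtraction would suggest 633
```

The last call prints 644 rather than 633: the execute bits named by the mask were never set in 666, so there is nothing to remove, and the write bits are simply cleared.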

Ordinary users can enter the umask command to change the permissions on new files they create.

The root user can modify the default settings for all users by modifying a system configuration file (typically /etc/profile). However, these settings can be overridden at several other points before a file is actually created, such as by a user’s own configuration files.

Most Linux distributions ship with a default umask of 002 or 022.

Entering the umask command without any arguments will display the current umask setting:

$ umask
0022

The first digit represents the octal code for SUID, SGID, and the sticky bit.

To change the umask, enter a four-digit octal code:

$ umask 0002
$ umask
0002

To express the umask symbolically, use the -S option:

$ umask -S
u=rwx,g=rwx,o=rx

The changes to umask are immediate. Any newly created files or directories will show the modification.
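A quick way to observe the effect is to change the umask in a subshell and inspect what gets created. This is a sketch under the assumption of an ordinary scratch directory without default ACLs; the names demo_dir, report.txt, reports, file_mode, and dir_mode are invented for the example:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Run in a subshell so the calling shell's umask is left untouched.
(
    umask 027
    touch report.txt
    mkdir reports
)

file_mode=$(ls -l report.txt | cut -c1-10)
dir_mode=$(ls -ld reports | cut -c1-10)
echo "$file_mode $dir_mode"
```

With a umask of 027, the new file comes out as 640 and the new directory as 750, in line with the table of umask values.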

Similar to umask, users can change their default group with newgrp:

newgrp <groupname>

To use the newgrp command, the user must be a member of the specified group.

Note: The -l option can be used with the newgrp command to reinitialize the environment as if the user had just logged back in.

Changing File Attributes

Some Linux-native filesystems support several attributes that can be adjusted with the chattr command:

chattr <change><attribute-code> <files>

The change symbols used with chattr:

Change Symbol Description
+ Add
- Remove
= Set equal to (overwrites any that already exist)

The attribute codes used with chattr:

Attribute               Option  Description
No Access Time Updates  A       Linux won’t update the access time stamp when you access a file. This can reduce disk input/output, which can be helpful for saving battery life on laptops.
Append Only             a       Disables write access to the file except for appending data. Useful as a security feature to prevent accidental or malicious changes to files that record data (such as log files).
Compressed              c       The kernel will automatically compress data written to the file and decompress it when it’s read back.
Immutable               i       Makes a file immutable. The file cannot be modified, deleted, or renamed, and links to it cannot be created.
Data Journaling         j       The kernel will journal all data written to the file. This improves recoverability of data written to the file after a system crash, but it can slow performance. Note: This flag has no effect on ext2 filesystems (because they do not use a journal).
Secure Deletion         s       When a file is deleted, the kernel zeros its data blocks, rather than simply marking its inode and blocks as available for recycling.
No Tail-Merging         t       Prevents small data pieces from being pushed into empty spaces near the end of blocks. This may be useful if non-kernel drivers will read the filesystem, such as those that are part of the Grand Unified Boot Loader (GRUB).
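As a rough illustration of the syntax, the sketch below tries to set and clear the A attribute on a scratch file. It assumes the chattr and lsattr commands from e2fsprogs are installed (lsattr is not covered above) and that the filesystem holding the scratch file supports attributes. If either assumption fails, it reports that instead of stopping.

```shell
# Scratch file for the experiment.
demo=$(mktemp)

# Try to add the "no access time updates" attribute. Attributes such as
# a and i normally need root, while A can usually be set by the file owner.
if chattr +A "$demo" 2>/dev/null; then
    lsattr "$demo"      # list the attributes now set on the file
    chattr -A "$demo"   # remove the attribute again
    chattr_status=supported
else
    chattr_status=unsupported
fi

echo "chattr on this filesystem: $chattr_status"
rm -f "$demo"
```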

Managing Disk Quotas

A single user can cause serious problems if they consume too much disk space.

To prevent these kinds of issues, Linux supports disk quotas.

Disk quotas limit how many files or how much disk space a single user may consume, and they are enforced by the OS.

The Linux quota system supports quotas for both individuals and Linux groups.

Enabling Quota Support

Quotas require support in the kernel for the filesystem being used and various user-space utilities.

The following filesystems support quotas:
ext2fs, ext3fs, ext4fs, ReiserFS, JFS, and XFS.

Note: Quota support is missing for some filesystems in early 2.6.x kernels.

The Quota Support kernel option must be explicitly enabled when recompiling a kernel. However, most distributions ship with this support enabled.

Two general quota support systems are available for Linux:

  1. Quota v1 support system — first available through the 2.4.x kernels.
  2. Quota v2 support system — added with the 2.6.x kernel series.

The rest of this section is on the quota v2 support system, although the v1 system works similarly.

In addition to the kernel support, additional support tools are required to use quotas. 

For the quota v2 system, the package of additional support tools is usually called quota and it installs a number of utilities, configuration files, startup scripts, etc.

Each filesystem on which quotas are desired must have its /etc/fstab entry edited, similarly to:

/dev/sda1 /home ext4 usrquota,grpquota 1 1

The line above activates both user and group quota support for the /dev/sda1 partition, which is mounted at /home. Other options may be added as well, depending upon preference.
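One way to confirm that an entry carries the quota options is to parse the fourth field of the file. The sketch below is illustrative only: has_mount_option is a hypothetical helper, and it reads a temporary sample file containing the line above rather than the real /etc/fstab.

```shell
# has_mount_option FSTAB MOUNT_POINT OPTION
# Succeeds if the fstab-style file lists OPTION for MOUNT_POINT.
has_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Sample file with the example entry (not the real /etc/fstab).
sample=$(mktemp)
printf '%s\n' '/dev/sda1 /home ext4 usrquota,grpquota 1 1' > "$sample"

if has_mount_option "$sample" /home usrquota; then
    echo "user quotas requested for /home"
fi

rm -f "$sample"
```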

Depending on the distribution, at this point it may be necessary to configure the quota package’s system startup scripts to run when the system boots. On SysV-based systems that provide chkconfig, this command would be:

chkconfig quota on

The startup script runs the quotaon command, which activates quota support.

Note: The root user can turn on quotas at any time by using the quotaon command manually. Likewise, the root user can turn off quotas at any time using the quotaoff command.

Rebooting the system is one way to make configuration file changes take effect. However, a better option is to run the start script for the quota tools and remount the filesystems that have quotas being enabled or disabled.

After relevant filesystems have been remounted, the quotacheck command can be used to survey the filesystems and build current disk usage records.

Two files must be built to enable quotas:

  1. aquota.user 
  2. aquota.group

To build these files:

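quotacheck -cug <mount-point>

In the above command, the -c option is for “create”, the -u option is for “users”, and the -g option is for “groups”. The <mount-point> argument names the filesystem to scan. The -a option can be used instead to check every mounted, quota-enabled filesystem.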

Note: The quotacheck command can also verify and update quota information on quota-enabled partitions. This command is normally run as part of the quota package’s startup script, but it may be desirable to run it periodically via a cron job.

An example of building the user quota file for the filesystem mounted at /home:

# quotacheck -cu /home
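The steps above (remount, build the quota files, then activate quotas) can be collected into one helper. This is only a sketch: enable_quotas and the RUN variable are hypothetical names, and by default the function prints the commands instead of running them, since the real commands need root and a filesystem mounted with the quota options.

```shell
# enable_quotas MOUNT_POINT
# Prints (or, with RUN set to an empty string, executes) the commands that
# enable user and group quotas on an already configured filesystem.
enable_quotas() {
    mount_point=$1
    run=${RUN-echo}   # unset RUN: dry run via echo; RUN="": run for real

    $run mount -o remount "$mount_point"   # pick up the new fstab options
    $run quotacheck -cug "$mount_point"    # build aquota.user and aquota.group
    $run quotaon "$mount_point"            # activate quota enforcement
}

# Dry run: show what would be executed for /home.
enable_quotas /home
```

Running it as root with RUN set to an empty string (RUN= enable_quotas /home) would execute the three commands instead of printing them.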

Setting Quotas for Users

Quotas can be set using the edquota command.

The edquota command uses a text editor. By default this is vi, but edquota will use whatever editor is set in the EDITOR environment variable.

The editor opens a temporary file that describes the specified user’s quotas.

When the editor is closed, edquota uses the temporary configuration file to write the quota information to low-level disk data structures — which in turn control the kernel’s quota mechanisms.

An example of editing jeff’s quotas would be:

edquota jeff

The configuration file would contain something such as:

Disk quotas for user jeff (uid 519):
Filesystem    blocks    soft    hard     inodes    soft     hard
/dev/sda1     97119     1023565 1023565  1059      0        0

This temporary configuration file shows the number of disk blocks and inodes in use.

Each file or symbolic link consumes a single inode, so the inode limits are essentially file number limits.

Disk blocks can vary in size depending on the filesystem and filesystem creation options.

The blocks and inodes columns display how many blocks or inodes the user is currently consuming. Changing these values manually has no effect.

The hard limits for blocks and inodes are the maximum numbers of blocks or inodes that the user may consume. The kernel will not permit a user to surpass these hard limits.

The soft limits are not as strict. Users may temporarily exceed a soft limit value, but the system issues warnings when they do.

Soft limits also have a grace period. If the user stays above a soft limit beyond the grace period, the kernel treats it like a hard limit and refuses to let the user allocate more blocks or create more files.

The soft limit grace period can be set using the -t option of edquota:

edquota -t

Note: Grace periods are set on a per-filesystem basis rather than a per-user basis.

Setting a limit to 0 disables that limit, so no quota is enforced for it.
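The interaction between soft limits, hard limits, and the grace period can be modeled with a few comparisons. The sketch below is a simplified illustration of the rules described above, not the kernel’s actual logic. The quota_state helper and its state names are hypothetical.

```shell
# quota_state USED SOFT HARD SECONDS_OVER_SOFT GRACE_SECONDS
# Prints one of:
#   ok       usage is within limits
#   warning  soft limit exceeded, grace period still running
#   blocked  new allocations would be refused
# A limit of 0 means that limit is disabled.
quota_state() {
    used=$1 soft=$2 hard=$3 over=$4 grace=$5

    if [ "$hard" -ne 0 ] && [ "$used" -ge "$hard" ]; then
        echo blocked
    elif [ "$soft" -ne 0 ] && [ "$used" -gt "$soft" ]; then
        if [ "$over" -gt "$grace" ]; then
            echo blocked    # grace period expired: soft limit acts as hard
        else
            echo warning
        fi
    else
        echo ok
    fi
}

# Block usage and limits taken from the edquota example for jeff,
# with an assumed grace period of 7 days (604800 seconds).
quota_state 97119 1023565 1023565 0 604800
```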

Quotas are adjusted independently for each filesystem, and separately for users and groups, using the edquota command.

To edit quotas for a group, use the -g option:

edquota -g <group-name>

To summarize the quota information for a filesystem, use the repquota command. To report on all filesystems, use the -a option with repquota.

The quota command is similar, but it reports on a single user or group. The -g option displays group quotas, the -l option omits NFS mounts, and the -q option limits the output to filesystems where usage is over quota.

Locating Files

The FHS

Linux’s placement of files is derived from over 40 years of Unix history.

The following sections describe the Linux directory layout standards, with an overview of where everything goes.

Comparing FSSTND with FHS

Unix has a long history, containing numerous splits and variants.

The first split was caused by the Berkeley Software Distribution (BSD), which was initially a set of patches and extensions to AT&T’s original Unix code.

To prevent fragmentation of the Linux community, the Filesystem Standard (FSSTND) was created in early 1994.

The FSSTND standardized several specific features:

  • Programs reside in /bin and /usr/bin.
  • Executable files shouldn’t reside in /etc.
  • Changeable files shouldn’t be placed in the /usr directory tree — allowing it to be mounted read-only.

There have been three major revisions of FSSTND:

  • 1.0
  • 1.1
  • 1.2

The Filesystem Hierarchy Standard (FHS) was then created in 1995 to extend FSSTND and overcome the limitations that were becoming apparent with it.

Note: The FHS comes in numbered versions, such as v2.3. Although it’s not updated often, it is wise to check for FHS modifications at http://www.pathname.com/fhs/.

One important distinction made by FHS is between shareable files and unshareable files.

Shareable files, such as user data files and program binary files, may be reasonably shared between machines.

Note: Typically files are shared through an NFS server.

Unshareable files contain system-specific information, such as configuration files.

A second important distinction made by FHS is between static files and variable files.

Static files do not normally change except through direct intervention by the system admin. 

Most program executables are examples of static files.

Variable files may be changed by users, automated scripts, servers, etc.

Users’ home directories and mail queues are made of variable files.

The FHS tries to isolate each directory into one cell of a 2×2 grid of shareable / unshareable vs static / variable:

          Shareable         Unshareable
Static    /usr, /opt        /etc, /boot
Variable  /home, /var/mail  /var/run, /var/lock

Some directories are mixed, but in those cases, the FHS tries to classify their subdirectories. For example, /var is variable but it contains both shareable and unshareable subdirectories.

Important Directories and Their Contents

The most common directories defined by the FHS and/or convention:

Directory Details
/ The root filesystem or root directory.

All other directories branch from this directory.

/boot Contains static and unshareable files related to the computer’s initial booting.

In this directory, GRUB or LILO configuration files can be found, along with other files necessary for the initial boot.

It is usually recommended to store /boot on its own partition.

/etc Contains unshareable and static system configuration files.

These higher-level startup and configuration files control the various programs and services on the system.

Systemd configuration files are stored in /etc/systemd, and SysV startup scripts are typically stored in /etc/init.d or under /etc/rc.d.

/bin Contains unshareable and static executable files, such as ls, cp, mv, rm, and mount.

These commands are accessible to all users and constitute the most important commands that ordinary users might use.

/sbin Contains unshareable and static executable files typically run by only the system administrator, such as fdisk, e2fsck, etc.
/lib Contains unshareable and static program libraries.

Program libraries consist of code that’s shared across many programs and stored in separate files to save disk space and RAM.

The /lib/modules subdirectory contains kernel modules — which are drivers that can be loaded and unloaded as required.

/usr Contains shareable and static programs.

This directory contains /usr/bin and /usr/lib, which contain programs and libraries that aren’t absolutely critical to the computer’s basic functioning.

It can be mounted read-only on a separate partition if desired.

/usr/local Contains subdirectories that mirror the organization of /usr, such as /usr/local/bin and /usr/local/lib.

This directory hosts files that a system administrator installs locally, such as packages that are compiled on it. 

The idea behind this directory is to have an area that is safe from automatic software upgrades when the OS is upgraded as a whole.

Some admins split this directory off into its own partition to protect it from OS reinstallation procedures that might erase the parent partition.

/usr/share/man Contains manual pages used by the man command.

Local manual pages are stored in the /usr/local/share/man directory instead.

/usr/X11R6 Contains files related to the X Window System (abbreviated as X) — Linux’s GUI environment.

This directory was common several years ago, but most modern distributions have moved its contents to other directories, such as /usr/bin.

/opt Contains shareable and static ready-made packages that do not ship with the OS, such as word processors and games.

Some admins break it into a separate partition, or make it a symbolic link to a subdirectory of /usr/local and make that a separate partition.

/home Contains shareable and variable files composing regular users’ data.

The /home directory is often added to its own partition.

/root Contains unshareable and variable files, which make up the home directory for the root user.
/var Contains variable files and a mix of subdirectories that are shareable and unshareable. 

Most of its contents are transient files of various types — system log files, print spool files, mail, news, etc.

Many admins put /var in its own partition, especially on systems that have a lot of web activity.

/tmp Contains temporary files often created by many programs during normal operation. 

Most distributions have routines that clean out this directory periodically, and some even wipe the directory clean at bootup.

Some admins put /tmp on its own partition so it doesn’t cause issues on the root filesystem if an excessive number of files (or very large files) are created within it.

A similar directory exists as part of the /var directory tree at /var/tmp.

/mnt Contains shareable files, which may be a mix of variable and static.

Linux traditionally mounts removable-media devices on /mnt.

Some older distributions create subdirectories within /mnt, such as /mnt/cdrom. Others may use the /mnt directory itself, or use separate mount points, like /cdrom.

/media An optional part of the FHS.

It is similar to /mnt, but it should contain subdirectories for specific media types, such as /media/dvd.

Many modern distributions use /media subdirectories as the default mount points for common removable disk types, often creating the necessary subdirectories dynamically as the media are attached.

/dev Contains a large number of files that function as hardware interfaces. Because Linux treats most hardware devices as if they were files, the OS must have a location in its filesystem where these device files reside.

A user may access the device hardware by reading from and writing to the associated device files, if they have sufficient privileges.

The Linux kernel supports a device filesystem that enables /dev to be an automatically created virtual filesystem. The kernel and support tools create /dev entries dynamically to accommodate the needs of specific drivers.

/proc Contains a virtual filesystem that’s created on the fly by Linux to provide access to certain types of hardware information that isn’t accessible via /dev.

For example, cat /proc/cpuinfo displays information about the machine’s CPU (including model name, speed, etc.).
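Because /proc entries behave like ordinary text files, standard tools can process them. As a small sketch, the following counts the logical CPUs listed in /proc/cpuinfo. It assumes the common layout in which each CPU has a line beginning with "processor" (as on x86 and ARM), and it falls back to "unknown" where /proc is not available.

```shell
if [ -r /proc/cpuinfo ]; then
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from being treated as a failure.
    cpu_count=$(grep -c '^processor' /proc/cpuinfo || true)
else
    cpu_count=unknown
fi

echo "logical CPUs reported by /proc/cpuinfo: $cpu_count"
```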

The directories that are recommended to have their own partition are:

  • /boot
  • /home
  • /opt
  • /tmp
  • /usr
  • /usr/local
  • /var

The directories that should not be on their own partitions are:

  • /bin
  • /dev
  • /etc
  • /lib
  • /sbin
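To see how an existing system is actually laid out, each directory can be mapped to the filesystem that holds it. The sketch below uses df with the POSIX output format. The mount_of helper is a hypothetical name, and it assumes mount points contain no spaces.

```shell
# mount_of PATH
# Prints the mount point of the filesystem that holds PATH.
mount_of() {
    # With -P, line 2 describes the filesystem; its last field is the mount point.
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

for dir in / /boot /home /opt /tmp /usr /var; do
    if [ -d "$dir" ]; then
        echo "$dir is on the filesystem mounted at $(mount_of "$dir")"
    fi
done
```

A directory that reports itself as its own mount point sits on a separate partition (or other separate filesystem). One that reports / shares the root filesystem.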

Note: Some Linux distributions deviate from the FHS.

Using find

The find utility uses a brute-force approach to finding files.

The program finds files by searching through the specified directory tree, checking filenames, file modification times, etc. to locate the files that match the specified criteria.

The syntax for find is:

find [path...] [expression...]

One or more paths can be specified for find to search through.

The expression specifies what is being searched for.

The most commonly used options for find:

Feature                    Option                         Details
Search by Filename         -name <pattern>                <pattern> can be an ordinary filename, or a pattern containing wildcards that is enclosed within quotes (so that the shell does not expand it first).
Search by Permission Mode  -perm <mode>                   <mode> may be expressed either symbolically or in octal form. If <mode> is preceded by +, it will match files in which any of the specified permission bits are set (current versions of GNU find use / in place of +). If <mode> is preceded by -, it matches files in which all of the specified permission bits are set.
Search by File Size        -size <n>                      <n> is specified in 512-byte blocks by default, but this can be changed by appending a letter code to the value, such as c for bytes or k for kibibytes (1024 bytes).
Search by Group            -gid <GID> or -group <name>    <GID> is the group ID. <name> is the name of the group.
Search by User ID          -uid <UID> or -user <name>     <UID> is the user ID. <name> is the user’s name.
Restrict Search Depth      -maxdepth <level>              <level> is a non-negative integer giving the number of directory levels below the starting path to search within.

Note: Ordinary users may use find, but if the user lacks permissions to list a directory’s contents, find will return that directory’s name and a Permission denied error.
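A few of these options can be tried safely on a throwaway directory tree. The sketch below assumes a find that supports -maxdepth (GNU find and most modern implementations do). The file names are hypothetical.

```shell
# Build a small tree: files at depths 1, 2, and 3 below the starting path.
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
touch "$tree/notes.txt" "$tree/a/todo.txt" "$tree/a/b/deep.txt"
chmod 600 "$tree/notes.txt"
chmod 644 "$tree/a/todo.txt" "$tree/a/b/deep.txt"

# Quote the pattern so the shell passes the wildcard to find unexpanded.
by_name=$(find "$tree" -name '*.txt' | wc -l)

# Only descend two levels below the starting path.
shallow=$(find "$tree" -maxdepth 2 -name '*.txt' | wc -l)

# Without a + or - prefix, -perm matches the exact mode.
private=$(find "$tree" -perm 600)

echo "all .txt files: $by_name"
echo ".txt files within two levels: $shallow"
echo "mode 600: $private"
```

The first search reports three files and the depth-limited search two, since deep.txt sits three levels below the starting path.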

Using locate

The locate utility is similar to find, but with two major exceptions:

  1. It is far less sophisticated in its search options.
  2. It works from a database that it maintains.

Most distributions include a cron job that calls utilities to update the locate database. For this reason, locate may not find recent files, or it may return the names of files that no longer exist.

The locate database can be updated with the updatedb command — which is configured via the /etc/updatedb.conf file.

The syntax for locate is:

locate <search-string>

The <search-string> can be an entire filename or a partial filename.

Note: Some distributions use slocate instead of locate — as the slocate command includes security features to prevent users from seeing the names of files in directories they shouldn’t have access to. On many systems that use slocate, locate is just a link to slocate.

Using whereis

The whereis command searches for files in a restricted set of locations, such as standard binary file directories, library directories, and man page directories.

Generally the whereis command is used to quickly find program executables and related files like documentation or configuration files.

To use this command, type the name of the program to be found. For example:

$ whereis ls
ls: /bin/ls /usr/share/man/man1p/ls.1p.gz /usr/share/man/man1/ls.1.gz

Using which

The which utility only searches directories defined within the PATH environment variable, and lists the complete path to the first match it finds:

$ which xterm
/usr/bin/xterm

To get all matches, use the -a option.
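The search that which performs can be approximated in a few lines of shell, which may make the role of PATH clearer. This is a sketch of the idea, not the actual implementation of which, and find_in_path is a hypothetical helper.

```shell
# find_in_path NAME
# Prints every executable file called NAME found in the PATH directories,
# in PATH order (roughly what which -a reports).
find_in_path() {
    old_ifs=$IFS
    IFS=:                       # split PATH on colons
    for dir in $PATH; do
        candidate="${dir:-.}/$1"    # an empty PATH entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

# The first match is the one the shell would run.
find_in_path sh | head -n 1
```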

Managing Files Essentials

  • Describe commands used to copy, move, and rename files in Linux.
    • The cp command copies files, as in cp first second to create a copy of first called second.
    • The mv command does double duty as a file-moving and file-renaming command.
  • Summarize Linux’s directory-manipulation commands.
    • The mkdir command creates a new directory, and rmdir deletes empty directories.
    • Many of the file-manipulation commands, such as cp and rm, can be used with the -R / -r option on directories.
  • Explain the difference between hard and symbolic links.
    • Hard links are duplicate directory entries that both point to the same inode and hence to the same file.
    • Symbolic links are special files that point to another file or directory by name.
    • Hard links must reside on a single filesystem, but symbolic links may point across filesystems.
  • Summarize the common Linux archiving programs.
    • The tar and cpio programs are both file-based archiving tools that create archives of files using ordinary file access commands.
    • The dd program is a file-copy program; however, when it’s fed a partition device file, it copies the entire partition on a very low-level basis — which is useful for creating low-level backups of Linux or non-Linux filesystems.
  • Explain the difference between compression utilities.
    • The gzip, bzip2, and xz utilities are compression tools, which reduce a file’s size via compression algorithms.
    • These utilities are often used in conjunction with the tar command.
    • gzip is the oldest compression tool and provides the least compression.
    • bzip2 provides slightly improved file compression over gzip.
    • xz is the newest compression tool and provides the best compression.
  • Describe Linux’s file ownership system.
    • Every file has an owner and a group, identified by number.
    • File permissions can be assigned independently to the file’s owner, the file’s group, and to all other users.
  • Explain Linux’s file permissions system.
    • Linux provides independent read, write, and execute permissions for the file’s owner, group, and everyone else (other); resulting in nine main permission bits.
    • Special permission bits are also available, which enable you to launch program files with modified account features, or alter the rules Linux uses to control who may delete files.
  • Summarize the commands Linux uses to modify permissions.
    • The chmod command is Linux’s main tool for setting permissions.
    • Permissions can be specified using either an octal (base 8) mode or a symbolic notation.
    • The chown and chgrp commands can be used to change the file’s owner and group, respectively.
    • The chown command can change both a file’s owner and group (if run by root).
  • Describe the prerequisites of using Linux’s disk quota system.
    • Linux’s disk quota system requires support in the Linux kernel for the filesystem on which quotas are to be used.
    • The quotaon command must also be run, typically from a startup script, to enable disk quotas.
  • Explain how quotas are set.
    • Individual user quotas can be edited using the edquota command (ex. edquota jeff).
    • This command opens an editor on a temporary text file that describes the user’s quotas.
  • Summarize how Linux’s standard directories are structured.
    • Linux’s directory tree begins with the root (/) directory, which holds mostly other directories.
    • Specific directories may hold specific types of information, such as user files in /home and configuration files in /etc.
    • Some of these directories and their subdirectories may be separate partitions, which helps isolate data in the event of filesystem corruption or disk space issues.
  • Describe the major file-location commands in Linux.
    • The find command locates files by brute force, searching through the directory tree for files that match the specified criteria.
    • The locate (or slocate) command searches a database of files in publicly accessible directories.
    • The whereis command searches a handful of important directories.
    • The which command searches only the directories listed in the PATH environment variable.
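The difference between hard and symbolic links summarized above can be observed directly. The sketch below assumes GNU stat (for the -c option) and uses hypothetical file names in a scratch directory.

```shell
work=$(mktemp -d)
echo "data" > "$work/original"

ln "$work/original" "$work/hard"   # second directory entry for the same inode
ln -s original "$work/soft"        # special file that stores the name "original"

inode_original=$(stat -c '%i' "$work/original")
inode_hard=$(stat -c '%i' "$work/hard")
link_count=$(stat -c '%h' "$work/original")   # number of hard links to the inode
soft_target=$(readlink "$work/soft")

# Removing the original name leaves the data reachable through the hard link,
# while the symbolic link now points at a name that no longer exists.
rm "$work/original"
hard_content=$(cat "$work/hard")
if [ -e "$work/soft" ]; then soft_state=valid; else soft_state=dangling; fi

echo "inodes: $inode_original and $inode_hard (link count was $link_count)"
echo "symbolic link target: $soft_target ($soft_state after removal)"

rm -r "$work"
```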