File Archiving Guide

Note: this document is about practical use of UNIX File Archiving tools. This is not about the theory of Data Compression or about Vistua.com archives or about workstation backups.

1.  Abstract

This short document will introduce you to the concept of file "archives" and will show you how to manipulate and create them, as well as discuss some situations in which you would want to do this.

2.  Introduction


An early IBM tape unit¹

You cannot have used computers very much at all without encountering, or at least hearing of, "ZIP files". ZIP files are one type of archive that is common on MS Windows.

What exactly is an archive? An archive is a single file that contains within itself other files, and/or directory structures (and their files if any). You can think of archive files as a way to "freeze" a group of files and folders together to be handled as a single item. Why would you want to do this?

The name "archive" suggests the original intent: archives were originally developed so that bundles of files could be transferred to tapes (see illustration) for archival storage. Actual archiving, however, is only one of many purposes that archives can serve:

  • Bundling different files together to send in an e-mail attachment.
  • Sending directory structures as an e-mail attachment (one cannot email a directory directly).
  • Sorting files together for mass compression.
    • Such as by archiving rarely used files and then compressing them to save disk-space.
  • Grouping files together for someone else's convenience.
  • Distributing groups of files to a server.

Pretty much any situation which requires you to transfer groups of files together or transfer directory structures could benefit from the use of archives.

3.  Recognizing an Archive

You can recognize archives in two ways:

  1. The file manager displays them with an archive-like icon.
  2. By their extension (the tag of text that comes after the last dot in the filename).

3.1  Common Extensions for Archives.

Note: a "compressor/archive" is an archive format that includes built-in compression.

Extension   Comment
.zip        The incumbent Windows compressor/archive, the Zip-file
.tar        UNIX "Tarball"; compressed tarballs sometimes use .tbz2, .taz, .tgz
.gz         A compression format often paired with .tar
.bz2        A compression format often paired with .tar
.7z         An emerging, next-generation compressor/archive format
.hqx        Mac BinHex file, Mac specific
.sit        Mac "StuffIt" file, Mac specific
.zipx       A Zip-file variant
.rar        A proprietary compressor/archive format that is fairly common
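
If you ever need to recognize archives in a script rather than by eye, the table above can be turned into a small lookup. The following Python sketch is only an illustration: the function name describe_archive and the wording of the descriptions are our own, and the list of compound extensions (such as .tar.gz) is an assumption about what you are likely to meet, not a complete catalogue.

```python
from typing import Optional

# Extensions from the table above, plus the common compound forms.
# The descriptions are our own wording, for illustration only.
ARCHIVE_SUFFIXES = {
    ".tar.gz": "gzip-compressed tarball",
    ".tar.bz2": "bzip2-compressed tarball",
    ".tgz": "gzip-compressed tarball",
    ".tbz2": "bzip2-compressed tarball",
    ".taz": "compressed tarball",
    ".tar": "plain tarball",
    ".zip": "ZIP compressor/archive",
    ".zipx": "ZIP variant",
    ".7z": "7-Zip compressor/archive",
    ".rar": "RAR compressor/archive",
    ".hqx": "Mac BinHex file",
    ".sit": "Mac StuffIt file",
    ".gz": "gzip-compressed file",
    ".bz2": "bzip2-compressed file",
}


def describe_archive(filename: str) -> Optional[str]:
    """Return a description based on the extension, or None if unrecognized."""
    lowered = filename.lower()
    # Try the longest suffixes first so ".tar.gz" wins over ".gz".
    for suffix in sorted(ARCHIVE_SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix):
            return ARCHIVE_SUFFIXES[suffix]
    return None


print(describe_archive("photos.tar.gz"))
print(describe_archive("notes.txt"))
```

Keep in mind that an extension is only a hint; a file can be renamed to anything.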

3.2  Specialized Archives

These types of files are sometimes referred to as (and technically are) archives but require special consideration outside the scope of this document. Please consult your administrator for further assistance.

Extension   Comment
.iso        CD or DVD disc image
.rpm        Red Hat software package
.deb        Debian & Ubuntu software package
.cab        A software package format used internally by MS Windows

4.  Extracting

So, now you have an archive file! Perhaps it came in an attachment, or perhaps you downloaded it off the Internet. Now what? How do you get the files out of it? To do this, we will use a program called Ark. Ark is part of the default KDE system and is tightly integrated into it.

If you do not have an archive but want to follow along anyway for practice, please download the sample archive that we have prepared for demonstration purposes.

4.1  Opening the Archive

After you have downloaded or otherwise acquired your archive, go to the folder where it is located (normally the folder /home/youruid/Downloads). Open it. (In case of a download, your browser may also offer to open the archive in Ark for you and you can do that instead, if you like.)

A window similar to this one will appear:


Ark showing a typical archive.

You can see that this archive contains the files transmitter-mast.svg, Hello.txt and a folder called icons which contains several files. Ignore the icon ribbon at the top, for now. Ark also knows a little bit about these files besides their names:

  1. Their (uncompressed, if applicable) size.
  2. The "Owner" (the username of the user that created the file; not all operating systems and archive formats support this).
  3. The "Group" (ditto).
  4. A timestamp. Note that the "..." indicates that the timestamp has been truncated to fit; enlarging the window would show more of it.

Ark also tries to show sensible icons that will clue you in to what type of file this is. You should clearly recognize the folder icon. Other types of archives may store other types of information.
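
The same details that Ark displays (name, size, owner, group and timestamp) are stored inside the archive itself, so a script can read them too. Below is a minimal Python sketch using the standard tarfile module. To keep it self-contained it first builds a tiny tarball in memory; the file contents, the owner "john", the group "users" and the timestamp are all made up for the demonstration, and the function names are our own.

```python
import io
import tarfile
import time


def build_sample_tar():
    """Build a tiny in-memory tarball; names, owner and contents are made up."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in [("Hello.txt", b"Hello!\n"), ("icons/star.png", b"\x89PNG")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.uname, info.gname = "john", "users"
            info.mtime = 1255185660  # an arbitrary fixed timestamp
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(tar_bytes):
    """Return (name, size, owner, group, timestamp) for every member."""
    rows = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            stamp = time.strftime("%Y-%m-%d %H:%M", time.gmtime(member.mtime))
            rows.append((member.name, member.size, member.uname, member.gname, stamp))
    return rows


rows = list_members(build_sample_tar())
for row in rows:
    print(row)
```

As in Ark, the owner and group are only as meaningful as what the creating system chose to record.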

You may never have seen this table-like "detail view" before, since the file manager usually uses a grid of icons, so let's take a moment to look at it more closely. Notice that next to the "icons" folder there is a little - sign. This signifies that the folder is "expanded", i.e. its contents (slightly indented in the window) are being shown. You can click the - sign to hide them; the - will then turn into a + to indicate that you can expand the folder again. If you like this view, you can make the file manager show your icons in a similar way; consult its documentation for details.

4.2  Interacting with the Archive

If you click once on a file in the archive, Ark will try to show you a preview of it; for instance, if you click an image, Ark will show it in a little window. This only works for common types of files. If it doesn't work (try transmitter-mast.svg if you are using the sample archive), Ark will instead show you a string of text like "image/svg+xml". This is called a "MIME type". Don't worry about it; it just means that Ark does not know how to display that file itself.

4.3  Extracting Files and Folders

You can either drag-and-drop files and folders out of the archive, as if it were a directory being shown in the file-manager, or you can extract the whole thing.

To extract the whole thing, click the icon on the toolbar that shows a box with an arrow pointing straight up and a chevron pointing down (second from right; the chevron indicates that if you press and hold you get a little menu, which we don't need). After you click this icon, you will get a window similar to this one:


The Extract window

There are a few things that we can do here. On the far left you will see icons that are shortcuts to any mounted media and to a few other places. Next to them is the tree-view, which shows where Ark is suggesting that the files be extracted, followed by some options.

You can change the default location (which is the same folder that the archive is in) by clicking some other place in the tree-view. To expand folders click the little + sign next to them. In this example, the contents will be extracted to /home/john/Desktop (the absolute path is shown below and you can type a new one directly, if you know how to do that). You can create a new folder in the folder that is currently selected by clicking "New Folder" at the bottom.

There are also some options you can set to influence Ark's behavior.

Extraction into Subfolder
Determines whether Ark will create a sub-folder in the current folder and extract the contents into that. This is not a bad idea, as it allows you to delete the whole folder in one move rather than manually dragging each individual file and folder of the archive to the trash. The sub-folder's name will be based on the name of the archive, and is shown below.
Open destination folder after extraction
Causes Ark to open a file-manager window onto the folder into which the files will be extracted.
Preserve paths when extracting
Changing this is for advanced users only! Preserving paths is almost certainly what you want to do.

When you have everything set up the way you want, click "OK" and the extraction will begin. For large archives this could take a while.
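
For the curious, the "Extraction into Subfolder" behaviour can be approximated in a few lines of Python with the standard tarfile module. This is only a sketch of the idea, not how Ark itself is implemented: the function names, the list of suffixes that get stripped, and the sample archive (created in a temporary folder so the example runs on its own) are all our own assumptions.

```python
import io
import tarfile
import tempfile
from pathlib import Path

# Suffixes stripped to derive the sub-folder name (an illustrative list).
TARBALL_SUFFIXES = (".tar.bz2", ".tar.bz", ".tar.gz", ".tbz2", ".tgz", ".tar")


def subfolder_name(archive_name):
    """Derive a folder name from the archive name ('demo.tar.bz2' gives 'demo')."""
    for suffix in TARBALL_SUFFIXES:
        if archive_name.endswith(suffix):
            return archive_name[: -len(suffix)]
    return Path(archive_name).stem


def extract_into_subfolder(archive_path, destination):
    """Extract a tarball into destination/<archive name without extension>."""
    archive_path = Path(archive_path)
    target = Path(destination) / subfolder_name(archive_path.name)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members such as "../x".
            tar.extractall(str(target), filter="data")
        else:
            tar.extractall(str(target))
    return target


def make_sample_archive(folder):
    """Create a small made-up tarball so the example runs on its own."""
    sample = Path(folder) / "demo.tar.bz2"
    data = b"Hello\n"
    with tarfile.open(str(sample), "w:bz2") as tar:
        info = tarfile.TarInfo("Hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return sample


workdir = Path(tempfile.mkdtemp())
extracted_to = extract_into_subfolder(make_sample_archive(workdir), workdir / "Desktop")
print(extracted_to)
```

Extracting into a fresh sub-folder has the same advantage here as in Ark: everything that came out of the archive can be removed again by deleting one folder.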

5.  Creating an Archive

5.1  Make a new Archive

Suppose you want to create an archive yourself. You can do this easily with Ark. First open Ark, if you don't already have it open ( → Accessories → Ark), then click "New", which is the icon with the page and the green and white + sign. You will be greeted with this window.


Save As

This is slightly confusing. Ark is asking you what you want to name the archive, where to put it and, crucially, what kind of archive you want to create. (Unless you know otherwise, you likely want to create "Tar Archive - Bzip compressed"; we'll get to that.)

  1. You can navigate to a location in the usual way.
  2. Enter your filename in the "Name" field (do not put an extension in, this is automatic).
  3. Select the type of archive from the "Filter" menu.

5.2  Adding Files

You will now be greeted by a blank window into which you can drag files and folders. There is no need to "save"; you can close Ark when you finish, because Ark saves your work as you go, in real time.
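
If you would rather create the archive from a script, the equivalent of "new Bzip-compressed Tar archive, then drag things in" might look like the following Python sketch. It uses the standard tarfile module; the function name create_tarball and the demonstration files (written to a temporary folder) are hypothetical and exist only so that the example runs by itself.

```python
import tarfile
import tempfile
from pathlib import Path


def create_tarball(archive_path, items):
    """Bundle files and folders into a bzip2-compressed tarball."""
    with tarfile.open(str(archive_path), "w:bz2") as tar:
        for item in items:
            item = Path(item)
            # arcname drops the leading directories; folders are added recursively.
            tar.add(str(item), arcname=item.name)
    return Path(archive_path)


# Demonstration with made-up files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "Hello.txt").write_text("Hello\n")
(workdir / "icons").mkdir()
(workdir / "icons" / "star.txt").write_text("*\n")

archive = create_tarball(
    workdir / "bundle.tar.bz2",
    [workdir / "Hello.txt", workdir / "icons"],
)
with tarfile.open(str(archive)) as tar:
    names = tar.getnames()
print(names)
```

Unlike Ark, a script like this writes the whole archive in one go, so the list of items has to be known up front.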

5.3  Removing Unwanted Files

At any time, you can remove a file or folder (and thus, importantly, its contents!) by selecting it and clicking the delete icon, which is the icon of the box with the shredded paper and the X (third from right).

5.4  Adding Files to an Existing Archive

Open the archive in the usual manner, then drag and drop files and folders into the Ark window.
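
Adding to an existing archive can also be scripted, with one caveat: Python's standard tarfile module can only append to plain, uncompressed tarballs, so a compressed tarball would have to be rewritten instead. The sketch below assumes a plain .tar file; the helper name add_bytes and the file names are made up for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def add_bytes(archive_path, name, data, mode):
    """Write one in-memory file into a tarball; mode "w" creates, "a" appends."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    with tarfile.open(str(archive_path), mode) as tar:
        tar.addfile(info, io.BytesIO(data))


archive = Path(tempfile.mkdtemp()) / "bundle.tar"
add_bytes(archive, "Hello.txt", b"Hello\n", "w")  # create the archive
add_bytes(archive, "notes.txt", b"More\n", "a")   # append to the existing one

with tarfile.open(str(archive)) as tar:
    names = tar.getnames()
print(names)
```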

6.  Interoperability note

In order to be confusing, UNIX and Windows have developed different methods for dealing with archives. Windows predominantly uses ZIP files and UNIX (including Linux, excluding Mac OS X) predominantly uses Tarballs. Tarballs were in fact invented by early UNIX developers.

UNIX prefers the philosophy of "small tools" that do one thing only and do it well. Tarballs and the TAR program that Ark uses to create them are one example of this: Tarballs contain no compression and no encryption. This is why we suggested that you choose "Tar Archive - Bzip compressed" earlier. Bzip is a program that can compress exactly one file, such as an archive. Tar is a program that can create archives containing many files but cannot compress them. These two programs work in tandem. Any type of encryption would have to be provided by yet another program.
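
The division of labour between Tar and Bzip can be made visible with a short Python sketch that performs the two steps separately: first bundle the files into a plain tarball, then compress that single tarball. This uses the standard tarfile and bz2 modules as stand-ins for the command-line programs; the function name make_tar and the sample file contents are invented for the demonstration.

```python
import bz2
import io
import tarfile


def make_tar(files):
    """Step 1: bundle {name: bytes} into one uncompressed tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Made-up sample contents.
plain_tar = make_tar({"Hello.txt": b"Hello\n", "icons/star.txt": b"*\n"})

# Step 2: compress exactly one thing, the tarball itself.
compressed = bz2.compress(plain_tar)

# Unpacking reverses the steps: decompress first, then read the tarball.
restored = bz2.decompress(compressed)
with tarfile.open(fileobj=io.BytesIO(restored), mode="r") as tar:
    names = tar.getnames()

print(len(plain_tar), len(compressed), names)
```

Neither step knows anything about the other, which is exactly the "small tools" idea: the compressor could be swapped for a different one without touching the archiving step.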

This is what is known as "well engineered". History has proven UNIX right and Microsoft wrong relating to many (but not all) engineering matters. Irrespective, MS Windows is, for various reasons, temporarily the dominant consumer computer platform in the world. MS Windows prefers a monolithic approach and ZIP files exemplify this: they provide compression, can store many files and folders, and can provide (very, very flawed) encryption.

ZIP files do many things poorly but de-facto standards are rarely the best. The PDF file-format is a good/terrible example of this.

The Vistua network is completely UNIX oriented, so you need to make sure to create ZIP files when making archives for MS Windows users (and since these are the bulk of World Wide Web users, you should also do this when posting files to the public Internet).
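
When a ZIP file is what the recipient needs, one can also be produced from a script. The following is a minimal Python sketch using the standard zipfile module; the function name create_zip, the folder name "project" and its contents are assumptions made up for the example, and empty folders are simply skipped.

```python
import tempfile
import zipfile
from pathlib import Path


def create_zip(archive_path, folder):
    """Store every file under `folder` in a ZIP, keeping the folder structure."""
    folder = Path(folder)
    with zipfile.ZipFile(str(archive_path), "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():  # note: empty folders are not stored by this sketch
                arcname = path.relative_to(folder.parent).as_posix()
                zf.write(str(path), arcname=arcname)
    return Path(archive_path)


# Demonstration with a made-up folder in a temporary directory.
workdir = Path(tempfile.mkdtemp())
project = workdir / "project"
(project / "icons").mkdir(parents=True)
(project / "Hello.txt").write_text("Hello\n")
(project / "icons" / "star.txt").write_text("*\n")

archive = create_zip(workdir / "project.zip", project)
with zipfile.ZipFile(str(archive)) as zf:
    names = zf.namelist()
print(names)
```

Note how the ZIP format does bundling and compression in one step, in contrast to the Tar and Bzip pairing described above.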

7.  Pitfalls

Some potential problems:

  • Tarballs are not supported by Windows. Please read the preceding section.
  • It is possible to put an archive within another archive, but it is not possible to put an archive within itself. Take care not to try to do so accidentally, as this could have very peculiar results. Putting an archive inside another archive is pretty pointless anyway.
  • Music, videos, OpenOffice files and most pictures will not benefit from Gzip or Bzip compression because they are already highly compressed. Therefore it is better to create plain Tarballs if you are archiving these, as the additional compression will slow down the archiving process and will likely make the archive larger than the sum of its parts (different types of compressors can actually work against each other, similarly to a drug interaction).
  • Differentiate between ZIP files and 7zip files. 7zip has much better compression than any common compressor (including Bzip); however, it is very slow and it is poorly supported by most operating systems. Do not create one for distribution unless you know your recipient can support it.
  • There is a minor bug in how Ark names Bzipped files. Bzipped files are supposed to end in .bz2, not .bz (because there was an older, now obsolete Bzip format that is incompatible). However, Ark creates them with .bz, including tarballs compressed with Bzip, which end up as .tar.bz rather than .tar.bz2. In principle this should not cause a problem, because sensible archiving programs always check a file's "magic number" (the first few bytes of the file, which identify its format) rather than trusting its name. Some archiving programs are less sensible, though, and may need to have the "2" added. If this causes a problem, just tell the recipient to rename the file, or do it yourself before sending.
  • CAVEAT EMPTOR! ZIP files, as mentioned, permit "password protection". You should not rely on this encryption scheme; it is next to useless. Encryption should be performed with a separate program.
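
The "magic number" check mentioned in the list above is easy to picture with a small Python sketch: look at the first few bytes of the data instead of the file name. The signatures listed here are the well-known ones for these formats, but the function name sniff_format and the selection of formats are our own, and a real archiving program will check more than this.

```python
import bz2
import gzip

# Leading bytes ("magic numbers") of a few formats from this guide.
MAGIC_NUMBERS = [
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),  # non-empty ZIP files
    (b"7z\xbc\xaf\x27\x1c", "7z"),
]


def sniff_format(data):
    """Guess the format from the leading bytes; return None if unknown."""
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return None


# The name of the file plays no part: only the contents are inspected.
print(sniff_format(bz2.compress(b"hello")))
print(sniff_format(gzip.compress(b"hello")))
print(sniff_format(b"just some text"))
```

This is why a Bzipped file that ends in .bz instead of .bz2 still opens in programs that inspect the contents.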

¹ IBM 729V picture by TheSentinl64 under CC by-sa 2.5 license!


Text last modified on October 10, 2009, at 02:41 PM

Vistua Hub version 3.6 © MMVI-MMIX Vistua.com. All Rights Reserved. All times UTC.

