File extensions: .gz, .tar and .tar.gz
27 03 Mar 2016 23:22 by u/roznak
Yes, this is part of Linux, but non-Linux users should know it too.
Until now these files always looked weird to me when I got them from a hardware vendor. As a Windows user, I simply did not know what they were.
- .gz = single file compressed
- .tar = Folder and files merged into one big file but not compressed.
- .tar.gz = The .tar file you created, but then compressed.
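For anyone who wants to see the three cases side by side, here is a minimal Python sketch using the standard-library gzip and tarfile modules. The folder and file names (project, notes.txt, todo.txt) are made up for illustration and are not from the post.

```python
import gzip
import tarfile
import tempfile
from pathlib import Path


def build_examples(workdir: Path) -> dict:
    """Create one .gz, one .tar and one .tar.gz inside workdir; return their paths."""
    # Hypothetical sample data: a folder holding two small text files.
    folder = workdir / "project"
    folder.mkdir()
    (folder / "notes.txt").write_text("hello\n" * 1000)
    (folder / "todo.txt").write_text("nothing yet\n" * 1000)

    # .gz: exactly one file, compressed.
    gz_path = workdir / "notes.txt.gz"
    with gzip.open(gz_path, "wb") as out:
        out.write((folder / "notes.txt").read_bytes())

    # .tar: folder and files merged into one file, no compression.
    tar_path = workdir / "project.tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(folder, arcname="project")

    # .tar.gz: the same archive, gzip-compressed as it is written.
    targz_path = workdir / "project.tar.gz"
    with tarfile.open(targz_path, "w:gz") as tar:
        tar.add(folder, arcname="project")

    return {"gz": gz_path, "tar": tar_path, "targz": targz_path}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for kind, path in build_examples(Path(tmp)).items():
            print(kind, path.name, path.stat().st_size, "bytes")
```

Running it prints the size of each file, which makes the "merged but not compressed" versus "merged and compressed" difference visible.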
The other thing you must realize is that extracting a .tar file is risky: you must check its integrity before extracting. If I recall correctly, WinZip does not extract an archive if it is corrupted.
In my opinion the default when extracting a .tar should be to refuse if the integrity check fails. If you want to extract anyway, you force it with an additional option that overrides the safety.
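A check-before-extract step can be sketched in Python for the .gz layer, which carries a CRC-32 and a length in its trailer (a plain .tar only checksums its headers, not the file data). The function name gzip_is_intact is hypothetical. It does roughly what `gzip -t` does: read the whole stream and see whether anything complains.

```python
import gzip
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole gzip stream decompresses and its trailer checks out."""
    try:
        with gzip.open(path, "rb") as stream:
            # Reading to the end forces the CRC-32 and length check in the trailer.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError; EOFError means a truncated file.
        return False
    return True
```

A caller could refuse to extract when this returns False and only proceed if the user explicitly asks to override.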
10 comments
2 u/tragicwhale 04 Mar 2016 01:23
Thanks for the knowledge.
2 u/SynfulVisions 04 Mar 2016 03:54
A bit of extra trivia:
The tar name comes from "Tape Archive" and references the original UNIX backup program designed to write to that sort of IO hardware.
Why should Windows users know this? If you're the sort of user who can really make use of that sort of archive (or the sort of files it's likely to contain), I think you're safely past the point where you'd need help with archives. Winzip, really?
Also, uncompressing (technically, the tar isn't compressed) a tar isn't risky if it's corrupt, you'd still have exactly what you started with.... a corrupted archive; I fail to see the risk there. We can say possibly that utilizing tar can be risky because of symbolic link handling or absolute paths (although default behavior of utilities has kept those mostly safe for the past decade)... but realistically if you're opening a file of any sort from a source that can't be trusted you've already fucked up in terms of security.
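The absolute-path and symbolic-link concern can be illustrated with a small Python sketch that lists suspicious members before anything is extracted. The function name unsafe_members is hypothetical, and flagging every link is a deliberately conservative assumption. Recent Python releases also ship built-in extraction filters for tarfile; this version does the check by hand so the logic is visible.

```python
import os
import tarfile


def unsafe_members(archive_path: str, dest_dir: str) -> list:
    """List member names that would land outside dest_dir or that are links."""
    dest_root = os.path.realpath(dest_dir)
    flagged = []
    # "r:*" lets tarfile detect gzip/bzip2 compression on its own.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # An absolute member name replaces dest_root in the join;
            # "../" components climb out of it once the path is resolved.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            escapes = os.path.commonpath([dest_root, target]) != dest_root
            if escapes or member.issym() or member.islnk():
                flagged.append(member.name)
    return flagged
```

An empty result means no member name points outside the destination folder; it says nothing about whether the contents themselves can be trusted.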
3 u/tribblepuncher 04 Mar 2016 07:36
Yes, really.
The line between Windows-centric and Linux-centric archive and compression formats has grown considerably blurrier over the last decade or so. This goes both ways, including some files intended for Linux use being put into a traditional .zip file. While you are unlikely to find Windows binaries by themselves in a tarball, source code and data are often transmitted in this manner, sometimes specifically for a Win32 environment. And if you're downloading source, even if you know C/C++ inside and out, you could still pretty easily trip over this if you haven't had experience with Linux. A lot of people download sources for self-compilation for reasons other than modifying the actual code, e.g. compiling in custom options or attempting to increase security controls.
2 u/tribblepuncher 04 Mar 2016 07:29
This is due to the way that Unix was traditionally designed.
Essentially, most Unix commands were designed so that, in theory, each did one thing and did it well. As a result, there were two separate programs for compression and archiving. This made particular sense back then, since tapes were in heavy use and the tar format - short for Tape ARchive - was made to be usable with them. Furthermore, disk space was especially tight, so compressing a single file by itself, rather than archiving it first and paying the archive overhead, may have been worth it in a lot of cases.
In a traditional Unix environment, you will usually actually run a .tar.gz through BOTH programs, even if you only invoke one command; tar is capable of directly using GZip to decompress the archive as it extracts it, if given the correct command parameters. Hence why it is .tar.gz, because it is an archive file that has been compressed. This combination is frequently known as a "tarball." The extension .tgz, as someone else noted, is short for .tar.gz. This isn't so popular anymore but it was very popular in the 1990s, especially for users of Linux or BSD that might have to interoperate with a traditional DOS environment heavily, particularly Linux users who might be operating on top of a DOS filesystem (see: the late, lamented UMSDOS file system option, popular on Linux distros circa 1994, long ago having rotted its way out of the kernel source code).
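The "both programs, one command" idea can be shown with Python's standard library standing in for the command-line tools. This is only a sketch; the helper names are made up. The two-step version mirrors running gunzip and then tar, and the one-step version mirrors `tar -tzf`, where tar drives gzip itself.

```python
import gzip
import io
import tarfile


def make_tarball(files: dict) -> bytes:
    """Build a .tar.gz in memory from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def names_two_steps(tarball: bytes) -> list:
    """Step 1: gunzip. Step 2: read the plain tar that comes out."""
    plain_tar = gzip.decompress(tarball)
    # Mode "r:" means: expect an uncompressed tar, nothing else.
    with tarfile.open(fileobj=io.BytesIO(plain_tar), mode="r:") as tar:
        return tar.getnames()


def names_one_step(tarball: bytes) -> list:
    """Let tarfile handle the gzip layer while it reads the archive."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        return tar.getnames()
```

Both functions return the same member list for the same tarball, which is the point: the .gz layer and the .tar layer are independent.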
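In this vein there are at least two more variations that might be of interest. First and foremost is BZip2. BZip2 uses a different algorithm from GZip and usually compresses better, though it is slower. This results in the following file extensions:
- .bz2 = single file compressed with BZip2
- .tar.bz2 = tar archive compressed with BZip2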
Additionally, to make the whole thing even more complicated, there's the traditional Unix compress/uncompress scheme, which used the .Z extension. These are not terribly common any longer, but AFAIK most distros keep compatibility around for them because the utilities to work with them are simple and lightweight. These provide the following extensions:
- .Z = single file compressed with compress
- .tar.Z = tar archive compressed with compress
Note the capital Z for this scheme - as with all Unix filenames, the case makes a difference.
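As a rough illustration of gzip and bzip2 being interchangeable single-file compressors, here is a short Python sketch using the standard gzip and bz2 modules. The function name is hypothetical. Python's standard library has no module for the old compress (.Z) scheme, so that one is left out.

```python
import bz2
import gzip


def compress_both(data: bytes) -> dict:
    """Compress the same bytes with gzip and bzip2, keyed by the usual extension."""
    return {
        ".gz": gzip.compress(data),
        ".bz2": bz2.compress(data),
    }


if __name__ == "__main__":
    sample = b"the same line over and over\n" * 2000
    for extension, blob in compress_both(sample).items():
        print(extension, len(blob), "bytes, from", len(sample))
```

Which of the two comes out smaller depends on the data, so the sketch prints the sizes rather than promising a winner.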
As a note, 7-Zip's graphical UI is IMO one of the better ways to work with .tar files in a Windows environment, although you have to get used to it since it doesn't quite work like most traditional archival program GUIs. That said, it may be worth your while to learn how to use gzip, bzip2, and tar on your own. I've found that you can manage to squeeze quite a bit of data onto a storage medium when you're short on space by selectively gzipping a few files - this used to be common enough, in fact, that many older web browsers would automatically decompress .gz-compressed text files and display them as though people had just downloaded a text file, e.g. myarticle.txt.gz was treated like myarticle.txt, but only took up the compressed size of myarticle.txt.gz in terms of transmission time.
1 u/Atroce 04 Mar 2016 02:07
Been using Linux for almost a year now and didn't know that. Thanks.
1 u/idunnome 04 Mar 2016 03:45
Not limited to Linux.
I would suggest that the description of ".tar" be amended as follows: Folder and files merged into one single file but not compressed (though one or more of its contents may already be); directory information for the enclosed files is stored within the tar file as well.
1 u/Vailx 04 Mar 2016 11:06
I don't think uncompressing a tar is risky.
Tar really is excellent for tapes, because the data you need to find out what the file is is right next to the file. This means that to read the whole contents you have to read the entire file, however, making it a little less suited to stuff you want to access randomly.
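The "metadata sits right next to the file" layout can be poked at directly. The sketch below builds a tiny archive in memory and reads the first entry by hand, assuming the ustar variant of the format (512-byte header block, name in the first 100 bytes, size as octal text at offset 124). The helper names are invented for this example.

```python
import io
import tarfile

BLOCK = 512  # tar reads and writes in 512-byte blocks


def build_plain_tar(name: str, data: bytes) -> bytes:
    """Build an uncompressed one-file tar in memory."""
    buffer = io.BytesIO()
    # USTAR_FORMAT is requested explicitly so the layout below is predictable.
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def first_entry(raw: bytes) -> tuple:
    """Parse the first header block by hand and return (name, size, data)."""
    header = raw[:BLOCK]
    name = header[:100].rstrip(b"\0").decode("ascii")
    size = int(header[124:136].rstrip(b"\0 ").decode("ascii"), 8)
    # The file's bytes start in the very next block after its header.
    data = raw[BLOCK:BLOCK + size]
    return name, size, data
```

To find the second entry you would skip the data rounded up to a multiple of 512 and parse the next header, which is exactly the sequential read described above.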
Gzip operates on a single file, leaving you to determine what archive format you want to use- ar, tar, or something else, should you want to compress multiple files.
Like most Unix, it's pieces you can put together- one piece archives, one piece compresses, and you can use both together.
.tar.gz is sometimes .tgz, but mostly in DOS derivatives like Windows.
I think TAR goes back to the late seventies, but it may be early eighties.
Originally, you would first compress files with a binary called compress. This leaves a .Z, so you might see .tar.Z. Compress is from the early eighties, and its overall functionality (compressing a single file) is what gzip and bzip2 model. Compress and GIF both use the same older compression algorithm, LZW.
By these standards, the .zip format is vastly more recent, coming out in 1989. Because PKZIP came out for DOS, it didn't have standard Unix commands to play with, and it had to handle both archiving and compression.
0 u/Tommstein 17 Mar 2016 05:24
There's nothing Linux-specific about tarballs.
0 u/TrevorLahey 23 Mar 2016 16:59
Something worth understanding, especially if you're coming from a Windows environment, is that file extensions on Linux are historically meaningless and served mostly as conveniences and conventions for the human. Unix application file types were traditionally identified internal to the file by means of a "magic header". Read up on the 'file' command and how it identifies files. The Windows-ish notion of file extensions crept into common Unix usage as an outgrowth of the Internet and the platform-agnostic proliferation of MIME types and a desire to define "automatic handlers" for different kinds of files in various application contexts (e.g., display a picture in an e-mail reader). This handler definition in many newcomers' minds represents the "is a tar file" concept. Tar itself doesn't give a rat's ass what the file extension is -- "somearchive.important" would be a perfectly valid filename from tar's perspective, although common sense makes it easier on the human if you tack a ".tar" on the end of it so you can find the file when you go looking.
They're meaningful but they're not meaningful. As much of the meaning in the Linux world comes from convention as from any requirement. Given your example of compressed tar files, you're just as likely to see tar.gz, tar.gzip, or tgz -- whatever was convenient and concise for whoever named the file. When in doubt as to what something is, type 'file foo.what.ever' and the file command will tell you what you're looking at. Tar will try to process whatever you feed it.
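A toy version of the magic-header idea, in Python, looks only at a file's leading bytes and ignores its name entirely. It is a sketch covering just the formats from this thread, nowhere near what the real 'file' command knows, and the function name sniff is made up. Very old tar archives lack the "ustar" marker and would come back as unknown.

```python
def sniff(data: bytes) -> str:
    """Guess a format from leading bytes alone, never from the file name."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":
        return "bzip2"
    if data[:2] == b"\x1f\x9d":
        return "compress (.Z)"
    # ustar archives carry the marker "ustar" at byte offset 257 of the header.
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read the first block of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff(handle.read(512))
```

Rename a tarball to "somearchive.important" and sniff_path gives the same answer as before, since the name is never consulted.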
In a similar notion, there is no bullshit .EXE, .BAT, or .CMD. A file is denoted as executable by a filesystem permissions bit. If a file is binary the active shell will attempt to execute it directly. If it's a text file (a script), the first line of the script traditionally denotes the necessary interpreter to use via a #!/path/to/interpreter syntax. The conventions of tacking a .sh, .zsh, .ksh, .perl, or whatever onto the script name are purely a matter of human convenience and communication.
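Both pieces of that, the permission bit and the #! line, can be inspected with a few lines of Python. This is a minimal sketch for Unix-like systems. The function name describe_script is hypothetical, and it only checks the owner's execute bit.

```python
import os
import stat


def describe_script(path: str) -> dict:
    """Report the owner-execute bit and the #! line of a file, if it has one."""
    mode = os.stat(path).st_mode
    with open(path, "rb") as handle:
        first_line = handle.readline().rstrip(b"\r\n")
    interpreter_line = None
    if first_line.startswith(b"#!"):
        # Everything after "#!" is the interpreter path plus any argument.
        interpreter_line = first_line[2:].strip().decode("utf-8", errors="replace")
    return {
        "executable": bool(mode & stat.S_IXUSR),
        "interpreter_line": interpreter_line,
    }
```

The file's name never enters into it: a script called "run.important" with the execute bit set and a #!/bin/sh first line is just as runnable as one called "run.sh".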
Regarding default tar behaviors, yes many Unix-family commands have default behaviors that may not immediately seem optimal. It will help to understand this when you understand the Unix philosophy of spartan utility and "Ok, boss, whatever you say." Unix does not try to hold your hand and almost never assumes. It does what you tell it and expects you to be clear and unambiguous. You will eventually find this refreshing and empowering and be infuriated by braindead, hand-holding like you are forced to suffer in Windows (like the "Ultimate Professional" version of Windoze repeatedly warning you every damn time you rename a file "If you change a file name extension, the file might become unusable. Are you sure you want to change it?" -- FUCK YES you retarded crapsack of shit toy OS, I know what the hell I am doing!)
Welcome to a better world.