Where to embed AppImage magic bytes in the future? #26

Open
TheAssassin opened this issue Mar 24, 2019 · 19 comments

Comments

@TheAssassin
Member

I've been thinking about the magic bytes problem we have with the current type 2 a bit. The only solution is to move these magic bytes to another location. This location must also be at a specific offset and provide enough space for them.

One solution I find especially appealing is the following: AppImage type 1 used ISO9660, whose magic number may be embedded at three specific locations: 0x8001, 0x8801 and 0x9001. Obviously these locations didn't interfere with the ELF binary. In decimal, these offsets lie at roughly 32-36 KiB; our runtime is larger (~150-200 KiB). So the question is: why did ELF and ISO9660 work together in that scenario? And, more interestingly, can we just place our own magic bytes where, in the case of type 1, the ISO9660 bytes would be expected?
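To make the three candidate offsets concrete, here is a minimal sketch of a check that looks for a magic string at each of them. The function name `has_magic_at_iso_offsets` is hypothetical; the magic is passed in by the caller, so the same check would work for the ISO9660 identifier `CD001` or for any AppImage-specific bytes placed there instead.

```c
#include <stddef.h>
#include <string.h>

/* The three offsets at which the ISO9660 magic may appear (from the comment above). */
static const size_t iso9660_offsets[] = { 0x8001, 0x8801, 0x9001 };

/* Returns 1 if buf (of length len) contains `magic` at any of the candidate
 * offsets, 0 otherwise. Offsets that do not fit into buf are skipped. */
int has_magic_at_iso_offsets(const unsigned char *buf, size_t len,
                             const char *magic, size_t magic_len)
{
    size_t count = sizeof iso9660_offsets / sizeof iso9660_offsets[0];
    for (size_t i = 0; i < count; i++) {
        size_t off = iso9660_offsets[i];
        if (off + magic_len <= len && memcmp(buf + off, magic, magic_len) == 0)
            return 1;
    }
    return 0;
}
```

Note that a caller has to read at least 0x9001 plus the magic length (about 36 KiB) from the file before all three offsets can be checked.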

@TheAssassin
Member Author

@azubieta also had a few suggestions. Can you please put them in here, too? Just make sure they're at constant offsets, this is an absolute requirement.

@probonopd
Member

There are limits on how far into the file magic bytes should be located (for performance reasons), and what we did with type-1 was outside of what is allowable iirc.

@TheAssassin
Member Author

You mean performance wise? Modern filesystems use (at least!) 4kiB blocks, and with caching and everything it doesn't make a large difference how much you read from the file beginning; with HDDs for instance, you lose much more performance by having to relocate the actuator. As the files have most likely been written sequentially, reading 8 or more 4kiB blocks sequentially doesn't really affect performance.

@probonopd
Member

probonopd commented Mar 24, 2019

I don't mean anything, I am reciting from the top of my mind what we had learned from shared-mime-info when we were putting together the type-2 spec draft. I think it has to do with network shares. cc @elvisangelaccio

@TheAssassin
Member Author

TheAssassin commented Mar 24, 2019

Interesting thought. Do you have a reference?


Here is what @azubieta was explaining to me last week:

The ELF format is, to some extent, flexible. The only part which is really at a static offset (0) is the ELF "header". The ELF header has a fixed size. Figure 1-3 in the ELF specification describes it as a struct of fixed-size fields.

#define EI_NIDENT 16

typedef struct {
    unsigned char    e_ident[EI_NIDENT];
    Elf32_Half       e_type;
    Elf32_Half       e_machine;
    Elf32_Word       e_version;
    Elf32_Addr       e_entry;
    Elf32_Off        e_phoff;
    Elf32_Off        e_shoff;
    Elf32_Word       e_flags;
    Elf32_Half       e_ehsize;
    Elf32_Half       e_phentsize;
    Elf32_Half       e_phnum;
    Elf32_Half       e_shentsize;
    Elf32_Half       e_shnum;
    Elf32_Half       e_shstrndx;
} Elf32_Ehdr;

Figure 1-3, ELF specification.

Figure 1-1 provides an overview of the ELF format. There are more structures, like the (optional) program header table and the section header table, whose locations are described by some sort of "pointers" in the ELF header. e_phoff, for instance, holds the offset in bytes at which the program header table starts (0 if there is none), e_shoff holds the offset of the section header table, etc. e_ehsize contains the size of the ELF header itself.

[Image: Figure 1-1, ELF specification.]
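The "pointer" fields described above can be inspected with the definitions from `<elf.h>`. This is a minimal sketch, assuming a Linux system that ships `<elf.h>`, a 64-bit ELF file (the struct quoted above is the 32-bit variant), and a host with the same byte order as the file. `read_elf64_header` is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Copies the ELF64 header out of buf. Returns 0 on success, or -1 if buf is
 * too short, does not start with the ELF magic, or is not a 64-bit ELF file.
 * Assumes the file's byte order matches the host's. */
int read_elf64_header(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof *out)
        return -1;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(out, buf, sizeof *out);   /* memcpy avoids unaligned access */
    return 0;
}
```

After a successful call, `out->e_phoff`, `out->e_shoff` and `out->e_ehsize` hold the values discussed above. Only the header itself sits at a fixed offset; everything else has to be found through these fields.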

The only entity we know to have a fixed size is the ELF header. What we could do is insert our magic bytes right after the ELF header (e.g., by defining "we use an additional byte there" to achieve some alignment), then append the rest of the ELF file and add <size of magic bytes area, e.g., 4 or 8> to all the offsets and sizes where applicable.

This shouldn't be overly hard to implement: we can write a small C program, build it before the runtime, and run it on the runtime as a post-build command to insert our magic bytes.

We should document this thoroughly, but it's not too hard to understand: the ELF specification is well written and accessible to anyone with a basic understanding of file formats.
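As a rough illustration of the "add the size of the magic bytes area to the offsets" step, the sketch below adjusts only the two offsets stored in the ELF header itself. It assumes a 64-bit ELF file and `<elf.h>`; `shift_header_offsets` is a hypothetical name. A real tool would also have to adjust the `p_offset` of every program header and the `sh_offset` of every section header that points past the insertion point, which is not shown here.

```c
#include <elf.h>
#include <stdint.h>

/* Adds `pad` bytes to the file offsets stored in the ELF64 header itself.
 * An offset of 0 means "table not present" and is left untouched.
 * This covers the header-level offsets only, not the per-segment or
 * per-section offsets. */
void shift_header_offsets(Elf64_Ehdr *eh, uint64_t pad)
{
    if (eh->e_phoff != 0)
        eh->e_phoff += pad;
    if (eh->e_shoff != 0)
        eh->e_shoff += pad;
}
```

For example, a program header table that directly follows a 64-byte header (`e_phoff == 64`) would end up at offset 72 after inserting an 8-byte magic area.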

@probonopd
Member

Sounds like a good idea to insert a section for our use (e.g., magic bytes) right after the ELF header. I have no clue on how to do it though. What happens if the ELF header grows?

@TheAssassin
Member Author

I would not use the word "section" for this approach, since it may be mixed up with real "ELF sections". (If you meant those: they're unsuitable. We need a static offset, but you can't predict their location; finding them requires at least one lookup in the ELF header first, which is unsuitable for many applications, e.g., binfmt_misc.)

Define "grow". The order is basically, ELF header -> our stuff -> PHT, sections etc. Whatever would be added which could make the header grow would grow after the ELF header and our magic bytes section. As long as applications don't tamper with the offsets, things will continue to work.

Of course, this time we should test our new ELF runtime on all platforms where we know things are broken. Basically, test-run the current runtime, and if it breaks, try the new binary. I can well imagine some tools ignore, e.g., the offsets and make assumptions not mentioned in the standard. But, to be honest, the old magic bytes were doing the same (assuming that interpreters will ignore our magic bytes). This approach has the advantage that if an implementation doesn't like it, we have a very good argument why the other projects need to fix their implementation: they violate/ignore the ELF specification (much better than saying "but the ELF spec says it's an optional thing, and you don't have to implement it" or "all the other implementations ignore it, too").

@probonopd
Member

I meant "section" as in ELF section. Can't we make an ELF section at a defined offset?

@TheAssassin
Member Author

No, see what I described above. The PHT is optional, but what if a file has one? Section header entries all have the same size within a file, but that size differs between files, so how can we make guarantees? We've thoroughly analyzed this, and there's no way to make this predictable.

What @azubieta (and now I, too) suggest is also fully compatible with the ELF specification.

@elvisangelaccio
Contributor

> I don't mean anything, I am reciting from the top of my mind what we had learned from shared-mime-info when we were putting together the type-2 spec draft. I think it has to do with network shares. cc @elvisangelaccio

Basically yes. Quoting from shared-mime-info's hacking file:

> Magic offset must be as small as possible. For example, the worst case
> scenario for ISO images is 32k inside the file. This is too big for a sniff
> buffer, especially on remote locations. Avoid those.

@TheAssassin
Member Author

@elvisangelaccio yeah, that makes sense. "sniff buffers"... hadn't heard about those for a while, but I remember what they are for. The ISO9660 thing I suggested above is only interesting and relevant for us because we know it works.

I would claim that anything < 1k (or even 4k) is okay.

@TheBrokenRail

I wrote a small program that inserts empty space between the ELF header and everything else: https://gitea.thebrokenrail.com/TheBrokenRail/elf-padder.

Biggest problem is that this empty space must be rounded to a page boundary (usually 4KB) or the ELF will fail to load.
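The rounding mentioned here can be sketched as follows. This is not taken from elf-padder's source; `round_up_to_page` is a hypothetical helper that shows the arithmetic only.

```c
#include <stddef.h>

/* Rounds `size` up to the next multiple of `page`. `page` must be non-zero.
 * A size that is already a multiple of `page` is returned unchanged. */
size_t round_up_to_page(size_t size, size_t page)
{
    return (size + page - 1) / page * page;
}
```

With a 4096-byte page, a requested padding of 4 bytes becomes 4096 bytes. On POSIX systems the actual page size could be queried with `sysconf(_SC_PAGESIZE)` rather than hard-coded.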

Second biggest problem is that I've barely tested this at all. I was able to add 4KB of empty space to both /bin/sh and qemu-arm-static and both successfully loaded. I've only tested with 64-bit files, but 32-bit should work too.

But it is able to insert 4KB (or more) of empty space at a constant location in an ELF file.

Usage:

elf-padder <minimum-padding> <file> > <output-file>

Example:

$ ./elf-padder 4 /usr/bin/python3.8 > ./new-python
[INFO]: Aligning Padding To Page Boundary: 4096
$ chmod +x new-python 
$ ./new-python 
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

I hope this is helpful!

@probonopd
Member

Thanks @TheBrokenRail. Do you think this kind of functionality could/should be added to https://github.com/NixOS/patchelf?

@TheBrokenRail

> Thanks @TheBrokenRail. Do you think this kind of functionality could/should be added to https://github.com/NixOS/patchelf?

The code is pretty simple, so I don't think it would be too difficult to adapt to patchelf's code-base.

However, I think adding empty space after the ELF header is a bit out-of-scope for patchelf, since its main purpose is modifying binaries to work on systems with different library names and/or locations. The real decision would be up to the patchelf maintainers of course.

@azubieta
Contributor

@TheBrokenRail agreed, this feature has no place within the patchelf project's scope. Also, we have a way of doing it at build time using linker scripts. Please check: https://github.com/AppImageCrafters/appimage-runtime

@TheBrokenRail

That definitely looks like a much better solution!

@probonopd
Member

probonopd commented May 5, 2022

After discussion on IRC, @azubieta and @TheAssassin are suggesting 0x400 as the future location for AppImage magic bytes. This way, we would no longer have to put them into a reserved location in the ELF header.

Can possibly be implemented like this, without the need for patchelf nor dd:
probonopd/static-tools@f2c83bc
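On the reading side, a check for magic bytes at 0x400 could look like the sketch below. The thread does not say which bytes would be stored there, so `PLACEHOLDER_MAGIC` is purely hypothetical (it reuses the byte values of the type 2 magic, as far as I know them), and `has_magic_at_0x400` is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

#define APPIMAGE_MAGIC_OFFSET 0x400   /* offset suggested in the comment above */

/* Hypothetical placeholder: the actual future magic bytes are not defined
 * in this thread. */
static const unsigned char PLACEHOLDER_MAGIC[] = { 'A', 'I', 0x02 };

/* Returns 1 if buf (of length len) holds the placeholder magic at 0x400. */
int has_magic_at_0x400(const unsigned char *buf, size_t len)
{
    size_t end = APPIMAGE_MAGIC_OFFSET + sizeof PLACEHOLDER_MAGIC;
    return len >= end &&
           memcmp(buf + APPIMAGE_MAGIC_OFFSET, PLACEHOLDER_MAGIC,
                  sizeof PLACEHOLDER_MAGIC) == 0;
}
```

Since 0x400 is 1024 in decimal, a reader needs only slightly more than 1 KiB of the file, far less than the 32 KiB worst case quoted from shared-mime-info.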

@probonopd pinned this issue May 5, 2022

@heinrich5991

> I've been thinking about the magic bytes problem we have with the current type 2 a bit.

What are the magic bytes problems that currently exist with type 2?

Is it stuff like this? linuxdeploy/linuxdeploy#86 (comment) (AppImages not executing in certain environments).

I tried to find the root cause and ended up at this issue. Is this the correct one?

The magic bytes seem to clash with field e_ident[EI_ABIVERSION] of the ELF header (https://en.wikipedia.org/w/index.php?title=Executable_and_Linkable_Format&oldid=1146617882):

> Further specifies the ABI version. Its interpretation depends on the target ABI. Linux kernel (after at least 2.6) has no definition of it,[6] so it is ignored for statically-linked executables. In that case, offset and size of EI_PAD are 8.
>
> glibc 2.12+ in case e_ident[EI_OSABI] == 3 treats this field as ABI version of the dynamic linker:[7] it defines a list of dynamic linker's features,[8] treats e_ident[EI_ABIVERSION] as a feature level requested by the shared object (executable or dynamic library) and refuses to load it if an unknown feature is requested, i.e. e_ident[EI_ABIVERSION] is greater than the largest known feature.[9]

Is that what causes the "file not found" error?
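To illustrate the clash, the sketch below returns the byte a loader would interpret as e_ident[EI_ABIVERSION]. It assumes, based on my understanding of the type 2 format and not on anything stated in this thread, that the type 2 magic bytes 0x41 0x49 0x02 start at offset 8 of the file. The index constants are defined locally and `abiversion_seen_by_loader` is a hypothetical name.

```c
#include <stddef.h>

/* e_ident indices per the ELF specification, defined locally so the sketch
 * does not depend on <elf.h>. */
enum { IDX_OSABI = 7, IDX_ABIVERSION = 8 };

/* Returns the value a loader would read as e_ident[EI_ABIVERSION], or -1 if
 * buf is too short to contain that byte. */
int abiversion_seen_by_loader(const unsigned char *buf, size_t len)
{
    return len > IDX_ABIVERSION ? buf[IDX_ABIVERSION] : -1;
}
```

Under that assumption, the first magic byte 'A' (0x41) lands in the EI_ABIVERSION slot, so a dynamic linker that validates the field would see a requested feature level of 65.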

@TheAssassin
Member Author

Your analysis is pretty much on point. Some ld-linux are more tolerant and won't refuse to run, whereas others are more picky. Per the spec, the latter ones are totally right, though. Best not to clash with these header fields at all.
