FTL data file

FTL is a very nice (and quite difficult) rogue-like-ish game with space battles, teleporters, management of the energy of your ship, asteroid fields, alien species, droids (drones), etc. It is quite cheap, DRM-free and available natively on Intel-based GNU/Linux. These are notes taken while trying to work out the format of the game's .dat files, which contain the game assets, ship statistics, events, etc., at a time when I had no internet access to look up the answer. There is a companion C program, ftldat, for extracting the files within the archives and for generating archives. Unsurprisingly, similar tools with the same name already exist. However, a description of the process of reverse-engineering a (very simple) binary format might be interesting to someone out there.

Trying to see what's in the FTL data files, we find two binary files, data.dat and resource.dat. The latter is quite large and obviously contains the assets of the game. The former is quite small and, looking at it, we find an interesting structure as well as embedded XML and text files containing the ship statistics and layouts, the tutorial, character names, events, achievements, etc.

Looking at data.dat

The file command doesn't know what this file is supposed to be:

$ file data.dat
data.dat: data

Looking at the content of this file with less, we find:

  1. the beginning is very regular with what looks like increasing sequences of little-endian 32-bit values,

                 0 1  2 3  4 5  6 7  8 9  a b  c d  e f
      00000000: 680c 0000 a431 0000 fa37 0000 213b 0000  h....1...7..!;..
      00000010: 7e41 0000 bb48 0000 a34d 0000 ae51 0000  ~A...H...M...Q..
      00000020: ac16 0100 af18 0100 1968 0100 b3a0 0100  .........h......
      00000030: 4fa8 0100 9aae 0100 e5b4 0100 31bb 0100  O...........1...
      00000040: 87bc 0100 dbc2 0100 f2c4 0100 87cc 0100  ................
  2. then a lot of zeros,

                  0 1  2 3  4 5  6 7  8 9  a b  c d  e f
      000002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      000002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      00000300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      00000310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
      00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  3. then a lot of (XML and text) files prepended with their file names,

                0 1  2 3  4 5  6 7  8 9  a b  c d  e f
      000031a0: 0000 0000 2f06 0000 1f00 0000 6461 7461  ..../.......data
      000031b0: 2f6a 656c 6c79 5f63 726f 6973 7361 6e74  /jelly_croissant
      000031c0: 5f70 6972 6174 652e 786d 6c3c 212d 2d20  _pirate.xml<!-- 

This looks like an archive of files.

File structure

There is no terminator at the end of the file name, so the length of the file name must be stored somewhere else. This file name is 31 (0x1f) bytes long, and that value is found just a few bytes before the name: apparently bytes 0x000031a8--0x000031ab are the file name size as a little-endian 32-bit integer. The preceding 32 bits hold a bigger value, 0x62f (1583), which is probably the length of the file data.
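As a quick check of that reading of the dump, the eight bytes starting at 0x31a4 can be decoded by hand. This is only a minimal sketch: read_le32 and entry_header are names introduced here for illustration, and the bytes are copied from the hex dump above.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from four raw bytes
   (helper name chosen for this sketch). */
uint32_t read_le32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bytes 0x31a4..0x31ab, copied from the hex dump. */
const unsigned char entry_header[8] = {
  0x2f, 0x06, 0x00, 0x00, /* candidate data size */
  0x1f, 0x00, 0x00, 0x00  /* candidate name size */
};
```

read_le32(entry_header) gives 0x62f (1583) and read_le32(entry_header + 4) gives 0x1f (31), which is the length of data/jelly_croissant_pirate.xml.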

It seems that a file is described by the following structure (in pseudo-C):

struct ftl_file {
  uint32_t data_size; // Little endian
  uint32_t name_size; // Little endian
  char name[name_size]; // Not NUL-terminated
  char data[data_size];
};
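Because both arrays have a variable length, this structure cannot be declared as a real C struct and read in one go. One possible way to handle it, assuming the whole archive has been loaded into memory, is to decode the two sizes and keep pointers into the buffer. The names ftl_entry_view, ftl_le32 and ftl_parse_entry are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* View of one entry inside an archive held in memory
   (hypothetical type, not taken from ftldat). */
struct ftl_entry_view {
  uint32_t data_size;
  uint32_t name_size;
  const unsigned char *name; /* not NUL-terminated */
  const unsigned char *data;
};

/* Decode a 32-bit little-endian value. */
uint32_t ftl_le32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the entry at `offset` in `buf` (which is `len` bytes long).
   Returns 0 on success, -1 if the entry would run past the buffer. */
int ftl_parse_entry(const unsigned char *buf, size_t len, size_t offset,
                    struct ftl_entry_view *out) {
  if (offset > len || len - offset < 8)
    return -1;
  out->data_size = ftl_le32(buf + offset);
  out->name_size = ftl_le32(buf + offset + 4);

  /* Bytes left after the two size fields. */
  size_t rest = len - offset - 8;
  if (out->name_size > rest || out->data_size > rest - out->name_size)
    return -1;

  out->name = buf + offset + 8;
  out->data = out->name + out->name_size;
  return 0;
}
```

The bounds checks are there because the sizes come straight from the file and cannot be trusted.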

We can get the list of the files in the archive with:

strings data.dat | grep ^data | sed 's/\(\....\).*/\1/'

Which gives:



Now that the structure of the end of the archive is understood, the following questions remain:

  1. What is the role of the beginning of the archive?

  2. How are the file structures located inside the archive?

The first part of the archive is a list of increasing little-endian 32-bit values.

The ftl_file structure for the first file (data/jelly_croissant_pirate.xml) is at offset 0x31a4 within the archive. It turns out that this value is the second 32-bit integer in the archive (at offset 0x4). Each subsequent value is the offset of another file structure.

The following zeros are probably unused/empty offset slots.

The only unexplained part of the archive is the meaning of the first 32-bit value. It does not give the offset of a file structure. It's probably not a magic number, because changing the value slightly does not prevent the game from loading. However, using 0 makes the program crash, so it's not useless either.

Excluding the first 32-bit value of the file (which is not an offset within the file), there are 0x31a0/4 = 0xc68 offset slots before the first file structure, either used or unused. This is exactly the value of the first 32 bits. It seems the first 32-bit value is the number of offset/file slots.
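The arithmetic can be written down as two small helpers (the names are invented for this sketch). The second one additionally assumes that the file structures are stored back to back with no padding, which is only a guess at this point.

```c
#include <stdint.h>

/* Size in bytes of the header: the slot count itself
   plus one 32-bit offset per slot. */
uint32_t ftl_header_size(uint32_t slots_count) {
  return 4 + 4 * slots_count;
}

/* Offset just past a file structure: two 32-bit sizes,
   then the name, then the data. */
uint32_t ftl_entry_end(uint32_t offset, uint32_t name_size,
                       uint32_t data_size) {
  return offset + 8 + name_size + data_size;
}
```

ftl_header_size(0xc68) is 0x31a4, the offset of the first file structure. ftl_entry_end(0x31a4, 0x1f, 0x62f) is 0x37fa, which is the third 32-bit value of the archive (fa37 0000 in the dump), so at least the first two files do appear to be packed one right after the other.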

Summary of the file structure

  1. number of offset/file slots (32-bit little-endian);

  2. 32-bit little-endian offsets of each file structure within the archive (zeros are ignored);

  3. at each non-zero offset, a file described by:

     a. file data size (32-bit little-endian);

     b. file name size (32-bit little-endian);

     c. file name;

     d. file data.

In pseudo-C, the archive starts with a header:

struct ftl_data_header {
  uint32_t slots_count; // Little-endian
  uint32_t file_offsets[slots_count]; // Little-endian offsets for struct ftl_file
};
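With the header understood, the file names can be listed by walking the offset table instead of relying on strings. The following is a sketch under the format assumptions above; ftl_read_u32 and ftl_list are hypothetical names, and the values are decoded byte by byte so that the code does not depend on the byte order of the host.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one 32-bit little-endian value; returns 0 on success. */
int ftl_read_u32(FILE *f, uint32_t *value) {
  unsigned char b[4];
  if (fread(b, 1, 4, f) != 4)
    return -1;
  *value = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
  return 0;
}

/* Print the name of every file in the archive to `out`.
   Returns the number of files found, or -1 on a read error. */
long ftl_list(FILE *archive, FILE *out) {
  uint32_t slots_count;
  if (fseek(archive, 0, SEEK_SET) != 0 ||
      ftl_read_u32(archive, &slots_count) != 0)
    return -1;

  long found = 0;
  for (uint32_t i = 0; i != slots_count; ++i) {
    uint32_t offset, data_size, name_size;
    /* Slot i is stored right after the 4-byte slot count. */
    if (fseek(archive, 4 + 4 * (long)i, SEEK_SET) != 0 ||
        ftl_read_u32(archive, &offset) != 0)
      return -1;
    if (offset == 0)
      continue; /* unused slot */

    /* The data size comes first and has to be read to reach the name size. */
    if (fseek(archive, (long)offset, SEEK_SET) != 0 ||
        ftl_read_u32(archive, &data_size) != 0 ||
        ftl_read_u32(archive, &name_size) != 0)
      return -1;

    char *name = malloc((size_t)name_size + 1);
    if (name == NULL || fread(name, 1, name_size, archive) != name_size) {
      free(name);
      return -1;
    }
    name[name_size] = '\0';
    fprintf(out, "%s\n", name);
    free(name);
    ++found;
  }
  return found;
}
```

Calling ftl_list(file, stdout) on an archive opened with fopen(..., "rb") prints one name per line. Seeking to each slot in turn is slower than reading the whole table at once, but it avoids allocating memory based on an unchecked slot count.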

Extracting the files

We can extract the archive with this code (simplified version excluding error handling and endianness issues for exposition):

FILE* file = fopen("data.dat", "rb");

uint32_t slots_count;
fread(&slots_count, sizeof(slots_count), 1, file);

uint32_t* slots = malloc(slots_count * sizeof(uint32_t));
fread(slots, sizeof(uint32_t), slots_count, file);

for (uint32_t i = 0; i != slots_count; ++i) {

  uint32_t offset = slots[i];
  if (offset == 0)
    continue; // Unused slot
  fseek(file, offset, SEEK_SET);

  uint32_t data_size;
  fread(&data_size, sizeof(data_size), 1, file);

  uint32_t name_size;
  fread(&name_size, sizeof(name_size), 1, file);

  char* name = malloc(name_size + 1);
  fread(name, 1, name_size, file);
  name[name_size] = '\0';

  void* data = malloc(data_size);
  fread(data, 1, data_size, file);

  FILE* output = fopen(name, "wb");
  fwrite(data, 1, data_size, output);
  fclose(output);

  free(data);
  free(name);
}

free(slots);
fclose(file);
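One practical detail: the names stored in the archive contain directories (data/jelly_croissant_pirate.xml), and fopen does not create missing directories, so they have to exist before the output file is opened. A possible helper is sketched below. It is POSIX-only, the name make_parent_dirs is invented here, and it is not necessarily how ftldat handles this.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create every missing directory leading up to `path`:
   "data/a/b.xml" creates "data" and then "data/a".
   Returns 0 on success, -1 on failure. */
int make_parent_dirs(const char *path) {
  if (path[0] == '\0')
    return 0;

  char *copy = malloc(strlen(path) + 1);
  if (copy == NULL)
    return -1;
  strcpy(copy, path);

  /* Start at index 1 so that a leading '/' is not cut to an empty path. */
  for (char *p = copy + 1; *p != '\0'; ++p) {
    if (*p != '/')
      continue;
    *p = '\0'; /* temporarily cut the path at this separator */
    if (mkdir(copy, 0777) != 0 && errno != EEXIST) {
      free(copy);
      return -1;
    }
    *p = '/';
  }
  free(copy);
  return 0;
}
```

It would be called as make_parent_dirs(name) just before fopen(name, "wb"). Since the names come verbatim from the archive, a careful extractor would probably also want to reject absolute paths and ".." components.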


Extracting resource.dat

The same format is used for resource.dat and we can extract the assets (images, sounds, music, etc.) with the same program:


Recreating the archive

The archive can be recreated with this code (again excluding error handling and endianness):

FILE* output = fopen(archive_name, "wb");

uint32_t slots_count = 0xc68;
if (file_count > slots_count)
  slots_count = file_count;
uint32_t temp_slot_count = htole32(slots_count);
fwrite(&temp_slot_count, sizeof(uint32_t), 1, output);

uint32_t* slots = calloc(slots_count, sizeof(uint32_t));
fwrite(slots, sizeof(uint32_t), slots_count, output);

for (int i = 0; i != file_count; ++i) {

  const char* file_name = files[i];
  long offset = ftell(output);
  slots[i] = htole32(offset);
  FILE* file = fopen(file_name, "rb");

  int fd = fileno(file);
  struct stat file_stat;
  fstat(fd, &file_stat); // Needed to get the size of the input file
  uint32_t data_size = file_stat.st_size;
  uint32_t temp_data_size = htole32(data_size);
  fwrite(&temp_data_size, sizeof(uint32_t), 1, output);

  uint32_t name_size = strlen(file_name);
  uint32_t temp_name_size = htole32(name_size);
  fwrite(&temp_name_size, sizeof(uint32_t), 1, output);
  fwrite(file_name, sizeof(char), name_size, output);

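  char* data = malloc(data_size);
  fread(data, sizeof(char), data_size, file);
  fwrite(data, sizeof(char), data_size, output);

  free(data);
  fclose(file);
}

// Go back and fill in the offset table now that the offsets are known
fseek(output, sizeof(uint32_t), SEEK_SET);
fwrite(slots, sizeof(uint32_t), slots_count, output);

free(slots);
fclose(output);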
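Note that htole32 is not standard C: on glibc it comes from <endian.h>, and other systems declare it elsewhere or not at all. A portable alternative, sketched here with an invented helper name write_le32, is to emit the four bytes explicitly.

```c
#include <stdint.h>
#include <stdio.h>

/* Write `value` as four little-endian bytes, whatever the host byte order.
   Returns 0 on success, -1 on a write error. */
int write_le32(FILE *out, uint32_t value) {
  unsigned char b[4] = {
    (unsigned char)(value & 0xff),
    (unsigned char)((value >> 8) & 0xff),
    (unsigned char)((value >> 16) & 0xff),
    (unsigned char)((value >> 24) & 0xff)
  };
  return fwrite(b, 1, 4, out) == 4 ? 0 : -1;
}
```

With this helper, write_le32(output, slots_count) would replace the htole32/fwrite pair, and the slots array could hold host-order values that are written out one by one at the end. For 0xc68 it produces the bytes 68 0c 00 00, as seen at the start of data.dat.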