
FTL data file

Faster Than Light (FTL) is a very nice (and quite difficult) rogue-like-ish game with space battles, teleporters, management of your ship's energy, asteroid fields, alien species, droids (drones), etc. It is quite cheap, DRM-free and available natively on Intel-based GNU/Linux. These are notes taken while trying to figure out the format of the game's .dat files, which contain the game assets, ship statistics, events, etc., at a time when I did not have access to the internet to look up the solution. There is a companion C program, ftldat, for extracting the files from the archives and for generating archives. Unsurprisingly, similar tools with the same name already exist. However, the description of the process of reverse-engineering a (very simple) binary format might be interesting for someone out there.

Looking at the FTL data files, we find two binary files, data.dat and resource.dat. The latter is quite large and obviously contains the game assets. The former is quite small and, looking at it, we find an interesting structure as well as embedded XML and text files containing the ship statistics and layouts, the tutorial, character names, events, achievements, etc.

Looking at data.dat

The file command does not know what this file is supposed to be:

$ file data.dat
data.dat: data

Looking at the content of this file with less, we find:

  1. the beginning is very regular, with what looks like an increasing sequence of little-endian 32-bit values,
               0 1  2 3  4 5  6 7  8 9  a b  c d  e f
    00000000: 680c 0000 a431 0000 fa37 0000 213b 0000  h....1...7..!;..
    00000010: 7e41 0000 bb48 0000 a34d 0000 ae51 0000  ~A...H...M...Q..
    00000020: ac16 0100 af18 0100 1968 0100 b3a0 0100  .........h......
    00000030: 4fa8 0100 9aae 0100 e5b4 0100 31bb 0100  O...........1...
    00000040: 87bc 0100 dbc2 0100 f2c4 0100 87cc 0100  ................
    
  2. then a lot of zeros,
               0 1  2 3  4 5  6 7  8 9  a b  c d  e f
    000002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    000002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    
  3. then many (XML and text) files, each preceded by its file name,
               0 1  2 3  4 5  6 7  8 9  a b  c d  e f
    000031a0: 0000 0000 2f06 0000 1f00 0000 6461 7461  ..../.......data
    000031b0: 2f6a 656c 6c79 5f63 726f 6973 7361 6e74  /jelly_croissant
    000031c0: 5f70 6972 6174 652e 786d 6c3c 212d 2d20  _pirate.xml<!--
    000031d0: 436f 7079 7269 6768 7420 2863 2920 3230  Copyright (c) 20
    000031e0: 3132 2062 7920 5375 6273 6574 2047 616d  12 by Subset Gam
    000031f0: 6573 2e20 416c 6c20 7269 6768 7473 2072  es. All rights r
    00003200: 6573 6572 7665 6420 2d2d 3e0d 0a0d 0a3c  eserved -->....<
    

This looks like an archive of files.

File structure

There is no terminator at the end of the file name, so the length of the file name must be stored somewhere else. This file name is 31 (0x1f) characters long, and 0x1f is found just a few bytes before the name: the bytes 0x000031a8 to 0x000031ab apparently hold the file name length as a little-endian 32-bit integer. The preceding 32 bits hold a larger value, 0x62f (1583), which is probably the length of the file data.

It seems that a file is described by the following structure (in pseudo-C):

struct ftl_file {
  uint32_t data_size; // Little endian
  uint32_t name_size; // Little endian
  char name[name_size];
  char data[data_size];
};
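To check this layout against the dump, the sketch below decodes the 8 bytes found at 0x31a4 (2f06 0000 1f00 0000). It is a minimal illustration, not code from ftldat; the names read_le32, ftl_entry_header and read_entry_header are made up for the example. The sizes are assembled byte by byte, so the result does not depend on the byte order of the host:

```c
#include <stdint.h>

/* Decode a little-endian 32-bit value byte by byte, independently of
   the byte order of the host CPU. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical name for the two sizes stored in front of each file. */
struct ftl_entry_header {
    uint32_t data_size;
    uint32_t name_size;
};

/* Decode the 8-byte entry header at p. The file name starts at p + 8 and
   the data at p + 8 + name_size; neither is NUL-terminated. */
static struct ftl_entry_header read_entry_header(const unsigned char *p)
{
    struct ftl_entry_header h;
    h.data_size = read_le32(p);
    h.name_size = read_le32(p + 4);
    return h;
}
```

With name_size equal to 31, the data should start at 0x31a4 + 8 + 31 = 0x31cb, which is where the `<!--` of the XML comment appears in the dump.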

We can get a rough list of the files in the archive (this assumes that the names start with data and have a three-letter extension) with:

strings data.dat | grep ^data | sed 's/\(\....\).*/\1/'

This gives:

data/jelly_croissant_pirate.xml
data/boss_1_easy.txt
data/mantis_scout_pirate.xml
data/crystal_cruiser.xml
data/jelly_cruiser_2.txt
data/kestral.txt
data/dlcBlueprintsOverwrite.xml
data/rebel_long_pirate.txt
data/tutorial.xml
data/achievements.xml
data/kestral_3.xml
data/fed_scout.xml
data/rock_scout.xml
data/jelly_button_pirate.xml
data/rock_scout.txt
data/rock_assault.xml
data/boss_3_easy.txt
data/rebel_long.xml
data/names.xml
data/dlcAnimations.xml
data/anaerobic_cruiser_2.txt
data/anaerobic_cruiser.txt
data/circle_bomber.xml
data/dlcEvents_anaerobic.xml
data/energy_bomber_pirate.xml
data/crystal_bomber.txt
data/dlcPirateBlueprints.xml
data/dlcSounds.xml
data/blueprints.xml
[...]

Now that the structure of the end of the archive is known, two questions remain:

  1. What is the role of the beginning of the archive?

  2. How are the file structures located inside the archive?

The first part of the archive is a list of increasing little-endian 32-bit values.

The ftl_file structure for the first file (data/jelly_croissant_pirate.xml) is at offset 0x31a4 within the archive. It turns out that this value is the second 32-bit integer in the archive (at offset 0x4). Each subsequent value is the offset of another file structure.

The following zeros are probably unused/empty offset slots.

The only unexplained part of the archive is the meaning of the first 32-bit value. It is not the offset of a file structure. It is probably not a magic number either, because changing the value slightly does not prevent the game from loading. However, setting it to 0 makes the game crash, so it is not ignored.

The slot table runs from offset 0x4 up to the first file structure at 0x31a4, i.e. 0x31a0 bytes, so there are 0x31a0/4 = 0xc68 offset slots (used or unused), not counting the first 32-bit value, which is not an offset. This is exactly the value of the first 32-bit integer: it seems to be the number of offset/file slots.
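The slot-count hypothesis can be checked with the first 8 bytes of the dump (680c 0000 a431 0000). In this sketch (the helper names are hypothetical), a header holding slots_count slots ends at 4 + 4 * slots_count; for data.dat this should coincide with the first file offset, assuming, as observed here, that the first file is stored right after the slot table:

```c
#include <stdint.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Offset of the first byte after the header: a 4-byte slot count followed
   by slots_count 4-byte offsets. Computed in 64 bits to avoid overflow. */
static uint64_t ftl_header_end(uint32_t slots_count)
{
    return 4 + 4 * (uint64_t)slots_count;
}
```

For 0xc68 slots this gives 4 + 0x31a0 = 0x31a4, the value stored in the first slot.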

Summary of the file structure

  1. number of offset/file slots (32-bit little-endian);

  2. 32-bit little-endian offsets of each file structure within the archive (zero offsets mark unused slots and are ignored);

  3. at each non-zero offset, a file is described by:

    a. file data size (32-bit little-endian);

    b. file name size (32-bit little-endian);

    c. file name (not NUL-terminated);

    d. file data.

In pseudo-C, the archive starts with a header:

struct ftl_data_header {
  uint32_t slots_count; // Little-endian
  uint32_t file_offsets[slots_count]; // Little-endian offsets for struct ftl_file
};
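The header and entry layout can be turned into a bounds-checked reader working on the whole archive loaded in memory, which also gives an exact file list instead of the strings heuristic. This is a minimal sketch; ftl_entry_view, ftl_slots_count and ftl_entry_at are hypothetical names, not taken from ftldat:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One file entry; name and data point into the archive buffer and the
   name is not NUL-terminated. */
struct ftl_entry_view {
    uint32_t data_size;
    uint32_t name_size;
    const unsigned char *name;
    const unsigned char *data;
};

/* Number of slots, or -1 if the buffer is too short for the header. */
static int64_t ftl_slots_count(const unsigned char *buf, size_t len)
{
    if (len < 4)
        return -1;
    uint32_t count = read_le32(buf);
    if ((uint64_t)count * 4 > len - 4)
        return -1;
    return count;
}

/* Slot i (0-based): returns 1 and fills *e for a used slot, 0 for an
   unused (zero) slot, -1 if the slot or its entry is out of bounds. */
static int ftl_entry_at(const unsigned char *buf, size_t len, uint32_t i,
                        struct ftl_entry_view *e)
{
    int64_t count = ftl_slots_count(buf, len);
    if (count < 0 || (int64_t)i >= count)
        return -1;
    uint32_t offset = read_le32(buf + 4 + 4 * (size_t)i);
    if (offset == 0)
        return 0;
    if (offset > len || len - offset < 8)
        return -1;
    uint32_t data_size = read_le32(buf + offset);
    uint32_t name_size = read_le32(buf + offset + 4);
    size_t rest = len - offset - 8; /* bytes left for the name and data */
    if (name_size > rest || data_size > rest - name_size)
        return -1;
    e->data_size = data_size;
    e->name_size = name_size;
    e->name = buf + offset + 8;
    e->data = e->name + name_size;
    return 1;
}
```

A listing tool would loop over every slot index and print each used entry with `printf("%.*s\n", (int)e.name_size, (const char *)e.name)`. Checking sizes against the remaining buffer length, rather than adding them, avoids integer overflow on a corrupted archive.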

Extracting the files

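We can extract the archive with this code (simplified for exposition: there is no error handling, and the create_directory_for() helper is not shown):

#include <endian.h>  /* le32toh() (glibc; <sys/endian.h> on some BSDs) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Creates the parent directories of path (not shown). */
int create_directory_for(const char* path);

int main(void)
{
  FILE* file = fopen("data.dat", "rb");

  /* Header: number of slots followed by one offset per slot */
  uint32_t slots_count;
  fread(&slots_count, sizeof(slots_count), 1, file);
  slots_count = le32toh(slots_count);

  uint32_t* slots = malloc(slots_count * sizeof(uint32_t));
  fread(slots, sizeof(uint32_t), slots_count, file);

  for (uint32_t i = 0; i != slots_count; ++i) {

    /* A zero offset marks an unused slot */
    uint32_t offset = le32toh(slots[i]);
    if (offset == 0)
      continue;
    fseek(file, offset, SEEK_SET);

    /* Entry header: data size, then name size */
    uint32_t data_size;
    fread(&data_size, sizeof(data_size), 1, file);
    data_size = le32toh(data_size);

    uint32_t name_size;
    fread(&name_size, sizeof(name_size), 1, file);
    name_size = le32toh(name_size);

    /* The stored name is not NUL-terminated */
    char* name = malloc(name_size + 1);
    fread(name, 1, name_size, file);
    name[name_size] = '\0';

    void* data = malloc(data_size);
    fread(data, 1, data_size, file);

    create_directory_for(name);
    FILE* output = fopen(name, "wb");
    fwrite(data, 1, data_size, output);

    fclose(output);
    free(data);
    free(name);
  }
  free(slots);
  fclose(file);
  return 0;
}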
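The extraction code relies on a create_directory_for() helper that the text does not define. A possible POSIX implementation, given here as a hypothetical sketch rather than the one used by ftldat, creates every parent directory of the path in turn, similar to running mkdir -p on the path's dirname:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create all the parent directories of path (the
   last component is assumed to be a file). Returns 0 on success and -1 on
   error, with errno set by malloc() or mkdir(). */
static int create_directory_for(const char *path)
{
    size_t len = strlen(path);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len + 1);

    int result = 0;
    /* Cut the path at each '/' (except a leading one), create the
       directory named by that prefix, then restore the '/'. */
    for (char *p = copy; *p != '\0'; ++p) {
        if (*p != '/' || p == copy)
            continue;
        *p = '\0';
        if (mkdir(copy, 0777) != 0 && errno != EEXIST) {
            result = -1;
            break;
        }
        *p = '/';
    }
    free(copy);
    return result;
}
```

An existing directory (EEXIST) is not treated as an error, so extracting over a previous extraction works. A path component that exists as a regular file also yields EEXIST and is only detected when the following fopen() fails.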

Extracting resource.dat

The same format is used for resource.dat, so we can extract the assets (images, sounds, music, etc.) with the same program:

audio/music/bp_MUS_RockmenBATTLE.ogg
audio/music/bp_MUS_DebrisEXPLORE.ogg
[...]
audio/waves/ui/select_down2.wav
audio/waves/ui/bp_SFX_NewShipUnlocked.ogg
[...]
img/pause_large_on.png
img/pause_teleport_leave.png
[...]

Recreating the archive

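The archive can be recreated with this code (again without error handling; integers are converted to little-endian with htole32()):

#include <endian.h>  /* htole32() (glibc; <sys/endian.h> on some BSDs) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Pack the given files into archive_name. Each name is stored as given,
   so it must be the path expected inside the archive (e.g. "data/tutorial.xml"). */
void create_archive(const char* archive_name, char** files, uint32_t file_count)
{
  FILE* output = fopen(archive_name, "wb");

  /* Same number of slots as the original data.dat, more if needed */
  uint32_t slots_count = 0xc68;
  if (file_count > slots_count)
    slots_count = file_count;
  uint32_t temp_slot_count = htole32(slots_count);
  fwrite(&temp_slot_count, sizeof(uint32_t), 1, output);

  /* Reserve the slot table (all zeros); it is rewritten at the end */
  uint32_t* slots = calloc(slots_count, sizeof(uint32_t));
  fwrite(slots, sizeof(uint32_t), slots_count, output);

  for (uint32_t i = 0; i != file_count; ++i) {

    const char* file_name = files[i];
    long offset = ftell(output);
    slots[i] = htole32((uint32_t) offset);
    FILE* file = fopen(file_name, "rb");

    /* Get the size of the file (the original code never called fstat) */
    int fd = fileno(file);
    struct stat file_stat;
    fstat(fd, &file_stat);
    uint32_t data_size = file_stat.st_size;
    uint32_t temp_data_size = htole32(data_size);
    fwrite(&temp_data_size, sizeof(uint32_t), 1, output);

    /* Name size, then the name without its NUL terminator */
    uint32_t name_size = strlen(file_name);
    uint32_t temp_name_size = htole32(name_size);
    fwrite(&temp_name_size, sizeof(uint32_t), 1, output);
    fwrite(file_name, sizeof(char), name_size, output);

    char* data = malloc(data_size);
    fread(data, sizeof(char), data_size, file);
    fwrite(data, sizeof(char), data_size, output);

    free(data);
    fclose(file);
  }

  /* Fill in the slot table now that the offsets are known */
  fseek(output, sizeof(uint32_t), SEEK_SET);
  fwrite(slots, sizeof(uint32_t), slots_count, output);

  free(slots);
  fclose(output);
}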
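htole32() comes from <endian.h>, which glibc and the BSDs provide (sometimes as <sys/endian.h>) but which is not part of ISO C. A portable alternative, sketched here with the hypothetical helper names encode_le32 and write_le32, writes the four bytes explicitly, least significant first:

```c
#include <stdint.h>
#include <stdio.h>

/* Store value in out[0..3], least significant byte first. */
static void encode_le32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Write value to f as 4 little-endian bytes; returns 0 on success and
   -1 if fwrite() could not write all four bytes. */
static int write_le32(FILE *f, uint32_t value)
{
    unsigned char buf[4];
    encode_le32(buf, value);
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : -1;
}
```

With such a helper, each temp_* variable and its fwrite() call collapse into a single call such as write_le32(output, data_size), and the result is the same on little-endian and big-endian hosts.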