



Interpreting the Morrowind File Format

by Farren Hayden

  1. Basic structure of a file
  2. Basic structure of a record
  3. Basic structure of record data
  4. Record and sub-record headers
  5. The structure of sub-record data
  6. Reading and Processing a Morrowind File
    1. Approach 1
    2. Approach 2
    3. Approach 3
    4. Approach 4
  7. A Note on Interpreting Data
Introduction
This tutorial will attempt to provide a conceptual understanding of the structure of Morrowind files and discuss different approaches to dealing with them in code.

Morrowind master files, plugins and save games all have the same underlying file structure, and common code can be used to interpret them up to a certain level. My Enchanted Editor, which is a low-level editor, treats all of them in exactly the same way. So does TESAME, which is similarly low level.

The technical information in this tutorial is available in its entirety in Dave Humphrey's excellent file format specification. Nevertheless, I always digest information faster when it's in pictures. I assume this is true of most people and have consequently come up with some diagrams that may help you understand the file format more rapidly.

A Step by Step Breakdown of the File Format

1) Basic structure of a file
All MW files, be they save, plugin or master, consist of a list of structures known as records. All the data in the file is stored in a record without exception. Even the file header information is stored in a record, which has exactly the same structure and characteristics as all the other records. From start to finish, an MW file is simply a sequential list of these structures:
[Diagram: File Structure]

2) Basic structure of a record
Each record consists of two parts, the record header and the record data. The header (which is always 16 bytes) tells you what type of record it is and how big the data portion is, from which you can determine how many bytes of data to read in for the current record and by extension where the next record header begins in the file:

[Diagram: Record Structure]
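As a rough illustration of how the Size field leads you from one record header to the next, here is a minimal Python sketch. The function name `list_record_offsets` is my own invention for this example, and it assumes the 4-byte Size is stored low-order byte first (see the note on interpreting data at the end of this tutorial).

```python
import struct

RECORD_HEADER_SIZE = 16  # Type (4) + Size (4) + Unknown (4) + Flags (4)

def list_record_offsets(data: bytes):
    """Return (offset, type, data_size) for each record in a file image."""
    found = []
    offset = 0
    while offset + RECORD_HEADER_SIZE <= len(data):
        rec_type = data[offset:offset + 4].decode("ascii")
        # Size sits right after the 4-byte Type; assumed little-endian.
        (size,) = struct.unpack_from("<I", data, offset + 4)
        found.append((offset, rec_type, size))
        # The next header starts right after this record's data block.
        offset += RECORD_HEADER_SIZE + size
    return found
```

Nothing here looks inside the record data; the header alone is enough to hop from record to record.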
 

3) Basic structure of record data
The record data block of a record, in turn, can be further decomposed into subrecords. All records contain a list of subrecords. Apart from the record header, no data in a record is found outside of a subrecord.

Like the record itself, each subrecord of a record has a header and a data portion. The subrecord header tells you what type of subrecord it is and how many bytes of data follow the header, which also tells you where the next subrecord begins in the record data:

[Diagram: SubRecord Structure]

 


4) The exact structure of record and subrecord headers
This diagram illustrates the exact sequential order of the bytes in record and subrecord headers, and what they mean:

[Diagram: Headers]
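The same layout can be written down in code. This is only a sketch in Python, using the field order given in the Record Class listings later in this tutorial (Type, Size, Unknown, Flags for records; Type, Size for subrecords). The class and function names are hypothetical, and the `<` prefix assumes the integers are stored low-order byte first.

```python
import struct
from typing import NamedTuple

class RecordHeader(NamedTuple):
    rec_type: str
    size: int      # number of data bytes that follow the header
    unknown: int
    flags: int

class SubrecordHeader(NamedTuple):
    sub_type: str
    size: int      # number of data bytes that follow the header

RECORD_HEADER = struct.Struct("<4sIII")   # 4 + 4 + 4 + 4 = 16 bytes
SUBRECORD_HEADER = struct.Struct("<4sI")  # 4 + 4 = 8 bytes

def parse_record_header(buf: bytes, offset: int = 0) -> RecordHeader:
    raw_type, size, unknown, flags = RECORD_HEADER.unpack_from(buf, offset)
    return RecordHeader(raw_type.decode("ascii"), size, unknown, flags)

def parse_subrecord_header(buf: bytes, offset: int = 0) -> SubrecordHeader:
    raw_type, size = SUBRECORD_HEADER.unpack_from(buf, offset)
    return SubrecordHeader(raw_type.decode("ascii"), size)
```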


 


5) The structure of sub-record data
The Morrowind file format is such that the information required to break down the file into records and their subrecords is in the file itself (in the headers).

However, the nature of the data in a subrecord is not provided in the file itself. The bytes of data for a particular subrecord may represent only one item of information, like the name of an NPC, or multiple items, such as the three co-ordinates of an object.

For instance, the "FNAM" subrecord in an "NPC_" record contains a string of bytes representing an NPC's name as you see it in the game, and nothing else. In contrast, the "DATA" subrecord in a "CELL" record contains 3 separate values: the first 4 bytes represent a set of flags stored in an integer, the next 4 bytes are the integer X-coordinate of the cell on the map and the next 4 bytes are the integer Y-coordinate of the cell.

There is nothing in the file itself that tells you this is how the subrecord should be broken down. Your code must be programmed in such a way that when it encounters a "CELL"/"DATA" subrecord, it breaks it down in this manner.
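A hard-coded breakdown of a "CELL"/"DATA" subrecord might look like the following Python sketch. The function name and dictionary keys are made up for illustration. I'm assuming the flags are best read as an unsigned value, the coordinates as signed values (so that negative grid positions work), and that all three are stored low-order byte first.

```python
import struct

def decode_cell_data(payload: bytes) -> dict:
    """Split the first 12 bytes of a CELL/DATA subrecord into its 3 values."""
    # "<Iii" = little-endian: one unsigned 4-byte int, two signed 4-byte ints.
    flags, grid_x, grid_y = struct.unpack("<Iii", payload[:12])
    return {"flags": flags, "grid_x": grid_x, "grid_y": grid_y}
```

The knowledge that these 12 bytes mean "flags, X, Y" lives entirely in this function, not in the file.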

Dave Humphrey's document (linked in my introduction) provides the meaning of a great number of record and subrecord types, figured out through the tireless efforts of Dave and a few other people. However, a fair amount of subrecord data (especially in save games) is still a mystery.

Below I discuss various ways of dealing with this.
 

Approaches to Reading and Processing a Morrowind File
The manner in which you write code for interpreting a Morrowind file depends on what you are trying to do.

Approach 1

If you wish to simply delete records from a file, or copy records from one file to another, without being concerned about breaking the records down into subrecords, you could use the following approach:

1) Create a Record class to store an individual record, consisting of:

Record Class:
(Record Header)
--Type (4 byte String)
--Size (4 byte Integer)
--Unknown (4 byte Integer)
--Flags (4 Byte Integer)
(Record Data)
--Data (Byte array, flexible size)

2) Create a Record Collection class to hold any number of Record objects

3) Starting at the first byte in the file:

3.1) Create a new record in the record collection
3.2) Read the 16 bytes at the current file pointer into the new record's Record Header (Type, Size, Unknown, Flags)
3.3) From the Size property of the header, determine the number of bytes of subsequent data this record owns.
3.4) Read the required number of bytes into the Data byte array of the new record.
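3.5) The next record begins 16 bytes (the size of the header) + Size (the size of the data) after the start of the current record, at the first byte of the next record's header. If your read calls advance the file pointer automatically you will already be there; otherwise move the file pointer by that amount.
3.6) If not at end of file, repeat from step 3.1)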

This process will build up a collection of records in a file, with the header information (type, size) and the raw data (uninterpreted byte array) of each record.
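The steps above can be sketched in Python roughly as follows. The `Record` class and `read_records` function are illustrative names of my own, a plain list stands in for the Record Collection class, and the header integers are assumed to be stored low-order byte first. Python's `read` advances the file pointer itself, so step 3.5 needs no explicit seek.

```python
import io
import struct
from dataclasses import dataclass

@dataclass
class Record:
    type: str      # 4-byte string
    size: int      # 4-byte integer
    unknown: int   # 4-byte integer
    flags: int     # 4-byte integer
    data: bytes    # raw, uninterpreted record data

def read_records(stream) -> list:
    """Read every record from a binary stream into a list."""
    records = []
    while True:
        header = stream.read(16)
        if len(header) == 0:
            break  # clean end of file
        if len(header) < 16:
            raise ValueError("truncated record header")
        raw_type, size, unknown, flags = struct.unpack("<4sIII", header)
        data = stream.read(size)
        if len(data) < size:
            raise ValueError("truncated record data")
        records.append(Record(raw_type.decode("ascii"), size, unknown, flags, data))
    return records

# Usage with an in-memory "file" holding two made-up records.
image = (struct.pack("<4sIII", b"AAAA", 3, 0, 0) + b"xyz"
         + struct.pack("<4sIII", b"BBBB", 0, 0, 0))
records = read_records(io.BytesIO(image))
```

For a real file you would pass in something like `open(path, "rb")` instead of the `BytesIO` object.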

With this collection you can perform several high level functions. To delete all NPCs, for instance, you can delete all records of type "NPC_" from the collection before writing it back to disk.

To copy all cells from one file to another you could load two files into separate collections, then insert all the "CELL" records from one into the other's list, before saving that file.
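Both operations, plus writing the result back out, could look something like this in Python. It is a sketch only: the names are hypothetical, the Size field is recomputed from the data length when writing, and it does not touch any bookkeeping the header record may carry (such as a record count), which a real tool might need to update.

```python
import struct
from dataclasses import dataclass

@dataclass
class Record:
    type: str
    unknown: int
    flags: int
    data: bytes

def without_type(records, rec_type):
    """Return a copy of the collection with all records of one type removed."""
    return [r for r in records if r.type != rec_type]

def copy_type(source, target, rec_type):
    """Return target's records followed by source's records of one type."""
    return list(target) + [r for r in source if r.type == rec_type]

def write_records(records) -> bytes:
    """Serialize a collection back into file bytes (header + data per record)."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<4sIII", rec.type.encode("ascii"),
                           len(rec.data), rec.unknown, rec.flags)
        out += rec.data
    return bytes(out)
```

For example, `write_records(without_type(records, "NPC_"))` produces the bytes of a file with every NPC removed.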
 

Approach 2

This approach is simply an extension of the previous one. Instead of having a byte array for raw record data, you give each Record object a collection of subrecords:

Record Class:
(Record Header)
--Type (4 byte String)
--Size (4 byte Integer)
--Unknown (4 byte Integer)
--Flags (4 Byte Integer)
(Record Data)
--Subrecord Collection, each subrecord consisting of
--(Subrecord Header)
----Subrecord Type (4 byte String)
----Subrecord Size (4 byte Integer)
--(Subrecord Data)
----Subrecord Data (Byte array, flexible size)

In this method, you break down the raw data portion of the record into a subrecord collection for each record, instead of simply reading in a block of data and moving to the next record.

The algorithm for breaking down the raw data of a record into subrecords is exactly the same as the algorithm for breaking down the raw data of the file into records above, except that your starting point is the beginning of the record data block and your stopping point is not End-Of-File, but End-Of-Record (determined by the Record Size in the record header).
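In Python, the subrecord version of the loop might be sketched like this. `Subrecord` and `split_subrecords` are names I've made up for the example, and the Size field is again assumed to be stored low-order byte first. Note that the loop stops at the end of the record data, not at end of file.

```python
import struct
from dataclasses import dataclass

@dataclass
class Subrecord:
    type: str    # 4-byte string
    data: bytes  # raw, uninterpreted subrecord data

def split_subrecords(record_data: bytes) -> list:
    """Break one record's data block into its subrecords."""
    subs = []
    offset = 0
    end = len(record_data)  # End-Of-Record
    while offset < end:
        if offset + 8 > end:
            raise ValueError("truncated subrecord header")
        raw_type, size = struct.unpack_from("<4sI", record_data, offset)
        start = offset + 8  # data begins right after the 8-byte header
        if start + size > end:
            raise ValueError("subrecord runs past end of record")
        subs.append(Subrecord(raw_type.decode("ascii"),
                              record_data[start:start + size]))
        offset = start + size  # first byte of the next subrecord header
    return subs
```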

The lowest level of decomposition, namely subrecord data, is still in raw (byte array) format, but you have decomposed the file into smaller, more meaningful chunks. TESAME breaks the file down to this extent. For anyone who's used TESAME, the list at the top shows the records and the list at the bottom shows the subrecords of the current record, along with the first few bytes of raw data in each subrecord.

This level of decomposition has the same advantages as the previous one, only you're working with smaller chunks. In other words, instead of deleting all NPC_ records you could, for instance, delete all NPCO subrecords in NPC_ records (which are inventory items of NPCs), or copy specific subrecords from one file to another.
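As a sketch of the NPCO example in Python: the helper below walks a record's data block and keeps every subrecord except those of one type. The function names are hypothetical. The important detail is that the record's Size field must be recomputed afterwards, because the data block has shrunk.

```python
import struct

def strip_subrecords(record_data: bytes, unwanted: str) -> bytes:
    """Return the record data with all subrecords of one type removed."""
    kept = bytearray()
    offset = 0
    while offset + 8 <= len(record_data):
        raw_type, size = struct.unpack_from("<4sI", record_data, offset)
        end = offset + 8 + size
        if raw_type.decode("ascii") != unwanted:
            kept += record_data[offset:end]  # header and data, unchanged
        offset = end
    return bytes(kept)

def rebuild_record(rec_type: str, unknown: int, flags: int, record_data: bytes) -> bytes:
    """Re-serialize a record, recomputing Size from the new data length."""
    header = struct.pack("<4sIII", rec_type.encode("ascii"),
                         len(record_data), unknown, flags)
    return header + record_data
```

You would call `strip_subrecords(data, "NPCO")` only on records whose type is "NPC_", then write each one back with `rebuild_record`.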

In addition, while some subrecords represent multiple values, some only represent one value which can easily be interpreted. As a rule of thumb (though not for all types), the bytes of data in the NAME subrecord of most record types are a string of ASCII characters representing the internal name of an object in a mod, like "ZC_Jack_Black", and the FNAM subrecord of most record types is a string of characters representing the name shown to the player ("Jack Black").

Approach 3
This is one of two approaches for breaking the subrecord data down into the lowest level of information, namely actual Morrowind values as used in the game (the scale of objects, position of cells, names of characters etc).

Because the structure of each subrecord is not described in the file itself, my "Enchanted Editor" program uses a template file called "ESTemplate.ini" that defines the sequential structure of bytes in each Record/Subrecord combination.

First I read in the record and decompose it into subrecords, as described above. Since the file itself tells me what type of record and subrecord I'm dealing with (from the record and subrecord headers), I can then go look up that combination in my template file to get the subrecord structure. For instance, the template entry for CELL.DATA (Record type CELL, Subrecord Type DATA) is this:

Group Name: CELL.DATA
@Requirement: 2
@Unique: (No Value)
Flags: Bitfield,4
GridX: Long,4
GridY: Long,4

You can ignore the first three lines, as they are simply additional instructions to my editor. These lines

Flags: Bitfield,4
GridX: Long,4
GridY: Long,4

tell the editor to read the first 4 bytes of data in a CELL.DATA subrecord as an array of bit values (true/false), the next 4 bytes as an integer called "GridX" and the next 4 as an integer called "GridY".

If I'm not sure of the purpose of some bytes in a particular subrecord type, I just have an UnknownData property, so that I preserve that information when saving, like this (if, for instance, I didn't know what the 4 flag bytes were):

UnknownData: Bytes,4
GridX: Long,4
GridY: Long,4

If you've downloaded my Editor and want to use the ESTemplate.ini file for interpreting data in your own programs, feel free. You just need to write the code to parse the template file, which should be reasonably straightforward as it's all text.
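To give an idea of what such parsing code could look like, here is a minimal Python sketch that handles only the lines shown in the CELL.DATA excerpts above. It is not a full parser for the real template file: the function names are my own, it assumes every field line has the form "Name: Type,Size", it skips the "@" instruction lines, and it assumes bit 0 of a Bitfield is the lowest bit of the first byte. Any type other than Long and Bitfield is kept as raw bytes.

```python
def parse_template(text: str) -> dict:
    """Map each group name (e.g. 'CELL.DATA') to a list of (name, type, size)."""
    groups = {}
    fields = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Group Name":
            fields = groups.setdefault(value, [])
        elif key.startswith("@") or fields is None:
            continue  # editor instructions, or a line outside any group
        else:
            kind, _, size = value.partition(",")
            fields.append((key, kind.strip(), int(size)))
    return groups

def decode_with_template(fields, payload: bytes) -> dict:
    """Cut a subrecord's bytes into named values, in template order."""
    values = {}
    offset = 0
    for name, kind, size in fields:
        chunk = payload[offset:offset + size]
        if kind == "Long":
            values[name] = int.from_bytes(chunk, "little", signed=True)
        elif kind == "Bitfield":
            number = int.from_bytes(chunk, "little")
            values[name] = [bool((number >> bit) & 1) for bit in range(size * 8)]
        else:
            values[name] = chunk  # unknown or raw data is preserved as-is
        offset += size
    return values
```

The point of the design is that supporting a new subrecord type means adding lines to the text file, not writing new code.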


Approach 4
Of course, the method I described above isn't the way Bethesda's programs interpret the files. It's a hack I've done so that you can read saves, plugins and master files into the same editor and still be able to edit values in the file, rather than just move around records and subrecords. It's very powerful for doing just that.

The "proper" way, of course, is to create classes that represent each of the classes in Morrowind (NPC, Static, Cell, Armour and so on), then read the appropriate Record/Subrecords into the appropriate class. In general each individual Morrowind object (a particular cell, a specific NPC and so on) corresponds to an individual record of that type (CELL, NPC_). Records can also be read in faster if you first find out the type of record/subrecord and then use interpretation code optimised for that particular type.

There are other benefits, such as guiding and limiting how records of different types are dealt with. For instance, an NPC_ record MUST have a NAME (internal name of NPC) and FNAM (game name of NPC) subrecord, but it can have any number of NPCO (inventory item) subrecords, or none at all. This limitation can be specifically engineered in the way the NPC class reads in and serializes its data.
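A class that enforces the NPC_ rules just described might be sketched like this in Python. Everything about its shape is my own assumption for illustration: it takes subrecords as (type, data) pairs, keeps each NPCO payload as raw bytes rather than interpreting it, and silently drops any other subrecord types, which a real implementation would need to read or at least preserve.

```python
from dataclasses import dataclass, field

def _cstring(data: bytes) -> str:
    """Decode a zero-terminated ASCII string."""
    return data.split(b"\x00", 1)[0].decode("ascii")

@dataclass
class Npc:
    name: str        # NAME: internal name, required
    full_name: str   # FNAM: name shown in the game, required
    inventory: list = field(default_factory=list)  # raw NPCO payloads, zero or more

    @classmethod
    def from_subrecords(cls, subrecords):
        names = {}
        inventory = []
        for sub_type, data in subrecords:
            if sub_type in ("NAME", "FNAM"):
                names[sub_type] = _cstring(data)
            elif sub_type == "NPCO":
                inventory.append(data)
        missing = [t for t in ("NAME", "FNAM") if t not in names]
        if missing:
            raise ValueError(f"NPC_ record is missing {missing}")
        return cls(names["NAME"], names["FNAM"], inventory)

    def to_subrecords(self):
        """Serialize back to (type, data) pairs, required subrecords first."""
        subs = [("NAME", self.name.encode("ascii") + b"\x00"),
                ("FNAM", self.full_name.encode("ascii") + b"\x00")]
        subs += [("NPCO", item) for item in self.inventory]
        return subs
```

Because the constructor requires both names and the reader raises an error when one is missing, an invalid NPC can't be built by accident.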

A good example of a program that uses this high-level class-oriented approach to interpreting MW files is Dave Humphrey's mwEdit program, which is designed to mimic the way the official TES Construction Kit works but substantially improves on certain features, such as dragging and dropping, working on multiple plugins at once, colour-coded scripting and better script error checking.
 


A Note on Interpreting Data
In order to interpret MW files you MUST know how to convert (cast) byte data into other forms of data.

String data is relatively simple. Morrowind strings (with the exception of the Type property of Record and Subrecord headers) are zero-terminated. What this means is that you take each byte in turn, convert its ASCII value to a character and append it to a string, until you hit a byte with value 0.

Integer values are a little more complex. You can't just add the values of the bytes, because you'll get the wrong value. If you're dealing with a 4-byte Integer, the bits of all 4 bytes are joined together and treated as one long binary value. The file stores the low-order byte first, so the first byte ends up rightmost in that value and the last byte leftmost. This is not the same as adding the values of the individual bytes. It's more like taking 23 and 12 and making the number 2312, only in binary.
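Here is a small worked example in Python of joining 4 bytes by hand, assuming low-order byte first and a signed 32-bit result (like a VB Long). The function name is just for illustration. The assertions at the bottom show that the result differs from simply adding the bytes, and that it matches Python's built-in `int.from_bytes`.

```python
def long_from_bytes(four_bytes: bytes) -> int:
    """Join 4 bytes, low-order byte first, into a signed 32-bit integer."""
    b0, b1, b2, b3 = four_bytes
    # Each later byte is shifted 8 more bits to the left before joining.
    unsigned = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
    # If the top bit is set, the value is negative in a signed 32-bit Long.
    return unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned

raw = bytes([0x01, 0x02, 0x00, 0x00])
assert sum(raw) == 3                   # adding the bytes gives the wrong answer
assert long_from_bytes(raw) == 0x0201  # joined value is 513
assert long_from_bytes(raw) == int.from_bytes(raw, "little", signed=True)
```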

Real values are even more complicated, as they use a sign-exponent-mantissa floating-point format (IEEE 754, not 2's complement), which I'm not going to get into here.

The point is, if you don't understand byte representation of numbers, get yourself a library or module of good casting code to cast byte arrays to numbers. You won't succeed without it. I use the Windows API function "CopyMemory" to directly copy an array of bytes into the memory address of a VB Long, Single or Double variable, as the bytes in the file are already in the exact sequence you would find them in memory were they in a Long, Single or Double variable.
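If you're working in Python rather than VB, the standard `struct` module plays a similar role to the CopyMemory trick. The helper names below are my own, chosen to match the VB type names, and they assume the bytes are low-order byte first as on a Windows PC.

```python
import struct

def to_long(four_bytes: bytes) -> int:
    """4 bytes -> signed 32-bit integer (VB Long)."""
    return struct.unpack("<i", four_bytes)[0]

def to_single(four_bytes: bytes) -> float:
    """4 bytes -> 32-bit floating-point value (VB Single)."""
    return struct.unpack("<f", four_bytes)[0]

def to_double(eight_bytes: bytes) -> float:
    """8 bytes -> 64-bit floating-point value (VB Double)."""
    return struct.unpack("<d", eight_bytes)[0]

def to_string(data: bytes) -> str:
    """Bytes -> string, stopping at the first zero byte if there is one."""
    return data.split(b"\x00", 1)[0].decode("ascii")
```

For example, `to_string(b"Jack Black\x00")` gives the text "Jack Black" with the terminator removed.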
