How to get a file out of MindTouch ---------------------------------- Database tables --------------- The following tables are where all the SQL-level action happens: ``resources`` This is the high-level concept. Sort of. Each row in this table *may* represent a file, or may represent a piece of metadata. To find out which, look at the ``res_type`` column; a value of 2 here indicates a file, a value of 4 indicates metadata. This table also has a set of denormalized columns for replicating the contents of the current revision of a file. Metadata will have ``parent_id`` set to the ``res_id`` of the row it is metadata for. ``resourcerevs`` This is where an individual revision of a resource lives. ``resourcecontents`` This table stores a bit of metadata, and either the raw file contents or a pointer to where the file contents may be found. If the raw file itself is in the database, its contents will be in the column ``rescontent_value``. If not, its location will, hopefully, be stored in the column ``rescontent_location``. ``resourcefilemap`` This is a map of file IDs in the API -- used for public display -- to resources. So, for example, a request to the MindTouch URL /@api/deki/files/1234/=filename.png will look up ``file_id`` 1234 in this table, read the associated ``resource_id`` from it, and use that to retrieve the actual file. This is not particularly important information for migrating, but could be very useful when it comes time to migrate all the old embedded URLs of attached files. Several columns exist, duplicated, in each of ``resources`` ``resourcerevs`` and ``resourcecontents``. Piecing together a file ----------------------- So, here's how to completely recover and reconstruct a file and associated metadata from MindTouch. First, get all resources which are actually files, focusing on only the non-duplicated columns we need:: SELECT res_id, res_headrev, res_create_timestamp, res_update_timestamp, res_create_user_id, res_update_user_id FROM resources WHERE res_type = 2; The current revision of the file is the column in ``resourcerevs`` indicated by ``res_headrev``. Suppose one of those files has a ``res_id`` of 1234. We can obtain any metadata resources for it:: SELECT res_id, resrev_name FROM resources WHERE res_type = 4 AND resrev_parent_id = 1234; In this case, the only relevant metadata is the description of the file; this will show up as a resource with a ``resrev_name`` of "urn:deki.mindtouch.com#description". We can also get the IDs of all revisions of this file:: SELECT resrev_id FROM resourcerevs WHERE resrev_res_id = 1234; Other columns in ``resourcerevs`` will tell us the original filename, who created this revision, a description of the revision, etc. Most important is the column ``resrev_content_id``, which is a foreign key to ``resourcecontents``, and lets us retrieve the file contents from that table. Determining how to get the actual file contents goes as follows: 1. Query for the appropriate row in ``resourcecontents``. 2. If the ``rescontent_value`` field is not NULL, it will be a blob of the file contents. 3. If ``rescontent_value`` is NULL, check ``rescontent_location``. If this is not NULL, it will be the location of the file. 4. If ``rescontent_location`` is also NULL, we must turn to the filesystem. First we determine the filesystem root:: SELECT config_value FROM config WHERE config_key = 'storage/fs/path'; This will most likely be "/var/www/dekiwiki/attachments". Now we go to the following directory:: filesystem_root/XXX/Y.res Here "XXX" is the first three digits of ``res_id``, and Y is the fourth digit. In this directory we will find three types of files, whose names will be of the forms: ``X.bin`` The raw contents of the file as uploaded. ``X.thumb`` A thumbnail of the file. ``X.webview`` A possibly-cropped version of the file; this is what will actually be displayed by MindTouch when asked for the file. In all cases "X" is the value of the ``rescontent_res_id`` column of the row from ``resourcecontents``, and represents an incrementing revision number for this resource. From the metadata resource we can get a resource revision with the appropriate description metadata. From the resource contents row we can get the MIME-type (this is also duplicated in resources and in resource revisions). And from the resource revision we can get all other relevant information, including revision timestamp, type of change, etc. All of that information can then be assembled to feed into kuma, and is sufficient for creating an ``AttachmentRevision`` and the ``Attachment`` itself if necessary.