David Bignell edited this page Aug 7, 2017 · 6 revisions

Many of these issues are dealt with in the tutorial - this is a good place to start, as it introduces the main Malmo concepts and includes many hints and tips. For those too impatient to follow the tutorial, or who would like an explanation with a little more depth, read on.

- What is a useful first step when using Malmo?
- Why do stale mission elements persist into later missions?

What is a useful first step when using Malmo?

Before getting to grips with Malmo, and certainly before reporting any issues, it is worth switching on the Malmo mod diagnostic information. It is highly recommended that you do this - in fact, I'm not sure why it's not enabled by default. To switch it on, do the following:

  1. In the Minecraft GUI, go to the Mod options and find Malmo in the list.
  2. Press config, and cycle through the options for debug diagnostic level until "Show all diagnostics" is displayed.
  3. You should now be able to see helpful diagnostic information in the Minecraft window, including:
    • the Mission Control Port (10000 by default)
    • whether Minecraft mouse control is set to "AI" or "human"
    • the current state of the client and server state machines
    • warning and error messages, when relevant

This information will almost certainly prove useful at some point.

Why do stale mission elements persist into later missions?

Traditional reinforcement learning techniques work by running the same small task repeatedly, possibly thousands or millions of times. Ideally, the turnaround time between these "episodes" should be as small as possible, and the initial state of each episode should be the same. In Minecraft, the cost of creating a fresh new world - a clean initial state - is very high, taking many seconds even on a fast machine. To make reinforcement learning practical, Malmo needs to compromise between cleanliness and turnaround time.

By default, Malmo aims for fast turnaround by reusing the current world (provided the current world matches the world requirements). This makes restarting significantly quicker, but means that any changes made in the course of the episode (or "mission", in Malmo parlance) will persist into subsequent missions.

This is most commonly seen in the items present in the mission. For example, if a diamond is drawn at a specific location, and the agent fails to collect it during the course of the mission, that diamond will persist into the next mission, where it will be joined by a new diamond, and so on. After 100 missions, there might be 100 diamonds all occupying the same space. (Note that Minecraft is very inefficient at drawing free-floating items, so filling your environment with multiple diamonds will eventually cripple performance.) Aside from the items, any changes the agent makes to the environment will persist too - if they dig a hole in one mission, they might fall into it in the next...

Assuming this behaviour is undesirable (there may be cases where it's not!), there are two main ways of dealing with it. The first is to force Malmo to provide a clean initial state for each mission. This is done via the forceReset flag in the world generator XML, for example:

<ServerSection>
    <ServerHandlers>
        <FlatWorldGenerator generatorString="3;168:1;8;" forceReset="true"/>
        ...
        ...

As remarked above, this will create an enormous amount of work for Minecraft, and will make restarting the mission very slow indeed.
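
If a full reset on every mission is too slow, one possible compromise is to request a clean world only occasionally. The sketch below is an illustration rather than something taken from the Malmo samples: flat_world_generator and should_reset are hypothetical helpers, and the reset interval is an arbitrary choice. It relies only on the forceReset attribute shown above, and leaves the attribute out otherwise so that the default world reuse applies.

```python
from xml.sax.saxutils import quoteattr


def flat_world_generator(generator_string, force_reset=False):
    """Return a FlatWorldGenerator element, adding forceReset only when asked.

    Hypothetical helper: it only formats the XML fragment shown in the text.
    """
    attrs = "generatorString=" + quoteattr(generator_string)
    if force_reset:
        attrs += ' forceReset="true"'
    return "<FlatWorldGenerator " + attrs + "/>"


def should_reset(mission_index, reset_every):
    """Hypothetical policy: pay for a clean world on every reset_every-th mission."""
    return mission_index % reset_every == 0


# Show which of the first four missions would ask for a fresh world.
for i in range(4):
    print(i, flat_world_generator("3;168:1;8;", should_reset(i, reset_every=2)))
```

The returned string would be substituted into the ServerHandlers section of the mission XML when the mission is built; the rest of the mission definition is unchanged.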

The second method is to do your own cleaning - this is the approach that most of the Python samples take. Make sure your drawing code creates its own blank state before drawing any extra features required by the experiment. For instance, before drawing any items for the mission, draw a block of air large enough to "blank out" any stale items from previous missions:

  <DrawingDecorator>
    <DrawCuboid x1="0" y1="46" z1="0" x2="7" y2="52" z2="7" type="quartz_block" /> <!-- limits of our arena -->
    <DrawCuboid x1="1" y1="47" z1="1" x2="6" y2="51" z2="6" type="air" /> <!-- hollow it out with air -->
    <DrawCuboid x1="1" y1="50" z1="1" x2="6" y2="49" z2="6" type="glowstone" /> <!-- glowstone ceiling for light -->
    <DrawItem   x="4"  y="47"  z="2" type="diamond" />
  </DrawingDecorator>
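
When the mission XML is assembled in Python, the ordering rule can be built into the code that generates it. The following is a minimal sketch under that assumption: draw_cuboid, draw_item and drawing_decorator are hypothetical helper names, not part of the Malmo API, and the arena walls are left out for brevity. The point is only that the air cuboid is always emitted before any new features.

```python
import xml.etree.ElementTree as ET


def draw_cuboid(x1, y1, z1, x2, y2, z2, block_type):
    """Return a DrawCuboid element spanning the two given corners."""
    return ('<DrawCuboid x1="%d" y1="%d" z1="%d" x2="%d" y2="%d" z2="%d" type="%s"/>'
            % (x1, y1, z1, x2, y2, z2, block_type))


def draw_item(x, y, z, item_type):
    """Return a DrawItem element at the given position."""
    return '<DrawItem x="%d" y="%d" z="%d" type="%s"/>' % (x, y, z, item_type)


def drawing_decorator(clear_region, features):
    """Build a DrawingDecorator that blanks clear_region with air first,
    then draws the mission's own features on top of the cleared space."""
    elements = [draw_cuboid(*clear_region, block_type="air")] + list(features)
    body = "\n".join("  " + element for element in elements)
    return "<DrawingDecorator>\n" + body + "\n</DrawingDecorator>"


arena_xml = drawing_decorator(
    clear_region=(1, 47, 1, 6, 51, 6),         # interior of the arena above
    features=[draw_item(4, 47, 2, "diamond")],
)
print(arena_xml)

# Parse the result to confirm it is well-formed and correctly ordered.
root = ET.fromstring(arena_xml)
print([child.tag for child in root])
```

Because the clearing step lives in the helper, a mission cannot accidentally draw new items without first blanking out the old ones.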

A third method is to "shift" your environment to a fresh location at the start of each episode, for example by generating the mission XML dynamically and moving the origin to a new spot each time. patchwork_quilt.py uses this approach to build up a vast landscape from small mazes:

for iRepeat in range(num_reps):
    # Find the point at which to create the maze:
    xorg = (iRepeat % 64) * 16
    # Integer division (//) keeps the origins whole numbers in Python 3 as well as Python 2:
    zorg = ((iRepeat // 64) % 64) * 16
    yorg = 200 + ((iRepeat // (64*64)) % 64) * 8
...
...
<MazeDecorator>
    <SizeAndPosition length="16" width="16" xOrigin="''' + str(xorg) + '''" yOrigin="''' + str(yorg) + '''" zOrigin="''' + str(zorg) + '''" height="8"/>
...
...

The main advantage to this approach is that it provides some sort of history - the end states of previous missions are left in the world. In terms of performance there is little to recommend it - it will force Minecraft to load more chunks, and will increase file and memory consumption. patchwork_quilt.py only does it because it's fun (and because it's a stress test).
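
The origin arithmetic in the patchwork_quilt.py excerpt can be isolated as a small function, which makes the layout easier to see: mazes fill a row along x, rows stack along z, and full layers stack along y. This is a sketch rather than the sample's actual code; maze_origin, size_and_position and the parameter names are hypothetical, and it uses integer division so the origins are whole numbers under both Python 2 and Python 3.

```python
def maze_origin(i_repeat, cell=16, cells_per_side=64, layer_height=8, base_y=200):
    """Return (xorg, yorg, zorg) for the i_repeat-th maze.

    Defaults mirror the constants in the excerpt: 16x16 mazes, 64 per row,
    64 rows per layer, layers 8 blocks apart starting at y=200.
    """
    xorg = (i_repeat % cells_per_side) * cell
    zorg = ((i_repeat // cells_per_side) % cells_per_side) * cell
    layer = (i_repeat // (cells_per_side * cells_per_side)) % cells_per_side
    yorg = base_y + layer * layer_height
    return xorg, yorg, zorg


def size_and_position(i_repeat):
    """Format the SizeAndPosition element for the i_repeat-th maze."""
    xorg, yorg, zorg = maze_origin(i_repeat)
    return ('<SizeAndPosition length="16" width="16" xOrigin="%d" yOrigin="%d" '
            'zOrigin="%d" height="8"/>' % (xorg, yorg, zorg))


# First maze, next maze along x, first maze of the second row, first maze of the second layer.
for i in (0, 1, 64, 4096):
    print(i, maze_origin(i))
```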
