The multi-agent pick up and delivery environment is part of the multi-agent toolbox and has been developed in the context of the AI Flanders Research Program. The software can be found in this Gitlab project.
The environment is made using the Unity game engine and is a grid-based world containing walls, tiles, parking locations, and pick up and delivery locations. A number of robots/agents are given, each with an initial location. The problem consists of planning paths along which the agents pick up and deliver tasks. A task consists of a pick up and a delivery location. When an agent is assigned a task, it moves from its current location to the pick up location. Once it has reached that location, it picks up the task and starts executing it; from this point on no other agent can try to execute it as well. The agent then moves to the delivery location, where it delivers the task. The task is then considered completed and the agent is free again.
Since we are using the ML-agents toolkit to connect Unity with algorithms written in Python, a distinction has to be made between centralized and decentralized: in the centralized setting actions are created for and given to all agents at once, whereas in the decentralized setting each agent chooses its own task and plans its own path. The Unity project therefore contains separate scenes for the centralized and decentralized versions of the problem.
Besides centralized and decentralized, we also make a distinction between the offline and online version of the problem. In the offline setting all tasks are known beforehand and all paths are planned beforehand. In the online setting this is not the case: new tasks enter the system while others are being completed, i.e. new tasks can enter the system during the execution of the algorithm. The Unity project thus has four different scenes: centralized/decentralized and offline/online. In the “online” scenes there is the option to use randomly created tasks or a predefined list of tasks. In the “offline” scenes you need to provide a list of tasks. Optionally, release times can be added in the offline version: a release time is the first time step at which a task can be picked up. A real life example of an offline setting is when customers buy products and pick them up on a specific day and time. The situation where products need to be picked up as soon as someone orders a product corresponds to the online version of the problem.
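To make the offline setting concrete, the sketch below builds such a predefined task list in Python, including optional release times. The class and field names are illustrative only; the actual file format expected by the environment is documented in the readme of the Gitlab project.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Task:
    pickup: Tuple[int, int]              # grid coordinate of the pick up location
    delivery: Tuple[int, int]            # grid coordinate of the delivery location
    release_time: Optional[int] = None   # first time step at which the task may be picked up


# Offline setting: the complete task list is known before planning starts.
tasks = [
    Task(pickup=(2, 3), delivery=(7, 1)),                   # available from the first time step
    Task(pickup=(5, 5), delivery=(0, 4), release_time=10),  # cannot be picked up before t = 10
]
```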
In the current version it is also possible to assign priorities to individual tasks. You want packages to be picked up as efficiently as possible, but some might be more important than others. By specifying a parameter for the entire warehouse environment that determines how important these priorities are, you can make the environment multi-objective and investigate how to take both objectives into consideration.
Every agent has a position (a 2-d coordinate) as well as a rotation. The idea behind this is that the time it takes to move one tile forward equals the time it takes to rotate over one unit of rotation. This unit of rotation is one of the many configurable settings of the environment. All parameters can be set by providing a configuration file: the layout of the warehouse, the unit of rotation, the time it takes for an agent to move one tile forward, the number of agents, their initial nodes (coordinate and rotation), whether agents can move backwards, and whether tasks are generated randomly or taken from a predefined list of tasks. You should also specify the maximum number of tasks that may be known by the algorithm at every time step: the number of available tasks that are ready to be assigned.
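A minimal sketch of such a configuration is given below, written as a Python dictionary and serialized to JSON. The key names are illustrative assumptions, not the actual parameter names; those are listed in the readme of the Gitlab project.

```python
import json

# Illustrative configuration; the real parameter names and file format are
# defined in the project readme.
config = {
    "layout_file": "warehouse_layout.txt",  # walls, tiles, parking, pick up and delivery locations
    "num_agents": 4,
    "initial_nodes": [                      # one entry per agent: coordinate and rotation
        {"x": 0, "y": 0, "rotation": 90},
        {"x": 0, "y": 1, "rotation": 90},
        {"x": 0, "y": 2, "rotation": 90},
        {"x": 0, "y": 3, "rotation": 90},
    ],
    "unit_of_rotation": 90,                 # degrees covered by one rotation step
    "time_per_tile": 1,                     # time it takes to move one tile forward
    "agents_can_move_backwards": False,
    "random_tasks": False,                  # False: use a predefined list of tasks
    "max_known_tasks": 5,                   # tasks known to the algorithm at any time step
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```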
A number of metrics are saved by default whenever an algorithm runs.
These metrics are written to a file in JSON format when all tasks have been delivered. Since this environment allows for several versions of the problem, this will not always happen, e.g. in the online version where tasks are generated randomly and continuously. You can easily add other metrics. We have also added a Python project where you can find examples and inspiration on how to visualize these metrics.
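As a starting point for such a visualization, the sketch below loads a metrics file and plots one of its values with matplotlib. The file name and the metric key are placeholders; inspect the JSON file produced by your own run to see which metrics are actually available.

```python
import json

import matplotlib.pyplot as plt

# "metrics.json" and the key below are placeholders for whatever your run produced.
with open("metrics.json") as f:
    metrics = json.load(f)

completion_times = metrics["task_completion_times"]  # hypothetical per-task metric

plt.hist(completion_times, bins=20)
plt.xlabel("time steps until delivery")
plt.ylabel("number of tasks")
plt.title("Distribution of task completion times")
plt.show()
```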
Visualization in this project is done in Unity. Algorithms are connected to the visualization via ML-agents. Several algorithms have been added to the project by AI lab students; a short overview can be found below. Toy examples are also provided that explain how to use the ML-agents toolkit and add your own algorithms. Two types of toy examples have been added: one where robots can rotate and one where they cannot. A comprehensive readme is also provided.
If you do not want to use the Unity editor, you can also use the builds that can be downloaded from our website. Do check the readme in the Gitlab project on how to connect your algorithm to the Unity build via ML-agents.
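For reference, connecting a Python algorithm to a Unity build with the ML-agents Python API roughly looks as follows. The build path is a placeholder, the behavior name depends on the chosen scene, and the observation/action layout of this environment is described in the readme; the snippet only sends dummy discrete actions.

```python
import numpy as np
from mlagents_envs.base_env import ActionTuple
from mlagents_envs.environment import UnityEnvironment

# Path to the downloaded Unity build (placeholder).
env = UnityEnvironment(file_name="path/to/unity_build")
env.reset()

# The scene exposes one or more "behaviors"; here we simply take the first one.
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Replace these all-zero actions with the output of your own algorithm.
    actions = ActionTuple()
    actions.add_discrete(
        np.zeros((len(decision_steps), spec.action_spec.discrete_size), dtype=np.int32)
    )
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```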
Questions and remarks can be sent to marjon.blondeel@vub.be.
Token passing (TP) is a decentralized algorithm that tackles the online version of the problem. In this approach, information is exchanged between the agents using a token: a shared block of memory containing information on available tasks, the paths of the agents and the current task assignments. Whenever an agent has reached the end of its planned path (delivered a task), it requests the token and tries to find a task and plan a collision-free path to execute that task. A modified version of A* is used for path planning.
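In simplified form, what a free agent does when it holds the token can be sketched as follows; the function and attribute names are our own and do not correspond to the project's code.

```python
def on_token_received(agent, token):
    """Simplified sketch of the TP step for a free agent holding the token."""
    # Only consider tasks whose pick up and delivery locations are not the
    # endpoint of a path already stored in the token.
    candidates = [t for t in token.available_tasks if not token.endpoint_conflict(t)]
    # Prefer the task whose pick up location is closest according to a heuristic.
    for task in sorted(candidates, key=lambda t: heuristic(agent.position, t.pickup)):
        path = space_time_astar(agent, task, avoid=token.planned_paths)  # modified A*
        if path is not None:
            token.assign(agent, task, path)  # reserve the task and its path in the token
            return
    # No task could be assigned: plan a path that keeps the agent out of the way.
    token.assign_idle_path(agent)
```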
A direct link to the algorithm can be found here; it is based on Ma et al. (2017).
Token passing with task swaps (TPTS) is a decentralized algorithm that tackles the online version of the problem. It improves on TP by allowing free agents to swap tasks. When an agent looks for a new task, it also checks whether it can reach the pick up location of a task already assigned to another agent more efficiently. If so, the agent that was originally assigned this task stops executing it and the new agent takes over.
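The extra check that distinguishes TPTS from TP can be sketched as follows, again with illustrative names of our own: a task that is assigned but not yet picked up may be reassigned if a free agent can reach its pick up location sooner.

```python
def may_take_over(task, free_agent, token, eta):
    """Illustrative TPTS swap check; not the project's actual code."""
    current = token.assigned_agent(task)   # agent currently executing the task, if any
    if current is None:
        return True                        # unassigned task: free to claim
    if token.picked_up(task):
        return False                       # already picked up: swapping is no longer allowed
    # Swap only if the free agent reaches the pick up location strictly earlier.
    return eta(free_agent, task.pickup) < eta(current, task.pickup)
```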
A direct link to the algorithm can be found here; it is based on Ma et al. (2017).
Task allocation and prioritized path planning is a centralized algorithm that tackles the offline version of the problem. Both TP and TPTS could also be used for the offline version, but they produce poorer results. The first part of the algorithm, the task allocation phase, uses a modified version of the LKH-3 solver to find a task sequence with a low makespan. In the second part, the path planning phase, paths for each of the agents are planned. This is done by assigning priorities to the agents. We use a modified version of the prioritized path planning presented in Van Den Berg et al. (2005). The main idea is to first plan paths for the agents with the highest execution time, and then plan paths for the remaining agents that do not collide with the already planned paths. We use dummy paths based on Liu et al. (2019) for deadlock avoidance. This is necessary to avoid situations where a (sub)path cannot be planned because multiple agents block each other. These dummy paths are collision-free paths from the goal location of the last subpath to the agent’s parking spot. This way the agent can always move back to its parking location and wait there.
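A condensed sketch of the path planning phase is given below; the function names are illustrative. Agents are ordered by the estimated execution time of their task sequences and planned one by one, each avoiding the paths planned before it, with a dummy path back to the parking spot appended for deadlock avoidance.

```python
def prioritized_path_planning(agents, task_sequences, warehouse):
    """Illustrative sketch of the prioritized planning phase (names are our own)."""
    # The agent with the highest estimated execution time gets the highest
    # priority and is planned first.
    ordered = sorted(agents, key=lambda a: estimated_execution_time(task_sequences[a]), reverse=True)
    planned_paths = []
    for agent in ordered:
        # Plan this agent's subpaths so that they do not collide with the paths
        # planned for higher-priority agents.
        path = plan_subpaths(agent, task_sequences[agent], avoid=planned_paths, grid=warehouse)
        # Dummy path (Liu et al., 2019): a collision-free path from the end of the
        # last subpath back to the agent's parking spot, so it can always retreat and wait.
        dummy = plan_path(path.end, agent.parking_spot, avoid=planned_paths, grid=warehouse)
        planned_paths.append(path + dummy)
    return planned_paths
```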
A direct link to the algorithm can be found here; it is based on Ma et al. (2018) and the papers mentioned above.
Multi-objective task allocation and prioritized path planning is a centralized algorithm that tackles the offline version of the problem. More specifically, it attempts to optimize several objectives at once. On the one hand we want tasks to be delivered as soon as possible, but we also take priorities into account: some tasks might be more important than others. To help the algorithm find a trade-off between the two objectives, the user can specify a number between 0 and 1 that denotes the importance of these priorities: if set to 0, priorities are ignored; if set to 1, the system is forced to pick up tasks with a high priority first, even though they might be far away.
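One simple way to read this parameter is as a weight in a scalarized score used during task allocation. The formula below is an illustrative interpretation with names of our own, not necessarily the exact scalarization used in the implementation; priorities are assumed to be normalized to [0, 1], with higher values meaning more important.

```python
def task_score(task, travel_cost, w):
    """Illustrative scalarization of the two objectives, with w between 0 and 1.

    w = 0: priorities are ignored and only the travel/delivery cost counts.
    w = 1: a task's priority dominates, even if its pick up location is far away.
    """
    return (1.0 - w) * travel_cost + w * (1.0 - task.priority)  # lower score = picked first
```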
This algorithm is based on another algorithm in the toolbox: task allocation and prioritized path planning. The path planning phase remains the same, but the allocation phase now takes the different objectives into account. This also means that the algorithm can easily be adapted to take other objectives into account.
A direct link to the algorithm can be found here.