Multiple Traveling Salesman Problem (mTSP)

The Multiple Agents Traveling Salesman Problem (mTSP) is a generalized form of the classic TSP, where multiple agents (or salesmen) are involved instead of just one. The objective of mTSP is to minimize the total cost or distance traveled by all agents, where each agent must start and finish at a common depot, visiting a subset of cities without overlap between agents. This problem is pivotal in scenarios such as coordinated logistics, drone delivery systems, and workforce management, where tasks are distributed amongst several agents to enhance efficiency and reduce operational times.

At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways: - minmax: (default) the reward is the maximum of the path lengths of all the agents - sum: the cost is the sum of the path lengths of all the agents Reward is - cost, so the goal is to maximize the reward (minimize the cost).

Observations

locations of the depot and each customer.
number of agents.
the current agent index.
the current location of the vehicle.

Constrains

each agent's tour starts and ends at the depot.
each customer must be visited exactly once.

Finish condition

all customers are visited and all agents back to the depot.

Reward

There are two ways to calculate the cost (-reward):

minmax: (default) the cost is the maximum of the path lengths of all the agents.
sum: the cost is the sum of the path lengths of all the agents.