Traveling Salesman Problem (TSP)

The Traveling Salesman Problem (TSP) is one of the most extensively studied combinatorial optimization problems, fundamental both in theoretical computer science and operations research. The TSP concerns finding the shortest possible route that a traveling salesman can take to visit each city exactly once and return to the origin city. This problem is famously complex, classified as NP-hard, meaning there is no known polynomial-time algorithm to solve it for all cases. The significance of TSP lies in its ability to model various real-world problems, ranging from logistics to the manufacture of microchips, making it a cornerstone problem in optimization and algorithm design.

At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations

locations of each customer.
the current location of the vehicle.

Constraints

the tour must return to the starting customer.
each customer must be visited exactly once.

Finish condition

the agent has visited all customers and returned to the starting customer.

Reward

(minus) the negative length of the path.