gym_nethack.policies package
Submodules
gym_nethack.policies.core module
-
class gym_nethack.policies.core.ParameterizedPolicy
Bases: gym_nethack.policies.core.Policy
Extension of policy class that allows for grid-search on specified parameters.
-
set_combos(combos)
Update list of parameter combinations to try.
Parameters: combos – list of combinations to use
-
set_config(grid_search=False, top_models=False, num_episodes_per_combo=200, proc_id=0, num_procs=1, param_combos=None, param_abbrvs=None)
Set the grid search configuration.
Parameters: - grid_search – whether to change parameters every certain number of episodes.
- top_models – whether to load from a text file and use the specified param combos inside. (Must have grid_search=True)
- num_episodes_per_combo – if grid search, number of episodes to run for each combination of algorithm parameters.
- proc_id – if grid search, process ID of this environment, to be matched with the argument passed to the daemon launching script.
- num_procs – if grid search, number of processes that will be running in parallel
- param_combos – list of lists of parameter combinations
- param_abbrvs – abbreviated parameter names (for directory name)
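As an illustration of how these settings could fit together, the sketch below builds a param_combos list with itertools.product and gives each process a strided share of it. The helper combos_for_process and the parameter names in the grid are hypothetical; the package may divide combinations among processes differently.

```python
from itertools import product


def combos_for_process(param_combos, proc_id=0, num_procs=1):
    """Hypothetical helper: process `proc_id` takes every `num_procs`-th combination."""
    return param_combos[proc_id::num_procs]


# Hypothetical parameter grid; the names and values are illustrative only.
grid = {"alpha": [0.1, 0.5], "beta": [1, 2, 3]}

# A list of lists of parameter values, matching the shape described for param_combos.
param_combos = [list(values) for values in product(*grid.values())]

# Short names that could be used when building a directory name.
param_abbrvs = ["a", "b"]

# Split the six combinations between two parallel processes.
shares = [combos_for_process(param_combos, proc_id=p, num_procs=2) for p in range(2)]
```

With a strided split, every combination is assigned to exactly one process, and the shares differ in size by at most one.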
-
gym_nethack.policies.combat module
-
class gym_nethack.policies.combat.ApproachAttackItemPolicy
Bases: gym_nethack.policies.core.Policy
Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified), then on each action either uses a random item with probability 0.25, or approaches the monster and attacks it at close range with probability 0.75. (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)
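The 0.25 / 0.75 split can be pictured as a single random draw per action. The following is a minimal sketch, not the policy's actual code; the function name choose_action and the action labels are hypothetical.

```python
import random


def choose_action(rng, item_prob=0.25):
    """Return 'use_item' with probability item_prob, otherwise 'approach_attack'."""
    return "use_item" if rng.random() < item_prob else "approach_attack"


# Seeded generator so the sketch is reproducible.
rng = random.Random(0)
counts = {"use_item": 0, "approach_attack": 0}
for _ in range(10_000):
    counts[choose_action(rng)] += 1
```

Over many draws, roughly a quarter of the actions are item uses.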
-
class gym_nethack.policies.combat.ApproachAttackPolicy
Bases: gym_nethack.policies.core.Policy
Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified), then approaches the monster and attacks it at close range. (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)
-
class gym_nethack.policies.combat.FireAntPolicy
Bases: gym_nethack.policies.core.Policy
Heuristic policy for fighting the fire ant, as described in the author's thesis.
gym_nethack.policies.exploration module
-
class gym_nethack.policies.exploration.GreedyExplorationPolicy(need_full_map=False)
Bases: gym_nethack.policies.exploration.MapExplorationPolicy
Map exploration policy that always visits the frontier closest to the player until no frontiers remain.
-
add_to_frontier_list(pos)
Add the given position to the frontier list, if we haven’t visited it already.
Parameters: pos – position that we want to add to the frontier list
-
compute_optimal_solution()
Compute the optimal exploration path length, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018. Requires GLNS solver.
-
done_exploring()
Check if we are done exploring (i.e., if there are no more frontiers, and we are not currently travelling anywhere).
-
get_best_target(targets, consider_all=False)
Find the closest position in the targets list to the player.
Parameters: - targets – positions to consider (i.e., frontiers)
- consider_all – used in subclass methods.
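A minimal sketch of greedy target selection follows. It assumes Manhattan distance as a stand-in metric; the actual policy may measure distance along walkable paths instead. The names manhattan and closest_target are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_target(player_pos, targets):
    """Return the target nearest to the player, or None if there are no targets."""
    if not targets:
        return None
    return min(targets, key=lambda pos: manhattan(player_pos, pos))


# Example: three candidate frontiers, player at the origin.
best = closest_target((0, 0), [(3, 4), (1, 1), (5, 0)])
```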
-
mark_explored(pos)
Add the given position to the explored positions list, and delete it from the frontier list if applicable.
Parameters: pos – position that we want to mark as explored
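The interplay between the frontier list and the explored list can be sketched as follows. This is an illustrative stand-in, not the policy's implementation: the class FrontierTracker and its attributes are hypothetical, and its done_exploring omits the "not currently travelling" condition that the real method also checks.

```python
class FrontierTracker:
    """Hypothetical bookkeeping for frontier and explored positions."""

    def __init__(self):
        self.frontiers = []    # positions still to visit, in discovery order
        self.explored = set()  # positions already visited

    def add_to_frontier_list(self, pos):
        # Skip positions that were already visited or are already queued.
        if pos not in self.explored and pos not in self.frontiers:
            self.frontiers.append(pos)

    def mark_explored(self, pos):
        self.explored.add(pos)
        if pos in self.frontiers:
            self.frontiers.remove(pos)

    def done_exploring(self):
        # Simplified: only checks that no frontiers remain.
        return not self.frontiers


tracker = FrontierTracker()
tracker.add_to_frontier_list((1, 2))
tracker.add_to_frontier_list((3, 4))
tracker.mark_explored((1, 2))
tracker.add_to_frontier_list((1, 2))  # ignored: already explored
```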
-
name = 'greedy'
-
new_corridor(corr)
Update the passages list with the given corridor position.
Parameters: corr – corridor position
-
new_passage_from_room_exit(room_centroid, exit)
Update the passages list with the given room centroid and exit.
Parameters: - room_centroid – center of the room associated with the room exit below.
- exit – the room exit that we want to make a passage from.
-
process_and_check_target(target)
Check if the given target is valid.
Parameters: target – position we currently want to visit
-
set_config(compute_optimal_path=False, get_food=False, show_graph=False, **args)
Set the configuration for this exploration policy.
Parameters: - compute_optimal_path – whether to compute the optimal exploration path after each episode, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.
- get_food – whether to stop to pick up food in rooms; this increases the number of actions taken, but better approximates a real player’s exploration action total.
- show_graph – whether to show the room/corridor graph on screen.
-
-
class gym_nethack.policies.exploration.MapExplorationPolicy(need_full_map=False)
Bases: gym_nethack.policies.core.ParameterizedPolicy
Template map exploration policy.
-
class gym_nethack.policies.exploration.OccupancyMapPolicy
Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy
Occupancy map exploration algorithm for NetHack. Described in the paper “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.
-
dfs_threshold_prob(start, prob_threshold)
Do a DFS on the current unexplored area of the map. Only visit positions that are above the given probability threshold.
Parameters: - start – position to start at for the DFS
- prob_threshold – positions must be above this threshold value to be visited by DFS
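A thresholded depth-first search of this kind can be sketched as below. The map representation (a 2D list of probabilities indexed by row and column), the 4-neighbour connectivity, and the function name dfs_above_threshold are assumptions made for illustration.

```python
def dfs_above_threshold(prob_map, start, prob_threshold):
    """Return the set of cells reachable from `start` through cells whose
    probability is strictly above `prob_threshold` (4-neighbour moves)."""
    rows, cols = len(prob_map), len(prob_map[0])
    visited = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in visited:
            continue
        if not (0 <= r < rows and 0 <= c < cols):
            continue
        if prob_map[r][c] <= prob_threshold:
            continue
        visited.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return visited


prob_map = [
    [0.9, 0.8, 0.1],
    [0.1, 0.7, 0.1],
    [0.6, 0.1, 0.9],
]
reached = dfs_above_threshold(prob_map, start=(0, 0), prob_threshold=0.5)
```

Cells such as (2, 0) and (2, 2) are above the threshold but are not reached, because every route to them passes through a low-probability cell.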
-
done_exploring()
Returns a boolean indicating whether to stop exploring the current map (i.e., end the episode).
-
get_best_frontier(good_targets, connected_components, return_all=False)
Find the best component, and then find the best frontier associated with it.
Parameters: - good_targets – frontiers that have been evaluated and passed utility check
- connected_components – list of components
- return_all – whether to return best frontier for all components (to show on graph), or just best frontier for best component
-
get_best_target(targets, consider_all=False)
Find the closest position in the targets list to the player.
Parameters: - targets – positions to consider (i.e., frontiers)
- consider_all – whether to consider all frontiers, or just ones that have passed the utility check (good_position())
-
get_connected_components(update=False)
Get the connected components (in terms of graph theory) of unexplored space in the current NetHack map.
Parameters: update – whether to recalculate or return previously cached components
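For readers unfamiliar with the term, the sketch below groups a set of unexplored cells into connected components using a breadth-first flood fill. It assumes cells are (row, column) tuples joined by 4-neighbour adjacency, and it leaves out the caching controlled by update; the function name connected_components is hypothetical.

```python
from collections import deque


def connected_components(unexplored):
    """Group unexplored cells into lists of mutually reachable cells."""
    remaining = set(unexplored)
    components = []
    while remaining:
        seed = remaining.pop()
        component = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for neighbour in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.append(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


cells = {(0, 0), (0, 1), (2, 2), (2, 3), (3, 3), (5, 5)}
components = connected_components(cells)
```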
-
get_default_params()
Set the default params for the algorithm, which will be used if not using grid search.
-
get_dist_to_component(component, position)
Get the Manhattan distance from the given position to the closest cell of the given component.
Parameters: - component – list of positions (tuples)
- position – tuple representing position
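The distance described here amounts to a minimum over the component's cells. A minimal sketch, with the hypothetical name dist_to_component:

```python
def dist_to_component(component, position):
    """Manhattan distance from `position` to the nearest cell in `component`.

    Raises ValueError if `component` is empty, since min() has nothing to compare.
    """
    px, py = position
    return min(abs(px - x) + abs(py - y) for x, y in component)


distance = dist_to_component([(2, 2), (5, 5), (0, 4)], (0, 0))
```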
-
get_distance_to_player(target)
Get distance to player, from cache if available.
Parameters: target – position (tuple)
-
get_evaluation_for_cells(component, frontier, sum_dists, sum_probs)
Evaluate component based on distance from closest frontier node to player and summed cell probability.
Parameters: - component – list of cells to evaluate
- frontier – tuple of (position, distance)
- sum_dists – sum of distances between player and frontiers associated with each valid component
- sum_probs – sum of cell probabilities for all valid components
-
get_frontier_near_component(component, frontiers, frontier_dists_to_player)
Get the frontier closest to both the given component and to the player.
Parameters: - component – list of positions (tuples)
- frontiers – list of frontiers to evaluate
- frontier_dists_to_player – distance to player for each frontier
-
get_prob_threshold(dfs=False)
Get the probability threshold.
Parameters: dfs – use different threshold multiplier if we are getting the threshold for DFS search versus threshold for regular component/cell validation.
-
good_position(pos, prob_threshold)
Check if frontier is interesting enough to visit.
Parameters: - pos – frontier position to check
- prob_threshold – probability threshold value
-
name = 'occmap'
-
no_more_targets(targets)
Check if there are any more targets left.
Parameters: targets – list of frontiers
-
normalize_and_diffuse(p_culled)
Normalize occupancy map probabilities and run diffusion as described by D. Isla.
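The sketch below shows one common form of occupancy-map normalization and diffusion: probabilities are scaled to sum to 1, then each cell keeps a share (1 - diffusion) of its own value and receives a share equal to diffusion times the average of its 4-neighbours. The function name, the diffusion rate, and the 2D-list representation are assumptions, and the p_culled argument is not modeled here; the policy's own update may differ.

```python
def normalize_then_diffuse(probs, diffusion=0.25):
    """Normalize a 2D probability grid, then apply one diffusion step."""
    rows, cols = len(probs), len(probs[0])
    total = sum(sum(row) for row in probs)
    if total == 0:
        # Nothing to normalize; return an unchanged copy.
        return [row[:] for row in probs]
    norm = [[p / total for p in row] for row in probs]

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                (nr, nc)
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if not neighbours:
                # A 1x1 grid has no neighbours to exchange probability with.
                out[r][c] = norm[r][c]
                continue
            incoming = sum(norm[nr][nc] for nr, nc in neighbours) / len(neighbours)
            out[r][c] = (1 - diffusion) * norm[r][c] + diffusion * incoming
    return out


grid = [
    [0, 0, 0],
    [0, 4, 0],
    [0, 0, 0],
]
diffused = normalize_then_diffuse(grid)
```

This averaging form does not conserve total probability exactly near map edges, so a caller that needs a proper distribution may want to normalize again after the diffusion step.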
-
-
class gym_nethack.policies.exploration.SecretGreedyExplorationPolicy
Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy
Extension of greedy exploration algorithm to support searching for secret doors and corridors.
-
done_exploring()
Check whether exploration is finished, which includes the case where all search targets are above the maximum number of searches per wall.
-
end_episode()
Compute information about how many secret doors/corridors/rooms were discovered and how many were not, in the map for the current episode.
-
get_best_target(targets, consider_all=False)
Find the closest position in the targets list to the player, taking into account walls to search at.
Parameters: - targets – positions to consider (i.e., frontiers)
- consider_all – used in subclass methods.
-
name = 'secgreedy'
-