gym_nethack.policies package

Submodules

gym_nethack.policies.core module

class gym_nethack.policies.core.ParameterizedPolicy[source]

Bases: gym_nethack.policies.core.Policy

Extension of policy class that allows for grid-search on specified parameters.

end_episode()[source]

Record that an episode has ended.

get_default_params()[source]

Get the default parameters for the policy.

reset()[source]

Called on starting a new episode.

set_combos(combos)[source]

Update list of parameter combinations to try.

Parameters:combos – list of combinations to use
set_config(grid_search=False, top_models=False, num_episodes_per_combo=200, proc_id=0, num_procs=1, param_combos=None, param_abbrvs=None)[source]

Set config.

Parameters:
  • grid_search – whether to change parameters after a fixed number of episodes.
  • top_models – whether to load the parameter combinations to use from a text file (requires grid_search=True).
  • num_episodes_per_combo – if grid search is enabled, the number of episodes to run for each combination of algorithm parameters.
  • proc_id – if grid search is enabled, the process ID of this environment, which must match the argument passed to the daemon launching script.
  • num_procs – if grid search is enabled, the number of processes running in parallel.
  • param_combos – list of lists of parameter combinations.
  • param_abbrvs – abbreviated parameter names (used for the directory name).
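
The grid-search options above can be pictured with a small sketch. It is not the package's implementation: the helper names build_combos, combos_for_process, and combo_for_episode, the round-robin split across processes, the wrap-around after the last combination, and the example parameter values are all assumptions made for illustration.

```python
from itertools import product


def build_combos(param_values):
    """Return every combination of the given per-parameter value lists."""
    return [list(combo) for combo in product(*param_values)]


def combos_for_process(combos, proc_id, num_procs):
    """Assumed round-robin split: process k takes combos k, k + num_procs, ..."""
    return combos[proc_id::num_procs]


def combo_for_episode(combos, episode, num_episodes_per_combo):
    """Pick the combination for a 0-based episode index (assumed to wrap around)."""
    return combos[(episode // num_episodes_per_combo) % len(combos)]


# Hypothetical search space: two values for each of two parameters.
param_combos = build_combos([[0.1, 0.5], [10, 20]])
mine = combos_for_process(param_combos, proc_id=1, num_procs=2)
```

With num_episodes_per_combo=200, this sketch keeps one combination for episodes 0 to 199 and moves to the next one at episode 200.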
set_params(params)[source]

Set the current parameters for the policy.

Parameters:params – policy parameters
switch_encounter()[source]

Change the algorithm parameters if grid search is enabled.

class gym_nethack.policies.core.Policy(name='obsolete')[source]

Bases: object

Standard policy class taken from Keras-RL with a few extensions.

get_config()[source]
metrics
metrics_names
name = 'unnamed'
select_action(**kwargs)[source]
set_config()[source]

gym_nethack.policies.combat module

class gym_nethack.policies.combat.ApproachAttackItemPolicy[source]

Bases: gym_nethack.policies.core.Policy

Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified). It then uses a random item with probability 0.25, or approaches the monster and attacks it at close range with probability 0.75. (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)

select_action(q_values, valid_action_indices)[source]

Return the action corresponding to the heuristic policy.

Parameters:
  • q_values – list of q-values, one per action
  • valid_action_indices – indices of legal actions (corresponding to the abilities list)
set_config(equip_armor=False)[source]

Set policy parameters.

Parameters:equip_armor – whether to randomly choose and equip armor, up to a maximum of five pieces, before starting to approach and attack the monster.
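
The 0.25/0.75 split in the ApproachAttackItemPolicy description can be sketched as a single random draw. This is an illustration only: the function name choose_step, the string labels, and the fallback to attacking when no item is available are assumptions, not the package's code.

```python
import random

ITEM_PROBABILITY = 0.25  # taken from the class description


def choose_step(has_usable_item, rng=random):
    """Return 'use_item' with probability 0.25, else 'approach_or_attack'.

    Falling back to 'approach_or_attack' when no item is available is an
    assumption of this sketch.
    """
    if has_usable_item and rng.random() < ITEM_PROBABILITY:
        return "use_item"
    return "approach_or_attack"


# Example draw with a seeded generator for repeatability.
step = choose_step(True, random.Random(0))
```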
class gym_nethack.policies.combat.ApproachAttackPolicy[source]

Bases: gym_nethack.policies.core.Policy

Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified), then approaches the monster and attacks it at close range. (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)

select_action(q_values, valid_action_indices)[source]

Return the action corresponding to the heuristic policy.

Parameters:
  • q_values – list of q-values, one per action
  • valid_action_indices – indices of legal actions (corresponding to the abilities list)
set_config(equip_armor=False)[source]

Set policy parameters.

Parameters:equip_armor – whether to randomly choose and equip armor, up to a maximum of five pieces, before starting to approach and attack the monster.
class gym_nethack.policies.combat.FireAntPolicy[source]

Bases: gym_nethack.policies.core.Policy

Heuristic policy for fire ant, as described in my thesis.

new_episode()[source]

Start a new episode, resetting policy state.

select_action(q_values, valid_action_indices)[source]

Return the action corresponding to the heuristic policy.

Parameters:
  • q_values – list of q-values, one per action
  • valid_action_indices – indices of legal actions (corresponding to the abilities list)

gym_nethack.policies.exploration module

class gym_nethack.policies.exploration.GreedyExplorationPolicy(need_full_map=False)[source]

Bases: gym_nethack.policies.exploration.MapExplorationPolicy

Map exploration policy that always visits the frontier closest to the player until no frontiers remain.

add_to_frontier_list(pos)[source]

Add the given position to the frontier list, if we haven’t visited it already.

Parameters:pos – position that we want to add to the frontier list
compute_optimal_solution()[source]

Compute the optimal exploration path length, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018. Requires GLNS solver.

done_exploring()[source]

Check if we are done exploring (i.e., if there are no more frontiers, and we are not currently travelling anywhere).

draw_graph()[source]

Draw the room/corridor graph (uses matplotlib).

end_episode()[source]

End the current episode.

end_turn()[source]

End the current turn.

exit_graph()[source]

Close the currently displayed graph.

first_turn_update()[source]

Special preparation performed on the first turn only.

get_best_target(targets, consider_all=False)[source]

Find the position in the targets list that is closest to the player.

Parameters:
  • targets – positions to consider (i.e., frontiers)
  • consider_all – used in subclass methods.
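
A minimal sketch of picking the closest target follows. It assumes positions are (row, col) tuples and uses Manhattan distance; the real policy may measure distance differently (for example, by path length), and the names manhattan and closest_target are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_target(player, targets):
    """Return the target nearest to the player, or None if there are none."""
    if not targets:
        return None
    return min(targets, key=lambda t: manhattan(player, t))


# Hypothetical frontier list.
best = closest_target((0, 0), [(5, 5), (1, 1), (3, 0)])
```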
init_graph()[source]

Initialize the room/corridor graph (uses matplotlib).

mark_explored(pos)[source]

Add the given position to the explored positions list, and delete it from the frontier list if applicable.

Parameters:pos – position that we want to mark as explored
name = 'greedy'
need_new_target()[source]

Check if we need a new target.

new_corridor(corr)[source]

Update the passages list with the given corridor position.

Parameters:corr – corridor position
new_passage_from_room_exit(room_centroid, exit)[source]

Update the passages list with the given room centroid and exit.

Parameters:
  • room_centroid – center of the room associated with the room exit below.
  • exit – the room exit that we want to make a passage from.
observe_action()[source]

Parse NetHack map.

process_and_check_target(target)[source]

Check if the given target is valid.

Parameters:target – position we currently want to visit
reset()[source]

Reset policy state.

select_action(q_values, valid_action_indices)[source]

Determine where to move next (greedily).

set_config(compute_optimal_path=False, get_food=False, show_graph=False, **args)[source]

Set config.

Parameters:
  • compute_optimal_path – whether to compute the optimal exploration path after each episode, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.
  • get_food – whether to stop to pick up food in rooms; increases the number of actions taken, but better approximates a real player’s exploration action total.
  • show_graph – whether to show the room/corridor graph on screen.
class gym_nethack.policies.exploration.MapExplorationPolicy(need_full_map=False)[source]

Bases: gym_nethack.policies.core.ParameterizedPolicy

Template map exploration policy.

done_exploring()[source]

Returns a boolean indicating whether to stop exploring the current map (i.e., end the episode).

class gym_nethack.policies.exploration.OccupancyMapPolicy[source]

Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy

Occupancy map exploration algorithm for NetHack. Described in the paper “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.

dfs_threshold_prob(start, prob_threshold)[source]

Run a depth-first search (DFS) over the currently unexplored area of the map, visiting only positions whose probability is above the given threshold.

Parameters:
  • start – position to start at for the DFS
  • prob_threshold – positions must be above this threshold value to be visited by DFS
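
The thresholded DFS can be sketched on a plain grid of probabilities. The grid layout (a list of rows indexed by (row, col)), 4-connectivity, and the function name dfs_above_threshold are assumptions of this sketch; the policy's own data structures are not shown in this reference.

```python
def dfs_above_threshold(prob_map, start, prob_threshold):
    """Iterative DFS over 4-connected cells whose probability exceeds the threshold."""
    rows, cols = len(prob_map), len(prob_map[0])
    visited = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in visited:
            continue
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # outside the map
        if prob_map[r][c] <= prob_threshold:
            continue  # not probable enough to visit
        visited.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return visited


# Hypothetical 3x3 occupancy map.
prob_map = [
    [0.9, 0.8, 0.1],
    [0.05, 0.7, 0.1],
    [0.9, 0.05, 0.1],
]
reached = dfs_above_threshold(prob_map, (0, 0), 0.5)
```

In this example the high-probability cell at (2, 0) is not reached, because every cell connecting it to the start is below the threshold.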
done_exploring()[source]

Returns a boolean indicating whether to stop exploring the current map (i.e., end the episode).

draw_graph()[source]

Draw the occupancy map graph (uses matplotlib).

first_turn_update()[source]

Special preparation performed on the first turn only.

get_best_frontier(good_targets, connected_components, return_all=False)[source]

Find the best component, and then find the best frontier associated with it.

Parameters:
  • good_targets – frontiers that have been evaluated and passed utility check
  • connected_components – list of components
  • return_all – whether to return best frontier for all components (to show on graph), or just best frontier for best component
get_best_target(targets, consider_all=False)[source]

Find the position in the targets list that is closest to the player.

Parameters:
  • targets – positions to consider (i.e., frontiers)
  • consider_all – whether to consider all frontiers, or just ones that have passed the utility check (good_position())
get_connected_components(update=False)[source]

Get the connected components (in terms of graph theory) of unexplored space in the current NetHack map.

Parameters:update – whether to recalculate or return previously cached components
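
Connected components of unexplored space can be found with a flood fill. The sketch below assumes unexplored cells are given as a set of (row, col) tuples and that cells are 4-connected; the function name connected_components is hypothetical, and caching (the update flag) is not modelled.

```python
from collections import deque


def connected_components(cells):
    """Group a set of (row, col) cells into 4-connected components."""
    remaining = set(cells)
    components = []
    while remaining:
        seed = remaining.pop()
        component = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.append(nb)
                    queue.append(nb)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical unexplored cells forming three separate regions.
unexplored = {(0, 0), (0, 1), (2, 2), (3, 2), (5, 5)}
components = connected_components(unexplored)
```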
get_default_params()[source]

Get the default parameters for the algorithm, which are used when grid search is not enabled.

get_dist_to_component(component, position)[source]

Get the Manhattan distance from the given position to the closest cell of the given component.

Parameters:
  • component – list of positions (tuples)
  • position – tuple representing position
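
The distance described here reduces to a minimum over the component's cells. A short sketch, with the function name dist_to_component chosen for illustration and an empty component left as an error case:

```python
def dist_to_component(component, position):
    """Manhattan distance from position to the closest cell in component.

    Raises ValueError if the component is empty.
    """
    return min(
        abs(r - position[0]) + abs(c - position[1]) for r, c in component
    )


# Hypothetical component and player position.
distance = dist_to_component([(2, 2), (3, 2), (5, 5)], (0, 0))
```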
get_distance_to_player(target)[source]

Get distance to player, from cache if available.

Parameters:target – position (tuple)
get_evaluation_for_cells(component, frontier, sum_dists, sum_probs)[source]

Evaluate component based on distance from closest frontier node to player and summed cell probability.

Parameters:
  • component – list of cells to evaluate
  • frontier – tuple of (position, distance)
  • sum_dists – sum of distances between player and frontiers associated with each valid component
  • sum_probs – sum of cell probabilities for all valid components
get_frontier_near_component(component, frontiers, frontier_dists_to_player)[source]

Get the frontier closest to both the given component and to the player.

Parameters:
  • component – list of positions (tuples)
  • frontiers – list of frontiers to evaluate
  • frontier_dists_to_player – distance to player for each frontier
get_prob_threshold(dfs=False)[source]

Get the probability threshold.

Parameters:dfs – whether the threshold is for the DFS search, which uses a different threshold multiplier than regular component/cell validation.
good_position(pos, prob_threshold)[source]

Check if frontier is interesting enough to visit.

Parameters:
  • pos – frontier position to check
  • prob_threshold – probability threshold value
init_graph()[source]

Initialize the occupancy map graph (uses matplotlib).

name = 'occmap'
no_more_targets(targets)[source]

Check if there are any more targets left.

Parameters:targets – list of frontiers
normalize_and_diffuse(p_culled)[source]

Normalize occupancy map probabilities and run diffusion as described by D. Isla.
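
As a rough illustration of normalization followed by diffusion, the sketch below uses a common form of occupancy-map diffusion in which each cell blends its own probability with the average of its 4-connected neighbours. The blend rate, the neighbourhood, and the function names are assumptions, and the p_culled argument is not modelled; the exact update used by this method is not given in this reference.

```python
def normalize(grid):
    """Scale all cells so that the probabilities sum to 1 (no-op if all zero)."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [row[:] for row in grid]
    return [[v / total for v in row] for row in grid]


def diffuse(grid, rate=0.25):
    """Blend each cell with the mean of its in-bounds 4-connected neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [
                grid[r + dr][c + dc]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            avg = sum(nbrs) / len(nbrs) if nbrs else grid[r][c]
            out[r][c] = (1 - rate) * grid[r][c] + rate * avg
    return out


# Hypothetical 2x2 map with unnormalized weights.
normalized = normalize([[2.0, 0.0], [0.0, 2.0]])
diffused = diffuse(normalized, rate=0.5)
```

This blend does not conserve total probability near map edges in general, so a second normalize call after diffuse may be needed.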

observe_action()[source]

Parse NetHack map and update occupancy map accordingly.

reset()[source]

Prepare for a new episode.

set_config(**args)[source]

Set config.

set_params(params)[source]

Set the current parameters for the policy.

Parameters:params – policy parameters
update_caches(targets, prob_threshold=None)[source]

Update validated frontier and component caches.

Parameters:
  • targets – current list of frontiers
  • prob_threshold – probability threshold value
class gym_nethack.policies.exploration.SecretGreedyExplorationPolicy[source]

Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy

Extension of greedy exploration algorithm to support searching for secret doors and corridors.

done_exploring()[source]

Check whether exploration is finished, which includes the case where all search targets have exceeded the maximum number of searches per wall.

end_episode()[source]

Compute information about how many secret doors/corridors/rooms were discovered and how many were not, in the map for the current episode.

get_best_target(targets, consider_all=False)[source]

Find the position in the targets list that is closest to the player, taking into account walls to search at.

Parameters:
  • targets – positions to consider (i.e., frontiers)
  • consider_all – used in subclass methods.
get_default_params()[source]

Get the default parameters for the policy.

name = 'secgreedy'
observe_action()[source]

Parse NetHack map.

process_and_check_target(target)[source]

Check if the given target is valid.

Parameters:target – position we currently want to visit
reset()[source]

Prepare the environment for a new episode.

select_action(q_values, valid_action_indices)[source]

Determine where to move next (greedily).

set_config(**args)[source]

Set config.

set_params(params)[source]

Set the current parameters for the policy.

Parameters:params – policy parameters

gym_nethack.policies.rl module

class gym_nethack.policies.rl.BoltzmannPossibleQPolicy(tau=1.0, clip=(-500.0, 500.0))[source]

Bases: gym_nethack.policies.core.Policy

get_config()[source]
select_action(q_values, valid_action_indices)[source]
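
This class has no description here. Judging by its name and signature, it likely applies Boltzmann (softmax) selection restricted to the valid actions. The sketch below shows that technique under that assumption, using only the standard library; the helper name boltzmann_probs and the rng argument are hypothetical.

```python
import math
import random


def boltzmann_probs(q_values, valid_action_indices, tau=1.0, clip=(-500.0, 500.0)):
    """Softmax over the q-values of the valid actions, scaled by tau and clipped."""
    scaled = [
        min(max(q_values[i] / tau, clip[0]), clip[1])
        for i in valid_action_indices
    ]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, valid_action_indices, tau=1.0,
                  clip=(-500.0, 500.0), rng=random):
    """Sample one valid action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, valid_action_indices, tau, clip)
    return rng.choices(valid_action_indices, weights=probs, k=1)[0]


# Action 1 has the largest q-value but is not valid, so it is never chosen.
probs = boltzmann_probs([1.0, 100.0, 1.0], [0, 2])
action = select_action([1.0, 100.0, 1.0], [0, 2], rng=random.Random(0))
```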
class gym_nethack.policies.rl.EpsGreedyPossibleQPolicy(eps=0.1)[source]

Bases: gym_nethack.policies.core.Policy

get_config()[source]
select_action(q_values, valid_action_indices)[source]
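
This class is also undocumented here. Its name and signature suggest epsilon-greedy selection over the valid actions only; the following sketch shows that technique under that assumption and is not taken from the package source. The rng argument is hypothetical.

```python
import random


def select_action(q_values, valid_action_indices, eps=0.1, rng=random):
    """With probability eps pick a random valid action, else the best valid one."""
    if rng.random() < eps:
        return rng.choice(valid_action_indices)
    return max(valid_action_indices, key=lambda i: q_values[i])


# Action 1 has the largest q-value but is not valid; with eps=0 the best
# valid action (index 2) is always chosen.
greedy = select_action([5.0, 9.0, 7.0], [0, 2], eps=0.0)
```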
class gym_nethack.policies.rl.LinearAnnealedPolicy(inner_policy, attr, value_max, value_min, value_test, nb_steps)[source]

Bases: gym_nethack.policies.core.Policy

get_config()[source]
get_current_value()[source]
metrics
metrics_names
select_action(**kwargs)[source]
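
The constructor arguments match the linear annealing wrapper in Keras-RL, where an attribute of the inner policy (such as eps) decreases linearly from value_max to value_min over nb_steps training steps and takes value_test during testing. The sketch below shows that schedule, assuming this class behaves the same way; the function name annealed_value and the training flag are hypothetical.

```python
def annealed_value(step, value_max, value_min, nb_steps,
                   training=True, value_test=0.05):
    """Linearly annealed value, clamped at value_min; value_test when not training."""
    if not training:
        return value_test
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


# Hypothetical schedule: anneal from 1.0 down to 0.1 over 100 steps.
halfway = annealed_value(50, value_max=1.0, value_min=0.1, nb_steps=100)
```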

Module contents