gym_nethack.policies package¶

Submodules¶

gym_nethack.policies.core module¶

class gym_nethack.policies.core.ParameterizedPolicy[source]¶

Bases: gym_nethack.policies.core.Policy

Extension of policy class that allows for grid-search on specified parameters.

end_episode()[source]¶: Record that the current episode has ended.

get_default_params()[source]¶: Get the default parameters for the policy.

reset()[source]¶: Called on starting a new episode.

set_combos(combos)[source]¶

Update list of parameter combinations to try.

Parameters:	combos – list of combinations to use

set_config(grid_search=False, top_models=False, num_episodes_per_combo=200, proc_id=0, num_procs=1, param_combos=None, param_abbrvs=None)[source]¶

Set the grid-search configuration.

Parameters:

grid_search – whether to change parameters after a fixed number of episodes.
top_models – whether to load parameter combinations from a text file and use those. (Requires grid_search=True.)
num_episodes_per_combo – if grid search, number of episodes to run for each combination of algorithm parameters.
proc_id – if grid search, process ID of this environment, to be matched with the argument passed to the daemon launching script.
num_procs – if grid search, number of processes that will be running in parallel.
param_combos – list of lists of parameter combinations.
param_abbrvs – abbreviated parameter names (used in the directory name).
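
The grid-search options above can be pictured with a small sketch. It is not the package's implementation: the round-robin split across processes, the helper names combos_for_process and combo_index_for_episode, and the example grid are all assumptions made for illustration.

```python
from itertools import product


def combos_for_process(param_grid, proc_id, num_procs):
    """Return the share of the full grid that one process would evaluate.

    Assumption: combinations are dealt out round-robin, so process k gets
    combinations k, k + num_procs, k + 2 * num_procs, and so on.
    """
    all_combos = [list(combo) for combo in product(*param_grid)]
    return all_combos[proc_id::num_procs]


def combo_index_for_episode(episode, num_episodes_per_combo):
    """Index of the combination active during a given 0-based episode."""
    return episode // num_episodes_per_combo


# Two hypothetical parameters with two and three candidate values each.
grid = [[0.1, 0.5], [10, 20, 30]]
mine = combos_for_process(grid, proc_id=1, num_procs=2)
```

With num_episodes_per_combo=200, episodes 0 to 199 would use the first combination of the process's share, episodes 200 to 399 the second, and so on.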

set_params(params)[source]¶

Set the current parameters for the policy.

Parameters:	params – policy parameters

switch_encounter()[source]¶: Change the algorithm parameters if grid search is in use.

class gym_nethack.policies.core.Policy(name='obsolete')[source]¶

Bases: object

Standard policy class taken from Keras-RL with a few extensions.

get_config()[source]¶

metrics¶

metrics_names¶

name = 'unnamed'¶

select_action(**kwargs)[source]¶

set_config()[source]¶

gym_nethack.policies.combat module¶

class gym_nethack.policies.combat.ApproachAttackItemPolicy[source]¶

Bases: gym_nethack.policies.core.Policy

Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified), then either uses a random item (probability 0.25) or approaches the monster and attacks it at close range (probability 0.75). (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)

select_action(q_values, valid_action_indices)[source]¶

Return the action corresponding to the heuristic policy.

Parameters:

q_values – list of q-values, one per action
valid_action_indices – indices of legal actions (corresponding to the abilities list)

set_config(equip_armor=False)[source]¶

Set policy parameters.

Parameters:	equip_armor – whether to randomly choose a piece of armor and equip it, to a maximum of five pieces of armor, before starting to approach and attack the monster.
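
The 0.25/0.75 split described for this policy amounts to a single biased coin flip. A minimal sketch follows, assuming the choice is made with one uniform random draw; the function name choose_high_level_action and the action labels are hypothetical and do not come from the package.

```python
import random


def choose_high_level_action(rng, item_prob=0.25):
    """Return 'use_item' with probability item_prob, else 'approach_attack'."""
    return "use_item" if rng.random() < item_prob else "approach_attack"


# Empirical check of the split over many draws with a seeded generator.
rng = random.Random(0)
counts = {"use_item": 0, "approach_attack": 0}
for _ in range(10000):
    counts[choose_high_level_action(rng)] += 1
```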

class gym_nethack.policies.combat.ApproachAttackPolicy[source]¶

Bases: gym_nethack.policies.core.Policy

Heuristic policy for NetHack combat that randomly equips a weapon (and armor, if specified), then approaches the monster and attacks it at close range. (If a ranged weapon is equipped, it attacks from a distance instead of approaching.)

select_action(q_values, valid_action_indices)[source]¶

Return the action corresponding to the heuristic policy.

Parameters:

q_values – list of q-values, one per action
valid_action_indices – indices of legal actions (corresponding to the abilities list)

set_config(equip_armor=False)[source]¶

Set policy parameters.

Parameters:	equip_armor – whether to randomly choose a piece of armor and equip it, to a maximum of five pieces of armor, before starting to approach and attack the monster.

class gym_nethack.policies.combat.FireAntPolicy[source]¶

Bases: gym_nethack.policies.core.Policy

Heuristic policy for the fire ant, as described in the author's thesis.

new_episode()[source]¶: Start a new episode, resetting policy state.

select_action(q_values, valid_action_indices)[source]¶

Return the action corresponding to the heuristic policy.

Parameters:

q_values – list of q-values, one per action
valid_action_indices – indices of legal actions (corresponding to the abilities list)

gym_nethack.policies.exploration module¶

class gym_nethack.policies.exploration.GreedyExplorationPolicy(need_full_map=False)[source]¶

Bases: gym_nethack.policies.exploration.MapExplorationPolicy

Map exploration policy that always visits the frontier closest to the player until no frontiers remain.

add_to_frontier_list(pos)[source]¶

Add the given position to the frontier list, if we haven’t visited it already.

Parameters:	pos – position that we want to add to the frontier list

compute_optimal_solution()[source]¶: Compute the optimal exploration path length, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018. Requires GLNS solver.

done_exploring()[source]¶: Check if we are done exploring (i.e., if there are no more frontiers, and we are not currently travelling anywhere).

draw_graph()[source]¶: Draw the room/corridor graph (uses matplotlib).

end_episode()[source]¶: End the current episode.

end_turn()[source]¶: End the current turn.

exit_graph()[source]¶: Close the currently displayed graph.

first_turn_update()[source]¶: Special preparation taken on first turn only.

get_best_target(targets, consider_all=False)[source]¶

Find the closest position in the targets list to the player.

Parameters:

targets – positions to consider (i.e., frontiers)
consider_all – used in subclass methods.
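
Picking the closest target can be sketched as a minimum over distances. The sketch below uses Manhattan distance on (row, column) tuples as a stand-in; the package may well use path distance on the map instead, and the names manhattan and closest_target are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_target(player, targets):
    """Return the target nearest to the player, or None if there are none."""
    if not targets:
        return None
    return min(targets, key=lambda t: manhattan(player, t))


best = closest_target((0, 0), [(5, 5), (1, 2), (4, 0)])
```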

init_graph()[source]¶: Initialize the room/corridor graph (uses matplotlib).

mark_explored(pos)[source]¶

Add the given position to the explored positions list, and delete it from the frontier list if applicable.

Parameters:	pos – position that we want to mark as explored
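
The interplay between add_to_frontier_list and mark_explored can be illustrated with a small stand-alone tracker. This is a sketch under assumptions: the class name FrontierTracker is hypothetical, and a set is used for explored positions for fast lookup, whereas the description above speaks of a list.

```python
class FrontierTracker:
    """Toy bookkeeping for frontier and explored positions."""

    def __init__(self):
        self.frontiers = []    # positions still worth visiting
        self.explored = set()  # positions already visited

    def add_to_frontier_list(self, pos):
        # Skip positions that were already visited or already queued.
        if pos not in self.explored and pos not in self.frontiers:
            self.frontiers.append(pos)

    def mark_explored(self, pos):
        # Record the visit and drop the position from the frontier list.
        self.explored.add(pos)
        if pos in self.frontiers:
            self.frontiers.remove(pos)


tracker = FrontierTracker()
tracker.add_to_frontier_list((1, 1))
tracker.add_to_frontier_list((2, 3))
tracker.mark_explored((1, 1))
tracker.add_to_frontier_list((1, 1))  # ignored: already explored
```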

name = 'greedy'¶

need_new_target()[source]¶: Check if we need a new target.

new_corridor(corr)[source]¶

Update the passages list with the given corridor position.

Parameters:	corr – corridor position

new_passage_from_room_exit(room_centroid, exit)[source]¶

Update the passages list with the given room centroid and exit.

Parameters:

room_centroid – center of the room associated with the room exit below.
exit – the room exit that we want to make a passage from.

observe_action()[source]¶: Parse NetHack map.

process_and_check_target(target)[source]¶

Check if the given target is valid.

Parameters:	target – position we currently want to visit

reset()[source]¶: Reset policy state.

select_action(q_values, valid_action_indices)[source]¶: Determine where to move next (greedily).

set_config(compute_optimal_path=False, get_food=False, show_graph=False, **args)[source]¶

Set config.

Parameters:

compute_optimal_path – whether to compute the optimal exploration path after each episode, as detailed in “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.
get_food – whether to stop to pick up food in rooms; increases the number of actions taken, but better approximates a real player’s exploration action total.
show_graph – whether to show the room/corridor graph on screen.

class gym_nethack.policies.exploration.MapExplorationPolicy(need_full_map=False)[source]¶

Bases: gym_nethack.policies.core.ParameterizedPolicy

Template map exploration policy.

done_exploring()[source]¶: Returns a boolean indicating whether to stop exploring the current map (i.e., end the episode).

class gym_nethack.policies.exploration.OccupancyMapPolicy[source]¶

Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy

Occupancy map exploration algorithm for NetHack. Described in the paper “Exploration with Secret Discovery”, J. Campbell & C. Verbrugge, IEEE Transactions on Games, 2018.

dfs_threshold_prob(start, prob_threshold)[source]¶

Do a DFS on the current unexplored area of the map. Only visit positions that are above the given probability threshold.

Parameters:

start – position to start at for the DFS
prob_threshold – positions must be above this threshold value to be visited by DFS
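
A thresholded DFS of this kind can be sketched on a plain 2D list of probabilities. The grid representation, 4-neighbor connectivity, and the function name dfs_above_threshold are assumptions for illustration, not details taken from the package.

```python
def dfs_above_threshold(prob_map, start, prob_threshold):
    """Iterative DFS that only visits cells with probability above the threshold."""
    rows, cols = len(prob_map), len(prob_map[0])
    visited = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in visited:
            continue
        if not (0 <= r < rows and 0 <= c < cols):
            continue
        if prob_map[r][c] <= prob_threshold:
            continue
        visited.add((r, c))
        # Push the four orthogonal neighbors.
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return visited


prob_map = [
    [0.9, 0.8, 0.1],
    [0.1, 0.7, 0.1],
    [0.9, 0.1, 0.1],
]
reached = dfs_above_threshold(prob_map, (0, 0), 0.5)
```

The cell at (2, 0) is above the threshold but is not reached, because every path to it passes through a low-probability cell.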

done_exploring()[source]¶: Returns a boolean indicating whether to stop exploring the current map (i.e., end the episode).

draw_graph()[source]¶: Draw the occupancy map graph (uses matplotlib).

first_turn_update()[source]¶: Special preparation taken on first turn only.

get_best_frontier(good_targets, connected_components, return_all=False)[source]¶

Find the best component, and then find the best frontier associated with it.

Parameters:

good_targets – frontiers that have been evaluated and passed utility check
connected_components – list of components
return_all – whether to return best frontier for all components (to show on graph), or just best frontier for best component

get_best_target(targets, consider_all=False)[source]¶

Find the closest position in the targets list to the player.

Parameters:

targets – positions to consider (i.e., frontiers)
consider_all – whether to consider all frontiers, or just ones that have passed the utility check (good_position())

get_connected_components(update=False)[source]¶

Get the connected components (in terms of graph theory) of unexplored space in the current NetHack map.

Parameters:	update – whether to recalculate or return previously cached components
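
Connected components of unexplored space can be found with a standard flood fill. The sketch below assumes unexplored cells are given as a collection of (row, column) tuples with 4-neighbor connectivity; it omits the caching controlled by update, and the function name connected_components is hypothetical.

```python
from collections import deque


def connected_components(cells):
    """Group cells into 4-connected components using breadth-first search."""
    remaining = set(cells)
    components = []
    while remaining:
        seed = remaining.pop()
        component = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.append(nb)
                    queue.append(nb)
        components.append(component)
    return components


unexplored = {(0, 0), (0, 1), (2, 2)}
components = connected_components(unexplored)
```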

get_default_params()[source]¶: Get the default parameters for the algorithm, which are used when grid search is not enabled.

get_dist_to_component(component, position)[source]¶

Get the Manhattan distance from the given position to the closest cell of the given component.

Parameters:

component – list of positions (tuples)
position – tuple representing position
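
The distance described here is the minimum Manhattan distance over the component's cells. A minimal sketch, with the stand-alone function name dist_to_component chosen for illustration:

```python
def dist_to_component(component, position):
    """Manhattan distance from position to the nearest cell in component."""
    return min(
        abs(r - position[0]) + abs(c - position[1])
        for r, c in component
    )


nearest = dist_to_component([(0, 0), (3, 4)], (2, 4))
```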

get_distance_to_player(target)[source]¶

Get distance to player, from cache if available.

Parameters:	target – position (tuple)

get_evaluation_for_cells(component, frontier, sum_dists, sum_probs)[source]¶

Evaluate component based on distance from closest frontier node to player and summed cell probability.

Parameters:

component – list of cells to evaluate
frontier – tuple of (position, distance)
sum_dists – sum of distances between player and frontiers associated with each valid component
sum_probs – sum of cell probabilities for all valid components

get_frontier_near_component(component, frontiers, frontier_dists_to_player)[source]¶

Get the frontier closest to both the given component and to the player.

Parameters:

component – list of positions (tuples)
frontiers – list of frontiers to evaluate
frontier_dists_to_player – distance to player for each frontier

get_prob_threshold(dfs=False)[source]¶

Get the probability threshold.

Parameters:	dfs – use different threshold multiplier if we are getting the threshold for DFS search versus threshold for regular component/cell validation.

good_position(pos, prob_threshold)[source]¶

Check if frontier is interesting enough to visit.

Parameters:

pos – frontier position to check
prob_threshold – probability threshold value

init_graph()[source]¶: Initialize the occupancy map graph (uses matplotlib).

name = 'occmap'¶

no_more_targets(targets)[source]¶

Check if there are any more targets left.

Parameters:	targets – list of frontiers

normalize_and_diffuse(p_culled)[source]¶: Normalize occupancy map probabilities and run diffusion as described by D. Isla.
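
Normalization and diffusion on an occupancy map can be sketched as follows. This is a generic, mass-conserving neighbor-sharing rule written for illustration; it is not claimed to be the exact update used by the package or the exact formulation by D. Isla, it ignores the p_culled argument, and the rate parameter and function names are assumptions.

```python
def normalize(grid):
    """Scale all cells so that the probabilities sum to 1."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [row[:] for row in grid]
    return [[value / total for value in row] for row in grid]


def diffuse(grid, rate=0.25):
    """Each cell keeps (1 - rate) of its mass and shares the rest with neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            if not neighbors:
                # An isolated cell keeps all of its mass.
                out[r][c] += grid[r][c]
                continue
            out[r][c] += (1 - rate) * grid[r][c]
            share = rate * grid[r][c] / len(neighbors)
            for nr, nc in neighbors:
                out[nr][nc] += share
    return out


diffused = diffuse(normalize([[2.0, 0.0], [0.0, 0.0]]), rate=0.25)
```

Because every cell hands out exactly the mass it gives up, the total probability is unchanged by the diffusion step.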

observe_action()[source]¶: Parse NetHack map and update occupancy map accordingly.

reset()[source]¶: Prepare for a new episode.

set_config(**args)[source]¶: Set config.

set_params(params)[source]¶

Set the current parameters for the policy.

Parameters:	params – policy parameters

update_caches(targets, prob_threshold=None)[source]¶

Update validated frontier and component caches.

Parameters:

targets – current list of frontiers
prob_threshold – probability threshold value

class gym_nethack.policies.exploration.SecretGreedyExplorationPolicy[source]¶

Bases: gym_nethack.policies.exploration.GreedyExplorationPolicy

Extension of greedy exploration algorithm to support searching for secret doors and corridors.

done_exploring()[source]¶: Check whether exploration is finished, including the case where all search targets have exceeded the maximum number of searches per wall.

end_episode()[source]¶: Compute information about how many secret doors/corridors/rooms were discovered and how many were not, in the map for the current episode.

get_best_target(targets, consider_all=False)[source]¶

Find the closest position in the targets list to the player, taking into account walls to search at.

Parameters:

targets – positions to consider (i.e., frontiers)
consider_all – used in subclass methods.

get_default_params()[source]¶: Get the default parameters for the policy.

name = 'secgreedy'¶

observe_action()[source]¶: Parse NetHack map.

process_and_check_target(target)[source]¶

Check if the given target is valid.

Parameters:	target – position we currently want to visit

reset()[source]¶: Prepare the environment for a new episode.

select_action(q_values, valid_action_indices)[source]¶: Determine where to move next (greedily).

set_config(**args)[source]¶: Set config.

set_params(params)[source]¶

Set the current parameters for the policy.

Parameters:	params – policy parameters

gym_nethack.policies.rl module¶

class gym_nethack.policies.rl.BoltzmannPossibleQPolicy(tau=1.0, clip=(-500.0, 500.0))[source]¶

Bases: gym_nethack.policies.core.Policy

get_config()[source]¶

select_action(q_values, valid_action_indices)[source]¶
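
This class carries no description here, but its signature suggests Boltzmann (softmax) action selection restricted to the valid actions. The following sketch shows that technique under that assumption; the stand-alone functions boltzmann_probs and boltzmann_select are hypothetical and are not the class's actual methods.

```python
import math
import random


def boltzmann_probs(q_values, valid_action_indices, tau=1.0, clip=(-500.0, 500.0)):
    """Softmax probabilities over the valid actions only."""
    exps = []
    for i in valid_action_indices:
        # Scale by temperature, then clip to keep exp() within float range.
        scaled = min(max(q_values[i] / tau, clip[0]), clip[1])
        exps.append(math.exp(scaled))
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_select(q_values, valid_action_indices, rng, tau=1.0):
    """Sample one valid action index according to the softmax probabilities."""
    probs = boltzmann_probs(q_values, valid_action_indices, tau=tau)
    return rng.choices(valid_action_indices, weights=probs, k=1)[0]


q = [1.0, 5.0, 1.0, 3.0]
valid = [0, 2, 3]  # action 1 is treated as illegal and can never be chosen
probs = boltzmann_probs(q, valid)
action = boltzmann_select(q, valid, random.Random(0))
```

A higher tau flattens the distribution toward uniform; a lower tau concentrates it on the valid action with the largest q-value.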

class gym_nethack.policies.rl.EpsGreedyPossibleQPolicy(eps=0.1)[source]¶

Bases: gym_nethack.policies.core.Policy

get_config()[source]¶

select_action(q_values, valid_action_indices)[source]¶
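
The signature suggests epsilon-greedy selection restricted to the valid actions. A minimal sketch of that technique, assuming ties and randomness are handled in the simplest way; the function name eps_greedy_select is hypothetical.

```python
import random


def eps_greedy_select(q_values, valid_action_indices, eps, rng):
    """With probability eps pick a random valid action, else the best valid one."""
    if rng.random() < eps:
        return rng.choice(valid_action_indices)
    return max(valid_action_indices, key=lambda i: q_values[i])


q = [5.0, 1.0, 3.0]
valid = [1, 2]  # action 0 has the highest q-value but is not legal
greedy_action = eps_greedy_select(q, valid, eps=0.0, rng=random.Random(0))
random_action = eps_greedy_select(q, valid, eps=1.0, rng=random.Random(0))
```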

class gym_nethack.policies.rl.LinearAnnealedPolicy(inner_policy, attr, value_max, value_min, value_test, nb_steps)[source]¶

Bases: gym_nethack.policies.core.Policy

get_config()[source]¶

get_current_value()[source]¶

metrics¶

metrics_names¶

select_action(**kwargs)[source]¶
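
Linear annealing of a policy attribute, in the style of Keras-RL, can be sketched as a clamped straight line from value_max to value_min over nb_steps. The function name annealed_value is hypothetical; value_test is left out because it is presumably a fixed value used outside training, which this listing does not confirm.

```python
def annealed_value(step, value_max, value_min, nb_steps):
    """Linearly decay from value_max to value_min, then hold at value_min."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


start = annealed_value(0, value_max=1.0, value_min=0.1, nb_steps=1000)
middle = annealed_value(500, value_max=1.0, value_min=0.1, nb_steps=1000)
late = annealed_value(2000, value_max=1.0, value_min=0.1, nb_steps=1000)
```

For an epsilon-greedy inner policy, this would mean exploring heavily at first and settling at the minimum exploration rate once nb_steps have elapsed.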
