Spaces

exception textworld.gym.spaces.text_spaces.VocabularyHasDuplicateTokens

Bases: ValueError

class textworld.gym.spaces.text_spaces.Char(max_length, vocab=None, extra_vocab=[])

Bases: MultiDiscrete

Character observation/action space

This space consists of a series of gym.spaces.Discrete objects all with the same parameters. Each gym.spaces.Discrete can take integer values between 0 and len(self.vocab).

Notes

The following special token will be prepended (if needed) to the vocabulary:

  • ‘#’ : Padding token

Parameters
  • max_length (int) – Maximum number of characters in a text.

  • vocab (list of char, optional) – Vocabulary defining this space. It must not contain duplicate characters. If not provided, the vocabulary consists of the characters [a-z0-9], the punctuation characters [" ", "-", "'"], and the padding token ‘#’.

  • extra_vocab (list of char, optional) – Additional tokens to add to the vocabulary.
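
As a rough illustration of the vocabulary rules above, here is a minimal sketch in plain Python. It is not TextWorld's implementation: build_char_vocab is a hypothetical helper, a plain ValueError stands in for VocabularyHasDuplicateTokens, and appending extra_vocab to an explicitly supplied vocab is an assumption.

```python
import string

PAD = "#"  # padding token named in the notes above


def build_char_vocab(vocab=None, extra_vocab=()):
    """Hypothetical helper: assemble a character vocabulary as described above."""
    if vocab is None:
        # Documented default: [a-z0-9] plus space, hyphen and apostrophe.
        vocab = list(string.ascii_lowercase + string.digits) + [" ", "-", "'"]
    # Assumption: extra tokens are simply appended to the vocabulary.
    vocab = list(vocab) + list(extra_vocab)

    # Duplicates are rejected; ValueError stands in for VocabularyHasDuplicateTokens.
    if len(vocab) != len(set(vocab)):
        raise ValueError("vocabulary contains duplicate tokens")

    # Prepend the padding token only if it is not already present.
    if PAD not in vocab:
        vocab = [PAD] + vocab
    return vocab


default_vocab = build_char_vocab()
```

With these assumptions the padding token ends up at index 0 whenever the caller did not supply it, which keeps padded positions easy to recognise.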

filter_unknown(text)

Strip out all characters not in the vocabulary.

tokenize(text, padding=False)

Tokenize characters found in the vocabulary.

Note: when padding is True, text will be padded up to self.max_length.
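
To make the filtering and padding behaviour concrete, the following is a small stand-alone sketch rather than the library's code. The module-level VOCAB and CHAR_TO_ID names, the explicit max_length argument, and raising an error when the text is longer than max_length are all assumptions made for illustration.

```python
PAD = "#"
# Assumed vocabulary: padding token first, then the documented default characters.
VOCAB = [PAD] + list("abcdefghijklmnopqrstuvwxyz0123456789") + [" ", "-", "'"]
CHAR_TO_ID = {c: i for i, c in enumerate(VOCAB)}


def filter_unknown(text):
    """Drop every character that is not in the vocabulary."""
    return "".join(c for c in text if c in CHAR_TO_ID)


def tokenize(text, max_length, padding=False):
    """Map known characters to ids, optionally padding up to max_length."""
    ids = [CHAR_TO_ID[c] for c in filter_unknown(text)]
    if padding:
        # Assumption: text longer than max_length is treated as an error.
        if len(ids) > max_length:
            raise ValueError("text is longer than max_length")
        ids += [CHAR_TO_ID[PAD]] * (max_length - len(ids))
    return ids


# "!" is not in the vocabulary, so it is stripped before tokenization.
ids = tokenize("go north!", max_length=12, padding=True)
```

Here the eight remaining characters are followed by four padding ids, so every padded result has exactly max_length entries.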

class textworld.gym.spaces.text_spaces.Word(max_length, vocab)

Bases: MultiDiscrete

Word observation/action space

This space consists of a series of gym.spaces.Discrete objects all with the same parameters. Each gym.spaces.Discrete can take integer values between 0 and len(self.vocab).

Notes

The following special tokens will be prepended (if needed) to the vocabulary:

  • ‘<PAD>’ : Padding

  • ‘<UNK>’ : Unknown word

  • ‘<S>’ : Beginning of sentence

  • ‘</S>’ : End of sentence

Example

Let’s create an action space that can be used with textworld.gym.register_game. We are going to assume actions are short phrases up to 8 words long.

>>> import textworld
>>> gamefiles = ["/path/to/game.ulx", "/path/to/another/game.z8"]
>>> vocab = textworld.vocab.extract_from(gamefiles)
>>> vocab = sorted(vocab)  # Sorting the vocabulary, optional.
>>> action_space = textworld.gym.text_spaces.Word(max_length=8, vocab=vocab)

Parameters
  • max_length (int) – Maximum number of words in a text.

  • vocab (list of strings) – Vocabulary defining this space. It shouldn’t contain any duplicate words.

tokenize(text, padding=False)

Tokenize words found in the vocabulary.

Note: when padding is True, text will be padded up to self.max_length.
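
The word-level case can be sketched in the same spirit. This is an illustrative approximation, not TextWorld's implementation: build_word_vocab is a hypothetical helper, splitting on whitespace and mapping out-of-vocabulary words to ‘<UNK>’ are assumptions, and the sketch does not insert ‘<S>’ or ‘</S>’ markers.

```python
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<S>", "</S>"]


def build_word_vocab(words):
    """Hypothetical helper: prepend the special tokens that are missing."""
    words = list(words)
    return [t for t in SPECIAL_TOKENS if t not in words] + words


def tokenize(text, vocab, max_length, padding=False):
    """Map whitespace-separated words to ids, optionally padding up to max_length."""
    word_to_id = {w: i for i, w in enumerate(vocab)}
    # Assumption: words outside the vocabulary map to the <UNK> id.
    ids = [word_to_id.get(w, word_to_id["<UNK>"]) for w in text.split()]
    if padding:
        # Assumption: text longer than max_length is treated as an error.
        if len(ids) > max_length:
            raise ValueError("text is longer than max_length")
        ids += [word_to_id["<PAD>"]] * (max_length - len(ids))
    return ids


vocab = build_word_vocab(["door", "north", "go", "open"])
ids = tokenize("go north quickly", vocab, max_length=8, padding=True)
```

In this sketch "quickly" is not in the vocabulary, so it receives the ‘<UNK>’ id, and the remaining five positions are filled with the ‘<PAD>’ id.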