bpe_load_model {tokenizers.bpe}    R Documentation
Load a Byte Pair Encoding model trained with bpe
bpe_load_model(file, threads = -1L)
file: path to the model file
threads: integer giving the number of CPU threads to use for model processing. If equal to -1, the minimum of the number of available threads and 8 is used
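The default rule for threads can be sketched in a few lines of base R. This is only an illustration of the documented behaviour, not the package's internal code: the helper name resolve_threads and its available argument are hypothetical, and the fallback to 1 when the core count is unknown is an assumption.

```r
# Hypothetical helper mirroring the documented rule for `threads`.
# It is NOT part of tokenizers.bpe; it only illustrates the -1 default.
resolve_threads <- function(threads = -1L, available = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; falling back to 1 is an assumption.
  if (is.na(available)) available <- 1L
  if (threads == -1L) {
    # Documented rule: minimum of the available threads and 8.
    min(as.integer(available), 8L)
  } else {
    as.integer(threads)
  }
}

four_cores    <- resolve_threads(-1L, available = 4L)   # capped by the machine
sixteen_cores <- resolve_threads(-1L, available = 16L)  # capped by the limit of 8
explicit      <- resolve_threads(2L,  available = 16L)  # explicit value is kept
```

Passing an explicit value such as threads = 1, as the examples on this page do, bypasses the rule entirely.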
an object of class youtokentome, which is a list with elements
model: an Rcpp pointer to the model
model_path: the path to the model
threads: the threads argument
vocab_size: the size of the BPE vocabulary
vocabulary: the BPE vocabulary, which is a data.frame with columns id and subword
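As a minimal sketch of how the returned list might be inspected, the snippet below loads the model file shipped with the package (the same path used in the examples on this page) and looks at the elements listed above. It assumes tokenizers.bpe is installed; no output is shown because it depends on the shipped model.

```r
library(tokenizers.bpe)

# Model file shipped with the package (same path as in the package examples).
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path, threads = 1)

class(model)             # class of the returned object
names(model)             # elements of the list
model$model_path         # path the model was loaded from
model$vocab_size         # size of the BPE vocabulary
head(model$vocabulary)   # data.frame with columns id and subword
```

The model element is a pointer to the loaded model and is not meant to be read directly; model_path is what gets passed back to bpe_load_model when reloading.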
## Reload a model
path <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path)

## Build a model and load it again
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
model <- bpe_load_model(model$model_path, threads = 1)

## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)