Title: | Count Words, Chars and Non-Whitespace Chars in R Markdown Docs |
---|---|
Description: | R Markdown documents are sometimes subject to size restrictions, e.g. on the number of words, characters or non-whitespace characters. rmdcount() computes these counts with and without code chunks and returns the result as a data frame. |
Authors: | Sigbert Klinke [aut, cre] |
Maintainer: | Sigbert Klinke <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0 |
Built: | 2024-11-04 03:02:49 UTC |
Source: | https://github.com/sigbertklinke/rmdwc |
rmdcount counts lines, words, bytes, characters and non-whitespace characters in R Markdown files, excluding code chunks.
txtcount counts lines, words, bytes, characters and non-whitespace characters in plain text files.
Note that the counts may differ slightly from unix wc and LibreOffice because they depend on the definition of a line, a word and a character.
rmdcount(
  files = NULL,
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n",
  exclude = "```\\{.*?```"
)
txtcount(files = NULL, space = "[[:space:]]", word = "[[:space:]]+", line = "\n")
files | character: file name(s) |
space | character: pattern to split a text at spaces (default: "[[:space:]]") |
word | character: pattern to split a text at word boundaries (default: "[[:space:]]+") |
line | character: pattern to split lines (default: "\n") |
exclude | character: pattern to exclude text parts, e.g. code chunks (default: "```\\{.*?```") |
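The word argument is a regular expression, so a different pattern gives a different word count. The following is a minimal base R sketch, not the package's own code: it applies strsplit() directly to a made-up string, and the alternative pattern that also splits at punctuation is only an illustrative assumption.

```r
# Made-up sample text (not taken from the package)
txt <- "The value 3.141593 is well-known."

# Split at runs of whitespace, i.e. the documented default for `word`
default_words <- strsplit(txt, "[[:space:]]+")[[1]]

# Hypothetical alternative: also split at punctuation characters
punct_words <- strsplit(txt, "[[:space:][:punct:]]+")[[1]]

length(default_words)  # "3.141593" and "well-known." each count as one word
length(punct_words)    # the number and the hyphenated word are split further
```

With the whitespace pattern the number stays in one piece, while the punctuation-aware pattern breaks it at the decimal point, which is one reason why tools disagree on word counts.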
We define:
Line: the number of lines. It differs from unix wc -l since wc counts the number of newlines.
Word: a character or characters delimited by white space. However, a "word" is in general a fuzzy concept; for example, is "3.141593" a word? Different programs may therefore count differently. For more details see the discussion of the LibreOffice bug "Word count gives wrong results - Another Example", Comment 5.
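The difference between counting lines and counting newlines can be seen in a small base R sketch. The sample string is made up, and the newline count is computed with gregexpr() here only to mimic what wc -l reports; it is not how the package works.

```r
# Two lines of text without a trailing newline (made-up sample)
txt <- "first line\nsecond line"

# Number of pieces after splitting at the line separator
n_lines <- length(strsplit(txt, "\n")[[1]])

# Number of newline characters, which is what `wc -l` counts
n_newlines <- lengths(regmatches(txt, gregexpr("\n", txt, fixed = TRUE)))

n_lines     # 2
n_newlines  # 1
```

A file whose last line is not terminated by a newline is therefore reported with one line fewer by wc -l.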
The following approach is used to detect lines, bytes, words, characters and non-whitespace characters:
lines: strsplit(rmd, line)[[1]] with line='\n'
bytes: charToRaw(rmd)
words: strsplit(rmd, word)[[1]] with word='[[:space:]]+'
characters: strsplit(rmd, '')[[1]]
non-whitespace characters: strsplit(gsub(space, '', rmd), '')[[1]] with space='[[:space:]]'
If rmdcount is used then code chunks are deleted with gsub('```\\{.*?```', '', rmd) before counting.
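The steps above can be combined into a small base R sketch. The helper count_sketch() and the names of its result elements are hypothetical and not part of rmdwc, and dropping empty pieces before counting words is an assumption of this sketch rather than documented package behaviour.

```r
# Hypothetical helper (not part of rmdwc): counts for a single string
count_sketch <- function(rmd, space = "[[:space:]]", word = "[[:space:]]+",
                         line = "\n", exclude = "```\\{.*?```") {
  # Delete code chunks first; an empty pattern keeps the text unchanged
  if (nzchar(exclude)) rmd <- gsub(exclude, "", rmd)
  words <- strsplit(rmd, word)[[1]]
  list(
    lines = length(strsplit(rmd, line)[[1]]),
    words = sum(nzchar(words)),  # assumption: ignore empty pieces
    bytes = length(charToRaw(rmd)),
    chars = length(strsplit(rmd, "")[[1]]),
    nonws = length(strsplit(gsub(space, "", rmd), "")[[1]])
  )
}

# Made-up document with one code chunk
doc <- "Some text.\n\n```{r}\nx <- 1\n```\n\nMore text here."
res <- count_sketch(doc)                      # chunk removed before counting
res_all <- count_sketch(doc, exclude = "")    # chunk kept
str(res)
```

Passing exclude = "" skips the deletion step, which mirrors the difference between counting with and without code chunks.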
A data frame with the following elements:
basename of file
number of lines
number of words
number of bytes
number of characters
number of non-whitespace characters
path of file
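Because the result is a data frame, totals over several documents can be computed with ordinary data frame tools. The sketch below builds a small stand-in data frame by hand; its column names and numbers are made up for illustration and are not the actual output of rmdcount().

```r
# Stand-in for a result with one row per file
# (hypothetical column names and made-up values)
counts <- data.frame(
  file  = c("intro.Rmd", "methods.Rmd"),
  words = c(120L, 340L),
  chars = c(800L, 2100L),
  stringsAsFactors = FALSE
)

# Sum every numeric column to get totals across all files
is_num <- vapply(counts, is.numeric, logical(1))
totals <- colSums(counts[is_num])
totals
```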
# count excluding code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
rmdcount(files)
# count including code chunks
txtcount(files)
# or
rmdcount(files, exclude='')
# count for a set of R Markdown docs
# (pattern is a regular expression, not a glob)
files <- list.files(path=system.file('rmarkdown', package="rmdwc"), pattern="\\.Rmd$", full.names=TRUE)
rmdcount(files)
# use of rmdcount() in a R Markdown document
if (interactive()) {
  files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
  file.edit(files) # SAVE(!) the file and knit it
}
# count including code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
txtcount(files)
Applies rmdcount to the current R Markdown document.
rmdcountAddin()
nothing
if (interactive()) rmdcountAddin()
Counts words, characters and non-whitespace characters in a string. It is used by rmdcount; see the details there.
rmdwcl(rmd, space = "[[:space:]]", word = "[[:space:]]+", line = "\n")
rmd | character: R Markdown document as string |
space | character: pattern to split a text at spaces (default: "[[:space:]]") |
word | character: pattern to split a text at word boundaries (default: "[[:space:]]+") |
line | character: pattern to split lines (default: "\n") |
a list
file <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
fcont <- readChar(file, file.info(file)$size)
rmdwcl(fcont)