Title: | Use Data from the Czech Public Finance Database |
---|---|
Description: | Get programmatic access to data from the Czech public budgeting and accounting database, Státní pokladna <https://monitor.statnipokladna.cz/>. |
Authors: | Petr Bouchal [aut, cre] |
Maintainer: | Petr Bouchal <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.7.4 |
Built: | 2024-11-01 11:16:02 UTC |
Source: | https://github.com/petrbouchal/statnipokladna |
Deprecated; use sp_add_codelist() instead.
add_codelist( data, codelist = NULL, period_column = .data$vykaz_date, redownload = FALSE, dest_dir = NULL )
data |
a data frame returned by |
codelist |
The codelist to add. Either a character vector of length one (see |
period_column |
Unquoted column name of column identifying the data period in |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
A tibble of the same length as data, with added columns from codelist. See Details.
Other Core workflow: get_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
Deprecated: use sp_get_codelist()
get_codelist(codelist_id, n = NULL, dest_dir = NULL, redownload = FALSE)
codelist_id |
A codelist ID. See |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
A tibble
Other Core workflow: add_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
Joins a provided codelist, or downloads and processes one if necessary, and adds it to the data.
sp_add_codelist( data, codelist = NULL, period_column = .data$vykaz_date, by = NULL, redownload = FALSE, dest_dir = NULL )
data |
a data frame returned by |
codelist |
The codelist to add. Either a character vector of length one (see |
period_column |
Unquoted column name of column identifying the data period in |
by |
character. Columns by which to join the codelist. Same form as for |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
The data argument should be a data frame produced by sp_get_table(). If this is true, the period_column argument is not needed.
The codelist argument, if a data frame, should be a data frame produced by sp_get_codelist(). Specifically, it is assumed to contain the following columns:
start_date, a date
end_date, a date
a column with the code, character, usually named the same as the codelist
You can usually tell which codelist you need from the name of the column whose code you are looking to expand, e.g. the codes in column paragraf can be expanded by codelist paragraf.
The function filters the codelist to obtain a set of entries relevant to the time period of data. If data contains tables for multiple periods, this is handled appropriately.
Codelist-originating columns in the resulting data frame are renamed so they do not interfere with
joining additional codelists, perhaps in a single pipe call.
Note that some codelists are "secondary" and can only be joined onto other codelists.
If a codelist does not join using sp_add_codelist(), store the output of sp_get_codelist() and join it manually using dplyr.
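The date-based matching described in these Details can be illustrated with a small self-contained sketch. It uses base R and made-up data: the codes, the nazev column and the paragraf_nazev name are hypothetical, and the code only mimics the idea of keeping the codelist entry that is valid at each row's vykaz_date. It is not the package's implementation. A similar pattern could serve as a starting point for joining a codelist manually.

```r
# Toy data: two reporting periods that use the same code.
data <- data.frame(
  vykaz_date = as.Date(c("2017-12-31", "2019-12-31")),
  paragraf = c("3111", "3111"),
  amount = c(100, 120),
  stringsAsFactors = FALSE
)

# Toy codelist: the label of code 3111 changes at the start of 2019.
codelist <- data.frame(
  paragraf = c("3111", "3111"),
  start_date = as.Date(c("2010-01-01", "2019-01-01")),
  end_date = as.Date(c("2018-12-31", "9999-12-31")),
  nazev = c("Old label", "New label"),
  stringsAsFactors = FALSE
)

# Join on the code, then keep only entries valid on the reporting date.
joined <- merge(data, codelist, by = "paragraf")
valid <- joined$vykaz_date >= joined$start_date &
  joined$vykaz_date <= joined$end_date
result <- joined[valid, c("vykaz_date", "paragraf", "amount", "nazev")]
result <- result[order(result$vykaz_date), ]

# Rename the codelist column so a later join cannot clash with it
# (the name used here is an assumption, not the package's scheme).
names(result)[names(result) == "nazev"] <- "paragraf_nazev"
print(result)
```

The result has one row per row of data, which matches the documented return value of a tibble of the same length as data.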
A tibble of the same length as data, with added columns from codelist. See Details.
Other Core workflow: add_codelist(), get_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
## Not run:
sp_get_table("budget-central", 2017) %>%
  sp_add_codelist("polozka") %>%
  sp_add_codelist("paragraf")

# Download the codelists once, then reuse the stored data frames
pol <- sp_get_codelist("polozka")
par <- sp_get_codelist("paragraf")
sp_get_table("budget-central", 2017) %>%
  sp_add_codelist(pol) %>%
  sp_add_codelist(par)
## End(Not run)
Contains IDs and names of all (most) available codelists that can be retrieved by sp_get_codelist.
sp_codelists
A data frame with 27 rows and 2 variables:
id
character. ID, used as the codelist_id argument in sp_get_codelist.
name
character. Short name, mostly corresponds to title used on statnipokladna.cz.
The id is to be used as the codelist_id parameter in sp_get_codelist.
See https://monitor.statnipokladna.cz/datovy-katalog/ciselniky for more detailed descriptions and a GUI for exploring the lists.
Other Lists of available entities: sp_datasets, sp_tables
Contains IDs and names of all available datasets that can be retrieved by sp_get_dataset.
sp_datasets
A data frame with 9 rows and 3 variables:
id
character. Dataset ID, used as the dataset_id argument to sp_get_dataset.
name
character. Dataset name, mostly corresponds to title on the statnipokladna GUI.
See https://monitor.statnipokladna.cz/datovy-katalog/transakcni-data for more detailed descriptions of the datasets.
Other Lists of available entities: sp_codelists, sp_tables
Downloads and processes the codelist identified by codelist_id. See sp_codelists for a list of available codelists with their IDs and names.
sp_get_codelist(codelist_id, n = NULL, dest_dir = NULL, redownload = FALSE)
codelist_id |
A codelist ID. See |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
You can usually tell which codelist you need from the name of the column whose code you are looking to expand, e.g. the codes in column paragraf can be expanded by codelist paragraf.
The processing ensures that the resulting codelist can be correctly joined to the data, automatically using sp_add_codelist() or manually.
The entire codelist is downloaded and not filtered for any particular date.
Codelist XML files are stored in a temporary directory as determined by tempdir() and persist per session to avoid redownloads.
a tibble
Other Core workflow: add_codelist(), get_codelist(), sp_add_codelist(), sp_get_dataset(), sp_get_table()
## Not run:
sp_get_codelist("paragraf")
## End(Not run)
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
sp_get_codelist_file( codelist_id = NULL, url = NULL, dest_dir = NULL, redownload = FALSE )
codelist_id |
A codelist ID. See |
url |
DESCRIPTION. Either this or |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
path to XML file; character vector of length one.
Other Detailed workflow: sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
## Not run:
sp_get_codelist_file("druhuj")
codelist_url <- sp_get_codelist_url("druhuj")
sp_get_codelist_file(url = codelist_url)
## End(Not run)
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
sp_get_codelist_url(codelist_id, check_if_exists = TRUE)
codelist_id |
A codelist ID. |
check_if_exists |
Whether to check that the URL works (HTTP 200). |
character vector of length one containing URL
Other Detailed workflow: sp_get_codelist_file(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
## Not run:
sp_get_codelist_url("ucjed", FALSE)
if(FALSE) sp_get_codelist_url("ucjed_wrong", TRUE) # fails, invalid codelist
## End(Not run)
Downloads ZIP archives for a given dataset. If year or month have length > 1, gets all combinations.
sp_get_dataset( dataset_id, year, month = 12, dest_dir = NULL, redownload = FALSE )
dataset_id |
A dataset ID. See |
year |
year, numeric vector of length >= 1 (can take multiple values), 2015-2019 for some datasets, 2010-2020 for others. (See Details for how to work with data across time periods.) |
month |
month, numeric vector of length >= 1 (can take multiple values). Must be between 1 and 12. Defaults to 12. (See Details for how to work with data across time periods.) |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Files are stored in a temporary folder as determined by tempdir(), or in the directory given by the dest_dir parameter or the statnipokladna.dest_dir option, and are further sorted into subdirectories by dataset, year and month. If saved to tempdir() (the default), downloaded files persist per session to avoid redownloads.
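As a rough illustration of how the three storage locations could interact, the following base R sketch resolves a destination directory. The helper resolve_dest_dir() is hypothetical, and the precedence it applies (explicit argument, then the statnipokladna.dest_dir option, then tempdir()) is an assumption for the sake of the example; the source does not spell out the order.

```r
# Hypothetical helper, not part of the package: pick a download directory.
# Assumed precedence: explicit argument > option > tempdir().
resolve_dest_dir <- function(dest_dir = NULL) {
  if (!is.null(dest_dir)) {
    return(dest_dir)
  }
  getOption("statnipokladna.dest_dir", default = tempdir())
}

# Set the option for the session, resolve, then restore the old value.
old <- options(statnipokladna.dest_dir = file.path(tempdir(), "sp_data"))
from_option <- resolve_dest_dir()
options(old)

# An explicit argument is used as given.
from_argument <- resolve_dest_dir("my_downloads")
```

Setting the option once per project would keep downloads outside the temporary directory, so that they survive beyond the current session.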
How data for different time periods is exported differs by dataset.
This has significant implications for how you get to usable full-year numbers or time series in different tables.
See vignette("statnipokladna") for details on this.
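To make "all combinations" of year and month concrete, here is a minimal base R sketch of how two vectors expand into individual periods. The subdirectory layout in the last step is a hypothetical rendering of "sorted by dataset, year and month"; the actual folder names used by the package may differ.

```r
# Two years and two months expand to four year-month periods.
years <- c(2018, 2019)
months <- c(6, 12)
combos <- expand.grid(year = years, month = months)
combos <- combos[order(combos$year, combos$month), ]

# Hypothetical subdirectory per period for the "finm" dataset
# (the naming is an assumption made for illustration only).
combos$subdir <- file.path("finm", combos$year, sprintf("%02d", combos$month))
print(combos)
```

Under this reading, a call with two years and two months would be expected to return four paths, one per downloaded archive.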
character vector with complete paths to downloaded ZIP archives.
Other Core workflow: add_codelist(), get_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_table()
## Not run:
budget_2018 <- sp_get_dataset("finm", 2018)
budget_mid2018 <- sp_get_dataset("finm", 2018, 6)
## End(Not run)
Downloads XLS file with dataset documentation, or opens link to this file in browser.
sp_get_dataset_doc(dataset_id, dest_dir = NULL, download = TRUE)
dataset_id |
dataset ID. See |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
download |
Whether to download (the default) or open link in browser. |
(invisible) path to file if download = TRUE, URL otherwise
Other Utilities: sp_get_codelist_viewer()
## Not run:
sp_get_dataset_doc("finm")
## End(Not run)
Useful for workflows where you want to keep track of URLs and intermediate files, rather than having all steps performed by one function.
sp_get_dataset_url(dataset_id, year, month = 12, check_if_exists = TRUE)
dataset_id |
Dataset ID. See |
year |
year, numeric vector of length >= 1 (can take multiple values), 2015-2019 for some datasets, 2010-2020 for others. (See Details for how to work with data across time periods.) |
month |
month, numeric vector of length >= 1 (can take multiple values). Must be between 1 and 12. Defaults to 12. (See Details for how to work with data across time periods.) |
check_if_exists |
Whether to check that the URL works (HTTP 200). |
a character vector of length one, containing a URL
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
## Not run:
sp_get_dataset_url("finm", 2018, 6, FALSE)
sp_get_dataset_url("finm", 2029, 6, FALSE) # works but returns invalid URL
if(FALSE) sp_get_dataset_url("finm_wrong", 2018, 6, TRUE) # fails, invalid dataset ID
if(FALSE) sp_get_dataset_url("finm", 2022, 6, TRUE) # fails, invalid time period
## End(Not run)
This is normally called inside sp_get_table() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
sp_get_table_file(table_id, dataset_path, reunzip = FALSE)
table_id |
Table ID; see |
dataset_path |
Path to downloaded dataset, as output by |
reunzip |
Whether to overwrite existing CSV files by unzipping the archive downloaded by |
Character vector of length one - a path.
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_load_codelist(), sp_load_table()
## Not run:
ds <- sp_get_dataset("rozv", 2018, 12)
sp_get_table_file("balance-sheet", ds)
## End(Not run)
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
sp_load_codelist(path, n = NULL)
path |
Path to a file as returned by |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
a tibble
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_table()
## Not run:
cf <- sp_get_codelist_file("druhuj")
sp_load_codelist(cf)
## End(Not run)
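For workflows that need every intermediate output, the three codelist steps documented here (URL, file, loaded data) can be chained while keeping each result. The wrapper below is a hypothetical sketch, not part of the package. It only defines a function, so nothing is downloaded until it is called, and calling it requires the statnipokladna package and network access.

```r
# Hypothetical wrapper: run the detailed codelist workflow step by step
# and return every intermediate output alongside the final data.
load_codelist_stepwise <- function(codelist_id, dest_dir = NULL) {
  if (!requireNamespace("statnipokladna", quietly = TRUE)) {
    stop("The statnipokladna package is required for this sketch.")
  }
  # Step 1: build (and by default check) the URL of the codelist file.
  url <- statnipokladna::sp_get_codelist_url(codelist_id)
  # Step 2: download the XML file and get its local path.
  path <- statnipokladna::sp_get_codelist_file(url = url, dest_dir = dest_dir)
  # Step 3: parse the XML file into a tibble.
  data <- statnipokladna::sp_load_codelist(path)
  list(url = url, path = path, data = data)
}
```

A call such as load_codelist_stepwise("druhuj") would then give access to the URL, the file path and the data separately, which is the kind of separation a {targets} pipeline typically tracks as distinct targets.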
This is normally called inside sp_get_table() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
sp_load_table(path, ico = NULL)
path |
path to a CSV file, as output by |
ico |
Organisation ID to filter by, if supplied. |
a tibble. See help for sp_get_table() for a key to the columns.
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist()
## Not run:
ds <- sp_get_dataset("rozv", 2018, 12)
tf <- sp_get_table_file("balance-sheet", ds)
sp_load_table(tf)
## End(Not run)
Contains IDs and names of all available tables that can be retrieved by sp_get_table. Look inside the XLS documentation for each dataset at https://monitor.statnipokladna.cz/datovy-katalog/transakcni-data to see more detailed descriptions. Note that tables do not correspond to the tabulka/vtab attribute of the tables; they represent files inside datasets.
sp_tables
A data frame with 2 rows and 4 variables:
id
character. Table ID, used as the table_id argument to sp_get_table.
dataset_id
integer Table number.
czech_name
character Czech name of the table.
note
character Note.
Other Lists of available entities: sp_codelists, sp_datasets