File Operations

load_json(json_file, **kwargs)
Open and load data from a JSON file.
reusables.load_json("example.json")
# {u'key_1': u'val_1', u'key_for_dict': {u'sub_dict_key': 8}}
Parameters: - json_file – Path to JSON file as string
- kwargs – Additional arguments for the json.load command
Returns: Dictionary

list_to_csv(my_list, csv_file)
Save a matrix (list of lists) to a file as a CSV.
my_list = [["Name", "Location"],
           ["Chris", "South Pole"],
           ["Harry", "Depth of Winter"],
           ["Bob", "Skull"]]
reusables.list_to_csv(my_list, "example.csv")
example.csv
Parameters: - my_list – list of lists to save to CSV
- csv_file – File to save data to

save_json(data, json_file, indent=4, **kwargs)
Takes a dictionary and saves it to a file as JSON.
my_dict = {"key_1": "val_1", "key_for_dict": {"sub_dict_key": 8}}
reusables.save_json(my_dict, "example.json")
example.json
{
    "key_1": "val_1",
    "key_for_dict": {
        "sub_dict_key": 8
    }
}
Parameters: - data – dictionary to save as JSON
- json_file – Path to save file location as str
- indent – number of spaces to indent the JSON file with
- kwargs – Additional arguments for the json.dump command

csv_to_list(csv_file)
Open and transform a CSV file into a matrix (list of lists).
reusables.csv_to_list("example.csv")
# [['Name', 'Location'],
#  ['Chris', 'South Pole'],
#  ['Harry', 'Depth of Winter'],
#  ['Bob', 'Skull']]
Parameters: csv_file – Path to CSV file as str
Returns: list

extract(archive_file, path='.', delete_on_success=False, enable_rar=False)
Automatically detect archive type and extract all files to the specified path.
import os
os.listdir(".")
# ['test_structure.zip']
reusables.extract("test_structure.zip")
os.listdir(".")
# ['test_structure', 'test_structure.zip']
Parameters: - archive_file – path to file to extract
- path – location to extract to
- delete_on_success – Will delete the original archive if set to True
- enable_rar – include the rarfile import and extract
Returns: path to extracted files

archive(files_to_archive, name='archive.zip', archive_type=None, overwrite=False, store=False, depth=None, err_non_exist=True, allow_zip_64=True, **tarfile_kwargs)
Archive a list of files (or files inside a folder); the archive type can be chosen from:
- zip
- tar
- gz (tar.gz, tgz)
- bz2 (tar.bz2)
reusables.archive(['reusables', '.travis.yml'], name="my_archive.bz2")
# 'C:\Users\Me\Reusables\my_archive.bz2'
Parameters: - files_to_archive – list of files and folders to archive
- name – path and name of archive file
- archive_type – auto-detects unless specified
- overwrite – overwrite if archive exists
- store – zipfile only, True will not compress files
- depth – specify max depth for folders
- err_non_exist – raise error if provided file does not exist
- allow_zip_64 – must be enabled for zip files larger than 2GB
- tarfile_kwargs – extra args to pass to tarfile.open
Returns: path to created archive

config_dict(config_file=None, auto_find=False, verify=True, **cfg_options)
Return configuration options as a dictionary. Accepts either a single config file or a list of files. Auto find will search for all .cfg, .config and .ini files in the execution directory and package root (unsafe but handy).
reusables.config_dict(os.path.join("test", "data", "test_config.ini"))
# {'General': {'example': 'A regular string'},
#  'Section 2': {'anint': '234',
#                'examplelist': '234,123,234,543',
#                'floatly': '4.4',
#                'my_bool': 'yes'}}
Parameters: - config_file – path or paths to the files location
- auto_find – look for a config type file at this location or below
- verify – make sure the file exists before trying to read
- cfg_options – options to pass to the parser
Returns: dictionary of the config files

config_namespace(config_file=None, auto_find=False, verify=True, **cfg_options)
Return configuration options as a Namespace.
reusables.config_namespace(os.path.join("test", "data", "test_config.ini"))
# <Namespace: {'General': {'example': 'A regul...>
Parameters: - config_file – path or paths to the files location
- auto_find – look for a config type file at this location or below
- verify – make sure the file exists before trying to read
- cfg_options – options to pass to the parser
Returns: Namespace of the config files

os_tree(directory, enable_scandir=False)
Return a directory's contents as a dictionary hierarchy.
reusables.os_tree(".")
# {'doc': {'build': {'doctrees': {},
#                    'html': {'_sources': {}, '_static': {}}},
#          'source': {}},
#  'reusables': {'__pycache__': {}},
#  'test': {'__pycache__': {}, 'data': {}}}
Parameters: - directory – path to directory to created the tree of.
- enable_scandir – on python < 3.5 enable external scandir package
Returns: dictionary of the directory

check_filename(filename)
Returns a boolean stating whether the filename is safe to use. Note that this does not test for all "legal" names, but a more restricted set of: letters, numbers, spaces, hyphens, underscores and periods.
Parameters: filename – name of a file as a string
Returns: boolean, True if it is a safe file name

count_files(*args, **kwargs)
Returns an integer count of all files found using find_files.

directory_duplicates(directory, hash_type='md5', **kwargs)
Find all duplicates in a directory. Returns a list of lists, where each inner list is a group of duplicate files.
Parameters: - directory – Directory to search
- hash_type – Type of hash to perform
- kwargs – Arguments to pass to find_files to narrow file types
Returns: list of lists of duplicate files

dup_finder(file_path, directory='.', enable_scandir=False)
Check a directory for duplicates of the specified file. This is meant for a single file only; to check a directory for duplicates, use directory_duplicates.
This is designed to be as fast as possible by doing lighter checks before progressing to more extensive ones; in order, they are:
- File size
- First twenty bytes
- Full SHA256 compare
list(reusables.dup_finder("test_structure\files_2\empty_file"))
# ['C:\Reusables\test\data\fake_dir',
#  'C:\Reusables\test\data\test_structure\Files\empty_file_1',
#  'C:\Reusables\test\data\test_structure\Files\empty_file_2',
#  'C:\Reusables\test\data\test_structure\files_2\empty_file']
Parameters: - file_path – Path to file to check for duplicates of
- directory – Directory to dig recursively into to look for duplicates
- enable_scandir – on python < 3.5 enable external scandir package
Returns: generator of matching file paths

file_hash(path, hash_type='md5', block_size=65536, hex_digest=True)
Hash a given file with md5 (or any other algorithm) and return the hex digest. You can run hashlib.algorithms_available to see which are available on your system (unless you have an archaic Python version, you poor soul).
This function is designed to be memory efficient: the file is read in blocks of block_size bytes rather than all at once.
reusables.file_hash("test_structure.zip")
# '61e387de305201a2c915a4f4277d6663'
Parameters: - path – location of the file to hash
- hash_type – string name of the hash to use
- block_size – amount of bytes to add to hasher at a time
- hex_digest – if True, return the hex digest; if False, return the raw digest
Returns: file’s hash

find_files(directory='.', ext=None, name=None, match_case=False, disable_glob=False, depth=None, abspath=False, enable_scandir=False)
Walk through a file directory and return an iterator of files that match the requirements. Will autodetect if name contains glob magic characters.
Note: in the example below, list() is used simply to show the output; you can use find_files_list to get a list directly.
list(reusables.find_files(name="ex", match_case=True))
# ['C:\example.pdf',
#  'C:\My_exam_score.txt']
list(reusables.find_files(name="*free*"))
# ['C:\my_stuff\Freedom_fight.pdf']
list(reusables.find_files(ext=".pdf"))
# ['C:\Example.pdf',
#  'C:\how_to_program.pdf',
#  'C:\Hunks_and_Chicks.pdf']
list(reusables.find_files(name="*chris*"))
# ['C:\Christmas_card.docx',
#  'C:\chris_stuff.zip']
Parameters: - directory – Top location to recursively search for matching files
- ext – Extensions of the file you are looking for
- name – Part of the file name
- match_case – If name or ext has to be a direct match or not
- disable_glob – Do not look for globable names or use glob magic check
- depth – How many directories down to search
- abspath – Return files with their absolute paths
- enable_scandir – on python < 3.5 enable external scandir package
Returns: generator of all files in the specified directory

find_files_list(*args, **kwargs)
Returns the output of the find_files generator as a list.

join_here(*paths, **kwargs)
Join any path or paths as a sub directory of the current file's directory.
reusables.join_here("Makefile")
# 'C:\Reusables\Makefile'
Parameters: - paths – paths to join together
- kwargs – ‘strict’, do not strip os.sep
- kwargs – ‘safe’, make into a safe path if True
Returns: abspath as string

join_paths(*paths, **kwargs)
Join multiple paths together and return the absolute path of them. If ‘safe’ is specified, this function will ‘clean’ the path with the ‘safe_path’ function. This will clean root declarations from the path after the first item.
Note: ‘safe=False’ would be preferable to ‘**kwargs’, but older versions of Python (cough, 2.6) do not allow keyword arguments after ‘*paths’.
Parameters: - paths – paths to join together
- kwargs – ‘safe’, make into a safe path if True
Returns: abspath as string

remove_empty_directories(root_directory, dry_run=False, ignore_errors=True, enable_scandir=False)
Remove all empty folders from a path. Returns list of empty directories.
Parameters: - root_directory – base directory to start at
- dry_run – just return a list of what would be removed
- ignore_errors – permissions are a pain; ignore errors if you are blocked
- enable_scandir – on python < 3.5 enable external scandir package
Returns: list of removed directories

remove_empty_files(root_directory, dry_run=False, ignore_errors=True, enable_scandir=False)
Remove all empty files from a path. Returns list of the empty files removed.
Parameters: - root_directory – base directory to start at
- dry_run – just return a list of what would be removed
- ignore_errors – permissions are a pain; ignore errors if you are blocked
- enable_scandir – on python < 3.5 enable external scandir package
Returns: list of removed files

safe_filename(filename, replacement='_')
Replace unsafe filename characters with underscores. Note that this does not test for all "legal" names, but a more restricted set of: letters, numbers, spaces, hyphens, underscores and periods.
Parameters: - filename – name of a file as a string
- replacement – character to use as a replacement of bad characters
Returns: safe filename string

safe_path(path, replacement='_')
Replace unsafe path characters with underscores. Do NOT use this with existing paths that cannot be modified; this is to help generate new, clean paths.
Supports windows and *nix systems.
Parameters: - path – path as a string
- replacement – character to use in place of bad characters
Returns: a safer path

touch(path)
Native ‘touch’ functionality in Python: creates the file if it does not already exist.
Parameters: path – path to file to ‘touch’