API
HDFileSystem([host, port, connect, …])
    Connection to an HDFS namenode
HDFileSystem.cat(path)
    Return contents of file
HDFileSystem.chmod(path, mode)
    Change access control of given path
HDFileSystem.chown(path, owner, group)
    Change owner/group
HDFileSystem.df()
    Used/free disk space on the HDFS system
HDFileSystem.du(path[, total, deep])
    Return file sizes on a path
HDFileSystem.exists(path)
    Is there an entry at path?
HDFileSystem.get(hdfs_path, local_path[, …])
    Copy HDFS file to local
HDFileSystem.getmerge(path, filename[, …])
    Concatenate all files in path (a directory) to a local output file
HDFileSystem.get_block_locations(path[, …])
    Fetch physical locations of blocks
HDFileSystem.glob(path)
    Get list of paths matching glob-like pattern (i.e., with "*"s)
HDFileSystem.info(path)
    File information (as a dict)
HDFileSystem.ls(path[, detail])
    List files at path
HDFileSystem.mkdir(path)
    Make directory at path
HDFileSystem.mv(path1, path2)
    Move file at path1 to path2
HDFileSystem.open(path[, mode, replication, …])
    Open a file for reading or writing
HDFileSystem.put(filename, path[, chunk, …])
    Copy local file to path in HDFS
HDFileSystem.read_block(fn, offset, length)
    Read a block of bytes from an HDFS file
HDFileSystem.rm(path[, recursive])
    Use recursive for rm -r, i.e., delete directory and contents
HDFileSystem.set_replication(path, replication)
    Instruct HDFS to set the replication for the given file
HDFileSystem.tail(path[, size])
    Return last bytes of file
HDFileSystem.touch(path)
    Create zero-length file
HDFile(fs, path, mode[, replication, buff, …])
    File on HDFS
HDFile.close()
    Flush and close file, ensuring the data is readable
HDFile.flush()
    Send buffer to the data-node; actual write may happen later
HDFile.info()
    Filesystem metadata about this file
HDFile.read([length])
    Read bytes from open file
HDFile.readlines()
    Return all lines in a file as a list
HDFile.seek(offset[, from_what])
    Set file read position
HDFile.tell()
    Get current byte location in a file
HDFile.write(data)
    Write bytes to open file (which must be in 'w' or 'a' mode)
HDFSMap(hdfs, root[, check])
    Wrap a HDFileSystem as a mutable mapping
class hdfs3.core.HDFileSystem(host=<class 'hdfs3.utils.MyNone'>, port=<class 'hdfs3.utils.MyNone'>, connect=True, autoconf=True, pars=None, **kwargs)
    Connection to an HDFS namenode
>>> hdfs = HDFileSystem(host='127.0.0.1', port=8020) # doctest: +SKIP
cancel_token(token=None)
    Revoke delegation token

    Parameters:
        token : str or None
            If None, uses the instance's token. It is an error to do so if there is no token.
chmod(path, mode)
    Change access control of given path

    Exactly what permissions the file ends up with depends on HDFS configuration.

    Parameters:
        path : string
            file/directory to change
        mode : integer
            As with the POSIX standard, each octal digit refers to user, group, and other,
            in that order, with read, write, and execute as the bits of each group.

    Examples
        Make read/writeable to all
        >>> hdfs.chmod('/path/to/file', 0o777)  # doctest: +SKIP

        Make read/writeable only to user
        >>> hdfs.chmod('/path/to/file', 0o700)  # doctest: +SKIP

        Make read-only to user
        >>> hdfs.chmod('/path/to/file', 0o400)  # doctest: +SKIP
concat(destination, paths)
    Concatenate inputs to destination

    Source files should all have the same block size and replication. The destination
    file must be in the same directory as the source files. If the target exists, the
    sources are appended to it.

    Some HDFS deployments require that the target file exists and is an exact number of
    blocks long, and that each concatenated file except the last is also a whole number
    of blocks.

    The source files are deleted on successful completion.
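    A hedged sketch of typical use (the file names are hypothetical, and the block-size
    constraints above may apply on your cluster):
        >>> hdfs.concat('/data/all.csv',  # doctest: +SKIP
        ...             ['/data/part-1.csv', '/data/part-2.csv'])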
delegate_token(user=None)
    Generate delegation auth token

    Parameters:
        user : bytes/str
            User to pass to delegation (defaults to the user supplied to the instance);
            this user is the only one that can renew the token.
du(path, total=False, deep=False)
    Return file sizes on a path

    Parameters:
        path : string
            where to look
        total : bool (=False)
            whether to add the sizes up to a grand total
        deep : bool (=False)
            whether to recurse into subdirectories
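    A hedged sketch (the paths and sizes below are illustrative, not real output):
        >>> hdfs.du('/data', deep=True)  # doctest: +SKIP
        {'/data/file1.csv': 1024, '/data/sub/file2.csv': 2048}
        >>> hdfs.du('/data', total=True)  # doctest: +SKIP
        {'/data': 3072}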
getmerge(path, filename, blocksize=65536)
    Concatenate all files in path (a directory) to a single local output file
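    For example (both paths are hypothetical):
        >>> hdfs.getmerge('/data/parts/', 'combined_local.csv')  # doctest: +SKIP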
glob(path)
    Get list of paths matching glob-like pattern (i.e., with "*"s)

    If passed a directory, gets all contained files; if passed a path to a file, without
    any "*", returns a one-element list containing that filename. Does not support
    Python 3.5's "**" notation.
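    For example (pattern and result are illustrative):
        >>> hdfs.glob('/data/2016-*.csv')  # doctest: +SKIP
        ['/data/2016-01.csv', '/data/2016-02.csv']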
ls(path, detail=False)
    List files at path

    Parameters:
        path : string/bytes
            location at which to list files
        detail : bool (=False)
            if True, each list item is a dict of file properties; otherwise, returns a
            list of filenames
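    For example (the listing shown is illustrative):
        >>> hdfs.ls('/data')  # doctest: +SKIP
        ['/data/file.csv', '/data/sub']
        >>> hdfs.ls('/data', detail=True)  # doctest: +SKIP  (returns dicts of properties)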
open(path, mode='rb', replication=0, buff=0, block_size=0)
    Open a file for reading or writing

    Parameters:
        path : string
            Path of file on HDFS
        mode : string
            One of 'rb', 'wb', or 'ab'
        replication : int
            Replication factor; if zero, use system default (only relevant when writing)
        buff : int (=0)
            Client buffer size (bytes); if 0, use default
        block_size : int
            Size of data-node blocks, if writing
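    A hedged sketch of opening for write (the path and replication factor are
    illustrative):
        >>> with hdfs.open('/tmp/out.bin', 'wb', replication=2) as f:  # doctest: +SKIP
        ...     f.write(b'some bytes')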
put(filename, path, chunk=65536, replication=0, block_size=0)
    Copy local file to path in HDFS
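    For example, a round trip with get (all paths hypothetical):
        >>> hdfs.put('local.csv', '/data/remote.csv')  # doctest: +SKIP
        >>> hdfs.get('/data/remote.csv', 'copy.csv')  # doctest: +SKIP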
read_block(fn, offset, length, delimiter=None)
    Read a block of bytes from an HDFS file

    Starting at offset of the file, read length bytes. If delimiter is set, then we
    ensure that the read starts and stops at delimiter boundaries that follow the
    locations offset and offset + length. If offset is zero, then we start at zero.
    The bytestring returned will not include the surrounding delimiter strings.

    If offset + length is beyond the EOF, reads to EOF.

    Parameters:
        fn : string
            Path to filename on HDFS
        offset : int
            Byte offset to start read
        length : int
            Number of bytes to read
        delimiter : bytes (optional)
            Ensure reading starts and stops at delimiter bytestring

    See also
        hdfs3.utils.read_block

    Examples
        >>> hdfs.read_block('/data/file.csv', 0, 13)  # doctest: +SKIP
        b'Alice, 100\nBo'
        >>> hdfs.read_block('/data/file.csv', 0, 13, delimiter=b'\n')  # doctest: +SKIP
        b'Alice, 100\nBob, 200'
renew_token(token=None)
    Renew delegation token

    Parameters:
        token : str or None
            If None, uses the instance's token. It is an error to do so if there is no token.

    Returns:
        New expiration time for the token
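    A hedged sketch of the token lifecycle across delegate_token, renew_token and
    cancel_token (the user name is hypothetical; assumes the cluster has delegation
    tokens enabled):
        >>> token = hdfs.delegate_token(user='alice')  # doctest: +SKIP
        >>> hdfs.renew_token(token)  # doctest: +SKIP
        >>> hdfs.cancel_token(token)  # doctest: +SKIP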
set_replication(path, replication)
    Instruct HDFS to set the replication for the given file

    If successful, the head-node's table is updated immediately, but the actual copying
    will be queued for later. It is acceptable to set a replication that cannot be
    supported (e.g., higher than the number of data-nodes).
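    For example (the path and factor are illustrative):
        >>> hdfs.set_replication('/data/file.csv', 3)  # doctest: +SKIP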
class hdfs3.core.HDFile(fs, path, mode, replication=0, buff=0, block_size=0)
    File on HDFS

    Matches the standard Python file interface.
    Examples
        >>> with hdfs.open('/path/to/hdfs/file.txt') as f:  # doctest: +SKIP
        ...     bytes = f.read(1000)  # doctest: +SKIP
        >>> with hdfs.open('/path/to/hdfs/file.csv') as f:  # doctest: +SKIP
        ...     df = pd.read_csv(f, nrows=1000)  # doctest: +SKIP
next()
    Enables reading a file as a buffer in pandas
readline(chunksize=256, lineterminator='\n')
    Return a line using buffered reading

    A line is a sequence of bytes between b'\n' markers (or the given line-terminator).
    Line iteration uses this method internally.

    Note: this function requires many calls to HDFS and is slow; it is in general better
    to wrap an HDFile in an io.TextIOWrapper for buffering, text decoding and newline
    support.
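    A minimal sketch of that wrapping (the path is hypothetical):
        >>> import io  # doctest: +SKIP
        >>> with hdfs.open('/path/to/hdfs/file.txt', 'rb') as f:  # doctest: +SKIP
        ...     for line in io.TextIOWrapper(f, encoding='utf-8'):
        ...         print(line)  # buffered, decoded, line-by-line reading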
seek(offset, from_what=0)
    Set file read position. Read mode only.

    Attempting to move beyond the file bounds raises an exception. Note that, by the
    convention of Python file seek, offset should be <= 0 if from_what is 2.

    Parameters:
        offset : int
            byte location in the file
        from_what : int (0, 1, 2)
            if 0 (default), relative to file start; if 1, relative to current location;
            if 2, relative to file end

    Returns:
        the new position
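    For instance, reading the last 16 bytes of a file (the path is hypothetical):
        >>> with hdfs.open('/data/file.csv', 'rb') as f:  # doctest: +SKIP
        ...     f.seek(-16, 2)  # 16 bytes before end-of-file
        ...     data = f.read()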
class hdfs3.mapping.HDFSMap(hdfs, root, check=False)
    Wrap a HDFileSystem as a mutable mapping.

    The keys of the mapping become files under the given root, and the values (which
    must be bytes) the contents of those files.

    Parameters:
        hdfs : HDFileSystem
        root : string
            path to contain the stored files (the directory will be created if it
            doesn't exist)
        check : bool (=False)
            performs a touch at the location, to check writeability
    Examples
        >>> hdfs = hdfs3.HDFileSystem()  # doctest: +SKIP
        >>> mw = HDFSMap(hdfs, '/writable/path/')  # doctest: +SKIP
        >>> mw['loc1'] = b'Hello World'  # doctest: +SKIP
        >>> list(mw.keys())  # doctest: +SKIP
        ['loc1']
        >>> mw['loc1']  # doctest: +SKIP
        b'Hello World'