API

HDFileSystem([host, port, user, ...]) Connection to an HDFS namenode
HDFileSystem.cat(path) Return contents of file
HDFileSystem.chmod(path, mode) Change access control of given path
HDFileSystem.chown(path, owner, group) Change owner/group
HDFileSystem.df() Used/free disc space on the HDFS system
HDFileSystem.du(path[, total, deep]) Returns file sizes on a path.
HDFileSystem.exists(path) Is there an entry at path?
HDFileSystem.get(hdfs_path, local_path[, ...]) Copy HDFS file to local
HDFileSystem.getmerge(path, filename[, ...]) Concatenate all files under path (a directory) into a single output file
HDFileSystem.get_block_locations(path[, ...]) Fetch physical locations of blocks
HDFileSystem.glob(path) Get list of paths matching glob-like pattern (i.e., with “*”s).
HDFileSystem.info(path) File information (as a dict)
HDFileSystem.ls(path[, detail]) List files at path
HDFileSystem.mkdir(path) Make directory at path
HDFileSystem.mv(path1, path2) Move file at path1 to path2
HDFileSystem.open(path[, mode, replication, ...]) Open a file for reading or writing
HDFileSystem.put(filename, path[, chunk]) Copy local file to path in HDFS
HDFileSystem.read_block(fn, offset, length) Read a block of bytes from an HDFS file
HDFileSystem.rm(path[, recursive]) Use recursive for rm -r, i.e., delete directory and contents
HDFileSystem.set_replication(path, replication) Instruct HDFS to set the replication for the given file.
HDFileSystem.tail(path[, size]) Return last bytes of file
HDFileSystem.touch(path) Create zero-length file
HDFile(fs, path, mode[, replication, buff, ...]) File on HDFS
HDFile.close() Flush and close file, ensuring the data is readable
HDFile.flush() Send buffer to the data-node; actual write to disc may happen later
HDFile.info() Filesystem metadata about this file
HDFile.read([length]) Read bytes from open file
HDFile.readlines() Return all lines in a file as a list
HDFile.seek(offset[, from_what]) Set file read position.
HDFile.tell() Get current byte location in a file
HDFile.write(data) Write bytes to the open file (which must be open in ‘wb’ or ‘ab’ mode)
HDFSMap(hdfs, root[, check]) Wrap an HDFileSystem as a mutable mapping.
class hdfs3.core.HDFileSystem(host=None, port=None, user=None, ticket_cache=None, token=None, pars=None, connect=True)[source]

Connection to an HDFS namenode

>>> hdfs = HDFileSystem(host='127.0.0.1', port=8020)  
cat(path)[source]

Return contents of file

chmod(path, mode)[source]

Change access control of given path

Exactly what permissions the file ends up with depends on the HDFS configuration.

Parameters:

path : string

file/directory to change

mode : integer

As with the POSIX standard, each octal digit refers to user, group, and other, in that order, with read, write, and execute as the bits of each digit.

Examples

>>> hdfs.chmod('/path/to/file', 0o777)  # make readable/writable by all 
>>> hdfs.chmod('/path/to/file', 0o700)  # make readable/writable only by user 
>>> hdfs.chmod('/path/to/file', 0o400)  # make read-only for user 
chown(path, owner, group)[source]

Change owner/group
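
Example (user and group names are illustrative):

>>> hdfs.chown('/path/to/file', 'alice', 'supergroup')  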

connect()[source]

Connect to the name node

This happens automatically at startup

df()[source]

Used/free disc space on the HDFS system

disconnect()[source]

Disconnect from name node

du(path, total=False, deep=False)[source]

Returns file sizes on a path.

Parameters:

path : string

where to look

total : bool (False)

to add up the sizes to a grand total

deep : bool (False)

whether to recurse into subdirectories
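
Examples (illustrative path; the flags follow the parameter descriptions above):

>>> hdfs.du('/data', deep=True)   # sizes of everything under /data, recursively  
>>> hdfs.du('/data', total=True)  # the sizes added up to a single grand total  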

exists(path)[source]

Is there an entry at path?

get(hdfs_path, local_path, blocksize=65536)[source]

Copy HDFS file to local
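
Example (paths are illustrative):

>>> hdfs.get('/data/file.csv', './file.csv')  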

get_block_locations(path, start=0, length=0)[source]

Fetch physical locations of blocks

getmerge(path, filename, blocksize=65536)[source]

Concatenate all files under path (a directory) into a single output file
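
Example (illustrative paths; every file under the HDFS directory is concatenated into the one output file):

>>> hdfs.getmerge('/data/parts/', './merged.csv')  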

glob(path)[source]

Get list of paths matching glob-like pattern (i.e., with “*”s).

If passed a directory, gets all contained files; if passed a path to a file without any “*”, returns a one-element list containing that filename. Does not support Python 3.5’s “**” notation.
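
Examples (illustrative paths):

>>> hdfs.glob('/data/*.csv')     # all CSV files directly under /data  
>>> hdfs.glob('/data/file.csv')  # one-element list if the file exists  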

head(path, size=1024)[source]

Return first bytes of file

info(path)[source]

File information (as a dict)

ls(path, detail=True)[source]

List files at path

Parameters:

path : string/bytes

location at which to list files

detail : bool (=True)

if True, each list item is a dict of file properties; otherwise, returns list of filenames
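
Examples (illustrative path):

>>> hdfs.ls('/data', detail=False)  # plain list of filenames  
>>> hdfs.ls('/data')                # list of dicts of file properties  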

mkdir(path)[source]

Make directory at path

mv(path1, path2)[source]

Move file at path1 to path2

open(path, mode='rb', replication=0, buff=0, block_size=0)[source]

Open a file for reading or writing

Parameters:

path: string

Path of file on HDFS

mode: string

One of ‘rb’, ‘wb’, or ‘ab’

replication: int

Replication factor; if zero, use system default (only on write)

block_size: int

Size of data-node blocks if writing
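
Example (illustrative path; replication=2 requests two copies of each block when writing):

>>> with hdfs.open('/tmp/newfile.txt', 'wb', replication=2) as f:  
...     f.write(b'hello')  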

put(filename, path, chunk=65536)[source]

Copy local file to path in HDFS
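
Example (paths are illustrative):

>>> hdfs.put('./file.csv', '/data/file.csv')  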

read_block(fn, offset, length, delimiter=None)[source]

Read a block of bytes from an HDFS file

Starting at offset in the file, read length bytes. If delimiter is set, we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero, we start at zero. The bytestring returned will not include the surrounding delimiter strings.

If offset + length is beyond the EOF, reads to EOF.

Parameters:

fn: string

Path to filename on HDFS

offset: int

Byte offset to start read

length: int

Number of bytes to read

delimiter: bytes (optional)

Ensure reading starts and stops at delimiter bytestring

See also

hdfs3.utils.read_block

Examples

>>> hdfs.read_block('/data/file.csv', 0, 13)  
b'Alice, 100\nBo'
>>> hdfs.read_block('/data/file.csv', 0, 13, delimiter=b'\n')  
b'Alice, 100\nBob, 200'
rm(path, recursive=True)[source]

Use recursive for rm -r, i.e., delete directory and contents

set_replication(path, replication)[source]

Instruct HDFS to set the replication for the given file.

If successful, the name node’s table is updated immediately, but the actual copying will be queued for later. It is acceptable to set a replication that cannot be supported (e.g., higher than the number of data-nodes).
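
Example (illustrative path; requests three copies of each block of the file):

>>> hdfs.set_replication('/data/file.csv', 3)  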

tail(path, size=1024)[source]

Return last bytes of file

touch(path)[source]

Create zero-length file

walk(path)[source]

Get all file entries below given path
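
Example (illustrative path):

>>> hdfs.walk('/data')  # all entries under /data, recursively  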

class hdfs3.core.HDFile(fs, path, mode, replication=0, buff=0, block_size=0)[source]

File on HDFS

Matches the standard Python file interface.

Examples

>>> with hdfs.open('/path/to/hdfs/file.txt') as f:  
...     bytes = f.read(1000)  
>>> with hdfs.open('/path/to/hdfs/file.csv') as f:  
...     df = pd.read_csv(f, nrows=1000)  
close()[source]

Flush and close file, ensuring the data is readable

flush()[source]

Send buffer to the data-node; actual write to disc may happen later

info()[source]

Filesystem metadata about this file

read(length=None)[source]

Read bytes from open file

readline(chunksize=65536, lineterminator='\n')[source]

Return a line using buffered reading.

Reads and caches chunksize bytes of data, and serves lines from that cache. Subsequent readline calls deplete the cached lines until they are exhausted, at which point a new chunk is read. Consequently, read and readline do not in general point to the same location in the file; seek() and tell() give the true position in the underlying file, which will be a full chunk ahead even after calling readline once.

Line iteration uses this method internally.
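
Example (illustrative path):

>>> with hdfs.open('/data/file.csv') as f:  
...     first = f.readline()  
...     rest = f.readlines()  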

readlines()[source]

Return all lines in a file as a list

seek(offset, from_what=0)[source]

Set file read position. Read mode only.

An attempt to move out of the file’s bounds raises an exception. Note that, following the Python file-seek convention, offset should be <= 0 if from_what is 2.

Parameters:

offset : int

byte location in the file.

from_what : int 0, 1, 2

if 0 (default), relative to the file start; if 1, relative to the current location; if 2, relative to the file end.

Returns:

new position
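
Examples (illustrative; offsets assume the file is at least 100 bytes long):

>>> f = hdfs.open('/data/file.csv')  
>>> f.seek(100)     # 100 bytes from the start of the file  
>>> f.seek(10, 1)   # 10 bytes beyond the current location  
>>> f.seek(-10, 2)  # 10 bytes before the end of the file  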

tell()[source]

Get current byte location in a file

write(data)[source]

Write bytes to the open file (which must be open in ‘wb’ or ‘ab’ mode)

class hdfs3.mapping.HDFSMap(hdfs, root, check=False)[source]

Wrap an HDFileSystem as a mutable mapping.

The keys of the mapping become files under the given root, and the values (which must be bytes) become the contents of those files.

Parameters:

hdfs : HDFileSystem

root : string

path to contain the stored files (directory will be created if it doesn’t exist)

check : bool (=False)

if True, performs a touch at the root location, to verify writeability.

Examples

>>> hdfs = hdfs3.HDFileSystem() 
>>> mw = HDFSMap(hdfs, '/writable/path/') 
>>> mw['loc1'] = b'Hello World' 
>>> list(mw.keys()) 
['loc1']
>>> mw['loc1'] 
b'Hello World'