impala/tests/util/filesystem_base.py
Sahil Takiar ac87278b16 IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp
Add the -d option and -f option to the following commands:

`hdfs dfs -copyFromLocal <localsrc> URI`
`hdfs dfs -put [ - | <localsrc1> .. ]. <dst>`
`hdfs dfs -cp URI [URI ...] <dest>`

The -d option "Skip[s] creation of temporary file with the suffix
._COPYING_." which improves performance of these commands on S3 since S3
does not support metadata only renames.

The -f option "Overwrites the destination if it already exists". Combined
with HADOOP-13884, this mitigates S3 consistency issues by avoiding a HEAD
request to check whether the destination file exists.

Added the method 'copy_from_local' to the BaseFilesystem class.
Re-factored most usages of the aforementioned HDFS commands to use
the filesystem_client. Some usages were not appropriate / worth
refactoring, so occasionally this patch just adds the '-d' and '-f'
options explicitly. All calls to '-put' were replaced with
'copyFromLocal' because they both copy files from the local fs to a HDFS
compatible target fs.

Since WebHDFS does not have good support for copying files, this patch
removes the copy functionality from the PyWebHdfsClientWithChmod.
Re-factored the hdfs_client so that it uses a DelegatingHdfsClient
that delegates to either the HadoopFsCommandLineClient or
PyWebHdfsClientWithChmod.

Testing:
* Ran core tests on HDFS and S3

Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Reviewed-on: http://gerrit.cloudera.org:8080/14311
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-05 00:04:08 +00:00


# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Filesystem access abstraction

from abc import ABCMeta, abstractmethod


class BaseFilesystem(object):
  __metaclass__ = ABCMeta

  @abstractmethod
  def create_file(self, path, file_data, overwrite):
    """Create a file in 'path' and populate with the string 'file_data'. If overwrite is
    True, the file is overwritten. Returns True if successful, False if the file already
    exists and throws an exception otherwise"""
    pass

  @abstractmethod
  def make_dir(self, path, permission):
    """Create a directory in 'path' with octal umask 'permission'.
    Returns True if successful and throws an exception otherwise"""
    pass

  @abstractmethod
  def copy(self, src, dst, overwrite):
    """Copy a file from 'src' to 'dst'. Throws an exception if unsuccessful."""
    pass

  @abstractmethod
  def copy_from_local(self, src, dst):
    """Copies a file from 'src' file on the local filesystem to the 'dst', which can be
    on any HDFS compatible filesystem. Fails if the src file is not on the local
    filesystem. Throws an exception if unsuccessful."""
    pass

  @abstractmethod
  def ls(self, path):
    """Return a list of all files/dirs/keys in path. Throws an exception if path
    is invalid."""
    pass

  @abstractmethod
  def exists(self, path):
    """Returns True if a particular path exists, else it returns False."""
    pass

  @abstractmethod
  def delete_file_dir(self, path, recursive):
    """Delete all files/dirs/keys in a path. Returns True if successful or if the file
    does not exist. Throws an exception otherwise."""
    pass

  @abstractmethod
  def get_all_file_sizes(self, path):
    """Returns a list of integers which are all the file sizes of files found under
    'path'."""
    pass
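
The docstrings above define a return-value contract that each concrete client has to honor. The following is a minimal sketch of how a subclass might satisfy part of it, not code from the Impala tree. It assumes Python 3 (hence the `metaclass=` keyword instead of `__metaclass__`), and the names `FilesystemSketch`, `LocalFilesystemSketch` and `demo` are hypothetical. The base class is a trimmed stand-in for BaseFilesystem so that the example runs on its own.

```python
import os
import shutil
import tempfile
from abc import ABCMeta, abstractmethod


class FilesystemSketch(metaclass=ABCMeta):
  """Hypothetical stand-in for BaseFilesystem, trimmed to three methods."""

  @abstractmethod
  def create_file(self, path, file_data, overwrite):
    pass

  @abstractmethod
  def exists(self, path):
    pass

  @abstractmethod
  def delete_file_dir(self, path, recursive):
    pass


class LocalFilesystemSketch(FilesystemSketch):
  """Hypothetical implementation backed by the local disk."""

  def create_file(self, path, file_data, overwrite):
    # Contract: return False if the file already exists and overwrite is off.
    if os.path.exists(path) and not overwrite:
      return False
    with open(path, "w") as f:
      f.write(file_data)
    return True

  def exists(self, path):
    return os.path.exists(path)

  def delete_file_dir(self, path, recursive):
    # Contract: a missing path counts as success.
    if not os.path.exists(path):
      return True
    if os.path.isdir(path):
      if recursive:
        shutil.rmtree(path)
      else:
        os.rmdir(path)  # Raises OSError if the directory is not empty.
    else:
      os.remove(path)
    return True


def demo():
  """Exercises the sketch in a temporary directory and returns the results."""
  fs = LocalFilesystemSketch()
  tmp_dir = tempfile.mkdtemp()
  path = os.path.join(tmp_dir, "example.txt")
  return {
    "created": fs.create_file(path, "hello", overwrite=False),
    "created_again": fs.create_file(path, "world", overwrite=False),
    "overwritten": fs.create_file(path, "world", overwrite=True),
    "exists": fs.exists(path),
    "deleted": fs.delete_file_dir(tmp_dir, recursive=True),
    "exists_after_delete": fs.exists(path),
  }
```

Because every method in the stand-in base is abstract, a subclass that omits one of them cannot be instantiated, which is how the ABCMeta pattern catches an incomplete client at construction time rather than at first use.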