Chapter 5. Data Processing Service

The Data Processing service (sahara) provides a scalable data-processing stack and associated management interfaces.

5.1. Data Processing Configuration Options

Note

The common configurations for shared services and libraries, such as database connections and RPC messaging, are described at Common configurations.

5.1.1. Description of Configuration Options

The following tables provide a comprehensive list of the Data processing service configuration options.

Table 5.1. Description of API configuration options

Configuration option = Default value

Description

[oslo_messaging_rabbit]

 

connection_factory = single

(String) Connection factory implementation

[oslo_middleware]

 

enable_proxy_headers_parsing = False

(Boolean) Whether the application is behind a proxy or not. This determines if the middleware should parse the headers or not.

max_request_body_size = 114688

(Integer) The maximum body size for each request, in bytes.

secure_proxy_ssl_header = X-Forwarded-Proto

(String) The HTTP header used to determine the original request protocol scheme, even if it was hidden by an SSL termination proxy.

  • Deprecated

No deprecation reason provided for this option.

[retries]

 

retries_number = 5

(Integer) Number of times to retry the request to client before failing

retry_after = 10

(Integer) Time between the retries to client (in seconds).
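As a rough illustration of how the two [retries] options interact, the following Python sketch retries a failing call and pauses between attempts. The helper name call_with_retries and its injectable sleep argument are hypothetical and are not part of Sahara. The sketch also assumes one initial attempt followed by up to retries_number retries; the service's own retry logic may differ in detail.

```python
import time


def call_with_retries(func, retries_number=5, retry_after=10, sleep=time.sleep):
    """Call func() and retry on failure.

    Assumption: one initial attempt plus up to retries_number retries,
    with a fixed pause of retry_after seconds between attempts.
    """
    attempts_left = retries_number + 1
    while True:
        try:
            return func()
        except Exception:
            attempts_left -= 1
            if attempts_left == 0:
                # All retries are used up: re-raise the last error.
                raise
            sleep(retry_after)
```

With the defaults shown in the table, a request that keeps failing would be attempted six times in total under this assumption, with roughly 50 seconds spent waiting between attempts.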

Table 5.2. Description of clients configuration options

Configuration option = Default value

Description

[cinder]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to cinder.

api_version = 2

(Integer) Version of the Cinder API to use.

ca_file = None

(String) Location of the CA certificates file to use for cinder client requests.

endpoint_type = internalURL

(String) Endpoint type for cinder client requests

[glance]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to glance.

ca_file = None

(String) Location of the CA certificates file to use for glance client requests.

endpoint_type = internalURL

(String) Endpoint type for glance client requests

[heat]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to heat.

ca_file = None

(String) Location of the CA certificates file to use for heat client requests.

endpoint_type = internalURL

(String) Endpoint type for heat client requests

[keystone]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to keystone.

ca_file = None

(String) Location of the CA certificates file to use for keystone client requests.

endpoint_type = internalURL

(String) Endpoint type for keystone client requests

[manila]

 

api_insecure = True

(Boolean) Allow insecure SSL requests to manila.

api_version = 1

(Integer) Version of the manila API to use.

ca_file = None

(String) Location of the CA certificates file to use for manila client requests.

[neutron]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to neutron.

ca_file = None

(String) Location of the CA certificates file to use for neutron client requests.

endpoint_type = internalURL

(String) Endpoint type for neutron client requests

[nova]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to nova.

ca_file = None

(String) Location of the CA certificates file to use for nova client requests.

endpoint_type = internalURL

(String) Endpoint type for nova client requests

[swift]

 

api_insecure = False

(Boolean) Allow insecure SSL requests to swift.

ca_file = None

(String) Location of the CA certificates file to use for swift client requests.

endpoint_type = internalURL

(String) Endpoint type for swift client requests
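The client sections in Table 5.2 share the same api_insecure option, so it can be useful to check a configuration file for sections that allow insecure SSL requests. The following Python sketch does this with the standard configparser module, which is only an approximation of how the service itself parses its configuration. The function name insecure_sections and the sample text are hypothetical. Only explicitly set values are inspected; defaults from the table are not applied to sections or options that are missing from the file.

```python
import configparser

# Hypothetical configuration fragment used only for illustration.
SAMPLE = """
[cinder]
api_insecure = False
endpoint_type = internalURL

[manila]
api_insecure = True
"""


def insecure_sections(conf_text):
    """Return the sorted names of sections that set api_insecure to true."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return sorted(
        name
        for name in parser.sections()
        if parser.getboolean(name, "api_insecure", fallback=False)
    )


print(insecure_sections(SAMPLE))  # prints ['manila']
```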

Table 5.3. Description of common configuration options

Configuration option = Default value

Description

[DEFAULT]

 

admin_project_domain_name = default

(String) The name of the domain for the service project (formerly called tenant).

admin_user_domain_name = default

(String) The name of the domain to which the admin user belongs.

api_workers = 1

(Integer) Number of workers for Sahara API service (0 means all-in-one-thread configuration).

cleanup_time_for_incomplete_clusters = 0

(Integer) Maximum time (in hours) that a cluster may remain in a state other than "Active", "Deleting", or "Error". If a cluster is not in one of these states and was last updated more than cleanup_time_for_incomplete_clusters hours ago, it is deleted automatically. A value of 0 disables automatic cleanup.

cluster_remote_threshold = 70

(Integer) The same as global_remote_threshold, but for a single cluster.

compute_topology_file = etc/sahara/compute.topology

(String) File with the nova compute topology. It should contain a mapping between nova compute nodes and racks.

coordinator_heartbeat_interval = 1

(Integer) Interval between heartbeats, in seconds. Heartbeats are sent to make sure that the connection to the coordination server is active.

default_ntp_server = pool.ntp.org

(String) Default NTP server for time synchronization.

disable_event_log = False

(Boolean) Disables event log feature.

edp_internal_db_enabled = True

(Boolean) Use Sahara internal db to store job binaries.

enable_data_locality = False

(Boolean) Enables data locality for Hadoop clusters. Also enables data locality for Swift used by Hadoop. If enabled, the compute_topology_file and swift_topology_file configuration parameters should point to the OpenStack and Swift topology files, respectively.

enable_hypervisor_awareness = True

(Boolean) Enables four-level topology for data locality. Works only if the corresponding plugin supports this mode.

executor_thread_pool_size = 64

(Integer) Size of executor thread pool when executor is threading or eventlet.

global_remote_threshold = 100

(Integer) Maximum number of remote operations that will be running at the same time. Note that each remote operation requires its own process to run.

hash_ring_replicas_count = 40

(Integer) Number of points that belong to each member on the hash ring. A larger number leads to a better distribution.

heat_enable_wait_condition = True

(Boolean) Enable wait condition feature to reduce polling during cluster creation

heat_stack_tags = data-processing-cluster

(List) List of tags to use when operating on the stack.

job_binary_max_KB = 5120

(Integer) Maximum length of job binary data in kilobytes that may be stored or retrieved in a single operation.

job_canceling_timeout = 300

(Integer) Timeout for canceling job execution (in seconds). Sahara will try to cancel job execution during this time.

job_workflow_postfix =

(String) Postfix for storing jobs in HDFS. It is appended to the '/user/<hdfs user>/' path.

min_transient_cluster_active_time = 30

(Integer) Minimal "lifetime" in seconds for a transient cluster. Cluster is guaranteed to be "alive" within this time period.

nameservers =

(List) IP addresses of Designate nameservers.

node_domain = novalocal

(String) The suffix of the node’s FQDN. In nova-network that is the dhcp_domain config parameter.

os_region_name = None

(String) Region name used to get services endpoints.

periodic_coordinator_backend_url = None

(String) The backend URL to use for distributed periodic tasks coordination.

periodic_enable = True

(Boolean) Enable periodic tasks.

periodic_fuzzy_delay = 60

(Integer) Range in seconds to randomly delay when starting the periodic task scheduler to reduce stampeding. (Disable by setting to 0).

periodic_interval_max = 60

(Integer) Maximum interval between periodic task executions, in seconds.

periodic_workers_number = 1

(Integer) Number of threads to run periodic tasks.

plugins = vanilla, spark, cdh, ambari, storm, mapr

(List) List of plugins to be loaded. Sahara preserves the order of the list when returning it.

proxy_command =

(String) Proxy command used to connect to instances. If set, this command should open a netcat socket that Sahara will use for SSH and HTTP connections. Use {host} and {port} to describe the destination. Other available keywords: {tenant_id}, {network_id}, {router_id}.

rootwrap_command = sudo sahara-rootwrap /etc/sahara/rootwrap.conf

(String) Rootwrap command to use. Use in conjunction with use_rootwrap=True.

swift_topology_file = etc/sahara/swift.topology

(String) File with the Swift topology. It should contain a mapping between Swift nodes and racks.

use_floating_ips = True

(Boolean) If set to True, Sahara will use floating IPs to communicate with instances. To make sure that all instances have floating IPs assigned in Nova Network set "auto_assign_floating_ip=True" in nova.conf. If Neutron is used for networking, make sure that all Node Groups have "floating_ip_pool" parameter defined.

use_identity_api_v3 = True

(Boolean) Enables Sahara to use Keystone API v3. If that flag is disabled, per-job clusters will not be terminated automatically.

use_namespaces = False

(Boolean) Use network namespaces for communication (only valid to use in conjunction with use_neutron=True).

use_neutron = True

(Boolean) Use Neutron Networking (False indicates the use of Nova networking).

use_rootwrap = False

(Boolean) Use rootwrap facility to allow non-root users to run the sahara services and access private network IPs (only valid to use in conjunction with use_namespaces=True)

use_router_proxy = False

(Boolean) Use ROUTER remote proxy.

[cluster_verifications]

 

verification_enable = True

(Boolean) Option to enable verifications for all clusters

verification_periodic_interval = 600

(Integer) Interval between two consecutive periodic tasks for verifications, in seconds.

[conductor]

 

use_local = True

(Boolean) Perform sahara-conductor operations locally.

[healthcheck]

 

backends =

(List) Additional backends that can perform health checks and report that information back as part of a request.

detailed = False

(Boolean) Show more detailed information as part of the response

disable_by_file_path = None

(String) Check the presence of a file to determine if an application is running on a port. Used by DisableByFileHealthcheck plugin.

disable_by_file_paths =

(List) Check the presence of a file based on a port to determine if an application is running on a port. Expects a "port:path" list of strings. Used by DisableByFilesPortsHealthcheck plugin.

path = /healthcheck

(String) The path to respond to healthcheck requests on.

  • Deprecated

No deprecation reason provided for this option.
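The cleanup_time_for_incomplete_clusters rule described in Table 5.3 can be read as a simple predicate over a cluster's state and last update time. The following Python sketch is one way to express that rule. The function name should_cleanup and its arguments are hypothetical and do not correspond to Sahara's internal code; the sketch only restates the behavior given in the option description.

```python
from datetime import datetime, timedelta

# States named in the option description as exempt from automatic cleanup.
EXEMPT_STATES = {"Active", "Deleting", "Error"}


def should_cleanup(status, updated_at, now, cleanup_hours):
    """Return True if a cluster qualifies for automatic deletion.

    cleanup_hours plays the role of cleanup_time_for_incomplete_clusters;
    a value of 0 means automatic cleanup is disabled.
    """
    if cleanup_hours == 0:
        return False
    if status in EXEMPT_STATES:
        return False
    return now - updated_at > timedelta(hours=cleanup_hours)


now = datetime(2017, 1, 1, 12, 0)
print(should_cleanup("Spawning", now - timedelta(hours=5), now, 4))  # prints True
print(should_cleanup("Active", now - timedelta(hours=5), now, 4))    # prints False
```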

Table 5.4. Description of domain configuration options

Configuration option = Default value

Description

[DEFAULT]

 

proxy_user_domain_name = None

(String) The domain Sahara will use to create new proxy users for Swift object access.

proxy_user_role_names = Member

(List) A list of the role names that the proxy user should assume through trust for Swift object access.

use_domain_for_proxy_users = False

(Boolean) Enables Sahara to use a domain for creating temporary proxy users to access Swift. If this is enabled a domain must be created for Sahara to use.

Table 5.5. Description of Auth options for Swift access for VM configuration options

Configuration option = Default value

Description

[object_store_access]

 

public_identity_ca_file = None

(String) Location of the CA certificate file to use for identity client requests through the public endpoint.

public_object_store_ca_file = None

(String) Location of the CA certificate file to use for object-store client requests through the public endpoint.

Table 5.6. Description of Redis configuration options

Configuration option = Default value

Description

[matchmaker_redis]

 

check_timeout = 20000

(Integer) Time in ms to wait before the transaction is killed.

host = 127.0.0.1

(String) Host to locate redis.

  • Deprecated

Replaced by [DEFAULT]/transport_url

password =

(String) Password for Redis server (optional).

  • Deprecated

Replaced by [DEFAULT]/transport_url

port = 6379

(Port number) Use this port to connect to redis host.

  • Deprecated

Replaced by [DEFAULT]/transport_url

sentinel_group_name = oslo-messaging-zeromq

(String) Redis replica set name.

sentinel_hosts =

(List) List of Redis Sentinel hosts (fault tolerance mode), for example, [host:port, host1:port …]

  • Deprecated

Replaced by [DEFAULT]/transport_url

socket_timeout = 10000

(Integer) Timeout in ms on blocking socket operations.

wait_timeout = 2000

(Integer) Time in ms to wait between connection attempts.

Table 5.7. Description of SSH configuration options

Configuration option = Default value

Description

[DEFAULT]

 

ssh_timeout_common = 300

(Integer) Overrides timeout for common ssh operations, in seconds

ssh_timeout_files = 120

(Integer) Overrides timeout for ssh operations with files, in seconds

ssh_timeout_interactive = 1800

(Integer) Overrides timeout for interactive ssh operations, in seconds

Table 5.8. Description of timeouts configuration options

Configuration option = Default value

Description

[timeouts]

 

delete_instances_timeout = 10800

(Integer) Wait for instances to be deleted, in seconds

detach_volume_timeout = 300

(Integer) Timeout for detaching volumes from instance, in seconds

ips_assign_timeout = 10800

(Integer) Assign IPs timeout, in seconds

wait_until_accessible = 10800

(Integer) Wait for instance accessibility, in seconds

5.1.2. New, updated, and deprecated options in Ocata for Data Processing service

Table 5.9. New default values

Option

Previous default value

New default value

[DEFAULT] use_neutron

False

True

Table 5.10. Deprecated options

Deprecated option

New option

[DEFAULT] rpc_thread_pool_size

[DEFAULT] executor_thread_pool_size

[DEFAULT] use_syslog

None
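When upgrading, it can help to scan an existing configuration file for the options listed in Table 5.10. The following Python sketch does this with the standard configparser module, which only approximates how the service parses its own configuration. The function name find_deprecated and the mapping name are hypothetical; the mapping itself is taken from the table, with None meaning that no replacement option is listed.

```python
import configparser

# Deprecated [DEFAULT] options from Table 5.10 and their replacements.
DEPRECATED_DEFAULT_OPTIONS = {
    "rpc_thread_pool_size": "executor_thread_pool_size",
    "use_syslog": None,
}


def find_deprecated(conf_text):
    """Return (deprecated option, replacement) pairs set in [DEFAULT]."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    defaults = parser.defaults()
    return [
        (old, new)
        for old, new in DEPRECATED_DEFAULT_OPTIONS.items()
        if old in defaults
    ]
```

For example, passing a file that sets rpc_thread_pool_size in [DEFAULT] would report that option together with executor_thread_pool_size as its replacement.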