OpenShift: problem with horizontal pod autoscaler and metric gathering system

Metrics collection (Heapster) is deployed on the OpenShift infrastructure.

Problem 1:
The Heapster process seems to freeze after 15 to 30 minutes. I have to restart Heapster before metrics are collected again.

Problem 2:
When I try to deploy a horizontal pod autoscaler, I get the following error message:

5:28:59 PM HorizontalPodAutoscaler
phpscaler HorizontalPodAutoscaler FailedGetMetrics FailedGetMetrics
failed to get CPU consumption and request: failed to unmarshall heapster response: invalid character 'E' looking for beginning of value (23 times in the last 11 minutes, 2 seconds)
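
For readers unfamiliar with this kind of message: "invalid character 'E' looking for beginning of value" is what a JSON decoder reports when the body it receives is not JSON at all and its first character is 'E' (for example, a plain-text error page). The sketch below reproduces that situation in Python. The HPA controller itself is written in Go, so the wording of the error differs, and both sample bodies here are hypothetical since the event does not show the real response.

```python
import json

# Hypothetical response body: plain text instead of JSON. The event above
# only reveals that the real body started with the character 'E'.
body = "Error: something went wrong upstream"


def try_decode(text):
    """Return (value, None) on success or (None, message) on a JSON syntax error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # exc.pos is the offset of the first character the parser rejected.
        return None, f"{exc.msg} at position {exc.pos}"


value, error = try_decode(body)
print(error)

# A body that is valid JSON decodes without error (the shape is made up).
ok_value, ok_error = try_decode('{"metrics": []}')
```

The failure at position 0 suggests looking at what the metrics endpoint actually returns rather than at the autoscaler definition itself.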

I tried to solve the issue by following these references:
https://github.com/openshift/origin/issues/6293
https://bugzilla.redhat.com/show_bug.cgi?id=1289503

Here is some information about the project configuration:
Pod configuration:
Containers:
  cakephp-mysql-example:
    Container ID: docker://5dea2c9619c5decb2df730e90cf9f49f45ab8aa4be669aa4e0e786136604fc5f
    Image: 172.30.224.123:5000/cakephp/cakephp-mysql-example@sha256:010c970899a757b4e5444de35eac311190f2f176bb0580e80c9fdb682468ac0c
    Image ID: docker://ed8936c944d7228e85a9c4481d6be55dc43a35983e9b718694fa3f221419cd1e
    QoS Tier:
      cpu: Guaranteed
      memory: Guaranteed
    Limits:
      cpu: 500m
      memory: 512Mi
    Requests:
      memory: 512Mi
      cpu: 500m
    State: Running
      Started: Wed, 13 Apr 2016 16:44:54 +0200

Autoscaler config:
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: phpscaler
spec:
  scaleRef:
    kind: DeploymentConfig
    name: cakephp-mysql-example
    apiVersion: v1
    subresource: scale
  minReplicas: 1
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 70
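
To make clear what this autoscaler is expected to do once metrics are available, here is a rough Python sketch of the scaling rule as it is commonly described: CPU utilization is measured relative to the pods' CPU request (500m here), compared with the 70% target, and the replica count is scaled by that ratio and clamped to the min/max bounds. The function name, the 10% tolerance and the usage figures are assumptions for illustration, not values taken from this cluster.

```python
import math


def desired_replicas(current_replicas, cpu_usage_millicores, cpu_request_millicores,
                     target_percentage, min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of an HPA-style CPU scaling decision.

    cpu_usage_millicores holds one (hypothetical) usage sample per running pod.
    """
    # Average utilization across pods, as a percentage of the CPU request.
    total_request = cpu_request_millicores * len(cpu_usage_millicores)
    utilization = 100.0 * sum(cpu_usage_millicores) / total_request

    ratio = utilization / target_percentage
    # Assumed tolerance band: small deviations from the target change nothing.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    wanted = math.ceil(ratio * current_replicas)
    # Clamp to the bounds from the autoscaler spec.
    return max(min_replicas, min(max_replicas, wanted))


# One pod using 450m of its 500m request is at 90% utilization, above the 70% target.
scaled_up = desired_replicas(1, [450], 500, 70, 1, 10)
# One pod sitting exactly at the target stays as it is.
unchanged = desired_replicas(1, [350], 500, 70, 1, 10)
```

None of this can run while the controller fails to read CPU consumption, which is why the FailedGetMetrics event blocks scaling entirely.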

HPA role description:
apiVersion: v1
kind: ClusterRole
metadata:
  creationTimestamp: 2016-03-03T14:59:27Z
  name: system:hpa-controller
  resourceVersion: "40"
  selfLink: /oapi/v1/clusterroles/system:hpa-controller
  uid: 864c18c6-e150-11e5-ba5d-00505685783c
rules:
- apiGroups:
  - extensions
  attributeRestrictions: null
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
- apiGroups:
  - extensions
  attributeRestrictions: null
  resources:
  - horizontalpodautoscalers/status
  verbs:
  - update
- apiGroups:
  - extensions
  attributeRestrictions: null
  resources:
  - replicationcontrollers/scale
  verbs:
  - get
  - update
- apiGroups: null
  attributeRestrictions: null
  resources:
  - deploymentconfigs/scale
  verbs:
  - get
  - update
- apiGroups: null
  attributeRestrictions: null
  resources:
  - events
  verbs:
  - create
  - patch
  - update
- apiGroups: null
  attributeRestrictions: null
  resources:
  - pods
  verbs:
  - list
- apiGroups: null
  attributeRestrictions: null
  resourceNames:
  - 'https:heapster:'
  resources:
  - services
  verbs:
  - proxy
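
A note on the last rule: the resource name 'https:heapster:' appears to follow the scheme:name:port convention used for service proxy references, where an empty port means the service's default port. The small Python sketch below only illustrates how such a reference breaks down; the helper name is hypothetical and is not part of any OpenShift or Kubernetes client.

```python
def split_service_reference(reference):
    """Split 'name', 'name:port' or 'scheme:name:port' into (scheme, name, port).

    Missing parts are returned as empty strings.
    """
    parts = reference.split(":")
    if len(parts) == 1:
        return "", parts[0], ""
    if len(parts) == 2:
        return "", parts[0], parts[1]
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"unexpected service reference: {reference!r}")


# The value from the role above: HTTPS scheme, service 'heapster', default port.
scheme, name, port = split_service_reference("https:heapster:")
```

Read this way, the rule lets the HPA controller proxy through the API server to a service named heapster over HTTPS.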