How Feature Flags Work

Understand feature flag concepts, the SKAO implementation, and best practices.

What are feature flags?

Feature flags (also called feature toggles or flippers) let you turn functionality on or off at runtime without deploying new code.

A feature flag is a decision point that changes application behaviour based on the flag’s state:

if feature_flags.is_enabled("new-shiny-feature"):
    show_new_shiny_feature()
else:
    show_old_feature()

Implement feature flags as:

  • Runtime configurable toggles — Via API calls (Unleash, GitLab)

  • Tango attributes — For device-level control

  • Static configuration — Deployment configuration or program parameters

  • Compile-time options — Build flags
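As a minimal sketch of the static configuration option, a flag can be read once from an environment variable at start-up. The helper `env_flag` and the variable name `COMPONENT_X_ENABLE_NEW_FUNCTION` are hypothetical and only illustrate the idea; they are not part of any SKAO library.

```python
import os

# Strings treated as "on"; anything else is treated as "off".
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from an environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable not set: use the caller's default.
        return default
    return raw.strip().lower() in _TRUTHY


# Evaluated once at import time, so changing it needs a restart or redeploy.
NEW_FUNCTION_ENABLED = env_flag("COMPONENT_X_ENABLE_NEW_FUNCTION")
```

Unlike a runtime toggle, this value cannot change while the process is running, which is often sufficient for components that only need a deployment-time switch.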

[Figure: Feature flag decision flowchart]

Use this flowchart to determine which implementation suits your needs. Many SKA components only require static configuration options.

Use cases

Feature flags provide these advantages:

Decouple deployment from release

Deploy code to production frequently, but only release features when ready. Configure systems fully before enabling functionality.

Canary releases and gradual rollouts

Release features to a subset of users (PSIs, AIV, software-only cloud) before full rollout, reducing risk.

A/B testing and compatibility

Expose different versions of a feature to test new functionality or compatibility with other components — different algorithms, data formats, or UIs.

Kill switches

Disable problematic features in production instantly without rollback or hotfix deployment.

Development

Merge incomplete features to the main branch, hidden behind a flag. This reduces merge conflicts and integration pain, especially for features requiring updates to many components.

Operational control

Enable or disable features for specific operational needs, such as disabling resource-intensive features during peak load.

Feature flag types

Read more about feature flag types.

Anti-patterns

Feature flags require discipline. Avoid these pitfalls:

Long-term configuration

Flags are temporary. Don’t use them as a permanent configuration system — use proper configuration management instead. Plan for flag removal from the start.

Excessive complexity

Too many flags, especially nested ones, make code hard to reason about, test, and maintain. Each independent boolean flag doubles the number of possible flag combinations, so n flags give up to 2^n code paths to consider.
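To make the growth concrete, the following sketch enumerates every on/off combination for three flags. The flag names are hypothetical.

```python
from itertools import product

flags = ["new-parser", "new-cache", "new-ui"]  # hypothetical flag names

# One dict per on/off combination of the flags: 2 ** len(flags) in total.
combinations = [
    dict(zip(flags, states))
    for states in product([False, True], repeat=len(flags))
]

print(len(combinations))  # 8
```

Three flags already give eight combinations; ten independent flags would give 1024.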

Replacing proper design

Don’t use flags to implement architectural decisions, refactor code, or reduce technical debt. Address these through proper engineering practices.

Core architectural changes

Don’t use flags to toggle fundamental architectural differences. For example, moving from NoSQL to SQL introduces too much code complexity and too many migration issues to control via a flag.

Naming conventions

Follow these conventions for consistent, understandable flags:

Prefix with component and/or subsystem name

This identifies the flag’s purpose and scope.

Example: component-x-enable-new-function
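A naming convention like this can be checked automatically, for example in a unit test or lint step. The sketch below is one possible check; the function `is_valid_flag_name` is hypothetical, and the lowercase, hyphen-separated rule is an assumption inferred from the example name above rather than a documented requirement.

```python
import re

# Lowercase words or digits separated by single hyphens (assumed style).
_KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_flag_name(name: str, component: str) -> bool:
    """Return True if name is hyphen-separated lowercase and starts with component."""
    return bool(_KEBAB_CASE.fullmatch(name)) and name.startswith(component + "-")


print(is_valid_flag_name("component-x-enable-new-function", "component-x"))  # True
print(is_valid_flag_name("enable-new-function", "component-x"))  # False
```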

Use the same flag name across repositories

If a flag in Component A needs control during integration testing of Subsystem X, keep the flag name consistent. The definition and control plane shift, but the code-level flag name stays the same.

Use the same flag name across environments and components

This simplifies configuration management and reduces errors.

Example configuration matrix:

Datacentre  Environment          Component    Flag Status
----------  -------------------  -----------  -----------
STFC        CI/Test              Component X  enabled
STFC        Integration/Staging  Subsystem A  enabled
ITF         Integration          SKA MID ITF  disabled
AA          Production           SKA MID AA   enabled

Best practices

Define flags at the highest necessary level

If a flag in Component A only affects A’s internal behaviour and isn’t relevant to higher-level systems, manage it within its own project. If the feature needs coordinated rollout across the integrated system, document usage and default behaviour clearly.

Use configurable client initialisation

Always configure Unleash client options via environment variables so different data centres can use different environments.
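One way to do this is to collect the client options from the environment in a single place, as sketched below. The environment variable names and the `UnleashSettings` container are hypothetical. The field names mirror parameters accepted by the Python `UnleashClient` constructor (`url`, `app_name`, `instance_id`, `refresh_interval`), but check them against the client version in use.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class UnleashSettings:
    """Client options gathered from the environment (hypothetical container)."""

    url: str
    app_name: str
    instance_id: str
    refresh_interval: int


def load_unleash_settings(env=os.environ) -> UnleashSettings:
    """Build settings from environment variables; variable names are assumptions."""
    return UnleashSettings(
        url=env["UNLEASH_URL"],                  # required: raises KeyError if unset
        app_name=env["UNLEASH_APP_NAME"],        # required
        instance_id=env["UNLEASH_INSTANCE_ID"],  # required
        refresh_interval=int(env.get("UNLEASH_REFRESH_INTERVAL", "15")),
    )
```

Each data centre can then set its own values in its deployment configuration without any code change. With GitLab as the backend, the application name typically identifies the environment whose flag strategy applies.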

Remove flags after fulfilling their purpose

Old flags add complexity and cognitive load. Schedule flag removal as part of the feature rollout plan.

Provide graceful degradation

Always use fallback_value or check client.is_initialized in case the Unleash server is unreachable.
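A defensive wrapper is one way to apply this consistently. The sketch below assumes a client object that exposes an `is_initialized` attribute and an `is_enabled(name)` method; the wrapper name `safe_is_enabled` is hypothetical.

```python
def safe_is_enabled(client, flag_name: str, fallback_value: bool = False) -> bool:
    """Return the flag state, or fallback_value if the client cannot answer."""
    # No client, or client never completed initialisation: use the fallback.
    if client is None or not getattr(client, "is_initialized", False):
        return fallback_value
    try:
        return bool(client.is_enabled(flag_name))
    except Exception:
        # Any failure inside the client also results in the fallback.
        return fallback_value
```

Choosing `False` as the default fallback keeps the existing code path active when the flag state is unknown, which is usually the safer choice for a feature still being rolled out.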

Log flag decisions

Log when behaviour changes based on flags to aid debugging.
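A small helper can keep these log lines uniform. This is a sketch using the standard `logging` module; the logger name `feature_flags` and the helper `log_flag_decision` are hypothetical.

```python
import logging

logger = logging.getLogger("feature_flags")


def log_flag_decision(flag_name: str, enabled: bool) -> bool:
    """Log which state a flag evaluated to and return the decision unchanged."""
    logger.info("feature flag %r evaluated to %s", flag_name, enabled)
    return enabled
```

Because the helper returns its input, it can wrap an existing check directly, for example `if log_flag_decision("new-x-feature", ff.is_enabled("new-x-feature")):`.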

Secure API tokens

Manage tokens using Kubernetes secrets or Vault. Never hardcode or commit tokens to version control.
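On the application side, the token can then be read from whatever the secret is exposed as, typically an environment variable or a mounted file. The following sketch supports both; the variable names `UNLEASH_API_TOKEN` and `UNLEASH_API_TOKEN_FILE` and the helper `read_api_token` are assumptions for illustration.

```python
import os
from pathlib import Path


def read_api_token(
    env_var: str = "UNLEASH_API_TOKEN",
    file_var: str = "UNLEASH_API_TOKEN_FILE",
) -> str:
    """Return the API token from a mounted secret file or an environment variable."""
    token_path = os.environ.get(file_var)
    if token_path:
        # A secret mounted as a file takes precedence.
        return Path(token_path).read_text(encoding="utf-8").strip()
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"no API token found in {file_var} or {env_var}")
    return token
```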

Feature flag lifecycle

A typical lifecycle for a feature controlled by runtime flags using the Unleash client and GitLab backend:

[Figure: Feature flag lifecycle]

The lifecycle has six stages:

1. Local development

A developer introduces new functionality in the Component X repository, wrapping the new and old code paths in a conditional controlled by the new-x-feature flag:

if ff.is_enabled('new-x-feature', fallback=True):
    new_logic()
else:
    old_logic()

During local development, use:

  • A mock client

  • Cached values

  • A local Unleash instance
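A mock client can be as small as an in-memory dictionary. The class below is a hypothetical stand-in, not the SKAO client; it only assumes the `is_enabled(name, fallback=...)` call shape used in the snippet above.

```python
class MockFlagClient:
    """In-memory stand-in for a feature flag client (hypothetical, local use only)."""

    def __init__(self, flags=None):
        # Copy so later changes to the caller's dict do not leak in.
        self._flags = dict(flags or {})

    def is_enabled(self, name, fallback=False):
        # Unknown flags resolve to the caller-supplied fallback.
        return self._flags.get(name, fallback)


ff = MockFlagClient({"new-x-feature": True})
print(ff.is_enabled("new-x-feature"))  # True
print(ff.is_enabled("unknown-flag"))   # False
```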

Define the flag initially in the repository’s GitLab project settings for team control during development.

2. CI/CD pipeline

A git push triggers the CI/CD pipeline, which builds, tests, and deploys the application to subsequent environments.

3. Integration testing

Automated tests run against the integrated code: unit tests, component tests, and integration tests.

Tests fetch flag configurations from GitLab Feature Flags defined in the repository’s project. For CI/test environments, configure the flag to ON to test the new code path.
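For unit-level tests that should not depend on a flag server, both code paths can be exercised by injecting a stub, as in this sketch. All names here (`run_component`, `StubFlags`, the test functions) are hypothetical, and the stub replaces the GitLab-backed client only for the purpose of the test.

```python
class StubFlags:
    """Test double that returns a fixed state for every flag."""

    def __init__(self, state: bool):
        self._state = state

    def is_enabled(self, name, fallback=False):
        return self._state


def run_component(ff) -> str:
    """Stand-in for Component X logic guarded by the new-x-feature flag."""
    if ff.is_enabled("new-x-feature", fallback=False):
        return "new"
    return "old"


def test_new_path_when_flag_on():
    assert run_component(StubFlags(True)) == "new"


def test_old_path_when_flag_off():
    assert run_component(StubFlags(False)) == "old"
```

Keeping a test for the old path until cleanup helps ensure the flag can still be switched off safely.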

4. Staging environment

The CI/CD pipeline deploys to a persistent staging environment.

An Unleash Proxy service runs within staging, periodically fetching flag configurations from GitLab. The application checks flag status by querying the local proxy.

Configure the flag to ON for the staging environment. Strategies might enable it for specific subsystems or user groups.

AIV, Cloud, or other testers can verify the new feature in staging.

5. Production environment

After successful staging validation, deploy to production.

Manage the flag strategy for the production environment in GitLab:

  • Toggle OFF/ON as needed

  • Use Gradual Rollout — enable for specific subsystems, user percentages, or user IDs

  • Eventually enable for 100% of users
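To illustrate how a percentage-based rollout can give each user a stable answer, the sketch below hashes the flag name and user ID into one of 100 buckets. This is not Unleash's or GitLab's own algorithm; SHA-256 is used here only because it ships with the Python standard library, and the function `in_rollout` is hypothetical.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """Return True if user_id falls within the first `percentage` of 100 buckets."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the hash to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage
```

Because the bucket depends only on the flag name and user ID, a user who is included at 30% stays included as the percentage is raised towards 100%.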

Once stable, proceed to cleanup.

6. Cleanup

After full rollout and stability confirmation:

  1. Remove flag logic from Component X code, keeping only the new path

  2. Deploy the cleaned code through CI/CD

  3. Delete the flag definition from GitLab

  4. Remove related tests for the old code path

JIRA organisation

For complex feature rollouts, create JIRA tickets to track:

  • Flag creation and initial configuration

  • Environment-by-environment rollout

  • Cleanup and flag removal

  • Documentation updates

Link these tickets to the feature’s main story or epic for traceability.