Using AWS ECS Service Connect and Service Discovery Together

A major consideration when deploying services in AWS Elastic Container Service (ECS) is how services connect to one another. Traditionally AWS has provided ECS Service Discovery by leveraging AWS Cloud Map, a DNS-based service lookup. Because of well-known downsides to this general approach (primarily latency in DNS updates), AWS now provides Service Connect, a simple-to-use service mesh that uses a sidecar to proxy requests. But Service Connect doesn't update the DNS, making it difficult to access your services by name from outside ECS (e.g. via the command prompt in Cloud9). It turns out that ECS Service Connect and ECS Service Discovery can be used together if you make certain configuration adjustments.

AWS Cloud Map allows you to set up a namespace and then assign names within that namespace to individual services. The names can be A) privately discoverable only by API calls, B) discoverable by API calls or by DNS privately within the VPC, or C) discoverable by public DNS and by API calls. ECS can interact with Cloud Map to automatically register services. All this is referred to in AWS ECS as service discovery.
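
As a rough illustration, these three options correspond to three Cloud Map namespace types in CloudFormation. The logical IDs and the names example-api-only and example.com below are hypothetical, and the sketch assumes a VPC resource named VPC is defined elsewhere in the template.

```yaml
Resources:
  # A) Discoverable only through Cloud Map API calls (no DNS records).
  ApiOnlyNamespace:
    Type: AWS::ServiceDiscovery::HttpNamespace
    Properties:
      Name: example-api-only   # hypothetical name

  # B) Discoverable through API calls and private DNS inside the VPC.
  PrivateNamespace:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties:
      Name: example.internal
      Vpc: !Ref VPC            # assumes a VPC resource named VPC

  # C) Discoverable through API calls and public DNS.
  PublicNamespace:
    Type: AWS::ServiceDiscovery::PublicDnsNamespace
    Properties:
      Name: example.com        # hypothetical public domain
```

Only option B is used in the rest of this article.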

A well-known problem with DNS-based service discovery is that the DNS entries may be stale. Using DNS-based lookup with Cloud Map means that when a service goes down, it may take a while (based on TTL settings) for your client to realize that it should get a new IP address. Even worse, your client library may keep the same IP address cached even longer, and/or your client's retry logic may keep trying the same IP address upon failure.

One interim solution was AWS App Mesh, which allowed you to run the Envoy proxy as a sidecar, creating a service mesh like Istio. Envoy would keep track of the service IP addresses, proxy any connections coming out of the first service, and route them correctly to the other service. The first service would never need to know the true IP address of the second one; the Envoy sidecar proxy would sort it out.

AWS ECS now has a newer product called Service Connect, which leverages Cloud Map but also automatically and transparently adds a sidecar to your ECS deployment. This brings the standard benefits of an Envoy-style service mesh, except that in this case Service Connect manages the sidecar proxy for you. The benefit that ECS Service Connect brings over ECS Service Discovery using plain Cloud Map is faster failover when service instances go down. (See "Migrate existing Amazon ECS services from service discovery to Amazon ECS Service Connect" for more discussion of these benefits.)

Because Service Connect doesn't rely on DNS, it doesn't bother registering even private DNS entries, and instead registers endpoints with Cloud Map that are privately discoverable only by API calls. (These API calls are used by the Service Connect sidecar proxy, for example.) There seems to be no way to tell Service Connect to register the service names in the DNS as well.

However, you might still want private DNS access, if for no other reason than to be able to go to Cloud9 and issue e.g. a curl my-service.example.internal/api/test without needing to look up the IP address of one of the my-service instances. You cannot use both Service Connect and Service Discovery at once for the same service name, because both will try to register the same service name with Cloud Map. For example, using CloudFormation you might try to define an AWS::ServiceDiscovery::PrivateDnsNamespace and an AWS::ServiceDiscovery::Service (using the same name my-service) and even associate the latter with your ECS service using ServiceRegistries. But then when you try to deploy your CloudFormation stack, you will get an error:

Invalid request provided: CreateService error: Service already exists.

Internally, to get Service Connect to work, ECS creates its own AWS::ServiceDiscovery::Service, at which point it sees that your CloudFormation stack has already created an AWS::ServiceDiscovery::Service with the same name and generates an error. But if you don't create an AWS::ServiceDiscovery::Service yourself, the one that ECS creates won't provide a DNS entry for my-service.
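
To make the conflict concrete, here is a sketch of the kind of template that, based on the behavior described above, would run into this error. It is an anti-example, not something to deploy. The logical IDs match the ones used later in this article, and most properties are left out for brevity.

```yaml
  # Anti-example: both registrations claim the name "my-service".
  ServiceDiscovery:
    Type: AWS::ServiceDiscovery::Service
    Properties:
      Name: my-service                 # registered by your own stack
      NamespaceId: !GetAtt ServiceDiscoveryNamespace.Id
      # DnsConfig omitted for brevity

  Service:
    Type: AWS::ECS::Service
    Properties:
      # other service properties omitted for brevity
      ServiceRegistries:
        - RegistryArn: !GetAtt ServiceDiscovery.Arn
      ServiceConnectConfiguration:
        Enabled: true
        Services:
          - PortName: my-service       # ECS tries to register "my-service" again
```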

The solution to using both ECS Service Connect and ECS Service Discovery together is to use two different service names! If you define your task definition port mappings and service ServiceConnectConfiguration using a service/port discovery name such as my-service-connect, having specified the ECS cluster ServiceConnectDefaults with a namespace of example.internal, your other ECS services can connect to my-service-connect.example.internal even though there are no DNS entries for that name.

You can additionally define an AWS::ServiceDiscovery::PrivateDnsNamespace of example.internal (which Service Connect will use instead of creating a new one) along with an AWS::ServiceDiscovery::Service using a different service name such as my-service. Associate this service discovery with the ECS service using ServiceRegistries, and you will have the best of both worlds! ECS services can communicate using my-service-connect.example.internal, but you can still go to Cloud9 and connect to my-service.example.internal (note the different name) via the DNS, as ECS will ensure that both are registered. It's not guaranteed that both approaches will refer to the same service instance at any particular time, and if a service instance goes down, the DNS entry for my-service.example.internal may be stale until the new DNS value is propagated, but for ad-hoc tests in Cloud9 that hardly matters.

Configuring ECS Service Discovery using CloudFormation

Let's look at how that would work using CloudFormation. First define a private DNS namespace to use for service discovery, referencing the appropriate VPC. (In some use cases you might even want to define an AWS::ServiceDiscovery::PublicDnsNamespace.) This namespace functions as the "domain" in the DNS.

  ServiceDiscoveryNamespace:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties: 
      Name: example.internal
      Vpc: !Ref VPC

Define service discovery for each service. Note that we are giving this service a DNS name of my-service.example.internal.

  ServiceDiscovery:
    Type: AWS::ServiceDiscovery::Service
    Properties: 
      Name: my-service
      NamespaceId: !GetAtt ServiceDiscoveryNamespace.Id
      DnsConfig: 
        DnsRecords:
          - Type: A
            TTL: 60
          - Type: AAAA
            TTL: 60
          - Type: SRV
            TTL: 60
        RoutingPolicy: WEIGHTED

Now in your ECS service definition, associate this service discovery with the service. This example assumes you have configured ServicePort appropriately for your service, e.g. 8080.

  Service:
    Type: AWS::ECS::Service
    Properties:
      …
      ServiceRegistries:
        - RegistryArn: !GetAtt ServiceDiscovery.Arn
          Port: !Ref ServicePort

Now ECS will automatically register your service with the private DNS for your VPC as my-service.example.internal! This is very handy when connecting to your services manually for testing.

Configuring ECS Service Connect using CloudFormation

For the services themselves to connect to each other, configure Service Connect for each service receiving connections. To facilitate the configuration, set the Service Connect defaults for the entire ECS cluster. We'll refer to the same namespace defined above to continue using example.internal.

  ServiceCluster:
    Type: AWS::ECS::Cluster
    Properties:
      …
      ServiceConnectDefaults:
        Namespace: !GetAtt ServiceDiscoveryNamespace.Arn

In the task container definition, set up the service port using the name to use for Service Connect. Importantly here we'll use my-service-connect rather than my-service.

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      …
      ContainerDefinitions:
        - …
          PortMappings:
            - Name: my-service-connect
              ContainerPort: !Ref ServicePort
              Protocol: tcp
              AppProtocol: http

Finally, in the service definition itself, add a configuration for Service Connect using the same name as you did in the task container definition port mapping:

  Service:
    Type: AWS::ECS::Service
    Properties:
      …
      ServiceRegistries:
        … # see above to configure Service Discovery
      ServiceConnectConfiguration:
        Enabled: true
        Services:
          - PortName: my-service-connect
            ClientAliases:
              - Port: !Ref ServicePort
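
The configuration above relies on defaults for the names. As a sketch of what those defaults amount to, assuming the ECS documentation's description of DiscoveryName and DnsName, the same Service Connect section could be written out explicitly like this:

```yaml
      ServiceConnectConfiguration:
        Enabled: true
        Services:
          - PortName: my-service-connect
            # Name registered in Cloud Map; defaults to the port name if omitted.
            DiscoveryName: my-service-connect
            ClientAliases:
              - Port: !Ref ServicePort
                # Name other services connect to; defaults to
                # <discovery name>.<namespace> if omitted.
                DnsName: my-service-connect.example.internal
```

Spelling the names out is optional, but it makes it easier to see that my-service-connect never collides with the my-service name used for Service Discovery.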

Now ECS will keep both Cloud Map and the private DNS updated with the service IP address, albeit with different names. Another ECS service can request a connection to my-service-connect.example.internal, and the ECS Service Connect sidecar proxy will automatically route it to a live service instance. If you want to spin up Cloud9 and use the terminal to connect manually to the service, you can use my-service.example.internal, which will find the service based upon DNS. Just remember that the DNS record might be a little stale: if connecting to my-service.example.internal doesn't work, try again, in case a service instance has gone down and been replaced by another while your client was using a slightly out-of-date IP address.
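
For the calling side, a short sketch may help. Per the ECS documentation, a service that only makes outbound calls still needs Service Connect enabled so that it gets its own sidecar proxy; it just doesn't list any Services of its own. The logical IDs ClientService and ClientTaskDefinition below are hypothetical, and the namespace is assumed to come from the cluster's ServiceConnectDefaults.

```yaml
  # Hypothetical second ECS service that only calls my-service-connect.
  ClientService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ServiceCluster
      TaskDefinition: !Ref ClientTaskDefinition   # hypothetical task definition
      # other service properties omitted for brevity
      ServiceConnectConfiguration:
        # Enabled with no Services list: this service joins the namespace
        # as a client only and can reach my-service-connect.example.internal.
        Enabled: true
```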