Automation of ServiceNow Data Extraction on AWS Cloud

Nanthan Rasiah
May 16, 2023

To gain complete visibility into the business, organisations need to replicate data from multiple systems, including IT service management (ITSM) platforms such as ServiceNow, into a data lake to enable business intelligence, data analytics, and machine learning.

AWS provides the building blocks to build and run data extraction and transformation services and to implement a secure, flexible, and cost-effective data lake. Many companies run these services on AWS serverless compute engines such as AWS Fargate. Extracting data from ServiceNow without impacting its performance is a challenging task. The purpose of this post is to guide you through creating a generic AWS Batch service, backed by configuration stored in DynamoDB, that extracts data from any ServiceNow table in CSV format and stores it in an Amazon S3 bucket. Amazon S3 provides the foundation for building the data lake on AWS.

ServiceNow provides REST APIs to retrieve data from its tables. The following architecture diagram illustrates the automation of ServiceNow data extraction on the AWS cloud. The extraction service can be scheduled to run at a specific time or on demand. It loads configuration data from DynamoDB, extracts the data from ServiceNow according to that configuration, transforms it to CSV format, and stores it in an S3 bucket. Through configuration, you can instruct the service to extract only the data you need.
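
For on-demand runs, the job can be submitted directly through the AWS Batch API. Here is a minimal boto3 sketch; the job name and job queue are placeholders, and the job definition name assumes the one created later in this post.

import boto3

# Illustrative only: submit the extraction job on demand.
# The job name and job queue are placeholders for your own resources.
batch = boto3.client('batch')

response = batch.submit_job(
    jobName='snow-core-company-extract',
    jobQueue='snow-data-ingestion-job-queue',
    jobDefinition='snow-data-ingestion-batch-service-job-definition',
)
print(response['jobId'])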

The architecture uses the following AWS resources:

Amazon ECR: Amazon Elastic Container Registry (ECR) is a fully managed container registry used to store the extraction service container image.

AWS Batch: AWS Batch is a fully managed batch computing service, which allows running the extraction service container image on the serverless compute engine, AWS Fargate.

Amazon S3: It is an object storage service that stores data as objects within buckets and provides the foundation to create a data lake.

Amazon DynamoDB: It is a fully managed, serverless, key-value NoSQL database that lets you store configuration data in JSON format.

Amazon EventBridge Scheduler: It is a serverless scheduler that lets you run the AWS Batch job at a specified time (see the sketch below).
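
For scheduled runs, a minimal boto3 sketch with EventBridge Scheduler is shown below. The schedule name, role ARN, job queue, and input payload are placeholders; the universal-target input for Batch SubmitJob should be verified against the EventBridge Scheduler documentation.

import json

import boto3

scheduler = boto3.client('scheduler')

# Illustrative only: run the Batch job nightly at 02:00 UTC via the
# Batch SubmitJob universal target. Names, ARNs, and the Input payload
# are placeholders; check the EventBridge Scheduler docs for the exact
# request parameter format expected by the universal target.
scheduler.create_schedule(
    Name='snow-core-company-nightly-extract',
    ScheduleExpression='cron(0 2 * * ? *)',
    FlexibleTimeWindow={'Mode': 'OFF'},
    Target={
        'Arn': 'arn:aws:scheduler:::aws-sdk:batch:submitJob',
        'RoleArn': 'arn:aws:iam::123456789012:role/snow-scheduler-role',
        'Input': json.dumps({
            'JobName': 'snow-core-company-extract',
            'JobQueue': 'snow-data-ingestion-job-queue',
            'JobDefinition': 'snow-data-ingestion-batch-service-job-definition',
        }),
    },
)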

Configuration Data

Configuration data for each ServiceNow table is stored as an item in a DynamoDB table.

The service reads the configuration item, which contains the table name, the list of fields to be extracted, filters, pagination details, and so on, and extracts the data from ServiceNow accordingly.

For example, to extract data from the core_company table, the configuration item is stored in DynamoDB as follows. Here, you define the table name, the list of field names, the filters, and the page limit.

{
  "tableName": {
    "S": "core_company"
  },
  "fieldList": {
    "S": "name,customer,u_vip,vendor,u_third_party,u_cidn,u_abbreviation,u_active,u_restricted,sys_id,sys_created_on,sys_updated_on,u_cid"
  },
  "filters": {
    "SS": [
      "customer=true",
      "u_active=true"
    ]
  },
  "pageLimit": {
    "N": "1000"
  }
}
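
Because the item is stored in DynamoDB's attribute-value format, it needs to be parsed before use. Here is a minimal boto3 sketch of reading and parsing it; the configuration table name (snow_extraction_config) and the assumption that tableName is the partition key are illustrative only.

import boto3

# Illustrative sketch: the DynamoDB table name and the 'tableName' partition
# key are assumptions, not part of the original configuration.
dynamodb = boto3.client('dynamodb')

def load_table_config(config_table, snow_table):
    item = dynamodb.get_item(
        TableName=config_table,
        Key={'tableName': {'S': snow_table}}
    )['Item']
    return {
        'table_name': item['tableName']['S'],
        'fields_list': item['fieldList']['S'].split(','),
        'filters': item['filters']['SS'],          # e.g. ['customer=true', 'u_active=true']
        'page_limit': int(item['pageLimit']['N']),
    }

config = load_table_config('snow_extraction_config', 'core_company')
# ServiceNow joins query conditions with '^' in sysparm_query,
# e.g. 'customer=true^u_active=true'
sys_parm_query = '^'.join(config['filters'])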

Python code snippet

Here is the Python code snippet to extract data from ServiceNow. You need to pass the ServiceNow connection parameters (URL, username, and password) and the configuration data stored in DynamoDB. You can use the boto3 client to read the configuration data, as shown above. This snippet provides a generic API that lets you extract data from any ServiceNow table; you only need to create the configuration item for that table in DynamoDB. No code change is required.

import logging

import requests
from requests.auth import HTTPBasicAuth

log = logging.getLogger(__name__)


def extract_snow_data(snow_username, snow_password, snow_url, table_name, fields_list, page_limit, sys_parm_query):
    """
    Extract the SNOW data for a given table and fields.
    @param snow_username: SNOW username for basic authentication
    @param snow_password: SNOW password for basic authentication
    @param snow_url: SNOW base URL
    @param table_name: Table name to extract data from
    @param fields_list: List of fields to extract from the table
    @param page_limit: Page limit for the pagination
    @param sys_parm_query: Additional filters
    @return: Return the extracted JSON data as a list of result pages
    """
    offset = 0
    new_result = []
    auth = HTTPBasicAuth(snow_username, snow_password)
    try:
        fields = ",".join(fields_list)
        while page_limit > 0:
            url = f'{snow_url}/api/now/table/{table_name}?sysparm_display_value=all&sysparm_fields={fields}' \
                  f'&sysparm_limit={str(page_limit)}&sysparm_offset={str(offset)}&sysparm_query={sys_parm_query}'

            log.info(f'Calling SNOW API with pagination: {url}.')
            res = requests.get(url, auth=auth)
            res.raise_for_status()
            offset = offset + page_limit
            result = res.json()['result']
            if len(result) == 0:
                break
            else:
                new_result.append(result)
        return new_result
    except requests.exceptions.HTTPError as err:
        log.error(f'HTTP error occurred while extracting data from SNOW for table: {table_name}, {err}')
        raise
    except requests.exceptions.ConnectionError as err:
        log.error(f'Connection failed for table: {table_name}, {err}')
        raise
    except requests.exceptions.Timeout as err:
        log.error(f'Timeout while extracting data from table: {table_name}, {err}')
        raise
    except requests.exceptions.RequestException as err:
        log.error(f'Request failed for table: {table_name}, {err}')
        raise
    except Exception as err:
        log.error(f'Error occurred while extracting data from SNOW for table: {table_name}. {err}')
        raise
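
For example, using the configuration loaded by the load_table_config sketch above, a call could look like this (the instance URL and credentials are placeholders):

# Hypothetical invocation using the core_company configuration shown earlier.
result = extract_snow_data(
    snow_username='svc_extract_user',
    snow_password='********',
    snow_url='https://your-instance.service-now.com',
    table_name=config['table_name'],
    fields_list=config['fields_list'],
    page_limit=config['page_limit'],
    sys_parm_query=sys_parm_query,   # e.g. 'customer=true^u_active=true'
)
# 'result' is a list of pages; each page is a list of records keyed by field name.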

Here is the code snippet to convert JSON data to CSV and write to a temp file.

import csv


def create_csv_file(result, table_name, fields_list):
    """
    Convert SNOW table JSON data to CSV and write it to a tmp file.
    @param result: SNOW table JSON data (list of result pages)
    @param table_name: SNOW table name
    @param fields_list: List of fields to create the CSV header
    """
    data_file = open('/tmp/' + table_name + '.csv', 'w')
    try:
        csv_writer = csv.writer(data_file)
        count = 0
        for data in result:
            for field in data:
                new_field = sort_fields(field, fields_list)
                if count == 0:
                    # Writing headers of CSV file
                    header = new_field.keys()
                    csv_writer.writerow(header)
                    count += 1

                # Writing data of CSV file
                csv_writer.writerow(new_field.values())
    except Exception as ex:
        log.error(f'Error occurred while converting data to CSV for table: {table_name}. {ex}')
        raise
    finally:
        data_file.close()


def sort_fields(obj, fields_list):
    """
    Rearrange the data as in the fields list order
    @param obj: data row
    @param fields_list: List of fields to be ingested
    @return: Rearranged data row
    """
    new_obj = {}
    for item in fields_list:
        item = item.strip()
        new_obj[item + '_display'] = obj[item]['display_value']
        new_obj[item + '_value'] = obj[item]['value']
    return new_obj
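
Because the extraction call sets sysparm_display_value=all, ServiceNow returns each field as an object with both a display_value and a value, and sort_fields flattens that into two CSV columns per field. A small illustrative example with made-up field data:

# Hypothetical ServiceNow record as returned with sysparm_display_value=all
row = {
    'name': {'display_value': 'Acme Corp', 'value': 'Acme Corp'},
    'customer': {'display_value': 'true', 'value': 'true'},
}

print(sort_fields(row, ['name', 'customer']))
# {'name_display': 'Acme Corp', 'name_value': 'Acme Corp',
#  'customer_display': 'true', 'customer_value': 'true'}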

Here is the code to upload CSV data stored in the temp file into an S3 bucket.

from datetime import datetime


def upload_csv_file_to_s3(client, file_name, s3_bucket, s3_prefix):
    """
    Upload the CSV file to the S3 bucket
    @param client: boto3 S3 resource
    @param file_name: File name
    @param s3_bucket: S3 bucket name to upload the CSV file to
    @param s3_prefix: S3 prefix
    """
    try:
        time_now = datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
        client.meta.client.upload_file('/tmp/' + file_name + '.csv', s3_bucket,
                                       s3_prefix + '/' + file_name + '_' + time_now + '.csv')
    except Exception as ex:
        log.error(f'Error occurred while uploading the file to S3. {ex}')
        raise
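
Putting the pieces together, a hypothetical entry point (the table_data_extractor.py script referenced in the job definition below) could orchestrate the helpers above. The environment variable names, the secret handling, and the load_table_config helper from the earlier sketch are assumptions for illustration:

import os

import boto3

# Illustrative entry point tying the helpers together. Environment variable
# names and 'snow_extraction_config' are placeholders, not part of the post.
def main():
    snow_url = os.environ['SNOW_URL']
    snow_username = os.environ['SNOW_USERNAME']
    snow_password = os.environ['SNOW_PASSWORD']   # better sourced from Secrets Manager
    snow_table = os.environ['SNOW_TABLE']         # e.g. 'core_company'

    config = load_table_config('snow_extraction_config', snow_table)
    sys_parm_query = '^'.join(config['filters'])

    result = extract_snow_data(snow_username, snow_password, snow_url,
                               config['table_name'], config['fields_list'],
                               config['page_limit'], sys_parm_query)
    create_csv_file(result, config['table_name'], config['fields_list'])

    s3 = boto3.resource('s3')
    upload_csv_file_to_s3(s3, config['table_name'],
                          os.environ['S3_BUCKET'], os.environ['S3_PREFIX'])


if __name__ == '__main__':
    main()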

You can include the above code snippets in your code, create the container image, and publish it to Amazon ECR. Then, create the AWS Batch job definition using the following CloudFormation resource.

  SNOWJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      Type: container
      PlatformCapabilities:
        - FARGATE
      JobDefinitionName: snow-data-ingestion-batch-service-job-definition
      Timeout:
        AttemptDurationSeconds: <Timeout In Seconds>
      ContainerProperties:
        Image: <ECR image>
        JobRoleArn: <Role ARN>
        ExecutionRoleArn: <Execution Role Arn>
        Command:
          - /table_data_extractor.py #script name

The extracted data is stored in the S3 bucket as CSV files, which can be loaded into any data source, for example with a Lambda function triggered by the S3 upload.
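
Here is a minimal sketch of such a Lambda handler; the actual loading step depends on your target data store and is left as a placeholder:

import csv
import io

import boto3

s3 = boto3.client('s3')

# Illustrative handler for an S3 ObjectCreated event notification.
def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        rows = list(csv.DictReader(io.StringIO(body)))

        # Placeholder: load 'rows' into your data store of choice.
        print(f'Read {len(rows)} rows from s3://{bucket}/{key}')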

This post describes how to automate ServiceNow data extraction on the AWS cloud and provides sample code snippets for ease of implementation.
