DevOps

Release Process

  • Get main branch to desired state and commit

Note

If you have made changes to the requirements.in file, you need to run make with the build-requirements target (i.e. “make build-requirements”) in the docker directory and commit the resulting requirements.txt file before labeling.
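A hedged sketch of that sequence (the location of the generated requirements.txt is an assumption; adjust the path as needed):

cd docker
make build-requirements
git add requirements.txt    # path to the generated file is an assumption
git commit -m "Rebuild requirements.txt"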

  • Set environment variable RELNO to the desired value

    • Releases are numbered x.y.z (i.e. no “v” prefix)

    • The osnver.sh script is helpful to show what’s currently running

    Warning

    The RELNO environment variable must be set for the git and make commands in the following steps to work correctly. Make sure that it is set to the right value before proceeding.
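    For example (the release number shown is just a placeholder):

    export RELNO=1.2.3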

  • Label the release

    git tag -f ${RELNO}
    git push origin main --tags
    
    • If you make a mistake and want to delete the tag

      git tag -d ${RELNO}
      git push origin :refs/tags/${RELNO}
      
  • Build and push the new release image

    cd docker
    make push-release
    

Note

The dockerfile pulls coldfront from the UBCCR repo during the image build. When it becomes necessary to update the version of coldfront used in the build, change the CF_RELEASE docker environment variable to the desired release.

  • Delete the osncoldfront pod in the osncoldfront namespace

  • Verify the pod restarts without an error

  • Verify the running instance version tag information matches RELNO using the osnver.sh script

    Note

    The running production instance will have two version tags, the “release” tag and a tag of the form x.y.z

Image Building

Building the image is controlled via a makefile in the docker directory. The makefile supports several targets, the most important of which is push-release. The push-release target reads the environment variable RELNO, fetches that version of the osn-coldfront repo from GitHub, builds the image using docker (tagging it with “release” and with the value of RELNO), then pushes that image to Dockerhub. There are three secrets that the Makefile reads from prod/osn/osncoldfront (see the secrets discussion below) when satisfying the push-release target:

GITHUB_CULBERT_DOCKER_PAT (FIX NAME)

This is a GitHub personal access token with read-only repository access that is used to pull the RELNO-tagged version from the repo when building

DOCKER_USER

This is the name of a user who is allowed to push to the mghpcc Dockerhub account.

DOCKER_TOKEN

This is a credential used to access Dockerhub in order to push the image after building it.

Transaction and Bucket Processing

Three Django management commands combine to implement AMIE transaction and bucket processing. The commands are normally run as kubernetes cronjobs.

AMIE: amie_fetch_packets

The amie_fetch_packets command retrieves packets from the AMIE system and creates corresponding AMIETransaction objects in the Django database. Multiple packets are exchanged between a resource provider (e.g. OSN) and AMIE while processing a transaction. The amie_fetch_packets command is responsible for creating the AMIETransaction object when a transaction starts and for marking it complete when both the osncoldfront and AMIE systems agree that processing has completed successfully.

The AMIETransaction object created by amie_fetch_packets contains the original AMIE json packet which is used by commands that process the transactions.

AMIE: amie_process_packets

The amie_process_packets command inspects all Django AMIETransaction objects and dispatches non-completed objects to type-specific processing modules based on the type of AMIE request (e.g. RPC, RAC, RUM). Processing modules act on the contents of the AMIE json packet that is stored in the transaction and set the state of the transaction when processing completes. Most transactions require multiple passes to complete. The amie_process_packets command will continue to dispatch a transaction object for processing on each invocation until the transaction succeeds or fails.

The amie_process_packets command can take two arguments:

  • --count <n> - If the count argument is passed, the command will only process <n> transactions before stopping. The default is 10.

  • --txnid <id> - If a txnid is specified, the command will only process transactions with the specified id
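For example (run from inside the application container as described under Manually Running Django Commands below; the values are placeholders):

python manage.py amie_process_packets --count 5
python manage.py amie_process_packets --txnid <txnid>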

OSN: provision_buckets

The provision_buckets command is responsible for creating buckets on the OSN network in response to OSN Bucket resource allocations in coldfront. The script processes all active OSN Bucket resource allocations that have changed attributes. The command keeps track of which attributes have changed by adding a timestamp attribute to the allocation each time the command processes the allocation. Any attribute with a more recent modification time than the timestamp attribute has been changed. The command can take the following arguments:

  • --dryrun: The dryrun option prints out the actions that would be taken when the script runs but does not actually provision any buckets.

  • --count <n>: Only process <n> allocations.

  • --bucket <name>: Only process the bucket with the specified bucket name

Note

All attributes on a new allocation are considered “changed” and, therefore, all new allocations are processed.
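For example, a dry run scoped to a single bucket (run from inside the application container; the bucket name is a placeholder):

python manage.py provision_buckets --dryrun --bucket <bucket-name>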

Canned Policies

There are two policies available for buckets, depending on whether the anonymous bucket attribute is set: the default policy and the anonymous policy. The default policy allows read/write access to the “datamanager” user and read-only access to the “readonly” user.

Default Policy

The Default policy for a bucket named “rr1” is shown below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "bucket read-write policy",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam:::user/rr1_datamanager"
        ]
      },
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::rr1/*",
        "arn:aws:s3:::rr1"
      ]
    },
    {
      "Sid": "bucket read-only policy",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam:::user/rr1_readonly"
        ]
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::rr1/*",
        "arn:aws:s3:::rr1"
      ]
    }
  ]
}

Anonymous Policy

The anonymous policy modifies the default policy by adding the fragment shown below to the policy’s Statement list attribute, where {bucket_name} is replaced by the bucket’s name.

{
    "Sid": "bucket anon-read-policy",
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": [
        "s3:GetObject",
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::{bucket_name}/*",
        "arn:aws:s3:::{bucket_name}"
    ]
}

Custom Policies

Custom policies are discouraged but they are sometimes necessary. The general process to create a custom policy is to download (get) the current policy, modify it then upload (put) it back to the bucket. The commands shown below assume that an AWS profile exists that has the credentials of osnadmin on the pod at SITENAME.

aws s3api get-bucket-policy --profile osnadmin --bucket rr1 --endpoint-url https://SITENAME.osn.xsede.org | jq '.Policy | fromjson' > rr1pol.json
aws s3api put-bucket-policy --profile osnadmin --bucket rr1 --endpoint-url https://SITENAME.osn.xsede.org --policy file://rr1pol_changed.json

Manual Processing

Packet and bucket processing normally runs automatically via kubernetes cronjobs; each of the processing commands is run once per minute. There may be times when it is desirable or necessary to stop automatic processing to deal with an issue (e.g. a bug in the code or bad data); in this case one can suspend the cronjobs and run the processing commands manually.

Suspend and Resume Cronjob Processing

Use the suspendcron.sh utility script to suspend and resume cronjobs. With no flags, the script will suspend all the OSN cronjobs in the production (osncoldfront) namespace. The script also accepts the following flags:

  • -l - List all the job names and their current status

  • -r - Resume jobs instead of suspending

  • -j - Suspend or resume the specified job name

  • -n - Specify a namespace other than osncoldfront (e.g. osncoldfrontdev for development)
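For example (the job names here are assumed to match the cronjob names listed in the Azure Insights section below, and the flag combinations are inferred from the descriptions above):

./suspendcron.sh -l                                # list jobs and their status in osncoldfront
./suspendcron.sh -j osncoldfront-fetch-cronjob     # suspend just the fetch cronjob
./suspendcron.sh -r -j osncoldfront-fetch-cronjob  # resume it
./suspendcron.sh -n osncoldfrontdev                # suspend all jobs in the dev namespace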

Manually Running Django Commands

You must run the django commands in a container that is running django. The most straightforward way to do this is to exec into the osncoldfront pod; the default container in that pod is the django application.

kubectl -n osncoldfront exec -it deploy/osncoldfront -- bash

Once in the container, you execute the OSN commands just like any other django command, e.g.:

python manage.py amie_fetch_packets

AMIE Transaction Processing

For every transaction, there are a series of packets exchanged and a series of processing steps that are performed. Each run of a processing script exchanges one packet (amie_fetch_packets) or processes one step (amie_process_packets) for each active transaction. In general, fetching and processing proceeds in the following manner:

  • Fetch - Fetch sees a new AMIE request packet in the AMIE request queue (e.g. Request Project Create (RPC)), creates a transaction object in the DB, and marks it as “received”

  • Process - Process sees the new transaction in the “received” state and dispatches to the appropriate handler

    • Handle - Handler processes transaction and marks as “success”

  • Process - Process command runs again, dispatches again to handler (because transaction is not yet marked as “complete”)

    • Handle - Handler sees the “success” status, sends a notification packet to AMIE and marks the transaction “notify”

    • AMIE receives notification and takes request packet out of its request queue

  • Fetch - Fetch command runs, sees the transaction in the “notify” state and checks the AMIE request queue; if the transaction is no longer there, fetch marks the transaction “completed”

For some of the more common AMIE transactions (e.g. RPC, RAC) this process runs twice for each transaction. In the case of an RPC, for instance, if a PI user is created as part of the process, AMIE will respond to our “I’ve successfully completed your RPC command” notification not by removing the RPC from its queue but by adding another request (i.e. Data Project Create (DPC)) to provide additional info about the newly created PI. AMIE will not remove the RPC from its queue until we have successfully responded to the DPC request, at which point both the DPC and RPC will be removed from the AMIE processing queue and fetch will be able to mark the original transaction as complete. So, the steps for an RPC would look like:

  • Fetch - Get RPC

  • Process - Do RPC processing

  • Process - Notify AMIE that we processed the RPC

  • Fetch - Get DPC in response to notify (RPC still in AMIE queue)

  • Process - Do DPC processing

  • Process - Notify AMIE that we processed the DPC

  • Fetch - Both the DPC and RPC are no longer in the AMIE queue, and we then mark both the DPC and RPC transactions as complete.

Note

This pattern is per transaction. Each run of the commands processes more than one transaction at a time so, for sanity’s sake, it’s often helpful to run the amie_process_packets command with the --txnid option so you can see what’s going on.

Note

AMIE transactions and osncoldfront AMIETransaction objects are not “one-to-one”. An osncoldfront transaction is uniquely identified by a transaction id and a request type. An AMIE transaction will, therefore, frequently correspond to multiple osncoldfront transaction objects; each of those objects will have the same transaction ID but different request types (e.g. RPC, DPC, ITC)

Practical Considerations

Instead of trying to remember exactly what’s going on under the hood and issuing the correct processing pattern, the following procedure is easy to execute and remember:

  1. Run amie_fetch_packets repeatedly until it reports “NO NEW PACKETS TO FETCH”

  2. Run amie_process_packets --txnid <<ID>> repeatedly until it reports “NO PACKETS AVAILABLE TO PROCESS”

  3. If any packets were processed during step 2, go to step 1; otherwise stop.
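Concretely, the loop can be run from inside the application container (see Manually Running Django Commands above; the transaction id is a placeholder):

kubectl -n osncoldfront exec -it deploy/osncoldfront -- bash
# inside the container:
python manage.py amie_fetch_packets                  # repeat until "NO NEW PACKETS TO FETCH"
python manage.py amie_process_packets --txnid <ID>   # repeat until "NO PACKETS AVAILABLE TO PROCESS"
# if step 2 processed any packets, start over with amie_fetch_packets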

Troubleshooting

Things don’t always work as expected!

Failed packets, ignored packets and other processing

You can see the status of AMIETransaction objects using the django admin interface. This UI is accessed from the “Admin -> ColdFront Administration” menu item on the navbar. The OSN-specific items can be found on that page in the section labeled “Coldfront_Plugin_Osn”; the AMIE transactions are accessed via the “AMIE transactions” link.

The default view displays all the transactions that are not in the “complete” state. To view everything, click the “Also show completed” link in the filter section.

Failed and Stuck Transactions

The longest an AMIE transaction should take to process is approximately 7 iterations. The commands are run every minute so, worst case, a transaction should complete within approximately 10 minutes of being received. If a transaction does not complete in that period of time, there is likely something wrong. Most of these transactions will (hopefully) be in the “failed” state. In this case, a handler tried to process the transaction and something went wrong; the details of what went wrong will be found in the command processing log.

In the past, there have been edge cases (bugs) that caused transactions to get stuck in the “processing” state; it’s likely undiscovered edge cases still exist. These are a little trickier to diagnose, but one should start with the command logs and additionally inspect the content of the AMIE request; each AMIETransaction object stores the json content of its related AMIE request, which can be inspected via the admin interface. Often, one can figure out what is going on from looking at the contents of the request.

Manual State Changes

In either the failed or stuck transaction case, it is often possible to continue processing after fixing the code or data issue. Once an issue has been resolved, the administrator can use the admin interface to set the transaction state to “continue” and processing will resume. All handlers are (supposed to be - more QA probably needed) idempotent so continuing a failed or stuck transaction which may have partially executed before failing is harmless.

Finally, transactions can be manually put into the “ignore” state. The two most common reasons to set a transaction to “ignore” are data issues and code issues.

With data issues, the AMIE request or the state of the coldfront database is such that the transaction cannot be processed until something is done to correct the data. An example of this is receiving an AMIE RPC that asks for a nonsensical number of service units (e.g. the result of someone specifying su in bytes instead of TB).

Code-related issues are similar: if we know a transaction has failed or will fail due to a processing bug, we can put the transaction into the “ignore” state to prevent triggering the bug over and over (the stuck case), or, when a bug incorrectly fails a transaction, to flag it differently from a “normal” failed transaction.

Logs

The osncoldfront application logs to three places: the console, a log file, and Azure Insights.

Console logs are available via the standard kubernetes logs command. All loggers display console logs.
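For example, to tail the production application’s console log:

kubectl -n osncoldfront logs -f deploy/osncoldfront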

Management command logs are also stored in a persistent volume which is most easily accessed via a mount in the osncoldfront-container container. A log-rotated file is stored there at /var/logs/osn/osncommand.log. It can be useful to download this file (and/or its historical archives) to troubleshoot more complex issues.
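One way to pull the file locally is kubectl cp (the pod name is a placeholder; the container name is taken from the text above):

kubectl cp osncoldfront/<osncoldfront-pod-name>:/var/logs/osn/osncommand.log ./osncommand.log -c osncoldfront-container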

Azure Insights

Finally, the application sends telemetry to Azure Insights; all logging is visible there (in the traces table) as well as metrics from django (page load times, 404s, 500s, etc.)

The manifests for the cronjobs and the main application manifest each set Open Telemetry environment variables (Azure Insights is based on Open Telemetry) which can be used as query parameters in Azure Insights queries. The most useful result is that events from the individual processes can be selected by specifying the “cloud_RoleName” resource in Azure Insights queries. All role names are of the form <<namespace>>.<<service name>> where:

  • Namespace is one of:

    • osncoldfront

    • osncoldfrontdev

  • And service name is one of

    • osncoldfront

    • osncoldfront-bucket-cronjob

    • osncoldfront-fetch-cronjob

    • osncoldfront-process-cronjob

    • osncoldfront-usage-cronjob

So, for instance, selecting all trace table items where cloud_RoleName='osncoldfront.osncoldfront-process-cronjob' will return all events logged from the amie_process_packets cronjob in the production instance.

Azure Insights provides a SQL-like query syntax that is useful for tracking down issues in the logs.

Other Management Commands

The application has several other management commands in addition to the ones already described.

Usage Management

The update_usage command examines every project and, for every bucket allocation in the project, updates the usage for the Open Storage Network (Cloud) resource. The resource has one attribute, “OSN Network Quota (TB)”, which both specifies the quota and has a “slot” to track usage against that quota. The update_usage command updates that slot with the sum of the bucket quotas for the “OSN Bucket” resources allocated to the project.

Note

The script sums the bucket quotas, not the actual bucket usage. If a user has allocated a 1T bucket and has a 1T network allocation, the usage will be reported as 100% even if the bucket is empty.
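Like the other management commands, update_usage can be run manually from inside the application container:

python manage.py update_usage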

Changing an ACCESS user’s username

It is occasionally necessary to change the local username of an ACCESS-provisioned user. In this case, both the local username and the remote ACCESS entity need to be changed. The change_amie_id command is used to perform this task. The command takes the current username and the new username as positional arguments and updates the local and remote information.
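Usage follows the positional-argument form described above (run from inside the application container; the usernames are placeholders):

python manage.py change_amie_id <current-username> <new-username>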

Run-once Commands

There are four commands that are run only once and are described here for informational purposes.

add_osn_defaults

This command is run once when the software is first set up. It creates the OSN-specific objects in the coldfront/Django database (e.g. the “OSN Bucket” resource and associated attribute types).

migrate_osn_portal

This command is responsible for reading all legacy data from the OSN Keycloak instance and creating the corresponding entities in the coldfront system (e.g. projects, users and resource allocations).

migrate_xsede_inactive

This command supplements the migrate_osn_portal migrator. The migrate_osn_portal command, by design, ignores inactive users and projects. AMIE, however, “remembers” all users and projects that it has ever instantiated (active or otherwise) and may send commands to manipulate those entities. The most common scenario is when an inactive project is reactivated. This command imports items that the original migrator (incorrectly) ignored.

update_amie_ids

The original migrator assumed that the coldfront system would need to translate between keycloak usernames (a guid) that AMIE was tracking and the usernames for newly created coldfront users (e.g. a keycloak legacy user might have been known as c3e2d8f3-7690-45bd-85e3-dd1f44ecec0d whereas the coldfront user might be djones@access-ci.org). To support this, the original migrator created a translation table. It was later learned that AMIE has an API call to support just this scenario (i.e. to inform AMIE that a user has a new identifier). The update_amie_ids script walks the translation table and informs AMIE about the username changes.

Secrets

All secrets are managed in AWS Secrets Manager. Secrets are stored as plain text yaml and json documents. There are two major categories of secrets, OSN Pod Secrets and Project Secrets.

OSN Pod Secrets

Each pod has a secret located at prod/site/<<sitename>> (e.g. prod/site/amnh3) which contains all secret information for that pod. These data are used in numerous places including pod installation and management. The primary overlap with the osncoldfront application is that this secret is the source of truth for the osnadmin rgw user credentials. These credentials are needed by the application to create buckets and users.
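If needed, a pod secret can be inspected from the command line with the AWS CLI (assumes credentials with read access to Secrets Manager):

aws secretsmanager get-secret-value --secret-id prod/site/amnh3 --query SecretString --output text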

Note

The osnadmin credentials originate in these pod secrets, but the application does not read them from the pod secrets directly; they are copied into the project secret. This will likely change in the future.

Project Secrets

Project-wide secrets for the osncoldfront application are stored in a secret named prod/osn/coldfront. This secret is programmatically read by Kubernetes and the values in the secret are made available as environment variables in the osncoldfront kubernetes pods via the External Secrets operator. The secret is structured as a json object; each object field corresponds to an environment variable that gets set in the runtime environment. The following values are stored in the secret.

OSN Pod Information

A PODLIST variable encodes the list of all pods. Information for a specific pod is stored in variables that follow the naming convention <<PODNAME>>_<<VARNAME>>. The program parses the PODLIST variable and then uses the pod names stored there to retrieve pod-specific configuration information.

OSN pod information is stored in the secret in the following variables.

PODLIST

Colon-separated list of pod names, e.g. “<PODNAME>:NCSA:AAMU:MGHPCC:URI”

<<PODNAME>>_ENDPOINT

The endpoint url for a pod with PODNAME (e.g. https://minipod.osn.mghpcc.org)

<<PODNAME>>_ACCESS_KEY

The ACCESS_KEY for the osnadmin user on the pod with name PODNAME

<<PODNAME>>_SECRET_KEY

The SECRET_KEY for the osnadmin user on the pod with name PODNAME

Adding New Pods

Once a new pod has been created it can be added to the application by updating the PODLIST variable and adding the associated variables for the new pod. So, to add a new pod named MYPOD you would:

  • Add MYPOD to the PODLIST variable, separating it from the other pod names with a colon

  • Add the variables MYPOD_ENDPOINT, MYPOD_ACCESS_KEY and MYPOD_SECRET_KEY to the secret
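A minimal sketch of the result, shown as the environment variables the secret fields become (the secret itself stores these as JSON fields; the endpoint and key values are placeholders):

PODLIST="NCSA:AAMU:MGHPCC:URI:MYPOD"
MYPOD_ENDPOINT="<endpoint URL for MYPOD>"
MYPOD_ACCESS_KEY="<osnadmin access key for MYPOD>"
MYPOD_SECRET_KEY="<osnadmin secret key for MYPOD>"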

OSNADMIN user

Each pod needs a privileged RGW user named “osnadmin” (this is where the ACCESS and SECRET keys come from). The following recipe may be used to create the osnadmin user on a newly created OSN pod.

radosgw-admin user create --uid=osnadmin --display-name="osnadmin" --email=admin@mghpcc.org
radosgw-admin caps add --uid=osnadmin --caps="buckets=read;metadata=read;usage=read;users=*"
radosgw-admin quota enable --quota-scope=bucket --uid=osnadmin
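To confirm the result, the user and its caps can be inspected with a standard radosgw-admin query (not part of the original recipe):

radosgw-admin user info --uid=osnadmin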

Note

In the future, this RGW user creation step should be taken care of by ansible/awx

Operational Configuration

Configuration information is stored in two places: AWS Secrets Manager and the osncoldfront_configmap.yaml file. These data are converted into environment variables and are available to the application runtime. Standard environment variables for Django and Coldfront (explained in their respective documentation) are set via these mechanisms. In addition to those settings, the OSN software supports the following configuration variables.

DEBUG_STATIC (Default: False)

When True, Django will serve static files even when not running under runserver.

OSNCMD_LOG_LEVEL (Default: INFO)

Allows the logging level of the commands (e.g. amie_fetch_packets, amie_process_packets) to be controlled separately from the rest of the application.

AMIE_SITENAME

Location of the AMIE endpoint; AMIE supports a production and testing endpoint.

AMIE_API_KEY

The credential used to login to the AMIE site to send and retrieve packets.

AZURE_INSIGHTS (Default: True)

When set, the application sends logging and telemetry to Azure Insights

Note

Of all of these configuration settings, OSNCMD_LOG_LEVEL is the one most relevant to operations as one will frequently want to set this to DEBUG when troubleshooting command operations. One can edit this directly with kubectl and restart the pod to change the logging level.
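A hedged example of that kubectl workflow (it assumes OSNCMD_LOG_LEVEL lives in the configmap rather than the secret, and that the live configmap name matches the manifest file name; adjust to the deployed object):

kubectl -n osncoldfront edit configmap osncoldfront-configmap
# set OSNCMD_LOG_LEVEL to "DEBUG", save, then restart the deployment so it picks up the change
kubectl -n osncoldfront rollout restart deploy/osncoldfront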

Other Operational Info/Recipes

This stuff probably needs to go somewhere more organized, but it’s better to write it down than to get it “just right”…

Bucket renaming

Bucket renaming in ceph is not particularly difficult (though the naming of the operation as “link” is confusing). The issue for OSN is that we emulate “bucket keys” by creating users behind the scenes that incorporate the bucket name into the username. There are several spots in the code where this user naming convention is assumed. Therefore, if we rename a bucket, we need to additionally:

  • recreate the associated bucket users

  • update the bucket policy and

  • change the object ownership for all objects in the bucket.

We should decouple the hidden users from the bucket name, as the coupling creates an unnecessary dependency; until then, we need to follow the process below when a user needs a bucket renamed.

  • CEPH

    • Link bucket to new name

    radosgw-admin bucket link --bucket <<old-bucket>> --new-bucket-name <<new-bucket>> --uid osnadmin
    
  • COLDFRONT

    • Add OSN Bucket Name attr == <<new-bucket>>

    • Remove OSN Bucket Name attr == <<old-bucket>>

    • Run bucket provisioning management command

      kc config use-context cfprod
      kc exec -it deploy/osncoldfront -- bash
      python manage.py provision_buckets --bucket <<new-bucket>>
      

      Note

      Running the management command creates new <<new-bucket>>_readonly and <<new-bucket>>_datamanager users and fixes bucket policy for the new bucket name and users

  • CEPH

    • Fixup bucket, objects and users

    radosgw-admin bucket chown --bucket <<new-bucket>> --uid <<new-bucket>>_datamanager
    radosgw-admin bucket unlink --bucket <<new-bucket>> --uid <<new-bucket>>_datamanager
    radosgw-admin bucket link --bucket <<new-bucket>> --uid osnadmin
    radosgw-admin user rm --uid <<old-bucket>>_datamanager
    radosgw-admin user rm --uid <<old-bucket>>_readonly
    

    Warning

    The chown in the first step is what changes the object ownership of all the objects in the bucket. It touches every object, so be aware that this may be an expensive operation.

Username Algorithms

User authentication is the process of challenging a user to prove their identity. Many applications handle this process internally, matching credentials (such as a username and password) with information in a user database. Other applications delegate authentication to trusted third parties known as Identity Providers (IdPs). OSN authenticates users via external IdPs.

IdPs are organizations that have relationships with “verified” users. E.g. a university has a relationship with faculty/student/staff which is verified by their enrollment or employment status; a cloud provider has a relationship with account holders which is verified by billing information.

In addition to the verified relationship, IdPs have a method to authenticate users over the internet. IdPs use this authentication function to support their own services (e.g. university help desk, cloud provider dashboard) and some also make this authentication function available to other organizations via secure authentication protocols such as SAML and OIDC. When a user is authenticated via these protocols, cryptographically secured user “claim” information (e.g. email address, IdP unique identifier, eppn) is sent back to the delegating application.

OSN integrates with third party IdPs via the OIDC protocol and uses claim information to logon existing users and to create new users on first logon.

First Logon

When a user logs onto OSN via an IdP the username, which is the key used to retrieve the coldfront/django user object, is derived from the OIDC claims presented from the IdP. The following algorithm is used to derive the username from the claims:

  • If the IdP is an OIDC IdP (Google, Microsoft, GitHub, ORCID) then:

    • If the IdP is ORCID then parse the ORCID ID from the claims and create a username = ORCID_ID@orcid.org

    • Otherwise use the email

    Note

    We would love to just use email everywhere, but most users coming from ORCID do not release email, so we standardize on detecting ORCID and parsing and using the id@orcid.org form described above

  • If the IdP is SAML (the rest of the InCommon universe) then:

    • If the user is coming from the ACCESS IdP then we know the eppn is present and we use that

    • Otherwise (non-ACCESS), we try to use the eppn which may or may not be available

    • If eppn is not available, we try to use email

    • If email is not available, we cannot auto-create a user and first logon will fail

    Note

    We use the eppn (as opposed to the simpler email approach) with ACCESS because email is not a stable identifier for ACCESS users. Researcher emails change frequently as they move among institutions and organizations. All ACCESS users have an ACCESS IdP unique identifier. This identifier is used during AMIE provisioning (e.g. RAC, RPC) and is, therefore, used as the unique identifier during authentication. This identifier gets sent to OSN in the eppn claim. [1].

Note

The algorithm used to create a user is used to find and update the user every time that user logs on. This way a name change or email update will propagate from logon claims to the user. This may prove confusing if any manual updates to those fields are done.

AMIE Provisioning

Some AMIE transactions create new OSN users. For these users to be able to logon to OSN, there needs to be a way to link information used to create the OSN user at provisioning time with information provided by the IdP at authentication time. OSN relies on the ACCESS user identifier for this purpose which is present in the AMIE packet (X-PORTAL info) and in the ACCESS IdP OIDC claims (eppn). Users provisioned in OSN via ACCESS/AMIE, therefore, are required to logon to OSN via the ACCESS IdP.

Unfortunately, AMIE has some weird and wonderful eccentricities including that, sometimes, it requests user creation but does not provide an ACCESS identifier. In these cases, we still need to provision the user; the following algorithm is used to create the unique ID for users in OSN.

  • Use the ACCESS portal id (which gets asserted as eppn; the default/happy case)

  • Otherwise, use the user’s email

  • Otherwise, generate a unique username of the form hash@osn.mghpcc.org

    • where hash is unique based on the contents of the RPC/RAC (txnid, type, tstamp)

    • The username must be limited to 30 characters, so the hash is limited to 15 characters

Manual Linking

If an AMIE packet does not have an ACCESS logon identifier, then it’s not possible to create a user that is linked to their ACCESS logon (one can search the Azure Insights logs for warnings in the timeframe of the processed RPC/RAC to find this issue; yes, we need a better mechanism). In this case, the user is provisioned with either their email as the unique ID or a randomly generated ID. When this happens, an OSN admin must work with the user to link one of their other (i.e. not ACCESS) IdP logons with their OSN account.

Note

In theory this is not supposed to ever happen and we have not seen it in production; however, the test environment does generate these RACs and RPCs.

The general approach, based on the authentication algorithm described previously, is for the admin to work with the user to determine either the email address or the eppn provided in claims from one of the other IdPs that the user currently uses. Then the admin finds the user in the OSN portal and changes the username using the change_amie_id management command. This command changes the local username and sends a packet informing ACCESS that the old username has been changed to the new username.

Warning

Do not use the admin interface to change the username. The name must be changed both locally and at the ACCESS end. You must use the change_amie_id management command to do this.

Note

It won’t be necessary to change the username in all cases. If an email is available, provisioning will use that. If that email is sent by one of the user’s other IdPs, then the user can simply use that IdP instead of the ACCESS IdP to logon.

Some possible alternative IdPs and the related username frobbing:

MS365, Google, GitHub

Set the username to their ms365/google/github email address

ORCID

Set the username to their orcid ID @orcid.org (e.g. 12345@orcid.org)

University Logon

Set the username to their eppn if available; if not, set the username to their email address if available; if neither is available, use a different IdP

Note

You can inspect the email and eppn claims that OSN will see by having the user visit cilogon.org, authenticate with the IdP in question, and then visit the “User Attributes” link. Have them screenshot that page and send it to you. This will show the exact email and eppn claims that OSN will see, and you can use them to link the user.

UI Notes

We’ve made some small tweaks to the UI which should survive coldfront updates but may not. The following information may be useful if UI issues crop up after upgrades.

Template Tweaks

We’ve created overrides for the following templates.

user/login.html

Swapped the order of logon choices and button text.

common/navbar_login.html

Adds OIDC information, populated by plugin middleware, to show the user’s username and the IdP they logged in from in the place where the coldfront username is normally shown.

allocation/allocation_allocationattribute_create.html

This template extends the original to include javascript that provides podname informational text and form validation support. This extension simply “decorates” the original and should not get in the way of future underlying template changes. E.g., if future coldfront versions change the form field identifiers, this template will not need to be changed to avoid breaking coldfront. Updates may cause the javascript to cease to function correctly (e.g. if entity ids targeted by the javascript change) but basic form processing should continue to work normally.

Warning

Upgrades may break the OSN Pod helper javascript.

common/css/common.css

This css file overrides the common.css file. It simply adds a single css rule that targets the navbar-center-summary entity and hides it. We don’t have any content for this link so we’re hiding it for now. It’s a bit silly to have to override the whole file and possibly suffer breaking upgrades but we’re living with it for now.

Warning

It will be necessary to compare this file with the coldfront version on upgrades. We only need the one line at the bottom so the action to take if there have been css changes is to copy the coldfront version and add the one OSN rule to the bottom.

Footnotes