I studied in detail how this works and describe it here, so that we can later use this text as the description of the merge request.
This description includes sequence diagrams that can be generated with https://stackedit.io/ or plantuml. (BTW an MR for supporting this directly in gitlab has been accepted).
This description does not apply to the version of the code from this merge request, but assumes that:
- user creation and Base_createOAuth2User have been removed. ( f2cb1f6c )
- we do not use the id from google user data, but just use the email address. ( cfff3fe3 )
- the "Add google login to current logged in user" feature is removed. ( 072762e7 )
This way, the only way to enable google sign-on for a person is to create a google login in the person and validate it.
Purpose of this merge request
There was already support for google login and also facebook login, and a modular architecture that factors out the common parts of oauth2 authentication, but this was written only for the use case of public websites where we allow everybody to sign in. This implementation created a person (an external user) upon the first sign-in.
With the work of !185 (merged) it is now possible to have multiple logins for the same user and this also includes external logins. In this merge request we introduce a Google Login portal type to store in Persons, the same way we have ERP5 Login.
The use case we want to support now is an organisation that uses many google services in parallel with ERP5 and where all user management is done in Google. When a person leaves the organisation, the admin just has to invalidate the login in the google console to prevent access to all their applications, including ERP5.
At the moment we are not trying to update the use case of allowing logins on public websites; we believe this should be the job of another signup action that is out of our current scope.
How it works
Initial Authentication sequence
This contains the traditional oauth2 flow, as described in the google documentation.
In our case, in the "Use token to call Google API" step, we call the google API to get the user's email address and use it as an external login.
Browser->ERP5: Click on "Login with Google"
ERP5->Browser: redirect to google to authenticate with scopes "profile" and "email" \n (ERP5Site_redirectToGoogleLoginPage)
Browser->Google_Auth_Server: User authenticates in Google service
Google_Auth_Server->Browser: User is redirected to ERP5's oauth2 callback \n ERP5Site_receiveGoogleCallback
Browser->ERP5: User is redirected to ERP5Site_receiveGoogleCallback with Code
ERP5->Google_Auth_Server: Exchange Code for Access Token
Google_Auth_Server->ERP5: Access Token
ERP5->Google_Account_Resource_Server: Query for user info
Google_Account_Resource_Server->ERP5: User info is {id: 1234567, email:'bob@gmail.com', ...}
note right of ERP5: Store this token in erp5_bearer's token session storage using Base_setBearerToken\nThe key is the token signed with the token, using Base_getHMAC.
note right of ERP5: Prepares a __ac_google_hash cookie containing that key.
ERP5->Browser: setCookie __ac_google_hash.
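
The last two notes of the diagram (storing the token under a key derived from the token itself, then sending that key as a cookie) can be sketched roughly as follows. This is only an illustration: `make_cookie_key` and `token_session_storage` are hypothetical names standing in for Base_getHMAC and the erp5_bearer session storage, and SHA-256 is an assumption, since this description does not say which digest is used.

```python
import hashlib
import hmac


def make_cookie_key(access_token: str) -> str:
    """Sign the token with itself and return the hex digest."""
    raw = access_token.encode("utf-8")
    # Assumption: SHA-256; the digest actually used by Base_getHMAC is not stated here.
    return hmac.new(raw, raw, hashlib.sha256).hexdigest()


# Hypothetical in-memory stand-in for erp5_bearer's token session storage.
token_session_storage = {}

token_dict = {"access_token": "example-access-token"}
cookie_value = make_cookie_key(token_dict["access_token"])

# The cookie value (sent as __ac_google_hash) is the key of the stored token.
token_session_storage[cookie_value] = token_dict
```

Since the key depends only on the token, anyone who knows the token can recompute the cookie value.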
Subsequent requests
ERP5ExternalOauth2ExtractionPlugin.extractCredentials implements Pluggable Auth Service IExtractionPlugin.
ERP5ExternalOauth2ExtractionPlugin maintains two "caches", using its setToken and getToken methods.
The first cache maps cookie_hash -> token_dict. This is the session storage of erp5_bearer that is filled during the initial authentication sequence. Let's call this Tokens session storage.
The second cache is really a cache; it maps token_dict['access_token'] -> google email address. Let's call this user login cache.
Browser->ERP5ExternalOauth2ExtractionPlugin: request with __ac_google_hash cookie
note right of ERP5ExternalOauth2ExtractionPlugin: lookup in tokens session storage the token entry for that cookie value
note right of ERP5ExternalOauth2ExtractionPlugin: lookup in user login cache if there is an entry for (plugin prefix + token)
ERP5ExternalOauth2ExtractionPlugin->Google_Account_Resource_Server: If local cache is empty, query resource server for user email address using that token.
note right of ERP5ExternalOauth2ExtractionPlugin: cache `prefix+token -> google email address`
ERP5ExternalOauth2ExtractionPlugin->ERP5LoginUserManager: passes {'login_portal_type': 'Google Login', 'reference': login} to ERP5LoginUserManager for login lookup.
ERP5LoginUserManager will look up a validated Google Login document whose reference matches the google email address. This means that someone has to create and validate a google login inside a person with valid assignments.
When Tokens session storage misses, login fails and the user has to restart the initial authentication sequence to get a new auth token.
When user login cache misses, as we can see on the above sequence diagram, the google account resource server is queried for the user's email address using the token.
When the token is expired, upon the next user login cache miss we won't be able to refresh the google user login.
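
The lookup described in this section can be sketched as follows. It is a simplified model under stated assumptions, not the plugin's code: plain dicts stand in for the two caches, `PREFIX` is a made-up plugin prefix, and `fetch_email` is a hypothetical callable standing in for the query to the google account resource server (returning None when the token is rejected, for example because it expired).

```python
from typing import Callable, Dict, Optional

PREFIX = "google_"  # hypothetical plugin prefix


def extract_credentials(
    cookie_hash: Optional[str],
    tokens: Dict[str, dict],
    login_cache: Dict[str, str],
    fetch_email: Callable[[str], Optional[str]],
) -> dict:
    """Return credentials for the login lookup, or {} when extraction fails."""
    token_dict = tokens.get(cookie_hash) if cookie_hash else None
    if token_dict is None:
        # Tokens session storage miss: the user must authenticate again.
        return {}
    access_token = token_dict["access_token"]
    cache_key = PREFIX + access_token
    email = login_cache.get(cache_key)
    if email is None:
        # User login cache miss: query the resource server with the token.
        email = fetch_email(access_token)
        if email is None:
            # Token rejected (for example expired): the login cannot be refreshed.
            return {}
        login_cache[cache_key] = email
    return {"login_portal_type": "Google Login", "reference": email}


# Usage with a fake resource server that records how often it is called.
calls = []


def fake_fetch(token: str) -> Optional[str]:
    calls.append(token)
    return "bob@gmail.com"


tokens = {"cookie-1": {"access_token": "tok"}}
cache: Dict[str, str] = {}
first = extract_credentials("cookie-1", tokens, cache, fake_fetch)
second = extract_credentials("cookie-1", tokens, cache, fake_fetch)
```

In this model the second request is served from the user login cache, so the resource server is contacted only once per token until the cache entry is dropped.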
Still TODO / open questions
- The first version of google login was using the google user id. I changed it to use the google email, because this way users can create a google login based on an email address and not an opaque id, but this may have implications I did not realize. @luke do you remember why the first version used the id and what problems switching to email could cause?
- I'm not sure our cookie is secure. With the way we're creating the hmac, one can guess the cookie when knowing the token. https://oauth.net/articles/authentication/ lists some "common security mistakes" that I did not study at this point.
- Do we implement https://tools.ietf.org/html/rfc7636 ?
- There is code duplication (we query Google Account Resource Server in two different places of the code).
- Some requests are hand crafted ( in ERP5Site_redirectToGoogleLoginPage and GoogleLoginUtility ) and not using google-api-client, but since we already depend on this egg it would be better to use the methods provided by this library.
- The private key is not securely stored in ERP5. A bit better after 8c4b9714 but problems discussed in !181 (merged) are still here.
- We should introduce document classes instead of ERP5Site_... scripts and external methods. Design is not clear yet, but we could have portal_web_services/google_oauth2_authorization, being the callback endpoint and storing the private key, and portal_web_services/google_user_api holding methods to query for user info, relying on portal_web_services/google_oauth2_authorization to get a token. In a similar fashion, there could be other tools, for example portal_web_services/google_drive_api to store / read files in user's google drive; and this is independent of using google login from this merge request.
Further references
- https://oauth.net/2/
- https://developers.google.com/identity/protocols/OAuth2WebServe explains Oauth2 in the context of google APIs, how to create application keys and how to use client libraries. This simple example in flask helped me a lot in understanding how we can consume Google APIs authenticating with Oauth2. This example lists the files in google drive, but in our case we are interested in knowing "who is this user on google" and we use a google login document in the person to associate the google user with the ERP5 person.
[ edit update hash of referenced commit after push force ]