Add Office365 Source Connector#5713

Open
alparish wants to merge 17 commits into
opensearch-project:mainfrom
alparish:feature/office365-source

@alparish

Contributor

Description

This change implements an Office 365 source plugin for Data Prepper. The plugin interacts with the Office 365 Management API to retrieve audit logs. Key components include:

  • Office365RestClient: Handles REST API calls to Office 365 Management API.
  • Office365Service: Manages the business logic for fetching and processing audit logs.
  • Office365CrawlerClient: Implements the CrawlerClient interface for Data Prepper integration.
  • Office365Iterator: Manages the iteration over audit log items.
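
As a rough illustration of the iteration layer only, here is a minimal sketch of a lazy, page-by-page iterator. It is not the PR's code: `Page`, `PagedIterator`, the `fetchPage` function, and the token-based paging scheme are hypothetical names and assumptions chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical page type: some items plus the token of the next page (null when there is none). */
final class Page {
    final List<String> items;
    final String nextToken;

    Page(List<String> items, String nextToken) {
        this.items = items;
        this.nextToken = nextToken;
    }
}

/** Iterates over items lazily, fetching another page only when the buffer is empty. */
final class PagedIterator implements Iterator<String> {
    private final Function<String, Page> fetchPage; // stands in for a REST client call
    private final Deque<String> buffer = new ArrayDeque<>();
    private String nextToken;
    private boolean exhausted;

    PagedIterator(Function<String, Page> fetchPage, String firstToken) {
        this.fetchPage = fetchPage;
        this.nextToken = firstToken;
    }

    @Override
    public boolean hasNext() {
        // Keep fetching until an item is buffered or no pages remain (a page may be empty).
        while (buffer.isEmpty() && !exhausted) {
            Page page = fetchPage.apply(nextToken);
            buffer.addAll(page.items);
            nextToken = page.nextToken;
            exhausted = (nextToken == null);
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}
```

Keeping the fetch call behind a function parameter means the iteration logic can be unit tested with an in-memory map of pages and no network access.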

This PR also renames the existing state objects to be more generic, so that the Atlassian and Office 365 sources can share them:

  • Renamed AtlassianLeaderProgressState to PaginationCrawlerLeaderProgressState
  • Renamed AtlassianWorkerProgressState to PaginationCrawlerWorkerProgressState

Note: This is an initial PR for review. Unit tests will be added while awaiting review feedback.

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Savit Aluri and others added 17 commits May 1, 2025 10:01
Signed-off-by: Savit Aluri <savaluri@amazon.com>
Signed-off-by: Savit Aluri <savaluri@amazon.com>
…bi-toolbag/content/workspace-summary.md

Delete wasabi testing files.
…bi-toolbag/.metadata/bootstrap-time.txt

Delete Wasabi toolbox used for testing.
feat: Office365 Source Initial Commit
Signed-off-by: Alekhya Parisha <aparisha@amazon.com>
…ce365-updates

Rename Atlassian state classes to PaginationCrawler
Alekhya Parisha <aparisha@amazon.com>
Signed-off-by: Alekhya Parisha <aparisha@amazon.com>
Signed-off-by: Aparna Parisha <aparisha@amazon.com>
Signed-off-by: Alekhya Parisha <aparisha@amazon.com>
…-processor

 Add OCSF transformation processor for Office 365 events
Signed-off-by: Alekhya Parisha <aparisha@amazon.com>
…ce365plugin-updates

Add pagination support and fix time window handling

@dlvenable left a comment

Member

Thank you @alparish for this change. I left some general comments and suggestions for the configurations. We also need unit tests.

implementation 'org.projectlombok:lombok:1.18.30'
annotationProcessor 'org.projectlombok:lombok:1.18.30'

testImplementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.13.4'

Member

Do not specify Jackson versions. They are inherited.

Contributor Author

Removed Jackson versions


testImplementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.13.4'
testImplementation project(path: ':data-prepper-test-common')
testImplementation 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.13.0'

Member

Do not specify Jackson versions. They are inherited.

Contributor Author

Removed Jackson versions

implementation(libs.spring.web)
}

test {

Member

You don't need these lines.

Contributor Author

Removed

* The type Constants.
*/
public class Constants {
public static final String PLUGIN_NAME = "office365";

Member

Microsoft Office 365 is a massive product. The name of this source should reflect what part of Office 365 this is.

I understand this is what this is:

https://learn.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api-reference

So call it something like office365_management_activity to use the name of this feature within Office 365.

Contributor Author

Renamed the source to ‘microsoft-office365’, since this name is approved in the blueprint.

* The type Constants.
*/
public class Constants {
public static final String PLUGIN_NAME = "office365";

Member

Also, you only use this in the Office365Source class. Move it there and make it package protected.
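
A minimal sketch of what I mean, assuming the constant and the annotated class end up in the same package. `DemoPlugin` and `DemoSource` are hypothetical stand-ins for illustration, not the real `@DataPrepperPlugin` annotation or the classes in this PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical stand-in for a plugin annotation; not the real @DataPrepperPlugin. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DemoPlugin {
    String name();
}

/** The constant lives next to its only user and has no access modifier (package-private). */
@DemoPlugin(name = DemoSource.PLUGIN_NAME)
final class DemoSource {
    // Annotation values must be compile-time constants, so this stays a static final String literal.
    static final String PLUGIN_NAME = "demo_source";
}
```

Package-private access only works while every user of the constant is in the same package. If the annotated class sits in a different package from the constant, either the constant moves with it or it has to stay public.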

Contributor Author

PLUGIN_NAME needs to remain public because the Office365Source class, which is in a different package, uses it in the @DataPrepperPlugin annotation.

package org.opensearch.dataprepper.plugins.source.office365.configuration;

public class MetadataKeyAttributes {
public static final String CONTENT_TYPE = "contentType";

Member

Move this into Office365CrawlerClient and make it private.

Contributor Author

Updated


@Override
public void initCredentials() {
log.info("Initializing credentials...");

Member

Suggested change
- log.info("Initializing credentials...");
+ log.info("Initializing credentials.");

Do not use ellipses in logs. It is confusing to users.

Contributor Author

Removed

},
authConfig::renewCredentials
);
} catch (Exception e) {

Member

Identify the expected exceptions and handle them differently from unexpected ones. Log the stack trace only for the unexpected exceptions.

} catch (SomeClientException e) {
  log.error(NOISY, "Received {} response while fetching audit logs for content type {}", e.getSomeWayToGetErrorCodeAndMessage(), contentType);
  searchRequestsFailedCounter.increment();
  throw new RuntimeException("Failed to fetch audit logs", e);
} catch (Exception e) {
  log.error(NOISY, "Error while fetching audit logs for content type {}", contentType, e);
  searchRequestsFailedCounter.increment();
  throw new RuntimeException("Failed to fetch audit logs", e);
}

Here is some similar code:

} catch (JsonProcessingException e) {
    if (handleFailedEventsOption.shouldLog()) {
        LOG.error(SENSITIVE, "An exception occurred due to invalid JSON while parsing [{}] due to {}", message, e.getMessage());
    }
    parseErrorsCounter.increment();
    return Optional.empty();
} catch (Exception e) {
    if (handleFailedEventsOption.shouldLog()) {
        LOG.error(SENSITIVE, "An exception occurred while using the parse_json processor while parsing [{}]", message, e);
    }
    processingFailuresCounter.increment();
    return Optional.empty();
}

It catches parsing exceptions which are expected and handles them without the stack trace.
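
For a version of the same pattern that compiles on its own, here is a rough sketch. `ApiResponseException`, `AuditLogFetcher`, and the in-memory `logLines` list are hypothetical stand-ins (for the client's exception type, the REST client, and the logger respectively); they are assumptions for illustration, not code from this PR.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical exception for an error response that the client can describe by status code. */
final class ApiResponseException extends RuntimeException {
    final int statusCode;

    ApiResponseException(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }
}

final class AuditLogFetcher {
    final AtomicInteger failures = new AtomicInteger();  // stands in for a metrics counter
    final List<String> logLines = new ArrayList<>();     // stands in for a real logger

    String fetch(String contentType, Callable<String> call) {
        try {
            return call.call();
        } catch (ApiResponseException e) {
            // Expected failure: status and message are enough, so no stack trace is logged.
            logLines.add("ERROR Received " + e.statusCode + " (" + e.getMessage()
                    + ") while fetching audit logs for content type " + contentType);
            failures.incrementAndGet();
            throw new RuntimeException("Failed to fetch audit logs", e);
        } catch (Exception e) {
            // Unexpected failure: include the stack trace so the cause can be diagnosed.
            logLines.add("ERROR Error while fetching audit logs for content type "
                    + contentType + "\n" + stackTraceOf(e));
            failures.incrementAndGet();
            throw new RuntimeException("Failed to fetch audit logs", e);
        }
    }

    private static String stackTraceOf(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out));
        return out.toString();
    }
}
```

The more specific catch block has to come first. Both branches still increment the failure counter and rethrow, so callers see the same behavior and only the log output differs.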

Contributor Author

Updated

* Office 365 Connector main entry point.
* This class extends CrawlerSourcePlugin to provide Office 365 specific functionality.
*/
@DataPrepperPlugin(name = PLUGIN_NAME,

Member

Add the @Experimental annotation on this until it is ready to reach a stable state.

Contributor Author

Added the @Experimental annotation.

*/
@Slf4j
@Named
public class Office365CrawlerClient implements CrawlerClient<PaginationCrawlerWorkerProgressState> {

Member

All of these classes will need unit tests.

Contributor Author

Added unit tests for Office365RestClient. Unit tests for the remaining classes will be added in the next PR.

private static final String SEARCH_REQUESTS_FAILED = "searchRequestsFailed";

private final RestTemplate restTemplate = new RestTemplate();
private final Office365AuthenticationProvider authConfig;

Member

Shouldn't this be the interface? Office365AuthenticationInterface
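
Roughly what I have in mind, as a sketch with hypothetical names: `AuthProvider`, `getAccessToken`, `DemoRestClient`, and `FixedTokenAuthProvider` are assumptions for illustration, and the real interface in this PR may expose different methods.

```java
/** Hypothetical authentication contract; the PR's actual interface may differ. */
interface AuthProvider {
    String getAccessToken();

    void renewCredentials();
}

/** The client depends on the interface type, so any implementation can be injected. */
final class DemoRestClient {
    private final AuthProvider authConfig;

    DemoRestClient(AuthProvider authConfig) {
        this.authConfig = authConfig;
    }

    String authorizationHeader() {
        return "Bearer " + authConfig.getAccessToken();
    }
}

/** A trivial implementation of the kind a unit test could pass in. */
final class FixedTokenAuthProvider implements AuthProvider {
    private int renewals;

    @Override
    public String getAccessToken() {
        return "token-" + renewals;
    }

    @Override
    public void renewCredentials() {
        renewals++;
    }
}
```

Declaring the field with the interface type keeps the client decoupled from one concrete provider and makes it straightforward to substitute a fake in unit tests.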

Contributor Author

Updated

public class Office365Source extends CrawlerSourcePlugin {
private static final Logger LOG = LoggerFactory.getLogger(Office365Source.class);
private final Office365SourceConfig office365SourceConfig;
private final Office365AuthenticationProvider office365AuthProvider;

Member

Shouldn't this be the interface? Office365AuthenticationInterface

Contributor Author

Updated
