Creating an Importer
Introduction
An importer is the component of your plugin that transfers data between your source and the Newsteam platform. It implements the wirebucket interface, which defines the methods Newsteam calls to import, process, and integrate data. Implementing these methods lets your plugin act as a data source within the platform.
The importer exposes the GetEnv, GetLogfiles, and ProcessLogfile functions, each handling a specific task such as capability advertisement, data retrieval, or transformation. This section guides you through implementing these functions and integrating your data source with Newsteam.
Required Functions
Your importer must implement the following functions defined in the wirebucket interface to ensure proper data integration with Newsteam:
- GetEnv: Advertises the capabilities of your plugin.
- GetLogfiles: Retrieves raw data from your data source.
- ProcessLogfile: Processes raw log data into structured articles.
Each function plays a unique role in the importer’s operation, from initialization and metadata advertisement to data retrieval and transformation.
GetEnv Function
The GetEnv function is the entry point for informing Newsteam about the capabilities of your plugin. It returns a structured GetEnvResponse detailing the features your importer supports and optionally provides metadata about publications your plugin can integrate with.
Purpose
The primary purpose of GetEnv is to:
- Advertise Plugin Capabilities: Declare which types of data your importer can handle (e.g., articles, images, videos).
- List Publications (Optional): Provide a list of publications supported by your plugin, including their names, IDs, and additional metadata like logos and menu items.
Implementation
The function must return an instance of GetEnvResponse, which includes:
- Capabilities (WireCapabilities): A set of boolean flags indicating supported features.
- Publications: An optional array of publication metadata.
Below are example implementations:
C#:

```csharp
public Task<GetEnvResponse> GetEnv(GetEnvRequest request)
{
    return Task.FromResult(new GetEnvResponse
    {
        Capabilities = new WireCapabilities
        {
            Article = true,         // Indicates that the importer supports articles.
            Image = false,          // Does not support fetching images.
            Video = false,          // Does not support fetching videos.
            Audio = false,          // Does not support fetching audio.
            Authentication = false  // Does not support authentication integration.
        },
        Publications = new List<Publication>
        {
            new Publication
            {
                Id = "pub-001",
                Name = "Tech Times",
                Description = "Your go-to source for technology news.",
                Font = "Arial",
                Colors = new List<int> { 0x123456, 0x654321 }
            }
        }
    });
}
```

TypeScript:

```typescript
async function getEnv(): Promise<GetEnvResponse> {
  return new GetEnvResponse({
    capabilities: {
      article: true,         // Indicates that the importer supports articles.
      image: false,          // Does not support fetching images.
      video: false,          // Does not support fetching videos.
      audio: false,          // Does not support fetching audio.
      authentication: false, // Does not support authentication integration.
    },
    publications: [
      {
        id: "pub-001",
        name: "Tech Times",
        logo: undefined,
        menu: [],
        description: "Your go-to source for technology news.",
        font: "Arial",
        colors: [0x123456, 0x654321],
      },
    ],
  });
}
```

GetLogfiles Function
The GetLogfiles function is responsible for retrieving raw log files from your data source. These files are usually in binary format and need to be returned as an array.
Purpose
The primary responsibilities of the GetLogfiles function are:
- Data Retrieval: Fetch log files from a specified location in the data source. These files are typically binary and must be returned as an array.
- State Management: Use a Cursor object to indicate the current position in the data source, enabling efficient and resumable data fetching.
Function Behavior
- Input: A Cursor object that specifies the starting position for fetching log files. The cursor contains metadata such as the source ID, position, and additional state information.
- Output: An array of log files represented as binary data.
Cursor Object
The Cursor object helps manage the state of log file retrieval and provides metadata such as:
- id: Unique identifier for the cursor.
- bucketId: Specifies the data source or bucket.
- seekDate / seekPos: Indicates where to start fetching logs.
- status / error: Tracks the cursor's operational status.
This allows the GetLogfiles function to resume from a specific point in case of failure or pagination.
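As a rough illustration of the fields listed above, the cursor could be modeled as follows. The field names come from this page; the types, the optional markers, and the advanceCursor helper are assumptions made for this sketch and may not match the actual SDK definitions.

```typescript
// Hypothetical shape of the Cursor. Field names follow the documentation;
// all types are assumptions and may differ from the real SDK.
interface Cursor {
  id: string;        // Unique identifier for the cursor
  bucketId: string;  // Data source or bucket
  seekDate?: number; // Assumed: Unix timestamp (seconds) to resume from
  seekPos?: number;  // Assumed: numeric offset to resume from
  status?: string;   // Operational status
  error?: string;    // Last error message, if any
}

// Hypothetical helper: returns a copy of the cursor moved forward by
// `count` items, clearing any previous error.
function advanceCursor(cursor: Cursor, count: number): Cursor {
  return { ...cursor, seekPos: (cursor.seekPos ?? 0) + count, error: undefined };
}

const initialCursor: Cursor = { id: "cursor-1", bucketId: "bucket-1", seekPos: 0 };
const nextCursor = advanceCursor(initialCursor, 2);
console.log(nextCursor.seekPos); // 2
```

Returning a new cursor instead of mutating the old one keeps the previous position available if a fetch has to be retried.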
Implementation Overview
The GetLogfiles function typically follows these steps:
- Validate the Cursor: Ensure that the provided cursor is valid and points to a valid data source location.
- Fetch Data: Retrieve log files starting from the position indicated by the cursor. This may involve reading from a database, a file system, or an API.
- Return Results: Return the fetched log files as an array of binary data.
Example Logic
- Simulate fetching two log files:
  - Log file 1: Contains binary data [0x01, 0x02, 0x03].
  - Log file 2: Contains binary data [0x04, 0x05, 0x06].
- Return these as part of the response.
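The simulated logic can be sketched in TypeScript as shown here. This is a minimal sketch, not the SDK's implementation: the reduced Cursor shape, the in-memory SIMULATED_LOGFILES source, and the fetchLogfiles helper are all hypothetical, and seekPos is assumed to be a numeric index into the source.

```typescript
// Reduced cursor shape for this sketch (types are assumptions).
interface Cursor {
  id: string;
  bucketId: string;
  seekPos?: number; // Assumed: index of the first log file to return
}

// Hypothetical in-memory source standing in for a database, file system, or API.
const SIMULATED_LOGFILES: Uint8Array[] = [
  new Uint8Array([0x01, 0x02, 0x03]), // Log file 1
  new Uint8Array([0x04, 0x05, 0x06]), // Log file 2
];

function fetchLogfiles(
  cursor: Cursor,
  source: Uint8Array[] = SIMULATED_LOGFILES,
): Uint8Array[] {
  // Validate the cursor before touching the data source.
  if (!cursor.bucketId) {
    throw new Error("cursor is missing a bucketId");
  }
  const startPos = cursor.seekPos ?? 0;
  if (startPos < 0 || Math.floor(startPos) !== startPos) {
    throw new Error(`invalid seekPos: ${startPos}`);
  }
  // Fetch everything from the cursor position onward and return it.
  return source.slice(startPos);
}

// Async wrapper, mirroring the async style of the getEnv example.
async function getLogfiles(cursor: Cursor): Promise<Uint8Array[]> {
  return fetchLogfiles(cursor);
}

const allFiles = fetchLogfiles({ id: "cursor-1", bucketId: "bucket-1" });
const resumedFiles = fetchLogfiles({ id: "cursor-1", bucketId: "bucket-1", seekPos: 1 });
console.log(allFiles.map((f) => Array.from(f)));     // [[1, 2, 3], [4, 5, 6]]
console.log(resumedFiles.map((f) => Array.from(f))); // [[4, 5, 6]]
```

The second call shows how a cursor with seekPos set to 1 resumes after the first log file rather than fetching everything again.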
Example Output
The function might return an array of log files encoded as binary data:

```json
[
  [1, 2, 3],
  [4, 5, 6]
]
```

ProcessLogfile Function
The ProcessLogfile function is responsible for processing a single log file, extracting meaningful information, and transforming it into a structured format suitable for further use. This function converts raw data into articles.
Purpose
The primary responsibilities of the ProcessLogfile function are:
- Log File Parsing: Analyze and extract relevant information from the provided log file content.
- Data Transformation: Convert the raw log file data into structured articles.
- Integration with Buckets: Use the associated bucket information to contextualize the transformation process.
Function Behavior
- Input:
  - Bucket: A metadata object representing the context or source of the log file. It typically includes information about the data source, such as its identifier or configuration.
  - Content: The binary data (Uint8Array or equivalent) representing the raw log file to be processed.
- Output: An array of structured Article objects or similar entities. Each Article represents a meaningful unit of information extracted from the log file.
Article Object
The Article object is the structured output of the processing step. It contains the following key fields:
- Identifiers:
  - id: Unique identifier for the article.
  - organizationId and shareId: Contextual identifiers for the organization or sharing scope.
- Content Details:
  - Titles (title, title2, etc.): Multiple title options for the article.
  - summary and plainText: Summary and main content of the article.
  - keywords, tags, and groups: Metadata for categorization and searchability.
- Additional Metadata:
  - created, modified, and published: Timestamps for lifecycle tracking.
  - status: Current state of the article (e.g., draft, published).
  - authors, creatorIds, and assigned: Information about contributors.
- Advanced Features:
  - canonicalUrl: Canonical link for the article.
  - sections and relatedArticles: Structural relationships with other entities.
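For orientation, the fields above could be modeled roughly like this. The field names are taken from the list; every type, the choice of which fields are optional, and the draftArticle helper are assumptions for this sketch and not the SDK's definition.

```typescript
// Sketch of an Article type. Field names follow the documentation;
// all types below are assumptions.
interface Article {
  // Identifiers
  id: string;
  organizationId?: string;
  shareId?: string;
  // Content details
  title: string;
  title2?: string;
  summary?: string;
  plainText?: string;
  keywords?: string[];
  tags?: string[];
  groups?: string[];
  // Additional metadata (timestamps assumed to be Unix seconds)
  created?: number;
  modified?: number;
  published?: number;
  status?: string; // e.g. "draft" or "published"
  authors?: string[];
  creatorIds?: string[];
  assigned?: string[];
  // Advanced features
  canonicalUrl?: string;
  sections?: string[];
  relatedArticles?: string[];
}

// Hypothetical helper that fills in defaults for a new draft.
function draftArticle(id: string, title: string, createdSeconds: number): Article {
  return { id, title, created: createdSeconds, tags: [], status: "draft" };
}

const exampleArticle = draftArticle("article-1", "First Article Title", 1672531200);
console.log(exampleArticle.status); // prints: draft
```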
Implementation Overview
The typical workflow of the ProcessLogfile function is as follows:
- Parse Log File:
  - Extract raw data fields based on the log file structure.
  - Perform any necessary decoding or deserialization.
- Transform Data:
  - Map extracted data fields into the Article structure.
  - Populate metadata fields such as title, tags, and status.
- Return Results:
  - Generate and return a collection of Article objects.
  - Handle empty or invalid log files gracefully by returning an empty array.
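The workflow above might look like the following in TypeScript. This sketch assumes the log file is a UTF-8 encoded JSON array of records with hypothetical id, headline, body, and tags fields; your actual log format will differ. The reduced Bucket and Article shapes are also assumptions, and prefixing the article ID with the bucket ID is only one illustrative way to use the bucket context.

```typescript
// Reduced shapes for this sketch (assumptions, not the SDK's types).
interface Bucket {
  id: string;
}

interface Article {
  id: string;
  title: string;
  plainText: string;
  tags: string[];
  status: string;
}

function processLogfile(bucket: Bucket, content: Uint8Array): Article[] {
  // Empty log files produce no articles.
  if (content.length === 0) {
    return [];
  }

  // Parse: decode the bytes and deserialize the assumed JSON payload.
  let records: unknown;
  try {
    records = JSON.parse(new TextDecoder("utf-8").decode(content));
  } catch {
    return []; // Invalid log file: return an empty array instead of throwing.
  }
  if (!Array.isArray(records)) {
    return [];
  }

  // Transform: map each well-formed record into the Article structure.
  const articles: Article[] = [];
  for (const record of records) {
    if (typeof record?.id !== "string" || typeof record?.headline !== "string") {
      continue; // Skip records missing the fields this sketch requires.
    }
    articles.push({
      id: `${bucket.id}:${record.id}`, // Illustrative: namespace IDs by bucket.
      title: record.headline,
      plainText: typeof record.body === "string" ? record.body : "",
      tags: Array.isArray(record.tags) ? record.tags : [],
      status: "draft",
    });
  }
  return articles;
}

const sampleLogfile = new TextEncoder().encode(
  JSON.stringify([
    { id: "a1", headline: "First Article Title", body: "Full content." },
    { headline: "Record without an id is skipped" },
  ]),
);
const parsedArticles = processLogfile({ id: "bucket-1" }, sampleLogfile);
console.log(parsedArticles.map((a) => a.id)); // ["bucket-1:a1"]
```

Skipping malformed records instead of failing the whole file keeps one bad entry from blocking the remaining articles; whether that is appropriate depends on your data source.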
Example Output
If a log file contains information for two articles, the function may produce the following output:

```json
[
  {
    "id": "article-1",
    "title": "First Article Title",
    "summary": "Summary of the first article.",
    "plainText": "Full content of the first article.",
    "created": 1672531200,
    "tags": ["news", "feature"],
    "status": "published"
  },
  {
    "id": "article-2",
    "title": "Second Article Title",
    "summary": "Summary of the second article.",
    "plainText": "Full content of the second article.",
    "created": 1672531300,
    "tags": ["editorial"],
    "status": "draft"
  }
]
```