Skip to main content
Restaurant Tech · Completed

ADP Payroll Scheduler

Automated payroll report generation and scheduling for restaurant staff across multiple locations.

PythonAWS LambdaPDF Parsingpytest

Problem

Keke's Breakfast Cafe runs on ParPOS, a restaurant POS system that generates payroll reports as PDFs. Every bi-weekly pay period, two reports land in an Outlook inbox: one for declared tips, one for employee hours and wages by job. To run payroll in ADP, those PDFs need to become a structured import CSV. The process existed, but it was manual: open the PDFs, extract the numbers, build the file, send it to ADP.

The problems were compounding. ParPOS breaks multi-job employees into separate rows, so a server who also picks up hosting shifts shows up twice. The tip pool calculation requires a 3.5% FOH Support allocation from net sales. These transformations have to happen in the right order, with no room for error. A wrong PayrollID or a row count off by one means ADP rejects the import entirely.

The goal was to automate the full pipeline: fetch the PDFs, process the data, validate the output, and deliver the ADP import file and an Excel audit report to the right inboxes without any manual steps.

Approach

I built a Python pipeline that fetches the ParPOS PDFs from Outlook via the Microsoft Graph API, parses them with pdfplumber, processes the extracted data with pandas, validates the output against a hard row count requirement, and emails the final files.

The pipeline produces two outputs each run: PR{COMPANY_CODE}EPI_{date}.csv for ADP import and Palm_Springs_PAR_Payroll_{date}.xlsx as an audit trail. Both are emailed to the accountant and archived to S3.

Scheduling is bi-weekly: 4PM on the payroll Saturday after a pay period closes, with a catch-up run on Monday in case the Saturday job was missed. I deployed it two ways: locally via Windows Task Scheduler with environment variables, and in the cloud via AWS Lambda with DynamoDB for run history, S3 for archival, and Secrets Manager for credentials. Both deployments share the same scheduling and processing logic.

Architecture

OutlookParPOS PDFsSourceGraph APIMSAL authFetchpdfplumberregex extractParsepandasconsolidate + tipsOutputValidatorTransformADP CSV+ Excel auditemail + S3OutputBi-weekly payroll pipeline: ParPOS PDFs → ADP import CSV
DiagramExpand for a closer read
ADP Payroll Scheduler pipeline

The pipeline runs linearly. ParPOS generates tip and payroll PDFs that land in an Outlook inbox. The Microsoft Graph API fetches the attachments using MSAL OAuth2. pdfplumber extracts data line by line from the unstructured PDFs. pandas consolidates multi-job employees across both source reports, applies the tip pool allocation, and builds the output data frame. OutputValidator confirms the row count before any file is written. The ADP CSV and Excel audit report are then emailed and uploaded to S3.

Key Technical Details

The scheduling logic is the most operationally critical piece. Payroll runs on a fixed two-week cycle from an anchor date. A Saturday is only a payroll Saturday if it falls on a two-week boundary from that anchor:

def _is_payroll_saturday(self, date: datetime) -> bool:
    anchor = datetime(2026, 2, 7)  # first payroll Saturday
    days_diff = (date - anchor).days
    return date.weekday() == 5 and (days_diff // 7) % 2 == 0

def check_for_missed_run(self) -> list[str]:
    now = datetime.now()
    runs_due = []
    if self._is_payroll_saturday(now.date() - timedelta(days=2)):
        if not self.run_history.was_run("saturday", now):
            runs_due.append("saturday")
    if now.weekday() == 0 and not self.run_history.was_run("monday", now):
        runs_due.append("monday")
    return runs_due

Missed-run detection checks whether the preceding Saturday was a payroll Saturday and whether a run was recorded. If not, Monday queues it up. Run history is stored as a JSON file locally and in DynamoDB in the cloud deployment, making both paths idempotent.

OutputValidator enforces a hard constraint before anything leaves the system. ADP requires exactly 1003 rows in the import CSV. Any deviation means the file was built incorrectly:

class OutputValidator:
    REQUIRED_ROWS = 1003

    def validate(self, df: pd.DataFrame) -> ValidationResult:
        errors = []
        if len(df) != self.REQUIRED_ROWS:
            errors.append(f"Row count mismatch: expected {self.REQUIRED_ROWS}, got {len(df)}")
        if df["PayrollID"].isna().any():
            errors.append("Null PayrollID values found")
        return ValidationResult(passed=not errors, errors=errors)

Validation runs before any file is written or email sent. A failure aborts the run and sends an alert with the specific errors. The operator knows immediately — the issue does not silently produce a bad import.

Two other choices drove the design. pdfplumber over tabula because ParPOS PDFs are not table-structured; the text layout varies enough that line-by-line regex extraction is more reliable than automatic table detection. pandas for the consolidation step because multi-job employees require a groupby then merge across two source reports, which is cleaner as a DataFrame operation than nested dictionary logic.

Impact

What moved, what constrained it, and what trade-offs stayed visible.

Operational outcome, the limits around it, and the practical decisions that shaped the work.

Impact

Automated bi-weekly payroll processing for Keke's Breakfast Cafe. The pipeline fetches ParPOS PDFs from Outlook via Microsoft Graph, consolidates multi-job employees, applies the 3.5% tip pool allocation, validates a 1003-row ADP import CSV, and delivers the final files — replacing a manual spreadsheet process run every pay period.

Constraints

ADP import requires exactly 1003 rows — no tolerance. ParPOS PDFs are unstructured text, not tables. Payroll is time-sensitive: a failed Saturday run must be caught and recovered by Monday. Dual deployment (local + Lambda) must share scheduling logic exactly.

Trade-offs

pdfplumber over tabula: regex line extraction is more reliable than auto table detection on ParPOS layouts. pandas over direct CSV manipulation: multi-job consolidation across two source reports is cleaner as a DataFrame groupby. Dual deployment over Lambda-only: preserves a local fallback for a time-sensitive payroll deadline without depending on AWS availability.

  • Code samples and architecture details available on request