
Junior Data Engineer Internship

CountyStat collects and analyzes data and collaborates with County departments to solve problems and improve government services. Allegheny County manages a wide range of programs and services for residents, including public safety, health, recreation, and economic development. The mission of the CountyStat team is to help Allegheny County management make data-informed decisions that will improve operations, provide quality services, lower costs, and increase revenues.

In 2020, CountyStat deployed a local instance of Apache Airflow for its data extraction, transformation, and load (ETL) processes. Airflow is an open-source, Python-based platform in which users write their data pipelines as code so that they can be scheduled and automated. A strong programming background is necessary to work with Airflow, but expertise in Python is not required. The Intern will develop, test, and deploy a handful of improvements that apply to all CountyStat data pipelines. Additionally, the Junior Data Engineer may need to take on some of the Airflow maintenance duties at the direction of the Manager of Enterprise Data & Analytics and/or the full-time Data Engineer.

The County is in the process of implementing a data catalog to identify and document data, enforce quality controls, and share data more easily throughout the enterprise. As part of the CountyStat team, the Data Engineer Intern will work with the Manager of Enterprise Data & Analytics and the Data Engineer to discover and document new data sets for the catalog.

The CountyStat team maintains a Data Warehouse, so familiarity with common Structured Query Language (SQL) platforms is necessary. Applicants must possess a high level of competency with R and/or Python. Experience with Tableau or similar business intelligence software, and/or with ArcGIS, QGIS, or other spatial analysis tools, is a plus.

Duties:

  • Work closely with department staff to understand data-related needs and challenges.
  • Develop data cleaning and preparation techniques to ensure accurate, reliable data pipelines.
  • Document county data sources in newly acquired data catalog software.
  • Assist in preparing and maintaining datasets for the Western PA Regional Data Center, the County’s open data portal.
  • Perform other tasks and duties as requested or required.

 

Knowledge, Skills and Abilities:

Knowledge of:

  • Consulting skills, such as statistical analysis, interviewing, note-taking, and reading and writing documentation for existing or future processes.
  • General programming, Linux (Ubuntu), and Python or R.
  • Data cleaning and manipulation techniques in R, SQL, Python, or Microsoft Excel.
  • Configuration and use of common data sources, e.g., MS SQL Server, Oracle, application programming interfaces (APIs), and Excel.

 

Ability to: 

  • Research best practices for the delivery of County services and local implementation issues.
  • Analyze service delivery inputs, outputs, efficiency, and effectiveness and perform root cause analysis.
  • Proficiently use the Microsoft Office Suite (Word, Excel, and PowerPoint).
  • Ask questions of data, identify patterns, discover anomalies, and iteratively refine analysis and documentation based on feedback from the CountyStat team and from department and County management.
  • Learn County operational processes and procedures.
  • Work within strict deadlines while maintaining a high level of professionalism.
  • Balance competing priorities and complex situations.
  • Communicate effectively orally, visually, and in writing.
  • Establish and maintain effective working relationships with supervisors, associates, outside agencies and the public.

 

Qualifications:

Education: Currently pursuing a bachelor’s or master’s degree at a fully accredited institution in computer science, information systems, data science/analytics, or a related field.