My working portfolio in Data Science

Below are the skills I've acquired through work and study, organized by category. Each category lists practical projects in which I applied those skills.

Skills

In brief, here are the types of things I deal with on a daily basis.

  • Conceptual planning of the project, including goals, requirements, data selection, model selection and required human skills
  • Data mining, engineering and manipulation (more details →)
  • Descriptive and predictive machine learning models
  • Statistics-based models and related statistical knowledge
  • Explainability of automatic decisions
  • Considerations on AI privacy, trustworthiness and regulatory compliance

Experience: projects and work

These are some practical applications of Data Science, together with all the work experiences that have helped me develop skills related to the field.

Machine learning predictive models

I've worked with machine learning models in several courses and I've tutored students in Data Mining. Here I showcase the most instructive experiences and the most interesting practical applications.

Classification and pattern mining

Starting from a database of music tracks, and treating the audio waves as time series of signals, we extracted samples and performed classification tasks using a popularity feature of the song as the target. During this team project, I built the API-style interfaces between the database and the various models; I also took responsibility for the data preprocessing, the neural networks and the regression models. I contributed as well to the creation of a new interpretation of pattern mining, which we called Frequent Sequence Mining.

Read extracts of the scientific report… Description of the dataset, preprocessing and initial classification tasks | Some regression models | An original application of pattern mining | Code repository

Peer Tutor

I worked in a team of two as a peer tutor to students of Data Mining: Advanced Topics and Applications. We wrote original exercises and presented them to students in the classroom. We also wrote some exercises that were later used in the course's final exam.

Statistical predictive models

Models like Ordinary Least Squares Regression integrate many theoretical concepts of machine learning with statistics. They require specific statistical assumptions about the circumstances and characteristics of the input dataset. I believe that knowing these assumptions, along with the tools at our disposal to correct any violations, is essential to being an effective data analyst and researcher.

Bachelor Thesis

The title of my thesis in Economia e Commercio (Economics) was "Is Beta Alive during Covid-19? Structural Breaks in the CAPM". I worked with market time series data and applied structural break regression analysis to CAPM factor estimates before and after the pandemic break, in order to understand the significance of those factors and whether they were fundamentally changed by the health crisis.
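To illustrate what a structural break analysis of a CAPM regression can look like, here is a minimal sketch. It is not the code from the thesis: it assumes a single-factor regression of asset excess returns on market excess returns, a known break date, and the classic Chow test as the break test. The function names and the synthetic data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (alpha, beta, rss) for y = alpha + beta * x fitted by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, rss


def chow_f_statistic(x, y, break_index, k=2):
    """Chow F statistic for a break at break_index; k = parameters per regression."""
    _, _, rss_pooled = fit_simple_ols(x, y)
    _, _, rss_before = fit_simple_ols(x[:break_index], y[:break_index])
    _, _, rss_after = fit_simple_ols(x[break_index:], y[break_index:])
    rss_split = rss_before + rss_after
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


# Synthetic excess returns (assumption): beta shifts from 0.8 to 1.5 at observation 20.
market = [0.01 * ((3 * i) % 11 - 5) for i in range(40)]
noise = [0.001 * ((7 * i) % 5 - 2) for i in range(40)]
asset = [(0.8 if i < 20 else 1.5) * m + e
         for i, (m, e) in enumerate(zip(market, noise))]

f_break = chow_f_statistic(market, asset, break_index=20)
print(f"Chow F statistic: {f_break:.1f}")
```

Under the null hypothesis of no break, the statistic follows an F distribution with k and n - 2k degrees of freedom, so a large value suggests that alpha and beta differ between the two periods.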

Read the thesis

Forecasting the usage of bike sharing stations

We started with a dataset counting how many bikes were taken from which stations. In a team of 4, we extracted the geolocation of stations from their names. I took charge of building a strong data preprocessing pipeline and API interfaces for all predictive models. Then I ran linear, quadratic and interaction statistical regressions predicting how many bikes were going to be taken from each station. I checked statistical hypotheses to ascertain the level of certainty in my results. We then employed ML regularization methods on these regressions. Finally, we built approximation-based regression ML algorithms like gradient boosting and random forests.
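The regression steps described above (linear, quadratic and interaction terms, followed by regularization) can be sketched as follows. This is an illustrative toy in plain Python, not the project's code: ridge regression is assumed as the regularization method, and all function names and data are hypothetical. A real pipeline would more likely rely on a library such as scikit-learn.

```python
from itertools import combinations


def expand_features(row):
    """Return linear, squared and pairwise interaction terms for one row."""
    linear = list(row)
    squares = [x * x for x in row]
    interactions = [a * b for a, b in combinations(row, 2)]
    return linear + squares + interactions


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(rows, targets, alpha):
    """Ridge fit via the normal equations (X'X + alpha * I) w = X'y."""
    X = [[1.0] + expand_features(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(p)]
    for i in range(1, p):  # index 0 is the intercept, which is not penalized
        xtx[i][i] += alpha
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    features = [1.0] + expand_features(row)
    return sum(w * f for w, f in zip(weights, features))


# Toy data (assumption): two numeric features on a grid, known quadratic target.
rows = [(float(a), float(b)) for a in range(4) for b in range(4)]
targets = [2.0 + 3.0 * a - b + 0.5 * a * b for a, b in rows]

plain = fit_ridge(rows, targets, alpha=0.0)    # ordinary least squares
shrunk = fit_ridge(rows, targets, alpha=10.0)  # coefficients pulled toward zero
```

With alpha set to 0 the fit reduces to ordinary least squares on the expanded features; larger values of alpha shrink the non-intercept coefficients, trading some bias for lower variance.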

We kept decision makers up to date with progress reports in the form of Jupyter notebooks, complete but readable by non-technical professionals. We finally delivered a functioning predictive model in the form of a Python package that is easy to integrate into an online or offline application pipeline. The model delivered was a Random Forest Regressor.

Finally, I read and presented to my peers a paper on a new neural network classifier that a company, Meituan, employs to classify the consumption preferences of its customers. (This is an additional piece of work, unrelated to the rest of the project.)

An extract of the report with explainability calculations | Code repository | My presentation on Meituan

Business Intelligence and Communication

Visualization of data, of predictive model results, and of all the business insights gained from these is a crucial field. There is a language behind visualization, one that needs to be flexible enough to be accessible to decision makers of any kind, from business professionals to lead engineers. I have developed the skills to understand the background of my target audience and to adapt myself and my communication accordingly.

Tableau Desktop Fundamentals Training

I've taken an intensive training course on my favorite BI tool, Tableau. I particularly like its straightforward approach to source datasets and its power in presenting dashboards on all varieties of client devices. I can build several kinds of useful, cross-platform visualizations without supervision, and I have the resources to learn more about the software.

Certificate of attendance of Fabio Michele Russo to Tableau course by Visualitics

Visualizing a database of tennis players

I've worked with Microsoft's BI toolkit: SQL Server, Management Studio, SSAS, SSIS and Power BI. In a group of two, I took a completely unstructured dataset and built a structure around it, both programmatically and with direct reasoning, "by hand". We then carried out a few analytics tasks and built dashboards. Of note are the business insights I extracted from the visualizations and proposed to the decision makers through our report.

An extract of the report | Code repository

Apprentice Publicist

Right out of high school I worked at a local newspaper, La Provincia Quotidiano, as an apprentice publicist. I learned professional integrity and verbal and written communication, and I conducted so many interviews with members of the public that I know how to pose questions and what to ask to get the biggest insights into the problem at hand.

One of my articles

Math, statistics and algorithms

My academic curriculum has been rich in quantitative skills from the start. I've taken several statistics classes in my Economics degree, some Quantitative Finance and a very instructive and formative Econometrics course. I've since studied the computer science courses of Discrete Mathematics and of Algorithms and Data Structures.

Peer Tutor

I've worked as a peer tutor for the course of Algorithms and Data Structures for Data Science. I've helped students with no background in computer science to develop a more algorithmic way of thinking about programming tasks; I've also provided them with many conceptual strategies and practical ideas to apply in solving the problems presented by the professor.

An example of a solution (online Jupyter notebook)

Some programming exercises

As I was a Python programming instructor in my spare time during university, I wrote several exercises for my students. My experience in giving one-to-one private lessons helped develop my communication skills, both verbal and written.

Code repository

Optimization: loading a boat

In my limited experience with operations research, in a team of 2 I solved a multivariate mixed-integer linear optimization problem with constraints. We also presented the work in a carefully designed report.
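For readers unfamiliar with the problem class, here is a toy mixed-integer linear problem solved by exhaustive enumeration. It is not the project's formulation: the cargo types, capacities and values are all hypothetical, and a realistic problem would be handed to a dedicated MILP solver. The sketch only shows the structure: integer decisions (number of crates), one continuous decision (tonnes of bulk cargo), a linear objective and linear constraints.

```python
from itertools import product

# Hypothetical crate types: (name, weight in tonnes, volume in m^3, value, max units)
CRATES = [
    ("machinery", 4.0, 3.0, 90.0, 3),
    ("textiles", 1.0, 4.0, 30.0, 5),
]
BULK_VALUE_PER_TONNE = 8.0   # continuous cargo, hypothetical
BULK_VOLUME_PER_TONNE = 1.5
MAX_WEIGHT = 12.0            # boat capacity in tonnes, hypothetical
MAX_VOLUME = 20.0            # boat capacity in m^3, hypothetical


def best_loading():
    """Return (total value, crate counts, bulk tonnes) of the best feasible load."""
    best = None
    ranges = [range(limit + 1) for *_, limit in CRATES]
    for counts in product(*ranges):
        weight = sum(n * c[1] for n, c in zip(counts, CRATES))
        volume = sum(n * c[2] for n, c in zip(counts, CRATES))
        if weight > MAX_WEIGHT or volume > MAX_VOLUME:
            continue  # this integer choice violates a capacity constraint
        # With the integer part fixed and a positive bulk value, the best bulk
        # tonnage fills whichever remaining capacity runs out first.
        bulk = min(MAX_WEIGHT - weight,
                   (MAX_VOLUME - volume) / BULK_VOLUME_PER_TONNE)
        value = (sum(n * c[3] for n, c in zip(counts, CRATES))
                 + bulk * BULK_VALUE_PER_TONNE)
        if best is None or value > best[0]:
            best = (value, counts, bulk)
    return best


value, counts, bulk = best_loading()
print(value, counts, bulk)
```

Enumeration is only workable because the integer variables take a handful of values here; branch-and-bound solvers exist precisely because this search space grows exponentially with the number of integer variables.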

Code repository