Experience: projects and work
These are some practical applications of Data Science, together with the work experiences that have helped me develop skills related to the field.
Machine learning predictive models
I've worked with machine learning models in several courses, and I've tutored students in Data Mining. Here I showcase the most instructive experiences and the most interesting practical applications.
Classification and pattern mining
Starting from a database of music tracks, and treating the audio waves as time series of signals, we extracted samples and performed classification tasks whose target was a popularity feature of the song. During this team project, I built the API-style interfaces between the database and the various models, and I took responsibility for the data preprocessing, the neural networks and the regression models. I also contributed to the creation of a new interpretation of pattern mining, which we called Frequent Sequence Mining.
Read extracts of the scientific report… Description of the dataset, preprocessing and initial classification tasks | Some regression models | An original application of pattern mining | Code repository
Peer Tutor
I've worked in a team of two as a peer tutor to students of Data Mining: Advanced Topics and Applications. We've written original exercises and presented them to students in the classroom. Also, we've written some exercises that were later used in the subject's final exam.
Statistical predictive models
Models like Ordinary Least Squares Regression integrate many theoretical concepts of machine learning with statistics. They require specific statistical assumptions about the circumstances and characteristics of the input dataset. Knowing these assumptions, along with the tools at our disposal to correct possible violations, is in my view essential to being an effective data analyst and researcher.
Bachelor Thesis
The title of my thesis in Economia e Commercio (Economics) was "Is Beta Alive during Covid-19? Structural Breaks in the CAPM". I worked with market time series data and applied structural break regression analysis to CAPM factor estimates before and after the pandemic break, in order to understand the significance of those factors and whether they were fundamentally changed by the health crisis.
Read the thesis
Forecasting the usage of bike sharing stations
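One common way to test for a structural break at a known date is the Chow test. The thesis itself is not reproduced here, so what follows is only a minimal sketch under stated assumptions: a single-factor CAPM regression of asset excess returns on market excess returns, a break point known in advance, and synthetic data generated purely for illustration. The names `ols_simple` and `chow_f` are hypothetical and are not taken from the thesis code.

```python
from statistics import mean

def ols_simple(x, y):
    """Fit y = alpha + beta * x by OLS; return (alpha, beta, rss)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, rss

def chow_f(x, y, break_index, k=2):
    """Chow F statistic for a break at break_index (k = parameters per regime)."""
    _, _, rss_pooled = ols_simple(x, y)
    _, _, rss_pre = ols_simple(x[:break_index], y[:break_index])
    _, _, rss_post = ols_simple(x[break_index:], y[break_index:])
    rss_split = rss_pre + rss_post
    dof = len(x) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)

# Synthetic, deterministic data (illustration only, not from the thesis):
# beta is 0.8 in the first 20 periods and 1.5 in the last 20.
market = [0.01 * ((7 * i) % 11 - 5) for i in range(40)]
noise = [0.001 * ((3 * i) % 5 - 2) for i in range(40)]
true_beta = [0.8] * 20 + [1.5] * 20
asset = [b * m + e for b, m, e in zip(true_beta, market, noise)]

_, beta_pre, _ = ols_simple(market[:20], asset[:20])
_, beta_post, _ = ols_simple(market[20:], asset[20:])
f_stat = chow_f(market, asset, break_index=20)
print(f"beta before: {beta_pre:.2f}, beta after: {beta_post:.2f}, F: {f_stat:.1f}")
```

A large F statistic, compared against an F(k, n - 2k) critical value, suggests that the alpha and beta estimated before and after the break are not the same.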
We started with a dataset counting how many bikes were taken from which stations. In a team of 4, we extracted the geolocation of stations from their names. I took charge of building a robust data preprocessing pipeline and API interfaces for all predictive models. Then I ran linear, quadratic and interaction regressions predicting how many bikes would be taken from each station. I checked statistical hypotheses to ascertain the level of certainty in my results. We then employed ML regularization methods on these regressions. Finally, we built approximation-based regression ML algorithms like Gradient Boosting and Random Forests.
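The combination of linear, quadratic and interaction terms with a regularization penalty can be sketched as follows. This is not the project's code: it assumes two generic numeric features, a ridge (L2) penalty as the regularization method, and noise-free synthetic targets. All names (`expand`, `solve`, `fit_ridge`, `predict`) are hypothetical, and the solver is written in plain Python only to keep the example self-contained; a real pipeline would more likely rely on an established library.

```python
def expand(row):
    """Map (a, b) to [1, a, b, a^2, b^2, a*b]: linear, quadratic, interaction."""
    a, b = row
    return [1.0, a, b, a * a, b * b, a * b]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x

def fit_ridge(rows, y, lam=0.0):
    """Minimise ||y - Xw||^2 + lam * ||w[1:]||^2 (intercept not penalised)."""
    X = [expand(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    for j in range(1, p):
        xtx[j][j] += lam  # ridge penalty on every coefficient except the intercept
    return solve(xtx, xty)

def predict(w, row):
    return sum(wj * fj for wj, fj in zip(w, expand(row)))

# Synthetic grid of two features and a noise-free target (illustration only).
rows = [(0.5 * i, 0.25 * j) for i in range(4) for j in range(5)]
targets = [2 + 3 * a - b + 0.5 * a * b for a, b in rows]

w_ols = fit_ridge(rows, targets, lam=0.0)     # plain least squares
w_ridge = fit_ridge(rows, targets, lam=10.0)  # shrunken coefficients
print([round(c, 3) for c in w_ols])
print([round(c, 3) for c in w_ridge])
```

With `lam=0.0` the fit reduces to ordinary least squares; increasing `lam` shrinks the non-intercept coefficients toward zero, trading some bias for lower variance.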
We kept decision makers up to date with progress reports in the form of Jupyter notebooks, complete but readable by non-technical professionals. We finally delivered a functioning predictive model in the form of a Python package, easily integrable in an online or offline application pipeline. The model delivered was a Random Forest Regressor.
Finally, I read and presented to my peers a paper on a new neural network classifier that a company, Meituan, employs to classify the consumption preferences of its customers (an additional piece of work, unrelated to the rest of the project).
An extract of the report with explainability calculations | Code repository | My presentation on Meituan
Business Intelligence and Communication
Visualizing data, the results of predictive models, and the business insights gained from them is a crucial field. There is a language behind visualization, one that needs to be flexible enough to be accessible to decision makers of any kind, from business professionals to lead engineers. I have developed the skills to understand the background of my target audience and to adapt myself and my communication accordingly.
Tableau Desktop Fundamentals Training
I've taken an intensive training course on my favorite BI tool, Tableau. I particularly like its straightforward approach to source datasets and its power in presenting dashboards on all kinds of client devices. I can build a few kinds of useful, cross-platform visualizations without supervision, and I have the resources to learn more about the software.
Visualizing a database of tennis players
I've worked with Microsoft's BI toolkit: SQL Server, Management Studio, SSAS, SSIS and Power BI. In a group of two, I took a completely unstructured dataset and built a structure around it, both programmatically and with direct reasoning, "by hand". We then carried out a few analytics tasks and built dashboards. Of note are the business insights I extracted from the visualizations and proposed to the decision makers through our report.
An extract of the report | Code repository
Apprentice Publicist
Right out of high school I worked at a local newspaper, La Provincia Quotidiano, as an apprentice publicist. I learned professional integrity and verbal and written communication, and I conducted so many interviews with members of the public that I know how to pose questions and what to ask to get the biggest insights into the problem at hand.
One of my articles
Math, statistics and algorithms
My academic curriculum has been rich in quantitative skills from the start. I've taken several statistics classes in the Economics program, some Quantitative Finance and a very instructive and formative Econometrics course. I've since studied the computer science courses of Discrete Mathematics and of Algorithms and Data Structures.
Peer Tutor
I've worked as a peer tutor for the course of Algorithms and Data Structures for Data Science. I've helped students with no background in computer science develop a more algorithmic way of thinking about programming tasks, and I've provided them with many conceptual strategies and practical ideas to apply in solving the problems presented by the professor.
An example of a solution (online Jupyter notebook)
Some programming exercises
While at university, I worked in my spare time as a Python programming instructor and wrote several exercises for my students. My experience giving one-to-one private lessons helped develop my communication skills, both verbal and written.
Code repository
Optimization: loading a boat
In my limited experience with operations research, in a team of 2 I solved a constrained multivariate mixed integer linear optimization problem. We also presented the work in a beautiful report.
Code repository
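To give a flavour of the boat-loading problem, here is a toy sketch that is not the project's model. It assumes a purely integer version (how many units of each cargo type to load) with a linear profit objective and two linear capacity constraints, and it solves the problem by exhaustive enumeration rather than with a MILP solver. The cargo table, the capacity limits and the function name `best_load` are all hypothetical.

```python
from itertools import product

# Hypothetical cargo types: (name, weight_t, volume_m3, profit, max_units)
CARGO = [
    ("crates", 2, 3, 5, 4),
    ("barrels", 3, 2, 6, 3),
    ("pallets", 4, 5, 9, 2),
]
MAX_WEIGHT = 12  # hypothetical weight capacity of the boat
MAX_VOLUME = 14  # hypothetical volume capacity of the boat

def best_load(cargo, max_weight, max_volume):
    """Enumerate every integer loading plan and keep the most profitable feasible one."""
    best_profit, best_plan = -1, None
    ranges = [range(item[4] + 1) for item in cargo]
    for plan in product(*ranges):
        weight = sum(n * item[1] for n, item in zip(plan, cargo))
        volume = sum(n * item[2] for n, item in zip(plan, cargo))
        if weight > max_weight or volume > max_volume:
            continue  # violates a linear capacity constraint
        profit = sum(n * item[3] for n, item in zip(plan, cargo))
        if profit > best_profit:
            best_profit, best_plan = profit, plan
    return best_profit, best_plan

profit, plan = best_load(CARGO, MAX_WEIGHT, MAX_VOLUME)
print(profit, plan)  # 27 (3, 2, 0): 3 crates, 2 barrels, no pallets
```

Enumeration only works for tiny instances like this one; realistic problems with continuous variables and many constraints call for a dedicated mixed integer solver.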