Metadata-Version: 2.1
Name: geochemistrypi
Version: 0.1.0
Summary: A Python framework for data-driven geochemistry discovery
Project-URL: Homepage, https://github.com/ZJUEarthData/geochemistrypi
Project-URL: Bug Tracker, https://github.com/ZJUEarthData/geochemistrypi/issues
Author-email: Can He <sanyhew1097618435@163.com>
License: MIT License
        
        Copyright (c) 2021 ZJUEarthData
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: flaml==1.0.14
Requires-Dist: geopandas==0.10.2
Requires-Dist: joblib==1.2.0
Requires-Dist: matplotlib==3.5.2
Requires-Dist: multipledispatch==0.6.0
Requires-Dist: numpy==1.21.6
Requires-Dist: openpyxl==3.0.10
Requires-Dist: optuna
Requires-Dist: pandas==1.5.2
Requires-Dist: pyogrio==0.4.2
Requires-Dist: ray==2.2.0
Requires-Dist: ray[tune]
Requires-Dist: scikit-learn==1.1.3
Requires-Dist: scipy
Requires-Dist: seaborn==0.11.0
Requires-Dist: statsmodels==0.13.2
Requires-Dist: threadpoolctl==3.1.0
Requires-Dist: typer==0.7.0
Requires-Dist: xgboost==1.3.1
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Description-Content-Type: text/markdown

<img src="./docs/Geochemistry π.png" width="50%"/>

Geochemistry π is **a Python framework** for data-driven geochemistry discovery. It provides an extensible,
one-stop tool for **geochemical data analysis** on tabular data. The goal of Geochemistry π is to create
a series of user-friendly, extensible, and highly automated products for the full cycle of geochemistry research.

## Quick Installation

Install the package with a single command in a terminal, such as Terminal on macOS or CMD on Windows:
```
pip install geochemistrypi
```
**Note**: The beta version runs on macOS, Windows, or Linux.

## Example

**How to run:** After a successful installation, run one of the following commands on the command line from any directory.

### Case 1: Run with built-in data set for testing
```
geochemistrypi data-mining 
```
**Note**: There are four built-in data sets, one for each of the four kinds of models.

### Case 2: Run with your own data set
```
geochemistrypi data-mining --data your_own_data_set.xlsx
```
**Note**: Currently, only `.xlsx` files are supported. Please specify the path to your data file.
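
If you prefer to launch the tool from a Python script, the two cases above can be wrapped roughly as follows. This is a minimal sketch using only the standard library: the helper names `build_command` and `run_data_mining` are hypothetical and not part of the package, and the only CLI usage assumed is the `data-mining` command and `--data` option shown above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(data_path: Optional[str] = None) -> List[str]:
    """Build the `geochemistrypi data-mining` command line.

    Without a path, the built-in data sets are used (Case 1).
    With a path, the file must be an .xlsx workbook (Case 2).
    """
    command = ["geochemistrypi", "data-mining"]
    if data_path is not None:
        path = Path(data_path)
        # Reject unsupported formats before starting the interactive session.
        if path.suffix.lower() != ".xlsx":
            raise ValueError(f"Only .xlsx files are supported, got: {path.name}")
        command += ["--data", str(path)]
    return command


def run_data_mining(data_path: Optional[str] = None) -> int:
    """Run the interactive CLI in the current terminal and return its exit code."""
    if shutil.which("geochemistrypi") is None:
        raise FileNotFoundError(
            "geochemistrypi was not found on PATH; run `pip install geochemistrypi` first"
        )
    return subprocess.run(build_command(data_path)).returncode


print(build_command("your_own_data_set.xlsx"))
# → ['geochemistrypi', 'data-mining', '--data', 'your_own_data_set.xlsx']
```

Calling `run_data_mining()` with no argument corresponds to Case 1. The session stays interactive, so the numbered prompts still appear in your terminal.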

For more details, please refer to:
+ [Manual for Geochemistry π - Beta (International - Google drive)](https://drive.google.com/file/d/1ZQqmi6nkTZUaODAWzmXLvnaQ1bajEjYp/view?usp=sharing)
+ [Manual for Geochemistry π - Beta (China - Tencent Docs)](https://docs.qq.com/pdf/DQ1llWXRiTHp1Y0lj?&u=6868f96d4a384b309036e04e637e367a)

## First Phase
It works as a **software application** with a command-line interface (CLI) that automates the **data mining** process with
frequently used **machine learning algorithms** and **statistical analysis methods**, lowering the
barrier to entry for geochemists.

The highlight is that, by choosing **simple numbered options**, users can carry out a complete cycle of data
mining **without knowledge of** the SciPy, NumPy, Pandas, Scikit-learn, FLAML, or Ray packages.

Its data section provides feature engineering based on **arithmetic operations**. It also lets users
run a statistical analysis on the data set as well as on the imputation result, which is supported by a combination
of **Monte Carlo simulation** and **hypothesis testing**.
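
To give a feel for how Monte Carlo simulation and hypothesis testing can be combined to check an imputation result, here is a minimal sketch of a permutation test. It is an illustration only and not necessarily the procedure the package runs: the function name, the test statistic (difference in means), the median imputation, and the sample values are all assumptions made for this example.

```python
import random
import statistics
from typing import List, Sequence


def permutation_p_value(
    observed: Sequence[float],
    imputed: Sequence[float],
    n_simulations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo permutation test on the difference in means.

    Null hypothesis: both samples come from the same distribution, so
    randomly reassigning values to the two groups should often produce a
    gap at least as large as the one actually seen.
    """
    rng = random.Random(seed)
    actual_gap = abs(statistics.mean(observed) - statistics.mean(imputed))
    pooled: List[float] = list(observed) + list(imputed)
    n_observed = len(observed)

    extreme = 0
    for _ in range(n_simulations):
        rng.shuffle(pooled)
        gap = abs(
            statistics.mean(pooled[:n_observed]) - statistics.mean(pooled[n_observed:])
        )
        if gap >= actual_gap:
            extreme += 1
    # The add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_simulations + 1)


# Hypothetical column with missing values (None), filled with the median.
raw = [12.1, None, 11.8, 12.4, None, 12.0, 11.9, 12.3, 15.0]
observed = [value for value in raw if value is not None]
fill = statistics.median(observed)
imputed = [fill if value is None else value for value in raw]

p_value = permutation_p_value(observed, imputed)
print(f"p-value: {p_value:.3f}")
```

A large p-value suggests the imputation did not noticeably shift the mean of the column, while a small one is a reason to look at the imputation strategy more closely.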


Its models section provides both **supervised learning** and **unsupervised learning** methods from
the **Scikit-learn** framework, covering four types of algorithms: regression, classification,
clustering, and dimensionality reduction. Integrated with the **FLAML** and **Ray** frameworks, it lets users run
AutoML easily, quickly, and cost-effectively on the built-in supervised learning algorithms in our framework.
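
For readers curious about what the four algorithm types look like in plain Scikit-learn, the sketch below runs one example of each on synthetic data. The specific estimators (random forests, k-means, PCA) and the synthetic data sets are assumptions chosen for illustration; this README does not state which estimators the CLI wraps, and the AutoML integration is not shown here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification, make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for a tabular geochemical data set.
X_reg, y_reg = make_regression(n_samples=200, n_features=6, noise=0.1, random_state=0)
X_clf, y_clf = make_classification(n_samples=200, n_features=6, random_state=0)

# Supervised learning, regression: predict a continuous target.
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=0)
regressor = RandomForestRegressor(n_estimators=50, random_state=0)
regressor.fit(Xr_train, yr_train)
r2_score = regressor.score(Xr_test, yr_test)

# Supervised learning, classification: predict a class label.
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_clf, y_clf, random_state=0)
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(Xc_train, yc_train)
accuracy = classifier.score(Xc_test, yc_test)

# Unsupervised learning, clustering: group samples without labels.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reg)

# Unsupervised learning, dimensionality reduction: project onto two components.
X_2d = PCA(n_components=2).fit_transform(X_reg)

print("regression R^2:", round(r2_score, 3))
print("classification accuracy:", round(accuracy, 3))
print("clusters found:", len(set(cluster_labels)))
print("reduced shape:", X_2d.shape)
```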

The activity diagram of the Geochemistry π Version 1.0.0:

<img src="./docs/Geochemistryπ-Activity%20Diagram_v1.png" />

The whole package is under construction and the documentation is progressively evolving. 



## Team Info
**Leader:**
+ Can He (Sany, National University of Singapore, Singapore)    
  Email: sanyhew1097618435@163.com

**Core Developers:**
+ Jianhao Sun (Jin, China University of Geosciences, Wuhan, China)
+ Jianming Zhao (Jamie, Jilin University, Changchun, China)
+ Yang Lyu (Daisy, Zhejiang University, China)
+ Shengxin Wang (Samson, Lanzhou University, China)

**Members**:
+ Wenyu Zhao (Molly, Zhejiang University, China)
+ Fang Li (liv, Shenzhen University, China)
+ Ting Liu (Kira, Sun Yat-sen University, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Aixiwake·Janganuer (Ayshuak, Sun Yat-sen University, China)
+ Parnanjan Dutta (Presidency University, Kolkata, India)
+ Bailun Jiang (EPSI / Lille University, France)
+ Yongkang Chang (Kill-virus, Lanzhou University, China)
+ Xirui Zhu (Rae, University of York, United Kingdom)

## Join Us :)
**The recruitment of research interns is ongoing !!!**

**Key Point: All things are done online, remote work (\*^▽^\*)**

**What can you learn?**
+ Learn the full cycle of data mining on tabular data, including algorithms for regression,
classification, clustering, and decomposition.
+ Learn to be a qualified Python developer, including Python programming for data mining,
basic software engineering techniques such as OOP development, and collaboration tools such as Git.

**What can you get?**  

+ Research internship proof and reference letter after working for > 200 hours.
+ Chance to pay a visit to Hangzhou, China, sponsored by ZJU Earth Data.
+ Chance to be guided by the experts from IT companies in Silicon Valley and Hangzhou.
+ Bonus depending on your performance. 

**Current Working Pattern:**
+ Online working and cooperation
+ Three weeks per working cycle -> One online meeting per working cycle
+ One cycle report (see below) per cycle - 5 mins to finish

Even if you are not familiar with the topics above, being interested and having enough time to commit
is enough. We have a fully developed training system to help you, as a newcomer to data mining or Python development,
learn step by step with senior members until you can make a significant contribution to our project.

**More details about the project?**  
Please refer to:   
English Page: https://person.zju.edu.cn/en/zhangzhou  
Chinese Page: https://person.zju.edu.cn/zhangzhou#0  

**Do you want to contribute to this open-source program?**   
Send your CV to: sanyhew1097618435@163.com  

## In-house Materials
Materials are available in both Chinese and English. Materials not listed below are internal.
1. [Guideline Manual – Geochemistry π (International - Google drive)](https://docs.google.com/document/d/1LjwB5Lazk33E5vbtnFPJio_MyjYQxjEu/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
2. [Guideline Manual – Geochemistry π (China - Tencent Docs)](https://docs.qq.com/doc/DQ21IZUdVQktqRWpm?&u=6868f96d4a384b309036e04e637e367a)
3. [Learning Steps for Newbies – Geochemistry π (International - Google drive)](https://docs.google.com/document/d/1GQO-SXwEx_8midr362pqfxNZtfUf-nA6/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
4. [Learning Steps for Newbies - Geochemistry π (China - Tencent Docs)](https://docs.qq.com/doc/DTlVEakt2WnJrdkN1?&u=6868f96d4a384b309036e04e637e367a)
5. [Code Specification v2.1.2 - Geochemistry π (International - Google drive)](https://drive.google.com/file/d/12UPrGqrj9hl0_vK8r-m6xykh_6052OtI/view?usp=sharing)
6. [Code Specification v2.1.2 - Geochemistry π (China - Tencent Docs)](https://docs.qq.com/pdf/DQ2pmc1l1Z2t3QVFa?&u=6868f96d4a384b309036e04e637e367a)
7. [Cycle Report - Geochemistry π (International - Google drive)](https://drive.google.com/file/d/1JPZoSLcPRqzu6LDvw8wLQkV2GfJoER51/view?usp=sharing)
8. [Cycle Report - Geochemistry π (China - Tencent Docs)](https://docs.qq.com/pdf/DQ25VSGNlbGx4UkFZ?&u=6868f96d4a384b309036e04e637e367a)

## In-house Videos
Technical recordings are published on both Bilibili and YouTube, while other meeting videos are internal materials.
More videos will be recorded soon.
1. [ZJU_Earth_Data Introduction (Geochemical Data, Python, Geochemistry π) - Prof. Zhang](https://www.bilibili.com/video/BV1Lf4y1w7EK?spm_id_from=333.999.0.0)
2. [How to Collaborate and Provide Bug Report on Geochemistry π Through GitHub - Can He (Sany)](https://www.youtube.com/watch?v=1DWoEsqsfvQ&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=3)
3. [How to Run Geochemistry π v1.0.0-alpha - Can He (Sany)](https://www.bilibili.com/video/BV1i541117dd?spm_id_from=333.999.0.0)
4. [How to Create and Use Virtual Environment on Geochemistry π - Can He (Sany)](https://www.youtube.com/watch?v=4KFi7OXxD-c&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=4)
5. [How to use Github-Desktop in conflict resolution - Qiuhao Zhao (Brad)](https://www.youtube.com/watch?v=KT1g5JpuUVI&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM)
6. [Virtual Environment & Packages On Windows - Jianming Zhao (Jamie)](https://www.youtube.com/watch?v=e4VqSBuNp_o&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=2)
7. [Git Workflow & Coordinating Synchronization - Jianming Zhao (Jamie)](https://www.bilibili.com/video/BV1Sa4y1f74k?spm_id_from=333.999.0.0&vd_source=9adcf2c5fdeffe1d11c89d441ef598ba)


## Contributors
+ Qiuhao Zhao (Brad, Zhejiang University, China)
+ Anzhou Li (Andrian, Zhejiang University, China) 
+ Xunxin Liu (Tante, China University of Geosciences, Wuhan, China)
+ Xin Li (The University of Manchester, United Kingdom)