This project is part of the .NET Foundation and operates under their code of conduct.
Introduction
MiniExcel is simple and efficient to avoid OOM's .NET processing Excel tool.
At present, most popular frameworks need to load all the data into the memory to facilitate operation, but it will cause memory consumption problems. MiniExcel tries to use algorithm from a stream to reduce the original 1000 MB occupation to a few MB to avoid OOM(out of memory).
Features
- Low memory consumption, avoid OOM (out of memory) and full GC
- Support
real-time
operation of each row of data - Support LINQ deferred execution, it can do low-consumption, fast paging and other complex queries
- Lightweight, without Microsoft Office installed, no COM+, DLL size is less than 150KB
- Easy API style to read/write/fill excel
Get Started
Installation
You can install the package from NuGet
Release Notes
Please Check Release Notes
TODO
Please Check TODO
Performance
Benchmarks logic can be found in MiniExcel.Benchmarks , and test cli
dotnet run -p .\benchmarks\MiniExcel.Benchmarks\ -c Release -f netcoreapp3.1 -- -f * --join
Output from the latest run is :
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
[Host] : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
Job-ZYYABG : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
IterationCount=3 LaunchCount=3 WarmupCount=3
Benchmark History : Link
Import/Query Excel
Logic : Test1,000,000x10.xlsx as performance test basic file, 1,000,000 rows * 10 columns "HelloWorld" cells, 23 MB file size
Library | Method | Max Memory Usage | Mean |
---|---|---|---|
MiniExcel | 'MiniExcel QueryFirst' | 0.109 MB | 0.0007264 sec |
ExcelDataReader | 'ExcelDataReader QueryFirst' | 15.24 MB | 10.66421 sec |
MiniExcel | 'MiniExcel Query' | 17.3 MB | 14.17933 sec |
ExcelDataReader | 'ExcelDataReader Query' | 17.3 MB | 22.56508 sec |
Epplus | 'Epplus QueryFirst' | 1,452 MB | 18.19801 sec |
Epplus | 'Epplus Query' | 1,451 MB | 23.64747 sec |
OpenXmlSDK | 'OpenXmlSDK Query' | 1,412 MB | 52.00327 sec |
OpenXmlSDK | 'OpenXmlSDK QueryFirst' | 1,413 MB | 52.34865 sec |
ClosedXml | 'ClosedXml QueryFirst' | 2,158 MB | 66.18897 sec |
ClosedXml | 'ClosedXml Query' | 2,184 MB | 191.43412 sec |
Export/Create Excel
Logic : create a total of 10,000,000 "HelloWorld" excel
Library | Method | Max Memory Usage | Mean |
---|---|---|---|
MiniExcel | 'MiniExcel Create Xlsx' | 15 MB | 11.53181 sec |
Epplus | 'Epplus Create Xlsx' | 1,204 MB | 22.50971 sec |
OpenXmlSdk | 'OpenXmlSdk Create Xlsx' | 2,621 MB | 42.47399 sec |
ClosedXml | 'ClosedXml Create Xlsx' | 7,141 MB | 140.93992 sec |
Excel Query/Import
1. Execute a query and map the results to a strongly typed IEnumerable [Try it]
Recommand to use Stream.Query because of better efficiency.
public class UserAccount
{
public Guid ID { get; set; }
public string Name { get; set; }
public DateTime BoD { get; set; }
public int Age { get; set; }
public bool VIP { get; set; }
public decimal Points { get; set; }
}
var rows = MiniExcel.Query<UserAccount>(path);
// or
using (var stream = File.OpenRead(path))
var rows = stream.Query<UserAccount>();
2. Execute a query and map it to a list of dynamic objects without using head [Try it]
- dynamic key is
A.B.C.D..
MiniExcel | 1 |
---|---|
Github | 2 |
var rows = MiniExcel.Query(path).ToList();
// or
using (var stream = File.OpenRead(path))
{
var rows = stream.Query().ToList();
Assert.Equal("MiniExcel", rows[0].A);
Assert.Equal(1, rows[0].B);
Assert.Equal("Github", rows[1].A);
Assert.Equal(2, rows[1].B);
}
3. Execute a query with first header row [Try it]
note : same column name use last right one
Input Excel :
Column1 | Column2 |
---|---|
MiniExcel | 1 |
Github | 2 |
var rows = MiniExcel.Query(useHeaderRow:true).ToList();
// or
using (var stream = File.OpenRead(path))
{
var rows = stream.Query(useHeaderRow:true).ToList();
Assert.Equal("MiniExcel", rows[0].Column1);
Assert.Equal(1, rows[0].Column2);
Assert.Equal("Github", rows[1].Column1);
Assert.Equal(2, rows[1].Column2);
}
4. Query Support LINQ Extension First/Take/Skip ...etc
Query First
var row = MiniExcel.Query(path).First();
Assert.Equal("HelloWorld", row.A);
// or
using (var stream = File.OpenRead(path))
{
var row = stream.Query().First();
Assert.Equal("HelloWorld", row.A);
}
Performance between MiniExcel/ExcelDataReader/ClosedXML/EPPlus
5. Query by sheet name
MiniExcel.Query(path, sheetName: "SheetName");
//or
stream.Query(sheetName: "SheetName");
6. Query all sheet name and rows
var sheetNames = MiniExcel.GetSheetNames(path);
foreach (var sheetName in sheetNames)
{
var rows = MiniExcel.Query(path, sheetName: sheetName);
}
7. Get Columns
var columns = MiniExcel.GetColumns(path); // e.g result : ["A","B"...]
var cnt = columns.Count; // get column count
8. Dynamic Query cast row to IDictionary<string,object>
foreach(IDictionary<string,object> row in MiniExcel.Query(path))
{
//..
}
// or
var rows = MiniExcel.Query(path).Cast<IDictionary<string,object>>();
// or Query specified ranges (capitalized)
// A2 represents the second row of column A, C3 represents the third row of column C
// If you don't want to restrict rows, just don't include numbers
var rows = MiniExcel.QueryRange(path, startCell: "A2", endCell: "C3").Cast<IDictionary<string, object>>();
9. Query Excel return DataTable
Not recommended, because DataTable will load all data into memory and lose MiniExcel's low memory consumption feature.
var table = MiniExcel.QueryAsDataTable(path, useHeaderRow: true);
10. Specify the cell to start reading data
MiniExcel.Query(path,useHeaderRow:true,startCell:"B3")
11. Fill Merged Cells
Note: The efficiency is slower compared to not using merge fill
Reason: The OpenXml standard puts mergeCells at the bottom of the file, which leads to the need to foreach the sheetxml twice
var config = new OpenXmlConfiguration()
{
FillMergedCells = true
};
var rows = MiniExcel.Query(path, configuration: config);
support variable length and width multi-row and column filling
12. Reading big file by disk-base cache (Disk-Base Cache - SharedString)
If the SharedStrings size exceeds 5 MB, MiniExcel default will use local disk cache, e.g, 10x100000.xlsx(one million rows data), when disable disk cache the maximum memory usage is 195MB, but able disk cache only needs 65MB. Note, this optimization needs some efficiency cost, so this case will increase reading time from 7.4 seconds to 27.2 seconds, If you don't need it that you can disable disk cache with the following code:
var config = new OpenXmlConfiguration { EnableSharedStringCache = false };
MiniExcel.Query(path,configuration: config)
You can use SharedStringCacheSize
to change the sharedString file size beyond the specified size for disk caching
var config = new OpenXmlConfiguration { SharedStringCacheSize=500*1024*1024 };
MiniExcel.Query(path, configuration: config);
Create/Export Excel
-
Must be a non-abstract type with a public parameterless constructor .
-
MiniExcel support parameter IEnumerable Deferred Execution, If you want to use least memory, please do not call methods such as ToList
e.g : ToList or not memory usage
1. Anonymous or strongly type [Try it]
var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.xlsx");
MiniExcel.SaveAs(path, new[] {
new { Column1 = "MiniExcel", Column2 = 1 },
new { Column1 = "Github", Column2 = 2}
});
2. IEnumerable<IDictionary<string, object>>
var values = new List<Dictionary<string, object>>()
{
new Dictionary<string,object>{{ "Column1", "MiniExcel" }, { "Column2", 1 } },
new Dictionary<string,object>{{ "Column1", "Github" }, { "Column2", 2 } }
};
MiniExcel.SaveAs(path, values);
Create File Result :
Column1 | Column2 |
---|---|
MiniExcel | 1 |
Github | 2 |
3. IDataReader
Recommended
, it can avoid to load all data into memory
MiniExcel.SaveAs(path, reader);
DataReader export multiple sheets (recommand by Dapper ExecuteReader)
using (var cnn = Connection)
{
cnn.Open();
var sheets = new Dictionary<string,object>();
sheets.Add("sheet1", cnn.ExecuteReader("select 1 id"));
sheets.Add("sheet2", cnn.ExecuteReader("select 2 id"));
MiniExcel.SaveAs("Demo.xlsx", sheets);
}
4. Datatable
-
Not recommended
, it will load all data into memory -
DataTable use Caption for column name first, then use columname
var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.xlsx");
var table = new DataTable();
{
table.Columns.Add("Column1", typeof(string));
table.Columns.Add("Column2", typeof(decimal));
table.Rows.Add("MiniExcel", 1);
table.Rows.Add("Github", 2);
}
MiniExcel.SaveAs(path, table);
5. Dapper Query
Thanks @shaofing #552 , please use CommandDefinition + CommandFlags.NoCache
using (var connection = GetConnection(connectionString))
{
var rows = connection.Query(
new CommandDefinition(
@"select 'MiniExcel' as Column1,1 as Column2 union all select 'Github',2"
, flags: CommandFlags.NoCache)
);
// Note: QueryAsync will throw close connection exception
MiniExcel.SaveAs(path, rows);
}
Below code will load all data into memory
using (var connection = GetConnection(connectionString))
{
var rows = connection.Query(@"select 'MiniExcel' as Column1,1 as Column2 union all select